[jira] [Commented] (YARN-9473) [Umbrella] Support Vector Engine (a new accelerator hardware) based on pluggable device framework
[ https://issues.apache.org/jira/browse/YARN-9473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829931#comment-16829931 ] Hudson commented on YARN-9473: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16479 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16479/]) YARN-9476. [YARN-9473] Create unit tests for VE plugin. Contributed by (ztang: rev 7fbaa7d66f3ff40b80b70d4563545035e91e44a6) * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/com/nec/TestNECVEPlugin.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/com/nec/NECVEPlugin.java > [Umbrella] Support Vector Engine (a new accelerator hardware) based on > pluggable device framework > -- > > Key: YARN-9473 > URL: https://issues.apache.org/jira/browse/YARN-9473 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Zhankun Tang >Assignee: Peter Bacsko >Priority: Major > > As the heterogeneous computation trend rises, new acceleration hardware such as > GPUs and FPGAs is used to satisfy various requirements. > The Vector Engine (VE), a new piece of hardware released by NEC, is another > example. The VE is similar to a GPU but has different characteristics: it is > suitable for machine learning and HPC due to better memory bandwidth and no > PCIe bottleneck. > Please check here for more VE details: > [https://www.nextplatform.com/2017/11/22/deep-dive-necs-aurora-vector-engine/] > [https://www.hotchips.org/hc30/2conf/2.14_NEC_vector_NEC_SXAurora_TSUBASA_HotChips30_finalb.pdf] > As we know, YARN-8851 is a pluggable device framework which provides an easy > way to develop a plugin for such new accelerators. This JIRA proposes to > develop a new VE plugin based on that framework, implemented along the lines of > the current GPU plugin "NvidiaGPUPluginForRuntimeV2". > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
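[Editor's note] For context, a plugin of this kind mainly has to discover the accelerator devices present on the node and report them to the pluggable device framework. Below is a minimal, hypothetical sketch of such a discovery step; the /dev/veN naming convention, the class name, and the method are assumptions for illustration only, and this is not the actual NECVEPlugin code.

{code:java}
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of VE device discovery, NOT the actual NECVEPlugin.
 * Assumes VE devices show up as /dev/ve0, /dev/ve1, ... which is an
 * assumption made here purely for illustration.
 */
public class VeDeviceScanner {
  private static final Pattern VE_DEV = Pattern.compile("ve\\d+");

  /** Returns the paths of all VE-like device files found under /dev. */
  public static List<String> scanVeDevices() {
    List<String> found = new ArrayList<>();
    File[] entries = new File("/dev").listFiles();
    if (entries == null) {
      return found; // /dev not readable; report no devices
    }
    for (File entry : entries) {
      if (VE_DEV.matcher(entry.getName()).matches()) {
        found.add(entry.getAbsolutePath());
      }
    }
    return found;
  }

  public static void main(String[] args) {
    System.out.println("VE devices: " + scanVeDevices());
  }
}
{code}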
[jira] [Commented] (YARN-9476) Create unit tests for VE plugin
[ https://issues.apache.org/jira/browse/YARN-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829930#comment-16829930 ] Hudson commented on YARN-9476: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16479 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16479/]) YARN-9476. [YARN-9473] Create unit tests for VE plugin. Contributed by (ztang: rev 7fbaa7d66f3ff40b80b70d4563545035e91e44a6) * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/com/nec/TestNECVEPlugin.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/resourceplugin/com/nec/NECVEPlugin.java > Create unit tests for VE plugin > --- > > Key: YARN-9476 > URL: https://issues.apache.org/jira/browse/YARN-9476 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9476-001.patch, YARN-9476-002.patch, > YARN-9476-003.patch, YARN-9476-004.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
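[Editor's note] Since this email is about unit tests for the VE plugin, here is a minimal hedged sketch of how discovery logic like the one above can be unit-tested against a temporary directory instead of the real /dev. TestNECVEPlugin itself is more involved; the directory parameter and helper here are assumptions added for testability.

{code:java}
import static org.junit.Assert.assertEquals;

import java.io.File;
import java.nio.file.Files;
import org.junit.Test;

/**
 * Hypothetical unit-test sketch (not the actual TestNECVEPlugin): verifies
 * that VE-style device names are picked up from a directory passed in as a
 * parameter, so the test never has to touch the real /dev.
 */
public class TestVeDeviceScanner {

  // Minimal local copy of the discovery logic, parameterized on the directory.
  private static int countVeDevices(File dir) {
    File[] entries = dir.listFiles();
    int count = 0;
    if (entries != null) {
      for (File entry : entries) {
        if (entry.getName().matches("ve\\d+")) {
          count++;
        }
      }
    }
    return count;
  }

  @Test
  public void testFindsOnlyVeDevices() throws Exception {
    File dir = Files.createTempDirectory("fake-dev").toFile();
    new File(dir, "ve0").createNewFile();   // should match
    new File(dir, "ve1").createNewFile();   // should match
    new File(dir, "null").createNewFile();  // should be ignored
    assertEquals(2, countVeDevices(dir));
  }
}
{code}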
[jira] [Commented] (YARN-9510) Proxyuser access timeline and getdelegationtoken failed without Timeline server restart
[ https://issues.apache.org/jira/browse/YARN-9510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829927#comment-16829927 ] Hadoop QA commented on YARN-9510: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 41s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 39s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 3s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 37s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 5s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 14 new + 241 unchanged - 0 fixed = 255 total (was 241) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 25s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 19s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 49s{color} | {color:red} hadoop-yarn-api in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 37s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 44s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 31s{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 25m 46s{color} | {color:red} hadoop-yarn-client in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 37s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}125m 39s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | TEST-TestYarnConfigurationFields | | | hadoop.yarn.client.cli.TestRMAdminCLI | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce
[jira] [Commented] (YARN-9476) Create unit tests for VE plugin
[ https://issues.apache.org/jira/browse/YARN-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829917#comment-16829917 ] Zhankun Tang commented on YARN-9476: [~snemeth] Thanks for the review! [~pbacsko] Thanks for the patch! +1. The patch is committed. > Create unit tests for VE plugin > --- > > Key: YARN-9476 > URL: https://issues.apache.org/jira/browse/YARN-9476 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9476-001.patch, YARN-9476-002.patch, > YARN-9476-003.patch, YARN-9476-004.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9510) Proxyuser access timeline and getdelegationtoken failed without Timeline server restart
[ https://issues.apache.org/jira/browse/YARN-9510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shen Yinjie updated YARN-9510: -- Attachment: YARN-9510_1.patch > Proxyuser access timeline and getdelegationtoken failed without Timeline > server restart > --- > > Key: YARN-9510 > URL: https://issues.apache.org/jira/browse/YARN-9510 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver >Affects Versions: 3.1.0 >Reporter: Shen Yinjie >Assignee: Shen Yinjie >Priority: Major > Attachments: YARN-9510_1.patch > > > We add a proxy user by changing "hadoop.proxyuser.xx.yy". If the Timeline > Server is not restarted, the YARN job will fail and throw: > {code:java} > Caused by: > org.apache.hadoop.security.authentication.client.AuthenticationException: > Authentication failed, URL: > http://hostname:8188/ws/v1/timeline/?op=GETDELEGATIONTOKEN=alluxio=rm%2Fhc1%40XXF=ambari-qa, > status: 403, message: Forbidden > at > org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:401) > at > org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:74) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:147) > at > org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:213) > {code} > It seems that the proxy user info in the Timeline Server has not been refreshed. > In a production cluster, we sometimes add a new proxy user at runtime and > expect the impersonation to take effect after executing a command like > "...refreshSuperUserGroupsConfiguration", without restarting the Timeline Server. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
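[Editor's note] On the server side, the usual refresh pattern in Hadoop is to reload the proxy-user settings into the static ProxyUsers registry. The sketch below shows that pattern; it is not the Timeline Server's actual refresh path (which this issue reports as missing), only an illustration of the mechanism an admin hook would invoke.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.authorize.AuthorizationException;
import org.apache.hadoop.security.authorize.ProxyUsers;

/**
 * Minimal sketch of the proxy-user refresh pattern used elsewhere in Hadoop.
 * The Timeline Server would need an admin hook calling something like this;
 * this is not its actual code.
 */
public class ProxyUserRefreshSketch {
  public static void refresh() {
    // Re-reads hadoop.proxyuser.* entries from the current configuration,
    // so a newly added proxy user is honored without a process restart.
    Configuration conf = new Configuration();
    ProxyUsers.refreshSuperUserGroupsConfiguration(conf);
  }

  public static void check(UserGroupInformation proxyUgi, String remoteAddr)
      throws AuthorizationException {
    // Throws AuthorizationException (surfacing to the client as HTTP 403)
    // if the real user may not impersonate the proxy user from this address.
    ProxyUsers.authorize(proxyUgi, remoteAddr);
  }
}
{code}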
[jira] [Comment Edited] (YARN-9520) fair scheduler: inter-queue-preemption.enabled, intra-queue-preemption.enabled options
[ https://issues.apache.org/jira/browse/YARN-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829704#comment-16829704 ] Sudhir Babu Pothineni edited comment on YARN-9520 at 4/29/19 8:54 PM: -- Let's say there are 100 applications in queue A and 10 of them are running, occupying 100% of the cluster. In my case they should keep running even after the fair-share timeout; the remaining jobs should only be allocated as containers from the running jobs finish. But I think these running jobs are preempted by the waiting jobs after the fair-share timeout. Preemption is enabled because queue B or C can become active at any time. If I set the maximum applications per queue to 10, the cluster is underutilized. The Capacity Scheduler has inter-queue-preemption.enabled and intra-queue-preemption.enabled; is there any specific reason they are not in the Fair Scheduler? was (Author: sbpothineni): Lets say 100 applications in the queue A, 10 applications are running occupied 100% of the cluster, In my case they should keep running even after Fairshare timeout, only after as soon as the containers finished from running jobs, remaining jobs should be allocated. But I think these running jobs are preempted by waiting jobs after Fair share timeout, Preemption enabled because Queue B or C can be active any time. If I put maximum applications per queue 10, cluster is under utilized. Capacity scheduler has inter-queue-preemption.enabled, intra-queue-preemption.enabled, is there any specifica reason they are not there in fair scheduler? > fair scheduler: inter-queue-preemption.enabled, > intra-queue-preemption.enabled options > -- > > Key: YARN-9520 > URL: https://issues.apache.org/jira/browse/YARN-9520 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Sudhir Babu Pothineni >Priority: Major > > It's good to have inter-queue-preemption-enabled and > intra-queue-preemption-enabled options for the Fair Scheduler. I have a use case > where we need inter-queue-preemption-enabled=false -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9520) fair scheduler: inter-queue-preemption.enabled, intra-queue-preemption.enabled options
[ https://issues.apache.org/jira/browse/YARN-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829704#comment-16829704 ] Sudhir Babu Pothineni commented on YARN-9520: - Let's say there are 100 applications in queue A and 10 of them are running, occupying 100% of the cluster. In my case they should keep running even after the fair-share timeout; the remaining jobs should only be allocated as containers from the running jobs finish. But I think these running jobs are preempted by the waiting jobs after the fair-share timeout. Preemption is enabled because queue B or C can become active at any time. If I set the maximum applications per queue to 10, the cluster is underutilized. The Capacity Scheduler has inter-queue-preemption.enabled and intra-queue-preemption.enabled; is there any specific reason they are not in the Fair Scheduler? > fair scheduler: inter-queue-preemption.enabled, > intra-queue-preemption.enabled options > -- > > Key: YARN-9520 > URL: https://issues.apache.org/jira/browse/YARN-9520 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Sudhir Babu Pothineni >Priority: Major > > It's good to have inter-queue-preemption-enabled and > intra-queue-preemption-enabled options for the Fair Scheduler. I have a use case > where we need inter-queue-preemption-enabled=false -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
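[Editor's note] For reference, the Fair Scheduler's existing preemption knobs live in the allocation file rather than as the two proposed flags. A hedged sketch of how a queue can opt out of being preempted today follows; the queue name is made up, and the availability of allowPreemptionFrom depends on the Hadoop version in use.

{code:xml}
<?xml version="1.0"?>
<!-- fair-scheduler.xml sketch; queue name is illustrative only. -->
<allocations>
  <queue name="queueA">
    <!-- Containers in this queue may not be preempted by other queues,
         roughly approximating inter-queue-preemption-enabled=false for
         this queue; supported in recent Hadoop releases. -->
    <allowPreemptionFrom>false</allowPreemptionFrom>
    <!-- Number of seconds this queue may sit below its fair share before
         it tries to preempt containers from other (preemptable) queues. -->
    <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>
  </queue>
</allocations>
{code}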
[jira] [Commented] (YARN-9520) fair scheduler: inter-queue-preemption.enabled, intra-queue-preemption.enabled options
[ https://issues.apache.org/jira/browse/YARN-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829685#comment-16829685 ] Yufei Gu commented on YARN-9520: Could you elaborate on the use case? > fair scheduler: inter-queue-preemption.enabled, > intra-queue-preemption.enabled options > -- > > Key: YARN-9520 > URL: https://issues.apache.org/jira/browse/YARN-9520 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Sudhir Babu Pothineni >Priority: Major > > It's good to have inter-queue-preemption-enabled and > intra-queue-preemption-enabled options for the Fair Scheduler. I have a use case > where we need inter-queue-preemption-enabled=false -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9520) fair scheduler: inter-queue-preemption.enabled, intra-queue-preemption.enabled options
[ https://issues.apache.org/jira/browse/YARN-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sudhir Babu Pothineni updated YARN-9520: Summary: fair scheduler: inter-queue-preemption.enabled, intra-queue-preemption.enabled options (was: fair scheduler: inter-queue-preemption-enabled, intra-queue-preemption-enabled options) > fair scheduler: inter-queue-preemption.enabled, > intra-queue-preemption.enabled options > -- > > Key: YARN-9520 > URL: https://issues.apache.org/jira/browse/YARN-9520 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Sudhir Babu Pothineni >Priority: Major > > It's good to have inter-queue-preemption-enabled and > intra-queue-preemption-enabled options for the Fair Scheduler. I have a use case > where we need inter-queue-preemption-enabled=false -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9520) fair scheduler: inter-queue-preemption-enabled, intra-queue-preemption-enabled options
[ https://issues.apache.org/jira/browse/YARN-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sudhir Babu Pothineni updated YARN-9520: Summary: fair scheduler: inter-queue-preemption-enabled, intra-queue-preemption-enabled options (was: fair scheduler: inter-queue-preemption-enabled, intra-queue-preemption-eblaned options) > fair scheduler: inter-queue-preemption-enabled, > intra-queue-preemption-enabled options > -- > > Key: YARN-9520 > URL: https://issues.apache.org/jira/browse/YARN-9520 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Sudhir Babu Pothineni >Priority: Major > > It's good to have inter-queue-preemption-enabled and > intra-queue-preemption-enabled options for the Fair Scheduler. I have a use case > where we need inter-queue-preemption-enabled=false -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9520) fair scheduler: inter-queue-preemption-enabled, intra-queue-preemption-eblaned options
Sudhir Babu Pothineni created YARN-9520: --- Summary: fair scheduler: inter-queue-preemption-enabled, intra-queue-preemption-eblaned options Key: YARN-9520 URL: https://issues.apache.org/jira/browse/YARN-9520 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Sudhir Babu Pothineni It's good to have inter-queue-preemption-enabled and intra-queue-preemption-enabled options for the Fair Scheduler. I have a use case where we need inter-queue-preemption-enabled=false -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829570#comment-16829570 ] Tan, Wangda commented on YARN-9517: --- [~shurong.mai], thanks for putting up a patch. However, I'm not sure why you closed this Jira. Is the patch or fix already in the mentioned branches? > When aggregation is not enabled, can't see the container log > > > Key: YARN-9517 > URL: https://issues.apache.org/jira/browse/YARN-9517 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, > 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: patch > Attachments: YARN-9517.patch > > > yarn-site.xml > {code:java} > <property> > <name>yarn.log-aggregation-enable</name> > <value>false</value> > </property> > {code} > > When aggregation is not enabled, we click the "container log link" (in the web > page > "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;) > after a job has finished successfully. > After the click it jumps to a web page displaying "Aggregation is not enabled. Try the > nodemanager at yy:48038", and the URL is > "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop; > I also found this problem in all Hadoop 2.x.y and 3.x.y versions, and I submitted > a patch which is simple and applies to these Hadoop versions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829328#comment-16829328 ] Jim Brennan commented on YARN-9518: --- [~shurong.mai], are you running with the latest code (trunk)? The patch you put up looks like it is based on a version of CgroupsLCEResourcesHandler() from before 5/19/2017 (YARN-5301). Can you verify that the problem exists in trunk? > can't use CGroups with YARN in centos7 > --- > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > Attachments: YARN-9518.patch > > > The OS version is CentOS 7. > > After I set the configuration variables for cgroups with YARN, the nodemanager > could be started without any problem. But when I ran a job, the job failed with > the exceptional nodemanager logs shown at the end. > In these logs, the important line is "Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory". > After analysing, I found the reason. In CentOS 6, the cgroup "cpu" and > "cpuacct" subsystems are as follows: > {code:java} > /sys/fs/cgroup/cpu > /sys/fs/cgroup/cpuacct > {code} > But in CentOS 7, they are as follows: > {code:java} > /sys/fs/cgroup/cpu -> cpu,cpuacct > /sys/fs/cgroup/cpuacct -> cpu,cpuacct > /sys/fs/cgroup/cpu,cpuacct{code} > "cpu" and "cpuacct" have been merged into "cpu,cpuacct"; "cpu" and "cpuacct" are > symbolic links. > Looking at the source code, the nodemanager gets the cgroup subsystem info by > reading /proc/mounts, so it gets "/sys/fs/cgroup/cpu,cpuacct" as the path of > both the cpu and cpuacct subsystems. > The resource description argument of container-executor is then as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > There is a comma in the cgroup path, but the comma is the separator between > multiple resources. Therefore, the cgroup path is truncated to "/sys/fs/cgroup/cpu" > rather than the correct cgroup path > "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", > which produces the error "Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory". > Hence I modified the source code and submitted a patch. The idea of the patch is > that the nodemanager uses "/sys/fs/cgroup/cpu" as the cgroup cpu path rather than > "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description > argument of container-executor becomes: > {code:java} > cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > Note that there is no comma in the path, and it is a valid path because > "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". > After applying the patch, the problem is resolved and the job runs successfully. 
> The patch applies universally to cgroup subsystem paths, for example the cgroup > network subsystems: > {code:java} > /sys/fs/cgroup/net_cls -> net_cls,net_prio > /sys/fs/cgroup/net_prio -> net_cls,net_prio > /sys/fs/cgroup/net_cls,net_prio{code} > > > ## > {panel:title=exceptional nodemanager logs:} > 2019-04-19 20:17:20,095 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1554210318404_0042_01_01 transitioned from LOCALIZED > to RUNNING > 2019-04-19 20:17:20,101 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container container_1554210318404_0042_01_01 is : 27 > 2019-04-19 20:17:20,103 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception > from container-launch with container ID: container_155421031840 > 4_0042_01_01 and exit code: 27 > ExitCodeException exitCode=27: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) > at org.apache.hadoop.util.Shell.run(Shell.java:482) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at >
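[Editor's note] The idea described above can be summarized in a few lines: when a controller's mount point from /proc/mounts names several co-mounted controllers, prefer the per-controller symlink instead. A minimal sketch of that resolution step follows; the class and method are hypothetical helpers, not the actual attached patch.

{code:java}
import java.io.File;

/**
 * Hypothetical sketch of the path fix described above, not the actual patch.
 * Given a controller name (e.g. "cpu") and its mount point from /proc/mounts
 * (e.g. "/sys/fs/cgroup/cpu,cpuacct"), prefer the comma-free per-controller
 * symlink so container-executor's comma-separated argument stays parseable.
 */
public class CgroupPathResolver {
  public static String resolve(String controller, String mountPoint) {
    if (!mountPoint.contains(",")) {
      return mountPoint; // already unambiguous, e.g. the CentOS 6 layout
    }
    // On CentOS 7, /sys/fs/cgroup/cpu is a symlink to cpu,cpuacct, so the
    // mount point's parent directory plus the controller name is an
    // equivalent, comma-free path into the same hierarchy.
    File candidate = new File(new File(mountPoint).getParent(), controller);
    return candidate.exists() ? candidate.getPath() : mountPoint;
  }

  public static void main(String[] args) {
    // Expected on CentOS 7: /sys/fs/cgroup/cpu
    System.out.println(resolve("cpu", "/sys/fs/cgroup/cpu,cpuacct"));
  }
}
{code}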
[jira] [Commented] (YARN-9519) TFile log aggregation file format is insensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config
[ https://issues.apache.org/jira/browse/YARN-9519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829306#comment-16829306 ] Hadoop QA commented on YARN-9519: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 19s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 23s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 50s{color} | {color:green} hadoop-yarn-common in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 55m 10s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9519 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967377/YARN-9519.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 258c4f499640 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 1cef194 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24029/testReport/ | | Max. process+thread count | 455 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24029/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > TFile log
[jira] [Updated] (YARN-9519) TFile log aggregation file format is insensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config
[ https://issues.apache.org/jira/browse/YARN-9519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Antal updated YARN-9519: - Attachment: YARN-9519.001.patch > TFile log aggregation file format is insensitive to the > yarn.log-aggregation.TFile.remote-app-log-dir config > > > Key: YARN-9519 > URL: https://issues.apache.org/jira/browse/YARN-9519 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.2.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > Attachments: YARN-9519.001.patch > > > The TFile log aggregation file format is not sensitive to the > yarn.log-aggregation.TFile.remote-app-log-dir config. > In {{LogAggregationTFileController$initInternal}}: > {code:java} > this.remoteRootLogDir = new Path( > conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, > YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR)); > {code} > So the remoteRootLogDir is only aware of the > yarn.nodemanager.remote-app-log-dir config, while other file formats, such as > IFile, default to the format-specific config, which therefore takes priority. > From {{LogAggregationIndexedFileController$initInternal}}: > {code:java} > String remoteDirStr = String.format( > YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT, > this.fileControllerName); > String remoteDir = conf.get(remoteDirStr); > if (remoteDir == null || remoteDir.isEmpty()) { > remoteDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, > YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR); > } > {code} > (Where these configs are:) > {code:java} > public static final String LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT > = YARN_PREFIX + "log-aggregation.%s.remote-app-log-dir"; > public static final String NM_REMOTE_APP_LOG_DIR = > NM_PREFIX + "remote-app-log-dir"; > {code} > I suggest that TFile should try to obtain the remote dir config from > yarn.log-aggregation.TFile.remote-app-log-dir first, and fall back to the > yarn.nodemanager.remote-app-log-dir config only if that is not specified. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
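[Editor's note] A minimal sketch of the suggested lookup order for the TFile controller, reusing the YarnConfiguration constants quoted in the description. It mirrors the IFile logic shown above and is not the committed patch.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

/**
 * Sketch of the proposed TFile behavior: prefer the format-specific
 * yarn.log-aggregation.TFile.remote-app-log-dir and fall back to the
 * nodemanager-wide yarn.nodemanager.remote-app-log-dir, mirroring IFile.
 * Not the committed patch.
 */
public class TFileRemoteDirSketch {
  public static Path resolveRemoteRootLogDir(Configuration conf,
      String fileControllerName) {
    // e.g. fileControllerName = "TFile" yields the key
    // yarn.log-aggregation.TFile.remote-app-log-dir
    String remoteDirKey = String.format(
        YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT,
        fileControllerName);
    String remoteDir = conf.get(remoteDirKey);
    if (remoteDir == null || remoteDir.isEmpty()) {
      // Format-specific key unset: fall back to the nodemanager-wide key.
      remoteDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
          YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR);
    }
    return new Path(remoteDir);
  }
}
{code}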
[jira] [Updated] (YARN-9518) can't use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Summary: can't use CGroups with YARN in centos7 (was: can not use CGroups with YARN in centos7 ) > can't use CGroups with YARN in centos7 > --- > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > Attachments: YARN-9518.patch > > > The OS version is CentOS 7. > > After I set the configuration variables for cgroups with YARN, the nodemanager > could be started without any problem. But when I ran a job, the job failed with > the exceptional nodemanager logs shown at the end. > In these logs, the important line is "Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory". > After analysing, I found the reason. In CentOS 6, the cgroup "cpu" and > "cpuacct" subsystems are as follows: > {code:java} > /sys/fs/cgroup/cpu > /sys/fs/cgroup/cpuacct > {code} > But in CentOS 7, they are as follows: > {code:java} > /sys/fs/cgroup/cpu -> cpu,cpuacct > /sys/fs/cgroup/cpuacct -> cpu,cpuacct > /sys/fs/cgroup/cpu,cpuacct{code} > "cpu" and "cpuacct" have been merged into "cpu,cpuacct"; "cpu" and "cpuacct" are > symbolic links. > Looking at the source code, the nodemanager gets the cgroup subsystem info by > reading /proc/mounts, so it gets "/sys/fs/cgroup/cpu,cpuacct" as the path of > both the cpu and cpuacct subsystems. > The resource description argument of container-executor is then as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > There is a comma in the cgroup path, but the comma is the separator between > multiple resources. Therefore, the cgroup path is truncated to "/sys/fs/cgroup/cpu" > rather than the correct cgroup path > "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", > which produces the error "Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory". > Hence I modified the source code and submitted a patch. The idea of the patch is > that the nodemanager uses "/sys/fs/cgroup/cpu" as the cgroup cpu path rather than > "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description > argument of container-executor becomes: > {code:java} > cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > Note that there is no comma in the path, and it is a valid path because > "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". > After applying the patch, the problem is resolved and the job runs successfully. 
> The patch applies universally to cgroup subsystem paths, for example the cgroup > network subsystems: > {code:java} > /sys/fs/cgroup/net_cls -> net_cls,net_prio > /sys/fs/cgroup/net_prio -> net_cls,net_prio > /sys/fs/cgroup/net_cls,net_prio{code} > > > ## > {panel:title=exceptional nodemanager logs:} > 2019-04-19 20:17:20,095 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1554210318404_0042_01_01 transitioned from LOCALIZED > to RUNNING > 2019-04-19 20:17:20,101 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container container_1554210318404_0042_01_01 is : 27 > 2019-04-19 20:17:20,103 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception > from container-launch with container ID: container_155421031840 > 4_0042_01_01 and exit code: 27 > ExitCodeException exitCode=27: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) > at org.apache.hadoop.util.Shell.run(Shell.java:482) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2019-04-19 20:17:20,108 INFO >
[jira] [Commented] (YARN-9518) can not use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829233#comment-16829233 ] Shurong Mai commented on YARN-9518: --- Hi [~adam.antal], I have completed the description of this issue and submitted a patch; please review. > can not use CGroups with YARN in centos7 > - > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > Attachments: YARN-9518.patch > > > The OS version is CentOS 7. > > After I set the configuration variables for cgroups with YARN, the nodemanager > could be started without any problem. But when I ran a job, the job failed with > the exceptional nodemanager logs shown at the end. > In these logs, the important line is "Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory". > After analysing, I found the reason. In CentOS 6, the cgroup "cpu" and > "cpuacct" subsystems are as follows: > {code:java} > /sys/fs/cgroup/cpu > /sys/fs/cgroup/cpuacct > {code} > But in CentOS 7, they are as follows: > {code:java} > /sys/fs/cgroup/cpu -> cpu,cpuacct > /sys/fs/cgroup/cpuacct -> cpu,cpuacct > /sys/fs/cgroup/cpu,cpuacct{code} > "cpu" and "cpuacct" have been merged into "cpu,cpuacct"; "cpu" and "cpuacct" are > symbolic links. > Looking at the source code, the nodemanager gets the cgroup subsystem info by > reading /proc/mounts, so it gets "/sys/fs/cgroup/cpu,cpuacct" as the path of > both the cpu and cpuacct subsystems. > The resource description argument of container-executor is then as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > There is a comma in the cgroup path, but the comma is the separator between > multiple resources. Therefore, the cgroup path is truncated to "/sys/fs/cgroup/cpu" > rather than the correct cgroup path > "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", > which produces the error "Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory". > Hence I modified the source code and submitted a patch. The idea of the patch is > that the nodemanager uses "/sys/fs/cgroup/cpu" as the cgroup cpu path rather than > "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description > argument of container-executor becomes: > {code:java} > cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > Note that there is no comma in the path, and it is a valid path because > "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". > After applying the patch, the problem is resolved and the job runs successfully. 
> The patch applies universally to cgroup subsystem paths, for example the cgroup > network subsystems: > {code:java} > /sys/fs/cgroup/net_cls -> net_cls,net_prio > /sys/fs/cgroup/net_prio -> net_cls,net_prio > /sys/fs/cgroup/net_cls,net_prio{code} > > > ## > {panel:title=exceptional nodemanager logs:} > 2019-04-19 20:17:20,095 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1554210318404_0042_01_01 transitioned from LOCALIZED > to RUNNING > 2019-04-19 20:17:20,101 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container container_1554210318404_0042_01_01 is : 27 > 2019-04-19 20:17:20,103 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception > from container-launch with container ID: container_155421031840 > 4_0042_01_01 and exit code: 27 > ExitCodeException exitCode=27: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) > at org.apache.hadoop.util.Shell.run(Shell.java:482) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) >
[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Description: The OS version is CentOS 7. After I set the configuration variables for cgroups with YARN, the nodemanager could be started without any problem. But when I ran a job, the job failed with the exceptional nodemanager logs shown at the end. In these logs, the important line is "Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory". After analysing, I found the reason. In CentOS 6, the cgroup "cpu" and "cpuacct" subsystems are as follows: {code:java} /sys/fs/cgroup/cpu /sys/fs/cgroup/cpuacct {code} But in CentOS 7, they are as follows: {code:java} /sys/fs/cgroup/cpu -> cpu,cpuacct /sys/fs/cgroup/cpuacct -> cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct{code} "cpu" and "cpuacct" have been merged into "cpu,cpuacct"; "cpu" and "cpuacct" are symbolic links. Looking at the source code, the nodemanager gets the cgroup subsystem info by reading /proc/mounts, so it gets "/sys/fs/cgroup/cpu,cpuacct" as the path of both the cpu and cpuacct subsystems. The resource description argument of container-executor is then as follows: {code:java} cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks {code} There is a comma in the cgroup path, but the comma is the separator between multiple resources. Therefore, the cgroup path is truncated to "/sys/fs/cgroup/cpu" rather than the correct cgroup path "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", which produces the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory". Hence I modified the source code and submitted a patch. The idea of the patch is that the nodemanager uses "/sys/fs/cgroup/cpu" as the cgroup cpu path rather than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description argument of container-executor becomes: {code:java} cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks {code} Note that there is no comma in the path, and it is a valid path because "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". After applying the patch, the problem is resolved and the job runs successfully. 
The patch applies universally to cgroup subsystem paths, for example the cgroup network subsystems: {code:java} /sys/fs/cgroup/net_cls -> net_cls,net_prio /sys/fs/cgroup/net_prio -> net_cls,net_prio /sys/fs/cgroup/net_cls,net_prio{code} ## {panel:title=exceptional nodemanager logs:} 2019-04-19 20:17:20,095 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1554210318404_0042_01_01 transitioned from LOCALIZED to RUNNING 2019-04-19 20:17:20,101 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1554210318404_0042_01_01 is : 27 2019-04-19 20:17:20,103 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception from container-launch with container ID: container_155421031840 4_0042_01_01 and exit code: 27 ExitCodeException exitCode=27: at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) at org.apache.hadoop.util.Shell.run(Shell.java:482) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch. 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1554210318404_0042_01_01 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=27: 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) 2019-04-19 20:17:20,108 INFO
[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Description: The OS version is CentOS 7. After I set the configuration variables for cgroups with YARN, the nodemanager could be started without any problem. But when I ran a job, the job failed with the exceptional nodemanager logs shown at the end. In these logs, the important line is "Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory". After analysing, I found the reason. In CentOS 6, the cgroup "cpu" and "cpuacct" subsystems are as follows: {code:java} /sys/fs/cgroup/cpu /sys/fs/cgroup/cpuacct {code} But in CentOS 7, they are as follows: {code:java} /sys/fs/cgroup/cpu -> cpu,cpuacct /sys/fs/cgroup/cpuacct -> cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct{code} "cpu" and "cpuacct" have been merged into "cpu,cpuacct"; "cpu" and "cpuacct" are symbolic links. Looking at the source code, the nodemanager gets the cgroup subsystem info by reading /proc/mounts, so it gets "/sys/fs/cgroup/cpu,cpuacct" as the path of both the cpu and cpuacct subsystems. The resource description argument of container-executor is then as follows: {code:java} cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks {code} There is a comma in the cgroup path, but the comma is the separator between multiple resources. Therefore, the cgroup path is truncated to "/sys/fs/cgroup/cpu" rather than the correct cgroup path "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", which produces the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory". Hence I modified the source code and submitted a patch. The idea of the patch is that the nodemanager uses "/sys/fs/cgroup/cpu" as the cgroup cpu path rather than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description argument of container-executor becomes: {code:java} cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks {code} Note that there is no comma in the path, and it is a valid path because "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". After applying the patch, the problem is resolved and the job runs successfully. 
The patch applies universally to cgroup subsystem paths, for example the cgroup network subsystems: {code:java} /sys/fs/cgroup/net_cls -> net_cls,net_prio /sys/fs/cgroup/net_prio -> net_cls,net_prio /sys/fs/cgroup/net_cls,net_prio{code} {panel:title=exceptional nodemanager logs} 2019-04-19 20:17:20,095 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1554210318404_0042_01_01 transitioned from LOCALIZED to RUNNING 2019-04-19 20:17:20,101 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1554210318404_0042_01_01 is : 27 2019-04-19 20:17:20,103 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception from container-launch with container ID: container_155421031840 4_0042_01_01 and exit code: 27 ExitCodeException exitCode=27: at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) at org.apache.hadoop.util.Shell.run(Shell.java:482) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch. 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1554210318404_0042_01_01 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=27: 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.run(Shell.java:482) 2019-04-19 20:17:20,109 INFO
[jira] [Assigned] (YARN-9519) TFile log aggregation file format is insensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config
[ https://issues.apache.org/jira/browse/YARN-9519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Antal reassigned YARN-9519: Assignee: Adam Antal > TFile log aggregation file format is insensitive to the > yarn.log-aggregation.TFile.remote-app-log-dir config > > > Key: YARN-9519 > URL: https://issues.apache.org/jira/browse/YARN-9519 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.2.0 >Reporter: Adam Antal >Assignee: Adam Antal >Priority: Major > > The TFile log aggregation file format is not sensitive to the > yarn.log-aggregation.TFile.remote-app-log-dir config. > In {{LogAggregationTFileController$initInternal}}: > {code:java} > this.remoteRootLogDir = new Path( > conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, > YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR)); > {code} > So the remoteRootLogDir is only aware of the > yarn.nodemanager.remote-app-log-dir config, while other file formats, such as > IFile, default to the format-specific config, which therefore takes priority. > From {{LogAggregationIndexedFileController$initInternal}}: > {code:java} > String remoteDirStr = String.format( > YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT, > this.fileControllerName); > String remoteDir = conf.get(remoteDirStr); > if (remoteDir == null || remoteDir.isEmpty()) { > remoteDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, > YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR); > } > {code} > (Where these configs are:) > {code:java} > public static final String LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT > = YARN_PREFIX + "log-aggregation.%s.remote-app-log-dir"; > public static final String NM_REMOTE_APP_LOG_DIR = > NM_PREFIX + "remote-app-log-dir"; > {code} > I suggest that TFile should try to obtain the remote dir config from > yarn.log-aggregation.TFile.remote-app-log-dir first, and fall back to the > yarn.nodemanager.remote-app-log-dir config only if that is not specified. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9518) can not use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829224#comment-16829224 ] Hadoop QA commented on YARN-9518: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s{color} | {color:red} YARN-9518 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-9518 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967374/YARN-9518.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24028/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > can not use CGroups with YARN in centos7 > - > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > Attachments: YARN-9518.patch > > > The os version is centos7. > > When I had set configuration variables for cgroup with yarn, nodemanager > could be start without any matter. But when I ran a job, the job failed with > these exceptional nodemanager logs in the end. > In these logs, the important logs is " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > After I analysed, I found the reason. In centos6, the cgroup "cpu" and > "cpuacct" subsystem are as follows: > {code:java} > /sys/fs/cgroup/cpu > /sys/fs/cgroup/cpuacct > {code} > But in centos7, as follows: > {code:java} > /sys/fs/cgroup/cpu -> cpu,cpuacct > /sys/fs/cgroup/cpuacct -> cpu,cpuacct > /sys/fs/cgroup/cpu,cpuacct{code} > "cpu" and "cpuacct" have merge as "cpu,cpuacct". "cpu" and "cpuacct" are > symbol links. > As I look at source code, nodemamager get the cgroup subsystem info by > reading /proc/mounts. So It get the cpu and cpuacct subsystem path are also > "/sys/fs/cgroup/cpu,cpuacct". > The resource description arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > There is a comma in the cgroup path, but the comma is separator of multi > resource. Therefore, the cgroup path is truncated as "/sys/fs/cgroup/cpu" > rather than correct cgroup path " > /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks > " and report the error in the log " Can't open file /sys/fs/cgroup/cpu as > node manager - Is a directory " > Hence I modify the source code and submit a patch. The idea of patch is that > nodemanager get the cgroup cpu path as "/sys/fs/cgroup/cpu" rather than > "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description > arguments of container-executor is such as follows: > {code:java} > cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks > {code} > Note that there is no comma in the path, and is a valid path because > "/sys/fs/cgroup/cpu" is symbol link to "/sys/fs/cgroup/cpu,cpuacct". > After applied the patch, the problem is resolved and the job can run > successfully. 
> > {panel:title=exceptional nodemanager logs} > 2019-04-19 20:17:20,095 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1554210318404_0042_01_01 transitioned from LOCALIZED > to RUNNING > 2019-04-19 20:17:20,101 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code > from container container_1554210318404_0042_01_01 is : 27 > 2019-04-19 20:17:20,103 WARN > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception > from container-launch with container ID: container_155421031840 > 4_0042_01_01 and exit code: 27 > ExitCodeException exitCode=27: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) > at org.apache.hadoop.util.Shell.run(Shell.java:482) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) > at
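The fix described above amounts to rewriting the merged CentOS 7 mount path before it is handed to container-executor. A minimal illustrative sketch of that idea (the class and method names here are hypothetical, not the attached patch):
{code:java}
import java.io.File;

// When /proc/mounts reports the merged "cpu,cpuacct" hierarchy (CentOS 7),
// prefer the per-subsystem symlink such as /sys/fs/cgroup/cpu, so that the
// cgroups= argument passed to container-executor contains no comma --
// container-executor treats the comma as a separator between resources.
public final class CgroupPathSketch {
  private CgroupPathSketch() {}

  static String preferCommaFreePath(String mountPath, String subsystem) {
    if (!mountPath.contains(",")) {
      return mountPath; // CentOS 6 layout: nothing to rewrite
    }
    // e.g. /sys/fs/cgroup/cpu,cpuacct -> /sys/fs/cgroup/cpu
    File candidate = new File(new File(mountPath).getParent(), subsystem);
    // The symlink points at the merged hierarchy, so both paths are
    // equivalent for reading and writing cgroup files.
    return candidate.exists() ? candidate.getAbsolutePath() : mountPath;
  }
}
{code}
With this rewrite, preferCommaFreePath("/sys/fs/cgroup/cpu,cpuacct", "cpu") yields "/sys/fs/cgroup/cpu", matching the corrected cgroups= argument shown in the description.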
[jira] [Commented] (YARN-9519) TFile log aggregation file format is insensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config
[ https://issues.apache.org/jira/browse/YARN-9519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829223#comment-16829223 ] Adam Antal commented on YARN-9519: -- It looks like it is also insensitive to the suffix version of this config: {code:java} public static final String LOG_AGGREGATION_REMOTE_APP_LOG_DIR_SUFFIX_FMT = YARN_PREFIX + "log-aggregation.%s.remote-app-log-dir-suffix"; {code} > TFile log aggregation file format is insensitive to the > yarn.log-aggregation.TFile.remote-app-log-dir config > > > Key: YARN-9519 > URL: https://issues.apache.org/jira/browse/YARN-9519 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 3.2.0 >Reporter: Adam Antal >Priority: Major > > The TFile log aggregation file format is not sensitive to the > yarn.log-aggregation.TFile.remote-app-log-dir config. > In {{LogAggregationTFileController$initInternal}}: > {code:java} > this.remoteRootLogDir = new Path( > conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, > YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR)); > {code} > So remoteRootLogDir is only aware of the > yarn.nodemanager.remote-app-log-dir config, while other file formats, such > as IFile, check the format-specific config first, so that key takes > priority. > From {{LogAggregationIndexedFileController$initInternal}}: > {code:java} > String remoteDirStr = String.format( > YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT, > this.fileControllerName); > String remoteDir = conf.get(remoteDirStr); > if (remoteDir == null || remoteDir.isEmpty()) { > remoteDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, > YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR); > } > {code} > (These configs are defined as:) > {code:java} > public static final String LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT > = YARN_PREFIX + "log-aggregation.%s.remote-app-log-dir"; > public static final String NM_REMOTE_APP_LOG_DIR = > NM_PREFIX + "remote-app-log-dir"; > {code} > I suggest that TFile should try to obtain the remote dir config from > yarn.log-aggregation.TFile.remote-app-log-dir first, and fall back to the > yarn.nodemanager.remote-app-log-dir config only if that is not specified. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Description: The OS version is CentOS 7. When I set the cgroup configuration variables for YARN, the NodeManager started without any problem. But when I ran a job, the job failed with the exceptional NodeManager logs shown at the end. The important line in these logs is "Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory". After analysing, I found the reason. In CentOS 6, the cgroup "cpu" and "cpuacct" subsystems are mounted as follows: {code:java} /sys/fs/cgroup/cpu /sys/fs/cgroup/cpuacct {code} But in CentOS 7, they look like this: {code:java} /sys/fs/cgroup/cpu -> cpu,cpuacct /sys/fs/cgroup/cpuacct -> cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct{code} "cpu" and "cpuacct" have been merged into "cpu,cpuacct"; "cpu" and "cpuacct" are symbolic links. Looking at the source code, the NodeManager gets the cgroup subsystem info by reading /proc/mounts, so it resolves both the cpu and cpuacct subsystem paths to "/sys/fs/cgroup/cpu,cpuacct". The resource description argument passed to container-executor then looks like this: {code:java} cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks {code} There is a comma in the cgroup path, but the comma is the separator between multiple resources. The cgroup path is therefore truncated to "/sys/fs/cgroup/cpu" rather than the correct cgroup path "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", and the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory" is reported in the log. Hence I modified the source code and submitted a patch. The idea of the patch is that the NodeManager uses "/sys/fs/cgroup/cpu" as the cgroup cpu path rather than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description argument of container-executor becomes: {code:java} cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks {code} Note that there is no comma in the path, and it is still a valid path because "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". After applying the patch, the problem is resolved and the job runs successfully. 
{panel:title=exceptional nodemanager logs} 2019-04-19 20:17:20,095 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1554210318404_0042_01_01 transitioned from LOCALIZED to RUNNING 2019-04-19 20:17:20,101 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1554210318404_0042_01_01 is : 27 2019-04-19 20:17:20,103 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception from container-launch with container ID: container_155421031840 4_0042_01_01 and exit code: 27 ExitCodeException exitCode=27: at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) at org.apache.hadoop.util.Shell.run(Shell.java:482) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch. 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1554210318404_0042_01_01 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=27: 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.run(Shell.java:482) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at
[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Description: When I set the cgroup configuration variables for YARN, the NodeManager started without any problem. But when I ran a job, the job failed with the exceptional NodeManager logs shown at the end. The important line in these logs is "Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory". After analysing, I found the reason. In CentOS 6, the cgroup "cpu" and "cpuacct" subsystems are mounted as follows: {code:java} /sys/fs/cgroup/cpu /sys/fs/cgroup/cpuacct {code} But in CentOS 7, they look like this: {code:java} /sys/fs/cgroup/cpu -> cpu,cpuacct /sys/fs/cgroup/cpuacct -> cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct{code} "cpu" and "cpuacct" have been merged into "cpu,cpuacct"; "cpu" and "cpuacct" are symbolic links. Looking at the source code, the NodeManager gets the cgroup subsystem info by reading /proc/mounts, so it resolves both the cpu and cpuacct subsystem paths to "/sys/fs/cgroup/cpu,cpuacct". The resource description argument passed to container-executor then looks like this: {code:java} cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks {code} There is a comma in the cgroup path, but the comma is the separator between multiple resources. The cgroup path is therefore truncated to "/sys/fs/cgroup/cpu" rather than the correct cgroup path "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks" {panel:title=exceptional nodemanager logs} 2019-04-19 20:17:20,095 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1554210318404_0042_01_01 transitioned from LOCALIZED to RUNNING 2019-04-19 20:17:20,101 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1554210318404_0042_01_01 is : 27 2019-04-19 20:17:20,103 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception from container-launch with container ID: container_155421031840 4_0042_01_01 and exit code: 27 ExitCodeException exitCode=27: at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) at org.apache.hadoop.util.Shell.run(Shell.java:482) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch. 
2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1554210318404_0042_01_01 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=27: 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.run(Shell.java:482) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.FutureTask.run(FutureTask.java:266) 2019-04-19 20:17:20,109 INFO
[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Description: When I set the cgroup configuration variables for YARN, the NodeManager started without any problem. But when I ran a job, the job failed with the exceptional NodeManager logs shown at the end. The important line in these logs is "Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory". After analysing, I found the reason. In CentOS 6, the cgroup "cpu" and "cpuacct" subsystems are mounted as follows: {code:java} /sys/fs/cgroup/cpu /sys/fs/cgroup/cpuacct {code} But in CentOS 7, they look like this: {code:java} /sys/fs/cgroup/cpu -> cpu,cpuacct /sys/fs/cgroup/cpuacct -> cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct{code} "cpu" and "cpuacct" have been merged into "cpu,cpuacct"; "cpu" and "cpuacct" are symbolic links. Looking at the source code, the NodeManager gets the cgroup subsystem info by reading /proc/mounts, so it resolves both the cpu and cpuacct subsystem paths to "/sys/fs/cgroup/cpu,cpuacct". The resource description argument passed to container-executor then looks like this: {code:java} cgroups=/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks {code} There is a comma in the cgroup path, but the comma is the separator between multiple resources. The cgroup path is therefore truncated to "/sys/fs/cgroup/cpu" rather than the correct cgroup path "/sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_1554210318404_0057_02_01/tasks", and the error "Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory" is reported in the log. Hence I modified the source code and submitted a patch. The idea of the patch is that the NodeManager uses "/sys/fs/cgroup/cpu" as the cgroup cpu path rather than "/sys/fs/cgroup/cpu,cpuacct". As a result, the resource description argument of container-executor becomes: {code:java} cgroups=/sys/fs/cgroup/cpu/hadoop-yarn/container_1554210318404_0057_02_01/tasks {code} Note that there is no comma in the path, and it is still a valid path because "/sys/fs/cgroup/cpu" is a symbolic link to "/sys/fs/cgroup/cpu,cpuacct". After applying the patch, the problem is resolved and the job runs successfully. 
{panel:title=exceptional nodemanager logs} 2019-04-19 20:17:20,095 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1554210318404_0042_01_01 transitioned from LOCALIZED to RUNNING 2019-04-19 20:17:20,101 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1554210318404_0042_01_01 is : 27 2019-04-19 20:17:20,103 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception from container-launch with container ID: container_155421031840 4_0042_01_01 and exit code: 27 ExitCodeException exitCode=27: at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) at org.apache.hadoop.util.Shell.run(Shell.java:482) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch. 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1554210318404_0042_01_01 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=27: 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.run(Shell.java:482) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at
[jira] [Created] (YARN-9519) TFile log aggregation file format is insensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config
Adam Antal created YARN-9519: Summary: TFile log aggregation file format is insensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config Key: YARN-9519 URL: https://issues.apache.org/jira/browse/YARN-9519 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 3.2.0 Reporter: Adam Antal The TFile log aggregation file format is not sensitive to the yarn.log-aggregation.TFile.remote-app-log-dir config. In {{LogAggregationTFileController$initInternal}}: {code:java} this.remoteRootLogDir = new Path( conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR)); {code} So remoteRootLogDir is only aware of the yarn.nodemanager.remote-app-log-dir config, while other file formats, such as IFile, check the format-specific config first, so that key takes priority. From {{LogAggregationIndexedFileController$initInternal}}: {code:java} String remoteDirStr = String.format( YarnConfiguration.LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT, this.fileControllerName); String remoteDir = conf.get(remoteDirStr); if (remoteDir == null || remoteDir.isEmpty()) { remoteDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR, YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR); } {code} (These configs are defined as:) {code:java} public static final String LOG_AGGREGATION_REMOTE_APP_LOG_DIR_FMT = YARN_PREFIX + "log-aggregation.%s.remote-app-log-dir"; public static final String NM_REMOTE_APP_LOG_DIR = NM_PREFIX + "remote-app-log-dir"; {code} I suggest that TFile should try to obtain the remote dir config from yarn.log-aggregation.TFile.remote-app-log-dir first, and fall back to the yarn.nodemanager.remote-app-log-dir config only if that is not specified. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9476) Create unit tests for VE plugin
[ https://issues.apache.org/jira/browse/YARN-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829205#comment-16829205 ] Peter Bacsko commented on YARN-9476: [~sunilg] [~tangzhankun] could you please also review and commit this patch? > Create unit tests for VE plugin > --- > > Key: YARN-9476 > URL: https://issues.apache.org/jira/browse/YARN-9476 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9476-001.patch, YARN-9476-002.patch, > YARN-9476-003.patch, YARN-9476-004.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9504) [UI2] Fair scheduler queue view page is broken
[ https://issues.apache.org/jira/browse/YARN-9504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829200#comment-16829200 ] Zoltan Siegl commented on YARN-9504: [~sunilg] could you have a look at this? > [UI2] Fair scheduler queue view page is broken > -- > > Key: YARN-9504 > URL: https://issues.apache.org/jira/browse/YARN-9504 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, yarn-ui-v2 >Affects Versions: 3.2.0, 3.3.0, 3.2.1 >Reporter: Zoltan Siegl >Assignee: Zoltan Siegl >Priority: Major > Fix For: 3.3.0, 3.2.1 > > Attachments: Screenshot 2019-04-23 at 14.52.57.png, Screenshot > 2019-04-23 at 14.59.35.png, YARN-9504.001.patch, YARN-9504.002.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > The UI2 queue page currently displays a white screen for the Fair Scheduler. > > In src/main/webapp/app/components/tree-selector.js:377 (getUsedCapacity) the > code refers to > queueData.get("partitionMap"), which is null for Fair Scheduler queues. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9476) Create unit tests for VE plugin
[ https://issues.apache.org/jira/browse/YARN-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829194#comment-16829194 ] Szilard Nemeth commented on YARN-9476: -- Hi [~pbacsko]! +1 (non-binding) for the latest patch! > Create unit tests for VE plugin > --- > > Key: YARN-9476 > URL: https://issues.apache.org/jira/browse/YARN-9476 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9476-001.patch, YARN-9476-002.patch, > YARN-9476-003.patch, YARN-9476-004.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Description: When I set the cgroup configuration variables for YARN, the NodeManager started without any problem. But when I ran a job, the job failed with the exceptional NodeManager logs shown at the end. The important line in these logs is "Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory". After analysing, I found the reason. In CentOS 6, the cgroup "cpu" and "cpuacct" subsystems are mounted as follows: {code:java} /sys/fs/cgroup/cpu /sys/fs/cgroup/cpuacct {code} But in CentOS 7, they look like this: {code:java} /sys/fs/cgroup/cpu -> cpu,cpuacct /sys/fs/cgroup/cpuacct -> cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct{code} "cpu" and "cpuacct" have been merged into "cpu,cpuacct"; "cpu" and "cpuacct" are symbolic links. {panel:title=exceptional nodemanager logs} 2019-04-19 20:17:20,095 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1554210318404_0042_01_01 transitioned from LOCALIZED to RUNNING 2019-04-19 20:17:20,101 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1554210318404_0042_01_01 is : 27 2019-04-19 20:17:20,103 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception from container-launch with container ID: container_155421031840 4_0042_01_01 and exit code: 27 ExitCodeException exitCode=27: at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) at org.apache.hadoop.util.Shell.run(Shell.java:482) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch. 
2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1554210318404_0042_01_01 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=27: 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.run(Shell.java:482) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.FutureTask.run(FutureTask.java:266) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.lang.Thread.run(Thread.java:745) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell
[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Description: When I set the cgroup configuration variables for YARN, the NodeManager started without any problem. But when I ran a job, the job failed with the exceptional NodeManager logs shown at the end. The important line in these logs is "Can't open file /sys/fs/cgroup/cpu as node manager - Is a directory". After analysing, I found the reason. In CentOS 6, the cgroup "cpu" and "cpuacct" subsystems are mounted as follows: {code:java} /sys/fs/cgroup/cpu /sys/fs/cgroup/cpuacct {code} But in CentOS 7, they look like this: {code:java} /sys/fs/cgroup/cpu -> cpu,cpuacct /sys/fs/cgroup/cpuacct -> cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct{code} "cpu" and "cpuacct" have been merged into "cpu,cpuacct". {panel:title=exceptional nodemanager logs} 2019-04-19 20:17:20,095 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1554210318404_0042_01_01 transitioned from LOCALIZED to RUNNING 2019-04-19 20:17:20,101 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1554210318404_0042_01_01 is : 27 2019-04-19 20:17:20,103 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception from container-launch with container ID: container_155421031840 4_0042_01_01 and exit code: 27 ExitCodeException exitCode=27: at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) at org.apache.hadoop.util.Shell.run(Shell.java:482) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch. 
2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1554210318404_0042_01_01 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 27 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=27: 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.runCommand(Shell.java:585) 2019-04-19 20:17:20,108 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.run(Shell.java:482) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:299) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.FutureTask.run(FutureTask.java:266) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.lang.Thread.run(Thread.java:745) 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 2019-04-19 20:17:20,109 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Shell output: main : command provided 1 2019-04-19 20:17:20,109 INFO
[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Summary: can not use CGroups with YARN in centos7 (was: cgroup subsystem in centos7 ) > can not use CGroups with YARN in centos7 > - > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9518) can not use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829153#comment-16829153 ] Shurong Mai commented on YARN-9518: --- Hi [~adam.antal], thank you for your attention. I am editing this issue; please wait a moment. > can not use CGroups with YARN in centos7 > - > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9518) can not use CGroups with YARN in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Affects Version/s: 3.2.0 2.9.2 2.8.5 2.7.7 3.1.2 > can not use CGroups with YARN in centos7 > - > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9510) Proxyuser access timeline and getdelegationtoken failed without Timeline server restart
[ https://issues.apache.org/jira/browse/YARN-9510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shen Yinjie updated YARN-9510: -- Description: We add a proxyuser by changing "hadoop.proxyuser.xx.yy"; if the timeline server is not restarted, the YARN job will fail and throw: {code:java} Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: Authentication failed, URL: http://hostname:8188/ws/v1/timeline/?op=GETDELEGATIONTOKEN=alluxio=rm%2Fhc1%40XXF=ambari-qa, status: 403, message: Forbidden at org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:401) at org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:74) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:147) at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:213) {code} It seems that the proxyuser info in the timeline server has not been refreshed. In a production cluster, we sometimes add a new proxy user at runtime and expect impersonation to take effect after executing a command like "...refreshSuperUserGroupsConfiguration", without restarting the timeline server. was: We add a proxyuser by changing "hadoop.proxyuser.xx.yy", and then execute yarn rmadmin -refreshSuperUserGroupsConfiguration but do not restart the timeline server. The MR job will fail and throw: {code:java} Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: Authentication failed, URL: http://hostname:8188/ws/v1/timeline/?op=GETDELEGATIONTOKEN=alluxio=rm%2Fhc1%40XXF=ambari-qa, status: 403, message: Forbidden at org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:401) at org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:74) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:147) at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:213) It seems that the proxyuser info in the timeline server has not been refreshed. 
{code} > Proxyuser access timeline and getdelegationtoken failed without Timeline > server restart > --- > > Key: YARN-9510 > URL: https://issues.apache.org/jira/browse/YARN-9510 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver >Affects Versions: 3.1.0 >Reporter: Shen Yinjie >Assignee: Shen Yinjie >Priority: Major > > We add a proxyuser by changing "hadoop.proxyuser.xx.yy"; if the timeline > server is not restarted, the YARN job will fail and throw: > {code:java} > Caused by: > org.apache.hadoop.security.authentication.client.AuthenticationException: > Authentication failed, URL: > http://hostname:8188/ws/v1/timeline/?op=GETDELEGATIONTOKEN=alluxio=rm%2Fhc1%40XXF=ambari-qa, > status: 403, message: Forbidden > at > org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:401) > at > org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:74) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:147) > at > org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:213) > {code} > It seems that the proxyuser info in the timeline server has not been > refreshed. > In a production cluster, we sometimes add a new proxy user at runtime and > expect impersonation to take effect after executing a command like > "...refreshSuperUserGroupsConfiguration", without restarting the timeline > server. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-9510) Proxyuser access timeline and getdelegationtoken failed without Timeline server restart
[ https://issues.apache.org/jira/browse/YARN-9510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shen Yinjie reassigned YARN-9510: - Assignee: Shen Yinjie > Proxyuser access timeline and getdelegationtoken failed without Timeline > server restart > --- > > Key: YARN-9510 > URL: https://issues.apache.org/jira/browse/YARN-9510 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver >Affects Versions: 3.1.0 >Reporter: Shen Yinjie >Assignee: Shen Yinjie >Priority: Major > > We add a proxyuser by changing "hadoop.proxyuser.xx.yy", and then execute > yarn rmadmin -refreshSuperUserGroupsConfiguration but do not restart the > timeline server. The MR job will fail and throw: > {code:java} > Caused by: > org.apache.hadoop.security.authentication.client.AuthenticationException: > Authentication failed, URL: > http://hostname:8188/ws/v1/timeline/?op=GETDELEGATIONTOKEN=alluxio=rm%2Fhc1%40XXF=ambari-qa, > status: 403, message: Forbidden > at > org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:401) > at > org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:74) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:147) > at > org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:213) > It seems that the proxyuser info in the timeline server has not been > refreshed. > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9518) cgroup subsystem in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829064#comment-16829064 ] Adam Antal commented on YARN-9518: -- Hi [~shurong.mai], could you please provide a description about the issue? What is the bug/what do you want to achieve? > cgroup subsystem in centos7 > > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9510) Proxyuser access timeline and getdelegationtoken failed without Timeline server restart
[ https://issues.apache.org/jira/browse/YARN-9510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829039#comment-16829039 ] Shen Yinjie commented on YARN-9510: --- Propose to implement RefreshUserMappingsProtocol in ApplicationHistoryClientService; I'll upload a draft patch soon. > Proxyuser access timeline and getdelegationtoken failed without Timeline > server restart > --- > > Key: YARN-9510 > URL: https://issues.apache.org/jira/browse/YARN-9510 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver >Affects Versions: 3.1.0 >Reporter: Shen Yinjie >Priority: Major > > We add a proxyuser by changing "hadoop.proxyuser.xx.yy", and then execute > yarn rmadmin -refreshSuperUserGroupsConfiguration but do not restart the > timeline server. The MR job will fail and throw: > {code:java} > Caused by: > org.apache.hadoop.security.authentication.client.AuthenticationException: > Authentication failed, URL: > http://hostname:8188/ws/v1/timeline/?op=GETDELEGATIONTOKEN=alluxio=rm%2Fhc1%40XXF=ambari-qa, > status: 403, message: Forbidden > at > org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:401) > at > org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:74) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:147) > at > org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:213) > It seems that the proxyuser info in the timeline server has not been > refreshed. > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
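For context, a minimal sketch of what such a refresh hook could look like, using the standard ProxyUsers helper from hadoop-common; the class name and the wiring into the timeline service are hypothetical, not the draft patch:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.authorize.ProxyUsers;

// Hypothetical refresh entry point for the timeline service: re-read the
// hadoop.proxyuser.* keys from a freshly loaded Configuration and swap them
// into the shared ProxyUsers state, so new proxy users take effect without
// a server restart.
public class TimelineProxyUserRefresher {
  public void refreshSuperUserGroupsConfiguration() {
    // Reloads core-site.xml and related resources from the classpath.
    Configuration conf = new Configuration();
    ProxyUsers.refreshSuperUserGroupsConfiguration(conf);
  }
}
{code}
This mirrors what yarn rmadmin -refreshSuperUserGroupsConfiguration already triggers on the ResourceManager side.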
[jira] [Updated] (YARN-9510) Proxyuser access timeline and getdelegationtoken failed without Timeline server restart
[ https://issues.apache.org/jira/browse/YARN-9510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shen Yinjie updated YARN-9510: -- Description: We add a proxyuser by changing "hadoop.proxyuser.xx.yy", and then execute yarn rmadmin -refreshSuperUserGroupsConfiguration but do not restart the timeline server. The MR job will fail and throw: {code:java} Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: Authentication failed, URL: http://hostname:8188/ws/v1/timeline/?op=GETDELEGATIONTOKEN=alluxio=rm%2Fhc1%40XXF=ambari-qa, status: 403, message: Forbidden at org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:401) at org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:74) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:147) at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:213) It seems that the proxyuser info in the timeline server has not been refreshed. {code} was: We add a proxyuser by changing "hadoop.proxyuser.xx.yy", and then execute yarn rmadmin -refreshSuperUserGroupsConfiguration but do not restart the timeline server. The MR job will fail and throw: Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: Authentication failed, URL: http://hostname:8188/ws/v1/timeline/?op=GETDELEGATIONTOKEN=alluxio=rm%2Fhc1%40XXF=ambari-qa, status: 403, message: Forbidden at org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:401) at org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:74) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:147) at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:213) It seems that the proxyuser info in the timeline server has not been refreshed. > Proxyuser access timeline and getdelegationtoken failed without Timeline > server restart > --- > > Key: YARN-9510 > URL: https://issues.apache.org/jira/browse/YARN-9510 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver >Affects Versions: 3.1.0 >Reporter: Shen Yinjie >Priority: Major > > We add a proxyuser by changing "hadoop.proxyuser.xx.yy", and then execute > yarn rmadmin -refreshSuperUserGroupsConfiguration but do not restart the > timeline server. The MR job will fail and throw: > {code:java} > Caused by: > org.apache.hadoop.security.authentication.client.AuthenticationException: > Authentication failed, URL: > http://hostname:8188/ws/v1/timeline/?op=GETDELEGATIONTOKEN=alluxio=rm%2Fhc1%40XXF=ambari-qa, > status: 403, message: Forbidden > at > org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:401) > at > org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:74) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:147) > at > org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:213) > It seems that the proxyuser info in the timeline server has not been > refreshed. 
> {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9518) cgroup subsystem in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Summary: cgroup subsystem in centos7 (was: cgroup in centos7) > cgroup subsystem in centos7 > > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9518) cgroup in centos7
[ https://issues.apache.org/jira/browse/YARN-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9518: -- Priority: Major (was: Critical) > cgroup in centos7 > - > > Key: YARN-9518 > URL: https://issues.apache.org/jira/browse/YARN-9518 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Shurong Mai >Priority: Major > Labels: cgroup, patch > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9518) cgroup in centos7
Shurong Mai created YARN-9518: - Summary: cgroup in centos7 Key: YARN-9518 URL: https://issues.apache.org/jira/browse/YARN-9518 Project: Hadoop YARN Issue Type: Bug Reporter: Shurong Mai -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9517: -- Description: yarn-site.xml {code:java} <property> <name>yarn.log-aggregation-enable</name> <value>false</value> </property> {code} When aggregation is not enabled, we click the "container log link" (in the web page "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;) after a job has finished successfully. It then jumps to a web page displaying "Aggregation is not enabled. Try the nodemanager at yy:48038", and the URL is "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop; I also found this problem in all Hadoop 2.x.y and 3.x.y versions, and I have submitted a patch which is simple and applies to these versions. was: When aggregation is not enabled, we click the "container log link" (in the web page "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;) after a job has finished successfully. It then jumps to a web page displaying "Aggregation is not enabled. Try the nodemanager at yy:48038", and the URL is "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop; I also found this problem in all Hadoop 2.x.y and 3.x.y versions, and I have submitted a patch which is simple and applies to these versions. > When aggregation is not enabled, can't see the container log > > > Key: YARN-9517 > URL: https://issues.apache.org/jira/browse/YARN-9517 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, > 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: patch > Attachments: YARN-9517.patch > > > yarn-site.xml > {code:java} > <property> > <name>yarn.log-aggregation-enable</name> > <value>false</value> > </property> > {code} > > When aggregation is not enabled, we click the "container log link" (in the > web page > "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;) > after a job has finished successfully. > It then jumps to a web page displaying "Aggregation is not enabled. Try the > nodemanager at yy:48038", and the URL is > "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop; > I also found this problem in all Hadoop 2.x.y and 3.x.y versions, and I have > submitted a patch which is simple and applies to these versions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
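For a finished job with aggregation disabled, the history server only knows the NodeManager address it prints in the message above. A minimal illustrative sketch of the fallback this report argues for (the attached patch itself is not shown in this thread, and the helper below is hypothetical): check yarn.log-aggregation-enable and, when it is false, link to the NodeManager's local container-log page instead of rendering the error text.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Hypothetical helper: decide where a container-log link should point.
// Returns null when aggregated logs can be served by the history server
// itself; otherwise builds the NodeManager's local log URL (the
// /node/containerlogs/<container>/<user> layout is assumed here).
public class LocalLogLinkSketch {
  static String containerLogUrl(Configuration conf, String nmHttpAddress,
      String containerId, String user) {
    boolean aggregationEnabled = conf.getBoolean(
        YarnConfiguration.LOG_AGGREGATION_ENABLED,
        YarnConfiguration.DEFAULT_LOG_AGGREGATION_ENABLED);
    if (aggregationEnabled) {
      return null; // the history server serves the aggregated logs
    }
    // Fall back to the NodeManager that ran the container, e.g. yy:48038.
    return "http://" + nmHttpAddress + "/node/containerlogs/"
        + containerId + "/" + user;
  }
}
{code}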
[jira] [Reopened] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai reopened YARN-9517: --- > When aggregation is not enabled, can't see the container log > > > Key: YARN-9517 > URL: https://issues.apache.org/jira/browse/YARN-9517 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, > 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Attachments: YARN-9517.patch > > > When aggregation is not enabled, we click the "container log link" (in the > web page > "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;) > after a job has finished successfully. > It then jumps to a web page displaying "Aggregation is not enabled. Try the > nodemanager at yy:48038", and the URL is > "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop; > I also found this problem in all Hadoop 2.x.y and 3.x.y versions, and I have > submitted a patch which applies to these versions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9517: -- Labels: patch (was: ) > When aggregation is not enabled, can't see the container log > > > Key: YARN-9517 > URL: https://issues.apache.org/jira/browse/YARN-9517 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, > 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: patch > Attachments: YARN-9517.patch > > > When aggregation is not enabled, we click the "container log link" (in the > web page > "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;) > after a job has finished successfully. > It then jumps to a web page displaying "Aggregation is not enabled. Try the > nodemanager at yy:48038", and the URL is > "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop; > I also found this problem in all Hadoop 2.x.y and 3.x.y versions, and I have > submitted a patch which is simple and applies to these versions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai resolved YARN-9517. --- Resolution: Fixed > When aggregation is not enabled, can't see the container log > > > Key: YARN-9517 > URL: https://issues.apache.org/jira/browse/YARN-9517 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, > 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Labels: patch > Attachments: YARN-9517.patch > > > When aggregation is not enabled, we click the "container log link" (in the > web page > "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;) > after a job has finished successfully. > It then jumps to a web page displaying "Aggregation is not enabled. Try the > nodemanager at yy:48038", and the URL is > "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop; > I also found this problem in all Hadoop 2.x.y and 3.x.y versions, and I have > submitted a patch which is simple and applies to these versions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16828982#comment-16828982 ] Shurong Mai edited comment on YARN-9517 at 4/29/19 7:20 AM: We had applied the patch to our Hadoop and tested it successfully was (Author: shurong.mai): We have applied the patch to our Hadoop and tested it successfully > When aggregation is not enabled, can't see the container log > > > Key: YARN-9517 > URL: https://issues.apache.org/jira/browse/YARN-9517 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, > 2.7.7, 3.1.2 >Reporter: Shurong Mai >Priority: Major > Attachments: YARN-9517.patch > > > When aggregation is not enabled, we click the "container log link" (in the > web page > "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL;) > after a job has finished successfully. > It then jumps to a web page displaying "Aggregation is not enabled. Try the > nodemanager at yy:48038", and the URL is > "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop; > I also found this problem in all Hadoop 2.x.y and 3.x.y versions, and I have > submitted a patch which applies to these versions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16828982#comment-16828982 ]
Shurong Mai edited comment on YARN-9517 at 4/29/19 7:20 AM:
------------------------------------------------------------
We had applied the patch to our Hadoop and tested OK.

was (Author: shurong.mai):
We had applied the patch to our Hadoop and tested OK

> When aggregation is not enabled, can't see the container log
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
> Reporter: Shurong Mai
> Priority: Major
> Attachments: YARN-9517.patch
>
> When log aggregation is not enabled, we click the "container log" link (on the web page "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") after a job has finished successfully.
> The click then jumps to a page displaying "Aggregation is not enabled. Try the nodemanager at yy:48038", and the URL is "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop".
> I also found this problem in all Hadoop versions 2.x.y and 3.x.y, and I have submitted a patch which can be applied to these Hadoop versions.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shurong Mai updated YARN-9517:
------------------------------
    Description:
When log aggregation is not enabled, we click the "container log" link (on the web page "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") after a job has finished successfully.
The click then jumps to a page displaying "Aggregation is not enabled. Try the nodemanager at yy:48038", and the URL is "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop".
I also found this problem in all Hadoop versions 2.x.y and 3.x.y, and I have submitted a simple patch which can be applied to these Hadoop versions.

was:
When log aggregation is not enabled, we click the "container log" link (on the web page "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") after a job has finished successfully.
The click then jumps to a page displaying "Aggregation is not enabled. Try the nodemanager at yy:48038", and the URL is "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop".
I also found this problem in all Hadoop versions 2.x.y and 3.x.y, and I have submitted a patch which can be applied to these Hadoop versions.

> When aggregation is not enabled, can't see the container log
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, 2.7.7, 3.1.2
> Reporter: Shurong Mai
> Priority: Major
> Attachments: YARN-9517.patch
>
> When log aggregation is not enabled, we click the "container log" link (on the web page "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") after a job has finished successfully.
> The click then jumps to a page displaying "Aggregation is not enabled. Try the nodemanager at yy:48038", and the URL is "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop".
> I also found this problem in all Hadoop versions 2.x.y and 3.x.y, and I have submitted a simple patch which can be applied to these Hadoop versions.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shurong Mai updated YARN-9517:
------------------------------
    Affects Version/s: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5

> When aggregation is not enabled, can't see the container log
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.2.0, 2.3.0, 2.4.1, 2.5.2, 2.6.5, 3.2.0, 2.9.2, 2.8.5, 2.7.7
> Reporter: Shurong Mai
> Priority: Major
>
> When log aggregation is not enabled, we click the "container log" link (on the web page "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") after a job has finished successfully.
> The click then jumps to a page displaying "Aggregation is not enabled. Try the nodemanager at yy:48038", and the URL is "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop".
> I also found this problem in all Hadoop versions 2.x.y and 3.x.y, and I have submitted a patch which can be applied to these Hadoop versions.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shurong Mai updated YARN-9517:
------------------------------
    Description:
When log aggregation is not enabled, we click the "container log" link (on the web page "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") after a job has finished successfully.
The click then jumps to a page displaying "Aggregation is not enabled. Try the nodemanager at yy:48038", and the URL is "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop".
I also found this problem in all Hadoop versions 2.x.y and 3.x.y, and I have submitted a patch which can be applied to these Hadoop versions.

was:
When log aggregation is not enabled, we click the "container log" link (on the web page "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") after a job has finished successfully.
The click then jumps to a page displaying "Aggregation is not enabled. Try the nodemanager at yy:48038", and the URL is "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop".
I also found this problem in all Hadoop versions 2.x and 3.x.

> When aggregation is not enabled, can't see the container log
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.7.7
> Reporter: Shurong Mai
> Priority: Major
>
> When log aggregation is not enabled, we click the "container log" link (on the web page "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") after a job has finished successfully.
> The click then jumps to a page displaying "Aggregation is not enabled. Try the nodemanager at yy:48038", and the URL is "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop".
> I also found this problem in all Hadoop versions 2.x.y and 3.x.y, and I have submitted a patch which can be applied to these Hadoop versions.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shurong Mai updated YARN-9517:
------------------------------
    Description:
When log aggregation is not enabled, we click the "container log" link (on the web page "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") after a job has finished successfully.
The click then jumps to a page displaying "Aggregation is not enabled. Try the nodemanager at yy:48038", and the URL is "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop".

was: Aggregation is not enabled, when we click

> When aggregation is not enabled, can't see the container log
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.7.7
> Reporter: Shurong Mai
> Priority: Major
>
> When log aggregation is not enabled, we click the "container log" link (on the web page "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") after a job has finished successfully.
> The click then jumps to a page displaying "Aggregation is not enabled. Try the nodemanager at yy:48038", and the URL is "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop".

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shurong Mai updated YARN-9517:
------------------------------
    Description:
When log aggregation is not enabled, we click the "container log" link (on the web page "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") after a job has finished successfully.
The click then jumps to a page displaying "Aggregation is not enabled. Try the nodemanager at yy:48038", and the URL is "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop".
I also found this problem in all Hadoop versions 2.x and 3.x.

was:
When log aggregation is not enabled, we click the "container log" link (on the web page "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") after a job has finished successfully.
The click then jumps to a page displaying "Aggregation is not enabled. Try the nodemanager at yy:48038", and the URL is "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop".

> When aggregation is not enabled, can't see the container log
>
> Key: YARN-9517
> URL: https://issues.apache.org/jira/browse/YARN-9517
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.7.7
> Reporter: Shurong Mai
> Priority: Major
>
> When log aggregation is not enabled, we click the "container log" link (on the web page "http://xx:19888/jobhistory/attempts/job_1556431770792_0001/m/SUCCESSFUL") after a job has finished successfully.
> The click then jumps to a page displaying "Aggregation is not enabled. Try the nodemanager at yy:48038", and the URL is "http://xx:19888/jobhistory/logs/yy:48038/container_1556431770792_0001_01_02/attempt_1556431770792_0001_m_00_0/hadoop".
> I also found this problem in all Hadoop versions 2.x and 3.x.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9517) When aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9517: -- Summary: When aggregation is not enabled, can't see the container log (was: When Aggregation is not enabled, can't see the container log) > When aggregation is not enabled, can't see the container log > > > Key: YARN-9517 > URL: https://issues.apache.org/jira/browse/YARN-9517 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.7 >Reporter: Shurong Mai >Priority: Major > > Aggregation is not enabled, when we click -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9517) When Aggregation is not enabled, can't see the container log
[ https://issues.apache.org/jira/browse/YARN-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shurong Mai updated YARN-9517: -- Description: Aggregation is not enabled, when we click > When Aggregation is not enabled, can't see the container log > > > Key: YARN-9517 > URL: https://issues.apache.org/jira/browse/YARN-9517 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.7 >Reporter: Shurong Mai >Priority: Major > > Aggregation is not enabled, when we click -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org