[jira] [Assigned] (YARN-9307) node_partitions constraint does not work
[ https://issues.apache.org/jira/browse/YARN-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weiwei Yang reassigned YARN-9307:
---------------------------------
       Resolution: Fixed
         Assignee: kyungwan nam
     Hadoop Flags: Reviewed
    Fix Version/s: 3.1.3

> node_partitions constraint does not work
> -----------------------------------------
>
>                 Key: YARN-9307
>                 URL: https://issues.apache.org/jira/browse/YARN-9307
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.1.1
>            Reporter: kyungwan nam
>            Assignee: kyungwan nam
>            Priority: Major
>             Fix For: 3.1.3
>
>         Attachments: YARN-9307.branch-3.1.001.patch
>
> When a yarn-service app is submitted with the configuration below, the
> node_partitions constraint does not work.
> {code}
> …
> "placement_policy": {
>   "constraints": [
>     {
>       "type": "ANTI_AFFINITY",
>       "scope": "NODE",
>       "target_tags": [
>         "ws"
>       ],
>       "node_partitions": [
>         ""
>       ]
>     }
>   ]
> }
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
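For context on what the spec above maps to: the same ANTI_AFFINITY-on-tag-"ws", single-partition constraint can also be written with the Java PlacementConstraints DSL. The sketch below is illustrative only (it is not part of the attached patch) and assumes the targetNotIn/targetIn/nodePartition helpers from org.apache.hadoop.yarn.api.resource.PlacementConstraints are available on the classpath.

{code}
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.NODE;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.and;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.targetIn;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.targetNotIn;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.PlacementTargets.allocationTag;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.PlacementTargets.nodePartition;

import org.apache.hadoop.yarn.api.resource.PlacementConstraint;
import org.apache.hadoop.yarn.api.resource.PlacementConstraints;

public class WsAntiAffinityExample {
  public static void main(String[] args) {
    // ANTI_AFFINITY at NODE scope on allocation tag "ws", limited to the
    // default ("") node partition: never place two "ws" containers on the
    // same node, and only consider nodes in that partition.
    PlacementConstraint constraint = PlacementConstraints.build(
        and(
            targetNotIn(NODE, allocationTag("ws")),
            targetIn(NODE, nodePartition(""))));
    System.out.println(constraint);
  }
}
{code}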
[jira] [Commented] (YARN-9307) node_partitions constraint does not work
[ https://issues.apache.org/jira/browse/YARN-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826647#comment-16826647 ] Weiwei Yang commented on YARN-9307: --- Committed to branch-3.1, thanks for the fix [~kyungwan nam]. > node_partitions constraint does not work > > > Key: YARN-9307 > URL: https://issues.apache.org/jira/browse/YARN-9307 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: kyungwan nam >Assignee: kyungwan nam >Priority: Major > Fix For: 3.1.3 > > Attachments: YARN-9307.branch-3.1.001.patch > > > when a yarn-service app is submitted with below configuration, > node_partitions constraint does not work. > {code} > … > "placement_policy": { >"constraints": [ > { >"type": "ANTI_AFFINITY", >"scope": "NODE", >"target_tags": [ > "ws" >], >"node_partitions": [ > "" >] > } >] > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9307) node_partitions constraint does not work
[ https://issues.apache.org/jira/browse/YARN-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826644#comment-16826644 ] Weiwei Yang commented on YARN-9307: --- LGTM. +1 > node_partitions constraint does not work > > > Key: YARN-9307 > URL: https://issues.apache.org/jira/browse/YARN-9307 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.1.1 >Reporter: kyungwan nam >Priority: Major > Attachments: YARN-9307.branch-3.1.001.patch > > > when a yarn-service app is submitted with below configuration, > node_partitions constraint does not work. > {code} > … > "placement_policy": { >"constraints": [ > { >"type": "ANTI_AFFINITY", >"scope": "NODE", >"target_tags": [ > "ws" >], >"node_partitions": [ > "" >] > } >] > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9507) Fix NPE if NM fails to init
[ https://issues.apache.org/jira/browse/YARN-9507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826630#comment-16826630 ]

Bilwa S T commented on YARN-9507:
---------------------------------

Thanks [~Tao Yang] for reviewing.

> Fix NPE if NM fails to init
> ---------------------------
>
>                 Key: YARN-9507
>                 URL: https://issues.apache.org/jira/browse/YARN-9507
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bilwa S T
>            Assignee: Bilwa S T
>            Priority: Minor
>         Attachments: YARN-9507-001.patch
>
> 2019-04-24 14:06:44,101 WARN org.apache.hadoop.service.AbstractService: When stopping the service NodeManager
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:492)
>         at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>         at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>         at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:947)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1018)

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
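The stack trace above shows stop() being driven by ServiceOperations.stopQuietly() after init() failed, which is why serviceStop() can observe half-initialized state. The sketch below is a minimal, hypothetical illustration of that ordering and of the usual null-guard pattern; it is not the attached YARN-9507 patch.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;

/**
 * Hypothetical sketch: AbstractService.init() calls stop() via
 * ServiceOperations.stopQuietly() when serviceInit() throws, so anything
 * serviceInit() did not get around to creating is still null in serviceStop().
 */
public class GuardedService extends AbstractService {
  private Object resource;   // would normally be created late in serviceInit()

  public GuardedService() {
    super("GuardedService");
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    throw new Exception("simulated init failure before 'resource' is created");
  }

  @Override
  protected void serviceStop() throws Exception {
    if (resource != null) {  // guard: init may have failed before assignment
      resource = null;       // release only what actually exists
    }
    super.serviceStop();
  }

  public static void main(String[] args) {
    GuardedService s = new GuardedService();
    try {
      s.init(new Configuration());  // triggers serviceInit failure, then stopQuietly()
    } catch (Exception expected) {
      System.out.println("init failed as expected: " + expected.getMessage());
    }
  }
}
{code}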
[jira] [Commented] (YARN-9307) node_partitions constraint does not work
[ https://issues.apache.org/jira/browse/YARN-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826627#comment-16826627 ] Hadoop QA commented on YARN-9307: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} branch-3.1 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 59s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 45s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 0s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 11s{color} | {color:green} branch-3.1 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} branch-3.1 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 7s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 65m 1s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}128m 2s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:080e9d0 | | JIRA Issue | YARN-9307 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12958805/YARN-9307.branch-3.1.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 8c79f40c4214 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | branch-3.1 / d242b16 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24023/testReport/ | | Max. process+thread count | 902 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output |
[jira] [Commented] (YARN-9486) Docker container exited with failure does not get clean up correctly
[ https://issues.apache.org/jira/browse/YARN-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826580#comment-16826580 ]

Hudson commented on YARN-9486:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16466 (See
[https://builds.apache.org/job/Hadoop-trunk-Commit/16466/])
YARN-9486. Docker container exited with failure does not get clean up
(ebadger: rev 79d3d35398cb5348cfd62e41e3318ec7a337421a)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerCleanup.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerRelaunch.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerCleanup.java

> Docker container exited with failure does not get clean up correctly
> ---------------------------------------------------------------------
>
>                 Key: YARN-9486
>                 URL: https://issues.apache.org/jira/browse/YARN-9486
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 3.2.0
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: YARN-9486.001.patch, YARN-9486.002.patch, YARN-9486.003.patch, YARN-9486.004.patch, YARN-9486.005.patch
>
> When a Docker container encounters an error and exits prematurely
> (EXITED_WITH_FAILURE), ContainerCleanup does not remove the container; instead we
> get messages that look like this:
> {code}
> java.io.IOException: Could not find nmPrivate/application_1555111445937_0008/container_1555111445937_0008_01_07//container_1555111445937_0008_01_07.pid in any of the directories
> 2019-04-15 20:42:16,454 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1555111445937_0008_01_07 transitioned from RELAUNCHING to EXITED_WITH_FAILURE
> 2019-04-15 20:42:16,455 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup: Cleaning up container container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,455 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup: Container container_1555111445937_0008_01_07 not launched. No cleanup needed to be done
> 2019-04-15 20:42:16,455 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hbase OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE APPID=application_1555111445937_0008 CONTAINERID=container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,458 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1555111445937_0008_01_07 transitioned from EXITED_WITH_FAILURE to DONE
> 2019-04-15 20:42:16,458 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Removing container_1555111445937_0008_01_07 from application application_1555111445937_0008
> 2019-04-15 20:42:16,458 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,458 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Considering container container_1555111445937_0008_01_07 for log-aggregation
> 2019-04-15 20:42:16,804 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Getting container-status for container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,804 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Getting localization status for container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,804 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Returning ContainerStatus: [ContainerId: container_1555111445937_0008_01_07, ExecutionType: GUARANTEED, State: COMPLETE, Capability: , Diagnostics: ..., ExitStatus: -1, IP: null, Host: null, ExposedPorts: , ContainerSubState: DONE]
> 2019-04-15 20:42:18,464 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed completed containers from NM context: [container_1555111445937_0008_01_07]
> 2019-04-15 20:43:50,476 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Stopping container with container Id: container_1555111445937_0008_01_07
> {code}
> There is no docker rm command performed.
[jira] [Commented] (YARN-9505) Add container allocation latency for Opportunistic Scheduler
[ https://issues.apache.org/jira/browse/YARN-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826411#comment-16826411 ]

Íñigo Goiri commented on YARN-9505:
-----------------------------------

It looks like the use of Time vs. SystemClock is a little inconsistent. I'm more used
to seeing Time, but as you mention, YARN-4526 uses SystemClock in many places.

> Add container allocation latency for Opportunistic Scheduler
> -------------------------------------------------------------
>
>                 Key: YARN-9505
>                 URL: https://issues.apache.org/jira/browse/YARN-9505
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Abhishek Modi
>            Assignee: Abhishek Modi
>            Priority: Major
>         Attachments: YARN-9505.001.patch, YARN-9505.002.patch
>
> This will help in tuning the opportunistic scheduler and its configuration
> parameters.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9505) Add container allocation latency for Opportunistic Scheduler
[ https://issues.apache.org/jira/browse/YARN-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826386#comment-16826386 ]

Abhishek Modi commented on YARN-9505:
--------------------------------------

[~elgoiri] thanks for reviewing this. I used SystemClock based on the description of
YARN-4526. Please let me know if we should change it to monotonic time.

> Add container allocation latency for Opportunistic Scheduler
> -------------------------------------------------------------
>
>                 Key: YARN-9505
>                 URL: https://issues.apache.org/jira/browse/YARN-9505
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Abhishek Modi
>            Assignee: Abhishek Modi
>            Priority: Major
>         Attachments: YARN-9505.001.patch, YARN-9505.002.patch
>
> This will help in tuning the opportunistic scheduler and its configuration
> parameters.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
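A minimal sketch of the two timing options discussed above, assuming hadoop-common and hadoop-yarn-common are on the classpath: SystemClock reports wall-clock milliseconds, while Time.monotonicNow() is unaffected by clock adjustments, which is usually the safer base for latency measurements. This is illustrative only, not the YARN-9505 patch.

{code}
import org.apache.hadoop.util.Time;
import org.apache.hadoop.yarn.util.SystemClock;

public class AllocationLatencyProbe {
  public static void main(String[] args) throws InterruptedException {
    long wallStart = SystemClock.getInstance().getTime();  // wall-clock ms
    long monoStart = Time.monotonicNow();                  // monotonic ms
    Thread.sleep(25);                                      // stand-in for the allocation work
    System.out.println("wall-clock latency ms: "
        + (SystemClock.getInstance().getTime() - wallStart));
    System.out.println("monotonic latency ms:  "
        + (Time.monotonicNow() - monoStart));
  }
}
{code}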
[jira] [Commented] (YARN-6272) TestAMRMClient#testAMRMClientWithContainerResourceChange fails intermittently
[ https://issues.apache.org/jira/browse/YARN-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826350#comment-16826350 ]

Prabhu Joseph commented on YARN-6272:
--------------------------------------

Thanks [~giovanni.fumarola] for reviewing. The testcase heartbeats once with three NMs
and expects the increase allocation to happen immediately, but it won't when the node
update comes from some other NM. The increase has to be allocated on the same NM that
holds the container whose resources are being increased; until then the request is added
back and is only processed on a subsequent node update for that NM. Heartbeating only the
NM where the container was originally allocated would not require any sleep, but
MiniYarnCluster sends node updates for all NMs, so the allocation is effectively random
across the three NMs and the testcase has to wait and retry until the increased container
is allocated on the right NM. The fix heartbeats only the right NM, which increases the
likelihood of hitting the right node update (even though MiniYarnCluster still does
nodeUpdate for all), and waits and retries until the new increased container is allocated
on that NM. I validated the fix with multiple runs of 500 iterations and did not see a
test failure; without the fix, the testcase consistently fails within 50 iterations. The
other option is to use MockRM and MockNM (as per Jason's comment above); I tried this and
it required a lot of changes. Let me know if this is not convincing and I will test it
with MockRM and MockNM.

> TestAMRMClient#testAMRMClientWithContainerResourceChange fails intermittently
> ------------------------------------------------------------------------------
>
>                 Key: YARN-6272
>                 URL: https://issues.apache.org/jira/browse/YARN-6272
>             Project: Hadoop YARN
>          Issue Type: Test
>          Components: yarn
>    Affects Versions: 3.0.0-alpha4
>            Reporter: Ray Chiang
>            Assignee: Prabhu Joseph
>            Priority: Major
>         Attachments: YARN-6272-001.patch
>
> I'm seeing this unit test fail fairly often in trunk:
> testAMRMClientWithContainerResourceChange(org.apache.hadoop.yarn.client.api.impl.TestAMRMClient)
> Time elapsed: 5.113 sec <<< FAILURE!
> java.lang.AssertionError: expected:<1> but was:<0>
>         at org.junit.Assert.fail(Assert.java:88)
>         at org.junit.Assert.failNotEquals(Assert.java:743)
>         at org.junit.Assert.assertEquals(Assert.java:118)
>         at org.junit.Assert.assertEquals(Assert.java:555)
>         at org.junit.Assert.assertEquals(Assert.java:542)
>         at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.doContainerResourceChange(TestAMRMClient.java:1087)
>         at org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientWithContainerResourceChange(TestAMRMClient.java:963)

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
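A rough, hypothetical sketch of the wait-and-retry shape described in the comment above (not the attached patch): instead of a fixed sleep, the test keeps triggering the event (a heartbeat to the NM that owns the container) and re-checks the condition (the increased container was allocated) until it holds or a deadline passes. The trigger and condition lambdas below are stand-ins.

{code}
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class WaitAndRetry {

  static void waitFor(Runnable trigger, BooleanSupplier condition,
                      long intervalMs, long timeoutMs)
      throws InterruptedException, TimeoutException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
      trigger.run();                   // e.g. heartbeat only the right NM
      if (condition.getAsBoolean()) {  // e.g. increased allocation count reached 1
        return;
      }
      Thread.sleep(intervalMs);
    }
    throw new TimeoutException("condition not met within " + timeoutMs + " ms");
  }

  public static void main(String[] args) throws Exception {
    long start = System.currentTimeMillis();
    // Dummy trigger/condition: "succeeds" once 200 ms have elapsed.
    waitFor(() -> { }, () -> System.currentTimeMillis() - start > 200, 50, 5000);
    System.out.println("condition satisfied");
  }
}
{code}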
[jira] [Commented] (YARN-6272) TestAMRMClient#testAMRMClientWithContainerResourceChange fails intermittently
[ https://issues.apache.org/jira/browse/YARN-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826267#comment-16826267 ] Giovanni Matteo Fumarola commented on YARN-6272: Thanks [~Prabhu Joseph] for the patch. I am not a fan of Sleep instructions in unit tests. Can you explain the fix? > TestAMRMClient#testAMRMClientWithContainerResourceChange fails intermittently > - > > Key: YARN-6272 > URL: https://issues.apache.org/jira/browse/YARN-6272 > Project: Hadoop YARN > Issue Type: Test > Components: yarn >Affects Versions: 3.0.0-alpha4 >Reporter: Ray Chiang >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-6272-001.patch > > > I'm seeing this unit test fail fairly often in trunk: > testAMRMClientWithContainerResourceChange(org.apache.hadoop.yarn.client.api.impl.TestAMRMClient) > Time elapsed: 5.113 sec <<< FAILURE! > java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.doContainerResourceChange(TestAMRMClient.java:1087) > at > org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientWithContainerResourceChange(TestAMRMClient.java:963) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9486) Docker container exited with failure does not get clean up correctly
[ https://issues.apache.org/jira/browse/YARN-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826234#comment-16826234 ] Hadoop QA commented on YARN-9486: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 22s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 13s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 48s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 69m 15s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9486 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967031/YARN-9486.005.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 1fc77044de83 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / b5dcf64 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24022/testReport/ | | Max. process+thread count | 446 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24022/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Docker container exited with failure does not get clean up
[jira] [Commented] (YARN-9486) Docker container exited with failure does not get clean up correctly
[ https://issues.apache.org/jira/browse/YARN-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826198#comment-16826198 ] Jim Brennan commented on YARN-9486: --- [~eyang] thanks for updating the comment. +1 (non-binding) on patch 005. > Docker container exited with failure does not get clean up correctly > > > Key: YARN-9486 > URL: https://issues.apache.org/jira/browse/YARN-9486 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.2.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9486.001.patch, YARN-9486.002.patch, > YARN-9486.003.patch, YARN-9486.004.patch, YARN-9486.005.patch > > > When docker container encounters error and exit prematurely > (EXITED_WITH_FAILURE), ContainerCleanup does not remove container, instead we > get messages that look like this: > {code} > java.io.IOException: Could not find > nmPrivate/application_1555111445937_0008/container_1555111445937_0008_01_07//container_1555111445937_0008_01_07.pid > in any of the directories > 2019-04-15 20:42:16,454 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1555111445937_0008_01_07 transitioned from > RELAUNCHING to EXITED_WITH_FAILURE > 2019-04-15 20:42:16,455 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup: > Cleaning up container container_1555111445937_0008_01_07 > 2019-04-15 20:42:16,455 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup: > Container container_1555111445937_0008_01_07 not launched. No cleanup > needed to be done > 2019-04-15 20:42:16,455 WARN > org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hbase > OPERATION=Container Finished - Failed TARGET=ContainerImpl > RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE > APPID=application_1555111445937_0008 > CONTAINERID=container_1555111445937_0008_01_07 > 2019-04-15 20:42:16,458 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1555111445937_0008_01_07 transitioned from > EXITED_WITH_FAILURE to DONE > 2019-04-15 20:42:16,458 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Removing container_1555111445937_0008_01_07 from application > application_1555111445937_0008 > 2019-04-15 20:42:16,458 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: > Stopping resource-monitoring for container_1555111445937_0008_01_07 > 2019-04-15 20:42:16,458 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: > Considering container container_1555111445937_0008_01_07 for > log-aggregation > 2019-04-15 20:42:16,804 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Getting container-status for container_1555111445937_0008_01_07 > 2019-04-15 20:42:16,804 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Getting localization status for container_1555111445937_0008_01_07 > 2019-04-15 20:42:16,804 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Returning ContainerStatus: [ContainerId: > container_1555111445937_0008_01_07, ExecutionType: GUARANTEED, State: > COMPLETE, Capability: , Diagnostics: ..., ExitStatus: > -1, IP: null, Host: null, ExposedPorts: , ContainerSubState: DONE] > 2019-04-15 20:42:18,464 INFO > 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed > completed containers from NM context: [container_1555111445937_0008_01_07] > 2019-04-15 20:43:50,476 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Stopping container with container Id: container_1555111445937_0008_01_07 > {code} > There is no docker rm command performed. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9486) Docker container exited with failure does not get clean up correctly
[ https://issues.apache.org/jira/browse/YARN-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826192#comment-16826192 ] Eric Yang commented on YARN-9486: - [~Jim_Brennan] Thank you for the review. Patch 005 is same as patch 004 with comment added to explain the corner cases. > Docker container exited with failure does not get clean up correctly > > > Key: YARN-9486 > URL: https://issues.apache.org/jira/browse/YARN-9486 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.2.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9486.001.patch, YARN-9486.002.patch, > YARN-9486.003.patch, YARN-9486.004.patch, YARN-9486.005.patch > > > When docker container encounters error and exit prematurely > (EXITED_WITH_FAILURE), ContainerCleanup does not remove container, instead we > get messages that look like this: > {code} > java.io.IOException: Could not find > nmPrivate/application_1555111445937_0008/container_1555111445937_0008_01_07//container_1555111445937_0008_01_07.pid > in any of the directories > 2019-04-15 20:42:16,454 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1555111445937_0008_01_07 transitioned from > RELAUNCHING to EXITED_WITH_FAILURE > 2019-04-15 20:42:16,455 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup: > Cleaning up container container_1555111445937_0008_01_07 > 2019-04-15 20:42:16,455 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup: > Container container_1555111445937_0008_01_07 not launched. No cleanup > needed to be done > 2019-04-15 20:42:16,455 WARN > org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hbase > OPERATION=Container Finished - Failed TARGET=ContainerImpl > RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE > APPID=application_1555111445937_0008 > CONTAINERID=container_1555111445937_0008_01_07 > 2019-04-15 20:42:16,458 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1555111445937_0008_01_07 transitioned from > EXITED_WITH_FAILURE to DONE > 2019-04-15 20:42:16,458 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Removing container_1555111445937_0008_01_07 from application > application_1555111445937_0008 > 2019-04-15 20:42:16,458 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: > Stopping resource-monitoring for container_1555111445937_0008_01_07 > 2019-04-15 20:42:16,458 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: > Considering container container_1555111445937_0008_01_07 for > log-aggregation > 2019-04-15 20:42:16,804 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Getting container-status for container_1555111445937_0008_01_07 > 2019-04-15 20:42:16,804 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Getting localization status for container_1555111445937_0008_01_07 > 2019-04-15 20:42:16,804 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Returning ContainerStatus: [ContainerId: > container_1555111445937_0008_01_07, ExecutionType: GUARANTEED, State: > COMPLETE, Capability: , Diagnostics: ..., ExitStatus: > -1, IP: null, Host: null, ExposedPorts: , 
ContainerSubState: DONE] > 2019-04-15 20:42:18,464 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed > completed containers from NM context: [container_1555111445937_0008_01_07] > 2019-04-15 20:43:50,476 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Stopping container with container Id: container_1555111445937_0008_01_07 > {code} > There is no docker rm command performed. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9513) [JDK11] TestMetricsInvariantChecker#testManyRuns InvariantViolationException: ReferenceError: "GcCountPS_Scavenge" is not defined in at line number 1
Siyao Meng created YARN-9513:
--------------------------------

             Summary: [JDK11] TestMetricsInvariantChecker#testManyRuns InvariantViolationException: ReferenceError: "GcCountPS_Scavenge" is not defined in at line number 1
                 Key: YARN-9513
                 URL: https://issues.apache.org/jira/browse/YARN-9513
             Project: Hadoop YARN
          Issue Type: Bug
          Components: test
            Reporter: Siyao Meng

Found in maven JDK 11 unit test run. Compiled on JDK 8:

{code}
[ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.502 s <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.monitor.invariants.TestMetricsInvariantChecker
[ERROR] testManyRuns(org.apache.hadoop.yarn.server.resourcemanager.monitor.invariants.TestMetricsInvariantChecker)  Time elapsed: 0.206 s  <<< ERROR!
org.apache.hadoop.yarn.server.resourcemanager.monitor.invariants.InvariantViolationException: ReferenceError: "GcCountPS_Scavenge" is not defined in at line number 1
        at org.apache.hadoop.yarn.server.resourcemanager.monitor.invariants.InvariantsChecker.logOrThrow(InvariantsChecker.java:74)
        at org.apache.hadoop.yarn.server.resourcemanager.monitor.invariants.MetricsInvariantChecker.editSchedule(MetricsInvariantChecker.java:180)
        at org.apache.hadoop.yarn.server.resourcemanager.monitor.invariants.TestMetricsInvariantChecker.testManyRuns(TestMetricsInvariantChecker.java:69)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
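A plausible explanation (an assumption, not confirmed in the issue): GcCountPS_Scavenge is derived from the "PS Scavenge" collector bean, which only exists under the Parallel GC; JDK 11 defaults to G1, so the metric name the invariant expression references is never registered. The small diagnostic below simply lists the garbage-collector beans of the running JVM, which shows what names are actually available.

{code}
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class ListGcBeans {
  public static void main(String[] args) {
    // Prints e.g. "PS Scavenge"/"PS MarkSweep" under Parallel GC,
    // but "G1 Young Generation"/"G1 Old Generation" under G1 (JDK 11 default).
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      System.out.println(gc.getName() + " count=" + gc.getCollectionCount());
    }
  }
}
{code}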
[jira] [Updated] (YARN-9486) Docker container exited with failure does not get clean up correctly
[ https://issues.apache.org/jira/browse/YARN-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang updated YARN-9486: Attachment: YARN-9486.005.patch > Docker container exited with failure does not get clean up correctly > > > Key: YARN-9486 > URL: https://issues.apache.org/jira/browse/YARN-9486 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.2.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9486.001.patch, YARN-9486.002.patch, > YARN-9486.003.patch, YARN-9486.004.patch, YARN-9486.005.patch > > > When docker container encounters error and exit prematurely > (EXITED_WITH_FAILURE), ContainerCleanup does not remove container, instead we > get messages that look like this: > {code} > java.io.IOException: Could not find > nmPrivate/application_1555111445937_0008/container_1555111445937_0008_01_07//container_1555111445937_0008_01_07.pid > in any of the directories > 2019-04-15 20:42:16,454 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1555111445937_0008_01_07 transitioned from > RELAUNCHING to EXITED_WITH_FAILURE > 2019-04-15 20:42:16,455 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup: > Cleaning up container container_1555111445937_0008_01_07 > 2019-04-15 20:42:16,455 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup: > Container container_1555111445937_0008_01_07 not launched. No cleanup > needed to be done > 2019-04-15 20:42:16,455 WARN > org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hbase > OPERATION=Container Finished - Failed TARGET=ContainerImpl > RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE > APPID=application_1555111445937_0008 > CONTAINERID=container_1555111445937_0008_01_07 > 2019-04-15 20:42:16,458 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1555111445937_0008_01_07 transitioned from > EXITED_WITH_FAILURE to DONE > 2019-04-15 20:42:16,458 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Removing container_1555111445937_0008_01_07 from application > application_1555111445937_0008 > 2019-04-15 20:42:16,458 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: > Stopping resource-monitoring for container_1555111445937_0008_01_07 > 2019-04-15 20:42:16,458 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: > Considering container container_1555111445937_0008_01_07 for > log-aggregation > 2019-04-15 20:42:16,804 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Getting container-status for container_1555111445937_0008_01_07 > 2019-04-15 20:42:16,804 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Getting localization status for container_1555111445937_0008_01_07 > 2019-04-15 20:42:16,804 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Returning ContainerStatus: [ContainerId: > container_1555111445937_0008_01_07, ExecutionType: GUARANTEED, State: > COMPLETE, Capability: , Diagnostics: ..., ExitStatus: > -1, IP: null, Host: null, ExposedPorts: , ContainerSubState: DONE] > 2019-04-15 20:42:18,464 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed > completed 
containers from NM context: [container_1555111445937_0008_01_07] > 2019-04-15 20:43:50,476 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Stopping container with container Id: container_1555111445937_0008_01_07 > {code} > There is no docker rm command performed. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9512) [JDK11] TestAuxServices#testCustomizedAuxServiceClassPath ClassCastException: class jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class java.net.URLCla
Siyao Meng created YARN-9512:
--------------------------------

             Summary: [JDK11] TestAuxServices#testCustomizedAuxServiceClassPath ClassCastException: class jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class java.net.URLClassLoader
                 Key: YARN-9512
                 URL: https://issues.apache.org/jira/browse/YARN-9512
             Project: Hadoop YARN
          Issue Type: Bug
          Components: test
            Reporter: Siyao Meng

Found in maven JDK 11 unit test run. Compiled on JDK 8:

{code}
[ERROR] testCustomizedAuxServiceClassPath(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices)  Time elapsed: 0.019 s  <<< ERROR!
java.lang.ClassCastException: class jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader and java.net.URLClassLoader are in module java.base of loader 'bootstrap')
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices$ServiceC.getMetaData(TestAuxServices.java:197)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStart(AuxServices.java:315)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices.testCustomizedAuxServiceClassPath(TestAuxServices.java:344)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
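The failure above is the standard JDK 9+ hazard of casting the application class loader to URLClassLoader, which only succeeds on JDK 8. The sketch below is an illustrative, self-contained probe (not the test's code) showing the guarded pattern with a java.class.path fallback.

{code}
import java.net.URL;
import java.net.URLClassLoader;

public class ClassLoaderProbe {
  public static void main(String[] args) {
    ClassLoader cl = ClassLoaderProbe.class.getClassLoader();
    if (cl instanceof URLClassLoader) {              // true on JDK 8, false on JDK 9+
      for (URL url : ((URLClassLoader) cl).getURLs()) {
        System.out.println(url);
      }
    } else {
      // On JDK 9+ the app class loader is an internal AppClassLoader,
      // so read the classpath from the system property instead.
      System.out.println("not a URLClassLoader; falling back to java.class.path");
      System.out.println(System.getProperty("java.class.path"));
    }
  }
}
{code}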
[jira] [Updated] (YARN-9511) [JDK11] TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: The remote jarfile should not be writable by group or others. The current Permission is 436
[ https://issues.apache.org/jira/browse/YARN-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siyao Meng updated YARN-9511: - Component/s: test > [JDK11] TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: > The remote jarfile should not be writable by group or others. The current > Permission is 436 > --- > > Key: YARN-9511 > URL: https://issues.apache.org/jira/browse/YARN-9511 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: Siyao Meng >Priority: Major > > Found in maven JDK 11 unit test run. Compiled on JDK 8. > {code} > [ERROR] > testRemoteAuxServiceClassPath(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices) > Time elapsed: 0.551 s <<< > ERROR!org.apache.hadoop.yarn.exceptions.YarnRuntimeException: The remote > jarfile should not be writable by group or others. The current Permission is > 436 > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:202) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices.testRemoteAuxServiceClassPath(TestAuxServices.java:268) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) > at org.junit.runners.ParentRunner.run(ParentRunner.java:309) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9511) [JDK11] TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: The remote jarfile should not be writable by group or others. The current Permission is 436
Siyao Meng created YARN-9511: Summary: [JDK11] TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: The remote jarfile should not be writable by group or others. The current Permission is 436 Key: YARN-9511 URL: https://issues.apache.org/jira/browse/YARN-9511 Project: Hadoop YARN Issue Type: Bug Reporter: Siyao Meng Found in maven JDK 11 unit test run. Compiled on JDK 8. {code} [ERROR] testRemoteAuxServiceClassPath(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices) Time elapsed: 0.551 s <<< ERROR!org.apache.hadoop.yarn.exceptions.YarnRuntimeException: The remote jarfile should not be writable by group or others. The current Permission is 436 at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:202) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices.testRemoteAuxServiceClassPath(TestAuxServices.java:268) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
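"The current Permission is 436" in the error above is the decimal rendering of octal 0664 (rw-rw-r--), i.e. the remote jar is group-writable, which the aux-service loader rejects. The snippet below is an illustrative check (not Hadoop code); it assumes a POSIX filesystem and takes a file path as its argument.

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

public class PermissionCheck {
  public static void main(String[] args) throws IOException {
    // Decimal 436 == octal 664 == rw-rw-r--: the group-write bit is set.
    System.out.println("decimal 436 = octal " + Integer.toOctalString(436));
    if (args.length > 0) {
      Set<PosixFilePermission> perms =
          Files.getPosixFilePermissions(Paths.get(args[0]));
      boolean groupOrOtherWritable = perms.contains(PosixFilePermission.GROUP_WRITE)
          || perms.contains(PosixFilePermission.OTHERS_WRITE);
      System.out.println(perms + " -> group/other writable: " + groupOrOtherWritable);
    }
  }
}
{code}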
[jira] [Commented] (YARN-9476) Create unit tests for VE plugin
[ https://issues.apache.org/jira/browse/YARN-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826185#comment-16826185 ] Hadoop QA commented on YARN-9476: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 12s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 35s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 20s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 71m 15s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e | | JIRA Issue | YARN-9476 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967025/YARN-9476-004.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 2cc6239b5327 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / b5dcf64 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_191 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24021/testReport/ | | Max. process+thread count | 412 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24021/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > Create unit tests for VE plugin >
[jira] [Commented] (YARN-9486) Docker container exited with failure does not get clean up correctly
[ https://issues.apache.org/jira/browse/YARN-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826142#comment-16826142 ] Jim Brennan commented on YARN-9486: --- [~eyang], I am +1 (non-binding) on patch 004. > Docker container exited with failure does not get clean up correctly > > > Key: YARN-9486 > URL: https://issues.apache.org/jira/browse/YARN-9486 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.2.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9486.001.patch, YARN-9486.002.patch, > YARN-9486.003.patch, YARN-9486.004.patch > > > When docker container encounters error and exit prematurely > (EXITED_WITH_FAILURE), ContainerCleanup does not remove container, instead we > get messages that look like this: > {code} > java.io.IOException: Could not find > nmPrivate/application_1555111445937_0008/container_1555111445937_0008_01_07//container_1555111445937_0008_01_07.pid > in any of the directories > 2019-04-15 20:42:16,454 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1555111445937_0008_01_07 transitioned from > RELAUNCHING to EXITED_WITH_FAILURE > 2019-04-15 20:42:16,455 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup: > Cleaning up container container_1555111445937_0008_01_07 > 2019-04-15 20:42:16,455 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup: > Container container_1555111445937_0008_01_07 not launched. No cleanup > needed to be done > 2019-04-15 20:42:16,455 WARN > org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hbase > OPERATION=Container Finished - Failed TARGET=ContainerImpl > RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE > APPID=application_1555111445937_0008 > CONTAINERID=container_1555111445937_0008_01_07 > 2019-04-15 20:42:16,458 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1555111445937_0008_01_07 transitioned from > EXITED_WITH_FAILURE to DONE > 2019-04-15 20:42:16,458 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Removing container_1555111445937_0008_01_07 from application > application_1555111445937_0008 > 2019-04-15 20:42:16,458 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: > Stopping resource-monitoring for container_1555111445937_0008_01_07 > 2019-04-15 20:42:16,458 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: > Considering container container_1555111445937_0008_01_07 for > log-aggregation > 2019-04-15 20:42:16,804 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Getting container-status for container_1555111445937_0008_01_07 > 2019-04-15 20:42:16,804 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Getting localization status for container_1555111445937_0008_01_07 > 2019-04-15 20:42:16,804 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Returning ContainerStatus: [ContainerId: > container_1555111445937_0008_01_07, ExecutionType: GUARANTEED, State: > COMPLETE, Capability: , Diagnostics: ..., ExitStatus: > -1, IP: null, Host: null, ExposedPorts: , ContainerSubState: DONE] > 2019-04-15 20:42:18,464 INFO > 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed > completed containers from NM context: [container_1555111445937_0008_01_07] > 2019-04-15 20:43:50,476 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Stopping container with container Id: container_1555111445937_0008_01_07 > {code} > There is no docker rm command performed. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9476) Create unit tests for VE plugin
[ https://issues.apache.org/jira/browse/YARN-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9476: --- Attachment: YARN-9476-004.patch > Create unit tests for VE plugin > --- > > Key: YARN-9476 > URL: https://issues.apache.org/jira/browse/YARN-9476 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9476-001.patch, YARN-9476-002.patch, > YARN-9476-003.patch, YARN-9476-004.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9486) Docker container exited with failure does not get clean up correctly
[ https://issues.apache.org/jira/browse/YARN-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826101#comment-16826101 ] Jim Brennan commented on YARN-9486: --- {quote} As a result, we need to check both markedLaunched and isLaunchCompleted to get a better picture of whether the container failed to launch, is still running, or has not started at all. {quote} [~eyang] Thanks again for the follow-up. I agree that adding the isLaunchCompleted check is warranted to cover all cases. It might be helpful to add a comment about the relaunch case, where containerAlreadyLaunched is false but isCompleted is true, which seems counter-intuitive. > Docker container exited with failure does not get clean up correctly > > > Key: YARN-9486 > URL: https://issues.apache.org/jira/browse/YARN-9486 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.2.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > Attachments: YARN-9486.001.patch, YARN-9486.002.patch, > YARN-9486.003.patch, YARN-9486.004.patch > > > When docker container encounters error and exit prematurely > (EXITED_WITH_FAILURE), ContainerCleanup does not remove container, instead we > get messages that look like this: > {code} > java.io.IOException: Could not find > nmPrivate/application_1555111445937_0008/container_1555111445937_0008_01_07//container_1555111445937_0008_01_07.pid > in any of the directories > 2019-04-15 20:42:16,454 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1555111445937_0008_01_07 transitioned from > RELAUNCHING to EXITED_WITH_FAILURE > 2019-04-15 20:42:16,455 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup: > Cleaning up container container_1555111445937_0008_01_07 > 2019-04-15 20:42:16,455 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup: > Container container_1555111445937_0008_01_07 not launched. 
No cleanup > needed to be done > 2019-04-15 20:42:16,455 WARN > org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hbase > OPERATION=Container Finished - Failed TARGET=ContainerImpl > RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE > APPID=application_1555111445937_0008 > CONTAINERID=container_1555111445937_0008_01_07 > 2019-04-15 20:42:16,458 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: > Container container_1555111445937_0008_01_07 transitioned from > EXITED_WITH_FAILURE to DONE > 2019-04-15 20:42:16,458 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Removing container_1555111445937_0008_01_07 from application > application_1555111445937_0008 > 2019-04-15 20:42:16,458 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: > Stopping resource-monitoring for container_1555111445937_0008_01_07 > 2019-04-15 20:42:16,458 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: > Considering container container_1555111445937_0008_01_07 for > log-aggregation > 2019-04-15 20:42:16,804 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Getting container-status for container_1555111445937_0008_01_07 > 2019-04-15 20:42:16,804 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Getting localization status for container_1555111445937_0008_01_07 > 2019-04-15 20:42:16,804 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Returning ContainerStatus: [ContainerId: > container_1555111445937_0008_01_07, ExecutionType: GUARANTEED, State: > COMPLETE, Capability: , Diagnostics: ..., ExitStatus: > -1, IP: null, Host: null, ExposedPorts: , ContainerSubState: DONE] > 2019-04-15 20:42:18,464 INFO > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed > completed containers from NM context: [container_1555111445937_0008_01_07] > 2019-04-15 20:43:50,476 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > Stopping container with container Id: container_1555111445937_0008_01_07 > {code} > There is no docker rm command performed. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
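For context, the decision described in the comment above can be sketched roughly as follows. This is illustrative only and not the code from the attached patches; the flag names come from the discussion, while the method and the example call are made up for this sketch:
{code:java}
// Sketch of the check discussed above, not the actual ContainerCleanup code.
// markedLaunched / launchCompleted are the two flags from the discussion;
// everything else exists only for this example.
static boolean needsCleanup(boolean markedLaunched, boolean launchCompleted) {
  if (!markedLaunched && !launchCompleted) {
    // Never launched at all: no process and no docker container to remove.
    return false;
  }
  // Launched, or a relaunch attempt already completed and failed: run the
  // full cleanup, which for Docker containers includes the missing "docker rm".
  return true;
}

// The counter-intuitive relaunch-failure case from the log above: the container
// is no longer marked as launched, but an earlier launch did complete, so
// cleanup must still run.
assert needsCleanup(false, true);
{code}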
[jira] [Commented] (YARN-9476) Create unit tests for VE plugin
[ https://issues.apache.org/jira/browse/YARN-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826097#comment-16826097 ] Szilard Nemeth commented on YARN-9476: -- Hi [~pbacsko]! Latest patch looks good, except for one minor thing I suggested before: please store the result of {code:java} f.mkdirs() {code} and have an assertion on the value. I meant a similar thing for where you are setting the executable flag on the files: {code:java} scriptPath.toFile().setExecutable(true) {code} Please store the value and assert that it is true. Thanks! > Create unit tests for VE plugin > --- > > Key: YARN-9476 > URL: https://issues.apache.org/jira/browse/YARN-9476 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9476-001.patch, YARN-9476-002.patch, > YARN-9476-003.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
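For illustration, this is roughly the shape of the assertions being asked for. It is only a sketch of a JUnit test fragment; the directory, script name, and the {{testRoot}} variable are hypothetical and not taken from the patch:
{code:java}
// Hypothetical fragment inside a JUnit test method: store and assert the
// boolean results instead of ignoring them.
File scriptDir = new File(testRoot, "ve-scripts");
assertTrue("Could not create " + scriptDir, scriptDir.mkdirs());

Path scriptPath = scriptDir.toPath().resolve("fake_ve_discovery.sh");
Files.write(scriptPath, "#!/bin/bash\n".getBytes(StandardCharsets.UTF_8));
assertTrue("Could not make " + scriptPath + " executable",
    scriptPath.toFile().setExecutable(true));
{code}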
[jira] [Commented] (YARN-9477) Implement VE discovery using libudev
[ https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826085#comment-16826085 ] Peter Bacsko commented on YARN-9477: [~tangzhankun] [~snemeth] could you please check out this POC? > Implement VE discovery using libudev > > > Key: YARN-9477 > URL: https://issues.apache.org/jira/browse/YARN-9477 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9477-POC.patch > > > Right now we have a Python script which is able to discover VE cards using > pyudev: https://pyudev.readthedocs.io/en/latest/ > Java does not officially support libudev. There are some projects on Github > (example: https://github.com/Zubnix/udev-java-bindings) but they're not > available as Maven artifacts. > However it's not that difficult to create a minimal layer around libudev > using JNA. We don't have to wrap every function, we need to call 4-5 methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9477) Implement VE discovery using libudev
[ https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826084#comment-16826084 ] Peter Bacsko commented on YARN-9477: Note that this part is the heart of the improvement: {code} Pointer sysPathPtr = libUdev.udev_device_get_syspath(device); {code} We need the {{sysPath}} to determine where the {{os_state}} file is. Following the Python script provided by NEC, the steps are:
- Get the {{veslot}} device files under {{/dev}}, like {{/dev/veslot0}}
- Get the device object from udev (we know the major/minor numbers of the device file, so we convert them to a device number, like {{os.makedev()}} does in Python)
- Get the syspath of a particular device file using libudev
- Get the PCI bus slot using libudev
- Read the {{os_state}} file under the syspath to determine the status of the card
Note that the PCI bus slot is optional; we don't need it to construct the device object (although we can retrieve it too). > Implement VE discovery using libudev > > > Key: YARN-9477 > URL: https://issues.apache.org/jira/browse/YARN-9477 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9477-POC.patch > > > Right now we have a Python script which is able to discover VE cards using > pyudev: https://pyudev.readthedocs.io/en/latest/ > Java does not officially support libudev. There are some projects on Github > (example: https://github.com/Zubnix/udev-java-bindings) but they're not > available as Maven artifacts. > However it's not that difficult to create a minimal layer around libudev > using JNA. We don't have to wrap every function, we need to call 4-5 methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
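To make the approach easier to follow, here is a minimal, self-contained sketch of such a JNA layer. The libudev function names are the real ones; the class name, the simplified {{makedev()}}, the {{Native.load}} call (JNA 5), and the assumption that {{dev_t}} fits in a Java {{long}} on 64-bit Linux are illustrative and not taken from the attached POC patch:
{code:java}
import com.sun.jna.Library;
import com.sun.jna.Native;
import com.sun.jna.Pointer;

import java.nio.file.Files;
import java.nio.file.Paths;

// Illustrative wrapper around the handful of libudev calls mentioned above.
public class VeSlotStateReader {

  public interface LibUdev extends Library {
    Pointer udev_new();
    void udev_unref(Pointer udev);
    // dev_t is modeled as long, which is fine on 64-bit Linux.
    Pointer udev_device_new_from_devnum(Pointer udev, byte type, long devnum);
    void udev_device_unref(Pointer device);
    Pointer udev_device_get_syspath(Pointer device);
  }

  private final LibUdev libUdev = Native.load("udev", LibUdev.class);

  // Simplified equivalent of Python's os.makedev() for small major/minor numbers.
  static long makedev(int major, int minor) {
    return ((long) major << 8) | (minor & 0xffL);
  }

  /** Reads os_state under the syspath of the character device /dev/veslotN. */
  public String readOsState(int major, int minor) throws Exception {
    Pointer udev = libUdev.udev_new();
    try {
      Pointer device = libUdev.udev_device_new_from_devnum(
          udev, (byte) 'c', makedev(major, minor));
      try {
        Pointer sysPathPtr = libUdev.udev_device_get_syspath(device);
        String sysPath = sysPathPtr.getString(0);
        return new String(
            Files.readAllBytes(Paths.get(sysPath, "os_state"))).trim();
      } finally {
        libUdev.udev_device_unref(device);
      }
    } finally {
      libUdev.udev_unref(udev);
    }
  }
}
{code}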
[jira] [Updated] (YARN-9477) Implement VE discovery using libudev
[ https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9477: --- Attachment: YARN-9477-POC.patch > Implement VE discovery using libudev > > > Key: YARN-9477 > URL: https://issues.apache.org/jira/browse/YARN-9477 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9477-POC.patch > > > Right now we have a Python script which is able to discover VE cards using > pyudev: https://pyudev.readthedocs.io/en/latest/ > Java does not officially support libudev. There are some projects on Github > (example: https://github.com/Zubnix/udev-java-bindings) but they're not > available as Maven artifacts. > However it's not that difficult to create a minimal layer around libudev > using JNA. We don't have to wrap every function, we need to call 4-5 methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9410) Typo in documentation: Using FPGA On YARN
[ https://issues.apache.org/jira/browse/YARN-9410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826043#comment-16826043 ] kevin su commented on YARN-9410: Could I take this issue? I don't have permission to assign it to myself. > Typo in documentation: Using FPGA On YARN > -- > > Key: YARN-9410 > URL: https://issues.apache.org/jira/browse/YARN-9410 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Szilard Nemeth >Priority: Major > Labels: newbie, newbie++ > > fpag.major-device-number should be changed to fpga... -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9510) Proxyuser access timeline and getdelegationtoken failed without Timeline server restart
Shen Yinjie created YARN-9510: - Summary: Proxyuser access timeline and getdelegationtoken failed without Timeline server restart Key: YARN-9510 URL: https://issues.apache.org/jira/browse/YARN-9510 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Affects Versions: 3.1.0 Reporter: Shen Yinjie We added a proxy user by changing "hadoop.proxyuser.xx.yy" and then executed yarn rmadmin -refreshSuperUserGroupsConfiguration, but did not restart the Timeline Server. The MR job fails and throws: Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: Authentication failed, URL: http://hostname:8188/ws/v1/timeline/?op=GETDELEGATIONTOKEN=alluxio=rm%2Fhc1%40XXF=ambari-qa, status: 403, message: Forbidden at org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:401) at org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:74) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:147) at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:213) It seems that the proxyuser info in the Timeline Server has not been refreshed. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
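For reference, the proxyuser entries involved look like the following in core-site.xml. The user name "xx" here only mirrors the placeholder in the description, and the wildcard values are just an example:
{code:xml}
<!-- core-site.xml: illustrative proxyuser entries for a user "xx" -->
<property>
  <name>hadoop.proxyuser.xx.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.xx.groups</name>
  <value>*</value>
</property>
{code}
Per the report above, {{yarn rmadmin -refreshSuperUserGroupsConfiguration}} picks such changes up on the ResourceManager side, while the Timeline Server apparently keeps serving the old proxyuser configuration until it is restarted.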