[jira] [Assigned] (YARN-9307) node_partitions constraint does not work

2019-04-25 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang reassigned YARN-9307:
-

   Resolution: Fixed
 Assignee: kyungwan nam
 Hadoop Flags: Reviewed
Fix Version/s: 3.1.3

> node_partitions constraint does not work
> 
>
> Key: YARN-9307
> URL: https://issues.apache.org/jira/browse/YARN-9307
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Fix For: 3.1.3
>
> Attachments: YARN-9307.branch-3.1.001.patch
>
>
> When a yarn-service app is submitted with the configuration below, the 
> node_partitions constraint does not work.
> {code}
> …
>  "placement_policy": {
>"constraints": [
>  {
>"type": "ANTI_AFFINITY",
>"scope": "NODE",
>"target_tags": [
>  "ws"
>],
>"node_partitions": [
>  ""
>]
>  }
>]
>  }
> {code}
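
For reference, the empty string in node_partitions refers to the default (no-label) 
partition. The intent of the spec above, expressed through the Java 
PlacementConstraints helper API, would look roughly like the sketch below; this is 
an illustration of the expected semantics, not the code the patch touches, and it 
assumes the PlacementConstraints/PlacementTargets helpers (targetNotIn, targetIn, 
allocationTag, nodePartition) behave as their names suggest.

{code}
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.NODE;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.and;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.build;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.targetIn;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.targetNotIn;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.PlacementTargets.allocationTag;
import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.PlacementTargets.nodePartition;

import org.apache.hadoop.yarn.api.resource.PlacementConstraint;

public class WsAntiAffinitySketch {
  // ANTI_AFFINITY to containers tagged "ws" at NODE scope, restricted to the
  // default ("") partition: the combination the JSON spec above asks for.
  public static PlacementConstraint wsAntiAffinityOnDefaultPartition() {
    return build(and(
        targetNotIn(NODE, allocationTag("ws")),
        targetIn(NODE, nodePartition(""))));
  }
}
{code}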






[jira] [Commented] (YARN-9307) node_partitions constraint does not work

2019-04-25 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826647#comment-16826647
 ] 

Weiwei Yang commented on YARN-9307:
---

Committed to branch-3.1, thanks for the fix [~kyungwan nam].

> node_partitions constraint does not work
> 
>
> Key: YARN-9307
> URL: https://issues.apache.org/jira/browse/YARN-9307
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Fix For: 3.1.3
>
> Attachments: YARN-9307.branch-3.1.001.patch
>
>
> When a yarn-service app is submitted with the configuration below, the 
> node_partitions constraint does not work.
> {code}
> …
>  "placement_policy": {
>"constraints": [
>  {
>"type": "ANTI_AFFINITY",
>"scope": "NODE",
>"target_tags": [
>  "ws"
>],
>"node_partitions": [
>  ""
>]
>  }
>]
>  }
> {code}






[jira] [Commented] (YARN-9307) node_partitions constraint does not work

2019-04-25 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826644#comment-16826644
 ] 

Weiwei Yang commented on YARN-9307:
---

LGTM. +1

> node_partitions constraint does not work
> 
>
> Key: YARN-9307
> URL: https://issues.apache.org/jira/browse/YARN-9307
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: kyungwan nam
>Priority: Major
> Attachments: YARN-9307.branch-3.1.001.patch
>
>
> When a yarn-service app is submitted with the configuration below, the 
> node_partitions constraint does not work.
> {code}
> …
>  "placement_policy": {
>"constraints": [
>  {
>"type": "ANTI_AFFINITY",
>"scope": "NODE",
>"target_tags": [
>  "ws"
>],
>"node_partitions": [
>  ""
>]
>  }
>]
>  }
> {code}






[jira] [Commented] (YARN-9507) Fix NPE if NM fails to init

2019-04-25 Thread Bilwa S T (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826630#comment-16826630
 ] 

Bilwa S T commented on YARN-9507:
-

Thanks [~Tao Yang] for reviewing.

> Fix NPE if NM fails to init
> ---
>
> Key: YARN-9507
> URL: https://issues.apache.org/jira/browse/YARN-9507
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: YARN-9507-001.patch
>
>
> 2019-04-24 14:06:44,101 WARN org.apache.hadoop.service.AbstractService: When 
> stopping the service NodeManager
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:492)
>  at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>  at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>  at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:947)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1018)
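
The stack trace shows serviceStop() dereferencing state that was never set up because 
serviceInit() failed partway through (AbstractService.init calls stopQuietly on init 
failure, as seen above). The usual remedy is a null guard in serviceStop(); below is a 
minimal sketch of that pattern, with an illustrative field name rather than the actual 
NodeManager member touched by the patch.

{code}
import org.apache.hadoop.service.CompositeService;

public class InitSafeServiceSketch extends CompositeService {
  // Created during serviceInit(); still null if init fails before reaching it.
  private AutoCloseable resource;

  public InitSafeServiceSketch() {
    super("InitSafeServiceSketch");
  }

  @Override
  protected void serviceStop() throws Exception {
    if (resource != null) {   // guard: stop() also runs after a failed init
      resource.close();
    }
    super.serviceStop();
  }
}
{code}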






[jira] [Commented] (YARN-9307) node_partitions constraint does not work

2019-04-25 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826627#comment-16826627
 ] 

Hadoop QA commented on YARN-9307:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-3.1 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
59s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  0s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
11s{color} | {color:green} branch-3.1 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} branch-3.1 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  7s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 65m  
1s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}128m  2s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:080e9d0 |
| JIRA Issue | YARN-9307 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12958805/YARN-9307.branch-3.1.001.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 8c79f40c4214 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-3.1 / d242b16 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24023/testReport/ |
| Max. process+thread count | 902 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 

[jira] [Commented] (YARN-9486) Docker container exited with failure does not get clean up correctly

2019-04-25 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826580#comment-16826580
 ] 

Hudson commented on YARN-9486:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16466 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16466/])
YARN-9486. Docker container exited with failure does not get clean up (ebadger: 
rev 79d3d35398cb5348cfd62e41e3318ec7a337421a)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerCleanup.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerRelaunch.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerCleanup.java


> Docker container exited with failure does not get clean up correctly
> 
>
> Key: YARN-9486
> URL: https://issues.apache.org/jira/browse/YARN-9486
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.2.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9486.001.patch, YARN-9486.002.patch, 
> YARN-9486.003.patch, YARN-9486.004.patch, YARN-9486.005.patch
>
>
> When a Docker container encounters an error and exits prematurely 
> (EXITED_WITH_FAILURE), ContainerCleanup does not remove the container; instead 
> we get messages that look like this:
> {code}
> java.io.IOException: Could not find 
> nmPrivate/application_1555111445937_0008/container_1555111445937_0008_01_07//container_1555111445937_0008_01_07.pid
>  in any of the directories
> 2019-04-15 20:42:16,454 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1555111445937_0008_01_07 transitioned from 
> RELAUNCHING to EXITED_WITH_FAILURE
> 2019-04-15 20:42:16,455 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup:
>  Cleaning up container container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,455 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup:
>  Container container_1555111445937_0008_01_07 not launched. No cleanup 
> needed to be done
> 2019-04-15 20:42:16,455 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hbase  
> OPERATION=Container Finished - Failed   TARGET=ContainerImpl
> RESULT=FAILURE  DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE  
>   APPID=application_1555111445937_0008
> CONTAINERID=container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1555111445937_0008_01_07 transitioned from 
> EXITED_WITH_FAILURE to DONE
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Removing container_1555111445937_0008_01_07 from application 
> application_1555111445937_0008
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Stopping resource-monitoring for container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
>  Considering container container_1555111445937_0008_01_07 for 
> log-aggregation
> 2019-04-15 20:42:16,804 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting container-status for container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,804 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting localization status for container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,804 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Returning ContainerStatus: [ContainerId: 
> container_1555111445937_0008_01_07, ExecutionType: GUARANTEED, State: 
> COMPLETE, Capability: , Diagnostics: ..., ExitStatus: 
> -1, IP: null, Host: null, ExposedPorts: , ContainerSubState: DONE]
> 2019-04-15 20:42:18,464 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed 
> completed containers from NM context: [container_1555111445937_0008_01_07]
> 2019-04-15 20:43:50,476 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Stopping container with container Id: 

[jira] [Commented] (YARN-9505) Add container allocation latency for Opportunistic Scheduler

2019-04-25 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826411#comment-16826411
 ] 

Íñigo Goiri commented on YARN-9505:
---

The use of Time versus SystemClock looks a little inconsistent.
I'm more used to seeing Time, but as you mention, YARN-4526 uses SystemClock in 
many places.

> Add container allocation latency for Opportunistic Scheduler
> 
>
> Key: YARN-9505
> URL: https://issues.apache.org/jira/browse/YARN-9505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9505.001.patch, YARN-9505.002.patch
>
>
> This will help in tuning the opportunistic scheduler and it's configuration 
> parameters.






[jira] [Commented] (YARN-9505) Add container allocation latency for Opportunistic Scheduler

2019-04-25 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826386#comment-16826386
 ] 

Abhishek Modi commented on YARN-9505:
-

[~elgoiri] thanks for reviewing this. I used SystemClock based on the 
description of the jira YARN-4526. Please let me know if we should change it to 
monotonic time.
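
For latency measurements the distinction matters: Time.monotonicNow() is unaffected by 
wall-clock adjustments, while SystemClock reports wall-clock time. A small sketch of 
the two options being discussed (standard Hadoop utilities, not the code in the patch):

{code}
import org.apache.hadoop.util.Time;
import org.apache.hadoop.yarn.util.SystemClock;

public class AllocationLatencySketch {
  // Monotonic clock: safe for elapsed-time measurement even if the system time jumps.
  public static long elapsedMonotonicMillis(Runnable allocate) {
    long start = Time.monotonicNow();
    allocate.run();
    return Time.monotonicNow() - start;
  }

  // SystemClock: wall-clock millis, the convention referenced from YARN-4526.
  public static long elapsedWallClockMillis(Runnable allocate) {
    SystemClock clock = SystemClock.getInstance();
    long start = clock.getTime();
    allocate.run();
    return clock.getTime() - start;
  }
}
{code}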

> Add container allocation latency for Opportunistic Scheduler
> 
>
> Key: YARN-9505
> URL: https://issues.apache.org/jira/browse/YARN-9505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-9505.001.patch, YARN-9505.002.patch
>
>
> This will help in tuning the opportunistic scheduler and it's configuration 
> parameters.






[jira] [Commented] (YARN-6272) TestAMRMClient#testAMRMClientWithContainerResourceChange fails intermittently

2019-04-25 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826350#comment-16826350
 ] 

Prabhu Joseph commented on YARN-6272:
-

Thanks [~giovanni.fumarola] for reviewing.

The testcase heartbeats once with three NMs and expects the increase allocation 
to happen immediately. It won't when the allocation lands on some other NM: the 
increase has to be satisfied on the same NM that hosts the container whose 
resources are being increased; until then the request is added back and is only 
processed on a subsequent node update.

Heartbeating with only the NM where the container was originally allocated would 
not require any sleep, but MiniYarnCluster sends node updates for all NMs, so 
the allocation is effectively random across the three NMs and the testcase has 
to wait and retry until the container is allocated on the right NM.

The fix heartbeats with only the right NM, which increases the likelihood of the 
correct placement (even though MiniYarnCluster still does nodeUpdate for all), 
and waits and retries until the new increased container is allocated on the same 
NM. I validated the fix with multiple runs of 500 iterations without a test 
failure; without the fix, the testcase consistently fails within 50 iterations.

The other option is to use MockRM and MockNM (as per Jason's earlier comment). I 
tried that and it required a lot of changes. Let me know if the current approach 
is not convincing and I will rework the test with MockRM and MockNM.
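
A rough sketch of the wait-and-retry idea described above, using 
GenericTestUtils.waitFor rather than fixed sleeps; the supplier and timeout values are 
illustrative, not the actual TestAMRMClient changes:

{code}
import java.util.function.Supplier;
import org.apache.hadoop.test.GenericTestUtils;

public class IncreaseAllocationWaitSketch {
  // Poll every 100 ms, give up after 30 s: the increase is only granted once the
  // node update for the NM that actually hosts the container has been processed.
  public static void waitForIncreasedContainer(Supplier<Integer> increasedContainers)
      throws Exception {
    GenericTestUtils.waitFor(() -> increasedContainers.get() > 0, 100, 30_000);
  }
}
{code}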


> TestAMRMClient#testAMRMClientWithContainerResourceChange fails intermittently
> -
>
> Key: YARN-6272
> URL: https://issues.apache.org/jira/browse/YARN-6272
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: yarn
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-6272-001.patch
>
>
> I'm seeing this unit test fail fairly often in trunk:
> testAMRMClientWithContainerResourceChange(org.apache.hadoop.yarn.client.api.impl.TestAMRMClient)
>   Time elapsed: 5.113 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<1> but was:<0>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.doContainerResourceChange(TestAMRMClient.java:1087)
> at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientWithContainerResourceChange(TestAMRMClient.java:963)






[jira] [Commented] (YARN-6272) TestAMRMClient#testAMRMClientWithContainerResourceChange fails intermittently

2019-04-25 Thread Giovanni Matteo Fumarola (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826267#comment-16826267
 ] 

Giovanni Matteo Fumarola commented on YARN-6272:


Thanks [~Prabhu Joseph] for the patch. 

I am not a fan of Sleep instructions in unit tests.
Can you explain the fix?

> TestAMRMClient#testAMRMClientWithContainerResourceChange fails intermittently
> -
>
> Key: YARN-6272
> URL: https://issues.apache.org/jira/browse/YARN-6272
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: yarn
>Affects Versions: 3.0.0-alpha4
>Reporter: Ray Chiang
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-6272-001.patch
>
>
> I'm seeing this unit test fail fairly often in trunk:
> testAMRMClientWithContainerResourceChange(org.apache.hadoop.yarn.client.api.impl.TestAMRMClient)
>   Time elapsed: 5.113 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<1> but was:<0>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.doContainerResourceChange(TestAMRMClient.java:1087)
> at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.testAMRMClientWithContainerResourceChange(TestAMRMClient.java:963)






[jira] [Commented] (YARN-9486) Docker container exited with failure does not get clean up correctly

2019-04-25 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826234#comment-16826234
 ] 

Hadoop QA commented on YARN-9486:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 22s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 
48s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 69m 15s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9486 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12967031/YARN-9486.005.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1fc77044de83 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b5dcf64 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24022/testReport/ |
| Max. process+thread count | 446 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24022/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Docker container exited with failure does not get clean up correctly

[jira] [Commented] (YARN-9486) Docker container exited with failure does not get clean up correctly

2019-04-25 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826198#comment-16826198
 ] 

Jim Brennan commented on YARN-9486:
---

[~eyang] thanks for updating the comment.  +1 (non-binding) on patch 005.

> Docker container exited with failure does not get clean up correctly
> 
>
> Key: YARN-9486
> URL: https://issues.apache.org/jira/browse/YARN-9486
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.2.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-9486.001.patch, YARN-9486.002.patch, 
> YARN-9486.003.patch, YARN-9486.004.patch, YARN-9486.005.patch
>
>
> When a Docker container encounters an error and exits prematurely 
> (EXITED_WITH_FAILURE), ContainerCleanup does not remove the container; instead 
> we get messages that look like this:
> {code}
> java.io.IOException: Could not find 
> nmPrivate/application_1555111445937_0008/container_1555111445937_0008_01_07//container_1555111445937_0008_01_07.pid
>  in any of the directories
> 2019-04-15 20:42:16,454 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1555111445937_0008_01_07 transitioned from 
> RELAUNCHING to EXITED_WITH_FAILURE
> 2019-04-15 20:42:16,455 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup:
>  Cleaning up container container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,455 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup:
>  Container container_1555111445937_0008_01_07 not launched. No cleanup 
> needed to be done
> 2019-04-15 20:42:16,455 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hbase  
> OPERATION=Container Finished - Failed   TARGET=ContainerImpl
> RESULT=FAILURE  DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE  
>   APPID=application_1555111445937_0008
> CONTAINERID=container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1555111445937_0008_01_07 transitioned from 
> EXITED_WITH_FAILURE to DONE
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Removing container_1555111445937_0008_01_07 from application 
> application_1555111445937_0008
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Stopping resource-monitoring for container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
>  Considering container container_1555111445937_0008_01_07 for 
> log-aggregation
> 2019-04-15 20:42:16,804 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting container-status for container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,804 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting localization status for container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,804 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Returning ContainerStatus: [ContainerId: 
> container_1555111445937_0008_01_07, ExecutionType: GUARANTEED, State: 
> COMPLETE, Capability: , Diagnostics: ..., ExitStatus: 
> -1, IP: null, Host: null, ExposedPorts: , ContainerSubState: DONE]
> 2019-04-15 20:42:18,464 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed 
> completed containers from NM context: [container_1555111445937_0008_01_07]
> 2019-04-15 20:43:50,476 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Stopping container with container Id: container_1555111445937_0008_01_07
> {code}
> There is no docker rm command performed.
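
Not the actual patch, but a sketch of the guard the log above points at: when the pid 
file is missing, the cleanup path concludes "not launched, no cleanup needed", yet an 
already-created Docker container still needs a docker rm. The helper names below 
(wasDockerContainerCreated, removeDockerContainer) are hypothetical placeholders for 
the real executor calls.

{code}
public class ContainerCleanupSketch {
  // Hypothetical abstraction over the container executor's Docker operations.
  interface DockerOps {
    boolean wasDockerContainerCreated(String containerId);  // hypothetical
    void removeDockerContainer(String containerId);         // hypothetical: issues "docker rm"
  }

  static void cleanup(String containerId, boolean launched, DockerOps docker) {
    if (!launched) {
      // Previously cleanup stopped here, leaving the exited Docker container behind.
      if (docker.wasDockerContainerCreated(containerId)) {
        docker.removeDockerContainer(containerId);
      }
      return;
    }
    // Normal path: signal the running process, then remove the container.
  }
}
{code}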






[jira] [Commented] (YARN-9486) Docker container exited with failure does not get clean up correctly

2019-04-25 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826192#comment-16826192
 ] 

Eric Yang commented on YARN-9486:
-

[~Jim_Brennan] Thank you for the review. Patch 005 is the same as patch 004, with 
a comment added to explain the corner cases.

> Docker container exited with failure does not get clean up correctly
> 
>
> Key: YARN-9486
> URL: https://issues.apache.org/jira/browse/YARN-9486
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.2.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-9486.001.patch, YARN-9486.002.patch, 
> YARN-9486.003.patch, YARN-9486.004.patch, YARN-9486.005.patch
>
>
> When a Docker container encounters an error and exits prematurely 
> (EXITED_WITH_FAILURE), ContainerCleanup does not remove the container; instead 
> we get messages that look like this:
> {code}
> java.io.IOException: Could not find 
> nmPrivate/application_1555111445937_0008/container_1555111445937_0008_01_07//container_1555111445937_0008_01_07.pid
>  in any of the directories
> 2019-04-15 20:42:16,454 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1555111445937_0008_01_07 transitioned from 
> RELAUNCHING to EXITED_WITH_FAILURE
> 2019-04-15 20:42:16,455 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup:
>  Cleaning up container container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,455 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup:
>  Container container_1555111445937_0008_01_07 not launched. No cleanup 
> needed to be done
> 2019-04-15 20:42:16,455 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hbase  
> OPERATION=Container Finished - Failed   TARGET=ContainerImpl
> RESULT=FAILURE  DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE  
>   APPID=application_1555111445937_0008
> CONTAINERID=container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1555111445937_0008_01_07 transitioned from 
> EXITED_WITH_FAILURE to DONE
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Removing container_1555111445937_0008_01_07 from application 
> application_1555111445937_0008
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Stopping resource-monitoring for container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
>  Considering container container_1555111445937_0008_01_07 for 
> log-aggregation
> 2019-04-15 20:42:16,804 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting container-status for container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,804 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting localization status for container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,804 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Returning ContainerStatus: [ContainerId: 
> container_1555111445937_0008_01_07, ExecutionType: GUARANTEED, State: 
> COMPLETE, Capability: , Diagnostics: ..., ExitStatus: 
> -1, IP: null, Host: null, ExposedPorts: , ContainerSubState: DONE]
> 2019-04-15 20:42:18,464 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed 
> completed containers from NM context: [container_1555111445937_0008_01_07]
> 2019-04-15 20:43:50,476 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Stopping container with container Id: container_1555111445937_0008_01_07
> {code}
> There is no docker rm command performed.






[jira] [Created] (YARN-9513) [JDK11] TestMetricsInvariantChecker#testManyRuns InvariantViolationException: ReferenceError: "GcCountPS_Scavenge" is not defined in at line number 1

2019-04-25 Thread Siyao Meng (JIRA)
Siyao Meng created YARN-9513:


 Summary: [JDK11] TestMetricsInvariantChecker#testManyRuns 
InvariantViolationException: ReferenceError: "GcCountPS_Scavenge" is not 
defined in  at line number 1
 Key: YARN-9513
 URL: https://issues.apache.org/jira/browse/YARN-9513
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: Siyao Meng


Found in maven JDK 11 unit test run. Compiled on JDK 8:
{code}
[ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.502 
s<<< FAILURE! - in 
org.apache.hadoop.yarn.server.resourcemanager.monitor.invariants.TestMetricsInvariantChecker
[ERROR] 
testManyRuns(org.apache.hadoop.yarn.server.resourcemanager.monitor.invariants.TestMetricsInvariantChecker)
  Time elapsed: 0.206 s  <<< 
ERROR!org.apache.hadoop.yarn.server.resourcemanager.monitor.invariants.InvariantViolationException:
 ReferenceError: "GcCountPS_Scavenge" is not defined in  at line number 1
at 
org.apache.hadoop.yarn.server.resourcemanager.monitor.invariants.InvariantsChecker.logOrThrow(InvariantsChecker.java:74)
at 
org.apache.hadoop.yarn.server.resourcemanager.monitor.invariants.MetricsInvariantChecker.editSchedule(MetricsInvariantChecker.java:180)
at 
org.apache.hadoop.yarn.server.resourcemanager.monitor.invariants.TestMetricsInvariantChecker.testManyRuns(TestMetricsInvariantChecker.java:69)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}
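
The invariant script's variables such as GcCountPS_Scavenge are derived from the names 
of the JVM's garbage-collector MXBeans: "PS Scavenge"/"PS MarkSweep" are the Parallel 
GC collectors that JDK 8 uses by default, while JDK 11 defaults to G1 ("G1 Young 
Generation"/"G1 Old Generation"), so the referenced variable never exists there. A 
quick way to see the names on a given JVM (plain JDK API, just for illustration):

{code}
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class ListGcBeanNames {
  public static void main(String[] args) {
    // Prints e.g. "PS Scavenge" / "PS MarkSweep" on JDK 8 defaults,
    // "G1 Young Generation" / "G1 Old Generation" on JDK 11 defaults.
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      System.out.println(gc.getName());
    }
  }
}
{code}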






[jira] [Updated] (YARN-9486) Docker container exited with failure does not get clean up correctly

2019-04-25 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-9486:

Attachment: YARN-9486.005.patch

> Docker container exited with failure does not get clean up correctly
> 
>
> Key: YARN-9486
> URL: https://issues.apache.org/jira/browse/YARN-9486
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.2.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-9486.001.patch, YARN-9486.002.patch, 
> YARN-9486.003.patch, YARN-9486.004.patch, YARN-9486.005.patch
>
>
> When a Docker container encounters an error and exits prematurely 
> (EXITED_WITH_FAILURE), ContainerCleanup does not remove the container; instead 
> we get messages that look like this:
> {code}
> java.io.IOException: Could not find 
> nmPrivate/application_1555111445937_0008/container_1555111445937_0008_01_07//container_1555111445937_0008_01_07.pid
>  in any of the directories
> 2019-04-15 20:42:16,454 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1555111445937_0008_01_07 transitioned from 
> RELAUNCHING to EXITED_WITH_FAILURE
> 2019-04-15 20:42:16,455 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup:
>  Cleaning up container container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,455 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup:
>  Container container_1555111445937_0008_01_07 not launched. No cleanup 
> needed to be done
> 2019-04-15 20:42:16,455 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hbase  
> OPERATION=Container Finished - Failed   TARGET=ContainerImpl
> RESULT=FAILURE  DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE  
>   APPID=application_1555111445937_0008
> CONTAINERID=container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1555111445937_0008_01_07 transitioned from 
> EXITED_WITH_FAILURE to DONE
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Removing container_1555111445937_0008_01_07 from application 
> application_1555111445937_0008
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Stopping resource-monitoring for container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
>  Considering container container_1555111445937_0008_01_07 for 
> log-aggregation
> 2019-04-15 20:42:16,804 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting container-status for container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,804 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting localization status for container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,804 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Returning ContainerStatus: [ContainerId: 
> container_1555111445937_0008_01_07, ExecutionType: GUARANTEED, State: 
> COMPLETE, Capability: , Diagnostics: ..., ExitStatus: 
> -1, IP: null, Host: null, ExposedPorts: , ContainerSubState: DONE]
> 2019-04-15 20:42:18,464 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed 
> completed containers from NM context: [container_1555111445937_0008_01_07]
> 2019-04-15 20:43:50,476 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Stopping container with container Id: container_1555111445937_0008_01_07
> {code}
> There is no docker rm command performed.






[jira] [Created] (YARN-9512) [JDK11] TestAuxServices#testCustomizedAuxServiceClassPath ClassCastException: class jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class java.net.URLCla

2019-04-25 Thread Siyao Meng (JIRA)
Siyao Meng created YARN-9512:


 Summary: [JDK11] TestAuxServices#testCustomizedAuxServiceClassPath 
ClassCastException: class jdk.internal.loader.ClassLoaders$AppClassLoader 
cannot be cast to class java.net.URLClassLoader
 Key: YARN-9512
 URL: https://issues.apache.org/jira/browse/YARN-9512
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: Siyao Meng


Found in maven JDK 11 unit test run. Compiled on JDK 8:
{code}
[ERROR] 
testCustomizedAuxServiceClassPath(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices)
  Time elapsed: 0.019 s  <<< ERROR!java.lang.ClassCastException: class 
jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class 
java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader and 
java.net.URLClassLoader are in module java.base of loader 'bootstrap')
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices$ServiceC.getMetaData(TestAuxServices.java:197)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStart(AuxServices.java:315)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices.testCustomizedAuxServiceClassPath(TestAuxServices.java:344)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}
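
Since JDK 9 the application class loader is no longer a java.net.URLClassLoader, so 
casts like the one in the test's aux-service break. A JDK-8/11-portable sketch of the 
usual workaround is below (build an explicit URLClassLoader over the entries you need 
instead of casting the system loader); it is illustrative only, not the actual fix.

{code}
import java.net.URL;
import java.net.URLClassLoader;

public class AuxClassPathSketch {
  // Instead of: (URLClassLoader) ClassLoader.getSystemClassLoader(), which fails on
  // JDK 9+, construct a URLClassLoader explicitly over the required class-path entries.
  public static URLClassLoader loaderFor(URL... classPathEntries) {
    return new URLClassLoader(classPathEntries, ClassLoader.getSystemClassLoader());
  }
}
{code}

If the goal is only to inspect the class path, reading System.getProperty("java.class.path") 
avoids the cast entirely.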






[jira] [Updated] (YARN-9511) [JDK11] TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: The remote jarfile should not be writable by group or others. The current Permission is 436

2019-04-25 Thread Siyao Meng (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyao Meng updated YARN-9511:
-
Component/s: test

> [JDK11] TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: 
> The remote jarfile should not be writable by group or others. The current 
> Permission is 436
> ---
>
> Key: YARN-9511
> URL: https://issues.apache.org/jira/browse/YARN-9511
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Siyao Meng
>Priority: Major
>
> Found in maven JDK 11 unit test run. Compiled on JDK 8.
> {code}
> [ERROR] 
> testRemoteAuxServiceClassPath(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices)
>   Time elapsed: 0.551 s  <<< 
> ERROR!org.apache.hadoop.yarn.exceptions.YarnRuntimeException: The remote 
> jarfile should not be writable by group or others. The current Permission is 
> 436
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:202)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices.testRemoteAuxServiceClassPath(TestAuxServices.java:268)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}






[jira] [Created] (YARN-9511) [JDK11] TestAuxServices#testRemoteAuxServiceClassPath YarnRuntimeException: The remote jarfile should not be writable by group or others. The current Permission is 436

2019-04-25 Thread Siyao Meng (JIRA)
Siyao Meng created YARN-9511:


 Summary: [JDK11] TestAuxServices#testRemoteAuxServiceClassPath 
YarnRuntimeException: The remote jarfile should not be writable by group or 
others. The current Permission is 436
 Key: YARN-9511
 URL: https://issues.apache.org/jira/browse/YARN-9511
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siyao Meng


Found in maven JDK 11 unit test run. Compiled on JDK 8.
{code}
[ERROR] 
testRemoteAuxServiceClassPath(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices)
  Time elapsed: 0.551 s  <<< 
ERROR!org.apache.hadoop.yarn.exceptions.YarnRuntimeException: The remote 
jarfile should not be writable by group or others. The current Permission is 436
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:202)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.TestAuxServices.testRemoteAuxServiceClassPath(TestAuxServices.java:268)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}
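
For context, 436 is the decimal form of octal 0664, i.e. the jar is group-writable, 
which AuxServices refuses for a remote class-path jar. Tightening the test jar to 0644 
before AuxServices is initialized would satisfy the check; a plain-JDK sketch 
(illustrative, not the actual test change):

{code}
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermissions;

public class JarPermissionSketch {
  public static void makeGroupReadOnly(String jarPath) throws Exception {
    Path jar = Paths.get(jarPath);
    // 0644: owner rw, group/others read-only; satisfies the "not writable by
    // group or others" check quoted in the stack trace above.
    Files.setPosixFilePermissions(jar, PosixFilePermissions.fromString("rw-r--r--"));
  }
}
{code}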






[jira] [Commented] (YARN-9476) Create unit tests for VE plugin

2019-04-25 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826185#comment-16826185
 ] 

Hadoop QA commented on YARN-9476:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
 8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 12s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 35s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 
20s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 71m 15s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9476 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12967025/YARN-9476-004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 2cc6239b5327 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b5dcf64 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
|  Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24021/testReport/ |
| Max. process+thread count | 412 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24021/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Create unit tests for VE plugin
> 

[jira] [Commented] (YARN-9486) Docker container exited with failure does not get clean up correctly

2019-04-25 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826142#comment-16826142
 ] 

Jim Brennan commented on YARN-9486:
---

[~eyang], I am +1 (non-binding) on patch 004.

> Docker container exited with failure does not get clean up correctly
> 
>
> Key: YARN-9486
> URL: https://issues.apache.org/jira/browse/YARN-9486
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.2.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-9486.001.patch, YARN-9486.002.patch, 
> YARN-9486.003.patch, YARN-9486.004.patch
>
>
> When a Docker container encounters an error and exits prematurely 
> (EXITED_WITH_FAILURE), ContainerCleanup does not remove the container; instead we 
> get messages that look like this:
> {code}
> java.io.IOException: Could not find 
> nmPrivate/application_1555111445937_0008/container_1555111445937_0008_01_07//container_1555111445937_0008_01_07.pid
>  in any of the directories
> 2019-04-15 20:42:16,454 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1555111445937_0008_01_07 transitioned from 
> RELAUNCHING to EXITED_WITH_FAILURE
> 2019-04-15 20:42:16,455 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup:
>  Cleaning up container container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,455 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup:
>  Container container_1555111445937_0008_01_07 not launched. No cleanup 
> needed to be done
> 2019-04-15 20:42:16,455 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hbase  
> OPERATION=Container Finished - Failed   TARGET=ContainerImpl
> RESULT=FAILURE  DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE  
>   APPID=application_1555111445937_0008
> CONTAINERID=container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1555111445937_0008_01_07 transitioned from 
> EXITED_WITH_FAILURE to DONE
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Removing container_1555111445937_0008_01_07 from application 
> application_1555111445937_0008
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Stopping resource-monitoring for container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
>  Considering container container_1555111445937_0008_01_07 for 
> log-aggregation
> 2019-04-15 20:42:16,804 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting container-status for container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,804 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting localization status for container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,804 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Returning ContainerStatus: [ContainerId: 
> container_1555111445937_0008_01_07, ExecutionType: GUARANTEED, State: 
> COMPLETE, Capability: , Diagnostics: ..., ExitStatus: 
> -1, IP: null, Host: null, ExposedPorts: , ContainerSubState: DONE]
> 2019-04-15 20:42:18,464 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed 
> completed containers from NM context: [container_1555111445937_0008_01_07]
> 2019-04-15 20:43:50,476 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Stopping container with container Id: container_1555111445937_0008_01_07
> {code}
> No docker rm command is performed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9476) Create unit tests for VE plugin

2019-04-25 Thread Peter Bacsko (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9476:
---
Attachment: YARN-9476-004.patch

> Create unit tests for VE plugin
> ---
>
> Key: YARN-9476
> URL: https://issues.apache.org/jira/browse/YARN-9476
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9476-001.patch, YARN-9476-002.patch, 
> YARN-9476-003.patch, YARN-9476-004.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9486) Docker container exited with failure does not get clean up correctly

2019-04-25 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826101#comment-16826101
 ] 

Jim Brennan commented on YARN-9486:
---

{quote}
As a result, we need to check both markedLaunched and isLaunchCompleted to 
get a better picture of whether the container failed to launch, is still running, 
or has not started at all.
{quote}
[~eyang] Thanks again for the follow-up. I agree that adding the 
isLaunchCompleted check is warranted to cover all cases.
It might be helpful to add a comment about the relaunch case, where 
containerAlreadyLaunched is false but isCompleted is true, which seems 
counter-intuitive.
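
For illustration, a minimal, self-contained sketch of the combined check being discussed. The names mirror the discussion (markedLaunched, isLaunchCompleted, containerAlreadyLaunched), but this is illustrative only, not the actual ContainerCleanup code from the patch:

{code:java}
// Self-contained illustration of the decision discussed above; names mirror the
// discussion but this is not the ContainerCleanup code from the patch.
public class CleanupDecisionSketch {

  /** Returns true when container cleanup (e.g. "docker rm") should be attempted. */
  static boolean shouldCleanup(boolean markedLaunched, boolean launchCompleted) {
    // Per the discussion: in the relaunch case markedLaunched (i.e.
    // containerAlreadyLaunched) can be false while launchCompleted is true,
    // so checking markedLaunched alone would skip the cleanup.
    return markedLaunched || launchCompleted;
  }

  public static void main(String[] args) {
    System.out.println(shouldCleanup(false, false)); // never launched   -> false
    System.out.println(shouldCleanup(true, false));  // launch in flight -> true
    System.out.println(shouldCleanup(false, true));  // relaunch exited  -> true
  }
}
{code}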

> Docker container exited with failure does not get clean up correctly
> 
>
> Key: YARN-9486
> URL: https://issues.apache.org/jira/browse/YARN-9486
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.2.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-9486.001.patch, YARN-9486.002.patch, 
> YARN-9486.003.patch, YARN-9486.004.patch
>
>
> When a Docker container encounters an error and exits prematurely 
> (EXITED_WITH_FAILURE), ContainerCleanup does not remove the container; instead we 
> get messages that look like this:
> {code}
> java.io.IOException: Could not find 
> nmPrivate/application_1555111445937_0008/container_1555111445937_0008_01_07//container_1555111445937_0008_01_07.pid
>  in any of the directories
> 2019-04-15 20:42:16,454 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1555111445937_0008_01_07 transitioned from 
> RELAUNCHING to EXITED_WITH_FAILURE
> 2019-04-15 20:42:16,455 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup:
>  Cleaning up container container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,455 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup:
>  Container container_1555111445937_0008_01_07 not launched. No cleanup 
> needed to be done
> 2019-04-15 20:42:16,455 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hbase  
> OPERATION=Container Finished - Failed   TARGET=ContainerImpl
> RESULT=FAILURE  DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE  
>   APPID=application_1555111445937_0008
> CONTAINERID=container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_1555111445937_0008_01_07 transitioned from 
> EXITED_WITH_FAILURE to DONE
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Removing container_1555111445937_0008_01_07 from application 
> application_1555111445937_0008
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Stopping resource-monitoring for container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,458 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
>  Considering container container_1555111445937_0008_01_07 for 
> log-aggregation
> 2019-04-15 20:42:16,804 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting container-status for container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,804 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Getting localization status for container_1555111445937_0008_01_07
> 2019-04-15 20:42:16,804 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Returning ContainerStatus: [ContainerId: 
> container_1555111445937_0008_01_07, ExecutionType: GUARANTEED, State: 
> COMPLETE, Capability: , Diagnostics: ..., ExitStatus: 
> -1, IP: null, Host: null, ExposedPorts: , ContainerSubState: DONE]
> 2019-04-15 20:42:18,464 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed 
> completed containers from NM context: [container_1555111445937_0008_01_07]
> 2019-04-15 20:43:50,476 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Stopping container with container Id: container_1555111445937_0008_01_07
> {code}
> No docker rm command is performed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9476) Create unit tests for VE plugin

2019-04-25 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826097#comment-16826097
 ] 

Szilard Nemeth commented on YARN-9476:
--

Hi [~pbacsko]!
Latest patch looks good, except for one minor thing I suggested before: 
Please store the result of 
{code:java}
f.mkdirs()
{code} 
and have an assertion on the value.
I meant a similar thing for when you are setting the executable flag of the 
files: 

{code:java}
scriptPath.toFile().setExecutable(true)
{code}
Please store the value and assert that it is true.
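
For illustration, a minimal sketch of the requested change (JUnit 4 asserts; the class, directory and file names are placeholders, not the actual VE plugin test code):

{code:java}
// Hypothetical sketch of the suggestion above; names and paths are placeholders.
import static org.junit.Assert.assertTrue;

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import org.junit.Test;

public class MkdirsAssertionSketch {

  @Test
  public void createsTestDirAndExecutableScript() throws IOException {
    // Store the result of mkdirs() and assert on it instead of ignoring it.
    File f = new File(Files.createTempDirectory("ve-test").toFile(), "devices");
    boolean dirsCreated = f.mkdirs();
    assertTrue("Could not create directory " + f, dirsCreated);

    // Same idea for setExecutable(): keep the boolean and assert it is true.
    Path scriptPath = new File(f, "discover-ve.sh").toPath();
    Files.createFile(scriptPath);
    boolean execSet = scriptPath.toFile().setExecutable(true);
    assertTrue("Could not set executable flag on " + scriptPath, execSet);
  }
}
{code}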

Thanks!


> Create unit tests for VE plugin
> ---
>
> Key: YARN-9476
> URL: https://issues.apache.org/jira/browse/YARN-9476
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9476-001.patch, YARN-9476-002.patch, 
> YARN-9476-003.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9477) Implement VE discovery using libudev

2019-04-25 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826085#comment-16826085
 ] 

Peter Bacsko commented on YARN-9477:


[~tangzhankun] [~snemeth] could you please check out this POC?

> Implement VE discovery using libudev
> 
>
> Key: YARN-9477
> URL: https://issues.apache.org/jira/browse/YARN-9477
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9477-POC.patch
>
>
> Right now we have a Python script which is able to discover VE cards using 
> pyudev: https://pyudev.readthedocs.io/en/latest/
> Java does not officially support libudev. There are some projects on Github 
> (example: https://github.com/Zubnix/udev-java-bindings) but they're not 
> available as Maven artifacts.
> However it's not that difficult to create a minimal layer around libudev 
> using JNA. We don't have to wrap every function; we only need to call 4-5 methods.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9477) Implement VE discovery using libudev

2019-04-25 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826084#comment-16826084
 ] 

Peter Bacsko commented on YARN-9477:


Note that this part is the heart of the improvement:

{code}
Pointer sysPathPtr = libUdev.udev_device_get_syspath(device);
{code}

We need the {{sysPath}} to determine where the {{os_state}} file is. Based on 
the Python script provided by NEC, the following happens:
- Get the {{veslot}} device files under {{/dev}} like {{/dev/veslot0}}
- Get the device object from udev (we know the major/minor of the device file 
-> convert it to a device number (like {{os.makedev()}} in Python))
- Get the syspath using libudev for a particular device file
- Get the PCI bus slot using libudev
- Read the {{/os_state}} file to determine the status of the card

Note that the PCI bus slot is optional; we don't need it (although we can 
retrieve it too) to construct the device object.
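
To make these steps concrete, here is a minimal, self-contained JNA sketch of the flow. The class name, the use of the JDK's "unix:rdev" file attribute to obtain the device number, and the error handling are assumptions for illustration; this is not the code from the POC patch:

{code:java}
// Minimal illustration of the discovery steps above using JNA; not the POC code.
import com.sun.jna.Library;
import com.sun.jna.Native;
import com.sun.jna.Pointer;

import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class VeDiscoverySketch {

  /** The handful of libudev calls needed for this flow. */
  public interface LibUdev extends Library {
    LibUdev INSTANCE = Native.load("udev", LibUdev.class);

    Pointer udev_new();
    Pointer udev_device_new_from_devnum(Pointer udev, byte type, long devnum);
    Pointer udev_device_get_syspath(Pointer device);
    Pointer udev_device_unref(Pointer device);
    Pointer udev_unref(Pointer udev);
  }

  public static void main(String[] args) throws Exception {
    LibUdev udevLib = LibUdev.INSTANCE;
    Pointer udev = udevLib.udev_new();
    try (DirectoryStream<Path> slots =
             Files.newDirectoryStream(Paths.get("/dev"), "veslot*")) {
      for (Path slot : slots) {
        // st_rdev of the character device file: major/minor already combined
        // into a single device number (the equivalent of os.makedev()).
        long devnum = (Long) Files.getAttribute(slot, "unix:rdev");
        Pointer device =
            udevLib.udev_device_new_from_devnum(udev, (byte) 'c', devnum);
        if (device == null) {
          continue;
        }
        try {
          String sysPath =
              udevLib.udev_device_get_syspath(device).getString(0);
          // The numeric content of <syspath>/os_state reflects the card status.
          Path osState = Paths.get(sysPath, "os_state");
          String state = Files.exists(osState)
              ? new String(Files.readAllBytes(osState)).trim()
              : "unknown";
          System.out.println(slot + " -> " + sysPath + " (os_state=" + state + ")");
        } finally {
          udevLib.udev_device_unref(device);
        }
      }
    } finally {
      udevLib.udev_unref(udev);
    }
  }
}
{code}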

> Implement VE discovery using libudev
> 
>
> Key: YARN-9477
> URL: https://issues.apache.org/jira/browse/YARN-9477
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9477-POC.patch
>
>
> Right now we have a Python script which is able to discover VE cards using 
> pyudev: https://pyudev.readthedocs.io/en/latest/
> Java does not officially support libudev. There are some projects on Github 
> (example: https://github.com/Zubnix/udev-java-bindings) but they're not 
> available as Maven artifacts.
> However it's not that difficult to create a minimal layer around libudev 
> using JNA. We don't have to wrap every function; we only need to call 4-5 methods.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9477) Implement VE discovery using libudev

2019-04-25 Thread Peter Bacsko (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9477:
---
Attachment: YARN-9477-POC.patch

> Implement VE discovery using libudev
> 
>
> Key: YARN-9477
> URL: https://issues.apache.org/jira/browse/YARN-9477
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9477-POC.patch
>
>
> Right now we have a Python script which is able to discover VE cards using 
> pyudev: https://pyudev.readthedocs.io/en/latest/
> Java does not officially support libudev. There are some projects on Github 
> (example: https://github.com/Zubnix/udev-java-bindings) but they're not 
> available as Maven artifacts.
> However it's not that difficult to create a minimal layer around libudev 
> using JNA. We don't have to wrap every function; we only need to call 4-5 methods.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9410) Typo in documentation: Using FPGA On YARN

2019-04-25 Thread kevin su (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826043#comment-16826043
 ] 

kevin su commented on YARN-9410:


Could I take this issue?
I don't have permission to assign it to myself.

> Typo in documentation: Using FPGA On YARN 
> --
>
> Key: YARN-9410
> URL: https://issues.apache.org/jira/browse/YARN-9410
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Priority: Major
>  Labels: newbie, newbie++
>
> fpag.major-device-number should be changed to fpga... 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9510) Proxyuser access timeline and getdelegationtoken failed without Timeline server restart

2019-04-25 Thread Shen Yinjie (JIRA)
Shen Yinjie created YARN-9510:
-

 Summary: Proxyuser access timeline and getdelegationtoken failed 
without Timeline server restart
 Key: YARN-9510
 URL: https://issues.apache.org/jira/browse/YARN-9510
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: timelineserver
Affects Versions: 3.1.0
Reporter: Shen Yinjie


We add a proxyuser by changing "hadoop.proxyuser.xx.yy" and then execute yarn 
rmadmin -refreshSuperUserGroupsConfiguration, but do not restart the timeline 
server. The MR job then fails and throws:
Caused by: 
org.apache.hadoop.security.authentication.client.AuthenticationException: 
Authentication failed, URL: 
http://hostname:8188/ws/v1/timeline/?op=GETDELEGATIONTOKEN=alluxio=rm%2Fhc1%40XXF=ambari-qa,
 status: 403, message: Forbidden
at 
org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:401)
at 
org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:74)
at 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:147)
at 
org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:213)

It seems that the proxyuser info in the timeline server has not been refreshed.
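
For context only, a hypothetical sketch of the kind of refresh the timeline server would need to perform without a restart. ProxyUsers.refreshSuperUserGroupsConfiguration is the Hadoop API used for this purpose elsewhere (e.g. by the RM admin service); wiring it into the timeline server like this is an assumption, not a proposed patch:

{code:java}
// Hypothetical illustration only: re-read core-site.xml and refresh the shared
// proxyuser ACLs so new hadoop.proxyuser.<user>.hosts|groups entries take effect.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.authorize.ProxyUsers;

public class TimelineProxyUserRefreshSketch {
  public static void refreshProxyUsers() {
    Configuration conf = new Configuration();
    ProxyUsers.refreshSuperUserGroupsConfiguration(conf);
  }
}
{code}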




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org