[jira] [Comment Edited] (YARN-10040) DistributedShell test failure on X86 and ARM

2021-01-06 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260098#comment-17260098
 ] 

Ahmed Hussein edited comment on YARN-10040 at 1/6/21, 10:59 PM:


Thanks [~iwasakims] for fixing {{testDSShellWithOpportunisticContainers}}!
I found the fix for {{testDSShellWithEnforceExecutionType}}. It is part of
[PR-2581|https://github.com/apache/hadoop/pull/2581].

See the description of the bug in the unit test in my 
[comment-pr-2581|https://github.com/apache/hadoop/pull/2581#issuecomment-755765315]


> DistributedShell test failure on X86 and ARM
> 
>
> Key: YARN-10040
> URL: https://issues.apache.org/jira/browse/YARN-10040
> Project: Hadoop YARN
> Issue Type: Bug
> Components: applications/distributed-shell
> Environment: X86/ARM
> OS: ubuntu1804
> Java 8
> Reporter: zhao bo
> Assignee: Abhishek Modi
> Priority: Major
> Attachments: YARN-10040.001.patch
>
>
> * org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers
> * org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithEnforceExecutionType
>
> Please see the Apache Jenkins test results:
> [https://builds.apache.org/job/hadoop-multibranch/job/PR-1767/1/testReport/]
>
> These 2 tests fail on both the X86 and ARM platforms.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10040) DistributedShell test failure on X86 and ARM

2020-12-22 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253737#comment-17253737
 ] 

Ahmed Hussein edited comment on YARN-10040 at 12/22/20, 8:48 PM:

[~abmodi] can you suggest anyone familiar with the changes done in YARN-9697?


was (Author: ahussein):
I changed the status of this Jira to blocker.

[~abmodi] can you suggest anyone familiar with the changes done in YARN-9697?




[jira] [Comment Edited] (YARN-10040) DistributedShell test failure on X86 and ARM

2020-12-10 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247564#comment-17247564
 ] 

Ahmed Hussein edited comment on YARN-10040 at 12/11/20, 5:28 AM:

{quote}Abhishek Modi, any pointers about this? Is the code itself broken, or
only the test? If the functionality itself has an issue, we should consider
reverting YARN-9697. If this is only a test issue, we should wrap this up, and
if no fix is available we can disable this test for the time being. Let me
know what the actual situation is. I can try to help in whichever way
possible.{quote}

[~abmodi], would you mind taking a look at the failures?




was (Author: ahussein):
On iOS, {{TestDistributedShell}} does not run. I thought I would dump the
error here anyway, because an NPE could be a hint as to what is broken in the
implementation.


{code:bash}
2020-12-10 17:29:22,129 INFO  [IPC Server listener on 8048] ipc.Server (Server.java:run(1344)) - IPC Server listener on 8048: starting
2020-12-10 17:29:22,131 INFO  [Listener at localhost/8048] collectormanager.NMCollectorService (NMCollectorService.java:serviceStart(101)) - NMCollectorService started at localhost/127.0.0.1:8048
2020-12-10 17:29:22,131 INFO  [Listener at localhost/8048] nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:serviceStart(267)) - Node ID assigned is : localhost:54943
2020-12-10 17:29:22,207 INFO  [Listener at localhost/8048] resourcemanager.ResourceTrackerService (ResourceTrackerService.java:registerNodeManager(617)) - NodeManager from node localhost(cmPort: 54943 httpPort: 54946) registered with capability: , assigned nodeId localhost:54943
2020-12-10 17:29:22,210 INFO  [Listener at localhost/8048] security.NMContainerTokenSecretManager (NMContainerTokenSecretManager.java:setMasterKey(143)) - Rolling master-key for container-tokens, got key with id -210390460
2020-12-10 17:29:22,210 INFO  [Listener at localhost/8048] security.NMTokenSecretManagerInNM (NMTokenSecretManagerInNM.java:setMasterKey(143)) - Rolling master-key for container-tokens, got key with id -1432443197
2020-12-10 17:29:22,210 INFO  [Listener at localhost/8048] nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:registerWithRM(486)) - Registered with ResourceManager as localhost:54943 with total resource of 
2020-12-10 17:29:22,212 INFO  [Listener at localhost/8048] delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:updateCurrentKey(367)) - Updating the current master key for generating delegation tokens
2020-12-10 17:29:22,212 INFO  [Thread[Thread-282,5,FailOnTimeoutGroup]] delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(701)) - Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
2020-12-10 17:29:22,212 INFO  [Thread[Thread-282,5,FailOnTimeoutGroup]] delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:updateCurrentKey(367)) - Updating the current master key for generating delegation tokens
2020-12-10 17:29:22,212 INFO  [RM Event dispatcher] rmnode.RMNodeImpl (RMNodeImpl.java:handle(774)) - localhost:54943 Node Transitioned from NEW to UNHEALTHY
2020-12-10 17:29:22,214 INFO  [org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService:Event Processor] distributed.NodeQueueLoadMonitor (NodeQueueLoadMonitor.java:removeNode(202)) - Node delete event for: localhost
2020-12-10 17:29:22,215 ERROR [SchedulerEventDispatcher:Event Processor] capacity.CapacityScheduler (CapacityScheduler.java:removeNode(2127)) - Attempting to remove non-existent node localhost:54943
2020-12-10 17:29:22,215 ERROR [org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService:Event Processor] event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type NODE_REMOVED to the Event Dispatcher
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.distributed.NodeQueueLoadMonitor.removeFromNodeIdsByRack(NodeQueueLoadMonitor.java:405)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.distributed.NodeQueueLoadMonitor.removeNode(NodeQueueLoadMonitor.java:204)
    at org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.handle(OpportunisticContainerAllocatorAMService.java:399)
    at org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.handle(OpportunisticContainerAllocatorAMService.java:94)
    at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:71)
    at java.lang.Thread.run(Thread.java:748)
2020-12-10 17:29:22,216 INFO  
[org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService:Event
 Proces