[jira] [Comment Edited] (YARN-10259) Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement

2020-05-12 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105987#comment-17105987
 ] 

Prabhu Joseph edited comment on YARN-10259 at 5/13/20, 5:31 AM:


*ISSUE 1: No new Allocation/Reservation happens in Multi Node Placement when a 
node is Full and has a Reserved Container.*

Below is the flow where the issue happens:

CapacityScheduler#allocateContainersOnMultiNodes -> 
CapacityScheduler#allocateOrReserveNewContainers -> LeafQueue#assignContainers 
-> LeafQueue#allocateFromReservedContainer

When CS tries to allocate or reserve a new container on a node, LeafQueue first 
tries to allocate from an already Reserved Container, iterating over all the 
nodes in the multi-node candidatesSet. When a node is full with a reserved 
container, it returns a Re-Reserved allocation. This runs in a loop without ever 
moving on to the next nodes to allocate or reserve new containers.

*Example:*

NodeA (fully utilized with reserved container of 5GB), NodeB (has space for 5GB)

A. CS tries to allocate or reserve a new container on NodeB -> B. LeafQueue tries 
to allocate from reserved containers, iterating over all nodes in the multi-node 
candidatesSet -> C. Finds that NodeA's reserved container cannot be ALLOCATED -> 
D. RE-RESERVEs it and returns a RESERVED assignment -> A to D run in a loop 
without ever trying to allocate on NodeB

*SOLUTION:*
LeafQueue#allocateFromReservedContainer should not be attempted for Multi Node 
Placement when the call comes from 
CapacityScheduler#allocateOrReserveNewContainers. It should happen only for 
CapacityScheduler#allocateFromReservedContainer, which passes a Single Node 
candidate for both Single Node and Multi Node placement.
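
A minimal sketch of where such a check could sit (simplified; the method and 
parameter names below are illustrative, not the exact LeafQueue signatures):

{code:java}
// Sketch: only try the "allocate from reserved container" path when the caller
// passed a single-node candidate set. CapacityScheduler#allocateFromReservedContainer
// always passes a single node, while allocateOrReserveNewContainers passes the
// whole multi-node set, so this keeps the multi-node path from looping on the
// re-reservation of a full node.
if (candidates.getAllNodes().size() == 1) {
  CSAssignment assignment = allocateFromReservedContainer(clusterResource,
      candidates, currentResourceLimits, schedulingMode);
  if (assignment != null) {
    return assignment;
  }
}
// Multi-node candidate set: fall through so a new container can be allocated
// or reserved on the other nodes of the set.
{code}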


*ISSUE 2: No new Allocation happens in Multi Node Placement when the first node 
part of multi node iterator is Full.*

Below is the flow where the issue happens:

CapacityScheduler#allocateContainersOnMultiNodes -> 
CapacityScheduler#allocateOrReserveNewContainers -> LeafQueue#assignContainers 
-> FiCaSchedulerApp#assignContainers -> 
RegularContainerAllocator#assignContainers -> RegularContainerAllocator#allocate

When CS tries to allocate or reserve a new container on a node, 
RegularContainerAllocator#allocate iterates over the nodes given by the 
MultiNodeLookupPolicy. If the first node does not have space to fit the 
SchedulerRequestKey, it sends a CSAssignment with a RESERVED allocation and 
skips checking subsequent nodes that do have space. (This is not a problem for 
ResourceUsageMultiNodeLookupPolicy, which always orders the least-used node 
first, but it affects custom policies such as BinPacking.)

*Example:*

NodeA (2GB available space), NodeB (3GB available space)

MultiNodeIterator order => NodeA, NodeB

CS tries to allocate/reserve on NodeA (3GB pending request) -> 
RegularContainerAllocator picks the first node of the iterator (NodeA) -> sends 
a RESERVED allocation

CS tries to allocate/reserve on NodeB (3GB pending request) -> 
RegularContainerAllocator again picks the first node of the iterator (NodeA) -> 
sends a Re-Reserved allocation

No new allocation or reservation happens on the subsequent nodes of the Multi 
Node Iterator.

*SOLUTION:*

RegularContainerAllocator#allocate has to try to allocate on subsequent nodes 
as well before sending RESERVED / RE-RESERVED.
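
A rough sketch of that behaviour (simplified; schedulingPS stands for the 
application's placement allocator, and doAllocation / reserve are placeholders 
for the existing assignment helpers):

{code:java}
// Walk every node offered by the MultiNodeLookupPolicy and only fall back to a
// RESERVED / RE-RESERVED assignment when none of them can fit the pending ask.
FiCaSchedulerNode firstNode = null;
Iterator<FiCaSchedulerNode> iter = schedulingPS.getPreferredNodeIterator(candidates);
while (iter.hasNext()) {
  FiCaSchedulerNode node = iter.next();
  if (firstNode == null) {
    firstNode = node;
  }
  if (Resources.fitsIn(pendingAsk.getPerAllocationResource(),
      node.getUnallocatedResource())) {
    // A node with enough space exists (NodeB in the example above): allocate
    // here instead of re-reserving on the first node of the iterator.
    return doAllocation(node, schedulerKey, pendingAsk);
  }
}
// No node in the candidate set can fit the ask: reserve / re-reserve.
return reserve(firstNode, schedulerKey, pendingAsk);
{code}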


was (Author: prabhu joseph):
*ISSUE 1: No new Allocation/Reservation happens in Multi Node Placement when a 
node is Full and has a Reserved Container.*

Below is the flow where the issue happens:

CapacityScheduler#allocateContainersOnMultiNodes -> 
CapacityScheduler#allocateOrReserveNewContainers -> LeafQueue#assignContainers 
-> LeafQueue#allocateFromReservedContainer

When CS tries to allocate or reserve a new container on a node, LeafQueue first 
tries to allocate from an already Reserved Container, iterating over all the 
nodes in the multi-node candidatesSet. When a node is full with a reserved 
container, it returns a Re-Reserved allocation. This runs in a loop without ever 
moving on to the next nodes to allocate or reserve new containers.

Example:

NodeA (fully utilized with reserved container of 5GB), NodeB (has space for 5GB)

A. CS tries to allocate or reserve a new container on NodeB -> B. LeafQueue tries 
to allocate from reserved containers, iterating over all nodes in the multi-node 
candidatesSet -> C. Finds that NodeA's reserved container cannot be ALLOCATED -> 
D. RE-RESERVEs it and returns a RESERVED assignment -> A to D run in a loop 
without ever trying to allocate on NodeB

SOLUTION:
LeafQueue#allocateFromReservedContainer should not be attempted for Multi Node 
Placement when the call comes from 
CapacityScheduler#allocateOrReserveNewContainers. It should happen only for 
CapacityScheduler#allocateFromReservedContainer, which passes a Single Node 
candidate for both Single Node and Multi Node placement.


*ISSUE 2: No new Allocation happens in Multi Node Placement when the first node 
part of multi node iterator is 

[jira] [Commented] (YARN-10259) Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement

2020-05-12 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105987#comment-17105987
 ] 

Prabhu Joseph commented on YARN-10259:
--

*ISSUE 1: No new Allocation/Reservation happens in Multi Node Placement when a 
node is Full and has a Reserved Container.*

Below is the flow where the issue happens:

CapacityScheduler#allocateContainersOnMultiNodes -> 
CapacityScheduler#allocateOrReserveNewContainers -> LeafQueue#assignContainers 
-> LeafQueue#allocateFromReservedContainer

When CS tries to allocate or reserve a new container on a node, LeafQueue first 
tries to allocate from an already Reserved Container, iterating over all the 
nodes in the multi-node candidatesSet. When a node is full with a reserved 
container, it returns a Re-Reserved allocation. This runs in a loop without ever 
moving on to the next nodes to allocate or reserve new containers.

Example:

NodeA (fully utilized with reserved container of 5GB), NodeB (has space for 5GB)

A. CS tries to allocate or reserve a new container on NodeB -> B. LeafQueue tries 
to allocate from reserved containers, iterating over all nodes in the multi-node 
candidatesSet -> C. Finds that NodeA's reserved container cannot be ALLOCATED -> 
D. RE-RESERVEs it and returns a RESERVED assignment -> A to D run in a loop 
without ever trying to allocate on NodeB

SOLUTION:
LeafQueue#allocateFromReservedContainer should not be attempted for Multi Node 
Placement when the call comes from 
CapacityScheduler#allocateOrReserveNewContainers. It should happen only for 
CapacityScheduler#allocateFromReservedContainer, which passes a Single Node 
candidate for both Single Node and Multi Node placement.


*ISSUE 2: No new Allocation happens in Multi Node Placement when the first node 
part of multi node iterator is Full.*

Below is the flow where the issue happens:

CapacityScheduler#allocateContainersOnMultiNodes -> 
CapacityScheduler#allocateOrReserveNewContainers -> LeafQueue#assignContainers 
-> FiCaSchedulerApp#assignContainers -> 
RegularContainerAllocator#assignContainers -> RegularContainerAllocator#allocate

When CS tries to allocate or reserve a new container on a node, 
RegularContainerAllocator#allocate iterates over the nodes given by the 
MultiNodeLookupPolicy. If the first node does not have space to fit the 
SchedulerRequestKey, it sends a CSAssignment with a RESERVED allocation and 
skips checking subsequent nodes that do have space. (This is not a problem for 
ResourceUsageMultiNodeLookupPolicy, which always orders the least-used node 
first, but it affects custom policies such as BinPacking.)

Example:

NodeA (2GB available space), NodeB (3GB available space)

MultiNodeIterator order => NodeA, NodeB

CS tries to allocate/reserve on NodeA (3GB pending request) -> 
RegularContainerAllocator picks the first node of the iterator (NodeA) -> sends 
a RESERVED allocation

CS tries to allocate/reserve on NodeB (3GB pending request) -> 
RegularContainerAllocator again picks the first node of the iterator (NodeA) -> 
sends a Re-Reserved allocation

No new allocation or reservation happens on the subsequent nodes of the Multi 
Node Iterator.

SOLUTION:

RegularContainerAllocator#allocate has to try to allocate on subsequent nodes 
as well before sending RESERVED / RE-RESERVED.

> Reserved Containers not allocated from available space of other nodes in 
> CandidateNodeSet in MultiNodePlacement
> ---
>
> Key: YARN-10259
> URL: https://issues.apache.org/jira/browse/YARN-10259
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-10259-001.patch, YARN-10259-002.patch, 
> YARN-10259-003.patch
>
>
> Reserved Containers are not allocated from the available space of other nodes 
> in CandidateNodeSet in MultiNodePlacement. 
> *Repro:*
> 1. MultiNode Placement Enabled.
> 2. Two nodes h1 and h2 with 8GB
> 3. Submit app1 AM (5GB) which gets placed in h1 and app2 AM (5GB) which gets 
> placed in h2.
> 4. Submit app3 AM which is reserved in h1
> 5. Kill app2 which frees space in h2.
> 6. app3 AM never gets ALLOCATED
> RM logs shows YARN-8127 fix rejecting the allocation proposal for app3 AM on 
> h2 as it expects the assignment to be on same node where reservation has 
> happened.
> {code}
> 2020-05-05 18:49:37,264 DEBUG [AsyncDispatcher event handler] 
> scheduler.SchedulerApplicationAttempt 
> (SchedulerApplicationAttempt.java:commonReserve(573)) - Application attempt 
> appattempt_1588684773609_0003_01 reserved container 
> container_1588684773609_0003_01_01 on node host: h1:1234 #containers=1 
> available= used=. This attempt 
> currently has 1 reserved containers at priority 0; currentReservation 
> 

[jira] [Commented] (YARN-10259) Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement

2020-05-12 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105969#comment-17105969
 ] 

Prabhu Joseph commented on YARN-10259:
--

Thanks [~wangda] for the review.

I found that the log above is the only WARN in the code flow of 
RegularContainerAllocator#tryAllocateOnNode. I have changed it to debug in 
[^YARN-10259-003.patch].

[~bibinchundatt] I have not addressed the improvements you suggested; I will 
scope them in a new Jira.
Please let me know if you are fine with the latest patch. Thanks.

> Reserved Containers not allocated from available space of other nodes in 
> CandidateNodeSet in MultiNodePlacement
> ---
>
> Key: YARN-10259
> URL: https://issues.apache.org/jira/browse/YARN-10259
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-10259-001.patch, YARN-10259-002.patch, 
> YARN-10259-003.patch
>
>
> Reserved Containers are not allocated from the available space of other nodes 
> in CandidateNodeSet in MultiNodePlacement. 
> *Repro:*
> 1. MultiNode Placement Enabled.
> 2. Two nodes h1 and h2 with 8GB
> 3. Submit app1 AM (5GB) which gets placed in h1 and app2 AM (5GB) which gets 
> placed in h2.
> 4. Submit app3 AM which is reserved in h1
> 5. Kill app2 which frees space in h2.
> 6. app3 AM never gets ALLOCATED
> RM logs shows YARN-8127 fix rejecting the allocation proposal for app3 AM on 
> h2 as it expects the assignment to be on same node where reservation has 
> happened.
> {code}
> 2020-05-05 18:49:37,264 DEBUG [AsyncDispatcher event handler] 
> scheduler.SchedulerApplicationAttempt 
> (SchedulerApplicationAttempt.java:commonReserve(573)) - Application attempt 
> appattempt_1588684773609_0003_01 reserved container 
> container_1588684773609_0003_01_01 on node host: h1:1234 #containers=1 
> available= used=. This attempt 
> currently has 1 reserved containers at priority 0; currentReservation 
> 
> 2020-05-05 18:49:37,264 INFO  [AsyncDispatcher event handler] 
> fica.FiCaSchedulerApp (FiCaSchedulerApp.java:apply(670)) - Reserved 
> container=container_1588684773609_0003_01_01, on node=host: h1:1234 
> #containers=1 available= used= 
> with resource=
>RESERVED=[(Application=appattempt_1588684773609_0003_01; 
> Node=h1:1234; Resource=)]
>
> 2020-05-05 18:49:38,283 DEBUG [Time-limited test] 
> allocator.RegularContainerAllocator 
> (RegularContainerAllocator.java:assignContainer(514)) - assignContainers: 
> node=h2 application=application_1588684773609_0003 priority=0 
> pendingAsk=,repeat=1> 
> type=OFF_SWITCH
> 2020-05-05 18:49:38,285 DEBUG [Time-limited test] fica.FiCaSchedulerApp 
> (FiCaSchedulerApp.java:commonCheckContainerAllocation(371)) - Try to allocate 
> from reserved container container_1588684773609_0003_01_01, but node is 
> not reserved
>ALLOCATED=[(Application=appattempt_1588684773609_0003_01; 
> Node=h2:1234; Resource=)]
> {code}
> Attached testcase which reproduces the issue.






[jira] [Updated] (YARN-10259) Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement

2020-05-12 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-10259:
-
Description: 
Reserved Containers are not allocated from the available space of other nodes 
in CandidateNodeSet in MultiNodePlacement. 

*Repro:*

1. MultiNode Placement Enabled.
2. Two nodes h1 and h2 with 8GB
3. Submit app1 AM (5GB) which gets placed in h1 and app2 AM (5GB) which gets 
placed in h2.
4. Submit app3 AM which is reserved in h1
5. Kill app2 which frees space in h2.
6. app3 AM never gets ALLOCATED

RM logs show the YARN-8127 fix rejecting the allocation proposal for app3 AM on 
h2, as it expects the assignment to be on the same node where the reservation 
has happened.

{code}
2020-05-05 18:49:37,264 DEBUG [AsyncDispatcher event handler] 
scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:commonReserve(573)) - Application attempt 
appattempt_1588684773609_0003_01 reserved container 
container_1588684773609_0003_01_01 on node host: h1:1234 #containers=1 
available= used=. This attempt 
currently has 1 reserved containers at priority 0; currentReservation 


2020-05-05 18:49:37,264 INFO  [AsyncDispatcher event handler] 
fica.FiCaSchedulerApp (FiCaSchedulerApp.java:apply(670)) - Reserved 
container=container_1588684773609_0003_01_01, on node=host: h1:1234 
#containers=1 available= used= 
with resource=
 RESERVED=[(Application=appattempt_1588684773609_0003_01; 
Node=h1:1234; Resource=)]
 
2020-05-05 18:49:38,283 DEBUG [Time-limited test] 
allocator.RegularContainerAllocator 
(RegularContainerAllocator.java:assignContainer(514)) - assignContainers: 
node=h2 application=application_1588684773609_0003 priority=0 
pendingAsk=,repeat=1> 
type=OFF_SWITCH

2020-05-05 18:49:38,285 DEBUG [Time-limited test] fica.FiCaSchedulerApp 
(FiCaSchedulerApp.java:commonCheckContainerAllocation(371)) - Try to allocate 
from reserved container container_1588684773609_0003_01_01, but node is not 
reserved
 ALLOCATED=[(Application=appattempt_1588684773609_0003_01; 
Node=h2:1234; Resource=)]
{code}

Attached testcase which reproduces the issue.
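
For reference, the attached reproducer is roughly of this shape (sketch only: it 
relies on the usual MockRM / MockNM test helpers, and the Configuration setup 
that enables multi-node placement is omitted):

{code:java}
int GB = 1024;
MockRM rm = new MockRM(conf);            // conf has multi-node placement enabled
rm.start();
MockNM nm1 = rm.registerNode("h1:1234", 8 * GB);
MockNM nm2 = rm.registerNode("h2:1234", 8 * GB);

RMApp app1 = rm.submitApp(5 * GB);       // AM placed on h1
RMApp app2 = rm.submitApp(5 * GB);       // AM placed on h2
RMApp app3 = rm.submitApp(5 * GB);       // no room left -> AM reserved on h1

rm.killApp(app2.getApplicationId());     // frees 5GB on h2
nm2.nodeHeartbeat(true);                 // expected: app3 AM becomes ALLOCATED
                                         // on h2; observed: stays RESERVED on h1
{code}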

  was:
Reserved Containers are not allocated from the available space of other nodes 
in CandidateNodeSet in MultiNodePlacement. 

*Repro:*

1. MultiNode Placement Enabled.
2. Two nodes h1 and h2 with 8GB
3. Submit app1 AM (5GB) which gets placed in h1 and app2 AM (5GB) which gets 
placed in h2.
4. Submit app3 AM which is reserved in h1
5. Kill app2 which frees space in h2.
6. app3 AM never gets ALLOCATED

RM logs shows YARN-8127 fix rejecting the allocation proposal for app3 AM on h2 
as it expects the assignment to be on same node where reservation has happened.

{code}
2020-05-05 18:49:37,264 DEBUG [AsyncDispatcher event handler] 
scheduler.SchedulerApplicationAttempt 
(SchedulerApplicationAttempt.java:commonReserve(573)) - Application attempt 
appattempt_1588684773609_0003_01 reserved container 
container_1588684773609_0003_01_01 on node host: h1:1234 #containers=1 
available= used=. This attempt 
currently has 1 reserved containers at priority 0; currentReservation 


2020-05-05 18:49:37,264 INFO  [AsyncDispatcher event handler] 
fica.FiCaSchedulerApp (FiCaSchedulerApp.java:apply(670)) - Reserved 
container=container_1588684773609_0003_01_01, on node=host: h1:1234 
#containers=1 available= used= 
with resource=
 RESERVED=[(Application=appattempt_1588684773609_0003_01; 
Node=h1:1234; Resource=)]
 
2020-05-05 18:49:38,283 DEBUG [Time-limited test] 
allocator.RegularContainerAllocator 
(RegularContainerAllocator.java:assignContainer(514)) - assignContainers: 
node=h2 application=application_1588684773609_0003 priority=0 
pendingAsk=,repeat=1> 
type=OFF_SWITCH

2020-05-05 18:49:38,285 DEBUG [Time-limited test] fica.FiCaSchedulerApp 
(FiCaSchedulerApp.java:commonCheckContainerAllocation(371)) - Try to allocate 
from reserved container container_1588684773609_0003_01_01, but node is not 
reserved
 ALLOCATED=[(Application=appattempt_1588684773609_0003_01; 
Node=h2:1234; Resource=)]
{code}

After reverting fix of YARN-8127, it works. Attached testcase which reproduces 
the issue.


> Reserved Containers not allocated from available space of other nodes in 
> CandidateNodeSet in MultiNodePlacement
> ---
>
> Key: YARN-10259
> URL: https://issues.apache.org/jira/browse/YARN-10259
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-10259-001.patch, YARN-10259-002.patch, 
> YARN-10259-003.patch
>
>
> Reserved Containers are not 

[jira] [Updated] (YARN-10259) Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement

2020-05-12 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-10259:
-
Attachment: YARN-10259-003.patch

> Reserved Containers not allocated from available space of other nodes in 
> CandidateNodeSet in MultiNodePlacement
> ---
>
> Key: YARN-10259
> URL: https://issues.apache.org/jira/browse/YARN-10259
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-10259-001.patch, YARN-10259-002.patch, 
> YARN-10259-003.patch
>
>
> Reserved Containers are not allocated from the available space of other nodes 
> in CandidateNodeSet in MultiNodePlacement. 
> *Repro:*
> 1. MultiNode Placement Enabled.
> 2. Two nodes h1 and h2 with 8GB
> 3. Submit app1 AM (5GB) which gets placed in h1 and app2 AM (5GB) which gets 
> placed in h2.
> 4. Submit app3 AM which is reserved in h1
> 5. Kill app2 which frees space in h2.
> 6. app3 AM never gets ALLOCATED
> RM logs shows YARN-8127 fix rejecting the allocation proposal for app3 AM on 
> h2 as it expects the assignment to be on same node where reservation has 
> happened.
> {code}
> 2020-05-05 18:49:37,264 DEBUG [AsyncDispatcher event handler] 
> scheduler.SchedulerApplicationAttempt 
> (SchedulerApplicationAttempt.java:commonReserve(573)) - Application attempt 
> appattempt_1588684773609_0003_01 reserved container 
> container_1588684773609_0003_01_01 on node host: h1:1234 #containers=1 
> available= used=. This attempt 
> currently has 1 reserved containers at priority 0; currentReservation 
> 
> 2020-05-05 18:49:37,264 INFO  [AsyncDispatcher event handler] 
> fica.FiCaSchedulerApp (FiCaSchedulerApp.java:apply(670)) - Reserved 
> container=container_1588684773609_0003_01_01, on node=host: h1:1234 
> #containers=1 available= used= 
> with resource=
>RESERVED=[(Application=appattempt_1588684773609_0003_01; 
> Node=h1:1234; Resource=)]
>
> 2020-05-05 18:49:38,283 DEBUG [Time-limited test] 
> allocator.RegularContainerAllocator 
> (RegularContainerAllocator.java:assignContainer(514)) - assignContainers: 
> node=h2 application=application_1588684773609_0003 priority=0 
> pendingAsk=,repeat=1> 
> type=OFF_SWITCH
> 2020-05-05 18:49:38,285 DEBUG [Time-limited test] fica.FiCaSchedulerApp 
> (FiCaSchedulerApp.java:commonCheckContainerAllocation(371)) - Try to allocate 
> from reserved container container_1588684773609_0003_01_01, but node is 
> not reserved
>ALLOCATED=[(Application=appattempt_1588684773609_0003_01; 
> Node=h2:1234; Resource=)]
> {code}
> After reverting fix of YARN-8127, it works. Attached testcase which 
> reproduces the issue.






[jira] [Commented] (YARN-10154) CS Dynamic Queues cannot be configured with absolute resources

2020-05-12 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105962#comment-17105962
 ] 

Manikandan R commented on YARN-10154:
-

Looks good to me. Thanks [~prabhujoseph] and [~sunilg]

> CS Dynamic Queues cannot be configured with absolute resources
> --
>
> Key: YARN-10154
> URL: https://issues.apache.org/jira/browse/YARN-10154
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Sunil G
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10154.001.patch, YARN-10154.002.patch, 
> YARN-10154.003.patch, YARN-10154.addendum-001.patch, 
> YARN-10154.addendum-002.patch, YARN-10154.addendum-003.patch, 
> YARN-10154.addendum-004.patch
>
>
> In CS, ManagedParent Queue and its template cannot take absolute resource 
> value like 
> [memory=8192,vcores=8]
>  This Jira is to track and improve the configuration reading module of 
> DynamicQueue to support absolute resource values.
>  
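
For illustration, the kind of setting this improvement is meant to allow looks 
roughly like the following (the queue path root.parent is made up; only the 
absolute-resource value comes from the description above):

{code:java}
// Hypothetical example: give an auto-created leaf queue template an absolute
// resource capacity instead of a percentage.
CapacitySchedulerConfiguration csConf = new CapacitySchedulerConfiguration();
csConf.set(
    "yarn.scheduler.capacity.root.parent.leaf-queue-template.capacity",
    "[memory=8192,vcores=8]");
{code}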






[jira] [Commented] (YARN-9898) Dependency netty-all-4.1.27.Final doesn't support ARM platform

2020-05-12 Thread liusheng (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105940#comment-17105940
 ] 

liusheng commented on YARN-9898:


I have run the 2 failed tests above locally without this patch applied, and they 
still fail.

> Dependency netty-all-4.1.27.Final doesn't support ARM platform
> --
>
> Key: YARN-9898
> URL: https://issues.apache.org/jira/browse/YARN-9898
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: liusheng
>Assignee: liusheng
>Priority: Major
> Attachments: YARN-9898.001.patch, YARN-9898.002.patch, 
> YARN-9898.003.patch, YARN-9898.004.patch
>
>
> Hadoop depends on the Netty package, but *netty-all-4.1.27.Final* from the 
> io.netty maven repo does not support the ARM platform. 
> When running the test *TestCsiClient.testIdentityService* on an ARM server, it 
> raises an error like the following:
> {code:java}
> Caused by: java.io.FileNotFoundException: 
> META-INF/native/libnetty_transport_native_epoll_aarch_64.so
> at 
> io.netty.util.internal.NativeLibraryLoader.load(NativeLibraryLoader.java:161)
> ... 45 more
> Suppressed: java.lang.UnsatisfiedLinkError: no 
> netty_transport_native_epoll_aarch_64 in java.library.path
> at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
> at java.lang.Runtime.loadLibrary0(Runtime.java:870)
> at java.lang.System.loadLibrary(System.java:1122)
> at 
> io.netty.util.internal.NativeLibraryUtil.loadLibrary(NativeLibraryUtil.java:38)
> at 
> io.netty.util.internal.NativeLibraryLoader.loadLibrary(NativeLibraryLoader.java:243)
> at 
> io.netty.util.internal.NativeLibraryLoader.load(NativeLibraryLoader.java:124)
> ... 45 more
> Suppressed: java.lang.UnsatisfiedLinkError: no 
> netty_transport_native_epoll_aarch_64 in java.library.path
> at 
> java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
> at java.lang.Runtime.loadLibrary0(Runtime.java:870)
> at java.lang.System.loadLibrary(System.java:1122)
> at 
> io.netty.util.internal.NativeLibraryUtil.loadLibrary(NativeLibraryUtil.java:38)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> io.netty.util.internal.NativeLibraryLoader$1.run(NativeLibraryLoader.java:263)
> at java.security.AccessController.doPrivileged(Native 
> Method)
> at 
> io.netty.util.internal.NativeLibraryLoader.loadLibraryByHelper(NativeLibraryLoader.java:255)
> at 
> io.netty.util.internal.NativeLibraryLoader.loadLibrary(NativeLibraryLoader.java:233)
> ... 46 more
> {code}
>  






[jira] [Commented] (YARN-10248) when config allowed-gpu-devices , excluded GPUs still be visible to containers

2020-05-12 Thread Zhankun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105933#comment-17105933
 ] 

Zhankun Tang commented on YARN-10248:
-

[~jasstionzyf], do you mean the existing test case 
"testAllocationWithoutAllowedGpus" fails but is not related to our changes?

> when config allowed-gpu-devices , excluded GPUs still be visible to containers
> --
>
> Key: YARN-10248
> URL: https://issues.apache.org/jira/browse/YARN-10248
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.2.1
>Reporter: zhao yufei
>Assignee: zhao yufei
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.2.1
>
> Attachments: YARN-10248-branch-3.2.001.path, 
> YARN-10248-branch-3.2.001.path
>
>
> I have a server with two GPUs, and I want to use only one of them within the 
> YARN cluster.
> According to the Hadoop documentation, I set these configs:
> {code:java}
> 
> yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices
> 0:1
>   
> 
> 
> yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables
> /etc/alternatives/x86_64-linux-gnu_nvidia_smi
>   
> {code}
> Then I ran the following command to test:
> {code:java}
> yarn jar 
> ./share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.2.1.jar \
>  -jar 
> ./share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.2.1.jar  
> -shell_command ' nvidia-smi & sleep 3  ' \
>  -container_resources memory-mb=3072,vcores=1,yarn.io/gpu=1  \
>  -num_containers 1 -queue yufei -node_label_expression slaves
> {code}
> I expected the GPU with minor number 0 to not be visible to the container, but 
> in the launched container, nvidia-smi printed information for two GPUs.
> I checked the related source code and found it is a bug.
> The problem is:
> when you specify allowed-gpu-devices, GpuDiscoverer populates the usable GPUs 
> from it;
> then, when some of those GPUs are assigned to a container, it sets the denied 
> GPUs for the container,
> but it never considers the GPUs excluded on the host.
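
A toy sketch of the fix idea (not the actual GpuResourceAllocator code; the class 
and method names here are made up for illustration):

{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class DeniedGpuSketch {
  // Devices to hide from a container: start from ALL GPUs present on the host,
  // not only from the allowed/usable set, so host-excluded GPUs stay denied.
  static Set<Integer> deniedMinorNumbers(Set<Integer> allGpusOnHost,
      Set<Integer> assignedToContainer) {
    Set<Integer> denied = new HashSet<>(allGpusOnHost);
    denied.removeAll(assignedToContainer);
    return denied;
  }

  public static void main(String[] args) {
    // Host has GPU minors 0 and 1; only minor 1 is allowed and assigned.
    Set<Integer> all = new HashSet<>(Arrays.asList(0, 1));
    Set<Integer> assigned = new HashSet<>(Arrays.asList(1));
    System.out.println(deniedMinorNumbers(all, assigned)); // prints [0]
  }
}
{code}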






[jira] [Commented] (YARN-10176) TestTimelineAuthFilterForV2 fails intermittently

2020-05-12 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105725#comment-17105725
 ] 

Ahmed Hussein commented on YARN-10176:
--


{code:bash}
lineservice.security.TestTimelineAuthFilterForV2
[ERROR] 
testPutTimelineEntities[1](org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2)
  Time elapsed: 6.611 s  <<< FAILURE!
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2.verifyEntity(TestTimelineAuthFilterForV2.java:293)
at 
org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2.testPutTimelineEntities(TestTimelineAuthFilterForV2.java:437)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runners.Suite.runChild(Suite.java:128)
at org.junit.runners.Suite.runChild(Suite.java:27)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)

[INFO]
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR]   
TestTimelineAuthFilterForV2.testPutTimelineEntities:437->verifyEntity:293
[INFO]
[ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0
[INFO]
[ERROR] There are test failures.

{code}


> TestTimelineAuthFilterForV2 fails intermittently
> 
>
> Key: YARN-10176
> URL: https://issues.apache.org/jira/browse/YARN-10176
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineservice
>Reporter: Ahmed Hussein
>Assignee: Prabhu Joseph
>Priority: Major
>
> TestTimelineAuthFilterForV2 fails intermittently on trunk and branch-2.10.
> To reproduce the failure, execute TestTimelineAuthFilterForV2 inside a loop.
> {code:bash}
> [INFO] Running 
> org.apache.hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2
> [ERROR] Tests 

[jira] [Commented] (YARN-10259) Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement

2020-05-12 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105675#comment-17105675
 ] 

Wangda Tan commented on YARN-10259:
---

Reviewed the patch, and it looks good to me. I think it may introduce a 
performance regression for large clusters, but I agree this is the right fix; 
otherwise we can see issues such as the scheduler getting stuck.

Can we move this (and similar logs) to debug: 
{code:java}
LOG.warn("Node : " + node.getNodeID()
+ " does not have sufficient resource for ask : " + pendingAsk
+ " node total capability : " + node.getTotalResource()); {code}
Because we can see this quite often on a heterogeneous cluster, logging it at 
warn is overkill to me.

So +1 to the patch; please move some logs to debug to make sure we won't see the 
number of logs increase too much after this change.
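
For reference, the asked-for change is just a log-level demotion of that message, 
e.g. (sketch against the snippet above, with the usual isDebugEnabled guard):

{code:java}
if (LOG.isDebugEnabled()) {
  LOG.debug("Node : " + node.getNodeID()
      + " does not have sufficient resource for ask : " + pendingAsk
      + " node total capability : " + node.getTotalResource());
}
{code}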

> Reserved Containers not allocated from available space of other nodes in 
> CandidateNodeSet in MultiNodePlacement
> ---
>
> Key: YARN-10259
> URL: https://issues.apache.org/jira/browse/YARN-10259
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-10259-001.patch, YARN-10259-002.patch
>
>
> Reserved Containers are not allocated from the available space of other nodes 
> in CandidateNodeSet in MultiNodePlacement. 
> *Repro:*
> 1. MultiNode Placement Enabled.
> 2. Two nodes h1 and h2 with 8GB
> 3. Submit app1 AM (5GB) which gets placed in h1 and app2 AM (5GB) which gets 
> placed in h2.
> 4. Submit app3 AM which is reserved in h1
> 5. Kill app2 which frees space in h2.
> 6. app3 AM never gets ALLOCATED
> RM logs shows YARN-8127 fix rejecting the allocation proposal for app3 AM on 
> h2 as it expects the assignment to be on same node where reservation has 
> happened.
> {code}
> 2020-05-05 18:49:37,264 DEBUG [AsyncDispatcher event handler] 
> scheduler.SchedulerApplicationAttempt 
> (SchedulerApplicationAttempt.java:commonReserve(573)) - Application attempt 
> appattempt_1588684773609_0003_01 reserved container 
> container_1588684773609_0003_01_01 on node host: h1:1234 #containers=1 
> available= used=. This attempt 
> currently has 1 reserved containers at priority 0; currentReservation 
> 
> 2020-05-05 18:49:37,264 INFO  [AsyncDispatcher event handler] 
> fica.FiCaSchedulerApp (FiCaSchedulerApp.java:apply(670)) - Reserved 
> container=container_1588684773609_0003_01_01, on node=host: h1:1234 
> #containers=1 available= used= 
> with resource=
>RESERVED=[(Application=appattempt_1588684773609_0003_01; 
> Node=h1:1234; Resource=)]
>
> 2020-05-05 18:49:38,283 DEBUG [Time-limited test] 
> allocator.RegularContainerAllocator 
> (RegularContainerAllocator.java:assignContainer(514)) - assignContainers: 
> node=h2 application=application_1588684773609_0003 priority=0 
> pendingAsk=,repeat=1> 
> type=OFF_SWITCH
> 2020-05-05 18:49:38,285 DEBUG [Time-limited test] fica.FiCaSchedulerApp 
> (FiCaSchedulerApp.java:commonCheckContainerAllocation(371)) - Try to allocate 
> from reserved container container_1588684773609_0003_01_01, but node is 
> not reserved
>ALLOCATED=[(Application=appattempt_1588684773609_0003_01; 
> Node=h2:1234; Resource=)]
> {code}
> After reverting fix of YARN-8127, it works. Attached testcase which 
> reproduces the issue.






[jira] [Commented] (YARN-10260) Allow transitioning queue from DRAINING to RUNNING state

2020-05-12 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105648#comment-17105648
 ] 

Hudson commented on YARN-10260:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18243 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18243/])
YARN-10260. Allow transitioning queue from DRAINING to RUNNING state. (jhung: 
rev fff1d2c1226ec23841b04dd478e8b97f31abbeba)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueState.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java


> Allow transitioning queue from DRAINING to RUNNING state
> 
>
> Key: YARN-10260
> URL: https://issues.apache.org/jira/browse/YARN-10260
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Bilwa S T
>Priority: Major
> Fix For: 3.2.2, 2.10.1, 3.4.0, 3.3.1, 3.1.5
>
> Attachments: YARN-10260.001.patch
>
>
> We found that in our cluster, a queue was erroneously stopped. The queue is 
> then internally in the DRAINING state. It cannot be moved back to the RUNNING 
> state until the queue has finished draining. For queues with large workloads, 
> this can block other apps from submitting to this queue for a long time.






[jira] [Commented] (YARN-6526) Refactoring SQLFederationStateStore by avoiding to recreate a connection at every call

2020-05-12 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105604#comment-17105604
 ] 

Hadoop QA commented on YARN-6526:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m 17s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
18s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
16s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common in 
trunk has 1 extant findbugs warnings. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 41s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
57s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
46s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 70m  0s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-YARN-Build/26026/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-6526 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13002748/YARN-6526.007.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux 8b69327bc242 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / a3f945fb846 |
| Default Java | Private Build-1.8.0_252-8u252-b09-1~18.04-b09 |
| findbugs | 
https://builds.apache.org/job/PreCommit-YARN-Build/26026/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common-warnings.html
 |

[jira] [Commented] (YARN-9301) Too many InvalidStateTransitionException with SLS

2020-05-12 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105546#comment-17105546
 ] 

Hudson commented on YARN-9301:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18240 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18240/])
YARN-9301. Too many InvalidStateTransitionException with SLS. (inigoiri: rev 
96bbc3bc972619bd830b2f935c06a1585a5470c6)
* (edit) 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/resourcemanager/MockAMLauncher.java


> Too many InvalidStateTransitionException with SLS
> -
>
> Key: YARN-9301
> URL: https://issues.apache.org/jira/browse/YARN-9301
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin Chundatt
>Assignee: Bilwa S T
>Priority: Major
>  Labels: simulator
> Fix For: 3.4.0
>
> Attachments: YARN-9301-001.patch, YARN-9301.002.patch
>
>
> Too many InvalidStateTransitionException
> {noformat}
> 19/02/13 17:44:43 ERROR rmcontainer.RMContainerImpl: Can't handle this event 
> at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> LAUNCHED at RUNNING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:483)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:65)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.containerLaunchedOnNode(SchedulerApplicationAttempt.java:655)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.containerLaunchedOnNode(AbstractYarnScheduler.java:359)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.updateNewContainerInfo(AbstractYarnScheduler.java:1010)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.nodeUpdate(AbstractYarnScheduler.java:1112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:1295)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1752)
> at 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.handle(SLSCapacityScheduler.java:205)
> at 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.handle(SLSCapacityScheduler.java:60)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:745)
> 19/02/13 17:44:43 ERROR rmcontainer.RMContainerImpl: Invalid event LAUNCHED 
> on container container_1550059705491_0067_01_01
> {noformat}






[jira] [Updated] (YARN-8942) PriorityBasedRouterPolicy throws exception if all sub-cluster weights have negative value

2020-05-12 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-8942:

Fix Version/s: 3.4.0

> PriorityBasedRouterPolicy throws exception if all sub-cluster weights have 
> negative value
> -
>
> Key: YARN-8942
> URL: https://issues.apache.org/jira/browse/YARN-8942
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akshay Agarwal
>Assignee: Bilwa S T
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: YARN-8942.001.patch, YARN-8942.002.patch
>
>
> In *PriorityBasedRouterPolicy*, if all sub-cluster weights are *set to 
> negative values*, it throws an exception while running a job.
> Ideally it should handle negative priorities as well, according to the home 
> sub-cluster selection process of the policy.
>  *Exception Details:*
> {code:java}
> java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: Unable 
> to insert the ApplicationId application_1540356760422_0015 into the 
> FederationStateStore
> at 
> org.apache.hadoop.yarn.server.router.RouterServerUtil.logAndThrowException(RouterServerUtil.java:56)
> at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.submitApplication(FederationClientInterceptor.java:418)
> at 
> org.apache.hadoop.yarn.server.router.clientrm.RouterClientRMService.submitApplication(RouterClientRMService.java:218)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:282)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:579)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
> Caused by: 
> org.apache.hadoop.yarn.server.federation.store.exception.FederationStateStoreInvalidInputException:
>  Missing SubCluster Id information. Please try again by specifying Subcluster 
> Id information.
> at 
> org.apache.hadoop.yarn.server.federation.store.utils.FederationMembershipStateStoreInputValidator.checkSubClusterId(FederationMembershipStateStoreInputValidator.java:247)
> at 
> org.apache.hadoop.yarn.server.federation.store.utils.FederationApplicationHomeSubClusterStoreInputValidator.checkApplicationHomeSubCluster(FederationApplicationHomeSubClusterStoreInputValidator.java:160)
> at 
> org.apache.hadoop.yarn.server.federation.store.utils.FederationApplicationHomeSubClusterStoreInputValidator.validate(FederationApplicationHomeSubClusterStoreInputValidator.java:65)
> at 
> org.apache.hadoop.yarn.server.federation.store.impl.ZookeeperFederationStateStore.addApplicationHomeSubCluster(ZookeeperFederationStateStore.java:159)
> at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> at com.sun.proxy.$Proxy84.addApplicationHomeSubCluster(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.federation.utils.FederationStateStoreFacade.addApplicationHomeSubCluster(FederationStateStoreFacade.java:402)
> at 
> org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor.submitApplication(FederationClientInterceptor.java:413)
> ... 11 more
> {code}
>  
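
A toy sketch of the expected selection behaviour (not the actual 
PriorityBasedRouterPolicy code; names are illustrative):

{code:java}
import java.util.HashMap;
import java.util.Map;

public class NegativeWeightSelectionSketch {
  // Pick the sub-cluster with the highest weight; this stays well-defined even
  // when every weight is negative, instead of failing the submission.
  static String selectHomeSubCluster(Map<String, Double> weights) {
    return weights.entrySet().stream()
        .max(Map.Entry.comparingByValue())
        .map(Map.Entry::getKey)
        .orElseThrow(() -> new IllegalStateException("no active sub-clusters"));
  }

  public static void main(String[] args) {
    Map<String, Double> weights = new HashMap<>();
    weights.put("subcluster1", -1.0);
    weights.put("subcluster2", -0.5);
    System.out.println(selectHomeSubCluster(weights)); // prints subcluster2
  }
}
{code}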




[jira] [Commented] (YARN-6526) Refactoring SQLFederationStateStore by avoiding to recreate a connection at every call

2020-05-12 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105538#comment-17105538
 ] 

Bilwa S T commented on YARN-6526:
-

Thanks [~elgoiri] for reviewing. I have updated the patch.

> Refactoring SQLFederationStateStore by avoiding to recreate a connection at 
> every call
> --
>
> Key: YARN-6526
> URL: https://issues.apache.org/jira/browse/YARN-6526
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Giovanni Matteo Fumarola
>Assignee: Bilwa S T
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-6526.001.patch, YARN-6526.002.patch, 
> YARN-6526.003.patch, YARN-6526.004.patch, YARN-6526.005.patch, 
> YARN-6526.006.patch, YARN-6526.007.patch
>
>







[jira] [Commented] (YARN-9301) Too many InvalidStateTransitionException with SLS

2020-05-12 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105537#comment-17105537
 ] 

Bilwa S T commented on YARN-9301:
-

Thanks [~elgoiri] for committing. I will take care of it next time.

> Too many InvalidStateTransitionException with SLS
> -
>
> Key: YARN-9301
> URL: https://issues.apache.org/jira/browse/YARN-9301
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin Chundatt
>Assignee: Bilwa S T
>Priority: Major
>  Labels: simulator
> Fix For: 3.4.0
>
> Attachments: YARN-9301-001.patch, YARN-9301.002.patch
>
>
> Too many InvalidStateTransitionException
> {noformat}
> 19/02/13 17:44:43 ERROR rmcontainer.RMContainerImpl: Can't handle this event 
> at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> LAUNCHED at RUNNING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:483)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:65)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.containerLaunchedOnNode(SchedulerApplicationAttempt.java:655)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.containerLaunchedOnNode(AbstractYarnScheduler.java:359)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.updateNewContainerInfo(AbstractYarnScheduler.java:1010)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.nodeUpdate(AbstractYarnScheduler.java:1112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:1295)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1752)
> at 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.handle(SLSCapacityScheduler.java:205)
> at 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.handle(SLSCapacityScheduler.java:60)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:745)
> 19/02/13 17:44:43 ERROR rmcontainer.RMContainerImpl: Invalid event LAUNCHED 
> on container container_1550059705491_0067_01_01
> {noformat}






[jira] [Updated] (YARN-6526) Refactoring SQLFederationStateStore by avoiding to recreate a connection at every call

2020-05-12 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-6526:

Attachment: YARN-6526.007.patch

> Refactoring SQLFederationStateStore by avoiding to recreate a connection at 
> every call
> --
>
> Key: YARN-6526
> URL: https://issues.apache.org/jira/browse/YARN-6526
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Giovanni Matteo Fumarola
>Assignee: Bilwa S T
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-6526.001.patch, YARN-6526.002.patch, 
> YARN-6526.003.patch, YARN-6526.004.patch, YARN-6526.005.patch, 
> YARN-6526.006.patch, YARN-6526.007.patch
>
>







[jira] [Commented] (YARN-9301) Too many InvalidStateTransitionException with SLS

2020-05-12 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105533#comment-17105533
 ] 

Hudson commented on YARN-9301:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18239 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18239/])
YARN-9301. Too many InvalidStateTransitionException with SLS. (inigoiri: rev 
9cbd0cd2a9268ff2e8fed0af335e9c4f91c5f601)
* (edit) 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/resourcemanager/MockAMLauncher.java


> Too many InvalidStateTransitionException with SLS
> -
>
> Key: YARN-9301
> URL: https://issues.apache.org/jira/browse/YARN-9301
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin Chundatt
>Assignee: Bilwa S T
>Priority: Major
>  Labels: simulator
> Fix For: 3.4.0
>
> Attachments: YARN-9301-001.patch, YARN-9301.002.patch
>
>
> Too many InvalidStateTransitionException
> {noformat}
> 19/02/13 17:44:43 ERROR rmcontainer.RMContainerImpl: Can't handle this event 
> at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> LAUNCHED at RUNNING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:483)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:65)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.containerLaunchedOnNode(SchedulerApplicationAttempt.java:655)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.containerLaunchedOnNode(AbstractYarnScheduler.java:359)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.updateNewContainerInfo(AbstractYarnScheduler.java:1010)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.nodeUpdate(AbstractYarnScheduler.java:1112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:1295)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1752)
> at 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.handle(SLSCapacityScheduler.java:205)
> at 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.handle(SLSCapacityScheduler.java:60)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:745)
> 19/02/13 17:44:43 ERROR rmcontainer.RMContainerImpl: Invalid event LAUNCHED 
> on container container_1550059705491_0067_01_01
> {noformat}






[jira] [Commented] (YARN-10201) Make AMRMProxyPolicy aware of SC load

2020-05-12 Thread Jira


[ 
https://issues.apache.org/jira/browse/YARN-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105528#comment-17105528
 ] 

Íñigo Goiri commented on YARN-10201:


The latest report from Yetus doesn't look very healthy.
[~youchen], could you take a pass?

> Make AMRMProxyPolicy aware of SC load
> -
>
> Key: YARN-10201
> URL: https://issues.apache.org/jira/browse/YARN-10201
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: amrmproxy
>Reporter: Young Chen
>Assignee: Young Chen
>Priority: Major
> Attachments: YARN-10201.v0.patch, YARN-10201.v1.patch, 
> YARN-10201.v10.patch, YARN-10201.v2.patch, YARN-10201.v3.patch, 
> YARN-10201.v4.patch, YARN-10201.v5.patch, YARN-10201.v6.patch, 
> YARN-10201.v7.patch, YARN-10201.v8.patch, YARN-10201.v9.patch
>
>
> LocalityMulticastAMRMProxyPolicy is currently unaware of SC load when 
> splitting resource requests. We propose changes to the policy so that it 
> receives feedback from SCs and can load balance requests across the federated 
> cluster.






[jira] [Commented] (YARN-6526) Refactoring SQLFederationStateStore by avoiding to recreate a connection at every call

2020-05-12 Thread Jira


[ 
https://issues.apache.org/jira/browse/YARN-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105526#comment-17105526
 ] 

Íñigo Goiri commented on YARN-6526:
---

Thanks for the update in  [^YARN-6526.006.patch].
For the debugging, no need for:
{code}
if (LOG.isDebugEnabled()) {
  LOG.info("Connection created");
}
{code}
Just do:
{code}
LOG.debug("Connection created");
{code}
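As an aside (editor's note, assuming the class uses an SLF4J logger as most recent Hadoop code does), the parameterized form needs no guard either and avoids building the message when debug logging is off:
{code}
// Hedged sketch: SLF4J parameterized logging; "subClusterId" is only a
// hypothetical argument used for illustration.
LOG.debug("Connection created for {}", subClusterId);
{code}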

> Refactoring SQLFederationStateStore by avoiding to recreate a connection at 
> every call
> --
>
> Key: YARN-6526
> URL: https://issues.apache.org/jira/browse/YARN-6526
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation
>Reporter: Giovanni Matteo Fumarola
>Assignee: Bilwa S T
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-6526.001.patch, YARN-6526.002.patch, 
> YARN-6526.003.patch, YARN-6526.004.patch, YARN-6526.005.patch, 
> YARN-6526.006.patch
>
>







[jira] [Commented] (YARN-9301) Too many InvalidStateTransitionException with SLS

2020-05-12 Thread Jira


[ 
https://issues.apache.org/jira/browse/YARN-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105522#comment-17105522
 ] 

Íñigo Goiri commented on YARN-9301:
---

I got confused with  [^YARN-9301-001.patch] and [^YARN-9301.002.patch] and I 
ended up doing it with two commits...
Let's try to keep the naming consistent across patches to avoid this next time.
Doing PRs might be a better option at this point.

Anyway, committed to trunk.
Thanks [~BilwaST] for the patch and [~bibinchundatt] for the review.

> Too many InvalidStateTransitionException with SLS
> -
>
> Key: YARN-9301
> URL: https://issues.apache.org/jira/browse/YARN-9301
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin Chundatt
>Assignee: Bilwa S T
>Priority: Major
>  Labels: simulator
> Fix For: 3.4.0
>
> Attachments: YARN-9301-001.patch, YARN-9301.002.patch
>
>
> Too many InvalidStateTransitionException
> {noformat}
> 19/02/13 17:44:43 ERROR rmcontainer.RMContainerImpl: Can't handle this event 
> at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> LAUNCHED at RUNNING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:483)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:65)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.containerLaunchedOnNode(SchedulerApplicationAttempt.java:655)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.containerLaunchedOnNode(AbstractYarnScheduler.java:359)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.updateNewContainerInfo(AbstractYarnScheduler.java:1010)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.nodeUpdate(AbstractYarnScheduler.java:1112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:1295)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1752)
> at 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.handle(SLSCapacityScheduler.java:205)
> at 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.handle(SLSCapacityScheduler.java:60)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:745)
> 19/02/13 17:44:43 ERROR rmcontainer.RMContainerImpl: Invalid event LAUNCHED 
> on container container_1550059705491_0067_01_01
> {noformat}






[jira] [Resolved] (YARN-10220) RM HA times out intermittently

2020-05-12 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein resolved YARN-10220.
--
Resolution: Cannot Reproduce

I will close it for now since I cannot reproduce the failures as reported in 
YARN-2710

> RM HA times out intermittently
> --
>
> Key: YARN-10220
> URL: https://issues.apache.org/jira/browse/YARN-10220
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.3.0, 3.2.1, 3.1.3
>Reporter: Ahmed Hussein
>Assignee: Bilwa S T
>Priority: Major
>
> TestResourceTrackerOnHA, among other tests, times out intermittently:
> * TestApplicationClientProtocolOnHA
> * TestApplicationMasterServiceProtocolForTimelineV2
> * TestApplicationMasterServiceProtocolOnHA
> {code:bash}
> [INFO] --- maven-surefire-plugin:3.0.0-M1:test (default-test) @ 
> hadoop-yarn-client ---
> [INFO]
> [INFO] ---
> [INFO]  T E S T S
> [INFO] ---
> [INFO] Running org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 19.612 s <<< FAILURE! - in 
> org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
> [ERROR] 
> testResourceTrackerOnHA(org.apache.hadoop.yarn.client.TestResourceTrackerOnHA)
>   Time elapsed: 19.473 s  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 15000 
> milliseconds
>   at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
>   at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
>   at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
>   at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>   at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:336)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:203)
>   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:699)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:812)
>   at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:413)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1636)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1452)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1405)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>   at com.sun.proxy.$Proxy93.registerNodeManager(Unknown Source)
>   at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:73)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy94.registerNodeManager(Unknown Source)
>   at 
> org.apache.hadoop.yarn.client.TestResourceTrackerOnHA.testResourceTrackerOnHA(TestResourceTrackerOnHA.java:64)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> 

[jira] [Commented] (YARN-10220) RM HA times out intermittently

2020-05-12 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105499#comment-17105499
 ] 

Ahmed Hussein commented on YARN-10220:
--

[~BilwaST], I could not reproduce it again for 3.x or 2.10.

I think this is good news then!

I will close it.

> RM HA times out intermittently
> --
>
> Key: YARN-10220
> URL: https://issues.apache.org/jira/browse/YARN-10220
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.3.0, 3.2.1, 3.1.3
>Reporter: Ahmed Hussein
>Assignee: Bilwa S T
>Priority: Major
>
> TestResourceTrackerOnHA, among other tests, times out intermittently:
> * TestApplicationClientProtocolOnHA
> * TestApplicationMasterServiceProtocolForTimelineV2
> * TestApplicationMasterServiceProtocolOnHA
> {code:bash}
> [INFO] --- maven-surefire-plugin:3.0.0-M1:test (default-test) @ 
> hadoop-yarn-client ---
> [INFO]
> [INFO] ---
> [INFO]  T E S T S
> [INFO] ---
> [INFO] Running org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 19.612 s <<< FAILURE! - in 
> org.apache.hadoop.yarn.client.TestResourceTrackerOnHA
> [ERROR] 
> testResourceTrackerOnHA(org.apache.hadoop.yarn.client.TestResourceTrackerOnHA)
>   Time elapsed: 19.473 s  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 15000 
> milliseconds
>   at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
>   at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
>   at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
>   at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>   at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:336)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:203)
>   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:699)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:812)
>   at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:413)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1636)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1452)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1405)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>   at com.sun.proxy.$Proxy93.registerNodeManager(Unknown Source)
>   at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:73)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>   at com.sun.proxy.$Proxy94.registerNodeManager(Unknown Source)
>   at 
> org.apache.hadoop.yarn.client.TestResourceTrackerOnHA.testResourceTrackerOnHA(TestResourceTrackerOnHA.java:64)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> 

[jira] [Commented] (YARN-9404) TestApplicationLifetimeMonitor#testApplicationLifetimeMonitor fails intermittent

2020-05-12 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105453#comment-17105453
 ] 

Jim Brennan commented on YARN-9404:
---

Thanks [~prabhujoseph]!

 

> TestApplicationLifetimeMonitor#testApplicationLifetimeMonitor fails 
> intermittent
> 
>
> Key: YARN-9404
> URL: https://issues.apache.org/jira/browse/YARN-9404
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: resourcemanager
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0, 2.10.1
>
> Attachments: YARN-9404-001.patch
>
>
> TestApplicationLifetimeMonitor#testApplicationLifetimeMonitor fails 
> intermittent. 
> {code}
> [ERROR] 
> testApplicationLifetimeMonitor[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestApplicationLifetimeMonitor)
>  Time elapsed: 34.75 s <<< FAILURE! java.lang.AssertionError: Application 
> killed before lifetime value at org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.assertTrue(Assert.java:41) at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestApplicationLifetimeMonitor.testApplicationLifetimeMonitor(TestApplicationLifetimeMonitor.java:209)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.lang.Thread.run(Thread.java:748)
> {code}
> As per the testcase logs, submittime is 1553240813597 and finishtime is 
> 1553240844372. The testcase computes (finishtime - submittime) / 1000 = 30775 / 
> 1000 = 30, so the integer division discards the remaining 775 ms.
> {code}
> 2019-03-22 07:47:24,357 INFO  [Ping Checker] util.AbstractLivelinessMonitor 
> (AbstractLivelinessMonitor.java:run(149)) - 
> Expired:application_1553240811329_0004_LIFETIME Timed out after 0 secs
> 2019-03-22 07:47:24,384 INFO  [AsyncDispatcher event handler] 
> resourcemanager.RMAppManager$ApplicationSummary 
> (RMAppManager.java:logAppSummary(219)) - 
> appId=application_1553240811329_0004,name=,user=jenkins,queue=default,state=KILLED,trackingUrl=http://869e1f448cdd:8088/cluster/app/application_1553240811329_0004,appMasterHost=N/A,submitTime=1553240813597,startTime=1553240813604,launchTime=0,finishTime=1553240844372,finalStatus=KILLED,memorySeconds=0,vcoreSeconds=0,preemptedMemorySeconds=0,preemptedVcoreSeconds=0,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=<memory:0\, vCores:0>,applicationType=YARN,resourceSeconds=0 MB-seconds\, 0 
> vcore-seconds,preemptedResourceSeconds=0 MB-seconds\, 0 vcore-seconds
> {code}
> Testcase succeeds only when the seconds taken is above 30L.
> {code}
>  long totalTimeRun =
> (app4.getFinishTime() - app4.getSubmitTime()) / 1000;
>  Assert.assertTrue("Application killed before lifetime value",
> totalTimeRun > maxLifetime);
> {code}
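For context, a minimal editor's sketch of how the truncation could be avoided by comparing in milliseconds; this is an illustration only, it assumes maxLifetime is expressed in seconds, and the committed patch may solve it differently:
{code}
// Hedged sketch, not the committed YARN-9404 change: keep the difference in
// milliseconds so the 775 ms remainder is not discarded by integer division.
long totalTimeRunMs = app4.getFinishTime() - app4.getSubmitTime(); // e.g. 30775
Assert.assertTrue("Application killed before lifetime value",
    totalTimeRunMs > maxLifetime * 1000L);                         // 30775 > 30000
{code}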






[jira] [Commented] (YARN-6539) Create SecureLogin inside Router

2020-05-12 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105439#comment-17105439
 ] 

Bilwa S T commented on YARN-6539:
-

Thanks [~yifan.stan] for the patch.
 * Can you please add a testcase for this? You can use *KerberosSecurityTestcase* 
to start a MiniKdc; TestMultiSchemeAuthenticationHandler is a good reference.
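As a rough illustration of the suggestion above (editor's sketch only; the class name and principal are hypothetical and not from any attached patch), a test built on KerberosSecurityTestcase could look like this:
{code}
import java.io.File;

import org.apache.hadoop.minikdc.KerberosSecurityTestcase;
import org.junit.Test;

// Hedged sketch: KerberosSecurityTestcase starts a MiniKdc before each test
// and stops it afterwards; the test only creates a principal and keytab.
public class TestRouterSecureLogin extends KerberosSecurityTestcase {

  @Test
  public void testRouterSecureLogin() throws Exception {
    File keytab = new File(getWorkDir(), "router.keytab");
    // "router/localhost" is an illustrative principal name.
    getKdc().createPrincipal(keytab, "router/localhost");
    // ... configure the Router with this keytab/principal and assert that its
    // secure login succeeds.
  }
}
{code}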

> Create SecureLogin inside Router
> 
>
> Key: YARN-6539
> URL: https://issues.apache.org/jira/browse/YARN-6539
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Xie YiFan
>Priority: Minor
> Attachments: YARN-6359_1.patch, YARN-6359_2.patch, YARN-6539_3.patch
>
>







[jira] [Commented] (YARN-9898) Dependency netty-all-4.1.27.Final doesn't support ARM platform

2020-05-12 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105425#comment-17105425
 ] 

Ayush Saxena commented on YARN-9898:


Thanks [~seanlau] for the update. The build seems happy now.
v004 LGTM, +1.
If there are no further comments, I will commit by tomorrow EOD.

> Dependency netty-all-4.1.27.Final doesn't support ARM platform
> --
>
> Key: YARN-9898
> URL: https://issues.apache.org/jira/browse/YARN-9898
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: liusheng
>Assignee: liusheng
>Priority: Major
> Attachments: YARN-9898.001.patch, YARN-9898.002.patch, 
> YARN-9898.003.patch, YARN-9898.004.patch
>
>
> Hadoop depends on the Netty package, but *netty-all-4.1.27.Final* from the 
> io.netty Maven repo does not support the ARM platform. 
> When running the test *TestCsiClient.testIdentityService* on an ARM server, it 
> raises an error like the following:
> {code:java}
> Caused by: java.io.FileNotFoundException: 
> META-INF/native/libnetty_transport_native_epoll_aarch_64.so
> at 
> io.netty.util.internal.NativeLibraryLoader.load(NativeLibraryLoader.java:161)
> ... 45 more
> Suppressed: java.lang.UnsatisfiedLinkError: no 
> netty_transport_native_epoll_aarch_64 in java.library.path
> at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
> at java.lang.Runtime.loadLibrary0(Runtime.java:870)
> at java.lang.System.loadLibrary(System.java:1122)
> at 
> io.netty.util.internal.NativeLibraryUtil.loadLibrary(NativeLibraryUtil.java:38)
> at 
> io.netty.util.internal.NativeLibraryLoader.loadLibrary(NativeLibraryLoader.java:243)
> at 
> io.netty.util.internal.NativeLibraryLoader.load(NativeLibraryLoader.java:124)
> ... 45 more
> Suppressed: java.lang.UnsatisfiedLinkError: no 
> netty_transport_native_epoll_aarch_64 in java.library.path
> at 
> java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
> at java.lang.Runtime.loadLibrary0(Runtime.java:870)
> at java.lang.System.loadLibrary(System.java:1122)
> at 
> io.netty.util.internal.NativeLibraryUtil.loadLibrary(NativeLibraryUtil.java:38)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> io.netty.util.internal.NativeLibraryLoader$1.run(NativeLibraryLoader.java:263)
> at java.security.AccessController.doPrivileged(Native 
> Method)
> at 
> io.netty.util.internal.NativeLibraryLoader.loadLibraryByHelper(NativeLibraryLoader.java:255)
> at 
> io.netty.util.internal.NativeLibraryLoader.loadLibrary(NativeLibraryLoader.java:233)
> ... 46 more
> {code}
>  
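For anyone reproducing this, a small editor's sketch (not part of the attached patches) that surfaces the same failure directly through Netty's public API; on an ARM host with netty-all-4.1.27.Final it reports epoll as unavailable because the aarch_64 native library is missing:
{code}
import io.netty.channel.epoll.Epoll;

// Hedged diagnostic sketch: prints whether the native epoll transport loaded
// and, if not, the underlying cause (the UnsatisfiedLinkError quoted above).
public class CheckEpollAvailability {
  public static void main(String[] args) {
    System.out.println("epoll transport available: " + Epoll.isAvailable());
    if (!Epoll.isAvailable()) {
      Epoll.unavailabilityCause().printStackTrace();
    }
  }
}
{code}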






[jira] [Commented] (YARN-9898) Dependency netty-all-4.1.27.Final doesn't support ARM platform

2020-05-12 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105388#comment-17105388
 ] 

Hadoop QA commented on YARN-9898:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
44s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
50s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
54m 56s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
3s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 16m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
5s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  9s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
4s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
33s{color} | {color:green} hadoop-project in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
20s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 93m 44s{color} 
| {color:red} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
4s{color} | {color:green} hadoop-yarn-csi in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
 5s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}201m 21s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestBPOfferService |
|   | hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-YARN-Build/26025/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-9898 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13002693/YARN-9898.004.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient xml |
| uname | Linux 6088aa8e135a 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build 

[jira] [Commented] (YARN-10160) Add auto queue creation related configs to RMWebService#CapacitySchedulerQueueInfo

2020-05-12 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105197#comment-17105197
 ] 

Prabhu Joseph commented on YARN-10160:
--

Thanks [~snemeth].

> Add auto queue creation related configs to 
> RMWebService#CapacitySchedulerQueueInfo
> --
>
> Key: YARN-10160
> URL: https://issues.apache.org/jira/browse/YARN-10160
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0, 3.4.0, 3.3.1
>
> Attachments: Screen Shot 2020-02-25 at 9.06.52 PM.png, 
> YARN-10160-001.patch, YARN-10160-002.patch, YARN-10160-003.patch, 
> YARN-10160-004.patch, YARN-10160-005.patch, YARN-10160-006.patch, 
> YARN-10160-007.patch, YARN-10160-008.patch, YARN-10160-009.patch, 
> YARN-10160-branch-3.3.001.patch
>
>
> Add auto queue creation related configs to 
> RMWebService#CapacitySchedulerQueueInfo.
> {code}
> yarn.scheduler.capacity.<queue-path>.auto-create-child-queue.enabled
> yarn.scheduler.capacity.<queue-path>.leaf-queue-template.
> {code}






[jira] [Commented] (YARN-10160) Add auto queue creation related configs to RMWebService#CapacitySchedulerQueueInfo

2020-05-12 Thread Szilard Nemeth (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105153#comment-17105153
 ] 

Szilard Nemeth commented on YARN-10160:
---

Thanks [~prabhujoseph],
Committed to branch-3.3.
Resolved jira.

> Add auto queue creation related configs to 
> RMWebService#CapacitySchedulerQueueInfo
> --
>
> Key: YARN-10160
> URL: https://issues.apache.org/jira/browse/YARN-10160
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0, 3.4.0, 3.3.1
>
> Attachments: Screen Shot 2020-02-25 at 9.06.52 PM.png, 
> YARN-10160-001.patch, YARN-10160-002.patch, YARN-10160-003.patch, 
> YARN-10160-004.patch, YARN-10160-005.patch, YARN-10160-006.patch, 
> YARN-10160-007.patch, YARN-10160-008.patch, YARN-10160-009.patch, 
> YARN-10160-branch-3.3.001.patch
>
>
> Add auto queue creation related configs to 
> RMWebService#CapacitySchedulerQueueInfo.
> {code}
> yarn.scheduler.capacity.<queue-path>.auto-create-child-queue.enabled
> yarn.scheduler.capacity.<queue-path>.leaf-queue-template.
> {code}






[jira] [Updated] (YARN-10160) Add auto queue creation related configs to RMWebService#CapacitySchedulerQueueInfo

2020-05-12 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10160:
--
Fix Version/s: 3.3.1
   3.3.0

> Add auto queue creation related configs to 
> RMWebService#CapacitySchedulerQueueInfo
> --
>
> Key: YARN-10160
> URL: https://issues.apache.org/jira/browse/YARN-10160
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0, 3.4.0, 3.3.1
>
> Attachments: Screen Shot 2020-02-25 at 9.06.52 PM.png, 
> YARN-10160-001.patch, YARN-10160-002.patch, YARN-10160-003.patch, 
> YARN-10160-004.patch, YARN-10160-005.patch, YARN-10160-006.patch, 
> YARN-10160-007.patch, YARN-10160-008.patch, YARN-10160-009.patch, 
> YARN-10160-branch-3.3.001.patch
>
>
> Add auto queue creation related configs to 
> RMWebService#CapacitySchedulerQueueInfo.
> {code}
> yarn.scheduler.capacity.<queue-path>.auto-create-child-queue.enabled
> yarn.scheduler.capacity.<queue-path>.leaf-queue-template.
> {code}






[jira] [Commented] (YARN-10154) CS Dynamic Queues cannot be configured with absolute resources

2020-05-12 Thread Sunil G (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105136#comment-17105136
 ] 

Sunil G commented on YARN-10154:


+1 on latest addendum. Please get this in.

[~maniraj...@gmail.com] any other comments?

> CS Dynamic Queues cannot be configured with absolute resources
> --
>
> Key: YARN-10154
> URL: https://issues.apache.org/jira/browse/YARN-10154
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Sunil G
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10154.001.patch, YARN-10154.002.patch, 
> YARN-10154.003.patch, YARN-10154.addendum-001.patch, 
> YARN-10154.addendum-002.patch, YARN-10154.addendum-003.patch, 
> YARN-10154.addendum-004.patch
>
>
> In CS, ManagedParent Queue and its template cannot take absolute resource 
> value like 
> [memory=8192,vcores=8]
>  This Jira is to track and improve the configuration reading module of 
> DynamicQueue to support absolute resource values.
>  
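For illustration only (editor's sketch; the queue name root.dynamic and the values are hypothetical, and the exact set of template keys supported depends on the committed patch), the kind of configuration this Jira targets looks like:
{code}
yarn.scheduler.capacity.root.dynamic.auto-create-child-queue.enabled = true
yarn.scheduler.capacity.root.dynamic.leaf-queue-template.capacity = [memory=8192,vcores=8]
yarn.scheduler.capacity.root.dynamic.leaf-queue-template.maximum-capacity = [memory=16384,vcores=16]
{code}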






[jira] [Updated] (YARN-9404) TestApplicationLifetimeMonitor#testApplicationLifetimeMonitor fails intermittent

2020-05-12 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9404:

Fix Version/s: 2.10.1

> TestApplicationLifetimeMonitor#testApplicationLifetimeMonitor fails 
> intermittent
> 
>
> Key: YARN-9404
> URL: https://issues.apache.org/jira/browse/YARN-9404
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: resourcemanager
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0, 2.10.1
>
> Attachments: YARN-9404-001.patch
>
>
> TestApplicationLifetimeMonitor#testApplicationLifetimeMonitor fails 
> intermittent. 
> {code}
> [ERROR] 
> testApplicationLifetimeMonitor[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestApplicationLifetimeMonitor)
>  Time elapsed: 34.75 s <<< FAILURE! java.lang.AssertionError: Application 
> killed before lifetime value at org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.assertTrue(Assert.java:41) at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestApplicationLifetimeMonitor.testApplicationLifetimeMonitor(TestApplicationLifetimeMonitor.java:209)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.lang.Thread.run(Thread.java:748)
> {code}
> As per the testcase logs, submittime is 1553240813597 and finishtime is 
> 1553240844372. The testcase computes (finishtime - submittime) / 1000 = 30775 / 
> 1000 = 30, so the integer division discards the remaining 775 ms.
> {code}
> 2019-03-22 07:47:24,357 INFO  [Ping Checker] util.AbstractLivelinessMonitor 
> (AbstractLivelinessMonitor.java:run(149)) - 
> Expired:application_1553240811329_0004_LIFETIME Timed out after 0 secs
> 2019-03-22 07:47:24,384 INFO  [AsyncDispatcher event handler] 
> resourcemanager.RMAppManager$ApplicationSummary 
> (RMAppManager.java:logAppSummary(219)) - 
> appId=application_1553240811329_0004,name=,user=jenkins,queue=default,state=KILLED,trackingUrl=http://869e1f448cdd:8088/cluster/app/application_1553240811329_0004,appMasterHost=N/A,submitTime=1553240813597,startTime=1553240813604,launchTime=0,finishTime=1553240844372,finalStatus=KILLED,memorySeconds=0,vcoreSeconds=0,preemptedMemorySeconds=0,preemptedVcoreSeconds=0,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=<memory:0\, vCores:0>,applicationType=YARN,resourceSeconds=0 MB-seconds\, 0 
> vcore-seconds,preemptedResourceSeconds=0 MB-seconds\, 0 vcore-seconds
> {code}
> Testcase succeeds only when the seconds taken is above 30L.
> {code}
>  long totalTimeRun =
> (app4.getFinishTime() - app4.getSubmitTime()) / 1000;
>  Assert.assertTrue("Application killed before lifetime value",
> totalTimeRun > maxLifetime);
> {code}






[jira] [Commented] (YARN-9404) TestApplicationLifetimeMonitor#testApplicationLifetimeMonitor fails intermittent

2020-05-12 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105123#comment-17105123
 ] 

Prabhu Joseph commented on YARN-9404:
-

[~Jim_Brennan] Have cherry-picked this Jira to branch-2.10.

> TestApplicationLifetimeMonitor#testApplicationLifetimeMonitor fails 
> intermittent
> 
>
> Key: YARN-9404
> URL: https://issues.apache.org/jira/browse/YARN-9404
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: resourcemanager
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9404-001.patch
>
>
> TestApplicationLifetimeMonitor#testApplicationLifetimeMonitor fails 
> intermittent. 
> {code}
> [ERROR] 
> testApplicationLifetimeMonitor[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestApplicationLifetimeMonitor)
>  Time elapsed: 34.75 s <<< FAILURE! java.lang.AssertionError: Application 
> killed before lifetime value at org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.assertTrue(Assert.java:41) at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestApplicationLifetimeMonitor.testApplicationLifetimeMonitor(TestApplicationLifetimeMonitor.java:209)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.lang.Thread.run(Thread.java:748)
> {code}
> As per the testcase logs, submittime is 1553240813597 and finishtime is 
> 1553240844372. The testcase computes (finishtime - submittime) / 1000 = 30775 / 
> 1000 = 30, so the integer division discards the remaining 775 ms.
> {code}
> 2019-03-22 07:47:24,357 INFO  [Ping Checker] util.AbstractLivelinessMonitor 
> (AbstractLivelinessMonitor.java:run(149)) - 
> Expired:application_1553240811329_0004_LIFETIME Timed out after 0 secs
> 2019-03-22 07:47:24,384 INFO  [AsyncDispatcher event handler] 
> resourcemanager.RMAppManager$ApplicationSummary 
> (RMAppManager.java:logAppSummary(219)) - 
> appId=application_1553240811329_0004,name=,user=jenkins,queue=default,state=KILLED,trackingUrl=http://869e1f448cdd:8088/cluster/app/application_1553240811329_0004,appMasterHost=N/A,submitTime=1553240813597,startTime=1553240813604,launchTime=0,finishTime=1553240844372,finalStatus=KILLED,memorySeconds=0,vcoreSeconds=0,preemptedMemorySeconds=0,preemptedVcoreSeconds=0,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=<memory:0\, vCores:0>,applicationType=YARN,resourceSeconds=0 MB-seconds\, 0 
> vcore-seconds,preemptedResourceSeconds=0 MB-seconds\, 0 vcore-seconds
> {code}
> Testcase succeeds only when the seconds taken is above 30L.
> {code}
>  long totalTimeRun =
> (app4.getFinishTime() - app4.getSubmitTime()) / 1000;
>  Assert.assertTrue("Application killed before lifetime value",
> totalTimeRun > maxLifetime);
> {code}






[jira] [Commented] (YARN-10245) Verbose logging in Capacity Scheduler

2020-05-12 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105122#comment-17105122
 ] 

Prabhu Joseph commented on YARN-10245:
--

[~tanu.ajmera] I have observed another issue in CapacityScheduler logging. The 
log below would be more useful if it also printed the NodeId of the node that 
has not heartbeated. Can you include that as well? Thanks.

{code}
 LOG.info("Skip scheduling on node because it haven't heartbeated for "
+ timeElapsedFromLastHeartbeat / 1000.0f + " secs");
{code}
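A minimal sketch of the requested change (editor's illustration; it assumes the enclosing method has the SchedulerNode in scope as node):
{code}
LOG.info("Skip scheduling on node " + node.getNodeID()
    + " because it hasn't heartbeated for "
    + timeElapsedFromLastHeartbeat / 1000.0f + " secs");
{code}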

> Verbose logging in Capacity Scheduler
> -
>
> Key: YARN-10245
> URL: https://issues.apache.org/jira/browse/YARN-10245
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tanu Ajmera
>Assignee: Tanu Ajmera
>Priority: Minor
> Attachments: YARN-10245-001.patch, YARN-10245-002.patch, 
> YARN-10245-003.patch
>
>
> Capacity Scheduler logs the following every minute. It has to be changed to DEBUG level:
> INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Allocation proposal accepted
> cc [~prabhujoseph]






[jira] [Commented] (YARN-9898) Dependency netty-all-4.1.27.Final doesn't support ARM platform

2020-05-12 Thread liusheng (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105116#comment-17105116
 ] 

liusheng commented on YARN-9898:


I have no idea about the Jenkins results :(..

> Dependency netty-all-4.1.27.Final doesn't support ARM platform
> --
>
> Key: YARN-9898
> URL: https://issues.apache.org/jira/browse/YARN-9898
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: liusheng
>Assignee: liusheng
>Priority: Major
> Attachments: YARN-9898.001.patch, YARN-9898.002.patch, 
> YARN-9898.003.patch
>
>
> Hadoop depends on the Netty package, but *netty-all-4.1.27.Final* from the 
> io.netty Maven repo does not support the ARM platform. 
> When running the test *TestCsiClient.testIdentityService* on an ARM server, it 
> raises an error like the following:
> {code:java}
> Caused by: java.io.FileNotFoundException: 
> META-INF/native/libnetty_transport_native_epoll_aarch_64.so
> at 
> io.netty.util.internal.NativeLibraryLoader.load(NativeLibraryLoader.java:161)
> ... 45 more
> Suppressed: java.lang.UnsatisfiedLinkError: no 
> netty_transport_native_epoll_aarch_64 in java.library.path
> at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
> at java.lang.Runtime.loadLibrary0(Runtime.java:870)
> at java.lang.System.loadLibrary(System.java:1122)
> at 
> io.netty.util.internal.NativeLibraryUtil.loadLibrary(NativeLibraryUtil.java:38)
> at 
> io.netty.util.internal.NativeLibraryLoader.loadLibrary(NativeLibraryLoader.java:243)
> at 
> io.netty.util.internal.NativeLibraryLoader.load(NativeLibraryLoader.java:124)
> ... 45 more
> Suppressed: java.lang.UnsatisfiedLinkError: no 
> netty_transport_native_epoll_aarch_64 in java.library.path
> at 
> java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
> at java.lang.Runtime.loadLibrary0(Runtime.java:870)
> at java.lang.System.loadLibrary(System.java:1122)
> at 
> io.netty.util.internal.NativeLibraryUtil.loadLibrary(NativeLibraryUtil.java:38)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> io.netty.util.internal.NativeLibraryLoader$1.run(NativeLibraryLoader.java:263)
> at java.security.AccessController.doPrivileged(Native 
> Method)
> at 
> io.netty.util.internal.NativeLibraryLoader.loadLibraryByHelper(NativeLibraryLoader.java:255)
> at 
> io.netty.util.internal.NativeLibraryLoader.loadLibrary(NativeLibraryLoader.java:233)
> ... 46 more
> {code}
>  






[jira] [Commented] (YARN-10160) Add auto queue creation related configs to RMWebService#CapacitySchedulerQueueInfo

2020-05-12 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105114#comment-17105114
 ] 

Prabhu Joseph commented on YARN-10160:
--

[~snemeth] I have attached  [^YARN-10160-branch-3.3.001.patch]  for branch-3.3; 
can you review it when you get time? Thanks.

> Add auto queue creation related configs to 
> RMWebService#CapacitySchedulerQueueInfo
> --
>
> Key: YARN-10160
> URL: https://issues.apache.org/jira/browse/YARN-10160
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: Screen Shot 2020-02-25 at 9.06.52 PM.png, 
> YARN-10160-001.patch, YARN-10160-002.patch, YARN-10160-003.patch, 
> YARN-10160-004.patch, YARN-10160-005.patch, YARN-10160-006.patch, 
> YARN-10160-007.patch, YARN-10160-008.patch, YARN-10160-009.patch, 
> YARN-10160-branch-3.3.001.patch
>
>
> Add auto queue creation related configs to 
> RMWebService#CapacitySchedulerQueueInfo.
> {code}
> yarn.scheduler.capacity.<queue-path>.auto-create-child-queue.enabled
> yarn.scheduler.capacity.<queue-path>.leaf-queue-template.
> {code}






[jira] [Commented] (YARN-10154) CS Dynamic Queues cannot be configured with absolute resources

2020-05-12 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105113#comment-17105113
 ] 

Prabhu Joseph commented on YARN-10154:
--

[~sunilg] [~maniraj...@gmail.com] Can you review the latest addendum patch  
[^YARN-10154.addendum-004.patch]  when you get time. Thanks.

> CS Dynamic Queues cannot be configured with absolute resources
> --
>
> Key: YARN-10154
> URL: https://issues.apache.org/jira/browse/YARN-10154
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Sunil G
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10154.001.patch, YARN-10154.002.patch, 
> YARN-10154.003.patch, YARN-10154.addendum-001.patch, 
> YARN-10154.addendum-002.patch, YARN-10154.addendum-003.patch, 
> YARN-10154.addendum-004.patch
>
>
> In CS, ManagedParent Queue and its template cannot take absolute resource 
> value like 
> [memory=8192,vcores=8]
>  This Jira is to track and improve the configuration reading module of 
> DynamicQueue to support absolute resource values.
>  


