[jira] [Updated] (YARN-10287) Update scheduler-conf corrupts the CS configuration when removing queue which is referred in queue mapping

2020-05-28 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-10287:
-
Reporter: Akhil PB  (was: Prabhu Joseph)

> Update scheduler-conf corrupts the CS configuration when removing queue which 
> is referred in queue mapping
> --
>
> Key: YARN-10287
> URL: https://issues.apache.org/jira/browse/YARN-10287
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Akhil PB
>Assignee: Prabhu Joseph
>Priority: Major
>
> Updating scheduler-conf corrupts the CS configuration when removing a queue 
> that is still referred to in a queue mapping. The deletion fails with the 
> error message below, yet the queue gets removed from the in-memory CS 
> configuration (so job submission fails) while it is not removed from the 
> backend ZKConfigurationStore. On a subsequent modification via 
> scheduler-conf, the queue reappears from the ZKConfigurationStore.
> {code}
> 2020-05-22 12:38:38,252 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices: Exception 
> thrown when modifying configuration.
> java.io.IOException: Failed to re-init queues : mapping contains invalid or 
> non-leaf queue Prod
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:478)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:430)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices$13.run(RMWebServices.java:2389)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices$13.run(RMWebServices.java:2377)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.updateSchedulerConfiguration(RMWebServices.java:2377)
> {code}
> *Repro:*
> {code}
> 1. Setup Queue Mapping
> yarn.scheduler.capacity.root.queues=default,dummy
> yarn.scheduler.capacity.queue-mappings=g:hadoop:dummy
> 2. Stop the root.dummy queue
> <sched-conf>
>   <update-queue>
>     <queue-name>root.dummy</queue-name>
>     <params>
>       <entry>
>         <key>state</key>
>         <value>STOPPED</value>
>       </entry>
>     </params>
>   </update-queue>
> </sched-conf>
> 3. Delete the root.dummy queue
> curl --negotiate -u : -X PUT -d @abc.xml -H "Content-type: application/xml" 
> 'http://:8088/ws/v1/cluster/scheduler-conf?user.name=yarn'
> <sched-conf>
>   <update-queue>
>     <queue-name>root.default</queue-name>
>     <params>
>       <entry>
>         <key>capacity</key>
>         <value>100</value>
>       </entry>
>     </params>
>   </update-queue>
>   <remove-queue>root.dummy</remove-queue>
> </sched-conf>
> {code}
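To make the failure mode concrete, here is a minimal, hypothetical sketch of the 
kind of pre-persist validation that would reject such an update atomically. The 
class and method names below are assumptions for illustration only, not the 
actual MutableCSConfigurationProvider or ZKConfigurationStore code:

{code}
// Hypothetical sketch only: reject a queue removal while any queue mapping
// still references the queue, before anything is persisted or applied.
import java.util.Collection;
import java.util.List;

final class SchedConfRemovalCheck {

  /** Throws if a mapping (e.g. "g:hadoop:dummy") still targets a removed queue. */
  static void validate(Collection<String> queuesToRemove,
                       Collection<String> queueMappings) {
    for (String mapping : queueMappings) {
      String[] parts = mapping.split(":");
      String target = parts[parts.length - 1];           // mapped queue name
      for (String removed : queuesToRemove) {
        String leaf = removed.substring(removed.lastIndexOf('.') + 1);
        if (target.equals(removed) || target.equals(leaf)) {
          throw new IllegalArgumentException("Queue " + removed
              + " is still referenced by queue mapping " + mapping);
        }
      }
    }
  }

  public static void main(String[] args) {
    // Mirrors the repro: removing root.dummy while g:hadoop:dummy still maps to it.
    validate(List.of("root.dummy"), List.of("g:hadoop:dummy"));
  }
}
{code}

The point of the sketch is ordering: validation (or applying the change to a 
cloned configuration) has to succeed before either the configuration store or 
the in-memory CS configuration is touched, so a failed re-init cannot leave the 
two out of sync.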






[jira] [Updated] (YARN-10292) FS-CS converter: add an option to enable asynchronous scheduling in CapacityScheduler

2020-05-28 Thread Benjamin Teke (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Teke updated YARN-10292:
-
Description: FS doesn't have an equivalent setting to the 
CapacityScheduler's yarn.scheduler.capacity.schedule-asynchronously.enable 
option so the FS to CS converter won't add this to the yarn-site.xml. An 
optional command line switch should be added to support this option during 
migration.  (was: FS doesn't have an equivalent setting to the 
yarn.scheduler.capacity.schedule-asynchronously.enable so the FS to CS 
converter won't add this option to the yarn-site.xml. An optional command line 
switch should be added to support this option during migration.)

> FS-CS converter: add an option to enable asynchronous scheduling in 
> CapacityScheduler
> -
>
> Key: YARN-10292
> URL: https://issues.apache.org/jira/browse/YARN-10292
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
>
> FS doesn't have an equivalent setting to the CapacityScheduler's 
> yarn.scheduler.capacity.schedule-asynchronously.enable option so the FS to CS 
> converter won't add this to the yarn-site.xml. An optional command line 
> switch should be added to support this option during migration.
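A rough sketch of how such a switch could behave. The property name is the real 
CS setting; the class and method names are assumptions, not the actual fs2cs 
converter code:

{code}
// Hypothetical sketch: when the proposed CLI switch is passed to the converter,
// write the CapacityScheduler async-scheduling property into the generated
// yarn-site.xml, since FS has no equivalent setting to derive it from.
import org.apache.hadoop.conf.Configuration;

final class AsyncSchedulingOption {
  static final String PROP =
      "yarn.scheduler.capacity.schedule-asynchronously.enable";

  static void apply(boolean enableAsyncScheduling, Configuration convertedYarnSite) {
    if (enableAsyncScheduling) {
      convertedYarnSite.setBoolean(PROP, true);
    }
  }
}
{code}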






[jira] [Created] (YARN-10293) Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement (YARN-10259)

2020-05-28 Thread Prabhu Joseph (Jira)
Prabhu Joseph created YARN-10293:


 Summary: Reserved Containers not allocated from available space of 
other nodes in CandidateNodeSet in MultiNodePlacement (YARN-10259)
 Key: YARN-10293
 URL: https://issues.apache.org/jira/browse/YARN-10293
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.3.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


Reserved containers are not allocated from the available space of other nodes 
in the CandidateNodeSet in MultiNodePlacement. YARN-10259 fixed two related 
issues: 
https://issues.apache.org/jira/browse/YARN-10259?focusedCommentId=17105987&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17105987

I have found one more bug in CapacityScheduler.java which causes the same 
issue, with a slight difference in the repro.

*Repro:*

*Nodes :   Available : Used*
Node1 -  8GB, 8vcores - 8GB, 8vcores
Node2 -  8GB, 8vcores - 8GB, 8vcores
Node3 -  8GB, 8vcores - 8GB, 8vcores

Queues -> A and B, both 50% capacity, 100% max capacity

MultiNode enabled + Preemption enabled

1. JobA submitted to queue A; it uses the full cluster (24GB, 24 vcores)

2. JobB submitted to queue B with an AM size of 1GB

{code}
2020-05-21 12:12:27,313 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=systest  
IP=172.27.160.139   OPERATION=Submit Application Request
TARGET=ClientRMService  RESULT=SUCCESS  APPID=application_1590046667304_0005
CALLERCONTEXT=CLI   QUEUENAME=dummy
{code}

3. Preemption happens and the used capacity drops below 1.0f

{code}
2020-05-21 12:12:48,222 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics:
 Non-AM container preempted, current 
appAttemptId=appattempt_1590046667304_0004_01, 
containerId=container_e09_1590046667304_0004_01_24, resource=
{code}

4. JobB gets a Reserved Container as part of 
CapacityScheduler#allocateOrReserveNewContainer

{code}
2020-05-21 12:12:48,226 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_e09_1590046667304_0005_01_01 Container Transitioned from NEW to 
RESERVED
2020-05-21 12:12:48,226 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
 Reserved container=container_e09_1590046667304_0005_01_01, on node=host: 
tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041 #containers=8 
available= used= with 
resource=
{code}

*Why did RegularContainerAllocator reserve the container when the used capacity 
is <= 1.0f?*

{code}
Even though the container has been preempted, the NodeManager still has to stop 
the container and then heartbeat, updating the available and unallocated 
resources at the ResourceManager; until that happens the node still appears 
full, so the allocator reserves instead of allocating.
{code}

5. Now no new allocation happens and the reserved container stays reserved.

After the reservation the used capacity becomes 1.0f, the code below keeps 
looping, and no new allocation or reservation happens. The reserved container 
cannot be allocated because the reserved node has no free space. Node2 has 
space for 1GB, 1 vcore, but CapacityScheduler#allocateOrReserveNewContainers 
is never called, causing the hang.


*[INFINITE LOOP] CapacityScheduler#allocateContainersOnMultiNodes -> 
CapacityScheduler#allocateContainersOnMultiNodes#allocateFromReservedContainer 
-> Node3 has reserved container*

{code}
2020-05-21 12:13:33,242 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Trying to fulfill reservation for application application_1590046667304_0005 
on node: tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041
2020-05-21 12:13:33,242 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
assignContainers: partition= #applications=1
2020-05-21 12:13:33,242 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
 Reserved container=container_e09_1590046667304_0005_01_01, on node=host: 
tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041 #containers=8 
available= used= with 
resource=
2020-05-21 12:13:33,243 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Allocation proposal accepted
{code}

CapacityScheduler#allocateOrReserveNewContainers won't be called because the 
check below in allocateContainersOnMultiNodes skips it:

{code}
 if (getRootQueue().getQueueCapacities().getUsedCapacity(
candidates.getPartition()) >= 1.0f
&& preemptionManager.getKillableResource(
{code}
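For illustration, a self-contained sketch of the effect of that guard. The names 
and the simplified condition are assumptions derived from the truncated snippet 
and the description above, not the actual CapacityScheduler code:

{code}
// Hypothetical sketch: after the reservation, used capacity is 1.0f and no
// killable (preemptable) resource is left, so the guard short-circuits every
// scheduling pass. Only the reserved node is retried; node2's free 1GB/1vcore
// is never considered and the reservation is never fulfilled.
final class MultiNodeAllocationGuardSketch {

  static boolean mayAllocateOrReserveNewContainers(float usedCapacity,
                                                   long killableMemoryMb) {
    return !(usedCapacity >= 1.0f && killableMemoryMb == 0);
  }

  public static void main(String[] args) {
    // State after step 4 of the repro: cluster fully used, preemption finished.
    boolean allowed = mayAllocateOrReserveNewContainers(1.0f, 0);
    System.out.println("allocateOrReserveNewContainers allowed: " + allowed); // false
  }
}
{code}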









[jira] [Updated] (YARN-10293) Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement (YARN-10259)

2020-05-28 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-10293:
-
Description: 
Reserved containers are not allocated from the available space of other nodes 
in the CandidateNodeSet in MultiNodePlacement. YARN-10259 fixed two related 
issues: 
https://issues.apache.org/jira/browse/YARN-10259?focusedCommentId=17105987&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17105987

I have found one more bug in CapacityScheduler.java which causes the same 
issue, with a slight difference in the repro.

*Repro:*

*Nodes :   Available : Used*
Node1 -  8GB, 8vcores - 8GB, 8vcores
Node2 -  8GB, 8vcores - 8GB, 8vcores
Node3 -  8GB, 8vcores - 8GB, 8vcores

Queues -> A and B, both 50% capacity, 100% max capacity

MultiNode enabled + Preemption enabled

1. JobA submitted to queue A; it uses the full cluster (24GB, 24 vcores)

2. JobB submitted to queue B with an AM size of 1GB

{code}
2020-05-21 12:12:27,313 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=systest  
IP=172.27.160.139   OPERATION=Submit Application Request
TARGET=ClientRMService  RESULT=SUCCESS  APPID=application_1590046667304_0005
CALLERCONTEXT=CLI   QUEUENAME=dummy
{code}

3. Preemption happens and the used capacity drops below 1.0f

{code}
2020-05-21 12:12:48,222 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics:
 Non-AM container preempted, current 
appAttemptId=appattempt_1590046667304_0004_01, 
containerId=container_e09_1590046667304_0004_01_24, resource=
{code}

4. JobB gets a Reserved Container as part of 
CapacityScheduler#allocateOrReserveNewContainer

{code}
2020-05-21 12:12:48,226 INFO 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
container_e09_1590046667304_0005_01_01 Container Transitioned from NEW to 
RESERVED
2020-05-21 12:12:48,226 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
 Reserved container=container_e09_1590046667304_0005_01_01, on node=host: 
tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041 #containers=8 
available= used= with 
resource=
{code}

*Why did RegularContainerAllocator reserve the container when the used capacity 
is <= 1.0f?*

{code}
Even though the container has been preempted, the NodeManager still has to stop 
the container and then heartbeat, updating the available and unallocated 
resources at the ResourceManager; until that happens the node still appears 
full, so the allocator reserves instead of allocating.
{code}

5. Now no new allocation happens and the reserved container stays reserved.

After the reservation the used capacity becomes 1.0f, the code below keeps 
looping, and no new allocation or reservation happens. The reserved container 
cannot be allocated because the reserved node has no free space. Node2 has 
space for 1GB, 1 vcore, but CapacityScheduler#allocateOrReserveNewContainers 
is never called, causing the hang.


*[INFINITE LOOP] CapacityScheduler#allocateContainersOnMultiNodes -> 
CapacityScheduler#allocateFromReservedContainer -> Re-reserve the container on 
node*

{code}
2020-05-21 12:13:33,242 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Trying to fulfill reservation for application application_1590046667304_0005 
on node: tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041
2020-05-21 12:13:33,242 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
assignContainers: partition= #applications=1
2020-05-21 12:13:33,242 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
 Reserved container=container_e09_1590046667304_0005_01_01, on node=host: 
tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041 #containers=8 
available= used= with 
resource=
2020-05-21 12:13:33,243 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Allocation proposal accepted
{code}

CapacityScheduler#allocateOrReserveNewContainers won't be called because the 
check below in allocateContainersOnMultiNodes skips it:

{code}
 if (getRootQueue().getQueueCapacities().getUsedCapacity(
candidates.getPartition()) >= 1.0f
&& preemptionManager.getKillableResource(
{code}




  was:
Reserved Containers not allocated from available space of other nodes in 
CandidateNodeSet in MultiNodePlacement. YARN-10259 has fixed two issues related 
to it 
https://issues.apache.org/jira/browse/YARN-10259?focusedCommentId=17105987=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17105987

Have found one more bug in the CapacityScheduler.java code which causes the 
same issue with slight difference in the repro.

*Repro:*

*Nodes :   Available : Used*
Node1 -  8GB, 8vcores -  8GB. 8cores
Node2 -  8GB, 8vcores - 8GB. 8cores
Node3 -  8GB, 8vcores - 8GB. 8cores

Queues -> A and B both 50% capacity, 100% max capacity

MultiNode enabled + 

[jira] [Updated] (YARN-10287) Update scheduler-conf corrupts the CS configuration when removing queue which is referred in queue mapping

2020-05-28 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-10287:
-
Attachment: YARN-10287-001.patch

> Update scheduler-conf corrupts the CS configuration when removing queue which 
> is referred in queue mapping
> --
>
> Key: YARN-10287
> URL: https://issues.apache.org/jira/browse/YARN-10287
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.3.0
>Reporter: Akhil PB
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-10287-001.patch
>
>
> Updating scheduler-conf corrupts the CS configuration when removing a queue 
> that is still referred to in a queue mapping. The deletion fails with the 
> error message below, yet the queue gets removed from the in-memory CS 
> configuration (so job submission fails) while it is not removed from the 
> backend ZKConfigurationStore. On a subsequent modification via 
> scheduler-conf, the queue reappears from the ZKConfigurationStore.
> {code}
> 2020-05-22 12:38:38,252 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices: Exception 
> thrown when modifying configuration.
> java.io.IOException: Failed to re-init queues : mapping contains invalid or 
> non-leaf queue Prod
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:478)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:430)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices$13.run(RMWebServices.java:2389)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices$13.run(RMWebServices.java:2377)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.updateSchedulerConfiguration(RMWebServices.java:2377)
> {code}
> *Repro:*
> {code}
> 1. Setup Queue Mapping
> yarn.scheduler.capacity.root.queues=default,dummy
> yarn.scheduler.capacity.queue-mappings=g:hadoop:dummy
> 2. Stop the root.dummy queue
> <sched-conf>
>   <update-queue>
>     <queue-name>root.dummy</queue-name>
>     <params>
>       <entry>
>         <key>state</key>
>         <value>STOPPED</value>
>       </entry>
>     </params>
>   </update-queue>
> </sched-conf>
> 3. Delete the root.dummy queue
> curl --negotiate -u : -X PUT -d @abc.xml -H "Content-type: application/xml" 
> 'http://:8088/ws/v1/cluster/scheduler-conf?user.name=yarn'
> <sched-conf>
>   <update-queue>
>     <queue-name>root.default</queue-name>
>     <params>
>       <entry>
>         <key>capacity</key>
>         <value>100</value>
>       </entry>
>     </params>
>   </update-queue>
>   <remove-queue>root.dummy</remove-queue>
> </sched-conf>
> {code}






[jira] [Updated] (YARN-10293) Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement (YARN-10259)

2020-05-28 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-10293:
-
Attachment: YARN-10293-001.patch

> Reserved Containers not allocated from available space of other nodes in 
> CandidateNodeSet in MultiNodePlacement (YARN-10259)
> 
>
> Key: YARN-10293
> URL: https://issues.apache.org/jira/browse/YARN-10293
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-10293-001.patch
>
>
> Reserved containers are not allocated from the available space of other nodes 
> in the CandidateNodeSet in MultiNodePlacement. YARN-10259 fixed two related 
> issues: 
> https://issues.apache.org/jira/browse/YARN-10259?focusedCommentId=17105987&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17105987
> I have found one more bug in CapacityScheduler.java which causes the same 
> issue, with a slight difference in the repro.
> *Repro:*
> *Nodes :   Available : Used*
> Node1 -  8GB, 8vcores - 8GB, 8vcores
> Node2 -  8GB, 8vcores - 8GB, 8vcores
> Node3 -  8GB, 8vcores - 8GB, 8vcores
> Queues -> A and B, both 50% capacity, 100% max capacity
> MultiNode enabled + Preemption enabled
> 1. JobA submitted to queue A; it uses the full cluster (24GB, 24 vcores)
> 2. JobB submitted to queue B with an AM size of 1GB
> {code}
> 2020-05-21 12:12:27,313 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=systest  
> IP=172.27.160.139   OPERATION=Submit Application Request
> TARGET=ClientRMService  RESULT=SUCCESS  APPID=application_1590046667304_0005  
>   CALLERCONTEXT=CLI   QUEUENAME=dummy
> {code}
> 3. Preemption happens and the used capacity drops below 1.0f
> {code}
> 2020-05-21 12:12:48,222 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics:
>  Non-AM container preempted, current 
> appAttemptId=appattempt_1590046667304_0004_01, 
> containerId=container_e09_1590046667304_0004_01_24, 
> resource=
> {code}
> 4. JobB gets a Reserved Container as part of 
> CapacityScheduler#allocateOrReserveNewContainer
> {code}
> 2020-05-21 12:12:48,226 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_e09_1590046667304_0005_01_01 Container Transitioned from NEW to 
> RESERVED
> 2020-05-21 12:12:48,226 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
>  Reserved container=container_e09_1590046667304_0005_01_01, on node=host: 
> tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041 #containers=8 
> available= used= with 
> resource=
> {code}
> *Why did RegularContainerAllocator reserve the container when the used 
> capacity is <= 1.0f?*
> {code}
> Even though the container has been preempted, the NodeManager still has to 
> stop the container and then heartbeat, updating the available and unallocated 
> resources at the ResourceManager; until that happens the node still appears 
> full, so the allocator reserves instead of allocating.
> {code}
> 5. Now no new allocation happens and the reserved container stays reserved.
> After the reservation the used capacity becomes 1.0f, the code below keeps 
> looping, and no new allocation or reservation happens. The reserved container 
> cannot be allocated because the reserved node has no free space. Node2 has 
> space for 1GB, 1 vcore, but CapacityScheduler#allocateOrReserveNewContainers 
> is never called, causing the hang.
> *[INFINITE LOOP] CapacityScheduler#allocateContainersOnMultiNodes -> 
> CapacityScheduler#allocateContainersOnMultiNodes#allocateFromReservedContainer
>  -> Node3 has reserved container*
> {code}
> 2020-05-21 12:13:33,242 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Trying to fulfill reservation for application application_1590046667304_0005 
> on node: tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041
> 2020-05-21 12:13:33,242 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
> assignContainers: partition= #applications=1
> 2020-05-21 12:13:33,242 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
>  Reserved container=container_e09_1590046667304_0005_01_01, on node=host: 
> tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041 #containers=8 
> available= used= with 
> resource=
> 2020-05-21 12:13:33,243 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>  Allocation proposal accepted
> {code}
> CapacityScheduler#allocateOrReserveNewContainers won't be called because the 
> check below in allocateContainersOnMultiNodes skips it:
> {code}
>  if 

[jira] [Updated] (YARN-10284) Add lazy initialization of LogAggregationFileControllerFactory in LogServlet

2020-05-28 Thread Adam Antal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal updated YARN-10284:
--
Attachment: YARN-10284.002.patch

> Add lazy initialization of LogAggregationFileControllerFactory in LogServlet
> 
>
> Key: YARN-10284
> URL: https://issues.apache.org/jira/browse/YARN-10284
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation, yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-10284.001.patch, YARN-10284.002.patch
>
>
> Suppose the {{mapred}} user has no access to the remote folder. Pinging the 
> JHS every few seconds to check whether it is online will produce the 
> following log entry:
> {noformat}
> 2020-05-19 00:17:20,331 WARN 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController:
>  Unable to determine if the filesystem supports append operation
> java.nio.file.AccessDeniedException: test-bucket: 
> org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: There is no mapped role 
> for the group(s) associated with the authenticated user. (user: mapred)
>   at 
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:204)
> [...]
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:513)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.getRollOverLogMaxSize(LogAggregationIndexedFileController.java:1157)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initInternal(LogAggregationIndexedFileController.java:149)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileController.initialize(LogAggregationFileController.java:135)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileControllerFactory.(LogAggregationFileControllerFactory.java:139)
>   at 
> org.apache.hadoop.yarn.server.webapp.LogServlet.(LogServlet.java:66)
>   at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.HsWebServices.(HsWebServices.java:99)
>   at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.HsWebServices$$FastClassByGuice$$1eb8d5d6.newInstance()
>   at 
> com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
> [...]
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> We should only create the {{LogAggregationFactory}} instance when we actually 
> need it, not every time the {{LogServlet}} object is instantiated (so 
> definitely not in the constructor). In this way we prevent pressure on the 
> S3A auth side, especially if the authentication request is a costly operation.
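A minimal sketch of the lazy-initialization idea. The holder class below is 
illustrative only and is not the actual LogServlet change; only the factory 
constructor call is taken from the stack trace above:

{code}
// Hypothetical sketch: build the LogAggregationFileControllerFactory only when a
// request actually needs it, instead of in the servlet constructor, so the
// (possibly failing and costly) S3A authentication is not triggered by every ping.
import java.util.function.Supplier;

final class LazyHolder<T> {
  private final Supplier<T> supplier;
  private volatile T instance;

  LazyHolder(Supplier<T> supplier) {
    this.supplier = supplier;
  }

  /** Builds the value on first use; later calls reuse the same instance. */
  T get() {
    T result = instance;
    if (result == null) {
      synchronized (this) {
        result = instance;
        if (result == null) {
          instance = result = supplier.get();
        }
      }
    }
    return result;
  }
}

// Usage sketch inside the servlet:
//   holder = new LazyHolder<>(() -> new LogAggregationFileControllerFactory(conf));
//   ... and only the request paths that really need the factory call holder.get().
{code}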






[jira] [Commented] (YARN-10284) Add lazy initialization of LogAggregationFileControllerFactory in LogServlet

2020-05-28 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118621#comment-17118621
 ] 

Adam Antal commented on YARN-10284:
---

Thanks for the comment [~gandras], I agree. Modified the patch accordingly and 
added a unit test.

> Add lazy initialization of LogAggregationFileControllerFactory in LogServlet
> 
>
> Key: YARN-10284
> URL: https://issues.apache.org/jira/browse/YARN-10284
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation, yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Major
> Attachments: YARN-10284.001.patch, YARN-10284.002.patch
>
>
> Suppose the {{mapred}} user has no access to the remote folder. Pinging the 
> JHS every few seconds to check whether it is online will produce the 
> following log entry:
> {noformat}
> 2020-05-19 00:17:20,331 WARN 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController:
>  Unable to determine if the filesystem supports append operation
> java.nio.file.AccessDeniedException: test-bucket: 
> org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: There is no mapped role 
> for the group(s) associated with the authenticated user. (user: mapred)
>   at 
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:204)
> [...]
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:513)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.getRollOverLogMaxSize(LogAggregationIndexedFileController.java:1157)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initInternal(LogAggregationIndexedFileController.java:149)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileController.initialize(LogAggregationFileController.java:135)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.LogAggregationFileControllerFactory.(LogAggregationFileControllerFactory.java:139)
>   at 
> org.apache.hadoop.yarn.server.webapp.LogServlet.(LogServlet.java:66)
>   at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.HsWebServices.(HsWebServices.java:99)
>   at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.HsWebServices$$FastClassByGuice$$1eb8d5d6.newInstance()
>   at 
> com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
> [...]
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> We should only create the {{LogAggregationFactory}} instance when we actually 
> need it, not every time the {{LogServlet}} object is instantiated (so 
> definitely not in the constructor). In this way we prevent pressure on the 
> S3A auth side, especially if the authentication request is a costly operation.






[jira] [Updated] (YARN-10294) NodeManager shows a wrong reason when a YARN service fails to start

2020-05-28 Thread YCozy (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YCozy updated YARN-10294:
-
Description: 
We have a YARN cluster and try to start a sleeper service. A NodeManager NM1 
gets assigned and tries to start the service. We can see from its log:
{noformat}
2020-05-28 14:48:18,650 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler:
 Starting container [container_6_0001_01_01]
2020-05-28 14:48:18,710 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_6_0001_01_01 transitioned from SCHEDULED to 
RUNNING{noformat}
Due to some misconfiguration, the container fails to start. We can also see 
from the container's serviceam.log:
{noformat}
2020-05-28 14:48:56,651 [Curator-Framework-0] ERROR imps.CuratorFrameworkImpl - 
Background retry gave up
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = 
ConnectionLoss
  at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:972)
  at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:943)
  at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:66)
  at 
org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:346)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)                   
 
  at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
  at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
2020-05-28 14:49:04,621 [pool-5-thread-1] ERROR service.ServiceScheduler - 
Failed to register app sleeper1 in registry
org.apache.hadoop.registry.client.exceptions.RegistryIOException: 
`/registry/users/root/services/yarn-service': Failure of mkdir()  on 
/registry/users/root/services/yarn-service: 
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for /registry/users/root/services/yarn-service: KeeperErrorCode 
= ConnectionLoss for /registry/users/root/services/yarn-service
  at 
org.apache.hadoop.registry.client.impl.zk.CuratorService.operationFailure(CuratorService.java:440)
  at 
org.apache.hadoop.registry.client.impl.zk.CuratorService.zkMkPath(CuratorService.java:595)
  at 
org.apache.hadoop.registry.client.impl.zk.RegistryOperationsService.mknode(RegistryOperationsService.java:99)
  at 
org.apache.hadoop.yarn.service.registry.YarnRegistryViewForProviders.putService(YarnRegistryViewForProviders.java:194)
  at 
org.apache.hadoop.yarn.service.registry.YarnRegistryViewForProviders.registerSelf(YarnRegistryViewForProviders.java:210)
  at 
org.apache.hadoop.yarn.service.ServiceScheduler$2.run(ServiceScheduler.java:575)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)    
 
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)                   
 
  at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
  at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)                                      
 
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for /registry/users/root/services/yarn-service
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)      
 
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)       
 
  at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:1637)                 
 
  at 
org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1180)
  at 
org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1156)
  at 
org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64)
  at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)             
 
  at 
org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:1153)
                                                                                
                                                                                
  at 

[jira] [Comment Edited] (YARN-9930) Support max running app logic for CapacityScheduler

2020-05-28 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118899#comment-17118899
 ] 

Peter Bacsko edited comment on YARN-9930 at 5/28/20, 5:09 PM:
--

Created a POC based on the solution that exists in FS. No tests yet at all.

Note that I copy-pasted {{MaxRunningAppsEnforcer}}. I started to refactor it so 
that a single class could serve both FS and CS, but it required way too many 
changes. The class is heavily tied to FS, so I created 
{{CSMaxRunningAppsEnforcer}}.



was (Author: pbacsko):
Created a POC based on the solution exists in FS. No tests yet at all.

Note that I copy-pasted {{MaxRunningAppsEnforcer}}. I started to refactor it so 
that a single class could serve both FS and CS but the it required way too many 
changes. The class is heavily tied to FS. So I created 
{{CSMaxRunningAppsEnforcer}}.


> Support max running app logic for CapacityScheduler
> ---
>
> Key: YARN-9930
> URL: https://issues.apache.org/jira/browse/YARN-9930
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 3.1.0, 3.1.1
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9930-POC01.patch
>
>
> In FairScheduler there is a max-running-apps limit which keeps excess 
> applications pending.
> But CapacityScheduler has no such max-running-apps feature; it only has a 
> max-applications limit, and jobs beyond it are rejected directly on the client.
> In this jira I want to implement the same semantics for CapacityScheduler.
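For context, a toy sketch of the max-running-apps semantics described above. The 
class and method names are made up for illustration; this is not the POC patch:

{code}
// Hypothetical sketch: applications submitted beyond the per-queue running limit
// stay pending (instead of being rejected) and are activated as running apps finish.
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

final class MaxRunningAppsSketch {
  private final int maxRunningPerQueue;
  private final Map<String, Integer> running = new HashMap<>();
  private final Map<String, Queue<String>> pending = new HashMap<>();

  MaxRunningAppsSketch(int maxRunningPerQueue) {
    this.maxRunningPerQueue = maxRunningPerQueue;
  }

  /** Returns true if the app can run now; otherwise it is parked as pending. */
  boolean submit(String queue, String appId) {
    if (running.getOrDefault(queue, 0) < maxRunningPerQueue) {
      running.merge(queue, 1, Integer::sum);
      return true;
    }
    pending.computeIfAbsent(queue, q -> new ArrayDeque<>()).add(appId);
    return false;
  }

  /** Called when a running app finishes; activates the oldest pending app, if any. */
  String appFinished(String queue) {
    running.merge(queue, -1, Integer::sum);
    Queue<String> waiting = pending.get(queue);
    if (waiting != null && !waiting.isEmpty()
        && running.getOrDefault(queue, 0) < maxRunningPerQueue) {
      running.merge(queue, 1, Integer::sum);
      return waiting.poll();   // moves from pending to running
    }
    return null;
  }
}
{code}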






[jira] [Commented] (YARN-10274) Merge QueueMapping and QueueMappingEntity

2020-05-28 Thread Gergely Pollak (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118651#comment-17118651
 ] 

Gergely Pollak commented on YARN-10274:
---

Kept QueueMapping and replaced all uses of QueueMappingEntity with 
QueueMapping. The main difference between the two classes is that QueueMapping 
uses a builder to create an object, while QueueMappingEntity had multiple 
constructors. For now I've kept the builder approach, but it might need a 
revisit at a later point: the builder object is created first and then creates 
the actual object, which means we store all of the data twice for an object 
with only about 4 fields. This might be a bit of an overkill.
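To make that concern concrete, a simplified sketch of the shape in question 
(field names are placeholders, not the real QueueMapping fields):

{code}
// Hypothetical sketch: the builder mirrors every field of the built object, so the
// data of a ~4-field value object is effectively held twice while it is created.
final class MappingSketch {
  private final String type;
  private final String source;
  private final String queue;
  private final String parentQueue;

  private MappingSketch(Builder b) {
    this.type = b.type;
    this.source = b.source;
    this.queue = b.queue;
    this.parentQueue = b.parentQueue;
  }

  static final class Builder {
    private String type;         // duplicated storage ...
    private String source;
    private String queue;
    private String parentQueue;  // ... for only four fields

    Builder type(String t) { this.type = t; return this; }
    Builder source(String s) { this.source = s; return this; }
    Builder queue(String q) { this.queue = q; return this; }
    Builder parentQueue(String p) { this.parentQueue = p; return this; }

    MappingSketch build() { return new MappingSketch(this); }
  }
}
{code}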

> Merge QueueMapping and QueueMappingEntity
> -
>
> Key: YARN-10274
> URL: https://issues.apache.org/jira/browse/YARN-10274
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
>
> The role, usage and internal behaviour of these classes are almost identical, 
> so it makes no sense to keep both of them. One is used by UserGroup placement 
> rule definitions, the other by Application placement rules.






[jira] [Updated] (YARN-10274) Merge QueueMapping and QueueMappingEntity

2020-05-28 Thread Gergely Pollak (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Pollak updated YARN-10274:
--
Attachment: YARN-10274.001.patch

> Merge QueueMapping and QueueMappingEntity
> -
>
> Key: YARN-10274
> URL: https://issues.apache.org/jira/browse/YARN-10274
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: yarn
>Reporter: Gergely Pollak
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-10274.001.patch
>
>
> The role, usage and internal behaviour of these classes are almost identical, 
> so it makes no sense to keep both of them. One is used by UserGroup placement 
> rule definitions, the other by Application placement rules.






[jira] [Commented] (YARN-10287) Update scheduler-conf corrupts the CS configuration when removing queue which is referred in queue mapping

2020-05-28 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118701#comment-17118701
 ] 

Hadoop QA commented on YARN-10287:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 32m 
57s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
1s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
21m 28s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  2m 
49s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
47s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 97m 
16s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}207m  3s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-YARN-Build/26072/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-10287 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13004240/YARN-10287-001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux 1d9125b99422 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 
10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 9b38be43c63 |
| Default Java | Private Build-1.8.0_252-8u252-b09-1~18.04-b09 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/26072/testReport/ |
| Max. process+thread count | 838 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 

[jira] [Commented] (YARN-9460) QueueACLsManager and ReservationsACLManager should not use instanceof checks

2020-05-28 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118868#comment-17118868
 ] 

Bilwa S T commented on YARN-9460:
-

[~snemeth]
Please review when you have free time

> QueueACLsManager and ReservationsACLManager should not use instanceof checks
> 
>
> Key: YARN-9460
> URL: https://issues.apache.org/jira/browse/YARN-9460
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-9460.001.patch, YARN-9460.002.patch
>
>
> QueueACLsManager and ReservationsACLManager should not use instanceof checks 
> for the scheduler type.
> Rather, we should abstract this into two classes: Capacity and Fair variants 
> of these ACL classes.
> QueueACLsManager and ReservationsACLManager could be abstract classes, but 
> the implementation is up to whoever takes up this jira.
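A minimal sketch of the suggested direction (all names below are illustrative; 
the actual split is left to whoever implements the jira):

{code}
// Hypothetical sketch: replace instanceof checks on the scheduler type with
// scheduler-specific subclasses chosen once, at construction time.
abstract class QueueACLsManagerSketch {
  abstract boolean checkAccess(String user, String queue);

  static QueueACLsManagerSketch forScheduler(String schedulerClassName) {
    // The decision is made once here instead of instanceof checks in every method.
    if (schedulerClassName.contains("CapacityScheduler")) {
      return new CapacityVariant();
    }
    return new FairVariant();
  }

  static final class CapacityVariant extends QueueACLsManagerSketch {
    @Override
    boolean checkAccess(String user, String queue) {
      return true;   // CS-specific ACL lookup would go here
    }
  }

  static final class FairVariant extends QueueACLsManagerSketch {
    @Override
    boolean checkAccess(String user, String queue) {
      return true;   // FS-specific ACL lookup would go here
    }
  }
}
{code}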






[jira] [Commented] (YARN-9930) Support max running app logic for CapacityScheduler

2020-05-28 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118899#comment-17118899
 ] 

Peter Bacsko commented on YARN-9930:


Created a POC based on the solution exists in FS. No tests yet at all.

Note that I copy-pasted {{MaxRunningAppsEnforcer}}. I started to refactor it so 
that a single class could serve both FS and CS but it just got too big. The 
class is heavily tied to FS. So I created {{CSMaxRunningAppsEnforcer}}.


> Support max running app logic for CapacityScheduler
> ---
>
> Key: YARN-9930
> URL: https://issues.apache.org/jira/browse/YARN-9930
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 3.1.0, 3.1.1
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9930-POC01.patch
>
>
> In FairScheduler there is a max-running-apps limit which keeps excess 
> applications pending.
> But CapacityScheduler has no such max-running-apps feature; it only has a 
> max-applications limit, and jobs beyond it are rejected directly on the client.
> In this jira I want to implement the same semantics for CapacityScheduler.






[jira] [Commented] (YARN-10284) Add lazy initialization of LogAggregationFileControllerFactory in LogServlet

2020-05-28 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118704#comment-17118704
 ] 

Hadoop QA commented on YARN-10284:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
56s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 54s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
28s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 20s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common: 
The patch generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 55s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
40s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 68m 16s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-YARN-Build/26074/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-10284 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13004250/YARN-10284.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux 3647f35e336b 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 9b38be43c63 |
| Default Java | Private Build-1.8.0_252-8u252-b09-1~18.04-b09 |
| checkstyle | 

[jira] [Commented] (YARN-10293) Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement (YARN-10259)

2020-05-28 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118750#comment-17118750
 ] 

Hadoop QA commented on YARN-10293:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
42s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
39s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 40s{color} 
| {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager
 generated 1 new + 27 unchanged - 0 fixed = 28 total (was 27) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 32s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 21 new + 98 unchanged - 0 fixed = 119 total (was 98) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 91m 34s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}159m 18s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-YARN-Build/26073/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-10293 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13004248/YARN-10293-001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux 52a829c00f0d 4.15.0-91-generic #92-Ubuntu SMP Fri Feb 28 
11:09:48 UTC 2020 

[jira] [Commented] (YARN-8047) RMWebApp make external class pluggable

2020-05-28 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118865#comment-17118865
 ] 

Bilwa S T commented on YARN-8047:
-

[~sunilg] I have updated the patch. Please check.

> RMWebApp make external class pluggable
> --
>
> Key: YARN-8047
> URL: https://issues.apache.org/jira/browse/YARN-8047
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin Chundatt
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: YARN-8047-001.patch, YARN-8047-002.patch, 
> YARN-8047-003.patch, YARN-8047.004.patch, YARN-8047.005.patch
>
>
> This Jira should make sure we are able to plug in web services and web pages 
> of the scheduler in the ResourceManager (see the sketch below):
> * RMWebApp should allow binding external classes
> * RMController should allow plugging in scheduler classes
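
A minimal, hypothetical sketch of the Guice-style binding pattern this asks for 
(the class and property choices below are made up for illustration; this is not 
the actual RMWebApp API or the attached patches):

{code}
// Hypothetical sketch only: a caller-chosen class is bound into the web app
// module, so an external scheduler UI / web service class can be plugged in.
import com.google.inject.AbstractModule;
import com.google.inject.Guice;
import com.google.inject.Injector;

interface RMWebPages { String overview(); }          // stand-in for a controller

class DefaultPages implements RMWebPages {           // shipped default
  public String overview() { return "default scheduler page"; }
}

class CustomSchedulerPages implements RMWebPages {   // externally supplied class
  public String overview() { return "custom scheduler page"; }
}

class PluggableWebAppModule extends AbstractModule {
  private final Class<? extends RMWebPages> pagesClass;

  PluggableWebAppModule(Class<? extends RMWebPages> pagesClass) {
    this.pagesClass = pagesClass;
  }

  @Override
  protected void configure() {
    // In a real RM this class could be read from a yarn-site property.
    bind(RMWebPages.class).to(pagesClass);
  }
}

public class PluggableWebAppDemo {
  public static void main(String[] args) {
    Injector injector =
        Guice.createInjector(new PluggableWebAppModule(CustomSchedulerPages.class));
    System.out.println(injector.getInstance(RMWebPages.class).overview());
  }
}
{code}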



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9930) Support max running app logic for CapacityScheduler

2020-05-28 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118899#comment-17118899
 ] 

Peter Bacsko edited comment on YARN-9930 at 5/28/20, 5:03 PM:
--

Created a POC based on the solution that exists in FS. No tests yet at all.

Note that I copy-pasted {{MaxRunningAppsEnforcer}}. I started to refactor it so 
that a single class could serve both FS and CS, but it required way too many 
changes. The class is heavily tied to FS. So I created 
{{CSMaxRunningAppsEnforcer}}.
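
For readers who have not seen the FS enforcer, here is a stripped-down, 
hypothetical sketch of the core idea (not the POC patch; all names below are 
invented): track running apps per queue, park submissions above the limit as 
pending, and promote a pending app whenever a running one finishes.

{code}
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

class MaxRunningAppsEnforcerSketch {
  private final Map<String, Integer> maxRunningPerQueue = new HashMap<>();
  private final Map<String, Integer> runningPerQueue = new HashMap<>();
  private final Map<String, Queue<String>> pendingPerQueue = new HashMap<>();

  void setLimit(String queue, int maxRunning) {
    maxRunningPerQueue.put(queue, maxRunning);
  }

  /** Returns true if the app may run now; otherwise it is parked as pending. */
  boolean trySubmit(String queue, String appId) {
    int limit = maxRunningPerQueue.getOrDefault(queue, Integer.MAX_VALUE);
    int running = runningPerQueue.getOrDefault(queue, 0);
    if (running < limit) {
      runningPerQueue.put(queue, running + 1);
      return true;
    }
    // Instead of rejecting, the app stays pending until a slot frees up.
    pendingPerQueue.computeIfAbsent(queue, q -> new ArrayDeque<>()).add(appId);
    return false;
  }

  /** Called when an app finishes: free a slot and promote the oldest pending app. */
  String onAppFinished(String queue) {
    runningPerQueue.merge(queue, -1, Integer::sum);
    Queue<String> pending = pendingPerQueue.get(queue);
    if (pending != null && !pending.isEmpty()) {
      runningPerQueue.merge(queue, 1, Integer::sum);
      return pending.poll();  // this app can now move from pending to running
    }
    return null;
  }
}
{code}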



was (Author: pbacsko):
Created a POC based on the solution that exists in FS. No tests yet at all.

Note that I copy-pasted {{MaxRunningAppsEnforcer}}. I started to refactor it so 
that a single class could serve both FS and CS but it just got too big. The 
class is heavily tied to FS. So I created {{CSMaxRunningAppsEnforcer}}.


> Support max running app logic for CapacityScheduler
> ---
>
> Key: YARN-9930
> URL: https://issues.apache.org/jira/browse/YARN-9930
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 3.1.0, 3.1.1
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9930-POC01.patch
>
>
> In FairScheduler, there is a max-running-apps limit which lets excess 
> applications wait in a pending state.
> But CapacityScheduler has no such max-running-apps feature; it only has a 
> max-applications cap, and jobs beyond it are rejected directly on the client.
> In this Jira I want to implement the same semantics for CapacityScheduler.
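
To make the difference concrete, here is a minimal, illustrative comparison 
(queue name and values are examples only): FS caps concurrently running apps 
per queue in the allocation file and lets the rest wait, while CS only caps 
accepted apps via maximum-applications and rejects anything beyond that at 
submission time.

{code}
<!-- Fair Scheduler allocation file: the 11th app waits as pending -->
<allocations>
  <queue name="analytics">
    <maxRunningApps>10</maxRunningApps>
  </queue>
</allocations>

<!-- Capacity Scheduler (capacity-scheduler.xml): the 11th app is rejected -->
<property>
  <name>yarn.scheduler.capacity.root.analytics.maximum-applications</name>
  <value>10</value>
</property>
{code}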



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9930) Support max running app logic for CapacityScheduler

2020-05-28 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9930:
---
Attachment: YARN-9930-POC01.patch

> Support max running app logic for CapacityScheduler
> ---
>
> Key: YARN-9930
> URL: https://issues.apache.org/jira/browse/YARN-9930
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 3.1.0, 3.1.1
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
> Attachments: YARN-9930-POC01.patch
>
>
> In FairScheduler, there is a max-running-apps limit which lets excess 
> applications wait in a pending state.
> But CapacityScheduler has no such max-running-apps feature; it only has a 
> max-applications cap, and jobs beyond it are rejected directly on the client.
> In this Jira I want to implement the same semantics for CapacityScheduler.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9196) Attempt started time zone and Application started time zone is different when OS time zone is modified

2020-05-28 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118903#comment-17118903
 ] 

Bilwa S T commented on YARN-9196:
-

Hi [~snemeth]

I have updated the patch. Please take a look at it. Thanks

> Attempt started time zone and Application started time zone is different when 
> OS time zone is modified
> --
>
> Key: YARN-9196
> URL: https://issues.apache.org/jira/browse/YARN-9196
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-9196-001.patch, YARN-9196.002.patch
>
>
> On the RM application page, the attempt start time is formatted on the client 
> side (browser), but the application start time is formatted by the server.
> If the client time zone and the server time zone are different, then on the UI 
> the application start time and the attempt start time will be shown in 
> different formats.
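
A minimal, hypothetical illustration of the mismatch (not the patch itself): 
formatting the same start-time millis in two different zones, the way a 
server-side renderer and a browser might, yields two different strings.

{code}
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class StartTimeFormatDemo {
  public static void main(String[] args) {
    long startTimeMillis = 1590678000000L;  // example application start time
    DateTimeFormatter fmt =
        DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss zzz yyyy");

    // "Server-side" rendering: whatever zone the RM's OS is configured with.
    String serverSide = fmt.format(
        Instant.ofEpochMilli(startTimeMillis).atZone(ZoneId.of("UTC")));

    // "Client-side" rendering: the browser formats the raw millis in its zone.
    String clientSide = fmt.format(
        Instant.ofEpochMilli(startTimeMillis).atZone(ZoneId.of("Asia/Kolkata")));

    System.out.println("application start time (server): " + serverSide);
    System.out.println("attempt start time (client):     " + clientSide);
    // Formatting both fields in the same place (e.g. always client-side from
    // the raw timestamp) keeps the two consistent.
  }
}
{code}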



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9196) Attempt started time zone and Application started time zone is different when OS time zone is modified

2020-05-28 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-9196:

Attachment: YARN-9196.002.patch

> Attempt started time zone and Application started time zone is different when 
> OS time zone is modified
> --
>
> Key: YARN-9196
> URL: https://issues.apache.org/jira/browse/YARN-9196
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-9196-001.patch, YARN-9196.002.patch
>
>
> On the RM application page, the attempt start time is formatted on the client 
> side (browser), but the application start time is formatted by the server.
> If the client time zone and the server time zone are different, then on the UI 
> the application start time and the attempt start time will be shown in 
> different formats.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10274) Merge QueueMapping and QueueMappingEntity

2020-05-28 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118866#comment-17118866
 ] 

Hadoop QA commented on YARN-10274:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 21m 
31s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 27s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
38s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
36s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 32s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 5 new + 64 unchanged - 0 fixed = 69 total (was 64) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 33s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
40s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 89m 14s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}168m 58s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption |
|   | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-YARN-Build/26075/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-10274 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13004258/YARN-10274.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux 26179c9541c4 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 

[jira] [Created] (YARN-10294) NodeManager shows a wrong reason when a YARN service fails to start

2020-05-28 Thread YCozy (Jira)
YCozy created YARN-10294:


 Summary: NodeManager shows a wrong reason when a YARN service 
fails to start
 Key: YARN-10294
 URL: https://issues.apache.org/jira/browse/YARN-10294
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.3.0
Reporter: YCozy


We have a YARN cluster and try to start a sleeper service. A NodeManager NM1 
gets assigned and tries to start the service. We can see from its log:
{noformat}
2020-05-28 14:48:18,650 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler:
 Starting container [container_6_0001_01_01]
2020-05-28 14:48:18,710 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_6_0001_01_01 transitioned from SCHEDULED to 
RUNNING{noformat}
Due to some misconfiguration, the container fails to start. We can also see 
from the container's serviceam.log:
{noformat}
2020-05-28 14:48:56,651 [Curator-Framework-0] ERROR imps.CuratorFrameworkImpl - 
Background retry gave up
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = 
ConnectionLoss
  at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:972)
  at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:943)
  at 
org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:66)
  at 
org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:346)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)                   
 
  at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
  at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
2020-05-28 14:49:04,621 [pool-5-thread-1] ERROR service.ServiceScheduler - 
Failed to register app sleeper1 in registry
org.apache.hadoop.registry.client.exceptions.RegistryIOException: 
`/registry/users/root/services/yarn-service': Failure of mkdir()  on 
/registry/users/root/services/yarn-service: 
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
       ConnectionLoss for /registry/users/root/services/yarn-service: 
KeeperErrorCode = ConnectionLoss for /registry/users/root/services/yarn-service
  at 
org.apache.hadoop.registry.client.impl.zk.CuratorService.operationFailure(CuratorService.java:440)
  at 
org.apache.hadoop.registry.client.impl.zk.CuratorService.zkMkPath(CuratorService.java:595)
  at 
org.apache.hadoop.registry.client.impl.zk.RegistryOperationsService.mknode(RegistryOperationsService.java:99)
  at 
org.apache.hadoop.yarn.service.registry.YarnRegistryViewForProviders.putService(YarnRegistryViewForProviders.java:194)
  at 
org.apache.hadoop.yarn.service.registry.YarnRegistryViewForProviders.registerSelf(YarnRegistryViewForProviders.java:210)
  at 
org.apache.hadoop.yarn.service.ServiceScheduler$2.run(ServiceScheduler.java:575)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)    
 
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)                   
 
  at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
  at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)                                      
 
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: 
KeeperErrorCode = ConnectionLoss for /registry/users/root/services/yarn-service
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)      
 
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)       
 
  at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:1637)                 
 
  at 
org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1180)
  at 
org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1156)
  at 
org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64)
  at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)             
 
  at 
org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:1153)
                 

[jira] [Commented] (YARN-9196) Attempt started time zone and Application started time zone is different when OS time zone is modified

2020-05-28 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119070#comment-17119070
 ] 

Hadoop QA commented on YARN-9196:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 49s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
0s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
59s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
41s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
27s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
50s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 27s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 3 new + 86 unchanged - 1 fixed = 89 total (was 87) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 16s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
18s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
51s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 22m 
38s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 88m  
5s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
49s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}220m  3s{color} | 
{color:black} {color} |
\\

[jira] [Updated] (YARN-10251) Show extended resources on legacy RM UI.

2020-05-28 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-10251:
--
Attachment: YARN-10251.branch-2.10.001.patch

> Show extended resources on legacy RM UI.
> 
>
> Key: YARN-10251
> URL: https://issues.apache.org/jira/browse/YARN-10251
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: Legacy RM UI With Not All Resources Shown.png, Updated 
> Legacy RM UI With All Resources Shown.png, YARN-10251.branch-2.10.001.patch
>
>
> It would be great to update the legacy RM UI to include GPU resources in the 
> overview and in the per-app sections.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9930) Support max running app logic for CapacityScheduler

2020-05-28 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119161#comment-17119161
 ] 

Hadoop QA commented on YARN-9930:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 31m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m  6s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
43s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
40s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 35s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 16 new + 269 unchanged - 0 fixed = 285 total (was 269) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 19s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
43s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}306m 15s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}382m 37s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestSchedulingRequestContainerAllocationAsync
 |
|   | hadoop.yarn.server.resourcemanager.scheduler.TestAbstractYarnScheduler |
|   | hadoop.yarn.server.resourcemanager.TestApplicationMasterServiceFair |
|   | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerAutoQueueCreation
 |
|   | hadoop.yarn.server.resourcemanager.TestSignalContainer |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerMultiNodes
 |
|   | hadoop.yarn.server.resourcemanager.TestContainerResourceUsage |
|