[jira] [Assigned] (YARN-9565) RMAppImpl#ranNodes not cleared on FinalTransition

2019-05-22 Thread Bilwa S T (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T reassigned YARN-9565:
---

Assignee: Bilwa S T

> RMAppImpl#ranNodes not cleared on FinalTransition
> -
>
> Key: YARN-9565
> URL: https://issues.apache.org/jira/browse/YARN-9565
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Major
>
> RMAppImpl holds the list of nodes on which containers ran, but the list is
> never cleared.
> This could cause a memory leak.






[jira] [Commented] (YARN-9576) ResourceUsageMultiNodeLookupPolicy may cause Application starve forever

2019-05-22 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846407#comment-16846407
 ] 

Tao Yang commented on YARN-9576:


Thanks [~jutia] for raising this issue.
I think it's indeed a problem. The reservation mechanism, including re-reservation,
is applicable and good enough for the heartbeat-driven scheduling process. But for
the global scheduler, where all nodes can be sorted in parallel and taken as
candidates in each scheduling cycle, the reservation mechanism should evolve too,
which needs more discussion. A simple solution, I think, is to keep counting
re-reservations for the request according to the current logic but skip generating
the reservation proposal, so that the scheduler still has a chance to look at the
following candidates for this request. Thoughts?
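
To make the arithmetic concrete: each 8G node running a 6G container has only 2G
free, which is less than the 3G JobB needs, so no node can ever satisfy the request
without preemption. Below is a minimal, self-contained sketch of the "count
re-reservations but skip the reservation proposal" idea; all names, thresholds and
numbers are illustrative assumptions, not CapacityScheduler code.
{code:java}
import java.util.Arrays;

public class ReservationSketch {
  public static void main(String[] args) {
    int[] freeMb = new int[10];
    Arrays.fill(freeMb, 2048);            // 8G nodes minus the 6G already allocated
    int requestMb = 3072;                 // JobB's pending container
    int[] reReservations = new int[10];   // hypothetical per-node counter

    for (int cycle = 0; cycle < 3; cycle++) {       // a few scheduling cycles
      for (int n = 0; n < freeMb.length; n++) {     // candidates in policy order h1..h10
        if (freeMb[n] >= requestMb) {
          System.out.println("allocate on h" + (n + 1));
          return;
        }
        reReservations[n]++;   // keep counting per the current logic...
        // ...but skip generating a reservation proposal, so the next
        // candidate node is still examined in this cycle.
      }
    }
    System.out.println("no fit; re-reservations: " + Arrays.toString(reReservations));
  }
}
{code}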

>  ResourceUsageMultiNodeLookupPolicy may cause Application starve forever
> 
>
> Key: YARN-9576
> URL: https://issues.apache.org/jira/browse/YARN-9576
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: tianjuan
>Assignee: tianjuan
>Priority: Major
>
> It seems that ResourceUsageMultiNodeLookupPolicy in YARN-7494 may cause an
> application to starve forever.
> For example, there are 10 nodes (h1, h2, ... h9, h10) in the cluster, each with
> 8G of memory, and two queues A and B, each configured with 50% capacity.
> First, 10 jobs (each requesting 6G of resource) are submitted to queue A, and
> each of the 10 nodes gets one container allocated.
> Afterwards, another job, JobB, which requests 3G of resource, is submitted to
> queue B, and one container of 3G is reserved on node h1.
> With ResourceUsageMultiNodeLookupPolicy, the node order will always be
> h1, h2, ... h9, h10, so a container is always re-reserved on node h1, no other
> reservation happens, and JobB hangs forever.






[jira] [Commented] (YARN-8625) Aggregate Resource Allocation for each job is not present in ATS

2019-05-22 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846400#comment-16846400
 ] 

Prabhu Joseph commented on YARN-8625:
-

Thanks [~eepayne] for the confirmation. I think branch-2.7 is still fine.

> Aggregate Resource Allocation for each job is not present in ATS
> 
>
> Key: YARN-8625
> URL: https://issues.apache.org/jira/browse/YARN-8625
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Affects Versions: 2.7.4
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: 0001-YARN-8625.patch, 0002-YARN-8625.patch, 
> ApplicationHistoryServer_Rest_Api.png, ApplicationHistoryServer_UI.png, 
> yarn-site.xml
>
>
> The Aggregate Resource Allocation shown on the RM UI for a finished job is a
> very useful metric for understanding how much resource a job has consumed, but
> it does not get stored in ATS.






[jira] [Comment Edited] (YARN-8625) Aggregate Resource Allocation for each job is not present in ATS

2019-05-22 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846400#comment-16846400
 ] 

Prabhu Joseph edited comment on YARN-8625 at 5/23/19 3:20 AM:
--

Thanks [~eepayne] for the confirmation. I think up to branch-2.7 is fine.


was (Author: prabhu joseph):
Thanks [~eepayne] for the confirmation. I think branch-2.7 is still fine.

> Aggregate Resource Allocation for each job is not present in ATS
> 
>
> Key: YARN-8625
> URL: https://issues.apache.org/jira/browse/YARN-8625
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Affects Versions: 2.7.4
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: 0001-YARN-8625.patch, 0002-YARN-8625.patch, 
> ApplicationHistoryServer_Rest_Api.png, ApplicationHistoryServer_UI.png, 
> yarn-site.xml
>
>
> The Aggregate Resource Allocation shown on the RM UI for a finished job is a
> very useful metric for understanding how much resource a job has consumed, but
> it does not get stored in ATS.






[jira] [Comment Edited] (YARN-7494) Add multi-node lookup mechanism and pluggable nodes sorting policies to optimize placement decision

2019-05-22 Thread tianjuan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845536#comment-16845536
 ] 

tianjuan edited comment on YARN-7494 at 5/23/19 2:22 AM:
-

It seems that ResourceUsageMultiNodeLookupPolicy may cause an application to
starve forever.

For example, there are 10 nodes (h1, h2, ... h9, h10) in the cluster, each with 8G
of memory, and two queues A and B, each configured with 50% capacity.

First, 10 jobs (each requesting 6G of resource) are submitted to queue A, and each
of the 10 nodes gets one container allocated.

Afterwards, another job, JobB, which requests 3G of resource, is submitted to
queue B, and one container of 3G is reserved on node h1.

With ResourceUsageMultiNodeLookupPolicy, the node order will always be
h1, h2, ... h9, h10, so a container is always re-reserved on node h1, no other
reservation happens, and no preemption happens either, so JobB hangs forever.
[~sunilg], what's your thought on this situation?


was (Author: jutia):
It seems that ResourceUsageMultiNodeLookupPolicy may cause an application to
starve forever.

For example, there are 10 nodes (h1, h2, ... h9, h10) in the cluster, each with 8G
of memory, and two queues A and B, each configured with 50% capacity.

First, 10 jobs (each requesting 6G of resource) are submitted to queue A, and each
of the 10 nodes gets one container allocated.

Afterwards, another job, JobB, which requests 3G of resource, is submitted to
queue B, and one container of 3G is reserved on node h1.

With ResourceUsageMultiNodeLookupPolicy, the node order will always be
h1, h2, ... h9, h10, so a container is always re-reserved on node h1, no other
reservation happens, and no preemption happens either, so JobB hangs forever.
[~sunilg], what's your thought on this situation?

> Add multi-node lookup mechanism and pluggable nodes sorting policies to
> optimize placement decision
> --
>
> Key: YARN-7494
> URL: https://issues.apache.org/jira/browse/YARN-7494
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-7494.001.patch, YARN-7494.002.patch, 
> YARN-7494.003.patch, YARN-7494.004.patch, YARN-7494.005.patch, 
> YARN-7494.006.patch, YARN-7494.007.patch, YARN-7494.008.patch, 
> YARN-7494.009.patch, YARN-7494.010.patch, YARN-7494.11.patch, 
> YARN-7494.12.patch, YARN-7494.13.patch, YARN-7494.14.patch, 
> YARN-7494.15.patch, YARN-7494.16.patch, YARN-7494.17.patch, 
> YARN-7494.18.patch, YARN-7494.19.patch, YARN-7494.20.patch, 
> YARN-7494.v0.patch, YARN-7494.v1.patch, multi-node-designProposal.png
>
>
> Instead of a single node, for effectiveness we can consider a multi-node lookup
> based on partition to start with.






[jira] [Assigned] (YARN-9576) ResourceUsageMultiNodeLookupPolicy may cause Application starve forever

2019-05-22 Thread tianjuan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tianjuan reassigned YARN-9576:
--

Assignee: tianjuan

>  ResourceUsageMultiNodeLookupPolicy may cause Application starve forever
> 
>
> Key: YARN-9576
> URL: https://issues.apache.org/jira/browse/YARN-9576
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: tianjuan
>Assignee: tianjuan
>Priority: Major
>
> It seems that ResourceUsageMultiNodeLookupPolicy in YARN-7494 may cause an
> application to starve forever.
> For example, there are 10 nodes (h1, h2, ... h9, h10) in the cluster, each with
> 8G of memory, and two queues A and B, each configured with 50% capacity.
> First, 10 jobs (each requesting 6G of resource) are submitted to queue A, and
> each of the 10 nodes gets one container allocated.
> Afterwards, another job, JobB, which requests 3G of resource, is submitted to
> queue B, and one container of 3G is reserved on node h1.
> With ResourceUsageMultiNodeLookupPolicy, the node order will always be
> h1, h2, ... h9, h10, so a container is always re-reserved on node h1, no other
> reservation happens, and JobB hangs forever.






[jira] [Comment Edited] (YARN-9576) ResourceUsageMultiNodeLookupPolicy may cause Application starve forever

2019-05-22 Thread tianjuan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846351#comment-16846351
 ] 

tianjuan edited comment on YARN-9576 at 5/23/19 2:15 AM:
-

Wangda, thanks for your reply. But previously, with node heartbeat (or the async
scheduler), reservations in this case could happen on several nodes and finally
trigger preemption from queue A. With ResourceUsageMultiNodeLookupPolicy, since
the node order will always be h1, h2, ... h9, h10, a container is always
re-reserved on node h1, no other reservation happens on other nodes, and
preemption will never happen.


was (Author: jutia):
Wangda, thanks for your reply. But previously, with node heartbeat(or aysnc 
scheduler), in this case, reservation can happen on several nodes, and finally 
trigger preemption from queueA, but with ResourceUsageMultiNodeLookupPolicy,  
the order policy will always be h1,h2,..h9,h10, and there will always be one 
container re-reverved on node h1, no other reservation happen on other nodes, 
preemption will neven happen. 

>  ResourceUsageMultiNodeLookupPolicy may cause Application starve forever
> 
>
> Key: YARN-9576
> URL: https://issues.apache.org/jira/browse/YARN-9576
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: tianjuan
>Priority: Major
>
> It seems that ResourceUsageMultiNodeLookupPolicy in YARN-7494 may cause an
> application to starve forever.
> For example, there are 10 nodes (h1, h2, ... h9, h10) in the cluster, each with
> 8G of memory, and two queues A and B, each configured with 50% capacity.
> First, 10 jobs (each requesting 6G of resource) are submitted to queue A, and
> each of the 10 nodes gets one container allocated.
> Afterwards, another job, JobB, which requests 3G of resource, is submitted to
> queue B, and one container of 3G is reserved on node h1.
> With ResourceUsageMultiNodeLookupPolicy, the node order will always be
> h1, h2, ... h9, h10, so a container is always re-reserved on node h1, no other
> reservation happens, and JobB hangs forever.






[jira] [Updated] (YARN-9576) ResourceUsageMultiNodeLookupPolicy may cause Application starve forever

2019-05-22 Thread tianjuan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tianjuan updated YARN-9576:
---
Description: 
It seems that ResourceUsageMultiNodeLookupPolicy in YARN-7494 may cause an
application to starve forever.

For example, there are 10 nodes (h1, h2, ... h9, h10) in the cluster, each with 8G
of memory, and two queues A and B, each configured with 50% capacity.

First, 10 jobs (each requesting 6G of resource) are submitted to queue A, and each
of the 10 nodes gets one container allocated.

Afterwards, another job, JobB, which requests 3G of resource, is submitted to
queue B, and one container of 3G is reserved on node h1.

With ResourceUsageMultiNodeLookupPolicy, the node order will always be
h1, h2, ... h9, h10, so a container is always re-reserved on node h1, no other
reservation happens, and JobB hangs forever.

  was:
It seems that ResourceUsageMultiNodeLookupPolicy in YARN-7494 may cause an
application to starve forever.

For example, there are 10 nodes (h1, h2, ... h9, h10) in the cluster, each with 8G
of memory, and two queues A and B, each configured with 50% capacity.

First, 10 jobs (each requesting 6G of resource) are submitted to queue A, and each
of the 10 nodes gets one container allocated.

Afterwards, another job, JobB, which requests 3G of resource, is submitted to
queue B, and one container of 3G is reserved on node h1.

With ResourceUsageMultiNodeLookupPolicy, the node order will always be
h1, h2, ... h9, h10, so a container is always re-reserved on node h1, no other
reservation happens, and JobB hangs forever.


>  ResourceUsageMultiNodeLookupPolicy may cause Application starve forever
> 
>
> Key: YARN-9576
> URL: https://issues.apache.org/jira/browse/YARN-9576
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: tianjuan
>Priority: Major
>
> It seems that ResourceUsageMultiNodeLookupPolicy in YARN-7494 may cause an
> application to starve forever.
> For example, there are 10 nodes (h1, h2, ... h9, h10) in the cluster, each with
> 8G of memory, and two queues A and B, each configured with 50% capacity.
> First, 10 jobs (each requesting 6G of resource) are submitted to queue A, and
> each of the 10 nodes gets one container allocated.
> Afterwards, another job, JobB, which requests 3G of resource, is submitted to
> queue B, and one container of 3G is reserved on node h1.
> With ResourceUsageMultiNodeLookupPolicy, the node order will always be
> h1, h2, ... h9, h10, so a container is always re-reserved on node h1, no other
> reservation happens, and JobB hangs forever.






[jira] [Commented] (YARN-9576) ResourceUsageMultiNodeLookupPolicy may cause Application starve forever

2019-05-22 Thread tianjuan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846351#comment-16846351
 ] 

tianjuan commented on YARN-9576:


Wangda, thanks for your reply. But previously, with node heartbeat (or the async
scheduler), reservations in this case could happen on several nodes and finally
trigger preemption from queue A. With ResourceUsageMultiNodeLookupPolicy, the
node order will always be h1, h2, ... h9, h10, so a container is always
re-reserved on node h1, no other reservation happens on other nodes, and
preemption will never happen.

>  ResourceUsageMultiNodeLookupPolicy may cause Application starve forever
> 
>
> Key: YARN-9576
> URL: https://issues.apache.org/jira/browse/YARN-9576
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: tianjuan
>Priority: Major
>
> It seems that ResourceUsageMultiNodeLookupPolicy in YARN-7494 may cause an
> application to starve forever.
> For example, there are 10 nodes (h1, h2, ... h9, h10) in the cluster, each with
> 8G of memory, and two queues A and B, each configured with 50% capacity.
> First, 10 jobs (each requesting 6G of resource) are submitted to queue A, and
> each of the 10 nodes gets one container allocated.
> Afterwards, another job, JobB, which requests 3G of resource, is submitted to
> queue B, and one container of 3G is reserved on node h1.
> With ResourceUsageMultiNodeLookupPolicy, the node order will always be
> h1, h2, ... h9, h10, so a container is always re-reserved on node h1, no other
> reservation happens, and JobB hangs forever.






[jira] [Updated] (YARN-9576) ResourceUsageMultiNodeLookupPolicy may cause Application starve forever

2019-05-22 Thread tianjuan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tianjuan updated YARN-9576:
---
Description: 
It seems that ResourceUsageMultiNodeLookupPolicy in YARN-7494 may cause an
application to starve forever.

For example, there are 10 nodes (h1, h2, ... h9, h10) in the cluster, each with 8G
of memory, and two queues A and B, each configured with 50% capacity.

First, 10 jobs (each requesting 6G of resource) are submitted to queue A, and each
of the 10 nodes gets one container allocated.

Afterwards, another job, JobB, which requests 3G of resource, is submitted to
queue B, and one container of 3G is reserved on node h1.

With ResourceUsageMultiNodeLookupPolicy, the node order will always be
h1, h2, ... h9, h10, so a container is always re-reserved on node h1, no other
reservation happens, and JobB hangs forever.

  was:
It seems that ResourceUsageMultiNodeLookupPolicy in YARN-7494 may cause an
application to starve forever.

For example, there are 10 nodes (h1, h2, ... h9, h10) in the cluster, each with 8G
of memory, and two queues A and B, each configured with 50% capacity.

First, 10 jobs (each requesting 6G of resource) are submitted to queue A, and each
of the 10 nodes gets one container allocated.

Afterwards, another job, JobB, which requests 3G of resource, is submitted to
queue B, and one container of 3G is reserved on node h1.

With ResourceUsageMultiNodeLookupPolicy, the node order will always be
h1, h2, ... h9, h10, so a container is always re-reserved on node h1, no other
reservation happens, and JobB hangs forever. [~sunilg], what's your thought on
this situation?


>  ResourceUsageMultiNodeLookupPolicy may cause Application starve forever
> 
>
> Key: YARN-9576
> URL: https://issues.apache.org/jira/browse/YARN-9576
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: tianjuan
>Priority: Major
>
> It seems that ResourceUsageMultiNodeLookupPolicy in YARN-7494 may cause an
> application to starve forever.
> For example, there are 10 nodes (h1, h2, ... h9, h10) in the cluster, each with
> 8G of memory, and two queues A and B, each configured with 50% capacity.
> First, 10 jobs (each requesting 6G of resource) are submitted to queue A, and
> each of the 10 nodes gets one container allocated.
> Afterwards, another job, JobB, which requests 3G of resource, is submitted to
> queue B, and one container of 3G is reserved on node h1.
> With ResourceUsageMultiNodeLookupPolicy, the node order will always be
> h1, h2, ... h9, h10, so a container is always re-reserved on node h1, no other
> reservation happens, and JobB hangs forever.






[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-05-22 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846310#comment-16846310
 ] 

Eric Yang commented on YARN-9560:
-

# All of the node manager logic is marked with:
{code:java}
  @InterfaceAudience.Private
  @InterfaceStability.Unstable
{code}
to prevent javadoc generation for private APIs. The current change will expose
OCIContainerRuntime as a public API. This is a big commitment, and I am not sure
that YARN is ready to expose this abstract class as a reference implementation
(a sketch of the annotations follows this list).
 # It would be better to have the YARN sysfs logic as part of OCIContainerRuntime,
to ensure that we remind developers to implement the YARN sysfs API for their
runtime so it can expose cluster runtime configuration inside the container.
 # We don't have method-by-method comprehensive tests for DockerContainerRuntime.
It is impossible to detect whether the refactoring is based on the latest version
of DockerContainerRuntime, or whether the reordering of statements has any side
effect on the running code. What manual tests have been done on your side for
DockerContainerRuntime? I need some time to repeat the tests and add my own tests
to cover this.
 # Without unit test cases, more thorough testing needs to be conducted, which will
delay the commit and lead to repeated iterations to verify that the refactoring
matches the latest code. This is something that I would like to avoid as well.
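
For illustration, a minimal sketch of how the new abstract class could keep the
same audience/stability markers so it is not published as a public API. This is an
assumption about the intended shape of the refactoring, not the actual YARN-9560
patch.
{code:java}
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

// Hypothetical sketch only: keep the Private/Unstable markers on the abstract
// runtime so javadoc does not advertise it as a supported public API.
@InterfaceAudience.Private
@InterfaceStability.Unstable
public abstract class OCIContainerRuntime {
  // Shared Docker/runc logic would be pulled up here by the refactoring.
}
{code}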

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch
>
>
> Since the new OCI/squashFS/runc runtime will be using a lot of the same code 
> as DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - FSImageContainerRuntime (name negotiable)
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics






[jira] [Commented] (YARN-9561) Add C changes for the new OCI/squashfs/runc runtime

2019-05-22 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846239#comment-16846239
 ] 

Eric Badger commented on YARN-9561:
---

bq.  Can we have some test cases?
Yes, definitely. As I said above, we can't commit this until I add in some tests.
But I wanted to put up the patch so that people could give early feedback.

bq. Hadoop C is using K&R-style brace placement, cJSON code is using Allman-style
brace placement. It is a little confusing to see them in the same directory. Would
it be better to place cJSON files in a vendor sub-directory to ensure that we don't
make modifications to the source and it is easier to rebase to upstream?
Sure, that's fine with me. I'll make the change in the next patch.

> Add C changes for the new OCI/squashfs/runc runtime
> ---
>
> Key: YARN-9561
> URL: https://issues.apache.org/jira/browse/YARN-9561
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9561.001.patch, YARN-9561.002.patch
>
>
> This JIRA will be used to add the C changes to the container-executor native 
> binary that are necessary for the new OCI/squashFS/runc runtime. There should 
> be no changes to existing code paths. 






[jira] [Commented] (YARN-9521) RM failed to start due to system services

2019-05-22 Thread Tan, Wangda (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846230#comment-16846230
 ] 

Tan, Wangda commented on YARN-9521:
---

Thanks [~kyungwan nam] for the patch.

cc: [~rohithsharma], [~billie.rina...@gmail.com] as patch reviewers.

> RM failed to start due to system services
> 
>
> Key: YARN-9521
> URL: https://issues.apache.org/jira/browse/YARN-9521
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: kyungwan nam
>Priority: Major
> Attachments: YARN-9521.001.patch
>
>
> When starting the RM, listing the system services directory failed as follows:
> {code}
> 2019-04-30 17:18:25,441 INFO  client.SystemServiceManagerImpl 
> (SystemServiceManagerImpl.java:serviceInit(114)) - System Service Directory 
> is configured to /services
> 2019-04-30 17:18:25,467 INFO  client.SystemServiceManagerImpl 
> (SystemServiceManagerImpl.java:serviceInit(120)) - UserGroupInformation 
> initialized to yarn (auth:SIMPLE)
> 2019-04-30 17:18:25,467 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service ResourceManager failed in 
> state STARTED
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Filesystem closed
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:869)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1228)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1269)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1265)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1316)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1501)
> Caused by: java.io.IOException: Filesystem closed
> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:473)
> at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1639)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1217)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1233)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1200)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1179)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1175)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1187)
> at 
> org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.list(SystemServiceManagerImpl.java:375)
> at 
> org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.scanForUserServices(SystemServiceManagerImpl.java:282)
> at 
> org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.serviceStart(SystemServiceManagerImpl.java:126)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> ... 13 more
> {code}
> It looks like this is due to the use of the FileSystem cache.
> This issue does not happen when I add "fs.hdfs.impl.disable.cache=true" to
> yarn-site.
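
As a side note, a minimal sketch of the workaround mentioned above. The property
key follows Hadoop's fs.<scheme>.impl.disable.cache pattern; the path and the
standalone usage are illustrative, not the SystemServiceManagerImpl code.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsCacheWorkaroundSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // With the cache enabled, FileSystem.get() may return an instance shared with
    // other components; if any of them calls close(), later calls such as
    // listStatusIterator() fail with "java.io.IOException: Filesystem closed".
    conf.setBoolean("fs.hdfs.impl.disable.cache", true);  // the yarn-site workaround
    FileSystem fs = FileSystem.get(conf);                 // no longer a shared instance
    // FileSystem.newInstance(conf) is another way to bypass the cache per caller.
    fs.listStatusIterator(new Path("/services"));
    fs.close();
  }
}
{code}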






[jira] [Commented] (YARN-8625) Aggregate Resource Allocation for each job is not present in ATS

2019-05-22 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846228#comment-16846228
 ] 

Eric Payne commented on YARN-8625:
--

[~Prabhu Joseph], I was able to reproduce your test results in my test cluster. 
Thanks for your patience.

+1. The patch LGTM. I was hoping that the preservation of the resource usage 
seconds and preemption seconds would also be part of the File and Memory state 
stores, but that's for another JIRA.

I'll commit this tomorrow unless others have more input. The affected version is
2.7.4. How far back do you want this patch ported?

> Aggregate Resource Allocation for each job is not present in ATS
> 
>
> Key: YARN-8625
> URL: https://issues.apache.org/jira/browse/YARN-8625
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Affects Versions: 2.7.4
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: 0001-YARN-8625.patch, 0002-YARN-8625.patch, 
> ApplicationHistoryServer_Rest_Api.png, ApplicationHistoryServer_UI.png, 
> yarn-site.xml
>
>
> The Aggregate Resource Allocation shown on the RM UI for a finished job is a
> very useful metric for understanding how much resource a job has consumed, but
> it does not get stored in ATS.






[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-05-22 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846226#comment-16846226
 ] 

Eric Badger commented on YARN-9560:
---

Hey [~eyang], if {{YARN_CONTAINER_RUNTIME_TYPE}} is undefined, the code will 
act just like it did before this patch. So it will choose the default runtime 
since there isn't a specific runtime specified. 

We could add some test cases, but I'm not sure that's within the scope of this 
restructuring. This code shouldn't be changing how anything works, just 
changing the structure of it. If it is changing how things work, then that's 
something that I need to fix. 
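
As a rough illustration of that selection behavior (hypothetical names and
constants; this is not the actual node manager or DelegatingLinuxContainerRuntime
code), the env var simply routes the container to a specific runtime and falls
back to the default when unset:
{code:java}
import java.util.Collections;
import java.util.Map;

final class RuntimeSelectionSketch {
  static String pickRuntime(Map<String, String> env) {
    String type = env.get("YARN_CONTAINER_RUNTIME_TYPE");
    if (type == null || type.isEmpty()) {
      return "default";                 // unset: behave exactly as before the patch
    }
    switch (type.toLowerCase()) {
      case "docker": return "docker";   // DockerLinuxContainerRuntime
      case "runc":   return "runc";     // the new OCI/squashFS/runc runtime
      default:       return "default";  // plain process-based container
    }
  }

  public static void main(String[] args) {
    System.out.println(pickRuntime(Collections.<String, String>emptyMap()));
    System.out.println(pickRuntime(
        Collections.singletonMap("YARN_CONTAINER_RUNTIME_TYPE", "docker")));
  }
}
{code}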

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch
>
>
> Since the new OCI/squashFS/runc runtime will be using a lot of the same code 
> as DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - FSImageContainerRuntime (name negotiable)
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics






[jira] [Comment Edited] (YARN-9561) Add C changes for the new OCI/squashfs/runc runtime

2019-05-22 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846225#comment-16846225
 ] 

Eric Yang edited comment on YARN-9561 at 5/22/19 8:58 PM:
--

[~ebadger] Can we have some test cases? Hadoop C code uses K&R-style brace
placement, while the cJSON code uses Allman-style brace placement. It is a little
confusing to see them in the same directory. Would it be better to place the cJSON
files in a vendor sub-directory, to ensure that we don't modify the source and to
make it easier to rebase to upstream?


was (Author: eyang):
[~ebadger] Can we have some test cases? Hadoop C code uses K&R-style brace
placement, while the cJSON code uses Allman-style brace placement. It is a little
confusing to see them in the same directory. Would it be better to place the cJSON
files in a vendor sub-directory, to ensure that we don't modify the source and to
make it easier to rebase to upstream?

> Add C changes for the new OCI/squashfs/runc runtime
> ---
>
> Key: YARN-9561
> URL: https://issues.apache.org/jira/browse/YARN-9561
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9561.001.patch, YARN-9561.002.patch
>
>
> This JIRA will be used to add the C changes to the container-executor native 
> binary that are necessary for the new OCI/squashFS/runc runtime. There should 
> be no changes to existing code paths. 






[jira] [Commented] (YARN-9561) Add C changes for the new OCI/squashfs/runc runtime

2019-05-22 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846225#comment-16846225
 ] 

Eric Yang commented on YARN-9561:
-

[~ebadger] Can we have some test cases? Hadoop C code uses K&R-style brace
placement, while the cJSON code uses Allman-style brace placement. It is a little
confusing to see them in the same directory. Would it be better to place the cJSON
files in a vendor sub-directory, to ensure that we don't modify the source and to
make it easier to rebase to upstream?

> Add C changes for the new OCI/squashfs/runc runtime
> ---
>
> Key: YARN-9561
> URL: https://issues.apache.org/jira/browse/YARN-9561
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9561.001.patch, YARN-9561.002.patch
>
>
> This JIRA will be used to add the C changes to the container-executor native 
> binary that are necessary for the new OCI/squashFS/runc runtime. There should 
> be no changes to existing code paths. 






[jira] [Commented] (YARN-9576) ResourceUsageMultiNodeLookupPolicy may cause Application starve forever

2019-05-22 Thread Tan, Wangda (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846213#comment-16846213
 ] 

Tan, Wangda commented on YARN-9576:
---

[~jutia], actually this behavior is not caused by the multi-node lookup policy; it
is caused by resource fragmentation. There's no good solution for this except
queue-priority-based preemption. See YARN-5864.

>  ResourceUsageMultiNodeLookupPolicy may cause Application starve forever
> 
>
> Key: YARN-9576
> URL: https://issues.apache.org/jira/browse/YARN-9576
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: tianjuan
>Priority: Major
>
> It seems that ResourceUsageMultiNodeLookupPolicy in YARN-7494 may cause an
> application to starve forever.
> For example, there are 10 nodes (h1, h2, ... h9, h10) in the cluster, each with
> 8G of memory, and two queues A and B, each configured with 50% capacity.
> First, 10 jobs (each requesting 6G of resource) are submitted to queue A, and
> each of the 10 nodes gets one container allocated.
> Afterwards, another job, JobB, which requests 3G of resource, is submitted to
> queue B, and one container of 3G is reserved on node h1.
> With ResourceUsageMultiNodeLookupPolicy, the node order will always be
> h1, h2, ... h9, h10, so a container is always re-reserved on node h1, no other
> reservation happens, and JobB hangs forever. [~sunilg], what's your thought on
> this situation?






[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-05-22 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846209#comment-16846209
 ] 

Eric Yang commented on YARN-9560:
-

[~ebadger] If YARN_CONTAINER_RUNTIME_TYPE is undefined, will job submission
trigger OCIContainerRuntime?  Is there a distinction between an OCI container and
a regular Java-task-based container?  How is that controlled?  Can we include
some test cases to show how to toggle between the runtimes?

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch
>
>
> Since the new OCI/squashFS/runc runtime will be using a lot of the same code 
> as DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - FSImageContainerRuntime (name negotiable)
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics






[jira] [Commented] (YARN-9543) UI2 should handle missing ATSv2 gracefully

2019-05-22 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846197#comment-16846197
 ] 

Szilard Nemeth commented on YARN-9543:
--

Thanks [~akhilpb]!
Then from my side, patch LGTM (non-binding).
[~akhilpb]: Could you please take a look? 

Thanks!

> UI2 should handle missing ATSv2 gracefully
> --
>
> Key: YARN-9543
> URL: https://issues.apache.org/jira/browse/YARN-9543
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: ATSv2, yarn-ui-v2
>Affects Versions: 3.1.2
>Reporter: Zoltan Siegl
>Assignee: Zoltan Siegl
>Priority: Major
> Attachments: YARN-9543.001.patch, YARN-9543.002.patch
>
>
> Resource manager UI2 is throwing some console errors and an error page on the 
> flows page.
> Suggested improvements:
>  * Disable or remove the flows tab if ATSv2 is not available or not installed
>  * Handle all connection errors to ATSv2 gracefully






[jira] [Commented] (YARN-9563) Resource report REST API could return NaN or Inf

2019-05-22 Thread Ahmed Hussein (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846169#comment-16846169
 ] 

Ahmed Hussein commented on YARN-9563:
-

I noticed that TestLeafQueue is different across different YARN versions (e.g.,
2.8). My intuition was to keep the test code in case the implementation changes
and produces the same behavior we are trying to avoid.

> Resource report REST API could return NaN or Inf
> 
>
> Key: YARN-9563
> URL: https://issues.apache.org/jira/browse/YARN-9563
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: YARN-9563.001.patch, YARN-9563.002.patch
>
>
> The Resource Manager's Cluster Applications and Cluster Application REST APIs
> sometimes return invalid JSON. This was addressed in YARN-6082.
> However, that fix only corrects the calculation at one call site and does not
> guarantee the problem is avoided. Likewise, generating NaN/Inf can break the web
> GUI if the columns cannot render non-numeric values.
> The suggested fix is to check for NaN/Inf in the protobuf layer and replace
> NaN/Inf with 0.0f.
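
A minimal sketch of that kind of guard (illustrative only, not the YARN-9563 patch
itself; the actual change would live where the protobuf records are built):
{code:java}
// Sanitize float metrics before they are serialized, so NaN/Inf never reaches
// the JSON response or the web UI.
public final class FloatSanitizer {
  static float sanitize(float value) {
    return (Float.isNaN(value) || Float.isInfinite(value)) ? 0.0f : value;
  }

  public static void main(String[] args) {
    System.out.println(sanitize(0.0f / 0.0f));              // NaN -> 0.0
    System.out.println(sanitize(Float.POSITIVE_INFINITY));  // Inf -> 0.0
    System.out.println(sanitize(42.5f));                    // unchanged
  }
}
{code}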






[jira] [Commented] (YARN-9563) Resource report REST API could return NaN or Inf

2019-05-22 Thread Jonathan Eagles (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846162#comment-16846162
 ] 

Jonathan Eagles commented on YARN-9563:
---

Thanks for the updated patch [~ahussein]. One small thing is that the
TestLeafQueue test in patch 002 passes with or without the code in
SchedulerApplicationAttempt. Is the new test trying to exercise the new
functionality?

> Resource report REST API could return NaN or Inf
> 
>
> Key: YARN-9563
> URL: https://issues.apache.org/jira/browse/YARN-9563
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Minor
> Attachments: YARN-9563.001.patch, YARN-9563.002.patch
>
>
> The Resource Manager's Cluster Applications and Cluster Application REST APIs
> sometimes return invalid JSON. This was addressed in YARN-6082.
> However, that fix only corrects the calculation at one call site and does not
> guarantee the problem is avoided. Likewise, generating NaN/Inf can break the web
> GUI if the columns cannot render non-numeric values.
> The suggested fix is to check for NaN/Inf in the protobuf layer and replace
> NaN/Inf with 0.0f.






[jira] [Comment Edited] (YARN-9543) UI2 should handle missing ATSv2 gracefully

2019-05-22 Thread Akhil PB (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846098#comment-16846098
 ] 

Akhil PB edited comment on YARN-9543 at 5/22/19 6:59 PM:
-

Hi [~snemeth], the urlForQueryRecord function is an Ember.js adapter thing.


was (Author: akhilpb):
Hi [~snemeth], the urlForQueryRecord function is related to the Ember.js adapter.

> UI2 should handle missing ATSv2 gracefully
> --
>
> Key: YARN-9543
> URL: https://issues.apache.org/jira/browse/YARN-9543
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: ATSv2, yarn-ui-v2
>Affects Versions: 3.1.2
>Reporter: Zoltan Siegl
>Assignee: Zoltan Siegl
>Priority: Major
> Attachments: YARN-9543.001.patch, YARN-9543.002.patch
>
>
> Resource manager UI2 is throwing some console errors and an error page on the 
> flows page.
> Suggested improvements:
>  * Disable or remove the flows tab if ATSv2 is not available or not installed
>  * Handle all connection errors to ATSv2 gracefully






[jira] [Commented] (YARN-9543) UI2 should handle missing ATSv2 gracefully

2019-05-22 Thread Akhil PB (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846098#comment-16846098
 ] 

Akhil PB commented on YARN-9543:


Hi [~snemeth], the urlForQueryRecord function is related to the Ember.js adapter.

> UI2 should handle missing ATSv2 gracefully
> --
>
> Key: YARN-9543
> URL: https://issues.apache.org/jira/browse/YARN-9543
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: ATSv2, yarn-ui-v2
>Affects Versions: 3.1.2
>Reporter: Zoltan Siegl
>Assignee: Zoltan Siegl
>Priority: Major
> Attachments: YARN-9543.001.patch, YARN-9543.002.patch
>
>
> Resource manager UI2 is throwing some console errors and an error page on the 
> flows page.
> Suggested improvements:
>  * Disable or remove the flows tab if ATSv2 is not available or not installed
>  * Handle all connection errors to ATSv2 gracefully






[jira] [Commented] (YARN-9567) Add diagnostics for outstanding resource requests on app attempts page

2019-05-22 Thread Tan, Wangda (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846088#comment-16846088
 ] 

Tan, Wangda commented on YARN-9567:
---

[~Tao Yang]
{quote}Is it ok to support like this in UI1?  Please feel free to give your 
suggestions.
{quote}
 
Of course! 

> Add diagnostics for outstanding resource requests on app attempts page
> --
>
> Key: YARN-9567
> URL: https://issues.apache.org/jira/browse/YARN-9567
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: no_diagnostic_at_first.png, 
> show_diagnostics_after_requesting_app_activities_REST_API.png
>
>
> Currently, on the app attempt page we can see outstanding resource requests; it
> would be helpful for users to understand why they are outstanding if we could
> join this app's diagnostics with them.
> Discussed with [~cheersyang]: we can passively load diagnostics from the cache
> of completed app activities instead of actively triggering collection, which may
> bring uncontrollable risks.
> For example:
> (1) At first, if app activities have not been triggered, we see no diagnostics
> in the cache below the outstanding requests.
> !no_diagnostic_at_first.png|width=793,height=248!
> (2) After requesting the application activities REST API, we can see the
> diagnostics.
> !show_diagnostics_after_requesting_app_activities_REST_API.png|width=1046,height=276!
>  






[jira] [Commented] (YARN-9569) Auto-created leaf queues do not honor cluster-wide min/max memory/vcores

2019-05-22 Thread Tan, Wangda (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846085#comment-16846085
 ] 

Tan, Wangda commented on YARN-9569:
---

Thanks [~ccondit], good catch.

I remember the reason we initialize the CSConf without loading defaults is that we
try not to pollute configs. [~suma.shivaprasad], do you remember? Could you suggest
what the proper fix should be?

> Auto-created leaf queues do not honor cluster-wide min/max memory/vcores
> 
>
> Key: YARN-9569
> URL: https://issues.apache.org/jira/browse/YARN-9569
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.2.0
>Reporter: Craig Condit
>Priority: Major
>
> Auto-created leaf queues do not honor cluster-wide settings for maximum 
> CPU/vcores allocation.
> To reproduce:
>  # Set auto-create-child-queue.enabled=true for a parent queue.
>  # Set leaf-queue-template.maximum-allocation-mb=16384.
>  # Set yarn.resource-types.memory-mb.maximum-allocation=16384 in 
> resource-types.xml
>  # Launch a YARN app with a container requesting 16 GB RAM.
>  
> This scenario should work, but instead you get an error similar to this:
> {code:java}
> java.lang.IllegalArgumentException: Queue maximum allocation cannot be larger 
> than the cluster setting for queue root.auto.test max allocation per queue: 
>  cluster setting:    {code}
>  
> This seems to be caused by this code in 
> ManagedParentQueue.getLeafQueueConfigs:
> {code:java}
> CapacitySchedulerConfiguration leafQueueConfigTemplate = new
> CapacitySchedulerConfiguration(new Configuration(false), false);{code}
>  
> This initializes a new leaf queue configuration that does not read 
> resource-types.xml (or any other config). Later, this 
> CapacitySchedulerConfiguration instance calls 
> ResourceUtils.fetchMaximumAllocationFromConfig()  from its 
> getMaximumAllocationPerQueue() method and passes itself as the configuration 
> to use. Since the resource types are not present, ResourceUtils falls back to 
> compiled-in defaults of 8GB RAM, 4 cores.
>  
> I was able to work around this with a custom AutoCreatedQueueManagementPolicy 
> implementation which does something like this in init() and reinitialize():
> {code:java}
> for (Map.Entry<String, String> entry : this.scheduler.getConfiguration()) {
>   if (entry.getKey().startsWith("yarn.resource-types")) {
>     parentQueue.getLeafQueueTemplate().getLeafQueueConfigs()
>         .set(entry.getKey(), entry.getValue());
>   }
> }
> {code}
> However, this is obviously a very hacky way to solve the problem.
> I can submit a proper patch if someone can provide some direction as to the 
> best way to proceed.
>  
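
For illustration, a small sketch of why a Configuration created with
loadDefaults=false misses the cluster-wide resource settings. The property name is
the one from the reproduction steps above; loading resource-types.xml explicitly is
an assumption for the demo, not the proposed fix.
{code:java}
import org.apache.hadoop.conf.Configuration;

public class ConfigDefaultsSketch {
  public static void main(String[] args) {
    // loadDefaults=false: neither *-default.xml, yarn-site.xml nor
    // resource-types.xml is read, so callers fall back to compiled-in defaults.
    Configuration bare = new Configuration(false);
    System.out.println(bare.get(
        "yarn.resource-types.memory-mb.maximum-allocation", "<built-in default>"));

    // Loading defaults and explicitly adding resource-types.xml (if present on the
    // classpath) makes the cluster-wide maximum-allocation visible again.
    Configuration full = new Configuration(true);
    full.addResource("resource-types.xml");
    System.out.println(full.get(
        "yarn.resource-types.memory-mb.maximum-allocation", "<not set>"));
  }
}
{code}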






[jira] [Commented] (YARN-7494) Add multi-node lookup mechanism and pluggable nodes sorting policies to optimize placement decision

2019-05-22 Thread Tan, Wangda (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846081#comment-16846081
 ] 

Tan, Wangda commented on YARN-7494:
---

+ [~cheersyang]

> Add multi-node lookup mechanism and pluggable nodes sorting policies to
> optimize placement decision
> --
>
> Key: YARN-7494
> URL: https://issues.apache.org/jira/browse/YARN-7494
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-7494.001.patch, YARN-7494.002.patch, 
> YARN-7494.003.patch, YARN-7494.004.patch, YARN-7494.005.patch, 
> YARN-7494.006.patch, YARN-7494.007.patch, YARN-7494.008.patch, 
> YARN-7494.009.patch, YARN-7494.010.patch, YARN-7494.11.patch, 
> YARN-7494.12.patch, YARN-7494.13.patch, YARN-7494.14.patch, 
> YARN-7494.15.patch, YARN-7494.16.patch, YARN-7494.17.patch, 
> YARN-7494.18.patch, YARN-7494.19.patch, YARN-7494.20.patch, 
> YARN-7494.v0.patch, YARN-7494.v1.patch, multi-node-designProposal.png
>
>
> Instead of a single node, for effectiveness we can consider a multi-node lookup
> based on partition to start with.






[jira] [Commented] (YARN-9543) UI2 should handle missing ATSv2 gracefully

2019-05-22 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846044#comment-16846044
 ] 

Szilard Nemeth commented on YARN-9543:
--

Hi [~zsiegl]!

1. In adapters/timeline-health.js: I don't get what urlForQueryRecord is. Is this a
function or some EmberJS-specific thing? Maybe you can remove the comment
"query, modelName" if it's not necessary, but I don't really know; maybe this is
something related to EmberJS.
Otherwise, the patch looks good!

> UI2 should handle missing ATSv2 gracefully
> --
>
> Key: YARN-9543
> URL: https://issues.apache.org/jira/browse/YARN-9543
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: ATSv2, yarn-ui-v2
>Affects Versions: 3.1.2
>Reporter: Zoltan Siegl
>Assignee: Zoltan Siegl
>Priority: Major
> Attachments: YARN-9543.001.patch, YARN-9543.002.patch
>
>
> Resource manager UI2 is throwing some console errors and an error page on the 
> flows page.
> Suggested improvements:
>  * Disable or remove the flows tab if ATSv2 is not available or not installed
>  * Handle all connection errors to ATSv2 gracefully






[jira] [Updated] (YARN-9543) UI2 should handle missing ATSv2 gracefully

2019-05-22 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-9543:
-
Description: 
Resource manager UI2 is throwing some console errors and an error page on the 
flows page.

Suggested improvements:
 * Disable or remove the flows tab if ATSv2 is not available or not installed
 * Handle all connection errors to ATSv2 gracefully

  was:
Resource manager UI2 is throwing some console errors and a error page on flows 
page.

Suggested improvements:
 * Disable or remove flows tab if ATSv2 is not available/installed
 * Handle all connection errors to ATSv2 gracefully


> UI2 should handle missing ATSv2 gracefully
> --
>
> Key: YARN-9543
> URL: https://issues.apache.org/jira/browse/YARN-9543
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: ATSv2, yarn-ui-v2
>Affects Versions: 3.1.2
>Reporter: Zoltan Siegl
>Assignee: Zoltan Siegl
>Priority: Major
> Attachments: YARN-9543.001.patch, YARN-9543.002.patch
>
>
> Resource manager UI2 is throwing some console errors and an error page on the 
> flows page.
> Suggested improvements:
>  * Disable or remove the flows tab if ATSv2 is not available or not installed
>  * Handle all connection errors to ATSv2 gracefully






[jira] [Commented] (YARN-9545) Create healthcheck REST endpoint for ATSv2

2019-05-22 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846042#comment-16846042
 ] 

Szilard Nemeth commented on YARN-9545:
--

Hi [~zsiegl]!

A couple of thoughts:
1. Please fix the javadoc of the method TimelineReaderWebServices#health:
"Health check rest end point." should be changed to "Health check REST endpoint."
or something like that.

Please also change the return value description in the javadoc, as it seems odd.
As an example, it could read:
{code:java}
   * @return A {@link Response} object with HTTP status 200 OK if the service is
   *         running. Otherwise, a {@link Response} object with HTTP status 500
   *         is returned.
{code}


2. Still in TimelineReaderWebServices#health, you have an unnecessary line 
break after the try-statement.
3. Still in TimelineReaderWebServices#health, please adhere to the 80 character 
line limit, so the code here should be reformatted.
4. I can see that you added an implementation of 
org.apache.hadoop.yarn.server.timelineservice.storage.FileSystemTimelineReaderImpl#isConnectionAlive,
 but I can't see any usages of the isConnectionAlive method. 
Do you know the reason for this? 
5. Can you add some tests for the health check? 
6. Please address the findbugs / checkstyle issues

Thanks!
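
For illustration, here is a minimal sketch of the shape such an endpoint and 
its javadoc could take. The class name, the paths and the isConnectionAlive() 
wiring below are assumptions made for the sketch, not the code in the patch.

{code:java}
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.core.Response;

// Illustrative stand-in for the timeline reader web services class.
@Path("/ws/v2/timeline")
public class TimelineReaderHealthSketch {

  // Stand-in for a reader connectivity probe such as isConnectionAlive().
  private boolean isConnectionAlive() {
    return true;
  }

  /**
   * Health check REST endpoint.
   *
   * @return A {@link Response} object with HTTP status 200 OK if the service
   *         is running. Otherwise, a {@link Response} object with HTTP status
   *         500 is returned.
   */
  @GET
  @Path("/health")
  public Response health() {
    if (isConnectionAlive()) {
      return Response.ok("{\"health\": \"ok\"}").build();
    }
    return Response.serverError().build();
  }
}
{code}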

> Create healthcheck REST endpoint for ATSv2
> --
>
> Key: YARN-9545
> URL: https://issues.apache.org/jira/browse/YARN-9545
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: ATSv2
>Affects Versions: 3.1.2
>Reporter: Zoltan Siegl
>Assignee: Zoltan Siegl
>Priority: Major
> Attachments: YARN-9545.001.patch, YARN-9545.002.patch
>
>
> RM UI2 and CM needs a health check url for ATSv2 service.
> Create a /health rest endpoint
>  * must respond with 200 \{health: ok} if all ok
>  * must respond with non 200 if any problem occurs
>  * could check reader/writer connection



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9545) Create healthcheck REST endpoint for ATSv2

2019-05-22 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846026#comment-16846026
 ] 

Hadoop QA commented on YARN-9545:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
1s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 33s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
33s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-documentstore
 in trunk has 2 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
41s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
57s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m  8s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 4 new + 35 unchanged - 0 fixed = 39 total (was 35) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
5s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
57s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
35s{color} | {color:green} hadoop-yarn-server-timelineservice in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
39s{color} | {color:green} hadoop-yarn-server-timelineservice-hbase-client in 
the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
49s{color} | {color:green} hadoop-yarn-server-timelineservice-documentstore in 
the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
46s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 85m  2s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9545 |
| JIRA Patch URL | 

[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-05-22 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846015#comment-16846015
 ] 

Hadoop QA commented on YARN-9560:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 22s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 20s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 1 new + 10 unchanged - 2 fixed = 11 total (was 12) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  5s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 20m 58s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 68m 54s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.nodemanager.webapp.TestNMWebServices |
|   | hadoop.yarn.server.nodemanager.amrmproxy.TestFederationInterceptor |
|   | 
hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9560 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12969405/YARN-9560.005.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 3d8e8a6a41a3 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a315913 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 

[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-05-22 Thread Craig Condit (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846011#comment-16846011
 ] 

Craig Condit commented on YARN-9560:


I'm +1 (no-binding) on patch 005 as well.

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch
>
>
> Since the new OCI/squashFS/runc runtime will be using a lot of the same code 
> as DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - FSImageContainerRuntime (name negotiable)
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-05-22 Thread Craig Condit (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846011#comment-16846011
 ] 

Craig Condit edited comment on YARN-9560 at 5/22/19 4:06 PM:
-

I'm +1 (non-binding) on patch 005 as well.


was (Author: ccondit):
I'm +1 (no-binding) on patch 005 as well.

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch
>
>
> Since the new OCI/squashFS/runc runtime will be using a lot of the same code 
> as DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - FSImageContainerRuntime (name negotiable)
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-05-22 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846007#comment-16846007
 ] 

Eric Badger commented on YARN-9560:
---

[~Jim_Brennan], thanks for the thorough reviews!

[~eyang], [~billie.rinaldi], [~shaneku...@gmail.com], as a committer would one 
of you mind reviewing this?

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch
>
>
> Since the new OCI/squashFS/runc runtime will be using a lot of the same code 
> as DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - FSImageContainerRuntime (name negotiable)
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-05-22 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845999#comment-16845999
 ] 

Jim Brennan commented on YARN-9560:
---

[~ebadger] thanks for the update!  I am +1 (non-binding) on patch 005.

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch
>
>
> Since the new OCI/squashFS/runc runtime will be using a lot of the same code 
> as DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - FSImageContainerRuntime (name negotiable)
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-05-22 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845957#comment-16845957
 ] 

Eric Badger commented on YARN-9560:
---

Thanks [~Jim_Brennan]. Patch 5 moves the OCIContainerRuntime import down and 
removes all instances of super.getXXX from DockerLinuxContainerRuntime

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch
>
>
> Since the new OCI/squashFS/runc runtime will be using a lot of the same code 
> as DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - FSImageContainerRuntime (name negotiable)
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-05-22 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-9560:
--
Attachment: YARN-9560.005.patch

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch
>
>
> Since the new OCI/squashFS/runc runtime will be using a lot of the same code 
> as DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - FSImageContainerRuntime (name negotiable)
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9545) Create healthcheck REST endpoint for ATSv2

2019-05-22 Thread Zoltan Siegl (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Siegl updated YARN-9545:
---
Attachment: YARN-9545.002.patch

> Create healthcheck REST endpoint for ATSv2
> --
>
> Key: YARN-9545
> URL: https://issues.apache.org/jira/browse/YARN-9545
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: ATSv2
>Affects Versions: 3.1.2
>Reporter: Zoltan Siegl
>Assignee: Zoltan Siegl
>Priority: Major
> Attachments: YARN-9545.001.patch, YARN-9545.002.patch
>
>
> RM UI2 and CM needs a health check url for ATSv2 service.
> Create a /health rest endpoint
>  * must respond with 200 \{health: ok} if all ok
>  * must respond with non 200 if any problem occurs
>  * could check reader/writer connection



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-05-22 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845917#comment-16845917
 ] 

Jim Brennan commented on YARN-9560:
---

[~ebadger] thanks for updating the patch and addressing the checkstyle issues.  
A couple of comments on patch 004.
 * GpuResourceHandlerImpl - (nit) move the import for OCIContainerRuntime down.
 * DockerLinuxContainerRuntime - I think you should only use super.getXXX in 
places where you need to differentiate between a potentially overridden method 
in the subclass and the one in the superclass. In most cases you will want the 
overridden method if it exists, so you should not use super (I realize none of 
these are currently overridden). I think the only cases where you need to use 
super are in the constructor and the initialize method (see the sketch below).
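
For illustration, a minimal sketch of the dispatch difference behind that 
suggestion (the class and method names here are simplified stand-ins, not the 
actual YARN runtime code):

{code:java}
// Simplified stand-ins for the abstract parent and the Docker subclass.
abstract class OCIRuntimeSketch {
  protected String getRunAsUser() {
    return "parent-user";
  }
}

class DockerRuntimeSketch extends OCIRuntimeSketch {
  @Override
  protected String getRunAsUser() {
    return "docker-user";
  }

  void launchContainer() {
    // getRunAsUser() goes through dynamic dispatch and picks up the
    // override ("docker-user"); super.getRunAsUser() bypasses it and
    // always returns the parent's value ("parent-user").
    System.out.println(getRunAsUser());
    System.out.println(super.getRunAsUser());
  }

  public static void main(String[] args) {
    new DockerRuntimeSketch().launchContainer();
  }
}
{code}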

 

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch
>
>
> Since the new OCI/squashFS/runc runtime will be using a lot of the same code 
> as DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - FSImageContainerRuntime (name negotiable)
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9578) Add limit option to control number of results for app activities REST API

2019-05-22 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845755#comment-16845755
 ] 

Tao Yang commented on YARN-9578:


Attached v1 patch for review.
This patch should be submitted after YARN-9497 is resolved.

> Add limit option to control number of results for app activities REST API
> -
>
> Key: YARN-9578
> URL: https://issues.apache.org/jira/browse/YARN-9578
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9578.001.patch
>
>
> Currently all completed activities of the specified application in the 
> cache are returned by the application activities REST API. Most results may 
> be redundant in scenarios which only need the few latest results; for 
> example, perhaps only one result needs to be shown on the UI for debugging.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9578) Add limit option to control number of results for app activities REST API

2019-05-22 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9578:
---
Attachment: YARN-9578.001.patch

> Add limit option to control number of results for app activities REST API
> -
>
> Key: YARN-9578
> URL: https://issues.apache.org/jira/browse/YARN-9578
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9578.001.patch
>
>
> Currently all completed activities of the specified application in the 
> cache are returned by the application activities REST API. Most results may 
> be redundant in scenarios which only need the few latest results; for 
> example, perhaps only one result needs to be shown on the UI for debugging.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9578) Add limit option to control number of results for app activities REST API

2019-05-22 Thread Tao Yang (JIRA)
Tao Yang created YARN-9578:
--

 Summary: Add limit option to control number of results for app 
activities REST API
 Key: YARN-9578
 URL: https://issues.apache.org/jira/browse/YARN-9578
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Tao Yang
Assignee: Tao Yang


Currently all completed activities of the specified application in the cache 
are returned by the application activities REST API. Most results may be 
redundant in scenarios which only need the few latest results; for example, 
perhaps only one result needs to be shown on the UI for debugging.
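
A hedged sketch of the kind of trimming a limit option implies (sort by finish 
time and keep only the latest N); the names below are illustrative, not the 
actual ActivitiesManager code:

{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class AppActivityLimitSketch {
  // Minimal stand-in for one completed activity record.
  static final class Activity {
    final long finishTime;
    final String diagnostic;
    Activity(long finishTime, String diagnostic) {
      this.finishTime = finishTime;
      this.diagnostic = diagnostic;
    }
  }

  // Return only the 'limit' most recent activities instead of all of them.
  static List<Activity> latest(List<Activity> all, int limit) {
    return all.stream()
        .sorted(Comparator.comparingLong((Activity a) -> a.finishTime)
            .reversed())
        .limit(limit)
        .collect(Collectors.toList());
  }
}
{code}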



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9525) IFile format is not working against s3a remote folder

2019-05-22 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845679#comment-16845679
 ] 

Peter Bacsko commented on YARN-9525:


[~wangda] the test was performed by [~adam.antal] - he can describe what he did.

> IFile format is not working against s3a remote folder
> -
>
> Key: YARN-9525
> URL: https://issues.apache.org/jira/browse/YARN-9525
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.1.2
>Reporter: Adam Antal
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: IFile-S3A-POC01.patch, YARN-9525-001.patch
>
>
> Using the IndexedFileFormat with {{yarn.nodemanager.remote-app-log-dir}} 
> configured to an s3a URI throws the following exception during log 
> aggregation:
> {noformat}
> Cannot create writer for app application_1556199768861_0001. Skip log upload 
> this time. 
> java.io.IOException: java.io.FileNotFoundException: No such file or 
> directory: 
> s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:247)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:306)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:464)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:420)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: No such file or directory: 
> s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321)
>   at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128)
>   at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1244)
>   at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1240)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>   at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1246)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:228)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at 
> org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:195)
>   ... 7 more
> {noformat}
> This stack trace points to 
> {{LogAggregationIndexedFileController$initializeWriter}}, where we do the 
> following steps (in a non-rolling log aggregation setup):
> - create an FSDataOutputStream
> - write out a UUID
> - flush
> - immediately after that, call getFileStatus to get the length of the log 
> file (the bytes we just wrote out), and that's where the failure happens: 
> the file is not there yet due to eventual consistency (a simplified sketch 
> of this sequence follows below).
> Maybe we can get rid of that call, so we can use the IFile format against an 
> s3a target.
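
A simplified sketch of that sequence, with illustrative paths and names rather 
than the actual LogAggregationIndexedFileController code; on an eventually 
consistent store the final getFileStatus call can fail exactly as in the stack 
trace above:

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.EnumSet;
import java.util.UUID;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;

public class IFileS3aSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Illustrative remote log path; the real one is derived from
    // yarn.nodemanager.remote-app-log-dir.
    Path logFile = new Path("s3a://bucket/logs/app_0001/node_8041");
    FileContext fc = FileContext.getFileContext(logFile.toUri(), conf);

    // 1) create the output stream, 2) write out a UUID, 3) flush
    FSDataOutputStream out =
        fc.create(logFile, EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE));
    out.write(UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8));
    out.flush();

    // 4) immediately ask for the file length; on an eventually consistent
    // store the new key may not be visible yet, so this can throw
    // FileNotFoundException even though the write above succeeded.
    long currentLength = fc.getFileStatus(logFile).getLen();
    System.out.println("current log length: " + currentLength);
  }
}
{code}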



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9497) Support grouping by diagnostics for query results of scheduler and app activities

2019-05-22 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845622#comment-16845622
 ] 

Hadoop QA commented on YARN-9497:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
32s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 27s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 33s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 83m  
6s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
43s{color} | {color:green} hadoop-yarn-server-router in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}144m 14s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9497 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12969353/YARN-9497.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux e8ea06965b7c 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9dff6ef |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24134/testReport/ |
| Max. process+thread count | 875 (vs. ulimit of 

[jira] [Updated] (YARN-9577) YARN router should expose SubClusters infomation through RouterWebServices

2019-05-22 Thread Shen Yinjie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shen Yinjie updated YARN-9577:
--
Summary: YARN router should expose SubClusters infomation through 
RouterWebServices  (was: YARN router should expose SubClusters infomation 
throuth RouterWebServices)

> YARN router should expose SubClusters infomation through RouterWebServices
> --
>
> Key: YARN-9577
> URL: https://issues.apache.org/jira/browse/YARN-9577
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: router
>Reporter: Shen Yinjie
>Assignee: Shen Yinjie
>Priority: Major
>
> When YARN federation is enabled, it is very helpful to have a way to access 
> all subclusters' info through an API; currently we can implement this in 
> RouterWebServices.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9577) YARN router should expose SubClusters infomation throuth RouterWebServices

2019-05-22 Thread Shen Yinjie (JIRA)
Shen Yinjie created YARN-9577:
-

 Summary: YARN router should expose SubClusters infomation throuth 
RouterWebServices
 Key: YARN-9577
 URL: https://issues.apache.org/jira/browse/YARN-9577
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: router
Reporter: Shen Yinjie


When YARN federation is enabled, it is very helpful to have a way to access all 
subclusters' info through an API; currently we can implement this in 
RouterWebServices.
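
A hedged sketch of the kind of endpoint this proposes; the path, class name and 
hard-coded data below are assumptions, and a real implementation would read the 
list from the federation state store:

{code:java}
import java.util.Arrays;
import java.util.List;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// Illustrative REST resource shaped like the proposal above.
@Path("/ws/v1/router")
public class SubClustersInfoSketch {

  // Minimal stand-in view of one subcluster.
  public static class SubClusterView {
    public String id;
    public String state;
    public String rmWebAddress;

    public SubClusterView(String id, String state, String rmWebAddress) {
      this.id = id;
      this.state = state;
      this.rmWebAddress = rmWebAddress;
    }
  }

  @GET
  @Path("/subclusters")
  @Produces(MediaType.APPLICATION_JSON)
  public List<SubClusterView> getSubClusters() {
    // Hard-coded for illustration; a real implementation would query the
    // federation state store for live membership information.
    return Arrays.asList(
        new SubClusterView("SC-1", "SC_RUNNING", "host1:8088"),
        new SubClusterView("SC-2", "SC_RUNNING", "host2:8088"));
  }
}
{code}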



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9577) YARN router should expose SubClusters infomation throuth RouterWebServices

2019-05-22 Thread Shen Yinjie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shen Yinjie reassigned YARN-9577:
-

Assignee: Shen Yinjie

> YARN router should expose SubClusters infomation throuth RouterWebServices
> --
>
> Key: YARN-9577
> URL: https://issues.apache.org/jira/browse/YARN-9577
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: router
>Reporter: Shen Yinjie
>Assignee: Shen Yinjie
>Priority: Major
>
> When YARN federation is enabled, it is very helpful to have a way to access 
> all subclusters' info through an API; currently we can implement this in 
> RouterWebServices.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7494) Add muti-node lookup mechanism and pluggable nodes sorting policies to optimize placement decision

2019-05-22 Thread tianjuan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845536#comment-16845536
 ] 

tianjuan edited comment on YARN-7494 at 5/22/19 7:07 AM:
-

It seems that ResourceUsageMultiNodeLookupPolicy may cause an application to 
starve forever.

For example, there are 10 nodes (h1, h2, ... h9, h10) in the cluster, each with 
8G of memory, and two queues A and B, each configured with 50% capacity.

First, 10 jobs (each requesting 6G of resources) are submitted to queue A, and 
each of the 10 nodes gets one container allocated.

Afterwards, another job JobB, which requests 3G of resources, is submitted to 
queue B, and one container of 3G is reserved on node h1.

With ResourceUsageMultiNodeLookupPolicy, the ordering will always be 
h1, h2, ... h9, h10, and one container will always be re-reserved on node h1; 
no other reservation happens, and no preemption happens either, so JobB will 
hang forever. [~sunilg] what's your thought about this situation?


was (Author: jutia):
It seems that ResourceUsageMultiNodeLookupPolicy may cause an application to 
starve forever.

For example, there are 10 nodes (h1, h2, ... h9, h10) in the cluster, each with 
8G of memory, and two queues A and B, each configured with 50% capacity.

First, 10 jobs (each requesting 6G of resources) are submitted to queue A, and 
each of the 10 nodes gets one container allocated.

Afterwards, another job JobB, which requests 3G of resources, is submitted to 
queue B, and one container of 3G is reserved on node h1.

With ResourceUsageMultiNodeLookupPolicy, the ordering will always be 
h1, h2, ... h9, h10, and one container will always be re-reserved on node h1; 
no other reservation happens, so JobB will hang forever. [~sunilg] what's your 
thought about this situation?

> Add muti-node lookup mechanism and pluggable nodes sorting policies to 
> optimize placement decision
> --
>
> Key: YARN-7494
> URL: https://issues.apache.org/jira/browse/YARN-7494
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-7494.001.patch, YARN-7494.002.patch, 
> YARN-7494.003.patch, YARN-7494.004.patch, YARN-7494.005.patch, 
> YARN-7494.006.patch, YARN-7494.007.patch, YARN-7494.008.patch, 
> YARN-7494.009.patch, YARN-7494.010.patch, YARN-7494.11.patch, 
> YARN-7494.12.patch, YARN-7494.13.patch, YARN-7494.14.patch, 
> YARN-7494.15.patch, YARN-7494.16.patch, YARN-7494.17.patch, 
> YARN-7494.18.patch, YARN-7494.19.patch, YARN-7494.20.patch, 
> YARN-7494.v0.patch, YARN-7494.v1.patch, multi-node-designProposal.png
>
>
> Instead of single node, for effectiveness we can consider a multi node lookup 
> based on partition to start with.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9576) ResourceUsageMultiNodeLookupPolicy may cause Application starve forever

2019-05-22 Thread tianjuan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tianjuan updated YARN-9576:
---
Description: 
It seems that ResourceUsageMultiNodeLookupPolicy in YARN-7494 may cause an 
application to starve forever.

For example, there are 10 nodes (h1, h2, ... h9, h10) in the cluster, each with 
8G of memory, and two queues A and B, each configured with 50% capacity.

First, 10 jobs (each requesting 6G of resources) are submitted to queue A, and 
each of the 10 nodes gets one container allocated.

Afterwards, another job JobB, which requests 3G of resources, is submitted to 
queue B, and one container of 3G is reserved on node h1.

With ResourceUsageMultiNodeLookupPolicy, the ordering will always be 
h1, h2, ... h9, h10, and one container will always be re-reserved on node h1; 
no other reservation happens, so JobB will hang forever. [~sunilg] what's your 
thought about this situation?

  was:
It seems that ResourceUsageMultiNodeLookupPolicy may cause an application to 
starve forever.

For example, there are 10 nodes (h1, h2, ... h9, h10) in the cluster, each with 
8G of memory, and two queues A and B, each configured with 50% capacity.

First, 10 jobs (each requesting 6G of resources) are submitted to queue A, and 
each of the 10 nodes gets one container allocated.

Afterwards, another job JobB, which requests 3G of resources, is submitted to 
queue B, and one container of 3G is reserved on node h1.

With ResourceUsageMultiNodeLookupPolicy, the ordering will always be 
h1, h2, ... h9, h10, and one container will always be re-reserved on node h1; 
no other reservation happens, so JobB will hang forever. [~sunilg] what's your 
thought about this situation?


>  ResourceUsageMultiNodeLookupPolicy may cause Application starve forever
> 
>
> Key: YARN-9576
> URL: https://issues.apache.org/jira/browse/YARN-9576
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: tianjuan
>Priority: Major
>
> It seems that ResourceUsageMultiNodeLookupPolicy in YARN-7494 may cause an 
> application to starve forever.
> For example, there are 10 nodes (h1, h2, ... h9, h10) in the cluster, each 
> with 8G of memory, and two queues A and B, each configured with 50% capacity.
> First, 10 jobs (each requesting 6G of resources) are submitted to queue A, 
> and each of the 10 nodes gets one container allocated.
> Afterwards, another job JobB, which requests 3G of resources, is submitted to 
> queue B, and one container of 3G is reserved on node h1.
> With ResourceUsageMultiNodeLookupPolicy, the ordering will always be 
> h1, h2, ... h9, h10, and one container will always be re-reserved on node h1; 
> no other reservation happens, so JobB will hang forever. [~sunilg] what's 
> your thought about this situation?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9576) ResourceUsageMultiNodeLookupPolicy may cause Application starve forever

2019-05-22 Thread tianjuan (JIRA)
tianjuan created YARN-9576:
--

 Summary:  ResourceUsageMultiNodeLookupPolicy may cause Application 
starve forever
 Key: YARN-9576
 URL: https://issues.apache.org/jira/browse/YARN-9576
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: tianjuan


It seems that ResourceUsageMultiNodeLookupPolicy may cause an application to 
starve forever.

For example, there are 10 nodes (h1, h2, ... h9, h10) in the cluster, each with 
8G of memory, and two queues A and B, each configured with 50% capacity.

First, 10 jobs (each requesting 6G of resources) are submitted to queue A, and 
each of the 10 nodes gets one container allocated.

Afterwards, another job JobB, which requests 3G of resources, is submitted to 
queue B, and one container of 3G is reserved on node h1.

With ResourceUsageMultiNodeLookupPolicy, the ordering will always be 
h1, h2, ... h9, h10, and one container will always be re-reserved on node h1; 
no other reservation happens, so JobB will hang forever. [~sunilg] what's your 
thought about this situation?
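
For context, the core of a usage-based multi-node lookup policy is a comparator 
over the nodes' allocated resources; when all candidates have equal usage, a 
deterministic tie-break keeps the order fixed, which is why the candidate list 
described above always starts with h1 and the reservation keeps landing there. 
A minimal sketch, illustrative only and not the actual 
ResourceUsageMultiNodeLookupPolicy:

{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class NodeOrderSketch {
  // Minimal stand-in for a scheduler node: name plus allocated memory in GB.
  static final class Node {
    final String name;
    final int allocatedGb;
    Node(String name, int allocatedGb) {
      this.name = name;
      this.allocatedGb = allocatedGb;
    }
  }

  // Order candidates by ascending allocated resources, with a deterministic
  // tie-break; equally loaded nodes therefore always come back in the same
  // fixed order, so the reservation keeps landing on the first node (h1).
  static List<Node> orderCandidates(List<Node> nodes) {
    Comparator<Node> byUsage = Comparator.comparingInt(n -> n.allocatedGb);
    Comparator<Node> byName = Comparator.comparing(n -> n.name);
    return nodes.stream()
        .sorted(byUsage.thenComparing(byName))
        .collect(Collectors.toList());
  }
}
{code}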



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9543) UI2 should handle missing ATSv2 gracefully

2019-05-22 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845579#comment-16845579
 ] 

Hadoop QA commented on YARN-9543:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
27m 59s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 52s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 40m  7s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9543 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12969358/YARN-9543.002.patch |
| Optional Tests |  dupname  asflicense  shadedclient  |
| uname | Linux d2bfe35ff89c 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 67f9a7b |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 446 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24135/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> UI2 should handle missing ATSv2 gracefully
> --
>
> Key: YARN-9543
> URL: https://issues.apache.org/jira/browse/YARN-9543
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: ATSv2, yarn-ui-v2
>Affects Versions: 3.1.2
>Reporter: Zoltan Siegl
>Assignee: Zoltan Siegl
>Priority: Major
> Attachments: YARN-9543.001.patch, YARN-9543.002.patch
>
>
> Resource manager UI2 is throwing some console errors and a error page on 
> flows page.
> Suggested improvements:
>  * Disable or remove flows tab if ATSv2 is not available/installed
>  * Handle all connection errors to ATSv2 gracefully



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9521) RM failed to start due to system services

2019-05-22 Thread kyungwan nam (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845571#comment-16845571
 ] 

kyungwan nam commented on YARN-9521:


Please let me know if anyone has any ideas on how to resolve this.
Thanks.

> RM failed to start due to system services
> 
>
> Key: YARN-9521
> URL: https://issues.apache.org/jira/browse/YARN-9521
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.2
>Reporter: kyungwan nam
>Priority: Major
> Attachments: YARN-9521.001.patch
>
>
> When starting the RM, listing the system services directory failed as follows.
> {code}
> 2019-04-30 17:18:25,441 INFO  client.SystemServiceManagerImpl 
> (SystemServiceManagerImpl.java:serviceInit(114)) - System Service Directory 
> is configured to /services
> 2019-04-30 17:18:25,467 INFO  client.SystemServiceManagerImpl 
> (SystemServiceManagerImpl.java:serviceInit(120)) - UserGroupInformation 
> initialized to yarn (auth:SIMPLE)
> 2019-04-30 17:18:25,467 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service ResourceManager failed in 
> state STARTED
> org.apache.hadoop.service.ServiceStateException: java.io.IOException: 
> Filesystem closed
> at 
> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
> at 
> org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:869)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1228)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1269)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1265)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1265)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1316)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1501)
> Caused by: java.io.IOException: Filesystem closed
> at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:473)
> at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1639)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1217)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1233)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1200)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1179)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$26.doCall(DistributedFileSystem.java:1175)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1187)
> at 
> org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.list(SystemServiceManagerImpl.java:375)
> at 
> org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.scanForUserServices(SystemServiceManagerImpl.java:282)
> at 
> org.apache.hadoop.yarn.service.client.SystemServiceManagerImpl.serviceStart(SystemServiceManagerImpl.java:126)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
> ... 13 more
> {code}
> It looks like this is due to the usage of the FileSystem cache.
> This issue does not happen when I add "fs.hdfs.impl.disable.cache=true" to 
> yarn-site.xml.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org