[jira] [Updated] (YARN-9295) [UI2] Fix 'Decomissioned' label typo in Cluster Overview page

2019-02-13 Thread Akhil PB (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akhil PB updated YARN-9295:
---
Summary: [UI2] Fix 'Decomissioned' label typo in Cluster Overview page  
(was: Fix 'Decomissioned' label typo in Cluster Overview page)

> [UI2] Fix 'Decomissioned' label typo in Cluster Overview page
> -
>
> Key: YARN-9295
> URL: https://issues.apache.org/jira/browse/YARN-9295
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Charan Hebri
>Assignee: Charan Hebri
>Priority: Trivial
> Attachments: Decommissioned-typo.png, YARN-9295.001.patch
>
>
> Change label text from 'Decomissioned' to 'Decommissioned' in Node Managers 
> section of the Cluster Overview page.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9302) make maxAssign configurable at NM side

2019-02-13 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9302:
--
Description: I think it's more flexible to make maxAssign configurable on the 
NM side. After that, we can assign different numbers of containers.  (was: I 
think it's more flexible to make maxAssign configurable at NM side. )

> make maxAssign configurable at NM side
> --
>
> Key: YARN-9302
> URL: https://issues.apache.org/jira/browse/YARN-9302
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> I think it's more flexible to make maxAssign configurable on the NM side. 
> After that, we can assign different numbers of containers.
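A minimal sketch of the idea, assuming a hypothetical per-NodeManager override that falls back to the scheduler-wide value and treats -1 as "no limit" (the convention used by FairScheduler's yarn.scheduler.fair.max.assign). MaxAssignResolver and its methods are illustrative names, not YARN APIs, and how an NM would report or be configured with its own value is left open.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical resolver for a per-NodeManager maxAssign override (not a YARN API). */
class MaxAssignResolver {
  /** -1 means "no limit", mirroring the scheduler-wide convention assumed here. */
  static final int UNLIMITED = -1;

  private final int clusterMaxAssign;
  private final Map<String, Integer> nodeOverrides = new HashMap<>();

  MaxAssignResolver(int clusterMaxAssign) {
    this.clusterMaxAssign = clusterMaxAssign;
  }

  /** Records a limit configured for (or reported by) one specific NodeManager. */
  void setNodeMaxAssign(String nodeId, int maxAssign) {
    nodeOverrides.put(nodeId, maxAssign);
  }

  /** Returns the node's own limit if one exists, otherwise the cluster-wide limit. */
  int effectiveMaxAssign(String nodeId) {
    return nodeOverrides.getOrDefault(nodeId, clusterMaxAssign);
  }

  /** True if another container may still be assigned on this node's current heartbeat. */
  boolean canAssignMore(String nodeId, int assignedThisHeartbeat) {
    int limit = effectiveMaxAssign(nodeId);
    return limit == UNLIMITED || assignedThisHeartbeat < limit;
  }
}
```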






[jira] [Updated] (YARN-9302) make maxAssign configurable at NM side

2019-02-13 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9302:
--
Description: I think it's more flexible to make maxAssign configurable at 
NM side.   (was: I think it's more flexible to config)

> make maxAssign configurable at NM side
> --
>
> Key: YARN-9302
> URL: https://issues.apache.org/jira/browse/YARN-9302
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> I think it's more flexible to make maxAssign configurable at NM side. 






[jira] [Updated] (YARN-9300) Lazy preemption should trigger an update on queue preemption metrics for CapacityScheduler

2019-02-13 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9300:
---
Attachment: YARN-9300.001.patch

> Lazy preemption should trigger an update on queue preemption metrics for 
> CapacityScheduler
> --
>
> Key: YARN-9300
> URL: https://issues.apache.org/jira/browse/YARN-9300
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9300.001.patch
>
>
> Currently lazy preemption can't trigger an update on queue preemption metrics, 
> since the update is only called in 
> CapacityScheduler#completedContainerInternal, which is not the path taken by 
> every container completion. 
> This issue plans to move this update to LeafQueue#completedContainer so that 
> queue preemption metrics are updated for every container completion caused 
> by preemption.
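A simplified model of the proposed move, assuming (as the description implies) that lazy preemption reaches the leaf queue's completion method without going through completedContainerInternal. QueueModel and SchedulerModel are hypothetical stand-ins for LeafQueue and CapacityScheduler; the point is only that a metric updated in the method every completion path shares cannot be skipped.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical completion reasons for this model. */
enum CompletionReason { FINISHED, PREEMPTED }

/** Hypothetical stand-in for LeafQueue. */
class QueueModel {
  final AtomicLong preemptedContainers = new AtomicLong();

  /** Every completion path ends here, so the preemption metric is never skipped. */
  void completedContainer(String containerId, CompletionReason reason) {
    // ... release resources, update user and queue usage ...
    if (reason == CompletionReason.PREEMPTED) {
      preemptedContainers.incrementAndGet();
    }
  }
}

/** Hypothetical stand-in for CapacityScheduler. */
class SchedulerModel {
  final QueueModel queue = new QueueModel();

  /** Regular completion path (analogous to completedContainerInternal). */
  void completedContainerInternal(String containerId, CompletionReason reason) {
    // Originally the metric update lived here, so the lazy path below missed it.
    queue.completedContainer(containerId, reason);
  }

  /** Lazy-preemption path, which does not pass through completedContainerInternal. */
  void lazyPreempt(String containerId) {
    queue.completedContainer(containerId, CompletionReason.PREEMPTED);
  }
}
```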






[jira] [Updated] (YARN-9302) make maxAssign configurable at NM side

2019-02-13 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin updated YARN-9302:
--
Description: I think it's more flexible to config

> make maxAssign configurable at NM side
> --
>
> Key: YARN-9302
> URL: https://issues.apache.org/jira/browse/YARN-9302
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> I think it's more flexible to config






[jira] [Assigned] (YARN-9302) make maxAssign configurable at NM side

2019-02-13 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin reassigned YARN-9302:
-

Assignee: Zhaohui Xin

> make maxAssign configurable at NM side
> --
>
> Key: YARN-9302
> URL: https://issues.apache.org/jira/browse/YARN-9302
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>







[jira] [Created] (YARN-9302) make maxAssign configurable at NM side

2019-02-13 Thread Zhaohui Xin (JIRA)
Zhaohui Xin created YARN-9302:
-

 Summary: make maxAssign configurable at NM side
 Key: YARN-9302
 URL: https://issues.apache.org/jira/browse/YARN-9302
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Zhaohui Xin









[jira] [Commented] (YARN-9299) TestTimelineReaderWhitelistAuthorizationFilter ignores Http Errors

2019-02-13 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767889#comment-16767889
 ] 

Prabhu Joseph commented on YARN-9299:
-

[~rohithsharma] Can you review the patch for this jira? It fixes the 
TestTimelineReaderWhitelistAuthorizationFilter positive test cases so that they 
verify TimelineReaderWhitelistAuthorizationFilter does not return SC_FORBIDDEN.

> TestTimelineReaderWhitelistAuthorizationFilter ignores Http Errors
> --
>
> Key: YARN-9299
> URL: https://issues.apache.org/jira/browse/YARN-9299
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9299-001.patch
>
>
> TestTimelineReaderWhitelistAuthorizationFilter positive test cases do not 
> check whether the HttpResponse contains an error. 
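One way a positive test can stop ignoring HTTP errors is to assert on the response status explicitly. This is a sketch, assuming the test can obtain the integer status code; ResponseChecks and assertAllowed are hypothetical names and not the code in YARN-9299-001.patch. HttpURLConnection.HTTP_FORBIDDEN has the same value (403) as the servlet constant SC_FORBIDDEN.

```java
import java.net.HttpURLConnection;

/** Hypothetical helper for positive-path tests of an authorization filter. */
final class ResponseChecks {
  private ResponseChecks() {}

  /**
   * Fails unless the status is a 2xx success. A 403 is reported explicitly so that
   * a request denied by the filter cannot pass as a successful positive test.
   */
  static void assertAllowed(int status) {
    if (status == HttpURLConnection.HTTP_FORBIDDEN) {
      throw new AssertionError("request was rejected with 403 Forbidden");
    }
    if (status < 200 || status > 299) {
      throw new AssertionError("unexpected HTTP status " + status);
    }
  }
}
```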






[jira] [Updated] (YARN-9299) TestTimelineReaderWhitelistAuthorizationFilter ignores Http Errors

2019-02-13 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9299:

Affects Version/s: 3.1.2

> TestTimelineReaderWhitelistAuthorizationFilter ignores Http Errors
> --
>
> Key: YARN-9299
> URL: https://issues.apache.org/jira/browse/YARN-9299
> Project: Hadoop YARN
>  Issue Type: Test
>Affects Versions: 3.1.2
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9299-001.patch
>
>
> TestTimelineReaderWhitelistAuthorizationFilter positive test cases do not 
> check whether the HttpResponse contains an error. 






[jira] [Created] (YARN-9301) Too many InvalidStateTransitionException with SLS

2019-02-13 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-9301:
--

 Summary: Too many InvalidStateTransitionException with SLS
 Key: YARN-9301
 URL: https://issues.apache.org/jira/browse/YARN-9301
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bibin A Chundatt


Too many InvalidStateTransitionExceptions:

{noformat}
19/02/13 17:44:43 ERROR rmcontainer.RMContainerImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: LAUNCHED at RUNNING
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
    at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:483)
    at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:65)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.containerLaunchedOnNode(SchedulerApplicationAttempt.java:655)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.containerLaunchedOnNode(AbstractYarnScheduler.java:359)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.updateNewContainerInfo(AbstractYarnScheduler.java:1010)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.nodeUpdate(AbstractYarnScheduler.java:1112)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:1295)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1752)
    at org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.handle(SLSCapacityScheduler.java:205)
    at org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.handle(SLSCapacityScheduler.java:60)
    at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
    at java.lang.Thread.run(Thread.java:745)
19/02/13 17:44:43 ERROR rmcontainer.RMContainerImpl: Invalid event LAUNCHED on container container_1550059705491_0067_01_01

{noformat}
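The exception comes from a table-driven state machine that rejects any (state, event) pair with no registered transition. The sketch below is a heavily simplified, hypothetical model of that behavior (RMContainerImpl has more states and events, and this is not StateMachineFactory); it shows why a second LAUNCHED for a container already in RUNNING fails, and how a self-transition would turn such a duplicate into a no-op. Whether ignoring the duplicate is the right fix for SLS is not established here.

```java
import java.util.EnumMap;
import java.util.Map;

/** Simplified, hypothetical container lifecycle; RMContainerImpl has more states. */
enum ContainerState { ALLOCATED, ACQUIRED, RUNNING, COMPLETED }

enum ContainerEvent { ACQUIRED, LAUNCHED, FINISHED }

class InvalidTransition extends RuntimeException {
  InvalidTransition(ContainerEvent e, ContainerState s) {
    super("Invalid event: " + e + " at " + s);
  }
}

class ContainerStateMachine {
  private final Map<ContainerState, Map<ContainerEvent, ContainerState>> table =
      new EnumMap<>(ContainerState.class);
  private ContainerState current = ContainerState.ALLOCATED;

  ContainerStateMachine(boolean ignoreDuplicateLaunch) {
    add(ContainerState.ALLOCATED, ContainerEvent.ACQUIRED, ContainerState.ACQUIRED);
    add(ContainerState.ACQUIRED, ContainerEvent.LAUNCHED, ContainerState.RUNNING);
    add(ContainerState.RUNNING, ContainerEvent.FINISHED, ContainerState.COMPLETED);
    if (ignoreDuplicateLaunch) {
      // Self-loop: a repeated LAUNCHED while RUNNING becomes a no-op.
      add(ContainerState.RUNNING, ContainerEvent.LAUNCHED, ContainerState.RUNNING);
    }
  }

  private void add(ContainerState from, ContainerEvent e, ContainerState to) {
    table.computeIfAbsent(from, k -> new EnumMap<>(ContainerEvent.class)).put(e, to);
  }

  /** Applies an event; unregistered (state, event) pairs throw, leaving the state unchanged. */
  ContainerState handle(ContainerEvent e) {
    Map<ContainerEvent, ContainerState> row = table.get(current);
    ContainerState next = (row == null) ? null : row.get(e);
    if (next == null) {
      throw new InvalidTransition(e, current);
    }
    current = next;
    return current;
  }
}
```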






[jira] [Created] (YARN-9300) Lazy preemption should trigger an update on queue preemption metrics for CapacityScheduler

2019-02-13 Thread Tao Yang (JIRA)
Tao Yang created YARN-9300:
--

 Summary: Lazy preemption should trigger an update on queue 
preemption metrics for CapacityScheduler
 Key: YARN-9300
 URL: https://issues.apache.org/jira/browse/YARN-9300
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 3.2.2
Reporter: Tao Yang
Assignee: Tao Yang


Currently lazy preemption can't trigger an update on queue preemption metrics, 
since the update is only called in CapacityScheduler#completedContainerInternal, 
which is not the path taken by every container completion. 
This issue plans to move this update to LeafQueue#completedContainer so that 
queue preemption metrics are updated for every container completion caused by 
preemption.






[jira] [Commented] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-02-13 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767790#comment-16767790
 ] 

Eric Yang commented on YARN-8927:
-

[~ebadger] I think it's still an admin mistake, because the repository name can 
be preconfigured to a host in the local domain, which would have no chance to 
contact Docker Hub even if a repository is later set up to try to impersonate 
it. YARN's trusted registry ACL can avoid untrusted Docker Hub repositories. 
The discussion is digressing. I agree that adding a local image whitelist can 
tighten security further for images without '/' characters or used. This jira 
can't solve docker run pulling a remote image when the image is absent locally, 
or when the remote image name is identical to the local image name. [~csingh] 
is solving the Docker image localization issues in YARN-9228. It may help to 
solve the precheck of image existence in her story instead.

> Support trust top-level image like "centos" when "library" is configured in 
> "docker.trusted.registries"
> ---
>
> Key: YARN-8927
> URL: https://issues.apache.org/jira/browse/YARN-8927
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8927-trunk.001.patch, YARN-8927-trunk.002.patch
>
>
> There are some missing cases that we need to catch when handling 
> "docker.trusted.registries".
> The container-executor.cfg configuration is as follows:
> {code:java}
> docker.trusted.registries=tangzhankun,ubuntu,centos{code}
> It works if DistributedShell is run with "tangzhankun/tensorflow":
> {code:java}
> yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env \
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow
> {code}
> But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" 
> or "ubuntu[:tagName]" fails.
> The error message looks like:
> {code:java}
> "image: centos is not trusted"
> {code}
> We need to handle the above cases better.
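A minimal sketch of the requested behavior, assuming that an image name without a '/' is treated as belonging to the implicit "library" namespace, so that configuring "library" trusts top-level images such as "centos" or "centos:7". TrustedRegistries is a hypothetical Java model; it is not the container-executor implementation, and it ignores registry hosts, digests, and other reference forms.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of a docker.trusted.registries check (not container-executor code). */
final class TrustedRegistries {
  private final Set<String> trusted = new HashSet<>();

  /** @param configValue comma-separated list, e.g. "library,tangzhankun" */
  TrustedRegistries(String configValue) {
    for (String s : configValue.split(",")) {
      String t = s.trim();
      if (!t.isEmpty()) {
        trusted.add(t);
      }
    }
  }

  /** Prefix of an image name: the text before the first '/', or "library" if there is none. */
  static String prefixOf(String image) {
    int slash = image.indexOf('/');
    return slash < 0 ? "library" : image.substring(0, slash);
  }

  boolean isTrusted(String image) {
    return trusted.contains(prefixOf(image));
  }
}
```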






[jira] [Commented] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-02-13 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767744#comment-16767744
 ] 

Eric Badger commented on YARN-8927:
---

This isn't an admin mistakenly naming their local image the same as a 
repository on dockerhub. The admin will name their local images something, and 
then after that a nefarious actor will upload a malicious image to that same 
location in dockerhub. That is, unless you are assuming that dockerhub is a 
trusted source, which I don't think it can be.

As for avoiding this issue by using a private repository, this is not possible, 
as Docker refuses to remove docker.io from the default registry list 
(https://github.com/moby/moby/issues/33069). So docker.io will always be the 
fallback if the image does not exist locally. 

Again, I would love it if Docker would just allow you to remove default 
registries or add a --no-pull flag or similar to the run command. But since 
they have not done so and will not, we have to mitigate in other ways to avoid 
bad apples who can push malicious images to dockerhub.

> Support trust top-level image like "centos" when "library" is configured in 
> "docker.trusted.registries"
> ---
>
> Key: YARN-8927
> URL: https://issues.apache.org/jira/browse/YARN-8927
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8927-trunk.001.patch, YARN-8927-trunk.002.patch
>
>
> There are some missing cases that we need to catch when handling 
> "docker.trusted.registries".
> The container-executor.cfg configuration is as follows:
> {code:java}
> docker.trusted.registries=tangzhankun,ubuntu,centos{code}
> It works if DistributedShell is run with "tangzhankun/tensorflow":
> {code:java}
> yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env \
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow
> {code}
> But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" 
> or "ubuntu[:tagName]" fails.
> The error message looks like:
> {code:java}
> "image: centos is not trusted"
> {code}
> We need to handle the above cases better.






[jira] [Commented] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-02-13 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767703#comment-16767703
 ] 

Eric Yang commented on YARN-8927:
-

[~ebadger] I don't think there is a way to prevent docker run from pulling an 
image when the admin has mistakenly given local images names that match a 
repository on Docker Hub and the image is then absent locally. This is rare, 
and it can be avoided by using a private repository host/port so that Docker 
Hub is never contacted. I would like to avoid conflating admin mistakes (a 
usability problem) with an actual security problem so that this jira can move 
forward.

> Support trust top-level image like "centos" when "library" is configured in 
> "docker.trusted.registries"
> ---
>
> Key: YARN-8927
> URL: https://issues.apache.org/jira/browse/YARN-8927
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8927-trunk.001.patch, YARN-8927-trunk.002.patch
>
>
> There are some missing cases that we need to catch when handling 
> "docker.trusted.registries".
> The container-executor.cfg configuration is as follows:
> {code:java}
> docker.trusted.registries=tangzhankun,ubuntu,centos{code}
> It works if DistributedShell is run with "tangzhankun/tensorflow":
> {code:java}
> yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env \
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow
> {code}
> But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" 
> or "ubuntu[:tagName]" fails.
> The error message looks like:
> {code:java}
> "image: centos is not trusted"
> {code}
> We need to handle the above cases better.






[jira] [Commented] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-02-13 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767645#comment-16767645
 ] 

Eric Badger commented on YARN-8927:
---

YARN-9184 deals with explicit pulls. However, docker will do an implicit pull 
during {{docker run}} if the image does not exist locally. YARN-9184 seems to 
deal with explicitly pulling (or not pulling) images before the container is 
launched.

> Support trust top-level image like "centos" when "library" is configured in 
> "docker.trusted.registries"
> ---
>
> Key: YARN-8927
> URL: https://issues.apache.org/jira/browse/YARN-8927
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8927-trunk.001.patch, YARN-8927-trunk.002.patch
>
>
> There are some missing cases that we need to catch when handling 
> "docker.trusted.registries".
> The container-executor.cfg configuration is as follows:
> {code:java}
> docker.trusted.registries=tangzhankun,ubuntu,centos{code}
> It works if DistributedShell is run with "tangzhankun/tensorflow":
> {code:java}
> yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env \
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow
> {code}
> But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" 
> or "ubuntu[:tagName]" fails.
> The error message looks like:
> {code:java}
> "image: centos is not trusted"
> {code}
> We need to handle the above cases better.






[jira] [Comment Edited] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-02-13 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767645#comment-16767645
 ] 

Eric Badger edited comment on YARN-8927 at 2/13/19 10:25 PM:
-

YARN-9184 deals with explicit pulls. However, docker will do an implicit pull 
during {{docker run}} if the image does not exist locally. YARN-9184 seems to 
deal with explicitly pulling (or not pulling) images before the container is 
launched.


was (Author: ebadger):
ARN-9184 deals with explicit pulls. However, docker will do an implicit pull 
during {{docker run}} if the image does not exist locally. YARN-9184 seems to 
deal with explicitly pulling (or not pulling) images before the container is 
launched.

> Support trust top-level image like "centos" when "library" is configured in 
> "docker.trusted.registries"
> ---
>
> Key: YARN-8927
> URL: https://issues.apache.org/jira/browse/YARN-8927
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8927-trunk.001.patch, YARN-8927-trunk.002.patch
>
>
> There are some missing cases that we need to catch when handling 
> "docker.trusted.registries".
> The container-executor.cfg configuration is as follows:
> {code:java}
> docker.trusted.registries=tangzhankun,ubuntu,centos{code}
> It works if DistributedShell is run with "tangzhankun/tensorflow":
> {code:java}
> yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env \
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow
> {code}
> But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" 
> or "ubuntu[:tagName]" fails.
> The error message looks like:
> {code:java}
> "image: centos is not trusted"
> {code}
> We need to handle the above cases better.






[jira] [Commented] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-02-13 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767636#comment-16767636
 ] 

Eric Yang commented on YARN-8927:
-

[~ebadger] {quote}If we are assuming that Dockerhub and any other default 
registry is untrusted (we should), then the assumption has to be that any image 
by any name can be published. Let's say I tag a local image as 
hadoop/myimage:latest on every node in my cluster. We have to assume that there 
could be a repo within the default registry named hadoop with an image named 
myimage:latest. This doesn't make my local image hadoop/myimage:latest any less 
of a local image, but it also means that there is an image in Dockerhub by the 
same name which will be pulled if, for whatever reason, my local image was 
deleted, not uploaded yet, etc.{quote}

The last point is covered by YARN-9184.  Can you confirm?

> Support trust top-level image like "centos" when "library" is configured in 
> "docker.trusted.registries"
> ---
>
> Key: YARN-8927
> URL: https://issues.apache.org/jira/browse/YARN-8927
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8927-trunk.001.patch, YARN-8927-trunk.002.patch
>
>
> There are some missing cases that we need to catch when handling 
> "docker.trusted.registries".
> The container-executor.cfg configuration is as follows:
> {code:java}
> docker.trusted.registries=tangzhankun,ubuntu,centos{code}
> It works if DistributedShell is run with "tangzhankun/tensorflow":
> {code:java}
> yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env \
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow
> {code}
> But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" 
> or "ubuntu[:tagName]" fails.
> The error message looks like:
> {code:java}
> "image: centos is not trusted"
> {code}
> We need to handle the above cases better.






[jira] [Commented] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-02-13 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767583#comment-16767583
 ] 

Eric Badger commented on YARN-8927:
---

{quote}
It seems that if a user wants the local image "repoA/userA/imageA" to be 
allowed, he/she should configure "repoA/userA" in "docker.trusted.registries"? 
I will check whether this works and get back to you.
{quote}
It's not about wanting repoA/userA/imageA to be allowed. That is an easy 
problem to solve, as you have described. The hard part is allowing 
repoA/userA/imageA _only_ if it exists locally. 

{quote}
And one thing worth noting is that if YARN allows an image name, Docker will 
check whether it's local and prefer to run it before pulling from a hub. YARN's 
checking logic here seems like duplicated work, because if Docker can pull and 
run it, we can hardly say "repoA/userA/imageA" is a real local image. 
{quote}
If we are assuming that Dockerhub and any other default registry is untrusted 
(we should), then the assumption has to be that any image by any name can be 
published. Let's say I tag a local image as {{hadoop/myimage:latest}} on every 
node in my cluster. We have to assume that there could be a repo within the 
default registry named {{hadoop}} with an image named {{myimage:latest}}. This 
doesn't make my local image {{hadoop/myimage:latest}} any less of a local 
image, but it also means that there is an image in Dockerhub by the same name 
which will be pulled if, for whatever reason, my local image was deleted, not 
uploaded yet, etc.
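The "only if it exists locally" rule can be sketched as a decision that consults a local-existence check for untrusted prefixes. This is a hypothetical model: the existsLocally predicate stands in for something like running `docker image inspect`, and ImageAdmission is not YARN code. The check and the later docker run are separate steps, so an image deleted in between would still be pulled implicitly; the sketch does not close that window, which is the risk described above.

```java
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical admission decision for an image name; not YARN code. */
final class ImageAdmission {
  enum Decision { ALLOW_TRUSTED, ALLOW_LOCAL_ONLY, REJECT }

  private final Set<String> trustedPrefixes;
  private final Predicate<String> existsLocally; // e.g. backed by `docker image inspect`

  ImageAdmission(Set<String> trustedPrefixes, Predicate<String> existsLocally) {
    this.trustedPrefixes = trustedPrefixes;
    this.existsLocally = existsLocally;
  }

  Decision decide(String image) {
    int slash = image.indexOf('/');
    String prefix = slash < 0 ? "library" : image.substring(0, slash);
    if (trustedPrefixes.contains(prefix)) {
      return Decision.ALLOW_TRUSTED;
    }
    // Untrusted prefix: acceptable only if the image is already present, because
    // otherwise `docker run` would implicitly pull it from the default registry.
    return existsLocally.test(image) ? Decision.ALLOW_LOCAL_ONLY : Decision.REJECT;
  }
}
```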

> Support trust top-level image like "centos" when "library" is configured in 
> "docker.trusted.registries"
> ---
>
> Key: YARN-8927
> URL: https://issues.apache.org/jira/browse/YARN-8927
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8927-trunk.001.patch, YARN-8927-trunk.002.patch
>
>
> There are some missing cases that we need to catch when handling 
> "docker.trusted.registries".
> The container-executor.cfg configuration is as follows:
> {code:java}
> docker.trusted.registries=tangzhankun,ubuntu,centos{code}
> It works if DistributedShell is run with "tangzhankun/tensorflow":
> {code:java}
> yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env \
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow
> {code}
> But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" 
> or "ubuntu[:tagName]" fails.
> The error message looks like:
> {code:java}
> "image: centos is not trusted"
> {code}
> We need to handle the above cases better.






[jira] [Commented] (YARN-9299) TestTimelineReaderWhitelistAuthorizationFilter ignores Http Errors

2019-02-13 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767473#comment-16767473
 ] 

Hadoop QA commented on YARN-9299:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 11s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 35s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 13s{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 51m 10s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9299 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12958611/YARN-9299-001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux a17230023736 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 29b411d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23398/testReport/ |
| Max. process+thread count | 336 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/23398/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.




[jira] [Created] (YARN-9299) TestTimelineReaderWhitelistAuthorizationFilter ignores Http Errors

2019-02-13 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created YARN-9299:
---

 Summary: TestTimelineReaderWhitelistAuthorizationFilter ignores Http Errors
 Key: YARN-9299
 URL: https://issues.apache.org/jira/browse/YARN-9299
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


The positive test cases in TestTimelineReaderWhitelistAuthorizationFilter do not 
check whether the HttpResponse contains an error.
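A minimal sketch of the missing check, assuming the positive test cases can observe what the filter did to the response. `RecordingResponse` and `WhitelistFilterAssertions` are hypothetical names and do not come from the patch. The first class stands in for a mocked `HttpServletResponse` and records the status codes passed to `sendError`. The second class fails the test if any status codes were recorded.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a mocked HttpServletResponse: it only records
// the status codes passed to sendError(), one common way a servlet filter
// rejects a request.
class RecordingResponse {
  private final List<Integer> errorStatuses = new ArrayList<>();

  void sendError(int status) {
    errorStatuses.add(status);
  }

  List<Integer> getErrorStatuses() {
    return Collections.unmodifiableList(errorStatuses);
  }
}

// Hypothetical assertion helper for the positive (authorized) test cases.
final class WhitelistFilterAssertions {
  private WhitelistFilterAssertions() {
  }

  // Fails if the filter reported any HTTP error, so a positive test cannot
  // pass silently when the request was actually rejected.
  static void assertNoHttpError(RecordingResponse response) {
    List<Integer> errors = response.getErrorStatuses();
    if (!errors.isEmpty()) {
      throw new AssertionError("Unexpected HTTP error status(es): " + errors);
    }
  }
}
```

If the test mocks the response with Mockito instead, an equivalent check would look something like `Mockito.verify(response, Mockito.never()).sendError(Mockito.anyInt(), Mockito.anyString())`. You would repeat this for each `sendError` overload the filter might call.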






[jira] [Updated] (YARN-9299) TestTimelineReaderWhitelistAuthorizationFilter ignores Http Errors

2019-02-13 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9299:

Attachment: YARN-9299-001.patch

> TestTimelineReaderWhitelistAuthorizationFilter ignores Http Errors
> --
>
> Key: YARN-9299
> URL: https://issues.apache.org/jira/browse/YARN-9299
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9299-001.patch
>
>
> The positive test cases in TestTimelineReaderWhitelistAuthorizationFilter do not 
> check whether the HttpResponse contains an error.






[jira] [Commented] (YARN-9118) Handle issues with parsing user defined GPU devices in GpuDiscoverer

2019-02-13 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767366#comment-16767366
 ] 

Hadoop QA commented on YARN-9118:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 13s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 19s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 4 new + 11 unchanged - 5 fixed = 15 total (was 16) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 9s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 52s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 66m 43s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9118 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12958591/YARN-9118.008.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux f51f23e03480 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 00c5ffa |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/23397/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/23397/testReport/ |
| Max. process+thread count | 447 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanage

[jira] [Commented] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-02-13 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767370#comment-16767370
 ] 

Eric Yang commented on YARN-8927:
-

[~tangzhankun] suppose "library" is configured and there is a local image named 
black. It is not a top-level image, yet it is trusted by default. In 
[~ebadger]'s environment, local trusted images are tagged like "repoA/imageA". 
Patch 002 breaks his trust list: top-level images are trusted, but the 
untagged local image named black is also trusted. This is why he asked for 
a local image whitelist, to prevent local images like black from being trusted. 
Could the condition that checks for "library" and '/' be enhanced to handle 
this? A whitelist could be added there to tighten security.
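The trust decision under discussion can be sketched as a hypothetical Java model, for illustration only. The actual check is done by the native container-executor, which uses `docker.trusted.registries` from container-executor.cfg. The `trustedTopLevelImages` whitelist is an assumption modeled on the local-image whitelist suggested above; it is not an existing setting.

In this sketch, an image name without '/' is treated as top-level. Such an image is trusted only when "library" is configured. If the whitelist is non-empty, the image must also appear in it.

```java
import java.util.Set;

// Hypothetical model of the trust decision discussed in YARN-8927; the real
// check is performed by the native container-executor, not by this class.
class DockerImageTrust {
  private final Set<String> trustedRegistries;
  // Optional whitelist for top-level images; empty means "any top-level image".
  private final Set<String> trustedTopLevelImages;

  DockerImageTrust(Set<String> trustedRegistries,
      Set<String> trustedTopLevelImages) {
    this.trustedRegistries = trustedRegistries;
    this.trustedTopLevelImages = trustedTopLevelImages;
  }

  boolean isTrusted(String image) {
    // Drop an optional ":tag" that follows the last path component.
    int lastSlash = image.lastIndexOf('/');
    int colon = image.indexOf(':', lastSlash + 1);
    String name = colon >= 0 ? image.substring(0, colon) : image;

    int firstSlash = name.indexOf('/');
    if (firstSlash < 0) {
      // Top-level image such as "centos": trusted only if "library" is set...
      if (!trustedRegistries.contains("library")) {
        return false;
      }
      // ...and, when a whitelist is configured, only if the image is listed,
      // so a local image such as "black" is no longer trusted implicitly.
      return trustedTopLevelImages.isEmpty()
          || trustedTopLevelImages.contains(name);
    }
    // Namespaced image such as "tangzhankun/tensorflow": check the prefix.
    return trustedRegistries.contains(name.substring(0, firstSlash));
  }
}
```

With an empty whitelist, the sketch reproduces the behavior described for patch 002, where a local image such as black is trusted. A non-empty whitelist narrows that.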

> Support trust top-level image like "centos" when "library" is configured in 
> "docker.trusted.registries"
> ---
>
> Key: YARN-8927
> URL: https://issues.apache.org/jira/browse/YARN-8927
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8927-trunk.001.patch, YARN-8927-trunk.002.patch
>
>
> There are some missing cases that we need to catch when handling 
> "docker.trusted.registries".
> The container-executor.cfg configuration is as follows:
> {code:java}
> docker.trusted.registries=tangzhankun,ubuntu,centos{code}
> Running DistributedShell with "tangzhankun/tensorflow" works:
> {code:java}
> yarn ... -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=tangzhankun/tensorflow
> {code}
> But running a DistributedShell job with "centos", "centos[:tagName]", "ubuntu" 
> or "ubuntu[:tagName]" fails.
> The error message looks like:
> {code:java}
> "image: centos is not trusted"
> {code}
> These cases need better handling.






[jira] [Commented] (YARN-9118) Handle issues with parsing user defined GPU devices in GpuDiscoverer

2019-02-13 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767351#comment-16767351
 ] 

Peter Bacsko commented on YARN-9118:


"If I put those method names into a newline, it looks really weird"

Just use {{@SuppressWarnings("checkstyle:linelength")}} if wrapping them doesn't 
make sense.
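For reference, here is a minimal sketch of the suggestion. It assumes the project's Checkstyle configuration enables the `SuppressWarningsHolder` and `SuppressWarningsFilter` modules. Without them, Checkstyle ignores the annotation, while javac does not reject unknown `@SuppressWarnings` strings. `ExampleGpuSpecException` and its factory method are hypothetical and are not taken from the patch.

```java
// Hypothetical exception with a long, descriptive factory-method name.
// The annotation scopes the LineLength suppression to this class; it only
// takes effect if Checkstyle's SuppressWarningsHolder and
// SuppressWarningsFilter modules are enabled.
@SuppressWarnings("checkstyle:linelength")
class ExampleGpuSpecException extends Exception {
  private static final long serialVersionUID = 1L;

  private ExampleGpuSpecException(String message) {
    super(message);
  }

  static ExampleGpuSpecException createWithWrongValueSpecified(String device, String configValue) {
    return new ExampleGpuSpecException("Illegal format of individual GPU device: " + device + ", the whole config value was: '" + configValue + "'");
  }
}
```

Putting the annotation on the narrowest element, such as a single method instead of the whole class, keeps the line-length check active for everything else.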

> Handle issues with parsing user defined GPU devices in GpuDiscoverer
> 
>
> Key: YARN-9118
> URL: https://issues.apache.org/jira/browse/YARN-9118
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-9118.001.patch, YARN-9118.002.patch, 
> YARN-9118.003.patch, YARN-9118.004.patch, YARN-9118.005.patch, 
> YARN-9118.006.patch, YARN-9118.007.patch, YARN-9118.008.patch
>
>
> getGpusUsableByYarn has the following issues: 
> - Duplicate GPU device definitions are not rejected: this seems to be the 
> biggest issue, as it could inflate the number of devices on the node if the 
> same device ID is defined two or more times.
> - An empty string is accepted and behaves as if the user did not want to use 
> auto-discovery and had not defined any GPU devices: this results in an 
> empty device list, but the code never checks for the empty string explicitly, 
> so this behavior is just coincidental.
> - GPU device IDs (separated by commas) are not validated as numbers.
> Many test cases are added, as the coverage was very low.
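Below is a minimal sketch of stricter parsing that addresses all three points. It assumes each entry has the form `<index>:<minorNumber>`, and the class names are hypothetical. For simplicity it throws `IllegalArgumentException`, whereas the patch under review introduces its own `GpuDeviceSpecificationException`. Rejecting an empty value outright is one policy choice. Explicitly mapping it to an empty list would also make the behavior intentional rather than coincidental.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical parser for a user-defined GPU device list such as "0:0,1:1",
// where each entry is assumed to be "<index>:<minorNumber>".
class GpuDeviceListParser {

  static final class GpuDevice {
    final int index;
    final int minorNumber;

    GpuDevice(int index, int minorNumber) {
      this.index = index;
      this.minorNumber = minorNumber;
    }
  }

  static List<GpuDevice> parse(String spec) {
    // Empty values are handled explicitly instead of by coincidence.
    if (spec == null || spec.trim().isEmpty()) {
      throw new IllegalArgumentException("GPU device list must not be empty");
    }
    List<GpuDevice> devices = new ArrayList<>();
    Set<String> seen = new HashSet<>();
    for (String rawEntry : spec.split(",")) {
      String entry = rawEntry.trim();
      String[] parts = entry.split(":");
      if (parts.length != 2) {
        throw new IllegalArgumentException(
            "Expected <index>:<minorNumber> but got '" + entry + "'");
      }
      // Both parts must be non-negative integers.
      int index = parseNonNegativeInt(parts[0], entry);
      int minorNumber = parseNonNegativeInt(parts[1], entry);
      // Duplicates (compared after parsing) would count a device twice.
      if (!seen.add(index + ":" + minorNumber)) {
        throw new IllegalArgumentException("Duplicate GPU device '" + entry + "'");
      }
      devices.add(new GpuDevice(index, minorNumber));
    }
    return devices;
  }

  private static int parseNonNegativeInt(String value, String entry) {
    int parsed;
    try {
      parsed = Integer.parseInt(value.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("Non-numeric value in '" + entry + "'", e);
    }
    if (parsed < 0) {
      throw new IllegalArgumentException("Negative value in '" + entry + "'");
    }
    return parsed;
  }
}
```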






[jira] [Comment Edited] (YARN-9118) Handle issues with parsing user defined GPU devices in GpuDiscoverer

2019-02-13 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767351#comment-16767351
 ] 

Peter Bacsko edited comment on YARN-9118 at 2/13/19 4:03 PM:
-

"If I put those method names into a newline, it looks really weird"

Just use {{@SuppressWarnings("checkstyle:linelength")}} if that's the case.


was (Author: pbacsko):
"If I put those method names into a newline, it looks really weird"

Just use {{@SuppressWarnings("checkstyle:linelength")}} if it doesn't make 
sense.




[jira] [Commented] (YARN-9118) Handle issues with parsing user defined GPU devices in GpuDiscoverer

2019-02-13 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767300#comment-16767300
 ] 

Szilard Nemeth commented on YARN-9118:
--

Hi [~tangzhankun]!
Fixed some of the checkstyle issues with patch008.
Some of them do not make sense to me: 
- Missing package-info: Is this really required?
- Two "line is longer than 80 characters" issues in 
GpuDeviceSpecificationException: If I put those method names into a newline, it 
looks really weird. 
- 'conf' hides a field: Is there any value in renaming the parameter?
Are you fine with not fixing the issues listed above?

[~pbacsko]: Extracted the creation of the configuration objects into the method 
with the latest patch.




[jira] [Commented] (YARN-9098) Separate mtab file reader code and cgroups file system hierarchy parser code from CGroupsHandlerImpl and ResourceHandlerModule

2019-02-13 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767298#comment-16767298
 ] 

Peter Bacsko commented on YARN-9098:


Maybe it's just nitpicking, but...

{noformat}
  public List<String> getPathsForController(String controller) {
    return mappings.entrySet().stream()
        .filter(e -> e.getValue().contains(controller))
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }
{noformat}

Is it OK to use {{contains()}} here? If cpu and cpuacct are mounted to two 
different directories, then we might return the wrong path for cpu, no? Usually 
they're mounted to the same directory, like {{/sys/fs/cgroup/cpu,cpuacct}}, but 
it's something to think about.
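Whether `contains()` is a problem depends on the value type of `mappings`. On a `Set<String>` of controller names, it already compares whole names. On a comma-separated `String`, it does substring matching, so "cpu" also matches "cpuacct". Here is a hypothetical sketch of the `String` case, contrasting substring matching with exact matching after splitting on commas. The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical holder mapping a mount path to its comma-separated controllers.
class CGroupMountMappings {
  private final Map<String, String> mappings = new LinkedHashMap<>();

  void addMount(String path, String controllers) {
    mappings.put(path, controllers);
  }

  // Substring matching on the String value: "cpu" also matches "cpuacct".
  List<String> getPathsBySubstring(String controller) {
    return mappings.entrySet().stream()
        .filter(e -> e.getValue().contains(controller))
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }

  // Exact matching: split the list and compare whole controller names.
  List<String> getPathsForController(String controller) {
    return mappings.entrySet().stream()
        .filter(e -> Arrays.asList(e.getValue().split(",")).contains(controller))
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }
}
```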

> Separate mtab file reader code and cgroups file system hierarchy parser code 
> from CGroupsHandlerImpl and ResourceHandlerModule
> --
>
> Key: YARN-9098
> URL: https://issues.apache.org/jira/browse/YARN-9098
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-9098.002.patch, YARN-9098.003.patch, 
> YARN-9098.004.patch, YARN-9098.005.patch, YARN-9098.006.patch
>
>
> Separate mtab file reader code and cgroups file system hierarchy parser code 
> from CGroupsHandlerImpl and ResourceHandlerModule
> CGroupsHandlerImpl has a method parseMtab that parses an mtab file and stores 
> cgroups data.
> CGroupsLCEResourcesHandler also has a method with the same name, with 
> identical code.
> The parser code should be extracted from these places into a new class, as 
> this is a separate responsibility.
> As the output of the file parser is a Map>, it's better 
> to encapsulate it in a domain object, named 'CGroupsMountConfig' for instance.
> ResourceHandlerModule has a method named parseConfiguredCGroupPath that is 
> responsible for producing the same result (Map>) to 
> store cgroups data. Instead of operating on an mtab file, it looks at the 
> filesystem for cgroup settings. As the output is the same, CGroupsMountConfig 
> should be used here, too.
> Again, this code should not be part of ResourceHandlerModule, as it is a 
> different responsibility.
> One more thing that is strongly related to the methods above is 
> CGroupsHandlerImpl.initializeFromMountConfig: this method processes the 
> result of a parsed mtab file or parsed cgroups filesystem data and stores 
> file system paths for all available controllers. This method invokes 
> findControllerPathInMountConfig, which is duplicated in CGroupsHandlerImpl 
> and CGroupsLCEResourcesHandler, so it should be moved to a single place. To 
> store filesystem path and controller mappings, a new domain object could be 
> introduced.
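Below is a minimal sketch of the proposed extraction, with the following assumptions:

- `CGroupsMountConfig` follows the name suggested in the description, but its methods are hypothetical.
- The input lines follow the usual mtab layout: device, mount point, file system type, options, and so on.
- Controllers are taken from the mount options of `cgroup` (v1) entries and filtered against a caller-supplied set of known controller names.
- Octal escapes in mount paths (such as `\040` for a space) are not handled.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical domain object: cgroup mount path -> controllers mounted there.
final class CGroupsMountConfig {
  private final Map<String, Set<String>> controllersByPath;

  private CGroupsMountConfig(Map<String, Set<String>> controllersByPath) {
    this.controllersByPath = Collections.unmodifiableMap(controllersByPath);
  }

  // Parses mtab-style lines: "<device> <mount point> <fs type> <options> ...".
  static CGroupsMountConfig fromMtabLines(List<String> lines,
      Set<String> knownControllers) {
    Map<String, Set<String>> result = new LinkedHashMap<>();
    for (String line : lines) {
      String[] fields = line.trim().split("\\s+");
      // Skip malformed lines and anything that is not a cgroup (v1) mount.
      if (fields.length < 4 || !"cgroup".equals(fields[2])) {
        continue;
      }
      Set<String> controllers = new HashSet<>();
      for (String option : fields[3].split(",")) {
        // Mount options mix flags (rw, nosuid, ...) with controller names.
        if (knownControllers.contains(option)) {
          controllers.add(option);
        }
      }
      if (!controllers.isEmpty()) {
        result.put(fields[1], Collections.unmodifiableSet(controllers));
      }
    }
    return new CGroupsMountConfig(result);
  }

  // Returns the controllers mounted at the given path, or an empty set.
  Set<String> getControllers(String mountPath) {
    return controllersByPath.getOrDefault(mountPath, Collections.emptySet());
  }
}
```

Keeping parsing in one factory method means that both the mtab reader and the filesystem-based parser could produce the same object, which is the point the description makes.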






[jira] [Updated] (YARN-9118) Handle issues with parsing user defined GPU devices in GpuDiscoverer

2019-02-13 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-9118:
-
Attachment: YARN-9118.008.patch




[jira] [Commented] (YARN-9123) Clean up and split testcases in TestNMWebServices for GPU support

2019-02-13 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767291#comment-16767291
 ] 

Peter Bacsko commented on YARN-9123:


"testGetNMResourceInfoFailBecauseOfUnknownPlugin is a bit lengthy: 47 
characters."

I think this is fine (I've seen much worse). Another option could be something like 
{{testGetNMResourceInfoWhenPluginIsUnknown}}, which follows another popular naming 
scheme (using "when").

Speaking of repetition, this could be extracted too:
{noformat}
ClientResponse response = r.path("ws").path("v1").path("node")
    .path("resources").path("resource-2")
    .accept(MediaType.APPLICATION_JSON).get(ClientResponse.class);
{noformat}


> Clean up and split testcases in TestNMWebServices for GPU support
> -
>
> Key: YARN-9123
> URL: https://issues.apache.org/jira/browse/YARN-9123
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: YARN-9123.001.patch, YARN-9123.002.patch, 
> YARN-9123.003.patch, YARN-9123.004.patch
>
>
> The following test cases can be cleaned up a bit: 
> TestNMWebServices#testGetNMResourceInfo - can be split into 3 different cases
> TestNMWebServices#testGetYarnGpuResourceInfo






[jira] [Commented] (YARN-9135) NM State store ResourceMappings serialization are tested with Strings instead of real Device objects

2019-02-13 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767287#comment-16767287
 ] 

Peter Bacsko commented on YARN-9135:


Thanks for updating the patch [~snemeth].

Please make sure that these methods return a standard {{Map}} 
instead of {{ImmutableMap}} (the more generic the better).
{{public ImmutableMap getNodeVsCpus()}}
{{public ImmutableMap getNodeVsCpus()}}
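Here is a short sketch of the suggestion, using a hypothetical `NodeCpuMappings` class and assuming `String` keys and `Integer` values, since the real types are not shown here. The declared return type is the `Map` interface, but the returned object is still read-only. In this sketch it is a `Collections.unmodifiableMap` over a copy, standing in for Guava's `ImmutableMap`.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class; the key and value types are assumptions.
class NodeCpuMappings {
  private final Map<String, Integer> nodeVsCpus = new LinkedHashMap<>();

  void put(String node, int cpus) {
    nodeVsCpus.put(node, cpus);
  }

  // Declared as the Map interface; the returned map is a read-only snapshot,
  // so callers cannot modify it and later put() calls do not show up in it.
  Map<String, Integer> getNodeVsCpus() {
    return Collections.unmodifiableMap(new LinkedHashMap<>(nodeVsCpus));
  }
}
```

Callers can keep depending on the interface even if the implementation changes later, for example back to `ImmutableMap`.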



> NM State store ResourceMappings serialization are tested with Strings instead 
> of real Device objects
> 
>
> Key: YARN-9135
> URL: https://issues.apache.org/jira/browse/YARN-9135
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-9135.001.patch, YARN-9135.003.patch, 
> YARN-9135.004.patch
>
>







[jira] [Commented] (YARN-9133) Make tests more easy to comprehend in TestGpuResourceHandler

2019-02-13 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767279#comment-16767279
 ] 

Peter Bacsko commented on YARN-9133:


+1 (non-binding)



> Make tests more easy to comprehend in TestGpuResourceHandler
> 
>
> Key: YARN-9133
> URL: https://issues.apache.org/jira/browse/YARN-9133
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-9133.001.patch, YARN-9133.001.patch, 
> YARN-9133.002.patch, YARN-9133.003.patch, YARN-9133.004.patch, 
> YARN-9133.005.patch
>
>
> Tests are not easy to read: 
> - Some more helper methods would improve readability.
> - Eliminating the boolean flag that controls whether Docker is used would also 
> improve readability and clarity.






[jira] [Commented] (YARN-9138) Test error handling of nvidia-smi binary execution of GpuDiscoverer

2019-02-13 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767263#comment-16767263
 ] 

Peter Bacsko commented on YARN-9138:


[~snemeth], now you can remove these unnecessary code paths:

{noformat}
if (Shell.WINDOWS) {
...
} else {
...
{noformat}

> Test error handling of nvidia-smi binary execution of GpuDiscoverer
> ---
>
> Key: YARN-9138
> URL: https://issues.apache.org/jira/browse/YARN-9138
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-9138.001.patch, YARN-9138.002.patch
>
>
> The code that executes nvidia-smi (doing GPU device auto-discovery) does not 
> have much test coverage.
> This patch adds tests to this part of the code.






[jira] [Comment Edited] (YARN-9138) Test error handling of nvidia-smi binary execution of GpuDiscoverer

2019-02-13 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767263#comment-16767263
 ] 

Peter Bacsko edited comment on YARN-9138 at 2/13/19 2:47 PM:
-

[~snemeth]

1. Now you can remove these unnecessary code paths:

{noformat}
if (Shell.WINDOWS) {
...
} else {
...
{noformat}

2. OK, I know this is annoying, but could you use static imports for the assert 
calls? We use them everywhere else, so let's be consistent.

3. The string "PATH" is used multiple times, so it's worth making it a static 
final constant. The same applies to "u+x".
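Point 3 can be sketched with a hypothetical helper class whose names are illustrative and not taken from the patch. The recurring literals become named constants that every test method shares. Point 2 would similarly replace `Assert.assertEquals(...)` calls with `assertEquals(...)` after `import static org.junit.Assert.assertEquals;`. That import is omitted here to keep the sketch free of JUnit.

```java
import java.io.File;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical test helper: literals that recur across test methods
// become named constants.
final class NvidiaSmiTestSupport {
  static final String PATH_ENV_VAR = "PATH";
  static final String OWNER_EXECUTE_MODE = "u+x";

  private NvidiaSmiTestSupport() {
  }

  // Returns a copy of env with dir prepended to PATH; the input is unchanged.
  static Map<String, String> withDirOnPath(Map<String, String> env, String dir) {
    Map<String, String> copy = new HashMap<>(env);
    String current = env.get(PATH_ENV_VAR);
    copy.put(PATH_ENV_VAR, (current == null || current.isEmpty())
        ? dir : dir + File.pathSeparator + current);
    return copy;
  }

  // Builds a chmod command line that makes a fake binary executable for its owner.
  static List<String> chmodCommand(String file) {
    return Arrays.asList("chmod", OWNER_EXECUTE_MODE, file);
  }
}
```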


was (Author: pbacsko):
[~snemeth] now you can remove these unnecessary code-paths:

{noformat}
if (Shell.WINDOWS) {
...
} else {
...
{noformat}




[jira] [Comment Edited] (YARN-9138) Test error handling of nvidia-smi binary execution of GpuDiscoverer

2019-02-13 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767263#comment-16767263
 ] 

Peter Bacsko edited comment on YARN-9138 at 2/13/19 2:38 PM:
-

[~snemeth] now you can remove these unnecessary code-paths:

{noformat}
if (Shell.WINDOWS) {
...
} else {
...
{noformat}


was (Author: pbacsko):
[~snemeth] now you can remove this unnecessary code-paths:

{noformat}
if (Shell.WINDOWS) {
...
} else {
...
{noformat}




[jira] [Commented] (YARN-9139) Simplify initializer code of GpuDiscoverer

2019-02-13 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767257#comment-16767257
 ] 

Peter Bacsko commented on YARN-9139:


[~snemeth]

1. Please fix the remaining checkstyle issues.
2. Why is the {{TestFpgaDiscoverer}} class referenced in 
{{TestGpuResourceHandler.java}}?
3. {{Configuration conf = createDefaultConfig();}} is repeated: extract 
{{conf}} into a class variable and initialize it once.


> Simplify initializer code of GpuDiscoverer
> --
>
> Key: YARN-9139
> URL: https://issues.apache.org/jira/browse/YARN-9139
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-9139.001.patch, YARN-9139.002.patch, 
> YARN-9139.003.patch
>
>







[jira] [Commented] (YARN-8295) [UI2] The "Resource Usage" tab is pointless for finished applications

2019-02-13 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767246#comment-16767246
 ] 

Hadoop QA commented on YARN-8295:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 28m 35s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 58s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 42m 11s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-8295 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12958565/YARN-8295.001.patch |
| Optional Tests |  dupname  asflicense  shadedclient  |
| uname | Linux c9389175cbbe 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 00c5ffa |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 445 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/23396/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> [UI2] The "Resource Usage" tab is pointless for finished applications
> -
>
> Key: YARN-8295
> URL: https://issues.apache.org/jira/browse/YARN-8295
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn-ui-v2
>Reporter: Gergely Novák
>Assignee: Charan Hebri
>Priority: Minor
> Attachments: YARN-8295.001.patch
>
>
> If the user goes to Applications -> app -> Resource Usage for a finished 
> application, they get this message: "No resource usage data is available for 
> this application!". 
> I think it would be better to hide this tab for finished applications, or at 
> least add something like "this application is not using any resources because 
> it is finished" to the message.






[jira] [Commented] (YARN-9118) Handle issues with parsing user defined GPU devices in GpuDiscoverer

2019-02-13 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767226#comment-16767226
 ] 

Peter Bacsko commented on YARN-9118:


Minor:
{{Configuration conf = new Configuration(false);}} - this line keeps repeating 
in the tests. How about making {{conf}} a class variable and instantiating it 
in {{setup()}}?

Otherwise +1 non-binding.



> Handle issues with parsing user defined GPU devices in GpuDiscoverer
> 
>
> Key: YARN-9118
> URL: https://issues.apache.org/jira/browse/YARN-9118
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-9118.001.patch, YARN-9118.002.patch, 
> YARN-9118.003.patch, YARN-9118.004.patch, YARN-9118.005.patch, 
> YARN-9118.006.patch, YARN-9118.007.patch
>
>
> getGpusUsableByYarn has the following issues: 
> - Duplicate GPU device definitions are not rejected: this seems to be the 
> biggest issue, as it could increase the number of devices on the node if the 
> same device ID is defined two or more times.
> - An empty string is accepted and behaves as if the user did not want to use 
> auto-discovery and had not defined any GPU devices: this results in an 
> empty device list, but there is no explicit empty-string check in the code, 
> so this behavior is just coincidental.
> - No number validation happens on the GPU device IDs (separated by commas).
> Many test cases are added, as the coverage was already very low.
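A minimal standalone sketch of the three checks described above. It is not the actual {{GpuDiscoverer}} code: it assumes the user-defined value is a comma-separated list of {{index:minorNumber}} pairs (the format mentioned in YARN-9217), and {{GpuDeviceSpec}} / {{GpuDeviceSpecParser}} are hypothetical names. Rejecting an empty value, rather than returning an empty list, is one possible choice and an assumption here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical value type for one user-defined GPU device ("index:minorNumber"). */
final class GpuDeviceSpec {
  final int index;
  final int minorNumber;

  GpuDeviceSpec(int index, int minorNumber) {
    this.index = index;
    this.minorNumber = minorNumber;
  }
}

/** Hypothetical parser illustrating the three checks listed in YARN-9118. */
final class GpuDeviceSpecParser {
  private GpuDeviceSpecParser() {
  }

  static List<GpuDeviceSpec> parse(String value) {
    // Explicit check, instead of an empty string coincidentally
    // producing an empty device list.
    if (value == null || value.trim().isEmpty()) {
      throw new IllegalArgumentException("GPU device list is empty");
    }
    List<GpuDeviceSpec> result = new ArrayList<>();
    Set<String> seen = new HashSet<>();
    // Limit -1 keeps trailing empty entries such as in "0:0," so they get rejected.
    for (String raw : value.split(",", -1)) {
      String entry = raw.trim();
      String[] parts = entry.split(":", -1);
      if (parts.length != 2) {
        throw new IllegalArgumentException(
            "Expected index:minorNumber but got '" + entry + "'");
      }
      int index = parseNonNegative(parts[0], entry);
      int minorNumber = parseNonNegative(parts[1], entry);
      // Reject duplicates so that one device cannot be counted twice.
      if (!seen.add(index + ":" + minorNumber)) {
        throw new IllegalArgumentException("Duplicate GPU device '" + entry + "'");
      }
      result.add(new GpuDeviceSpec(index, minorNumber));
    }
    return result;
  }

  private static int parseNonNegative(String text, String entry) {
    int number;
    try {
      number = Integer.parseInt(text.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("Not a number in '" + entry + "'", e);
    }
    if (number < 0) {
      throw new IllegalArgumentException("Negative number in '" + entry + "'");
    }
    return number;
  }
}
```

For example, {{parse("0:0, 1:1")}} yields two devices, while {{"0:0,0:0"}}, a blank string and {{"0:x"}} are all rejected with an {{IllegalArgumentException}}.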






[jira] [Commented] (YARN-9270) Minor cleanup in TestFpgaDiscoverer

2019-02-13 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767222#comment-16767222
 ] 

Hadoop QA commented on YARN-9270:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  9s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 23s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 5 new + 143 unchanged - 12 fixed = 148 total (was 155) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 43s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 
50s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 73m 28s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9270 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12958563/YARN-9270-002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux eb7c1cfa7a5e 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 
5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 00c5ffa |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/23395/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/23395/testReport/ |
| Max. process+thread count | 339 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server

[jira] [Commented] (YARN-9217) Nodemanager will fail to start if GPU is misconfigured on the node or GPU drivers missing

2019-02-13 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767203#comment-16767203
 ] 

Peter Bacsko commented on YARN-9217:


Minor comments:
1. Do we need a separate variable here?
{noformat}
if (usableGpus.isEmpty()) {
  String message = "GPU is enabled on the NodeManager, but couldn't find "
      + "any usable GPU devices, please double check configuration.";
  LOG.warn(message);
{noformat}

2. The same applies to {{GpuNodeResourceUpdateHandler}}:
{noformat}
if (usableGpus.isEmpty()) {
  String message = "GPU is enabled, but couldn't find any usable GPUs on the "
  + "NodeManager.";
  LOG.warn(message);
{noformat}

3. I would rename {{checkErrorNumber()}} to {{checkErrorCount()}}

4. By the way, is it reasonable to perform GPU discovery in a loop? What's 
the idea here? Is "nvidia-smi" flaky sometimes? What condition are we trying to 
avoid? I realize this part of the code existed before, but still... 
anyone? :) 

5. {{NvidiaBinaryHelper}} - the {{@return}} clause is missing from the Javadoc

6. {{NvidiaBinaryHelper}} - this class is very small. If it was introduced for 
testing purposes, I strongly recommend using a replaceable lambda function 
instead, like this:

{noformat}
Function<String, Optional<GpuDeviceInformation>> gpuDeviceRetriever =
    this::getGpuDeviceInformation;
...
@VisibleForTesting
void setGpuDeviceRetriever(
    Function<String, Optional<GpuDeviceInformation>> func) {
  this.gpuDeviceRetriever = func;
}
...
lastDiscoveredGpuInformation = gpuDeviceRetriever.apply(pathOfGpuBinary);
{noformat}

Then you can plug your own retrieval logic into the test. Since 
{{Function.apply()}} cannot throw checked exceptions, failures have to be 
signaled through the return value, e.g. by wrapping it in an {{Optional}}.
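A self-contained sketch of the replaceable-function idea from point 6, under assumptions: {{GpuInfo}} and {{GpuDiscovererSketch}} are hypothetical stand-ins for the NodeManager classes, and the default retriever is a placeholder that always reports failure. Production code calls through a {{Function<String, Optional<GpuInfo>>}} field, and a test swaps in its own lambda.

```java
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical stand-in for the parsed nvidia-smi output. */
final class GpuInfo {
  final int gpuCount;

  GpuInfo(int gpuCount) {
    this.gpuCount = gpuCount;
  }
}

/** Hypothetical discoverer with a swappable retrieval function. */
class GpuDiscovererSketch {
  private final String pathOfGpuBinary;
  // Production default; tests replace it through the setter below.
  private Function<String, Optional<GpuInfo>> gpuDeviceRetriever =
      this::getGpuDeviceInformation;

  GpuDiscovererSketch(String pathOfGpuBinary) {
    this.pathOfGpuBinary = pathOfGpuBinary;
  }

  // Real code would run the binary and parse its output; checked exceptions
  // must be caught here because Function.apply cannot declare them.
  // This placeholder always reports failure.
  private Optional<GpuInfo> getGpuDeviceInformation(String binaryPath) {
    return Optional.empty();
  }

  // Package-private test hook (Hadoop would mark it @VisibleForTesting).
  void setGpuDeviceRetriever(Function<String, Optional<GpuInfo>> func) {
    this.gpuDeviceRetriever = func;
  }

  /** Returns the number of discovered GPUs, or 0 if discovery failed. */
  int discoverGpuCount() {
    return gpuDeviceRetriever.apply(pathOfGpuBinary)
        .map(info -> info.gpuCount)
        .orElse(0);
  }
}
```

In a test, {{setGpuDeviceRetriever(path -> Optional.of(new GpuInfo(4)))}} makes {{discoverGpuCount()}} return 4 without executing any binary.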

*Fundamental question*: is this how we want to use this plugin? I'm just 
asking because we might accidentally mask erratic behavior. E.g. a Hadoop user 
might think they have a cluster with 10 GPUs, while in reality the plugin 
failed to detect some cards and only 5 NMs support GPU scheduling. If this is 
not explicitly displayed, the user might be under the impression that 10 GPUs 
are ready to run YARN workloads, which can be very misleading.

At the very least, a fail-fast method should be considered.

> Nodemanager will fail to start if GPU is misconfigured on the node or GPU 
> drivers missing
> -
>
> Key: YARN-9217
> URL: https://issues.apache.org/jira/browse/YARN-9217
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Major
> Attachments: YARN-9217.001.patch, YARN-9217.002.patch, 
> YARN-9217.003.patch, YARN-9217.004.patch
>
>
> Nodemanager will not start:
> 1. If auto-discovery is enabled and:
>  * the nvidia-smi path is misconfigured or the file does not exist
>  * 0 GPUs are found
>  * the file exists but does not point to an nvidia-smi binary
>  * the binary is OK but an IOException occurs
> 2. If the manually configured GPU devices are misconfigured:
>  * any index:minor number format failure causes a problem
>  * 0 configured devices cause a problem
>  * NumberFormatException is not handled
> It would be better to log warnings about the configuration, report 0 
> available GPUs and let the node keep working and run non-GPU jobs.
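The two behaviors under discussion (warn and report 0 GPUs, as this description proposes, versus the fail-fast approach raised in the review comment) can be contrasted in one small class. This is a hypothetical sketch: {{GpuAvailability}}, the {{failFast}} switch and the {{Supplier}}-based discovery hook are not existing Hadoop APIs, and collecting warnings in a list stands in for real logging.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical policy: degrade to 0 GPUs, or fail fast. */
final class GpuAvailability {
  private final boolean failFast;
  private final List<String> warnings = new ArrayList<>();

  GpuAvailability(boolean failFast) {
    this.failFast = failFast;
  }

  /** Runs discovery and returns the usable GPU indexes. */
  List<Integer> usableGpus(Supplier<List<Integer>> discovery) {
    List<Integer> gpus;
    try {
      gpus = discovery.get();
    } catch (RuntimeException e) {
      // e.g. a NumberFormatException from a malformed device list
      return onProblem("GPU discovery failed: " + e.getMessage(), e);
    }
    if (gpus.isEmpty()) {
      return onProblem("GPU is enabled, but no usable GPU devices were found", null);
    }
    return gpus;
  }

  // Either aborts (fail-fast) or records a warning and reports no GPUs,
  // so the node can still run non-GPU containers.
  private List<Integer> onProblem(String message, Throwable cause) {
    if (failFast) {
      throw new IllegalStateException(message, cause);
    }
    warnings.add(message);
    return Collections.emptyList();
  }

  List<String> warnings() {
    return Collections.unmodifiableList(warnings);
  }
}
```

With {{failFast = false}} a discovery error or an empty result yields an empty list plus a recorded warning; with {{failFast = true}} the same conditions raise an {{IllegalStateException}}, which makes the misconfiguration impossible to miss.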






[jira] [Updated] (YARN-8295) [UI2] The "Resource Usage" tab is pointless for finished applications

2019-02-13 Thread Charan Hebri (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charan Hebri updated YARN-8295:
---
Attachment: YARN-8295.001.patch

> [UI2] The "Resource Usage" tab is pointless for finished applications
> -
>
> Key: YARN-8295
> URL: https://issues.apache.org/jira/browse/YARN-8295
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn-ui-v2
>Reporter: Gergely Novák
>Assignee: Charan Hebri
>Priority: Minor
> Attachments: YARN-8295.001.patch
>
>
> If the user goes to Applications -> app -> Resource Usage for a finished 
> application, they get this message: "No resource usage data is available for 
> this application!". 
> I think it would be better to hide this tab for finished applications, or at 
> least add something like "this application is not using any resources because 
> it is finished" to the message.






[jira] [Commented] (YARN-1655) Add implementations to FairScheduler to support increase/decrease container resource

2019-02-13 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767164#comment-16767164
 ] 

Wilfred Spiegelenburg commented on YARN-1655:
-

The JUnit test failures are not related to this change.

[~asuresh] could you please review this as you did the unifying code work?

> Add implementations to FairScheduler to support increase/decrease container 
> resource
> 
>
> Key: YARN-1655
> URL: https://issues.apache.org/jira/browse/YARN-1655
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, scheduler
>Reporter: Wangda Tan
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-1655.001.patch, YARN-1655.002.patch, 
> YARN-1655.003.patch
>
>







[jira] [Commented] (YARN-9270) Minor cleanup in TestFpgaDiscoverer

2019-02-13 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767153#comment-16767153
 ] 

Peter Bacsko commented on YARN-9270:


Uploaded v2. Changes:
* FpgaDiscoverer is no longer singleton
* Removed unnecessary synchronized methods (checked the call hierarchy)

"We request the instance of the FpgaDiscoverer 5 times, and then call the 
setResourceHanderPlugin on it with the same parameter (openclPlugin)"
This is no longer relevant now.

"Also could you move the previous comments/description of the test cases to the 
new tests' javadoc?"
Removed those altogether. The tests are short now, so it should be obvious 
what they do.

> Minor cleanup in TestFpgaDiscoverer
> ---
>
> Key: YARN-9270
> URL: https://issues.apache.org/jira/browse/YARN-9270
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9270-001.patch, YARN-9270-002.patch
>
>
> Let's do some cleanup in this class.
> * {{testLinuxFpgaResourceDiscoverPluginConfig}} - this test should be split 
> up into 5 different tests, because it tests 5 different scenarios.
> * remove {{setNewEnvironmentHack()}} - too complicated. We can introduce a 
> {{Function}} in the plugin class like {{Function<String, String> envProvider 
> = System::getenv}} plus a setter method which allows the test to modify 
> {{envProvider}}. Much simpler and more straightforward.
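A minimal sketch of the {{envProvider}} approach described above. {{FpgaPluginSketch}}, {{findSdkRoot()}} and the {{FPGA_SDK_ROOT}} variable name are hypothetical; only {{System.getenv(String)}} and {{java.util.function.Function}} are standard Java. Note that the method reference is written {{System::getenv}}, without parentheses.

```java
import java.util.function.Function;

/** Hypothetical plugin that reads an SDK location from the environment. */
class FpgaPluginSketch {
  static final String SDK_ROOT_ENV = "FPGA_SDK_ROOT"; // hypothetical variable name

  // Defaults to the real process environment via System.getenv(String).
  private Function<String, String> envProvider = System::getenv;

  // Test hook that replaces reflection-based environment hacks.
  void setEnvProvider(Function<String, String> envProvider) {
    this.envProvider = envProvider;
  }

  /** Returns the SDK root, or the fallback if the variable is unset or empty. */
  String findSdkRoot(String fallback) {
    String root = envProvider.apply(SDK_ROOT_ENV);
    return (root == null || root.isEmpty()) ? fallback : root;
  }
}
```

A test then calls {{setEnvProvider(name -> "/opt/fpga-sdk")}} instead of modifying the real process environment.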






[jira] [Commented] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-02-13 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767167#comment-16767167
 ] 

Wilfred Spiegelenburg commented on YARN-9298:
-

The JUnit test failure seems unrelated.
The lack of new tests is expected: those will follow with the integration into 
the scheduler.

> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9298.001.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967






[jira] [Updated] (YARN-9268) Various fixes are needed in FpgaDevice

2019-02-13 Thread Peter Bacsko (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9268:
---
Description: 
Need to fix the following in the class {{FpgaDevice}}:
 * It implements {{Comparable}}, but returns 0 in every case. There is no 
natural ordering among FPGA devices, perhaps "acl0" comes before "acl1", but 
this seems too forced and unnecessary. We think this class should not implement 
{{Comparable}} at all, at least not like that.
 * Stores unnecessary fields: devName, busNum, temperature, power usage. For 
one, these are never needed in the code. Secondly, temp and power usage change 
constantly. It's pointless to store these in this POJO.
 * {{serialVersionUID}} is 1L - let's generate a number for this
 * Use {{int}} instead of {{Integer}} - don't allow nulls. If major/minor 
uniquely identifies the card, then let's demand them in the constructor and 
don't store Integers that can be null.

  was:
Need to fix the following the class {{FpgaDevice}}:
 * It implements {{Comparable}}, but returns 0 in every case. There is no 
natural ordering among FPGA devices, perhaps "acl0" comes before "acl1", but 
this seems too forced and unnecessary. We think this class should not implement 
{{Comparable}} at all, at least not like that.
 * Stores unnecessary fields: devName, busNum, temperature, power usage. For 
one, these are never needed in the code. Secondly, temp and power usage change 
constantly. It's pointless to store these in this POJO.
 * {{serialVersionUID}} is 1L - let's generate a number for this
 * Use {{int}} instead of {{Integer}} - don't allow nulls. If major/minor 
uniquely identifies the card, then let's demand them in the constructor and 
don't store Integers that can be null.


> Various fixes are needed in FpgaDevice
> --
>
> Key: YARN-9268
> URL: https://issues.apache.org/jira/browse/YARN-9268
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9268-001.patch, YARN-9268-002.patch, 
> YARN-9268-003.patch
>
>
> Need to fix the following in the class {{FpgaDevice}}:
>  * It implements {{Comparable}}, but returns 0 in every case. There is no 
> natural ordering among FPGA devices, perhaps "acl0" comes before "acl1", but 
> this seems too forced and unnecessary. We think this class should not 
> implement {{Comparable}} at all, at least not like that.
>  * Stores unnecessary fields: devName, busNum, temperature, power usage. For 
> one, these are never needed in the code. Secondly, temp and power usage 
> change constantly. It's pointless to store these in this POJO.
>  * {{serialVersionUID}} is 1L - let's generate a number for this
>  * Use {{int}} instead of {{Integer}} - don't allow nulls. If major/minor 
> uniquely identifies the card, then let's demand them in the constructor and 
> don't store Integers that can be null.
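One possible shape for the class after these fixes, as a hypothetical sketch ({{FpgaDeviceSketch}}) rather than the actual patch: it assumes major/minor numbers identify the device and keeps only an alias name such as "acl0" besides them. The {{serialVersionUID}} below is an arbitrary placeholder; a real one would be generated, e.g. with the JDK's {{serialver}} tool or an IDE.

```java
import java.io.Serializable;
import java.util.Objects;

/**
 * Hypothetical immutable FPGA device descriptor: no Comparable, no volatile
 * readings (temperature, power), primitive identifiers that cannot be null.
 */
final class FpgaDeviceSketch implements Serializable {
  // Placeholder value; generate a real one for the actual class.
  private static final long serialVersionUID = 8365298117482314357L;

  private final int major;
  private final int minor;
  private final String aliasDevName; // e.g. "acl0"

  FpgaDeviceSketch(int major, int minor, String aliasDevName) {
    this.major = major;
    this.minor = minor;
    this.aliasDevName = Objects.requireNonNull(aliasDevName, "aliasDevName");
  }

  int getMajor() {
    return major;
  }

  int getMinor() {
    return minor;
  }

  String getAliasDevName() {
    return aliasDevName;
  }

  // Identity is major/minor only, per the assumption in the description.
  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof FpgaDeviceSketch)) {
      return false;
    }
    FpgaDeviceSketch other = (FpgaDeviceSketch) o;
    return major == other.major && minor == other.minor;
  }

  @Override
  public int hashCode() {
    return Objects.hash(major, minor);
  }

  @Override
  public String toString() {
    return aliasDevName + "(" + major + ":" + minor + ")";
  }
}
```

Without {{Comparable}}, callers that need an order can sort explicitly, e.g. with {{Comparator.comparingInt(FpgaDeviceSketch::getMinor)}}.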






[jira] [Updated] (YARN-9270) Minor cleanup in TestFpgaDiscoverer

2019-02-13 Thread Peter Bacsko (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9270:
---
Attachment: YARN-9270-002.patch

> Minor cleanup in TestFpgaDiscoverer
> ---
>
> Key: YARN-9270
> URL: https://issues.apache.org/jira/browse/YARN-9270
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9270-001.patch, YARN-9270-002.patch
>
>
> Let's do some cleanup in this class.
> * {{testLinuxFpgaResourceDiscoverPluginConfig}} - this test should be split 
> up into 5 different tests, because it tests 5 different scenarios.
> * remove {{setNewEnvironmentHack()}} - too complicated. We can introduce a 
> {{Function}} in the plugin class like {{Function<String, String> envProvider 
> = System::getenv}} plus a setter method which allows the test to modify 
> {{envProvider}}. Much simpler and more straightforward.






[jira] [Commented] (YARN-9270) Minor cleanup in TestFpgaDiscoverer

2019-02-13 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767085#comment-16767085
 ] 

Peter Bacsko commented on YARN-9270:


"Could we remove the wildcard import {{import java.util.*}}?"
Certainly, let's do this in YARN-9266.

"don't see why the constructor of Configuration is called with false"
[...]
"Also the 5th testcase (testLinuxFpgaResourceDiscoverPluginWithSdkRootSet) uses 
another Configuration object in the original testcase"

I think the idea here is that the original conf object was created with "false" 
so that it doesn't load the default values, but in that particular test (5th), 
we do. I see no significant difference though: I just tried it, and the test 
result is the same. 

I'm also thinking about making {{FpgaDiscoverer}} non-singleton. It's much 
easier to test that way.
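Why a non-singleton is easier to test can be illustrated with a simplified contrast. All types here ({{FpgaVendorPluginSketch}}, {{SingletonDiscoverer}}, {{InjectedDiscoverer}}) are hypothetical and much smaller than the real {{FpgaDiscoverer}}: the singleton keeps its plugin in shared static state that every test mutates, while the injected version receives its plugin through the constructor, so each test builds a fresh, isolated instance.

```java
import java.util.List;

/** Hypothetical vendor plugin interface. */
interface FpgaVendorPluginSketch {
  List<String> discover();
}

/** Singleton style: one shared instance whose plugin tests must overwrite. */
final class SingletonDiscoverer {
  private static final SingletonDiscoverer INSTANCE = new SingletonDiscoverer();
  private FpgaVendorPluginSketch plugin;

  private SingletonDiscoverer() {
  }

  static SingletonDiscoverer getInstance() {
    return INSTANCE;
  }

  // Setting this in one test silently affects every later test in the JVM.
  void setPlugin(FpgaVendorPluginSketch plugin) {
    this.plugin = plugin;
  }

  List<String> discover() {
    return plugin.discover();
  }
}

/** Non-singleton style: the dependency is passed in and nothing is shared. */
final class InjectedDiscoverer {
  private final FpgaVendorPluginSketch plugin;

  InjectedDiscoverer(FpgaVendorPluginSketch plugin) {
    this.plugin = plugin;
  }

  List<String> discover() {
    return plugin.discover();
  }
}
```

With the injected form, two tests can use different stub plugins at the same time without any reset logic between them.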

> Minor cleanup in TestFpgaDiscoverer
> ---
>
> Key: YARN-9270
> URL: https://issues.apache.org/jira/browse/YARN-9270
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9270-001.patch
>
>
> Let's do some cleanup in this class.
> * {{testLinuxFpgaResourceDiscoverPluginConfig}} - this test should be split 
> up to 5 different tests, because it tests 5 different scenarios.
> * remove {{setNewEnvironmentHack()}} - too complicated. We can introduce a 
> {{Function}} in the plugin class like {{Function envProvider 
> = System::getenv()}} plus a setter method which allows the test to modify 
> {{envProvider}}. Much simpler and straightfoward.






[jira] [Assigned] (YARN-7977) Do ACLs check for flow activity entities

2019-02-13 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi reassigned YARN-7977:
---

Assignee: Abhishek Modi

> Do ACLs check for flow activity entities
> 
>
> Key: YARN-7977
> URL: https://issues.apache.org/jira/browse/YARN-7977
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Abhishek Modi
>Priority: Major
>
> Verify ACLs while retrieving flow activity entities






[jira] [Assigned] (YARN-7979) Do ACLs check for application entities

2019-02-13 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi reassigned YARN-7979:
---

Assignee: Abhishek Modi

> Do ACLs check for application entities
> --
>
> Key: YARN-7979
> URL: https://issues.apache.org/jira/browse/YARN-7979
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Abhishek Modi
>Priority: Major
>
> Verify ACLs for application entities






[jira] [Assigned] (YARN-7981) Do ACLs check for sub app entities

2019-02-13 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi reassigned YARN-7981:
---

Assignee: Abhishek Modi

> Do ACLs check for sub app entities
> --
>
> Key: YARN-7981
> URL: https://issues.apache.org/jira/browse/YARN-7981
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Abhishek Modi
>Priority: Major
>
> ACLs check while retrieving sub app entities.






[jira] [Assigned] (YARN-5357) Timeline service v2 integration with Federation

2019-02-13 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi reassigned YARN-5357:
---

Assignee: Abhishek Modi  (was: Prabha Manepalli)

> Timeline service v2 integration with Federation 
> 
>
> Key: YARN-5357
> URL: https://issues.apache.org/jira/browse/YARN-5357
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Abhishek Modi
>Priority: Major
>
> Jira to note the discussion points from an initial chat about integrating 
> Timeline Service v2 with Federation (YARN-2915).
> cc [~subru] [~curino] 
> For Federation:
> - all entities that belong to the same flow run should have the same cluster 
> name
> - app IDs in the same flow run are strongly ordered in time
> - need a logical cluster name and physical cluster name
> - a possibility to implement the Application TimelineCollector as an 
> interceptor in the AMRMProxyService.
> For Timeline Service:
> - need to store physical cluster id and logical cluster id so that we don't 
> lose information at any level (flow/app/entity etc)
> - add a new app-id-to-cluster mapping table
> - need a different entity table/some table to store node level metrics for 
> physical cluster stats. Once we get to node-level rollup, we probably have to 
> store something in a dc, cluster, rack, node hierarchy. In that case a 
> physical cluster makes sense, but we'd still need some way to tie physical 
> and logical together in order to make automatic error detection etc that 
> we're envisioning feasible within a federated setup.
> For the Cluster Naming convention:
> - three situations for cluster name:
> > app submitted to router should take federated (aka logical) cluster name
> > app submitted directly to RM should take physical cluster name
> > Info about the physical cluster in entities?
> - suggestion to set the cluster name as a YARN tag at the router level (in the 
> app submission context)
> Other points to note:
> - for federation to work smoothly in environments that use HDFS some 
> additional considerations are needed, and possibly some solution like what is 
> being used at Twitter with the nFly approach.
> Email thread context:
> {code}
> -- Forwarded message --
> From: Joep Rottinghuis 
> Date: Fri, Jul 8, 2016 at 1:22 PM
> Subject: Re: Federation -Timeline Service meeting notes
> To: Subramaniam Venkatraman Krishnan 
> Cc: Sangjin Lee, Vrushali Channapattan , Carlo Curino
> Thanks for the notes.
> I think that for federation to work smoothly in environments that use HDFS 
> some additional considerations are needed, and possibly some solution like 
> what we're using at Twitter with our nFly approach.
> bq. - need a different entity table/some table to store node level metrics 
> for physical cluster stats
> Once we get to node-level rollup, we probably have to store something in a 
> dc, cluster, rack, node hierarchy. In that case a physical cluster makes 
> sense, but we'd still need some way to tie physical and logical together in 
> order to make automatic error detection etc that we're envisioning feasible 
> within a federated setup.
> Cheers,
> Joep
> On Fri, Jul 8, 2016 at 1:00 PM, Subramaniam Venkatraman Krishnan  wrote:
> Thanks Vrushali for crisply capturing the essentials from our rambling 
> discussion :).
>  
> Sangjin, I just want to add one comment to yours – we want to retain the 
> physical cluster name (possibly as a new entity type) so that we don’t lose 
> information & we can do cluster-level rollups even if they are not efficient.
>  
> Additionally, based on the walkthrough of Federation design:
> · There was general agreement with the proposed approach.
> · There is a possibility to implement the Application 
> TimelineCollector as an interceptor in the AMRMProxyService.
> · Joep raised the concern that it would be better if the RMs 
> obtain the epoch from FederationStateStore. This is not currently in the 
> roadmap of our MVP but we definitely plan to address this in future.
>  
> Regards,
> Subru
>  
> From: Sangjin Lee
> Sent: Thursday, July 07, 2016 6:22 PM
> To: Vrushali Channapattan 
> Cc: Joep Rottinghuis; Carlo Curino; Subramaniam Venkatraman Krishnan 
> Subject: Re: Federation -Timeline Service meeting notes
>  
> Thanks for the summary Vrushali!
>  
> Just so that we're on the same page regarding the terminology, I 
> understand we're using the terms "logical cluster" and "federated cluster" 
> interchangeably.
>  
> Also, between using the federated cluster name and the home cluster name 
> as a solution, I think we were leaning towards the federated cluster name 
> (al

[jira] [Assigned] (YARN-7978) Do ACLs check for flowrun entities

2019-02-13 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi reassigned YARN-7978:
---

Assignee: Abhishek Modi

> Do ACLs check for flowrun entities
> --
>
> Key: YARN-7978
> URL: https://issues.apache.org/jira/browse/YARN-7978
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Abhishek Modi
>Priority: Major
>
> Verify ACLs while retrieving flowrun entities






[jira] [Assigned] (YARN-2499) Respect labels in preemption policy of fair scheduler

2019-02-13 Thread Zhaohui Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaohui Xin reassigned YARN-2499:
-

Assignee: Zhaohui Xin

> Respect labels in preemption policy of fair scheduler
> -
>
> Key: YARN-2499
> URL: https://issues.apache.org/jira/browse/YARN-2499
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Zhaohui Xin
>Priority: Major
>







[jira] [Commented] (YARN-9294) Potential race condition in setting GPU cgroups & execute command in the selected cgroup

2019-02-13 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766901#comment-16766901
 ] 

Zhankun Tang commented on YARN-9294:


[~oliverhuh...@gmail.com], good job! Looking forward to your patch.

> Potential race condition in setting GPU cgroups & execute command in the 
> selected cgroup
> 
>
> Key: YARN-9294
> URL: https://issues.apache.org/jira/browse/YARN-9294
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.10.0
>Reporter: Keqiu Hu
>Assignee: Keqiu Hu
>Priority: Critical
>
> Environment: latest branch-2 head
> OS: RHEL 7.4
> *Observation*
> Out of ~10 container allocations with a GPU requirement, at least 1 of the 
> allocated containers would lose GPU isolation. Even when I asked for 1 GPU, I 
> could still see all GPUs on the same machine when running 
> nvidia-smi.
> The funny thing is that even though I could see all GPUs at the moment of 
> executing container-executor (say ordinals 0,1,2,3), cgroups restricted the 
> process's access to only that single GPU some time later. 
> The underlying process trying to access a GPU would take the initial 
> information as the source of truth and try to access physical GPU 0, which is not 
> actually available to the process. This results in a 
> [CUDA_ERROR_INVALID_DEVICE: invalid device ordinal] error.
> I validated that the container-executor commands are correct:
> {code:java}
> PrivilegedOperationExecutor command: 
> [/export/apps/hadoop/nodemanager/latest/bin/container-executor, --module-gpu, 
> --container_id, container_e22_1549663278916_0249_01_01, --excluded_gpus, 
> 0,1,2,3]
> PrivilegedOperationExecutor command: 
> [/export/apps/hadoop/nodemanager/latest/bin/container-executor, khu, khu, 0, 
> application_1549663278916_0249, 
> /grid/a/tmp/yarn/nmPrivate/container_e22_1549663278916_0249_01_01.tokens, 
> /grid/a/tmp/yarn, /grid/a/tmp/userlogs, 
> /export/apps/jdk/JDK-1_8_0_172/jre/bin/java, -classpath, ..., -Xmx256m, 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer,
>  khu, application_1549663278916_0249, 
> container_e22_1549663278916_0249_01_01, ltx1-hcl7552.grid.linkedin.com, 
> 8040, /grid/a/tmp/yarn]
> {code}
> So is this most likely a race condition between these two operations? 
> cc [~jhung]
> Another possible theory is that the cgroup creation for the container actually 
> failed, but the error was silently swallowed.
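The suspected race can be illustrated with a small toy model. This is a minimal sketch under stated assumptions, not Hadoop NodeManager or container-executor code: the class `GpuIsolationRaceSketch`, its nested `DeviceCgroup`, and the method `staleOrdinals` are hypothetical, and the GPU layout (four GPUs, only ordinal 3 granted) is assumed for illustration. It shows why a process that snapshots its visible GPUs before the device cgroup restriction takes effect ends up holding ordinals it can no longer use.

```java
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical toy model of the race suspected in YARN-9294.
 * A process caches the GPU ordinals it can see at launch, but the
 * device cgroup restriction may only take effect afterwards.
 */
public class GpuIsolationRaceSketch {

    // Hypothetical stand-in for one container's device cgroup (not a Hadoop class).
    private static final class DeviceCgroup {
        private final Set<Integer> allowed = new TreeSet<>();

        DeviceCgroup(Set<Integer> allGpus) {
            allowed.addAll(allGpus); // before any restriction, every GPU is visible
        }

        void exclude(Set<Integer> excluded) {
            allowed.removeAll(excluded); // models applying the excluded-GPU list
        }

        Set<Integer> visible() {
            return new TreeSet<>(allowed); // defensive copy of the current view
        }
    }

    /**
     * Simulates one container launch and returns the ordinals the process
     * believes it has (cached at launch) but can no longer access.
     */
    public static Set<Integer> staleOrdinals(boolean restrictBeforeLaunch,
                                             Set<Integer> allGpus,
                                             Set<Integer> excluded) {
        DeviceCgroup cgroup = new DeviceCgroup(allGpus);
        if (restrictBeforeLaunch) {
            cgroup.exclude(excluded); // correct ordering: isolate, then launch
        }
        // The process records what it sees at startup and treats it as the truth.
        Set<Integer> cachedAtLaunch = cgroup.visible();
        if (!restrictBeforeLaunch) {
            cgroup.exclude(excluded); // racy ordering: isolation lands after launch
        }
        Set<Integer> stale = new TreeSet<>(cachedAtLaunch);
        stale.removeAll(cgroup.visible());
        return stale;
    }

    public static void main(String[] args) {
        Set<Integer> allGpus = Set.of(0, 1, 2, 3);
        Set<Integer> excluded = Set.of(0, 1, 2); // assumed: container was granted GPU 3 only
        System.out.println("ordered: " + staleOrdinals(true, allGpus, excluded));
        System.out.println("racy:    " + staleOrdinals(false, allGpus, excluded));
    }
}
```

Running `main` prints `ordered: []` and `racy:    [0, 1, 2]`. Only when the restriction lands after launch does the cached view contain ordinals that later fail, which is consistent with the CUDA_ERROR_INVALID_DEVICE symptom described above. The sketch needs Java 9+ for `Set.of`.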


