[jira] [Commented] (YARN-7289) Application lifetime does not work with FairScheduler

2017-10-25 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219894#comment-16219894
 ] 

Rohith Sharma K S commented on YARN-7289:
-

Please go ahead. Since yesterday was off time for you, I took the liberty of 
uploading the patch rather than waiting! We should target it for 2.9!

> Application lifetime does not work with FairScheduler
> -
>
> Key: YARN-7289
> URL: https://issues.apache.org/jira/browse/YARN-7289
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-7289.000.patch, YARN-7289.001.patch, 
> YARN-7289.002.patch, YARN-7289.003.patch, YARN-7289.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7394) Merge code paths for Reservation/Plan queues and Auto Created queues

2017-10-25 Thread Suma Shivaprasad (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-7394:
---
Attachment: YARN-7394.patch

The attached patch renames ReservationQueue to AutoCreatedLeafQueue and adds an 
AbstractAutoCreatingParentQueue, which is essentially a container for 
automatically created child leaf queues and is extended by 
PlanQueue/AutoCreatedEnabledParentQueue.
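
To make the proposed hierarchy concrete, here is a minimal sketch of how the 
renamed classes could relate. Only the class names mentioned above come from the 
proposal; the method and field shapes are illustrative assumptions, not the patch.

{code}
// Hypothetical sketch of the proposed queue hierarchy (not the actual patch).
abstract class AbstractAutoCreatingParentQueue {
  // Container for automatically created child leaf queues.
  abstract AutoCreatedLeafQueue createChildQueue(String queueName);
}

class PlanQueue extends AbstractAutoCreatingParentQueue {
  @Override
  AutoCreatedLeafQueue createChildQueue(String queueName) {
    return new AutoCreatedLeafQueue(queueName);   // reservation-driven child
  }
}

class AutoCreatedEnabledParentQueue extends AbstractAutoCreatingParentQueue {
  @Override
  AutoCreatedLeafQueue createChildQueue(String queueName) {
    return new AutoCreatedLeafQueue(queueName);   // auto-creation-driven child
  }
}

class AutoCreatedLeafQueue {   // formerly ReservationQueue
  private final String name;
  AutoCreatedLeafQueue(String name) { this.name = name; }
}
{code}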

> Merge code paths for Reservation/Plan queues and Auto Created queues
> 
>
> Key: YARN-7394
> URL: https://issues.apache.org/jira/browse/YARN-7394
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
> Attachments: YARN-7394.patch
>
>
> The initialization/reinitialization logic for ReservationQueue and 
> AutoCreated Leaf queues is similar. The proposal is to rename 
> ReservationQueue to the more generic name AutoCreatedLeafQueue, which is 
> managed either by PlanQueue (already exists) or AutoCreateEnabledParentQueue 
> (new class). 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7394) Merge code paths for Reservation/Plan queues and Auto Created queues

2017-10-25 Thread Suma Shivaprasad (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-7394:
---
Description: 
The initialization/reinitialization logic for ReservationQueue and AutoCreated 
Leaf queues is similar. The proposal is to rename ReservationQueue to the more 
generic name AutoCreatedLeafQueue, which is managed either by PlanQueue (already 
exists) or AutoCreateEnabledParentQueue (new class). 



  was:
The initialization/reinitialization logic for ReservationQueue and AutoCreated 
Leaf queues is similar. The proposal is to rename ReservationQueue to the more 
generic name AutoCreatedLeafQueue, which is managed either by PlanQueue (already 
exists) or AutoCreatedEnabledParentQueue (new class). 




> Merge code paths for Reservation/Plan queues and Auto Created queues
> 
>
> Key: YARN-7394
> URL: https://issues.apache.org/jira/browse/YARN-7394
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>
> The initialization/reinitialization logic for ReservationQueue and 
> AutoCreated Leaf queues is similar. The proposal is to rename 
> ReservationQueue to the more generic name AutoCreatedLeafQueue, which is 
> managed either by PlanQueue (already exists) or AutoCreateEnabledParentQueue 
> (new class). 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7396) NPE when accessing container logs due to null dirsHandler

2017-10-25 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219842#comment-16219842
 ] 

Sunil G commented on YARN-7396:
---

Yes. Makes sense to me.

> NPE when accessing container logs due to null dirsHandler
> -
>
> Key: YARN-7396
> URL: https://issues.apache.org/jira/browse/YARN-7396
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
> Attachments: YARN-7396.001.patch
>
>
> {noformat}java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.nodemanager.webapp.NMWebAppFilter.containerLogPageRedirectPath(NMWebAppFilter.java:96)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.webapp.NMWebAppFilter.doFilter(NMWebAppFilter.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829){noformat}
> In YARN-6620 the NMContext creation in {{NodeManager#serviceInit}} was moved. 
> It's now created before the dirsHandler is initialized. So when 
> {{nmContext.getLocalDirsHandler}} is called, it's null.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7224) Support GPU isolation for docker container

2017-10-25 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219840#comment-16219840
 ] 

Sunil G commented on YARN-7224:
---

bq. In general dockerCommandPlugin.updateDockerRunCommand helps to update the 
docker command for volumes etc. However, is it better to have an API named 
sanitize/verifyCommand in dockerCommandPlugin so that the incoming/created 
command will be validated and logged based on system parameters?
My point was to have a new API named sanitize/verifyCommand in 
{{DockerCommandPlugin}} alongside updateDockerRunCommand, so that all validations 
live behind a cleaner interface and do not have to depend too much on the update 
calls.

bq. I like this idea, but considering the size of this patch, can we do this in a 
follow-up JIRA?
Yes, you are correct. It could be a follow-up JIRA.
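
To make the proposal concrete, here is a minimal sketch of what the extra hook 
could look like. The interface and method shapes below are simplified 
placeholders, not the real {{DockerCommandPlugin}} signatures, and verifyCommand 
is only the suggested addition.

{code}
// Hypothetical sketch only; real YARN types are simplified for illustration.
class DockerRunCommand {
  final java.util.List<String> args = new java.util.ArrayList<>();
}

class ContainerExecutionException extends Exception {
  ContainerExecutionException(String msg) { super(msg); }
}

interface DockerCommandPlugin {
  // Existing responsibility: mutate the docker run command (volumes, devices, ...).
  void updateDockerRunCommand(DockerRunCommand cmd) throws ContainerExecutionException;

  // Proposed addition: validate (and log) the fully built command against
  // system parameters before it is handed off for execution.
  void verifyCommand(DockerRunCommand cmd) throws ContainerExecutionException;
}
{code}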

> Support GPU isolation for docker container
> --
>
> Key: YARN-7224
> URL: https://issues.apache.org/jira/browse/YARN-7224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-7224.001.patch, YARN-7224.002-wip.patch, 
> YARN-7224.003.patch, YARN-7224.004.patch, YARN-7224.005.patch, 
> YARN-7224.006.patch, YARN-7224.007.patch, YARN-7224.008.patch
>
>
> This patch addresses issues that arise when docker containers are used:
> 1. GPU driver and nvidia libraries: If GPU drivers and NV libraries are 
> pre-packaged inside the docker image, they can conflict with the drivers and 
> nvidia libraries installed on the host OS. An alternative solution is to detect 
> the host OS's installed drivers and devices and mount them when launching the 
> docker container. Please refer to \[1\] for more details. 
> 2. Image detection: 
> From \[2\], the challenge is: 
> bq. Mounting user-level driver libraries and device files clobbers the 
> environment of the container; it should be done only when the container is 
> running a GPU application. The challenge here is to determine if a given 
> image will be using the GPU or not. We should also prevent launching 
> containers based on a Docker image that is incompatible with the host NVIDIA 
> driver version; you can find more details on this wiki page.
> 3. GPU isolation.
> *Proposed solution*:
> a. Use nvidia-docker-plugin \[3\] to address issue #1; this is the same 
> solution used by K8S \[4\]. Issue #2 could be addressed in a separate JIRA.
> We won't ship nvidia-docker-plugin with our releases, and we require the 
> cluster admin to preinstall nvidia-docker-plugin to use GPU+docker support on 
> YARN. "nvidia-docker" is a wrapper around the docker binary which can address 
> #3 as well; however, "nvidia-docker" doesn't provide the same semantics as 
> docker, and it needs additional environment setup such as PATH/LD_LIBRARY_PATH 
> to use it. To avoid introducing additional issues, we plan to use the 
> nvidia-docker-plugin + docker binary approach.
> b. To address GPU drivers and nvidia libraries, we use nvidia-docker-plugin 
> \[3\] to create a volume which includes the GPU-related libraries and mount it 
> when the docker container is launched. Changes include: 
> - Instead of using {{volume-driver}}, this patch adds a {{docker volume 
> create}} command to c-e and the NM Java side. The reason is that 
> {{volume-driver}} can only use a single volume driver for each launched docker 
> container.
> - Updated {{c-e}} and the Java side so that, if a mounted volume is a named 
> volume in docker, the file-existence check is skipped. (Named volumes still 
> need to be added to the permitted list of container-executor.cfg.)
> c. To address the isolation issue:
> We found that cgroup + docker doesn't work under newer docker versions which 
> use {{runc}} as the default runtime. Setting {{--cgroup-parent}} to a cgroup 
> which includes any {{devices.deny}} rule prevents the docker container from 
> being launched.
> Instead, this patch passes the allowed GPU devices via {{--device}} to the 
> docker launch command.
> References:
> \[1\] https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-driver
> \[2\] https://github.com/NVIDIA/nvidia-docker/wiki/Image-inspection
> \[3\] https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker-plugin
> \[4\] https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4827) Document configuration of ReservationSystem for FairScheduler

2017-10-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219817#comment-16219817
 ] 

Hudson commented on YARN-4827:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #13138 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13138/])
YARN-4827. Document configuration of ReservationSystem for (subru: rev 
3fae675383489129b3ca3c66683a1215d0c6edf0)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ReservationSystem.md
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/FairScheduler.md
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/CapacityOverTimePolicy.java


> Document configuration of ReservationSystem for FairScheduler
> -
>
> Key: YARN-4827
> URL: https://issues.apache.org/jira/browse/YARN-4827
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Subru Krishnan
>Assignee: Yufei Gu
>Priority: Blocker
> Fix For: 2.9.0, 3.0.0, 3.1.0
>
> Attachments: YARN-4827.001.patch, YARN-4827.002.patch, 
> YARN-4827.003.patch
>
>
> This JIRA tracks the effort to add documentation on how to configure 
> ReservationSystem for FairScheduler



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7289) Application lifetime does not work with FairScheduler

2017-10-25 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219807#comment-16219807
 ] 

Miklos Szegedi commented on YARN-7289:
--

[~rohithsharma], thank you for the patch. Would you like to continue this 
patch, or should I do that?

> Application lifetime does not work with FairScheduler
> -
>
> Key: YARN-7289
> URL: https://issues.apache.org/jira/browse/YARN-7289
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-7289.000.patch, YARN-7289.001.patch, 
> YARN-7289.002.patch, YARN-7289.003.patch, YARN-7289.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7064) Use cgroup to get container resource utilization

2017-10-25 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated YARN-7064:
-
Attachment: YARN-7064.005.patch

> Use cgroup to get container resource utilization
> 
>
> Key: YARN-7064
> URL: https://issues.apache.org/jira/browse/YARN-7064
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-7064.000.patch, YARN-7064.001.patch, 
> YARN-7064.002.patch, YARN-7064.003.patch, YARN-7064.004.patch, 
> YARN-7064.005.patch
>
>
> This is an addendum to YARN-6668. What happens is that that jira always wants 
> to rebase patches against YARN-1011 instead of trunk.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7064) Use cgroup to get container resource utilization

2017-10-25 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219785#comment-16219785
 ] 

Miklos Szegedi commented on YARN-7064:
--

Thank you, [~haibochen], for the review.
I changed the behavior so that the resource calculator needs to be explicitly 
enabled. I also noticed during end-to-end testing that what we return is not the 
actual virtual memory reserved, just the used swap space plus the physical 
memory. I introduced a CombinedResourceCalculator that combines the two 
implementations (procfs, cgroups) and returns the right amount of virtual 
memory.
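
A minimal sketch of the idea (the delegate interface and class shape are 
simplified assumptions, not the actual patch): virtual memory is taken from the 
procfs-based tree, while physical memory and CPU come from the cgroups-based tree.

{code}
// Sketch only: the real YARN resource calculators expose more methods than this.
interface ProcessTreeMetrics {
  long getVirtualMemorySize();
  long getRssMemorySize();
  long getCumulativeCpuTime();
}

class CombinedResourceCalculator implements ProcessTreeMetrics {
  private final ProcessTreeMetrics procfs;
  private final ProcessTreeMetrics cgroups;

  CombinedResourceCalculator(ProcessTreeMetrics procfs, ProcessTreeMetrics cgroups) {
    this.procfs = procfs;
    this.cgroups = cgroups;
  }

  @Override
  public long getVirtualMemorySize() {
    // cgroups reports used swap + physical memory, not the reserved virtual
    // memory, so delegate this one metric to the procfs-based implementation.
    return procfs.getVirtualMemorySize();
  }

  @Override
  public long getRssMemorySize() {
    return cgroups.getRssMemorySize();       // cheap and accurate under cgroups
  }

  @Override
  public long getCumulativeCpuTime() {
    return cgroups.getCumulativeCpuTime();
  }
}
{code}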


> Use cgroup to get container resource utilization
> 
>
> Key: YARN-7064
> URL: https://issues.apache.org/jira/browse/YARN-7064
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-7064.000.patch, YARN-7064.001.patch, 
> YARN-7064.002.patch, YARN-7064.003.patch, YARN-7064.004.patch
>
>
> This is an addendum to YARN-6668. What happens is that that jira always wants 
> to rebase patches against YARN-1011 instead of trunk.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7398) LICENSE.txt is broken in branch-2 by YARN-4849 merge

2017-10-25 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219780#comment-16219780
 ] 

Subru Krishnan commented on YARN-7398:
--

Assigning it to [~varun_saxena] based on an offline discussion with 
[~leftnoteasy], as the LICENSE.txt is fine in trunk and this seems to have 
happened during the YARN UI v2 branch-2 merge.

cc [~vrushalic].

Can one of you take a look? Thanks.

> LICENSE.txt is broken in branch-2 by YARN-4849 merge
> 
>
> Key: YARN-7398
> URL: https://issues.apache.org/jira/browse/YARN-7398
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Subru Krishnan
>Assignee: Varun Saxena
>Priority: Blocker
>
> YARN-4849 (commit sha id 56654d8820f345fdefd6a3f81836125aa67adbae) seems to 
> have been based on a stale version of LICENSE.txt (e.g. HSQLDB, gtest, etc.), 
> so I have reverted it. 
> [~leftnoteasy]/[~sunilg], can you take a look and fix the UI v2 licenses 
> ASAP?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7398) LICENSE.txt is broken in branch-2 by YARN-4849 merge

2017-10-25 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-7398:
-
Summary: LICENSE.txt is broken in branch-2 by YARN-4849 merge  (was: 
LICENSE.txt is broken in branch-2 by YARN-4849)

> LICENSE.txt is broken in branch-2 by YARN-4849 merge
> 
>
> Key: YARN-7398
> URL: https://issues.apache.org/jira/browse/YARN-7398
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Subru Krishnan
>Assignee: Varun Saxena
>Priority: Blocker
>
> YARN-4849 (commit sha id 56654d8820f345fdefd6a3f81836125aa67adbae) seems to 
> have been based on a stale version of LICENSE.txt (e.g. HSQLDB, gtest, etc.), 
> so I have reverted it. 
> [~leftnoteasy]/[~sunilg], can you take a look and fix the UI v2 licenses 
> ASAP?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7398) LICENSE.txt is broken in branch-2 by YARN-4849

2017-10-25 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan reassigned YARN-7398:


Assignee: Varun Saxena  (was: Wangda Tan)

> LICENSE.txt is broken in branch-2 by YARN-4849
> --
>
> Key: YARN-7398
> URL: https://issues.apache.org/jira/browse/YARN-7398
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Subru Krishnan
>Assignee: Varun Saxena
>Priority: Blocker
>
> YARN-4849 (commit sha id 56654d8820f345fdefd6a3f81836125aa67adbae) seems to 
> have been based on a stale version of LICENSE.txt (e.g. HSQLDB, gtest, etc.), 
> so I have reverted it. 
> [~leftnoteasy]/[~sunilg], can you take a look and fix the UI v2 licenses 
> ASAP?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7398) LICENSE.txt is broken in branch-2 by YARN-4849

2017-10-25 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-7398:


 Summary: LICENSE.txt is broken in branch-2 by YARN-4849
 Key: YARN-7398
 URL: https://issues.apache.org/jira/browse/YARN-7398
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.9.0
Reporter: Subru Krishnan
Assignee: Wangda Tan
Priority: Blocker


YARN-4849 (commit sha id 56654d8820f345fdefd6a3f81836125aa67adbae) seems to 
have been based on a stale version of LICENSE.txt (e.g. HSQLDB, gtest, etc.), 
so I have reverted it. 

[~leftnoteasy]/[~sunilg], can you take a look and fix the UI v2 licenses 
ASAP?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7396) NPE when accessing container logs due to null dirsHandler

2017-10-25 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219770#comment-16219770
 ] 

Jonathan Hung commented on YARN-7396:
-

001: a simple patch that does the dirsHandler initialization before the 
NMContext initialization.

[~leftnoteasy]/[~sunilg], could you take a look at this? Thanks!
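
A simplified sketch of the ordering fix described above (the class and field 
names below are placeholder stand-ins, not the actual NodeManager code): create 
the dirs handler before the NMContext that exposes it, so that 
{{nmContext.getLocalDirsHandler}} can never return null in {{NMWebAppFilter}}.

{code}
// Sketch only: placeholder types illustrating the initialization order.
class InitOrderSketch {
  static final class DirsHandler {}

  static final class NMContext {
    private final DirsHandler dirsHandler;
    NMContext(DirsHandler dirsHandler) { this.dirsHandler = dirsHandler; }
    DirsHandler getLocalDirsHandler() { return dirsHandler; }
  }

  private DirsHandler dirsHandler;
  private NMContext nmContext;

  void serviceInit() {
    dirsHandler = new DirsHandler();        // initialized first (the fix)
    nmContext = new NMContext(dirsHandler); // the context now sees a non-null handler
  }
}
{code}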

> NPE when accessing container logs due to null dirsHandler
> -
>
> Key: YARN-7396
> URL: https://issues.apache.org/jira/browse/YARN-7396
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
> Attachments: YARN-7396.001.patch
>
>
> {noformat}java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.nodemanager.webapp.NMWebAppFilter.containerLogPageRedirectPath(NMWebAppFilter.java:96)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.webapp.NMWebAppFilter.doFilter(NMWebAppFilter.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829){noformat}
> In YARN-6620 the NMContext creation in {{NodeManager#serviceInit}} was moved. 
> It's now created before the dirsHandler is initialized. So when 
> {{nmContext.getLocalDirsHandler}} is called, it's null.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7396) NPE when accessing container logs due to null dirsHandler

2017-10-25 Thread Jonathan Hung (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung reassigned YARN-7396:
---

Assignee: Jonathan Hung

> NPE when accessing container logs due to null dirsHandler
> -
>
> Key: YARN-7396
> URL: https://issues.apache.org/jira/browse/YARN-7396
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
> Attachments: YARN-7396.001.patch
>
>
> {noformat}java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.nodemanager.webapp.NMWebAppFilter.containerLogPageRedirectPath(NMWebAppFilter.java:96)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.webapp.NMWebAppFilter.doFilter(NMWebAppFilter.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829){noformat}
> In YARN-6620 the NMContext creation in {{NodeManager#serviceInit}} was moved. 
> It's now created before the dirsHandler is initialized. So when 
> {{nmContext.getLocalDirsHandler}} is called, it's null.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7396) NPE when accessing container logs due to null dirsHandler

2017-10-25 Thread Jonathan Hung (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-7396:

Attachment: YARN-7396.001.patch

> NPE when accessing container logs due to null dirsHandler
> -
>
> Key: YARN-7396
> URL: https://issues.apache.org/jira/browse/YARN-7396
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Hung
> Attachments: YARN-7396.001.patch
>
>
> {noformat}java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.nodemanager.webapp.NMWebAppFilter.containerLogPageRedirectPath(NMWebAppFilter.java:96)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.webapp.NMWebAppFilter.doFilter(NMWebAppFilter.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829){noformat}
> In YARN-6620 the NMContext creation in {{NodeManager#serviceInit}} was moved. 
> It's now created before the dirsHandler is initialized. So when 
> {{nmContext.getLocalDirsHandler}} is called, it's null.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7393) RegistryDNS doesn't work in tcp channel

2017-10-25 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7393:

Attachment: (was: YARN-7393.yarn-native-services.001.patch)

> RegistryDNS doesn't work in tcp channel
> ---
>
> Key: YARN-7393
> URL: https://issues.apache.org/jira/browse/YARN-7393
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Eric Yang
> Attachments: YARN-7393.yarn-native-services.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7393) RegistryDNS doesn't work in tcp channel

2017-10-25 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7393:

Attachment: YARN-7393.yarn-native-services.001.patch

> RegistryDNS doesn't work in tcp channel
> ---
>
> Key: YARN-7393
> URL: https://issues.apache.org/jira/browse/YARN-7393
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Eric Yang
> Attachments: YARN-7393.yarn-native-services.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7393) RegistryDNS doesn't work in tcp channel

2017-10-25 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7393:

Attachment: YARN-7393.yarn-native-services.001.patch

Fix the message size calculation, and throttle when there is no incoming request.
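
For context on the size calculation, DNS over TCP prefixes every message with a 
two-octet length field (RFC 1035, section 4.2.2). The sketch below only 
illustrates that framing; it is not the RegistryDNS patch itself.

{code}
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

final class TcpDnsFraming {
  // Read one DNS-over-TCP message: a 2-byte unsigned length, then that many bytes.
  static byte[] readMessage(InputStream in) throws IOException {
    DataInputStream data = new DataInputStream(in);
    int length = data.readUnsignedShort();   // length prefix, 0..65535
    byte[] message = new byte[length];
    data.readFully(message);                 // block until the full payload arrives
    return message;
  }
}
{code}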

> RegistryDNS doesn't work in tcp channel
> ---
>
> Key: YARN-7393
> URL: https://issues.apache.org/jira/browse/YARN-7393
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Eric Yang
> Attachments: YARN-7393.yarn-native-services.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7397) Reduce lock contention in FairScheduler#getAppWeight()

2017-10-25 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-7397:
---
Attachment: YARN-7397.001.patch

My testing shows about a 5% performance improvement from this patch.
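
As a generic illustration of the pattern (this is not the actual 
{{FairScheduler#getAppWeight()}} code; the field and formula below are 
placeholders): compute everything that does not touch shared state outside the 
lock, and guard only the read that needs it.

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

class WeightSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private double sizeBasedWeightFactor = 1.0;   // hypothetical shared field

  double getAppWeight(int demand, double priorityWeight) {
    // Lock-free part: depends only on the method arguments.
    double weight = Math.log1p(demand) * priorityWeight;

    // Only the shared-state read is guarded.
    lock.readLock().lock();
    try {
      weight *= sizeBasedWeightFactor;
    } finally {
      lock.readLock().unlock();
    }
    return weight;
  }
}
{code}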

> Reduce lock contention in FairScheduler#getAppWeight()
> --
>
> Key: YARN-7397
> URL: https://issues.apache.org/jira/browse/YARN-7397
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.0.0-beta1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Attachments: YARN-7397.001.patch
>
>
> In profiling the fair scheduler, a large amount of time is spent waiting to 
> get the lock in {{FairScheduler.getAppWeight()}}, when the lock isn't 
> actually needed.  This patch reduces the scope of the lock to eliminate that 
> contention.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7393) RegistryDNS doesn't work in tcp channel

2017-10-25 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang reassigned YARN-7393:
---

Assignee: Eric Yang

> RegistryDNS doesn't work in tcp channel
> ---
>
> Key: YARN-7393
> URL: https://issues.apache.org/jira/browse/YARN-7393
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Eric Yang
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7320) Duplicate LiteralByteStrings in SystemCredentialsForAppsProto.credentialsForApp_

2017-10-25 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219758#comment-16219758
 ] 

Robert Kanter commented on YARN-7320:
-

I discussed this offline with [~mi...@cloudera.com] and it looks like a race 
condition in {{ProtoUtils#convertToProtoFormat}} because it plays around with 
the {{ByteBuffer}}'s position.  Duplicating it should remove the race condition.

+1 on the addendum pending Jenkins.  
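
A simplified illustration of the duplication idea (this is not the exact 
ProtoUtils code): {{ByteBuffer#duplicate()}} gives each caller its own 
position/limit over the same backing bytes, so concurrent conversions no longer 
race on a shared cursor.

{code}
import java.nio.ByteBuffer;
import com.google.protobuf.ByteString;

final class ProtoUtilsSketch {
  static ByteString convertToProtoFormat(ByteBuffer byteBuffer) {
    ByteBuffer copy = byteBuffer.duplicate();  // independent position and limit
    copy.rewind();                             // read from the start, original untouched
    return ByteString.copyFrom(copy);          // copies the remaining bytes
  }
}
{code}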

> Duplicate LiteralByteStrings in 
> SystemCredentialsForAppsProto.credentialsForApp_
> 
>
> Key: YARN-7320
> URL: https://issues.apache.org/jira/browse/YARN-7320
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Fix For: 3.0.0
>
> Attachments: YARN-7320.01.addendum.patch, YARN-7320.01.patch, 
> YARN-7320.02.patch
>
>
> Using jxray (www.jxray.com) I've analyzed several heap dumps from YARN 
> Resource Manager running in a big cluster. The tool uncovered several sources 
> of memory waste. One problem, which results in wasting more than a quarter of 
> all memory, is a large number of duplicate {{LiteralByteString}} objects 
> coming from the following reference chain:
> {code}
> 1,011,810K (26.9%): byte[]: 5416705 / 100% dup arrays (22108 unique)
> ↖com.google.protobuf.LiteralByteString.bytes
> ↖org.apache.hadoop.yarn.proto.YarnServerCommonServiceProtos$.credentialsForApp_
> ↖{j.u.ArrayList}
> ↖j.u.Collections$UnmodifiableRandomAccessList.c
> ↖org.apache.hadoop.yarn.proto.YarnServerCommonServiceProtos$NodeHeartbeatResponseProto.systemCredentialsForApps_
> ↖org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.NodeHeartbeatResponsePBImpl.proto
> ↖org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.latestNodeHeartBeatResponse
> ↖org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode.rmNode
> ...
> {code}
> That is, collectively reference chains that look as above hold in memory 5.4 
> million {{LiteralByteString}} objects, but only ~22 thousand of these objects 
> are unique. Deduplicating these objects, e.g. using a Google Object Interner 
> instance, would save ~1GB of memory.
> It looks like the main place where the above {{LiteralByteString}}s are 
> created and attached to the {{SystemCredentialsForAppsProto}} objects is in 
> {{NodeHeartbeatResponsePBImpl.java}}, method 
> {{addSystemCredentialsToProto()}}. Probably adding a call to an interner 
> there will fix the problem. wi 
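
As a sketch of the interning suggestion in the description above (the class, 
field, and method names here are illustrative, not the eventual patch), a Guava 
weak interner collapses the ~5.4 million equal byte strings down to the ~22 
thousand unique values while still letting unused entries be garbage collected:

{code}
import com.google.common.collect.Interner;
import com.google.common.collect.Interners;
import com.google.protobuf.ByteString;

final class CredentialsInternerSketch {
  private static final Interner<ByteString> INTERNER = Interners.newWeakInterner();

  // Return the canonical instance for an equal ByteString, deduplicating storage.
  static ByteString dedup(ByteString credentialsForApp) {
    return INTERNER.intern(credentialsForApp);
  }
}
{code}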



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7389) Make TestResourceManager Scheduler agnostic

2017-10-25 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219753#comment-16219753
 ] 

Robert Kanter commented on YARN-7389:
-

[~subru], yes.  However, it won't work because it relies on some changes 
introduced by YARN-7146, which isn't in branch-2.  Though I guess we could 
backport YARN-7146 to branch-2 as well.

> Make TestResourceManager Scheduler agnostic
> ---
>
> Key: YARN-7389
> URL: https://issues.apache.org/jira/browse/YARN-7389
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Fix For: 3.0.0
>
> Attachments: YARN-7389.001.patch
>
>
> Many of the tests in {{TestResourceManager}} override the scheduler to always 
> be {{CapacityScheduler}}.  However, these tests should be made scheduler 
> agnostic (they are testing the RM, not the scheduler).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7375) NPE in the RM Webapp when HA is enabled and the active RM fails

2017-10-25 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219738#comment-16219738
 ] 

Chandni Singh edited comment on YARN-7375 at 10/25/17 11:43 PM:


Test failures are unrelated to this change


was (Author: csingh):
Test failures are unrelated to this changes

> NPE in the RM Webapp when HA is enabled and the active RM fails
> ---
>
> Key: YARN-7375
> URL: https://issues.apache.org/jira/browse/YARN-7375
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
> Attachments: YARN-7375.001.patch
>
>
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.(AppInfo.java:327)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.(AppInfo.java:133)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createResourceRequestsTable(RMAppAttemptBlock.java:77)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createTablesForAttemptMetrics(RMAppAttemptBlock.java:280)
> at 
> org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:153)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
> at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
> at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
> at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
> at 
> org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.appattempt(RmController.java:58)
> Steps:
> 1. RM HA is enabled
> 2. Started a service 
> 3. Active RM failed. 
> 4. Switched to the Web UI of Standby RM 
> 5. Clicked to view the containers of the previous started application and 
> landed to an error page.
> 6. The NPE mentioned above was found in the standby RM logs



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7397) Reduce lock contention in FairScheduler#getAppWeight()

2017-10-25 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-7397:
--

 Summary: Reduce lock contention in FairScheduler#getAppWeight()
 Key: YARN-7397
 URL: https://issues.apache.org/jira/browse/YARN-7397
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-beta1
Reporter: Daniel Templeton
Assignee: Daniel Templeton


In profiling the fair scheduler, a large amount of time is spent waiting to get 
the lock in {{FairScheduler.getAppWeight()}}, when the lock isn't actually 
needed.  This patch reduces the scope of the lock to eliminate that contention.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7375) NPE in the RM Webapp when HA is enabled and the active RM fails

2017-10-25 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219738#comment-16219738
 ] 

Chandni Singh commented on YARN-7375:
-

Test failures are unrelated to this changes

> NPE in the RM Webapp when HA is enabled and the active RM fails
> ---
>
> Key: YARN-7375
> URL: https://issues.apache.org/jira/browse/YARN-7375
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
> Attachments: YARN-7375.001.patch
>
>
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.(AppInfo.java:327)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo.(AppInfo.java:133)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createResourceRequestsTable(RMAppAttemptBlock.java:77)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMAppAttemptBlock.createTablesForAttemptMetrics(RMAppAttemptBlock.java:280)
> at 
> org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:153)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
> at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
> at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
> at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
> at 
> org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.appattempt(RmController.java:58)
> Steps:
> 1. RM HA is enabled
> 2. Started a service 
> 3. Active RM failed. 
> 4. Switched to the Web UI of Standby RM 
> 5. Clicked to view the containers of the previous started application and 
> landed to an error page.
> 6. The NPE mentioned above was found in the standby RM logs



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7390) All reservation related test cases failed when TestYarnClient runs against Fair Scheduler.

2017-10-25 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219729#comment-16219729
 ] 

Haibo Chen commented on YARN-7390:
--

I see. Was thinking that the failure was caused merely by misconfiguration. +1 
on fixing the real FS root cause.

> All reservation related test cases failed when TestYarnClient runs against 
> Fair Scheduler.
> --
>
> Key: YARN-7390
> URL: https://issues.apache.org/jira/browse/YARN-7390
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, reservation system
>Affects Versions: 2.9.0, 3.0.0, 3.1.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-7390.001.patch
>
>
> All reservation related test cases failed when {{TestYarnClient}} runs 
> against Fair Scheduler. To reproduce it, you need to set scheduler class to 
> Fair Scheduler in yarn-default.xml.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7217) Improve API service usability for updating service spec and state

2017-10-25 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219725#comment-16219725
 ] 

Jian He commented on YARN-7217:
---

Meta questions, as Billie mentioned:
- should Solr and FS be pluggable implementations of a common interface? 
Basically, should it be either the FS or the Solr back-end? Right now both are 
there.

Code comments:
- For the code below, one constructor can call the other to avoid duplication (a 
minimal sketch follows at the end of this review)
{code}
public ApiServer() {
  YARN_CONFIG = new YarnConfiguration();
  try {
if (SERVICE_CLIENT==null) {
  SERVICE_CLIENT = new ServiceClient();
  SERVICE_CLIENT.init(YARN_CONFIG);
  SERVICE_CLIENT.start();
}
  } catch (Exception e) {
LOG.error("Fail to initialize ServiceClient, ApiServer is unavailable.");
  }
}

/**
 * Constructor used by ResourceManager.
 *
 * @param conf
 */
@Inject
public ApiServer(Configuration conf) {
  YARN_CONFIG = conf;
  try {
if (SERVICE_CLIENT==null) {
  SERVICE_CLIENT = new ServiceClient();
  SERVICE_CLIENT.init(YARN_CONFIG);
  SERVICE_CLIENT.start();
}
  } catch (Exception e) {
LOG.error("Fail to initialize ServiceClient, ApiServer is unavailable.");
  }
}

{code}
- getServicesList: it assumes Solr is enabled; if not, it will throw an NPE. I 
think we should check whether Solr is enabled and, if not, throw an exception 
saying that only the Solr backend is supported for this endpoint.
- similarly for the getServiceSpec endpoint: it will throw an NPE because ysc is 
null if Solr is not enabled. 
- updateServiceSpec: it only updates the spec in HDFS and Solr; what is the use 
case of this API? 
- similarly for TestYarnNativeServices#testChangeSpec: as discussed, we won't 
need to restart the entire service to update the spec, so what is the use case 
for this?
- {{/services/{service_name}/state}}: call it status? IMHO, state is something 
like STOPPED or STARTED, whereas status is more like a group of information.
- Should it be: if Solr is enabled, create the solrClient? If Solr is not 
enabled, there is no point in creating the solrClient.
{code}
 addService(yarnClient);
  + createYarnSolrClient(configuration);
 super.serviceInit(configuration);
{code}
- The entire block can be moved inside the isEnabled check; similarly for other places:
{code}
if (UserGroupInformation.isSecurityEnabled()) {
  try {
userName = UserGroupInformation.getCurrentUser().getUserName();
  } catch(KerberosAuthException e) {
userName = null;
  }
}
if (ysc.isEnabled()) {
  try {
ysc.deployApp(service, userName);
  } catch (ServiceUnavailableException e) {
throw new YarnException("Fail to persist service spec.");
  }
}
{code}
- we can create a common method for this, since it is used in so many places
{code}
if (UserGroupInformation.isSecurityEnabled()) {
  try {
userName = UserGroupInformation.getCurrentUser().getUserName();
  } catch(KerberosAuthException e) {
userName = null;
  }
}
{code}
- Should the updateComponent API also update the spec in Solr?
- All service configs are currently in the YarnServiceConf class. I think we can 
put the new configs there so they don't mix with the core YarnConfiguration; 
once the feature and config naming are stable, we can merge them back into 
YarnConfiguration.
- The ApiServerTest class is not used; remove it?
- ServiceClientTest: createYarnSolrClient and serviceInit are the same as in 
the parent class, so there is no need to override them?
- The username parameter is not used in the findAppEntry API at all, but 
deployApp inserts the username, so why is the username required in the first 
place?
- Similarly, the username is not used in deleteApp, so why does the caller need 
to get the username in the first place?
- Could you explain the logic below? It looks like it searches for all entries 
with "id:appName" and the while loop continues until the last one is found, 
returning the last one. Presumably there is only 1 entry, so why is a while 
loop required? If there are multiple entries, why return the last one? 
{code}
QueryResponse response;
try {
  response = solr.query(query);
  Iterator appList = response.getResults()
  .listIterator();
  while (appList.hasNext()) {
SolrDocument d = appList.next();
entry = mapper.readValue(d.get("yarnfile_s").toString(),
Service.class);
found = true;
  }
  if (!found) {
throw new ApplicationNotFoundException("Application entry is " +
"not found: " + appName);
  }
} catch (SolrServerException | IOException e) {
  LOG.error("Error in finding deployed application: " + appName, e);
}
return entry;
{code}
- YarnSolrClient#getEnabled duplicates the isEnabled method. 
- YarnSolrClient#check: if the caller only invokes the solrClient API when Solr 
is enabled, then it won't enter this method in the first place, so the 
check is not 
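
As referenced above, a minimal illustration of the constructor-chaining 
suggestion. The types below are plain-Java placeholders, not the real ApiServer, 
Configuration, or ServiceClient classes.

{code}
class ApiServerSketch {
  private final java.util.Properties config;

  ApiServerSketch() {
    this(new java.util.Properties());   // the no-arg constructor delegates...
  }

  ApiServerSketch(java.util.Properties conf) {
    this.config = conf;
    initServiceClient(conf);            // ...so the shared init lives in one place
  }

  private void initServiceClient(java.util.Properties conf) {
    // start the singleton service client here, exactly once
  }
}
{code}

The same consolidation applies to the repeated "current user name or null" 
lookup quoted above: one shared helper method used by every endpoint.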

[jira] [Created] (YARN-7396) NPE when accessing container logs due to null dirsHandler

2017-10-25 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-7396:
---

 Summary: NPE when accessing container logs due to null dirsHandler
 Key: YARN-7396
 URL: https://issues.apache.org/jira/browse/YARN-7396
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jonathan Hung


{noformat}java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.webapp.NMWebAppFilter.containerLogPageRedirectPath(NMWebAppFilter.java:96)
at 
org.apache.hadoop.yarn.server.nodemanager.webapp.NMWebAppFilter.doFilter(NMWebAppFilter.java:62)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829){noformat}
In YARN-6620 the NMContext creation in {{NodeManager#serviceInit}} was moved. 
It's now created before the dirsHandler is initialized. So when 
{{nmContext.getLocalDirsHandler}} is called, it's null.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4511) Common scheduler changes supporting scheduler-specific implementations

2017-10-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219692#comment-16219692
 ] 

Wangda Tan commented on YARN-4511:
--

[~haibo.chen], 

ContainerUpdateContext#swapContainer 
1) Instead of using an assert, should we throw an exception? Assertions are 
removed at runtime (a minimal sketch follows at the end of this comment).

2) I'm not sure why it needs to acquire the Node and do operations there; the 
original purpose of swapContainers/ContainerUpdateContext is as follows (I 
believe you understand the code, this is just to make sure we have no 
differences here):

a. When a promotion or demotion request comes in (same as an increase/decrease 
container request), ContainerUpdateContext calculates the resource difference 
(for example, promoting a 2G opportunistic container means requesting a 2G 
node-local allocation) and sends the request to the scheduler.
b. The scheduler handles the increase/decrease request, which creates a new 
Container, and the AM pulls it. 
- b.1. If it is an increase request, the RM changes its internal resource 
accounting, including SchedulerNode/Queue/Application, etc.
- b.2. If it is a decrease request, the RM will not change resource accounting 
immediately. Instead, inside ContainerUpdateContext#swapContainer, it sets the 
to-be-released resource in tempContainerToKill, which will be sent to the 
scheduler to release asynchronously.

So my question is: is it possible to avoid the sync lock on SchedulerNode 
inside swapContainer? It looks dangerous and potentially makes the 
implementation complicated. 

I do want to review the other parts of the code (such as SchedulerNode) in one 
shot; however, I found they might be related to the swapContainer implementation.

Please let me know if you want a conf call to make this easier to discuss.
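
A minimal illustration of point 1): an assert is skipped when the JVM runs 
without -ea, so an explicit check keeps the invariant enforced in production. 
The map, condition, and message below are placeholders, not the actual 
swapContainer code.

{code}
import com.google.common.base.Preconditions;
import java.util.Map;

final class SwapCheckSketch {
  static void checkKnownContainer(Map<String, Object> outstandingUpdates,
      String containerId) {
    // Before: assert outstandingUpdates.containsKey(containerId);
    Preconditions.checkState(outstandingUpdates.containsKey(containerId),
        "Unknown container %s in swapContainer", containerId);
  }
}
{code}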

> Common scheduler changes supporting scheduler-specific implementations
> --
>
> Key: YARN-4511
> URL: https://issues.apache.org/jira/browse/YARN-4511
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Haibo Chen
> Attachments: YARN-4511-YARN-1011.00.patch, 
> YARN-4511-YARN-1011.01.patch, YARN-4511-YARN-1011.02.patch, 
> YARN-4511-YARN-1011.03.patch, YARN-4511-YARN-1011.04.patch, 
> YARN-4511-YARN-1011.05.patch, YARN-4511-YARN-1011.06.patch, 
> YARN-4511-YARN-1011.07.patch, YARN-4511-YARN-1011.08.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4827) Document configuration of ReservationSystem for FairScheduler

2017-10-25 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219688#comment-16219688
 ] 

Yufei Gu commented on YARN-4827:


Thanks for the review and commit, [~subru].

> Document configuration of ReservationSystem for FairScheduler
> -
>
> Key: YARN-4827
> URL: https://issues.apache.org/jira/browse/YARN-4827
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Subru Krishnan
>Assignee: Yufei Gu
>Priority: Blocker
> Fix For: 2.9.0, 3.0.0, 3.1.0
>
> Attachments: YARN-4827.001.patch, YARN-4827.002.patch, 
> YARN-4827.003.patch
>
>
> This JIRA tracks the effort to add documentation on how to configure 
> ReservationSystem for FairScheduler



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7390) All reservation related test cases failed when TestYarnClient runs against Fair Scheduler.

2017-10-25 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219687#comment-16219687
 ] 

Subru Krishnan commented on YARN-7390:
--

Thanks [~yufeigu] and [~haibo.chen] for your thoughts. 

I am willing to sacrifice build time for code stability, so I lean towards 
parameterization, but I concur that the best approach will be to initiate a 
discussion thread with the wider community.

Coming back to this JIRA, I want to see the FS test failures fixed rather than 
simply opening up the configuration. Does that make sense? 

> All reservation related test cases failed when TestYarnClient runs against 
> Fair Scheduler.
> --
>
> Key: YARN-7390
> URL: https://issues.apache.org/jira/browse/YARN-7390
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, reservation system
>Affects Versions: 2.9.0, 3.0.0, 3.1.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-7390.001.patch
>
>
> All reservation related test cases failed when {{TestYarnClient}} runs 
> against Fair Scheduler. To reproduce it, you need to set scheduler class to 
> Fair Scheduler in yarn-default.xml.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4687) Document Reservation ACLs

2017-10-25 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-4687:
-
Fix Version/s: 2.9.0

> Document Reservation ACLs
> -
>
> Key: YARN-4687
> URL: https://issues.apache.org/jira/browse/YARN-4687
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sean Po
>Assignee: Sean Po
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: YARN-4687.v1.patch, YARN-4687.v2.patch, 
> YARN-4687.v3.patch
>
>
> YARN-2575 introduces ACLs for ReservationSystem. This JIRA is for adding 
> documentation on how to configure the ACLs for Capacity/Fair schedulers.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4687) Document Reservation ACLs

2017-10-25 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-4687:
-
Target Version/s:   (was: 2.8.0)

> Document Reservation ACLs
> -
>
> Key: YARN-4687
> URL: https://issues.apache.org/jira/browse/YARN-4687
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sean Po
>Assignee: Sean Po
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: YARN-4687.v1.patch, YARN-4687.v2.patch, 
> YARN-4687.v3.patch
>
>
> YARN-2575 introduces ACLs for ReservationSystem. This JIRA is for adding 
> documentation on how to configure the ACLs for Capacity/Fair schedulers.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4859) [Bug] Unable to submit a job to a reservation when using FairScheduler

2017-10-25 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-4859:
-
Target Version/s:   (was: 2.9.0, 3.0.0)

> [Bug] Unable to submit a job to a reservation when using FairScheduler
> --
>
> Key: YARN-4859
> URL: https://issues.apache.org/jira/browse/YARN-4859
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Subru Krishnan
>Assignee: Yufei Gu
>Priority: Blocker
> Fix For: 2.9.0
>
> Attachments: Screen Shot 2017-10-16 at 17.27.48.png
>
>
> Jobs submitted to a reservation get stuck at scheduled stage when using 
> FairScheduler. I came across this when working on YARN-4827 (documentation 
> for configuring ReservationSystem for FairScheduler)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4859) [Bug] Unable to submit a job to a reservation when using FairScheduler

2017-10-25 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-4859:
-
Fix Version/s: 2.9.0

> [Bug] Unable to submit a job to a reservation when using FairScheduler
> --
>
> Key: YARN-4859
> URL: https://issues.apache.org/jira/browse/YARN-4859
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Subru Krishnan
>Assignee: Yufei Gu
>Priority: Blocker
> Fix For: 2.9.0
>
> Attachments: Screen Shot 2017-10-16 at 17.27.48.png
>
>
> Jobs submitted to a reservation get stuck at scheduled stage when using 
> FairScheduler. I came across this when working on YARN-4827 (documentation 
> for configuring ReservationSystem for FairScheduler)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7390) All reservation related test cases failed when TestYarnClient runs against Fair Scheduler.

2017-10-25 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219625#comment-16219625
 ] 

Haibo Chen commented on YARN-7390:
--

Like Yufei said, we have considered parameterizing all unit tests that are 
supposed to be scheduler agnostic vs. just removing code that assumes 
CapacityScheduler. The latter is necessary in either case. The con of the 
former option is a much increased unit test turnaround time, which may raise 
concerns in the community, whereas the latter allows us internally to run all 
unit tests against Fair Scheduler with a simple change of the default 
scheduler. Not sure if we need to have some discussion in the community if we 
were to parameterize all scheduler-agnostic tests. Thoughts?
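
A sketch of what the parameterized option could look like with JUnit 4 (the test 
class, its body, and the use of plain strings for the scheduler classes are 
illustrative assumptions, not an existing test):

{code}
import java.util.Arrays;
import java.util.Collection;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class TestReservationApisParameterized {

  @Parameters(name = "{0}")
  public static Collection<Object[]> schedulers() {
    return Arrays.asList(new Object[][] {
        {"org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler"},
        {"org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler"}
    });
  }

  private final String schedulerClass;

  public TestReservationApisParameterized(String schedulerClass) {
    this.schedulerClass = schedulerClass;
  }

  @Test
  public void testReservationApis() {
    // conf.set("yarn.resourcemanager.scheduler.class", schedulerClass);
    // ... start the cluster or mock RM and exercise the reservation APIs ...
  }
}
{code}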

> All reservation related test cases failed when TestYarnClient runs against 
> Fair Scheduler.
> --
>
> Key: YARN-7390
> URL: https://issues.apache.org/jira/browse/YARN-7390
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, reservation system
>Affects Versions: 2.9.0, 3.0.0, 3.1.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-7390.001.patch
>
>
> All reservation related test cases failed when {{TestYarnClient}} runs 
> against Fair Scheduler. To reproduce it, you need to set scheduler class to 
> Fair Scheduler in yarn-default.xml.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7276) Federation Router Web Service fixes

2017-10-25 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated YARN-7276:
--
Attachment: YARN-7276.011.patch

> Federation Router Web Service fixes
> ---
>
> Key: YARN-7276
> URL: https://issues.apache.org/jira/browse/YARN-7276
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
> Attachments: YARN-7276-branch-2.000.patch, 
> YARN-7276-branch-2.001.patch, YARN-7276-branch-2.002.patch, 
> YARN-7276-branch-2.003.patch, YARN-7276-branch-2.004.patch, 
> YARN-7276.000.patch, YARN-7276.001.patch, YARN-7276.002.patch, 
> YARN-7276.003.patch, YARN-7276.004.patch, YARN-7276.005.patch, 
> YARN-7276.006.patch, YARN-7276.007.patch, YARN-7276.009.patch, 
> YARN-7276.010.patch, YARN-7276.011.patch
>
>
> While testing YARN-3661, I found a few issues with the REST interface in the 
> Router:
> * No support for empty content (error 204)
> * Media type support
> * Attributes in {{FederationInterceptorREST}}
> * Support for empty states and labels
> * DefaultMetricsSystem initialization is missing
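
For illustration, a minimal JAX-RS-style sketch of two of the items above: the 
empty-content (204) handling and the missing DefaultMetricsSystem 
initialization. The class and method names are illustrative and not the actual 
Router code:

{code}
import javax.ws.rs.core.Response;
import javax.ws.rs.core.Response.Status;

import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;

public class RouterWebServiceSketch {

  public void init() {
    // Without this call the Router's metrics are never registered.
    DefaultMetricsSystem.initialize("Router");
  }

  public Response getApps(Object apps) {  // 'apps' stands in for the real AppsInfo
    if (apps == null) {
      // Empty content: answer 204 instead of failing to serialize null.
      return Response.status(Status.NO_CONTENT).build();
    }
    return Response.ok(apps).build();
  }
}
{code}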






[jira] [Commented] (YARN-7276) Federation Router Web Service fixes

2017-10-25 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219621#comment-16219621
 ] 

Íñigo Goiri commented on YARN-7276:
---

Thanks for the comments, [~giovanni.fumarola]; I uploaded 011 addressing them.

> Federation Router Web Service fixes
> ---
>
> Key: YARN-7276
> URL: https://issues.apache.org/jira/browse/YARN-7276
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
> Attachments: YARN-7276-branch-2.000.patch, 
> YARN-7276-branch-2.001.patch, YARN-7276-branch-2.002.patch, 
> YARN-7276-branch-2.003.patch, YARN-7276-branch-2.004.patch, 
> YARN-7276.000.patch, YARN-7276.001.patch, YARN-7276.002.patch, 
> YARN-7276.003.patch, YARN-7276.004.patch, YARN-7276.005.patch, 
> YARN-7276.006.patch, YARN-7276.007.patch, YARN-7276.009.patch, 
> YARN-7276.010.patch, YARN-7276.011.patch
>
>
> While testing YARN-3661, I found a few issues with the REST interface in the 
> Router:
> * No support for empty content (error 204)
> * Media type support
> * Attributes in {{FederationInterceptorREST}}
> * Support for empty states and labels
> * DefaultMetricsSystem initialization is missing






[jira] [Commented] (YARN-7276) Federation Router Web Service fixes

2017-10-25 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219617#comment-16219617
 ] 

Giovanni Matteo Fumarola commented on YARN-7276:


Thanks [~elgoiri] for the patch. It looks good; a couple of nits:
* Typo in {{routerWebService.replaceLabelsOnNode}} should be 
{{routerWebService.replaceLabelsOnNodes}};
* Typo in {{routerWebService.removeFromCluserNodeLabels}} should be 
{{routerWebService.addToClusterNodeLabels}};
* Rename {{testMultiThread}}  --> {{testGetAppsMultiThread}}.

> Federation Router Web Service fixes
> ---
>
> Key: YARN-7276
> URL: https://issues.apache.org/jira/browse/YARN-7276
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
> Attachments: YARN-7276-branch-2.000.patch, 
> YARN-7276-branch-2.001.patch, YARN-7276-branch-2.002.patch, 
> YARN-7276-branch-2.003.patch, YARN-7276-branch-2.004.patch, 
> YARN-7276.000.patch, YARN-7276.001.patch, YARN-7276.002.patch, 
> YARN-7276.003.patch, YARN-7276.004.patch, YARN-7276.005.patch, 
> YARN-7276.006.patch, YARN-7276.007.patch, YARN-7276.009.patch, 
> YARN-7276.010.patch
>
>
> While testing YARN-3661, I found a few issues with the REST interface in the 
> Router:
> * No support for empty content (error 204)
> * Media type support
> * Attributes in {{FederationInterceptorREST}}
> * Support for empty states and labels
> * DefaultMetricsSystem initialization is missing






[jira] [Comment Edited] (YARN-7390) All reservation related test cases failed when TestYarnClient runs against Fair Scheduler.

2017-10-25 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219584#comment-16219584
 ] 

Yufei Gu edited comment on YARN-7390 at 10/25/17 9:42 PM:
--

Yeah, I agree. I actually pointed out the same thing when I reviewed another 
unit test failure (YARN-7308). The reason I chose the second option is to be 
consistent with the approach we took recently: you can see we made a bunch of 
unit tests scheduler agnostic. Parameterization may be a better solution here, 
since the reservation system keeps changing and is not well tested with FS. I'll 
post a parameterized version soon.


was (Author: yufeigu):
Yeah, I agreed. I actually pointed out the same thing when I review another 
unit test failure(YARN-7308). The reason I chose the second option is to be 
consistent with the approach we did recently. You can see we make a bunch of 
unit test scheduler agnostic. maybe a better solution here since reservation 
system keep changing and not well tested in FS. I'll post parameterization 
version soon.

> All reservation related test cases failed when TestYarnClient runs against 
> Fair Scheduler.
> --
>
> Key: YARN-7390
> URL: https://issues.apache.org/jira/browse/YARN-7390
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, reservation system
>Affects Versions: 2.9.0, 3.0.0, 3.1.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-7390.001.patch
>
>
> All reservation related test cases failed when {{TestYarnClient}} runs 
> against Fair Scheduler. To reproduce it, you need to set scheduler class to 
> Fair Scheduler in yarn-default.xml.






[jira] [Commented] (YARN-7390) All reservation related test cases failed when TestYarnClient runs against Fair Scheduler.

2017-10-25 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219584#comment-16219584
 ] 

Yufei Gu commented on YARN-7390:


Yeah, I agree. I actually pointed out the same thing when I reviewed another 
unit test failure (YARN-7308). The reason I chose the second option is to be 
consistent with the approach we took recently: you can see we made a bunch of 
unit tests scheduler agnostic. Parameterization may be a better solution here, 
since the reservation system keeps changing and is not well tested with FS. I'll 
post a parameterized version soon.

> All reservation related test cases failed when TestYarnClient runs against 
> Fair Scheduler.
> --
>
> Key: YARN-7390
> URL: https://issues.apache.org/jira/browse/YARN-7390
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, reservation system
>Affects Versions: 2.9.0, 3.0.0, 3.1.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-7390.001.patch
>
>
> All reservation related test cases failed when {{TestYarnClient}} runs 
> against Fair Scheduler. To reproduce it, you need to set scheduler class to 
> Fair Scheduler in yarn-default.xml.






[jira] [Comment Edited] (YARN-7332) Compute effectiveCapacity per each resource vector

2017-10-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219570#comment-16219570
 ] 

Wangda Tan edited comment on YARN-7332 at 10/25/17 9:37 PM:


Thanks [~sunilg] for updating the patch, could you add one more test case to 
test what's the effectiveResource when clusterResource == (10,40)?


was (Author: leftnoteasy):
Thanks [~sunilg] for updating the patch, could you add one more test case to 
test what's the effectiveResource when clusterResource == (10,40)?

> Compute effectiveCapacity per each resource vector
> --
>
> Key: YARN-7332
> URL: https://issues.apache.org/jira/browse/YARN-7332
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: YARN-5881
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: YARN-7332.YARN-5881.001.patch, 
> YARN-7332.YARN-5881.002.patch
>
>
> Currently effective capacity uses a generalized approach based on dominance. 
> Hence some vectors may not be calculated correctly. 






[jira] [Commented] (YARN-7332) Compute effectiveCapacity per each resource vector

2017-10-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219570#comment-16219570
 ] 

Wangda Tan commented on YARN-7332:
--

Thanks [~sunilg] for updating the patch, could you add one more test case to 
test what's the effectiveResource when clusterResource == (10,40)?
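
For illustration, a small worked example of the per-vector computation such a 
test would check. The 50% queue capacity is an assumption for the example, not a 
value from the patch:

{code}
public class EffectiveCapacitySketch {
  public static void main(String[] args) {
    long[] clusterResource = {10, 40};  // the <10, 40> cluster from the comment
    double queueCapacity = 0.5;         // assumed 50% configured capacity

    long[] effective = new long[clusterResource.length];
    for (int i = 0; i < clusterResource.length; i++) {
      // Per-vector: apply the queue capacity to each resource type separately
      // instead of deriving both values from the dominant resource's ratio.
      effective[i] = (long) (clusterResource[i] * queueCapacity);
    }
    System.out.println(java.util.Arrays.toString(effective));  // prints [5, 20]
  }
}
{code}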

> Compute effectiveCapacity per each resource vector
> --
>
> Key: YARN-7332
> URL: https://issues.apache.org/jira/browse/YARN-7332
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: YARN-5881
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: YARN-7332.YARN-5881.001.patch, 
> YARN-7332.YARN-5881.002.patch
>
>
> Currently effective capacity uses a generalized approach based on dominance. 
> Hence some vectors may not be calculated correctly. 






[jira] [Commented] (YARN-7390) All reservation related test cases failed when TestYarnClient runs against Fair Scheduler.

2017-10-25 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219547#comment-16219547
 ] 

Subru Krishnan commented on YARN-7390:
--

[~yufeigu], thanks for the clarification. I prefer parameterization so that we 
can avoid similar situations in the future, but I am fine if you choose to make 
it configurable for now.

More importantly, shouldn't the fix for the test case failures be included? 
Otherwise the patch is not very relevant, right?

> All reservation related test cases failed when TestYarnClient runs against 
> Fair Scheduler.
> --
>
> Key: YARN-7390
> URL: https://issues.apache.org/jira/browse/YARN-7390
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, reservation system
>Affects Versions: 2.9.0, 3.0.0, 3.1.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-7390.001.patch
>
>
> All reservation related test cases failed when {{TestYarnClient}} runs 
> against Fair Scheduler. To reproduce it, you need to set scheduler class to 
> Fair Scheduler in yarn-default.xml.






[jira] [Comment Edited] (YARN-7390) All reservation related test cases failed when TestYarnClient runs against Fair Scheduler.

2017-10-25 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219537#comment-16219537
 ] 

Yufei Gu edited comment on YARN-7390 at 10/25/17 9:21 PM:
--

Thanks for the review, [~subru]. You need to set the scheduler class to Fair 
Scheduler manually in yarn-default.xml to see the failures. There are two 
options in terms of a solution. One is to make this test parameterized, so that 
it tests both FS and CS; one concern is that the unit test runtime would be 
pretty long if we do this for each test. The other option is to make it 
scheduler agnostic: you apply a different configuration depending on which 
scheduler you care about. I chose the second in this patch.


was (Author: yufeigu):
Thanks for the review, [~subru]. You need to set scheduler class to Fair 
Scheduler manually in yarn-default.xml to see the failures. There are two 
options. One is to make this test parameterized, which test both FS and CS. One 
of the concern is that the unit test runtime would be pretty long if we do this 
for each test. The other option is to make it scheduler agnostic. You can do 
the different configuration depending on what scheduler you care. I chose the 
second in this patch.

> All reservation related test cases failed when TestYarnClient runs against 
> Fair Scheduler.
> --
>
> Key: YARN-7390
> URL: https://issues.apache.org/jira/browse/YARN-7390
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, reservation system
>Affects Versions: 2.9.0, 3.0.0, 3.1.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-7390.001.patch
>
>
> All reservation related test cases failed when {{TestYarnClient}} runs 
> against Fair Scheduler. To reproduce it, you need to set scheduler class to 
> Fair Scheduler in yarn-default.xml.






[jira] [Comment Edited] (YARN-7395) NM fails to successfully kill tasks that run over their memory limit

2017-10-25 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219531#comment-16219531
 ] 

Eric Badger edited comment on YARN-7395 at 10/25/17 9:21 PM:
-

I'm seeing this on an internal 2.8 build of Hadoop with everything underneath 
YARN-3611 pulled back, so it is possible that this bug is exclusive to that 
environment and not present in 2.9 or 3.0. However, the resource monitoring 
seems to be working correctly; it's the docker kill portion that isn't 
successful. That's why I believe this is an issue across 2.9 and 3.0.


was (Author: ebadger):
I'm seeing this on an internal 2.8 build of hadoop with everything underneath 
YARN-3611 pulled back. So it is possible that this bug is exclusive to that 
environment and not present in 2.9 or 3.0. 

> NM fails to successfully kill tasks that run over their memory limit
> 
>
> Key: YARN-7395
> URL: https://issues.apache.org/jira/browse/YARN-7395
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>
> The NM correctly notes that the container is over its configured limit, but 
> then fails to successfully kill the process. So the Docker container AM stays 
> around and the job keeps running






[jira] [Commented] (YARN-7390) All reservation related test cases failed when TestYarnClient runs against Fair Scheduler.

2017-10-25 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219537#comment-16219537
 ] 

Yufei Gu commented on YARN-7390:


Thanks for the review, [~subru]. You need to set the scheduler class to Fair 
Scheduler manually in yarn-default.xml to see the failures. There are two 
options. One is to make this test parameterized, so that it tests both FS and 
CS; one concern is that the unit test runtime would be pretty long if we do this 
for each test. The other option is to make it scheduler agnostic: you apply a 
different configuration depending on which scheduler you care about. I chose the 
second in this patch.
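
For illustration, a minimal sketch of the scheduler-agnostic option: override 
the scheduler class in the test configuration instead of editing 
yarn-default.xml, and branch the setup on the scheduler in use. The helper class 
and the branch contents are illustrative, not the posted patch:

{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler;

public class SchedulerAgnosticTestConf {
  public static YarnConfiguration create(Class<?> scheduler) {
    YarnConfiguration conf = new YarnConfiguration();
    // Equivalent to setting yarn.resourcemanager.scheduler.class in the XML.
    conf.set(YarnConfiguration.RM_SCHEDULER, scheduler.getName());
    if (FairScheduler.class.equals(scheduler)) {
      // FS-specific setup goes here, e.g. pointing at an allocation file
      // that defines a reservable queue.
    } else {
      // CS-specific setup goes here, e.g. marking a CapacityScheduler
      // queue as reservable.
    }
    return conf;
  }
}
{code}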

> All reservation related test cases failed when TestYarnClient runs against 
> Fair Scheduler.
> --
>
> Key: YARN-7390
> URL: https://issues.apache.org/jira/browse/YARN-7390
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, reservation system
>Affects Versions: 2.9.0, 3.0.0, 3.1.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-7390.001.patch
>
>
> All reservation related test cases failed when {{TestYarnClient}} runs 
> against Fair Scheduler. To reproduce it, you need to set scheduler class to 
> Fair Scheduler in yarn-default.xml.






[jira] [Commented] (YARN-7395) NM fails to successfully kill tasks that run over their memory limit

2017-10-25 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219536#comment-16219536
 ] 

Eric Badger commented on YARN-7395:
---

Here's the relevant lines from the NM log 
{noformat}
2017-10-25 20:03:07,549 [Container Monitor] WARN monitor.ContainersMonitorImpl: 
Process tree for container: container_e126_1508911755032_0004_02_01 has 
processes older than 1 iteration running over the configured limit. 
Limit=536870912, current usage = 585281536
2017-10-25 20:03:07,551 [Container Monitor] WARN monitor.ContainersMonitorImpl: 
Container [pid=29030,containerID=container_e126_1508911755032_0004_02_01] 
is running beyond physical memory limits. Current usage: 558.2 MB of 512 MB 
physical memory used; 2.8 GB of 1.0 GB virtual memory used. Killing container.
Dump of the process-tree for container_e126_1508911755032_0004_02_01 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 29065 29030 29030 29030 (java) 6022 290 2962636800 142606 /bin/java 
-Djava.io.tmpdir=/tmp/yarn-local/usercache/ebadger/appcache/application_1508911755032_0004/container_e126_1508911755032_0004_02_01/tmp
 -Dlog4j.configuration=container-log4j.properties 
-Dyarn.app.container.log.dir=/tmp/yarn-logs/application_1508911755032_0004/container_e126_1508911755032_0004_02_01
 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA 
-Dhadoop.root.logfile=syslog 
-XX:ErrorFile=/tmp/yarn-logs/application_1508911755032_0004/container_e126_1508911755032_0004_02_01/hs_err_pid%p.log
 -XX:GCTimeLimit=50 -XX:ParallelGCThreads=4 -XX:NewRatio=8 
-Djava.net.preferIPv4Stack=true -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
-Xloggc:/tmp/yarn-logs/application_1508911755032_0004/container_e126_1508911755032_0004_02_01/gc.log
 -Xmx1024m -XX:NewRatio=8 -Djava.net.preferIPv4Stack=true 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster 
|- 29030 29014 29030 29030 (bash) 3 2 9474048 285 /bin/bash -c 
/bin/java 
-Djava.io.tmpdir=/tmp/yarn-local/usercache/ebadger/appcache/application_1508911755032_0004/container_e126_1508911755032_0004_02_01/tmp
 -Dlog4j.configuration=container-log4j.properties 
-Dyarn.app.container.log.dir=/tmp/yarn-logs/application_1508911755032_0004/container_e126_1508911755032_0004_02_01
 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA 
-Dhadoop.root.logfile=syslog 
-XX:ErrorFile=/tmp/yarn-logs/application_1508911755032_0004/container_e126_1508911755032_0004_02_01/hs_err_pid%p.log
 -XX:GCTimeLimit=50 -XX:ParallelGCThreads=4 -XX:NewRatio=8 
-Djava.net.preferIPv4Stack=true -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
-Xloggc:/tmp/yarn-logs/application_1508911755032_0004/container_e126_1508911755032_0004_02_01/gc.log
 -Xmx1024m -XX:NewRatio=8 -Djava.net.preferIPv4Stack=true 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster 
1>/tmp/yarn-logs/application_1508911755032_0004/container_e126_1508911755032_0004_02_01/stdout
 
2>/tmp/yarn-logs/application_1508911755032_0004/container_e126_1508911755032_0004_02_01/stderr
  

2017-10-25 20:03:07,551 [Container Monitor] INFO monitor.ContainersMonitorImpl: 
Removed ProcessTree with root 29030
2017-10-25 20:03:07,551 [AsyncDispatcher event handler] INFO 
container.ContainerImpl: Container container_e126_1508911755032_0004_02_01 
transitioned from RUNNING to KILLING
2017-10-25 20:03:07,552 [AsyncDispatcher event handler] INFO 
launcher.ContainerLaunch: Cleaning up container 
container_e126_1508911755032_0004_02_01
2017-10-25 20:03:07,576 [AsyncDispatcher event handler] WARN 
nodemanager.LinuxContainerExecutor: Error in signalling container 29030 with 
SIGTERM; exit = 1
org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException:
 Signal container failed
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.signalContainer(DockerLinuxContainerRuntime.java:615)
at 
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:510)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:473)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:140)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:56)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
at java.lang.Thread.run(Thread.java:745)
2017-10-25 20:03:07,576 [AsyncDispatcher event handler] INFO 
nodemanager.ContainerExecutor: Using command stop 
{noformat}

[jira] [Commented] (YARN-7395) NM fails to successfully kill tasks that run over their memory limit

2017-10-25 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219531#comment-16219531
 ] 

Eric Badger commented on YARN-7395:
---

I'm seeing this on an internal 2.8 build of hadoop with everything underneath 
YARN-3611 pulled back. So it is possible that this bug is exclusive to that 
environment and not present in 2.9 or 3.0. 

> NM fails to successfully kill tasks that run over their memory limit
> 
>
> Key: YARN-7395
> URL: https://issues.apache.org/jira/browse/YARN-7395
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>
> The NM correctly notes that the container is over its configured limit, but 
> then fails to successfully kill the process. So the Docker container AM stays 
> around and the job keeps running






[jira] [Created] (YARN-7395) NM fails to successfully kill tasks that run over their memory limit

2017-10-25 Thread Eric Badger (JIRA)
Eric Badger created YARN-7395:
-

 Summary: NM fails to successfully kill tasks that run over their 
memory limit
 Key: YARN-7395
 URL: https://issues.apache.org/jira/browse/YARN-7395
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Eric Badger


The NM correctly notes that the container is over its configured limit, but 
then fails to successfully kill the process. So the Docker container AM stays 
around and the job keeps running
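
For illustration, a generic sketch of the kind of verify-and-escalate step whose 
absence leaves the AM running: check whether the container is still up after 
SIGTERM and fall back to a plain docker kill (SIGKILL). This is the docker CLI 
driven through ProcessBuilder, not the NodeManager's actual 
DockerLinuxContainerRuntime code path:

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class DockerKillCheck {

  static boolean isRunning(String container) throws Exception {
    Process p = new ProcessBuilder(
        "docker", "inspect", "-f", "{{.State.Running}}", container).start();
    try (BufferedReader r = new BufferedReader(
        new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
      String out = r.readLine();
      return p.waitFor() == 0 && out != null && Boolean.parseBoolean(out.trim());
    }
  }

  public static void main(String[] args) throws Exception {
    String container = args[0];  // e.g. container_e126_1508911755032_0004_02_01
    new ProcessBuilder("docker", "kill", "--signal=TERM", container)
        .inheritIO().start().waitFor();
    if (isRunning(container)) {
      // SIGTERM did not take effect; escalate so the AM cannot linger.
      new ProcessBuilder("docker", "kill", container)
          .inheritIO().start().waitFor();
    }
  }
}
{code}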






[jira] [Commented] (YARN-7339) LocalityMulticastAMRMProxyPolicy should handle cancel request properly

2017-10-25 Thread Botong Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219521#comment-16219521
 ] 

Botong Huang commented on YARN-7339:


Thanks [~curino] for the review and commit!

> LocalityMulticastAMRMProxyPolicy should handle cancel request properly
> --
>
> Key: YARN-7339
> URL: https://issues.apache.org/jira/browse/YARN-7339
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
> Fix For: 2.9.0, 3.0
>
> Attachments: YARN-7339-branch-2.v6.patch, YARN-7339-v1.patch, 
> YARN-7339-v2.patch, YARN-7339-v3.patch, YARN-7339-v4.patch, 
> YARN-7339-v5.patch, YARN-7339-v6.patch
>
>
> Currently inside AMRMProxy, LocalityMulticastAMRMProxyPolicy is not handling 
> and splitting cancel requests from AM properly: 
> # For node cancel request, we should not treat it as a localized resource 
> request. Otherwise it can lead to all weight zero issue when computing 
> localized resource weight. 
> # For ANY cancel, we should broadcast to all known subclusters, not just the 
> ones associated with localized resources. 
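
For illustration, a condensed sketch of the two splitting rules described above. 
The helper calls are illustrative placeholders, not the actual 
LocalityMulticastAMRMProxyPolicy code:

{code}
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class CancelRoutingSketch {

  // A cancel is a ResourceRequest asking for zero containers.
  boolean isCancel(ResourceRequest rr) {
    return rr.getNumContainers() == 0;
  }

  void route(ResourceRequest rr) {
    if (!isCancel(rr)) {
      // Normal request: weight sub-clusters by locality as usual.
      return;
    }
    if (ResourceRequest.ANY.equals(rr.getResourceName())) {
      // Rule 2: an ANY cancel must reach every known sub-cluster,
      // not only the ones that received localized requests.
      // broadcastToAllKnownSubClusters(rr);   // illustrative helper
    } else {
      // Rule 1: a node/rack cancel must not feed the locality weights,
      // otherwise all weights can end up zero.
      // forwardWithoutUpdatingWeights(rr);    // illustrative helper
    }
  }
}
{code}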






[jira] [Comment Edited] (YARN-7390) All reservation related test cases failed when TestYarnClient runs against Fair Scheduler.

2017-10-25 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219439#comment-16219439
 ] 

Subru Krishnan edited comment on YARN-7390 at 10/25/17 9:02 PM:


Thanks [~yufeigu] for working on this. I had a question: is this an issue of the 
tests not being parameterized to run with FS (which you have fixed in the 
patch), or are they failing when executed manually?


was (Author: subru):
Thanks [~yufeigu] for the quick fix. The patch LGTM, can you take a quick look 
at the checkstyle warnings?

> All reservation related test cases failed when TestYarnClient runs against 
> Fair Scheduler.
> --
>
> Key: YARN-7390
> URL: https://issues.apache.org/jira/browse/YARN-7390
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, reservation system
>Affects Versions: 2.9.0, 3.0.0, 3.1.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-7390.001.patch
>
>
> All reservation related test cases failed when {{TestYarnClient}} runs 
> against Fair Scheduler. To reproduce it, you need to set scheduler class to 
> Fair Scheduler in yarn-default.xml.






[jira] [Commented] (YARN-7339) LocalityMulticastAMRMProxyPolicy should handle cancel request properly

2017-10-25 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219500#comment-16219500
 ] 

Carlo Curino commented on YARN-7339:


Thanks [~botong], committed this to branch-2; closing the JIRA.

> LocalityMulticastAMRMProxyPolicy should handle cancel request properly
> --
>
> Key: YARN-7339
> URL: https://issues.apache.org/jira/browse/YARN-7339
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
> Fix For: 2.9.0, 3.0
>
> Attachments: YARN-7339-branch-2.v6.patch, YARN-7339-v1.patch, 
> YARN-7339-v2.patch, YARN-7339-v3.patch, YARN-7339-v4.patch, 
> YARN-7339-v5.patch, YARN-7339-v6.patch
>
>
> Currently inside AMRMProxy, LocalityMulticastAMRMProxyPolicy is not handling 
> and splitting cancel requests from AM properly: 
> # For node cancel request, we should not treat it as a localized resource 
> request. Otherwise it can lead to all weight zero issue when computing 
> localized resource weight. 
> # For ANY cancel, we should broadcast to all known subclusters, not just the 
> ones associated with localized resources. 






[jira] [Updated] (YARN-7339) LocalityMulticastAMRMProxyPolicy should handle cancel request properly

2017-10-25 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-7339:
---
Fix Version/s: 2.9.0

> LocalityMulticastAMRMProxyPolicy should handle cancel request properly
> --
>
> Key: YARN-7339
> URL: https://issues.apache.org/jira/browse/YARN-7339
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
> Fix For: 2.9.0, 3.0
>
> Attachments: YARN-7339-branch-2.v6.patch, YARN-7339-v1.patch, 
> YARN-7339-v2.patch, YARN-7339-v3.patch, YARN-7339-v4.patch, 
> YARN-7339-v5.patch, YARN-7339-v6.patch
>
>
> Currently inside AMRMProxy, LocalityMulticastAMRMProxyPolicy is not handling 
> and splitting cancel requests from AM properly: 
> # For node cancel request, we should not treat it as a localized resource 
> request. Otherwise it can lead to all weight zero issue when computing 
> localized resource weight. 
> # For ANY cancel, we should broadcast to all known subclusters, not just the 
> ones associated with localized resources. 






[jira] [Created] (YARN-7394) Merge code paths for Reservation/Plan queues and Auto Created queues

2017-10-25 Thread Suma Shivaprasad (JIRA)
Suma Shivaprasad created YARN-7394:
--

 Summary: Merge code paths for Reservation/Plan queues and Auto 
Created queues
 Key: YARN-7394
 URL: https://issues.apache.org/jira/browse/YARN-7394
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Suma Shivaprasad
Assignee: Suma Shivaprasad


The initialization/reinitialization logic for ReservationQueue and AutoCreated 
Leaf queues are similar. The proposal is to rename ReservationQueue to a more 
generic name AutoCreatedLeafQueue which are either managed by PlanQueue(already 
exists) or AutoCreatedEnabledParentQueue (new class). 
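
For illustration, a skeletal sketch of the naming proposed above. The abstract 
base class name and method signatures are made up for the sketch; the actual 
patch may shape the hierarchy differently:

{code}
// Shared init/reinit logic for parent queues that create leaf queues on the fly.
abstract class AbstractManagedParentQueue /* extends ParentQueue */ {
  abstract void addChildQueue(AutoCreatedLeafQueue leaf) throws Exception;
}

// Already exists: the reservation system's plan queue.
class PlanQueue extends AbstractManagedParentQueue {
  void addChildQueue(AutoCreatedLeafQueue leaf) { /* ... */ }
}

// New class: a parent with auto-creation of leaf queues enabled.
class AutoCreatedEnabledParentQueue extends AbstractManagedParentQueue {
  void addChildQueue(AutoCreatedLeafQueue leaf) { /* ... */ }
}

// Formerly ReservationQueue: a leaf whose capacity is dictated by its parent.
class AutoCreatedLeafQueue /* extends LeafQueue */ {
}
{code}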








[jira] [Updated] (YARN-7190) Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user classpath

2017-10-25 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-7190:
---
Attachment: YARN-7190.01.patch

> Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user 
> classpath
> 
>
> Key: YARN-7190
> URL: https://issues.apache.org/jira/browse/YARN-7190
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineclient, timelinereader, timelineserver
>Reporter: Vrushali C
>Assignee: Varun Saxena
> Fix For: 2.9.0, YARN-5355_branch2
>
> Attachments: YARN-7190-YARN-5355_branch2.01.patch, 
> YARN-7190-YARN-5355_branch2.02.patch, YARN-7190-YARN-5355_branch2.03.patch, 
> YARN-7190.01.patch
>
>
> [~jlowe] had a good observation about the user classpath getting extra jars 
> in Hadoop 2.x brought in with TSv2. If users start picking up Hadoop 2.x's 
> version of HBase jars instead of the ones they shipped with their job, it 
> could be a problem.
> So when TSv2 is to be used in 2.x, the HBase-related jars should come into 
> only the NM classpath, not the user classpath.
> Here is a list of some jars
> {code}
> commons-csv-1.0.jar
> commons-el-1.0.jar
> commons-httpclient-3.1.jar
> disruptor-3.3.0.jar
> findbugs-annotations-1.3.9-1.jar
> hbase-annotations-1.2.6.jar
> hbase-client-1.2.6.jar
> hbase-common-1.2.6.jar
> hbase-hadoop2-compat-1.2.6.jar
> hbase-hadoop-compat-1.2.6.jar
> hbase-prefix-tree-1.2.6.jar
> hbase-procedure-1.2.6.jar
> hbase-protocol-1.2.6.jar
> hbase-server-1.2.6.jar
> htrace-core-3.1.0-incubating.jar
> jamon-runtime-2.4.1.jar
> jasper-compiler-5.5.23.jar
> jasper-runtime-5.5.23.jar
> jcodings-1.0.8.jar
> joni-2.1.2.jar
> jsp-2.1-6.1.14.jar
> jsp-api-2.1-6.1.14.jar
> jsr311-api-1.1.1.jar
> metrics-core-2.2.0.jar
> servlet-api-2.5-6.1.14.jar
> {code}
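
For illustration, until such a split is in place a job that ships its own HBase 
can shield itself by preferring its own jars. This is a user-side workaround 
using standard MapReduce properties, not the NM classpath fix proposed here:

{code}
import org.apache.hadoop.conf.Configuration;

public class PreferUserJars {
  public static Configuration apply(Configuration conf) {
    // Prefer the jars shipped with the job over the cluster-provided ones.
    conf.setBoolean("mapreduce.job.user.classpath.first", true);
    // Alternatively, isolate the job in its own classloader.
    conf.setBoolean("mapreduce.job.classloader", true);
    return conf;
  }
}
{code}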






[jira] [Commented] (YARN-7388) TestAMRestart should be scheduler agnostic

2017-10-25 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219441#comment-16219441
 ] 

Subru Krishnan commented on YARN-7388:
--

[~haibo.chen], do you plan to include this in 2.9.0 as well, since it looks 
relevant?

> TestAMRestart should be scheduler agnostic
> --
>
> Key: YARN-7388
> URL: https://issues.apache.org/jira/browse/YARN-7388
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: YARN-7388.00.patch
>
>







[jira] [Commented] (YARN-7390) All reservation related test cases failed when TestYarnClient runs against Fair Scheduler.

2017-10-25 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219439#comment-16219439
 ] 

Subru Krishnan commented on YARN-7390:
--

Thanks [~yufeigu] for the quick fix. The patch LGTM, can you take a quick look 
at the checkstyle warnings?

> All reservation related test cases failed when TestYarnClient runs against 
> Fair Scheduler.
> --
>
> Key: YARN-7390
> URL: https://issues.apache.org/jira/browse/YARN-7390
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, reservation system
>Affects Versions: 2.9.0, 3.0.0, 3.1.0
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-7390.001.patch
>
>
> All reservation related test cases failed when {{TestYarnClient}} runs 
> against Fair Scheduler. To reproduce it, you need to set scheduler class to 
> Fair Scheduler in yarn-default.xml.






[jira] [Updated] (YARN-5516) Add REST API for periodicity

2017-10-25 Thread Sean Po (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Po updated YARN-5516:
--
Attachment: YARN-5516.v006.patch

Thanks [~subru] for the comments. The latest patch changes the dates in the 
InMemoryPlan tests to reference 2050 instead of 2017. 

> Add REST API for periodicity
> 
>
> Key: YARN-5516
> URL: https://issues.apache.org/jira/browse/YARN-5516
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sangeetha Abdu Jyothi
>Assignee: Sean Po
> Attachments: YARN-5516.v001.patch, YARN-5516.v002.patch, 
> YARN-5516.v003.patch, YARN-5516.v004.patch, YARN-5516.v005.patch, 
> YARN-5516.v006.patch
>
>
> YARN-5516 changing REST API of the reservation system to support periodicity. 






[jira] [Commented] (YARN-7389) Make TestResourceManager Scheduler agnostic

2017-10-25 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219435#comment-16219435
 ] 

Subru Krishnan commented on YARN-7389:
--

[~rkanter]/[~haibo.chen], does it make sense to include this in branch-2 as 
well?

> Make TestResourceManager Scheduler agnostic
> ---
>
> Key: YARN-7389
> URL: https://issues.apache.org/jira/browse/YARN-7389
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Fix For: 3.0.0
>
> Attachments: YARN-7389.001.patch
>
>
> Many of the tests in {{TestResourceManager}} override the scheduler to always 
> be {{CapacityScheduler}}.  However, these tests should be made scheduler 
> agnostic (they are testing the RM, not the scheduler).






[jira] [Commented] (YARN-7351) Fix high CPU usage issue in RegistryDNS

2017-10-25 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219436#comment-16219436
 ] 

Jian He commented on YARN-7351:
---

YARN-7393 has been opened for the TCP issue.

> Fix high CPU usage issue in RegistryDNS
> ---
>
> Key: YARN-7351
> URL: https://issues.apache.org/jira/browse/YARN-7351
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-7351.yarn-native-services.01.patch, 
> YARN-7351.yarn-native-services.02.patch, 
> YARN-7351.yarn-native-services.03.patch, 
> YARN-7351.yarn-native-services.03.patch
>
>
> Thanks [~aw] for finding this issue.
> The current RegistryDNS implementation is always running on high CPU and 
> pretty much eats one core. 
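
For illustration, one classic cause of this symptom in NIO-based services is a 
select loop that never blocks. The sketch below contrasts the busy-wait form 
with the blocking form; it is illustrative only and not necessarily what the 
attached patch changes:

{code}
import java.nio.channels.Selector;

public class SelectorLoopSketch {
  public void serve(Selector selector) throws Exception {
    while (!Thread.currentThread().isInterrupted()) {
      // Busy-wait variant: selectNow() returns immediately, so an idle
      // service spins and pins a core.
      //   int n = selector.selectNow();

      // Blocking variant: waits in the kernel until a channel is ready.
      int n = selector.select();
      if (n == 0) {
        continue;
      }
      // ... iterate selector.selectedKeys() and handle the ready channels ...
      selector.selectedKeys().clear();
    }
  }
}
{code}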






[jira] [Commented] (YARN-5516) Add REST API for periodicity

2017-10-25 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219424#comment-16219424
 ] 

Subru Krishnan commented on YARN-5516:
--

Thanks [~seanpo03] for addressing my comments and for diligently tracking down 
the tough corner case bugs. The latest patch LGTM, with one caveat: let's 
update the date to 2050 to be future-proof :).

> Add REST API for periodicity
> 
>
> Key: YARN-5516
> URL: https://issues.apache.org/jira/browse/YARN-5516
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sangeetha Abdu Jyothi
>Assignee: Sean Po
> Attachments: YARN-5516.v001.patch, YARN-5516.v002.patch, 
> YARN-5516.v003.patch, YARN-5516.v004.patch, YARN-5516.v005.patch
>
>
> YARN-5516 changing REST API of the reservation system to support periodicity. 






[jira] [Commented] (YARN-7351) Fix high CPU usage issue in RegistryDNS

2017-10-25 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219425#comment-16219425
 ] 

Billie Rinaldi commented on YARN-7351:
--

This fixes the issue for me. I'll commit this fix for the CPU issue and we can 
open another ticket for fixing the TCP channel.

> Fix high CPU usage issue in RegistryDNS
> ---
>
> Key: YARN-7351
> URL: https://issues.apache.org/jira/browse/YARN-7351
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-7351.yarn-native-services.01.patch, 
> YARN-7351.yarn-native-services.02.patch, 
> YARN-7351.yarn-native-services.03.patch, 
> YARN-7351.yarn-native-services.03.patch
>
>
> Thanks [~aw] for finding this issue.
> The current RegistryDNS implementation is always running on high CPU and 
> pretty much eats one core. 






[jira] [Created] (YARN-7393) RegistryDNS doesn't work in tcp channel

2017-10-25 Thread Jian He (JIRA)
Jian He created YARN-7393:
-

 Summary: RegistryDNS doesn't work in tcp channel
 Key: YARN-7393
 URL: https://issues.apache.org/jira/browse/YARN-7393
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He









[jira] [Updated] (YARN-7351) Fix high CPU usage issue in RegistryDNS

2017-10-25 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-7351:
-
Summary: Fix high CPU usage issue in RegistryDNS  (was: High CPU usage 
issue in RegistryDNS)

> Fix high CPU usage issue in RegistryDNS
> ---
>
> Key: YARN-7351
> URL: https://issues.apache.org/jira/browse/YARN-7351
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-7351.yarn-native-services.01.patch, 
> YARN-7351.yarn-native-services.02.patch, 
> YARN-7351.yarn-native-services.03.patch, 
> YARN-7351.yarn-native-services.03.patch
>
>
> Thanks [~aw] for finding this issue.
> The current RegistryDNS implementation is always running on high CPU and 
> pretty much eats one core. 






[jira] [Updated] (YARN-7393) RegistryDNS doesn't work in tcp channel

2017-10-25 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-7393:
--
Parent Issue: YARN-7054  (was: YARN-5079)

> RegistryDNS doesn't work in tcp channel
> ---
>
> Key: YARN-7393
> URL: https://issues.apache.org/jira/browse/YARN-7393
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>







[jira] [Updated] (YARN-7276) Federation Router Web Service fixes

2017-10-25 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated YARN-7276:
--
Attachment: YARN-7276.010.patch

> Federation Router Web Service fixes
> ---
>
> Key: YARN-7276
> URL: https://issues.apache.org/jira/browse/YARN-7276
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
> Attachments: YARN-7276-branch-2.000.patch, 
> YARN-7276-branch-2.001.patch, YARN-7276-branch-2.002.patch, 
> YARN-7276-branch-2.003.patch, YARN-7276-branch-2.004.patch, 
> YARN-7276.000.patch, YARN-7276.001.patch, YARN-7276.002.patch, 
> YARN-7276.003.patch, YARN-7276.004.patch, YARN-7276.005.patch, 
> YARN-7276.006.patch, YARN-7276.007.patch, YARN-7276.009.patch, 
> YARN-7276.010.patch
>
>
> While testing YARN-3661, I found a few issues with the REST interface in the 
> Router:
> * No support for empty content (error 204)
> * Media type support
> * Attributes in {{FederationInterceptorREST}}
> * Support for empty states and labels
> * DefaultMetricsSystem initialization is missing






[jira] [Updated] (YARN-5516) Add REST API for periodicity

2017-10-25 Thread Sean Po (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Po updated YARN-5516:
--
Attachment: YARN-5516.v005.patch

The following tests are failing but are already tracked:
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation, tracked 
by YARN-6747
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA, 
tracked by YARN-7080

The following tests do not appear to be related to my changes:
hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy
org.apache.hadoop.yarn.server.resourcemanager.TestReservationSystemWithRMHA

The latest patch fixes the bug in the InMemoryPlan that I referenced above, and 
also adds a test for that bug.

> Add REST API for periodicity
> 
>
> Key: YARN-5516
> URL: https://issues.apache.org/jira/browse/YARN-5516
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sangeetha Abdu Jyothi
>Assignee: Sean Po
> Attachments: YARN-5516.v001.patch, YARN-5516.v002.patch, 
> YARN-5516.v003.patch, YARN-5516.v004.patch, YARN-5516.v005.patch
>
>
> YARN-5516 changing REST API of the reservation system to support periodicity. 






[jira] [Updated] (YARN-7224) Support GPU isolation for docker container

2017-10-25 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-7224:
-
Attachment: YARN-7224.008.patch

Thanks [~sunilg] for comments,

bq. In assignGpus, do we also need to update the assigned gpus to container's 
resource mapping list ?
I would prefer to keep them in NMStateStore#storeAssignedResources, otherwise 
all new resource plugins would need to implement such logic.

bq. In general dockerCommandPlugin.updateDockerRunCommand helps to update 
docker command for volume etc. However is its better to have an api named 
sanitize/verifyCommand in dockerCommandPlugin so that incoming/created command 
will validated and logged based on system parameters
I'm not quite sure about this, could you explain?

bq. Once a docker volume is created, when this volume will be cleaned or 
unmounted ? in case when container crashes or force stopping container from 
external docker commands etc
bq. With container upgrades or partially using GPU device for a timeslice of 
container lifetime, how volumes could be mounted/re-mounted ?
For the GPU docker integration, we don't need to do this, because all launched 
containers share the same docker volume, so we don't need to create the docker 
volume again and again. I agree that we may need this in the future, so I added 
one method (getCleanupDockerVolumeCommand) to the DockerCommandPlugin 
interface.

bq. In GpuDevice, do we also need to add make (like nvidia with version etc ? )
We don't need it for now, we can add it in the future easily when required.

bq. In initializeWhenGpuRequested, we do a lazy initialization. However if 
docker end point is down(default port), this could cause delay in container 
launch. Do we need a health mechanism to get this data updated ?
To me this is the same as the docker daemon being down, and since containers 
will fail fast, the admin should be able to fix this issue. 

bq. Once docker volume is created, its better to dump the docker volume inspect 
o/p on created volume. Could help for debugging later.
I like this idea, but considering the size of this patch, can we do this in a 
follow-up JIRA?

Attached ver.8 patch.
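
For illustration, a generic sketch of the two docker CLI calls discussed above: 
creating the volume and dumping the docker volume inspect output for later 
debugging. The volume and driver names are examples only, and this is not the 
container-executor/DockerCommandPlugin implementation:

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class DockerVolumeSketch {
  public static void main(String[] args) throws Exception {
    String volume = "nvidia_driver_example";  // example name, driver-dependent
    String driver = "nvidia-docker";          // provided by nvidia-docker-plugin

    // docker volume create --driver <driver> <volume>
    new ProcessBuilder("docker", "volume", "create", "--driver", driver, volume)
        .inheritIO().start().waitFor();

    // Dump "docker volume inspect" so the created volume shows up in the logs.
    Process inspect =
        new ProcessBuilder("docker", "volume", "inspect", volume).start();
    try (BufferedReader r = new BufferedReader(
        new InputStreamReader(inspect.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = r.readLine()) != null) {
        System.out.println(line);             // would go to the NM log instead
      }
    }
    inspect.waitFor();
  }
}
{code}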

> Support GPU isolation for docker container
> --
>
> Key: YARN-7224
> URL: https://issues.apache.org/jira/browse/YARN-7224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-7224.001.patch, YARN-7224.002-wip.patch, 
> YARN-7224.003.patch, YARN-7224.004.patch, YARN-7224.005.patch, 
> YARN-7224.006.patch, YARN-7224.007.patch, YARN-7224.008.patch
>
>
> This patch is to address issues when a docker container is being used:
> 1. GPU driver and nvidia libraries: If GPU drivers and NV libraries are 
> pre-packaged inside the docker image, they could conflict with the driver and 
> nvidia libraries installed on the host OS. An alternative solution is to detect 
> the host OS's installed drivers and devices and mount them when launching the 
> docker container. Please refer to \[1\] for more details. 
> 2. Image detection: 
> From \[2\], the challenge is: 
> bq. Mounting user-level driver libraries and device files clobbers the 
> environment of the container, it should be done only when the container is 
> running a GPU application. The challenge here is to determine if a given 
> image will be using the GPU or not. We should also prevent launching 
> containers based on a Docker image that is incompatible with the host NVIDIA 
> driver version, you can find more details on this wiki page.
> 3. GPU isolation.
> *Proposed solution*:
> a. Use nvidia-docker-plugin \[3\] to address issue #1, this is the same 
> solution used by K8S \[4\]. issue #2 could be addressed in a separate JIRA.
> We won't ship nvidia-docker-plugin with our releases, and we require the 
> cluster admin to preinstall nvidia-docker-plugin to use GPU+docker support on 
> YARN. "nvidia-docker" is a wrapper of the docker binary which can address #3 
> as well; however, "nvidia-docker" doesn't provide the same semantics as 
> docker, and it needs additional environment setup such as PATH/LD_LIBRARY_PATH 
> to use it. To avoid introducing additional issues, we plan to use the 
> nvidia-docker-plugin + docker binary approach.
> b. To address GPU driver and nvidia libraries, we uses nvidia-docker-plugin 
> \[3\] to create a volume which includes GPU-related libraries and mount it 
> when docker container being launched. Changes include: 
> - Instead of using {{volume-driver}}, this patch added {{docker volume 
> create}} command to c-e and NM Java side. The reason is {{volume-driver}} can 
> only use single volume driver for each launched docker container.
> - Updated {{c-e}} and Java side, if a mounted volume is a named volume in 
> docker, skip checking file existence. (Named-volume still need to be added to 
> permitted list of 

[jira] [Commented] (YARN-7376) YARN top ACLs

2017-10-25 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219355#comment-16219355
 ] 

Jonathan Hung commented on YARN-7376:
-

[~vvasudev], YARN top loops continuously over the getApplications call to the 
server. If the check is implemented on the server, then the client will keep 
calling getApplications and each call will fail; I don't think we want that 
behavior.

Perhaps we can set this ACL config to be final. Would that help?
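
For illustration, a client-side gate that could run before yarn top enters its 
refresh loop. The property name below is hypothetical and does not exist in YARN 
today; marking whatever key is chosen as final in yarn-site.xml would keep users 
from overriding it:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.authorize.AccessControlList;

public class TopAclGate {
  // Hypothetical key, only for this sketch.
  private static final String TOP_ACL_KEY = "yarn.cluster.top.acl";

  public static boolean allowed(Configuration conf) throws Exception {
    AccessControlList acl = new AccessControlList(conf.get(TOP_ACL_KEY, "*"));
    return acl.isUserAllowed(UserGroupInformation.getCurrentUser());
  }
}
{code}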

> YARN top ACLs
> -
>
> Key: YARN-7376
> URL: https://issues.apache.org/jira/browse/YARN-7376
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
> Attachments: YARN-7376.001.patch, YARN-7376.002.patch
>
>
> Currently YARN top can be invoked by everyone. But we want to avoid a 
> scenario where random users invoke YARN top, and potentially leave it 
> running. So we can implement ACLs to prevent this.






[jira] [Commented] (YARN-5516) Add REST API for periodicity

2017-10-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219329#comment-16219329
 ] 

Hadoop QA commented on YARN-5516:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  3m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 32s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
43s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  5m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
55s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 
0 new + 110 unchanged - 8 fixed = 110 total (was 118) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 26s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
38s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
48s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 55m 
22s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
19s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}137m 58s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  

[jira] [Commented] (YARN-7190) Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user classpath

2017-10-25 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219318#comment-16219318
 ] 

Subru Krishnan commented on YARN-7190:
--

Thanks [~varun_saxena]/[~rohithsharma] for the prompt response. I concur on 
reopening this JIRA and uploading a patch for trunk.

> Ensure only NM classpath in 2.x gets TSv2 related hbase jars, not the user 
> classpath
> 
>
> Key: YARN-7190
> URL: https://issues.apache.org/jira/browse/YARN-7190
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineclient, timelinereader, timelineserver
>Reporter: Vrushali C
>Assignee: Varun Saxena
> Fix For: 2.9.0, YARN-5355_branch2
>
> Attachments: YARN-7190-YARN-5355_branch2.01.patch, 
> YARN-7190-YARN-5355_branch2.02.patch, YARN-7190-YARN-5355_branch2.03.patch
>
>
> [~jlowe] had a good observation about the user classpath getting extra jars 
> in hadoop 2.x brought in with TSv2.  If users start picking up Hadoop 2,x's 
> version of HBase jars instead of the ones they shipped with their job, it 
> could be a problem.
> So when TSv2 is to be used in 2,x, the hbase related jars should come into 
> only the NM classpath not the user classpath.
> Here is a list of some jars
> {code}
> commons-csv-1.0.jar
> commons-el-1.0.jar
> commons-httpclient-3.1.jar
> disruptor-3.3.0.jar
> findbugs-annotations-1.3.9-1.jar
> hbase-annotations-1.2.6.jar
> hbase-client-1.2.6.jar
> hbase-common-1.2.6.jar
> hbase-hadoop2-compat-1.2.6.jar
> hbase-hadoop-compat-1.2.6.jar
> hbase-prefix-tree-1.2.6.jar
> hbase-procedure-1.2.6.jar
> hbase-protocol-1.2.6.jar
> hbase-server-1.2.6.jar
> htrace-core-3.1.0-incubating.jar
> jamon-runtime-2.4.1.jar
> jasper-compiler-5.5.23.jar
> jasper-runtime-5.5.23.jar
> jcodings-1.0.8.jar
> joni-2.1.2.jar
> jsp-2.1-6.1.14.jar
> jsp-api-2.1-6.1.14.jar
> jsr311-api-1.1.1.jar
> metrics-core-2.2.0.jar
> servlet-api-2.5-6.1.14.jar
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7320) Duplicate LiteralByteStrings in SystemCredentialsForAppsProto.credentialsForApp_

2017-10-25 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev updated YARN-7320:
-
Attachment: YARN-7320.01.addendum.patch

> Duplicate LiteralByteStrings in 
> SystemCredentialsForAppsProto.credentialsForApp_
> 
>
> Key: YARN-7320
> URL: https://issues.apache.org/jira/browse/YARN-7320
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Fix For: 3.0.0
>
> Attachments: YARN-7320.01.addendum.patch, YARN-7320.01.patch, 
> YARN-7320.02.patch
>
>
> Using jxray (www.jxray.com) I've analyzed several heap dumps from YARN 
> Resource Manager running in a big cluster. The tool uncovered several sources 
> of memory waste. One problem, which results in wasting more than a quarter of 
> all memory, is a large number of duplicate {{LiteralByteString}} objects 
> coming from the following reference chain:
> {code}
> 1,011,810K (26.9%): byte[]: 5416705 / 100% dup arrays (22108 unique)
> ↖com.google.protobuf.LiteralByteString.bytes
> ↖org.apache.hadoop.yarn.proto.YarnServerCommonServiceProtos$.credentialsForApp_
> ↖{j.u.ArrayList}
> ↖j.u.Collections$UnmodifiableRandomAccessList.c
> ↖org.apache.hadoop.yarn.proto.YarnServerCommonServiceProtos$NodeHeartbeatResponseProto.systemCredentialsForApps_
> ↖org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.NodeHeartbeatResponsePBImpl.proto
> ↖org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.latestNodeHeartBeatResponse
> ↖org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode.rmNode
> ...
> {code}
> That is, collectively reference chains that look as above hold in memory 5.4 
> million {{LiteralByteString}} objects, but only ~22 thousand of these objects 
> are unique. Deduplicating these objects, e.g. using a Google Object Interner 
> instance, would save ~1GB of memory.
> It looks like the main place where the above {{LiteralByteString}}s are 
> created and attached to the {{SystemCredentialsForAppsProto}} objects is in 
> {{NodeHeartbeatResponsePBImpl.java}}, method 
> {{addSystemCredentialsToProto()}}. Probably adding a call to an interner 
> there will fix the problem.
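
A minimal sketch of the interning idea described above, assuming Guava's 
{{Interners}} is available on the classpath; the class and method names below 
are illustrative and not taken from the actual patch:

{code}
// Sketch only: deduplicate the credentials ByteStrings with a Guava weak
// interner so identical payloads share one canonical instance. Class and
// method names are illustrative.
import com.google.common.collect.Interner;
import com.google.common.collect.Interners;
import com.google.protobuf.ByteString;

public final class CredentialsInterning {

  // Weak interner: canonical instances are released once nothing else
  // references them, so the cache cannot grow without bound.
  private static final Interner<ByteString> INTERNER =
      Interners.newWeakInterner();

  // Call on each credentials ByteString before storing it in
  // SystemCredentialsForAppsProto, e.g. from addSystemCredentialsToProto().
  public static ByteString dedup(ByteString credentials) {
    return INTERNER.intern(credentials);
  }
}
{code}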



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6927) Add support for individual resource types requests in MapReduce

2017-10-25 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219276#comment-16219276
 ] 

Daniel Templeton commented on YARN-6927:


Except for the redundant uses of _public_, which I think are fine, the 
checkstyle complaints are valid. Otherwise, LGTM.

> Add support for individual resource types requests in MapReduce
> ---
>
> Key: YARN-6927
> URL: https://issues.apache.org/jira/browse/YARN-6927
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Daniel Templeton
>Assignee: Gergo Repas
> Attachments: YARN-6927.000.patch, YARN-6927.001.patch, 
> YARN-6927.002.patch, YARN-6927.003.patch, YARN-6927.004.patch, 
> YARN-6927.005.patch
>
>
> YARN-6504 adds support for resource profiles in MapReduce jobs, but resource 
> profiles don't give users much flexibility in their resource requests.  To 
> satisfy users' needs, MapReduce should also allow users to specify arbitrary 
> resource requests.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6929) yarn.nodemanager.remote-app-log-dir structure is not scalable

2017-10-25 Thread Prabhu Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-6929:

Attachment: YARN-6929.2.patch

> yarn.nodemanager.remote-app-log-dir structure is not scalable
> -
>
> Key: YARN-6929
> URL: https://issues.apache.org/jira/browse/YARN-6929
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
> Attachments: YARN-6929.1.patch, YARN-6929.2.patch, YARN-6929.patch
>
>
> The current directory structure for yarn.nodemanager.remote-app-log-dir is 
> not scalable. The maximum subdirectory limit is 1048576 by default (HDFS-6102). 
> With a retention (yarn.log-aggregation.retain-seconds) of 7 days, there is a 
> good chance that LogAggregationService fails to create a new directory with 
> FSLimitException$MaxDirectoryItemsExceededException.
> The current structure is 
> //logs/. This can be 
> improved by adding the date as a subdirectory, like 
> //logs// 
> {code}
> WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService:
>  Application failed to init aggregation 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException):
>  The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 
> items=1048576 
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4262)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4221)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4194)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:813)
>  
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600)
>  
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>  
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) 
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) 
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:415) 
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>  
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) 
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:308)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:366)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:443)
>  
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>  
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) 
> at java.lang.Thread.run(Thread.java:745) 
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException):
>  The directory item limit of /app-logs/yarn/logs is exceeded: limit=1048576 
> items=1048576 
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:2021)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:2072)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1841)
>  
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsRecursively(FSNamesystem.java:4351)
>  
> at 
> 
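
A minimal sketch of the date-based layout proposed in the YARN-6929 description 
above, assuming the date comes from the application's submission or aggregation 
time; the helper and parameter names are illustrative:

{code}
// Sketch only: build a date subdirectory between the per-user "logs" dir and
// the application directory, so no single directory accumulates every app.
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.hadoop.fs.Path;

public final class RemoteLogDirLayout {

  // e.g. {remote-app-log-dir}/{user}/logs/2017/10/25/{applicationId}
  public static Path appLogDir(Path remoteRootLogDir, String user,
      String applicationId, long timestampMillis) {
    // SimpleDateFormat is created per call because it is not thread-safe.
    String date = new SimpleDateFormat("yyyy/MM/dd")
        .format(new Date(timestampMillis));
    return new Path(remoteRootLogDir,
        user + Path.SEPARATOR + "logs" + Path.SEPARATOR
            + date + Path.SEPARATOR + applicationId);
  }
}
{code}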

[jira] [Commented] (YARN-7326) Add recursion support and configure RegistryDNS to lookup upstream

2017-10-25 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219242#comment-16219242
 ] 

Eric Yang commented on YARN-7326:
-

Thank you [~billie.rinaldi].

> Add recursion support and configure RegistryDNS to lookup upstream
> --
>
> Key: YARN-7326
> URL: https://issues.apache.org/jira/browse/YARN-7326
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Eric Yang
> Fix For: yarn-native-services
>
> Attachments: YARN-7326.yarn-native-services.001.patch, 
> YARN-7326.yarn-native-services.002.patch, 
> YARN-7326.yarn-native-services.003.patch, 
> YARN-7326.yarn-native-services.004.patch, 
> YARN-7326.yarn-native-services.005.patch, 
> YARN-7326.yarn-native-services.006.patch, 
> YARN-7326.yarn-native-services.007.patch
>
>
> [~aw] helped to identify these issues: 
> Now some general bad news, not related to this patch:
> Ran a few queries, but this one is a bit concerning:
> {code}
> root@ubuntu:/hadoop/logs# dig @localhost -p 54 .
> ;; Warning: query response not set
> ; <<>> DiG 9.10.3-P4-Ubuntu <<>> @localhost -p 54 .
> ; (2 servers found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOTAUTH, id: 47794
> ;; flags: rd ad; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
> ;; WARNING: recursion requested but not available
> ;; Query time: 0 msec
> ;; SERVER: 127.0.0.1#54(127.0.0.1)
> ;; WHEN: Thu Oct 12 16:04:54 PDT 2017
> ;; MSG SIZE  rcvd: 12
> root@ubuntu:/hadoop/logs# dig @localhost -p 54 axfr .
> ;; Connection to ::1#54(::1) for . failed: connection refused.
> ;; communications error to 127.0.0.1#54: end of file
> root@ubuntu:/hadoop/logs# 
> {code}
> It looks like it effectively fails when asked about a root zone, which is bad.
> It's also kind of interesting in what it does and doesn't log. Probably 
> should be configured to rotate logs based on size not date.
> The real showstopper though: RegistryDNS basically eats a core. It is running 
> with 100% cpu utilization with and without jsvc. On my laptop, this is 
> triggering my fan.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7391) Consider square root instead of natural log for size-based weight

2017-10-25 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219241#comment-16219241
 ] 

Yufei Gu commented on YARN-7391:


We don't want app weight to grow too fast or too slow. How do we define too 
fast or too slow? It mainly depends on the size of the demand and the 
expectations of users. Agreed with Daniel, there is no answer without an 
analysis of the use cases.

It is totally unnecessary to lock the scheduler only for {{getWeights()}} on 
an app. It would be a performance issue when the number of apps is large. We 
could definitely do something to improve this.

> Consider square root instead of natural log for size-based weight
> -
>
> Key: YARN-7391
> URL: https://issues.apache.org/jira/browse/YARN-7391
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.0.0-beta1
>Reporter: Steven Rand
>
> Currently for size-based weight, we compute the weight of an app using this 
> code from 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L377:
> {code}
>   if (sizeBasedWeight) {
> // Set weight based on current memory demand
> weight = Math.log1p(app.getDemand().getMemorySize()) / Math.log(2);
>   }
> {code}
> Because the natural log function grows slowly, the weights of two apps with 
> hugely different memory demands can be quite similar. For example, {{weight}} 
> evaluates to 14.3 for an app with a demand of 20 GB, and evaluates to 19.9 
> for an app with a demand of 1000 GB. The app with the much larger demand will 
> still have a higher weight, but not by a large amount relative to the sum of 
> those weights.
> I think it's worth considering a switch to a square root function, which will 
> grow more quickly. In the above example, the app with a demand of 20 GB now 
> has a weight of 143, while the app with a demand of 1000 GB now has a weight 
> of 1012. These weights seem more reasonable relative to each other given the 
> difference in demand between the two apps.
> The above example is admittedly a bit extreme, but I believe that a square 
> root function would also produce reasonable results in general.
> The code I have in mind would look something like:
> {code}
>   if (sizeBasedWeight) {
> // Set weight based on current memory demand
> weight = Math.sqrt(app.getDemand().getMemorySize());
>   }
> {code}
> Would people be comfortable with this change?
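
For reference, a quick check of the numbers quoted above, assuming the memory 
demand is expressed in MB (as {{getMemorySize()}} returns):

{code}
// Quick check of the example weights above; memory demand is assumed to be in MB.
public class SizeBasedWeightCheck {
  public static void main(String[] args) {
    long demand20GB = 20L * 1024L;      // 20 GB expressed in MB
    long demand1000GB = 1000L * 1024L;  // 1000 GB expressed in MB

    // Current log-based weight: log2(1 + demand)
    System.out.println(Math.log1p(demand20GB) / Math.log(2));   // ~14.3
    System.out.println(Math.log1p(demand1000GB) / Math.log(2)); // ~19.97

    // Proposed sqrt-based weight
    System.out.println(Math.sqrt(demand20GB));   // ~143.1
    System.out.println(Math.sqrt(demand1000GB)); // ~1011.9
  }
}
{code}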



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5534) Allow whitelisted volume mounts

2017-10-25 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219238#comment-16219238
 ] 

Eric Yang commented on YARN-5534:
-

[~shaneku...@gmail.com] It doesn't look like YARN-6623 contains all the 
features of this JIRA. I don't see syntax for defining arbitrary volumes in 
YARN-6623. Would you like to rebase the patch on top of YARN-6623?

> Allow whitelisted volume mounts 
> 
>
> Key: YARN-5534
> URL: https://issues.apache.org/jira/browse/YARN-5534
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: luhuichun
>Assignee: Shane Kumpf
> Attachments: YARN-5534.001.patch, YARN-5534.002.patch, 
> YARN-5534.003.patch
>
>
> Introduction 
> Mounting files or directories from the host is one way of passing 
> configuration and other information into a docker container. 
> We could allow the user to set a list of mounts in the environment of 
> ContainerLaunchContext (e.g. /dir1:/targetdir1,/dir2:/targetdir2). 
> These would be mounted read-only to the specified target locations. This has 
> been resolved in YARN-4595
> 2. Problem Definition
> But mounting arbitrary volumes into a Docker container can be a security risk.
> 3. Possible solutions
> One approach to provide safe mounts is to allow the cluster administrator to 
> configure a set of parent directories as whitelisted mounting directories.
> Add a property named yarn.nodemanager.volume-mounts.white-list; when the 
> container executor does mount checking, only the allowed directories or 
> their sub-directories can be mounted. 
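
A minimal sketch of the parent-directory check described above; the class name 
is illustrative and the whitelist is assumed to be a list of absolute host 
directories:

{code}
// Sketch only: a requested host path is allowed when it equals, or is nested
// under, one of the whitelisted parent directories. Paths are normalized so
// that "/data/../etc" cannot bypass the check.
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public final class VolumeMountWhitelist {
  private final List<Path> allowedParents = new ArrayList<>();

  public VolumeMountWhitelist(List<String> whitelistedDirs) {
    for (String dir : whitelistedDirs) {
      allowedParents.add(Paths.get(dir).toAbsolutePath().normalize());
    }
  }

  public boolean isAllowed(String requestedHostPath) {
    Path requested = Paths.get(requestedHostPath).toAbsolutePath().normalize();
    for (Path parent : allowedParents) {
      if (requested.startsWith(parent)) {
        return true;
      }
    }
    return false;
  }
}
{code}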



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5516) Add REST API for periodicity

2017-10-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219204#comment-16219204
 ] 

Hadoop QA commented on YARN-5516:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m  
6s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 31s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
55s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
57s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 
0 new + 109 unchanged - 8 fixed = 109 total (was 117) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 34s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
40s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
41s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m  8s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
17s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}155m  9s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 

[jira] [Commented] (YARN-7276) Federation Router Web Service fixes

2017-10-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219193#comment-16219193
 ] 

Hadoop QA commented on YARN-7276:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  3m  
4s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 18s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  5s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
26s{color} | {color:green} hadoop-yarn-server-router in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 42m 26s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7276 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893971/YARN-7276.009.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a24c6d4da040 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 
12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 5b98639 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/18141/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/18141/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Federation Router Web Service fixes
> ---
>
> Key: YARN-7276
> URL: 

[jira] [Commented] (YARN-7391) Consider square root instead of natural log for size-based weight

2017-10-25 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219191#comment-16219191
 ] 

Daniel Templeton commented on YARN-7391:


Do you have a particular scenario where the log-based weight isn't giving 
reasonable results?  My concern would be that a sqrt-based weight would allow a 
big-memory app to starve out smaller apps.  Without a motivating issue, I'm not 
super excited about changing this part of the code.

On a tangentially related note, I was profiling the scheduler yesterday and 
noticed that we spend a ton of our scheduling time waiting for the lock in this 
method.  Looks like a good candidate for caching in {{FSAppAttempt}}.
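
A rough sketch of the caching idea, assuming the weight only needs to change 
when the app's demand changes; the field and method names are illustrative, 
not from an actual patch:

{code}
// Sketch only: cache the size-based weight per app attempt and recompute it
// only when the memory demand changes, so getWeight() no longer needs the
// scheduler lock on every call. Names are illustrative.
public class CachedSizeBasedWeight {
  private volatile long cachedDemandMb = -1L;
  private volatile double cachedWeight = 1.0;

  public double getWeight(long currentDemandMb) {
    if (currentDemandMb != cachedDemandMb) {
      // A momentarily stale weight is acceptable for scheduling decisions,
      // so this does not need to be synchronized with the scheduler.
      cachedWeight = Math.log1p(currentDemandMb) / Math.log(2);
      cachedDemandMb = currentDemandMb;
    }
    return cachedWeight;
  }
}
{code}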

> Consider square root instead of natural log for size-based weight
> -
>
> Key: YARN-7391
> URL: https://issues.apache.org/jira/browse/YARN-7391
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.0.0-beta1
>Reporter: Steven Rand
>
> Currently for size-based weight, we compute the weight of an app using this 
> code from 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L377:
> {code}
>   if (sizeBasedWeight) {
> // Set weight based on current memory demand
> weight = Math.log1p(app.getDemand().getMemorySize()) / Math.log(2);
>   }
> {code}
> Because the natural log function grows slowly, the weights of two apps with 
> hugely different memory demands can be quite similar. For example, {{weight}} 
> evaluates to 14.3 for an app with a demand of 20 GB, and evaluates to 19.9 
> for an app with a demand of 1000 GB. The app with the much larger demand will 
> still have a higher weight, but not by a large amount relative to the sum of 
> those weights.
> I think it's worth considering a switch to a square root function, which will 
> grow more quickly. In the above example, the app with a demand of 20 GB now 
> has a weight of 143, while the app with a demand of 1000 GB now has a weight 
> of 1012. These weights seem more reasonable relative to each other given the 
> difference in demand between the two apps.
> The above example is admittedly a bit extreme, but I believe that a square 
> root function would also produce reasonable results in general.
> The code I have in mind would look something like:
> {code}
>   if (sizeBasedWeight) {
> // Set weight based on current memory demand
> weight = Math.sqrt(app.getDemand().getMemorySize());
>   }
> {code}
> Would people be comfortable with this change?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7391) Consider square root instead of natural log for size-based weight

2017-10-25 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219191#comment-16219191
 ] 

Daniel Templeton edited comment on YARN-7391 at 10/25/17 5:57 PM:
--

Do you have a particular scenario where the log-based weight isn't giving 
reasonable results?  My concern would be that a sqrt-based weight would allow a 
big-memory app to starve out smaller apps.  Without a motivating issue, I'm not 
super excited about changing this part of the code.

On a tangentially related note, I was profiling the scheduler yesterday and 
noticed that we spend a ton of our scheduling time waiting for the lock in this 
method.  Looks like a good candidate for caching in {{FSAppAttempt}}.


was (Author: templedf):
Do you have a particular scenario where the log-based weight isn't given 
reasonable results?  My concern would be that a sqrt-based weight would allow a 
big-memory app to starve out smaller apps.  Without a motivating issue, I'm not 
super excited about changing this part of the code.

On a tangentially related note, I was profiling the scheduler yesterday and 
noticed that we spend a ton on our scheduling time waiting for the lock in this 
method.  Looks like a good candidate for caching in {{FSAppAttempt}}.

> Consider square root instead of natural log for size-based weight
> -
>
> Key: YARN-7391
> URL: https://issues.apache.org/jira/browse/YARN-7391
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.0.0-beta1
>Reporter: Steven Rand
>
> Currently for size-based weight, we compute the weight of an app using this 
> code from 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java#L377:
> {code}
>   if (sizeBasedWeight) {
> // Set weight based on current memory demand
> weight = Math.log1p(app.getDemand().getMemorySize()) / Math.log(2);
>   }
> {code}
> Because the natural log function grows slowly, the weights of two apps with 
> hugely different memory demands can be quite similar. For example, {{weight}} 
> evaluates to 14.3 for an app with a demand of 20 GB, and evaluates to 19.9 
> for an app with a demand of 1000 GB. The app with the much larger demand will 
> still have a higher weight, but not by a large amount relative to the sum of 
> those weights.
> I think it's worth considering a switch to a square root function, which will 
> grow more quickly. In the above example, the app with a demand of 20 GB now 
> has a weight of 143, while the app with a demand of 1000 GB now has a weight 
> of 1012. These weights seem more reasonable relative to each other given the 
> difference in demand between the two apps.
> The above example is admittedly a bit extreme, but I believe that a square 
> root function would also produce reasonable results in general.
> The code I have in mind would look something like:
> {code}
>   if (sizeBasedWeight) {
> // Set weight based on current memory demand
> weight = Math.sqrt(app.getDemand().getMemorySize());
>   }
> {code}
> Would people be comfortable with this change?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7332) Compute effectiveCapacity per each resource vector

2017-10-25 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-7332:
--
Attachment: YARN-7332.YARN-5881.002.patch

Thanks [~leftnoteasy]. Updating the patch with a test case.

> Compute effectiveCapacity per each resource vector
> --
>
> Key: YARN-7332
> URL: https://issues.apache.org/jira/browse/YARN-7332
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: YARN-5881
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: YARN-7332.YARN-5881.001.patch, 
> YARN-7332.YARN-5881.002.patch
>
>
> Currently effective capacity uses a generalized approach based on dominance. 
> Hence some vectors may not be calculated correctly. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7289) Application lifetime does not work with FairScheduler

2017-10-25 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219173#comment-16219173
 ] 

Daniel Templeton commented on YARN-7289:


Here:

{code}
  if (scheduler.equals(CapacityScheduler.class)) {
newConf =
new YarnConfiguration(setUpCSQueue(maxLifetime, defaultLifetime));
conf = new YarnConfiguration(newConf);
  }
{code}

Do we need the intermediary conf?  Can't we just have:

{code}
  if (scheduler.equals(CapacityScheduler.class)) {
conf = new YarnConfiguration(setUpCSQueue(maxLifetime, 
defaultLifetime));
  }
{code}

?

It would also be good to add a comment to say why FS doesn't need any queue 
setup.

Is {{testApplicationLifetimeMonitor()}} actually testing anything with FS?  
Would it be a better approach to add an {{assume()}} so the test only runs with 
CS?
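
A minimal sketch of the {{assume()}} idea, using JUnit 4's {{Assume}}; the 
{{scheduler}} field mirrors the snippet above and the class body is abbreviated:

{code}
// Sketch only: skip the lifetime test when the parameterized scheduler is not
// the CapacityScheduler, instead of running a test body that asserts nothing.
import static org.junit.Assume.assumeTrue;

import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler;
import org.junit.Test;

public class TestApplicationLifetimeMonitorSketch {
  // Supplied by the parameterized runner in the real test; hard-coded here.
  private final Class<?> scheduler = CapacityScheduler.class;

  @Test
  public void testApplicationLifetimeMonitor() throws Exception {
    // Queue-level lifetime limits are only exercised with the CapacityScheduler.
    assumeTrue(scheduler.equals(CapacityScheduler.class));
    // ... existing test body ...
  }
}
{code}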

> Application lifetime does not work with FairScheduler
> -
>
> Key: YARN-7289
> URL: https://issues.apache.org/jira/browse/YARN-7289
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-7289.000.patch, YARN-7289.001.patch, 
> YARN-7289.002.patch, YARN-7289.003.patch, YARN-7289.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7320) Duplicate LiteralByteStrings in SystemCredentialsForAppsProto.credentialsForApp_

2017-10-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219145#comment-16219145
 ] 

Wangda Tan commented on YARN-7320:
--

Thanks [~mi...@cloudera.com].

> Duplicate LiteralByteStrings in 
> SystemCredentialsForAppsProto.credentialsForApp_
> 
>
> Key: YARN-7320
> URL: https://issues.apache.org/jira/browse/YARN-7320
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Fix For: 3.0.0
>
> Attachments: YARN-7320.01.patch, YARN-7320.02.patch
>
>
> Using jxray (www.jxray.com) I've analyzed several heap dumps from YARN 
> Resource Manager running in a big cluster. The tool uncovered several sources 
> of memory waste. One problem, which results in wasting more than a quarter of 
> all memory, is a large number of duplicate {{LiteralByteString}} objects 
> coming from the following reference chain:
> {code}
> 1,011,810K (26.9%): byte[]: 5416705 / 100% dup arrays (22108 unique)
> ↖com.google.protobuf.LiteralByteString.bytes
> ↖org.apache.hadoop.yarn.proto.YarnServerCommonServiceProtos$.credentialsForApp_
> ↖{j.u.ArrayList}
> ↖j.u.Collections$UnmodifiableRandomAccessList.c
> ↖org.apache.hadoop.yarn.proto.YarnServerCommonServiceProtos$NodeHeartbeatResponseProto.systemCredentialsForApps_
> ↖org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.NodeHeartbeatResponsePBImpl.proto
> ↖org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.latestNodeHeartBeatResponse
> ↖org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode.rmNode
> ...
> {code}
> That is, collectively reference chains that look as above hold in memory 5.4 
> million {{LiteralByteString}} objects, but only ~22 thousand of these objects 
> are unique. Deduplicating these objects, e.g. using a Google Object Interner 
> instance, would save ~1GB of memory.
> It looks like the main place where the above {{LiteralByteString}}s are 
> created and attached to the {{SystemCredentialsForAppsProto}} objects is in 
> {{NodeHeartbeatResponsePBImpl.java}}, method 
> {{addSystemCredentialsToProto()}}. Probably adding a call to an interner 
> there will fix the problem.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7197) Add support for a volume blacklist for docker containers

2017-10-25 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219115#comment-16219115
 ] 

Eric Yang commented on YARN-7197:
-

[~ebadger] said:
{quote}
The user can just mount above the blacklist and they get access to exactly what 
they want. This protects them from mounting the exact path in the blacklist, 
but that doesn't really buy us anything if they can mount the parent directory. 
If I can't prevent a file/directory underneath the parent directory from being 
accessed, then I don't see the utility of the blacklist.
{quote}

A blacklist is not the inverse of a whitelist in this context.  A blacklist is 
designed to prevent certain exact paths, such as /dev, /proc, /sys, and /run, 
from being mounted.  In the examples above, allowing people to read the YARN 
system directory can leak credentials about other users.  Allowing a user to 
mount /run/docker.socket can let the user break out of the Docker container and 
become root.  A blacklist can keep such system APIs from being mounted and 
minimize the attack surface.  A paranoid admin might configure Docker to use a 
socket path other than /run/docker.socket and put the customized location in 
the blacklist.  The same applies to YARN system directories.  A blacklist 
increases the difficulty of compromising the host by keeping programmable APIs 
out of reach from inside the containers.
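
A minimal sketch of the exact-path semantics described above (the blacklist 
rejects only the normalized path itself, not everything beneath a parent); the 
class name and default entries are illustrative:

{code}
// Sketch only: reject a mount whose normalized path exactly matches a
// blacklisted entry such as /dev, /proc, /sys or /run/docker.socket.
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

public final class VolumeMountBlacklist {
  private final Set<String> blacklisted = new HashSet<>();

  public VolumeMountBlacklist(String... entries) {
    for (String entry : entries) {
      blacklisted.add(Paths.get(entry).toAbsolutePath().normalize().toString());
    }
  }

  // Exact match only: "/run/docker.socket" is rejected, "/run/foo" is not.
  public boolean isBlacklisted(String requestedHostPath) {
    String normalized =
        Paths.get(requestedHostPath).toAbsolutePath().normalize().toString();
    return blacklisted.contains(normalized);
  }
}
{code}

With entries like /dev, /proc, /sys and /run/docker.socket, a request for 
/run/docker.socket would be rejected outright, while other paths remain subject 
only to the whitelist check.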


> Add support for a volume blacklist for docker containers
> 
>
> Key: YARN-7197
> URL: https://issues.apache.org/jira/browse/YARN-7197
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Shane Kumpf
>Assignee: Eric Yang
> Attachments: YARN-7197.001.patch, YARN-7197.002.patch
>
>
> Docker supports bind mounting host directories into containers. Work is 
> underway to allow admins to configure a whilelist of volume mounts. While 
> this is a much needed and useful feature, it opens the door for 
> misconfiguration that may lead to users being able to compromise or crash the 
> system. 
> One example would be allowing users to mount /run from a host running 
> systemd, and then running systemd in that container, rendering the host 
> mostly unusable.
> This issue is to add support for a default blacklist. The default blacklist 
> would be where we put files and directories that if mounted into a container, 
> are likely to have negative consequences. Users are encouraged not to remove 
> items from the default blacklist, but may do so if necessary.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6927) Add support for individual resource types requests in MapReduce

2017-10-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219036#comment-16219036
 ] 

Hadoop QA commented on YARN-6927:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
42s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 17s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
4s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m  
1s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 19s{color} | {color:orange} root: The patch generated 15 new + 879 unchanged 
- 3 fixed = 894 total (was 882) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 46s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
41s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
2s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m 
31s{color} | {color:green} hadoop-mapreduce-client-app in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}121m 32s{color} 
| {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
51s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}249m 46s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | org.apache.hadoop.mapred.pipes.TestPipeApplication |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:ca8ddc6 |
| JIRA Issue | YARN-6927 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893934/YARN-6927.005.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 56a44ef97e65 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 

[jira] [Commented] (YARN-7370) Preemption properties should be refreshable

2017-10-25 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-7370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219035#comment-16219035
 ] 

Gergely Novák commented on YARN-7370:
-

[~eepayne] He must have meant YARN-6124, not YARN-6142. 

I attached a first WIP version for review.

> Preemption properties should be refreshable
> ---
>
> Key: YARN-7370
> URL: https://issues.apache.org/jira/browse/YARN-7370
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.8.0, 3.0.0-alpha3
>Reporter: Eric Payne
>Assignee: Gergely Novák
> Attachments: YARN-7370.001.patch
>
>
> At least the properties for {{max-allowable-limit}} and {{minimum-threshold}} 
> should be refreshable. It would also be nice to make 
> {{intra-queue-preemption.enabled}} and {{preemption-order-policy}} 
> refreshable.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7370) Preemption properties should be refreshable

2017-10-25 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/YARN-7370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gergely Novák updated YARN-7370:

Attachment: YARN-7370.001.patch

> Preemption properties should be refreshable
> ---
>
> Key: YARN-7370
> URL: https://issues.apache.org/jira/browse/YARN-7370
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.8.0, 3.0.0-alpha3
>Reporter: Eric Payne
>Assignee: Gergely Novák
> Attachments: YARN-7370.001.patch
>
>
> At least the properties for {{max-allowable-limit}} and {{minimum-threshold}} 
> should be refreshable. It would also be nice to make 
> {{intra-queue-preemption.enabled}} and {{preemption-order-policy}} 
> refreshable.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6927) Add support for individual resource types requests in MapReduce

2017-10-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219014#comment-16219014
 ] 

Hadoop QA commented on YARN-6927:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
49s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 
31s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 11s{color} | {color:orange} root: The patch generated 15 new + 879 unchanged 
- 3 fixed = 894 total (was 882) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 50s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
40s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
36s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
5s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
56s{color} | {color:green} hadoop-mapreduce-client-app in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}123m  5s{color} 
| {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
51s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}236m 50s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | org.apache.hadoop.mapred.pipes.TestPipeApplication |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:ca8ddc6 |
| JIRA Issue | YARN-6927 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12893934/YARN-6927.005.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ee2e578f1259 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 

[jira] [Updated] (YARN-7276) Federation Router Web Service fixes

2017-10-25 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated YARN-7276:
--
Attachment: YARN-7276.009.patch

> Federation Router Web Service fixes
> ---
>
> Key: YARN-7276
> URL: https://issues.apache.org/jira/browse/YARN-7276
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
> Attachments: YARN-7276-branch-2.000.patch, 
> YARN-7276-branch-2.001.patch, YARN-7276-branch-2.002.patch, 
> YARN-7276-branch-2.003.patch, YARN-7276-branch-2.004.patch, 
> YARN-7276.000.patch, YARN-7276.001.patch, YARN-7276.002.patch, 
> YARN-7276.003.patch, YARN-7276.004.patch, YARN-7276.005.patch, 
> YARN-7276.006.patch, YARN-7276.007.patch, YARN-7276.009.patch
>
>
> While testing YARN-3661, I found a few issues with the REST interface in the 
> Router:
> * No support for empty content (error 204)
> * Media type support
> * Attributes in {{FederationInterceptorREST}}
> * Support for empty states and labels
> * DefaultMetricsSystem initialization is missing
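As a rough illustration of the first item above (returning 204 for empty content), a JAX-RS handler can answer with an explicit No Content status instead of trying to serialize an empty body. This is only a sketch with hypothetical class and helper names, not the actual Router code:

{code}
// Minimal sketch (not the actual Router code): a JAX-RS endpoint that
// returns 204 No Content when the downstream call yields nothing.
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

@Path("/ws/v1/cluster")
public class ExampleRouterWebService {

  @GET
  @Path("/apps")
  @Produces({MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML})
  public Response getApps() {
    Object apps = fetchAppsFromSubClusters(); // hypothetical helper
    if (apps == null) {
      // Empty content: answer with 204 rather than a serialization error.
      return Response.status(Response.Status.NO_CONTENT).build();
    }
    return Response.ok(apps).build();
  }

  private Object fetchAppsFromSubClusters() {
    return null; // placeholder for the federation call
  }
}
{code}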



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7276) Federation Router Web Service fixes

2017-10-25 Thread Íñigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated YARN-7276:
--
Attachment: (was: YARN-7276.008.patch)

> Federation Router Web Service fixes
> ---
>
> Key: YARN-7276
> URL: https://issues.apache.org/jira/browse/YARN-7276
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
> Attachments: YARN-7276-branch-2.000.patch, 
> YARN-7276-branch-2.001.patch, YARN-7276-branch-2.002.patch, 
> YARN-7276-branch-2.003.patch, YARN-7276-branch-2.004.patch, 
> YARN-7276.000.patch, YARN-7276.001.patch, YARN-7276.002.patch, 
> YARN-7276.003.patch, YARN-7276.004.patch, YARN-7276.005.patch, 
> YARN-7276.006.patch, YARN-7276.007.patch
>
>
> While testing YARN-3661, I found a few issues with the REST interface in the 
> Router:
> * No support for empty content (error 204)
> * Media type support
> * Attributes in {{FederationInterceptorREST}}
> * Support for empty states and labels
> * DefaultMetricsSystem initialization is missing



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-10-25 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218980#comment-16218980
 ] 

Sunil G commented on YARN-7244:
---

I think the latest patch looks fine to me.

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-7244.001.patch, YARN-7244.002.patch, 
> YARN-7244.003.patch, YARN-7244.004.patch, YARN-7244.005.patch, 
> YARN-7244.006.patch, YARN-7244.007.patch, YARN-7244.008.patch, 
> YARN-7244.009.patch, YARN-7244.010.patch, YARN-7244.011.patch, 
> YARN-7244.012.patch, YARN-7244.013.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks are later added to the node, map tasks will start using 
> them, but the ShuffleHandler will not be aware of them. The end result is 
> that the data cannot be shuffled from the node, leading to fetch failures 
> and re-runs of the map tasks.
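The general idea behind such a fix, sketched below with hypothetical names (this is not the actual ShuffleHandler patch), is to re-read the configured local dirs on each lookup instead of caching a startup snapshot, so disks mounted later are picked up:

{code}
// Illustrative sketch only: re-scan the configured dirs on each lookup
// so newly added disks are usable for shuffle. Names are assumptions.
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class RefreshingLocalDirs {
  private final String confDirs; // e.g. the NM local-dirs setting

  public RefreshingLocalDirs(String confDirs) {
    this.confDirs = confDirs;
  }

  /** Return the currently usable dirs instead of a startup snapshot. */
  public List<File> usableDirs() {
    List<File> good = new ArrayList<>();
    for (String d : confDirs.split(",")) {
      File dir = new File(d.trim());
      // A disk added after startup shows up here as soon as it is mounted.
      if (dir.isDirectory() && dir.canRead()) {
        good.add(dir);
      }
    }
    return good;
  }

  /** Resolve a shuffle file by searching the current set of dirs. */
  public File findShuffleFile(String relativePath) {
    for (File dir : usableDirs()) {
      File candidate = new File(dir, relativePath);
      if (candidate.exists()) {
        return candidate;
      }
    }
    return null;
  }
}
{code}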



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4511) Common scheduler changes supporting scheduler-specific implementations

2017-10-25 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218975#comment-16218975
 ] 

Haibo Chen commented on YARN-4511:
--

[~leftnoteasy] Would you like to look at the patch as well while I am working 
on the SLS tests?

> Common scheduler changes supporting scheduler-specific implementations
> --
>
> Key: YARN-4511
> URL: https://issues.apache.org/jira/browse/YARN-4511
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Haibo Chen
> Attachments: YARN-4511-YARN-1011.00.patch, 
> YARN-4511-YARN-1011.01.patch, YARN-4511-YARN-1011.02.patch, 
> YARN-4511-YARN-1011.03.patch, YARN-4511-YARN-1011.04.patch, 
> YARN-4511-YARN-1011.05.patch, YARN-4511-YARN-1011.06.patch, 
> YARN-4511-YARN-1011.07.patch, YARN-4511-YARN-1011.08.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-882) Specify per user quota for private/application cache and user log files

2017-10-25 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218966#comment-16218966
 ] 

Sun Rui commented on YARN-882:
--

+1 for this feature

> Specify per user quota for private/application cache and user log files
> ---
>
> Key: YARN-882
> URL: https://issues.apache.org/jira/browse/YARN-882
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
>
> At present there is no limit on the number or total size of the files 
> localized by a single user. Similarly, there is no limit on the size of the 
> log files created by a user via running containers.
> We need to restrict users in both respects.
> For LocalizedResources, this is a serious concern in a secured environment, 
> where a malicious user can start one container and localize resources whose 
> total size >= DEFAULT_NM_LOCALIZER_CACHE_TARGET_SIZE_MB. Thereafter, 
> localization will either fail (if no extra space is present on disk) or the 
> deletion service will keep removing localized files for other 
> containers/applications.
> The limit for logs/localized resources should be decided by the RM and sent 
> to the NM via the secured containerToken. All these configurations should be 
> per container instead of per user or per NM.
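A hypothetical sketch of the kind of accounting this issue asks for, with assumed names and no relation to actual NM code: track the bytes localized per user and reject a resource once the per-user quota would be exceeded.

{code}
// Hypothetical per-user localization quota tracker (sketch only).
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PerUserLocalizationQuota {
  private final long maxBytesPerUser;
  private final Map<String, Long> usedBytes = new ConcurrentHashMap<>();

  public PerUserLocalizationQuota(long maxBytesPerUser) {
    this.maxBytesPerUser = maxBytesPerUser;
  }

  /** Returns true if the resource fits in the user's quota and records it. */
  public synchronized boolean tryLocalize(String user, long resourceSize) {
    long current = usedBytes.getOrDefault(user, 0L);
    if (current + resourceSize > maxBytesPerUser) {
      return false; // caller would fail the localization request
    }
    usedBytes.put(user, current + resourceSize);
    return true;
  }

  /** Called when localized resources for the user are deleted. */
  public synchronized void release(String user, long resourceSize) {
    usedBytes.merge(user, -resourceSize, Long::sum);
  }
}
{code}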



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7244) ShuffleHandler is not aware of disks that are added

2017-10-25 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218956#comment-16218956
 ] 

Kuhu Shukla commented on YARN-7244:
---

[~jlowe]/[~sunilg], I would appreciate any comments on the latest patch! Thank you.

> ShuffleHandler is not aware of disks that are added
> ---
>
> Key: YARN-7244
> URL: https://issues.apache.org/jira/browse/YARN-7244
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-7244.001.patch, YARN-7244.002.patch, 
> YARN-7244.003.patch, YARN-7244.004.patch, YARN-7244.005.patch, 
> YARN-7244.006.patch, YARN-7244.007.patch, YARN-7244.008.patch, 
> YARN-7244.009.patch, YARN-7244.010.patch, YARN-7244.011.patch, 
> YARN-7244.012.patch, YARN-7244.013.patch
>
>
> The ShuffleHandler permanently remembers the list of "good" disks on NM 
> startup. If disks are later added to the node, map tasks will start using 
> them, but the ShuffleHandler will not be aware of them. The end result is 
> that the data cannot be shuffled from the node, leading to fetch failures 
> and re-runs of the map tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5516) Add REST API for periodicity

2017-10-25 Thread Sean Po (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218925#comment-16218925
 ] 

Sean Po commented on YARN-5516:
---

There is a bug at line 639 of InMemoryPlan where the computed duration can 
be negative. I have added a fix along with a test that catches this case. I 
will upload an updated patch once the Yetus run for the previous patch 
completes, in case that run calls for any further changes.
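
For illustration only, since the actual InMemoryPlan code is not quoted here: the kind of guard that avoids a negative duration when a periodic interval wraps past the period boundary might look like the following sketch (all names are assumptions):

{code}
// Hypothetical illustration of the bug described above: when a periodic
// reservation's end offset wraps past the period boundary, a naive
// (end - start) becomes negative.
public final class PeriodicDuration {
  private PeriodicDuration() { }

  /** Duration of [start, end) on a timeline that repeats every 'period' ms. */
  public static long duration(long start, long end, long period) {
    long d = end - start;
    if (period > 0 && d < 0) {
      d += period; // wrap around instead of returning a negative duration
    }
    if (d < 0) {
      throw new IllegalArgumentException("negative duration: " + d);
    }
    return d;
  }

  public static void main(String[] args) {
    // e.g. a daily period where the interval crosses the period boundary
    long period = 24L * 60 * 60 * 1000;
    System.out.println(duration(23L * 60 * 60 * 1000, 60 * 60 * 1000, period));
  }
}
{code}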

> Add REST API for periodicity
> 
>
> Key: YARN-5516
> URL: https://issues.apache.org/jira/browse/YARN-5516
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sangeetha Abdu Jyothi
>Assignee: Sean Po
> Attachments: YARN-5516.v001.patch, YARN-5516.v002.patch, 
> YARN-5516.v003.patch, YARN-5516.v004.patch
>
>
> YARN-5516 changing REST API of the reservation system to support periodicity. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5516) Add REST API for periodicity

2017-10-25 Thread Sean Po (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Po updated YARN-5516:
--
Attachment: YARN-5516.v004.patch

TestIncreaseAllocationExpirer fails intermittently and is tracked by 
YARN-7378. TestOpportunisticContainerAllocatorAMService also fails 
intermittently and is tracked by YARN-6841.

The checkstyle issues above will be fixed in YARN-5516.v004.patch.

> Add REST API for periodicity
> 
>
> Key: YARN-5516
> URL: https://issues.apache.org/jira/browse/YARN-5516
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sangeetha Abdu Jyothi
>Assignee: Sean Po
> Attachments: YARN-5516.v001.patch, YARN-5516.v002.patch, 
> YARN-5516.v003.patch, YARN-5516.v004.patch
>
>
> YARN-5516 changing REST API of the reservation system to support periodicity. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7326) Add recursion support and configure RegistryDNS to lookup upstream

2017-10-25 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218775#comment-16218775
 ] 

Billie Rinaldi commented on YARN-7326:
--

+1 for patch 007. Committing to yarn-native-services.

> Add recursion support and configure RegistryDNS to lookup upstream
> --
>
> Key: YARN-7326
> URL: https://issues.apache.org/jira/browse/YARN-7326
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Eric Yang
> Attachments: YARN-7326.yarn-native-services.001.patch, 
> YARN-7326.yarn-native-services.002.patch, 
> YARN-7326.yarn-native-services.003.patch, 
> YARN-7326.yarn-native-services.004.patch, 
> YARN-7326.yarn-native-services.005.patch, 
> YARN-7326.yarn-native-services.006.patch, 
> YARN-7326.yarn-native-services.007.patch
>
>
> [~aw] helped to identify these issues: 
> Now some general bad news, not related to this patch:
> Ran a few queries, but this one is a bit concerning:
> {code}
> root@ubuntu:/hadoop/logs# dig @localhost -p 54 .
> ;; Warning: query response not set
> ; <<>> DiG 9.10.3-P4-Ubuntu <<>> @localhost -p 54 .
> ; (2 servers found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOTAUTH, id: 47794
> ;; flags: rd ad; QUERY: 0, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
> ;; WARNING: recursion requested but not available
> ;; Query time: 0 msec
> ;; SERVER: 127.0.0.1#54(127.0.0.1)
> ;; WHEN: Thu Oct 12 16:04:54 PDT 2017
> ;; MSG SIZE  rcvd: 12
> root@ubuntu:/hadoop/logs# dig @localhost -p 54 axfr .
> ;; Connection to ::1#54(::1) for . failed: connection refused.
> ;; communications error to 127.0.0.1#54: end of file
> root@ubuntu:/hadoop/logs# 
> {code}
> It looks like it effectively fails when asked about a root zone, which is bad.
> It's also kind of interesting in what it does and doesn't log. Probably 
> should be configured to rotate logs based on size not date.
> The real showstopper though: RegistryDNS basically eats a core. It is running 
> with 100% cpu utilization with and without jsvc. On my laptop, this is 
> triggering my fan.
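A small diagnostic along the lines of the dig session above can also be written with dnsjava (the library RegistryDNS builds on): query the root zone and check whether the server sets the RA (recursion available) flag. The host and port are assumptions taken from the output above; this is a test sketch, not part of the patch.

{code}
// Diagnostic sketch: ask the server from the dig example whether it
// advertises recursion, using the dnsjava 2.x API.
import java.io.IOException;
import org.xbill.DNS.DClass;
import org.xbill.DNS.Flags;
import org.xbill.DNS.Message;
import org.xbill.DNS.Name;
import org.xbill.DNS.Rcode;
import org.xbill.DNS.Record;
import org.xbill.DNS.SimpleResolver;
import org.xbill.DNS.Type;

public class CheckRecursion {
  public static void main(String[] args) throws IOException {
    SimpleResolver resolver = new SimpleResolver("127.0.0.1");
    resolver.setPort(54);
    // Same query as "dig @localhost -p 54 ." : the root zone.
    Message query = Message.newQuery(
        Record.newRecord(Name.root, Type.NS, DClass.IN));
    Message response = resolver.send(query);
    boolean recursionAvailable = response.getHeader().getFlag(Flags.RA);
    System.out.println("rcode=" + Rcode.string(response.getHeader().getRcode())
        + ", recursion available=" + recursionAvailable);
  }
}
{code}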



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


