[jira] [Commented] (YARN-1197) Add container merge support in YARN

2013-09-13 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13766343#comment-13766343
 ] 

Wangda Tan commented on YARN-1197:
--

I don't know whether it is possible to add this on the RM or NM side. 
I think it would make it easier to move some existing applications (OpenMPI, 
PBS, etc.) onto the YARN platform, because such applications already have their 
own per-node daemons in their original implementations, and container merge 
would let them reuse that logic with fewer modifications when becoming residents of YARN :)
Suggestions and comments are welcome!
--
Thanks,
Wangda

 Add container merge support in YARN
 ---

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan

 Currently, YARN does not support merging several containers on one node into a 
 bigger container. Such a merge would let us ask for resources incrementally, 
 combine them into a larger container, and then launch our processes. The user scenario is:
 Some applications (like OpenMPI) have their own daemon on each node (one per 
 node) in their original implementations, and the user's processes are launched 
 directly by the local daemon (similar to the task-tracker in MRv1, but 
 per-application). Many functionalities depend on the pipes created when a 
 process is forked by its parent, such as IO-forwarding and process monitoring 
 (the daemon does more than what the NM does for us), and losing them may cause 
 scalability issues.
 A very common resource request in the MPI world is: give me 100G of memory in 
 the cluster, and I will launch 100 processes within that resource. In current 
 YARN, we have the following two choices to make this happen:
 1) Send 1G allocation requests iteratively until we have 100G in total, then 
 ask the NMs to launch the 100 MPI processes. This causes the problems mentioned 
 above: no IO-forwarding support, no process monitoring, etc.
 2) Send a larger resource request, like 10G. But we may encounter the following 
 problems:
 2.1 Such a large resource request is hard to satisfy at one time.
 2.2 We cannot use more resources on a node than the amount we specified (we can 
 only launch one daemon per node).
 2.3 It is hard to decide how much resource to ask for.
 So my proposal is:
 1) Incrementally send resource requests with small resources as before, until 
 we get enough resources in total.
 2) Merge the resources on the same node, so that there is only one big 
 container per node.
 3) Launch a daemon in each node's merged container; the daemon will spawn its 
 local processes and manage them.
 For example, we need to run 10 processes with 1G each, and we finally get:
 containers 1, 2, 3, 4, 5 on node1,
 containers 6, 7, 8 on node2,
 containers 9, 10 on node3.
 Then we will:
 merge [1, 2, 3, 4, 5] into container_11 with 5G, launch a daemon, and the 
 daemon will launch 5 processes;
 merge [6, 7, 8] into container_12 with 3G, launch a daemon, and the daemon will 
 launch 3 processes;
 merge [9, 10] into container_13 with 2G, launch a daemon, and the daemon will 
 launch 2 processes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-1197) Add container merge support in YARN

2013-09-13 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-1197:


 Summary: Add container merge support in YARN
 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan


Currently, YARN does not support merging several containers on one node into a 
bigger container. Such a merge would let us ask for resources incrementally, 
combine them into a larger container, and then launch our processes. The user scenario is:

Some applications (like OpenMPI) have their own daemon on each node (one per 
node) in their original implementations, and the user's processes are launched 
directly by the local daemon (similar to the task-tracker in MRv1, but 
per-application). Many functionalities depend on the pipes created when a 
process is forked by its parent, such as IO-forwarding and process monitoring 
(the daemon does more than what the NM does for us), and losing them may cause 
scalability issues.

A very common resource request in the MPI world is: give me 100G of memory in 
the cluster, and I will launch 100 processes within that resource. In current 
YARN, we have the following two choices to make this happen:
1) Send 1G allocation requests iteratively until we have 100G in total, then 
ask the NMs to launch the 100 MPI processes. This causes the problems mentioned 
above: no IO-forwarding support, no process monitoring, etc.
2) Send a larger resource request, like 10G. But we may encounter the following 
problems:
   2.1 Such a large resource request is hard to satisfy at one time.
   2.2 We cannot use more resources on a node than the amount we specified (we 
can only launch one daemon per node).
   2.3 It is hard to decide how much resource to ask for.

So my proposal is:
1) Incrementally send resource requests with small resources as before, until 
we get enough resources in total.
2) Merge the resources on the same node, so that there is only one big 
container per node.
3) Launch a daemon in each node's merged container; the daemon will spawn its 
local processes and manage them.

For example, we need to run 10 processes with 1G each, and we finally get:
containers 1, 2, 3, 4, 5 on node1,
containers 6, 7, 8 on node2,
containers 9, 10 on node3.
Then we will:
merge [1, 2, 3, 4, 5] into container_11 with 5G, launch a daemon, and the 
daemon will launch 5 processes;
merge [6, 7, 8] into container_12 with 3G, launch a daemon, and the daemon will 
launch 3 processes;
merge [9, 10] into container_13 with 2G, launch a daemon, and the daemon will 
launch 2 processes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1197) Add container merge support in YARN

2013-09-13 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767242#comment-13767242
 ] 

Wangda Tan commented on YARN-1197:
--

Hi Bikas,
Thanks for the reply, it helps me understand the YARN mechanism, but I think 
there are some misunderstandings.

In some HPC cases, how many processes will be launched on each node is not 
determined before we submit the job; we just give the job enough total resource 
(like 100G) in the cluster. So we have the following problems:
1) We launch exactly one daemon process on each node, and this daemon process 
launches the other local processes. This is the root cause of why we want this 
feature.
2) We don't know how much resource to request in this case:
   # Large requests may waste resources, and they are hard to get from the RM.
   # Small requests may not be enough: when the cluster is busy, we cannot 
change our mind once we hold a small allocation on a node. We can only return 
it and ask for a larger one, but once we return it, the space may be occupied 
by another app and we cannot take it back.

With such an API, we can implement our AM more easily: we can iteratively send 
requests to the RM based on what we already have, and finally merge the 
allocations into big containers and hand them to the real app (like 
PBS/TORQUE/MPI). We can effectively build a small cluster inside YARN and 
support HPC workloads very well. (It is a little similar to Mesos: aggregate 
resources under a slave daemon, and let the slave daemon manage those 
resources. But we don't need to make it dynamic, i.e. increase a container's 
size while it is running; merging before we start the processes is good enough.) :)
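
To make the flow above concrete, here is a rough AM-side sketch, assuming a 
merge API existed; the grouping logic is plain Java, while the final 
merge-and-launch step is hypothetical and only stands in for the API proposed 
in this JIRA.
{code}
// Illustrative only: group allocated 1G containers by node and "merge" each
// group into one big per-node container. The merge-and-launch step is
// hypothetical -- it stands in for the API proposed in this JIRA.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MergePlanSketch {
  static class Alloc {              // simplified stand-in for a YARN Container
    final String containerId;
    final String nodeId;
    final int memoryMb;
    Alloc(String id, String node, int mb) { containerId = id; nodeId = node; memoryMb = mb; }
  }

  // Group small allocations by node so each node ends up with one big container.
  static Map<String, List<Alloc>> groupByNode(List<Alloc> allocated) {
    Map<String, List<Alloc>> byNode = new HashMap<>();
    for (Alloc a : allocated) {
      byNode.computeIfAbsent(a.nodeId, k -> new ArrayList<>()).add(a);
    }
    return byNode;
  }

  public static void main(String[] args) {
    List<Alloc> allocated = List.of(
        new Alloc("c1", "node1", 1024), new Alloc("c2", "node1", 1024),
        new Alloc("c3", "node2", 1024));
    for (Map.Entry<String, List<Alloc>> e : groupByNode(allocated).entrySet()) {
      int totalMb = e.getValue().stream().mapToInt(a -> a.memoryMb).sum();
      // Hypothetical merge call: ask YARN to fuse these containers into one,
      // then launch a single per-node daemon inside the merged container.
      System.out.println("merge " + e.getValue().size() + " containers on "
          + e.getKey() + " into one " + totalMb + "MB container and launch one daemon");
    }
  }
}
{code}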

 Add container merge support in YARN
 ---

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan

 Currently, YARN does not support merging several containers on one node into a 
 bigger container. Such a merge would let us ask for resources incrementally, 
 combine them into a larger container, and then launch our processes. The user 
 scenario is described in the comments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1197) Support increasing resources of an allocated container

2013-09-17 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13769256#comment-13769256
 ] 

Wangda Tan commented on YARN-1197:
--

{quote}
Increasing resources for a container while in acquired state is not different 
from waiting for some more time on the RM and allocating the larger container 
in the first attempt, right?
{quote}
I think there is a little difference here. When we wait for resources for a big 
container in the first attempt, the scheduler puts the request into 
reservedContainer on the FSSchedulableNode or FiCaSchedulerNode. This is 
treated as an exception: the RM tries to satisfy such a reserved container 
first when many different requests exist at the same time on the same node.
But if we ask for more resources for an acquired container, I don't know what 
you prefer: do you want to create another exception that puts an acquired 
container onto the *SchedulableNode so it gets processed with priority, or 
simply treat the request as a normal resource request?

{quote}
Also, the RM starts a timer for each acquired container and expects the 
container to be launched on the NM before the timer expires. So we dont have 
too much time for the container to be launched and thus we cannot wait for 
increasing the resources.
{quote}
Could we refresh (receivePing) the timer for a container when we successfully 
increase its resources?
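
As a minimal illustration of that timer-refresh idea, assuming the RM keeps a 
per-container expiry deadline; the class and method names below are invented 
for illustration and are not the actual RM code.
{code}
// Invented names, for illustration only: if the RM tracks an expiry deadline
// per acquired container, a granted resource increase could be treated like a
// ping that restarts the countdown instead of letting the container expire.
import java.util.concurrent.ConcurrentHashMap;

public class AcquiredContainerExpirySketch {
  private final long expiryIntervalMs;
  private final ConcurrentHashMap<String, Long> deadlines = new ConcurrentHashMap<>();

  public AcquiredContainerExpirySketch(long expiryIntervalMs) {
    this.expiryIntervalMs = expiryIntervalMs;
  }

  public void onContainerAcquired(String containerId) {
    deadlines.put(containerId, System.currentTimeMillis() + expiryIntervalMs);
  }

  // Refresh the timer when a resource-increase request is granted.
  public void onResourceIncreaseGranted(String containerId) {
    deadlines.computeIfPresent(containerId,
        (id, old) -> System.currentTimeMillis() + expiryIntervalMs);
  }

  public boolean isExpired(String containerId) {
    Long deadline = deadlines.get(containerId);
    return deadline != null && System.currentTimeMillis() > deadline;
  }
}
{code}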

{quote}
To be useful, we have to be able to increase the resources of a running 
container. I agree that its a significant change. So making the change will 
need a more thorough investigation and clear design proposal.
{quote}
Agreed! I'd like to help move this forward. I need to investigate, consider the 
end-to-end cases, and draft a design proposal for it; once I have some ideas or 
questions, I will let you know :)

Thanks

 Support increasing resources of an allocated container
 --

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan

 Currently, YARN does not support merging several containers on one node into a 
 bigger container. Such a merge would let us ask for resources incrementally, 
 combine them into a larger container, and then launch our processes. The user 
 scenario is described in the comments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2013-09-17 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13770380#comment-13770380
 ] 

Wangda Tan commented on YARN-1197:
--

I totally agree with you. I'll work out a plan that considers increasing and 
decreasing an existing container (allocated or running) with RM-AM-NM 
communication, and will keep you posted. Thanks.

 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan

 Currently, YARN does not support merging several containers on one node into a 
 bigger container. Such a merge would let us ask for resources incrementally, 
 combine them into a larger container, and then launch our processes. The user 
 scenario is described in the comments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1197) Support changing resources of an allocated container

2013-09-22 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-1197:
-

Attachment: yarn-1197.pdf

Added an initial proposal covering increasing/decreasing an acquired or running 
container; I hope somebody can help me review it. Then we can move forward, 
break down the tasks, and start working on it. Thanks.

 Support changing resources of an allocated container
 

 Key: YARN-1197
 URL: https://issues.apache.org/jira/browse/YARN-1197
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, nodemanager, resourcemanager
Affects Versions: 2.1.0-beta
Reporter: Wangda Tan
 Attachments: yarn-1197.pdf


 Currently, YARN does not support merging several containers on one node into a 
 bigger container. Such a merge would let us ask for resources incrementally, 
 combine them into a larger container, and then launch our processes. The user 
 scenario is described in the comments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-2297) Preemption can hang in corner case by not allowing any task container to proceed.

2014-07-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063156#comment-14063156
 ] 

Wangda Tan commented on YARN-2297:
--

Hi [~chris.douglas],
Thanks for jumping in.
bq. Does this occur when the absolute guaranteed capacity of a queue is smaller 
than the minimum container size?
This can happen when
(used_capacity_of_a_queue + newly_allocated_container_resource > 
guaranteed_resource_of_a_queue) && (used_capacity_of_a_queue < 
guaranteed_resource_of_a_queue),
So I propose to change
{code}
while (toBePreempt > 0):
  foreach application:
    foreach container:
      if (toBePreempt > 0):
        do preemption
{code}
To
{code}
while (toBePreempt > 0):
  foreach application:
    foreach container:
      if (toBePreempt > 0) and (container.resource < toBePreempt * 2):
        do preemption
{code}
This makes sure a container is not preempted too aggressively.
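As a rough numeric illustration of the guard (not actual scheduler code; the 
helper below just restates the pseudo code above):
{code}
// Rough numeric illustration only (not scheduler code): with toBePreempt =
// 512 MB, a 4096 MB container fails the (container.resource < toBePreempt * 2)
// check and is skipped, while a 700 MB container passes and gets preempted,
// so we never take back many times more resource than the needy queue asked for.
public class PreemptGuardExample {
  static boolean shouldPreempt(long toBePreemptMb, long containerMb) {
    return toBePreemptMb > 0 && containerMb < toBePreemptMb * 2;
  }

  public static void main(String[] args) {
    System.out.println(shouldPreempt(512, 4096)); // false: far larger than needed
    System.out.println(shouldPreempt(512, 700));  // true: close to what is needed
  }
}
{code}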
Does this answer your question?

Thanks,
Wangda

 Preemption can hang in corner case by not allowing any task container to 
 proceed.
 -

 Key: YARN-2297
 URL: https://issues.apache.org/jira/browse/YARN-2297
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.5.0
Reporter: Tassapol Athiapinya
Assignee: Wangda Tan
Priority: Critical

 Preemption can cause a hang in a single-node cluster: only AMs run, and no 
 task container can run.
 h3. queue configuration
 Queues A and B have 1% and 99% capacity respectively.
 No max capacity.
 h3. scenario
 Turn on preemption. Configure 1 NM with 4 GB of memory. Use only 2 apps and 1 user.
 Submit app 1 to queue A. The AM needs 2 GB and there is 1 task that needs 2 GB, 
 occupying the entire cluster.
 Submit app 2 to queue B. The AM needs 2 GB and there are 3 tasks that need 2 GB each.
 Instead of app 1 being preempted entirely, app 1's AM stays and app 2's AM launches.
 No task of either app can proceed.
 h3. commands
 /usr/lib/hadoop/bin/hadoop jar 
 /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar randomtextwriter 
 -Dmapreduce.map.memory.mb=2000 
 -Dyarn.app.mapreduce.am.command-opts=-Xmx1800M 
 -Dmapreduce.randomtextwriter.bytespermap=2147483648 
 -Dmapreduce.job.queuename=A -Dmapreduce.map.maxattempts=100 
 -Dmapreduce.am.max-attempts=1 -Dyarn.app.mapreduce.am.resource.mb=2000 
 -Dmapreduce.map.java.opts=-Xmx1800M 
 -Dmapreduce.randomtextwriter.mapsperhost=1 
 -Dmapreduce.randomtextwriter.totalbytes=2147483648 dir1
 /usr/lib/hadoop/bin/hadoop jar 
 /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar sleep 
 -Dmapreduce.map.memory.mb=2000 
 -Dyarn.app.mapreduce.am.command-opts=-Xmx1800M 
 -Dmapreduce.job.queuename=B -Dmapreduce.map.maxattempts=100 
 -Dmapreduce.am.max-attempts=1 -Dyarn.app.mapreduce.am.resource.mb=2000 
 -Dmapreduce.map.java.opts=-Xmx1800M -m 1 -r 0 -mt 4000  -rt 0



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2297) Preemption can hang in corner case by not allowing any task container to proceed.

2014-07-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063249#comment-14063249
 ] 

Wangda Tan commented on YARN-2297:
--

Hi [~chris.douglas],
Thanks for your reply. I think the dead zone is a really good idea for solving 
the jitter problem.
Wangda

 Preemption can hang in corner case by not allowing any task container to 
 proceed.
 -

 Key: YARN-2297
 URL: https://issues.apache.org/jira/browse/YARN-2297
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.5.0
Reporter: Tassapol Athiapinya
Assignee: Wangda Tan
Priority: Critical

 Preemption can cause a hang in a single-node cluster: only AMs run, and no 
 task container can run.
 h3. queue configuration
 Queues A and B have 1% and 99% capacity respectively.
 No max capacity.
 h3. scenario
 Turn on preemption. Configure 1 NM with 4 GB of memory. Use only 2 apps and 1 user.
 Submit app 1 to queue A. The AM needs 2 GB and there is 1 task that needs 2 GB, 
 occupying the entire cluster.
 Submit app 2 to queue B. The AM needs 2 GB and there are 3 tasks that need 2 GB each.
 Instead of app 1 being preempted entirely, app 1's AM stays and app 2's AM launches.
 No task of either app can proceed.
 h3. commands
 /usr/lib/hadoop/bin/hadoop jar 
 /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar randomtextwriter 
 -Dmapreduce.map.memory.mb=2000 
 -Dyarn.app.mapreduce.am.command-opts=-Xmx1800M 
 -Dmapreduce.randomtextwriter.bytespermap=2147483648 
 -Dmapreduce.job.queuename=A -Dmapreduce.map.maxattempts=100 
 -Dmapreduce.am.max-attempts=1 -Dyarn.app.mapreduce.am.resource.mb=2000 
 -Dmapreduce.map.java.opts=-Xmx1800M 
 -Dmapreduce.randomtextwriter.mapsperhost=1 
 -Dmapreduce.randomtextwriter.totalbytes=2147483648 dir1
 /usr/lib/hadoop/bin/hadoop jar 
 /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar sleep 
 -Dmapreduce.map.memory.mb=2000 
 -Dyarn.app.mapreduce.am.command-opts=-Xmx1800M 
 -Dmapreduce.job.queuename=B -Dmapreduce.map.maxattempts=100 
 -Dmapreduce.am.max-attempts=1 -Dyarn.app.mapreduce.am.resource.mb=2000 
 -Dmapreduce.map.java.opts=-Xmx1800M -m 1 -r 0 -mt 4000  -rt 0



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2297) Preemption can hang in corner case by not allowing any task container to proceed.

2014-07-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063255#comment-14063255
 ] 

Wangda Tan commented on YARN-2297:
--

Hi [~sunilg],
Thanks for sharing your thoughts here!
For your first point, I think it is better solved as Chris suggested, using the 
dead zone parameter 
yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity.

For your 2nd point,
{code}
I feel now we will take a percentage here to find which queue is under utilized 
more based on its used vs guaranteed_capacity ?
{code}
I think if we use ratio(used, guaranteed), there is a problem: assuming qA has 
100MB configured and has used 10MB, while qB has 2GB configured and has used 
500MB, can we really say we should allocate resources to qA instead of qB?
We have some other options here (illustrated numerically below):
1. Use (guaranteed - used)
2. Use a combined function like sigmoid(ratio(used, guaranteed)) * (guaranteed 
- used)
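As a rough numeric illustration of these options for the qA/qB example (not 
scheduler code; the sigmoid weighting is just one possible reading of option 2):
{code}
// Rough illustration of the qA/qB example (not scheduler code):
// qA: guaranteed 100 MB, used 10 MB; qB: guaranteed 2048 MB, used 500 MB.
// Pure ratio(used, guaranteed) ranks qA as more under-served, while the
// absolute gap (guaranteed - used) and a sigmoid-weighted combination rank qB first.
public class UnderUtilizationMetrics {
  static double ratio(double used, double guaranteed) { return used / guaranteed; }
  static double gap(double used, double guaranteed) { return guaranteed - used; }
  // One possible reading of option 2: weight the gap by how lightly used the queue is.
  static double combined(double used, double guaranteed) {
    double weight = 1.0 / (1.0 + Math.exp(ratio(used, guaranteed)));
    return weight * gap(used, guaranteed);
  }

  public static void main(String[] args) {
    System.out.printf("qA ratio=%.2f gap=%.0f combined=%.0f%n",
        ratio(10, 100), gap(10, 100), combined(10, 100));
    System.out.printf("qB ratio=%.2f gap=%.0f combined=%.0f%n",
        ratio(500, 2048), gap(500, 2048), combined(500, 2048));
  }
}
{code}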
Do you have any ideas here?

Thanks,
Wangda

 Preemption can hang in corner case by not allowing any task container to 
 proceed.
 -

 Key: YARN-2297
 URL: https://issues.apache.org/jira/browse/YARN-2297
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.5.0
Reporter: Tassapol Athiapinya
Assignee: Wangda Tan
Priority: Critical

 Preemption can cause hang issue in single-node cluster. Only AMs run. No task 
 container can run.
 h3. queue configuration
 Queue A/B has 1% and 99% respectively. 
 No max capacity.
 h3. scenario
 Turn on preemption. Configure 1 NM with 4 GB of memory. Use only 2 apps. Use 
 1 user.
 Submit app 1 to queue A. AM needs 2 GB. There is 1 task that needs 2 GB. 
 Occupy entire cluster.
 Submit app 2 to queue B. AM needs 2 GB. There are 3 tasks that need 2 GB each.
 Instead of entire app 1 preempted, app 1 AM will stay. App 2 AM will launch. 
 No task of either app can proceed. 
 h3. commands
 /usr/lib/hadoop/bin/hadoop jar 
 /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar randomtextwriter 
 -Dmapreduce.map.memory.mb=2000 
 -Dyarn.app.mapreduce.am.command-opts=-Xmx1800M 
 -Dmapreduce.randomtextwriter.bytespermap=2147483648 
 -Dmapreduce.job.queuename=A -Dmapreduce.map.maxattempts=100 
 -Dmapreduce.am.max-attempts=1 -Dyarn.app.mapreduce.am.resource.mb=2000 
 -Dmapreduce.map.java.opts=-Xmx1800M 
 -Dmapreduce.randomtextwriter.mapsperhost=1 
 -Dmapreduce.randomtextwriter.totalbytes=2147483648 dir1
 /usr/lib/hadoop/bin/hadoop jar 
 /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar sleep 
 -Dmapreduce.map.memory.mb=2000 
 -Dyarn.app.mapreduce.am.command-opts=-Xmx1800M 
 -Dmapreduce.job.queuename=B -Dmapreduce.map.maxattempts=100 
 -Dmapreduce.am.max-attempts=1 -Dyarn.app.mapreduce.am.resource.mb=2000 
 -Dmapreduce.map.java.opts=-Xmx1800M -m 1 -r 0 -mt 4000  -rt 0



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-07-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063316#comment-14063316
 ] 

Wangda Tan commented on YARN-796:
-

Hi [~john.jian.fang],
Thanks for providing use cases.
bq. Why do users have to choose either decentralized or centralized label 
configuration?
This is because of cases where a user may want to remove some static labels via 
the dynamic API, but on the next RM restart the static labels would be loaded 
again. It would be hard to manage static and dynamic labels together; we would 
need to handle conflicts, etc.
bq. To me, the restful API could be more useful than the Admin UI.
I think both of them are very important in normal cases. The RESTful API can be 
used by other management frameworks, and the Admin UI can be used directly by 
admins to tag nodes.

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-07-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063324#comment-14063324
 ] 

Wangda Tan commented on YARN-796:
-

Hi [~sunilg],
Thanks for the reply.

bq. 1. In our use case scenarios, we are more likely to have OR and NOT. I feel 
combination of these labels need to be in a defined or restricted way. Result 
of some combinations (AND, OR and NOT) may come invalid, and some may need to 
be reduced. This complexity need not have to bring to RM to take a final 
decision. 
Agreed that we need some restricted way; we need to think harder about this :)
bq. 2. Reservation: If a node label has many nodes under it, then there is a 
chance of reservation. Valid candidates may come later, so solution can be look 
in to this aspect also. Node Label level reservations ?
I haven't thought about this before; I'll think about it. Thanks for the reminder.
bq. 3. Centralized Configuration: If a new node is added to cluster, may be it 
can be started by having a label configuration in its yarn-site.xml. This may 
be fine I feel. your thoughts?
I think what you describe is more like a decentralized configuration. For 
centralized configuration, I think there could be a node label repository that 
stores the mapping from nodes to labels, and we would provide a RESTful API for 
changing it.

Thanks,
Wangda


 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2257) Add user to queue mappings to automatically place users' apps into specific queues

2014-07-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063721#comment-14063721
 ] 

Wangda Tan commented on YARN-2257:
--

Hi [~sandyr],
Thanks for pointing me to this.
I have a question here: what is the expected behavior when an admin wants to 
add a new QueuePlacementRule? I guess a new class needs to be added to the 
Hadoop project, and Hadoop needs to be rebuilt, right?
I think that is a little overkill here; users may want convenience instead of 
flexibility. If you think the rules I mentioned are not flexible enough, maybe 
we can extend them to rules with patterns, like %user->root.users.%user, which 
means putting applications from %user into root.users.%user. That may be easier 
for admins than adding a new QueuePlacementRule.
I agree it's a good fit for YARN in general, but we should make it easier to 
use.
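
Purely as an illustration of the suggested pattern syntax, here is a sketch of 
the substitution idea; this is not the FairScheduler QueuePlacementRule API.
{code}
// Illustration only: the kind of simple pattern rule suggested above
// ("%user" -> "root.users.%user"). Not the FairScheduler QueuePlacementRule
// API, just a sketch of the substitution idea.
public class UserQueuePattern {
  private final String pattern;            // e.g. "root.users.%user"

  public UserQueuePattern(String pattern) { this.pattern = pattern; }

  public String resolveQueue(String user) {
    return pattern.replace("%user", user);
  }

  public static void main(String[] args) {
    UserQueuePattern rule = new UserQueuePattern("root.users.%user");
    System.out.println(rule.resolveQueue("alice"));  // prints root.users.alice
  }
}
{code}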

Please feel free to let me know your comments, thanks.
Wangda

 Add user to queue mappings to automatically place users' apps into specific 
 queues
 --

 Key: YARN-2257
 URL: https://issues.apache.org/jira/browse/YARN-2257
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Patrick Liu
Assignee: Vinod Kumar Vavilapalli
  Labels: features

 Currently, the fair-scheduler supports two modes, default queue or individual 
 queue for each user.
 Apparently, the default queue is not a good option, because the resources 
 cannot be managed for each user or group.
 However, individual queue for each user is not good enough. Especially when 
 connecting yarn with hive. There will be increasing hive users in a corporate 
 environment. If we create a queue for a user, the resource management will be 
 hard to maintain.
 I think the problem can be solved like this:
 1. Define user-queue mapping in Fair-Scheduler.xml. Inside each queue, use 
 aclSubmitApps to control user's ability.
 2. Each time a user submits an app to YARN, if the user has been mapped to a 
 queue, the app will be scheduled to that queue; otherwise, the app will be 
 submitted to the default queue.
 3. If the user cannot pass the aclSubmitApps limits, the app will not be accepted.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2285) Preemption can cause capacity scheduler to show 5,000% queue capacity.

2014-07-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063725#comment-14063725
 ] 

Wangda Tan commented on YARN-2285:
--

Thanks for the comments from Vinod and Sunil.

bq. From the look of it, it sounds like this isn't tied to preemption. It looks 
like this was a bug that exists even when preemption is not enabled. Can we 
validate that?
I'll validate this tomorrow

The root queue usage above 100% is caused by reserved containers; currently the 
UI shows allocated+reserved for the queue. We may need to change that so it is 
easier for users to understand what happened.

 Preemption can cause capacity scheduler to show 5,000% queue capacity.
 --

 Key: YARN-2285
 URL: https://issues.apache.org/jira/browse/YARN-2285
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.5.0
 Environment: Turn on CS Preemption.
Reporter: Tassapol Athiapinya
Assignee: Wangda Tan
Priority: Minor
 Attachments: preemption_5000_percent.png


 I configure queue A, B to have 1%, 99% capacity respectively. There is no max 
 capacity for each queue. Set high user limit factor.
 Submit app 1 to queue A. AM container takes 50% of cluster memory. Task 
 containers take another 50%. Submit app 2 to queue B. Preempt task containers 
 of app 1 out. Turns out capacity of queue B increases to 99% but queue A has 
 5000% used.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2014-07-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064483#comment-14064483
 ] 

Wangda Tan commented on YARN-415:
-

Hi [~eepayne],
bq. Since every RMAppAttemptImpl object has a reference to an 
RMAppAttemptMetrics object, you are suggesting that I move the resource usage 
stats to RMAppAttemptMetrics.
Yes
bq. Also, when reporting on resource usage, use the reporting methods from 
RMAppAttempt and RMApp.
I'm not quite sure what the reporting methods are; they should be 
getRMAppAttemptMetrics on the attempt and getRMAppMetrics on the app.
bq. You're suggestion is to keep resource usage stats only for running 
containers.
Yes
bq. For completed containers, you are suggesting that the calculation be done 
for final resource usage stats within the RMContainerImpl#FinishTransition 
method and have that send the resource stats as a payload within the 
RMAppAttemptC ... 
No. You can update to the current trunk code and check 
RMContainerImpl#FinishedTransition#updateMetricsIfPreempted; you can change 
updateMetricsIfPreempted to something like updateAttemptMetrics, and create a 
new method in RMAppAttemptMetrics, like updateResourceUtilization. The benefit 
of doing this is that you don't need to send a payload to RMAppAttempt; all the 
information you need should already exist in the RMContainer.
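
A heavily simplified sketch of that suggestion, using hypothetical stand-in 
types rather than the real RMContainerImpl/RMAppAttemptMetrics code:
{code}
// Hypothetical stand-in types, heavily simplified: on container finish, compute
// usage from the container itself and push it into the attempt's metrics, so
// no extra payload has to travel with an RMAppAttempt event.
public class FinishTransitionSketch {
  static class AttemptMetricsSketch {            // stand-in for RMAppAttemptMetrics
    private long memorySeconds;
    private long vcoreSeconds;
    synchronized void updateResourceUtilization(long memSec, long vcoreSec) {
      memorySeconds += memSec;
      vcoreSeconds += vcoreSec;
    }
  }

  static class ContainerSketch {                 // stand-in for RMContainerImpl
    long allocatedMb, allocatedVcores, startTimeMs, finishTimeMs;
    AttemptMetricsSketch attemptMetrics;

    // Analogue of renaming updateMetricsIfPreempted to updateAttemptMetrics:
    // everything needed is already available on the container.
    void onFinished() {
      long seconds = (finishTimeMs - startTimeMs) / 1000;
      attemptMetrics.updateResourceUtilization(
          allocatedMb * seconds, allocatedVcores * seconds);
    }
  }
}
{code}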

Does that make sense to you? Please feel free to let me know if you have any 
questions.

Thanks,
Wangda

 Capture memory utilization at the app-level for chargeback
 --

 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp
Assignee: Andrey Klochkov
 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
 YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
 YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
 YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
 YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
 YARN-415.201407071542.txt, YARN-415.patch


 For the purpose of chargeback, I'd like to be able to compute the cost of an
 application in terms of cluster resource usage.  To start out, I'd like to 
 get the memory utilization of an application.  The unit should be MB-seconds 
 or something similar and, from a chargeback perspective, the memory amount 
 should be the memory reserved for the application, as even if the app didn't 
 use all that memory, no one else was able to use it.
 (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
 container 2 * lifetime of container 2) + ... + (reserved ram for container n 
 * lifetime of container n)
 It'd be nice to have this at the app level instead of the job level because:
 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
 appear on the job history server).
 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
 This new metric should be available both through the RM UI and RM Web 
 Services REST API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2285) Preemption can cause capacity scheduler to show 5,000% queue capacity.

2014-07-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064492#comment-14064492
 ] 

Wangda Tan commented on YARN-2285:
--

I've verified that this still happens even when preemption is not enabled, for 
both the 5000% queue usage and the above-100% root queue usage.

 Preemption can cause capacity scheduler to show 5,000% queue capacity.
 --

 Key: YARN-2285
 URL: https://issues.apache.org/jira/browse/YARN-2285
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.5.0
 Environment: Turn on CS Preemption.
Reporter: Tassapol Athiapinya
Assignee: Wangda Tan
Priority: Minor
 Attachments: preemption_5000_percent.png


 I configure queue A, B to have 1%, 99% capacity respectively. There is no max 
 capacity for each queue. Set high user limit factor.
 Submit app 1 to queue A. AM container takes 50% of cluster memory. Task 
 containers take another 50%. Submit app 2 to queue B. Preempt task containers 
 of app 1 out. Turns out capacity of queue B increases to 99% but queue A has 
 5000% used.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2257) Add user to queue mappings to automatically place users' apps into specific queues

2014-07-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064512#comment-14064512
 ] 

Wangda Tan commented on YARN-2257:
--

Hi [~sandyr],
I agree we should have an existing library for queue rules. But I feel like 
we'd better add simple pattern match mechanism like %user-root.users.%user I 
mentioned before. Which will take reasonable effort but can cover more cases, 
do you agree with that?

Thanks,

 Add user to queue mappings to automatically place users' apps into specific 
 queues
 --

 Key: YARN-2257
 URL: https://issues.apache.org/jira/browse/YARN-2257
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Patrick Liu
Assignee: Vinod Kumar Vavilapalli
  Labels: features

 Currently, the fair-scheduler supports two modes, default queue or individual 
 queue for each user.
 Apparently, the default queue is not a good option, because the resources 
 cannot be managed for each user or group.
 However, individual queue for each user is not good enough. Especially when 
 connecting yarn with hive. There will be increasing hive users in a corporate 
 environment. If we create a queue for a user, the resource management will be 
 hard to maintain.
 I think the problem can be solved like this:
 1. Define user-queue mapping in Fair-Scheduler.xml. Inside each queue, use 
 aclSubmitApps to control user's ability.
 2. Each time a user submits an app to YARN, if the user has been mapped to a 
 queue, the app will be scheduled to that queue; otherwise, the app will be 
 submitted to the default queue.
 3. If the user cannot pass the aclSubmitApps limits, the app will not be accepted.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.

2014-07-17 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064749#comment-14064749
 ] 

Wangda Tan commented on YARN-2305:
--

Hi [~sunilg],
Thanks for taking this issue,
I think there are two issues in your screenshot:
1) Root queue usage above 100%
It is possible for a queue's used resource to be larger than its guaranteed 
resource because of container reservation. We may need to show reserved and 
used resources separately in the web UI. I encountered a similar problem in 
YARN-2285 too.

2) The total cluster memory shown on the web UI is different from 
CapacityScheduler.clusterResource
This seems like a new issue to me; the memory shown on the web UI is 
usedMemory+availableMemory of the root queue. I feel 
CSQueueUtils.updateQueueStatistics has some issues when we reserve a container 
in a LeafQueue. I hope to get more thoughts from your side.

Thanks,
Wangda

 When a container is in reserved state then total cluster memory is displayed 
 wrongly.
 -

 Key: YARN-2305
 URL: https://issues.apache.org/jira/browse/YARN-2305
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: J.Andreina
Assignee: Sunil G
 Attachments: Capture.jpg


 ENV Details:
 =  
  3 queues  :  a(50%),b(25%),c(25%) --- All max utilization is set to 
 100
  2 Node cluster with total memory as 16GB
 TestSteps:
 =
   Execute following 3 jobs with different memory configurations for 
 Map , reducer and AM task
   ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a 
 -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 
 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 
 /dir8 /preempt_85 (application_1405414066690_0023)
  ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b 
 -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 
 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 
 /dir2 /preempt_86 (application_1405414066690_0025)
  
  ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c 
 -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 
 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 
 /dir2 /preempt_62
 Issue
 =
   When 2GB of memory is in the reserved state, total memory is shown as 15GB 
 and used as 15GB (while the actual total memory is 16GB).
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed

2014-07-17 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2308:


 Summary: NPE happened when RM restart after CapacityScheduler 
queue configuration changed 
 Key: YARN-2308
 URL: https://issues.apache.org/jira/browse/YARN-2308
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.6.0
Reporter: Wangda Tan


I encountered an NPE when the RM restarted:
{code}
2014-07-16 07:22:46,957 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type APP_ATTEMPT_ADDED to the scheduler
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:744)
{code}
And the RM fails to restart.

This is caused by a queue configuration change: I removed some queues and added 
new ones. When the RM restarts, it tries to recover historical applications, 
and when the queue of any of these applications has been removed, an NPE is raised.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2014-07-17 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065910#comment-14065910
 ] 

Wangda Tan commented on YARN-415:
-

Hi [~eepayne],
Thanks for updating your patch. The failed test case should be unrelated to 
your changes; it is tracked by YARN-2270.
Reviewing...

Thanks,
Wangda

 Capture memory utilization at the app-level for chargeback
 --

 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp
Assignee: Andrey Klochkov
 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
 YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
 YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
 YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
 YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
 YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
 YARN-415.201407172144.txt, YARN-415.patch


 For the purpose of chargeback, I'd like to be able to compute the cost of an
 application in terms of cluster resource usage.  To start out, I'd like to 
 get the memory utilization of an application.  The unit should be MB-seconds 
 or something similar and, from a chargeback perspective, the memory amount 
 should be the memory reserved for the application, as even if the app didn't 
 use all that memory, no one else was able to use it.
 (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
 container 2 * lifetime of container 2) + ... + (reserved ram for container n 
 * lifetime of container n)
 It'd be nice to have this at the app level instead of the job level because:
 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
 appear on the job history server).
 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
 This new metric should be available both through the RM UI and RM Web 
 Services REST API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2014-07-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066145#comment-14066145
 ] 

Wangda Tan commented on YARN-415:
-

Hi [~eepayne],
I've spent some time reviewing and thinking about this JIRA. I have a few comments:

1. Revert the changes to SchedulerAppReport; we have already changed 
ApplicationResourceUsageReport, and memory utilization should be part of the 
resource usage report.

2. Remove getMemory(VCore)Seconds from RMAppAttempt, modify 
RMAppAttemptMetrics#getFinishedMemory(VCore)Seconds to return completed+running 
resource utilization.

3. Move
{code}
 ._("Resources:",
    String.format("%d MB-seconds, %d vcore-seconds",
        app.getMemorySeconds(), app.getVcoreSeconds()))
{code}
from the "Application Overview" section to "Application Metrics", and rename it 
to "Resource Seconds". It should be considered part of the application metrics 
rather than the overview.

4. Change finishedMemory/VCoreSeconds to AtomicLong in RMAppAttemptMetrics so 
they can be accessed efficiently by multiple threads.

5. I think it's better to add a new method in SchedulerApplicationAttempt, like 
getMemoryUtilization, which only returns memory/cpu seconds. We do this to 
avoid locking the scheduling thread when showing application metrics on the web 
UI. getMemoryUtilization would be used by 
RMAppAttemptMetrics#getFinishedMemory(VCore)Seconds to return completed+running 
resource utilization, and by SchedulerApplicationAttempt#getResourceUsageReport 
as well.

The MemoryUtilization class may contain two fields: 
runningContainerMemory(VCore)Seconds.

6. Since computing running-container resource utilization is not O(1), we need 
to scan all containers under an application. I think it's better to cache the 
previously computed result and recompute it only after several seconds (maybe 
1-3 seconds should be enough) have elapsed.

You can also modify SchedulerApplicationAttempt#liveContainers to be a 
ConcurrentHashMap. With #6, getting memory utilization to show metrics on the 
web UI will not lock the scheduling thread at all; a rough sketch follows.
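
The sketch below uses simplified, made-up types (not the actual 
RMAppAttemptMetrics/SchedulerApplicationAttempt code): finished usage is kept 
in an AtomicLong, and running-container usage is recomputed at most every few 
seconds so reading metrics never blocks the scheduling thread.
{code}
// Simplified, illustrative types only: finished usage in an AtomicLong,
// running-container usage recomputed lazily on a short interval.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class AttemptUsageSketch {
  private final AtomicLong finishedMemorySeconds = new AtomicLong();
  private final Map<String, Long> liveContainerMb = new ConcurrentHashMap<>();
  private final Map<String, Long> liveContainerStartMs = new ConcurrentHashMap<>();

  private volatile long cachedRunningMemorySeconds;
  private volatile long lastComputeMs;
  private static final long RECOMPUTE_INTERVAL_MS = 3000;

  public void containerFinished(long memSeconds) {
    finishedMemorySeconds.addAndGet(memSeconds);
  }

  // Scanning all live containers is not O(1), so cache the result briefly.
  public long getMemorySeconds() {
    long now = System.currentTimeMillis();
    if (now - lastComputeMs > RECOMPUTE_INTERVAL_MS) {
      long running = 0;
      for (Map.Entry<String, Long> e : liveContainerMb.entrySet()) {
        long startMs = liveContainerStartMs.getOrDefault(e.getKey(), now);
        running += e.getValue() * ((now - startMs) / 1000);
      }
      cachedRunningMemorySeconds = running;
      lastComputeMs = now;
    }
    return finishedMemorySeconds.get() + cachedRunningMemorySeconds;
  }
}
{code}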

Please let me know if you have any comments here,

Thanks,
Wangda


 Capture memory utilization at the app-level for chargeback
 --

 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp
Assignee: Andrey Klochkov
 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
 YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
 YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
 YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
 YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
 YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
 YARN-415.201407172144.txt, YARN-415.patch


 For the purpose of chargeback, I'd like to be able to compute the cost of an
 application in terms of cluster resource usage.  To start out, I'd like to 
 get the memory utilization of an application.  The unit should be MB-seconds 
 or something similar and, from a chargeback perspective, the memory amount 
 should be the memory reserved for the application, as even if the app didn't 
 use all that memory, no one else was able to use it.
 (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
 container 2 * lifetime of container 2) + ... + (reserved ram for container n 
 * lifetime of container n)
 It'd be nice to have this at the app level instead of the job level because:
 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
 appear on the job history server).
 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
 This new metric should be available both through the RM UI and RM Web 
 Services REST API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2305) When a container is in reserved state then total cluster memory is displayed wrongly.

2014-07-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066154#comment-14066154
 ] 

Wangda Tan commented on YARN-2305:
--

Thanks for your elaboration, I understand now. I think this is an inconsistency 
between ParentQueue and LeafQueue; using clusterResource instead of 
allocated+available can definitely solve this problem.

 When a container is in reserved state then total cluster memory is displayed 
 wrongly.
 -

 Key: YARN-2305
 URL: https://issues.apache.org/jira/browse/YARN-2305
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: J.Andreina
Assignee: Sunil G
 Attachments: Capture.jpg


 ENV Details:
 =  
  3 queues  :  a(50%),b(25%),c(25%) --- All max utilization is set to 
 100
  2 Node cluster with total memory as 16GB
 TestSteps:
 =
   Execute following 3 jobs with different memory configurations for 
 Map , reducer and AM task
   ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=a 
 -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 
 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=2048 
 /dir8 /preempt_85 (application_1405414066690_0023)
  ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=b 
 -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=2048 
 -Dyarn.app.mapreduce.am.resource.mb=2048 -Dmapreduce.reduce.memory.mb=2048 
 /dir2 /preempt_86 (application_1405414066690_0025)
  
  ./yarn jar wordcount-sleep.jar -Dmapreduce.job.queuename=c 
 -Dwordcount.map.sleep.time=2000 -Dmapreduce.map.memory.mb=1024 
 -Dyarn.app.mapreduce.am.resource.mb=1024 -Dmapreduce.reduce.memory.mb=1024 
 /dir2 /preempt_62
 Issue
 =
   When 2GB of memory is in the reserved state, total memory is shown as 15GB 
 and used as 15GB (while the actual total memory is 16GB).
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed

2014-07-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14066156#comment-14066156
 ] 

Wangda Tan commented on YARN-2308:
--

I think it should be doable; a missing application queue should not cause the 
RM to fail to start.
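
A minimal sketch of the kind of guard that could achieve this, using simplified 
hypothetical types rather than the actual CapacityScheduler code:
{code}
// Hypothetical, simplified types: when recovering an application whose queue no
// longer exists after a configuration change, reject that application instead
// of letting a NullPointerException take down the whole RM.
public class RecoverAttemptSketch {
  interface Queue { void submitApplicationAttempt(String attemptId); }
  interface QueueLookup { Queue get(String queueName); }

  static void addApplicationAttempt(QueueLookup queues, String queueName,
      String attemptId) {
    Queue queue = queues.get(queueName);
    if (queue == null) {
      // Queue was removed from the configuration; fail this app only.
      System.err.println("Queue " + queueName + " no longer exists; "
          + "rejecting recovered attempt " + attemptId);
      return;
    }
    queue.submitApplicationAttempt(attemptId);
  }
}
{code}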

 NPE happened when RM restart after CapacityScheduler queue configuration 
 changed 
 -

 Key: YARN-2308
 URL: https://issues.apache.org/jira/browse/YARN-2308
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.6.0
Reporter: Wangda Tan
Priority: Critical

 I encountered an NPE when the RM restarted:
 {code}
 2014-07-16 07:22:46,957 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:744)
 {code}
 And the RM fails to restart.
 This is caused by a queue configuration change: I removed some queues and 
 added new ones. When the RM restarts, it tries to recover historical 
 applications, and when the queue of any of these applications has been 
 removed, an NPE is raised.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure

2014-07-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14067542#comment-14067542
 ] 

Wangda Tan commented on YARN-2008:
--

Hi [~cwelch],
Thanks for working on this patch. However, I've thought about this for a while, 
and I'm wondering whether we should change this behavior.
With preemption, we don't need to consider the used capacity of siblings or the 
siblings of parents; the preemption policy will take care of over-used queues. 
In addition, even if preemption is disabled, the headroom should not change 
either (see next).
If we define headroom as the maximum capacity an application can get, the 
formula headroom = min(userLimit, queue-max-cap) - consumed should be correct. 
But if we define headroom as the maximum *guaranteed* capacity an application 
can get, the formula should be changed to headroom = 
min(userLimit, queue-max-cap, queue-guaranteed-cap) - consumed.
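
As a small side-by-side illustration of the two definitions, with made-up 
numbers (illustrative arithmetic only, not CapacityScheduler code):
{code}
// Side-by-side sketch of the two headroom definitions discussed above.
public class HeadroomSketch {
  // Headroom as the maximum an application can get.
  static long maxHeadroom(long userLimit, long queueMaxCap, long consumed) {
    return Math.min(userLimit, queueMaxCap) - consumed;
  }

  // Headroom as the maximum *guaranteed* amount an application can get.
  static long guaranteedHeadroom(long userLimit, long queueMaxCap,
      long queueGuaranteedCap, long consumed) {
    return Math.min(Math.min(userLimit, queueMaxCap), queueGuaranteedCap) - consumed;
  }

  public static void main(String[] args) {
    // Example (MB): userLimit=8192, max-cap=16384, guaranteed=4096, consumed=3072.
    System.out.println(maxHeadroom(8192, 16384, 3072));               // 5120
    System.out.println(guaranteedHeadroom(8192, 16384, 4096, 3072));  // 1024
  }
}
{code}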

Does this make sense to you? Please let me know if you have any comments.

Thanks,
Wangda

 CapacityScheduler may report incorrect queueMaxCap if there is hierarchy 
 queue structure 
 -

 Key: YARN-2008
 URL: https://issues.apache.org/jira/browse/YARN-2008
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Chen He
Assignee: Chen He
 Attachments: YARN-2008.1.patch, YARN-2008.2.patch


 Suppose there are two queues, both allowed to use 100% of the actual resources 
 in the cluster. Q1 and Q2 each currently use 50% of the actual cluster 
 resources, so there is no actual space available. With the current method of 
 computing headroom, the CapacityScheduler thinks there are still resources 
 available for users in Q1, but they have already been used by Q2.
 If the CapacityScheduler has a hierarchical queue structure, it may report an 
 incorrect queueMaxCap. Here is an example:
 rootQueue
   - L1ParentQueue1 (allowed to use up to 80% of its parent)
     - L2LeafQueue1 (50% of its parent)
     - L2LeafQueue2 (50% of its parent in minimum)
   - L1ParentQueue2 (allowed to use 20% in minimum of its parent)
 When we calculate the headroom of a user in L2LeafQueue2, the current method 
 assumes L2LeafQueue2 can use 40% (80%*50%) of the actual rootQueue resources. 
 However, without checking the other queues under rootQueue, we cannot be sure: 
 it is possible that L1ParentQueue2 has already used 40% of the rootQueue 
 resources right now, in which case L2LeafQueue2 can actually only use 30% (60%*50%).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2297) Preemption can prevent progress in small queues

2014-07-19 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14067544#comment-14067544
 ] 

Wangda Tan commented on YARN-2297:
--

bq. I feel this can create a little bit more starvation for queues configured 
with less capacity.
+1, that would not be reasonable.
bq. Yes. This make more sense, it can neutralize ratio as well as difference to 
a uniform way. I feel more sampling can be done to come with a better approach. 
i can check and update you.
I feel it could be a better approach too. Looking forward to your update; we 
should make a fact-based decision :)

 Preemption can prevent progress in small queues
 ---

 Key: YARN-2297
 URL: https://issues.apache.org/jira/browse/YARN-2297
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.5.0
Reporter: Tassapol Athiapinya
Assignee: Wangda Tan
Priority: Critical

 Preemption can cause a hang in a single-node cluster: only AMs run, and no 
 task container can run.
 h3. queue configuration
 Queues A and B have 1% and 99% capacity respectively.
 No max capacity.
 h3. scenario
 Turn on preemption. Configure 1 NM with 4 GB of memory. Use only 2 apps and 1 user.
 Submit app 1 to queue A. The AM needs 2 GB and there is 1 task that needs 2 GB, 
 occupying the entire cluster.
 Submit app 2 to queue B. The AM needs 2 GB and there are 3 tasks that need 2 GB each.
 Instead of app 1 being preempted entirely, app 1's AM stays and app 2's AM launches.
 No task of either app can proceed.
 h3. commands
 /usr/lib/hadoop/bin/hadoop jar 
 /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar randomtextwriter 
 -Dmapreduce.map.memory.mb=2000 
 -Dyarn.app.mapreduce.am.command-opts=-Xmx1800M 
 -Dmapreduce.randomtextwriter.bytespermap=2147483648 
 -Dmapreduce.job.queuename=A -Dmapreduce.map.maxattempts=100 
 -Dmapreduce.am.max-attempts=1 -Dyarn.app.mapreduce.am.resource.mb=2000 
 -Dmapreduce.map.java.opts=-Xmx1800M 
 -Dmapreduce.randomtextwriter.mapsperhost=1 
 -Dmapreduce.randomtextwriter.totalbytes=2147483648 dir1
 /usr/lib/hadoop/bin/hadoop jar 
 /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar sleep 
 -Dmapreduce.map.memory.mb=2000 
 -Dyarn.app.mapreduce.am.command-opts=-Xmx1800M 
 -Dmapreduce.job.queuename=B -Dmapreduce.map.maxattempts=100 
 -Dmapreduce.am.max-attempts=1 -Dyarn.app.mapreduce.am.resource.mb=2000 
 -Dmapreduce.map.java.opts=-Xmx1800M -m 1 -r 0 -mt 4000  -rt 0



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-07-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068116#comment-14068116
 ] 

Wangda Tan commented on YARN-796:
-

Thanks a lot for all your comments above.

As Sandy, Alejandro and Allen mentioned, there are concerns about centralized 
configuration. My thinking is that node labels are more dynamic than any other 
existing NM options.
An important use case we can see is that some customers want to put a label on 
each node indicating which department/team the node belongs to; when a new team 
comes in and new machines are added, the labels may need to be changed. It is also 
possible that the whole cluster is booked to run some huge batch job at 
12am-2am, for example. So such labels will be changed frequently. If we only 
have distributed configuration on each node, it is a nightmare for admins to 
re-configure.
I think we should have the same internal interface for distributed/centralized 
configuration, like what we've done for RMStateStore (sketched below).
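A rough, hypothetical sketch of such a shared interface (none of these names are an 
actual YARN API):
{code}
// Hypothetical sketch of a common node-label store interface, analogous to RMStateStore;
// none of these names exist in YARN.
import java.util.Set;

public interface NodeLabelsStore {
  // Labels of a node, no matter whether they come from centralized storage or NM config.
  Set<String> getLabels(String nodeId);

  // Replace the labels of a node. A centralized implementation would persist this via
  // admin APIs; a distributed implementation would read them from each NM's configuration.
  void setLabels(String nodeId, Set<String> labels);
}
{code}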

And as Jian Fang mentioned,
bq. doubt about the assumption for admin to configure labels for a cluster.
I think using a script to mark labels is a great way to save configuration 
work. But lots of other use cases need human intervention as well; good 
examples were given by Allen and me.

Thanks,
Wangda

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-07-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068124#comment-14068124
 ] 

Wangda Tan commented on YARN-796:
-

Hi Alejandro, 
I totally understand that the use case I mentioned is antithetical to the design 
philosophy of YARN, which is to elastically share the resources of a 
multi-tenant environment. But hard partitioning has some important use cases, even 
if it is not strongly recommended.
For example, in some performance-sensitive environments, a user may want to 
run HBase masters/region-servers on a group of nodes, and doesn't want any other 
tasks running on these nodes even if they have free resources.
Our current queue configuration cannot solve such a problem. Of course the user 
can create a separate YARN cluster in this case, but I think putting such NMs 
under the same RM is easier to use and manage.

Do you agree?
Thanks,

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-07-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068145#comment-14068145
 ] 

Wangda Tan commented on YARN-796:
-

Alejandro,
I think we've mentioned this in our design doc; you can check 
https://issues.apache.org/jira/secure/attachment/12654446/Node-labels-Requirements-Design-doc-V1.pdf,
under top level requirements -> admin tools -> Security and access controls for 
managing Labels. Please let me know if you have any comments on it.

Thanks :),

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-07-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068147#comment-14068147
 ] 

Wangda Tan commented on YARN-1198:
--

I've just taken a look at all the sub-tasks of this JIRA, and I'm wondering if we 
should define what the headroom is first.
In previous YARN, including YARN-1198, the headroom is defined as the maximum 
resource an application can get.
And in YARN-2008, the headroom is defined as the available resource an 
application can get, because we already consider the used resources of sibling 
queues.

I'm wondering if we need to add a new field like a guaranteed headroom of an 
application, considering its absolute capacity (not maximum capacity), 
user-limits, etc. We may want to keep both of them because:
- The maximum resource is not always achievable, because the sum of the maximum 
resources of the leaf queues may exceed the cluster resource.
- With preemption, resource beyond the guaranteed resource will likely be 
preempted. It should be considered a temporary resource.

And with this, the AM can:
- use the guaranteed headroom to allocate resource which will not be preempted,
- use the maximum headroom to try to allocate resource beyond its guaranteed 
headroom (a small sketch follows).
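A minimal sketch of how an AM could use the two numbers, assuming it receives both 
values (the class and field names are illustrative, not an existing API):
{code}
// Illustrative only -- none of these names are an existing YARN API. The AM splits its
// pending ask into a "stable" part (within the guaranteed headroom, unlikely to be
// preempted) and an "opportunistic" part (up to the maximum headroom, may be preempted).
final class HeadroomPlanner {
  static long[] planAsk(long pendingMB, long guaranteedHeadroomMB, long maximumHeadroomMB) {
    long stableAskMB = Math.min(pendingMB, guaranteedHeadroomMB);
    long opportunisticAskMB = Math.min(pendingMB, maximumHeadroomMB) - stableAskMB;
    return new long[] { stableAskMB, opportunisticAskMB };
  }
}
{code}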

And in my humble opinion, "the available resource an application can get" 
doesn't make a lot of sense here, and may cause some backward-compatibility 
problems as well. In a dynamic cluster the number can change rapidly; 
it is possible that the cluster is filled by another application just 
one second after the AM got the available headroom.
Also, this field cannot solve the deadlock problem either: a malicious 
application can ask for much more resource than this, or a careless developer may 
totally ignore this field. The only valid solution in my head is putting such logic 
on the scheduler side, and enforcing resource usage by the preemption policy.

Any thoughts? [~jlowe], [~cwelch]

Thanks,
Wangda

 Capacity Scheduler headroom calculation does not work as expected
 -

 Key: YARN-1198
 URL: https://issues.apache.org/jira/browse/YARN-1198
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: YARN-1198.1.patch


 Today headroom calculation (for the app) takes place only when
 * New node is added/removed from the cluster
 * New container is getting assigned to the application.
 However, there are potentially a lot of situations which are not considered in 
 this calculation:
 * If a container finishes then headroom for that application will change and 
 should be notified to the AM accordingly.
 * If a single user has submitted multiple applications (app1 and app2) to the 
 same queue then
 ** If app1's container finishes then not only app1's but also app2's AM 
 should be notified about the change in headroom.
 ** Similarly if a container is assigned to any applications app1/app2 then 
 both AM should be notified about their headroom.
 ** To simplify the whole communication process it is ideal to keep headroom 
 per User per LeafQueue so that everyone gets the same picture (apps belonging 
 to same user and submitted in same queue).
 * If a new user submits an application to the queue then all applications 
 submitted by all users in that queue should be notified of the headroom 
 change.
 * Also, today the headroom is an absolute number (I think it should be normalized, 
 but then this is not going to be backward compatible..)
 * Also, when the admin refreshes the queues, the headroom has to be updated.
 These are all potential bugs in the headroom calculations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure

2014-07-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068148#comment-14068148
 ] 

Wangda Tan commented on YARN-2008:
--

Hi [~cwelch], [~airbots],
I've put my comment on YARN-1198: 
https://issues.apache.org/jira/browse/YARN-1198?focusedCommentId=14068147page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14068147,
 because I think it is a general comment of headroom.
Please share your ideas here,

Thanks,
Wangda

 CapacityScheduler may report incorrect queueMaxCap if there is hierarchy 
 queue structure 
 -

 Key: YARN-2008
 URL: https://issues.apache.org/jira/browse/YARN-2008
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Chen He
Assignee: Chen He
 Attachments: YARN-2008.1.patch, YARN-2008.2.patch


 If there are two queues, both allowed to use 100% of the actual resources in 
 the cluster, and Q1 and Q2 each currently use 50% of the actual cluster's 
 resources, then there is no actual space available. With the current method of 
 getting headroom, the CapacityScheduler thinks there are still available 
 resources for users in Q1, but they have already been used by Q2. 
 If the CapacityScheduler has a hierarchical queue structure, it may report an 
 incorrect queueMaxCap. Here is an example:
 rootQueue
  +-- L1ParentQueue1 (allowed to use up to 80% of its parent)
  |    +-- L2LeafQueue1 (50% of its parent)
  |    +-- L2LeafQueue2 (50% of its parent in minimum)
  +-- L1ParentQueue2 (allowed to use 20% in minimum of its parent)
 When we calculate the headroom of a user in L2LeafQueue2, the current method 
 thinks L2LeafQueue2 can use 40% (80%*50%) of the actual rootQueue resources. 
 However, without checking L1ParentQueue1, we are not sure. It is possible 
 that L1ParentQueue2 has used 40% of the rootQueue resources right now; in that 
 case, L2LeafQueue2 can actually only use 30% (60%*50%). 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-07-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068163#comment-14068163
 ] 

Wangda Tan commented on YARN-796:
-

Hi [~sunilg],
bq. 2. Regarding reservations, how about introducing node-label reservations. 
Ideas is like, if an application is lacking resource on a node, it can reserve 
on that node as well as to node-label. So when a suitable node update comes 
from another node in same node-label, can try allocating container in new node 
by unreserving from old node.
I think this makes sense, and we'd better support it. I will check how our current 
resource reservation/unreservation logic can support it, and will keep you 
posted.

bq. 3. My approach was more like have a centralized configuration, but later 
after some time, if want to add a new node to cluster, then it can start with a 
hardcoded label in its yarn-site. In your approach, we need to use REStful API 
or admin command to bring this node under one label. May be while start up 
itself this node can be set under a label. your thoughts?
A problem I can see with mixed centralized/distributed configuration is that it 
will be hard to manage the labels after an RM/NM restart -- should we use the labels 
specified in the NM config or in our centralized config? I also replied to Jian Fang 
previously about this: 
https://issues.apache.org/jira/browse/YARN-796?focusedCommentId=14063316page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14063316.
Maybe a workaround is that we can define the centralized config to always overwrite 
the distributed config. E.g. the user defined GPU in the NM config, and the admin 
added FPGA via the RESTful API; the RM will serialize both GPU and FPGA into a 
centralized storage system. And after an RM restart or NM restart, the RM will 
ignore the NM config if anything is defined in the RM. But I still think it's better 
to avoid using both of them together (a sketch of this override rule follows).
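A minimal sketch of that override rule (the class and method names are hypothetical, 
not an existing YARN API):
{code}
// Hypothetical sketch of "centralized config always overwrites distributed config".
// If the RM already has labels recorded for a node, the NM-reported labels are ignored;
// otherwise the NM-reported labels are accepted.
import java.util.Set;

final class NodeLabelResolver {
  static Set<String> resolve(Set<String> centralizedLabels, Set<String> nmReportedLabels) {
    if (centralizedLabels != null && !centralizedLabels.isEmpty()) {
      return centralizedLabels;   // anything defined in the RM wins
    }
    return nmReportedLabels;      // fall back to the NM's own configuration
  }
}
{code}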

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-07-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068184#comment-14068184
 ] 

Wangda Tan commented on YARN-796:
-

bq. You can solve this problem today by just running separate RMs.
I think that's not good from a configuration point of view; users need to maintain 
several configuration folders on their nodes to submit jobs.

bq. In practice, however, marking nodes for specific teams in queue systems 
doesn't work because doing so assumes that the capacity never changes... i.e
It is possible that you cannot replace a failed node with a random node in a 
heterogeneous cluster. E.g. only some nodes have GPUs, and these nodes will be 
dedicated to the data scientist team. A percentage of queue capacity 
doesn't make a lot of sense here. 

bq. ... except, you guessed it: this is a solved problem today too. You just 
need to make sure the container sizes that are requested consume the whole node.
Assume an HBase master wants to run on a node that has 64G memory and InfiniBand. 
You can ask for a 64G memory container, but it is likely to be allocated on a 128G 
node that doesn't have InfiniBand.
Again, it's another heterogeneity issue.
And asking for such a big container may take a great amount of time, waiting for 
resource reservation, etc.

bq. it still wouldn't be a nightmare because any competent admin would use 
configuration management to roll out changes to the nodes in a controlled 
manner.
It is very likely that not every admin has scripts like yours, especially some new 
YARN users; we'd better make this feature usable out-of-the-box.

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-07-21 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068248#comment-14068248
 ] 

Wangda Tan commented on YARN-796:
-

Allen,
I think what we were just talking about is how to support the hard-partition use 
case in YARN, weren't we? I'm surprised to get a -1 here; nobody has ever said 
that dynamic labeling from the NM will not be supported.

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-07-21 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069615#comment-14069615
 ] 

Wangda Tan commented on YARN-796:
-

Hi Tucu,
Thanks for providing thoughts about how to stage the development work. It's 
reasonable, and we're trying to scope the work for a first shot as well. 
Will keep you posted.

Thanks,
Wangda

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-07-21 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069619#comment-14069619
 ] 

Wangda Tan commented on YARN-796:
-

Jian Fang,
I think it makes sense to give the RM a global picture, because we can prevent 
typos created by admins manually filling in labels in the NM config, etc.
On the other hand, I think your use case is also reasonable; 
we'd better support both of them, as well as OR label expressions. 
Will keep you posted when we have made a plan.

Thanks,
Wangda

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-07-21 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069651#comment-14069651
 ] 

Wangda Tan commented on YARN-1198:
--

I agree with [~jlowe], [~airbots] and [~cwelch] that used resource should be 
considered in the headroom (which is YARN-2008). And apparently, an application 
master can still ask for more than that number to possibly get more resource. 

I completely agree with what Jason mentioned: ignoring the headroom will not cause 
problems for anything except the application itself. What I originally wanted to 
say is that putting headroom and gang scheduling together will cause a deadlock 
problem that should be solved on the scheduler side. But that seems kind of 
off-topic, so let's ignore it here. 

Also, as Chen mentioned, we don't need to consider preemption when computing the 
headroom. Besides, when resource is about to be preempted from an app, the AM will 
receive messages about the preemption requests and should handle them itself.

Thanks,
Wangda



 Capacity Scheduler headroom calculation does not work as expected
 -

 Key: YARN-1198
 URL: https://issues.apache.org/jira/browse/YARN-1198
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
 Attachments: YARN-1198.1.patch


 Today headroom calculation (for the app) takes place only when
 * New node is added/removed from the cluster
 * New container is getting assigned to the application.
 However, there are potentially a lot of situations which are not considered in 
 this calculation:
 * If a container finishes then headroom for that application will change and 
 should be notified to the AM accordingly.
 * If a single user has submitted multiple applications (app1 and app2) to the 
 same queue then
 ** If app1's container finishes then not only app1's but also app2's AM 
 should be notified about the change in headroom.
 ** Similarly if a container is assigned to any applications app1/app2 then 
 both AM should be notified about their headroom.
 ** To simplify the whole communication process it is ideal to keep headroom 
 per User per LeafQueue so that everyone gets the same picture (apps belonging 
 to same user and submitted in same queue).
 * If a new user submits an application to the queue then all applications 
 submitted by all users in that queue should be notified of the headroom 
 change.
 * Also, today the headroom is an absolute number (I think it should be normalized, 
 but then this is not going to be backward compatible..)
 * Also, when the admin refreshes the queues, the headroom has to be updated.
 These are all potential bugs in the headroom calculations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2014-07-23 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072696#comment-14072696
 ] 

Wangda Tan commented on YARN-415:
-

Hi Eric,
Thanks for updating your patch. I don't have major comments now.

*Following are some minor comments:*
1) RMAppAttemptImpl.java
1.1 There're some irrelevant line changes in RMAppAttemptImpl, could you please 
revert them? Like
{code}
   RMAppAttemptEventType.RECOVER, new AttemptRecoveredTransition())
-  
+
{code}

1.2 getResourceUtilization:
{code}
+if (rmApps != null) {
+  RMApp app = rmApps.get(attemptId.getApplicationId());
+  if (app != null) {
{code}
I think these two cases cannot happen; we don't need the null checks here to avoid 
a potential bug.

{code}
+  ApplicationResourceUsageReport appResUsageRpt =
{code}
It's better to name it appResUsageReport, since rpt is not a common abbreviation 
of report.

2) RMContainerImpl.java
2.1 updateAttemptMetrics:
{code}
  if (rmApps != null) {
RMApp rmApp = 
rmApps.get(container.getApplicationAttemptId().getApplicationId());
if (rmApp != null) {
{code}
Again, I think the two null checks are unnecessary.

3) SchedulerApplicationAttempt.java
3.1 Some rename suggestions (please let me know if you have a better idea):
CACHE_MILLI -> MEMORY_UTILIZATION_CACHE_MILLISECONDS
lastTime -> lastMemoryUtilizationUpdateTime
cachedMemorySeconds -> lastMemorySeconds
same for cachedVCore ...

4) AppBlock.java
Should we rename "Resource Seconds:" to "Resource Utilization" or something similar?

5) Test
5.1 I'm wondering if we need to add an end-to-end test, since we changed 
RMAppAttempt/RMContainerImpl/SchedulerApplicationAttempt.
It could consist of submitting an application, launching several containers, and 
finishing the application. And it's better to make the launched application 
contain several application attempts.
While the application is running, there are multiple containers running and 
multiple containers finished. We can check whether the total resource utilization 
is as expected.

*To your comments:*
1) 
bq. One thing I did notice when these values are cached is that there is a race 
where containers can get counted twice:
I think this cannot be avoided; it should be a transient state, and Jian He and I 
discussed this a long time ago.
But apparently, a 3-second cache makes it more than just a transient state. I 
suggest you make lastTime in SchedulerApplicationAttempt protected, and in 
FiCaSchedulerApp/FSSchedulerApp, when removing a container from liveContainers (in 
the completedContainer method), you can set lastTime to a negative value like -1, 
so that the next time we try to get the accumulated resource utilization, it will 
recompute all container utilization. A small sketch of this idea is below.
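A minimal sketch of the cache-invalidation idea (the class and method names are 
simplified, not the exact ones in the patch):
{code}
// Simplified sketch only; names are not the exact ones in the patch.
abstract class CachedUtilization {
  static final long CACHE_MILLIS = 3000;   // the 3-second cache discussed above
  private long lastTime = -1;              // -1 means "cache invalid, recompute"
  private long cachedMemorySeconds = 0;

  // Walks live + finished containers; expensive, so the result is cached.
  protected abstract long recomputeMemorySeconds();

  synchronized long getMemorySeconds(long now) {
    if (lastTime < 0 || now - lastTime > CACHE_MILLIS) {
      cachedMemorySeconds = recomputeMemorySeconds();
      lastTime = now;
    }
    return cachedMemorySeconds;
  }

  // Called when a container leaves liveContainers, e.g. from completedContainer(),
  // so the next read recomputes instead of counting the finished container twice.
  synchronized void invalidate() {
    lastTime = -1;
  }
}
{code}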

2)
bq. I am a little reluctant to modify the type of 
SchedulerApplicationAttempt#liveContainers as part of this JIRA. That seems 
like something that could be done separately.
I think that will be fine :), because the current getRunningResourceUtilization is 
called by getResourceUsageReport, and getResourceUsageReport is synchronized; no 
matter whether we change liveContainers to a concurrent map or not, we cannot 
solve the locking problem. 
I agree to enhance it in a separate JIRA in the future.

Thanks,
Wangda


 Capture memory utilization at the app-level for chargeback
 --

 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp
Assignee: Andrey Klochkov
 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
 YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
 YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
 YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
 YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
 YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
 YARN-415.201407172144.txt, YARN-415.201407232237.txt, YARN-415.patch


 For the purpose of chargeback, I'd like to be able to compute the cost of an
 application in terms of cluster resource usage.  To start out, I'd like to 
 get the memory utilization of an application.  The unit should be MB-seconds 
 or something similar and, from a chargeback perspective, the memory amount 
 should be the memory reserved for the application, as even if the app didn't 
 use all that memory, no one else was able to use it.
 (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
 container 2 * lifetime of container 2) + ... + (reserved ram for container n 
 * lifetime of container n)
 It'd be nice to have this at the app level instead of the job level because:
 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
 appear on the job history server).
 

[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed

2014-07-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073899#comment-14073899
 ] 

Wangda Tan commented on YARN-2308:
--

[~lichangleo], thanks for working on it!
Looking forward to your patch.

 NPE happened when RM restart after CapacityScheduler queue configuration 
 changed 
 -

 Key: YARN-2308
 URL: https://issues.apache.org/jira/browse/YARN-2308
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: chang li
Priority: Critical

 I encountered an NPE when the RM restarted:
 {code}
 2014-07-16 07:22:46,957 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:744)
 {code}
 And the RM will fail to restart.
 This is caused by the queue configuration being changed: I removed some queues and 
 added new queues. So when the RM restarts, it tries to recover historical 
 applications, and when any of the queues of these applications has been removed, 
 an NPE is raised.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2014-07-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074094#comment-14074094
 ] 

Wangda Tan commented on YARN-415:
-

Hi Eric,
Thanks for updating your patch again,

*To your comments,*
bq. I was able to remove the rmApps variable, but I had to leave the check for 
app != null because if I try to take that out, several unit tests would fail 
with NullPointerException. Even with removing the rmApps variable, I needed to 
change TestRMContainerImpl.java to mock rmContext.getRMApps().
I would like to suggest fixing such UTs instead of inserting some kernel code 
to make the UTs pass. I'm not sure about the effort of doing this; if the effort is 
still reasonable, we should do it.

bq. I'm still working on the unit tests as you suggested, but I wanted to get 
the rest of the patch up first so you can look at it 
No problem :), I can give some reviews about your existing changes.

*I've reviewed some details of your patch; a few very minor comments:*
ApplicationCLI.java
{code}
+  appReportStr.print(\tResources used : );
{code}
Do we need to change it to Resource Utilization as well?

I think otherwise the patch almost LGTM; looking forward to your new patch 
containing an integration test.

Thanks,
Wangda

 Capture memory utilization at the app-level for chargeback
 --

 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp
Assignee: Andrey Klochkov
 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
 YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
 YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
 YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
 YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
 YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
 YARN-415.201407172144.txt, YARN-415.201407232237.txt, 
 YARN-415.201407242148.txt, YARN-415.patch


 For the purpose of chargeback, I'd like to be able to compute the cost of an
 application in terms of cluster resource usage.  To start out, I'd like to 
 get the memory utilization of an application.  The unit should be MB-seconds 
 or something similar and, from a chargeback perspective, the memory amount 
 should be the memory reserved for the application, as even if the app didn't 
 use all that memory, no one else was able to use it.
 (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
 container 2 * lifetime of container 2) + ... + (reserved ram for container n 
 * lifetime of container n)
 It'd be nice to have this at the app level instead of the job level because:
 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
 appear on the job history server).
 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
 This new metric should be available both through the RM UI and RM Web 
 Services REST API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits

2014-07-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074249#comment-14074249
 ] 

Wangda Tan commented on YARN-2069:
--

Hi [~mayank_bansal],
Thanks for working on this again. I've taken a brief look at your patch; I 
think the general approach in your patch is:
- Compute a target-user-limit for a given queue,
- Preempt containers according to a user's current consumption and the 
target-user-limit,
- If more resource needs to be preempted, consider preempting AM 
containers.

I think there are a couple of rules we need to respect (please let me know if you 
don't agree with any of them):
# Used resources of users in a queue after preemption should be as even as 
possible
# Before we start preempting AM containers, all task containers should be 
preempted (according to YARN-2022, keep preempting AM containers as the least 
priority)
# If we must preempt AM containers, we should respect #1 too

For #1,
if we want to quantify the result, it should be:
{code}
Let rp_i = used-resource-after-preemption of user_i, for each user i in the queue
Let mean(rp) = ( Σ_i rp_i ) / #users
Minimize sqrt( Σ_i (rp_i - mean(rp))^2 )
{code}
In other words, we should minimize the standard deviation of 
used-resource-after-preemption.

Since not all containers are equal in size, it is possible that the 
used-resource-after-preemption of a given user cannot be precisely equal to the 
target-user-limit. In our current logic, we will make 
used-resource-after-preemption <= target-user-limit. Consider the following 
example:
{code}
qA: has user {V, W, X, Y, Z}; each user has one application
V: app5: {4, 4, 4, 4}, //means V has 4 containers, each one has memory=4G, 
minimum_allocation=1G
W: app4: {4, 4, 4, 4},
X: app3: {4, 4, 4, 4},
Y: app2: {4, 4, 4, 4, 4, 4},
Z: app1: {4}
target-user-limit=11,
resource-to-obtain=23

After preemption:
V: {4, 4}
W: {4, 4}
X: {4, 4}
Y: {4, 4, 4, 4, 4, 4}
Z: {4}
{code}
This imbalance happens because, for every application we preempt, the amount 
preempted may exceed what is needed to reach the user-limit (a bias); the more 
users we process, the more accumulated bias we might have. In other words, the 
imbalance is linearly correlated with number-of-users-in-a-queue multiplied by 
average-container-size.

And we cannot solve this problem by preempting from the user who has the most 
usage; using the same example: 
{code}

qA: has user {V, W, X, Y, Z}; each user has one application
V: app5: {4, 4, 4, 4}, //means V has 4 containers, each one has memory=4G, 
minimum_allocation=1G
W: app4: {4, 4, 4, 4},
X: app3: {4, 4, 4, 4},
Y: app2: {4, 4, 4, 4, 4, 4},
Z: app1: {4}
target-user-limit=11,
resource-to-obtain=23

After preemption (from user has most usage, the sequence is Y-V-W-X-Z):
V: {4, 4}
W: {4, 4, 4, 4}
X: {4, 4, 4, 4}
Y: {4, 4}
Z: {4} 
{code}
Still not very balanced, the ideal result should be:
{code}

V: {4, 4, 4}
W: {4, 4, 4}
X: {4, 4, 4}
Y: {4, 4, 4}
Z: {4} 
{code}

In addition, this approach cannot satisfy rules #2/#3 either if the 
target-user-limit is not appropriately computed. 

So I propose to do it another way:
we should recompute (used-resource - marked-preempted-resource) for a user every 
time after making the decision to preempt each container. Maybe we can use a 
priority queue to store (used-resource - marked-preempted-resource), and we 
don't need to compute a target user limit at all.
The pseudo code for preempting resources of a queue might look like this:
{code}
compute resToObtain first;

// first preempt task containers
while (resToObtain > 0 && there are task containers left to preempt) {
  pick the user-x which has the most (used-resource - marked-preempted-resource)
  pick one task container-y from user-x to preempt
  resToObtain -= container-y.resource
}

if (resToObtain <= 0) {
  return;
}

// if more resource needs to be preempted, we should preempt AM containers
while (resToObtain > 0 &&
       total-am-resource - marked-preempted-am-resource > max-am-percentage) {
  // do the same thing again:
  pick the user-x which has the most (used-resource - marked-preempted-resource)
  pick one AM container-y from user-x to preempt
  resToObtain -= container-y.resource
}
{code}

With this, we can make the imbalance linearly correlated with 
average-container-size only, and it also satisfies the #2/#3 rules I mentioned 
we should respect (a small sketch of the user-picking step is below).
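A minimal sketch of the "pick the user with the most remaining usage" step, using a 
simplified in-memory model (the UserUsage type and its fields are illustrative, not 
scheduler code):
{code}
// Illustrative only: a max-heap keyed by (used - markedPreempted), so each preemption
// decision targets the user who currently has the most remaining usage.
import java.util.PriorityQueue;

final class UserUsage {
  final String user;
  long used;              // MB currently used by this user in the queue
  long markedPreempted;   // MB already marked for preemption in this round
  UserUsage(String user, long used) { this.user = user; this.used = used; }
  long remaining() { return used - markedPreempted; }
}

final class PreemptionPicker {
  private final PriorityQueue<UserUsage> heap =
      new PriorityQueue<>((a, b) -> Long.compare(b.remaining(), a.remaining()));

  void addUser(UserUsage u) { heap.add(u); }

  // Marks one container of size containerMB from the most-loaded user as preempted,
  // then re-inserts the user so the ordering reflects the updated remaining usage.
  // Assumes at least one user has been added.
  String markOneContainer(long containerMB) {
    UserUsage top = heap.poll();
    top.markedPreempted += containerMB;
    heap.add(top);
    return top.user;
  }
}
{code}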
Mayank, do you think this looks like a reasonable suggestion? Any other 
thoughts? [~vinodkv], [~curino], [~sunilg].

Thanks,
Wangda

 CS queue level preemption should respect user-limits
 

 Key: YARN-2069
 URL: https://issues.apache.org/jira/browse/YARN-2069
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
 Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, 
 YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, 
 YARN-2069-trunk-6.patch, 

[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits

2014-07-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075220#comment-14075220
 ] 

Wangda Tan commented on YARN-2069:
--

Hi Mayank,
Thanks for your detailed explanation, I think I understood your approach.

However, I think the current way to compute the target user limit is not correct; 
let me explain:
basically, the {{computeTargetedUserLimit}} you created is modified from 
{{computeUserLimit}}, and it calculates as follows:
{code}
target_capacity = used_capacity - resToObtain
target_user_limit = min(
    max(target_capacity / #active_user,
        target_capacity * user_limit_percent),
    target_capacity * user_limit_factor)
{code}
So when user_limit_percent is set to the default (100%), it is possible that 
target_user_limit * #active_user > queue_max_capacity.
In this case, it is possible that every user's usage is below the 
target_user_limit, but the usage of the queue is still larger than its guaranteed 
resource.

Let me give you an example
{code}
Assume queue capacity = 50, used_resource = 70, resToObtain = 20
So target_capacity = 50, there're 5 users in the queue
user_limit_percent = 100%, user_limit_factor = 1 (both are default)

So target_user_capacity = min(max(50 / 5, 50 * 100%), 50) = 50
User1 used 20
User2 used 10
User3 used 10
User4 used 20
User5 used 10

So all users' used capacities are < target_user_capacity
{code}

In the existing logic of {{balanceUserLimitsinQueueForPreemption}}:
{code}
  if (Resources.lessThan(rc, clusterResource, userLimitforQueue,
  userConsumedResource)) {
 // do preemption
  } else 
  continue;
{code}
If a user's used resource is < target_user_capacity, it will not be preempted.

Mayank, is that correct? Or did I misunderstand your logic? Please let me know 
your comments.

Thanks,
Wangda

 CS queue level preemption should respect user-limits
 

 Key: YARN-2069
 URL: https://issues.apache.org/jira/browse/YARN-2069
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
 Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, 
 YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, 
 YARN-2069-trunk-6.patch, YARN-2069-trunk-7.patch


 This is different from (even if related to, and likely sharing code with) 
 YARN-2113.
 YARN-2113 focuses on making sure that even if a queue has its guaranteed 
 capacity, its individual users are treated in line with their limits 
 irrespective of when they join in.
 This JIRA is about respecting user-limits while preempting containers to 
 balance queue capacities.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2362) Capacity Scheduler: apps with requests that exceed current capacity can starve pending apps

2014-07-28 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075970#comment-14075970
 ] 

Wangda Tan commented on YARN-2362:
--

I think we should fix this,
{code}
   if (!assignToQueue(clusterResource, required)) {
-return NULL_ASSIGNMENT;
+break;
   }
{code}
The {{return NULL_ASSIGNMENT}} statement means that if an app submitted earlier 
cannot allocate resource in the queue, the rest of the apps in the queue cannot 
allocate resource either.

The {{break}} looks better to me.

And I agree this should be a duplicate of YARN-1631. A simplified sketch of the 
difference between the two statements is below.
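(Heavily simplified, hypothetical code -- not the actual LeafQueue.assignContainers. 
It only shows why {{return}} and {{break}} differ: {{return}} would stop considering 
every remaining app in this pass, while {{break}} only skips the rest of the current 
app's requests.)
{code}
import java.util.List;

final class AssignLoopSketch {
  interface App { List<Integer> pendingRequestsMB(); }

  static void assignContainers(List<App> activeApps, long queueAvailableMB) {
    for (App app : activeApps) {                       // outer loop: each pending app
      for (int requiredMB : app.pendingRequestsMB()) { // inner loop: each request
        if (requiredMB > queueAvailableMB) {
          // returning here would starve all later apps in this heartbeat;
          // breaking only abandons this app's remaining requests.
          break;
        }
        queueAvailableMB -= requiredMB;                // "place" the container
      }
    }
  }
}
{code}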

 Capacity Scheduler: apps with requests that exceed current capacity can 
 starve pending apps
 ---

 Key: YARN-2362
 URL: https://issues.apache.org/jira/browse/YARN-2362
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.4.1
Reporter: Ram Venkatesh

 Cluster configuration:
 Total memory: 8GB
 yarn.scheduler.minimum-allocation-mb 256
 yarn.scheduler.capacity.maximum-am-resource-percent 1 (100%, test only config)
 App 1 makes a request for 4.6 GB, succeeds, app transitions to RUNNING state. 
 It subsequently makes a request for 4.6 GB, which cannot be granted and it 
 waits.
 App 2 makes a request for 1 GB - never receives it, so the app stays in the 
 ACCEPTED state for ever.
 I think this can happen in leaf queues that are near capacity.
 The fix is likely in LeafQueue.java assignContainers near line 861, where it 
 returns if the assignment would exceed queue capacity, instead of checking if 
 requests for other active applications can be met.
 {code:title=LeafQueue.java|borderStyle=solid}
// Check queue max-capacity limit
if (!assignToQueue(clusterResource, required)) {
 -return NULL_ASSIGNMENT;
 +break;
}
 {code}
 With this change, the scenario above allows App 2 to start and finish while 
 App 1 continues to wait.
 I have a patch available, but wondering if the current behavior is by design.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic

2014-07-28 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076042#comment-14076042
 ] 

Wangda Tan commented on YARN-1707:
--

Thanks for uploading the patch [~curino], [~subru]. They're great additions to 
the current CapacityScheduler. I took a look at your patch.

*First, I have a couple of questions about the background, especially 
{{PlanQueue}}/{{ReservationQueue}} in this patch. I think understanding the 
background is important for me to get the whole picture of this patch. What I 
understand is:*
# {{PlanQueue}} can have a normal {{ParentQueue}} as its parent, but all 
children of a {{PlanQueue}} can only be {{ReservationQueue}}s. Is it possible that 
multiple {{PlanQueue}}s exist in the cluster?
# {{PlanQueue}} is initially set up in configuration; the same as 
{{ParentQueue}}, it has absolute capacity, etc. But different from 
{{ParentQueue}}, it also has user-limit/user-limit-factor, etc.
# {{ReservationQueue}} is dynamically initialized by the PlanFollower: when a new 
reservationId is acquired, it will create a new {{ReservationQueue}} accordingly.
# {{PlanFollower}} can dynamically adjust the queue size of {{ReservationQueue}}s 
so that resource reservations can be satisfied.
# Is it possible that the sum of reserved resources exceeds the limit of the 
{{PlanQueue}}/{{ReservationQueue}} and preemption is triggered?
# How do we deal with RM restart? It is possible that the RM restarts during a 
resource reservation; we may need to consider how to persist such queues.

I hope you could share your ideas about them.

*For requirement of this ticket (copied from JIRA),*
# create queues dynamically
# destroy queues dynamically
# dynamically change queue parameters (e.g., capacity)
# modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% 
instead of == 100%
# move app across queues

I found #1-#3 are dedicated to {{PlanQueue}}/{{ReservationQueue}}. IMHO, it 
would be better to add them to the CapacityScheduler without coupling them to the 
ReservationSystem, but I cannot think of other solid scenarios that could leverage 
them. I hope to get feedback from the community before we couple them with the 
ReservationSystem. And as mentioned by [~acmurthy], can we merge this into the 
existing add-new-queue mechanism?
#4 should only be valid for {{PlanQueue}}, because if we change this behavior in 
{{ParentQueue}}, it is possible that some careless admin will mis-set the 
capacities of queues under a parent queue; if the sum of their capacities doesn't 
equal 100%, some resource may not be usable by applications. 

*Some other comments (mostly about moving apps, because we may need to settle the 
scope of creating/destroying queues first):*
1) I think we need to consider how moving apps across queues works with YARN-1368. 
We can change the queue of containers from queueA to queueB, but with YARN-1368, 
during RM restart the containers will report they are in queueA (we don't sync 
them to the NM when doing the moveApp operation). I hope [~jianhe] could share some 
thoughts about this as well.
2) Moving an application in the CapacityScheduler needs to call finishApplication 
on the source queue and submitApplication on the target queue to keep QueueMetrics 
correct. And submitApplication will check the ACL of the target queue as well.
3) Should we respect MaxApplicationsPerUser of the target queue when trying to move 
an app? IMHO, we can stop moving the app if MaxApplicationsPerUser has been reached 
in the target queue.

Thanks,
Wangda

 Making the CapacityScheduler more dynamic
 -

 Key: YARN-1707
 URL: https://issues.apache.org/jira/browse/YARN-1707
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: capacity-scheduler
 Attachments: YARN-1707.patch


 The CapacityScheduler is rather static at the moment, and refreshqueue 
 provides a rather heavy-handed way to reconfigure it. Moving towards 
 long-running services (tracked in YARN-896) and to enable more advanced 
 admission control and resource parcelling we need to make the 
 CapacityScheduler more dynamic. This is instrumental to the umbrella jira 
 YARN-1051.
 Concretely this require the following changes:
 * create queues dynamically
 * destroy queues dynamically
 * dynamically change queue parameters (e.g., capacity) 
 * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% 
 instead of == 100%
 We limit this to LeafQueues. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure

2014-07-28 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14076062#comment-14076062
 ] 

Wangda Tan commented on YARN-2008:
--

Hi Craig,
As we discussed in YARN-1198, I think we should consider the resources used by a 
queue's siblings when computing the headroom. I took a look at your patch again; 
some comments:

We first need to think about how to calculate the headroom in general. I think 
the headroom is (concluded from the sub-JIRAs of YARN-1198):
{code}
queue_available = min(clusterResource - used_by_sibling_of_parents - 
used_by_this_queue, queue_max_resource)
headroom = min(queue_available - available_resource_in_blacklisted_nodes, 
user_limit)
{code}
So I think this JIRA focuses on computing {{used_by_sibling_of_parents}}, is 
that right? (A direct transcription of the formula is below.)
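A direct transcription of the formula above into a small helper (memory-only, values 
in GB; purely illustrative, not the code in the patch):
{code}
// Direct transcription of the headroom formula above; memory-only, values in GB,
// purely illustrative (not the code in the patch).
final class HeadroomFormula {
  static long headroomGB(long clusterGB, long usedBySiblingsOfParentsGB,
      long usedByThisQueueGB, long queueMaxResourceGB,
      long availableInBlacklistedNodesGB, long userLimitGB) {
    long queueAvailable = Math.min(
        clusterGB - usedBySiblingsOfParentsGB - usedByThisQueueGB, queueMaxResourceGB);
    return Math.min(queueAvailable - availableInBlacklistedNodesGB, userLimitGB);
  }
}
{code}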

I think the general approach looks good to me, except in CSQueueUtils.java 
(I will include a review of the tests in the next iteration):
1) 
{code}
  //sibling used is parent used - my used...
  float siblingUsedCapacity = Resources.ratio(
 resourceCalculator,
 Resources.subtract(parent.getUsedResources(), 
queue.getUsedResources()),
 parentResource);
{code}
It seems to me this computation is not robust enough when the parent resource is 
empty, no matter whether it's a zero-capacity queue or a sibling of it has used 
100% of the cluster.
It's better to add an edge-case test to prevent such zero-division as well; one 
possible guard is sketched below.
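One possible shape of the guard, reusing the variables from the snippet above (a 
sketch only, not the patch's code):
{code}
// One possible guard, sketched against the snippet above (not the patch itself):
// treat an empty parent resource as "no sibling usage" instead of dividing by zero.
float siblingUsedCapacity = (parentResource == null
    || Resources.none().equals(parentResource))
    ? 0.0f
    : Resources.ratio(resourceCalculator,
        Resources.subtract(parent.getUsedResources(), queue.getUsedResources()),
        parentResource);
{code}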

2)
It's better to explicitly cap the {{return absoluteMaxAvail}} value to the range 
\[0~1\] to prevent float computation errors.

Thanks,
Wangda

 CapacityScheduler may report incorrect queueMaxCap if there is hierarchy 
 queue structure 
 -

 Key: YARN-2008
 URL: https://issues.apache.org/jira/browse/YARN-2008
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Chen He
Assignee: Craig Welch
 Attachments: YARN-2008.1.patch, YARN-2008.2.patch


 If there are two queues, both allowed to use 100% of the actual resources in 
 the cluster, and Q1 and Q2 each currently use 50% of the actual cluster's 
 resources, then there is no actual space available. With the current method of 
 getting headroom, the CapacityScheduler thinks there are still available 
 resources for users in Q1, but they have already been used by Q2. 
 If the CapacityScheduler has a hierarchical queue structure, it may report an 
 incorrect queueMaxCap. Here is an example:
 rootQueue
  +-- L1ParentQueue1 (allowed to use up to 80% of its parent)
  |    +-- L2LeafQueue1 (50% of its parent)
  |    +-- L2LeafQueue2 (50% of its parent in minimum)
  +-- L1ParentQueue2 (allowed to use 20% in minimum of its parent)
 When we calculate the headroom of a user in L2LeafQueue2, the current method 
 thinks L2LeafQueue2 can use 40% (80%*50%) of the actual rootQueue resources. 
 However, without checking L1ParentQueue1, we are not sure. It is possible 
 that L1ParentQueue2 has used 40% of the rootQueue resources right now; in that 
 case, L2LeafQueue2 can actually only use 30% (60%*50%). 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic

2014-07-28 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14077254#comment-14077254
 ] 

Wangda Tan commented on YARN-1707:
--

Hi [~subru], 
Thanks for your elaboration, it is very helpful for me to understand the 
background.

Regards,
Wangda


 Making the CapacityScheduler more dynamic
 -

 Key: YARN-1707
 URL: https://issues.apache.org/jira/browse/YARN-1707
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: capacity-scheduler
 Attachments: YARN-1707.patch


 The CapacityScheduler is rather static at the moment, and refreshqueue 
 provides a rather heavy-handed way to reconfigure it. Moving towards 
 long-running services (tracked in YARN-896) and to enable more advanced 
 admission control and resource parcelling we need to make the 
 CapacityScheduler more dynamic. This is instrumental to the umbrella jira 
 YARN-1051.
 Concretely this require the following changes:
 * create queues dynamically
 * destroy queues dynamically
 * dynamically change queue parameters (e.g., capacity) 
 * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% 
 instead of == 100%
 We limit this to LeafQueues. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2014-07-28 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14077262#comment-14077262
 ] 

Wangda Tan commented on YARN-415:
-

Hi [~eepayne],
Thanks for updating your patch.
For the e2e test, I think we can do it this way: you can refer to the tests in 
TestRMRestart.
Using MockRM/MockAM can do such a test; even though it's not a complete e2e test, 
most of the logic is included in it. I suggest we cover the following cases:
{code}
* Create an app; before the AM is submitted, resource utilization should be 0
* Submit the AM; while the AM is running, we can get its resource utilization > 0
* Allocate some containers and finish them, then check the total resource utilization
* Finish the application attempt, and check the total resource utilization
* Start a new application attempt, and check that the resource utilization of the 
previous attempt is added to the total resource utilization
* Check that resource utilization can be persisted/read during RM restart
{code}
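A rough sketch of how such a test could start, following the MockRM-style utilities 
used in TestRMRestart (the class name is made up, and the exact helper signatures 
are assumptions that should be checked against the target branch):
{code}
// Rough sketch only; verify the MockRM/MockNM/MockAM helper signatures against the branch.
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.MockAM;
import org.apache.hadoop.yarn.server.resourcemanager.MockNM;
import org.apache.hadoop.yarn.server.resourcemanager.MockRM;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp;

public class TestAppResourceUtilization {   // hypothetical test class name
  public void testResourceUtilizationAcrossAttempts() throws Exception {
    MockRM rm = new MockRM(new YarnConfiguration());
    rm.start();
    MockNM nm1 = rm.registerNode("127.0.0.1:1234", 8 * 1024);

    RMApp app = rm.submitApp(1024);
    // Before the AM is launched, the app's resource utilization should be 0.

    MockAM am = MockRM.launchAndRegisterAM(app, rm, nm1);
    // While the AM is running, the reported utilization should be > 0.

    // ... allocate and finish some containers, finish the attempt, start a new attempt,
    // and finally restart the RM to check the accumulated values survive recovery ...
    rm.stop();
  }
}
{code}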
Do you have any comments on this?

Thanks,
Wangda

 Capture memory utilization at the app-level for chargeback
 --

 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp
Assignee: Andrey Klochkov
 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
 YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
 YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
 YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
 YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
 YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
 YARN-415.201407172144.txt, YARN-415.201407232237.txt, 
 YARN-415.201407242148.txt, YARN-415.201407281816.txt, YARN-415.patch


 For the purpose of chargeback, I'd like to be able to compute the cost of an
 application in terms of cluster resource usage.  To start out, I'd like to 
 get the memory utilization of an application.  The unit should be MB-seconds 
 or something similar and, from a chargeback perspective, the memory amount 
 should be the memory reserved for the application, as even if the app didn't 
 use all that memory, no one else was able to use it.
 (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
 container 2 * lifetime of container 2) + ... + (reserved ram for container n 
 * lifetime of container n)
 It'd be nice to have this at the app level instead of the job level because:
 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
 appear on the job history server).
 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
 This new metric should be available both through the RM UI and RM Web 
 Services REST API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic

2014-07-28 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14077296#comment-14077296
 ] 

Wangda Tan commented on YARN-1707:
--

Hi [~curino],
Thanks for your reply,
Regarding how the patch matches the JIRA:
Since I don't have other solid use cases in mind where anything besides 
{{ReservationSystem}} could leverage these features, I don't have a strong opinion 
on merging such dynamic behaviors into {{ParentQueue}}/{{LeafQueue}}. Let's wait 
for more feedback.
I agree that we can consider queue capacity as a weight; it will be easier for 
users to configure, and it's also a backward-compatible change (except that it 
will no longer throw an exception when the capacities of a {{ParentQueue}}'s 
children don't sum to 100).

bq. As I was mentioning in my previous comment, this is likely fine for the 
limited usage we will make of this from ReservationSystem
I think moving an application across queues is not a ReservationSystem-specific 
change. I would suggest checking that the move will not violate the restrictions 
of the target queue before performing it.

Thanks,
Wangda

 Making the CapacityScheduler more dynamic
 -

 Key: YARN-1707
 URL: https://issues.apache.org/jira/browse/YARN-1707
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: capacity-scheduler
 Attachments: YARN-1707.patch


 The CapacityScheduler is rather static at the moment, and refreshqueue 
 provides a rather heavy-handed way to reconfigure it. Moving towards 
 long-running services (tracked in YARN-896) and to enable more advanced 
 admission control and resource parcelling we need to make the 
 CapacityScheduler more dynamic. This is instrumental to the umbrella jira 
 YARN-1051.
 Concretely this requires the following changes:
 * create queues dynamically
 * destroy queues dynamically
 * dynamically change queue parameters (e.g., capacity) 
 * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% 
 instead of == 100%
 We limit this to LeafQueues. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2215) Add preemption info to REST/CLI

2014-07-28 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2215:
-

Assignee: Kenji Kikushima

 Add preemption info to REST/CLI
 ---

 Key: YARN-2215
 URL: https://issues.apache.org/jira/browse/YARN-2215
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, resourcemanager
Reporter: Wangda Tan
Assignee: Kenji Kikushima
 Attachments: YARN-2215.patch


 As discussed in YARN-2181, we'd better add preemption info to the RM RESTful 
 API/CLI so that administrators/users can better understand the preemption that 
 happened on an app/queue, etc.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2215) Add preemption info to REST/CLI

2014-07-28 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14077303#comment-14077303
 ] 

Wangda Tan commented on YARN-2215:
--

Hi [~kj-ki],
Thanks for working on this, I've assigned this JIRA to you. 
I think the fields you added should be fine. Within the scope of this JIRA, I 
think it's better to add CLI support as well. Please submit the patch to kick off 
Jenkins when you have completed it.

Wangda


 Add preemption info to REST/CLI
 ---

 Key: YARN-2215
 URL: https://issues.apache.org/jira/browse/YARN-2215
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, resourcemanager
Reporter: Wangda Tan
Assignee: Kenji Kikushima
 Attachments: YARN-2215.patch


 As discussed in YARN-2181, we'd better add preemption info to the RM RESTful 
 API/CLI so that administrators/users can better understand the preemption that 
 happened on an app/queue, etc.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure

2014-07-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080388#comment-14080388
 ] 

Wangda Tan commented on YARN-2008:
--

Hi [~cwelch],
Thanks for uploading the patch. +1 for putting isInvalidDivisor into 
{{ResourceCalculator}}. I would suggest adding some resource usage to L2Q1 in 
{{testAbsoluteMaxAvailCapacityWithUse}} and checking whether L2Q2 gets the correct 
maxAbsoluteAvailableCapacity.

Thanks,
Wangda

 CapacityScheduler may report incorrect queueMaxCap if there is hierarchy 
 queue structure 
 -

 Key: YARN-2008
 URL: https://issues.apache.org/jira/browse/YARN-2008
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Chen He
Assignee: Craig Welch
 Attachments: YARN-2008.1.patch, YARN-2008.2.patch, YARN-2008.3.patch


 If there are two queues that are both allowed to use 100% of the actual resources 
 in the cluster, and Q1 and Q2 each currently use 50% of the actual cluster's 
 resources, then there is no actual space left. With the current method of 
 computing headroom, the CapacityScheduler still thinks there are resources 
 available for users in Q1, even though they have already been used by Q2. 
 If the CapacityScheduler has a hierarchical queue structure, it may report an 
 incorrect queueMaxCap. Here is an example:
 rootQueue
 |-- L1ParentQueue1 (allowed to use up to 80% of its parent)
 |   |-- L2LeafQueue1 (50% of its parent)
 |   `-- L2LeafQueue2 (50% of its parent in minimum)
 `-- L1ParentQueue2 (allowed to use 20% in minimum of its parent)
 When we calculate the headroom of a user in L2LeafQueue2, the current method 
 assumes L2LeafQueue2 can use 40% (80% * 50%) of the actual rootQueue resources. 
 However, without checking L1ParentQueue1, we are not sure: it is possible that 
 L1ParentQueue2 has already used 40% of the rootQueue resources, in which case 
 L2LeafQueue2 can actually only use 30% (60% * 50%).
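 A toy model of the two calculations above (hypothetical Queue class, not 
 CapacityScheduler code; usage is tracked as a fraction of the cluster) reproduces 
 the 40% vs 30% numbers:
{code}
// Toy model only; not the CapacityScheduler implementation.
import java.util.ArrayList;
import java.util.List;

public class QueueMaxCapSketch {
  static class Queue {
    final Queue parent;
    final List<Queue> children = new ArrayList<Queue>();
    final float maxShareOfParent; // configured fraction of the parent
    float usedOfCluster;          // fraction of the whole cluster in use
    Queue(Queue parent, float maxShareOfParent) {
      this.parent = parent;
      this.maxShareOfParent = maxShareOfParent;
      if (parent != null) {
        parent.children.add(this);
      }
    }
  }

  /** Current method: just multiply configured shares up the chain. */
  static float naiveAbsMax(Queue q) {
    return q.parent == null ? 1.0f : q.maxShareOfParent * naiveAbsMax(q.parent);
  }

  /** Also subtract what siblings already use at every level of the hierarchy. */
  static float absMaxAvail(Queue q) {
    if (q.parent == null) {
      return 1.0f;
    }
    float parentAvail = absMaxAvail(q.parent);
    float siblingsUsed = 0f;
    for (Queue sibling : q.parent.children) {
      if (sibling != q) {
        siblingsUsed += sibling.usedOfCluster;
      }
    }
    return Math.min(parentAvail - siblingsUsed, q.maxShareOfParent * parentAvail);
  }

  public static void main(String[] args) {
    Queue root = new Queue(null, 1.0f);
    Queue l1p1 = new Queue(root, 0.8f);
    Queue l1p2 = new Queue(root, 1.0f);
    Queue l2leaf1 = new Queue(l1p1, 0.5f);
    Queue l2leaf2 = new Queue(l1p1, 0.5f);
    l1p2.usedOfCluster = 0.4f;  // L1ParentQueue2 already uses 40% of the cluster

    System.out.println(naiveAbsMax(l2leaf2));   // ~0.4  (80% * 50%)
    System.out.println(absMaxAvail(l2leaf2));   // ~0.3  (60% * 50%)
  }
}
{code}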



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure

2014-07-31 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081718#comment-14081718
 ] 

Wangda Tan commented on YARN-2008:
--

Hi [~cwelch],
I found that the patch you just updated is identical to the *.3.patch; could you 
please check?

Thanks

 CapacityScheduler may report incorrect queueMaxCap if there is hierarchy 
 queue structure 
 -

 Key: YARN-2008
 URL: https://issues.apache.org/jira/browse/YARN-2008
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Chen He
Assignee: Craig Welch
 Attachments: YARN-2008.1.patch, YARN-2008.2.patch, YARN-2008.3.patch, 
 YARN-2008.4.patch


 If there are two queues that are both allowed to use 100% of the actual resources 
 in the cluster, and Q1 and Q2 each currently use 50% of the actual cluster's 
 resources, then there is no actual space left. With the current method of 
 computing headroom, the CapacityScheduler still thinks there are resources 
 available for users in Q1, even though they have already been used by Q2. 
 If the CapacityScheduler has a hierarchical queue structure, it may report an 
 incorrect queueMaxCap. Here is an example:
 rootQueue
 |-- L1ParentQueue1 (allowed to use up to 80% of its parent)
 |   |-- L2LeafQueue1 (50% of its parent)
 |   `-- L2LeafQueue2 (50% of its parent in minimum)
 `-- L1ParentQueue2 (allowed to use 20% in minimum of its parent)
 When we calculate the headroom of a user in L2LeafQueue2, the current method 
 assumes L2LeafQueue2 can use 40% (80% * 50%) of the actual rootQueue resources. 
 However, without checking L1ParentQueue1, we are not sure: it is possible that 
 L1ParentQueue2 has already used 40% of the rootQueue resources, in which case 
 L2LeafQueue2 can actually only use 30% (60% * 50%).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure

2014-07-31 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081844#comment-14081844
 ] 

Wangda Tan commented on YARN-2008:
--

Hi [~cwelch],
Thanks for updating; the tests now cover all the cases I can think of.
A very minor comment:
Could you please add a small ε (delta) to all the float {{assertEquals}} calls, 
like the following?
bq. +assertEquals( 0.1f, result, 0.01f);

Thanks,
Wangda 



 CapacityScheduler may report incorrect queueMaxCap if there is hierarchy 
 queue structure 
 -

 Key: YARN-2008
 URL: https://issues.apache.org/jira/browse/YARN-2008
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Chen He
Assignee: Craig Welch
 Attachments: YARN-2008.1.patch, YARN-2008.2.patch, YARN-2008.3.patch, 
 YARN-2008.4.patch, YARN-2008.5.patch


 If there are two queues that are both allowed to use 100% of the actual resources 
 in the cluster, and Q1 and Q2 each currently use 50% of the actual cluster's 
 resources, then there is no actual space left. With the current method of 
 computing headroom, the CapacityScheduler still thinks there are resources 
 available for users in Q1, even though they have already been used by Q2. 
 If the CapacityScheduler has a hierarchical queue structure, it may report an 
 incorrect queueMaxCap. Here is an example:
 rootQueue
 |-- L1ParentQueue1 (allowed to use up to 80% of its parent)
 |   |-- L2LeafQueue1 (50% of its parent)
 |   `-- L2LeafQueue2 (50% of its parent in minimum)
 `-- L1ParentQueue2 (allowed to use 20% in minimum of its parent)
 When we calculate the headroom of a user in L2LeafQueue2, the current method 
 assumes L2LeafQueue2 can use 40% (80% * 50%) of the actual rootQueue resources. 
 However, without checking L1ParentQueue1, we are not sure: it is possible that 
 L1ParentQueue2 has already used 40% of the rootQueue resources, in which case 
 L2LeafQueue2 can actually only use 30% (60% * 50%).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits

2014-07-31 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081845#comment-14081845
 ] 

Wangda Tan commented on YARN-2069:
--

Hi [~mayank_bansal],
Thanks for uploading, reviewing it now.

Wangda

 CS queue level preemption should respect user-limits
 

 Key: YARN-2069
 URL: https://issues.apache.org/jira/browse/YARN-2069
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
 Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, 
 YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, 
 YARN-2069-trunk-6.patch, YARN-2069-trunk-7.patch, YARN-2069-trunk-8.patch, 
 YARN-2069-trunk-9.patch


 This is different from (even if related to, and likely shares code with) 
 YARN-2113.
 YARN-2113 focuses on making sure that even if a queue has its guaranteed 
 capacity, its individual users are treated in line with their limits 
 irrespective of when they join in.
 This JIRA is about respecting user-limits while preempting containers to 
 balance queue capacities.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure

2014-08-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14084245#comment-14084245
 ] 

Wangda Tan commented on YARN-2008:
--

Thanks [~cwelch] for updating, 
LGTM, +1

Wangda

 CapacityScheduler may report incorrect queueMaxCap if there is hierarchy 
 queue structure 
 -

 Key: YARN-2008
 URL: https://issues.apache.org/jira/browse/YARN-2008
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Chen He
Assignee: Craig Welch
 Attachments: YARN-2008.1.patch, YARN-2008.2.patch, YARN-2008.3.patch, 
 YARN-2008.4.patch, YARN-2008.5.patch, YARN-2008.6.patch, YARN-2008.7.patch


 If there are two queues that are both allowed to use 100% of the actual resources 
 in the cluster, and Q1 and Q2 each currently use 50% of the actual cluster's 
 resources, then there is no actual space left. With the current method of 
 computing headroom, the CapacityScheduler still thinks there are resources 
 available for users in Q1, even though they have already been used by Q2. 
 If the CapacityScheduler has a hierarchical queue structure, it may report an 
 incorrect queueMaxCap. Here is an example:
 rootQueue
 |-- L1ParentQueue1 (allowed to use up to 80% of its parent)
 |   |-- L2LeafQueue1 (50% of its parent)
 |   `-- L2LeafQueue2 (50% of its parent in minimum)
 `-- L1ParentQueue2 (allowed to use 20% in minimum of its parent)
 When we calculate the headroom of a user in L2LeafQueue2, the current method 
 assumes L2LeafQueue2 can use 40% (80% * 50%) of the actual rootQueue resources. 
 However, without checking L1ParentQueue1, we are not sure: it is possible that 
 L1ParentQueue2 has already used 40% of the rootQueue resources, in which case 
 L2LeafQueue2 can actually only use 30% (60% * 50%).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits

2014-08-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14084279#comment-14084279
 ] 

Wangda Tan commented on YARN-2069:
--

Hi [~mayank_bansal],
Thanks for your patience.

I've just read through your new patch. 

After #1/#2, if more resources still need to be preempted, AM containers will be 
preempted. Is that correct? Please let me know if I misread your approach.

*I think we should discuss the scope of this JIRA first; I'm a little confused 
after thinking about it.*

According to the description of this JIRA, we need to make sure of the following 
(assume we have already calculated {{target-user-limit}}):
*REQ #1:* When we consider preempting a container from user-x whose 
{{used-resource - marked-preempted-resource}} is already <= {{target-user-limit}}, 
we need to make sure that no other user in the queue still has 
{{used-resource - marked-preempted-resource}} > {{target-user-limit}}.
*REQ #2:* When we have to preempt an AM container, we need to make sure of #1 too.

And as commented by [~vinodkv]: 
https://issues.apache.org/jira/browse/YARN-2069?focusedCommentId=14064047page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14064047.
*REQ #3:* Users' resources after preemption should be as balanced as possible 
around {{target-user-limit}}.

Do you agree with these requirements? I think we should update the JIRA 
description with them once we decide.

*My understanding is that your new patch consists of two phases:*
1. {{distributePreemptionforUsers}} does preemption to enforce 
{{target-user-limit}} for each user.
2. If more resources still need to be preempted, {{distributePreemptionforUsers}} 
is called again to distribute {{resToObtain}}, giving each user 
{{resToObtain}} divided by {{#active-user}} in the queue.

I think phase-1 can enforce REQ#1, but phase-2 cannot enforce REQ#3. REQ#2 is 
also not satisfied by the patch.

Let me give you an example of why REQ#3 is not satisfied, similar to Vinod's 
example:
{code}
The queue has guaranteed resource = 30%, currently uses 60%, and we want to shrink 
it down to 40%.
Container sizes are equal, each being 3% of the cluster.
There are 5 apps in the queue, with user-limit configured to 20%. So the expected 
resources are {8%, 8%, 8%, 8%, 8%}.

Before preemption:
{15%, 9%, 12%, 12%, 12%}

It is possible that after preemption your current approach gives:
{15%, 6%, 6%, 6%, 6%} (total is 39%)
{code}

Sometimes we cannot get every user's resource exactly equal to 
{{target-user-limit}}, because the container size may not divide 
{{target-user-limit}} evenly. But we can do better, as in the following example:
{code}
After preemption:
{9%, 9%, 9%, 6%, 6%} (total is 39%)
{code}

The imbalance is caused by the accumulated bias I mentioned in my comment: 
https://issues.apache.org/jira/browse/YARN-2069?focusedCommentId=14074249page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14074249
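
To make REQ#3 concrete, here is a tiny standalone sketch (not the actual 
preemption policy code) of a greedy rule that always preempts the next container 
from the currently largest user; on the numbers above it ends at 
{9%, 9%, 9%, 6%, 6%} rather than {15%, 6%, 6%, 6%, 6%}:
{code}
// Standalone illustration only; not the actual preemption policy code.
import java.util.Arrays;

public class BalancedPreemptionSketch {
  public static void main(String[] args) {
    double[] userUsage = {15, 9, 12, 12, 12}; // % of the cluster used per user
    double containerSize = 3;                 // every container is 3% here
    double queueTarget = 40;                  // shrink the queue from 60% to 40%

    double total = 0;
    for (double u : userUsage) {
      total += u;
    }
    while (total > queueTarget) {
      // Always mark a container of the currently largest user for preemption,
      // so no user is pushed far below the limit while another stays far above.
      int largest = 0;
      for (int i = 1; i < userUsage.length; i++) {
        if (userUsage[i] > userUsage[largest]) {
          largest = i;
        }
      }
      userUsage[largest] -= containerSize;
      total -= containerSize;
    }
    // Prints [6.0, 6.0, 9.0, 9.0, 9.0] with total 39%: balanced around the 8% target.
    System.out.println(Arrays.toString(userUsage));
  }
}
{code}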


Thanks,
Wangda

 CS queue level preemption should respect user-limits
 

 Key: YARN-2069
 URL: https://issues.apache.org/jira/browse/YARN-2069
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
 Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, 
 YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, 
 YARN-2069-trunk-6.patch, YARN-2069-trunk-7.patch, YARN-2069-trunk-8.patch, 
 YARN-2069-trunk-9.patch


 This is different from (even if related to, and likely shares code with) 
 YARN-2113.
 YARN-2113 focuses on making sure that even if a queue has its guaranteed 
 capacity, its individual users are treated in line with their limits 
 irrespective of when they join in.
 This JIRA is about respecting user-limits while preempting containers to 
 balance queue capacities.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler

2014-08-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087723#comment-14087723
 ] 

Wangda Tan commented on YARN-2378:
--


Hi [~subru],
Thanks for uploading patch, I took a look at your patch.

As mentioned by [~vvasudev], there's another JIRA (YARN-2248) related to 
moving. I think the two JIRAs have different advantages, and I hope you can decide 
how to merge your work:
- YARN-2378 covers the RMApp-related changes, which should be done while moving
- YARN-2248 covers more tests for queue-metrics.

I think another major difference is that YARN-2248 checks queue capacity before 
moving while YARN-2378 does not. I had an offline discussion with [~curino] about 
this; here is what he said:
{code}
Imagine I have a busy cluster and want to migrate apps from queue A to queue B. 
Since we do not provide any transactional semantics from the CLI, it would be 
quite hard to make sure I can move an app (even if I kill everything in queue B 
and then invoke move A -> B, more apps might show up and crowd the target queue B 
before I can successfully move). Having move be more sturdy and succeed right 
away, and enhancing preemption (if needed) to repair invariants, seems a better 
option in this scenario.
I think preemption would already enforce max capacity; other active JIRAs should 
deal with user-limit as well.
More generally I think eventually preemption can be our universal 
rebalancer/enforcer, allowing us to play a bit more fast and loose with 
move/resizing of queues.
{code}
I agree with this. Another example is that when refreshing queue capacities, some 
queues may be shrunk to below their guaranteed/used resources. We will not stop 
such a queue refresh, and preemption will also take care of this.

Some other comments about YARN-2378:
1) I think we should implement the state-store update in the move transition:
{code}
  // TODO: Write out change to state store (YARN-1558)
  // Also take care of RM failover
  moveEvent.getResult().set(null);
{code}

2) There are lots of test failures; I'm afraid it broke some major logic, could 
you please check?

I will include the test review in the next iteration.

Thanks,
Wangda

 Adding support for moving apps between queues in Capacity Scheduler
 ---

 Key: YARN-2378
 URL: https://issues.apache.org/jira/browse/YARN-2378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Subramaniam Venkatraman Krishnan
Assignee: Subramaniam Venkatraman Krishnan
  Labels: capacity-scheduler
 Attachments: YARN-2378.patch


 As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 
 to smaller patches for manageability. This JIRA will address adding support 
 for moving apps between queues in Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2248) Capacity Scheduler changes for moving apps between queues

2014-08-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088912#comment-14088912
 ] 

Wangda Tan commented on YARN-2248:
--

[~keyki], I agree we should get move-app committed in 2.6.0.

 Capacity Scheduler changes for moving apps between queues
 -

 Key: YARN-2248
 URL: https://issues.apache.org/jira/browse/YARN-2248
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Janos Matyas
Assignee: Janos Matyas
 Fix For: 2.6.0

 Attachments: YARN-2248-1.patch, YARN-2248-2.patch, YARN-2248-3.patch


 We would like to have the capability (same as the Fair Scheduler has) to move 
 applications between queues. 
 We have made a baseline implementation and tests to start with - and we would 
 like the community to review, come up with suggestions and finally have this 
 contributed. 
 The current implementation is available for 2.4.1 - so the first thing is 
 that we'd need to identify the target version as there are differences 
 between 2.4.* and 3.* interfaces.
 The story behind is available at 
 http://blog.sequenceiq.com/blog/2014/07/02/move-applications-between-queues/ 
 and the baseline implementation and test at:
 https://github.com/sequenceiq/hadoop-common/blob/branch-2.4.1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/a/ExtendedCapacityScheduler.java#L924
 https://github.com/sequenceiq/hadoop-common/blob/branch-2.4.1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/a/TestExtendedCapacitySchedulerAppMove.java



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2249) RM may receive container release request on AM resync before container is actually recovered

2014-08-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088965#comment-14088965
 ] 

Wangda Tan commented on YARN-2249:
--

Hi [~jianhe],
Thanks for working on the patch,
I've read your patch, several comments/questions

1) I haven't followed the work-preserving restart discussions for a long time. How 
does the current RM handle this problem: after the RM restarts it starts 
allocating resources, then an NM reports a container to recover, but there's no 
resource available in the node/queue?
I remember we discussed this topic while you were working on YARN-1368: the RM 
will not allocate new resources for x seconds after restart so that NMs can 
reconnect and recover their containers. If you chose that approach, we can cache 
outstanding container release requests until x seconds after restart have elapsed.
Also, could you elaborate on why you use the NM liveness expiry time? Can we 
improve this?

2) It seems to me that using
{code}
+this.pendingRelease =
+CacheBuilder.newBuilder().expireAfterWrite
{code}
is not good enough, because it will cache every release request from the AM. 
Actually, we only need to cache release requests for a period of time after the AM 
reconnects to the RM. After that time, the release logic should behave as before.
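
A rough sketch of that alternative (hypothetical class using plain JDK collections 
instead of the patch's Guava cache; the window length would come from 
configuration):
{code}
// Hypothetical sketch of "only cache during a recovery window", not the patch itself.
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class PendingReleaseSketch {
  private final Set<String> pendingRelease =
      Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
  private final long recoveryWindowMs;
  private final long recoveryStartMs = System.currentTimeMillis();

  public PendingReleaseSketch(long recoveryWindowMs) {
    this.recoveryWindowMs = recoveryWindowMs;
  }

  private boolean inRecoveryWindow() {
    return System.currentTimeMillis() - recoveryStartMs < recoveryWindowMs;
  }

  /** An AM asked to release a container the scheduler does not know about yet. */
  public void onUnknownRelease(String containerId) {
    if (inRecoveryWindow()) {
      // The container may still be recovered from a pending NM report.
      pendingRelease.add(containerId);
    } else {
      // Outside the window, behave as before: treat it as a stale request.
      System.out.println("Ignoring release of unknown container " + containerId);
    }
  }

  /** An NM report just recovered this container; honour a deferred release. */
  public boolean shouldReleaseOnRecovery(String containerId) {
    return pendingRelease.remove(containerId);
  }
}
{code}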

3) I think we shouldn't {{logFailure}} when the rmContainer is not found in this 
case. IMHO, we should {{logFailure}} when the release request is evicted from the 
cache instead.

4) We should notify the AM with a container-completed message when we decide not 
to recover a container, and we should add this to the test as well.

5) Test:
Can we wait for a specific state instead of {{Thread.sleep(3000);}}?

Thanks,
Wangda

 RM may receive container release request on AM resync before container is 
 actually recovered
 

 Key: YARN-2249
 URL: https://issues.apache.org/jira/browse/YARN-2249
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2249.1.patch, YARN-2249.1.patch


 AM resync on RM restart will send outstanding container release requests back 
 to the new RM. In the meantime, NMs report the container statuses back to RM 
 to recover the containers. If RM receives the container release request  
 before the container is actually recovered in scheduler, the container won't 
 be released and the release request will be lost.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2014-08-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089002#comment-14089002
 ] 

Wangda Tan commented on YARN-415:
-

Hi Eric,
Thanks for your hard work adding these e2e tests.

bq. However, I had trouble setting up a test with more than one attempt for the 
same app. I think I covered the rest.
I suggest you refer to 
{{TestAMRestart#testAMRestartWithExistingContainers}} as an example. Please let 
me know if you still have problems setting up the multi-attempt test.

Some minor suggestions:
1)
{code}
+  private final static File TEMP_DIR = new File(System.getProperty(
+  "test.build.data", "/tmp"), "decommision");
{code}
I think this isn't used by the test.

2)
bq. +Assert.assertTrue(YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS > 1);
We don't need this assert.

3)
bq. +System.out.println("EEP 001");
It's better to remove such personal debug info.

4)
{code}
+conf.setInt(YarnConfiguration.RM_AM_MAX_ATTEMPTS,
+YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS);
{code}
It's better to put such logic into {{setup}}.

Also, please update your patch against trunk.

Thanks,
Wangda


 Capture memory utilization at the app-level for chargeback
 --

 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp
Assignee: Andrey Klochkov
 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
 YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
 YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
 YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
 YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
 YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
 YARN-415.201407172144.txt, YARN-415.201407232237.txt, 
 YARN-415.201407242148.txt, YARN-415.201407281816.txt, 
 YARN-415.201408062232.txt, YARN-415.patch


 For the purpose of chargeback, I'd like to be able to compute the cost of an
 application in terms of cluster resource usage.  To start out, I'd like to 
 get the memory utilization of an application.  The unit should be MB-seconds 
 or something similar and, from a chargeback perspective, the memory amount 
 should be the memory reserved for the application, as even if the app didn't 
 use all that memory, no one else was able to use it.
 (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
 container 2 * lifetime of container 2) + ... + (reserved ram for container n 
 * lifetime of container n)
 It'd be nice to have this at the app level instead of the job level because:
 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
 appear on the job history server).
 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
 This new metric should be available both through the RM UI and RM Web 
 Services REST API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-807) When querying apps by queue, iterating over all apps is inefficient and limiting

2014-08-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090098#comment-14090098
 ] 

Wangda Tan commented on YARN-807:
-

Hi [~sandyr],
While reading the following comment on YARN-2385: 
https://issues.apache.org/jira/browse/YARN-2385?focusedCommentId=14089936page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14089936.
I found that the difference between the CapacityScheduler and FairScheduler 
getAppsInQueue behavior was introduced by this patch: FairScheduler returns all 
apps, while CapacityScheduler only returns active apps. Was there any special 
consideration for making these different? Do you think it would be fine to change 
CapacityScheduler's behavior to return active+pending apps?

Hope to get your thoughts on this :)

Thanks,
Wangda

 When querying apps by queue, iterating over all apps is inefficient and 
 limiting 
 -

 Key: YARN-807
 URL: https://issues.apache.org/jira/browse/YARN-807
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.3.0

 Attachments: YARN-807-1.patch, YARN-807-2.patch, YARN-807-3.patch, 
 YARN-807-4.patch, YARN-807.patch


 The question which apps are in queue x can be asked via the RM REST APIs, 
 through the ClientRMService, and through the command line.  In all these 
 cases, the question is answered by scanning through every RMApp and filtering 
 by the app's queue name.
 All schedulers maintain a mapping of queues to applications.  I think it 
 would make more sense to ask the schedulers which applications are in a given 
 queue. This is what was done in MR1. This would also have the advantage of 
 allowing a parent queue to return all the applications on leaf queues under 
 it, and allow queue name aliases, as in the way that root.default and 
 default refer to the same queue in the fair scheduler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2385) Adding support for listing all applications in a queue

2014-08-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090100#comment-14090100
 ] 

Wangda Tan commented on YARN-2385:
--

Hi Subru,
I've commented on YARN-807, 
https://issues.apache.org/jira/browse/YARN-807?focusedCommentId=14090098page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14090098
 about this.
I hope we can get some suggestions from [~sandyr] as well.

Thanks,
Wangda

 Adding support for listing all applications in a queue
 --

 Key: YARN-2385
 URL: https://issues.apache.org/jira/browse/YARN-2385
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler
Reporter: Subramaniam Venkatraman Krishnan
Assignee: Karthik Kambatla
  Labels: abstractyarnscheduler

 This JIRA proposes adding a method in AbstractYarnScheduler to get all the 
 pending/active applications. Fair scheduler already supports moving a single 
 application from one queue to another. Support for the same is being added to 
 Capacity Scheduler as part of YARN-2378 and YARN-2248. So with the addition 
 of this method, we can transparently add support for moving all applications 
 from source queue to target queue and draining a queue, i.e. killing all 
 applications in a queue as proposed by YARN-2389



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-807) When querying apps by queue, iterating over all apps is inefficient and limiting

2014-08-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090183#comment-14090183
 ] 

Wangda Tan commented on YARN-807:
-

Hi [~sandyr],
Thanks for your comment. If you think it's a bug, we can resolve it in 
YARN-2385.
If the desirable behavior of this patch is to have running/completed apps returned 
when querying by queue, we might need to check all RMApps in RMContext, because 
apps are removed from the scheduler after they complete. We may need to create a 
Map<queue-name, app-id> in RMContext.

Do you think that is a doable approach?

Thanks,
Wangda

 When querying apps by queue, iterating over all apps is inefficient and 
 limiting 
 -

 Key: YARN-807
 URL: https://issues.apache.org/jira/browse/YARN-807
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.3.0

 Attachments: YARN-807-1.patch, YARN-807-2.patch, YARN-807-3.patch, 
 YARN-807-4.patch, YARN-807.patch


 The question which apps are in queue x can be asked via the RM REST APIs, 
 through the ClientRMService, and through the command line.  In all these 
 cases, the question is answered by scanning through every RMApp and filtering 
 by the app's queue name.
 All schedulers maintain a mapping of queues to applications.  I think it 
 would make more sense to ask the schedulers which applications are in a given 
 queue. This is what was done in MR1. This would also have the advantage of 
 allowing a parent queue to return all the applications on leaf queues under 
 it, and allow queue name aliases, as in the way that root.default and 
 default refer to the same queue in the fair scheduler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2014-08-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090250#comment-14090250
 ] 

Wangda Tan commented on YARN-415:
-

Hi [~eepayne],
It's great to have so many cleanups in your new patch; it almost looks good to me.
One minor comment:
I found that {{testUsageAfterAMRestartWithMultipleContainers}} and 
{{testUsageAfterAMRestartKeepContainers}} are very similar. Could you find a way 
to create a common test method for them, with the only difference being a boolean 
keepContainers parameter?
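
Something along these lines (a skeleton with hypothetical names; the shared body 
is elided):
{code}
// Skeleton only: the two near-identical tests delegate to one shared helper.
import org.junit.Test;

public class TestAppResourceUsageSketch {

  @Test
  public void testUsageAfterAMRestartWithMultipleContainers() throws Exception {
    verifyUsageAfterAMRestart(false);  // containers are not kept across AM restart
  }

  @Test
  public void testUsageAfterAMRestartKeepContainers() throws Exception {
    verifyUsageAfterAMRestart(true);   // work-preserving AM restart
  }

  private void verifyUsageAfterAMRestart(boolean keepContainers) throws Exception {
    // Shared body: start the RM and AM, allocate containers, restart the AM with
    // or without keepContainers, then assert the app-level memory-seconds total.
    // (Details elided in this sketch.)
  }
}
{code}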

Thanks,
Wangda

 Capture memory utilization at the app-level for chargeback
 --

 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp
Assignee: Andrey Klochkov
 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
 YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
 YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
 YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
 YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
 YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
 YARN-415.201407172144.txt, YARN-415.201407232237.txt, 
 YARN-415.201407242148.txt, YARN-415.201407281816.txt, 
 YARN-415.201408062232.txt, YARN-415.201408080204.txt, YARN-415.patch


 For the purpose of chargeback, I'd like to be able to compute the cost of an
 application in terms of cluster resource usage.  To start out, I'd like to 
 get the memory utilization of an application.  The unit should be MB-seconds 
 or something similar and, from a chargeback perspective, the memory amount 
 should be the memory reserved for the application, as even if the app didn't 
 use all that memory, no one else was able to use it.
 (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
 container 2 * lifetime of container 2) + ... + (reserved ram for container n 
 * lifetime of container n)
 It'd be nice to have this at the app level instead of the job level because:
 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
 appear on the job history server).
 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
 This new metric should be available both through the RM UI and RM Web 
 Services REST API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler

2014-08-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090314#comment-14090314
 ] 

Wangda Tan commented on YARN-2378:
--

Hi [~subru],
It's good to have more tests from YARN-2248; I think it covers most cases now.
Several comments about the tests:
1) testMoveAppForMoveToQueueCannotRunApp:
I think the name may not be precise enough. Actually, it means moving an app 
from a small queue (which cannot allocate more resources) to a larger queue. I 
suggest changing the name.
And the comment is incorrect:
{code}
+// task_0_0 task_1_0 allocated, used=4G 
+nodeUpdate(nm_0); 
{code}
The used resource should be 2G here.

2) testMoveAllApps:
bq. +Thread.sleep(100);
I don't think we need the sleep here, since moveApplication is a synchronized call.

Thanks,
Wangda

 Adding support for moving apps between queues in Capacity Scheduler
 ---

 Key: YARN-2378
 URL: https://issues.apache.org/jira/browse/YARN-2378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Subramaniam Venkatraman Krishnan
Assignee: Subramaniam Venkatraman Krishnan
  Labels: capacity-scheduler
 Attachments: YARN-2378.patch, YARN-2378.patch


 As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 
 to smaller patches for manageability. This JIRA will address adding support 
 for moving apps between queues in Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2249) RM may receive container release request on AM resync before container is actually recovered

2014-08-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090318#comment-14090318
 ] 

Wangda Tan commented on YARN-2249:
--

Hi [~jianhe],
Thanks for the update; several minor comments:

AbstractYarnScheduler.java
1.
{code}
+  private final Object object = new Object();
{code}
Please change this to a more meaningful name.

2.
{code}
+}, yarnConf.getInt(YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS,
+  YarnConfiguration.DEFAULT_RM_NM_EXPIRY_INTERVAL_MS));
{code}
I found this is used several times; it's better to make it a member field of AYS.
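
For example, something like the following (sketch only; the field/method names are 
hypothetical, and the configuration keys are the ones quoted above):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch only; not the actual AbstractYarnScheduler code.
public abstract class SchedulerExpirySketch {
  /** Read once during init and reused, instead of repeating the conf lookup. */
  protected long nmExpireIntervalMs;

  protected void initNMExpiryInterval(Configuration conf) {
    nmExpireIntervalMs = conf.getInt(YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS,
        YarnConfiguration.DEFAULT_RM_NM_EXPIRY_INTERVAL_MS);
  }
}
{code}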

Thanks,
Wangda

 RM may receive container release request on AM resync before container is 
 actually recovered
 

 Key: YARN-2249
 URL: https://issues.apache.org/jira/browse/YARN-2249
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2249.1.patch, YARN-2249.1.patch, YARN-2249.2.patch, 
 YARN-2249.2.patch, YARN-2249.3.patch


 AM resync on RM restart will send outstanding container release requests back 
 to the new RM. In the meantime, NMs report the container statuses back to RM 
 to recover the containers. If RM receives the container release request  
 before the container is actually recovered in scheduler, the container won't 
 be released and the release request will be lost.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-807) When querying apps by queue, iterating over all apps is inefficient and limiting

2014-08-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090440#comment-14090440
 ] 

Wangda Tan commented on YARN-807:
-

bq. It's also worth considering only holding this map for completed 
applications, so we don't need to keep two maps for running applications.

I suggest we do it this way:
1) Rename the scheduler-side getAppsInQueue to getRunningAppsInQueue.
2) Create a Map<Queue-name, Set<App-ID>> in RMContext; it will contain 
completed/running apps. The benefit of storing them this way is that we don't need 
to query two places when a client wants to get the applications, and the 
scheduler-side getRunningAppsInQueue can still be used when we need to query the 
running apps in a queue, as in YARN-2378.
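
A standalone sketch of such an index (hypothetical class, string IDs for brevity; 
not RMContext itself):
{code}
// Hypothetical sketch of a queue -> applications index kept outside the scheduler.
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class QueueAppIndexSketch {
  private final ConcurrentMap<String, Set<String>> appsByQueue =
      new ConcurrentHashMap<String, Set<String>>();

  /** Record that an application (running or completed) belongs to a queue. */
  public void add(String queueName, String applicationId) {
    Set<String> apps = appsByQueue.get(queueName);
    if (apps == null) {
      Set<String> fresh =
          Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
      Set<String> existing = appsByQueue.putIfAbsent(queueName, fresh);
      apps = existing != null ? existing : fresh;
    }
    apps.add(applicationId);
  }

  /** Answer "which apps are in queue X" without scanning every RMApp. */
  public Set<String> getAppsInQueue(String queueName) {
    Set<String> apps = appsByQueue.get(queueName);
    return apps == null ? Collections.<String>emptySet() : apps;
  }
}
{code}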

Thanks,
Wangda

 When querying apps by queue, iterating over all apps is inefficient and 
 limiting 
 -

 Key: YARN-807
 URL: https://issues.apache.org/jira/browse/YARN-807
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.3.0

 Attachments: YARN-807-1.patch, YARN-807-2.patch, YARN-807-3.patch, 
 YARN-807-4.patch, YARN-807.patch


 The question which apps are in queue x can be asked via the RM REST APIs, 
 through the ClientRMService, and through the command line.  In all these 
 cases, the question is answered by scanning through every RMApp and filtering 
 by the app's queue name.
 All schedulers maintain a mapping of queues to applications.  I think it 
 would make more sense to ask the schedulers which applications are in a given 
 queue. This is what was done in MR1. This would also have the advantage of 
 allowing a parent queue to return all the applications on leaf queues under 
 it, and allow queue name aliases, as in the way that root.default and 
 default refer to the same queue in the fair scheduler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-807) When querying apps by queue, iterating over all apps is inefficient and limiting

2014-08-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090455#comment-14090455
 ] 

Wangda Tan commented on YARN-807:
-

Hi Sandy, 
Thanks for your elaboration. As you said, I agree we need to go through the 
scheduler, given the two capabilities you mentioned.
Maybe a possible way is saving completed apps in the leaf queue, as you mentioned. 
I remember YARN currently evicts some apps when the total number of apps exceeds a 
configured number (like 10,000); we should do the same eviction for completed apps 
in the leaf queue as well.

 When querying apps by queue, iterating over all apps is inefficient and 
 limiting 
 -

 Key: YARN-807
 URL: https://issues.apache.org/jira/browse/YARN-807
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Fix For: 2.3.0

 Attachments: YARN-807-1.patch, YARN-807-2.patch, YARN-807-3.patch, 
 YARN-807-4.patch, YARN-807.patch


 The question which apps are in queue x can be asked via the RM REST APIs, 
 through the ClientRMService, and through the command line.  In all these 
 cases, the question is answered by scanning through every RMApp and filtering 
 by the app's queue name.
 All schedulers maintain a mapping of queues to applications.  I think it 
 would make more sense to ask the schedulers which applications are in a given 
 queue. This is what was done in MR1. This would also have the advantage of 
 allowing a parent queue to return all the applications on leaf queues under 
 it, and allow queue name aliases, as in the way that root.default and 
 default refer to the same queue in the fair scheduler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2249) RM may receive container release request on AM resync before container is actually recovered

2014-08-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14091658#comment-14091658
 ] 

Wangda Tan commented on YARN-2249:
--

Jian, thanks for the update.
My last comment: could you rename {{mutex}} to {{pendingReleaseMutex}} or 
something similar?

Wangda

 RM may receive container release request on AM resync before container is 
 actually recovered
 

 Key: YARN-2249
 URL: https://issues.apache.org/jira/browse/YARN-2249
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2249.1.patch, YARN-2249.1.patch, YARN-2249.2.patch, 
 YARN-2249.2.patch, YARN-2249.3.patch, YARN-2249.4.patch


 AM resync on RM restart will send outstanding container release requests back 
 to the new RM. In the meantime, NMs report the container statuses back to RM 
 to recover the containers. If RM receives the container release request  
 before the container is actually recovered in scheduler, the container won't 
 be released and the release request will be lost.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed

2014-08-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092301#comment-14092301
 ] 

Wangda Tan commented on YARN-2308:
--

[~lichangleo],
Thanks for working on this.
I took a quick scan of your patch; I think the general approach should be fine. 
Some minor suggestions:
1) 
{code}
+if (application==null) {
+  LOG.info("can't retireve application attempt");
+  return;
+}
{code}
Please leave a space before and after ==, and use LOG.error instead of LOG.info.
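
That is, something like this (sketch only; the surrounding variable names are 
assumed from context, not taken from the patch):
{code}
// Sketch of the suggested form, not the final patch.
if (application == null) {
  LOG.error("Can't retrieve application attempt " + applicationAttemptId
      + "; the application's queue may have been removed");
  return;
}
{code}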

2) Test code
2.1
bq. +System.out.println("testing queue change!!!");
Please remove this.

2.2
{code}
+conf.setBoolean(CapacitySchedulerConfiguration.ENABLE_USER_METRICS, true);
+conf.set(CapacitySchedulerConfiguration.RESOURCE_CALCULATOR_CLASS,
{code}
We may not need these either.

2.3
{code}
+// clear queue metrics
+rm1.clearQueueMetrics(app1);
{code}
Also this

2.4
It's better to wait for and check the app state transition to FAILED after it is 
rejected.

2.5
I think this isn't a work-preserving-restart-specific problem; it's better 
to place the test in TestRMRestart.

Please let me know if you have any comments on these.

Thanks,
Wangda




 NPE happened when RM restart after CapacityScheduler queue configuration 
 changed 
 -

 Key: YARN-2308
 URL: https://issues.apache.org/jira/browse/YARN-2308
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: chang li
Priority: Critical
 Attachments: jira2308.patch


 I encountered an NPE when the RM restarted:
 {code}
 2014-07-16 07:22:46,957 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:744)
 {code}
 And the RM then fails to restart.
 This is caused by the queue configuration change: I removed some queues and 
 added new queues. So when the RM restarts, it tries to recover historical 
 applications, and when the queue of any of these applications has been removed, 
 an NPE is raised.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2014-08-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092303#comment-14092303
 ] 

Wangda Tan commented on YARN-415:
-

[~eepayne],
bq. I created a common method that both of these call.
Thanks!

bq. I also noticed that testUsageWithMultipleContainers was doing similar 
things to testUsageAfterRMRestart, so I combined them both into 
testUsageWithMultipleContainersAndRMRestart.
Good catch!

I don't have further comments, but could you please check the test failure above?

Thanks,
Wangda

 Capture memory utilization at the app-level for chargeback
 --

 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp
Assignee: Andrey Klochkov
 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
 YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
 YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
 YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
 YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
 YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
 YARN-415.201407172144.txt, YARN-415.201407232237.txt, 
 YARN-415.201407242148.txt, YARN-415.201407281816.txt, 
 YARN-415.201408062232.txt, YARN-415.201408080204.txt, 
 YARN-415.201408092006.txt, YARN-415.patch


 For the purpose of chargeback, I'd like to be able to compute the cost of an
 application in terms of cluster resource usage.  To start out, I'd like to 
 get the memory utilization of an application.  The unit should be MB-seconds 
 or something similar and, from a chargeback perspective, the memory amount 
 should be the memory reserved for the application, as even if the app didn't 
 use all that memory, no one else was able to use it.
 (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
 container 2 * lifetime of container 2) + ... + (reserved ram for container n 
 * lifetime of container n)
 It'd be nice to have this at the app level instead of the job level because:
 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
 appear on the job history server).
 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
 This new metric should be available both through the RM UI and RM Web 
 Services REST API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler

2014-08-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093695#comment-14093695
 ] 

Wangda Tan commented on YARN-2378:
--

[~subru],
I ran the previously failed test locally and it passed, which matches the latest 
Jenkins result.
LGTM, +1.
[~jianhe], would you like to take a look at this?

Thanks,
Wangda

 Adding support for moving apps between queues in Capacity Scheduler
 ---

 Key: YARN-2378
 URL: https://issues.apache.org/jira/browse/YARN-2378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Subramaniam Venkatraman Krishnan
Assignee: Subramaniam Venkatraman Krishnan
  Labels: capacity-scheduler
 Attachments: YARN-2378.patch, YARN-2378.patch, YARN-2378.patch


 As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 
 to smaller patches for manageability. This JIRA will address adding support 
 for moving apps between queues in Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2014-08-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093696#comment-14093696
 ] 

Wangda Tan commented on YARN-415:
-

[~jianhe], would you like to take a look at it?

 Capture memory utilization at the app-level for chargeback
 --

 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp
Assignee: Andrey Klochkov
 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
 YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
 YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
 YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
 YARN-415.201406262136.txt, YARN-415.201407042037.txt, 
 YARN-415.201407071542.txt, YARN-415.201407171553.txt, 
 YARN-415.201407172144.txt, YARN-415.201407232237.txt, 
 YARN-415.201407242148.txt, YARN-415.201407281816.txt, 
 YARN-415.201408062232.txt, YARN-415.201408080204.txt, 
 YARN-415.201408092006.txt, YARN-415.patch


 For the purpose of chargeback, I'd like to be able to compute the cost of an
 application in terms of cluster resource usage.  To start out, I'd like to 
 get the memory utilization of an application.  The unit should be MB-seconds 
 or something similar and, from a chargeback perspective, the memory amount 
 should be the memory reserved for the application, as even if the app didn't 
 use all that memory, no one else was able to use it.
 (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
 container 2 * lifetime of container 2) + ... + (reserved ram for container n 
 * lifetime of container n)
 It'd be nice to have this at the app level instead of the job level because:
 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
 appear on the job history server).
 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
 This new metric should be available both through the RM UI and RM Web 
 Services REST API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed

2014-08-11 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093714#comment-14093714
 ] 

Wangda Tan commented on YARN-2308:
--

[~lichangleo],
Thanks for updating. I think the following line is not necessary:
bq. +conf.setBoolean(YarnConfiguration.RM_WORK_PRESERVING_RECOVERY_ENABLED, 
true);
I just tried it locally; removing it should be fine. Besides this, LGTM, +1.

[~zjshen], could you take a look at this?

Thanks,
Wangda

 NPE happened when RM restart after CapacityScheduler queue configuration 
 changed 
 -

 Key: YARN-2308
 URL: https://issues.apache.org/jira/browse/YARN-2308
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: chang li
Priority: Critical
 Attachments: jira2308.patch, jira2308.patch, jira2308.patch


 I encountered an NPE when the RM restarted:
 {code}
 2014-07-16 07:22:46,957 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:744)
 {code}
 And the RM will fail to restart.
 This is caused by a queue configuration change: I removed some queues and 
 added new queues. So when the RM restarts, it tries to recover the previously 
 submitted applications, and when the queue of any of these applications has 
 been removed, an NPE will be raised.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created

2014-08-12 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095039#comment-14095039
 ] 

Wangda Tan commented on YARN-2414:
--

Assigned it to myself, will post a patch soon

 RM web UI: app page will crash if app is failed before any attempt has been 
 created
 ---

 Key: YARN-2414
 URL: https://issues.apache.org/jira/browse/YARN-2414
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Zhijie Shen
Assignee: Wangda Tan

 {code}
 2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: /cluster/app/application_1407887030038_0001
 java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
   at 
 com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
   at 
 com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
   at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
   at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at org.mortbay.jetty.Server.handle(Server.java:326)
   at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
   at 
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
   at 
 org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
   at 
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
 Caused by: java.lang.NullPointerException
   at 
 

[jira] [Assigned] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created

2014-08-12 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reassigned YARN-2414:


Assignee: Wangda Tan

 RM web UI: app page will crash if app is failed before any attempt has been 
 created
 ---

 Key: YARN-2414
 URL: https://issues.apache.org/jira/browse/YARN-2414
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Zhijie Shen
Assignee: Wangda Tan

 {code}
 2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: /cluster/app/application_1407887030038_0001
 java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
   at 
 com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
   at 
 com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
   at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
   at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at org.mortbay.jetty.Server.handle(Server.java:326)
   at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
   at 
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
   at 
 org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
   at 
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
 Caused by: java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.AppBlock.render(AppBlock.java:116)
   at 
 

[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed

2014-08-12 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095050#comment-14095050
 ] 

Wangda Tan commented on YARN-2308:
--

bq. I think we should catch exception in following code and return Failed 
directly.
Currently the CapacityScheduler will create an AppRejectedEvent when the queue 
is not found during recovery or submission:
{code}
if (queue == null) {
  String message = "Application " + applicationId +
      " submitted by user " + user + " to unknown queue: " + queueName;
  this.rmContext.getDispatcher().getEventHandler()
  .handle(new RMAppRejectedEvent(applicationId, message));
  return;
}
{code}
We cannot catch exception here, because now exception throw:
{code}
  // Add application to scheduler synchronously to guarantee scheduler
  // knows applications before AM or NM re-registers.
  app.scheduler.handle(new AppAddedSchedulerEvent(app.applicationId,
app.submissionContext.getQueue(), app.user, true));
{code}

bq. That's what I meant. RMApp can choose to enter FAILED state directly and no 
need to add attempt any more.
It will not add an attempt here, because the app will get rejected directly.

bq. RM_WORK_PRESERVING_RECOVERY_ENABLED=true reflects the failure case in the 
description, but I'm wondering why RM_WORK_PRESERVING_RECOVERY_ENABLED=false, 
the test is going to fail. App will anyway be rejected, won't it?
I've tried this locally again, and it passes. Setting 
RM_WORK_PRESERVING_RECOVERY_ENABLED=false is enough to cover what we want to 
verify.

 NPE happened when RM restart after CapacityScheduler queue configuration 
 changed 
 -

 Key: YARN-2308
 URL: https://issues.apache.org/jira/browse/YARN-2308
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: chang li
Priority: Critical
 Attachments: jira2308.patch, jira2308.patch, jira2308.patch


 I encountered an NPE when RM restarts
 {code}
 2014-07-16 07:22:46,957 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:744)
 {code}
 And the RM will fail to restart.
 This is caused by a queue configuration change: I removed some queues and 
 added new queues. So when the RM restarts, it tries to recover the previously 
 submitted applications, and when the queue of any of these applications has 
 been removed, an NPE will be raised.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed

2014-08-12 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095055#comment-14095055
 ] 

Wangda Tan commented on YARN-2308:
--

Typo:
bq. We cannot catch exception here, because now exception throw:
Should be We cannot catch exception here, because *no* exception throw:


 NPE happened when RM restart after CapacityScheduler queue configuration 
 changed 
 -

 Key: YARN-2308
 URL: https://issues.apache.org/jira/browse/YARN-2308
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 2.6.0
Reporter: Wangda Tan
Assignee: chang li
Priority: Critical
 Attachments: jira2308.patch, jira2308.patch, jira2308.patch


 I encountered an NPE when RM restarts
 {code}
 2014-07-16 07:22:46,957 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ATTEMPT_ADDED to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
 at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
 at java.lang.Thread.run(Thread.java:744)
 {code}
 And the RM will fail to restart.
 This is caused by a queue configuration change: I removed some queues and 
 added new queues. So when the RM restarts, it tries to recover the previously 
 submitted applications, and when the queue of any of these applications has 
 been removed, an NPE will be raised.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2385) Adding support for listing all applications in a queue

2014-08-13 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096329#comment-14096329
 ] 

Wangda Tan commented on YARN-2385:
--

After thinking about it, I think we might not need to maintain completed apps 
in CS and Fair. Maintaining such fields is not a responsibility the scheduler 
was originally designed for.
And for now, users can get completed containers via the REST API, which should 
cover most use cases. 


 Adding support for listing all applications in a queue
 --

 Key: YARN-2385
 URL: https://issues.apache.org/jira/browse/YARN-2385
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler
Reporter: Subramaniam Venkatraman Krishnan
Assignee: Karthik Kambatla
  Labels: abstractyarnscheduler

 This JIRA proposes adding a method in AbstractYarnScheduler to get all the 
 pending/active applications. Fair scheduler already supports moving a single 
 application from one queue to another. Support for the same is being added to 
 Capacity Scheduler as part of YARN-2378 and YARN-2248. So with the addition 
 of this method, we can transparently add support for moving all applications 
 from source queue to target queue and draining a queue, i.e. killing all 
 applications in a queue as proposed by YARN-2389



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2411) [Capacity Scheduler] support simple user and group mappings to queues

2014-08-18 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14100316#comment-14100316
 ] 

Wangda Tan commented on YARN-2411:
--

Ram, Thanks for updating,
LGTM, +1.

Wangda

 [Capacity Scheduler] support simple user and group mappings to queues
 -

 Key: YARN-2411
 URL: https://issues.apache.org/jira/browse/YARN-2411
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Reporter: Ram Venkatesh
Assignee: Ram Venkatesh
 Attachments: YARN-2411-2.patch, YARN-2411.1.patch, YARN-2411.3.patch, 
 YARN-2411.4.patch, YARN-2411.5.patch


 YARN-2257 has a proposal to extend and share the queue placement rules for 
 the fair scheduler and the capacity scheduler. This is a good long term 
 solution to streamline queue placement of both schedulers but it has core 
 infra work that has to happen first and might require changes to current 
 features in all schedulers along with corresponding configuration changes, if 
 any. 
 I would like to propose a change with a smaller scope in the capacity 
 scheduler that addresses the core use cases for implicitly mapping jobs that 
 have the default queue or no queue specified to specific queues based on the 
 submitting user and user groups. It will be useful in a number of real-world 
 scenarios and can be migrated over to the unified scheme when YARN-2257 
 becomes available.
 The proposal is to add two new configuration options:
 yarn.scheduler.capacity.queue-mappings-override.enable 
 A boolean that controls if user-specified queues can be overridden by the 
 mapping, default is false.
 and,
 yarn.scheduler.capacity.queue-mappings
 A string that specifies a list of mappings in the following format (the 
 default is an empty string, which is the same as no mapping)
 map_specifier:source_attribute:queue_name[,map_specifier:source_attribute:queue_name]*
 map_specifier := user (u) | group (g)
 source_attribute := user | group | %user
 queue_name := the name of the mapped queue | %user | %primary_group
 The mappings will be evaluated left to right, and the first valid mapping 
 will be used. If the mapped queue does not exist, or the current user does 
 not have permissions to submit jobs to the mapped queue, the submission will 
 fail.
 Example usages:
 1. user1 is mapped to queue1, group1 is mapped to queue2
 u:user1:queue1,g:group1:queue2
 2. To map users to queues with the same name as the user:
 u:%user:%user
 I am happy to volunteer to take this up.
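 For illustration only, a minimal sketch of evaluating such a mapping string 
 left to right (hypothetical helper code, not the proposed CapacityScheduler 
 implementation):
{code}
// Evaluate "u:user1:queue1,g:group1:queue2" style mappings left to right;
// the first matching entry wins, null means no mapping applies.
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class QueueMappingExample {

  static String mapToQueue(String mappings, String user, Set<String> groups,
      String primaryGroup) {
    if (mappings == null || mappings.isEmpty()) {
      return null; // empty default: no mapping
    }
    for (String mapping : mappings.split(",")) {
      String[] parts = mapping.trim().split(":");
      if (parts.length != 3) {
        continue; // skip malformed entries in this sketch
      }
      String specifier = parts[0]; // "u" or "g"
      String source = parts[1];    // user name, group name, or %user
      String queue = parts[2];     // queue name, %user, or %primary_group
      boolean matches =
          ("u".equals(specifier) && ("%user".equals(source) || source.equals(user)))
          || ("g".equals(specifier) && groups.contains(source));
      if (matches) {
        return queue.replace("%user", user).replace("%primary_group", primaryGroup);
      }
    }
    return null;
  }

  public static void main(String[] args) {
    Set<String> groups = new HashSet<>(Arrays.asList("group1"));
    // Group mapping matches: user2 goes to queue2.
    System.out.println(mapToQueue("u:user1:queue1,g:group1:queue2", "user2", groups, "group1"));
    // u:%user:%user maps every user to a queue named after the user.
    System.out.println(mapToQueue("u:%user:%user", "alice", groups, "group1"));
  }
}
{code}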



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-08-20 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-796:


Attachment: YARN-796.node-label.demo.patch.1

Hi guys,
Thanks for your input over the past several weeks. I implemented a patch based 
on the design doc 
https://issues.apache.org/jira/secure/attachment/12662291/Node-labels-Requirements-Design-doc-V2.pdf
 during the past two weeks. I would really appreciate it if you could take a 
look. The patch is YARN-796.node-label.demo.patch.1 (I gave it a longer name to 
avoid confusion with other patches).

*Already included in this patch:*
* Protocol changes for ResourceRequest and ApplicationSubmissionContext 
(leveraging contributions from Yuliya's patch, thanks); also updated AMRMClient
* RMAdmin changes to dynamically update labels of nodes (add/set/remove); also 
updated the RMAdmin CLI
* Capacity scheduler related changes, including: 
** headroom calculation, preemption, and container allocation respect labels 
** allowing users to set the list of labels a queue can access in 
capacity-scheduler.xml
* A centralized node label manager that can be updated dynamically to 
add/set/remove labels, and can store labels to the file system. It will work 
with the RM restart/HA scenario (similar to RMStateStore).
* Support for the {{--labels}} option in distributed shell, so we can use 
distributed shell to test this feature
* Related unit tests

*Will include later:*
* RM REST APIs for node label
* Distributed configuration (set labels in yarn-site.xml of NMs)
* Support labels in FairScheduler

*Try this patch*
1. Create a capacity-scheduler.xml with labels accessible on queues
{code}
     root
     /  \
    a    b
    |    |
    a1   b1

a.capacity = 50, b.capacity = 50 
a1.capacity = 100, b1.capacity = 100

And a.label = red,blue; b.label = blue,green

<property>
  <name>yarn.scheduler.capacity.root.a.labels</name>
  <value>red, blue</value>
</property>

<property>
  <name>yarn.scheduler.capacity.root.b.labels</name>
  <value>blue, green</value>
</property>
{code}
This means queue a (and its sub-queues) CAN access labels red and blue; queue b 
(and its sub-queues) CAN access labels blue and green.

2. Create a node-labels.json locally; this holds the initial labels on nodes 
(you can change it dynamically using the rmadmin CLI while the RM is running, 
so you don't have to do this). Then set 
{{yarn.resourcemanager.labels.node-to-label-json.path}} to 
{{file:///path/to/node-labels.json}}
{code}
{
  "host1": {
    "labels": ["red", "blue"]
  },
  "host2": {
    "labels": ["blue", "green"]
  }
}
{code}
This sets red/blue labels on host1, and sets blue/green labels on host2

3. Start the YARN cluster (if you have several nodes in the cluster, you need 
to launch HDFS to use distributed shell)
* Submit a distributed shell job:
{code}
hadoop jar path/to/*distributedshell*.jar 
org.apache.hadoop.yarn.applications.distributedshell.Client -shell_command 
hostname -jar path/to/*distributedshell*.jar -num_containers 10 -labels "red && 
blue" -queue a1
{code}
This will run a distributed shell job that launches 10 containers running the 
command hostname; the requested label is "red && blue", so all containers will 
be allocated on host1.

Some other examples:
* {{-queue a1 -labels "red && green"}}: this will be rejected, because queue a1 
cannot access label green
* {{-queue a1 -labels blue}}: some containers will be allocated on host1 and 
some on host2, because both host1 and host2 carry the blue label
* {{-queue b1 -labels green}}: all containers will be allocated on host2

4. Dynamically update labels using rmadmin CLI
{code}
// dynamically add labels x, y to label manager
yarn rmadmin -addLabels x,y

// dynamically set label x on node1, and labels x,y on node2
yarn rmadmin -setNodeToLabels node1:x;node2:x,y

// remove labels from label manager, and also remove labels on nodes
yarn rmadmin -removeLabels x
{code}

*Two more examples for node label*
1. Labels as constraints:
{code}
Queue structure:
root
   / | \
  a  b  c

a has label: WINDOWS, LINUX, GPU
b has label: WINDOWS, LINUX, LARGE_MEM
c doesn't have label

25 nodes in the cluster:
h1-h5:   LINUX, GPU
h6-h10:  LINUX,
h11-h15: LARGE_MEM, LINUX
h16-h20: LARGE_MEM, WINDOWS
h21-h25: empty
{code}
If you want LINUX && GPU resources, you should submit to queue-a and set the 
label in the Resource Request to "LINUX && GPU".
If you want LARGE_MEM resources and don't mind the OS, you can submit to 
queue-b and set the label in the Resource Request to LARGE_MEM.
If you want to allocate on nodes that don't have labels (h21-h25), you can 
submit to any queue and leave the label in the Resource Request empty.

2. Labels to hard partition cluster
{code}
Queue structure:
root
   / | \
  a  b  c

a has label: MARKETING
b has label: HR
c has label: RD

15 nodes in the cluster:
h1-h5:   MARKETING
h6-h10:  HR
h11-h15: RD
{code}
Now the cluster is hard-partitioned into 3 small clusters: h1-h5 are for 
marketing and only queue-a can use them, so you should set the label in the 
Resource Request accordingly. Similar for the HR/RD clusters. 
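As an illustration of how an AND label expression such as "red && blue" or 
"LINUX && GPU" constrains placement, here is a minimal sketch of a matcher 
(purely hypothetical helper, not the code in this patch):
{code}
// Check whether a node's labels satisfy an AND-only label expression.
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class LabelExpressionExample {

  static boolean satisfies(String expression, Set<String> nodeLabels) {
    if (expression == null || expression.trim().isEmpty()) {
      return true; // empty expression: treated as "no constraint" in this sketch
    }
    for (String label : expression.split("&&")) {
      if (!nodeLabels.contains(label.trim())) {
        return false; // every label in the expression must be present on the node
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Set<String> h1 = new HashSet<>(Arrays.asList("LINUX", "GPU"));
    System.out.println(satisfies("LINUX && GPU", h1)); // true
    System.out.println(satisfies("LARGE_MEM", h1));    // false
  }
}
{code}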

I appreciate your 

[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-08-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104926#comment-14104926
 ] 

Wangda Tan commented on YARN-796:
-

bq. As I've said before, I basically want something similar to the health check 
code: I provide something executable that the NM can run at runtime that will 
provide the list of labels. If we need to add labels, it's updating the script 
which is a much smaller footprint than redeploying HADOOP_CONF_DIR everywhere.
I understand now; it makes sense since it's a flexible way for admins to set 
labels on the NM side. Maybe adding a {{NodeLabelCheckerService}} to the NM, 
similar to {{NodeHealthCheckerService}}, would work. I'll create a separate 
JIRA for setting labels on the NM side under this ticket and keep the 
design/implementation discussion here.

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, 
 Node-labels-Requirements-Design-doc-V2.pdf, YARN-796.node-label.demo.patch.1, 
 YARN-796.patch, YARN-796.patch4


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2056) Disable preemption at Queue level

2014-08-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104931#comment-14104931
 ] 

Wangda Tan commented on YARN-2056:
--

Maybe another way to do this is to add a per-queue config, like 
{{..queue-path.disable_preemption}}. Then in 
{{ProportionalCapacityPreemptionPolicy#cloneQueues}}, if a queue's used 
capacity is more than its guaranteed resource and it has preemption disabled, 
we will not create a TempQueue for it (see the sketch below).
This will not require an RM restart when the queue property changes (queue 
properties will be refreshed and the PreemptionPolicy will pick up such 
changes).
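A rough sketch of that check (hypothetical accessors, not actual 
ProportionalCapacityPreemptionPolicy code):
{code}
// Skip building a TempQueue for queues that are over their guarantee but have
// per-queue preemption disabled.
public class PreemptionCloneExample {

  interface QueueView {
    float getUsedCapacity();
    float getGuaranteedCapacity();
    boolean isPreemptionDisabled(); // would come from the per-queue config
  }

  static boolean shouldCreateTempQueue(QueueView q) {
    boolean overGuarantee = q.getUsedCapacity() > q.getGuaranteedCapacity();
    return !(overGuarantee && q.isPreemptionDisabled());
  }

  public static void main(String[] args) {
    QueueView over = new QueueView() {
      public float getUsedCapacity() { return 0.8f; }
      public float getGuaranteedCapacity() { return 0.5f; }
      public boolean isPreemptionDisabled() { return true; }
    };
    System.out.println(shouldCreateTempQueue(over)); // false: the policy skips it
  }
}
{code}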

Does it make sense?

Thanks,
Wangda

 Disable preemption at Queue level
 -

 Key: YARN-2056
 URL: https://issues.apache.org/jira/browse/YARN-2056
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Mayank Bansal
Assignee: Eric Payne
 Attachments: YARN-2056.201408202039.txt


 We need to be able to disable preemption at individual queue level



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2434) RM should not recover containers from previously failed attempt when AM restart is not enabled

2014-08-21 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2434:
-

Summary: RM should not recover containers from previously failed attempt 
when AM restart is not enabled  (was: RM should not recover containers from 
previously failed attempt)

 RM should not recover containers from previously failed attempt when AM 
 restart is not enabled
 --

 Key: YARN-2434
 URL: https://issues.apache.org/jira/browse/YARN-2434
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2434.1.patch


 If container-preserving AM restart is not enabled and AM failed during RM 
 restart, RM on restart should not recover containers from previously failed 
 attempt.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2434) RM should not recover containers from previously failed attempt when AM restart is not enabled

2014-08-21 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105116#comment-14105116
 ] 

Wangda Tan commented on YARN-2434:
--

Jian, thanks for the patch, LGTM +1

 RM should not recover containers from previously failed attempt when AM 
 restart is not enabled
 --

 Key: YARN-2434
 URL: https://issues.apache.org/jira/browse/YARN-2434
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2434.1.patch


 If container-preserving AM restart is not enabled and AM failed during RM 
 restart, RM on restart should not recover containers from previously failed 
 attempt.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2433) Stale token used by restarted AM (with previous containers retained) to request new container

2014-08-21 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105119#comment-14105119
 ] 

Wangda Tan commented on YARN-2433:
--

[~yingdachen], thanks for reporting this issue. I can take a look at it and 
will keep you posted.

Thanks,
Wangda

 Stale token used by restarted AM (with previous containers retained) to 
 request new container
 -

 Key: YARN-2433
 URL: https://issues.apache.org/jira/browse/YARN-2433
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0, 2.4.1
Reporter: Yingda Chen
Assignee: Wangda Tan

 With Hadoop 2.4, container retention is supported across AM 
 crash-and-restart. However, after an AM is restarted with containers 
 retained, it appears to be using the stale token to start new container. This 
 leads to the error below. To truly support container retention, AM should be 
 able to communicate with previous container(s) with the old token and ask for 
 new container with new token. 
 This could be similar to YARN-1321 which was reported and fixed earlier.
 ERROR: 
 Unauthorized request to start container. \nNMToken for application attempt : 
 appattempt_1408130608672_0065_01 was used for starting container with 
 container token issued for application attempt : 
 appattempt_1408130608672_0065_02
 STACK trace:
 {code}
 hadoop.ipc.ProtobufRpcEngine$Invoker.invoke 
 org.apache.hadoop.yarn.client.api.async.impl.NMClientAsyncImpl #0 | 103: 
 Response - YINGDAC1.redmond.corp.microsoft.com/10.121.136.231:45454: 
 startContainers {services_meta_data { key: mapreduce_shuffle value: 
 \000\0004\372 } failed_requests { container_id { app_attempt_id { 
 application_id { id: 65 cluster_timestamp: 1408130608672 } attemptId: 2 } id: 
 2 } exception { message: Unauthorized request to start container. \nNMToken 
 for application attempt : appattempt_1408130608672_0065_01 was used for 
 starting container with container token issued for application attempt : 
 appattempt_1408130608672_0065_02 trace: 
 org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to 
 start container. \nNMToken for application attempt : 
 appattempt_1408130608672_0065_01 was used for starting container with 
 container token issued for application attempt : 
 appattempt_1408130608672_0065_02\r\n\tat 
 org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:48)\r\n\tat
  
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.authorizeStartRequest(ContainerManagerImpl.java:508)\r\n\tat
  
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainerInternal(ContainerManagerImpl.java:571)\r\n\tat
  
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:538)\r\n\tat
  
 org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:60)\r\n\tat
  
 org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:95)\r\n\tat
  
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)\r\n\tat
  org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)\r\n\tat 
 org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)\r\n\tat 
 org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)\r\n\tat 
 java.security.AccessController.doPrivileged(Native Method)\r\n\tat 
 javax.security.auth.Subject.doAs(Subject.java:415)\r\n\tat 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)\r\n\tat
  org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)\r\n class_name: 
 org.apache.hadoop.yarn.exceptions.YarnException } }}
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2345) yarn rmadmin -report

2014-08-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106669#comment-14106669
 ] 

Wangda Tan commented on YARN-2345:
--

Hi Hao,
I think we already have a NodeCLI, which is "yarn node -status <nodeid>" as you 
said. We don't need to add such a method to the RM admin CLI; the RM admin CLI 
should only implement methods contained in ResourceManagerAdministrationProtocol.
I would suggest adding more information to the output of "yarn node -all -list", 
like memory-used, CPU-used, etc., just like the RM web UI nodes page. 

Thanks,
Wangda

 yarn rmadmin -report
 

 Key: YARN-2345
 URL: https://issues.apache.org/jira/browse/YARN-2345
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager
Reporter: Allen Wittenauer
Assignee: Hao Gao
  Labels: newbie
 Attachments: YARN-2345.1.patch


 It would be good to have an equivalent of hdfs dfsadmin -report in YARN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2345) yarn rmadmin -report

2014-08-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108645#comment-14108645
 ] 

Wangda Tan commented on YARN-2345:
--

[~aw], I agree with you; the user doesn't need to understand what happens 
inside. How about marking the yarn node CLI deprecated and adding the existing 
functionality to the rmadmin CLI?

 yarn rmadmin -report
 

 Key: YARN-2345
 URL: https://issues.apache.org/jira/browse/YARN-2345
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager
Reporter: Allen Wittenauer
Assignee: Hao Gao
  Labels: newbie
 Attachments: YARN-2345.1.patch


 It would be good to have an equivalent of hdfs dfsadmin -report in YARN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2385) Consider splitting getAppsinQueue to getRunningAppsInQueue + getPendingAppsInQueue

2014-08-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108646#comment-14108646
 ] 

Wangda Tan commented on YARN-2385:
--

Splitting this into two APIs makes sense to me. It's more flexible and 
accurate.

 Consider splitting getAppsinQueue to getRunningAppsInQueue + 
 getPendingAppsInQueue
 --

 Key: YARN-2385
 URL: https://issues.apache.org/jira/browse/YARN-2385
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler
Reporter: Subramaniam Krishnan
  Labels: abstractyarnscheduler

 Currently getAppsinQueue returns both pending and running apps. The purpose of 
 the JIRA is to explore splitting it into getRunningAppsInQueue + 
 getPendingAppsInQueue, which will provide more flexibility to callers
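 A minimal sketch of what the split could look like (method names from the 
 proposal above; the signatures are hypothetical):
{code}
// Hypothetical signatures only, mirroring the proposal above.
import java.util.List;
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;

interface QueueAppListing {
  // Apps currently running in the given queue.
  List<ApplicationAttemptId> getRunningAppsInQueue(String queueName);

  // Apps submitted/accepted to the given queue but not yet running.
  List<ApplicationAttemptId> getPendingAppsInQueue(String queueName);
}
{code}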



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2448) RM should expose the name of the ResourceCalculator being used when AMs register

2014-08-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108822#comment-14108822
 ] 

Wangda Tan commented on YARN-2448:
--

[~vvasudev],
Thanks for working on the patch. LGTM, +1

Wangda

 RM should expose the name of the ResourceCalculator being used when AMs 
 register
 

 Key: YARN-2448
 URL: https://issues.apache.org/jira/browse/YARN-2448
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2448.0.patch, apache-yarn-2448.1.patch


 The RM should expose the name of the ResourceCalculator being used when AMs 
 register, as part of the RegisterApplicationMasterResponse.
 This will allow applications to make better decisions when scheduling. 
 MapReduce, for example, only looks at memory when deciding its scheduling, 
 even though the RM could potentially be using the DominantResourceCalculator.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic

2014-08-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108932#comment-14108932
 ] 

Wangda Tan commented on YARN-1707:
--

Hi [~curino],
Thanks for the update. I just took a look; some minor comments:

1) CapacityScheduler#removeQueue
{code}
if (disposableLeafQueue.getCapacity() > 0) {
  throw new SchedulerConfigEditException("The queue " + queueName
      + " has non-zero capacity: " + disposableLeafQueue.getCapacity());
}
{code}
removeQueue checks whether disposableLeafQueue's capacity > 0, but addQueue 
doesn't check. In addition, after the previous check, 
ParentQueue#removeChildQueue/addChildQueue doesn't need to check its capacity 
again. And they should throw the same type of exception (both 
SchedulerConfigEditException or both IllegalArgumentException).

2) CS#addQueue
{code}
  throw new SchedulerConfigEditException("Queue " + queue.getQueueName()
      + " is not a dynamic Queue");
{code}
Should "dynamic Queue" here be "reservation queue", to match the similar 
exception thrown in removeQueue?

3) CS#setEntitlement
{code}
  if (sesConf.getCapacity() > queue.getCapacity()) {
newQueue.addCapacity((sesConf.getCapacity() - queue.getCapacity()));
  } else {
newQueue
.subtractCapacity((queue.getCapacity() - sesConf.getCapacity()));
  }
{code}
Maybe it's better to merge add/subtractCapacity into a single 
changeCapacity(delta), or just create a setCapacity in ReservationQueue?

4) CS#getReservableQueues
Is it better to rename it to getPlanQueues?

5) ReservationQueue#getQueueName
{code}
  @Override
  public String getQueueName() {
return this.getParent().getQueueName();
  }
{code}
I'm not sure why this is done; could you please elaborate? It makes 
this.queueName and this.getQueueName() have different semantics.

6) ReservationQueue#substractCapacity
{code}
this.setCapacity(this.getCapacity() - capacity);
{code}
With EPSILON, it is possible that this.capacity < 0 after the subtraction; it's 
better to cap this.capacity to the range [0, 1]. The same applies to 
addCapacity (see the sketch below).
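For example, a single changeCapacity(delta) along these lines would cover both 
cases and keep the value in range (sketch only, not patch code):
{code}
// Clamp current + delta into [0, 1]; delta may be negative.
public class CapacityMathExample {

  static float changeCapacity(float current, float delta) {
    return Math.max(0f, Math.min(1f, current + delta));
  }

  public static void main(String[] args) {
    // A tiny EPSILON underflow is clamped to 0 instead of going negative.
    System.out.println(changeCapacity(0.2f, -0.2000001f)); // 0.0
  }
}
{code}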

7) DynamicQueueConf
I think unfolding it into two float parameters for setEntitlement may be more 
straightforward. Is it possible that more fields will be added to 
DynamicQueueConf?

8) ParentQueue#setChildQueues
Since only PlanQueue needs sum of capacity <= 1, I would suggest making this 
method protected so that PlanQueue can override it, or adding a check in 
ParentQueue#setChildQueues.

Wangda

 Making the CapacityScheduler more dynamic
 -

 Key: YARN-1707
 URL: https://issues.apache.org/jira/browse/YARN-1707
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: capacity-scheduler
 Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.patch


 The CapacityScheduler is rather static at the moment, and refreshqueue 
 provides a rather heavy-handed way to reconfigure it. Moving towards 
 long-running services (tracked in YARN-896) and to enable more advanced 
 admission control and resource parcelling, we need to make the 
 CapacityScheduler more dynamic. This is instrumental to the umbrella jira 
 YARN-1051.
 Concretely this requires the following changes:
 * create queues dynamically
 * destroy queues dynamically
 * dynamically change queue parameters (e.g., capacity) 
 * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% 
 instead of == 100%
 We limit this to LeafQueues. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-08-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108991#comment-14108991
 ] 

Wangda Tan commented on YARN-1198:
--

Hi [~cwelch],
Thanks for updating, I went through your patch just now.

I think the current approach makes more sense to me compared to patch #4; it 
avoids iterating over all apps when computing headroom. But currently, 
CapacityHeadroomProvider#getHeadroom will recompute the headroom on each 
application heartbeat. Assuming we have #applications > #users in a queue (the 
most likely case), that is still a little costly.

I agree more with the method mentioned by Jason: specifically, we can create a 
map of <user, headroom> for each queue, and when we need to update headroom we 
can update all the headrooms in the map. Each SchedulerApplicationAttempt will 
then hold a reference to its headroom. The headroom in the map may be the same 
as the {{HeadroomProvider}} in your patch; I would suggest renaming 
{{HeadroomProvider}} to {{HeadroomReference}}, because we don't need to do any 
computation in it anymore.

Another benefit is that we don't need to write a HeadroomProvider for each 
scheduler. A simple HeadroomReference with a getter/setter should be enough.

Two more things we should take care of with this method:
1) As mentioned by Jason, the fair/capacity schedulers both support moving apps 
between queues; we should recompute and change the reference after moving an 
app. 
2) In LeafQueue#assignContainers, we don't need to call
{code}
  Resource userLimit = 
  computeUserLimitAndSetHeadroom(application, clusterResource, 
  required);
{code}
for each application; iterating and updating the map of <user, headroom> in 
LeafQueue#updateClusterResource should be enough.
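A minimal sketch of the per-user headroom reference idea (class and method 
names are hypothetical, not from any attached patch):
{code}
// One mutable headroom holder per user in a queue: updated in one place,
// read by every SchedulerApplicationAttempt of that user. No computation here.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.yarn.api.records.Resource;

public class HeadroomReferenceExample {

  static class HeadroomReference {
    private volatile Resource headroom;
    Resource get() { return headroom; }
    void set(Resource headroom) { this.headroom = headroom; }
  }

  // Per-user references for one queue; attempts keep a pointer to their user's entry.
  private final Map<String, HeadroomReference> userHeadrooms = new ConcurrentHashMap<>();

  HeadroomReference referenceFor(String user) {
    return userHeadrooms.computeIfAbsent(user, u -> new HeadroomReference());
  }

  // Called when cluster resources or user limits change: update each user's headroom once.
  void updateAll(Map<String, Resource> recomputedHeadrooms) {
    recomputedHeadrooms.forEach((user, headroom) -> referenceFor(user).set(headroom));
  }
}
{code}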

Wangda


 Capacity Scheduler headroom calculation does not work as expected
 -

 Key: YARN-1198
 URL: https://issues.apache.org/jira/browse/YARN-1198
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Craig Welch
 Attachments: YARN-1198.1.patch, YARN-1198.2.patch, YARN-1198.3.patch, 
 YARN-1198.4.patch, YARN-1198.5.patch, YARN-1198.6.patch, YARN-1198.7.patch


 Today headroom calculation (for the app) takes place only when
 * New node is added/removed from the cluster
 * New container is getting assigned to the application.
 However there are potentially lot of situations which are not considered for 
 this calculation
 * If a container finishes then headroom for that application will change and 
 should be notified to the AM accordingly.
 * If a single user has submitted multiple applications (app1 and app2) to the 
 same queue then
 ** If app1's container finishes then not only app1's but also app2's AM 
 should be notified about the change in headroom.
 ** Similarly if a container is assigned to any applications app1/app2 then 
 both AM should be notified about their headroom.
 ** To simplify the whole communication process it is ideal to keep headroom 
 per User per LeafQueue so that everyone gets the same picture (apps belonging 
 to same user and submitted in same queue).
 * If a new user submits an application to the queue then all applications 
 submitted by all users in that queue should be notified of the headroom 
 change.
 * Also today headroom is an absolute number ( I think it should be normalized 
 but then this is going to be not backward compatible..)
 * Also  when admin user refreshes queue headroom has to be updated.
 These all are the potential bugs in headroom calculations



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2056) Disable preemption at Queue level

2014-08-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14113124#comment-14113124
 ] 

Wangda Tan commented on YARN-2056:
--

Hi [~eepayne],
Really sorry to come in late, and thanks for working on this. I just took a 
look at your approach and patch; some comments:
1) I prefer to make the per-queue disable-preemption option follow the same 
config conventions as the existing capacity-scheduler settings (same 
queue-path prefix, etc.).
2) In {{mockNested}}, when(q.getQueuePath()) should consider the hierarchy of 
queues as well.
3) It's better to add tests for a hierarchy of queues with preemption disabled.
4) In {{testPerQueueDisablePreemption}}, the number of preemptions after 
enabling queue-b's preemption is not very clear to me:
{code}
+// With no PREEMPTION_DISABLED set for queueB, get resources from both
+// queueB and queueC (times() assertion is cumulative).
+verify(mDisp, times(5)).handle(argThat(new IsPreemptionRequestFor(appB)));
+verify(mDisp, times(16)).handle(argThat(new IsPreemptionRequestFor(appC)));
{code}
In the 2nd preemption, more resources are reclaimed from appC than from appB; I 
think it should reclaim more from appB. Could you please take a look at what 
happened? I'm just afraid that, because we changed the ideal resource 
calculation in the 1st preemption, it might affect the 2nd preemption.

Thanks,
Wangda

 Disable preemption at Queue level
 -

 Key: YARN-2056
 URL: https://issues.apache.org/jira/browse/YARN-2056
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Mayank Bansal
Assignee: Eric Payne
 Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt


 We need to be able to disable preemption at individual queue level



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic

2014-08-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14113147#comment-14113147
 ] 

Wangda Tan commented on YARN-1707:
--

Hi [~curino],
Thanks for updating. I think the current approach looks good to me, except 
regarding 5):
I just had a chat with Subru. As you mentioned, the main point of this change 
is making ReservationQueues invisible on the user side. But I'm still concerned 
about changing the semantics, since this is a very important semantic of 
CSQueue. I hope to get more feedback about this before moving forward, and 
I'll think about it myself as well.

Thanks,
Wangda

 Making the CapacityScheduler more dynamic
 -

 Key: YARN-1707
 URL: https://issues.apache.org/jira/browse/YARN-1707
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: capacity-scheduler
 Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, 
 YARN-1707.patch


 The CapacityScheduler is rather static at the moment, and refreshqueue 
 provides a rather heavy-handed way to reconfigure it. Moving towards 
 long-running services (tracked in YARN-896) and to enable more advanced 
 admission control and resource parcelling, we need to make the 
 CapacityScheduler more dynamic. This is instrumental to the umbrella jira 
 YARN-1051.
 Concretely this requires the following changes:
 * create queues dynamically
 * destroy queues dynamically
 * dynamically change queue parameters (e.g., capacity) 
 * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% 
 instead of == 100%
 We limit this to LeafQueues. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic

2014-08-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14113187#comment-14113187
 ] 

Wangda Tan commented on YARN-1707:
--

Thanks for sharing this, Carlo! It's very helpful to have such investigation 
results. Any thoughts, [~jianhe]?

Wangda

 Making the CapacityScheduler more dynamic
 -

 Key: YARN-1707
 URL: https://issues.apache.org/jira/browse/YARN-1707
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: capacity-scheduler
 Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, 
 YARN-1707.patch


 The CapacityScheduler is rather static at the moment, and refreshqueue 
 provides a rather heavy-handed way to reconfigure it. Moving towards 
 long-running services (tracked in YARN-896) and to enable more advanced 
 admission control and resource parcelling, we need to make the 
 CapacityScheduler more dynamic. This is instrumental to the umbrella jira 
 YARN-1051.
 Concretely this requires the following changes:
 * create queues dynamically
 * destroy queues dynamically
 * dynamically change queue parameters (e.g., capacity) 
 * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% 
 instead of == 100%
 We limit this to LeafQueues. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2447) RM web services app submission doesn't pass secrets correctly

2014-08-28 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14113515#comment-14113515
 ] 

Wangda Tan commented on YARN-2447:
--

Hi [~vvasudev],
The fix looks very straightforward to me: previously the secrets were not 
properly set because of a typo, and now they can be successfully get and set; 
the modified test also verifies this.

LGTM, +1,
Wangda

 RM web services app submission doesn't pass secrets correctly
 -

 Key: YARN-2447
 URL: https://issues.apache.org/jira/browse/YARN-2447
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2447.0.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic

2014-08-29 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116184#comment-14116184
 ] 

Wangda Tan commented on YARN-1707:
--

Carlo, thanks for updating the patch. In addition to Jian's comment, I think 
the changes for displayQueueName look good to me.
I don't have further comments on this patch for now.

Thanks,
Wangda

 Making the CapacityScheduler more dynamic
 -

 Key: YARN-1707
 URL: https://issues.apache.org/jira/browse/YARN-1707
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: capacity-scheduler
 Attachments: YARN-1707.2.patch, YARN-1707.3.patch, YARN-1707.4.patch, 
 YARN-1707.5.patch, YARN-1707.patch


 The CapacityScheduler is rather static at the moment, and refreshqueue 
 provides a rather heavy-handed way to reconfigure it. Moving towards 
 long-running services (tracked in YARN-896) and to enable more advanced 
 admission control and resource parcelling, we need to make the 
 CapacityScheduler more dynamic. This is instrumental to the umbrella jira 
 YARN-1051.
 Concretely this requires the following changes:
 * create queues dynamically
 * destroy queues dynamically
 * dynamically change queue parameters (e.g., capacity) 
 * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% 
 instead of == 100%
 We limit this to LeafQueues. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2056) Disable preemption at Queue level

2014-08-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116613#comment-14116613
 ] 

Wangda Tan commented on YARN-2056:
--

Hi [~eepayne],
Thanks for updating your patch,

bq. Do you mean that the prefix should be yarn.scheduler.capacity instead of 
yarn.resourcemanager.monitor.capacity.preemption? I have done this in this 
patch.
Yeah

bq. mockNested when(q.getQueuePath()) should consider hierarchy of queue as well
The change makes sense to me.

bq. testPerQueueDisablePreemption
Now this is clearer.

Looking forward to a test for disabling preemption on a hierarchy of queues.

Wangda

 Disable preemption at Queue level
 -

 Key: YARN-2056
 URL: https://issues.apache.org/jira/browse/YARN-2056
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Mayank Bansal
Assignee: Eric Payne
 Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, 
 YARN-2056.201408310117.txt


 We need to be able to disable preemption at individual queue level



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-09-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119515#comment-14119515
 ] 

Wangda Tan commented on YARN-796:
-

Hi [~ViplavMadasu]
Many thanks for reviewing the patch and pointing this out. This patch is a 
little out of date; I've already noticed and fixed this issue. I've attached 
the latest patch, named YARN-796.node-label.consolidate.1.patch.

And I'm working on splitting this big patch into smaller ones; will update on 
this JIRA.
Wangda

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, 
 Node-labels-Requirements-Design-doc-V2.pdf, YARN-796.node-label.demo.patch.1, 
 YARN-796.patch, YARN-796.patch4


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.
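
To make the idea concrete, here is a minimal, self-contained sketch of label 
matching; all names in it are illustrative stand-ins, not the API proposed in the 
attached patches. A request that names no label can run anywhere, otherwise the 
node's admin-assigned labels must contain the requested label.

{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Self-contained sketch of matching a request's label against a node's
// admin-assigned labels. All types here are illustrative stand-ins, not
// the YARN API proposed in the attached patches.
public class NodeLabelMatchSketch {
  static boolean satisfies(Set<String> nodeLabels, String requestedLabel) {
    // A request with no label can run on any node; otherwise the node
    // must carry the requested label (e.g. an OS or architecture tag).
    return requestedLabel == null || requestedLabel.isEmpty()
        || nodeLabels.contains(requestedLabel);
  }

  public static void main(String[] args) {
    Set<String> node1 = new HashSet<>(Arrays.asList("linux", "x86_64", "GPU"));
    System.out.println(satisfies(node1, "GPU"));     // true
    System.out.println(satisfies(node1, "windows")); // false
  }
}
{code}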





[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-09-03 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-796:

Attachment: YARN-796.node-label.consolidate.1.patch

Attached the latest consolidated patch, named 
YARN-796.node-label.consolidate.1.patch.

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, 
 Node-labels-Requirements-Design-doc-V2.pdf, 
 YARN-796.node-label.consolidate.1.patch, YARN-796.node-label.demo.patch.1, 
 YARN-796.patch, YARN-796.patch4


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.





[jira] [Created] (YARN-2492) [Umbrella] Allow for (admin) labels on nodes and resource-requests

2014-09-03 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2492:


 Summary: [Umbrella] Allow for (admin) labels on nodes and 
resource-requests 
 Key: YARN-2492
 URL: https://issues.apache.org/jira/browse/YARN-2492
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, client, resourcemanager
Reporter: Wangda Tan


Since YARN-796 is a sub-JIRA of YARN-397, this JIRA is used to create and track 
sub-tasks and attach split patches for YARN-796.

Let's keep all overall discussions on YARN-796.





[jira] [Commented] (YARN-2492) [Umbrella] Allow for (admin) labels on nodes and resource-requests

2014-09-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14119658#comment-14119658
 ] 

Wangda Tan commented on YARN-2492:
--

Marking this JIRA as a clone of YARN-796.

 [Umbrella] Allow for (admin) labels on nodes and resource-requests 
 ---

 Key: YARN-2492
 URL: https://issues.apache.org/jira/browse/YARN-2492
 Project: Hadoop YARN
  Issue Type: Task
  Components: api, client, resourcemanager
Reporter: Wangda Tan

 Since YARN-796 is a sub-JIRA of YARN-397, this JIRA is used to create and 
 track sub-tasks and attach split patches for YARN-796.
 Let's keep all overall discussions on YARN-796.




