[jira] [Commented] (YARN-9656) Plugin to avoid scheduling jobs on node which are not in "schedulable" state, but are healthy otherwise.
[ https://issues.apache.org/jira/browse/YARN-9656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951393#comment-16951393 ] Mayank Bansal commented on YARN-9656: - [~wangda] We should not mark the full cluster unhealthy, otherwise it is very hard to distinguish between the unhealthy case and the stressed case. We would not want every node to be removed from the scheduling cycle, otherwise it becomes a cluster-wide outage. We would want to see how many nodes can be stressed in one cycle and avoid only that small number of nodes. > Plugin to avoid scheduling jobs on node which are not in "schedulable" state, > but are healthy otherwise. > > > Key: YARN-9656 > URL: https://issues.apache.org/jira/browse/YARN-9656 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Affects Versions: 2.9.1, 3.1.2 >Reporter: Prashant Golash >Assignee: Prashant Golash >Priority: Major > Attachments: 2.patch > > > Creating this Jira to get ideas from the community on whether this is something > helpful that can be done in YARN. Sometimes nodes go into a bad state, e.g. > a hardware problem (bad I/O, fan failure). In some other scenarios, if > CGroups are not enabled, nodes may be running very high on CPU and the jobs > scheduled on them will suffer. > > The idea is three-fold: > # Gather relevant metrics from node-managers and publish them in some form (e.g. an > exclude file). > # The RM loads the files and puts the nodes on the blacklist. > # Once a node becomes healthy again, it can be put back on the whitelist. > Various optimizations can be done here, but I would like to understand whether > this is something that could be helpful as an upstream feature in YARN. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
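For context, a minimal sketch of what such a plugin could look like on the RM side, under stated assumptions: the class names, the stress report, the 0.90 CPU threshold, and the per-cycle cap are all hypothetical illustrations and are not part of the attached 2.patch or of any existing YARN API. It mainly illustrates the point above about never excluding more than a small, bounded number of nodes per scheduling cycle.

{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch only -- not an existing YARN interface and not the attached patch.
 * Picks a bounded number of "stressed but otherwise healthy" nodes to skip in a
 * scheduling cycle, so a stressed cluster can never blacklist itself entirely.
 */
class StressedNodeFilter {

  /** Illustrative per-node report a NodeManager might publish (e.g. via an exclude file). */
  static class NodeStressReport {
    final String nodeId;
    final double cpuUtilization; // 0.0 - 1.0

    NodeStressReport(String nodeId, double cpuUtilization) {
      this.nodeId = nodeId;
      this.cpuUtilization = cpuUtilization;
    }
  }

  private final double cpuThreshold;      // assumed threshold, e.g. 0.90
  private final int maxExcludedPerCycle;  // cap so the whole cluster is never taken out

  StressedNodeFilter(double cpuThreshold, int maxExcludedPerCycle) {
    this.cpuThreshold = cpuThreshold;
    this.maxExcludedPerCycle = maxExcludedPerCycle;
  }

  /** Returns at most maxExcludedPerCycle node ids to skip in this scheduling cycle. */
  List<String> selectNodesToExclude(List<NodeStressReport> reports) {
    List<NodeStressReport> stressed = new ArrayList<>();
    for (NodeStressReport r : reports) {
      if (r.cpuUtilization >= cpuThreshold) {
        stressed.add(r);
      }
    }
    // Exclude the most stressed nodes first, but never more than the cap.
    stressed.sort(
        Comparator.comparingDouble((NodeStressReport r) -> r.cpuUtilization).reversed());
    List<String> excluded = new ArrayList<>();
    for (int i = 0; i < stressed.size() && i < maxExcludedPerCycle; i++) {
      excluded.add(stressed.get(i).nodeId);
    }
    return excluded;
  }
}
{code}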
[jira] [Updated] (YARN-4161) Capacity Scheduler : Assign single or multiple containers per heart beat driven by configuration
[ https://issues.apache.org/jira/browse/YARN-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-4161: Attachment: YARN-4161.patch Attaching patch > Capacity Scheduler : Assign single or multiple containers per heart beat > driven by configuration > > > Key: YARN-4161 > URL: https://issues.apache.org/jira/browse/YARN-4161 > Project: Hadoop YARN > Issue Type: New Feature > Components: capacityscheduler >Reporter: Mayank Bansal >Assignee: Mayank Bansal > Attachments: YARN-4161.patch > > > Capacity Scheduler right now schedules multiple containers per heart beat if > there are more resources available on the node. > This approach works fine, however in some cases it does not distribute the load > across the cluster, and hence the throughput of the cluster suffers. I am adding a > feature to drive this via configuration so that we can control the number > of containers assigned per heart beat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
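To picture the configuration-driven behavior described above, here is a hedged sketch. The property name, the default, and the node abstraction are assumptions made for illustration; they are not claims about the contents of YARN-4161.patch.

{code}
import org.apache.hadoop.conf.Configuration;

/**
 * Illustrative sketch of capping container assignments per node heartbeat.
 * The property name below is assumed for illustration only.
 */
class HeartbeatAssignmentLimiter {
  static final String MAX_ASSIGNMENTS_KEY =
      "yarn.scheduler.capacity.max-assignments-per-heartbeat"; // assumed name
  static final int DEFAULT_MAX_ASSIGNMENTS = 1; // 1 == single assignment per heartbeat

  /** Minimal stand-in for the scheduler's node view, for illustration only. */
  interface SchedulableNode {
    boolean hasAvailableResources();
    boolean tryAssignOneContainer();
  }

  private final int maxAssignments;

  HeartbeatAssignmentLimiter(Configuration conf) {
    this.maxAssignments = conf.getInt(MAX_ASSIGNMENTS_KEY, DEFAULT_MAX_ASSIGNMENTS);
  }

  /** Keep assigning on this node until resources run out or the configured cap is hit. */
  int assignContainers(SchedulableNode node) {
    int assigned = 0;
    while (assigned < maxAssignments && node.hasAvailableResources()) {
      if (!node.tryAssignOneContainer()) {
        break; // nothing could be placed this round; stop early
      }
      assigned++;
    }
    return assigned;
  }
}
{code}

Setting the cap to 1 spreads new containers across many node heartbeats (better load distribution), while a larger value packs more containers onto whichever node heartbeats first (higher per-heartbeat throughput).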
[jira] [Updated] (YARN-4161) Capacity Scheduler : Assign single or multiple containers per heart beat driven by configuration
[ https://issues.apache.org/jira/browse/YARN-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-4161: Description: Capacity Scheduler right now schedules multiple containers per heart beat if there are more resources available on the node. This approach works fine, however in some cases it does not distribute the load across the cluster, and hence the throughput of the cluster suffers. I am adding a feature to drive this via configuration so that we can control the number of containers assigned per heart beat. was:Capacity > Capacity Scheduler : Assign single or multiple containers per heart beat > driven by configuration > > > Key: YARN-4161 > URL: https://issues.apache.org/jira/browse/YARN-4161 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Mayank Bansal >Assignee: Mayank Bansal > > Capacity Scheduler right now schedules multiple containers per heart beat if > there are more resources available on the node. > This approach works fine, however in some cases it does not distribute the load > across the cluster, and hence the throughput of the cluster suffers. I am adding a > feature to drive this via configuration so that we can control the number > of containers assigned per heart beat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4161) Capacity Scheduler : Assign single or multiple containers per heart beat driven by configuration
[ https://issues.apache.org/jira/browse/YARN-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-4161: Description: Capacity > Capacity Scheduler : Assign single or multiple containers per heart beat > driven by configuration > > > Key: YARN-4161 > URL: https://issues.apache.org/jira/browse/YARN-4161 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Mayank Bansal >Assignee: Mayank Bansal > > Capacity -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4161) Capacity Scheduler : Assign single or multiple containers per heart beat driven by configuration
[ https://issues.apache.org/jira/browse/YARN-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-4161: Fix Version/s: (was: 2.1.0-beta) > Capacity Scheduler : Assign single or multiple containers per heart beat > driven by configuration > > > Key: YARN-4161 > URL: https://issues.apache.org/jira/browse/YARN-4161 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Mayank Bansal >Assignee: Mayank Bansal > > Capacity -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4161) Capacity Scheduler : Assign single or multiple containers per heart beat driven by configuration
[ https://issues.apache.org/jira/browse/YARN-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-4161: Description: (was: This is different from (even if related to, and likely share code with) YARN-2113. YARN-2113 focuses on making sure that even if queue has its guaranteed capacity, it's individual users are treated in-line with their limits irrespective of when they join in. This JIRA is about respecting user-limits while preempting containers to balance queue capacities.) > Capacity Scheduler : Assign single or multiple containers per heart beat > driven by configuration > > > Key: YARN-4161 > URL: https://issues.apache.org/jira/browse/YARN-4161 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Mayank Bansal >Assignee: Mayank Bansal > Fix For: 2.1.0-beta > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4161) Capacity Scheduler : Assign single or multiple containers per heart beat driven by configuration
Mayank Bansal created YARN-4161: --- Summary: Capacity Scheduler : Assign single or multiple containers per heart beat driven by configuration Key: YARN-4161 URL: https://issues.apache.org/jira/browse/YARN-4161 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Mayank Bansal Assignee: Mayank Bansal This is different from (even if related to, and likely share code with) YARN-2113. YARN-2113 focuses on making sure that even if queue has its guaranteed capacity, it's individual users are treated in-line with their limits irrespective of when they join in. This JIRA is about respecting user-limits while preempting containers to balance queue capacities. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4161) Capacity Scheduler : Assign single or multiple containers per heart beat driven by configuration
[ https://issues.apache.org/jira/browse/YARN-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-4161: Labels: (was: BB2015-05-TBR) > Capacity Scheduler : Assign single or multiple containers per heart beat > driven by configuration > > > Key: YARN-4161 > URL: https://issues.apache.org/jira/browse/YARN-4161 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Mayank Bansal >Assignee: Mayank Bansal > Fix For: 2.1.0-beta > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2069: Attachment: YARN-2069-2.6-11.patch Uploading the 2.6 patch Thanks, Mayank CS queue level preemption should respect user-limits Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Labels: BB2015-05-TBR Attachments: YARN-2069-2.6-11.patch, YARN-2069-trunk-1.patch, YARN-2069-trunk-10.patch, YARN-2069-trunk-2.patch, YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, YARN-2069-trunk-6.patch, YARN-2069-trunk-7.patch, YARN-2069-trunk-8.patch, YARN-2069-trunk-9.patch This is different from (even if related to, and likely share code with) YARN-2113. YARN-2113 focuses on making sure that even if queue has its guaranteed capacity, it's individual users are treated in-line with their limits irrespective of when they join in. This JIRA is about respecting user-limits while preempting containers to balance queue capacities. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
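For readers skimming the patch series, a hedged sketch of the rule this JIRA describes (stop preempting from a user once doing so would push that user below its computed user limit). All class, field, and method names below are invented for illustration; this is not the actual CapacityScheduler or ProportionalCapacityPreemptionPolicy code.

{code}
import java.util.List;

/**
 * Illustrative sketch: while rebalancing queue capacity, take containers from a
 * user only as long as the user stays at or above its computed user limit.
 * Names and the fixed container size are assumptions for illustration.
 */
class UserLimitAwarePreemption {

  static class UserUsage {
    final String user;
    long usedMemMb;                            // current usage in the queue
    final long userLimitMb;                    // computed user limit for the queue
    final List<String> preemptableContainers;  // ordered cheapest-to-kill first

    UserUsage(String user, long usedMemMb, long userLimitMb,
              List<String> preemptableContainers) {
      this.user = user;
      this.usedMemMb = usedMemMb;
      this.userLimitMb = userLimitMb;
      this.preemptableContainers = preemptableContainers;
    }
  }

  /** Pick containers to preempt from one user without violating its user limit. */
  static int preemptFromUser(UserUsage u, long stillNeededMb, long containerSizeMb,
                             List<String> toPreempt) {
    int taken = 0;
    for (String containerId : u.preemptableContainers) {
      boolean satisfied = (long) taken * containerSizeMb >= stillNeededMb;
      boolean wouldBreachUserLimit = u.usedMemMb - containerSizeMb < u.userLimitMb;
      if (satisfied || wouldBreachUserLimit) {
        break; // respect the user limit even if the queue still wants more back
      }
      toPreempt.add(containerId);
      u.usedMemMb -= containerSizeMb;
      taken++;
    }
    return taken;
  }
}
{code}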
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279056#comment-14279056 ] Mayank Bansal commented on YARN-2933: - Thanks [~jianhe] and [~wangda] for the review. bq. looks good overall, we should use priority.AMCONTAINER here ? The name was confusing, so I changed the names and updated accordingly. bq. it's better to use enum type instead of int in mockContainer, which can avoid call getValue() from enum. Priority is overridden differently in multiple tests, so I didn't want to change the signature of the functions; moreover, it's the same either way. Uploading the updated patch. Thanks, Mayank Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, YARN-2933-8.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
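To make the short-term rule in this issue concrete, a minimal sketch is given below. The class, the candidate holder, and the label lookup are simplified stand-ins assumed for illustration; the real logic lives in ProportionalCapacityPreemptionPolicy and is not reproduced here.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Simplified sketch of the temporary rule discussed in this JIRA: when selecting
 * preemption candidates, skip containers on labeled nodes so that only no-label
 * capacity is rebalanced. Names are illustrative only.
 */
class NoLabelPreemptionSelector {

  /** Illustrative container holder. */
  static class Candidate {
    final String containerId;
    final String host;

    Candidate(String containerId, String host) {
      this.containerId = containerId;
      this.host = host;
    }
  }

  /** hostname -> labels on that node (empty set means no label); assumed to be provided. */
  private final Map<String, Set<String>> nodeLabels;

  NoLabelPreemptionSelector(Map<String, Set<String>> nodeLabels) {
    this.nodeLabels = nodeLabels;
  }

  List<Candidate> selectPreemptable(List<Candidate> overCapacityContainers) {
    List<Candidate> result = new ArrayList<>();
    for (Candidate c : overCapacityContainers) {
      Set<String> labels = nodeLabels.get(c.host);
      if (labels != null && !labels.isEmpty()) {
        continue; // short-term rule: never preempt from labeled nodes
      }
      result.add(c);
    }
    return result;
  }
}
{code}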
[jira] [Updated] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2933: Attachment: YARN-2933-9.patch Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, YARN-2933-8.patch, YARN-2933-9.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2933: Attachment: YARN-2933-8.patch Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, YARN-2933-8.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277328#comment-14277328 ] Mayank Bansal commented on YARN-2933: - Thanks [~wangda] for the review. bq. 1) ProportionalCapacityPreemptionPolicy.setNodeLabels is too simple to be a method, it's better to remove it. Getters and setters are usually simple, but it's good practice to have them. I think we should keep it. bq. 2) It's better to use enum here instead of integer Done. Thanks, Mayank Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch, YARN-2933-8.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2933: Attachment: YARN-2933-7.patch Thanks [~wangda], [~jianhe] and [~sunilg] for the reviews. Updated the patch. Thanks, Mayank Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275989#comment-14275989 ] Mayank Bansal commented on YARN-2933: - This test failure is not due to this patch. Thanks, Mayank Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch, YARN-2933-7.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14269784#comment-14269784 ] Mayank Bansal commented on YARN-2933: - Thanks [~wangda] and Sunil for the review. bq. In addition to previously comment, I think we put incorrect #container for each application when setLabelContainer=true. The usedResource or current in TestProportionalPreemptionPolicy actually means used resource of nodes without label. So if we want to have labeled container in an application, we should make it stay outside of usedResource. I don't think that's needed, as the basic functionality of the test is to demonstrate that we can skip labeled containers, so I think it does not matter. bq. And testSkipLabeledContainer is fully covered by testIdealAllocationForLabels. Since we have already checked #container preempted in each application in testIdealAllocationForLabels, which implies labeled containers are ignored. Agreed. bq. A minor suggest is rename setLabelContainer to setLabeledContainer Agreed. bq. An application's(if not specified any labels during submission time) containers, may fall in to nodes where it can be labelled or not labelled. Am I correct? No, as of now containers with no labels cannot go to labeled nodes. Thanks, Mayank Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2933: Attachment: YARN-2933-6.patch Attaching patch Thanks, Mayank Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch, YARN-2933-6.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267962#comment-14267962 ] Mayank Bansal commented on YARN-2933: - Thanks [~wangda] for the review. 1. Fixed, I should have used it. 2. I think the getter and setter should be there. 3. Done. 4. Done. 5. The test is fixed. 6. The findbugs warning is not due to this patch. Thanks, Mayank Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2933: Attachment: YARN-2933-5.patch Updating patch. Thanks, Mayank Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch, YARN-2933-5.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2933: Attachment: YARN-2933-4.patch Thanks [~wangda] for review. I updated the patch based on the comments. Thanks, Mayank Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch, YARN-2933-4.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2933: Attachment: YARN-2933-3.patch Fixing javadoc warnings Thanks, Mayank Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14257717#comment-14257717 ] Mayank Bansal commented on YARN-2933: - These findbugs warnings are not due to this patch. Thanks, Mayank Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch, YARN-2933-3.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2933: Attachment: YARN-2933-2.patch Thanks [~wangda] for the review. Makes sense. Updating the patch. Thanks, Mayank Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch, YARN-2933-2.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252629#comment-14252629 ] Mayank Bansal commented on YARN-2933: - These findbugs warnings and the test failure are not due to this patch. Thanks, Mayank Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal reassigned YARN-2933: --- Assignee: Mayank Bansal (was: Wangda Tan) Taking it over Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily
[ https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2933: Attachment: YARN-2933-1.patch Attaching patch for avoiding preemption for labeled containers. Thanks, Mayank Capacity Scheduler preemption policy should only consider capacity without labels temporarily - Key: YARN-2933 URL: https://issues.apache.org/jira/browse/YARN-2933 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Wangda Tan Assignee: Mayank Bansal Attachments: YARN-2933-1.patch Currently, we have capacity enforcement on each queue for each label in CapacityScheduler, but we don't have preemption policy to support that. YARN-2498 is targeting to support preemption respect node labels, but we have some gaps in code base, like queues/FiCaScheduler should be able to get usedResource/pendingResource, etc. by label. These items potentially need to refactor CS which we need spend some time carefully think about. For now, what immediately we can do is allow calculate ideal_allocation and preempt containers only for resources on nodes without labels, to avoid regression like: A cluster has some nodes with labels and some not, assume queueA isn't satisfied for resource without label, but for now, preemption policy may preempt resource from nodes with labels for queueA, that is not correct. Again, it is just a short-term enhancement, YARN-2498 will consider preemption respecting node-labels for Capacity Scheduler which is our final target. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2647) Add yarn queue CLI to get queue info including labels of such queue
[ https://issues.apache.org/jira/browse/YARN-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179377#comment-14179377 ] Mayank Bansal commented on YARN-2647: - Hi [~sunilg], are you still working on this? Can I take it over if you are not looking at it? Thanks, Mayank Add yarn queue CLI to get queue info including labels of such queue --- Key: YARN-2647 URL: https://issues.apache.org/jira/browse/YARN-2647 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Wangda Tan Assignee: Sunil G -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal reassigned YARN-2698: --- Assignee: Mayank Bansal (was: Wangda Tan) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI --- Key: YARN-2698 URL: https://issues.apache.org/jira/browse/YARN-2698 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Mayank Bansal YARN RMAdminCLI and AdminService should have write API only, for other read APIs, they should be located at YARNCLI and RMClientService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14179379#comment-14179379 ] Mayank Bansal commented on YARN-2698: - Taking it over. Thanks, Mayank Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI --- Key: YARN-2698 URL: https://issues.apache.org/jira/browse/YARN-2698 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Mayank Bansal YARN RMAdminCLI and AdminService should have write API only, for other read APIs, they should be located at YARNCLI and RMClientService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2598) GHS should show N/A instead of null for the inaccessible information
[ https://issues.apache.org/jira/browse/YARN-2598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165715#comment-14165715 ] Mayank Bansal commented on YARN-2598: - Committed to branch-2, branch-2.6, and trunk. Thanks [~zjshen]. Thanks, Mayank GHS should show N/A instead of null for the inaccessible information Key: YARN-2598 URL: https://issues.apache.org/jira/browse/YARN-2598 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2598.1.patch, YARN-2598.2.patch When the user doesn't have access to an application, the app attempt information is not visible to the user. ClientRMService will output N/A, but GHS is showing null, which is not user-friendly. {code} 14/09/24 22:07:20 INFO impl.TimelineClientImpl: Timeline service address: http://nn.example.com:8188/ws/v1/timeline/ 14/09/24 22:07:20 INFO client.RMProxy: Connecting to ResourceManager at nn.example.com/240.0.0.11:8050 14/09/24 22:07:21 INFO client.AHSProxy: Connecting to Application History server at nn.example.com/240.0.0.11:10200 Application Report : Application-Id : application_1411586934799_0001 Application-Name : Sleep job Application-Type : MAPREDUCE User : hrt_qa Queue : default Start-Time : 1411586956012 Finish-Time : 1411586989169 Progress : 100% State : FINISHED Final-State : SUCCEEDED Tracking-URL : null RPC Port : -1 AM Host : null Aggregate Resource Allocation : N/A Diagnostics : null {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
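The change being committed is essentially a null-to-placeholder substitution when building a report for a caller who cannot see attempt details. A hedged sketch of that idea follows; the helper class and method names are invented here, and the real change lives in the history server's report-conversion code rather than in a standalone utility.

{code}
/**
 * Sketch of the null-to-"N/A" substitution described above. Illustrative only;
 * not the actual class or method names used by the fix.
 */
final class ReportFields {
  static final String NOT_AVAILABLE = "N/A";

  private ReportFields() {
  }

  /** Returns the value itself, or "N/A" when the value is not visible/available. */
  static String orNotAvailable(String value) {
    return value == null ? NOT_AVAILABLE : value;
  }
}

// Usage (illustrative): wrap every field that may be hidden from the caller.
//   String trackingUrl = ReportFields.orNotAvailable(maybeNullTrackingUrl);
//   String amHost      = ReportFields.orNotAvailable(maybeNullAmHost);
//   String diagnostics = ReportFields.orNotAvailable(maybeNullDiagnostics);
{code}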
[jira] [Commented] (YARN-2320) Removing old application history store after we store the history data to timeline store
[ https://issues.apache.org/jira/browse/YARN-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165795#comment-14165795 ] Mayank Bansal commented on YARN-2320: - Thanks [~zjshen] for the patch; overall it looks OK. A couple of points: 1) I think attempt and container, too, should have N/A instead of null. If you want to do it in a separate JIRA, that's fine too. 2) The latest patch needs rebasing. 3) What testing have you done on this patch? Once I have the rebased patch I will run the tests. Thanks, Mayank Removing old application history store after we store the history data to timeline store Key: YARN-2320 URL: https://issues.apache.org/jira/browse/YARN-2320 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2320.1.patch, YARN-2320.2.patch, YARN-2320.3.patch After YARN-2033, we should deprecate application history store set. There's no need to maintain two sets of store interfaces. In addition, we should conclude the outstanding jira's under YARN-321 about the application history store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2670) Adding feedback capability to capacity scheduler from external systems
Mayank Bansal created YARN-2670: --- Summary: Adding feedback capability to capacity scheduler from external systems Key: YARN-2670 URL: https://issues.apache.org/jira/browse/YARN-2670 Project: Hadoop YARN Issue Type: New Feature Reporter: Mayank Bansal Assignee: Mayank Bansal The sheer growth in data volume and Hadoop cluster size make it a significant challenge to diagnose and locate problems in a production-level cluster environment efficiently and within a short period of time. Oftentimes, the distributed monitoring systems are not capable of detecting a problem well in advance when a large-scale Hadoop cluster starts to deteriorate in performance or becomes unavailable. Thus, incoming workloads, scheduled between the time when the cluster starts to deteriorate and the time when the problem is identified, suffer from longer execution times. As a result, both reliability and throughput of the cluster are reduced significantly. We address this problem by proposing a system called Astro, which consists of a predictive model and an extension to the Capacity scheduler. The predictive model in Astro takes into account a rich set of cluster behavioral information that is collected by monitoring processes and models it using machine learning algorithms to predict future behavior of the cluster. The Astro predictive model detects anomalies in the cluster and also identifies a ranked set of metrics that have contributed the most towards the problem. The Astro scheduler uses the prediction outcome and the list of metrics to decide whether it needs to move and reduce workloads from the problematic cluster nodes or to prevent additional workload allocations to them, in order to improve both throughput and reliability of the cluster. This JIRA is only for adding feedback capabilities to the Capacity Scheduler so that it can take feedback from external systems. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2320) Removing old application history store after we store the history data to timeline store
[ https://issues.apache.org/jira/browse/YARN-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152559#comment-14152559 ] Mayank Bansal commented on YARN-2320: - I think overall it looks OK; however, I still have to run it. Some small comments: shouldn't we use N/A in convertToApplicationAttemptReport instead of null? Similarly for convertToApplicationReport? Similarly for convertToContainerReport? Removing old application history store after we store the history data to timeline store Key: YARN-2320 URL: https://issues.apache.org/jira/browse/YARN-2320 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2320.1.patch, YARN-2320.2.patch After YARN-2033, we should deprecate application history store set. There's no need to maintain two sets of store interfaces. In addition, we should conclude the outstanding jira's under YARN-321 about the application history store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2459: Attachment: YARN-2459-2.patch Attaching patch after offline discussion with [~jianhe] Thanks, Mayank RM crashes if App gets rejected for any reason and HA is enabled Key: YARN-2459 URL: https://issues.apache.org/jira/browse/YARN-2459 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-2459-1.patch, YARN-2459-2.patch If RM HA is enabled and used Zookeeper store for RM State Store. If for any reason Any app gets rejected and directly goes to NEW to FAILED then final transition makes that to RMApps and Completed Apps memory structure but that doesn't make it to State store. Now when RMApps default limit reaches it starts deleting apps from memory and store. In that case it try to delete this app from store and fails which causes RM to crash. Thanks, Mayank -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
Mayank Bansal created YARN-2459: --- Summary: RM crashes if App gets rejected for any reason and HA is enabled Key: YARN-2459 URL: https://issues.apache.org/jira/browse/YARN-2459 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Mayank Bansal Assignee: Mayank Bansal Fix For: 2.5.0 If RM HA is enabled and used Zookeeper store for RM State Store. If for any reason Any app gets rejected and directly goes to NEW to FAILED then final transition makes that to RMApps and Completed Apps memory structure but that doesn't make it to State store. Now when RMApps default limit reaches it starts deleting apps from memory and store. In that case it try to delete this app from store and fails which causes RM to crash. Thanks, Mayank -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2459: Attachment: YARN-2459-1.patch Updating patch, adding the app to the state store on the app-reject event to make RM memory and the state store consistent. Thanks, Mayank RM crashes if App gets rejected for any reason and HA is enabled Key: YARN-2459 URL: https://issues.apache.org/jira/browse/YARN-2459 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-2459-1.patch If RM HA is enabled and used Zookeeper store for RM State Store. If for any reason Any app gets rejected and directly goes to NEW to FAILED then final transition makes that to RMApps and Completed Apps memory structure but that doesn't make it to State store. Now when RMApps default limit reaches it starts deleting apps from memory and store. In that case it try to delete this app from store and fails which causes RM to crash. Thanks, Mayank -- This message was sent by Atlassian JIRA (v6.2#6252)
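The fix described in this update is about keeping the state store and the RM's in-memory completed-apps list consistent, so that a later purge never tries to delete state that was never written. A hedged sketch of that invariant is below; the class, interface, and method names are invented for illustration and are not the actual RMAppImpl/RMStateStore code paths.

{code}
/**
 * Illustrative sketch: an application rejected straight from NEW to FAILED is
 * still recorded in the state store, so a later removal from both memory and
 * store cannot fail (e.g. with a ZooKeeper NoNodeException) and crash the RM.
 * Names are invented for illustration.
 */
class AppRejectionHandler {

  /** Minimal stand-in for the RM state store, for illustration only. */
  interface StateStore {
    void storeApplication(String appId, String finalState);
    void removeApplication(String appId);
  }

  private final StateStore store;

  AppRejectionHandler(StateStore store) {
    this.store = store;
  }

  /** On rejection, persist the app before it joins the completed-apps list. */
  void onAppRejected(String appId, String diagnostics) {
    // Without this write the app exists only in memory; purging it later would
    // attempt to delete state that was never stored and fail.
    store.storeApplication(appId, "FAILED: " + diagnostics);
  }

  /** Later, when the completed-apps limit is exceeded, removal is now safe. */
  void onAppPurged(String appId) {
    store.removeApplication(appId);
  }
}
{code}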
[jira] [Updated] (YARN-2459) RM crashes if App gets rejected for any reason and HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2459: Description: If RM HA is enabled and used Zookeeper store for RM State Store. If for any reason Any app gets rejected and directly goes to NEW to FAILED then final transition makes that to RMApps and Completed Apps memory structure but that doesn't make it to State store. Now when RMApps default limit reaches it starts deleting apps from memory and store. In that case it try to delete this app from store and fails which causes RM to crash. Stack Trace 2014-08-24 18:43:04,603 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Skipping scheduling since node phxaishdc9dn0360.phx.ebay.com:58458 is reserved by applica tion appattempt_1408727267637_12984_01 2014-08-24 18:43:04,613 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Trying to fulfill reservation for application application_1408727267637_12984 on node: ph xaishdc9dn0816.phx.ebay.com:50443 2014-08-24 18:43:04,613 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: Application application_1408727267637_12984 reserved container container_1408727267637_1 2984_01_003215 on node host: phxaishdc9dn0816.phx.ebay.com:50443 #containers=17 available=4224 used=63360, currently has 310 at priority 10; currentReservation 2618880 2014-08-24 18:43:04,613 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode: Updated reserved container container_1408727267637_12984_01_003215 on node host: phxai shdc9dn0816.phx.ebay.com:50443 #containers=17 available=4224 used=63360 for application org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp@2da03710 2014-08-24 18:43:04,613 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Reserved container application=application_1408727267637_12984 resource=memory:8448, vCores:1 queue=hdmi-set: capacity=0.2, absoluteCapacity=0.2, usedResources=memory:34293248, vCores:7092usedCapacity=1.4031365, absoluteUsedCapacity=0.28062728, numApps=12, numContainers=7092 usedCapacity=1.403 1365 absoluteUsedCapacity=0.28062728 used=memory:34293248, vCores:7092 cluster=memory:122202112, vCores:14584 2014-08-24 18:43:04,613 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Skipping scheduling since node phxaishdc9dn0816.phx.ebay.com:50443 is reserved by applica tion appattempt_1408727267637_12984_01 2014-08-24 18:43:04,614 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. 
Cause: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:852) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:849) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:948) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:967) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:849) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.removeApplicationStateInternal(ZKRMStateStore.java:642) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:181) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$RemoveAppTransition.transition(RMStateStore.java:167) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:766) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:837) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:832) at
[jira] [Updated] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2069: Attachment: YARN-2069-trunk-10.patch Fixing small bug. Thanks, Mayank CS queue level preemption should respect user-limits Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-10.patch, YARN-2069-trunk-2.patch, YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, YARN-2069-trunk-6.patch, YARN-2069-trunk-7.patch, YARN-2069-trunk-8.patch, YARN-2069-trunk-9.patch This is different from (even if related to, and likely share code with) YARN-2113. YARN-2113 focuses on making sure that even if queue has its guaranteed capacity, it's individual users are treated in-line with their limits irrespective of when they join in. This JIRA is about respecting user-limits while preempting containers to balance queue capacities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2032) Implement a scalable, available TimelineStore using HBase
[ https://issues.apache.org/jira/browse/YARN-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2032: Attachment: YARN-2032-branch2-2.patch Attaching updated patch for branch-2 Thanks, Mayank Implement a scalable, available TimelineStore using HBase - Key: YARN-2032 URL: https://issues.apache.org/jira/browse/YARN-2032 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2032-branch-2-1.patch, YARN-2032-branch2-2.patch As discussed on YARN-1530, we should pursue implementing a scalable, available Timeline store using HBase. One goal is to reuse most of the code from the levelDB Based store - YARN-1635. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14081193#comment-14081193 ] Mayank Bansal commented on YARN-2069: - Hi [~wangda] , Thanks for your review comments. Updating the patch with the fix. Thanks, Mayank CS queue level preemption should respect user-limits Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, YARN-2069-trunk-6.patch, YARN-2069-trunk-7.patch, YARN-2069-trunk-8.patch This is different from (even if related to, and likely share code with) YARN-2113. YARN-2113 focuses on making sure that even if queue has its guaranteed capacity, it's individual users are treated in-line with their limits irrespective of when they join in. This JIRA is about respecting user-limits while preempting containers to balance queue capacities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2069: Attachment: YARN-2069-trunk-8.patch CS queue level preemption should respect user-limits Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, YARN-2069-trunk-6.patch, YARN-2069-trunk-7.patch, YARN-2069-trunk-8.patch This is different from (even if related to, and likely share code with) YARN-2113. YARN-2113 focuses on making sure that even if queue has its guaranteed capacity, it's individual users are treated in-line with their limits irrespective of when they join in. This JIRA is about respecting user-limits while preempting containers to balance queue capacities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074940#comment-14074940 ] Mayank Bansal commented on YARN-2069: - Hi [~wangda], Thanks for your review comments. Let me explain what this algorithm is doing. Say queueA has 30% of the cluster capacity allocated to it but is currently using 50% of the cluster. Queue A has 5 users with a 20% user limit, so each user is using 10% of the cluster's capacity. Another queue, queueB, has 70% allocated capacity and is using 50%. Now another application is submitted to Queue B that needs 10% capacity, so 10% has to be claimed back from queue A: resToObtain = 10%, and the targeted user limit becomes 8% (this is always calculated from how much we need to claim back from the users). Based on the current algorithm, it takes 2% of resources from every user and leaves the balance with each of them. This also holds when users are not using equal amounts of resources: the algorithm takes more from the users who are using more, until each is balanced down to the targeted user limit. The algorithm also preempts the application that was submitted last; if user1 has 2 applications, it tries to take the maximum number of containers from the last-submitted application while leaving its AM container behind, though the user limit is honoured across all of that user's applications in the queue combined. The algorithm does not remove an AM container unless absolutely needed; it takes all the task containers first and only then considers AM containers for preemption. Thanks, Mayank CS queue level preemption should respect user-limits Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, YARN-2069-trunk-6.patch, YARN-2069-trunk-7.patch This is different from (even if related to, and likely share code with) YARN-2113. YARN-2113 focuses on making sure that even if queue has its guaranteed capacity, it's individual users are treated in-line with their limits irrespective of when they join in. This JIRA is about respecting user-limits while preempting containers to balance queue capacities. -- This message was sent by Atlassian JIRA (v6.2#6252)
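To make the arithmetic above concrete, here is a small, self-contained sketch of the per-user claw-back just described; the class, method, and map-based bookkeeping are illustrative only and are not the patch's actual code:
{code}
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: trim each user toward the targeted user limit, taking
// more from users that are further above it, until resToObtain is satisfied.
public class UserClawbackSketch {
  static Map<String, Integer> computeClawback(Map<String, Integer> usedByUser,
      int targetUserLimit, int resToObtain) {
    Map<String, Integer> toPreempt = new LinkedHashMap<>();
    int remaining = resToObtain;
    for (Map.Entry<String, Integer> e : usedByUser.entrySet()) {
      if (remaining <= 0) {
        break;
      }
      // Only users above the targeted user limit give resources back.
      int take = Math.min(Math.max(0, e.getValue() - targetUserLimit), remaining);
      if (take > 0) {
        toPreempt.put(e.getKey(), take);
        remaining -= take;
      }
    }
    return toPreempt;
  }

  public static void main(String[] args) {
    // Queue A: 5 users at 10% each, targeted user limit 8%, resToObtain 10%.
    Map<String, Integer> used = new LinkedHashMap<>();
    for (int i = 1; i <= 5; i++) {
      used.put("user" + i, 10);
    }
    // Each user is 2% over the 8% target, so 2% is taken from each of them.
    System.out.println(computeClawback(used, 8, 10));
  }
}
{code}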
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073727#comment-14073727 ] Mayank Bansal commented on YARN-2069: - Thanks [~vinodkv] for the review. I have changed the patch to work from the targeted capacity for the queue, balancing it out against the users' resources. I also collapsed the two passes into a single pass. Please review it. Thanks, Mayank CS queue level preemption should respect user-limits Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, YARN-2069-trunk-6.patch This is different from (even if related to, and likely share code with) YARN-2113. YARN-2113 focuses on making sure that even if queue has its guaranteed capacity, it's individual users are treated in-line with their limits irrespective of when they join in. This JIRA is about respecting user-limits while preempting containers to balance queue capacities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2069: Attachment: YARN-2069-trunk-6.patch CS queue level preemption should respect user-limits Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, YARN-2069-trunk-6.patch This is different from (even if related to, and likely share code with) YARN-2113. YARN-2113 focuses on making sure that even if queue has its guaranteed capacity, it's individual users are treated in-line with their limits irrespective of when they join in. This JIRA is about respecting user-limits while preempting containers to balance queue capacities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2069: Attachment: YARN-2069-trunk-7.patch Updated patch Thanks, Mayank CS queue level preemption should respect user-limits Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch, YARN-2069-trunk-6.patch, YARN-2069-trunk-7.patch This is different from (even if related to, and likely share code with) YARN-2113. YARN-2113 focuses on making sure that even if queue has its guaranteed capacity, it's individual users are treated in-line with their limits irrespective of when they join in. This JIRA is about respecting user-limits while preempting containers to balance queue capacities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-1408: Fix Version/s: 2.6.0 2.5.0 Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins -- Key: YARN-1408 URL: https://issues.apache.org/jira/browse/YARN-1408 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Sunil G Assignee: Sunil G Fix For: 2.5.0, 2.6.0 Attachments: YARN-1408-branch-2.5-1.patch, Yarn-1408.1.patch, Yarn-1408.10.patch, Yarn-1408.11.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.7.patch, Yarn-1408.8.patch, Yarn-1408.9.patch, Yarn-1408.patch Capacity preemption is enabled as follows. * yarn.resourcemanager.scheduler.monitor.enable= true , * yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy Queue = a,b Capacity of Queue A = 80% Capacity of Queue B = 20% Step 1: Assign a big jobA on queue a which uses full cluster capacity Step 2: Submitted a jobB to queue b which would use less than 20% of cluster capacity JobA task which uses queue b capcity is been preempted and killed. This caused below problem: 1. New Container has got allocated for jobA in Queue A as per node update from an NM. 2. This container has been preempted immediately as per preemption. Here ACQUIRED at KILLED Invalid State exception came when the next AM heartbeat reached RM. ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ACQUIRED at KILLED This also caused the Task to go for a timeout for 30minutes as this Container was already killed by preemption. attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062698#comment-14062698 ] Mayank Bansal commented on YARN-1408: - +1 Committing Thanks [~sunilg] for the patch. Thanks [~jianhe], [~vinodkv] and [~wangda] for the reviews. Thanks, Mayank Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins -- Key: YARN-1408 URL: https://issues.apache.org/jira/browse/YARN-1408 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Sunil G Assignee: Sunil G Attachments: Yarn-1408.1.patch, Yarn-1408.10.patch, Yarn-1408.11.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.7.patch, Yarn-1408.8.patch, Yarn-1408.9.patch, Yarn-1408.patch Capacity preemption is enabled as follows. * yarn.resourcemanager.scheduler.monitor.enable= true , * yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy Queue = a,b Capacity of Queue A = 80% Capacity of Queue B = 20% Step 1: Assign a big jobA on queue a which uses full cluster capacity Step 2: Submitted a jobB to queue b which would use less than 20% of cluster capacity JobA task which uses queue b capcity is been preempted and killed. This caused below problem: 1. New Container has got allocated for jobA in Queue A as per node update from an NM. 2. This container has been preempted immediately as per preemption. Here ACQUIRED at KILLED Invalid State exception came when the next AM heartbeat reached RM. ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ACQUIRED at KILLED This also caused the Task to go for a timeout for 30minutes as this Container was already killed by preemption. attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062879#comment-14062879 ] Mayank Bansal commented on YARN-1408: - Committed to trunk, branch 2 and branch-2.5. branch-2.5 needed some rebase , Updating the patch for branch-2.5 Thanks, Mayank Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins -- Key: YARN-1408 URL: https://issues.apache.org/jira/browse/YARN-1408 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-1408-branch-2.5-1.patch, Yarn-1408.1.patch, Yarn-1408.10.patch, Yarn-1408.11.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.7.patch, Yarn-1408.8.patch, Yarn-1408.9.patch, Yarn-1408.patch Capacity preemption is enabled as follows. * yarn.resourcemanager.scheduler.monitor.enable= true , * yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy Queue = a,b Capacity of Queue A = 80% Capacity of Queue B = 20% Step 1: Assign a big jobA on queue a which uses full cluster capacity Step 2: Submitted a jobB to queue b which would use less than 20% of cluster capacity JobA task which uses queue b capcity is been preempted and killed. This caused below problem: 1. New Container has got allocated for jobA in Queue A as per node update from an NM. 2. This container has been preempted immediately as per preemption. Here ACQUIRED at KILLED Invalid State exception came when the next AM heartbeat reached RM. ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ACQUIRED at KILLED This also caused the Task to go for a timeout for 30minutes as this Container was already killed by preemption. attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-1408: Attachment: YARN-1408-branch-2.5-1.patch rebasing against branch 2.5 Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins -- Key: YARN-1408 URL: https://issues.apache.org/jira/browse/YARN-1408 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-1408-branch-2.5-1.patch, Yarn-1408.1.patch, Yarn-1408.10.patch, Yarn-1408.11.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.7.patch, Yarn-1408.8.patch, Yarn-1408.9.patch, Yarn-1408.patch Capacity preemption is enabled as follows. * yarn.resourcemanager.scheduler.monitor.enable= true , * yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy Queue = a,b Capacity of Queue A = 80% Capacity of Queue B = 20% Step 1: Assign a big jobA on queue a which uses full cluster capacity Step 2: Submitted a jobB to queue b which would use less than 20% of cluster capacity JobA task which uses queue b capcity is been preempted and killed. This caused below problem: 1. New Container has got allocated for jobA in Queue A as per node update from an NM. 2. This container has been preempted immediately as per preemption. Here ACQUIRED at KILLED Invalid State exception came when the next AM heartbeat reached RM. ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ACQUIRED at KILLED This also caused the Task to go for a timeout for 30minutes as this Container was already killed by preemption. attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14060929#comment-14060929 ] Mayank Bansal commented on YARN-1408: - On [~jianhe]'s point: I think it's good to check that schedulerAttempt is not null before accessing it. Makes sense? Thanks, Mayank Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins -- Key: YARN-1408 URL: https://issues.apache.org/jira/browse/YARN-1408 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Sunil G Assignee: Sunil G Attachments: Yarn-1408.1.patch, Yarn-1408.10.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.7.patch, Yarn-1408.8.patch, Yarn-1408.9.patch, Yarn-1408.patch Capacity preemption is enabled as follows. * yarn.resourcemanager.scheduler.monitor.enable= true , * yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy Queue = a,b Capacity of Queue A = 80% Capacity of Queue B = 20% Step 1: Assign a big jobA on queue a which uses full cluster capacity Step 2: Submitted a jobB to queue b which would use less than 20% of cluster capacity JobA task which uses queue b capcity is been preempted and killed. This caused below problem: 1. New Container has got allocated for jobA in Queue A as per node update from an NM. 2. This container has been preempted immediately as per preemption. Here ACQUIRED at KILLED Invalid State exception came when the next AM heartbeat reached RM. ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ACQUIRED at KILLED This also caused the Task to go for a timeout for 30minutes as this Container was already killed by preemption. attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
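For illustration, a minimal sketch of the null guard being suggested; the lookup method name is hypothetical and only stands in for however the scheduler resolves the attempt:
{code}
// Sketch only: skip handling the container event if the scheduler no longer
// tracks the attempt (for example, it was already removed after preemption).
SchedulerApplicationAttempt schedulerAttempt =
    lookupAttemptForContainer(containerId);   // hypothetical lookup
if (schedulerAttempt == null) {
  LOG.info("Ignoring container " + containerId
      + " because its application attempt is no longer active");
  return;
}
schedulerAttempt.containerLaunchedOnNode(containerId, nodeId);
{code}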
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061320#comment-14061320 ] Mayank Bansal commented on YARN-1408: - [~sunilg], Can you check these test failures? Thanks, Mayank Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins -- Key: YARN-1408 URL: https://issues.apache.org/jira/browse/YARN-1408 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Sunil G Assignee: Sunil G Attachments: Yarn-1408.1.patch, Yarn-1408.10.patch, Yarn-1408.11.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.7.patch, Yarn-1408.8.patch, Yarn-1408.9.patch, Yarn-1408.patch Capacity preemption is enabled as follows. * yarn.resourcemanager.scheduler.monitor.enable= true , * yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy Queue = a,b Capacity of Queue A = 80% Capacity of Queue B = 20% Step 1: Assign a big jobA on queue a which uses full cluster capacity Step 2: Submitted a jobB to queue b which would use less than 20% of cluster capacity JobA task which uses queue b capcity is been preempted and killed. This caused below problem: 1. New Container has got allocated for jobA in Queue A as per node update from an NM. 2. This container has been preempted immediately as per preemption. Here ACQUIRED at KILLED Invalid State exception came when the next AM heartbeat reached RM. ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ACQUIRED at KILLED This also caused the Task to go for a timeout for 30minutes as this Container was already killed by preemption. attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14059104#comment-14059104 ] Mayank Bansal commented on YARN-1408: - Thanks [~sunilg] for the patch. The patch looks good; can you check these test failures? Thanks, Mayank Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins -- Key: YARN-1408 URL: https://issues.apache.org/jira/browse/YARN-1408 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Sunil G Assignee: Sunil G Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.7.patch, Yarn-1408.8.patch, Yarn-1408.9.patch, Yarn-1408.patch Capacity preemption is enabled as follows. * yarn.resourcemanager.scheduler.monitor.enable= true , * yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy Queue = a,b Capacity of Queue A = 80% Capacity of Queue B = 20% Step 1: Assign a big jobA on queue a which uses full cluster capacity Step 2: Submitted a jobB to queue b which would use less than 20% of cluster capacity JobA task which uses queue b capcity is been preempted and killed. This caused below problem: 1. New Container has got allocated for jobA in Queue A as per node update from an NM. 2. This container has been preempted immediately as per preemption. Here ACQUIRED at KILLED Invalid State exception came when the next AM heartbeat reached RM. ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ACQUIRED at KILLED This also caused the Task to go for a timeout for 30minutes as this Container was already killed by preemption. attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057170#comment-14057170 ] Mayank Bansal commented on YARN-2069: - I just verified: I rebased the patch, compiled, and tested it. The patch doesn't seem to be the problem. Thanks, Mayank CS queue level preemption should respect user-limits Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch This is different from (even if related to, and likely share code with) YARN-2113. YARN-2113 focuses on making sure that even if queue has its guaranteed capacity, it's individual users are treated in-line with their limits irrespective of when they join in. This JIRA is about respecting user-limits while preempting containers to balance queue capacities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057200#comment-14057200 ] Mayank Bansal commented on YARN-1408: - Thanks [~sunilg] for the patch. The patch looks good; there are some minor comments: 1. Your current patch does not apply on trunk, please rebase on trunk. 2. There are a lot of unwanted formatting changes; can you please revert them? Some examples are as follows
{code}
- .currentTimeMillis());
+.currentTimeMillis());
{code}
{code}
-RMContainer rmContainer =
-new RMContainerImpl(container, attemptId, node.getNodeID(),
- applications.get(attemptId.getApplicationId()).getUser(), rmContext,
- status.getCreationTime());
+RMContainer rmContainer = new RMContainerImpl(container, attemptId,
+node.getNodeID(), applications.get(attemptId.getApplicationId())
+.getUser(), rmContext, status.getCreationTime());
{code}
Please check for this throughout the patch. Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins -- Key: YARN-1408 URL: https://issues.apache.org/jira/browse/YARN-1408 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Sunil G Assignee: Sunil G Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.7.patch, Yarn-1408.patch Capacity preemption is enabled as follows. * yarn.resourcemanager.scheduler.monitor.enable= true , * yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy Queue = a,b Capacity of Queue A = 80% Capacity of Queue B = 20% Step 1: Assign a big jobA on queue a which uses full cluster capacity Step 2: Submitted a jobB to queue b which would use less than 20% of cluster capacity JobA task which uses queue b capcity is been preempted and killed. This caused below problem: 1. New Container has got allocated for jobA in Queue A as per node update from an NM. 2. This container has been preempted immediately as per preemption. Here ACQUIRED at KILLED Invalid State exception came when the next AM heartbeat reached RM. ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ACQUIRED at KILLED This also caused the Task to go for a timeout for 30minutes as this Container was already killed by preemption. attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2069: Attachment: YARN-2069-trunk-5.patch Thanks [~wangda] and [~sunilg] for the review. I have addressed all of [~wangda]'s comments except the test-case one; as I discussed offline with wangda, the current test cases already cover both of the scenarios he explained. [~sunilg], I think [~wangda] already addressed your comments. Please review the latest patch. Thanks, Mayank CS queue level preemption should respect user-limits Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch This is different from (even if related to, and likely share code with) YARN-2113. YARN-2113 focuses on making sure that even if queue has its guaranteed capacity, it's individual users are treated in-line with their limits irrespective of when they join in. This JIRA is about respecting user-limits while preempting containers to balance queue capacities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14054648#comment-14054648 ] Mayank Bansal commented on YARN-2022: - Merged to Branch 2.5 Thanks, Mayank Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Fix For: 2.5.0 Attachments: YARN-2022-DesignDraft.docx, YARN-2022.10.patch, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055598#comment-14055598 ] Mayank Bansal commented on YARN-2069: - Hi [~wangda], Good point, I missed it. I have updated the patch accordingly; please review. Hi [~sunilg], Previously as well we did not wait for one more cycle before starting preemption: the policy drops the reservation, counts it against resToObtain, and returns the rest of the containers to preempt, and I am following the same pattern. So essentially I drop the reservations, then try to balance the queue against user limits, and then take the remaining containers and send them for preemption. Thanks, Mayank CS queue level preemption should respect user-limits Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, YARN-2069-trunk-3.patch This is different from (even if related to, and likely share code with) YARN-2113. YARN-2113 focuses on making sure that even if queue has its guaranteed capacity, it's individual users are treated in-line with their limits irrespective of when they join in. This JIRA is about respecting user-limits while preempting containers to balance queue capacities. -- This message was sent by Atlassian JIRA (v6.2#6252)
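As a rough illustration of that ordering (not the actual ProportionalCapacityPreemptionPolicy code; dropReservation and preemptFromRunningContainers are hypothetical helpers):
{code}
// Sketch only: reservations are dropped first and counted against the amount
// to claim back; only the remainder is taken from running containers, which
// are then balanced against the targeted user limit for the queue.
Resource remaining = Resources.clone(resToObtain);
for (RMContainer reserved : reservedContainers) {
  if (Resources.lessThanOrEqual(rc, clusterResource, remaining, Resources.none())) {
    break;   // nothing left to claim back
  }
  dropReservation(reserved);                          // hypothetical helper
  Resources.subtractFrom(remaining, reserved.getReservedResource());
}
preemptFromRunningContainers(remaining);              // hypothetical helper
{code}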
[jira] [Updated] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2069: Attachment: YARN-2069-trunk-4.patch CS queue level preemption should respect user-limits Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch This is different from (even if related to, and likely share code with) YARN-2113. YARN-2113 focuses on making sure that even if queue has its guaranteed capacity, it's individual users are treated in-line with their limits irrespective of when they join in. This JIRA is about respecting user-limits while preempting containers to balance queue capacities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2069) Add cross-user preemption within CapacityScheduler's leaf-queue
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053959#comment-14053959 ] Mayank Bansal commented on YARN-2069: - Hi [~wangda], Thanks for the review. I updated the patch, please take a look. Let me answer your questions.
bq. In ProportionalCapacityPreemptionPolicy,
bq. 1) balanceUserLimitsinQueueForPreemption()
bq. 1.1, I think there's a bug when multiple applications under a same user (say Jim) in a queue, and usage of Jim is over user-limit. Any of Jim's applications will be tried to be preempted (total-resource-used-by-Jim - user-limit). We should remember resourcesToClaimBackFromUser and initialRes for each user (not reset them when handling each application) And it's better to add test to make sure this behavior is correct.
We need to maintain the reverse order of application submission, which can only be done by iterating through the applications, because we want to preempt the last-submitted applications first.
bq. 1.2, Some debug logging should be removed like
Done
bq. 1.3, This check should be unnecessary
Done
bq. 2) preemptFrom
bq. I noticed this method will be called multiple times for a same application within a editSchedule() call.
bq. The reservedContainers will be calculated multiple times.
bq. An alternative way to do this is to cache
This method is effectively executed only once per application: we remove all reservations up front, and for apps whose reservations have already been removed the later calls are a no-op.
bq. In LeafQueue,
bq. 1) I think it's better to remember user limit, no need to compute it every time, add a method like getUserLimit() to leafQueue should be better.
That value is not static; it changes every time based on cluster utilization, which is why I calculate it each time.
bq. 1) Should we preempt containers equally from users when there're multiple users beyond user-limit in a queue?
No; it should be based on who is over the user limit and who submitted last. That is not strictly fair, but we want to preempt the last-submitted jobs first.
bq. 2) Should we preempt containers equally from applications in a same user? (Heap-like data structure maybe helpful to solve 1/2)
No, for the same reason as above.
bq. 3) Should user-limit preemption be configurable?
I think just configuring preemption itself is enough. Thoughts?
Thanks, Mayank Add cross-user preemption within CapacityScheduler's leaf-queue --- Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, YARN-2069-trunk-3.patch Preemption today only works across queues and moves around resources across queues per demand and usage. We should also have user-level preemption within a queue, to balance capacity across users in a predictable manner. -- This message was sent by Atlassian JIRA (v6.2#6252)
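A small sketch of the two ordering rules restated above (last-submitted application first, AM containers only as a last resort); the application collection and the markForPreemption helper are hypothetical, not the patch itself:
{code}
// Sketch only: walk the queue's applications in reverse submission order and,
// within an application, preempt task containers before touching its AM.
List<FiCaSchedulerApp> apps = new ArrayList<>(queueApplications);
Collections.reverse(apps);                  // last-submitted application first
for (FiCaSchedulerApp app : apps) {
  for (RMContainer c : app.getLiveContainers()) {
    if (c.isAMContainer()) {
      continue;                             // AM is preempted only as a last resort
    }
    markForPreemption(c);                   // hypothetical helper
  }
}
{code}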
[jira] [Updated] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue
[ https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2113: Summary: Add cross-user preemption within CapacityScheduler's leaf-queue (was: CS queue level preemption should respect user-limits) Add cross-user preemption within CapacityScheduler's leaf-queue --- Key: YARN-2113 URL: https://issues.apache.org/jira/browse/YARN-2113 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Fix For: 2.5.0 This is different from (even if related to, and likely share code with) YARN-2069. YARN-2069 focuses on making sure that even if queue has its guaranteed capacity, it's individual users are treated in-line with their limits irrespective of when they join in. This JIRA is about respecting user-limits while preempting containers to balance queue capacities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2069: Summary: CS queue level preemption should respect user-limits (was: Add cross-user preemption within CapacityScheduler's leaf-queue) CS queue level preemption should respect user-limits Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, YARN-2069-trunk-3.patch Preemption today only works across queues and moves around resources across queues per demand and usage. We should also have user-level preemption within a queue, to balance capacity across users in a predictable manner. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue
[ https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2113: Description: Preemption today only works across queues and moves around resources across queues per demand and usage. We should also have user-level preemption within a queue, to balance capacity across users in a predictable manner. (was: This is different from (even if related to, and likely share code with) YARN-2069. YARN-2069 focuses on making sure that even if queue has its guaranteed capacity, it's individual users are treated in-line with their limits irrespective of when they join in. This JIRA is about respecting user-limits while preempting containers to balance queue capacities.) Add cross-user preemption within CapacityScheduler's leaf-queue --- Key: YARN-2113 URL: https://issues.apache.org/jira/browse/YARN-2113 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Fix For: 2.5.0 Preemption today only works across queues and moves around resources across queues per demand and usage. We should also have user-level preemption within a queue, to balance capacity across users in a predictable manner. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2069: Description: This is different from (even if related to, and likely share code with) YARN-2113. YARN-2113 focuses on making sure that even if queue has its guaranteed capacity, it's individual users are treated in-line with their limits irrespective of when they join in. This JIRA is about respecting user-limits while preempting containers to balance queue capacities. was:Preemption today only works across queues and moves around resources across queues per demand and usage. We should also have user-level preemption within a queue, to balance capacity across users in a predictable manner. CS queue level preemption should respect user-limits Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, YARN-2069-trunk-3.patch This is different from (even if related to, and likely share code with) YARN-2113. YARN-2113 focuses on making sure that even if queue has its guaranteed capacity, it's individual users are treated in-line with their limits irrespective of when they join in. This JIRA is about respecting user-limits while preempting containers to balance queue capacities. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2069) Add cross-user preemption within CapacityScheduler's leaf-queue
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2069: Attachment: YARN-2069-trunk-3.patch Rebasing and Updating the patch. Thanks, Mayank Add cross-user preemption within CapacityScheduler's leaf-queue --- Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, YARN-2069-trunk-3.patch Preemption today only works across queues and moves around resources across queues per demand and usage. We should also have user-level preemption within a queue, to balance capacity across users in a predictable manner. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049529#comment-14049529 ] Mayank Bansal commented on YARN-2022: - + 1 committing Thanks [~sunilg] for the patch. Thanks [~vinodkv] and [~wangda] for the reviews. Thanks, Mayank Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.10.patch, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048308#comment-14048308 ] Mayank Bansal commented on YARN-2022: - Thanks [~sunilg] for the patch.
{code}
public void setAMContainer(boolean isAMContainer) {
  this.isAMContainer = isAMContainer;
}
{code}
There should be a write lock around it as well. Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
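A minimal sketch of what that suggestion could look like, assuming the same writeLock field that RMContainerImpl's other mutators use (illustrative, not the committed change):
{code}
// Sketch only: guard the mutable flag with the write lock so readers of
// isAMContainer() observe a consistent value.
public void setAMContainer(boolean isAMContainer) {
  try {
    writeLock.lock();
    this.isAMContainer = isAMContainer;
  } finally {
    writeLock.unlock();
  }
}
{code}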
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045386#comment-14045386 ] Mayank Bansal commented on YARN-2022: - You are using getAbsoluteMaximumCapacity() in your patch while calculating the AM resources, which seems wrong to me. I think you should be using getAbsoluteCapacity(), which is the configured capacity of the queue, not the max capacity of the queue. Thanks, Mayank Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, YARN-2022.7.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14044217#comment-14044217 ] Mayank Bansal commented on YARN-2022: - Hi [~sunilg], If we don't use getAbsoluteCapacity() there is a possibility that the queue ends up running only AMs. Say the queue has 10% configured capacity, a max capacity of 100%, and a 10% AM percentage; with your approach 10 AMs could run in this queue, and if the cluster is fully utilized then only AMs would be running in this queue. Makes sense? Thanks, Mayank Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, YARN-2022.7.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
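To spell out the arithmetic behind this concern, here is a tiny self-contained example with assumed numbers (the 160 GB cluster size is illustrative; the getter names from the discussion are only echoed in comments):
{code}
// Sketch only: compare the AM resource headroom computed from the queue's
// max capacity versus its configured capacity. For a 10% queue on a 160 GB
// cluster, sizing by max capacity (100%) gives as much AM headroom as the
// queue's whole guaranteed share, so the queue can fill up with AMs alone.
public class AmLimitExample {
  public static void main(String[] args) {
    int clusterMB = 160 * 1024;
    float absCapacity = 0.10f;      // getAbsoluteCapacity()
    float absMaxCapacity = 1.0f;    // getAbsoluteMaximumCapacity()
    float amFraction = 0.10f;       // maximum AM resource percent

    int amLimitByMax = (int) (clusterMB * absMaxCapacity * amFraction); // 16384 MB
    int amLimitByCap = (int) (clusterMB * absCapacity * amFraction);    //  1638 MB
    System.out.println("AM limit via max capacity:        " + amLimitByMax + " MB");
    System.out.println("AM limit via configured capacity: " + amLimitByCap + " MB");
  }
}
{code}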
[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI
[ https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14044218#comment-14044218 ] Mayank Bansal commented on YARN-2181: - If we are adding this information to the web UI, then we should add it to the CLI and REST APIs as well. It would be inconsistent to add this info only to the web UI without changing the CLI/REST. Thanks, Mayank Add preemption info to RM Web UI Key: YARN-2181 URL: https://issues.apache.org/jira/browse/YARN-2181 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, application page.png, queue page.png We need add preemption info to RM web page to make administrator/user get more understanding about preemption happened on app/queue, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2069) Add cross-user preemption within CapacityScheduler's leaf-queue
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2069: Attachment: YARN-2069-trunk-2.patch Thanks for the review. I don't have an easy way right now to separate YARN-2022 from this patch and run it through Jenkins, since I am changing the same code. I will rebase this patch once YARN-2022 is committed. Thanks, Mayank Add cross-user preemption within CapacityScheduler's leaf-queue --- Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch Preemption today only works across queues and moves around resources across queues per demand and usage. We should also have user-level preemption within a queue, to balance capacity across users in a predictable manner. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041435#comment-14041435 ] Mayank Bansal commented on YARN-2022: - Hi [~vinodkv], Is it OK with you if we commit this patch, given the concerns you raised before? I think we still need to avoid killing AMs, even once we have a patch that keeps applications alive when their AM gets killed. Please suggest. Thanks, Mayank Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, YARN-2022.7.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2069) Add cross-user preemption within CapacityScheduler's leaf-queue
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal reassigned YARN-2069: --- Assignee: Mayank Bansal (was: Vinod Kumar Vavilapalli) Add cross-user preemption within CapacityScheduler's leaf-queue --- Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Preemption today only works across queues and moves around resources across queues per demand and usage. We should also have user-level preemption within a queue, to balance capacity across users in a predictable manner. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2069) Add cross-user preemption within CapacityScheduler's leaf-queue
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041617#comment-14041617 ] Mayank Bansal commented on YARN-2069: - Taking it over. Thanks, Mayank Add cross-user preemption within CapacityScheduler's leaf-queue --- Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch Preemption today only works across queues and moves around resources across queues per demand and usage. We should also have user-level preemption within a queue, to balance capacity across users in a predictable manner. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2069) Add cross-user preemption within CapacityScheduler's leaf-queue
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2069: Attachment: YARN-2069-trunk-1.patch Attaching patch Thanks, Mayank Add cross-user preemption within CapacityScheduler's leaf-queue --- Key: YARN-2069 URL: https://issues.apache.org/jira/browse/YARN-2069 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2069-trunk-1.patch Preemption today only works across queues and moves around resources across queues per demand and usage. We should also have user-level preemption within a queue, to balance capacity across users in a predictable manner. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032918#comment-14032918 ] Mayank Bansal commented on YARN-2022: - Hi [~sunilg] Thanks for the patch. Overall it looks OK; however, I think we need to add a test case for the AM percentage per queue as well. Thanks, Mayank Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030143#comment-14030143 ] Mayank Bansal commented on YARN-2022: - Hi [~vinodkv] What you are saying makes sense and I agree with it; however, I think we still need this patch, as it will ensure AMs are given the least priority for killing. Thoughts? [~sunilg] Thanks for the patch. Here are some high-level comments. {code} + public static final String SKIP_AM_CONTAINER_FROM_PREEMPTION = yarn.resourcemanager.monitor.capacity.preemption.skip_am_container; {code} Please run the formatter; this doesn't seem to be the standard line length. {code} +skipAMContainer = config.getBoolean(SKIP_AM_CONTAINER_FROM_PREEMPTION, +false); {code} By default it should be true, as we always want the AM to have the least priority. Did you run the test on the cluster? Thanks, Mayank Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
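For illustration only, here is a minimal sketch (not taken from the attached patch) of reading that flag with a default of true via the standard Hadoop Configuration API; the constant mirrors the name quoted above, while the enclosing class is hypothetical:
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: read the skip-AM flag with a default of true, as suggested in the
// review comment above. The constant mirrors the name quoted from the patch; the
// enclosing class exists purely for illustration.
public class SkipAmPreemptionConfig {
  public static final String SKIP_AM_CONTAINER_FROM_PREEMPTION =
      "yarn.resourcemanager.monitor.capacity.preemption.skip_am_container";

  private final boolean skipAMContainer;

  public SkipAmPreemptionConfig(Configuration config) {
    // Defaulting to true keeps AM containers as the last preemption candidates
    // unless an operator explicitly opts out.
    this.skipAMContainer = config.getBoolean(SKIP_AM_CONTAINER_FROM_PREEMPTION, true);
  }

  public boolean shouldSkipAmContainers() {
    return skipAMContainer;
  }
}
{code}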
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028174#comment-14028174 ] Mayank Bansal commented on YARN-2022: - Hi [~sunilg] Thanks for the patch. There is a small addition we need to make to the approach: we need to take the following parameters into account: yarn.scheduler.capacity.maximum-am-resource-percent / yarn.scheduler.capacity.queue-path.maximum-am-resource-percent. If the user has set yarn.scheduler.capacity.queue-path.maximum-am-resource-percent on the queue, then we cannot preempt an AM even if we haven't reached the full resource need from the queue. If the user didn't set that queue-level setting, then we need to check that we are not violating the yarn.scheduler.capacity.maximum-am-resource-percent constraint as well. If these two constraints are not violated and we still have some AMs that we need to kill, then yes, we can go with the approach you put in your patch. Thanks, Mayank Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
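To make the ordering of those checks concrete, here is a rough, hedged sketch; the helper methods and the way the AM share is computed are assumptions for illustration only, not the CapacityScheduler API or the patch:
{code}
// Hypothetical sketch of the check ordering described in the comment above.
// None of these helpers exist in the CapacityScheduler API; they only
// illustrate the order in which the two AM-resource-percent constraints
// would be consulted before AM containers become preemption candidates.
boolean amPreemptionAllowed(String queuePath) {
  Float queueLimit = perQueueMaxAmPercentOrNull(queuePath); // <queue-path>.maximum-am-resource-percent, if configured
  if (queueLimit != null) {
    // A per-queue AM limit is configured: respect it and leave AMs alone.
    return false;
  }
  // Otherwise, only consider AMs once the cluster-wide constraint
  // (yarn.scheduler.capacity.maximum-am-resource-percent) would not be violated
  // and non-AM containers alone cannot satisfy the resource need.
  return !violatesClusterMaxAmPercent(queuePath) && nonAmContainersInsufficient(queuePath);
}
{code}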
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026789#comment-14026789 ] Mayank Bansal commented on YARN-2022: - Hi [~sunilg] Thanks for the update. We are in a rush to push the release; is there any possibility you can put up this simple patch today? If not, do you mind if I put up the patch? Thanks, Mayank Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026795#comment-14026795 ] Mayank Bansal commented on YARN-2022: - for user limits there is already a jira YARN-2113 and I think [~wangda] is working on it. Thanks, Mayank Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025581#comment-14025581 ] Mayank Bansal commented on YARN-2022: - Hi [~sunilg] and [~curino] [~vinodkv] and I were discussing making this simple: if we just don't kill AM containers, that would be easier and would work well. I think many frameworks (MR, Tez, etc.) depend on the last AM attempt. The only problem is if the queue is running only AMs; I think that can be avoided by the AM percentage per queue. Thoughts? Thanks, Mayank Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006459#comment-14006459 ] Mayank Bansal commented on YARN-2074: - Thanks [~jianhe] for the patch. Overall looks good. some nits {code} maxAppAttempts = attempts.size() {code} Can we use this? {code} maxAppAttempts == getAttemptFailureCount() {code} {code} public boolean isPreempted() { return getDiagnostics().contains(SchedulerUtils.PREEMPTED_CONTAINER); } {code} I think we need to compare the exit status (-102) instead of relying on string message. Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-2074.1.patch, YARN-2074.2.patch One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
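As a minimal sketch of the suggested change, assuming the finished container's status object is available at that point (the standard ContainerExitStatus.PREEMPTED constant is -102); how the patch actually obtains the status is not shown here:
{code}
import org.apache.hadoop.yarn.api.records.ContainerExitStatus;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

// Sketch of the suggested check: compare the exit status against the
// PREEMPTED constant (-102) instead of matching on the diagnostics string.
public final class PreemptionCheck {
  private PreemptionCheck() {}

  public static boolean isPreempted(ContainerStatus status) {
    return status.getExitStatus() == ContainerExitStatus.PREEMPTED;
  }
}
{code}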
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006548#comment-14006548 ] Mayank Bansal commented on YARN-1408: - I agree with [~jianhe] and [~devaraj.k]. We should be able to preempt the container in the ALLOCATED state. bq. Today the resource request is decremented when container is allocated. we may change it to decrement the resource request only when the container is pulled by the AM ? I am not sure that's the right thing, as you don't want to run into other race conditions where a container has been allocated but the capacity has been given to some other AM. Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins -- Key: YARN-1408 URL: https://issues.apache.org/jira/browse/YARN-1408 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.2.0 Reporter: Sunil G Assignee: Sunil G Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.patch Capacity preemption is enabled as follows. * yarn.resourcemanager.scheduler.monitor.enable= true , * yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy Queue = a,b Capacity of Queue A = 80% Capacity of Queue B = 20% Step 1: Assign a big jobA on queue a which uses full cluster capacity Step 2: Submitted a jobB to queue b which would use less than 20% of cluster capacity JobA task which uses queue b capacity has been preempted and killed. This caused below problem: 1. New Container has got allocated for jobA in Queue A as per node update from an NM. 2. This container has been preempted immediately as per preemption. Here ACQUIRED at KILLED Invalid State exception came when the next AM heartbeat reached RM. ERROR org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ACQUIRED at KILLED This also caused the Task to go for a timeout for 30minutes as this Container was already killed by preemption. attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14006735#comment-14006735 ] Mayank Bansal commented on YARN-2074: - +1 LGTM Thanks, Mayank Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2055) Preemption: Jobs are failing due to AMs are getting launched and killed multiple times
[ https://issues.apache.org/jira/browse/YARN-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998941#comment-13998941 ] Mayank Bansal commented on YARN-2055: - YARN-2022 is about avoiding killing the AM; this issue is more about how we launch the AM after preemption, as there would be situations where you get some capacity for one heartbeat, then that capacity is reclaimed by the other queue, the AM is killed again, and the job fails. Based on the comments on YARN-2022, I don't see that this case has been handled there. Thanks, Mayank Preemption: Jobs are failing due to AMs are getting launched and killed multiple times -- Key: YARN-2055 URL: https://issues.apache.org/jira/browse/YARN-2055 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Mayank Bansal If Queue A does not have enough capacity to run AM, then AM will borrow capacity from queue B to run AM in that case AM will be killed if queue B will reclaim its capacity and again AM will be launched and killed again, in that case job will be failed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2055) Preemtion: Jobs are failing due to AMs are getting launched and killed multiple times
[ https://issues.apache.org/jira/browse/YARN-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2055: Assignee: (was: Sunil G) Preemtion: Jobs are failing due to AMs are getting launched and killed multiple times - Key: YARN-2055 URL: https://issues.apache.org/jira/browse/YARN-2055 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Mayank Bansal Fix For: 2.1.0-beta Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2055) Preemtion: Jobs are failing due to AMs are getting launched and killed multiple times
Mayank Bansal created YARN-2055: --- Summary: Preemtion: Jobs are failing due to AMs are getting launched and killed multiple times Key: YARN-2055 URL: https://issues.apache.org/jira/browse/YARN-2055 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Mayank Bansal Assignee: Sunil G Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2056) Disable preemption at Queue level
[ https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2056: Description: We need to be able to disable preemption at individual queue level (was: If Queue A does not have enough capacity to run AM, then AM will borrow capacity from queue B to run AM in that case AM will be killed if queue B will reclaim its capacity and again AM will be launched and killed again, in that case job will be failed.) Disable preemption at Queue level - Key: YARN-2056 URL: https://issues.apache.org/jira/browse/YARN-2056 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Mayank Bansal Fix For: 2.1.0-beta We need to be able to disable preemption at individual queue level -- This message was sent by Atlassian JIRA (v6.2#6252)
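As a rough illustration of the requested per-queue switch, a sketch like the following could read such a flag; the property name yarn.scheduler.capacity.<queue-path>.disable_preemption is an assumption here, used only to show the shape of the feature, not a committed configuration key:
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: reading a hypothetical per-queue preemption switch, e.g.
//   yarn.scheduler.capacity.root.queueA.disable_preemption = true
// The property name is an assumption used to illustrate the feature request.
public final class QueuePreemptionSwitch {
  private static final String PREFIX = "yarn.scheduler.capacity.";
  private static final String SUFFIX = ".disable_preemption";

  private QueuePreemptionSwitch() {}

  public static boolean isPreemptionDisabled(Configuration conf, String queuePath) {
    // Preemption stays enabled unless the queue explicitly opts out.
    return conf.getBoolean(PREFIX + queuePath + SUFFIX, false);
  }
}
{code}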
[jira] [Updated] (YARN-2055) Preemtion: Jobs are failing due to AMs are getting launched and killed multiple times
[ https://issues.apache.org/jira/browse/YARN-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2055: Description: If Queue A does not have enough capacity to run AM, then AM will borrow capacity from queue B to run AM in that case AM will be killed if queue B will reclaim its capacity and again AM will be launched and killed again, in that case job will be failed. (was: Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Jobs J3 will get killed including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when cluster is free, maps can be allocated to these Jobs.) Preemtion: Jobs are failing due to AMs are getting launched and killed multiple times - Key: YARN-2055 URL: https://issues.apache.org/jira/browse/YARN-2055 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Mayank Bansal Fix For: 2.1.0-beta If Queue A does not have enough capacity to run AM, then AM will borrow capacity from queue B to run AM in that case AM will be killed if queue B will reclaim its capacity and again AM will be launched and killed again, in that case job will be failed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2032) Implement a scalable, available TimelineStore using HBase
[ https://issues.apache.org/jira/browse/YARN-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-2032: Attachment: YARN-2032-branch-2-1.patch Updating patch for branch-2 Thanks, Mayank Implement a scalable, available TimelineStore using HBase - Key: YARN-2032 URL: https://issues.apache.org/jira/browse/YARN-2032 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-2032-branch-2-1.patch As discussed on YARN-1530, we should pursue implementing a scalable, available Timeline store using HBase. One goal is to reuse most of the code from the levelDB Based store - YARN-1635. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2032) Implement a scalable, available TimelineStore using HBase
[ https://issues.apache.org/jira/browse/YARN-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal reassigned YARN-2032: --- Assignee: Mayank Bansal (was: Vinod Kumar Vavilapalli) Implement a scalable, available TimelineStore using HBase - Key: YARN-2032 URL: https://issues.apache.org/jira/browse/YARN-2032 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal As discussed on YARN-1530, we should pursue implementing a scalable, available Timeline store using HBase. One goal is to reuse most of the code from the levelDB Based store - YARN-1635. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2032) Implement a scalable, available TimelineStore using HBase
[ https://issues.apache.org/jira/browse/YARN-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13993000#comment-13993000 ] Mayank Bansal commented on YARN-2032: - Taking it over, as I am already working on it. Thanks, Mayank Implement a scalable, available TimelineStore using HBase - Key: YARN-2032 URL: https://issues.apache.org/jira/browse/YARN-2032 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli As discussed on YARN-1530, we should pursue implementing a scalable, available Timeline store using HBase. One goal is to reuse most of the code from the levelDB Based store - YARN-1635. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-304) RM Tracking Links for purged applications needs a long-term solution
[ https://issues.apache.org/jira/browse/YARN-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-304: --- Attachment: YARN-304-1.patch Attaching patch Thanks, Mayank RM Tracking Links for purged applications needs a long-term solution Key: YARN-304 URL: https://issues.apache.org/jira/browse/YARN-304 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.5 Reporter: Derek Dagit Assignee: Mayank Bansal Attachments: YARN-304-1.patch This JIRA is intended to track a proper long-term fix for the issue described in YARN-285. The following is from the original description: As applications complete, the RM tracks their IDs in a completed list. This list is routinely truncated to limit the total number of application remembered by the RM. When a user clicks the History for a job, either the browser is redirected to the application's tracking link obtained from the stored application instance. But when the application has been purged from the RM, an error is displayed. In very busy clusters the rate at which applications complete can cause applications to be purged from the RM's internal list within hours, which breaks the proxy URLs users have saved for their jobs. We would like the RM to provide valid tracking links persist so that users are not frustrated by broken links. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs
[ https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13946731#comment-13946731 ] Mayank Bansal commented on YARN-1809: - Thanks [~zjshen] for the patch bq. Yes, I think it should, but I prefer to put it in ApplicationBaseProtocol when ApplicationHistoryClientService has implemented DT related methods. The history protocol already has these methods, so we don't need to wait, as they have a dummy implementation for that. bq. ApplicationBaseProtocol and ApplicationContext are completely different things. ApplicationBaseProtocol is the PRC interface. Previously, I thought we should have a uniformed ApplicationContext: on the RM side, it wraps RMContext; while on the AHS side, it wraps ApplicationHistory. However, inspired by RMWebServices#getApps, I think the RPC interface is a better place to uniform the way of retrieving app info, so I created ApplicationBaseProtocol. And ApplicationContext is no longer useful. ApplicationBaseProtocol would be the base protocol for the client and history services; however, the application context is something different. The motivation for the context is to wrap RM and AHS application data, so I think having the context makes sense, as the protocol has a totally different motivation and will have different methods as well once we add the delegation methods to it. bq. I understand the big patch is desperate for review, but I've to do that because the patch is aiming to refactor the code to avoid duplicate web-UI code for RM and for AHS. The two webUI should share the common code path, and then display similarly. I am fine with this if this is something you want to do. {code} + * <p> + * The protocol between clients and the <code>ResourceManager</code> or + * <code>ApplicationHistoryServer</code> to get information on applications, + * application attempts and containers. + * </p> {code} This should say that it is a base protocol for the application client and history. Shouldn't we add @Idempotent to getallapplications as well? If we add the application context back, then we need to rebase the patch accordingly. Synchronize RM and Generic History Service Web-UIs -- Key: YARN-1809 URL: https://issues.apache.org/jira/browse/YARN-1809 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1809.1.patch, YARN-1809.2.patch, YARN-1809.3.patch, YARN-1809.4.patch, YARN-1809.5.patch, YARN-1809.5.patch, YARN-1809.6.patch After YARN-953, the web-UI of generic history service provides more information than that of RM, the details about app attempt and container. It's good to provide similar web-UIs, but retrieve the data from separate source, i.e., RM cache and history store respectively. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-304) RM Tracking Links for purged applications needs a long-term solution
[ https://issues.apache.org/jira/browse/YARN-304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal reassigned YARN-304: -- Assignee: Mayank Bansal (was: Zhijie Shen) RM Tracking Links for purged applications needs a long-term solution Key: YARN-304 URL: https://issues.apache.org/jira/browse/YARN-304 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.5 Reporter: Derek Dagit Assignee: Mayank Bansal This JIRA is intended to track a proper long-term fix for the issue described in YARN-285. The following is from the original description: As applications complete, the RM tracks their IDs in a completed list. This list is routinely truncated to limit the total number of application remembered by the RM. When a user clicks the History for a job, either the browser is redirected to the application's tracking link obtained from the stored application instance. But when the application has been purged from the RM, an error is displayed. In very busy clusters the rate at which applications complete can cause applications to be purged from the RM's internal list within hours, which breaks the proxy URLs users have saved for their jobs. We would like the RM to provide valid tracking links persist so that users are not frustrated by broken links. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-304) RM Tracking Links for purged applications needs a long-term solution
[ https://issues.apache.org/jira/browse/YARN-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943545#comment-13943545 ] Mayank Bansal commented on YARN-304: Taking it over RM Tracking Links for purged applications needs a long-term solution Key: YARN-304 URL: https://issues.apache.org/jira/browse/YARN-304 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.5 Reporter: Derek Dagit Assignee: Mayank Bansal This JIRA is intended to track a proper long-term fix for the issue described in YARN-285. The following is from the original description: As applications complete, the RM tracks their IDs in a completed list. This list is routinely truncated to limit the total number of application remembered by the RM. When a user clicks the History for a job, either the browser is redirected to the application's tracking link obtained from the stored application instance. But when the application has been purged from the RM, an error is displayed. In very busy clusters the rate at which applications complete can cause applications to be purged from the RM's internal list within hours, which breaks the proxy URLs users have saved for their jobs. We would like the RM to provide valid tracking links persist so that users are not frustrated by broken links. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs
[ https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940879#comment-13940879 ] Mayank Bansal commented on YARN-1809: - Thanks [~zjshen] for the patch. Here are some comments: 1. Change the name from ApplicationInformationProtocol to something like ApplicationBaseProtocol. 2. Why can't we have the delegation-token-related APIs in the base protocol? 3. ApplicationHistoryClientService - why are we removing the protocol handler? I think we should keep it as it was. 4. I am not sure why we removed the ApplicationContext; I think ApplicationContext should be retained. Wouldn't it be good to have the following structure? bq. ApplicationContext derived from ApplicationBaseProtocol Thoughts? 5. There is a lot of refactoring in the patch, which is good, but we could have separated it into two JIRAs, which would keep the changes focused on the specific issue. Thoughts? Synchronize RM and Generic History Service Web-UIs -- Key: YARN-1809 URL: https://issues.apache.org/jira/browse/YARN-1809 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1809.1.patch, YARN-1809.2.patch, YARN-1809.3.patch, YARN-1809.4.patch, YARN-1809.5.patch, YARN-1809.5.patch After YARN-953, the web-UI of generic history service provides more information than that of RM, the details about app attempt and container. It's good to provide similar web-UIs, but retrieve the data from separate source, i.e., RM cache and history store respectively. -- This message was sent by Atlassian JIRA (v6.2#6252)
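For point 4 above, a hedged sketch of the suggested shape might look like the following; the getApplicationReport signature follows the public YARN protocol records, while the ApplicationContext interface itself is an assumption drawn from this comment, not from the attached patch:
{code}
import java.io.IOException;
import org.apache.hadoop.yarn.api.protocolrecords.GetApplicationReportRequest;
import org.apache.hadoop.yarn.api.protocolrecords.GetApplicationReportResponse;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Sketch of the structure suggested in point 4: keep a shared base protocol for
// read-only application lookups, and retain ApplicationContext as a derived
// abstraction backed by RMContext on the RM side and the history store on the AHS side.
interface ApplicationBaseProtocol {
  GetApplicationReportResponse getApplicationReport(GetApplicationReportRequest request)
      throws YarnException, IOException;
  // ... other app/attempt/container lookups shared by RM and AHS
}

interface ApplicationContext extends ApplicationBaseProtocol {
  // RM implementation would wrap RMContext; AHS implementation would wrap ApplicationHistory.
}
{code}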
[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs
[ https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941234#comment-13941234 ] Mayank Bansal commented on YARN-1809: - I have tested this patch locally. It works OK with running apps; however, as soon as an app is finished, the URLs start giving errors, whereas they should be redirected to the AHS URLs. Thoughts? Thanks, Mayank Synchronize RM and Generic History Service Web-UIs -- Key: YARN-1809 URL: https://issues.apache.org/jira/browse/YARN-1809 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1809.1.patch, YARN-1809.2.patch, YARN-1809.3.patch, YARN-1809.4.patch, YARN-1809.5.patch, YARN-1809.5.patch After YARN-953, the web-UI of generic history service provides more information than that of RM, the details about app attempt and container. It's good to provide similar web-UIs, but retrieve the data from separate source, i.e., RM cache and history store respectively. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1690) Sending timeline entities+events from Distributed shell
[ https://issues.apache.org/jira/browse/YARN-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13939598#comment-13939598 ] Mayank Bansal commented on YARN-1690: - Thanks [~zjshen] for the review bq. 1. Call it DSEvent? Done bq. 2. Chang it to Timeline Client? Done bq. 3. Typo on CLient Done bq. config is the member field of ApplicationMaster Done bq. 5. Please merge the following duplicate exception handling as well Done bq. 6. Again, please do not mention AHS here Done bq. 7. Please change publishContainerStartEvent, publishContainerEndEvent, publishApplicationAttemptEvent to static, which don't need to be per instance. Done bq. 8. Please apply for the following to all the added error logs. Done bq. 9. Please don't limit the output to 1. According to the args for this DS job, it should be 1 DS_APP_ATTEMPT entities and 2 DS_CONTAINER entities, which has 2 events each? And assert the number of returned entities/events? Done Sending timeline entities+events from Distributed shell Key: YARN-1690 URL: https://issues.apache.org/jira/browse/YARN-1690 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1690-1.patch, YARN-1690-2.patch, YARN-1690-3.patch, YARN-1690-4.patch, YARN-1690-5.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1690) Sending timeline entities+events from Distributed shell
[ https://issues.apache.org/jira/browse/YARN-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-1690: Attachment: YARN-1690-6.patch Attaching patch Thanks, Mayank Sending timeline entities+events from Distributed shell Key: YARN-1690 URL: https://issues.apache.org/jira/browse/YARN-1690 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1690-1.patch, YARN-1690-2.patch, YARN-1690-3.patch, YARN-1690-4.patch, YARN-1690-5.patch, YARN-1690-6.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1690) Sending timeline entities+events from Distributed shell
[ https://issues.apache.org/jira/browse/YARN-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-1690: Attachment: YARN-1690-7.patch Attaching patch Sending timeline entities+events from Distributed shell Key: YARN-1690 URL: https://issues.apache.org/jira/browse/YARN-1690 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1690-1.patch, YARN-1690-2.patch, YARN-1690-3.patch, YARN-1690-4.patch, YARN-1690-5.patch, YARN-1690-6.patch, YARN-1690-7.patch -- This message was sent by Atlassian JIRA (v6.2#6252)