[jira] [Commented] (YARN-10297) TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails intermittently

2020-06-11 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133916#comment-17133916
 ] 

Manikandan R commented on YARN-10297:
-

Thanks [~Jim_Brennan]. LGTM. Please fix whitespace issues.

> TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails 
> intermittently
> ---
>
> Key: YARN-10297
> URL: https://issues.apache.org/jira/browse/YARN-10297
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-10297.001.patch
>
>
> After YARN-6492, testFairSchedulerContinuousSchedulingInitTime fails 
> intermittently when running {{mvn test -Dtest=TestContinuousScheduling}}
> {noformat}
> [INFO] Running org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
> [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.682 s <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
> [ERROR] testFairSchedulerContinuousSchedulingInitTime(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling)  Time elapsed: 0.194 s  <<< ERROR!
> org.apache.hadoop.metrics2.MetricsException: Metrics source PartitionQueueMetrics,partition= already exists!
>   at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>   at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>   at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>   at org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.getPartitionMetrics(QueueMetrics.java:362)
>   at org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.incrPendingResources(QueueMetrics.java:601)
>   at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updatePendingResources(AppSchedulingInfo.java:388)
>   at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:320)
>   at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:347)
>   at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateResourceRequests(AppSchedulingInfo.java:183)
>   at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateResourceRequests(SchedulerApplicationAttempt.java:456)
>   at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:898)
>   at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling.testFairSchedulerContinuousSchedulingInitTime(TestContinuousScheduling.java:375)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> {noformat}
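
The exception in the trace above is raised when a metrics source is registered under a name that is already taken. As a toy model only (this is not Hadoop's DefaultMetricsSystem; the class and method names below are hypothetical), a registry with that uniqueness rule behaves as sketched here. Whether leftover state between test methods is what triggers the intermittent failure is an assumption, not something the report states.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical registry that rejects duplicate source names; illustration only. */
final class SourceNameRegistrySketch {
  private final Set<String> names = new HashSet<>();

  /** Registers a name, failing if the same name was registered earlier. */
  void register(String name) {
    if (!names.add(name)) {
      throw new IllegalStateException("Metrics source " + name + " already exists!");
    }
  }

  /** Drops all registered names, as a test teardown might do (assumed remedy). */
  void clear() {
    names.clear();
  }

  public static void main(String[] args) {
    SourceNameRegistrySketch registry = new SourceNameRegistrySketch();
    registry.register("PartitionQueueMetrics,partition=");
    try {
      // Second registration under the same name is rejected.
      registry.register("PartitionQueueMetrics,partition=");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
    registry.clear();
    // After clearing, the same name can be registered again.
    registry.register("PartitionQueueMetrics,partition=");
  }
}
```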



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10297) TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails intermittently

2020-06-15 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136293#comment-17136293
 ] 

Manikandan R commented on YARN-10297:
-

[~jhung] Patch LGTM. Can you please take a look and commit?

> TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails 
> intermittently
> ---
>
> Key: YARN-10297
> URL: https://issues.apache.org/jira/browse/YARN-10297
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-10297.001.patch, YARN-10297.002.patch



[jira] [Commented] (YARN-6492) Generate queue metrics for each partition

2019-08-09 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904027#comment-16904027
 ] 

Manikandan R commented on YARN-6492:


Ok, [~eepayne]. Will look into this.

Some observations on .004.patch:

1. Partition info is extracted from both the request and the node, which
causes a problem. For example:

Node N is mapped to Label X (non-exclusive). Queue A is configured with the
ANY node label. App A requested resources from Queue A, and its containers
ended up running on Node N. During the AbstractCSQueue#allocateResource call,
the node partition (from SchedulerNode) is used for the calculation. Let's say
an allocate call has been fired for 3 containers of 1 GB each; then

a. PartitionDefault * queue A -> pending mb is 3 GB
b. PartitionX * queue A -> pending mb is -3 GB

is the outcome. #a is derived because the app request was fired without any
label specification. After allocation, pending resources are decreased, but
that step uses the node partition info, hence #b.

Given this situation, we will need to put some thought into deriving the
metrics correctly.

2. Though the intent of this jira is to add Partition Queue Metrics, we would
like to retain the existing Queue Metrics for backward compatibility (as you
can see from the jira's discussion).

With this patch and the YARN-9596 patch, QueueMetrics (for queues) would be
overridden either with some specific partition's values or with the default
partition's values. It could be vice versa as well. For example, after a queue
(say queue A) has been initialised with some min and max capacity and also
with a node label's min and max capacity, QueueMetrics (availableMB) for
queue A returns values based on the node label's capacity config.

I've been working on a fix for these observations and have attached
.005.WIP.patch. The focus of .005.WIP.patch is to ensure that availableMB and
availableVcores are correct (please refer to observation #2 above). I added
more asserts in {{testQueueMetricsWithLabelsOnDefaultLabelNode}} to ensure the
fix for #2 works properly.

One more thing to note: user metrics for availableMB and availableVcores at
the root queue did not exist before either, and I have retained that
behaviour. User metrics for availableMB and availableVcores are available only
at the child queue level, and also with partitions.

Will focus on #1 in the next patch.
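
The mismatch in observation #1 can be replayed with a toy counter map. This is only a sketch: the class, the partition names and the helper methods are hypothetical and do not mirror the real QueueMetrics API.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy per-partition pending-MB counters for one queue; illustration only. */
final class PendingMismatchSketch {
  // Hypothetical names: the empty string stands in for the default partition.
  static final String DEFAULT_PARTITION = "";
  static final String PARTITION_X = "x";

  private final Map<String, Long> pendingMb = new HashMap<>();

  void incrPending(String partition, long mb) {
    pendingMb.merge(partition, mb, Long::sum);
  }

  void decrPending(String partition, long mb) {
    pendingMb.merge(partition, -mb, Long::sum);
  }

  long pending(String partition) {
    return pendingMb.getOrDefault(partition, 0L);
  }

  /** Replays the scenario: 3 x 1 GB requested without a label, allocated on a node in partition x. */
  static PendingMismatchSketch replay() {
    PendingMismatchSketch queueA = new PendingMismatchSketch();
    for (int i = 0; i < 3; i++) {
      // The request carries no label, so pending grows on the default partition.
      queueA.incrPending(DEFAULT_PARTITION, 1024);
    }
    for (int i = 0; i < 3; i++) {
      // The allocation path uses the node's partition, so pending shrinks on x.
      queueA.decrPending(PARTITION_X, 1024);
    }
    return queueA;
  }

  public static void main(String[] args) {
    PendingMismatchSketch queueA = replay();
    System.out.println("default pending MB = " + queueA.pending(DEFAULT_PARTITION));
    System.out.println("x pending MB = " + queueA.pending(PARTITION_X));
  }
}
```

In this model the default partition is left at 3072 MB pending and partition x at -3072 MB, which matches outcomes #a and #b.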

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, 
> YARN-6492.004.patch, partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.






[jira] [Updated] (YARN-6492) Generate queue metrics for each partition

2019-08-09 Thread Manikandan R (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-6492:
---
Attachment: YARN-6492.005.WIP.patch

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, 
> YARN-6492.004.patch, YARN-6492.005.WIP.patch, partition_metrics.txt



[jira] [Commented] (YARN-6492) Generate queue metrics for each partition

2019-08-16 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909600#comment-16909600
 ] 

Manikandan R commented on YARN-6492:


[~eepayne] The observations mentioned earlier are important ones that came up
during iterative development. I think the whole PartitionQueueMetrics feature
won't be in a usable state without these fixes.

At the same time, I am totally OK with having a separate JIRA for ease of
tracking, assuming that we mark this whole feature as complete only after the
issues tracked in that new JIRA have been fixed.

Regarding the structure:

Yes, we would like to stay in sync with the UI, REST API, etc., as discussed
much earlier in this JIRA.

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, 
> YARN-6492.004.patch, YARN-6492.005.WIP.patch, partition_metrics.txt



[jira] [Commented] (YARN-9756) Create metric that sums total memory/vcores preempted per round

2019-08-17 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909636#comment-16909636
 ] 

Manikandan R commented on YARN-9756:


[~eepayne] I spent some time understanding this.
ProportionalCapacityPreemptionPolicy#preemptOrkillSelectedContainerAfterWait
triggers a pre-emption event for each container, based on the max limit
allowed per round. I think we can sum the memory/vcores of all containers that
are going to be pre-empted in each round and call the appropriate metrics
methods here. Is this correct?

Also, since the metric is going to be per round, and assuming there would be
many rounds, wouldn't it be difficult for users to derive value out of it? Do
you have any JMX output structure in mind?
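
A minimal sketch of the per-round summation asked about above, assuming each selected container exposes its memory and vcores. ContainerResource and sumRound are hypothetical names used for illustration; they are not YARN types or methods.

```java
import java.util.Arrays;
import java.util.List;

/** Sums the resources of the containers selected for pre-emption in one round. */
final class PreemptionRoundSketch {

  /** Hypothetical stand-in for a container's resource; not a YARN type. */
  static final class ContainerResource {
    final long memoryMb;
    final int vcores;

    ContainerResource(long memoryMb, int vcores) {
      this.memoryMb = memoryMb;
      this.vcores = vcores;
    }
  }

  /** Returns {totalMemoryMb, totalVcores} for the containers of a single round. */
  static long[] sumRound(List<ContainerResource> selected) {
    long memoryMb = 0;
    long vcores = 0;
    for (ContainerResource c : selected) {
      memoryMb += c.memoryMb;
      vcores += c.vcores;
    }
    return new long[] {memoryMb, vcores};
  }

  public static void main(String[] args) {
    long[] totals = sumRound(Arrays.asList(
        new ContainerResource(1024, 1),
        new ContainerResource(2048, 2)));
    System.out.println("preempted this round: " + totals[0] + " MB, " + totals[1] + " vcores");
  }
}
```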

> Create metric that sums total memory/vcores preempted per round
> ---
>
> Key: YARN-9756
> URL: https://issues.apache.org/jira/browse/YARN-9756
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
>







[jira] [Updated] (YARN-9756) Create metric that sums total memory/vcores preempted per round

2019-08-20 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9756:
---
Attachment: YARN-9756.WIP.patch

> Create metric that sums total memory/vcores preempted per round
> ---
>
> Key: YARN-9756
> URL: https://issues.apache.org/jira/browse/YARN-9756
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-9756.WIP.patch
>
>







[jira] [Commented] (YARN-9756) Create metric that sums total memory/vcores preempted per round

2019-08-20 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911513#comment-16911513
 ] 

Manikandan R commented on YARN-9756:


{quote}These new metrics will be similar to AggregateMemoryMBSecondsPreempted, 
AggregateVcoreSecondsPreempted, etc. I propose to process the total preempted 
resources in the same way that is done for preempted seconds (memory, vcores, 
etc).
LeafQueue#updateQueuePreemptionMetrics will aggregate the total preempted 
resources just like it does for preempted resource seconds.{quote}
OK, I was a bit confused by "per round". I have attached a quick WIP patch for
your review.
{quote}The challenge I have encountered is making this work for extended 
resources (like gpu, etc.){quote}
Can {{QueueMetricsForCustomResources}} be used to generate this metric, as is
done for other GPU metrics? I have covered this in the patch as well. Please
share your views.

> Create metric that sums total memory/vcores preempted per round
> ---
>
> Key: YARN-9756
> URL: https://issues.apache.org/jira/browse/YARN-9756
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-9756.WIP.patch
>
>







[jira] [Created] (YARN-9767) PartitionQueueMetrics Issues

2019-08-20 Thread Manikandan R (Jira)
Manikandan R created YARN-9767:
--

 Summary: PartitionQueueMetrics Issues
 Key: YARN-9767
 URL: https://issues.apache.org/jira/browse/YARN-9767
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Manikandan R
Assignee: Manikandan R


The intent of the Jira is to capture the issues/observations encountered as 
part of YARN-6492 development separately for ease of tracking.

Observations:

Please refer to this comment:

https://issues.apache.org/jira/browse/YARN-6492?focusedCommentId=16904027&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16904027

1. Partition info is extracted from both the request and the node, which
causes a problem. For example:

Node N is mapped to Label X (non-exclusive). Queue A is configured with the
ANY node label. App A requested resources from Queue A, and its containers
ended up running on Node N. During the AbstractCSQueue#allocateResource call,
the node partition (from SchedulerNode) is used for the calculation. Let's say
an allocate call has been fired for 3 containers of 1 GB each; then

a. PartitionDefault * queue A -> pending mb is 3 GB
b. PartitionX * queue A -> pending mb is -3 GB

is the outcome. #a is derived because the app request was fired without any
label specification. After allocation, pending resources are decreased, but
that step uses the node partition info, hence #b.

Given this situation, we will need to put some thought into deriving the
metrics correctly.

2. Though the intent of this jira is to add Partition Queue Metrics, we would
like to retain the existing Queue Metrics for backward compatibility (as you
can see from the jira's discussion).

With this patch and the YARN-9596 patch, QueueMetrics (for queues) would be
overridden either with some specific partition's values or with the default
partition's values. It could be vice versa as well. For example, after a queue
(say queue A) has been initialised with some min and max capacity and also
with a node label's min and max capacity, QueueMetrics (availableMB) for
queue A returns values based on the node label's capacity config.

I've been working on a fix for these observations and have attached
.005.WIP.patch. The focus of .005.WIP.patch is to ensure that availableMB and
availableVcores are correct (please refer to observation #2 above). I added
more asserts in {{testQueueMetricsWithLabelsOnDefaultLabelNode}} to ensure the
fix for #2 works properly.

One more thing to note: user metrics for availableMB and availableVcores at
the root queue did not exist before either, and I have retained that
behaviour. User metrics for availableMB and availableVcores are available only
at the child queue level, and also with partitions.

 






[jira] [Commented] (YARN-9767) PartitionQueueMetrics Issues

2019-08-20 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911575#comment-16911575
 ] 

Manikandan R commented on YARN-9767:


On observation #1:

After container allocation, pending resources get deducted inside
{{QueueMetrics#allocateResources}} using the node partition rather than the
requested partition info. I think {{RMContainerImpl#getNodeLabelExpression}}
is more appropriate for decreasing pending resources, for the following
reasons:

1. {{RMContainerImpl#getNodeLabelExpression}} is derived from
{{AppPlacementAllocator#getPrimaryRequestedNodePartition}}. The Javadoc of
{{AppPlacementAllocator#getPrimaryRequestedNodePartition}} explains this well
enough.
2. In this case, the actual intent is to run anywhere (ANY, which is nothing
but the "default" partition), but the containers ended up using some
non-exclusive partition. So increasing pending resources on the "default"
partition or the PrimaryRequestedNodePartition (mostly "default", or some
specific partition) and deducting the pending resources in the same way seems
more correct than increasing and decreasing in two different places.

So the fix would be something like

{{AppSchedulingInfo#updateMetrics}}
{code:java}
// Record the allocation on the node's partition; false leaves pending untouched here.
queue.getMetrics().allocateResources(node.getPartition(), user, 1,
    containerAllocated.getContainer().getResource(), false);
// Decrement pending on the partition the request originally asked for.
queue.getMetrics().decrPendingResources(
    containerAllocated.getNodeLabelExpression(), user, 1,
    containerAllocated.getContainer().getResource());
{code}
instead of
{code:java}
queue.getMetrics().allocateResources(node.getPartition(), user, 1,
    containerAllocated.getContainer().getResource(), true);
{code}

Please share your thoughts.
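
The accounting change proposed above can be sketched in isolation. This is a toy model under stated assumptions, not the actual patch: the class and its methods are hypothetical, and the empty string stands in for the default partition.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy counters where pending follows the requested partition and allocated follows the node. */
final class PendingFixSketch {
  private final Map<String, Long> pendingMb = new HashMap<>();
  private final Map<String, Long> allocatedMb = new HashMap<>();

  void incrPending(String requestedPartition, long mb) {
    pendingMb.merge(requestedPartition, mb, Long::sum);
  }

  /** Allocated is charged to the node's partition; pending is released on the requested one. */
  void onContainerAllocated(String nodePartition, String requestedPartition, long mb) {
    allocatedMb.merge(nodePartition, mb, Long::sum);
    pendingMb.merge(requestedPartition, -mb, Long::sum);
  }

  long pending(String partition) {
    return pendingMb.getOrDefault(partition, 0L);
  }

  long allocated(String partition) {
    return allocatedMb.getOrDefault(partition, 0L);
  }

  /** Replays 3 x 1 GB requested without a label and placed on a node in partition "x". */
  static PendingFixSketch replay() {
    PendingFixSketch queueA = new PendingFixSketch();
    for (int i = 0; i < 3; i++) {
      queueA.incrPending("", 1024);
    }
    for (int i = 0; i < 3; i++) {
      queueA.onContainerAllocated("x", "", 1024);
    }
    return queueA;
  }

  public static void main(String[] args) {
    PendingFixSketch queueA = replay();
    System.out.println("default pending MB = " + queueA.pending(""));
    System.out.println("x pending MB = " + queueA.pending("x"));
    System.out.println("x allocated MB = " + queueA.allocated("x"));
  }
}
```

With the increment and decrement applied to the same partition, pending returns to zero on both partitions in this model, while the allocation is still attributed to partition x.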

> PartitionQueueMetrics Issues
> 
>
> Key: YARN-9767
> URL: https://issues.apache.org/jira/browse/YARN-9767
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>

[jira] [Updated] (YARN-9767) PartitionQueueMetrics Issues

2019-08-20 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9767:
---
Parent: YARN-6492
Issue Type: Sub-task  (was: Bug)

> PartitionQueueMetrics Issues
> 
>
> Key: YARN-9767
> URL: https://issues.apache.org/jira/browse/YARN-9767
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>



[jira] [Commented] (YARN-6492) Generate queue metrics for each partition

2019-08-20 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911579#comment-16911579
 ] 

Manikandan R commented on YARN-6492:


Created YARN-9767 to track the issues separately.

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, 
> YARN-6492.004.patch, YARN-6492.005.WIP.patch, partition_metrics.txt



[jira] [Comment Edited] (YARN-9767) PartitionQueueMetrics Issues

2019-08-20 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911575#comment-16911575
 ] 

Manikandan R edited comment on YARN-9767 at 8/20/19 5:37 PM:
-

On observation #1:

After container allocation, pending resources are deducted inside 
{{QueueMetrics#allocateResources}} using the node partition rather than the 
requested partition. I think {{RMContainerImpl#getNodeLabelExpression}} should 
be used to decrease pending resources instead, for the following reasons:

1. {{RMContainerImpl#getNodeLabelExpression}} is derived from 
{{AppPlacementAllocator#getPrimaryRequestedNodePartition}}. The Javadoc of 
{{AppPlacementAllocator#getPrimaryRequestedNodePartition}} explains this well 
enough.
2. In this case, the actual intent is to run anywhere (ANY, which is nothing 
but the "default" partition), but the container ended up on a non-exclusive 
partition. So increasing pending resources on the "default" partition or the 
PrimaryRequestedNodePartition (mostly "default", or a specific partition) and 
deducting them from that same partition seems correct, rather than increasing 
and decreasing in two different places.

So the fix would be something like the following in 
{{AppSchedulingInfo#updateMetrics}}:
{code:java}
queue.getMetrics().allocateResources(node.getPartition(), user, 1,
    containerAllocated.getContainer().getResource(), false);
queue.getMetrics().decrPendingResources(
    containerAllocated.getNodeLabelExpression(), user, 1,
    containerAllocated.getContainer().getResource());
{code}
instead of
{code:java}
queue.getMetrics().allocateResources(node.getPartition(), user, 1,
    containerAllocated.getContainer().getResource(), true);
{code}
Please share your thoughts.
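
To make the accounting problem concrete, here is a minimal, self-contained sketch. It does not use the real {{QueueMetrics}} API: the class {{PendingByPartitionSketch}}, its map of pending MB per partition, and the partition names are all hypothetical stand-ins. It only illustrates why incrementing on the requested partition and decrementing on the node partition leaves both counters wrong, while decrementing on the requested partition balances out.

```java
import java.util.HashMap;
import java.util.Map;

public class PendingByPartitionSketch {

  // Hypothetical stand-in for per-partition pending memory (MB).
  // "" plays the role of the default partition, "x" a non-exclusive label.
  static Map<String, Long> simulate(boolean decrementOnRequestedPartition) {
    Map<String, Long> pendingMb = new HashMap<>();
    String requestedPartition = ""; // the app asked for ANY
    String nodePartition = "x";     // the containers landed on label x
    long requestedMb = 3 * 1024L;   // 3 containers of 1 GB each

    // The request is recorded as pending on the requested partition.
    pendingMb.merge(requestedPartition, requestedMb, Long::sum);

    // After allocation, pending is deducted from one of the two partitions.
    String deductFrom =
        decrementOnRequestedPartition ? requestedPartition : nodePartition;
    pendingMb.merge(deductFrom, -requestedMb, Long::sum);
    return pendingMb;
  }

  public static void main(String[] args) {
    System.out.println("deduct on node partition:      " + simulate(false));
    System.out.println("deduct on requested partition: " + simulate(true));
  }
}
```

In the first case the default partition keeps 3 GB pending and partition x goes to -3 GB, which matches the observation in the issue description; in the second case the default partition returns to zero and x is never touched.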



> PartitionQueueMetrics Issues
> 
>
> Key: YARN-9767
> URL: https://issues.apache.org/jira/browse/YARN-9767
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>
> The intent of the Jira is to capture the issues/observations encountered as 
> part of YARN-6492 development separately for ease of tracking.
> Observations:
> Please refer to this comment: 
> https://issues.apache.org/jira/browse/YARN-6492?focusedCommentId=16904027&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16904027
> 1. Since partition info is extracted from both the request and the node, there 
> is a problem. For example, 
>  
> Node N has been mapped to Label X (non-exclusive). Queue A has been 
> configured with the ANY node label. App A requested resources from Queue A and 
> its containers ran on Node N for some reason. During the 
> AbstractCSQueue#allocateResource call, the node partition (from SchedulerNode) 
> is used for the calculation. Let's say the allocate call has been fired for 3 
> containers of 1 GB each; then
> a. PartitionDefault * queue A -> pending mb is 3 GB
> b. PartitionX * queue A -> pending mb is -3 GB
>  
> is the outcome. Because the app request was fired without any label 
> specification, metric #a is derived. After allocation is over, 
> pending resour

[jira] [Commented] (YARN-9766) YARN CapacityScheduler QueueMetrics has missing metrics for parent queues having same name

2019-08-21 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912532#comment-16912532
 ] 

Manikandan R commented on YARN-9766:


While constructing Queue objects, the code makes use of {{old}}: it does a 
null check and reuses {{old.getMetrics()}} when possible. The piece of code 
below prevents metrics from being created for "root.a.d.b" because "root.a.b" 
has been generated before. I think checking equality using {{getQueuePath()}} 
in addition to the null check would help differentiate these two paths. 
cc [~eepayne] [~sunilg]

{code}
this.metrics = old != null ?
    (CSQueueMetrics) old.getMetrics() :
    CSQueueMetrics.forQueue(getQueuePath(), parent,
        cs.getConfiguration().getEnableUserMetrics(), cs.getConf());
{code}

[~tarunparimi] Can I take it forward? 
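
A rough sketch of the suggested check follows. It is not the actual CapacityScheduler code: {{QueueMetricsReuseSketch}}, its nested {{Queue}} and {{Metrics}} classes, and the constructor shape are hypothetical and only model the "reuse the old metrics only if the queue path matches" idea.

```java
import java.util.Objects;

public class QueueMetricsReuseSketch {

  /** Hypothetical stand-in for a queue's metrics object. */
  static final class Metrics {
    final String queuePath;

    Metrics(String queuePath) {
      this.queuePath = queuePath;
    }
  }

  /** Hypothetical stand-in for a queue being constructed or re-initialized. */
  static final class Queue {
    final String queuePath;
    final Metrics metrics;

    Queue(String queuePath, Queue old) {
      this.queuePath = queuePath;
      // Reuse the old metrics only when "old" really refers to the same path;
      // a null check alone would hand root.a.b's metrics to root.a.d.b.
      this.metrics = (old != null && Objects.equals(old.queuePath, queuePath))
          ? old.metrics
          : new Metrics(queuePath);
    }
  }

  public static void main(String[] args) {
    Queue first = new Queue("root.a.b", null);
    Queue sameShortName = new Queue("root.a.d.b", first);
    Queue reinitialized = new Queue("root.a.b", first);
    System.out.println("separate metrics for root.a.d.b: "
        + (sameShortName.metrics != first.metrics));
    System.out.println("metrics reused on re-init:       "
        + (reinitialized.metrics == first.metrics));
  }
}
```

With the path comparison in place, a queue that merely shares the short name "b" gets its own metrics object, while a genuine re-initialization of the same path still reuses the existing one.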


> YARN CapacityScheduler QueueMetrics has missing metrics for parent queues 
> having same name
> --
>
> Key: YARN-9766
> URL: https://issues.apache.org/jira/browse/YARN-9766
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
>
> In Capacity Scheduler, we enforce that leaf queues have unique names, but 
> this is not the case for parent queues. For example, we can have the queue 
> hierarchy below, where "b" is the queue name for two different queue paths, 
> root.a.b and root.a.d.b. Since "b" is not a leaf queue, this configuration 
> works and apps run fine in the leaf queues 'c' and 'e'.
>  * root
>  ** a
>  *** b
>  **** c
>  *** d
>  **** b
>  ***** e
> But the JMX metrics do not show the metrics for the parent queue 
> "root.a.d.b". We can see metrics only for the "root.a.b" queue.
>  






[jira] [Updated] (YARN-9756) Create metric that sums total memory/vcores preempted per round

2019-08-22 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9756:
---
Attachment: YARN-9756.001.patch

> Create metric that sums total memory/vcores preempted per round
> ---
>
> Key: YARN-9756
> URL: https://issues.apache.org/jira/browse/YARN-9756
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2
>Reporter: Eric Payne
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9756.001.patch, YARN-9756.WIP.patch
>
>







[jira] [Commented] (YARN-9756) Create metric that sums total memory/vcores preempted per round

2019-08-22 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913455#comment-16913455
 ] 

Manikandan R commented on YARN-9756:


[~eepayne] Thanks. Attached .001.patch for your review.

> Create metric that sums total memory/vcores preempted per round
> ---
>
> Key: YARN-9756
> URL: https://issues.apache.org/jira/browse/YARN-9756
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2
>Reporter: Eric Payne
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9756.001.patch, YARN-9756.WIP.patch
>
>







[jira] [Created] (YARN-9772) CapacitySchedulerQueueManager has incorrect list of queues

2019-08-22 Thread Manikandan R (Jira)
Manikandan R created YARN-9772:
--

 Summary: CapacitySchedulerQueueManager has incorrect list of queues
 Key: YARN-9772
 URL: https://issues.apache.org/jira/browse/YARN-9772
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Manikandan R
Assignee: Manikandan R


CapacitySchedulerQueueManager has an incorrect list of queues when more than 
one parent queue (say, at a middle level) has the same name.

For example,
 * root
 ** a
 *** b
 **** c
 *** d
 **** b
 ***** e

{{CapacitySchedulerQueueManager#getQueues}} maintains this list of queues. 
While parsing "root.a.d.b", it overwrites "root.a.b" with the new Queue object 
in the map because both share the same short name. After parsing all the 
queues, the map should hold 7 entries, but it holds 6. Any reference to queue 
"root.a.b" in the code path actually resolves to the "root.a.d.b" object. Since 
{{CapacitySchedulerQueueManager#getQueues}} is used in multiple places, the 
implications need to be understood in detail. For example, 
{{CapacityScheduler#getQueue}}, which is used in many places, in turn uses 
{{CapacitySchedulerQueueManager#getQueues}}. cc [~eepayne], [~sunilg]
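
The 7-versus-6 count can be reproduced with a small, self-contained sketch. It does not touch the real {{CapacitySchedulerQueueManager}}; the class {{QueueMapKeySketch}} and its helpers are hypothetical, and the seven queue paths are an assumption read off the example hierarchy (with "c" under root.a.b and "e" under root.a.d.b).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class QueueMapKeySketch {

  // Assumed full paths for the example hierarchy in this issue.
  static final List<String> PATHS = Arrays.asList(
      "root", "root.a", "root.a.b", "root.a.b.c",
      "root.a.d", "root.a.d.b", "root.a.d.b.e");

  // Last path component, e.g. "root.a.d.b" -> "b".
  static String shortName(String path) {
    return path.substring(path.lastIndexOf('.') + 1);
  }

  // Keyed by short name: the second "b" silently replaces the first.
  static Map<String, String> byShortName() {
    Map<String, String> queues = new HashMap<>();
    for (String path : PATHS) {
      queues.put(shortName(path), path);
    }
    return queues;
  }

  // Keyed by full path: every queue keeps its own entry.
  static Map<String, String> byFullPath() {
    Map<String, String> queues = new HashMap<>();
    for (String path : PATHS) {
      queues.put(path, path);
    }
    return queues;
  }

  public static void main(String[] args) {
    System.out.println("keyed by short name: " + byShortName().size());
    System.out.println("keyed by full path:  " + byFullPath().size());
    System.out.println("lookup of \"b\" resolves to: " + byShortName().get("b"));
  }
}
```

Keying by full path is shown only for contrast; whether that is the right fix depends on how callers of {{getQueues}} look queues up, which is exactly the open question raised above.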






[jira] [Updated] (YARN-9772) CapacitySchedulerQueueManager has incorrect list of queues

2019-08-22 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9772:
---
Description: 
CapacitySchedulerQueueManager has an incorrect list of queues when more than 
one parent queue (say, at a middle level) has the same name.

For example,
 * root
 ** a
 *** b
 **** c
 *** d
 **** b
 ***** e

{{CapacitySchedulerQueueManager#getQueues}} maintains this list of queues. 
While parsing "root.a.d.b", it overwrites "root.a.b" with the new Queue object 
in the map because both share the same short name. After parsing all the 
queues, the map should hold 7 entries, but it holds 6. Any reference to queue 
"root.a.b" in the code path actually resolves to the "root.a.d.b" object. Since 
{{CapacitySchedulerQueueManager#getQueues}} is used in multiple places, the 
implications need to be understood in detail. For example, 
{{CapacityScheduler#getQueue}}, which is used in many places, in turn uses 
{{CapacitySchedulerQueueManager#getQueues}}. cc [~eepayne], [~sunilg]


> CapacitySchedulerQueueManager has incorrect list of queues
> --
>
> Key: YARN-9772
> URL: https://issues.apache.org/jira/browse/YARN-9772
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>
> CapacitySchedulerQueueManager has an incorrect list of queues when more than 
> one parent queue (say, at a middle level) has the same name.
> For example,
>  * root
>  ** a
>  *** b
>  **** c
>  *** d
>  **** b
>  ***** e
> {{CapacitySchedulerQueueManager#getQueues}} maintains this list of queues. 
> While parsing "root.a.d.b", it overwrites "root.a.b" with the new Queue object 
> in the map because both share the same short name. After parsing all the 
> queues, the map should hold 7 entries, but it holds 6. Any reference to queue 
> "root.a.b" in the code path actually resolves to the "root.a.d.b" object. 
> Since {{CapacitySchedulerQueueManager#getQueues}} is used in multiple places, 
> the implications need to be understood in detail. For example, 
> {{CapacityScheduler#getQueue}}, which is used in many places, in turn uses 
> {{CapacitySchedulerQueueManager#getQueues}}. cc [~eepayne], [~sunilg]






[jira] [Commented] (YARN-9766) YARN CapacityScheduler QueueMetrics has missing metrics for parent queues having same name

2019-08-22 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913502#comment-16913502
 ] 

Manikandan R commented on YARN-9766:


Ok, [~tarunparimi]. Thanks.

While looking into this issue in detail, I came across another related issue 
and created YARN-9772 for it.

> YARN CapacityScheduler QueueMetrics has missing metrics for parent queues 
> having same name
> --
>
> Key: YARN-9766
> URL: https://issues.apache.org/jira/browse/YARN-9766
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
>
> In Capacity Scheduler, we enforce that leaf queues have unique names, but 
> this is not the case for parent queues. For example, we can have the queue 
> hierarchy below, where "b" is the queue name for two different queue paths, 
> root.a.b and root.a.d.b. Since "b" is not a leaf queue, this configuration 
> works and apps run fine in the leaf queues 'c' and 'e'.
>  * root
>  ** a
>  *** b
>  **** c
>  *** d
>  **** b
>  ***** e
> But the JMX metrics do not show the metrics for the parent queue 
> "root.a.d.b". We can see metrics only for the "root.a.b" queue.
>  






[jira] [Updated] (YARN-9773) PartitionQueueMetrics for Custom Resources/Resource vectors

2019-08-22 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9773:
---
Parent: YARN-6492
Issue Type: Sub-task  (was: Bug)

> PartitionQueueMetrics for Custom Resources/Resource vectors
> ---
>
> Key: YARN-9773
> URL: https://issues.apache.org/jira/browse/YARN-9773
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>







[jira] [Created] (YARN-9773) PartitionQueueMetrics for Custom Resources/Resource vectors

2019-08-22 Thread Manikandan R (Jira)
Manikandan R created YARN-9773:
--

 Summary: PartitionQueueMetrics for Custom Resources/Resource 
vectors
 Key: YARN-9773
 URL: https://issues.apache.org/jira/browse/YARN-9773
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Manikandan R
Assignee: Manikandan R









[jira] [Commented] (YARN-6492) Generate queue metrics for each partition

2019-08-22 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913526#comment-16913526
 ] 

Manikandan R commented on YARN-6492:


Created YARN-9773 for the same. I will split the .005 patch and attach the 
pieces to the corresponding sub-tasks shortly.

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, 
> YARN-6492.004.patch, YARN-6492.005.WIP.patch, partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.






[jira] [Commented] (YARN-9756) Create metric that sums total memory/vcores preempted per round

2019-08-22 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913860#comment-16913860
 ] 

Manikandan R commented on YARN-9756:


Sorry. I made changes to the {{TestCapacitySchedulerSurgicalPreemption}} test 
case but missed including them in the patch.

Attached .002.patch.

> Create metric that sums total memory/vcores preempted per round
> ---
>
> Key: YARN-9756
> URL: https://issues.apache.org/jira/browse/YARN-9756
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2
>Reporter: Eric Payne
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9756.001.patch, YARN-9756.002.patch, 
> YARN-9756.WIP.patch
>
>







[jira] [Updated] (YARN-9756) Create metric that sums total memory/vcores preempted per round

2019-08-22 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9756:
---
Attachment: YARN-9756.002.patch

> Create metric that sums total memory/vcores preempted per round
> ---
>
> Key: YARN-9756
> URL: https://issues.apache.org/jira/browse/YARN-9756
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2
>Reporter: Eric Payne
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9756.001.patch, YARN-9756.002.patch, 
> YARN-9756.WIP.patch
>
>







[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2019-08-23 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914411#comment-16914411
 ] 

Manikandan R commented on YARN-9768:


Is this a duplicate of YARN-9478?

I have a patch to handle this. Can I post it over there?

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Priority: Major
>
> The delegation token renewer thread in the RM (DelegationTokenRenewer.java) 
> renews the HDFS tokens it receives to check their validity and expiration time.
> This call is made to an underlying HDFS NN or Router node (which exposes the 
> same APIs as the HDFS NN). If one of the nodes is bad and the renew call gets 
> stuck, the thread remains stuck indefinitely. Ideally, the thread should time 
> out the renewToken call and retry from the client's perspective.
>  






[jira] [Updated] (YARN-6492) Generate queue metrics for each partition

2019-08-28 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-6492:
---
Attachment: YARN-6492.006.WIP.patch

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, 
> YARN-6492.004.patch, YARN-6492.005.WIP.patch, YARN-6492.006.WIP.patch, 
> partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.






[jira] [Commented] (YARN-6492) Generate queue metrics for each partition

2019-08-28 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917920#comment-16917920
 ] 

Manikandan R commented on YARN-6492:


Attaching .006.patch. It covers only the changes required for this JIRA (not 
any changes related to YARN-9767 & YARN-9773).

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, 
> YARN-6492.004.patch, YARN-6492.005.WIP.patch, YARN-6492.006.WIP.patch, 
> partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.






[jira] [Commented] (YARN-9756) Create metric that sums total memory/vcores preempted per round

2019-08-28 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917923#comment-16917923
 ] 

Manikandan R commented on YARN-9756:


Attaching patch for branch 3.2.

> Create metric that sums total memory/vcores preempted per round
> ---
>
> Key: YARN-9756
> URL: https://issues.apache.org/jira/browse/YARN-9756
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2
>Reporter: Eric Payne
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9756-branch-3.2.003.patch, YARN-9756.001.patch, 
> YARN-9756.002.patch, YARN-9756.WIP.patch
>
>







[jira] [Updated] (YARN-9756) Create metric that sums total memory/vcores preempted per round

2019-08-28 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9756:
---
Attachment: YARN-9756-branch-3.2.003.patch

> Create metric that sums total memory/vcores preempted per round
> ---
>
> Key: YARN-9756
> URL: https://issues.apache.org/jira/browse/YARN-9756
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2
>Reporter: Eric Payne
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9756-branch-3.2.003.patch, YARN-9756.001.patch, 
> YARN-9756.002.patch, YARN-9756.WIP.patch
>
>







[jira] [Updated] (YARN-9756) Create metric that sums total memory/vcores preempted per round

2019-08-28 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9756:
---
Attachment: YARN-9756-branch-3.0.004.patch
YARN-9756-branch-2.8.005.patch

> Create metric that sums total memory/vcores preempted per round
> ---
>
> Key: YARN-9756
> URL: https://issues.apache.org/jira/browse/YARN-9756
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2
>Reporter: Eric Payne
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9756-branch-2.8.005.patch, 
> YARN-9756-branch-3.0.004.patch, YARN-9756-branch-3.2.003.patch, 
> YARN-9756.001.patch, YARN-9756.002.patch, YARN-9756.WIP.patch
>
>







[jira] [Commented] (YARN-9756) Create metric that sums total memory/vcores preempted per round

2019-08-28 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917934#comment-16917934
 ] 

Manikandan R commented on YARN-9756:


Attaching patch for branch 3.0 & branch 2.8.

> Create metric that sums total memory/vcores preempted per round
> ---
>
> Key: YARN-9756
> URL: https://issues.apache.org/jira/browse/YARN-9756
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2
>Reporter: Eric Payne
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9756-branch-2.8.005.patch, 
> YARN-9756-branch-3.0.004.patch, YARN-9756-branch-3.2.003.patch, 
> YARN-9756.001.patch, YARN-9756.002.patch, YARN-9756.WIP.patch
>
>







[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2019-08-28 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917943#comment-16917943
 ] 

Manikandan R commented on YARN-9768:


[~crh] [~wangda]

Thanks.

Attaching patch for your review. I can pull config from YARN configuration if 
needed.
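
For readers following along, the general "time out the call and retry" pattern requested in this issue can be sketched with plain {{java.util.concurrent}}. This is not the attached patch and does not use {{DelegationTokenRenewer}}; the class {{RenewWithTimeoutSketch}}, the method {{callWithTimeoutAndRetry}}, and the timeout and attempt values are hypothetical and chosen only for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

public class RenewWithTimeoutSketch {

  // Runs the call on the pool, waits at most timeoutMs per attempt, and
  // retries up to maxAttempts times when an attempt times out.
  static <T> T callWithTimeoutAndRetry(ExecutorService pool, Callable<T> call,
      long timeoutMs, int maxAttempts) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    TimeoutException lastTimeout = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      Future<T> future = pool.submit(call);
      try {
        return future.get(timeoutMs, TimeUnit.MILLISECONDS);
      } catch (TimeoutException e) {
        future.cancel(true); // interrupt the stuck attempt before retrying
        lastTimeout = e;
      }
    }
    throw lastTimeout;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newCachedThreadPool();
    AtomicInteger attempts = new AtomicInteger();
    try {
      // Simulated renew call: the first attempt hangs, the second succeeds.
      Long expiry = callWithTimeoutAndRetry(pool, () -> {
        if (attempts.incrementAndGet() == 1) {
          Thread.sleep(5_000);
        }
        return System.currentTimeMillis() + 86_400_000L;
      }, 200, 3);
      System.out.println("renewed after " + attempts.get()
          + " attempts, expiry=" + expiry);
    } finally {
      pool.shutdownNow();
    }
  }
}
```

Note that {{Future#cancel(true)}} only interrupts the worker thread; a call that ignores interruption would keep running in the background, so a real implementation would also need to bound how many such stuck threads can accumulate.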

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Priority: Major
>
> The delegation token renewer thread in the RM (DelegationTokenRenewer.java) 
> renews the HDFS tokens it receives to check their validity and expiration time.
> This call is made to an underlying HDFS NN or Router node (which exposes the 
> same APIs as the HDFS NN). If one of the nodes is bad and the renew call gets 
> stuck, the thread remains stuck indefinitely. Ideally, the thread should time 
> out the renewToken call and retry from the client's perspective.
>  






[jira] [Updated] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2019-08-28 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9768:
---
Attachment: YARN-9768.001.patch

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Priority: Major
> Attachments: YARN-9768.001.patch
>
>
> The delegation token renewer thread in the RM (DelegationTokenRenewer.java) 
> renews the HDFS tokens it receives to check their validity and expiration time.
> This call is made to an underlying HDFS NN or Router node (which exposes the 
> same APIs as the HDFS NN). If one of the nodes is bad and the renew call gets 
> stuck, the thread remains stuck indefinitely. Ideally, the thread should time 
> out the renewToken call and retry from the client's perspective.
>  






[jira] [Resolved] (YARN-9478) Add timeout for renew delegation thread pool

2019-08-28 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YARN-9478.

Resolution: Duplicate

> Add timeout for renew delegation thread pool
> 
>
> Key: YARN-9478
> URL: https://issues.apache.org/jira/browse/YARN-9478
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
>Priority: Major
>
> Yarn by default creates a thread pool with 50 threads to handle all the token 
> renewals for the running jobs. Currently there is no timeout for the threads, 
> so if one application is slow to renew its token, Yarn could eventually reach 
> a state where all the threads are busy renewing tokens for such applications 
> and the whole Yarn cluster can't handle new applications. 
> The proposal is to add a timeout to the threads in the thread pool so that 
> they get killed after a certain time.  






[jira] [Commented] (YARN-9767) PartitionQueueMetrics Issues

2019-08-29 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918755#comment-16918755
 ] 

Manikandan R commented on YARN-9767:


Attaching .001.patch for review.

> PartitionQueueMetrics Issues
> 
>
> Key: YARN-9767
> URL: https://issues.apache.org/jira/browse/YARN-9767
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9767.001.patch
>
>
> The intent of this Jira is to capture the issues/observations encountered as 
> part of YARN-6492 development separately, for ease of tracking.
> Observations:
> Please refer to this comment: 
> https://issues.apache.org/jira/browse/YARN-6492?focusedCommentId=16904027&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16904027
> 1. Since partition info is extracted from both the request and the node, 
> there is a problem. For example, 
>  
> Node N has been mapped to Label X (non-exclusive). Queue A has been 
> configured with the ANY node label. App A requested resources from Queue A 
> and its containers ran on Node N for some reason. During the 
> AbstractCSQueue#allocateResource call, the node partition (from 
> SchedulerNode) is used for the calculation. Let's say the allocate call has 
> been fired for 3 containers of 1 GB each; then
> a. PartitionDefault * queue A -> pending mb is 3 GB
> b. PartitionX * queue A -> pending mb is -3 GB
>  
> is the outcome. The app request was fired without any label specification, 
> which is how the #a metric was derived. After allocation is over, pending 
> resources are normally decreased. When this happens, the node partition info 
> is used, which is how the #b metric was derived. 
>  
> Given this kind of situation, we will need to put some thought into deriving 
> the metrics correctly.
>  
> 2. Though the intent of this jira is to do Partition Queue Metrics, we would 
> like to retain the existing Queue Metrics for backward compatibility (as you 
> can see from the jira's discussion).
> With this patch and the YARN-9596 patch, QueueMetrics (for queues) would be 
> overridden either with some specific partition values or with default 
> partition values. It could be vice versa as well. For example, after a queue 
> (say queue A) has been initialised with some min and max cap and also with a 
> node label's min and max cap, QueueMetrics (availableMB) for queue A returns 
> values based on the node label's cap config.
> I've been working on these observations to provide a fix and attached 
> .005.WIP.patch. The focus of .005.WIP.patch is to ensure availableMB and 
> availableVcores are correct (please refer to observation #2 above). Added 
> more asserts in {{testQueueMetricsWithLabelsOnDefaultLabelNode}} to ensure 
> the fix for #2 is working properly.
> One more thing to note: user metrics for availableMB and availableVcores at 
> the root queue were not there even before. Retained the same behaviour. User 
> metrics for availableMB and availableVcores are available only at the child 
> queue level and also with partitions.
>  
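
To make observation 1 concrete, here is a minimal Java sketch of the bookkeeping mismatch. It is not the real QueueMetrics code: the class PendingMetricsSketch, its methods, and the "partition/queue" key format are hypothetical names chosen only for illustration. It assumes, as the description states, that the increment is keyed by the request's partition while the decrement is keyed by the node's partition.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative model only; these names are not the real QueueMetrics API. */
final class PendingMetricsSketch {
  // Pending memory in MB, keyed by "partition/queue" (hypothetical key format).
  private final Map<String, Long> pendingMb = new HashMap<>();

  void incrPending(String partition, String queue, long mb) {
    pendingMb.merge(partition + "/" + queue, mb, Long::sum);
  }

  void decrPending(String partition, String queue, long mb) {
    pendingMb.merge(partition + "/" + queue, -mb, Long::sum);
  }

  long pending(String partition, String queue) {
    return pendingMb.getOrDefault(partition + "/" + queue, 0L);
  }

  public static void main(String[] args) {
    PendingMetricsSketch metrics = new PendingMetricsSketch();
    String requestPartition = ""; // request carried no label: default partition
    String nodePartition = "X";   // containers landed on node N, labelled X

    // Request time: 3 containers of 1 GB each become pending,
    // recorded under the partition named in the request.
    metrics.incrPending(requestPartition, "A", 3 * 1024);

    // Allocation time: the decrement is keyed by the node's partition instead,
    // so it never cancels the increment above.
    metrics.decrPending(nodePartition, "A", 3 * 1024);

    System.out.println("default/A pending MB = " + metrics.pending("", "A"));
    System.out.println("X/A pending MB = " + metrics.pending("X", "A"));
  }
}
```

In this model the default-partition entry for queue A stays at 3072 MB while the X entry drops to -3072 MB, which corresponds to outcomes #a and #b above. Using one consistent partition key for both the increment and the decrement would make the two updates cancel out.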






[jira] [Updated] (YARN-9767) PartitionQueueMetrics Issues

2019-08-29 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9767:
---
Attachment: YARN-9767.001.patch

> PartitionQueueMetrics Issues
> 
>
> Key: YARN-9767
> URL: https://issues.apache.org/jira/browse/YARN-9767
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9767.001.patch
>
>



[jira] [Comment Edited] (YARN-9767) PartitionQueueMetrics Issues

2019-08-29 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918755#comment-16918755
 ] 

Manikandan R edited comment on YARN-9767 at 8/29/19 4:23 PM:
-

[~eepayne]  Attaching .001.patch for review. Can you please take a look?


was (Author: maniraj...@gmail.com):
Attaching .001.patch for review.

> PartitionQueueMetrics Issues
> 
>
> Key: YARN-9767
> URL: https://issues.apache.org/jira/browse/YARN-9767
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9767.001.patch
>
>



[jira] [Commented] (YARN-9773) Add QueueMetrics for Custom Resources

2019-08-29 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918779#comment-16918779
 ] 

Manikandan R commented on YARN-9773:


[~eepayne] Attaching .001.patch for review. Custom resource metrics would be 
registered into JMX in the same way as the "running_*" metrics.

> Add QueueMetrics for Custom Resources
> -
>
> Key: YARN-9773
> URL: https://issues.apache.org/jira/browse/YARN-9773
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9773.001.patch
>
>
> Although the custom resource metrics are calculated and saved as a 
> QueueMetricsForCustomResources object within the QueueMetrics class, the JMX 
> and Simon QueueMetrics do not report that information for custom resources. 






[jira] [Updated] (YARN-9773) Add QueueMetrics for Custom Resources

2019-08-29 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9773:
---
Attachment: YARN-9773.001.patch

> Add QueueMetrics for Custom Resources
> -
>
> Key: YARN-9773
> URL: https://issues.apache.org/jira/browse/YARN-9773
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9773.001.patch
>
>



[jira] [Updated] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2019-09-06 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9768:
---
Attachment: YARN-9768.002.patch

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Priority: Major
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch
>
>
> The delegation token renewer thread in the RM (DelegationTokenRenewer.java) 
> renews the HDFS tokens it receives to check their validity and expiration time.
> This call is made to an underlying HDFS NN or Router node (which exposes the 
> same APIs as the HDFS NN). If one of the nodes is bad and the renew call gets 
> stuck, the thread remains stuck indefinitely. Ideally, the thread should time 
> out the renewToken call and retry from the client's perspective.
>  






[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2019-09-06 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924378#comment-16924378
 ] 

Manikandan R commented on YARN-9768:


[~crh] [~elgoiri] Thanks for the review. Sorry for the delay.

Extended the patch to support a maximum number of retry attempts, in addition 
to the test case changes. Please take a look. Once everything is fine, I can 
take care of the documentation part.
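
The timeout-and-retry behaviour discussed in this issue can be sketched roughly as follows. This is an illustrative sketch, not the code in the attached patch and not the DelegationTokenRenewer implementation: the class RenewWithTimeoutSketch, the helper callWithTimeoutAndRetry, the runDemo method, and the timeout and attempt values are all hypothetical. It only shows the general pattern of bounding a blocking call with Future.get(timeout) and capping the number of attempts.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper; names and values are assumptions for illustration. */
final class RenewWithTimeoutSketch {

  /** Runs the call on the pool, waiting at most timeoutMs per attempt. */
  static <T> T callWithTimeoutAndRetry(ExecutorService pool, Callable<T> renewCall,
      long timeoutMs, int maxAttempts) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      Future<T> future = pool.submit(renewCall);
      try {
        return future.get(timeoutMs, TimeUnit.MILLISECONDS);
      } catch (TimeoutException e) {
        // Give up on this attempt; cancel(true) interrupts the worker thread.
        future.cancel(true);
      }
    }
    throw new TimeoutException("call did not finish within " + timeoutMs
        + " ms in any of " + maxAttempts + " attempts");
  }

  /** Simulates a renew call that hangs on the first attempt only. */
  static long runDemo() {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    AtomicInteger calls = new AtomicInteger();
    Callable<Long> renewCall = () -> {
      if (calls.incrementAndGet() == 1) {
        Thread.sleep(5_000); // stand-in for a stuck remote call
      }
      return 42L; // stand-in for a new expiration time
    };
    try {
      Long expiry = callWithTimeoutAndRetry(pool, renewCall, 200, 3);
      return expiry;
    } catch (Exception e) {
      throw new IllegalStateException("renewal failed", e);
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("renewed, value = " + runDemo());
  }
}
```

One caveat with this pattern: cancel(true) only interrupts the worker thread. A call blocked in non-interruptible I/O would keep occupying its pool thread, so the pool size and the attempt limit still need to be chosen with that in mind.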

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Priority: Major
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch
>
>



[jira] [Commented] (YARN-9773) Add QueueMetrics for Custom Resources

2019-09-06 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924384#comment-16924384
 ] 

Manikandan R commented on YARN-9773:


[~eepayne]  Thanks for the review. Attached .002.patch.

> Add QueueMetrics for Custom Resources
> -
>
> Key: YARN-9773
> URL: https://issues.apache.org/jira/browse/YARN-9773
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9773.001.patch, YARN-9773.002.patch
>
>



[jira] [Updated] (YARN-9773) Add QueueMetrics for Custom Resources

2019-09-06 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9773:
---
Attachment: YARN-9773.002.patch

> Add QueueMetrics for Custom Resources
> -
>
> Key: YARN-9773
> URL: https://issues.apache.org/jira/browse/YARN-9773
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9773.001.patch, YARN-9773.002.patch
>
>



[jira] [Commented] (YARN-9772) CapacitySchedulerQueueManager has incorrect list of queues

2019-09-10 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926752#comment-16926752
 ] 

Manikandan R commented on YARN-9772:


To keep it simple, should we extend the duplicate check (as of now, it applies 
only to leaf queues) to parent queues as well? 

[~sunilg] [~wangda] [~weiweiyagn666] [~eepayne] Please share your thoughts.

> CapacitySchedulerQueueManager has incorrect list of queues
> --
>
> Key: YARN-9772
> URL: https://issues.apache.org/jira/browse/YARN-9772
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>
> CapacitySchedulerQueueManager has an incorrect list of queues when more than 
> one parent queue (say, at a middle level) has the same name.
> For example,
>  * root
>  ** a
>  *** b
>  **** c
>  *** d
>  **** b
>  * e
> {{CapacitySchedulerQueueManager#getQueues}} maintains this list of queues. 
> While parsing "root.a.d.b", it overrides "root.a.b" with a new Queue object 
> in the map because the queue names are the same. After parsing all the 
> queues, the map count should be 7, but it is 6. Any reference to queue 
> "root.a.b" in the code path actually resolves to the "root.a.d.b" object. 
> Since {{CapacitySchedulerQueueManager#getQueues}} is used in multiple places, 
> we will need to understand the implications in detail. For example, 
> {{CapacityScheduler#getQueue}}, which is used in many places, in turn 
> uses {{CapacitySchedulerQueueManager#getQueues}}. cc [~eepayne], [~sunilg]
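
The override described above can be reproduced with a small Java sketch. This is a simplified model and not the CapacitySchedulerQueueManager code: the class QueueNameCollisionSketch and its methods are hypothetical, the map holds path strings instead of Queue objects, and it assumes the map is keyed by the last path component only. The path "root.e" is also an assumption about where queue e sits in the hierarchy.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of a queue map keyed by short name (an assumption). */
final class QueueNameCollisionSketch {

  // The seven queues from the example hierarchy, as full paths.
  static final List<String> EXAMPLE_PATHS = Arrays.asList(
      "root", "root.a", "root.a.b", "root.a.b.c",
      "root.a.d", "root.a.d.b", "root.e");

  /** Returns the last path component, e.g. "b" for "root.a.d.b". */
  static String shortName(String path) {
    return path.substring(path.lastIndexOf('.') + 1);
  }

  /** Indexes paths by short name; a later path replaces an earlier one. */
  static Map<String, String> indexByShortName(List<String> paths) {
    Map<String, String> queues = new HashMap<>();
    for (String path : paths) {
      queues.put(shortName(path), path);
    }
    return queues;
  }

  public static void main(String[] args) {
    Map<String, String> queues = indexByShortName(EXAMPLE_PATHS);
    System.out.println("queues parsed: " + EXAMPLE_PATHS.size());
    System.out.println("map entries:   " + queues.size());
    System.out.println("lookup of b:   " + queues.get("b"));
  }
}
```

In this model seven paths yield six map entries, and a lookup of "b" returns the "root.a.d.b" entry, so "root.a.b" is no longer reachable by name. Keying the map by full path, or rejecting duplicate names at parse time, would avoid the silent replacement.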






[jira] [Comment Edited] (YARN-9772) CapacitySchedulerQueueManager has incorrect list of queues

2019-09-10 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926752#comment-16926752
 ] 

Manikandan R edited comment on YARN-9772 at 9/10/19 4:14 PM:
-

To keep it simple, and given that parent and leaf queues are very unlikely to 
share a name, should we extend the duplicate check (as of now, it applies 
only to leaf queues) to parent queues as well? 

[~sunilg] [~wangda] [~weiweiyagn666] [~eepayne] Please share your thoughts.


was (Author: maniraj...@gmail.com):
To keep it simple, Should we extend the duplicates check (as of now, it does 
only for leaf queues) to parent queues as well? 

[~sunilg] [~wangda] [~weiweiyagn666] [~eepayne] Please share your thoughts.

> CapacitySchedulerQueueManager has incorrect list of queues
> --
>
> Key: YARN-9772
> URL: https://issues.apache.org/jira/browse/YARN-9772
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>



[jira] [Commented] (YARN-9772) CapacitySchedulerQueueManager has incorrect list of queues

2019-09-17 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931577#comment-16931577
 ] 

Manikandan R commented on YARN-9772:


True, [~tarunparimi], but I think those situations are very unlikely. Maybe 
detailed documentation would help in this context.

[~sunilg] [~wangda] Can you also share your thoughts?

> CapacitySchedulerQueueManager has incorrect list of queues
> --
>
> Key: YARN-9772
> URL: https://issues.apache.org/jira/browse/YARN-9772
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>



[jira] [Comment Edited] (YARN-9772) CapacitySchedulerQueueManager has incorrect list of queues

2019-09-17 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931577#comment-16931577
 ] 

Manikandan R edited comment on YARN-9772 at 9/17/19 4:17 PM:
-

True, [~tarunparimi] but I think those situations are very unlikely and 
detailed documentation would help in this context.

[~sunilg] [~wangda] Can you also share your thoughts?


was (Author: maniraj...@gmail.com):
True, [~tarunparimi] but I think those situations are very unlikely. May be 
detailed documentation would help in this context.

[~sunilg] [~wangda] Can you also share your thoughts?

> CapacitySchedulerQueueManager has incorrect list of queues
> --
>
> Key: YARN-9772
> URL: https://issues.apache.org/jira/browse/YARN-9772
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>



[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2019-09-17 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931643#comment-16931643
 ] 

Manikandan R commented on YARN-9768:


[~elgoiri] [~crh] Can you review?

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Priority: Major
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch
>
>



[jira] [Commented] (YARN-9627) DelegationTokenRenewer could block transitionToStandy

2019-09-18 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932570#comment-16932570
 ] 

Manikandan R commented on YARN-9627:


{quote}Issue could block switch over when HDFS token renewal takes time
{quote}
YARN-9768 handles this problem with a timeout-based approach. Would you mind 
taking a look at the patch and sharing your thoughts, as it is related to this 
JIRA?

> DelegationTokenRenewer could block transitionToStandy
> -
>
> Key: YARN-9627
> URL: https://issues.apache.org/jira/browse/YARN-9627
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: krishna reddy
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-9627.001.patch, YARN-9627.002.patch, 
> YARN-9627.003.patch
>
>
> Cluster size: 5K
> Running containers: 55K
> *Scenario*: Large number of pending applications (around 50K) while 
> performing an RM switchover.
> The exception below was observed:
> {noformat}
> 2019-06-13 17:39:27,594 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renew Kind: HDFS_DELEGATION_TOKEN, Service: X:1616, Ident: (token 
> for root: HDFS_DELEGATION_TOKEN owner=root/had...@hadoop.com, renewer=yarn, 
> realUser=, issueDate=1560361265181, maxDate=1560966065181, 
> sequenceNumber=104708, masterKeyId=3);exp=1560533965360; 
> apps=[application_1560346941775_20702] in 86397766 ms, appId = 
> [application_1560346941775_20702]
> 2019-06-13 17:39:27,609 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Unable to add the application to the delegation token renewer on recovery.
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:522)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleDTRenewerAppRecoverEvent(DelegationTokenRenewer.java:953)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:79)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:912)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>  
> 2019-06-13 17:58:20,878 ERROR org.apache.zookeeper.ClientCnxn: Time out error 
> occurred for the packet 'clientPath:null serverPath:null finished:false 
> header:: 27,4  replyHeader:: 27,4295687588,0  request:: 
> '/rmstore1/ZKRMStateRoot/RMDTSecretManagerRoot/RMDTMasterKeysRoot/DelegationKey_49,F
>   response:: 
> #31ff8a16b74ffe129768ffdbffe949ff8dffd517ffcafffa,s{4295423577,4295423577,1560342837789,1560342837789,0,0,0,0,17,0,4295423577}
>  '.
> 2019-06-13 17:58:20,877 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Renewed delegation-token= [Kind: HDFS_DELEGATION_TOKEN, Service: 
> X:1616, Ident: (token for root: HDFS_DELEGATION_TOKEN 
> owner=root/had...@hadoop.com, renewer=yarn, realUser=, 
> issueDate=1560366110990, maxDate=1560970910990, sequenceNumber=111891, 
> masterKeyId=3);exp=1560534896413; apps=[application_1560346941775_28115]]
> 2019-06-13 17:58:20,924 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Unable to add the application to the delegation token renewer on recovery.
> java.lang.IllegalStateException: Timer already cancelled.
> at java.util.Timer.sched(Timer.java:397)
> at java.util.Timer.schedule(Timer.java:208)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.setTimerForTokenRenewal(DelegationTokenRenewer.java:612)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:523)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleDTRenewerAppRecoverEvent(DelegationTokenRenewer.java:953)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:79)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:912)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748
> {

[jira] [Commented] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping

2019-09-20 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16934499#comment-16934499
 ] 

Manikandan R commented on YARN-9840:


[~pbacsko] I have a patch to address this. May I post it?

> Capacity scheduler: add support for Secondary Group rule mapping
> 
>
> Key: YARN-9840
> URL: https://issues.apache.org/jira/browse/YARN-9840
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> Currently, Capacity Scheduler only supports primary group rule mapping like 
> this:
> {{u:%user:%primary_group}}
> Fair scheduler already supports secondary group placement rule. Let's add 
> this to CS to reduce the feature gap.
> Class of interest: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java
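
As a rough illustration of what a secondary group rule could do, here is a small Java sketch. It is not taken from UserGroupMappingPlacementRule or from any patch on this issue: the class SecondaryGroupSketch and the method secondaryGroupQueue are hypothetical. It assumes that the first entry in the user's group list is the primary group, and that the rule picks the first remaining group whose name matches an existing queue.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of secondary-group queue selection. */
final class SecondaryGroupSketch {

  /**
   * Returns the first non-primary group that names an existing queue.
   * Assumption: groups.get(0) is the primary group and is skipped.
   */
  static Optional<String> secondaryGroupQueue(List<String> groups, Set<String> queues) {
    for (int i = 1; i < groups.size(); i++) {
      String group = groups.get(i);
      if (queues.contains(group)) {
        return Optional.of(group);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Made-up user groups and queue names for the demonstration.
    List<String> groups = Arrays.asList("staff", "analytics", "etl");
    Set<String> queues = new HashSet<>(Arrays.asList("default", "etl"));

    Optional<String> target = secondaryGroupQueue(groups, queues);
    System.out.println("target queue: " + target.orElse("(no match)"));
  }
}
```

Returning an Optional leaves the fallback decision (for example, trying the next mapping rule) to the caller instead of hard-coding a default queue in the lookup.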



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping

2019-09-21 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16934982#comment-16934982
 ] 

Manikandan R commented on YARN-9840:


Thanks [~pbacsko]. Attached .001.patch.

> Capacity scheduler: add support for Secondary Group rule mapping
> 
>
> Key: YARN-9840
> URL: https://issues.apache.org/jira/browse/YARN-9840
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9840.001.patch
>
>
> Currently, Capacity Scheduler only supports primary group rule mapping like 
> this:
> {{u:%user:%primary_group}}
> Fair scheduler already supports secondary group placement rule. Let's add 
> this to CS to reduce the feature gap.
> Class of interest: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java






[jira] [Updated] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping

2019-09-21 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9840:
---
Attachment: YARN-9840.001.patch

> Capacity scheduler: add support for Secondary Group rule mapping
> 
>
> Key: YARN-9840
> URL: https://issues.apache.org/jira/browse/YARN-9840
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9840.001.patch
>
>
> Currently, Capacity Scheduler only supports primary group rule mapping like 
> this:
> {{u:%user:%primary_group}}
> Fair scheduler already supports secondary group placement rule. Let's add 
> this to CS to reduce the feature gap.
> Class of interest: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java






[jira] [Updated] (YARN-9773) Add QueueMetrics for Custom Resources

2019-09-22 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9773:
---
Attachment: YARN-9773.003.patch

> Add QueueMetrics for Custom Resources
> -
>
> Key: YARN-9773
> URL: https://issues.apache.org/jira/browse/YARN-9773
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9773.001.patch, YARN-9773.002.patch, 
> YARN-9773.003.patch
>
>
> Although the custom resource metrics are calculated and saved as a 
> QueueMetricsForCustomResources object within the QueueMetrics class, the JMX 
> and Simon QueueMetrics do not report that information for custom resources. 






[jira] [Commented] (YARN-9773) Add QueueMetrics for Custom Resources

2019-09-22 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935370#comment-16935370
 ] 

Manikandan R commented on YARN-9773:


Thanks [~eepayne] for your review. Attached .003.patch.

> Add QueueMetrics for Custom Resources
> -
>
> Key: YARN-9773
> URL: https://issues.apache.org/jira/browse/YARN-9773
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9773.001.patch, YARN-9773.002.patch, 
> YARN-9773.003.patch
>
>
> Although the custom resource metrics are calculated and saved as a 
> QueueMetricsForCustomResources object within the QueueMetrics class, the JMX 
> and Simon QueueMetrics do not report that information for custom resources. 






[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-09-22 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935371#comment-16935371
 ] 

Manikandan R commented on YARN-9841:


[~pbacsko] Shall I take this forward as it is related to YARN-9840?

> Capacity scheduler: add support for combined %user + %primary_group mapping
> ---
>
> Key: YARN-9841
> URL: https://issues.apache.org/jira/browse/YARN-9841
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> Right now in CS, using {{%primary_group}} with a parent queue is only 
> possible this way:
> {{u:%user:parentqueue.%primary_group}}
> Looking at 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
>  we cannot do something like:
> {{u:%user:%primary_group.%user}}
> Fair Scheduler supports a nested rule where such a placement/mapping rule is 
> possible. This improvement would reduce this feature gap.
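
To make the requested mapping concrete, here is a minimal sketch of how the queue part of a "u:<user>:<queue>" mapping could be expanded. This is an illustration only, not the code in UserGroupMappingPlacementRule: the class QueueMappingSketch and the method resolve are hypothetical names, and the primary group is passed in directly rather than looked up.

```java
/** Hypothetical helper for illustration; not part of Hadoop. */
final class QueueMappingSketch {

  /**
   * Expands %user and %primary_group in the queue part of a
   * "u:<user>:<queue>" mapping. Returns null when the user part of the
   * mapping is neither "%user" nor the given user name.
   */
  static String resolve(String mapping, String user, String primaryGroup) {
    String[] parts = mapping.split(":", 3);
    if (parts.length != 3 || !parts[0].equals("u")) {
      throw new IllegalArgumentException(
          "Expected u:<user>:<queue>, got " + mapping);
    }
    if (!parts[1].equals("%user") && !parts[1].equals(user)) {
      return null; // this mapping does not apply to the user
    }
    // Replace the longer placeholder first, then the shorter one.
    return parts[2]
        .replace("%primary_group", primaryGroup)
        .replace("%user", user);
  }

  public static void main(String[] args) {
    // Form that is possible today: parent queue plus primary group.
    System.out.println(
        resolve("u:%user:parentqueue.%primary_group", "alice", "dev"));
    // Form requested here: primary group as parent, user as leaf.
    System.out.println(
        resolve("u:%user:%primary_group.%user", "alice", "dev"));
  }
}
```

With user "alice" and primary group "dev", the two mappings expand to parentqueue.dev and dev.alice respectively. The sketch does not check that the resulting queues exist, which a real placement rule would need to do.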






[jira] [Updated] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-09-24 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9841:
---
Attachment: YARN-9841.001.patch

> Capacity scheduler: add support for combined %user + %primary_group mapping
> ---
>
> Key: YARN-9841
> URL: https://issues.apache.org/jira/browse/YARN-9841
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9841.001.patch
>
>
> Right now in CS, using {{%primary_group}} with a parent queue is only 
> possible this way:
> {{u:%user:parentqueue.%primary_group}}
> Looking at 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
>  we cannot do something like:
> {{u:%user:%primary_group.%user}}
> Fair Scheduler supports a nested rule where such a placement/mapping rule is 
> possible. This improvement would reduce this feature gap.






[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-09-24 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937024#comment-16937024
 ] 

Manikandan R commented on YARN-9841:


Thanks [~pbacsko]. Attached .001.patch for review.

While working on this JIRA, I came across the following observations:
 # As documented in 
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html, 
I tried the "u:user2:%primary_group" mapping and don't think it is working as 
expected. The expected queue is one with the same name as the user's primary 
group, but that is not the case.
 # The "u:%user:parentqueue.%user" mapping doesn't return the expected output 
when it is used in conjunction with the "u:%user:%primary_group" mapping, 
whereas using the "u:%user:parentqueue.%user" mapping alone works as expected.

Created a separate JUnit patch to validate these observations. Can you please 
validate this? We can raise separate JIRAs to address these issues based on 
your confirmation.

> Capacity scheduler: add support for combined %user + %primary_group mapping
> ---
>
> Key: YARN-9841
> URL: https://issues.apache.org/jira/browse/YARN-9841
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9841.001.patch
>
>
> Right now in CS, using {{%primary_group}} with a parent queue is only 
> possible this way:
> {{u:%user:parentqueue.%primary_group}}
> Looking at 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
>  we cannot do something like:
> {{u:%user:%primary_group.%user}}
> Fair Scheduler supports a nested rule where such a placement/mapping rule is 
> possible. This improvement would reduce this feature gap.






[jira] [Updated] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-09-24 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9841:
---
Attachment: YARN-9841.junit.patch

> Capacity scheduler: add support for combined %user + %primary_group mapping
> ---
>
> Key: YARN-9841
> URL: https://issues.apache.org/jira/browse/YARN-9841
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9841.001.patch, YARN-9841.junit.patch
>
>
> Right now in CS, using {{%primary_group}} with a parent queue is only 
> possible this way:
> {{u:%user:parentqueue.%primary_group}}
> Looking at 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
>  we cannot do something like:
> {{u:%user:%primary_group.%user}}
> Fair Scheduler supports a nested rule where such a placement/mapping rule is 
> possible. This improvement would reduce this feature gap.






[jira] [Comment Edited] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-09-24 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937024#comment-16937024
 ] 

Manikandan R edited comment on YARN-9841 at 9/24/19 5:59 PM:
-

Thanks [~pbacsko]. Attached .001.patch for review.

While working on this JIRA, I came across the following observations:
 # As documented in 
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html, 
I tried the "u:user2:%primary_group" mapping and don't think it is working as 
expected. The expected queue is one with the same name as the user's primary 
group, but that is not the case, whereas the "u:%user:%primary_group" mapping 
works as expected.
 # The "u:%user:parentqueue.%user" mapping doesn't return the expected output 
when it is used in conjunction with the "u:%user:%primary_group" mapping, 
whereas using the "u:%user:parentqueue.%user" mapping alone works as expected.

Created a separate JUnit patch to validate these observations. Can you please 
validate this? We can raise separate JIRAs to address these issues based on 
your confirmation.


was (Author: maniraj...@gmail.com):
Thanks [~pbacsko]. Attached .001.patch for review.

While working on this JIRA, I came across the following observations:
 # As documented in 
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html, 
I tried the "u:user2:%primary_group" mapping and don't think it is working as 
expected. The expected queue is one with the same name as the user's primary 
group, but that is not the case.
 # The "u:%user:parentqueue.%user" mapping doesn't return the expected output 
when it is used in conjunction with the "u:%user:%primary_group" mapping, 
whereas using the "u:%user:parentqueue.%user" mapping alone works as expected.

Created a separate JUnit patch to validate these observations. Can you please 
validate this? We can raise separate JIRAs to address these issues based on 
your confirmation.

> Capacity scheduler: add support for combined %user + %primary_group mapping
> ---
>
> Key: YARN-9841
> URL: https://issues.apache.org/jira/browse/YARN-9841
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9841.001.patch, YARN-9841.junit.patch
>
>
> Right now in CS, using {{%primary_group}} with a parent queue is only 
> possible this way:
> {{u:%user:parentqueue.%primary_group}}
> Looking at 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
>  we cannot do something like:
> {{u:%user:%primary_group.%user}}
> Fair Scheduler supports a nested rule where such a placement/mapping rule is 
> possible. This improvement would reduce this feature gap.






[jira] [Updated] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping

2019-09-24 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9840:
---
Attachment: YARN-9840.002.patch

> Capacity scheduler: add support for Secondary Group rule mapping
> 
>
> Key: YARN-9840
> URL: https://issues.apache.org/jira/browse/YARN-9840
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9840.001.patch, YARN-9840.002.patch
>
>
> Currently, Capacity Scheduler only supports primary group rule mapping like 
> this:
> {{u:%user:%primary_group}}
> Fair scheduler already supports secondary group placement rule. Let's add 
> this to CS to reduce the feature gap.
> Class of interest: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java






[jira] [Comment Edited] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping

2019-09-24 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937061#comment-16937061
 ] 

Manikandan R edited comment on YARN-9840 at 9/24/19 6:07 PM:
-

[~pbacsko] Thanks for your review.

Addressed all your comments. Attached .002.patch.
{quote}What if there's no secondary group and we return {{null}}? Can't it 
cause an NPE somewhere else?
{quote}
In this case, it doesn't throw any exception and makes use of the 'default' 
queue. The newly added asserts cover this. A debug log has also been added.
{quote}One more thing - this enhancement should be documented.
{quote}
Yes. This requires some more clarity, as mentioned in 
https://issues.apache.org/jira/browse/YARN-9841?focusedCommentId=16937024&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937024.
 Will do in the next patch.
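
The null-handling described in this comment can be sketched as follows. This is a simplified illustration under stated assumptions, not the patch itself: SecondaryGroupSketch and its methods are hypothetical names, the first entry of the group list is assumed to be the primary group, and a secondary group is assumed to qualify only when a queue with the same name exists.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration; not the code in YARN-9840.002.patch. */
final class SecondaryGroupSketch {
  static final String DEFAULT_QUEUE = "default";

  /**
   * Returns the first secondary group (any group after index 0) that has
   * a queue of the same name, or null when there is none.
   */
  static String secondaryGroupQueue(List<String> groups, Set<String> queues) {
    for (int i = 1; i < groups.size(); i++) {
      if (queues.contains(groups.get(i))) {
        return groups.get(i);
      }
    }
    return null; // no secondary group, or none with a matching queue
  }

  /** Falls back to the default queue instead of propagating null. */
  static String placeApplication(List<String> groups, Set<String> queues) {
    String queue = secondaryGroupQueue(groups, queues);
    return queue != null ? queue : DEFAULT_QUEUE;
  }

  public static void main(String[] args) {
    Set<String> queues = new HashSet<>(Arrays.asList("default", "etl"));
    // Secondary group "etl" has a queue, so it is chosen.
    System.out.println(
        placeApplication(Arrays.asList("staff", "analytics", "etl"), queues));
    // Only a primary group: the lookup yields null and 'default' is used.
    System.out.println(placeApplication(Arrays.asList("staff"), queues));
  }
}
```

Keeping the null check in one place means callers never see a null queue name, which is the concern raised in the review question about a possible NPE.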


was (Author: maniraj...@gmail.com):
[~pbacsko] Thanks for your review.

Addressed all your comments. Attached .002.patch.
{quote}What if there's no secondary group and we return {{null}}? Can't it 
cause an NPE somewhere else?
{quote}
In this case, it doesn't throw any exception and makes use of the 'default' 
queue. The newly added asserts cover this. A debug log has also been added.
{quote}One more thing - this enhancement should be documented.
{quote}
This requires some more clarity, as mentioned in 
https://issues.apache.org/jira/browse/YARN-9841?focusedCommentId=16937024&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937024.
 Will do in the next patch.

> Capacity scheduler: add support for Secondary Group rule mapping
> 
>
> Key: YARN-9840
> URL: https://issues.apache.org/jira/browse/YARN-9840
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9840.001.patch, YARN-9840.002.patch
>
>
> Currently, Capacity Scheduler only supports primary group rule mapping like 
> this:
> {{u:%user:%primary_group}}
> Fair scheduler already supports secondary group placement rule. Let's add 
> this to CS to reduce the feature gap.
> Class of interest: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java






[jira] [Commented] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping

2019-09-24 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937061#comment-16937061
 ] 

Manikandan R commented on YARN-9840:


[~pbacsko] Thanks for your review.

Addressed all your comments. Attached .002.patch.
{quote}What if there's no secondary group and we return {{null}}? Can't it 
cause an NPE somewhere else?
{quote}
In this case, it doesn't throw any exception and makes use of the 'default' 
queue. The newly added asserts cover this. A debug log has also been added.
{quote}One more thing - this enhancement should be documented.
{quote}
This requires some more clarity, as mentioned in 
https://issues.apache.org/jira/browse/YARN-9841?focusedCommentId=16937024&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937024.
 Will do in the next patch.

> Capacity scheduler: add support for Secondary Group rule mapping
> 
>
> Key: YARN-9840
> URL: https://issues.apache.org/jira/browse/YARN-9840
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9840.001.patch, YARN-9840.002.patch
>
>
> Currently, Capacity Scheduler only supports primary group rule mapping like 
> this:
> {{u:%user:%primary_group}}
> Fair scheduler already supports secondary group placement rule. Let's add 
> this to CS to reduce the feature gap.
> Class of interest: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java






[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2019-09-26 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938759#comment-16938759
 ] 

Manikandan R commented on YARN-9768:


[~inigoiri] [~crh] Can you review?

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Priority: Major
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch
>
>
> The delegation token renewer thread in the RM (DelegationTokenRenewer.java) 
> renews the HDFS tokens it receives to check their validity and expiration time.
> This call is made to an underlying HDFS NN or Router node (which has the same 
> APIs as the HDFS NN). If one of the nodes is bad and the renew call gets stuck, 
> the thread remains stuck indefinitely. Ideally, the thread should time out the 
> renewToken call and retry from the client's perspective.
>  
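
The behaviour requested in the description, a bounded wait on the renew call followed by a retry, can be sketched with the standard java.util.concurrent classes. This is a generic illustration and not the attached patch: RenewWithTimeoutSketch, callWithTimeoutAndRetry, and all parameter names are hypothetical, and the renew call is represented by an arbitrary Callable.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical illustration; not the code in the YARN-9768 patches. */
final class RenewWithTimeoutSketch {

  /**
   * Runs the call on a worker thread and waits at most timeoutMs for it.
   * On timeout the attempt is cancelled and, after retryIntervalMs, the
   * call is tried again, up to maxAttempts attempts in total.
   */
  static <T> T callWithTimeoutAndRetry(Callable<T> call, long timeoutMs,
      int maxAttempts, long retryIntervalMs) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    // Daemon threads so that a stuck call cannot keep the JVM alive.
    ExecutorService executor = Executors.newCachedThreadPool(r -> {
      Thread t = new Thread(r, "renew-sketch");
      t.setDaemon(true);
      return t;
    });
    try {
      TimeoutException last = null;
      for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        Future<T> future = executor.submit(call);
        try {
          return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
          future.cancel(true); // interrupt the stuck attempt
          last = e;
          if (attempt < maxAttempts) {
            Thread.sleep(retryIntervalMs); // fixed interval between tries
          }
        }
      }
      throw last; // every attempt timed out
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for a renew call that returns promptly.
    String result = callWithTimeoutAndRetry(() -> "renewed", 500, 3, 100);
    System.out.println(result);
  }
}
```

Cancelling with an interrupt only helps if the underlying call responds to interruption; a call blocked in a non-interruptible operation would stay stuck on its worker thread even though the caller moves on.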






[jira] [Updated] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-09-26 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9841:
---
Attachment: YARN-9841.002.patch

> Capacity scheduler: add support for combined %user + %primary_group mapping
> ---
>
> Key: YARN-9841
> URL: https://issues.apache.org/jira/browse/YARN-9841
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9841.001.patch, YARN-9841.001.patch, 
> YARN-9841.002.patch, YARN-9841.junit.patch
>
>
> Right now in CS, using {{%primary_group}} with a parent queue is only 
> possible this way:
> {{u:%user:parentqueue.%primary_group}}
> Looking at 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
>  we cannot do something like:
> {{u:%user:%primary_group.%user}}
> Fair Scheduler supports a nested rule where such a placement/mapping rule is 
> possible. This improvement would reduce this feature gap.






[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-09-26 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938834#comment-16938834
 ] 

Manikandan R commented on YARN-9841:


Thanks [~pbacsko] for review. Addressed all of your comments. Attached 
.002.patch. 
{quote}If we have this for {{%primary_group}}, can't we just handle 
{{%secondary_group}} as well?
{quote}
I initially thought about this, but then preferred to handle it separately for 
ease of tracking and to avoid confusion with the description etc. Hope you are 
fine with this.

Also, have you had a chance to look at the observations raised earlier? We can 
track these issues in a separate JIRA.
{quote}Can {{ctx}} ever be null? I assume this test should produce the same 
behavior each time, provided the code-under-test doesn't change. So to me it 
seems more logical to make sure that {{ctx}} is not null, which practically 
means a new assertion. Btw this applies to the piece of code above, too.
{quote}
Made changes in {{TestCapacitySchedulerQueueMappingFactory}}, but not in 
{{TestUserGroupMappingPlacementRule}}, as it is commonly used by various 
asserts, in some of which {{ctx}} is null.

> Capacity scheduler: add support for combined %user + %primary_group mapping
> ---
>
> Key: YARN-9841
> URL: https://issues.apache.org/jira/browse/YARN-9841
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9841.001.patch, YARN-9841.001.patch, 
> YARN-9841.002.patch, YARN-9841.junit.patch
>
>
> Right now in CS, using {{%primary_group}} with a parent queue is only 
> possible this way:
> {{u:%user:parentqueue.%primary_group}}
> Looking at 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
>  we cannot do something like:
> {{u:%user:%primary_group.%user}}
> Fair Scheduler supports a nested rule where such a placement/mapping rule is 
> possible. This improvement would reduce this feature gap.






[jira] [Comment Edited] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-09-26 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938834#comment-16938834
 ] 

Manikandan R edited comment on YARN-9841 at 9/26/19 5:29 PM:
-

Thanks [~pbacsko] for review. Addressed all of your comments. Attached 
.002.patch. 
{quote}If we have this for {{%primary_group}}, can't we just handle 
{{%secondary_group}} as well?
{quote}
I initially thought about this, but then preferred to handle it in a separate 
Jira for ease of tracking and to avoid confusion with the description, 
discussions etc. Hope you are fine with this.

Also, have you had a chance to look at the observations raised earlier? We can 
track these issues in a separate JIRA.
{quote}Can {{ctx}} ever be null? I assume this test should produce the same 
behavior each time, provided the code-under-test doesn't change. So to me it 
seems more logical to make sure that {{ctx}} is not null, which practically 
means a new assertion. Btw this applies to the piece of code above, too.
{quote}
Made changes in {{TestCapacitySchedulerQueueMappingFactory}}, but not in 
{{TestUserGroupMappingPlacementRule}}, as it is commonly used by various 
asserts, in some of which {{ctx}} is null.


was (Author: maniraj...@gmail.com):
Thanks [~pbacsko] for review. Addressed all of your comments. Attached 
.002.patch. 
{quote}If we have this for {{%primary_group}}, can't we just handle 
{{%secondary_group}} as well?
{quote}
I initially thought about this, but then preferred to handle it separately for 
ease of tracking and to avoid confusion with the description etc. Hope you are 
fine with this.

Also, have you had a chance to look at the observations raised earlier? We can 
track these issues in a separate JIRA.
{quote}Can {{ctx}} ever be null? I assume this test should produce the same 
behavior each time, provided the code-under-test doesn't change. So to me it 
seems more logical to make sure that {{ctx}} is not null, which practically 
means a new assertion. Btw this applies to the piece of code above, too.
{quote}
Made changes in {{TestCapacitySchedulerQueueMappingFactory}}, but not in 
{{TestUserGroupMappingPlacementRule}}, as it is commonly used by various 
asserts, in some of which {{ctx}} is null.

> Capacity scheduler: add support for combined %user + %primary_group mapping
> ---
>
> Key: YARN-9841
> URL: https://issues.apache.org/jira/browse/YARN-9841
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9841.001.patch, YARN-9841.001.patch, 
> YARN-9841.002.patch, YARN-9841.junit.patch
>
>
> Right now in CS, using {{%primary_group}} with a parent queue is only 
> possible this way:
> {{u:%user:parentqueue.%primary_group}}
> Looking at 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
>  we cannot do something like:
> {{u:%user:%primary_group.%user}}
> Fair Scheduler supports a nested rule where such a placement/mapping rule is 
> possible. This improvement would reduce this feature gap.






[jira] [Updated] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2019-09-28 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9768:
---
Attachment: YARN-9768.003.patch

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Priority: Major
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch, 
> YARN-9768.003.patch
>
>
> The delegation token renewer thread in the RM (DelegationTokenRenewer.java) 
> renews the HDFS tokens it receives to check their validity and expiration time.
> This call is made to an underlying HDFS NN or Router node (which has the same 
> APIs as the HDFS NN). If one of the nodes is bad and the renew call gets stuck, 
> the thread remains stuck indefinitely. Ideally, the thread should time out the 
> renewToken call and retry from the client's perspective.
>  






[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2019-09-28 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940115#comment-16940115
 ] 

Manikandan R commented on YARN-9768:


[~inigoiri] Thanks for the review. Sorry, there was a problem with the Eclipse 
formatter. Fixed.

Addressed almost all comments. Regarding the sleeps: since there are multiple 
retries at a fixed interval, sleeping helps ensure that the maximum number of 
retry attempts has been exhausted.
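
As a rough illustration of the timeout-and-retry behaviour discussed in this issue, the Python sketch below bounds each attempt with a timeout and sleeps for a fixed interval between attempts. It is not taken from the YARN-9768 patch (the real change is in the Java class DelegationTokenRenewer): the function name `call_with_timeout_and_retry`, its parameters, and the use of one worker thread per attempt are all assumptions made for illustration.

```python
import concurrent.futures
import time


def call_with_timeout_and_retry(renew, timeout_s, max_attempts, retry_interval_s):
    """Run renew() with a per-attempt timeout, retrying at a fixed interval.

    Hypothetical helper for illustration; not the DelegationTokenRenewer logic.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        # A fresh single-thread pool per attempt, so a stuck call cannot
        # block the next attempt.
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(renew)
        try:
            return future.result(timeout=timeout_s)
        except Exception as err:  # includes the timeout raised by result()
            last_error = err
        finally:
            # Do not wait for a stuck worker; it is abandoned, not cancelled.
            pool.shutdown(wait=False)
        if attempt < max_attempts:
            time.sleep(retry_interval_s)
    raise RuntimeError(f"renew failed after {max_attempts} attempts") from last_error
```

Under these assumptions, a test that wants to observe all retries being used has to wait for roughly `max_attempts` timeouts plus the sleeps in between, which is one reason a test of such logic may need sleeps of its own.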

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Priority: Major
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch, 
> YARN-9768.003.patch
>
>
> The delegation token renewer thread in the RM (DelegationTokenRenewer.java) 
> renews the HDFS tokens it receives to check their validity and expiration time.
> This call is made to an underlying HDFS NN or Router node (which exposes the 
> same APIs as the HDFS NN). If one of the nodes is bad and the renew call gets 
> stuck, the thread remains stuck indefinitely. Ideally, the thread should time 
> out the renewToken call and retry from the client's perspective.
>  



[jira] [Created] (YARN-9865) Capacity scheduler: add support for combined %user + %secondary_group mapping

2019-09-28 Thread Manikandan R (Jira)
Manikandan R created YARN-9865:
--

 Summary: Capacity scheduler: add support for combined %user + 
%secondary_group mapping
 Key: YARN-9865
 URL: https://issues.apache.org/jira/browse/YARN-9865
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Manikandan R
Assignee: Manikandan R






[jira] [Updated] (YARN-9865) Capacity scheduler: add support for combined %user + %secondary_group mapping

2019-09-28 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9865:
---
Description: Similar to YARN-9841, but for secondary group.

> Capacity scheduler: add support for combined %user + %secondary_group mapping
> -
>
> Key: YARN-9865
> URL: https://issues.apache.org/jira/browse/YARN-9865
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>
> Similar to YARN-9841, but for secondary group.



[jira] [Updated] (YARN-9865) Capacity scheduler: add support for combined %user + %secondary_group mapping

2019-09-28 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9865:
---
Attachment: YARN-9865.001.patch

> Capacity scheduler: add support for combined %user + %secondary_group mapping
> -
>
> Key: YARN-9865
> URL: https://issues.apache.org/jira/browse/YARN-9865
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9865.001.patch
>
>
> Similar to YARN-9841, but for secondary group.



[jira] [Commented] (YARN-9865) Capacity scheduler: add support for combined %user + %secondary_group mapping

2019-09-28 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940245#comment-16940245
 ] 

Manikandan R commented on YARN-9865:


Attached .001.patch.

> Capacity scheduler: add support for combined %user + %secondary_group mapping
> -
>
> Key: YARN-9865
> URL: https://issues.apache.org/jira/browse/YARN-9865
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9865.001.patch
>
>
> Similar to YARN-9841, but for secondary group.



[jira] [Created] (YARN-9866) u:user2:%primary_group is not working as expected

2019-09-28 Thread Manikandan R (Jira)
Manikandan R created YARN-9866:
--

 Summary: u:user2:%primary_group is not working as expected
 Key: YARN-9866
 URL: https://issues.apache.org/jira/browse/YARN-9866
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Manikandan R
Assignee: Manikandan R






[jira] [Updated] (YARN-9866) u:user2:%primary_group is not working as expected

2019-09-28 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9866:
---
Description: Please refer to #1 in 
https://issues.apache.org/jira/browse/YARN-9841?focusedCommentId=16937024&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937024
 for more details

> u:user2:%primary_group is not working as expected
> -
>
> Key: YARN-9866
> URL: https://issues.apache.org/jira/browse/YARN-9866
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>
> Please refer to #1 in 
> https://issues.apache.org/jira/browse/YARN-9841?focusedCommentId=16937024&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937024
>  for more details



[jira] [Created] (YARN-9867) "u:%user:parentqueue.%user" is not working as expected

2019-09-28 Thread Manikandan R (Jira)
Manikandan R created YARN-9867:
--

 Summary: "u:%user:parentqueue.%user" is not working as expected
 Key: YARN-9867
 URL: https://issues.apache.org/jira/browse/YARN-9867
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Manikandan R
Assignee: Manikandan R






[jira] [Updated] (YARN-9867) "u:%user:parentqueue.%user" is not working as expected

2019-09-28 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9867:
---
Description: Please refer to #2 in 
https://issues.apache.org/jira/browse/YARN-9841?focusedCommentId=16937024&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937024
 for more details

> "u:%user:parentqueue.%user" is not working as expected
> --
>
> Key: YARN-9867
> URL: https://issues.apache.org/jira/browse/YARN-9867
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>
> Please refer to #2 in 
> https://issues.apache.org/jira/browse/YARN-9841?focusedCommentId=16937024&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937024
>  for more details



[jira] [Created] (YARN-9868) Validate %primary_group queue in CS queue manager

2019-09-28 Thread Manikandan R (Jira)
Manikandan R created YARN-9868:
--

 Summary: Validate %primary_group queue in CS queue manager
 Key: YARN-9868
 URL: https://issues.apache.org/jira/browse/YARN-9868
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Manikandan R
Assignee: Manikandan R


As part of the %secondary_group mapping, we use CSQueueManager to ensure that 
the queue resolved from %secondary_group exists while processing the queue 
mapping. We need to do the same for %primary_group.
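
The check described here can be pictured with a small Python sketch. Everything in it is hypothetical: `validated_group_queue`, the `known_queues` set standing in for whatever CSQueueManager tracks, and the choice to return `None` when no queue matches are assumptions for illustration, not the Capacity Scheduler's actual code.

```python
def validated_group_queue(candidate_groups, known_queues):
    """Return the first candidate group that names a known queue, else None."""
    for group in candidate_groups:
        if group in known_queues:
            return group
    return None


# Hypothetical data: queue names known to the scheduler and one user's groups.
known_queues = {"default", "eng", "ops"}
groups = ["staff", "ops", "eng"]  # groups[0] is treated as the primary group

# %secondary_group: the first non-primary group that has a queue.
secondary = validated_group_queue(groups[1:], known_queues)
# %primary_group with the same validation: only the primary group is a candidate.
primary = validated_group_queue(groups[:1], known_queues)
```

With this data, `secondary` resolves to `ops`, while `primary` is `None` because no queue named `staff` exists. What the scheduler should do in that second case is outside the scope of this sketch.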



[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-09-28 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940249#comment-16940249
 ] 

Manikandan R commented on YARN-9841:


{quote}I'm fine with a separate JIRA.
{quote}
Created YARN-9865
{quote}I haven't had the chance to examine the mapping behaviour
{quote}
Created YARN-9866 and YARN-9867 for the 2 issues. [~Prabhu Joseph] If you don't 
see these observations as issues, we can close them if needed.

> Capacity scheduler: add support for combined %user + %primary_group mapping
> ---
>
> Key: YARN-9841
> URL: https://issues.apache.org/jira/browse/YARN-9841
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9841.001.patch, YARN-9841.001.patch, 
> YARN-9841.002.patch, YARN-9841.junit.patch
>
>
> Right now in CS, using {{%primary_group}} with a parent queue is only 
> possible this way:
> {{u:%user:parentqueue.%primary_group}}
> Looking at 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
>  we cannot do something like:
> {{u:%user:%primary_group.%user}}
> Fair Scheduler supports a nested rule where such a placement/mapping rule is 
> possible. This improvement would reduce this feature gap.
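
To make the two mapping forms in the description concrete, here is a simplified Python sketch of how the placeholders in a `u:<source>:<queue path>` rule could be expanded. It is an illustration only and does not reproduce UserGroupMappingPlacementRule: the function `resolve_queue` and its parameters are hypothetical, and queue validation and user names containing dots are ignored.

```python
def resolve_queue(mapping, user, primary_group):
    """Expand a 'u:<source>:<queue path>' mapping for one user.

    Returns the queue path, or None when the mapping does not apply.
    Hypothetical helper; not the Capacity Scheduler implementation.
    """
    kind, source, queue_path = mapping.split(":", 2)
    if kind != "u":
        return None
    # The source is either the %user wildcard or one specific user name.
    if source not in ("%user", user):
        return None
    parts = []
    for part in queue_path.split("."):
        if part == "%user":
            parts.append(user)
        elif part == "%primary_group":
            parts.append(primary_group)
        else:
            parts.append(part)  # a literal queue name such as "parentqueue"
    return ".".join(parts)
```

For a user `alice` whose primary group is `eng`, the existing form `u:%user:parentqueue.%primary_group` expands to `parentqueue.eng`, and the combined form requested in this issue, `u:%user:%primary_group.%user`, expands to `eng.alice`.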



[jira] [Updated] (YARN-9865) Capacity scheduler: add support for combined %user + %secondary_group mapping

2019-09-28 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9865:
---
Issue Type: Improvement  (was: Bug)

> Capacity scheduler: add support for combined %user + %secondary_group mapping
> -
>
> Key: YARN-9865
> URL: https://issues.apache.org/jira/browse/YARN-9865
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9865.001.patch
>
>
> Similar to YARN-9841, but for secondary group.



[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-09-30 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941101#comment-16941101
 ] 

Manikandan R commented on YARN-9841:


Thanks for your validation.

{quote}"u:user2:%primary_group" - I can confirm that it's not working{quote}
Ok

{quote}I think this is not a bug, at least not the way you're 
suggesting.{quote}
Yes, you are right. I also checked it again.

{quote}Here, the problem is that end-users are not notified about this. {quote}
Yes. In general, I think we need to improve the documentation to help users, 
especially regarding precedence.
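
One way to picture the precedence point, assuming it refers to evaluation order: the Capacity Scheduler documentation describes queue mappings as being evaluated from left to right, with the first valid mapping used. The Python sketch below models only that ordering. The function `place` is hypothetical, it handles only `u:` rules, and it treats a rule as valid as soon as its source matches the user, which is a simplification.

```python
def place(mappings, user, primary_group):
    """Return the queue from the first mapping whose source matches the user.

    `mappings` is a comma-separated list of 'u:<source>:<queue>' rules.
    Hypothetical helper for illustration only.
    """
    for mapping in mappings.split(","):
        kind, source, queue = mapping.strip().split(":", 2)
        if kind == "u" and source in ("%user", user):
            queue = queue.replace("%primary_group", primary_group)
            return queue.replace("%user", user)
    return None


# A generic rule listed first shadows a later user-specific rule.
generic_first = place("u:%user:%primary_group,u:user2:special", "user2", "eng")
# Listing the specific rule first lets it take effect.
specific_first = place("u:user2:special,u:%user:%primary_group", "user2", "eng")
```

In this model `generic_first` is `eng` and `specific_first` is `special`, so the order of the rules decides the outcome without any error being reported to the user.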



 

> Capacity scheduler: add support for combined %user + %primary_group mapping
> ---
>
> Key: YARN-9841
> URL: https://issues.apache.org/jira/browse/YARN-9841
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9841.001.patch, YARN-9841.001.patch, 
> YARN-9841.002.patch, YARN-9841.junit.patch
>
>
> Right now in CS, using {{%primary_group}} with a parent queue is only 
> possible this way:
> {{u:%user:parentqueue.%primary_group}}
> Looking at 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
>  we cannot do something like:
> {{u:%user:%primary_group.%user}}
> Fair Scheduler supports a nested rule where such a placement/mapping rule is 
> possible. This improvement would reduce this feature gap.



[jira] [Commented] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping

2019-10-08 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947063#comment-16947063
 ] 

Manikandan R commented on YARN-9840:


Sorry for the delay. Attached .003.patch.

> Capacity scheduler: add support for Secondary Group rule mapping
> 
>
> Key: YARN-9840
> URL: https://issues.apache.org/jira/browse/YARN-9840
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9840.001.patch, YARN-9840.002.patch, 
> YARN-9840.003.patch
>
>
> Currently, Capacity Scheduler only supports primary group rule mapping like 
> this:
> {{u:%user:%primary_group}}
> Fair scheduler already supports secondary group placement rule. Let's add 
> this to CS to reduce the feature gap.
> Class of interest: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java



[jira] [Updated] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping

2019-10-08 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9840:
---
Attachment: YARN-9840.003.patch

> Capacity scheduler: add support for Secondary Group rule mapping
> 
>
> Key: YARN-9840
> URL: https://issues.apache.org/jira/browse/YARN-9840
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9840.001.patch, YARN-9840.002.patch, 
> YARN-9840.003.patch
>
>
> Currently, Capacity Scheduler only supports primary group rule mapping like 
> this:
> {{u:%user:%primary_group}}
> Fair scheduler already supports secondary group placement rule. Let's add 
> this to CS to reduce the feature gap.
> Class of interest: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java



[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-10-08 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947070#comment-16947070
 ] 

Manikandan R commented on YARN-9841:


Attached .003.patch to fix the checkstyle issues.

> Capacity scheduler: add support for combined %user + %primary_group mapping
> ---
>
> Key: YARN-9841
> URL: https://issues.apache.org/jira/browse/YARN-9841
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9841.001.patch, YARN-9841.001.patch, 
> YARN-9841.002.patch, YARN-9841.003.patch, YARN-9841.junit.patch
>
>
> Right now in CS, using {{%primary_group}} with a parent queue is only 
> possible this way:
> {{u:%user:parentqueue.%primary_group}}
> Looking at 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
>  we cannot do something like:
> {{u:%user:%primary_group.%user}}
> Fair Scheduler supports a nested rule where such a placement/mapping rule is 
> possible. This improvement would reduce this feature gap.



[jira] [Updated] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-10-08 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9841:
---
Attachment: YARN-9841.003.patch

> Capacity scheduler: add support for combined %user + %primary_group mapping
> ---
>
> Key: YARN-9841
> URL: https://issues.apache.org/jira/browse/YARN-9841
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9841.001.patch, YARN-9841.001.patch, 
> YARN-9841.002.patch, YARN-9841.003.patch, YARN-9841.junit.patch
>
>
> Right now in CS, using {{%primary_group}} with a parent queue is only 
> possible this way:
> {{u:%user:parentqueue.%primary_group}}
> Looking at 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
>  we cannot do something like:
> {{u:%user:%primary_group.%user}}
> Fair Scheduler supports a nested rule where such a placement/mapping rule is 
> possible. This improvement would reduce this feature gap.



[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-10-09 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947702#comment-16947702
 ] 

Manikandan R commented on YARN-9841:


Thanks [~pbacsko]. Attached .004.patch.

> Capacity scheduler: add support for combined %user + %primary_group mapping
> ---
>
> Key: YARN-9841
> URL: https://issues.apache.org/jira/browse/YARN-9841
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9841.001.patch, YARN-9841.001.patch, 
> YARN-9841.002.patch, YARN-9841.003.patch, YARN-9841.004.patch, 
> YARN-9841.junit.patch
>
>
> Right now in CS, using {{%primary_group}} with a parent queue is only 
> possible this way:
> {{u:%user:parentqueue.%primary_group}}
> Looking at 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
>  we cannot do something like:
> {{u:%user:%primary_group.%user}}
> Fair Scheduler supports a nested rule where such a placement/mapping rule is 
> possible. This improvement would reduce this feature gap.



[jira] [Updated] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-10-09 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9841:
---
Attachment: YARN-9841.004.patch

> Capacity scheduler: add support for combined %user + %primary_group mapping
> ---
>
> Key: YARN-9841
> URL: https://issues.apache.org/jira/browse/YARN-9841
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9841.001.patch, YARN-9841.001.patch, 
> YARN-9841.002.patch, YARN-9841.003.patch, YARN-9841.004.patch, 
> YARN-9841.junit.patch
>
>
> Right now in CS, using {{%primary_group}} with a parent queue is only 
> possible this way:
> {{u:%user:parentqueue.%primary_group}}
> Looking at 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
>  we cannot do something like:
> {{u:%user:%primary_group.%user}}
> Fair Scheduler supports a nested rule where such a placement/mapping rule is 
> possible. This improvement would reduce this feature gap.



[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-10-16 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952987#comment-16952987
 ] 

Manikandan R commented on YARN-9841:


Sorry for the delay.

Attached .005.patch for doc changes. 

> Capacity scheduler: add support for combined %user + %primary_group mapping
> ---
>
> Key: YARN-9841
> URL: https://issues.apache.org/jira/browse/YARN-9841
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9841.001.patch, YARN-9841.001.patch, 
> YARN-9841.002.patch, YARN-9841.003.patch, YARN-9841.004.patch, 
> YARN-9841.junit.patch
>
>
> Right now in CS, using {{%primary_group}} with a parent queue is only 
> possible this way:
> {{u:%user:parentqueue.%primary_group}}
> Looking at 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
>  we cannot do something like:
> {{u:%user:%primary_group.%user}}
> Fair Scheduler supports a nested rule where such a placement/mapping rule is 
> possible. This improvement would reduce this feature gap.



[jira] [Updated] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping

2019-10-16 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9841:
---
Attachment: YARN-9841.005.patch

> Capacity scheduler: add support for combined %user + %primary_group mapping
> ---
>
> Key: YARN-9841
> URL: https://issues.apache.org/jira/browse/YARN-9841
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9841.001.patch, YARN-9841.001.patch, 
> YARN-9841.002.patch, YARN-9841.003.patch, YARN-9841.004.patch, 
> YARN-9841.005.patch, YARN-9841.junit.patch
>
>
> Right now in CS, using {{%primary_group}} with a parent queue is only 
> possible this way:
> {{u:%user:parentqueue.%primary_group}}
> Looking at 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java,
>  we cannot do something like:
> {{u:%user:%primary_group.%user}}
> Fair Scheduler supports a nested rule where such a placement/mapping rule is 
> possible. This improvement would reduce this feature gap.



[jira] [Updated] (YARN-9865) Capacity scheduler: add support for combined %user + %secondary_group mapping

2019-10-16 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9865:
---
Attachment: YARN-9865.002.patch

> Capacity scheduler: add support for combined %user + %secondary_group mapping
> -
>
> Key: YARN-9865
> URL: https://issues.apache.org/jira/browse/YARN-9865
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9865.001.patch, YARN-9865.002.patch
>
>
> Similar to YARN-9841, but for secondary group.



[jira] [Commented] (YARN-9865) Capacity scheduler: add support for combined %user + %secondary_group mapping

2019-10-16 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952990#comment-16952990
 ] 

Manikandan R commented on YARN-9865:


Attached .002.patch.

> Capacity scheduler: add support for combined %user + %secondary_group mapping
> -
>
> Key: YARN-9865
> URL: https://issues.apache.org/jira/browse/YARN-9865
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9865.001.patch, YARN-9865.002.patch
>
>
> Similar to YARN-9841, but for secondary group.



[jira] [Commented] (YARN-9865) Capacity scheduler: add support for combined %user + %secondary_group mapping

2019-10-16 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953010#comment-16953010
 ] 

Manikandan R commented on YARN-9865:


The dependency link has been fixed. This issue requires YARN-9841. Can you 
trigger Jenkins manually?

> Capacity scheduler: add support for combined %user + %secondary_group mapping
> -
>
> Key: YARN-9865
> URL: https://issues.apache.org/jira/browse/YARN-9865
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9865.001.patch, YARN-9865.002.patch
>
>
> Similar to YARN-9841, but for secondary group.



[jira] [Commented] (YARN-9773) Add QueueMetrics for Custom Resources

2019-10-17 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953864#comment-16953864
 ] 

Manikandan R commented on YARN-9773:


Thanks [~epayne] for your support.

> Add QueueMetrics for Custom Resources
> -
>
> Key: YARN-9773
> URL: https://issues.apache.org/jira/browse/YARN-9773
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9773.001.patch, YARN-9773.002.patch, 
> YARN-9773.003.patch
>
>
> Although the custom resource metrics are calculated and saved as a 
> QueueMetricsForCustomResources object within the QueueMetrics class, the JMX 
> and Simon QueueMetrics do not report that information for custom resources. 



[jira] [Updated] (YARN-9866) u:user2:%primary_group is not working as expected

2019-10-17 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9866:
---
Attachment: YARN-9866.001.patch

> u:user2:%primary_group is not working as expected
> -
>
> Key: YARN-9866
> URL: https://issues.apache.org/jira/browse/YARN-9866
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9866.001.patch
>
>
> Please refer to #1 in 
> https://issues.apache.org/jira/browse/YARN-9841?focusedCommentId=16937024&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937024
>  for more details



[jira] [Created] (YARN-9912) Support u:user2:%secondary_group queue mapping

2019-10-17 Thread Manikandan R (Jira)
Manikandan R created YARN-9912:
--

 Summary: Support u:user2:%secondary_group queue mapping
 Key: YARN-9912
 URL: https://issues.apache.org/jira/browse/YARN-9912
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Manikandan R
Assignee: Manikandan R






[jira] [Updated] (YARN-9912) Support u:user2:%secondary_group queue mapping

2019-10-17 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9912:
---
Description: Similar to 

> Support u:user2:%secondary_group queue mapping
> --
>
> Key: YARN-9912
> URL: https://issues.apache.org/jira/browse/YARN-9912
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>
> Similar to 






[jira] [Updated] (YARN-9912) Support u:user2:%secondary_group queue mapping

2019-10-17 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9912:
---
Description: Similar to u:user2:%primary_group mapping, add support for 
u:user2:%secondary_group queue mapping as well.  (was: Similar to )

> Support u:user2:%secondary_group queue mapping
> --
>
> Key: YARN-9912
> URL: https://issues.apache.org/jira/browse/YARN-9912
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>
> Similar to u:user2:%primary_group mapping, add support for 
> u:user2:%secondary_group queue mapping as well.
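For illustration only, here is a minimal Python sketch of how a comma-separated queue-mapping string of the form u:<user>:<queue> might be split into rules, and how a rule naming a specific user (u:user2:%secondary_group) differs from one using the %user wildcard. The names parse_mappings, MappingRule and first_user_rule are hypothetical and do not correspond to Hadoop classes; the actual parsing in the Capacity Scheduler may differ.

```python
from typing import List, NamedTuple, Optional


class MappingRule(NamedTuple):
    kind: str    # "u" for a user rule, "g" for a group rule
    source: str  # a literal name such as "user2", or the "%user" placeholder
    queue: str   # a literal queue name or a placeholder such as "%secondary_group"


def parse_mappings(spec: str) -> List[MappingRule]:
    """Split a comma-separated mapping string into rules (hypothetical helper)."""
    rules = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        parts = item.split(":")
        if len(parts) != 3 or parts[0] not in ("u", "g"):
            raise ValueError(f"malformed mapping: {item!r}")
        rules.append(MappingRule(*parts))
    return rules


def first_user_rule(rules: List[MappingRule], user: str) -> Optional[MappingRule]:
    """Return the first user rule that names `user` or uses the %user wildcard."""
    for rule in rules:
        if rule.kind == "u" and rule.source in (user, "%user"):
            return rule
    return None
```

In this sketch the first matching rule wins, so a rule for a named user should be listed before a %user wildcard rule if both are present.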






[jira] [Updated] (YARN-9912) Support u:user2:%secondary_group queue mapping

2019-10-17 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9912:
---
Attachment: YARN-9912.001.patch

> Support u:user2:%secondary_group queue mapping
> --
>
> Key: YARN-9912
> URL: https://issues.apache.org/jira/browse/YARN-9912
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9912.001.patch
>
>
> Similar to u:user2:%primary_group mapping, add support for 
> u:user2:%secondary_group queue mapping as well.






[jira] [Updated] (YARN-9912) Support u:user2:%secondary_group queue mapping

2019-10-17 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9912:
---
Issue Type: Improvement  (was: Bug)

> Support u:user2:%secondary_group queue mapping
> --
>
> Key: YARN-9912
> URL: https://issues.apache.org/jira/browse/YARN-9912
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9912.001.patch
>
>
> Similar to u:user2:%primary_group mapping, add support for 
> u:user2:%secondary_group queue mapping as well.






[jira] [Commented] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping

2019-10-17 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953887#comment-16953887
 ] 

Manikandan R commented on YARN-9840:


Sorry for the delay. A minor change in doc. Instead of 
u:user3:%secondary_group, it should be u:%user:%secondary_group.

u:user3:%secondary_group queue mapping has been addressed in YARN-9912

> Capacity scheduler: add support for Secondary Group rule mapping
> 
>
> Key: YARN-9840
> URL: https://issues.apache.org/jira/browse/YARN-9840
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9840-004.patch, YARN-9840.001.patch, 
> YARN-9840.002.patch, YARN-9840.003.patch
>
>
> Currently, Capacity Scheduler only supports primary group rule mapping like 
> this:
> {{u:%user:%primary_group}}
> Fair scheduler already supports secondary group placement rule. Let's add 
> this to CS to reduce the feature gap.
> Class of interest: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java
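As a rough sketch of the difference between the two placeholders described above, the Python function below expands %user, %primary_group and %secondary_group for a given user. It assumes the user's groups are supplied as a list with the primary group first; expand_queue is a hypothetical name, and this is not the logic of UserGroupMappingPlacementRule itself.

```python
from typing import List


def expand_queue(queue: str, user: str, groups: List[str]) -> str:
    """Expand a queue placeholder; `groups` lists the primary group first."""
    if queue == "%user":
        return user
    if queue == "%primary_group":
        if not groups:
            raise LookupError(f"user {user!r} has no groups")
        return groups[0]
    if queue == "%secondary_group":
        if len(groups) < 2:
            raise LookupError(f"user {user!r} has no secondary group")
        # Assumption: take the first secondary group; a real scheduler may
        # apply further checks before settling on one.
        return groups[1]
    return queue  # anything else is treated as a literal queue name
```

Under these assumptions, u:%user:%primary_group would place a user whose groups are ["staff", "dev"] in the queue "staff", while u:%user:%secondary_group would place the same user in "dev".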






[jira] [Updated] (YARN-9868) Validate %primary_group queue in CS queue manager

2019-10-17 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9868:
---
Issue Type: Improvement  (was: Bug)

> Validate %primary_group queue in CS queue manager
> -
>
> Key: YARN-9868
> URL: https://issues.apache.org/jira/browse/YARN-9868
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
>
> As part of the %secondary_group mapping, we use CSQueueManager to ensure that 
> the queue resolved from %secondary_group exists while processing the queue 
> mapping. Similarly, we need to do the same for %primary_group.
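The kind of check described above can be sketched in Python as follows. This is an illustration under stated assumptions: known_queues stands in for whatever the queue manager reports as existing queues, groups lists the primary group first, and the function names are hypothetical rather than taken from CSQueueManager.

```python
from typing import Iterable, List, Optional, Set


def first_existing_queue(candidates: Iterable[str],
                         known_queues: Set[str]) -> Optional[str]:
    """Return the first candidate that names an existing queue, else None."""
    for name in candidates:
        if name in known_queues:
            return name
    return None


def resolve_group_queue(placeholder: str, groups: List[str],
                        known_queues: Set[str]) -> Optional[str]:
    """Resolve a group placeholder only to a queue that actually exists."""
    if placeholder == "%primary_group":
        candidates = groups[:1]   # only the primary group is eligible
    elif placeholder == "%secondary_group":
        candidates = groups[1:]   # any secondary group, in order
    else:
        raise ValueError(f"unsupported placeholder: {placeholder!r}")
    return first_existing_queue(candidates, known_queues)
```

Returning None rather than a non-existent queue name lets the caller decide whether to fall through to the next mapping rule or reject the application.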





