[jira] [Commented] (YARN-10297) TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails intermittently

2020-06-15 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136293#comment-17136293
 ] 

Manikandan R commented on YARN-10297:
-

[~jhung] Patch LGTM. Can you please take a look and commit?

> TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails 
> intermittently
> ---
>
> Key: YARN-10297
> URL: https://issues.apache.org/jira/browse/YARN-10297
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-10297.001.patch, YARN-10297.002.patch
>
>
> After YARN-6492, testFairSchedulerContinuousSchedulingInitTime fails 
> intermittently when running {{mvn test -Dtest=TestContinuousScheduling}}
> {noformat}[INFO] Running 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
> [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.682 
> s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
> [ERROR] 
> testFairSchedulerContinuousSchedulingInitTime(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling)
>   Time elapsed: 0.194 s  <<< ERROR!
> org.apache.hadoop.metrics2.MetricsException: Metrics source 
> PartitionQueueMetrics,partition= already exists!
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.getPartitionMetrics(QueueMetrics.java:362)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.incrPendingResources(QueueMetrics.java:601)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updatePendingResources(AppSchedulingInfo.java:388)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:347)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateResourceRequests(AppSchedulingInfo.java:183)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateResourceRequests(SchedulerApplicationAttempt.java:456)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:898)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling.testFairSchedulerContinuousSchedulingInitTime(TestContinuousScheduling.java:375)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10297) TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails intermittently

2020-06-11 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17133916#comment-17133916
 ] 

Manikandan R commented on YARN-10297:
-

Thanks [~Jim_Brennan]. LGTM. Please fix whitespace issues.

> TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails 
> intermittently
> ---
>
> Key: YARN-10297
> URL: https://issues.apache.org/jira/browse/YARN-10297
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-10297.001.patch
>
>
> After YARN-6492, testFairSchedulerContinuousSchedulingInitTime fails 
> intermittently when running {{mvn test -Dtest=TestContinuousScheduling}}
> {noformat}[INFO] Running 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
> [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.682 
> s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
> [ERROR] 
> testFairSchedulerContinuousSchedulingInitTime(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling)
>   Time elapsed: 0.194 s  <<< ERROR!
> org.apache.hadoop.metrics2.MetricsException: Metrics source 
> PartitionQueueMetrics,partition= already exists!
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.getPartitionMetrics(QueueMetrics.java:362)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.incrPendingResources(QueueMetrics.java:601)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updatePendingResources(AppSchedulingInfo.java:388)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:347)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateResourceRequests(AppSchedulingInfo.java:183)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateResourceRequests(SchedulerApplicationAttempt.java:456)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:898)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling.testFairSchedulerContinuousSchedulingInitTime(TestContinuousScheduling.java:375)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9964) Queue metrics turn negative when relabeling a node with running containers to default partition

2020-06-01 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17123338#comment-17123338
 ] 

Manikandan R commented on YARN-9964:


[~jhung] 
YARN-6492 patch covered this fixes too. Can we close this?

> Queue metrics turn negative when relabeling a node with running containers to 
> default partition 
> 
>
> Key: YARN-9964
> URL: https://issues.apache.org/jira/browse/YARN-9964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Hung
>Priority: Major
>
> YARN-6467 changed queue metrics logic to only update certain metrics if it's 
> for default partition. But if an app runs containers in a labeled node, then 
> this node is moved to default partition, then the container is released, this 
> container's resource won't increment queue's allocated resource, but will 
> decrement.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9964) Queue metrics turn negative when relabeling a node with running containers to default partition

2020-06-01 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R reassigned YARN-9964:
--

Assignee: Manikandan R

> Queue metrics turn negative when relabeling a node with running containers to 
> default partition 
> 
>
> Key: YARN-9964
> URL: https://issues.apache.org/jira/browse/YARN-9964
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
>
> YARN-6467 changed queue metrics logic to only update certain metrics if it's 
> for default partition. But if an app runs containers in a labeled node, then 
> this node is moved to default partition, then the container is released, this 
> container's resource won't increment queue's allocated resource, but will 
> decrement.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-9767) PartitionQueueMetrics Issues

2020-06-01 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R resolved YARN-9767.

Resolution: Fixed

> PartitionQueueMetrics Issues
> 
>
> Key: YARN-9767
> URL: https://issues.apache.org/jira/browse/YARN-9767
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9767.001.patch
>
>
> The intent of the Jira is to capture the issues/observations encountered as 
> part of YARN-6492 development separately for ease of tracking.
> Observations:
> Please refer this 
> https://issues.apache.org/jira/browse/YARN-6492?focusedCommentId=16904027=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16904027
> 1. Since partition info are being extracted from request and node, there is a 
> problem. For example, 
>  
> Node N has been mapped to Label X (Non exclusive). Queue A has been 
> configured with ANY Node label. App A requested resources from Queue A and 
> its containers ran on Node N for some reasons. During 
> AbstractCSQueue#allocateResource call, Node partition (using SchedulerNode ) 
> would get used for calculation. Lets say allocate call has been fired for 3 
> containers of 1 GB each, then
> a. PartitionDefault * queue A -> pending mb is 3 GB
> b. PartitionX * queue A -> pending mb is -3 GB
>  
> is the outcome. Because app request has been fired without any label 
> specification and #a metrics has been derived. After allocation is over, 
> pending resources usually gets decreased. When this happens, it use node 
> partition info. hence #b metrics has derived. 
>  
> Given this kind of situation, We will need to put some thoughts on achieving 
> the metrics correctly.
>  
> 2. Though the intent of this jira is to do Partition Queue Metrics, we would 
> like to retain the existing Queue Metrics for backward compatibility (as you 
> can see from jira's discussion).
> With this patch and YARN-9596 patch, queuemetrics (for queue's) would be 
> overridden either with some specific partition values or default partition 
> values. It could be vice - versa as well. For example, after the queues (say 
> queue A) has been initialised with some min and max cap and also with node 
> label's min and max cap, Queuemetrics (availableMB) for queue A return values 
> based on node label's cap config.
> I've been working on these observations to provide a fix and attached 
> .005.WIP.patch. Focus of .005.WIP.patch is to ensure availableMB, 
> availableVcores is correct (Please refer above #2 observation). Added more 
> asserts in{{testQueueMetricsWithLabelsOnDefaultLabelNode}} to ensure fix for 
> #2 is working properly.
> Also one more thing to note is, user metrics for availableMB, availableVcores 
> at root queue was not there even before. Retained the same behaviour. User 
> metrics for availableMB, availableVcores is available only at child queue's 
> level and also with partitions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9767) PartitionQueueMetrics Issues

2020-06-01 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17123337#comment-17123337
 ] 

Manikandan R commented on YARN-9767:


YARN-6492 patch covered this fixes too. Hence closing this.

> PartitionQueueMetrics Issues
> 
>
> Key: YARN-9767
> URL: https://issues.apache.org/jira/browse/YARN-9767
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9767.001.patch
>
>
> The intent of the Jira is to capture the issues/observations encountered as 
> part of YARN-6492 development separately for ease of tracking.
> Observations:
> Please refer this 
> https://issues.apache.org/jira/browse/YARN-6492?focusedCommentId=16904027=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16904027
> 1. Since partition info are being extracted from request and node, there is a 
> problem. For example, 
>  
> Node N has been mapped to Label X (Non exclusive). Queue A has been 
> configured with ANY Node label. App A requested resources from Queue A and 
> its containers ran on Node N for some reasons. During 
> AbstractCSQueue#allocateResource call, Node partition (using SchedulerNode ) 
> would get used for calculation. Lets say allocate call has been fired for 3 
> containers of 1 GB each, then
> a. PartitionDefault * queue A -> pending mb is 3 GB
> b. PartitionX * queue A -> pending mb is -3 GB
>  
> is the outcome. Because app request has been fired without any label 
> specification and #a metrics has been derived. After allocation is over, 
> pending resources usually gets decreased. When this happens, it use node 
> partition info. hence #b metrics has derived. 
>  
> Given this kind of situation, We will need to put some thoughts on achieving 
> the metrics correctly.
>  
> 2. Though the intent of this jira is to do Partition Queue Metrics, we would 
> like to retain the existing Queue Metrics for backward compatibility (as you 
> can see from jira's discussion).
> With this patch and YARN-9596 patch, queuemetrics (for queue's) would be 
> overridden either with some specific partition values or default partition 
> values. It could be vice - versa as well. For example, after the queues (say 
> queue A) has been initialised with some min and max cap and also with node 
> label's min and max cap, Queuemetrics (availableMB) for queue A return values 
> based on node label's cap config.
> I've been working on these observations to provide a fix and attached 
> .005.WIP.patch. Focus of .005.WIP.patch is to ensure availableMB, 
> availableVcores is correct (Please refer above #2 observation). Added more 
> asserts in{{testQueueMetricsWithLabelsOnDefaultLabelNode}} to ensure fix for 
> #2 is working properly.
> Also one more thing to note is, user metrics for availableMB, availableVcores 
> at root queue was not there even before. Retained the same behaviour. User 
> metrics for availableMB, availableVcores is available only at child queue's 
> level and also with partitions.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10297) TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails intermittently

2020-05-30 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120409#comment-17120409
 ] 

Manikandan R commented on YARN-10297:
-

In this test case, I don't see a situation of rm being started twice. Can you 
try the other approach (setting mini cluster mode) in setup and see?

> TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails 
> intermittently
> ---
>
> Key: YARN-10297
> URL: https://issues.apache.org/jira/browse/YARN-10297
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Priority: Major
>
> After YARN-6492, testFairSchedulerContinuousSchedulingInitTime fails 
> intermittently when running {{mvn test -Dtest=TestContinuousScheduling}}
> {noformat}[INFO] Running 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
> [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.682 
> s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
> [ERROR] 
> testFairSchedulerContinuousSchedulingInitTime(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling)
>   Time elapsed: 0.194 s  <<< ERROR!
> org.apache.hadoop.metrics2.MetricsException: Metrics source 
> PartitionQueueMetrics,partition= already exists!
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.getPartitionMetrics(QueueMetrics.java:362)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.incrPendingResources(QueueMetrics.java:601)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updatePendingResources(AppSchedulingInfo.java:388)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:347)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateResourceRequests(AppSchedulingInfo.java:183)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateResourceRequests(SchedulerApplicationAttempt.java:456)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:898)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling.testFairSchedulerContinuousSchedulingInitTime(TestContinuousScheduling.java:375)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6492) Generate queue metrics for each partition

2020-05-30 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120407#comment-17120407
 ] 

Manikandan R commented on YARN-6492:


Ok, Makes sense.

Patch LGTM.

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.2.2, 3.4.0, 3.3.1, 3.1.5
>
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492-branch-2.10.016.patch, YARN-6492-branch-2.10.019.patch, 
> YARN-6492-branch-2.8.014.patch, YARN-6492-branch-2.9.015.patch, 
> YARN-6492-branch-3.1.018.patch, YARN-6492-branch-3.2.017.patch, 
> YARN-6492-junits.patch, YARN-6492.001.patch, YARN-6492.002.patch, 
> YARN-6492.003.patch, YARN-6492.004.patch, YARN-6492.005.WIP.patch, 
> YARN-6492.006.WIP.patch, YARN-6492.007.WIP.patch, YARN-6492.008.WIP.patch, 
> YARN-6492.009.WIP.patch, YARN-6492.010.WIP.patch, YARN-6492.011.WIP.patch, 
> YARN-6492.012.WIP.patch, YARN-6492.013.patch, partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10297) TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails intermittently

2020-05-30 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120303#comment-17120303
 ] 

Manikandan R commented on YARN-10297:
-

Yes, we can make it as Synchronized.

On test failures, When I tried to debug TestCapacitySchedulerAutoQueueCreation 
test failures, had come across two approaches to fix the problem. I fixed the 
problem by stopping the earlier created rm. Other approach is to use 
DefaultMetricsSystem.setMiniClusterMode(true); . I had come across this 
alternative approach in many other test cases.

> TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails 
> intermittently
> ---
>
> Key: YARN-10297
> URL: https://issues.apache.org/jira/browse/YARN-10297
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Priority: Major
>
> After YARN-6492, testFairSchedulerContinuousSchedulingInitTime fails 
> intermittently when running {{mvn test -Dtest=TestContinuousScheduling}}
> {noformat}[INFO] Running 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
> [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.682 
> s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
> [ERROR] 
> testFairSchedulerContinuousSchedulingInitTime(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling)
>   Time elapsed: 0.194 s  <<< ERROR!
> org.apache.hadoop.metrics2.MetricsException: Metrics source 
> PartitionQueueMetrics,partition= already exists!
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.getPartitionMetrics(QueueMetrics.java:362)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.incrPendingResources(QueueMetrics.java:601)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updatePendingResources(AppSchedulingInfo.java:388)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:347)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateResourceRequests(AppSchedulingInfo.java:183)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateResourceRequests(SchedulerApplicationAttempt.java:456)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:898)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling.testFairSchedulerContinuousSchedulingInitTime(TestContinuousScheduling.java:375)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6492) Generate queue metrics for each partition

2020-05-30 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17120300#comment-17120300
 ] 

Manikandan R commented on YARN-6492:


{quote}For the branch-2.10 patch, do we need to remove the{\quote}

This method has been used only in test cases. Yes, we can remove the method 
itself and modify test cases as well.

 

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.2.2, 3.4.0, 3.3.1, 3.1.5
>
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492-branch-2.10.016.patch, YARN-6492-branch-2.8.014.patch, 
> YARN-6492-branch-2.9.015.patch, YARN-6492-branch-3.1.018.patch, 
> YARN-6492-branch-3.2.017.patch, YARN-6492-junits.patch, YARN-6492.001.patch, 
> YARN-6492.002.patch, YARN-6492.003.patch, YARN-6492.004.patch, 
> YARN-6492.005.WIP.patch, YARN-6492.006.WIP.patch, YARN-6492.007.WIP.patch, 
> YARN-6492.008.WIP.patch, YARN-6492.009.WIP.patch, YARN-6492.010.WIP.patch, 
> YARN-6492.011.WIP.patch, YARN-6492.012.WIP.patch, YARN-6492.013.patch, 
> partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6492) Generate queue metrics for each partition

2020-05-29 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-6492:
---
Attachment: YARN-6492-branch-2.10.016.patch
YARN-6492-branch-2.9.015.patch
YARN-6492-branch-2.8.014.patch

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492-branch-2.10.016.patch, YARN-6492-branch-2.8.014.patch, 
> YARN-6492-branch-2.9.015.patch, YARN-6492-junits.patch, YARN-6492.001.patch, 
> YARN-6492.002.patch, YARN-6492.003.patch, YARN-6492.004.patch, 
> YARN-6492.005.WIP.patch, YARN-6492.006.WIP.patch, YARN-6492.007.WIP.patch, 
> YARN-6492.008.WIP.patch, YARN-6492.009.WIP.patch, YARN-6492.010.WIP.patch, 
> YARN-6492.011.WIP.patch, YARN-6492.012.WIP.patch, YARN-6492.013.patch, 
> partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6492) Generate queue metrics for each partition

2020-05-29 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119731#comment-17119731
 ] 

Manikandan R commented on YARN-6492:


[~jhung] Thanks.

Attached patch for branches 2.8, 2.9 & 2.10.

Following methods needs to be checked only in branch-2.8. 

QueueMetrics#allocateResources(String partition, String user, Resource res)

QueueMetrics#releaseResources(String partition, String user, Resource res).

 

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492-junits.patch, YARN-6492.001.patch, YARN-6492.002.patch, 
> YARN-6492.003.patch, YARN-6492.004.patch, YARN-6492.005.WIP.patch, 
> YARN-6492.006.WIP.patch, YARN-6492.007.WIP.patch, YARN-6492.008.WIP.patch, 
> YARN-6492.009.WIP.patch, YARN-6492.010.WIP.patch, YARN-6492.011.WIP.patch, 
> YARN-6492.012.WIP.patch, YARN-6492.013.patch, partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6492) Generate queue metrics for each partition

2020-05-26 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116842#comment-17116842
 ] 

Manikandan R edited comment on YARN-6492 at 5/26/20, 3:56 PM:
--

Reg line 2542, retained the old asserts and added some more asserts to ensure 
pending resources metrics are correct when containers pending on "default" 
partition has been allocated with "x" partition.

Fixed whitespaces.

TestCapacitySchedulerAutoQueueCreation test failures has been fixed by stopping 
the old rm before starting up new rm in some test cases. This change has opened 
up couple of more asserts failures. Fixed those as well by using correct rm and 
cs variables.

Sure, can upload. After committing to trunk?


was (Author: maniraj...@gmail.com):
Reg line 2542, retained the old asserts and added some more asserts to ensure 
pending resources metrics are correct when containers pending on "default" 
partition has been allocated with "x" partition.

Fixed whitespaces.

TestCapacitySchedulerAutoQueueCreation test failures has been fixed by stop the 
old rm before starting up new rm in some test cases. This change has opened up 
couple of more asserts failures. Fixed those as well by using correct rm and cs 
variables.

Sure, can upload. After committing to trunk?

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492-junits.patch, YARN-6492.001.patch, YARN-6492.002.patch, 
> YARN-6492.003.patch, YARN-6492.004.patch, YARN-6492.005.WIP.patch, 
> YARN-6492.006.WIP.patch, YARN-6492.007.WIP.patch, YARN-6492.008.WIP.patch, 
> YARN-6492.009.WIP.patch, YARN-6492.010.WIP.patch, YARN-6492.011.WIP.patch, 
> YARN-6492.012.WIP.patch, partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6492) Generate queue metrics for each partition

2020-05-26 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116842#comment-17116842
 ] 

Manikandan R commented on YARN-6492:


Reg line 2542, retained the old asserts and added some more asserts to ensure 
pending resources metrics are correct when containers pending on "default" 
partition has been allocated with "x" partition.

Fixed whitespaces.

TestCapacitySchedulerAutoQueueCreation test failures has been fixed by stop the 
old rm before starting up new rm in some test cases. This change has opened up 
couple of more asserts failures. Fixed those as well by using correct rm and cs 
variables.

Sure, can upload. After committing to trunk?

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492-junits.patch, YARN-6492.001.patch, YARN-6492.002.patch, 
> YARN-6492.003.patch, YARN-6492.004.patch, YARN-6492.005.WIP.patch, 
> YARN-6492.006.WIP.patch, YARN-6492.007.WIP.patch, YARN-6492.008.WIP.patch, 
> YARN-6492.009.WIP.patch, YARN-6492.010.WIP.patch, YARN-6492.011.WIP.patch, 
> YARN-6492.012.WIP.patch, partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6492) Generate queue metrics for each partition

2020-05-26 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-6492:
---
Attachment: YARN-6492.012.WIP.patch

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492-junits.patch, YARN-6492.001.patch, YARN-6492.002.patch, 
> YARN-6492.003.patch, YARN-6492.004.patch, YARN-6492.005.WIP.patch, 
> YARN-6492.006.WIP.patch, YARN-6492.007.WIP.patch, YARN-6492.008.WIP.patch, 
> YARN-6492.009.WIP.patch, YARN-6492.010.WIP.patch, YARN-6492.011.WIP.patch, 
> YARN-6492.012.WIP.patch, partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6492) Generate queue metrics for each partition

2020-05-25 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-6492:
---
Attachment: YARN-6492.011.WIP.patch

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492-junits.patch, YARN-6492.001.patch, YARN-6492.002.patch, 
> YARN-6492.003.patch, YARN-6492.004.patch, YARN-6492.005.WIP.patch, 
> YARN-6492.006.WIP.patch, YARN-6492.007.WIP.patch, YARN-6492.008.WIP.patch, 
> YARN-6492.009.WIP.patch, YARN-6492.010.WIP.patch, YARN-6492.011.WIP.patch, 
> partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6492) Generate queue metrics for each partition

2020-05-25 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116138#comment-17116138
 ] 

Manikandan R commented on YARN-6492:


Addressed all comments, covered almost all 
checkstyle/whitespace/javadoc/findbugs/asflicense issues. 
TestCapacitySchedulerAutoQueueCreation failures are happening only for test 
cases which tries to mock the rm twice. 
TestCapacitySchedulerAutoQueueCreation#setupSchedulerInstance does this. Will 
need to check the reason even though MockRM shutdown the metrics system and 
clear queue metrics.
{quote}On line 2539 of TestNodeLabelContainerAllocation, should
{quote}
Behaviour was same even without this patch when user metrics has been enabled. 
Attached junits patch explains this. However, did some changes in 
LeafQueue.java as part of this patch as well. In both ways, 
UsersManager#computeUserLimit() does the actual calculation. Can we handle this 
separately?

Good to see the positive results on live cluster. Yes, we can handle 
CSQueueMetrics for partitioned metrics in separate JIRA.

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492-junits.patch, YARN-6492.001.patch, YARN-6492.002.patch, 
> YARN-6492.003.patch, YARN-6492.004.patch, YARN-6492.005.WIP.patch, 
> YARN-6492.006.WIP.patch, YARN-6492.007.WIP.patch, YARN-6492.008.WIP.patch, 
> YARN-6492.009.WIP.patch, YARN-6492.010.WIP.patch, partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6492) Generate queue metrics for each partition

2020-05-25 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-6492:
---
Attachment: YARN-6492-junits.patch

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492-junits.patch, YARN-6492.001.patch, YARN-6492.002.patch, 
> YARN-6492.003.patch, YARN-6492.004.patch, YARN-6492.005.WIP.patch, 
> YARN-6492.006.WIP.patch, YARN-6492.007.WIP.patch, YARN-6492.008.WIP.patch, 
> YARN-6492.009.WIP.patch, YARN-6492.010.WIP.patch, partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6492) Generate queue metrics for each partition

2020-05-22 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-6492:
---
Attachment: YARN-6492.010.WIP.patch

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, 
> YARN-6492.004.patch, YARN-6492.005.WIP.patch, YARN-6492.006.WIP.patch, 
> YARN-6492.007.WIP.patch, YARN-6492.008.WIP.patch, YARN-6492.009.WIP.patch, 
> YARN-6492.010.WIP.patch, partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6492) Generate queue metrics for each partition

2020-05-22 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17114129#comment-17114129
 ] 

Manikandan R commented on YARN-6492:


[~jhung] Thanks for your quick turnaround. Addressed all points except last 
three comments in .10 patch.


{quote}On line 2539 of TestNodeLabelContainerAllocation, should{\quote}
{quote}On line 2566, how is node1 getting 8 containers if queue A's max 
capacity is only 50% of 10GB = 5GB?{\quote}

Label 'x' is non-exclusive and because of IGNORE_PARTITION_EXCLUSIVITY 
scheduling mode calculation in UsersManager#computeUserLimit()?

In the meantime, Will dig more on these 3 comments in detail.

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, 
> YARN-6492.004.patch, YARN-6492.005.WIP.patch, YARN-6492.006.WIP.patch, 
> YARN-6492.007.WIP.patch, YARN-6492.008.WIP.patch, YARN-6492.009.WIP.patch, 
> partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6492) Generate queue metrics for each partition

2020-05-20 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-6492:
---
Attachment: YARN-6492.009.WIP.patch

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, 
> YARN-6492.004.patch, YARN-6492.005.WIP.patch, YARN-6492.006.WIP.patch, 
> YARN-6492.007.WIP.patch, YARN-6492.008.WIP.patch, YARN-6492.009.WIP.patch, 
> partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6492) Generate queue metrics for each partition

2020-05-20 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112371#comment-17112371
 ] 

Manikandan R commented on YARN-6492:


[~jhung]  [~epayne] 

Attached .009 patch based on our discussions:
 # Retain existing default Queue Metrics behaviour (after YARN-6467).
 # Partition Metrics
 # Partition * Queue Metrics
 # Partition * Queue * User Metrics (Only If USER METRICS has been enabled).

Please review and share your feedback.

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, 
> YARN-6492.004.patch, YARN-6492.005.WIP.patch, YARN-6492.006.WIP.patch, 
> YARN-6492.007.WIP.patch, YARN-6492.008.WIP.patch, partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6492) Generate queue metrics for each partition

2020-05-14 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107476#comment-17107476
 ] 

Manikandan R commented on YARN-6492:


Thanks for sharing your views.

I spent good amount of time based on different notion to develop the patch. 
Now, I will need to shift my mind completely to modify the patch based on the 
new conclusions. For example,

In recent patch, no metrics method would do "if partition is default" check, 
which is something needs to be retained for backward compatibility and for 
Partition Queue Metrics computation, it should happen for all partitions at a 
high level.

Will work on the patch and update asap. Thanks.

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, 
> YARN-6492.004.patch, YARN-6492.005.WIP.patch, YARN-6492.006.WIP.patch, 
> YARN-6492.007.WIP.patch, YARN-6492.008.WIP.patch, partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10259) Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement

2020-05-14 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107328#comment-17107328
 ] 

Manikandan R commented on YARN-10259:
-

Minor nit: To make it concise,

*if* (! iter.hasNext()) {

can be used to avoid continue;

May be taken up later if not possible now.

> Reserved Containers not allocated from available space of other nodes in 
> CandidateNodeSet in MultiNodePlacement
> ---
>
> Key: YARN-10259
> URL: https://issues.apache.org/jira/browse/YARN-10259
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10259-001.patch, YARN-10259-002.patch, 
> YARN-10259-003.patch
>
>
> Reserved Containers are not allocated from the available space of other nodes 
> in CandidateNodeSet in MultiNodePlacement. 
> *Repro:*
> 1. MultiNode Placement Enabled.
> 2. Two nodes h1 and h2 with 8GB
> 3. Submit app1 AM (5GB) which gets placed in h1 and app2 AM (5GB) which gets 
> placed in h2.
> 4. Submit app3 AM which is reserved in h1
> 5. Kill app2 which frees space in h2.
> 6. app3 AM never gets ALLOCATED
> RM logs shows YARN-8127 fix rejecting the allocation proposal for app3 AM on 
> h2 as it expects the assignment to be on same node where reservation has 
> happened.
> {code}
> 2020-05-05 18:49:37,264 DEBUG [AsyncDispatcher event handler] 
> scheduler.SchedulerApplicationAttempt 
> (SchedulerApplicationAttempt.java:commonReserve(573)) - Application attempt 
> appattempt_1588684773609_0003_01 reserved container 
> container_1588684773609_0003_01_01 on node host: h1:1234 #containers=1 
> available= used=. This attempt 
> currently has 1 reserved containers at priority 0; currentReservation 
> 
> 2020-05-05 18:49:37,264 INFO  [AsyncDispatcher event handler] 
> fica.FiCaSchedulerApp (FiCaSchedulerApp.java:apply(670)) - Reserved 
> container=container_1588684773609_0003_01_01, on node=host: h1:1234 
> #containers=1 available= used= 
> with resource=
>RESERVED=[(Application=appattempt_1588684773609_0003_01; 
> Node=h1:1234; Resource=)]
>
> 2020-05-05 18:49:38,283 DEBUG [Time-limited test] 
> allocator.RegularContainerAllocator 
> (RegularContainerAllocator.java:assignContainer(514)) - assignContainers: 
> node=h2 application=application_1588684773609_0003 priority=0 
> pendingAsk=,repeat=1> 
> type=OFF_SWITCH
> 2020-05-05 18:49:38,285 DEBUG [Time-limited test] fica.FiCaSchedulerApp 
> (FiCaSchedulerApp.java:commonCheckContainerAllocation(371)) - Try to allocate 
> from reserved container container_1588684773609_0003_01_01, but node is 
> not reserved
>ALLOCATED=[(Application=appattempt_1588684773609_0003_01; 
> Node=h2:1234; Resource=)]
> {code}
> Attached testcase which reproduces the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10154) CS Dynamic Queues cannot be configured with absolute resources

2020-05-12 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105962#comment-17105962
 ] 

Manikandan R commented on YARN-10154:
-

Looks good to me. Thanks [~prabhujoseph] and [~sunilg]

> CS Dynamic Queues cannot be configured with absolute resources
> --
>
> Key: YARN-10154
> URL: https://issues.apache.org/jira/browse/YARN-10154
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Sunil G
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10154.001.patch, YARN-10154.002.patch, 
> YARN-10154.003.patch, YARN-10154.addendum-001.patch, 
> YARN-10154.addendum-002.patch, YARN-10154.addendum-003.patch, 
> YARN-10154.addendum-004.patch
>
>
> In CS, ManagedParent Queue and its template cannot take absolute resource 
> value like 
> [memory=8192,vcores=8]
>  Thsi Jira is to track and improve the configuration reading module of 
> DynamicQueue to support absolute resource values.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6492) Generate queue metrics for each partition

2020-05-10 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103876#comment-17103876
 ] 

Manikandan R commented on YARN-6492:


YARN-6467 would be available as given below (default partition metrics with 
queue wise breakup ) 

 
{code:java}
"name" : 
"Hadoop:service=ResourceManager,name=PartitionQueueMetrics,partition=default,q0=root,q1=a"
 ...{code}
from this Jira onwards. So, this Jira won't reverse YARN-6467 as such. Admins 
interested in "default" partition metrics can use this JMX o/p for their 
analysis.

At the same time, we would be retaining the below "original queuemetrics 
computation" as given below
{code:java}
 "name" : "Hadoop:service=ResourceManager,name=QueueMetrics,q0=root,q1=a" 
...{code}
This "original queuemetrics computation" doesn't consider partition at all into 
its computation. Purely, from Queue perspective, helps admin to view the 
metrics only from Queue angle which was happening before YARN-6467.

Thoughts?

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, 
> YARN-6492.004.patch, YARN-6492.005.WIP.patch, YARN-6492.006.WIP.patch, 
> YARN-6492.007.WIP.patch, YARN-6492.008.WIP.patch, partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6492) Generate queue metrics for each partition

2020-05-09 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103138#comment-17103138
 ] 

Manikandan R edited comment on YARN-6492 at 5/9/20, 7:07 AM:
-

Thanks [~jhung] and [~epayne] for your support.

I would like to clear some things at high level, especially on 
scope/requirements of this Jira and revisit the background (thought process) on 
how all these related Jira has been created to ensure that we are on same page:

YARN-6467 computes metrics only for default partition. It has been created as 
an interim step towards the major goal of "Providing Metrics at Partition 
Level" to the customers. Major goal is nothing but this JIRA. Since YARN-6467 
is stepping stone for this JIRA, it has been coded in a way that it should 
easily accomodate this Jira changes in a simplistic way (For example,  Just 
removing if(partition is default) check inside each metric computation method 
expected to take care most of the things and no more changes required on collar 
side).

Though YARN-6467 covers some aspects, it had created confusion (for the queue's 
associated with multiple partitions) as well. Original QueueMetrics computation 
behaviour has been changed. Original QueueMetrics computation is nothing but 
the metrics computation only from Queue perspective irrespective of how many 
partitions it has been associated to and nothing to do with Partitions.  It 
started providing metrics only for "default" partition by replacing the 
original behaviour. Another reasons for taking up this path is, this Jira 
expected to go into the trunk immediately after YARN-6467 (as planned :) ) and 
hence there won't be any inconsistency in original queue metrics computation 
behaviour, but it didn't happened.

So, whenever we said "backward compatibility" we referred to this Original 
QueueMetrics computation, not "existing QueueMetrics should still only contain 
metrics for default partition".  In other words, Original QueueMetrics 
computation is nothing but the code/behaviour before YARN-6467.

Now, let me explain scope of this JIRA. We would like to achieve the following 
things:
 # Partition * Queue Metrics:

A partition can be associated with many queues. So we need to break up, hence 
we need Partition * Queue metrics.

Proposed structure is

PartitionMetric (labelX)
     QueueMetric (A)
        metrics
        Usermetrics 
        QueueMetric (A1)
            metrics
            Usermetrics
        QueueMetric (A2)
            metrics
            Usermetrics
     QueueMetric (B)
        metrics
        Usermetrics
 PartitionMetric (labelY)
     QueueMetric (A)
          QueueMetric (A1) 
          QueueMetric (A2)
     QueueMetric (B)
           …

{{QueueMetrics#getPartitionQueueMetrics }} takes care of this registration into 
Metric system and use this object for all metric computations.

Sample JMX o/p is 
{code:java}
"name" : 
"Hadoop:service=ResourceManager,name=PartitionQueueMetrics,partition=x,q0=root,q1=a"
 ...{code}
 

 

2. Partition metrics:

Partition level metrics computation. This can help Admins to analyse the usage 
at Partition level.

Proposed structure is

PartitionMetric (labelX)
     metrics
 PartitionMetric (labelY)
     metrics

{{PartitionQueueMetrics#getPartitionQueueMetrics }} takes care of this 
registration into Metric system and use this object for all metric computations.

Sample JMX o/p is 
{code:java}
"name" : 
"Hadoop:service=ResourceManager,name=PartitionQueueMetrics,partition=x" 
...{code}
In addition to these 2 changes, we would like to retain the Original 
QueueMetrics computation behaviour.

Hope the above explanation explains why the below assert has been changed:
{noformat}
assertEquals(10 * GB, leafQueueA.getMetrics().getAvailableMB());{noformat}
is changed to
{noformat}
assertEquals(22 * GB, leafQueueA.getMetrics().getAvailableMB());{noformat}
This assert has been added as part of YARN-9596 to ensure YARN-6467 works 
correctly. YARN-9767 exhaustive unit test changes explain the difference 
between Partition * Queue Metrics, Partition Metrics and Original QueueMetrics 
very clearly.

What changes this patch should contain?

Yes, there is some confusion as some changes are in YARN-9767. #2 described in 
YARN-9767 should be in this patch. Otherwise, this patch is incomplete from 
feature rollout perspective. [~jhung] said #1 described in YARN-9767 was there 
even before. My understanding, it should be happening after YARN-6467 only. 
Would it be better if we handle that too here? What do you think?

Please share your opinions. Post that, will post the proper patch first and 
then reviews can be taken up on the same.
{quote}In order to be consistent with other API responses like 
{{/ws/v1/cluster/scheduler}}, I think this should just be an empty string. So, 
I would expect the JMX response to look like the following for 
DEFAULT_PARTITION:
{quote}
 Yes 

[jira] [Comment Edited] (YARN-6492) Generate queue metrics for each partition

2020-05-09 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103138#comment-17103138
 ] 

Manikandan R edited comment on YARN-6492 at 5/9/20, 7:04 AM:
-

Thanks [~jhung] and [~epayne] for your support.

I would like to clear some things at high level, especially on 
scope/requirements of this Jira and revisit the background (thought process) on 
how all these related Jira has been created to ensure that we are on same page:

YARN-6467 computes metrics only for default partition. It has been created as 
an interim step towards the major goal of "Providing Metrics at Partition 
Level" to the customers. Major goal is nothing but this JIRA. Since YARN-6467 
is stepping stone for this JIRA, it has been coded in a way that it should 
easily accomodate this Jira changes in a simplistic way (For example,  Just 
removing if(partition is default) check inside each metric computation method 
expected to take care most of the things and no more changes required on collar 
side).

Though YARN-6467 covers some aspects, it had created confusion (for the queue's 
associated with multiple partitions) as well. Original QueueMetrics computation 
behaviour has been changed. Original QueueMetrics computation is nothing but 
the metrics computation only from Queue perspective irrespective of how many 
partitions it has been associated to and nothing to do with Partitions.  It 
started providing metrics only for "default" partition by replacing the 
original behaviour. Another reasons for taking up this path is, this Jira 
expected to go into the trunk immediately after YARN-6467 (as planned :) ) and 
hence there won't be any inconsistency in original queue metrics computation 
behaviour, but it didn't happened.

So, whenever we said "backward compatibility" we referred to this Original 
QueueMetrics computation, not "existing QueueMetrics should still only contain 
metrics for default partition".  In other words, Original QueueMetrics 
computation is nothing but the code/behaviour before YARN-6467.

Now, let me explain scope of this JIRA. We would like to achieve the following 
things:
 # Partition * Queue Metrics:

A partition can be associated with many queues. So we need to break up, hence 
we need Partition * Queue metrics.

Proposed structure is

PartitionMetric (labelX)
     QueueMetric (A)
        metrics
        Usermetrics 
        QueueMetric (A1)
            metrics
            Usermetrics
        QueueMetric (A2)
            metrics
            Usermetrics
     QueueMetric (B)
        metrics
        Usermetrics
 PartitionMetric (labelY)
     QueueMetric (A)
          QueueMetric (A1) 
          QueueMetric (A2)
     QueueMetric (B)
           …

{{QueueMetrics#getPartitionQueueMetrics }} takes care of this registration into 
Metric system and use this object for all metric computations.

Sample JMX o/p is 
{code:java}
"name" : 
"Hadoop:service=ResourceManager,name=PartitionQueueMetrics,partition=x,q0=root,q1=a"
 ...{code}
 

 

2. Partition metrics:

Partition level metrics computation. This can help Admins to analyse the usage 
at Partition level.

Proposed structure is

PartitionMetric (labelX)
     metrics
 PartitionMetric (labelY)
     metrics

{{PartitionQueueMetrics#getPartitionQueueMetrics }} takes care of this 
registration into Metric system and use this object for all metric computations.

Sample JMX o/p is 
{code:java}
"name" : 
"Hadoop:service=ResourceManager,name=PartitionQueueMetrics,partition=x" 
...{code}
In addition to these 2 changes, we would like to retain the Original 
QueueMetrics computation behaviour.

Hope the above explanation explains why the below assert has been changed:
{noformat}
assertEquals(10 * GB, leafQueueA.getMetrics().getAvailableMB());{noformat}
is changed to
{noformat}
assertEquals(22 * GB, leafQueueA.getMetrics().getAvailableMB());{noformat}
This assert has been added as part of YARN-9596 to ensure YARN-6467 works 
correctly. YARN-9767 exhaustive unit test changes explain the difference 
between Partition * Queue Metrics, Partition Metrics and Original QueueMetrics 
very clearly.

What changes this patch should contain?

Yes, there is some confusion as some changes are in YARN-9767. #2 described in 
YARN-9767 should be in this patch. Otherwise, this patch is incomplete from 
feature rollout perspective. [~jhung] said #1 described in YARN-9767 was there 
even before. My understanding, it should be happening after YARN-6467 only. 
Would it be better if we handle that too here? What do you think?

Please share your opinions. Post that, will post the proper patch first and 
then reviews can be taken up on the same.

 \{quote}In order to be consistent with other API responses like 
{{/ws/v1/cluster/scheduler}}, I think this should just be an empty string. So, 
I would expect the JMX response to look like the following for 
DEFAULT_PARTITION:\{quote}

 

[jira] [Comment Edited] (YARN-6492) Generate queue metrics for each partition

2020-05-09 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103138#comment-17103138
 ] 

Manikandan R edited comment on YARN-6492 at 5/9/20, 7:03 AM:
-

Thanks [~jhung] and [~epayne] for your support.

I would like to clear some things at high level, especially on 
scope/requirements of this Jira and revisit the background (thought process) on 
how all these related Jira has been created to ensure that we are on same page:

YARN-6467 computes metrics only for default partition. It has been created as 
an interim step towards the major goal of "Providing Metrics at Partition 
Level" to the customers. Major goal is nothing but this JIRA. Since YARN-6467 
is stepping stone for this JIRA, it has been coded in a way that it should 
easily accomodate this Jira changes in a simplistic way (For example,  Just 
removing if(partition is default) check inside each metric computation method 
expected to take care most of the things and no more changes required on collar 
side).

Though YARN-6467 covers some aspects, it had created confusion (for the queue's 
associated with multiple partitions) as well. Original QueueMetrics computation 
behaviour has been changed. Original QueueMetrics computation is nothing but 
the metrics computation only from Queue perspective irrespective of how many 
partitions it has been associated to and nothing to do with Partitions.  It 
started providing metrics only for "default" partition by replacing the 
original behaviour. Another reasons for taking up this path is, this Jira 
expected to go into the trunk immediately after YARN-6467 (as planned :) ) and 
hence there won't be any inconsistency in original queue metrics computation 
behaviour, but it didn't happened.

So, whenever we said "backward compatibility" we referred to this Original 
QueueMetrics computation, not "existing QueueMetrics should still only contain 
metrics for default partition".  In other words, Original QueueMetrics 
computation is nothing but the code/behaviour before YARN-6467.

Now, let me explain scope of this JIRA. We would like to achieve the following 
things:
 # Partition * Queue Metrics:

A partition can be associated with many queues. So we need to break up, hence 
we need Partition * Queue metrics.

Proposed structure is

PartitionMetric (labelX)
     QueueMetric (A)
        metrics
        Usermetrics 
        QueueMetric (A1)
            metrics
            Usermetrics
        QueueMetric (A2)
            metrics
            Usermetrics
     QueueMetric (B)
        metrics
        Usermetrics
 PartitionMetric (labelY)
     QueueMetric (A)
          QueueMetric (A1) 
          QueueMetric (A2)
     QueueMetric (B)
           …

{{QueueMetrics#getPartitionQueueMetrics }} takes care of this registration into 
Metric system and use this object for all metric computations.

Sample JMX o/p is 
{code:java}
"name" : 
"Hadoop:service=ResourceManager,name=PartitionQueueMetrics,partition=x,q0=root,q1=a"
 ...{code}
 

 

2. Partition metrics:

Partition level metrics computation. This can help Admins to analyse the usage 
at Partition level.

Proposed structure is

PartitionMetric (labelX)
    metrics
 PartitionMetric (labelY)
    metrics


 {{PartitionQueueMetrics#getPartitionQueueMetrics }} takes care of this 
registration into Metric system and use this object for all metric computations.

Sample JMX o/p is 
{code:java}
"name" : 
"Hadoop:service=ResourceManager,name=PartitionQueueMetrics,partition=x" 
...{code}
In addition to these 2 changes, we would like to retain the Original 
QueueMetrics computation behaviour.

Hope the above explanation explains why the below assert has been changed:
{noformat}
assertEquals(10 * GB, leafQueueA.getMetrics().getAvailableMB());{noformat}
is changed to
{noformat}
assertEquals(22 * GB, leafQueueA.getMetrics().getAvailableMB());{noformat}
This assert has been added as part of YARN-9596 to ensure YARN-6467 works 
correctly. YARN-9767 exhaustive unit test changes explain the difference 
between Partition * Queue Metrics, Partition Metrics and Original QueueMetrics 
very clearly.

What changes this patch should contain?

Yes, there is some confusion as some changes are in YARN-9767. #2 described in 
YARN-9767 should be in this patch. Otherwise, this patch is incomplete from 
feature rollout perspective. [~jhung] said #1 described in YARN-9767 was there 
even before. My understanding, it should be happening after YARN-6467 only. 
Would it be better if we handle that too here? What do you think?

Please share your opinions. Post that, will post the proper patch first and 
then reviews can be taken up on the same.

 

In order to be consistent with other API responses like 
{{/ws/v1/cluster/scheduler}}, I think this should just be an empty string. So, 
I would expect the JMX response to look like the following for 
DEFAULT_PARTITION:\{quote}

 

Yes 

[jira] [Comment Edited] (YARN-6492) Generate queue metrics for each partition

2020-05-09 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103138#comment-17103138
 ] 

Manikandan R edited comment on YARN-6492 at 5/9/20, 7:02 AM:
-

Thanks [~jhung] and [~epayne] for your support.

I would like to clear some things at high level, especially on 
scope/requirements of this Jira and revisit the background (thought process) on 
how all these related Jira has been created to ensure that we are on same page:

YARN-6467 computes metrics only for default partition. It has been created as 
an interim step towards the major goal of "Providing Metrics at Partition 
Level" to the customers. Major goal is nothing but this JIRA. Since YARN-6467 
is stepping stone for this JIRA, it has been coded in a way that it should 
easily accomodate this Jira changes in a simplistic way (For example,  Just 
removing if(partition is default) check inside each metric computation method 
expected to take care most of the things and no more changes required on collar 
side).

Though YARN-6467 covers some aspects, it had created confusion (for the queue's 
associated with multiple partitions) as well. Original QueueMetrics computation 
behaviour has been changed. Original QueueMetrics computation is nothing but 
the metrics computation only from Queue perspective irrespective of how many 
partitions it has been associated to and nothing to do with Partitions.  It 
started providing metrics only for "default" partition by replacing the 
original behaviour. Another reasons for taking up this path is, this Jira 
expected to go into the trunk immediately after YARN-6467 (as planned :) ) and 
hence there won't be any inconsistency in original queue metrics computation 
behaviour, but it didn't happened.

So, whenever we said "backward compatibility" we referred to this Original 
QueueMetrics computation, not "existing QueueMetrics should still only contain 
metrics for default partition".  In other words, Original QueueMetrics 
computation is nothing but the code/behaviour before YARN-6467.

Now, let me explain scope of this JIRA. We would like to achieve the following 
things:
 # Partition * Queue Metrics:

A partition can be associated with many queues. So we need to break up, hence 
we need Partition * Queue metrics.

Proposed structure is


 PartitionMetric (labelX)
    QueueMetric (A)
       metrics
       Usermetrics 
       QueueMetric (A1)
           metrics
           Usermetrics
       QueueMetric (A2)
           metrics
           Usermetrics
    QueueMetric (B)
       metrics
       Usermetrics
 PartitionMetric (labelY)
    QueueMetric (A)
         QueueMetric (A1) 
         QueueMetric (A2)
    QueueMetric (B)
          …

{{QueueMetrics#getPartitionQueueMetrics }} takes care of this registration into 
Metric system and use this object for all metric computations.

Sample JMX o/p is 
{code:java}
"name" : 
"Hadoop:service=ResourceManager,name=PartitionQueueMetrics,partition=x,q0=root,q1=a"
 ...{code}
 

 

2. Partition metrics:

Partition level metrics computation. This can help Admins to analyse the usage 
at Partition level.

Proposed structure is
 PartitionMetric (labelX)
 metrics
 PartitionMetric (labelY)
 metrics
 {{PartitionQueueMetrics#getPartitionQueueMetrics }} takes care of this 
registration into Metric system and use this object for all metric computations.

Sample JMX o/p is 
{code:java}
"name" : 
"Hadoop:service=ResourceManager,name=PartitionQueueMetrics,partition=x" 
...{code}
In addition to these 2 changes, we would like to retain the Original 
QueueMetrics computation behaviour.

Hope the above explanation explains why the below assert has been changed:
{noformat}
assertEquals(10 * GB, leafQueueA.getMetrics().getAvailableMB());{noformat}
is changed to
{noformat}
assertEquals(22 * GB, leafQueueA.getMetrics().getAvailableMB());{noformat}
This assert has been added as part of YARN-9596 to ensure YARN-6467 works 
correctly. YARN-9767 exhaustive unit test changes explain the difference 
between Partition * Queue Metrics, Partition Metrics and Original QueueMetrics 
very clearly.

What changes this patch should contain?

Yes, there is some confusion as some changes are in YARN-9767. #2 described in 
YARN-9767 should be in this patch. Otherwise, this patch is incomplete from 
feature rollout perspective. [~jhung] said #1 described in YARN-9767 was there 
even before. My understanding, it should be happening after YARN-6467 only. 
Would it be better if we handle that too here? What do you think?

Please share your opinions. Post that, will post the proper patch first and 
then reviews can be taken up on the same.

 

In order to be consistent with other API responses like 
{{/ws/v1/cluster/scheduler}}, I think this should just be an empty string. So, 
I would expect the JMX response to look like the following for 
DEFAULT_PARTITION:\{quote}

 

Yes [~epayne], we should 

[jira] [Commented] (YARN-6492) Generate queue metrics for each partition

2020-05-09 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103138#comment-17103138
 ] 

Manikandan R commented on YARN-6492:


Thanks [~jhung] and [~epayne] for your support.

I would like to clear some things at high level, especially on 
scope/requirements of this Jira and revisit the background (thought process) on 
how all these related Jira has been created to ensure that we are on same page:

YARN-6467 computes metrics only for default partition. It has been created as 
an interim step towards the major goal of "Providing Metrics at Partition 
Level" to the customers. Major goal is nothing but this JIRA. Since YARN-6467 
is stepping stone for this JIRA, it has been coded in a way that it should 
easily accomodate this Jira changes in a simplistic way (For example,  Just 
removing if(partition is default) check inside each metric computation method 
expected to take care most of the things and no more changes required on collar 
side).

Though YARN-6467 covers some aspects, it had created confusion (for the queue's 
associated with multiple partitions) as well. Original QueueMetrics computation 
behaviour has been changed. Original QueueMetrics computation is nothing but 
the metrics computation only from Queue perspective irrespective of how many 
partitions it has been associated to and nothing to do with Partitions.  It 
started providing metrics only for "default" partition by replacing the 
original behaviour. Another reasons for taking up this path is, this Jira 
expected to go into the trunk immediately after YARN-6467 (as planned :) ) and 
hence there won't be any inconsistency in original queue metrics computation 
behaviour, but it didn't happened.

So, whenever we said "backward compatibility" we referred to this Original 
QueueMetrics computation, not "existing QueueMetrics should still only contain 
metrics for default partition".  In other words, Original QueueMetrics 
computation is nothing but the code/behaviour before YARN-6467.

Now, let me explain scope of this JIRA. We would like to achieve the following 
things:
 # Partition * Queue Metrics:

A partition can be associated with many queues. So we need to break up, hence 
we need Partition * Queue metrics.

Proposed structure is
PartitionMetric (labelX)
QueueMetric (A)
metrics
Usermetrics 
 QueueMetric (A1)
 metrics
 Usermetrics
  QueueMetric (A2)
 metrics
 Usermetrics
 QueueMetric (B)
metrics
Usermetrics
PartitionMetric (labelY)
QueueMetric (A)
  QueueMetric (A1)
  QueueMetric (A2)
 QueueMetric (B)
…
{{QueueMetrics#getPartitionQueueMetrics }} takes care of this registration into 
Metric system and use this object for all metric computations.

Sample JMX o/p is 
{code:java}
"name" : 
"Hadoop:service=ResourceManager,name=PartitionQueueMetrics,partition=x,q0=root,q1=a"
 ...{code}
 

 

2. Partition metrics:

Partition level metrics computation. This can help Admins to analyse the usage 
at Partition level.

Proposed structure is
PartitionMetric (labelX)
metrics
PartitionMetric (labelY)
metrics
{{PartitionQueueMetrics#getPartitionQueueMetrics }} takes care of this 
registration into Metric system and use this object for all metric computations.

Sample JMX o/p is 
{code:java}
"name" : 
"Hadoop:service=ResourceManager,name=PartitionQueueMetrics,partition=x" 
...{code}
In addition to these 2 changes, we would like to retain the Original 
QueueMetrics computation behaviour.

Hope the above explanation explains why the below assert has been changed:
{noformat}
assertEquals(10 * GB, leafQueueA.getMetrics().getAvailableMB());{noformat}
is changed to
{noformat}
assertEquals(22 * GB, leafQueueA.getMetrics().getAvailableMB());{noformat}
This assert has been added as part of YARN-9596 to ensure YARN-6467 works 
correctly. YARN-9767 exhaustive unit test changes explain the difference 
between Partition * Queue Metrics, Partition Metrics and Original QueueMetrics 
very clearly.

What changes this patch should contain?

Yes, there is some confusion as some changes are in YARN-9767. #2 described in 
YARN-9767 should be in this patch. Otherwise, this patch is incomplete from 
feature rollout perspective. [~jhung] said #1 described in YARN-9767 was there 
even before. My understanding, it should be happening after YARN-6467 only. 
Would it be better if we handle that too here? What do you think?

Please share your opinions. Post that, will post the proper patch first and 
then reviews can be taken up on the same.

{quote}In order 

[jira] [Updated] (YARN-6492) Generate queue metrics for each partition

2020-05-07 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-6492:
---
Attachment: YARN-6492.008.WIP.patch

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, 
> YARN-6492.004.patch, YARN-6492.005.WIP.patch, YARN-6492.006.WIP.patch, 
> YARN-6492.007.WIP.patch, YARN-6492.008.WIP.patch, partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6492) Generate queue metrics for each partition

2020-05-07 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101814#comment-17101814
 ] 

Manikandan R commented on YARN-6492:


1:

We had two different method names till .006.patch. Based on our later 
discussions, simplified the code in .007.patch based on the thinking 
getPartitionQueueMetrics in QueueMetrics meant for Partition * Queue metric 
computation, whereas,  getPartitionQueueMetrics in PartitionQueueMetrics meant 
for Partition metric computation. Since PartitionQueueMetrics is an extension 
of QueueMetrics, we had same method names and functionality has been overridden 
for each of its need. However, comments are not clear. Modified the comments in 
.008.patch.

2,3:

This Jira  and YARN-9767 has to go in same time. Otherwise, as a feature, it 
won't be complete as such. It is just that we are handling the issues 
separately in YARN-9767 for better code review. YARN-9767 answers most of the 
concerns raised in #2, #3. For example, YARN-9767 contains very exhaustive test 
asserts which can clear up lot of confusions and enhance our understanding.

If it is confusing to have two different patches, we will need to decide on how 
to take this further. If you think, both Jira should be taken up in same patch 
for code completeness perspective, then let's do that. We can follow any one of 
the path which is convenient for us. Thoughts?

4:

Yes, We can do.

 

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, 
> YARN-6492.004.patch, YARN-6492.005.WIP.patch, YARN-6492.006.WIP.patch, 
> YARN-6492.007.WIP.patch, partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10154) CS Dynamic Queues cannot be configured with absolute resources

2020-04-24 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091697#comment-17091697
 ] 

Manikandan R commented on YARN-10154:
-

Thanks [~prabhujoseph] and [~sunilg] for testing and fix.

> CS Dynamic Queues cannot be configured with absolute resources
> --
>
> Key: YARN-10154
> URL: https://issues.apache.org/jira/browse/YARN-10154
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Sunil G
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10154.001.patch, YARN-10154.002.patch, 
> YARN-10154.003.patch, YARN-10154.addendum-001.patch
>
>
> In CS, ManagedParent Queue and its template cannot take absolute resource 
> value like 
> [memory=8192,vcores=8]
>  Thsi Jira is to track and improve the configuration reading module of 
> DynamicQueue to support absolute resource values.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10049) FIFOOrderingPolicy Improvements

2020-04-24 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091692#comment-17091692
 ] 

Manikandan R commented on YARN-10049:
-

Can we trigger?

> FIFOOrderingPolicy Improvements
> ---
>
> Key: YARN-10049
> URL: https://issues.apache.org/jira/browse/YARN-10049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10049.001.patch, YARN-10049.002.patch, 
> YARN-10049.003.patch
>
>
> FIFOPolicy of FS does the following comparisons in addition to app priority 
> comparison:
> 1. Using Start time
> 2. Using Name
> Scope of this jira is to achieve the same comparisons in FIFOOrderingPolicy 
> of CS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10239) Capacity is zero for auto created leaf queues after leaf-queue-template.capacity has been updated

2020-04-24 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091690#comment-17091690
 ] 

Manikandan R commented on YARN-10239:
-

Can we close this as issues has been taken care in 
https://issues.apache.org/jira/browse/YARN-10154?focusedCommentId=17089904=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17089904?

> Capacity is zero for auto created leaf queues after 
> leaf-queue-template.capacity has been updated
> -
>
> Key: YARN-10239
> URL: https://issues.apache.org/jira/browse/YARN-10239
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akhil PB
>Assignee: Manikandan R
>Priority: Major
>
> In the scheduler response, the capacity of the auto created leaf queue became 
> zero after leaf-queue-template.capacity of managed parent queue has been 
> updated.
> cc: [~sunilg] [~wangda] [~prabhujoseph]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10239) Capacity is zero for auto created leaf queues after leaf-queue-template.capacity has been updated

2020-04-20 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R reassigned YARN-10239:
---

Assignee: Manikandan R

> Capacity is zero for auto created leaf queues after 
> leaf-queue-template.capacity has been updated
> -
>
> Key: YARN-10239
> URL: https://issues.apache.org/jira/browse/YARN-10239
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akhil PB
>Assignee: Manikandan R
>Priority: Major
>
> In the scheduler response, the capacity of the auto created leaf queue became 
> zero after leaf-queue-template.capacity of managed parent queue has been 
> updated.
> cc: [~sunilg] [~wangda] [~prabhujoseph]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10049) FIFOOrderingPolicy Improvements

2020-04-16 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084970#comment-17084970
 ] 

Manikandan R commented on YARN-10049:
-

[~pbacsko] [~snemeth] I don't think Junit failure is related to this patch. 
However, It is better to trigger jenkins as earlier results are throwing 404 
error. Can we take it forward?

> FIFOOrderingPolicy Improvements
> ---
>
> Key: YARN-10049
> URL: https://issues.apache.org/jira/browse/YARN-10049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10049.001.patch, YARN-10049.002.patch, 
> YARN-10049.003.patch
>
>
> FIFOPolicy of FS does the following comparisons in addition to app priority 
> comparison:
> 1. Using Start time
> 2. Using Name
> Scope of this jira is to achieve the same comparisons in FIFOOrderingPolicy 
> of CS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10154) CS Dynamic Queues cannot be configured with absolute resources

2020-04-11 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-10154:

Attachment: YARN-10154.003.patch

> CS Dynamic Queues cannot be configured with absolute resources
> --
>
> Key: YARN-10154
> URL: https://issues.apache.org/jira/browse/YARN-10154
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Sunil G
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10154.001.patch, YARN-10154.002.patch, 
> YARN-10154.003.patch
>
>
> In CS, ManagedParent Queue and its template cannot take absolute resource 
> value like 
> [memory=8192,vcores=8]
>  Thsi Jira is to track and improve the configuration reading module of 
> DynamicQueue to support absolute resource values.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10154) CS Dynamic Queues cannot be configured with absolute resources

2020-04-11 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081378#comment-17081378
 ] 

Manikandan R commented on YARN-10154:
-

Sorry for the delay. 

Attaching .003.patch covering suggested above test cases.

> CS Dynamic Queues cannot be configured with absolute resources
> --
>
> Key: YARN-10154
> URL: https://issues.apache.org/jira/browse/YARN-10154
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Sunil G
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10154.001.patch, YARN-10154.002.patch
>
>
> In CS, ManagedParent Queue and its template cannot take absolute resource 
> value like 
> [memory=8192,vcores=8]
>  Thsi Jira is to track and improve the configuration reading module of 
> DynamicQueue to support absolute resource values.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10049) FIFOOrderingPolicy Improvements

2020-03-28 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-10049:

Attachment: YARN-10049.003.patch

> FIFOOrderingPolicy Improvements
> ---
>
> Key: YARN-10049
> URL: https://issues.apache.org/jira/browse/YARN-10049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10049.001.patch, YARN-10049.002.patch, 
> YARN-10049.003.patch
>
>
> FIFOPolicy of FS does the following comparisons in addition to app priority 
> comparison:
> 1. Using Start time
> 2. Using Name
> Scope of this jira is to achieve the same comparisons in FIFOOrderingPolicy 
> of CS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10049) FIFOOrderingPolicy Improvements

2020-03-28 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17069251#comment-17069251
 ] 

Manikandan R commented on YARN-10049:
-

[~pbacsko] Thanks. Attached .003.patch.

> FIFOOrderingPolicy Improvements
> ---
>
> Key: YARN-10049
> URL: https://issues.apache.org/jira/browse/YARN-10049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10049.001.patch, YARN-10049.002.patch
>
>
> FIFOPolicy of FS does the following comparisons in addition to app priority 
> comparison:
> 1. Using Start time
> 2. Using Name
> Scope of this jira is to achieve the same comparisons in FIFOOrderingPolicy 
> of CS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10049) FIFOOrderingPolicy Improvements

2020-03-26 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067846#comment-17067846
 ] 

Manikandan R commented on YARN-10049:
-

Rebasing patch with minor changes in junits.

[~pbacsko] [~snemeth] Can you please take a look?

> FIFOOrderingPolicy Improvements
> ---
>
> Key: YARN-10049
> URL: https://issues.apache.org/jira/browse/YARN-10049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10049.001.patch, YARN-10049.002.patch
>
>
> FIFOPolicy of FS does the following comparisons in addition to app priority 
> comparison:
> 1. Using Start time
> 2. Using Name
> Scope of this jira is to achieve the same comparisons in FIFOOrderingPolicy 
> of CS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10049) FIFOOrderingPolicy Improvements

2020-03-26 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-10049:

Attachment: YARN-10049.002.patch

> FIFOOrderingPolicy Improvements
> ---
>
> Key: YARN-10049
> URL: https://issues.apache.org/jira/browse/YARN-10049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10049.001.patch, YARN-10049.002.patch
>
>
> FIFOPolicy of FS does the following comparisons in addition to app priority 
> comparison:
> 1. Using Start time
> 2. Using Name
> Scope of this jira is to achieve the same comparisons in FIFOOrderingPolicy 
> of CS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10043) FairOrderingPolicy Improvements

2020-03-26 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067836#comment-17067836
 ] 

Manikandan R commented on YARN-10043:
-

[~snemeth] [~pbacsko] Thanks for your reviews and commit.

> FairOrderingPolicy Improvements
> ---
>
> Key: YARN-10043
> URL: https://issues.apache.org/jira/browse/YARN-10043
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-10043.001.patch, YARN-10043.002.patch, 
> YARN-10043.003.patch, YARN-10043.004.patch
>
>
> FairOrderingPolicy can be improved by using some of the approaches (only 
> relevant) implemented in FairSharePolicy of FS. This improvement has 
> significance in FS to CS migration context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10043) FairOrderingPolicy Improvements

2020-03-25 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066864#comment-17066864
 ] 

Manikandan R commented on YARN-10043:
-

[~snemeth] I am waiting on this. Can we please take it forward?

> FairOrderingPolicy Improvements
> ---
>
> Key: YARN-10043
> URL: https://issues.apache.org/jira/browse/YARN-10043
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10043.001.patch, YARN-10043.002.patch, 
> YARN-10043.003.patch, YARN-10043.004.patch
>
>
> FairOrderingPolicy can be improved by using some of the approaches (only 
> relevant) implemented in FairSharePolicy of FS. This improvement has 
> significance in FS to CS migration context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10154) CS Dynamic Queues cannot be configured with absolute resources

2020-03-25 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066859#comment-17066859
 ] 

Manikandan R commented on YARN-10154:
-

[~sunilg] Had a chance to review the patch? Thank you.

> CS Dynamic Queues cannot be configured with absolute resources
> --
>
> Key: YARN-10154
> URL: https://issues.apache.org/jira/browse/YARN-10154
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Sunil G
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10154.001.patch, YARN-10154.002.patch
>
>
> In CS, ManagedParent Queue and its template cannot take absolute resource 
> value like 
> [memory=8192,vcores=8]
>  Thsi Jira is to track and improve the configuration reading module of 
> DynamicQueue to support absolute resource values.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10198) [managedParent].%primary_group mapping rule doesn't work after YARN-9868

2020-03-20 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17063380#comment-17063380
 ] 

Manikandan R commented on YARN-10198:
-

Should we do NULL check for {{parent}} after Line No. 163 to address mappings 
like u:%user:%primary_group ?

Other than this, Looks good to me. Thanks [~pbacsko] and [~prabhujoseph].

> [managedParent].%primary_group mapping rule doesn't work after YARN-9868
> 
>
> Key: YARN-10198
> URL: https://issues.apache.org/jira/browse/YARN-10198
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10198-001.patch, YARN-10198-002.patch, 
> YARN-10198-003.patch
>
>
> YARN-9868 introduced an unnecessary check if we have the following placement 
> rule:
> [managedParentQueue].%primary_group
> Here, {{%primary_group}} is expected to be created if it doesn't exist. 
> However, there is this validation code which is not necessary:
> {noformat}
>   } else if (mapping.getQueue().equals(PRIMARY_GROUP_MAPPING)) {
> if (this.queueManager
> .getQueue(groups.getGroups(user).get(0)) != null) {
>   return getPlacementContext(mapping,
>   groups.getGroups(user).get(0));
> } else {
>   return null;
> }
> {noformat}
> We should revert this part to the original version:
> {noformat}
>   } else if (mapping.queue.equals(PRIMARY_GROUP_MAPPING)) {
> return getPlacementContext(mapping, 
> groups.getGroups(user).get(0));
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10198) [managedParent].%primary_group mapping rule doesn't work after YARN-9868

2020-03-19 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17062750#comment-17062750
 ] 

Manikandan R commented on YARN-10198:
-

[~pbacsko] Thanks for extending the patch.

1. Yes, Unlike other placement rules in FS, 
"SecondaryGroupExistingPlacementRule" expects Queue exist. Except 
"SecondaryGroupExistingPlacementRule", all other rules (even 
PrimaryGroupExistingPlacementRule) depends on "FSPlacementRule#createQueue" 
flag as well in conjunction with FSPlacementRule#configuredQueue.

Modified patch is in line with above FS flow.

Line No.161 can make use of getPrimaryGroup() method.


> [managedParent].%primary_group mapping rule doesn't work after YARN-9868
> 
>
> Key: YARN-10198
> URL: https://issues.apache.org/jira/browse/YARN-10198
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10198-001.patch, YARN-10198-002.patch
>
>
> YARN-9868 introduced an unnecessary check if we have the following placement 
> rule:
> [managedParentQueue].%primary_group
> Here, {{%primary_group}} is expected to be created if it doesn't exist. 
> However, there is this validation code which is not necessary:
> {noformat}
>   } else if (mapping.getQueue().equals(PRIMARY_GROUP_MAPPING)) {
> if (this.queueManager
> .getQueue(groups.getGroups(user).get(0)) != null) {
>   return getPlacementContext(mapping,
>   groups.getGroups(user).get(0));
> } else {
>   return null;
> }
> {noformat}
> We should revert this part to the original version:
> {noformat}
>   } else if (mapping.queue.equals(PRIMARY_GROUP_MAPPING)) {
> return getPlacementContext(mapping, 
> groups.getGroups(user).get(0));
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10200) Add number of containers to RMAppManager summary

2020-03-17 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061053#comment-17061053
 ] 

Manikandan R commented on YARN-10200:
-

Should we sum up all attempt's totalallocatedcontainers (from 
RMAppAttemptMetrics) and add it to this summary?

> Add number of containers to RMAppManager summary
> 
>
> Key: YARN-10200
> URL: https://issues.apache.org/jira/browse/YARN-10200
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Priority: Major
>
> It would be useful to persist this so we can track containers processed by RM.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10198) [managedParent].%primary_group mapping rule doesn't work after YARN-9868

2020-03-17 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061033#comment-17061033
 ] 

Manikandan R commented on YARN-10198:
-

[~pbacsko] Thanks for the patch.

1. Are we planning to have this check only for regular parent queue in separate 
jira?
2. Are we going to handle [managedParentQueue].%secondary_group in separate 
jira?


> [managedParent].%primary_group mapping rule doesn't work after YARN-9868
> 
>
> Key: YARN-10198
> URL: https://issues.apache.org/jira/browse/YARN-10198
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-10198-001.patch
>
>
> YARN-9868 introduced an unnecessary check if we have the following placement 
> rule:
> [managedParentQueue].%primary_group
> Here, {{%primary_group}} is expected to be created if it doesn't exist. 
> However, there is this validation code which is not necessary:
> {noformat}
>   } else if (mapping.getQueue().equals(PRIMARY_GROUP_MAPPING)) {
> if (this.queueManager
> .getQueue(groups.getGroups(user).get(0)) != null) {
>   return getPlacementContext(mapping,
>   groups.getGroups(user).get(0));
> } else {
>   return null;
> }
> {noformat}
> We should revert this part to the original version:
> {noformat}
>   } else if (mapping.queue.equals(PRIMARY_GROUP_MAPPING)) {
> return getPlacementContext(mapping, 
> groups.getGroups(user).get(0));
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10154) CS Dynamic Queues cannot be configured with absolute resources

2020-03-14 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059200#comment-17059200
 ] 

Manikandan R edited comment on YARN-10154 at 3/14/20, 8:34 AM:
---

Yes, patch takes care of min and max in above discussed format only.

Updated patch for doc changes. 


was (Author: maniraj...@gmail.com):
Yes, patch takes care of min and max in above discussed format only.

Once [~sunilg] reviewed the patch, will update doc once for all with review 
comments (if any).

> CS Dynamic Queues cannot be configured with absolute resources
> --
>
> Key: YARN-10154
> URL: https://issues.apache.org/jira/browse/YARN-10154
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Sunil G
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10154.001.patch, YARN-10154.002.patch
>
>
> In CS, ManagedParent Queue and its template cannot take absolute resource 
> value like 
> [memory=8192,vcores=8]
>  Thsi Jira is to track and improve the configuration reading module of 
> DynamicQueue to support absolute resource values.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10154) CS Dynamic Queues cannot be configured with absolute resources

2020-03-14 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-10154:

Attachment: YARN-10154.002.patch

> CS Dynamic Queues cannot be configured with absolute resources
> --
>
> Key: YARN-10154
> URL: https://issues.apache.org/jira/browse/YARN-10154
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Sunil G
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10154.001.patch, YARN-10154.002.patch
>
>
> In CS, ManagedParent Queue and its template cannot take absolute resource 
> value like 
> [memory=8192,vcores=8]
>  Thsi Jira is to track and improve the configuration reading module of 
> DynamicQueue to support absolute resource values.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10154) CS Dynamic Queues cannot be configured with absolute resources

2020-03-14 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059200#comment-17059200
 ] 

Manikandan R commented on YARN-10154:
-

Yes, patch takes care of min and max in above discussed format only.

Once [~sunilg] reviewed the patch, will update doc once for all with review 
comments (if any).

> CS Dynamic Queues cannot be configured with absolute resources
> --
>
> Key: YARN-10154
> URL: https://issues.apache.org/jira/browse/YARN-10154
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Sunil G
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10154.001.patch
>
>
> In CS, ManagedParent Queue and its template cannot take absolute resource 
> value like 
> [memory=8192,vcores=8]
>  Thsi Jira is to track and improve the configuration reading module of 
> DynamicQueue to support absolute resource values.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10198) [managedParent].%primary_group mapping rule doesn't work after YARN-9868

2020-03-14 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059196#comment-17059196
 ] 

Manikandan R commented on YARN-10198:
-

Above mentioned NULL check has been added to be in line with YARN-9840 changes. 
With above check in place, client side thinks there is no queue mapping rule as 
ApplicationPlacementContext is NULL. Without above checks, 
ApplicationPlacementContext is not null but queue name is NULL. In both cases, 
"default" queue would be used.

Should we do this checks only for regular parent queue?

> [managedParent].%primary_group mapping rule doesn't work after YARN-9868
> 
>
> Key: YARN-10198
> URL: https://issues.apache.org/jira/browse/YARN-10198
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> YARN-9868 introduced an unnecessary check if we have the following placement 
> rule:
> [managedParentQueue].%primary_group
> Here, {{%primary_group}} is expected to be created if it doesn't exist. 
> However, there is this validation code which is not necessary:
> {noformat}
>   } else if (mapping.getQueue().equals(PRIMARY_GROUP_MAPPING)) {
> if (this.queueManager
> .getQueue(groups.getGroups(user).get(0)) != null) {
>   return getPlacementContext(mapping,
>   groups.getGroups(user).get(0));
> } else {
>   return null;
> }
> {noformat}
> We should revert this part to the original version:
> {noformat}
>   } else if (mapping.queue.equals(PRIMARY_GROUP_MAPPING)) {
> return getPlacementContext(mapping, 
> groups.getGroups(user).get(0));
> }
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10154) CS Dynamic Queues cannot be configured with absolute resources

2020-03-09 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055304#comment-17055304
 ] 

Manikandan R commented on YARN-10154:
-

[~sunilg] Can you please do a quick check on this?

> CS Dynamic Queues cannot be configured with absolute resources
> --
>
> Key: YARN-10154
> URL: https://issues.apache.org/jira/browse/YARN-10154
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Sunil G
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10154.001.patch
>
>
> In CS, ManagedParent Queue and its template cannot take absolute resource 
> value like 
> [memory=8192,vcores=8]
>  Thsi Jira is to track and improve the configuration reading module of 
> DynamicQueue to support absolute resource values.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10043) FairOrderingPolicy Improvements

2020-03-09 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055301#comment-17055301
 ] 

Manikandan R commented on YARN-10043:
-

[~snemeth] Can you please check?

> FairOrderingPolicy Improvements
> ---
>
> Key: YARN-10043
> URL: https://issues.apache.org/jira/browse/YARN-10043
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10043.001.patch, YARN-10043.002.patch, 
> YARN-10043.003.patch, YARN-10043.004.patch
>
>
> FairOrderingPolicy can be improved by using some of the approaches (only 
> relevant) implemented in FairSharePolicy of FS. This improvement has 
> significance in FS to CS migration context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9831) NMTokenSecretManagerInRM#createNMToken blocks ApplicationMasterService allocate flow

2020-03-01 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048639#comment-17048639
 ] 

Manikandan R commented on YARN-9831:


In latest patch, I think {{!nodeSet.contains(container.getNodeId())}} block and 
its {{nodeSet.add(container.getNodeId());}} should be in critical section. 
Otherwise, it may lead to inconsistent data while adding (WRITE operation) new 
tokens into set as read op is followed by write op and lock is not write 
anymore.

In general, I think we can explore if computeIfPresent / computeIfAbsent / 
Compute methods of concurrent hash map can be used to perform operations like 
add, remove etc atomically in this context to avoid write locks.

> NMTokenSecretManagerInRM#createNMToken blocks ApplicationMasterService 
> allocate flow
> 
>
> Key: YARN-9831
> URL: https://issues.apache.org/jira/browse/YARN-9831
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-9831.001.patch, YARN-9831.002.patch
>
>
> Currently attempt's NMToken cannot be generated independently. 
> Each attempts allocate flow blocks each other. We should improve the same



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9831) NMTokenSecretManagerInRM#createNMToken blocks ApplicationMasterService allocate flow

2020-02-27 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17046840#comment-17046840
 ] 

Manikandan R commented on YARN-9831:


 [~BilwaST] Thanks for the patch.

Had a quick glance. Locks has been changed from "write" to "read" in 
{{createAndGetNMToken}} method assuming there shouldn't be any issues while 
adding Node Id's into Set ( nodeSet.add(container.getNodeId()); ) because it 
has been created ConcurrentHashMap.newKeySet(). If this is true, Should we 
apply the same principle to other places where in Node gets removed ( 
removeNodeKey() ) ?

> NMTokenSecretManagerInRM#createNMToken blocks ApplicationMasterService 
> allocate flow
> 
>
> Key: YARN-9831
> URL: https://issues.apache.org/jira/browse/YARN-9831
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-9831.001.patch, YARN-9831.002.patch
>
>
> Currently attempt's NMToken cannot be generated independently. 
> Each attempts allocate flow blocks each other. We should improve the same



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10155) TestDelegationTokenRenewer.testTokenThreadTimeout fails in trunk

2020-02-27 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17046758#comment-17046758
 ] 

Manikandan R commented on YARN-10155:
-

Ok. However, we can get this patch in as it fixes the exception and reduces the 
waiting time. Post that, we can see the behaviour in Jenkins for some time and 
then close the Jira based on the results. Thoughts?

> TestDelegationTokenRenewer.testTokenThreadTimeout fails in trunk
> 
>
> Key: YARN-10155
> URL: https://issues.apache.org/jira/browse/YARN-10155
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10155.001.patch, testTokenThreadTimeout.txt, 
> testTokenThreadTimeout_with_patch.txt
>
>
> The TestDelegationTokenRenewer.testTokenThreadTimeout test committed in 
> YARN-9768 often fails with timeout.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10043) FairOrderingPolicy Improvements

2020-02-26 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045710#comment-17045710
 ] 

Manikandan R commented on YARN-10043:
-

[~snemeth] Can you please take a final look to commit the code?

> FairOrderingPolicy Improvements
> ---
>
> Key: YARN-10043
> URL: https://issues.apache.org/jira/browse/YARN-10043
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10043.001.patch, YARN-10043.002.patch, 
> YARN-10043.003.patch, YARN-10043.004.patch
>
>
> FairOrderingPolicy can be improved by using some of the approaches (only 
> relevant) implemented in FairSharePolicy of FS. This improvement has 
> significance in FS to CS migration context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10155) TestDelegationTokenRenewer.testTokenThreadTimeout fails in trunk

2020-02-26 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045709#comment-17045709
 ] 

Manikandan R commented on YARN-10155:
-

[~adam.antal] Thanks for your effort. I've been running all my tests through 
mvn cli with prior clean and build. Can you please try the alternative step?

> TestDelegationTokenRenewer.testTokenThreadTimeout fails in trunk
> 
>
> Key: YARN-10155
> URL: https://issues.apache.org/jira/browse/YARN-10155
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10155.001.patch, testTokenThreadTimeout.txt, 
> testTokenThreadTimeout_with_patch.txt
>
>
> The TestDelegationTokenRenewer.testTokenThreadTimeout test committed in 
> YARN-9768 often fails with timeout.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9893) Capacity scheduler: enhance leaf-queue-template capacity / maximum-capacity setting

2020-02-26 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045707#comment-17045707
 ] 

Manikandan R commented on YARN-9893:


[~pbacsko]  Part of this requirement would be addressed in YARN-10154. We can 
handle support for two percentage values in separate jira.

Is that ok?

> Capacity scheduler: enhance leaf-queue-template capacity / maximum-capacity 
> setting
> ---
>
> Key: YARN-9893
> URL: https://issues.apache.org/jira/browse/YARN-9893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
>
> Capacity Scheduler does not support two percentage values for leaf queue 
> capacity and maximum-capacity settings. So, you can't do something like this:
> {{yarn.scheduler.capacity.root.users.john.leaf-queue-template.capacity=memory-mb=50.0%,
>  vcores=50.0%}}
> On top of that, it's not even possible to define absolute resources:
> {{yarn.scheduler.capacity.root.users.john.leaf-queue-template.capacity=memory-mb=16384,
>  vcores=8}}
> Only a single percentage value is accepted.
> This makes it nearly impossible to properly convert a similar setting from 
> Fair Scheduler, where such a configuration is valid and accepted 
> ({{}}).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10154) CS Dynamic Queues cannot be configured with absolute resources

2020-02-26 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R reassigned YARN-10154:
---

Assignee: Manikandan R  (was: Sunil G)

> CS Dynamic Queues cannot be configured with absolute resources
> --
>
> Key: YARN-10154
> URL: https://issues.apache.org/jira/browse/YARN-10154
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Sunil G
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10154.001.patch
>
>
> In CS, ManagedParent Queue and its template cannot take absolute resource 
> value like 
> [memory=8192,vcores=8]
>  Thsi Jira is to track and improve the configuration reading module of 
> DynamicQueue to support absolute resource values.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9893) Capacity scheduler: enhance leaf-queue-template capacity / maximum-capacity setting

2020-02-26 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R reassigned YARN-9893:
--

Assignee: Manikandan R  (was: Peter Bacsko)

> Capacity scheduler: enhance leaf-queue-template capacity / maximum-capacity 
> setting
> ---
>
> Key: YARN-9893
> URL: https://issues.apache.org/jira/browse/YARN-9893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Peter Bacsko
>Assignee: Manikandan R
>Priority: Major
>
> Capacity Scheduler does not support two percentage values for leaf queue 
> capacity and maximum-capacity settings. So, you can't do something like this:
> {{yarn.scheduler.capacity.root.users.john.leaf-queue-template.capacity=memory-mb=50.0%,
>  vcores=50.0%}}
> On top of that, it's not even possible to define absolute resources:
> {{yarn.scheduler.capacity.root.users.john.leaf-queue-template.capacity=memory-mb=16384,
>  vcores=8}}
> Only a single percentage value is accepted.
> This makes it nearly impossible to properly convert a similar setting from 
> Fair Scheduler, where such a configuration is valid and accepted 
> ({{}}).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10154) CS Dynamic Queues cannot be configured with absolute resources

2020-02-26 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045698#comment-17045698
 ] 

Manikandan R commented on YARN-10154:
-

Attaching WIP patch with simple test case to validate approach is good enough 
to proceed further. 

> CS Dynamic Queues cannot be configured with absolute resources
> --
>
> Key: YARN-10154
> URL: https://issues.apache.org/jira/browse/YARN-10154
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Major
> Attachments: YARN-10154.001.patch
>
>
> In CS, ManagedParent Queue and its template cannot take absolute resource 
> value like 
> [memory=8192,vcores=8]
>  Thsi Jira is to track and improve the configuration reading module of 
> DynamicQueue to support absolute resource values.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10154) CS Dynamic Queues cannot be configured with absolute resources

2020-02-26 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-10154:

Attachment: YARN-10154.001.patch

> CS Dynamic Queues cannot be configured with absolute resources
> --
>
> Key: YARN-10154
> URL: https://issues.apache.org/jira/browse/YARN-10154
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Major
> Attachments: YARN-10154.001.patch
>
>
> In CS, ManagedParent Queue and its template cannot take absolute resource 
> value like 
> [memory=8192,vcores=8]
>  Thsi Jira is to track and improve the configuration reading module of 
> DynamicQueue to support absolute resource values.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10155) TestDelegationTokenRenewer.testTokenThreadTimeout fails in trunk

2020-02-24 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-10155:

Attachment: YARN-10155.001.patch

> TestDelegationTokenRenewer.testTokenThreadTimeout fails in trunk
> 
>
> Key: YARN-10155
> URL: https://issues.apache.org/jira/browse/YARN-10155
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10155.001.patch
>
>
> The TestDelegationTokenRenewer.testTokenThreadTimeout test committed in 
> YARN-9768 often fails with timeout.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10155) TestDelegationTokenRenewer.testTokenThreadTimeout fails in trunk

2020-02-24 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043538#comment-17043538
 ] 

Manikandan R commented on YARN-10155:
-

[~inigoiri]

I was not able to reproduce on my machine even after multiple runs. However, as 
a first step, I fixed the ArithmeticException as mentioned in 
https://issues.apache.org/jira/browse/YARN-9768?focusedCommentId=17022152=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17022152

assuming that this would avoid huge amount of unnecessary I/O into logs and 
also to get us into cleaner situation wherein we can focus on the actual 
problem. Also, reduced the timeout values to avoid waiting unnecessarily.

Please take a look.

> TestDelegationTokenRenewer.testTokenThreadTimeout fails in trunk
> 
>
> Key: YARN-10155
> URL: https://issues.apache.org/jira/browse/YARN-10155
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Manikandan R
>Priority: Major
>
> The TestDelegationTokenRenewer.testTokenThreadTimeout test committed in 
> YARN-9768 often fails with timeout.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10154) CS Dynamic Queues cannot be configured with absolute resources

2020-02-21 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17042026#comment-17042026
 ] 

Manikandan R commented on YARN-10154:
-

[~sunilg] Can I take this forward?

> CS Dynamic Queues cannot be configured with absolute resources
> --
>
> Key: YARN-10154
> URL: https://issues.apache.org/jira/browse/YARN-10154
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Major
>
> In CS, ManagedParent Queue and its template cannot take absolute resource 
> value like 
> [memory=8192,vcores=8]
>  Thsi Jira is to track and improve the configuration reading module of 
> DynamicQueue to support absolute resource values.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-10155) TestDelegationTokenRenewer.testTokenThreadTimeout fails in trunk

2020-02-21 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R reassigned YARN-10155:
---

Assignee: Manikandan R

> TestDelegationTokenRenewer.testTokenThreadTimeout fails in trunk
> 
>
> Key: YARN-10155
> URL: https://issues.apache.org/jira/browse/YARN-10155
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.3.0
>Reporter: Adam Antal
>Assignee: Manikandan R
>Priority: Major
>
> The TestDelegationTokenRenewer.testTokenThreadTimeout test committed in 
> YARN-9768 often fails with timeout.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10043) FairOrderingPolicy Improvements

2020-02-19 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17040219#comment-17040219
 ] 

Manikandan R commented on YARN-10043:
-

[~snemeth] Can you review the latest patch?

> FairOrderingPolicy Improvements
> ---
>
> Key: YARN-10043
> URL: https://issues.apache.org/jira/browse/YARN-10043
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10043.001.patch, YARN-10043.002.patch, 
> YARN-10043.003.patch, YARN-10043.004.patch
>
>
> FairOrderingPolicy can be improved by using some of the approaches (only 
> relevant) implemented in FairSharePolicy of FS. This improvement has 
> significance in FS to CS migration context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6492) Generate queue metrics for each partition

2020-02-16 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037902#comment-17037902
 ] 

Manikandan R commented on YARN-6492:


[~jhung]  and [~epayne]  were doing the code reviews. Pinging them..Will follow 
up and move forward.

> Generate queue metrics for each partition
> -
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Jonathan Hung
>Assignee: Manikandan R
>Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, 
> PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, 
> YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, 
> YARN-6492.004.patch, YARN-6492.005.WIP.patch, YARN-6492.006.WIP.patch, 
> YARN-6492.007.WIP.patch, partition_metrics.txt
>
>
> We are interested in having queue metrics for all partitions. Right now each 
> queue has one QueueMetrics object which captures metrics either in default 
> partition or across all partitions. (After YARN-6467 it will be in default 
> partition)
> But having the partition metrics would be very useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10043) FairOrderingPolicy Improvements

2020-02-13 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-10043:

Attachment: YARN-10043.004.patch

> FairOrderingPolicy Improvements
> ---
>
> Key: YARN-10043
> URL: https://issues.apache.org/jira/browse/YARN-10043
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10043.001.patch, YARN-10043.002.patch, 
> YARN-10043.003.patch, YARN-10043.004.patch
>
>
> FairOrderingPolicy can be improved by using some of the approaches (only 
> relevant) implemented in FairSharePolicy of FS. This improvement has 
> significance in FS to CS migration context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10043) FairOrderingPolicy Improvements

2020-02-13 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17036349#comment-17036349
 ] 

Manikandan R commented on YARN-10043:
-

Thanks [~snemeth]. Addressed all comments.
{quote}You can remove the type argument to FairOrderingPolicy.
{quote}
Are you suggesting FairOrderingPolicy policy = new FairOrderingPolicy(); ?
 If so, we will need to do suppress warnings etc.

 

> FairOrderingPolicy Improvements
> ---
>
> Key: YARN-10043
> URL: https://issues.apache.org/jira/browse/YARN-10043
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10043.001.patch, YARN-10043.002.patch, 
> YARN-10043.003.patch
>
>
> FairOrderingPolicy can be improved by using some of the approaches (only 
> relevant) implemented in FairSharePolicy of FS. This improvement has 
> significance in FS to CS migration context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10043) FairOrderingPolicy Improvements

2020-02-10 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033715#comment-17033715
 ] 

Manikandan R commented on YARN-10043:
-

[~pbacsko] Can you please take a look? YARN-10049 is also dependent on this.

> FairOrderingPolicy Improvements
> ---
>
> Key: YARN-10043
> URL: https://issues.apache.org/jira/browse/YARN-10043
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10043.001.patch, YARN-10043.002.patch, 
> YARN-10043.003.patch
>
>
> FairOrderingPolicy can be improved by using some of the approaches (only 
> relevant) implemented in FairSharePolicy of FS. This improvement has 
> significance in FS to CS migration context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10043) FairOrderingPolicy Improvements

2020-02-02 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028478#comment-17028478
 ] 

Manikandan R edited comment on YARN-10043 at 2/2/20 4:47 PM:
-

[~pbacsko] Thanks for your reviews.
 # Taken care.
 # Compare demands ensures entity without resource demand get lower priority 
than ones who have demands. When both entities has certain demands ( > 0), then 
there is no actual comparison. Hence the changes are in that way. It has been 
commented too at class level. Similar to {{FairSharePolicy}} implementation.
 # For #3 and #4, yes there are multiple asserts intended to clearly show the 
expected final ones are passing after asserting all earlier comparison (in the 
precedence order) has been passed as well. However, I see your point and made 
the changes by balancing both the views.


was (Author: maniraj...@gmail.com):
[~pbacsko] Thanks for your reviews.
 # Taken care.
 # Compare demands ensures entity without resource demand get lower priority 
than ones who have demands. When both entities has certain demands ( > 0), then 
there is no actual comparison. Hence the changes are in that way. It has been 
commented too at class level. Similar to {{FairSharePolicy}} implementation.

 # For #3 and #4, yes there are multiple asserts intended to clearly show the 
expected final ones are passing after asserting all earlier comparison (in the 
precedence order) has been passed as well. However, I see your point and made 
the changes by balancing both the views.

> FairOrderingPolicy Improvements
> ---
>
> Key: YARN-10043
> URL: https://issues.apache.org/jira/browse/YARN-10043
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10043.001.patch, YARN-10043.002.patch, 
> YARN-10043.003.patch
>
>
> FairOrderingPolicy can be improved by using some of the approaches (only 
> relevant) implemented in FairSharePolicy of FS. This improvement has 
> significance in FS to CS migration context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10043) FairOrderingPolicy Improvements

2020-02-02 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-10043:

Attachment: YARN-10043.003.patch

> FairOrderingPolicy Improvements
> ---
>
> Key: YARN-10043
> URL: https://issues.apache.org/jira/browse/YARN-10043
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10043.001.patch, YARN-10043.002.patch, 
> YARN-10043.003.patch
>
>
> FairOrderingPolicy can be improved by using some of the approaches (only 
> relevant) implemented in FairSharePolicy of FS. This improvement has 
> significance in FS to CS migration context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10043) FairOrderingPolicy Improvements

2020-02-02 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028478#comment-17028478
 ] 

Manikandan R commented on YARN-10043:
-

[~pbacsko] Thanks for your reviews.
 # Taken care.
 # Compare demands ensures entity without resource demand get lower priority 
than ones who have demands. When both entities has certain demands ( > 0), then 
there is no actual comparison. Hence the changes are in that way. It has been 
commented too at class level. Similar to {{FairSharePolicy}} implementation.

 # For #3 and #4, yes there are multiple asserts intended to clearly show the 
expected final ones are passing after asserting all earlier comparison (in the 
precedence order) has been passed as well. However, I see your point and made 
the changes by balancing both the views.

> FairOrderingPolicy Improvements
> ---
>
> Key: YARN-10043
> URL: https://issues.apache.org/jira/browse/YARN-10043
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10043.001.patch, YARN-10043.002.patch
>
>
> FairOrderingPolicy can be improved by using some of the approaches (only 
> relevant) implemented in FairSharePolicy of FS. This improvement has 
> significance in FS to CS migration context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10043) FairOrderingPolicy Improvements

2020-01-28 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025224#comment-17025224
 ] 

Manikandan R commented on YARN-10043:
-

[~leftnoteasy] [~sunilg] Can you please take a look?

> FairOrderingPolicy Improvements
> ---
>
> Key: YARN-10043
> URL: https://issues.apache.org/jira/browse/YARN-10043
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10043.001.patch, YARN-10043.002.patch
>
>
> FairOrderingPolicy can be improved by using some of the approaches (only 
> relevant) implemented in FairSharePolicy of FS. This improvement has 
> significance in FS to CS migration context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2020-01-27 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9768:
---
Attachment: YARN-9768.010.patch

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch, 
> YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, 
> YARN-9768.006.patch, YARN-9768.007.patch, YARN-9768.008.patch, 
> YARN-9768.009.patch, YARN-9768.010.patch
>
>
> Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews 
> HDFS tokens received to check for validity and expiration time.
> This call is made to an underlying HDFS NN or Router Node (which has exact 
> APIs as HDFS NN). If one of the nodes is bad and the renew call is stuck the 
> thread remains stuck indefinitely. The thread should ideally timeout the 
> renewToken and retry from the client's perspective.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2020-01-27 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17024831#comment-17024831
 ] 

Manikandan R commented on YARN-9768:


Attaching 0.10 patch..

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch, 
> YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, 
> YARN-9768.006.patch, YARN-9768.007.patch, YARN-9768.008.patch, 
> YARN-9768.009.patch
>
>
> Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews 
> HDFS tokens received to check for validity and expiration time.
> This call is made to an underlying HDFS NN or Router Node (which has exact 
> APIs as HDFS NN). If one of the nodes is bad and the renew call is stuck the 
> thread remains stuck indefinitely. The thread should ideally timeout the 
> renewToken and retry from the client's perspective.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2020-01-24 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023064#comment-17023064
 ] 

Manikandan R commented on YARN-9768:


can you trigger?

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch, 
> YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, 
> YARN-9768.006.patch, YARN-9768.007.patch, YARN-9768.008.patch, 
> YARN-9768.009.patch
>
>
> Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews 
> HDFS tokens received to check for validity and expiration time.
> This call is made to an underlying HDFS NN or Router Node (which has exact 
> APIs as HDFS NN). If one of the nodes is bad and the renew call is stuck the 
> thread remains stuck indefinitely. The thread should ideally timeout the 
> renewToken and retry from the client's perspective.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10049) FIFOOrderingPolicy Improvements

2020-01-24 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023056#comment-17023056
 ] 

Manikandan R commented on YARN-10049:
-

It depends on YARN-10043 changes. Requires rebasing after committing YARN-10043.

> FIFOOrderingPolicy Improvements
> ---
>
> Key: YARN-10049
> URL: https://issues.apache.org/jira/browse/YARN-10049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10049.001.patch
>
>
> FIFOPolicy of FS does the following comparisons in addition to app priority 
> comparison:
> 1. Using Start time
> 2. Using Name
> Scope of this jira is to achieve the same comparisons in FIFOOrderingPolicy 
> of CS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10103) Capacity scheduler: add support for create=true/false per mapping rule

2020-01-24 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023050#comment-17023050
 ] 

Manikandan R commented on YARN-10103:
-

[~pbacsko] I think AutoCreateEnabledParentQueue can be used to achieve leaf 
queue creation under parent queue (for which auto has been enabled) in 
conjunction with queue mapping rules. Please check 
yarn.scheduler.capacity..auto-create-child-queue.enabled property.

> Capacity scheduler: add support for create=true/false per mapping rule
> --
>
> Key: YARN-10103
> URL: https://issues.apache.org/jira/browse/YARN-10103
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Priority: Major
>
> You can't ask Capacity Scheduler for a mapping to create a queue if it 
> doesn't exist.
> For example, this mapping would use the first rule if the queue exist. If it 
> doesn't, then it proceeds to the next rule:
>  {{u:%user:%primary_group.%user:create=false;u:%user%:root.default}}
> Let's say user "alice" belongs to the "admins" group. It would first try to 
> map {{root.admins.alice}}. But, if the queue doesn't exist, then it places 
> the application into {{root.default}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10102) Capacity scheduler: add support for %specified mapping

2020-01-24 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023044#comment-17023044
 ] 

Manikandan R commented on YARN-10102:
-

[~pbacsko] I think yarn.scheduler.capacity.queue-mappings-override.enable does 
the same thing. Queue extracted from mapping rules would be used only if this 
property has been set to true. 

> Capacity scheduler: add support for %specified mapping
> --
>
> Key: YARN-10102
> URL: https://issues.apache.org/jira/browse/YARN-10102
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Peter Bacsko
>Priority: Major
>
> The reduce the gap between Fair Scheduler and Capacity Scheduler, it's 
> reasonable to have a {{%specified}} mapping. This would be equivalent to the 
> {{}}  placement rule in FS, that is, use the queue that comes in 
> with the application submission context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10043) FairOrderingPolicy Improvements

2020-01-23 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-10043:

Attachment: YARN-10043.002.patch

> FairOrderingPolicy Improvements
> ---
>
> Key: YARN-10043
> URL: https://issues.apache.org/jira/browse/YARN-10043
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10043.001.patch, YARN-10043.002.patch
>
>
> FairOrderingPolicy can be improved by using some of the approaches (only 
> relevant) implemented in FairSharePolicy of FS. This improvement has 
> significance in FS to CS migration context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10043) FairOrderingPolicy Improvements

2020-01-23 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022258#comment-17022258
 ] 

Manikandan R commented on YARN-10043:
-

Fixed checkstyle, javadoc issues. Junit failure is not related to this patch.

> FairOrderingPolicy Improvements
> ---
>
> Key: YARN-10043
> URL: https://issues.apache.org/jira/browse/YARN-10043
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10043.001.patch, YARN-10043.002.patch
>
>
> FairOrderingPolicy can be improved by using some of the approaches (only 
> relevant) implemented in FairSharePolicy of FS. This improvement has 
> significance in FS to CS migration context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2020-01-23 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17022152#comment-17022152
 ] 

Manikandan R commented on YARN-9768:


[~inigoiri]

I ran this test 5 times, but haven't come across this timeout issue. Only 1 
time, VM crash had occurred. In addition, I do see lot of

{{java.util.concurrent.ExecutionException: java.lang.ArithmeticException: / by 
zero}}

in logs. Seems it is related to YARN-9817 . Should we trigger Jenkins again and 
see?

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch, 
> YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, 
> YARN-9768.006.patch, YARN-9768.007.patch, YARN-9768.008.patch, 
> YARN-9768.009.patch
>
>
> Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews 
> HDFS tokens received to check for validity and expiration time.
> This call is made to an underlying HDFS NN or Router Node (which has exact 
> APIs as HDFS NN). If one of the nodes is bad and the renew call is stuck the 
> thread remains stuck indefinitely. The thread should ideally timeout the 
> renewToken and retry from the client's perspective.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2020-01-22 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9768:
---
Attachment: YARN-9768.009.patch

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch, 
> YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, 
> YARN-9768.006.patch, YARN-9768.007.patch, YARN-9768.008.patch, 
> YARN-9768.009.patch
>
>
> Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews 
> HDFS tokens received to check for validity and expiration time.
> This call is made to an underlying HDFS NN or Router Node (which has exact 
> APIs as HDFS NN). If one of the nodes is bad and the renew call is stuck the 
> thread remains stuck indefinitely. The thread should ideally timeout the 
> renewToken and retry from the client's perspective.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2020-01-22 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17021106#comment-17021106
 ] 

Manikandan R commented on YARN-9768:


Rebased the patch. Can you please take it forward?

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch, 
> YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, 
> YARN-9768.006.patch, YARN-9768.007.patch, YARN-9768.008.patch, 
> YARN-9768.009.patch
>
>
> Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews 
> HDFS tokens received to check for validity and expiration time.
> This call is made to an underlying HDFS NN or Router Node (which has exact 
> APIs as HDFS NN). If one of the nodes is bad and the renew call is stuck the 
> thread remains stuck indefinitely. The thread should ideally timeout the 
> renewToken and retry from the client's perspective.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry

2020-01-20 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17019628#comment-17019628
 ] 

Manikandan R commented on YARN-9768:


[~inigoiri] Can you please review his comment and commit the code?

> RM Renew Delegation token thread should timeout and retry
> -
>
> Key: YARN-9768
> URL: https://issues.apache.org/jira/browse/YARN-9768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: CR Hota
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9768.001.patch, YARN-9768.002.patch, 
> YARN-9768.003.patch, YARN-9768.004.patch, YARN-9768.005.patch, 
> YARN-9768.006.patch, YARN-9768.007.patch, YARN-9768.008.patch
>
>
> Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews 
> HDFS tokens received to check for validity and expiration time.
> This call is made to an underlying HDFS NN or Router Node (which has exact 
> APIs as HDFS NN). If one of the nodes is bad and the renew call is stuck the 
> thread remains stuck indefinitely. The thread should ideally timeout the 
> renewToken and retry from the client's perspective.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10049) FIFOOrderingPolicy Improvements

2020-01-19 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018890#comment-17018890
 ] 

Manikandan R commented on YARN-10049:
-

Thanks [~sunilg] [~leftnoteasy]

Attaching .001.patch based on earlier discussions.

{{FIFOComparator}} has been used in FifoOrderingPolicy, 
FifoOrderingPolicyForPendingApps, FifoOrderingPolicyWithExclusivePartitions 
(through other class) and FairOrderingPolicy but in different order. So changes 
made in comparator would be applicable in all above policies.

> FIFOOrderingPolicy Improvements
> ---
>
> Key: YARN-10049
> URL: https://issues.apache.org/jira/browse/YARN-10049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10049.001.patch
>
>
> FIFOPolicy of FS does the following comparisons in addition to app priority 
> comparison:
> 1. Using Start time
> 2. Using Name
> Scope of this jira is to achieve the same comparisons in FIFOOrderingPolicy 
> of CS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10049) FIFOOrderingPolicy Improvements

2020-01-19 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-10049:

Attachment: YARN-10049.001.patch

> FIFOOrderingPolicy Improvements
> ---
>
> Key: YARN-10049
> URL: https://issues.apache.org/jira/browse/YARN-10049
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10049.001.patch
>
>
> FIFOPolicy of FS does the following comparisons in addition to app priority 
> comparison:
> 1. Using Start time
> 2. Using Name
> Scope of this jira is to achieve the same comparisons in FIFOOrderingPolicy 
> of CS.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10043) FairOrderingPolicy Improvements

2020-01-18 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-10043:

Attachment: YARN-10043.001.patch

> FairOrderingPolicy Improvements
> ---
>
> Key: YARN-10043
> URL: https://issues.apache.org/jira/browse/YARN-10043
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10043.001.patch
>
>
> FairOrderingPolicy can be improved by using some of the approaches (only 
> relevant) implemented in FairSharePolicy of FS. This improvement has 
> significance in FS to CS migration context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10043) FairOrderingPolicy Improvements

2020-01-18 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-10043:

Attachment: (was: YARN-10043.001.patch)

> FairOrderingPolicy Improvements
> ---
>
> Key: YARN-10043
> URL: https://issues.apache.org/jira/browse/YARN-10043
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10043.001.patch
>
>
> FairOrderingPolicy can be improved by using some of the approaches (only 
> relevant) implemented in FairSharePolicy of FS. This improvement has 
> significance in FS to CS migration context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9831) NMTokenSecretManagerInRM#createNMToken blocks ApplicationMasterService allocate flow

2020-01-18 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018585#comment-17018585
 ] 

Manikandan R commented on YARN-9831:


[~BilwaST] Spent sometime on this Jira earlier. Attaching patch developed then 
for your reference. Please use if you have similar thoughts and take it forward.

> NMTokenSecretManagerInRM#createNMToken blocks ApplicationMasterService 
> allocate flow
> 
>
> Key: YARN-9831
> URL: https://issues.apache.org/jira/browse/YARN-9831
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-9831.001.patch
>
>
> Currently attempt's NMToken cannot be generated independently. 
> Each attempts allocate flow blocks each other. We should improve the same



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9831) NMTokenSecretManagerInRM#createNMToken blocks ApplicationMasterService allocate flow

2020-01-18 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-9831:
---
Attachment: YARN-9831.001.patch

> NMTokenSecretManagerInRM#createNMToken blocks ApplicationMasterService 
> allocate flow
> 
>
> Key: YARN-9831
> URL: https://issues.apache.org/jira/browse/YARN-9831
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bibin Chundatt
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-9831.001.patch
>
>
> Currently attempt's NMToken cannot be generated independently. 
> Each attempts allocate flow blocks each other. We should improve the same



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10043) FairOrderingPolicy Improvements

2020-01-18 Thread Manikandan R (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-10043:

Attachment: YARN-10043.001.patch

> FairOrderingPolicy Improvements
> ---
>
> Key: YARN-10043
> URL: https://issues.apache.org/jira/browse/YARN-10043
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10043.001.patch
>
>
> FairOrderingPolicy can be improved by using some of the approaches (only 
> relevant) implemented in FairSharePolicy of FS. This improvement has 
> significance in FS to CS migration context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10043) FairOrderingPolicy Improvements

2020-01-18 Thread Manikandan R (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018584#comment-17018584
 ] 

Manikandan R commented on YARN-10043:
-

Thanks [~leftnoteasy] and everyone for sharing your thoughts.

Attached .001.patch based on our discussions.

> FairOrderingPolicy Improvements
> ---
>
> Key: YARN-10043
> URL: https://issues.apache.org/jira/browse/YARN-10043
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-10043.001.patch
>
>
> FairOrderingPolicy can be improved by using some of the approaches (only 
> relevant) implemented in FairSharePolicy of FS. This improvement has 
> significance in FS to CS migration context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



  1   2   3   4   5   6   7   8   9   >