[jira] [Commented] (YARN-10297) TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails intermittently
[ https://issues.apache.org/jira/browse/YARN-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133916#comment-17133916 ] Manikandan R commented on YARN-10297:

Thanks [~Jim_Brennan]. LGTM. Please fix the whitespace issues.

> TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails intermittently
>
> Key: YARN-10297
> URL: https://issues.apache.org/jira/browse/YARN-10297
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Jonathan Hung
> Assignee: Jim Brennan
> Priority: Major
> Attachments: YARN-10297.001.patch
>
> After YARN-6492, testFairSchedulerContinuousSchedulingInitTime fails intermittently when running {{mvn test -Dtest=TestContinuousScheduling}}
> {noformat}
> [INFO] Running org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
> [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.682 s <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
> [ERROR] testFairSchedulerContinuousSchedulingInitTime(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling) Time elapsed: 0.194 s <<< ERROR!
> org.apache.hadoop.metrics2.MetricsException: Metrics source PartitionQueueMetrics,partition= already exists!
> at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
> at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
> at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.getPartitionMetrics(QueueMetrics.java:362)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.incrPendingResources(QueueMetrics.java:601)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updatePendingResources(AppSchedulingInfo.java:388)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:320)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:347)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateResourceRequests(AppSchedulingInfo.java:183)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateResourceRequests(SchedulerApplicationAttempt.java:456)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:898)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling.testFairSchedulerContinuousSchedulingInitTime(TestContinuousScheduling.java:375)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
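The "Metrics source ... already exists!" error above is classic shared-state leakage between tests: the metrics system keeps a global registry of source names, so a PartitionQueueMetrics source registered by one test case makes re-registration in the next test throw. Below is a toy model of that mechanism, not Hadoop code; the class and method names are illustrative, and resetting between tests (as the real DefaultMetricsSystem.shutdown() would) is one common remedy, not necessarily what the attached patch does.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Toy model (not Hadoop code) of a global metrics-source registry that
 * throws on duplicate registration, as DefaultMetricsSystem does.
 */
public class MetricsSourceModel {
    static final Set<String> sourceNames = new HashSet<>();

    static void register(String name) {
        // Mirrors DefaultMetricsSystem.newSourceName rejecting duplicates.
        if (!sourceNames.add(name)) {
            throw new IllegalStateException(
                    "Metrics source " + name + " already exists!");
        }
    }

    /** Analogous to resetting the metrics system between test cases. */
    static void shutdown() {
        sourceNames.clear();
    }

    public static void main(String[] args) {
        register("PartitionQueueMetrics,partition=");
        shutdown(); // what a tearDown-style reset would do
        register("PartitionQueueMetrics,partition="); // succeeds after reset
        System.out.println("re-registered OK"); // prints "re-registered OK"
    }
}
```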
[jira] [Commented] (YARN-10297) TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails intermittently
[ https://issues.apache.org/jira/browse/YARN-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136293#comment-17136293 ] Manikandan R commented on YARN-10297:

[~jhung] Patch LGTM. Can you please take a look and commit?

> Attachments: YARN-10297.001.patch, YARN-10297.002.patch
[jira] [Commented] (YARN-6492) Generate queue metrics for each partition
[ https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904027#comment-16904027 ] Manikandan R commented on YARN-6492:

Ok, [~eepayne]. Will look into this. Some observations on the .004.patch:

1. Since partition info is extracted from both the request and the node, there is a problem. For example: Node N is mapped to Label X (non-exclusive). Queue A is configured with the ANY node label. App A requested resources from Queue A and its containers ran on Node N. During the AbstractCSQueue#allocateResource call, the node partition (via SchedulerNode) is used for the calculation. Say the allocate call is fired for 3 containers of 1 GB each; the outcome is:

a. PartitionDefault * queue A -> pending MB is 3 GB
b. PartitionX * queue A -> pending MB is -3 GB

Because the app request was fired without any label specification, metric #a was derived. After allocation, pending resources get decreased; at that point the node partition info is used, so metric #b was derived. Given this kind of situation, we need to put some thought into computing the metrics correctly.

2. Though the intent of this jira is Partition Queue Metrics, we would like to retain the existing Queue Metrics for backward compatibility (as you can see from the jira's discussion). With this patch and the YARN-9596 patch, queue metrics would be overridden either with specific partition values or with default partition values, or vice versa. For example, after a queue (say queue A) is initialised with min and max capacities and also with a node label's min and max capacities, QueueMetrics (availableMB) for queue A returns values based on the node label's capacity config.

I've been working on these observations to provide a fix and attached .005.WIP.patch. The focus of .005.WIP.patch is to ensure availableMB and availableVcores are correct (see observation #2 above). Added more asserts in {{testQueueMetricsWithLabelsOnDefaultLabelNode}} to ensure the fix for #2 works properly. Also note that user metrics for availableMB/availableVcores at the root queue did not exist before either; the same behaviour is retained. User metrics for availableMB and availableVcores are available only at the child-queue level, and also per partition. Will focus on #1 in the next patch.

> Generate queue metrics for each partition
>
> Key: YARN-6492
> URL: https://issues.apache.org/jira/browse/YARN-6492
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler
> Reporter: Jonathan Hung
> Assignee: Manikandan R
> Priority: Major
> Attachments: PartitionQueueMetrics_default_partition.txt, PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, YARN-6492.004.patch, partition_metrics.txt
>
> We are interested in having queue metrics for all partitions. Right now each queue has one QueueMetrics object which captures metrics either in default partition or across all partitions. (After YARN-6467 it will be in default partition)
> But having the partition metrics would be very useful.
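The skew in observation #1 can be sketched in a few lines. This is a hypothetical model, not YARN's actual QueueMetrics API: incrPending/decrPending stand in for the real pending-resource bookkeeping, and the point is only that incrementing under the requested partition while decrementing under the node partition leaves both partitions wrong.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not YARN code) of the bookkeeping mismatch:
 * pending resources are incremented under the *requested* partition
 * but decremented under the *node* partition.
 */
public class PartitionPendingSketch {
    static final Map<String, Long> pendingMb = new HashMap<>();

    static void incrPending(String partition, long mb) {
        pendingMb.merge(partition, mb, Long::sum);
    }

    static void decrPending(String partition, long mb) {
        pendingMb.merge(partition, -mb, Long::sum);
    }

    public static void main(String[] args) {
        // App asks for 3 x 1 GB with no label -> counted under "default".
        incrPending("default", 3072);
        // Containers land on a non-exclusive Label-X node; the decrement
        // uses the node partition "X", mirroring the observed behaviour.
        decrPending("X", 3072);
        // "default" stays at +3 GB and "X" goes to -3 GB.
        System.out.println(pendingMb.get("default") + " / " + pendingMb.get("X")); // prints 3072 / -3072
    }
}
```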
[jira] [Updated] (YARN-6492) Generate queue metrics for each partition
[ https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-6492:

Attachment: YARN-6492.005.WIP.patch
[jira] [Commented] (YARN-6492) Generate queue metrics for each partition
[ https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909600#comment-16909600 ] Manikandan R commented on YARN-6492:

[~eepayne] The observations mentioned earlier are important ones that came up during iterative development. I think the whole PartitionQueueMetrics feature won't be in a usable state without these fixes. At the same time, I am totally OK with having separate JIRAs for ease of tracking, assuming we mark the whole feature complete only after the new JIRA for these issues is fixed. Regarding the structure: yes, we would like to sync with the UI, REST API, etc., as discussed much earlier in this JIRA.
[jira] [Commented] (YARN-9756) Create metric that sums total memory/vcores preempted per round
[ https://issues.apache.org/jira/browse/YARN-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909636#comment-16909636 ] Manikandan R commented on YARN-9756:

[~eepayne] I spent some time understanding this. ProportionalCapacityPreemptionPolicy#preemptOrkillSelectedContainerAfterWait triggers a preemption event for each container, based on the max limit allowed per round. I think we can sum the memory/vcores of all containers about to be preempted in each round and call the appropriate metrics methods there. Is this correct? Also, since the metric would be per round, and assuming there are many rounds, wouldn't it be difficult for users to derive value from it? Do you have a JMX output structure in mind?

> Create metric that sums total memory/vcores preempted per round
>
> Key: YARN-9756
> URL: https://issues.apache.org/jira/browse/YARN-9756
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacity scheduler
> Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2
> Reporter: Eric Payne
> Assignee: Eric Payne
> Priority: Major
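The per-round summation proposed above can be sketched as follows. The names (Container, sumRound) are hypothetical and stand in for the real container and metrics types; the sketch only shows the aggregation step, not where in the preemption policy it would be invoked.

```java
import java.util.List;

/**
 * Illustrative sketch (assumed names, not actual YARN code) of summing
 * memory/vcores over the containers selected for preemption in one round.
 */
public class PreemptionRoundSum {
    record Container(long memoryMb, int vcores) {}

    static long[] sumRound(List<Container> toPreempt) {
        long mb = 0, vcores = 0;
        for (Container c : toPreempt) {
            mb += c.memoryMb();
            vcores += c.vcores();
        }
        // These totals would then be pushed to the queue's metrics object.
        return new long[] {mb, vcores};
    }

    public static void main(String[] args) {
        long[] totals = sumRound(List.of(
                new Container(1024, 1), new Container(2048, 2)));
        System.out.println(totals[0] + " MB, " + totals[1] + " vcores"); // prints "3072 MB, 3 vcores"
    }
}
```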
[jira] [Updated] (YARN-9756) Create metric that sums total memory/vcores preempted per round
[ https://issues.apache.org/jira/browse/YARN-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9756:

Attachment: YARN-9756.WIP.patch
[jira] [Commented] (YARN-9756) Create metric that sums total memory/vcores preempted per round
[ https://issues.apache.org/jira/browse/YARN-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911513#comment-16911513 ] Manikandan R commented on YARN-9756:

{quote}These new metrics will be similar to AggregateMemoryMBSecondsPreempted, AggregateVcoreSecondsPreempted, etc. I propose to process the total preempted resources in the same way that is done for preempted seconds (memory, vcores, etc). LeafQueue#updateQueuePreemptionMetrics will aggregate the total preempted resources just like it does for preempted resource seconds.{quote}

Ok, I was a bit confused by "per round". Attached a quick WIP patch for your review.

{quote}The challenge I have encountered is making this work for extended resources (like gpu, etc.){quote}

Can {{QueueMetricsForCustomResources}} be used to generate this metric for GPU, like the other metrics? I have covered this in the patch as well. Please share your views.
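The custom-resource angle discussed above can be sketched as a map-based aggregation. This is an assumed shape, not the actual QueueMetricsForCustomResources API: it only illustrates summing each named resource (including extended ones like "yarn.io/gpu") into a running preemption aggregate.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch (assumed shape, not the real QueueMetricsForCustomResources API)
 * of aggregating preempted custom resources such as GPUs alongside
 * memory and vcores.
 */
public class CustomResourcePreemptionSketch {
    static final Map<String, Long> aggregatePreempted = new HashMap<>();

    static void updatePreempted(Map<String, Long> containerResources) {
        // Sum each named resource (e.g. "memory-mb", "vcores",
        // "yarn.io/gpu") into the running aggregate.
        containerResources.forEach(
                (name, value) -> aggregatePreempted.merge(name, value, Long::sum));
    }

    public static void main(String[] args) {
        updatePreempted(Map.of("memory-mb", 1024L, "vcores", 1L, "yarn.io/gpu", 2L));
        updatePreempted(Map.of("memory-mb", 2048L, "vcores", 2L));
        System.out.println(aggregatePreempted.get("yarn.io/gpu")); // prints 2
    }
}
```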
[jira] [Created] (YARN-9767) PartitionQueueMetrics Issues
Manikandan R created YARN-9767:

Summary: PartitionQueueMetrics Issues
Key: YARN-9767
URL: https://issues.apache.org/jira/browse/YARN-9767
Project: Hadoop YARN
Issue Type: Bug
Reporter: Manikandan R
Assignee: Manikandan R

The intent of this Jira is to capture, separately for ease of tracking, the issues/observations encountered as part of YARN-6492 development.

Observations (see https://issues.apache.org/jira/browse/YARN-6492?focusedCommentId=16904027&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16904027):

1. Since partition info is extracted from both the request and the node, there is a problem. For example: Node N is mapped to Label X (non-exclusive). Queue A is configured with the ANY node label. App A requested resources from Queue A and its containers ran on Node N. During the AbstractCSQueue#allocateResource call, the node partition (via SchedulerNode) is used for the calculation. Say the allocate call is fired for 3 containers of 1 GB each; the outcome is:

a. PartitionDefault * queue A -> pending MB is 3 GB
b. PartitionX * queue A -> pending MB is -3 GB

Because the app request was fired without any label specification, metric #a was derived. After allocation, pending resources get decreased; at that point the node partition info is used, so metric #b was derived. Given this kind of situation, we need to put some thought into computing the metrics correctly.

2. Though the intent of this jira is Partition Queue Metrics, we would like to retain the existing Queue Metrics for backward compatibility (as you can see from the jira's discussion). With this patch and the YARN-9596 patch, queue metrics would be overridden either with specific partition values or with default partition values, or vice versa. For example, after a queue (say queue A) is initialised with min and max capacities and also with a node label's min and max capacities, QueueMetrics (availableMB) for queue A returns values based on the node label's capacity config.

I've been working on these observations to provide a fix and attached .005.WIP.patch. The focus of .005.WIP.patch is to ensure availableMB and availableVcores are correct (see observation #2 above). Added more asserts in {{testQueueMetricsWithLabelsOnDefaultLabelNode}} to ensure the fix for #2 works properly. Also note that user metrics for availableMB/availableVcores at the root queue did not exist before either; the same behaviour is retained. User metrics for availableMB and availableVcores are available only at the child-queue level, and also per partition.
[jira] [Commented] (YARN-9767) PartitionQueueMetrics Issues
[ https://issues.apache.org/jira/browse/YARN-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911575#comment-16911575 ] Manikandan R commented on YARN-9767:

On observation #1: after container allocation, pending resources get deducted inside {{QueueMetrics#allocateResources}} using the node partition, as opposed to the requested partition info. I think RMContainerImpl#getNodeLabelExpression can be used for decreasing pending resources, which is more appropriate for the following reasons:

1. {{RMContainerImpl#getNodeLabelExpression}} is derived from {{AppPlacementAllocator#getPrimaryRequestedNodePartition}}. The Javadoc of {{AppPlacementAllocator#getPrimaryRequestedNodePartition}} is good enough to explain this.

2. In this case, the actual intent is to run ANYwhere (which is nothing but the "default" partition), but the app ended up using some non-exclusive partition. So increasing pending resources on the "default" partition or PrimaryRequestedNodePartition (mostly "default" or a specific partition) and deducting pending resources the same way seems correct, rather than increasing and decreasing in two different places.

So the fix in {{AppSchedulingInfo#updateMetrics}} would be something like

{code:java}
queue.getMetrics().allocateResources(node.getPartition(), user, 1,
    containerAllocated.getContainer().getResource(), false);
queue.getMetrics().decrPendingResources(
    containerAllocated.getNodeLabelExpression(), user, 1,
    containerAllocated.getContainer().getResource());
{code}

instead of

{code:java}
queue.getMetrics().allocateResources(node.getPartition(), user, 1,
    containerAllocated.getContainer().getResource(), true);
{code}

Please share your thoughts.
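The effect of the proposed fix can be sketched with the same toy bookkeeping used for the bug: when both the increment and the decrement of pending resources key off the requested partition (what RMContainerImpl#getNodeLabelExpression reflects), the two sides cancel even though the container landed on a different, non-exclusive node partition. The helper names are hypothetical, not YARN's API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the proposed fix: increment and decrement of
 * pending resources both use the requested partition, so they cancel.
 */
public class BalancedPendingSketch {
    static final Map<String, Long> pendingMb = new HashMap<>();

    static void incrPending(String partition, long mb) {
        pendingMb.merge(partition, mb, Long::sum);
    }

    static void decrPending(String partition, long mb) {
        pendingMb.merge(partition, -mb, Long::sum);
    }

    public static void main(String[] args) {
        String requestedPartition = "default"; // no label on the request
        incrPending(requestedPartition, 3072);
        // Fix: decrement with the container's node label expression, which
        // reflects the requested partition, not the node's partition "X".
        decrPending(requestedPartition, 3072);
        System.out.println(pendingMb.get("default")); // prints 0
    }
}
```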
[jira] [Updated] (YARN-9767) PartitionQueueMetrics Issues
[ https://issues.apache.org/jira/browse/YARN-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9767:

Parent: YARN-6492
Issue Type: Sub-task (was: Bug)
[jira] [Commented] (YARN-6492) Generate queue metrics for each partition
[ https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911579#comment-16911579 ] Manikandan R commented on YARN-6492:

Created YARN-9767 to track the issues separately.
[jira] [Comment Edited] (YARN-9767) PartitionQueueMetrics Issues
[ https://issues.apache.org/jira/browse/YARN-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911575#comment-16911575 ] Manikandan R edited comment on YARN-9767 at 8/20/19 5:37 PM: - On observation #1: after container allocation, pending resources get deducted inside {{QueueMetrics#allocateResources}} using the node partition, as opposed to using the requested partition info. I think {{RMContainerImpl#getNodeLabelExpression}} can be used for decreasing pending resources, as it is more appropriate for the following reasons: 1. {{RMContainerImpl#getNodeLabelExpression}} is derived from {{AppPlacementAllocator#getPrimaryRequestedNodePartition}}. The Javadoc of {{AppPlacementAllocator#getPrimaryRequestedNodePartition}} explains this well. 2. In this case, the actual intent is to run on ANY node (which is nothing but the "default" partition), but the containers ended up on some non-exclusive partition. So increasing pending resources on the "default" partition or the primary requested node partition (mostly "default" or a specific partition), and deducting the pending resources using the same key, seems more correct than increasing and decreasing against two different partitions. So the fix in {{AppSchedulingInfo#updateMetrics}} would be something like {code:java} queue.getMetrics().allocateResources(node.getPartition(), user, 1, containerAllocated.getContainer().getResource(), false); queue.getMetrics().decrPendingResources( containerAllocated.getNodeLabelExpression(), user, 1, containerAllocated.getContainer().getResource()); {code} instead of {code:java} queue.getMetrics().allocateResources(node.getPartition(), user, 1, containerAllocated.getContainer().getResource(), true); {code} Please share your thoughts. > PartitionQueueMetrics Issues > > > Key: YARN-9767 > URL: https://issues.apache.org/jira/browse/YARN-9767 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > > The intent of the Jira is to capture the issues/observations encountered as > part of YARN-6492 development separately for ease of tracking. > Observations: > Please refer this > https://issues.apache.org/jira/browse/YARN-6492?focusedCommentId=16904027&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16904027 > 1. Since partition info are being extracted from request and node, there is a > problem.
For example, > > Node N has been mapped to Label X (non-exclusive). Queue A has been > configured with the ANY node label. App A requested resources from Queue A and > its containers ran on Node N for some reason. During the > AbstractCSQueue#allocateResource call, the node partition (using SchedulerNode) > would get used for the calculation. Let's say the allocate call has been fired for 3 > containers of 1 GB each; then > a. PartitionDefault * queue A -> pending mb is 3 GB > b. PartitionX * queue A -> pending mb is -3 GB > > is the outcome, because the app request has been fired without any label > specification and the #a metric has been derived. After allocation is over, > pending resources usually get decreased; when this happens, the node partition > info is used, hence the #b metric has been derived.
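The partition-accounting mismatch described above can be sketched in standalone Java. This is a toy model of per-partition pending-resource metrics (a hypothetical class, not Hadoop's actual {{QueueMetrics}}): incrementing pending on the requested partition but decrementing on the node's partition leaves a +3 GB / -3 GB residue, while using the same partition key on both sides nets out to zero.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of per-partition pending-resource metrics (NOT the real
// Hadoop QueueMetrics class). It only tracks pending MB per partition,
// enough to show why the increment and decrement must use the same key.
public class PendingMetricsModel {
    private final Map<String, Long> pendingMB = new HashMap<>();

    public void incrPending(String partition, long mb) {
        pendingMB.merge(partition, mb, Long::sum);
    }

    public void decrPending(String partition, long mb) {
        pendingMB.merge(partition, -mb, Long::sum);
    }

    public long pending(String partition) {
        return pendingMB.getOrDefault(partition, 0L);
    }

    public static void main(String[] args) {
        // Request carries no label -> pending is incremented on "default".
        PendingMetricsModel buggy = new PendingMetricsModel();
        buggy.incrPending("default", 3072);
        // Containers land on a non-exclusive node in partition "X";
        // deducting by node partition leaves default=3072 and X=-3072.
        buggy.decrPending("X", 3072);
        System.out.println("default=" + buggy.pending("default")
            + " X=" + buggy.pending("X"));

        // Proposed approach: deduct using the primary requested partition
        // (the key used to increment), so the pending metric nets to zero.
        PendingMetricsModel fixed = new PendingMetricsModel();
        fixed.incrPending("default", 3072);
        fixed.decrPending("default", 3072);
        System.out.println("default=" + fixed.pending("default"));
    }
}
```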
[jira] [Commented] (YARN-9766) YARN CapacityScheduler QueueMetrics has missing metrics for parent queues having same name
[ https://issues.apache.org/jira/browse/YARN-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912532#comment-16912532 ] Manikandan R commented on YARN-9766: While constructing Queue objects, the code makes use of {{old}}: it does a null check and reuses {{getMetrics}} where possible. The piece of code below does not let metrics be created for "root.a.d.b", because "root.a.b" has been generated before. I think checking equality using getQueuePath() in addition to the null check helps to differentiate these two different paths. cc [~eepayne] [~sunilg] {code} this.metrics = old != null ? (CSQueueMetrics) old.getMetrics() : CSQueueMetrics.forQueue(getQueuePath(), parent, cs.getConfiguration().getEnableUserMetrics(), cs.getConf()); {code} [~tarunparimi] Can I take it forward? > YARN CapacityScheduler QueueMetrics has missing metrics for parent queues > having same name > -- > > Key: YARN-9766 > URL: https://issues.apache.org/jira/browse/YARN-9766 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Tarun Parimi >Assignee: Tarun Parimi >Priority: Major > > In Capacity Scheduler, we enforce Leaf Queues to have unique names. But it is > not the case for Parent Queues. For example, we can have the below queue > hierarchy, where "b" is the queue name for two different queue paths root.a.b > and root.a.d.b . Since it is not a leaf queue this configuration works and > apps run fine in the leaf queues 'c' and 'e'. > * root > ** a > *** b > c > *** d > b > * e > But the jmx metrics does not show the metrics for the parent queue > "root.a.d.b" . We can see metrics only for "root.a.b" queue. > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
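The proposed guard can be illustrated with a minimal, self-contained sketch. The {{metricsFor}} helper and its string return values are hypothetical, used only to make the decision observable; the real code returns a CSQueueMetrics object.

```java
// Toy model of the reuse decision discussed above (hypothetical helper,
// NOT the real CSQueueMetrics API). With only a null check, "root.a.d.b"
// would reuse the metrics registered earlier for "root.a.b"; comparing
// full queue paths forces a fresh metrics object when the paths differ.
public class MetricsReuseDemo {
    static String metricsFor(String oldQueuePath, String queuePath) {
        // Buggy version was: oldQueuePath != null ? reuse : create.
        return (oldQueuePath != null && oldQueuePath.equals(queuePath))
            ? "reused:" + oldQueuePath   // same full path: safe to reuse
            : "created:" + queuePath;    // null or different path: create new
    }

    public static void main(String[] args) {
        System.out.println(metricsFor(null, "root.a.b"));         // created
        System.out.println(metricsFor("root.a.b", "root.a.b"));   // reused
        System.out.println(metricsFor("root.a.b", "root.a.d.b")); // created
    }
}
```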
[jira] [Updated] (YARN-9756) Create metric that sums total memory/vcores preempted per round
[ https://issues.apache.org/jira/browse/YARN-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9756: --- Attachment: YARN-9756.001.patch > Create metric that sums total memory/vcores preempted per round > --- > > Key: YARN-9756 > URL: https://issues.apache.org/jira/browse/YARN-9756 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2 >Reporter: Eric Payne >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9756.001.patch, YARN-9756.WIP.patch > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9756) Create metric that sums total memory/vcores preempted per round
[ https://issues.apache.org/jira/browse/YARN-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913455#comment-16913455 ] Manikandan R commented on YARN-9756: [~eepayne] Thanks. Attached .001.patch for your reviews. > Create metric that sums total memory/vcores preempted per round > --- > > Key: YARN-9756 > URL: https://issues.apache.org/jira/browse/YARN-9756 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2 >Reporter: Eric Payne >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9756.001.patch, YARN-9756.WIP.patch > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9772) CapacitySchedulerQueueManager has incorrect list of queues
Manikandan R created YARN-9772: -- Summary: CapacitySchedulerQueueManager has incorrect list of queues Key: YARN-9772 URL: https://issues.apache.org/jira/browse/YARN-9772 Project: Hadoop YARN Issue Type: Bug Reporter: Manikandan R Assignee: Manikandan R CapacitySchedulerQueueManager has an incorrect list of queues when there is more than one parent queue (say, at a middle level) with the same name. For example, * root ** a *** b c *** d b * e {{CapacitySchedulerQueueManager#getQueues}} maintains this list of queues. While parsing "root.a.d.b", it overrides "root.a.b" with the new Queue object in the map because of the identical short name. After parsing all the queues, the map count should be 7, but it is 6. Any reference to queue "root.a.b" in the code path is in fact the "root.a.d.b" object. Since {{CapacitySchedulerQueueManager#getQueues}} has been used in multiple places, we will need to understand the implications in detail. For example, {{CapacityScheduler#getQueue}} has been used in many places, which in turn uses {{CapacitySchedulerQueueManager#getQueues}}. cc [~eepayne], [~sunilg] -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
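The map-overwrite behaviour described above can be reproduced with a small standalone model (a toy map of queue paths, not the actual CapacitySchedulerQueueManager): keying by short queue name collapses the 7 queues to 6 entries, while keying by full queue path keeps all 7.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of why the queue map ends up with 6 entries instead of 7 when
// two parent queues share the short name "b" (a toy model, not the real
// manager code): a map keyed by short queue name lets "root.a.d.b"
// silently replace "root.a.b".
public class QueueMapDemo {
    static final String[] PATHS = {
        "root", "root.a", "root.a.b", "root.a.b.c",
        "root.a.d", "root.a.d.b", "root.e"
    };

    static Map<String, String> keyByShortName() {
        Map<String, String> queues = new LinkedHashMap<>();
        for (String path : PATHS) {
            String shortName = path.substring(path.lastIndexOf('.') + 1);
            queues.put(shortName, path); // second "b" overwrites the first
        }
        return queues;
    }

    static Map<String, String> keyByFullPath() {
        Map<String, String> queues = new LinkedHashMap<>();
        for (String path : PATHS) {
            queues.put(path, path); // full paths are unique, nothing is lost
        }
        return queues;
    }

    public static void main(String[] args) {
        System.out.println(keyByShortName().size());   // 6: one "b" was lost
        System.out.println(keyByFullPath().size());    // 7: all queues kept
        System.out.println(keyByShortName().get("b")); // root.a.d.b, not root.a.b
    }
}
```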
[jira] [Updated] (YARN-9772) CapacitySchedulerQueueManager has incorrect list of queues
[ https://issues.apache.org/jira/browse/YARN-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9772: --- Description: CapacitySchedulerQueueManager has an incorrect list of queues when there is more than one parent queue (say, at a middle level) with the same name. For example, * root ** a *** b c *** d b * e {{CapacitySchedulerQueueManager#getQueues}} maintains this list of queues. While parsing "root.a.d.b", it overrides "root.a.b" with the new Queue object in the map because of the identical short name. After parsing all the queues, the map count should be 7, but it is 6. Any reference to queue "root.a.b" in the code path is in fact the "root.a.d.b" object. Since {{CapacitySchedulerQueueManager#getQueues}} has been used in multiple places, we will need to understand the implications in detail. For example, {{CapacityScheduler#getQueue}} has been used in many places, which in turn uses {{CapacitySchedulerQueueManager#getQueues}}. cc [~eepayne], [~sunilg] > CapacitySchedulerQueueManager has incorrect list of queues > -- > > Key: YARN-9772 > URL: https://issues.apache.org/jira/browse/YARN-9772 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > > CapacitySchedulerQueueManager has incorrect list of queues when there is more > than one parent queue (say at middle level) with same name. > For example, > * root > ** a > *** b > c > *** d > b > * e > {{CapacitySchedulerQueueManager#getQueues}} maintains these list of queues. > While parsing "root.a.d.b", it overrides "root.a.b" with new Queue object in > the map because of similar name. After parsing all the queues, map count > should be 7, but it is 6. Any reference to queue "root.a.b" in code path is > nothing but "root.a.d.b" object. Since > {{CapacitySchedulerQueueManager#getQueues}} has been used in multiple places, > will need to understand the implications in detail. For example, > {{CapacityScheduler#getQueue}} has been used in many places which in turn > uses {{CapacitySchedulerQueueManager#getQueues}}. cc [~eepayne], [~sunilg] -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9766) YARN CapacityScheduler QueueMetrics has missing metrics for parent queues having same name
[ https://issues.apache.org/jira/browse/YARN-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913502#comment-16913502 ] Manikandan R commented on YARN-9766: Ok, [~tarunparimi]. Thanks. While understanding this issue in detail, had come across another related issue. Created YARN-9772 for the same. > YARN CapacityScheduler QueueMetrics has missing metrics for parent queues > having same name > -- > > Key: YARN-9766 > URL: https://issues.apache.org/jira/browse/YARN-9766 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Tarun Parimi >Assignee: Tarun Parimi >Priority: Major > > In Capacity Scheduler, we enforce Leaf Queues to have unique names. But it is > not the case for Parent Queues. For example, we can have the below queue > hierarchy, where "b" is the queue name for two different queue paths root.a.b > and root.a.d.b . Since it is not a leaf queue this configuration works and > apps run fine in the leaf queues 'c' and 'e'. > * root > ** a > *** b > c > *** d > b > * e > But the jmx metrics does not show the metrics for the parent queue > "root.a.d.b" . We can see metrics only for "root.a.b" queue. > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9773) PartitionQueueMetrics for Custom Resources/Resource vectors
[ https://issues.apache.org/jira/browse/YARN-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9773: --- Parent: YARN-6492 Issue Type: Sub-task (was: Bug) > PartitionQueueMetrics for Custom Resources/Resource vectors > --- > > Key: YARN-9773 > URL: https://issues.apache.org/jira/browse/YARN-9773 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9773) PartitionQueueMetrics for Custom Resources/Resource vectors
Manikandan R created YARN-9773: -- Summary: PartitionQueueMetrics for Custom Resources/Resource vectors Key: YARN-9773 URL: https://issues.apache.org/jira/browse/YARN-9773 Project: Hadoop YARN Issue Type: Bug Reporter: Manikandan R Assignee: Manikandan R -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6492) Generate queue metrics for each partition
[ https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913526#comment-16913526 ] Manikandan R commented on YARN-6492: Created YARN-9773 for the same. Will split .005 patch and attach the same in corresponding sub tasks shortly. > Generate queue metrics for each partition > - > > Key: YARN-6492 > URL: https://issues.apache.org/jira/browse/YARN-6492 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Jonathan Hung >Assignee: Manikandan R >Priority: Major > Attachments: PartitionQueueMetrics_default_partition.txt, > PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, > YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, > YARN-6492.004.patch, YARN-6492.005.WIP.patch, partition_metrics.txt > > > We are interested in having queue metrics for all partitions. Right now each > queue has one QueueMetrics object which captures metrics either in default > partition or across all partitions. (After YARN-6467 it will be in default > partition) > But having the partition metrics would be very useful. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9756) Create metric that sums total memory/vcores preempted per round
[ https://issues.apache.org/jira/browse/YARN-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913860#comment-16913860 ] Manikandan R commented on YARN-9756: Sorry, I made changes to the {{TestCapacitySchedulerSurgicalPreemption}} test case but missed capturing them in the patch. Attached .002.patch. > Create metric that sums total memory/vcores preempted per round > --- > > Key: YARN-9756 > URL: https://issues.apache.org/jira/browse/YARN-9756 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2 >Reporter: Eric Payne >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9756.001.patch, YARN-9756.002.patch, > YARN-9756.WIP.patch > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9756) Create metric that sums total memory/vcores preempted per round
[ https://issues.apache.org/jira/browse/YARN-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9756: --- Attachment: YARN-9756.002.patch > Create metric that sums total memory/vcores preempted per round > --- > > Key: YARN-9756 > URL: https://issues.apache.org/jira/browse/YARN-9756 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2 >Reporter: Eric Payne >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9756.001.patch, YARN-9756.002.patch, > YARN-9756.WIP.patch > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry
[ https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16914411#comment-16914411 ] Manikandan R commented on YARN-9768: Is this a duplicate of YARN-9478? I have a patch to handle this. Can I post it over there? > RM Renew Delegation token thread should timeout and retry > - > > Key: YARN-9768 > URL: https://issues.apache.org/jira/browse/YARN-9768 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: CR Hota >Priority: Major > > Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews > HDFS tokens received to check for validity and expiration time. > This call is made to an underlying HDFS NN or Router Node (which has exact > APIs as HDFS NN). If one of the nodes is bad and the renew call is stuck the > thread remains stuck indefinitely. The thread should ideally timeout the > renewToken and retry from the client's perspective. > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6492) Generate queue metrics for each partition
[ https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-6492: --- Attachment: YARN-6492.006.WIP.patch > Generate queue metrics for each partition > - > > Key: YARN-6492 > URL: https://issues.apache.org/jira/browse/YARN-6492 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Jonathan Hung >Assignee: Manikandan R >Priority: Major > Attachments: PartitionQueueMetrics_default_partition.txt, > PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, > YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, > YARN-6492.004.patch, YARN-6492.005.WIP.patch, YARN-6492.006.WIP.patch, > partition_metrics.txt > > > We are interested in having queue metrics for all partitions. Right now each > queue has one QueueMetrics object which captures metrics either in default > partition or across all partitions. (After YARN-6467 it will be in default > partition) > But having the partition metrics would be very useful. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6492) Generate queue metrics for each partition
[ https://issues.apache.org/jira/browse/YARN-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917920#comment-16917920 ] Manikandan R commented on YARN-6492: Attaching .006.patch. It covers the changes only required for this JIRA (not any changes related to YARN-9767 & YARN-9773). > Generate queue metrics for each partition > - > > Key: YARN-6492 > URL: https://issues.apache.org/jira/browse/YARN-6492 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Jonathan Hung >Assignee: Manikandan R >Priority: Major > Attachments: PartitionQueueMetrics_default_partition.txt, > PartitionQueueMetrics_x_partition.txt, PartitionQueueMetrics_y_partition.txt, > YARN-6492.001.patch, YARN-6492.002.patch, YARN-6492.003.patch, > YARN-6492.004.patch, YARN-6492.005.WIP.patch, YARN-6492.006.WIP.patch, > partition_metrics.txt > > > We are interested in having queue metrics for all partitions. Right now each > queue has one QueueMetrics object which captures metrics either in default > partition or across all partitions. (After YARN-6467 it will be in default > partition) > But having the partition metrics would be very useful. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9756) Create metric that sums total memory/vcores preempted per round
[ https://issues.apache.org/jira/browse/YARN-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917923#comment-16917923 ] Manikandan R commented on YARN-9756: Attaching patch for branch 3.2. > Create metric that sums total memory/vcores preempted per round > --- > > Key: YARN-9756 > URL: https://issues.apache.org/jira/browse/YARN-9756 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2 >Reporter: Eric Payne >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9756-branch-3.2.003.patch, YARN-9756.001.patch, > YARN-9756.002.patch, YARN-9756.WIP.patch > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9756) Create metric that sums total memory/vcores preempted per round
[ https://issues.apache.org/jira/browse/YARN-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9756: --- Attachment: YARN-9756-branch-3.2.003.patch > Create metric that sums total memory/vcores preempted per round > --- > > Key: YARN-9756 > URL: https://issues.apache.org/jira/browse/YARN-9756 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2 >Reporter: Eric Payne >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9756-branch-3.2.003.patch, YARN-9756.001.patch, > YARN-9756.002.patch, YARN-9756.WIP.patch > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9756) Create metric that sums total memory/vcores preempted per round
[ https://issues.apache.org/jira/browse/YARN-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9756: --- Attachment: YARN-9756-branch-3.0.004.patch YARN-9756-branch-2.8.005.patch > Create metric that sums total memory/vcores preempted per round > --- > > Key: YARN-9756 > URL: https://issues.apache.org/jira/browse/YARN-9756 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2 >Reporter: Eric Payne >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9756-branch-2.8.005.patch, > YARN-9756-branch-3.0.004.patch, YARN-9756-branch-3.2.003.patch, > YARN-9756.001.patch, YARN-9756.002.patch, YARN-9756.WIP.patch > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9756) Create metric that sums total memory/vcores preempted per round
[ https://issues.apache.org/jira/browse/YARN-9756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917934#comment-16917934 ] Manikandan R commented on YARN-9756: Attaching patch for branch 3.0 & branch 2.8. > Create metric that sums total memory/vcores preempted per round > --- > > Key: YARN-9756 > URL: https://issues.apache.org/jira/browse/YARN-9756 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2 >Reporter: Eric Payne >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9756-branch-2.8.005.patch, > YARN-9756-branch-3.0.004.patch, YARN-9756-branch-3.2.003.patch, > YARN-9756.001.patch, YARN-9756.002.patch, YARN-9756.WIP.patch > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry
[ https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917943#comment-16917943 ] Manikandan R commented on YARN-9768: [~crh] [~wangda] Thanks. Attaching patch for your review. I can pull config from YARN configuration if needed. > RM Renew Delegation token thread should timeout and retry > - > > Key: YARN-9768 > URL: https://issues.apache.org/jira/browse/YARN-9768 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: CR Hota >Priority: Major > > Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews > HDFS tokens received to check for validity and expiration time. > This call is made to an underlying HDFS NN or Router Node (which has exact > APIs as HDFS NN). If one of the nodes is bad and the renew call is stuck the > thread remains stuck indefinitely. The thread should ideally timeout the > renewToken and retry from the client's perspective. > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
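One possible shape for such a timeout, sketched with a plain java.util.concurrent executor. This is an illustrative standalone sketch under assumed semantics, not the actual DelegationTokenRenewer change from the patch; the timeout value would presumably come from configuration as discussed above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of bounding a potentially stuck renew call (NOT the real
// DelegationTokenRenewer code): run the renewal on a worker thread and
// give up after a configurable timeout, interrupting the stuck worker.
public class RenewWithTimeout {
    static long renew(Callable<Long> renewCall, long timeoutMs)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Long> f = pool.submit(renewCall);
            try {
                // Block for at most timeoutMs; a stuck NN/Router call no
                // longer wedges the caller indefinitely.
                return f.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                f.cancel(true); // interrupt the stuck renewal attempt
                throw e;        // caller may retry
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A fast renewal succeeds normally.
        System.out.println(renew(() -> 42L, 1000));
        // A stuck renewal (simulated with a long sleep) times out.
        try {
            renew(() -> { Thread.sleep(60_000); return 0L; }, 100);
        } catch (TimeoutException expected) {
            System.out.println("timed out");
        }
    }
}
```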
[jira] [Updated] (YARN-9768) RM Renew Delegation token thread should timeout and retry
[ https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9768: --- Attachment: YARN-9768.001.patch > RM Renew Delegation token thread should timeout and retry > - > > Key: YARN-9768 > URL: https://issues.apache.org/jira/browse/YARN-9768 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: CR Hota >Priority: Major > Attachments: YARN-9768.001.patch > > > Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews > HDFS tokens received to check for validity and expiration time. > This call is made to an underlying HDFS NN or Router Node (which has exact > APIs as HDFS NN). If one of the nodes is bad and the renew call is stuck the > thread remains stuck indefinitely. The thread should ideally timeout the > renewToken and retry from the client's perspective. > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-9478) Add timeout for renew delegation thread pool
[ https://issues.apache.org/jira/browse/YARN-9478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R resolved YARN-9478. Resolution: Duplicate > Add timeout for renew delegation thread pool > > > Key: YARN-9478 > URL: https://issues.apache.org/jira/browse/YARN-9478 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Major > > Yarn by default creates a thread pool with 50 threads to handle all the token > renewal for the running jobs. Currently there is no timeout for the threads > so if there is one application is slowing to renew token, then eventually > Yarn could run into the situation that all the threads are busy with renewing > tokens for such application types and the whole Yarn cluster can't handle new > applications. > Propose to add timeout to the threads in the thread pool so the threads get > killed after certain time. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9767) PartitionQueueMetrics Issues
[ https://issues.apache.org/jira/browse/YARN-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918755#comment-16918755 ] Manikandan R commented on YARN-9767: Attaching .001.patch for review. > PartitionQueueMetrics Issues > > > Key: YARN-9767 > URL: https://issues.apache.org/jira/browse/YARN-9767 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9767.001.patch > > > The intent of the Jira is to capture the issues/observations encountered as > part of YARN-6492 development separately for ease of tracking. > Observations: > Please refer this > https://issues.apache.org/jira/browse/YARN-6492?focusedCommentId=16904027&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16904027 > 1. Since partition info are being extracted from request and node, there is a > problem. For example, > > Node N has been mapped to Label X (non-exclusive). Queue A has been > configured with the ANY node label. App A requested resources from Queue A and > its containers ran on Node N for some reason. During the > AbstractCSQueue#allocateResource call, the node partition (using SchedulerNode) > would get used for the calculation. Let's say the allocate call has been fired for 3 > containers of 1 GB each; then > a. PartitionDefault * queue A -> pending mb is 3 GB > b. PartitionX * queue A -> pending mb is -3 GB > > is the outcome, because the app request has been fired without any label > specification and the #a metric has been derived. After allocation is over, > pending resources usually get decreased. When this happens, the node > partition info is used, hence the #b metric has been derived. > > Given this kind of situation, we will need to put some thought into getting > the metrics right. > > 2. Though the intent of this jira is to do Partition Queue Metrics, we would > like to retain the existing Queue Metrics for backward compatibility (as you > can see from the jira's discussion). > With this patch and the YARN-9596 patch, queue metrics would be > overridden either with some specific partition values or with default partition > values. It could be vice versa as well. For example, after a queue (say > queue A) has been initialised with some min and max cap and also with a node > label's min and max cap, QueueMetrics (availableMB) for queue A returns values > based on the node label's cap config. > I've been working on these observations to provide a fix and attached > .005.WIP.patch. The focus of .005.WIP.patch is to ensure availableMB and > availableVcores are correct (please refer to observation #2 above). Added more > asserts in {{testQueueMetricsWithLabelsOnDefaultLabelNode}} to ensure the fix for > #2 is working properly. > Also, one more thing to note: user metrics for availableMB and availableVcores > at the root queue were not there even before; the same behaviour has been retained. User > metrics for availableMB and availableVcores are available only at the child queue > level and also with partitions. > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9767) PartitionQueueMetrics Issues
[ https://issues.apache.org/jira/browse/YARN-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9767: --- Attachment: YARN-9767.001.patch
[jira] [Comment Edited] (YARN-9767) PartitionQueueMetrics Issues
[ https://issues.apache.org/jira/browse/YARN-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918755#comment-16918755 ] Manikandan R edited comment on YARN-9767 at 8/29/19 4:23 PM: - [~eepayne] Attaching .001.patch for review. Can you please take a look? was (Author: maniraj...@gmail.com): Attaching .001.patch for review.
[jira] [Commented] (YARN-9773) Add QueueMetrics for Custom Resources
[ https://issues.apache.org/jira/browse/YARN-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918779#comment-16918779 ] Manikandan R commented on YARN-9773: [~eepayne] Attaching .001.patch for review. Custom resource metrics would be registered into JMX similarly to the "running_*" metrics. > Add QueueMetrics for Custom Resources > - > > Key: YARN-9773 > URL: https://issues.apache.org/jira/browse/YARN-9773 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9773.001.patch > > > Although the custom resource metrics are calculated and saved as a > QueueMetricsForCustomResources object within the QueueMetrics class, the JMX > and Simon QueueMetrics do not report that information for custom resources.
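As a rough illustration of "one dynamically named metric per custom resource type" (the naming scheme and function names below are assumptions for the sketch, not the patch's actual JMX names):

```python
# Hypothetical sketch: derive one gauge name per (metric prefix, custom
# resource type) pair, by analogy with the dynamically named "running_*"
# metrics. The "Prefix.resource" naming here is an assumption.
def custom_resource_metric_names(prefix, resource_types):
    # e.g. "AllocatedResource.yarn.io/gpu"
    return {r: f"{prefix}.{r}" for r in resource_types}

registry = {}
for prefix in ("AllocatedResource", "PendingResource"):
    names = custom_resource_metric_names(prefix, ["yarn.io/gpu", "yarn.io/fpga"])
    registry.update((name, 0) for name in names.values())  # gauge start value

print(sorted(registry))  # four gauges, one per prefix/resource pair
```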
[jira] [Updated] (YARN-9773) Add QueueMetrics for Custom Resources
[ https://issues.apache.org/jira/browse/YARN-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9773: --- Attachment: YARN-9773.001.patch
[jira] [Updated] (YARN-9768) RM Renew Delegation token thread should timeout and retry
[ https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9768: --- Attachment: YARN-9768.002.patch > RM Renew Delegation token thread should timeout and retry > - > > Key: YARN-9768 > URL: https://issues.apache.org/jira/browse/YARN-9768 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: CR Hota >Priority: Major > Attachments: YARN-9768.001.patch, YARN-9768.002.patch > > > The delegation token renewer thread in the RM (DelegationTokenRenewer.java) renews > the HDFS tokens it receives to check their validity and expiration time. > This call is made to an underlying HDFS NN or Router node (which exposes the same > APIs as the HDFS NN). If one of those nodes is bad and the renew call gets stuck, the > thread remains stuck indefinitely. The thread should ideally time out the > renew call and retry from the client's perspective.
[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry
[ https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924378#comment-16924378 ] Manikandan R commented on YARN-9768: [~crh] [~elgoiri] Thanks for the review. Sorry for the delay. Extended a bit to have max retry attempts as well, in addition to the test case changes. Please take a look. Once everything is fine, I can take care of the documentation part.
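The timeout-plus-bounded-retry idea discussed here can be sketched in a few lines of plain Python (this is not the RM's DelegationTokenRenewer; all names are illustrative): run the renew call on a worker thread, give up on each attempt after a timeout, and stop after a maximum number of attempts.

```python
# Minimal sketch: per-attempt timeout around a potentially stuck renew
# call, with a bounded number of retries.
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def renew_with_retries(renew_fn, timeout_sec, max_attempts):
    with ThreadPoolExecutor(max_workers=1) as pool:
        for attempt in range(1, max_attempts + 1):
            future = pool.submit(renew_fn)
            try:
                return future.result(timeout=timeout_sec)
            except FutureTimeout:
                future.cancel()  # best effort; a stuck RPC thread may linger
                if attempt == max_attempts:
                    raise TimeoutError(
                        f"renew timed out after {max_attempts} attempts")
```

Note the caveat in the comment: a timeout abandons the attempt from the caller's perspective, but it cannot forcibly kill a thread blocked inside a stuck RPC, which is why bounding the retries also matters.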
[jira] [Commented] (YARN-9773) Add QueueMetrics for Custom Resources
[ https://issues.apache.org/jira/browse/YARN-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924384#comment-16924384 ] Manikandan R commented on YARN-9773: [~eepayne] Thanks for the review. Attached .002.patch.
[jira] [Updated] (YARN-9773) Add QueueMetrics for Custom Resources
[ https://issues.apache.org/jira/browse/YARN-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9773: --- Attachment: YARN-9773.002.patch
[jira] [Commented] (YARN-9772) CapacitySchedulerQueueManager has incorrect list of queues
[ https://issues.apache.org/jira/browse/YARN-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926752#comment-16926752 ] Manikandan R commented on YARN-9772: To keep it simple, should we extend the duplicates check (as of now, it applies only to leaf queues) to parent queues as well? [~sunilg] [~wangda] [~weiweiyagn666] [~eepayne] Please share your thoughts. > CapacitySchedulerQueueManager has incorrect list of queues > -- > > Key: YARN-9772 > URL: https://issues.apache.org/jira/browse/YARN-9772 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > > CapacitySchedulerQueueManager has an incorrect list of queues when there is more > than one parent queue (say, at a middle level) with the same name. > For example: > * root > ** a > *** b > **** c > *** d > **** b > * e > {{CapacitySchedulerQueueManager#getQueues}} maintains this list of queues. > While parsing "root.a.d.b", it overwrites "root.a.b" with a new Queue object in > the map because of the identical short name. After parsing all the queues, the map count > should be 7, but it is 6. Any reference to queue "root.a.b" in the code path actually > resolves to the "root.a.d.b" object. Since > {{CapacitySchedulerQueueManager#getQueues}} is used in multiple places, we > will need to understand the implications in detail. For example, > {{CapacityScheduler#getQueue}}, which in turn uses > {{CapacitySchedulerQueueManager#getQueues}}, is itself used in many places. cc [~eepayne], [~sunilg]
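The collision is easy to reproduce with a toy version of the map (plain Python, not the CapacitySchedulerQueueManager code): keying queues by their short name silently overwrites "root.a.b" with "root.a.d.b".

```python
# Toy reproduction of the getQueues short-name collision described above.
# Hierarchy: root -> a -> (b -> c, d -> b), plus e: 7 queues total.
queue_paths = [
    "root", "root.a", "root.a.b", "root.a.b.c",
    "root.a.d", "root.a.d.b", "root.e",
]

queues = {}
for path in queue_paths:
    short_name = path.rsplit(".", 1)[-1]
    queues[short_name] = path  # the later "b" overwrites the earlier "b"

print(len(queue_paths))  # 7 queues configured
print(len(queues))       # 6 entries survive in the map
print(queues["b"])       # "root.a.d.b" -- "root.a.b" is unreachable
```

Keying the map by the full path instead of the short name (or rejecting duplicate short names at config-parse time, as the comment proposes) avoids the overwrite.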
[jira] [Comment Edited] (YARN-9772) CapacitySchedulerQueueManager has incorrect list of queues
[ https://issues.apache.org/jira/browse/YARN-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926752#comment-16926752 ] Manikandan R edited comment on YARN-9772 at 9/10/19 4:14 PM: - To keep it simple, and given that the probability of a parent and a leaf queue having the same name is very low, should we extend the duplicates check (as of now, it applies only to leaf queues) to parent queues as well? [~sunilg] [~wangda] [~weiweiyagn666] [~eepayne] Please share your thoughts. was (Author: maniraj...@gmail.com): To keep it simple, Should we extend the duplicates check (as of now, it does only for leaf queues) to parent queues as well? [~sunilg] [~wangda] [~weiweiyagn666] [~eepayne] Please share your thoughts.
[jira] [Commented] (YARN-9772) CapacitySchedulerQueueManager has incorrect list of queues
[ https://issues.apache.org/jira/browse/YARN-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931577#comment-16931577 ] Manikandan R commented on YARN-9772: True, [~tarunparimi], but I think those situations are very unlikely. Maybe detailed documentation would help in this context. [~sunilg] [~wangda] Can you also share your thoughts?
[jira] [Comment Edited] (YARN-9772) CapacitySchedulerQueueManager has incorrect list of queues
[ https://issues.apache.org/jira/browse/YARN-9772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931577#comment-16931577 ] Manikandan R edited comment on YARN-9772 at 9/17/19 4:17 PM: - True, [~tarunparimi], but I think those situations are very unlikely, and detailed documentation would help in this context. [~sunilg] [~wangda] Can you also share your thoughts? was (Author: maniraj...@gmail.com): True, [~tarunparimi] but I think those situations are very unlikely. May be detailed documentation would help in this context. [~sunilg] [~wangda] Can you also share your thoughts?
[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry
[ https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931643#comment-16931643 ] Manikandan R commented on YARN-9768: [~elgoiri] [~crh] Can you review?
[jira] [Commented] (YARN-9627) DelegationTokenRenewer could block transitionToStandy
[ https://issues.apache.org/jira/browse/YARN-9627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932570#comment-16932570 ] Manikandan R commented on YARN-9627: {quote}Issue could block switch over when HDFS token renewal takes time {quote} YARN-9768 handles this problem based on timeout approach. Mind taking a look at the patch and share your thoughts as it is related to this JIRA? > DelegationTokenRenewer could block transitionToStandy > - > > Key: YARN-9627 > URL: https://issues.apache.org/jira/browse/YARN-9627 > Project: Hadoop YARN > Issue Type: Bug >Reporter: krishna reddy >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: YARN-9627.001.patch, YARN-9627.002.patch, > YARN-9627.003.patch > > > Cluster size: 5K > Running containers: 55K > *Scenario*: Largenumber of pending applications (around 50K) and performing > RM switch over > Below exception : > {noformat} > 2019-06-13 17:39:27,594 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Renew Kind: HDFS_DELEGATION_TOKEN, Service: X:1616, Ident: (token > for root: HDFS_DELEGATION_TOKEN owner=root/had...@hadoop.com, renewer=yarn, > realUser=, issueDate=1560361265181, maxDate=1560966065181, > sequenceNumber=104708, masterKeyId=3);exp=1560533965360; > apps=[application_1560346941775_20702] in 86397766 ms, appId = > [application_1560346941775_20702] > 2019-06-13 17:39:27,609 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer on recovery. 
> java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:522) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleDTRenewerAppRecoverEvent(DelegationTokenRenewer.java:953) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:79) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:912) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > > 2019-06-13 17:58:20,878 ERROR org.apache.zookeeper.ClientCnxn: Time out error > occurred for the packet 'clientPath:null serverPath:null finished:false > header:: 27,4 replyHeader:: 27,4295687588,0 request:: > '/rmstore1/ZKRMStateRoot/RMDTSecretManagerRoot/RMDTMasterKeysRoot/DelegationKey_49,F > response:: > #31ff8a16b74ffe129768ffdbffe949ff8dffd517ffcafffa,s{4295423577,4295423577,1560342837789,1560342837789,0,0,0,0,17,0,4295423577} > '. > 2019-06-13 17:58:20,877 INFO > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Renewed delegation-token= [Kind: HDFS_DELEGATION_TOKEN, Service: > X:1616, Ident: (token for root: HDFS_DELEGATION_TOKEN > owner=root/had...@hadoop.com, renewer=yarn, realUser=, > issueDate=1560366110990, maxDate=1560970910990, sequenceNumber=111891, > masterKeyId=3);exp=1560534896413; apps=[application_1560346941775_28115]] > 2019-06-13 17:58:20,924 WARN > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer: > Unable to add the application to the delegation token renewer on recovery. > java.lang.IllegalStateException: Timer already cancelled. 
> at java.util.Timer.sched(Timer.java:397) > at java.util.Timer.schedule(Timer.java:208) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.setTimerForTokenRenewal(DelegationTokenRenewer.java:612) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:523) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleDTRenewerAppRecoverEvent(DelegationTokenRenewer.java:953) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:79) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:912) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748 > {
[jira] [Commented] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping
[ https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16934499#comment-16934499 ] Manikandan R commented on YARN-9840: [~pbacsko] I have a patch to address this. Can I post it? > Capacity scheduler: add support for Secondary Group rule mapping > > > Key: YARN-9840 > URL: https://issues.apache.org/jira/browse/YARN-9840 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > > Currently, Capacity Scheduler only supports primary group rule mapping like > this: > {{u:%user:%primary_group}} > Fair Scheduler already supports a secondary group placement rule. Let's add > this to CS to reduce the feature gap. > Class of interest: > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java
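The secondary-group idea being ported from Fair Scheduler can be sketched as follows (plain Python, names invented for illustration): scan the user's groups beyond the primary one and pick the first that matches an existing queue.

```python
# Hedged sketch of secondary-group resolution, in the spirit of Fair
# Scheduler's secondary-group placement rule: the first group after the
# primary group that corresponds to an existing queue wins.
def resolve_secondary_group(user_groups, existing_queues):
    # by convention here, user_groups[0] is the primary group
    for group in user_groups[1:]:
        if group in existing_queues:
            return group
    return None  # no secondary group matches a queue; rule does not place

groups = ["engineers", "analytics", "dev"]
print(resolve_secondary_group(groups, {"analytics", "dev"}))  # analytics
print(resolve_secondary_group(groups, {"marketing"}))         # None
```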
[jira] [Commented] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping
[ https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16934982#comment-16934982 ] Manikandan R commented on YARN-9840: Thanks [~pbacsko]. Attached .001.patch.
[jira] [Updated] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping
[ https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9840: --- Attachment: YARN-9840.001.patch
[jira] [Updated] (YARN-9773) Add QueueMetrics for Custom Resources
[ https://issues.apache.org/jira/browse/YARN-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9773: --- Attachment: YARN-9773.003.patch
[jira] [Commented] (YARN-9773) Add QueueMetrics for Custom Resources
[ https://issues.apache.org/jira/browse/YARN-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935370#comment-16935370 ] Manikandan R commented on YARN-9773: Thanks [~eepayne] for your review. Attached .003.patch.
[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935371#comment-16935371 ] Manikandan R commented on YARN-9841: [~pbacsko] Shall I take this forward as it is related to YARN-9840? > Capacity scheduler: add support for combined %user + %primary_group mapping > --- > > Key: YARN-9841 > URL: https://issues.apache.org/jira/browse/YARN-9841 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > > Right now in CS, using {{%primary_group}} with a parent queue is only > possible this way: > {{u:%user:parentqueue.%primary_group}} > Looking at > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java, > we cannot do something like: > {{u:%user:%primary_group.%user}} > Fair Scheduler supports a nested rule where such a placement/mapping rule is > possible. This improvement would reduce this feature gap.
[jira] [Updated] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9841: --- Attachment: YARN-9841.001.patch
[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937024#comment-16937024 ] Manikandan R commented on YARN-9841: Thanks [~pbacsko]. Attached .001.patch for review. While working on this JIRA, I came across the following observations: # As documented in [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html], I tried the "u:user2:%primary_group" mapping and don't think it works as expected. The expected output is a queue named after the user's primary group, but that is not the case. # The "u:%user:parentqueue.%user" mapping does not return the expected output when used in conjunction with the "u:%user:%primary_group" mapping, whereas using "u:%user:parentqueue.%user" alone works as expected. I created a separate junit patch to validate these observations. Can you please confirm? We can raise separate JIRAs to address these issues based on your confirmation.
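The mappings discussed above all substitute placeholders such as %user and %primary_group into a dotted queue path. A minimal sketch of that substitution, with hypothetical names (QueueMappingResolver is illustrative, not a Hadoop class):

```java
// Hypothetical sketch: resolving %user / %primary_group placeholders in a
// CS queue-mapping path, e.g. "%primary_group.%user" -> "engineers.alice".
// Illustrative only; the real logic lives in UserGroupMappingPlacementRule.
public class QueueMappingResolver {
    public static String resolve(String path, String user, String primaryGroup) {
        // Replace the longer token first so partial matches cannot occur.
        return path.replace("%primary_group", primaryGroup)
                   .replace("%user", user);
    }

    public static void main(String[] args) {
        System.out.println(resolve("parentqueue.%primary_group", "alice", "engineers"));
        System.out.println(resolve("%primary_group.%user", "alice", "engineers"));
    }
}
```

A nested rule like {{u:%user:%primary_group.%user}} is just the second call: both the parent and the leaf come from placeholders.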
[jira] [Updated] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9841: --- Attachment: YARN-9841.junit.patch
[jira] [Comment Edited] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937024#comment-16937024 ] Manikandan R edited comment on YARN-9841 at 9/24/19 5:59 PM: - Thanks [~pbacsko]. Attached .001.patch for review. While working on this JIRA, I came across the following observations: # As documented in [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html], I tried the "u:user2:%primary_group" mapping and don't think it works as expected. The expected output is a queue named after the user's primary group, but that is not the case. The "u:%user:%primary_group" mapping, however, works as expected. # The "u:%user:parentqueue.%user" mapping does not return the expected output when used in conjunction with the "u:%user:%primary_group" mapping, whereas using "u:%user:parentqueue.%user" alone works as expected. I created a separate junit patch to validate these observations. Can you please confirm? We can raise separate JIRAs to address these issues based on your confirmation.
[jira] [Updated] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping
[ https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9840: --- Attachment: YARN-9840.002.patch > Capacity scheduler: add support for Secondary Group rule mapping > > > Key: YARN-9840 > URL: https://issues.apache.org/jira/browse/YARN-9840 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Peter Bacsko >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9840.001.patch, YARN-9840.002.patch > > > Currently, Capacity Scheduler only supports primary group rule mapping like > this: > {{u:%user:%primary_group}} > Fair scheduler already supports secondary group placement rule. Let's add > this to CS to reduce the feature gap. > Class of interest: > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java
[jira] [Comment Edited] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping
[ https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937061#comment-16937061 ] Manikandan R edited comment on YARN-9840 at 9/24/19 6:07 PM: - [~pbacsko] Thanks for your review. Addressed all your comments. Attached .002.patch. {quote}What if there's no secondary group and we return {{null}}? Can't it cause an NPE somewhere else? {quote} In this case, it doesn't throw any exception and makes use of the 'default' queue. The newly added asserts cover this. A debug log has also been added. {quote}One more thing - this enhancement should be documented. {quote} Yes. It requires some more clarity, as mentioned in https://issues.apache.org/jira/browse/YARN-9841?focusedCommentId=16937024&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937024. Will do in the next patch.
[jira] [Commented] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping
[ https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16937061#comment-16937061 ] Manikandan R commented on YARN-9840: [~pbacsko] Thanks for your review. Addressed all your comments. Attached .002.patch. {quote}What if there's no secondary group and we return {{null}}? Can't it cause an NPE somewhere else? {quote} In this case, it doesn't throw any exception and makes use of the 'default' queue. The newly added asserts cover this. A debug log has also been added. {quote}One more thing - this enhancement should be documented. {quote} It requires some more clarity, as mentioned in https://issues.apache.org/jira/browse/YARN-9841?focusedCommentId=16937024&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937024. Will do in the next patch.
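The no-secondary-group case discussed above can be sketched as follows. This is illustrative only: the names are hypothetical, and the real UserGroupMappingPlacementRule resolves groups through Hadoop's group-mapping service rather than taking a plain list.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: fall back to the 'default' queue when the user has
// no secondary group, instead of returning null (so no NPE can propagate).
public class SecondaryGroupFallback {
    static final String DEFAULT_QUEUE = "default";

    // Treat the first group as primary; the secondary group, if any, is the
    // next one. With fewer than two groups there is no secondary group.
    public static String mapToQueue(List<String> groups) {
        if (groups == null || groups.size() < 2) {
            return DEFAULT_QUEUE;
        }
        return groups.get(1);
    }

    public static void main(String[] args) {
        System.out.println(mapToQueue(Arrays.asList("engineers", "hadoop")));
        System.out.println(mapToQueue(Arrays.asList("engineers")));
    }
}
```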
[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry
[ https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938759#comment-16938759 ] Manikandan R commented on YARN-9768: [~inigoiri] [~crh] Can you review? > RM Renew Delegation token thread should timeout and retry > - > > Key: YARN-9768 > URL: https://issues.apache.org/jira/browse/YARN-9768 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: CR Hota >Priority: Major > Attachments: YARN-9768.001.patch, YARN-9768.002.patch > > > Delegation token renewer thread in RM (DelegationTokenRenewer.java) renews > HDFS tokens received to check for validity and expiration time. > This call is made to an underlying HDFS NN or Router Node (which has exact > APIs as HDFS NN). If one of the nodes is bad and the renew call is stuck the > thread remains stuck indefinitely. The thread should ideally timeout the > renewToken and retry from the client's perspective.
[jira] [Updated] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9841: --- Attachment: YARN-9841.002.patch
[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938834#comment-16938834 ] Manikandan R commented on YARN-9841: Thanks [~pbacsko] for the review. Addressed all of your comments. Attached .002.patch. {quote}If we have this for {{%primary_group}}, can't we just handle {{%secondary_group}} as well? {quote} I initially thought about this, but then preferred to take it up separately for ease of tracking and to avoid confusion with the description etc. Hope you are fine with that. Also, have you had a chance to look at the observations raised earlier? We can track those issues in separate JIRAs. {quote}Can {{ctx}} ever be null? I assume this test should produce the same behavior each time, provided the code-under-test doesn't change. So to me it seems more logical to make sure that {{ctx}} is not null, which practically means a new assertion. Btw this applies to the piece of code above, too. {quote} Made changes in {{TestCapacitySchedulerQueueMappingFactory}}, but not in {{TestUserGroupMappingPlacementRule}}, as it is commonly used by various asserts and in some cases ctx is null.
[jira] [Comment Edited] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938834#comment-16938834 ] Manikandan R edited comment on YARN-9841 at 9/26/19 5:29 PM: - Thanks [~pbacsko] for the review. Addressed all of your comments. Attached .002.patch. {quote}If we have this for {{%primary_group}}, can't we just handle {{%secondary_group}} as well? {quote} I initially thought about this, but then preferred to take it up in a separate Jira for ease of tracking and to avoid confusion with the description, discussions etc. Hope you are fine with that. Also, have you had a chance to look at the observations raised earlier? We can track those issues in separate JIRAs. {quote}Can {{ctx}} ever be null? I assume this test should produce the same behavior each time, provided the code-under-test doesn't change. So to me it seems more logical to make sure that {{ctx}} is not null, which practically means a new assertion. Btw this applies to the piece of code above, too. {quote} Made changes in {{TestCapacitySchedulerQueueMappingFactory}}, but not in {{TestUserGroupMappingPlacementRule}}, as it is commonly used by various asserts and in some cases ctx is null.
[jira] [Updated] (YARN-9768) RM Renew Delegation token thread should timeout and retry
[ https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9768: --- Attachment: YARN-9768.003.patch
[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry
[ https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940115#comment-16940115 ] Manikandan R commented on YARN-9768: [~inigoiri] Thanks for the review. Sorry, there was a problem with the Eclipse formatter. Fixed. Addressed almost all comments. Regarding the sleeps: since there are multiple retries with a fixed interval, sleeping helps ensure the max retry attempts have been exhausted.
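The timeout-and-retry pattern discussed in this issue can be sketched by bounding each renew call with Future.get(timeout) and retrying a fixed number of times. All names here (renewOnce, MAX_RETRIES, RENEW_TIMEOUT_SECS) are illustrative assumptions, not the DelegationTokenRenewer API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: run each renew attempt on a worker thread so the
// caller can time out and retry instead of blocking indefinitely on a
// stuck NN/Router.
public class RenewWithTimeout {
    static final int MAX_RETRIES = 3;
    static final long RENEW_TIMEOUT_SECS = 1;

    public static long renewWithRetry(Callable<Long> renewOnce) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
                Future<Long> f = pool.submit(renewOnce);
                try {
                    return f.get(RENEW_TIMEOUT_SECS, TimeUnit.SECONDS);
                } catch (TimeoutException e) {
                    f.cancel(true); // interrupt the stuck renew call
                    if (attempt == MAX_RETRIES) {
                        throw e; // retries exhausted
                    }
                }
            }
            throw new IllegalStateException("unreachable");
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A renew call that succeeds immediately.
        System.out.println(renewWithRetry(() -> 42L));
    }
}
```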
[jira] [Created] (YARN-9865) Capacity scheduler: add support for combined %user + %secondary_group mapping
Manikandan R created YARN-9865: -- Summary: Capacity scheduler: add support for combined %user + %secondary_group mapping Key: YARN-9865 URL: https://issues.apache.org/jira/browse/YARN-9865 Project: Hadoop YARN Issue Type: Bug Reporter: Manikandan R Assignee: Manikandan R
[jira] [Updated] (YARN-9865) Capacity scheduler: add support for combined %user + %secondary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9865: --- Description: Similar to YARN-9841, but for the secondary group. > Capacity scheduler: add support for combined %user + %secondary_group mapping > - > > Key: YARN-9865 > URL: https://issues.apache.org/jira/browse/YARN-9865 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > > Similar to YARN-9841, but for the secondary group.
[jira] [Updated] (YARN-9865) Capacity scheduler: add support for combined %user + %secondary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9865: --- Attachment: YARN-9865.001.patch
[jira] [Commented] (YARN-9865) Capacity scheduler: add support for combined %user + %secondary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940245#comment-16940245 ] Manikandan R commented on YARN-9865: Attached .001.patch.
[jira] [Created] (YARN-9866) u:user2:%primary_group is not working as expected
Manikandan R created YARN-9866: -- Summary: u:user2:%primary_group is not working as expected Key: YARN-9866 URL: https://issues.apache.org/jira/browse/YARN-9866 Project: Hadoop YARN Issue Type: Bug Reporter: Manikandan R Assignee: Manikandan R
[jira] [Updated] (YARN-9866) u:user2:%primary_group is not working as expected
[ https://issues.apache.org/jira/browse/YARN-9866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9866: --- Description: Please refer to #1 in https://issues.apache.org/jira/browse/YARN-9841?focusedCommentId=16937024&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937024 for more details.
[jira] [Created] (YARN-9867) "u:%user:parentqueue.%user" is not working as expected
Manikandan R created YARN-9867: -- Summary: "u:%user:parentqueue.%user" is not working as expected Key: YARN-9867 URL: https://issues.apache.org/jira/browse/YARN-9867 Project: Hadoop YARN Issue Type: Bug Reporter: Manikandan R Assignee: Manikandan R
[jira] [Updated] (YARN-9867) "u:%user:parentqueue.%user" is not working as expected
[ https://issues.apache.org/jira/browse/YARN-9867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9867: --- Description: Please refer #2 in https://issues.apache.org/jira/browse/YARN-9841?focusedCommentId=16937024&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937024 for more details > "u:%user:parentqueue.%user" is not working as expected > -- > > Key: YARN-9867 > URL: https://issues.apache.org/jira/browse/YARN-9867 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > > Please refer #2 in > https://issues.apache.org/jira/browse/YARN-9841?focusedCommentId=16937024&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937024 > for more details
[jira] [Created] (YARN-9868) Validate %primary_group queue in CS queue manager
Manikandan R created YARN-9868: -- Summary: Validate %primary_group queue in CS queue manager Key: YARN-9868 URL: https://issues.apache.org/jira/browse/YARN-9868 Project: Hadoop YARN Issue Type: Bug Reporter: Manikandan R Assignee: Manikandan R As part of %secondary_group mapping, we ensure the output of %secondary_group is available as a queue, using CSQueueManager, while processing the queue mapping. Similarly, we will need to do the same for %primary_group.
[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940249#comment-16940249 ] Manikandan R commented on YARN-9841: {quote}I'm fine with a separate JIRA. {quote} Created YARN-9865 {quote}I haven't had the chance to examine the mapping behaviour {quote} Created YARN-9866 and YARN-9867 for 2 issues. [~Prabhu Joseph] If you don't see these observations as issues, we can close if needed. > Capacity scheduler: add support for combined %user + %primary_group mapping > --- > > Key: YARN-9841 > URL: https://issues.apache.org/jira/browse/YARN-9841 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Peter Bacsko >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9841.001.patch, YARN-9841.001.patch, > YARN-9841.002.patch, YARN-9841.junit.patch > > > Right now in CS, using {{%primary_group}} with a parent queue is only > possible this way: > {{u:%user:parentqueue.%primary_group}} > Looking at > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java, > we cannot do something like: > {{u:%user:%primary_group.%user}} > Fair Scheduler supports a nested rule where such a placement/mapping rule is > possible. This improvement would reduce this feature gap.
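The feature gap discussed in YARN-9841 can be sketched as a capacity-scheduler.xml fragment. This is illustrative only: the parent queue name is hypothetical, the first mapping is the form the issue description says works today, and the second is the nested %primary_group.%user form the JIRA proposes:

```xml
<!-- Queue mappings are a comma-separated list, evaluated in order;
     the first matching rule wins. -->
<property>
  <name>yarn.scheduler.capacity.queue-mappings</name>
  <!-- Works today: fixed parent queue, %primary_group resolved as the leaf. -->
  <!-- Proposed in YARN-9841: %primary_group as the parent, %user as the leaf. -->
  <value>u:%user:parentqueue.%primary_group,u:%user:%primary_group.%user</value>
</property>
```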
[jira] [Updated] (YARN-9865) Capacity scheduler: add support for combined %user + %secondary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9865: --- Issue Type: Improvement (was: Bug) > Capacity scheduler: add support for combined %user + %secondary_group mapping > - > > Key: YARN-9865 > URL: https://issues.apache.org/jira/browse/YARN-9865 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9865.001.patch > > > Similar to YARN-9841, but for secondary group.
[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16941101#comment-16941101 ] Manikandan R commented on YARN-9841: Thanks for your validation. {quote}"u:user2:%primary_group" - I can confirm that it's not working{quote} Ok {quote}I think this is not a bug, at least not the way you're suggesting.{quote} Yes, you are right. I also checked it again. {quote}Here, the problem is that end-users are not notified about this. {quote} Yes. In general, I think we will need to improve the documentation to help users, especially on the precedence of mapping rules. > Capacity scheduler: add support for combined %user + %primary_group mapping > --- > > Key: YARN-9841 > URL: https://issues.apache.org/jira/browse/YARN-9841 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Peter Bacsko >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9841.001.patch, YARN-9841.001.patch, > YARN-9841.002.patch, YARN-9841.junit.patch > > > Right now in CS, using {{%primary_group}} with a parent queue is only > possible this way: > {{u:%user:parentqueue.%primary_group}} > Looking at > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java, > we cannot do something like: > {{u:%user:%primary_group.%user}} > Fair Scheduler supports a nested rule where such a placement/mapping rule is > possible. This improvement would reduce this feature gap.
[jira] [Commented] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping
[ https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947063#comment-16947063 ] Manikandan R commented on YARN-9840: Sorry for the delay. Attached .003.patch. > Capacity scheduler: add support for Secondary Group rule mapping > > > Key: YARN-9840 > URL: https://issues.apache.org/jira/browse/YARN-9840 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Peter Bacsko >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9840.001.patch, YARN-9840.002.patch, > YARN-9840.003.patch > > > Currently, Capacity Scheduler only supports primary group rule mapping like > this: > {{u:%user:%primary_group}} > Fair scheduler already supports secondary group placement rule. Let's add > this to CS to reduce the feature gap. > Class of interest: > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java
[jira] [Updated] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping
[ https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9840: --- Attachment: YARN-9840.003.patch > Capacity scheduler: add support for Secondary Group rule mapping > > > Key: YARN-9840 > URL: https://issues.apache.org/jira/browse/YARN-9840 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Peter Bacsko >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9840.001.patch, YARN-9840.002.patch, > YARN-9840.003.patch > > > Currently, Capacity Scheduler only supports primary group rule mapping like > this: > {{u:%user:%primary_group}} > Fair scheduler already supports secondary group placement rule. Let's add > this to CS to reduce the feature gap. > Class of interest: > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java
[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947070#comment-16947070 ] Manikandan R commented on YARN-9841: Attached .003.patch to fix the checkstyle issues. > Capacity scheduler: add support for combined %user + %primary_group mapping > --- > > Key: YARN-9841 > URL: https://issues.apache.org/jira/browse/YARN-9841 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Peter Bacsko >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9841.001.patch, YARN-9841.001.patch, > YARN-9841.002.patch, YARN-9841.003.patch, YARN-9841.junit.patch > > > Right now in CS, using {{%primary_group}} with a parent queue is only > possible this way: > {{u:%user:parentqueue.%primary_group}} > Looking at > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java, > we cannot do something like: > {{u:%user:%primary_group.%user}} > Fair Scheduler supports a nested rule where such a placement/mapping rule is > possible. This improvement would reduce this feature gap.
[jira] [Updated] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9841: --- Attachment: YARN-9841.003.patch > Capacity scheduler: add support for combined %user + %primary_group mapping > --- > > Key: YARN-9841 > URL: https://issues.apache.org/jira/browse/YARN-9841 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Peter Bacsko >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9841.001.patch, YARN-9841.001.patch, > YARN-9841.002.patch, YARN-9841.003.patch, YARN-9841.junit.patch > > > Right now in CS, using {{%primary_group}} with a parent queue is only > possible this way: > {{u:%user:parentqueue.%primary_group}} > Looking at > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java, > we cannot do something like: > {{u:%user:%primary_group.%user}} > Fair Scheduler supports a nested rule where such a placement/mapping rule is > possible. This improvement would reduce this feature gap.
[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16947702#comment-16947702 ] Manikandan R commented on YARN-9841: Thanks [~pbacsko]. Attached .004.patch. > Capacity scheduler: add support for combined %user + %primary_group mapping > --- > > Key: YARN-9841 > URL: https://issues.apache.org/jira/browse/YARN-9841 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Peter Bacsko >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9841.001.patch, YARN-9841.001.patch, > YARN-9841.002.patch, YARN-9841.003.patch, YARN-9841.004.patch, > YARN-9841.junit.patch > > > Right now in CS, using {{%primary_group}} with a parent queue is only > possible this way: > {{u:%user:parentqueue.%primary_group}} > Looking at > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java, > we cannot do something like: > {{u:%user:%primary_group.%user}} > Fair Scheduler supports a nested rule where such a placement/mapping rule is > possible. This improvement would reduce this feature gap.
[jira] [Updated] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9841: --- Attachment: YARN-9841.004.patch > Capacity scheduler: add support for combined %user + %primary_group mapping > --- > > Key: YARN-9841 > URL: https://issues.apache.org/jira/browse/YARN-9841 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Peter Bacsko >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9841.001.patch, YARN-9841.001.patch, > YARN-9841.002.patch, YARN-9841.003.patch, YARN-9841.004.patch, > YARN-9841.junit.patch > > > Right now in CS, using {{%primary_group}} with a parent queue is only > possible this way: > {{u:%user:parentqueue.%primary_group}} > Looking at > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java, > we cannot do something like: > {{u:%user:%primary_group.%user}} > Fair Scheduler supports a nested rule where such a placement/mapping rule is > possible. This improvement would reduce this feature gap.
[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952987#comment-16952987 ] Manikandan R commented on YARN-9841: Sorry for the delay. Attached .005.patch for doc changes. > Capacity scheduler: add support for combined %user + %primary_group mapping > --- > > Key: YARN-9841 > URL: https://issues.apache.org/jira/browse/YARN-9841 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Peter Bacsko >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9841.001.patch, YARN-9841.001.patch, > YARN-9841.002.patch, YARN-9841.003.patch, YARN-9841.004.patch, > YARN-9841.junit.patch > > > Right now in CS, using {{%primary_group}} with a parent queue is only > possible this way: > {{u:%user:parentqueue.%primary_group}} > Looking at > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java, > we cannot do something like: > {{u:%user:%primary_group.%user}} > Fair Scheduler supports a nested rule where such a placement/mapping rule is > possible. This improvement would reduce this feature gap.
[jira] [Updated] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9841: --- Attachment: YARN-9841.005.patch > Capacity scheduler: add support for combined %user + %primary_group mapping > --- > > Key: YARN-9841 > URL: https://issues.apache.org/jira/browse/YARN-9841 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Peter Bacsko >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9841.001.patch, YARN-9841.001.patch, > YARN-9841.002.patch, YARN-9841.003.patch, YARN-9841.004.patch, > YARN-9841.005.patch, YARN-9841.junit.patch > > > Right now in CS, using {{%primary_group}} with a parent queue is only > possible this way: > {{u:%user:parentqueue.%primary_group}} > Looking at > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java, > we cannot do something like: > {{u:%user:%primary_group.%user}} > Fair Scheduler supports a nested rule where such a placement/mapping rule is > possible. This improvement would reduce this feature gap.
[jira] [Updated] (YARN-9865) Capacity scheduler: add support for combined %user + %secondary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9865: --- Attachment: YARN-9865.002.patch > Capacity scheduler: add support for combined %user + %secondary_group mapping > - > > Key: YARN-9865 > URL: https://issues.apache.org/jira/browse/YARN-9865 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9865.001.patch, YARN-9865.002.patch > > > Similar to YARN-9841, but for secondary group.
[jira] [Commented] (YARN-9865) Capacity scheduler: add support for combined %user + %secondary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952990#comment-16952990 ] Manikandan R commented on YARN-9865: Attached .002.patch. > Capacity scheduler: add support for combined %user + %secondary_group mapping > - > > Key: YARN-9865 > URL: https://issues.apache.org/jira/browse/YARN-9865 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9865.001.patch, YARN-9865.002.patch > > > Similar to YARN-9841, but for secondary group.
[jira] [Commented] (YARN-9865) Capacity scheduler: add support for combined %user + %secondary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953010#comment-16953010 ] Manikandan R commented on YARN-9865: Dependency link has been fixed. It requires YARN-9841. Can you trigger the Jenkins build manually? > Capacity scheduler: add support for combined %user + %secondary_group mapping > - > > Key: YARN-9865 > URL: https://issues.apache.org/jira/browse/YARN-9865 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9865.001.patch, YARN-9865.002.patch > > > Similar to YARN-9841, but for secondary group.
[jira] [Commented] (YARN-9773) Add QueueMetrics for Custom Resources
[ https://issues.apache.org/jira/browse/YARN-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953864#comment-16953864 ] Manikandan R commented on YARN-9773: Thanks [~epayne] for your support. > Add QueueMetrics for Custom Resources > - > > Key: YARN-9773 > URL: https://issues.apache.org/jira/browse/YARN-9773 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Fix For: 3.3.0, 3.2.2, 3.1.4 > > Attachments: YARN-9773.001.patch, YARN-9773.002.patch, > YARN-9773.003.patch > > > Although the custom resource metrics are calculated and saved as a > QueueMetricsForCustomResources object within the QueueMetrics class, the JMX > and Simon QueueMetrics do not report that information for custom resources.
[jira] [Updated] (YARN-9866) u:user2:%primary_group is not working as expected
[ https://issues.apache.org/jira/browse/YARN-9866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9866: --- Attachment: YARN-9866.001.patch > u:user2:%primary_group is not working as expected > - > > Key: YARN-9866 > URL: https://issues.apache.org/jira/browse/YARN-9866 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9866.001.patch > > > Please refer #1 in > https://issues.apache.org/jira/browse/YARN-9841?focusedCommentId=16937024&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937024 > for more details
[jira] [Created] (YARN-9912) Support u:user2:%secondary_group queue mapping
Manikandan R created YARN-9912: -- Summary: Support u:user2:%secondary_group queue mapping Key: YARN-9912 URL: https://issues.apache.org/jira/browse/YARN-9912 Project: Hadoop YARN Issue Type: Bug Reporter: Manikandan R Assignee: Manikandan R
[jira] [Updated] (YARN-9912) Support u:user2:%secondary_group queue mapping
[ https://issues.apache.org/jira/browse/YARN-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9912: --- Description: Simliar to > Support u:user2:%secondary_group queue mapping > -- > > Key: YARN-9912 > URL: https://issues.apache.org/jira/browse/YARN-9912 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > > Simliar to
[jira] [Updated] (YARN-9912) Support u:user2:%secondary_group queue mapping
[ https://issues.apache.org/jira/browse/YARN-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9912: --- Description: Similar to u:user2:%primary_group mapping, add support for u:user2:%secondary_group queue mapping as well. (was: Simliar to ) > Support u:user2:%secondary_group queue mapping > -- > > Key: YARN-9912 > URL: https://issues.apache.org/jira/browse/YARN-9912 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > > Similar to u:user2:%primary_group mapping, add support for > u:user2:%secondary_group queue mapping as well.
[jira] [Updated] (YARN-9912) Support u:user2:%secondary_group queue mapping
[ https://issues.apache.org/jira/browse/YARN-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9912: --- Attachment: YARN-9912.001.patch > Support u:user2:%secondary_group queue mapping > -- > > Key: YARN-9912 > URL: https://issues.apache.org/jira/browse/YARN-9912 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9912.001.patch > > > Similar to u:user2:%primary_group mapping, add support for > u:user2:%secondary_group queue mapping as well.
[jira] [Updated] (YARN-9912) Support u:user2:%secondary_group queue mapping
[ https://issues.apache.org/jira/browse/YARN-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9912: --- Issue Type: Improvement (was: Bug) > Support u:user2:%secondary_group queue mapping > -- > > Key: YARN-9912 > URL: https://issues.apache.org/jira/browse/YARN-9912 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9912.001.patch > > > Similar to u:user2:%primary_group mapping, add support for > u:user2:%secondary_group queue mapping as well.
[jira] [Commented] (YARN-9840) Capacity scheduler: add support for Secondary Group rule mapping
[ https://issues.apache.org/jira/browse/YARN-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953887#comment-16953887 ] Manikandan R commented on YARN-9840: Sorry for the delay. A minor change in the doc: instead of u:user3:%secondary_group, it should be u:%user:%secondary_group. The u:user3:%secondary_group queue mapping has been addressed in YARN-9912. > Capacity scheduler: add support for Secondary Group rule mapping > > > Key: YARN-9840 > URL: https://issues.apache.org/jira/browse/YARN-9840 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Reporter: Peter Bacsko >Assignee: Manikandan R >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9840-004.patch, YARN-9840.001.patch, > YARN-9840.002.patch, YARN-9840.003.patch > > > Currently, Capacity Scheduler only supports primary group rule mapping like > this: > {{u:%user:%primary_group}} > Fair scheduler already supports secondary group placement rule. Let's add > this to CS to reduce the feature gap. > Class of interest: > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java
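The two secondary-group forms being distinguished in this comment can be sketched as a capacity-scheduler.xml fragment. This is illustrative only: the generic %user rule is the form YARN-9840 documents, while the fixed-user form is the one deferred to YARN-9912:

```xml
<property>
  <name>yarn.scheduler.capacity.queue-mappings</name>
  <!-- Generic rule (YARN-9840): each user lands in a queue named after a secondary group. -->
  <!-- Fixed-user rule (YARN-9912): applies to user3 only. -->
  <value>u:%user:%secondary_group,u:user3:%secondary_group</value>
</property>
```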
[jira] [Updated] (YARN-9868) Validate %primary_group queue in CS queue manager
[ https://issues.apache.org/jira/browse/YARN-9868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9868: --- Issue Type: Improvement (was: Bug) > Validate %primary_group queue in CS queue manager > - > > Key: YARN-9868 > URL: https://issues.apache.org/jira/browse/YARN-9868 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > > As part of %secondary_group mapping, we ensure the output of %secondary_group is > available as a queue, using CSQueueManager, while processing the queue mapping. Similarly, we > will need to do the same for %primary_group.