[jira] [Updated] (YARN-11026) Make AppPlacementAllocator configurable in AppSchedulingInfo
[ https://issues.apache.org/jira/browse/YARN-11026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-11026: Summary: Make AppPlacementAllocator configurable in AppSchedulingInfo (was: Make default AppPlacementAllocator configurable in AppSchedulingInfo) > Make AppPlacementAllocator configurable in AppSchedulingInfo > > > Key: YARN-11026 > URL: https://issues.apache.org/jira/browse/YARN-11026 > Project: Hadoop YARN > Issue Type: Task >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8431) Add whitelist/blacklist support for ATSv2 events.
[ https://issues.apache.org/jira/browse/YARN-8431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-8431: -- Assignee: Minni Mittal (was: Abhishek Modi) > Add whitelist/blacklist support for ATSv2 events. > - > > Key: YARN-8431 > URL: https://issues.apache.org/jira/browse/YARN-8431 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Abhishek Modi >Assignee: Minni Mittal >Priority: Major > > In this jira, we will add functionality in ATSv2 to blacklist events at > cluster level. Blacklisting of events should not require restart of any of > the services and should apply dynamically.
[jira] [Assigned] (YARN-9383) Publish federation events to ATSv2.
[ https://issues.apache.org/jira/browse/YARN-9383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-9383: -- Assignee: Minni Mittal (was: Abhishek Modi) > Publish federation events to ATSv2. > --- > > Key: YARN-9383 > URL: https://issues.apache.org/jira/browse/YARN-9383 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Minni Mittal >Priority: Major > > With federation enabled, containers for a single application might get > spawned across multiple sub-clusters. This information right now is not > getting published to ATSv2. As part of this jira, we are going to publish > federation related info in container events to ATSv2.
[jira] [Updated] (YARN-11025) Implement distributed decommissioning
[ https://issues.apache.org/jira/browse/YARN-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-11025: Description: This Jira proposes to add support for accepting requests from distributed sources to put nodes into the decommissioning state. It proposes to add configurable provider and consumer class interfaces in the NodeManager. The NM can receive a request to put a node into decommissioning from any distributed source via the provider class implementation, and consumer classes can use it to set the node status. Corresponding changes will be made on the RM side to update the node state when the update event is called at DecommissioningNodesWatcher. was: This Jira proposes to add support for acceoting requests from distributed sources to put nodes into the decommissioning state. It proposes to add configurable provider and consumer class interfaces in the NodeManager. The NM can receive a request to put a node into decommissioning from any distributed source via the provider class implementation, and consumer classes can use it to set the node status. Corresponding changes will be made on the RM side to update the node state when the update event is called at DecommissioningNodesWatcher. > Implement distributed decommissioning > - > > Key: YARN-11025 > URL: https://issues.apache.org/jira/browse/YARN-11025 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > > This Jira proposes to add support for accepting requests from distributed > sources to put nodes into the decommissioning state. > It proposes to add configurable provider and consumer class interfaces in > the NodeManager. The NM can receive a request to put a node into > decommissioning from any distributed source via the provider class > implementation, and consumer classes can use it to set the node status. > Corresponding changes will be made on the RM side to update the node state > when the update event is called at DecommissioningNodesWatcher.
[jira] [Updated] (YARN-11025) Implement distributed decommissioning
[ https://issues.apache.org/jira/browse/YARN-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-11025: Description: This Jira proposes to add support for accepting requests from distributed sources to put nodes into the decommissioning state. It proposes to add configurable provider and consumer class interfaces in the NodeManager. The NM can receive a request to put a node into decommissioning from any distributed source via the provider class implementation, and consumer classes can use it to set the node status. Corresponding changes will be made on the RM side to update the node state when the update event is called at DecommissioningNodesWatcher. > Implement distributed decommissioning > - > > Key: YARN-11025 > URL: https://issues.apache.org/jira/browse/YARN-11025 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > > This Jira proposes to add support for accepting requests from distributed > sources to put nodes into the decommissioning state. > It proposes to add configurable provider and consumer class interfaces in > the NodeManager. The NM can receive a request to put a node into > decommissioning from any distributed source via the provider class > implementation, and consumer classes can use it to set the node status. > Corresponding changes will be made on the RM side to update the node state > when the update event is called at DecommissioningNodesWatcher.
[jira] [Assigned] (YARN-10995) Move PendingApplicationComparator from GuaranteedOrZeroCapacityOverTimePolicy
[ https://issues.apache.org/jira/browse/YARN-10995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-10995: --- Assignee: Minni Mittal > Move PendingApplicationComparator from GuaranteedOrZeroCapacityOverTimePolicy > - > > Key: YARN-10995 > URL: https://issues.apache.org/jira/browse/YARN-10995 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Minni Mittal >Priority: Major > > GuaranteedOrZeroCapacityOverTimePolicy has a comparator class that orders > applications by their submit time. It gets the applications from the > RMContext and doesn't need any information from the > GuaranteedOrZeroCapacityOverTimePolicy class, so it could easily be moved > to RMContext; that way the reference to the RMContext/SchedulerContext could > be removed from GuaranteedOrZeroCapacityOverTimePolicy.
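The comparator being moved is small enough to sketch. The standalone model below (the App record and its fields are hypothetical stand-ins for the real RMApp objects fetched from the RMContext) shows the submit-time ordering the policy relies on:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch only: a minimal stand-in for YARN's pending-application ordering.
// "App" and its fields are hypothetical; the real comparator orders RMApp
// objects from the RMContext by their submit time.
public class PendingAppOrder {
    record App(String id, long submitTime) {}

    // Earlier submit time sorts first; ties fall back to the id so the
    // ordering stays deterministic.
    static final Comparator<App> BY_SUBMIT_TIME =
            Comparator.comparingLong(App::submitTime).thenComparing(App::id);

    public static void main(String[] args) {
        List<App> pending = new ArrayList<>(List.of(
                new App("application_2", 2000L),
                new App("application_1", 1000L)));
        pending.sort(BY_SUBMIT_TIME);
        System.out.println(pending.get(0).id()); // the earliest-submitted app
    }
}
```

Because the ordering only needs data the RMContext already holds, a comparator like this has no reason to live inside the policy class, which is the point of the move.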
[jira] [Commented] (YARN-10995) Move PendingApplicationComparator from GuaranteedOrZeroCapacityOverTimePolicy
[ https://issues.apache.org/jira/browse/YARN-10995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459317#comment-17459317 ] Minni Mittal commented on YARN-10995: - Hey [~bteke], if you are not working on this, can I take up this JIRA? Thanks > Move PendingApplicationComparator from GuaranteedOrZeroCapacityOverTimePolicy > - > > Key: YARN-10995 > URL: https://issues.apache.org/jira/browse/YARN-10995 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Priority: Major > > GuaranteedOrZeroCapacityOverTimePolicy has a comparator class that orders > applications by their submit time. It gets the applications from the > RMContext and doesn't need any information from the > GuaranteedOrZeroCapacityOverTimePolicy class, so it could easily be moved > to RMContext; that way the reference to the RMContext/SchedulerContext could > be removed from GuaranteedOrZeroCapacityOverTimePolicy.
[jira] [Commented] (YARN-9967) Fix NodeManager failing to start when Hdfs Auxillary Jar is set
[ https://issues.apache.org/jira/browse/YARN-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459310#comment-17459310 ] Minni Mittal commented on YARN-9967: Hey [~tarunparimi] , Can I take this Jira over, if you are not working on it ? Thanks > Fix NodeManager failing to start when Hdfs Auxillary Jar is set > --- > > Key: YARN-9967 > URL: https://issues.apache.org/jira/browse/YARN-9967 > Project: Hadoop YARN > Issue Type: Bug > Components: auxservices, nodemanager >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Tarun Parimi >Priority: Major > > Loading an auxiliary jar from a Hdfs location on a node manager fails with > ClassNotFound Exception > {code:java} > 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: > classpath: [] > 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: > system classes: [java., javax.accessibility., javax.activation., > javax.activity., javax.annotation., javax.annotation.processing., > javax.crypto., javax.imageio., javax.jws., javax.lang.model., > -javax.management.j2ee., javax.management., javax.naming., javax.net., > javax.print., javax.rmi., javax.script., -javax.security.auth.message., > javax.security.auth., javax.security.cert., javax.security.sasl., > javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., > -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., > org.xml.sax., org.apache.commons.logging., org.apache.log4j., > -org.apache.hadoop.hbase., org.apache.hadoop., core-default.xml, > hdfs-default.xml, mapred-default.xml, yarn-default.xml] > 2019-11-08 03:59:49,257 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed > in state INITED > java.lang.ClassNotFoundException: org.apache.auxtest.AuxServiceFromHDFS > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at 
java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at > org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189) > at > org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:270) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:321) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:478) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:936) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1016) > {code} > *Repro:* > {code:java} > 1. 
Prepare a custom auxiliary service jar and place it on hdfs > [hdfs@yarndocker-1 yarn]$ cat TestShuffleHandler2.java > package org; > import org.apache.hadoop.yarn.server.api.AuxiliaryService; > import org.apache.hadoop.yarn.server.api.ApplicationInitializationContext; > import org.apache.hadoop.yarn.server.api.ApplicationTerminationContext; > import java.nio.ByteBuffer; > public class TestShuffleHandler2 extends AuxiliaryService { > public static final String MAPREDUCE_TEST_SHUFFLE_SERVICEID = > "test_shuffle2"; > public TestShuffleHandler2() { > super("testshuffle2"); > } > @Override > public void initializeApplication(ApplicationInitializationContext > context) { > } > @Override > public void stopApplication(ApplicationTerminationContext context) { > } > @Override > public synchronized ByteBuffer getMetaData() { > return ByteBuffer.allocate(0); > } > } > > [hdfs@yarndocker-1 yarn]$ javac -d . -cp `hadoop classpath` > TestShuffleHandler2.java > [hdfs@yarndocker-1 yarn]$ jar
[jira] [Created] (YARN-11047) yarn resourcemanager and nodemanager unable to connect with Hbase when ATSv2 is enabled
Minni Mittal created YARN-11047: --- Summary: yarn resourcemanager and nodemanager unable to connect with Hbase when ATSv2 is enabled Key: YARN-11047 URL: https://issues.apache.org/jira/browse/YARN-11047 Project: Hadoop YARN Issue Type: Task Reporter: Minni Mittal The yarn resourcemanager command logs the following issue: 2021-12-14 19:26:33,345 WARN [pool-28-thread-1] storage.TimelineStorageMonitor (TimelineStorageMonitor.java:run(95)) - Got failure attempting to read from HBase, assuming Storage is down java.lang.RuntimeException: org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.NoSuchMethodError: com.google.common.net.HostAndPort.getHostText()Ljava/lang/String; at org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:95) at org.apache.hadoop.yarn.server.timelineservice.storage.reader.TimelineEntityReader.readEntities(TimelineEntityReader.java:283) at org.apache.hadoop.yarn.server.timelineservice.storage.HBaseStorageMonitor.healthCheck(HBaseStorageMonitor.java:77) at org.apache.hadoop.yarn.server.timelineservice.storage.TimelineStorageMonitor$MonitorThread.run(TimelineStorageMonitor.java:89) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
[jira] [Assigned] (YARN-11047) yarn resourcemanager and nodemanager unable to connect with Hbase when ATSv2 is enabled
[ https://issues.apache.org/jira/browse/YARN-11047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-11047: --- Assignee: Minni Mittal > yarn resourcemanager and nodemanager unable to connect with Hbase when ATSv2 > is enabled > --- > > Key: YARN-11047 > URL: https://issues.apache.org/jira/browse/YARN-11047 > Project: Hadoop YARN > Issue Type: Task >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > > The yarn resourcemanager command logs the following issue: > 2021-12-14 19:26:33,345 WARN [pool-28-thread-1] > storage.TimelineStorageMonitor (TimelineStorageMonitor.java:run(95)) - Got > failure attempting to read from HBase, assuming Storage is down > java.lang.RuntimeException: org.apache.hadoop.hbase.DoNotRetryIOException: > java.lang.NoSuchMethodError: > com.google.common.net.HostAndPort.getHostText()Ljava/lang/String; > at > org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:95) > at > org.apache.hadoop.yarn.server.timelineservice.storage.reader.TimelineEntityReader.readEntities(TimelineEntityReader.java:283) > at > org.apache.hadoop.yarn.server.timelineservice.storage.HBaseStorageMonitor.healthCheck(HBaseStorageMonitor.java:77) > at > org.apache.hadoop.yarn.server.timelineservice.storage.TimelineStorageMonitor$MonitorThread.run(TimelineStorageMonitor.java:89) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at 
java.lang.Thread.run(Thread.java:748)
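The NoSuchMethodError above is the usual signature of a Guava version conflict: HostAndPort.getHostText() was deprecated and eventually removed (getHost() replaces it), so HBase client code compiled against an older Guava fails at link time when a newer Guava is on the classpath. A reflection probe like the hypothetical helper below can confirm which methods the deployed jar actually exposes; the demo runs it against a JDK class so the sketch works without Guava present:

```java
import java.util.Arrays;

// Diagnostic sketch (hypothetical helper, not part of YARN): checks whether
// a class on the current classpath exposes a method with the given name.
// Probing com.google.common.net.HostAndPort for "getHostText" would reveal
// whether the deployed Guava still has the method HBase links against.
public class MethodProbe {
    public static boolean hasMethod(String className, String methodName) {
        try {
            return Arrays.stream(Class.forName(className).getMethods())
                    .anyMatch(m -> m.getName().equals(methodName));
        } catch (ClassNotFoundException e) {
            return false; // the class itself is absent from the classpath
        }
    }

    public static void main(String[] args) {
        // Probe a JDK class so this runs anywhere.
        System.out.println(hasMethod("java.lang.String", "isEmpty")); // true
    }
}
```

If the probe shows getHostText() is missing, the usual remedies are pinning a compatible Guava for the timeline-service classpath or using shaded HBase client artifacts.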
[jira] [Updated] (YARN-11034) Add enhanced headroom in AllocateResponse
[ https://issues.apache.org/jira/browse/YARN-11034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-11034: Description: Add enhanced headroom in allocate response. This provides a channel for RMs to return load information for AMRMProxy and decision making when rerouting resource requests. (was: Add enhanced headroom in allocate response. This provides a channel for RMs to return load information for AMRMProxy and decision making when rerouting resource requests.) > Add enhanced headroom in AllocateResponse > - > > Key: YARN-11034 > URL: https://issues.apache.org/jira/browse/YARN-11034 > Project: Hadoop YARN > Issue Type: Task >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Add enhanced headroom in allocate response. This provides a channel for RMs > to return load information for AMRMProxy and decision making when rerouting > resource requests.
[jira] [Created] (YARN-11037) Add configurable logic to split resource request to least loaded SC
Minni Mittal created YARN-11037: --- Summary: Add configurable logic to split resource request to least loaded SC Key: YARN-11037 URL: https://issues.apache.org/jira/browse/YARN-11037 Project: Hadoop YARN Issue Type: Task Reporter: Minni Mittal Assignee: Minni Mittal
[jira] [Created] (YARN-11034) add enhanced headroom in AllocateResponse.
Minni Mittal created YARN-11034: --- Summary: add enhanced headroom in AllocateResponse. Key: YARN-11034 URL: https://issues.apache.org/jira/browse/YARN-11034 Project: Hadoop YARN Issue Type: Task Reporter: Minni Mittal Assignee: Minni Mittal Add enhanced headroom in allocate response. This provides a channel for RMs to return load information for AMRMProxy and decision making when rerouting resource requests.
[jira] [Assigned] (YARN-10201) Make AMRMProxyPolicy aware of SC load
[ https://issues.apache.org/jira/browse/YARN-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-10201: --- Assignee: Minni Mittal (was: Young Chen) > Make AMRMProxyPolicy aware of SC load > - > > Key: YARN-10201 > URL: https://issues.apache.org/jira/browse/YARN-10201 > Project: Hadoop YARN > Issue Type: Sub-task > Components: amrmproxy >Reporter: Young Chen >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10201.v0.patch, YARN-10201.v1.patch, > YARN-10201.v10.patch, YARN-10201.v2.patch, YARN-10201.v3.patch, > YARN-10201.v4.patch, YARN-10201.v5.patch, YARN-10201.v6.patch, > YARN-10201.v7.patch, YARN-10201.v8.patch, YARN-10201.v9.patch > > > LocalityMulticastAMRMProxyPolicy is currently unaware of SC load when > splitting resource requests. We propose changes to the policy so that it > receives feedback from SCs and can load balance requests across the federated > cluster.
[jira] [Assigned] (YARN-10883) [Router]add more logs when applications killed by a user
[ https://issues.apache.org/jira/browse/YARN-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-10883: --- Assignee: Minni Mittal > [Router]add more logs when applications killed by a user > > > Key: YARN-10883 > URL: https://issues.apache.org/jira/browse/YARN-10883 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: chaosju >Assignee: Minni Mittal >Priority: Major > > the Router should record the address of the client that killed the application
[jira] [Updated] (YARN-11028) Add metrics for container allocation latency
[ https://issues.apache.org/jira/browse/YARN-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-11028: Summary: Add metrics for container allocation latency (was: Add metrics for Guaranteed container allocation latency) > Add metrics for container allocation latency > > > Key: YARN-11028 > URL: https://issues.apache.org/jira/browse/YARN-11028 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major >
[jira] [Created] (YARN-11029) Improve logs to print askCount, allocatedCount in AMRMProxy service
Minni Mittal created YARN-11029: --- Summary: Improve logs to print askCount, allocatedCount in AMRMProxy service Key: YARN-11029 URL: https://issues.apache.org/jira/browse/YARN-11029 Project: Hadoop YARN Issue Type: Improvement Reporter: Minni Mittal Assignee: Minni Mittal
[jira] [Created] (YARN-11028) Add metrics for Guaranteed container allocation latency
Minni Mittal created YARN-11028: --- Summary: Add metrics for Guaranteed container allocation latency Key: YARN-11028 URL: https://issues.apache.org/jira/browse/YARN-11028 Project: Hadoop YARN Issue Type: Improvement Reporter: Minni Mittal Assignee: Minni Mittal
[jira] [Assigned] (YARN-11027) Add ExecutionTypeRequest#compareTo method
[ https://issues.apache.org/jira/browse/YARN-11027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-11027: --- Assignee: Minni Mittal > Add ExecutionTypeRequest#compareTo method > -- > > Key: YARN-11027 > URL: https://issues.apache.org/jira/browse/YARN-11027 > Project: Hadoop YARN > Issue Type: Task >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major >
[jira] [Created] (YARN-11027) Add ExecutionTypeRequest#compareTo method
Minni Mittal created YARN-11027: --- Summary: Add ExecutionTypeRequest#compareTo method Key: YARN-11027 URL: https://issues.apache.org/jira/browse/YARN-11027 Project: Hadoop YARN Issue Type: Task Reporter: Minni Mittal
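YARN-11027 carries no description, so the shape below is only a guess at what an ExecutionTypeRequest#compareTo could look like: order by the execution type enum (GUARANTEED before OPPORTUNISTIC, matching YARN's declaration order) and break ties on the enforcement flag. The classes here are local stand-ins, not the real YARN API:

```java
// Hypothetical sketch: local stand-ins for ExecutionType and
// ExecutionTypeRequest, with an assumed compareTo ordering.
public class ExecTypeRequest implements Comparable<ExecTypeRequest> {
    enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

    final ExecutionType executionType;
    final boolean enforceExecutionType;

    ExecTypeRequest(ExecutionType type, boolean enforce) {
        this.executionType = type;
        this.enforceExecutionType = enforce;
    }

    @Override
    public int compareTo(ExecTypeRequest other) {
        // Enum compareTo follows declaration order, so GUARANTEED sorts first.
        int byType = executionType.compareTo(other.executionType);
        if (byType != 0) {
            return byType;
        }
        // Non-enforced requests sort before enforced ones (false < true).
        return Boolean.compare(enforceExecutionType, other.enforceExecutionType);
    }
}
```

A defined ordering like this would let requests be kept in sorted collections or compared deterministically when merging asks.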
[jira] [Updated] (YARN-11026) Make default AppPlacementAllocator configurable in n AppSchedulingInfo
[ https://issues.apache.org/jira/browse/YARN-11026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-11026: Summary: Make default AppPlacementAllocator configurable in n AppSchedulingInfo (was: Make default AppPlacementAllocator configurable) > Make default AppPlacementAllocator configurable in n AppSchedulingInfo > -- > > Key: YARN-11026 > URL: https://issues.apache.org/jira/browse/YARN-11026 > Project: Hadoop YARN > Issue Type: Task >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major >
[jira] [Updated] (YARN-11026) Make default AppPlacementAllocator configurable in AppSchedulingInfo
[ https://issues.apache.org/jira/browse/YARN-11026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-11026: Summary: Make default AppPlacementAllocator configurable in AppSchedulingInfo (was: Make default AppPlacementAllocator configurable in n AppSchedulingInfo) > Make default AppPlacementAllocator configurable in AppSchedulingInfo > > > Key: YARN-11026 > URL: https://issues.apache.org/jira/browse/YARN-11026 > Project: Hadoop YARN > Issue Type: Task >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major >
[jira] [Created] (YARN-11026) Make default AppPlacementAllocator configurable
Minni Mittal created YARN-11026: --- Summary: Make default AppPlacementAllocator configurable Key: YARN-11026 URL: https://issues.apache.org/jira/browse/YARN-11026 Project: Hadoop YARN Issue Type: Task Reporter: Minni Mittal Assignee: Minni Mittal
[jira] [Updated] (YARN-11025) Implement distributed decommissioning
[ https://issues.apache.org/jira/browse/YARN-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-11025: Summary: Implement distributed decommissioning (was: Implement distributed maintenance ) > Implement distributed decommissioning > - > > Key: YARN-11025 > URL: https://issues.apache.org/jira/browse/YARN-11025 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major >
[jira] [Created] (YARN-11025) Implement distributed maintenance
Minni Mittal created YARN-11025: --- Summary: Implement distributed maintenance Key: YARN-11025 URL: https://issues.apache.org/jira/browse/YARN-11025 Project: Hadoop YARN Issue Type: New Feature Reporter: Minni Mittal Assignee: Minni Mittal
[jira] [Assigned] (YARN-10174) Add colored policies to enable manual load balancing across sub clusters
[ https://issues.apache.org/jira/browse/YARN-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-10174: --- Assignee: Minni Mittal (was: Young Chen) > Add colored policies to enable manual load balancing across sub clusters > > > Key: YARN-10174 > URL: https://issues.apache.org/jira/browse/YARN-10174 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Young Chen >Assignee: Minni Mittal >Priority: Major > > Add colored policies to enable manual load balancing across sub clusters
[jira] [Assigned] (YARN-9532) SLSRunner.run() throws a YarnException when it fails to create the output directory
[ https://issues.apache.org/jira/browse/YARN-9532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-9532: -- Assignee: Minni Mittal > SLSRunner.run() throws a YarnException when it fails to create the output > directory > --- > > Key: YARN-9532 > URL: https://issues.apache.org/jira/browse/YARN-9532 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Haicheng Chen >Assignee: Minni Mittal >Priority: Minor > > Dear YARN developers, we are developing a tool to detect exception-related > bugs in Java. Our prototype has spotted the following {{throw}} statement > whose exception class and error message indicate different error conditions. > > Version: Hadoop-3.1.2 > File: > HADOOP-ROOT/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java > Line: 894 > {code:java} > if (!outputFile.exists() && !outputFile.mkdirs()) { > System.err.println("ERROR: Cannot create output directory " > + outputFile.getAbsolutePath()); > throw new YarnException("Cannot create output directory"); > }{code} > > The exception is triggered when {{run()}} fails to create the output > directory (as indicated by the {{if}} condition and the error message). > However, throwing a {{YarnException}} is too general and makes accurate > exception handling more difficult. Since the error is related to I/O, > throwing an {{IOException}}, or wrapping an {{IOException}} could be better.
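The change the report asks for is small; the sketch below shows the suggested shape of the fix (an assumption, not the committed patch, and the helper name is ours): surface the failed mkdirs() as an IOException so callers can handle the I/O failure precisely instead of catching a generic YarnException.

```java
import java.io.File;
import java.io.IOException;

// Sketch of the suggested SLSRunner fix: report a failed mkdirs() as an
// IOException rather than a generic YarnException.
public class OutputDirCheck {
    static void ensureOutputDir(File outputFile) throws IOException {
        if (!outputFile.exists() && !outputFile.mkdirs()) {
            throw new IOException("Cannot create output directory "
                    + outputFile.getAbsolutePath());
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"),
                "sls-out-" + System.nanoTime());
        ensureOutputDir(dir); // creates the directory, or throws IOException
        System.out.println(dir.isDirectory());
    }
}
```

With a specific exception type, a caller can distinguish a filesystem problem from other simulator failures and, for example, retry with a different output path.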
[jira] [Updated] (YARN-11008) Support heterogeneous node types in SLS Runner
[ https://issues.apache.org/jira/browse/YARN-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-11008: Issue Type: Improvement (was: Task) > Support heterogeneous node types in SLS Runner > -- > > Key: YARN-11008 > URL: https://issues.apache.org/jira/browse/YARN-11008 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major >
[jira] [Commented] (YARN-8570) GPU support in SLS
[ https://issues.apache.org/jira/browse/YARN-8570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447806#comment-17447806 ] Minni Mittal commented on YARN-8570: [~jhung], I would like to work on this Jira. Can I take this over ? > GPU support in SLS > -- > > Key: YARN-8570 > URL: https://issues.apache.org/jira/browse/YARN-8570 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > > Currently resource requests in SLS only support memory and vcores. Since GPU > is natively supported by YARN, it will be useful to support requesting GPU > resources. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-11008) Support heterogeneous node types in SLS Runner
Minni Mittal created YARN-11008: --- Summary: Support heterogeneous node types in SLS Runner Key: YARN-11008 URL: https://issues.apache.org/jira/browse/YARN-11008 Project: Hadoop YARN Issue Type: Task Reporter: Minni Mittal -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10218) [GPG] Support HTTPS in GPG
[ https://issues.apache.org/jira/browse/YARN-10218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442743#comment-17442743 ] Minni Mittal commented on YARN-10218: - [~BilwaST] If you are not working, can I take up this Jira? > [GPG] Support HTTPS in GPG > -- > > Key: YARN-10218 > URL: https://issues.apache.org/jira/browse/YARN-10218 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bilwa S T >Assignee: Bilwa S T >Priority: Major > > HTTPS support in Router is handled as part of Jira YARN-10120. Https Rest > calls from GPG to Router must be supported -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10174) Add colored policies to enable manual load balancing across sub clusters
[ https://issues.apache.org/jira/browse/YARN-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442739#comment-17442739 ] Minni Mittal commented on YARN-10174: - [~youchen], if you are not working, can I take up this JIRA? > Add colored policies to enable manual load balancing across sub clusters > > > Key: YARN-10174 > URL: https://issues.apache.org/jira/browse/YARN-10174 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Young Chen >Assignee: Young Chen >Priority: Major > > Add colored policies to enable manual load balancing across sub clusters -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-11004) Refactor Audit Logger classes to move Audit Constants to a separate class
Minni Mittal created YARN-11004: --- Summary: Refactor Audit Logger classes to move Audit Constants to a separate class Key: YARN-11004 URL: https://issues.apache.org/jira/browse/YARN-11004 Project: Hadoop YARN Issue Type: Task Reporter: Minni Mittal -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-11004) Refactor Audit Logger classes to move Audit Constants to a separate class
[ https://issues.apache.org/jira/browse/YARN-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-11004: --- Assignee: Minni Mittal > Refactor Audit Logger classes to move Audit Constants to a separate class > - > > Key: YARN-11004 > URL: https://issues.apache.org/jira/browse/YARN-11004 > Project: Hadoop YARN > Issue Type: Task >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-9853) Add number of paused containers in NodeInfo page.
[ https://issues.apache.org/jira/browse/YARN-9853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-9853: -- Assignee: Minni Mittal (was: Abhishek Modi) > Add number of paused containers in NodeInfo page. > - > > Key: YARN-9853 > URL: https://issues.apache.org/jira/browse/YARN-9853 > Project: Hadoop YARN > Issue Type: Task >Reporter: Abhishek Modi >Assignee: Minni Mittal >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10822) Containers going from New to Scheduled transition for killed container on recovery
[ https://issues.apache.org/jira/browse/YARN-10822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10822: Summary: Containers going from New to Scheduled transition for killed container on recovery (was: Containers going from New to Scheduled transition even though container is killed before NM restart when NM recovery is enabled) > Containers going from New to Scheduled transition for killed container on > recovery > -- > > Key: YARN-10822 > URL: https://issues.apache.org/jira/browse/YARN-10822 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10822.v1.patch > > > INFO [91] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from NEW to > LOCALIZING > INFO [91] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from LOCALIZING to > SCHEDULED > INFO [91] ContainerScheduler: Opportunistic container > container_e1171_1623422468672_2229_01_000738 will be queued at the NM. 
> INFO [127] ContainerManagerImpl: Stopping container with container Id: > container_e1171_1623422468672_2229_01_000738 > INFO [91] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from SCHEDULED to > KILLING > INFO [91] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from KILLING to > CONTAINER_CLEANEDUP_AFTER_KILL > INFO [91] NMAuditLogger: USER=defaultcafor1stparty OPERATION=Container > Finished - Killed TARGET=ContainerImpl RESULT=SUCCESS > APPID=application_1623422468672_2229 > CONTAINERID=container_e1171_1623422468672_2229_01_000738 > INFO [91] ApplicationImpl: Removing > container_e1171_1623422468672_2229_01_000738 from application > application_1623422468672_2229 > INFO [91] ContainersMonitorImpl: Stopping resource-monitoring for > container_e1171_1623422468672_2229_01_000738 > INFO [163] NodeStatusUpdaterImpl: Removed completed containers from NM > context:[container_e1171_1623422468672_2229_01_000738] > NM restart happened and recovery is attempted > > INFO [1] ContainerManagerImpl: Recovering > container_e1171_1623422468672_2229_01_000738 in state QUEUED with exit code > -1000 > INFO [1] ApplicationImpl: Adding > container_e1171_1623422468672_2229_01_000738 to application > application_1623422468672_2229 > INFO [89] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from NEW to > SCHEDULED > INFO [89] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from SCHEDULED to > KILLING > INFO [89] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from KILLING to > CONTAINER_CLEANEDUP_AFTER_KILL > Ideally, when container got killed before restart, it should finish the > container immediately. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10474) [JDK 12] TestAsyncDispatcher fails
[jira] [Updated] (YARN-10999) Make NodeQueueLoadMonitor pluggable in ResourceManager
[ https://issues.apache.org/jira/browse/YARN-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10999: Description: Add support to make NodeQueueLoadMonitor as a pluggable service in Resource Manager. (was: Add support to make NodeQueueLoadMonitor as a pluggable interface in NodeManager. The default implementation NodeQueueLoadMonitorImpl should be used if the class is not set. ) > Make NodeQueueLoadMonitor pluggable in ResourceManager > -- > > Key: YARN-10999 > URL: https://issues.apache.org/jira/browse/YARN-10999 > Project: Hadoop YARN > Issue Type: Task >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > > Add support to make NodeQueueLoadMonitor as a pluggable service in Resource > Manager. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10999) Make NodeQueueLoadMonitor pluggable in ResourceManager
[ https://issues.apache.org/jira/browse/YARN-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10999: Description: Add support to make NodeQueueLoadMonitor as a pluggable interface in NodeManager. The default implementation NodeQueueLoadMonitorImpl should be used if the class is not set. (was: Add support to NodeHealthCheckerService inorder to make it as a pluggable interface in NodeManager. The default implementation NodeHealthCheckerServiceImpl should be used if the class is not set. ) > Make NodeQueueLoadMonitor pluggable in ResourceManager > -- > > Key: YARN-10999 > URL: https://issues.apache.org/jira/browse/YARN-10999 > Project: Hadoop YARN > Issue Type: Task >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > > Add support to make NodeQueueLoadMonitor as a pluggable interface in > NodeManager. The default implementation NodeQueueLoadMonitorImpl should be > used if the class is not set. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10999) Make NodeQueueLoadMonitor pluggable in ResourceManager
[ https://issues.apache.org/jira/browse/YARN-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10999: Description: Add support to NodeHealthCheckerService in order to make it a pluggable interface in NodeManager. The default implementation NodeHealthCheckerServiceImpl should be used if the class is not set. > Make NodeQueueLoadMonitor pluggable in ResourceManager > -- > > Key: YARN-10999 > URL: https://issues.apache.org/jira/browse/YARN-10999 > Project: Hadoop YARN > Issue Type: Task >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > > Add support to NodeHealthCheckerService in order to make it a pluggable > interface in NodeManager. The default implementation > NodeHealthCheckerServiceImpl should be used if the class is not set.
[jira] [Created] (YARN-10999) Make NodeQueueLoadMonitor pluggable in ResourceManager
Minni Mittal created YARN-10999: --- Summary: Make NodeQueueLoadMonitor pluggable in ResourceManager Key: YARN-10999 URL: https://issues.apache.org/jira/browse/YARN-10999 Project: Hadoop YARN Issue Type: Task Reporter: Minni Mittal Assignee: Minni Mittal -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10998) Add YARN_ROUTER_HEAPSIZE to yarn-env for routers
[ https://issues.apache.org/jira/browse/YARN-10998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10998: Summary: Add YARN_ROUTER_HEAPSIZE to yarn-env for routers (was: Add YARN_ROUTER_HEAPSIZE to yarn-env variables) > Add YARN_ROUTER_HEAPSIZE to yarn-env for routers > > > Key: YARN-10998 > URL: https://issues.apache.org/jira/browse/YARN-10998 > Project: Hadoop YARN > Issue Type: Task >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Minor > > Yarn services NM, RM etc have YARN_\{SERVICENAME}_HEAPSIZE variable defined, > we should have similar parameter for Router Service also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10998) Add YARN_ROUTER_HEAPSIZE to yarn-env variables
[ https://issues.apache.org/jira/browse/YARN-10998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10998: Summary: Add YARN_ROUTER_HEAPSIZE to yarn-env variables (was: Adding YARN_ROUTER_HEAPSIZE to yarn-env variables) > Add YARN_ROUTER_HEAPSIZE to yarn-env variables > -- > > Key: YARN-10998 > URL: https://issues.apache.org/jira/browse/YARN-10998 > Project: Hadoop YARN > Issue Type: Task >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Minor > > Yarn services NM, RM etc have YARN_\{SERVICENAME}_HEAPSIZE variable defined, > we should have similar parameter for Router Service also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10998) Adding YARN_ROUTER_HEAPSIZE to yarn-env variables
Minni Mittal created YARN-10998: --- Summary: Adding YARN_ROUTER_HEAPSIZE to yarn-env variables Key: YARN-10998 URL: https://issues.apache.org/jira/browse/YARN-10998 Project: Hadoop YARN Issue Type: Task Reporter: Minni Mittal Assignee: Minni Mittal Yarn services NM, RM etc have YARN_\{SERVICENAME}_HEAPSIZE variable defined, we should have similar parameter for Router Service also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Moved] (YARN-10956) Add OpenTelemetry instrumentation code into YARN
[ https://issues.apache.org/jira/browse/YARN-10956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal moved HADOOP-17911 to YARN-10956: -- Key: YARN-10956 (was: HADOOP-17911) Project: Hadoop YARN (was: Hadoop Common) > Add OpenTelemetry instrumentation code into YARN > - > > Key: YARN-10956 > URL: https://issues.apache.org/jira/browse/YARN-10956 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10848) Vcore allocation problem with DefaultResourceCalculator
[ https://issues.apache.org/jira/browse/YARN-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388909#comment-17388909 ] Minni Mittal commented on YARN-10848: - Got it. The check for whether a container fits in should depend only on the available and requested resources (the way it is done for FairScheduler), not on the resource calculator. [~pbacsko], I've added the PR. Can you please review the patch? > Vcore allocation problem with DefaultResourceCalculator > --- > > Key: YARN-10848 > URL: https://issues.apache.org/jira/browse/YARN-10848 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Reporter: Peter Bacsko >Assignee: Minni Mittal >Priority: Major > Labels: pull-request-available > Attachments: TestTooManyContainers.java > > Time Spent: 10m > Remaining Estimate: 0h > > If we use DefaultResourceCalculator, then Capacity Scheduler keeps allocating > containers even if we run out of vcores. > CS checks the available resources at two places. The first check is > {{CapacityScheduler.allocateContainerOnSingleNode()}}: > {noformat} > if (calculator.computeAvailableContainers(Resources > .add(node.getUnallocatedResource(), > node.getTotalKillableResources()), > minimumAllocation) <= 0) { > LOG.debug("This node " + node.getNodeID() + " doesn't have sufficient " > + "available or preemptible resource for minimum allocation"); > {noformat} > The second, which is more important, is located in > {{RegularContainerAllocator.assignContainer()}}: > {noformat} > if (!Resources.fitsIn(rc, capability, totalResource)) { > LOG.warn("Node : " + node.getNodeID() > + " does not have sufficient resource for ask : " + pendingAsk > + " node total capability : " + node.getTotalResource()); > // Skip this locality request > ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation( > activitiesManager, node, application, schedulerKey, > ActivityDiagnosticConstant. 
> NODE_TOTAL_RESOURCE_INSUFFICIENT_FOR_REQUEST > + getResourceDiagnostics(capability, totalResource), > ActivityLevel.NODE); > return ContainerAllocation.LOCALITY_SKIPPED; > } > {noformat} > Here, {{rc}} is the resource calculator instance, the other two values are: > {noformat} > Resource capability = pendingAsk.getPerAllocationResource(); > Resource available = node.getUnallocatedResource(); > {noformat} > There is a repro unit test attached to this case, which can demonstrate the > problem. The root cause is that we pass the resource calculator to > {{Resource.fitsIn()}}. Instead, we should use an overridden version, just > like in {{FSAppAttempt.assignContainer()}}: > {noformat} >// Can we allocate a container on this node? > if (Resources.fitsIn(capability, available)) { > // Inform the application of the new container for this request > RMContainer allocatedContainer = > allocate(type, node, schedulerKey, pendingAsk, > reservedContainer); > {noformat} > In CS, if we switch to DominantResourceCalculator OR use > {{Resources.fitsIn()}} without the calculator in > {{RegularContainerAllocator.assignContainer()}}, that fixes the failing unit > test (see {{testTooManyContainers()}} in {{TestTooManyContainers.java}}).
[jira] [Comment Edited] (YARN-10848) Vcore allocation problem with DefaultResourceCalculator
[ https://issues.apache.org/jira/browse/YARN-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388046#comment-17388046 ] Minni Mittal edited comment on YARN-10848 at 7/27/21, 1:10 PM: --- [~pbacsko], As per my understanding, DefaultResourceCalculator considers memory as the limiting resource. {code:java} private static final Set INSUFFICIENT_RESOURCE_NAME = ImmutableSet.of(ResourceInformation.MEMORY_URI); {code} As such, it will keep on allocating containers till we have memory available irrespective of the availability of the vcores. In the test "TestTooManyContainers" you added, if we increase numRequestedContainers to 13, then it will allocate 11 containers and then will have {code:java} This node 127.0.0.1:1234 doesn't have sufficient available or preemptible resource for minimum allocation {code} This looks like expected behavior to me. Please help me with understanding the issue. was (Author: minni31): [~pbacsko], As per my understanding, DefaultResourceCalculator considers memory as the limiting resource. {code:java} private static final Set INSUFFICIENT_RESOURCE_NAME = ImmutableSet.of(ResourceInformation.MEMORY_URI); {code} As such, it will keep on allocating containers till we have memory available irrespective of the availability of the vcores. In the test "TestTooManyContainers" ypu added, if we increase numRequestedContainers to 13, then it will allocate 11 containers and then will have {code:java} This node 127.0.0.1:1234 doesn't have sufficient available or preemptible resource for minimum allocation {code} This looks like expected behavior to me. Please help me with understanding the issue. 
> Vcore allocation problem with DefaultResourceCalculator > --- > > Key: YARN-10848 > URL: https://issues.apache.org/jira/browse/YARN-10848 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Reporter: Peter Bacsko >Assignee: Minni Mittal >Priority: Major > Attachments: TestTooManyContainers.java > > > If we use DefaultResourceCalculator, then Capacity Scheduler keeps allocating > containers even if we run out of vcores. > CS checks the the available resources at two places. The first check is > {{CapacityScheduler.allocateContainerOnSingleNode()}}: > {noformat} > if (calculator.computeAvailableContainers(Resources > .add(node.getUnallocatedResource(), > node.getTotalKillableResources()), > minimumAllocation) <= 0) { > LOG.debug("This node " + node.getNodeID() + " doesn't have sufficient " > + "available or preemptible resource for minimum allocation"); > {noformat} > The second, which is more important, is located in > {{RegularContainerAllocator.assignContainer()}}: > {noformat} > if (!Resources.fitsIn(rc, capability, totalResource)) { > LOG.warn("Node : " + node.getNodeID() > + " does not have sufficient resource for ask : " + pendingAsk > + " node total capability : " + node.getTotalResource()); > // Skip this locality request > ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation( > activitiesManager, node, application, schedulerKey, > ActivityDiagnosticConstant. > NODE_TOTAL_RESOURCE_INSUFFICIENT_FOR_REQUEST > + getResourceDiagnostics(capability, totalResource), > ActivityLevel.NODE); > return ContainerAllocation.LOCALITY_SKIPPED; > } > {noformat} > Here, {{rc}} is the resource calculator instance, the other two values are: > {noformat} > Resource capability = pendingAsk.getPerAllocationResource(); > Resource available = node.getUnallocatedResource(); > {noformat} > There is a repro unit test attatched to this case, which can demonstrate the > problem. 
The root cause is that we pass the resource calculator to > {{Resource.fitsIn()}}. Instead, we should use an overridden version, just > like in {{FSAppAttempt.assignContainer()}}: > {noformat} >// Can we allocate a container on this node? > if (Resources.fitsIn(capability, available)) { > // Inform the application of the new container for this request > RMContainer allocatedContainer = > allocate(type, node, schedulerKey, pendingAsk, > reservedContainer); > {noformat} > In CS, if we switch to DominantResourceCalculator OR use > {{Resources.fitsIn()}} without the calculator in > {{RegularContainerAllocator.assignContainer()}}, that fixes the failing unit > test (see {{testTooManyContainers()}} in {{TestTooManyContainers.java}}). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-i
[jira] [Commented] (YARN-10848) Vcore allocation problem with DefaultResourceCalculator
[ https://issues.apache.org/jira/browse/YARN-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388046#comment-17388046 ] Minni Mittal commented on YARN-10848: - [~pbacsko], As per my understanding, DefaultResourceCalculator considers memory as the limiting resource. {code:java} private static final Set INSUFFICIENT_RESOURCE_NAME = ImmutableSet.of(ResourceInformation.MEMORY_URI); {code} As such, it will keep on allocating containers till we have memory available irrespective of the availability of the vcores. In the test "TestTooManyContainers" you added, if we increase numRequestedContainers to 13, then it will allocate 11 containers and then will have {code:java} This node 127.0.0.1:1234 doesn't have sufficient available or preemptible resource for minimum allocation {code} This looks like expected behavior to me. Please help me with understanding the issue. > Vcore allocation problem with DefaultResourceCalculator > --- > > Key: YARN-10848 > URL: https://issues.apache.org/jira/browse/YARN-10848 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Reporter: Peter Bacsko >Assignee: Minni Mittal >Priority: Major > Attachments: TestTooManyContainers.java > > > If we use DefaultResourceCalculator, then Capacity Scheduler keeps allocating > containers even if we run out of vcores. > CS checks the available resources at two places. 
The first check is > {{CapacityScheduler.allocateContainerOnSingleNode()}}: > {noformat} > if (calculator.computeAvailableContainers(Resources > .add(node.getUnallocatedResource(), > node.getTotalKillableResources()), > minimumAllocation) <= 0) { > LOG.debug("This node " + node.getNodeID() + " doesn't have sufficient " > + "available or preemptible resource for minimum allocation"); > {noformat} > The second, which is more important, is located in > {{RegularContainerAllocator.assignContainer()}}: > {noformat} > if (!Resources.fitsIn(rc, capability, totalResource)) { > LOG.warn("Node : " + node.getNodeID() > + " does not have sufficient resource for ask : " + pendingAsk > + " node total capability : " + node.getTotalResource()); > // Skip this locality request > ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation( > activitiesManager, node, application, schedulerKey, > ActivityDiagnosticConstant. > NODE_TOTAL_RESOURCE_INSUFFICIENT_FOR_REQUEST > + getResourceDiagnostics(capability, totalResource), > ActivityLevel.NODE); > return ContainerAllocation.LOCALITY_SKIPPED; > } > {noformat} > Here, {{rc}} is the resource calculator instance, the other two values are: > {noformat} > Resource capability = pendingAsk.getPerAllocationResource(); > Resource available = node.getUnallocatedResource(); > {noformat} > There is a repro unit test attached to this case, which can demonstrate the > problem. The root cause is that we pass the resource calculator to > {{Resource.fitsIn()}}. Instead, we should use an overridden version, just > like in {{FSAppAttempt.assignContainer()}}: > {noformat} >// Can we allocate a container on this node? 
> if (Resources.fitsIn(capability, available)) { > // Inform the application of the new container for this request > RMContainer allocatedContainer = > allocate(type, node, schedulerKey, pendingAsk, > reservedContainer); > {noformat} > In CS, if we switch to DominantResourceCalculator OR use > {{Resources.fitsIn()}} without the calculator in > {{RegularContainerAllocator.assignContainer()}}, that fixes the failing unit > test (see {{testTooManyContainers()}} in {{TestTooManyContainers.java}}). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9853) Add number of paused containers in NodeInfo page.
[ https://issues.apache.org/jira/browse/YARN-9853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377908#comment-17377908 ] Minni Mittal commented on YARN-9853: Hey [~abmodi], Can I take up this Jira ? > Add number of paused containers in NodeInfo page. > - > > Key: YARN-9853 > URL: https://issues.apache.org/jira/browse/YARN-9853 > Project: Hadoop YARN > Issue Type: Task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10848) Vcore allocation problem with DefaultResourceCalculator
[ https://issues.apache.org/jira/browse/YARN-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377412#comment-17377412 ] Minni Mittal commented on YARN-10848: - Hey [~pbacsko], Can I take up this Jira if you are not working on this? > Vcore allocation problem with DefaultResourceCalculator > --- > > Key: YARN-10848 > URL: https://issues.apache.org/jira/browse/YARN-10848 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Reporter: Peter Bacsko >Priority: Major > Attachments: TestTooManyContainers.java > > > If we use DefaultResourceCalculator, then Capacity Scheduler keeps allocating > containers even if we run out of vcores. > CS checks the available resources at two places. The first check is > {{CapacityScheduler.allocateContainerOnSingleNode()}}: > {noformat} > if (calculator.computeAvailableContainers(Resources > .add(node.getUnallocatedResource(), > node.getTotalKillableResources()), > minimumAllocation) <= 0) { > LOG.debug("This node " + node.getNodeID() + " doesn't have sufficient " > + "available or preemptible resource for minimum allocation"); > {noformat} > The second, which is more important, is located in > {{RegularContainerAllocator.assignContainer()}}: > {noformat} > if (!Resources.fitsIn(rc, capability, totalResource)) { > LOG.warn("Node : " + node.getNodeID() > + " does not have sufficient resource for ask : " + pendingAsk > + " node total capability : " + node.getTotalResource()); > // Skip this locality request > ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation( > activitiesManager, node, application, schedulerKey, > ActivityDiagnosticConstant. 
> NODE_TOTAL_RESOURCE_INSUFFICIENT_FOR_REQUEST > + getResourceDiagnostics(capability, totalResource), > ActivityLevel.NODE); > return ContainerAllocation.LOCALITY_SKIPPED; > } > {noformat} > Here, {{rc}} is the resource calculator instance, the other two values are: > {noformat} > Resource capability = pendingAsk.getPerAllocationResource(); > Resource available = node.getUnallocatedResource(); > {noformat} > There is a repro unit test attached to this case, which can demonstrate the > problem. The root cause is that we pass the resource calculator to > {{Resource.fitsIn()}}. Instead, we should use an overridden version, just > like in {{FSAppAttempt.assignContainer()}}: > {noformat} >// Can we allocate a container on this node? > if (Resources.fitsIn(capability, available)) { > // Inform the application of the new container for this request > RMContainer allocatedContainer = > allocate(type, node, schedulerKey, pendingAsk, > reservedContainer); > {noformat} > In CS, if we switch to DominantResourceCalculator OR use > {{Resources.fitsIn()}} without the calculator in > {{RegularContainerAllocator.assignContainer()}}, that fixes the failing unit > test (see {{testTooManyContainers()}} in {{TestTooManyContainers.java}}).
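The memory-only versus all-dimensions check discussed in this thread can be illustrated with a simplified model. The `Resource` class below is a hypothetical stand-in for YARN's resource records, not the real API; it only shows why a memory-limited fits-in check keeps admitting containers after vcores are exhausted.

```java
class FitsInDemo {
    // Simplified stand-in for a YARN Resource: just the two dimensions
    // discussed in this issue.
    static final class Resource {
        final long memoryMb;
        final int vcores;
        Resource(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    // Approximates DefaultResourceCalculator's view: memory is the only
    // resource that can be "insufficient", so vcores are never checked.
    static boolean fitsInMemoryOnly(Resource ask, Resource available) {
        return ask.memoryMb <= available.memoryMb;
    }

    // Approximates the calculator-free Resources.fitsIn(): every dimension
    // of the ask must fit into what is available.
    static boolean fitsInAllDimensions(Resource ask, Resource available) {
        return ask.memoryMb <= available.memoryMb
            && ask.vcores <= available.vcores;
    }
}
```

With available = (8192 MB, 0 vcores) and ask = (1024 MB, 1 vcore), the memory-only check still passes while the dimension-wise check rejects the ask, which matches the overallocation seen in the repro test.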
[jira] [Updated] (YARN-10459) containerLaunchedOnNode method not need to hold schedulerApptemt lock
[ https://issues.apache.org/jira/browse/YARN-10459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10459: Attachment: YARN-10459.v1.patch > containerLaunchedOnNode method not need to hold schedulerApptemt lock > -- > > Key: YARN-10459 > URL: https://issues.apache.org/jira/browse/YARN-10459 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.2.0, 3.1.3 >Reporter: Ryan Wu >Assignee: Minni Mittal >Priority: Major > Fix For: 3.2.1 > > Attachments: YARN-10459.v1.patch > > > > Now, the containerLaunchedOnNode method holds the SchedulerApplicationAttempt > write lock, but looking at the method, it does not change any field. More > seriously, this affects the scheduler. > {code:java} > public void containerLaunchedOnNode(ContainerId containerId, NodeId nodeId) > { > // Inform the container > writeLock.lock(); > try { > RMContainer rmContainer = getRMContainer(containerId); > if (rmContainer == null) { > // Some unknown container sneaked into the system. Kill it. > rmContext.getDispatcher().getEventHandler().handle( new > RMNodeCleanContainerEvent(nodeId, containerId)); return; > } > rmContainer.handle( new RMContainerEvent(containerId, > RMContainerEventType.LAUNCHED)); > } finally { > writeLock.unlock(); > } > } > {code} >
[jira] [Assigned] (YARN-10459) containerLaunchedOnNode method not need to hold schedulerApptemt lock
[ https://issues.apache.org/jira/browse/YARN-10459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-10459: --- Assignee: Minni Mittal > containerLaunchedOnNode method not need to hold schedulerApptemt lock > -- > > Key: YARN-10459 > URL: https://issues.apache.org/jira/browse/YARN-10459 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.2.0, 3.1.3 >Reporter: Ryan Wu >Assignee: Minni Mittal >Priority: Major > Fix For: 3.2.1 > > > > Now, the containerLaunchedOnNode method holds the SchedulerApplicationAttempt > write lock, but looking at the method, it does not change any field. More > seriously, this will affect the scheduler. > {code:java} > public void containerLaunchedOnNode(ContainerId containerId, NodeId nodeId) > { > // Inform the container > writeLock.lock(); > try { > RMContainer rmContainer = getRMContainer(containerId); > if (rmContainer == null) { > // Some unknown container sneaked into the system. Kill it. > rmContext.getDispatcher().getEventHandler().handle(new > RMNodeCleanContainerEvent(nodeId, containerId)); > return; > } > rmContainer.handle(new RMContainerEvent(containerId, > RMContainerEventType.LAUNCHED)); > } finally { > writeLock.unlock(); > } > } > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
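The improvement being proposed can be sketched outside of Hadoop: a lookup-and-dispatch method that mutates no fields can take the read lock instead of the write lock, so it no longer serializes against the scheduler threads. The class below is an illustrative stand-in, not the real SchedulerApplicationAttempt.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: read-only notification path under the read lock; mutations keep the write lock.
public class LaunchNotifyDemo {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, String> liveContainers = new HashMap<>();

    // Mutations still take the write lock.
    public void addContainer(String containerId, String state) {
        lock.writeLock().lock();
        try { liveContainers.put(containerId, state); }
        finally { lock.writeLock().unlock(); }
    }

    // Mirrors containerLaunchedOnNode: it only reads state and dispatches an
    // event, so the read lock is sufficient.
    public String containerLaunchedOnNode(String containerId) {
        lock.readLock().lock();
        try {
            String container = liveContainers.get(containerId);
            // Unknown container sneaked into the system -> ask the node to clean it up.
            return (container == null) ? "CLEANUP_UNKNOWN_CONTAINER" : "LAUNCHED";
        } finally { lock.readLock().unlock(); }
    }

    public static void main(String[] args) {
        LaunchNotifyDemo attempt = new LaunchNotifyDemo();
        attempt.addContainer("container_1", "ACQUIRED");
        System.out.println(attempt.containerLaunchedOnNode("container_1")); // LAUNCHED
        System.out.println(attempt.containerLaunchedOnNode("container_2")); // CLEANUP_UNKNOWN_CONTAINER
    }
}
```

Multiple readers can hold the read lock concurrently, which is exactly the contention relief the issue is after.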
[jira] [Updated] (YARN-10841) Fix token reset synchronization by making sure for UAM response token reset is done while in lock.
[ https://issues.apache.org/jira/browse/YARN-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10841: Attachment: YARN-10841.v1.patch > Fix token reset synchronization by making sure for UAM response token reset > is done while in lock. > --- > > Key: YARN-10841 > URL: https://issues.apache.org/jira/browse/YARN-10841 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Minor > Attachments: YARN-10841.v1.patch > > > *2021-06-24T10:11:39,465* [ERROR] [AMRM Heartbeater thread] > |impl.AMRMClientAsyncImpl|: Exception on heartbeat > org.apache.hadoop.yarn.exceptions.YarnException: > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: amrmToken from UAM > cluster-0 should be null here > at > org.apache.hadoop.yarn.server.nodemanager.amrmproxy.FederationInterceptor.allocate(FederationInterceptor.java:782) > > > *2021-06-24T10:10:12,608* INFO [616916] FederationInterceptor: Received new > UAM amrmToken with keyId 843616604 > The heartbeat callback sets the token to null, but because of a > synchronization issue this happened after mergeAllocate was called. The token > reset should therefore happen while the allocate merge is running, i.e. > inside the lock. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10841) Fix token reset synchronization by making sure for UAM response token reset is done while in lock.
[ https://issues.apache.org/jira/browse/YARN-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10841: Description: *2021-06-24T10:11:39,465* [ERROR] [AMRM Heartbeater thread] |impl.AMRMClientAsyncImpl|: Exception on heartbeat org.apache.hadoop.yarn.exceptions.YarnException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: amrmToken from UAM cluster-0 should be null here at org.apache.hadoop.yarn.server.nodemanager.amrmproxy.FederationInterceptor.allocate(FederationInterceptor.java:782) *2021-06-24T10:10:12,608* INFO [616916] FederationInterceptor: Received new UAM amrmToken with keyId 843616604 The heartbeat callback sets the token to null, but because of a synchronization issue this happened after mergeAllocate was called. The token reset should therefore happen while the allocate merge is running, i.e. inside the lock. > Fix token reset synchronization by making sure for UAM response token reset > is done while in lock. > --- > > Key: YARN-10841 > URL: https://issues.apache.org/jira/browse/YARN-10841 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Minor > > *2021-06-24T10:11:39,465* [ERROR] [AMRM Heartbeater thread] > |impl.AMRMClientAsyncImpl|: Exception on heartbeat > org.apache.hadoop.yarn.exceptions.YarnException: > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: amrmToken from UAM > cluster-0 should be null here > at > org.apache.hadoop.yarn.server.nodemanager.amrmproxy.FederationInterceptor.allocate(FederationInterceptor.java:782) > > > *2021-06-24T10:10:12,608* INFO [616916] FederationInterceptor: Received new > UAM amrmToken with keyId 843616604 > The heartbeat callback sets the token to null, but because of a > synchronization issue this happened after mergeAllocate was called. The token > reset should therefore happen while the allocate merge is running, i.e. > inside the lock. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
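The race and its fix can be sketched with a simplified stand-in for FederationInterceptor (the class and field names below are illustrative, not the real ones): the response's token must be consumed and reset under the same lock that performs the allocate merge, so the merge precondition "amrmToken from UAM should be null here" can never be violated by interleaving.

```java
// Sketch: consume-and-reset of the UAM token atomically with the merge.
public class UamTokenDemo {
    public static final class AllocateResponse {
        public String amrmToken;   // non-null only when the UAM issued a new token
        public int containers;
        public AllocateResponse(String token, int containers) {
            this.amrmToken = token; this.containers = containers;
        }
    }

    private final Object mergeLock = new Object();
    private String latestToken;
    private int mergedContainers;

    // Consume the response's token AND merge while holding the same lock, so
    // mergeAllocate can never observe a response whose token was not yet reset.
    public void processResponse(AllocateResponse response) {
        synchronized (mergeLock) {
            if (response.amrmToken != null) {
                latestToken = response.amrmToken;
                response.amrmToken = null;   // reset inside the lock, before merging
            }
            mergeAllocate(response);
        }
    }

    private void mergeAllocate(AllocateResponse response) {
        // Mirrors the precondition that failed in the log above.
        if (response.amrmToken != null) {
            throw new IllegalStateException("amrmToken from UAM should be null here");
        }
        mergedContainers += response.containers;
    }

    public String getLatestToken() { synchronized (mergeLock) { return latestToken; } }
    public int getMergedContainers() { synchronized (mergeLock) { return mergedContainers; } }

    public static void main(String[] args) {
        UamTokenDemo interceptor = new UamTokenDemo();
        interceptor.processResponse(new AllocateResponse("token-843616604", 3));
        interceptor.processResponse(new AllocateResponse(null, 2));
        System.out.println(interceptor.getMergedContainers()); // 5
        System.out.println(interceptor.getLatestToken());      // token-843616604
    }
}
```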
[jira] [Created] (YARN-10841) Fix token reset synchronization by making sure for UAM response token reset is done while in lock.
Minni Mittal created YARN-10841: --- Summary: Fix token reset synchronization by making sure for UAM response token reset is done while in lock. Key: YARN-10841 URL: https://issues.apache.org/jira/browse/YARN-10841 Project: Hadoop YARN Issue Type: Bug Reporter: Minni Mittal Assignee: Minni Mittal -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10822) Containers going from New to Scheduled transition even though container is killed before NM restart when NM recovery is enabled
[ https://issues.apache.org/jira/browse/YARN-10822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10822: Attachment: YARN-10822.v1.patch > Containers going from New to Scheduled transition even though container is > killed before NM restart when NM recovery is enabled > --- > > Key: YARN-10822 > URL: https://issues.apache.org/jira/browse/YARN-10822 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10822.v1.patch > > > INFO [91] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from NEW to > LOCALIZING > INFO [91] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from LOCALIZING to > SCHEDULED > INFO [91] ContainerScheduler: Opportunistic container > container_e1171_1623422468672_2229_01_000738 will be queued at the NM. > INFO [127] ContainerManagerImpl: Stopping container with container Id: > container_e1171_1623422468672_2229_01_000738 > INFO [91] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from SCHEDULED to > KILLING > INFO [91] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from KILLING to > CONTAINER_CLEANEDUP_AFTER_KILL > INFO [91] NMAuditLogger: USER=defaultcafor1stparty OPERATION=Container > Finished - Killed TARGET=ContainerImpl RESULT=SUCCESS > APPID=application_1623422468672_2229 > CONTAINERID=container_e1171_1623422468672_2229_01_000738 > INFO [91] ApplicationImpl: Removing > container_e1171_1623422468672_2229_01_000738 from application > application_1623422468672_2229 > INFO [91] ContainersMonitorImpl: Stopping resource-monitoring for > container_e1171_1623422468672_2229_01_000738 > INFO [163] NodeStatusUpdaterImpl: Removed completed containers from NM > context:[container_e1171_1623422468672_2229_01_000738] > NM restart happened and recovery is attempted > > INFO [1] 
ContainerManagerImpl: Recovering > container_e1171_1623422468672_2229_01_000738 in state QUEUED with exit code > -1000 > INFO [1] ApplicationImpl: Adding > container_e1171_1623422468672_2229_01_000738 to application > application_1623422468672_2229 > INFO [89] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from NEW to > SCHEDULED > INFO [89] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from SCHEDULED to > KILLING > INFO [89] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from KILLING to > CONTAINER_CLEANEDUP_AFTER_KILL > Ideally, when the container was killed before the restart, recovery should > finish the container immediately. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
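The intended recovery behavior can be sketched as a small state decision (an illustrative model, not the real ContainerManagerImpl recovery code): a container whose recovered state says it was already killed or completed should be finished immediately, rather than replaying NEW → SCHEDULED → KILLING for a container that no longer exists.

```java
// Sketch: short-circuit recovery for containers that finished before the NM restart.
public class RecoveryDemo {
    public enum RecoveredStatus { LAUNCHED, QUEUED, KILLED, COMPLETED }

    public static String recover(RecoveredStatus status) {
        switch (status) {
            case KILLED:
            case COMPLETED:
                return "DONE";        // finish right away, no re-queueing or re-kill
            case QUEUED:
                return "SCHEDULED";   // genuinely queued work is re-queued
            default:
                return "RUNNING";     // a launched container resumes monitoring
        }
    }

    public static void main(String[] args) {
        System.out.println(recover(RecoveredStatus.KILLED)); // DONE
        System.out.println(recover(RecoveredStatus.QUEUED)); // SCHEDULED
    }
}
```

In the log above the container was recovered as QUEUED even though it had been killed, which is why it marched through SCHEDULED and KILLING again; persisting and honoring the kill is the fix being proposed.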
[jira] [Updated] (YARN-10822) Containers going from New to Scheduled transition even though container is killed before NM restart when NM recovery is enabled
[ https://issues.apache.org/jira/browse/YARN-10822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10822: Description: INFO [91] ContainerImpl: Container container_e1171_1623422468672_2229_01_000738 transitioned from NEW to LOCALIZING INFO [91] ContainerImpl: Container container_e1171_1623422468672_2229_01_000738 transitioned from LOCALIZING to SCHEDULED INFO [91] ContainerScheduler: Opportunistic container container_e1171_1623422468672_2229_01_000738 will be queued at the NM. INFO [127] ContainerManagerImpl: Stopping container with container Id: container_e1171_1623422468672_2229_01_000738 INFO [91] ContainerImpl: Container container_e1171_1623422468672_2229_01_000738 transitioned from SCHEDULED to KILLING INFO [91] ContainerImpl: Container container_e1171_1623422468672_2229_01_000738 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL INFO [91] NMAuditLogger: USER=defaultcafor1stparty OPERATION=Container Finished - Killed TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1623422468672_2229 CONTAINERID=container_e1171_1623422468672_2229_01_000738 INFO [91] ApplicationImpl: Removing container_e1171_1623422468672_2229_01_000738 from application application_1623422468672_2229 INFO [91] ContainersMonitorImpl: Stopping resource-monitoring for container_e1171_1623422468672_2229_01_000738 INFO [163] NodeStatusUpdaterImpl: Removed completed containers from NM context:[container_e1171_1623422468672_2229_01_000738] NM restart happened and recovery is attempted INFO [1] ContainerManagerImpl: Recovering container_e1171_1623422468672_2229_01_000738 in state QUEUED with exit code -1000 INFO [1] ApplicationImpl: Adding container_e1171_1623422468672_2229_01_000738 to application application_1623422468672_2229 INFO [89] ContainerImpl: Container container_e1171_1623422468672_2229_01_000738 transitioned from NEW to SCHEDULED INFO [89] ContainerImpl: Container container_e1171_1623422468672_2229_01_000738 transitioned from 
SCHEDULED to KILLING INFO [89] ContainerImpl: Container container_e1171_1623422468672_2229_01_000738 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL Ideally, when container got killed before restart, it should finish the container immediately. > Containers going from New to Scheduled transition even though container is > killed before NM restart when NM recovery is enabled > --- > > Key: YARN-10822 > URL: https://issues.apache.org/jira/browse/YARN-10822 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > > INFO [91] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from NEW to > LOCALIZING > INFO [91] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from LOCALIZING to > SCHEDULED > INFO [91] ContainerScheduler: Opportunistic container > container_e1171_1623422468672_2229_01_000738 will be queued at the NM. > INFO [127] ContainerManagerImpl: Stopping container with container Id: > container_e1171_1623422468672_2229_01_000738 > INFO [91] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from SCHEDULED to > KILLING > INFO [91] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from KILLING to > CONTAINER_CLEANEDUP_AFTER_KILL > INFO [91] NMAuditLogger: USER=defaultcafor1stparty OPERATION=Container > Finished - Killed TARGET=ContainerImpl RESULT=SUCCESS > APPID=application_1623422468672_2229 > CONTAINERID=container_e1171_1623422468672_2229_01_000738 > INFO [91] ApplicationImpl: Removing > container_e1171_1623422468672_2229_01_000738 from application > application_1623422468672_2229 > INFO [91] ContainersMonitorImpl: Stopping resource-monitoring for > container_e1171_1623422468672_2229_01_000738 > INFO [163] NodeStatusUpdaterImpl: Removed completed containers from NM > context:[container_e1171_1623422468672_2229_01_000738] > NM restart happened and recovery is 
attempted > > INFO [1] ContainerManagerImpl: Recovering > container_e1171_1623422468672_2229_01_000738 in state QUEUED with exit code > -1000 > INFO [1] ApplicationImpl: Adding > container_e1171_1623422468672_2229_01_000738 to application > application_1623422468672_2229 > INFO [89] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from NEW to > SCHEDULED > INFO [89] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from SCHEDULED to > KILLING > INFO [89] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from KILLING to > CONTAINER_
[jira] [Created] (YARN-10822) Containers going to New to Scheduled transition even though container is killed before NM restart when NM recovery is enabled
Minni Mittal created YARN-10822: --- Summary: Containers going to New to Scheduled transition even though container is killed before NM restart when NM recovery is enabled Key: YARN-10822 URL: https://issues.apache.org/jira/browse/YARN-10822 Project: Hadoop YARN Issue Type: Bug Reporter: Minni Mittal Assignee: Minni Mittal -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10822) Containers going from New to Scheduled transition even though container is killed before NM restart when NM recovery is enabled
[ https://issues.apache.org/jira/browse/YARN-10822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10822: Summary: Containers going from New to Scheduled transition even though container is killed before NM restart when NM recovery is enabled (was: Containers going to New to Scheduled transition even though container is killed before NM restart when NM recovery is enabled) > Containers going from New to Scheduled transition even though container is > killed before NM restart when NM recovery is enabled > --- > > Key: YARN-10822 > URL: https://issues.apache.org/jira/browse/YARN-10822 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10815) Handle Invalid event: PAUSE_CONTAINER at Container state as SCHEDULED
Minni Mittal created YARN-10815: --- Summary: Handle Invalid event: PAUSE_CONTAINER at Container state as SCHEDULED Key: YARN-10815 URL: https://issues.apache.org/jira/browse/YARN-10815 Project: Hadoop YARN Issue Type: Bug Reporter: Minni Mittal Assignee: Minni Mittal -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10815) Handle Invalid event: PAUSE_CONTAINER at Container state SCHEDULED
[ https://issues.apache.org/jira/browse/YARN-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10815: Summary: Handle Invalid event: PAUSE_CONTAINER at Container state SCHEDULED (was: Handle Invalid event: PAUSE_CONTAINER at Container state as SCHEDULED) > Handle Invalid event: PAUSE_CONTAINER at Container state SCHEDULED > -- > > Key: YARN-10815 > URL: https://issues.apache.org/jira/browse/YARN-10815 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-10478) Make RM-NM heartbeat scaling calculator pluggable
[ https://issues.apache.org/jira/browse/YARN-10478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-10478: --- Assignee: Minni Mittal > Make RM-NM heartbeat scaling calculator pluggable > - > > Key: YARN-10478 > URL: https://issues.apache.org/jira/browse/YARN-10478 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: Jim Brennan >Assignee: Minni Mittal >Priority: Minor > > [YARN-10475] adds a feature to enable scaling the interval for heartbeats > between the RM and NM based on CPU utilization. [~bibinchundatt] suggested > that we make this pluggable so that other calculations can be used if desired. > The configuration properties added in [YARN-10475] should be applicable to > any heartbeat calculator. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
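A pluggable calculator could look roughly like the sketch below. The interface and class names are hypothetical, not the ones YARN-10475/YARN-10478 actually introduce; the point is that the scaling policy becomes an implementation behind a small interface.

```java
// Sketch: heartbeat interval policy behind a pluggable interface.
public class HeartbeatDemo {
    public interface HeartbeatIntervalCalculator {
        long nextIntervalMs(float cpuUtilization);   // utilization in 0.0 .. 1.0
    }

    // One possible plug-in: scale linearly between min and max with CPU load,
    // so a busy node heartbeats less often.
    public static final class CpuScaledCalculator implements HeartbeatIntervalCalculator {
        private final long minMs, maxMs;
        public CpuScaledCalculator(long minMs, long maxMs) {
            this.minMs = minMs; this.maxMs = maxMs;
        }
        @Override public long nextIntervalMs(float cpu) {
            float clamped = Math.max(0f, Math.min(1f, cpu));
            return minMs + (long) ((maxMs - minMs) * clamped);
        }
    }

    public static void main(String[] args) {
        HeartbeatIntervalCalculator calc = new CpuScaledCalculator(1000, 3000);
        System.out.println(calc.nextIntervalMs(0.0f)); // 1000
        System.out.println(calc.nextIntervalMs(0.5f)); // 2000
        System.out.println(calc.nextIntervalMs(1.0f)); // 3000
    }
}
```

With such an interface, the min/max interval properties from YARN-10475 stay meaningful for any calculator, as suggested in the issue.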
[jira] [Commented] (YARN-2614) Cleanup synchronized method in SchedulerApplicationAttempt
[ https://issues.apache.org/jira/browse/YARN-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359397#comment-17359397 ] Minni Mittal commented on YARN-2614: Hey [~leftnoteasy], can I work on this Jira? > Cleanup synchronized method in SchedulerApplicationAttempt > -- > > Key: YARN-2614 > URL: https://issues.apache.org/jira/browse/YARN-2614 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Wangda Tan >Priority: Major > > According to discussions in YARN-2594, there are some methods in > SchedulerApplicationAttempt that will be accessed by other modules, which can > lead to a potential deadlock in the RM; we should clean them up as much as we > can. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10459) containerLaunchedOnNode method not need to hold schedulerApptemt lock
[ https://issues.apache.org/jira/browse/YARN-10459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359387#comment-17359387 ] Minni Mittal commented on YARN-10459: - Hey [~jianliang.wu], can I take up this Jira if you are not working on it? > containerLaunchedOnNode method not need to hold schedulerApptemt lock > -- > > Key: YARN-10459 > URL: https://issues.apache.org/jira/browse/YARN-10459 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.2.0, 3.1.3 >Reporter: Ryan Wu >Priority: Major > Fix For: 3.2.1 > > > > Now, the containerLaunchedOnNode method holds the SchedulerApplicationAttempt > write lock, but looking at the method, it does not change any field. More > seriously, this will affect the scheduler. > {code:java} > public void containerLaunchedOnNode(ContainerId containerId, NodeId nodeId) > { > // Inform the container > writeLock.lock(); > try { > RMContainer rmContainer = getRMContainer(containerId); > if (rmContainer == null) { > // Some unknown container sneaked into the system. Kill it. > rmContext.getDispatcher().getEventHandler().handle(new > RMNodeCleanContainerEvent(nodeId, containerId)); > return; > } > rmContainer.handle(new RMContainerEvent(containerId, > RMContainerEventType.LAUNCHED)); > } finally { > writeLock.unlock(); > } > } > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-9910) Make private localizer download resources in parallel
[ https://issues.apache.org/jira/browse/YARN-9910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-9910: -- Assignee: Minni Mittal (was: Abhishek Modi) > Make private localizer download resources in parallel > - > > Key: YARN-9910 > URL: https://issues.apache.org/jira/browse/YARN-9910 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Abhishek Modi >Assignee: Minni Mittal >Priority: Major > > Currently private localizer uses a single threaded pool to do the > localization. As part of this jira, private localizer will create a fixed > threadpool of configurable number of threads for localization. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
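The core of the change can be sketched with a fixed thread pool standing in for the localizer's download loop (illustrative only; the real ContainerLocalizer wires this into FSDownload, and the pool-size configuration key is defined by the patch, not here):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: download resources on a fixed pool instead of a single thread.
public class ParallelLocalizerDemo {
    static final int DOWNLOAD_THREADS = 4;  // would come from configuration

    public static List<String> localize(List<String> resources) {
        ExecutorService pool = Executors.newFixedThreadPool(DOWNLOAD_THREADS);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String resource : resources) {
                // Each submit stands in for one FSDownload of a resource.
                futures.add(pool.submit(() -> "localized:" + resource));
            }
            List<String> done = new ArrayList<>();
            for (Future<String> f : futures) {
                try {
                    done.add(f.get());  // collect in submission order
                } catch (InterruptedException | ExecutionException e) {
                    throw new RuntimeException(e);
                }
            }
            return done;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(localize(List.of("jobA.jar", "dict.txt")));
    }
}
```

Collecting futures in submission order keeps completion reporting deterministic even though the downloads themselves run concurrently.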
[jira] [Updated] (YARN-10683) Add total resource in NodeManager metrics
[ https://issues.apache.org/jira/browse/YARN-10683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10683: Attachment: YARN-10683.v1.patch > Add total resource in NodeManager metrics > - > > Key: YARN-10683 > URL: https://issues.apache.org/jira/browse/YARN-10683 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Minor > Attachments: YARN-10683.v1.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10518) Add metrics for custom resource types in NodeManagerMetrics
[ https://issues.apache.org/jira/browse/YARN-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10518: Description: This Jira deals with updating NodeManager metrics with custom resource types. It includes allocated, available resources. (was: This Jira deals with updating NodeManager metrics with custom resource types. It includes allocated, available and total resources.) > Add metrics for custom resource types in NodeManagerMetrics > > > Key: YARN-10518 > URL: https://issues.apache.org/jira/browse/YARN-10518 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10518.v1.patch > > > This Jira deals with updating NodeManager metrics with custom resource types. > It includes allocated, available resources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10518) Add metrics for custom resource types in NodeManagerMetrics
[ https://issues.apache.org/jira/browse/YARN-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10518: Attachment: YARN-10518.v1.patch > Add metrics for custom resource types in NodeManagerMetrics > > > Key: YARN-10518 > URL: https://issues.apache.org/jira/browse/YARN-10518 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10518.v1.patch > > > This Jira deals with updating NodeManager metrics with custom resource types. > It includes allocated, available and total resources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
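Per-resource-type gauges could be modeled as below. This is a plain-map sketch for illustration; the actual change would extend NodeManagerMetrics with Hadoop metrics2 gauges rather than a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: allocated/available tracking keyed by resource type name.
public class CustomResourceMetricsDemo {
    private final Map<String, Long> allocated = new LinkedHashMap<>();
    private final Map<String, Long> available = new LinkedHashMap<>();

    public void setAvailable(String resource, long units) {
        available.put(resource, units);
    }

    // Moves units of a custom resource (e.g. "yarn.io/gpu") from available to allocated.
    public void allocate(String resource, long units) {
        available.merge(resource, -units, Long::sum);
        allocated.merge(resource, units, Long::sum);
    }

    public long getAllocated(String resource) { return allocated.getOrDefault(resource, 0L); }
    public long getAvailable(String resource) { return available.getOrDefault(resource, 0L); }

    public static void main(String[] args) {
        CustomResourceMetricsDemo metrics = new CustomResourceMetricsDemo();
        metrics.setAvailable("yarn.io/gpu", 8);
        metrics.allocate("yarn.io/gpu", 3);
        // allocated/available for the custom type
        System.out.println(metrics.getAllocated("yarn.io/gpu") + "/" + metrics.getAvailable("yarn.io/gpu")); // 3/5
    }
}
```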
[jira] [Assigned] (YARN-10683) Add total resource in NodeManager metrics
[ https://issues.apache.org/jira/browse/YARN-10683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-10683: --- Assignee: Minni Mittal > Add total resource in NodeManager metrics > - > > Key: YARN-10683 > URL: https://issues.apache.org/jira/browse/YARN-10683 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10683) Add total resource in NodeManager metrics
Minni Mittal created YARN-10683: --- Summary: Add total resource in NodeManager metrics Key: YARN-10683 URL: https://issues.apache.org/jira/browse/YARN-10683 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Minni Mittal -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10569) Add metrics for success/failure/latency in ResourceLocalization
[ https://issues.apache.org/jira/browse/YARN-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10569: Description: This Jira deals with updating NodeManager metrics with success, failure, pending and latency stats for the ResourceLocalization service. > Add metrics for success/failure/latency in ResourceLocalization > --- > > Key: YARN-10569 > URL: https://issues.apache.org/jira/browse/YARN-10569 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > > This Jira deals with updating NodeManager metrics with success, failure, > pending and latency stats for the ResourceLocalization service. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
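A minimal sketch of the counters such a change could add (field and method names here are illustrative, not the actual NodeManagerMetrics members):

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: success/failure counters plus average localization latency.
public class LocalizationMetricsDemo {
    private final AtomicLong success = new AtomicLong();
    private final AtomicLong failure = new AtomicLong();
    private final AtomicLong totalLatencyMs = new AtomicLong();

    public void recordSuccess(long latencyMs) {
        success.incrementAndGet();
        totalLatencyMs.addAndGet(latencyMs);
    }

    public void recordFailure() { failure.incrementAndGet(); }

    public long successCount() { return success.get(); }
    public long failureCount() { return failure.get(); }

    // Average latency over successful localizations, in milliseconds.
    public long avgLatencyMs() {
        long n = success.get();
        return n == 0 ? 0 : totalLatencyMs.get() / n;
    }

    public static void main(String[] args) {
        LocalizationMetricsDemo metrics = new LocalizationMetricsDemo();
        metrics.recordSuccess(120);
        metrics.recordSuccess(80);
        metrics.recordFailure();
        System.out.println(metrics.successCount() + " " + metrics.failureCount()
            + " " + metrics.avgLatencyMs()); // 2 1 100
    }
}
```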
[jira] [Created] (YARN-10569) Add metrics for success/failure/latency in ResourceLocalization
Minni Mittal created YARN-10569: --- Summary: Add metrics for success/failure/latency in ResourceLocalization Key: YARN-10569 URL: https://issues.apache.org/jira/browse/YARN-10569 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Minni Mittal Assignee: Minni Mittal -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8859) Add audit logs for router service
[ https://issues.apache.org/jira/browse/YARN-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-8859: --- Attachment: YARN-8859.v3.patch > Add audit logs for router service > - > > Key: YARN-8859 > URL: https://issues.apache.org/jira/browse/YARN-8859 > Project: Hadoop YARN > Issue Type: Sub-task > Components: router >Reporter: Bibin Chundatt >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-8859.v1.patch, YARN-8859.v2.patch, > YARN-8859.v3.patch > > > Similar to all other yarn services. > RouterClientRMService and RouterWebServices api/rest call should have Audit > logging. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
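YARN audit loggers emit tab-separated key=value lines (the NMAuditLogger output quoted in YARN-10822 above shows the shape). The helper below sketches that format for a hypothetical RouterAuditLogger; the exact keys and method names in the patch may differ.

```java
// Sketch: build a YARN-style audit log line for a router operation.
public class RouterAuditDemo {
    public static String createSuccessLog(String user, String operation,
                                          String target, String appId) {
        StringBuilder b = new StringBuilder();
        b.append("USER=").append(user)
         .append("\tOPERATION=").append(operation)
         .append("\tTARGET=").append(target)
         .append("\tRESULT=SUCCESS");
        if (appId != null) {
            b.append("\tAPPID=").append(appId);   // optional application context
        }
        return b.toString();
    }

    public static void main(String[] args) {
        System.out.println(createSuccessLog("alice", "Submit Application",
            "RouterClientRMService", "application_1623422468672_2229"));
    }
}
```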
[jira] [Updated] (YARN-7898) [FederationStateStore] Create a proxy chain for FederationStateStore API in the Router
[ https://issues.apache.org/jira/browse/YARN-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-7898: --- Attachment: YARN-7898-YARN-7402.v9.patch > [FederationStateStore] Create a proxy chain for FederationStateStore API in > the Router > -- > > Key: YARN-7898 > URL: https://issues.apache.org/jira/browse/YARN-7898 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Minni Mittal >Priority: Major > Attachments: StateStoreProxy StressTest.jpg, > YARN-7898-YARN-7402.proto.patch, YARN-7898-YARN-7402.v1.patch, > YARN-7898-YARN-7402.v2.patch, YARN-7898-YARN-7402.v3.patch, > YARN-7898-YARN-7402.v4.patch, YARN-7898-YARN-7402.v5.patch, > YARN-7898-YARN-7402.v6.patch, YARN-7898-YARN-7402.v7.patch, > YARN-7898-YARN-7402.v8.patch, YARN-7898-YARN-7402.v9.patch, YARN-7898.v7.patch > > > As detailed in the proposal in the umbrella JIRA, we are introducing a new > component that routes client request to appropriate FederationStateStore. > This JIRA tracks the creation of a proxy for FederationStateStore in the > Router. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8859) Add audit logs for router service
[ https://issues.apache.org/jira/browse/YARN-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-8859: --- Attachment: YARN-8859.v2.patch > Add audit logs for router service > - > > Key: YARN-8859 > URL: https://issues.apache.org/jira/browse/YARN-8859 > Project: Hadoop YARN > Issue Type: Sub-task > Components: router >Reporter: Bibin Chundatt >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-8859.v1.patch, YARN-8859.v2.patch > > > Similar to all other yarn services. > RouterClientRMService and RouterWebServices api/rest call should have Audit > logging. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7898) [FederationStateStore] Create a proxy chain for FederationStateStore API in the Router
[ https://issues.apache.org/jira/browse/YARN-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-7898: --- Attachment: YARN-7898-YARN-7402.v8.patch > [FederationStateStore] Create a proxy chain for FederationStateStore API in > the Router > -- > > Key: YARN-7898 > URL: https://issues.apache.org/jira/browse/YARN-7898 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Minni Mittal >Priority: Major > Attachments: StateStoreProxy StressTest.jpg, > YARN-7898-YARN-7402.proto.patch, YARN-7898-YARN-7402.v1.patch, > YARN-7898-YARN-7402.v2.patch, YARN-7898-YARN-7402.v3.patch, > YARN-7898-YARN-7402.v4.patch, YARN-7898-YARN-7402.v5.patch, > YARN-7898-YARN-7402.v6.patch, YARN-7898-YARN-7402.v7.patch, > YARN-7898-YARN-7402.v8.patch, YARN-7898.v7.patch > > > As detailed in the proposal in the umbrella JIRA, we are introducing a new > component that routes client request to appropriate FederationStateStore. > This JIRA tracks the creation of a proxy for FederationStateStore in the > Router. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8859) Add audit logs for router service
[ https://issues.apache.org/jira/browse/YARN-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-8859: --- Attachment: YARN-8859.v1.patch > Add audit logs for router service > - > > Key: YARN-8859 > URL: https://issues.apache.org/jira/browse/YARN-8859 > Project: Hadoop YARN > Issue Type: Sub-task > Components: router >Reporter: Bibin Chundatt >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-8859.v1.patch > > > As with all other YARN services, the > RouterClientRMService and RouterWebServices API/REST calls should have audit > logging. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
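The audit-logging request above follows the pattern of the existing YARN audit loggers. A minimal, hypothetical sketch of the kind of helper such a patch might add (the class name, method, and log format here are illustrative assumptions, not the actual YARN-8859 code):

```java
// Hypothetical sketch of a router audit helper, modeled loosely on the
// key=value style of the RM audit loggers; everything here is assumed,
// not taken from the YARN-8859 patch.
public class RouterAuditSketch {

    // Builds a single audit line for one API/REST call.
    static String createEntry(String user, String operation,
                              String target, boolean success) {
        return "USER=" + user
            + "\tOPERATION=" + operation
            + "\tTARGET=" + target
            + "\tRESULT=" + (success ? "SUCCESS" : "FAILURE");
    }

    public static void main(String[] args) {
        System.out.println(createEntry("alice", "submitApplication",
                                       "RouterClientRMService", true));
    }
}
```

In practice such a helper would be invoked at the start and end of each RouterClientRMService and RouterWebServices call, writing to a dedicated audit log appender.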
[jira] [Commented] (YARN-8529) Add timeout to RouterWebServiceUtil#invokeRMWebService
[ https://issues.apache.org/jira/browse/YARN-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254076#comment-17254076 ] Minni Mittal commented on YARN-8529: [~bibinchundatt] Can you please review the patch ? > Add timeout to RouterWebServiceUtil#invokeRMWebService > -- > > Key: YARN-8529 > URL: https://issues.apache.org/jira/browse/YARN-8529 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-8529.v1.patch, YARN-8529.v10.patch, > YARN-8529.v11.patch, YARN-8529.v2.patch, YARN-8529.v3.patch, > YARN-8529.v4.patch, YARN-8529.v5.patch, YARN-8529.v6.patch, > YARN-8529.v7.patch, YARN-8529.v8.patch, YARN-8529.v9.patch > > > {{RouterWebServiceUtil#invokeRMWebService}} currently has a fixed timeout. > This should be configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8529) Add timeout to RouterWebServiceUtil#invokeRMWebService
[ https://issues.apache.org/jira/browse/YARN-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-8529: --- Attachment: YARN-8529.v11.patch > Add timeout to RouterWebServiceUtil#invokeRMWebService > -- > > Key: YARN-8529 > URL: https://issues.apache.org/jira/browse/YARN-8529 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-8529.v1.patch, YARN-8529.v10.patch, > YARN-8529.v11.patch, YARN-8529.v2.patch, YARN-8529.v3.patch, > YARN-8529.v4.patch, YARN-8529.v5.patch, YARN-8529.v6.patch, > YARN-8529.v7.patch, YARN-8529.v8.patch, YARN-8529.v9.patch > > > {{RouterWebServiceUtil#invokeRMWebService}} currently has a fixed timeout. > This should be configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
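The change requested above replaces a fixed timeout with a configurable one. A minimal sketch of that pattern, reading the value from a properties-style configuration (the property key and default below are hypothetical, not the actual YARN-8529 configuration names):

```java
import java.util.Properties;

public class TimeoutConfigSketch {
    // Hypothetical key and default; a real patch would define these in
    // YarnConfiguration rather than as ad-hoc constants.
    static final String TIMEOUT_KEY = "router.webservice.connect-timeout-ms";
    static final int DEFAULT_TIMEOUT_MS = 30_000;

    // Returns the configured timeout in ms, falling back to the default.
    static int getTimeoutMs(Properties conf) {
        String v = conf.getProperty(TIMEOUT_KEY);
        return (v == null) ? DEFAULT_TIMEOUT_MS : Integer.parseInt(v.trim());
    }
}
```

A caller would then pass `getTimeoutMs(conf)` to the HTTP client's connect/read timeout setters instead of a hard-coded constant.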
[jira] [Updated] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10519: Attachment: YARN-10519.v7.patch > Refactor QueueMetricsForCustomResources class to move to yarn-common package > > > Key: YARN-10519 > URL: https://issues.apache.org/jira/browse/YARN-10519 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch, > YARN-10519.v3.patch, YARN-10519.v4.patch, YARN-10519.v5.patch, > YARN-10519.v6.patch, YARN-10519.v7.patch > > > Refactor the code for QueueMetricsForCustomResources to move the base classes > to yarn-common package. This helps in reusing the class in adding custom > resource types at NM level also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10519: Attachment: YARN-10519.v6.patch > Refactor QueueMetricsForCustomResources class to move to yarn-common package > > > Key: YARN-10519 > URL: https://issues.apache.org/jira/browse/YARN-10519 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch, > YARN-10519.v3.patch, YARN-10519.v4.patch, YARN-10519.v5.patch, > YARN-10519.v6.patch > > > Refactor the code for QueueMetricsForCustomResources to move the base classes > to yarn-common package. This helps in reusing the class in adding custom > resource types at NM level also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8529) Add timeout to RouterWebServiceUtil#invokeRMWebService
[ https://issues.apache.org/jira/browse/YARN-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-8529: --- Attachment: YARN-8529.v10.patch > Add timeout to RouterWebServiceUtil#invokeRMWebService > -- > > Key: YARN-8529 > URL: https://issues.apache.org/jira/browse/YARN-8529 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-8529.v1.patch, YARN-8529.v10.patch, > YARN-8529.v2.patch, YARN-8529.v3.patch, YARN-8529.v4.patch, > YARN-8529.v5.patch, YARN-8529.v6.patch, YARN-8529.v7.patch, > YARN-8529.v8.patch, YARN-8529.v9.patch > > > {{RouterWebServiceUtil#invokeRMWebService}} currently has a fixed timeout. > This should be configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251590#comment-17251590 ] Minni Mittal commented on YARN-10519: - [~bibinchundatt], Can you please review the recent patch ? > Refactor QueueMetricsForCustomResources class to move to yarn-common package > > > Key: YARN-10519 > URL: https://issues.apache.org/jira/browse/YARN-10519 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch, > YARN-10519.v3.patch, YARN-10519.v4.patch, YARN-10519.v5.patch > > > Refactor the code for QueueMetricsForCustomResources to move the base classes > to yarn-common package. This helps in reusing the class in adding custom > resource types at NM level also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251517#comment-17251517 ] Minni Mittal commented on YARN-10519: - I've addressed the comments on the new line and the visibility change in the new patch. For the UTs, the reference in QueueMetrics is required. > Refactor QueueMetricsForCustomResources class to move to yarn-common package > > > Key: YARN-10519 > URL: https://issues.apache.org/jira/browse/YARN-10519 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch, > YARN-10519.v3.patch, YARN-10519.v4.patch, YARN-10519.v5.patch > > > Refactor the code for QueueMetricsForCustomResources to move the base classes > to yarn-common package. This helps in reusing the class in adding custom > resource types at NM level also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10519: Attachment: YARN-10519.v5.patch > Refactor QueueMetricsForCustomResources class to move to yarn-common package > > > Key: YARN-10519 > URL: https://issues.apache.org/jira/browse/YARN-10519 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch, > YARN-10519.v3.patch, YARN-10519.v4.patch, YARN-10519.v5.patch > > > Refactor the code for QueueMetricsForCustomResources to move the base classes > to yarn-common package. This helps in reusing the class in adding custom > resource types at NM level also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10519: Attachment: YARN-10519.v4.patch > Refactor QueueMetricsForCustomResources class to move to yarn-common package > > > Key: YARN-10519 > URL: https://issues.apache.org/jira/browse/YARN-10519 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch, > YARN-10519.v3.patch, YARN-10519.v4.patch > > > Refactor the code for QueueMetricsForCustomResources to move the base classes > to yarn-common package. This helps in reusing the class in adding custom > resource types at NM level also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-10523) Apps Pending Metrics can have incorrect value on RM recovery restart because of Unmanaged apps.
[ https://issues.apache.org/jira/browse/YARN-10523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-10523: --- Assignee: Minni Mittal > Apps Pending Metrics can have incorrect value on RM recovery restart because > of Unmanaged apps. > --- > > Key: YARN-10523 > URL: https://issues.apache.org/jira/browse/YARN-10523 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > > This Jira handles the following scenario for the AppsPending metric on RM restart > when recovery is enabled: > The AppsPending metric is incremented for each application whose final state is > none on RM restart. For applications that have a container to recover, the > metric is decremented. > For unmanaged applications, where there is no container to recover, the metric > is never decremented, which leaves its value incorrect. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10523) Apps Pending Metrics can have incorrect value on RM recovery restart because of Unmanaged apps.
Minni Mittal created YARN-10523: --- Summary: Apps Pending Metrics can have incorrect value on RM recovery restart because of Unmanaged apps. Key: YARN-10523 URL: https://issues.apache.org/jira/browse/YARN-10523 Project: Hadoop YARN Issue Type: Bug Reporter: Minni Mittal This Jira handles the following scenario for the AppsPending metric on RM restart when recovery is enabled: The AppsPending metric is incremented for each application whose final state is none on RM restart. For applications that have a container to recover, the metric is decremented. For unmanaged applications, where there is no container to recover, the metric is never decremented, which leaves its value incorrect. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
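The accounting imbalance described above can be illustrated with a small counter sketch (illustrative logic only, not the actual QueueMetrics recovery code): every recovered application increments the pending count, but the matching decrement happens only on container recovery, so an unmanaged app with no containers leaves the counter permanently inflated.

```java
// Illustrative sketch of the YARN-10523 recovery accounting bug; this is
// an assumed simplification, not the real QueueMetrics implementation.
public class PendingMetricsSketch {
    int appsPending;

    // Simplified recovery path: increment on app recovery, decrement
    // only when a container is recovered for that app.
    void recoverApp(boolean hasContainerToRecover) {
        appsPending++;
        if (hasContainerToRecover) {
            appsPending--;
        }
        // Unmanaged app: no container to recover, so no decrement ->
        // appsPending ends up higher than the true pending count.
    }
}
```

After recovering one managed and one unmanaged app, the counter reads 1 even though nothing is actually pending, which is exactly the incorrect value the issue reports.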
[jira] [Commented] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246413#comment-17246413 ] Minni Mittal commented on YARN-10519: - [~bibinchundatt] Can you please review the patch? > Refactor QueueMetricsForCustomResources class to move to yarn-common package > > > Key: YARN-10519 > URL: https://issues.apache.org/jira/browse/YARN-10519 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch, > YARN-10519.v3.patch > > > Refactor the code for QueueMetricsForCustomResources to move the base classes > to yarn-common package. This helps in reusing the class in adding custom > resource types at NM level also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10519: Attachment: YARN-10519.v3.patch > Refactor QueueMetricsForCustomResources class to move to yarn-common package > > > Key: YARN-10519 > URL: https://issues.apache.org/jira/browse/YARN-10519 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch, > YARN-10519.v3.patch > > > Refactor the code for QueueMetricsForCustomResources to move the base classes > to yarn-common package. This helps in reusing the class in adding custom > resource types at NM level also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245977#comment-17245977 ] Minni Mittal commented on YARN-10519: - Thanks [~bibinchundatt] for the review. Addressed the comment in the second patch. > Refactor QueueMetricsForCustomResources class to move to yarn-common package > > > Key: YARN-10519 > URL: https://issues.apache.org/jira/browse/YARN-10519 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch > > > Refactor the code for QueueMetricsForCustomResources to move the base classes > to yarn-common package. This helps in reusing the class in adding custom > resource types at NM level also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10519: Attachment: YARN-10519.v2.patch > Refactor QueueMetricsForCustomResources class to move to yarn-common package > > > Key: YARN-10519 > URL: https://issues.apache.org/jira/browse/YARN-10519 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch > > > Refactor the code for QueueMetricsForCustomResources to move the base classes > to yarn-common package. This helps in reusing the class in adding custom > resource types at NM level also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10519: Attachment: YARN-10519.v1.patch > Refactor QueueMetricsForCustomResources class to move to yarn-common package > > > Key: YARN-10519 > URL: https://issues.apache.org/jira/browse/YARN-10519 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10519.v1.patch > > > Refactor the code for QueueMetricsForCustomResources to move the base classes > to yarn-common package. This helps in reusing the class in adding custom > resource types at NM level also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
Minni Mittal created YARN-10519: --- Summary: Refactor QueueMetricsForCustomResources class to move to yarn-common package Key: YARN-10519 URL: https://issues.apache.org/jira/browse/YARN-10519 Project: Hadoop YARN Issue Type: Improvement Reporter: Minni Mittal Assignee: Minni Mittal Refactor the code for QueueMetricsForCustomResources to move the base classes to yarn-common package. This helps in reusing the class in adding custom resource types at NM level also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
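The refactor above is about moving resource-name-keyed metric bookkeeping into yarn-common so both RM queue metrics and NM metrics can share it. A minimal sketch of that shared shape (the class and method names are assumptions, not the actual QueueMetricsForCustomResources API):

```java
import java.util.HashMap;
import java.util.Map;

// Assumed shape of per-custom-resource counters; the real Hadoop class
// tracks several categories (allocated, available, pending, ...).
public class CustomResourceMetricsSketch {
    private final Map<String, Long> allocated = new HashMap<>();

    // Adds to the allocated count for a named resource type.
    void increaseAllocated(String resourceName, long value) {
        allocated.merge(resourceName, value, Long::sum);
    }

    long getAllocated(String resourceName) {
        return allocated.getOrDefault(resourceName, 0L);
    }
}
```

Living in yarn-common, a class like this could back both queue-level metrics in the ResourceManager and node-level metrics in the NodeManager, which is the reuse the description calls out.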
[jira] [Updated] (YARN-10518) Add metrics for custom resource types in NodeManagerMetrics
[ https://issues.apache.org/jira/browse/YARN-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10518: Description: This Jira deals with updating NodeManager metrics with custom resource types. It includes allocated, available and total resources. > Add metrics for custom resource types in NodeManagerMetrics > > > Key: YARN-10518 > URL: https://issues.apache.org/jira/browse/YARN-10518 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > > This Jira deals with updating NodeManager metrics with custom resource types. > It includes allocated, available and total resources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10518) Add metrics for custom resource types in NodeManagerMetrics
Minni Mittal created YARN-10518: --- Summary: Add metrics for custom resource types in NodeManagerMetrics Key: YARN-10518 URL: https://issues.apache.org/jira/browse/YARN-10518 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Minni Mittal Assignee: Minni Mittal -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7898) [FederationStateStore] Create a proxy chain for FederationStateStore API in the Router
[ https://issues.apache.org/jira/browse/YARN-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-7898: --- Attachment: YARN-7898-YARN-7402.v7.patch > [FederationStateStore] Create a proxy chain for FederationStateStore API in > the Router > -- > > Key: YARN-7898 > URL: https://issues.apache.org/jira/browse/YARN-7898 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Minni Mittal >Priority: Major > Attachments: StateStoreProxy StressTest.jpg, > YARN-7898-YARN-7402.proto.patch, YARN-7898-YARN-7402.v1.patch, > YARN-7898-YARN-7402.v2.patch, YARN-7898-YARN-7402.v3.patch, > YARN-7898-YARN-7402.v4.patch, YARN-7898-YARN-7402.v5.patch, > YARN-7898-YARN-7402.v6.patch, YARN-7898-YARN-7402.v7.patch, YARN-7898.v7.patch > > > As detailed in the proposal in the umbrella JIRA, we are introducing a new > component that routes client requests to the appropriate FederationStateStore. > This JIRA tracks the creation of a proxy for FederationStateStore in the > Router. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8529) Add timeout to RouterWebServiceUtil#invokeRMWebService
[ https://issues.apache.org/jira/browse/YARN-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17047221#comment-17047221 ] Minni Mittal commented on YARN-8529: [~bibinchundatt] [~elgoiri] Can you please review the patch ? > Add timeout to RouterWebServiceUtil#invokeRMWebService > -- > > Key: YARN-8529 > URL: https://issues.apache.org/jira/browse/YARN-8529 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-8529.v1.patch, YARN-8529.v2.patch, > YARN-8529.v3.patch, YARN-8529.v4.patch, YARN-8529.v5.patch, > YARN-8529.v6.patch, YARN-8529.v7.patch, YARN-8529.v8.patch, YARN-8529.v9.patch > > > {{RouterWebServiceUtil#invokeRMWebService}} currently has a fixed timeout. > This should be configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8529) Add timeout to RouterWebServiceUtil#invokeRMWebService
[ https://issues.apache.org/jira/browse/YARN-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-8529: --- Attachment: YARN-8529.v9.patch > Add timeout to RouterWebServiceUtil#invokeRMWebService > -- > > Key: YARN-8529 > URL: https://issues.apache.org/jira/browse/YARN-8529 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-8529.v1.patch, YARN-8529.v2.patch, > YARN-8529.v3.patch, YARN-8529.v4.patch, YARN-8529.v5.patch, > YARN-8529.v6.patch, YARN-8529.v7.patch, YARN-8529.v8.patch, YARN-8529.v9.patch > > > {{RouterWebServiceUtil#invokeRMWebService}} currently has a fixed timeout. > This should be configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8529) Add timeout to RouterWebServiceUtil#invokeRMWebService
[ https://issues.apache.org/jira/browse/YARN-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-8529: --- Attachment: YARN-8529.v8.patch > Add timeout to RouterWebServiceUtil#invokeRMWebService > -- > > Key: YARN-8529 > URL: https://issues.apache.org/jira/browse/YARN-8529 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-8529.v1.patch, YARN-8529.v2.patch, > YARN-8529.v3.patch, YARN-8529.v4.patch, YARN-8529.v5.patch, > YARN-8529.v6.patch, YARN-8529.v7.patch, YARN-8529.v8.patch > > > {{RouterWebServiceUtil#invokeRMWebService}} currently has a fixed timeout. > This should be configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org