[jira] [Updated] (YARN-11026) Make AppPlacementAllocator configurable in AppSchedulingInfo
[ https://issues.apache.org/jira/browse/YARN-11026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-11026: Summary: Make AppPlacementAllocator configurable in AppSchedulingInfo (was: Make default AppPlacementAllocator configurable in AppSchedulingInfo) > Make AppPlacementAllocator configurable in AppSchedulingInfo > > > Key: YARN-11026 > URL: https://issues.apache.org/jira/browse/YARN-11026 > Project: Hadoop YARN > Issue Type: Task >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8431) Add whitelist/blacklist support for ATSv2 events.
[ https://issues.apache.org/jira/browse/YARN-8431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-8431: -- Assignee: Minni Mittal (was: Abhishek Modi) > Add whitelist/blacklist support for ATSv2 events. > - > > Key: YARN-8431 > URL: https://issues.apache.org/jira/browse/YARN-8431 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Abhishek Modi >Assignee: Minni Mittal >Priority: Major > > In this jira, we will add functionality in ATSv2 to blacklist events at > cluster level. Blacklisting of events should not require restart of any of > the services and should apply dynamically.
[jira] [Assigned] (YARN-9383) Publish federation events to ATSv2.
[ https://issues.apache.org/jira/browse/YARN-9383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-9383: -- Assignee: Minni Mittal (was: Abhishek Modi) > Publish federation events to ATSv2. > --- > > Key: YARN-9383 > URL: https://issues.apache.org/jira/browse/YARN-9383 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Minni Mittal >Priority: Major > > With federation enabled, containers for a single application might get > spawned across multiple sub-clusters. This information right now is not > getting published to ATSv2. As part of this jira, we are going to publish > federation related info in container events to ATSv2.
[jira] [Updated] (YARN-11025) Implement distributed decommissioning
[ https://issues.apache.org/jira/browse/YARN-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-11025: Description: This Jira proposes to add support for accepting requests from distributed sources to put nodes into the decommissioning state. It proposes to add configurable provider and consumer class interfaces in the NodeManager. The NM can receive a request to put a node into decommissioning from any distributed source via the provider class implementation, and consumer classes can use it to set the node status. Corresponding changes will be made on the RM side to update the node state when the update event is called at DecommissioningNodesWatcher. was: This Jira proposes to add support for acceoting requests from distributed sources to put nodes into the decommissioning state. It proposes to add configurable provider and consumer class interfaces in the NodeManager. The NM can receive a request to put a node into decommissioning from any distributed source via the provider class implementation, and consumer classes can use it to set the node status. Corresponding changes will be made on the RM side to update the node state when the update event is called at DecommissioningNodesWatcher. > Implement distributed decommissioning > - > > Key: YARN-11025 > URL: https://issues.apache.org/jira/browse/YARN-11025 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > > This Jira proposes to add support for accepting requests from distributed > sources to put nodes into the decommissioning state. > It proposes to add configurable provider and consumer class interfaces in > the NodeManager. The NM can receive a request to put a node into > decommissioning from any distributed source via the provider class > implementation, and consumer classes can use it to set the node status. > Corresponding changes will be made on the RM side to update the node state > when the update event is called at DecommissioningNodesWatcher.
[jira] [Updated] (YARN-11025) Implement distributed decommissioning
[ https://issues.apache.org/jira/browse/YARN-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-11025: Description: This Jira proposes to add support for accepting requests from distributed sources to put nodes into the decommissioning state. It proposes to add configurable provider and consumer class interfaces in the NodeManager. The NM can receive a request to put a node into decommissioning from any distributed source via the provider class implementation, and consumer classes can use it to set the node status. Corresponding changes will be made on the RM side to update the node state when the update event is called at DecommissioningNodesWatcher. > Implement distributed decommissioning > - > > Key: YARN-11025 > URL: https://issues.apache.org/jira/browse/YARN-11025 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > > This Jira proposes to add support for accepting requests from distributed > sources to put nodes into the decommissioning state. > It proposes to add configurable provider and consumer class interfaces in > the NodeManager. The NM can receive a request to put a node into > decommissioning from any distributed source via the provider class > implementation, and consumer classes can use it to set the node status. > Corresponding changes will be made on the RM side to update the node state > when the update event is called at DecommissioningNodesWatcher.
[jira] [Assigned] (YARN-10995) Move PendingApplicationComparator from GuaranteedOrZeroCapacityOverTimePolicy
[ https://issues.apache.org/jira/browse/YARN-10995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-10995: --- Assignee: Minni Mittal > Move PendingApplicationComparator from GuaranteedOrZeroCapacityOverTimePolicy > - > > Key: YARN-10995 > URL: https://issues.apache.org/jira/browse/YARN-10995 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Minni Mittal >Priority: Major > > GuaranteedOrZeroCapacityOverTimePolicy has a comparator class that orders > applications by their submit time. It gets the applications from the > RMContext and doesn't need any information from the > GuaranteedOrZeroCapacityOverTimePolicy class, so it could easily be moved > to RMContext; that way the reference to the RMContext/SchedulerContext could > be removed from GuaranteedOrZeroCapacityOverTimePolicy.
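The comparator being moved is small enough to sketch. The standalone model below (the App record and its fields are hypothetical stand-ins for the real RMApp objects fetched from the RMContext) shows the submit-time ordering the policy relies on:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch only: a minimal stand-in for YARN's pending-application ordering.
// "App" and its fields are hypothetical; the real comparator orders RMApp
// objects from the RMContext by their submit time.
public class PendingAppOrder {
    record App(String id, long submitTime) {}

    // Earlier submit time sorts first; ties fall back to the id so the
    // ordering stays deterministic.
    static final Comparator<App> BY_SUBMIT_TIME =
            Comparator.comparingLong(App::submitTime).thenComparing(App::id);

    public static void main(String[] args) {
        List<App> pending = new ArrayList<>(List.of(
                new App("application_2", 2000L),
                new App("application_1", 1000L)));
        pending.sort(BY_SUBMIT_TIME);
        System.out.println(pending.get(0).id()); // the earliest-submitted app
    }
}
```

Because the ordering only needs data the RMContext already holds, a comparator like this has no reason to live inside the policy class, which is the point of the move.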
[jira] [Commented] (YARN-10995) Move PendingApplicationComparator from GuaranteedOrZeroCapacityOverTimePolicy
[ https://issues.apache.org/jira/browse/YARN-10995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459317#comment-17459317 ] Minni Mittal commented on YARN-10995: - Hey [~bteke], if you are not working on this, can I take up this JIRA? Thanks > Move PendingApplicationComparator from GuaranteedOrZeroCapacityOverTimePolicy > - > > Key: YARN-10995 > URL: https://issues.apache.org/jira/browse/YARN-10995 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Priority: Major > > GuaranteedOrZeroCapacityOverTimePolicy has a comparator class that orders > applications by their submit time. It gets the applications from the > RMContext and doesn't need any information from the > GuaranteedOrZeroCapacityOverTimePolicy class, so it could easily be moved > to RMContext; that way the reference to the RMContext/SchedulerContext could > be removed from GuaranteedOrZeroCapacityOverTimePolicy.
[jira] [Commented] (YARN-9967) Fix NodeManager failing to start when Hdfs Auxillary Jar is set
[ https://issues.apache.org/jira/browse/YARN-9967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459310#comment-17459310 ] Minni Mittal commented on YARN-9967: Hey [~tarunparimi] , Can I take this Jira over, if you are not working on it ? Thanks > Fix NodeManager failing to start when Hdfs Auxillary Jar is set > --- > > Key: YARN-9967 > URL: https://issues.apache.org/jira/browse/YARN-9967 > Project: Hadoop YARN > Issue Type: Bug > Components: auxservices, nodemanager >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Tarun Parimi >Priority: Major > > Loading an auxiliary jar from a Hdfs location on a node manager fails with > ClassNotFound Exception > {code:java} > 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: > classpath: [] > 2019-11-08 03:59:49,256 INFO org.apache.hadoop.util.ApplicationClassLoader: > system classes: [java., javax.accessibility., javax.activation., > javax.activity., javax.annotation., javax.annotation.processing., > javax.crypto., javax.imageio., javax.jws., javax.lang.model., > -javax.management.j2ee., javax.management., javax.naming., javax.net., > javax.print., javax.rmi., javax.script., -javax.security.auth.message., > javax.security.auth., javax.security.cert., javax.security.sasl., > javax.sound., javax.sql., javax.swing., javax.tools., javax.transaction., > -javax.xml.registry., -javax.xml.rpc., javax.xml., org.w3c.dom., > org.xml.sax., org.apache.commons.logging., org.apache.log4j., > -org.apache.hadoop.hbase., org.apache.hadoop., core-default.xml, > hdfs-default.xml, mapred-default.xml, yarn-default.xml] > 2019-11-08 03:59:49,257 INFO org.apache.hadoop.service.AbstractService: > Service > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed > in state INITED > java.lang.ClassNotFoundException: org.apache.auxtest.AuxServiceFromHDFS > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at 
java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at > org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:189) > at > org.apache.hadoop.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:157) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxiliaryServiceWithCustomClassLoader.getInstance(AuxiliaryServiceWithCustomClassLoader.java:169) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:270) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:321) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:478) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:936) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1016) > {code} > *Repro:* > {code:java} > 1. 
Prepare a custom auxiliary service jar and place it on hdfs > [hdfs@yarndocker-1 yarn]$ cat TestShuffleHandler2.java > package org; > import org.apache.hadoop.yarn.server.api.AuxiliaryService; > import org.apache.hadoop.yarn.server.api.ApplicationInitializationContext; > import org.apache.hadoop.yarn.server.api.ApplicationTerminationContext; > import java.nio.ByteBuffer; > public class TestShuffleHandler2 extends AuxiliaryService { > public static final String MAPREDUCE_TEST_SHUFFLE_SERVICEID = > "test_shuffle2"; > public TestShuffleHandler2() { > super("testshuffle2"); > } > @Override > public void initializeApplication(ApplicationInitializationContext > context) { > } > @Override > public void stopApplication(ApplicationTerminationContext context) { > } > @Override > public synchronized ByteBuffer getMetaData() { > return ByteBuffer.allocate(0); > } > } > > [hdfs@yarndocker-1 yarn]$ javac -d . -cp `hadoop classpath` > TestShuffleHandler2.java > [hdfs@yarndocker-1 yarn]$ jar
[jira] [Created] (YARN-11047) yarn resourcemanager and nodemanager unable to connect with Hbase when ATSv2 is enabled
Minni Mittal created YARN-11047: --- Summary: yarn resourcemanager and nodemanager unable to connect with Hbase when ATSv2 is enabled Key: YARN-11047 URL: https://issues.apache.org/jira/browse/YARN-11047 Project: Hadoop YARN Issue Type: Task Reporter: Minni Mittal The yarn resourcemanager command logs the following issue: 2021-12-14 19:26:33,345 WARN [pool-28-thread-1] storage.TimelineStorageMonitor (TimelineStorageMonitor.java:run(95)) - Got failure attempting to read from HBase, assuming Storage is down java.lang.RuntimeException: org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.NoSuchMethodError: com.google.common.net.HostAndPort.getHostText()Ljava/lang/String; at org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:95) at org.apache.hadoop.yarn.server.timelineservice.storage.reader.TimelineEntityReader.readEntities(TimelineEntityReader.java:283) at org.apache.hadoop.yarn.server.timelineservice.storage.HBaseStorageMonitor.healthCheck(HBaseStorageMonitor.java:77) at org.apache.hadoop.yarn.server.timelineservice.storage.TimelineStorageMonitor$MonitorThread.run(TimelineStorageMonitor.java:89) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
[jira] [Assigned] (YARN-11047) yarn resourcemanager and nodemanager unable to connect with Hbase when ATSv2 is enabled
[ https://issues.apache.org/jira/browse/YARN-11047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-11047: --- Assignee: Minni Mittal > yarn resourcemanager and nodemanager unable to connect with Hbase when ATSv2 > is enabled > --- > > Key: YARN-11047 > URL: https://issues.apache.org/jira/browse/YARN-11047 > Project: Hadoop YARN > Issue Type: Task >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > > The yarn resourcemanager command logs the following issue: > 2021-12-14 19:26:33,345 WARN [pool-28-thread-1] > storage.TimelineStorageMonitor (TimelineStorageMonitor.java:run(95)) - Got > failure attempting to read from HBase, assuming Storage is down > java.lang.RuntimeException: org.apache.hadoop.hbase.DoNotRetryIOException: > java.lang.NoSuchMethodError: > com.google.common.net.HostAndPort.getHostText()Ljava/lang/String; > at > org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:95) > at > org.apache.hadoop.yarn.server.timelineservice.storage.reader.TimelineEntityReader.readEntities(TimelineEntityReader.java:283) > at > org.apache.hadoop.yarn.server.timelineservice.storage.HBaseStorageMonitor.healthCheck(HBaseStorageMonitor.java:77) > at > org.apache.hadoop.yarn.server.timelineservice.storage.TimelineStorageMonitor$MonitorThread.run(TimelineStorageMonitor.java:89) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at 
java.lang.Thread.run(Thread.java:748)
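The NoSuchMethodError above is the usual signature of a Guava version conflict: HostAndPort.getHostText() was deprecated and eventually removed (getHost() replaces it), so HBase client code compiled against an older Guava fails at link time when a newer Guava is on the classpath. A reflection probe like the hypothetical helper below can confirm which methods the deployed jar actually exposes; the demo runs it against a JDK class so the sketch works without Guava present:

```java
import java.util.Arrays;

// Diagnostic sketch (hypothetical helper, not part of YARN): checks whether
// a class on the current classpath exposes a method with the given name.
// Probing com.google.common.net.HostAndPort for "getHostText" would reveal
// whether the deployed Guava still has the method HBase links against.
public class MethodProbe {
    public static boolean hasMethod(String className, String methodName) {
        try {
            return Arrays.stream(Class.forName(className).getMethods())
                    .anyMatch(m -> m.getName().equals(methodName));
        } catch (ClassNotFoundException e) {
            return false; // the class itself is absent from the classpath
        }
    }

    public static void main(String[] args) {
        // Probe a JDK class so this runs anywhere.
        System.out.println(hasMethod("java.lang.String", "isEmpty")); // true
    }
}
```

If the probe shows getHostText() is missing, the usual remedies are pinning a compatible Guava for the timeline-service classpath or using shaded HBase client artifacts.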
[jira] [Updated] (YARN-11034) Add enhanced headroom in AllocateResponse
[ https://issues.apache.org/jira/browse/YARN-11034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-11034: Description: Add enhanced headroom in allocate response. This provides a channel for RMs to return load information for AMRMProxy and decision making when rerouting resource requests. (was: Add enhanced headroom in allocate response. This provides a channel for RMs to return load information for AMRMProxy and decision making when rerouting resource requests.) > Add enhanced headroom in AllocateResponse > - > > Key: YARN-11034 > URL: https://issues.apache.org/jira/browse/YARN-11034 > Project: Hadoop YARN > Issue Type: Task >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Add enhanced headroom in allocate response. This provides a channel for RMs > to return load information for AMRMProxy and decision making when rerouting > resource requests.
[jira] [Created] (YARN-11037) Add configurable logic to split resource request to least loaded SC
Minni Mittal created YARN-11037: --- Summary: Add configurable logic to split resource request to least loaded SC Key: YARN-11037 URL: https://issues.apache.org/jira/browse/YARN-11037 Project: Hadoop YARN Issue Type: Task Reporter: Minni Mittal Assignee: Minni Mittal
[jira] [Created] (YARN-11034) add enhanced headroom in AllocateResponse.
Minni Mittal created YARN-11034: --- Summary: add enhanced headroom in AllocateResponse. Key: YARN-11034 URL: https://issues.apache.org/jira/browse/YARN-11034 Project: Hadoop YARN Issue Type: Task Reporter: Minni Mittal Assignee: Minni Mittal Add enhanced headroom in allocate response. This provides a channel for RMs to return load information for AMRMProxy and decision making when rerouting resource requests.
[jira] [Assigned] (YARN-10201) Make AMRMProxyPolicy aware of SC load
[ https://issues.apache.org/jira/browse/YARN-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-10201: --- Assignee: Minni Mittal (was: Young Chen) > Make AMRMProxyPolicy aware of SC load > - > > Key: YARN-10201 > URL: https://issues.apache.org/jira/browse/YARN-10201 > Project: Hadoop YARN > Issue Type: Sub-task > Components: amrmproxy >Reporter: Young Chen >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10201.v0.patch, YARN-10201.v1.patch, > YARN-10201.v10.patch, YARN-10201.v2.patch, YARN-10201.v3.patch, > YARN-10201.v4.patch, YARN-10201.v5.patch, YARN-10201.v6.patch, > YARN-10201.v7.patch, YARN-10201.v8.patch, YARN-10201.v9.patch > > > LocalityMulticastAMRMProxyPolicy is currently unaware of SC load when > splitting resource requests. We propose changes to the policy so that it > receives feedback from SCs and can load balance requests across the federated > cluster.
[jira] [Assigned] (YARN-10883) [Router]add more logs when applications killed by a user
[ https://issues.apache.org/jira/browse/YARN-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-10883: --- Assignee: Minni Mittal > [Router]add more logs when applications killed by a user > > > Key: YARN-10883 > URL: https://issues.apache.org/jira/browse/YARN-10883 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: chaosju >Assignee: Minni Mittal >Priority: Major > > the Router should record the address of the client that killed the application
[jira] [Updated] (YARN-11028) Add metrics for container allocation latency
[ https://issues.apache.org/jira/browse/YARN-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-11028: Summary: Add metrics for container allocation latency (was: Add metrics for Guaranteed container allocation latency) > Add metrics for container allocation latency > > > Key: YARN-11028 > URL: https://issues.apache.org/jira/browse/YARN-11028 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major >
[jira] [Created] (YARN-11029) Improve logs to print askCount, allocatedCount in AMRMProxy service
Minni Mittal created YARN-11029: --- Summary: Improve logs to print askCount, allocatedCount in AMRMProxy service Key: YARN-11029 URL: https://issues.apache.org/jira/browse/YARN-11029 Project: Hadoop YARN Issue Type: Improvement Reporter: Minni Mittal Assignee: Minni Mittal
[jira] [Created] (YARN-11028) Add metrics for Guaranteed container allocation latency
Minni Mittal created YARN-11028: --- Summary: Add metrics for Guaranteed container allocation latency Key: YARN-11028 URL: https://issues.apache.org/jira/browse/YARN-11028 Project: Hadoop YARN Issue Type: Improvement Reporter: Minni Mittal Assignee: Minni Mittal
[jira] [Assigned] (YARN-11027) Add ExecutionTypeRequest#compareTo method
[ https://issues.apache.org/jira/browse/YARN-11027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-11027: --- Assignee: Minni Mittal > Add ExecutionTypeRequest#compareTo method > -- > > Key: YARN-11027 > URL: https://issues.apache.org/jira/browse/YARN-11027 > Project: Hadoop YARN > Issue Type: Task >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major >
[jira] [Created] (YARN-11027) Add ExecutionTypeRequest#compareTo method
Minni Mittal created YARN-11027: --- Summary: Add ExecutionTypeRequest#compareTo method Key: YARN-11027 URL: https://issues.apache.org/jira/browse/YARN-11027 Project: Hadoop YARN Issue Type: Task Reporter: Minni Mittal
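YARN-11027 carries no description, so the shape below is only a guess at what an ExecutionTypeRequest#compareTo could look like: order by the execution type enum (GUARANTEED before OPPORTUNISTIC, matching YARN's declaration order) and break ties on the enforcement flag. The classes here are local stand-ins, not the real YARN API:

```java
// Hypothetical sketch: local stand-ins for ExecutionType and
// ExecutionTypeRequest, with an assumed compareTo ordering.
public class ExecTypeRequest implements Comparable<ExecTypeRequest> {
    enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

    final ExecutionType executionType;
    final boolean enforceExecutionType;

    ExecTypeRequest(ExecutionType type, boolean enforce) {
        this.executionType = type;
        this.enforceExecutionType = enforce;
    }

    @Override
    public int compareTo(ExecTypeRequest other) {
        // Enum compareTo follows declaration order, so GUARANTEED sorts first.
        int byType = executionType.compareTo(other.executionType);
        if (byType != 0) {
            return byType;
        }
        // Non-enforced requests sort before enforced ones (false < true).
        return Boolean.compare(enforceExecutionType, other.enforceExecutionType);
    }
}
```

A defined ordering like this would let requests be kept in sorted collections or compared deterministically when merging asks.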
[jira] [Updated] (YARN-11026) Make default AppPlacementAllocator configurable in n AppSchedulingInfo
[ https://issues.apache.org/jira/browse/YARN-11026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-11026: Summary: Make default AppPlacementAllocator configurable in n AppSchedulingInfo (was: Make default AppPlacementAllocator configurable) > Make default AppPlacementAllocator configurable in n AppSchedulingInfo > -- > > Key: YARN-11026 > URL: https://issues.apache.org/jira/browse/YARN-11026 > Project: Hadoop YARN > Issue Type: Task >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major >
[jira] [Updated] (YARN-11026) Make default AppPlacementAllocator configurable in AppSchedulingInfo
[ https://issues.apache.org/jira/browse/YARN-11026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-11026: Summary: Make default AppPlacementAllocator configurable in AppSchedulingInfo (was: Make default AppPlacementAllocator configurable in n AppSchedulingInfo) > Make default AppPlacementAllocator configurable in AppSchedulingInfo > > > Key: YARN-11026 > URL: https://issues.apache.org/jira/browse/YARN-11026 > Project: Hadoop YARN > Issue Type: Task >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major >
[jira] [Created] (YARN-11026) Make default AppPlacementAllocator configurable
Minni Mittal created YARN-11026: --- Summary: Make default AppPlacementAllocator configurable Key: YARN-11026 URL: https://issues.apache.org/jira/browse/YARN-11026 Project: Hadoop YARN Issue Type: Task Reporter: Minni Mittal Assignee: Minni Mittal
[jira] [Updated] (YARN-11025) Implement distributed decommissioning
[ https://issues.apache.org/jira/browse/YARN-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-11025: Summary: Implement distributed decommissioning (was: Implement distributed maintenance ) > Implement distributed decommissioning > - > > Key: YARN-11025 > URL: https://issues.apache.org/jira/browse/YARN-11025 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major >
[jira] [Created] (YARN-11025) Implement distributed maintenance
Minni Mittal created YARN-11025: --- Summary: Implement distributed maintenance Key: YARN-11025 URL: https://issues.apache.org/jira/browse/YARN-11025 Project: Hadoop YARN Issue Type: New Feature Reporter: Minni Mittal Assignee: Minni Mittal
[jira] [Assigned] (YARN-10174) Add colored policies to enable manual load balancing across sub clusters
[ https://issues.apache.org/jira/browse/YARN-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-10174: --- Assignee: Minni Mittal (was: Young Chen) > Add colored policies to enable manual load balancing across sub clusters > > > Key: YARN-10174 > URL: https://issues.apache.org/jira/browse/YARN-10174 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Young Chen >Assignee: Minni Mittal >Priority: Major > > Add colored policies to enable manual load balancing across sub clusters
[jira] [Assigned] (YARN-9532) SLSRunner.run() throws a YarnException when it fails to create the output directory
[ https://issues.apache.org/jira/browse/YARN-9532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-9532: -- Assignee: Minni Mittal > SLSRunner.run() throws a YarnException when it fails to create the output > directory > --- > > Key: YARN-9532 > URL: https://issues.apache.org/jira/browse/YARN-9532 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Haicheng Chen >Assignee: Minni Mittal >Priority: Minor > > Dear YARN developers, we are developing a tool to detect exception-related > bugs in Java. Our prototype has spotted the following {{throw}} statement > whose exception class and error message indicate different error conditions. > > Version: Hadoop-3.1.2 > File: > HADOOP-ROOT/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java > Line: 894 > {code:java} > if (!outputFile.exists() && !outputFile.mkdirs()) { > System.err.println("ERROR: Cannot create output directory " > + outputFile.getAbsolutePath()); > throw new YarnException("Cannot create output directory"); > }{code} > > The exception is triggered when {{run()}} fails to create the output > directory (as indicated by the {{if}} condition and the error message). > However, throwing a {{YarnException}} is too general and makes accurate > exception handling more difficult. Since the error is related to I/O, > throwing an {{IOException}}, or wrapping an {{IOException}} could be better.
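The change the report asks for is small; the sketch below shows the suggested shape of the fix (an assumption, not the committed patch, and the helper name is ours): surface the failed mkdirs() as an IOException so callers can handle the I/O failure precisely instead of catching a generic YarnException.

```java
import java.io.File;
import java.io.IOException;

// Sketch of the suggested SLSRunner fix: report a failed mkdirs() as an
// IOException rather than a generic YarnException.
public class OutputDirCheck {
    static void ensureOutputDir(File outputFile) throws IOException {
        if (!outputFile.exists() && !outputFile.mkdirs()) {
            throw new IOException("Cannot create output directory "
                    + outputFile.getAbsolutePath());
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"),
                "sls-out-" + System.nanoTime());
        ensureOutputDir(dir); // creates the directory, or throws IOException
        System.out.println(dir.isDirectory());
    }
}
```

With a specific exception type, a caller can distinguish a filesystem problem from other simulator failures and, for example, retry with a different output path.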
[jira] [Updated] (YARN-11008) Support heterogeneous node types in SLS Runner
[ https://issues.apache.org/jira/browse/YARN-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-11008: Issue Type: Improvement (was: Task) > Support heterogeneous node types in SLS Runner > -- > > Key: YARN-11008 > URL: https://issues.apache.org/jira/browse/YARN-11008 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major >
[jira] [Commented] (YARN-8570) GPU support in SLS
[ https://issues.apache.org/jira/browse/YARN-8570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447806#comment-17447806 ] Minni Mittal commented on YARN-8570: [~jhung], I would like to work on this Jira. Can I take this over ? > GPU support in SLS > -- > > Key: YARN-8570 > URL: https://issues.apache.org/jira/browse/YARN-8570 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > > Currently resource requests in SLS only support memory and vcores. Since GPU > is natively supported by YARN, it will be useful to support requesting GPU > resources. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-11008) Support heterogeneous node types in SLS Runner
Minni Mittal created YARN-11008: --- Summary: Support heterogeneous node types in SLS Runner Key: YARN-11008 URL: https://issues.apache.org/jira/browse/YARN-11008 Project: Hadoop YARN Issue Type: Task Reporter: Minni Mittal -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10218) [GPG] Support HTTPS in GPG
[ https://issues.apache.org/jira/browse/YARN-10218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442743#comment-17442743 ] Minni Mittal commented on YARN-10218: - [~BilwaST] If you are not working, can I take up this Jira? > [GPG] Support HTTPS in GPG > -- > > Key: YARN-10218 > URL: https://issues.apache.org/jira/browse/YARN-10218 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bilwa S T >Assignee: Bilwa S T >Priority: Major > > HTTPS support in Router is handled as part of Jira YARN-10120. Https Rest > calls from GPG to Router must be supported -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10174) Add colored policies to enable manual load balancing across sub clusters
[ https://issues.apache.org/jira/browse/YARN-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442739#comment-17442739 ] Minni Mittal commented on YARN-10174: - [~youchen], if you are not working, can I take up this JIRA? > Add colored policies to enable manual load balancing across sub clusters > > > Key: YARN-10174 > URL: https://issues.apache.org/jira/browse/YARN-10174 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Young Chen >Assignee: Young Chen >Priority: Major > > Add colored policies to enable manual load balancing across sub clusters -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-11004) Refactor Audit Logger classes to move Audit Constants to a separate class
Minni Mittal created YARN-11004: --- Summary: Refactor Audit Logger classes to move Audit Constants to a separate class Key: YARN-11004 URL: https://issues.apache.org/jira/browse/YARN-11004 Project: Hadoop YARN Issue Type: Task Reporter: Minni Mittal -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-11004) Refactor Audit Logger classes to move Audit Constants to a separate class
[ https://issues.apache.org/jira/browse/YARN-11004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-11004: --- Assignee: Minni Mittal > Refactor Audit Logger classes to move Audit Constants to a separate class > - > > Key: YARN-11004 > URL: https://issues.apache.org/jira/browse/YARN-11004 > Project: Hadoop YARN > Issue Type: Task >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-9853) Add number of paused containers in NodeInfo page.
[ https://issues.apache.org/jira/browse/YARN-9853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-9853: -- Assignee: Minni Mittal (was: Abhishek Modi) > Add number of paused containers in NodeInfo page. > - > > Key: YARN-9853 > URL: https://issues.apache.org/jira/browse/YARN-9853 > Project: Hadoop YARN > Issue Type: Task >Reporter: Abhishek Modi >Assignee: Minni Mittal >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10822) Containers going from New to Scheduled transition for killed container on recovery
[ https://issues.apache.org/jira/browse/YARN-10822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10822: Summary: Containers going from New to Scheduled transition for killed container on recovery (was: Containers going from New to Scheduled transition even though container is killed before NM restart when NM recovery is enabled) > Containers going from New to Scheduled transition for killed container on > recovery > -- > > Key: YARN-10822 > URL: https://issues.apache.org/jira/browse/YARN-10822 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10822.v1.patch > > > INFO [91] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from NEW to > LOCALIZING > INFO [91] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from LOCALIZING to > SCHEDULED > INFO [91] ContainerScheduler: Opportunistic container > container_e1171_1623422468672_2229_01_000738 will be queued at the NM. 
> INFO [127] ContainerManagerImpl: Stopping container with container Id: > container_e1171_1623422468672_2229_01_000738 > INFO [91] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from SCHEDULED to > KILLING > INFO [91] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from KILLING to > CONTAINER_CLEANEDUP_AFTER_KILL > INFO [91] NMAuditLogger: USER=defaultcafor1stparty OPERATION=Container > Finished - Killed TARGET=ContainerImpl RESULT=SUCCESS > APPID=application_1623422468672_2229 > CONTAINERID=container_e1171_1623422468672_2229_01_000738 > INFO [91] ApplicationImpl: Removing > container_e1171_1623422468672_2229_01_000738 from application > application_1623422468672_2229 > INFO [91] ContainersMonitorImpl: Stopping resource-monitoring for > container_e1171_1623422468672_2229_01_000738 > INFO [163] NodeStatusUpdaterImpl: Removed completed containers from NM > context:[container_e1171_1623422468672_2229_01_000738] > NM restart happened and recovery is attempted > > INFO [1] ContainerManagerImpl: Recovering > container_e1171_1623422468672_2229_01_000738 in state QUEUED with exit code > -1000 > INFO [1] ApplicationImpl: Adding > container_e1171_1623422468672_2229_01_000738 to application > application_1623422468672_2229 > INFO [89] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from NEW to > SCHEDULED > INFO [89] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from SCHEDULED to > KILLING > INFO [89] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from KILLING to > CONTAINER_CLEANEDUP_AFTER_KILL > Ideally, when container got killed before restart, it should finish the > container immediately. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10474) [JDK 12] TestAsyncDispatcher fails
[jira] [Updated] (YARN-10999) Make NodeQueueLoadMonitor pluggable in ResourceManager
[ https://issues.apache.org/jira/browse/YARN-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10999: Description: Add support to make NodeQueueLoadMonitor as a pluggable service in Resource Manager. (was: Add support to make NodeQueueLoadMonitor as a pluggable interface in NodeManager. The default implementation NodeQueueLoadMonitorImpl should be used if the class is not set. ) > Make NodeQueueLoadMonitor pluggable in ResourceManager > -- > > Key: YARN-10999 > URL: https://issues.apache.org/jira/browse/YARN-10999 > Project: Hadoop YARN > Issue Type: Task >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > > Add support to make NodeQueueLoadMonitor as a pluggable service in Resource > Manager. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10999) Make NodeQueueLoadMonitor pluggable in ResourceManager
[ https://issues.apache.org/jira/browse/YARN-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10999: Description: Add support to make NodeQueueLoadMonitor as a pluggable interface in NodeManager. The default implementation NodeQueueLoadMonitorImpl should be used if the class is not set. (was: Add support to NodeHealthCheckerService inorder to make it as a pluggable interface in NodeManager. The default implementation NodeHealthCheckerServiceImpl should be used if the class is not set. ) > Make NodeQueueLoadMonitor pluggable in ResourceManager > -- > > Key: YARN-10999 > URL: https://issues.apache.org/jira/browse/YARN-10999 > Project: Hadoop YARN > Issue Type: Task >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > > Add support to make NodeQueueLoadMonitor as a pluggable interface in > NodeManager. The default implementation NodeQueueLoadMonitorImpl should be > used if the class is not set. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10999) Make NodeQueueLoadMonitor pluggable in ResourceManager
[ https://issues.apache.org/jira/browse/YARN-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10999: Description: Add support to NodeHealthCheckerService in order to make it a pluggable interface in NodeManager. The default implementation NodeHealthCheckerServiceImpl should be used if the class is not set. > Make NodeQueueLoadMonitor pluggable in ResourceManager > -- > > Key: YARN-10999 > URL: https://issues.apache.org/jira/browse/YARN-10999 > Project: Hadoop YARN > Issue Type: Task >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > > Add support to NodeHealthCheckerService in order to make it a pluggable > interface in NodeManager. The default implementation > NodeHealthCheckerServiceImpl should be used if the class is not set.
[jira] [Created] (YARN-10999) Make NodeQueueLoadMonitor pluggable in ResourceManager
Minni Mittal created YARN-10999: --- Summary: Make NodeQueueLoadMonitor pluggable in ResourceManager Key: YARN-10999 URL: https://issues.apache.org/jira/browse/YARN-10999 Project: Hadoop YARN Issue Type: Task Reporter: Minni Mittal Assignee: Minni Mittal -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10998) Add YARN_ROUTER_HEAPSIZE to yarn-env for routers
[ https://issues.apache.org/jira/browse/YARN-10998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10998: Summary: Add YARN_ROUTER_HEAPSIZE to yarn-env for routers (was: Add YARN_ROUTER_HEAPSIZE to yarn-env variables) > Add YARN_ROUTER_HEAPSIZE to yarn-env for routers > > > Key: YARN-10998 > URL: https://issues.apache.org/jira/browse/YARN-10998 > Project: Hadoop YARN > Issue Type: Task >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Minor > > Yarn services NM, RM etc have YARN_\{SERVICENAME}_HEAPSIZE variable defined, > we should have similar parameter for Router Service also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10998) Add YARN_ROUTER_HEAPSIZE to yarn-env variables
[ https://issues.apache.org/jira/browse/YARN-10998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10998: Summary: Add YARN_ROUTER_HEAPSIZE to yarn-env variables (was: Adding YARN_ROUTER_HEAPSIZE to yarn-env variables) > Add YARN_ROUTER_HEAPSIZE to yarn-env variables > -- > > Key: YARN-10998 > URL: https://issues.apache.org/jira/browse/YARN-10998 > Project: Hadoop YARN > Issue Type: Task >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Minor > > Yarn services NM, RM etc have YARN_\{SERVICENAME}_HEAPSIZE variable defined, > we should have similar parameter for Router Service also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10998) Adding YARN_ROUTER_HEAPSIZE to yarn-env variables
Minni Mittal created YARN-10998: --- Summary: Adding YARN_ROUTER_HEAPSIZE to yarn-env variables Key: YARN-10998 URL: https://issues.apache.org/jira/browse/YARN-10998 Project: Hadoop YARN Issue Type: Task Reporter: Minni Mittal Assignee: Minni Mittal Yarn services NM, RM etc have YARN_\{SERVICENAME}_HEAPSIZE variable defined, we should have similar parameter for Router Service also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Moved] (YARN-10956) Add OpenTelemetry instrumentation code into YARN
[ https://issues.apache.org/jira/browse/YARN-10956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal moved HADOOP-17911 to YARN-10956: -- Key: YARN-10956 (was: HADOOP-17911) Project: Hadoop YARN (was: Hadoop Common) > Add OpenTelemetry instrumentation code into YARN > - > > Key: YARN-10956 > URL: https://issues.apache.org/jira/browse/YARN-10956 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10848) Vcore allocation problem with DefaultResourceCalculator
[ https://issues.apache.org/jira/browse/YARN-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388909#comment-17388909 ] Minni Mittal commented on YARN-10848: - Got it. The check for whether a container fits in should depend only on the available and requested resources (the way it is done for FairScheduler), not on the resource calculator. [~pbacsko], I've added the PR. Can you please review the patch? > Vcore allocation problem with DefaultResourceCalculator > --- > > Key: YARN-10848 > URL: https://issues.apache.org/jira/browse/YARN-10848 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Reporter: Peter Bacsko >Assignee: Minni Mittal >Priority: Major > Labels: pull-request-available > Attachments: TestTooManyContainers.java > > Time Spent: 10m > Remaining Estimate: 0h > > If we use DefaultResourceCalculator, then Capacity Scheduler keeps allocating > containers even if we run out of vcores. > CS checks the available resources at two places. The first check is > {{CapacityScheduler.allocateContainerOnSingleNode()}}: > {noformat} > if (calculator.computeAvailableContainers(Resources > .add(node.getUnallocatedResource(), > node.getTotalKillableResources()), > minimumAllocation) <= 0) { > LOG.debug("This node " + node.getNodeID() + " doesn't have sufficient " > + "available or preemptible resource for minimum allocation"); > {noformat} > The second, which is more important, is located in > {{RegularContainerAllocator.assignContainer()}}: > {noformat} > if (!Resources.fitsIn(rc, capability, totalResource)) { > LOG.warn("Node : " + node.getNodeID() > + " does not have sufficient resource for ask : " + pendingAsk > + " node total capability : " + node.getTotalResource()); > // Skip this locality request > ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation( > activitiesManager, node, application, schedulerKey, > ActivityDiagnosticConstant. 
> NODE_TOTAL_RESOURCE_INSUFFICIENT_FOR_REQUEST > + getResourceDiagnostics(capability, totalResource), > ActivityLevel.NODE); > return ContainerAllocation.LOCALITY_SKIPPED; > } > {noformat} > Here, {{rc}} is the resource calculator instance, the other two values are: > {noformat} > Resource capability = pendingAsk.getPerAllocationResource(); > Resource available = node.getUnallocatedResource(); > {noformat} > There is a repro unit test attached to this case, which can demonstrate the > problem. The root cause is that we pass the resource calculator to > {{Resource.fitsIn()}}. Instead, we should use an overridden version, just > like in {{FSAppAttempt.assignContainer()}}: > {noformat} >// Can we allocate a container on this node? > if (Resources.fitsIn(capability, available)) { > // Inform the application of the new container for this request > RMContainer allocatedContainer = > allocate(type, node, schedulerKey, pendingAsk, > reservedContainer); > {noformat} > In CS, if we switch to DominantResourceCalculator OR use > {{Resources.fitsIn()}} without the calculator in > {{RegularContainerAllocator.assignContainer()}}, that fixes the failing unit > test (see {{testTooManyContainers()}} in {{TestTooManyContainers.java}}).
[jira] [Comment Edited] (YARN-10848) Vcore allocation problem with DefaultResourceCalculator
[ https://issues.apache.org/jira/browse/YARN-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388046#comment-17388046 ] Minni Mittal edited comment on YARN-10848 at 7/27/21, 1:10 PM: --- [~pbacsko], As per my understanding, DefaultResourceCalculator considers memory as the limiting resource. {code:java} private static final Set INSUFFICIENT_RESOURCE_NAME = ImmutableSet.of(ResourceInformation.MEMORY_URI); {code} As such, it will keep on allocating containers till we have memory available irrespective of the availability of the vcores. In the test "TestTooManyContainers" you added, if we increase numRequestedContainers to 13, then it will allocate 11 containers and then will have {code:java} This node 127.0.0.1:1234 doesn't have sufficient available or preemptible resource for minimum allocation {code} This looks like expected behavior to me. Please help me with understanding the issue. was (Author: minni31): [~pbacsko], As per my understanding, DefaultResourceCalculator considers memory as the limiting resource. {code:java} private static final Set INSUFFICIENT_RESOURCE_NAME = ImmutableSet.of(ResourceInformation.MEMORY_URI); {code} As such, it will keep on allocating containers till we have memory available irrespective of the availability of the vcores. In the test "TestTooManyContainers" ypu added, if we increase numRequestedContainers to 13, then it will allocate 11 containers and then will have {code:java} This node 127.0.0.1:1234 doesn't have sufficient available or preemptible resource for minimum allocation {code} This looks like expected behavior to me. Please help me with understanding the issue. 
> Vcore allocation problem with DefaultResourceCalculator > --- > > Key: YARN-10848 > URL: https://issues.apache.org/jira/browse/YARN-10848 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Reporter: Peter Bacsko >Assignee: Minni Mittal >Priority: Major > Attachments: TestTooManyContainers.java > > > If we use DefaultResourceCalculator, then Capacity Scheduler keeps allocating > containers even if we run out of vcores. > CS checks the the available resources at two places. The first check is > {{CapacityScheduler.allocateContainerOnSingleNode()}}: > {noformat} > if (calculator.computeAvailableContainers(Resources > .add(node.getUnallocatedResource(), > node.getTotalKillableResources()), > minimumAllocation) <= 0) { > LOG.debug("This node " + node.getNodeID() + " doesn't have sufficient " > + "available or preemptible resource for minimum allocation"); > {noformat} > The second, which is more important, is located in > {{RegularContainerAllocator.assignContainer()}}: > {noformat} > if (!Resources.fitsIn(rc, capability, totalResource)) { > LOG.warn("Node : " + node.getNodeID() > + " does not have sufficient resource for ask : " + pendingAsk > + " node total capability : " + node.getTotalResource()); > // Skip this locality request > ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation( > activitiesManager, node, application, schedulerKey, > ActivityDiagnosticConstant. > NODE_TOTAL_RESOURCE_INSUFFICIENT_FOR_REQUEST > + getResourceDiagnostics(capability, totalResource), > ActivityLevel.NODE); > return ContainerAllocation.LOCALITY_SKIPPED; > } > {noformat} > Here, {{rc}} is the resource calculator instance, the other two values are: > {noformat} > Resource capability = pendingAsk.getPerAllocationResource(); > Resource available = node.getUnallocatedResource(); > {noformat} > There is a repro unit test attatched to this case, which can demonstrate the > problem. 
The root cause is that we pass the resource calculator to > {{Resource.fitsIn()}}. Instead, we should use an overridden version, just > like in {{FSAppAttempt.assignContainer()}}: > {noformat} >// Can we allocate a container on this node? > if (Resources.fitsIn(capability, available)) { > // Inform the application of the new container for this request > RMContainer allocatedContainer = > allocate(type, node, schedulerKey, pendingAsk, > reservedContainer); > {noformat} > In CS, if we switch to DominantResourceCalculator OR use > {{Resources.fitsIn()}} without the calculator in > {{RegularContainerAllocator.assignContainer()}}, that fixes the failing unit > test (see {{testTooManyContainers()}} in {{TestTooManyContainers.java}}). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-i
[jira] [Commented] (YARN-10848) Vcore allocation problem with DefaultResourceCalculator
[ https://issues.apache.org/jira/browse/YARN-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388046#comment-17388046 ] Minni Mittal commented on YARN-10848: - [~pbacsko], As per my understanding, DefaultResourceCalculator considers memory as the limiting resource. {code:java} private static final Set INSUFFICIENT_RESOURCE_NAME = ImmutableSet.of(ResourceInformation.MEMORY_URI); {code} As such, it will keep on allocating containers till we have memory available irrespective of the availability of the vcores. In the test "TestTooManyContainers" you added, if we increase numRequestedContainers to 13, then it will allocate 11 containers and then will have {code:java} This node 127.0.0.1:1234 doesn't have sufficient available or preemptible resource for minimum allocation {code} This looks like expected behavior to me. Please help me with understanding the issue. > Vcore allocation problem with DefaultResourceCalculator > --- > > Key: YARN-10848 > URL: https://issues.apache.org/jira/browse/YARN-10848 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Reporter: Peter Bacsko >Assignee: Minni Mittal >Priority: Major > Attachments: TestTooManyContainers.java > > > If we use DefaultResourceCalculator, then Capacity Scheduler keeps allocating > containers even if we run out of vcores. > CS checks the available resources at two places. 
The first check is > {{CapacityScheduler.allocateContainerOnSingleNode()}}: > {noformat} > if (calculator.computeAvailableContainers(Resources > .add(node.getUnallocatedResource(), > node.getTotalKillableResources()), > minimumAllocation) <= 0) { > LOG.debug("This node " + node.getNodeID() + " doesn't have sufficient " > + "available or preemptible resource for minimum allocation"); > {noformat} > The second, which is more important, is located in > {{RegularContainerAllocator.assignContainer()}}: > {noformat} > if (!Resources.fitsIn(rc, capability, totalResource)) { > LOG.warn("Node : " + node.getNodeID() > + " does not have sufficient resource for ask : " + pendingAsk > + " node total capability : " + node.getTotalResource()); > // Skip this locality request > ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation( > activitiesManager, node, application, schedulerKey, > ActivityDiagnosticConstant. > NODE_TOTAL_RESOURCE_INSUFFICIENT_FOR_REQUEST > + getResourceDiagnostics(capability, totalResource), > ActivityLevel.NODE); > return ContainerAllocation.LOCALITY_SKIPPED; > } > {noformat} > Here, {{rc}} is the resource calculator instance, the other two values are: > {noformat} > Resource capability = pendingAsk.getPerAllocationResource(); > Resource available = node.getUnallocatedResource(); > {noformat} > There is a repro unit test attached to this case, which can demonstrate the > problem. The root cause is that we pass the resource calculator to > {{Resource.fitsIn()}}. Instead, we should use an overridden version, just > like in {{FSAppAttempt.assignContainer()}}: > {noformat} >// Can we allocate a container on this node? 
> if (Resources.fitsIn(capability, available)) { > // Inform the application of the new container for this request > RMContainer allocatedContainer = > allocate(type, node, schedulerKey, pendingAsk, > reservedContainer); > {noformat} > In CS, if we switch to DominantResourceCalculator OR use > {{Resources.fitsIn()}} without the calculator in > {{RegularContainerAllocator.assignContainer()}}, that fixes the failing unit > test (see {{testTooManyContainers()}} in {{TestTooManyContainers.java}}). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9853) Add number of paused containers in NodeInfo page.
[ https://issues.apache.org/jira/browse/YARN-9853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377908#comment-17377908 ] Minni Mittal commented on YARN-9853: Hey [~abmodi], Can I take up this Jira ? > Add number of paused containers in NodeInfo page. > - > > Key: YARN-9853 > URL: https://issues.apache.org/jira/browse/YARN-9853 > Project: Hadoop YARN > Issue Type: Task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10848) Vcore allocation problem with DefaultResourceCalculator
[ https://issues.apache.org/jira/browse/YARN-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377412#comment-17377412 ] Minni Mittal commented on YARN-10848: - Hey [~pbacsko], Can I take up this Jira if you are not working on this? > Vcore allocation problem with DefaultResourceCalculator > --- > > Key: YARN-10848 > URL: https://issues.apache.org/jira/browse/YARN-10848 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, capacityscheduler >Reporter: Peter Bacsko >Priority: Major > Attachments: TestTooManyContainers.java > > > If we use DefaultResourceCalculator, then Capacity Scheduler keeps allocating > containers even if we run out of vcores. > CS checks the available resources at two places. The first check is > {{CapacityScheduler.allocateContainerOnSingleNode()}}: > {noformat} > if (calculator.computeAvailableContainers(Resources > .add(node.getUnallocatedResource(), > node.getTotalKillableResources()), > minimumAllocation) <= 0) { > LOG.debug("This node " + node.getNodeID() + " doesn't have sufficient " > + "available or preemptible resource for minimum allocation"); > {noformat} > The second, which is more important, is located in > {{RegularContainerAllocator.assignContainer()}}: > {noformat} > if (!Resources.fitsIn(rc, capability, totalResource)) { > LOG.warn("Node : " + node.getNodeID() > + " does not have sufficient resource for ask : " + pendingAsk > + " node total capability : " + node.getTotalResource()); > // Skip this locality request > ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation( > activitiesManager, node, application, schedulerKey, > ActivityDiagnosticConstant. 
> NODE_TOTAL_RESOURCE_INSUFFICIENT_FOR_REQUEST > + getResourceDiagnostics(capability, totalResource), > ActivityLevel.NODE); > return ContainerAllocation.LOCALITY_SKIPPED; > } > {noformat} > Here, {{rc}} is the resource calculator instance, the other two values are: > {noformat} > Resource capability = pendingAsk.getPerAllocationResource(); > Resource available = node.getUnallocatedResource(); > {noformat} > There is a repro unit test attached to this case, which can demonstrate the > problem. The root cause is that we pass the resource calculator to > {{Resource.fitsIn()}}. Instead, we should use an overridden version, just > like in {{FSAppAttempt.assignContainer()}}: > {noformat} >// Can we allocate a container on this node? > if (Resources.fitsIn(capability, available)) { > // Inform the application of the new container for this request > RMContainer allocatedContainer = > allocate(type, node, schedulerKey, pendingAsk, > reservedContainer); > {noformat} > In CS, if we switch to DominantResourceCalculator OR use > {{Resources.fitsIn()}} without the calculator in > {{RegularContainerAllocator.assignContainer()}}, that fixes the failing unit > test (see {{testTooManyContainers()}} in {{TestTooManyContainers.java}}).
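The memory-only versus all-dimensions check discussed in this thread can be illustrated with a simplified model. The `Resource` class below is a hypothetical stand-in for YARN's resource records, not the real API; it only shows why a memory-limited fits-in check keeps admitting containers after vcores are exhausted.

```java
class FitsInDemo {
    // Simplified stand-in for a YARN Resource: just the two dimensions
    // discussed in this issue.
    static final class Resource {
        final long memoryMb;
        final int vcores;
        Resource(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    // Approximates DefaultResourceCalculator's view: memory is the only
    // resource that can be "insufficient", so vcores are never checked.
    static boolean fitsInMemoryOnly(Resource ask, Resource available) {
        return ask.memoryMb <= available.memoryMb;
    }

    // Approximates the calculator-free Resources.fitsIn(): every dimension
    // of the ask must fit into what is available.
    static boolean fitsInAllDimensions(Resource ask, Resource available) {
        return ask.memoryMb <= available.memoryMb
            && ask.vcores <= available.vcores;
    }
}
```

With available = (8192 MB, 0 vcores) and ask = (1024 MB, 1 vcore), the memory-only check still passes while the dimension-wise check rejects the ask, which matches the overallocation seen in the repro test.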
[jira] [Updated] (YARN-10459) containerLaunchedOnNode method not need to hold schedulerApptemt lock
[ https://issues.apache.org/jira/browse/YARN-10459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10459: Attachment: YARN-10459.v1.patch > containerLaunchedOnNode method not need to hold schedulerApptemt lock > -- > > Key: YARN-10459 > URL: https://issues.apache.org/jira/browse/YARN-10459 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.2.0, 3.1.3 >Reporter: Ryan Wu >Assignee: Minni Mittal >Priority: Major > Fix For: 3.2.1 > > Attachments: YARN-10459.v1.patch > > > > Now, the containerLaunchedOnNode method holds the SchedulerApplicationAttempt > write lock, but looking at the method, it does not change any field. More > seriously, this affects the scheduler. > {code:java} > public void containerLaunchedOnNode(ContainerId containerId, NodeId nodeId) > { > // Inform the container > writeLock.lock(); > try { > RMContainer rmContainer = getRMContainer(containerId); > if (rmContainer == null) { > // Some unknown container sneaked into the system. Kill it. > rmContext.getDispatcher().getEventHandler().handle( new > RMNodeCleanContainerEvent(nodeId, containerId)); return; > } > rmContainer.handle( new RMContainerEvent(containerId, > RMContainerEventType.LAUNCHED)); > } finally { > writeLock.unlock(); > } > } > {code} >
[jira] [Assigned] (YARN-10459) containerLaunchedOnNode method not need to hold schedulerApptemt lock
[ https://issues.apache.org/jira/browse/YARN-10459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-10459: --- Assignee: Minni Mittal > containerLaunchedOnNode method not need to hold schedulerApptemt lock > -- > > Key: YARN-10459 > URL: https://issues.apache.org/jira/browse/YARN-10459 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.2.0, 3.1.3 >Reporter: Ryan Wu >Assignee: Minni Mittal >Priority: Major > Fix For: 3.2.1 > > > > Now, the containerLaunchedOnNode method holds the SchedulerApplicationAttempt > write lock, but looking at the method, it does not change any field. More > seriously, this will affect the scheduler. > {code:java} > public void containerLaunchedOnNode(ContainerId containerId, NodeId nodeId) > { > // Inform the container > writeLock.lock(); > try { > RMContainer rmContainer = getRMContainer(containerId); > if (rmContainer == null) { > // Some unknown container sneaked into the system. Kill it. > rmContext.getDispatcher().getEventHandler().handle(new > RMNodeCleanContainerEvent(nodeId, containerId)); > return; > } > rmContainer.handle(new RMContainerEvent(containerId, > RMContainerEventType.LAUNCHED)); > } finally { > writeLock.unlock(); > } > } > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
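The improvement being proposed can be sketched outside of Hadoop: a lookup-and-dispatch method that mutates no fields can take the read lock instead of the write lock, so it no longer serializes against the scheduler threads. The class below is an illustrative stand-in, not the real SchedulerApplicationAttempt.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: read-only notification path under the read lock; mutations keep the write lock.
public class LaunchNotifyDemo {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, String> liveContainers = new HashMap<>();

    // Mutations still take the write lock.
    public void addContainer(String containerId, String state) {
        lock.writeLock().lock();
        try { liveContainers.put(containerId, state); }
        finally { lock.writeLock().unlock(); }
    }

    // Mirrors containerLaunchedOnNode: it only reads state and dispatches an
    // event, so the read lock is sufficient.
    public String containerLaunchedOnNode(String containerId) {
        lock.readLock().lock();
        try {
            String container = liveContainers.get(containerId);
            // Unknown container sneaked into the system -> ask the node to clean it up.
            return (container == null) ? "CLEANUP_UNKNOWN_CONTAINER" : "LAUNCHED";
        } finally { lock.readLock().unlock(); }
    }

    public static void main(String[] args) {
        LaunchNotifyDemo attempt = new LaunchNotifyDemo();
        attempt.addContainer("container_1", "ACQUIRED");
        System.out.println(attempt.containerLaunchedOnNode("container_1")); // LAUNCHED
        System.out.println(attempt.containerLaunchedOnNode("container_2")); // CLEANUP_UNKNOWN_CONTAINER
    }
}
```

Multiple readers can hold the read lock concurrently, which is exactly the contention relief the issue is after.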
[jira] [Updated] (YARN-10841) Fix token reset synchronization by making sure for UAM response token reset is done while in lock.
[ https://issues.apache.org/jira/browse/YARN-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10841: Attachment: YARN-10841.v1.patch > Fix token reset synchronization by making sure for UAM response token reset > is done while in lock. > --- > > Key: YARN-10841 > URL: https://issues.apache.org/jira/browse/YARN-10841 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Minor > Attachments: YARN-10841.v1.patch > > > *2021-06-24T10:11:39,465* [ERROR] [AMRM Heartbeater thread] > |impl.AMRMClientAsyncImpl|: Exception on heartbeat > org.apache.hadoop.yarn.exceptions.YarnException: > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: amrmToken from UAM > cluster-0 should be null here > at > org.apache.hadoop.yarn.server.nodemanager.amrmproxy.FederationInterceptor.allocate(FederationInterceptor.java:782) > > > *2021-06-24T10:10:12,608* INFO [616916] FederationInterceptor: Received new > UAM amrmToken with keyId 843616604 > The heartbeat callback sets the token to null, but because of a > synchronization issue this happened after mergeAllocate was called. The token > reset should therefore happen while the allocate merge is running, i.e. > inside the lock. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10841) Fix token reset synchronization by making sure for UAM response token reset is done while in lock.
[ https://issues.apache.org/jira/browse/YARN-10841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10841: Description: *2021-06-24T10:11:39,465* [ERROR] [AMRM Heartbeater thread] |impl.AMRMClientAsyncImpl|: Exception on heartbeat org.apache.hadoop.yarn.exceptions.YarnException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: amrmToken from UAM cluster-0 should be null here at org.apache.hadoop.yarn.server.nodemanager.amrmproxy.FederationInterceptor.allocate(FederationInterceptor.java:782) *2021-06-24T10:10:12,608* INFO [616916] FederationInterceptor: Received new UAM amrmToken with keyId 843616604 The heartbeat callback sets the token to null, but because of a synchronization issue this happened after mergeAllocate was called. The token reset should therefore happen while the allocate merge is running, i.e. inside the lock. > Fix token reset synchronization by making sure for UAM response token reset > is done while in lock. > --- > > Key: YARN-10841 > URL: https://issues.apache.org/jira/browse/YARN-10841 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Minor > > *2021-06-24T10:11:39,465* [ERROR] [AMRM Heartbeater thread] > |impl.AMRMClientAsyncImpl|: Exception on heartbeat > org.apache.hadoop.yarn.exceptions.YarnException: > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: amrmToken from UAM > cluster-0 should be null here > at > org.apache.hadoop.yarn.server.nodemanager.amrmproxy.FederationInterceptor.allocate(FederationInterceptor.java:782) > > > *2021-06-24T10:10:12,608* INFO [616916] FederationInterceptor: Received new > UAM amrmToken with keyId 843616604 > The heartbeat callback sets the token to null, but because of a > synchronization issue this happened after mergeAllocate was called. The token > reset should therefore happen while the allocate merge is running, i.e. > inside the lock. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
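The race and its fix can be sketched with a simplified stand-in for FederationInterceptor (the class and field names below are illustrative, not the real ones): the response's token must be consumed and reset under the same lock that performs the allocate merge, so the merge precondition "amrmToken from UAM should be null here" can never be violated by interleaving.

```java
// Sketch: consume-and-reset of the UAM token atomically with the merge.
public class UamTokenDemo {
    public static final class AllocateResponse {
        public String amrmToken;   // non-null only when the UAM issued a new token
        public int containers;
        public AllocateResponse(String token, int containers) {
            this.amrmToken = token; this.containers = containers;
        }
    }

    private final Object mergeLock = new Object();
    private String latestToken;
    private int mergedContainers;

    // Consume the response's token AND merge while holding the same lock, so
    // mergeAllocate can never observe a response whose token was not yet reset.
    public void processResponse(AllocateResponse response) {
        synchronized (mergeLock) {
            if (response.amrmToken != null) {
                latestToken = response.amrmToken;
                response.amrmToken = null;   // reset inside the lock, before merging
            }
            mergeAllocate(response);
        }
    }

    private void mergeAllocate(AllocateResponse response) {
        // Mirrors the precondition that failed in the log above.
        if (response.amrmToken != null) {
            throw new IllegalStateException("amrmToken from UAM should be null here");
        }
        mergedContainers += response.containers;
    }

    public String getLatestToken() { synchronized (mergeLock) { return latestToken; } }
    public int getMergedContainers() { synchronized (mergeLock) { return mergedContainers; } }

    public static void main(String[] args) {
        UamTokenDemo interceptor = new UamTokenDemo();
        interceptor.processResponse(new AllocateResponse("token-843616604", 3));
        interceptor.processResponse(new AllocateResponse(null, 2));
        System.out.println(interceptor.getMergedContainers()); // 5
        System.out.println(interceptor.getLatestToken());      // token-843616604
    }
}
```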
[jira] [Created] (YARN-10841) Fix token reset synchronization by making sure for UAM response token reset is done while in lock.
Minni Mittal created YARN-10841: --- Summary: Fix token reset synchronization by making sure for UAM response token reset is done while in lock. Key: YARN-10841 URL: https://issues.apache.org/jira/browse/YARN-10841 Project: Hadoop YARN Issue Type: Bug Reporter: Minni Mittal Assignee: Minni Mittal -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10822) Containers going from New to Scheduled transition even though container is killed before NM restart when NM recovery is enabled
[ https://issues.apache.org/jira/browse/YARN-10822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10822: Attachment: YARN-10822.v1.patch > Containers going from New to Scheduled transition even though container is > killed before NM restart when NM recovery is enabled > --- > > Key: YARN-10822 > URL: https://issues.apache.org/jira/browse/YARN-10822 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10822.v1.patch > > > INFO [91] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from NEW to > LOCALIZING > INFO [91] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from LOCALIZING to > SCHEDULED > INFO [91] ContainerScheduler: Opportunistic container > container_e1171_1623422468672_2229_01_000738 will be queued at the NM. > INFO [127] ContainerManagerImpl: Stopping container with container Id: > container_e1171_1623422468672_2229_01_000738 > INFO [91] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from SCHEDULED to > KILLING > INFO [91] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from KILLING to > CONTAINER_CLEANEDUP_AFTER_KILL > INFO [91] NMAuditLogger: USER=defaultcafor1stparty OPERATION=Container > Finished - Killed TARGET=ContainerImpl RESULT=SUCCESS > APPID=application_1623422468672_2229 > CONTAINERID=container_e1171_1623422468672_2229_01_000738 > INFO [91] ApplicationImpl: Removing > container_e1171_1623422468672_2229_01_000738 from application > application_1623422468672_2229 > INFO [91] ContainersMonitorImpl: Stopping resource-monitoring for > container_e1171_1623422468672_2229_01_000738 > INFO [163] NodeStatusUpdaterImpl: Removed completed containers from NM > context:[container_e1171_1623422468672_2229_01_000738] > NM restart happened and recovery is attempted > > INFO [1] 
ContainerManagerImpl: Recovering > container_e1171_1623422468672_2229_01_000738 in state QUEUED with exit code > -1000 > INFO [1] ApplicationImpl: Adding > container_e1171_1623422468672_2229_01_000738 to application > application_1623422468672_2229 > INFO [89] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from NEW to > SCHEDULED > INFO [89] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from SCHEDULED to > KILLING > INFO [89] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from KILLING to > CONTAINER_CLEANEDUP_AFTER_KILL > Ideally, when the container was killed before the restart, recovery should > finish the container immediately. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
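The intended recovery behavior can be sketched as a small state decision (an illustrative model, not the real ContainerManagerImpl recovery code): a container whose recovered state says it was already killed or completed should be finished immediately, rather than replaying NEW → SCHEDULED → KILLING for a container that no longer exists.

```java
// Sketch: short-circuit recovery for containers that finished before the NM restart.
public class RecoveryDemo {
    public enum RecoveredStatus { LAUNCHED, QUEUED, KILLED, COMPLETED }

    public static String recover(RecoveredStatus status) {
        switch (status) {
            case KILLED:
            case COMPLETED:
                return "DONE";        // finish right away, no re-queueing or re-kill
            case QUEUED:
                return "SCHEDULED";   // genuinely queued work is re-queued
            default:
                return "RUNNING";     // a launched container resumes monitoring
        }
    }

    public static void main(String[] args) {
        System.out.println(recover(RecoveredStatus.KILLED)); // DONE
        System.out.println(recover(RecoveredStatus.QUEUED)); // SCHEDULED
    }
}
```

In the log above the container was recovered as QUEUED even though it had been killed, which is why it marched through SCHEDULED and KILLING again; persisting and honoring the kill is the fix being proposed.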
[jira] [Updated] (YARN-10822) Containers going from New to Scheduled transition even though container is killed before NM restart when NM recovery is enabled
[ https://issues.apache.org/jira/browse/YARN-10822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10822: Description: INFO [91] ContainerImpl: Container container_e1171_1623422468672_2229_01_000738 transitioned from NEW to LOCALIZING INFO [91] ContainerImpl: Container container_e1171_1623422468672_2229_01_000738 transitioned from LOCALIZING to SCHEDULED INFO [91] ContainerScheduler: Opportunistic container container_e1171_1623422468672_2229_01_000738 will be queued at the NM. INFO [127] ContainerManagerImpl: Stopping container with container Id: container_e1171_1623422468672_2229_01_000738 INFO [91] ContainerImpl: Container container_e1171_1623422468672_2229_01_000738 transitioned from SCHEDULED to KILLING INFO [91] ContainerImpl: Container container_e1171_1623422468672_2229_01_000738 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL INFO [91] NMAuditLogger: USER=defaultcafor1stparty OPERATION=Container Finished - Killed TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1623422468672_2229 CONTAINERID=container_e1171_1623422468672_2229_01_000738 INFO [91] ApplicationImpl: Removing container_e1171_1623422468672_2229_01_000738 from application application_1623422468672_2229 INFO [91] ContainersMonitorImpl: Stopping resource-monitoring for container_e1171_1623422468672_2229_01_000738 INFO [163] NodeStatusUpdaterImpl: Removed completed containers from NM context:[container_e1171_1623422468672_2229_01_000738] NM restart happened and recovery is attempted INFO [1] ContainerManagerImpl: Recovering container_e1171_1623422468672_2229_01_000738 in state QUEUED with exit code -1000 INFO [1] ApplicationImpl: Adding container_e1171_1623422468672_2229_01_000738 to application application_1623422468672_2229 INFO [89] ContainerImpl: Container container_e1171_1623422468672_2229_01_000738 transitioned from NEW to SCHEDULED INFO [89] ContainerImpl: Container container_e1171_1623422468672_2229_01_000738 transitioned from 
SCHEDULED to KILLING INFO [89] ContainerImpl: Container container_e1171_1623422468672_2229_01_000738 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL Ideally, when container got killed before restart, it should finish the container immediately. > Containers going from New to Scheduled transition even though container is > killed before NM restart when NM recovery is enabled > --- > > Key: YARN-10822 > URL: https://issues.apache.org/jira/browse/YARN-10822 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > > INFO [91] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from NEW to > LOCALIZING > INFO [91] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from LOCALIZING to > SCHEDULED > INFO [91] ContainerScheduler: Opportunistic container > container_e1171_1623422468672_2229_01_000738 will be queued at the NM. > INFO [127] ContainerManagerImpl: Stopping container with container Id: > container_e1171_1623422468672_2229_01_000738 > INFO [91] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from SCHEDULED to > KILLING > INFO [91] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from KILLING to > CONTAINER_CLEANEDUP_AFTER_KILL > INFO [91] NMAuditLogger: USER=defaultcafor1stparty OPERATION=Container > Finished - Killed TARGET=ContainerImpl RESULT=SUCCESS > APPID=application_1623422468672_2229 > CONTAINERID=container_e1171_1623422468672_2229_01_000738 > INFO [91] ApplicationImpl: Removing > container_e1171_1623422468672_2229_01_000738 from application > application_1623422468672_2229 > INFO [91] ContainersMonitorImpl: Stopping resource-monitoring for > container_e1171_1623422468672_2229_01_000738 > INFO [163] NodeStatusUpdaterImpl: Removed completed containers from NM > context:[container_e1171_1623422468672_2229_01_000738] > NM restart happened and recovery is 
attempted > > INFO [1] ContainerManagerImpl: Recovering > container_e1171_1623422468672_2229_01_000738 in state QUEUED with exit code > -1000 > INFO [1] ApplicationImpl: Adding > container_e1171_1623422468672_2229_01_000738 to application > application_1623422468672_2229 > INFO [89] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from NEW to > SCHEDULED > INFO [89] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from SCHEDULED to > KILLING > INFO [89] ContainerImpl: Container > container_e1171_1623422468672_2229_01_000738 transitioned from KILLING to > CONTAINER_
[jira] [Created] (YARN-10822) Containers going to New to Scheduled transition even though container is killed before NM restart when NM recovery is enabled
Minni Mittal created YARN-10822: --- Summary: Containers going to New to Scheduled transition even though container is killed before NM restart when NM recovery is enabled Key: YARN-10822 URL: https://issues.apache.org/jira/browse/YARN-10822 Project: Hadoop YARN Issue Type: Bug Reporter: Minni Mittal Assignee: Minni Mittal -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10822) Containers going from New to Scheduled transition even though container is killed before NM restart when NM recovery is enabled
[ https://issues.apache.org/jira/browse/YARN-10822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10822: Summary: Containers going from New to Scheduled transition even though container is killed before NM restart when NM recovery is enabled (was: Containers going to New to Scheduled transition even though container is killed before NM restart when NM recovery is enabled) > Containers going from New to Scheduled transition even though container is > killed before NM restart when NM recovery is enabled > --- > > Key: YARN-10822 > URL: https://issues.apache.org/jira/browse/YARN-10822 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10815) Handle Invalid event: PAUSE_CONTAINER at Container state as SCHEDULED
Minni Mittal created YARN-10815: --- Summary: Handle Invalid event: PAUSE_CONTAINER at Container state as SCHEDULED Key: YARN-10815 URL: https://issues.apache.org/jira/browse/YARN-10815 Project: Hadoop YARN Issue Type: Bug Reporter: Minni Mittal Assignee: Minni Mittal -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10815) Handle Invalid event: PAUSE_CONTAINER at Container state SCHEDULED
[ https://issues.apache.org/jira/browse/YARN-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10815: Summary: Handle Invalid event: PAUSE_CONTAINER at Container state SCHEDULED (was: Handle Invalid event: PAUSE_CONTAINER at Container state as SCHEDULED) > Handle Invalid event: PAUSE_CONTAINER at Container state SCHEDULED > -- > > Key: YARN-10815 > URL: https://issues.apache.org/jira/browse/YARN-10815 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-10478) Make RM-NM heartbeat scaling calculator pluggable
[ https://issues.apache.org/jira/browse/YARN-10478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-10478: --- Assignee: Minni Mittal > Make RM-NM heartbeat scaling calculator pluggable > - > > Key: YARN-10478 > URL: https://issues.apache.org/jira/browse/YARN-10478 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Reporter: Jim Brennan >Assignee: Minni Mittal >Priority: Minor > > [YARN-10475] adds a feature to enable scaling the interval for heartbeats > between the RM and NM based on CPU utilization. [~bibinchundatt] suggested > that we make this pluggable so that other calculations can be used if desired. > The configuration properties added in [YARN-10475] should be applicable to > any heartbeat calculator. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
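A pluggable calculator could look roughly like the sketch below. The interface and class names are hypothetical, not the ones YARN-10475/YARN-10478 actually introduce; the point is that the scaling policy becomes an implementation behind a small interface.

```java
// Sketch: heartbeat interval policy behind a pluggable interface.
public class HeartbeatDemo {
    public interface HeartbeatIntervalCalculator {
        long nextIntervalMs(float cpuUtilization);   // utilization in 0.0 .. 1.0
    }

    // One possible plug-in: scale linearly between min and max with CPU load,
    // so a busy node heartbeats less often.
    public static final class CpuScaledCalculator implements HeartbeatIntervalCalculator {
        private final long minMs, maxMs;
        public CpuScaledCalculator(long minMs, long maxMs) {
            this.minMs = minMs; this.maxMs = maxMs;
        }
        @Override public long nextIntervalMs(float cpu) {
            float clamped = Math.max(0f, Math.min(1f, cpu));
            return minMs + (long) ((maxMs - minMs) * clamped);
        }
    }

    public static void main(String[] args) {
        HeartbeatIntervalCalculator calc = new CpuScaledCalculator(1000, 3000);
        System.out.println(calc.nextIntervalMs(0.0f)); // 1000
        System.out.println(calc.nextIntervalMs(0.5f)); // 2000
        System.out.println(calc.nextIntervalMs(1.0f)); // 3000
    }
}
```

With such an interface, the min/max interval properties from YARN-10475 stay meaningful for any calculator, as suggested in the issue.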
[jira] [Commented] (YARN-2614) Cleanup synchronized method in SchedulerApplicationAttempt
[ https://issues.apache.org/jira/browse/YARN-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359397#comment-17359397 ] Minni Mittal commented on YARN-2614: Hey [~leftnoteasy], can I work on this Jira? > Cleanup synchronized method in SchedulerApplicationAttempt > -- > > Key: YARN-2614 > URL: https://issues.apache.org/jira/browse/YARN-2614 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Wangda Tan >Priority: Major > > According to discussions in YARN-2594, there are some methods in > SchedulerApplicationAttempt that will be accessed by other modules, which can > lead to a potential deadlock in the RM; we should clean them up as much as we > can. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10459) containerLaunchedOnNode method not need to hold schedulerApptemt lock
[ https://issues.apache.org/jira/browse/YARN-10459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359387#comment-17359387 ] Minni Mittal commented on YARN-10459: - Hey [~jianliang.wu], can I take up this Jira if you are not working on it? > containerLaunchedOnNode method not need to hold schedulerApptemt lock > -- > > Key: YARN-10459 > URL: https://issues.apache.org/jira/browse/YARN-10459 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.2.0, 3.1.3 >Reporter: Ryan Wu >Priority: Major > Fix For: 3.2.1 > > > > Now, the containerLaunchedOnNode method holds the SchedulerApplicationAttempt > write lock, but looking at the method, it does not change any field. More > seriously, this will affect the scheduler. > {code:java} > public void containerLaunchedOnNode(ContainerId containerId, NodeId nodeId) > { > // Inform the container > writeLock.lock(); > try { > RMContainer rmContainer = getRMContainer(containerId); > if (rmContainer == null) { > // Some unknown container sneaked into the system. Kill it. > rmContext.getDispatcher().getEventHandler().handle(new > RMNodeCleanContainerEvent(nodeId, containerId)); > return; > } > rmContainer.handle(new RMContainerEvent(containerId, > RMContainerEventType.LAUNCHED)); > } finally { > writeLock.unlock(); > } > } > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-9910) Make private localizer download resources in parallel
[ https://issues.apache.org/jira/browse/YARN-9910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-9910: -- Assignee: Minni Mittal (was: Abhishek Modi) > Make private localizer download resources in parallel > - > > Key: YARN-9910 > URL: https://issues.apache.org/jira/browse/YARN-9910 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Abhishek Modi >Assignee: Minni Mittal >Priority: Major > > Currently private localizer uses a single threaded pool to do the > localization. As part of this jira, private localizer will create a fixed > threadpool of configurable number of threads for localization. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
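The core of the change can be sketched with a fixed thread pool standing in for the localizer's download loop (illustrative only; the real ContainerLocalizer wires this into FSDownload, and the pool-size configuration key is defined by the patch, not here):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: download resources on a fixed pool instead of a single thread.
public class ParallelLocalizerDemo {
    static final int DOWNLOAD_THREADS = 4;  // would come from configuration

    public static List<String> localize(List<String> resources) {
        ExecutorService pool = Executors.newFixedThreadPool(DOWNLOAD_THREADS);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String resource : resources) {
                // Each submit stands in for one FSDownload of a resource.
                futures.add(pool.submit(() -> "localized:" + resource));
            }
            List<String> done = new ArrayList<>();
            for (Future<String> f : futures) {
                try {
                    done.add(f.get());  // collect in submission order
                } catch (InterruptedException | ExecutionException e) {
                    throw new RuntimeException(e);
                }
            }
            return done;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(localize(List.of("jobA.jar", "dict.txt")));
    }
}
```

Collecting futures in submission order keeps completion reporting deterministic even though the downloads themselves run concurrently.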
[jira] [Updated] (YARN-10683) Add total resource in NodeManager metrics
[ https://issues.apache.org/jira/browse/YARN-10683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10683: Attachment: YARN-10683.v1.patch > Add total resource in NodeManager metrics > - > > Key: YARN-10683 > URL: https://issues.apache.org/jira/browse/YARN-10683 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Minor > Attachments: YARN-10683.v1.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10518) Add metrics for custom resource types in NodeManagerMetrics
[ https://issues.apache.org/jira/browse/YARN-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10518: Description: This Jira deals with updating NodeManager metrics with custom resource types. It includes allocated, available resources. (was: This Jira deals with updating NodeManager metrics with custom resource types. It includes allocated, available and total resources.) > Add metrics for custom resource types in NodeManagerMetrics > > > Key: YARN-10518 > URL: https://issues.apache.org/jira/browse/YARN-10518 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10518.v1.patch > > > This Jira deals with updating NodeManager metrics with custom resource types. > It includes allocated, available resources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10518) Add metrics for custom resource types in NodeManagerMetrics
[ https://issues.apache.org/jira/browse/YARN-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10518: Attachment: YARN-10518.v1.patch > Add metrics for custom resource types in NodeManagerMetrics > > > Key: YARN-10518 > URL: https://issues.apache.org/jira/browse/YARN-10518 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10518.v1.patch > > > This Jira deals with updating NodeManager metrics with custom resource types. > It includes allocated, available and total resources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
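Per-resource-type gauges could be modeled as below. This is a plain-map sketch for illustration; the actual change would extend NodeManagerMetrics with Hadoop metrics2 gauges rather than a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: allocated/available tracking keyed by resource type name.
public class CustomResourceMetricsDemo {
    private final Map<String, Long> allocated = new LinkedHashMap<>();
    private final Map<String, Long> available = new LinkedHashMap<>();

    public void setAvailable(String resource, long units) {
        available.put(resource, units);
    }

    // Moves units of a custom resource (e.g. "yarn.io/gpu") from available to allocated.
    public void allocate(String resource, long units) {
        available.merge(resource, -units, Long::sum);
        allocated.merge(resource, units, Long::sum);
    }

    public long getAllocated(String resource) { return allocated.getOrDefault(resource, 0L); }
    public long getAvailable(String resource) { return available.getOrDefault(resource, 0L); }

    public static void main(String[] args) {
        CustomResourceMetricsDemo metrics = new CustomResourceMetricsDemo();
        metrics.setAvailable("yarn.io/gpu", 8);
        metrics.allocate("yarn.io/gpu", 3);
        // allocated/available for the custom type
        System.out.println(metrics.getAllocated("yarn.io/gpu") + "/" + metrics.getAvailable("yarn.io/gpu")); // 3/5
    }
}
```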
[jira] [Assigned] (YARN-10683) Add total resource in NodeManager metrics
[ https://issues.apache.org/jira/browse/YARN-10683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-10683: --- Assignee: Minni Mittal > Add total resource in NodeManager metrics > - > > Key: YARN-10683 > URL: https://issues.apache.org/jira/browse/YARN-10683 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10683) Add total resource in NodeManager metrics
Minni Mittal created YARN-10683: --- Summary: Add total resource in NodeManager metrics Key: YARN-10683 URL: https://issues.apache.org/jira/browse/YARN-10683 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Minni Mittal -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10569) Add metrics for success/failure/latency in ResourceLocalization
[ https://issues.apache.org/jira/browse/YARN-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10569: Description: This Jira deals with updating NodeManager metrics with success, failure, pending and latency stats for the ResourceLocalization service. > Add metrics for success/failure/latency in ResourceLocalization > --- > > Key: YARN-10569 > URL: https://issues.apache.org/jira/browse/YARN-10569 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > > This Jira deals with updating NodeManager metrics with success, failure, > pending and latency stats for the ResourceLocalization service. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
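A minimal sketch of the counters such a change could add (field and method names here are illustrative, not the actual NodeManagerMetrics members):

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: success/failure counters plus average localization latency.
public class LocalizationMetricsDemo {
    private final AtomicLong success = new AtomicLong();
    private final AtomicLong failure = new AtomicLong();
    private final AtomicLong totalLatencyMs = new AtomicLong();

    public void recordSuccess(long latencyMs) {
        success.incrementAndGet();
        totalLatencyMs.addAndGet(latencyMs);
    }

    public void recordFailure() { failure.incrementAndGet(); }

    public long successCount() { return success.get(); }
    public long failureCount() { return failure.get(); }

    // Average latency over successful localizations, in milliseconds.
    public long avgLatencyMs() {
        long n = success.get();
        return n == 0 ? 0 : totalLatencyMs.get() / n;
    }

    public static void main(String[] args) {
        LocalizationMetricsDemo metrics = new LocalizationMetricsDemo();
        metrics.recordSuccess(120);
        metrics.recordSuccess(80);
        metrics.recordFailure();
        System.out.println(metrics.successCount() + " " + metrics.failureCount()
            + " " + metrics.avgLatencyMs()); // 2 1 100
    }
}
```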
[jira] [Created] (YARN-10569) Add metrics for success/failure/latency in ResourceLocalization
Minni Mittal created YARN-10569: --- Summary: Add metrics for success/failure/latency in ResourceLocalization Key: YARN-10569 URL: https://issues.apache.org/jira/browse/YARN-10569 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Minni Mittal Assignee: Minni Mittal -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8859) Add audit logs for router service
[ https://issues.apache.org/jira/browse/YARN-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-8859: --- Attachment: YARN-8859.v3.patch > Add audit logs for router service > - > > Key: YARN-8859 > URL: https://issues.apache.org/jira/browse/YARN-8859 > Project: Hadoop YARN > Issue Type: Sub-task > Components: router >Reporter: Bibin Chundatt >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-8859.v1.patch, YARN-8859.v2.patch, > YARN-8859.v3.patch > > > Similar to all other yarn services. > RouterClientRMService and RouterWebServices api/rest call should have Audit > logging. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
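YARN audit loggers emit tab-separated key=value lines (the NMAuditLogger output quoted in YARN-10822 above shows the shape). The helper below sketches that format for a hypothetical RouterAuditLogger; the exact keys and method names in the patch may differ.

```java
// Sketch: build a YARN-style audit log line for a router operation.
public class RouterAuditDemo {
    public static String createSuccessLog(String user, String operation,
                                          String target, String appId) {
        StringBuilder b = new StringBuilder();
        b.append("USER=").append(user)
         .append("\tOPERATION=").append(operation)
         .append("\tTARGET=").append(target)
         .append("\tRESULT=SUCCESS");
        if (appId != null) {
            b.append("\tAPPID=").append(appId);   // optional application context
        }
        return b.toString();
    }

    public static void main(String[] args) {
        System.out.println(createSuccessLog("alice", "Submit Application",
            "RouterClientRMService", "application_1623422468672_2229"));
    }
}
```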
[jira] [Updated] (YARN-7898) [FederationStateStore] Create a proxy chain for FederationStateStore API in the Router
[ https://issues.apache.org/jira/browse/YARN-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-7898: --- Attachment: YARN-7898-YARN-7402.v9.patch > [FederationStateStore] Create a proxy chain for FederationStateStore API in > the Router > -- > > Key: YARN-7898 > URL: https://issues.apache.org/jira/browse/YARN-7898 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Minni Mittal >Priority: Major > Attachments: StateStoreProxy StressTest.jpg, > YARN-7898-YARN-7402.proto.patch, YARN-7898-YARN-7402.v1.patch, > YARN-7898-YARN-7402.v2.patch, YARN-7898-YARN-7402.v3.patch, > YARN-7898-YARN-7402.v4.patch, YARN-7898-YARN-7402.v5.patch, > YARN-7898-YARN-7402.v6.patch, YARN-7898-YARN-7402.v7.patch, > YARN-7898-YARN-7402.v8.patch, YARN-7898-YARN-7402.v9.patch, YARN-7898.v7.patch > > > As detailed in the proposal in the umbrella JIRA, we are introducing a new > component that routes client request to appropriate FederationStateStore. > This JIRA tracks the creation of a proxy for FederationStateStore in the > Router. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8859) Add audit logs for router service
[ https://issues.apache.org/jira/browse/YARN-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-8859: --- Attachment: YARN-8859.v2.patch > Add audit logs for router service > - > > Key: YARN-8859 > URL: https://issues.apache.org/jira/browse/YARN-8859 > Project: Hadoop YARN > Issue Type: Sub-task > Components: router >Reporter: Bibin Chundatt >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-8859.v1.patch, YARN-8859.v2.patch > > > Similar to all other yarn services. > RouterClientRMService and RouterWebServices api/rest call should have Audit > logging. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7898) [FederationStateStore] Create a proxy chain for FederationStateStore API in the Router
[ https://issues.apache.org/jira/browse/YARN-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-7898: --- Attachment: YARN-7898-YARN-7402.v8.patch > [FederationStateStore] Create a proxy chain for FederationStateStore API in > the Router > -- > > Key: YARN-7898 > URL: https://issues.apache.org/jira/browse/YARN-7898 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Minni Mittal >Priority: Major > Attachments: StateStoreProxy StressTest.jpg, > YARN-7898-YARN-7402.proto.patch, YARN-7898-YARN-7402.v1.patch, > YARN-7898-YARN-7402.v2.patch, YARN-7898-YARN-7402.v3.patch, > YARN-7898-YARN-7402.v4.patch, YARN-7898-YARN-7402.v5.patch, > YARN-7898-YARN-7402.v6.patch, YARN-7898-YARN-7402.v7.patch, > YARN-7898-YARN-7402.v8.patch, YARN-7898.v7.patch > > > As detailed in the proposal in the umbrella JIRA, we are introducing a new > component that routes client request to appropriate FederationStateStore. > This JIRA tracks the creation of a proxy for FederationStateStore in the > Router. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8859) Add audit logs for router service
[ https://issues.apache.org/jira/browse/YARN-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-8859: --- Attachment: YARN-8859.v1.patch > Add audit logs for router service > - > > Key: YARN-8859 > URL: https://issues.apache.org/jira/browse/YARN-8859 > Project: Hadoop YARN > Issue Type: Sub-task > Components: router >Reporter: Bibin Chundatt >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-8859.v1.patch > > > As with all other YARN services, the > RouterClientRMService and RouterWebServices API/REST calls should have audit > logging. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
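The audit-logging request above follows the pattern of the existing YARN audit loggers. A minimal, hypothetical sketch of the kind of helper such a patch might add (the class name, method, and log format here are illustrative assumptions, not the actual YARN-8859 code):

```java
// Hypothetical sketch of a router audit helper, modeled loosely on the
// key=value style of the RM audit loggers; everything here is assumed,
// not taken from the YARN-8859 patch.
public class RouterAuditSketch {

    // Builds a single audit line for one API/REST call.
    static String createEntry(String user, String operation,
                              String target, boolean success) {
        return "USER=" + user
            + "\tOPERATION=" + operation
            + "\tTARGET=" + target
            + "\tRESULT=" + (success ? "SUCCESS" : "FAILURE");
    }

    public static void main(String[] args) {
        System.out.println(createEntry("alice", "submitApplication",
                                       "RouterClientRMService", true));
    }
}
```

In practice such a helper would be invoked at the start and end of each RouterClientRMService and RouterWebServices call, writing to a dedicated audit log appender.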
[jira] [Commented] (YARN-8529) Add timeout to RouterWebServiceUtil#invokeRMWebService
[ https://issues.apache.org/jira/browse/YARN-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254076#comment-17254076 ] Minni Mittal commented on YARN-8529: [~bibinchundatt] Can you please review the patch ? > Add timeout to RouterWebServiceUtil#invokeRMWebService > -- > > Key: YARN-8529 > URL: https://issues.apache.org/jira/browse/YARN-8529 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-8529.v1.patch, YARN-8529.v10.patch, > YARN-8529.v11.patch, YARN-8529.v2.patch, YARN-8529.v3.patch, > YARN-8529.v4.patch, YARN-8529.v5.patch, YARN-8529.v6.patch, > YARN-8529.v7.patch, YARN-8529.v8.patch, YARN-8529.v9.patch > > > {{RouterWebServiceUtil#invokeRMWebService}} currently has a fixed timeout. > This should be configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8529) Add timeout to RouterWebServiceUtil#invokeRMWebService
[ https://issues.apache.org/jira/browse/YARN-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-8529: --- Attachment: YARN-8529.v11.patch > Add timeout to RouterWebServiceUtil#invokeRMWebService > -- > > Key: YARN-8529 > URL: https://issues.apache.org/jira/browse/YARN-8529 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-8529.v1.patch, YARN-8529.v10.patch, > YARN-8529.v11.patch, YARN-8529.v2.patch, YARN-8529.v3.patch, > YARN-8529.v4.patch, YARN-8529.v5.patch, YARN-8529.v6.patch, > YARN-8529.v7.patch, YARN-8529.v8.patch, YARN-8529.v9.patch > > > {{RouterWebServiceUtil#invokeRMWebService}} currently has a fixed timeout. > This should be configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
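The change requested above replaces a fixed timeout with a configurable one. A minimal sketch of that pattern, reading the value from a properties-style configuration (the property key and default below are hypothetical, not the actual YARN-8529 configuration names):

```java
import java.util.Properties;

public class TimeoutConfigSketch {
    // Hypothetical key and default; a real patch would define these in
    // YarnConfiguration rather than as ad-hoc constants.
    static final String TIMEOUT_KEY = "router.webservice.connect-timeout-ms";
    static final int DEFAULT_TIMEOUT_MS = 30_000;

    // Returns the configured timeout in ms, falling back to the default.
    static int getTimeoutMs(Properties conf) {
        String v = conf.getProperty(TIMEOUT_KEY);
        return (v == null) ? DEFAULT_TIMEOUT_MS : Integer.parseInt(v.trim());
    }
}
```

A caller would then pass `getTimeoutMs(conf)` to the HTTP client's connect/read timeout setters instead of a hard-coded constant.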
[jira] [Updated] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10519: Attachment: YARN-10519.v7.patch > Refactor QueueMetricsForCustomResources class to move to yarn-common package > > > Key: YARN-10519 > URL: https://issues.apache.org/jira/browse/YARN-10519 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch, > YARN-10519.v3.patch, YARN-10519.v4.patch, YARN-10519.v5.patch, > YARN-10519.v6.patch, YARN-10519.v7.patch > > > Refactor the code for QueueMetricsForCustomResources to move the base classes > to yarn-common package. This helps in reusing the class in adding custom > resource types at NM level also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10519: Attachment: YARN-10519.v6.patch > Refactor QueueMetricsForCustomResources class to move to yarn-common package > > > Key: YARN-10519 > URL: https://issues.apache.org/jira/browse/YARN-10519 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch, > YARN-10519.v3.patch, YARN-10519.v4.patch, YARN-10519.v5.patch, > YARN-10519.v6.patch > > > Refactor the code for QueueMetricsForCustomResources to move the base classes > to yarn-common package. This helps in reusing the class in adding custom > resource types at NM level also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8529) Add timeout to RouterWebServiceUtil#invokeRMWebService
[ https://issues.apache.org/jira/browse/YARN-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-8529: --- Attachment: YARN-8529.v10.patch > Add timeout to RouterWebServiceUtil#invokeRMWebService > -- > > Key: YARN-8529 > URL: https://issues.apache.org/jira/browse/YARN-8529 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-8529.v1.patch, YARN-8529.v10.patch, > YARN-8529.v2.patch, YARN-8529.v3.patch, YARN-8529.v4.patch, > YARN-8529.v5.patch, YARN-8529.v6.patch, YARN-8529.v7.patch, > YARN-8529.v8.patch, YARN-8529.v9.patch > > > {{RouterWebServiceUtil#invokeRMWebService}} currently has a fixed timeout. > This should be configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251590#comment-17251590 ] Minni Mittal commented on YARN-10519: - [~bibinchundatt], Can you please review the recent patch ? > Refactor QueueMetricsForCustomResources class to move to yarn-common package > > > Key: YARN-10519 > URL: https://issues.apache.org/jira/browse/YARN-10519 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch, > YARN-10519.v3.patch, YARN-10519.v4.patch, YARN-10519.v5.patch > > > Refactor the code for QueueMetricsForCustomResources to move the base classes > to yarn-common package. This helps in reusing the class in adding custom > resource types at NM level also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251517#comment-17251517 ] Minni Mittal commented on YARN-10519: - I've addressed the comments on the new line and the visibility change in the new patch. For the UTs, the reference in QueueMetrics is required. > Refactor QueueMetricsForCustomResources class to move to yarn-common package > > > Key: YARN-10519 > URL: https://issues.apache.org/jira/browse/YARN-10519 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch, > YARN-10519.v3.patch, YARN-10519.v4.patch, YARN-10519.v5.patch > > > Refactor the code for QueueMetricsForCustomResources to move the base classes > to yarn-common package. This helps in reusing the class in adding custom > resource types at NM level also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10519: Attachment: YARN-10519.v5.patch > Refactor QueueMetricsForCustomResources class to move to yarn-common package > > > Key: YARN-10519 > URL: https://issues.apache.org/jira/browse/YARN-10519 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch, > YARN-10519.v3.patch, YARN-10519.v4.patch, YARN-10519.v5.patch > > > Refactor the code for QueueMetricsForCustomResources to move the base classes > to yarn-common package. This helps in reusing the class in adding custom > resource types at NM level also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10519: Attachment: YARN-10519.v4.patch > Refactor QueueMetricsForCustomResources class to move to yarn-common package > > > Key: YARN-10519 > URL: https://issues.apache.org/jira/browse/YARN-10519 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch, > YARN-10519.v3.patch, YARN-10519.v4.patch > > > Refactor the code for QueueMetricsForCustomResources to move the base classes > to yarn-common package. This helps in reusing the class in adding custom > resource types at NM level also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-10523) Apps Pending Metrics can have incorrect value on RM recovery restart because of Unmanaged apps.
[ https://issues.apache.org/jira/browse/YARN-10523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal reassigned YARN-10523: --- Assignee: Minni Mittal > Apps Pending Metrics can have incorrect value on RM recovery restart because > of Unmanaged apps. > --- > > Key: YARN-10523 > URL: https://issues.apache.org/jira/browse/YARN-10523 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > > This Jira handles the following scenario for the AppsPending metric on RM restart > when recovery is enabled: > The AppsPending metric is incremented for each application whose final state is > none on RM restart. For applications that have a container to recover, the > metric is decremented. > For unmanaged applications, where there is no container to recover, the metric > is never decremented, which leaves its value incorrect. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10523) Apps Pending Metrics can have incorrect value on RM recovery restart because of Unmanaged apps.
Minni Mittal created YARN-10523: --- Summary: Apps Pending Metrics can have incorrect value on RM recovery restart because of Unmanaged apps. Key: YARN-10523 URL: https://issues.apache.org/jira/browse/YARN-10523 Project: Hadoop YARN Issue Type: Bug Reporter: Minni Mittal This Jira handles the following scenario for the AppsPending metric on RM restart when recovery is enabled: The AppsPending metric is incremented for each application whose final state is none on RM restart. For applications that have a container to recover, the metric is decremented. For unmanaged applications, where there is no container to recover, the metric is never decremented, which leaves its value incorrect. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
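The accounting imbalance described above can be illustrated with a small counter sketch (illustrative logic only, not the actual QueueMetrics recovery code): every recovered application increments the pending count, but the matching decrement happens only on container recovery, so an unmanaged app with no containers leaves the counter permanently inflated.

```java
// Illustrative sketch of the YARN-10523 recovery accounting bug; this is
// an assumed simplification, not the real QueueMetrics implementation.
public class PendingMetricsSketch {
    int appsPending;

    // Simplified recovery path: increment on app recovery, decrement
    // only when a container is recovered for that app.
    void recoverApp(boolean hasContainerToRecover) {
        appsPending++;
        if (hasContainerToRecover) {
            appsPending--;
        }
        // Unmanaged app: no container to recover, so no decrement ->
        // appsPending ends up higher than the true pending count.
    }
}
```

After recovering one managed and one unmanaged app, the counter reads 1 even though nothing is actually pending, which is exactly the incorrect value the issue reports.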
[jira] [Commented] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246413#comment-17246413 ] Minni Mittal commented on YARN-10519: - [~bibinchundatt] Can you please review the patch? > Refactor QueueMetricsForCustomResources class to move to yarn-common package > > > Key: YARN-10519 > URL: https://issues.apache.org/jira/browse/YARN-10519 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch, > YARN-10519.v3.patch > > > Refactor the code for QueueMetricsForCustomResources to move the base classes > to yarn-common package. This helps in reusing the class in adding custom > resource types at NM level also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10519: Attachment: YARN-10519.v3.patch > Refactor QueueMetricsForCustomResources class to move to yarn-common package > > > Key: YARN-10519 > URL: https://issues.apache.org/jira/browse/YARN-10519 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch, > YARN-10519.v3.patch > > > Refactor the code for QueueMetricsForCustomResources to move the base classes > to yarn-common package. This helps in reusing the class in adding custom > resource types at NM level also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245977#comment-17245977 ] Minni Mittal commented on YARN-10519: - Thanks [~bibinchundatt] for the review. Addressed the comment in the second patch. > Refactor QueueMetricsForCustomResources class to move to yarn-common package > > > Key: YARN-10519 > URL: https://issues.apache.org/jira/browse/YARN-10519 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch > > > Refactor the code for QueueMetricsForCustomResources to move the base classes > to yarn-common package. This helps in reusing the class in adding custom > resource types at NM level also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10519: Attachment: YARN-10519.v2.patch > Refactor QueueMetricsForCustomResources class to move to yarn-common package > > > Key: YARN-10519 > URL: https://issues.apache.org/jira/browse/YARN-10519 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10519.v1.patch, YARN-10519.v2.patch > > > Refactor the code for QueueMetricsForCustomResources to move the base classes > to yarn-common package. This helps in reusing the class in adding custom > resource types at NM level also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
[ https://issues.apache.org/jira/browse/YARN-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10519: Attachment: YARN-10519.v1.patch > Refactor QueueMetricsForCustomResources class to move to yarn-common package > > > Key: YARN-10519 > URL: https://issues.apache.org/jira/browse/YARN-10519 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-10519.v1.patch > > > Refactor the code for QueueMetricsForCustomResources to move the base classes > to yarn-common package. This helps in reusing the class in adding custom > resource types at NM level also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10519) Refactor QueueMetricsForCustomResources class to move to yarn-common package
Minni Mittal created YARN-10519: --- Summary: Refactor QueueMetricsForCustomResources class to move to yarn-common package Key: YARN-10519 URL: https://issues.apache.org/jira/browse/YARN-10519 Project: Hadoop YARN Issue Type: Improvement Reporter: Minni Mittal Assignee: Minni Mittal Refactor the code for QueueMetricsForCustomResources to move the base classes to yarn-common package. This helps in reusing the class in adding custom resource types at NM level also. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
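The refactor above is about moving resource-name-keyed metric bookkeeping into yarn-common so both RM queue metrics and NM metrics can share it. A minimal sketch of that shared shape (the class and method names are assumptions, not the actual QueueMetricsForCustomResources API):

```java
import java.util.HashMap;
import java.util.Map;

// Assumed shape of per-custom-resource counters; the real Hadoop class
// tracks several categories (allocated, available, pending, ...).
public class CustomResourceMetricsSketch {
    private final Map<String, Long> allocated = new HashMap<>();

    // Adds to the allocated count for a named resource type.
    void increaseAllocated(String resourceName, long value) {
        allocated.merge(resourceName, value, Long::sum);
    }

    long getAllocated(String resourceName) {
        return allocated.getOrDefault(resourceName, 0L);
    }
}
```

Living in yarn-common, a class like this could back both queue-level metrics in the ResourceManager and node-level metrics in the NodeManager, which is the reuse the description calls out.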
[jira] [Updated] (YARN-10518) Add metrics for custom resource types in NodeManagerMetrics
[ https://issues.apache.org/jira/browse/YARN-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-10518: Description: This Jira deals with updating NodeManager metrics with custom resource types. It includes allocated, available and total resources. > Add metrics for custom resource types in NodeManagerMetrics > > > Key: YARN-10518 > URL: https://issues.apache.org/jira/browse/YARN-10518 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Minni Mittal >Assignee: Minni Mittal >Priority: Major > > This Jira deals with updating NodeManager metrics with custom resource types. > It includes allocated, available and total resources. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10518) Add metrics for custom resource types in NodeManagerMetrics
Minni Mittal created YARN-10518: --- Summary: Add metrics for custom resource types in NodeManagerMetrics Key: YARN-10518 URL: https://issues.apache.org/jira/browse/YARN-10518 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Minni Mittal Assignee: Minni Mittal -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7898) [FederationStateStore] Create a proxy chain for FederationStateStore API in the Router
[ https://issues.apache.org/jira/browse/YARN-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-7898: --- Attachment: YARN-7898-YARN-7402.v7.patch > [FederationStateStore] Create a proxy chain for FederationStateStore API in > the Router > -- > > Key: YARN-7898 > URL: https://issues.apache.org/jira/browse/YARN-7898 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Minni Mittal >Priority: Major > Attachments: StateStoreProxy StressTest.jpg, > YARN-7898-YARN-7402.proto.patch, YARN-7898-YARN-7402.v1.patch, > YARN-7898-YARN-7402.v2.patch, YARN-7898-YARN-7402.v3.patch, > YARN-7898-YARN-7402.v4.patch, YARN-7898-YARN-7402.v5.patch, > YARN-7898-YARN-7402.v6.patch, YARN-7898-YARN-7402.v7.patch, YARN-7898.v7.patch > > > As detailed in the proposal in the umbrella JIRA, we are introducing a new > component that routes client requests to the appropriate FederationStateStore. > This JIRA tracks the creation of a proxy for FederationStateStore in the > Router. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8529) Add timeout to RouterWebServiceUtil#invokeRMWebService
[ https://issues.apache.org/jira/browse/YARN-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17047221#comment-17047221 ] Minni Mittal commented on YARN-8529: [~bibinchundatt] [~elgoiri] Can you please review the patch ? > Add timeout to RouterWebServiceUtil#invokeRMWebService > -- > > Key: YARN-8529 > URL: https://issues.apache.org/jira/browse/YARN-8529 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-8529.v1.patch, YARN-8529.v2.patch, > YARN-8529.v3.patch, YARN-8529.v4.patch, YARN-8529.v5.patch, > YARN-8529.v6.patch, YARN-8529.v7.patch, YARN-8529.v8.patch, YARN-8529.v9.patch > > > {{RouterWebServiceUtil#invokeRMWebService}} currently has a fixed timeout. > This should be configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8529) Add timeout to RouterWebServiceUtil#invokeRMWebService
[ https://issues.apache.org/jira/browse/YARN-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-8529: --- Attachment: YARN-8529.v9.patch > Add timeout to RouterWebServiceUtil#invokeRMWebService > -- > > Key: YARN-8529 > URL: https://issues.apache.org/jira/browse/YARN-8529 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-8529.v1.patch, YARN-8529.v2.patch, > YARN-8529.v3.patch, YARN-8529.v4.patch, YARN-8529.v5.patch, > YARN-8529.v6.patch, YARN-8529.v7.patch, YARN-8529.v8.patch, YARN-8529.v9.patch > > > {{RouterWebServiceUtil#invokeRMWebService}} currently has a fixed timeout. > This should be configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8529) Add timeout to RouterWebServiceUtil#invokeRMWebService
[ https://issues.apache.org/jira/browse/YARN-8529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Minni Mittal updated YARN-8529: --- Attachment: YARN-8529.v8.patch > Add timeout to RouterWebServiceUtil#invokeRMWebService > -- > > Key: YARN-8529 > URL: https://issues.apache.org/jira/browse/YARN-8529 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Minni Mittal >Priority: Major > Attachments: YARN-8529.v1.patch, YARN-8529.v2.patch, > YARN-8529.v3.patch, YARN-8529.v4.patch, YARN-8529.v5.patch, > YARN-8529.v6.patch, YARN-8529.v7.patch, YARN-8529.v8.patch > > > {{RouterWebServiceUtil#invokeRMWebService}} currently has a fixed timeout. > This should be configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org