[jira] [Commented] (YARN-10178) Global Scheduler async thread crash caused by 'Comparison method violates its general contract

2021-12-20 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463003#comment-17463003
 ] 

Andras Gyori commented on YARN-10178:
-

Thank you [~epayne] for the thorough review, this is indeed convincing. Shall I 
ask someone to check it out and commit this?

> Global Scheduler async thread crash caused by 'Comparison method violates its 
> general contract
> --
>
> Key: YARN-10178
> URL: https://issues.apache.org/jira/browse/YARN-10178
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.2.1
>Reporter: tuyu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10178.001.patch, YARN-10178.002.patch, 
> YARN-10178.003.patch, YARN-10178.004.patch, YARN-10178.005.patch, 
> YARN-10178.006.patch, YARN-10178.branch-2.10.001.patch
>
>
> Global Scheduler Async Thread crash stack
> {code:java}
> ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received 
> RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, 
> Thread-6066574, that exited unexpectedly: java.lang.IllegalArgumentException: 
> Comparison method violates its general contract!  
>at 
> java.util.TimSort.mergeHi(TimSort.java:899)
> at java.util.TimSort.mergeAt(TimSort.java:516)
> at java.util.TimSort.mergeForceCollapse(TimSort.java:457)
> at java.util.TimSort.sort(TimSort.java:254)
> at java.util.Arrays.sort(Arrays.java:1512)
> at java.util.ArrayList.sort(ArrayList.java:1462)
> at java.util.Collections.sort(Collections.java:177)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:221)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:777)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:791)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1635)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1629)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1732)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1481)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:616)
> {code}
> JAVA 8 Arrays.sort default use timsort algo, and timsort has  few require 
> {code:java}
> 1.x.compareTo(y) != y.compareTo(x)
> 2.x>y,y>z --> x > z
> 3.x=y, x.compareTo(z) == y.compareTo(z)
> {code}
> if not Arrays paramters not satify this require,TimSort will throw 
> 'java.lang.IllegalArgumentException'
> look at PriorityUtilizationQueueOrderingPolicy.compare function,we will know 
> Capacity Scheduler use this these queue resource usage to compare
> {code:java}
> AbsoluteUsedCapacity
> UsedCapacity
> ConfiguredMinResource
> AbsoluteCapacity
> {code}
> In Capacity Scheduler Global Scheduler AsyncThread use 
> PriorityUtilizationQueueOrderingPolicy function to choose queue to assign 
> container,and construct a CSAssignment struct, and use 
> submitResourceCommitRequest function add CSAssignment to backlogs
> ResourceCommitterService  will tryCommit this CSAssignment,look tryCommit 
> function,there will update queue resource usage
> {code:java}
> public boolean tryCommit(Resource cluster, ResourceCommitRequest r,
> boolean updatePending) {
>   long commitStart = System.nanoTime();
>   ResourceCommitRequest request =
>   (ResourceCommitRequest) r;
>  
>   ...
>   boolean isSuccess = false;
>   if (attemptId != null) {
> FiCaSchedulerApp app = getApplicationAttempt(attemptId);
> // Required sanity check for attemptId - when async-scheduling enabled,
> // proposal might be outdated if AM failover just finished
> // and proposal queue was not be consumed in time
> if (app != null && 

[jira] [Commented] (YARN-10178) Global Scheduler async thread crash caused by 'Comparison method violates its general contract

2021-12-20 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17462883#comment-17462883
 ] 

Eric Payne commented on YARN-10178:
---

No significant difference between most of the performance test suites in 
TestCapacitySchedulerPerf when compared between with and without this patch. 
One suite, however, showed _improvement_ with the patch:

With 10 threads enabled, 200 apps on 100% of 50 queues, this test showed an 
average 2x improvement after applying this patch.

Since there doesn't seem to be any other noticeable differences among the other 
test suites, I give my
+1.

Thank you [~tuyu], [~zhuqi], [~gandras], and others for your good work on this 
issue. This is a vital fix for a serious flaw.

> Global Scheduler async thread crash caused by 'Comparison method violates its 
> general contract
> --
>
> Key: YARN-10178
> URL: https://issues.apache.org/jira/browse/YARN-10178
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.2.1
>Reporter: tuyu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10178.001.patch, YARN-10178.002.patch, 
> YARN-10178.003.patch, YARN-10178.004.patch, YARN-10178.005.patch, 
> YARN-10178.006.patch, YARN-10178.branch-2.10.001.patch
>
>
> Global Scheduler Async Thread crash stack
> {code:java}
> ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received 
> RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, 
> Thread-6066574, that exited unexpectedly: java.lang.IllegalArgumentException: 
> Comparison method violates its general contract!  
>at 
> java.util.TimSort.mergeHi(TimSort.java:899)
> at java.util.TimSort.mergeAt(TimSort.java:516)
> at java.util.TimSort.mergeForceCollapse(TimSort.java:457)
> at java.util.TimSort.sort(TimSort.java:254)
> at java.util.Arrays.sort(Arrays.java:1512)
> at java.util.ArrayList.sort(ArrayList.java:1462)
> at java.util.Collections.sort(Collections.java:177)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:221)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:777)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:791)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1635)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1629)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1732)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1481)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:616)
> {code}
> JAVA 8 Arrays.sort default use timsort algo, and timsort has  few require 
> {code:java}
> 1.x.compareTo(y) != y.compareTo(x)
> 2.x>y,y>z --> x > z
> 3.x=y, x.compareTo(z) == y.compareTo(z)
> {code}
> if not Arrays paramters not satify this require,TimSort will throw 
> 'java.lang.IllegalArgumentException'
> look at PriorityUtilizationQueueOrderingPolicy.compare function,we will know 
> Capacity Scheduler use this these queue resource usage to compare
> {code:java}
> AbsoluteUsedCapacity
> UsedCapacity
> ConfiguredMinResource
> AbsoluteCapacity
> {code}
> In Capacity Scheduler Global Scheduler AsyncThread use 
> PriorityUtilizationQueueOrderingPolicy function to choose queue to assign 
> container,and construct a CSAssignment struct, and use 
> submitResourceCommitRequest function add CSAssignment to backlogs
> ResourceCommitterService  will tryCommit this CSAssignment,look tryCommit 
> function,there will update queue resource usage
> {code:java}
> public boolean tryCommit(Resource cluster, ResourceCommitRequest r,
> boolean updatePending) {
>   long commitStart = 

[jira] [Commented] (YARN-10178) Global Scheduler async thread crash caused by 'Comparison method violates its general contract

2021-12-20 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17462835#comment-17462835
 ] 

Eric Payne commented on YARN-10178:
---

[~gandras], I have reviewed the trunk and branch-2.10 patches. The code changes 
look good to me. I have manually tested the branch-2.10 patch multiple times in 
a 100-node cluster and found it to be stable with no thread crashes.

I am currently running the TestCapacitySchedulerPerf performance tests with and 
without the patch, with and without threading enabled. I will post the results 
later today.



> Global Scheduler async thread crash caused by 'Comparison method violates its 
> general contract
> --
>
> Key: YARN-10178
> URL: https://issues.apache.org/jira/browse/YARN-10178
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.2.1
>Reporter: tuyu
>Assignee: Qi Zhu
>Priority: Major
> Attachments: YARN-10178.001.patch, YARN-10178.002.patch, 
> YARN-10178.003.patch, YARN-10178.004.patch, YARN-10178.005.patch, 
> YARN-10178.006.patch, YARN-10178.branch-2.10.001.patch
>
>
> Global Scheduler Async Thread crash stack
> {code:java}
> ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received 
> RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, 
> Thread-6066574, that exited unexpectedly: java.lang.IllegalArgumentException: 
> Comparison method violates its general contract!  
>at 
> java.util.TimSort.mergeHi(TimSort.java:899)
> at java.util.TimSort.mergeAt(TimSort.java:516)
> at java.util.TimSort.mergeForceCollapse(TimSort.java:457)
> at java.util.TimSort.sort(TimSort.java:254)
> at java.util.Arrays.sort(Arrays.java:1512)
> at java.util.ArrayList.sort(ArrayList.java:1462)
> at java.util.Collections.sort(Collections.java:177)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:221)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:777)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:791)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1635)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1629)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1732)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1481)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:616)
> {code}
> JAVA 8 Arrays.sort default use timsort algo, and timsort has  few require 
> {code:java}
> 1.x.compareTo(y) != y.compareTo(x)
> 2.x>y,y>z --> x > z
> 3.x=y, x.compareTo(z) == y.compareTo(z)
> {code}
> if not Arrays paramters not satify this require,TimSort will throw 
> 'java.lang.IllegalArgumentException'
> look at PriorityUtilizationQueueOrderingPolicy.compare function,we will know 
> Capacity Scheduler use this these queue resource usage to compare
> {code:java}
> AbsoluteUsedCapacity
> UsedCapacity
> ConfiguredMinResource
> AbsoluteCapacity
> {code}
> In Capacity Scheduler Global Scheduler AsyncThread use 
> PriorityUtilizationQueueOrderingPolicy function to choose queue to assign 
> container,and construct a CSAssignment struct, and use 
> submitResourceCommitRequest function add CSAssignment to backlogs
> ResourceCommitterService  will tryCommit this CSAssignment,look tryCommit 
> function,there will update queue resource usage
> {code:java}
> public boolean tryCommit(Resource cluster, ResourceCommitRequest r,
> boolean updatePending) {
>   long commitStart = System.nanoTime();
>   ResourceCommitRequest request =
>   (ResourceCommitRequest) r;
>  
>   ...
>   boolean isSuccess = false;
>   if (attemptId != null) {
> 

[jira] [Updated] (YARN-10503) Support queue capacity in terms of absolute resources with custom resourceType.

2021-12-20 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10503:
--
Fix Version/s: 3.2.4

> Support queue capacity in terms of absolute resources with custom 
> resourceType.
> ---
>
> Key: YARN-10503
> URL: https://issues.apache.org/jira/browse/YARN-10503
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Qi Zhu
>Assignee: Qi Zhu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.1, 3.2.4
>
> Attachments: YARN-10503-branch-3.3.010.patch, YARN-10503.001.patch, 
> YARN-10503.002.patch, YARN-10503.003.patch, YARN-10503.004.patch, 
> YARN-10503.005.patch, YARN-10503.006.patch, YARN-10503.007.patch, 
> YARN-10503.008.patch, YARN-10503.009.patch, YARN-10503.010.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Now the absolute resources are memory and cores.
> {code:java}
> /**
>  * Different resource types supported.
>  */
> public enum AbsoluteResourceType {
>   MEMORY, VCORES;
> }{code}
> But in our GPU production clusters, we need to support more resourceTypes.
> It's very import for cluster scaling when with different resourceType 
> absolute demands.
>  
> This Jira will handle GPU first.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11025) Implement distributed decommissioning

2021-12-20 Thread Minni Mittal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Minni Mittal updated YARN-11025:

Description: 
This Jira proposes to add support for accepting request from disributed sources 
for putting nodes in decomissioning state. 

It proposes to add configurable provider and consumer class interface in 
nodemanager. NM can receive request to put node in decomissioning from any 
distributed sources using the provider class implementation and consumer 
classes can utilize it to set node status. 

Corresponding changes will be done at RM side to update node staate while 
update event is called at DecomissioningNodesWatcher.

  was:
This Jira proposes to add support for acceoting request from disributed sources 
for putting nodes in decomissioning state. 

It proposes to add configurable provider and consumer class interface in 
nodemanager. NM can receive request to put node in decomissioning from any 
distributed sources using the provider class implementation and consumer 
classes can utilize it to set node status. 

Corresponding changes will be done at RM side to update node staate while 
update event is called at DecomissioningNodesWatcher.


> Implement distributed decommissioning
> -
>
> Key: YARN-11025
> URL: https://issues.apache.org/jira/browse/YARN-11025
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Minni Mittal
>Assignee: Minni Mittal
>Priority: Major
>
> This Jira proposes to add support for accepting request from disributed 
> sources for putting nodes in decomissioning state. 
> It proposes to add configurable provider and consumer class interface in 
> nodemanager. NM can receive request to put node in decomissioning from any 
> distributed sources using the provider class implementation and consumer 
> classes can utilize it to set node status. 
> Corresponding changes will be done at RM side to update node staate while 
> update event is called at DecomissioningNodesWatcher.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11025) Implement distributed decommissioning

2021-12-20 Thread Minni Mittal (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Minni Mittal updated YARN-11025:

Description: 
This Jira proposes to add support for acceoting request from disributed sources 
for putting nodes in decomissioning state. 

It proposes to add configurable provider and consumer class interface in 
nodemanager. NM can receive request to put node in decomissioning from any 
distributed sources using the provider class implementation and consumer 
classes can utilize it to set node status. 

Corresponding changes will be done at RM side to update node staate while 
update event is called at DecomissioningNodesWatcher.

> Implement distributed decommissioning
> -
>
> Key: YARN-11025
> URL: https://issues.apache.org/jira/browse/YARN-11025
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Minni Mittal
>Assignee: Minni Mittal
>Priority: Major
>
> This Jira proposes to add support for acceoting request from disributed 
> sources for putting nodes in decomissioning state. 
> It proposes to add configurable provider and consumer class interface in 
> nodemanager. NM can receive request to put node in decomissioning from any 
> distributed sources using the provider class implementation and consumer 
> classes can utilize it to set node status. 
> Corresponding changes will be done at RM side to update node staate while 
> update event is called at DecomissioningNodesWatcher.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11050) Typo in method name: RMWebServiceProtocol#removeFromCluserNodeLabels

2021-12-20 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-11050:
--
Fix Version/s: 3.4.0

> Typo in method name: RMWebServiceProtocol#removeFromCluserNodeLabels
> 
>
> Key: YARN-11050
> URL: https://issues.apache.org/jira/browse/YARN-11050
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Trivial
>  Labels: newbie, pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-11047) ResourceManager and NodeManager unable to connect to Hbase when ATSv2 is enabled

2021-12-20 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved YARN-11047.

Resolution: Fixed

> ResourceManager and NodeManager unable to connect to Hbase when ATSv2 is 
> enabled
> 
>
> Key: YARN-11047
> URL: https://issues.apache.org/jira/browse/YARN-11047
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Minni Mittal
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> yarn resourcemanager command has following issue: 
> 2021-12-14 19:26:33,345 WARN [pool-28-thread-1] 
> storage.TimelineStorageMonitor (TimelineStorageMonitor.java:run(95)) - Got 
> failure attempting to read from HBase, assuming Storage is down 
> java.lang.RuntimeException: org.apache.hadoop.hbase.DoNotRetryIOException: 
> java.lang.NoSuchMethodError: 
> [com.google.common.net|http://com.google.common.net/].HostAndPort.getHostText()Ljava/lang/String;
>  at 
> org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:95)
>  at 
> org.apache.hadoop.yarn.server.timelineservice.storage.reader.TimelineEntityReader.readEntities(TimelineEntityReader.java:283)
>  at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseStorageMonitor.healthCheck(HBaseStorageMonitor.java:77)
>  at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineStorageMonitor$MonitorThread.run(TimelineStorageMonitor.java:89)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-11047) ResourceManager and NodeManager unable to connect to Hbase when ATSv2 is enabled

2021-12-20 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-11047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated YARN-11047:
---
Fix Version/s: 3.4.0

> ResourceManager and NodeManager unable to connect to Hbase when ATSv2 is 
> enabled
> 
>
> Key: YARN-11047
> URL: https://issues.apache.org/jira/browse/YARN-11047
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Minni Mittal
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> yarn resourcemanager command has following issue: 
> 2021-12-14 19:26:33,345 WARN [pool-28-thread-1] 
> storage.TimelineStorageMonitor (TimelineStorageMonitor.java:run(95)) - Got 
> failure attempting to read from HBase, assuming Storage is down 
> java.lang.RuntimeException: org.apache.hadoop.hbase.DoNotRetryIOException: 
> java.lang.NoSuchMethodError: 
> [com.google.common.net|http://com.google.common.net/].HostAndPort.getHostText()Ljava/lang/String;
>  at 
> org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:95)
>  at 
> org.apache.hadoop.yarn.server.timelineservice.storage.reader.TimelineEntityReader.readEntities(TimelineEntityReader.java:283)
>  at 
> org.apache.hadoop.yarn.server.timelineservice.storage.HBaseStorageMonitor.healthCheck(HBaseStorageMonitor.java:77)
>  at 
> org.apache.hadoop.yarn.server.timelineservice.storage.TimelineStorageMonitor$MonitorThread.run(TimelineStorageMonitor.java:89)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org