[jira] [Commented] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail
[ https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217288#comment-17217288 ] Akira Ajisaka commented on YARN-10460: -- Thank you [~pbacsko]! > Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail > - > > Key: YARN-10460 > URL: https://issues.apache.org/jira/browse/YARN-10460 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, test >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10460-001.patch, YARN-10460-002.patch, > YARN-10460-POC.patch > > > In our downstream build environment, we're using JUnit 4.13. Recently, we > discovered a truly weird test failure in TestNodeStatusUpdater. > The problem is that timeout handling has changed in Junit 4.13. See the > difference between these two snippets: > 4.12 > {noformat} > @Override > public void evaluate() throws Throwable { > CallableStatement callable = new CallableStatement(); > FutureTask task = new FutureTask(callable); > threadGroup = new ThreadGroup("FailOnTimeoutGroup"); > Thread thread = new Thread(threadGroup, task, "Time-limited test"); > thread.setDaemon(true); > thread.start(); > callable.awaitStarted(); > Throwable throwable = getResult(task, thread); > if (throwable != null) { > throw throwable; > } > } > {noformat} > > 4.13 > {noformat} > @Override > public void evaluate() throws Throwable { > CallableStatement callable = new CallableStatement(); > FutureTask task = new FutureTask(callable); > ThreadGroup threadGroup = new ThreadGroup("FailOnTimeoutGroup"); > Thread thread = new Thread(threadGroup, task, "Time-limited test"); > try { > thread.setDaemon(true); > thread.start(); > callable.awaitStarted(); > Throwable throwable = getResult(task, thread); > if (throwable != null) { > throw throwable; > } > } finally { > try { > thread.join(1); > } catch (InterruptedException e) { > Thread.currentThread().interrupt(); > } > try { > threadGroup.destroy(); < This > } catch (IllegalThreadStateException e) { > // If a thread from the group is still alive, the ThreadGroup > cannot be destroyed. > // Swallow the exception to keep the same behavior prior to > this change. > } > } > } > {noformat} > The change comes from [https://github.com/junit-team/junit4/pull/1517]. > Unfortunately, destroying the thread group causes an issue because there are > all sorts of object caching in the IPC layer. The exception is: > {noformat} > java.lang.IllegalThreadStateException > at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:867) > at java.lang.Thread.init(Thread.java:402) > at java.lang.Thread.init(Thread.java:349) > at java.lang.Thread.(Thread.java:675) > at > java.util.concurrent.Executors$DefaultThreadFactory.newThread(Executors.java:613) > at > com.google.common.util.concurrent.ThreadFactoryBuilder$1.newThread(ThreadFactoryBuilder.java:163) > at > java.util.concurrent.ThreadPoolExecutor$Worker.(ThreadPoolExecutor.java:612) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:925) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368) > at > java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) > at > org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1136) > at org.apache.hadoop.ipc.Client.call(Client.java:1458) > at org.apache.hadoop.ipc.Client.call(Client.java:1405) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) > at com.sun.proxy.$Proxy81.startContainers(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128) > at > org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.startContainer(TestNodeManagerShutdown.java:251) > at > org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown(TestNodeStatusUpdater.java:1576) > {noformat} > Both the {{clientExecutor}} in {{org.apache.hadoop.ipc.Client}} and the > clien
[jira] [Comment Edited] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail
[ https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217275#comment-17217275 ] Akira Ajisaka edited comment on YARN-10460 at 10/20/20, 5:37 AM: - Hi [~pbacsko], I have a question. Are there any failing tests other than this issue in JUnit 4.13? Hi [~snemeth], I would like to backport this to older branches. Now I consider upgrading JUnit to 4.13.1 in Apache Hadoop because of CVE-2020-15250 (HADOOP-17316). The severity is low, but it is worth upgrading. https://github.com/junit-team/junit4/security/advisories/GHSA-269g-pwp5-87pp was (Author: ajisakaa): Hi [~pbacsko], I have a question. Are there any failing tests other than this issue in JUnit 4.13? Hi [~snemeth], I would like to backport this to older branches. Now I consider upgrading JUnit to 4.13.1 in Apache Hadoop because of CVE-2020-15250. The severity is low, but it is worth upgrading. https://github.com/junit-team/junit4/security/advisories/GHSA-269g-pwp5-87pp > Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail > - > > Key: YARN-10460 > URL: https://issues.apache.org/jira/browse/YARN-10460 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, test >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10460-001.patch, YARN-10460-002.patch, > YARN-10460-POC.patch > > > In our downstream build environment, we're using JUnit 4.13. Recently, we > discovered a truly weird test failure in TestNodeStatusUpdater. > The problem is that timeout handling has changed in Junit 4.13. See the > difference between these two snippets: > 4.12 > {noformat} > @Override > public void evaluate() throws Throwable { > CallableStatement callable = new CallableStatement(); > FutureTask task = new FutureTask(callable); > threadGroup = new ThreadGroup("FailOnTimeoutGroup"); > Thread thread = new Thread(threadGroup, task, "Time-limited test"); > thread.setDaemon(true); > thread.start(); > callable.awaitStarted(); > Throwable throwable = getResult(task, thread); > if (throwable != null) { > throw throwable; > } > } > {noformat} > > 4.13 > {noformat} > @Override > public void evaluate() throws Throwable { > CallableStatement callable = new CallableStatement(); > FutureTask task = new FutureTask(callable); > ThreadGroup threadGroup = new ThreadGroup("FailOnTimeoutGroup"); > Thread thread = new Thread(threadGroup, task, "Time-limited test"); > try { > thread.setDaemon(true); > thread.start(); > callable.awaitStarted(); > Throwable throwable = getResult(task, thread); > if (throwable != null) { > throw throwable; > } > } finally { > try { > thread.join(1); > } catch (InterruptedException e) { > Thread.currentThread().interrupt(); > } > try { > threadGroup.destroy(); < This > } catch (IllegalThreadStateException e) { > // If a thread from the group is still alive, the ThreadGroup > cannot be destroyed. > // Swallow the exception to keep the same behavior prior to > this change. > } > } > } > {noformat} > The change comes from [https://github.com/junit-team/junit4/pull/1517]. > Unfortunately, destroying the thread group causes an issue because there are > all sorts of object caching in the IPC layer. The exception is: > {noformat} > java.lang.IllegalThreadStateException > at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:867) > at java.lang.Thread.init(Thread.java:402) > at java.lang.Thread.init(Thread.java:349) > at java.lang.Thread.(Thread.java:675) > at > java.util.concurrent.Executors$DefaultThreadFactory.newThread(Executors.java:613) > at > com.google.common.util.concurrent.ThreadFactoryBuilder$1.newThread(ThreadFactoryBuilder.java:163) > at > java.util.concurrent.ThreadPoolExecutor$Worker.(ThreadPoolExecutor.java:612) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:925) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368) > at > java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) > at > org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1136) > at org.apache.hadoop.ipc.Client.call(Client.java:1458) > at org.apache.hadoop.ipc.C
[jira] [Comment Edited] (YARN-10178) Global Scheduler asycthread crash caused by 'Comparison method violates its general contract'
[ https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217285#comment-17217285 ] Wangda Tan edited comment on YARN-10178 at 10/20/20, 5:33 AM: -- Since recently we have a customer has the same issue, I spent some time to look at this, thanks [~tuyu] for the detailed analysis. Apart from issues mentioned by [~tuyu], which is async-scheduling related, I think it can also happen when async-scheduling is disabled. A possible place is completedContainer, it doesn't hold scheduler's lock, so it can happen even though async-scheduling is disabled. (I checked the problematic log, there's a container release event happened at the exact same timestamp (within the same milli-second) when the RM crash happens. One possible way to fix the problem is, inside PriorityUtilizationQueueOrderingPolicy, take a snapshot of queue capacities (which includes {code:java} AbsoluteUsedCapacity UsedCapacity ConfiguredMinResource AbsoluteCapacity And plus CSQueue's reference ){code} Create a new internal class (like PriorityQueueResourcesForSorting) to include the 4 fields, instead of sorting the CSQueue directly, we will sort the new structure. There're additional costs to copy the resources field, but it should be minimum for most cases (unless you have thousands of queues). cc: [~bteke] was (Author: wangda): Since recently we have a customer has the same issue, I spent some time to look at this, thanks [~tuyu] for the detailed analysis. Apart from issues mentioned by [~tuyu], which is async-scheduling related, I think it can also happen when async-scheduling is disabled. A possible place is completedContainer, it doesn't hold scheduler's lock, so it can happen even though async-scheduling is disabled. (I checked the problematic log, there's a container release event happened at the exact same timestamp (within the same milli-second) when the RM crash happens. One possible way to fix the problem is, inside PriorityUtilizationQueueOrderingPolicy, take a snapshot of queue capacities (which includes AbsoluteUsedCapacity UsedCapacity ConfiguredMinResource AbsoluteCapacity ) Create a new internal class (like PriorityQueueResourcesForSorting) to include the 4 fields, instead of sorting the CSQueue directly, we will sort the new structure. There're additional costs to copy the resources field, but it should be minimum for most cases (unless you have thousands of queues). cc: [~bteke] > Global Scheduler asycthread crash caused by 'Comparison method violates its > general contract' > - > > Key: YARN-10178 > URL: https://issues.apache.org/jira/browse/YARN-10178 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.2.1 >Reporter: tuyu >Priority: Major > > Global Scheduler Async Thread crash stack > {code:java} > ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received > RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, > Thread-6066574, that exited unexpectedly: java.lang.IllegalArgumentException: > Comparison method violates its general contract! >at > java.util.TimSort.mergeHi(TimSort.java:899) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeForceCollapse(TimSort.java:457) > at java.util.TimSort.sort(TimSort.java:254) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1462) > at java.util.Collections.sort(Collections.java:177) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:221) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:777) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:791) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1635) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1629) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler
[jira] [Commented] (YARN-10178) Global Scheduler asycthread crash caused by 'Comparison method violates its general contract'
[ https://issues.apache.org/jira/browse/YARN-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217285#comment-17217285 ] Wangda Tan commented on YARN-10178: --- Since recently we have a customer has the same issue, I spent some time to look at this, thanks [~tuyu] for the detailed analysis. Apart from issues mentioned by [~tuyu], which is async-scheduling related, I think it can also happen when async-scheduling is disabled. A possible place is completedContainer, it doesn't hold scheduler's lock, so it can happen even though async-scheduling is disabled. (I checked the problematic log, there's a container release event happened at the exact same timestamp (within the same milli-second) when the RM crash happens. One possible way to fix the problem is, inside PriorityUtilizationQueueOrderingPolicy, take a snapshot of queue capacities (which includes AbsoluteUsedCapacity UsedCapacity ConfiguredMinResource AbsoluteCapacity ) Create a new internal class (like PriorityQueueResourcesForSorting) to include the 4 fields, instead of sorting the CSQueue directly, we will sort the new structure. There're additional costs to copy the resources field, but it should be minimum for most cases (unless you have thousands of queues). cc: [~bteke] > Global Scheduler asycthread crash caused by 'Comparison method violates its > general contract' > - > > Key: YARN-10178 > URL: https://issues.apache.org/jira/browse/YARN-10178 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.2.1 >Reporter: tuyu >Priority: Major > > Global Scheduler Async Thread crash stack > {code:java} > ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received > RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, > Thread-6066574, that exited unexpectedly: java.lang.IllegalArgumentException: > Comparison method violates its general contract! >at > java.util.TimSort.mergeHi(TimSort.java:899) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeForceCollapse(TimSort.java:457) > at java.util.TimSort.sort(TimSort.java:254) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1462) > at java.util.Collections.sort(Collections.java:177) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:221) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:777) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:791) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:623) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1635) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1629) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1732) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1481) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:569) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:616) > {code} > JAVA 8 Arrays.sort default use timsort algo, and timsort has few require > {code:java} > 1.x.compareTo(y) != y.compareTo(x) > 2.x>y,y>z --> x > z > 3.x=y, x.compareTo(z) == y.compareTo(z) > {code} > if not Arrays paramters not satify this require,TimSort will throw > 'java.lang.IllegalArgumentException' > look at PriorityUtilizationQueueOrderingPolicy.compare function,we will know > Capacity Scheduler use this these queue resource usage to compare > {code:java} > AbsoluteUsedCapacity > UsedCapacity > ConfiguredMinResource > AbsoluteCapacity > {code} > In Capacity Scheduler Global Scheduler AsyncThread use > PriorityUtilizationQueueOrderingPolicy function to choose queue to assign > container,and construct a CSAssignment struct, and use > submitResourceCommit
[jira] [Commented] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail
[ https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217284#comment-17217284 ] Peter Bacsko commented on YARN-10460: - [~aajisaka] there is one more test which is potentially affected by the same thing {{org.apache.hadoop.mapreduce.task.reduce.TestFetcher.testCorruptedIFile()}} but I haven't been able to repro this locally, but I saw the same stack trace in Jenkins. So we have to keep our eye on that as well. > Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail > - > > Key: YARN-10460 > URL: https://issues.apache.org/jira/browse/YARN-10460 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, test >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10460-001.patch, YARN-10460-002.patch, > YARN-10460-POC.patch > > > In our downstream build environment, we're using JUnit 4.13. Recently, we > discovered a truly weird test failure in TestNodeStatusUpdater. > The problem is that timeout handling has changed in Junit 4.13. See the > difference between these two snippets: > 4.12 > {noformat} > @Override > public void evaluate() throws Throwable { > CallableStatement callable = new CallableStatement(); > FutureTask task = new FutureTask(callable); > threadGroup = new ThreadGroup("FailOnTimeoutGroup"); > Thread thread = new Thread(threadGroup, task, "Time-limited test"); > thread.setDaemon(true); > thread.start(); > callable.awaitStarted(); > Throwable throwable = getResult(task, thread); > if (throwable != null) { > throw throwable; > } > } > {noformat} > > 4.13 > {noformat} > @Override > public void evaluate() throws Throwable { > CallableStatement callable = new CallableStatement(); > FutureTask task = new FutureTask(callable); > ThreadGroup threadGroup = new ThreadGroup("FailOnTimeoutGroup"); > Thread thread = new Thread(threadGroup, task, "Time-limited test"); > try { > thread.setDaemon(true); > thread.start(); > callable.awaitStarted(); > Throwable throwable = getResult(task, thread); > if (throwable != null) { > throw throwable; > } > } finally { > try { > thread.join(1); > } catch (InterruptedException e) { > Thread.currentThread().interrupt(); > } > try { > threadGroup.destroy(); < This > } catch (IllegalThreadStateException e) { > // If a thread from the group is still alive, the ThreadGroup > cannot be destroyed. > // Swallow the exception to keep the same behavior prior to > this change. > } > } > } > {noformat} > The change comes from [https://github.com/junit-team/junit4/pull/1517]. > Unfortunately, destroying the thread group causes an issue because there are > all sorts of object caching in the IPC layer. The exception is: > {noformat} > java.lang.IllegalThreadStateException > at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:867) > at java.lang.Thread.init(Thread.java:402) > at java.lang.Thread.init(Thread.java:349) > at java.lang.Thread.(Thread.java:675) > at > java.util.concurrent.Executors$DefaultThreadFactory.newThread(Executors.java:613) > at > com.google.common.util.concurrent.ThreadFactoryBuilder$1.newThread(ThreadFactoryBuilder.java:163) > at > java.util.concurrent.ThreadPoolExecutor$Worker.(ThreadPoolExecutor.java:612) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:925) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368) > at > java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) > at > org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1136) > at org.apache.hadoop.ipc.Client.call(Client.java:1458) > at org.apache.hadoop.ipc.Client.call(Client.java:1405) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) > at com.sun.proxy.$Proxy81.startContainers(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128) > at > org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.startContainer(TestNo
[jira] [Commented] (YARN-10453) Add partition resource info to get-node-labels and label-mappings api responses
[ https://issues.apache.org/jira/browse/YARN-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217282#comment-17217282 ] Hadoop QA commented on YARN-10453: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 15s{color} | | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 14s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 38s{color} | | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 1m 46s{color} | | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 45s{color} | | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 50s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | | {color:green} the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | | {color:green} the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} blanks {color} | {color:green} 0m 0s{color} | | {color:green} The patch has no blanks issues. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 0 new + 15 unchanged - 1 fixed = 15 total (was 16) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 18m 17s{color} | | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s{color} | | {color:green} the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | | {color:green} the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 51s{color} | | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || || | {color:green}+1{color} | {color:green} unit {color}
[jira] [Commented] (YARN-8737) Race condition in ParentQueue when reinitializing and sorting child queues in the meanwhile
[ https://issues.apache.org/jira/browse/YARN-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217279#comment-17217279 ] Wangda Tan commented on YARN-8737: -- Rekicked Jenkins, after reviewed the case, the fix looks good to me, even though it covered a small set of the issues. I agree to move scheduling-related issues in YARN-10178. > Race condition in ParentQueue when reinitializing and sorting child queues in > the meanwhile > --- > > Key: YARN-8737 > URL: https://issues.apache.org/jira/browse/YARN-8737 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.3.0, 2.9.3, 3.2.2, 3.1.4 >Reporter: Tao Yang >Assignee: Tao Yang >Priority: Critical > Attachments: YARN-8737.001.patch > > > Administrator raised a update for queues through REST API, in RM parent queue > is refreshing child queues through calling ParentQueue#reinitialize, > meanwhile, async-schedule threads is sorting child queues when calling > ParentQueue#sortAndGetChildrenAllocationIterator. Race condition may happen > and throw exception as follow because TimSort does not handle the concurrent > modification of objects it is sorting: > {noformat} > java.lang.IllegalArgumentException: Comparison method violates its general > contract! > at java.util.TimSort.mergeHi(TimSort.java:899) > at java.util.TimSort.mergeAt(TimSort.java:516) > at java.util.TimSort.mergeCollapse(TimSort.java:441) > at java.util.TimSort.sort(TimSort.java:245) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1454) > at java.util.Collections.sort(Collections.java:175) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.policy.PriorityUtilizationQueueOrderingPolicy.getAssignmentIterator(PriorityUtilizationQueueOrderingPolicy.java:291) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:804) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:817) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:636) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:2494) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:2431) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersOnMultiNodes(CapacityScheduler.java:2588) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:2676) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.scheduleBasedOnNodeLabels(CapacityScheduler.java:927) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:962) > {noformat} > I think we can add read-lock for > ParentQueue#sortAndGetChildrenAllocationIterator to solve this problem, the > write-lock will be hold when updating child queues in > ParentQueue#reinitialize. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail
[ https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217275#comment-17217275 ] Akira Ajisaka commented on YARN-10460: -- Hi [~pbacsko], I have a question. Are there any failing tests other than this issue in JUnit 4.13? Hi [~snemeth], I would like to backport this to older branches. Now I consider upgrading JUnit to 4.13.1 in Apache Hadoop because of CVE-2020-15250. The severity is low, but it is worth upgrading. https://github.com/junit-team/junit4/security/advisories/GHSA-269g-pwp5-87pp > Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail > - > > Key: YARN-10460 > URL: https://issues.apache.org/jira/browse/YARN-10460 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, test >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10460-001.patch, YARN-10460-002.patch, > YARN-10460-POC.patch > > > In our downstream build environment, we're using JUnit 4.13. Recently, we > discovered a truly weird test failure in TestNodeStatusUpdater. > The problem is that timeout handling has changed in Junit 4.13. See the > difference between these two snippets: > 4.12 > {noformat} > @Override > public void evaluate() throws Throwable { > CallableStatement callable = new CallableStatement(); > FutureTask task = new FutureTask(callable); > threadGroup = new ThreadGroup("FailOnTimeoutGroup"); > Thread thread = new Thread(threadGroup, task, "Time-limited test"); > thread.setDaemon(true); > thread.start(); > callable.awaitStarted(); > Throwable throwable = getResult(task, thread); > if (throwable != null) { > throw throwable; > } > } > {noformat} > > 4.13 > {noformat} > @Override > public void evaluate() throws Throwable { > CallableStatement callable = new CallableStatement(); > FutureTask task = new FutureTask(callable); > ThreadGroup threadGroup = new ThreadGroup("FailOnTimeoutGroup"); > Thread thread = new Thread(threadGroup, task, "Time-limited test"); > try { > thread.setDaemon(true); > thread.start(); > callable.awaitStarted(); > Throwable throwable = getResult(task, thread); > if (throwable != null) { > throw throwable; > } > } finally { > try { > thread.join(1); > } catch (InterruptedException e) { > Thread.currentThread().interrupt(); > } > try { > threadGroup.destroy(); < This > } catch (IllegalThreadStateException e) { > // If a thread from the group is still alive, the ThreadGroup > cannot be destroyed. > // Swallow the exception to keep the same behavior prior to > this change. > } > } > } > {noformat} > The change comes from [https://github.com/junit-team/junit4/pull/1517]. > Unfortunately, destroying the thread group causes an issue because there are > all sorts of object caching in the IPC layer. The exception is: > {noformat} > java.lang.IllegalThreadStateException > at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:867) > at java.lang.Thread.init(Thread.java:402) > at java.lang.Thread.init(Thread.java:349) > at java.lang.Thread.(Thread.java:675) > at > java.util.concurrent.Executors$DefaultThreadFactory.newThread(Executors.java:613) > at > com.google.common.util.concurrent.ThreadFactoryBuilder$1.newThread(ThreadFactoryBuilder.java:163) > at > java.util.concurrent.ThreadPoolExecutor$Worker.(ThreadPoolExecutor.java:612) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:925) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368) > at > java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) > at > org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1136) > at org.apache.hadoop.ipc.Client.call(Client.java:1458) > at org.apache.hadoop.ipc.Client.call(Client.java:1405) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) > at com.sun.proxy.$Proxy81.startContainers(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128) > at > org.ap
[jira] [Commented] (YARN-10453) Add partition resource info to get-node-labels and label-mappings api responses
[ https://issues.apache.org/jira/browse/YARN-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217221#comment-17217221 ] Sunil G commented on YARN-10453: Kicked jenkins again. > Add partition resource info to get-node-labels and label-mappings api > responses > --- > > Key: YARN-10453 > URL: https://issues.apache.org/jira/browse/YARN-10453 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Akhil PB >Assignee: Akhil PB >Priority: Major > Attachments: YARN-10453.001.patch, YARN-10453.002.patch > > > This jira will add partition resource info to responses get-node-labels and > label-mappings apis. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10466) Fix NullPointerException in yarn-services Component.java
[ https://issues.apache.org/jira/browse/YARN-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217118#comment-17217118 ] Hadoop QA commented on YARN-10466: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 18s{color} | | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 30m 29s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s{color} | | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 47s{color} | | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 1m 2s{color} | | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 58s{color} | | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 30s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | | {color:green} the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s{color} | | {color:green} the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} blanks {color} | {color:green} 0m 0s{color} | | {color:green} The patch has no blanks issues. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 25s{color} | | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | | {color:green} the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | | {color:green} the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 1s{color} | | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1
[jira] [Created] (YARN-10467) ContainerIdPBImpl objects can be leaked in RMNodeImpl.completedContainers
Haibo Chen created YARN-10467: - Summary: ContainerIdPBImpl objects can be leaked in RMNodeImpl.completedContainers Key: YARN-10467 URL: https://issues.apache.org/jira/browse/YARN-10467 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.10.0 Reporter: Haibo Chen Assignee: Haibo Chen In one of our recent heap analysis, we found that the majority of the heap is occupied by {{RMNodeImpl.completedContainers}}, which accounts for 19GB, out of 24.3 GB. There are over 86 million ContainerIdPBImpl objects, in contrast, only 161,601 RMContainerImpl objects which represent the # of active containers that RM is still tracking. Inspecting some ContainerIdPBImpl objects, they belong to applications that have long finished. This indicates some sort of memory leak of ContainerIdPBImpl objects in RMNodeImpl. Right now, when a container is reported by a NM as completed, it is immediately added to RMNodeImpl.completedContainers and later cleaned up after the AM has been notified of its completion in the AM-RM heartbeat. The cleanup can be broken into a few steps. * Step 1: the completed container is first added to RMAppAttemptImpl.justFinishedContainers (this is asynchronous to being added to {{RMNodeImpl.completedContainers}}). * Step 2: During the heartbeat AM-RM heartbeat, the container is removed from RMAppAttemptImpl.justFinishedContainers and added to RMAppAttemptImpl.finishedContainersSentToAM Once a completed container gets added to RMAppAttemptImpl.finishedContainersSentToAM, it is guaranteed to be cleaned up from {{RMNodeImpl.completedContainers}} However, if the AM exits (regardless of failure or success) before some recently completed containers can be added to RMAppAttemptImpl.finishedContainersSentToAM in previous heartbeats, there won’t be any future AM-RM heartbeat to perform aforementioned step 2. Hence, these objects stay in RMNodeImpl.completedContainers forever. We have observed in MR that AMs can decide to exit upon success of all it tasks without waiting for notification of the completion of every container, or AM may just die suddenly (e.g. OOM). Spark and other framework may just be similar. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10465) Support getClusterNodes, getNodeToLabels, getLabelsToNodes, getClusterNodeLabels API's for Federation
[ https://issues.apache.org/jira/browse/YARN-10465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217073#comment-17217073 ] Hadoop QA commented on YARN-10465: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 2m 20s{color} | | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 3s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s{color} | | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s{color} | | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s{color} | | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 18m 10s{color} | | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 0m 48s{color} | | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 46s{color} | | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 28s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s{color} | | {color:green} the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s{color} | | {color:green} the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} blanks {color} | {color:green} 0m 0s{color} | | {color:green} The patch has no blanks issues. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 14s{color} | [/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router.txt|https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/240/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-router.txt] | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router: The patch generated 10 new + 0 unchanged - 0 fixed = 10 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 23s{color} | | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 18m 15s{color} | | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | | {color:green} the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | | {color:green} the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0u
[jira] [Commented] (YARN-8173) [Router] Implement missing FederationClientInterceptor#getApplications()
[ https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217059#comment-17217059 ] Hadoop QA commented on YARN-8173: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 11s{color} | | {color:red} YARN-8173 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-8173 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12927063/YARN-8173.007.patch | | Console output | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/241/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org | This message was automatically generated. > [Router] Implement missing FederationClientInterceptor#getApplications() > > > Key: YARN-8173 > URL: https://issues.apache.org/jira/browse/YARN-8173 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0 >Reporter: Yiran Wu >Assignee: Yiran Wu >Priority: Major > Attachments: YARN-8173.001.patch, YARN-8173.002.patch, > YARN-8173.003.patch, YARN-8173.004.patch, YARN-8173.005.patch, > YARN-8173.006.patch, YARN-8173.007.patch > > > oozie dependent method Implement > {code:java} > getApplications() > getDeglationToken() > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10466) Fix NullPointerException in yarn-services Component.java
[ https://issues.apache.org/jira/browse/YARN-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] D M Murali Krishna Reddy updated YARN-10466: Attachment: YARN-10466.001.patch > Fix NullPointerException in yarn-services Component.java > - > > Key: YARN-10466 > URL: https://issues.apache.org/jira/browse/YARN-10466 > Project: Hadoop YARN > Issue Type: Bug >Reporter: D M Murali Krishna Reddy >Assignee: D M Murali Krishna Reddy >Priority: Minor > Attachments: YARN-10466.001.patch > > > Due to changes in > [YARN-10219|https://issues.apache.org/jira/browse/YARN-10219] where the > constraint is initialised as null, there might be few scenarios in which NPE > can be thrown in requestContainers method. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8173) [Router] Implement missing FederationClientInterceptor#getApplications()
[ https://issues.apache.org/jira/browse/YARN-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217056#comment-17217056 ] D M Murali Krishna Reddy commented on YARN-8173: [~yiran], I would like to work on this, if you are not currently working on this task. > [Router] Implement missing FederationClientInterceptor#getApplications() > > > Key: YARN-8173 > URL: https://issues.apache.org/jira/browse/YARN-8173 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.0.0 >Reporter: Yiran Wu >Assignee: Yiran Wu >Priority: Major > Attachments: YARN-8173.001.patch, YARN-8173.002.patch, > YARN-8173.003.patch, YARN-8173.004.patch, YARN-8173.005.patch, > YARN-8173.006.patch, YARN-8173.007.patch > > > oozie dependent method Implement > {code:java} > getApplications() > getDeglationToken() > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10466) Fix NullPointerException in yarn-services Component.java
D M Murali Krishna Reddy created YARN-10466: --- Summary: Fix NullPointerException in yarn-services Component.java Key: YARN-10466 URL: https://issues.apache.org/jira/browse/YARN-10466 Project: Hadoop YARN Issue Type: Bug Reporter: D M Murali Krishna Reddy Assignee: D M Murali Krishna Reddy Due to changes in [YARN-10219|https://issues.apache.org/jira/browse/YARN-10219] where the constraint is initialised as null, there might be few scenarios in which NPE can be thrown in requestContainers method. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10465) Support getClusterNodes, getNodeToLabels, getLabelsToNodes, getClusterNodeLabels API's for Federation
[ https://issues.apache.org/jira/browse/YARN-10465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] D M Murali Krishna Reddy updated YARN-10465: Attachment: YARN-10465.001.patch > Support getClusterNodes, getNodeToLabels, getLabelsToNodes, > getClusterNodeLabels API's for Federation > - > > Key: YARN-10465 > URL: https://issues.apache.org/jira/browse/YARN-10465 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Reporter: D M Murali Krishna Reddy >Assignee: D M Murali Krishna Reddy >Priority: Major > Attachments: YARN-10465.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10465) Support getClusterNodes, getNodeToLabels, getLabelsToNodes, getClusterNodeLabels API's for Federation
D M Murali Krishna Reddy created YARN-10465: --- Summary: Support getClusterNodes, getNodeToLabels, getLabelsToNodes, getClusterNodeLabels API's for Federation Key: YARN-10465 URL: https://issues.apache.org/jira/browse/YARN-10465 Project: Hadoop YARN Issue Type: Sub-task Components: federation Reporter: D M Murali Krishna Reddy Assignee: D M Murali Krishna Reddy -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10450) Add cpu and memory utilization per node and cluster-wide metrics
[ https://issues.apache.org/jira/browse/YARN-10450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216747#comment-17216747 ] Jim Brennan commented on YARN-10450: Thanks [~ebadger]! > Add cpu and memory utilization per node and cluster-wide metrics > > > Key: YARN-10450 > URL: https://issues.apache.org/jira/browse/YARN-10450 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn >Affects Versions: 3.3.1 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Fix For: 3.4.0, 3.3.1, 3.1.5, 2.10.2, 3.2.3 > > Attachments: NodesPage.png, YARN-10450-branch-2.10.003.patch, > YARN-10450-branch-3.1.003.patch, YARN-10450-branch-3.2.003.patch, > YARN-10450.001.patch, YARN-10450.002.patch, YARN-10450.003.patch > > > Add metrics to show actual cpu and memory utilization for each node and > aggregated for the entire cluster. This is information is already passed > from NM to RM in the node status update. > We have been running with this internally for quite a while and found it > useful to be able to quickly see the actual cpu/memory utilization on the > node/cluster. It's especially useful if some form of overcommit is used. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10442) RM should make sure node label file highly available
[ https://issues.apache.org/jira/browse/YARN-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1721#comment-1721 ] Hadoop QA commented on YARN-10442: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 15s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 6m 9s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 30m 37s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 27s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 39s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 42s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 4s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 22m 59s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 33s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 34s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 2m 15s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 46s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 29s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 24s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 8s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 8s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 55s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 55s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 25s{color} | {color:orange}https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2390/2/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 3 new + 210 unchanged - 0 fixed = 213 total (was 210) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 31s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red}https://ci-hadoop.apache.org/job/
[jira] [Commented] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail
[ https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216657#comment-17216657 ] Szilard Nemeth commented on YARN-10460: --- [~pbacsko] OK, thanks. Resolving this jira, then. > Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail > - > > Key: YARN-10460 > URL: https://issues.apache.org/jira/browse/YARN-10460 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, test >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10460-001.patch, YARN-10460-002.patch, > YARN-10460-POC.patch > > > In our downstream build environment, we're using JUnit 4.13. Recently, we > discovered a truly weird test failure in TestNodeStatusUpdater. > The problem is that timeout handling has changed in Junit 4.13. See the > difference between these two snippets: > 4.12 > {noformat} > @Override > public void evaluate() throws Throwable { > CallableStatement callable = new CallableStatement(); > FutureTask task = new FutureTask(callable); > threadGroup = new ThreadGroup("FailOnTimeoutGroup"); > Thread thread = new Thread(threadGroup, task, "Time-limited test"); > thread.setDaemon(true); > thread.start(); > callable.awaitStarted(); > Throwable throwable = getResult(task, thread); > if (throwable != null) { > throw throwable; > } > } > {noformat} > > 4.13 > {noformat} > @Override > public void evaluate() throws Throwable { > CallableStatement callable = new CallableStatement(); > FutureTask task = new FutureTask(callable); > ThreadGroup threadGroup = new ThreadGroup("FailOnTimeoutGroup"); > Thread thread = new Thread(threadGroup, task, "Time-limited test"); > try { > thread.setDaemon(true); > thread.start(); > callable.awaitStarted(); > Throwable throwable = getResult(task, thread); > if (throwable != null) { > throw throwable; > } > } finally { > try { > thread.join(1); > } catch (InterruptedException e) { > Thread.currentThread().interrupt(); > } > try { > threadGroup.destroy(); < This > } catch (IllegalThreadStateException e) { > // If a thread from the group is still alive, the ThreadGroup > cannot be destroyed. > // Swallow the exception to keep the same behavior prior to > this change. > } > } > } > {noformat} > The change comes from [https://github.com/junit-team/junit4/pull/1517]. > Unfortunately, destroying the thread group causes an issue because there are > all sorts of object caching in the IPC layer. The exception is: > {noformat} > java.lang.IllegalThreadStateException > at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:867) > at java.lang.Thread.init(Thread.java:402) > at java.lang.Thread.init(Thread.java:349) > at java.lang.Thread.(Thread.java:675) > at > java.util.concurrent.Executors$DefaultThreadFactory.newThread(Executors.java:613) > at > com.google.common.util.concurrent.ThreadFactoryBuilder$1.newThread(ThreadFactoryBuilder.java:163) > at > java.util.concurrent.ThreadPoolExecutor$Worker.(ThreadPoolExecutor.java:612) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:925) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368) > at > java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) > at > org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1136) > at org.apache.hadoop.ipc.Client.call(Client.java:1458) > at org.apache.hadoop.ipc.Client.call(Client.java:1405) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) > at com.sun.proxy.$Proxy81.startContainers(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128) > at > org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.startContainer(TestNodeManagerShutdown.java:251) > at > org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown(TestNodeStatusUpdater.java:1576) > {noformat} > Both the {{clientExecutor}} in {{org.apache.hadoop
[jira] [Commented] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail
[ https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216637#comment-17216637 ] Peter Bacsko commented on YARN-10460: - I think it's OK to have it only on trunk for now. > Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail > - > > Key: YARN-10460 > URL: https://issues.apache.org/jira/browse/YARN-10460 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, test >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10460-001.patch, YARN-10460-002.patch, > YARN-10460-POC.patch > > > In our downstream build environment, we're using JUnit 4.13. Recently, we > discovered a truly weird test failure in TestNodeStatusUpdater. > The problem is that timeout handling has changed in Junit 4.13. See the > difference between these two snippets: > 4.12 > {noformat} > @Override > public void evaluate() throws Throwable { > CallableStatement callable = new CallableStatement(); > FutureTask task = new FutureTask(callable); > threadGroup = new ThreadGroup("FailOnTimeoutGroup"); > Thread thread = new Thread(threadGroup, task, "Time-limited test"); > thread.setDaemon(true); > thread.start(); > callable.awaitStarted(); > Throwable throwable = getResult(task, thread); > if (throwable != null) { > throw throwable; > } > } > {noformat} > > 4.13 > {noformat} > @Override > public void evaluate() throws Throwable { > CallableStatement callable = new CallableStatement(); > FutureTask task = new FutureTask(callable); > ThreadGroup threadGroup = new ThreadGroup("FailOnTimeoutGroup"); > Thread thread = new Thread(threadGroup, task, "Time-limited test"); > try { > thread.setDaemon(true); > thread.start(); > callable.awaitStarted(); > Throwable throwable = getResult(task, thread); > if (throwable != null) { > throw throwable; > } > } finally { > try { > thread.join(1); > } catch (InterruptedException e) { > Thread.currentThread().interrupt(); > } > try { > threadGroup.destroy(); < This > } catch (IllegalThreadStateException e) { > // If a thread from the group is still alive, the ThreadGroup > cannot be destroyed. > // Swallow the exception to keep the same behavior prior to > this change. > } > } > } > {noformat} > The change comes from [https://github.com/junit-team/junit4/pull/1517]. > Unfortunately, destroying the thread group causes an issue because there are > all sorts of object caching in the IPC layer. The exception is: > {noformat} > java.lang.IllegalThreadStateException > at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:867) > at java.lang.Thread.init(Thread.java:402) > at java.lang.Thread.init(Thread.java:349) > at java.lang.Thread.(Thread.java:675) > at > java.util.concurrent.Executors$DefaultThreadFactory.newThread(Executors.java:613) > at > com.google.common.util.concurrent.ThreadFactoryBuilder$1.newThread(ThreadFactoryBuilder.java:163) > at > java.util.concurrent.ThreadPoolExecutor$Worker.(ThreadPoolExecutor.java:612) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:925) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368) > at > java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) > at > org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1136) > at org.apache.hadoop.ipc.Client.call(Client.java:1458) > at org.apache.hadoop.ipc.Client.call(Client.java:1405) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) > at com.sun.proxy.$Proxy81.startContainers(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128) > at > org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.startContainer(TestNodeManagerShutdown.java:251) > at > org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown(TestNodeStatusUpdater.java:1576) > {noformat} > Both the {{clientExecutor}} in {{org.apache.hadoop.ip
[jira] [Commented] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail
[ https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216635#comment-17216635 ] Szilard Nemeth commented on YARN-10460: --- Hi [~pbacsko], Thanks for this fix. Committed to trunk. Thanks [~aajisaka], [~ebadger] for the reviews. Do you guys want to backport this to older branches? > Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail > - > > Key: YARN-10460 > URL: https://issues.apache.org/jira/browse/YARN-10460 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, test >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10460-001.patch, YARN-10460-002.patch, > YARN-10460-POC.patch > > > In our downstream build environment, we're using JUnit 4.13. Recently, we > discovered a truly weird test failure in TestNodeStatusUpdater. > The problem is that timeout handling has changed in Junit 4.13. See the > difference between these two snippets: > 4.12 > {noformat} > @Override > public void evaluate() throws Throwable { > CallableStatement callable = new CallableStatement(); > FutureTask task = new FutureTask(callable); > threadGroup = new ThreadGroup("FailOnTimeoutGroup"); > Thread thread = new Thread(threadGroup, task, "Time-limited test"); > thread.setDaemon(true); > thread.start(); > callable.awaitStarted(); > Throwable throwable = getResult(task, thread); > if (throwable != null) { > throw throwable; > } > } > {noformat} > > 4.13 > {noformat} > @Override > public void evaluate() throws Throwable { > CallableStatement callable = new CallableStatement(); > FutureTask task = new FutureTask(callable); > ThreadGroup threadGroup = new ThreadGroup("FailOnTimeoutGroup"); > Thread thread = new Thread(threadGroup, task, "Time-limited test"); > try { > thread.setDaemon(true); > thread.start(); > callable.awaitStarted(); > Throwable throwable = getResult(task, thread); > if (throwable != null) { > throw throwable; > } > } finally { > try { > thread.join(1); > } catch (InterruptedException e) { > Thread.currentThread().interrupt(); > } > try { > threadGroup.destroy(); < This > } catch (IllegalThreadStateException e) { > // If a thread from the group is still alive, the ThreadGroup > cannot be destroyed. > // Swallow the exception to keep the same behavior prior to > this change. > } > } > } > {noformat} > The change comes from [https://github.com/junit-team/junit4/pull/1517]. > Unfortunately, destroying the thread group causes an issue because there are > all sorts of object caching in the IPC layer. The exception is: > {noformat} > java.lang.IllegalThreadStateException > at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:867) > at java.lang.Thread.init(Thread.java:402) > at java.lang.Thread.init(Thread.java:349) > at java.lang.Thread.(Thread.java:675) > at > java.util.concurrent.Executors$DefaultThreadFactory.newThread(Executors.java:613) > at > com.google.common.util.concurrent.ThreadFactoryBuilder$1.newThread(ThreadFactoryBuilder.java:163) > at > java.util.concurrent.ThreadPoolExecutor$Worker.(ThreadPoolExecutor.java:612) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:925) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368) > at > java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) > at > org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1136) > at org.apache.hadoop.ipc.Client.call(Client.java:1458) > at org.apache.hadoop.ipc.Client.call(Client.java:1405) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) > at com.sun.proxy.$Proxy81.startContainers(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128) > at > org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.startContainer(TestNodeManagerShutdown.java:251) > at > org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNodeStatusUpdaterRetryAn
[jira] [Updated] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail
[ https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-10460: -- Fix Version/s: 3.4.0 > Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail > - > > Key: YARN-10460 > URL: https://issues.apache.org/jira/browse/YARN-10460 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, test >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10460-001.patch, YARN-10460-002.patch, > YARN-10460-POC.patch > > > In our downstream build environment, we're using JUnit 4.13. Recently, we > discovered a truly weird test failure in TestNodeStatusUpdater. > The problem is that timeout handling has changed in Junit 4.13. See the > difference between these two snippets: > 4.12 > {noformat} > @Override > public void evaluate() throws Throwable { > CallableStatement callable = new CallableStatement(); > FutureTask task = new FutureTask(callable); > threadGroup = new ThreadGroup("FailOnTimeoutGroup"); > Thread thread = new Thread(threadGroup, task, "Time-limited test"); > thread.setDaemon(true); > thread.start(); > callable.awaitStarted(); > Throwable throwable = getResult(task, thread); > if (throwable != null) { > throw throwable; > } > } > {noformat} > > 4.13 > {noformat} > @Override > public void evaluate() throws Throwable { > CallableStatement callable = new CallableStatement(); > FutureTask task = new FutureTask(callable); > ThreadGroup threadGroup = new ThreadGroup("FailOnTimeoutGroup"); > Thread thread = new Thread(threadGroup, task, "Time-limited test"); > try { > thread.setDaemon(true); > thread.start(); > callable.awaitStarted(); > Throwable throwable = getResult(task, thread); > if (throwable != null) { > throw throwable; > } > } finally { > try { > thread.join(1); > } catch (InterruptedException e) { > Thread.currentThread().interrupt(); > } > try { > threadGroup.destroy(); < This > } catch (IllegalThreadStateException e) { > // If a thread from the group is still alive, the ThreadGroup > cannot be destroyed. > // Swallow the exception to keep the same behavior prior to > this change. > } > } > } > {noformat} > The change comes from [https://github.com/junit-team/junit4/pull/1517]. > Unfortunately, destroying the thread group causes an issue because there are > all sorts of object caching in the IPC layer. The exception is: > {noformat} > java.lang.IllegalThreadStateException > at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:867) > at java.lang.Thread.init(Thread.java:402) > at java.lang.Thread.init(Thread.java:349) > at java.lang.Thread.(Thread.java:675) > at > java.util.concurrent.Executors$DefaultThreadFactory.newThread(Executors.java:613) > at > com.google.common.util.concurrent.ThreadFactoryBuilder$1.newThread(ThreadFactoryBuilder.java:163) > at > java.util.concurrent.ThreadPoolExecutor$Worker.(ThreadPoolExecutor.java:612) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:925) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368) > at > java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) > at > org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1136) > at org.apache.hadoop.ipc.Client.call(Client.java:1458) > at org.apache.hadoop.ipc.Client.call(Client.java:1405) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) > at com.sun.proxy.$Proxy81.startContainers(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128) > at > org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.startContainer(TestNodeManagerShutdown.java:251) > at > org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown(TestNodeStatusUpdater.java:1576) > {noformat} > Both the {{clientExecutor}} in {{org.apache.hadoop.ipc.Client}} and the > client object in {{ProtobufRpcEngine}}/{{ProtobufRpcEngine
[jira] [Commented] (YARN-10463) For Federation, we should support getApplicationAttemptReport.
[ https://issues.apache.org/jira/browse/YARN-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216580#comment-17216580 ] Bilwa S T commented on YARN-10463: -- Hi [~zhuqi] Thanks for patch. I have few comments: # In below code you need to call routerMetrics.incrAppAttemptsFailedRetrieved() instead of routerMetrics.incrAppsFailedRetrieved(); {code:java} try { response = clientRMProxy.getApplicationAttemptReport(request); } catch (Exception e) { routerMetrics.incrAppsFailedRetrieved(); LOG.error("Unable to get the applicationAttempt report for " + request.getApplicationAttemptId() + "to SubCluster " + subClusterId.getId(), e); throw e; } {code} 2. Add null check for Application id ie request.getApplicationAttemptId().getApplicationId() 3. In TestFederationClientInterceptor.java, change "Test FederationClientInterceptor: Get Application Report" to "Test FederationClientInterceptor: Get Application Attempt Report" 4. In testcase instead of Assert.fail() and catch, use LambdaTestUtils#intercept(). > For Federation, we should support getApplicationAttemptReport. > -- > > Key: YARN-10463 > URL: https://issues.apache.org/jira/browse/YARN-10463 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.3.0 >Reporter: zhuqi >Assignee: zhuqi >Priority: Major > Attachments: YARN-10463.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org