[jira] [Updated] (YARN-11534) Incorrect exception handling in RecoveredContainerLaunch
[ https://issues.apache.org/jira/browse/YARN-11534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YARN-11534: -- Labels: pull-request-available (was: ) > Incorrect exception handling in RecoveredContainerLaunch > > > Key: YARN-11534 > URL: https://issues.apache.org/jira/browse/YARN-11534 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Peter Szucs >Assignee: Peter Szucs >Priority: Major > Labels: pull-request-available > > When NM is restarted during a container recovery, it can happen that it > interrupts the container reaquisition during the LinuxContainerExecutor's > signalContainer method. In this case we will get the following exception: > {code:java} > java.io.InterruptedIOException: java.lang.InterruptedException > at org.apache.hadoop.util.Shell.runCommand(Shell.java:1011) > at org.apache.hadoop.util.Shell.run(Shell.java:901) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:177) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:184) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:735) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.isContainerAlive(LinuxContainerExecutor.java:887) > at > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:291) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reacquireContainer(LinuxContainerExecutor.java:708) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:84) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:47) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) > Caused by: java.lang.InterruptedException > at java.base/java.lang.Object.wait(Native Method) > at java.base/java.lang.Object.wait(Object.java:328) > at java.base/java.lang.ProcessImpl.waitFor(ProcessImpl.java:495) > at org.apache.hadoop.util.Shell.runCommand(Shell.java:1001) > ... 15 more{code} > Later this InterruptedIOException get caught and wrapped inside a > PrivilegedOperationException and a ContainerExecutionException. In > LinuxContainerExecutor's > [signalContainer|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java#L790] > method we catch this exception again, and throw an IOException from it, > indicating this error message in the stack trace: > {code:java} > IOException from it, causing the following stack trace: > org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: > Signal container failed > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:183) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:184) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:735) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.isContainerAlive(LinuxContainerExecutor.java:887) > at > org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:291) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reacquireContainer(LinuxContainerExecutor.java:708) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:84) > at >
[jira] [Updated] (YARN-11534) Incorrect exception handling in RecoveredContainerLaunch
[ https://issues.apache.org/jira/browse/YARN-11534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Szucs updated YARN-11534: --- Description: When NM is restarted during a container recovery, it can happen that it interrupts the container reaquisition during the LinuxContainerExecutor's signalContainer method. In this case we will get the following exception: {code:java} java.io.InterruptedIOException: java.lang.InterruptedException at org.apache.hadoop.util.Shell.runCommand(Shell.java:1011) at org.apache.hadoop.util.Shell.run(Shell.java:901) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:177) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:184) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:735) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.isContainerAlive(LinuxContainerExecutor.java:887) at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:291) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reacquireContainer(LinuxContainerExecutor.java:708) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:84) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:47) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) Caused by: java.lang.InterruptedException at java.base/java.lang.Object.wait(Native Method) at java.base/java.lang.Object.wait(Object.java:328) at java.base/java.lang.ProcessImpl.waitFor(ProcessImpl.java:495) at org.apache.hadoop.util.Shell.runCommand(Shell.java:1001) ... 15 more{code} Later this InterruptedIOException get caught and wrapped inside a PrivilegedOperationException and a ContainerExecutionException. In LinuxContainerExecutor's [signalContainer|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java#L790] method we catch this exception again, and throw an IOException from it, indicating this error message in the stack trace: {code:java} IOException from it, causing the following stack trace: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Signal container failed at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:183) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:184) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:735) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.isContainerAlive(LinuxContainerExecutor.java:887) at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:291) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reacquireContainer(LinuxContainerExecutor.java:708) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:84) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:47) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) 2023-06-20 18:24:31,777 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch: Unable to recover container container_e03_1687266197584_0033_01_01 java.io.IOException: Problem signalling
[jira] [Updated] (YARN-11534) Incorrect exception handling in RecoveredContainerLaunch
[ https://issues.apache.org/jira/browse/YARN-11534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Szucs updated YARN-11534: --- Description: When NM is restarted during a container recovery, it can happen that it interrupts the container reaquisition during the LinuxContainerExecutor's signalContainer method. In this case we will get the following exception: {code:java} java.io.InterruptedIOException: java.lang.InterruptedException at org.apache.hadoop.util.Shell.runCommand(Shell.java:1011) at org.apache.hadoop.util.Shell.run(Shell.java:901) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:177) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:184) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:735) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.isContainerAlive(LinuxContainerExecutor.java:887) at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:291) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reacquireContainer(LinuxContainerExecutor.java:708) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:84) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:47) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) Caused by: java.lang.InterruptedException at java.base/java.lang.Object.wait(Native Method) at java.base/java.lang.Object.wait(Object.java:328) at java.base/java.lang.ProcessImpl.waitFor(ProcessImpl.java:495) at org.apache.hadoop.util.Shell.runCommand(Shell.java:1001) ... 15 more{code} Later this InterruptedIOException get caught and wrapped inside a PrivilegedOperationException and a ContainerExecutionException. In LinuxContainerExecutor's [signalContainer|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java#L790] method we catch this exception again, and throw an IOException from it, indicating this error message in the stack trace: {code:java} IOException from it, causing the following stack trace: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Signal container failed at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:183) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:184) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:735) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.isContainerAlive(LinuxContainerExecutor.java:887) at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:291) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reacquireContainer(LinuxContainerExecutor.java:708) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:84) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:47) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) 2023-06-20 18:24:31,777 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch: Unable to recover container container_e03_1687266197584_0033_01_01 java.io.IOException: Problem signalling
[jira] [Updated] (YARN-11534) Incorrect exception handling in RecoveredContainerLaunch
[ https://issues.apache.org/jira/browse/YARN-11534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Szucs updated YARN-11534: --- Description: When NM is restarted during a container recovery, it can happen that it interrupts the container reaquisition during the LinuxContainerExecutor's signalContainer method. In this case we will get the following exception: {code:java} java.io.InterruptedIOException: java.lang.InterruptedException at org.apache.hadoop.util.Shell.runCommand(Shell.java:1011) at org.apache.hadoop.util.Shell.run(Shell.java:901) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:177) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:184) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:735) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.isContainerAlive(LinuxContainerExecutor.java:887) at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:291) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reacquireContainer(LinuxContainerExecutor.java:708) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:84) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:47) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) Caused by: java.lang.InterruptedException at java.base/java.lang.Object.wait(Native Method) at java.base/java.lang.Object.wait(Object.java:328) at java.base/java.lang.ProcessImpl.waitFor(ProcessImpl.java:495) at org.apache.hadoop.util.Shell.runCommand(Shell.java:1001) ... 15 more{code} Later this InterruptedIOException get caught and wrapped inside a PrivilegedOperationException and a ContainerExecutionException. In LinuxContainerExecutor's [signalContainer|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java#L790] method we catch this exception again, and throw an {code:java} IOException from it, causing the following stack trace: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Signal container failed at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:183) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:184) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:735) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.isContainerAlive(LinuxContainerExecutor.java:887) at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:291) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reacquireContainer(LinuxContainerExecutor.java:708) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:84) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:47) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) 2023-06-20 18:24:31,777 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch: Unable to recover container container_e03_1687266197584_0033_01_01 java.io.IOException: Problem signalling container 256974 with NULL; output: null and exitCode: -1 at
[jira] [Updated] (YARN-11534) Incorrect exception handling in RecoveredContainerLaunch
[ https://issues.apache.org/jira/browse/YARN-11534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Szucs updated YARN-11534: --- Description: When NM is restarted during a container recovery, it can happen that it interrupts the container reaquisition during the LinuxContainerExecutor's signalContainer method. In this case we will get the following exception: java.io.InterruptedIOException: java.lang.InterruptedException at org.apache.hadoop.util.Shell.runCommand(Shell.java:1011) at org.apache.hadoop.util.Shell.run(Shell.java:901) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:177) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:184) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:735) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.isContainerAlive(LinuxContainerExecutor.java:887) at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:291) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reacquireContainer(LinuxContainerExecutor.java:708) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:84) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:47) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) Caused by: java.lang.InterruptedException at java.base/java.lang.Object.wait(Native Method) at java.base/java.lang.Object.wait(Object.java:328) at java.base/java.lang.ProcessImpl.waitFor(ProcessImpl.java:495) at org.apache.hadoop.util.Shell.runCommand(Shell.java:1001) ... 15 more Later this InterruptedIOException get caught and wrapped inside a PrivilegedOperationException and a ContainerExecutionException. In LinuxContainerExecutor's [signalContainer|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java#L790] method we catch this exception again, and throw an IOException from it, causing the following stack trace: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Signal container failed at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.signalContainer(DefaultLinuxContainerRuntime.java:183) at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.signalContainer(DelegatingLinuxContainerRuntime.java:184) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:735) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.isContainerAlive(LinuxContainerExecutor.java:887) at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:291) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reacquireContainer(LinuxContainerExecutor.java:708) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:84) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:47) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:834) 2023-06-20 18:24:31,777 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch: Unable to recover container container_e03_1687266197584_0033_01_01 java.io.IOException: Problem signalling container 256974 with NULL; output: null and exitCode: -1 at