Weihao Zheng created YARN-11856:
-----------------------------------

             Summary: DOWNLOADING resources unlock and cleanup is interrupted when killing a container that is localizing
                 Key: YARN-11856
                 URL: https://issues.apache.org/jira/browse/YARN-11856
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Weihao Zheng

Currently, when localization fails because of an interruption, LocalizerRunner tries to send a ContainerResourceFailedEvent to the dispatcher before unlocking and cleaning up the resources that are still downloading. This call to handle() throws an uncaught exception, so the unlock and cleanup are skipped.

Related logs:
{quote}
2025-MM-DD HH:04:59,573 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Stopping container with container Id: container_e02_XXXXXXXXXXXXX_XXXXXXX_XX_XXXXXX
... ...
2025-MM-DD HH:04:59,573 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_e02_XXXXXXXXXXXXX_XXXXXXX_XX_XXXXXX transitioned from LOCALIZING to KILLING
... ...
2025-MM-DD HH:04:59,627 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Unknown localizer with localizerId container_e02_XXXXXXXXXXXXX_XXXXXXX_XX_XXXXXX is sending heartbeat. Ordering it to DIE
2025-MM-DD HH:04:59,628 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localizer failed for container_e02_XXXXXXXXXXXXX_XXXXXXX_XX_XXXXXX
java.io.IOException: java.io.InterruptedIOException: Interrupted waiting to send RPC request to server
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:200)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1247)
Caused by: java.io.InterruptedIOException: Interrupted waiting to send RPC request to server
    at org.apache.hadoop.ipc.Client.call(Client.java:1446)
    at org.apache.hadoop.ipc.Client.call(Client.java:1388)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
    at com.sun.proxy.$Proxy82.heartbeat(Unknown Source)
    at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:63)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:306)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:198)
    ... 2 more
Caused by: java.lang.InterruptedException
    at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
    at java.util.concurrent.FutureTask.get(FutureTask.java:191)
    at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1158)
    at org.apache.hadoop.ipc.Client.call(Client.java:1441)
    ... 9 more
2025-MM-DD HH:04:59,629 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
    at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
    at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:304)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1271)
2025-MM-DD HH:04:59,629 ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[LocalizerRunner for container_e02_XXXXXXXXXXXXX_XXXXXXX_XX_XXXXXX,5,main] threw an Exception.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException
    at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:312)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1271)
Caused by: java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
    at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
    at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:304)
    ... 1 more
{quote}
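
The ordering problem in the traces above can be reproduced in isolation. The following is a minimal sketch, not the NodeManager code: the class LocalizerCleanupSketch, its methods, and the "RESOURCE_FAILED" string are hypothetical stand-ins, and the only real behavior relied on is that LinkedBlockingQueue.put() throws InterruptedException when the calling thread's interrupt flag is already set (the same frames that appear in the log). It compares cleaning up after the notification with cleaning up in a finally block, which is one possible way to make the cleanup survive the exception, assuming the notification itself may still fail.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class LocalizerCleanupSketch {

    // Hypothetical stand-in for the dispatcher's handle(): it calls
    // LinkedBlockingQueue.put() and wraps InterruptedException in a
    // RuntimeException, mirroring the frames shown in the log.
    static void handle(BlockingQueue<String> queue, String event) {
        try {
            queue.put(event);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    // Ordering described in the report: notify first, clean up afterwards.
    // Returns true only if the cleanup step ran.
    static boolean notifyThenCleanup() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        boolean cleaned = false;
        Thread.currentThread().interrupt(); // simulate the kill interrupting the runner
        try {
            handle(queue, "RESOURCE_FAILED"); // throws because the interrupt flag is set
            cleaned = true;                   // stands for unlock + cleanup; not reached
        } catch (RuntimeException e) {
            // In a real thread this would reach the uncaught exception handler.
        } finally {
            Thread.interrupted(); // clear the flag so the caller is unaffected
        }
        return cleaned;
    }

    // Alternative ordering (an assumption, not the project's fix): the cleanup
    // sits in a finally block, so it runs even when handle() throws.
    static boolean cleanupInFinally() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        boolean cleaned = false;
        Thread.currentThread().interrupt();
        try {
            try {
                handle(queue, "RESOURCE_FAILED");
            } finally {
                cleaned = true; // stands for unlock + cleanup
            }
        } catch (RuntimeException e) {
            // The notification still failed, but the cleanup has already run.
        } finally {
            Thread.interrupted();
        }
        return cleaned;
    }

    public static void main(String[] args) {
        System.out.println("notify then cleanup, cleaned = " + notifyThenCleanup()); // false
        System.out.println("cleanup in finally, cleaned = " + cleanupInFinally());   // true
    }
}
```

In this sketch the failure notification is still lost in both variants; only the cleanup ordering differs. Whether the event should also be retried or delivered another way is outside what the report states.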