Weihao Zheng created YARN-11856:
-----------------------------------

             Summary: DOWNLOADING resources unlock and cleanup is interrupted 
when killing a container that is localizing
                 Key: YARN-11856
                 URL: https://issues.apache.org/jira/browse/YARN-11856
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Weihao Zheng


Currently, LocalizerRunner tries to send a ContainerResourceFailedEvent to 
dispatcher before unlocking and cleaning up downloading resource when failed by 
interruption. This handle invoking will throw uncaught exception.

 

Related logs:
{quote}2025-MM-DD HH:04:59,573 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 Stopping container with container Id: 
container_e02_XXXXXXXXXXXXX_XXXXXXX_XX_XXXXXX
... ...

2025-MM-DD HH:04:59,573 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
 Container container_e02_XXXXXXXXXXXXX_XXXXXXX_XX_XXXXXX transitioned from 
LOCALIZING to KILLING
... ...

2025-MM-DD HH:04:59,627 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Unknown localizer with localizerId 
container_e02_XXXXXXXXXXXXX_XXXXXXX_XX_XXXXXX is sending heartbeat. Ordering it 
to DIE
2025-MM-DD HH:04:59,628 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Localizer failed for container_e02_XXXXXXXXXXXXX_XXXXXXX_XX_XXXXXX
java.io.IOException: java.io.InterruptedIOException: Interrupted waiting to 
send RPC request to server
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:200)
at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1247)
Caused by: java.io.InterruptedIOException: Interrupted waiting to send RPC 
request to server
at org.apache.hadoop.ipc.Client.call(Client.java:1446)
at org.apache.hadoop.ipc.Client.call(Client.java:1388)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
at com.sun.proxy.$Proxy82.heartbeat(Unknown Source)
at 
org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:63)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:306)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:198)
... 2 more
Caused by: java.lang.InterruptedException
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
at java.util.concurrent.FutureTask.get(FutureTask.java:191)
at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1158)
at org.apache.hadoop.ipc.Client.call(Client.java:1441)
... 9 more
2025-MM-DD HH:04:59,629 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: 
AsyncDispatcher thread interrupted
java.lang.InterruptedException
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
at 
java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:304)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1271)
2025-MM-DD HH:04:59,629 ERROR 
org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
Thread[LocalizerRunner for 
container_e02_XXXXXXXXXXXXX_XXXXXXX_XX_XXXXXX,5,main] threw an Exception.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
java.lang.InterruptedException
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:312)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1271)
Caused by: java.lang.InterruptedException
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
at 
java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:304)
... 1 more{quote}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org

Reply via email to