Weihao Zheng created YARN-11856:
-----------------------------------
Summary: DOWNLOADING resources unlock and cleanup is interrupted when killing a container that is localizing
Key: YARN-11856
URL: https://issues.apache.org/jira/browse/YARN-11856
Project: Hadoop YARN
Issue Type: Bug
Reporter: Weihao Zheng
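Currently, when localization fails because the LocalizerRunner thread is
interrupted, LocalizerRunner tries to send a ContainerResourceFailedEvent to
the dispatcher before unlocking and cleaning up the DOWNLOADING resources.
Because the thread is interrupted, this handle() call throws an uncaught
exception, so the unlock and cleanup never run.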
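The ordering problem can be reproduced outside the NodeManager. The following is a minimal sketch, not the actual LocalizerRunner code and not the proposed patch: the class LocalizerCleanupSketch, the methods handle() and cleanedUp(), and the "RESOURCE_FAILED" event string are all hypothetical names. It assumes only what the logs show, namely that the dispatcher's handle() enqueues with LinkedBlockingQueue.put(), which throws InterruptedException when the calling thread is interrupted.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the failure path in LocalizerRunner.run().
final class LocalizerCleanupSketch {

    // Mimics the dispatcher's handle(): put() is interruptible, and the
    // InterruptedException is rethrown as an unchecked exception.
    static void handle(BlockingQueue<String> eventQueue, String event) {
        try {
            eventQueue.put(event);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    // Returns whether the (simulated) unlock and cleanup step ran.
    static boolean cleanedUp(boolean cleanupFirst) {
        BlockingQueue<String> eventQueue = new LinkedBlockingQueue<>();
        boolean unlocked = false;
        // Simulate the kill: the runner thread is interrupted.
        Thread.currentThread().interrupt();
        try {
            if (cleanupFirst) {
                unlocked = true;                        // unlock and clean up first
                handle(eventQueue, "RESOURCE_FAILED");  // may still throw
            } else {
                handle(eventQueue, "RESOURCE_FAILED");  // throws, so the next line is skipped
                unlocked = true;
            }
        } catch (RuntimeException e) {
            // In the NodeManager this would reach YarnUncaughtExceptionHandler.
        } finally {
            Thread.interrupted(); // clear the flag so the caller is unaffected
        }
        return unlocked;
    }

    public static void main(String[] args) {
        System.out.println("dispatch first, cleaned up: " + cleanedUp(false));
        System.out.println("cleanup first, cleaned up: " + cleanedUp(true));
    }
}
```

In this sketch, dispatching first leaves the resource locked, while cleaning up first does not. Wrapping the dispatch in try/finally would be another way to get the same guarantee.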
Related logs:
{quote}2025-MM-DD HH:04:59,573 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Stopping container with container Id: container_e02_XXXXXXXXXXXXX_XXXXXXX_XX_XXXXXX
... ...
2025-MM-DD HH:04:59,573 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_e02_XXXXXXXXXXXXX_XXXXXXX_XX_XXXXXX transitioned from LOCALIZING to KILLING
... ...
2025-MM-DD HH:04:59,627 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Unknown localizer with localizerId container_e02_XXXXXXXXXXXXX_XXXXXXX_XX_XXXXXX is sending heartbeat. Ordering it to DIE
2025-MM-DD HH:04:59,628 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localizer failed for container_e02_XXXXXXXXXXXXX_XXXXXXX_XX_XXXXXX
java.io.IOException: java.io.InterruptedIOException: Interrupted waiting to send RPC request to server
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:200)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1247)
Caused by: java.io.InterruptedIOException: Interrupted waiting to send RPC request to server
    at org.apache.hadoop.ipc.Client.call(Client.java:1446)
    at org.apache.hadoop.ipc.Client.call(Client.java:1388)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
    at com.sun.proxy.$Proxy82.heartbeat(Unknown Source)
    at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:63)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:306)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:198)
    ... 2 more
Caused by: java.lang.InterruptedException
    at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
    at java.util.concurrent.FutureTask.get(FutureTask.java:191)
    at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1158)
    at org.apache.hadoop.ipc.Client.call(Client.java:1441)
    ... 9 more
2025-MM-DD HH:04:59,629 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
    at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
    at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:304)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1271)
2025-MM-DD HH:04:59,629 ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[LocalizerRunner for container_e02_XXXXXXXXXXXXX_XXXXXXX_XX_XXXXXX,5,main] threw an Exception.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException
    at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:312)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1271)
Caused by: java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
    at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
    at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:304)
    ... 1 more{quote}