[ https://issues.apache.org/jira/browse/YARN-11856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YARN-11856:
----------------------------------
    Labels: pull-request-available  (was: )

> DOWNLOADING resources unlock and cleanup is interrupted when killing a container that is localizing
> ---------------------------------------------------------------------------------------------------
>
>                 Key: YARN-11856
>                 URL: https://issues.apache.org/jira/browse/YARN-11856
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Weihao Zheng
>            Priority: Major
>              Labels: pull-request-available
>
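> Currently, when localization fails because the LocalizerRunner thread has been 
> interrupted (for example, when the container is killed while LOCALIZING), 
> LocalizerRunner sends a ContainerResourceFailedEvent to the dispatcher before 
> unlocking and cleaning up the DOWNLOADING resources. Since the thread is 
> interrupted, this handle() call throws an uncaught YarnRuntimeException 
> (wrapping InterruptedException), so the unlock and cleanup are never reached.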
>  
> Related logs:
> {quote}2025-MM-DD HH:04:59,573 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Stopping container with container Id: container_e02_XXXXXXXXXXXXX_XXXXXXX_XX_XXXXXX
> ... ...
> 2025-MM-DD HH:04:59,573 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_e02_XXXXXXXXXXXXX_XXXXXXX_XX_XXXXXX transitioned from LOCALIZING to KILLING
> ... ...
> 2025-MM-DD HH:04:59,627 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Unknown localizer with localizerId container_e02_XXXXXXXXXXXXX_XXXXXXX_XX_XXXXXX is sending heartbeat. Ordering it to DIE
> 2025-MM-DD HH:04:59,628 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localizer failed for container_e02_XXXXXXXXXXXXX_XXXXXXX_XX_XXXXXX
> java.io.IOException: java.io.InterruptedIOException: Interrupted waiting to send RPC request to server
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:200)
> at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1247)
> Caused by: java.io.InterruptedIOException: Interrupted waiting to send RPC request to server
> at org.apache.hadoop.ipc.Client.call(Client.java:1446)
> at org.apache.hadoop.ipc.Client.call(Client.java:1388)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
> at com.sun.proxy.$Proxy82.heartbeat(Unknown Source)
> at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:63)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:306)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:198)
> ... 2 more
> Caused by: java.lang.InterruptedException
> at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
> at java.util.concurrent.FutureTask.get(FutureTask.java:191)
> at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1158)
> at org.apache.hadoop.ipc.Client.call(Client.java:1441)
> ... 9 more
> 2025-MM-DD HH:04:59,629 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted
> java.lang.InterruptedException
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
> at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
> at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:304)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1271)
> 2025-MM-DD HH:04:59,629 ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[LocalizerRunner for container_e02_XXXXXXXXXXXXX_XXXXXXX_XX_XXXXXX,5,main] threw an Exception.
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException
> at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:312)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1271)
> Caused by: java.lang.InterruptedException
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
> at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
> at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:304)
> ... 1 more{quote}
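The AsyncDispatcher traces in the log bottom out in LinkedBlockingQueue.put, which takes its lock through ReentrantLock.lockInterruptibly(). That call checks the calling thread's interrupt flag first and throws InterruptedException immediately if it is set, even when the queue has free capacity, so any statement placed after the handle() call in LocalizerRunner.run is skipped. The following JDK-only sketch reproduces that behavior; the queue is an assumed stand-in for the dispatcher's internal event queue, and the event string is a placeholder, not a real YARN event.

```java
import java.util.concurrent.LinkedBlockingQueue;

public class InterruptedPutDemo {
    // Returns true when put() on an already-interrupted thread throws
    // InterruptedException without enqueueing anything and clears the flag.
    static boolean putFailsWhenInterrupted() {
        // Unbounded queue: lack of capacity can never be why put() fails.
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
        // Stand-in for the container kill interrupting the LocalizerRunner thread.
        Thread.currentThread().interrupt();
        try {
            queue.put("resource-failed-event"); // placeholder payload (assumption)
            return false; // only reached if put() ignored the interrupt
        } catch (InterruptedException e) {
            // lockInterruptibly() checked (and cleared) the interrupt flag
            // before the element was ever enqueued.
            return queue.isEmpty() && !Thread.currentThread().isInterrupted();
        }
    }

    public static void main(String[] args) {
        System.out.println("put() failed on interrupted thread: " + putFailsWhenInterrupted());
    }
}
```

Because put() clears the interrupt flag when it throws, code that catches the wrapped exception further up also loses the interrupt status unless it restores it explicitly.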

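One way to keep the unlock and cleanup of DOWNLOADING resources from being skipped is to run them in a finally block (or before dispatching), so an exception thrown by the dispatcher cannot bypass them. The sketch below is hypothetical: Dispatcher, onLocalizerFailure, unlockAndCleanup and the cleanedUp counter are illustrative stand-ins, not the actual ResourceLocalizationService code and not the change in the linked pull request.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class FailurePathSketch {
    // Hypothetical stand-in for AsyncDispatcher$GenericEventHandler: like the
    // YarnRuntimeException in the log, it wraps InterruptedException unchecked.
    static final class Dispatcher {
        private final LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();

        void handle(String event) {
            try {
                queue.put(event);
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
    }

    // Hypothetical counter of DOWNLOADING resources that were unlocked and cleaned up.
    static final AtomicInteger cleanedUp = new AtomicInteger();

    // Hypothetical helper standing in for the real unlock/cleanup logic.
    static void unlockAndCleanup(int downloadingResources) {
        cleanedUp.addAndGet(downloadingResources);
    }

    // Failure path: the finally block runs even if handle() throws.
    static void onLocalizerFailure(Dispatcher dispatcher, int downloadingResources) {
        try {
            dispatcher.handle("resource-failed-event"); // placeholder event
        } finally {
            unlockAndCleanup(downloadingResources);
        }
    }

    public static void main(String[] args) {
        Dispatcher dispatcher = new Dispatcher();
        Thread.currentThread().interrupt(); // simulate the container kill
        try {
            onLocalizerFailure(dispatcher, 3);
        } catch (RuntimeException e) {
            System.out.println("dispatch failed: " + e.getCause());
        }
        System.out.println("resources cleaned up: " + cleanedUp.get());
    }
}
```

Moving the unlock and cleanup ahead of the dispatch would avoid the skip as well; the finally form additionally covers any other unchecked exception thrown by handle().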


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
