[ https://issues.apache.org/jira/browse/YARN-11856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated YARN-11856:
----------------------------------
    Labels: pull-request-available  (was: )

> DOWNLOADING resources unlock and cleanup is interrupted when killing a container that is localizing
> ---------------------------------------------------------------------------------------------------
>
>                 Key: YARN-11856
>                 URL: https://issues.apache.org/jira/browse/YARN-11856
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Weihao Zheng
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, when localization fails because of an interruption, LocalizerRunner
> tries to send a ContainerResourceFailedEvent to the dispatcher before unlocking
> and cleaning up the DOWNLOADING resources. Because the thread is interrupted,
> this handle() call throws an uncaught exception, so the unlock and cleanup are
> cut short.
>
> Related logs:
> {quote}2025-MM-DD HH:04:59,573 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Stopping container with container Id: container_e02_XXXXXXXXXXXXX_XXXXXXX_XX_XXXXXX
> ... ...
> 2025-MM-DD HH:04:59,573 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_e02_XXXXXXXXXXXXX_XXXXXXX_XX_XXXXXX transitioned from LOCALIZING to KILLING
> ... ...
> 2025-MM-DD HH:04:59,627 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Unknown localizer with localizerId container_e02_XXXXXXXXXXXXX_XXXXXXX_XX_XXXXXX is sending heartbeat. Ordering it to DIE
> 2025-MM-DD HH:04:59,628 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localizer failed for container_e02_XXXXXXXXXXXXX_XXXXXXX_XX_XXXXXX
> java.io.IOException: java.io.InterruptedIOException: Interrupted waiting to send RPC request to server
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:200)
>     at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:180)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1247)
> Caused by: java.io.InterruptedIOException: Interrupted waiting to send RPC request to server
>     at org.apache.hadoop.ipc.Client.call(Client.java:1446)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1388)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>     at com.sun.proxy.$Proxy82.heartbeat(Unknown Source)
>     at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:63)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:306)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:198)
>     ... 2 more
> Caused by: java.lang.InterruptedException
>     at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:191)
>     at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1158)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1441)
>     ... 9 more
> 2025-MM-DD HH:04:59,629 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted
> java.lang.InterruptedException
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
>     at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
>     at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:304)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1271)
> 2025-MM-DD HH:04:59,629 ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[LocalizerRunner for container_e02_XXXXXXXXXXXXX_XXXXXXX_XX_XXXXXX,5,main] threw an Exception.
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:312)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1271)
> Caused by: java.lang.InterruptedException
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
>     at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
>     at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:304)
>     ... 1 more{quote}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
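
The failure mode in the stack trace above can be reproduced outside Hadoop. The following is a minimal sketch, not the NodeManager code and not necessarily the change made in the linked pull request: the class name, the `handle` helper, the `"RESOURCE_FAILED"` string, and the `AtomicBoolean` standing in for the resource lock are all hypothetical. It only relies on the behavior visible in the trace, namely that `LinkedBlockingQueue.put` throws `InterruptedException` when the calling thread is already interrupted, and that the handler rethrows it as an unchecked exception.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class InterruptedCleanupSketch {
  // Hypothetical stand-in for the dispatcher's event queue.
  private static final BlockingQueue<String> EVENTS = new LinkedBlockingQueue<>();

  // Same shape as the handler in the trace: put() is interruptible, and the
  // checked InterruptedException is rethrown as an unchecked exception.
  static void handle(String event) {
    try {
      EVENTS.put(event);
    } catch (InterruptedException e) {
      throw new RuntimeException(e);
    }
  }

  // Ordering described in the issue: send the failure event first, then
  // unlock. Returns true if the (hypothetical) resource lock is still held.
  static boolean eventFirst() {
    AtomicBoolean locked = new AtomicBoolean(true);
    Thread.currentThread().interrupt(); // simulate the kill interrupting the runner
    try {
      handle("RESOURCE_FAILED"); // throws because the thread is interrupted
      locked.set(false);         // never reached
    } catch (RuntimeException e) {
      // In the logs this exception reaches YarnUncaughtExceptionHandler.
    }
    return locked.get();
  }

  // One possible alternative ordering (an assumption): release the lock
  // before touching the interruptible queue.
  static boolean cleanupFirst() {
    AtomicBoolean locked = new AtomicBoolean(true);
    Thread.currentThread().interrupt();
    try {
      locked.set(false);         // cleanup does not depend on the queue
      handle("RESOURCE_FAILED"); // may still throw, but cleanup is already done
    } catch (RuntimeException e) {
      // Event delivery failed; the lock has been released regardless.
    }
    return locked.get();
  }

  public static void main(String[] args) {
    System.out.println("event first, still locked: " + eventFirst());
    System.out.println("cleanup first, still locked: " + cleanupFirst());
  }
}
```

In this sketch the first ordering leaves the lock held and the second releases it. Wrapping the cleanup in a `finally` block would be another way to get the same guarantee; which approach suits LocalizerRunner depends on details not shown in this message.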