[ https://issues.apache.org/jira/browse/YARN-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16455610#comment-16455610 ]
Eric Yang commented on YARN-8209:
---------------------------------
[~ebadger] Thank you for the insights. The .cmd file is a serialization contract
between the node manager and container-executor. If we want to step away from
this contract, we need an alternate proposal. If we read the command from stdin,
we still need to guard against buffer overflow and pass environment variables to
docker properly. This likely goes full circle and ends with a data file between
the node manager and container-executor again. How about a small modification
for the docker rm command: skip generating the .cmd file and pass the docker
container id to container-executor through an environment variable. If
container-executor cannot find a .cmd file and the environment variable requests
deletion of a docker container, it removes that container. This decouples
docker rm from the .cmd file and avoids the race condition between
FileDeletionTask and DockerContainerDeletionTask. Could this be a workable
workaround for the race condition?
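A minimal Java sketch of the proposed fallback, for illustration only. It assumes
hypothetical environment variable names (YARN_DOCKER_ACTION,
YARN_DOCKER_CONTAINER_ID) and a hypothetical class DockerRmFallbackSketch; none of
these exist in Hadoop. The container-executor half is modeled in Java only to show
the decision order (the real container-executor is a native binary): an existing
.cmd file keeps precedence, and the environment is consulted only when the file is
missing.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

public class DockerRmFallbackSketch {

  // Hypothetical env var names; NOT part of the real container-executor contract.
  static final String ACTION_ENV = "YARN_DOCKER_ACTION";
  static final String CONTAINER_ID_ENV = "YARN_DOCKER_CONTAINER_ID";

  // Modeled on Docker's container-name character set; the first character must be
  // alphanumeric, so an id like "-f" cannot be parsed as a docker option.
  private static final Pattern VALID_ID =
      Pattern.compile("[a-zA-Z0-9][a-zA-Z0-9_.-]*");

  /** Node manager side: describe a docker rm request without writing a .cmd file. */
  static Map<String, String> buildRmEnvironment(String containerId) {
    Map<String, String> env = new HashMap<>();
    env.put(ACTION_ENV, "rm");
    env.put(CONTAINER_ID_ENV, containerId);
    return env;
  }

  /**
   * container-executor side, modeled in Java (hypothetical): when the .cmd file
   * is missing and the environment asks for docker rm with a valid id, return
   * the argv to execute. An empty result means "no fallback applies": either the
   * existing .cmd file contract is used, or the request is rejected as today.
   */
  static Optional<List<String>> rmFallback(Path cmdFile, Map<String, String> env) {
    if (Files.exists(cmdFile)) {
      return Optional.empty(); // existing serialization contract wins
    }
    String containerId = env.get(CONTAINER_ID_ENV);
    if (!"rm".equals(env.get(ACTION_ENV))
        || containerId == null
        || !VALID_ID.matcher(containerId).matches()) {
      return Optional.empty(); // not a well-formed docker rm request
    }
    return Optional.of(Arrays.asList("docker", "rm", containerId));
  }

  public static void main(String[] args) {
    // Illustrative path and container id only.
    Path cmdFile = Paths.get("/tmp/does-not-exist/docker.cmd");
    Map<String, String> env = buildRmEnvironment("container_1524699894000_0001_01_000002");
    System.out.println(rmFallback(cmdFile, env));
  }
}
```

Returning an argv list rather than a shell string keeps the container id a single
argument to docker, and because the .cmd file is checked first, commands that still
rely on the file are unaffected.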
> NPE in DeletionService
> ----------------------
>
> Key: YARN-8209
> URL: https://issues.apache.org/jira/browse/YARN-8209
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Chandni Singh
> Assignee: Eric Badger
> Priority: Major
>
> {code:java}
> 2018-04-25 23:38:41,039 WARN concurrent.ExecutorHelper (ExecutorHelper.java:logThrowableFromAfterExecute(63)) - Caught exception in thread DeletionService #1:
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerClient.writeCommandToTempFile(DockerClient.java:109)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.executeDockerCommand(DockerCommandExecutor.java:85)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.executeStatusCommand(DockerCommandExecutor.java:192)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor.getContainerStatus(DockerCommandExecutor.java:128)
> at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.removeDockerContainer(LinuxContainerExecutor.java:935)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.deletion.task.DockerContainerDeletionTask.run(DockerContainerDeletionTask.java:61)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748){code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)