[
https://issues.apache.org/jira/browse/YARN-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958854#comment-15958854
]
Eric Badger commented on YARN-6091:
-----------------------------------
bq. I just changed the judging condition of pclose's return code, then the
problem is solved.
That doesn't really solve the problem though. pclose() returns the termination
status that wait4() reports for the underlying executed process. If that status
is neither -1 nor 0, the changed condition completely ignores the fact that the
command failed. That is exactly what is happening here. This SIGPIPE happens to
be a fairly benign error, since we don't much care that the underlying process
was trying to write to an already closed pipe, but just changing the if
condition ignores the fact that the status was 13 (termination by SIGPIPE)
instead of 0. It would be much better to fix the reason the SIGPIPE is raised
in the first place. We can do this by redirecting stdout to /dev/null.
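
To make the status handling concrete, here is a minimal sketch of how a
pclose() result could be decoded with the standard wait-status macros. It is
not the code in container-executor.c; the helper names classify_wait_status
and run_and_classify are hypothetical, and the "128 + signal" convention is
borrowed from shells purely for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper (not from container-executor.c).
 * Maps a raw status, as returned by pclose()/wait4(), to one number:
 *   exit code (0-255)   if the child exited normally,
 *   128 + signal number if the child was killed by a signal,
 *   -1                  if the status is -1 or cannot be classified. */
int classify_wait_status(int status) {
    if (status == -1) {
        return -1;                      /* pclose() itself failed */
    }
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);     /* normal exit */
    }
    if (WIFSIGNALED(status)) {
        return 128 + WTERMSIG(status);  /* e.g. 128 + 13 for SIGPIPE */
    }
    return -1;
}

/* Hypothetical helper: run a shell command through popen(), close the
 * pipe without reading from it, and classify the resulting status. */
int run_and_classify(const char *command) {
    assert(command != NULL);
    FILE *stream = popen(command, "r");
    if (stream == NULL) {
        return -1;
    }
    return classify_wait_status(pclose(stream));
}
```

On Linux, a raw status of 13 decodes as "killed by signal 13" rather than
"exited with code 13", so classify_wait_status(13) gives 141. A command that
writes to stdout after the reader has closed the pipe would typically show up
this way, while the same command with "> /dev/null" appended should exit
normally, assuming SIGPIPE has its default disposition.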
> the AppMaster register failed when use Docker on LinuxContainer
> ----------------------------------------------------------------
>
> Key: YARN-6091
> URL: https://issues.apache.org/jira/browse/YARN-6091
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager, yarn
> Affects Versions: 2.8.1
> Environment: CentOS
> Reporter: zhengchenyu
> Priority: Critical
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> On some servers, when I use Docker on LinuxContainer, the AppMaster fails to
> register with the ResourceManager. This does not happen on other servers.
> I found that pclose (in container-executor.c) returns different values on
> different servers, even though the process launched by popen is running
> normally. Some servers return 0, and others return 13.
> Because YARN treats the application as failed when pclose returns nonzero,
> it removes the AMRMToken, and the AppMaster registration then fails because
> the ResourceManager has already removed the application's token.
> In container-executor.c, the condition checks whether the return code is
> zero. But according to the pclose man page, only a return value of -1
> indicates an error. So I changed the condition, which solved the problem.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)