[
https://issues.apache.org/jira/browse/YARN-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880230#comment-16880230
]
Peter Bacsko edited comment on YARN-9667 at 7/8/19 11:35 AM:
-------------------------------------------------------------
I suggest considering two things:
1) When setting {{ERRORFILE}} and {{LOGFILE}}, just invoke {{setbuf(ERRORFILE,
NULL);}} and {{setbuf(LOGFILE, NULL);}}. Output is then written to the
stream immediately, without buffering.
2) We should use {{_exit()}} in the child process. After doing some research,
this is the preferred method of exiting from a child process. There are some
exceptions, but in general this seems to be the commonly used way of
terminating a child. E.g. if {{execvp()}} fails, just bail out and avoid any
sort of cleanup. This avoids flushing buffers twice and also avoids calling
functions registered with {{atexit()}} twice. We probably don't have any such
functions right now, but guarding against a potential bug is always a good
thing.
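A minimal sketch of both suggestions, not the actual container-executor code: {{open_unbuffered_log}}, {{launch_and_wait}} and {{count_lines}} are hypothetical names, and the log file stands in for {{LOGFILE}}. The stream is made unbuffered with {{setbuf()}}, and the child leaves through {{_exit()}} when {{execvp()}} fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: open a log stream and turn off stdio buffering. */
static FILE *open_unbuffered_log(const char *path) {
    assert(path != NULL);
    FILE *log = fopen(path, "w");
    if (log != NULL) {
        setbuf(log, NULL);  /* every write now goes straight to the fd */
    }
    return log;
}

/* Fork, try to exec argv[0] in the child, and return the child's exit
 * status (or -1 on error). Assumes `log` is unbuffered, so nothing is
 * pending in the stdio buffer when fork() copies the process. */
static int launch_and_wait(FILE *log, char *const argv[]) {
    fprintf(log, "Launching child...\n");

    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        /* Only reached if execvp() failed. */
        fprintf(log, "execvp failed: %s\n", strerror(errno));
        _exit(127);  /* no atexit() handlers, no second stdio flush */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical helper for checking the result: count '\n' in a file. */
static int count_lines(const char *path) {
    FILE *in = fopen(path, "r");
    if (in == NULL) {
        return -1;
    }
    int lines = 0;
    int c;
    while ((c = fgetc(in)) != EOF) {
        if (c == '\n') {
            lines++;
        }
    }
    fclose(in);
    return lines;
}
```

Calling {{launch_and_wait()}} with a program that does not exist should leave each message in the log exactly once, because the child has no buffered copy of the parent's output to flush.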
> Container-executor.c duplicates messages to stdout
> --------------------------------------------------
>
> Key: YARN-9667
> URL: https://issues.apache.org/jira/browse/YARN-9667
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager, yarn
> Affects Versions: 3.2.0
> Reporter: Adam Antal
> Priority: Major
>
> When a container is killed by its AM, we get an error message similar to this:
> {noformat}
> 2019-06-30 12:09:04,412 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
> Shell execution returned exit code: 143. Privileged Execution Operation
> Stderr:
> Stdout: main : command provided 1
> main : run as user is systest
> main : requested yarn user is systest
> Getting exit code file...
> Creating script paths...
> Writing pid file...
> Writing to tmp file
> /yarn/nm/nmPrivate/application_1561921629886_0001/container_e84_1561921629886_0001_01_000019/container_e84_1561921629886_0001_01_000019.pid.tmp
> Writing to cgroup task files...
> Creating local dirs...
> Launching container...
> Getting exit code file...
> Creating script paths...
> {noformat}
> In container-executor.c the fork point is right after the "Creating script
> paths..." part, though in the Stdout log we can clearly see it has been
> written there twice. After consulting with [~pbacsko] it seems like there's a
> missing flush in container-executor.c before the fork and that causes the
> duplication.
> I suggest adding a flush there so that the output is not duplicated: it's a
> bit misleading that the child process writes out "Getting exit code file" and
> "Creating script paths" even though it is clearly not doing that.
> A more appealing solution could be to revisit the fprintf-fflush pairs in the
> code and change them to a single call, so that the fflush calls would not be
> forgotten accidentally. (A missing fflush can cause problems wherever this
> pattern is used.)
> Note: this issue probably affects every use of fork(), not just the one
> from {{launch_container_as_user}} in {{main.c}}.
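The duplication described in the issue, and the "single call" idea, can be illustrated with a small sketch. This is an assumption-laden demo rather than the container-executor code: {{log_line}}, {{write_around_fork}} and {{count_lines}} are hypothetical names, and the stream is explicitly set to full buffering so the effect does not depend on platform defaults.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical single-call replacement for an fprintf + fflush pair. */
static int log_line(FILE *stream, const char *fmt, ...) {
    assert(stream != NULL && fmt != NULL);
    va_list args;
    va_start(args, fmt);
    int written = vfprintf(stream, fmt, args);
    va_end(args);
    return fflush(stream) == 0 ? written : -1;
}

/* Hypothetical helper for checking the result: count '\n' in a file. */
static int count_lines(const char *path) {
    FILE *in = fopen(path, "r");
    if (in == NULL) {
        return -1;
    }
    int lines = 0;
    int c;
    while ((c = fgetc(in)) != EOF) {
        if (c == '\n') {
            lines++;
        }
    }
    fclose(in);
    return lines;
}

/* Write two messages, fork, write one message from the child, and return
 * the number of lines that ended up in the file (or -1 on error).
 * With flush_before_fork == 0 the pre-fork messages are still sitting in
 * the stdio buffer when fork() copies it, so both processes emit them. */
static int write_around_fork(const char *path, int flush_before_fork) {
    FILE *out = fopen(path, "w");
    if (out == NULL) {
        return -1;
    }
    setvbuf(out, NULL, _IOFBF, BUFSIZ);  /* force full buffering */

    if (flush_before_fork) {
        log_line(out, "Getting exit code file...\n");
        log_line(out, "Creating script paths...\n");
    } else {
        fprintf(out, "Getting exit code file...\n");
        fprintf(out, "Creating script paths...\n");
    }

    pid_t pid = fork();
    if (pid < 0) {
        fclose(out);
        return -1;
    }
    if (pid == 0) {
        fprintf(out, "Launching container...\n");
        fflush(out);  /* flushes the child's copy of the buffer */
        _exit(0);
    }

    int status = 0;
    waitpid(pid, &status, 0);
    fclose(out);  /* flushes whatever is left in the parent's copy */
    return count_lines(path);
}
```

Without the flush, the two pre-fork messages are written once by the child and once by the parent; with {{log_line()}} the buffer is already empty at the fork point, so each message appears once.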