[
https://issues.apache.org/jira/browse/YARN-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Lowe updated YARN-4594:
-----------------------------
Summary: container-executor fails to remove directory tree when chmod
required (was: Fix test-container-executor.c to pass)
Thanks for the patch, Colin!
I noticed we're using openat, fchmodat, and unlinkat for the first time. I
suspect most other POSIX-like distributions support this, but I think these
were only recently added to MacOS X (in 10.9 Yosemite). I'm not sure if anyone
uses container-executor for MacOS X (or if container-executor even
compiles/works for MacOS X today). but adding these could break the native
build for those using older MacOS X versions. One alternative would be a
double-pass with ftw where we walk just the directories first changing
permissions then followed by the walk it does today. The directory trees
involved are going to be very shallow, so it's probably not a problem in
practice if we decided to go that route.
If we stick with the custom walker then here are some comments on the patch:
* I think this needs to use strerror(-fd):
{code}
if (fd < 0) {
fprintf(LOGFILE, "error opening %s: %s\n", name, strerror(ret));
goto done;
}
{code}
* There's no check for an error being encountered by readdir and therefore no
logging if it does occur
* Sometimes recursive_unlink_helper is returning errno and sometimes it is
returning -errno. For example, the "failed to stat" path will set ret=errno
and return -ret as -errno, but the "failed to unlink" path will set ret=-errno
and thus return errno.
> container-executor fails to remove directory tree when chmod required
> ---------------------------------------------------------------------
>
> Key: YARN-4594
> URL: https://issues.apache.org/jira/browse/YARN-4594
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: YARN-4594.001.patch
>
>
> test-container-executor.c doesn't work:
> * It assumes that realpath(/bin/ls) will be /bin/ls, whereas it is actually
> /usr/bin/ls on many systems.
> * The recursive delete logic in container-executor.c fails -- nftw does the
> wrong thing when confronted with directories with the wrong mode (permission
> bits), leading to an attempt to run rmdir on a non-empty directory.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)