[ 
https://issues.apache.org/jira/browse/YARN-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-4594:
-----------------------------
    Summary: container-executor fails to remove directory tree when chmod 
required  (was: Fix test-container-executor.c to pass)

Thanks for the patch, Colin!

I noticed we're using openat, fchmodat, and unlinkat for the first time.  I 
suspect most other POSIX-like distributions support this, but I think these 
were only recently added to MacOS X (in 10.9 Yosemite).  I'm not sure if anyone 
uses container-executor for MacOS X (or if container-executor even 
compiles/works for MacOS X today). but adding these could break the native 
build for those using older MacOS X versions.  One alternative would be a 
double-pass with ftw where we walk just the directories first changing 
permissions then followed by the walk it does today.  The directory trees 
involved are going to be very shallow, so it's probably not a problem in 
practice if we decided to go that route.

If we stick with the custom walker then here are some comments on the patch:
* I think this needs to use strerror(-fd):
{code}
  if (fd < 0) {
    fprintf(LOGFILE, "error opening %s: %s\n", name, strerror(ret));
    goto done;
  }
{code}
* There's no check for an error being encountered by readdir and therefore no 
logging if it does occur
* Sometimes recursive_unlink_helper is returning errno and sometimes it is 
returning -errno.  For example, the "failed to stat" path will set ret=errno 
and return -ret as -errno, but the "failed to unlink" path will set ret=-errno 
and thus return errno.



> container-executor fails to remove directory tree when chmod required
> ---------------------------------------------------------------------
>
>                 Key: YARN-4594
>                 URL: https://issues.apache.org/jira/browse/YARN-4594
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: YARN-4594.001.patch
>
>
> test-container-executor.c doesn't work:
> * It assumes that realpath(/bin/ls) will be /bin/ls, whereas it is actually 
> /usr/bin/ls on many systems.
> * The recursive delete logic in container-executor.c fails -- nftw does the 
> wrong thing when confronted with directories with the wrong mode (permission 
> bits), leading to an attempt to run rmdir on a non-empty directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to