os::rmdir() does recursive deletion. See its implementation in stout.

I would recommend patching that function to print more details (e.g., directory 
name and contents) to debug this. 
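
As a rough illustration of the kind of logging I mean, here is a standalone sketch. It is not stout's os::rmdir(); it uses C++17 std::filesystem instead, and the name verboseRmdir is made up for this example. It removes a tree bottom-up (post-order), logs every entry it fails to remove, and lists whatever is left in a directory that could not be deleted.

```cpp
#include <filesystem>
#include <iostream>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical debugging helper (NOT stout's os::rmdir()).
// Removes `root` in post-order and logs each failure with the path,
// the error message, and any entries left behind.
// Returns the number of failures encountered.
int verboseRmdir(const fs::path& root, std::ostream& log = std::cerr)
{
  int failures = 0;
  std::error_code ec;

  // symlink_status() does not follow symlinks, so a link to a directory
  // is removed as a link instead of being descended into.
  const fs::file_status status = fs::symlink_status(root, ec);
  if (!fs::exists(status)) {
    log << "cannot remove " << root << ": missing or not accessible\n";
    return 1;
  }

  if (fs::is_directory(status)) {
    // Collect the children first so the directory is not modified
    // while it is being iterated.
    std::vector<fs::path> children;
    for (fs::directory_iterator it(root, ec), end; !ec && it != end;
         it.increment(ec)) {
      children.push_back(it->path());
    }
    if (ec) {
      log << "failed to list " << root << ": " << ec.message() << "\n";
      ++failures;
    }

    // Post-order: children are removed before their parent.
    for (const fs::path& child : children) {
      failures += verboseRmdir(child, log);
    }
  }

  ec.clear();
  fs::remove(root, ec);
  if (ec) {
    log << "failed to remove " << root << ": " << ec.message() << "\n";
    ++failures;

    // On failure, print what is still inside the directory.
    if (fs::is_directory(status)) {
      std::error_code listEc;
      for (fs::directory_iterator it(root, listEc), end;
           !listEc && it != end; it.increment(listEc)) {
        log << "  still present: " << it->path() << "\n";
      }
    }
  }

  return failures;
}
```

Running something like this against a sandbox that fails GC should show which entry is left behind and the error reported for it.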

@vinodkone

> On Sep 11, 2014, at 8:32 AM, Tom Arnfeld <[email protected]> wrote:
> 
> Hey Vinod,
> 
> So I've run into this a few more times and am struggling to understand why 
> this is happening. It only seems to happen for some tasks.
> 
> From what I can tell, the path to the sandbox directory is scheduled for GC 
> after the executor finishes. This GC process then iterates over the scheduled 
> directories and figures out what needs to be cleaned up. Given a path that is 
> to be removed, it then runs os::rmdir() on that directory. It doesn't seem to 
> do anything explicitly recursive (maybe I'm looking in the wrong place) which 
> is quite confusing, given the GC process is working fine for some tasks which 
> have many levels of nested directories and files.
> 
> The log entry comes from this function: 
> https://github.com/apache/mesos/blob/master/src/slave/gc.cpp#L127-L160 where 
> I can see the os::rmdir() call.
> 
> Any chance you could bring some clarity to how recursive directory deletes 
> happen? Assuming they do happen, the "Directory not empty" error is even more 
> frustrating, because they're clearly not behaving correctly. Perhaps an error 
> is being thrown when deleting the contents of the directory and that is being 
> swallowed, so files still remain by the time the whole sandbox removal is 
> attempted, causing a "Directory not empty" error.
> 
> Appreciate any input!
> 
> 
>> On 8 September 2014 07:26, Tom Arnfeld <[email protected]> wrote:
>> That's useful to know, thanks Vinod. I'll try and dig deeper.
>> 
>> 
>>> On Mon, Sep 8, 2014 at 5:33 AM, Vinod Kone <[email protected]> wrote:
>>> 
>>>> On Sat, Sep 6, 2014 at 8:23 AM, Tom Arnfeld <[email protected]> wrote:
>>>> If I try and manually remove the directory mentioned, it works fine. Is 
>>>> this a known issue, or should I do a little more debugging? I've not tried 
>>>> to reproduce it under specific conditions yet.
>>>  
>>> This is surprising. GC does a recursive directory removal (see os::rmdir() 
>>> in stout) using post-order traversal. Definitely some debugging is in order 
>>> to see which directory failed and why. Does your sandbox contain any 
>>> special files (other than directories and files) like mounts, devices etc?
>>> 
>>>  
>>>> As a side note, should mesos perhaps have some kind of retry mechanism for 
>>>> GC? Also, will GC still run for an executor if the slave restarts after an 
>>>> executor terminates but before the GC process runs?
>>> 
>>> I don't know what the error was above but I doubt a retry would've helped 
>>> here. And yes GC runs for a terminated executor when slave restarts.
> 
