Jason Lowe commented on YARN-3089:

bq. the issue now happens for logs of all applications (and not just those that 
leverage YARN-2468)?

I think we're OK for normal apps.  It will try, and fail, to delete the 
individual log files, but then afterwards it issues a recursive delete on the 
log directory which does succeed.  We only found this problem because we saw 
delete tasks being leaked in the state store, and that's because errors from 
the container executor were escaping the try blocks.  See YARN-3090.
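
To illustrate the kind of leak described above, here is a minimal sketch of the general pattern, not the actual YARN-3090 change. Every name in it (DeletionTaskSketch, pendingTasks, Deleter, runTask) is hypothetical, and the in-memory map only stands in for the real state store. The point is that the bookkeeping removal sits in a finally block, so an error escaping the delete call cannot leave the task recorded forever.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DeletionTaskSketch {
    // Hypothetical stand-in for the persistent store of pending delete tasks.
    static final Map<Integer, String> pendingTasks = new ConcurrentHashMap<>();

    // Hypothetical abstraction over whatever performs the delete (e.g. an executor).
    interface Deleter {
        void delete(String path) throws Exception;
    }

    // Records the task, attempts the delete, and always clears the record.
    // Returns true if the delete completed without throwing.
    static boolean runTask(int taskId, String path, Deleter deleter) {
        pendingTasks.put(taskId, path);
        try {
            deleter.delete(path);
            return true;
        } catch (Exception e) {
            // Covers checked and unchecked exceptions from the deleter.
            return false;
        } finally {
            // Runs on success and on failure, so the task is never leaked.
            pendingTasks.remove(taskId);
        }
    }
}
```

Whether a failed task should be dropped or retried is a separate design decision; this sketch assumes it is dropped.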

bq. For the changes in YARN-2468, we only delete the very old logs from HDFS. 
We do not delete any logs from NM.

The call to DeletionService.delete where basepaths are specified but subdir is 
empty/null is new with YARN-2468, specifically in this part of the code:

        this.delService.delete(this.userUgi.getShortUserName(), null,
            uploadedFilePathsInThisCycle
                .toArray(new Path[uploadedFilePathsInThisCycle.size()]));

The LCE wasn't properly handling the case when the basedirs arguments are _not_ 
directories.  (Based on the name of the argument, this isn't that surprising.)
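
As a rough sketch of the semantics the caller expects, assuming a null subdir means "delete each base path itself": the real LCE does this in a native executable, so the Java below is only an illustration using java.nio.file, and the class and method names (DeletePathsSketch, deleteAll, deleteRecursively) are hypothetical. The key point is that a base path that is a plain file is deleted directly rather than treated as a directory to descend into.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class DeletePathsSketch {

    // Deletes base/subDir for each base path, or the base path itself when subDir is null.
    static void deleteAll(Path subDir, Path... basePaths) throws IOException {
        for (Path base : basePaths) {
            Path target = (subDir == null) ? base : base.resolve(subDir);
            deleteRecursively(target);
        }
    }

    // Handles both cases: a plain file (or symlink) and a directory tree.
    static void deleteRecursively(Path target) throws IOException {
        if (!Files.exists(target, LinkOption.NOFOLLOW_LINKS)) {
            return; // nothing to do
        }
        if (!Files.isDirectory(target, LinkOption.NOFOLLOW_LINKS)) {
            Files.delete(target); // file argument: remove it directly
            return;
        }
        // Directory argument: remove children first, then the directory.
        Files.walkFileTree(target, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.delete(dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```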

The breakage we'll see is that, on systems using the LCE, the local delete 
triggered by the above code will not occur.  If the log directory is not going 
to be deleted shortly afterwards, then we won't clean up logs on the local disk 
in a timely manner after aggregation.

> LinuxContainerExecutor does not handle file arguments to deleteAsUser
> ---------------------------------------------------------------------
>                 Key: YARN-3089
>                 URL: https://issues.apache.org/jira/browse/YARN-3089
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>            Assignee: Eric Payne
>            Priority: Blocker
>         Attachments: YARN-3089.v1.txt, YARN-3089.v2.txt
> YARN-2468 added the deletion of individual logs that are aggregated, but this 
> fails to delete log files when the LCE is being used.  The LCE native 
> executable assumes the paths being passed are directories and the delete fails.

