[ https://issues.apache.org/jira/browse/YARN-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305159#comment-14305159 ]
Jason Lowe commented on YARN-3089:
----------------------------------
bq. the issue now happens for logs of all applications (and not just those that leverage YARN-2468)?
I think we're OK for normal apps. It will try, and fail, to delete the
individual log files, but afterwards it issues a recursive delete on the log
directory, which does succeed. We only found this problem because we saw
delete tasks being leaked in the state store, which happened because errors
from the container executor were escaping the try blocks. See YARN-3090.
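A minimal sketch of how that kind of leak can happen, assuming a delete task that records itself in the state store and removes the record only when it finishes. The state store is modeled as an in-memory set, and `ExecutorFailure`, `runLeaky`, and `runSafe` are hypothetical names, not NodeManager code; this does not claim to reproduce the actual YARN-3090 change.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a delete task leaking its state-store record.
public class DeletionTaskLeakDemo {
    // Stand-in for the NM state store: ids of delete tasks not yet completed.
    static final Set<Integer> stateStore = new HashSet<>();

    // Stand-in for a container-executor error that is NOT an IOException.
    static class ExecutorFailure extends RuntimeException {
        ExecutorFailure(String msg) { super(msg); }
    }

    // Hypothetical executor call that always fails with an unchecked error.
    static void deleteAsUser(String path) throws IOException {
        throw new ExecutorFailure("native delete failed for " + path);
    }

    // Leaky: only IOException is caught, so the unchecked error escapes
    // and the record removal below is never reached.
    static void runLeaky(int taskId, String path) {
        stateStore.add(taskId);
        try {
            deleteAsUser(path);
        } catch (IOException e) {
            System.err.println("delete failed: " + e);
        }
        stateStore.remove(taskId);
    }

    // Safer: the record is removed in finally, however the delete ends.
    static void runSafe(int taskId, String path) {
        stateStore.add(taskId);
        try {
            deleteAsUser(path);
        } catch (IOException | RuntimeException e) {
            System.err.println("delete failed: " + e);
        } finally {
            stateStore.remove(taskId);
        }
    }

    public static void main(String[] args) {
        try {
            runLeaky(1, "/tmp/logs/container_01/stdout");
        } catch (ExecutorFailure e) {
            // The error escaped runLeaky, leaving task 1 in the store.
        }
        runSafe(2, "/tmp/logs/container_02/stdout");
        System.out.println("leaked task ids: " + stateStore); // prints: leaked task ids: [1]
    }
}
```

The only difference between the two variants is where the record removal lives: after the try block it is skipped by any exception the catch does not name, while in `finally` it always runs.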
bq. For the changes in YARN-2468, we only delete the very old logs from HDFS. We do not delete any logs from NM.
The call to DeletionService.delete with basepaths specified but subdir
empty/null was introduced by YARN-2468, specifically in this part of the code in
AppLogAggregatorImpl.uploadLogsForContainers:
{code}
this.delService.delete(this.userUgi.getShortUserName(), null,
uploadedFilePathsInThisCycle
.toArray(new Path[uploadedFilePathsInThisCycle.size()]));
{code}
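For reference, a simplified model of how a `delete(user, subDir, baseDirs...)` request maps to concrete targets. It assumes the semantics described above (a null subDir means each base dir is itself the target) and uses `java.nio.file.Path` instead of Hadoop's `Path`; `DeletionTargets.resolve` is a hypothetical helper, not the DeletionService API.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Assumed semantics, modeled loosely on DeletionService; not its actual code.
public class DeletionTargets {
    static List<Path> resolve(Path subDir, Path... baseDirs) {
        List<Path> targets = new ArrayList<>();
        if (baseDirs == null || baseDirs.length == 0) {
            // No base dirs: subDir itself is the target.
            if (subDir != null) {
                targets.add(subDir);
            }
            return targets;
        }
        for (Path baseDir : baseDirs) {
            // With a null subDir each "base dir" is deleted directly; with the
            // YARN-2468 call these are individual log files, not directories.
            targets.add(subDir == null ? baseDir : baseDir.resolve(subDir));
        }
        return targets;
    }

    public static void main(String[] args) {
        Path stdout = Paths.get("/var/log/nm/app_1/container_1/stdout");
        Path stderr = Paths.get("/var/log/nm/app_1/container_1/stderr");
        // YARN-2468 call shape: subDir == null, base dirs are files.
        System.out.println(resolve(null, stdout, stderr));
        // Traditional shape: a relative subDir under each local dir.
        System.out.println(resolve(Paths.get("usercache/alice"),
                Paths.get("/data1/nm"), Paths.get("/data2/nm")));
    }
}
```

With the YARN-2468 call shape the targets are the uploaded log files themselves, which is exactly where an executor that expects directories goes wrong.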
The LCE wasn't properly handling the case where the basedirs arguments are _not_
directories. (Given the name of the argument, that isn't too surprising.)
The breakage we'll see is that, on systems using the LCE, the local delete
triggered by the above code will not occur. If the log directory is not going
to be deleted shortly afterwards, then we won't clean up logs on the local disk
in a timely manner after aggregation.
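The idea behind handling file arguments can be modeled in Java: check what the target is before deleting it, unlinking a file directly and recursing only into directories. This is a hypothetical sketch (`DeleteFileOrDir.deletePath` is not a Hadoop API); the real LCE deletion happens in the native container-executor binary, and this code runs as the current user rather than switching users.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical model: accept either a file or a directory as a delete target.
public class DeleteFileOrDir {
    static void deletePath(Path target) throws IOException {
        if (!Files.exists(target, LinkOption.NOFOLLOW_LINKS)) {
            return; // already gone; nothing to do
        }
        if (Files.isDirectory(target, LinkOption.NOFOLLOW_LINKS)) {
            // Directory: collect the tree, then delete deepest paths first
            // so every directory is empty by the time it is removed.
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(target)) {
                paths = walk.sorted(Comparator.reverseOrder())
                            .collect(Collectors.toList());
            }
            for (Path p : paths) {
                Files.delete(p);
            }
        } else {
            // Regular file or symlink: a single unlink. Per the issue, the
            // native executable assumed a directory here and the delete failed.
            Files.delete(target);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("nm-logs");
        Path stdout = Files.createFile(dir.resolve("stdout"));
        Path deeper = Files.createDirectories(dir.resolve("sub").resolve("deeper"));
        Files.createFile(deeper.resolve("syslog"));

        deletePath(stdout); // file argument, the YARN-2468 case
        System.out.println("file deleted: " + !Files.exists(stdout));
        deletePath(dir);    // directory argument, the traditional case
        System.out.println("dir deleted: " + !Files.exists(dir));
    }
}
```

Checking the type with `NOFOLLOW_LINKS` means a symlink is removed as a link rather than followed, which keeps the delete from reaching outside the intended tree.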
> LinuxContainerExecutor does not handle file arguments to deleteAsUser
> ---------------------------------------------------------------------
>
> Key: YARN-3089
> URL: https://issues.apache.org/jira/browse/YARN-3089
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Eric Payne
> Priority: Blocker
> Attachments: YARN-3089.v1.txt, YARN-3089.v2.txt
>
>
> YARN-2468 added the deletion of individual logs that are aggregated, but this
> fails to delete log files when the LCE is being used. The LCE native
> executable assumes the paths being passed are directories, so the delete fails.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)