[
https://issues.apache.org/jira/browse/PIG-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571620#action_12571620
]
Olga Natkovich commented on PIG-116:
------------------------------------
The following config params in hadoop tell whether trash is enabled and where it lives:
<property>
  <name>fs.trash.root</name>
  <value>${hadoop.tmp.dir}/Trash</value>
  <description>The trash directory, used by FsShell's 'rm' command.
  </description>
</property>

<property>
  <name>fs.trash.interval</name>
  <value>0</value>
  <description>Number of minutes between trash checkpoints.
  If zero, the trash feature is disabled.
  </description>
</property>
Directories created under trash must be named using the timestamp format yyMMddHHmm.
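A minimal sketch of producing a directory name in that format with the JDK's SimpleDateFormat; the class and method names here are illustrative, not actual Hadoop or Pig identifiers:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class TrashCheckpointName {
    // Trash checkpoint directories are named with the pattern yyMMddHHmm,
    // e.g. "0802221405" for 2008-02-22 14:05 (local time).
    static String checkpointName(Date when) {
        return new SimpleDateFormat("yyMMddHHmm").format(when);
    }

    public static void main(String[] args) {
        // Date(108, 1, 22, 14, 5) is 2008-02-22 14:05 in the deprecated ctor
        System.out.println(checkpointName(new Date(108, 1, 22, 14, 5)));
    }
}
```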
> pig leaves temp files behind
> ----------------------------
>
> Key: PIG-116
> URL: https://issues.apache.org/jira/browse/PIG-116
> Project: Pig
> Issue Type: Bug
> Reporter: Olga Natkovich
> Assignee: Olga Natkovich
>
> Currently, pig creates temp dirs via a call to FileLocalizer.getTemporaryPath.
> They are created on the client and are mainly used to store data between two
> M-R jobs. Pig then attempts to clean them up in the client's shutdown hook.
> The problem with this approach is that, because there is no way to order the
> shutdown hooks, in some cases the DFS is already closed when we try to
> delete the files, in which case a substantial amount of data can be left in
> DFS. I see this issue more frequently with hadoop 0.16, perhaps because I had
> to add an extra shutdown hook to handle hod disconnects.
> For the short term, I would like to propose the approach below:
> (1) If trash is configured on the cluster, use the trash location to create a
> temp directory that will expire in 7 days. The hope is that most jobs don't
> run longer than 7 days. The user can specify a longer interval via a command
> line switch.
> (2) If trash is not enabled on the cluster, we will continue to use the
> location that we use now.
> (3) In the shutdown hook, we will attempt to clean up. If the attempt fails
> and trash is enabled, we let trash handle it; otherwise we provide the user
> with the list of locations to clean. (I realize that this is not ideal, but I
> could not figure out a better way.)
> Longer term, I am talking with the hadoop team to have better temp file support:
> https://issues.apache.org/jira/browse/HADOOP-2815
> Comments? Suggestions?
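The selection logic in proposals (1) and (2) above could be sketched roughly as follows. All names here (chooseTempPath, trashIntervalMinutes, trashRoot, defaultTempPath, the "pig-tmp" suffix) are hypothetical, not actual Pig or Hadoop APIs; fs.trash.interval > 0 is the enablement check from the config above:

```java
public class TempPathChooser {
    // Hypothetical sketch: decide where to place intermediate data based on
    // whether trash is enabled (fs.trash.interval > 0).
    static String chooseTempPath(int trashIntervalMinutes,
                                 String trashRoot,
                                 String defaultTempPath) {
        if (trashIntervalMinutes > 0) {
            // Trash is enabled: place temp data under the trash root so that
            // anything the shutdown hook fails to delete is expired by the
            // trash emptier (the real dir would also need the yyMMddHHmm
            // timestamp naming for that to happen).
            return trashRoot + "/pig-tmp";
        }
        // Trash is disabled: fall back to the location used today.
        return defaultTempPath;
    }

    public static void main(String[] args) {
        System.out.println(chooseTempPath(0, "/user/olga/.Trash", "/tmp/pig"));
        System.out.println(chooseTempPath(60, "/user/olga/.Trash", "/tmp/pig"));
    }
}
```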