[ 
https://issues.apache.org/jira/browse/PIG-116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-116.
--------------------------------

    Resolution: Won't Fix

We are not seeing this cause problems, since data is left uncleaned only 
under very rare circumstances.

> pig leaves temp files behind
> ----------------------------
>
>                 Key: PIG-116
>                 URL: https://issues.apache.org/jira/browse/PIG-116
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Olga Natkovich
>            Assignee: Olga Natkovich
>
> Currently, pig creates temp dirs via a call to FileLocalizer.getTemporaryPath. 
> They are created on the client and are mainly used to store data between two 
> M-R jobs. Pig then attempts to clean them up in the client's shutdown hook. 
> The problem with this approach is that, because there is no way to order the 
> shutdown hooks, in some cases the DFS is already closed when we try to 
> delete the files, in which case a substantial amount of data can be left in 
> DFS. I see this issue more frequently with hadoop 0.16, perhaps because I had 
> to add an extra shutdown hook to handle hod disconnects.
> In the short term, I would like to propose the approach below:
> (1) If trash is configured on the cluster, use the trash location to create 
> a temp directory that will expire in 7 days. The hope is that most jobs 
> don't run longer than 7 days. The user can specify a longer interval via a 
> command line switch.
> (2) If trash is not enabled on the cluster, the location that we use now will 
> be used.
> (3) In the shutdown hook, we will attempt to clean up. If the attempt fails 
> and trash is enabled, we let trash handle it; otherwise we provide the list 
> of locations to the user to clean. (I realize that this is not ideal but 
> could not figure out a better way.)
> Longer term, I am talking with the hadoop team to have better temp file support: 
> https://issues.apache.org/jira/browse/HADOOP-2815
> Comments? Suggestions?
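
For readers of the archive, the three-step proposal quoted above might be sketched roughly as follows. This is only an illustration, not Pig's code: it uses the local filesystem (java.nio) as a stand-in for DFS, and the class name TempCleanupSketch, its methods, and the trashRoot/defaultRoot parameters are all hypothetical. It does not model the 7-day trash expiry or the command line switch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; the local filesystem stands in for DFS.
final class TempCleanupSketch {

    // Steps (1) and (2): prefer a trash-backed root when trash is configured
    // (trashRoot != null), otherwise fall back to the current location.
    static Path chooseTempRoot(Path trashRoot, Path defaultRoot) {
        return trashRoot != null ? trashRoot : defaultRoot;
    }

    // Step (3), first half: try to delete every temp dir and return the ones
    // that could not be removed.
    static List<Path> cleanUp(List<Path> tempDirs) {
        List<Path> leftovers = new ArrayList<>();
        for (Path dir : tempDirs) {
            try {
                deleteRecursively(dir);
            } catch (IOException | UncheckedIOException e) {
                leftovers.add(dir);
            }
        }
        return leftovers;
    }

    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order so children are deleted before their parents.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // Step (3), second half: on shutdown, clean up; if that fails and trash is
    // not enabled, list the leftover locations for the user.
    static void installHook(List<Path> tempDirs, boolean trashEnabled) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            List<Path> leftovers = cleanUp(tempDirs);
            if (!leftovers.isEmpty() && !trashEnabled) {
                System.err.println("Could not delete temp locations; please remove them manually:");
                leftovers.forEach(p -> System.err.println("  " + p));
            }
        }));
    }
}
```

Note that the JVM still gives no ordering guarantee between this hook and any other registered hook, so the sketch only covers the fallback behaviour, not the ordering problem itself.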

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
