[ https://issues.apache.org/jira/browse/PIG-116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich resolved PIG-116. -------------------------------- Resolution: Won't Fix We are not seeing this as causing problems since the data does not get cleaned under very rare circumstances > pig leaves temp files behind > ---------------------------- > > Key: PIG-116 > URL: https://issues.apache.org/jira/browse/PIG-116 > Project: Pig > Issue Type: Bug > Reporter: Olga Natkovich > Assignee: Olga Natkovich > > Currently, pig creates temp dirs via call to FileLocalizer.getTemporaryPath. > They are created on the client and are mainly used to store data between 2 > M-R jobs. Pig then attempts to clean them up in the client's shutdown hook. > The problem with this approach is that, because there is now way to order the > shutdown hooks, in some cases, the DFS is already closed when we try to > delete the files in which case a substention amount of data can be left in > DFS. I see this issue more frequently with hadoop 0.16 perhaps because I had > to add an extra shutdown hook to handle hod disconnects. > The short term, I would like to propose the approach below: > (1) If trash is configured on the cluster, use trash location to create temp > directory that will expire in 7 days. The hope is that most jobs don't run > longer that 7 days. The user can specify a longer interval via a command line > switch > (2) If trash is not enabled on the cluster, the location that we use now will > be used > (3) In the shutdown hook, we will attempt to cleanup. If the attempt fails > and trash is enabled, we let trash handle it; otherwise we provide the list > of locations to the user to clean. (I realize that this is not ideal but > could not figure out a better way.) > Longer term, I am talking with hadoop team to have better temp file support: > https://issues.apache.org/jira/browse/HADOOP-2815 > Comments? Suggestions? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.