[ 
https://issues.apache.org/jira/browse/PIG-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581919#action_12581919
 ] 

Pi Song commented on PIG-166:
-----------------------------

Amir,

So the problem is mainly that Pig is hogging disk space, right?

As I said before, I think we should have some kind of TempFileManager. A simple 
one shouldn't take too much time to implement.

(Copied from what I've already posted before)
- Rely on a temp folder and temp files on top of the file system. (Or should we 
wait for Hadoop to provide this?)
- Users can customize the location and the size limit.
- The folder is cleaned up every time the system restarts (meaning when Pig 
restarts, not Hadoop, so this can happen more frequently).
- The oldest temp files are removed when the system is running low on temp space.
- Temp files whose lifecycle is explicitly known can be marked for collection 
at the appropriate time (similar to the concept of GC).
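The points above could be sketched roughly as follows. This is only a minimal illustration of the idea, not a proposed patch; the class name `TempFileManager` comes from the discussion, but every method name, the eviction policy, and the size accounting here are assumptions:

```java
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the proposed TempFileManager.
// Method names and the size-accounting scheme are assumptions, not Pig APIs.
public class TempFileManager {
    private final File root;                               // user-configurable location
    private final long sizeLimit;                          // user-configurable cap, in bytes
    private final Map<File, Long> files = new LinkedHashMap<>(); // insertion order = age
    private long used = 0;

    public TempFileManager(File root, long sizeLimit) {
        this.root = root;
        this.sizeLimit = sizeLimit;
        root.mkdirs();
        // Clean up leftovers whenever Pig (not Hadoop) restarts.
        File[] leftovers = root.listFiles();
        if (leftovers != null) {
            for (File f : leftovers) f.delete();
        }
    }

    // Hand out a tracked temp file; evict the oldest files when low on space.
    public synchronized File create(long expectedSize) throws IOException {
        Iterator<Map.Entry<File, Long>> it = files.entrySet().iterator();
        while (used + expectedSize > sizeLimit && it.hasNext()) {
            Map.Entry<File, Long> oldest = it.next();
            oldest.getKey().delete();
            used -= oldest.getValue();
            it.remove();
        }
        if (used + expectedSize > sizeLimit) {
            throw new IOException("temp space limit (" + sizeLimit + " bytes) exceeded");
        }
        File f = File.createTempFile("pig-tmp-", ".tmp", root);
        files.put(f, expectedSize);
        used += expectedSize;
        return f;
    }

    // Explicit collection for files whose lifecycle is known (GC-like).
    public synchronized void release(File f) {
        Long size = files.remove(f);
        if (size != null) {
            used -= size;
            f.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "pig-tfm-demo");
        TempFileManager tfm = new TempFileManager(dir, 100);
        File a = tfm.create(60);
        File b = tfm.create(60);  // exceeds the 100-byte cap, so a is evicted
        System.out.println("a exists: " + a.exists());
        System.out.println("b exists: " + b.exists());
        tfm.release(b);
        System.out.println("b exists after release: " + b.exists());
    }
}
```

A real implementation would of course have to measure actual file sizes rather than trust a caller-supplied estimate, and decide what to do when an evicted spill file is still in use.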

I can start working on this if people agree.



> Disk Full
> ---------
>
>                 Key: PIG-166
>                 URL: https://issues.apache.org/jira/browse/PIG-166
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Amir Youssefi
>
> Occasionally spilling fills up (all) hard drives on a Data Node and crashes 
> the Task Tracker (and other processes) on that node. We need to have a safety net 
> and fail the task before the crash happens (and more). 
> In the Pig + Hadoop setting, Task Trackers get blacklisted, and the Pig console gets 
> stuck at a percentage without returning the nodes to the cluster. I talked to the Hadoop 
> team to explore the Max Percentage idea. Nodes running into this problem end up in a 
> permanently broken state, and manual cleaning by an administrator is necessary. 

-- 
This message is automatically generated by JIRA.