[
https://issues.apache.org/jira/browse/PIG-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581919#action_12581919
]
Pi Song commented on PIG-166:
-----------------------------
Amir,
So the problem is mainly due to Pig hogging disk space, right?
As I said before, I think we should have some kind of TempFileManager. A simple
one shouldn't take too much time to implement.
(Copied from what I've already posted before)
- Rely on a temp folder and temp files on top of the file system. (Or should we
wait for Hadoop to provide this?)
- User can customize location and size limit.
- The folder is cleaned up every time the system restarts (meaning when Pig
restarts, not when Hadoop restarts, so this can happen more frequently)
- Old temp files are removed when the system is running low on temp space
- Temp files whose lifecycle is explicitly known may be marked for collection
at the appropriate time (similar to the concept of GC)
I can start working on this if people agree.
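
The bullets above could be sketched roughly as follows. This is only an illustrative Java sketch of the proposal, not an existing Pig API; the class and method names (`TempFileManager`, `create`, `release`) are assumptions:

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the proposed TempFileManager (names are illustrative).
class TempFileManager {
    private final File tempDir;        // user-customizable location
    private final long sizeLimitBytes; // user-customizable size limit
    private final Deque<File> files = new ArrayDeque<>(); // oldest first

    TempFileManager(File tempDir, long sizeLimitBytes) {
        this.tempDir = tempDir;
        this.sizeLimitBytes = sizeLimitBytes;
        tempDir.mkdirs();
        // Clean the folder every time Pig (re)starts.
        File[] leftovers = tempDir.listFiles();
        if (leftovers != null) {
            for (File leftover : leftovers) {
                leftover.delete();
            }
        }
    }

    // Create a new temp file; drop the oldest files first when the
    // tracked usage exceeds the configured limit.
    File create(String prefix) {
        while (!files.isEmpty() && totalSize() > sizeLimitBytes) {
            files.pollFirst().delete();
        }
        try {
            File f = File.createTempFile(prefix, ".tmp", tempDir);
            files.addLast(f);
            return f;
        } catch (IOException e) {
            throw new RuntimeException("cannot create temp file", e);
        }
    }

    // Explicitly release a file whose lifecycle is known (GC-like collection).
    void release(File f) {
        files.remove(f);
        f.delete();
    }

    private long totalSize() {
        long sum = 0;
        for (File f : files) {
            sum += f.length();
        }
        return sum;
    }
}
```

A task would call `create("spill")` for each spill file and `release(f)` as soon as the spill is consumed, so the common case never waits for the low-space eviction path.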
> Disk Full
> ---------
>
> Key: PIG-166
> URL: https://issues.apache.org/jira/browse/PIG-166
> Project: Pig
> Issue Type: Bug
> Reporter: Amir Youssefi
>
> Occasionally spilling fills up (all) hard drive(s) on a Data Node and crashes
> the Task Tracker (and other processes) on that node. We need to have a safety
> net and fail the task before the crash happens (and more).
> In the Pig + Hadoop setting, Task Trackers get blacklisted, and the Pig console
> gets stuck at a percentage without the nodes returning to the cluster. I talked
> to the Hadoop team to explore the Max Percentage idea. Nodes running into this
> problem get into permanent trouble, and manual cleaning by an administrator is
> necessary.