[
https://issues.apache.org/jira/browse/PIG-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588356#action_12588356
]
Pi Song commented on PIG-166:
-----------------------------
A cluster can have a large number of machines, and not all of them are likely to
have the same amount of free disk space. I think a more practical solution
would be to let users configure a low-water mark of available disk space below
which temp files may no longer be created.
A naive implementation would be to poll the available disk space repeatedly,
but that would degrade execution performance too much. A cleverer way is to
record the initial available disk space and decrement that figure by the space
used by our temp files, but this way is not practical because, for it to work
correctly, we would have to ensure that ours is the only process writing to the
disk at any time.
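To make the low-water-mark idea concrete, here is a minimal sketch of a guard that refuses to create a spill file when it would leave less free space than the configured floor. The class and method names (SpillGuard, createSpillFile, lowWaterMark) are illustrative, not Pig's actual API; only java.io.File.getUsableSpace() and File.createTempFile() are real JDK calls.

```java
import java.io.File;
import java.io.IOException;

// Illustrative sketch, not Pig code: fail the task early instead of
// filling the disk and crashing the TaskTracker.
public class SpillGuard {
    /**
     * Create a temp (spill) file in tmpDir, but only if doing so would
     * leave at least lowWaterMark bytes of usable space on the volume.
     *
     * @param tmpDir        directory to spill into
     * @param expectedBytes rough size of the spill about to be written
     * @param lowWaterMark  minimum free space that must remain, in bytes
     */
    public static File createSpillFile(File tmpDir, long expectedBytes,
                                       long lowWaterMark) throws IOException {
        long usable = tmpDir.getUsableSpace(); // free bytes visible to this JVM
        if (usable - expectedBytes < lowWaterMark) {
            throw new IOException("Refusing to spill: only " + usable
                    + " bytes free in " + tmpDir
                    + ", low-water mark is " + lowWaterMark);
        }
        return File.createTempFile("pigspill", ".tmp", tmpDir);
    }
}
```

The check is still a stat call per spill file rather than per write, so it avoids the constant polling of the naive scheme while staying correct even when other processes share the disk.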
Any thoughts or ideas?
> Disk Full
> ---------
>
> Key: PIG-166
> URL: https://issues.apache.org/jira/browse/PIG-166
> Project: Pig
> Issue Type: Bug
> Reporter: Amir Youssefi
> Attachments: PIG-166_v1.patch
>
>
> Occasionally spilling fills up (all) hard drive(s) on a Data Node and crashes
> the Task Tracker (and other processes) on that node. We need to have a safety
> net and fail the task before the crash happens (and more).
> In a Pig + Hadoop setting, Task Trackers get blacklisted, and the Pig console
> gets stuck at a percentage without returning nodes to the cluster. I talked to
> the Hadoop team to explore the Max Percentage idea. Nodes running into this
> problem end up in a permanently bad state, and manual cleaning by an
> administrator is necessary.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.