[ 
https://issues.apache.org/jira/browse/PIG-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588356#action_12588356
 ] 

Pi Song commented on PIG-166:
-----------------------------

A cluster can have a large number of machines, and they are unlikely to all have 
the same amount of free disk space. I think a more practical solution would be 
to let users set a low water mark of available disk space below which temp 
files are no longer allowed to be created.
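As a rough sketch of the low-water-mark idea (the class and method names here are hypothetical, not from the attached patch), the spill code would refuse to create a temp file when the usable space in the temp directory drops below a configured threshold:

```java
import java.io.File;

// Hypothetical sketch of a low-water-mark guard for spill files.
// Names and the threshold value are illustrative only.
public class SpillGuard {
    private final File tempDir;
    // Minimum free bytes that must remain before a new temp file is allowed.
    private final long lowWaterMarkBytes;

    public SpillGuard(File tempDir, long lowWaterMarkBytes) {
        this.tempDir = tempDir;
        this.lowWaterMarkBytes = lowWaterMarkBytes;
    }

    // Returns true if a new temp file may still be created.
    public boolean canSpill() {
        // getUsableSpace() asks the filesystem for space available to this JVM.
        return tempDir.getUsableSpace() > lowWaterMarkBytes;
    }
}
```

The task would then fail cleanly (rather than crash the Task Tracker) whenever `canSpill()` returns false.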

A naive implementation would be to keep checking the current available disk 
space over time, but that would degrade execution performance too much. A 
cleverer way is to record the initial available disk space and decrement that 
figure by the space our temp files consume, but this is not practical either, 
because for it to work correctly we would have to ensure ours is the only 
process using the disk.
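A middle ground between the two approaches above would be to re-query the filesystem only every N spill attempts, amortizing the cost of the naive check. This is only a sketch under that assumption; the names and interval are hypothetical:

```java
import java.io.File;

// Hypothetical sketch: throttle the "naive" disk check so the
// filesystem is queried only once every checkInterval calls.
public class ThrottledDiskCheck {
    private final File tempDir;
    private final long lowWaterMarkBytes;
    private final int checkInterval; // how many calls between real checks
    private int callsSinceCheck = 0;
    private boolean lastResult = true; // cached result of the last real check

    public ThrottledDiskCheck(File tempDir, long lowWaterMarkBytes,
                              int checkInterval) {
        this.tempDir = tempDir;
        this.lowWaterMarkBytes = lowWaterMarkBytes;
        this.checkInterval = checkInterval;
    }

    public boolean canSpill() {
        // Only hit the filesystem on every checkInterval-th call.
        if (callsSinceCheck++ % checkInterval == 0) {
            lastResult = tempDir.getUsableSpace() > lowWaterMarkBytes;
        }
        return lastResult;
    }
}
```

The cached result can be stale for up to N spills, so the interval trades safety margin against per-spill overhead.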

Any thoughts or ideas?

> Disk Full
> ---------
>
>                 Key: PIG-166
>                 URL: https://issues.apache.org/jira/browse/PIG-166
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Amir Youssefi
>         Attachments: PIG-166_v1.patch
>
>
> Occasionally spilling fills up (all) hard drive(s) on a Data Node and crashes 
> the Task Tracker (and other processes) on that node. We need to have a safety net 
> and fail the task before the crash happens (and more). 
> In the Pig + Hadoop setting, Task Trackers get blacklisted, and the Pig console gets 
> stuck at a percentage without returning nodes to the cluster. I talked to the Hadoop 
> team to explore the Max Percentage idea. Nodes running into this problem get into 
> a permanent bad state, and manual cleanup by an administrator is necessary. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
