[ https://issues.apache.org/jira/browse/PIG-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583719#action_12583719 ]

Alan Gates commented on PIG-166:
--------------------------------

Pi,

This is in response to your comments about garbage-collected temp files.
Currently we mark temp files from bag spills as deleteOnExit, so unless the
JVM crashes or is killed via kill -9 or a similar mechanism, we do clean up
our tmp files.  The issue Amir is addressing here is what happens when
running a single Pig job fills up the disks.
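
For reference, that cleanup relies on Java's standard File.deleteOnExit()
hook.  A minimal sketch of registering a spill file for cleanup follows; the
class name and file prefix are illustrative, not Pig's actual code:

    import java.io.File;
    import java.io.IOException;

    public class SpillFileSketch {
        // Create a temp file for a bag spill and register it for deletion
        // on normal JVM exit.  A crash or kill -9 skips this hook, which is
        // exactly the loophole described above.
        static File createSpillFile() throws IOException {
            File spill = File.createTempFile("pig-spill", ".tmp");
            spill.deleteOnExit();
            return spill;
        }
    }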

AFAIK, if people's jobs generate so much data that we can't even contain it
on disk, then our only hope is to find a way to better parallelize the
problem.  That will not always be possible.  But as Amir points out, when we
do fail we shouldn't take the node down with us.  I think that's really the
focus of this bug.
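
One shape such a safety net could take (a sketch only, not Pig's
implementation; the 1 GB floor and the choice to fail by throwing are
assumptions, and getUsableSpace() assumes Java 6) is to check free space on
the spill partition before writing:

    import java.io.File;
    import java.io.IOException;

    public class SpillGuard {
        // Hypothetical floor: refuse to spill when less than this remains.
        private static final long MIN_FREE_BYTES = 1024L * 1024 * 1024; // 1 GB

        // Fail the current task by throwing, rather than exhausting the
        // disk and taking the Task Tracker down with it.
        static void checkSpillSpace(File spillDir) throws IOException {
            if (spillDir.getUsableSpace() < MIN_FREE_BYTES) {
                throw new IOException("Refusing to spill: under "
                    + MIN_FREE_BYTES + " bytes free in " + spillDir);
            }
        }
    }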

> Disk Full
> ---------
>
>                 Key: PIG-166
>                 URL: https://issues.apache.org/jira/browse/PIG-166
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Amir Youssefi
>
> Occasionally spilling fills up (all) hard drive(s) on a Data Node and
> crashes the Task Tracker (and other processes) on that node. We need a
> safety net that fails the task before the crash happens (and more).
> In a Pig + Hadoop setting, Task Trackers get blacklisted, and the Pig
> console gets stuck at a percentage without returning nodes to the cluster.
> I talked to the Hadoop team about exploring the Max Percentage idea. Nodes
> that run into this problem end up in a permanently broken state, and manual
> cleanup by an administrator is necessary.
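
The Max Percentage idea quoted above could look roughly like the following
sketch; the 90% cap and class name are invented for illustration, and the
space-query methods again assume Java 6:

    import java.io.File;

    public class MaxPercentageSketch {
        // Hypothetical cap on how full the spill partition may get before
        // the task fails instead of spilling further.
        private static final double MAX_USED_FRACTION = 0.90;

        static boolean overLimit(File spillDir) {
            long total = spillDir.getTotalSpace();
            long usable = spillDir.getUsableSpace();
            double used = 1.0 - ((double) usable / (double) total);
            return used > MAX_USED_FRACTION;
        }
    }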
