need to create temp files in the task's working directory
---------------------------------------------------------

                 Key: PIG-129
                 URL: https://issues.apache.org/jira/browse/PIG-129
             Project: Pig
          Issue Type: Bug
            Reporter: Olga Natkovich
            Assignee: Amir Youssefi


Currently, pig creates temp data such is spilled bags in the directory 
specified by java.io.tmpdir. The problem is that this directory is usually 
shared by all tasks and can easily run out of space.

A better approach would be to create this files in the temp dir inside of the 
taks working directory as these locations usually have much mor space and also 
they can be hosted on different disks so the performance could be better.

There are 2 parts to this fix:

(1) in org.apache.pig.data.DataBag to check if the temp directory exists and 
create it if not before trying to create the temp file. This is somewhere 
around line 390 in the code.
(2) Change the mapred.child.java.opts in hadoop-site.xml to include new value 
for tmpdir property to point to ./tmp. For instance: 
<property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx1024M -Djava.io.tmpdir="./tmp"</value>
        <description>arguments passed to child jvms</description>
</property>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to