[
https://issues.apache.org/jira/browse/PIG-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12574019#action_12574019
]
Pi Song commented on PIG-129:
-----------------------------
I think the concept of a multi-dir temp file creator (LocalDirAllocator in
Hadoop) should be adopted in Pig. What it does:
- You can set up a set of temp file dirs in the configuration (they can be on
different physical drives, so you can utilize more disk space)
- When a temp file is being created, the system probes the configured temp
dirs in round-robin fashion
- For a selected temp dir, if it exists and you have permission to write, the
temp file is created there
- For a selected temp dir, if it doesn't exist or you don't have permission to
write, the temp dir is put on a blacklist and is not used later on
- For the next temp file, move on to the next temp dir
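
The steps above could be sketched roughly as follows. This is only an
illustration of the round-robin-plus-blacklist idea, not Hadoop's
LocalDirAllocator and not existing Pig code; the class name
RoundRobinTempDirs and its methods are made up for this example, and the
list of dirs is assumed to come from some configuration property.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: rotates temp file creation over a configured set of
// dirs and blacklists any dir that is missing or not writable.
final class RoundRobinTempDirs {
    private final List<File> dirs = new ArrayList<>();
    private final Set<File> blacklist = new HashSet<>();
    private int next = 0; // index of the dir to probe first on the next call

    RoundRobinTempDirs(List<String> configuredDirs) {
        for (String d : configuredDirs) {
            dirs.add(new File(d));
        }
    }

    // Note: File.createTempFile requires a prefix of at least 3 characters.
    synchronized File createTempFile(String prefix, String suffix) throws IOException {
        // Probe each configured dir at most once per call.
        for (int probes = 0; probes < dirs.size(); probes++) {
            File dir = dirs.get(next);
            next = (next + 1) % dirs.size(); // the next file starts at the next dir
            if (blacklist.contains(dir)) {
                continue; // known bad, skip without touching the disk
            }
            if (dir.isDirectory() && dir.canWrite()) {
                return File.createTempFile(prefix, suffix, dir);
            }
            blacklist.add(dir); // missing or not writable: do not use it again
        }
        throw new IOException("No usable temp dir among " + dirs);
    }

    // Copy of the current blacklist, mainly useful for logging or tests.
    synchronized Set<File> blacklisted() {
        return new HashSet<>(blacklist);
    }
}
```

With dirs A, B and C configured, where B does not exist, three consecutive
calls would place files in A, C and A, and B would be blacklisted after the
second call. Once a dir is blacklisted in this sketch it is never retried;
whether that is desirable (e.g. for a disk that comes back later) is a
design choice.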
> need to create temp files in the task's working directory
> ---------------------------------------------------------
>
> Key: PIG-129
> URL: https://issues.apache.org/jira/browse/PIG-129
> Project: Pig
> Issue Type: Bug
> Reporter: Olga Natkovich
> Assignee: Amir Youssefi
>
> Currently, Pig creates temp data such as spilled bags in the directory
> specified by java.io.tmpdir. The problem is that this directory is usually
> shared by all tasks and can easily run out of space.
> A better approach would be to create these files in the temp dir inside the
> task's working directory, as these locations usually have much more space and
> can also be hosted on different disks, so the performance could be better.
> There are 2 parts to this fix:
> (1) In org.apache.pig.data.DataBag, check whether the temp directory exists
> and create it if not, before trying to create the temp file. This is
> somewhere around line 390 in the code.
> (2) Change mapred.child.java.opts in hadoop-site.xml to include a new value
> for the java.io.tmpdir property that points to ./tmp. For instance:
> <property>
> <name>mapred.child.java.opts</name>
> <value>-Xmx1024M -Djava.io.tmpdir="./tmp"</value>
> <description>arguments passed to child jvms</description>
> </property>