It should be possible to spill big databags to HDFS
---------------------------------------------------

                 Key: PIG-96
                 URL: https://issues.apache.org/jira/browse/PIG-96
             Project: Pig
          Issue Type: Improvement
          Components: data
            Reporter: Pi Song


Currently, databags are spilled only to local disk, which costs 2 disk I/O
operations. If databags are too big, this is not efficient.
We should take advantage of HDFS: if a databag is too big (determined by
DataBag.getMemorySize() > a large threshold), let's spill it to HDFS instead,
and read it back from HDFS in parallel when the data is required.
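
As a rough illustration of the proposed decision, here is a minimal sketch in
Java. It is not existing Pig code: the class SpillTargetChooser, the Target
enum, and the threshold parameter are hypothetical names made up for this
example. The only thing taken from the description above is that the size
estimate would come from DataBag.getMemorySize().

```java
// Hypothetical sketch of the spill-target decision; not part of Pig.
final class SpillTargetChooser {

    // Where a bag should be written when it has to be spilled.
    enum Target { LOCAL_DISK, HDFS }

    // Assumed configuration value: bags whose estimated size exceeds this
    // many bytes are spilled to HDFS instead of local disk.
    private final long hdfsThresholdBytes;

    SpillTargetChooser(long hdfsThresholdBytes) {
        if (hdfsThresholdBytes < 0) {
            throw new IllegalArgumentException("threshold must be >= 0");
        }
        this.hdfsThresholdBytes = hdfsThresholdBytes;
    }

    // bagMemorySizeBytes is expected to be the value reported by
    // DataBag.getMemorySize() for the bag that is about to be spilled.
    Target choose(long bagMemorySizeBytes) {
        return bagMemorySizeBytes > hdfsThresholdBytes
                ? Target.HDFS
                : Target.LOCAL_DISK;
    }
}
```

A caller would construct one chooser from the configured threshold and pass
it the bag's reported size at spill time. A bag exactly at the threshold
stays on local disk in this sketch, since the comparison is strict.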

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
