It should be possible to spill big databags to HDFS
---------------------------------------------------
Key: PIG-96
URL: https://issues.apache.org/jira/browse/PIG-96
Project: Pig
Issue Type: Improvement
Components: data
Reporter: Pi Song
Currently, databags are spilled only to local disk, which costs two disk I/O
operations. If a databag is very large, this is not efficient.
We should take advantage of HDFS: if a databag is too big (that is,
DataBag.getMemorySize() exceeds a large threshold), spill it to HDFS instead,
and read it back from HDFS in parallel when the data is required.
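
As a rough illustration of the proposed decision, here is a minimal sketch.
It assumes a single size threshold in bytes, compared against the value
returned by DataBag.getMemorySize(). SpillPolicy, Target, chooseTarget and
hdfsThresholdBytes are hypothetical names made up for this example; none of
them exist in Pig, and the actual HDFS write and parallel read are left out.

```java
// Hypothetical sketch only: SpillPolicy, Target and hdfsThresholdBytes are
// illustrative names, not part of Pig.
final class SpillPolicy {

    /** Where a databag that has to be spilled should go. */
    enum Target { LOCAL_DISK, HDFS }

    // Assumed tunable: bags strictly larger than this many bytes go to HDFS.
    private final long hdfsThresholdBytes;

    SpillPolicy(long hdfsThresholdBytes) {
        if (hdfsThresholdBytes < 0) {
            throw new IllegalArgumentException("threshold must be non-negative");
        }
        this.hdfsThresholdBytes = hdfsThresholdBytes;
    }

    /**
     * Picks a spill target from the bag's in-memory size, i.e. the value
     * the caller would obtain from DataBag.getMemorySize().
     */
    Target chooseTarget(long memorySizeBytes) {
        return memorySizeBytes > hdfsThresholdBytes
                ? Target.HDFS
                : Target.LOCAL_DISK;
    }
}
```

With such a policy, a caller would pass bag.getMemorySize() to chooseTarget()
before spilling. Small bags keep the existing local-disk path and only bags
above the threshold take the HDFS path.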