[ https://issues.apache.org/jira/browse/PIG-96?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich resolved PIG-96.
-------------------------------

    Resolution: Fixed

Based on the discussion I don't see a reason to spill to DFS.

> It should be possible to spill big databags to HDFS
> ---------------------------------------------------
>
>                 Key: PIG-96
>                 URL: https://issues.apache.org/jira/browse/PIG-96
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>            Reporter: Pi Song
>
> Currently databags only get spilled to local disk, which costs 2 disk I/O
> operations. If databags are too big, this is not efficient.
> We should take advantage of HDFS so if the databag is too big (determined by
> DataBag.getMemorySize() > a big threshold), let's spill it to HDFS. Also
> read from HDFS in parallel when data is required.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.