[ https://issues.apache.org/jira/browse/PIG-49?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665624#action_12665624 ]
Alan Gates commented on PIG-49:
-------------------------------

At this point I think there is no plan to fix this. We have implemented a streaming interface for cogroup (one of the tables is streamed). For straight group by queries we are counting on the fact that most aggregate UDFs are algebraic and can use the combiner, and thus do not need this.

Unless I see any objections I'll mark this as won't fix.

> optimize bag usage
> ------------------
>
>                 Key: PIG-49
>                 URL: https://issues.apache.org/jira/browse/PIG-49
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>
> (1) Currently, we always bring the entire bag into memory even though in most cases we just need to stream through it. This is very inefficient in terms of memory and CPU usage.
> (2) If we are doing multiple computations on the same group, we iterate over the bag that represents the group several times. This is very inefficient especially for spilled bags.
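
For readers following the thread, the point about algebraic aggregates can be sketched roughly as follows. An aggregate is algebraic when it can be computed from partial results, so a combiner can reduce each chunk of a group to a small partial state and the whole bag never has to sit in memory. This is plain Python for illustration only, not Pig's UDF API; the names `initial`, `combine`, `final`, and `algebraic_avg` are hypothetical.

```python
from typing import Iterable, Tuple

# Partial state for an average: (running sum, running count).
Partial = Tuple[float, int]


def initial(chunk: Iterable[float]) -> Partial:
    """Fold one chunk of a group into a partial (sum, count).

    The chunk is consumed as a stream; no values are buffered.
    """
    total, count = 0.0, 0
    for value in chunk:
        total += value
        count += 1
    return total, count


def combine(partials: Iterable[Partial]) -> Partial:
    """Merge partial states; this is the step a combiner could run."""
    total, count = 0.0, 0
    for part_sum, part_count in partials:
        total += part_sum
        count += part_count
    return total, count


def final(partial: Partial) -> float:
    """Turn the merged partial state into the final average."""
    total, count = partial
    if count == 0:
        raise ValueError("cannot average an empty group")
    return total / count


def algebraic_avg(chunks: Iterable[Iterable[float]]) -> float:
    """Average a group that arrives as several chunks, one pass per chunk."""
    return final(combine(initial(chunk) for chunk in chunks))


if __name__ == "__main__":
    # The same group split into three chunks, as if seen by three map tasks.
    print(algebraic_avg([[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]))
```

How the group is split into chunks does not change the result, which is what makes pushing the work into a combiner safe. An aggregate that needs every value at once (an exact median, for example) cannot be decomposed this way and would still need the full bag.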