Alan Gates commented on PIG-49:
At this point I think there is no plan to fix this. We have implemented a
streaming interface for COGROUP, in which one of the input tables is streamed.
For plain GROUP BY queries, we are counting on the fact that most aggregate
UDFs are algebraic and can therefore use the combiner, so they do not need
this optimization. Unless there are objections, I'll mark this as Won't Fix.
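To illustrate why algebraic aggregates sidestep the in-memory bag problem, here is a minimal sketch, assuming a simple AVG. It is not Pig's `Algebraic` interface; the class `AlgebraicAvgSketch` and its methods `initial`, `intermediate` and `fin` are hypothetical names. The aggregate is split into an initial step (map side), a mergeable intermediate step (what a combiner would run) and a final step (reduce side), so each partial result is a constant-size (sum, count) pair instead of a full bag of values.

```java
import java.util.List;

/**
 * Hypothetical sketch (not Pig's API): AVG computed in three algebraic
 * stages, so a combiner can shrink each map task's output for a group
 * to a single small partial result.
 */
public class AlgebraicAvgSketch {

    /** Partial state: running sum and count. Constant size and mergeable. */
    record Partial(double sum, long count) {}

    /** Initial stage: summarize one chunk of a group's values (map side). */
    static Partial initial(List<Double> chunk) {
        double sum = 0;
        for (double v : chunk) sum += v;
        return new Partial(sum, chunk.size());
    }

    /** Intermediate stage: merge partials (combiner). Associative and commutative. */
    static Partial intermediate(List<Partial> partials) {
        double sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum();
            count += p.count();
        }
        return new Partial(sum, count);
    }

    /** Final stage: turn the merged state into the answer (reduce side). */
    static double fin(Partial p) {
        return p.count() == 0 ? Double.NaN : p.sum() / p.count();
    }

    public static void main(String[] args) {
        // Two hypothetical map tasks each see part of the same group.
        Partial a = initial(List.of(1.0, 2.0, 3.0));
        Partial b = initial(List.of(4.0, 5.0));
        System.out.println(fin(intermediate(List.of(a, b))));
    }
}
```

Because `intermediate` can be applied any number of times in any order, the reducer never has to hold the group's raw values, which is the property the comment relies on for GROUP BY.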
> optimize bag usage
> Key: PIG-49
> URL: https://issues.apache.org/jira/browse/PIG-49
> Project: Pig
> Issue Type: Improvement
> Reporter: Olga Natkovich
> (1) Currently, we always bring the entire bag into memory even though in most
> cases we just need to stream through it. This is very inefficient in terms of
> memory and CPU usage.
> (2) If we are doing multiple computations on the same group, we iterate over
> the bag that represents the group several times. This is very inefficient,
> especially for spilled bags.
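Both points in the description amount to replacing repeated, fully materialized bag access with a single streaming pass. The sketch below shows that idea under stated assumptions: a group is modeled as a plain `Iterator<Long>` rather than Pig's `DataBag`, and `SinglePassGroupSketch`, `Stats` and `aggregate` are hypothetical names. Several aggregates (count, sum, min, max) are computed while each value is read exactly once, keeping only constant-size state.

```java
import java.util.Iterator;
import java.util.stream.LongStream;

/**
 * Hypothetical sketch (not Pig's DataBag API): compute several aggregates
 * over one group in a single pass of an iterator, instead of loading the
 * whole bag into memory or re-reading it once per aggregate.
 */
public class SinglePassGroupSketch {

    /** Accumulated results; only constant-size state is kept. */
    record Stats(long count, long sum, long min, long max) {}

    static Stats aggregate(Iterator<Long> values) {
        long count = 0, sum = 0;
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        while (values.hasNext()) {
            long v = values.next(); // each value is consumed exactly once
            count++;
            sum += v;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new Stats(count, sum, min, max);
    }

    public static void main(String[] args) {
        // Values are produced by an iterator, standing in for a streamed
        // (possibly spilled) bag that is never held in memory as a whole.
        Iterator<Long> group = LongStream.rangeClosed(1, 1_000_000).boxed().iterator();
        System.out.println(aggregate(group));
    }
}
```

The trade-off is that all computations on the group must be known up front and fused into one pass; an aggregate that needs random access or sorting over the group could not be handled this way.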
This message is automatically generated by JIRA.