It looks like the combiner optimization currently does not kick in for a
script like this:

data = load 'foo' using PigStorage() as (a, b, c);
grouped = group data by a;
filtered = filter grouped by COUNT(data) < 1000;
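
For comparison, here is a sketch of the shape where I would expect the
combiner to apply, assuming the usual rule that a foreach directly after
a group which only calls algebraic functions (such as COUNT) is
combinable. The aliases "counts" and "small" and the field name "n" are
made up for illustration. Note that this is not equivalent to the script
above: it keeps only the group key and the count, not the grouped bags.

data = load 'foo' using PigStorage() as (a, b, c);
grouped = group data by a;
counts = foreach grouped generate group, COUNT(data) as n;
small = filter counts by n < 1000;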

Looking at the code in CombinerOptimizer, it seems the Filter case is only
pseudo-coded in comments. Are there complications there other than what is
already noted, or is it just a matter of coding up the pseudo-code?

On that note -- assuming the optimization was implemented for a Filter
following a group, would it automagically start working for Splits as well?

-D
