It looks like the combiner optimization currently does not kick in for a script like this:
    data = load 'foo' using PigStorage() as (a, b, c);
    grouped = group data by a;
    filtered = filter grouped by COUNT(data) < 1000;

Looking at the code in CombinerOptimizer, it seems the Filter case is only sketched as pseudo-code in comments. Are there complications beyond what is already noted there, or is it just a matter of coding up the pseudo-code?

On that note: assuming the optimization were implemented for a Filter following a group, would it automagically start working for Splits as well?

-D
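To make the Filter question concrete, here is a minimal plain-Java simulation of what a combiner-enabled plan for the script above would compute. It is not Pig code, and the class and method names are hypothetical. It assumes only the group key and its count are needed after the filter; the script as written still carries the whole data bag in filtered, and the simulation ignores that. The map side emits one key per record, the combiner sums counts per key within each map task's output, and the reducer sums the partial counts and only then applies the COUNT(data) < 1000 predicate. The predicate cannot be pushed into the combiner itself, because a partial count below the threshold says nothing about the final count.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerFilterSketch {
    // Combiner (hypothetical): sums per-key counts within one map task's output.
    static Map<String, Long> combine(List<String> mapOutputKeys) {
        Map<String, Long> partial = new HashMap<>();
        for (String key : mapOutputKeys) {
            partial.merge(key, 1L, Long::sum);
        }
        return partial;
    }

    // Reducer (hypothetical): sums partial counts across map tasks, then keeps
    // only keys whose FINAL count is below the threshold, mirroring
    // "filter grouped by COUNT(data) < threshold".
    static Map<String, Long> reduceAndFilter(List<Map<String, Long>> partials, long threshold) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((key, count) -> totals.merge(key, count, Long::sum));
        }
        totals.values().removeIf(count -> count >= threshold);
        return totals;
    }

    public static void main(String[] args) {
        // Two hypothetical map tasks, each emitting the group key 'a' of its records.
        List<String> split1 = List.of("x", "x", "y");
        List<String> split2 = List.of("x", "z");
        Map<String, Long> result = reduceAndFilter(List.of(combine(split1), combine(split2)), 3);
        System.out.println(result);
    }
}
```

The threshold is lowered to 3 so the toy input exercises it, and main prints {y=1, z=1}. Key x is dropped even though each map task saw fewer than 3 occurrences of it (2 and 1), which is why the filter has to run after the reduce-side sum rather than inside the combiner.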