SIZE() always leads to 1 reducer?

Yang Thu, 11 Apr 2013 15:14:20 -0700

I set default_parallel=15, but when I run the following:


y = group z ALL;
x = foreach y generate SIZE(z);

these two lines generate an MR job with only 1 reducer.


I guess it's because SIZE() needs to count across all the groups, but don't
we have some sort of cumulative/additive UDFs for this?


it would be faster if we could parallelize SIZE()
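
for comparison, here is a sketch of the additive variant I have in mind,
assuming all I need is the number of tuples in z. COUNT_STAR is one of Pig's
algebraic builtins, so partial counts can be computed on the map side in the
combiner. As far as I can tell SIZE() is not algebraic, so the whole bag has to
reach the reducer. Note that GROUP ... ALL produces a single group either way,
so I would still expect 1 reducer; it would just be summing a few partial
counts instead of walking the full bag.

```pig
-- assumes z is an already-loaded relation, as in the snippet above
y = GROUP z ALL;
-- COUNT_STAR counts every tuple in the bag, including ones with null fields,
-- which matches what SIZE(z) returns for a bag
x = FOREACH y GENERATE COUNT_STAR(z);
```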

thanks
Yang
