Re: nested FOREACH statements

Ruslan Al-Fakikh Tue, 25 Jun 2013 03:10:50 -0700

Hi!

I haven't tried this script, but here is an idea:
flattened = FOREACH data2 GENERATE group AS initialGroup, FLATTEN(data1);
grouped = GROUP flattened BY (initialGroup, lt, ln);
counted = FOREACH grouped GENERATE group AS wholeGroup, COUNT(flattened)
AS aCount;
groupedAgain = GROUP counted BY wholeGroup.initialGroup;
-- TOP takes the number of tuples to keep, the zero-based index of the
-- column to compare, and the bag; column 1 of counted is aCount.
maximums = FOREACH groupedAgain GENERATE group, TOP(1, 1, counted);
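
If you would rather stay close to your original plan (order DESC, then limit 1), a nested ORDER and LIMIT inside the FOREACH block should also work, since those two operators are allowed there even though GROUP is not. This is an untested sketch that reuses groupedAgain, counted and aCount from the script above; the aliases sorted, top1 and mostFrequent are placeholder names I made up:

```pig
-- For each initial group, sort its (wholeGroup, aCount) tuples by count
-- and keep only the first one.
mostFrequent = FOREACH groupedAgain {
    sorted = ORDER counted BY aCount DESC;
    top1 = LIMIT sorted 1;
    GENERATE group AS initialGroup, FLATTEN(top1);
};
```

If two (lt, ln) pairs tie for the highest count within a group, LIMIT 1 keeps just one of them, and which one is not something I would rely on.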


Also, which version of Pig are you using? I haven't tried it myself, but I
know that two levels of nesting are possible. See the "Nested Cross/Foreach"
section of:
http://hortonworks.com/blog/new-features-in-apache-pig-0-10/

Hope that helps
Ruslan Al-Fakikh






On Fri, Jun 21, 2013 at 7:09 PM, Adamantios Corais <
[email protected]> wrote:

> It seems that group is not supported in nested FOREACH statements. I have
> the following schema:
>
> data2: {group: chararray,data1: {(lt: chararray,ln: chararray)}}
>
> on which I want to flatten data1, group all pairs of (lt, ln), count, order
> DESC, and finally limit 1.
>
> The idea is to extract the most probable pair of (lt, ln) for each group.
> How would you recommend me to do that?
>
