Hi! I haven't tried this script, but here is an idea:

flattened = FOREACH data2 GENERATE group AS initialGroup, FLATTEN(data1);
grouped = GROUP flattened BY (initialGroup, lt, ln);
counted = FOREACH grouped GENERATE group AS wholeGroup, COUNT(flattened) AS aCount;
groupedAgain = GROUP counted BY wholeGroup.initialGroup;
-- TOP(topN, column, bag): keep the 1 tuple with the largest value in
-- column 1 (aCount; column indexes are zero-based, so 0 is wholeGroup)
maximums = FOREACH groupedAgain GENERATE group, TOP(1, 1, counted);
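If you'd rather keep your original plan (count, ORDER DESC, LIMIT 1) instead of using TOP, here is an equally untested sketch. A nested FOREACH doesn't accept GROUP, but it does accept ORDER and LIMIT, so the grouping is done at the top level and only the sorting and limiting happen inside the nested block. I'm assuming data2 has exactly the schema from your message; the aliases (flattened, counted, byInitial, maximums) are just illustrative names.

```pig
-- Assumes: data2: {group: chararray, data1: {(lt: chararray, ln: chararray)}}
-- 1. One row per (group, lt, ln) occurrence.
flattened = FOREACH data2 GENERATE group AS initialGroup, FLATTEN(data1) AS (lt, ln);

-- 2. Count how often each (lt, ln) pair occurs within each initial group.
grouped = GROUP flattened BY (initialGroup, lt, ln);
counted = FOREACH grouped GENERATE
              FLATTEN(group) AS (initialGroup, lt, ln),
              COUNT(flattened) AS aCount;

-- 3. Regroup by the initial group (GROUP is not allowed inside the nested block).
byInitial = GROUP counted BY initialGroup;

-- 4. Inside the nested FOREACH, sort each group's pairs by count and keep one.
maximums = FOREACH byInitial {
               sorted = ORDER counted BY aCount DESC;
               best = LIMIT sorted 1;
               GENERATE FLATTEN(best);
           };

DUMP maximums;
```

Each output row should be (initialGroup, lt, ln, aCount) for the most frequent pair in that group. If two pairs tie on the count, LIMIT 1 keeps an arbitrary one of them.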
Also, which version of Pig are you using? I haven't tried it, but I know that since Pig 0.10 a nested FOREACH can have two levels of nesting: http://hortonworks.com/blog/new-features-in-apache-pig-0-10/ (see "Nested Cross/Foreach").

Hope that helps,
Ruslan Al-Fakikh

On Fri, Jun 21, 2013 at 7:09 PM, Adamantios Corais <[email protected]> wrote:

> It seems that group is not supported in nested FOREACH statements. I have
> the following schema:
>
> data2: {group: chararray,data1: {(lt: chararray,ln: chararray)}}
>
> on which I want to flatten data1, group all pairs of (lt, ln), count, order
> DESC, and finally limit 1.
>
> The idea is to extract the most probable pair of (lt, ln) for each group.
> How would you recommend me to do that?
