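-- Load the raw log; the default PigStorage() loader splits fields on tabs, so adjust it to match your log format.
a = load 'log' as (country:chararray, search_term:chararray);
-- Count how often each (country, search_term) pair occurs.
b = foreach (group a by (country, search_term)) generate flatten(group) as (country, search_term), COUNT(a) as ct;
-- Sort by country, then by count so the most frequent term comes first.
c = order b by country asc, ct desc;
dump c;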
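If you want the exact shape from the question, with one row per country and its terms in descending order of count, a variant of the script above can regroup the counts by country and sort each group inside a nested foreach. This is a sketch under assumptions: it assumes each log line is literally "country, search_term" (comma plus a space, as in the sample), which is why it uses PigStorage(',') and TRIM; the relation names are illustrative.

```pig
-- Assumption: log lines look like "us, foo", so split on commas.
raw = load 'log' using PigStorage(',') as (country:chararray, search_term:chararray);
-- Assumption: strip the space that follows the comma in the sample data.
a = foreach raw generate TRIM(country) as country, TRIM(search_term) as search_term;
-- Count each (country, search_term) pair, as in the original script.
b = foreach (group a by (country, search_term)) generate flatten(group) as (country, search_term), COUNT(a) as ct;
-- Regroup the counts by country.
c = group b by country;
-- Inside each country's group, sort the terms by count and keep only the term names.
d = foreach c {
    sorted = order b by ct desc;
    generate group as country, sorted.search_term as terms;
};
dump d;
```

For the sample data, each row should come out roughly as (us,{(foo),(bar)}) and (fr,{(fizz),(baz)}), though the order of the country rows themselves is not guaranteed. Keep in mind that each country's terms are sorted within a single reduce task, so one country with a very large number of distinct terms is handled by one task.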
This gives you every country in ascending alphabetical order, followed by each search term and its count, starting with the most frequent.

It depends on what format you want the output in, though, so let me know if you need something different.

Note: if you know that the number of distinct search terms is small, you could do this in memory in a single MapReduce job, but this version will scale. If any of this doesn't make sense, I can help explain it.

2012/5/10 Mark wrote:
> We have logs in the following format
>
> us, foo
> us, foo
> fr, fizz
> us, bar
> fr, baz
> fr, fizz
> us, foo
> fr, fizz
>
> Where the first column is a country and the second column is a search term.
>
> How in the world can I output the country followed by the top terms in
> order of occurrence... ie:
>
> us, (foo, bar) # Top term for 'us' is foo then bar then ...
> fr, (fizz, baz) # Top term for 'fr' is fizz then baz then ...
>
> Thanks
