You might want to use the TOP UDF, which is more efficient for the same task (as I was taught on this list :).

http://pig.apache.org/docs/r0.10.0/func.html#topx
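
For reference, here is a minimal sketch of both approaches. It is untested against your data and assumes a relation I am calling inventory, loaded from a hypothetical path, with the inventory count field named cnt; adjust the names and schema to match yours.

```pig
-- Assumed schema: (city, {(productId, cnt)}). The path and all names are made up.
inventory = LOAD 'inventory' AS (
    city:chararray,
    products:bag{t:tuple(productId:chararray, cnt:long)}
);

-- Option 1: nested FOREACH, ORDER BY then LIMIT inside each city's bag.
top3_sorted = FOREACH inventory {
    sorted = ORDER products BY cnt DESC;
    top    = LIMIT sorted 3;
    GENERATE city, top;
};

-- Option 2: the built-in TOP UDF. Arguments are N, the 0-based index of
-- the column to compare (1 = cnt here), and the bag.
top3_udf = FOREACH inventory GENERATE city, TOP(3, 1, products);
```

One caveat: as far as I know, TOP only promises to return the N largest tuples, not to return them in sorted order, so if you need highest-to-lowest output you may still want the ORDER BY variant.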
Cheers,
--
Gianmarco

On Wed, May 9, 2012 at 7:39 PM, James Newhaven <[email protected]> wrote:

> Ok, figured out the nested foreach. Thanks for your help.
>
> Regards,
> James
>
> On Wed, May 9, 2012 at 5:33 PM, James Newhaven <[email protected]> wrote:
>
> > Thanks Steve,
> >
> > Yes I did discover nested foreach, but I can't get the syntax right. Can
> > anyone help get me started on how it's meant to look?
> >
> > Regards,
> > James
> >
> > On Wed, May 9, 2012 at 4:55 PM, Steve Bernstein <[email protected]> wrote:
> >
> >> You can. Check out nested Foreach, order by then limit. (see, for
> >> example,
> >> http://ofps.oreilly.com/titles/9781449302641/advanced_pig_latin.html).
> >>
> >> _____________
> >> Steve Bernstein
> >> VP, Analytics
> >> Rearden Commerce, Inc.
> >>
> >> +1.408.499.0961 Mobile
> >>
> >> deem.com | reardencommerce.com
> >>
> >> -----Original Message-----
> >> From: James Newhaven [mailto:[email protected]]
> >> Sent: Wednesday, May 09, 2012 4:57 AM
> >> To: [email protected]
> >> Subject: Ordering and limiting Tuples inside a Bag
> >>
> >> Hi,
> >>
> >> Another newbie Pig question.
> >>
> >> If I have a relation with a structure like this: (city, { (productId,
> >> count), (product, count) }).
> >>
> >> This relation tracks counts of products for each city. So a tuple
> >> containing a city name and then a bag of products each with an inventory
> >> count.
> >>
> >> Is it possible in pig, to extract only the top 3 products with the
> >> highest counts for each city, ordered from highest to lowest?
> >>
> >> Ideally, I would like the output to be like this:
> >>
> >> (New York City, ((apples, 50), (oranges, 34), (pears, 23)))
> >> (Another City, ((oranges, 52), (pears, 32), (apples, 12)))
> >>
> >> Thanks,
> >> James
