Hi Dmitriy,

Can you explain what the difference in the execution plan is?
And if there is a performance difference, shouldn't we try to fix it?

Cheers,
--
Gianmarco


On Tue, Apr 17, 2012 at 09:47, Dmitriy Ryaboy <[email protected]> wrote:

> This works, but isn't the most efficient thing in the world.
> Try using the TOP udf instead.
> http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/builtin/TOP.html
>
> On Mon, Apr 16, 2012 at 5:22 PM, Russell Jurney <[email protected]> wrote:
> > Or even:
> >
> > ordered = foreach (group data by $0) { sorted = order data by $1; first = limit sorted 1; generate first; }
> >
> >
> > Russell Jurney http://datasyndrome.com
> >
> > On Apr 16, 2012, at 4:03 PM, "Chan, Tim" <[email protected]> wrote:
> >
> >> Dear Gianmarco,
> >>
> >> It works great! Thanks.
> >>
> >> Tim
> >> ________________________________________
> >> From: Gianmarco De Francisci Morales [[email protected]]
> >> Sent: Monday, April 16, 2012 1:43 PM
> >> To: [email protected]
> >> Subject: Re: ordering tuple after grouping
> >>
> >> Sure,
> >> use a nested foreach.
> >>
> >> grouped = group data by $0;
> >> ordered = foreach grouped {
> >>     sorted = order data by $1;
> >>     first = limit sorted 1;
> >>     generate first;
> >> }
> >>
> >> Beware, untested code.
> >>
> >> Cheers,
> >> --
> >> Gianmarco
> >>
> >>
> >>
> >> On Mon, Apr 16, 2012 at 22:31, Chan, Tim <[email protected]> wrote:
> >>
> >>> Given data:
> >>>
> >>> (1, 55, abc)
> >>> (2, 23, asd)
> >>> (1, 85, xyz)
> >>> (1, 2, aaa)
> >>>
> >>>
> >>> I would like to group on $0 and then have my grouped tuple be ordered by
> >>> $1. Is this possible?
> >>>
> >>> The output should look like this:
> >>>
> >>> (1, {(1, 2, aaa),(1,55,abc),(1,85,xyz)})
> >>> (2, {(2,23,asd)})
> >>>
> >>>
> >>> Then I would like to keep the first tuple for every group.
> >>>
> >>> For example:
> >>>
> >>> (1,2,aaa)
> >>> (2,23,asd)
> >>>
> >>>
> >>>
> >>
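
To make the question about the two approaches concrete, here is a rough sketch in plain Python (not Pig, and not a description of Pig's actual execution plan). It uses the sample data from the thread and compares "sort each group, then take the first tuple" with "scan once and keep only the smallest tuple per group". The function names first_by_sort and first_by_scan are made up for this illustration.

```python
from collections import defaultdict

# Sample tuples from the thread: (group key, sort field, payload).
data = [(1, 55, "abc"), (2, 23, "asd"), (1, 85, "xyz"), (1, 2, "aaa")]


def first_by_sort(rows):
    """Mimic group + order + limit 1: build every bag, sort it fully, keep the head."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    # Sorting a bag of n tuples costs O(n log n) even though only one tuple is kept.
    return {key: sorted(bag, key=lambda r: r[1])[0] for key, bag in groups.items()}


def first_by_scan(rows):
    """Single pass: remember only the tuple with the smallest second field per key."""
    best = {}
    for row in rows:
        key = row[0]
        if key not in best or row[1] < best[key][1]:
            best[key] = row
    return best


result_sort = first_by_sort(data)
result_scan = first_by_scan(data)
print(result_sort)
print(result_scan)
```

Both functions return the same tuple for each key on this data. The second one never materializes or sorts a whole group, which is the general idea behind a top-n style aggregate. Whether Pig's planner treats the nested order/limit differently from the TOP UDF is exactly the open question above, so this only models the semantics.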
