I see, I hadn't got your suggestion. You meant replacing both ORDER and LIMIT with TOP. Makes sense, thanks.
Cheers,
--
Gianmarco


On Tue, Apr 17, 2012 at 11:50, Dmitriy Ryaboy <[email protected]> wrote:

> Top doesn't need to sort the whole relation; it can be done in a streaming
> fashion over any collection (n log k, where k << n). Plus it's algebraic
> (associative), since top 10 of a set is top 10 of all the top 10s of a
> covering collection of subsets.
>
> On Apr 17, 2012, at 1:03 AM, Gianmarco De Francisci Morales
> <[email protected]> wrote:
>
> > Hi Dmitriy,
> >
> > Can you explain which is the difference in the execution plan?
> > And if there is a performance difference, shouldn't we try to fix it?
> >
> > Cheers,
> > --
> > Gianmarco
> >
> >
> > On Tue, Apr 17, 2012 at 09:47, Dmitriy Ryaboy <[email protected]> wrote:
> >
> >> This works, but isn't the most efficient thing in the world.
> >> Try using the TOP udf instead.
> >> http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/builtin/TOP.html
> >>
> >> On Mon, Apr 16, 2012 at 5:22 PM, Russell Jurney
> >> <[email protected]> wrote:
> >>> Or even:
> >>>
> >>> ordered = foreach (group data by $0) { sorted = order data by $1; first = limit sorted 1; generate first; }
> >>>
> >>>
> >>> Russell Jurney http://datasyndrome.com
> >>>
> >>> On Apr 16, 2012, at 4:03 PM, "Chan, Tim" <[email protected]> wrote:
> >>>
> >>>> Dear Gianmarco,
> >>>>
> >>>> It works great! Thanks.
> >>>>
> >>>> Tim
> >>>> ________________________________________
> >>>> From: Gianmarco De Francisci Morales [[email protected]]
> >>>> Sent: Monday, April 16, 2012 1:43 PM
> >>>> To: [email protected]
> >>>> Subject: Re: ordering tuple after grouping
> >>>>
> >>>> Sure,
> >>>> use a nested foreach.
> >>>>
> >>>> grouped = group data by $0;
> >>>> ordered = foreach grouped {
> >>>>     sorted = order data by $1;
> >>>>     first = limit sorted 1;
> >>>>     generate first;
> >>>> }
> >>>>
> >>>> Beware, untested code.
> >>>>
> >>>> Cheers,
> >>>> --
> >>>> Gianmarco
> >>>>
> >>>>
> >>>> On Mon, Apr 16, 2012 at 22:31, Chan, Tim <[email protected]> wrote:
> >>>>
> >>>>> Given data:
> >>>>>
> >>>>> (1, 55, abc)
> >>>>> (2, 23, asd)
> >>>>> (1, 85, xyz)
> >>>>> (1, 2, aaa)
> >>>>>
> >>>>>
> >>>>> I would like to group on $0 and then have my grouped tuple be ordered by
> >>>>> $1. Is this possible?
> >>>>>
> >>>>> The output should look like this:
> >>>>>
> >>>>> (1, {(1, 2, aaa),(1,55,abc),(1,85,xyz)})
> >>>>> (2, {(2,23,asd)})
> >>>>>
> >>>>>
> >>>>> Then I would like to keep the first tuple for every group.
> >>>>>
> >>>>> For example:
> >>>>>
> >>>>> (1,2,aaa)
> >>>>> (2,23,asd)
> >>>>>
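
Dmitriy's two points in the thread (a top-k can be computed in a streaming fashion in n log k, and it is algebraic) can be illustrated outside Pig. The following is only a rough Python sketch of the idea, not Pig's TOP implementation: the function name `top_k` and its parameters are made up for illustration, and it keeps the largest values, so keeping the smallest (as in Tim's original question) would mean flipping the comparison.

```python
import heapq
from itertools import chain


def top_k(rows, k, col):
    """Return the k rows with the largest value in column `col`.

    A min-heap of at most k entries is kept, so the input is consumed as a
    stream in O(n log k) time and O(k) memory, with no full sort.
    """
    heap = []  # entries are (key, arrival index, row); smallest key at heap[0]
    for i, row in enumerate(rows):
        entry = (row[col], i, row)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # The new row beats the weakest row kept so far: swap it in.
            heapq.heapreplace(heap, entry)
    # Only the k survivors are sorted, largest key first.
    return [row for _, _, row in sorted(heap, reverse=True)]


# Sample rows taken from the original question in this thread.
rows = [(1, 55, "abc"), (2, 23, "asd"), (1, 85, "xyz"), (1, 2, "aaa")]

# Algebraic property: the top k of the per-partition top k's equals the
# top k of the whole input, so partial results can be combined early.
partitions = [rows[:2], rows[2:]]
partials = [top_k(part, 2, 1) for part in partitions]
merged = top_k(chain.from_iterable(partials), 2, 1)
direct = top_k(rows, 2, 1)
```

Here `merged` and `direct` hold the same two rows, which is the property that lets partial top-k results be computed on subsets before the final merge. The ORDER plus LIMIT version, by contrast, is written as a sort of each whole group followed by keeping one row.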
