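TOP doesn't need to sort the whole relation; it can be computed in a streaming fashion over any collection, in O(n log k) time where k << n, rather than the O(n log n) of a full sort. Plus it's algebraic (associative): the top 10 of a set is the top 10 of the union of the top 10s of any covering collection of subsets, so partial results can be computed per subset and merged.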
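
To make the two points concrete, here is a rough sketch in Python rather than Pig. It is not how Pig's TOP is actually implemented; the names top_k, merge_top_k and first_per_group are made up for illustration. It keeps a bounded heap of k entries while streaming, and shows that merging per-subset results gives the same answer as running over the whole input.

```python
import heapq
from collections import defaultdict
from itertools import chain


def top_k(items, k, key=lambda x: x):
    """Return the k largest items by key, largest first, in a single pass."""
    if k <= 0:
        return []
    heap = []  # min-heap of (key, seq, item); never holds more than k entries
    for seq, item in enumerate(items):
        entry = (key(item), seq, item)  # seq breaks ties so items are never compared
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # New item beats the smallest one kept so far: swap it in, O(log k).
            heapq.heapreplace(heap, entry)
    return [item for _, _, item in sorted(heap, reverse=True)]


def merge_top_k(partials, k, key=lambda x: x):
    """Combine per-subset top-k lists into the overall top-k (the algebraic step)."""
    return top_k(chain.from_iterable(partials), k, key)


def first_per_group(rows):
    """Group on field 0 and keep the row with the smallest field 1 in each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    # Negating the key turns "largest" into "smallest".
    return {g: top_k(rs, 1, key=lambda r: -r[1])[0] for g, rs in groups.items()}


# Sample data from the original question in this thread.
data = [(1, 55, "abc"), (2, 23, "asd"), (1, 85, "xyz"), (1, 2, "aaa")]

by_second = lambda r: r[1]
whole = top_k(data, 2, key=by_second)
merged = merge_top_k(
    [top_k(data[:2], 2, key=by_second), top_k(data[2:], 2, key=by_second)],
    2,
    key=by_second,
)
assert whole == merged  # top-k of the partial top-ks equals top-k of everything
```

On the sample data, first_per_group(data) keeps (1, 2, "aaa") for group 1 and (2, 23, "asd") for group 2, which matches the output Tim asked for. The heap never grows past k entries, which is where the n log k comes from.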

On Apr 17, 2012, at 1:03 AM, Gianmarco De Francisci Morales <[email protected]> wrote:

> Hi Dmitriy,
>
> Can you explain which is the difference in the execution plan?
> And if there is a performance difference, shouldn't we try to fix it?
>
> Cheers,
> --
> Gianmarco
>
>
> On Tue, Apr 17, 2012 at 09:47, Dmitriy Ryaboy <[email protected]> wrote:
>
>> This works, but isn't the most efficient thing in the world.
>> Try using the TOP udf instead.
>> http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/builtin/TOP.html
>>
>> On Mon, Apr 16, 2012 at 5:22 PM, Russell Jurney
>> <[email protected]> wrote:
>>> Or even:
>>>
>>> ordered = foreach (group data by $0) { sorted = order data by $1; first = limit sorted 1; generate first; }
>>>
>>>
>>> Russell Jurney http://datasyndrome.com
>>>
>>> On Apr 16, 2012, at 4:03 PM, "Chan, Tim" <[email protected]> wrote:
>>>
>>>> Dear Gianmarco,
>>>>
>>>> It works great! Thanks.
>>>>
>>>> Tim
>>>> ________________________________________
>>>> From: Gianmarco De Francisci Morales [[email protected]]
>>>> Sent: Monday, April 16, 2012 1:43 PM
>>>> To: [email protected]
>>>> Subject: Re: ordering tuple after grouping
>>>>
>>>> Sure,
>>>> use a nested foreach.
>>>>
>>>> grouped = group data by $0;
>>>> ordered = foreach grouped {
>>>>   sorted = order data by $1;
>>>>   first = limit sorted 1;
>>>>   generate first;
>>>> }
>>>>
>>>> Beware, untested code.
>>>>
>>>> Cheers,
>>>> --
>>>> Gianmarco
>>>>
>>>>
>>>> On Mon, Apr 16, 2012 at 22:31, Chan, Tim <[email protected]> wrote:
>>>>
>>>>> Given data:
>>>>>
>>>>> (1, 55, abc)
>>>>> (2, 23, asd)
>>>>> (1, 85, xyz)
>>>>> (1, 2, aaa)
>>>>>
>>>>>
>>>>> I would like to group on $0 and then have my grouped tuple be ordered by
>>>>> $1. Is this possible?
>>>>>
>>>>> The output should look like this:
>>>>>
>>>>> (1, {(1, 2, aaa),(1,55,abc),(1,85,xyz)})
>>>>> (2, {(2,23,asd)})
>>>>>
>>>>>
>>>>> Then I would like to keep the first tuple for every group.
>>>>>
>>>>> For example:
>>>>>
>>>>> (1,2,aaa)
>>>>> (2,23,asd)
>>>>>
