TOP doesn't need to sort the whole relation; it can run in a streaming
fashion over any collection, keeping only the best k items seen so far
(O(n log k) time, where k << n). It's also algebraic (associative): the top 10
of a set is the top 10 of the top 10s of any collection of subsets that covers
it.
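
To make both points concrete, here is a rough Python sketch (not Pig's actual
TOP implementation; the top_k function and its arguments are names I made up
for illustration). It keeps a min-heap of at most k entries while streaming
over the input, then checks that taking the top k of per-chunk top-k results
gives the same answer as taking the top k of everything at once:

```python
import heapq

def top_k(records, k, key):
    """Return the k records with the largest key, largest first.

    Hypothetical helper for illustration: streams over `records` once and
    keeps a min-heap of at most k entries, so it costs O(n log k) time.
    """
    if k <= 0:
        return []
    heap = []  # entries are (key, seq, record); seq breaks ties between keys
    for seq, rec in enumerate(records):
        entry = (key(rec), seq, rec)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # New record beats the smallest one kept so far; swap it in.
            heapq.heapreplace(heap, entry)
    return [rec for _, _, rec in sorted(heap, reverse=True)]

# Sample tuples from the original question, ranked by the second field.
data = [(1, 55, "abc"), (2, 23, "asd"), (1, 85, "xyz"), (1, 2, "aaa")]
by_second = lambda r: r[1]

whole = top_k(data, 2, by_second)

# Algebraic property: top k of the per-chunk top k equals top k of the whole.
chunks = [data[:2], data[2:]]
partials = [rec for chunk in chunks for rec in top_k(chunk, 2, by_second)]
merged = top_k(partials, 2, by_second)
```

This is the property that lets the partial results be computed on separate
chunks and merged afterwards, instead of shipping and sorting every tuple.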

On Apr 17, 2012, at 1:03 AM, Gianmarco De Francisci Morales <[email protected]> 
wrote:

> Hi Dmitriy,
> 
> Can you explain what the difference in the execution plan is?
> And if there is a performance difference, shouldn't we try to fix it?
> 
> Cheers,
> --
> Gianmarco
> 
> 
> 
> On Tue, Apr 17, 2012 at 09:47, Dmitriy Ryaboy <[email protected]> wrote:
> 
>> This works, but isn't the most efficient thing in the world.
>> Try using the TOP udf instead.
>> http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/builtin/TOP.html
>> 
>> On Mon, Apr 16, 2012 at 5:22 PM, Russell Jurney
>> <[email protected]> wrote:
>>> Or even:
>>> 
>>> ordered = foreach (group data by $0) { sorted = order data by $1; first = limit sorted 1; generate first; }
>>> 
>>> 
>>> Russell Jurney http://datasyndrome.com
>>> 
>>> On Apr 16, 2012, at 4:03 PM, "Chan, Tim" <[email protected]> wrote:
>>> 
>>>> Dear Gianmarco,
>>>> 
>>>> It works great! Thanks.
>>>> 
>>>> Tim
>>>> ________________________________________
>>>> From: Gianmarco De Francisci Morales [[email protected]]
>>>> Sent: Monday, April 16, 2012 1:43 PM
>>>> To: [email protected]
>>>> Subject: Re: ordering tuple after grouping
>>>> 
>>>> Sure,
>>>> use a nested foreach.
>>>> 
>>>> grouped = group data by $0;
>>>> ordered = foreach grouped {
>>>> sorted = order data by $1;
>>>> first = limit sorted 1;
>>>> generate first;
>>>> }
>>>> 
>>>> Beware, untested code.
>>>> 
>>>> Cheers,
>>>> --
>>>> Gianmarco
>>>> 
>>>> 
>>>> 
>>>> On Mon, Apr 16, 2012 at 22:31, Chan, Tim <[email protected]> wrote:
>>>> 
>>>>> Given data:
>>>>> 
>>>>> (1, 55, abc)
>>>>> (2, 23, asd)
>>>>> (1, 85, xyz)
>>>>> (1, 2, aaa)
>>>>> 
>>>>> 
>>>>> I would like to group on $0 and then have my grouped tuple be ordered by
>>>>> $1. Is this possible?
>>>>> 
>>>>> The output should look like this:
>>>>> 
>>>>> (1, {(1, 2, aaa),(1,55,abc),(1,85,xyz)})
>>>>> (2, {(2,23,asd)})
>>>>> 
>>>>> 
>>>>> Then I would like to keep the first tuple for every group.
>>>>> 
>>>>> For example:
>>>>> 
>>>>> (1,2,aaa)
>>>>> (2,23,asd)
>>>>> 
>>>>> 
>>>>> 
>>>> 
>> 
