Dear Jacob,

Many thanks! It worked perfectly.

Regards,
Abhishek

-----Original Message-----
From: Jacob Perkins [mailto:[email protected]]
Sent: 24 August 2013 23:49
To: [email protected]
Subject: Re: Dedupe Logic

Abhishek,

You should be able to do this by grouping by the three columns and then
ordering by the fourth in a nested foreach. For example:

data = load 'some_url' as (f11, f12, f13, f14);

deduped = foreach (group data by (f11, f12, f13)) {
    -- within each group, sort by f14 and keep the record with the smallest value
    ordered = order data by f14 asc;
    one_rec = limit ordered 1;
    generate flatten(one_rec) as (f11, f12, f13, f14);
};

--jacob
@thedatachef

On Sat, 2013-08-24 at 18:03 +0000, Ambastha, Abhishek wrote:
> Hi,
>
> How can I sort and dedupe on multiple columns?
>
> I have a 5 GB file with 70 columns. I want to sort on four columns f11, f12,
> f13 and f14. Then I want to dedupe on three columns f11, f12 and f13 so that
> the minimum value of f14 is retained (that is, pick up the first record after
> the sort). Please suggest how to do this.
>
> Also, can this be done using the rank function?
>
> Regards,
> Abhishek
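
To sanity-check the dedupe logic discussed in this thread on a small sample outside Pig, here is a minimal Python sketch. It is only an illustration of the same idea (sort on all four columns, then keep the first record per three-column key), not the Pig implementation itself. The sample rows and the function name `dedupe` are hypothetical, and only four of the 70 columns are modelled.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows in the order (f11, f12, f13, f14).
# The real file has 70 columns; only the four relevant ones are shown here.
rows = [
    ("a", "b", "c", 5),
    ("a", "b", "c", 2),
    ("x", "y", "z", 7),
    ("a", "b", "d", 9),
    ("x", "y", "z", 3),
]


def dedupe(records):
    """Keep, for each (f11, f12, f13) key, the record with the smallest f14."""
    group_key = itemgetter(0, 1, 2)
    # Sort on all four columns so the first record of each group has the minimum f14.
    ordered = sorted(records, key=itemgetter(0, 1, 2, 3))
    # groupby needs its input sorted by the grouping key, which the sort above ensures.
    return [next(group) for _, group in groupby(ordered, key=group_key)]


deduped = dedupe(rows)
for record in deduped:
    print(record)
```

In this sketch f14 is an integer, so the minimum is numeric. If the column is loaded without a declared type in Pig, it may be worth checking that the ordering there is numeric as well rather than byte-wise.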
