Abhishek,
You should be able to do this by grouping on the three dedupe columns
(f11, f12, f13) and then, in a nested foreach, ordering each group by the
fourth (f14) and keeping only the first record.
E.g.:
data = load 'some_url' as (f11, f12, f13, f14);
-- for each (f11, f12, f13) key, keep the record with the smallest f14
deduped = foreach (group data by (f11, f12, f13)) {
    ordered = order data by f14 asc;
    one_rec = limit ordered 1;
    generate flatten(one_rec) as (f11, f12, f13, f14);
};
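Since the question also asks for the output to be sorted on all four columns, a fuller sketch might add a global order after the dedupe. This is a sketch under assumptions: 'some_url' and 'some_output_url' are placeholder paths, and only the four key columns are declared even though the real file has 70, so the other fields would need to be added to the load schema (and to the flatten ... as list) for the whole record to come through.

```pig
-- Sketch only: 'some_url' and 'some_output_url' are placeholder paths, and
-- only the four key columns are declared (the real file has 70 columns).
data = load 'some_url' as (f11, f12, f13, f14);

-- Dedupe: one record per (f11, f12, f13), the one with the smallest f14.
deduped = foreach (group data by (f11, f12, f13)) {
    ordered = order data by f14 asc;
    one_rec = limit ordered 1;
    generate flatten(one_rec) as (f11, f12, f13, f14);
};

-- Sort the deduplicated records on all four columns.
sorted = order deduped by f11 asc, f12 asc, f13 asc, f14 asc;
store sorted into 'some_output_url';
```

Doing the order after the dedupe means the global sort runs over fewer records than the full 5 GB input, and a sort done before the group would not be preserved by the group anyway.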
--jacob
@thedatachef
On Sat, 2013-08-24 at 18:03 +0000, Ambastha, Abhishek wrote:
> Hi,
>
> How can I sort and dedupe on multiple columns?
>
> I have a 5 GB file with 70 columns. I want to sort on four columns f11, f12,
> f13 and f14. Then I want to dedupe on three columns f11, f12 and f13 so that
> the minimum value of f14 is retained (that is, pick up the first record
> after the sort). Please suggest how to do this.
>
> Also, can this be done using the RANK function?
>
> Regards,
> Abhishek