Order and distinct are two very different operations. You order by a specific key (here, block_id), but you take the distinct over all the fields of a relation, so in the general case the key/value structure of the two operations is quite different.
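
To make the key/value point concrete, here is a small sketch in plain Python (not Pig, and not a description of how Pig plans its jobs). The sample rows and the helper name dedup_adjacent are made up for illustration. It shows that a 'sort -u' style pass, which only drops a row when it equals the row just before it, is only guaranteed to remove every duplicate when the sort key is the whole tuple:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical stand-in for the Components relation: (block_id, payload) rows.
components = [(2, "x"), (1, "a"), (1, "b"), (1, "a"), (2, "x")]


def dedup_adjacent(rows):
    """Drop a row only when it equals the row immediately before it (like uniq)."""
    return [row for row, _ in groupby(rows)]


# "order by block_id": the sort key is a single field, so rows that share a
# block_id keep their input order and full duplicates need not be neighbours.
by_block_id = sorted(components, key=itemgetter(0))
one_pass = dedup_adjacent(by_block_id)

# "distinct": the key is the entire tuple, so sorting on it makes every
# duplicate adjacent and the same adjacent-dedup pass removes all of them.
by_whole_tuple = sorted(components)
distinct_sorted = dedup_adjacent(by_whole_tuple)

print(one_pass)         # still contains (1, 'a') twice
print(distinct_sorted)  # each row appears once
```

In this sketch the result of sorting on the whole tuple is also ordered by block_id, because block_id is the leading field. Whether Pig can exploit that in a single job is a separate question that this example does not answer.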
> On Jun 3, 2015, at 11:02 AM, <william.dowl...@thomsonreuters.com>
> <william.dowl...@thomsonreuters.com> wrote:
>
> Dear Pig users,
> Can Pig combine sorting and unique-ing into a single job? Doing this
> --define Components, then
> Sorted_0 = order Components by block_id parallel $par;
> Sorted = DISTINCT Sorted_0;
>
> causes one more MR job to be launched than simply doing this:
> --define Components, then
> Sorted = order Components by block_id parallel $par;
>
> It would seem there should be some way to do the distinct in the same pass as
> the sort, like 'sort -u'. But I can't see how. Any tips would be much
> appreciated!
>
> Thanks,
> Will
>
> William F Dowling
> Senior Technologist
> Thomson Reuters