ORDER and DISTINCT are two very different operations. You order by a particular
field (or fields), but you take the distinct over all the fields of a relation,
so in the general case the key/value structure of the two jobs is quite
different.
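
To make the key/value point concrete, here is a rough Python sketch. It is a toy
model of a shuffle, not Pig's actual implementation; the sample rows, the
shuffle() helper and the variable names are all made up for illustration.

```python
from collections import defaultdict

def shuffle(rows, key_fn):
    """Toy stand-in for a MapReduce shuffle: group rows by their key."""
    groups = defaultdict(list)
    for row in rows:
        groups[key_fn(row)].append(row)
    return dict(groups)

# Hypothetical rows of the form (block_id, name).
components = [(2, "b"), (1, "a"), (2, "b"), (1, "c")]

# ORDER ... BY block_id: the key is just the sort field.
by_sort_key = shuffle(components, lambda row: row[0])

# DISTINCT: the key is the entire row.
by_whole_row = shuffle(components, lambda row: row)

# Sorting walks the groups in key order and keeps every row.
ordered = [row for key in sorted(by_sort_key) for row in by_sort_key[key]]

# Distinct keeps one row per group, in no particular order.
distinct = list(by_whole_row)
```

In this model the two groupings only coincide when the sort key happens to be
the whole row, which is why one pass does not cover the general case.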


> On Jun 3, 2015, at 11:02 AM, <william.dowl...@thomsonreuters.com> 
> <william.dowl...@thomsonreuters.com> wrote:
> 
> Dear Pig users,
> Can Pig combine sorting and unique-ing into a single job?  Doing this
> --define Components, then
> Sorted_0 = order Components by block_id parallel $par;
> Sorted = DISTINCT Sorted_0;
> 
> causes one more MR job to be launched than simply doing this:
> --define Components, then
> Sorted = order Components by block_id parallel $par;
> 
> It would seem there should be some way to do the distinct in the same pass as 
> the sort, like 'sort -u'.  But I can't see how. Any tips would be much 
> appreciated!
> 
> Thanks,
> Will
> 
> William F Dowling
> Senior Technologist
> Thomson Reuters
> 
