I don't think the current Pig release supports DISTINCT on just the fields you select.

This is from the Pig documentation:

"You cannot use DISTINCT on a subset of fields; to do this, use FOREACH 
and a nested block to first select the fields and then apply DISTINCT"

and here is an example that shows how:

http://pig.apache.org/docs/r0.9.0/basic.html#nestedblock
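
For your schema, a rough, untested sketch of that nested-block pattern might look like the following. The input path 'data' and the choice to group by a3 are my own assumptions for illustration; only the field names come from your message.

```pig
-- 'data' is a placeholder path (assumption); the schema is from the question.
A = LOAD 'data' AS (a1:int, a2:int, a3:int);

-- Grouping by a3 is an assumption; group by whatever key you need.
G = GROUP A BY a3;

R = FOREACH G {
    P = A.(a1, a2);   -- first select the subset of fields
    D = DISTINCT P;   -- then apply DISTINCT to that projection
    GENERATE group, COUNT(D);   -- number of distinct (a1, a2) pairs per a3
}
```

If you only need the distinct (a1, a2) pairs over the whole relation, the FOREACH ... GENERATE a1, a2 followed by DISTINCT from your message already does that; the nested block is for when you want the distinct applied per group.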

Hope this helps,
 

Michael


________________________________
From: 唐亮 <[email protected]>
To: [email protected]
Sent: Tuesday, August 30, 2011 11:31 PM
Subject: Can I DISTINCT by Multiple Columns?

Hi pigs:
Can I distinct by multiple columns?

For example:
A = load ... as (a1:int, a2:int, a3:int);
B = DISTINCT A;  -- It's OK.


-- But can I distinct by a1 and a2?
C = DISTINCT A.a1, A.a2;  -- It's Not OK


-- And I know I can do it like this:
F = foreach A generate a1, a2;
D = DISTINCT F;

-- But, ...
