I don't think Pig currently supports DISTINCT on a subset of fields. This is from the Pig documentation:

"You cannot use DISTINCT on a subset of fields; to do this, use FOREACH and a nested block to first select the fields and then apply DISTINCT"

Here is an example that shows how:
http://pig.apache.org/docs/r0.9.0/basic.html#nestedblock

Hope this helps,
Michael

________________________________
From: 唐亮 <[email protected]>
To: [email protected]
Sent: Tuesday, August 30, 2011 11:31 PM
Subject: Can I DISTINCT by Multiple Columns?

Hi pigs:

Can I distinct by multiple columns?

For example:
A = load ... as (a1:int, a2:int, a3:int);
B = DISTINCT A; -- It's OK.

-- But can I distinct by a1 and a2?
C = DISTINCT A.a1, A.a2; -- It's Not OK

-- And I know I can do it like this:
F = foreach A generate a1, a2;
D = DISTINCT F;

-- But, ...
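
For reference, the nested-block approach quoted from the documentation might look roughly like the sketch below. It is only an illustration, not taken from the thread: the input path 'data', the choice to group by a3, and the aliases G, P, DP and R are all assumptions. It counts the distinct (a1, a2) pairs within each a3 group.

```pig
-- Hypothetical input path; the schema follows the question above.
A = LOAD 'data' AS (a1:int, a2:int, a3:int);

-- Assumption: we want distinct (a1, a2) pairs per value of a3.
G = GROUP A BY a3;

R = FOREACH G {
    P  = A.(a1, a2);   -- first select the subset of fields from the bag
    DP = DISTINCT P;   -- then apply DISTINCT inside the nested block
    GENERATE group AS a3, COUNT(DP) AS distinct_pairs;
};

DUMP R;
```

If you only need the distinct (a1, a2) pairs over the whole relation, the FOREACH ... GENERATE followed by DISTINCT from the original question is the simpler route; the nested block matters mostly when the distinct has to be computed per group.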
