Hi Michael,
If I understand correctly, you are trying to get the distinct values of
the first column from the dataset? Something like this:
grunt> A = load 'aaa' using PigStorage(',');
grunt> B = foreach A GENERATE $0;
grunt> C = DISTINCT B;
grunt> DUMP C;
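If you instead want one full record per distinct value of column 1, grouping and then taking a nested LIMIT might do it. This is only a sketch: it assumes the same 'aaa' input as above and a Pig version that allows LIMIT inside a nested foreach. The aliases G, one_row and R are just names I picked.

```pig
-- Sketch only: 'aaa' is the same assumed input path as in the example above.
A = load 'aaa' using PigStorage(',');

-- Collect all records that share the same first column into one bag per key.
G = group A by $0;

-- Keep a single record from each bag and flatten it back into a plain tuple.
R = foreach G {
    one_row = LIMIT A 1;
    generate FLATTEN(one_row);
};

DUMP R;
```

Which record survives for each key is arbitrary here. If you need a specific one, ORDER the bag inside the nested block before the LIMIT.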
Thanks,
Prashant
On Tue, Jan 24, 2012 at 11:19 PM, Michael Lok wrote:
> Hi folks,
>
> I've got a dataset as below:
>
> 10,234324234,NAME 1,3
> 10,346464646,NAME 1,3
> 10,438389232,NAME 1,3
> 20,397383737,NAME 2,4
> 20,383783234,NAME 2,4
> 20,387382828,NAME 2,4
> 20,309323333,NAME 2,4
> 30,439378283,NAME 3,2
> 30,010191923,NAME 3,2
> 40,439837434,NAME 4,4
> 40,383723443,NAME 4,4
> 40,100182321,NAME 4,4
> 40,992173732,NAME 4,4
>
> I'd like to just print out the distinct records by column 1. Here's
> what I have:
>
> A = group FULL by $0;
>
> B = foreach A {
> C0 = FULL.$0;
> UC0 = DISTINCT C0;
> generate group, COUNT(UC0);
> };
>
> The script above prints out only the first column and a count (not
> really required). But I need to print out just a single tuple for
> each distinct value in column 1.
>
> Is this possible?
>
> Any help is greatly appreciated.
>
>
> Thanks!
>