Hi Michael,

If I understand correctly, you are trying to get the distinct first-column
values from the dataset? Something like this:

grunt> A = load 'aaa' using PigStorage(',');
grunt> B = foreach A GENERATE $0;
grunt> C = DISTINCT B;
grunt> DUMP C;
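
If what you want instead is one full row per distinct first-column value,
a rough sketch along these lines might work. I have not run it against
your data; the aliases G, T and R are just names I picked, and it assumes
a Pig version that allows LIMIT inside a nested FOREACH. Which row
survives for each key is arbitrary unless you ORDER the bag first.

```pig
-- Load the comma-separated file (same path as in the example above).
A = LOAD 'aaa' USING PigStorage(',');

-- Collect all rows that share the same first-column value.
G = GROUP A BY $0;

-- Keep a single row from each group and flatten it back into a tuple.
R = FOREACH G {
    T = LIMIT A 1;
    GENERATE FLATTEN(T);
};

DUMP R;
```

For the sample data this should give four tuples, one each for the keys
10, 20, 30 and 40.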

Thanks,
Prashant

On Tue, Jan 24, 2012 at 11:19 PM, Michael Lok <[email protected]> wrote:

> Hi folks,
>
> I've got a dataset as below:
>
> 10,234324234,NAME 1,3
> 10,346464646,NAME 1,3
> 10,438389232,NAME 1,3
> 20,397383737,NAME 2,4
> 20,383783234,NAME 2,4
> 20,387382828,NAME 2,4
> 20,309323333,NAME 2,4
> 30,439378283,NAME 3,2
> 30,010191923,NAME 3,2
> 40,439837434,NAME 4,4
> 40,383723443,NAME 4,4
> 40,100182321,NAME 4,4
> 40,992173732,NAME 4,4
>
> I'd like to just print out the distinct records by column 1.  Here's
> what I have:
>
> A = group FULL by $0;
>
> B = foreach A {
>        C0 = FULL.$0;
>        UC0 = DISTINCT C0;
>        generate group, COUNT(UC0);
> };
>
> The script above prints out only the first column and a count (not
> really required).  But I need to print out just a single tuple for
> each distinct value of column 1.
>
> Is this possible?
>
> Any help is greatly appreciated.
>
>
> Thanks!
>
