Re: Counting elements in a bag

Ruslan Al-Fakikh Thu, 20 Sep 2012 15:01:30 -0700

Sorry, the second snippet in my previous message should not have included
the group field. I meant:

or just
c = foreach b generate COUNT(a); -- without group
to eliminate the keys
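
For reference, here is a minimal sketch of the full sequence with that change applied. It assumes the same 'input' file (one name per line) and the same alias names as the grunt session quoted below, and I have not rerun it:

```pig
-- Sketch only: assumes 'input' holds one name per line, as in the quoted session.
-- Run from the directory containing 'input' with: pig -x local
a = load 'input';
b = group a by $0;                -- one group per distinct name
c = foreach b generate COUNT(a);  -- keep only the count, drop the group key
dump c;
```

This should print one single-field tuple per distinct name. The row order is not guaranteed, so keep the group field (as in the first version) if you need to know which count belongs to which name.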


On Thu, Sep 20, 2012 at 1:37 PM, Ruslan Al-Fakikh <[email protected]> wrote:
> Hey, try this:
>
> [cloudera@localhost workpig]$ cat input
> James
> John
> Lisa
> Larry
> Amanda
> Amanda
> John
> James
> Lisa
> John
> [cloudera@localhost workpig]$ pig -x local
> 2012-09-20 13:35:06,225 [main] INFO  org.apache.pig.Main - Logging
> error messages to: /home/cloudera/workpig/pig_1348133706198.log
> 2012-09-20 13:35:06,524 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> Connecting to hadoop file system at: file:///
> grunt> a = load 'input';
> grunt> b = group a by $0;
> grunt> c = foreach b generate group, COUNT(a);
> grunt> dump c;
> (John,3)
> (Lisa,2)
> (James,2)
> (Larry,1)
> (Amanda,2)
>
> or just
> c = foreach b generate group, COUNT(a);
> to eliminate the keys
>
> Best regards,
> Ruslan
>
> On Wed, Sep 19, 2012 at 9:09 PM, Arun Ahuja <[email protected]> wrote:
>> Looking for an elegant way to do this:
>>
>> Suppose there is a bag with names { James, John, Lisa, Larry, Amanda,
>> Amanda, John, James, Lisa, John}
>> I'd like to get something back along the lines of a tuple (2, 2, 3, 1,
>> 2) where those are the counts for Amanda, James, John, Larry, Lisa
>> respectively.
>>
>> Obviously I could write a UDF to do this, but I want to ensure that
>> there are the same columns in every row i.e. Bag { Amanda }  gives me
>> (1, 0, 0, 0.. ).  I could precompute the possible bag entries and pass
>> that along to the UDF but is this the only possibility?  Anything
>> better?
>>
>> Thanks,
>>
>> Arun
