Re: Counting elements in a bag

Ruslan Al-Fakikh Thu, 20 Sep 2012 02:37:38 -0700

Hey, try this:

[cloudera@localhost workpig]$ cat input
James
John
Lisa
Larry
Amanda
Amanda
John
James
Lisa
John
[cloudera@localhost workpig]$ pig -x local
2012-09-20 13:35:06,225 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/cloudera/workpig/pig_1348133706198.log
2012-09-20 13:35:06,524 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
grunt> a = load 'input';
grunt> b = group a by $0;
grunt> c = foreach b generate group, COUNT(a);
grunt> dump c;
(John,3)
(Lisa,2)
(James,2)
(Larry,1)
(Amanda,2)


or just
c = foreach b generate COUNT(a);
to drop the keys and keep only the counts
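
If you need the same columns in every row, zeros included, here is a rough
sketch that assumes the set of possible names is known up front. The alias
names and the (name:chararray) schema are my own for illustration, and this
is untested against your real data:

```pig
-- Assumption: 'input' holds one name per line, as in the session above.
a = load 'input' as (name:chararray);

-- Put every record into a single bag so one output tuple is produced.
b = group a all;

-- One nested filter per known name; COUNT of an empty bag should give 0,
-- so a name that never occurs still gets its own column.
c = foreach b {
    amanda = filter a by name == 'Amanda';
    james  = filter a by name == 'James';
    john   = filter a by name == 'John';
    larry  = filter a by name == 'Larry';
    lisa   = filter a by name == 'Lisa';
    generate COUNT(amanda), COUNT(james), COUNT(john),
             COUNT(larry), COUNT(lisa);
};

dump c;
```

The column order is fixed by the generate clause, so it does not depend on
how the grouping happens to order its output. In your case the nested
filters would run over the bag field of each row instead of over the result
of a "group ... all".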

Best regards,
Ruslan

On Wed, Sep 19, 2012 at 9:09 PM, Arun Ahuja wrote:
> Looking for an elegant way to do this:
>
> Suppose there is a bag with names { James, John, Lisa, Larry, Amanda,
> Amanda, John, James, Lisa, John}
> I'd like to get something back along the lines of a tuple (2, 2, 3, 1,
> 2) where those are the counts for Amanda, James, John, Larry, Lisa
> respectively.
>
> Obviously I could write a UDF to do this, but I want to ensure that
> there are the same columns in every row i.e. Bag { Amanda }  gives me
> (1, 0, 0, 0.. ).  I could precompute the possible bag entries and pass
> that along to the UDF but is this the only possibility?  Anything
> better?
>
> Thanks,
>
> Arun
