Hey,

try this:

[cloudera@localhost workpig]$ cat input
James
John
Lisa
Larry
Amanda
Amanda
John
James
Lisa
John
[cloudera@localhost workpig]$ pig -x local
2012-09-20 13:35:06,225 [main] INFO org.apache.pig.Main - Logging error messages to: /home/cloudera/workpig/pig_1348133706198.log
2012-09-20 13:35:06,524 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
grunt> a = load 'input';
grunt> b = group a by $0;
grunt> c = foreach b generate group, COUNT(a);
grunt> dump c;
(John,3)
(Lisa,2)
(James,2)
(Larry,1)
(Amanda,2)

Or, to eliminate the keys and keep only the counts, just:

c = foreach b generate COUNT(a);

Best regards,
Ruslan

On Wed, Sep 19, 2012 at 9:09 PM, Arun Ahuja <[email protected]> wrote:
> Looking for an elegant way to do this:
>
> Suppose there is a bag with names { James, John, Lisa, Larry, Amanda,
> Amanda, John, James, Lisa, John}
> I'd like to get something back along the lines of a tuple (2, 2, 3, 1,
> 2) where those are the counts for Amanda, James, John, Larry, Lisa
> respectively.
>
> Obviously I could write a UDF to do this, but I want to ensure that
> there are the same columns in every row i.e. Bag { Amanda } gives me
> (1, 0, 0, 0.. ). I could precompute the possible bag entries and pass
> that along to the UDF but is this the only possibility? Anything
> better?
>
> Thanks,
>
> Arun
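
On the "same columns in every row" part of the question, here is a rough sketch without a UDF. It assumes each input row carries an id plus a bag of names, and that the set of possible names is known up front and written into the script. The file name 'rows', the relation names, and the field names id, names, and name are all made up for illustration and do not come from the thread.

```pig
-- Hypothetical input file 'rows': an id, a tab, then a bag of names, e.g.
--   r1	{(Amanda),(John),(John)}
rows = LOAD 'rows' AS (id:chararray, names:bag{t:tuple(name:chararray)});

counts = FOREACH rows {
    -- One nested FILTER per known name (assumption: the name list is fixed).
    amanda = FILTER names BY name == 'Amanda';
    james  = FILTER names BY name == 'James';
    john   = FILTER names BY name == 'John';
    larry  = FILTER names BY name == 'Larry';
    lisa   = FILTER names BY name == 'Lisa';
    -- Every row gets the same five count columns, in this fixed order.
    GENERATE id, COUNT(amanda), COUNT(james), COUNT(john),
             COUNT(larry), COUNT(lisa);
};

DUMP counts;
```

Since each column comes from its own filtered bag, a name that is missing from a row should show up as a 0 count rather than a missing column. The obvious cost is that the name list is hard-coded, so this only fits when the set of names is small and stable.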
