Sorry, I meant:

or just
c = foreach b generate COUNT(a); -- without group, to eliminate the keys

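Put together, the corrected version might look like the sketch below. The `as (name:chararray)` schema and the `name` alias are assumptions added here for readability; the session quoted below loads without a schema and groups by $0, which should behave the same way for this one-column input.

```pig
-- Sketch only: 'input' is the one-name-per-line file from the quoted session.
-- The schema (name:chararray) is an assumption, not part of the original.
a = load 'input' as (name:chararray);

-- One group per distinct name; each group carries a bag of matching rows.
b = group a by name;

-- Emit only the size of each bag, dropping the group key.
c = foreach b generate COUNT(a);

dump c;
```

Each output row is then a single-field tuple holding one count. Row order is not guaranteed, so if the counts need to be matched back to names it is probably safer to keep `group` in the generate clause.
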
On Thu, Sep 20, 2012 at 1:37 PM, Ruslan Al-Fakikh <[email protected]> wrote:
> Hey, try this:
>
> [cloudera@localhost workpig]$ cat input
> James
> John
> Lisa
> Larry
> Amanda
> Amanda
> John
> James
> Lisa
> John
> [cloudera@localhost workpig]$ pig -x local
> 2012-09-20 13:35:06,225 [main] INFO org.apache.pig.Main - Logging
> error messages to: /home/cloudera/workpig/pig_1348133706198.log
> 2012-09-20 13:35:06,524 [main] INFO
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> Connecting to hadoop file system at: file:///
> grunt> a = load 'input';
> grunt> b = group a by $0;
> grunt> c = foreach b generate group, COUNT(a);
> grunt> dump c;
> (John,3)
> (Lisa,2)
> (James,2)
> (Larry,1)
> (Amanda,2)
>
> or just
> c = foreach b generate group, COUNT(a);
> to eliminate the keys
>
> Best regards,
> Ruslan
>
> On Wed, Sep 19, 2012 at 9:09 PM, Arun Ahuja <[email protected]> wrote:
>> Looking for an elegant way to do this:
>>
>> Suppose there is a bag with names { James, John, Lisa, Larry, Amanda,
>> Amanda, John, James, Lisa, John}
>> I'd like to get something back along the lines of a tuple (2, 2, 3, 1,
>> 2) where those are the counts for Amanda, James, John, Larry, Lisa
>> respectively.
>>
>> Obviously I could write a UDF to do this, but I want to ensure that
>> there are the same columns in every row i.e. Bag { Amanda } gives me
>> (1, 0, 0, 0.. ). I could precompute the possible bag entries and pass
>> that along to the UDF but is this the only possibility? Anything
>> better?
>>
>> Thanks,
>>
>> Arun
