Right, but my problem is a little bit different.
My input is more along the lines of:
1: {James, John, Lisa, Larry, Amanda, Amanda, John, James, Lisa, John}
2: {Amanda, Lisa, Lisa}
and the output I want has one count per name, always in the order Amanda, James, John, Larry, Lisa:
(2, 2, 3, 1, 2)
(1, 0, 0, 0, 2)
For now I've done it as a UDF: I precompute the full set of names
and pass it to the function as an argument.
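Below is a minimal sketch of the counting logic such a UDF needs, written as plain Python and independent of Pig's UDF API. It assumes the precomputed name set is sorted alphabetically, which reproduces the column order Amanda, James, John, Larry, Lisa used above. The function names `name_vocabulary` and `count_by_vocabulary` are hypothetical, not part of Pig.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def name_vocabulary(bags: Iterable[Iterable[str]]) -> List[str]:
    """Collect every distinct name across all bags, sorted so the column order is fixed."""
    names = set()
    for bag in bags:
        names.update(bag)
    return sorted(names)


def count_by_vocabulary(bag: Iterable[str], vocabulary: Sequence[str]) -> Tuple[int, ...]:
    """Return one count per vocabulary entry, in vocabulary order."""
    counts = Counter(bag)
    # Counter yields 0 for names that never occur in this bag,
    # so every row gets the same number of columns.
    return tuple(counts[name] for name in vocabulary)


if __name__ == "__main__":
    # The two example bags from this message.
    bags = [
        ["James", "John", "Lisa", "Larry", "Amanda", "Amanda", "John", "James", "Lisa", "John"],
        ["Amanda", "Lisa", "Lisa"],
    ]
    vocab = name_vocabulary(bags)
    for bag in bags:
        print(count_by_vocabulary(bag, vocab))
```

The vocabulary has to be built in a separate pass over all bags before any row is counted. A name that appears in a bag but is missing from the vocabulary is silently dropped by this sketch rather than adding a new column.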
On Thu, Sep 20, 2012 at 6:00 PM, Ruslan Al-Fakikh <[email protected]> wrote:
> Sorry,
>
> I meant:
> or just
> c = foreach b generate COUNT(a); --without group
> to eliminate the keys
>
> On Thu, Sep 20, 2012 at 1:37 PM, Ruslan Al-Fakikh <[email protected]>
> wrote:
>> Hey, try this:
>>
>> [cloudera@localhost workpig]$ cat input
>> James
>> John
>> Lisa
>> Larry
>> Amanda
>> Amanda
>> John
>> James
>> Lisa
>> John
>> [cloudera@localhost workpig]$ pig -x local
>> 2012-09-20 13:35:06,225 [main] INFO org.apache.pig.Main - Logging error messages to: /home/cloudera/workpig/pig_1348133706198.log
>> 2012-09-20 13:35:06,524 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
>> grunt> a = load 'input';
>> grunt> b = group a by $0;
>> grunt> c = foreach b generate group, COUNT(a);
>> grunt> dump c;
>> (John,3)
>> (Lisa,2)
>> (James,2)
>> (Larry,1)
>> (Amanda,2)
>>
>> or just
>> c = foreach b generate group, COUNT(a);
>> to eliminate the keys
>>
>> Best regards,
>> Ruslan
>>
>> On Wed, Sep 19, 2012 at 9:09 PM, Arun Ahuja <[email protected]> wrote:
>>> Looking for an elegant way to do this:
>>>
>>> Suppose there is a bag with names { James, John, Lisa, Larry, Amanda,
>>> Amanda, John, James, Lisa, John}
>>> I'd like to get something back along the lines of a tuple (2, 2, 3, 1,
>>> 2) where those are the counts for Amanda, James, John, Larry, Lisa
>>> respectively.
>>>
>>> Obviously I could write a UDF to do this, but I want every row to have
>>> the same columns, e.g. the bag { Amanda } should give me
>>> (1, 0, 0, 0, ...). I could precompute the possible bag entries and pass
>>> them to the UDF, but is that the only option? Is there anything
>>> better?
>>>
>>> Thanks,
>>>
>>> Arun