Heh, that's it, I forgot the COUNT:
A = LOAD 'data' AS (item:chararray, user:chararray);
B = GROUP A BY item;
C = FOREACH B {
    distinct_users = DISTINCT A.user;
    GENERATE
        group AS item,
        COUNT(distinct_users) AS distinct_users
    ;
};
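
One thing to watch: the sample data in Kim's message is comma-separated, while LOAD without a USING clause falls back to PigStorage with a tab delimiter. Below is an untested sketch of the same script, assuming the file really is comma-delimited. The output name num_users and the DUMP at the end are only for illustration, not part of the original script.

```pig
-- Assumption: 'data' is comma-delimited, as in the sample rows.
-- Without USING, Pig's default PigStorage splits on tabs instead.
A = LOAD 'data' USING PigStorage(',') AS (item:chararray, user:chararray);

-- One group per item; each group carries a bag of its (item, user) rows.
B = GROUP A BY item;

C = FOREACH B {
    -- Collapse repeated users within this item's bag.
    distinct_users = DISTINCT A.user;
    GENERATE
        group AS item,
        -- COUNT turns the bag of distinct users into a single number.
        COUNT(distinct_users) AS num_users  -- hypothetical output name
    ;
};

-- Print one (item, num_users) tuple per item.
DUMP C;
```

If the rows have a space after the comma, as in the sample, the user field keeps that leading space. The distinct counts should still come out the same as long as the spacing is consistent across rows.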
Thanks Jacob.
On Fri, May 6, 2011 at 11:30 AM, Kim Vogt <[email protected]> wrote:
> I think you're missing a SUM and/or COUNT and that's the part I'm stuck on.
>
> -Kim
>
>
> On Fri, May 6, 2011 at 11:24 AM, jacob <[email protected]> wrote:
>
>> Kim,
>>
>> This is something Pig addresses exceedingly well:
>>
>> A = LOAD 'data' AS (item:chararray, user:chararray);
>> B = GROUP A BY item;
>> C = FOREACH B {
>>     distinct_users = DISTINCT A.user;
>>     GENERATE
>>         group AS item,
>>         distinct_users AS distinct_users
>>     ;
>> };
>>
>> should work. Haven't tested it though.
>>
>> --jacob
>> @thedatachef
>>
>>
>> On Fri, 2011-05-06 at 11:08 -0700, Kim Vogt wrote:
>> > Hi,
>> >
>> > I'm stuck on a query for counting distinct users. Say I have data that
>> > looks like this:
>> >
>> > book, user1
>> > book, user2
>> > book, user1
>> > movie, user1
>> > movie, user2
>> > movie, user3
>> > music, user4
>> >
>> > I want to group by the first column and count the number of distinct
>> > users for that product. The result would just be:
>> >
>> > book, 2
>> > movie, 3
>> > music, 1
>> >
>> > Is this piggable?
>> >
>> > Happy Friday!
>> >
>> > -Kim