Kim,

This is something Pig handles exceedingly well. Do a nested DISTINCT inside the FOREACH, then COUNT the result:

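-- PigStorage(',') because your sample rows are comma-separated (the default delimiter is tab)
A = LOAD 'data' USING PigStorage(',') AS (item:chararray, user:chararray);
B = GROUP A BY item;
C = FOREACH B {
      -- de-duplicate the users within each item's bag
      distinct_users = DISTINCT A.user;
      GENERATE
        group AS item,
        COUNT(distinct_users) AS num_users
      ;
    };
DUMP C;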

This should work, though I haven't tested it.
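Another common way to write the same count is to de-duplicate the (item, user) pairs with a top-level DISTINCT first, then group and count. This is only a sketch, untested like the script above. It assumes the same comma-separated 'data' file, and the aliases pairs, by_item and counts are just names I picked:

```pig
-- Sketch: assumes 'data' is comma-separated as in your sample
A = LOAD 'data' USING PigStorage(',') AS (item:chararray, user:chararray);
pairs = DISTINCT A;              -- one row per unique (item, user) pair
by_item = GROUP pairs BY item;   -- collect the unique pairs for each item
counts = FOREACH by_item GENERATE group AS item, COUNT(pairs) AS num_users;
DUMP counts;
```

The trade-off, roughly: the nested DISTINCT runs in one MapReduce job but de-duplicates each item's users inside a single bag, so one very popular item can get heavy. The top-level DISTINCT spreads that work across reducers at the cost of an extra job.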

--jacob
@thedatachef


On Fri, 2011-05-06 at 11:08 -0700, Kim Vogt wrote:
> Hi,
> 
> I'm stuck on a query for counting distinct users. Say I have data that looks
> like this:
> 
> book, user1
> book, user2
> book, user1
> movie, user1
> movie, user2
> movie, user3
> music, user4
> 
> I want to group by the first column and count the number of distinct users
> for that product. The result would just be:
> 
> book, 2
> movie, 3
> music, 1
> 
> Is this piggable?
> 
> Happy Friday!
> 
> -Kim

