I think you're missing a SUM and/or COUNT and that's the part I'm stuck on.

-Kim
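
For reference, a minimal, untested sketch of the missing piece: applying
COUNT to the distinct bag inside the nested FOREACH. The PigStorage(',')
delimiter, the TRIM call (the sample rows have a space after the comma), and
the alias num_users are assumptions, not something from the thread.

A = LOAD 'data' USING PigStorage(',') AS (item:chararray, user:chararray);
-- assumption: strip the space after the comma in the sample rows
A2 = FOREACH A GENERATE item, TRIM(user) AS user;
B = GROUP A2 BY item;
C = FOREACH B {
     -- one entry per distinct user within this item's group
     distinct_users = DISTINCT A2.user;
     GENERATE
       group AS item,
       COUNT(distinct_users) AS num_users
     ;
   };
DUMP C;

On the sample data this should give 2 for book, 3 for movie, and 1 for music.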

On Fri, May 6, 2011 at 11:24 AM, jacob <[email protected]> wrote:

> Kim,
>
> This is something Pig addresses exceedingly well:
>
> A = LOAD 'data' AS (item:chararray, user:chararray);
> B = GROUP A BY item;
> C = FOREACH B {
>      distinct_users = DISTINCT A.user;
>      GENERATE
>        group AS item,
>        distinct_users AS distinct_users
>      ;
>    };
>
> should work. Haven't tested it though.
>
> --jacob
> @thedatachef
>
>
> On Fri, 2011-05-06 at 11:08 -0700, Kim Vogt wrote:
> > Hi,
> >
> > I'm stuck on a query for counting distinct users. Say I have data that
> > looks like this:
> >
> > book, user1
> > book, user2
> > book, user1
> > movie, user1
> > movie, user2
> > movie, user3
> > music, user4
> >
> > I want to group by the first column and count the number of distinct
> > users for that product. The result would just be:
> >
> > book, 2
> > movie, 3
> > music, 1
> >
> > Is this piggable?
> >
> > Happy Friday!
> >
> > -Kim
>
>
>
