Sorry, that's what I get for trying to do things quickly :)

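-- The default loader (PigStorage) splits fields on tabs; for comma-separated
-- input, load with USING PigStorage(',') instead.
A = LOAD 'foo.tsv' AS (item:chararray, user:chararray);
B = GROUP A BY item;
C = FOREACH B {
      -- De-duplicate the users within each item's bag, then count them.
      distinct_users = DISTINCT A.user;
      GENERATE
        group AS item,
        COUNT(distinct_users) AS num_distinct_users
      ;
    };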

And I just tested it in local mode with Pig 0.8; it works great.
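
If you want to sanity-check the expected counts without a Pig install, here is a rough Python sketch of the same group-then-count-distinct logic, run over the sample rows from your original mail. The helper name count_distinct_users is made up for illustration and is not part of Pig or any library.

```python
from collections import defaultdict

# Sample rows from the original question, as (item, user) pairs.
rows = [
    ("book", "user1"),
    ("book", "user2"),
    ("book", "user1"),
    ("movie", "user1"),
    ("movie", "user2"),
    ("movie", "user3"),
    ("music", "user4"),
]


def count_distinct_users(rows):
    """Hypothetical helper: group by item, then count the distinct users per item."""
    users_by_item = defaultdict(set)
    for item, user in rows:
        # A set drops repeated users, playing the role of DISTINCT A.user.
        users_by_item[item].add(user)
    # len() of each set plays the role of COUNT(distinct_users).
    return {item: len(users) for item, users in users_by_item.items()}


result = count_distinct_users(rows)
for item in sorted(result):
    print(f"{item}, {result[item]}")
```

For the sample rows this prints book, 2 / movie, 3 / music, 1, which is the result asked for in the original question.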

--jacob
@thedatachef

On Fri, 2011-05-06 at 11:30 -0700, Kim Vogt wrote:
> I think you're missing a SUM and/or COUNT and that's the part I'm stuck on.
> 
> -Kim
> 
> On Fri, May 6, 2011 at 11:24 AM, jacob <[email protected]> wrote:
> 
> > Kim,
> >
> > This is something Pig addresses exceedingly well:
> >
> > A = LOAD 'data' AS (item:chararray, user:chararray);
> > B = GROUP A BY item;
> > C = FOREACH B {
> >      distinct_users = DISTINCT A.user;
> >      GENERATE
> >        group AS item,
> >        distinct_users AS distinct_users
> >      ;
> >    };
> >
> > should work. Haven't tested it though.
> >
> > --jacob
> > @thedatachef
> >
> >
> > On Fri, 2011-05-06 at 11:08 -0700, Kim Vogt wrote:
> > > Hi,
> > >
> > > I'm stuck on a query for counting distinct users. Say I have data that
> > > looks like this:
> > >
> > > book, user1
> > > book, user2
> > > book, user1
> > > movie, user1
> > > movie, user2
> > > movie, user3
> > > music, user4
> > >
> > > I want to group by the first column and count the number of distinct
> > > users for that product. The result would just be:
> > >
> > > book, 2
> > > movie, 3
> > > music, 1
> > >
> > > Is this piggable?
> > >
> > > Happy Friday!
> > >
> > > -Kim
> >
> >
> >

