Yep, works :)

-Kim

On Fri, May 6, 2011 at 11:32 AM, jacob <[email protected]> wrote:

>
> Sorry, that's what I get for trying to do things quickly :)
>
>
> A = LOAD 'foo.tsv' AS (item:chararray, user:chararray);
> B = GROUP A BY item;
> C = FOREACH B {
>      distinct_users = DISTINCT A.user;
>      GENERATE
>        group AS item,
>        COUNT(distinct_users) AS num_distinct_users
>      ;
>    };
>
> And I just tested it in local mode with Pig 0.8, works great.
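>
> One caveat, offered as a sketch rather than something tested: the script
> above relies on the default loader, which splits fields on tabs. If the
> input is comma-separated with a space after each comma, like the sample
> rows in the original question, a variant along these lines might be
> needed. The file name 'foo.csv' and the alias A_clean are hypothetical,
> and using TRIM assumes that built-in is available in your Pig version.
>
> ```pig
> -- Assumption: 'foo.csv' is a hypothetical comma-separated copy of the data.
> A = LOAD 'foo.csv' USING PigStorage(',') AS (item:chararray, user:chararray);
>
> -- Strip the space that follows each comma in the sample rows, so that
> -- ' user1' and 'user1' are not counted as different users.
> A_clean = FOREACH A GENERATE TRIM(item) AS item, TRIM(user) AS user;
>
> B = GROUP A_clean BY item;
> C = FOREACH B {
>      distinct_users = DISTINCT A_clean.user;
>      GENERATE
>        group AS item,
>        COUNT(distinct_users) AS num_distinct_users
>      ;
>    };
>
> DUMP C;
> ```
>
> On the seven sample rows this should produce one row per item, with the
> counts given in the original question.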
>
> --jacob
> @thedatachef
>
> On Fri, 2011-05-06 at 11:30 -0700, Kim Vogt wrote:
> > I think you're missing a SUM and/or COUNT and that's the part I'm stuck on.
> >
> > -Kim
> >
> > On Fri, May 6, 2011 at 11:24 AM, jacob <[email protected]> wrote:
> >
> > > Kim,
> > >
> > > This is something Pig addresses exceedingly well:
> > >
> > > A = LOAD 'data' AS (item:chararray, user:chararray);
> > > B = GROUP A BY item;
> > > C = FOREACH B {
> > >      distinct_users = DISTINCT A.user;
> > >      GENERATE
> > >        group AS item,
> > >        distinct_users AS distinct_users
> > >      ;
> > >    };
> > >
> > > should work. Haven't tested it though.
> > >
> > > --jacob
> > > @thedatachef
> > >
> > >
> > > On Fri, 2011-05-06 at 11:08 -0700, Kim Vogt wrote:
> > > > Hi,
> > > >
> > > > I'm stuck on a query for counting distinct users. Say I have data
> > > > that looks like this:
> > > >
> > > > book, user1
> > > > book, user2
> > > > book, user1
> > > > movie, user1
> > > > movie, user2
> > > > movie, user3
> > > > music, user4
> > > >
> > > > I want to group by the first column and count the number of distinct
> > > > users for that product. The result would just be:
> > > >
> > > > book, 2
> > > > movie, 3
> > > > music, 1
> > > >
> > > > Is this piggable?
> > > >
> > > > Happy Friday!
> > > >
> > > > -Kim
> > >
> > >
> > >
>
>
>
