I think you're missing a SUM and/or COUNT and that's the part I'm stuck on.
-Kim

On Fri, May 6, 2011 at 11:24 AM, jacob <[email protected]> wrote:

> Kim,
>
> This is something pig addresses exceedingly well:
>
> A = LOAD 'data' AS (item:chararray, user:chararray);
> B = GROUP A BY item;
> C = FOREACH B {
>     distinct_users = DISTINCT A.user;
>     GENERATE
>         group AS item,
>         distinct_users AS distinct_users
>     ;
> };
>
> should work. Haven't tested it though.
>
> --jacob
> @thedatachef
>
>
> On Fri, 2011-05-06 at 11:08 -0700, Kim Vogt wrote:
> > Hi,
> >
> > I'm stuck on a query for counting distinct users. Say I have data that looks
> > like this:
> >
> > book, user1
> > book, user2
> > book, user1
> > movie, user1
> > movie, user2
> > movie, user3
> > music, user4
> >
> > I want to group by the first column and count the number of distinct users
> > for that product. The result would just be:
> >
> > book, 2
> > movie, 3
> > music, 1
> >
> > Is this piggable?
> >
> > Happy Friday!
> >
> > -Kim
> >
>
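
For reference, here is a minimal, untested sketch of one way the missing COUNT could be added to jacob's script quoted above. It keeps his nested DISTINCT and applies COUNT to the resulting bag inside the same FOREACH. Two things here are assumptions and not from the thread: the PigStorage(',') loader (the sample data looks comma-delimited) and the alias num_users.

```pig
-- Assumption: the input is comma-delimited, so load it with PigStorage(',').
-- The sample rows have a space after the comma; that space stays in the
-- user field, which does not change the distinct count as long as it is
-- consistent across rows.
A = LOAD 'data' USING PigStorage(',') AS (item:chararray, user:chararray);

-- One group per item; each group holds a bag of that item's (item, user) rows.
B = GROUP A BY item;

C = FOREACH B {
    -- Deduplicate the users within this item's bag.
    distinct_users = DISTINCT A.user;
    -- COUNT the deduplicated bag instead of emitting the bag itself.
    -- num_users is a hypothetical alias chosen for this sketch.
    GENERATE
        group AS item,
        COUNT(distinct_users) AS num_users;
};

DUMP C;
```

On the sample data in the original question this should give one row per item with its distinct-user count (book 2, movie 3, music 1), though the order of rows from DUMP is not guaranteed.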
