Thanks, that helps a lot! :)

Anze

On Friday 29 October 2010, Gerrit Jansen van Vuuren wrote:
> Hi,
>
> Let's say you have a file with the columns userid, username, location, amount.
>
> To count the total number of users:
>
> A = LOAD 'myfile' as (userid:long, username:chararray, location:chararray,
> amount:long);
> G = GROUP A ALL PARALLEL 40;
> R = FOREACH G GENERATE COUNT($1);
>
> dump R;
>
> To count the number of users by location:
>
> A = LOAD 'myfile' as (userid:long, username:chararray, location:chararray,
> amount:long);
> G = GROUP A BY location PARALLEL 40;
> R = FOREACH G GENERATE FLATTEN(group), COUNT($1);
>
> dump R;
>
> To get the sum of amount per location and userid:
>
> A = LOAD 'myfile' as (userid:long, username:chararray, location:chararray,
> amount:long);
> G = GROUP A BY (location, userid) PARALLEL 40;
> R = FOREACH G GENERATE FLATTEN(group), COUNT($1) as usercount,
> SUM($1.amount) as useramount;
>
>
> NOTE: PARALLEL is set to 40 only as an example. You should set it yourself,
> depending on your cluster setup, data, etc.
>
> To count, it's always GROUP, either ALL or BY <column name>.
> Then FOREACH and GENERATE COUNT($1), where $1 is the bag of grouped records.
>
> Hope this helps,
>
>
> -----Original Message-----
> From: Anze [mailto:[email protected]]
> Sent: Friday, October 29, 2010 12:01 PM
> To: [email protected]
> Subject: relations count
>
> Hi!
>
> I hope this is not too newbie a question, but it's driving me crazy... How do
> you count the records in a relation? Like DUMP, but instead of a list of
> records, I would like their count.
>
> Thanks,
>
> Anze
