Thanks, that helps a lot! :)

Anze


On Friday 29 October 2010, Gerrit Jansen van Vuuren wrote:
> Hi,
> 
> Let's say you have a file with the columns userid, username, location, amount.
> 
> To count the total number of users:
> A = LOAD 'myfile' as (userid:long, username:chararray, location:chararray,
> amount:long);
> G = GROUP A ALL PARALLEL 40;
> R = FOREACH G GENERATE COUNT($1);
> 
> dump R;
> 
> To count the number of users by location:
> 
> A = LOAD 'myfile' as (userid:long, username:chararray, location:chararray,
> amount:long);
> G = GROUP A BY location PARALLEL 40;
> R = FOREACH G GENERATE FLATTEN(group), COUNT($1);
> 
> dump R;
> 
> To get the record count and the sum of amount per (location, userid):
> 
> A = LOAD 'myfile' as (userid:long, username:chararray, location:chararray,
> amount:long);
> G = GROUP A BY (location, userid) PARALLEL 40;
> R = FOREACH G GENERATE FLATTEN(group), COUNT($1) as usercount,
> SUM($1.amount) as useramount;
> 
> dump R;
> 
> NOTE: PARALLEL is set to 40 only as an example; you should set it yourself,
> depending on your cluster setup, data, etc.
> 
> To count, it's always GROUP, either ALL or BY <column name>,
> then FOREACH and generate COUNT($1); $1 is the bag of grouped records.
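> 
> As a minimal sketch of that pattern (the alias name total is only
> illustrative, and 'myfile' is the same example file as above), the bag
> in $1 can also be referenced by the name of the grouped relation, here A:
> 
> A = LOAD 'myfile' as (userid:long, username:chararray, location:chararray,
> amount:long);
> -- GROUP ... ALL puts every record into a single group, so PARALLEL is
> -- left out in this sketch
> G = GROUP A ALL;
> -- G has two fields: group and the bag A; COUNT(A) is the same as COUNT($1)
> total = FOREACH G GENERATE COUNT(A);
> 
> dump total;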
> 
> Hope this helps,
> 
> 
> -----Original Message-----
> From: Anze [mailto:[email protected]]
> Sent: Friday, October 29, 2010 12:01 PM
> To: [email protected]
> Subject: relations count
> 
> Hi!
> 
> I hope this is not too newbie a question, but it's driving me crazy... How do
> you count the records in a relation? Like DUMP, but instead of a list of
> records, I would like their count.
> 
> Thanks,
> 
> Anze
