Very nice, worked like a champ, Prashant.

Any chance you could explain why? I'd love to be taught to fish, not just given 
the fish to eat. ;-)

GROUP ALL, as I read it, pulls all the tuples into a single group. But running
FOREACH over that group and calling COUNT on productscans is where my brain
starts to hurt.
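
One way to picture the GROUP ALL and FOREACH steps, as a sketch rather than an authoritative account. The relation names come from the scripts quoted further down in this thread. The DESCRIBE line is an addition for inspection only, and the comments describe the expected shape of the data, not verified output.

```pig
-- Sketch only: relation names follow the scripts quoted below in this thread.
scans = LOAD '/hive/scans/*' USING PigStorage(',') AS
    (thetime:long, product_id:long, lat:double, lon:double,
     user:chararray, category:chararray, title:chararray);
productscans = FILTER scans BY (title MATCHES 'proactiv');

-- COUNT takes a bag. In "FOREACH productscans GENERATE COUNT($0)", $0 is the
-- long field thetime of one tuple, not a bag, so no COUNT variant fits.

-- GROUP ... ALL yields a relation with a single tuple of two fields:
--   group        : the constant 'all'
--   productscans : a bag holding every tuple that passed the filter
grpd = GROUP productscans ALL;
DESCRIBE grpd;  -- added for inspection; prints the schema of grpd

-- FOREACH now runs once, over that single tuple. Inside it, "productscans"
-- refers to the bag field, which is the kind of argument COUNT expects.
scancount = FOREACH grpd GENERATE COUNT(productscans);
DUMP scancount;
```

So the FOREACH is not looping over many groups here; there is only the one group, and COUNT is applied to the bag inside it. Note that COUNT skips tuples whose first field is null; COUNT_STAR counts those too.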


Thanks again for your help!
-Jason


On Mar 22, 2012, at 3:33 PM, Prashant Kommireddi wrote:

> Hi Jason,
> 
> Are you trying to count the number of records in the relation
> 'productscans'? If so, you would have to use GROUP first:
> http://pig.apache.org/docs/r0.9.1/basic.html#GROUP
> 
> grpd = GROUP productscans ALL;
> scancount = FOREACH grpd GENERATE COUNT(productscans);
> DUMP scancount;
> 
> Thanks,
> Prashant
> 
> On Thu, Mar 22, 2012 at 1:28 PM, Jason Alexander <[email protected]> wrote:
> 
>> Hey all,
>> 
>> 
>> I'm trying to write a script to pull the count of a dataset that I've
>> filtered.
>> 
>> Here's the script so far:
>> 
>> /* scans by title */
>> 
>> scans = LOAD '/hive/scans/*' USING PigStorage(',') AS
>> (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
>> productscans = FILTER scans BY (title MATCHES 'proactiv');
>> scancount = FOREACH productscans GENERATE COUNT($0);
>> DUMP scancount;
>> 
>> For some reason, I get the error:
>> 
>> Could not infer the matching function for org.apache.pig.builtin.COUNT as
>> multiple or none of them fit. Please use an explicit cast.
>> 
>> What am I doing wrong here? I'm assuming it has something to do with the
>> type of the field I'm passing in, but I can't seem to resolve this.
>> 
>> 
>> TIA,
>> -Jason
>> 
>> 
>> 
>> 
