Very nice, worked like a champ, Prashant. Any chance you could explain why? I'd love to be taught to fish, not just given the fish to eat. ;-)
GROUP ALL, as I read it, pulls the tuples into a single group. But, FOREACH'ing on each group, and counting against productscans is where my brain starts to hurt.

Thanks again for your help!

-Jason

On Mar 22, 2012, at 3:33 PM, Prashant Kommireddi wrote:

> Hi Jason,
>
> Are you trying to count the number of records in the relation
> 'productscans'? In which case you would have to use GROUP
> http://pig.apache.org/docs/r0.9.1/basic.html#GROUP
>
> grpd = GROUP productscans ALL;
> scancount = FOREACH grpd GENERATE COUNT(productscans);
> DUMP scancount;
>
> Thanks,
> Prashant
>
> On Thu, Mar 22, 2012 at 1:28 PM, Jason Alexander <[email protected]> wrote:
>
>> Hey all,
>>
>>
>> I'm trying to write a script to pull the count of a dataset that I've
>> filtered.
>>
>> Here's the script so far:
>>
>> /* scans by title */
>>
>> scans = LOAD '/hive/scans/*' USING PigStorage(',') AS
>> (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
>> productscans = FILTER scans BY (title MATCHES 'proactiv');
>> scancount = FOREACH productscans GENERATE COUNT($0);
>> DUMP scancount;
>>
>> For some reason, I get the error:
>>
>> Could not infer the matching function for org.apache.pig.builtin.COUNT as
>> multiple or none of them fit. Please use an explicit cast.
>>
>> What am I doing wrong here? I'm assuming it has something to do with the
>> type of the field I'm passing in, but I can't seem to resolve this.
>>
>>
>> TIA,
>> -Jason
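
For what it's worth, here is a minimal sketch of why the GROUP step seems to be needed, using the relation names from the thread. The schemas in the comments are roughly what DESCRIBE would be expected to print; they are written out by hand, not captured from a real run.

```pig
-- productscans is a relation of flat tuples, so inside FOREACH productscans
-- the expression $0 is a single long (thetime), not a bag. COUNT only
-- accepts a bag, which is why COUNT($0) cannot be resolved.
DESCRIBE productscans;
-- expected, roughly: productscans: {thetime: long, product_id: long, ..., title: chararray}

-- GROUP ... ALL collapses the relation into one tuple with two fields:
-- the group key (the constant 'all') and a bag, named after the input
-- relation, that holds every original tuple.
grpd = GROUP productscans ALL;
DESCRIBE grpd;
-- expected, roughly: grpd: {group: chararray, productscans: {(thetime: long, ...)}}

-- FOREACH grpd therefore runs once, over that single tuple, and here
-- 'productscans' refers to the bag field, which COUNT can consume.
scancount = FOREACH grpd GENERATE COUNT(productscans);
DUMP scancount;
```

One aside that was not raised in the thread: COUNT skips tuples whose first field is null, so if thetime can be null, COUNT_STAR(productscans) may be the safer choice.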
