Is there any chance you can test this using the version of Pig in trunk? That'd be very helpful. If it's still an issue, file a JIRA and I'll take a look.
2012/3/27 Subir S <[email protected]>

> Hi,
>
> There is a trivial issue with PigStats (during HASHJOIN), it does not print
> correct record count. My job does a LEFT OUTER join operation and hence the
> row count with input B should match output C. After seeing the difference
> in count i cross checked, but seems it is only a printing issue...Hope this
> is a bug which might have been already fixed by now? Can somebody advise!!
>
> 2012-03-27 06:28:39,200 [main] INFO org.apache.pig.tools.pigstats.PigStats
> - Script Statistics:
>
> HadoopVersion PigVersion UserId StartedAt FinishedAt Features
> 0.20.2-cdh3u0 0.8.0-cdh3u0 ssasik0 2012-03-27 06:26:30 2012-03-27 06:28:39 HASH_JOIN
>
> Success!
>
> Job Stats (time in seconds):
> JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs
> job_201203261530_0597 30 4 28 7 17 100 97 98 gold_price_link,items,items_price_link,items_price_link1,work_price_link HASH_JOIN /tmp/pricing_hub/work/ssasik0/output/iteration1/hierarchy/20120319/hierarchy_items_final,
>
> Input(s):
> Successfully read 9552894 records from: "/A/*"
> Successfully read *9552894* records from: *"/B/*"*
>
> Output(s):
> Successfully stored 12277671 records (2625930049 bytes) in: "/C/"
>
> Counters:
> Total records written : 12277671
> Total bytes written : 2625930049
> Spillable Memory Manager spill count : 0
> Total bags proactively spilled: 0
> Total records proactively spilled: 0
>
> Job DAG:
> job_201203261530_0597
>
> 2012-03-27 06:28:39,211 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
> -bash-3.2$ hadoop fs -cat */B/** | wc -l
> *12277671*
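
For anyone who wants to repeat the cross-check in the quoted report without comparing the numbers by eye, here is a minimal Python sketch. It assumes the log lines have exactly the "Successfully read ... records from" and "Successfully stored ... records ... in" shape shown in the quoted output. The helper names parse_counts and mismatches are hypothetical and not part of Pig, and the independently measured count is whatever you obtained yourself (here, the wc -l figure from the report).

```python
import re

# Matches lines such as: Successfully read 9552894 records from: "/A/*"
READ_RE = re.compile(r'Successfully read (\d+) records from: "([^"]+)"')
# Matches lines such as: Successfully stored 12277671 records (2625930049 bytes) in: "/C/"
STORED_RE = re.compile(r'Successfully stored (\d+) records(?: \(\d+ bytes\))? in: "([^"]+)"')


def parse_counts(log_text):
    """Return (inputs, outputs): dicts mapping path -> record count reported in the log."""
    inputs = {path: int(n) for n, path in READ_RE.findall(log_text)}
    outputs = {path: int(n) for n, path in STORED_RE.findall(log_text)}
    return inputs, outputs


def mismatches(reported, measured):
    """Return {path: (reported, measured)} for paths where the two counts differ."""
    return {
        path: (reported[path], measured[path])
        for path in measured
        if path in reported and reported[path] != measured[path]
    }


# Sample text copied from the quoted report (emphasis asterisks removed).
sample = '''Input(s):
Successfully read 9552894 records from: "/A/*"
Successfully read 9552894 records from: "/B/*"

Output(s):
Successfully stored 12277671 records (2625930049 bytes) in: "/C/"
'''

inputs, outputs = parse_counts(sample)
# 12277671 is the `wc -l` figure reported for /B/* in the quoted message.
print(mismatches(inputs, {"/B/*": 12277671}))
```

With the figures from the report, this flags /B/* as the only path where the logged count and the measured count disagree, which matches the suspicion that only the printed statistic is wrong.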
