Hi Jon,

Thanks for your response! I use Pig 0.9.1-snapshot.
I've used FLATTEN instead of $0 and $1, but ACCUM_CALL is still not fired. I also tried removing the generic type from the accumulator, but it did not help. :(

Is it easy for you to get the accumulator to fire?

Yen

On Tue, Mar 13, 2012 at 3:06 PM, Jonathan Coveney <[email protected]> wrote:

> What version of pig are you using?
>
> just as an experiment in the simple case, can you try doing
>
> GENERATE flatten(group) as (domain,host), ...(the rest)...
>
> shouldn't make a difference, but I think I remember that in some older
> versions it did
>
> 2012/3/13 Yen SYU <[email protected]>
>
> > Hi all,
> >
> > I just tested a very simple pig script as follows:
> >
> > records = LOAD '$input' AS (hash:chararray, domain:chararray,
> >     host:chararray, page:chararray, freq:int);
> > grpd = GROUP records BY (domain, host);
> > stats = FOREACH grpd {
> >     hashes = records.hash;
> >     uniq_hashes = DISTINCT hashes;
> >     pages = records.page;
> >     GENERATE group.$1 AS host, group.$0 AS domain,
> >         COUNT(uniq_hashes) AS hash_total:long,
> >         PAGE_COUNT(pages) AS page_count:long,
> >         SUM(records.freq) AS freq:long;
> > };
> > STORE stats INTO '$output';
> >
> > where PAGE_COUNT is a custom UDF implementing Accumulator. I added an
> > EXEC_CALL and an ACCUM_CALL counter to this UDF, and it looks like the
> > accumulate method is never called. This is the case even when I remove
> > all other built-in UDFs and keep the nested FOREACH as simple as:
> >
> > stats = FOREACH grpd {
> >     pages = records.page;
> >     GENERATE group.$1 AS host, group.$0 AS domain,
> >         PAGE_COUNT(pages) AS page_count:long;
> > };
> >
> > Any idea what's going on behind the scenes?
> >
> > Thanks,
> > Yen
