What version of Pig are you using? Just as an experiment in the simple case, can you try doing

GENERATE flatten(group) as (domain,host), ...(the rest)...

It shouldn't make a difference, but I think I remember that in some older versions it did.

2012/3/13 Yen SYU <[email protected]>

> Hi all,
>
> I just test a very simple pig script as following:
>
> records = LOAD '$input' AS (hash:chararray, domain:chararray,
> host:chararray, page:chararray, freq:int);
> grpd = GROUP records BY (domain, host);
> stats = FOREACH grpd {
>     hashes = records.hash;
>     uniq_hashes = DISTINCT hashes;
>     pages = records.page;
>     GENERATE group.$1 AS host, group.$0 AS
>     domain, COUNT(uniq_hashes) AS hash_total:long, PAGE_COUNT(pages) AS
>     page_count:long, SUM(freq) AS freq:long);
> };
> STORE stats INTO '$output';
>
> where PAGE_COUNT is a customized UDF implementing Accumulator. I add an
> EXEC_CALL and ACCUM_CALL counter in this UDF and it looks that the
> accumulate method is never called. Even I tried to remove all other
> built-in UDFs and keep the NESTED FOREACH as simple as:
>
> stats = FOREACH grpd {
>     pages = records.page;
>     GENERATE group.$1 AS host, group.$0 AS
>     domain, PAGE_COUNT(pages) AS page_count:long;
> };
>
> Anyone idea what's going on behind the scenes?
>
> Thanks,
> Yen
>
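
For reference, here is roughly what the experiment suggested at the top of this reply might look like when written out for the simplified FOREACH in the quoted message. It is only a sketch: it assumes `grpd` and the PAGE_COUNT UDF are defined exactly as in the quoted script, and whether it changes which UDF method gets called may depend on the Pig version.

```pig
-- Assumes `grpd` and the PAGE_COUNT UDF are defined as in the quoted script.
stats = FOREACH grpd {
    pages = records.page;
    -- flatten(group) expands the (domain, host) group tuple into two fields,
    -- replacing the positional projections group.$0 and group.$1.
    GENERATE flatten(group) AS (domain, host),
             PAGE_COUNT(pages) AS page_count:long;
};
```

Note that this emits the columns as (domain, host) rather than (host, domain) as in the original script, so anything reading '$output' by position would need to account for that.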
