Hi all,

I just tested a very simple Pig script, as follows:

records = LOAD '$input' AS (hash:chararray, domain:chararray,
                            host:chararray, page:chararray, freq:int);
grpd = GROUP records BY (domain, host);
stats = FOREACH grpd {
    hashes = records.hash;
    uniq_hashes = DISTINCT hashes;
    pages = records.page;
    GENERATE group.$1 AS host, group.$0 AS domain,
             COUNT(uniq_hashes) AS hash_total:long,
             PAGE_COUNT(pages) AS page_count:long,
             SUM(records.freq) AS freq:long;
};
STORE stats INTO '$output';

where PAGE_COUNT is a custom UDF that implements the Accumulator
interface. I added EXEC_CALL and ACCUM_CALL counters to this UDF, and it
looks like the accumulate method is never called. I even tried removing
all the other built-in UDFs and keeping the nested FOREACH as simple as:

stats = FOREACH grpd {
    pages = records.page;
    GENERATE group.$1 AS host, group.$0 AS domain,
             PAGE_COUNT(pages) AS page_count:long;
};
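
To make the setup concrete, here is a rough, self-contained sketch of the
general shape of such a UDF. It is not the actual PAGE_COUNT code. The class
names, the nested stand-in interface, the plain int fields used as counters,
and the idea that the UDF counts distinct pages are all assumptions made for
illustration. A real UDF would extend EvalFunc, implement
org.apache.pig.Accumulator, and receive Tuple arguments instead of lists.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PageCountSketch {

    // Hypothetical stand-in mirroring the three methods of Pig's Accumulator
    // interface. The real interface receives a Tuple and may throw IOException.
    interface Accumulator<T> {
        void accumulate(List<String> batch);
        T getValue();
        void cleanup();
    }

    static class PageCount implements Accumulator<Long> {
        // Stand-ins for the EXEC_CALL and ACCUM_CALL counters.
        int execCalls = 0;
        int accumCalls = 0;
        private final Set<String> seen = new HashSet<>();

        // Regular path: the whole bag for a group arrives in one call.
        Long exec(List<String> pages) {
            execCalls++;
            return (long) new HashSet<>(pages).size();
        }

        // Accumulator path: the bag arrives in batches, one call per batch.
        @Override
        public void accumulate(List<String> batch) {
            accumCalls++;
            seen.addAll(batch);
        }

        // Called once per group after the last batch.
        @Override
        public Long getValue() {
            return (long) seen.size();
        }

        // Called after getValue() so state does not leak into the next group.
        @Override
        public void cleanup() {
            seen.clear();
        }
    }

    public static void main(String[] args) {
        PageCount udf = new PageCount();
        udf.accumulate(Arrays.asList("/a", "/b"));
        udf.accumulate(Arrays.asList("/b", "/c"));
        // prints "3 pages, accumCalls=2, execCalls=0"
        System.out.println(udf.getValue() + " pages, accumCalls=" + udf.accumCalls
                + ", execCalls=" + udf.execCalls);
        udf.cleanup();
    }
}
```

In terms of this sketch, what I observe in the real job is the opposite of
the demo run: the counter for accumulate never increases.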

Does anyone have an idea what's going on behind the scenes?

Thanks,
Yen
