What version of Pig are you using?

Just as an experiment, in the simple case, can you try:

GENERATE flatten(group) as (domain,host), ...(the rest)...

It shouldn't make a difference, but I think I remember that in some older
versions it did.
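
Spelled out against your simplified script, that would look something like
the following. This is an untested sketch: it assumes PAGE_COUNT is already
registered and defined as in your script, and it only changes how the group
key is projected.

```pig
-- Same nested FOREACH as the simplified version, but the group tuple is
-- flattened instead of being projected with group.$0 / group.$1.
stats = FOREACH grpd {
    pages = records.page;
    GENERATE FLATTEN(group) AS (domain, host),
             PAGE_COUNT(pages) AS page_count:long;
};
```

Note that this puts domain before host in the output, which is the reverse
of the column order in your original GENERATE.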

2012/3/13 Yen SYU <[email protected]>

> Hi all,
>
> I just tested a very simple Pig script, as follows:
>
> records = LOAD '$input' AS (hash:chararray, domain:chararray,
> host:chararray, page:chararray, freq:int);
> grpd = GROUP records BY (domain, host);
> stats = FOREACH grpd {
>                                  hashes = records.hash;
>                                  uniq_hashes = DISTINCT hashes;
>                                  pages = records.page;
>                                  GENERATE group.$1 AS host, group.$0 AS
> domain, COUNT(uniq_hashes) AS hash_total:long, PAGE_COUNT(pages) AS
> page_count:long, SUM(freq) AS freq:long;
> };
> STORE stats INTO '$output';
>
> where PAGE_COUNT is a custom UDF implementing Accumulator. I added
> EXEC_CALL and ACCUM_CALL counters to this UDF, and it looks like the
> accumulate method is never called. I even tried removing all the other
> built-in UDFs and keeping the nested FOREACH as simple as:
>
> stats = FOREACH grpd {
>                                  pages = records.page;
>                                  GENERATE group.$1 AS host, group.$0 AS
> domain, PAGE_COUNT(pages) AS page_count:long;
> };
>
> Any idea what's going on behind the scenes?
>
> Thanks,
> Yen
>
