Hi there

I am using the following version of pig:

~/workspace$ pig-test --version
Apache Pig version 0.5.0 (r829623)
compiled Oct 25 2009, 18:58:38


I expect the following simple script to reduce the input to a manageable
size with limit and then do a group by to count the occurrences of each value.

A = load '/user/aaijaz/bb' using PigStorage(',') as (a:int);

B = limit A 100;

C = group B by a parallel 20;

D  = foreach C generate
        flatten(group) as a,
        COUNT(B) as num;

describe D;
dump D;

However, dump D gives me this output:

(1211)
(1222)
(1211)

If I replace the dump with a store, I get the correct output:
(1211, 2)
(1222, 1)
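For comparison, here is a minimal plain-Python model of what the script should compute. It is only a sketch under stated assumptions: limit is modeled as "keep the first n rows" (Pig does not guarantee which rows limit keeps), and the input is a hypothetical list built from the three values the dump printed. The helper name limit_group_count is also hypothetical.

```python
from collections import Counter
from itertools import islice


def limit_group_count(rows, n):
    """Hypothetical model of: B = limit A n; C = group B by a;
    D = foreach C generate group, COUNT(B)."""
    # limit: keep at most n rows (here simply the first n; Pig makes no ordering promise)
    limited = list(islice(rows, n))
    # group by a + COUNT(B): number of rows per distinct value of a
    counts = Counter(limited)
    # one (a, num) tuple per group, in order of first appearance
    return list(counts.items())


# Hypothetical input: the three values that appeared in the dump output
print(limit_group_count([1211, 1222, 1211], 100))
```

This prints [(1211, 2), (1222, 1)], which matches the store output and not the dump output.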

Interestingly, describe works fine in either case.

My questions:

1) Is this a known bug, or am I misusing limit in some way?
2) If it is not a known bug, should I go ahead and create a JIRA?

Adil
