Hi, Adil,
This is a known bug in Pig 0.5
(https://issues.apache.org/jira/browse/PIG-850). It is fixed in 0.6,
which should be released in a few days.
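Until then, one possible workaround, going by your own observation that store gives the right result, is to store D and print the stored files from the Grunt shell. This is only a minimal sketch: the output path '/user/aaijaz/bb_counts' is a made-up name for illustration, and I am assuming the statements are typed at the Grunt prompt.

```pig
-- Same script as in your mail, ending in store instead of dump.
A = load '/user/aaijaz/bb' using PigStorage(',') as (a:int);
B = limit A 100;
C = group B by a parallel 20;
D = foreach C generate
    flatten(group) as a,
    COUNT(B) as num;
-- '/user/aaijaz/bb_counts' is a hypothetical output path; it must not exist yet.
store D into '/user/aaijaz/bb_counts' using PigStorage(',');
-- Grunt's cat command prints the contents of the stored part files.
cat /user/aaijaz/bb_counts;
```

Reading the files back this way avoids dump, which is the step that shows the wrong output in your case.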
Adil Aijaz wrote:
Hi there
I am using the following version of Pig:
~/workspace$ pig-test --version
Apache Pig version 0.5.0 (r829623)
compiled Oct 25 2009, 18:58:38
I expect the following simple script to limit the input to a manageable
size and then perform a group by to count the occurrences of each value.
A = load '/user/aaijaz/bb' using PigStorage(',') as (a:int);
B = limit A 100;
C = group B by a parallel 20;
D = foreach C generate
    flatten(group) as a,
    COUNT(B) as num;
describe D;
dump D;
However, here is the output I get:
(1211)
(1222)
(1211)
If I replace dump with a store, I get the right output:
(1211, 2)
(1222, 1)
It is interesting to note that the describe works just fine in either case.
My questions:
1) Is this a known bug or am I misusing limit in some way?
2) If it is an unknown bug, I can go ahead and create a JIRA issue.
Adil