Hi,
I have been using Pig for a few months now. I was on 0.7 earlier and
recently migrated to 0.8. A script I am working on right now fails, and
after some investigation I was able to reduce it to a small example that
I think illustrates the cause of the problem. Here it is:
data = LOAD '$INPUT' USING PigStorage(',')
    AS (f0:chararray, f1:chararray, f2:int, f3:chararray);
DUMP data;
(A, apple,1, alpha)
(A, airplane,1, alpha)
(B, ball,2, beta)
(C, cat,3, gamma)
(C, candle,3, gamma)
(D, dog,4, delta)
data = FOREACH data GENERATE f0, TOTUPLE(f1, f2) AS t1, f3;
data = GROUP data BY f0;
data = FOREACH data GENERATE group AS f0, data.t1 AS b1, data.f3 AS b3;
data = FOREACH data GENERATE f0, FLATTEN(b1), b3;
DESCRIBE data;
data: {f0: chararray,b1::f1: chararray,b1::f2: int,b3: {f3: chararray}}
DUMP data;
(A,( apple,1),{( alpha),( alpha)})
(A,( airplane,1),{( alpha),( alpha)})
(B,( ball,2),{( beta)})
(C,( cat,3),{( gamma),( gamma)})
(C,( candle,3),{( gamma),( gamma)})
(D,( dog,4),{( delta)})
DESCRIBE reports that the inner tuple has also been flattened into two
fields (b1::f1 and b1::f2), while DUMP shows the tuple still intact. I
believe DUMP is the correct behavior, since FLATTEN(b1) should only
unnest the bag. When I then try a subsequent FLATTEN on the tuple, the
script fails with error 2229.
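To spell out why I think DUMP is the one that is right, here is a rough
Python model of the steps above. It is only a sketch of my understanding
of GROUP, bag projection and FLATTEN, not of Pig's actual implementation.
Bags are modeled as Python lists (real bags are unordered), and the names
ROWS and model_pipeline are made up for illustration.

```python
# Rough model of the Pig steps using plain Python tuples and lists.
# Assumption: a bag is modeled as a list of tuples; real bags are unordered.

ROWS = [
    ("A", " apple", 1, " alpha"),
    ("A", " airplane", 1, " alpha"),
    ("B", " ball", 2, " beta"),
    ("C", " cat", 3, " gamma"),
    ("C", " candle", 3, " gamma"),
    ("D", " dog", 4, " delta"),
]


def model_pipeline(rows):
    # FOREACH data GENERATE f0, TOTUPLE(f1, f2) AS t1, f3
    with_tuple = [(f0, (f1, f2), f3) for f0, f1, f2, f3 in rows]

    # GROUP data BY f0: one bag of full rows per key
    groups = {}
    for row in with_tuple:
        groups.setdefault(row[0], []).append(row)

    # FOREACH data GENERATE group AS f0, data.t1 AS b1, data.f3 AS b3
    # Projecting one column of a bag gives a bag of single-field tuples.
    projected = [
        (key, [(r[1],) for r in bag], [(r[2],) for r in bag])
        for key, bag in groups.items()
    ]

    # FOREACH data GENERATE f0, FLATTEN(b1), b3
    # Flattening a bag emits one row per element and splices in that
    # element's fields; here the only field is the t1 tuple itself.
    return [(f0, *elem, b3) for f0, b1, b3 in projected for elem in b1]


result = model_pipeline(ROWS)
for row in result:
    print(row)
```

In this model every output row has three fields, with the (f1, f2) tuple
still nested in the second position. That matches the DUMP output rather
than the four-field schema DESCRIBE prints.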
Any insights/solutions would be very helpful!
Thanks!
Amit