The issue here is that DESCRIBE is incorrectly removing the second level of tuple, even though DUMP is doing the right thing. I tested it against the top-of-trunk code, and DESCRIBE now does the right thing there. I suspect this is a side effect of the semantics work that's been going on (see https://issues.apache.org/jira/browse/PIG-998).
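
To see why DUMP's output is the expected one, here is a rough sketch in Python rather than Pig. It is only a toy model, not Pig code: a bag is modelled as a list and a tuple as a Python tuple, and the helper name flatten_bag is made up for illustration. It assumes that data.t1 yields a bag whose elements are one-field tuples, each holding the tuple t1, so flattening the bag strips only the bag level and leaves t1 intact.

```python
# Toy model of Pig data shapes (an assumption for illustration only):
# a bag is a Python list, a tuple is a Python tuple.

def flatten_bag(row, idx):
    """Mimic FLATTEN on a bag field: emit one output row per bag element,
    splicing that element's fields into the row. Only one level is removed."""
    out = []
    for element in row[idx]:
        out.append(row[:idx] + tuple(element) + row[idx + 1:])
    return out

# Row for group 'B' after the GROUP / FOREACH steps: b1 is a bag whose
# single element is a one-field tuple, and that field is the tuple t1.
row_b = ("B", [((" ball", 2),)], [(" beta",)])
rows_b = flatten_bag(row_b, 1)

# Row for group 'A': the bag holds two elements, so two rows come out.
row_a = ("A", [((" apple", 1),), ((" airplane", 1),)], [(" alpha",), (" alpha",)])
rows_a = flatten_bag(row_a, 1)

print(rows_b)
print(rows_a)
```

In this model the second field of every output row is still a two-field tuple, which matches the rows DUMP prints. The schema DESCRIBE reports on 0.8 would correspond to flattening a second time.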

Alan.

On Feb 15, 2011, at 3:10 AM, amramesh wrote:


Hi,

I have been using Pig for a few months now. I was using 0.7 earlier and
recently migrated to 0.8. A script I am working on right now hit a
snag and failed. I investigated some more and was able to produce a
small illustrative example that I think shows the cause of the
problem. Here it is:

data = LOAD '$INPUT' USING PigStorage(',') AS (f0:chararray, f1:chararray, f2:int, f3:chararray);

DUMP data;
(A, apple,1, alpha)
(A, airplane,1, alpha)
(B, ball,2, beta)
(C, cat,3, gamma)
(C, candle,3, gamma)
(D, dog,4, delta)

data = FOREACH data GENERATE f0, TOTUPLE(f1, f2) AS t1, f3;
data = GROUP data BY f0;
data = FOREACH data GENERATE group AS f0, data.t1 AS b1, data.f3 AS b3;
data = FOREACH data GENERATE f0, FLATTEN(b1), b3;

DESCRIBE data;
data: {f0: chararray,b1::f1: chararray,b1::f2: int,b3: {f3: chararray}}

DUMP data;
(A,( apple,1),{( alpha),( alpha)})
(A,( airplane,1),{( alpha),( alpha)})
(B,( ball,2),{( beta)})
(C,( cat,3),{( gamma),( gamma)})
(C,( candle,3),{( gamma),( gamma)})
(D,( dog,4),{( delta)})

DESCRIBE appears to claim that the tuple has also been flattened out
into two fields, while DUMP keeps the tuple as is (which should be the
correct behavior). When I try a subsequent FLATTEN on the tuple, the
script fails with error 2229.

Any insights/solutions would be very helpful!

Thanks!
Amit

