Re: Pig 0.8: DESCRIBE and DUMP are in disagreement after a GROUP BY and a FLATTEN

Alan Gates Tue, 15 Feb 2011 01:17:17 -0800

The issue here is that describe is incorrectly removing the second level of tuple, even though dump is doing the right thing. I tested it against the top of trunk code, and describe now does the right thing. I suspect this is a side effect of the semantics work that's been going on (see https://issues.apache.org/jira/browse/PIG-998).


Alan.


On Feb 15, 2011, at 3:10 AM, amramesh wrote:


Hi,

I have been using Pig for a few months now, was using 0.7 earlier and
recently migrated to 0.8. In a script I am working on right now I hit a
snag where the script failed. I investigated some more, and have been
able to generate a small illustrative example which I think is the cause
of the problem. Here it is:

data = LOAD '$INPUT' USING PigStorage(',') AS (f0:chararray, f1:chararray, f2:int, f3:chararray);

DUMP data;
(A, apple,1, alpha)
(A, airplane,1, alpha)
(B, ball,2, beta)
(C, cat,3, gamma)
(C, candle,3, gamma)
(D, dog,4, delta)

data = FOREACH data GENERATE f0, TOTUPLE(f1, f2) AS t1, f3;
data = GROUP data BY f0;

data = FOREACH data GENERATE group AS f0, data.t1 AS b1, data.f3 AS b3;

data = FOREACH data GENERATE f0, FLATTEN(b1), b3;

DESCRIBE data;

data: {f0: chararray,b1::f1: chararray,b1::f2: int,b3: {f3:chararray}}


DUMP data;
(A,( apple,1),{( alpha),( alpha)})
(A,( airplane,1),{( alpha),( alpha)})
(B,( ball,2),{( beta)})
(C,( cat,3),{( gamma),( gamma)})
(C,( candle,3),{( gamma),( gamma)})
(D,( dog,4),{( delta)})

DESCRIBE appears to claim that the tuple would also be flattened out
into two fields, while DUMP keeps the tuple as is (which should be the
correct behavior). When I try a subsequent FLATTEN on the tuple the
script fails with error 2229.
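
The mismatch can be reproduced outside Pig with a toy model. The sketch below is plain Python, not Pig code, and the names `flatten_bag` and `grouped_a` are made up for illustration. It assumes that FLATTEN on a bag removes only the bag level and splices each element's fields into the output row. Under that assumption b1 contributes one field that is still the (f1, f2) tuple, which matches the DUMP output rather than the DESCRIBE schema. Values are taken from the example above with whitespace trimmed.

```python
# Toy model of the last FOREACH in the script; this is not Pig code.
# Bags are modelled as Python lists, tuples as Python tuples.

def flatten_bag(row, index):
    """Model FLATTEN on a bag field: emit one output row per bag element,
    splicing that element's fields in place of the bag."""
    bag = row[index]
    return [row[:index] + element + row[index + 1:] for element in bag]

# Row for group 'A' after:
#   FOREACH data GENERATE group AS f0, data.t1 AS b1, data.f3 AS b3;
# Each element of b1 is a one-field tuple whose only field is t1 = (f1, f2).
grouped_a = (
    "A",
    [(("apple", 1),), (("airplane", 1),)],  # b1: bag of (t1)
    [("alpha",), ("alpha",)],               # b3: bag of (f3)
)

rows = flatten_bag(grouped_a, 1)
for r in rows:
    print(r)
```

Each resulting row has three fields (f0, the tuple, b3), not the four fields that the DESCRIBE output lists.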

Any insights/solutions would be very helpful!

Thanks!
Amit
