Hi Folks:
I am new to the Pig world. I have been using it for about a week and I am
completely blown away by how good it is.
I have a question about schemas. I have a processing chain similar to the
following:
A = LOAD 'data' USING PigStorage('\u0001') AS (y:chararray, cust1:int, cust2:int);
B = FOREACH A GENERATE (y, {(cust1), (cust2)}) AS t: tuple(y, CUSTS);
C = FOREACH B GENERATE(t.y, FLATTEN(t.CUSTS));
So, basically, each row of my raw data contains multiple customer records plus
some common data. I would like to "explode" each row, so that I have one row
per customer (each of which also includes the common data).
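To make the goal concrete, here is a small sketch. The values and the field
name "cust" are made up purely for illustration; they are not from my real data.

-- One hypothetical input row (fields are \u0001-delimited, shown here with '|'):
--   abc|17|42
-- Rows I would like to end up with in C, one per customer:
--   (abc,17)
--   (abc,42)
-- Schema I would like DESCRIBE to report for C ("cust" is a placeholder name):
--   C: {y: chararray, cust: int}
DESCRIBE C;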
The code above does this; however, I am not able to supply a schema for C.
Whenever I try, I get an error about mismatched schemas.
I would greatly appreciate any pointers you may have.
Best regards,
Dave.