I have tried to UNION the results before the group, and I have found that
once I UNION, I can never recreate a schema.  Is this a bug?

> DESCRIBE thing1;
thing1: {name: chararray,property1: chararray,property2: double}
> DESCRIBE thing2;
thing2: {name: chararray,property1: chararray,property2: double}

combined_things = UNION thing1, thing2;
> DESCRIBE combined_things;
Schema for combined_things unknown.

> DUMP combined_things;
Output is fine!

> combined_things = FOREACH combined_things GENERATE $0 AS name:chararray,
$1 AS property1:chararray, $2 AS property2:double;
2010-05-07 11:49:17,015 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1022: Type mismatch merging schema prefix. Field Schema: bytearray.
Other Field Schema: given: chararray

> combined_things = FOREACH combined_things GENERATE (chararray)$0 AS
name:chararray, (chararray)$1 AS property1:chararray, (double )$2 AS
property2:double;
2010-05-07 11:52:53,305 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 2999: Unexpected internal error.
org.apache.pig.impl.logicalLayer.FrontendException cannot be cast to
java.lang.Error

My schema is gone, and I can never ever have it back because I have unioned?
Is that a bug, or is this the intended behavior?

Russ

On Thu, May 6, 2010 at 5:22 PM, Russell Jurney <[email protected]> wrote:

> I have a bunch of grouped datasets that I need to union and store.  When I
> union them, they lose their schema.  I need the schema for my output storage
> function to work.  How do I recreate a schema with a bag of tuples in it
> with a GENERATE/AS?
>
> The schema of each union'd source (all the same) was: g_records: {key:
> chararray,values: {A2: chararray,A3: double}}
>
> Code:
>
> ------------------
>
> records = LOAD 'records' USING PigStorage('\t') AS (A1:chararray,
> A2:chararray, A3:double);
> g_records = GROUP records BY A1;
> g_records = FOREACH g_records GENERATE $0 AS key:chararray, $1 AS values;
> g_records = FOREACH g_records GENERATE key, values.(A2, A3);
>
> > DESCRIBE g_records: {key: chararray,values: {A2: chararray,A3: double}}
>
> all_g_records = UNION g_records, g_records_2, g_records_3, g_records_4;
>
> /* Problem for me: */
> > DESCRIBE all_g_records: Schema for all_g_records unknown.
>
> output_records = FOREACH all_g_records GENERATE $0 AS key:chararray, $1 AS
> values:bag []  # errr... how?
>
> ------------------
>
> Thanks!
>
> Russell Jurney
> [email protected]
>
