Thanks, I'm on Apache Pig version 0.6.1-dev (rexported).  Perhaps I should
upgrade!

I was able to code my way out of the schema black hole with this:

all_things = UNION thing1, thing2, ...;
all_things = FOREACH all_things GENERATE $0 AS field1, $1 AS field2, $2 AS field3;
all_things = FOREACH all_things GENERATE (chararray)field1 AS field1:chararray,
    (chararray)field2 AS field2:chararray, (double)field3 AS field3:double;

Apparently in 0.6 you can first give the untyped (bytearray) fields names,
then cast each named field to whatever type you need.
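
Applied to the thing1/thing2 example quoted below, the same two-step pattern
would look roughly like this (an untested sketch, not verified on 0.6.1-dev;
the aliases named_things and typed_things are hypothetical):

```pig
-- thing1 and thing2 are assumed to be loaded already, as in the quoted message.
combined_things = UNION thing1, thing2;

-- Step 1: name the positional fields only; no types are declared here,
-- so the fields remain bytearray.
named_things = FOREACH combined_things GENERATE
    $0 AS name, $1 AS property1, $2 AS property2;

-- Step 2: cast the named fields and declare the matching types.
typed_things = FOREACH named_things GENERATE
    (chararray)name AS name:chararray,
    (chararray)property1 AS property1:chararray,
    (double)property2 AS property2:double;

DESCRIBE typed_things;
```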

Russ

On Fri, May 7, 2010 at 1:00 PM, Dmitriy Ryaboy <[email protected]> wrote:

> What version of pig are you using?
>
> [dmit...@sjc1j039 ~]$ pig -x local
> 2010-05-07 19:58:12,905 [main] INFO  org.apache.pig.Main - Logging
> error messages to: /var/log/pig/pig_1273262292904.log
> grunt> set1 = load 'tmp/numbers' as (a:chararray, b:int, c:int);
> grunt> set2 = load 'tmp/numbers' as (a:chararray, b:int, c:int);
> grunt> describe set1;
> set1: {a: chararray,b: int,c: int}
> grunt> describe set2;
> set2: {a: chararray,b: int,c: int}
> grunt> unioned = union set1, set2;
> grunt> describe unioned;
> unioned: {a: chararray,b: int,c: int}
>
>
> On Fri, May 7, 2010 at 12:21 PM, Russell Jurney
> <[email protected]> wrote:
> > I have tried to UNION the results before the group, and I have found that
> > once I UNION, I can never recreate a schema.  Is this a bug?
> >
> >> DESCRIBE thing1;
> > thing1: {name: chararray,property1: chararray,property2: double}
> >> DESCRIBE thing2;
> > thing2: {name: chararray,property1: chararray,property2: double}
> >
> > combined_things = UNION thing1, thing2;
> >> DESCRIBE combined_things;
> > Schema for combined_things unknown.
> >
> >> DUMP combined_things;
> > Output is fine!
> >
> >> combined_things = FOREACH combined_things GENERATE $0 AS name:chararray,
> > $1 AS property1:chararray, $2 AS property2:double;
> > 2010-05-07 11:49:17,015 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> > ERROR 1022: Type mismatch merging schema prefix. Field Schema: bytearray.
> > Other Field Schema: given: chararray
> >
> >> combined_things = FOREACH combined_things GENERATE (chararray)$0 AS
> > name:chararray, (chararray)$1 AS property1:chararray, (double )$2 AS
> > property2:double;
> > 2010-05-07 11:52:53,305 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> > ERROR 2999: Unexpected internal error.
> > org.apache.pig.impl.logicalLayer.FrontendException cannot be cast to
> > java.lang.Error
> >
> > My schema is gone, and I can never ever have it back because I have
> > unioned? Is that a bug, or is this the intended behavior?
> >
> > Russ
> >
> > On Thu, May 6, 2010 at 5:22 PM, Russell Jurney <[email protected]>
> > wrote:
> >
> >> I have a bunch of grouped datasets that I need to union and store.  When I
> >> union them, they lose their schema.  I need the schema for my output storage
> >> function to work.  How do I recreate a schema with a bag of tuples in it
> >> with a GENERATE/AS?
> >>
> >> The schema of each union'd source (all the same) was: g_records: {key:
> >> chararray,values: {A2: chararray,A3: double}}
> >>
> >> Code:
> >>
> >> ------------------
> >>
> >> records = LOAD 'records' USING PigStorage('\t') AS (A1:chararray,
> >> A2:chararray, A3:double);
> >> g_records = GROUP records BY A1;
> >> g_records = FOREACH g_records GENERATE $0 AS key:chararray, $1 AS values;
> >> g_records = FOREACH g_records GENERATE key, values.(A2, A3);
> >>
> >> > DESCRIBE g_records: {key: chararray,values: {A2: chararray,A3: double}}
> >>
> >> all_g_records = UNION g_records, g_records_2, g_records_3, g_records_4;
> >>
> >> /* Problem for me: */
> >> > DESCRIBE all_g_records: Schema for all_g_records unknown.
> >>
> >> output_records = FOREACH all_g_records GENERATE $0 AS key:chararray, $1 AS
> >> values:bag []  # errr... how?
> >>
> >> ------------------
> >>
> >> Thanks!
> >>
> >> Russell Jurney
> >> [email protected]
> >>
> >
>
