I am on the official release of 0.6 with a few patches, mostly adding piggybank stuff. You can grab my particular blend from the pig-twttr branch on my GitHub fork. It's all existing patches, just a bit of occasional backporting. No guarantees regarding stability; for that you need to hit up Cloudera :-).
-D

On Fri, May 7, 2010 at 1:04 PM, Russell Jurney <[email protected]> wrote:

> Thanks, I'm on Apache Pig version 0.6.1-dev (rexported). Perhaps I should
> upgrade!
>
> I was able to code my way out of the schema black hole with this:
>
> all_things = UNION thing1, thing2, ...;
> all_things = FOREACH all_things GENERATE $0 AS field1, $1 AS field2, $2
> AS field3;
> all_things = FOREACH all_things GENERATE (chararray)field1 AS
> field1:chararray, (chararray)field2 AS field2:chararray,
> (double)field3 AS field3:double;
>
> Apparently in 0.6 you could cast to a named bytearray, then cast that
> bytearray to any named type.
>
> Russ
>
> On Fri, May 7, 2010 at 1:00 PM, Dmitriy Ryaboy <[email protected]> wrote:
>
>> What version of pig are you using?
>>
>> [dmit...@sjc1j039 ~]$ pig -x local
>> 2010-05-07 19:58:12,905 [main] INFO org.apache.pig.Main - Logging
>> error messages to: /var/log/pig/pig_1273262292904.log
>> grunt> set1 = load 'tmp/numbers' as (a:chararray, b:int, c:int);
>> grunt> set2 = load 'tmp/numbers' as (a:chararray, b:int, c:int);
>> grunt> describe set1;
>> set1: {a: chararray,b: int,c: int}
>> grunt> describe set2;
>> set2: {a: chararray,b: int,c: int}
>> grunt> unioned = union set1, set2;
>> grunt> describe unioned;
>> unioned: {a: chararray,b: int,c: int}
>>
>> On Fri, May 7, 2010 at 12:21 PM, Russell Jurney
>> <[email protected]> wrote:
>> > I have tried to UNION the results before the group, and I have found that
>> > once I UNION, I can never recreate a schema. Is this a bug?
>> >
>> >> DESCRIBE thing1;
>> > thing1: {name: chararray,property1: chararray,property2: double}
>> >> DESCRIBE thing2;
>> > thing2: {name: chararray,property1: chararray,property2: double}
>> >
>> > combined_things = UNION thing1, thing2;
>> >> DESCRIBE combined_things;
>> > Schema for combined_things unknown.
>> >
>> >> DUMP combined_things;
>> > Output is fine!
>> >
>> >> combined_things = FOREACH combined_things GENERATE $0 AS name:chararray,
>> > $1 AS property1:chararray, $2 AS property2:double;
>> > 2010-05-07 11:49:17,015 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>> > ERROR 1022: Type mismatch merging schema prefix. Field Schema: bytearray.
>> > Other Field Schema: given: chararray
>> >
>> >> combined_things = FOREACH combined_things GENERATE (chararray)$0 AS
>> > name:chararray, (chararray)$1 AS property1:chararray, (double)$2 AS
>> > property2:double;
>> > 2010-05-07 11:52:53,305 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>> > ERROR 2999: Unexpected internal error.
>> > org.apache.pig.impl.logicalLayer.FrontendException cannot be cast to
>> > java.lang.Error
>> >
>> > My schema is gone, and I can never ever have it back because I have unioned?
>> > Is that a bug, or is this the intended behavior?
>> >
>> > Russ
>> >
>> > On Thu, May 6, 2010 at 5:22 PM, Russell Jurney <[email protected]> wrote:
>> >
>> >> I have a bunch of grouped datasets that I need to union and store. When I
>> >> union them, they lose their schema. I need the schema for my output storage
>> >> function to work. How do I recreate a schema with a bag of tuples in it
>> >> with a GENERATE/AS?
>> >>
>> >> The schema of each union'd source (all the same) was: g_records: {key:
>> >> chararray,values: {A2: chararray,A3: double}}
>> >>
>> >> Code:
>> >>
>> >> ------------------
>> >>
>> >> records = LOAD 'records' USING PigStorage('\t') AS (A1:chararray,
>> >> A2:chararray, A3:double);
>> >> g_records = GROUP records BY A1;
>> >> g_records = FOREACH g_records GENERATE $0 AS key:chararray, $1 AS values;
>> >> g_records = FOREACH g_records GENERATE key, values.(A2, A3);
>> >>
>> >> > DESCRIBE g_records: {key: chararray,values: {A2: chararray,A3: double}}
>> >>
>> >> all_g_records = UNION g_records, g_records_2, g_records_3, g_records_4;
>> >>
>> >> /* Problem for me: */
>> >> > DESCRIBE all_g_records: Schema for all_g_records unknown.
>> >>
>> >> output_records = FOREACH all_g_records GENERATE $0 AS key:chararray, $1 AS
>> >> values:bag [] # errr... how?
>> >>
>> >> ------------------
>> >>
>> >> Thanks!
>> >>
>> >> Russell Jurney
>> >> [email protected]
