I have my own branch, which differs even from LinkedIn's branch. I'll look at updating to something newer.
On Fri, May 7, 2010 at 1:16 PM, Dmitriy Ryaboy <[email protected]> wrote:
> I am on the official release of 0.6 with a few patches, mostly adding piggybank stuff.
> You can grab my particular blend from the pig-twttr branch on my github fork. It's all existing patches, just a bit of occasional backporting.
> No guarantees regarding stability; for that you need to hit up Cloudera :-).
>
> -D
>
> On Fri, May 7, 2010 at 1:04 PM, Russell Jurney <[email protected]> wrote:
> > Thanks, I'm on Apache Pig version 0.6.1-dev (rexported). Perhaps I should upgrade!
> >
> > I was able to code my way out of the schema black hole with this:
> >
> > all_things = UNION thing1, thing2, ...;
> > all_things = FOREACH all_things GENERATE $0 AS field1, $1 AS field2, $2 AS field3;
> > all_things = FOREACH all_things GENERATE (chararray)field1 AS field1:chararray, (chararray)field2 AS field2:chararray, (double)field3 AS field3:double;
> >
> > Apparently in 0.6 you can give a bytearray field a name first, then cast that named bytearray to any named type.
> >
> > Russ
> >
> > On Fri, May 7, 2010 at 1:00 PM, Dmitriy Ryaboy <[email protected]> wrote:
> >
> >> What version of Pig are you using?
> >>
> >> [dmit...@sjc1j039 ~]$ pig -x local
> >> 2010-05-07 19:58:12,905 [main] INFO org.apache.pig.Main - Logging error messages to: /var/log/pig/pig_1273262292904.log
> >> grunt> set1 = load 'tmp/numbers' as (a:chararray, b:int, c:int);
> >> grunt> set2 = load 'tmp/numbers' as (a:chararray, b:int, c:int);
> >> grunt> describe set1;
> >> set1: {a: chararray,b: int,c: int}
> >> grunt> describe set2;
> >> set2: {a: chararray,b: int,c: int}
> >> grunt> unioned = union set1, set2;
> >> grunt> describe unioned;
> >> unioned: {a: chararray,b: int,c: int}
> >>
> >>
> >> On Fri, May 7, 2010 at 12:21 PM, Russell Jurney <[email protected]> wrote:
> >> > I have tried to UNION the results before the group, and I have found that once I UNION, I can never recreate a schema.
> >> > Is this a bug?
> >> >
> >> >> DESCRIBE thing1;
> >> > thing1: {name: chararray,property1: chararray,property2: double}
> >> >> DESCRIBE thing2;
> >> > thing2: {name: chararray,property1: chararray,property2: double}
> >> >
> >> > combined_things = UNION thing1, thing2;
> >> >> DESCRIBE combined_things;
> >> > Schema for combined_things unknown.
> >> >
> >> >> DUMP combined_things;
> >> > Output is fine!
> >> >
> >> >> combined_things = FOREACH combined_things GENERATE $0 AS name:chararray, $1 AS property1:chararray, $2 AS property2:double;
> >> > 2010-05-07 11:49:17,015 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1022: Type mismatch merging schema prefix. Field Schema: bytearray. Other Field Schema: given: chararray
> >> >
> >> >> combined_things = FOREACH combined_things GENERATE (chararray)$0 AS name:chararray, (chararray)$1 AS property1:chararray, (double)$2 AS property2:double;
> >> > 2010-05-07 11:52:53,305 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. org.apache.pig.impl.logicalLayer.FrontendException cannot be cast to java.lang.Error
> >> >
> >> > My schema is gone, and I can never ever have it back because I have unioned?
> >> > Is that a bug, or is this the intended behavior?
> >> >
> >> > Russ
> >> >
> >> > On Thu, May 6, 2010 at 5:22 PM, Russell Jurney <[email protected]> wrote:
> >> >
> >> >> I have a bunch of grouped datasets that I need to union and store. When I union them, they lose their schema. I need the schema for my output storage function to work. How do I recreate a schema with a bag of tuples in it with a GENERATE/AS?
> >> >>
> >> >> The schema of each union'd source (all the same) was: g_records: {key: chararray,values: {A2: chararray,A3: double}}
> >> >>
> >> >> Code:
> >> >>
> >> >> ------------------
> >> >>
> >> >> records = LOAD 'records' USING PigStorage('\t') AS (A1:chararray, A2:chararray, A3:double);
> >> >> g_records = GROUP records BY A1;
> >> >> g_records = FOREACH g_records GENERATE $0 AS key:chararray, $1 AS values;
> >> >> g_records = FOREACH g_records GENERATE key, values.(A2, A3);
> >> >>
> >> >> > DESCRIBE g_records: {key: chararray,values: {A2: chararray,A3: double}}
> >> >>
> >> >> all_g_records = UNION g_records, g_records_2, g_records_3, g_records_4;
> >> >>
> >> >> /* Problem for me: */
> >> >> > DESCRIBE all_g_records: Schema for all_g_records unknown.
> >> >>
> >> >> output_records = FOREACH all_g_records GENERATE $0 AS key:chararray, $1 AS values:bag [] # errr... how?
> >> >>
> >> >> ------------------
> >> >>
> >> >> Thanks!
> >> >>
> >> >> Russell Jurney
> >> >> [email protected]
> >> >>
> >> >
> >> >
> >
