What version of Pig are you using? I can't reproduce this in local mode;
UNION keeps the schema for me:

[dmit...@sjc1j039 ~]$ pig -x local
2010-05-07 19:58:12,905 [main] INFO  org.apache.pig.Main - Logging error messages to: /var/log/pig/pig_1273262292904.log
grunt> set1 = load 'tmp/numbers' as (a:chararray, b:int, c:int);
grunt> set2 = load 'tmp/numbers' as (a:chararray, b:int, c:int);
grunt> describe set1;
set1: {a: chararray,b: int,c: int}
grunt> describe set2;
set2: {a: chararray,b: int,c: int}
grunt> unioned = union set1, set2;
grunt> describe unioned;
unioned: {a: chararray,b: int,c: int}
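
If it helps narrow this down, here is a sketch of the same check using your
field names. The file paths 'thing1.tsv' and 'thing2.tsv' are placeholders I
made up, and I'm assuming both relations come straight from a LOAD with an
explicit AS schema, which may not match how yours are actually built.

```pig
-- Hypothetical input paths; substitute your own files.
thing1 = LOAD 'thing1.tsv' USING PigStorage('\t')
         AS (name:chararray, property1:chararray, property2:double);
thing2 = LOAD 'thing2.tsv' USING PigStorage('\t')
         AS (name:chararray, property1:chararray, property2:double);

-- Both inputs declare identical schemas; check whether the union keeps it.
combined_things = UNION thing1, thing2;
DESCRIBE combined_things;
```

If DESCRIBE still reports an unknown schema for this, please send the exact
output along with your Pig version.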


On Fri, May 7, 2010 at 12:21 PM, Russell Jurney
<[email protected]> wrote:
> I have tried to UNION the results before the group, and I have found that
> once I UNION, I can never recreate a schema.  Is this a bug?
>
>> DESCRIBE thing1;
> thing1: {name: chararray,property1: chararray,property2: double}
>> DESCRIBE thing2;
> thing2: {name: chararray,property1: chararray,property2: double}
>
> combined_things = UNION thing1, thing2;
>> DESCRIBE combined_things;
> Schema for combined_things unknown.
>
>> DUMP combined_things;
> Output is fine!
>
>> combined_things = FOREACH combined_things GENERATE $0 AS name:chararray,
> $1 AS property1:chararray, $2 AS property2:double;
> 2010-05-07 11:49:17,015 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1022: Type mismatch merging schema prefix. Field Schema: bytearray.
> Other Field Schema: given: chararray
>
>> combined_things = FOREACH combined_things GENERATE (chararray)$0 AS
> name:chararray, (chararray)$1 AS property1:chararray, (double )$2 AS
> property2:double;
> 2010-05-07 11:52:53,305 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 2999: Unexpected internal error.
> org.apache.pig.impl.logicalLayer.FrontendException cannot be cast to
> java.lang.Error
>
> My schema is gone, and I can never ever have it back because I have unioned?
> Is that a bug, or is this the intended behavior?
>
> Russ
>
> On Thu, May 6, 2010 at 5:22 PM, Russell Jurney
> <[email protected]> wrote:
>
>> I have a bunch of grouped datasets that I need to union and store.  When I
>> union them, they lose their schema.  I need the schema for my output storage
>> function to work.  How do I recreate a schema with a bag of tuples in it
>> with a GENERATE/AS?
>>
>> The schema of each union'd source (all the same) was: g_records: {key:
>> chararray,values: {A2: chararray,A3: double}}
>>
>> Code:
>>
>> ------------------
>>
>> records = LOAD 'records' USING PigStorage('\t') AS (A1:chararray,
>> A2:chararray, A3:double);
>> g_records = GROUP records BY A1;
>> g_records = FOREACH g_records GENERATE $0 AS key:chararray, $1 AS values;
>> g_records = FOREACH g_records GENERATE key, values.(A2, A3);
>>
>> > DESCRIBE g_records: {key: chararray,values: {A2: chararray,A3: double}}
>>
>> all_g_records = UNION g_records, g_records_2, g_records_3, g_records_4;
>>
>> /* Problem for me: */
>> > DESCRIBE all_g_records: Schema for all_g_records unknown.
>>
>> output_records = FOREACH all_g_records GENERATE $0 AS key:chararray, $1 AS
>> values:bag []  # errr... how?
>>
>> ------------------
>>
>> Thanks!
>>
>> Russell Jurney
>> [email protected]
>>
>
