I have a bunch of grouped datasets that I need to union and store. When I
union them, they lose their schema, and I need that schema for my output
storage function to work. How do I recreate a schema that contains a bag of
tuples using GENERATE ... AS?
The schema of each unioned source (they are all identical) was:
g_records: {key: chararray,values: {A2: chararray,A3: double}}
Code:
------------------
records = LOAD 'records' USING PigStorage('\t') AS (A1:chararray,
A2:chararray, A3:double);
g_records = GROUP records BY A1;
g_records = FOREACH g_records GENERATE $0 AS key:chararray, $1 AS values;
g_records = FOREACH g_records GENERATE key, values.(A2, A3);
DESCRIBE g_records;
-- g_records: {key: chararray,values: {A2: chararray,A3: double}}
all_g_records = UNION g_records, g_records_2, g_records_3, g_records_4;
/* Problem for me: */
DESCRIBE all_g_records;
-- Schema for all_g_records unknown.
output_records = FOREACH all_g_records GENERATE $0 AS key:chararray,
    $1 AS values:bag [];  -- errr... how do I declare the bag's inner schema here?
------------------
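For reference, Pig's schema syntax declares a bag as bag{...} wrapping a tuple schema, e.g. bag{t:tuple(...)}. A minimal, untested sketch of re-declaring the schema after the UNION, assuming Pig 0.8 or later and that the four grouped relations really do share identical schemas; the inner tuple name "t" is a hypothetical, arbitrary label:

```pig
-- Sketch only: re-attach a schema after UNION (assumes Pig 0.8+ schema syntax).
-- "t" is an arbitrary (hypothetical) name for the tuple inside the bag;
-- A2 and A3 mirror the fields produced by values.(A2, A3) above.
output_records = FOREACH all_g_records GENERATE
    $0 AS key:chararray,
    $1 AS values:bag{t:tuple(A2:chararray, A3:double)};

-- Should now report a known schema instead of "unknown".
DESCRIBE output_records;
```

The same bag{t:tuple(...)} form is also what a LOAD ... AS (...) clause would take if the stored output were read back with an explicit schema.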
Thanks!
Russell Jurney