There is a patch for AvroStorage that computes a union schema, so that input Avro files with different schemas can be loaded together, specifically (un-nested) records with different fields.
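To make "union schema" concrete, here is a rough Python sketch of the idea for flat records. This is not the code from the patch. The function names (`union_record_schemas`, `make_nullable`), the `MergedRecord` name, and the example schemas are hypothetical. Making missing fields nullable and failing on conflicting types are assumptions that follow the behavior Markus asked for, not necessarily what the patch does.

```python
import json


def make_nullable(avro_type):
    """Return a type that also accepts null, with "null" as the first branch."""
    if avro_type == "null":
        return "null"
    if isinstance(avro_type, list):
        # Already a union: Avro unions cannot nest, so keep it flat and
        # put "null" first so that a null default is valid.
        return ["null"] + [t for t in avro_type if t != "null"]
    return ["null", avro_type]


def union_record_schemas(schemas, name="MergedRecord"):
    """Merge flat Avro record schemas (as dicts) into one record schema."""
    merged = {}   # field name -> type, in first-seen order
    seen_in = {}  # field name -> number of input schemas declaring it
    for schema in schemas:
        if schema.get("type") != "record":
            raise ValueError("this sketch only handles record schemas")
        for field in schema["fields"]:
            fname, ftype = field["name"], field["type"]
            if fname in merged and merged[fname] != ftype:
                # Assumption: conflicting types are an error, not widened.
                raise ValueError(f"conflicting types for field {fname!r}")
            merged.setdefault(fname, ftype)
            seen_in[fname] = seen_in.get(fname, 0) + 1

    fields = []
    for fname, ftype in merged.items():
        if seen_in[fname] < len(schemas):
            # Missing from at least one input: nullable, defaulting to null.
            fields.append(
                {"name": fname, "type": make_nullable(ftype), "default": None}
            )
        else:
            fields.append({"name": fname, "type": ftype})
    return {"type": "record", "name": name, "fields": fields}


# Hypothetical example: the newer file adds a "country" field.
old_schema = {
    "type": "record",
    "name": "Event",
    "fields": [{"name": "id", "type": "long"}],
}
new_schema = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "country", "type": "string"},
    ],
}

merged_schema = union_record_schemas([old_schema, new_schema])
print(json.dumps(merged_schema, indent=2))
```

With these two inputs, `id` stays a plain `long` and `country` becomes a nullable string, so records from the older file would carry null in that column.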
https://issues.apache.org/jira/browse/PIG-2579

Best,
stan

On Wed, Mar 21, 2012 at 8:31 PM, Jonathan Coveney <[email protected]> wrote:

> A question about this: does Avro have clear cut rules for how to
> essentially merge two arbitrary JSON schemas?
>
> 2012/3/21 Jonathan Coveney <[email protected]>
>
>> ATM, there is no quick and easy solution short of patching Pig... feel
>> free to make a ticket.
>>
>> Short of that, what you can do is load each relation with a different
>> schema separately, and then do a union of it. Given that there might be a
>> lot of different relations and schemas involved, you could probably make a
>> script to do this for you... but yeah, the long term approach is to patch
>> AvroStorage.
>>
>>
>> 2012/3/21 Markus Resch <[email protected]>
>>
>>> Hi guys,
>>>
>>> Thanks again for your awesome hint about sqoop.
>>>
>>> I have another question: The data I'm working with is stored as AVRO
>>> Files in the Hadoop. When I try to glob them everything works just
>>> perfectly. But. When I add something to the schema of a single data file
>>> while the others remain, everything gets wrecked:
>>>
>>> "currently we assume all avro files under the same "location"
>>> * share the same schema and will throw exception if not."
>>>
>>> (e.g. I add a new data field) Expected behavior for me would be: If I'm
>>> globbing several files with slightly different schema the result of the
>>> LOAD would be either return an intersection of all valid fields that are
>>> common to both schemes or the atoms of the missing fields are nulled.
>>>
>>> How could I handle this properly?
>>>
>>> Thanks
>>>
>>> Markus
