Supporting schema migration is a badly needed feature in AvroStorage. I'm not able to add it in the near future. Anyone else interested?
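
For anyone picking this up, here is a rough sketch of the "missing fields are nulled" behavior requested in the thread below. It is plain Python over schemas written as dicts, not AvroStorage or Avro library code. The names unify_schemas, make_nullable and project are made up for illustration, and it assumes that fields sharing a name also share a type. Real Avro schema resolution has more rules (type promotion, aliases, declared defaults) that this ignores.

```python
def make_nullable(avro_type):
    """Return a union type with "null" as its first branch, so a null default is valid."""
    if isinstance(avro_type, list):
        return ["null"] + [t for t in avro_type if t != "null"]
    return ["null", avro_type]


def unify_schemas(schemas, name="Unified"):
    """Union the fields of several record schemas, keeping first-seen order.

    Assumption: a field name always maps to the same type across schemas.
    Fields missing from at least one schema become nullable with default None.
    """
    seen = {}  # field name -> [type, number of schemas containing the field]
    for schema in schemas:
        for field in schema["fields"]:
            entry = seen.setdefault(field["name"], [field["type"], 0])
            entry[1] += 1

    fields = []
    for fname, (ftype, count) in seen.items():
        if count == len(schemas):
            fields.append({"name": fname, "type": ftype})
        else:
            fields.append({"name": fname, "type": make_nullable(ftype), "default": None})
    return {"type": "record", "name": name, "fields": fields}


def project(record, schema):
    """View one record through the unified schema, filling missing fields with defaults."""
    return {f["name"]: record.get(f["name"], f.get("default")) for f in schema["fields"]}


# Hypothetical example: the newer files gained a "referrer" field.
old_schema = {"type": "record", "name": "Event", "fields": [
    {"name": "id", "type": "long"},
    {"name": "url", "type": "string"},
]}
new_schema = {"type": "record", "name": "Event", "fields": [
    {"name": "id", "type": "long"},
    {"name": "url", "type": "string"},
    {"name": "referrer", "type": "string"},
]}

unified = unify_schemas([old_schema, new_schema])
rows = [
    project({"id": 1, "url": "/a"}, unified),                    # record from an old file
    project({"id": 2, "url": "/b", "referrer": "/a"}, unified),  # record from a new file
]
```

The other option Markus mentions, keeping only the intersection of fields, would amount to dropping every field whose count is below len(schemas) instead of making it nullable.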

On Tue, Mar 20, 2012 at 2:08 PM, Scott Carey <[email protected]> wrote:

> I'm assuming you are using Pig's AvroStorage function. It appears that it
> does not support schema migration, but it certainly could do so. A
> collection of avro files can be 'viewed' as if they all are of one schema
> provided they can all resolve to it. I have several tools that do this
> successfully with MapReduce/Pig/Hive.
>
> The Pig AvroStorage tool is maintained by the Apache Pig project, you will
> need to inquire there in order to get more details.
>
> -Scott
>
> On 3/20/12 2:27 AM, "Markus Resch" <[email protected]> wrote:
>
> >Hi guys,
> >
> >Thanks again for your awesome hint about sqoop.
> >
> >I have another question: The Data I'm working with is stored as AVRO
> >Files in the Hadoop. When I try to glob them everything works just
> >perfectly. But. When I add the schema of a single data file while the
> >others remain everything gets wrecked:
> >
> >"currently we assume all avro files under the same "location"
> > * share the same schema and will throw exception if not."
> >
> >(e.g. I add a new data field) Expected behavior for me would be: If I'm
> >globbing several files with slightly different schema the result of the
> >LOAD would be either return an intersection of all valid fields that are
> >common to both schemes or the atoms of the missing fields are nulled.
> >
> >How could I handle this properly?
> >
> >Thanks
> >
> >Markus

--
Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
