There is a patch for Pig's Avro support that deals with this use case: https://issues.apache.org/jira/browse/PIG-2579 (See the Pig example attached to that issue, which loads two Avro input files with different schemas.)
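
For background, treating a missing field as null matches Avro's own schema-resolution rule: when the reader's schema has a field with a default value and the writer's schema lacks that field, the reader fills in the default. The sketch below only illustrates that rule in plain Python. It does not use the Avro library or the code from the patch, and the schema, the field names (`user_id`, `url`, `referrer`) and the `resolve` helper are all hypothetical.

```python
# Hypothetical reader schema: "referrer" is the newly added field.
# It is declared as a nullable union with a default of null (None).
READER_SCHEMA = {
    "type": "record",
    "name": "Click",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "url", "type": "string"},
        {"name": "referrer", "type": ["null", "string"], "default": None},
    ],
}


def resolve(record, reader_schema):
    """Project a decoded record onto the reader schema.

    Fields missing from the record take the reader's default; a missing
    field without a default is an error, mirroring Avro's resolution rule.
    """
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError("no value and no default for field %r" % name)
    return resolved


# A record written with the old schema (no "referrer") and one with the new.
old_record = {"user_id": "u1", "url": "/home"}
new_record = {"user_id": "u2", "url": "/cart", "referrer": "/home"}

resolved = [resolve(r, READER_SCHEMA) for r in (old_record, new_record)]
```

With a single reader schema like this, old and new files yield records of the same shape, so one script can process both.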

Best,
stan

On Wed, Mar 28, 2012 at 4:22 PM, IGZ Nick <[email protected]> wrote:
> Hi guys,
>
> I use Pig to process some clickstream data. I need to track a new field, so
> I added a new field to my avro schema, and changed my Pig script
> accordingly. It works fine with the new files (which have that new column)
> but it breaks when I run it on my old files which do not have that column
> in the schema (since avro stores schema in the data files itself). I was
> expecting that Pig will assume the field to be null if that particular
> field does not exist. But now I am having to maintain separate scripts to
> process the old and new files. Is there any workaround this? Because I
> figure I'll have to add new column frequently and I don't want to maintain
> a separate script for each window where the schema is constant.
>
> Thanks,
