OK, so you mean I can just use the newer schema to read data written with the old schema as well, with a default value populated for the missing field? I think that should work, messy code though!
Thanks!

On Wed, Mar 28, 2012 at 3:53 PM, Bill Graham <[email protected]> wrote:

> If you evolved your schema to just add fields, then you should be able to
> use a single schema descriptor file to read both pre- and post-evolved data
> objects. This is because one of the rules of new fields in Avro is that
> they have to have a default value and be non-null. AvroStorage should pick
> that default field up for the old objects. If it doesn't, then that's a
> bug.
>
> On Wed, Mar 28, 2012 at 3:26 PM, IGZ Nick <[email protected]> wrote:
>
> > @Bill,
> > I did look at the option of providing input as a parameter while
> > initializing AvroStorage(). But even then, I'll still need to change my
> > script to handle the two files because I'll still need to have separate
> > schemas right?
> >
> > @Stan,
> > Thanks for pointing me to it, it is a useful feature. But in my case, I
> > would never have two input files with different schemas. The input will
> > always have only one of the schemas, but I want my new script (with the
> > additional column) to be able to process the old data as well, even if the
> > input only contains data with the older schema.
> >
> > On Wed, Mar 28, 2012 at 3:00 PM, Stan Rosenberg <[email protected]> wrote:
> >
> > > There is a patch for Avro to deal with this use case:
> > > https://issues.apache.org/jira/browse/PIG-2579
> > > (See the attached pig example which loads two avro input files with
> > > different schemas.)
> > >
> > > Best,
> > >
> > > stan
> > >
> > > On Wed, Mar 28, 2012 at 4:22 PM, IGZ Nick <[email protected]> wrote:
> > > > Hi guys,
> > > >
> > > > I use Pig to process some clickstream data. I need to track a new field, so
> > > > I added a new field to my avro schema, and changed my Pig script
> > > > accordingly. It works fine with the new files (which have that new column)
> > > > but it breaks when I run it on my old files which do not have that column
> > > > in the schema (since avro stores schema in the data files itself). I was
> > > > expecting that Pig will assume the field to be null if that particular
> > > > field does not exist. But now I am having to maintain separate scripts to
> > > > process the old and new files. Is there any workaround this? Because I
> > > > figure I'll have to add new column frequently and I don't want to maintain
> > > > a separate script for each window where the schema is constant.
> > > >
> > > > Thanks,
>
> --
> *Note that I'm no longer using my Yahoo! email address. Please email me at
> [email protected] going forward.*
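
For anyone reading this thread later, the idea discussed here (read old records with the newer schema and let the missing field take its declared default) can be sketched roughly as below. This is a toy illustration in plain Python, not the Avro library and not AvroStorage; the record name `Click`, the field names (`user_id`, `url`, `referrer`) and the `resolve` helper are all made up for the example. The schema dict only mirrors the shape of an Avro record schema in which the added field carries a `default`.

```python
# Toy sketch of the "fill in the default for a missing field" rule.
# NOT the Avro library or AvroStorage; all names here are hypothetical.

# Reader (newer) schema: "referrer" was added later and declares a default.
READER_SCHEMA = {
    "type": "record",
    "name": "Click",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "url", "type": "string"},
        {"name": "referrer", "type": ["null", "string"], "default": None},
    ],
}


def resolve(record, reader_schema):
    """Project a decoded record onto the reader schema, filling defaults."""
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            # The writer had this field: keep its value.
            resolved[name] = record[name]
        elif "default" in field:
            # The writer predates this field: use the reader's default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return resolved


# A record written before the field existed, and one written after.
old_record = {"user_id": "u1", "url": "/home"}
new_record = {"user_id": "u2", "url": "/cart", "referrer": "/home"}

old_resolved = resolve(old_record, READER_SCHEMA)
new_resolved = resolve(new_record, READER_SCHEMA)
```

With a single reader schema like this, both old and new records come out with the same set of fields, so one script could process either kind of file. Whether AvroStorage actually applies the default this way for a given Pig version is something to verify against your own data.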
