Re: Working with changing schemas (avro) in Pig

IGZ Nick Wed, 28 Mar 2012 15:27:09 -0700

@Bill,
I did look at the option of providing input as a parameter when
initializing AvroStorage(). But even then, I'd still need to change my
script to handle the two files, because I'd still need to have separate
schemas, right?


@Stan,
Thanks for pointing me to it, it is a useful feature. But in my case, I
would never have two input files with different schemas. The input will
always have only one of the schemas, but I want my new script (with the
additional column) to be able to process the old data as well, even if the
input only contains data with the older schema.
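
For illustration, the behaviour described above (an older record that lacks
the new field is read as if that field were null) can be sketched as
follows. This is a plain-Python sketch of the idea behind default values in
an Avro reader schema, not Pig or AvroStorage code, and it makes no claim
about what AvroStorage supports. The field names and the helper
resolve_record are hypothetical and chosen only for the example.

```python
# Hypothetical reader schema: the new script expects an extra "referrer" field.
# The field names are illustrative only and do not come from the thread.
READER_SCHEMA = {
    "type": "record",
    "name": "Click",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "url", "type": "string"},
        # New column: a union with null plus a null default, so that
        # older records which lack the field can still be read.
        {"name": "referrer", "type": ["null", "string"], "default": None},
    ],
}

# Sentinel to tell "field absent" apart from "field present but None".
_MISSING = object()


def resolve_record(record, reader_schema):
    """Project a record onto the reader schema, filling in declared defaults."""
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        value = record.get(name, _MISSING)
        if value is _MISSING:
            if "default" not in field:
                raise ValueError(f"field {name!r} is missing and has no default")
            value = field["default"]
        resolved[name] = value
    return resolved


# A record written before the column existed, and one written after.
old_record = {"user_id": "u1", "url": "/home"}
new_record = {"user_id": "u2", "url": "/cart", "referrer": "/home"}

old_resolved = resolve_record(old_record, READER_SCHEMA)
new_resolved = resolve_record(new_record, READER_SCHEMA)
```

With a single reader schema of this shape, the same downstream logic can
handle both kinds of record, since the missing column simply arrives as
null for the older data.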

On Wed, Mar 28, 2012 at 3:00 PM, Stan Rosenberg <[email protected]> wrote:

> There is a patch for Avro to deal with this use case:
> https://issues.apache.org/jira/browse/PIG-2579
> (See the attached pig example which loads two avro input files with
> different schemas.)
>
> Best,
>
> stan
>
> On Wed, Mar 28, 2012 at 4:22 PM, IGZ Nick <[email protected]> wrote:
> > Hi guys,
> >
> > I use Pig to process some clickstream data. I need to track a new field, so
> > I added a new field to my avro schema, and changed my Pig script
> > accordingly. It works fine with the new files (which have that new column)
> > but it breaks when I run it on my old files, which do not have that column
> > in the schema (since avro stores the schema in the data files themselves).
> > I was expecting that Pig would assume the field to be null if that
> > particular field does not exist. But now I am having to maintain separate
> > scripts to process the old and new files. Is there any workaround for this?
> > I figure I'll have to add new columns frequently, and I don't want to
> > maintain a separate script for each window where the schema is constant.
> >
> > Thanks,
>
