AFAIK, by default AvroStorage enforces that all input files have exactly the same schema. I've submitted a patch to improve this somewhat by allowing different input schemas so long as a union schema can be derived. For example, if schema 1 contains a field 'foo' which is not in schema 2, and schema 2 contains a field 'bar' which is not in schema 1, then the resulting schema will have both fields. (The patch is here: https://issues.apache.org/jira/browse/PIG-2579.)
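To make the union idea concrete, here is a rough Python sketch. It is not the code from the patch, only an illustration: the function name merge_record_schemas and the sample "Event" schemas are made up, schemas are plain dicts in Avro's JSON form, and raising an error on conflicting field types is my own assumption about how to handle that case.

```python
def merge_record_schemas(a, b):
    """Return a record schema holding every field of a and b, matched by name.

    Hypothetical helper for illustration; not AvroStorage's actual logic.
    """
    # Start with the fields of the first schema, keyed by field name.
    merged = {field["name"]: field for field in a["fields"]}
    for field in b["fields"]:
        existing = merged.get(field["name"])
        if existing is None:
            # Field only exists in the second schema: add it.
            merged[field["name"]] = field
        elif existing["type"] != field["type"]:
            # Assumption: same name with different types is treated as an error.
            raise ValueError("conflicting types for field %r" % field["name"])
    return {"type": "record", "name": a["name"], "fields": list(merged.values())}


schema1 = {"type": "record", "name": "Event", "fields": [
    {"name": "id", "type": "long"},
    {"name": "foo", "type": "string"},
]}
schema2 = {"type": "record", "name": "Event", "fields": [
    {"name": "id", "type": "long"},
    {"name": "bar", "type": "int"},
]}

merged = merge_record_schemas(schema1, schema2)
print([field["name"] for field in merged["fields"]])  # ['id', 'foo', 'bar']
```

A real implementation would also have to decide what a record lacking 'foo' or 'bar' should contain, for instance by making the added fields nullable; the sketch leaves that out.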
In your case, you seem to have schemas that differ only in fields which are never used inside the Pig script. That's an entirely new use case, AFAIK. The union schema is one workaround. However, it might be better to let the user specify these unused fields and exclude them from validation, or perhaps to run validation only against those fields which are referenced in the Pig script.

Best,
stan

On Thu, Apr 5, 2012 at 8:58 AM, Philipp <[email protected]> wrote:
> Hi list,
>
> if I run pig over several avro files with different schemas I get a schema
> mismatch message, even if the schema has only changed marginally in a field
> that I'm not even using in that particular pig job.
> I'm wondering if it would be possible to resolve the mismatch, eg. as
> suggested in:
> https://avro.apache.org/docs/current/spec.html#Schema+Resolution
>
> Regards, Philipp
