We do take the union schema approach, but create the unions programmatically in Java.
Something like:

    ArrayList<Schema> schemas = new ArrayList<Schema>();
    schemas.add(schema1);
    schemas.add(schema2);
    Schema unionSchema = Schema.createUnion(schemas);
    AvroJob.setInputSchema(job, unionSchema);

On Wed, May 11, 2011 at 12:44 PM, Markus Weimer <[email protected]> wrote:
> Hi,
>
> I'd like to write a mapreduce job that uses avro throughout, but the map
> phase would need to read files with two different schemas, similar to what
> the MultipleInputFormat does in stock hadoop. Is this a supported use case?
>
> A work-around would be to create a union schema that has both fields as
> optional and to convert all data into it, but that seems clumsy.
>
> Has anyone done this before?
>
> Thanks for any suggestion you can give,
>
> Markus
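In case it's useful, here is a slightly fuller (untested) sketch of the same idea against the old mapred API. The two inline record schemas and the class and job names are just placeholders; only Schema.createUnion and AvroJob.setInputSchema come from the snippet above:

    import java.util.Arrays;

    import org.apache.avro.Schema;
    import org.apache.avro.mapred.AvroJob;
    import org.apache.hadoop.mapred.JobConf;

    public class UnionInputJob {
      public static void main(String[] args) throws Exception {
        JobConf job = new JobConf(UnionInputJob.class);

        // Placeholder record schemas; in practice you'd parse your real
        // .avsc files or use the schemas of your generated record classes.
        Schema schema1 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"A\",\"fields\":"
            + "[{\"name\":\"x\",\"type\":\"int\"}]}");
        Schema schema2 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"B\",\"fields\":"
            + "[{\"name\":\"y\",\"type\":\"string\"}]}");

        // A union of the two record schemas lets one job read input files
        // written with either schema; the mapper then branches on the
        // concrete record type it receives.
        Schema unionSchema = Schema.createUnion(Arrays.asList(schema1, schema2));
        AvroJob.setInputSchema(job, unionSchema);

        // ... set the mapper, output schema, and input/output paths,
        // then submit the job ...
      }
    }

With a generic-record mapper, each input datum arrives as one branch of the union, so you can check record.getSchema().getName() to tell the two record types apart.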
