We do take the union schema approach, but create the unions programmatically in Java.
Something like:

    ArrayList<Schema> schemas = new ArrayList<Schema>();
    schemas.add(schema1);
    schemas.add(schema2);
    Schema unionSchema = Schema.createUnion(schemas);
    AvroJob.setInputSchema(job, unionSchema);

On Wed, May 11, 2011 at 12:44 PM, Markus Weimer <[email protected]> wrote:
> Hi,
>
> I'd like to write a mapreduce job that uses avro throughout, but the map
> phase would need to read files with two different schemas, similar to what
> the MultipleInputFormat does in stock hadoop. Is this a supported use case?
>
> A work-around would be to create a union schema that has both fields as
> optional and to convert all data into it, but that seems clumsy.
>
> Has anyone done this before?
>
> Thanks for any suggestion you can give,
>
> Markus
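In case it's useful, here is a slightly fuller (untested) sketch of the same idea against the old mapred API. The two inline record schemas and the class and job names are just placeholders; only Schema.createUnion and AvroJob.setInputSchema come from the snippet above:

    import java.util.Arrays;

    import org.apache.avro.Schema;
    import org.apache.avro.mapred.AvroJob;
    import org.apache.hadoop.mapred.JobConf;

    public class UnionInputJob {
      public static void main(String[] args) throws Exception {
        JobConf job = new JobConf(UnionInputJob.class);

        // Placeholder record schemas; in practice you'd parse your real
        // .avsc files or use the schemas of your generated record classes.
        Schema schema1 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"A\",\"fields\":"
            + "[{\"name\":\"x\",\"type\":\"int\"}]}");
        Schema schema2 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"B\",\"fields\":"
            + "[{\"name\":\"y\",\"type\":\"string\"}]}");

        // A union of the two record schemas lets one job read input files
        // written with either schema; the mapper then branches on the
        // concrete record type it receives.
        Schema unionSchema = Schema.createUnion(Arrays.asList(schema1, schema2));
        AvroJob.setInputSchema(job, unionSchema);

        // ... set the mapper, output schema, and input/output paths,
        // then submit the job ...
      }
    }

With a generic-record mapper, each input datum arrives as one branch of the union, so you can check record.getSchema().getName() to tell the two record types apart.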
