Hi, this sounds interesting! What datatype would the input to my mapper have? Or, put differently: how would I distinguish between the different inputs in the mapper?
Thanks,
Markus

On May 11, 2011, at 3:00 PM, Jacob R Rideout wrote:

> We do take the union schema approach, but create the unions
> programmatically in Java:
>
> Something like:
>
> ArrayList<Schema> schemas = new ArrayList<Schema>();
> schemas.add(schema1);
> schemas.add(schema2);
> Schema unionSchema = Schema.createUnion(schemas);
> AvroJob.setInputSchema(job, unionSchema);
>
>
> On Wed, May 11, 2011 at 12:44 PM, Markus Weimer <[email protected]> wrote:
>> Hi,
>>
>> I'd like to write a mapreduce job that uses Avro throughout, but the map
>> phase would need to read files with two different schemas, similar to what
>> the MultipleInputFormat does in stock Hadoop. Is this a supported use case?
>>
>> A work-around would be to create a union schema that has both fields as
>> optional and to convert all data into it, but that seems clumsy.
>>
>> Has anyone done this before?
>>
>> Thanks for any suggestion you can give,
>>
>> Markus
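For reference, here is a rough sketch of how the union-schema approach might look end to end with the old org.apache.avro.mapred API and the generic API. With a union input schema, each datum handed to the mapper is whichever branch the record was written with; for record branches that is a GenericRecord, so the full name of its schema tells the inputs apart. The record names "example.Click" and "example.Impression", the .avsc file names, and the map-only pass-through job are placeholders for illustration, not anything confirmed in this thread.

import java.io.File;
import java.io.IOException;
import java.util.Arrays;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroMapper;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Reporter;

public class UnionInputJob {

  // Mapper over a union of two record schemas: inspect the datum's schema
  // to decide which input it came from.
  public static class UnionInputMapper
      extends AvroMapper<GenericRecord, GenericRecord> {
    @Override
    public void map(GenericRecord datum,
                    AvroCollector<GenericRecord> collector,
                    Reporter reporter) throws IOException {
      String name = datum.getSchema().getFullName();
      if ("example.Click".equals(name)) {             // placeholder record name
        // handle records written with the first schema
      } else if ("example.Impression".equals(name)) { // placeholder record name
        // handle records written with the second schema
      }
      collector.collect(datum); // pass every record through unchanged
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf job = new JobConf(UnionInputJob.class);
    job.setJobName("avro-union-input-sketch");

    // Placeholder schema files for the two inputs.
    Schema click = new Schema.Parser().parse(new File("click.avsc"));
    Schema impression = new Schema.Parser().parse(new File("impression.avsc"));

    // The union-schema approach from the reply above.
    Schema union = Schema.createUnion(Arrays.asList(click, impression));
    AvroJob.setInputSchema(job, union);
    AvroJob.setOutputSchema(job, union);
    AvroJob.setMapperClass(job, UnionInputMapper.class);
    job.setNumReduceTasks(0); // map-only pass-through for this sketch

    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    JobClient.runJob(job);
  }
}

If the inputs were read with the specific (code-generated) API instead of the generic one, the union branches would arrive as the generated classes, and an instanceof check would serve the same purpose as comparing schema names.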
