Martin,

Thanks very much! Setting the map to expect an AvroKey<Object> and using instanceof works nicely.
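In case it helps anyone else, the mapper ends up looking roughly like this. FirstRecord and SecondRecord are just stand-ins for whatever classes the Avro compiler generated from the two schemas, and the output types are only an example:

    import java.io.IOException;

    import org.apache.avro.mapred.AvroKey;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Rough sketch: the input key schema is the union, so each datum is
    // deserialized into one of the generated record classes.
    public class DisjointSchemaMapper
        extends Mapper<AvroKey<Object>, NullWritable, Text, NullWritable> {

      @Override
      protected void map(AvroKey<Object> key, NullWritable value, Context context)
          throws IOException, InterruptedException {
        Object datum = key.datum();
        if (datum instanceof FirstRecord) {
          FirstRecord first = (FirstRecord) datum;
          // ... handle records written with the first schema ...
        } else if (datum instanceof SecondRecord) {
          SecondRecord second = (SecondRecord) datum;
          // ... handle records written with the second schema ...
        }
      }
    }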
James

From: Martin Kleppmann [mailto:[email protected]]
Sent: Wednesday, May 14, 2014 4:48 AM
To: [email protected]
Subject: Re: Reading from disjoint schemas in map

Hi James,

If you're using code generation to create Java classes for the Avro schemas, you should be able to just use Java's instanceof. If you're using GenericRecord, you can use GenericRecord.getSchema() to determine the type of a particular record.

Hope that helps,
Martin

On 13 May 2014, at 21:03, James Campbell <[email protected]> wrote:

I'm trying to read data into a MapReduce job where the data may have been created with one of a few different schemas, none of which are evolutions of one another (though they are related).

I have seen several people suggest using a union schema, so that during job setup one would set the input schema to be the union:

    ArrayList<Schema> schemas = new ArrayList<Schema>();
    schemas.add(schema1);
    ...
    Schema unionSchema = Schema.createUnion(schemas);
    AvroJob.setInputKeySchema(job, unionSchema);

However, I don't know how to then extract the correct type inside my mapper (which is apparently trivial; sorry, I'm new to Avro). I'd guess that the map function signature becomes

    map(AvroKey<GenericRecord> key, NullWritable value, ...)

but how can I then cause Avro to read the correctly-typed data from the GenericRecord?

Thanks!
James
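For reference, the GenericRecord route Martin mentions can dispatch on each record's own schema via getSchema(). A rough sketch, in which the schema names and the field name are only placeholders:

    import java.io.IOException;

    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.mapred.AvroKey;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class GenericDisjointMapper
        extends Mapper<AvroKey<GenericRecord>, NullWritable, Text, NullWritable> {

      @Override
      protected void map(AvroKey<GenericRecord> key, NullWritable value, Context context)
          throws IOException, InterruptedException {
        GenericRecord record = key.datum();
        // Each record carries its schema, so dispatch on the schema's full name.
        // "com.example.First" / "com.example.Second" are placeholder names.
        String name = record.getSchema().getFullName();
        if (name.equals("com.example.First")) {
          Object someField = record.get("someField"); // placeholder field name
          // ... handle records written with the first schema ...
        } else if (name.equals("com.example.Second")) {
          // ... handle records written with the second schema ...
        }
      }
    }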
