Martin,

Thanks very much! Setting the map to expect an AvroKey<Object> and using instanceof works nicely.
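In case it helps anyone else, the mapper ends up looking roughly like this. FirstRecord and SecondRecord are just stand-ins for whatever classes the Avro compiler generated from the two schemas, and the output types are only an example:

    import java.io.IOException;

    import org.apache.avro.mapred.AvroKey;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Rough sketch: the input key schema is the union, so each datum is
    // deserialized into one of the generated record classes.
    public class DisjointSchemaMapper
        extends Mapper<AvroKey<Object>, NullWritable, Text, NullWritable> {

      @Override
      protected void map(AvroKey<Object> key, NullWritable value, Context context)
          throws IOException, InterruptedException {
        Object datum = key.datum();
        if (datum instanceof FirstRecord) {
          FirstRecord first = (FirstRecord) datum;
          // ... handle records written with the first schema ...
        } else if (datum instanceof SecondRecord) {
          SecondRecord second = (SecondRecord) datum;
          // ... handle records written with the second schema ...
        }
      }
    }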
James

From: Martin Kleppmann [mailto:[email protected]]
Sent: Wednesday, May 14, 2014 4:48 AM
To: [email protected]
Subject: Re: Reading from disjoint schemas in map

Hi James,

If you're using code generation to create Java classes for the Avro schemas, you should be able to just use Java's instanceof. If you're using GenericRecord, you can use GenericRecord.getSchema() to determine the type of a particular record.

Hope that helps,
Martin

On 13 May 2014, at 21:03, James Campbell <[email protected]> wrote:

I'm trying to read data into a MapReduce job where the data may have been created with one of a few different schemas, none of which are evolutions of one another (though they are related).

I have seen several people suggest using a union schema, so that during job setup one would set the input schema to be the union:

    ArrayList<Schema> schemas = new ArrayList<Schema>();
    schemas.add(schema1);
    ...
    Schema unionSchema = Schema.createUnion(schemas);
    AvroJob.setInputKeySchema(job, unionSchema);

However, I don't know how to then extract the correct type inside my mapper (which is apparently trivial; sorry, I'm new to Avro). I'd guess that the map function signature becomes

    map(AvroKey<GenericRecord> key, NullWritable value, ...)

but how can I then cause Avro to read the correctly-typed data from the GenericRecord?

Thanks!
James
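For reference, the GenericRecord route Martin mentions can dispatch on each record's own schema via getSchema(). A rough sketch, in which the schema names and the field name are only placeholders:

    import java.io.IOException;

    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.mapred.AvroKey;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class GenericDisjointMapper
        extends Mapper<AvroKey<GenericRecord>, NullWritable, Text, NullWritable> {

      @Override
      protected void map(AvroKey<GenericRecord> key, NullWritable value, Context context)
          throws IOException, InterruptedException {
        GenericRecord record = key.datum();
        // Each record carries its schema, so dispatch on the schema's full name.
        // "com.example.First" / "com.example.Second" are placeholder names.
        String name = record.getSchema().getFullName();
        if (name.equals("com.example.First")) {
          Object someField = record.get("someField"); // placeholder field name
          // ... handle records written with the first schema ...
        } else if (name.equals("com.example.Second")) {
          // ... handle records written with the second schema ...
        }
      }
    }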
