Hey,
I've been running into this case where I have a field of type int but I need to
allow for null values. To do this, I now have a new schema that defines that
field as a union of null and int such as:
type: [ "null", "int" ]
According to my interpretation of the spec, Avro should resolve this correctly.
For reference, the relevant passage (from
http://avro.apache.org/docs/current/spec.html#Schema+Resolution) reads:
> if reader's is a union, but writer's is not
> The first schema in the reader's union that matches the writer's schema is
> recursively resolved against it. If none match, an error is signaled.
>
However, when I try this, I get:
org.apache.avro.AvroTypeException: Attempt to process a int when a union was
expected.
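From poking around, I suspect the JsonDecoder doesn't apply schema resolution at all and instead expects the JSON to already match the reader's schema, where a non-null union value is tagged with its branch name (i.e. {"test": {"int": 10}} rather than {"test": 10}). A sketch of what does decode cleanly (class and method names are mine, for illustration only):

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.JsonDecoder;

public class UnionJsonShape {
    // Decode JSON that is already in the reader's (union) shape,
    // with the non-null value wrapped in its branch name.
    static Object decodeUnionField() throws Exception {
        Schema readersSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"NeighborComparisons\"," +
            "\"fields\":[{\"name\":\"test\",\"type\":[\"null\",\"int\"]," +
            "\"default\":null}]}");
        // The JSON encoding tags a non-null union value with its branch name:
        String json = "{\"test\": {\"int\": 10}}";
        JsonDecoder jsonDecoder =
            DecoderFactory.get().jsonDecoder(readersSchema, json);
        GenericDatumReader<GenericData.Record> reader =
            new GenericDatumReader<GenericData.Record>(readersSchema);
        return reader.read(null, jsonDecoder).get("test");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(decodeUnionField()); // prints 10
    }
}
```

So the decoder chokes on the plain {"test": 10} my writer's schema produces, even though the reader's schema was supposed to resolve it.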
I've written a simple test that illustrates the problem:
@Test
public void testReadingUnionFromValueWrittenAsPrimitive() throws Exception {
  Schema writerSchema = new Schema.Parser().parse("{\n" +
      "  \"type\":\"record\",\n" +
      "  \"name\":\"NeighborComparisons\",\n" +
      "  \"fields\": [\n" +
      "    {\"name\": \"test\",\n" +
      "     \"type\": \"int\" }]}");
  Schema readersSchema = new Schema.Parser().parse("{\n" +
      "  \"type\":\"record\",\n" +
      "  \"name\":\"NeighborComparisons\",\n" +
      "  \"fields\": [ {\n" +
      "    \"name\": \"test\",\n" +
      "    \"type\": [\"null\", \"int\"],\n" +
      "    \"default\": null } ] }");

  GenericData.Record record = new GenericData.Record(writerSchema);
  record.put("test", Integer.valueOf(10));

  ByteArrayOutputStream output = new ByteArrayOutputStream();
  JsonEncoder jsonEncoder =
      EncoderFactory.get().jsonEncoder(writerSchema, output);
  GenericDatumWriter<GenericData.Record> writer =
      new GenericDatumWriter<GenericData.Record>(writerSchema);
  writer.write(record, jsonEncoder);
  jsonEncoder.flush();
  output.flush();
  System.out.println(output.toString());

  JsonDecoder jsonDecoder =
      DecoderFactory.get().jsonDecoder(readersSchema, output.toString());
  GenericDatumReader<GenericData.Record> reader =
      new GenericDatumReader<GenericData.Record>(writerSchema, readersSchema);
  GenericData.Record read = reader.read(null, jsonDecoder);
  assertEquals(10, read.get("test"));
}
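Interestingly, if I swap the JSON encoder/decoder for the binary ones and leave everything else the same, the resolution seems to work exactly as the spec describes. A standalone sketch of that variant (class and method names are mine):

```java
import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class BinaryUnionResolution {
    // Write an int with the writer's schema, then read it back with a
    // reader's schema whose field is ["null", "int"], via binary encoding.
    static Object roundTrip() throws Exception {
        Schema writerSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"NeighborComparisons\"," +
            "\"fields\":[{\"name\":\"test\",\"type\":\"int\"}]}");
        Schema readersSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"NeighborComparisons\"," +
            "\"fields\":[{\"name\":\"test\",\"type\":[\"null\",\"int\"]," +
            "\"default\":null}]}");

        GenericData.Record record = new GenericData.Record(writerSchema);
        record.put("test", Integer.valueOf(10));

        // Write with the writer's schema using the binary encoding
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(output, null);
        GenericDatumWriter<GenericData.Record> writer =
            new GenericDatumWriter<GenericData.Record>(writerSchema);
        writer.write(record, encoder);
        encoder.flush();

        // Read back, resolving the writer's int against the reader's union
        BinaryDecoder decoder =
            DecoderFactory.get().binaryDecoder(output.toByteArray(), null);
        GenericDatumReader<GenericData.Record> reader =
            new GenericDatumReader<GenericData.Record>(writerSchema, readersSchema);
        return reader.read(null, decoder).get("test");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip()); // prints 10
    }
}
```

That makes me think the problem is specific to the JSON decoder rather than to schema resolution in general.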
Am I misunderstanding how Avro should handle this case of schema resolution,
or is the problem in the implementation?
Cheers!
--
Alex