Re: Schema resolution failure when the writer's schema is a primitive type and the reader's schema is a union

Scott Carey Fri, 31 Aug 2012 09:23:32 -0700

Yes, please file a bug in JIRA.  It will get more attention there.

On 8/30/12 11:06 PM, "Alexandre Normand" wrote:


> That's one of the things I've tried already. I've reversed the order to
> ["int", "null"] but I get the same result.
> 
> Should I file a bug in Jira?
> 
> -- 
> Alex
> 
> 
> On Thursday, August 30, 2012 at 11:01 PM, Scott Carey wrote:
>> 
>> My understanding of the spec is that promotion to a union should work as
>> long as the prior type is a member of the union.
>> 
>> What happens if the order of the union in the reader schema is reversed?
>> 
>> This may be a bug.
>> 
>> -Scott
>> 
>> On 8/16/12 5:59 PM, "Alexandre Normand" wrote:
>> 
>> 
>>> Hey, 
>>> I've been running into a case where I have a field of type int but
>>> need to allow null values. To do this, I now have a new schema that
>>> defines that field as a union of null and int, such as:
>>> type: [ "null", "int" ]
>>> According to my interpretation of the spec, Avro should resolve this
>>> correctly. For reference, the relevant rule reads as follows (from
>>> http://avro.apache.org/docs/current/spec.html#Schema+Resolution):
>>> 
>>> * if reader's is a union, but writer's is not
>>>   The first schema in the reader's union that matches the writer's
>>>   schema is recursively resolved against it. If none match, an error
>>>   is signaled.
>>> 
>>> 
>>> However, when I try this, I get the following exception:
>>> org.apache.avro.AvroTypeException: Attempt to process a int when a union was expected.
>>> 
>>> I've written a simple test that illustrates what I'm saying:
>>> @Test
>>> public void testReadingUnionFromValueWrittenAsPrimitive() throws Exception {
>>>     // Writer's schema: the field "test" is a plain int.
>>>     Schema writerSchema = new Schema.Parser().parse("{\n" +
>>>         " \"type\":\"record\",\n" +
>>>         " \"name\":\"NeighborComparisons\",\n" +
>>>         " \"fields\": [\n" +
>>>         " {\"name\": \"test\",\n" +
>>>         " \"type\": \"int\" }]} ");
>>>     // Reader's schema: the same field is now a union of null and int.
>>>     Schema readersSchema = new Schema.Parser().parse(" {\n" +
>>>         " \"type\":\"record\",\n" +
>>>         " \"name\":\"NeighborComparisons\",\n" +
>>>         " \"fields\": [ {\n" +
>>>         " \"name\": \"test\",\n" +
>>>         " \"type\": [\"null\", \"int\"],\n" +
>>>         " \"default\": null } ] }");
>>>     GenericData.Record record = new GenericData.Record(writerSchema);
>>>     record.put("test", Integer.valueOf(10));
>>>
>>>     // Serialize the record as JSON using the writer's schema.
>>>     ByteArrayOutputStream output = new ByteArrayOutputStream();
>>>     JsonEncoder jsonEncoder =
>>>         EncoderFactory.get().jsonEncoder(writerSchema, output);
>>>     GenericDatumWriter<GenericData.Record> writer =
>>>         new GenericDatumWriter<GenericData.Record>(writerSchema);
>>>     writer.write(record, jsonEncoder);
>>>     jsonEncoder.flush();
>>>     output.flush();
>>>
>>>     System.out.println(output.toString());
>>>
>>>     // Read it back with a decoder built from the reader's schema and a
>>>     // datum reader given both the writer's and the reader's schema.
>>>     JsonDecoder jsonDecoder =
>>>         DecoderFactory.get().jsonDecoder(readersSchema, output.toString());
>>>     GenericDatumReader<GenericData.Record> reader =
>>>         new GenericDatumReader<GenericData.Record>(writerSchema, readersSchema);
>>>     GenericData.Record read = reader.read(null, jsonDecoder);
>>>     assertEquals(10, read.get("test"));
>>> }
>>> 
>>> Am I misunderstanding how Avro should handle this case of schema
>>> resolution, or is the problem in the implementation?
>>> 
>>> Cheers!
>>> 
>>> -- 
>>> Alex
>
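
For reference, the resolution rule quoted in the original message (reader's schema is a union, writer's is not) can be sketched in a few lines of plain Java. This is a simplified model restricted to primitive type names, written only to illustrate the spec text; it is not Avro's implementation and does not reproduce the exception discussed in the thread. The class and method names (UnionResolutionSketch, matches, firstMatchingBranch) are hypothetical, and the promotion table assumes the int/long/float promotions listed in the spec.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model of one schema-resolution rule; primitive type names only.
// This is NOT Avro's implementation, just an illustration of the spec text.
class UnionResolutionSketch {

    // Promotions assumed from the spec: writer's primitive -> allowed reader primitives.
    private static final Map<String, Set<String>> PROMOTIONS = Map.of(
        "int", Set.of("long", "float", "double"),
        "long", Set.of("float", "double"),
        "float", Set.of("double"));

    // A writer primitive matches a reader primitive if they are the same type
    // or the writer's type is promotable to the reader's.
    static boolean matches(String writerType, String readerType) {
        return writerType.equals(readerType)
            || PROMOTIONS.getOrDefault(writerType, Set.of()).contains(readerType);
    }

    // Returns the index of the first branch of the reader's union that matches
    // the writer's (non-union) type; signals an error if none match.
    static int firstMatchingBranch(String writerType, List<String> readerUnion) {
        for (int i = 0; i < readerUnion.size(); i++) {
            if (matches(writerType, readerUnion.get(i))) {
                return i;
            }
        }
        throw new IllegalArgumentException(
            "No branch of " + readerUnion + " matches writer type " + writerType);
    }

    public static void main(String[] args) {
        // Writer wrote an int; the reader expects ["null", "int"].
        System.out.println(firstMatchingBranch("int", List.of("null", "int"))); // prints 1
        // Reversing the union order changes only the selected index.
        System.out.println(firstMatchingBranch("int", List.of("int", "null"))); // prints 0
    }
}
```

Under this reading of the rule, a writer's int resolves against either ordering of the union, which is the behavior both messages in the thread expect.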
