The error can be reproduced with the following simplified schemas. The only
difference between them is that the “avro.java.string” property is present in
the reader schema and absent from the writer schema:
Reader:
{
  "type": "record",
  "name": "Event",
  "namespace": "com.example.event.model",
  "fields": [
    {
      "name": "body",
      "type": {
        "type": "record",
        "name": "EventBody",
        "namespace": "com.example.event.model",
        "fields": [
          {
            "name": "optionalNestedObject",
            "type": [
              "null",
              {
                "type": "record",
                "name": "NestedObject",
                "fields": [
                  {
                    "name": "mandatoryString",
                    "type": {
                      "type": "string",
                      "avro.java.string": "String"
                    }
                  }
                ]
              }
            ],
            "default": null
          }
        ]
      }
    }
  ]
}
Writer:
{
  "type": "record",
  "name": "Event",
  "namespace": "com.example.event.model",
  "fields": [
    {
      "name": "body",
      "type": {
        "type": "record",
        "name": "EventBody",
        "namespace": "com.example.event.model",
        "fields": [
          {
            "name": "optionalNestedObject",
            "type": [
              "null",
              {
                "type": "record",
                "name": "NestedObject",
                "fields": [
                  {
                    "name": "mandatoryString",
                    "type": "string"
                  }
                ]
              }
            ],
            "default": null
          }
        ]
      }
    }
  ]
}
The issue seems to be caused by org.apache.avro.Resolver.unionEquiv() not
considering properties when determining whether the “optionalNestedObject”
union is equal in both schemas. Because the union is falsely considered
equivalent, org.apache.avro.io.ResolvingGrammarGenerator.generate() then uses
the writer schema to determine the String class for the string field within
that union. This ultimately results in a ClassCastException: the string is
deserialized as org.apache.avro.util.Utf8, but the POJO generated from the
reader schema has a field of the expected java.lang.String type.
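
To illustrate the suspected mechanism, here is a minimal, self-contained Java sketch. It does not use Avro and is not the actual unionEquiv() implementation. Node, equivIgnoringProps(), equivWithProps() and optionalNestedObject() are hypothetical names for a toy model of the schemas above, and they only show how a structural comparison that skips custom properties would treat the two unions as equivalent.

```java
import java.util.List;
import java.util.Map;

final class UnionEquivSketch {

    /** Toy stand-in for a schema node; this is NOT Avro's Schema class. */
    static final class Node {
        final String type;
        final Map<String, String> props;
        final List<Node> children;

        Node(String type, Map<String, String> props, Node... children) {
            this.type = type;
            this.props = props;
            this.children = List.of(children);
        }
    }

    /** Compares types and children only; custom properties are ignored. */
    static boolean equivIgnoringProps(Node a, Node b) {
        return equiv(a, b, false);
    }

    /** Same traversal, but custom properties must also match on every node. */
    static boolean equivWithProps(Node a, Node b) {
        return equiv(a, b, true);
    }

    private static boolean equiv(Node a, Node b, boolean compareProps) {
        if (!a.type.equals(b.type)) return false;
        if (compareProps && !a.props.equals(b.props)) return false;
        if (a.children.size() != b.children.size()) return false;
        for (int i = 0; i < a.children.size(); i++) {
            if (!equiv(a.children.get(i), b.children.get(i), compareProps)) return false;
        }
        return true;
    }

    /** Builds the toy "optionalNestedObject" union: ["null", record { string }]. */
    static Node optionalNestedObject(Map<String, String> stringProps) {
        Node string = new Node("string", stringProps);
        Node record = new Node("record", Map.of(), string);
        return new Node("union", Map.of(), new Node("null", Map.of()), record);
    }

    public static void main(String[] args) {
        Node reader = optionalNestedObject(Map.of("avro.java.string", "String"));
        Node writer = optionalNestedObject(Map.of());
        System.out.println(equivIgnoringProps(reader, writer)); // true
        System.out.println(equivWithProps(reader, writer));     // false
    }
}
```

In this toy model the property-blind comparison reports the reader and writer unions as equivalent, while the property-aware one does not. If the real comparison behaves like the first variant, that would be consistent with the writer's plain "string" type being used in place of the reader's.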