I'm getting a NullPointerException when I write an Avro file with one schema
and read it back with a slightly modified schema. The only difference between
the writer and reader schemas is that the reader renames a single field and
adds an alias for the old name. I'm just testing Avro's schema evolution
features, so the code is as trivial as it appears. Here's the stack trace I
get when reading:

Exception in thread "main" java.lang.NullPointerException
    at org.apache.avro.Schema$Field.hashCode(Schema.java:421)
    at java.util.AbstractList.hashCode(AbstractList.java:527)
    at org.apache.avro.Schema$RecordSchema.hashCode(Schema.java:602)
    at org.apache.avro.io.parsing.ValidatingGrammarGenerator$LitS.hashCode(ValidatingGrammarGenerator.java:133)
    at org.apache.avro.io.parsing.ResolvingGrammarGenerator$LitS2.hashCode(ResolvingGrammarGenerator.java:461)
    at java.util.HashMap.get(HashMap.java:300)
    at org.apache.avro.io.parsing.ResolvingGrammarGenerator.resolveRecords(ResolvingGrammarGenerator.java:197)
    at org.apache.avro.io.parsing.ResolvingGrammarGenerator.generate(ResolvingGrammarGenerator.java:118)
    at org.apache.avro.io.parsing.ResolvingGrammarGenerator.generate(ResolvingGrammarGenerator.java:50)
    at org.apache.avro.io.ResolvingDecoder.resolve(ResolvingDecoder.java:76)
    at org.apache.avro.io.ResolvingDecoder.<init>(ResolvingDecoder.java:46)
    at org.apache.avro.generic.GenericDatumReader.getResolver(GenericDatumReader.java:93)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:103)
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:198)
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:185)
    ...

The writer schema:

{
  "type": "record",
  "name": "Message",
  "namespace": "com.edmunds.avro.data",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "time", "type": "long"},
    {"name": "body", "type": ["string", "null"]}
  ]
}

The reader schema:

{
  "type": "record",
  "name": "Message",
  "namespace": "com.edmunds.avro.data",
  "fields": [
    {"name": "messageId", "type": "long", "aliases": ["id"]},
    {"name": "time", "type": "long"},
    {"name": "body", "type": ["string", "null"]}
  ]
}
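
To be explicit about what I expect the alias to do: as I understand it, the reader's aliases are used to match the writer's field names, so the value written as "id" should end up in "messageId". The sketch below is plain Java and only illustrates that name matching. It is not Avro code and makes no claim about how Avro implements resolution; AliasResolutionSketch, ReaderField, and resolve are names made up for this illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual illustration only: this is NOT Avro's implementation.
final class AliasResolutionSketch {

    // Hypothetical stand-in for a reader-schema field: its name plus any aliases.
    static final class ReaderField {
        final String name;
        final List<String> aliases;

        ReaderField(String name, String... aliases) {
            this.name = name;
            this.aliases = Arrays.asList(aliases);
        }
    }

    // Maps each writer field name to the reader field expected to receive it,
    // or to null when no reader field matches by name or by alias.
    static Map<String, String> resolve(List<String> writerFields,
                                       List<ReaderField> readerFields) {
        Map<String, String> mapping = new LinkedHashMap<String, String>();
        for (String writerName : writerFields) {
            String target = null;
            for (ReaderField rf : readerFields) {
                if (rf.name.equals(writerName) || rf.aliases.contains(writerName)) {
                    target = rf.name;
                    break;
                }
            }
            mapping.put(writerName, target);
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Field names taken from the writer and reader schemas above.
        List<String> writer = Arrays.asList("id", "time", "body");
        List<ReaderField> reader = Arrays.asList(
                new ReaderField("messageId", "id"),
                new ReaderField("time"),
                new ReaderField("body"));
        // Prints {id=messageId, time=time, body=body}
        System.out.println(resolve(writer, reader));
    }
}
```

That mapping is the behavior I was hoping to see from the two schemas, rather than the exception.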

Here's the writing code:

private void writeMessages() {
    File file = getAvroFile();
    if (file.exists()) {
        file.delete();
    }
    DatumWriter<Message> writer = new SpecificDatumWriter<Message>(Message.class);
    DataFileWriter<Message> dataFileWriter = new DataFileWriter<Message>(writer);
    try {
        dataFileWriter.create(Message.SCHEMA$, file);
        for (Message m : createMessages()) {
            dataFileWriter.append(m);
            System.out.println("Wrote message: " + objToString(m));
        }
        dataFileWriter.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

And the reading code:

private void readMessages() {
    File file = getAvroFile();
    DatumReader<Message> reader = new SpecificDatumReader<Message>(Message.class);
    DataFileReader<Message> dataFileReader;
    try {
        dataFileReader = new DataFileReader<Message>(file, reader);
    } catch (IOException e) {
        e.printStackTrace();
        return;
    }
    while (dataFileReader.hasNext()) {
        Message m = dataFileReader.next();
        System.out.println("Read message: " + objToString(m));
    }
}

I'm using Avro 1.4.1 and the avro-maven-plugin (updated to Avro 1.4.1) to
generate the Message class(es). The code works fine when the reader and writer
use exactly the same schema. What am I doing wrong?