What's the "best" way to represent an optional enum in Avro (in terms of
space efficiency, computational efficiency, and readability)? To be
consistent with other optional fields, I was planning to use a union of null
and my enum type. The other approach I could see was adding a NULL symbol to
the enum -- but then my code would have to initialize the enum field to NULL
before a write.
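
To make the space question concrete, here is a rough sketch of how I understand the two options would be laid out in Avro's binary encoding: a union writes the zero-based branch index followed by the value, an enum writes the zero-based symbol position, null writes nothing, and the integers are zig-zag varints. This is my reading of the spec, not code taken from Avro, and the class, the two Java enums (including the symbol list), and the method names are all hypothetical stand-ins.

```java
import java.io.ByteArrayOutputStream;

public class OptionalEnumEncoding {
    // Hypothetical stand-in for the generated enum (option 1: used inside a union).
    enum Suit { SPADES, CLUBS, HEARTS, DIAMONDS }

    // Hypothetical stand-in for option 2: NULL is just another enum symbol.
    enum SuitWithNull { NULL, SPADES, CLUBS, HEARTS, DIAMONDS }

    // Assumption: ints are written as zig-zag variable-length integers.
    static void writeZigZagVarint(ByteArrayOutputStream out, int value) {
        int n = (value << 1) ^ (value >> 31); // zig-zag keeps small magnitudes small
        while ((n & ~0x7F) != 0) {
            out.write((n & 0x7F) | 0x80);     // low 7 bits, continuation bit set
            n >>>= 7;
        }
        out.write(n);
    }

    // Option 1: union of null and the enum. Branch index first, then the value.
    static byte[] encodeUnion(Suit suit) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (suit == null) {
            writeZigZagVarint(out, 0);        // branch 0 = null, which has no body
        } else {
            writeZigZagVarint(out, 1);        // branch 1 = the enum
            writeZigZagVarint(out, suit.ordinal());
        }
        return out.toByteArray();
    }

    // Option 2: plain enum with a NULL symbol. Only the symbol position is written.
    static byte[] encodeEnumWithNull(SuitWithNull suit) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeZigZagVarint(out, suit.ordinal());
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encodeUnion(null).length);                      // prints 1
        System.out.println(encodeUnion(Suit.SPADES).length);               // prints 2
        System.out.println(encodeEnumWithNull(SuitWithNull.NULL).length);  // prints 1
        System.out.println(encodeEnumWithNull(SuitWithNull.SPADES).length); // prints 1
    }
}
```

If that reading is right, the union costs one extra byte whenever the value is present, so the difference in size looks small either way.
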
I've tried using a union of null and the enum type, but I've run into an
issue with this approach when using the AvroOutputFormat. The following
code summarizes my issue:
    public void testDataWriteWithSchema() throws IOException {
        final DataFileWriter<Event> writer =
            new DataFileWriter<Event>(new SpecificDatumWriter<Event>());
        writer.create(Event.SCHEMA$, new File("target/datafile-test.avro"));
        writer.append(getEvent());
        writer.close();
    }

    public void testDataWriteWithSchemaWithClass() throws IOException {
        final DataFileWriter<Event> writer =
            new DataFileWriter<Event>(new SpecificDatumWriter<Event>(Event.class));
        writer.create(Event.SCHEMA$, new File("target/datafile-test.avro"));
        writer.append(getEvent());
        writer.close();
    }
When I don't pass Event.class to SpecificDatumWriter (the first test
method), the test fails with the following exception:
Not in union ["null",{"type":"enum","name":"Suit","namespace":"foo","symbols":["SPADES","CLUBS","HEARS","DIAMONDS"]}]: SPADES
    at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:382)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:67)
    at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:100)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:54)
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:245)
AvroOutputFormat uses SpecificDatumWriter's default constructor, so I run
into the above exception when using it. Is there some way around this (other
than implementing my own OutputFormat that passes along the class)?
Thanks,
Joe