Hi,

I would like to serialize from and deserialize to Java objects. If I'm correct, Avro offers three possibilities:
- GenericDatumReader/Writer
- SpecificDatumReader/Writer
- ReflectDatumReader/Writer

The GenericDatum classes use a generic, map-like structure. I would need additional code to convert it into the Java objects I want to use. Creating and maintaining this code makes this approach not very useful for me.

The SpecificDatum classes are used with the classes that Avro can generate. These Java objects are instantiated automatically, which is what I want. But it requires me to use the generated classes in my code, and as with all generated code, it's wise not to change them yourself. The generated classes also extend an Avro class, which restricts their use too much for me.

The ReflectDatum classes use reflection to instantiate classes, and don't come with the restrictions of the generated classes. But they require the classes to be Java beans, so each class must have public setter methods for all properties. This makes the approach not useful to me, since the classes I'd like to use are designed to be immutable.

To summarize, all three options force me to design the classes I'd like to use in a specific way, which is too restrictive for me.

I'm a big fan of Jackson, because it allows me to deserialize to any POJO, regardless of how the POJO is designed. So I was thinking how great it would be if I could use Jackson to deserialize from Avro to Java objects (and vice versa). Then I found jackson-dataformat-avro (https://github.com/FasterXML/jackson-dataformats-binary/tree/master/avro), which is a Jackson extension to serialize to Avro and back. This works great but, as far as I could find, it only allows serializing a single object. It does not provide a way to serialize multiple objects to a single file or stream, that is, to generate an Avro object container file. That is actually what I would like to achieve.
For serialization, I found a way to do this (it seems to be working, at least): I'm using the DataFileWriter.appendEncoded(ByteBuffer) method to write objects that I serialized using jackson-dataformat-avro.

But I failed to find a way to deserialize an object container file using Jackson. I tried the following. I created my own JacksonDatumReader, which basically does this:

    public T read(T reuse, Decoder in) throws IOException {
        InputStream inputStream = ((BinaryDecoder) in).inputStream();
        return objectReader.readValue(inputStream);
    }

The objectReader is a com.fasterxml.jackson.dataformat.avro.AvroMapper that does the actual parsing and binding.

This works fine for the very first object in the file, but it crashes at the second. The problem seems to be that, by the time the second object is read, the InputStream has already been fully consumed and there are no bytes left. It seems that the Decoder that is passed in wraps an entire data block, containing the data for multiple objects. When the second object is parsed, the same Decoder instance is passed in. The decoder keeps the read position internally, but for the second object it seems that the entire input has already been read, so an EOFException is thrown.

Does anyone know a way to make this JacksonDatumReader work? Or does anyone know of another way to deserialize an Avro file and use Jackson data binding?

Thanks,
Tom
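
P.S. To make the symptom concrete, here is a minimal sketch that reproduces the same pattern with java.io only, so no Avro or Jackson classes are involved. It rests on an assumption I have not verified: that the parser behind objectReader buffers its input and reads ahead, much like a BufferedInputStream does. The class ReadAheadDemo and the method readTwice are made up for illustration, and each byte stands in for one serialized object in a data block.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Illustration only: BufferedInputStream stands in for a parser that reads ahead. */
final class ReadAheadDemo {

    /**
     * Reads two one-byte "records" from a shared three-byte block.
     * If wrapPerRecord is true, a fresh buffering wrapper is created for each
     * record, similar to calling readValue(inputStream) once per object.
     * Otherwise a single wrapper is kept across both reads.
     */
    static int[] readTwice(boolean wrapPerRecord) {
        InputStream block = new ByteArrayInputStream(new byte[] {10, 20, 30});
        try {
            InputStream reused = new BufferedInputStream(block);
            int[] records = new int[2];
            for (int i = 0; i < records.length; i++) {
                InputStream in = wrapPerRecord ? new BufferedInputStream(block) : reused;
                // The first read fills the wrapper's buffer from 'block',
                // which drains all three bytes at once.
                records[i] = in.read();
            }
            return records;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] perRecord = readTwice(true);
        int[] shared = readTwice(false);
        System.out.println("fresh wrapper per record: " + perRecord[0] + ", " + perRecord[1]);
        System.out.println("one wrapper for the block: " + shared[0] + ", " + shared[1]);
    }
}
```

With a fresh wrapper per record, the first read succeeds and the second hits end of input (read() returns -1), because the first wrapper already pulled the whole block into its own buffer. With one wrapper kept for the whole block, both reads succeed. Whether jackson-dataformat-avro can be driven in a comparable way from inside a DatumReader is exactly what I don't know.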
