Hey,

I am currently running some performance tests for my BSc thesis and was wondering how exactly Avro files are parsed when they are read. From my understanding, the data is read from the file block by block (rather than datum by datum) and the datums are then deserialized. Is this correct, or does it depend on the implementation used? If it is correct, the memory usage of Avro would depend on the block size rather than on the size of each individual datum.
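To make my mental model concrete, here is a minimal sketch of how I picture block-wise reading. It is based only on my reading of the object container file layout in the Avro spec, where each data block is a zig-zag varint object count, a varint byte size, the serialized objects, and a 16-byte sync marker. It does not use any Avro library: `read_long` and `iter_blocks` are names I made up, the file header is assumed to be consumed already, codecs are ignored, and the sync marker is skipped without being checked.

```python
import io

SYNC_SIZE = 16  # length of the sync marker in the container format


def read_long(f):
    """Decode one zig-zag variable-length long; return None at a clean EOF."""
    shift = 0
    acc = 0
    while True:
        byte = f.read(1)
        if not byte:
            if shift == 0:
                return None
            raise EOFError("truncated varint")
        b = byte[0]
        acc |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    # Undo zig-zag encoding: 0, 1, 2, 3, 4 -> 0, -1, 1, -2, 2
    return (acc >> 1) ^ -(acc & 1)


def iter_blocks(f):
    """Yield (offset, object_count, payload) for each data block.

    Assumes f is positioned right after the header's sync marker.
    Only one block payload is held in memory at a time.
    """
    while True:
        offset = f.tell()
        count = read_long(f)
        if count is None:
            return
        size = read_long(f)
        payload = f.read(size)
        f.read(SYNC_SIZE)  # skip the sync marker (not verified here)
        yield offset, count, payload


# Two hand-built blocks: 2 objects in 3 bytes, then 1 object in 2 bytes.
sync = bytes(SYNC_SIZE)
data = b"\x04\x06abc" + sync + b"\x02\x04xy" + sync
blocks = list(iter_blocks(io.BytesIO(data)))
```

If this picture is right, the whole payload of a block sits in memory before any datum is decoded, which is why I suspect the block size drives memory usage.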

My second question is whether there is a way to read the file datum by datum. I want to create an index that stores byte offsets into the Avro file, so that I can use e.g. seek() to jump to a position and deserialize the datum that follows. Is this even possible, or can I only start reading at positions that have a sync marker?
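To show the kind of index I mean, here is a small sketch on a toy file of length-prefixed records. This is deliberately not the Avro format, and the helpers `write_records`, `build_index` and `read_at` are hypothetical names of my own. Whether the same approach can work on an Avro file is exactly what I am asking.

```python
import io
import struct


def write_records(f, records):
    """Write each record as a 4-byte big-endian length followed by its bytes."""
    for rec in records:
        f.write(struct.pack(">I", len(rec)))
        f.write(rec)


def build_index(f):
    """Scan the file once and return the byte offset of every record."""
    offsets = []
    f.seek(0)
    while True:
        pos = f.tell()
        header = f.read(4)
        if len(header) < 4:
            break  # end of file
        (length,) = struct.unpack(">I", header)
        offsets.append(pos)
        f.seek(length, io.SEEK_CUR)  # skip the record body
    return offsets


def read_at(f, offset):
    """Seek to a stored offset and read exactly one record."""
    f.seek(offset)
    (length,) = struct.unpack(">I", f.read(4))
    return f.read(length)


buf = io.BytesIO()
write_records(buf, [b"alpha", b"be", b"gamma"])
index = build_index(buf)
third = read_at(buf, index[2])
```

In the toy format every record carries its own length, so any stored offset is a valid starting point. For Avro I do not know whether a datum can be decoded from an arbitrary offset inside a block.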

Greetings and thanks

Marius
