I'm debugging a nasty problem that occurs down in the Avro 1.4.1 code.
Sometimes when I read my serialized data into a generic datum object I crash
deep inside the Avro code. The call stack shows that the parser has been
walking down my data structure until it gets to a string node that it tries
to read using BinaryDecoder.readString. This method retrieves an invalid
string length (e.g. a negative number), and the process subsequently
crashes with an ArrayIndexOutOfBoundsException.
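For reference, the length prefix that readString consumes is a zig-zag varint. Here is a minimal stand-alone decode of that encoding (hand-rolled for illustration, not Avro's actual code), showing how a decoder that lands on the wrong bytes can come up with a negative "length":

```java
import java.nio.ByteBuffer;

public class ZigZagDemo {
    // Decode a zig-zag varint, the encoding Avro uses for the string
    // length prefix that readString consumes.
    static long readZigZagLong(ByteBuffer in) {
        long n = 0;
        int shift = 0;
        int b;
        do {
            b = in.get() & 0xff;
            n |= (long) (b & 0x7f) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (n >>> 1) ^ -(n & 1); // undo the zig-zag
    }

    public static void main(String[] args) {
        // Bytes that legitimately encode the length 150:
        ByteBuffer good = ByteBuffer.wrap(new byte[]{(byte) 0xac, 0x02});
        System.out.println(readZigZagLong(good)); // prints 150

        // A misaligned decoder landing on the single byte 0x01 instead
        // reads a "length" of -1, which it then uses as an array bound:
        ByteBuffer misaligned = ByteBuffer.wrap(new byte[]{0x01});
        System.out.println(readZigZagLong(misaligned)); // prints -1
    }
}
```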

The exact origin of this bug is mysterious to me, but at a high level the
problem appears to be that I wrote the data with one schema and mistakenly
read it back with a different one. How exactly this happened is also
mysterious, but it appears that my mechanism for supporting projection
schemas didn't behave as it should have. The two schemas in question are
mostly the same; in fact, one is a subset of the other.

   1. In general, is it possible for a schema-to-data mismatch to cause a
   crash down in the Avro code of the sort I described?
   2. If the answer to question (1) is "yes", would you expect the crash
   only in the case where the data was written with the superset schema
   and read back with the subset schema?
   3. Writing with a superset schema and reading with a subset schema will
   always work, because this is just projection, correct?
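To make question (3) concrete, here is an Avro-free sketch of my understanding of why projection only works when the reader resolves the writer's schema against its own (as Avro's ResolvingDecoder does when you give GenericDatumReader both schemas): the binary data carries no field tags, so a reader driven naively by the subset schema misinterprets the bytes. The record layouts and field names below are hypothetical, and the varint helpers are hand-rolled for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class ProjectionDemo {
    // Hypothetical writer schema: record { id: long, name: string }
    // Hypothetical reader schema: record { name: string }

    static void writeZigZagLong(ByteArrayOutputStream out, long v) {
        long n = (v << 1) ^ (v >> 63); // zig-zag encode
        while ((n & ~0x7fL) != 0) {
            out.write((int) ((n & 0x7f) | 0x80));
            n >>>= 7;
        }
        out.write((int) n);
    }

    static long readZigZagLong(ByteArrayInputStream in) {
        long n = 0;
        int shift = 0;
        int b;
        do {
            b = in.read();
            n |= (long) (b & 0x7f) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (n >>> 1) ^ -(n & 1); // zig-zag decode
    }

    public static void main(String[] args) {
        // Write with the superset schema: id = -3, then name = "ok".
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeZigZagLong(out, -3);
        byte[] name = "ok".getBytes(StandardCharsets.UTF_8);
        writeZigZagLong(out, name.length);
        out.write(name, 0, name.length);

        // A reader driven directly by the subset schema expects the
        // string first, so it consumes id's bytes as the length prefix:
        ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray());
        long bogusLength = readZigZagLong(in);
        System.out.println("subset reader sees string length " + bogusLength);
        // prints -3: a negative length of the kind readString chokes on.
    }
}
```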

Thanks.
