I think we should be cautious about changing the specification: other language bindings might already use a long as the position index. For example, the C++ implementation appears to do what the spec currently says: https://github.com/apache/avro/blob/master/lang/c%2B%2B/impl/BinaryDecoder.cc#L230. If we restrict this to an int in the spec, that is definitely a breaking change. In the unlikely case that someone writes a union so large that the position fits only into a long, that file would no longer be valid Avro under the new spec.
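For context on how much the int/long wording matters on the wire: the spec says int and long values are both written using variable-length zig-zag coding. Below is a minimal hand-written sketch of that encoding, assuming the spec's description. The class UnionIndexEncoding and its methods are hypothetical and do not use the Avro library. It suggests that any position that fits in an int produces identical bytes whether it is written as an int or a long, so the difference only shows up for a position above Integer.MAX_VALUE.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical helper, written from the Avro spec's description of
// int/long encoding; it does NOT call the Avro library.
final class UnionIndexEncoding {

    // Zig-zag + varint encoding of a 32-bit int.
    static byte[] encodeInt(int n) {
        // Zig-zag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
        int z = (n << 1) ^ (n >> 31);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Emit 7 bits per byte, low-order group first; the high bit
        // signals that more bytes follow.
        while ((z & ~0x7F) != 0) {
            out.write((z & 0x7F) | 0x80);
            z >>>= 7; // unsigned shift so the loop terminates
        }
        out.write(z);
        return out.toByteArray();
    }

    // Same scheme for a 64-bit long.
    static byte[] encodeLong(long n) {
        long z = (n << 1) ^ (n >> 63);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // A small branch index gives the same byte as an int or a long.
        System.out.println(Arrays.toString(encodeInt(3)));   // [6]
        System.out.println(Arrays.toString(encodeLong(3L))); // [6]
        // Only a position beyond Integer.MAX_VALUE needs the long form.
        System.out.println(encodeLong(Integer.MAX_VALUE + 1L).length); // 5
    }
}
```

The encoder is hand-rolled rather than taken from Avro's Java encoder so that the sketch stays dependency-free and makes the byte layout explicit.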
On Sun, Mar 29, 2020 at 12:27 PM Driesprong, Fokko <[email protected]> wrote:
> Hi Anh,
>
> It looks like you've found an inconsistency in the docs there. I
> think we need to update the docs and state that an int is being written.
>
> Stay strong!
>
> Cheers, Fokko
>
> On Fri, Mar 20, 2020 at 07:58 Anh Le <[email protected]> wrote:
>
>> Hi guys,
>>
>> I'm reading the current Avro spec. It states that:
>>
>> > A union is encoded by first writing a long value indicating the
>> > zero-based position within the union of the schema of its value. The value
>> > is then encoded per the indicated schema within the union.
>>
>> But as I dig through the code base, for example
>> https://github.com/rdblue/avro-java/blob/master/avro/src/main/java/org/apache/avro/generic/GenericDatumWriter.java#L123-L125,
>> I see that no long value is written here. We've got an int instead.
>>
>> Could you please tell me if I'm misunderstanding something here?
>>
>> Thank you (and be strong)!
