> Lastly, it says "Note also that Avro binary-encoded data can be efficiently
> ordered without deserializing it to objects." What does this mean exactly?
> This might be mis-interpreted as saying one can lexicographically sort the
> binary-encoding without asking Avro to deserialize it, and it'll be in a
> proper order. However, this seems obviously not true from the number
> formats. Perhaps it would be clearer to say "Avro can efficiently make
> sort-comparisons on binary-encoded data without allocating deserialization
> objects."

I had the exact same question when first coming to Avro, so perhaps it does
deserve clarification.

On Thu, Dec 2, 2010 at 7:30 AM, David Jeske <[email protected]> wrote:

> I like the inclusion of sort-order in avro, to enable different machines to
> sort and exchange. I have a few suggestions to clarify the documentation.
> Please correct any assumptions I've made that are incorrect...
>
> It seems that sorts are not stable across schema versions. I think I
> understand why this makes sense inside the schema philosophy, yet I think
> the documentation could clear up a couple of the subtleties a bit more. For
> example, it says "*data items may only be compared if they have identical
> schemas*". If I supply a source schema which avro can map into my target
> schema, I would think it could load and compare things in my target schema.
> Is this correct? It might be clarified.
>
> Also, the comment "*this permits data written by one system to be
> efficiently sorted by another system*" could call out that data items
> sorted in one schema may not be in the proper order if during read they are
> mapped to a new version of the schema. In fact, it might be useful for Avro
> to be able to tell me when it does the source->target schema mapping,
> whether both schemas sort in the same order (if it doesn't already).
>
> Lastly, it says "*Note also that Avro binary-encoded data can be
> efficiently ordered without deserializing it to objects.*" What does this
> mean exactly? This might be mis-interpreted as saying one can
> lexicographically sort the binary-encoding without asking Avro to
> deserialize it, and it'll be in a proper order. However, this seems
> obviously not true from the number formats. Perhaps it would be clearer to
> say "Avro can efficiently make sort-comparisons on binary-encoded data
> without allocating deserialization objects."
>
> Did I properly understand those sort-related subtleties?
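
To make the number-format point concrete, here is a minimal Python sketch. It is not Avro's own code: encode_long and decode_long are hypothetical helper names, written from my reading of the spec's zig-zag, variable-length encoding for int/long. It compares a plain byte-wise sort of the encoded values with a sort that reads each value out of its bytes first.

```python
def encode_long(n: int) -> bytes:
    """Zig-zag + variable-length encoding of a signed 64-bit value (illustrative helper)."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ...
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(buf: bytes) -> int:
    """Read one zig-zag varint from the start of buf (illustrative helper)."""
    z = 0
    shift = 0
    for b in buf:
        z |= (b & 0x7F) << shift
        if not b & 0x80:  # no continuation bit: last byte of this value
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)  # undo zig-zag


values = [-2, -1, 0, 1, 2, 64]
encoded = [encode_long(v) for v in values]

# Naive approach: sort the raw bytes lexicographically.
naive = [decode_long(b) for b in sorted(encoded)]
# Encoding-aware approach: compare by the value read out of the bytes.
aware = [decode_long(b) for b in sorted(encoded, key=decode_long)]

print(naive)  # [0, -1, 1, -2, 2, 64]
print(aware)  # [-2, -1, 0, 1, 2, 64]
```

The byte-wise sort interleaves negative and positive values, so the comparison has to understand the encoding. It only needs to read primitive values out of the buffer as it goes, though, and does not need to build full deserialized objects, which seems to be what the documentation sentence is getting at.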
