>
> Lastly, it says "Note also that Avro binary-encoded data can be efficiently
> ordered without deserializing it to objects." What does this mean exactly?
>  This might be mis-interpreted as saying one can lexicographically sort the
> binary-encoding without asking Avro to deserialize it, and it'll be in a
> proper order. However, this seems obviously not true from the number
> formats. Perhaps it would be clearer to say "Avro can efficiently make
> sort-comparisons on binary-encoded data without allocating deserialization
> objects."


I had the exact same question when first coming to Avro, so perhaps it does
deserve clarification.
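
To make the point about the number formats concrete, here is a rough Python
sketch. It assumes only what the spec says about int/long (zig-zag plus
variable-length encoding); zigzag_varint and read_long are hypothetical
helpers written for this mail, not part of any Avro API. Sorting the raw
bytes gives the wrong order, while decoding each value on the fly from the
buffer (no record objects built) gives the right one.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit integer as a zig-zag varint (illustrative helper)."""
    z = (n << 1) ^ (n >> 63)  # zig-zag maps 0, -1, 1, -2 to 0, 1, 2, 3
    out = bytearray()
    while z & ~0x7F:          # more than 7 bits left: emit a continuation byte
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one zig-zag varint from buf at pos; return (value, next_pos)."""
    z = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:      # high bit clear: last byte of this value
            break
    return (z >> 1) ^ -(z & 1), pos


values = [-2, -1, 0, 1, 100, 128]
encoded = [zigzag_varint(v) for v in values]

# Plain lexicographic sort of the encoded bytes: not numeric order.
by_bytes = [read_long(b)[0] for b in sorted(encoded)]

# Comparing by decoding straight from the bytes: numeric order.
by_value = [read_long(b)[0] for b in sorted(encoded, key=lambda b: read_long(b)[0])]

print(by_bytes)  # [0, -1, 1, -2, 128, 100]
print(by_value)  # [-2, -1, 0, 1, 100, 128]
```

So "ordered without deserializing" has to mean something like the second
sort: the comparator walks the binary data guided by the schema, rather
than comparing the bytes blindly.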

On Thu, Dec 2, 2010 at 7:30 AM, David Jeske <[email protected]> wrote:

> I like the inclusion of sort-order in avro, to enable different machines to
> sort and exchange. I have a few suggestions to clarify the documentation.
> Please correct any assumptions I've made that are incorrect...
>
> It seems that sorts are not stable across schema versions. I think I
> understand why this makes sense inside the schema philosophy, yet I think
> the documentation could clear up a couple of the subtleties a bit more. For
> example, it says "*data items may only be compared if they have identical
> schemas*". If I supply a source schema which avro can map into my target
> schema, I would think it could load and compare things in my target schema.
> Is this correct? It might be clarified.
>
> Also, the comment "*this permits data written by one system to be
> efficiently sorted by another system*", could call out that data items
> sorted in one schema may not be in the proper order if during read they are
> mapped to a new version of the schema. In fact, it might be useful for Avro
> to be able to tell me when it does the source->target schema mapping,
> whether both schemas sorted in the same order (if it doesn't already).
>
> Lastly, it says "*Note also that Avro binary-encoded data can be
> efficiently ordered without deserializing it to objects.*" What does this
> mean exactly?  This might be mis-interpreted as saying one can
> lexicographically sort the binary-encoding without asking Avro to
> deserialize it, and it'll be in a proper order. However, this seems
> obviously not true from the number formats. Perhaps it would be clearer to
> say "Avro can efficiently make sort-comparisons on binary-encoded data
> without allocating deserialization objects."
>
> Did I properly understand those sort-related subtleties?
>
>
>
