>> What causes the schema normalization to be incomplete?
Bad implementation: I use the C++ Avro library, and its normalization is
incomplete and the project is not very active.

> And is that a problem? As long as the reader can get the schema, it
> shouldn't matter that there are duplicates – as long as the differences
> between the duplicates do not affect decoding.
Not really a problem; we tend to use machine-generated schemas, and they are
always identical.

If I remember correctly there are holes in the simplification of types:
namespaces should be collapsed, {"type" : "string"} should reduce to
"string", and so on.

The current implementation can't reliably decide whether two types are
identical. If you correct the problem later, a registered schema would
actually change its hash, since it can now be simplified further. Whether
this is a problem depends on your application.
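
Just to illustrate the hash-stability point (this is not the real fingerprint
code; std::hash stands in for whatever digest is actually used, e.g. MD5 or a
Rabin CRC-64 fingerprint):

#include <functional>
#include <iostream>
#include <string>

int main() {
    std::string as_registered = R"({"type":"string"})";  // stored before the fix
    std::string simplified    = R"("string")";           // what the fixed code emits
    std::hash<std::string> h;
    // The two digests differ, so an already-registered schema gets a new id
    // once the simplifier starts rewriting it.
    std::cout << std::hex << h(as_registered) << "\n" << h(simplified) << "\n";
}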

We currently encode this as you suggest: <schema_type (byte)><schema_id
(32/128 bits)><avro (binary)>.
The binary fields should probably have a defined endianness as well.
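
For what it's worth, a minimal sketch of that framing, assuming the 32-bit id
variant (the "frame" helper is made up for the example):

#include <cstdint>
#include <vector>

std::vector<uint8_t> frame(uint8_t schema_type,
                           uint32_t schema_id,
                           const std::vector<uint8_t>& avro_payload) {
    std::vector<uint8_t> out;
    out.reserve(1 + 4 + avro_payload.size());
    out.push_back(schema_type);
    // write the id big-endian (network order) so readers on any platform agree
    out.push_back(static_cast<uint8_t>(schema_id >> 24));
    out.push_back(static_cast<uint8_t>(schema_id >> 16));
    out.push_back(static_cast<uint8_t>(schema_id >> 8));
    out.push_back(static_cast<uint8_t>(schema_id));
    out.insert(out.end(), avro_payload.begin(), avro_payload.end());
    return out;
}

The 128-bit variant would just write 16 id bytes instead of 4.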

I agree that a de facto way of encoding this would be nice. Currently I
would say the Confluent / LinkedIn way is the norm...
