Hello,

I am a bit confused about the intended meaning of Parsing Canonical Form. The spec suggests that when reducing a schema to PCF, only a particular subset of fields should be retained:

[STRIP] Keep only attributes that are relevant to parsing data, which are: type, name, fields, symbols, items, values, size. Strip all others (e.g., doc and aliases).

However, certain other attributes are necessary for correctly interpreting data when logical types are supported; for example, the logicalType, precision, and scale attributes are necessary for correctly interpreting decimals.

Our Avro consumer currently checks whether the reader and writer schemas' PCFs match and, if they do, skips schema resolution. This causes bugs when, for example, the writer changes the scale of a decimal: we keep interpreting the value using the scale from the reader schema and get wrong results. Perhaps we shouldn't rely on this check, and should instead _always_ resolve schemas that differ in any way?
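To make the problem concrete, here is a minimal Python sketch. It assumes a simplified version of the [STRIP] step only (not the full PCF transformation, and not any Avro library's API); the names `strip_for_pcf` and `decode_decimal` are hypothetical helpers made up for this illustration, and the two schemas are invented examples.

```python
from decimal import Decimal

# Attributes the spec's [STRIP] rule says to keep.
PCF_KEEP = {"type", "name", "fields", "symbols", "items", "values", "size"}

def strip_for_pcf(schema):
    """Simplified [STRIP] step only; the real PCF has further rules."""
    if isinstance(schema, dict):
        return {k: strip_for_pcf(v) for k, v in schema.items() if k in PCF_KEEP}
    if isinstance(schema, list):
        return [strip_for_pcf(s) for s in schema]
    return schema

def decode_decimal(raw, scale):
    """Interpret big-endian two's-complement bytes as a decimal with the given scale."""
    unscaled = int.from_bytes(raw, "big", signed=True)
    return Decimal(unscaled).scaleb(-scale)

# Hypothetical writer and reader schemas that differ only in scale.
writer = {"type": "bytes", "logicalType": "decimal", "precision": 10, "scale": 3}
reader = {"type": "bytes", "logicalType": "decimal", "precision": 10, "scale": 2}

# After stripping, the two schemas are indistinguishable.
same_after_strip = strip_for_pcf(writer) == strip_for_pcf(reader)

# The writer encodes 12.345 as the unscaled integer 12345.
raw = (12345).to_bytes(2, "big", signed=True)
intended = decode_decimal(raw, writer["scale"])      # what the writer meant
misread = decode_decimal(raw, reader["scale"])       # what we get if resolution is skipped
```

Here `same_after_strip` is true even though the schemas differ, and `misread` is off from `intended` by a factor of ten, which is the bug we hit.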

Does anyone have an idea what the intended meaning of the spec is?

Thanks,
Brennan (Member of Technical Staff at Materialize, Inc.)
