Parsing Canonical Form and logical types

Brennan Vincent Fri, 21 May 2021 10:44:47 -0700

Hello,

I am a bit confused about the intended meaning of Parsing Canonical Form. The spec suggests that when reducing a schema to PCF, only a particular subset of fields should be retained:

[STRIP] Keep only attributes that are relevant to parsing data, which are: type, name, fields, symbols, items, values, size. Strip all others (e.g., doc and aliases).

However, certain other attributes are necessary for correctly interpreting data once logical types are supported; for example, decimals cannot be interpreted correctly without the logicalType, precision, and scale attributes.
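To make the collision concrete, here is a simplified Python sketch of the [STRIP] step, plus the [PRIMITIVES] and [ORDER] rules that collapse and order what remains. It is not a full PCF implementation (it skips [FULLNAMES], [INTEGERS], and the other transformations), and the helper names `strip` and `canonical` are hypothetical. Two decimal schemas that differ only in scale reduce to the same canonical string:

```python
import json

# Attributes kept by [STRIP], listed in the order required by [ORDER].
KEPT_ATTRS = ["name", "type", "fields", "symbols", "items", "values", "size"]


def strip(schema):
    """Simplified [STRIP] + [PRIMITIVES] + [ORDER]; ignores [FULLNAMES], [INTEGERS], etc."""
    if isinstance(schema, str):  # primitive or named-type reference
        return schema
    if isinstance(schema, list):  # union: strip each branch
        return [strip(branch) for branch in schema]
    out = {}
    for key in KEPT_ATTRS:
        if key not in schema:
            continue
        value = schema[key]
        if key in ("type", "items", "values"):
            value = strip(value)  # these may hold nested schemas
        elif key == "fields":
            value = [strip(field) for field in value]
        out[key] = value
    # Anything not in KEPT_ATTRS (logicalType, precision, scale, doc, aliases) is gone.
    # [PRIMITIVES]: {"type": "bytes"} collapses to "bytes"
    if list(out) == ["type"] and isinstance(out["type"], str):
        return out["type"]
    return out


def canonical(schema) -> str:
    # [WHITESPACE]: no insignificant whitespace in the output
    return json.dumps(strip(schema), separators=(",", ":"))


writer = {"type": "bytes", "logicalType": "decimal", "precision": 9, "scale": 2}
reader = {"type": "bytes", "logicalType": "decimal", "precision": 9, "scale": 3}
print(canonical(writer), canonical(reader))  # prints: "bytes" "bytes"
```

Because the scale is stripped, any check based only on PCF equality cannot tell these two schemas apart.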

Our Avro consumer currently checks whether the reader and writer schemas' PCFs match and, if they do, skips schema resolution. This causes bugs for us when, for example, the writer changes the scale of a decimal: we keep interpreting the data according to the scale from the reader schema, which gives wrong results. Perhaps we shouldn't be doing this check at all, and should simply _always_ resolve schemas that differ in any way?
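A minimal Python sketch of this failure mode, assuming the bytes-backed decimal encoding from the spec (the unscaled integer as big-endian two's complement): the writer encodes 12.34 with scale 2, and a reader that skipped resolution decodes the same bytes with its own scale of 3. The helper names are hypothetical, and `can_skip_resolution` only illustrates the stricter alternative of comparing the full schemas, logical-type attributes included, instead of their PCFs.

```python
import json
from decimal import Decimal


def encode_decimal(value: Decimal, scale: int) -> bytes:
    """Bytes-backed decimal: unscaled integer, big-endian two's complement."""
    unscaled = int(value.scaleb(scale))  # 12.34 at scale 2 -> 1234
    length = unscaled.bit_length() // 8 + 1  # leave room for the sign bit
    return unscaled.to_bytes(length, "big", signed=True)


def decode_decimal(data: bytes, scale: int) -> Decimal:
    unscaled = int.from_bytes(data, "big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


def can_skip_resolution(writer: dict, reader: dict) -> bool:
    """Hypothetical stricter check: skip resolution only if the full schemas,
    logical-type attributes included, are identical (key order ignored)."""
    return json.dumps(writer, sort_keys=True) == json.dumps(reader, sort_keys=True)


writer = {"type": "bytes", "logicalType": "decimal", "precision": 9, "scale": 2}
reader = {"type": "bytes", "logicalType": "decimal", "precision": 9, "scale": 3}

data = encode_decimal(Decimal("12.34"), writer["scale"])
print(decode_decimal(data, writer["scale"]))  # 12.34 (writer's scale: correct)
print(decode_decimal(data, reader["scale"]))  # 1.234 (reader's scale: wrong)
print(can_skip_resolution(writer, reader))    # False
```

The full-schema comparison is deliberately conservative: it also forces resolution when only `doc` or `aliases` change, which costs some work but never silently applies the wrong scale.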


Anyone have an idea what the intended meaning of the spec is?

Thanks,
Brennan (Member of Technical Staff at Materialize, Inc.)
