Re: Preserving defaults in canonical form schema record fields
Thanks for the update. I am reworking https://issues.apache.org/jira/browse/AVRO-2299 to fix a git commit problem and will keep you updated.

On Mon, 4 Nov 2019 at 9:22 AM, Driesprong, Fokko wrote:
> There is a PR that addresses this issue in the Java implementation and
> updates the spec: https://github.com/apache/avro/pull/452 But it looks like
> something is off with the PR.

Thanks and Regards,
Rumeshkrishnan Mohan
Re: Preserving defaults in canonical form schema record fields
Thanks for bringing this up, Michael.

At my current project, I've bumped into this one as well. We're trying to build a schema registry and take the fingerprint of the canonical schema in order to check whether something changed. The issue here is that the canonical schema is the minimal schema that ensures binary compatibility. The default values are only considered when reading the file and do not impact the binary. This was the original idea that I could derive from https://issues.apache.org/jira/browse/AVRO-2299

For me, the first step would be to clarify the Avro spec on the different schema forms, such as plain, canonical, etc., so that we are sure every implementation has the same notion of a canonical schema.

There is a PR that addresses this issue in the Java implementation and updates the spec (https://github.com/apache/avro/pull/452), but it looks like something is off with the PR.

Cheers, Fokko

On Sun, 3 Nov 2019 at 22:01, Michael A. Smith wrote:
> Did I miss something in reading the spec, or is this a gap? How should
> I interpret the spec in implementing parsing canonical form for record
> fields? Specifically, should the canonical form of a record field
> preserve its order and default values in the [STRIP] rule, and if so,
> where do those things go in the [ORDER] rule?
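The remark that defaults are only consulted when reading can be illustrated with a small Python sketch. It is a simplification of schema resolution, not Avro's actual implementation: `resolve_record` is a hypothetical helper, the written values are assumed to be already decoded into a dict keyed by field name, and type promotion, aliases, and the encoding of "bytes" defaults are all ignored.

```python
def resolve_record(reader_fields, written_values):
    """Build the record a reader sees (hypothetical helper, simplified).

    reader_fields:  the "fields" list from the reader's record schema.
    written_values: field name -> value, as decoded with the writer's schema.
    """
    record = {}
    for field in reader_fields:
        name = field["name"]
        if name in written_values:
            # The writer supplied a value; the reader's default is not used.
            record[name] = written_values[name]
        elif "default" in field:
            # The field is absent from the written data: fall back to the default.
            record[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return record


reader_fields = [{"name": "def", "type": "bytes", "default": "abc"}]
print(resolve_record(reader_fields, {}))              # field missing in the data
print(resolve_record(reader_fields, {"def": "xyz"}))  # field present in the data
```

In this sketch the writer never looks at "default", which is why it has no effect on the encoded bytes; it only changes what a reader sees when the written data lacks the field.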
Preserving defaults in canonical form schema record fields
I'm picking up the languishing ticket AVRO-1938. One issue with the PR (https://github.com/apache/avro/pull/143) is that it strips the default value from a field:

input:
'{"name":"example","type":"record","fields":[{"name":"def","type":"bytes","default":"abc"}]}'
output:
'{"name":"example","type":"record","fields":[{"name":"def","type":"bytes"}]}'

The specification (https://avro.apache.org/docs/1.9.1/spec.html#Parsing+Canonical+Form+for+Schemas) doesn't address record fields at all. If we are to assume that the same rules apply to fields as to other schema parts, then the [STRIP] rule says we should drop "default" and "order" from fields. But that can't be right -- default is crucial for readers, and two schemas differing only by a default are certainly different schemas and ought to have different fingerprints.

Did I miss something in reading the spec, or is this a gap? How should I interpret the spec in implementing parsing canonical form for record fields? Specifically, should the canonical form of a record field preserve its order and default values in the [STRIP] rule, and if so, where do those things go in the [ORDER] rule?

Thanks,
Michael A. Smith
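For reference, the stripping behaviour described in this message can be reproduced with a minimal Python sketch. It assumes, as the message does, that the spec's [STRIP] and [ORDER] rules apply to field objects the same way they apply to other schema objects. The helper names `strip_and_order` and `canonical_text` are hypothetical, and the other canonical-form transformations ([PRIMITIVES], [FULLNAMES], [STRINGS], [INTEGERS]) are deliberately left out, so this is not a complete implementation of Parsing Canonical Form.

```python
import json

# Attributes the spec's [STRIP] rule keeps, in the order given by [ORDER].
KEPT_IN_ORDER = ("name", "type", "fields", "symbols", "items", "values", "size")


def strip_and_order(schema):
    """Apply only [STRIP] and [ORDER] to a parsed schema (hypothetical helper).

    Assumption: field objects are treated like any other schema object,
    so "default" and "order" are dropped from them as well.
    """
    if isinstance(schema, dict):
        return {k: strip_and_order(schema[k]) for k in KEPT_IN_ORDER if k in schema}
    if isinstance(schema, list):
        return [strip_and_order(item) for item in schema]
    return schema  # strings and numbers pass through unchanged


def canonical_text(schema_json):
    """Serialize the stripped schema without whitespace ([WHITESPACE] rule)."""
    return json.dumps(strip_and_order(json.loads(schema_json)), separators=(",", ":"))


example = '{"name":"example","type":"record","fields":[{"name":"def","type":"bytes","default":"abc"}]}'
print(canonical_text(example))
```

Under that reading, the input shown in the message comes out as the output shown there, and any two schemas that differ only in a field's "default" or "order" collapse to the same text and therefore to the same fingerprint.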