Re: Preserving defaults in canonical form schema record fields

2020-01-19 Thread RumeshKrishnan Mohan
Thanks for the update.

I am reworking https://issues.apache.org/jira/browse/AVRO-2299 to fix the git
commit problem and will keep you updated.

-- 

Thanks and Regards
*Rumeshkrishnan Mohan*
*Mail: *rumeshkr...@gmail.com
*Mobile:* +49 151 45725188.
*WhatsApp:* +91 94436 87507.


Re: Preserving defaults in canonical form schema record fields

2019-11-04 Thread Driesprong, Fokko
Thanks for bringing this up, Michael.

At my current project, I've bumped into this one as well. We're trying to
build a schema registry and take the fingerprint of the canonical schema
in order to check whether something has changed. The issue here is that the
canonical form is the minimal schema that ensures binary
compatibility. Default values are only considered when reading the file
and do not affect the binary encoding. That is the original intent as I
understand it from https://issues.apache.org/jira/browse/AVRO-2299
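To make the point about the binary concrete, here is a minimal Python sketch, assuming a record whose fields are all of type `bytes`; `zigzag_varint` and `encode_record` are hypothetical helpers, not Avro library functions. Per the spec, `bytes` is written as a zig-zag varint length followed by the raw bytes, and a record is just its fields' encodings concatenated, so the writer schema's `default` never reaches the output.

```python
import json


def zigzag_varint(n: int) -> bytes:
    """Avro long encoding: zig-zag map, then base-128 varint, low groups first."""
    z = (n << 1) ^ (n >> 63)  # zig-zag for 64-bit signed values
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # more groups follow
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_record(schema_json: str, record: dict) -> bytes:
    """Encode a record whose fields are all "bytes" (simplifying assumption)."""
    schema = json.loads(schema_json)
    out = bytearray()
    for field in schema["fields"]:
        if field["type"] != "bytes":
            raise ValueError("this sketch only handles bytes fields")
        value = record[field["name"]]  # the writer always supplies a value,
        out += zigzag_varint(len(value)) + value  # so "default" is never read
    return bytes(out)


with_default = '{"name":"example","type":"record","fields":[{"name":"def","type":"bytes","default":"abc"}]}'
without_default = '{"name":"example","type":"record","fields":[{"name":"def","type":"bytes"}]}'

a = encode_record(with_default, {"def": b"xyz"})
b = encode_record(without_default, {"def": b"xyz"})
print(a == b, a)  # True b'\x06xyz'
```

Defaults only come into play during schema resolution, when a reader's field is missing from the writer's schema and the reader fills it in from its own default; that is why they can change what a reader produces without changing a single byte on the wire.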

For me, the first step would be to clarify the Avro spec on the different
kinds of schema (plain, canonical, etc.), so that we are sure every
implementation has the same notion of a canonical schema.

There is a PR that addresses this issue in the Java implementation and
updates the spec (https://github.com/apache/avro/pull/452), but it looks
like something is off with the PR.

Cheers, Fokko



Preserving defaults in canonical form schema record fields

2019-11-03 Thread Michael A. Smith
I'm picking up the languishing ticket AVRO-1938. One issue with the PR
(https://github.com/apache/avro/pull/143) is that it strips the
default value from a field:

input:
'{"name":"example","type":"record","fields":[{"name":"def","type":"bytes","default":"abc"}]}'
output:
'{"name":"example","type":"record","fields":[{"name":"def","type":"bytes"}]}'
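The output above is what you get when a field object is treated exactly like any other schema object under [STRIP] and [ORDER]. A minimal Python sketch of that reading, assuming unqualified names (so [FULLNAMES] is skipped); `canonical_form` and `_canon` are hypothetical names, not the code in PR #143 or an Avro API.

```python
import json

# Attribute order required by the spec's [ORDER] rule.
ORDER = ["name", "type", "fields", "symbols", "items", "values", "size"]


def _canon(node):
    if isinstance(node, str):
        return node  # primitive or named-type reference ([FULLNAMES] not handled)
    if isinstance(node, list):
        return [_canon(branch) for branch in node]  # union
    if set(node) == {"type"} and isinstance(node["type"], str):
        return node["type"]  # [PRIMITIVES]: {"type":"int"} -> "int"
    out = {}
    for key in ORDER:  # [STRIP] + [ORDER]: keys outside ORDER, e.g. "default", vanish
        if key not in node:
            continue
        value = node[key]
        if key == "fields":
            value = [_canon(field) for field in value]  # fields treated like schemas
        elif key in ("type", "items", "values"):
            value = _canon(value)
        out[key] = value
    return out


def canonical_form(schema_json: str) -> str:
    # [WHITESPACE]: no spaces; ensure_ascii=False keeps UTF-8 instead of \u escapes.
    return json.dumps(_canon(json.loads(schema_json)),
                      separators=(",", ":"), ensure_ascii=False)


src = '{"name":"example","type":"record","fields":[{"name":"def","type":"bytes","default":"abc"}]}'
print(canonical_form(src))
# {"name":"example","type":"record","fields":[{"name":"def","type":"bytes"}]}
```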

The specification
(https://avro.apache.org/docs/1.9.1/spec.html#Parsing+Canonical+Form+for+Schemas)
doesn't address record fields at all. If we are to assume that the
same rules apply to fields as to other schema parts, then the [STRIP]
rule says we should drop "default" and "order" from fields. But that
can't be right: the default is crucial for readers, and two schemas
differing only by a default are certainly different schemas and
ought to have different fingerprints.
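A short Python sketch of why this matters for fingerprints, assuming the 64-bit Rabin fingerprint (CRC-64-AVRO) from the spec's "Schema Fingerprints" section, ported from the Java reference code given there. Under the stripping behaviour in the output shown above, a field with `"default":"abc"` and one with, say, `"default":"xyz"` both reduce to the same canonical string and therefore share one fingerprint. The default-preserving strings below use a field layout that is purely an assumption.

```python
EMPTY = 0xC15D213AA4D7A795  # CRC-64-AVRO initial value / polynomial from the spec


def _build_table() -> list:
    table = []
    for i in range(256):
        fp = i
        for _ in range(8):
            # Shift right; XOR in the polynomial when the bit shifted out was 1.
            fp = (fp >> 1) ^ (EMPTY & -(fp & 1))
        table.append(fp)
    return table


FP_TABLE = _build_table()


def fingerprint64(data: bytes) -> int:
    """64-bit Rabin fingerprint over the UTF-8 bytes of a canonical schema."""
    fp = EMPTY
    for byte in data:
        fp = (fp >> 8) ^ FP_TABLE[(fp ^ byte) & 0xFF]
    return fp


# Stripped form: identical whether the field's default was "abc" or "xyz".
stripped = '{"name":"example","type":"record","fields":[{"name":"def","type":"bytes"}]}'
# Hypothetical default-preserving forms (where "default" sits is an assumption).
keep_abc = '{"name":"example","type":"record","fields":[{"name":"def","type":"bytes","default":"abc"}]}'
keep_xyz = '{"name":"example","type":"record","fields":[{"name":"def","type":"bytes","default":"xyz"}]}'

for label, pcf in [("stripped", stripped), ("keep abc", keep_abc), ("keep xyz", keep_xyz)]:
    print(f"{label:9} {fingerprint64(pcf.encode('utf-8')):016x}")
```

With the stripped form, a registry keyed on this fingerprint would treat the two schemas as one, which is exactly the collision described above.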

Did I miss something in reading the spec, or is this a gap? How should
I interpret the spec when implementing Parsing Canonical Form for record
fields? Specifically, should the canonical form of a record field
preserve its order and default values under the [STRIP] rule, and if so,
where do those attributes go in the [ORDER] rule?
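One possible answer to the [ORDER] question, sketched in Python purely as a hypothetical: keep the spec's existing key order and place `default` and then `order` right after `type` inside each field. `FIELD_ORDER`, `canonical_field` and `canonical_record` are illustrative names; the placement is an assumption, not something the spec or PR #143 defines, and field types are assumed to be primitive names.

```python
import json

# Hypothetical [ORDER] extension for record fields: "name" and "type" as in
# the spec, then the proposed "default" and "order" (placement assumed).
FIELD_ORDER = ["name", "type", "default", "order"]


def canonical_field(field: dict) -> dict:
    out = {key: field[key] for key in FIELD_ORDER if key in field}
    # Assumption: "ascending" is the spec's implied sort order, so dropping it
    # makes explicit and implicit ascending fields canonicalize identically.
    if out.get("order") == "ascending":
        del out["order"]
    return out  # "doc", "aliases", etc. are still stripped


def canonical_record(schema_json: str) -> str:
    schema = json.loads(schema_json)
    out = {
        "name": schema["name"],
        "type": "record",
        "fields": [canonical_field(f) for f in schema["fields"]],
    }
    return json.dumps(out, separators=(",", ":"), ensure_ascii=False)


src = ('{"type":"record","name":"example","doc":"demo","fields":'
       '[{"type":"bytes","order":"ascending","default":"abc","name":"def"}]}')
print(canonical_record(src))
# {"name":"example","type":"record","fields":[{"name":"def","type":"bytes","default":"abc"}]}
```

Two design questions remain open in this sketch: the default value's own JSON is not normalized (key order inside a record-typed default, or `1` versus `1.0`, would still change the fingerprint), and dropping an explicit `"ascending"` is a choice the spec would have to make explicitly.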

Thanks,
Michael A. Smith