For me, it appears to completely ignore fields in the JSON that aren't defined in the reader schema. The reader succeeds and builds a generic record (which excludes any additional fields in the JSON).
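For reference, here is roughly what I am running (a minimal sketch; I'm quoting the ExtendedJsonDecoder constructor and package from memory, since it comes from your fork, so those details may be slightly off):

import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
// ExtendedJsonDecoder comes from the zolyfarkas/avro fork; package guessed from memory.
import org.apache.avro.io.ExtendedJsonDecoder;

public class ExtraFieldRepro {
  public static void main(String[] args) throws IOException {
    // Reader schema with a single "name" field.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Person\",\"fields\":"
            + "[{\"name\":\"name\",\"type\":\"string\"}]}");

    // Incoming JSON carries an extra "age" field that the schema does not define.
    String json = "{\"name\":\"Josh\",\"age\":30}";

    // Constructor paraphrased from memory; adjust to whatever the fork actually exposes.
    ExtendedJsonDecoder decoder = new ExtendedJsonDecoder(schema, json);
    GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
    GenericRecord rec = reader.read(null, decoder);

    // Prints {"name": "Josh"} -- the extra "age" field is silently dropped
    // instead of triggering an AvroTypeException.
    System.out.println(rec);
  }
}

If there is a hook in the decoder (or the reader) that would make that last step throw on the unknown field instead of skipping it, that would be ideal.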
Thanks for looking into it!

Josh

On Fri, Nov 18, 2016 at 8:31 PM, Zoltan Farkas <[email protected]> wrote:

> I recall that it would fail if you have extra fields in the json that are
> not defined in the reader schema and not in the writer schema.
> Let me look into it and will get back to you.
>
> —Z
>
>
> On Nov 18, 2016, at 7:21 AM, Josh <[email protected]> wrote:
>
> Hi Zoltan,
>
> Your ExtendedJsonDecoder / Encoder looks really useful for doing the
> conversions between JSON and Avro.
>
> I just have a quick question - when I use the ExtendedJsonDecoder with a
> GenericDatumReader, I get an AvroTypeException whenever the JSON doesn't
> conform to the Avro schema (as expected). However, if the JSON has some
> additional fields (i.e. fields that are present in the JSON, but not
> present in the Avro schema), then the reader ignores those extra fields and
> converts the JSON to Avro successfully. Do you know if there's a simple way
> to make the reader detect these extra fields and throw an exception in
> that case?
>
> Thanks,
> Josh
>
> On Thu, Aug 11, 2016 at 3:52 PM, Zoltan Farkas <[email protected]> wrote:
>
>> We are doing the same successfully so far… here is some detail:
>>
>> We do not use the standard JSON encoders/decoders from the Avro project;
>> we have our own, which provide a more “natural” JSON encoding that
>> implements:
>>
>> https://issues.apache.org/jira/browse/AVRO-1582
>>
>> For us it was also important to fix:
>>
>> https://issues.apache.org/jira/browse/AVRO-1723
>>
>> We had to use our own fork to be able to fix/implement our needs faster,
>> which you can look at: https://github.com/zolyfarkas/avro
>>
>> Here is how we use the Avro schemas:
>>
>> We develop our Avro schemas in separate “schema projects”.
>>
>> These projects are standard Maven projects, stored in version control,
>> built with CI, and published to a Maven repo with the following:
>> 1) Avro-generated Java objects, sources and javadoc.
>> 2) C#-generated objects (accessible via NuGet to everybody).
>> 3) A zip package containing all schemas.
>>
>> We use Avro IDL to define the schemas in the project; the avsc JSON
>> format is difficult to read and maintain, so the schema JSON is only a
>> wire format for us.
>>
>> We see these advantages:
>>
>> 1) Building/releasing a schema project is identical to releasing any
>> Maven project (Jenkins, maven-release-plugin...).
>> 2) Using this we can take advantage of the Maven dependency system and
>> reuse schemas. It is as simple as adding a <dependency> in your pom and an
>> import statement in your IDL. (C# uses NuGet.)
>> 3) As a side result our Maven repo becomes a schema repo, and so far we
>> see no reason to use a dedicated schema repo like:
>> https://issues.apache.org/jira/browse/AVRO-1124
>> 4) The schema owner not only publishes schemas but also publishes all the
>> DTOs for Java and .NET, so any team that needs to use the schema has no
>> need to generate code; all they need is to add a package dependency to
>> their project.
>> 5) During the build we also validate compatibility with the previously
>> released schemas.
>> 6) During the build we also validate schema quality (comments on fields,
>> naming…). We are planning to make this Maven plugin open source.
>> 7) Maven dependencies give you all the data needed to figure out which
>> apps use a schema, e.g. group:myschema:3.0
>> 8) A REST service that uses an Avro object for its payload can serve/accept
>> data in: application/octet-stream;fmt=avro (Avro binary),
>> application/json;fmt=avro (classic JSON encoding), and
>> application/json;fmt=enhanced (AVRO-1582), allowing us to pick the right
>> format for the right use case. (AVRO-1582 JSON can be significantly smaller
>> in size than binary on certain types of data.)
>> 9) During the build we generate improved HTML docs for the Avro objects,
>> like: http://zolyfarkas.github.io/spf4j/spf4j-core/avrodoc.html#/
>>
>> The more we leverage Avro, the more use cases we find, like:
>>
>> 1) A config discovery plugin that scans code for uses of
>> System.getProperty… and generates an Avro IDL:
>> http://zolyfarkas.github.io/spf4j/spf4j-config-discovery-maven-plugin/index.html
>> 2) Generating Avro IDL from JDBC metadata...
>>
>> Hope it helps!
>>
>> cheers
>>
>> —Z
>>
>>
>> On Aug 11, 2016, at 6:23 AM, Elliot West <[email protected]> wrote:
>>
>> Hello,
>>
>> We are building a data processing system that has the following required
>> properties:
>>
>> - Data is produced/consumed in JSON format
>> - These JSON documents must always adhere to a schema
>> - The schema must be defined in JSON also
>> - It should be possible to evolve schemas and verify schema
>> compatibility
>>
>> I initially started looking at Avro, not as a solution, but to understand
>> how its schema evolution can be managed. However, I quickly discovered that
>> with its JSON support it is able to meet all of my requirements.
>>
>> I am now considering a system where the data structure is defined using the
>> Avro JSON schema, data is submitted as JSON that is then internally
>> decoded into Avro records, and these records are eventually encoded back
>> into JSON at the point of consumption. It seems to me that I can then take
>> advantage of Avro’s schema evolution features, while only ever exposing
>> JSON to consumers and producers. Aside from the dependency on Avro’s JSON
>> schema syntax, the use of Avro then becomes an internal implementation
>> detail.
>>
>> As I am completely new to Avro, I was wondering if this is a credible
>> idea, or if anyone would care to share their experiences of similar systems
>> that they have built?
>>
>> Many thanks,
>>
>> Elliot.
>>
>>
>
>
