We are doing the same successfully so far… here is some detail:

We do not use the standard JSON encoders/decoders from the Avro project; we 
have our own, which provide a more "natural" JSON encoding that implements:

https://issues.apache.org/jira/browse/AVRO-1582

For us it was also important to fix:

https://issues.apache.org/jira/browse/AVRO-1723

We had to use our own fork to be able to fix/implement what we needed faster; 
you can look at it here: https://github.com/zolyfarkas/avro

Here is how we use the Avro schemas:

We develop our Avro schemas in separate "schema projects".

These projects are standard Maven projects, stored in version control, built 
with CI, and published to a Maven repo with the following artifacts:
1) Avro-generated Java objects, sources, and Javadoc.
2) C#-generated objects (accessible to everybody via NuGet).
3) A zip package containing all the schemas.

We use Avro IDL to define the schemas in these projects; the avsc JSON format 
is difficult to read and maintain, so the schema JSON is only a wire format 
for us.
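
For illustration, a schema definition in IDL looks roughly like this (the 
names here are made up; the .avsc JSON is generated from it, e.g. with the 
avro-tools "idl2schemata" command):

@namespace("com.example.schemas")
protocol CustomerProtocol {

  /** A customer; our quality checks require doc comments like this one. */
  record Customer {
    /** Unique customer id. */
    string id;
    string name;
    /** Optional; added later, with a default for compatibility. */
    union { null, string } email = null;
  }
}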

We see these advantages:

1) Building/releasing a schema project is identical to releasing any Maven 
project (Jenkins, the Maven release plugin...).
2) We can take advantage of the Maven dependency system and reuse schemas: it 
is as simple as adding a <dependency> to your pom and an import statement to 
your IDL (C# uses NuGet); see the first sketch after this list.
3) As a side effect, our Maven repo becomes a schema repo, and so far we see 
no reason to use a dedicated schema repo like: 
https://issues.apache.org/jira/browse/AVRO-1124
4) The schema owner publishes not only the schemas but also all the DTOs for 
Java and .NET, so any team that needs to use a schema has no need to generate 
code; all they need to do is add a package dependency to their project.
5) During the build we also validate compatibility with the previously 
released schemas; see the second sketch after this list.
6) During the build we also validate schema quality (comments on fields, 
naming…). We are planning to make this Maven plugin open source.
7) Maven dependencies give you all the data needed to figure out which apps 
use a schema like group:myschema:3.0.
8) A REST service that uses an Avro object for its payload can serve/accept 
data in: application/octet-stream;fmt=avro (Avro binary), 
application/json;fmt=avro (classic JSON encoding), and 
application/json;fmt=enhanced (AVRO-1582), allowing us to pick the right 
format for the right use case. (AVRO-1582 JSON can be significantly smaller 
than binary for certain types of data.)
9) During the build we generate improved HTML documentation for the Avro 
objects, like: http://zolyfarkas.github.io/spf4j/spf4j-core/avrodoc.html#/
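
To illustrate point 2, reusing a schema looks roughly like this (the 
group/artifact names are made up; resolving the imported IDL out of the 
dependency jar is something our build takes care of):

<dependency>
  <groupId>com.example.schemas</groupId>
  <artifactId>base-schemas</artifactId>
  <version>3.0</version>
</dependency>

and then in your IDL:

import idl "base.avdl";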
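
And to illustrate point 5: the core of the compatibility check can be done 
with the stock Avro API. A minimal sketch, assuming the previously released 
schema has been fetched from the Maven repo (file names are made up):

import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaCompatibility.SchemaCompatibilityType;
import org.apache.avro.SchemaCompatibility.SchemaPairCompatibility;

public final class CompatCheck {
  public static void main(String[] args) throws Exception {
    Schema released = new Schema.Parser().parse(new File("Customer-3.0.avsc"));
    Schema current = new Schema.Parser().parse(new File("Customer.avsc"));
    // Can data written with the released schema still be read with the new one?
    SchemaPairCompatibility result =
        SchemaCompatibility.checkReaderWriterCompatibility(current, released);
    if (result.getType() != SchemaCompatibilityType.COMPATIBLE) {
      throw new IllegalStateException(
          "Incompatible schema change: " + result.getDescription());
    }
  }
}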

The more we leverage Avro, the more use cases we find, like:

1) A config discovery plugin that scans code for uses of System.getProperty… 
and generates an Avro IDL: 
http://zolyfarkas.github.io/spf4j/spf4j-config-discovery-maven-plugin/index.html
2) Generating Avro IDL from JDBC metadata… (a minimal sketch of the idea 
follows).
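
For the JDBC case, here is a rough sketch of the idea, assuming a plain JDBC 
connection (the type mapping is deliberately minimal; a real implementation 
would handle nullability and many more types):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Types;

public final class JdbcToIdl {

  // Minimal SQL-to-Avro type mapping; everything else falls back to string.
  static String avroType(int sqlType) {
    switch (sqlType) {
      case Types.INTEGER: return "int";
      case Types.BIGINT:  return "long";
      case Types.DOUBLE:  return "double";
      case Types.BOOLEAN: return "boolean";
      default:            return "string";
    }
  }

  public static void main(String[] args) throws Exception {
    String jdbcUrl = args[0];
    String table = args[1];
    try (Connection con = DriverManager.getConnection(jdbcUrl)) {
      StringBuilder idl = new StringBuilder("record ").append(table).append(" {\n");
      // Walk the table's columns and emit one IDL field per column.
      try (ResultSet cols = con.getMetaData().getColumns(null, null, table, "%")) {
        while (cols.next()) {
          idl.append("  ").append(avroType(cols.getInt("DATA_TYPE")))
             .append(' ').append(cols.getString("COLUMN_NAME")).append(";\n");
        }
      }
      idl.append("}\n");
      System.out.print(idl);
    }
  }
}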

hope it helps!

cheers

—Z


> On Aug 11, 2016, at 6:23 AM, Elliot West <[email protected]> wrote:
> 
> Hello,
> 
> We are building a data processing system that has the following required 
> properties:
> - Data is produced/consumed in JSON format
> - These JSON documents must always adhere to a schema
> - The schema must be defined in JSON also
> - It should be possible to evolve schemas and verify schema compatibility
> I initially started looking at Avro, not as a solution, but to understand how 
> its schema evolution can be managed. However, I quickly discovered that with 
> its JSON support it is able to meet all of my requirements.
> 
> I am now considering a system where data structure is defined using the Avro 
> JSON schema, data is submitted using JSON that is then internally decoded 
> into Avro records, these records are then eventually encoded back into JSON 
> at the point of consumption. It seems to me that I can then take advantage of 
> Avro’s schema evolution features, while only ever exposing JSON to consumers 
> and producers. Aside from the dependency on Avro’s JSON schema syntax, the 
> use of Avro then becomes an internal implementation detail.
> 
> As I am completely new to Avro, I was wondering if this is a credible idea, 
> or if anyone would care to share their experiences of similar systems that 
> they have built?
> 
> Many thanks,
> 
> Elliot.
> 
