For me, it appears to completely ignore fields in the JSON that aren't defined in the reader schema. The reader succeeds and builds a generic record (which excludes any additional fields in the JSON).
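For reference, here is roughly what I am running (a minimal sketch; I'm quoting the ExtendedJsonDecoder constructor and package from memory, since it comes from your fork, so those details may be slightly off):

import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
// ExtendedJsonDecoder comes from the zolyfarkas/avro fork; package guessed from memory.
import org.apache.avro.io.ExtendedJsonDecoder;

public class ExtraFieldRepro {
  public static void main(String[] args) throws IOException {
    // Reader schema with a single "name" field.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Person\",\"fields\":"
            + "[{\"name\":\"name\",\"type\":\"string\"}]}");

    // Incoming JSON carries an extra "age" field that the schema does not define.
    String json = "{\"name\":\"Josh\",\"age\":30}";

    // Constructor paraphrased from memory; adjust to whatever the fork actually exposes.
    ExtendedJsonDecoder decoder = new ExtendedJsonDecoder(schema, json);
    GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
    GenericRecord rec = reader.read(null, decoder);

    // Prints {"name": "Josh"} -- the extra "age" field is silently dropped
    // instead of triggering an AvroTypeException.
    System.out.println(rec);
  }
}

If there is a hook in the decoder (or the reader) that would make that last step throw on the unknown field instead of skipping it, that would be ideal.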
Thanks for looking into it!

Josh

On Fri, Nov 18, 2016 at 8:31 PM, Zoltan Farkas <[email protected]> wrote:

> I recall that it would fail if you have extra fields in the json that are
> not defined in the reader schema and not in the writer schema.
> Let me look into it and will get back to you.
>
> —Z
>
>
> On Nov 18, 2016, at 7:21 AM, Josh <[email protected]> wrote:
>
> Hi Zoltan,
>
> Your ExtendedJsonDecoder / Encoder looks really useful for doing the
> conversions between JSON and Avro.
>
> I just have a quick question - when I use the ExtendedJsonDecoder with a
> GenericDatumReader, I get an AvroTypeException whenever the JSON doesn't
> conform to the Avro schema (as expected). However, if the JSON has some
> additional fields (i.e. fields that are present in the JSON, but not
> present in the Avro schema), then the reader ignores those extra fields and
> converts the JSON to Avro successfully. Do you know if there's a simple way
> to make the reader detect these extra fields and throw an exception in
> that case?
>
> Thanks,
> Josh
>
> On Thu, Aug 11, 2016 at 3:52 PM, Zoltan Farkas <[email protected]> wrote:
>
>> We are doing the same successfully so far… here is some detail:
>>
>> We do not use the standard JSON encoders/decoders from the Avro project;
>> we have our own, which provide a more “natural” JSON encoding that
>> implements:
>>
>> https://issues.apache.org/jira/browse/AVRO-1582
>>
>> For us it was also important to fix:
>>
>> https://issues.apache.org/jira/browse/AVRO-1723
>>
>> We had to use our own fork to be able to fix/implement our needs faster,
>> which you can look at: https://github.com/zolyfarkas/avro
>>
>> Here is how we use the Avro schemas:
>>
>> We develop our Avro schemas in separate “schema projects”.
>>
>> These projects are standard Maven projects, stored in version control,
>> built with CI, and published to a Maven repo with the following:
>> 1) Avro-generated Java objects, sources and javadoc.
>> 2) C#-generated objects (accessible via NuGet to everybody).
>> 3) A zip package containing all schemas.
>>
>> We use Avro IDL to define the schemas in the project; the avsc JSON
>> format is difficult to read and maintain, so the schema JSON is only a
>> wire format for us.
>>
>> We see these advantages:
>>
>> 1) Building/releasing a schema project is identical to releasing any
>> Maven project (Jenkins, maven-release-plugin...).
>> 2) Using this we can take advantage of the Maven dependency system and
>> reuse schemas. It is as simple as adding a <dependency> in your pom and an
>> import statement in your IDL. (C# uses NuGet.)
>> 3) As a side result our Maven repo becomes a schema repo, and so far we
>> see no reason to use a dedicated schema repo like:
>> https://issues.apache.org/jira/browse/AVRO-1124
>> 4) The schema owner not only publishes schemas but also publishes all the
>> DTOs for Java and .NET, so any team that needs to use the schema has no
>> need to generate code; all they need is to add a package dependency to
>> their project.
>> 5) During the build we also validate compatibility with the previously
>> released schemas.
>> 6) During the build we also validate schema quality (comments on fields,
>> naming…). We are planning to make this Maven plugin open source.
>> 7) Maven dependencies give you all the data needed to figure out which
>> apps use a schema, e.g. group:myschema:3.0
>> 8) A REST service that uses an Avro object for its payload can serve/accept
>> data in: application/octet-stream;fmt=avro (Avro binary),
>> application/json;fmt=avro (classic JSON encoding), and
>> application/json;fmt=enhanced (AVRO-1582), allowing us to pick the right
>> format for the right use case. (AVRO-1582 JSON can be significantly smaller
>> in size than binary on certain types of data.)
>> 9) During the build we generate improved HTML docs for the Avro objects,
>> like: http://zolyfarkas.github.io/spf4j/spf4j-core/avrodoc.html#/
>>
>> The more we leverage Avro, the more use cases we find, like:
>>
>> 1) A config discovery plugin that scans code for uses of
>> System.getProperty… and generates an Avro IDL:
>> http://zolyfarkas.github.io/spf4j/spf4j-config-discovery-maven-plugin/index.html
>> 2) Generating Avro IDL from JDBC metadata...
>>
>> Hope it helps!
>>
>> cheers
>>
>> —Z
>>
>>
>> On Aug 11, 2016, at 6:23 AM, Elliot West <[email protected]> wrote:
>>
>> Hello,
>>
>> We are building a data processing system that has the following required
>> properties:
>>
>> - Data is produced/consumed in JSON format
>> - These JSON documents must always adhere to a schema
>> - The schema must be defined in JSON also
>> - It should be possible to evolve schemas and verify schema
>> compatibility
>>
>> I initially started looking at Avro, not as a solution, but to understand
>> how its schema evolution can be managed. However, I quickly discovered that
>> with its JSON support it is able to meet all of my requirements.
>>
>> I am now considering a system where the data structure is defined using the
>> Avro JSON schema, data is submitted as JSON that is then internally
>> decoded into Avro records, and these records are eventually encoded back
>> into JSON at the point of consumption. It seems to me that I can then take
>> advantage of Avro’s schema evolution features, while only ever exposing
>> JSON to consumers and producers. Aside from the dependency on Avro’s JSON
>> schema syntax, the use of Avro then becomes an internal implementation
>> detail.
>>
>> As I am completely new to Avro, I was wondering if this is a credible
>> idea, or if anyone would care to share their experiences of similar systems
>> that they have built?
>>
>> Many thanks,
>>
>> Elliot.
>>
>>
>
>
