Hi Lewis,

On 26 Mar 2014, at 14:34, Lewis John Mcgibbney <[email protected]> wrote:

> What actually happens with the Avro Schema in Gora is that it is permanently included in the generated data bean. This way you know the Schema when you read your data. You can see an example here:
>
> https://svn.apache.org/repos/asf/gora/branches/GORA_94/gora-core/src/examples/java/org/apache/gora/examples/generated/WebPage.java
>
> public static final org.apache.avro.Schema SCHEMA$ = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"WebPage\",... blah blah blah
>
> I would therefore question a justification as to why you _need_ to store the Schema with the data.

Say you make a change to the schema. Your database now contains some records that were written before the schema change (i.e. encoded with schema v1) and some records that were written afterwards (encoded with schema v2). Ideally, an application should be able to read them all transparently and not have to care which schema version is used in the underlying store.

In Avro, schema evolution takes care of this. However, in order to handle evolution correctly, the process reading the data from the database needs to know two schemas:

1. the schema that the client is expecting to see, usually the latest version of the schema (the "reader's schema"),
2. the schema with which the data was originally written, which may be an older version (the "writer's schema").

The schema that is included in the generated code covers 1., but in order to have 2. you need to store either the writer's schema along with the data, or some kind of fingerprint or version of the writer's schema.

How does Gora handle this? I looked through the website but couldn't find a clear answer.

Martin
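
To make the fingerprint option above concrete, here is a minimal sketch in plain Java. It is not how Gora or Avro does it: `WriterSchemaRegistry` and all of its methods are hypothetical names, the fingerprint is simply the first 8 bytes of a SHA-256 hash over the schema's JSON text, and the record payload is treated as opaque bytes that some Avro encoder has already produced.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Gora or Avro. */
final class WriterSchemaRegistry {

    // Maps a schema fingerprint to the JSON text of the schema it was computed from.
    private final Map<Long, String> schemasByFingerprint = new HashMap<>();

    /** Registers a schema (as JSON text) and returns its 64-bit fingerprint. */
    long register(String schemaJson) {
        long fp = fingerprint(schemaJson);
        schemasByFingerprint.put(fp, schemaJson);
        return fp;
    }

    /** Prefixes an already-encoded record with the fingerprint of its writer's schema. */
    byte[] wrap(long writerFingerprint, byte[] encodedRecord) {
        return ByteBuffer.allocate(Long.BYTES + encodedRecord.length)
                .putLong(writerFingerprint)
                .put(encodedRecord)
                .array();
    }

    /** Reads the fingerprint prefix of a stored value and looks up the writer's schema. */
    String writerSchemaOf(byte[] stored) {
        long fp = ByteBuffer.wrap(stored).getLong();
        String schemaJson = schemasByFingerprint.get(fp);
        if (schemaJson == null) {
            throw new IllegalStateException("Unknown writer schema fingerprint: " + fp);
        }
        return schemaJson;
    }

    /** Returns the encoded record bytes without the fingerprint prefix. */
    byte[] payloadOf(byte[] stored) {
        return Arrays.copyOfRange(stored, Long.BYTES, stored.length);
    }

    /** Assumption: first 8 bytes of SHA-256 over the raw schema text as the fingerprint. */
    static long fingerprint(String schemaJson) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(schemaJson.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

On the read path, the schema text returned by `writerSchemaOf` would play the role of the writer's schema, and the schema compiled into the generated bean would be the reader's schema; in Avro's Java API both can be handed to a datum reader, e.g. `new GenericDatumReader<>(writer, reader)`. Hashing the raw text is sensitive to whitespace and field ordering, so a real implementation would more likely fingerprint a normalized form of the schema, such as the Parsing Canonical Form defined in the Avro specification.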
