Hi Lewis,

On 26 Mar 2014, at 14:34, Lewis John Mcgibbney wrote:
What actually happens with the Avro Schema in Gora is that it is permanently 
included in the generated data bean. This way you know the Schema when you read 
your data. You can see an example here

https://svn.apache.org/repos/asf/gora/branches/GORA_94/gora-core/src/examples/java/org/apache/gora/examples/generated/WebPage.java

public static final org.apache.avro.Schema SCHEMA$ = new 
org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"WebPage\",...
 blah blah blah

I would therefore question a justification as to why you _need_ to store the 
Schema with the data.

Say you make a change to the schema. Your database now contains some records 
that were written before the schema change (i.e. encoded with schema v1) and 
some records that were written afterwards (encoded with schema v2). Ideally, an 
application should be able to read them all transparently and not have to care 
which schema version is used in the underlying store.

In Avro, schema evolution takes care of this. However, in order to handle 
evolution correctly, the process reading the data from the database needs to 
know two schemas:

1. the schema that the client is expecting to see, usually the latest version 
of the schema (the "reader's schema"),
2. the schema with which the data was originally written, which may be an older 
version (the "writer's schema").

The schema that is included in the generated code covers 1., but in order to 
have 2. you need to store either the writer's schema itself along with the 
data, or some kind of fingerprint or version of the writer's schema.
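
To make the two-schema point concrete, here is a rough sketch in plain Java. 
It is a toy model only and uses neither Avro nor Gora: the class 
WriterSchemaDemo, the version numbers, and the "url"/"title" fields are all 
made up for illustration. Each stored record carries the version of the schema 
it was written with, and the reader uses that version to look up the writer's 
schema before projecting the values onto the schema it expects. In Avro itself, 
as far as I know, this corresponds to constructing a GenericDatumReader with 
both the writer's and the reader's schema.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only: NOT the Avro or Gora API. All names are hypothetical.
final class WriterSchemaDemo {

    // A "schema" is modelled as an ordered map of field name -> default value.
    // The registry maps a schema version to the schema written under it.
    static final Map<Integer, Map<String, String>> SCHEMAS = new LinkedHashMap<>();

    static {
        Map<String, String> v1 = new LinkedHashMap<>();
        v1.put("url", "");
        SCHEMAS.put(1, v1);

        Map<String, String> v2 = new LinkedHashMap<>();
        v2.put("url", "");
        v2.put("title", "untitled"); // field added in v2, with a default
        SCHEMAS.put(2, v2);
    }

    // What ends up in the datastore: positional values (no field names,
    // as in a compact binary encoding) plus the writer's schema version.
    static final class StoredRecord {
        final int writerVersion;
        final List<String> values;

        StoredRecord(int writerVersion, List<String> values) {
            this.writerVersion = writerVersion;
            this.values = values;
        }
    }

    static StoredRecord write(int version, String... values) {
        if (values.length != SCHEMAS.get(version).size()) {
            throw new IllegalArgumentException("value count does not match schema v" + version);
        }
        return new StoredRecord(version, Arrays.asList(values));
    }

    static Map<String, String> read(StoredRecord stored, int readerVersion) {
        Map<String, String> writer = SCHEMAS.get(stored.writerVersion);
        Map<String, String> reader = SCHEMAS.get(readerVersion);

        // Step 1: decode the positional values using the WRITER's field order.
        Map<String, String> decoded = new LinkedHashMap<>();
        int i = 0;
        for (String field : writer.keySet()) {
            decoded.put(field, stored.values.get(i++));
        }

        // Step 2: project onto the READER's schema. Fields the writer did not
        // know about get the reader's default; fields the reader does not
        // know about are dropped.
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> f : reader.entrySet()) {
            result.put(f.getKey(), decoded.getOrDefault(f.getKey(), f.getValue()));
        }
        return result;
    }
}
```

Without the writerVersion tag on the stored record, step 1 has no way of 
knowing how many values to expect or which field each one belongs to, which is 
exactly the problem I am asking about.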

How does Gora handle this? I looked through the website but couldn't find a 
clear answer.

Martin
