Generally, Avro recommends storing the schema with the data: for a file, that
means in the file header; for a key/value store, that means in some system
metadata. Any individual store can only keep data serialized with one schema.
For schema migration to work, the new code has to have access to the schema of
the old data. Old code should likewise have access to the new data's schema;
that way you can support both forward and backward compatibility.
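
For illustration, here is a minimal sketch of the file case with the generic
Avro Java API (a reasonably current release; the class name, schema file, and
data file name are placeholders, not anything from this thread): the new
schema is supplied as the expected (reader) schema, and the writer's schema is
taken from the file header.

// Minimal sketch: read a file written with an old schema, resolving it
// against the new (reader) schema. The writer's schema comes from the
// file header, so only the reader schema has to be supplied.
import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class ReadWithNewSchema {
  public static void main(String[] args) throws Exception {
    // Placeholder paths; adjust to your layout.
    Schema readerSchema = new Schema.Parser().parse(new File("new-schema.avsc"));

    // Pass only the expected (reader) schema; DataFileReader sets the
    // actual (writer) schema from the file header before reading.
    GenericDatumReader<GenericRecord> datumReader =
        new GenericDatumReader<>(readerSchema);

    try (DataFileReader<GenericRecord> fileReader =
             new DataFileReader<>(new File("old-data.avro"), datumReader)) {
      for (GenericRecord record : fileReader) {
        System.out.println(record);
      }
    }
  }
}
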
On Sep 16, 2010, at 1:34 AM, Robin Müller wrote:
> Thanks for the fast reply.
> I use the Avro serialization implementation from Voldemort (a key/value
> store), and it seems that they don't use the ResolvingDecoder.
> But I think there is a way to plug a custom serialization implementation
> into Voldemort, so I'll give the ResolvingDecoder a try.
>
> Greetings,
> Robin
>
> On 16.09.2010, at 10:21, Scott Carey wrote:
>> Assuming Java: Are you using a ResolvingDecoder?
>>
>> Schema resolution will happen by default if you are reading Generic or
>> Specific records from an Avro file, but if you are reading data some other
>> way, you have to use a ResolvingDecoder to specify the expected (reader)
>> and actual (writer) schemas.
>>
>> On Sep 16, 2010, at 1:09 AM, Robin Müller wrote:
>>
>>
>>> Hi,
>>>
>>> I've read the "Schema Resolution" section of the Avro specification, so I
>>> think that Avro supports schema versioning.
>>> But when I try to change the following schema, an AvroTypeException is
>>> thrown when reading data that was serialized with the old schema:
>>> {
>>>   "name": "BrowserCountArray",
>>>   "type": "record",
>>>   "fields": [
>>>     {
>>>       "name": "BrowserCounts",
>>>       "type": {
>>>         "type": "array",
>>>         "items": {
>>>           "name": "BrowserCount",
>>>           "type": "record",
>>>           "fields": [
>>>             { "name": "Browser", "type": "string" },
>>>             { "name": "Count", "type": "int" }
>>>           ]
>>>         }
>>>       }
>>>     }
>>>   ]
>>> }
>>>
>>> For example, I add a new field to the BrowserCount record like this:
>>>
>>> {
>>>   "name": "BrowserCountArray",
>>>   "type": "record",
>>>   "fields": [
>>>     {
>>>       "name": "BrowserCounts",
>>>       "type": {
>>>         "type": "array",
>>>         "items": {
>>>           "name": "BrowserCount",
>>>           "type": "record",
>>>           "fields": [
>>>             { "name": "Browser", "type": "string" },
>>>             { "name": "Count", "type": "int" },
>>>             { "name": "Blub", "type": "int", "default": "0" }
>>>           ]
>>>         }
>>>       }
>>>     }
>>>   ]
>>> }
>>>
>>> Is it possible to add or remove fields from this record and then read
>>> data that was serialized with the old schema using the new one?
>>> Or is there another way to define an array of records that solves this
>>> problem?
>>>
>>> Thanks,
>>> Robin
>>>
>>
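
To tie the thread together, here is a second minimal sketch, again only an
illustration and not Voldemort's actual serializer: it decodes raw bytes
written with the old BrowserCountArray schema against the new one by giving
GenericDatumReader both schemas, which makes it resolve them internally (via a
ResolvingDecoder). The class and method names are made up, and the new field's
default is written here as the JSON integer 0, which is what the
specification's table of default values calls for on an int field.

// A minimal sketch, not Voldemort's actual serializer: decode bytes written
// with the old schema while reading them against the new schema. Passing
// both schemas makes GenericDatumReader resolve them (via a ResolvingDecoder),
// so the added "Blub" field is filled in from its default.
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;

public class ResolveOldBytes {

  // Old (writer) schema: BrowserCount has only Browser and Count.
  static final String OLD_SCHEMA =
      "{\"name\":\"BrowserCountArray\",\"type\":\"record\",\"fields\":[{"
      + "\"name\":\"BrowserCounts\",\"type\":{\"type\":\"array\",\"items\":{"
      + "\"name\":\"BrowserCount\",\"type\":\"record\",\"fields\":["
      + "{\"name\":\"Browser\",\"type\":\"string\"},"
      + "{\"name\":\"Count\",\"type\":\"int\"}]}}}]}";

  // New (reader) schema: BrowserCount additionally has Blub with a default.
  static final String NEW_SCHEMA =
      "{\"name\":\"BrowserCountArray\",\"type\":\"record\",\"fields\":[{"
      + "\"name\":\"BrowserCounts\",\"type\":{\"type\":\"array\",\"items\":{"
      + "\"name\":\"BrowserCount\",\"type\":\"record\",\"fields\":["
      + "{\"name\":\"Browser\",\"type\":\"string\"},"
      + "{\"name\":\"Count\",\"type\":\"int\"},"
      + "{\"name\":\"Blub\",\"type\":\"int\",\"default\":0}]}}}]}";

  public static GenericRecord decode(byte[] oldBytes) throws IOException {
    Schema writerSchema = new Schema.Parser().parse(OLD_SCHEMA);
    Schema readerSchema = new Schema.Parser().parse(NEW_SCHEMA);

    // Writer (actual) schema first, reader (expected) schema second.
    GenericDatumReader<GenericRecord> reader =
        new GenericDatumReader<>(writerSchema, readerSchema);

    Decoder decoder = DecoderFactory.get().binaryDecoder(oldBytes, null);
    return reader.read(null, decoder);
  }
}

Reading old bytes this way fills Blub with its default; in the other
direction, old reader code given the new writer schema simply skips the
unknown field during resolution.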