OK, I was able to reproduce the exception.
v1:
{"name": "Record", "type": "record",
"fields": [
{"name": "name", "type": "string"},
{"name": "id", "type": "int"}
]
}
v2:
{"name": "Record", "type": "record",
"fields": [
{"name": "name_rename", "type": "string", "aliases": ["name"]}
]
}
Step 1: write the Avro file using the v1 generated class.
Step 2: read it back using the v2 generated class, which fails with:
Exception in thread "main" org.apache.avro.AvroRuntimeException: Bad index
at Record.put(Unknown Source)
at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
at Read.readFromAvro(Unknown Source)
at Read.main(Unknown Source)
The code to write and read the Avro file is unchanged from the code quoted below.
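
To separate schema resolution from the generated class's put(), I can also read the same file with a GenericDatumReader and the v2 schema supplied explicitly. A rough sketch of that check (the file names are placeholders, and it assumes a release with the Schema.Parser API):

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class GenericRead {
  public static void main(String[] args) throws IOException {
    // Parse the v2 (reader) schema; the path is a placeholder.
    Schema v2 = new Schema.Parser().parse(new File("v2.avsc"));

    // Reader schema is v2; the writer schema is taken from the file header.
    GenericDatumReader<GenericRecord> datumReader =
        new GenericDatumReader<GenericRecord>(null, v2);

    DataFileStream<GenericRecord> reader = new DataFileStream<GenericRecord>(
        new FileInputStream("record-v1.avro"), datumReader);
    for (GenericRecord record : reader) {
      System.out.println(record);
    }
    reader.close();
  }
}
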
On Mon, Sep 19, 2011 at 9:08 PM, Alex Holmes <[email protected]> wrote:
> I'm trying to put together a simple test case to reproduce the
> exception. While creating the test case I hit the following behavior,
> which doesn't seem right, but it may just be my misunderstanding of
> how forward/backward compatibility should work:
>
> Schema v1:
>
> {"name": "Record", "type": "record",
> "fields": [
> {"name": "name", "type": "string"},
> {"name": "id", "type": "int"}
> ]
> }
>
> Schema v2:
>
> {"name": "Record", "type": "record",
> "fields": [
> {"name": "name_rename", "type": "string", "aliases": ["name"]},
> {"name": "new_field", "type": "int", "default":"0"}
> ]
> }
>
> In the 2nd version I:
>
> - removed field "id"
> - renamed field "name" to "name_rename"
> - added field "new_field"
>
> I write the v1 data file:
>
> public static Record createRecord(String name, int id) {
>   Record record = new Record();
>   record.name = name;
>   record.id = id;
>   return record;
> }
>
> public static void writeToAvro(OutputStream outputStream)
>     throws IOException {
>   DataFileWriter<Record> writer =
>       new DataFileWriter<Record>(new SpecificDatumWriter<Record>());
>   writer.create(Record.SCHEMA$, outputStream);
>
>   writer.append(createRecord("r1", 1));
>   writer.append(createRecord("r2", 2));
>
>   writer.close();
>   outputStream.close();
> }
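>
> A minimal driver for the write side would be something like the
> following (the class and output file names are arbitrary; createRecord
> and writeToAvro are the methods above):
>
> import java.io.FileOutputStream;
> import java.io.IOException;
>
> public class Write {
>   // ... createRecord and writeToAvro as above ...
>
>   public static void main(String[] args) throws IOException {
>     // Write two v1 records to an arbitrary output file.
>     writeToAvro(new FileOutputStream("record-v1.avro"));
>   }
> }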
>
> I wrote a version-agnostic Read class:
>
> public static void readFromAvro(InputStream is) throws IOException {
>   DataFileStream<Record> reader = new DataFileStream<Record>(
>       is, new SpecificDatumReader<Record>());
>   for (Record a : reader) {
>     System.out.println(ToStringBuilder.reflectionToString(a));
>   }
>   IOUtils.cleanup(null, is);
>   IOUtils.cleanup(null, reader);
> }
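>
> A variant that makes the reader schema explicit, rather than relying
> on the no-arg SpecificDatumReader constructor, would look roughly like
> this (the method name is just for illustration; it uses the
> two-argument writer/reader constructor, and the writer schema should
> get picked up from the data file header, hence the null):
>
> public static void readFromAvroExplicit(InputStream is) throws IOException {
>   // Reader schema is the generated (v2) Record schema.
>   SpecificDatumReader<Record> datumReader =
>       new SpecificDatumReader<Record>(null, Record.SCHEMA$);
>   DataFileStream<Record> reader = new DataFileStream<Record>(is, datumReader);
>   for (Record a : reader) {
>     System.out.println(ToStringBuilder.reflectionToString(a));
>   }
>   IOUtils.cleanup(null, is);
>   IOUtils.cleanup(null, reader);
> }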
>
> Running the Read code against the v1 data file, with the v1
> code-generated classes on the classpath, produced:
>
> Record@6a8c436b[name=r1,id=1]
> Record@6baa9f99[name=r2,id=2]
>
> If I run the same code, but with just the v2 generated classes on the
> classpath, I get:
>
> Record@39dd3812[name_rename=r1,new_field=1]
> Record@27b15692[name_rename=r2,new_field=2]
>
> The name_rename field looks correct, but why would "new_field"
> inherit the values of the deleted field "id"?
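>
> To see how the alias is actually being applied during resolution, one
> thing I can do (assuming Schema.applyAliases is available in the
> release I'm on) is print the writer schema after the reader's aliases
> have been folded in:
>
> import org.apache.avro.Schema;
>
> public class AliasCheck {
>   public static void main(String[] args) {
>     // The v1 (writer) schema, inlined from above.
>     Schema writer = new Schema.Parser().parse(
>         "{\"name\": \"Record\", \"type\": \"record\", \"fields\": ["
>         + "{\"name\": \"name\", \"type\": \"string\"},"
>         + "{\"name\": \"id\", \"type\": \"int\"}]}");
>     // The v2 (reader) schema from the generated class.
>     Schema reader = Record.SCHEMA$;
>     // Rewrites the writer schema using the reader's aliases, so the
>     // writer's "name" field should come back as "name_rename".
>     System.out.println(Schema.applyAliases(writer, reader).toString(true));
>   }
> }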
>
> Cheers,
> Alex
>
> On Mon, Sep 19, 2011 at 12:35 PM, Doug Cutting <[email protected]> wrote:
>> On 09/19/2011 05:12 AM, Alex Holmes wrote:
>>> I then modified my original schema by adding, deleting and renaming
>>> some fields, creating version 2 of the schema. After re-creating the
>>> Java classes I attempted to read the version 1 file using the
>>> DataFileStream (with a SpecificDatumReader), and this is throwing an
>>> exception.
>>
>> This should work. Can you provide more detail? What is the exception?
>> A reproducible test case would be great to have.
>>
>> Thanks,
>>
>> Doug
>>
>