Re: schema resolution and record names

Scott Carey Tue, 05 Oct 2010 09:26:12 -0700

I understand what you are describing and how it may not be consistent with the 
spec.


I don't have time to look at it at the moment, however.

This is certainly a use case that makes a lot of sense, so even if we become 
more strict here there will be a way to achieve namespace migration.   In the 
case of data files specifically, it makes sense to have weaker matching 
strictness by default.


On Oct 4, 2010, at 6:38 PM, Patrick Linehan wrote:

i'd be happy to create a fully-working code example if that would help.  i have 
some firewall issues that prevent me from attaching the code i'm actually 
working with.

On Mon, Oct 4, 2010 at 2:06 PM, Patrick Linehan wrote:
the "problem" i'm having is that i seem to be getting alias-like functionality 
without using aliases.  i put "problem" in quotes because i actually like the 
behavior, i just don't see how it jibes with the spec.  maybe a code example is 
a better way to go about this.

i create a data file as follows:

Schema schemaA = ...
Schema schemaB = ...
GenericDatumWriter datumWriter = new GenericDatumWriter(schemaA);
DataFileWriter fileWriter = new DataFileWriter(datumWriter);
OutputStream out = new FileOutputStream("datafile.avro");
fileWriter.create(schemaA, out);
fileWriter.append(<RECORD>);
fileWriter.close();

both schemaA and schemaB contain a single record definition, each with exactly 
the same primitive-type fields; same types, same names, same order.  however, 
the record names and namespaces differ.

using "avro-tools getschema", i can see that the schema stored in the file is 
schemaA.  also, if i create a GenericDatumReader and read the file, the 
returned GenericRecord values have a schema of schemaA.

however, i can also read the file using a SpecificDatumReader which is 
initialized to the specific type corresponding to schemaB (let's call that 
class RecordB), the schema which does _not_ match the schema of the file:

SpecificDatumReader<RecordB> datumReader =
    new SpecificDatumReader<RecordB>(RecordB.class);
DataFileReader<RecordB> fileReader =
    new DataFileReader<RecordB>(new File("datafile.avro"), datumReader);
RecordB record = fileReader.next();
fileReader.close();

examining the fields of "record" i see that the data has been parsed correctly, 
as if RecordB's schema (the "reader's schema") was correctly resolved with 
schemaA (the "writer's schema").

is this the expected behavior in this case?  does this not seem to contradict 
the schema resolution portions of the spec?  is this behavior specific to 
DataFileReader, since i "forced" the record type upon the reader?

also, thanks for taking the time to reply.  i very much appreciate it.

sincerely,
Confused

On Mon, Oct 4, 2010 at 1:10 PM, Doug Cutting wrote:
On 10/01/2010 05:45 PM, Patrick Linehan wrote:
am i misunderstanding the documentation?  is the behavior i'm seeing
expected?  when does a record name/namespace conflict actually cause an
error to be thrown?

The alias feature in Avro 1.4 will let you read records whose name or namespace 
differ:

http://avro.apache.org/docs/current/spec.html#Aliases

Does that help?

Doug
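
To make the alias rule concrete, here is a minimal, self-contained sketch of 
name matching as the Aliases section of the spec describes it: the reader's 
record is accepted if its full name equals the writer's full name, or if the 
reader lists the writer's full name as an alias. This is not Avro's 
implementation. NamedType, namesMatch and the com.example names are all 
hypothetical, and aliases are assumed to be fully qualified already.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class AliasMatchSketch {

    /** Hypothetical stand-in for a named record schema (not an Avro class). */
    static final class NamedType {
        final String fullName;      // namespace + "." + name
        final Set<String> aliases;  // assumed to be fully qualified

        NamedType(String fullName, String... aliases) {
            this.fullName = fullName;
            this.aliases = new HashSet<>(Arrays.asList(aliases));
        }
    }

    /**
     * Returns true if the reader's record can be matched to the writer's
     * record by name: either the full names are equal, or the reader
     * declares the writer's full name as an alias.
     */
    static boolean namesMatch(NamedType writer, NamedType reader) {
        return reader.fullName.equals(writer.fullName)
                || reader.aliases.contains(writer.fullName);
    }

    public static void main(String[] args) {
        // Hypothetical names standing in for schemaA and schemaB.
        NamedType writerA = new NamedType("com.example.a.RecordA");
        NamedType readerB = new NamedType("com.example.b.RecordB");
        NamedType readerBWithAlias =
                new NamedType("com.example.b.RecordB", "com.example.a.RecordA");

        System.out.println(namesMatch(writerA, readerB));
        System.out.println(namesMatch(writerA, readerBWithAlias));
    }
}
```

Under this strict reading, the reader from the example above (RecordB with no 
alias) would not match a file written with schemaA, while the same reader 
with an alias for schemaA's full name would. Whether the library actually 
enforces the name check when reading a data file is the open question in 
this thread.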
