> In the case of data files specifically, it makes sense to have weaker
> matching strictness by default.
yes, this seems quite reasonable. perhaps a simplified API for aliasing
schemas when records in a data file only mismatch the reader's specific type
in a few specific, "harmless" ways. as doug points out, though, with his
Date/Time example, it's hard to say what a "harmless" way might be. this is a
bigger fish than i'd care to fry right now, though :)

thanks for your help!

On Tue, Oct 5, 2010 at 9:26 AM, Scott Carey <[email protected]> wrote:

> I understand what you are describing and how it may not be consistent with
> the spec.
>
> I don't have any time to look at it at the moment, however.
>
> This is certainly a use case that makes a lot of sense, so even if we
> become more strict here there will be a way to achieve namespace migration.
> In the case of data files specifically, it makes sense to have weaker
> matching strictness by default.
>
> On Oct 4, 2010, at 6:38 PM, Patrick Linehan wrote:
>
> i'd be happy to create a fully-working code example if that would help. i
> have some firewall issues that prevent me from attaching the actual code
> i'm working with.
>
> On Mon, Oct 4, 2010 at 2:06 PM, Patrick Linehan <[email protected]> wrote:
>
>> the "problem" i'm having is that i seem to be getting alias-like
>> functionality without using aliases. i put "problem" in quotes because i
>> actually like the behavior, i just don't see how it jibes with the spec.
>> maybe a code example is a better way to go about this.
>>
>> i create a data file as follows:
>>
>> Schema schemaA = ...
>> Schema schemaB = ...
>> GenericDatumWriter datumWriter = new GenericDatumWriter(schemaA);
>> DataFileWriter fileWriter = new DataFileWriter(datumWriter);
>> OutputStream out = new FileOutputStream("datafile.avro");
>> fileWriter.create(schemaA, out);
>> fileWriter.append(<RECORD>);
>> fileWriter.close();
>>
>> both schemaA and schemaB contain a single record definition, each with
>> exactly the same primitive-type fields: same types, same names, same
>> order. however, the record names and namespaces differ.
>>
>> using "avro-tools getschema", i can see that the schema stored in the
>> file is schemaA. also, if i create a GenericDatumReader and read the
>> file, the returned GenericRecord values have a schema of schemaA.
>>
>> however, i can also read the file using a SpecificDatumReader which is
>> initialized to the specific type corresponding to schemaB (let's call
>> that class RecordB), the schema which does _not_ match the schema of
>> the file:
>>
>> SpecificDatumReader datumReader = new SpecificDatumReader(RecordB.class);
>> DataFileReader fileReader = new DataFileReader(new File("datafile.avro"),
>>     datumReader);
>> RecordB record = fileReader.next();
>> fileReader.close();
>>
>> examining the fields of "record", i see that the data has been parsed
>> correctly, as if RecordB's schema (the "reader's schema") was correctly
>> resolved with schemaA (the "writer's schema").
>>
>> is this the expected behavior in this case? does this not seem to
>> contradict the schema resolution portions of the spec? is this behavior
>> specific to DataFileReader, since i "forced" the record type upon the
>> reader?
>>
>> also, thanks for taking the time to reply. i very much appreciate it.
>>
>> sincerely,
>> Confused
>>
>> On Mon, Oct 4, 2010 at 1:10 PM, Doug Cutting <[email protected]> wrote:
>>
>>> On 10/01/2010 05:45 PM, Patrick Linehan wrote:
>>>
>>>> am i misunderstanding the documentation? is the behavior i'm seeing
>>>> expected?
>>>> when does a record name/namespace conflict actually cause an error to
>>>> be thrown?
>>>
>>> The alias feature in Avro 1.4 will let you read records whose name or
>>> namespace differ:
>>>
>>> http://avro.apache.org/docs/current/spec.html#Aliases
>>>
>>> Does that help?
>>>
>>> Doug
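
A minimal sketch of the aliases approach Doug points to, assuming a
reasonably recent Avro release (the Schema.Parser API shown here is newer
than the 1.4 release discussed in the thread). The record names, namespaces,
field names, and file name are hypothetical stand-ins for schemaA/schemaB
above; the relevant piece is the "aliases" entry on the reader's record,
which names the writer's full record name so resolution succeeds under the
spec's rules even though the names and namespaces differ.

  import java.io.File;

  import org.apache.avro.Schema;
  import org.apache.avro.file.DataFileReader;
  import org.apache.avro.generic.GenericDatumReader;
  import org.apache.avro.generic.GenericRecord;

  public class AliasReadSketch {
    public static void main(String[] args) throws Exception {
      // Writer's schema: what was stored in the data file (stands in for schemaA).
      Schema writerSchema = new Schema.Parser().parse(
          "{\"type\":\"record\",\"name\":\"RecordA\",\"namespace\":\"com.example.old\","
        + " \"fields\":[{\"name\":\"id\",\"type\":\"long\"},"
        + "             {\"name\":\"label\",\"type\":\"string\"}]}");

      // Reader's schema: a different name and namespace (stands in for schemaB),
      // but it declares the writer's full record name as an alias.
      Schema readerSchema = new Schema.Parser().parse(
          "{\"type\":\"record\",\"name\":\"RecordB\",\"namespace\":\"com.example.v2\","
        + " \"aliases\":[\"com.example.old.RecordA\"],"
        + " \"fields\":[{\"name\":\"id\",\"type\":\"long\"},"
        + "             {\"name\":\"label\",\"type\":\"string\"}]}");

      // Passing both schemas makes the writer/reader resolution explicit instead
      // of relying on the lenient name matching described in the messages above.
      GenericDatumReader<GenericRecord> datumReader =
          new GenericDatumReader<GenericRecord>(writerSchema, readerSchema);
      DataFileReader<GenericRecord> fileReader =
          new DataFileReader<GenericRecord>(new File("datafile.avro"), datumReader);
      try {
        while (fileReader.hasNext()) {
          GenericRecord record = fileReader.next();
          System.out.println(record.get("id") + " " + record.get("label"));
        }
      } finally {
        fileReader.close();
      }
    }
  }

The same "aliases" entry, added to the schema used to generate RecordB,
should let the SpecificDatumReader path from the messages above resolve by
the book as well, since the specific reader applies aliases from its reader's
schema in the same way.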
