It would be nice if Avro had a way for a message to carry the URL of its
schema, much as it can carry the schema itself. We could pass the URL
separately, out of band (e.g. in a header), but that weakens the link
between the message and the URL of its schema.

Any thoughts on supporting this?

Neil

----- Original Message -----
From: Doug Cutting <[email protected]>
To: [email protected]
Cc: 
Sent: Tuesday, December 6, 2011 1:48 PM
Subject: Re: schema by reference

On 12/06/2011 11:14 AM, Neil Davudo wrote:
> Yes, by a URL. Messages would be smaller in size when we store large numbers 
> of them, and we can always get the schema using the reference if necessary. 
> Similar to what we can do with WSDL having a reference to the XSD.

This is a reasonable thing to do.

A schema can easily be constructed from a URL with:

Schema.parse(url.openStream())

although one would probably want a cache in front of this.
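A minimal sketch of such a cache, in Java to match the call above. It is written so that it runs without the Avro jar. The parser is a pluggable hook, and `SchemaCache` and `StreamParser` are hypothetical names, not Avro API. Each URL is fetched and parsed at most once.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical hook that turns a stream into a parsed schema of type T. */
interface StreamParser<T> {
    T parse(InputStream in) throws IOException;
}

/** Hypothetical cache that fetches and parses each schema URL at most once. */
class SchemaCache<T> {
    // Keyed by the URL's string form: URL.equals/hashCode may resolve host names.
    private final Map<String, T> cache = new ConcurrentHashMap<>();
    private final StreamParser<T> parser;

    SchemaCache(StreamParser<T> parser) {
        this.parser = parser;
    }

    /** Returns the cached schema, fetching and parsing it on first use. */
    T get(URL url) {
        return cache.computeIfAbsent(url.toString(), key -> {
            try (InputStream in = url.openStream()) {
                return parser.parse(in);
            } catch (IOException e) {
                // computeIfAbsent cannot throw checked exceptions, so wrap;
                // nothing is cached when loading fails.
                throw new UncheckedIOException(e);
            }
        });
    }
}
```

With Avro on the classpath one might construct it as `new SchemaCache<Schema>(in -> Schema.parse(in))`, or use `new Schema.Parser().parse(in)` in later releases. The schema behind a reference should never change, so the sketch never evicts or refreshes entries.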

Note that in Avro one must ensure that the version of the schema at
the reference does not change, i.e. that it is identical to the version
used to write the datum. So one should probably not use a logical URL
for a datatype like http://me.com/schemas/FooRecord, but rather a unique
ID like http://me.com/schemas/9fd73.
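One way to get IDs that can never be re-pointed at a different schema is to derive them from the schema text itself. The sketch below names each schema by the SHA-256 hash of its JSON under the `http://me.com/schemas/` base from the example. `SchemaIds` is a hypothetical helper, and hashing is an assumed technique, not something Avro prescribes here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: names a schema by a hash of its exact JSON text. */
final class SchemaIds {
    private SchemaIds() {}

    /** Hex-encoded SHA-256 of the schema text; same text, same ID. */
    static String contentId(String schemaJson) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(schemaJson.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /** Immutable reference such as http://me.com/schemas/<hash>. */
    static String schemaUrl(String base, String schemaJson) {
        return base + contentId(schemaJson);
    }
}
```

Because the ID is a function of the text, publishing a changed schema necessarily yields a new URL. Equivalent JSON that is formatted differently gets a different ID, which is harmless for correctness but creates duplicates. Later versions of the Avro specification define a Parsing Canonical Form that could be hashed instead.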

If you're using a database (e.g., HBase), then you can have a table of
schemas and, in other tables, store values annotated with the key of
their entry in the schema table. https://github.com/spullara/havrobase
is one example of such an approach.
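A minimal in-memory sketch of that layout, with a `HashMap` standing in for the schema table. `SchemaTable` and `TaggedValue` are hypothetical names, and the sketch is not taken from HAvroBase. Registering identical schema text twice returns the same key, and each stored value carries only that key rather than the schema.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a schema table (e.g. an HBase table). */
final class SchemaTable {
    private final Map<Long, String> schemaById = new HashMap<>();
    private final Map<String, Long> idBySchema = new HashMap<>();
    private long nextId = 1;

    /** Registers a schema if it is new and returns its row key. */
    synchronized long register(String schemaJson) {
        return idBySchema.computeIfAbsent(schemaJson, s -> {
            long id = nextId++;
            schemaById.put(id, s);
            return id;
        });
    }

    /** Resolves a key stored alongside a value back to its writer's schema. */
    synchronized String lookup(long id) {
        String schema = schemaById.get(id);
        if (schema == null) {
            throw new IllegalArgumentException("unknown schema id " + id);
        }
        return schema;
    }
}

/** A stored value annotated with the key of the schema used to write it. */
final class TaggedValue {
    final long schemaId;
    final byte[] payload;

    TaggedValue(long schemaId, byte[] payload) {
        this.schemaId = schemaId;
        this.payload = payload.clone();
    }
}
```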

Or one might use a URL shortener for this, e.g.:

http://tinyurl.com/8a4rppd

redirects to

avro:///?{"type":"record","name":"foo","fields":[]}

One could then install a URL handler for "avro" URLs that resolves them
to their query string.
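A minimal sketch of such a handler in Java. It assumes the schema JSON appears unencoded in the query string, as in the example above. A real redirect would likely percent-encode it, and this sketch does not undo that. `AvroUrlHandler` is a hypothetical name. The handler is passed directly to the `URL` constructor rather than installed JVM-wide with `URL.setURLStreamHandlerFactory`, which may only be called once per JVM.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.nio.charset.StandardCharsets;

/** Hypothetical handler for "avro:" URLs whose query string is the schema JSON. */
class AvroUrlHandler extends URLStreamHandler {
    @Override
    protected URLConnection openConnection(URL u) {
        return new URLConnection(u) {
            @Override
            public void connect() {
                connected = true; // nothing to fetch: the schema is in the URL itself
            }

            @Override
            public InputStream getInputStream() throws IOException {
                String query = getURL().getQuery();
                if (query == null) {
                    throw new IOException("avro: URL has no query string");
                }
                return new ByteArrayInputStream(query.getBytes(StandardCharsets.UTF_8));
            }
        };
    }

    /** Builds an avro: URL bound to this handler without touching global JVM state. */
    static URL parse(String spec) throws MalformedURLException {
        return new URL(null, spec, new AvroUrlHandler());
    }
}
```

The resulting `url.openStream()` then yields the schema text, so it can feed the same `Schema.parse(url.openStream())` call shown earlier.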

Doug
