Are you talking about RPC? Earlier you said, "messages would be smaller
in size when we store large numbers of them", which led me to think
you're talking about some sort of data store.

If you're talking about RPC then there's already a reference passed, the
MD5 sum of the protocol text. The client and/or server could maintain a
persistent database of these so that the text need never be transmitted.
If that's not appropriate then one could devise a different RPC
mechanism that instead uses, e.g., URLs. Perhaps these could be
included in the handshake metadata of the existing RPC mechanism, as an
extension.

If you're talking about file-based storage, then Avro's data file format
already factors out the schema.

If you're talking about some other sort of storage, then I'm not sure
what modifications to Avro would be required to support this.

Doug

On 12/06/2011 12:10 PM, Neil Davudo wrote:
> It would be nice if Avro had a way for the message to carry the URL of
> the schema, much like it can carry the schema within it. We could pass it
> separately out of band (e.g. header) but that reduces the strength of the
> link between the message and the URL of the schema.
>
> Any thoughts on supporting this?
>
> Neil
>
> ----- Original Message -----
> From: Doug Cutting <[email protected]>
> To: [email protected]
> Cc:
> Sent: Tuesday, December 6, 2011 1:48 PM
> Subject: Re: schema by reference
>
> On 12/06/2011 11:14 AM, Neil Davudo wrote:
>> Yes, by a URL. Messages would be smaller in size when we store large numbers
>> of them, and we can always get the schema using the reference if necessary.
>> Similar to what we can do with WSDL having a reference to the XSD.
>
> This is a reasonable thing to do.
>
> A schema can easily be constructed from a URL with:
>
> Schema.parse(url.openStream())
>
> although one would probably want a cache in front of this.
>
> Note that in Avro one must ensure that the version of the schema at
> the reference does not change, that it is identical to the version used
> to write the datum. So one should probably not use a logical URL
> for a datatype like http://me.com/schemas/FooRecord but rather a unique
> ID like http://me.com/schemas/9fd73.
>
> If you're using a database (e.g., HBase) then you can have a table
> of schemas, then, in other tables, store values annotated with the key
> of the entry in the schema table. https://github.com/spullara/havrobase
> is one example of such an approach.
>
> Or one might use a URL shortener for this, e.g.:
>
> http://tinyurl.com/8a4rppd
>
> redirects to
>
> avro:///?{"type":"record","name":"foo","fields":[]}
>
> One could then install a URL handler for "avro" URLs that resolves them
> to their query string.
>
> Doug
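
As a rough illustration of the "cache in front of this" idea from the quoted message, here is a minimal Java sketch. The class name SchemaTextCache, its methods, and the pluggable fetcher are hypothetical and not part of Avro. It caches the raw schema text per URL and leaves parsing to the caller, and it assumes the URL is a unique, immutable ID, so entries are never invalidated.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical helper (not an Avro class): caches schema text by URL. */
class SchemaTextCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> fetcher;

    /** The fetcher maps a schema URL to its JSON text; it is pluggable for testing. */
    SchemaTextCache(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    /** Returns the cached text, invoking the fetcher only on the first request per URL. */
    String get(String url) {
        // Assumes the URL names an immutable schema version, so no expiry is needed.
        return cache.computeIfAbsent(url, fetcher);
    }

    /** One possible fetcher: read the whole body at the URL as UTF-8 text. */
    static String fetch(String url) {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

One would construct it as new SchemaTextCache(SchemaTextCache::fetch), call get("http://me.com/schemas/9fd73"), and hand the returned text to Avro's schema parser. If a fetch fails, nothing is cached, so a later call retries.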
