On 6/12/12 6:09 PM, "Christophe Taton" <[email protected]> wrote:
> On Tue, Jun 12, 2012 at 11:13 AM, Doug Cutting <[email protected]> wrote:
>> On Tue, Jun 12, 2012 at 10:38 AM, Christophe Taton <[email protected]> wrote:
>>> I need my server to handle records with fields that can be "freely"
>>> extended by users, without requiring a recompile and restart of the
>>> server. The server itself does not need to know how to handle the
>>> content of this extensible field.
>>>
>>> One way to achieve this is to have a bytes field whose content is
>>> managed externally, but this is very ineffective in many ways.
>>> Is there another way to do this with Avro?
>>
>> You could use a very generic schema, like:
>>
>> {"type": "record", "name": "Value", "fields": [
>>   {"name": "value", "type": ["int", "float", "boolean", ...,
>>     {"type": "map", "values": "Value"}]}
>> ]}
>>
>> This is roughly equivalent to a binary encoding of JSON. But using a
>> map forces the serialization of a field name with every field value.
>> Not only does that make payloads bigger, it also makes them slower to
>> construct and parse.
>>
>> Another approach is to include the Avro schema for a value in the
>> record, e.g.:
>>
>> {"type": "record", "name": "Extensions", "fields": [
>>   {"name": "schema", "type": "string"},
>>   {"name": "values", "type": {"type": "array", "items": "bytes"}}
>> ]}
>>
>> This can make things more compact when there are a lot of values. For
>> example, this might be used in a search application where each query
>> lists the fields it's interested in retrieving, and each response
>> contains a list of records that match the query and contain just the
>> requested fields. The field names are not included in each match, but
>> instead once for the entire set of matches, making this faster and
>> more compact.
>>
>> Finally, if you have a stateful connection, then you can send a schema
>> in the first request and then just send bytes encoding instances of
>> that schema in subsequent requests over that connection.
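[The two schemas quoted above can be sketched as parsed JSON, using only the stdlib json module; this checks the JSON structure, not Avro semantics. The union members shown are a representative subset, since the original message elides the full list of primitive types, and the size comparison uses JSON text lengths only as a rough stand-in for the Avro binary encoding trade-off being described:]

```python
import json

# Doug's generic "binary JSON" schema: a recursive union of primitives
# and a map of nested Values. Field names travel with every value.
generic_schema = json.loads("""
{"type": "record", "name": "Value", "fields": [
  {"name": "value", "type": [
    "int", "float", "boolean", "string",
    {"type": "map", "values": "Value"}
  ]}
]}
""")

# The schema-carrying record: the writer's schema is sent once as a
# string, and each value is an opaque encoded blob.
extensions_schema = json.loads("""
{"type": "record", "name": "Extensions", "fields": [
  {"name": "schema", "type": "string"},
  {"name": "values", "type": {"type": "array", "items": "bytes"}}
]}
""")

# The trade-off described above, approximated with JSON text sizes:
# field names repeated per record vs. declared once for the whole batch.
records = [{"title": "t%d" % i, "body": "b%d" % i} for i in range(100)]
names_every_time = len(json.dumps(records))
names_once = len(json.dumps({"fields": ["title", "body"],
                             "rows": [[r["title"], r["body"]] for r in records]}))
assert names_once < names_every_time
print(generic_schema["name"], extensions_schema["name"])
```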
>> This again avoids sending field names with each field value.
>
> Thanks for the detailed reply!
>
> In practice, I have a bunch of independent records, each of them
> carrying at most one "extension field".
>
> I was especially hoping there would be a way to avoid serializing an
> "extension" record twice (once from the record object into a bytes
> field, and then a second time as a bytes field into the destination
> output stream). Ideally, such an extension field should not require its
> content to be bytes, but should accept any record object, so that it is
> encoded only once.
> As I understand it, Avro does not allow me to do this right now. Is
> this correct?

If your extension field (or fields) were a union of the allowed types, its
concrete type could be detected at runtime. If the name is dynamic as well,
it can be a pair record holding a name and the data. If there are multiple
extensions, an array or map can be used. Lastly, there is the option of
encoding a blob as bytes and nesting it; this blob can be Avro or anything
else.

I can imagine an Avro RPC server and client API that allowed great
flexibility in registering and responding to custom RPC types, but both the
client and server in such a situation would have to be paired up to decide
which schema variations map to some sort of schema resolution versus a
dynamic payload.

> Thanks,
> Christophe
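[The suggestions in the final reply can be sketched as Avro schema JSON, again using only the stdlib json module as a structural check. The extension record names (UserExtension, MetricsExtension) are hypothetical placeholders, not from the thread:]

```python
import json

# Option 1: a union "extension" field -- the concrete branch is detected
# at runtime. UserExtension and MetricsExtension are hypothetical names.
union_field = json.loads("""
{"name": "extension", "type": [
  "null",
  {"type": "record", "name": "UserExtension",
   "fields": [{"name": "tag", "type": "string"}]},
  {"type": "record", "name": "MetricsExtension",
   "fields": [{"name": "count", "type": "long"}]}
]}
""")

# Option 2: a fully dynamic (name, data) pair -- the blob in "data" may
# be Avro or anything else, at the cost of the second encoding step the
# question complains about.
named_blob = json.loads("""
{"type": "record", "name": "NamedExtension", "fields": [
  {"name": "name", "type": "string"},
  {"name": "data", "type": "bytes"}
]}
""")

# The union has three branches: null plus the two known extension types.
assert len(union_field["type"]) == 3
print(named_blob["name"])
```

The union keeps everything in a single Avro encoding pass (no nested bytes), but requires the server to know the candidate schemas at compile time; the (name, data) pair trades that restriction for the double serialization.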
