On Sun, Nov 28, 2010 at 8:44 PM, Bruce Mitchener
<[email protected]> wrote:

> To be clear, HAvroBase stores tuples of (schema ID, data) and then looks up
> the schema from that ID.  It doesn't store each schema separately / entirely
> alongside the corresponding data records / entries.


Ahh, yes, that's analogous to what I'm planning to do as well. The schema ID
points to a directory of user-supplied schemas. However, it's important for
me to have a contingency plan in case corruption someday disconnects the
schema ID from the actual schema.
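
To make the failure mode concrete, here is a minimal sketch of the (schema ID,
data) layout and the directory lookup. It is not HAvroBase's code: the 4-byte
ID prefix, the in-memory SCHEMA_DIRECTORY and both function names are
assumptions made up for illustration.

```python
import struct

# Hypothetical schema directory: schema ID -> schema definition (JSON text).
SCHEMA_DIRECTORY = {
    1: '{"type": "record", "name": "User", '
       '"fields": [{"name": "id", "type": "long"}]}',
}


def pack_entry(schema_id: int, payload: bytes) -> bytes:
    """Prefix an already-encoded payload with a 4-byte big-endian schema ID."""
    return struct.pack(">I", schema_id) + payload


def unpack_entry(entry: bytes):
    """Split a stored entry into (schema text, payload bytes)."""
    (schema_id,) = struct.unpack_from(">I", entry, 0)
    schema = SCHEMA_DIRECTORY.get(schema_id)
    if schema is None:
        # The case the contingency plan is for: the ID survives, but the
        # schema it pointed to is gone, so the payload cannot be interpreted.
        raise LookupError(f"schema ID {schema_id} not found in directory")
    return schema, entry[4:]
```

If the directory entry is lost, the payload bytes are still there but nothing
in the record says how to read them.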

I think putting a packed binary encoding of the field-type info into each
record would give me what I want, with space usage comparable to Thrift's
overall. It also seems like the kind of thing that could (possibly) one day
become a supported mechanism in Avro without actually changing the existing
binary format. Best of all worlds.
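
One way such a per-record type header could look is sketched below. This is
purely hypothetical: the one-byte type codes, the count byte and the
fixed-width value encoding are assumptions for illustration, not Avro's or
Thrift's actual wire format.

```python
import struct

# Hypothetical one-byte type codes (not defined by Avro or Thrift).
TYPE_CODES = {"long": 1, "double": 2, "string": 3}
CODE_NAMES = {code: name for name, code in TYPE_CODES.items()}


def pack_record(fields):
    """Encode a list of (type_name, value) pairs.

    Layout: 1 byte field count, 1 type-code byte per field, then the values.
    The single count byte limits a record to 255 fields.
    """
    header = bytes([len(fields)]) + bytes(TYPE_CODES[t] for t, _ in fields)
    body = b""
    for type_name, value in fields:
        if type_name == "long":
            body += struct.pack(">q", value)
        elif type_name == "double":
            body += struct.pack(">d", value)
        else:  # "string": 4-byte length prefix, then UTF-8 bytes
            raw = value.encode("utf-8")
            body += struct.pack(">I", len(raw)) + raw
    return header + body


def unpack_record(blob):
    """Decode a record using only the type header it carries."""
    count = blob[0]
    codes = blob[1:1 + count]
    offset = 1 + count
    values = []
    for code in codes:
        name = CODE_NAMES[code]
        if name == "long":
            (value,) = struct.unpack_from(">q", blob, offset)
            offset += 8
        elif name == "double":
            (value,) = struct.unpack_from(">d", blob, offset)
            offset += 8
        else:
            (length,) = struct.unpack_from(">I", blob, offset)
            offset += 4
            value = blob[offset:offset + length].decode("utf-8")
            offset += length
        values.append((name, value))
    return values
```

The overhead here is one byte per field plus the count byte, and the values
can be recovered even if the schema directory is unavailable. Field names are
not stored, so only the types and values survive.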

As a bonus, there are situations where the schemas I'll be using are so
unchanging and common (i.e. embedded in code) that there really isn't any
fear of them being lost. In these cases it's nice that Avro can be used to
pack and unpack things without any field-type overhead.

Thanks for the comments.
