On Sun, Nov 28, 2010 at 8:44 PM, Bruce Mitchener wrote:
> To be clear, HAvroBase stores tuples of (schema ID, data) and then looks up
> the schema from that ID. It doesn't store each schema separately / entirely
> alongside the corresponding data records / entries.

Ahh, yes, that's analogous to what I'm planning to do as well. The schema ID points to a directory of user-supplied schemas.

However, it's important for me to have a contingency plan in case corruption someday disconnects a schema ID from its actual schema. I think putting a packed binary encoding of the field-type info into each record would give me what I want, with overall space usage comparable to Thrift's. It also seems like the kind of thing that could (possibly) one day become a supported mechanism in Avro without changing the existing binary format. Best of all worlds.

As a bonus, in some situations the schemas I'll be using are so unchanging and common (i.e., embedded in code) that there really isn't any fear of them being lost. In those cases it's nice that Avro can pack and unpack records without any field-type overhead.

Thanks for the comments.
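To make the contingency idea concrete, here is a minimal sketch of one possible record layout: a schema ID, then a packed list of per-field type tags, then the field values in Avro's binary encoding (zigzag varint longs, length-prefixed UTF-8 strings). The layout, the one-byte tags, and all function names are hypothetical and are not part of the Avro spec; only `long` and `string` fields are covered. The point is that a record can still be decoded from the embedded tags alone if its schema ID ever stops resolving.

```python
from typing import List, Tuple, Union

# Hypothetical one-byte type tags; these are NOT defined by Avro.
TYPE_LONG = 0x01
TYPE_STRING = 0x02

Value = Union[int, str]


def write_long(n: int) -> bytes:
    """Avro-style long: zigzag-encode, then little-endian base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a zigzag varint at pos; return (value, next position)."""
    shift = 0
    z = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos


def encode_value(v: Value) -> bytes:
    """Encode one field exactly as Avro's binary encoding would."""
    if isinstance(v, str):
        data = v.encode("utf-8")
        return write_long(len(data)) + data
    if isinstance(v, int):
        return write_long(v)
    raise TypeError(f"unsupported field type: {type(v).__name__}")


def encode_record(schema_id: int, values: List[Value]) -> bytes:
    """Hypothetical layout: schema ID, tag count, tags, then Avro-encoded body."""
    tags = bytes(TYPE_STRING if isinstance(v, str) else TYPE_LONG for v in values)
    header = write_long(schema_id) + write_long(len(tags)) + tags
    return header + b"".join(encode_value(v) for v in values)


def decode_without_schema(buf: bytes) -> Tuple[int, List[Value]]:
    """Recover the fields using only the embedded type tags."""
    schema_id, pos = read_long(buf, 0)
    count, pos = read_long(buf, pos)
    tags = buf[pos:pos + count]
    pos += count
    values: List[Value] = []
    for tag in tags:
        if tag == TYPE_LONG:
            v, pos = read_long(buf, pos)
            values.append(v)
        elif tag == TYPE_STRING:
            n, pos = read_long(buf, pos)
            values.append(buf[pos:pos + n].decode("utf-8"))
            pos += n
        else:
            raise ValueError(f"unknown type tag {tag:#x}")
    return schema_id, values


record = encode_record(42, ["alice", 30, -7])
print(decode_without_schema(record))  # (42, ['alice', 30, -7])
```

In this sketch the Avro-encoded body is byte-for-byte what plain Avro would write, so the type tags are purely an add-on prefix. Their cost is one byte per field plus a small count, which is the kind of per-field overhead Thrift also pays. When the schema is reliably embedded in code, the tag section could simply be omitted.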
