I have a storage project that I'm considering adding Thrift or Avro to for record packing, and I have a couple of questions.
Other than the type-ids and field-ids, Avro's and Thrift's designs seem isomorphic. *Is the binary format not including field-type-info something that's set in stone, or something that's open for feedback?*

I prefer the philosophy of Avro, namely to expect schemas to be available, to use those schemas for compatibility mapping, and to support dynamic schema parsing in any supported language. In fact, being able to parse schemas dynamically in any language is the real draw of Avro for me. (Personally I'd prefer it if the schemas were actually written in Avro IDL instead of JSON, but I understand that might complicate implementing client stubs.)

However, the fact that data is not tagged with any type information is unacceptably dangerous for my application. There will be mechanisms for mapping records to schemas, and schemas will be stored, but if a schema were ever lost or corrupted, I can't afford for the data to turn into total junk. Unless the data is trivially small, encoding a field type wouldn't change the size of the encoding much, but it would provide some 'sanity checking', in addition to making it possible to recover the raw data even if a schema was lost or the schema ID for a piece of data was corrupted. Since Avro is relatively new, I'm asking to find out whether this is anathema to the entire concept of Avro, or something that was chosen but might be reconsidered eventually.

Going the Thrift route will, for me, mean injecting a bit of the Avro philosophy into Thrift, namely adding a Thrift IDL parser to the language I need, so that I can save Thrift IDLs and then read them dynamically. However, doing this as a one-off for my language is different from having a supported mechanism for all client languages, as Avro does.
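To make the type-tagging concern concrete, here is a rough Python sketch. It is not Avro's or Thrift's actual wire format: the tag values (`TAG_LONG`, `TAG_STRING`) and all function names are made up for illustration, and only two types are handled. It borrows the zigzag varint and length-prefixed string layout for field values, then compares an untagged record with one that spends a single extra byte per field on a type tag.

```python
# Illustrative sketch only: neither Avro's nor Thrift's real wire format.
# Tag values and function names are hypothetical.

TAG_LONG = 0x01
TAG_STRING = 0x02


def encode_varint(n):
    """Zigzag-encode a signed 64-bit integer, then emit it 7 bits at a time."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos):
    """Return (value, next_position) for a zigzag varint starting at pos."""
    shift = 0
    z = 0
    while True:
        byte = buf[pos]
        pos += 1
        z |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos


def encode_value(value):
    """Encode one field value with no type information."""
    if isinstance(value, int):
        return encode_varint(value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return encode_varint(len(raw)) + raw
    raise TypeError("unsupported type: %r" % type(value))


def decode_value(kind, buf, pos):
    """Decode one field of the given kind; return (value, next_position)."""
    if kind == "long":
        return decode_varint(buf, pos)
    if kind == "string":
        length, pos = decode_varint(buf, pos)
        return buf[pos:pos + length].decode("utf-8"), pos + length
    raise ValueError("unknown kind: %r" % kind)


def encode_untagged(values):
    """Schema-dependent layout: field values only, back to back."""
    return b"".join(encode_value(v) for v in values)


def decode_untagged(buf, schema):
    """Decode using a caller-supplied list of kinds; the bytes cannot be checked."""
    pos, out = 0, []
    for kind in schema:
        value, pos = decode_value(kind, buf, pos)
        out.append(value)
    return out


def encode_tagged(values):
    """Same layout, but each field is preceded by a one-byte type tag."""
    parts = []
    for v in values:
        tag = TAG_LONG if isinstance(v, int) else TAG_STRING
        parts.append(bytes([tag]) + encode_value(v))
    return b"".join(parts)


def decode_tagged(buf):
    """Recover the raw values with no schema at all, rejecting unknown tags."""
    kinds = {TAG_LONG: "long", TAG_STRING: "string"}
    pos, out = 0, []
    while pos < len(buf):
        tag = buf[pos]
        if tag not in kinds:
            raise ValueError("corrupt data: unknown type tag %d" % tag)
        value, pos = decode_value(kinds[tag], buf, pos + 1)
        out.append(value)
    return out


if __name__ == "__main__":
    record = [42, "hello"]
    untagged, tagged = encode_untagged(record), encode_tagged(record)
    print(len(untagged), len(tagged))                 # sizes differ by one byte per field
    print(decode_tagged(tagged))                      # recovered without any schema
    print(decode_untagged(untagged, ["long", "long"]))  # wrong schema, no error raised
```

In this toy format, decoding the untagged bytes with the wrong schema silently returns plausible-looking values, while the tagged bytes can be decoded with no schema at all and an unknown tag is rejected. Field names and field order semantics are still lost without the schema, so the tag only preserves the raw values, which is the recovery property I am after.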
