The typical way to deal with this is to store a hash of the schema, or a pointer to the schema or its hash, with the data.
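As a rough illustration of the schema-hash approach, here is a minimal Python sketch. Everything in it is an assumption made for the example: the in-memory `SCHEMAS` registry, the helper names `register_schema`, `pack_cell` and `unpack_cell`, and the use of MD5 over the schema text as the hash. In practice the registry would live somewhere shared (for instance its own HBase table), and the Avro-encoded bytes would come from whatever Avro library you use; here they are treated as opaque.

```python
import hashlib

# Hypothetical in-memory registry: schema hash -> schema JSON text.
# A real deployment would keep this in a shared store.
SCHEMAS = {}
HASH_LEN = 16  # size of an MD5 digest in bytes


def register_schema(schema_json: str) -> bytes:
    """Hash the schema text, remember it, and return the hash."""
    digest = hashlib.md5(schema_json.encode("utf-8")).digest()
    SCHEMAS[digest] = schema_json
    return digest


def pack_cell(schema_hash: bytes, avro_bytes: bytes) -> bytes:
    """Prefix the Avro-encoded payload with the writer schema's hash."""
    return schema_hash + avro_bytes


def unpack_cell(cell: bytes):
    """Split a cell into (writer schema JSON, Avro-encoded payload)."""
    schema_hash, avro_bytes = cell[:HASH_LEN], cell[HASH_LEN:]
    return SCHEMAS[schema_hash], avro_bytes
```

Each cell then carries 16 extra bytes instead of the full writer schema, and a reader looks up the writer schema by hash before decoding the payload against its own reader schema.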
http://www.quora.com/What-is-the-best-way-to-work-with-Avro-serialized-data-structures-in-a-database
http://www.javarants.com/2010/06/30/havrobase-a-searchable-evolvable-entity-store-on-top-of-hbase-and-solr/

Search-hadoop.com finds previous discussions on this topic:
http://search-hadoop.com/m/3iG061GVhHd2/HAvroBase&subj=Re+Versioning+of+an+array+of+a+record
http://search-hadoop.com/m/ZajsGoopYw/HAvroBase&subj=Re+question+about+completely+untagged+data+
http://search-hadoop.com/m/pz55F1beCEu1/HAvroBase&subj=Re+Setting+bytes+in+Java

In HBase you can also play tricks with column names to match up schemas with versions: append or prepend a version number to the column name and query with a pattern match on the column. You might need 0.92 and its coprocessors to use different deserializations per record returned, however.

On 2/11/11 6:32 PM, "Garrett Wu" <[email protected]> wrote:

If I use Avro to store messages into cells in HBase, would I need to store the writer schema along with it in every cell? A problem that I foresee is that I might modify my schema and write new versions to some of the cells in some rows of the table, and then things would blow up unless I had stored the writer schema in every cell. Is there a better alternative?
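On the column-name trick from the reply: a small Python sketch of one possible naming convention, shown without any HBase client calls. The `_v<N>` suffix and the helper names `make_qualifier`, `parse_qualifier` and `select_columns` are assumptions for illustration only; the pattern match would normally be done server-side by an HBase column filter rather than in client code.

```python
import re

# Assumed convention: "<name>_v<version>", e.g. "msg_v2".
QUALIFIER_RE = re.compile(r"^(?P<name>.+)_v(?P<version>\d+)$")


def make_qualifier(name: str, version: int) -> str:
    """Build a column qualifier that carries the schema version."""
    return f"{name}_v{version}"


def parse_qualifier(qualifier: str):
    """Return (name, version) or raise ValueError if unversioned."""
    match = QUALIFIER_RE.match(qualifier)
    if match is None:
        raise ValueError(f"no version suffix in {qualifier!r}")
    return match.group("name"), int(match.group("version"))


def select_columns(qualifiers, name: str):
    """Map schema version -> qualifier for every versioned column of `name`."""
    selected = {}
    for qualifier in qualifiers:
        match = QUALIFIER_RE.match(qualifier)
        if match and match.group("name") == name:
            selected[int(match.group("version"))] = qualifier
    return selected
```

The version recovered from the qualifier tells the reader which writer schema to use for that cell, so nothing extra needs to be stored in the cell value itself.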
