The typical way to deal with this is to store a hash of the schema, or a 
pointer to the schema, in each cell rather than the full writer schema.


http://www.quora.com/What-is-the-best-way-to-work-with-Avro-serialized-data-structures-in-a-database

http://www.javarants.com/2010/06/30/havrobase-a-searchable-evolvable-entity-store-on-top-of-hbase-and-solr/

Search-hadoop.com finds previous discussions on this topic:
http://search-hadoop.com/m/3iG061GVhHd2/HAvroBase&subj=Re+Versioning+of+an+array+of+a+record

http://search-hadoop.com/m/ZajsGoopYw/HAvroBase&subj=Re+question+about+completely+untagged+data+


http://search-hadoop.com/m/pz55F1beCEu1/HAvroBase&subj=Re+Setting+bytes+in+Java
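
As a rough illustration of the hash-or-pointer idea from the top of this 
message, here is a minimal sketch in Python using only the standard library. 
Everything in it is an assumption for illustration: the SchemaRegistry class, 
the helper names, and the 8-byte truncated SHA-256 fingerprint. Avro defines 
its own canonical form and fingerprint algorithms, and the JSON sorting used 
here is only a simplified stand-in for them. The payload is treated as opaque 
bytes that would come from an Avro encoder.

```python
import hashlib
import json

FINGERPRINT_LEN = 8  # bytes of the digest stored per cell (an assumption)


def schema_fingerprint(schema: dict) -> bytes:
    """Hash a schema into a short, stable identifier.

    Simplified stand-in for a real Avro schema fingerprint: the schema
    is serialized as compact JSON with sorted keys, then hashed.
    """
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).digest()[:FINGERPRINT_LEN]


class SchemaRegistry:
    """Hypothetical in-memory map from fingerprint to writer schema.

    In practice this could be a separate table or a side file.
    """

    def __init__(self):
        self._by_fingerprint = {}

    def register(self, schema: dict) -> bytes:
        fingerprint = schema_fingerprint(schema)
        self._by_fingerprint[fingerprint] = schema
        return fingerprint

    def lookup(self, fingerprint: bytes) -> dict:
        return self._by_fingerprint[fingerprint]


def encode_cell(fingerprint: bytes, payload: bytes) -> bytes:
    """Prefix the serialized record with the writer schema's fingerprint."""
    return fingerprint + payload


def decode_cell(registry: SchemaRegistry, cell: bytes):
    """Split a cell into (writer schema, payload) via the stored fingerprint."""
    fingerprint = cell[:FINGERPRINT_LEN]
    payload = cell[FINGERPRINT_LEN:]
    return registry.lookup(fingerprint), payload
```

Each cell then carries a few bytes of overhead instead of the whole schema, 
and a reader can recover the writer schema for any cell no matter which 
version wrote it.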


In HBase you can also play tricks with column names to match up schemas with 
versions: append or prepend a version number to the column name and query with 
a pattern match on the column.  However, you might need 0.92 and its 
coprocessors to use a different deserialization for each record returned.
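
The column-name trick can be sketched the same way. This is plain Python, not 
HBase client code: the row is modelled as a dict from column qualifier to raw 
bytes, the "_v<N>" suffix convention is an assumption, and the function names 
and per-version deserializers are hypothetical placeholders for whatever 
decoding each schema version needs.

```python
import re

# Assumed naming convention: "<name>_v<version>", e.g. "msg_v2".
QUALIFIER_RE = re.compile(r"^(?P<name>.+)_v(?P<version>\d+)$")


def versioned_qualifier(name: str, version: int) -> str:
    """Build a column qualifier that carries the schema version."""
    return f"{name}_v{version}"


def read_versions(row: dict, name: str, deserializers: dict) -> dict:
    """Decode every version of column `name` found in `row`.

    `row` maps qualifier -> raw bytes; `deserializers` maps a schema
    version number -> function(bytes) returning the decoded value.
    Returns a dict of {version: decoded value}.
    """
    decoded = {}
    for qualifier, raw in row.items():
        match = QUALIFIER_RE.match(qualifier)
        if match is None or match.group("name") != name:
            continue  # unrelated or unversioned column
        version = int(match.group("version"))
        decoded[version] = deserializers[version](raw)
    return decoded
```

Here the pattern match and the choice of deserializer both happen on the 
client after the row has been fetched.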



On 2/11/11 6:32 PM, "Garrett Wu" wrote:

If I use avro to store messages into cells in HBase, would I need to store the 
writer schema along with it in every cell?

A problem that I foresee is that I might modify my schema and write new 
versions to some of the cells in some rows of the table and then things would 
blow up unless I had stored the writer schema in every cell.  Is there a better 
alternative?
