Hello Eugen and everybody on the list,

I've finished my exams and have also made some progress on the project. Lately I've been reading up on the HBase API and the Avro specification [1] to get to know them better.

If you store Avro objects (essentially arrays of bytes) in HBase, you need to store the schema along with the data, for example in the header of the file, so that the data can still be read later if the schema changes radically over time. Of course, Avro does support some schema evolution: in my test code [0] I extended an existing schema and verified that the result stays backward compatible. To get more familiar with Avro, I followed Boris Lublinsky's article [4].

I do want to store my data in HBase through Avro (because of its smaller memory footprint, structured format, and HBase integration). I see that the package "org.apache.hadoop.hbase.avro" contains a class, AvroServer, which starts a server through which all sorts of clients can interact with the data store, along with generated classes (e.g. AColumnValues, APut, AGet, etc.). As far as I can tell, these classes translate requests to the server into HBase Puts and Gets, with the help of AvroUtils, but I don't know whether this is the way to go.

Another option I've been considering is Sam Pullara's HAvroBase implementation [2], with the code on GitHub [3]. Sam proposes storing only a hash code of the schema with the data and keeping the schemas themselves separately. HAvroBase does much more than I need, since it also supports MySQL, MongoDB, etc., so I could use only the storage part for the Lucene IndexWriter.
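
To make the hash idea concrete, here is a rough sketch of how I picture it. This is not HAvroBase's actual code: the class and method names are hypothetical, SHA-256 is just my choice of hash, and a plain in-memory map stands in for wherever the schemas would really be kept (e.g. a separate HBase table). The schema is passed as its JSON text and the payload is whatever bytes Avro produced.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: schemas live in a separate store, keyed by their hash. */
final class SchemaHashStore {
    private static final int HASH_LEN = 32; // SHA-256 digest size in bytes

    // Stand-in for a separate schema table; the key is the hex form of the hash.
    private final Map<String, String> schemasByHash = new HashMap<>();

    static byte[] sha256(String text) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** Registers the schema if it is new and returns hash + payload as the cell value. */
    public byte[] encode(String schemaJson, byte[] avroPayload) {
        byte[] hash = sha256(schemaJson);
        schemasByHash.putIfAbsent(hex(hash), schemaJson);
        return ByteBuffer.allocate(HASH_LEN + avroPayload.length)
                .put(hash)
                .put(avroPayload)
                .array();
    }

    /** Looks up the schema that a stored cell value was written with. */
    public String schemaFor(byte[] cellValue) {
        String key = hex(Arrays.copyOfRange(cellValue, 0, HASH_LEN));
        String schema = schemasByHash.get(key);
        if (schema == null) {
            throw new IllegalArgumentException("unknown schema hash: " + key);
        }
        return schema;
    }

    /** Returns the payload bytes that follow the hash prefix. */
    public static byte[] payloadOf(byte[] cellValue) {
        return Arrays.copyOfRange(cellValue, HASH_LEN, cellValue.length);
    }
}
```

With something like this, each stored value carries a fixed 32-byte prefix instead of the full schema, and a reader first calls schemaFor() to find the schema the data was written with before decoding the payload with Avro.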

A third way to go is to assume that the object schemas will never change and simply store the data as it is. This is dangerous because, if a schema does change, we would have to change code instead of just a simple JSON schema.

[0] http://code.google.com/a/apache-extras.org/p/mailbox-lucene-index-hbase/source/browse/LuceneTest/src/test/java/org/apache/james/mailbox/lucene/avro/AvroInheritanceTest.java
[1] http://avro.apache.org/docs/current/spec.html
[2] http://www.javarants.com/2010/06/30/havrobase-a-searchable-evolvable-entity-store-on-top-of-hbase-and-solr/
[3] https://github.com/spullara/havrobase
[4] http://www.infoq.com/articles/ApacheAvro;jsessionid=6A801F1882512F455322B572F4B69E24

