Hello Eugen and everybody on the list,
I've completed my exams and have also done some work on the project.
Lately I've been reading up on the HBase API and the AVRO API
specifications[1] so that I can get to know them better.
If you store AVRO objects (essentially arrays of bytes) in HBase, you
need to store the schema with the data, for example in the header of
the file, so that you can still read the data later if the schema
changes radically over time. Of course, AVRO does support some changes
to its schemas. If you look at my test code[0], you'll see that I was
able to extend an existing schema and show that it remains backward
compatible. I followed Boris Lublinsky's article[4] on using AVRO to
get more familiar with it.
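
As a rough illustration of the "schema in the header" idea, here is a
minimal sketch that prefixes each stored value with the schema JSON
that was used to write it. The class name SchemaHeaderCodec and the
byte layout are assumptions made for this example; they are not part
of AVRO or HBase, and the record bytes are treated as opaque (in
practice they would come from an AVRO encoder).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of AVRO or HBase.
final class SchemaHeaderCodec {

    /** Decoded cell: the writer's schema JSON plus the raw record bytes. */
    static final class Decoded {
        final String schemaJson;
        final byte[] payload;

        Decoded(String schemaJson, byte[] payload) {
            this.schemaJson = schemaJson;
            this.payload = payload;
        }
    }

    /** Assumed layout: [4-byte schema length][schema JSON, UTF-8][record bytes]. */
    static byte[] encode(String schemaJson, byte[] payload) {
        byte[] schemaBytes = schemaJson.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(4 + schemaBytes.length + payload.length);
        buffer.putInt(schemaBytes.length);
        buffer.put(schemaBytes);
        buffer.put(payload);
        return buffer.array();
    }

    /** Splits a stored cell back into the schema JSON and the record bytes. */
    static Decoded decode(byte[] cell) {
        ByteBuffer buffer = ByteBuffer.wrap(cell);
        byte[] schemaBytes = new byte[buffer.getInt()];
        buffer.get(schemaBytes);
        byte[] payload = new byte[buffer.remaining()];
        buffer.get(payload);
        return new Decoded(new String(schemaBytes, StandardCharsets.UTF_8), payload);
    }
}
```

The obvious cost of this layout is that the full schema text is
repeated in every stored value.
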
I want to store my data in HBase through AVRO (because it uses less
memory, has a structured format and integrates with HBase). I see that
the "org.apache.hadoop.hbase.avro" package has classes such as
AvroServer, which starts up a server through which all sorts of clients
can interact with the data store, as well as generated classes (e.g.
AColumnValues, APut, AGet, etc.). As far as I can tell, these classes
are used to translate the requests sent to the server into HBase Puts
and Gets, with the help of AvroUtils, but I don't know if this is the
way to go.
Another thing I've been considering is using Sam Pullara's HAvroBase
implementation[2], whose code is on GitHub[3]. Sam proposes storing
only a hash of the schema with the data and keeping the schemas
themselves separately. HAvroBase does much more than I need, as it
also supports MySQL, MongoDB, etc., so I could use only the storage
part for the Lucene IndexWriter.
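
To make the "hash of the schema, schemas stored separately" idea
concrete, here is a minimal sketch. It is not HAvroBase's actual code
or API: the class HashedSchemaStore, the choice of SHA-256 and the
in-memory map (standing in for a separate schema table) are all
assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not HAvroBase's implementation.
final class HashedSchemaStore {
    private static final int HASH_LENGTH = 32; // SHA-256 digest size in bytes

    // Stand-in for a separate schema table, keyed by the hex-encoded hash.
    private final Map<String, String> schemasByHash = new HashMap<>();

    static byte[] sha256(String schemaJson) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return digest.digest(schemaJson.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    private static String hex(byte[] bytes) {
        StringBuilder text = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            text.append(String.format("%02x", b & 0xff));
        }
        return text.toString();
    }

    /** Registers the schema once and returns [32-byte hash][record bytes]. */
    byte[] encode(String schemaJson, byte[] payload) {
        byte[] hash = sha256(schemaJson);
        schemasByHash.putIfAbsent(hex(hash), schemaJson);
        byte[] cell = Arrays.copyOf(hash, HASH_LENGTH + payload.length);
        System.arraycopy(payload, 0, cell, HASH_LENGTH, payload.length);
        return cell;
    }

    /** Looks up the writer's schema from the hash prefix of a stored cell. */
    String schemaFor(byte[] cell) {
        String key = hex(Arrays.copyOfRange(cell, 0, HASH_LENGTH));
        String schemaJson = schemasByHash.get(key);
        if (schemaJson == null) {
            throw new IllegalStateException("Unknown schema hash: " + key);
        }
        return schemaJson;
    }

    /** Returns the record bytes that follow the hash prefix. */
    static byte[] payloadOf(byte[] cell) {
        return Arrays.copyOfRange(cell, HASH_LENGTH, cell.length);
    }

    int schemaCount() {
        return schemasByHash.size();
    }
}
```

With this layout each value carries a fixed 32-byte prefix regardless
of schema size, and every schema version is stored only once.
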
Another option is to assume that the object schemas will never change
and simply store the data as it is. This is dangerous because, if
there is a change, we would have to change code instead of a simple
JSON schema.
[0]
http://code.google.com/a/apache-extras.org/p/mailbox-lucene-index-hbase/source/browse/LuceneTest/src/test/java/org/apache/james/mailbox/lucene/avro/AvroInheritanceTest.java
[1] http://avro.apache.org/docs/current/spec.html
[2]
http://www.javarants.com/2010/06/30/havrobase-a-searchable-evolvable-entity-store-on-top-of-hbase-and-solr/
[3] https://github.com/spullara/havrobase
[4] http://www.infoq.com/articles/ApacheAvro