On 12.06.2012 11:30, Eric Charles wrote:
Hi Mihai,

Glad to hear your exams are over (I hope they went fine) :)
Hi Eric,

Thanks, they went very well, I got high marks.

As Ioan said, Avro serialization in HBase will be deprecated in favor of Protobuf (if I understand correctly...).

I think Avro could be replaced rather easily with Protobuf, as they both do basically the same thing; the difference is that Avro uses JSON schemas and can be used with any other language, which is of no value to the project.

I also like Avro because it gives you serialization & storage format in one box, but is this what we want? The key point here is more about effective access to the persisted data.

If the data is passed through Avro, we'll have it serialized, and deserialization is basically handled by Avro, but we'll always have to interact with the schemas. In Protobuf we have the objects compiled into our classes; from what I gather it's mostly useful for RPC. But Avro also has protocol support, and by using the avro-maven-plugin you can generate your own classes with which to interact. I can't say I'm an expert in either, but I fancy Avro.


There have been a few attempts so far to marry HBase and Lucene (see [1], [2], [3] and [4] for example; see also [5] for a more recent article).

Thank you for the GitHub links, I will look thoroughly through the projects. I was already aware of HBasene and Solandra (formerly Lucandra); they have similar approaches.
The questions I am wondering about:

1. Will you focus on a 'generic' solution (reusable outside James), or on a very specific one tuned/optimized only for James mailbox needs?
I was thinking of writing generic code so that it could perhaps be used outside of James, but the data format would be specific to James mailbox needs. So the answer in the end is that it will be tuned for James.

2. What strategy will you take (custom Directory or custom IndexReader/Writer, usage of Coprocessor or not...)?
I was thinking that a custom Directory was the way to go, but I soon realized that it's not as simple as it sounds and that overriding the higher-level IndexReader and IndexWriter classes would be more appropriate (as in article [5]). So by bypassing the Directory I would have to make use of HBase Coprocessors. As far as I can tell, a RegionObserver would be employed to gather frequently performed on data for the Lucene queries and Endpoints.
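
To make the idea of bypassing the Directory a bit more concrete, here is a rough sketch of how an inverted index could be laid out as sorted row keys, so that a term lookup becomes a range scan. This is only an illustration under my own assumptions: the class name, the key layout (field, term, zero-padded document id) and the in-memory TreeSet standing in for an HBase table are all hypothetical, and it does not use the Lucene or HBase APIs at all. It also assumes ASCII terms and non-negative document ids.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical sketch: an inverted index stored as lexicographically sorted row keys. */
class SortedKeyIndexSketch {
    // Separator that sorts before any printable character.
    private static final char SEP = '\u0000';

    // Stand-in for an HBase table: a TreeSet keeps its keys sorted,
    // much like HBase keeps rows ordered by row key.
    private final NavigableSet<String> rowKeys = new TreeSet<>();

    /** Adds one row key per (field, term, docId) triple found in the text. */
    public void index(long docId, String field, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                rowKeys.add(rowKey(field, term, docId));
            }
        }
    }

    /** Term lookup as a range scan over all keys sharing the field/term prefix. */
    public List<Long> search(String field, String term) {
        String start = field + SEP + term.toLowerCase(Locale.ROOT) + SEP;
        // Upper bound that sorts after every zero-padded docId suffix.
        String stop = start + Character.MAX_VALUE;
        List<Long> docIds = new ArrayList<>();
        for (String key : rowKeys.subSet(start, true, stop, false)) {
            docIds.add(Long.parseLong(key.substring(start.length())));
        }
        return docIds;
    }

    private static String rowKey(String field, String term, long docId) {
        // Zero-padding keeps numeric order and lexicographic order in agreement
        // (assumes docId >= 0).
        return field + SEP + term + SEP + String.format("%019d", docId);
    }
}
```

Calling index(1L, "subject", "HBase coprocessors") and then search("subject", "hbase") returns the ids of the documents whose subject contains that term, in ascending id order. In a real implementation the scan would run against HBase with start and stop row keys, and the part I am still unsure about is how much of it should live in a coprocessor.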


[1] https://github.com/akkumar/hbasene
[2] https://github.com/thkoch2001/lucehbase
[3] https://github.com/jasonrutherglen/HBASE-SEARCH
[4] https://github.com/jasonrutherglen/LUCENE-FOR-HBASE
[5] http://www.infoq.com/articles/LuceneHbase

