> Do you want to do Term- or Document partitioning?

It sounds like no one uses term partitioning; doc partitioning seems to be the most logical default?
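
To make the two options concrete, here is a minimal sketch of both layouts over a toy in-memory index. Everything in it (the function names, the whitespace tokenizer, the modulo and CRC32 shard assignment) is an assumption for illustration only; it is not how Solr, Katta, Lucandra or HBasene actually assign data to shards.

```python
from collections import defaultdict
import zlib


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def shard_of(term, num_shards):
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(term.encode("utf-8")) % num_shards


def build_doc_partitioned(docs, num_shards):
    """Each shard holds a complete inverted index for a subset of documents."""
    shards = [defaultdict(list) for _ in range(num_shards)]
    for doc_id in sorted(docs):
        shard = shards[doc_id % num_shards]
        for term in set(tokenize(docs[doc_id])):
            shard[term].append(doc_id)
    return shards


def build_term_partitioned(docs, num_shards):
    """Each shard holds the full posting lists for a subset of terms."""
    shards = [defaultdict(list) for _ in range(num_shards)]
    for doc_id in sorted(docs):
        for term in set(tokenize(docs[doc_id])):
            shards[shard_of(term, num_shards)][term].append(doc_id)
    return shards


def search_doc_partitioned(shards, term):
    # Every shard has to be queried and the partial results merged.
    hits = []
    for shard in shards:
        hits.extend(shard.get(term, []))
    return sorted(hits)


def search_term_partitioned(shards, term):
    # Only the shard that owns the term is queried.
    shard = shards[shard_of(term, len(shards))]
    return list(shard.get(term, []))
```

With doc partitioning a single-term query fans out to every shard, but adding a document touches only one shard. With term partitioning the query hits one shard, but adding a document touches as many shards as it has distinct terms.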

> serve the index shards from memory

In Lucene-land this is a function of allocating enough RAM for the system IO cache.

On Sun, Feb 13, 2011 at 8:26 AM, Thomas Koch <tho...@koch.ro> wrote:
> Jason Rutherglen:
>> Hello,
>>
>> I'm curious as to what a 'good' approach would be for implementing
>> search in HBase (using Lucene) with the end goal being the integration
>> of realtime search into HBase. I think the use case makes sense as
>> HBase is realtime and has a write-ahead log, performs automatic
>> partitioning, splitting of data, failover, redundancy, etc. These are
>> all things Lucene does not have out of the box, that we'd essentially
>> get for 'free'.
>>
>> For starters: Where would be the right place to store Lucene segments
>> or postings? Eg, we need to be able to efficiently perform a linear
>> iteration of the per-term posting list(s).
>>
>> Thanks!
>>
>> Jason Rutherglen
> Hi Jason,
>
> I had the same idea around last year but didn't continue it since I'm leaving
> the company right now.
> Do you want to do Term- or Document partitioning? Both have advantages and
> disadvantages. You can get a very good introduction in chapter 14.1 of this
> book:
> http://www.ir.uwaterloo.ca/book
>
> The following lecture gives a very interesting insight on Google's index
> architecture:
> http://videolectures.net/wsdm09_dean_cblirs
>
> Projects that do Document partitioning:
> distributed solr, katta, elasticsearch, linkedin's Sensei
> Projects that do Term partitioning:
> lucandra/solandra (using cassandra), hbasene (which is abandoned since a year)
>
> I very much thought that hbasene would be a perfect solution for scalable
> search, but the above book and video convinced me that improving katta would
> be the way to go:
> - implement an indexing solution for katta
> - serve the index shards from memory, as google apparently does
>
> Hope I could help, please keep us posted,
>
> Thomas Koch, http://www.koch.ro
>
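
On the quoted question of where postings could live so that a per-term posting list can be iterated linearly: one possible layout, sketched below, relies only on the fact that HBase keeps rows sorted by row key. The `SortedStore` class is a toy stand-in written for this example, not the HBase client API, and the key format (term, a zero byte, then an 8-byte big-endian doc id) is an assumption, not what HBasene or Lucandra do. It also assumes terms never contain a zero byte.

```python
import bisect
import struct


class SortedStore:
    """Toy stand-in for a store that keeps row keys in sorted byte order."""

    def __init__(self):
        self._keys = []

    def put(self, key):
        # Insert the key at its sorted position, ignoring duplicates.
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def scan(self, start, stop):
        # Yield keys in [start, stop) in sorted order, like a range scan.
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < stop:
            yield self._keys[i]
            i += 1


def posting_key(term, doc_id):
    # Big-endian packing makes byte order match numeric doc id order.
    return term.encode("utf-8") + b"\x00" + struct.pack(">Q", doc_id)


def add_posting(store, term, doc_id):
    store.put(posting_key(term, doc_id))


def iter_postings(store, term):
    # One contiguous range scan covers exactly the rows for this term.
    prefix = term.encode("utf-8") + b"\x00"
    stop = term.encode("utf-8") + b"\x01"
    for key in store.scan(prefix, stop):
        yield struct.unpack(">Q", key[len(prefix):])[0]
```

Because all rows for a term are adjacent and ordered by doc id, iterating a posting list becomes a single sequential scan, and inserts for new documents do not require rewriting an existing list.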