On Mon, Feb 6, 2012 at 8:20 AM, prasenjit mukherjee <prasen....@gmail.com> wrote:
> Pardon my ignorance, Why can't the IndexWriter and IndexSearcher share
> the same underlying in-memory datastructure so that IndexSearcher need
> not be reopened with every commit.

Because the semantics of an IndexReader in Lucene guarantee an
unchanging point-in-time view of the index, as of when that
IndexReader was opened.

That said, Lucene has near-real-time (NRT) readers, which keep
point-in-time semantics but are very fast to open after adding/deleting
docs, and do not require a (costly) commit. For example, see my blog
post:

    http://blog.mikemccandless.com/2011/06/lucenes-near-real-time-search-is-fast.html

The tests I ran there indexed at a highish rate (~1000 1KB sized docs
per second, or 1 MB plain text per second, or ~2X Twitter's peak rate,
at least as of last July), and the reopen latency was fast (~60 msec).
Admittedly this was a fast machine, and the index was on a good SSD,
and I used NRTCachingDir and MemoryCodec for the "id" field.

But net/net Lucene's NRT search is very fast. It should easily handle
your 20 docs/second rate, unless your docs are enormous....

Solr trunk has finally cut over to using these APIs, but unfortunately
this has not been backported to Solr 3.x. You might want to check out
ElasticSearch, an alternative to Solr, which does use Lucene's NRT
APIs....

Mike McCandless

http://blog.mikemccandless.com
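
To make the point-in-time semantics described above concrete, here is a minimal toy sketch in plain Java. It is not Lucene's API: ToyWriter, ToyReader, and openReader are hypothetical names invented for illustration. It only shows that a reader opened earlier never sees later additions, while a freshly opened reader does, with no commit step involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model only: these classes are hypothetical and are NOT Lucene's API. */
public class PointInTimeSketch {

    /** Stands in for an IndexWriter: accepts documents continuously. */
    static final class ToyWriter {
        private final List<String> docs = new ArrayList<>();

        synchronized void addDocument(String doc) {
            docs.add(doc);
        }

        /** Opens a reader over everything added so far; no commit is needed. */
        synchronized ToyReader openReader() {
            // Copying keeps the reader's view frozen even as the writer changes.
            return new ToyReader(Collections.unmodifiableList(new ArrayList<>(docs)));
        }
    }

    /** Stands in for an IndexReader: an immutable point-in-time view. */
    static final class ToyReader {
        private final List<String> snapshot;

        ToyReader(List<String> snapshot) {
            this.snapshot = snapshot;
        }

        int numDocs() {
            return snapshot.size();
        }

        boolean contains(String doc) {
            return snapshot.contains(doc);
        }
    }

    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter();
        writer.addDocument("doc-1");
        ToyReader before = writer.openReader();   // sees doc-1 only

        writer.addDocument("doc-2");
        ToyReader after = writer.openReader();    // "reopen": sees doc-1 and doc-2

        System.out.println("before sees " + before.numDocs() + " doc(s)");
        System.out.println("after sees " + after.numDocs() + " doc(s)");
    }
}
```

The toy copies the whole list on every open, which costs time proportional to the index size. That copy is an artifact of the sketch and says nothing about how Lucene implements its NRT readers; the ~60 msec reopen latency quoted above comes from the real implementation, not from anything like this.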