Re: effect of continuous deletes on index's read performance

Michael McCandless Mon, 06 Feb 2012 05:48:20 -0800

On Mon, Feb 6, 2012 at 8:20 AM, prasenjit mukherjee
<prasen....@gmail.com> wrote:


> Pardon my ignorance, Why can't the IndexWriter and IndexSearcher share
> the same underlying in-memory datastructure so that IndexSearcher need
> not be reopened with every commit.

Because the semantics of an IndexReader in Lucene guarantee an
unchanging point-in-time view of the index, as of when that
IndexReader was opened.

That said, Lucene has near-real-time readers, which keep point-in-time
semantics but are very fast to open after adding/deleting docs, and do
not require a (costly) commit.  E.g., see my blog post:

    
http://blog.mikemccandless.com/2011/06/lucenes-near-real-time-search-is-fast.html
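
To make the point-in-time idea concrete, here is a toy sketch in plain
Java. It is not Lucene code and does not use Lucene's APIs: ToyWriter and
ToyReader are hypothetical names invented for illustration, and the "index"
is just a set of doc ids. It only shows the contract described above: a
reader never sees changes made after it was opened, and picking up new
changes means opening a new reader from the writer, with no commit step.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Toy stand-in for an index writer (hypothetical; NOT Lucene's IndexWriter). */
final class ToyWriter {
    // The writer's current, mutable, in-memory set of live doc ids.
    private final Set<String> liveDocs = new HashSet<>();

    synchronized void addDocument(String id) {
        liveDocs.add(id);
    }

    synchronized void deleteDocument(String id) {
        liveDocs.remove(id);
    }

    /**
     * Opens a reader over the writer's current state. Nothing is committed;
     * the reader simply gets its own frozen copy. Copying the whole set is
     * a simplification made here for brevity.
     */
    synchronized ToyReader openReader() {
        return new ToyReader(new HashSet<>(liveDocs));
    }
}

/** Immutable point-in-time view, fixed as of when it was opened. */
final class ToyReader {
    private final Set<String> docs;

    ToyReader(Set<String> snapshot) {
        this.docs = Collections.unmodifiableSet(snapshot);
    }

    boolean contains(String id) {
        return docs.contains(id);
    }

    int numDocs() {
        return docs.size();
    }
}
```

In this sketch, a reader opened before a delete still reports the deleted
doc, and only a reader opened afterwards reflects the delete. That is why
the searcher has to be reopened rather than sharing the writer's live
structure directly.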

The tests I ran there indexed at a fairly high rate (~1000 1 KB docs
per second, or 1 MB plain text per second, or ~2X Twitter's peak rate,
at least as of last July), and the reopen latency was fast (~60
msec).  Admittedly this was a fast machine, and the index was on a
good SSD, and I used NRTCachingDir and MemoryCodec for the "id" field.

But net/net, Lucene's NRT search is very fast.  It should easily handle
your 20 docs/second rate (~50X lower than the rate I tested), unless
your docs are enormous....

Solr trunk has finally cut over to using these APIs, but unfortunately
this has not been backported to Solr 3.x.  You might want to check out
ElasticSearch, an alternative to Solr, which does use Lucene's NRT
APIs....

Mike McCandless

http://blog.mikemccandless.com
