Hi,

I'm writing a GUI program that can search a PyLucene index of files, and
can update the index in the background.  To update the index, I build a new
index of updated files (during which time the user can still search the main
index) and then merge the indexes together using IndexWriter.addIndexes().
(I don't mind if the user can't search during the merge.)

The problem I'm having is that IndexWriter.addIndexes() blocks the whole
program, not just the thread that's running it.  Below is a small script
that demonstrates the problem.  It builds two indexes, then merges one into
the other in a worker thread while the main thread prints some output.  The
results look like this:

Main thread 0: 0.00
Main thread 1: 0.02
Starting merge
Merge complete - 0.25 seconds
Main thread 2: 0.28
Main thread 3: 0.30
Main thread 4: 0.32

As you can see, the main thread blocks while the worker thread is in the
call to IndexWriter.addIndexes().  In the context of my GUI, this means
that the whole GUI freezes while the merge is happening, and it can take
minutes to complete.

Presumably IndexWriter.addIndexes() is holding onto the GIL - is that
really correct?

I'm using PyLucene 0.9.6 with Python 2.4 on Windows XP.

Here's the test script that reproduces the problem:

---------------------------------------------------------------------------

import time
from PyLucene import *

def mergeIndexes(destinationWriter, sourceStore):
    time.sleep(0.03)  # Let the main thread continue a bit before we start.
    print "Starting merge"
    start = time.time()
    destinationWriter.addIndexes([sourceStore])  # This blocks all threads!
    print "Merge complete - %2.2f seconds" % (time.time() - start)

# Create a 100-document index, index-1.
store1 = FSDirectory.getDirectory('index-1', True)
writer1 = IndexWriter(store1, WhitespaceAnalyzer(), True)
for i in range(100):  # range(1, 100) would add only 99 documents.
    doc = Document()
    doc.add(Field('text', 'This is in index-1', False, True, True))
    writer1.addDocument(doc)
writer1.optimize()
writer1.close()

# Create a 1-document index, index-2.
store2 = FSDirectory.getDirectory('index-2', True)
writer2 = IndexWriter(store2, WhitespaceAnalyzer(), True)
doc = Document()
doc.add(Field('text', 'This is in index-2', False, True, True))
writer2.addDocument(doc)

# Merge index-1 into index-2 in a worker thread.
PythonThread(target=mergeIndexes, args=(writer2, store1)).start()

# Try to do something while the worker thread runs.
start = time.time()
for i in range(5):
    print "Main thread %d: %2.2f" % (i, time.time() - start)
    time.sleep(0.02)

---------------------------------------------------------------------------
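
For comparison, here is a minimal control sketch that uses only the standard
library and no PyLucene.  It assumes that time.sleep(0.25) is a fair stand-in
for a blocking call that releases the GIL, and it uses threading.Thread rather
than PythonThread; the names blocking_worker and run_control are made up for
illustration.  If addIndexes() released the GIL in the same way, I would
expect the "main" entries to keep appearing between "worker start" and
"worker done", as they should here.

```python
import threading
import time

def blocking_worker(log, start):
    time.sleep(0.03)  # Let the main thread continue a bit before we start.
    log.append(("worker start", time.time() - start))
    # Hypothetical stand-in for the merge: a blocking call that releases
    # the GIL while it waits.
    time.sleep(0.25)
    log.append(("worker done", time.time() - start))

def run_control():
    log = []  # (label, elapsed seconds) pairs, appended in time order.
    start = time.time()
    worker = threading.Thread(target=blocking_worker, args=(log, start))
    worker.start()
    # Try to do something while the worker thread runs.
    for i in range(5):
        log.append(("main %d" % i, time.time() - start))
        time.sleep(0.02)
    worker.join()
    return log

if __name__ == "__main__":
    for label, elapsed in run_control():
        print("%-12s %2.2f" % (label, elapsed))
```

The events are collected in a list rather than printed directly so that the
ordering can be checked after the worker has finished.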

Thanks,

-- 
Richie Hindle
[EMAIL PROTECTED]