Re: [pylucene-dev] Robustness when indexing is interrupted?

Andi Vajda Fri, 11 Feb 2005 10:17:52 -0800

I assume you're talking about an index backed by an FSDirectory. If so, well yes, if you open files, write into them, and neither flush nor close the buffers, things might be a little unfinished later.
And unrecoverable?
Big pain, in terms of keeping the index in sync with documents, since it means marking all of the docs in that index as unindexed and starting over. I'm dealing with many millions of documents...

There are other directory implementations around that use a database for storage instead of the file system. For example, DbDirectory, also available with PyLucene, uses a Berkeley DB backend. That implementation allows you to use transactions and all the other facilities to control backup, replication and changes to your data. There are simple examples illustrating this in samples/LuceneInAction, in the BerkeleyDbIndexer.py and BerkeleyDbSearcher.py files. To get access to DbDirectory you need to build a PyLucene that includes Berkeley DB support or use the -db- flavored binary archive from PyLucene's download site. You must also install Berkeley DB, freely available from www.sleepycat.com.
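
To make the transaction point concrete, here is a minimal sketch of why a transactional store helps with keeping the index in sync with the documents. It does not use DbDirectory or Berkeley DB at all: it swaps in Python's standard sqlite3 module as a stand-in, and the table names (docs, postings) and the helper names (make_store, index_pending, Interrupted) are made up for illustration. The idea is only that the index entries and the "indexed" marker are written in one transaction, so an interruption rolls both back together.

```python
import sqlite3


class Interrupted(Exception):
    """Stands in for a crash or kill in the middle of indexing (hypothetical)."""


def make_store():
    # In-memory database with two unindexed documents (illustrative data).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, "
        "indexed INTEGER DEFAULT 0)"
    )
    conn.execute("CREATE TABLE postings (term TEXT, doc_id INTEGER)")
    conn.executemany(
        "INSERT INTO docs (id, body) VALUES (?, ?)",
        [(1, "hello world"), (2, "hello lucene")],
    )
    conn.commit()
    return conn


def index_pending(conn, fail_after=None):
    """Index every unindexed doc in ONE transaction.

    Returns True if the batch committed, False if it was rolled back.
    fail_after simulates an interruption before the n-th document.
    """
    rows = conn.execute(
        "SELECT id, body FROM docs WHERE indexed = 0 ORDER BY id"
    ).fetchall()
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for n, (doc_id, body) in enumerate(rows):
                if fail_after is not None and n == fail_after:
                    raise Interrupted()
                for term in set(body.lower().split()):
                    conn.execute(
                        "INSERT INTO postings (term, doc_id) VALUES (?, ?)",
                        (term, doc_id),
                    )
                # The marker is written in the same transaction as the postings.
                conn.execute(
                    "UPDATE docs SET indexed = 1 WHERE id = ?", (doc_id,)
                )
        return True
    except Interrupted:
        return False
```

After an interrupted call, neither the postings nor the indexed flags from that batch are visible, so the next call to index_pending simply picks up the same documents again. With many millions of documents you would presumably commit in smaller batches rather than one giant transaction.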

If DbDirectory and Berkeley DB are not right for you, I recently added support for providing a full Python implementation of the Directory protocol in PyLucene. The Directory, OutputStream, InputStream and Lock Lucene abstract classes may be 'extended' from Python to provide Python implementations. This lets you write your own 100% Python implementation of whatever index storage you need.
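
As a rough idea of the shape such a Python implementation might take, here is a toy in-memory store. It is a sketch under assumptions, not PyLucene code: the method names (list, fileExists, fileLength, deleteFile, renameFile, createFile, openFile, makeLock, close, and obtain/release on the lock) follow my recollection of Lucene's Java Directory and Lock classes, the class names MemoryDirectory, _Output and _Lock are made up, and how PyLucene expects these objects to be passed in is not shown. The real OutputStream and InputStream classes also have more methods than the ones here.

```python
import io


class _Output:
    """Buffers writes; the file only becomes visible when close() is called."""

    def __init__(self, directory, name):
        self._directory = directory
        self._name = name
        self._buffer = io.BytesIO()

    def write(self, data):
        self._buffer.write(data)

    def close(self):
        # Publish the finished bytes under the file's name.
        self._directory._files[self._name] = self._buffer.getvalue()


class _Lock:
    """Named lock held in the directory's lock set."""

    def __init__(self, directory, name):
        self._directory = directory
        self._name = name

    def obtain(self):
        if self._name in self._directory._locks:
            return False
        self._directory._locks.add(self._name)
        return True

    def release(self):
        self._directory._locks.discard(self._name)


class MemoryDirectory:
    """Toy file store keyed by name (illustration only)."""

    def __init__(self):
        self._files = {}    # file name -> bytes
        self._locks = set() # names of currently held locks

    def list(self):
        return sorted(self._files)

    def fileExists(self, name):
        return name in self._files

    def fileLength(self, name):
        return len(self._files[name])

    def deleteFile(self, name):
        del self._files[name]

    def renameFile(self, old, new):
        self._files[new] = self._files.pop(old)

    def createFile(self, name):
        return _Output(self, name)

    def openFile(self, name):
        return io.BytesIO(self._files[name])

    def makeLock(self, name):
        return _Lock(self, name)

    def close(self):
        pass
```

Note that in this sketch a file that is written but never closed is simply never published, which is one way a custom store can avoid leaving half-written files behind.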

The pros and cons of various database backend approaches for storing Lucene indexes have been discussed on the lucene-user mailing list many times before; see http://jakarta.apache.org/site/mail2.html#Lucene

Andi..

_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
