Well, the problem is not just about IndexSearcher on a big index. The problem is about many PythonThreads.
I can open a 20 GB or 50 GB index with IndexSearcher without any problem.
But when I try to create many PythonThreads that operate on an IndexSearcher opened on a big index, I receive exceptions like these: GC Warning: Header allocation failed: Dropping block. GC Warning: Out of Memory! Returning NIL!
Have you asked the [EMAIL PROTECTED] mailing list about this error ?
And when I decreased the number of threads to 2 or 3, the error went away. So, the question is: how can PythonThread affect this? Why does a smaller number of them not produce exceptions of this kind?
There may be overhead involved in having multiple threads against a given index. Have you tried this under Java yet ? Have you asked the lucene-user mailing list ? A PythonThread is really a wrapper around a Java/libgcj thread that Python is tricked into thinking is one of its own.
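One way to test whether sheer thread count is the culprit is to bound concurrency: feed all queries through a small, fixed pool of threads sharing one searcher, instead of spawning a thread per query. Here is a minimal sketch using plain Python threading (not PyLucene's PythonThread, and with a stand-in callable for `searcher.search` — the names `run_queries` and `searcher` are mine, not PyLucene API):

```python
import queue
import threading

def run_queries(searcher, queries, num_workers=2):
    """Run all queries through `num_workers` threads sharing one `searcher`.

    `searcher` is any callable taking a query and returning a result;
    in real PyLucene code it would wrap IndexSearcher.search().
    """
    work = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            q = work.get()
            if q is None:          # sentinel: no more work
                break
            r = searcher(q)        # stand-in for searcher.search(q)
            with lock:
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for q in queries:
        work.put(q)
    for _ in threads:              # one sentinel per worker
        work.put(None)
    for t in threads:
        t.join()
    return results
```

If 2-3 worker threads handle your full query load without GC warnings, that points at per-thread overhead (each libgcj thread carries its own stack and GC bookkeeping) rather than the index size itself.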
By the way, you say that you have a 51 Gb index and a 20 Gb index. What is the size of the biggest single index file in these index directories ? There used to be a bug in libgcj where it couldn't support files bigger than 2 or 4 Gb (I don't remember which). I know this bug is fixed in gcj 4.0, a PyLucene user actually verified that. I do not know whether it has been fixed in the version of gcj we're using, gcj 3.4.3.
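A quick way to answer that question is to scan the index directory for its biggest file. A small stdlib-only sketch (the function name is mine, for illustration):

```python
import os

def largest_index_file(index_dir):
    """Return (filename, size_in_bytes) of the biggest file in index_dir,
    or None if the directory contains no regular files."""
    sizes = {
        name: os.path.getsize(os.path.join(index_dir, name))
        for name in os.listdir(index_dir)
        if os.path.isfile(os.path.join(index_dir, name))
    }
    if not sizes:
        return None
    name = max(sizes, key=sizes.get)
    return name, sizes[name]
```

If the biggest file already exceeds 2 GB (2**31 bytes), you may be hitting the libgcj large-file limit rather than a threading problem.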
Also, I just learned that using an unoptimized index requires more memory. How much more is a question for lucene-user as well. Optimizing your index is likely to push you over the 4 Gb per-file limit in gcj < 4.0, though. Have you tried it (after backing up your existing index first) ?
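To see why optimization could breach the limit before trying it: optimizing merges all segments into roughly one file, so summing the current data files gives a rough upper bound on the merged file's size. A hypothetical sketch, assuming this back-of-the-envelope estimate (the actual merged size can be somewhat smaller, e.g. after deleted documents are dropped):

```python
import os

def optimize_may_exceed_limit(index_dir, limit=2**31):
    """Rough check: would merging all index files into one likely
    exceed `limit` bytes (default 2 GB, the conservative gcj bound)?"""
    total = sum(
        os.path.getsize(os.path.join(index_dir, name))
        for name in os.listdir(index_dir)
        if os.path.isfile(os.path.join(index_dir, name))
    )
    return total > limit
```

If this returns True for your 51 Gb index, optimizing it under gcj 3.4.3 is risky; hence the advice to back up the index first.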
Andi..

_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
