Hi Bill,
Earlier this year, I heard on the rumor mill (the NXLucene email list)
that the memory leak problems were only solved in the GCC 4.2 and later
branches. Maybe Andi has a better answer than this.
best,
Marc
On Nov 19, 2007, at 7:49 PM, Bill Peverill wrote:
Hi Everybody!
Sorry this post is so long.
We are currently in the final stage of implementing a new search
capability over a database with ~10 million records at present, which
will grow to ~30 million within a month. We are no strangers to Python
and PyLucene, and naturally chose PyLucene for this project. This
project has been implemented over our present database, and it works
great at low user levels.
In a separate project, we have implemented a PyLucene solution for
another [offline] product with a single-directory index size of about
40 MB over a limited selection of the same data. This has been working
well since May.
------- Our Problem
Under load testing that simulates our online traffic, we see memory
allocation mount steadily without ever being released, until finally
the process dies with a modal dialog from within PyLucene: "fatal error
in GC : too many heap sections." This usually occurs once memory has
built to a little under 2 GB [of 4 GB available]. We have been unable to
free this memory once it is allocated without restarting the job, which
currently runs in a console window. We have not been able to keep an
implementation stable for more than a few hours.
------- Our Questions
We would first like to establish that this is a solvable problem. What
are the largest implementations out there so far with high traffic
searching a large dataset? Any anecdotal evidence would be great.
[Go on; brag.]
Assuming PyLucene can support our requirements, has anybody encountered
this particular issue? How did you solve it?
The list archives suggest others have had this issue, but the threads
have tended to go cold before a specific solution was documented. The
bug list has a single reference to a couple of thread-based memory
leaks, but these may or may not be similar to ours, and they will not be
fixed. [Making this an unsolvable problem.] Maybe we just didn't find
it, so if there is a meaningful thread in the archives we'd love to hear
about it.
------- A few things we've tried
We removed CherryPy from the mix and had our PyLucene code perform
searches directly, with no effect.
We've tried creating a new IndexSearcher per search, and also tried
creating a single IndexSearcher that was used over many searches. None
of the variations we've tried prevented the memory from growing.
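
The two IndexSearcher variations described above can be sketched roughly
as follows. This is only an illustration, not our production code:
StubSearcher is a hypothetical stand-in for PyLucene's IndexSearcher so
that the sketch runs without PyLucene installed, and the function and
class names are made up for the example. Neither pattern is claimed to
fix the leak; the sketch only shows where each variation opens and
closes its searcher.

```python
# Hypothetical stand-in for PyLucene.IndexSearcher. Only the two calls
# the sketch relies on, search() and close(), are modelled here.
class StubSearcher(object):
    open_count = 0  # number of searchers currently open

    def __init__(self, index_path):
        self.index_path = index_path
        StubSearcher.open_count += 1

    def search(self, query):
        # A real searcher would return hits from the index.
        return ["hit for %s" % query]

    def close(self):
        StubSearcher.open_count -= 1


def search_with_new_searcher(index_path, query, searcher_factory=StubSearcher):
    """Variation 1: open a searcher per search and always close it."""
    searcher = searcher_factory(index_path)
    try:
        return searcher.search(query)
    finally:
        # Runs even if search() raises, so no searcher is left open.
        searcher.close()


class SharedSearcher(object):
    """Variation 2: one long-lived searcher reused over many searches."""

    def __init__(self, index_path, searcher_factory=StubSearcher):
        self._searcher = searcher_factory(index_path)

    def search(self, query):
        return self._searcher.search(query)

    def close(self):
        # Call once, when the service shuts down.
        self._searcher.close()
```

With the real library, the assumption is that searcher_factory would be
PyLucene's IndexSearcher and that each query would be parsed before
being passed to search().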
------- A note on our environment
We have two fast, dedicated, load-balanced servers running Windows
Server 2003, each with 4 GB RAM and a fast RAID stripe, to run PyLucene
exclusively. PyLucene is called from CherryPy.
The present index is 3.1 GB in size as a 10-directory index; 2.8 GB as a
single-directory index. We will likely implement a multi-directory index
in anticipation of bumping up against file size restrictions at the OS
level once we implement the full data set. We anticipate an index in the
area of 10+ GB. We have not been optimizing our indices as part of our
tests: our single-folder index consists of 34 files.
For debugging purposes we run CherryPy in console mode, but the
production release will run as a Windows service.
The software versions we use for the search web service are:
  Python version 2.4.4
  CherryPy version 2.1.0 (with threading modified to use
    PyLucene.PythonThread)
  PyLucene GCJ version 2.2.0-1, built on Windows 2000 with mingw/gcj
    3.4.6 and Python 2.4.3, obtained from the PyLucene download page
Note that because our codebase is not compatible with Python 2.5, we
can't try the version of PyLucene that has been compiled with Python
2.5.
We would be grateful for any advice we can get. (Our gratitude could
include beer contributions or other compensation if appropriate.) We'd
also love to hear from people who have NOT had this problem.
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev