Hi Bill,

Earlier this year, I heard through the rumor mill (the NXLucene email list) that the memory leak problems were only solved in the GCC 4.2-and-up branch. Maybe Andi has a better reply than this.

best,

Marc

On Nov 19, 2007, at 7:49 PM, Bill Peverill wrote:

Hi Everybody!

Sorry this post is so long.

We are currently in the end stage of implementing a new search capability over a database with ~10 million records at present, which will grow to ~30 million within a month. We are no strangers to Python and PyLucene, and naturally chose PyLucene for this project. The project has been implemented over our present database, and it works great under low user load.

In a separate project, we have implemented a pyLucene solution for another [offline] product with a single-directory index size of about 40MB over a limited selection of the same data. This has been working well since May.

------- Our Problem

Under load testing that simulates our online traffic, we see memory allocation mount and never be released, until finally the process dies with a modal dialog from within PyLucene: "fatal error in GC: too many heap sections." This usually occurs once memory has grown to a little under 2 GB [of 4 GB available]. We have been unable to free this memory once it is allocated without restarting the job, which currently runs in a console window. We have not been able to keep an implementation stable for more than a few hours.
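One way we can characterize the leak before filing a bug is to log process memory against search count; if the number keeps climbing even after a forced Python-level collection, the growth is almost certainly in the GCJ/Boehm heap rather than in reachable Python objects. A minimal sketch (the `get_mem` hook is our own invention, not anything from PyLucene; on Windows you might back it with `ctypes` and `GetProcessMemoryInfo`):

```python
import gc

def run_with_memory_log(search_fn, queries, get_mem, log_every=100):
    """Run search_fn over queries, sampling process memory every
    `log_every` searches so growth can be correlated with search count.

    get_mem is a caller-supplied hook returning current process memory
    in bytes (hypothetical -- not part of any PyLucene API)."""
    samples = []
    for i, query in enumerate(queries):
        search_fn(query)
        if (i + 1) % log_every == 0:
            # Collect Python garbage first, so any remaining growth
            # can be blamed on the libgcj/Boehm heap.
            gc.collect()
            samples.append((i + 1, get_mem()))
    return samples
```

If the sampled numbers keep rising after `gc.collect()`, Python-level reference cycles are ruled out and the leak is on the libgcj side.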

------- Our Questions

We would first like to establish that this is a solvable problem. What are the largest implementations out there so far with high traffic searching a
large dataset? Any anecdotal evidence would be great. [Go on; Brag.]

Assuming pyLucene can support our requirements, has anybody encountered this
particular issue? How did you solve it?

The mailing-list archives suggest others have had this issue, but the threads have tended to go cold before a specific solution was documented. The bug list has a single reference to a couple of thread-based memory leaks, but these may or may not be similar to ours, and they will not be fixed. [Making this an unsolvable problem.] Maybe we just didn't find it, so if there is a meaningful thread in the archives we'd love to hear about it.

------- A few things we've tried

We removed CherryPy from the mix and had our PyLucene code perform searches directly, with no effect.

We've tried creating a new IndexSearcher per search, and also tried creating a single IndexSearcher that is reused across many searches: none of the variations we've tried prevented the memory from growing.
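For reference, the single-shared-searcher variant can be sketched as below. This is a sketch, not our exact code: `open_searcher` and `do_search` are hypothetical stand-ins for the real PyLucene calls (IndexSearcher construction and `searcher.search(query)`), and the lock is a conservative choice to serialize access, not something PyLucene necessarily requires.

```python
import threading

class SharedSearcher:
    """One long-lived searcher reused across requests.

    open_searcher and do_search are hypothetical hooks standing in for
    the real PyLucene calls (e.g. IndexSearcher(directory) and
    searcher.search(query)); they are not part of the original setup."""

    def __init__(self, open_searcher):
        self._searcher = open_searcher()
        self._lock = threading.Lock()  # conservative: serialize access

    def search(self, do_search, query):
        # acquire/try/finally instead of `with`, for Python 2.4 compatibility
        self._lock.acquire()
        try:
            return do_search(self._searcher, query)
        finally:
            self._lock.release()
```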

------- A note on our environment

We have two fast, dedicated, load-balanced servers running Windows Server 2003, with 4 GB of RAM and a fast RAID stripe, to run PyLucene exclusively. PyLucene is called from CherryPy.

The present index is 3.1 GB as a 10-directory index, or 2.8 GB as a single-directory index. We will likely implement a multi-directory index in anticipation of bumping up against OS-level file size restrictions once we implement the full data set; we anticipate an index in the area of 10+ GB. We have not been optimizing our indices as part of our tests: our single-folder index consists of 34 files.

For debugging purposes we run CherryPy in console mode, but the production release will run as a Windows service.

The software versions we use for the search web service are:

Python        version 2.4.4
CherryPy      version 2.1.0 (with threading modified to use PyLucene.PythonThread)
PyLucene GCJ  version 2.2.0-1, built on Windows 2000 with mingw/gcj 3.4.6 and
              Python 2.4.3, obtained from the PyLucene download page
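On the PythonThread point: our understanding is that with the GCJ build, any thread that touches Java-side objects should be created through PyLucene.PythonThread so that libgcj's collector can scan its stack, since plain threading.Thread stacks are invisible to it. Our wrapper is roughly the sketch below (the ImportError fallback is only there so the sketch runs where PyLucene is not installed):

```python
import threading

try:
    from PyLucene import PythonThread as WorkerThread  # GCJ builds
except ImportError:
    # Fallback so this sketch runs without PyLucene installed.
    WorkerThread = threading.Thread

def spawn(fn, *args):
    """Start fn on a thread whose stack the GCJ collector can scan."""
    t = WorkerThread(target=fn, args=args)
    t.start()
    return t
```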

Note that because our codebase is not compatible with Python 2.5, we can't
try the version of PyLucene that has been compiled with Python 2.5.


We would be grateful for any advice we can get. (Our gratitude could include beer contributions or other compensation if appropriate.) We'd also love to
hear from people who have NOT had this problem.
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
