On Tue, May 22, 2007 at 02:33:34PM +0200, Thomas Koch wrote:
> Hi,
>
> We're using PyLucene to create an index of our python-based database (for
> searching).
...
> Next we observed "GC Warning" messages during index creation :
> GC Warning: Repeated allocation of very large block (appr. size 1466368):
> May lead to memory leak and poor performance.
We had to rebuild GCJ 3.4.6 with LARGE_CONFIG defined to avoid this
message. I checked GCJ 4.2.0, and LARGE_CONFIG still doesn't seem to
be defined by default. The comment from 4.2.0's Makefile.direct still
reads:
"# -DLARGE_CONFIG tunes the collector for unusually large heaps.
# Necessary for heaps larger than about 500 MB on most machines.
# Recommended for heaps larger than about 64 MB.
"
It's possible I'm missing something about the 4.2 build process
which sets LARGE_CONFIG, of course.
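For what it's worth, here's roughly how one might define LARGE_CONFIG when rebuilding GCJ. This is a sketch, not a tested recipe: the version number and install prefix are illustrative, and the exact flags variable that reaches the bundled boehm-gc can vary between GCC releases (CFLAGS_FOR_TARGET is the usual one for target libraries):

```shell
# Sketch: rebuild GCJ with LARGE_CONFIG defined for the bundled boehm-gc.
# Paths/versions are examples; adjust for your setup.
tar xjf gcc-4.2.0.tar.bz2
mkdir gcc-build && cd gcc-build
../gcc-4.2.0/configure --prefix=/opt/gcj-large \
    --enable-languages=java \
    CFLAGS_FOR_TARGET="-O2 -DLARGE_CONFIG"
make bootstrap
make install
```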
Also, the "Large stack limit" message comes from
boehm-gc/solaris_threads.c in gcj, so that warning seems
Solaris-specific. You might be able to avoid it by setting your
maximum stack size lower than 8 MB with ulimit (was the number
reported 2 GB?)
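Concretely, something like this before launching the indexer (the script name is hypothetical; ulimit values are in KB for -s on most shells):

```shell
# Show the current soft stack limit (often "unlimited" or a KB value)
ulimit -s
# Cap the stack at 8 MB for this shell and any child processes
ulimit -s 8192
# Then start the indexing run; it inherits the lowered limit
python build_index.py
```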
> If there's any other way to get rid of the GC Warning (and memory leak) that
> would be of interest of course...
You could probably divide up your documents and index, say, 50K
documents in one process, exit, do the next 50K in a new process, and
so on, tuning the batch size as needed. Inelegant, but it would
probably work.
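A sketch of that batch-and-exit workaround in Python (index_batch.py is a hypothetical helper script that indexes the documents named on its command line; the point is just that each batch runs in a fresh process, so the GCJ heap is discarded between batches):

```python
import subprocess

def batches(items, size):
    """Yield successive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def index_all(doc_ids, batch_size=50000):
    for batch in batches(doc_ids, batch_size):
        # Each batch gets its own interpreter; when it exits, all memory
        # the collector accumulated goes back to the OS.
        subprocess.check_call(
            ["python", "index_batch.py"] + [str(d) for d in batch])
```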
Aaron Lav ([EMAIL PROTECTED])
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev