On Tue, May 22, 2007 at 02:33:34PM +0200, Thomas Koch wrote:
> Hi,
> 
> We're using PyLucene to create an index of our python-based database (for
> searching).

...

> Next we observed "GC Warning" messages during index creation :
>  GC Warning: Repeated allocation of very large block (appr. size 1466368):
>              May lead to memory leak and poor performance.

We had to rebuild GCJ 3.4.6 with LARGE_CONFIG defined to avoid this
message.  I checked GCJ 4.2.0, and LARGE_CONFIG still doesn't seem to
be defined by default.  The comment from 4.2.0's Makefile.direct still
reads:

"# -DLARGE_CONFIG tunes the collector for unusually large heaps.
 #   Necessary for heaps larger than about 500 MB on most machines.
 #   Recommended for heaps larger than about 64 MB.
"

It's possible I'm missing something about the 4.2 build process 
which sets LARGE_CONFIG, of course.
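For reference, a rough sketch of what such a rebuild might look like. The idea of passing the define through CFLAGS_FOR_TARGET (so it reaches the bundled boehm-gc) is an assumption about the gcc/gcj build machinery, as are the paths; you may instead need to edit boehm-gc's own Makefile.direct in your tree:

```shell
# Hypothetical sketch: rebuild gcj so its bundled boehm-gc is compiled
# with LARGE_CONFIG defined. Adjust versions, paths, and the mechanism
# for injecting the define to match your source tree.
tar xjf gcc-3.4.6.tar.bz2
mkdir build && cd build
../gcc-3.4.6/configure --enable-languages=java \
    --prefix=/opt/gcj-largeconfig
make CFLAGS_FOR_TARGET="-O2 -DLARGE_CONFIG"
make install
```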

Also, the "Large stack limit" message comes from
boehm-gc/solaris_threads.c in gcj, so that warning appears to be
Solaris-specific.  You might be able to avoid it by setting your
maximum stack size lower than 8M with ulimit (the number reported
looks like 2G?).
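Something along these lines, before launching the indexer (the script name is just a placeholder; the limit value is whatever works for your setup):

```shell
# Lower the soft stack limit for this shell; child processes inherit it.
ulimit -s          # print the current limit in KB (often 8192, i.e. 8M)
ulimit -s 4096     # cap the stack at 4M for subsequent commands
python build_index.py   # hypothetical indexing script
```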

> If there's any other way to get rid of the GC Warning (and memory leak) that
> would be of interest of course...

You could probably divide up your documents, and index, say, 50K in
one process, exit, do the next 50K in a new process, etc., tuning
the batch sizes as needed.  Inelegant, but it'd probably work.
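A minimal sketch of that batch-per-process approach, assuming a hypothetical "index_batch.py" script that wraps your actual PyLucene indexing code: each batch runs in a fresh interpreter, so whatever the GC has accumulated is discarded when the child exits.

```python
# Divide the documents into fixed-size batches and index each batch in
# its own child process.  "index_batch.py" and its (start, end) CLI are
# hypothetical stand-ins for your real indexing code.
import subprocess
import sys

def batches(n_docs, batch_size):
    """Yield (start, end) half-open ranges covering n_docs documents."""
    for start in range(0, n_docs, batch_size):
        yield start, min(start + batch_size, n_docs)

def index_all(n_docs, batch_size=50000):
    for start, end in batches(n_docs, batch_size):
        # Each batch gets a fresh interpreter; memory is returned to
        # the OS when the child exits, sidestepping the repeated
        # large-block allocations in one long-lived process.
        subprocess.run(
            [sys.executable, "index_batch.py", str(start), str(end)],
            check=True,
        )
```

Tune batch_size down if a single batch still triggers the warning.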

    Aaron Lav ([EMAIL PROTECTED])
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
