Hi! FYI: I also ran into memory problems with my indexing process, which indexes transaction logs where each transaction is its own document (= "many documents"). I run this on a 32-bit MS Server and I used the compiled JCC version. The process indexes each transaction log file (at most 500,000 documents) and closes the writers after each file is processed. When I monitored the process it never went above 1 GB. Some of the problems went away after increasing the max heap size, but I still think there is some kind of GC problem here. I have not yet tested the solutions suggested in this conversation, but I will. I am not using any filter.

At the moment I get memory problems after a couple of hours (I am able to index about 4-6 transaction log files per hour), so it is a bit disturbing. One workaround would be to index just one file, then end the process and start it again, over and over... but I would prefer not to resort to that.

/Fredrik Kant
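P.S. The per-file loop is roughly the following. This is a simplified sketch against the PyLucene 2.x API, not my real code; parse_transactions() and the field name are hypothetical placeholders:

# Simplified sketch of the per-file indexing loop (PyLucene 2.x API).
# parse_transactions() and the "contents" field are placeholders.
import lucene
lucene.initVM(lucene.CLASSPATH)

def index_log_file(path, store_dir):
    # One writer per transaction log file, closed before the next
    # file is processed.
    writer = lucene.IndexWriter(store_dir, lucene.StandardAnalyzer(), False)
    try:
        for txn in parse_transactions(path):  # hypothetical parser
            doc = lucene.Document()
            doc.add(lucene.Field("contents", txn,
                                 lucene.Field.Store.NO,
                                 lucene.Field.Index.TOKENIZED))
            writer.addDocument(doc)
    finally:
        writer.close()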
2008/1/9, Andi Vajda <[EMAIL PROTECTED]>:
>
> On Tue, 8 Jan 2008, Andi Vajda wrote:
>
> > On Wed, 9 Jan 2008, Brian Merrell wrote:
> >
> >> The thousands of _dumpRef() values/counts are almost all 1. I could
> >> create a histogram if it would be helpful.
> >> For comparison I took my filter out and noted that the references
> >> (len(myvm._dumpRef())) are quite stable (pretty constant around 250).
> >
> > Makes sense. It does more and more look like a leak in the generated
> > extension code. More on this in the next day or two.
>
> Thanks, no need for a histogram.
>
> Andi..
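As a starting point for my own testing I will try a check along the lines of what Brian describes. A rough sketch, assuming _dumpRef() returns one entry per live JCC reference, as the len(myvm._dumpRef()) usage above suggests:

# Rough sketch of the reference check, assuming _dumpRef() returns
# one entry per live JCC reference (as len(myvm._dumpRef()) above
# suggests).
import lucene
myvm = lucene.initVM(lucene.CLASSPATH)

def report_refs(label):
    refs = myvm._dumpRef()
    print "%s: %d live references" % (label, len(refs))

report_refs("before indexing pass")
# ... run one indexing pass, with and then without the filter ...
report_refs("after indexing pass")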
--
Fredrik Kant
Kant Consulting AB
Mobile: +46 70 787 06 01
www.kantconsulting.se
