# java -version
java version "1.6.0_03"
Java(TM) SE Runtime Environment (build 1.6.0_03-b05)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_03-b05, mixed mode)
# which java
/usr/bin/java

It doesn't seem to crash when I remove the filter. However, this may be
misleading, as I don't have nearly as many tokens (particularly unique
tokens) without the filter. The problem may still exist, with the symptoms
merely delayed. After 3000 documents I get len(myvm._dumpRefs()) == 12270,
and it seems to increase by about 4000 for each additional 1000 documents.

I didn't even realize C++ code was being generated. I doubt I can help
directly with this, but I would be happy to provide anything that would
help those more knowledgeable than I am debug it.

-brian

On 1/8/08, Andi Vajda <[EMAIL PROTECTED]> wrote:
>
> On Tue, 8 Jan 2008, Brian Merrell wrote:
>
> > Thanks for the quick reply. I haven't used Java in years, so my
> > apologies if I am not able to provide useful debug info without
> > some guidance.
> >
> > Memory does seem to be running low when it crashes. According to top,
> > python is using almost all of the 4GB when it bails.
>
> That may be misleading, because all the memory used belongs to the
> Python process, even Java's, since it is loaded via shared libraries
> into the Python process.
>
> > I don't know what Java VM I am using. How do I determine this?
>
> At the shell prompt enter: java -version
> For example, on my Mac, I get:
>
>   java version "1.5.0_13"
>   Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_13-b05-237)
>   Java HotSpot(TM) Client VM (build 1.5.0_13-119, mixed mode, sharing)
>
> Also, what does 'which java' return?
>
> > I will try calling gc.collect() and running optimize and see if that
> > helps. Any suggestions on how to debug _dumpRefs?
>
> _dumpRefs() returns a dict with Java objects as keys and their ref
> counts as values. If this dict is unusually large, something's amiss.
> What is "unusually large"? Time will tell :)
>
> > P.S. My filter is implemented in Python.
> > In fact, here is the code:
>
> Another thing to try (proceeding by elimination) is to index your
> documents without your custom filter. Does it still run out of memory?
> If the answer is no, clearly the Python filter integration code needs
> to be looked at closely (that is, the generated C++ for that code).
> Maybe something's leaking there.
>
> Andi..
> _______________________________________________
> pylucene-dev mailing list
> [email protected]
> http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
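[Editor's note: the snapshot-comparison approach discussed above (take `_dumpRefs()` before and after indexing a batch, then look at what grew) can be sketched with a small helper. This is a hypothetical sketch, not part of the thread's code: `ref_delta` is an invented name, and plain dicts stand in for the Java-object-to-refcount dict that JCC's VM returns from `_dumpRefs()`.]

```python
import gc


def ref_delta(before, after):
    """Compare two ref-count snapshots (key -> ref count).

    Returns a dict of keys whose count grew between the snapshots,
    mapped to the amount of growth (new keys count as growing from 0).
    A steadily growing result across batches suggests references are
    being retained rather than released.
    """
    return {k: after[k] - before.get(k, 0)
            for k in after
            if after[k] > before.get(k, 0)}


# Hypothetical usage against the thread's setup (myvm is the VM object
# whose _dumpRefs() is being inspected; not runnable without PyLucene):
#
#   before = dict(myvm._dumpRefs())
#   ... index the next 1000 documents ...
#   gc.collect()                   # drop Python-side garbage first, so
#                                  # only genuinely held refs remain
#   after = dict(myvm._dumpRefs())
#   print(len(ref_delta(before, after)), "refs grew or appeared")
```

Calling `gc.collect()` before each snapshot matters: without it, objects that Python would eventually free still appear in the snapshot and inflate the apparent leak.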
