On Thu, 17 Jan 2008, Brian Merrell wrote:

I recompiled the latest version from the trunk and this doesn't seem to fix
my problems with my bigram "BrianAnalyzer".  It still seems to increase
refcounts by about 3000 per 1000 documents.  It seems like a very simple
subclass with only a single instance.  I wasn't able to tell if it fell into
the first or the second class of problems you describe (apparently it falls
into the second class?).

Maybe there is another leak, still.

Could you please send me a few documents plus your code so that I can reproduce the problem? A few documents should be enough; I can adapt your code to index them over and over to simulate the number of documents you're using. If you can't send me the documents for procedural reasons, could you please try to reproduce the bug with, say, spam text, and send that to me instead?

In other words, what I'm looking for is an easy way to reproduce the problem so that I can focus on trying to fix it rather than trying to reproduce it.

Thanks !

Andi..


-brian

On 1/10/08, Andi Vajda <[EMAIL PROTECTED]> wrote:


In the past few days, several users reported symptoms of memory leaks in
jcc-PyLucene. After a bit of sleuthing, I found two leaks:

    1. A reference to a Python Java wrapper was leaked whenever an inherited
       method was called on the Java object from Python (in callSuper).
       Fixing this one was trivial and is checked in to the svn trunk.

       I don't know if this fixes the cases reported but it certainly has
       a major impact. For instance, any time searcher.search(query) is
       called, the searcher is leaked (!).

       To verify this, run:
      > python test/test_PyLucene.py Test_PyLuceneWithFSStore.test_searchDocuments -loop
       Without the fix, after a short while, the VM runs out of memory.
       With the fix, it seems to run forever (and speed remains more or
       less constant).

    2. A "deadly embrace" between a Python extension instance and its Java
       parent instance is currently preventing Python extension instances
       and their Java parents from ever being freed. The Python extension
       instance is holding a reference to the Java parent instance and the
       Java parent instance is holding a reference to the Python extension
       instance.
       Without some explicit intervention, this cross-VM cycle can't be
       broken. I'm currently thinking of making it possible to call
       finalize() on these objects manually to break the cycle. I'm also
       thinking of adding a GC thread to the process that would garbage
       collect the extension instances with no more than two counted Python
       refs. This thread, combined with weak global refs on the JNI side,
       could make collecting these Python extension instances
       semi-automatic. Needless to say, I don't like this idea too much and
       I'm trying to find another, less complicated way. In the meantime, I
       think I'm going to be adding support for the manual way via
       finalize() shortly.

       This leak (still in svn trunk) is not normally that bad, as Python
       extension instances are rarer than occurrences of the earlier leak
       and leaking them is, normally, not as deadly. Still, there are cases
       where it is bad, such as when implementing a Python extension for
       Directory and its sibling classes.
       To see for yourself, try running test/test_PythonDirectory.py in a
       loop. This leak is a problem in the current Chandler release, for
       example, where such a Python extension is used.

       More on this leak in the next few days.
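
The "deadly embrace" in item 2 can be illustrated with a pure-Python analogy. This is only a sketch: the class names JavaParent and PythonExtension are hypothetical stand-ins, the finalize() method here is not the jcc implementation, and Python's cycle collector is disabled to mimic a cross-VM cycle that neither collector can see. It relies on CPython reference counting.

```python
import gc
import weakref


class JavaParent:
    """Hypothetical stand-in for the Java-side parent object."""

    def __init__(self):
        self.extension = None


class PythonExtension:
    """Hypothetical stand-in for a Python extension instance."""

    def __init__(self):
        self.parent = JavaParent()
        self.parent.extension = self  # back-reference closes the cycle

    def finalize(self):
        # Break the cycle by hand by dropping the back-reference.
        self.parent.extension = None


def survives_del(call_finalize):
    """Return True if the instance is still alive after its last name is deleted."""
    gc.disable()  # assumption: mimics a cycle that no collector can detect
    try:
        obj = PythonExtension()
        probe = weakref.ref(obj)
        if call_finalize:
            obj.finalize()
        del obj
        return probe() is not None
    finally:
        gc.enable()
        gc.collect()  # clean up the cycle leaked by this demonstration


if __name__ == "__main__":
    print("alive without finalize():", survives_del(call_finalize=False))
    print("alive with finalize():   ", survives_del(call_finalize=True))
```

Without the explicit finalize() call the two objects keep each other alive through reference counts alone, which is the situation described above for the Python extension instance and its Java parent.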


In order to debug these leaks, I improved env._dumpRefs() a bit by adding
some keywords to it.

_dumpRefs() now can be called in three ways:

    - _dumpRefs(): returns a list of tuples (system hash id, ref count).
      These are useful for quickly getting an idea of how many global Java
      referenced objects there are at the moment (these objects are not
      GC'ed by Java until removed from the refs table).

    - _dumpRefs(classes=True): returns a dict of { className: instance count }
      to get an idea about how many instances of various classes are being
      thus kept from being GC'ed by Java.

    - _dumpRefs(values=True): returns a list of tuples (value string, ref
      count) to get an idea of what the values look like. This is to be
      used with caution, as printing out Java values can be expensive.
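
One possible way to use the classes=True form when hunting a leak is to compare snapshots taken before and after each batch of work. The sketch below is an assumption-laden helper, not part of jcc: class_count_growth and report_growth are hypothetical names, and they only rely on the snapshot being a dict of { className: instance count } as described above.

```python
def class_count_growth(before, after):
    """Return {className: delta} for classes whose instance count grew."""
    growth = {}
    for name, count in after.items():
        delta = count - before.get(name, 0)
        if delta > 0:
            growth[name] = delta
    return growth


def report_growth(take_snapshot, run_batch, batches=3):
    """Call run_batch() repeatedly and return the growth seen after each batch.

    take_snapshot is any callable returning a {className: count} dict.
    """
    reports = []
    before = dict(take_snapshot())
    for _ in range(batches):
        run_batch()
        after = dict(take_snapshot())
        reports.append(class_count_growth(before, after))
        before = after
    return reports


# Hypothetical usage, assuming `env` is the jcc environment object and
# index_batch is your own function that indexes, say, 1000 documents:
#   report_growth(lambda: env._dumpRefs(classes=True), index_batch)
```

A class whose count grows by a steady amount in every batch is a likely candidate for the leaked references.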

It would be interesting to see if the people who recently reported memory
leak symptoms on this list could try the current trunk and report if that
solves their problem or at least, improves on it.
If you are re-building PyLucene from the trunk to try this out, be sure to
completely rebuild jcc first (the fix is there).

Thank you for your patience !

Andi..
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev

