On Mon, 21 Jan 2008, anurag uniyal wrote:
It does solve the problem for custom analyzers and parsers etc.,
but my code with custom filters still goes out of memory.
In the code below, if I comment out the 'result = MyFilter(result)' line,
it works.
I don't seem to be able to reproduce this. It's working fine for me. I even
increased the loop to 1,000,000. Monitoring the process, its size
remains constant too.
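In case it's useful on your side, one simple way to watch the process size from
inside the loop itself, rather than with an external tool (just an illustration;
the resource module is Unix-only, and ru_maxrss is reported in kilobytes on
Linux but bytes on Mac OS X):

import resource

def print_memory(i):
    # peak resident set size of this process so far
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print i, rss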
Maybe the __del__() method is causing trouble ?
But I left it in and all seemed fine for me.
So, what's different here ?
- did you rebuild JCC ?
- did you rebuild PyLucene ? (what's lucene.VERSION returning ?)
- what version of Python are you using ?
- on what OS ?
- what version of Java ?
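A quick way to answer most of the version questions above from the interpreter
itself, just as a suggestion:

import sys, platform
import lucene

print lucene.VERSION        # version reported by the PyLucene bindings
print sys.version           # Python interpreter version
print platform.platform()   # OS / platform string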
Andi..
----
import lucene
lucene.initVM(lucene.CLASSPATH, maxheap='1m')
from lucene import (Token, PythonAnalyzer, PythonTokenStream,
                    StandardTokenizer, LowerCaseFilter)
from lia.analysis.AnalyzerUtils import AnalyzerUtils

class MyFilter(PythonTokenStream):
    count = 0
    filters = []

    def __init__(self, tokenStream):
        super(MyFilter, self).__init__()
        self.input = tokenStream
        MyFilter.count += 1
        self.id = MyFilter.count
        MyFilter.filters.append(self.id)

    def next(self):
        return self.input.next()

    def __del__(self):
        #self.input = None
        MyFilter.filters.remove(self.id)

class MyAnalyzer(PythonAnalyzer):
    def __init__(self):
        super(MyAnalyzer, self).__init__()

    def tokenStream(self, fieldName, reader):
        result = StandardTokenizer(reader)
        result = LowerCaseFilter(result)
        # my filtering
        result = MyFilter(result)
        return result

text = 'TESTING the TESTS'
analyzer = MyAnalyzer()
try:
    for i in xrange(10000):
        if i % 100 == 0: print i
        tokens = AnalyzerUtils.tokensFromAnalysis(analyzer, text)
except lucene.JavaError, e:
    print i, e

print "%s MyFilter remain:" % len(MyFilter.filters)
print MyFilter.filters
-----
rgds
Anurag
----- Original Message ----
From: Andi Vajda <[EMAIL PROTECTED]>
To: [email protected]
Sent: Sunday, 20 January, 2008 7:18:34 AM
Subject: Re: [pylucene-dev] finalizing the deadly embrace
On Thu, 17 Jan 2008, Andi Vajda wrote:
Thinking about this some more, I believe that Anurag's finalizer proxy idea
is on the right track. It provides the "trick" needed to break the deadly
embrace when the ref count of the Python object is down to 1, that is, when
the only reference left is the one from the Java parent wrapper.
When the finalizer proxy's refcount goes to zero, it is safe to assume that
only Java _may_ still be needing the object. This is enough then to replace
the strong global reference to the Java parent wrapper with a weak global
reference thereby breaking the deadly embrace and letting Java garbage
collect it when its time has come. When that time comes, the Java garbage
collector calls the finalize() method on it, the Python ref count of the
Python extension instance is brought to zero, and the object is finally
freed.
This assumes, of course, that when such an extension object is instantiated,
the finalizer proxy is actually returned.
I should be able to implement this in C/C++ so that the performance hit is
minimal and in a way that is transparent to PyLucene users.
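To make the idea more concrete, here is a rough pure-Python analogy of that
finalizer proxy. It is only an illustration: the real mechanism lives in JCC's
C/C++ layer and swaps a JNI global reference for a weak global reference, and
JavaWrapper, FinalizerProxy and downgrade_to_weak() are invented names for
this sketch.

import weakref

class JavaWrapper(object):
    """Stands in for the Java-side parent that keeps the extension alive."""
    def __init__(self, extension):
        self.extension = extension              # "strong global reference"
    def downgrade_to_weak(self):
        # Swap the strong reference for a weak one so the collector can
        # reclaim the pair once nothing else needs it.
        self.extension = weakref.ref(self.extension)

class FinalizerProxy(object):
    """What user code holds instead of the extension object itself."""
    def __init__(self, extension, wrapper):
        self._extension = extension
        self._wrapper = wrapper
    def __getattr__(self, name):
        # Forward attribute access to the wrapped extension object.
        return getattr(self._extension, name)
    def __del__(self):
        # The last user-level reference is gone: only the Java parent may
        # still need the extension, so break the deadly embrace.
        self._wrapper.downgrade_to_weak()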
I checked the implementation of this idea into svn trunk rev 381.
It is no longer necessary to call finalize() by hand :)
I removed the finalize() calls from test_PythonDirectory.py, and test_Sort.py
can now be run forever without any leakage.
It is necessary to rebuild both JCC and PyLucene to try this out.
I'd be curious to see if this solves your problem, Brian ?
Andi..
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev