First off, thanks for PyLucene -- it totally rocks!

You're welcome !

I've been working on a Python script that uses Lucene to look up text in an index -- it works great, but it ran my machine out of memory in an all-night test. :-( After a little bit of digging around, I've come up with this little Python program that duplicates the memory leak:

#!/usr/bin/env python

import PyLucene
from stringreader import StringReader

analyzer = PyLucene.StopAnalyzer()
while True:
    query = u"any old text here will cause a leak"
    for token in query.split(u' '):
        # wrap each word in a Reader and run it through the analyzer
        stream = analyzer.tokenStream("", StringReader(token))
        while stream.next():
            pass

I know that my use of the analyzer is a bit strange, but I want to examine which words get tossed as stop words, and I need to correlate tokenized Lucene queries with their non-tokenized query strings.

Is there something I need to do that I am not doing?

Well, to find out what the stop words are in the StopAnalyzer, you could check the StopAnalyzer.ENGLISH_STOP_WORDS constant instead.
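That avoids running the analyzer in a loop at all: you can compare the query words against the stop-word set directly. A minimal sketch (the hard-coded set below is a small stand-in, not Lucene's actual English stop-word list; with PyLucene you'd use the real constant instead):

```python
# Hypothetical subset standing in for StopAnalyzer's English stop words.
ENGLISH_STOP_WORDS = {u"a", u"an", u"and", u"the", u"will", u"here"}

def partition_stop_words(query):
    """Split a query into (kept, tossed) word lists, no analyzer needed."""
    kept, tossed = [], []
    for word in query.lower().split():
        (tossed if word in ENGLISH_STOP_WORDS else kept).append(word)
    return kept, tossed

kept, tossed = partition_stop_words(u"any old text here will cause a leak")
```

This gives you the tossed words for correlating tokenized and raw query strings without allocating a Reader and token stream per word.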


But that wouldn't solve the Reader leak :)
I don't see anything wrong in the code snippet you sent.
I guess the PythonReader C++/Java class in cpp/PythonIO.cpp needs to be checked for leaks. Looking....
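In the meantime, a possible workaround is to close the stream and reader explicitly instead of relying on garbage collection across the C++/Python boundary. Whether PyLucene's token streams expose close() exactly like their Java counterparts is an assumption here; the TokenStream class below is a pure-Python stand-in just to show the pattern:

```python
from io import StringIO

class TokenStream(object):
    """Stand-in for a PyLucene token stream: yields tokens, then None."""
    def __init__(self, reader):
        self.tokens = reader.read().split()
        self.closed = False
    def next(self):
        return self.tokens.pop(0) if self.tokens else None
    def close(self):
        self.closed = True

def drain(query):
    """Tokenize each word, always releasing stream and reader explicitly."""
    for word in query.split(u' '):
        reader = StringIO(word)
        stream = TokenStream(reader)
        try:
            while stream.next():
                pass
        finally:
            # don't rely on GC to free the wrapped Reader
            stream.close()
            reader.close()
```

The try/finally guarantees the close calls run even if iterating the stream raises, which is the usual discipline for Reader-backed streams on the Java side.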


Andi..
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
