On Wed, 14 Jun 2006, Robert Kaye wrote:
Hi!
I am upgrading the MusicBrainz searching functionality from Lucene 1.4.x to
1.9.x and my command line tools for creating my indexes work peachy. But when
I load the searching scripts into apache2 with mod_python, and I try to
create my custom analyzer, the creation of that custom analyzer just plain
hangs. Never returns. I suspect that my custom analyzers are somehow at fault
and I hope that someone here can shed some light on what I am doing wrong.
First, some info:
- Linux 2.6.15, ubuntu dapper drake
- Python 2.4.3
- gcc/gcj/g++ 3.4.6 compiled from source by gcc 3.4.6 since gcj 3.4 is not
part of dapper and I distrust gcc 4.0.x
- PyLucene 1.9.1, without DB support, compiled by my own gcc/gcj
- apache 2.0.58 (compiled with my own gcc)
- mod_python 3.2.8 (also by my own gcc)
Here is my mod_python handler:
=====
from mod_python import apache, util
import analyzer

def handler(req):
    a = analyzer.ArtistAnalyzer()
=====
The call to creating the ArtistAnalyzer never returns. Run as a standalone
script, it works just fine.
My analyzer.py looks like this:
=====
import PyLucene

class NoStopStandardAnalyzer(object):
    def tokenStream(self, fieldName, reader):
        res = PyLucene.StandardTokenizer(reader)
        res = PyLucene.LowerCaseFilter(res)
        return PyLucene.ISOLatin1AccentFilter(res)

class ArtistAnalyzer(PyLucene.PerFieldAnalyzerWrapper):
    def __init__(self):
        PyLucene.PerFieldAnalyzerWrapper.__init__(self,
                                                  NoStopStandardAnalyzer())
        self.addAnalyzer("arid", PyLucene.KeywordAnalyzer())
        self.addAnalyzer("p_artist", PyLucene.KeywordAnalyzer())
=====
If I use the StandardAnalyzer as a default analyzer for the
PerFieldAnalyzerWrapper, everything works as expected. Whenever my custom
analyzer gets created in mod_python, it grinds to a halt but the CPU never
gets pegged.
Any ideas what I might be doing wrong? Anything I have overlooked? I figured
I can't get more paranoid about the compilers than compiling gcj by hand and
then building everything with it. Alas that didn't yield any results. :-(
Any python thread running in this process accessing PyLucene needs to be known
to libgcj's garbage collector. In other words, any python thread using
PyLucene code (which calls into libgcj) needs to be an instance of
PyLucene.PythonThread which does the right thing in setting it up via libgcj.
How that is done under mod_python, I don't know. But failure to do so will
crash, hang, or otherwise act unhappy as soon as any java memory is allocated.
This question has been asked many times before on this list already but I
don't remember seeing any more practical answer on how this is actually done.
Andi..
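
One possible way to apply the advice above, as an untested sketch and not a
confirmed recipe for mod_python: do all PyLucene work inside a helper thread
and have the request thread wait for it. The helper name run_in_thread is
hypothetical, and the sketch assumes that PyLucene.PythonThread accepts the
same target= argument as threading.Thread. It defaults to threading.Thread
only so that it can be tried without PyLucene installed.

```python
import sys
import threading


def run_in_thread(func, thread_class=threading.Thread):
    """Run func() in a new thread of thread_class and return its result.

    Hypothetical helper. Under mod_python one would pass
    PyLucene.PythonThread as thread_class (assumption: it takes the same
    target= argument as threading.Thread).
    """
    outcome = {}

    def target():
        try:
            outcome["result"] = func()
        except Exception:
            # Keep the exception so the calling thread can re-raise it.
            outcome["error"] = sys.exc_info()[1]

    worker = thread_class(target=target)
    worker.start()
    worker.join()  # the calling thread blocks until the work is done

    if "error" in outcome:
        raise outcome["error"]
    return outcome.get("result")
```

In the handler above this might be called as
run_in_thread(do_search, PyLucene.PythonThread), where do_search is a
hypothetical function that creates the analyzer and runs the whole query.
Everything that touches PyLucene objects would have to happen inside that
function, since the request thread itself is still unknown to libgcj.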