Sorry for the delay. I have a concise test case now. See below for inline comments. Code is at the bottom.
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Andi Vajda
> Sent: Monday, March 26, 2007 9:00 PM
>
> On Mon, 26 Mar 2007, Ofer Nave wrote:
> > Ever since I started using a custom Analyzer and TokenFilter, my index
> > build script keeps crashing. Usually it just freezes at a random
> > point, and won't even respond to ctrl-c (I have to use kill -9 in
> > another terminal). One time it ended with: 'Fatal Python error: This
> > thread state must be current when releasing'. One time it finished
> > successfully (out of about 20 attempts). This is from repeated runs
> > without changing any code.
>
> If you submit a piece of code that reproduces the problem, I
> can take a look at it (best would be something like a unit
> test, see PyLucene/test).

I haven't had time to look at the unit testing framework, but the code is simple and runs standalone.

> Also, what is your OS ? did you build PyLucene yourself ? If
> so, which gcj ?
> Does 'make test' pass ? What is your version of Python ?

OS: Linux 2.6.9
Python: 2.3.4
The Lucene/PyLucene versions are included in the sample output below.
I believe the admin compiled PyLucene from source. The box has gcj version 3.4.5 20051201.
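For reference, this is what TermJoinTokenFilter in the sample code is meant to emit: every original token, plus one token joining each adjacent pair at position increment 0. The sketch below simulates that state machine in plain Python with no PyLucene involved. It is only an illustration under stated assumptions: tokens are modelled as (text, positionIncrement) tuples rather than PyLucene.Token objects, and ListTokenStream, TermJoinSimulation and drain are hypothetical names, not PyLucene or Lucene classes. The input tokens assume StandardAnalyzer lowercases and drops 'to' and 'the' as English stop words.

```python
# PyLucene-free simulation of TermJoinTokenFilter's branching logic.
# Assumption: tokens are (text, position_increment) tuples instead of
# PyLucene.Token objects; None marks end of stream, as in the filter itself.


class ListTokenStream(object):
    """Hypothetical stand-in for a token stream: returns list items, then None."""

    def __init__(self, texts):
        # Every original token advances the position by 1.
        self._tokens = [(t, 1) for t in texts]

    def next(self):
        if self._tokens:
            return self._tokens.pop(0)
        return None


class TermJoinSimulation(object):
    """Same branching as TermJoinTokenFilter.next(), applied to tuples."""

    def __init__(self, tokenStream):
        self.tokenStream = tokenStream
        self.a = None  # previously emitted original token
        self.b = None  # token just joined onto self.a

    def next(self):
        if self.a:
            # Emitted prev last time: read next, emit prev + next at the same position.
            self.b = self.tokenStream.next()
            if self.b is None:
                return None
            joined = (self.a[0] + self.b[0], 0)
            self.a = None
            return joined
        elif self.b:
            # Emitted prev + next last time: emit next and make it the new prev.
            self.a = self.b
            self.b = None
            return self.a
        else:
            # First call: emit the first token and remember it as prev.
            self.a = self.tokenStream.next()
            return self.a


def drain(stream):
    """Collect tokens until the stream signals its end with None."""
    out = []
    while True:
        tok = stream.next()
        if tok is None:
            return out
        out.append(tok)


# Roughly what StandardAnalyzer yields for 'Hail To The Thief' (assumed above).
print(drain(TermJoinSimulation(ListTokenStream(['hail', 'thief']))))
# prints [('hail', 1), ('hailthief', 0), ('thief', 1)]
print(drain(TermJoinSimulation(ListTokenStream([]))))
# prints []
```

Run standalone, this logic terminates normally for both a two-token stream and an empty one, which at least suggests the joining logic by itself is not what hangs or aborts.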
Sample code:
---
#!/usr/bin/python
import sys
import PyLucene
def main():
    print 'PyLucene', PyLucene.VERSION, 'Lucene', PyLucene.LUCENE_VERSION
    data = dict(album='Hail To The Thief', artist='Radiohead', ASIN='B000092ZYX')
    directory = '/tmp/crash'
    store = PyLucene.FSDirectory.getDirectory(directory, True)
    # store = PyLucene.RAMDirectory()
    # analyzer = PyLucene.StandardAnalyzer()
    analyzer = TermJoinAnalyzer()
    writer = PyLucene.IndexWriter(store, analyzer, True)
    docs = 0
    while True:
        doc = PyLucene.Document()
        doc.add(PyLucene.Field('album', data['album'], PyLucene.Field.Store.YES, PyLucene.Field.Index.TOKENIZED))
        doc.add(PyLucene.Field('artist', data['artist'], PyLucene.Field.Store.YES, PyLucene.Field.Index.TOKENIZED))
        doc.add(PyLucene.Field('ASIN', data['ASIN'], PyLucene.Field.Store.YES, PyLucene.Field.Index.UN_TOKENIZED))
        writer.addDocument(doc)
        docs += 1
        if docs % 5000 == 0:
            print docs

class TermJoinTokenFilter(object):
    TOKEN_TYPE_JOINED = "JOINED"
    def __init__(self, tokenStream):
        self.tokenStream = tokenStream
        self.a = None
        self.b = None
    def __iter__(self):
        return self
    def next(self):
        if self.a: # emitted prev last time - need to set next, emit prev + next, and reset prev to None
            self.b = self.tokenStream.next()
            if self.b is None:
                return None
            joined = PyLucene.Token(self.a.termText() + self.b.termText(), self.a.startOffset(), self.a.endOffset(), self.TOKEN_TYPE_JOINED)
            joined.setPositionIncrement(0)
            self.a = None
            return joined
        elif self.b: # emitted prev + next last time - need to emit next, set prev to next, and reset next to None
            self.a = self.b
            self.b = None
            return self.a
        else: # first call ever - set prev to first token and emit first token
            self.a = self.tokenStream.next()
            return self.a

class TermJoinAnalyzer(object):
    def __init__(self, analyzer=PyLucene.StandardAnalyzer()):
        self.analyzer = analyzer
    def tokenStream(self, fieldName, reader):
        tokenStream = self.analyzer.tokenStream(fieldName, reader)
        filter = TermJoinTokenFilter(tokenStream)
        return tokenStream.tokenFilter(filter)

main()
---

It builds an index in /tmp/crash. You can change the path or, to avoid touching the disk, switch which Directory instantiation line is commented out.

It uses my TermJoinAnalyzer class to demonstrate the crash. To see the same code run fine with StandardAnalyzer, switch which Analyzer instantiation line is commented out.

I ran it with TermJoinAnalyzer three times, and all three times it crashed within seconds, with three different errors, no less. :) When I ran it with StandardAnalyzer, it worked fine for several minutes before I killed it.

Here's the output from the three crashes:
---
[EMAIL PROTECTED] ~/proj/search/trunk]$ bin/tmp.py
PyLucene 2.1.0-1 Lucene 2.1.0-509013
5000
10000
15000
20000
25000
Fatal Python error: auto-releasing thread-state, but no thread-state for this thread
Aborted
[EMAIL PROTECTED] ~/proj/search/trunk]$ bin/tmp.py
PyLucene 2.1.0-1 Lucene 2.1.0-509013
5000
10000
15000
20000
25000
30000
35000
Fatal Python error: This thread state must be current when releasing
Aborted
[EMAIL PROTECTED] ~/proj/search/trunk]$ bin/tmp.py
PyLucene 2.1.0-1 Lucene 2.1.0-509013
5000
10000
Traceback (most recent call last):
  File "bin/tmp.py", line 57, in ?
    main()
  File "bin/tmp.py", line 19, in main
    writer.addDocument(doc)
PyLucene.JavaError: java.lang.NullPointerException
---

-ofer
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
