On Mon, 26 Mar 2007, Ofer Nave wrote:

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Ofer Nave
Sent: Monday, March 26, 2007 5:01 PM

I reimplemented my FooAnalyzer using this pattern and it
works now.  I still don't know why, but at least it works. :)

Ever since I started using a custom Analyzer and TokenFilter, my index build
script keeps crashing.  Usually it just freezes at a random point and won't
even respond to Ctrl-C (I have to use kill -9 from another terminal).  Once
it ended with: 'Fatal Python error: This thread state must be current
when releasing'.  Once it finished successfully (out of about 20
attempts).  All of these runs used the same code, with no changes in between.

If you submit a piece of code that reproduces the problem, I can take a look
at it (a unit test would be best, see PyLucene/test).

Also, what is your OS? Did you build PyLucene yourself? If so, with which gcj?
Does 'make test' pass? What is your version of Python?
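
The OS and Python details asked for above can be collected in one go. A
minimal sketch using only the standard library; the helper name
environment_report is made up for illustration, and the gcj version and the
'make test' result still have to be reported by hand:

```python
import platform
import sys


def environment_report():
    """Collect the OS and Python details that are useful in a bug report."""
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
    }


# Print one "key: value" line per detail, in a stable order.
for key, value in sorted(environment_report().items()):
    print("%s: %s" % (key, value))
```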

Andi..


I'm not creating any threads.  It's a plain Python script, with no Apache or
web stuff involved.  The only change has been the custom analyzer and
token filter.

For reference:

---
import PyLucene

class TermJoinTokenFilter(object):

   TOKEN_TYPE_JOINED = "JOINED"

   def __init__(self, tokenStream):
       self.tokenStream = tokenStream
       self.a = None
       self.b = None

   def __iter__(self):
       return self

   def next(self):
       if self.a:
           # Emitted prev last time: fetch next, emit prev + next,
           # and reset prev to None.
           self.b = self.tokenStream.next()
           if self.b is None:
               return None
           joined = PyLucene.Token(self.a.termText() + self.b.termText(),
                                   self.a.startOffset(), self.a.endOffset(),
                                   self.TOKEN_TYPE_JOINED)
           joined.setPositionIncrement(0)
           self.a = None
           return joined
       elif self.b:
           # Emitted prev + next last time: emit next, set prev to next,
           # and reset next to None.
           self.a = self.b
           self.b = None
           return self.a
       else:
           # First call ever: set prev to the first token and emit it.
           self.a = self.tokenStream.next()
           return self.a

class TermJoinAnalyzer(object):

   def __init__(self, analyzer=PyLucene.StandardAnalyzer()):
       self.analyzer = analyzer

   def tokenStream(self, fieldName, reader):
       tokenStream = self.analyzer.tokenStream(fieldName, reader)
       filter = TermJoinTokenFilter(tokenStream)
       return tokenStream.tokenFilter(filter)
---
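
The order in which the filter above emits tokens can be checked without
PyLucene. This is only a sketch: FakeToken, FakeTokenStream, JoinLogic and
drain are hypothetical stand-ins, not PyLucene classes, and offsets, token
types and position increments are left out. It exercises the branching logic
only and says nothing about the freeze itself.

```python
class FakeToken(object):
    """Hypothetical stand-in for PyLucene.Token; only termText() is modeled."""

    def __init__(self, text):
        self.text = text

    def termText(self):
        return self.text


class FakeTokenStream(object):
    """Hypothetical stand-in for a token stream; next() returns None at the end."""

    def __init__(self, words):
        self.tokens = [FakeToken(w) for w in words]

    def next(self):
        if self.tokens:
            return self.tokens.pop(0)
        return None


class JoinLogic(object):
    """Same three-branch state machine as TermJoinTokenFilter.next()."""

    def __init__(self, tokenStream):
        self.tokenStream = tokenStream
        self.a = None
        self.b = None

    def next(self):
        if self.a:
            # prev was emitted last time: emit prev + next
            self.b = self.tokenStream.next()
            if self.b is None:
                return None
            joined = FakeToken(self.a.termText() + self.b.termText())
            self.a = None
            return joined
        elif self.b:
            # prev + next was emitted last time: emit next, which becomes prev
            self.a = self.b
            self.b = None
            return self.a
        else:
            # first call: emit the first token
            self.a = self.tokenStream.next()
            return self.a


def drain(stream):
    """Call next() until it returns None and collect the term texts."""
    out = []
    token = stream.next()
    while token is not None:
        out.append(token.termText())
        token = stream.next()
    return out


print(drain(JoinLogic(FakeTokenStream(["new", "york", "city"]))))
# prints ['new', 'newyork', 'york', 'yorkcity', 'city']
```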

-ofer

_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
