> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED]] On Behalf Of Ofer Nave
> Sent: Monday, March 26, 2007 5:01 PM
> 
> I reimplemented my FooAnalyzer using this pattern and it 
> works now.  I still don't know why, but at least it works. :)

Ever since I started using a custom Analyzer and TokenFilter, my index build
script keeps crashing.  Usually it just freezes at a random point, and won't
even respond to ctrl-c (I have to use kill -9 in another terminal).  One
time it ended with: 'Fatal Python error: This thread state must be current
when releasing'.  One time it finished successfully (out of about 20
attempts).  This is from repeated runs without changing any code.

I'm not creating any threads.  It's a straight Python script; no Apache or
web stuff is involved.  The only change has been the custom analyzer and
token filter.

For reference:

---
import PyLucene


class TermJoinTokenFilter(object):

    TOKEN_TYPE_JOINED = "JOINED"

    def __init__(self, tokenStream):
        self.tokenStream = tokenStream
        self.a = None  # prev token
        self.b = None  # next token

    def __iter__(self):
        return self

    def next(self):
        if self.a:
            # emitted prev last time - need to set next, emit prev + next,
            # and reset prev to None
            self.b = self.tokenStream.next()
            if self.b is None:
                return None
            # the joined token spans from the start of prev to the end of next
            joined = PyLucene.Token(self.a.termText() + self.b.termText(),
                                    self.a.startOffset(), self.b.endOffset(),
                                    self.TOKEN_TYPE_JOINED)
            joined.setPositionIncrement(0)
            self.a = None
            return joined
        elif self.b:
            # emitted prev + next last time - need to emit next, set prev
            # to next, and reset next to None
            self.a = self.b
            self.b = None
            return self.a
        else:
            # first call ever - set prev to first token and emit it
            self.a = self.tokenStream.next()
            return self.a

class TermJoinAnalyzer(object):

    def __init__(self, analyzer=None):
        # build the default analyzer per instance instead of once at
        # class-definition time
        self.analyzer = analyzer or PyLucene.StandardAnalyzer()

    def tokenStream(self, fieldName, reader):
        tokenStream = self.analyzer.tokenStream(fieldName, reader)
        joinFilter = TermJoinTokenFilter(tokenStream)
        return tokenStream.tokenFilter(joinFilter)
---
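
In case it helps anyone follow the intent without PyLucene in the loop,
here's a plain-Python sketch of the same prev/next state machine with
strings standing in for Token objects (join_terms is an illustrative
name, not anything from PyLucene):

---
# Sketch of the filter's intended emission order, using strings in
# place of PyLucene Tokens.  Illustrative only; in the real filter the
# joined forms carry position increment 0 and type JOINED.

def join_terms(terms):
    prev = None
    for term in terms:
        if prev is None:
            yield term             # first token: emit as-is
        else:
            yield prev + term      # joined pair (posIncr 0 upstream)
            yield term             # then the plain token
        prev = term

print list(join_terms(["new", "york", "city"]))
# -> ['new', 'newyork', 'york', 'yorkcity', 'city']
---

Running that off to the side gives the sequence I expect, which makes me
suspect the bridging into PyLucene rather than the join logic itself.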

-ofer
