On Wed, 22 Nov 2006, Brian Whitman wrote:

I am attempting to create a sentence splitting fragmenter in PyLucene.
I see that Fragmenter is not a class but an interface definition.
Is there a way to create a new Fragmenter type that PyLucene can access via

highlighter.setTextFragmenter(SentenceFragmenter()) ?

I have tried to create a Python class that looks like

class SentenceFragmenter:
    def start(self, text):
        # stuff
        pass
    def isNewFragment(self, token):
        # stuff
        pass

but I get an InvalidArgsError in the setTextFragmenter call.

For that to work, there has to be an extension point. At the moment, there is
no extension point for Fragmenter. I can certainly add one.

An extension point is a wrapper in reverse: it's a Java class that extends a Lucene class or implements a Lucene interface, and whose implementation methods are native methods that invoke a wrapped Python implementation.
Currently, the only extension point available for the highlighter package is
for Formatter (see cpp/PythonHighlight.cpp and its uses in lucene.cpp).
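
As a rough illustration of the "wrapper in reverse" idea, here is a pure-Python sketch of the delegation pattern. It is not PyLucene code: FragmenterExtensionPoint is a hypothetical stand-in for the Java-side class, and EveryThirdToken is a toy implementation made up for the example. In the real thing the forwarding would be done by native methods rather than in Python.

```python
class FragmenterExtensionPoint:
    """Hypothetical stand-in for the Java-side class implementing Fragmenter."""

    def __init__(self, py_impl):
        # Fail early if the wrapped object lacks the Fragmenter methods.
        for name in ("start", "isNewFragment"):
            if not callable(getattr(py_impl, name, None)):
                raise TypeError("missing Fragmenter method: %s" % name)
        self._py_impl = py_impl

    def start(self, text):
        # Forward the call to the wrapped Python object.
        self._py_impl.start(text)

    def isNewFragment(self, token):
        # Forward the call and coerce the result to a boolean.
        return bool(self._py_impl.isNewFragment(token))


class EveryThirdToken:
    """Toy implementation: start a new fragment at every third token."""

    def start(self, text):
        self.count = 0

    def isNewFragment(self, token):
        self.count += 1
        return self.count % 3 == 0


ext = FragmenterExtensionPoint(EveryThirdToken())
ext.start("a b c d e f")
flags = [ext.isNewFragment(tok) for tok in "a b c d e f".split()]
```

The highlighter would only ever see the wrapper, which is why passing a plain Python object that merely has the right method names is rejected today.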

Java Lucene has a number of well-known extension points, and many of them are implemented by PyLucene. Since they are a fair amount of work to make, I only implemented the obvious ones or the ones used by Java Lucene samples, such as the ones in the "Lucene in Action" book ported to PyLucene.

I'm open to adding new ones as people find they are blocked by missing them for their use cases. Patches are also welcome...

I should have a Fragmenter extension point later today...
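
For reference, a minimal sketch of what the Python side of a sentence-splitting fragmenter might look like once such an extension point exists. It assumes the token passed to isNewFragment exposes startOffset() and endOffset() as in Java Lucene. FakeToken is a hypothetical stand-in used only so the sketch runs without PyLucene.

```python
import re


class SentenceFragmenter:
    """Starts a new fragment at the first token after a sentence end."""

    _SENTENCE_END = re.compile(r"[.!?]")

    def start(self, text):
        self.text = text
        self.prev_end = 0

    def isNewFragment(self, token):
        # Look at the text between the previous token and this one.
        gap = self.text[self.prev_end:token.startOffset()]
        self.prev_end = token.endOffset()
        return bool(self._SENTENCE_END.search(gap))


class FakeToken:
    """Hypothetical stand-in for a Lucene Token, for demonstration only."""

    def __init__(self, start, end):
        self._start, self._end = start, end

    def startOffset(self):
        return self._start

    def endOffset(self):
        return self._end


text = "Hello world. How are you? Fine."
tokens = [FakeToken(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
fragmenter = SentenceFragmenter()
fragmenter.start(text)
# Collect the words at which a new fragment begins.
breaks = [text[t.startOffset():t.endOffset()]
          for t in tokens if fragmenter.isNewFragment(t)]
```

The punctuation test is deliberately naive: abbreviations such as "e.g." would trigger a split, so a real fragmenter would likely want a better sentence detector.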

Andi..
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
