Hello,

I need to work with an external stemmer, which is accessible as a COM object. I managed to integrate it using the com4j library. I tried two scenarios:

1. Create a custom FilterFactory and Filter class. The external stemmer is then invoked for every token.
2. Create a custom TokenizerFactory that invokes the external stemmer on the entire search text, puts the result into a StringReader, and finally returns new WhitespaceTokenizer(stringReader), so the stemmed text gets tokenized by the whitespace tokenizer.

Both scenarios appear to work from a functional point of view. The first scenario, however, is too slow because of the overhead of calling the external COM object once per token. The second scenario is much faster and also gives correct search results.

However, the second scenario causes problems with highlighting: sometimes errors are reported (String out of Range), and in other cases I get incorrect highlight fragments. Without knowing all the details, this makes sense to me, because the tokenizer sees the stemmed text rather than the original text, so I guess the token positions no longer match the original.

Maybe my second scenario is totally insane? Any ideas on how to overcome this?

Cheers,
Jaco.
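
P.S. To illustrate what I suspect is happening with the positions in the second scenario, here is a minimal plain-Java sketch with no Lucene/Solr classes. Everything in it is an assumption for illustration: OffsetDemo, stemWord and stemText are hypothetical stand-ins (the toy suffix stripper is not my real COM stemmer), and whitespaceOffsets only imitates what a whitespace tokenizer would report as start/end offsets. It stems the whole text first, computes the offsets against the stemmed text, and then applies those offsets to the original text, roughly the way a highlighter would.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetDemo {

    // Hypothetical stand-in for the external stemmer: strips a few English suffixes.
    public static String stemWord(String word) {
        for (String suffix : new String[] {"ing", "ly", "ed"}) {
            if (word.length() > suffix.length() + 2 && word.endsWith(suffix)) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Scenario 2: stem the entire text up front, before any tokenization.
    public static String stemText(String text) {
        StringBuilder sb = new StringBuilder();
        for (String word : text.split(" ")) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(stemWord(word));
        }
        return sb.toString();
    }

    // Imitates a whitespace tokenizer: returns {start, end} offsets into the given text.
    public static List<int[]> whitespaceOffsets(String text) {
        List<int[]> offsets = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < n && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                offsets.add(new int[] {start, i});
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        String original = "walking quickly jumped";
        String stemmed = stemText(original);
        // Offsets are relative to the stemmed text, but get applied to the original.
        for (int[] o : whitespaceOffsets(stemmed)) {
            System.out.println("token '" + stemmed.substring(o[0], o[1])
                    + "' would highlight '" + original.substring(o[0], o[1]) + "'");
        }
    }
}
```

In this toy example the first token still lines up, but the later ones select pieces such as "ng qu" from the original text, which looks a lot like the incorrect fragments I am seeing. If an offset ever pointed past the end of the stored text, substring would throw a StringIndexOutOfBoundsException, which might be related to the String out of Range errors, though I have not confirmed that this is the actual cause.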