On Thu, 24 Nov 2005, Victor Peinado wrote:

Hello all,

I'm indexing Spanish documents with Lucene and I need to filter out stop
words. I'm quite new to PyLucene, and so far the StandardAnalyzer
has worked well enough.

But now I need to do more complex things. Is there a SpanishAnalyzer
in the official distribution of Lucene or PyLucene, like the ones for
German or Russian? If there isn't, is it very difficult to extend
Analyzer to implement a kind of SpanishAnalyzer? What issues should
I keep in mind? Any tips, ideas, or documentation I should read first?

I don't think there is a SpanishAnalyzer in Java Lucene 1.4.3. There may be something in the Snowball contrib package (also included in PyLucene).

Creating a custom analyzer in Python with PyLucene can be pretty simple. See the "Lucene in Action" samples ported to Python in the PyLucene distribution. If all you want is a different set of stop words, it might even be very simple.
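
To make the stop-word case concrete, here is a minimal sketch in plain Python of what such an analyzer does conceptually: split the text into tokens, lowercase them, and drop the stop words. It deliberately does not use the PyLucene API; the names SPANISH_STOP_WORDS and analyze are hypothetical, and the stop list is a tiny illustrative sample rather than a complete Spanish list.

```python
import re

# Illustrative sample only (an assumption for this sketch); a real
# Spanish stop list would be considerably longer.
SPANISH_STOP_WORDS = frozenset([
    "a", "con", "de", "del", "el", "en", "la", "las",
    "los", "por", "que", "un", "una", "y",
])

# \w matches Unicode letters in Python 3, so accented characters
# and "ñ" stay inside their tokens.
TOKEN_RE = re.compile(r"\w+")


def analyze(text, stop_words=SPANISH_STOP_WORDS):
    """Tokenize, lowercase, and remove stop words, in that order."""
    tokens = (match.group(0).lower() for match in TOKEN_RE.finditer(text))
    return [token for token in tokens if token not in stop_words]


if __name__ == "__main__":
    print(analyze("El perro de la casa"))
```

In Lucene itself the same chain is expressed as a tokenizer followed by token filters inside an Analyzer subclass, so swapping in a different stop list should only touch the filtering step.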

For more specific information about a SpanishAnalyzer or how to go about creating your own, you might ask the [email protected] mailing list, where such Lucene-specific (Java or not) questions are best addressed.

Andi..
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
