On 6/28/07, Robert Young <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Are the Nutch Stemming modifications available as a patch? I can't
> seem to find anything on issue.apache.org

There is some stemming support for German and French (available as the
analysis-de and analysis-fr plugins). I don't know how well they work
(or whether they work at all). AFAIK, there is no support for stemming
English.

Btw, I think we should revise Nutch's document analysis system. For
example, the analyzers for index-basic's fields are hard-coded in the
analysis package (what happens if I don't use index-basic and use my
own index-mind-blowingly-awesome plugin?). You either have to use all
of it or override it completely and use none of it. We should allow
index plugins to specify their analyzers per field. There are
analysis-* plugins, but they only work for documents in specific
languages (what if I don't want to use language identification? what
if Nutch can't figure out what the language is?).
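
To make the per-field idea a bit more concrete, here is a rough sketch
of the kind of registry I have in mind. Everything in it is made up for
illustration: FieldAnalyzerRegistry, register() and analyze() are
hypothetical names, not existing Nutch or Lucene API, and an "analyzer"
is reduced to a plain text-to-tokens function so the example stands on
its own. The field names are only examples.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch only: none of these types exist in Nutch or Lucene.
class FieldAnalyzerRegistry {
    // Stand-in for a real analyzer: turns field text into a list of tokens.
    private final Map<String, Function<String, List<String>>> byField = new HashMap<>();
    private final Function<String, List<String>> fallback;

    FieldAnalyzerRegistry(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    // An index plugin would call this to claim the analyzer for one of its fields.
    void register(String field, Function<String, List<String>> analyzer) {
        byField.put(field, analyzer);
    }

    // Fields that no plugin registered use the fallback analyzer.
    List<String> analyze(String field, String text) {
        return byField.getOrDefault(field, fallback).apply(text);
    }

    // Example analyzer: lowercase, split on runs of non-word characters.
    static List<String> lowercaseWords(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Example analyzer: keep the whole value as a single token.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        FieldAnalyzerRegistry registry =
            new FieldAnalyzerRegistry(FieldAnalyzerRegistry::lowercaseWords);
        registry.register("host", FieldAnalyzerRegistry::keyword);

        System.out.println(registry.analyze("content", "Stemming in Nutch"));
        System.out.println(registry.analyze("host", "lucene.apache.org"));
    }
}
```

The point is only that the choice of analyzer is keyed by field name
and owned by whichever plugin adds the field, with a default for
everything else, so it does not depend on language identification.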

Index plugins should also be able to control how things like their
fields' length norms are calculated (this is currently hard-coded too
and can't be changed).
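
For the length norm, something along these lines would do. Again this
is only a sketch under assumptions: FieldLengthNorms and its methods
are hypothetical names, not anything that exists in Nutch. The default
shown is 1/sqrt(numTerms), which as far as I know is what Lucene's
DefaultSimilarity uses; whatever Nutch currently hard-codes would take
its place.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntToDoubleFunction;

// Hypothetical sketch only: not an existing Nutch or Lucene class.
class FieldLengthNorms {
    private final Map<String, IntToDoubleFunction> byField = new HashMap<>();

    // Assumed default: 1/sqrt(numTerms), so longer fields get a smaller norm.
    static double defaultNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // An index plugin would call this to override the norm for one of its fields.
    void register(String field, IntToDoubleFunction norm) {
        byField.put(field, norm);
    }

    // Uses the plugin's function if one was registered, else the default.
    double lengthNorm(String field, int numTerms) {
        IntToDoubleFunction norm = byField.get(field);
        return norm != null ? norm.applyAsDouble(numTerms) : defaultNorm(numTerms);
    }

    public static void main(String[] args) {
        FieldLengthNorms norms = new FieldLengthNorms();
        // Example override: ignore length entirely for the "title" field.
        norms.register("title", numTerms -> 1.0);

        System.out.println(norms.lengthNorm("title", 100));
        System.out.println(norms.lengthNorm("content", 100));
    }
}
```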

Oh and, if you are feeling up to it, any help in this area would be
much appreciated :).

>
> Thanks
> Rob
>


-- 
Doğacan Güney