Doğacan Güney wrote:
> On 6/28/07, Robert Young <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> Are the Nutch Stemming modifications available as a patch? I can't
>> seem to find anything on issue.apache.org
>
> There is some sort of stemming for German and French languages
> (available as plugin analysis-de and analysis-fr). I don't know how
> well they work (or if they work). AFAIK, there is no support for
> stemming English.
There is a PorterStemmer in Lucene, but it is not used in Nutch. You can
easily add it by overriding NutchDocumentAnalyzer.

>
> Btw, I think we should revise nutch's document analysis system. For
> example, analyzers for index-basic's fields are hard-coded in analysis
> package (what happens if I don't use index-basic and use my own
> index-mind-blowingly-awesome plugin?). You either have to use all of
> it or completely override it and use none of it. We should allow index
> plugins to specify their analyzers per field. There are analysis-*
> plugins but they work for documents in specific languages (what if I
> don't want to use language identification? what if nutch can't figure
> out what the language is?)

I strongly agree. Index-* plugins and analysis-* plugins are
cross-dependent. For every new field added by an indexing plugin, ALL the
analysis plugins have to be changed to analyze this new field, which
breaks the golden rule. I agree with the idea that index plugins should
specify their own analyzers.
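
As a rough illustration of that idea (this is not Nutch code:
FieldAnalyzerRegistry, register, analyze, lowercaseWords and keyword are
all made-up names, and plain functions stand in for Lucene analyzers),
each index plugin could register an analyzer for the fields it adds,
with a default for everything else:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these names exist in Nutch or Lucene.
class FieldAnalyzerRegistry {
    // Maps a field name to the function that turns its text into tokens.
    private final Map<String, Function<String, List<String>>> byField = new HashMap<>();
    // Used for any field that no plugin has registered an analyzer for.
    private final Function<String, List<String>> fallback;

    FieldAnalyzerRegistry(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    // An index plugin would call this once for each field it adds.
    void register(String field, Function<String, List<String>> analyzer) {
        byField.put(field, analyzer);
    }

    // Analyze text with the field's own analyzer, or the fallback.
    List<String> analyze(String field, String text) {
        return byField.getOrDefault(field, fallback).apply(text);
    }

    // Lowercase the text and split it on runs of non-letter, non-digit characters.
    static List<String> lowercaseWords(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Keep the whole value as a single token (e.g. for a host name field).
    static List<String> keyword(String text) {
        List<String> tokens = new ArrayList<>();
        tokens.add(text);
        return tokens;
    }

    public static void main(String[] args) {
        FieldAnalyzerRegistry registry =
            new FieldAnalyzerRegistry(FieldAnalyzerRegistry::lowercaseWords);
        registry.register("site", FieldAnalyzerRegistry::keyword);
        System.out.println(registry.analyze("title", "Nutch Stemming, revisited"));
        System.out.println(registry.analyze("site", "lucene.apache.org"));
    }
}
```

With something like this, adding a new field only touches the plugin
that adds it, and the analysis-* plugins would not need to know about it.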
>
> Index plugins should also be able to control how stuff like their field's
> length norm is calculated (which currently is hard coded too and can't
> be changed).
>
> Oh and, if you are feeling up to it, any help in this area would be
> much appreciated :).
>
>>
>> Thanks
>> Rob
>>
>
>

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general