I don't think it should take until 7.2 to get some natural language processing, especially if there is public collaboration between the Nutch community and the folks at http://opennlp.sourceforge.net/ :-0
Tomi NA wrote:
> On 9/19/06, Gonçalo Gaiolas <[EMAIL PROTECTED]> wrote:
>> Hi everyone!
>>
>> I'm using version 7.2 of Nutch and I'm very happy with it. I want to send a
>> big thumbs up to you guys behind it!
>
> Welcome, our honoured guest from the future! :) 7.2 probably includes
> natural language processing and spawns a great deal of controversy as
> to whether it can be considered "intelligent" or just very good at
> small talk. :)
>
>> Having said that, I'd like to make my users' search experience as good as
>> possible. To do that, I need to solve two little "problems":
>>
>> - Stemming – in my index I have lots of plurals and verb forms
>> that sometimes prevent my users from finding the right results. I've been
>> looking around, and it seems that the only stemming implementation available
>> for Nutch is described in the wiki and requires extensive changes to the
>> Nutch code, something I'd like to avoid. Can somebody help me?
>>
>> - Synonyms – OK, I don't really need synonyms. What I need is a way
>> to specify that Image Converter should be equal to ImageConverter, or that
>> WebBlock should be the same as web block. How can I do this? This one is
>> really hurting the search quality :-)
>
> I guess you need a different Analyzer. There's a list at
> http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/Analyzer.html
>
> You could also write your own to best represent the data you have.
>
> Cheers,
> t.n.a.
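To make the "write your own Analyzer" suggestion concrete, here is a rough sketch of the kind of processing such an analyzer would do, written in plain Java with no Lucene dependency so it runs on its own. It is illustrative only: the class name, the phrase table and the stemming rules are all hypothetical, and the stemmer is a deliberately crude plural/"-ing" stripper, not Porter or any real Lucene filter. It lowercases and tokenizes the text, stems each token, then merges configured two-word phrases ("image converter", "web block") into the single-word form, so both spellings end up as the same index term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch of an analysis chain: tokenize, lowercase, crudely stem,
 * then merge known two-word phrases into one token. This is NOT Lucene's API;
 * in Nutch/Lucene the same steps would live in a custom Analyzer.
 */
public class AnalysisSketch {

    // Hypothetical phrase table: stemmed word pairs mapped to the single-word form.
    private static final Map<String, String> COMPOUNDS = Map.of(
            "image converter", "imageconverter",
            "web block", "webblock");

    public static List<String> analyze(String text) {
        // 1. Lowercase and split on any run of characters that are not letters or digits.
        String[] raw = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");

        // 2. Drop empty pieces (e.g. from leading punctuation) and stem each token.
        List<String> stemmed = new ArrayList<>();
        for (String w : raw) {
            if (!w.isEmpty()) {
                stemmed.add(stem(w));
            }
        }

        // 3. Merge adjacent pairs listed in COMPOUNDS into a single token.
        List<String> out = new ArrayList<>();
        for (int i = 0; i < stemmed.size(); i++) {
            if (i + 1 < stemmed.size()) {
                String joined = COMPOUNDS.get(stemmed.get(i) + " " + stemmed.get(i + 1));
                if (joined != null) {
                    out.add(joined);
                    i++; // skip the second word of the merged pair
                    continue;
                }
            }
            out.add(stemmed.get(i));
        }
        return out;
    }

    // Deliberately naive suffix stripping for illustration; not the Porter algorithm.
    static String stem(String w) {
        if (w.length() > 4 && w.endsWith("ies")) {
            return w.substring(0, w.length() - 3) + "y"; // queries -> query
        }
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) {
            return w.substring(0, w.length() - 1);       // converters -> converter
        }
        if (w.length() > 5 && w.endsWith("ing")) {
            return w.substring(0, w.length() - 3);       // converting -> convert
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Image Converters"));        // [imageconverter]
        System.out.println(analyze("WebBlock and web blocks")); // [webblock, and, webblock]
    }
}
```

Stemming runs before the phrase merge so that plural phrases such as "web blocks" still match the table. The same analysis has to be applied at both index time and query time, otherwise the merged terms will not line up.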
