Hi guys,

I have a few questions:

1- I found that we have the library "lib-lucene-analyzers" in the plugin folder. How does it work? Should I just add "lib-lucene-analyzers" to the list of plugins in nutch-site.xml, or should I also add language-identifier and analysis-(fr|de|en)?

2- How do we know the name of the plugin we have to add in nutch-site.xml? I just added analysis-fr to the list and got an exception saying that it could not find org.apache.lucene.analyzer.FrenchAnalyzer. It was looking for a Lucene implementation of the plugin instead of the Nutch implementation, and I don't know why. Is there any mapping between the plugin name and a class?

3- I tried to implement an HtmlParseFilter, but there are a few things that I don't understand. What is the purpose of a ParseResult? I don't understand why it can store several parse results; is there a specific use case? Why is HtmlParseFilter.filter called after a first ParseResult has already been created? How should I proceed if I want to remove some tags, together with their content, from the HTML page? Should I parse the page again and create another ParseResult, which would be the only one I use?

For instance, there is some content I don't want to index: I want to remove the content of every select box in my HTML page. I thought I could do it in an HtmlParseFilter, but I noticed that this wastes processing time: the page is parsed and a first ParseResult is created (which I will never use), and then my HtmlParseFilter parses it again to get the text content that I actually need to index. I may have missed something, in which case I would appreciate your help.

Cheers,
E
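
P.S. To make question 3 concrete, here is a minimal sketch of the kind of pruning I mean. It uses only the JDK's org.w3c.dom API and is not wired into Nutch at all; the class and method names (SelectStripper, removeSelects, textWithoutSelects) are made up for illustration, and it assumes the input is well-formed markup, which real HTML pages often are not.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch: strips <select> elements from a DOM.
public class SelectStripper {

    /** Removes every <select> element, with all of its children, below root. */
    public static void removeSelects(Node root) {
        NodeList children = root.getChildNodes();
        // Walk backwards so that removals do not shift the indexes still to visit.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && "select".equalsIgnoreCase(child.getNodeName())) {
                root.removeChild(child);
            } else {
                removeSelects(child);
            }
        }
    }

    /** Parses a well-formed snippet, strips the selects, returns the remaining text. */
    public static String textWithoutSelects(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            removeSelects(doc.getDocumentElement());
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Assumption: malformed input is simply reported to the caller.
            throw new IllegalArgumentException("Could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String page = "<div><p>Keep me</p>"
                + "<select><option>Drop me</option></select></div>";
        System.out.println(textWithoutSelects(page));
    }
}
```

What I cannot work out is whether pruning the tree like this inside the filter is enough, or whether the text extracted for the first ParseResult has to be rebuilt afterwards.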