Hi Guys,

I have a few questions:
1- I found that we have the lib "lib-lucene-analyzers" in the plugin folder.
How does it work? Should I just add "lib-lucene-analyzers" to the list of
plugins in nutch-site.xml, or should I also add language-identifier and
analysis-(fr|de|en)?

2- How do we know the name of the plugin we have to add to nutch-site.xml?
I just added analysis-fr to the list and got an exception saying that it
could not find org.apache.lucene.analyzer.FrenchAnalyzer. It was looking for
a Lucene implementation of the plugin instead of the Nutch implementation,
and I don't know why.
Is there any mapping between the plugin name and a class?

3- I tried to implement an HtmlParseFilter, but there are a few things that I
don't understand.
What is the purpose of a ParseResult? I don't understand why we would store
several parse results in it. Is there a specific use case?
Why is HtmlParseFilter.filter called after a first ParseResult has already
been created?
How should I proceed if I want to remove some tags, along with their content,
from the HTML page? Should I parse the page again and create another
ParseResult, which would be the only one I use? For instance, there is some
content that I don't want to index: I want to remove the content of every
select box in my HTML page. I thought I could do it in an HtmlParseFilter, but
I noticed that this wastes processing time: the page is parsed once to create
a first ParseResult (which I will never use), and then parsed again (in my
HtmlParseFilter) to get the text content that I actually need to index.
I may have missed something; if so, I would appreciate your help.
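
To make the select box case concrete, here is a minimal sketch of the
stripping step in plain Java, with no Nutch classes involved. The names
SelectStripper, removeSelects and textWithoutSelects are made up for this
example, and it assumes well-formed markup, which real pages often are not.
It only illustrates the text I want to end up indexing, not how Nutch does
its own parsing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch: strips <select> boxes from a DOM tree.
public class SelectStripper {

    /** Removes every select element (and everything inside it) from the tree, in place. */
    static void removeSelects(Node node) {
        NodeList children = node.getChildNodes();
        // Walk backwards so that removals do not shift the indices still to visit.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && "select".equalsIgnoreCase(child.getNodeName())) {
                node.removeChild(child);
            } else {
                removeSelects(child);
            }
        }
    }

    /** Parses well-formed markup, strips the select boxes, and returns the remaining text. */
    static String textWithoutSelects(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        removeSelects(doc);
        // getTextContent() concatenates the text of all remaining descendant nodes.
        String text = doc.getDocumentElement().getTextContent();
        return text.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><p>Keep this.</p> "
                + "<select><option>Drop A</option><option>Drop B</option></select>"
                + "<p>Keep that.</p></body></html>";
        System.out.println(textWithoutSelects(page));
    }
}
```

If the filter can work on a DOM tree that has already been built, a walk like
removeSelects would avoid the second parse I am worried about; whether that is
possible is part of what I am asking.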

Cheers
E
