Hi all,

 I am writing a custom HTML parse filter by implementing the
HtmlParseFilter interface, and I am using ParserChecker to test it.

 From some System.out.println statements I added to the HtmlParseFilters
class, I can see that by default only org.apache.nutch.parse.js.JSParseFilter
is being used. If I would like to use my custom filter, do I need to add some
configuration anywhere?

 Also note that when I add the following lines to nutch-site.xml,

    <property>
          <name>plugin.includes</name>
          <value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
          <description>Regular expression naming plugin id names to
              include.  Any plugin not matching this expression is excluded.
              In any case you need at least include the nutch-extensionpoints
              plugin. By default Nutch includes crawling just HTML and plain
              text via HTTP, and basic indexing and search plugins.
          </description>
    </property>

 I don't even see JSParseFilter being applied. The package that contains my
custom filter does not have any plugin configuration XML files. Do I have to
add some, or configure the filter elsewhere? I am using Nutch 1.2.

 My knowledge of Nutch is growing considerably, thanks to all of you.

Cheers,
Abi
