Yes, you do need a config file (a plugin.xml) for Nutch to see your plugin.

If you built Nutch from source, you should have a build/plugins directory.
That's where the compiled plugins are. The names of the directories under it
are the names that go in 'plugin.includes'. Take a look at the existing
plugin.xml files; you should be able to figure it out by example.
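
For what it's worth, here is roughly what the 'plugin.includes' entry might
look like once your plugin is added. Treat it as a sketch, not something I've
run: 'parse-myfilter' is a made-up plugin id standing in for whatever your
directory under build/plugins is called. Also note that parse-(text|html) in
your value matches only parse-text and parse-html, so parse-js is left out,
which would explain why JSParseFilter stops being applied.

```xml
<!-- nutch-site.xml fragment (sketch). "myfilter" is a hypothetical name;
     replace it so the alternative matches your own plugin's id. -->
<property>
  <name>plugin.includes</name>
  <!-- "js" is added back so parse-js still matches the expression -->
  <value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html|js|myfilter)|index-basic|query-(basic|site|url)</value>
</property>
```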

The standard way to package the plugin code is to put it in a jar in the 
corresponding plugin directory. This ensures that it won't get loaded if it's 
not used. (This is optional: if you KNOW that it's gonna get used every time, 
you can put your code anywhere on the classpath.)
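
As a rough illustration, a plugin.xml for an HtmlParseFilter might look like
the sketch below. It is modeled on the layout of the plugin.xml files bundled
with 1.x, so check it against the ones in your own build. The plugin id, jar
name, provider, package, and class name are all hypothetical placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of build/plugins/parse-myfilter/plugin.xml.
     Every name containing "myfilter" or "example" is an assumption. -->
<plugin id="parse-myfilter"
        name="My HTML Parse Filter"
        version="1.0.0"
        provider-name="example.org">

  <!-- The jar sitting next to this file in the plugin directory -->
  <runtime>
    <library name="parse-myfilter.jar">
      <export name="*"/>
    </library>
  </runtime>

  <requires>
    <import plugin="nutch-extensionpoints"/>
  </requires>

  <!-- Registers the class at the HtmlParseFilter extension point -->
  <extension id="org.example.nutch.parse.myfilter"
             name="My HTML Parse Filter"
             point="org.apache.nutch.parse.HtmlParseFilter">
    <implementation id="MyHtmlParseFilter"
                    class="org.example.nutch.parse.MyHtmlParseFilter"/>
  </extension>
</plugin>
```

The id here is what the 'plugin.includes' expression has to match, so it should
agree with the directory name.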

Note that I'm using 1.1 - I can't guarantee that this information is still 
current.

-MB



On Feb 1, 2011, at 9:49 PM, .: Abhishek :. wrote:

> Hi all,
> 
> I am writing a custom HTML parse filter by implementing
> HtmlParseFilter, and I am using the ParserChecker to test the filter.
> 
> I could see from some System.out.println calls in the HtmlParseFilters class
> that by default only org.apache.nutch.parse.js.JSParseFilter is being used.
> If I would like to use my custom filter, should I be adding some
> configuration anywhere?
> 
> And a point to be noted is that, when I add the following lines in
> nutch-site.xml,
> 
> <property>
>   <name>plugin.includes</name>
>   <value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
>   <description>Regular expression naming plugin id names to
>     include.  Any plugin not matching this expression is excluded.
>     In any case you need at least include the nutch-extensionpoints
>     plugin. By default Nutch includes crawling just HTML and plain
>     text via HTTP, and basic indexing and search plugins.
>   </description>
> </property>
> 
> I don't even see JSParseFilter being applied. The package that has my
> custom filter does not have any special plugin configuration XML files. Do I
> have to add some, or configure it elsewhere? I am using Nutch 1.2.
> 
> I see my knowledge of Nutch growing considerably, thanks to all of you.
> 
> Cheers,
> Abi
