Hi Mike et al.,

Yes, adding plugin.xml made it work.

However, one question is still outstanding: even though my plugin.includes
lists a lot of plugin names, why do I see only JSParseFilter and my own
custom filter in HtmlParseFilters?

The following is my plugin.includes value:
<value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|test-plugin</value>
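
As I understand it, this value is treated as a regular expression that a plugin id has to match in full to be included. Below is a minimal Java sketch of that check using only java.util.regex. The class name PluginIncludesCheck and the extra ids parse-pdf and index-more are made up for illustration, and this is not Nutch's actual plugin-loading code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class PluginIncludesCheck {
    // The plugin.includes value from my nutch-site.xml.
    static final String INCLUDES =
        "nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html|js)"
        + "|index-basic|query-(basic|site|url)|test-plugin";

    // Returns the ids that match the expression in full (assumed semantics).
    static List<String> included(String regex, List<String> ids) {
        Pattern pattern = Pattern.compile(regex);
        List<String> result = new ArrayList<String>();
        for (String id : ids) {
            if (pattern.matcher(id).matches()) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // parse-pdf and index-more are hypothetical ids that should be excluded.
        List<String> ids = Arrays.asList(
            "parse-html", "parse-js", "parse-pdf", "test-plugin", "index-more");
        for (String id : included(INCLUDES, ids)) {
            System.out.println("included: " + id);
        }
    }
}
```

Running it lists parse-html, parse-js and test-plugin as included and skips the two ids that the expression does not name.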

Here test-plugin is my custom plugin. When I add the following loop,

    for (HtmlParseFilter filter : htmlParseFilters) {
        System.out.println("Filter Name : " + filter.getClass().getName());
    }

below the last line of the HtmlParseFilters constructor that takes the conf
parameter, i.e. after

    this.htmlParseFilters = (HtmlParseFilter[]) objectCache.getObject(HtmlParseFilter.class.getName());

I see only:

Filter Name : org.apache.nutch.parse.js.JSParseFilter
Filter Name : com.test.nutch.TestPluginFilter

I am wondering why this is. Shouldn't I be seeing filters from all the
plugins listed in the plugin.includes value?




On Wed, Feb 2, 2011 at 11:29 AM, Mike Baranczak <[email protected]> wrote:

> Yes, you do have to make a config file for your plugin to be seen by Nutch.
>
> If you built Nutch from source, you should have the directory
> build/plugins. That's where the compiled plugins are. The names of the
> directories under there are the names that get included in
> 'plugin.includes'. Take a look at the existing plugin.xml files, you should
> be able to figure it out by example.
>
> The standard way to package the plugin code is to put it in a jar in the
> corresponding plugin directory. This ensures that it won't get loaded if
> it's not used. (This is optional: if you KNOW that it's gonna get used every
> time, you can put your code anywhere on the classpath.)
>
> Note that I'm using 1.1 - I can't guarantee that this information is still
> current.
>
> -MB
>
>
>
> On Feb 1, 2011, at 9:49 PM, .: Abhishek :. wrote:
>
> > Hi all,
> >
> > I am writing a custom filter by implementing HtmlParseFilter, and I am
> > using ParserChecker to test the filter.
> >
> > I could see from some print statements in the HtmlParseFilters class that
> > by default only org.apache.nutch.parse.js.JSParseFilter is being used. If
> > I would like to use my custom filter, should I be adding some
> > configuration anywhere?
> >
> > And a point to be noted is that, when I add the following lines in
> > nutch-site.xml,
> >
> > <property>
> >   <name>plugin.includes</name>
> >   <value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
> >   <description>Regular expression naming plugin id names to
> >     include.  Any plugin not matching this expression is excluded.
> >     In any case you need at least include the nutch-extensionpoints
> >     plugin. By default Nutch includes crawling just HTML and plain
> >     text via HTTP, and basic indexing and search plugins.
> >   </description>
> > </property>
> >
> > I don't even see JSParseFilter being applied. The package that has my
> > custom filter does not have any special plugin configuration XML files.
> > Do I have to add some, or configure it elsewhere? I am using Nutch 1.2.
> >
> > I see my knowledge with Nutch growing considerably, thanks to all of you.
> >
> > Cheers,
> > Abi
>
>
