Hi Mike,

Got it! Thanks. I had overlooked the fact that the filters listed there were
all HtmlParseFilter implementations.

Regards,
Abi


On Wed, Feb 2, 2011 at 11:07 PM, Mike Baranczak <[email protected]> wrote:

> HtmlParseFilter is only one type of plugin extension; there are several
> other types. In your configuration, it looks like JSParseFilter and
> TestPluginFilter are the only plugins that implement HtmlParseFilter, so the
> results make sense.
>
> -MB
>
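To make Mike's point concrete, here is a toy Java model (not Nutch's actual PluginRepository or HtmlParseFilters code) of how active plugins map to the extension points they implement. The plugin-to-extension-point table is an assumption based on the stock Nutch 1.x plugins named in this thread; only the parse-js and test-plugin entries are confirmed by the filter output quoted in this thread. HtmlParseFilters only ever sees the plugins registered against HtmlParseFilter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: maps each plugin id matched by plugin.includes to the
// extension points it implements (simple interface names, assumed values).
class ExtensionPointModel {

    static Map<String, Set<String>> activePlugins() {
        Map<String, Set<String>> plugins = new LinkedHashMap<>();
        plugins.put("nutch-extensionpoints", Set.of()); // declares points, implements none
        plugins.put("protocol-http", Set.of("Protocol"));
        plugins.put("urlfilter-regex", Set.of("URLFilter"));
        plugins.put("parse-text", Set.of("Parser"));
        plugins.put("parse-html", Set.of("Parser"));
        plugins.put("parse-js", Set.of("Parser", "HtmlParseFilter"));
        plugins.put("index-basic", Set.of("IndexingFilter"));
        plugins.put("query-basic", Set.of("QueryFilter"));
        plugins.put("query-site", Set.of("QueryFilter"));
        plugins.put("query-url", Set.of("QueryFilter"));
        plugins.put("test-plugin", Set.of("HtmlParseFilter")); // the custom plugin
        return plugins;
    }

    // Returns, in insertion order, the plugin ids that implement the given point.
    static List<String> providersOf(Map<String, Set<String>> plugins, String point) {
        List<String> providers = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : plugins.entrySet()) {
            if (entry.getValue().contains(point)) {
                providers.add(entry.getKey());
            }
        }
        return providers;
    }

    public static void main(String[] args) {
        // Prints [parse-js, test-plugin]
        System.out.println(providersOf(activePlugins(), "HtmlParseFilter"));
    }
}
```

Asking for "Parser" instead would return parse-text, parse-html and parse-js, which is why the other listed plugins never show up in the HtmlParseFilter loop.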
>
> On Feb 2, 2011, at 12:09 AM, .: Abhishek :. wrote:
>
> > Hi Mike et al.,
> >
> > Yes, adding plugin.xml made it work.
> >
> > However, the outstanding question is still this: even though my
> > plugin.includes lists a lot of plugin names, why do I only see JSParseFilter
> > and my own custom filter in HtmlParseFilters?
> >
> > The following is my plugin.includes value:
> >
> > <value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|test-plugin</value>
> >
> > Here test-plugin is my custom plugin. When I add the following lines
> >
> >     // print the class of every filter loaded into htmlParseFilters
> >     for (HtmlParseFilter filter : htmlParseFilters) {
> >         System.out.println("Filter Name : " + filter.getClass().getName());
> >     }
> >
> > below the last line of the constructor that takes the conf parameter, i.e.
> >
> >     this.htmlParseFilters = (HtmlParseFilter[])
> >         objectCache.getObject(HtmlParseFilter.class.getName());
> >
> > in HtmlParseFilters, I only see:
> >
> > Filter Name : org.apache.nutch.parse.js.JSParseFilter
> > Filter Name : com.test.nutch.TestPluginFilter
> >
> > I am just wondering why this is. Shouldn't I be seeing all the filters
> > listed in the <value> tag of plugin.includes?
> >
> >
> >
> >
> > On Wed, Feb 2, 2011 at 11:29 AM, Mike Baranczak <[email protected]> wrote:
> >
> >> Yes, you do have to create a config file (plugin.xml) for Nutch to see
> >> your plugin.
> >>
> >> If you built Nutch from source, you should have the directory
> >> build/plugins. That's where the compiled plugins are. The names of the
> >> directories under there are the names that get included in
> >> 'plugin.includes'. Take a look at the existing plugin.xml files; you should
> >> be able to figure it out by example.
> >>
> >> The standard way to package the plugin code is to put it in a jar in the
> >> corresponding plugin directory. This ensures that it won't get loaded if
> >> it's not used. (This is optional: if you KNOW that it's gonna get used
> >> every time, you can put your code anywhere on the classpath.)
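For reference, a minimal plugin.xml for the custom plugin might look like the descriptor embedded in the Java sketch below, which parses it with the JDK's DOM API and lists the extension point each implementation class is registered against. The element layout follows the existing Nutch 1.x descriptors as I understand them; the plugin name, version, provider, extension id and jar name are hypothetical placeholders, while com.test.nutch.TestPluginFilter is the class name from this thread. Per the note above, the file would sit in the plugin's own directory, whose name is the id used in plugin.includes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class PluginDescriptorSketch {

    // Hypothetical descriptor: names, version, provider and jar name are placeholders.
    static final String PLUGIN_XML =
          "<plugin id=\"test-plugin\" name=\"Test Plugin\" version=\"1.0.0\" provider-name=\"test\">\n"
        + "  <runtime>\n"
        + "    <library name=\"test-plugin.jar\">\n"
        + "      <export name=\"*\"/>\n"
        + "    </library>\n"
        + "  </runtime>\n"
        + "  <requires>\n"
        + "    <import plugin=\"nutch-extensionpoints\"/>\n"
        + "  </requires>\n"
        + "  <extension id=\"com.test.nutch.parse\" name=\"Test Parse Filter\"\n"
        + "             point=\"org.apache.nutch.parse.HtmlParseFilter\">\n"
        + "    <implementation id=\"TestPluginFilter\" class=\"com.test.nutch.TestPluginFilter\"/>\n"
        + "  </extension>\n"
        + "</plugin>\n";

    // Returns one "point -> class" entry per <implementation> element.
    static List<String> extensions(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> out = new ArrayList<>();
            NodeList exts = doc.getElementsByTagName("extension");
            for (int i = 0; i < exts.getLength(); i++) {
                Element ext = (Element) exts.item(i);
                NodeList impls = ext.getElementsByTagName("implementation");
                for (int j = 0; j < impls.getLength(); j++) {
                    Element impl = (Element) impls.item(j);
                    out.add(ext.getAttribute("point") + " -> " + impl.getAttribute("class"));
                }
            }
            return out;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse plugin descriptor", e);
        }
    }

    public static void main(String[] args) {
        // Prints: org.apache.nutch.parse.HtmlParseFilter -> com.test.nutch.TestPluginFilter
        extensions(PLUGIN_XML).forEach(System.out::println);
    }
}
```

The point attribute is what decides where the class ends up: registering against org.apache.nutch.parse.HtmlParseFilter is what lets HtmlParseFilters pick it up.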
> >>
> >> Note that I'm using 1.1; I can't guarantee that this information is still
> >> current.
> >>
> >> -MB
> >>
> >>
> >>
> >> On Feb 1, 2011, at 9:49 PM, .: Abhishek :. wrote:
> >>
> >>> Hi all,
> >>>
> >>> I am writing a custom HTML parse filter by implementing HtmlParseFilter,
> >>> and I am using ParserChecker to test it.
> >>>
> >>> From some System.out.println calls I added to the HtmlParseFilters class,
> >>> I can see that by default only org.apache.nutch.parse.js.JSParseFilter is
> >>> being used. If I want to use my custom filter, do I need to add some
> >>> configuration somewhere?
> >>>
> >>> Note also that when I add the following to nutch-site.xml,
> >>>
> >>> <property>
> >>>   <name>plugin.includes</name>
> >>>   <value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
> >>>   <description>Regular expression naming plugin id names to
> >>>   include.  Any plugin not matching this expression is excluded.
> >>>   In any case you need at least include the nutch-extensionpoints plugin. By
> >>>   default Nutch includes crawling just HTML and plain text via HTTP,
> >>>   and basic indexing and search plugins.
> >>>   </description>
> >>> </property>
> >>>
> >>> I don't even see JSParseFilter being applied. The package containing my
> >>> custom filter does not have any plugin configuration XML files. Do I have
> >>> to add some, or configure it somewhere else? I am using Nutch 1.2.
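The plugin.includes value in this message is consistent with JSParseFilter disappearing: parse-(text|html) does not match the id parse-js, and test-plugin is not listed at all. The Java sketch below checks a few plugin ids against this value and against the later value from the follow-up, assuming each id must match the whole regular expression. That full-match behaviour is my understanding of how Nutch applies plugin.includes, not something verified against the 1.2 source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PluginIncludesCheck {

    // Returns the plugin ids selected by a plugin.includes regex,
    // assuming the entire id has to match (Matcher.matches semantics).
    static List<String> included(String includes, List<String> pluginIds) {
        Pattern pattern = Pattern.compile(includes);
        List<String> out = new ArrayList<>();
        for (String id : pluginIds) {
            if (pattern.matcher(id).matches()) {
                out.add(id);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String original = "nutch-extensionpoints|protocol-http|urlfilter-regex|"
                + "parse-(text|html)|index-basic|query-(basic|site|url)";
        String updated = "nutch-extensionpoints|protocol-http|urlfilter-regex|"
                + "parse-(text|html|js)|index-basic|query-(basic|site|url)|test-plugin";
        List<String> ids = List.of("parse-html", "parse-js", "test-plugin");
        System.out.println(included(original, ids)); // [parse-html]
        System.out.println(included(updated, ids));  // [parse-html, parse-js, test-plugin]
    }
}
```

Matching the id in plugin.includes is only half of it, though: as the rest of the thread shows, the custom plugin also needs its own plugin.xml before Nutch will load it.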
> >>>
> >>> I see my knowledge of Nutch growing considerably, thanks to all of you.
> >>>
> >>> Cheers,
> >>> Abi
> >>
> >>
>
>
