Hi Mike,

Got it! Thanks. I forgot to note the detail that the filters applied were all
HTMLParseFilters.

Regards,
Abi

On Wed, Feb 2, 2011 at 11:07 PM, Mike Baranczak <[email protected]> wrote:

> HTMLParseFilter is only one type of plugin, there are several other types.
> In the configuration you have, it looks like JSParseFilter and
> TestPluginFilter are the only plugins that implement HTMLParseFilter, so the
> results make sense.
>
> -MB
>
>
> On Feb 2, 2011, at 12:09 AM, .: Abhishek :. wrote:
>
> > Hi Mike et al.,
> >
> > Yes, adding the plugin.xml made it work.
> >
> > However, the outstanding question even now is this: even though my
> > plugin.includes lists a lot of plugin names, why is it that I just see
> > JSParser and my own custom parser in the HTMLParseFilters?
> >
> > The following is my plugin.includes value,
> >
> > <value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|test-plugin</value>
> >
> > Here test-plugin is my custom plugin. When I add the following lines,
> >
> > for(HtmlParseFilter filter: htmlParseFilters){
> >     System.out.println("Filter Name : "+filter.getClass().getName());
> > }
> >
> > below the last line of the constructor that takes the conf parameter, i.e.
> > this.htmlParseFilters = (HtmlParseFilter[])
> >     objectCache.getObject(HtmlParseFilter.class.getName());
> > in the HTMLParserFilters I just see,
> >
> > Filter Name : org.apache.nutch.parse.js.JSParseFilter
> > Filter Name : com.test.nutch.TestPluginFilter
> >
> > I am just wondering why this is. I should be seeing all the listed filters
> > in the value tag in plugin.includes, right?
> >
> >
> > On Wed, Feb 2, 2011 at 11:29 AM, Mike Baranczak <[email protected]> wrote:
> >
> >> Yes, you do have to make a config file for your plugin to be seen by Nutch.
> >>
> >> If you built Nutch from source, you should have the directory
> >> build/plugins. That's where the compiled plugins are. The names of the
> >> directories under there are the names that get included in
> >> 'plugin.includes'. Take a look at the existing plugin.xml files, you should
> >> be able to figure it out by example.
> >>
> >> The standard way to package the plugin code is to put it in a jar in the
> >> corresponding plugin directory. This ensures that it won't get loaded if
> >> it's not used. (This is optional: if you KNOW that it's gonna get used every
> >> time, you can put your code anywhere on the classpath.)
> >>
> >> Note that I'm using 1.1 - I can't guarantee that this information is still
> >> current.
> >>
> >> -MB
> >>
> >>
> >> On Feb 1, 2011, at 9:49 PM, .: Abhishek :. wrote:
> >>
> >>> Hi all,
> >>>
> >>> I am writing a custom HtmlParserFilter by implementing the
> >>> HtmlParseFilter. And, I am using the ParserChecker for testing the filter.
> >>>
> >>> I could see by some Syso's in the HTMLParseFilters class that by default
> >>> only org.apache.nutch.parse.js.JSParseFilter is being used. If I would like
> >>> to use my custom filter, should I be adding some configuration anywhere?
> >>>
> >>> And a point to be noted is that, when I add the following lines in
> >>> nutch-site.xml,
> >>>
> >>> <property>
> >>>   <name>plugin.includes</name>
> >>>   <value>nutch-extensionpoints|protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
> >>>   <description>Regular expression naming plugin id names to
> >>>   include. Any plugin not matching this expression is excluded.
> >>>   In any case you need at least include the nutch-extensionpoints plugin. By
> >>>   default Nutch includes crawling just HTML and plain text via HTTP,
> >>>   and basic indexing and search plugins.
> >>>   </description>
> >>> </property>
> >>>
> >>> I don't even see JSParseFilter being applied. The package that has my
> >>> custom filter does not have any special plugin configuration xml files, do I
> >>> have to add some or configure it elsewhere? I am using Nutch 1.2.
> >>>
> >>> I see my knowledge of Nutch growing considerably, thanks to all of you.
> >>>
> >>> Cheers,
> >>> Abi
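
The two plugin.includes values quoted in this thread can be compared with a small stand-alone check. This is only a sketch, and it assumes that the value is applied as a java.util.regex pattern that must match the whole plugin id. The class PluginIncludesDemo, the method included, and the sample id list (including parse-pdf) are hypothetical names used for illustration and are not part of Nutch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not Nutch code: shows which plugin ids a
// plugin.includes value would select under the full-match assumption.
public class PluginIncludesDemo {

    // Returns the ids that the includes regex matches in full.
    public static List<String> included(String includesRegex, List<String> pluginIds) {
        Pattern includes = Pattern.compile(includesRegex);
        List<String> selected = new ArrayList<>();
        for (String id : pluginIds) {
            if (includes.matcher(id).matches()) {
                selected.add(id);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Sample ids; parse-pdf is listed only to show a non-matching id.
        List<String> ids = Arrays.asList(
            "nutch-extensionpoints", "protocol-http", "parse-html",
            "parse-js", "parse-pdf", "index-basic", "test-plugin");

        // Value from the later message (with parse-js and test-plugin).
        String withJsAndTest = "nutch-extensionpoints|protocol-http|urlfilter-regex"
            + "|parse-(text|html|js)|index-basic|query-(basic|site|url)|test-plugin";
        // Value from the first message (neither parse-js nor test-plugin).
        String original = "nutch-extensionpoints|protocol-http|urlfilter-regex"
            + "|parse-(text|html)|index-basic|query-(basic|site|url)";

        System.out.println("later value:  " + included(withJsAndTest, ids));
        System.out.println("first value:  " + included(original, ids));
    }
}
```

Under that assumption, the first value selects neither parse-js nor test-plugin, which is consistent with JSParseFilter not being applied in the first message. Being selected only makes a plugin available: as Mike notes, a plugin shows up in the HtmlParseFilter list only if it actually implements that extension point.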

