Thanks Chris, I followed your suggestion and looked at the crawl log and $NUTCH_HOME/conf/nutch-default.xml. I found that the myProject plugin was not being included.
fetch.log:

050329 025333 loading file:/usr/local/nutch-0.6/conf/nutch-default.xml
050329 025333 loading file:/usr/local/nutch-0.6/conf/nutch-site.xml
050329 025333 No NutchFileSystem indicated, so defaulting to local fs.
050329 025334 Plugins: looking in: /usr/local/nutch-0.6/build/plugins
050329 025334 parsing: /usr/local/nutch-0.6/build/plugins/parse-html/plugin.xml
050329 025334 parsing: /usr/local/nutch-0.6/build/plugins/query-site/plugin.xml
050329 025334 parsing: /usr/local/nutch-0.6/build/plugins/parse-text/plugin.xml
050329 025334 not including: /usr/local/nutch-0.6/build/plugins/myProject
050329 025334 not including: /usr/local/nutch-0.6/build/plugins/parse-msword
050329 025334 not including: /usr/local/nutch-0.6/build/plugins/ontology
050329 025334 not including: /usr/local/nutch-0.6/build/plugins/parse-mp3
050329 025334 parsing: /usr/local/nutch-0.6/build/plugins/query-url/plugin.xml
050329 025334 not including: /usr/local/nutch-0.6/build/plugins/protocol-ftp
050329 025334 not including: /usr/local/nutch-0.6/build/plugins/clustering-carrot2
050329 025334 not including: /usr/local/nutch-0.6/build/plugins/parse-pdf
050329 025334 not including: /usr/local/nutch-0.6/build/plugins/language-identifier

So I modified conf/nutch-site.xml and then it worked.

<property>
  <name>plugin.includes</name>
  <value>myProject|protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)</value>
</property>

Thanks for your help.

On Mon, 28 Mar 2005 22:23:11 -0800, Chris Mattmann <[EMAIL PROTECTED]> wrote:
> Hi Zennet,
>
> > The URLFilter plugin is already working from previous development but
> > my changes to the code don't take effect.
> >
> > Here are the steps I've taken:
> > 1. Modified the existing implementation of URLFilter interface
>
> Okay.
>
> > 2. Built the project with ant
>
> Good.
>
> > 3. Copied build/plugin/* to NUTCH_HOME/plugins
>
> You don't need to do this if you're running the crawl tool. The crawl tool
> will by default load plugins out of $NUTCH_HOME/build/plugins
>
> > 4. Ran the generate-fetch-index cycle
>
> Okay
>
> > I modified filter() to write some debug statements to a file and
> > return null for every url (for debugging purposes). I know my code was
> > not executed because no urls should have been indexed and there were
> > debug statements in the file. I suspect that step 3 is what I am doing
> > incorrectly or there is some other file I need to modify.
>
> Did you enable the plugin in the nutch-default.xml file within the conf
> directory? Make sure that you enable the plugin there. Can you post a txt
> capture of your crawl log?
>
> Thanks,
> Chris
>
> > I appreciate any help.
> >
> > Thanks,
> > zennet
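
P.S. For anyone who finds this thread later: the plugin.includes value is a regular expression over plugin directory names, which is why myProject was skipped until I added it. Below is a rough sketch of that check, not Nutch's actual loader code. It assumes the whole directory name has to match the expression; the class PluginIncludesDemo, the method isIncluded, and the "without myProject" value are made up for illustration.

```java
import java.util.regex.Pattern;

public class PluginIncludesDemo {

    // Assumption: a plugin is loaded only when its entire directory name
    // matches the plugin.includes regular expression (full match, not a
    // substring match). This mirrors the idea, not Nutch's real code.
    static boolean isIncluded(String includesRegex, String pluginId) {
        return Pattern.compile(includesRegex).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // Value I put in conf/nutch-site.xml.
        String withMyProject =
            "myProject|protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)";
        // Hypothetical value for comparison: the same expression without myProject.
        String withoutMyProject =
            "protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)";

        String[] pluginIds = {"myProject", "parse-html", "parse-pdf", "query-url"};
        for (String id : pluginIds) {
            System.out.println(id
                + " without=" + isIncluded(withoutMyProject, id)
                + " with=" + isIncluded(withMyProject, id));
        }
    }
}
```

Under that assumption, a name such as parse-pdf stays excluded with either value, and myProject only matches once it is listed as its own alternative in the expression.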
