Thanks Chris, 

I followed your suggestion and looked at the crawl log and
$NUTCH_HOME/conf/nutch-default.xml.
I found that the myProject plugin was not being included. 

fetch.log:
050329 025333 loading file:/usr/local/nutch-0.6/conf/nutch-default.xml
050329 025333 loading file:/usr/local/nutch-0.6/conf/nutch-site.xml
050329 025333 No NutchFileSystem indicated, so defaulting to local fs.
050329 025334 Plugins: looking in: /usr/local/nutch-0.6/build/plugins
050329 025334 parsing: /usr/local/nutch-0.6/build/plugins/parse-html/plugin.xml
050329 025334 parsing: /usr/local/nutch-0.6/build/plugins/query-site/plugin.xml
050329 025334 parsing: /usr/local/nutch-0.6/build/plugins/parse-text/plugin.xml
050329 025334 not including: /usr/local/nutch-0.6/build/plugins/myProject
050329 025334 not including: /usr/local/nutch-0.6/build/plugins/parse-msword
050329 025334 not including: /usr/local/nutch-0.6/build/plugins/ontology
050329 025334 not including: /usr/local/nutch-0.6/build/plugins/parse-mp3
050329 025334 parsing: /usr/local/nutch-0.6/build/plugins/query-url/plugin.xml
050329 025334 not including: /usr/local/nutch-0.6/build/plugins/protocol-ftp
050329 025334 not including: /usr/local/nutch-0.6/build/plugins/clustering-carrot2
050329 025334 not including: /usr/local/nutch-0.6/build/plugins/parse-pdf
050329 025334 not including: /usr/local/nutch-0.6/build/plugins/language-identifier

So I added myProject to plugin.includes in conf/nutch-site.xml, and then it worked.

<property>
  <name>plugin.includes</name>
  <value>myProject|protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)</value>
</property>
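
For anyone hitting the same "not including" lines, here is a minimal sketch of why the extra alternative matters. It assumes that Nutch treats plugin.includes as a regular expression that must match the whole plugin directory name; that is my reading of the log above, not something I checked in the Nutch source. The class name PluginIncludesCheck, the method isIncluded, and the "before" value (the setting above with myProject removed) are hypothetical and only for illustration.

```java
import java.util.regex.Pattern;

public class PluginIncludesCheck {

    // Assumption: a plugin is loaded only when its whole name matches the
    // plugin.includes expression (full match, not a substring search).
    public static boolean isIncluded(String includes, String pluginName) {
        return Pattern.compile(includes).matcher(pluginName).matches();
    }

    public static void main(String[] args) {
        // Hypothetical "before" value: the setting from this mail without myProject.
        String before = "protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)";
        // The value that was placed in conf/nutch-site.xml.
        String after = "myProject|" + before;

        String[] names = {"myProject", "parse-html", "parse-pdf"};
        for (String name : names) {
            System.out.println(name
                    + " before=" + isIncluded(before, name)
                    + " after=" + isIncluded(after, name));
        }
    }
}
```

Under that assumption, only myProject changes from excluded to included; parse-html matches either way and parse-pdf matches neither, which is consistent with the fetch.log excerpt.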

Thanks for your help. 


On Mon, 28 Mar 2005 22:23:11 -0800, Chris Mattmann
<[EMAIL PROTECTED]> wrote:
> Hi Zennet,
> 
> >
> > The URLFilter plugin is already working from previous development but
> > my changes to the code don't take effect.
> >
> > Here are the steps I've taken:
> > 1. Modified the existing implementation of URLFilter interface
> 
> Okay.
> 
> > 2. Built the project with ant
> 
> Good.
> 
> > 3. Copied build/plugin/* to NUTCH_HOME/plugins
> 
> You don't need to do this if you're running the crawl tool. The crawl tool
> will by default load plugins out of $NUTCH_HOME/build/plugins
> 
> > 4. Ran the generate-fetch-index cycle
> 
> Okay
> >
> > I modified filter() to write some debug statements to a file and
> > return null for every url (for debugging purposes). I know my code was
> > not executed because no urls should have been indexed and there were no
> > debug statements in the file. I suspect that step 3 is what I am doing
> > incorrectly or there is some other file I need to modify.
> 
> Did you enable the plugin in the nutch-default.xml file within the conf
> directory? Make sure that you enable the plugin there. Can you post a txt
> capture of your crawl log?
> 
> Thanks,
>  Chris
> 
> 
> >
> > I appreciate any help.
> >
> > Thanks,
> > zennet
> 
>
