[Nutch-general] Re: urlfilter-db usage

Stefan Groschupf Thu, 01 Dec 2005 01:17:16 -0800

If you are sure that the plugin is deployed successfully (check the logs: at the very beginning there is a section listing the included plugins), then there is nothing more for you to do.
What happens behind the scenes is that during segment generation the plugin is asked whether a specific URL can be added to the segment's fetchlist. A URL is added only if it passes ALL (!!) URL filters that are deployed (see the logfile), so verify that you do not block a URL with a regular expression in a deployed regex URL filter.
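
To illustrate the "must pass ALL filters" rule, here is a minimal sketch in plain Java. It does not use the Nutch API: the names UrlFilterSketch, FilterChainSketch, denyRegex and allowHosts are hypothetical, and the host whitelist only stands in for a database-backed lookup (the plugin's actual logic is not shown in this thread). The only convention assumed is that a filter returns the URL when it accepts it and null when it rejects it.

```java
import java.net.URI;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical stand-in for a URL filter:
// returns the URL if accepted, null if rejected.
interface UrlFilterSketch {
    String filter(String url);
}

class FilterChainSketch {
    // A URL reaches the fetchlist only if every filter in the chain accepts it.
    static String applyAll(List<UrlFilterSketch> filters, String url) {
        String current = url;
        for (UrlFilterSketch f : filters) {
            current = f.filter(current);
            if (current == null) {
                return null; // a single rejection is enough to drop the URL
            }
        }
        return current;
    }

    // Hypothetical regex filter: rejects any URL that matches the deny pattern.
    static UrlFilterSketch denyRegex(String regex) {
        Pattern p = Pattern.compile(regex);
        return url -> p.matcher(url).find() ? null : url;
    }

    // Hypothetical whitelist filter, standing in for a database-backed host lookup.
    static UrlFilterSketch allowHosts(Set<String> hosts) {
        return url -> {
            String host = URI.create(url).getHost();
            return host != null && hosts.contains(host) ? url : null;
        };
    }

    public static void main(String[] args) {
        List<UrlFilterSketch> chain = List.of(
            denyRegex("[?*!@=]"),              // assumed deny rule for query-style URLs
            allowHosts(Set.of("example.org"))); // assumed whitelist content

        // Accepted by both filters.
        System.out.println(applyAll(chain, "http://example.org/index.html"));
        // Whitelisted host, but blocked by the regex filter because of '?' and '='.
        System.out.println(applyAll(chain, "http://example.org/search?q=nutch"));
        // Passes the regex filter, but the host is not whitelisted.
        System.out.println(applyAll(chain, "http://other.example.com/"));
    }
}
```

The second call shows the case to watch for: a URL on a whitelisted host still never reaches the fetchlist when a regex filter rejects it.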


HTH
Stefan



On 01.12.2005 at 06:44, Brent Parker wrote:

Greetings,
I'm a Nutch (0.7.1) newbie. I have installed it, used the Intranet crawl,
and all works fine. I want to crawl the web, using a relatively small list
of domains. Therefore, I am interested in using the urlfilter-db plugin
(http://issues.apache.org/jira/browse/NUTCH-100). I have downloaded the
plugin. I was able to build and deploy with no problem. I set up the
nutch-default.xml, nutch-site.xml, and mysql as specified in the plugin
instructions. But how do I use (invoke) the plugin?
I am using the tutorial (http://lucene.apache.org/nutch/tutorial.html) as my
guide to do whole-web crawling.  Do I now start from the "Whole-web:
Fetching" section?

Just need a "little" guidance (I think).

Thanks in advance!
Brent



