Hi Brent,
Start here:
http://wiki.media-style.com/display/nutchDocu/quick+tutorial
After the urls are injected, you only need to repeat the Generate, Fetch,
Update and Index parts of the above tutorial.
Re: Generate:
Generate builds a new segment of uncrawled urls.
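To make the repeated round concrete, here is a rough Python sketch that only
builds and prints the commands. The bin/nutch subcommands are as I remember
them from the 0.7 whole-web tutorial, so check them against your copy. The
"db" and "segments" directory names, and the helper names newest_segment,
generate_command and segment_commands, are my own assumptions for
illustration, not part of Nutch.

```python
import os


def newest_segment(segments_dir="segments"):
    """Return the path of the most recent segment directory.

    Assumes segment directories are named by timestamp, so the
    lexicographically last name is the newest one.
    """
    names = sorted(n for n in os.listdir(segments_dir)
                   if os.path.isdir(os.path.join(segments_dir, n)))
    if not names:
        raise RuntimeError("no segments found in " + segments_dir)
    return os.path.join(segments_dir, names[-1])


def generate_command(db="db", segments_dir="segments"):
    # Generate: build a new segment of urls that have not been fetched yet.
    # "db" and "segments" are assumed directory names.
    return ["bin/nutch", "generate", db, segments_dir]


def segment_commands(segment, db="db"):
    """Commands to run on the segment that Generate just created."""
    return [
        ["bin/nutch", "fetch", segment],         # Fetch the pages in the segment
        ["bin/nutch", "updatedb", db, segment],  # Update the db with the results
        ["bin/nutch", "index", segment],         # Index the fetched segment
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    print(" ".join(generate_command()))
    for cmd in segment_commands("segments/<newest-segment>"):
        print(" ".join(cmd))
```

In a real round you would run the generate command first, then pass the
result of newest_segment() to segment_commands(), since the segment name is
only known after Generate has created it.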
That should get you started. I started testing Nutch about a week ago,
so if anyone wants to add anything, feel free.
Regards
----- Original Message -----
From: "Brent Parker" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Thursday, December 01, 2005 12:44 AM
Subject: urlfilter-db usage
> Greetings,
>
> I'm a Nutch (0.7.1) newbie. I have installed it - used the Intranet crawl,
> and all works fine. I want to crawl the web, using a relatively small list
> of domains. Therefore, I am interested in using the urlfilter-db plugin
> (http://issues.apache.org/jira/browse/NUTCH-100). I have downloaded the
> plugin. I was able to build and deploy with no problem. I set up the
> nutch-default.xml, nutch-site.xml, and mysql as specified in the plugin
> instructions. But how do I use (invoke) the plugin?
>
> I am using the tutorial (http://lucene.apache.org/nutch/tutorial.html) as my
> guide to do whole-web crawling. Do I now start from the "Whole-web:
> Fetching" section?
>
> Just need a "little" guidance (I think).
>
> Thanks in advance!
> Brent