Re: urlfilter-db usage

RJ Thu, 01 Dec 2005 07:40:39 -0800

 Hi Brent,

      Start here:
        http://wiki.media-style.com/display/nutchDocu/quick+tutorial


      After the URLs are injected, you only need to repeat the Generate,
      Fetch, Update and Index parts of the above tutorial.
      A note on Generate: it builds a new segment of as-yet uncrawled URLs.
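      Scripted, that repeated cycle might look like the minimal Python sketch
      below. It assumes the command forms from the Nutch 0.7 whole-web
      tutorial (`bin/nutch generate db segments [-topN N]`, `fetch`,
      `updatedb db`, `index`) and its `db`/`segments` directory layout, run
      from the Nutch install directory. The helper names are hypothetical,
      and the exact arguments should be checked against the tutorial for
      your version.

```python
import os
import subprocess
from typing import Iterable, List, Optional

# Assumed Nutch 0.7 whole-web layout (as in the tutorial): web db in "db",
# segments under "segments", script run from the Nutch install directory.
NUTCH = "bin/nutch"
DB = "db"
SEGMENTS = "segments"


def generate_command(top_n: Optional[int] = None) -> List[str]:
    """Generate: create a new segment holding a fetchlist from the web db."""
    cmd = [NUTCH, "generate", DB, SEGMENTS]
    if top_n is not None:
        cmd += ["-topN", str(top_n)]  # limit the fetchlist to top-scoring pages
    return cmd


def newest_segment_name(names: Iterable[str]) -> str:
    """Pick the newest segment directory name.

    Segment names are creation timestamps (e.g. 20051201074039), so the
    largest one sorts last; this mirrors `ls -d segments/2* | tail -1`.
    """
    stamps = sorted(n for n in names if n.isdigit())
    if not stamps:
        raise FileNotFoundError("no segment directories found")
    return stamps[-1]


def round_commands(segment: str) -> List[List[str]]:
    """Fetch the segment, update the web db from it, then index it."""
    return [
        [NUTCH, "fetch", segment],
        [NUTCH, "updatedb", DB, segment],
        [NUTCH, "index", segment],
    ]


def crawl(rounds: int, top_n: Optional[int] = 1000) -> None:
    """Repeat Generate, Fetch, Update and Index (URLs already injected)."""
    for _ in range(rounds):
        subprocess.run(generate_command(top_n), check=True)
        newest = newest_segment_name(os.listdir(SEGMENTS))
        for cmd in round_commands(os.path.join(SEGMENTS, newest)):
            subprocess.run(cmd, check=True)
```

      For example, `crawl(rounds=3)` would run three rounds after the
      initial inject. Building each command as a list keeps the sequence
      easy to inspect before anything is executed.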

      That should get you started. I only started testing Nutch about a week
      ago, so if anyone wants to add anything, feel free.

  Regards

----- Original Message ----- 
From: "Brent Parker" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Thursday, December 01, 2005 12:44 AM
Subject: urlfilter-db usage


> Greetings,
>
> I'm a Nutch (0.7.1) newbie. I have installed it and used the Intranet
> crawl, and everything works fine. I want to crawl the web using a
> relatively small list of domains, so I am interested in using the
> urlfilter-db plugin (http://issues.apache.org/jira/browse/NUTCH-100).
> I have downloaded the plugin and was able to build and deploy it with
> no problem. I set up nutch-default.xml, nutch-site.xml, and MySQL as
> specified in the plugin instructions. But how do I use (invoke) the
> plugin?
>
> I am using the tutorial (http://lucene.apache.org/nutch/tutorial.html)
> as my guide to do whole-web crawling. Do I now start from the
> "Whole-web: Fetching" section?
>
> Just need a "little" guidance (I think).
>
> Thanks in advance!
> Brent
>


