Re: Crawl auto updated in nutch?

RJ Wed, 30 Nov 2005 14:42:33 -0800

-  bin/nutch generate db/ segments/
 will generate a new segment containing the URLs that have not been crawled yet.


  Try this URL: http://wiki.media-style.com/display/nutchDocu/quick+tutorial
and do just these parts: Generate, Update the DB, Index the segment.
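
  Roughly, one update pass could look like the sketch below. It assumes the
0.7-style command-line tools (generate, fetch, updatedb, index) and that
segment directories are named by timestamp, so the newest one sorts last.
NUTCH, latest_segment and update_cycle are names made up for this example,
not part of Nutch, and the fetch step is my addition since the new segment
has to be fetched before the DB can be updated from it.

```shell
#!/bin/sh
# Sketch of one incremental update cycle. Assumes Nutch 0.7-style commands;
# NUTCH, latest_segment and update_cycle are illustrative names only.

NUTCH="${NUTCH:-bin/nutch}"

# Print the newest segment directory. Assumes segment names are timestamps
# (e.g. 20051130144233), so lexical order matches creation order.
latest_segment() {
    ls -d "$1"/2* 2>/dev/null | sort | tail -n 1
}

# Run one generate / fetch / updatedb / index pass.
# $1 = web DB directory, $2 = segments directory
update_cycle() {
    db="$1"
    segments="$2"

    # Create a new segment listing the URLs that are due for fetching.
    "$NUTCH" generate "$db" "$segments" || return 1

    seg=$(latest_segment "$segments")
    [ -n "$seg" ] || return 1

    # Fetch only the new segment, not the whole crawl.
    "$NUTCH" fetch "$seg" || return 1

    # Record the fetch results and newly found links in the DB.
    "$NUTCH" updatedb "$db" "$seg" || return 1

    # Index the freshly fetched segment.
    "$NUTCH" index "$seg" || return 1
}
```

  Calling "update_cycle db segments" from a script that cron runs now and then
would stand in for the missing auto-update. Check the paths and command names
against your own install first.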

  I hope that helps.

  Regards

----- Original Message ----- 
From: "Håvard W. Kongsgård" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Wednesday, November 30, 2005 4:27 PM
Subject: Re: Crawl auto updated in nutch?


> I searched the mail archive and found this
> http://www.mail-archive.com/[email protected]/msg01308.html
> - Is there a way in the current version of Nutch to update the crawl
> without fetching every doc again?
> - Is the Nutch team planning an update function?
>
>
>
> Håvard W. Kongsgård wrote:
>
> > So how do I update a crawl? The updating section of the FAQ is empty :-(
> >
> > http://wiki.apache.org/nutch/FAQ#head-c721b23b43b15885f5ea7d8da62c1c40a37878e6
> >
> >
> >>
> >> Doug Cutting wrote:
> >>
> >>> Håvard W. Kongsgård wrote:
> >>>
> >>>> - I want to index about 50 – 100 sites with lots of documents. Is
> >>>> it best to use the Intranet Crawling or the Whole-web Crawling method?
> >>>
> >>>
> >>>
> >>>
> >>> The "intranet" style is simpler and hence a good place to start.  If
> >>> it doesn't work well for you then you might try the "whole-web" style.
> >>>
> >>>> - Is the crawl auto-updated in Nutch, or must I run a cron task?
> >>>
> >>>
> >>>
> >>>
> >>> It is not auto-updated.
> >>>
> >>> Doug
> >>>
> >>>
> >>
> >
> >
>
>



