- bin/nutch generate db/ segments/ will generate a new segment containing the URLs that have not been crawled yet.
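As a rough sketch, one update round built around that command might look like the script below. It assumes the Nutch 0.7-style command names (generate, fetch, updatedb, index) and that segment directories are named by timestamp. The fetch step is not spelled out in this message but is assumed to be needed between generate and updatedb. The variables NUTCH, DB and SEGMENTS and the functions latest_segment and update_round are made up for illustration and are not part of Nutch.

```shell
#!/bin/sh
# Sketch of one incremental update round, assuming Nutch 0.7-style commands.
# NUTCH, DB and SEGMENTS are illustrative variables, not part of Nutch itself.
NUTCH="${NUTCH:-bin/nutch}"
DB="${DB:-db}"
SEGMENTS="${SEGMENTS:-segments}"

# Print the newest segment directory. Assumes segment names are timestamps,
# so lexical order matches creation order.
latest_segment() {
    ls -d "$SEGMENTS"/*/ 2>/dev/null | sort | tail -n 1 | sed 's:/*$::'
}

update_round() {
    # 1. Generate a new segment with URLs that still need fetching.
    $NUTCH generate "$DB" "$SEGMENTS" || return 1
    seg=$(latest_segment)
    [ -n "$seg" ] || { echo "no segment found in $SEGMENTS" >&2; return 1; }
    # 2. Fetch the pages listed in that segment (assumed step, see above).
    $NUTCH fetch "$seg" || return 1
    # 3. Update the DB with the results of the fetch.
    $NUTCH updatedb "$DB" "$seg" || return 1
    # 4. Index the newly fetched segment.
    $NUTCH index "$seg" || return 1
}
```

Since the crawl is not auto-updated, a cron job could source this file and call update_round on whatever schedule suits the sites. Setting NUTCH=echo first prints the commands instead of running them, which is a cheap way to check the sequence.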
Try this URL: http://wiki.media-style.com/display/nutchDocu/quick+tutorial
and just do these parts: Generate, Update the DB, Index the segment.

I hope that helps.

Regards

----- Original Message -----
From: "Håvard W. Kongsgård" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Wednesday, November 30, 2005 4:27 PM
Subject: Re: Crawl auto updated in nutch?

> I searched the mail archive and found this
> http://www.mail-archive.com/[email protected]/msg01308.html
> - Is there in the current version of nutch a way to update the crawl
> without fetching every doc again?
> - Is the nutch team planning an updating function?
>
> Håvard W. Kongsgård wrote:
>
> > So how to update a crawl, the updating section of the FAQ is empty :-(
> > http://wiki.apache.org/nutch/FAQ#head-c721b23b43b15885f5ea7d8da62c1c40a37878e6
> >
> >> Doug Cutting wrote:
> >>
> >>> Håvard W. Kongsgård wrote:
> >>>
> >>>> - I want to index about 50 – 100 sites with lots of documents, is
> >>>> it best to use the Intranet Crawling or Whole-web Crawling method.
> >>>
> >>> The "intranet" style is simpler and hence a good place to start. If
> >>> it doesn't work well for you then you might try the "whole-web" style.
> >>>
> >>>> - Is the crawl auto updated in nutch, or must I run a cron task
> >>>
> >>> It is not auto-updated.
> >>>
> >>> Doug
