- bin/nutch generate db/ segments/ will generate a new segment containing the URLs that have not been crawled yet.
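The generate command above is only the first step of one update pass. Below is a minimal sketch of a full pass, assuming the Nutch 0.7-style subcommands (generate, fetch, updatedb, index) and the db/ and segments/ layout used above. The fetch step, the function name recrawl_once and the NUTCH, DB_DIR and SEGMENTS_DIR variables are illustrative assumptions, not something Nutch defines; check the subcommand names against your Nutch version before relying on them.

```shell
#!/bin/sh
# Sketch of one update pass. NUTCH, DB_DIR and SEGMENTS_DIR are
# hypothetical helper variables for this example, not Nutch settings.
NUTCH="${NUTCH:-bin/nutch}"
DB_DIR="${DB_DIR:-db}"
SEGMENTS_DIR="${SEGMENTS_DIR:-segments}"

recrawl_once() {
    # Generate a new segment with the URLs that still need fetching.
    $NUTCH generate "$DB_DIR" "$SEGMENTS_DIR" || return 1

    # Assumption: segment directories are named by timestamp,
    # so the newest one sorts last.
    segment=$(ls -d "$SEGMENTS_DIR"/2* | tail -1)
    [ -n "$segment" ] || return 1

    # Fetch the segment, fold the results back into the DB, then index it.
    $NUTCH fetch "$segment" || return 1
    $NUTCH updatedb "$DB_DIR" "$segment" || return 1
    $NUTCH index "$segment" || return 1
}
```

Since the crawl is not auto-updated, a function like recrawl_once would have to be called from a cron job or run by hand whenever the index should be refreshed.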
Try this URL: http://wiki.media-style.com/display/nutchDocu/quick+tutorial and just do the parts "Generate", "Update the DB" and "Index the segment".

I hope that helps.

Regards

----- Original Message -----
From: "Håvard W. Kongsgård" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Wednesday, November 30, 2005 4:27 PM
Subject: Re: Crawl auto updated in nutch?

> I searched the mail archive and found this
> http://www.mail-archive.com/[email protected]/msg01308.html
>
> - Is there in the current version of nutch on way to update the crawl
> without fetching every doc again?
> - Is the nutch team planning an updating function?
>
> Håvard W. Kongsgård wrote:
>
> > So how to update a crawl, the updating section of the FAQ is empty :-(
> > http://wiki.apache.org/nutch/FAQ#head-c721b23b43b15885f5ea7d8da62c1c40a37878e6
> >
> >> Doug Cutting wrote:
> >>
> >>> Håvard W. Kongsgård wrote:
> >>>
> >>>> - I want to index about 50 – 100 sites with lots of documents, is
> >>>> it best use the Intranet Crawling or Whole-web Crawling method.
> >>>
> >>> The "intranet" style is simpler and hence a good place to start. If
> >>> it doesn't work well for you then you might try the "whole-web" style.
> >>>
> >>>> - Is the crawl auto updated in nutch, or must I run a cron task
> >>>
> >>> It is not auto-updated.
> >>>
> >>> Doug
