Okay, so do you run the command bin/nutch generate -dir somedirectory, or what? Do you have to be in the original crawl directory?

Andy

-----Original Message-----
From: Thomas Sondergaard [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 12, 2006 8:11 AM
To: [email protected]
Subject: Re: Introduction to Nutch, Part 1: Crawling

Is it safe to run these commands while the searcher (web-interface) is using it? In other words can I just do the following:

1) crawl
2) start tomcat
3) setup a cron-job that runs the following commands every 5 days (for my intranet I don't want to be up to 30 days behind):
   1. generate
   2. updatedb
   3. invertlinks
   4. index
   5. dedup
   6. merge
4) Sit back and enjoy my eternally up-to-date intranet search engine?

Thanks,

Thomas

Gal Nitzan wrote:

>The crawl tool can be used only once.
>
>After running the initial crawl you can not use this tool again.
>
>From that point on you would run:
>
>1. generate
>2. updatedb
>3. invertlinks
>4. index
>5. dedup
>6. merge
>
>The default parameter for fetching pages cycle is 30 days.
>
>So basically if you finished crawling your intranet in the initial
>crawl you would run your generate in 30 days.
>
>However you can run the generate with the -adddays parameter set to 30
>and it will generate a fetchlist with all pages already in your crawldb.
>
>If your system contains new pages, the crawler would find it during the
>fetch and would update the crawldb.
>
>G.
>
>On Thu, 2006-01-12 at 07:44 -0500, Andy Morris wrote:
>
>>After doing an initial crawl how do you keep that directory current.
>>How often should a intranet crawl be run. Should this be a cron job
>>and do I have to restart tomcat after each crawl?
>>
>>Andy
>>-----Original Message-----
>>From: Tom White [mailto:[EMAIL PROTECTED]
>>Sent: Wednesday, January 11, 2006 4:21 AM
>>To: [email protected]
>>Subject: Introduction to Nutch, Part 1: Crawling
>>
>>Hi,
>>
>>I've written an article about using Nutch at the intranet scale, which
>>you may find interesting:
>>http://today.java.net/pub/a/today/2006/01/10/introduction-to-nutch-1.html .
>>Please post any comments on the article page itself.
>>
>>I've updated the wiki to link to it too.
>>
>>Regards,
>>
>>Tom

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
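
The six-step recrawl cycle listed in the thread (generate, updatedb, invertlinks, index, dedup, merge) could be driven from a small script that a cron job calls. The following is only a rough sketch in Python under stated assumptions: the thread does not say what arguments each step takes, so the caller passes them in via `step_args`, and the function names `build_recrawl_commands` and `run_recrawl` are made up for illustration. The only option taken from the thread is `-adddays`, which Gal mentions for generate. The list also does not name a fetch step, although Gal's message refers to one; the sketch follows the list as written.

```python
import subprocess

# Step order exactly as listed in the thread. The thread gives no
# arguments for these steps, so none are assumed here.
RECRAWL_STEPS = ["generate", "updatedb", "invertlinks", "index", "dedup", "merge"]


def build_recrawl_commands(nutch="bin/nutch", step_args=None):
    """Return one argv list per step, in the order given in the thread.

    step_args maps a step name to its list of arguments. The values are
    supplied by the caller (hypothetical), since the thread does not
    spell them out.
    """
    step_args = step_args or {}
    unknown = set(step_args) - set(RECRAWL_STEPS)
    if unknown:
        raise ValueError(f"unknown steps: {sorted(unknown)}")
    return [[nutch, step, *step_args.get(step, [])] for step in RECRAWL_STEPS]


def run_recrawl(commands, runner=subprocess.run):
    """Run the commands in order; check=True stops at the first failure."""
    for argv in commands:
        runner(argv, check=True)
```

For example, `build_recrawl_commands(step_args={"generate": ["-adddays", "30"]})` yields the six commands in order with `-adddays 30` attached to generate; any directory arguments the steps need would have to be added the same way. Passing a different `runner` makes it possible to print or log the commands without executing them.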
