Okay, so do you run the command bin/nutch generate -dir somedirectory, or what? Do you have to be in the original crawl directory?

Andy
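Nutch commands take the locations of the crawl data as arguments, so nothing requires the shell to be inside the crawl directory. Relative paths are simply resolved against whatever directory the command is launched from. The sketch below builds a `generate` call with absolute paths so it behaves the same from any working directory. It assumes Nutch is installed in a hypothetical `/opt/nutch` and that the initial crawl wrote to `/opt/nutch/crawl`. The `<crawldb> <segments_dir>` argument order follows the Nutch 0.8-era layout and is an assumption; running `bin/nutch generate` with no arguments should print the usage for your version.

```python
from __future__ import annotations

import os
import subprocess

# Hypothetical locations: adjust to your own installation and crawl output.
NUTCH_HOME = "/opt/nutch"
CRAWL_DIR = "/opt/nutch/crawl"


def generate_command(nutch_home: str, crawl_dir: str, adddays: int = 0) -> list[str]:
    """Build a `generate` invocation that does not depend on the working directory.

    Assumed argument order (Nutch 0.8-era): <crawldb> <segments_dir> [-adddays N].
    """
    # Resolve a relative crawl_dir against the current directory once, up front.
    crawl_dir = os.path.abspath(crawl_dir)
    cmd = [
        os.path.join(nutch_home, "bin", "nutch"),
        "generate",
        os.path.join(crawl_dir, "crawldb"),
        os.path.join(crawl_dir, "segments"),
    ]
    if adddays > 0:
        # As described in the thread, -adddays widens the set of pages treated as due.
        cmd += ["-adddays", str(adddays)]
    return cmd


if __name__ == "__main__":
    # check=True raises CalledProcessError if Nutch exits with a non-zero status.
    subprocess.run(generate_command(NUTCH_HOME, CRAWL_DIR, adddays=30), check=True)
```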
-----Original Message-----
From: Thomas Sondergaard [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 12, 2006 8:11 AM
To: [email protected]
Subject: Re: Introduction to Nutch, Part 1: Crawling

Is it safe to run these commands while the searcher (web interface) is using the index? In other words, can I just do the following:

1) crawl
2) start Tomcat
3) set up a cron job that runs the following commands every 5 days (for my intranet I don't want to be up to 30 days behind):
   1. generate
   2. updatedb
   3. invertlinks
   4. index
   5. dedup
   6. merge
4) sit back and enjoy my eternally up-to-date intranet search engine?

Thanks,

Thomas

Gal Nitzan wrote:
>The crawl tool can be used only once.
>
>After running the initial crawl you cannot use this tool again.
>
>From that point on you would run:
>
>1. generate
>2. updatedb
>3. invertlinks
>4. index
>5. dedup
>6. merge
>
>The default refetch interval for pages is 30 days.
>
>So if your initial crawl covered your whole intranet, you would run
>generate again after 30 days.
>
>However, you can run generate with the -adddays parameter set to 30,
>and it will generate a fetchlist containing all pages already in your crawldb.
>
>If your system contains new pages, the crawler will find them during the
>fetch and update the crawldb.
>
>G.
>
>On Thu, 2006-01-12 at 07:44 -0500, Andy Morris wrote:
>
>>After doing an initial crawl, how do you keep that directory current?
>>How often should an intranet crawl be run? Should this be a cron job,
>>and do I have to restart Tomcat after each crawl?
>>
>>Andy
>>
>>-----Original Message-----
>>From: Tom White [mailto:[EMAIL PROTECTED]
>>Sent: Wednesday, January 11, 2006 4:21 AM
>>To: [email protected]
>>Subject: Introduction to Nutch, Part 1: Crawling
>>
>>Hi,
>>
>>I've written an article about using Nutch at the intranet scale, which
>>you may find interesting:
>>http://today.java.net/pub/a/today/2006/01/10/introduction-to-nutch-1.html
>>
>>Please post any comments on the article page itself.
>>
>>I've updated the wiki to link to it too.
>>
>>Regards,
>>
>>Tom
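The step lists in this thread go straight from generate to updatedb. However, generate only writes a fetchlist into a new segment, and that segment has to be fetched (`bin/nutch fetch <segment>`) before updatedb has anything to record. The sketch below runs one recrawl cycle in that order and stops at the first failing step, which suits an unattended cron job. The crawl layout (`crawldb`, `segments`, `linkdb`, `indexes`, `index` under one crawl directory), the per-command argument order, and the timestamp-named segments are assumptions modelled on the Nutch 0.8-era tutorial. The paths are hypothetical. Index directories left over from a previous cycle may need to be moved aside first, because the tools may refuse to overwrite existing output; that housekeeping is left out.

```python
from __future__ import annotations

import os
import subprocess


def list_segments(segments_dir: str) -> list[str]:
    """Segment directories, oldest first (assumes timestamp-named segments)."""
    return sorted(
        os.path.join(segments_dir, name)
        for name in os.listdir(segments_dir)
        if os.path.isdir(os.path.join(segments_dir, name))
    )


def recrawl_steps(nutch: str, crawl: str, segment: str, all_segments: list[str]) -> list[list[str]]:
    """Commands that follow `generate`, in order. Argument layouts are assumptions."""
    crawldb = os.path.join(crawl, "crawldb")
    linkdb = os.path.join(crawl, "linkdb")
    indexes = os.path.join(crawl, "indexes")
    return [
        [nutch, "fetch", segment],                                # fetch the new fetchlist
        [nutch, "updatedb", crawldb, segment],                    # record results and new links
        [nutch, "invertlinks", linkdb, *all_segments],            # incoming links per page
        [nutch, "index", indexes, crawldb, linkdb, *all_segments],
        [nutch, "dedup", indexes],                                # drop duplicate documents
        [nutch, "merge", os.path.join(crawl, "index"), indexes],  # one index for the searcher
    ]


def run_recrawl(nutch_home: str, crawl: str, adddays: int = 30) -> None:
    nutch = os.path.join(nutch_home, "bin", "nutch")
    segments_dir = os.path.join(crawl, "segments")
    before = set(list_segments(segments_dir))
    subprocess.run(
        [nutch, "generate", os.path.join(crawl, "crawldb"), segments_dir,
         "-adddays", str(adddays)],
        check=True,
    )
    segments = list_segments(segments_dir)
    new = [s for s in segments if s not in before]
    if not new:
        return  # no new segment appeared, so there is nothing to fetch this cycle
    for cmd in recrawl_steps(nutch, crawl, new[-1], segments):
        subprocess.run(cmd, check=True)  # raises on the first failing step


if __name__ == "__main__":
    run_recrawl("/opt/nutch", "/opt/nutch/crawl")  # hypothetical paths
```

A cron entry can call this script. Note that a day-of-month step such as `*/5` restarts at the beginning of each month, so the gap between runs is not always exactly five days.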
