Okay, so do you run the command bin/nutch generate -dir somedirectory, or
something else?
Do you have to be in the original crawl directory?
Andy 

-----Original Message-----
From: Thomas Sondergaard [mailto:[EMAIL PROTECTED] 
Sent: Thursday, January 12, 2006 8:11 AM
To: [email protected]
Subject: Re: Introduction to Nutch, Part 1: Crawling

Is it safe to run these commands while the searcher (web-interface) is
using it? In other words can I just do the following:

1) crawl
2) start tomcat
3) set up a cron job that runs the following commands every 5 days (for
my intranet I don't want to be up to 30 days behind): 1. generate, 2.
updatedb, 3. invertlinks, 4. index, 5. dedup, 6. merge
4) Sit back and enjoy my eternally up-to-date intranet search engine?
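
For what it's worth, step 3 could be sketched roughly as the script below.
It is only a sketch under assumptions: the crawl directory name (crawl),
the argument order of each bin/nutch command, and the DRY_RUN, CRAWL and
ADDDAYS variables are hypothetical or version-dependent, so check each
tool's own usage message before relying on it. It also assumes a fetch
step between generate and updatedb, which the list of commands does not
mention, and it does not deal with replacing existing index directories or
with the running searcher. By default it only prints the commands.

```shell
#!/bin/sh
# Recrawl sketch. Argument layouts below are ASSUMPTIONS and differ between
# Nutch versions; run each command without arguments to see its real usage.
NUTCH="${NUTCH:-bin/nutch}"    # path to the nutch launcher script
CRAWL="${CRAWL:-crawl}"        # hypothetical crawl directory
ADDDAYS="${ADDDAYS:-30}"       # value passed to generate -adddays
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print commands, anything else = run them

# Print a command, and execute it unless we are in dry-run mode.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# Name of the most recent segment directory (names sort by timestamp).
newest_segment() {
    ls -d "$CRAWL"/segments/* 2>/dev/null | sort | tail -n 1
}

recrawl() {
    run "$NUTCH" generate "$CRAWL/crawldb" "$CRAWL/segments" -adddays "$ADDDAYS" || return 1

    seg=$(newest_segment)
    if [ -z "$seg" ]; then
        if [ "$DRY_RUN" != "1" ]; then
            echo "no segment found under $CRAWL/segments" >&2
            return 1
        fi
        seg="$CRAWL/segments/NEWEST"   # placeholder name, dry-run only
    fi

    # fetch is assumed here; it is not part of the six listed commands.
    run "$NUTCH" fetch "$seg" || return 1
    run "$NUTCH" updatedb "$CRAWL/crawldb" "$seg" || return 1
    run "$NUTCH" invertlinks "$CRAWL/linkdb" "$CRAWL/segments" || return 1
    run "$NUTCH" index "$CRAWL/indexes" "$CRAWL/crawldb" "$CRAWL/linkdb" "$CRAWL"/segments/* || return 1
    run "$NUTCH" dedup "$CRAWL/indexes" || return 1
    run "$NUTCH" merge "$CRAWL/index" "$CRAWL/indexes" || return 1
}

recrawl
```

Once the printed commands look right, running it with DRY_RUN=0 from the
Nutch install directory would execute them. A crontab entry along the lines
of "0 3 */5 * * cd /path/to/nutch && DRY_RUN=0 ./recrawl.sh" (path and
script name made up) would approximate "every 5 days", though */5 restarts
at the start of each month.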

Thanks,

Thomas


Gal Nitzan wrote:

>The crawl tool can be used only once.
>
>After running the initial crawl you cannot use this tool again.
>
>From that point on you would run:
>
>1. generate
>2. updatedb
>3. invertlinks
>4. index
>5. dedup
>6. merge
>
>The default interval for refetching pages is 30 days.
>
>So basically if you finished crawling your intranet in the initial 
>crawl you would run your generate in 30 days.
>
>However you can run the generate with the -adddays parameter set to 30
>and it will generate a fetchlist with all pages already in your
>crawldb.
>
>If your system contains new pages, the crawler would find them during
>the fetch and would update the crawldb.
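
To make the 30-day arithmetic above concrete, here is a tiny sketch. It
assumes a simplified model that is not taken from the Nutch source: a page
becomes eligible for the fetchlist once its age in days plus the -adddays
value reaches the refetch interval. The is_due helper is made up for
illustration.

```shell
#!/bin/sh
# is_due AGE INTERVAL ADDDAYS
# Succeeds if a page last fetched AGE days ago would be picked up by
# generate, under the simplified (assumed) rule AGE + ADDDAYS >= INTERVAL.
is_due() {
    [ $(( $1 + $3 )) -ge "$2" ]
}

# A page fetched 5 days ago, 30-day interval, no -adddays: not yet due.
if is_due 5 30 0; then echo "due"; else echo "not due"; fi

# Same page with -adddays 30: due, so it lands in the fetchlist.
if is_due 5 30 30; then echo "due"; else echo "not due"; fi
```

Under this model the first check prints "not due" and the second prints
"due", which matches the point that -adddays 30 brings in every page
already in the crawldb.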
>
>G.
>
>On Thu, 2006-01-12 at 07:44 -0500, Andy Morris wrote:
>  
>
>>After doing an initial crawl, how do you keep that directory current?
>>How often should an intranet crawl be run? Should this be a cron job,
>>and do I have to restart tomcat after each crawl?
>>
>>Andy
>>-----Original Message-----
>>From: Tom White [mailto:[EMAIL PROTECTED]
>>Sent: Wednesday, January 11, 2006 4:21 AM
>>To: [email protected]
>>Subject: Introduction to Nutch, Part 1: Crawling
>>
>>Hi,
>>
>>I've written an article about using Nutch at the intranet scale, which
>>you may find interesting:
>>http://today.java.net/pub/a/today/2006/01/10/introduction-to-nutch-1.html .
>>Please post any comments on the article page itself.
>>
>>I've updated the wiki to link to it too.
>>
>>Regards,
>>
>>Tom
>>
