You would want something like this:

bin/nutch generate db segments -topN 1000000
segment=`ls -d segments/2* | tail -1`
bin/nutch fetch $segment
bin/nutch updatedb db $segment


Of course, replace the topN value with the number of URLs you wish to
fetch. You could also wrap this in a for loop to run it as many times as
you need.
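
A rough sketch of that loop, in case it helps. The function name
crawl_rounds and the NUTCH, DB, and SEGMENTS variables are made up for
illustration, and it assumes the Nutch 0.x form of generate that takes the
web db and the segments directory as arguments, so adjust it to your layout:

```shell
#!/bin/sh
# Sketch only: repeat the generate/fetch/updatedb cycle a fixed number of
# times. NUTCH, DB, SEGMENTS, and crawl_rounds are illustrative names,
# not part of Nutch itself.
NUTCH=${NUTCH:-bin/nutch}
DB=${DB:-db}
SEGMENTS=${SEGMENTS:-segments}

crawl_rounds() {
    rounds=$1   # how many cycles to run
    topn=$2     # number of URLs to fetch per cycle
    i=1
    while [ "$i" -le "$rounds" ]; do
        "$NUTCH" generate "$DB" "$SEGMENTS" -topN "$topn" || return 1
        # Segment directories are named by timestamp, so the last one in
        # sorted order is the one generate just created.
        segment=$(ls -d "$SEGMENTS"/2* | tail -1)
        "$NUTCH" fetch "$segment" || return 1
        "$NUTCH" updatedb "$DB" "$segment" || return 1
        i=$((i + 1))
    done
}
```

For example, `crawl_rounds 3 1000000` would run three cycles of up to
1000000 URLs each, stopping early if any step fails.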

-byron




-----Original Message-----
From: Richard Anderson <[EMAIL PROTECTED]>
To: [email protected]
Date: Thu, 21 Apr 2005 09:23:31 -0400
Subject: Running nutch on new segments

> 
> For nightly indexing how do you select the current segment to fetch, 
> updatedb, analyze, and index?
> 
> I wrote the following script that shows what I need to do to update the
> index nightly.
> 
> $ cat nutch-update.sh
> 
> nutch fetch /webapps/nutch/search-dir/segments/20050421085829/
> nutch updatedb /webapps/nutch/search-dir/db/ /webapps/nutch/search-dir/segments/20050421085829/
> nutch analyze  /webapps/nutch/search-dir/db/ 2
> nutch index  /webapps/nutch/search-dir/segments/20050421085829/
> 
> 
> Am I completely crazy? I can't find any docs on automating the indexing
> process in regards to using segments.
> 
