This solves the segment selection issue nicely. Thanks.
--Rick
Byron Miller wrote:
You would want something like this:
bin/nutch generate -topN 1000000
segment=`ls -d segments/2* | tail -1`
bin/nutch fetch $segment
bin/nutch updatedb db $segment
Of course, replace the topN value with the number of URLs you wish to fetch. You could also wrap this in a for loop to run it any number of times.
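A rough sketch of that loop follows; treat it as a starting point, not a tested recipe. The function names latest_segment and crawl_rounds and the NUTCH variable are made up for illustration. The nutch arguments are copied from the commands above, so check them against your Nutch version. It assumes segment directories are named by timestamp, so the last one in sorted order is the newest.

```shell
#!/bin/sh
# Print the newest segment directory under the directory given as $1.
# Assumes timestamp names (e.g. 20050421085829), so lexical order
# matches chronological order.
latest_segment() {
    ls -d "$1"/2* 2>/dev/null | tail -1
}

# Run generate / fetch / updatedb for $1 rounds, fetching up to $2 URLs
# per round. NUTCH is a hypothetical override; it defaults to bin/nutch.
# Must be run from the directory that holds db/ and segments/.
crawl_rounds() {
    rounds=$1
    topn=$2
    i=0
    while [ "$i" -lt "$rounds" ]; do
        ${NUTCH:-bin/nutch} generate -topN "$topn" || return 1
        segment=$(latest_segment segments)
        # Stop if no segment directory was found.
        [ -n "$segment" ] || return 1
        ${NUTCH:-bin/nutch} fetch "$segment" || return 1
        ${NUTCH:-bin/nutch} updatedb db "$segment" || return 1
        i=$((i + 1))
    done
}
```

With those functions loaded, something like "crawl_rounds 5 1000000" would run five rounds in a row. The segment is re-selected after every generate, so each round fetches the segment it has just created.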
-byron
-----Original Message-----
From: Richard Anderson <[EMAIL PROTECTED]>
To: [email protected]
Date: Thu, 21 Apr 2005 09:23:31 -0400
Subject: Running nutch on new segments
For nightly indexing how do you select the current segment to fetch, updatedb, analyze, and index?
I wrote the following script that shows what I need to do to update the index nightly.
$ cat nutch-update.sh
nutch fetch /webapps/nutch/search-dir/segments/20050421085829/
nutch updatedb /webapps/nutch/search-dir/db/ /webapps/nutch/search-dir/segments/20050421085829/
nutch analyze /webapps/nutch/search-dir/db/ 2
nutch index /webapps/nutch/search-dir/segments/20050421085829/
Am I completely crazy? I can't find any docs on automating the indexing process with regard to segments.
