This solves the segment selection issue nicely. Thanks.


--Rick

Byron Miller wrote:

You would want something like this:

bin/nutch generate -topN 1000000
segment=`ls -d segments/2* | tail -1`
bin/nutch fetch $segment
bin/nutch updatedb db $segment


Of course, replace the topN value with the number of URLs you wish to fetch. You could also wrap this in a for loop to run it as many times as you need.
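A minimal sketch of the loop idea, assuming a POSIX shell run from the directory that holds bin/, db and segments/. The names latest_segment, crawl_rounds and the NUTCH variable are hypothetical, made up for illustration. The nutch arguments are copied as written in the commands above; check them against the usage output of your own Nutch version before relying on them.

```shell
#!/bin/sh
# Path to the nutch launcher; an assumption, override with NUTCH=/path/to/nutch.
NUTCH=${NUTCH:-bin/nutch}

# Print the newest segment directory under $1.
# Segment names are timestamps, so a plain sorted listing ends with the newest.
latest_segment() {
    ls -d "$1"/2* 2>/dev/null | tail -1
}

# Run the generate/fetch/updatedb cycle $1 times, asking for $2 URLs each round.
crawl_rounds() {
    rounds=$1
    topn=$2
    i=1
    while [ "$i" -le "$rounds" ]; do
        # Arguments as given in the message above.
        "$NUTCH" generate -topN "$topn" || return 1
        segment=$(latest_segment segments)
        # Stop if generate did not leave a segment behind.
        [ -n "$segment" ] || return 1
        "$NUTCH" fetch "$segment" || return 1
        "$NUTCH" updatedb db "$segment" || return 1
        i=$((i + 1))
    done
}
```

With these definitions loaded, something like `crawl_rounds 3 1000000` would run three rounds. Each round stops at the first failing command, so a broken fetch does not get written into the db.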

-byron




-----Original Message-----
From: Richard Anderson <[EMAIL PROTECTED]>
To: [email protected]
Date: Thu, 21 Apr 2005 09:23:31 -0400
Subject: Running nutch on new segments



For nightly indexing, how do you select the current segment to fetch, updatedb, analyze, and index?

I wrote the following script that shows what I need to do to update the
index nightly.

$ cat nutch-update.sh

nutch fetch /webapps/nutch/search-dir/segments/20050421085829/
nutch updatedb /webapps/nutch/search-dir/db/ /webapps/nutch/search-dir/segments/20050421085829/
nutch analyze /webapps/nutch/search-dir/db/ 2
nutch index /webapps/nutch/search-dir/segments/20050421085829/



Am I completely crazy? I can't find any docs on automating the indexing process with regard to segments.
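For illustration, the script above could pick up the newest segment itself instead of hard-coding the timestamp. This is only a sketch: nightly_update and the NUTCH variable are hypothetical names, the four nutch commands are the ones from the script above with the segment path substituted, and it assumes a segment has already been generated under the segments directory.

```shell
#!/bin/sh
# Name of the nutch launcher; an assumption, override with NUTCH=/path/to/nutch.
NUTCH=${NUTCH:-nutch}

# Fetch, updatedb, analyze and index the newest segment under $1/segments.
# $1 is the search directory, e.g. /webapps/nutch/search-dir
nightly_update() {
    base=$1
    # Segment names are timestamps, so the last one in sorted order is the newest.
    segment=$(ls -d "$base"/segments/2* 2>/dev/null | tail -1)
    if [ -z "$segment" ]; then
        echo "no segment found under $base/segments" >&2
        return 1
    fi
    # Same four steps as the hard-coded script; stop at the first failure.
    "$NUTCH" fetch "$segment" &&
    "$NUTCH" updatedb "$base/db" "$segment" &&
    "$NUTCH" analyze "$base/db" 2 &&
    "$NUTCH" index "$segment"
}
```

A nightly cron job could then load these definitions and call `nightly_update /webapps/nutch/search-dir`.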








