Well,
you can use the normal Nutch tools for that, but you may need to set up the URL filter so that it only lets the right pages through.
Then you can:
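For example, the filter file CrawlTool reads (conf/crawl-urlfilter.txt in Nutch 0.7) uses one regex per line, "+" to accept and "-" to reject; example.org below is a placeholder for your own sites:

```
# accept pages from our own sites only (replace example.org with your domain)
+^http://([a-z0-9]*\.)*example.org/
# reject everything else
-.
```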
// generate a segment
bin/nutch generate yourDb aSegmentFolder
// get the segment
seg=`ls -d aSegmentFolder/2* | tail -1`
// fetch the segment
bin/nutch fetch $seg
// update the webdb with the content of the freshly fetched segment
bin/nutch updatedb yourDb $seg
// index the segment
bin/nutch index $seg
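The "get the segment" step above works because Nutch names each segment folder after its creation timestamp, so a plain lexical sort finds the newest one. A small self-contained sketch of that pattern (the folder names here are made up for illustration):

```shell
#!/bin/sh
# Simulate a segments folder with a few timestamp-named segment dirs.
segdir=$(mktemp -d)
mkdir -p "$segdir/20051105120000" "$segdir/20051106120000" "$segdir/20051107120000"

# Same idiom as in the walkthrough: glob, let ls sort lexically, take the last.
seg=`ls -d "$segdir"/2* | tail -1`
echo "$seg"
```

Because the names sort chronologically, $seg always ends up pointing at the most recently generated segment, which is what the fetch/updatedb/index steps need.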

Maybe this document gives you a better understanding of the procedure:
http://wiki.media-style.com/display/nutchDocu/Home

HTH
Stefan




On 07.11.2005 at 23:50, Paul M Lieberman wrote:

I've created a db of roughly 250,000 entries from a few of our websites. I did this with CrawlTool (depth 10).

How would I go about doing a nightly update to add more pages to the db?

I have looked high and low through the documentation, and have not been able to ferret this out.

TIA,

Paul Lieberman
American Psychological Association


---------------------------------------------------------------
company: http://www.media-style.com
forum:   http://www.text-mining.org
blog:    http://www.find23.net

