Well,
you can use the normal Nutch tools for that, but you may need to
set up the URL filters so that they match the correct pages.
Then you can:
# generate a segment
bin/nutch generate yourDb aSegmentFolder
# get the newest segment
seg=`ls -d aSegmentFolder/2* | tail -1`
# fetch the segment
bin/nutch fetch $seg
# update the webdb with the content of the freshly fetched segment
bin/nutch updatedb yourDb $seg
# index the segment
bin/nutch index $seg
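To run this nightly, you could wrap the cycle above in a small script and call it from cron. This is only a sketch; NUTCH_HOME, yourDb, and aSegmentFolder are placeholders you need to adjust to your own installation:

```shell
#!/bin/sh
# nightly-update.sh -- sketch of the update cycle above.
# NUTCH_HOME, yourDb, and aSegmentFolder are assumptions; adjust to your setup.
NUTCH_HOME=/opt/nutch
cd $NUTCH_HOME || exit 1

bin/nutch generate yourDb aSegmentFolder        # generate a new segment
seg=`ls -d aSegmentFolder/2* | tail -1`         # pick the newest segment
bin/nutch fetch $seg                            # fetch it
bin/nutch updatedb yourDb $seg                  # fold it back into the webdb
bin/nutch index $seg                            # index it
```

Then a crontab entry like `0 2 * * * /opt/nutch/nightly-update.sh` would run it every night at 2:00.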
Maybe this document gives you a better understanding of the procedure...
http://wiki.media-style.com/display/nutchDocu/Home
HTH
Stefan
On 07.11.2005 at 23:50, Paul M Lieberman wrote:
I've created a db of roughly 250,000 entries from a few of our
websites. I did this with CrawlTool (depth 10).
How would I go about doing a nightly update to add more pages to
the db?
I have looked high and low through the documentation and have not
been able to ferret this out.
TIA,
Paul Lieberman
American Psychological Association
---------------------------------------------------------------
company: http://www.media-style.com
forum: http://www.text-mining.org
blog: http://www.find23.net