Well,
you can use the normal Nutch tools for that, but you may need to
set up the URL filters so that they match the correct pages.
Then you can:
# generate a segment
bin/nutch generate yourDb aSegmentFolder
# get the newest segment
seg=`ls -d aSegmentFolder/2* | tail -1`
# fetch the segment
bin/nutch fetch $seg
# update the webdb with the content of the freshly fetched segment
bin/nutch updatedb yourDb $seg
# index the segment
bin/nutch index $seg
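To run this nightly, you could wrap the cycle above in a small script and call it from cron. This is only a sketch; NUTCH_HOME, yourDb, and aSegmentFolder are placeholders you need to adjust to your own installation:

```shell
#!/bin/sh
# nightly-update.sh -- sketch of the update cycle above.
# NUTCH_HOME, yourDb, and aSegmentFolder are assumptions; adjust to your setup.
NUTCH_HOME=/opt/nutch
cd $NUTCH_HOME || exit 1

bin/nutch generate yourDb aSegmentFolder        # generate a new segment
seg=`ls -d aSegmentFolder/2* | tail -1`         # pick the newest segment
bin/nutch fetch $seg                            # fetch it
bin/nutch updatedb yourDb $seg                  # fold it back into the webdb
bin/nutch index $seg                            # index it
```

Then a crontab entry like `0 2 * * * /opt/nutch/nightly-update.sh` would run it every night at 2:00.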
Maybe this document gives you a better understanding of the procedure...
http://wiki.media-style.com/display/nutchDocu/Home
HTH
Stefan
On 07.11.2005 at 23:50, Paul M Lieberman wrote:
I've created a db of roughly 250,000 entries from a few of our
websites. I did this with CrawlTool (depth 10).
How would I go about doing a nightly update to add more pages to
the db?
I have looked high and low through the documentation and have not
been able to ferret this out.
TIA,
Paul Lieberman
American Psychological Association
---------------------------------------------------------------
company: http://www.media-style.com
forum: http://www.text-mining.org
blog: http://www.find23.net