The default re-fetch interval for fetched content is 30 days, so what's in your index now will not be fetched again until those 30 days have passed. All the new links, however, are ready to be fetched immediately. Just create another segment from the same Nutch DB (crawl/crawldb) and it will include all of those new links to be fetched.

You might want to run some stats on your Nutch DB before you do this, or at least limit the size of the new segment being created. Depending on the size of your first segment and the number of links on those pages, you might have imported a lot more links than you're expecting.

Stats command:
  bin/nutch readdb crawl/crawldb -stats
Limiting segment size:
  bin/nutch generate crawl/crawldb crawl/segments -topN [maximum number of links]

----- Original Message ----
From: Ricardo J. Méndez <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wednesday, March 7, 2007 12:16:54 AM
Subject: Following outlinks during - or after - seed fetch

Hi,

I've written a plugin and have been running some tests with Nutch, based on the tutorials on the wiki (specifically http://wiki.apache.org/nutch/NutchTutorial ). I'm seeding the crawl list with a limited item list, so that I can verify the items are being loaded. After the end of the fetch, the index is correctly populated with the items I told it to fetch.

How can I start a crawl from the outlinks on the items I've seeded?

Thanks in advance,

Ricardo J. Méndez
http://ricardo.strangevistas.net/
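
To make the reply at the top of this message concrete, here is a rough sketch of one follow-up round: check the stats, generate a capped segment, fetch it, and update the DB from it. The fetch and updatedb steps follow the wiki tutorial's whole-web crawl section and are not spelled out in the reply. The helper names (run, crawl_round), the DRY_RUN switch, the NEW_SEGMENT placeholder, and the topN value of 1000 are illustrative assumptions, not part of Nutch. Re-indexing is not covered here.

```shell
#!/bin/sh
# Sketch only: one follow-up crawl round against an existing crawl DB.
# Paths match the commands quoted in the reply; adjust to your layout.
NUTCH="${NUTCH:-bin/nutch}"            # assumed location of the nutch launcher
CRAWLDB="${CRAWLDB:-crawl/crawldb}"
SEGMENTS="${SEGMENTS:-crawl/segments}"
TOPN="${TOPN:-1000}"                   # illustrative cap, not a recommendation

# run (hypothetical helper): print the command when DRY_RUN=1, else execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# crawl_round (hypothetical helper): stats, generate, fetch, updatedb.
crawl_round() {
    # See how many URLs are waiting before generating anything.
    run "$NUTCH" readdb "$CRAWLDB" -stats || return 1

    # Create a new segment, capped at TOPN URLs.
    run "$NUTCH" generate "$CRAWLDB" "$SEGMENTS" -topN "$TOPN" || return 1

    # Segment directories are named by timestamp, so the last one listed
    # is the one generate just created.
    segment=$(ls -d "$SEGMENTS"/2* 2>/dev/null | tail -1)
    if [ -z "$segment" ]; then
        if [ "${DRY_RUN:-0}" = "1" ]; then
            segment="$SEGMENTS/NEW_SEGMENT"   # placeholder for the dry run
        else
            echo "no segment found under $SEGMENTS" >&2
            return 1
        fi
    fi

    # Fetch the new segment, then fold its results and outlinks into the DB.
    run "$NUTCH" fetch "$segment" || return 1
    run "$NUTCH" updatedb "$CRAWLDB" "$segment"
}
```

Calling crawl_round with DRY_RUN=1 set only prints the four commands, which is a cheap way to check the paths before starting a real fetch. Each further call adds one more round of outlinks to the crawl.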
