I think I found a similar thread here:

http://mail-archives.apache.org/mod_mbox/incubator-nutch-user/200503.mbox/[EMAIL PROTECTED]

The upshot was:

... use the commands described in the internet crawling tutorial.
http://incubator.apache.org/nutch/tutorial.html#Whole-web+Crawling
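
On the db.default.fetch.interval property in the quoted message: going only by its description ("the default number of days between re-fetches of a page"), the setting reads as a simple "is this page due again?" rule. Here is a minimal Python sketch of that reading. It is not Nutch code; the function name is_due_for_refetch and its arguments are made up for illustration, and it says nothing about whether pages that have disappeared get removed.

```python
from datetime import datetime, timedelta


def is_due_for_refetch(last_fetched, now, interval_days=1):
    """Hypothetical helper: True once interval_days have passed since last_fetched.

    interval_days stands in for the db.default.fetch.interval value (in days).
    """
    return now - last_fetched >= timedelta(days=interval_days)


last = datetime(2005, 3, 1, 12, 0)
print(is_due_for_refetch(last, datetime(2005, 3, 2, 11, 0)))  # False: only 23 hours have passed
print(is_due_for_refetch(last, datetime(2005, 3, 2, 12, 0)))  # True: a full day has passed
```

With the value set to 1, every known page would count as due again a day after its last fetch under this reading.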


Hi,

I've followed the instructions to set up an intranet search engine, but I'm wondering how to update it with new pages. Do I just have to rerun the crawl every day, or can I use nutch update in some way?

Also, I've set the following property in nutch-site.xml:

<property>
 <name>db.default.fetch.interval</name>
 <value>1</value>
 <description>The default number of days between re-fetches of a page.
 </description>
</property>

Am I right in thinking this configures Nutch to check that the pages it already knows about are still valid, and to remove them if they are not?

Thanks for any help.

JS.






_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
