Hi,
I have a similar question, but from a slightly different viewpoint. I
have decided to test Nutch here at the university to try to get some
more flexibility. For this to work I would need to reindex the entire
website nightly. There are about 20 to 40 thousand pages here, so I
don't think that would be a big deal. Since existing pages are always
changing and new pages are always being added, I would want to
crawl the entire site every day. (I think that's my understanding:
fetch is only for already known pages, correct?)
So, how do I crawl every day without restarting Tomcat every day? Can
I merge the databases? Can a new crawl just update the current
database? Am I missing something? (I'm pretty new to this and its
nomenclature.)
Or do I need to create a shell script that reindexes the site,
restarts Tomcat, and removes the old db?
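Roughly what I have in mind for that script is below. It is only a sketch:
BASE_DIR, CRAWL_CMD, RESTART_CMD and the function name are placeholders I
made up, and the defaults are harmless stand-ins, not real Nutch or Tomcat
commands. The idea is to build the new crawl in a fresh directory, point a
"current" symlink at it, restart Tomcat, and only then remove the old one.

```shell
#!/bin/sh
# Sketch only: every path and command here is a placeholder (assumption),
# not a confirmed Nutch or Tomcat invocation.
BASE_DIR="${BASE_DIR:-/tmp/nutch-demo}"            # hypothetical working directory
CRAWL_CMD="${CRAWL_CMD:-mkdir -p}"                 # stand-in for the real crawl; gets the target dir as its last argument
RESTART_CMD="${RESTART_CMD:-echo restart-tomcat}"  # stand-in for the real Tomcat restart

nightly_reindex() {
    new_dir="$BASE_DIR/crawl-$(date +%Y%m%d%H%M%S)"
    old_dir=""

    mkdir -p "$BASE_DIR" || return 1

    # Remember which crawl directory is live right now, if any.
    if [ -L "$BASE_DIR/current" ]; then
        old_dir=$(readlink "$BASE_DIR/current")
    fi

    # 1. Build the new crawl in its own directory; give up if it fails,
    #    so the live data is never touched by a broken run.
    $CRAWL_CMD "$new_dir" || return 1

    # 2. Point "current" at the new crawl.
    ln -sfn "$new_dir" "$BASE_DIR/current" || return 1

    # 3. Restart the servlet container so it picks up the new data.
    $RESTART_CMD || return 1

    # 4. Remove the previous crawl only after the switch succeeded.
    if [ -n "$old_dir" ] && [ "$old_dir" != "$new_dir" ] && [ -d "$old_dir" ]; then
        rm -rf "$old_dir"
    fi
}
```

The script would end with a call to nightly_reindex and be run from cron
once a night, with CRAWL_CMD and RESTART_CMD set to whatever the real
commands turn out to be. It still restarts Tomcat every night, which is
the part I would like to avoid if there is a better way.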
It would be great if there were a section in the wiki about setting up
Nutch for automatic updates and reindexing, as it seems that there are
a few questions like mine around.
Thanks in advance.
--
Dave Wolowicz
Web Developer
UVic Communications