Hello,
I am not sure I understood you correctly, but if you use the technique
described as "whole web crawling" in the tutorial, you do not start from
scratch: you can fetch new pages and refetch and update existing
ones. I may have misunderstood your question, though, so please give us more
details on what you want to achieve, e.g. do you plan to fetch from a
limited number of sites?
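
To illustrate the idea of updating an existing page database rather than
starting over, here is a rough conceptual sketch in Python. It is not Nutch
code and does not use any Nutch API: PageRecord, generate_fetchlist, fetch,
update_db, FAKE_WEB and the fixed refetch interval are all hypothetical names
and assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class PageRecord:
    """One entry in a toy page database (hypothetical, not Nutch's format)."""
    url: str
    next_fetch: float  # time (in arbitrary units) when the page is due again
    content: str = ""


# A stand-in for the web: url -> (content, outgoing links). Assumption only.
FAKE_WEB = {
    "a": ("page a", ["b"]),
    "b": ("page b", []),
}


def generate_fetchlist(db, now):
    """Pick only the pages that are due, instead of every known page."""
    return sorted(url for url, rec in db.items() if rec.next_fetch <= now)


def fetch(urls):
    """Pretend to download the given URLs from FAKE_WEB."""
    return {u: FAKE_WEB[u] for u in urls if u in FAKE_WEB}


def update_db(db, fetched, now, interval):
    """Merge fetch results into the existing db and add newly found links."""
    for url, (content, outlinks) in fetched.items():
        rec = db[url]
        rec.content = content
        rec.next_fetch = now + interval  # not due again until the interval passes
        for link in outlinks:
            if link not in db:
                # New pages are due immediately, so the next round picks them up.
                db[link] = PageRecord(link, next_fetch=now)


def crawl_round(db, now, interval=30.0):
    """One generate / fetch / update cycle; returns the URLs fetched."""
    fetchlist = generate_fetchlist(db, now)
    update_db(db, fetch(fetchlist), now, interval)
    return fetchlist
```

Starting from a db that holds only "a", the first round fetches "a" and
discovers "b"; the second round fetches only "b", because "a" is not due
again until its interval has passed. The existing db is kept and extended
in each round.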
Regards
Piotr
carmmello wrote:
I have been using Nutch for over a year now, and this is a question that I
have always asked without getting any answer. I have tried a lot of things and
looked in the mailing lists, the tutorial, everywhere, but found no answer. So,
to me, it seems that the only way to stay up to date is to start
everything all over again. It seems (as far as I know) that Nutch was
not designed to let you update an existing set of index, db and segments
with only new or modified pages. If someone knows
something about this issue, please let us know, because this point seems, to
me, to be the biggest obstacle to really using Nutch on a regular basis
on a "production site".
Thanks