I think you can run segmentMergeTool to merge the new
segments with the previous ones (documents with a
duplicate URL and content MD5 will be discarded) and
then run the indexer on the merged result.
I am not sure this is the best approach for daily
refetching; it is just my reading of the code I dug
through.
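To make the idea concrete, here is a rough sketch in
Python of a merge that drops duplicates by URL and by
content MD5. This is not Nutch code (Nutch itself is
Java). The function names, the (url, content) record
layout, and the rule that the newest copy of a URL wins
are all assumptions for illustration; the real tool may
resolve duplicates differently.

```python
import hashlib


def content_md5(content: bytes) -> str:
    """Hex MD5 digest of a page body, used as the content fingerprint."""
    return hashlib.md5(content).hexdigest()


def merge_segments(segments):
    """Merge segments (oldest first) into one list without duplicates.

    Each segment is a list of (url, content) pairs. This layout is
    hypothetical and much simpler than a real Nutch segment.
    """
    # Pass 1: for each URL keep only the most recently fetched copy
    # (assumption: later segments override earlier ones).
    latest_by_url = {}
    for segment in segments:
        for url, content in segment:
            latest_by_url[url] = content

    # Pass 2: drop pages whose content MD5 has already been seen,
    # so identical pages under different URLs are stored once.
    seen_digests = set()
    merged = []
    for url, content in latest_by_url.items():
        digest = content_md5(content)
        if digest in seen_digests:
            continue
        seen_digests.add(digest)
        merged.append((url, content))
    return merged


# Example: last night's segment plus tonight's refetch.
old_segment = [
    ("http://a.example/", b"old home"),
    ("http://a.example/about", b"about"),
]
new_segment = [
    ("http://a.example/", b"new home"),       # same URL, refetched
    ("http://a.example/mirror", b"about"),    # same content, new URL
]
merged = merge_segments([old_segment, new_segment])
for url, content in merged:
    print(url, content_md5(content))
```

In this sketch the refetched home page replaces the old
copy, and the mirror page is dropped because its MD5
matches a page that is already in the merged list. The
indexer would then be run over the merged output only.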
Michael Ji
--- Lokkju <[EMAIL PROTECTED]> wrote:
> I have searched through the mail archives and seen this
> question asked a lot, but no answer ever seems to come
> back. I am going to be using Nutch against 5 sites, and
> I want to update the index on a nightly basis. Besides
> deleting the previous crawl and then running it again,
> what method of doing nightly updates is recommended?
>
> Thanks,
> Nick
>