Hello Ali,
as far as I could tell from similar research on Nutch 1.8, the update
method is never called, and the underlying components are not designed to
distinguish between documents that are fetched for the first time and
documents that are fetched again. I used a workaround based on the
signatures, but I wouldn't recommend it.
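
For illustration only, a signature-based check could look roughly like the
sketch below. It is not Nutch code and not necessarily the workaround
mentioned above: SignatureTracker, FetchKind and classify are hypothetical
names, and an in-memory map stands in for whatever store would hold the
previous signatures.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Nutch: remembers the last content
// signature seen for each URL and classifies every fetch against it.
public class SignatureTracker {
    public enum FetchKind { NEW, UNCHANGED, MODIFIED }

    // Assumption: an in-memory map is enough for the illustration.
    private final Map<String, String> lastSignature = new HashMap<>();

    // Hex-encoded MD5 of the content (assumption: any stable digest would do).
    static String signature(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stores the new signature and reports how it relates to the previous one.
    public FetchKind classify(String url, String content) {
        String sig = signature(content);
        String previous = lastSignature.put(url, sig);
        if (previous == null) {
            return FetchKind.NEW;
        }
        return previous.equals(sig) ? FetchKind.UNCHANGED : FetchKind.MODIFIED;
    }
}
```

An indexer could then decide between an add and an update based on the
returned FetchKind, at the cost of one extra lookup per document.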
Regards,
Florian
On 01.07.2014 15:31, Ali Nazemian wrote:
Dear all,
I am going to make some changes to Nutch's default behavior. I want to change
the Nutch Solr indexer (the indexWriter class) so that, instead of adding new
documents to Solr, it updates the existing ones. I saw an "update" method
inside this class. Is it implemented for this purpose? If not, what is the
purpose of this method? Another question: would such a change (making
indexWriter update documents instead of adding them) affect
performance for whole-web crawling?
Best regards.