Hello Ali,

As far as I could tell from similar research on Nutch 1.8, the update method is never called, and the underlying components are not designed to distinguish between documents that are fetched for the first time and documents that are fetched again. I used a workaround based on the signatures, but I wouldn't recommend it.
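
To give a rough idea of what a signature-based check can look like in general, here is a minimal sketch in plain Java. It is not the actual workaround and it does not use any Nutch API: the class SignatureTracker, the FetchState enum and the in-memory map are all hypothetical, and an MD5 digest of the content merely stands in for a document signature.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Nutch: remembers the last signature seen
// per URL and uses it to tell a first fetch from a re-fetch.
class SignatureTracker {

    enum FetchState { NEW, UNCHANGED, CHANGED }

    // URL -> signature of the content seen at the previous fetch.
    // Assumption: an in-memory map is enough for illustration purposes.
    private final Map<String, byte[]> lastSignature = new HashMap<>();

    // Stand-in signature: MD5 digest of the UTF-8 encoded content.
    static byte[] signature(String content) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return md5.digest(content.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Records the current signature and reports how it relates to the previous one.
    FetchState classify(String url, String content) {
        byte[] current = signature(content);
        byte[] previous = lastSignature.put(url, current);
        if (previous == null) {
            return FetchState.NEW;          // URL never seen before
        }
        return Arrays.equals(previous, current)
                ? FetchState.UNCHANGED      // re-fetch, same content
                : FetchState.CHANGED;       // re-fetch, content differs
    }
}
```

With such a helper, a caller could decide per document whether to add it, skip it or update it, depending on the returned state. The extra lookup and the state that has to be kept for every URL are part of the cost of this kind of approach.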

Regards,
Florian


On 01.07.2014 15:31, Ali Nazemian wrote:
Hi all,
I am going to make some changes to Nutch's default behavior. I want to change
the Nutch Solr indexer (the indexWriter class) so that, instead of adding new
documents to Solr, old documents are updated. I saw an "update" method
inside this class. Is it implemented for this purpose? If not, what is the
purpose of this method? Another question: would such a change (making
indexWriter update documents instead of adding them) affect
performance when crawling the whole web?
Best regards.

