Some time ago, for a very particular use case, we abstracted this responsibility
into a custom Solr plugin for a few stored fields. It handled this case
(not just updating a date field, but also keeping a counter of how many times
a URL is indexed). Of course, you need stored fields for this, and under the
hood the document still gets deleted and re-added.
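
For illustration only (this is not our plugin's code): Solr's built-in atomic
update syntax, with its "set" and "inc" modifiers, can express the same kind of
change from the client side. The sketch below just builds the JSON body you
would POST to the /update handler. The field names last_indexed and
index_count and the class name are hypothetical, and it assumes the document
key is the URL stored in "id".

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AtomicUpdateSketch {

    // Escape backslashes and double quotes so the value is a valid JSON string.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Build an atomic-update body: "set" overwrites the date field,
    // "inc" adds 1 to the counter. Field names are assumptions.
    static String buildUpdate(String url, String indexedAt) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", quote(url));
        fields.put("last_indexed", "{\"set\":" + quote(indexedAt) + "}");
        fields.put("index_count", "{\"inc\":1}");

        StringBuilder sb = new StringBuilder("[{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(",");
            }
            sb.append(quote(e.getKey())).append(":").append(e.getValue());
            first = false;
        }
        return sb.append("}]").toString();
    }

    public static void main(String[] args) {
        // Print the body that would be sent with Content-Type: application/json.
        System.out.println(buildUpdate("http://example.com/", "2014-07-01T09:54:00Z"));
    }
}
```

The same caveat applies here as with the plugin: the other fields have to be
stored, because Solr rebuilds the whole document internally.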

On Jul 1, 2014, at 9:54 AM, Markus Jelsma <[email protected]> wrote:

> Hi, 
> 
> NutchIndexAction is indeed prepared to handle updates but the methods are not 
> implemented. In case of Solr, it still does an internal add/delete for 
> updated documents, and to do so, you must have all fields stored="true". So 
> in almost all cases, it is more efficient not to store all fields and send 
> some additional data over the wire. You can implement it though.
> 
> Markus
> 
> -----Original message-----
>> From:Ali Nazemian <[email protected]>
>> Sent: Tuesday 1st July 2014 15:31
>> To: [email protected]
>> Subject: Changing nutch for update documents instead of add new ones
>> 
>> Dears,
>> Hi,
>> I am going to do some changes in nutch default behavior. I want to change
>> nutch solr index (indexWriter class) in a way that instead of adding new
>> document to solr, old documents are updated. I saw an "update" method
>> inside this class. Is that implemented for this purpose? If no what is the
>> purpose of this method? Another question is doing such thing (changing
>> indexWriter to update document instead of adding them) would affect my
>> performance for whole web crawling?
>> Best regards.
>> 
>> -- 
>> A.Nazemian
>> 

