Some time ago, for a very particular use case, we abstracted this responsibility into a custom Solr plugin for a few stored fields. It handled this case: not just updating a date field, but also keeping a counter of how many times a URL is indexed. Of course you need stored fields for this, and under the hood the document still gets deleted and re-added.
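
To make the idea concrete, here is a rough sketch of the logic, not our actual plugin. It uses a plain in-memory map as a stand-in for the index, so it touches neither the Solr nor the Nutch API. The class name `UpdateSketch` and the field names `indexCount` and `lastIndexed` are made up for illustration. It shows why the fields must be stored: the old counter has to be read back before the document is replaced.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for an index; NOT the Solr or Nutch API.
public class UpdateSketch {
    // Maps a document id (the URL) to its stored fields.
    private final Map<String, Map<String, Object>> index = new HashMap<>();

    // Re-index a URL: read the stored counter from the old document,
    // refresh the date, then delete the old document and add the new one.
    public Map<String, Object> reindex(String url, String content, long nowMillis) {
        Map<String, Object> old = index.get(url);
        long count = 0L;
        if (old != null) {
            // Only possible because the counter was kept as a stored field.
            count = (Long) old.get("indexCount");
        }

        Map<String, Object> doc = new HashMap<>();
        doc.put("id", url);
        doc.put("content", content);
        doc.put("lastIndexed", nowMillis);
        doc.put("indexCount", count + 1);

        index.remove(url);   // the "delete" half of the update
        index.put(url, doc); // the "add" half of the update
        return doc;
    }

    public Map<String, Object> get(String url) {
        return index.get(url);
    }

    public static void main(String[] args) {
        UpdateSketch sketch = new UpdateSketch();
        sketch.reindex("http://example.com/", "first crawl", 1000L);
        Map<String, Object> doc = sketch.reindex("http://example.com/", "second crawl", 2000L);
        System.out.println(doc.get("indexCount") + " " + doc.get("lastIndexed"));
    }
}
```

In a real setup the map lookup would be a fetch of the stored fields from Solr, which is the extra cost Markus mentions below.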

On Jul 1, 2014, at 9:54 AM, Markus Jelsma <[email protected]> wrote:

> Hi,
>
> NutchIndexAction is indeed prepared to handle updates but the methods are not
> implemented. In case of Solr, it still does an internal add/delete for
> updated documents, and to do so, you must have all fields stored="true". So
> in almost all cases, it is more efficient not to store all fields and send
> some additional data over the wire. You can implement it though.
>
> Markus
>
> -----Original message-----
>> From:Ali Nazemian <[email protected]>
>> Sent: Tuesday 1st July 2014 15:31
>> To: [email protected]
>> Subject: Changing nutch for update documents instead of add new ones
>>
>> Dears,
>> Hi,
>> I am going to do some changes in nutch default behavior. I want to change
>> nutch solr index (indexWriter class) in a way that instead of adding new
>> document to solr, old documents are updated. I saw an "update" method
>> inside this class. Is that implemented for this purpose? If no what is the
>> purpose of this method? Another question is doing such thing (changing
>> indexWriter to update document instead of adding them) would affect my
>> performance for whole web crawling?
>> Best regards.
>>
>> --
>> A.Nazemian

