If you don’t want to buy disk space for deleted docs, you should not be
using Solr. Budgeting for that space is an essential part of a reliable
Solr installation.

To avoid reindexing unchanged documents, use a bookkeeping RDBMS
table. In that table, put the document ID and a record of the most recent
successful update to Solr. You can check whether the fields are the same
by comparing a checksum of the data; MD5 is fine for that. Check that
table before sending a document and update it after the document is
indexed.

You may also want to record deletes in the database.
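
Here is a rough sketch of that bookkeeping in Python, not a tested
production setup. It assumes SQLite as the RDBMS, and the table name
(solr_bookkeeping), its columns, and the helper functions are all made up
for illustration. It also assumes you leave LASTUPDATETIME out of the
checksum, since that field changes on every update. Sending to Solr is
left out entirely.

```python
import hashlib
import json
import sqlite3
import time


def checksum(doc, ignore=("LASTUPDATETIME",)):
    """MD5 over the document's fields, skipping fields listed in `ignore`."""
    stable = {k: v for k, v in doc.items() if k not in ignore}
    payload = json.dumps(stable, sort_keys=True, default=str).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


def open_bookkeeping(path=":memory:"):
    """Open (and create if needed) the hypothetical bookkeeping table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS solr_bookkeeping ("
        " doc_id TEXT PRIMARY KEY,"
        " checksum TEXT NOT NULL,"
        " updated_at REAL NOT NULL,"
        " deleted INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def needs_reindex(conn, doc):
    """True if the doc is new, was deleted, or its checksum has changed."""
    row = conn.execute(
        "SELECT checksum, deleted FROM solr_bookkeeping WHERE doc_id = ?",
        (doc["id"],),
    ).fetchone()
    return row is None or row[1] == 1 or row[0] != checksum(doc)


def record_indexed(conn, doc):
    """Call only after Solr has accepted the full document."""
    conn.execute(
        "INSERT OR REPLACE INTO solr_bookkeeping"
        " (doc_id, checksum, updated_at, deleted) VALUES (?, ?, ?, 0)",
        (doc["id"], checksum(doc), time.time()),
    )
    conn.commit()


def record_deleted(conn, doc_id):
    """Mark a document as deleted instead of dropping its row."""
    conn.execute(
        "UPDATE solr_bookkeeping SET deleted = 1, updated_at = ?"
        " WHERE doc_id = ?",
        (time.time(), doc_id),
    )
    conn.commit()
```

If needs_reindex() says no, the document is already in Solr with the same
fields, so you could send only the "set" for LASTUPDATETIME. Otherwise
send the full document and call record_indexed() once Solr accepts it.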

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jun 26, 2020, at 1:12 AM, Anshuman Singh <singhanshuma...@gmail.com> wrote:
> 
> I was reading about in-place updates:
> https://lucene.apache.org/solr/guide/7_4/updating-parts-of-documents.html
> In my use case I have to update the field "LASTUPDATETIME"; all other
> fields stay the same. Updates are very frequent and I can't bear the cost
> of deleted docs.
> 
> If I provide all the fields, it deletes the document and re-indexes it. But
> if I just "set" the "LASTUPDATETIME" field (non-indexed, non-stored,
> docValue field), it does an in-place update without deletion. But the
> problem is I don't know if the document is present or I'm indexing it the
> first time.
> 
> Is there a way to prevent re-indexing if other fields are the same?
> 
> *P.S. I'm looking for a solution that doesn't require looking up if doc is
> present in the Collection or not.*
