Echoing what Thomas says, this problem indicates your indexing system
probably has a significant design flaw. For most systems, you should have a
notion of document identity that is external to Solr, and that should be
used as (or to deterministically generate) the id in Solr. If you don't do
this, you are forced either to solve the *potentially hard problem* of
ensuring that documents never get sent twice, or to not care about
duplicates. There
are a few limited situations where that problem is not so hard (such as
systems where all data is re-indexed nightly into a fresh index), but in
most systems it's very hard to guarantee. It's usually much safer to let
the duplicates simply overwrite each other (because they get the same id).
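
To make that concrete, here is a minimal sketch of deriving the Solr id
from an external identity. The function name, the two-part key (source
system plus that system's own primary key) and the choice of SHA-256 are
all my assumptions for illustration, not anything Solr requires. Any
stable, deterministic mapping works.

```python
import hashlib


def solr_id(source_system: str, external_key: str) -> str:
    """Derive a stable Solr id from an identity that lives outside Solr.

    Both parameters are hypothetical: use whatever uniquely identifies
    the document in the system of record.
    """
    # Length-prefix each part so ("ab", "c") and ("a", "bc") cannot collide.
    raw = f"{len(source_system)}:{source_system}|{len(external_key)}:{external_key}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# Sending the same source record twice yields the same id, so the second
# copy overwrites the first instead of creating a duplicate.
first = solr_id("crm", "customer-42")
second = solr_id("crm", "customer-42")
print(first == second)
```

If the external key is already a clean, unique string you can of course
use it directly as the id and skip the hashing.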

If I had to guess, you are probably generating GUIDs as the value for your
key field (or something similarly random)? If so, don't do that. It's a
shortcut that comes back to bite you in exactly the way you are
experiencing.

Also note that it's a very bad place to be if you are unable to rebuild
your entire index. If you are on 6.6, as the ref guide link you supplied
suggests, you really should upgrade because numerous security flaws have
been fixed since then, and upgrading from 6 to 8 or 9 definitely requires
re-indexing everything.

Having said all that, if you just need to remove a few mistakes, such
as a dozen bogus documents that some developer accidentally sent to the
prod system, deleting documents is covered in the ref guide here:

https://solr.apache.org/guide/solr/latest/getting-started/tutorial-diy.html#deleting-data
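
For a handful of known ids, something along these lines should do it. This
is a sketch, not tested against your setup: it assumes the JSON
delete-by-id form accepted by the update handler, and the base URL,
collection name and ids are placeholders you would need to replace.

```python
import json
import urllib.request


def build_delete_payload(ids):
    """Build the JSON body for a delete-by-id request to the update handler."""
    ids = [str(i) for i in ids]
    if not ids:
        # Guard against accidentally sending a meaningless request.
        raise ValueError("refusing to build an empty delete request")
    return json.dumps({"delete": ids})


def delete_by_ids(base_url, collection, ids):
    """POST the delete and commit. base_url and collection are placeholders,
    e.g. "http://localhost:8983/solr" and "mycollection"."""
    url = f"{base_url}/{collection}/update?commit=true"
    req = urllib.request.Request(
        url,
        data=build_delete_payload(ids).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Inspect the payload before sending anything to prod.
print(build_delete_payload(["bogus-doc-1", "bogus-doc-2"]))
```

Run the equivalent query first and check the hit count, so you know
exactly what you are about to remove.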

-Gus

-- 
http://www.needhamsoftware.com (work)
https://a.co/d/b2sZLD9 (my fantasy fiction book)
