Responses about how to avoid this are not on topic. I’ve had Solr in production since version 1.3 and I know the right way.
I think I know how we got into this mess. The cluster is configured and deployed into Kubernetes. I think it was rebuilt with more shards, then the existing storage volumes were mounted for the matching shards. New shards got empty volumes. Then the content was reloaded without a delete-all.

Would it work to send the deletes directly to the leader for the shard? That might bypass the hash-based routing.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On May 24, 2023, at 8:35 AM, Walter Underwood <wun...@wunderwood.org> wrote:
>
> Clearly, they are not broadcast, or if they are, they are filtered by the
> hash range before executing. If they were broadcast, this problem would not
> have happened.
>
> Yes, we’ll delete-all and reindex at some point. This collection has 1.7
> billion documents across 96 shards, so a full reindex is not an everyday
> occurrence. I’m trying to clean up the minor problem of 675k documents with
> dupes.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
>> On May 24, 2023, at 8:06 AM, Jan Høydahl <jan....@cominvent.com> wrote:
>>
>> I thought deletes were "broadcast" but probably for the composite-id router
>> they are not, since we know for sure where the document resides.
>> You say "shards were added" - how did you do that?
>> Sounds like you should simply re-create your collection and re-index?
>>
>> Jan
>>
>>> On May 24, 2023, at 16:39, Walter Underwood <wun...@wunderwood.org> wrote:
>>>
>>> We have a messed-up index with documents on shards where they shouldn’t be.
>>> Content was indexed, shards were added, then everything was reindexed. So
>>> the new document with the same ID was put on a new shard, leaving the
>>> previous version on the old shard (where it doesn’t match the hash range).
>>>
>>> I’m trying to delete the old document by sending an update with
>>> delete-by-id and a shards parameter. It returns success, but the document
>>> isn’t deleted.
>>>
>>> Is the hash range being checked and overriding the shards param somehow?
>>> Any ideas on how to make this work?
>>>
>>> And yes, we won’t do that again.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/ (my blog)
>>>
>>
>
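For anyone who wants to try the direct-to-core idea raised in this thread, here is a minimal sketch in Python. It is not a confirmed fix: whether a core applies a delete-by-id for an ID outside its hash range is exactly the open question here. The core URL and document ID are hypothetical placeholders, and the sketch assumes the uniqueKey field is named `id`. It uses a non-distributed query (`distrib=false`) to check whether one specific core holds a document, and builds a JSON delete-by-id request addressed to that core's update handler.

```python
import json
import urllib.request
from urllib.parse import urlencode


def locate_url(core_url, doc_id):
    """Build a select URL that asks a single core whether it holds doc_id."""
    # Escape backslashes and quotes so the ID is safe inside a quoted term.
    escaped = doc_id.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "q": f'id:"{escaped}"',   # assumes the uniqueKey field is "id"
        "distrib": "false",       # query this core only, no fan-out to other shards
        "fl": "id,_version_",
        "wt": "json",
    }
    return f"{core_url.rstrip('/')}/select?{urlencode(params)}"


def delete_request(core_url, doc_id, commit=False):
    """Build (but do not send) a JSON delete-by-id request for one core."""
    url = f"{core_url.rstrip('/')}/update"
    if commit:
        url += "?commit=true"
    body = json.dumps({"delete": {"id": doc_id}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def core_has_doc(core_url, doc_id, timeout=30):
    """Return True if the non-distributed query finds doc_id on this core."""
    with urllib.request.urlopen(locate_url(core_url, doc_id), timeout=timeout) as resp:
        return json.load(resp)["response"]["numFound"] > 0
```

One way to use it: call `core_has_doc` against the old shard's leader core, send `delete_request(...)` with `urllib.request.urlopen`, then call `core_has_doc` again. Since the update returns success even when nothing is deleted, the second check is what tells you whether the stale copy is actually gone.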