Responses about how to avoid this are not on topic. I’ve had Solr in production 
since version 1.3 and I know the right way.

I think I know how we got into this mess. The cluster is configured and 
deployed into Kubernetes. I think it was rebuilt with more shards, then the 
existing storage volumes were mounted for the matching shards. New shards got 
empty volumes. Then the content was reloaded without a delete-all.

Would it work to send the deletes directly to the leader for the shard? That 
might bypass the hash-based routing.
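
To be concrete about what I mean, a minimal sketch in Python. The core URL is 
a hypothetical placeholder, and the helper name is made up. It only builds a 
JSON delete-by-id request addressed to one core's /update handler. Whether 
Solr then applies the delete locally or still re-routes it by hash is exactly 
the open question.

```python
import json
import urllib.request


def build_delete_request(core_url, ids, commit=True):
    """Build (but do not send) a delete-by-id POST for a single core.

    core_url is assumed to point at the leader replica's core, e.g.
    http://host:8983/solr/mycoll_shard7_replica_n1 (hypothetical name).
    """
    url = core_url.rstrip("/") + "/update"
    if commit:
        url += "?commit=true"
    # JSON update syntax: {"delete": ["id1", "id2", ...]}
    body = json.dumps({"delete": list(ids)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending it would be: urllib.request.urlopen(build_delete_request(...))
```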

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On May 24, 2023, at 8:35 AM, Walter Underwood <wun...@wunderwood.org> wrote:
> 
> Clearly, they are not broadcast, or if they are, they are filtered by the 
> hash range before executing. If they were broadcast, this problem would not 
> have happened.
> 
> Yes, we’ll delete-all and reindex at some point. This collection has 1.7 
> billion documents across 96 shards, so a full reindex is not an everyday 
> occurrence. I’m trying to clean up the minor problem of 675k documents with 
> dupes.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On May 24, 2023, at 8:06 AM, Jan Høydahl <jan....@cominvent.com> wrote:
>> 
>> I thought deletes were "broadcast", but for the composite-id router they 
>> probably are not, since we know for sure where the document resides.
>> You say "shards were added" - how did you do that?
>> Sounds like you should simply re-create your collection and re-index?
>> 
>> Jan
>> 
>>> On 24 May 2023, at 16:39, Walter Underwood <wun...@wunderwood.org> wrote:
>>> 
>>> We have a messed-up index with documents on shards where they shouldn’t be. 
>>> Content was indexed, shards were added, then everything was reindexed. So 
>>> the new document with the same ID was put on a new shard, leaving the 
>>> previous version on the old shard (where it doesn’t match the hash range).
>>> 
>>> I’m trying to delete the old document by sending an update with 
>>> delete-by-id and a shards parameter. It returns success, but the document 
>>> isn’t deleted.
>>> 
>>> Is the hash range being checked and overriding the shards param somehow? 
>>> Any ideas on how to make this work?
>>> 
>>> And yes, we won’t do that again.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>> 
> 
