Thank you very much Erick, Emir, and Bram. This is extremely useful advice, and I sincerely appreciate everyone's input!
Before I received your responses I ran a controlled delete-by-query (DBQ) test in our DR environment, and exactly what you said occurred. It was like reading a step-by-step playbook of events: heavy blocking on the Solr nodes and lots of threads going into a TIMED_WAITING state. Several shards were pushed into recovery mode and things were starting to get ugly, fast!

I'd read snippets in blog posts and JIRA tickets about DBQ being a blocking operation, but I did not expect that such a specific DBQ (i.e. by IDs) would behave so differently from delete-by-id (DBID), which I expected to block as well. Boy, was I wrong! The two are used interchangeably in the Solr ref guide examples, so it's very useful to understand the performance implications of each. Additionally, none of the information I found on delete operations mentioned query performance, so I was unsure of the impact in that dimension.

Erick, thanks again for your comprehensive response. Your blogs and user group responses are always a pleasure to read, and I'm constantly picking up useful pieces of information that I use on a daily basis in managing our Solr/Fusion clusters. I've also been looking for an excuse to use streaming expressions, and I did not think to use them the way you suggested. I've watched quite a few of Joel's presentations on YouTube, and his blog is brilliant. Streaming expressions are expanding with every Solr release; they really are a very exciting part of Solr's evolution. Your final point on searcher state while streaming expressions are running, and its relationship with new searchers, is a very interesting additional piece of information I'll add to the toolbox. Thank you.

At the moment we're fortunate to have all the IDs of the documents to remove in a DB, so I'll be able to construct batches of DBID requests relatively easily and store them in a backlog table for processing, without needing to traverse Solr with cursors, streaming, or other means to identify them.
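For anyone following along, here is a rough sketch of how a list of IDs pulled from a DB might be split into fixed-size batches before being turned into delete-by-id requests. The class name `DeleteBatcher` and the method `partition` are hypothetical names made up for illustration, not part of Solr or SolrJ, and the batch size is whatever testing shows works for a given cluster.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Solr or SolrJ.
final class DeleteBatcher {

    // Splits ids into consecutive batches of at most batchSize elements,
    // preserving the original order.
    static List<List<String>> partition(List<String> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(ids.subList(start, end)));
        }
        return batches;
    }
}
```

Each resulting batch could then be handed to a single SolrJ `UpdateRequest` via `deleteById(List<String>)`, or written to a backlog table for later processing.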
We follow a similar approach for updates, in batches of around 1000 docs. Inspiration for that sweet spot once again came from reading one of Erick's Lucidworks blog posts (https://lucidworks.com/post/really-batch-updates-solr-2/) and from testing.

Again, thanks to the community and users for everyone's contribution on the issue. It is very much appreciated.

Successful Solr-ing to all,

Dwane
________________________________
From: Bram Van Dam <bram.van...@intix.eu>
Sent: Wednesday, 27 May 2020 5:34 AM
To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
Subject: Re: Solr Deletes

On 26/05/2020 14:07, Erick Erickson wrote:
> So best practice is to go ahead and use delete-by-id.

I've noticed that this can cause issues when using implicit routing, at least on 7.x. Though I can't quite remember whether the issue was a performance issue, or whether documents would sometimes not get deleted.

In either case, I worked around it by doing something like this:

    UpdateRequest req = new UpdateRequest();
    req.deleteById(id);
    req.setCommitWithin(-1);
    req.setParam(ShardParams._ROUTE_, shard);

Maybe that'll help if you run into either of those issues.

 - Bram