I’m coming in a little late, but as of 8.5 there is a new streaming expression designed for DBQ situations which basically does what Erick was suggesting - gets a list of IDs for a query then does a delete by ID: https://lucene.apache.org/solr/guide/8_5/stream-decorator-reference.html#delete.
It won’t help if you’re not on 8.5, but going forward it will be a good option for large delete sets.

On May 26, 2020, 8:09 PM -0500, Dwane Hall <dwaneh...@hotmail.com>, wrote:
> Thank you very much Erick, Emir, and Bram, this is extremely useful advice. I sincerely appreciate everyone’s input!
>
> Before I received your responses I ran a controlled DBQ test in our DR environment and exactly what you said occurred. It was like reading a step-by-step playbook of events, with heavy blocking occurring on the Solr nodes and lots of threads going into a TIMED_WAITING state. Several shards were pushed into recovery mode and things were starting to get ugly, fast!
>
> I'd read snippets in blog posts and JIRA tickets on DBQ being a blocking operation, but I did not expect that such a specific DBQ (i.e. by IDs) would operate very differently from the DBID (which I expected to block as well). Boy was I wrong! They're used interchangeably in the Solr ref guide examples, so it’s very useful to understand the performance implications of each. Additionally, all of the information I found on delete operations never mentioned query performance, so I was unsure of its impact in this dimension.
>
> Erick, thanks again for your comprehensive response. Your blogs and user group responses are always a pleasure to read, and I'm constantly picking up useful pieces of information that I use on a daily basis in managing our Solr/Fusion clusters. Additionally, I've been looking for an excuse to use streaming expressions and I did not think to use them the way you suggested. I've watched quite a few of Joel's presentations on YouTube and his blog is brilliant. Streaming expressions are expanding with every Solr release; they really are a very exciting part of Solr's evolution. Your final point on searcher state while streaming expressions are running and its relationship with new searchers is a very interesting additional piece of information I’ll add to the toolbox.
> Thank you.
>
> At the moment we're fortunate to have all the IDs of the documents to remove in a DB, so I'll be able to construct batches of DBID requests relatively easily and store them in a backlog table for processing without needing to traverse Solr with cursors, streaming (or other means) to identify them. We follow a similar approach for updates in batches of ~1000 docs/batch. The inspiration for that sweet spot once again came from reading one of Erick's Lucidworks blog posts and testing (https://lucidworks.com/post/really-batch-updates-solr-2/).
>
> Again, thanks to the community and users for everyone’s contribution on the issue. It is very much appreciated.
>
> Successful Solr-ing to all,
>
> Dwane
>
> ________________________________
> From: Bram Van Dam <bram.van...@intix.eu>
> Sent: Wednesday, 27 May 2020 5:34 AM
> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
> Subject: Re: Solr Deletes
>
> On 26/05/2020 14:07, Erick Erickson wrote:
> > So best practice is to go ahead and use delete-by-id.
>
> I've noticed that this can cause issues when using implicit routing, at least on 7.x. Though I can't quite remember whether the issue was a performance issue, or whether documents would sometimes not get deleted.
>
> In either case, I worked around it by doing something like this:
>
> UpdateRequest req = new UpdateRequest();
> req.deleteById(id);
> req.setCommitWithin(-1);
> req.setParam(ShardParams._ROUTE_, shard);
>
> Maybe that'll help if you run into either of those issues.
>
> - Bram
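
For anyone landing on this thread later: the batching Dwane describes (IDs already known from a DB, sent as delete-by-ID requests of roughly 1000 IDs each) might be sketched as below. This is only an illustration under assumptions, not anyone's actual code from the thread. The names `DeleteBatcher` and `sendInBatches` are made up, and the sender is a plain callback so the sketch runs with the JDK alone. With SolrJ, the callback would presumably build an `UpdateRequest`, call `deleteById(batch)`, optionally set `_route_` as in Bram's snippet, and process it against the client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: neither this class nor its method is part of SolrJ.
final class DeleteBatcher {

    /**
     * Splits ids into consecutive batches of at most batchSize and hands
     * each batch to sender. Returns the number of batches sent.
     */
    static int sendInBatches(List<String> ids, int batchSize,
                             Consumer<List<String>> sender) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            // Copy the slice so the sender may keep it after ids changes.
            sender.accept(new ArrayList<>(ids.subList(start, end)));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // Made-up IDs standing in for rows read from the backlog table.
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            ids.add("doc-" + i);
        }
        // Stand-in sender that only reports batch sizes. A real sender would
        // issue one delete-by-ID update request per batch.
        int sent = sendInBatches(ids, 1000,
                batch -> System.out.println("would delete " + batch.size() + " ids"));
        System.out.println(sent + " batches");
    }
}
```

The batch size of 1000 simply mirrors the figure mentioned in the thread; the right value for a given cluster is something to test rather than assume.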