I’m coming in a little late, but as of 8.5 there is a new streaming expression
designed for DBQ situations that does essentially what Erick was suggesting:
it gets a list of IDs for a query and then deletes them by ID. See
https://lucene.apache.org/solr/guide/8_5/stream-decorator-reference.html#delete

It won’t help if you’re not on 8.5 or later, but going forward it will be a good
option for large delete sets.
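For anyone who wants to try it, here is a minimal sketch that only builds the request, it does not send it. It wraps a `search(...)` over the `/export` handler in `delete(...)` and URL-encodes the result as the `expr` parameter of the collection's `/stream` handler. The decorator syntax follows my reading of the 8.5 reference guide (destination collection, optional `batchSize`, inner stream). The collection name `collection1`, the query `old_data:true`, the sort on `id` and the `localhost:8983` base URL are all placeholder assumptions. `/export` needs docValues on the `fl` and `sort` fields, so check that against your schema and the guide for your version.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class StreamingDelete {
    // Builds delete(<collection>, batchSize=N, search(...)): the inner search streams
    // matching ids from /export, and the delete decorator issues them as deletes by ID.
    // Query text is inserted as-is (assumed not to contain double quotes).
    static String deleteExpression(String collection, String query, int batchSize) {
        String search = "search(" + collection
                + ", q=\"" + query + "\""
                + ", qt=\"/export\""
                + ", fl=\"id\""
                + ", sort=\"id asc\")";
        return "delete(" + collection + ", batchSize=" + batchSize + ", " + search + ")";
    }

    // URL-encodes the expression as the expr parameter of <base>/<collection>/stream.
    // Uses the Charset overload of URLEncoder.encode (Java 10+).
    static String streamUrl(String baseUrl, String collection, String expr) {
        return baseUrl + "/" + collection + "/stream?expr="
                + URLEncoder.encode(expr, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Placeholder values; substitute your own collection, query and node URL.
        String expr = deleteExpression("collection1", "old_data:true", 500);
        System.out.println(expr);
        System.out.println(streamUrl("http://localhost:8983/solr", "collection1", expr));
    }
}
```

Because the IDs come from `/export` and the deletes go out in batches, Solr never has to evaluate a single large DBQ. That is the point of the decorator.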
On May 26, 2020, 8:09 PM -0500, Dwane Hall <dwaneh...@hotmail.com>, wrote:
> Thank you very much Erick, Emir, and Bram; this is extremely useful advice and I
> sincerely appreciate everyone’s input!
>
>
> Before I received your responses I ran a controlled DBQ test in our DR
> environment, and exactly what you described occurred. It was like reading a
> step-by-step playbook: heavy blocking on the Solr nodes and lots of threads
> going into a TIMED_WAITING state. Several shards were pushed into recovery
> mode and things were starting to get ugly, fast!
>
>
> I'd read snippets in blog posts and JIRA tickets about DBQ being a blocking
> operation, but I did not expect that such a specific DBQ (i.e. by IDs) would
> behave so differently from DBID (which I expected to block as well). Boy, was
> I wrong! The two are used interchangeably in the Solr Ref Guide examples, so
> it’s very useful to understand the performance implications of each.
> Additionally, none of the information I found on delete operations mentioned
> query performance, so I was unsure of the impact in that dimension.
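To make the distinction concrete, here is a small sketch of the two request bodies Solr's JSON update syntax accepts for the same set of documents. The first is a delete-by-ID (a plain array of unique keys). The second is a DBQ that happens to name those IDs through a query on the `id` field. The IDs `doc1`..`doc3` are hypothetical, and the sketch assumes they contain no characters needing JSON or query-syntax escaping.

```java
import java.util.List;
import java.util.stream.Collectors;

final class DeletePayloads {
    // Delete-by-ID body: {"delete":["id1","id2",...]}.
    // Assumes IDs need no JSON escaping.
    static String deleteByIdJson(List<String> ids) {
        return ids.stream()
                .map(id -> "\"" + id + "\"")
                .collect(Collectors.joining(",", "{\"delete\":[", "]}"));
    }

    // Delete-by-query body naming the same documents: {"delete":{"query":"id:(a OR b)"}}.
    // Assumes IDs need no query-syntax escaping.
    static String deleteByQueryJson(List<String> ids) {
        String query = String.join(" OR ", ids);
        return "{\"delete\":{\"query\":\"id:(" + query + ")\"}}";
    }

    public static void main(String[] args) {
        List<String> ids = List.of("doc1", "doc2", "doc3"); // hypothetical IDs
        System.out.println(deleteByIdJson(ids));
        System.out.println(deleteByQueryJson(ids));
    }
}
```

Both bodies target the same documents, but only the first is handled as delete-by-ID. The second is still a DBQ, with the blocking behaviour described in this thread.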
>
>
> Erick, thanks again for your comprehensive response. Your blogs and user group
> responses are always a pleasure to read, and I'm constantly picking up useful
> pieces of information that I use daily in managing our Solr/Fusion
> clusters. I've also been looking for an excuse to use streaming
> expressions, and I hadn't thought to use them the way you suggested. I've
> watched quite a few of Joel's presentations on YouTube and his blog is
> brilliant. Streaming expressions expand with every Solr release; they
> really are a very exciting part of Solr's evolution. Your final point about
> searcher state while streaming expressions are running, and its relationship
> with new searchers, is a very interesting piece of information I’ll add to
> the toolbox. Thank you.
>
>
>
> At the moment we're fortunate to have the IDs of all the documents to remove
> in a DB, so I'll be able to construct batches of DBID requests relatively
> easily and store them in a backlog table for processing, without needing to
> traverse Solr with cursors, streaming (or other means) to identify them. We
> follow a similar approach for updates, in batches of ~1000 docs. We settled
> on that sweet spot after testing, once again inspired by one of Erick's
> Lucidworks blog posts
> (https://lucidworks.com/post/really-batch-updates-solr-2/).
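The batching step itself is simple. This minimal sketch assumes the IDs have already been read from the backlog table into a `List<String>`. It splits them into fixed-size batches (1000 here, matching the sweet spot above) and hands each batch to a `sender`. The `sender` is a hypothetical hook. With SolrJ it could wrap `new UpdateRequest().deleteById(batch)` followed by `process(client, collection)`, but that wiring is left out so the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class DeleteBatcher {
    // Splits ids into consecutive batches of at most batchSize elements.
    static List<List<String>> partition(List<String> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            batches.add(new ArrayList<>(ids.subList(start, end))); // copy, not a view
        }
        return batches;
    }

    // Sends each batch through the (hypothetical) sender and returns the batch count.
    static int sendInBatches(List<String> ids, int batchSize, Consumer<List<String>> sender) {
        List<List<String>> batches = partition(ids, batchSize);
        batches.forEach(sender);
        return batches.size();
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            ids.add("doc-" + i); // stand-in for IDs read from the backlog table
        }
        int sent = sendInBatches(ids, 1000,
                batch -> System.out.println("would delete " + batch.size() + " ids"));
        System.out.println(sent + " batches");
    }
}
```

Each batch is copied rather than returned as a `subList` view, so a sender can hold on to it (e.g. queue it for retry) without depending on the source list.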
>
>
>
> Again, thanks to the community and users for everyone’s contributions on this
> issue; it is very much appreciated.
>
>
> Successful Solr-ing to all,
>
>
> Dwane
>
> ________________________________
> From: Bram Van Dam <bram.van...@intix.eu>
> Sent: Wednesday, 27 May 2020 5:34 AM
> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
> Subject: Re: Solr Deletes
>
> On 26/05/2020 14:07, Erick Erickson wrote:
> > So best practice is to go ahead and use delete-by-id.
>
>
> I've noticed that this can cause issues when using implicit routing, at
> least on 7.x. Though I can't quite remember whether the issue was a
> performance issue, or whether documents would sometimes not get deleted.
>
> In either case, I worked around it by doing something like this:
>
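> import org.apache.solr.client.solrj.request.UpdateRequest;
> import org.apache.solr.common.params.ShardParams;
>
> UpdateRequest req = new UpdateRequest();
> req.deleteById(id);                       // delete one document by its unique key
> req.setCommitWithin(-1);                  // -1: no commitWithin, rely on normal commits
> req.setParam(ShardParams._ROUTE_, shard); // implicit routing: target the shard directly
> req.process(client, collection);          // send it (client/collection defined elsewhere)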
>
> Maybe that'll help if you run into either of those issues.
>
> - Bram
