Re: Solr Deletes

Dwane Hall Tue, 26 May 2020 18:09:25 -0700

Thank you very much Erick, Emir, and Bram. This is extremely useful advice and 
I sincerely appreciate everyone’s input!



Before I received your responses I ran a controlled DBQ test in our DR 
environment, and exactly what you described occurred.  It was like reading a 
step-by-step playbook of events, with heavy blocking occurring on the Solr 
nodes and lots of threads going into a TIMED_WAITING state. Several shards were 
pushed into recovery mode and things were starting to get ugly, fast!


I'd read snippets in blog posts and JIRA tickets about DBQ being a blocking 
operation, but I did not expect that such a specific DBQ (i.e. by IDs) would 
operate very differently from DBID (which I expected to block as well). Boy 
was I wrong! They're used interchangeably in the Solr ref guide examples, so 
it’s very useful to understand the performance implications of each.  
Additionally, none of the information I found on delete operations mentioned 
query performance, so I was unsure of their impact in this dimension.


Erick, thanks again for your comprehensive response. Your blogs and user group 
responses are always a pleasure to read, and I'm constantly picking up useful 
pieces of information that I use on a daily basis in managing our Solr/Fusion 
clusters. Additionally, I've been looking for an excuse to use streaming 
expressions and I did not think to use them the way you suggested.  I've 
watched quite a few of Joel's presentations on YouTube and his blog is 
brilliant.  Streaming expressions are expanding with every Solr release; they 
really are a very exciting part of Solr's evolution.  Your final point on 
searcher state while streaming expressions are running, and its relationship 
with new searchers, is a very interesting additional piece of information I’ll 
add to the toolbox. Thank you.



At the moment we're fortunate to have all the IDs of the documents to remove 
in a DB, so I'll be able to construct batches of DBID requests relatively 
easily and store them in a backlog table for processing, without needing to 
traverse Solr with cursors, streaming, or other means to identify them.  We 
follow a similar approach for updates, in batches of around 1000 docs/batch.  
That sweet spot was once again determined after reading one of Erick's 
Lucidworks blog posts and testing 
(https://lucidworks.com/post/really-batch-updates-solr-2/).
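
As a rough sketch of the batching step only, assuming the IDs have already been 
read from the DB into a list, something like the Java helper below would split 
them into batches of 1000. DeleteBatcher and partition are hypothetical names 
for illustration, not SolrJ classes, and the batch size is just the figure 
mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of SolrJ): splits a list of document IDs
// into fixed-size batches suitable for delete-by-ID requests.
class DeleteBatcher {

    // Returns consecutive batches of at most batchSize IDs each,
    // preserving the original order of the IDs.
    static List<List<String>> partition(List<String> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(ids.subList(start, end)));
        }
        return batches;
    }
}
```

Each batch returned by DeleteBatcher.partition(ids, 1000) could then be passed 
to SolrJ's UpdateRequest.deleteById(List<String>) and sent with 
req.process(client, collection), where client and collection are whatever the 
application already uses.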



Again, thanks to the community and users for everyone’s contributions on the 
issue; they are very much appreciated.


Successful Solr-ing to all,


Dwane

________________________________
From: Bram Van Dam <bram.van...@intix.eu>
Sent: Wednesday, 27 May 2020 5:34 AM
To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
Subject: Re: Solr Deletes

On 26/05/2020 14:07, Erick Erickson wrote:
> So best practice is to go ahead and use delete-by-id.


I've noticed that this can cause issues when using implicit routing, at
least on 7.x. Though I can't quite remember whether the issue was a
performance issue, or whether documents would sometimes not get deleted.

In either case, I worked around it by doing something like this:

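import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.params.ShardParams;

UpdateRequest req = new UpdateRequest();
req.deleteById(id);                        // delete a single document by ID
req.setCommitWithin(-1);                   // do not request a commitWithin
req.setParam(ShardParams._ROUTE_, shard);  // route explicitly to the target shard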

Maybe that'll help if you run into either of those issues.

 - Bram
