Thanks for explaining that, Shawn!

Emir, I use a PHP library called Solarium to send updates/deletes to Solr. Each request is sent to any one of the available nodes in the cluster.
> On May 7, 2018, at 5:02 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>
>> On 5/7/2018 5:05 PM, Jay Potharaju wrote:
>> There are some deletes by query. I have not had any issues with DBQ,
>> currently have 5.3 running in production.
>
> Here's the big problem with DBQ. Imagine this sequence of events with
> these timestamps:
>
> 13:00:00: A commit for change visibility happens.
> 13:00:00: A segment merge is triggered by the commit.
> (It's a big merge that takes exactly 3 minutes.)
> 13:00:05: A deleteByQuery is sent.
> 13:00:15: An update to the index is sent.
> 13:00:25: An update to the index is sent.
> 13:00:35: An update to the index is sent.
> 13:00:45: An update to the index is sent.
> 13:00:55: An update to the index is sent.
> 13:01:05: An update to the index is sent.
> 13:01:15: An update to the index is sent.
> 13:01:25: An update to the index is sent.
> {time passes, more updates might be sent}
> 13:03:00: The merge finishes.
>
> Here's what would happen in this scenario: The DBQ and all of the
> update requests sent *after* the DBQ will block until the merge
> finishes. That means that it's going to take up to three minutes for
> Solr to respond to those requests. If the client that is sending the
> request is configured with a 60 second socket timeout, which inter-node
> requests made by Solr are by default, then it is going to experience a
> timeout error. The request will probably complete successfully once the
> merge finishes, but the connection is gone, and the client has already
> received an error.
>
> Now imagine what happens if an optimize (forced merge of the entire
> index) is requested on an index that's 50GB. That optimize may take 2-3
> hours, possibly longer. A deleteByQuery started on that index after the
> optimize begins (and any updates requested after the DBQ) will pause
> until the optimize is done. A pause of 2 hours or more is a BIG problem.
>
> This is why deleteByQuery is not recommended.
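To put numbers on the scenario quoted above, here is a minimal sketch of the arithmetic. It is only a toy model, not anything Solr itself runs: it assumes, as described, that the DBQ and every request sent after it block until the merge finishes, and it expresses times as seconds after 13:00:00. The function name and parameters are hypothetical.

```python
from typing import List, Tuple


def blocked_requests(
    merge_end: int, request_times: List[int], socket_timeout: int
) -> List[Tuple[int, int, bool]]:
    """Return (sent_at, wait_seconds, timed_out) for each request.

    Assumption: every request waits until the merge ends (merge_end),
    and the client gives up once the wait exceeds socket_timeout.
    All times are seconds after 13:00:00.
    """
    results = []
    for sent_at in request_times:
        wait = max(merge_end - sent_at, 0)
        results.append((sent_at, wait, wait > socket_timeout))
    return results


# DBQ at 13:00:05, then eight updates at 13:00:15 ... 13:01:25.
requests = [5] + list(range(15, 86, 10))

if __name__ == "__main__":
    # The merge finishes at 13:03:00 (180 s); the socket timeout is 60 s.
    for sent_at, wait, timed_out in blocked_requests(180, requests, 60):
        print(f"sent at +{sent_at:>3}s  waits {wait:>3}s  timed out: {timed_out}")
```

Under these assumptions, every request in the listed timeline waits longer than 60 seconds, so every one of them would hit the socket timeout; only requests sent in the final minute before the merge finishes would get through.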
>
> If the deleteByQuery were changed into a two-step process involving a
> query to retrieve ID values and then one or more deleteById requests,
> then none of that blocking would occur. The deleteById operation can
> run at the same time as a segment merge, so neither it nor subsequent
> update requests will have the significant pause. From what I
> understand, you can even do commits in this scenario and have changes be
> visible before the merge completes. I haven't verified that this is the
> case.
>
> Experienced devs: Can we fix this problem with DBQ? On indexes with a
> uniqueKey, can DBQ be changed to use the two-step process I mentioned?
>
> Thanks,
> Shawn
>
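For reference, the two-step process described above could look roughly like the sketch below. It is written in Python rather than Solarium and makes no network calls itself: `fetch_ids` and `send_update` are hypothetical callables standing in for whatever client sends the `/select` and `/update` requests. The sketch assumes the uniqueKey field is named `id`, and the batch size is arbitrary. The JSON body uses Solr's delete-by-ID list form, `{"delete": [...]}`.

```python
import json
from typing import Callable, Dict, Iterable, List


def id_query_params(query: str, unique_key: str = "id", rows: int = 1000) -> Dict[str, str]:
    """Parameters for a /select request that returns only the uniqueKey field."""
    return {"q": query, "fl": unique_key, "rows": str(rows), "wt": "json"}


def delete_by_id_bodies(ids: Iterable[str], batch_size: int = 500) -> List[str]:
    """Split the IDs into batches and render each as a JSON delete-by-ID body."""
    id_list = list(ids)
    return [
        json.dumps({"delete": id_list[i:i + batch_size]})
        for i in range(0, len(id_list), batch_size)
    ]


def two_step_delete(
    query: str,
    fetch_ids: Callable[[Dict[str, str]], Iterable[str]],  # hypothetical: runs the query, returns IDs
    send_update: Callable[[str], None],                    # hypothetical: POSTs a body to /update
    batch_size: int = 500,
) -> int:
    """Step 1: query for matching IDs. Step 2: delete them by ID in batches."""
    ids = list(fetch_ids(id_query_params(query)))
    for body in delete_by_id_bodies(ids, batch_size):
        send_update(body)
    return len(ids)
```

Two caveats with this sketch: a result set larger than `rows` would need paging (for example with `cursorMark`), and unlike a real DBQ, documents that start matching the query between the two steps are not deleted.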