On 5/7/2018 5:05 PM, Jay Potharaju wrote:
> There are some deletes by query. I have not had any issues with DBQ,
> currently have 5.3 running in production.

Here's the big problem with DBQ.  Imagine this sequence of events with
these timestamps:

13:00:00: A commit for change visibility happens.
13:00:00: A segment merge is triggered by the commit.
(It's a big merge that takes exactly 3 minutes.)
13:00:05: A deleteByQuery is sent.
13:00:15: An update to the index is sent.
13:00:25: An update to the index is sent.
13:00:35: An update to the index is sent.
13:00:45: An update to the index is sent.
13:00:55: An update to the index is sent.
13:01:05: An update to the index is sent.
13:01:15: An update to the index is sent.
13:01:25: An update to the index is sent.
{time passes, more updates might be sent}
13:03:00: The merge finishes.

Here's what would happen in this scenario:  The DBQ and all of the
update requests sent *after* the DBQ will block until the merge
finishes.  That means that it's going to take up to three minutes for
Solr to respond to those requests.  If the client that is sending a
request has a 60-second socket timeout (the default for inter-node
requests made by Solr), then it is going to experience a timeout error.
The request will probably complete successfully once the merge
finishes, but the connection is gone, and the client has already
received an error.
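To make the arithmetic concrete, here is a small Python model of the
timeline above.  It is a sketch under assumptions, not anything taken
from Solr: it assumes the DBQ and every request sent after it are all
released at exactly 13:03:00 when the merge ends, it ignores the time
needed to actually process each request, and it treats the 60-second
socket timeout as the only limit.  The names (blocked_wait, results,
etc.) are made up for illustration.

```python
# Hypothetical model of the timeline: times are seconds after 13:00:00.
MERGE_END = 180        # merge runs 13:00:00 - 13:03:00
DBQ_SENT = 5           # deleteByQuery sent at 13:00:05
SOCKET_TIMEOUT = 60    # 60-second socket timeout on the client


def offset(hms: str) -> int:
    """Convert an HH:MM:SS string to seconds after 13:00:00."""
    h, m, s = map(int, hms.split(":"))
    return (h - 13) * 3600 + m * 60 + s


def blocked_wait(sent: int) -> int:
    """Seconds a request waits, assuming anything sent at or after the
    DBQ (while the merge is still running) is held until the merge ends."""
    if sent < DBQ_SENT or sent >= MERGE_END:
        return 0
    return MERGE_END - sent


requests = [("deleteByQuery", "13:00:05")] + [
    ("update", t)
    for t in ["13:00:15", "13:00:25", "13:00:35", "13:00:45",
              "13:00:55", "13:01:05", "13:01:15", "13:01:25"]
]

# (name, time sent, seconds blocked, would the client time out?)
results = []
for name, sent in requests:
    wait = blocked_wait(offset(sent))
    results.append((name, sent, wait, wait > SOCKET_TIMEOUT))

if __name__ == "__main__":
    for name, sent, wait, timed_out in results:
        print(f"{sent}  {name:<14} blocked {wait:>3}s  timeout={timed_out}")
```

Under these assumptions every request in the list waits between 95
and 175 seconds, so all of them exceed a 60-second timeout.  Only a
request sent after 13:02:00 would get its answer in under a minute.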

Now imagine what happens if an optimize (forced merge of the entire
index) is requested on an index that's 50GB.  That optimize may take 2-3
hours, possibly longer.  A deleteByQuery started on that index after the
optimize begins (and any updates requested after the DBQ) will pause
until the optimize is done.  A pause of 2 hours or more is a BIG problem.

This is why deleteByQuery is not recommended.

If the deleteByQuery were changed into a two-step process involving a
query to retrieve ID values and then one or more deleteById requests,
then none of that blocking would occur.  The deleteById operation can
run at the same time as a segment merge, so neither it nor subsequent
update requests will have the significant pause.  From what I
understand, you can even do commits in this scenario and have changes be
visible before the merge completes.  I haven't verified that this is the
case.
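Here is a rough sketch of the two-step approach, written in Python
against an in-memory stand-in rather than a real Solr client.
FakeIndex, query_ids, delete_by_query_two_step and batch_size are all
hypothetical names made up for illustration.  With a real client, step
1 would be an ordinary query that returns only the uniqueKey field
(paging through the results if there are many), and step 2 would be
one or more deleteById requests.

```python
from typing import Callable, Dict, Iterator, List


class FakeIndex:
    """Hypothetical stand-in for an index whose uniqueKey is the dict key.
    It does not model merges at all; it only shows the request flow."""

    def __init__(self, docs: Dict[str, dict]):
        self.docs = dict(docs)
        self.delete_requests: List[List[str]] = []  # one entry per deleteById request

    def query_ids(self, predicate: Callable[[dict], bool]) -> List[str]:
        # Step 1: a normal query that returns only the ID values.
        return [doc_id for doc_id, doc in self.docs.items() if predicate(doc)]

    def delete_by_id(self, ids: List[str]) -> None:
        # Step 2: delete by uniqueKey.  Per the email, the real deleteById
        # can run while a segment merge is in progress.
        self.delete_requests.append(list(ids))
        for doc_id in ids:
            self.docs.pop(doc_id, None)


def batches(ids: List[str], size: int) -> Iterator[List[str]]:
    """Split the ID list into chunks of at most `size` IDs."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]


def delete_by_query_two_step(index: FakeIndex,
                             predicate: Callable[[dict], bool],
                             batch_size: int = 1000) -> int:
    """Replace one deleteByQuery with a query plus deleteById batches."""
    ids = index.query_ids(predicate)
    for chunk in batches(ids, batch_size):
        index.delete_by_id(chunk)
    return len(ids)


# Usage: delete the five "old" documents, two IDs per deleteById request.
index = FakeIndex({f"doc{i}": {"category": "old" if i % 2 else "new"}
                   for i in range(10)})
deleted = delete_by_query_two_step(index, lambda d: d["category"] == "old",
                                   batch_size=2)
```

One caveat with this approach: a document that starts matching the
query after step 1 has run will not be deleted by step 2, which is not
quite the same behavior as a single DBQ.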

Experienced devs: Can we fix this problem with DBQ?  On indexes with a
uniqueKey, can DBQ be changed to use the two-step process I mentioned?

Thanks,
Shawn
