On 6/16/22 02:59, Marius Grigaitis wrote:
> In the end what caught our eye is a few deleteByQuery lines in stacks of
> running threads while Solr is overloaded. We temporarily removed
> deleteByQuery and it had around 10x performance improvement on indexing
> speed.

I do not understand all the low-level interactions, but I have seen
deleteByQuery cause some major problems. It seems to create a blocking
situation where Lucene waits for things to complete before it actually
does the delete, and anything sent AFTER the delete waits for the
delete. Imagine this situation:
1) Ongoing indexing begins a segment merge, one that will take 15
minutes to complete.
2) A deleteByQuery is sent.
3) More index changes are sent.
What happens in this situation is that step 2 will wait for the merge to
complete, and step 3 will wait for step 2 to complete. I have seen
automatic segment merges that take a lot longer than 15 minutes.
If step 2 is changed to a query for the IDs of the matching documents,
followed by a deleteById on those IDs, then steps 2 and 3 will run
concurrently with the merge.
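
A rough sketch of that replacement, in case it helps. It assumes the
uniqueKey field is named "id", that the core is reachable over HTTP with
the standard /select and /update handlers, and that the JSON update
format is in use. The function names and the example URL are made up for
illustration, not taken from any existing client.

```python
import json
import urllib.parse
import urllib.request


def build_id_query_url(base_url, query, rows=1000):
    """Build a /select URL that returns only the id field of matching docs."""
    params = urllib.parse.urlencode(
        {"q": query, "fl": "id", "rows": rows, "wt": "json"}
    )
    return f"{base_url}/select?{params}"


def extract_ids(select_response):
    """Pull the id values out of a parsed Solr JSON query response."""
    return [doc["id"] for doc in select_response["response"]["docs"]]


def build_delete_by_id_body(ids):
    """Build a JSON update body that deletes the given documents by ID."""
    return json.dumps({"delete": list(ids)})


def delete_matching_by_id(base_url, query, rows=1000):
    """Query for matching IDs, then send a delete-by-ID instead of deleteByQuery.

    Only the first `rows` matches are handled per call; call it again
    (after a commit) until it returns 0 if more documents may match.
    """
    with urllib.request.urlopen(build_id_query_url(base_url, query, rows)) as resp:
        ids = extract_ids(json.load(resp))
    if not ids:
        return 0
    request = urllib.request.Request(
        f"{base_url}/update",
        data=build_delete_by_id_body(ids).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        resp.read()
    return len(ids)
```

Usage would be something like
delete_matching_by_id("http://localhost:8983/solr/mycore", "type:old").
One difference from deleteByQuery to keep in mind: documents that start
matching the query after the ID lookup has run are not deleted by this
approach.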
It took a lot of headscratching to figure out why my indexing process
sometimes stalled for LONG time spans.
Thanks,
Shawn