Hi,

We're experiencing performance issues with backup and restore in recent
Solr versions (9.5.0 and 9.6.1). In 9.2.1, we could back up 10TB of data
in just an hour and a half. As of 9.5.0, backing up the collection takes
7 hours! As a result, we can't use Solr for disaster recovery effectively
and reliably, so 9.2.1 remains the most practical choice among the 9.x
versions for our use.

It seems that this is the ticket causing this issue:
1. https://issues.apache.org/jira/browse/SOLR-16879

Interestingly, while running 9.2.1 we never encountered the throttling
problem that this change was introduced to solve. From a devops
perspective, we have some details and metrics on these tasks that show the
difference between the two versions. The overall IOPS was 150MB on 9.6.1,
while it was 500MB on 9.2.1 during the same backup and restore tasks. In
the first image [1], the peak on the left represents a backup; in
contrast, in the second image [2], the same backup operation in 9.5.0 uses
far fewer resources. As you may spot, 9.5.0 seems to be using a fifth of
the resources of 9.2.1.

Apart from that, monitoring some relevant metrics during the operations, I
had some difficulty interpreting the following metrics:

"ADMIN./admin/cores.threadPool.parallelCoreExpensiveAdminExecutor.pool.core":
0,
"ADMIN./admin/cores.threadPool.parallelCoreExpensiveAdminExecutor.pool.max":
5,
"ADMIN./admin/cores.threadPool.parallelCoreExpensiveAdminExecutor.pool.size":
1,
"ADMIN./admin/cores.threadPool.parallelCoreExpensiveAdminExecutor.running":
1,

The pool size was 1 even though the pool max size is 5. Shouldn't the pool
size be 5 instead? As far as I can tell, there is always only one task
running on a single node, not 5 concurrently.
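
One possible explanation for a pool size of 1, sketched with plain
java.util.concurrent rather than Solr's own code: a ThreadPoolExecutor
only grows beyond its core size when its queue rejects a task. The core
size of 0 and max of 5 below come from the metrics above; the unbounded
LinkedBlockingQueue is my assumption, and PoolSizeDemo is a made-up name.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolSizeDemo {
  /** Submits `tasks` blocking tasks and returns the largest pool size reached. */
  static int largestPoolSize(int core, int max, int tasks) {
    // ASSUMPTION: an unbounded queue; this is not taken from Solr's source.
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(core, max, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    CountDownLatch release = new CountDownLatch(1);
    for (int i = 0; i < tasks; i++) {
      pool.execute(() -> {
        try {
          release.await(); // keep the worker busy so later tasks must queue or spawn
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    int largest = pool.getLargestPoolSize();
    release.countDown(); // let every task finish
    pool.shutdown();
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return largest;
  }

  public static void main(String[] args) {
    System.out.println(largestPoolSize(0, 5, 5)); // prints 1: extra tasks are queued
    System.out.println(largestPoolSize(5, 5, 5)); // prints 5: core threads start eagerly
  }
}
```

If the executor behind parallelCoreExpensiveAdminExecutor is built along
these lines, a pool size of 1 with a max of 5 would be expected, and only
one expensive task would run at a time per node.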

I was also wondering whether the maximum number of threads, which is
currently 5 in 9.4+, could be made configurable through either an
environment variable or a Java system property. The part that needs to be
changed seems to be in CoreAdminHandler.java on line 446 [3]. I've made a
small adjustment that adds a Solr parameter called
`solr.maxExpensiveTaskThreads` for those who want to set a different
thread count for expensive tasks. The number given in this parameter must
meet the criteria of ThreadPoolExecutor; otherwise, an
IllegalArgumentException will be thrown. I've generated a patch [4], and I
would love to see one of the Solr committers take this on and apply it for
the upcoming release. Do you think our observation is accurate, and would
this patch be feasible to implement?
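
To illustrate the idea only (this is not the patch in [4], and not Solr's
code), here is a minimal sketch of reading such a parameter as a Java
system property with a default of 5. The class name and the core size of 0
with an unbounded queue are assumptions for the example; only the property
name comes from my patch.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ExpensiveTaskPoolSketch {
  static final String PROP = "solr.maxExpensiveTaskThreads";
  static final int DEFAULT_MAX = 5; // the current hard-coded value in 9.4+

  /** Builds a pool whose maximum size comes from -Dsolr.maxExpensiveTaskThreads=N. */
  static ThreadPoolExecutor newPool() {
    // Integer.getInteger falls back to the default when the property is
    // missing or is not a valid integer.
    int max = Integer.getInteger(PROP, DEFAULT_MAX);
    // ThreadPoolExecutor throws IllegalArgumentException when max <= 0
    // or when max is smaller than the core size.
    return new ThreadPoolExecutor(0, max, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
  }

  public static void main(String[] args) {
    System.out.println(newPool().getMaximumPoolSize());
  }
}
```

Running it with `-Dsolr.maxExpensiveTaskThreads=8` would print 8, and a
value such as 0 would fail with IllegalArgumentException, which is the
ThreadPoolExecutor criterion mentioned above.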

Thanks!
Hakan

1. https://i.imgur.com/aSrs8OM.png
2. https://i.imgur.com/Yr6hBM8.png
3.
https://github.com/apache/solr/commit/82a847f0f9af18d6eceee18743d636db7a879f3e#diff-5bc3d44ca8b189f44fe9e6f75af8a5510463bdba79ff72a7d0ed190973a32533L446
4. https://gist.github.com/ozlerhakan/e4d11bddae6a2f89d2c212c220f4c965
