On Mon, 2018-10-29 at 10:55 +0200, Sofiya Strochyk wrote:
> I think we could try that, but most likely it turns out that at some
> point we are receiving 300 requests per second, and are able to
> reasonably handle 150 per second, which means everything else is
> going to be kept in the growing queue and increase response times
> even further..

Just as there should always be an upper limit on concurrent
connections, there should also be a limit on the queue. If your system
is overwhelmed (and you don't have some fancy auto-add-hardware), there
are only two possibilities: crash the system or discard requests.
Crashing the system is rarely the acceptable action.

Queueing works to smooth out congestion (improving throughput) and to
avoid crashes from overload. If the queue runs full and has to discard
requests, turning the queue off would not help: the system would simply
be overwhelmed instead.

> Also, if one node has 12 cores that would mean it can process 12
> concurrent searches? And since every request is sent to all shards to
> check if there are results, does this also mean the whole cluster can
> handle 12 concurrent requests on average?

It is typically the case that threads go idle while waiting for data
from memory or storage. This means that you get more throughput by
running more concurrent jobs than there are CPUs.
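
As a rough sketch (the factor of 10 below is just a placeholder, see
the next paragraph), this simply means sizing the worker pool to a
multiple of the core count rather than to the core count itself:

  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;

  public class SearchPoolSizing {
      public static void main(String[] args) {
          int cores = Runtime.getRuntime().availableProcessors();
          int factor = 10; // guesswork, to be replaced by measurement
          ExecutorService searchPool =
                  Executors.newFixedThreadPool(cores * factor);
          System.out.println("Pool size: " + (cores * factor));
          searchPool.shutdown();
      }
  }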

How much one should over-provision is very hard to generalize, which is
why I suggest measuring (which of course also takes resources, this
time in the form of work hours). My rough suggestion of a factor of 10
for your system is guesswork, erring on the side of a high number.
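
If you want to measure rather than guess, a crude probe could run the
same batch of requests at different factors and compare throughput.
Here issueSearch() is a placeholder for firing a representative query
at your cluster and waiting for the answer:

  import java.util.concurrent.CountDownLatch;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;

  public class ConcurrencyProbe {
      public static void main(String[] args) throws InterruptedException {
          int cores = Runtime.getRuntime().availableProcessors();
          for (int factor : new int[] {1, 2, 5, 10, 20}) {
              ExecutorService pool =
                      Executors.newFixedThreadPool(cores * factor);
              int jobs = 1000;
              CountDownLatch done = new CountDownLatch(jobs);
              long start = System.nanoTime();
              for (int i = 0; i < jobs; i++) {
                  pool.execute(() -> { issueSearch(); done.countDown(); });
              }
              done.await();
              pool.shutdown();
              double seconds = (System.nanoTime() - start) / 1e9;
              System.out.printf("factor %d: %.1f requests/second%n",
                      factor, jobs / seconds);
          }
      }

      private static void issueSearch() {
          // Placeholder: send a typical query to the cluster and wait for it.
      }
  }

The factor where throughput stops improving (or latency becomes
unacceptable) is the one to go with.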

- Toke Eskildsen, Royal Danish Library

