Indeed; "async" doesn't support cancellation. Deleting the request ID
doesn't cancel the request. BTW I didn't mean to talk about
cancellation; I've been talking about not starting something if it's
already too late. And I haven't been talking about anything that
would impact "async" calls; I'm
A cluster-wide timeout makes sense and is simpler if it is only used by the
Overseer (or whatever entity processes a request) to decide not to start
processing (that delay is not request-specific; it depends on the load
put on the cluster by other concurrent activity).
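
To make the "decide not to start processing" check above concrete, here is a
minimal sketch in plain Java. It is not Solr code: the class
StaleRequestFilter, its method shouldStart, and the 180-second value in the
usage note are all hypothetical, and it assumes the processor can learn when
a request was enqueued (for instance from the creation time of the queued ZK
node).

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of Solr: decides whether a queued request
// is still worth starting, given a single cluster-wide timeout.
final class StaleRequestFilter {
    private final Duration clusterWideTimeout;

    StaleRequestFilter(Duration clusterWideTimeout) {
        this.clusterWideTimeout = clusterWideTimeout;
    }

    // Returns true if the request has waited less than the timeout and
    // should still be started. A request whose wait has reached or
    // exceeded the timeout is skipped. Work already started is not
    // affected, so this is not cancellation.
    boolean shouldStart(Instant enqueuedAt, Instant now) {
        Duration waited = Duration.between(enqueuedAt, now);
        return waited.compareTo(clusterWideTimeout) < 0;
    }
}
```

With new StaleRequestFilter(Duration.ofSeconds(180)), a request enqueued 10
seconds ago would still be started, while one enqueued 10 minutes ago would
be skipped before any work begins.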
If we consider a
On Thu, Feb 1, 2024 at 1:53 PM Ilan Ginzburg wrote:
>
> I'd be in favor of the Overseer dropping synchronous requests for which the
> requestor is no longer waiting (ephemeral ZK node is gone).
I agree! As you know, we've customized Solr to do exactly that for
collection creation. We suspect a
I'd be in favor of the Overseer dropping synchronous requests for which the
requestor is no longer waiting (ephemeral ZK node is gone).
For sync or async requests, we could let the caller set a timeout after
which the processing should not start if it hasn't already, or for async
messages allow a
Thanks David for starting this thread. We have also seen this behavior from
the Overseer, resulting in "orphan collections" or "more than 1 replica
created" due to timeouts, especially when our cluster is scaled up during
peak traffic days.
While I am still at a nascent stage of my understanding of Solr
I have a proposal and am curious what folks think. When the Overseer
dequeues an admin command message to process, imagine it being
enhanced to examine the "ctime" (creation time) of the ZK message node
to determine how long it has been enqueued, and thus roughly how long
the client has been