My immediate thought was HTTP/2 but I see you are running with 
HTTP/1 (although interestingly some changes made for the sake of
HTTP/2 may have contributed since there is shared code). I am 
sort of skeptical of the third finding. The application of the 
idle timeout as the default request timeout isn't *that* old.
I remember researching this because of an issue with the index
fetcher (which, incidentally, should *not* have this behavior).

https://issues.apache.org/jira/browse/SOLR-17711

The thought of a bunch of requests trickling little bits of
data for arbitrarily long, just enough to reset the idle timeout
each time, seems unlikely at first blush.
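
To make the scenario concrete, here is a minimal sketch of how an idle
timeout differs from a total request timeout. It is a toy model, not Jetty
or Solr code; the class name, method names, and the millisecond values are
all hypothetical.

```java
// Toy model (hypothetical, not Jetty/Solr code): compares an idle timeout,
// which is reset by every chunk of data, with a total request deadline.
final class TimeoutSketch {

    /**
     * Returns the time in ms at which an idle-only timeout would fire,
     * or -1 if every gap between chunks stays within the idle timeout.
     * chunkArrivalsMs holds ascending arrival times measured from request start.
     */
    static long idleTimeoutFiresAt(long[] chunkArrivalsMs, long idleTimeoutMs) {
        long lastActivity = 0;
        for (long t : chunkArrivalsMs) {
            if (t - lastActivity > idleTimeoutMs) {
                return lastActivity + idleTimeoutMs;
            }
            lastActivity = t; // each chunk resets the idle clock
        }
        return -1;
    }

    /** True if the last chunk arrives after a total per-request deadline. */
    static boolean exceedsTotalDeadline(long[] chunkArrivalsMs, long requestTimeoutMs) {
        int n = chunkArrivalsMs.length;
        return n > 0 && chunkArrivalsMs[n - 1] > requestTimeoutMs;
    }

    public static void main(String[] args) {
        // A response trickling one chunk every 100 s, with a 120 s timeout value.
        long[] trickle = {100_000, 200_000, 300_000};
        System.out.println(idleTimeoutFiresAt(trickle, 120_000));   // idle timeout never fires
        System.out.println(exceedsTotalDeadline(trickle, 120_000)); // total deadline is exceeded
    }
}
```

In this model a request only outlives the timeout value if every single gap
between chunks stays under it, which is the pattern I find hard to believe
happens at scale.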


From: [email protected] At: 03/18/26 07:39:49 UTC-4:00
To: [email protected]
Subject: Deadlock observed for distributed search in Solr 9.10.1

We recently upgraded some Solr clusters from version 9.7 to 9.10.1. Collections 
have multiple shards and run distributed requests continuously. After a few 
days, distributed requests would start timing out and all clients would fail, 
requiring a full Solr cluster restart to recover. No sign of overload. 
Downgrading back to Solr 9.7 fixed the issues. This has been observed in 
several different environments.

Has anyone else seen similar behavior in their own clusters?

As there are no errors in the Solr logs, no exceptions, no high load or scary 
Grafana graphs in GC or otherwise, we have spent several days investigating and 
trying to reproduce, with limited luck.

The best I have is an LLM analysis of the issue and a theory of what might 
cause it. I think the analysis is interesting, and the suspect is leaking 
semaphores in Http2SolrClient.AsyncTracker, which would eventually cause a full 
stop.
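
As a rough illustration of the suspected failure mode (not the actual
Http2SolrClient.AsyncTracker code), the sketch below shows how a tracker that
caps in-flight requests with a semaphore grinds to a halt if a completion
path occasionally fails to release its permit. The Tracker class, its
methods, and the leak rate are hypothetical.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch of a permit leak; not the real AsyncTracker.
final class PermitLeakSketch {

    /** Caps the number of in-flight requests with a counting semaphore. */
    static final class Tracker {
        private final Semaphore available;

        Tracker(int maxRequests) {
            this.available = new Semaphore(maxRequests);
        }

        /** Takes a permit if one is free; a real client would block here instead. */
        boolean tryRegister() {
            return available.tryAcquire();
        }

        /** Must run exactly once per registered request, on success or failure. */
        void complete() {
            available.release();
        }
    }

    /**
     * Sends requests one at a time and skips complete() on every
     * leakEvery-th request. Returns how many requests were admitted
     * before no permit was left.
     */
    static int requestsUntilStall(int maxRequests, int leakEvery) {
        Tracker tracker = new Tracker(maxRequests);
        int sent = 0;
        while (tracker.tryRegister()) {
            sent++;
            if (sent % leakEvery != 0) {
                tracker.complete(); // normal path returns the permit
            }
            // leaked path: the permit is never returned
        }
        return sent;
    }

    public static void main(String[] args) {
        // 4 permits, 1 leak per 100 requests: stalls after 400 requests.
        System.out.println(requestsUntilStall(4, 100));
    }
}
```

With a blocking acquire instead of tryAcquire, the stall would show up as
threads waiting forever with no error or exception logged, which would match
the silent symptoms described above. The rarer the leak, the longer it takes
to exhaust the permits.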

The analysis is here: https://cwiki.apache.org/confluence/x/AZM8G - it contains 
a description, executive summary, tech details and some questions for 
committers. You may comment inline in Confluence if you have an account, or 
here in this thread.

I have not yet filed a bug in JIRA, as I want to discuss here and still hope to 
reproduce the issue in a pristine environment.

Jan
