Hi Pablo,

I'm not sure what settings govern Solr's Jetty container.
/opt/solr/server/etc/jetty.xml includes the following:

solr.jetty.threads.min: 10
solr.jetty.threads.max: 10000
solr.jetty.threads.idle.timeout: 5000
solr.jetty.threads.stop.timeout: 60000

MAX_CONNECTIONS_PER_HOST could certainly be an issue, but I'm not sure
where that would be configured.

I'm not sure what you're asking about with respect to a singleton
pattern. My application is highly distributed: over 900 agent
applications make queries via a load balancer running HAProxy.

-D

On Wed, Dec 28, 2016 at 12:42 PM, Pablo Anzorena <anzorena.f...@gmail.com> wrote:

> Dave,
>
> There are settings along the lines of MAX_CONNECTIONS and
> MAX_CONNECTIONS_PER_HOST that control the number of connections.
>
> Are you leaving the connection to ZooKeeper open after you establish it?
> Are you using the singleton pattern?
>
> 2016-12-28 14:14 GMT-03:00 Dave Seltzer <dselt...@tveyes.com>:
>
> > Hi Erick,
> >
> > I'll dig in on these timeout settings and see how changes affect
> > behavior.
> >
> > One interesting aspect is that we're not indexing any content at the
> > moment. The rate of ingress is something like 10 to 20 documents per
> > day.
> >
> > So my guess is that ZK is simply deciding that these servers are dead
> > based on the fact that responses are so very sluggish.
> >
> > You've mentioned lots of timeouts, but are there any settings which
> > control the number of available threads? Or is this something which
> > is largely handled automagically?
> >
> > Many thanks!
> >
> > -Dave
> >
> > On Wed, Dec 28, 2016 at 11:56 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> > > Dave:
> > >
> > > There are at least 4 timeouts (not even including ZK) that can
> > > be relevant, defined in solr.xml:
> > > socketTimeout
> > > connTimeout
> > > distribUpdateConnTimeout
> > > distribUpdateSoTimeout
> > >
> > > Plus the ZK timeout
> > > zkClientTimeout
> > >
> > > Plus the ZK configurations.
> > >
> > > So it would help narrow down what's going on if we knew why the
> > > nodes dropped out. There are indeed a lot of messages dumped, but
> > > somewhere in the logs there should be a root cause.
> > >
> > > You might see Leader Initiated Recovery (LIR), which can indicate
> > > that an update operation from the leader took too long; the
> > > timeouts above can be adjusted in this case.
> > >
> > > You might see evidence that ZK couldn't get a response from Solr
> > > for "too long" and decided it was gone.
> > >
> > > You might see...
> > >
> > > One thing I'd look at very closely is GC processing. One of the
> > > culprits for this behavior I've seen is a very long GC
> > > stop-the-world pause leading to ZK thinking the node is dead and
> > > tripping this chain. Depending on the timeouts, "very long" might
> > > be a few seconds.
> > >
> > > Not entirely helpful, but until you pinpoint why the node goes into
> > > recovery it's throwing darts at the wall. GC and log messages might
> > > give some insight into the root cause.
> > >
> > > Best,
> > > Erick
> > >
> > > On Wed, Dec 28, 2016 at 8:26 AM, Dave Seltzer <dselt...@tveyes.com> wrote:
> > >
> > > > Hello Everyone,
> > > >
> > > > I'm working on a Solr Cloud cluster which is used in a hash
> > > > matching application.
> > > >
> > > > For performance reasons we've opted to batch-execute hash
> > > > matching queries. This means that a single query will contain
> > > > many nested queries. As you might expect, these queries take a
> > > > while to execute (on the order of 5 to 10 seconds).
> > > >
> > > > I've noticed that Solr will act erratically when we send too many
> > > > long-running queries. Specifically, heavily-loaded servers will
> > > > repeatedly fall out of the cluster and then recover. My theory is
> > > > that there's some limit on the number of concurrent connections
> > > > and that client queries are preventing ZooKeeper-related
> > > > queries... but I'm not sure.
> > > > I've increased zkClientTimeout to combat this.
> > > >
> > > > My question is: What configuration settings should I be looking
> > > > at in order to make sure I'm maximizing the ability of Solr to
> > > > handle concurrent requests?
> > > >
> > > > Many thanks!
> > > >
> > > > -Dave
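
A footnote on the Jetty thread-pool values quoted at the top of this thread: the sketch below shows one way to list the solr.jetty.threads.* properties and their defaults from a jetty.xml, so they can be compared across nodes. It assumes the properties appear as <Property name="..." default="..."/> elements; the embedded XML excerpt is a hypothetical sample built from the four values quoted above, not a copy of the real file, and the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt. The layout is an assumption; the real
# /opt/solr/server/etc/jetty.xml may differ between Solr versions.
SAMPLE_JETTY_XML = """
<Configure id="Server" class="org.eclipse.jetty.server.Server">
  <Get name="ThreadPool">
    <Set name="minThreads"><Property name="solr.jetty.threads.min" default="10"/></Set>
    <Set name="maxThreads"><Property name="solr.jetty.threads.max" default="10000"/></Set>
    <Set name="idleTimeout"><Property name="solr.jetty.threads.idle.timeout" default="5000"/></Set>
    <Set name="stopTimeout"><Property name="solr.jetty.threads.stop.timeout" default="60000"/></Set>
  </Get>
</Configure>
"""


def thread_pool_defaults(xml_text, prefix="solr.jetty.threads."):
    """Return {property name: integer default} for properties matching prefix."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("Property"):
        name = prop.get("name", "")
        default = prop.get("default")
        # Skip unrelated properties and ones without a default value.
        if name.startswith(prefix) and default is not None:
            found[name] = int(default)
    return found


if __name__ == "__main__":
    for name, value in thread_pool_defaults(SAMPLE_JETTY_XML).items():
        print(f"{name}: {value}")
```

To run it against a real node, read the file contents and pass them in place of SAMPLE_JETTY_XML. These are only the defaults; a value overridden through a system property at startup would not show up here.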