So I've been reading through the Solr source code (and comparing it against the logs), and this is what I'm seeing.

The SolrZkClient takes a startUpTimeOut and a startUpZkTimeOut property, and only checks whether ZK is available within that period. Once that timeout has been exceeded, it declares that the Solr node was unable to load the cores, and then it does nothing else.

Subsequent incoming requests (like /solr/admin/info/system) then check the CoreContainer state, and if that's not in a good state (and it won't be if ZK wasn't available at startup), the request simply fails and nothing else happens.

So what I'm wondering now:

- If the reason Solr is unable to start up is a ZK issue, should it try again? Especially if the problem is a ZK connectivity issue?
- If Solr's CoreContainer isn't initialised (as opposed to "shutting down"), should it try to initialise it again? Essentially, handling an "isn't initialised" state differently from a "shutting down" state?

And does this mean that the answer to any Solr initialisation error is actually "restart Solr"?

Thanks!

On Wed, Sep 14, 2022 at 11:48 AM Jonathan Tan <[email protected]> wrote:

> Oops, good question.
> It's ZK 3.8, and Solr 8.11.
>
> And no, we're allowing kube to specify the IP addresses of the pods, and
> kube is unlikely to be using the same IPs.
>
> Does SOLR resolve domain names for ZK and store the IP addresses...?
>
> On Wed, Sep 14, 2022 at 11:32 AM Shawn Heisey
> <[email protected]> wrote:
>
>> On 9/13/22 19:26, Jonathan Tan wrote:
>> > So what I'm trying to verify...
>> > It looks like SOLR doesn't attempt to reconnect to ZK if it has previously
>> > failed. Is that intentional? Is there a way to get it to do so?
>>
>> What version of Solr, and what version of ZK?
>>
>> And something that may be important ... when you scale the ZK back up to
>> 3 nodes, do they all have the same IP addresses they did before scaling
>> back?
>>
>> Thanks,
>> Shawn
