That's what I think is happening too. The problem is that the code doesn't expect a negative ID and doesn't handle it correctly. I'm wondering if there's a way to reset the ID.
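
For what it's worth, here is a minimal sketch of how the leading dash could appear, assuming the election node name is built by concatenating a signed long ID with the node name. The `electionNode` helper is hypothetical and only mimics the format seen in the listing; it is not Solr's code.

```java
import java.util.Locale;

class NegativeSessionIdSketch {
    // Hypothetical helper, not Solr's code: builds "<id>-<nodeName>-n_<seq>"
    // by plain concatenation, so a negative long contributes its own minus sign.
    static String electionNode(long id, String nodeName, int seq) {
        return id + "-" + nodeName + "-n_" + String.format(Locale.ROOT, "%010d", seq);
    }

    public static void main(String[] args) {
        long id = -5188057493699159958L; // first ID from the listing in this thread

        // Reproduces the entry format, including the leading dash.
        System.out.println(electionNode(id, "1.1.1.15:8983_solr", 192189));

        // The same 64 bits read as an unsigned value.
        System.out.println(Long.toUnsignedString(id));

        // True when the unsigned reading exceeds Long.MAX_VALUE,
        // i.e. the top bit is set and the signed reading is negative.
        System.out.println(Long.compareUnsigned(id, Long.MAX_VALUE) > 0);
    }
}
```

If that assumption holds, any ID with the top bit set would show up with a dash in front, which fits the wrap-around theory.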
On Tue, Dec 3, 2024 at 3:28 AM Ilan Ginzburg <ilans...@gmail.com> wrote:

> Didn’t look at the code, but from the number of digits, wouldn’t it be a
> long wrapping around into negative territory?
>
> On Tue 3 Dec 2024 at 02:55, Patrick Lok <patrick....@salesforce.com.invalid>
> wrote:
>
> > Hi,
> >
> > We are seeing some weird issues with the Overseer ID which causes some
> > overseer election problems in our cluster.
> >
> > Recently we have noticed that one of our Solr 8 clusters is having trouble
> > electing dedicated overseer hosts as leader. After some investigation, we
> > noticed that we are having "negative" Overseer IDs (Overseer IDs with a
> > leading dash):
> >
> > [zk: localhost:2181(CONNECTED) 0] ls /overseer_elect/election
> > [-5188057493699159958-1.1.1.15:8983_solr-n_0000192189,
> > -5260098076001480373-1.1.1.19:8983_solr-n_0000192192,
> > -5548288611309897871-1.1.1.28:8983_solr-n_0000192191,
> > -6124715353171356222-1.1.1.18:8983_solr-n_0000192188,
> > -6412935227404643144-1.1.1.22:8983_solr-n_0000192186,
> > -6412935227404648050-1.1.1.89:8983_solr-n_0000192181,
> > -6557083032988176767-1.1.1.105:8983_solr-n_0000192190,
> > -6701159159471144532-1.1.1.219:8983_solr-n_0000192183]
> >
> > (the actual IP addresses are different from what is pasted above)
> >
> > Because of the leading dash in the Overseer ID, it causes
> > LeaderElector.getNodeName() to return
> > "5188057493699159958-1.1.1.15:8983_solr" instead of "1.1.1.15:8983_solr",
> > causing quite a bit of issues.
> >
> > Does anyone know why we started seeing a leading dash with the initial set
> > of digits in the Overseer ID? Who's generating that set of digits? Solr or
> > ZooKeeper? Is there a way to fix it?
> >
> > A simple change to LeaderElector.NODE_NAME seems to be an easy fix. But
> > since there's no unit test around it, I'm a bit worried that it might break
> > somewhere else in the code.
> >
> > Thanks,
> > Patrick
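
On the LeaderElector.NODE_NAME idea quoted above, a rough sketch of the parsing problem may help. It assumes the pattern looks roughly like the one below (this is from memory, so please check it against your Solr 8 source) and that the ID prefix is always numeric. `SIGN_TOLERANT` and the `nodeName` helper are hypothetical names used for illustration only, and the alternative pattern has not been tested inside Solr.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NodeNameSketch {
    // ASSUMPTION: approximate shape of LeaderElector.NODE_NAME; verify against your Solr version.
    // The reluctant group "(.*?-)" stops at the first dash, which is the minus sign of a negative ID.
    static final Pattern ASSUMED_NODE_NAME = Pattern.compile(".*?/?(.*?-)(.*?)-n_\\d+");

    // Hypothetical alternative: require digits in the ID and allow an optional minus sign.
    static final Pattern SIGN_TOLERANT = Pattern.compile(".*?/?(-?\\d+-)(.*?)-n_\\d+");

    // Hypothetical helper: returns the second capture group, i.e. the node name part.
    static String nodeName(Pattern pattern, String electionNode) {
        Matcher m = pattern.matcher(electionNode);
        if (!m.matches()) {
            throw new IllegalStateException("No regex match in: " + electionNode);
        }
        return m.group(2);
    }

    public static void main(String[] args) {
        String negative = "-5188057493699159958-1.1.1.15:8983_solr-n_0000192189";
        String positive = "72057594037927936-1.1.1.15:8983_solr-n_0000192189";

        // Assumed pattern: the ID leaks into the node name when the ID is negative.
        System.out.println(nodeName(ASSUMED_NODE_NAME, negative));

        // Sign-tolerant pattern: node name only, for both negative and positive IDs.
        System.out.println(nodeName(SIGN_TOLERANT, negative));
        System.out.println(nodeName(SIGN_TOLERANT, positive));
    }
}
```

A small unit test along these lines, covering both positive and negative IDs, would also address the worry about changing the pattern without test coverage. Any other pattern in the same class that parses the ID prefix would need the same check.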