Re: leader election stuck after hosts restarts

2021-01-22 Thread Pierre Salagnac
Thanks Alessandro. We found this Jira ticket that may be the root cause of this issue: https://issues.apache.org/jira/browse/SOLR-14356 I'm not sure whether it is the reason the leader election initially failed, but it prevents Solr from exiting this error loop. On Wed, 13 Jan 2021 at

Re: leader election stuck after hosts restarts

2021-01-13 Thread Alessandro Benedetti
I faced these problems a while ago, but at the time I created a blog post which I hope could help: https://sease.io/2018/05/solrcloud-leader-election-failing.html - --- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director Sease Ltd. - www.sease.io -- Sent

Re: leader election stuck after hosts restarts

2021-01-12 Thread Pierre Salagnac
Sorry I missed this detail. We are running Solr 8.2. Thanks On Tue, 12 Jan 2021 at 16:46, Phill Campbell wrote: > Which version of Apache Solr? > > > On Jan 12, 2021, at 8:36 AM, Pierre Salagnac > wrote: > > > > Hello, > > We had a stuck leader elec

Re: leader election stuck after hosts restarts

2021-01-12 Thread Phill Campbell
Which version of Apache Solr? > On Jan 12, 2021, at 8:36 AM, Pierre Salagnac > wrote: > > Hello, > We had a stuck leader election for a shard. > > We have collections with 2 shards, each shard has 5 replicas. We have many > collections but the issue happened for a sin

Re: leader election stuck after hosts restarts

2021-01-12 Thread matthew sporleder
AM Pierre Salagnac wrote: > > Hello, > We had a stuck leader election for a shard. > > We have collections with 2 shards, each shard has 5 replicas. We have many > collections but the issue happened for a single shard. Once all host > restarts completed, this shard was stuc

leader election stuck after hosts restarts

2021-01-12 Thread Pierre Salagnac
Hello, We had a stuck leader election for a shard. We have collections with 2 shards, each shard has 5 replicas. We have many collections but the issue happened for a single shard. Once all host restarts completed, this shard was stuck with one replica in "recovery" state and all othe

Solr8 improvements to SolrCloud leader election

2020-06-02 Thread Danny Shih
Are there any significant (or not so significant) changes? I have browsed the release notes and searched JIRA, but the latest news seems to be in 7.3 (where the old Leader-In-Recovery logic was replaced). Context: We are currently running Solr 7.4 in production. In the past year, we’ve seen t

Re: StackOverflowError leader election on 8.2.0

2019-08-21 Thread Mikhail Khludnev
> Looking this up i found SOLR-5692, but that was solved a lifetime ago, It wasn't. https://issues.apache.org/jira/browse/SOLR-5692?focusedCommentId=14556876&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14556876 On Wed, Aug 21, 2019 at 1:29 PM Markus Jelsma wr

StackOverflowError leader election on 8.2.0

2019-08-21 Thread Markus Jelsma
Hello, Looking this up I found SOLR-5692, but that was solved a lifetime ago, so I am just checking whether this is a familiar error and one I am missing in Jira: A client's Solr 8.2.0 cluster gave us the following StackOverflowError while running 8.2.0 on Java 8: Exception in thread "coreZkRegister-1-thread

Re: SolrCloud 7.2 problem with leader election

2018-04-04 Thread Gael Jourdan-Weil
Using property legacyCloud=true, coreNodeNames are well written by Solr in core.properties file. We are wondering if the problem comes from our configuration or the bugfix https://issues.apache.org/jira/browse/SOLR-11503 ? _*Without legacyCloud=true:*_ > Our configuration before Solr start:

SolrCloud 7.2 problem with leader election

2018-04-03 Thread Gael Jourdan-Weil
Hello, We are trying to upgrade from Solr 6.6 to Solr 7.2.1 and we are using Solr Cloud. Doing some tests with 2 replicas, ZooKeeper doesn't know which one to elect as a leader: ERROR org.apache.solr.cloud.ZkController:getLeader:1206 - Error getting leader from zk org.apache.solr.common.Solr

Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Ravi Solr
Thanks Shawn. Yes I did index some docs after moving to 6.4.0. The release notes did not mention anything about format being changed so I thought it would be backward compatible. Yeah my only recourse is to re-index data. Apart from that it was weird problems overall with 6.4.0. I was excited about

Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Shawn Heisey
On 2/2/2017 7:23 AM, Ravi Solr wrote: > When i try to rollback from 6.4.0 to my original version of 6.0.1 it now > throws another issue. Now I cant go to 6.4.0 nor can I roll back to 6.0.1 > > Could not load codec 'Lucene62'. Did you forget to add > lucene-backward-codecs.jar? > at org.apache.

Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Ravi Solr
Thanks Hendrik. Iam baffled as to why I did not hit this issue prior to moving to 6.4.0. On Thu, Feb 2, 2017 at 7:58 AM, Hendrik Haddorp wrote: > Might be that your overseer queue overloaded. Similar to what is described > here: > https://support.lucidworks.com/hc/en-us/articles/203959903- > Bri

Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Ravi Solr
When i try to rollback from 6.4.0 to my original version of 6.0.1 it now throws another issue. Now I cant go to 6.4.0 nor can I roll back to 6.0.1 Could not load codec 'Lucene62'. Did you forget to add lucene-backward-codecs.jar? at org.apache.lucene.index.SegmentInfos.readCodec(SegmentInfos.

Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Hendrik Haddorp
Might be that your overseer queue overloaded. Similar to what is described here: https://support.lucidworks.com/hc/en-us/articles/203959903-Bringing-up-downed-Solr-servers-that-don-t-want-to-come-up If the overseer queue gets too long you get hit by this: https://github.com/Netflix/curator/wiki/

Re: 6.4.0 collection leader election and recovery issues

2017-02-02 Thread Ravi Solr
Following up on my previous email, the intermittent server unavailability seems to be linked to the interaction between Solr and Zookeeper. Can somebody help me understand what this error means and how to recover from it. 2017-02-02 09:44:24.648 ERROR (recoveryExecutor-3-thread-16-processing-n:xx.

6.4.0 collection leader election and recovery issues

2017-02-01 Thread Ravi Solr
Hello, Yesterday I upgraded from 6.0.1 to 6.4.0, and it's been a straight 12-hour debugging spree!! Can somebody kindly help me out of this misery? I have a set of 8 single-shard collections with 3 replicas. As soon as I updated the configs and started the servers, one of my collections got

Collection going to recovery mode - Leader election issue?

2016-08-02 Thread Aswath Srinivasan (TMS)
like a leader election issue? 2016-07-29 06:52:48.610 ERROR (coreZkRegister-1-thread-32-processing-s:shard2 x:tCollection_shard2_replica4 c:tCollection n:tsolr.prod2.xxx.com:8983_solr r:core_node6) [c:tCollection s:shard2 r:core_node6 x:tCollection_shard2_replica4] o.a.s.c.ZkController Error

Re: Overriding SolrCloud Leader Election and manually assign leadership?-Is it possible?

2016-03-24 Thread Erick Erickson
past,we have seen > whenever the one of the boxes is leader in solrcloud,the performance seems > to be really good. However the leader election changes from time to time and > most of the time the cloud boxes seem to process most of the traffic > Currently our solrcloud looks somethin

Overriding SolrCloud Leader Election and manually assign leadership?-Is it possible?

2016-03-24 Thread ram
to be really good. However the leader election changes from time to time and most of the time the cloud boxes seem to process most of the traffic Currently our solrcloud looks something like this Physical Box 1 X ->shard 1 Clou

Leader election issues after upgrade from 4.10.4 to 5.4.1

2016-02-08 Thread Mike Thomsen
We get this error on one of our nodes: Caused by: org.apache.solr.common.SolrException: There is conflicting information about the leader of shard: shard2 our state says: http://server01:8983/solr/collection/ but zookeeper says: http://server02:8983/collection/ Then I noticed this in the log: ]

Leader Election Time

2016-01-15 Thread Robert Brown
Hi, I have 2 shards, 1 leader and 1 replica in each. I've just removed a leader from one of the shards but the replica hasn't become a leader yet. How quickly should this normally happen? tickTime=2000 dataDir=/home/rob/zoodata clientPort=2181 initLimit=5 syncLimit=2 Thanks, Rob
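
The settings quoted in this message are ZooKeeper's own quorum settings. In ZooKeeper, initLimit and syncLimit are counted in ticks of tickTime milliseconds, and the default session timeout bounds are 2 and 20 ticks. The sketch below turns the quoted values into milliseconds. It is only an illustration: the function names are made up here, the parser handles simple key=value tokens only, and these numbers bound ZooKeeper's internal timing rather than answering how long a Solr shard leader election takes.

```python
def parse_zoo_cfg(text):
    """Parse whitespace- or newline-separated key=value tokens (no comment handling)."""
    settings = {}
    for token in text.split():
        if "=" in token:
            key, value = token.split("=", 1)
            settings[key] = value
    return settings


def quorum_timeouts_ms(settings):
    """Convert tick-based ZooKeeper limits into milliseconds."""
    tick = int(settings["tickTime"])
    return {
        # Time followers get to connect and sync to the ZooKeeper leader at startup.
        "init_limit_ms": int(settings["initLimit"]) * tick,
        # How far a follower may lag behind the ZooKeeper leader before being dropped.
        "sync_limit_ms": int(settings["syncLimit"]) * tick,
        # Default bounds on the session timeout a client may negotiate.
        "min_session_timeout_ms": 2 * tick,
        "max_session_timeout_ms": 20 * tick,
    }


if __name__ == "__main__":
    quoted = "tickTime=2000 dataDir=/home/rob/zoodata clientPort=2181 initLimit=5 syncLimit=2"
    for name, value in quorum_timeouts_ms(parse_zoo_cfg(quoted)).items():
        print(name, value)
```

With the values from the message, the init limit works out to 10000 ms and the sync limit to 4000 ms.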

Re: Zookeeper Quorum leader election

2015-10-22 Thread Erick Erickson
Thanks for adding that to our collective knowledge store! On Thu, Oct 22, 2015 at 2:44 AM, Arcadius Ahouansou wrote: > The leader election issue we were having was solved by passing > > -Djava.net.preferIPv4Stack=true > > to zookeeper startup script > > It seems our Li

Re: Zookeeper Quorum leader election

2015-10-22 Thread Arcadius Ahouansou
The leader election issue we were having was solved by passing -Djava.net.preferIPv4Stack=true to the ZooKeeper startup script. It seems our Linux servers have IPv6 enabled, but we have no IPv6 network. Hope this helps others. Arcadius. On 4 September 2015 at 04:57, Arcadius Ahouansou wrote

Zookeeper Quorum leader election

2015-09-03 Thread Arcadius Ahouansou
We have a quorum of 3 ZK nodes zk1, zk2 and zk3. All nodes are identical. After multiple restarts of the ZK nodes, always keeping the majority of 2, we have noticed that the node zk1 has never become the leader. Only zk2 and zk3 become leader. 1) Is there any known reason or possible misconfigurat

Re: Leader election

2015-07-29 Thread Timothy Potter
re down. > I look in the logs I can see problems of leader election, eg: > - Checking if I (core = test339_shard1_replica1, coreNodeName = > core_node5) shoulds try and be the leader. > - Cloud says we are still state leader. > > I feel that all server pass the buck! > >

Leader election

2015-07-29 Thread Olivier Damiot
, all my collections are down. I look in the logs I can see problems of leader election, eg: - Checking if I (core = test339_shard1_replica1, coreNodeName = core_node5) shoulds try and be the leader. - Cloud says we are still state leader. I feel that all server pass the buck! I do not

Re: Issue when zookeeper session expires during shard leader election.

2015-07-28 Thread Shalin Shekhar Mangar
Hi Mike, Yes, please open a new Jira issue and attach your patch there. We can discuss more on the issue. On Tue, Jul 28, 2015 at 11:40 AM, Michael Roberts wrote: > Hey, > > I am encountering an issue which looks a lot like > https://issues.apache.org/jira/browse/SOLR-6763. > > However, it seem

Issue when zookeeper session expires during shard leader election.

2015-07-27 Thread Michael Roberts
Hey, I am encountering an issue which looks a lot like https://issues.apache.org/jira/browse/SOLR-6763. However, it seems like the fix for that does not address the entire problem. That fix will only work if we hit the zkClient.getChildren() call before the reconnect logic has finished reconne

Re: Sync failure after shard leader election when adding new replica.

2015-05-26 Thread Erick Erickson
Please, please, please do _not_ try to use core discovery to add new replicas by manually editing stuff. bq: and my deployment tools create an empty core on newly provisioned machines. This is a really bad idea (as you have discovered). Basically, your deployment tools have to do everything righ

Sync failure after shard leader election when adding new replica.

2015-05-26 Thread Michael Roberts
Hi, I have a SolrCloud setup, running 4.10.3. The setup consists of several cores, each with a single shard and initially each shard has a single replica (so, basically, one machine). I am using core discovery, and my deployment tools create an empty core on newly provisioned machines. The sce

Re: SolrCloud Leader Election

2015-05-22 Thread Ryan Steele
Restarting the node cleared out the problem and everything recovered. Thanks! On 5/21/15 5:42 AM, Ramkumar R. Aiyengar wrote: This shouldn't happen, but if it does, there's no good way currently for Solr to automatically fix it. There are a couple of issues being worked on to do that currently.

Re: SolrCloud Leader Election

2015-05-21 Thread Ramkumar R. Aiyengar
This shouldn't happen, but if it does, there's no good way currently for Solr to automatically fix it. There are a couple of issues being worked on to do that currently. But till then, your best bet is to restart the node which you expect to be the leader (you can look at ZK to see who is at the he
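
The suggestion to look at ZK can be illustrated with a small sketch. It assumes that a shard's election entries are ZooKeeper sequential nodes whose names end in "-n_" followed by a zero-padded counter, and that the entry with the lowest counter is first in line. Both the name format and the sample names below are assumptions made for illustration, so check them against what your own ZooKeeper tree actually contains.

```python
import re

# Assumed suffix of an election entry name: "-n_" plus a zero-padded counter.
_SEQUENCE_SUFFIX = re.compile(r"-n_(\d+)$")


def election_sequence(node_name):
    """Return the numeric sequence at the end of an election entry name."""
    match = _SEQUENCE_SUFFIX.search(node_name)
    if match is None:
        raise ValueError("no sequence suffix in %r" % node_name)
    return int(match.group(1))


def first_in_line(node_names):
    """Return the entry with the lowest sequence number."""
    return min(node_names, key=election_sequence)


if __name__ == "__main__":
    # Hypothetical child names, as they might be listed under an election path.
    children = [
        "94012-core_node3-n_0000000007",
        "94010-core_node1-n_0000000005",
        "94011-core_node2-n_0000000006",
    ]
    print(first_in_line(children))
```

Sorting on the parsed integer rather than on the raw string avoids surprises if the prefixes differ in length.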

SolrCloud Leader Election

2015-05-20 Thread Ryan Steele
My SolrCloud cluster isn't reassigning the collection leaders from downed cores; the downed cores are still listed as the leaders. The cluster has been in this state for a few hours and the logs continue to report "No registered leader was found after waiting for 4000ms." Is there a way to forc
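
A situation like this one can be spotted programmatically on Solr releases that provide the Collections API CLUSTERSTATUS action. The sketch below takes an already-parsed CLUSTERSTATUS response and lists the shards that have no active leader on a live node. The function name and the sample data are hypothetical, and the key names it reads ("live_nodes", "shards", "replicas", "leader", "state", "node_name") should be checked against the response of your own Solr version.

```python
def shards_without_live_leader(status):
    """Return sorted (collection, shard) pairs lacking an active leader on a live node."""
    cluster = status.get("cluster", {})
    live_nodes = set(cluster.get("live_nodes", []))
    leaderless = []
    for coll_name, coll in cluster.get("collections", {}).items():
        for shard_name, shard in coll.get("shards", {}).items():
            has_leader = any(
                replica.get("leader") == "true"
                and replica.get("state") == "active"
                and replica.get("node_name") in live_nodes
                for replica in shard.get("replicas", {}).values()
            )
            if not has_leader:
                leaderless.append((coll_name, shard_name))
    return sorted(leaderless)


if __name__ == "__main__":
    # Hypothetical response: shard2 still lists a leader whose node is gone.
    sample = {
        "cluster": {
            "live_nodes": ["host1:8983_solr"],
            "collections": {
                "demo": {
                    "shards": {
                        "shard1": {"replicas": {"core_node1": {
                            "state": "active", "leader": "true",
                            "node_name": "host1:8983_solr"}}},
                        "shard2": {"replicas": {
                            "core_node2": {"state": "down", "leader": "true",
                                           "node_name": "host2:8983_solr"},
                            "core_node3": {"state": "active",
                                           "node_name": "host1:8983_solr"}}},
                    }
                }
            },
        }
    }
    print(shards_without_live_leader(sample))
```

Checking the node against live_nodes as well as the replica state is deliberate, since a stale state entry alone would not reveal a leader whose host has disappeared.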

Re: Manual leader election in SolrCloud

2014-10-13 Thread Erick Erickson
, Oct 13, 2014 at 9:33 PM, sachinpkale wrote: > Thanks for the info. I will wait for the next release then. Will it come with > 4.10.2? > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Manual-leader-election-in-SolrCloud-tp4164047p4164115.htm

Re: Manual leader election in SolrCloud

2014-10-13 Thread sachinpkale
Thanks for the info. I will wait for the next release then. Will it come with 4.10.2? -- View this message in context: http://lucene.472066.n3.nabble.com/Manual-leader-election-in-SolrCloud-tp4164047p4164115.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Manual leader election in SolrCloud

2014-10-13 Thread Erick Erickson
Not to my knowledge. There's quite a bit of work going on around leader balancing, see the umbrella issue at https://issues.apache.org/jira/browse/SOLR-6491. That work won't quite do what you want in the sense that you can't say "nodeX you become the leader" though. The way that set of operations

Manual leader election in SolrCloud

2014-10-13 Thread Sachin Kale
Is it possible to elect the leader manually in SolrCloud 4.10.1? -Sachin-

Re: Race condition in Leader Election

2014-04-15 Thread Mark Miller
We have to fix that then. --  Mark Miller about.me/markrmiller On April 15, 2014 at 12:20:03 PM, Rich Mayfield (mayfield.r...@gmail.com) wrote: I see something similar where, given ~1000 shards, both nodes spend a LOT of time sorting through the leader election process. Roughly 30 minutes

Race condition in Leader Election

2014-04-15 Thread Rich Mayfield
I see something similar where, given ~1000 shards, both nodes spend a LOT of time sorting through the leader election process. Roughly 30 minutes. I too am wondering - if I force all leaders onto one node, then shut down both, then start up the node with all of the leaders on it first, then

Re: Race condition in Leader Election

2014-03-06 Thread KNitin
cloud, i run into scenarios where both the > > replicas for a shard get into "recovering" state and never come up > causing > > the error "No servers hosting this shard". To fix this, I either unload > one > > core or restart one of the nodes again so that one of them

Re: Race condition in Leader Election

2014-03-06 Thread Mark Miller
> the error "No servers hosting this shard". To fix this, I either unload one > core or restart one of the nodes again so that one of them becomes the > leader. > > Is there a way to "force" leader election for a shard for solrcloud? Is > there a way to break ti

Race condition in Leader Election

2014-03-06 Thread KNitin
one of them becomes the leader. Is there a way to "force" leader election for a shard for solrcloud? Is there a way to break ties automatically (without restarting nodes) to make a node as the leader for the shard? Thanks Nitin

RE: SolrCloud 4.6.0 - leader election issue

2013-12-09 Thread Markus Jelsma
I can confirm i've seen this issue as well on trunk, a very recent build. -Original message- > From:Elodie Sannier > Sent: Monday 9th December 2013 16:43 > To: solr-user@lucene.apache.org > Cc: search5t...@lists.kelkoo.com > Subject: SolrCloud 4.6.0 - leader elect

SolrCloud 4.6.0 - leader election issue

2013-12-09 Thread Elodie Sannier
.ShardLeaderElectionContext:runLeaderProcess:251 - I am the new leader: http://dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:8080/searchsolrnodefr/fr_green/ shard1 Is it a bug with the leader election ? This problem does not occur : - with the version 4.5.1. - or if I start the four solr instances wit

Leader election fails in some point.

2013-10-18 Thread yriveiro
java:219) No leader means we can't index data because a 503 http status code is returned. Is this the normal behaviour or a bug? - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Leader-election-fails-in-some-point-tp4096514.html Sent from the Solr -

Re: Leader election

2013-08-23 Thread Srivatsan
No exceptions. And the leaderVoteWait value is used only during startup, right? A new leader will be elected once the leader node is down. Am I right? -- View this message in context: http://lucene.472066.n3.nabble.com/Leader-election-tp4086259p4086290.html Sent from the Solr - User mailing

Re: Leader election

2013-08-23 Thread Shalin Shekhar Mangar
Any exceptions in the logs of the other replicas? The default leaderVoteWait time is 3 minutes, after which a leader election should have been initiated automatically. On Fri, Aug 23, 2013 at 4:01 PM, Srivatsan wrote: > almost 15 minutes. After that i restarted the entire cluster. I am using s

Re: Leader election

2013-08-23 Thread Srivatsan
almost 15 minutes. After that i restarted the entire cluster. I am using solr 4.4 with 1 shard and 3 replicas -- View this message in context: http://lucene.472066.n3.nabble.com/Leader-election-tp4086259p4086287.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Leader election

2013-08-23 Thread Shalin Shekhar Mangar
ient.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117) > at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116) > at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)/ > > after that i checked solr admin page, leader election didnt get t

Leader election

2013-08-23 Thread Srivatsan
page, leader election didn't get triggered for that collection. <http://lucene.472066.n3.nabble.com/file/n4086259/Screenshot.png> I am not able to index into that collection, but I am able to search it. Please help me with this issue. Thanks in advance, Srivatsan -- Vie

Re: Wrong leader election leads to shard removal

2013-08-16 Thread Erick Erickson
bq:why does it replicate all the index instead of copying just the newer formed segments because there's no guarantee that the segments are identical on the nodes that make up a shard. The simplest way to conceptualize this is to consider the autocommit settings on the servers Let's say the hard c

Re: Wrong leader election leads to shard removal

2013-08-16 Thread Ido Kissos
Yes, I have erased the tlog in replica 2, and it appears that the first replica's tlog was corrupted because of an ungraceful servlet shutdown. Unfortunately there was no log for it, and the ZooKeeper log did not record anything about this either. Is there a place I could check in ZooKeeper what exa

Re: Wrong leader election leads to shard removal

2013-08-14 Thread Mark Miller
e first bulk replications worked well, but after a while an internal >> script pkilled all the solr instances, some while replicating. After >> starting back the servlet I discovered the disaster - on part of the >> replicas that were in a replicating stage there was a wrong z

Re: Wrong leader election leads to shard removal

2013-08-14 Thread Manuel Le Normand
cating. After > starting back the servlet I discovered the disaster - on part of the > replicas that were in a replicating stage there was a wrong zookeeper > leader election - good state replicas (sub-cluster 1) replicated from empty > replicas (sub-cluster 2) ending up in removing all docu

Wrong leader election leads to shard removal

2013-08-14 Thread Manuel Le Normand
wrong zookeeper leader election - good state replicas (sub-cluster 1) replicated from empty replicas (sub-cluster 2) ending up in removing all documents in these shards!! These are the logs from solr-prod32 (sub cluster #2 - bad state) - the shard1_replica1 is elected to be leader although it was not b

Re: Leader Election, when?

2013-07-12 Thread Erick Erickson
a leader. > > My question is why Zookeeper takes this behavior. Shouldn't it distribute > leaders? If i deliver some stress to a double-leader instance, is Zookeeper > going to run an election? > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Leader-Election-when-tp4077381.html > Sent from the Solr - User mailing list archive at Nabble.com.

Re: Leader Election, when?

2013-07-12 Thread Furkan KAMACI
deliver some stress to a double-leader instance, is Zookeeper > going to run an election? > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Leader-Election-when-tp4077381.html > Sent from the Solr - User mailing list archive at Nabble.com. >

Leader Election, when?

2013-07-11 Thread aabreur
ders? If i deliver some stress to a double-leader instance, is Zookeeper going to run an election? -- View this message in context: http://lucene.472066.n3.nabble.com/Leader-Election-when-tp4077381.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Leader election deadlock after restarting leader in 4.2.1

2013-06-04 Thread John Guerrero
https://issues.apache.org/jira/browse/SOLR-4900 -- View this message in context: http://lucene.472066.n3.nabble.com/Leader-election-deadlock-after-restarting-leader-in-4-2-1-tp4067988p4068238.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Leader election deadlock after restarting leader in 4.2.1

2013-06-04 Thread John Guerrero
:49 PM org.apache.catalina.startup.HostConfig > > deployDirectory > > INFO: Deploying web application directory ROOT > > May 28, 2013 5:34:49 PM org.apache.coyote.http11.Http11AprProtocol > start > > *INFO: Starting Coyote HTTP/1.1 on http-8080 #<-- &

Re: Leader election deadlock after restarting leader in 4.2.1

2013-06-03 Thread Mark Miller
668324 ms #<-- 668 sec > = 11 minutes to start Catalina.* > > Our Workaround: > > * We changed our script to allow 15 seconds before kill -9. > * Also, we no longer do a restart. We just stop the leader and wait for a > new leader. This > still results in a few "No registered leader was found" exceptions, but at > least the duration is short. > > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Leader-election-deadlock-after-restarting-leader-in-4-2-1-tp4067988.html > Sent from the Solr - User mailing list archive at Nabble.com.
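
The first workaround quoted above, allowing 15 seconds before kill -9, follows a common pattern: ask the process to stop, wait for a grace period, and force-kill only if it is still running. The sketch below shows that pattern with Python's subprocess module. It is not the poster's script; the function name is made up here and the child process is a stand-in for a real servlet container.

```python
import subprocess
import sys


def stop_with_grace(proc, grace_seconds=15.0):
    """Ask a process to exit, then force-kill it if it outlives the grace period."""
    proc.terminate()  # polite request (SIGTERM on POSIX)
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # forced stop (SIGKILL on POSIX, the "kill -9" case)
        proc.wait()  # reap the process so it does not linger as a zombie
        return "killed"


if __name__ == "__main__":
    # Stand-in child process that would otherwise sleep for a minute.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_with_grace(child, grace_seconds=15.0))
```

The grace period gives the node a chance to shut down cleanly before it is killed, which is the point of the longer delay described in the thread.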

Leader election deadlock after restarting leader in 4.2.1

2013-06-03 Thread John Guerrero
restart. We just stop the leader and wait for a new leader. This still results in a few "No registered leader was found" exceptions, but at least the duration is short. -- View this message in context: http://lucene.472066.n3.nabble.com/Leader-election-deadlock-after-restarting-

Re: SolrCloud leader election on single node

2012-10-25 Thread Mark Miller
g > SEVERE: Error while trying to recover. > core=collection1:org.apache.solr.common.SolrException: No registered leader > was found, collection:collection1 slice:shard1 > at > org.apache.solr.common.cloud.ZkStateReader.getLeaderProps(ZkStateReader.java:413) > at > org.apache.solr.common.cloud.ZkStateReader.getLeaderProps(ZkStateReader.java:399) > at > org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:318) > at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220) > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/SolrCloud-leader-election-on-single-node-tp4015804.html > Sent from the Solr - User mailing list archive at Nabble.com.

SolrCloud leader election on single node

2012-10-25 Thread AlexeyK
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220) -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-leader-election-on-single-node-tp4015804.html Sent from the Solr - User mailing list archive at Nabble.com.