Thanks Alessandro.
We found this Jira ticket that may be the root cause of this issue:
https://issues.apache.org/jira/browse/SOLR-14356
I'm not sure whether it is the reason the leader election initially
fails, but it prevents Solr from exiting this error loop.
On Wed, Jan 13, 2021 at
I faced these problems a while ago, and at the time I wrote a blog post
which I hope can help:
https://sease.io/2018/05/solrcloud-leader-election-failing.html
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
Sorry I missed this detail.
We are running Solr 8.2.
Thanks
On Tue, Jan 12, 2021 at 16:46, Phill Campbell
wrote:
> Which version of Apache Solr?
>
> > On Jan 12, 2021, at 8:36 AM, Pierre Salagnac
> wrote:
> >
> > Hello,
> > We had a stuck leader elec
Which version of Apache Solr?
> On Jan 12, 2021, at 8:36 AM, Pierre Salagnac
> wrote:
>
> Hello,
> We had a stuck leader election for a shard.
>
> We have collections with 2 shards, each shard has 5 replicas. We have many
> collections but the issue happened for a sin
Hello,
We had a stuck leader election for a shard.
We have collections with 2 shards, each shard has 5 replicas. We have many
collections but the issue happened for a single shard. Once all host
restarts completed, this shard was stuck with one replica in "recovery"
state and all othe
Are there any significant (or not so significant) changes? I have browsed the
release notes and searched JIRA, but the latest news seems to be in 7.3 (where
the old Leader-In-Recovery logic was replaced).
Context:
We are currently running Solr 7.4 in production. In the past year, we’ve seen
t
> Looking this up i found SOLR-5692, but that was solved a lifetime ago,
It wasn't.
https://issues.apache.org/jira/browse/SOLR-5692?focusedCommentId=14556876&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14556876
On Wed, Aug 21, 2019 at 1:29 PM Markus Jelsma
wrote:
Hello,
Looking this up I found SOLR-5692, but that was solved a lifetime ago, so just
checking whether this is a familiar error and one I'm missing in Jira:
A client's Solr 8.2.0 cluster brought us the following StackOverflowError while
running on Java 8:
Exception in thread "coreZkRegister-1-thread
With the property legacyCloud=true, coreNodeNames are correctly written by Solr
in the core.properties file.
We are wondering whether the problem comes from our configuration or from the
bugfix https://issues.apache.org/jira/browse/SOLR-11503 ?
_*Without legacyCloud=true:*_
> Our configuration before Solr start:
Hello,
We are trying to upgrade from Solr 6.6 to Solr 7.2.1 and we are using Solr
Cloud.
In some tests with 2 replicas, ZooKeeper doesn't know which one to elect as
leader:
ERROR org.apache.solr.cloud.ZkController:getLeader:1206 - Error getting leader
from zk
org.apache.solr.common.Solr
Thanks Shawn. Yes, I did index some docs after moving to 6.4.0. The release
notes did not mention anything about the format being changed, so I thought it
would be backward compatible. Yeah, my only recourse is to re-index the data.
Apart from that, there were weird problems overall with 6.4.0. I was excited
about
On 2/2/2017 7:23 AM, Ravi Solr wrote:
> When i try to rollback from 6.4.0 to my original version of 6.0.1 it now
> throws another issue. Now I cant go to 6.4.0 nor can I roll back to 6.0.1
>
> Could not load codec 'Lucene62'. Did you forget to add
> lucene-backward-codecs.jar?
> at org.apache.
Thanks Hendrik. I am baffled as to why I did not hit this issue prior to
moving to 6.4.0.
On Thu, Feb 2, 2017 at 7:58 AM, Hendrik Haddorp
wrote:
> Might be that your overseer queue overloaded. Similar to what is described
> here:
> https://support.lucidworks.com/hc/en-us/articles/203959903-
> Bri
When I try to roll back from 6.4.0 to my original version, 6.0.1, it now
throws another issue. Now I can't go to 6.4.0, nor can I roll back to 6.0.1:
Could not load codec 'Lucene62'. Did you forget to add
lucene-backward-codecs.jar?
at org.apache.lucene.index.SegmentInfos.readCodec(SegmentInfos.
Might be that your overseer queue is overloaded. Similar to what is
described here:
https://support.lucidworks.com/hc/en-us/articles/203959903-Bringing-up-downed-Solr-servers-that-don-t-want-to-come-up
If the overseer queue gets too long you get hit by this:
https://github.com/Netflix/curator/wiki/
Following up on my previous email, the intermittent server unavailability
seems to be linked to the interaction between Solr and ZooKeeper. Can
somebody help me understand what this error means and how to recover from
it?
2017-02-02 09:44:24.648 ERROR
(recoveryExecutor-3-thread-16-processing-n:xx.
Hello,
Yesterday I upgraded from 6.0.1 to 6.4.0, and it's been a straight 12-hour
debugging spree! Can somebody kindly help me out of this misery?
I have a set of 8 single-shard collections with 3 replicas each. As soon as I
updated the configs and started the servers, one of my collections got
like a leader
election issue?
2016-07-29 06:52:48.610 ERROR (coreZkRegister-1-thread-32-processing-s:shard2
x:tCollection_shard2_replica4 c:tCollection n:tsolr.prod2.xxx.com:8983_solr
r:core_node6) [c:tCollection s:shard2 r:core_node6
x:tCollection_shard2_replica4] o.a.s.c.ZkController Error
In the past, we have seen that whenever one of the boxes is the leader in
SolrCloud, the performance seems to be really good. However, the leader
election changes from time to time, and most of the time the cloud boxes seem
to process most of the traffic.
Currently our SolrCloud looks something like this:
Physical Box 1
X -> shard 1 Clou
We get this error on one of our nodes:
Caused by: org.apache.solr.common.SolrException: There is conflicting
information about the leader of shard: shard2 our state says:
http://server01:8983/solr/collection/ but zookeeper says:
http://server02:8983/collection/
Then I noticed this in the log:
Hi,
I have 2 shards, 1 leader and 1 replica in each.
I've just removed a leader from one of the shards but the replica hasn't
become a leader yet.
How quickly should this normally happen?
tickTime=2000
dataDir=/home/rob/zoodata
clientPort=2181
initLimit=5
syncLimit=2
Thanks,
Rob
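A note on the zoo.cfg values above: they control ZooKeeper's own sync timeouts, not Solr's leader-failover delay. A small sketch of how they combine, following the ZooKeeper administrator's guide (the arithmetic is just the config values shown in the message):

```python
# How the zoo.cfg values above translate into ZooKeeper timeouts.
tick_time_ms = 2000  # tickTime: basic time unit in milliseconds
init_limit = 5       # initLimit: ticks a follower may take to connect and sync
sync_limit = 2       # syncLimit: ticks a follower may lag behind the leader

print(init_limit * tick_time_ms)  # initial connect/sync timeout: 10000 ms
print(sync_limit * tick_time_ms)  # steady-state sync timeout: 4000 ms
```

So with this config a follower that stops syncing is dropped after about 4 seconds; the delay Rob is seeing is more likely on the Solr side (e.g. zkClientTimeout or leaderVoteWait) than in these ZooKeeper settings.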
Thanks for adding that to our collective knowledge store!
On Thu, Oct 22, 2015 at 2:44 AM, Arcadius Ahouansou
wrote:
> The leader election issue we were having was solved by passing
>
> -Djava.net.preferIPv4Stack=true
>
> to zookeeper startup script
>
> It seems our Li
The leader election issue we were having was solved by passing
-Djava.net.preferIPv4Stack=true
to zookeeper startup script
It seems our Linux servers have IPv6 enabled but we have no IPv6 network.
Hope this helps others.
Arcadius.
On 4 September 2015 at 04:57, Arcadius Ahouansou
wrote
We have a quorum of 3 ZK nodes zk1, zk2 and zk3.
All nodes are identical.
After multiple restarts of the ZK nodes, always keeping a majority of 2,
we have noticed that node zk1 has never become the leader.
Only zk2 and zk3 become leaders.
1) Is there any known reason or possible misconfigurat
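For later readers, a hedged sketch of one way to apply the -Djava.net.preferIPv4Stack=true fix mentioned above. It assumes a stock zkServer.sh, which appends the JVMFLAGS environment variable to the java command line (conf/java.env is a common place to set it permanently):

```shell
# Export the flag where ZooKeeper's startup script can see it.
export JVMFLAGS="${JVMFLAGS} -Djava.net.preferIPv4Stack=true"
echo "$JVMFLAGS"
# then restart ZooKeeper, e.g.: bin/zkServer.sh restart
```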
, all my collections are down.
When I look in the logs I can see problems with leader election, e.g.:
- Checking if I (core = test339_shard1_replica1, coreNodeName =
core_node5) should try and be the leader.
- Cloud says we are still state leader.
I feel that all the servers pass the buck!
I do not
Hi Mike,
Yes, please open a new Jira issue and attach your patch there. We can
discuss more on the issue.
On Tue, Jul 28, 2015 at 11:40 AM, Michael Roberts wrote:
> Hey,
>
> I am encountering an issue which looks a lot like
> https://issues.apache.org/jira/browse/SOLR-6763.
>
> However, it seem
Hey,
I am encountering an issue which looks a lot like
https://issues.apache.org/jira/browse/SOLR-6763.
However, it seems like the fix for that does not address the entire problem.
That fix will only work if we hit the zkClient.getChildren() call before the
reconnect logic has finished reconne
Please, please, please do _not_ try to use core discovery to add new
replicas by manually editing stuff.
bq: and my deployment tools create an empty core on newly provisioned machines.
This is a really bad idea (as you have discovered). Basically, your
deployment tools have to do everything righ
Hi,
I have a SolrCloud setup, running 4.10.3. The setup consists of several cores,
each with a single shard and initially each shard has a single replica (so,
basically, one machine). I am using core discovery, and my deployment tools
create an empty core on newly provisioned machines.
The sce
Restarting the node cleared out the problem and everything recovered.
Thanks!
On 5/21/15 5:42 AM, Ramkumar R. Aiyengar wrote:
This shouldn't happen, but if it does, there's no good way currently for
Solr to automatically fix it. There are a couple of issues being worked on
to do that currently.
This shouldn't happen, but if it does, there's no good way currently for
Solr to automatically fix it. There are a couple of issues being worked on
to do that currently. But till then, your best bet is to restart the node
which you expect to be the leader (you can look at ZK to see who is at the
he
My SolrCloud cluster isn't reassigning the collections leaders from
downed cores--the downed cores are still listed as the leaders. The
cluster has been in the state for a few hours and the logs continue to
report "No registered leader was found after waiting for 4000ms." Is
there a way to forc
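Possibly useful to later readers: Solr 5.4 and later expose a FORCELEADER action in the Collections API for shards stuck without a leader. A hedged sketch of building the request; the host, collection, and shard names are hypothetical placeholders, not from this thread:

```python
from urllib.parse import urlencode

# Build a FORCELEADER Collections API request URL (Solr 5.4+).
# base URL, collection, and shard are illustrative placeholders.
base = "http://localhost:8983/solr/admin/collections"
params = urlencode({
    "action": "FORCELEADER",
    "collection": "mycollection",
    "shard": "shard1",
})
print(f"{base}?{params}")
```

The resulting URL can be issued with curl or any HTTP client; use it only as a last resort after checking why the normal election is failing.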
Thanks for the info. I will wait for the next release then. Will it come with
4.10.2?
--
View this message in context:
http://lucene.472066.n3.nabble.com/Manual-leader-election-in-SolrCloud-tp4164047p4164115.html
Sent from the Solr - User mailing list archive at Nabble.com.
Not to my knowledge. There's quite a bit of work going on around
leader balancing, see the umbrella issue at
https://issues.apache.org/jira/browse/SOLR-6491.
That work won't quite do what you want in the sense that you can't say
"nodeX you become the leader" though. The way that set of operations
Is it possible to elect the leader manually in SolrCloud 4.10.1?
-Sachin-
We have to fix that then.
--
Mark Miller
about.me/markrmiller
On April 15, 2014 at 12:20:03 PM, Rich Mayfield (mayfield.r...@gmail.com) wrote:
I see something similar where, given ~1000 shards, both nodes spend a LOT of
time sorting through the leader election process. Roughly 30 minutes
I see something similar where, given ~1000 shards, both nodes spend a LOT of
time sorting through the leader election process. Roughly 30 minutes.
I too am wondering - if I force all leaders onto one node, then shut down both,
then start up the node with all of the leaders on it first, then
cloud, I run into scenarios where both the replicas for a shard get into
"recovering" state and never come up, causing the error "No servers hosting
this shard". To fix this, I either unload one core or restart one of the
nodes again so that one of them becomes the leader.
Is there a way to "force" leader election for a shard in SolrCloud? Is
there a way to break ties automatically (without restarting nodes) to make
a node the leader for the shard?
Thanks
Nitin
I can confirm I've seen this issue as well on trunk, a very recent build.
-Original message-
> From:Elodie Sannier
> Sent: Monday 9th December 2013 16:43
> To: solr-user@lucene.apache.org
> Cc: search5t...@lists.kelkoo.com
> Subject: SolrCloud 4.6.0 - leader elect
.ShardLeaderElectionContext:runLeaderProcess:251 - I am
the new leader:
http://dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:8080/searchsolrnodefr/fr_green/
shard1
Is it a bug in the leader election?
This problem does not occur:
- with version 4.5.1,
- or if I start the four Solr instances wit
java:219)
No leader means we can't index data, because a 503 HTTP status code is
returned.
Is this the normal behaviour or a bug?
-
Best regards
--
View this message in context:
http://lucene.472066.n3.nabble.com/Leader-election-fails-in-some-point-tp4096514.html
Sent from the Solr -
No exceptions. And the leaderVoteWait value will be used only during startup,
right? A new leader will be elected once the leader node is down. Am I right?
--
View this message in context:
http://lucene.472066.n3.nabble.com/Leader-election-tp4086259p4086290.html
Sent from the Solr - User mailing
Any exceptions in the logs of the other replicas? The default
leaderVoteWait time is 3 minutes, after which a leader election should
have been initiated automatically.
On Fri, Aug 23, 2013 at 4:01 PM, Srivatsan wrote:
> almost 15 minutes. After that i restarted the entire cluster. I am using s
almost 15 minutes. After that I restarted the entire cluster. I am using Solr
4.4 with 1 shard and 3 replicas.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Leader-election-tp4086259p4086287.html
Sent from the Solr - User mailing list archive at Nabble.com.
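For reference, the leaderVoteWait discussed in this thread is configurable in solr.xml. A hedged sketch in the modern solr.xml format (older 4.x-style solr.xml instead used a leaderVoteWait attribute on the <cores> element); the value shown mirrors the 3-minute default mentioned above:

```xml
<!-- solr.xml (sketch; other <solrcloud> settings omitted) -->
<solrcloud>
  <!-- milliseconds; 180000 ms = the 3-minute default -->
  <int name="leaderVoteWait">180000</int>
</solrcloud>
```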
ient.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
> at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
> at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:102)/
>
> after that i checked solr admin page, leader election didnt get t
page, leader election didn't get triggered
for that collection.
<http://lucene.472066.n3.nabble.com/file/n4086259/Screenshot.png>
I couldn't index into that collection, but I can search from it.
Please help me with this issue.
Thanks in advance
Srivatsan
bq: why does it replicate all the index instead of copying just the
newly formed segments
Because there's no guarantee that the segments are identical on the
nodes that make up a shard. The simplest way to conceptualize this
is to consider the autocommit settings on the servers. Let's say
the hard c
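The autocommit settings referred to above live in each core's solrconfig.xml; a hedged sketch with illustrative values (not a recommendation):

```xml
<!-- solrconfig.xml (sketch; values are illustrative only) -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>15000</maxTime>           <!-- hard commit at most every 15 s -->
    <openSearcher>false</openSearcher> <!-- commit without opening a searcher -->
  </autoCommit>
</updateHandler>
```

Because each node fires hard commits on its own clock, replicas close segments at different document boundaries, so the segment files diverge even when the logical index is the same; that is why recovery falls back to full replication.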
Yes, I have erased the tlog in replica 2, and it appears that the first
replica's tlog was corrupted because of an ungraceful servlet shutdown.
There was no log for it, unfortunately, nor did the ZooKeeper log record
anything about this. Is there a place I could check in the ZooKeeper what
exa
The first bulk replications worked well, but after a while an internal
script pkilled all the Solr instances, some while replicating. After
starting the servlet back up I discovered the disaster - on part of the
replicas that were in a replicating stage there was a wrong ZooKeeper
leader election - good-state replicas (sub-cluster 1) replicated from empty
replicas (sub-cluster 2), ending up removing all documents in these
shards!!
These are the logs from solr-prod32 (sub-cluster #2 - bad state) - the
shard1_replica1 is elected to be leader although it was not b
a leader.
My question is why ZooKeeper behaves this way. Shouldn't it distribute
leaders? If I deliver some stress to a double-leader instance, is ZooKeeper
going to run an election?
--
View this message in context:
http://lucene.472066.n3.nabble.com/Leader-Election-when-tp4077381.html
Sent from the Solr - User mailing list archive at Nabble.com.
https://issues.apache.org/jira/browse/SOLR-4900
--
View this message in context:
http://lucene.472066.n3.nabble.com/Leader-election-deadlock-after-restarting-leader-in-4-2-1-tp4067988p4068238.html
Sent from the Solr - User mailing list archive at Nabble.com.
:49 PM org.apache.catalina.startup.HostConfig
> > deployDirectory
> > INFO: Deploying web application directory ROOT
> > May 28, 2013 5:34:49 PM org.apache.coyote.http11.Http11AprProtocol
> start
> > *INFO: Starting Coyote HTTP/1.1 on http-8080 #<--
668324 ms #<-- 668 sec
> = 11 minutes to start Catalina.*
>
> Our Workaround:
>
> * We changed our script to allow 15 seconds before kill -9.
> * Also, we no longer do a restart. We just stop the leader and wait for a
> new leader. This
> still results in a few "No registered leader was found" exceptions, but at
> least the duration is short.
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Leader-election-deadlock-after-restarting-leader-in-4-2-1-tp4067988.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> SEVERE: Error while trying to recover.
> core=collection1:org.apache.solr.common.SolrException: No registered leader
> was found, collection:collection1 slice:shard1
> at
> org.apache.solr.common.cloud.ZkStateReader.getLeaderProps(ZkStateReader.java:413)
> at
> org.apache.solr.common.cloud.ZkStateReader.getLeaderProps(ZkStateReader.java:399)
> at
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:318)
> at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SolrCloud-leader-election-on-single-node-tp4015804.html
> Sent from the Solr - User mailing list archive at Nabble.com.