Problem: "pull" replica commits to leader?

2020-09-14 Thread Taisuke Miyazaki
Hi, everyone,
Data is not being synchronized from the TLOG Solr node to the PULL replica.

We are using TLOG and PULL replicas.
We are trying to upgrade the version of Solr we use from 7.5.0 to 8.6.2.

Configuration:
The PULL node is started after the TLOG node has been filled with data.

Problem:
The data is not synchronized to the PULL replica.

Looking at the log, it looks like the "pull" replica is trying to commit to
the leader in the doReplicateOnlyRecovery function.
Is this a bug, or is something set up wrong?

The log looks like this:

2020-09-14 09:34:45.576 ERROR (recoveryExecutor-11-thread-1-processing-n:172.20.17.40:8983_solr x:mycollection_shard1_replica_p3 c:mycollection s:shard1 r:core_node4) [c:mycollection s:shard1 r:core_node4 x:mycollection_shard1_replica_p3] o.a.s.c.RecoveryStrategy Error while trying to recover:org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://172.20.16.100:8983/solr/mycollection_shard1_replica_t1: Thou shall not issue a commit!
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:681)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:266)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248)
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214)
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:231)
    at org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:298)
    at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:230)
    at org.apache.solr.cloud.RecoveryStrategy.doReplicateOnlyRecovery(RecoveryStrategy.java:394)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:336)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:317)
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:212)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)

I need a solution.
Thank you.
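
For anyone reading the log above, the core names hint at which replica sent the commit and which one rejected it. The following is only a rough sketch in Python. It assumes the default core naming used since Solr 7, where the letter after `_replica_` is `n` (NRT), `t` (TLOG) or `p` (PULL); the helper `replica_type` and the regular expression are hypothetical and not part of Solr.

```python
import re

# Assumption: default core names look like <collection>_<shard>_replica_<t><n>,
# where <t> is n (NRT), t (TLOG) or p (PULL). Custom core names will not match.
REPLICA_TYPES = {"n": "NRT", "t": "TLOG", "p": "PULL"}
CORE_NAME = re.compile(
    r"(?P<collection>\w+?)_(?P<shard>shard\d+)_replica_(?P<type>[ntp])(?P<num>\d+)$"
)


def replica_type(core_name):
    """Return the replica type implied by a core name, or None if it does not match."""
    match = CORE_NAME.match(core_name)
    return REPLICA_TYPES[match.group("type")] if match else None


# Core that was running RecoveryStrategy in the log (the x: field).
sender = "mycollection_shard1_replica_p3"
# Core named in the "Error from server at" URL.
target = "mycollection_shard1_replica_t1"

print(sender, "->", replica_type(sender))
print(target, "->", replica_type(target))
```

Read this way, the log shows a PULL core calling commitOnLeader against a TLOG core, which matches the description in the message.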


Weak Leader & Weak Replica VS Strong Leader

2020-03-21 Thread SOLR4189
Hi all,

Maybe this is a slightly tricky question, but I need to ask. Let's say I have
infinite RAM and infinite SSDs, but limited CPU (let's say 4 CPUs for each
shard). My question is, which is preferable:

1. One leader with 4 CPUs

OR

2. One leader with 2 CPUs and one replica with 2 CPUs

OR

3. One leader with 1 CPU and 3 replicas with 1 CPU each?

I understand that the options with replicas are preferable for fault
tolerance, but what about performance, theoretically?
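
One way to reason about the three options is a back-of-the-envelope model. The sketch below is Python and rests on loud assumptions: every node in the shard indexes every document itself (as NRT replicas do), indexing costs a fixed, made-up 0.5 CPU per node, and whatever is left serves queries. The function `query_cpu_left` and all numbers are hypothetical, not measurements.

```python
def query_cpu_left(total_cpus, nodes, indexing_cpu_per_node):
    """CPU left for queries across the shard when every node pays the indexing cost.

    Assumption: the CPUs are split evenly and each node repeats the same
    indexing work, so the indexing cost is paid once per node.
    """
    per_node = total_cpus / nodes
    spare_per_node = max(per_node - indexing_cpu_per_node, 0.0)
    return nodes * spare_per_node


# Options 1, 2 and 3 from the question: 4 CPUs split over 1, 2 or 4 nodes.
for nodes in (1, 2, 4):
    spare = query_cpu_left(total_cpus=4, nodes=nodes, indexing_cpu_per_node=0.5)
    print(f"{nodes} node(s): {spare} CPUs left for queries")
```

Under these assumptions, splitting the same CPU budget across more replicas leaves less total CPU for queries, because the indexing work is repeated on each node. The model ignores fault tolerance, per-query parallelism, and replica types that do not index locally.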



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: replica never takes leader role

2015-01-28 Thread Mark Miller
Yes, after 45 seconds a replica should take over as leader. The logs of the
replica that should be taking over will likely explain why this is not
happening.

- Mar

On Wed Jan 28 2015 at 2:52:32 PM Joshi, Shital shital.jo...@gs.com wrote:

 When the leader reaches 99% physical memory on the box and starts swapping
 (stops replicating), we forcefully bring down the leader (first kill -15 and
 then kill -9 if kill -15 doesn't work). This is when we expect the replica
 to assume the leader's role, and it never happens.

 The ZooKeeper timeout is 45 seconds. We can increase it up to 2 minutes and
 test.

 <cores adminPath="/admin/cores" defaultCoreName="collection1"
  host="${host:}" hostPort="${jetty.port:8983}"
  hostContext="${hostContext:solr}"
  zkClientTimeout="${zkClientTimeout:45000}">

 As per the definition of zkClientTimeout, after the leader is brought down
 and doesn't talk to ZooKeeper for 45 seconds, shouldn't ZK promote the
 replica to leader? I am not sure how increasing the ZK timeout will help.





RE: replica never takes leader role

2015-01-28 Thread Joshi, Shital
We're using Solr 4.8.0


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, January 27, 2015 7:47 PM
To: solr-user@lucene.apache.org
Subject: Re: replica never takes leader role

What version of Solr? This is an ongoing area of improvements and several
are very recent.

Try searching the JIRA for Solr for details.

Best,
Erick



Re: replica never takes leader role

2015-01-28 Thread Erick Erickson
This is not the desired behavior at all. I know there have been
improvements in this area since 4.8, but can't seem to locate the JIRAs.

I'm curious _why_ the nodes are going down though, is it happening at
random or are you taking it down? One problem has been that the Zookeeper
timeout used to default to 15 seconds, and occasionally a node would be
unresponsive (sometimes due to GC pauses) and exceed the timeout. So upping
the ZK timeout has helped some people avoid this...

FWIW,
Erick
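
The point about GC pauses and the ZooKeeper timeout can be illustrated with a small Python sketch. It is a simplification: it treats a session as expired only when a pause lasts at least as long as the timeout and ignores heartbeat timing. The pause values and the helper `session_survives` are hypothetical.

```python
def session_survives(pause_ms, zk_client_timeout_ms):
    """True if a stop-the-world pause is shorter than the ZooKeeper session timeout.

    Simplification: real sessions depend on heartbeat timing, so a pause
    slightly shorter than the timeout may still expire the session.
    """
    return pause_ms < zk_client_timeout_ms


# Hypothetical GC pauses observed on a node, in milliseconds.
pauses_ms = [2_000, 12_000, 20_000]

# Compare the old 15-second default with the 45-second setting from this thread.
for timeout_ms in (15_000, 45_000):
    expired = [p for p in pauses_ms if not session_survives(p, timeout_ms)]
    print(f"timeout {timeout_ms} ms: pauses that would expire the session: {expired}")
```

With the 15-second value, the hypothetical 20-second pause would drop the node out of the cluster; with 45 seconds it would not.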

On Wed, Jan 28, 2015 at 7:11 AM, Joshi, Shital shital.jo...@gs.com wrote:

 We're using Solr 4.8.0



RE: replica never takes leader role

2015-01-28 Thread Joshi, Shital
When the leader reaches 99% physical memory on the box and starts swapping
(stops replicating), we forcefully bring down the leader (first kill -15 and
then kill -9 if kill -15 doesn't work). This is when we expect the replica to
assume the leader's role, and it never happens.

The ZooKeeper timeout is 45 seconds. We can increase it up to 2 minutes and test.

<cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}"
 hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}"
 zkClientTimeout="${zkClientTimeout:45000}">

As per the definition of zkClientTimeout, after the leader is brought down and
doesn't talk to ZooKeeper for 45 seconds, shouldn't ZK promote the replica to
leader? I am not sure how increasing the ZK timeout will help.
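
The timing question above can be sketched as follows. This is a minimal Python illustration that assumes the usual ZooKeeper behaviour: an ephemeral node disappears immediately when the client closes its session cleanly, and only after the session timeout when the process is killed hard. The function `earliest_takeover_ms` is hypothetical and says nothing about any extra waiting Solr's own leader election may add.

```python
def earliest_takeover_ms(kill_time_ms, zk_client_timeout_ms, clean_shutdown):
    """Earliest time ZooKeeper can drop the dead leader's ephemeral node.

    Assumption: a clean shutdown (e.g. kill -15 handled properly) closes the
    session at once, while kill -9 leaves the session to expire on its own.
    """
    if clean_shutdown:
        return kill_time_ms
    return kill_time_ms + zk_client_timeout_ms


# Leader killed with kill -9 at t = 0, for the two timeouts discussed above.
for timeout_ms in (45_000, 120_000):
    wait_ms = earliest_takeover_ms(0, timeout_ms, clean_shutdown=False)
    print(f"zkClientTimeout={timeout_ms} ms: no takeover before {wait_ms} ms")
```

Under these assumptions, raising the timeout to 2 minutes only lengthens the wait after a hard kill; it would help only where the leader is being dropped because of pauses, not where it is killed deliberately.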

 
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, January 28, 2015 11:42 AM
To: solr-user@lucene.apache.org
Subject: Re: replica never takes leader role

This is not the desired behavior at all. I know there have been
improvements in this area since 4.8, but can't seem to locate the JIRAs.

I'm curious _why_ the nodes are going down though, is it happening at
random or are you taking it down? One problem has been that the Zookeeper
timeout used to default to 15 seconds, and occasionally a node would be
unresponsive (sometimes due to GC pauses) and exceed the timeout. So upping
the ZK timeout has helped some people avoid this...

FWIW,
Erick



replica never takes leader role

2015-01-27 Thread Joshi, Shital
Hello,

We have a SolrCloud cluster (5 shards and 2 replicas) on 10 boxes and three
ZooKeeper instances. We have noticed that when a leader node goes down, the
replica never takes over as leader, the cloud becomes unusable, and we have to
bounce the entire cloud for the replica to assume the leader role. Is this the
default behavior? How can we change this?

Thanks. 




Re: replica never takes leader role

2015-01-27 Thread Erick Erickson
What version of Solr? This is an ongoing area of improvements and several
are very recent.

Try searching the JIRA for Solr for details.

Best,
Erick

On Tue, Jan 27, 2015 at 1:51 PM, Joshi, Shital shital.jo...@gs.com wrote:

 Hello,

 We have a SolrCloud cluster (5 shards and 2 replicas) on 10 boxes and three
 ZooKeeper instances. We have noticed that when a leader node goes down, the
 replica never takes over as leader, the cloud becomes unusable, and we have to
 bounce the entire cloud for the replica to assume the leader role. Is this the
 default behavior? How can we change this?

 Thanks.





Re: Replica as a leader

2014-05-19 Thread Erick Erickson
bq: Is there a way that solr can recover without losing docs in this scenario?

Not that I know of currently. SolrCloud is designed to _not_ lose documents
as long as all leaders are present. And when a leader goes down, assuming
there's a replica handy, docs shouldn't be lost either. But taking down the
leader, then starting an out-of-date replica up and hoping that Solr has
somehow magically cached all the intervening updates is not a supported
scenario. Perhaps SOLR-5468 will help here, I'm not entirely sure. This
scenario seems out-of-band though.

Best,
Erick

On Sun, May 18, 2014 at 3:12 AM, Anshum Gupta ans...@anshumgupta.net wrote:
 SOLR-5468 https://issues.apache.org/jira/browse/SOLR-5468 might be useful
 for you.



 --

 Anshum Gupta
 http://www.anshumgupta.net


Re: Replica as a leader

2014-05-18 Thread adfel70
*One of the most important requirements in my system is not to lose docs and
not to return only part of the data at query time.*

I expect the replica to wait until the real leader starts, or at least, once
the real leader is back, for the two to sync in both directions: the leader
picking up the docs indexed on the replica, and the replica picking up the
docs that were indexed on the leader.

Is there a way that solr can recover without losing docs in this scenario?

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Replica-as-a-leader-tp4135614p4136729.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Replica as a leader

2014-05-18 Thread Anshum Gupta
SOLR-5468 https://issues.apache.org/jira/browse/SOLR-5468 might be useful
for you.


On Sun, May 18, 2014 at 1:54 AM, adfel70 adfe...@gmail.com wrote:

 *One of the most important requirements in my system is not to lose docs and
 not to return only part of the data at query time.*

 I expect the replica to wait until the real leader starts, or at least, once
 the real leader is back, for the two to sync in both directions: the leader
 picking up the docs indexed on the replica, and the replica picking up the
 docs that were indexed on the leader.

 Is there a way that solr can recover without losing docs in this scenario?

 Thanks.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Replica-as-a-leader-tp4135614p4136729.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 

Anshum Gupta
http://www.anshumgupta.net


Re: Replica as a leader

2014-05-16 Thread Erick Erickson
1. Indexing 100-200 docs per second.
2. Doing Pkill -9 java to 2 replicas (not the leader) in shard 3 (while
indexing).
3. Indexing for 10-20 minutes and doing hard commit.
4. Doing Pkill -9 java to the leader and then starting one replica in shard
3 (while indexing).

I think you're in uncharted territory. By only having the leader
running, indexing docs to it, then killing it, there's no way for one
of the restarted followers to know what docs were indexed. Eventually
the follower will become the leader and the docs are just lost.
Updates are NOT stored on ZK for instance.
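
The scenario described above can be mimicked with a toy model. This is only a Python sketch: replicas are plain sets of document ids, there is no transaction log, recovery or ZooKeeper, and the `Replica` class and `index` function are made up for illustration.

```python
class Replica:
    """Toy replica: just a name, an up/down flag and the doc ids it holds."""

    def __init__(self, name):
        self.name = name
        self.docs = set()
        self.up = True


def index(replicas, doc_id):
    """Send a document to every replica that is currently up."""
    for replica in replicas:
        if replica.up:
            replica.docs.add(doc_id)


leader, follower_a, follower_b = Replica("leader"), Replica("a"), Replica("b")
shard = [leader, follower_a, follower_b]

for doc in range(100):                   # step 1: everything is up
    index(shard, doc)
follower_a.up = follower_b.up = False    # step 2: kill -9 both followers
for doc in range(100, 200):              # step 3: only the leader gets these
    index(shard, doc)
leader.up = False                        # step 4: kill the leader ...
follower_a.up = True                     # ... and start a stale follower
new_leader = follower_a                  # eventually it becomes the leader

lost = leader.docs - new_leader.docs
print(len(lost), "documents exist only on the dead leader")
```

Nothing in the restarted follower's state tells it about the documents indexed while it was down, which is why they go missing once it becomes leader.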

Why do you expect the machines to stay in down status? SolrCloud is
doing the best it can. How do you expect this scenario to recover?

FWIW,
Erick

On Thu, May 8, 2014 at 8:00 AM, adfel70 adfe...@gmail.com wrote:
 Solr  Collection Info:
 Solr 4.8, 4 shards, 3 replicas per shard, 30-40 million docs per shard.

 Process:
 1. Indexing 100-200 docs per second.
 2. Doing Pkill -9 java to 2 replicas (not the leader) in shard 3 (while
 indexing).
 3. Indexing for 10-20 minutes and doing hard commit.
 4. Doing Pkill -9 java to the leader and then starting one replica in shard
 3 (while indexing).
 5. After 20 minutes starting another replica in shard 3 ,while indexing (not
 the leader in step 1).

 Results:
 2. Only the leader is active in shard 3.
 3. Thousands of docs were added to the leader in shard 3.
 4. After starting the replica, its state was down and after 10 minutes it
 became the leader in cluster state (and still down). No servers hosting
 shards for index and search requests.
 5. After starting another replica, its state was recovering for 2-3 minutes
 and then it became active (not leader in cluster state).
 6. Index, commit and search requests are handled by the other replica
 (*active status, not leader!!!*).


 Expected:
 5. To stay in down status.
 *6. Not to handle index, commit and search requests - no servers hosting
 shards!*

 Thanks!




 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Replica-as-a-leader-tp4135077.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Replica as a leader

2014-05-15 Thread adfel70
/Solr Collection Info:/
Solr 4.8 , 4 shards, 3 replicas per shard, 30-40 million docs per shard.

/Process:/
1. Indexing 100-200 docs per second.
2. Doing Pkill -9 java to 2 replicas (not the leader) in shard 3 (while
indexing).
3. Indexing for 10-20 minutes and doing hard commit. 
4. Doing Pkill -9 java to the leader and then starting one replica in shard
3 (while indexing).
5. After 20 minutes starting another replica in shard 3 ,while indexing (not
the leader in step 1). 
6. After 10 minutes starting the rep that was the leader in step 1. 

/Results:/
2. Only the leader is active in shard 3.
3. Thousands of docs were added to the leader in shard 3.
4. After starting the replica, its state was down and after 10 minutes it
became the leader in cluster state (and still down). No servers hosting
shards for index and search requests.
*5. After starting another replica, its state was recovering for 2-3
minutes and then it became active (not leader in cluster state).
   Index, commit and search requests are handled by the other replica
(active status, not leader!!!).
   The search results do not include docs that were indexed to the leader
in step 3.*
6. Syncing with the active replica.

/Expected:/
*5. To stay in down status.
   Not to handle index, commit and search requests - no servers hosting
shards!*
6. Become the leader.

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Replica-as-a-leader-tp4135078.html
Sent from the Solr - User mailing list archive at Nabble.com.


Replica as a leader

2014-05-11 Thread adfel70
Solr  Collection Info:
Solr 4.8, 4 shards, 3 replicas per shard, 30-40 million docs per shard.

Process:
1. Indexing 100-200 docs per second.
2. Doing Pkill -9 java to 2 replicas (not the leader) in shard 3 (while
indexing).
3. Indexing for 10-20 minutes and doing hard commit. 
4. Doing Pkill -9 java to the leader and then starting one replica in shard
3 (while indexing).
5. After 20 minutes starting another replica in shard 3 ,while indexing (not
the leader in step 1). 

Results:
2. Only the leader is active in shard 3.
3. Thousands of docs were added to the leader in shard 3.
4. After starting the replica, its state was down and after 10 minutes it
became the leader in cluster state (and still down). No servers hosting
shards for index and search requests.
5. After starting another replica, its state was recovering for 2-3 minutes
and then it became active (not leader in cluster state).
6. Index, commit and search requests are handled by the other replica
(*active status, not leader!!!*).


Expected:
5. To stay in down status.
*6. Not to handle index, commit and search requests - no servers hosting
shards!*

Thanks!




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Replica-as-a-leader-tp4135077.html
Sent from the Solr - User mailing list archive at Nabble.com.