Re: leader split-brain at least once a day - need help

2015-01-13 Thread Thomas Lamy
Hi Mark, we're currently at 4.10.2; the update to 4.10.3 is scheduled for tomorrow. T On 12.01.15 at 17:30, Mark Miller wrote: bq. ClusterState says we are the leader, but locally we don't think so Generally this is due to some bug. One bug that can lead to it was recently fixed in 4.10.3 I

Re: leader split-brain at least once a day - need help

2015-01-13 Thread Shawn Heisey
On 1/12/2015 5:34 AM, Thomas Lamy wrote: I found no big/unusual GC pauses in the log (at least manually; I found no free solution to analyze them that worked out of the box on a headless Debian wheezy box). Eventually I tried with -Xmx8G (was 64G before) on one of the nodes, after checking

Re: leader split-brain at least once a day - need help

2015-01-12 Thread Thomas Lamy
Hi, I found no big/unusual GC pauses in the log (at least manually; I found no free solution to analyze them that worked out of the box on a headless Debian wheezy box). Eventually I tried with -Xmx8G (was 64G before) on one of the nodes, after checking allocation after 1 hour run time was
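
For checking GC logs by script on a headless box, something small may be enough. The following is a minimal sketch and not something posted in the thread: it assumes the JVM was started with -XX:+PrintGCApplicationStoppedTime, so that the log contains lines of the form "Total time for which application threads were stopped: N seconds". The function name find_long_pauses, the default threshold, and the log path in the usage comment are hypothetical.

```python
import re
import sys

# Matches HotSpot output produced by -XX:+PrintGCApplicationStoppedTime.
# Some locales print a decimal comma, so both separators are accepted.
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: (\d+[.,]\d+) seconds"
)


def find_long_pauses(lines, threshold_s=1.0):
    """Return (line_number, pause_seconds) for every pause >= threshold_s."""
    pauses = []
    for lineno, line in enumerate(lines, start=1):
        match = STOPPED_RE.search(line)
        if match is None:
            continue  # not a stopped-time line
        seconds = float(match.group(1).replace(",", "."))
        if seconds >= threshold_s:
            pauses.append((lineno, seconds))
    return pauses


if __name__ == "__main__":
    # Hypothetical usage: python gc_pauses.py /path/to/gc.log 5.0
    path = sys.argv[1]
    threshold = float(sys.argv[2]) if len(sys.argv) > 2 else 1.0
    with open(path, encoding="utf-8", errors="replace") as handle:
        for lineno, seconds in find_long_pauses(handle, threshold):
            print(f"line {lineno}: stopped for {seconds:.3f} s")
```

A sensible threshold is probably a value somewhat below the configured ZooKeeper session timeout (zkClientTimeout in Solr), since a pause longer than that timeout is the kind that can expire a node's session.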

Re: leader split-brain at least once a day - need help

2015-01-12 Thread Mark Miller
bq. ClusterState says we are the leader, but locally we don't think so Generally this is due to some bug. One bug that can lead to it was recently fixed in 4.10.3 I think. What version are you on? - Mark On Mon Jan 12 2015 at 7:35:47 AM Thomas Lamy t.l...@cytainment.de wrote: Hi, I found no

Re: leader split-brain at least once a day - need help

2015-01-08 Thread Thomas Lamy
Hi Alan, thanks for the pointer, I'll look at our GC logs. On 07.01.2015 at 15:46, Alan Woodward wrote: I had a similar issue, which was caused by https://issues.apache.org/jira/browse/SOLR-6763. Are you getting long GC pauses or similar before the leader mismatches occur? Alan Woodward

Re: leader split-brain at least once a day - need help

2015-01-08 Thread Yonik Seeley
It's worth noting that those messages alone don't necessarily signify a problem with the system (and it wouldn't be called split brain). The async nature of updates (and thread scheduling) along with stop-the-world GC pauses that can change leadership, cause these little windows of inconsistencies

leader split-brain at least once a day - need help

2015-01-07 Thread Thomas Lamy
Hi there, we are running a 3-server cloud serving a dozen single-shard/replicate-everywhere collections. The 2 biggest collections are ~15M docs, and about 13GiB / 2.5GiB in size. Solr is 4.10.2, ZK 3.4.5, Tomcat 7.0.56, Oracle Java 1.7.0_72-b14. 10 of the 12 collections (the small ones) get

Re: leader split-brain at least once a day - need help

2015-01-07 Thread Ugo Matrangolo
Hi Thomas, I did not get these split brains (probably our use case is simpler) but we got the spammed Zk phenomenon. The easiest way to fix it is to: 1. Shut down all the Solr servers in the failing cluster 2. Connect to zk using its CLI 3. rmr overseer/queue 4. Restart Solr Think is way faster

Re: leader split-brain at least once a day - need help

2015-01-07 Thread Alan Woodward
I had a similar issue, which was caused by https://issues.apache.org/jira/browse/SOLR-6763. Are you getting long GC pauses or similar before the leader mismatches occur? Alan Woodward www.flax.co.uk On 7 Jan 2015, at 10:01, Thomas Lamy wrote: Hi there, we are running a 3-server cloud