Re: Attention Solr 4.0 SolrCloud users

2012-12-08 Thread Jamie Johnson
Thanks for the info. We were looking to move to a stable release soon (we
are on an old nightly build from April!). Has this issue existed since
then? Do we have an idea when Solr 4.1 will be made available? I am just
trying to get an idea of whether we should wait or not.


On Thu, Dec 6, 2012 at 9:11 PM, Mark Miller markrmil...@gmail.com wrote:

 I should have sent this some time ago:

 https://issues.apache.org/jira/browse/SOLR-3940 Rejoining the leader
 election incorrectly triggers the code path for a fresh cluster start
 rather than fail over.

 The above is a somewhat ugly bug.

 It means that if you are playing around with recovery and you kill a
 replica in a shard, it will take 3 minutes before a new leader takes over.

 This will be fixed in the upcoming 4.1 release (and has been fixed on the
 4x branch since early October).

 This wait is only meant for cluster startup. The idea is that you might
 introduce some random, old, out-of-date shard and then start up your
 cluster - you don't want that shard to be a leader - so we wait around for
 all known shards to start up so they can all participate in the initial
 leader election and the best one can be chosen. It's meant as a protective
 measure against a fairly unlikely event. But it's kicking in when it
 shouldn't.

 You can just accept the 3 minute wait, or you can lower the wait from 3
 minutes (to, say, 10 seconds or to 0 seconds - just avoid the scenario I
 mention above if you do).

 You can set the wait time in solr.xml by adding the attribute
 leaderVoteWait={whatever milliseconds} to the cores node.

 Sorry about this - completely my fault.

 - Mark


Re: Attention Solr 4.0 SolrCloud users

2012-12-08 Thread Mark Miller
Hey Jamie - long time, no see.

On Dec 8, 2012, at 5:19 AM, Jamie Johnson jej2...@gmail.com wrote:

 Thanks for the info. We were looking to move to a stable release soon (we
 are on an old nightly build from April!). Has this issue existed since
 then?

It was introduced shortly before 4.0 was released, so no, I don't think so.

 Do we have an idea when Solr 4.1 will be made available? I am just
 trying to get an idea of whether we should wait or not.

I hope very, very soon…just have to herd a few cats…

- Mark

 
 



Re: Attention Solr 4.0 SolrCloud users

2012-12-08 Thread Jamie Johnson
Yes, been off the radar for some time, a testament to just how well
SolrCloud works even in its alpha state!

Glad to hear that it should be soon. We are hoping to move to a stable
version of Solr for our next release at the end of February, so anything in
January should give us enough time to react. Appreciate the information,
hope all is well!






Attention Solr 4.0 SolrCloud users

2012-12-06 Thread Mark Miller
I should have sent this some time ago:

https://issues.apache.org/jira/browse/SOLR-3940 Rejoining the leader election 
incorrectly triggers the code path for a fresh cluster start rather than fail 
over.

The above is a somewhat ugly bug.

It means that if you are playing around with recovery and you kill a replica in 
a shard, it will take 3 minutes before a new leader takes over.

This will be fixed in the upcoming 4.1 release (and has been fixed on the 4x
branch since early October).

This wait is only meant for cluster startup. The idea is that you might
introduce some random, old, out-of-date shard and then start up your cluster -
you don't want that shard to be a leader - so we wait around for all known
shards to start up so they can all participate in the initial leader election
and the best one can be chosen. It's meant as a protective measure against a
fairly unlikely event. But it's kicking in when it shouldn't.

You can just accept the 3 minute wait, or you can lower the wait from 3 minutes
(to, say, 10 seconds or to 0 seconds - just avoid the scenario I mention above
if you do).

You can set the wait time in solr.xml by adding the attribute
leaderVoteWait={whatever milliseconds} to the cores node.
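
For illustration, a minimal sketch of what that could look like in a Solr 4.0
style solr.xml. Only the leaderVoteWait attribute comes from the note above;
the other attributes, the core name, and the 10000 ms value are assumptions
modeled on a typical example config, so adapt them to your own file.

```xml
<!-- Sketch only: everything except leaderVoteWait is assumed, not prescribed. -->
<solr persistent="true">
  <!-- leaderVoteWait is in milliseconds; 10000 = 10 seconds instead of 3 minutes -->
  <cores adminPath="/admin/cores"
         defaultCoreName="collection1"
         zkClientTimeout="${zkClientTimeout:15000}"
         leaderVoteWait="10000">
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr>
```

A value of 0 would skip the wait entirely, which is only safe if you are sure
no stale shard can join during a full cluster start.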

Sorry about this - completely my fault.

- Mark