bq: You have to check if the cores, participating in leadership
election, are _really_
in sync. And this must be done before starting any rebalance.
Sounds ugly... :-(

This _should_ not be necessary. I'll add parenthetically that leader
election was extensively re-worked in Solr 7.3+, though, because
"interesting" things could happen before that.

Manipulating the leader election queue is really no different than
having to deal with, say, someone killing the leader un-gracefully. It should
"just work". That said, if you're seeing evidence to the contrary, that's reality.

What do you mean by "stats" though? It's perfectly ordinary for there to
be different numbers of _deleted_ documents on various replicas, and
consequently things like term frequencies and doc frequencies being
different. What's emphatically _not_ expected is for there to be different
numbers of "live" docs.
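
To make that distinction concrete, here's a minimal sketch (the stat
names follow Solr's index stats, numDocs/maxDoc; the counts themselves
are made up for illustration). Replicas must agree on numDocs, the live
document count, while maxDoc, which still includes deleted docs, may
legitimately differ:

```python
# Sketch: replicas are consistent when their *live* doc counts (numDocs)
# match; maxDoc includes not-yet-merged-away deleted docs and may
# legitimately differ between replicas. Numbers below are illustrative.

def live_docs_match(replica_stats):
    """True if every replica reports the same number of live docs."""
    return len({r["numDocs"] for r in replica_stats}) == 1

stats = [
    {"core": "core_node1",  "numDocs": 520_000, "maxDoc": 531_200},
    {"core": "core_node10", "numDocs": 520_000, "maxDoc": 527_850},
]
# Different maxDoc (deleted docs) is fine; different numDocs would not be.
```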

"making sure nodes are in sync" is certainly an option. That should all
be automatic if you pause indexing and issue a commit, _then_
do a rebalance.
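
That sequence boils down to two HTTP calls. A minimal sketch (the host
and collection name are assumptions; the request URLs are only built
here, not sent):

```python
# Sketch of the pause/commit/rebalance sequence as Solr HTTP requests.
# Host and collection name are assumptions for illustration.
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"
COLL = "collection1"

def commit_url():
    # Hard commit with waitSearcher=true: the call returns only once the
    # new searcher is open, so all replicas share the same index state.
    return f"{SOLR}/{COLL}/update?" + urlencode(
        {"commit": "true", "waitSearcher": "true"})

def rebalance_url():
    # Collections API action that tries to move leadership to the
    # replicas carrying the preferredLeader property.
    return f"{SOLR}/admin/collections?" + urlencode(
        {"action": "REBALANCELEADERS", "collection": COLL})
```

With indexing paused on the application side, issue the commit first,
then the REBALANCELEADERS call.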

I certainly agree that the code is broken and needs to be fixed, but I
also have to ask: how many shards are we talking about here? The code was
originally written for the case where 100s of leaders could be on the
same node, and until you get to a significant number of leaders on
a single node (10s at least) there haven't been reliable stats showing
that it's a performance issue. If you have threshold numbers where
you've seen it make a material difference, it'd be great to share them.

And I won't be getting back to this until the weekend; other urgent
stuff has come up...

Best,
Erick

On Fri, Jan 11, 2019 at 12:58 AM Bernd Fehling
<bernd.fehl...@uni-bielefeld.de> wrote:
>
> Hi Erik,
> yes, I would be happy to test any patches.
>
> Good news, I got rebalance working.
> After running the rebalance about 50 times with a debugger and watching
> the behavior of my problem shard and its core_nodes within my test cloud,
> I found the point of failure. I fixed it and now it works.
>
> Bad news, rebalance is still not reliable and there are many more
> problems and points of failure triggered by rebalanceLeaders, or rather
> by re-queueing the watch list.
>
> How I located _my_ problem:
> Test cloud is 5 servers (VMs), 5 shards, 3 replicas per shard, 1 Java
> instance per server, and 3 separate ZooKeepers.
> My problem: shard2 wasn't willing to rebalance to a specific core_node.
> The core_nodes involved were core_node1, core_node2 and core_node10;
> core_node10 was the preferredLeader.
> It was just swapping leadership between core_node1 and core_node2,
> back and forth, whenever I called rebalanceLeaders.
> First step, I stopped the server holding core_node2.
> Result, the leadership stayed at core_node1 whenever I called
> rebalanceLeaders.
> Second step, from the debugger I _forced_ the system, during
> rebalanceLeaders, to give the leadership to core_node10.
> Result, there was no leader anymore for that shard. Yes it can happen,
> you can end up with a shard having no leader but active core_nodes!!!
> To fix this I set preferredLeader on core_node1 and called
> rebalanceLeaders.
> After that, I set preferredLeader back to core_node10 and was back
> where I started: all calls to rebalanceLeaders kept the leader at
> core_node1.
>
> From the debug logs I got the hint about PeerSync of cores and
> IndexFingerprint.
> The stats from my problem core_node10 showed that it differs from the
> leader core_node1.
> The system notices the difference, starts a PeerSync and reports
> success.
> But the PeerSync seems to fail, because the stats of core_node1 and
> core_node10 still differ afterwards.
> Solution, I also stopped the server holding my problem core_node10,
> wiped all data directories and started that server again. The
> core_nodes were rebuilt from the leader and now they are really in
> sync.
> Calling rebalanceLeaders now succeeded with leadership going to the
> preferredLeader.
>
> My guess:
> You have to check if the cores, participating in leadership election, are 
> _really_
> in sync. And this must be done before starting any rebalance.
> Sounds ugly... :-(
>
> Next question, why is PeerSync not reporting an error?
> There is an info about "PeerSync START", "PeerSync Received 0 versions
> from ... fingerprint:null"
> and "PeerSync DONE. sync succeeded", but the cores are not really in
> sync.
>
> Another test I did (with my new knowledge about synced cores):
> - Removing all preferredLeader properties
> - stopping, wiping the data directory, and starting all servers one by
>    one to get all cores of all shards in sync
> - setting one preferredLeader for each shard, but different from the
>    actual leader
> - calling rebalanceLeaders succeeded for only 2 of the 5 shards on the
>    first run (even with really all cores in sync).
> - after calling rebalanceLeaders again, the other shards succeeded as
>    well.
> Result, rebalanceLeaders is still not reliable.
>
> I have to mention that I have about 520,000 docs per core in my test
> cloud and that there might also be a timing issue between calling
> rebalanceLeaders, detecting that the cores about to become leader are
> not in sync with the actual leader, and resyncing while waiting for the
> new leader election.
>
> So far,
> Bernd
>
>
> Am 10.01.19 um 17:02 schrieb Erick Erickson:
> > Bernd:
> >
> > Don't feel bad about missing it, I wrote the silly stuff and it took me
> > some time to remember.....
> >
> > Those are the rules.
> >
> > It's always humbling to look back at my own code and say "that
> > idiot should have put some comments in here..." ;)
> >
> > Yeah, I agree there are a lot of moving parts here. I have a note to
> > myself to provide better feedback in the response. You're absolutely
> > right that we fire all these commands and hope they all work. Just
> > returning "success" status doesn't guarantee a leadership change.
> >
> > I'll be on another task the rest of this week, but I should be able
> > to dress things up over the weekend. That'll give you a patch to test
> > if you're willing.
> >
> > The actual code changes are pretty minimal, the bulk of the patch
> > will be the reworked test.
> >
> > Best,
> > Erick
> >
