Hi Erick,

The patches and the new comments look good.
Unfortunately I'm at 6.6.5 and can't test this with my cloud.
Replica (o.a.s.common.cloud.Replica) at 6.6.5 is too far away from 7.6 and up,
and a backport to 6.6.5 would be too much rework, if it is possible at all.

Thanks for solving this issue.

Regards,
Bernd


On 20.01.19 at 17:04, Erick Erickson wrote:
Bernd:

I just committed fixes for SOLR-13091 and SOLR-10935 to the repo, so if
you want to give it a whirl it's ready. By tonight (Sunday) I expect to
change the response format a bit and update the ref guide, although
you'll have to look at the doc changes in their source format. There's a
new summary section that gives "Success" or "Failure" and is supposed to
be the only thing you really need to check...

One judgement call I made was that if a replica on a down node is the
preferredLeader, it _can't_ be made leader, but this is still labeled
"Success".

Best,
Erick

On Sun, Jan 13, 2019 at 7:43 PM Erick Erickson <erickerick...@gmail.com> wrote:

Bernd:

I just attached a patch to
https://issues.apache.org/jira/browse/SOLR-13091. It's still rough; the
response from REBALANCELEADERS needs quite a bit of work (lots of extra
stuff in it now, and no overall verification). I haven't run all the
tests, nor precommit.

I wanted to get something up so that, if you have a test environment you
can easily try it in, you'd have an early chance to play with it.

It's against master; I also haven't tried to backport to 8.0 or 7x yet.
I doubt it'll be a problem, but if it doesn't apply cleanly let me know.

Best,
Erick

On Fri, Jan 11, 2019 at 8:33 AM Erick Erickson <erickerick...@gmail.com> wrote:

bq: You have to check whether the cores participating in the leadership
election are _really_ in sync, and this has to be done before starting
any rebalance.
Sounds ugly... :-(

This _should_ not be necessary. I'll add parenthetically, though, that
leader election has been extensively reworked in Solr 7.3+ because
"interesting" things could happen.

Manipulating the leader election queue is really no different from having
to deal with, say, someone killing the leader ungracefully. It should
"just work". That said, if you're seeing evidence to the contrary, that's
reality.

What do you mean by "stats" though? It's perfectly ordinary for there to
be different numbers of _deleted_ documents on various replicas, and
consequently for things like term frequencies and doc frequencies to
differ. What's emphatically _not_ expected is different numbers of "live"
docs.

"making sure nodes are in sync" is certainly an option. That should all
be automatic if you pause indexing and issue a commit, _then_
do a rebalance.
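
In code, "pause indexing, commit, then rebalance" is roughly the sketch
below (host and collection name are placeholders; it assumes the standard
update and Collections API endpoints):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: once indexing is paused, force a hard commit so all replicas are
// flushed to the same point, then ask for the rebalance.
public class CommitThenRebalance {
    private static String get(HttpClient client, String url) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String base = "http://localhost:8983/solr";   // placeholder host

        // 1. Hard commit on the collection (placeholder name "mycollection").
        get(client, base + "/mycollection/update?commit=true&waitSearcher=true&wt=json");

        // 2. Move leadership onto the preferredLeader replicas.
        String rsp = get(client,
            base + "/admin/collections?action=REBALANCELEADERS&collection=mycollection&wt=json");
        System.out.println(rsp);
    }
}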

I certainly agree that the code is broken and needs to be fixed, but I
also have to ask: how many shards are we talking about here? The code was
originally written for the case where 100s of leaders could be on the
same node, and until you get into a significant number of leaders on a
single node (10s at least) there haven't been reliable stats showing that
it's a performance issue. If you have threshold numbers where you've seen
it make a material difference, it'd be great to share them.

And I won't be getting back to this until the weekend; other urgent stuff
has come up...

Best,
Erick

On Fri, Jan 11, 2019 at 12:58 AM Bernd Fehling
<bernd.fehl...@uni-bielefeld.de> wrote:

Hi Erick,
yes, I would be happy to test any patches.

Good news: I got rebalance working.
After running the rebalance about 50 times with the debugger and watching
the behavior of my problem shard and its core_nodes within my test cloud,
I found the point of failure. I solved it and now it works.

Bad news: rebalance is still not reliable, and there are many more
problems and points of failure triggered by rebalanceLeaders, or rather
by re-queueing the watch list.

How I located _my_ problem:
The test cloud is 5 servers (VMs), 5 shards, 3 replicas per shard, 1 Java
instance per server, plus 3 separate ZooKeepers.
My problem: shard2 wasn't willing to rebalance to a specific core_node.
The core_nodes involved were core_node1, core_node2 and core_node10;
core_node10 was the preferredLeader.
Leadership just kept changing between core_node1 and core_node2, back and
forth, whenever I called rebalanceLeaders.
First step: I stopped the server holding core_node2.
Result: the leadership stayed at core_node1 whenever I called
rebalanceLeaders.
Second step: from the debugger I _forced_ the system, during
rebalanceLeaders, to give the leadership to core_node10.
Result: there was no leader anymore for that shard. Yes, it can happen,
you can end up with a shard that has no leader but active core_nodes!!!
To fix this I gave preferredLeader to core_node1 and called
rebalanceLeaders.
After that, preferredLeader was set back to core_node10 and I was back at
the point where I started: all calls to rebalanceLeaders kept the leader
at core_node1.
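
For reference, the two admin calls I keep repeating in these tests are
roughly the sketch below (the base URL and the collection name "test" are
placeholders from my test cloud):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of the two Collections API calls used in the test above:
// mark core_node10 as preferredLeader for shard2, then rebalance.
public class SetPreferredLeader {
    private static String get(HttpClient client, String url) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String base = "http://localhost:8983/solr/admin/collections"; // placeholder

        // Set the preferredLeader property on core_node10 of shard2.
        get(client, base + "?action=ADDREPLICAPROP&collection=test&shard=shard2"
                  + "&replica=core_node10&property=preferredLeader&property.value=true&wt=json");

        // Ask Solr to move leadership onto the preferredLeader replicas.
        System.out.println(get(client,
                base + "?action=REBALANCELEADERS&collection=test&wt=json"));
    }
}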

From the debug logs I got a hint about PeerSync of cores and
IndexFingerprint.
The stats of my problem core_node10 showed that it differed from the
leader core_node1.
The system notices the difference, starts a PeerSync and ends with
success.
But the PeerSync actually seems to fail, because the stats of core_node1
and core_node10 still differ afterwards.
Solution: I also stopped the server holding my problem core_node10, wiped
all data directories and started that server again. The core_nodes were
rebuilt from the leader and now they are really in sync.
Calling rebalanceLeaders now ended with success: leadership went to the
preferredLeader.

My guess:
You have to check whether the cores participating in the leadership
election are _really_ in sync, and this has to be done before starting
any rebalance.
Sounds ugly... :-(

Next question: why is PeerSync not reporting an error?
There are info messages like "PeerSync START", "PeerSync Received 0
versions from ... fingerprint:null" and "PeerSync DONE. sync succeeded",
but the cores are not really in sync.

Another test I did (with my new knowledge about synced cores):
- removing all preferredLeader properties
- stopping each server, wiping its data directory and starting it again,
    one by one, to get all cores of all shards in sync
- setting one preferredLeader per shard, but different from the actual
    leader
- calling rebalanceLeaders: it succeeded for only 2 shards on the first
    run, not for all 5 shards (even with really all cores in sync)
- after calling rebalanceLeaders again, the other shards succeeded as well
Result: rebalanceLeaders is still not reliable.
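
In code, "call it again until it sticks" is roughly the sketch below. The
collection name, retry count and the crude substring check are
placeholders; a real check would parse CLUSTERSTATUS and compare each
shard's leader against its preferredLeader property.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of the retry workaround: repeat REBALANCELEADERS a few times and
// stop once the response no longer looks like a failure.
public class RebalanceUntilDone {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String url = "http://localhost:8983/solr/admin/collections"
                   + "?action=REBALANCELEADERS&collection=test&wt=json";

        for (int attempt = 1; attempt <= 5; attempt++) {
            HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
            String body = client.send(req, HttpResponse.BodyHandlers.ofString()).body();
            System.out.println("attempt " + attempt + ": " + body);
            if (!body.contains("Failure")) {
                break;              // nothing reported as failed, stop retrying
            }
            Thread.sleep(5000);     // give leader election some time to settle
        }
    }
}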

I have to mention that I have about 520,000 docs per core in my test
cloud, and that there might also be a timing issue between calling
rebalanceLeaders, detecting that the core which is to become leader is
not in sync with the actual leader, and resyncing while waiting for the
new leader election.

So far,
Bernd


On 10.01.19 at 17:02, Erick Erickson wrote:
Bernd:

Don't feel bad about missing it; I wrote the silly stuff and it took me
some time to remember...

Those are the rules.

It's always humbling to look back at my own code and say "that
idiot should have put some comments in here..." ;)

Yeah, I agree there are a lot of moving parts here. I have a note to
myself to provide better feedback in the response. You're absolutely
right that we fire all these commands and hope they all work. Just
returning a "success" status doesn't guarantee a leadership change.

I'll be on another task the rest of this week, but I should be able
to dress things up over the weekend. That'll give you a patch to test
if you're willing.

The actual code changes are pretty minimal; the bulk of the patch will be
the reworked test.

Best,
Erick


--
*************************************************************
Bernd Fehling                    Bielefeld University Library
Dipl.-Inform. (FH)                LibTec - Library Technology
Universitätsstr. 25                  and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060       bernd.fehling(at)uni-bielefeld.de
          https://www.ub.uni-bielefeld.de/~befehl/

BASE - Bielefeld Academic Search Engine - www.base-search.net
*************************************************************
