Re: Assigning regions after restart

2017-03-16 Thread Lars George
JIRA creation is done: https://issues.apache.org/jira/browse/HBASE-17791 I will have a look into it over the next few days, maybe I can come up with a patch. On Wed, Mar 15, 2017 at 6:14 AM, Stack wrote: > File a blocker please Lars. I'm pretty sure the boolean on whether we

Re: Assigning regions after restart

2017-03-14 Thread Stack
File a blocker please Lars. I'm pretty sure the boolean on whether we are doing a recovery or not has been there a long time so yeah, a single server recovery could throw us off, but you make a practical point, that one server should not destroy locality over the cluster. St.Ack On Tue, Mar 14,

Re: Assigning regions after restart

2017-03-14 Thread Lars George
Wait, HBASE-15251 is not enough methinks. The checks it added help, but they do not cover all the possible edge cases. In particular, say a node really fails: why not just reassign the few regions it held and leave all the others where they are? Seems insane as it is. On Tue, Mar 14, 2017 at 2:24
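The idea of reassigning only the failed node's regions can be sketched as follows. This is a minimal illustration, not HBase code: the class `RetainAssignmentSketch`, the method `planAssignments`, and the use of plain strings for region and server names are all hypothetical, and the round-robin placement of orphaned regions is an assumption made for the example.

```java
import java.util.*;

// Hypothetical sketch: keep every region on its previous server when that
// server is still alive, and move only the regions of dead servers.
final class RetainAssignmentSketch {

    // lastKnown: region name -> server that hosted it before the restart.
    // liveServers: servers that have checked in after the restart.
    static Map<String, String> planAssignments(Map<String, String> lastKnown,
                                               Set<String> liveServers) {
        if (liveServers.isEmpty()) {
            throw new IllegalArgumentException("no live servers");
        }
        // Sorted copies make the resulting plan deterministic.
        List<String> live = new ArrayList<>(new TreeSet<>(liveServers));
        Map<String, String> plan = new TreeMap<>();
        int next = 0;
        for (Map.Entry<String, String> e : new TreeMap<>(lastKnown).entrySet()) {
            String previous = e.getValue();
            if (liveServers.contains(previous)) {
                // Server survived: retain the assignment, locality is preserved.
                plan.put(e.getKey(), previous);
            } else {
                // Server is gone: only these regions are placed elsewhere
                // (round-robin here, purely as an assumption).
                plan.put(e.getKey(), live.get(next % live.size()));
                next++;
            }
        }
        return plan;
    }
}
```

With three regions on servers s1, s2 and s3, and s3 failing to come back, only the region from s3 moves; the other two stay where their data is local.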

Re: Assigning regions after restart

2017-03-14 Thread Lars George
Looking at the code more... it seems the issue is here, in AssignmentManager.processDeadServersAndRegionsInTransition():

    ...
    failoverCleanupDone();
    if (!failover) {
      // Fresh cluster startup.
      LOG.info("Clean cluster startup. Assigning user regions");
      assignAllUserRegions(allRegions);
    }
    ...

Assigning regions after restart

2017-03-14 Thread Lars George
Hi, I have seen this happen on multiple clusters recently: after a restart, the locality dropped from close to or exactly 100% down to single digits. The reason is that all regions were completely shuffled and reassigned to random servers. Upon reading the (yet again non-trivial) assignment

Re: Assigning regions after restart

2017-03-14 Thread Lars George
Doh, https://issues.apache.org/jira/browse/HBASE-15251 addresses this (though I am not sure exactly how, see below). This should be backported to all 1.x branches! As for the patch, I see this:

    if (!failover) {
      // Fresh cluster startup.
    - LOG.info("Clean cluster startup.