[https://issues.apache.org/jira/browse/HBASE-24633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145729#comment-17145729]
Huaxiang Sun commented on HBASE-24633:
--
{code:java}
Data locality of replica regions in the balancer has a negative impact on the
cluster's "balanced" state. The HBase balancer's goal is to move regions so the
cluster reaches a "balanced" state (region count per region server, data
locality, operation load, etc.). Each time it runs, it makes decisions that
bring the cluster closer to that state. Some factors support this direction,
for example the data locality of primary regions. If the balancer decides that
region A should move to region server 1 for better data locality, region A's
locality on region server 1 improves over time (flushes and compactions write
new local copies of its data), and the cluster becomes more stable. Today,
however, data locality for replica regions plays the same critical role as for
primary regions, and this factor actually pushes in the opposite direction. For
example, if replica region Ar is moved to region server 1 for better data
locality, Ar's data locality gets worse over time: the primary region does all
flushes and compactions, so HDFS may not place a copy of the new data on the
datanode where the replica region resides. Some time later, the balancer has to
move Ar again for better data locality. The solution I am proposing is to
remove this factor from the balancer's decision making: data locality for
replica regions is not a goal for the balancer. If we need better latency for
replica region reads, we need an extra mechanism to warm up the caches for
replica regions.
{code}
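A minimal sketch of what "data locality for replica regions is not a goal for the balancer" could mean, under a simplified, hypothetical model: `ReplicaAwareLocalityCost` and `Placement` are illustrative names, not HBase's cost-function API. The only convention borrowed from HBase is that replica id 0 is the primary. Primary regions contribute their missing locality to the cost. Replica placements are skipped, so moving a replica can never lower this term.

```java
import java.util.List;

/**
 * Hypothetical, simplified locality cost (NOT HBase's LocalityBasedCostFunction).
 * Each placement has a replica id (0 = primary, as in HBase) and a locality in [0, 1].
 */
final class ReplicaAwareLocalityCost {

  /** Hypothetical stand-in for "this region replica is hosted with this locality". */
  static final class Placement {
    final int replicaId;
    final double locality;

    Placement(int replicaId, double locality) {
      this.replicaId = replicaId;
      this.locality = locality;
    }
  }

  private ReplicaAwareLocalityCost() {}

  /**
   * Average missing locality over primary regions only: 0.0 when every primary
   * is fully local, 1.0 when none has local blocks. Replicas are ignored.
   */
  static double cost(List<Placement> placements) {
    double missing = 0.0;
    int primaries = 0;
    for (Placement p : placements) {
      if (p.replicaId != 0) {
        continue; // replica region: its locality is not a balancer goal
      }
      missing += 1.0 - p.locality;
      primaries++;
    }
    return primaries == 0 ? 0.0 : missing / primaries;
  }
}
```

Take one primary at locality 0.5, one primary at 1.0 and a replica at 0.0. Here `cost` returns 0.25. Counting the replica as well would give 0.5, and the balancer would then be tempted to chase a replica whose locality will decay again.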
> Remove data locality and StoreFileCostFunction for replica regions out of
> balancer's cost calculation
> -
>
> Key: HBASE-24633
> URL: https://issues.apache.org/jira/browse/HBASE-24633
> Project: HBase
> Issue Type: Improvement
> Components: Balancer
> Affects Versions: 2.3.0
> Reporter: Huaxiang Sun
> Assignee: Huaxiang Sun
> Priority: Major
>
> We found that on one of our clusters with read replicas enabled, the balancer
> always moves lots of replica regions. Going through the balancer's cost
> functions, we found that data locality and StoreFileCost have the same
> multiplier for both primary and replica regions. That is something we can
> improve. Data locality for replica regions should not be a dominant factor for
> the balancer. We can either remove it from the balancer's picture for now or
> give it a small multiplier.
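The "small multiplier" alternative can be sketched the same way. This sketch assumes, as a simplification, that the balancer combines cost terms as multiplier × cost. Primary and replica locality are scored separately here and weighted independently. `SplitLocalityCost`, its fields and its constructor are hypothetical names for illustration only, not HBase configuration keys or classes.

```java
import java.util.List;

/**
 * Hypothetical sketch: total = primaryMultiplier * primaryCost + replicaMultiplier * replicaCost,
 * where each cost is the average missing locality (1 - locality) of that role.
 */
final class SplitLocalityCost {

  /** Hypothetical placement: replica id 0 is the primary (HBase convention). */
  static final class Placement {
    final int replicaId;
    final double locality;

    Placement(int replicaId, double locality) {
      this.replicaId = replicaId;
      this.locality = locality;
    }
  }

  private final double primaryMultiplier;
  private final double replicaMultiplier;

  SplitLocalityCost(double primaryMultiplier, double replicaMultiplier) {
    if (primaryMultiplier < 0 || replicaMultiplier < 0) {
      throw new IllegalArgumentException("multipliers must be non-negative");
    }
    this.primaryMultiplier = primaryMultiplier;
    this.replicaMultiplier = replicaMultiplier;
  }

  /** Average missing locality over placements of one role (primaries or replicas). */
  private static double averageMissing(List<Placement> placements, boolean primaries) {
    double missing = 0.0;
    int count = 0;
    for (Placement p : placements) {
      if ((p.replicaId == 0) != primaries) {
        continue; // not the role being scored
      }
      missing += 1.0 - p.locality;
      count++;
    }
    return count == 0 ? 0.0 : missing / count;
  }

  /** A replica multiplier of 0 is equivalent to dropping replica locality entirely. */
  double cost(List<Placement> placements) {
    return primaryMultiplier * averageMissing(placements, true)
        + replicaMultiplier * averageMissing(placements, false);
  }
}
```

Consider primaries at locality 0.5 and 1.0 and a replica at 0.0. Multipliers (1.0, 0.0) give 0.25, which is the "remove it" option. Multipliers (1.0, 0.1) give 0.35. Equal multipliers (1.0, 1.0), as described in the issue, give 1.25, and the replica term dominates.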
--
This message was sent by Atlassian Jira
(v8.3.4#803005)