[jira] [Updated] (HBASE-26304) Reflect out-of-band locality improvements in served requests

2021-12-03 Thread Duo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-26304:
--
Fix Version/s: 2.5.0, 3.0.0-alpha-2

> Reflect out-of-band locality improvements in served requests
> 
>
> Key: HBASE-26304
> URL: https://issues.apache.org/jira/browse/HBASE-26304
> Project: HBase
> Issue Type: Sub-task
> Reporter: Bryan Beaudreault
> Assignee: Bryan Beaudreault
> Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2
>
>
> Edit: Description updated to avoid needing to read the full investigation 
> laid out in the comments.
> Once the LocalityHealer has improved locality of a StoreFile (by moving
> blocks onto the correct host), the Reader's DFSInputStream and the Region's
> localityIndex metric must be refreshed. Without refreshing the
> DFSInputStream, the improved locality will not improve latencies. In fact,
> the DFSInputStream may try to fetch blocks that have moved, resulting in a
> ReplicaNotFoundException. This is automatically retried, but the retry will
> temporarily increase long-tail latencies by an amount that depends on the
> configured backoff strategy.
> In the original LocalityHealer design, I created a new
> RefreshHDFSBlockDistribution RPC on the RegionServer. This RPC accepts a list
> of region names and, for each region's stores, re-opens the underlying
> StoreFiles whose locality has changed. This implementation was complicated,
> both in integrating callbacks into the HDFS Dispatcher and in safely
> re-opening StoreFiles without impacting reads or caches.
> While porting the LocalityHealer to the Apache projects, I'm taking a
> different approach:
>  * The part of the LocalityHealer that moves blocks will be contributed to
> the HDFS project.
>  * As such, the DFSClient should be able to recover more gracefully from
> block moves.
>  * Additionally, HBase has some caches of block locations for locality
> reporting and the balancer. Those need to be kept up to date.
> The DFSClient improvements are covered in HDFS-16261 and HDFS-16262. As a
> result, this issue is now about updating HBase's block location caches.
> I considered a few different approaches, but the most elegant one I could
> come up with was to tie the HDFSBlockDistribution metrics directly to the
> underlying DFSInputStream of each StoreFile's initialReader. That way, our
> locality metrics represent exactly the block allocations that our reads go
> through. This also means that our locality metrics will naturally adjust as
> the DFSInputStream adjusts to block moves.
> Once we have accurate locality metrics on the RegionServer, the Balancer's
> cache can easily be invalidated via our usual heartbeat methods.
> RegionServers periodically report to the HMaster, which keeps a
> ClusterMetrics object up to date. Right before each balancer invocation, the
> balancer is updated with the latest ClusterMetrics. At that point, we compare
> the old ClusterMetrics to the new, and invalidate the caches for any regions
> whose locality has changed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HBASE-26304) Reflect out-of-band locality improvements in served requests

2021-11-09 Thread Bryan Beaudreault (Jira)



Bryan Beaudreault updated HBASE-26304:
--
Description: 
Edit: Description updated to avoid needing to read the full investigation laid 
out in the comments.

Once the LocalityHealer has improved locality of a StoreFile (by moving blocks
onto the correct host), the Reader's DFSInputStream and the Region's
localityIndex metric must be refreshed. Without refreshing the DFSInputStream,
the improved locality will not improve latencies. In fact, the DFSInputStream
may try to fetch blocks that have moved, resulting in a ReplicaNotFoundException.
This is automatically retried, but the retry will temporarily increase
long-tail latencies by an amount that depends on the configured backoff
strategy.
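To illustrate why a moved block shows up as tail latency, here is a generic exponential-backoff retry loop in Java. It is a hypothetical sketch: `MovedBlockRetry`, `BlockRead`, and `readWithRetry` are invented names, any `IOException` (such as a ReplicaNotFoundException) is treated as retryable, and this is not the DFSClient's actual retry logic.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

final class MovedBlockRetry {

  /** Hypothetical stand-in for a block read that fails if the cached replica has moved. */
  interface BlockRead {
    byte[] read() throws IOException;
  }

  /**
   * Runs {@code op}, retrying on IOException up to {@code maxAttempts} times in total and
   * sleeping baseDelayMs * 2^(attempt - 1) between attempts. Every sleep adds directly to
   * the latency of the request being served.
   */
  static byte[] readWithRetry(BlockRead op, int maxAttempts, long baseDelayMs)
      throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.read();
      } catch (IOException e) {
        last = e;
        if (attempt == maxAttempts) {
          break; // out of attempts: surface the last failure
        }
        try {
          Thread.sleep(baseDelayMs << (attempt - 1)); // exponential backoff
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new InterruptedIOException("interrupted during backoff");
        }
      }
    }
    throw last;
  }
}
```

With a 1 ms base delay, a read that fails twice before succeeding waits 1 ms + 2 ms = 3 ms extra, and every additional moved block a request touches pays that cost again.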

In the original LocalityHealer design, I created a new
RefreshHDFSBlockDistribution RPC on the RegionServer. This RPC accepts a list
of region names and, for each region's stores, re-opens the underlying
StoreFiles whose locality has changed. This implementation was complicated,
both in integrating callbacks into the HDFS Dispatcher and in safely
re-opening StoreFiles without impacting reads or caches.
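The behavior of the original RefreshHDFSBlockDistribution RPC, as described above, can be sketched roughly as follows. Everything here is hypothetical: `StoreFileHandle`, its methods, and `refresh` are illustrative names, not HBase APIs, and the hard part (safely re-opening a StoreFile without disturbing in-flight reads or caches) is hidden behind `reopen()`.

```java
import java.util.List;
import java.util.Map;

final class RefreshBlockDistributionSketch {

  /** Hypothetical view of an open StoreFile reader. */
  interface StoreFileHandle {
    /** Locality index recorded when the reader was opened. */
    float localityAtOpen();

    /** Locality index computed from freshly fetched block locations. */
    float currentLocality();

    /** Re-opens the underlying reader; the hard part in practice, not modeled here. */
    void reopen();
  }

  /**
   * For each requested region, re-opens every StoreFile whose locality has changed since it
   * was opened. Unknown region names are skipped. Returns the number of re-opened files.
   */
  static int refresh(List<String> regionNames,
      Map<String, List<StoreFileHandle>> storeFilesByRegion) {
    int reopened = 0;
    for (String region : regionNames) {
      for (StoreFileHandle sf : storeFilesByRegion.getOrDefault(region, List.of())) {
        if (Float.compare(sf.currentLocality(), sf.localityAtOpen()) != 0) {
          sf.reopen();
          reopened++;
        }
      }
    }
    return reopened;
  }
}
```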

While porting the LocalityHealer to the Apache projects, I'm taking a
different approach:
 * The part of the LocalityHealer that moves blocks will be contributed to the
HDFS project.
 * As such, the DFSClient should be able to recover more gracefully from block
moves.
 * Additionally, HBase has some caches of block locations for locality
reporting and the balancer. Those need to be kept up to date.

The DFSClient improvements are covered in HDFS-16261 and HDFS-16262. As a
result, this issue is now about updating HBase's block location caches.

I considered a few different approaches, but the most elegant one I could come
up with was to tie the HDFSBlockDistribution metrics directly to the underlying
DFSInputStream of each StoreFile's initialReader. That way, our locality
metrics represent exactly the block allocations that our reads go through.
This also means that our locality metrics will naturally adjust as the
DFSInputStream adjusts to block moves.
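A minimal sketch of deriving locality from the same block locations the reads use. This assumes a simplified model in which the locality index for a host is the fraction of a StoreFile's bytes that have a replica on that host; `BlockInfo`, `StoreFileLocality`, and the `Supplier` standing in for the initialReader's DFSInputStream are hypothetical, and this does not use HBase's actual HDFSBlockDistribution class.

```java
import java.util.List;
import java.util.function.Supplier;

/** One HDFS block: length in bytes and the hosts holding a replica (hypothetical model). */
final class BlockInfo {
  final long length;
  final List<String> hosts;

  BlockInfo(long length, List<String> hosts) {
    this.length = length;
    this.hosts = List.copyOf(hosts);
  }
}

/**
 * Hypothetical sketch: locality is computed on demand from the block locations the reader's
 * input stream currently holds, rather than from a separately cached distribution.
 */
final class StoreFileLocality {
  private final Supplier<List<BlockInfo>> currentBlocks;

  StoreFileLocality(Supplier<List<BlockInfo>> currentBlocks) {
    this.currentBlocks = currentBlocks;
  }

  /** Fraction of the file's bytes that have a replica on {@code host}; 0.0 if empty. */
  double localityIndex(String host) {
    long total = 0;
    long local = 0;
    for (BlockInfo block : currentBlocks.get()) {
      total += block.length;
      if (block.hosts.contains(host)) {
        local += block.length;
      }
    }
    return total == 0 ? 0.0 : (double) local / total;
  }
}
```

Because nothing is cached on the HBase side in this sketch, the metric changes as soon as the supplier's block locations do; there is no separate refresh step to keep in sync.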

Once we have accurate locality metrics on the RegionServer, the Balancer's
cache can easily be invalidated via our usual heartbeat methods. RegionServers
periodically report to the HMaster, which keeps a ClusterMetrics object up to
date. Right before each balancer invocation, the balancer is updated with the
latest ClusterMetrics. At that point, we compare the old ClusterMetrics to the
new, and invalidate the caches for any regions whose locality has changed.
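The comparison step might look like the following sketch, assuming per-region locality values have already been extracted from the old and new ClusterMetrics into maps keyed by region name. `LocalityCacheInvalidation` and `regionsToInvalidate` are hypothetical names, not HBase's balancer code.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

final class LocalityCacheInvalidation {

  /**
   * Returns the regions in {@code newLocality} whose locality differs from
   * {@code oldLocality}, including regions that were not reported before.
   * Regions missing from the new snapshot are ignored here.
   */
  static Set<String> regionsToInvalidate(Map<String, Float> oldLocality,
      Map<String, Float> newLocality) {
    Set<String> changed = new HashSet<>();
    for (Map.Entry<String, Float> e : newLocality.entrySet()) {
      Float previous = oldLocality.get(e.getKey());
      if (!Objects.equals(previous, e.getValue())) {
        changed.add(e.getKey());
      }
    }
    return changed;
  }
}
```

Exact equality invalidates a region on any locality change; a tolerance could reduce cache churn at the cost of slightly staler balancer data.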

  was:
Once the LocalityHealer has improved locality of a StoreFile (by moving blocks
onto the correct host), the Reader's DFSInputStream and the Region's
localityIndex metric must be refreshed. Without refreshing the DFSInputStream,
the improved locality will not improve latencies. In fact, the DFSInputStream
may try to fetch blocks that have moved, resulting in a ReplicaNotFoundException.
This is automatically retried, but the retry will temporarily increase
long-tail latencies by an amount that depends on the configured backoff
strategy.

In the original LocalityHealer design, I created a new
RefreshHDFSBlockDistribution RPC on the RegionServer. This RPC accepts a list
of region names and, for each region's stores, re-opens the underlying
StoreFiles whose locality has changed. This implementation was complicated,
both in integrating callbacks into the HDFS Dispatcher and in safely
re-opening StoreFiles without impacting reads or caches.

While porting the LocalityHealer to the Apache projects, I'm taking a
different approach:
 * The part of the LocalityHealer that moves blocks will be contributed to the
HDFS project.
 * As such, the DFSClient should be able to recover more gracefully from block
moves.
 * Additionally, HBase has some caches of block locations for locality
reporting and the balancer. Those need to be kept up to date.

The DFSClient improvements are covered in
https://issues.apache.org/jira/browse/HDFS-16261. As a result, this issue is
now about updating HBase's block location caches.

I considered a few different approaches, but the most elegant one I could come
up with was to tie the HDFSBlockDistribution metrics directly to the underlying
DFSInputStream of each StoreFile's initialReader. That way, our locality
metrics represent exactly the block allocations that our reads go through.
This also means that our locality metrics will naturally adjust as the
DFSInputStream adjusts to block moves.

Once we have accurate locality metrics on the RegionServer, the Balancer's
cache can easily be invalidated via our usual heartbeat methods. RegionServers
periodically report to the HMaster, which keeps a ClusterMetrics object up to
date. Right before each balancer invocation, the balancer is

[jira] [Updated] (HBASE-26304) Reflect out-of-band locality improvements in served requests

2021-10-28 Thread Bryan Beaudreault (Jira)



Bryan Beaudreault updated HBASE-26304:
--
Description: 
Once the LocalityHealer has improved locality of a StoreFile (by moving blocks
onto the correct host), the Reader's DFSInputStream and the Region's
localityIndex metric must be refreshed. Without refreshing the DFSInputStream,
the improved locality will not improve latencies. In fact, the DFSInputStream
may try to fetch blocks that have moved, resulting in a ReplicaNotFoundException.
This is automatically retried, but the retry will temporarily increase
long-tail latencies by an amount that depends on the configured backoff
strategy.

In the original LocalityHealer design, I created a new
RefreshHDFSBlockDistribution RPC on the RegionServer. This RPC accepts a list
of region names and, for each region's stores, re-opens the underlying
StoreFiles whose locality has changed. This implementation was complicated,
both in integrating callbacks into the HDFS Dispatcher and in safely
re-opening StoreFiles without impacting reads or caches.

While porting the LocalityHealer to the Apache projects, I'm taking a
different approach:
 * The part of the LocalityHealer that moves blocks will be contributed to the
HDFS project.
 * As such, the DFSClient should be able to recover more gracefully from block
moves.
 * Additionally, HBase has some caches of block locations for locality
reporting and the balancer. Those need to be kept up to date.

The DFSClient improvements are covered in
https://issues.apache.org/jira/browse/HDFS-16261. As a result, this issue is
now about updating HBase's block location caches.

I considered a few different approaches, but the most elegant one I could come
up with was to tie the HDFSBlockDistribution metrics directly to the underlying
DFSInputStream of each StoreFile's initialReader. That way, our locality
metrics represent exactly the block allocations that our reads go through.
This also means that our locality metrics will naturally adjust as the
DFSInputStream adjusts to block moves.

Once we have accurate locality metrics on the RegionServer, the Balancer's
cache can easily be invalidated via our usual heartbeat methods. RegionServers
periodically report to the HMaster, which keeps a ClusterMetrics object up to
date. Right before each balancer invocation, the balancer is updated with the
latest ClusterMetrics. At that point, we compare the old ClusterMetrics to the
new, and invalidate the caches for any regions whose locality has changed.

  was:
Once the LocalityHealer has improved locality of a StoreFile (by moving blocks
onto the correct host), the Reader's DFSInputStream and the Region's
localityIndex metric must be refreshed. Without refreshing the DFSInputStream,
the improved locality will not improve latencies. In fact, the DFSInputStream
may try to fetch blocks that have moved, resulting in a ReplicaNotFoundException.
This is automatically retried, but the retry will temporarily increase
long-tail latencies by an amount that depends on the configured backoff
strategy.

In the original LocalityHealer design, I created a new
RefreshHDFSBlockDistribution RPC on the RegionServer. This RPC accepts a list
of region names and, for each region's stores, re-opens the underlying
StoreFiles whose locality has changed. This implementation was complicated,
both in integrating callbacks into the HDFS Dispatcher and in safely
re-opening StoreFiles without impacting reads or caches.

While porting the LocalityHealer, I'm taking a different approach:
 * The part of the LocalityHealer that moves blocks will be contributed to the
HDFS project.
 * As such, the DFSClient should be able to recover more gracefully from block
moves.
 * Additionally, HBase has some caches of block locations for locality
reporting and the balancer. Those need to be kept up to date.

I will submit a PR with that implementation, but I am also investigating other
avenues. For example, I noticed
https://issues.apache.org/jira/browse/HDFS-15119, which doesn't seem ideal but
could perhaps be improved to handle block moves automatically at a lower level.



[jira] [Updated] (HBASE-26304) Reflect out-of-band locality improvements in served requests

2021-10-28 Thread Bryan Beaudreault (Jira)



Bryan Beaudreault updated HBASE-26304:
--
Description: 
Once the LocalityHealer has improved locality of a StoreFile (by moving blocks
onto the correct host), the Reader's DFSInputStream and the Region's
localityIndex metric must be refreshed. Without refreshing the DFSInputStream,
the improved locality will not improve latencies. In fact, the DFSInputStream
may try to fetch blocks that have moved, resulting in a ReplicaNotFoundException.
This is automatically retried, but the retry will temporarily increase
long-tail latencies by an amount that depends on the configured backoff
strategy.

In the original LocalityHealer design, I created a new
RefreshHDFSBlockDistribution RPC on the RegionServer. This RPC accepts a list
of region names and, for each region's stores, re-opens the underlying
StoreFiles whose locality has changed. This implementation was complicated,
both in integrating callbacks into the HDFS Dispatcher and in safely
re-opening StoreFiles without impacting reads or caches.

While porting the LocalityHealer, I'm taking a different approach:
 * The part of the LocalityHealer that moves blocks will be contributed to the
HDFS project.
 * As such, the DFSClient should be able to recover more gracefully from block
moves.
 * Additionally, HBase has some caches of block locations for locality
reporting and the balancer. Those need to be kept up to date.

I will submit a PR with that implementation, but I am also investigating other
avenues. For example, I noticed
https://issues.apache.org/jira/browse/HDFS-15119, which doesn't seem ideal but
could perhaps be improved to handle block moves automatically at a lower level.

  was:
Once the LocalityHealer has improved locality of a StoreFile (by moving blocks
onto the correct host), the Reader's DFSInputStream and the Region's
localityIndex metric must be refreshed. Without refreshing the DFSInputStream,
the improved locality will not improve latencies. In fact, the DFSInputStream
may try to fetch blocks that have moved, resulting in a ReplicaNotFoundException.
This is automatically retried, but the retry will increase long-tail latencies
by an amount that depends on the configured backoff strategy.

See https://issues.apache.org/jira/browse/HDFS-16155 for an improvement in
backoff strategy which can greatly mitigate the latency impact of the
missing-block retry.

Even with that mitigation, a StoreFile is often made up of many blocks, so
without some sort of intervention we will continue to hit
ReplicaNotFoundException over time as clients naturally request data from
moved blocks.

In the original LocalityHealer design, I created a new
RefreshHDFSBlockDistribution RPC on the RegionServer. This RPC accepts a list
of region names and, for each region's stores, re-opens the underlying
StoreFiles whose locality has changed.

I will submit a PR with that implementation, but I am also investigating other
avenues. For example, I noticed
https://issues.apache.org/jira/browse/HDFS-15119, which doesn't seem ideal but
could perhaps be improved to handle block moves automatically at a lower level.

