[jira] [Commented] (HBASE-26519) StoreFileScanner parallel seek -- productionize or drop?

2021-12-01 Thread Andrew Kyle Purtell (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-26519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451902#comment-17451902
 ] 

Andrew Kyle Purtell commented on HBASE-26519:
-

Unless the potential payoff is significant (yes, this might be hard to guess), 
I would vote for dropping a complex and, IMHO, incomplete disabled-by-default 
'feature' that I would estimate is rarely used, and probably not used at all. 

> StoreFileScanner parallel seek -- productionize or drop?
> 
>
> Key: HBASE-26519
> URL: https://issues.apache.org/jira/browse/HBASE-26519
> Project: HBase
> Issue Type: Improvement
> Reporter: Bryan Beaudreault
> Priority: Minor
>
> hbase.storescanner.parallel.seek.enable was added a few years ago in 
> https://issues.apache.org/jira/browse/HBASE-7495, but still defaults to 
> disabled. The description of that says "Enables StoreFileScanner 
> parallel-seeking in StoreScanner, a feature which can reduce response latency 
> under special conditions".
> It's not very clear what "special conditions" means. Reading through the 
> entire comment history on that issue seems to indicate it can help when you 
> have "high random read, low cache hit rate, many store files". 
> We have a bunch of clusters with this shape, and in fact we use SSDs for all 
> storage so I figured this might help a lot. I tried setting this to true on 
> one RegionServer of one of our highest QPS clusters hoping I'd see some clear 
> improvement. This very simple test was pretty much a wash, so I need to do 
> more methodical testing.
> In the test one thing became clear though – is the default thread pool size 
> of 10 good enough for my use-case? I have no way of knowing, as there is no 
> logging or metrics that I can find around thread pool saturation. What I 
> ended up doing was spamming refresh of the /dump endpoint of the RS, and 
> noticed that there were sometimes 1-5 tasks queued for the RS_PARALLEL_SEEK 
> executor. This indicates maybe I should scale the thread pool, but use-cases 
> change over time so this seems like not a great way to determine that.
> Task queuing seems not great for a feature which is aimed at reducing 
> latencies. I wonder if we should consider some changes to make this easier 
> to deploy in production. Here are some ideas I had:
>  * Can we generate a better default value for the thread pool size, maybe 
> based on number of RS handler threads or some other heuristic?
>  * Should we consider eliminating queuing for this feature? Instead, if the 
> threadpool is saturated run the seek in-line in the current thread (i.e. 
> revert to normal). This would be more similar to how hedged reads work in 
> HDFS.
>  * Can we expose a metric or logging to help operators know when to scale up 
> the thread pool? If we implemented the 2nd option above we could expose a 
> "seeksInCurrentThread" counter to track this, again similar to how hedged 
> reads report on saturation.
> But with all of this said, I wonder if anyone is running this in production 
> and has any updated guidance on when to use this? Does it still make sense 
> given the last 8 years of development in HBase? Would it ever make sense to 
> make it enabled by default?
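
The second and third ideas in the description (no queuing, fall back to the 
current thread, count the fallbacks) can be sketched in plain java.util.concurrent 
terms. This is only an illustration under stated assumptions, not HBase code: 
the class InlineFallbackSeekPool and its methods are hypothetical names, the 
"seeks" are generic Callables rather than real StoreFileScanner seeks, and only 
the counter name "seeksInCurrentThread" comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch (not HBase code): a seek pool that never queues work. */
final class InlineFallbackSeekPool implements AutoCloseable {
  // Counts tasks that ran on the submitting thread because the pool was saturated.
  private final LongAdder seeksInCurrentThread = new LongAdder();
  private final ThreadPoolExecutor pool;

  InlineFallbackSeekPool(int threads) {
    // A SynchronousQueue has no capacity: a task is either handed straight to a
    // worker or rejected. The rejection handler then runs it on the caller's
    // thread and records that it did so.
    this.pool = new ThreadPoolExecutor(
        threads, threads, 60L, TimeUnit.SECONDS,
        new SynchronousQueue<>(),
        (task, executor) -> {
          seeksInCurrentThread.increment();
          task.run();
        });
  }

  /** Runs every task, in the pool when a worker is free, otherwise inline. */
  <T> List<T> runAll(List<Callable<T>> seeks)
      throws InterruptedException, ExecutionException {
    List<FutureTask<T>> tasks = new ArrayList<>();
    for (Callable<T> seek : seeks) {
      FutureTask<T> task = new FutureTask<>(seek);
      pool.execute(task); // may run inline via the rejection handler
      tasks.add(task);
    }
    List<T> results = new ArrayList<>();
    for (FutureTask<T> task : tasks) {
      results.add(task.get()); // rethrows a task failure as ExecutionException
    }
    return results;
  }

  /** Number of tasks that fell back to the submitting thread so far. */
  long seeksInCurrentThread() {
    return seeksInCurrentThread.sum();
  }

  @Override
  public void close() {
    pool.shutdown();
  }
}
```

With this shape a saturated pool degrades to the normal single-threaded seek 
instead of adding queue wait time, and a steadily rising counter is the signal 
that the pool is too small for the workload.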



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (HBASE-26519) StoreFileScanner parallel seek -- productionize or drop?

2021-12-01 Thread Bryan Beaudreault (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-26519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451801#comment-17451801
 ] 

Bryan Beaudreault commented on HBASE-26519:
---

Good call. Done. I'll leave this open for now and relay any decisions or close 
it once discussion has finished.


[jira] [Commented] (HBASE-26519) StoreFileScanner parallel seek -- productionize or drop?

2021-12-01 Thread Reid Chan (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-26519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451691#comment-17451691
 ] 

Reid Chan commented on HBASE-26519:
---

This sounds more like a discussion topic. Would you mind posting it to the 
dev@hbase mailing list?

