[jira] [Commented] (HBASE-24454) BucketCache disabled instantly before error duration toleration is reached due to timing issue
[ https://issues.apache.org/jira/browse/HBASE-24454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119059#comment-17119059 ] Jacob LeBlanc commented on HBASE-24454:
---
Sure, I'll contribute a patch. I was thinking the same thing about saving to a local variable so we avoid the double read. ioErrorStartTime is already volatile, so reads and writes are atomic. There is no need to add locking, which is good given how frequently the success case (writing -1) is executed.
> BucketCache disabled instantly before error duration toleration is reached
> due to timing issue
> --
>
> Key: HBASE-24454
> URL: https://issues.apache.org/jira/browse/HBASE-24454
> Project: HBase
> Issue Type: Bug
> Components: BucketCache
> Affects Versions: 1.4.10
> Environment: We saw this in HBase 1.4.10 (EMR 5.28.1), but I believe
> the newest code still has this problem.
> Reporter: Jacob LeBlanc
> Priority: Major
>
> We saw an instance where BucketCache was disabled after only two IO errors
> were encountered at nearly the same time. The following shows all errors from
> a region server log for the 2020-05-26 17:00 hour.
Notice that there are no
> other errors until 17:14:50, at which time the BucketCache is disabled
> because it claims duration time has exceeded 6 ms:
> $ grep ERROR hbase-hbase-regionserver-ip-172-20-113-147.log.2020-05-26-17
> 2020-05-26 17:14:50,396 ERROR [hfile-prefetch-1589117924884]
> bucket.BucketCache: Failed reading block
> 9adfad3a603047cfa0c98b16da0b0974_13786 from bucket cache
> 2020-05-26 17:14:50,397 ERROR
> [regionserver/ip-172-20-113-147.us-west-2.compute.internal/172.20.113.147:16020-BucketCacheWriter-0]
> bucket.BucketCache: Failed syncing IO engine
> 2020-05-26 17:14:50,400 ERROR [hfile-prefetch-1589117924884]
> bucket.BucketCache: Failed reading block
> 9adfad3a603047cfa0c98b16da0b0974_13786 from bucket cache
> 2020-05-26 17:14:50,401 ERROR [hfile-prefetch-1589117924884]
> bucket.BucketCache: IO errors duration time has exceeded 6ms, disabling
> cache, please check your IOEngine
> The region server is very busy, so it should be constantly getting successful
> reads and writes in the bucket cache (it is not as though there was some
> long-ago error and then no successful IO to clear the error flag).
> We are running a busy EMR cluster backed by S3. A BucketCache getting
> disabled is a huge performance issue because all reads must go to S3.
> Looking at the code, I believe I've found a timing issue. Here is the code
> for checking and setting ioErrorStartTime (taken from BucketCache.java):
>
> {code:java}
> /**
>  * Check whether we tolerate IO error this time. If the duration of IOEngine
>  * throwing errors exceeds ioErrorsDurationTimeTolerated, we will disable the
>  * cache
>  */
> private void checkIOErrorIsTolerated() {
>   long now = EnvironmentEdgeManager.currentTime();
>   if (this.ioErrorStartTime > 0) {
>     if (cacheEnabled && (now - ioErrorStartTime) > this.ioErrorsTolerationDuration) {
>       LOG.error("IO errors duration time has exceeded " + ioErrorsTolerationDuration +
>           "ms, disabling cache, please check your IOEngine");
>       disableCache();
>     }
>   } else {
>     this.ioErrorStartTime = now;
>   }
> }
> {code}
>
> And here is the code for clearing ioErrorStartTime when a successful read
> or write is done:
> {code:java}
> if (this.ioErrorStartTime > 0) {
>   ioErrorStartTime = -1;
> }
> {code}
> Notice that if ioErrorStartTime is set to -1 after the first if statement
> in checkIOErrorIsTolerated but before (now - ioErrorStartTime) is computed,
> then (now - ioErrorStartTime) will evaluate to (now + 1), resulting in the
> code thinking it has exceeded ioErrorsTolerationDuration. -- This message was sent by Atlassian Jira (v8.3.4#803005)
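The local-variable fix discussed in this thread can be sketched as a minimal standalone model. This is not the actual HBase patch: the class shape, method signatures, and the idea of passing `now` in explicitly are simplifications for illustration; only the single-read-of-the-volatile pattern reflects the fix being discussed.

```java
// Hedged sketch of the race-free check: copy the volatile ioErrorStartTime
// into a local variable before comparing, so a concurrent reset to -1 between
// two reads cannot make (now - ioErrorStartTime) evaluate to (now + 1).
class IoErrorToleration {
    private volatile long ioErrorStartTime = -1;
    private final long ioErrorsTolerationDuration;
    private final boolean cacheEnabled = true;

    IoErrorToleration(long tolerationMs) {
        this.ioErrorsTolerationDuration = tolerationMs;
    }

    // Called on each IO error; returns true when the caller should disable
    // the cache. "now" is a parameter here only to make the sketch testable.
    boolean checkIOErrorIsTolerated(long now) {
        long start = this.ioErrorStartTime;  // single read of the volatile
        if (start > 0) {
            if (cacheEnabled && (now - start) > ioErrorsTolerationDuration) {
                return true;  // real code would call disableCache()
            }
        } else {
            this.ioErrorStartTime = now;  // first error: record start time
        }
        return false;
    }

    // Called after every successful read or write to clear the error window.
    void clearError() {
        if (this.ioErrorStartTime > 0) {
            ioErrorStartTime = -1;
        }
    }
}
```

Because `start` is read exactly once, a success on another thread writing -1 mid-check can at worst make this invocation take the `else` branch on the next call; it can never produce a huge bogus duration.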
[jira] [Assigned] (HBASE-24454) BucketCache disabled instantly before error duration toleration is reached due to timing issue
[ https://issues.apache.org/jira/browse/HBASE-24454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacob LeBlanc reassigned HBASE-24454:
--
Assignee: Jacob LeBlanc
> BucketCache disabled instantly before error duration toleration is reached
> due to timing issue
> --
>
> Key: HBASE-24454
> URL: https://issues.apache.org/jira/browse/HBASE-24454
> Project: HBase
> Issue Type: Bug
> Components: BucketCache
> Affects Versions: 1.4.10
> Environment: We saw this in HBase 1.4.10 (EMR 5.28.1), but I believe
> the newest code still has this problem.
> Reporter: Jacob LeBlanc
> Assignee: Jacob LeBlanc
> Priority: Major
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-24454) BucketCache disabled instantly before error duration toleration is reached due to timing issue
[ https://issues.apache.org/jira/browse/HBASE-24454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacob LeBlanc updated HBASE-24454:
--
Description: (edited to wrap the code excerpts in {code:java} blocks; the text is otherwise unchanged from the description quoted in the first message above)
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24454) BucketCache disabled instantly IO errors due to timing issue before error duration toleration is reached
Jacob LeBlanc created HBASE-24454:
--
Summary: BucketCache disabled instantly IO errors due to timing issue before error duration toleration is reached
Key: HBASE-24454
URL: https://issues.apache.org/jira/browse/HBASE-24454
Project: HBase
Issue Type: Bug
Components: BucketCache
Affects Versions: 1.4.10
Environment: We saw this in HBase 1.4.10 (EMR 5.28.1), but I believe the newest code still has this problem.
Reporter: Jacob LeBlanc
(The full description is the same as the one quoted in the first message above.)
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-24454) BucketCache disabled instantly before error duration toleration is reached due to timing issue
[ https://issues.apache.org/jira/browse/HBASE-24454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacob LeBlanc updated HBASE-24454:
--
Summary: BucketCache disabled instantly before error duration toleration is reached due to timing issue (was: BucketCache disabled instantly IO errors due to timing issue before error duration toleration is reached)
> BucketCache disabled instantly before error duration toleration is reached
> due to timing issue
> --
>
> Key: HBASE-24454
> URL: https://issues.apache.org/jira/browse/HBASE-24454
> Project: HBase
> Issue Type: Bug
> Components: BucketCache
> Affects Versions: 1.4.10
> Reporter: Jacob LeBlanc
> Priority: Major
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997827#comment-16997827 ] Jacob LeBlanc commented on HBASE-23066:
---
[~ram_krish] thanks so much for making the code changes for this! Not to be a pain, but any chance of getting this merged into branch-1.4 or whatever the next "stable" version will be (not sure if there will be a 1.4.13)? So far at least Amazon chooses the version marked "stable" ([https://hbase.apache.org/downloads.html]) to include in new EMR releases and, since the main use case for this is cloud-based, it would be great to see it as part of EMR. Thanks!
> Allow cache on write during compactions when prefetching is enabled
> --
>
> Key: HBASE-23066
> URL: https://issues.apache.org/jira/browse/HBASE-23066
> Project: HBase
> Issue Type: Improvement
> Components: Compaction, regionserver
> Affects Versions: 1.4.10
> Reporter: Jacob LeBlanc
> Assignee: Jacob LeBlanc
> Priority: Minor
> Fix For: 2.3.0, 1.6.0
>
> Attachments: HBASE-23066.patch, performance_results.png,
> prefetchCompactedBlocksOnWrite.patch
>
> In cases where users care a lot about read performance for tables that are
> small enough to fit into a cache (or the cache is large enough),
> prefetchOnOpen can be enabled to make the entire table available in cache
> after the initial region opening is completed. Any new data can also be
> guaranteed to be in cache with the cacheBlocksOnWrite setting.
> However, the missing piece is when all blocks are evicted after a compaction.
> We found very poor performance after compactions for tables under heavy read
> load and a slower backing filesystem (S3). After a compaction the prefetching
> threads need to compete with threads servicing read requests and get
> constantly blocked as a result.
> This is a proposal to introduce a new cache configuration option that would
> cache blocks on write during compaction for any column family that has
> prefetch enabled.
This would virtually guarantee all blocks are kept in cache > after the initial prefetch on open is completed allowing for guaranteed > steady read performance despite a slow backing file system. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988783#comment-16988783 ] Jacob LeBlanc commented on HBASE-23066:
---
{quote}There is global config. The global config value can be overridden at a CF level via ColumnFamilyDescriptorBuilder#setConfiguration(final String key, final String value) Using the same global config name as the Key BTW - confirmed seeing the code that it is easy to trigger this per table/family also. Even using shell or the java client.{quote}
Ah, OK. I wasn't aware there was a free-form way to set configuration at the CF level, as I saw all of the explicit properties listed there. Thanks.
{quote}No. Once compaction is done, the old compacted away files get closed with evictBlocks = true always. It wont honor this conf then. The conf is more used while closing of region.{quote}
OK, I see now in HStore.removeCompactedFiles that evictBlocks is set explicitly to true every time. Thanks for the clarification on that. So with caching compacted blocks on write there will be a temporary rise in cache size until the compacted files are evicted after compaction is done. Got it.
{quote}Can you just give some rough numbers on you cache size and the number of blocks that you always see in your cache? Is there a sporadic raise in your block count and if so by how much and hope your cache size is good enough to have them.{quote}
We have two production environments, with our largest currently hosting 25.4 TB of data. About 14 TB of that we are trying to keep "hot" in the cache for quick reading, with about 4 TB kept "IN_MEMORY" since we want fast Phoenix searching on that data and care more about keeping it in cache. We currently run 48 region servers with a 1 TB on-disk bucket cache each, but that's probably overprovisioned, as we've scaled up due to some recent issues. We can probably shoot for around 24 RS, which would mean about 600 GB of data we are trying to keep in each bucket cache. With the number of regions we have, we have plenty of spare space in bucket cache for handling many large compactions happening at once with this setting. In reality our used space for the cache will be a bit higher because other reading is still done beyond the 14 TB we prefetch, but it's OK if LRU blocks get evicted. I haven't done a deep analysis to see if there are sporadic spikes in cache size during compactions. 1 TB bucket caches may seem large, but with the way costs are in AWS, storage is relatively cheap compared to compute resources, so it's more cost effective for us to host fewer region servers with more data and large caches rather than smaller caches where we would need more region servers to maintain performance (our compute nodes are probably 5x more expensive than the 1 TB of storage per month).
{quote}If you are fine with the latest PR - I can just merge them and work on the other sub task to make this configuration based on a size so that all the older files' blocks are not cached.{quote}
I am fine with it. Thanks for the attention on this!
> Allow cache on write during compactions when prefetching is enabled
> --
>
> Key: HBASE-23066
> URL: https://issues.apache.org/jira/browse/HBASE-23066
> Project: HBase
> Issue Type: Improvement
> Components: Compaction, regionserver
> Affects Versions: 1.4.10
> Reporter: Jacob LeBlanc
> Assignee: Jacob LeBlanc
> Priority: Minor
> Fix For: 2.3.0, 1.6.0
-- This message was sent by Atlassian Jira (v8.3.4#803005)
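The free-form CF-level override mentioned in the comments above (`ColumnFamilyDescriptorBuilder#setConfiguration`) can be sketched in Java against the HBase 2.x client API. This is a hedged sketch, not a definitive recipe: the config key `hbase.rs.cachecompactedblocksonwrite` is assumed from this feature's PR discussion and should be verified against your HBase version, and the table and family names are hypothetical.

```java
// Hedged sketch: override the global cache-compacted-blocks-on-write flag for
// a single column family, alongside prefetch-on-open, leaving the global
// default untouched. Requires the hbase-client dependency on the classpath.
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class CacheOnCompactionExample {
    public static void main(String[] args) {
        ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder
            .newBuilder(Bytes.toBytes("cf1"))          // hypothetical family
            .setPrefetchBlocksOnOpen(true)              // keep the CF hot in cache
            // Free-form per-CF config override, same key name as the global one
            // (key assumed from the PR discussion above; verify for your version):
            .setConfiguration("hbase.rs.cachecompactedblocksonwrite", "true")
            .build();
        TableDescriptor table = TableDescriptorBuilder
            .newBuilder(TableName.valueOf("my_table"))  // hypothetical table
            .setColumnFamily(cf)
            .build();
        // Pass "table" to Admin#createTable or Admin#modifyTable as usual.
    }
}
```

This mirrors the design settled on in the thread: a global config whose value can be overridden per table or column family rather than a new explicit `ColumnFamilyDescriptor` attribute.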
[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988077#comment-16988077 ] Jacob LeBlanc commented on HBASE-23066:
---
{quote}The latest patch what Ram published yesterday is having this aspect also. We were discussing it offline. The allow cache on write for compacted files should be a global config as well as one can override it at Table or CF levels. On HTD or HCD level, user can do setConfigure with same config name and override the global value.{quote}
OK, that sounds great! I did check that PR and saw only the global setting, which is why I suggested that, for my organization's use case at least, we would need a table- or CF-specific setting. Let me know if there's anything else needed on my end.
> Allow cache on write during compactions when prefetching is enabled
> --
>
> Key: HBASE-23066
> URL: https://issues.apache.org/jira/browse/HBASE-23066
> Project: HBase
> Issue Type: Improvement
> Components: Compaction, regionserver
> Affects Versions: 1.4.10
> Reporter: Jacob LeBlanc
> Assignee: Jacob LeBlanc
> Priority: Minor
> Fix For: 2.3.0, 1.6.0
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986152#comment-16986152 ] Jacob LeBlanc commented on HBASE-23066:
---
Apologies for the delayed response on my part. Just as an FYI, we manually deployed this patch (my original version) to our production environment and have been running it with good results for about a month and a half.
{quote}With this new config, what we do is write to cache along with the HFile create itself. Blocks are added to cache as and when it is written to the HFile. So its aggressive. Ya it helps to make the new File data available from time 0 itself. The concern is this in a way demands 2x cache size. Because the compacting files data might be already there in the cache. While the new file write, those old files are still valid. The new one is not even committed by the RS.{quote}
Without evictblocksonclose enabled (it looks like it is disabled by default), wouldn't the old file data still be in cache even after compaction is finished? Granted, once compaction is done it will no longer be accessed, will age out, and will be evicted if necessary, but the same amount of data is read into the cache both with and without this new setting, is it not? When prefetching is enabled, the only difference is that the caching of the new file is done a little bit earlier; other than that, it seems the caching requirements are the same. I'm not sure I understand why 2x cache size is needed - perhaps I am missing something. Having evictblocksonclose enabled does change things and means you would need 2x cache size compared to normal, since you change the ordering of caching/evicting.
{quote}Also IMHO no need to check that based on whether prefetch is on or not! Make this conf name and doc clear what it is doing and what is the size expectations.{quote}
Coming from the perspective of my organization's requirements, this would not work well for us, as we only want data to be cached on compaction for tables where prefetching is enabled. The clear intention of enabling prefetching on a table is to keep as much data in the read cache as possible to ensure consistently fast reading, but without this configuration there are consistently huge drops in read performance whenever compaction is done, because large parts of the table are essentially dropped from the cache (actually the pre-compaction data is still there unless evictblocksonclose is enabled, but the pre-compaction data belongs to the old file names, which will never be accessed again after compaction is finished, so it's the same as dropping the data). This configuration is meant to mitigate that effect to better achieve the read performance sought by prefetching. The intention is *not* just to cache everything that gets compacted. So caching all compacted data on all tables does not meet this requirement and in fact would cause problems if it were used. In our use cases we have several tables where we write and compact a lot but where we don't want to prefetch those tables into our cache. Caching all blocks on compaction would cause big problems where we'd evict data we care about in favor of data we will never or rarely read.
An alternative to having this setting contingent on prefetching would be to add a CACHE_BLOCKS_ON_COMPACTION setting as part of ColumnFamilyDescriptor. Then we could choose to turn it on for the same CFs where we also have prefetching. This seems like a bigger code/documentation change, whereas my original intention with this patch was to keep it small and focused on the only use case I could think of (why else would someone want to cache blocks during compaction except if they were prefetching?). But if a per-column-family setting is preferred, then I could try making those changes. I welcome input from you experts. Thanks!
> Allow cache on write during compactions when prefetching is enabled > --- > > Key: HBASE-23066 > URL: https://issues.apache.org/jira/browse/HBASE-23066 > Project: HBase > Issue Type: Improvement > Components: Compaction, regionserver >Affects Versions: 1.4.10 >Reporter: Jacob LeBlanc >Assignee: Jacob LeBlanc >Priority: Minor > Fix For: 2.3.0, 1.6.0 > > Attachments: HBASE-23066.patch, performance_results.png, > prefetchCompactedBlocksOnWrite.patch > > > In cases where users care a lot about read performance for tables that are > small enough to fit into a cache (or the cache is large enough), > prefetchOnOpen can be enabled to make the entire table available in cache > after the initial region opening is completed. Any new data can also be > guaranteed to be in cache with the cacheBlocksOnWrite setting. > However, the missing piece is when all blocks are evicted after a
[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16948640#comment-16948640 ] Jacob LeBlanc commented on HBASE-23066: --- Not sure how folks are notified of new PRs, but I submitted one for this, including some checkstyle fixes that were in the previous patch. Please review, thank you! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941877#comment-16941877 ] Jacob LeBlanc commented on HBASE-23066: --- [~ram_krish] the attached HBASE-23066.patch is for master. This is my first patch submission, can you please let me know next steps?
[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941076#comment-16941076 ] Jacob LeBlanc commented on HBASE-23066: --- Regarding getting some numbers on whether the size of the data exceeds the cache, I'm not sure what is being asked for. My thinking is that behavior in that regard is not going to be any different. Keep in mind: this setting only applies when prefetching is already enabled for the column family. In other words, we are already going to read the new file entirely into cache. Enabling this setting only does so a little bit earlier, while we are writing the file out, to circumvent a glut of cache misses that kill performance for a period of time after compaction finishes. So if other data would be evicted with this setting enabled, it would be evicted without this patch as well. This is also why I'm not sure a per-table setting such as a warmup threshold is needed. In fact I'd be happy if this were the default setting, as I don't see any negatives, but I'd understand keeping it disabled by default for risk purposes.
[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941073#comment-16941073 ] Jacob LeBlanc commented on HBASE-23066: --- I've run some performance tests to demonstrate the effectiveness of the patch. I have not patched our production cluster yet, as I'm waiting on confirmation from the AWS service team that I won't be overwriting AWS-specific changes in the HStore class, but I've done some sampling on a test cluster.

Basic setup is an EMR cluster running 1.4.9 backed by S3, with ganglia installed to capture the metrics. I have a stress tester executing about 1000 scans per second on a 1.5 GB region. Prefetching is enabled, and I have one region server that is unpatched (or has the new configuration setting disabled) and one region server that is patched and has the new configuration option enabled. I then execute the following test:
1. Move the region to the desired region server (either patched or unpatched).
2. Wait for prefetching to complete and for mean scan times to normalize.
3. Execute a major compaction on the target region.
4. Check the region server UI / logs to see when the compaction completes.
5. Collect data from ganglia.

One issue I identified with my test is that the scans aren't as random as they should be, so I believe data after compaction is getting cached on read more quickly on the unpatched server than it would be if my scans were truly random. I can improve the test, but the results still validate the patch. Baseline mean scan time was about 20 - 60 milliseconds.
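For reference, the region move and major compaction steps can be driven from the hbase shell; a sketch with placeholder arguments (the encoded region name and 'host,port,startcode' server name would come from the region server UI, not values from this test):

```
# Step 1: move the region to the target region server
# (both arguments are placeholders)
move 'ENCODED_REGION_NAME', 'HOST,16020,STARTCODE'
# Step 3: trigger a major compaction once scan times have normalized
major_compact 'REGION_NAME'
```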
After compaction the results were:
Trial 1 (unpatched): mean scan time peaked at over 27000 milliseconds and stayed above 5000 milliseconds for 3 minutes.
Trial 2 (unpatched): mean scan time peaked at over 27000 milliseconds and stayed above 5000 milliseconds for 3.5 minutes.
Trial 3 (patched): mean scan time peaked at 282 milliseconds for one time sample.
Trial 4 (patched): mean scan time peaked at just over 1300 milliseconds and remained above 1000 milliseconds for 30 seconds.
Trial 5 (patched): no noticeable spike in mean scan time.
I've attached a picture of a graph of the results.
[jira] [Updated] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacob LeBlanc updated HBASE-23066: -- Attachment: performance_results.png
[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937109#comment-16937109 ] Jacob LeBlanc commented on HBASE-23066: --- Thanks for taking a look at this! I have not tested on our cluster yet. Our production cluster is the one hitting performance problems after compaction, and given that EMR is a managed service where I'm not exactly sure what is being deployed (does Amazon customize any part of the server jar?), I need to verify that replacing the class files in the jar with my patched versions isn't going to cause any harm before trying the patch on that cluster. I'll try patching a test environment first, and I'll also try to get confirmation through AWS support that I won't be overriding customized changes by replacing a class file, particularly the HStore class. So I'll try to do some testing and get some performance numbers, but it will take me a bit of time.
[jira] [Updated] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacob LeBlanc updated HBASE-23066: -- Attachment: HBASE-23066.patch Status: Patch Available (was: Open)

Added patch for master. After I added to TestCacheOnWrite, I ran into "Too many open files" errors when running it. They would occur when chmod was run on one of the region files, or when a child thread was created, at around the 50th test iteration. Running only the test I added worked fine, as did running all of the tests after I raised my system's open file limit. I'm not sure what threads/open files/processes are getting left around in this test, but my addition seems to be pushing it over the edge. Is this a problem? Is it expected to need a >1024 open file limit when running large and/or integration tests?
[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936321#comment-16936321 ] Jacob LeBlanc commented on HBASE-23066: --- I took a shot at creating a patch based off of branch-1.4, calling the new configuration option "hbase.rs.prefetchcompactedblocksonwrite". Please consider this improvement in an upcoming release.
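For anyone wanting to try it, enabling the option would presumably be a standard hbase-site.xml entry; a sketch based on the option name above (the final name or default could change as the patch is reviewed):

```xml
<!-- Sketch: cache blocks written during compaction for prefetch-enabled CFs -->
<property>
  <name>hbase.rs.prefetchcompactedblocksonwrite</name>
  <value>true</value>
</property>
```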
[jira] [Created] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
Jacob LeBlanc created HBASE-23066:
-
Summary: Allow cache on write during compactions when prefetching is enabled
Key: HBASE-23066
URL: https://issues.apache.org/jira/browse/HBASE-23066
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 1.4.10
Reporter: Jacob LeBlanc
Fix For: 1.4.11
Attachments: prefetchCompactedBlocksOnWrite.patch

In cases where users care a lot about read performance for tables that are small enough to fit into a cache (or the cache is large enough), prefetchOnOpen can be enabled to make the entire table available in cache after the initial region opening is completed. Any new data can also be guaranteed to be in cache with the cacheBlocksOnWrite setting.

However, the missing piece is when all blocks are evicted after a compaction. We found very poor performance after compactions for tables under heavy read load and a slower backing filesystem (S3). After a compaction the prefetching threads need to compete with threads servicing read requests and get constantly blocked as a result.

This is a proposal to introduce a new cache configuration option that would cache blocks on write during compaction for any column family that has prefetch enabled. This would virtually guarantee all blocks are kept in cache after the initial prefetch on open is completed, allowing for guaranteed steady read performance despite a slow backing file system.