[jira] [Updated] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-23066: --- Release Note: The configuration 'hbase.rs.cacheblocksonwrite' enables caching blocks as they are written. Compaction output was deliberately excluded from that setting (caching it can be very aggressive), since the caching happens as and when the writer completes a block. In cloud environments with larger caches, enabling 'hbase.rs.prefetchblocksonopen' (a non-aggressive way of proactively caching blocks on reader creation) does not fully help, because it takes time to cache the compacted blocks. This feature adds a new configuration, 'hbase.rs.cachecompactedblocksonwrite', which when set to 'true' enables caching of the blocks created by compaction. Remember that since this is aggressive caching, the user should have enough cache space; if not, it may lead to other active blocks getting evicted. From the shell this can also be enabled per Column Family, using the format below: {code} create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit', CONFIGURATION => {'hbase.rs.cachecompactedblocksonwrite' => 'true'}} {code} > Allow cache on write during compactions when prefetching is enabled > --- > > Key: HBASE-23066 > URL: https://issues.apache.org/jira/browse/HBASE-23066 > Project: HBase > Issue Type: Improvement > Components: Compaction, regionserver >Affects Versions: 1.4.10 >Reporter: Jacob LeBlanc >Assignee: Jacob LeBlanc >Priority: Minor > Fix For: 2.3.0, 1.6.0 > > Attachments: HBASE-23066.patch, performance_results.png, > prefetchCompactedBlocksOnWrite.patch > > > In cases where users care a lot about read performance for tables that are > small enough to fit into a cache (or the cache is large enough), > prefetchOnOpen can be enabled to make the entire table available in cache > after the initial region opening is completed. Any new data can also be > guaranteed to be in cache with the cacheBlocksOnWrite setting. > However, the missing piece is when all blocks are evicted after a compaction. > We found very poor performance after compactions for tables under heavy read > load and a slower backing filesystem (S3). After a compaction the prefetching > threads need to compete with threads servicing read requests and get > constantly blocked as a result. > This is a proposal to introduce a new cache configuration option that would > cache blocks on write during compaction for any column family that has > prefetch enabled. This would virtually guarantee all blocks are kept in cache > after the initial prefetch on open is completed allowing for guaranteed > steady read performance despite a slow backing file system. -- This message was sent by Atlassian Jira (v8.3.4#803005)
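For completeness, the same per-column-family setting can also be applied from the Java client. The snippet below is only a minimal sketch using the standard HBase 2.x Admin/descriptor-builder APIs; the table name 't1' and family 'f1' are placeholders carried over from the shell example in the release note, not part of the release note itself.

{code:java}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class CreateTableCachingCompactedBlocks {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      // Per-CF override: cache blocks written by compactions for family 'f1' only.
      ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("f1"))
          .setConfiguration("hbase.rs.cachecompactedblocksonwrite", "true")
          .build();
      TableDescriptor table = TableDescriptorBuilder.newBuilder(TableName.valueOf("t1"))
          .setColumnFamily(cf)
          .build();
      admin.createTable(table);
    }
  }
}
{code}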
[jira] [Comment Edited] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992767#comment-16992767 ] ramkrishna.s.vasudevan edited comment on HBASE-23066 at 12/10/19 5:52 PM: -- [~jacob.leblanc] - Pushed this to master. Tomorrow will push this to branch-2.x (latest). I have some issues in pulling the code - since my VM is not working. BTW thanks for the nice patch. was (Author: ram_krish): [~jacob.leblanc] - Pushed this to master. Tomorrow will push this to branch-2.x (latest). I have some issues in pulling the code - since my VM is not working. > Allow cache on write during compactions when prefetching is enabled > --- > > Key: HBASE-23066 > URL: https://issues.apache.org/jira/browse/HBASE-23066 > Project: HBase > Issue Type: Improvement > Components: Compaction, regionserver >Affects Versions: 1.4.10 >Reporter: Jacob LeBlanc >Assignee: Jacob LeBlanc >Priority: Minor > Fix For: 2.3.0, 1.6.0 > > Attachments: HBASE-23066.patch, performance_results.png, > prefetchCompactedBlocksOnWrite.patch > > > In cases where users care a lot about read performance for tables that are > small enough to fit into a cache (or the cache is large enough), > prefetchOnOpen can be enabled to make the entire table available in cache > after the initial region opening is completed. Any new data can also be > guaranteed to be in cache with the cacheBlocksOnWrite setting. > However, the missing piece is when all blocks are evicted after a compaction. > We found very poor performance after compactions for tables under heavy read > load and a slower backing filesystem (S3). After a compaction the prefetching > threads need to compete with threads servicing read requests and get > constantly blocked as a result. > This is a proposal to introduce a new cache configuration option that would > cache blocks on write during compaction for any column family that has > prefetch enabled. This would virtually guarantee all blocks are kept in cache > after the initial prefetch on open is completed allowing for guaranteed > steady read performance despite a slow backing file system. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992767#comment-16992767 ] ramkrishna.s.vasudevan commented on HBASE-23066: [~jacob.leblanc] - Pushed this to master. Tomorrow will push this to branch-2.x (latest). I have some issues in pulling the code - since my VM is not working. > Allow cache on write during compactions when prefetching is enabled > --- > > Key: HBASE-23066 > URL: https://issues.apache.org/jira/browse/HBASE-23066 > Project: HBase > Issue Type: Improvement > Components: Compaction, regionserver >Affects Versions: 1.4.10 >Reporter: Jacob LeBlanc >Assignee: Jacob LeBlanc >Priority: Minor > Fix For: 2.3.0, 1.6.0 > > Attachments: HBASE-23066.patch, performance_results.png, > prefetchCompactedBlocksOnWrite.patch > > > In cases where users care a lot about read performance for tables that are > small enough to fit into a cache (or the cache is large enough), > prefetchOnOpen can be enabled to make the entire table available in cache > after the initial region opening is completed. Any new data can also be > guaranteed to be in cache with the cacheBlocksOnWrite setting. > However, the missing piece is when all blocks are evicted after a compaction. > We found very poor performance after compactions for tables under heavy read > load and a slower backing filesystem (S3). After a compaction the prefetching > threads need to compete with threads servicing read requests and get > constantly blocked as a result. > This is a proposal to introduce a new cache configuration option that would > cache blocks on write during compaction for any column family that has > prefetch enabled. This would virtually guarantee all blocks are kept in cache > after the initial prefetch on open is completed allowing for guaranteed > steady read performance despite a slow backing file system. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23349) Reader lock on compacted store files preventing archival of compacted files
[ https://issues.apache.org/jira/browse/HBASE-23349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992326#comment-16992326 ] ramkrishna.s.vasudevan commented on HBASE-23349: There is no problem, because the ongoing scanners are just closed and opened again (reset). So it won't physically close or end the ongoing scans. > Reader lock on compacted store files preventing archival of compacted files > --- > > Key: HBASE-23349 > URL: https://issues.apache.org/jira/browse/HBASE-23349 > Project: HBase > Issue Type: Improvement >Affects Versions: 3.0.0, 2.3.0, 1.6.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Fix For: 3.0.0, 2.3.0, 1.6.0 > > > refCounts on compacted away store files as low as 1 can also prevent archival. > {code:java} > regionserver.HStore - Can't archive compacted file > hdfs://{{root-dir}}/hbase/data/default/t1/12a9e1112e0371955b3db8d3ebb2d298/cf1/73b72f5ddfce4a34a9e01afe7b83c1f9 > because of either isCompactedAway=true or file has reference, > isReferencedInReads=true, refCount=1, skipping for now. > {code} > We should come up with core code blocking reader lock if client or > coprocessor has held the lock for significantly high amount of > time(configurable - mostly same as discharger thread interval) or gracefully > resolve reader lock issue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23349) Reader lock on compacted store files preventing archival of compacted files
[ https://issues.apache.org/jira/browse/HBASE-23349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989545#comment-16989545 ] ramkrishna.s.vasudevan commented on HBASE-23349: Sorry for being late here. Good comments and discussion over here. If the refCount is causing the issue - then I think previously the scanners were notified that new compacted files had been created, and on receiving the notification the scanner just resets its heap. To keep things simple, based on the timeout config (as discussed here) that thread can notify the scanner to reset itself, ensuring that the refCount is decremented and in turn the discharger thread in the next cycle can archive the compacted files. Does that make sense here? > Reader lock on compacted store files preventing archival of compacted files > --- > > Key: HBASE-23349 > URL: https://issues.apache.org/jira/browse/HBASE-23349 > Project: HBase > Issue Type: Improvement >Affects Versions: 3.0.0, 2.3.0, 1.6.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Fix For: 3.0.0, 2.3.0, 1.6.0 > > Attachments: HBASE-23349.master.000.patch, > HBASE-23349.master.001.patch, HBASE-23349.master.002.patch > > > refCounts on compacted away store files as low as 1 can also prevent archival. > {code:java} > regionserver.HStore - Can't archive compacted file > hdfs://{{root-dir}}/hbase/data/default/t1/12a9e1112e0371955b3db8d3ebb2d298/cf1/73b72f5ddfce4a34a9e01afe7b83c1f9 > because of either isCompactedAway=true or file has reference, > isReferencedInReads=true, refCount=1, skipping for now. > {code} > We should come up with core code blocking reader lock if client or > coprocessor has held the lock for significantly high amount of > time(configurable - mostly same as discharger thread interval) or gracefully > resolve reader lock issue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
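To make the proposal in the comment above a bit more concrete, here is a purely illustrative sketch of the timeout-driven reset being discussed. Every class and method name in it (CompactedFile, ActiveScanner, notifyReset, and so on) is hypothetical and not part of the actual HStore/StoreScanner API; it only shows the shape of the idea.

{code:java}
import java.util.List;

// Hypothetical helper: if a compacted-away file is still referenced after a
// configurable timeout, ask the active scanners to reset their heaps so the
// reference is dropped and the discharger can archive the file on its next run.
class CompactedFileResetSketch {
  void maybeForceScannerReset(CompactedFile file, long timeoutMs, List<ActiveScanner> scanners) {
    long heldFor = System.currentTimeMillis() - file.compactedAwayTimestamp();
    if (file.refCount() > 0 && heldFor > timeoutMs) {
      for (ActiveScanner scanner : scanners) {
        // Reopening the heap over the post-compaction files releases the refCount
        // on the compacted-away file without ending the ongoing scan.
        scanner.notifyReset();
      }
    }
  }

  // Hypothetical interfaces, only to make the sketch self-contained.
  interface CompactedFile { long compactedAwayTimestamp(); int refCount(); }
  interface ActiveScanner { void notifyReset(); }
}
{code}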
[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988474#comment-16988474 ] ramkrishna.s.vasudevan commented on HBASE-23066: BTW - confirmed seeing the code that it is easy to trigger this per table/family also. Even using shell or the java client. > Allow cache on write during compactions when prefetching is enabled > --- > > Key: HBASE-23066 > URL: https://issues.apache.org/jira/browse/HBASE-23066 > Project: HBase > Issue Type: Improvement > Components: Compaction, regionserver >Affects Versions: 1.4.10 >Reporter: Jacob LeBlanc >Assignee: Jacob LeBlanc >Priority: Minor > Fix For: 2.3.0, 1.6.0 > > Attachments: HBASE-23066.patch, performance_results.png, > prefetchCompactedBlocksOnWrite.patch > > > In cases where users care a lot about read performance for tables that are > small enough to fit into a cache (or the cache is large enough), > prefetchOnOpen can be enabled to make the entire table available in cache > after the initial region opening is completed. Any new data can also be > guaranteed to be in cache with the cacheBlocksOnWrite setting. > However, the missing piece is when all blocks are evicted after a compaction. > We found very poor performance after compactions for tables under heavy read > load and a slower backing filesystem (S3). After a compaction the prefetching > threads need to compete with threads servicing read requests and get > constantly blocked as a result. > This is a proposal to introduce a new cache configuration option that would > cache blocks on write during compaction for any column family that has > prefetch enabled. This would virtually guarantee all blocks are kept in cache > after the initial prefetch on open is completed allowing for guaranteed > steady read performance despite a slow backing file system. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988473#comment-16988473 ] ramkrishna.s.vasudevan commented on HBASE-23066: [~jacob.leblanc] Thanks for your detailed write up. I saw the reply from [~anoop.hbase]; hope it helps. However, since you also have prefetch enabled, it means your cache was always getting loaded asynchronously, and that was always helping you in a big way. Can you just give some rough numbers on your cache size and the number of blocks that you always see in your cache? Is there a sporadic rise in your block count, and if so by how much, and is your cache size good enough to hold them? [~jacob.leblanc] If you are fine with the latest PR - I can just merge it and work on the other sub-task to make this configuration size-based, so that all the older files' blocks are not cached. > Allow cache on write during compactions when prefetching is enabled > --- > > Key: HBASE-23066 > URL: https://issues.apache.org/jira/browse/HBASE-23066 > Project: HBase > Issue Type: Improvement > Components: Compaction, regionserver >Affects Versions: 1.4.10 >Reporter: Jacob LeBlanc >Assignee: Jacob LeBlanc >Priority: Minor > Fix For: 2.3.0, 1.6.0 > > Attachments: HBASE-23066.patch, performance_results.png, > prefetchCompactedBlocksOnWrite.patch > > > In cases where users care a lot about read performance for tables that are > small enough to fit into a cache (or the cache is large enough), > prefetchOnOpen can be enabled to make the entire table available in cache > after the initial region opening is completed. Any new data can also be > guaranteed to be in cache with the cacheBlocksOnWrite setting. > However, the missing piece is when all blocks are evicted after a compaction. > We found very poor performance after compactions for tables under heavy read > load and a slower backing filesystem (S3). After a compaction the prefetching > threads need to compete with threads servicing read requests and get > constantly blocked as a result. > This is a proposal to introduce a new cache configuration option that would > cache blocks on write during compaction for any column family that has > prefetch enabled. This would virtually guarantee all blocks are kept in cache > after the initial prefetch on open is completed allowing for guaranteed > steady read performance despite a slow backing file system. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23350) Make compaction files cacheonWrite configurable based on threshold
ramkrishna.s.vasudevan created HBASE-23350: -- Summary: Make compaction files cacheonWrite configurable based on threshold Key: HBASE-23350 URL: https://issues.apache.org/jira/browse/HBASE-23350 Project: HBase Issue Type: Sub-task Components: Compaction Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 3.0.0, 2.3.0 As per comment from [~javaman_chen] in the parent JIRA https://issues.apache.org/jira/browse/HBASE-23066?focusedCommentId=16937361=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937361 This is to introduce a config to identify if the resulting compacted file's blocks should be added to the cache - while writing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
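A minimal sketch of what such a threshold check could look like at compaction-write time follows. This is only an assumption for illustration: the threshold configuration key, its default, and the surrounding method are made up for the sketch and are not the committed implementation of HBASE-23350.

{code:java}
import org.apache.hadoop.conf.Configuration;

class CacheCompactedBlocksThresholdSketch {
  // Hypothetical key/default, used only for this sketch.
  static final String THRESHOLD_KEY = "hbase.rs.cachecompactedblocksonwrite.threshold";
  static final long DEFAULT_THRESHOLD = Long.MAX_VALUE; // no limit unless configured

  // Decide whether the blocks of a compaction output of the given total size
  // should be added to the cache while writing.
  static boolean shouldCacheCompactedBlocksOnWrite(Configuration conf, long compactedFileSize) {
    boolean enabled = conf.getBoolean("hbase.rs.cachecompactedblocksonwrite", false);
    long threshold = conf.getLong(THRESHOLD_KEY, DEFAULT_THRESHOLD);
    // Skip caching very large compaction outputs so they do not evict the hot working set.
    return enabled && compactedFileSize <= threshold;
  }
}
{code}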
[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984157#comment-16984157 ] ramkrishna.s.vasudevan commented on HBASE-23066: [~jacob.leblanc] You want to do the [~javaman_chen] angle of adding threshold based on some size based config - in this JIRA or another one? I can do that in another JIRA if you are busy with other things. [~javaman_chen], [~anoop.hbase] - FYI. > Allow cache on write during compactions when prefetching is enabled > --- > > Key: HBASE-23066 > URL: https://issues.apache.org/jira/browse/HBASE-23066 > Project: HBase > Issue Type: Improvement > Components: Compaction, regionserver >Affects Versions: 1.4.10 >Reporter: Jacob LeBlanc >Assignee: Jacob LeBlanc >Priority: Minor > Fix For: 2.3.0, 1.6.0 > > Attachments: HBASE-23066.patch, performance_results.png, > prefetchCompactedBlocksOnWrite.patch > > > In cases where users care a lot about read performance for tables that are > small enough to fit into a cache (or the cache is large enough), > prefetchOnOpen can be enabled to make the entire table available in cache > after the initial region opening is completed. Any new data can also be > guaranteed to be in cache with the cacheBlocksOnWrite setting. > However, the missing piece is when all blocks are evicted after a compaction. > We found very poor performance after compactions for tables under heavy read > load and a slower backing filesystem (S3). After a compaction the prefetching > threads need to compete with threads servicing read requests and get > constantly blocked as a result. > This is a proposal to introduce a new cache configuration option that would > cache blocks on write during compaction for any column family that has > prefetch enabled. This would virtually guarantee all blocks are kept in cache > after the initial prefetch on open is completed allowing for guaranteed > steady read performance despite a slow backing file system. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984143#comment-16984143 ] ramkrishna.s.vasudevan commented on HBASE-23066: bq.On a side note, (Not related to this issue) when we have cache on write ON as well as prefetch also On, do we do the caching part for the flushed files twice? When it is written, its already been added to cache. Later as part of HFile reader open, the prefetch threads will again do a read and add to cache! I checked this part. Seems we just read the block, and if it is already in the cache we just return it, because HFileReaderImpl#readBlock() just returns if the block is already cached. bq.The comment from @chenxu seems valid. Should we see that angle also? Ok. We can look at that - but should it be part of this JIRA, or should we raise another JIRA and address it there? > Allow cache on write during compactions when prefetching is enabled > --- > > Key: HBASE-23066 > URL: https://issues.apache.org/jira/browse/HBASE-23066 > Project: HBase > Issue Type: Improvement > Components: Compaction, regionserver >Affects Versions: 1.4.10 >Reporter: Jacob LeBlanc >Assignee: Jacob LeBlanc >Priority: Minor > Fix For: 2.3.0, 1.6.0 > > Attachments: HBASE-23066.patch, performance_results.png, > prefetchCompactedBlocksOnWrite.patch > > > In cases where users care a lot about read performance for tables that are > small enough to fit into a cache (or the cache is large enough), > prefetchOnOpen can be enabled to make the entire table available in cache > after the initial region opening is completed. Any new data can also be > guaranteed to be in cache with the cacheBlocksOnWrite setting. > However, the missing piece is when all blocks are evicted after a compaction. > We found very poor performance after compactions for tables under heavy read > load and a slower backing filesystem (S3). After a compaction the prefetching > threads need to compete with threads servicing read requests and get > constantly blocked as a result. > This is a proposal to introduce a new cache configuration option that would > cache blocks on write during compaction for any column family that has > prefetch enabled. This would virtually guarantee all blocks are kept in cache > after the initial prefetch on open is completed allowing for guaranteed > steady read performance despite a slow backing file system. -- This message was sent by Atlassian Jira (v8.3.4#803005)
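The behaviour described in that comment is the usual cache-first pattern: the prefetch thread's read finds the block that cache-on-write already inserted and returns it, with no second filesystem read and no duplicate cache insert. The snippet below is a deliberately simplified, self-contained sketch of that pattern - it is not the actual HFileReaderImpl#readBlock code, and the types and helper names are assumptions for illustration.

{code:java}
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified sketch of the cache-first read path; the real HFileReaderImpl#readBlock
// has a richer signature and more cases (encoding, checksums, metrics, etc.).
class ReadBlockSketch {
  private final Map<String, byte[]> blockCache = new ConcurrentHashMap<>();

  byte[] readBlock(String cacheKey, boolean cacheOnRead) throws IOException {
    // 1. Check the block cache first; prefetch threads and normal reads both go through here.
    byte[] cached = blockCache.get(cacheKey);
    if (cached != null) {
      // Already cached (e.g. by cache-on-write): return it, no second read, no duplicate insert.
      return cached;
    }
    // 2. Otherwise read from the filesystem and optionally cache the block.
    byte[] block = readFromFilesystem(cacheKey);
    if (cacheOnRead) {
      blockCache.putIfAbsent(cacheKey, block);
    }
    return block;
  }

  private byte[] readFromFilesystem(String cacheKey) throws IOException {
    return new byte[0]; // stand-in for the real HDFS/S3 block read
  }
}
{code}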
[jira] [Comment Edited] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982284#comment-16982284 ] ramkrishna.s.vasudevan edited comment on HBASE-23066 at 11/26/19 9:27 AM: -- [~jacob.leblanc] Is the patch still applicable on trunk/2.0 branches? Any more reviews here? I can commit this if no other reviews are pending here. Will wait for another day or so. was (Author: ram_krish): [~jacob.leblanc] Is the patch still applicable on trunk/2.0 branches? Any more reviews here? I can commit this if no other reviews are pending here. Will wait for another day or so. > Allow cache on write during compactions when prefetching is enabled > --- > > Key: HBASE-23066 > URL: https://issues.apache.org/jira/browse/HBASE-23066 > Project: HBase > Issue Type: Improvement > Components: Compaction, regionserver >Affects Versions: 1.4.10 >Reporter: Jacob LeBlanc >Assignee: Jacob LeBlanc >Priority: Minor > Fix For: 2.3.0, 1.6.0 > > Attachments: HBASE-23066.patch, performance_results.png, > prefetchCompactedBlocksOnWrite.patch > > > In cases where users care a lot about read performance for tables that are > small enough to fit into a cache (or the cache is large enough), > prefetchOnOpen can be enabled to make the entire table available in cache > after the initial region opening is completed. Any new data can also be > guaranteed to be in cache with the cacheBlocksOnWrite setting. > However, the missing piece is when all blocks are evicted after a compaction. > We found very poor performance after compactions for tables under heavy read > load and a slower backing filesystem (S3). After a compaction the prefetching > threads need to compete with threads servicing read requests and get > constantly blocked as a result. > This is a proposal to introduce a new cache configuration option that would > cache blocks on write during compaction for any column family that has > prefetch enabled. This would virtually guarantee all blocks are kept in cache > after the initial prefetch on open is completed allowing for guaranteed > steady read performance despite a slow backing file system. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982284#comment-16982284 ] ramkrishna.s.vasudevan commented on HBASE-23066: [~jacob.leblanc] Is the patch still applicable on trunk/2.0 branches? Any more reviews here? I can commit this if no other reviews are pending here. Will wait for another day or so. > Allow cache on write during compactions when prefetching is enabled > --- > > Key: HBASE-23066 > URL: https://issues.apache.org/jira/browse/HBASE-23066 > Project: HBase > Issue Type: Improvement > Components: Compaction, regionserver >Affects Versions: 1.4.10 >Reporter: Jacob LeBlanc >Assignee: Jacob LeBlanc >Priority: Minor > Fix For: 2.3.0, 1.6.0 > > Attachments: HBASE-23066.patch, performance_results.png, > prefetchCompactedBlocksOnWrite.patch > > > In cases where users care a lot about read performance for tables that are > small enough to fit into a cache (or the cache is large enough), > prefetchOnOpen can be enabled to make the entire table available in cache > after the initial region opening is completed. Any new data can also be > guaranteed to be in cache with the cacheBlocksOnWrite setting. > However, the missing piece is when all blocks are evicted after a compaction. > We found very poor performance after compactions for tables under heavy read > load and a slower backing filesystem (S3). After a compaction the prefetching > threads need to compete with threads servicing read requests and get > constantly blocked as a result. > This is a proposal to introduce a new cache configuration option that would > cache blocks on write during compaction for any column family that has > prefetch enabled. This would virtually guarantee all blocks are kept in cache > after the initial prefetch on open is completed allowing for guaranteed > steady read performance despite a slow backing file system. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23279) Switch default block encoding to ROW_INDEX_V1
[ https://issues.apache.org/jira/browse/HBASE-23279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979010#comment-16979010 ] ramkrishna.s.vasudevan commented on HBASE-23279: [~vjasani] You can try enabling another encoding instead of NONE and see if it still fails. bq.Do we support get closest at or before in hbase2? Its deprecated, no? Seems it is deprecated, as I remember. But it is still better to verify that the encoding is not messing anything up. > Switch default block encoding to ROW_INDEX_V1 > - > > Key: HBASE-23279 > URL: https://issues.apache.org/jira/browse/HBASE-23279 > Project: HBase > Issue Type: Wish >Affects Versions: 3.0.0, 2.3.0 >Reporter: Lars Hofhansl >Assignee: Viraj Jasani >Priority: Minor > Fix For: 3.0.0, 2.3.0 > > Attachments: HBASE-23279.master.000.patch, > HBASE-23279.master.001.patch, HBASE-23279.master.002.patch > > > Currently we set both block encoding and compression to NONE. > ROW_INDEX_V1 has many advantages and (almost) no disadvantages (the hfiles > are slightly larger about 3% or so). I think that would a better default than > NONE. -- This message was sent by Atlassian Jira (v8.3.4#803005)
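For anyone trying the suggestion above, the encoding can be switched per column family. A small sketch using the standard 2.x descriptor-builder API (the family name is a placeholder, and this is only one possible way to set it):

{code:java}
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

class RowIndexV1EncodingSketch {
  // Build a column family descriptor that uses ROW_INDEX_V1 block encoding instead of NONE.
  static ColumnFamilyDescriptor rowIndexEncodedFamily(byte[] family) {
    return ColumnFamilyDescriptorBuilder.newBuilder(family)
        .setDataBlockEncoding(DataBlockEncoding.ROW_INDEX_V1)
        .build();
  }
}
{code}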
[jira] [Commented] (HBASE-23296) Support Bucket based L1 Cache
[ https://issues.apache.org/jira/browse/HBASE-23296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976452#comment-16976452 ] ramkrishna.s.vasudevan commented on HBASE-23296: Good one [~javaman_chen]. We tried for a tiered bucket cache but that is for the data block itself. But this is for the index blocks itself. Seems like a good improvement. > Support Bucket based L1 Cache > - > > Key: HBASE-23296 > URL: https://issues.apache.org/jira/browse/HBASE-23296 > Project: HBase > Issue Type: Improvement > Components: BlockCache >Reporter: chenxu >Priority: Major > > LruBlockCache is not suitable in the following scenarios: > (1) cache size too large (will take too much heap memory, and > evictBlocksByHfileName is not so efficient, as HBASE-23277 mentioned) > (2) block evicted frequently, especially cacheOnWrite & prefetchOnOpen are > enabled. > Since block‘s data is reclaimed by GC, this may affect GC performance. > So how about enabling a Bucket based L1 Cache. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HBASE-23270) Inter-cluster replication is unaware destination peer cluster's RSGroup to push the WALEdits
[ https://issues.apache.org/jira/browse/HBASE-23270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan reassigned HBASE-23270: -- Assignee: Pradeep > Inter-cluster replication is unaware destination peer cluster's RSGroup to > push the WALEdits > > > Key: HBASE-23270 > URL: https://issues.apache.org/jira/browse/HBASE-23270 > Project: HBase > Issue Type: Bug >Reporter: Pradeep >Assignee: Pradeep >Priority: Major > > In a source RSGroup enabled HBase cluster where replication is enabled to > another destination RSGroup enabled cluster, the replication stream of > List go to any node in the destination cluster without the > awareness of RSGroup and then gets routed to appropriate node where the > region is hosted. This extra hop where the data is received and routed could > be of any node in the cluster and no restriction exists to select the node > within the same RSGroup. > Implications: RSGroup owner in the multi-tenant HBase cluster can see > performance and throughput deviations because of this unpredictability caused > by replication. > Potential fix: options: > a) Select a destination node having RSGroup awareness > b) Group the WAL.Edit list based on region and then by region-servers in > which the regions are assigned in the destination. Pass the list WAL.Edit > directly to the region-server to avoid extra intermediate hop in the > destination cluster during the replication process. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953614#comment-16953614 ] ramkrishna.s.vasudevan commented on HBASE-23066: [~busbey] - you want to have a look at the patch and the charts added by [~jacob.leblanc]. > Allow cache on write during compactions when prefetching is enabled > --- > > Key: HBASE-23066 > URL: https://issues.apache.org/jira/browse/HBASE-23066 > Project: HBase > Issue Type: Improvement > Components: Compaction, regionserver >Affects Versions: 1.4.10 >Reporter: Jacob LeBlanc >Assignee: Jacob LeBlanc >Priority: Minor > Fix For: 2.3.0, 1.6.0 > > Attachments: HBASE-23066.patch, performance_results.png, > prefetchCompactedBlocksOnWrite.patch > > > In cases where users care a lot about read performance for tables that are > small enough to fit into a cache (or the cache is large enough), > prefetchOnOpen can be enabled to make the entire table available in cache > after the initial region opening is completed. Any new data can also be > guaranteed to be in cache with the cacheBlocksOnWrite setting. > However, the missing piece is when all blocks are evicted after a compaction. > We found very poor performance after compactions for tables under heavy read > load and a slower backing filesystem (S3). After a compaction the prefetching > threads need to compete with threads servicing read requests and get > constantly blocked as a result. > This is a proposal to introduce a new cache configuration option that would > cache blocks on write during compaction for any column family that has > prefetch enabled. This would virtually guarantee all blocks are kept in cache > after the initial prefetch on open is completed allowing for guaranteed > steady read performance despite a slow backing file system. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952549#comment-16952549 ] ramkrishna.s.vasudevan commented on HBASE-23066: You may have to add a clear release note for this - as how to use this feature and clearly highlight it is used only when prefetch is turned ON. > Allow cache on write during compactions when prefetching is enabled > --- > > Key: HBASE-23066 > URL: https://issues.apache.org/jira/browse/HBASE-23066 > Project: HBase > Issue Type: Improvement > Components: Compaction, regionserver >Affects Versions: 1.4.10 >Reporter: Jacob LeBlanc >Assignee: Jacob LeBlanc >Priority: Minor > Fix For: 2.3.0, 1.6.0 > > Attachments: HBASE-23066.patch, performance_results.png, > prefetchCompactedBlocksOnWrite.patch > > > In cases where users care a lot about read performance for tables that are > small enough to fit into a cache (or the cache is large enough), > prefetchOnOpen can be enabled to make the entire table available in cache > after the initial region opening is completed. Any new data can also be > guaranteed to be in cache with the cacheBlocksOnWrite setting. > However, the missing piece is when all blocks are evicted after a compaction. > We found very poor performance after compactions for tables under heavy read > load and a slower backing filesystem (S3). After a compaction the prefetching > threads need to compete with threads servicing read requests and get > constantly blocked as a result. > This is a proposal to introduce a new cache configuration option that would > cache blocks on write during compaction for any column family that has > prefetch enabled. This would virtually guarantee all blocks are kept in cache > after the initial prefetch on open is completed allowing for guaranteed > steady read performance despite a slow backing file system. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23107) Avoid temp byte array creation when doing cacheDataOnWrite
[ https://issues.apache.org/jira/browse/HBASE-23107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950971#comment-16950971 ] ramkrishna.s.vasudevan commented on HBASE-23107: Will have a look at this later today or tomorrow.. thanks. > Avoid temp byte array creation when doing cacheDataOnWrite > -- > > Key: HBASE-23107 > URL: https://issues.apache.org/jira/browse/HBASE-23107 > Project: HBase > Issue Type: Improvement >Reporter: chenxu >Assignee: chenxu >Priority: Major > Attachments: flamegraph_after.svg, flamegraph_before.svg > > > code in HFileBlock.Writer.cloneUncompressedBufferWithHeader > {code:java} > ByteBuffer cloneUncompressedBufferWithHeader() { > expectState(State.BLOCK_READY); > byte[] uncompressedBlockBytesWithHeader = baosInMemory.toByteArray(); > … > } > {code} > When cacheOnWrite feature enabled, a temp byte array was created in order to > copy block’s data, we can avoid this by use of ByteBuffAllocator. This can > improve GC performance in write heavy scenarios. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-22608) MVCC's writeEntry didn't complete and make MVCC stuck
[ https://issues.apache.org/jira/browse/HBASE-22608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950963#comment-16950963 ] ramkrishna.s.vasudevan commented on HBASE-22608: [~openinx] Are you working on this issue / related issues with IN_MEMORY_COMPACTION? > MVCC's writeEntry didn't complete and make MVCC stuck > - > > Key: HBASE-22608 > URL: https://issues.apache.org/jira/browse/HBASE-22608 > Project: HBase > Issue Type: Bug > Components: in-memory-compaction >Reporter: Guanghao Zhang >Assignee: Zheng Hu >Priority: Critical > > {code:java} > 2019-06-20,05:03:44,917 ERROR > [RpcServer.default.RWQ.Fifo.write.handler=61,queue=1,port=22600] > org.apache.hadoop.hbase.regionserver.HRegion: Asked to modify this region's > (xx,,1560481375170.10b01c12d58ce75c9aaf1ac15cc2a7f3.) memStoreSizing to a > negative value which is incorrect. Current memStoreSizing=-1686222, > delta=1489930 > java.lang.Exception > at > org.apache.hadoop.hbase.regionserver.HRegion.checkNegativeMemStoreDataSize(HRegion.java:1317) > at > org.apache.hadoop.hbase.regionserver.HRegion.incMemStoreSize(HRegion.java:1295) > at > org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3316) > at > org.apache.hadoop.hbase.regionserver.HRegion$MutationBatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3821) > at > org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4248) > at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:4179) > at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:4109) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:1059) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicBatchOp(RSRpcServices.java:991) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:954) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2833) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42014) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:132) > at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338) > at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318) > {code} > See > [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L3728] > {code:java} > @Override > public WriteEntry writeMiniBatchOperationsToMemStore( > final MiniBatchOperationInProgress miniBatchOp, @Nullable > WriteEntry writeEntry) > throws IOException { > if (writeEntry == null) { > writeEntry = region.mvcc.begin(); > } > super.writeMiniBatchOperationsToMemStore(miniBatchOp, > writeEntry.getWriteNumber()); > return writeEntry; > } > {code} > super.writeMiniBatchOperationsToMemStore throw a exception and the new > writeEntry cannot be complete and make the MVCC stuck. > > And we meet this problem when enable in-memory compaction. But that should be > another issue and need to dig more. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23143) Region Server Crash due to 2 cells out of order ( between 2 DELETEs)
[ https://issues.apache.org/jira/browse/HBASE-23143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949159#comment-16949159 ] ramkrishna.s.vasudevan commented on HBASE-23143: Reading the Cells that you have and the code {code} if (lastCell != null) { int keyComp = comparator.compareOnlyKeyPortion(lastCell, cell); {code} Here we pass lastCell as the 'left' and 'cell' as the 'right'. Note that the timestamp comparison swaps the arguments: {code} @Override public int compareTimestamps(final long ltimestamp, final long rtimestamp) { // Swap order we pass into compare so we get DESCENDING order. return Long.compare(rtimestamp, ltimestamp); } {code} So here, since the current cell has the bigger TS, we get this exception. But the seqId of the cells, as shown in the log, is such that the current cell has a lesser seqId than the lastCell. So I suspect something about how the cells were added to the memstore. First question - how is the Timestamp set? Is it set by the client in this case? > Region Server Crash due to 2 cells out of order ( between 2 DELETEs) > > > Key: HBASE-23143 > URL: https://issues.apache.org/jira/browse/HBASE-23143 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.2 >Reporter: Xu Cang >Priority: Major > Fix For: 1.4.12, 1.3.7, 1.5.1 > > > Region Server Crash due to 2 cells out of order ( between 2 DELETEs) > > Caused by: java.io.IOException: Added a key not lexically larger than > previous. > Current cell = > 00D7F00xxQ10D52v8UY6yV0057F00bPaGT\x00057F00bPaG/0:TABLE1_ID/*1570095189597*/DeleteColumn/vlen=0/seqid=*2128373*, > > lastCell = > 00D7F00xxQ10D52v8UY6yV0057F00bPaGT\x00057F00bPaG/0:TABLE1_ID/*1570095165147*/DeleteColumn/vlen=0/seqid=*2128378* > > > I am aware of this JIRA: https://issues.apache.org/jira/browse/HBASE-22862 > Though it's slightly different, HBASE-22862 issue was caused One Delete and > One Put. > This issue I am reporting is caused by 2 Deletes > > Has anyone seen this issue? > > After I read the code and debugged the test cases. > In AbstractHFileWriter.java > {code:java} > int keyComp = comparator.compareOnlyKeyPortion(lastCell, cell);{code} > This call will always ignore SequenceId. And time stamps are in the correct > order (above case) > And since these 2 cells have same KEY. The comparison result should be 0. > *only possible issue I can think of is, in this code piece: in > CellComparator.java:* > {code:java} > Bytes.compareTo(left.getRowArray(), left.getRowOffset(), left.getRowLength(), > right.getRowArray(), right.getRowOffset(), right.getRowLength());{code} > The getRowLength() returns a wrong value. > Or the offset is messed up. (?) -- This message was sent by Atlassian Jira (v8.3.4#803005)
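To make the swapped-order point in that comment concrete, here is a tiny self-contained check using the two timestamps from the log in the issue description (the surrounding class is just scaffolding for the sketch):

{code:java}
// Demonstrates why the newer (larger) current-cell timestamp makes lastCell compare
// as "larger": compareTimestamps reverses the arguments to get DESCENDING order.
public class TimestampOrderSketch {
  public static void main(String[] args) {
    long lastCellTs = 1570095165147L;    // lastCell timestamp from the log
    long currentCellTs = 1570095189597L; // current cell timestamp from the log
    // Equivalent of compareTimestamps(lastCellTs, currentCellTs):
    int cmp = Long.compare(currentCellTs, lastCellTs);
    // cmp > 0, so lastCell sorts AFTER the current cell, which is exactly the
    // "Added a key not lexically larger than previous" situation.
    System.out.println(cmp); // prints 1
  }
}
{code}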
[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943319#comment-16943319 ] ramkrishna.s.vasudevan commented on HBASE-23066: [~jacob.leblanc] Can you create a PR using github? That will trigger the CI for running the tests. And it is easy to merge the patch too. > Allow cache on write during compactions when prefetching is enabled > --- > > Key: HBASE-23066 > URL: https://issues.apache.org/jira/browse/HBASE-23066 > Project: HBase > Issue Type: Improvement > Components: Compaction, regionserver >Affects Versions: 1.4.10 >Reporter: Jacob LeBlanc >Assignee: Jacob LeBlanc >Priority: Minor > Fix For: 1.5.0, 2.3.0 > > Attachments: HBASE-23066.patch, performance_results.png, > prefetchCompactedBlocksOnWrite.patch > > > In cases where users care a lot about read performance for tables that are > small enough to fit into a cache (or the cache is large enough), > prefetchOnOpen can be enabled to make the entire table available in cache > after the initial region opening is completed. Any new data can also be > guaranteed to be in cache with the cacheBlocksOnWrite setting. > However, the missing piece is when all blocks are evicted after a compaction. > We found very poor performance after compactions for tables under heavy read > load and a slower backing filesystem (S3). After a compaction the prefetching > threads need to compete with threads servicing read requests and get > constantly blocked as a result. > This is a proposal to introduce a new cache configuration option that would > cache blocks on write during compaction for any column family that has > prefetch enabled. This would virtually guarantee all blocks are kept in cache > after the initial prefetch on open is completed allowing for guaranteed > steady read performance despite a slow backing file system. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941735#comment-16941735 ] ramkrishna.s.vasudevan commented on HBASE-23066: bq.As time goes on, HFile will grow larger(because of Compaction), and it's data may get colder and colder, In some scenarios, only the latest time window data is accessed, so warmup the large HFile seems unnecessary. got it. thanks [~javaman_chen]. More of a size based threshold. > Allow cache on write during compactions when prefetching is enabled > --- > > Key: HBASE-23066 > URL: https://issues.apache.org/jira/browse/HBASE-23066 > Project: HBase > Issue Type: Improvement > Components: Compaction, regionserver >Affects Versions: 1.4.10 >Reporter: Jacob LeBlanc >Assignee: Jacob LeBlanc >Priority: Minor > Fix For: 1.5.0, 2.3.0 > > Attachments: HBASE-23066.patch, performance_results.png, > prefetchCompactedBlocksOnWrite.patch > > > In cases where users care a lot about read performance for tables that are > small enough to fit into a cache (or the cache is large enough), > prefetchOnOpen can be enabled to make the entire table available in cache > after the initial region opening is completed. Any new data can also be > guaranteed to be in cache with the cacheBlocksOnWrite setting. > However, the missing piece is when all blocks are evicted after a compaction. > We found very poor performance after compactions for tables under heavy read > load and a slower backing filesystem (S3). After a compaction the prefetching > threads need to compete with threads servicing read requests and get > constantly blocked as a result. > This is a proposal to introduce a new cache configuration option that would > cache blocks on write during compaction for any column family that has > prefetch enabled. This would virtually guarantee all blocks are kept in cache > after the initial prefetch on open is completed allowing for guaranteed > steady read performance despite a slow backing file system. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941556#comment-16941556 ] ramkrishna.s.vasudevan commented on HBASE-23066: I think [~jacob.leblanc] is saying that for cloud-related use cases with a bigger cache, cache-on-write after compactions should benefit them - considering the fact that this feature is disabled by default and is enabled only when prefetch is enabled. The results that are attached here show a very positive impact. [~jacob.leblanc] - do you want to prepare a patch for master? [~javaman_chen] bq. If the compacted HFile greater than this threshold, do not cache it, just a suggestion. You mean after every compaction, or in general do not cache if the hfile count increases beyond a certain level? > Allow cache on write during compactions when prefetching is enabled > --- > > Key: HBASE-23066 > URL: https://issues.apache.org/jira/browse/HBASE-23066 > Project: HBase > Issue Type: Improvement > Components: Compaction, regionserver >Affects Versions: 1.4.10 >Reporter: Jacob LeBlanc >Assignee: Jacob LeBlanc >Priority: Minor > Fix For: 1.5.0, 2.3.0 > > Attachments: HBASE-23066.patch, performance_results.png, > prefetchCompactedBlocksOnWrite.patch > > > In cases where users care a lot about read performance for tables that are > small enough to fit into a cache (or the cache is large enough), > prefetchOnOpen can be enabled to make the entire table available in cache > after the initial region opening is completed. Any new data can also be > guaranteed to be in cache with the cacheBlocksOnWrite setting. > However, the missing piece is when all blocks are evicted after a compaction. > We found very poor performance after compactions for tables under heavy read > load and a slower backing filesystem (S3). After a compaction the prefetching > threads need to compete with threads servicing read requests and get > constantly blocked as a result. > This is a proposal to introduce a new cache configuration option that would > cache blocks on write during compaction for any column family that has > prefetch enabled. This would virtually guarantee all blocks are kept in cache > after the initial prefetch on open is completed allowing for guaranteed > steady read performance despite a slow backing file system. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940773#comment-16940773 ] ramkrishna.s.vasudevan commented on HBASE-11288: Thanks for the write up. It gives a good idea of what is here. Since META can now split, all the journal entries that are added for a region split will now be added to ROOT, and the failure of a meta split will be treated similarly to a normal region split, correct? Thanks. > Splittable Meta > --- > > Key: HBASE-11288 > URL: https://issues.apache.org/jira/browse/HBASE-11288 > Project: HBase > Issue Type: Sub-task >Reporter: Francis Christopher Liu >Assignee: Francis Christopher Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937054#comment-16937054 ] ramkrishna.s.vasudevan commented on HBASE-11288: +1 to what [~stack] says. Thanks [~toffer]. > Splittable Meta > --- > > Key: HBASE-11288 > URL: https://issues.apache.org/jira/browse/HBASE-11288 > Project: HBase > Issue Type: Sub-task >Reporter: Francis Liu >Assignee: Francis Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled
[ https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937032#comment-16937032 ] ramkrishna.s.vasudevan commented on HBASE-23066: On first glance patch looks good to me [~jacob.leblanc]. Have you tested this in your cluster running on AWS? > Allow cache on write during compactions when prefetching is enabled > --- > > Key: HBASE-23066 > URL: https://issues.apache.org/jira/browse/HBASE-23066 > Project: HBase > Issue Type: Improvement > Components: Compaction, regionserver >Affects Versions: 1.4.10 >Reporter: Jacob LeBlanc >Assignee: Jacob LeBlanc >Priority: Minor > Fix For: 1.5.0, 2.3.0 > > Attachments: prefetchCompactedBlocksOnWrite.patch > > > In cases where users care a lot about read performance for tables that are > small enough to fit into a cache (or the cache is large enough), > prefetchOnOpen can be enabled to make the entire table available in cache > after the initial region opening is completed. Any new data can also be > guaranteed to be in cache with the cacheBlocksOnWrite setting. > However, the missing piece is when all blocks are evicted after a compaction. > We found very poor performance after compactions for tables under heavy read > load and a slower backing filesystem (S3). After a compaction the prefetching > threads need to compete with threads servicing read requests and get > constantly blocked as a result. > This is a proposal to introduce a new cache configuration option that would > cache blocks on write during compaction for any column family that has > prefetch enabled. This would virtually guarantee all blocks are kept in cache > after the initial prefetch on open is completed allowing for guaranteed > steady read performance despite a slow backing file system. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936390#comment-16936390 ] ramkrishna.s.vasudevan commented on HBASE-23035: So on restart you just want to leave the target location as null and allow the LB to take care of the location - right? > Retain region to the last RegionServer make the failover slower > --- > > Key: HBASE-23035 > URL: https://issues.apache.org/jira/browse/HBASE-23035 > Project: HBase > Issue Type: Improvement >Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Major > Fix For: 3.0.0, 2.3.0, 2.2.2 > > > Now if one RS crashed, the regions will try to use the old location for the > region deploy. But one RS only have 3 threads to open region by default. If a > RS have hundreds of regions, the failover is very slower. Assign to same RS > may have good locality if the Datanode is deploied on same host. But slower > failover make the availability worse. And the locality is not big deal when > deploy HBase on cloud. > This was introduced by HBASE-18946. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931193#comment-16931193 ] ramkrishna.s.vasudevan commented on HBASE-23035: We were always doing a round robin method in case of SCP right? I mean for non region replica cases? > Retain region to the last RegionServer make the failover slower > --- > > Key: HBASE-23035 > URL: https://issues.apache.org/jira/browse/HBASE-23035 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Major > > Now if one RS crashed, the regions will try to use the old location for the > region deploy. But one RS only have 3 threads to open region by default. If a > RS have hundreds of regions, the failover is very slower. Assign to same RS > may have good locality if the Datanode is deploied on same host. But slower > failover make the availability worse. And the locality is not big deal when > deploy HBase on cloud. > This was introduced by HBASE-18946. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (HBASE-22929) MemStoreLAB ChunkCreator may memory leak
[ https://issues.apache.org/jira/browse/HBASE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan resolved HBASE-22929. Fix Version/s: 2.1.6 2.2.1 2.3.0 3.0.0 Resolution: Fixed Thanks for all the reviews. Pushed to master, branch-2, branch-2.1 and branch-2.2. > MemStoreLAB ChunkCreator may memory leak > - > > Key: HBASE-22929 > URL: https://issues.apache.org/jira/browse/HBASE-22929 > Project: HBase > Issue Type: Bug >Affects Versions: 2.1.2 >Reporter: Yechao Chen >Assignee: ramkrishna.s.vasudevan >Priority: Major > Fix For: 3.0.0, 2.3.0, 2.2.1, 2.1.6 > > Attachments: OOM_log.png, hbase-site.xml, hbase_heap_monitor.png, > hbase_rs_heap_dump_mat_1.png, > hbase_rs_heap_dump_mat_ChunkCreator_chunkIdMap.png, hbase_rs_mem_used.png > > > We use hbase 2.1.2 with memstorelab enable > RegionServer crashed case of oom > I dump the heap ,found the ChunkCreator may be memory leak > The heap is 32GB, > hbase.regionserver.global.memstore.size=0.4, > hbase.hregion.memstore.mslab.enabled=true > hbase.hregion.memstore.chunkpool.initialsize=0.5, > hbase.hregion.memstore.chunkpool.maxsize=1.0 > BucketCache with offheap -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Assigned] (HBASE-22929) MemStoreLAB ChunkCreator may memory leak
[ https://issues.apache.org/jira/browse/HBASE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan reassigned HBASE-22929: -- Assignee: ramkrishna.s.vasudevan > MemStoreLAB ChunkCreator may memory leak > - > > Key: HBASE-22929 > URL: https://issues.apache.org/jira/browse/HBASE-22929 > Project: HBase > Issue Type: Bug >Affects Versions: 2.1.2 >Reporter: Yechao Chen >Assignee: ramkrishna.s.vasudevan >Priority: Major > Attachments: OOM_log.png, hbase-site.xml, hbase_heap_monitor.png, > hbase_rs_heap_dump_mat_1.png, > hbase_rs_heap_dump_mat_ChunkCreator_chunkIdMap.png, hbase_rs_mem_used.png > > > We use hbase 2.1.2 with memstorelab enable > RegionServer crashed case of oom > I dump the heap ,found the ChunkCreator may be memory leak > The heap is 32GB, > hbase.regionserver.global.memstore.size=0.4, > hbase.hregion.memstore.mslab.enabled=true > hbase.hregion.memstore.chunkpool.initialsize=0.5, > hbase.hregion.memstore.chunkpool.maxsize=1.0 > BucketCache with offheap -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (HBASE-23006) RSGroupBasedLoadBalancer should also try to place replicas for the same region to different region servers
[ https://issues.apache.org/jira/browse/HBASE-23006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928194#comment-16928194 ] ramkrishna.s.vasudevan commented on HBASE-23006: In many of the cases the LB was not considering the replicas. good to see this getting solved. > RSGroupBasedLoadBalancer should also try to place replicas for the same > region to different region servers > -- > > Key: HBASE-23006 > URL: https://issues.apache.org/jira/browse/HBASE-23006 > Project: HBase > Issue Type: Bug > Components: Region Assignment, rsgroup >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0, 2.3.0, 2.1.7, 2.2.2 > > Attachments: HBASE-23006-UT.patch > > > Found this when implementing HBASE-22971. TestSCPWithReplicas fails when > RSGroupBasedLoadBalancer is enabled. > And this can be reproduced by a UT on master branch too. I think the problem > is that in RSGroupBasedLoadBalancer.retainAssignment we do not consider > region replicas. > We should fix this. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery
[ https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924202#comment-16924202 ] ramkrishna.s.vasudevan commented on HBASE-22072: [~aoxiang] Oh I missed this. Just to reiterate: if close(true) happens then the memstoreScanners are closed anyway, and updateReaders() no longer happens after that in 2.1.5. I think in previous versions that was the problem, leading to memstore chunk leaks and the store file ref count also not being handled correctly. > High read/write intensive regions may cause long crash recovery > --- > > Key: HBASE-22072 > URL: https://issues.apache.org/jira/browse/HBASE-22072 > Project: HBase > Issue Type: Bug > Components: Performance, Recovery >Affects Versions: 2.0.0 >Reporter: Pavel >Assignee: ramkrishna.s.vasudevan >Priority: Major > Labels: compaction > Fix For: 2.2.0, 2.3.0, 2.0.6, 2.1.5 > > Attachments: HBASE-22072.HBASE-21879-v1.patch > > > Compaction of high read loaded region may leave compacted files undeleted > because of existing scan references: > INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted > file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file > has reference, isReferencedInReads=true, refCount=1, skipping for now > If region is either high write loaded this happens quite often and region may > have few storefiles and tons of undeleted compacted hdfs files. > Region keeps all that files (in my case thousands) untill graceful region > closing procedure, which ignores existing references and drop obsolete files. > It works fine unless consuming some extra hdfs space, but only in case of > normal region closing. If region server crashes than new region server, > responsible for that overfiling region, reads hdfs folder and try to deal > with all undeleted files, producing tons of storefiles, compaction tasks and > consuming abnormal amount of memory, wich may lead to OutOfMemory Exception > and further region servers crash. This stops writing to region because number > of storefiles reach *hbase.hstore.blockingStoreFiles* limit, forces high GC > duty and may take hours to compact all files into working set of files. > Workaround is a periodically check hdfs folders files count and force region > assign for ones with too many files. > It could be nice if regionserver had a setting similar to > hbase.hstore.blockingStoreFiles and invoke attempt to drop undeleted > compacted files if number of files reaches this setting. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery
[ https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920637#comment-16920637 ] ramkrishna.s.vasudevan commented on HBASE-22072: [~aoxiang] Yes. Now it is either updateReaders or close() that will happen. If close(true) already happens then updateReaders won't happen. Previously that was possible and now in 2.1.5 that is not possible since we have closeLock(). > High read/write intensive regions may cause long crash recovery > --- > > Key: HBASE-22072 > URL: https://issues.apache.org/jira/browse/HBASE-22072 > Project: HBase > Issue Type: Bug > Components: Performance, Recovery >Affects Versions: 2.0.0 >Reporter: Pavel >Assignee: ramkrishna.s.vasudevan >Priority: Major > Labels: compaction > Fix For: 2.2.0, 2.3.0, 2.0.6, 2.1.5 > > Attachments: HBASE-22072.HBASE-21879-v1.patch > > > Compaction of high read loaded region may leave compacted files undeleted > because of existing scan references: > INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted > file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file > has reference, isReferencedInReads=true, refCount=1, skipping for now > If region is either high write loaded this happens quite often and region may > have few storefiles and tons of undeleted compacted hdfs files. > Region keeps all that files (in my case thousands) untill graceful region > closing procedure, which ignores existing references and drop obsolete files. > It works fine unless consuming some extra hdfs space, but only in case of > normal region closing. If region server crashes than new region server, > responsible for that overfiling region, reads hdfs folder and try to deal > with all undeleted files, producing tons of storefiles, compaction tasks and > consuming abnormal amount of memory, wich may lead to OutOfMemory Exception > and further region servers crash. This stops writing to region because number > of storefiles reach *hbase.hstore.blockingStoreFiles* limit, forces high GC > duty and may take hours to compact all files into working set of files. > Workaround is a periodically check hdfs folders files count and force region > assign for ones with too many files. > It could be nice if regionserver had a setting similar to > hbase.hstore.blockingStoreFiles and invoke attempt to drop undeleted > compacted files if number of files reaches this setting. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery
[ https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919269#comment-16919269 ] ramkrishna.s.vasudevan commented on HBASE-22072: [~aoxiang] The patch here has gone into 2.1.5. As far as I can see now, since we have the closeLock in place, it is either updateReaders() or close() that can happen. As per the previous comments - you can see that if close(false) has happened, then close(true) is bound to happen when the StoreScanner actually gets closed. So even if updateReaders() happened, the memstore scanners, though updated, will anyway get closed when the final close(true) happens. Previously these could run concurrently; now they cannot. So if the issue over in HBASE-22929 can be checked with HBase 2.1.5, probably the issue won't be there? Let me know if I am missing something. > High read/write intensive regions may cause long crash recovery > --- > > Key: HBASE-22072 > URL: https://issues.apache.org/jira/browse/HBASE-22072 > Project: HBase > Issue Type: Bug > Components: Performance, Recovery >Affects Versions: 2.0.0 >Reporter: Pavel >Assignee: ramkrishna.s.vasudevan >Priority: Major > Labels: compaction > Fix For: 2.2.0, 2.3.0, 2.0.6, 2.1.5 > > Attachments: HBASE-22072.HBASE-21879-v1.patch > > > Compaction of high read loaded region may leave compacted files undeleted > because of existing scan references: > INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted > file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file > has reference, isReferencedInReads=true, refCount=1, skipping for now > If region is either high write loaded this happens quite often and region may > have few storefiles and tons of undeleted compacted hdfs files. > Region keeps all that files (in my case thousands) untill graceful region > closing procedure, which ignores existing references and drop obsolete files. > It works fine unless consuming some extra hdfs space, but only in case of > normal region closing. If region server crashes than new region server, > responsible for that overfiling region, reads hdfs folder and try to deal > with all undeleted files, producing tons of storefiles, compaction tasks and > consuming abnormal amount of memory, wich may lead to OutOfMemory Exception > and further region servers crash. This stops writing to region because number > of storefiles reach *hbase.hstore.blockingStoreFiles* limit, forces high GC > duty and may take hours to compact all files into working set of files. > Workaround is a periodically check hdfs folders files count and force region > assign for ones with too many files. > It could be nice if regionserver had a setting similar to > hbase.hstore.blockingStoreFiles and invoke attempt to drop undeleted > compacted files if number of files reaches this setting. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (HBASE-22862) Region Server crash with: Added a key not lexically larger than previous
[ https://issues.apache.org/jira/browse/HBASE-22862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917394#comment-16917394 ] ramkrishna.s.vasudevan commented on HBASE-22862: [~openinx] That is true. Sorry for the confusion. > Region Server crash with: Added a key not lexically larger than previous > > > Key: HBASE-22862 > URL: https://issues.apache.org/jira/browse/HBASE-22862 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 1.4.10 > Environment: {code} > openjdk version "1.8.0_181" > OpenJDK Runtime Environment (Zulu 8.31.0.1-linux64) (build 1.8.0_181-b02) > OpenJDK 64-Bit Server VM (Zulu 8.31.0.1-linux64) (build 25.181-b02, mixed > mode) > {code} >Reporter: Alex Batyrshin >Assignee: Zheng Hu >Priority: Critical > Attachments: HBASE-22862.UT.v01.patch, HBASE-22862.UT.v02.patch > > > We observe error "Added a key not lexically larger than previous” that cause > most of our region-servers to crash in our cluster. > {code} > 2019-08-15 18:02:10,554 INFO [MemStoreFlusher.0] regionserver.HRegion: > Flushing 1/1 column families, memstore=56.08 MB > 2019-08-15 18:02:10,727 WARN [MemStoreFlusher.0] regionserver.HStore: Failed > flushing store file, retrying num=0 > java.io.IOException: Added a key not lexically larger than previous. Current > cell = > \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567, > lastCell = > \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770 >at > org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204) >at > org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279) >at > org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87) >at > org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127) >at > org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139) >at > org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75) >at > org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003) >at > org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523) >at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622) >at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352) >at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314) >at > org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2200) >at > org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2125) >at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:512) >at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:482) >at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76) >at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264) >at java.lang.Thread.run(Thread.java:748) > 2019-08-15 18:02:21,776 WARN [MemStoreFlusher.0] regionserver.HStore: Failed > flushing store file, retrying num=9 > java.io.IOException: Added a key not lexically larger than previous. 
Current > cell = > \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567, > lastCell = > \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770 >at > org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204) >at > org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279) >at > org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87) >at > org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127) >at > org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139) >at > org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75) >at > org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003) >at > org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523) >at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622) >at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352) >at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314) >at >
[jira] [Commented] (HBASE-22862) Region Server crash with: Added a key not lexically larger than previous
[ https://issues.apache.org/jira/browse/HBASE-22862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917385#comment-16917385 ] ramkrishna.s.vasudevan commented on HBASE-22862: bq. return (0xff & left.getTypeByte()) - (0xff & right.getTypeByte()); [~openinx] - This is correct right - we need the type to be sorted in reverse order - Deletes to appear before puts. > Region Server crash with: Added a key not lexically larger than previous > > > Key: HBASE-22862 > URL: https://issues.apache.org/jira/browse/HBASE-22862 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 1.4.10 > Environment: {code} > openjdk version "1.8.0_181" > OpenJDK Runtime Environment (Zulu 8.31.0.1-linux64) (build 1.8.0_181-b02) > OpenJDK 64-Bit Server VM (Zulu 8.31.0.1-linux64) (build 25.181-b02, mixed > mode) > {code} >Reporter: Alex Batyrshin >Assignee: Zheng Hu >Priority: Critical > Attachments: HBASE-22862.UT.v01.patch, HBASE-22862.UT.v02.patch > > > We observe error "Added a key not lexically larger than previous” that cause > most of our region-servers to crash in our cluster. > {code} > 2019-08-15 18:02:10,554 INFO [MemStoreFlusher.0] regionserver.HRegion: > Flushing 1/1 column families, memstore=56.08 MB > 2019-08-15 18:02:10,727 WARN [MemStoreFlusher.0] regionserver.HStore: Failed > flushing store file, retrying num=0 > java.io.IOException: Added a key not lexically larger than previous. Current > cell = > \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567, > lastCell = > \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770 >at > org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204) >at > org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279) >at > org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87) >at > org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127) >at > org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139) >at > org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75) >at > org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003) >at > org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523) >at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622) >at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352) >at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314) >at > org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2200) >at > org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2125) >at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:512) >at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:482) >at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76) >at > org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264) >at java.lang.Thread.run(Thread.java:748) > 2019-08-15 18:02:21,776 WARN [MemStoreFlusher.0] regionserver.HStore: Failed > flushing store file, retrying num=9 > java.io.IOException: Added a key not lexically larger than previous. 
Current > cell = > \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567, > lastCell = > \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770 >at > org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204) >at > org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279) >at > org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87) >at > org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127) >at > org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139) >at > org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75) >at > org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003) >at > org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523) >at > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622) >at >
[jira] [Commented] (HBASE-22936) Close memStoreScanners in StoreScanner#updateReaders else memory leak
[ https://issues.apache.org/jira/browse/HBASE-22936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917382#comment-16917382 ] ramkrishna.s.vasudevan commented on HBASE-22936: If anyone wants to put up a patch I can review it; if not, I can prepare one. [~aoxiang], [~javaman_chen], [~chenyechao]. > Close memStoreScanners in StoreScanner#updateReaders else memory leak > - > > Key: HBASE-22936 > URL: https://issues.apache.org/jira/browse/HBASE-22936 > Project: HBase > Issue Type: Bug >Reporter: stack >Priority: Major > Fix For: 2.3.0, 2.1.7, 2.2.2 > > > Via [~aoxiang] from over on HBASE-22723 > {code} > + if (!closeLock.tryLock()) { > +// no lock acquired. > +LOG.debug("StoreScanner already has the close lock. There is no need > to updateReaders"); > +return; > + } > + // lock acquired > + updateReaders = true; > + if (this.closing) { > +LOG.debug("StoreScanner already closing. There is no need to > updateReaders"); > +return; > + } > {code} > We need to close memStoreScanners in StoreScanner#updateReaders before this > two return, someone else can take over the task. -- This message was sent by Atlassian Jira (v8.3.2#803003)
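A minimal, self-contained sketch of the shape the fix needs (illustrative only; the real StoreScanner manages far more state): every early-return path in updateReaders has to close the freshly passed-in memstore scanners, otherwise their chunk references leak. The Scanner interface and clearAndClose helper are stand-ins invented for this sketch.
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch only; not the actual StoreScanner code.
public class UpdateReadersSketch {
  interface Scanner { void close(); }            // stand-in for KeyValueScanner

  private final ReentrantLock closeLock = new ReentrantLock();
  private volatile boolean closing = false;
  private final List<Scanner> currentScanners = new ArrayList<>();

  void updateReaders(List<Scanner> memStoreScanners) {
    if (!closeLock.tryLock()) {
      // close() owns the lock: it will never consume these scanners, so close them
      // here, otherwise their memstore chunk references leak.
      clearAndClose(memStoreScanners);
      return;
    }
    try {
      if (closing) {
        clearAndClose(memStoreScanners);         // same reasoning as above
        return;
      }
      currentScanners.addAll(memStoreScanners);  // normal path: adopt the new scanners
    } finally {
      closeLock.unlock();
    }
  }

  private static void clearAndClose(List<Scanner> scanners) {
    for (Scanner s : scanners) {
      s.close();
    }
    scanners.clear();
  }
}
{code}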
[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery
[ https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917380#comment-16917380 ] ramkrishna.s.vasudevan commented on HBASE-22072: Thanks [~stack] for filing. It is HBASE-22936. > High read/write intensive regions may cause long crash recovery > --- > > Key: HBASE-22072 > URL: https://issues.apache.org/jira/browse/HBASE-22072 > Project: HBase > Issue Type: Bug > Components: Performance, Recovery >Affects Versions: 2.0.0 >Reporter: Pavel >Assignee: ramkrishna.s.vasudevan >Priority: Major > Labels: compaction > Fix For: 2.2.0, 2.3.0, 2.0.6, 2.1.5 > > Attachments: HBASE-22072.HBASE-21879-v1.patch > > > Compaction of high read loaded region may leave compacted files undeleted > because of existing scan references: > INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted > file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file > has reference, isReferencedInReads=true, refCount=1, skipping for now > If region is either high write loaded this happens quite often and region may > have few storefiles and tons of undeleted compacted hdfs files. > Region keeps all that files (in my case thousands) untill graceful region > closing procedure, which ignores existing references and drop obsolete files. > It works fine unless consuming some extra hdfs space, but only in case of > normal region closing. If region server crashes than new region server, > responsible for that overfiling region, reads hdfs folder and try to deal > with all undeleted files, producing tons of storefiles, compaction tasks and > consuming abnormal amount of memory, wich may lead to OutOfMemory Exception > and further region servers crash. This stops writing to region because number > of storefiles reach *hbase.hstore.blockingStoreFiles* limit, forces high GC > duty and may take hours to compact all files into working set of files. > Workaround is a periodically check hdfs folders files count and force region > assign for ones with too many files. > It could be nice if regionserver had a setting similar to > hbase.hstore.blockingStoreFiles and invoke attempt to drop undeleted > compacted files if number of files reaches this setting. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (HBASE-22929) MemStoreLAB ChunkCreator may memory leak
[ https://issues.apache.org/jira/browse/HBASE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916549#comment-16916549 ] ramkrishna.s.vasudevan commented on HBASE-22929: [~chenyechao] Are you seeing that the chunkIdMap is getting added with new chunks that are not from the pool? Any chance you have enabled CompactingMemstore? > MemStoreLAB ChunkCreator may memory leak > - > > Key: HBASE-22929 > URL: https://issues.apache.org/jira/browse/HBASE-22929 > Project: HBase > Issue Type: Bug >Affects Versions: 2.1.2 >Reporter: Yechao Chen >Priority: Major > Attachments: OOM_log.png, hbase-site.xml, hbase_heap_monitor.png, > hbase_rs_heap_dump_mat_1.png, > hbase_rs_heap_dump_mat_ChunkCreator_chunkIdMap.png, hbase_rs_mem_used.png > > > We use hbase 2.1.2 with memstorelab enable > RegionServer crashed case of oom > I dump the heap ,found the ChunkCreator may be memory leak > The heap is 32GB, > hbase.regionserver.global.memstore.size=0.4, > hbase.hregion.memstore.mslab.enabled=true > hbase.hregion.memstore.chunkpool.initialsize=0.5, > hbase.hregion.memstore.chunkpool.maxsize=1.0 > BucketCache with offheap -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (HBASE-22802) Avoid temp ByteBuffer allocation in FileIOEngine#read
[ https://issues.apache.org/jira/browse/HBASE-22802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913967#comment-16913967 ] ramkrishna.s.vasudevan commented on HBASE-22802: [~javaman_chen] In your case the file based IO engine is backed by what type of Storage device? > Avoid temp ByteBuffer allocation in FileIOEngine#read > - > > Key: HBASE-22802 > URL: https://issues.apache.org/jira/browse/HBASE-22802 > Project: HBase > Issue Type: Improvement > Components: BucketCache >Reporter: chenxu >Assignee: chenxu >Priority: Major > Attachments: HBASE-22802-master-v1.patch, profile_mem_alloc.png, > profile_mem_alloc_with_pool.png > > > a temp ByteBuffer was allocated each time FileIOEngine#read was called > {code:java} > public Cacheable read(BucketEntry be) throws IOException { > long offset = be.offset(); > int length = be.getLength(); > Preconditions.checkArgument(length >= 0, "Length of read can not be less > than 0."); > ByteBuffer dstBuffer = ByteBuffer.allocate(length); > ... > } > {code} > we can avoid this by use of ByteBuffAllocator#allocate(length) after > HBASE-21879 -- This message was sent by Atlassian Jira (v8.3.2#803003)
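To make the proposal concrete, here is a small illustrative sketch of reusing pooled buffers instead of calling ByteBuffer.allocate(length) on every read. SimpleBufferPool is invented for this sketch; it is not the HBase ByteBuffAllocator API, only the same idea in miniature.
{code}
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch: hand out reusable (direct) buffers for reads instead of a fresh heap buffer per call.
public class PooledReadSketch {
  static class SimpleBufferPool {
    private final int bufSize;
    private final ConcurrentLinkedQueue<ByteBuffer> free = new ConcurrentLinkedQueue<>();

    SimpleBufferPool(int bufSize) { this.bufSize = bufSize; }

    ByteBuffer allocate(int length) {
      if (length > bufSize) {
        return ByteBuffer.allocate(length);      // fall back to a temp heap buffer
      }
      ByteBuffer b = free.poll();
      if (b == null) {
        b = ByteBuffer.allocateDirect(bufSize);  // grow the pool lazily
      }
      b.clear().limit(length);
      return b;
    }

    void release(ByteBuffer b) {
      if (b.isDirect()) {
        free.offer(b);                           // only pooled (direct) buffers go back
      }
    }
  }

  public static void main(String[] args) {
    SimpleBufferPool pool = new SimpleBufferPool(64 * 1024);
    ByteBuffer dst = pool.allocate(4096);        // would be filled by the file IO engine's read
    // ... read file bytes into dst and hand it to the caller ...
    pool.release(dst);                           // caller must release once done with the block
  }
}
{code}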
[jira] [Commented] (HBASE-21879) Read HFile's block to ByteBuffer directly instead of to byte for reducing young gc purpose
[ https://issues.apache.org/jira/browse/HBASE-21879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1694#comment-1694 ] ramkrishna.s.vasudevan commented on HBASE-21879: OOO for personal reasons. No access to official emails during this period. > Read HFile's block to ByteBuffer directly instead of to byte for reducing > young gc purpose > -- > > Key: HBASE-21879 > URL: https://issues.apache.org/jira/browse/HBASE-21879 > Project: HBase > Issue Type: Improvement >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.3.0 > > Attachments: HBASE-21879.v1.patch, HBASE-21879.v1.patch, > QPS-latencies-before-HBASE-21879.png, gc-data-before-HBASE-21879.png > > > In HFileBlock#readBlockDataInternal, we have the following: > {code} > @VisibleForTesting > protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset, > long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum, > boolean updateMetrics) > throws IOException { > // . > // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with > BBPool (offheap). > byte [] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize]; > int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize, > onDiskSizeWithHeader - preReadHeaderSize, true, offset + > preReadHeaderSize, pread); > if (headerBuf != null) { > // ... > } > // ... > } > {code} > In the read path, we still read the block from hfile to on-heap byte[], then > copy the on-heap byte[] to offheap bucket cache asynchronously, and in my > 100% get performance test, I also observed some frequent young gc, The > largest memory footprint in the young gen should be the on-heap block byte[]. > In fact, we can read HFile's block to ByteBuffer directly instead of to > byte[] for reducing young gc purpose. we did not implement this before, > because no ByteBuffer reading interface in the older HDFS client, but 2.7+ > has supported this now, so we can fix this now. I think. > Will provide an patch and some perf-comparison for this. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
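For reference, a minimal sketch of the direction described in the issue: reading file bytes straight into a (possibly off-heap) ByteBuffer through the ByteBuffer read path that HDFS clients expose since 2.7, instead of filling an on-heap byte[] first. The path, offset, and sizes are made up for illustration, and the underlying stream must actually support ByteBuffer reads (HDFS does; a plain local stream may not).
{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: fill a direct ByteBuffer from the file system without an intermediate byte[].
public class DirectBufferReadSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    try (FSDataInputStream in = fs.open(new Path("/tmp/example-hfile"))) {
      ByteBuffer block = ByteBuffer.allocateDirect(64 * 1024 + 33); // block + header, say
      in.seek(0);                        // position at the block's file offset
      while (block.hasRemaining()) {
        // read(ByteBuffer) may return fewer bytes than requested, so loop; -1 means EOF
        if (in.read(block) < 0) {
          break;
        }
      }
      block.flip();                      // ready for the checksum / decode steps
    }
  }
}
{code}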
[jira] [Updated] (HBASE-22670) JDK 11 and CellComparator
[ https://issues.apache.org/jira/browse/HBASE-22670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-22670: --- Labels: jdk11 (was: ) > JDK 11 and CellComparator > - > > Key: HBASE-22670 > URL: https://issues.apache.org/jira/browse/HBASE-22670 > Project: HBase > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Major > Labels: jdk11 > > This could act as a parent JIRA for analysing JDK 11 and the Comparator impls > that we have. > Latest JDK has support for SIMD and AVX512, which means it has support for > vectorizations. > See JDK11's ArraysSupport#mismatch() and vectorizedMismatch(). > We also have BufferMismatch#mismatch() which is not publicly exposed but it > uses the ArraysSupport#vectorizedMismatch(). > Internally vectorizedMismatch() does something similar to what > UnsafeComparator#compareToUnsafe does. Will add about the details of the > study in further comments. > The JDK also exposes new annotations like @HotSpotIntrinsicCandidate and > @ForceInline tags that helps in inlining the intrinsic calls. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (HBASE-22670) JDK 11 and CellComparator
[ https://issues.apache.org/jira/browse/HBASE-22670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881909#comment-16881909 ] ramkrishna.s.vasudevan commented on HBASE-22670: Though we have ArraysSupport#mismatch(), the impl does some more steps than what we have in compareToUnsafe(). For example, ArraysSupport#vectorizedMismatch() does
{code}
long av = U.getLongUnaligned(a, aOffset + bi);
long bv = U.getLongUnaligned(b, bOffset + bi);
{code}
rather than doing an Unsafe#getLong() as in compareToUnsafe(). Also, the mismatch() API gives you the index where the mismatch happens, whereas compareToUnsafe() directly returns the comparator output. ArraysSupport#mismatch() also tries to do some optimization by reading the first byte and, if there is a mismatch, returning right there even before doing the getLongUnaligned(). I tried copying the API's impl to the BBUtils class, doing getLong() instead of getLongUnaligned(), and avoided the first-byte read as done in ArraysSupport#mismatch(). The JMH results for a 27 byte row key, 3 byte family and a 4 byte qualifier, where the qualifier alone changes, for CellComparator#compare() with the compareToUnsafe() and mismatch() based impls are as follows. With compareToUnsafe():
{code}
Benchmark                   Mode  Cnt    Score    Error  Units
Comparator.arrayBBCompare   avgt   10  554.920 ±  2.085  ns/op
Comparator.arrayCompare     avgt   10  494.358 ±  8.810  ns/op
Comparator.bbArrayCompare   avgt   10  539.219 ±  5.260  ns/op
Comparator.bbCompare        avgt   10  220.743 ± 11.723  ns/op
{code}
With the ArraysSupport#mismatch() based impl:
{code}
Benchmark                   Mode  Cnt    Score    Error  Units
Comparator.arrayBBCompare   avgt   10  511.787 ±  6.902  ns/op
Comparator.arrayCompare     avgt   10  440.026 ± 17.410  ns/op
Comparator.bbArrayCompare   avgt   10  510.578 ±  1.209  ns/op
Comparator.bbCompare        avgt   10  274.158 ±  1.975  ns/op
{code}
Basically we don't get a significant difference here. > JDK 11 and CellComparator > - > > Key: HBASE-22670 > URL: https://issues.apache.org/jira/browse/HBASE-22670 > Project: HBase > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Major > > This could act as a parent JIRA for analysing JDK 11 and the Comparator impls > that we have. > Latest JDK has support for SIMD and AVX512, which means it has support for > vectorizations. > See JDK11's ArraysSupport#mismatch() and vectorizedMismatch(). > We also have BufferMismatch#mismatch() which is not publicly exposed but it > uses the ArraysSupport#vectorizedMismatch(). > Internally vectorizedMismatch() does something similar to what > UnsafeComparator#compareToUnsafe does. Will add about the details of the > study in further comments. > The JDK also exposes new annotations like @HotSpotIntrinsicCandidate and > @ForceInline tags that helps in inlining the intrinsic calls. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
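As a concrete illustration of the point that mismatch() yields an index rather than a comparison result, here is a small self-contained example using the public java.util.Arrays.mismatch() (JDK 9+), which wraps the same vectorized ArraysSupport code; one extra unsigned byte comparison is still needed after the call. This is a sketch, not the HBase comparator.
{code}
import java.util.Arrays;

// Sketch: turning a mismatch index into a comparator result.
public class MismatchCompareSketch {
  static int compareTo(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
    int i = Arrays.mismatch(a, aOff, aOff + aLen, b, bOff, bOff + bLen);  // JDK 9+
    if (i < 0) {
      return 0;                                   // ranges are fully equal
    }
    if (i >= aLen || i >= bLen) {
      return aLen - bLen;                         // one range is a prefix of the other
    }
    return (a[aOff + i] & 0xff) - (b[bOff + i] & 0xff);  // unsigned byte comparison
  }

  public static void main(String[] args) {
    byte[] k1 = "row-0001/cf:q1".getBytes();
    byte[] k2 = "row-0001/cf:q2".getBytes();
    System.out.println(compareTo(k1, 0, k1.length, k2, 0, k2.length)); // negative: k1 < k2
  }
}
{code}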
[jira] [Created] (HBASE-22671) ByteBufferUtils#findCommonPrefix() can be safely changed to ArraysSupport#mismatch()
ramkrishna.s.vasudevan created HBASE-22671: -- Summary: ByteBufferUtils#findCommonPrefix() can be safely changed to ArraysSupport#mismatch() Key: HBASE-22671 URL: https://issues.apache.org/jira/browse/HBASE-22671 Project: HBase Issue Type: Sub-task Affects Versions: 3.0.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Microbenchmarks reveal that finding the common prefix for encoders can safely be replaced with ArraysSupport#mismatch(). The microbenchmark just compares Cells that are backed with array and BB. For a 27 byte common row prefix, the existing BBUtils#findCommonPrefix gives
{code}
Benchmark                         Mode  Cnt    Score    Error  Units
PrefixComparator.arrayBBCompare   avgt   10  869.897 ±  9.429  ns/op
PrefixComparator.arrayCompare     avgt   10  302.074 ± 13.448  ns/op
PrefixComparator.bbArrayCompare   avgt   10  869.369 ±  5.368  ns/op
PrefixComparator.bbCompare        avgt   10  409.479 ±  1.587  ns/op
{code}
The same with the ArraysSupport#mismatch() change gives this:
{code}
Benchmark                         Mode  Cnt    Score    Error  Units
PrefixComparator.arrayBBCompare   avgt   10  311.946 ±  1.902  ns/op
PrefixComparator.arrayCompare     avgt   10  157.010 ±  4.482  ns/op
PrefixComparator.bbArrayCompare   avgt   10  311.568 ±  1.348  ns/op
PrefixComparator.bbCompare        avgt   10   92.540 ±  0.501  ns/op
{code}
However, note that this comes in flushes/compaction and not during the read path. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
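A minimal, standalone illustration (not the actual ByteBufferUtils code) of the change being benchmarked: computing the common prefix length with the JDK's mismatch() instead of a hand-rolled byte-by-byte loop.
{code}
import java.util.Arrays;

// Sketch: common prefix length via Arrays.mismatch() (JDK 9+).
public class CommonPrefixSketch {
  static int findCommonPrefix(byte[] left, int leftOff, int leftLen,
                              byte[] right, int rightOff, int rightLen) {
    int i = Arrays.mismatch(left, leftOff, leftOff + leftLen,
                            right, rightOff, rightOff + rightLen);
    // -1 means the two ranges are identical, so the whole (equal) length is common.
    return i < 0 ? leftLen : i;
  }

  public static void main(String[] args) {
    byte[] a = "user123|2019-05-22|click".getBytes();
    byte[] b = "user123|2019-05-23|click".getBytes();
    System.out.println(findCommonPrefix(a, 0, a.length, b, 0, b.length)); // 17
  }
}
{code}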
[jira] [Created] (HBASE-22670) JDK 11 and CellComparator
ramkrishna.s.vasudevan created HBASE-22670: -- Summary: JDK 11 and CellComparator Key: HBASE-22670 URL: https://issues.apache.org/jira/browse/HBASE-22670 Project: HBase Issue Type: Improvement Affects Versions: 3.0.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan This could act as a parent JIRA for analysing JDK 11 and the Comparator impls that we have. Latest JDK has support for SIMD and AVX512, which means it has support for vectorizations. See JDK11's ArraysSupport#mismatch() and vectorizedMismatch(). We also have BufferMismatch#mismatch() which is not publicly exposed but it uses the ArraysSupport#vectorizedMismatch(). Internally vectorizedMismatch() does something similar to what UnsafeComparator#compareToUnsafe does. Will add about the details of the study in further comments. The JDK also exposes new annotations like @HotSpotIntrinsicCandidate and @ForceInline tags that helps in inlining the intrinsic calls. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22608) MVCC's writeEntry didn't complete and make MVCC stuck
[ https://issues.apache.org/jira/browse/HBASE-22608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868280#comment-16868280 ] ramkrishna.s.vasudevan commented on HBASE-22608: Seems an size accounting issue with in-memory compaction - due to some threading issues? Or some wrong accounting. > MVCC's writeEntry didn't complete and make MVCC stuck > - > > Key: HBASE-22608 > URL: https://issues.apache.org/jira/browse/HBASE-22608 > Project: HBase > Issue Type: Bug >Reporter: Guanghao Zhang >Priority: Critical > > {code:java} > 2019-06-20,05:03:44,917 ERROR > [RpcServer.default.RWQ.Fifo.write.handler=61,queue=1,port=22600] > org.apache.hadoop.hbase.regionserver.HRegion: Asked to modify this region's > (xx,,1560481375170.10b01c12d58ce75c9aaf1ac15cc2a7f3.) memStoreSizing to a > negative value which is incorrect. Current memStoreSizing=-1686222, > delta=1489930 > java.lang.Exception > at > org.apache.hadoop.hbase.regionserver.HRegion.checkNegativeMemStoreDataSize(HRegion.java:1317) > at > org.apache.hadoop.hbase.regionserver.HRegion.incMemStoreSize(HRegion.java:1295) > at > org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3316) > at > org.apache.hadoop.hbase.regionserver.HRegion$MutationBatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3821) > at > org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4248) > at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:4179) > at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:4109) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:1059) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicBatchOp(RSRpcServices.java:991) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:954) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2833) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42014) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:132) > at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338) > at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318) > {code} > See > [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L3728] > {code:java} > @Override > public WriteEntry writeMiniBatchOperationsToMemStore( > final MiniBatchOperationInProgress miniBatchOp, @Nullable > WriteEntry writeEntry) > throws IOException { > if (writeEntry == null) { > writeEntry = region.mvcc.begin(); > } > super.writeMiniBatchOperationsToMemStore(miniBatchOp, > writeEntry.getWriteNumber()); > return writeEntry; > } > {code} > super.writeMiniBatchOperationsToMemStore throw a exception and the new > writeEntry cannot be complete and make the MVCC stuck. > > And we meet this problem when enable in-memory compaction. But that should be > another issue and need to dig more. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
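A simplified sketch of the failure mode and the guard it suggests: if an exception escapes after mvcc.begin(), the new write entry still has to be completed or the read point can never advance. MiniMvcc and WriteEntry below are stand-ins invented for the sketch, not the real MultiVersionConcurrencyControl API, so treat this as the shape of a possible fix rather than the committed one.
{code}
// Sketch: never leave a begun write entry dangling when the memstore write throws.
public class MvccCompletionSketch {
  static class WriteEntry { final long writeNumber; WriteEntry(long n) { writeNumber = n; } }

  static class MiniMvcc {
    private long next = 1;
    WriteEntry begin() { return new WriteEntry(next++); }
    void complete(WriteEntry e) { /* advance the read point up to e.writeNumber */ }
  }

  private final MiniMvcc mvcc = new MiniMvcc();

  WriteEntry writeToMemStore(Runnable writeOps, WriteEntry writeEntry) {
    boolean beganHere = false;
    if (writeEntry == null) {
      writeEntry = mvcc.begin();
      beganHere = true;
    }
    try {
      writeOps.run();                 // may throw, e.g. on memstore size accounting errors
    } catch (RuntimeException e) {
      if (beganHere) {
        mvcc.complete(writeEntry);    // do not leave a dangling entry that stalls MVCC
      }
      throw e;
    }
    return writeEntry;
  }
}
{code}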
[jira] [Created] (HBASE-22602) Allow storage policy to be set per column family in PE tool
ramkrishna.s.vasudevan created HBASE-22602: -- Summary: Allow storage policy to be set per column family in PE tool Key: HBASE-22602 URL: https://issues.apache.org/jira/browse/HBASE-22602 Project: HBase Issue Type: Improvement Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.3.0 Currently PE tool does not have support for per column family storage policy support. This JIRA is aimed to add that support. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22539) Potential WAL corruption due to Unsafe.copyMemory usage when DBB are in place
[ https://issues.apache.org/jira/browse/HBASE-22539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856391#comment-16856391 ] ramkrishna.s.vasudevan commented on HBASE-22539: [~wchevreuil] Have you verified that all the calls that come to the ByteBufferWriterStream#write() has len which is always less than the buffSize? Because if there is something wrong there - then the sanity code that you have where you directly read from the ByteBuffer 'b' to testBuf will work fine but not the other one. I think by default SimpleRpcServer also uses pool and that is also offheap. This is quite difficult to dig in and great work. > Potential WAL corruption due to Unsafe.copyMemory usage when DBB are in place > - > > Key: HBASE-22539 > URL: https://issues.apache.org/jira/browse/HBASE-22539 > Project: HBase > Issue Type: Bug > Components: rpc, wal >Affects Versions: 2.1.1 >Reporter: Wellington Chevreuil >Priority: Blocker > > Summary > We had been chasing a WAL corruption issue reported on one of our customers > deployments running release 2.1.1 (CDH 6.1.0). After providing a custom > modified jar with the extra sanity checks implemented by HBASE-21401 applied > on some code points, plus additional debugging messages, we believe it is > related to DirectByteBuffer usage, and Unsafe copy from offheap memory to > on-heap array triggered > [here|https://github.com/apache/hbase/blob/branch-2.1/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java#L1157], > such as when writing into a non ByteBufferWriter type, as done > [here|https://github.com/apache/hbase/blob/branch-2.1/hbase-common/src/main/java/org/apache/hadoop/hbase/io/ByteBufferWriterOutputStream.java#L84]. > More details on the following comment. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22532) There's still too much cpu wasting on validating checksum even if buffer.size=65KB
[ https://issues.apache.org/jira/browse/HBASE-22532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855300#comment-16855300 ] ramkrishna.s.vasudevan commented on HBASE-22532: Probably we should see the size of per block getting written and while reading see what are the offsets and length we pass to HDFS and then ascertain if that matches with the dataLength you got here. Probably we are reading more (approx 2 blocks). good one [~openinx]. > There's still too much cpu wasting on validating checksum even if > buffer.size=65KB > -- > > Key: HBASE-22532 > URL: https://issues.apache.org/jira/browse/HBASE-22532 > Project: HBase > Issue Type: Sub-task >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Attachments: async-prof-pid-27827-cpu-3.svg, > async-prof-pid-64695-cpu-1.svg > > > After disabled the block cache, and with the following config: > {code} > # Disable the block cache > hfile.block.cache.size=0 > hbase.ipc.server.allocator.buffer.size=66560 > hbase.ipc.server.reservoir.minimal.allocating.size=0 > {code} > The ByteBuff for block should be expected to be a SingleByteBuff, which will > use the hadoop native lib to validate the checksum, while in the cpu flame > graph > [async-prof-pid-27827-cpu-3.svg|https://issues.apache.org/jira/secure/attachment/12970683/async-prof-pid-27827-cpu-3.svg], > we can still see that about 32% CPU wasted on PureJavaCrc32#update, which > means it's not using the faster hadoop native lib. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22531) The HFileReaderImpl#shouldUseHeap return the incorrect true when disabled BlockCache
[ https://issues.apache.org/jira/browse/HBASE-22531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854469#comment-16854469 ] ramkrishna.s.vasudevan commented on HBASE-22531: Nice one. +1. > The HFileReaderImpl#shouldUseHeap return the incorrect true when disabled > BlockCache > - > > Key: HBASE-22531 > URL: https://issues.apache.org/jira/browse/HBASE-22531 > Project: HBase > Issue Type: Sub-task >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Attachments: HBASE-22531.HBASE-21879.v1.patch, > async-prof-pid-13311-alloc-4.svg > > > I'm having a benchmark with block cache disabled for HBASE-21879 branch. > Just caurious about why still so many heap allocation in the heap allocation > flame graph [async-prof-pid-13311-alloc-4.svg | > https://issues.apache.org/jira/secure/attachment/12970648/async-prof-pid-13311-alloc-4.svg], >actually, I've set the following config, which means all allocation should > be offheap, while it's not: > {code} > # Disable the block cache > hfile.block.cache.size=0 > hbase.ipc.server.reservoir.minimal.allocating.size=0 # Let all allocation > from pooled allocator. > {code} > Checked the code, I found the problem here: > {code} > private boolean shouldUseHeap(BlockType expectedBlockType) { > if (cacheConf.getBlockCache() == null) { > return false; > } else if (!cacheConf.isCombinedBlockCache()) { > // Block to cache in LruBlockCache must be an heap one. So just > allocate block memory from > // heap for saving an extra off-heap to heap copying. > return true; > } > return expectedBlockType != null && !expectedBlockType.isData(); > } > {code} > Say, the CacheConfig#getBlockCache will return a Optional, > which is always non-null: > {code} > /** >* Returns the block cache. >* >* @return the block cache, or null if caching is completely disabled >*/ > public Optional getBlockCache() { > return Optional.ofNullable(this.blockCache); > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
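The fix implied by the snippet above can be sketched as follows. BlockCacheStub and the boolean parameters are simplified stand-ins for the CacheConfig API, so this illustrates the Optional check only, not the committed patch.
{code}
import java.util.Optional;

// Sketch: "cache completely disabled" must be detected with isPresent(), not a null check.
public class ShouldUseHeapSketch {
  interface BlockCacheStub {}

  static boolean shouldUseHeap(Optional<BlockCacheStub> blockCache,
                               boolean combinedBlockCache,
                               boolean expectedBlockTypeIsData) {
    if (!blockCache.isPresent()) {
      return false;                    // no cache at all: allocate off-heap from the pool
    } else if (!combinedBlockCache) {
      return true;                     // LruBlockCache keeps on-heap blocks; save the extra copy
    }
    return !expectedBlockTypeIsData;   // combined cache: only non-data blocks stay on heap
  }

  public static void main(String[] args) {
    // Block cache disabled: should now go off-heap (the old null check wrongly returned true).
    System.out.println(shouldUseHeap(Optional.empty(), false, true));  // false
  }
}
{code}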
[jira] [Commented] (HBASE-22483) Maybe it's better to use 65KB as the default buffer size in ByteBuffAllocator
[ https://issues.apache.org/jira/browse/HBASE-22483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852789#comment-16852789 ] ramkrishna.s.vasudevan commented on HBASE-22483: Excellent results !! Seems QPS is more stable and all the p9X latency are also stable. > Maybe it's better to use 65KB as the default buffer size in ByteBuffAllocator > - > > Key: HBASE-22483 > URL: https://issues.apache.org/jira/browse/HBASE-22483 > Project: HBase > Issue Type: Sub-task >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Attachments: 121240.stack, BucketCacheWriter-is-busy.png, > checksum-stacktrace.png, with-buffer-size-64KB.png, with-buffer-size-65KB.png > > > There're some reason why it's better to choose 65KB as the default buffer > size: > 1. Almost all of the data block have a block size: 64KB + delta, whose delta > is very small, depends on the size of lastKeyValue. If we use the default > hbase.ipc.server.allocator.buffer.size=64KB, then each block will be > allocated as a MultiByteBuff: one 64KB DirectByteBuffer and delta bytes > HeapByteBuffer, the HeapByteBuffer will increase the GC pressure. Ideally, we > should let the data block to be allocated as a SingleByteBuff, it has simpler > data structure, faster access speed, less heap usage... > 2. In my benchmark, I found some checksum stack traces . (see > [checksum-stacktrace.png > |https://issues.apache.org/jira/secure/attachment/12969905/checksum-stacktrace.png]) > > Since the block are MultiByteBuff, so we have to calculate the checksum by > an temp heap copying ( see HBASE-21917), while if we're a SingleByteBuff, we > can speed the checksum by calling the hadoop' checksum in native lib, it's > more faster. > 3. Seems the BucketCacheWriters were always busy because of the higher cost > of copying from MultiByteBuff to DirectByteBuffer. For SingleByteBuff, we > can just use the unsafe array copying while for MultiByteBuff we have to copy > byte one by one. > Anyway, I will give a benchmark for this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16849899#comment-16849899 ] ramkrishna.s.vasudevan commented on HBASE-22422: [~openinx] I just asked a question in the PR. > Retain an ByteBuff with refCnt=0 when getBlock from LRUCache > > > Key: HBASE-22422 > URL: https://issues.apache.org/jira/browse/HBASE-22422 > Project: HBase > Issue Type: Sub-task > Components: BlockCache >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Attachments: 0001-debug2.patch, 0001-debug2.patch, 0001-debug2.patch, > 0001-debug3.patch, 0001-debug4.patch, > HBASE-22422-qps-after-fix-the-zero-retain-bug.png, > HBASE-22422.HBASE-21879.v01.patch, HBASE-22422.HBASE-21879.v02.patch, > LRUBlockCache-getBlock.png, debug.patch, > failed-to-check-positive-on-web-ui.png, image-2019-05-15-12-00-03-641.png > > > After runing YCSB scan/get benchmark in our XiaoMi cluster, we found the get > QPS dropped from 25000/s to hunderds per second in a cluster with five > nodes. > After enable the debug log at YCSB client side, I found the following > stacktrace , see > https://issues.apache.org/jira/secure/attachment/12968745/image-2019-05-15-12-00-03-641.png. > > After looking into the stractrace, I can ensure that the zero refCnt block is > an intermedia index block, see [2] http://hbase.apache.org/images/hfilev2.png > Need a patch to fix this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22480) Get block from BlockCache once and return this block to BlockCache twice make ref count error.
[ https://issues.apache.org/jira/browse/HBASE-22480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16849319#comment-16849319 ] ramkrishna.s.vasudevan commented on HBASE-22480: [~aoxiang] So here there is no negative ref counting, right? As per the v2 patch, maybe we should add to prevBlocks and then return them? > Get block from BlockCache once and return this block to BlockCache twice make > ref count error. > -- > > Key: HBASE-22480 > URL: https://issues.apache.org/jira/browse/HBASE-22480 > Project: HBase > Issue Type: Sub-task >Reporter: binlijin >Assignee: binlijin >Priority: Major > Attachments: HBASE-22480-master-v1.patch, HBASE-22480-master-v2.patch > > > After debugging HBASE-22433, i find the problem it is that we get a block > from BucketCache once and return this block to BucketCache twice and make the > ref count error, sometimes the refCount can be negative. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22480) Get block from BlockCache once and return this block to BlockCache twice make ref count error.
[ https://issues.apache.org/jira/browse/HBASE-22480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16849089#comment-16849089 ] ramkrishna.s.vasudevan commented on HBASE-22480: So here it means that in
{code}
HFileBlock seekToBlock = reader.getDataBlockIndexReader().seekToDataBlock(key, curBlock,
    cacheBlocks, pread, isCompaction, reader.getEffectiveEncodingInCache(isCompaction));
if (seekToBlock == null) {
  return false;
}
{code}
the seekToDataBlock() returns the same curBlock, and so you want to keep the curBlock from being returned? But just after this the curBlock is anyway updated with a new block, right? > Get block from BlockCache once and return this block to BlockCache twice make > ref count error. > -- > > Key: HBASE-22480 > URL: https://issues.apache.org/jira/browse/HBASE-22480 > Project: HBase > Issue Type: Sub-task >Reporter: binlijin >Assignee: binlijin >Priority: Major > Attachments: HBASE-22480-master-v1.patch, HBASE-22480-master-v2.patch > > > After debugging HBASE-22433, i find the problem it is that we get a block > from BucketCache once and return this block to BucketCache twice and make the > ref count error, sometimes the refCount can be negative. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
[ https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847259#comment-16847259 ] ramkrishna.s.vasudevan commented on HBASE-22422: bq.Understand now, it's a cnocurrent bug in RAMCache, say if thread1 try to getBlock as following: Good one. > Retain an ByteBuff with refCnt=0 when getBlock from LRUCache > > > Key: HBASE-22422 > URL: https://issues.apache.org/jira/browse/HBASE-22422 > Project: HBase > Issue Type: Sub-task > Components: BlockCache >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Attachments: 0001-debug2.patch, 0001-debug2.patch, 0001-debug2.patch, > 0001-debug3.patch, 0001-debug4.patch, HBASE-22422.HBASE-21879.v01.patch, > LRUBlockCache-getBlock.png, debug.patch, > failed-to-check-positive-on-web-ui.png, image-2019-05-15-12-00-03-641.png > > > After runing YCSB scan/get benchmark in our XiaoMi cluster, we found the get > QPS dropped from 25000/s to hunderds per second in a cluster with five > nodes. > After enable the debug log at YCSB client side, I found the following > stacktrace , see > https://issues.apache.org/jira/secure/attachment/12968745/image-2019-05-15-12-00-03-641.png. > > After looking into the stractrace, I can ensure that the zero refCnt block is > an intermedia index block, see [2] http://hbase.apache.org/images/hfilev2.png > Need a patch to fix this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-22448) Scan is slow for Multiple Column prefixes
[ https://issues.apache.org/jira/browse/HBASE-22448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-22448: --- Attachment: org.apache.hadoop.hbase.filter.TestSlowColumnPrefix-output.zip > Scan is slow for Multiple Column prefixes > - > > Key: HBASE-22448 > URL: https://issues.apache.org/jira/browse/HBASE-22448 > Project: HBase > Issue Type: Bug > Components: Scanners >Affects Versions: 1.4.8, 1.4.9 >Reporter: Karthick >Assignee: Zheng Hu >Priority: Critical > Labels: prefix, scan, scanner > Fix For: 1.5.0, 1.4.10 > > Attachments: 0001-benchmark-UT.patch, HBaseFileImport.java, > org.apache.hadoop.hbase.filter.TestSlowColumnPrefix-output.zip, > qualifiers.txt, scanquery.txt > > > While scanning a row (around 10 lakhs columns) with 100 column prefixes, it > takes around 4 seconds in hbase-1.2.5 and when the same query is executed in > hbase-1.4.9 it takes around 50 seconds. > Is there any way to optimise this? > > *P.S:* > We have applied the patch provided in > [-HBASE-21620-|https://jira.apache.org/jira/browse/HBASE-21620] and > [-HBASE-21734-|https://jira.apache.org/jira/browse/HBASE-21734] . Attached > *qualifiers*.*txt* file which contains the column keys. Use the > *HBaseFileImport.java* file provided to populate in your table and use > *scanquery.txt* to query. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-22448) Scan is slow for Multiple Column prefixes
[ https://issues.apache.org/jira/browse/HBASE-22448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845781#comment-16845781 ] ramkrishna.s.vasudevan edited comment on HBASE-22448 at 5/22/19 11:22 AM: -- Attached the output with some sysouts. It seems we are doing a lot of SEEK_USING_HINTS for every column that we have already visited, for each of the cells. And this goes on for every column. was (Author: ram_krish): Attached the output with some sysouts. Seems we are doing lot of SEEK_USING_HINTS for every column that we already visited for each for the cells. And this goes on. > Scan is slow for Multiple Column prefixes > - > > Key: HBASE-22448 > URL: https://issues.apache.org/jira/browse/HBASE-22448 > Project: HBase > Issue Type: Bug > Components: Scanners >Affects Versions: 1.4.8, 1.4.9 >Reporter: Karthick >Assignee: Zheng Hu >Priority: Critical > Labels: prefix, scan, scanner > Fix For: 1.5.0, 1.4.10 > > Attachments: 0001-benchmark-UT.patch, HBaseFileImport.java, > qualifiers.txt, scanquery.txt > > > While scanning a row (around 10 lakhs columns) with 100 column prefixes, it > takes around 4 seconds in hbase-1.2.5 and when the same query is executed in > hbase-1.4.9 it takes around 50 seconds. > Is there any way to optimise this? > > *P.S:* > We have applied the patch provided in > [-HBASE-21620-|https://jira.apache.org/jira/browse/HBASE-21620] and > [-HBASE-21734-|https://jira.apache.org/jira/browse/HBASE-21734] . Attached > *qualifiers*.*txt* file which contains the column keys. Use the > *HBaseFileImport.java* file provided to populate in your table and use > *scanquery.txt* to query. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22448) Scan is slow for Multiple Column prefixes
[ https://issues.apache.org/jira/browse/HBASE-22448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845781#comment-16845781 ] ramkrishna.s.vasudevan commented on HBASE-22448: Attached the output with some sysouts. Seems we are doing lot of SEEK_USING_HINTS for every column that we already visited for each for the cells. And this goes on. > Scan is slow for Multiple Column prefixes > - > > Key: HBASE-22448 > URL: https://issues.apache.org/jira/browse/HBASE-22448 > Project: HBase > Issue Type: Bug > Components: Scanners >Affects Versions: 1.4.8, 1.4.9 >Reporter: Karthick >Assignee: Zheng Hu >Priority: Critical > Labels: prefix, scan, scanner > Fix For: 1.5.0, 1.4.10 > > Attachments: 0001-benchmark-UT.patch, HBaseFileImport.java, > qualifiers.txt, scanquery.txt > > > While scanning a row (around 10 lakhs columns) with 100 column prefixes, it > takes around 4 seconds in hbase-1.2.5 and when the same query is executed in > hbase-1.4.9 it takes around 50 seconds. > Is there any way to optimise this? > > *P.S:* > We have applied the patch provided in > [-HBASE-21620-|https://jira.apache.org/jira/browse/HBASE-21620] and > [-HBASE-21734-|https://jira.apache.org/jira/browse/HBASE-21734] . Attached > *qualifiers*.*txt* file which contains the column keys. Use the > *HBaseFileImport.java* file provided to populate in your table and use > *scanquery.txt* to query. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22448) Scan is slow for Multiple Column prefixes
[ https://issues.apache.org/jira/browse/HBASE-22448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845592#comment-16845592 ] ramkrishna.s.vasudevan commented on HBASE-22448: Seems so. Previously only the list of ColumnPrefixFilters was doing a comparison on the qualifier and prefix, but now after that we again seem to do more comparisons, particularly when the prefixes added to the filter list are not sorted. [~openinx] is that correct? > Scan is slow for Multiple Column prefixes > - > > Key: HBASE-22448 > URL: https://issues.apache.org/jira/browse/HBASE-22448 > Project: HBase > Issue Type: Bug > Components: Scanners >Affects Versions: 1.4.8, 1.4.9 >Reporter: Karthick >Assignee: Zheng Hu >Priority: Critical > Labels: prefix, scan, scanner > Fix For: 1.5.0, 1.4.10 > > Attachments: 0001-benchmark-UT.patch, HBaseFileImport.java, > qualifiers.txt, scanquery.txt > > > While scanning a row (around 10 lakhs columns) with 100 column prefixes, it > takes around 4 seconds in hbase-1.2.5 and when the same query is executed in > hbase-1.4.9 it takes around 50 seconds. > Is there any way to optimise this? > > *P.S:* > We have applied the patch provided in > [-HBASE-21620-|https://jira.apache.org/jira/browse/HBASE-21620] and > [-HBASE-21734-|https://jira.apache.org/jira/browse/HBASE-21734] . Attached > *qualifiers*.*txt* file which contains the column keys. Use the > *HBaseFileImport.java* file provided to populate in your table and use > *scanquery.txt* to query. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
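For what it is worth, a small client-side sketch of the setup being discussed: building the MUST_PASS_ONE FilterList from prefixes sorted lexicographically, so the seek hints move forward monotonically. The prefix values and family name are illustrative, and whether sorting alone restores the 1.2-level performance depends on the server-side fix.
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: feed ColumnPrefixFilters to the FilterList in sorted order.
public class SortedPrefixScanSketch {
  public static void main(String[] args) {
    String[] prefixes = {"pfx_042", "pfx_007", "pfx_100"};
    Arrays.sort(prefixes);                         // sort before building the filter list

    List<Filter> filters = new ArrayList<>();
    for (String p : prefixes) {
      filters.add(new ColumnPrefixFilter(Bytes.toBytes(p)));
    }
    Scan scan = new Scan()
        .addFamily(Bytes.toBytes("d"))
        .setFilter(new FilterList(FilterList.Operator.MUST_PASS_ONE, filters));
    // ... table.getScanner(scan) would then stream only the matching qualifiers ...
  }
}
{code}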
[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery
[ https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845506#comment-16845506 ] ramkrishna.s.vasudevan commented on HBASE-22072: [~lhofhansl] The specific problem here was happening in 2.0 branches due to close() calls that happens twice - once during the scan and other one during the shipped() call to release the block ref in the block cache. In 1.3 that problem does not exist as far as I can see and the test case also did not fail. Do you see any other potential issue ? I can help here . > High read/write intensive regions may cause long crash recovery > --- > > Key: HBASE-22072 > URL: https://issues.apache.org/jira/browse/HBASE-22072 > Project: HBase > Issue Type: Bug > Components: Performance, Recovery >Affects Versions: 2.0.0 >Reporter: Pavel >Assignee: ramkrishna.s.vasudevan >Priority: Major > Labels: compaction > Fix For: 2.2.0, 2.3.0, 2.0.6, 2.1.5 > > Attachments: HBASE-22072.HBASE-21879-v1.patch > > > Compaction of high read loaded region may leave compacted files undeleted > because of existing scan references: > INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted > file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file > has reference, isReferencedInReads=true, refCount=1, skipping for now > If region is either high write loaded this happens quite often and region may > have few storefiles and tons of undeleted compacted hdfs files. > Region keeps all that files (in my case thousands) untill graceful region > closing procedure, which ignores existing references and drop obsolete files. > It works fine unless consuming some extra hdfs space, but only in case of > normal region closing. If region server crashes than new region server, > responsible for that overfiling region, reads hdfs folder and try to deal > with all undeleted files, producing tons of storefiles, compaction tasks and > consuming abnormal amount of memory, wich may lead to OutOfMemory Exception > and further region servers crash. This stops writing to region because number > of storefiles reach *hbase.hstore.blockingStoreFiles* limit, forces high GC > duty and may take hours to compact all files into working set of files. > Workaround is a periodically check hdfs folders files count and force region > assign for ones with too many files. > It could be nice if regionserver had a setting similar to > hbase.hstore.blockingStoreFiles and invoke attempt to drop undeleted > compacted files if number of files reaches this setting. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22433) Corrupt hfile data
[ https://issues.apache.org/jira/browse/HBASE-22433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843720#comment-16843720 ] ramkrishna.s.vasudevan commented on HBASE-22433: Is this some how related to https://issues.apache.org/jira/browse/HBASE-19511. There we forced a onheap copy to avoid this ref counting issue. Seems the code has changed a lot now. > Corrupt hfile data > -- > > Key: HBASE-22433 > URL: https://issues.apache.org/jira/browse/HBASE-22433 > Project: HBase > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: binlijin >Priority: Critical > > We use 2.2.0 version and encounter corrupt cell data. > {code} > 2019-05-15 22:53:59,354 ERROR > [regionserver/hb-mbasedata-14:16020-longCompactions-1557048533421] > regionserver.CompactSplit: Compaction failed > region=mktdm_id_src,9990,1557681762973.255e9adde013e370deb595c59a7285c3., > storeName=o, priority=196, startTime=1557931927314 > java.lang.IllegalStateException: Invalid currKeyLen 1700752997 or > currValueLen 2002739568. Block offset: 70452918, block length: 66556, > position: 42364 (without header). > at > org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.checkKeyValueLen(HFileReaderImpl.java:1182) > at > org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readKeyValueLen(HFileReaderImpl.java:628) > at > org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl._next(HFileReaderImpl.java:1080) > at > org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.next(HFileReaderImpl.java:1097) > at > org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:208) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:120) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:644) > at > org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:386) > at > org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:326) > at > org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:65) > at > org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:126) > at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1429) > at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2231) > at > org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:629) > at > org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:671) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > > 2019-05-15 23:14:24,143 ERROR > [regionserver/hb-mbasedata-14:16020-longCompactions-1557048533422] > regionserver.CompactSplit: Compaction failed > region=mktdm_id_src,9fdee4,1557681762973.1782aebb83eae551e7bdfc2bfa13eb3d., > storeName=o, priority=194, startTime=1557932726849 > java.lang.RuntimeException: Unknown code 98 > at org.apache.hadoop.hbase.KeyValue$Type.codeToType(KeyValue.java:274) > at org.apache.hadoop.hbase.CellUtil.getCellKeyAsString(CellUtil.java:1307) > at > org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.getMidpoint(HFileWriterImpl.java:383) > at > org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishBlock(HFileWriterImpl.java:343) > at > 
org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:603) > at > org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:376) > at > org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.abortWriter(DefaultCompactor.java:98) > at > org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.abortWriter(DefaultCompactor.java:42) > at > org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:335) > at > org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:65) > at > org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:126) > at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1429) > at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2231) > at > org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:629) > at > org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:671) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at
[jira] [Commented] (HBASE-22412) Improve the metrics in ByteBuffAllocator
[ https://issues.apache.org/jira/browse/HBASE-22412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843679#comment-16843679 ] ramkrishna.s.vasudevan commented on HBASE-22412: [~openinx] Patch looks good to me. So what was the motivation for this? From the example case you quoted in the description, you want to show that the bytes actually allocated on heap are much smaller than the allocation count alone suggests? > Improve the metrics in ByteBuffAllocator > > > Key: HBASE-22412 > URL: https://issues.apache.org/jira/browse/HBASE-22412 > Project: HBase > Issue Type: Sub-task >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Attachments: HBASE-22412.HBASE-21879.v1.patch, > HBASE-22412.HBASE-21879.v2.patch, HBASE-22412.HBASE-21879.v3.patch, JMX.png, > web-UI.png > > > Address the comment in HBASE-22387: > bq. The ByteBuffAllocator#getFreeBufferCount will be O(N) complexity, because > the buffers here is a ConcurrentLinkedQueue. It's worth filing an issue for > this. > Also I think we should use the allocated bytes instead of the allocation number to > evaluate the heap allocation percent, so that we can decide whether the > ByteBuffer is too small and whether it will have higher GC pressure. Assume the > case: the buffer size is 64KB, and each time we have a block with 65KB, then > it will have one heap allocation (1KB) and one pool allocation (64KB). If we > only consider the allocation count, then the heap allocation ratio will be 1 / > (1 + 1) = 50%, but if we consider the allocated bytes, the ratio > will be 1KB / 65KB = 1.5%. > If the heap allocation percent is less than > hbase.ipc.server.reservoir.minimal.allocating.size / > hbase.ipc.server.allocator.buffer.size, then the allocator works fine, > otherwise it's overloaded. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
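The arithmetic behind the description's example is easy to restate; the snippet below is just that 64KB/65KB case spelled out, not HBase code:
{code}
public class AllocRatioExample {
  public static void main(String[] args) {
    long poolAllocBytes = 64 * 1024; // one 64KB allocation served by the pool
    long heapAllocBytes = 1 * 1024;  // the 1KB overflow that went on heap

    double byCount = 1.0 / (1 + 1);                                            // 50%
    double byBytes = (double) heapAllocBytes / (heapAllocBytes + poolAllocBytes); // ~1.5%

    System.out.printf("heap ratio by allocation count: %.1f%%%n", byCount * 100);
    System.out.printf("heap ratio by allocated bytes:  %.1f%%%n", byBytes * 100);
  }
}
{code}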
[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery
[ https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836257#comment-16836257 ] ramkrishna.s.vasudevan commented on HBASE-22072: Verified in the branch-1 series. This issue does not exist there, because we don't have the shipped() call and there is only one version of close(). On seeing 'closing' as true, updateReaders() does not proceed with updating the various scanners in the StoreScanner. The test attached in this patch does not fail in branch-1. > High read/write intensive regions may cause long crash recovery > --- > > Key: HBASE-22072 > URL: https://issues.apache.org/jira/browse/HBASE-22072 > Project: HBase > Issue Type: Bug > Components: Performance, Recovery >Affects Versions: 2.1.2 >Reporter: Pavel >Assignee: ramkrishna.s.vasudevan >Priority: Major > Labels: compaction > Attachments: HBASE-22072.HBASE-21879-v1.patch > > > Compaction of high read loaded region may leave compacted files undeleted > because of existing scan references: > INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted > file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file > has reference, isReferencedInReads=true, refCount=1, skipping for now > If region is either high write loaded this happens quite often and region may > have few storefiles and tons of undeleted compacted hdfs files. > Region keeps all that files (in my case thousands) untill graceful region > closing procedure, which ignores existing references and drop obsolete files. > It works fine unless consuming some extra hdfs space, but only in case of > normal region closing. If region server crashes than new region server, > responsible for that overfiling region, reads hdfs folder and try to deal > with all undeleted files, producing tons of storefiles, compaction tasks and > consuming abnormal amount of memory, wich may lead to OutOfMemory Exception > and further region servers crash. This stops writing to region because number > of storefiles reach *hbase.hstore.blockingStoreFiles* limit, forces high GC > duty and may take hours to compact all files into working set of files. > Workaround is a periodically check hdfs folders files count and force region > assign for ones with too many files. > It could be nice if regionserver had a setting similar to > hbase.hstore.blockingStoreFiles and invoke attempt to drop undeleted > compacted files if number of files reaches this setting. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-22072) High read/write intensive regions may cause long crash recovery
[ https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-22072: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.1.5 2.0.6 2.3.0 2.2.0 Status: Resolved (was: Patch Available) > High read/write intensive regions may cause long crash recovery > --- > > Key: HBASE-22072 > URL: https://issues.apache.org/jira/browse/HBASE-22072 > Project: HBase > Issue Type: Bug > Components: Performance, Recovery >Affects Versions: 2.1.2 >Reporter: Pavel >Assignee: ramkrishna.s.vasudevan >Priority: Major > Labels: compaction > Fix For: 2.2.0, 2.3.0, 2.0.6, 2.1.5 > > Attachments: HBASE-22072.HBASE-21879-v1.patch > > > Compaction of high read loaded region may leave compacted files undeleted > because of existing scan references: > INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted > file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file > has reference, isReferencedInReads=true, refCount=1, skipping for now > If region is either high write loaded this happens quite often and region may > have few storefiles and tons of undeleted compacted hdfs files. > Region keeps all that files (in my case thousands) untill graceful region > closing procedure, which ignores existing references and drop obsolete files. > It works fine unless consuming some extra hdfs space, but only in case of > normal region closing. If region server crashes than new region server, > responsible for that overfiling region, reads hdfs folder and try to deal > with all undeleted files, producing tons of storefiles, compaction tasks and > consuming abnormal amount of memory, wich may lead to OutOfMemory Exception > and further region servers crash. This stops writing to region because number > of storefiles reach *hbase.hstore.blockingStoreFiles* limit, forces high GC > duty and may take hours to compact all files into working set of files. > Workaround is a periodically check hdfs folders files count and force region > assign for ones with too many files. > It could be nice if regionserver had a setting similar to > hbase.hstore.blockingStoreFiles and invoke attempt to drop undeleted > compacted files if number of files reaches this setting. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery
[ https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835679#comment-16835679 ] ramkrishna.s.vasudevan commented on HBASE-22072: Pushed to all the branch-2 lines. Need to rebase the patch for branch-1 series. Will resolve it once I push it there. Thanks for all the reviews. > High read/write intensive regions may cause long crash recovery > --- > > Key: HBASE-22072 > URL: https://issues.apache.org/jira/browse/HBASE-22072 > Project: HBase > Issue Type: Bug > Components: Performance, Recovery >Affects Versions: 2.1.2 >Reporter: Pavel >Assignee: ramkrishna.s.vasudevan >Priority: Major > Labels: compaction > Attachments: HBASE-22072.HBASE-21879-v1.patch > > > Compaction of high read loaded region may leave compacted files undeleted > because of existing scan references: > INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted > file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file > has reference, isReferencedInReads=true, refCount=1, skipping for now > If region is either high write loaded this happens quite often and region may > have few storefiles and tons of undeleted compacted hdfs files. > Region keeps all that files (in my case thousands) untill graceful region > closing procedure, which ignores existing references and drop obsolete files. > It works fine unless consuming some extra hdfs space, but only in case of > normal region closing. If region server crashes than new region server, > responsible for that overfiling region, reads hdfs folder and try to deal > with all undeleted files, producing tons of storefiles, compaction tasks and > consuming abnormal amount of memory, wich may lead to OutOfMemory Exception > and further region servers crash. This stops writing to region because number > of storefiles reach *hbase.hstore.blockingStoreFiles* limit, forces high GC > duty and may take hours to compact all files into working set of files. > Workaround is a periodically check hdfs folders files count and force region > assign for ones with too many files. > It could be nice if regionserver had a setting similar to > hbase.hstore.blockingStoreFiles and invoke attempt to drop undeleted > compacted files if number of files reaches this setting. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21921) Notify users if the ByteBufAllocator is always allocating ByteBuffers from heap which means the increacing GC pressure
[ https://issues.apache.org/jira/browse/HBASE-21921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834396#comment-16834396 ] ramkrishna.s.vasudevan commented on HBASE-21921: Good one. > Notify users if the ByteBufAllocator is always allocating ByteBuffers from > heap which means the increacing GC pressure > -- > > Key: HBASE-21921 > URL: https://issues.apache.org/jira/browse/HBASE-21921 > Project: HBase > Issue Type: Sub-task >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Minor > Attachments: HBASE-21921.HBASE-21879.v01.patch, > HBASE-21921.HBASE-21879.v02.patch, jmx-metrics.png, web-ui.png > > > As the javadoc of ByteBuffAllocator says: > {code} > There's possible that the desired memory size is large than ByteBufferPool > has, we'll downgrade to allocate ByteBuffers from heap which meaning the GC > pressure may increase again. Of course, an better way is increasing the > ByteBufferPool size if we detected this case. > {code} > So I think we need some messages to remind the user that an larger > ByteBufferPool size may be better if the allocator allocate ByteBuffer from > heap frequently. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22090) The HFileBlock#CacheableDeserializer should pass ByteBuffAllocator to the newly created HFileBlock
[ https://issues.apache.org/jira/browse/HBASE-22090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826804#comment-16826804 ] ramkrishna.s.vasudevan commented on HBASE-22090: bq. private final ByteBuffAllocator allocator; BucketEntry will have one more reference now. As Anoop said in RB this may be adding some more overhead. Is it better to have BucketEntry inside BucketCache only so that the bucket cache can have ref to the allocator ? > The HFileBlock#CacheableDeserializer should pass ByteBuffAllocator to the > newly created HFileBlock > -- > > Key: HBASE-22090 > URL: https://issues.apache.org/jira/browse/HBASE-22090 > Project: HBase > Issue Type: Sub-task >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Attachments: HBASE-22090.HBASE-21879.v01.patch > > > In HBASE-22005, we have the following TODO in > HFileBlock#CacheableDeserializer: > {code} > public static final class BlockDeserializer implements > CacheableDeserializer { > private BlockDeserializer() { > } > @Override > public HFileBlock deserialize(ByteBuff buf, boolean reuse, MemoryType > memType) > throws IOException { >// > // TODO make the newly created HFileBlock use the off-heap allocator, > Need change the > // deserializer or change the deserialize interface. > return new HFileBlock(newByteBuff, usesChecksum, memType, offset, > nextBlockOnDiskSize, null, > ByteBuffAllocator.HEAP); > } > {code} > Should use the global ByteBuffAllocator here rather than HEAP allocator, as > the TODO said, we need to adjust the interface of deserializer. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
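As a hedged sketch of the alternative raised in that comment - let only BucketCache hold the ByteBuffAllocator reference and hand it to the entry at the point of use, instead of every BucketEntry carrying its own field. The class and method names below are illustrative, not the actual HBase types:
{code}
class ByteBuffAllocator { }

class BucketCache {
  private final ByteBuffAllocator allocator;

  BucketCache(ByteBuffAllocator allocator) {
    this.allocator = allocator;
  }

  // Entry stays a plain value object with no allocator reference.
  static class BucketEntry {
    long offset;
    int length;
  }

  Object readBlock(BucketEntry entry) {
    // the cache supplies its allocator only where deserialization needs it
    return deserialize(entry, allocator);
  }

  private Object deserialize(BucketEntry entry, ByteBuffAllocator allocator) {
    // ... read entry.length bytes at entry.offset and wrap them via the allocator
    return null;
  }
}
{code}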
[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery
[ https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821671#comment-16821671 ] ramkrishna.s.vasudevan commented on HBASE-22072: bq.Also in an earlier comment I have raised some more issues where we just open scanner on some files and do not use those scanner as the are TTL not matching for the scan . Well that is not happening in this issue still they are issues. I have not checked this comment or this code part. If at all there I think we can fix in new issue. > High read/write intensive regions may cause long crash recovery > --- > > Key: HBASE-22072 > URL: https://issues.apache.org/jira/browse/HBASE-22072 > Project: HBase > Issue Type: Bug > Components: Performance, Recovery >Affects Versions: 2.1.2 >Reporter: Pavel >Assignee: ramkrishna.s.vasudevan >Priority: Major > Attachments: HBASE-22072.HBASE-21879-v1.patch > > > Compaction of high read loaded region may leave compacted files undeleted > because of existing scan references: > INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted > file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file > has reference, isReferencedInReads=true, refCount=1, skipping for now > If region is either high write loaded this happens quite often and region may > have few storefiles and tons of undeleted compacted hdfs files. > Region keeps all that files (in my case thousands) untill graceful region > closing procedure, which ignores existing references and drop obsolete files. > It works fine unless consuming some extra hdfs space, but only in case of > normal region closing. If region server crashes than new region server, > responsible for that overfiling region, reads hdfs folder and try to deal > with all undeleted files, producing tons of storefiles, compaction tasks and > consuming abnormal amount of memory, wich may lead to OutOfMemory Exception > and further region servers crash. This stops writing to region because number > of storefiles reach *hbase.hstore.blockingStoreFiles* limit, forces high GC > duty and may take hours to compact all files into working set of files. > Workaround is a periodically check hdfs folders files count and force region > assign for ones with too many files. > It could be nice if regionserver had a setting similar to > hbase.hstore.blockingStoreFiles and invoke attempt to drop undeleted > compacted files if number of files reaches this setting. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery
[ https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821670#comment-16821670 ] ramkrishna.s.vasudevan commented on HBASE-22072: bq.But the scanner is still not over. And so the scanner did not get a chance to update the readers. So we can not really do this immediate return model. This is what I tried to check in the code. As per my code reading it seems once a StoreScanner says close(false) in the next() flow or reseek() flow it means from the region level there are not going to be any other scan that is going to happen from that StoreScanner. Finally after a shipped call this store scanner will be closed when the scan completes. So I felt it is better we just don't update the readers in that case. And that is why if at all there is a close() call we just avoid the updateReaders itself. The other way to look at that is by making 'closing' true in all cases. > High read/write intensive regions may cause long crash recovery > --- > > Key: HBASE-22072 > URL: https://issues.apache.org/jira/browse/HBASE-22072 > Project: HBase > Issue Type: Bug > Components: Performance, Recovery >Affects Versions: 2.1.2 >Reporter: Pavel >Assignee: ramkrishna.s.vasudevan >Priority: Major > Attachments: HBASE-22072.HBASE-21879-v1.patch > > > Compaction of high read loaded region may leave compacted files undeleted > because of existing scan references: > INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted > file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file > has reference, isReferencedInReads=true, refCount=1, skipping for now > If region is either high write loaded this happens quite often and region may > have few storefiles and tons of undeleted compacted hdfs files. > Region keeps all that files (in my case thousands) untill graceful region > closing procedure, which ignores existing references and drop obsolete files. > It works fine unless consuming some extra hdfs space, but only in case of > normal region closing. If region server crashes than new region server, > responsible for that overfiling region, reads hdfs folder and try to deal > with all undeleted files, producing tons of storefiles, compaction tasks and > consuming abnormal amount of memory, wich may lead to OutOfMemory Exception > and further region servers crash. This stops writing to region because number > of storefiles reach *hbase.hstore.blockingStoreFiles* limit, forces high GC > duty and may take hours to compact all files into working set of files. > Workaround is a periodically check hdfs folders files count and force region > assign for ones with too many files. > It could be nice if regionserver had a setting similar to > hbase.hstore.blockingStoreFiles and invoke attempt to drop undeleted > compacted files if number of files reaches this setting. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery
[ https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819979#comment-16819979 ] ramkrishna.s.vasudevan commented on HBASE-22072: bq.Is it possible if other thread, performing updateReaders, see closing flag still false after StoreScanner#close acomplished? As far as I see - since we have any way restricted the multi threaded way of accessing the 'closing' variable and always it is only one thread trying to read it it should be able to see the latest copy. Some one can correct me if my understanding is wrong here. BTW thanks [~pKirillov] for the confirmation by testing it in your cluster. > High read/write intensive regions may cause long crash recovery > --- > > Key: HBASE-22072 > URL: https://issues.apache.org/jira/browse/HBASE-22072 > Project: HBase > Issue Type: Bug > Components: Performance, Recovery >Affects Versions: 2.1.2 >Reporter: Pavel >Assignee: ramkrishna.s.vasudevan >Priority: Major > Attachments: HBASE-22072.HBASE-21879-v1.patch > > > Compaction of high read loaded region may leave compacted files undeleted > because of existing scan references: > INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted > file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file > has reference, isReferencedInReads=true, refCount=1, skipping for now > If region is either high write loaded this happens quite often and region may > have few storefiles and tons of undeleted compacted hdfs files. > Region keeps all that files (in my case thousands) untill graceful region > closing procedure, which ignores existing references and drop obsolete files. > It works fine unless consuming some extra hdfs space, but only in case of > normal region closing. If region server crashes than new region server, > responsible for that overfiling region, reads hdfs folder and try to deal > with all undeleted files, producing tons of storefiles, compaction tasks and > consuming abnormal amount of memory, wich may lead to OutOfMemory Exception > and further region servers crash. This stops writing to region because number > of storefiles reach *hbase.hstore.blockingStoreFiles* limit, forces high GC > duty and may take hours to compact all files into working set of files. > Workaround is a periodically check hdfs folders files count and force region > assign for ones with too many files. > It could be nice if regionserver had a setting similar to > hbase.hstore.blockingStoreFiles and invoke attempt to drop undeleted > compacted files if number of files reaches this setting. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HBASE-22072) High read/write intensive regions may cause long crash recovery
[ https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan reassigned HBASE-22072: -- Assignee: ramkrishna.s.vasudevan > High read/write intensive regions may cause long crash recovery > --- > > Key: HBASE-22072 > URL: https://issues.apache.org/jira/browse/HBASE-22072 > Project: HBase > Issue Type: Bug > Components: Performance, Recovery >Affects Versions: 2.1.2 >Reporter: Pavel >Assignee: ramkrishna.s.vasudevan >Priority: Major > Attachments: HBASE-22072.HBASE-21879-v1.patch > > > Compaction of high read loaded region may leave compacted files undeleted > because of existing scan references: > INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted > file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file > has reference, isReferencedInReads=true, refCount=1, skipping for now > If region is either high write loaded this happens quite often and region may > have few storefiles and tons of undeleted compacted hdfs files. > Region keeps all that files (in my case thousands) untill graceful region > closing procedure, which ignores existing references and drop obsolete files. > It works fine unless consuming some extra hdfs space, but only in case of > normal region closing. If region server crashes than new region server, > responsible for that overfiling region, reads hdfs folder and try to deal > with all undeleted files, producing tons of storefiles, compaction tasks and > consuming abnormal amount of memory, wich may lead to OutOfMemory Exception > and further region servers crash. This stops writing to region because number > of storefiles reach *hbase.hstore.blockingStoreFiles* limit, forces high GC > duty and may take hours to compact all files into working set of files. > Workaround is a periodically check hdfs folders files count and force region > assign for ones with too many files. > It could be nice if regionserver had a setting similar to > hbase.hstore.blockingStoreFiles and invoke attempt to drop undeleted > compacted files if number of files reaches this setting. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-22072) High read/write intensive regions may cause long crash recovery
[ https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-22072: --- Status: Patch Available (was: Open) BTW I created the patch with HBASE-21879 branch. I had that branch for some reviews. If the patch is fine I can create patches for the master branch. Only the test case would need to be modified a little. > High read/write intensive regions may cause long crash recovery > --- > > Key: HBASE-22072 > URL: https://issues.apache.org/jira/browse/HBASE-22072 > Project: HBase > Issue Type: Bug > Components: Performance, Recovery >Affects Versions: 2.1.2 >Reporter: Pavel >Priority: Major > Attachments: HBASE-22072.HBASE-21879-v1.patch > > > Compaction of high read loaded region may leave compacted files undeleted > because of existing scan references: > INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted > file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file > has reference, isReferencedInReads=true, refCount=1, skipping for now > If region is either high write loaded this happens quite often and region may > have few storefiles and tons of undeleted compacted hdfs files. > Region keeps all that files (in my case thousands) untill graceful region > closing procedure, which ignores existing references and drop obsolete files. > It works fine unless consuming some extra hdfs space, but only in case of > normal region closing. If region server crashes than new region server, > responsible for that overfiling region, reads hdfs folder and try to deal > with all undeleted files, producing tons of storefiles, compaction tasks and > consuming abnormal amount of memory, wich may lead to OutOfMemory Exception > and further region servers crash. This stops writing to region because number > of storefiles reach *hbase.hstore.blockingStoreFiles* limit, forces high GC > duty and may take hours to compact all files into working set of files. > Workaround is a periodically check hdfs folders files count and force region > assign for ones with too many files. > It could be nice if regionserver had a setting similar to > hbase.hstore.blockingStoreFiles and invoke attempt to drop undeleted > compacted files if number of files reaches this setting. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery
[ https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818898#comment-16818898 ] ramkrishna.s.vasudevan commented on HBASE-22072: Created a patch that now introduces a closeLock. I checked the code: close(false) happens when the current scanner thread sees there is no data to retrieve, and finally close(true) will anyway happen when the scan finishes the complete fetch of data, at the RegionScanner level. So it is the updateReaders and the close(true) call that may have happened asynchronously, leading to the case that [~pKirillov] has mentioned here. bq.notice flushedstoreFileScanners is an ArrayList, neither volatile no a threadsafe one. Rarely thread, that closes StoreScanner right after flusher thread executed StoreScanner.updateReaders may not see changes in flushedstoreFileScanners list and keeps unclosed scanner. This I am not sure about. Declaring flushedstoreFileScanners as volatile only ensures the reference is volatile, not the contents of the list; but since in this patch we do it under a lock, I think the thread doing the close() and the thread doing updateReaders() should anyway see the updated contents of the flushedstoreFileScanners list. [~pKirillov] Can you take a look at this patch and give your comments? If you feel it is good, can you try it in your cluster to see if the problem you described happens again? > High read/write intensive regions may cause long crash recovery > --- > > Key: HBASE-22072 > URL: https://issues.apache.org/jira/browse/HBASE-22072 > Project: HBase > Issue Type: Bug > Components: Performance, Recovery >Affects Versions: 2.1.2 >Reporter: Pavel >Priority: Major > Attachments: HBASE-22072.HBASE-21879-v1.patch > > > Compaction of high read loaded region may leave compacted files undeleted > because of existing scan references: > INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted > file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file > has reference, isReferencedInReads=true, refCount=1, skipping for now > If region is either high write loaded this happens quite often and region may > have few storefiles and tons of undeleted compacted hdfs files. > Region keeps all that files (in my case thousands) untill graceful region > closing procedure, which ignores existing references and drop obsolete files. > It works fine unless consuming some extra hdfs space, but only in case of > normal region closing. If region server crashes than new region server, > responsible for that overfiling region, reads hdfs folder and try to deal > with all undeleted files, producing tons of storefiles, compaction tasks and > consuming abnormal amount of memory, wich may lead to OutOfMemory Exception > and further region servers crash. This stops writing to region because number > of storefiles reach *hbase.hstore.blockingStoreFiles* limit, forces high GC > duty and may take hours to compact all files into working set of files. > Workaround is a periodically check hdfs folders files count and force region > assign for ones with too many files. > It could be nice if regionserver had a setting similar to > hbase.hstore.blockingStoreFiles and invoke attempt to drop undeleted > compacted files if number of files reaches this setting. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
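For readers following the discussion, here is a minimal sketch of the locking idea described in this comment: close() and updateReaders() serialize on one lock, and updateReaders() becomes a no-op once the scanner is closed. Field and method names are simplified and do not match the real StoreScanner:
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

class SketchStoreScanner {
  private final ReentrantLock closeLock = new ReentrantLock();
  private volatile boolean closed = false;
  private final List<Object> flushedStoreFileScanners = new ArrayList<>();

  // Called from the scan path (close(false)) and at scan completion
  // (close(true), which in the real code happens at the RegionScanner level).
  void close(boolean withDelayedScannersClose) {
    closeLock.lock();
    try {
      closed = true;
      // close and drop any scanners opened on flushed files
      flushedStoreFileScanners.clear();
    } finally {
      closeLock.unlock();
    }
  }

  // Called by the flusher; must not register new scanners once close() has run.
  void updateReaders(List<Object> newScanners) {
    closeLock.lock();
    try {
      if (closed) {
        return;
      }
      flushedStoreFileScanners.addAll(newScanners);
    } finally {
      closeLock.unlock();
    }
  }
}
{code}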
[jira] [Updated] (HBASE-22072) High read/write intensive regions may cause long crash recovery
[ https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-22072: --- Attachment: HBASE-22072.HBASE-21879-v1.patch > High read/write intensive regions may cause long crash recovery > --- > > Key: HBASE-22072 > URL: https://issues.apache.org/jira/browse/HBASE-22072 > Project: HBase > Issue Type: Bug > Components: Performance, Recovery >Affects Versions: 2.1.2 >Reporter: Pavel >Priority: Major > Attachments: HBASE-22072.HBASE-21879-v1.patch > > > Compaction of high read loaded region may leave compacted files undeleted > because of existing scan references: > INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted > file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file > has reference, isReferencedInReads=true, refCount=1, skipping for now > If region is either high write loaded this happens quite often and region may > have few storefiles and tons of undeleted compacted hdfs files. > Region keeps all that files (in my case thousands) untill graceful region > closing procedure, which ignores existing references and drop obsolete files. > It works fine unless consuming some extra hdfs space, but only in case of > normal region closing. If region server crashes than new region server, > responsible for that overfiling region, reads hdfs folder and try to deal > with all undeleted files, producing tons of storefiles, compaction tasks and > consuming abnormal amount of memory, wich may lead to OutOfMemory Exception > and further region servers crash. This stops writing to region because number > of storefiles reach *hbase.hstore.blockingStoreFiles* limit, forces high GC > duty and may take hours to compact all files into working set of files. > Workaround is a periodically check hdfs folders files count and force region > assign for ones with too many files. > It could be nice if regionserver had a setting similar to > hbase.hstore.blockingStoreFiles and invoke attempt to drop undeleted > compacted files if number of files reaches this setting. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery
[ https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818669#comment-16818669 ] ramkrishna.s.vasudevan commented on HBASE-22072: Am able to reproduce this. Will upload a formal test and then a fix for it. > High read/write intensive regions may cause long crash recovery > --- > > Key: HBASE-22072 > URL: https://issues.apache.org/jira/browse/HBASE-22072 > Project: HBase > Issue Type: Bug > Components: Performance, Recovery >Affects Versions: 2.1.2 >Reporter: Pavel >Priority: Major > > Compaction of high read loaded region may leave compacted files undeleted > because of existing scan references: > INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted > file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file > has reference, isReferencedInReads=true, refCount=1, skipping for now > If region is either high write loaded this happens quite often and region may > have few storefiles and tons of undeleted compacted hdfs files. > Region keeps all that files (in my case thousands) untill graceful region > closing procedure, which ignores existing references and drop obsolete files. > It works fine unless consuming some extra hdfs space, but only in case of > normal region closing. If region server crashes than new region server, > responsible for that overfiling region, reads hdfs folder and try to deal > with all undeleted files, producing tons of storefiles, compaction tasks and > consuming abnormal amount of memory, wich may lead to OutOfMemory Exception > and further region servers crash. This stops writing to region because number > of storefiles reach *hbase.hstore.blockingStoreFiles* limit, forces high GC > duty and may take hours to compact all files into working set of files. > Workaround is a periodically check hdfs folders files count and force region > assign for ones with too many files. > It could be nice if regionserver had a setting similar to > hbase.hstore.blockingStoreFiles and invoke attempt to drop undeleted > compacted files if number of files reaches this setting. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery
[ https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815297#comment-16815297 ] ramkrishna.s.vasudevan commented on HBASE-22072: Lets see how to take this forward. Will see how can we write a UT for this. > High read/write intensive regions may cause long crash recovery > --- > > Key: HBASE-22072 > URL: https://issues.apache.org/jira/browse/HBASE-22072 > Project: HBase > Issue Type: Bug > Components: Performance, Recovery >Affects Versions: 2.1.2 >Reporter: Pavel >Priority: Major > > Compaction of high read loaded region may leave compacted files undeleted > because of existing scan references: > INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted > file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file > has reference, isReferencedInReads=true, refCount=1, skipping for now > If region is either high write loaded this happens quite often and region may > have few storefiles and tons of undeleted compacted hdfs files. > Region keeps all that files (in my case thousands) untill graceful region > closing procedure, which ignores existing references and drop obsolete files. > It works fine unless consuming some extra hdfs space, but only in case of > normal region closing. If region server crashes than new region server, > responsible for that overfiling region, reads hdfs folder and try to deal > with all undeleted files, producing tons of storefiles, compaction tasks and > consuming abnormal amount of memory, wich may lead to OutOfMemory Exception > and further region servers crash. This stops writing to region because number > of storefiles reach *hbase.hstore.blockingStoreFiles* limit, forces high GC > duty and may take hours to compact all files into working set of files. > Workaround is a periodically check hdfs folders files count and force region > assign for ones with too many files. > It could be nice if regionserver had a setting similar to > hbase.hstore.blockingStoreFiles and invoke attempt to drop undeleted > compacted files if number of files reaches this setting. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21879) Read HFile's block to ByteBuffer directly instead of to byte for reducing young gc purpose
[ https://issues.apache.org/jira/browse/HBASE-21879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808685#comment-16808685 ] ramkrishna.s.vasudevan commented on HBASE-21879: OOO for personal reasons. No access to official emails during this period. > Read HFile's block to ByteBuffer directly instead of to byte for reducing > young gc purpose > -- > > Key: HBASE-21879 > URL: https://issues.apache.org/jira/browse/HBASE-21879 > Project: HBase > Issue Type: Improvement >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.3.0 > > Attachments: HBASE-21879.v1.patch, HBASE-21879.v1.patch, > QPS-latencies-before-HBASE-21879.png, gc-data-before-HBASE-21879.png > > > In HFileBlock#readBlockDataInternal, we have the following: > {code} > @VisibleForTesting > protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset, > long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum, > boolean updateMetrics) > throws IOException { > // . > // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with > BBPool (offheap). > byte [] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize]; > int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize, > onDiskSizeWithHeader - preReadHeaderSize, true, offset + > preReadHeaderSize, pread); > if (headerBuf != null) { > // ... > } > // ... > } > {code} > In the read path, we still read the block from hfile to on-heap byte[], then > copy the on-heap byte[] to offheap bucket cache asynchronously, and in my > 100% get performance test, I also observed some frequent young gc, The > largest memory footprint in the young gen should be the on-heap block byte[]. > In fact, we can read HFile's block to ByteBuffer directly instead of to > byte[] for reducing young gc purpose. we did not implement this before, > because no ByteBuffer reading interface in the older HDFS client, but 2.7+ > has supported this now, so we can fix this now. I think. > Will provide an patch and some perf-comparison for this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery
[ https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808390#comment-16808390 ] ramkrishna.s.vasudevan commented on HBASE-22072: May be we should have a lock for closing and the updateReader() should try to get that lock before trying to update the scanners? If already closing is done then don't do it? > High read/write intensive regions may cause long crash recovery > --- > > Key: HBASE-22072 > URL: https://issues.apache.org/jira/browse/HBASE-22072 > Project: HBase > Issue Type: Bug > Components: Performance, Recovery >Affects Versions: 2.1.2 >Reporter: Pavel >Priority: Major > > Compaction of high read loaded region may leave compacted files undeleted > because of existing scan references: > INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted > file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file > has reference, isReferencedInReads=true, refCount=1, skipping for now > If region is either high write loaded this happens quite often and region may > have few storefiles and tons of undeleted compacted hdfs files. > Region keeps all that files (in my case thousands) untill graceful region > closing procedure, which ignores existing references and drop obsolete files. > It works fine unless consuming some extra hdfs space, but only in case of > normal region closing. If region server crashes than new region server, > responsible for that overfiling region, reads hdfs folder and try to deal > with all undeleted files, producing tons of storefiles, compaction tasks and > consuming abnormal amount of memory, wich may lead to OutOfMemory Exception > and further region servers crash. This stops writing to region because number > of storefiles reach *hbase.hstore.blockingStoreFiles* limit, forces high GC > duty and may take hours to compact all files into working set of files. > Workaround is a periodically check hdfs folders files count and force region > assign for ones with too many files. > It could be nice if regionserver had a setting similar to > hbase.hstore.blockingStoreFiles and invoke attempt to drop undeleted > compacted files if number of files reaches this setting. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery
[ https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806496#comment-16806496 ] ramkrishna.s.vasudevan commented on HBASE-22072: bq.updateReaders and further updateReaders procedure does not consider StoreScanner is closing or not. Thanks [~pKirillov] for the analysis. So you are saying that already the StoreScanner itself is getting closed and during that time updateReaders is doing a new set of scanners as part of which refCount increment happens. Need to see if this is really possible - can you write a UT to see if this case can be reproduced? > High read/write intensive regions may cause long crash recovery > --- > > Key: HBASE-22072 > URL: https://issues.apache.org/jira/browse/HBASE-22072 > Project: HBase > Issue Type: Bug > Components: Performance, Recovery >Affects Versions: 2.1.2 >Reporter: Pavel >Priority: Major > > Compaction of high read loaded region may leave compacted files undeleted > because of existing scan references: > INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted > file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file > has reference, isReferencedInReads=true, refCount=1, skipping for now > If region is either high write loaded this happens quite often and region may > have few storefiles and tons of undeleted compacted hdfs files. > Region keeps all that files (in my case thousands) untill graceful region > closing procedure, which ignores existing references and drop obsolete files. > It works fine unless consuming some extra hdfs space, but only in case of > normal region closing. If region server crashes than new region server, > responsible for that overfiling region, reads hdfs folder and try to deal > with all undeleted files, producing tons of storefiles, compaction tasks and > consuming abnormal amount of memory, wich may lead to OutOfMemory Exception > and further region servers crash. This stops writing to region because number > of storefiles reach *hbase.hstore.blockingStoreFiles* limit, forces high GC > duty and may take hours to compact all files into working set of files. > Workaround is a periodically check hdfs folders files count and force region > assign for ones with too many files. > It could be nice if regionserver had a setting similar to > hbase.hstore.blockingStoreFiles and invoke attempt to drop undeleted > compacted files if number of files reaches this setting. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21874) Bucket cache on Persistent memory
[ https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-21874: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.x Status: Resolved (was: Patch Available) Pushed to branch-2 also. Resolving. > Bucket cache on Persistent memory > - > > Key: HBASE-21874 > URL: https://issues.apache.org/jira/browse/HBASE-21874 > Project: HBase > Issue Type: New Feature > Components: BucketCache >Affects Versions: 3.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Major > Fix For: 3.0.0, 2.x > > Attachments: HBASE-21874.patch, HBASE-21874.patch, > HBASE-21874_V2.patch, HBASE-21874_V4.patch, HBASE-21874_V5.patch, > HBASE-21874_V6.patch, Pmem_BC.png > > > Non volatile persistent memory devices are byte addressable like DRAM (for > eg. Intel DCPMM). Bucket cache implementation can take advantage of this new > memory type and can make use of the existing offheap data structures to serve > data directly from this memory area without having to bring the data to > onheap. > The patch is a new IOEngine implementation that works with the persistent > memory. > Note : Here we don't make use of the persistence nature of the device and > just make use of the big memory it provides. > Performance numbers to follow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21874) Bucket cache on Persistent memory
[ https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785918#comment-16785918 ] ramkrishna.s.vasudevan commented on HBASE-21874: Thanks for all the reviews and feedback. Pushed to master. [~busbey], [~wchevreuil], [~jdcryans], [~elserj], [~vrodionov] & [~anoop.hbase]. > Bucket cache on Persistent memory > - > > Key: HBASE-21874 > URL: https://issues.apache.org/jira/browse/HBASE-21874 > Project: HBase > Issue Type: New Feature > Components: BucketCache >Affects Versions: 3.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-21874.patch, HBASE-21874.patch, > HBASE-21874_V2.patch, HBASE-21874_V4.patch, HBASE-21874_V5.patch, > HBASE-21874_V6.patch, Pmem_BC.png > > > Non volatile persistent memory devices are byte addressable like DRAM (for > eg. Intel DCPMM). Bucket cache implementation can take advantage of this new > memory type and can make use of the existing offheap data structures to serve > data directly from this memory area without having to bring the data to > onheap. > The patch is a new IOEngine implementation that works with the persistent > memory. > Note : Here we don't make use of the persistence nature of the device and > just make use of the big memory it provides. > Performance numbers to follow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21874) Bucket cache on Persistent memory
[ https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785285#comment-16785285 ] ramkrishna.s.vasudevan commented on HBASE-21874: The test failures are unrelated and seems to be flakey tests. > Bucket cache on Persistent memory > - > > Key: HBASE-21874 > URL: https://issues.apache.org/jira/browse/HBASE-21874 > Project: HBase > Issue Type: New Feature > Components: BucketCache >Affects Versions: 3.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-21874.patch, HBASE-21874.patch, > HBASE-21874_V2.patch, HBASE-21874_V4.patch, HBASE-21874_V5.patch, > HBASE-21874_V6.patch, Pmem_BC.png > > > Non volatile persistent memory devices are byte addressable like DRAM (for > eg. Intel DCPMM). Bucket cache implementation can take advantage of this new > memory type and can make use of the existing offheap data structures to serve > data directly from this memory area without having to bring the data to > onheap. > The patch is a new IOEngine implementation that works with the persistent > memory. > Note : Here we don't make use of the persistence nature of the device and > just make use of the big memory it provides. > Performance numbers to follow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21874) Bucket cache on Persistent memory
[ https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-21874: --- Attachment: HBASE-21874_V6.patch > Bucket cache on Persistent memory > - > > Key: HBASE-21874 > URL: https://issues.apache.org/jira/browse/HBASE-21874 > Project: HBase > Issue Type: New Feature > Components: BucketCache >Affects Versions: 3.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-21874.patch, HBASE-21874.patch, > HBASE-21874_V2.patch, HBASE-21874_V4.patch, HBASE-21874_V5.patch, > HBASE-21874_V6.patch, Pmem_BC.png > > > Non volatile persistent memory devices are byte addressable like DRAM (for > eg. Intel DCPMM). Bucket cache implementation can take advantage of this new > memory type and can make use of the existing offheap data structures to serve > data directly from this memory area without having to bring the data to > onheap. > The patch is a new IOEngine implementation that works with the persistent > memory. > Note : Here we don't make use of the persistence nature of the device and > just make use of the big memory it provides. > Performance numbers to follow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21874) Bucket cache on Persistent memory
[ https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784692#comment-16784692 ] ramkrishna.s.vasudevan commented on HBASE-21874: bq.ExclusiveMemoryMmapIOEngine extends from FileMmapIOEngine which returns true, so it is needed. [~busbey] - Added that usesSharedMemory to return false for just being explicit. But as [~wchevreuil] said since FileMMapIOEngine is implementing IOEngine interface by default it is false. So we can avoid that too in ExclusiveMemoryMmapIOEngine. > Bucket cache on Persistent memory > - > > Key: HBASE-21874 > URL: https://issues.apache.org/jira/browse/HBASE-21874 > Project: HBase > Issue Type: New Feature > Components: BucketCache >Affects Versions: 3.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-21874.patch, HBASE-21874.patch, > HBASE-21874_V2.patch, HBASE-21874_V4.patch, HBASE-21874_V5.patch, Pmem_BC.png > > > Non volatile persistent memory devices are byte addressable like DRAM (for > eg. Intel DCPMM). Bucket cache implementation can take advantage of this new > memory type and can make use of the existing offheap data structures to serve > data directly from this memory area without having to bring the data to > onheap. > The patch is a new IOEngine implementation that works with the persistent > memory. > Note : Here we don't make use of the persistence nature of the device and > just make use of the big memory it provides. > Performance numbers to follow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21874) Bucket cache on Persistent memory
[ https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-21874: --- Status: Open (was: Patch Available) > Bucket cache on Persistent memory > - > > Key: HBASE-21874 > URL: https://issues.apache.org/jira/browse/HBASE-21874 > Project: HBase > Issue Type: New Feature > Components: BucketCache >Affects Versions: 3.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-21874.patch, HBASE-21874.patch, > HBASE-21874_V2.patch, HBASE-21874_V4.patch, Pmem_BC.png > > > Non volatile persistent memory devices are byte addressable like DRAM (for > eg. Intel DCPMM). Bucket cache implementation can take advantage of this new > memory type and can make use of the existing offheap data structures to serve > data directly from this memory area without having to bring the data to > onheap. > The patch is a new IOEngine implementation that works with the persistent > memory. > Note : Here we don't make use of the persistence nature of the device and > just make use of the big memory it provides. > Performance numbers to follow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21874) Bucket cache on Persistent memory
[ https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-21874: --- Status: Patch Available (was: Open) > Bucket cache on Persistent memory > - > > Key: HBASE-21874 > URL: https://issues.apache.org/jira/browse/HBASE-21874 > Project: HBase > Issue Type: New Feature > Components: BucketCache >Affects Versions: 3.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-21874.patch, HBASE-21874.patch, > HBASE-21874_V2.patch, HBASE-21874_V4.patch, HBASE-21874_V5.patch, Pmem_BC.png > > > Non volatile persistent memory devices are byte addressable like DRAM (for > eg. Intel DCPMM). Bucket cache implementation can take advantage of this new > memory type and can make use of the existing offheap data structures to serve > data directly from this memory area without having to bring the data to > onheap. > The patch is a new IOEngine implementation that works with the persistent > memory. > Note : Here we don't make use of the persistence nature of the device and > just make use of the big memory it provides. > Performance numbers to follow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21874) Bucket cache on Persistent memory
[ https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-21874: --- Attachment: HBASE-21874_V5.patch > Bucket cache on Persistent memory > - > > Key: HBASE-21874 > URL: https://issues.apache.org/jira/browse/HBASE-21874 > Project: HBase > Issue Type: New Feature > Components: BucketCache >Affects Versions: 3.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-21874.patch, HBASE-21874.patch, > HBASE-21874_V2.patch, HBASE-21874_V4.patch, HBASE-21874_V5.patch, Pmem_BC.png > > > Non volatile persistent memory devices are byte addressable like DRAM (for > eg. Intel DCPMM). Bucket cache implementation can take advantage of this new > memory type and can make use of the existing offheap data structures to serve > data directly from this memory area without having to bring the data to > onheap. > The patch is a new IOEngine implementation that works with the persistent > memory. > Note : Here we don't make use of the persistence nature of the device and > just make use of the big memory it provides. > Performance numbers to follow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21874) Bucket cache on Persistent memory
[ https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-21874: --- Status: Patch Available (was: Open) Patch seems to be big now due to the refactoring done. Now we have an abstract MmapIOEngine with ExclusiveMemoryMMapIOEngine (old FileMMapIOEngine) and SharedMemoryMMapIOEngine(PmemIOEngine) are its subclasses. Since both have similar mechanisms for mmaping and only the backing device is different which helps us in creating a SHARED memory we went with this approach (so that is more abstract in nature). It also answers [~jdcryans] comments. [~busbey] - Thanks for pointing out the xmls and doc to be changed. We had missed it out. Let us know what you think of the latest patch. > Bucket cache on Persistent memory > - > > Key: HBASE-21874 > URL: https://issues.apache.org/jira/browse/HBASE-21874 > Project: HBase > Issue Type: New Feature > Components: BucketCache >Affects Versions: 3.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-21874.patch, HBASE-21874.patch, > HBASE-21874_V2.patch, HBASE-21874_V4.patch, Pmem_BC.png > > > Non volatile persistent memory devices are byte addressable like DRAM (for > eg. Intel DCPMM). Bucket cache implementation can take advantage of this new > memory type and can make use of the existing offheap data structures to serve > data directly from this memory area without having to bring the data to > onheap. > The patch is a new IOEngine implementation that works with the persistent > memory. > Note : Here we don't make use of the persistence nature of the device and > just make use of the big memory it provides. > Performance numbers to follow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
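As a rough sketch of the class shape described above (names abbreviated and bodies omitted; the real classes carry the actual mmap and read/write logic), the abstract engine owns the common mmap handling and the two subclasses differ only in whether cached blocks can be served from the mapped region as shared memory:
{code}
// Common mmap handling lives in the abstract engine; subclasses only say
// whether blocks can be served from the mapped region as shared memory.
abstract class FileMmapIOEngineSketch {
  protected final String path;

  FileMmapIOEngineSketch(String path) {
    this.path = path;
    // the real classes mmap the backing file/device here
  }

  public abstract boolean usesSharedMemory();
}

// Old FileMmapIOEngine behaviour: blocks are copied out of the mapping.
class ExclusiveMemoryMmapIOEngineSketch extends FileMmapIOEngineSketch {
  ExclusiveMemoryMmapIOEngineSketch(String path) { super(path); }

  @Override
  public boolean usesSharedMemory() { return false; }
}

// Persistent-memory backed engine: blocks can be referenced in place.
class SharedMemoryMmapIOEngineSketch extends FileMmapIOEngineSketch {
  SharedMemoryMmapIOEngineSketch(String path) { super(path); }

  @Override
  public boolean usesSharedMemory() { return true; }
}
{code}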
[jira] [Updated] (HBASE-21874) Bucket cache on Persistent memory
[ https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-21874: --- Attachment: HBASE-21874_V4.patch > Bucket cache on Persistent memory > - > > Key: HBASE-21874 > URL: https://issues.apache.org/jira/browse/HBASE-21874 > Project: HBase > Issue Type: New Feature > Components: BucketCache >Affects Versions: 3.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-21874.patch, HBASE-21874.patch, > HBASE-21874_V2.patch, HBASE-21874_V4.patch, Pmem_BC.png > > > Non volatile persistent memory devices are byte addressable like DRAM (for > eg. Intel DCPMM). Bucket cache implementation can take advantage of this new > memory type and can make use of the existing offheap data structures to serve > data directly from this memory area without having to bring the data to > onheap. > The patch is a new IOEngine implementation that works with the persistent > memory. > Note : Here we don't make use of the persistence nature of the device and > just make use of the big memory it provides. > Performance numbers to follow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-21981) MMaped bucket cache IOEngine does not work with persistence
[ https://issues.apache.org/jira/browse/HBASE-21981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-21981: --- Summary: MMaped bucket cache IOEngine does not work with persistence (was: MMaped bucket cache IOEngines does not work with persistence) > MMaped bucket cache IOEngine does not work with persistence > --- > > Key: HBASE-21981 > URL: https://issues.apache.org/jira/browse/HBASE-21981 > Project: HBase > Issue Type: Bug > Components: BucketCache >Affects Versions: 2.1.3 >Reporter: ramkrishna.s.vasudevan >Assignee: Anoop Sam John >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.0.5, 2.3.0, 2.1.4 > > > The MMap based IOEngines do not retrieve the data back if > 'hbase.bucketcache.persistent.path' is enabled. FileIOEngine works fine but > only the FileMMapEngine has this problem. > The reason is that we don't get the byte buffers in the proper order while > reading back from the file in case of persistence. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
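For context, a minimal sketch of how the settings named in this issue might be wired up from configuration code. The property names follow the ones quoted in the issue; the 'mmap:' engine prefix, the file paths and the sizes are assumptions and should be checked against the HBase version in use.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class MmapBucketCacheConfig {
  public static Configuration build() {
    Configuration conf = HBaseConfiguration.create();
    // mmap-backed bucket cache; the backing file path is only an example.
    conf.set("hbase.bucketcache.ioengine", "mmap:/cache/bucketcache.data");
    conf.set("hbase.bucketcache.size", "8192"); // MB
    // The persistence setting whose read-back path this bug is about.
    conf.set("hbase.bucketcache.persistent.path", "/cache/bucketcache.meta");
    return conf;
  }
}
{code}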
[jira] [Created] (HBASE-21981) MMaped bucket cache IOEngines does not work with persistence
ramkrishna.s.vasudevan created HBASE-21981: -- Summary: MMaped bucket cache IOEngines does not work with persistence Key: HBASE-21981 URL: https://issues.apache.org/jira/browse/HBASE-21981 Project: HBase Issue Type: Bug Components: BucketCache Affects Versions: 2.1.3 Reporter: ramkrishna.s.vasudevan Assignee: Anoop Sam John Fix For: 3.0.0, 2.2.0, 2.0.5, 2.3.0, 2.1.4 The MMap based IOEngines do not retrieve the data back if 'hbase.bucketcache.persistent.path' is enabled. FileIOEngine works fine but only the FileMMapEngine has this problem. The reason is that we don't get the byte buffers in the proper order while reading back from the file in case of persistence. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21879) Read HFile's block to ByteBuffer directly instead of to byte for reducing young gc purpose
[ https://issues.apache.org/jira/browse/HBASE-21879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774760#comment-16774760 ] ramkrishna.s.vasudevan commented on HBASE-21879: bq.And you can get a ByteBuffer from a netty ByteBuf, by calling the nioBuffer method, no different from our ByteBuff. And we have CompositeByteBuf where we can have multiple ByteBuf combined. Thanks [~Apache9]. Yes, having seen some Netty code in recent years, I was thinking while typing the above comment that your reply on using nioBuffer or CompositeByteBuf would be the answer to it. The ref counting and the resource leak detection may be different, so I could be wrong there. Yes, it will be a big project. Cell, CellComparators and CellUtils all need to be changed, and that alone will be a big change; doing it in a separate branch will be better. Thanks for the useful discussions here. > Read HFile's block to ByteBuffer directly instead of to byte for reducing > young gc purpose > -- > > Key: HBASE-21879 > URL: https://issues.apache.org/jira/browse/HBASE-21879 > Project: HBase > Issue Type: Improvement >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.3.0, 2.1.4 > > Attachments: QPS-latencies-before-HBASE-21879.png, > gc-data-before-HBASE-21879.png > > > In HFileBlock#readBlockDataInternal, we have the following: > {code} > @VisibleForTesting > protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset, > long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum, > boolean updateMetrics) > throws IOException { > // . > // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with > BBPool (offheap). > byte [] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize]; > int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize, > onDiskSizeWithHeader - preReadHeaderSize, true, offset + > preReadHeaderSize, pread); > if (headerBuf != null) { > // ... > } > // ... > } > {code} > In the read path, we still read the block from hfile to on-heap byte[], then > copy the on-heap byte[] to offheap bucket cache asynchronously, and in my > 100% get performance test, I also observed some frequent young gc. The > largest memory footprint in the young gen should be the on-heap block byte[]. > In fact, we can read HFile's block to ByteBuffer directly instead of to > byte[] for reducing young gc purpose. We did not implement this before, > because there was no ByteBuffer reading interface in the older HDFS client, but 2.7+ > has supported this now, so we can fix this now, I think. > Will provide a patch and some perf-comparison for this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
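A small, hedged sketch of the Netty calls referenced in the quoted comment above (an NIO ByteBuffer view via nioBuffer, and several ByteBufs combined into a composite). The strings and sizes are placeholders; this is illustrative code, not HBase code.
{code}
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class NettyNioBridge {
  public static void main(String[] args) {
    ByteBuf first = Unpooled.copiedBuffer("hfile-block-part-1", StandardCharsets.UTF_8);
    ByteBuf second = Unpooled.copiedBuffer("hfile-block-part-2", StandardCharsets.UTF_8);

    // wrappedBuffer builds a composite view over multiple ByteBufs without copying the payload.
    ByteBuf composite = Unpooled.wrappedBuffer(first, second);

    // The nioBuffer() call mentioned above: an NIO ByteBuffer carrying the readable bytes
    // (it may copy when the content spans more than one component).
    ByteBuffer nioView = composite.nioBuffer();
    System.out.println("readable bytes as NIO buffer: " + nioView.remaining());

    composite.release(); // releases the wrapped components as well
  }
}
{code}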
[jira] [Commented] (HBASE-21879) Read HFile's block to ByteBuffer directly instead of to byte for reducing young gc purpose
[ https://issues.apache.org/jira/browse/HBASE-21879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774386#comment-16774386 ] ramkrishna.s.vasudevan commented on HBASE-21879: However, if at all we need netty's ref counting mechanism, I believe the ResourceLeakDetector cannot be DISABLED. > Read HFile's block to ByteBuffer directly instead of to byte for reducing > young gc purpose > -- > > Key: HBASE-21879 > URL: https://issues.apache.org/jira/browse/HBASE-21879 > Project: HBase > Issue Type: Improvement >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.3.0, 2.1.4 > > Attachments: QPS-latencies-before-HBASE-21879.png, > gc-data-before-HBASE-21879.png > > > In HFileBlock#readBlockDataInternal, we have the following: > {code} > @VisibleForTesting > protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset, > long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum, > boolean updateMetrics) > throws IOException { > // . > // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with > BBPool (offheap). > byte [] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize]; > int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize, > onDiskSizeWithHeader - preReadHeaderSize, true, offset + > preReadHeaderSize, pread); > if (headerBuf != null) { > // ... > } > // ... > } > {code} > In the read path, we still read the block from hfile to on-heap byte[], then > copy the on-heap byte[] to offheap bucket cache asynchronously, and in my > 100% get performance test, I also observed some frequent young gc. The > largest memory footprint in the young gen should be the on-heap block byte[]. > In fact, we can read HFile's block to ByteBuffer directly instead of to > byte[] for reducing young gc purpose. We did not implement this before, > because there was no ByteBuffer reading interface in the older HDFS client, but 2.7+ > has supported this now, so we can fix this now, I think. > Will provide a patch and some perf-comparison for this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21879) Read HFile's block to ByteBuffer directly instead of to byte for reducing young gc purpose
[ https://issues.apache.org/jira/browse/HBASE-21879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774380#comment-16774380 ] ramkrishna.s.vasudevan commented on HBASE-21879: Thanks for the ping here folks. From the docs that we prepared when we did the offheaping work, we have the following points that were discussed about Netty's ByteBuf and NIO ByteBuffers: the comparison using JMH showed that NIO BBs are 17% better. Ideally we should have seen similar performance, but netty version 4.0.23 had this reference counting and memory leak detection mechanism which was not allowing the C2 compiler to do proper inlining of the code. However, netty 4.0.4 had the feature to disable the ResourceLeakDetector, which brought the performance closer to the NIO case. Still, the reason we went ahead with NIO - which is indirectly a reason why this JIRA was created - is that since HDFS already had an API to pass an NIO BB and read into it, going with Netty ByteBuf would not allow that to happen easily because of the HDFS API. The other advantage is that if we are able to pass an offheap NIO BB, we can avoid a copy to onheap once we read from the DFS. [~anoopsamjohn] - is there anything I missed here? But I think the idea of Netty doing the ref counting helps us avoid doing the ref counting ourselves, which is adding some complexity. Maybe we missed some options - if so it would be great to know about them. Good one. > Read HFile's block to ByteBuffer directly instead of to byte for reducing > young gc purpose > -- > > Key: HBASE-21879 > URL: https://issues.apache.org/jira/browse/HBASE-21879 > Project: HBase > Issue Type: Improvement >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.3.0, 2.1.4 > > Attachments: QPS-latencies-before-HBASE-21879.png, > gc-data-before-HBASE-21879.png > > > In HFileBlock#readBlockDataInternal, we have the following: > {code} > @VisibleForTesting > protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset, > long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum, > boolean updateMetrics) > throws IOException { > // . > // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with > BBPool (offheap). > byte [] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize]; > int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize, > onDiskSizeWithHeader - preReadHeaderSize, true, offset + > preReadHeaderSize, pread); > if (headerBuf != null) { > // ... > } > // ... > } > {code} > In the read path, we still read the block from hfile to on-heap byte[], then > copy the on-heap byte[] to offheap bucket cache asynchronously, and in my > 100% get performance test, I also observed some frequent young gc. The > largest memory footprint in the young gen should be the on-heap block byte[]. > In fact, we can read HFile's block to ByteBuffer directly instead of to > byte[] for reducing young gc purpose. We did not implement this before, > because there was no ByteBuffer reading interface in the older HDFS client, but 2.7+ > has supported this now, so we can fix this now, I think. > Will provide a patch and some perf-comparison for this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
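A hedged sketch of the two mechanisms discussed in that comment: HDFS's ByteBuffer read API (available when the underlying stream implements ByteBufferReadable, as HDFS 2.7+ streams do) and Netty's switch for the ResourceLeakDetector. The file path and buffer size are placeholders.
{code}
import io.netty.util.ResourceLeakDetector;
import java.nio.ByteBuffer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DirectReadSketch {
  public static void main(String[] args) throws Exception {
    // The knob discussed above; with ref-counted buffers in play you may not want this off.
    ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.DISABLED);

    FileSystem fs = FileSystem.get(new Configuration());
    ByteBuffer block = ByteBuffer.allocateDirect(64 * 1024); // offheap destination buffer
    try (FSDataInputStream in = fs.open(new Path("/hbase/data/example/hfile"))) {
      // Reads into the (possibly offheap) ByteBuffer directly, avoiding an onheap byte[] copy;
      // throws UnsupportedOperationException if the stream does not support ByteBufferReadable.
      int n = in.read(block);
      System.out.println("read " + n + " bytes directly into the ByteBuffer");
    }
  }
}
{code}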
[jira] [Commented] (HBASE-21874) Bucket cache on Persistent memory
[ https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773976#comment-16773976 ] ramkrishna.s.vasudevan commented on HBASE-21874: bq.sysctl -w vm.max_map_count=13 Thanks [~wchevreuil]. That was very useful. So in that case we need not configure the buffer size and can just set this value at the OS level. > Bucket cache on Persistent memory > - > > Key: HBASE-21874 > URL: https://issues.apache.org/jira/browse/HBASE-21874 > Project: HBase > Issue Type: New Feature > Components: BucketCache >Affects Versions: 3.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-21874.patch, HBASE-21874.patch, > HBASE-21874_V2.patch, Pmem_BC.png > > > Non volatile persistent memory devices are byte addressable like DRAM (for > eg. Intel DCPMM). Bucket cache implementation can take advantage of this new > memory type and can make use of the existing offheap data structures to serve > data directly from this memory area without having to bring the data to > onheap. > The patch is a new IOEngine implementation that works with the persistent > memory. > Note : Here we don't make use of the persistence nature of the device and > just make use of the big memory it provides. > Performance numbers to follow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21916) Abstract an ByteBuffAllocator to allocate/free ByteBuffer in ByteBufferPool
[ https://issues.apache.org/jira/browse/HBASE-21916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772643#comment-16772643 ] ramkrishna.s.vasudevan commented on HBASE-21916: [~openinx] Sorry for the time taken here. Just got back here. Thanks for the ping. Checking your patch and subtasks. > Abstract an ByteBuffAllocator to allocate/free ByteBuffer in ByteBufferPool > --- > > Key: HBASE-21916 > URL: https://issues.apache.org/jira/browse/HBASE-21916 > Project: HBase > Issue Type: Sub-task >Reporter: Zheng Hu >Assignee: Zheng Hu >Priority: Major > Fix For: 3.0.0, 2.2.0, 2.3.0, 2.1.4 > > Attachments: HBASE-21916.v1.patch, HBASE-21916.v2.patch, > HBASE-21916.v3.patch, HBASE-21916.v4.patch, HBASE-21916.v5.patch > > > Now our read/write path allocates ByteBuffer from the ByteBufferPool, but we > need to consider the minSizeForReservoirUse for better utilization; those > allocate/free APIs are static methods, not so good to use. > For HBASE-21879, we need a universal ByteBuffer allocator to manage all the > ByteBuffers through the entire read path, hence this issue. > Will upload a patch to abstract a ByteBufAllocator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
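A hypothetical, standalone sketch of the allocator idea this issue describes: requests at or above a minimum size come from a reusable pool of offheap buffers, smaller requests get plain onheap buffers. The class and field names are invented for illustration; this is not the actual ByteBuffAllocator from the patch.
{code}
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

public class SimpleByteBuffAllocator {
  private final int bufSize;                 // size of each pooled offheap buffer
  private final int minSizeForReservoirUse;  // requests below this stay onheap
  private final ConcurrentLinkedQueue<ByteBuffer> pool = new ConcurrentLinkedQueue<>();

  public SimpleByteBuffAllocator(int bufSize, int minSizeForReservoirUse) {
    this.bufSize = bufSize;
    this.minSizeForReservoirUse = minSizeForReservoirUse;
  }

  public ByteBuffer allocate(int size) {
    if (size < minSizeForReservoirUse) {
      return ByteBuffer.allocate(size); // small request: plain onheap, left to the GC
    }
    ByteBuffer pooled = pool.poll();
    if (pooled != null && pooled.capacity() >= size) {
      return pooled; // reuse an offheap buffer from the reservoir
    }
    if (pooled != null) {
      pool.offer(pooled); // too small for this request, keep it for later
    }
    return ByteBuffer.allocateDirect(Math.max(size, bufSize));
  }

  public void free(ByteBuffer buf) {
    if (buf.isDirect()) { // only the offheap buffers are recycled
      buf.clear();
      pool.offer(buf);
    }
  }
}
{code}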
[jira] [Commented] (HBASE-21874) Bucket cache on Persistent memory
[ https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771056#comment-16771056 ] ramkrishna.s.vasudevan commented on HBASE-21874: {quote}PmemIOEngine only overrides read() method to sign deserializers that memory type is shared. {quote} Yes. What you say is right. The heavy lifting was already done in HBASE-11425, and so the change here seems to be very small given that the bucket cache engines were already doing what was needed. We are also trying to prepare a patch for multi-file support. > Bucket cache on Persistent memory > - > > Key: HBASE-21874 > URL: https://issues.apache.org/jira/browse/HBASE-21874 > Project: HBase > Issue Type: New Feature > Components: BucketCache >Affects Versions: 3.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-21874.patch, HBASE-21874.patch, > HBASE-21874_V2.patch, Pmem_BC.png > > > Non volatile persistent memory devices are byte addressable like DRAM (for > eg. Intel DCPMM). Bucket cache implementation can take advantage of this new > memory type and can make use of the existing offheap data structures to serve > data directly from this memory area without having to bring the data to > onheap. > The patch is a new IOEngine implementation that works with the persistent > memory. > Note : Here we don't make use of the persistence nature of the device and > just make use of the big memory it provides. > Performance numbers to follow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21874) Bucket cache on Persistent memory
[ https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16770101#comment-16770101 ] ramkrishna.s.vasudevan commented on HBASE-21874: bq.Thus, main goal of setting "Direct Mode" here for now is not to use the persistence capabilities (although it's probably already working), but just have a mean to guarantee we use space from PMem device for caching (and not DRAM at all, which can't be guaranteed with "Memory Mode") Right. bq.So theoretically, we can already do this with the current FileMmapEngine, no? Yes. Theoretically correct, provided the file is on the pmem device. But FileMmapEngine assumes that if at all mmap is not able to fit the file in the DRAM, then the block has to be copied onheap. So the entire block will be copied to onheap. Our recent tests show that if we try to use it as an mmap based file but on AEP, the copy is costlier because we copy a 64K block from the AEP to onheap. So perf is on the lower side compared to doing what the pmem IOEngine does. Thanks [~wchevreuil]. > Bucket cache on Persistent memory > - > > Key: HBASE-21874 > URL: https://issues.apache.org/jira/browse/HBASE-21874 > Project: HBase > Issue Type: New Feature > Components: BucketCache >Affects Versions: 3.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-21874.patch, HBASE-21874.patch, > HBASE-21874_V2.patch, Pmem_BC.png > > > Non volatile persistent memory devices are byte addressable like DRAM (for > eg. Intel DCPMM). Bucket cache implementation can take advantage of this new > memory type and can make use of the existing offheap data structures to serve > data directly from this memory area without having to bring the data to > onheap. > The patch is a new IOEngine implementation that works with the persistent > memory. > Note : Here we don't make use of the persistence nature of the device and > just make use of the big memory it provides. > Performance numbers to follow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21874) Bucket cache on Persistent memory
[ https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769964#comment-16769964 ] ramkrishna.s.vasudevan commented on HBASE-21874: bq. Can you point the exact place in the patch where you control this? We need not control it at the Java level; it will be controlled at the OS level. These devices are configured with DAX (Direct Access mode) at the OS level. As said in the link, here we use the App Direct mode and not the Memory mode. Memory mode does not give us control over where the cache or the address space could reside - it may be in the DRAM or Pmem address space. But here we specifically ask our cache to reside only on the Pmem area, and once it is mapped in the Pmem address space everything is transparent to us. > Bucket cache on Persistent memory > - > > Key: HBASE-21874 > URL: https://issues.apache.org/jira/browse/HBASE-21874 > Project: HBase > Issue Type: New Feature > Components: BucketCache >Affects Versions: 3.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-21874.patch, HBASE-21874.patch, > HBASE-21874_V2.patch, Pmem_BC.png > > > Non volatile persistent memory devices are byte addressable like DRAM (for > eg. Intel DCPMM). Bucket cache implementation can take advantage of this new > memory type and can make use of the existing offheap data structures to serve > data directly from this memory area without having to bring the data to > onheap. > The patch is a new IOEngine implementation that works with the persistent > memory. > Note : Here we don't make use of the persistence nature of the device and > just make use of the big memory it provides. > Performance numbers to follow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-21874) Bucket cache on Persistent memory
[ https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769598#comment-16769598 ] ramkrishna.s.vasudevan commented on HBASE-21874: {quote}Where are your going to keep bucket cache? Not in DRAM definitely, hence in NVDIMM (PMEM)? {quote} Yes. The cache will reside in Pmem only. {quote}If you keep data in PMEM and use extended FileMmapIOEngine, where do yo you mmap it into? into DRAM? That is strange {quote} We use the extended FileMmapIOEngine, but the mmap won't do the memory mapping to DRAM; it will mmap to a different address space maintained by the NVDIMM. So even if you have less DRAM capacity, your data is still served from PMEM's address space. That is why you can use the SHARED mode in the IOEngine, whereas in a file mmap case you will go with EXCLUSIVE - where you need to copy the content to onheap memory. {quote}My question, regarding file system required on top PMEM has remained unanswered. You rely on file system on top of PMEM {quote} Please check the description from [http://pmem.io|http://pmem.io/]. NVDIMMs are going to be addressed as mmap'd files only, unlike DRAM where you directly access the memory addresses. {quote}You mmap PMEM resided file into RAM{quote} No, as explained previously. A related link which is publicly available: https://software.intel.com/en-us/blogs/2018/10/30/intel-optane-dc-persistent-memory-a-major-advance-in-memory-and-storage-architecture > Bucket cache on Persistent memory > - > > Key: HBASE-21874 > URL: https://issues.apache.org/jira/browse/HBASE-21874 > Project: HBase > Issue Type: New Feature > Components: BucketCache >Affects Versions: 3.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Major > Fix For: 3.0.0 > > Attachments: HBASE-21874.patch, HBASE-21874.patch, > HBASE-21874_V2.patch, Pmem_BC.png > > > Non volatile persistent memory devices are byte addressable like DRAM (for > eg. Intel DCPMM). Bucket cache implementation can take advantage of this new > memory type and can make use of the existing offheap data structures to serve > data directly from this memory area without having to bring the data to > onheap. > The patch is a new IOEngine implementation that works with the persistent > memory. > Note : Here we don't make use of the persistence nature of the device and > just make use of the big memory it provides. > Performance numbers to follow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
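To make the addressing model described above concrete, here is a minimal sketch of mmapping a file that lives on a DAX-mounted persistent-memory filesystem. The mount point /mnt/pmem0 and the region size are assumptions; the same code maps an ordinary file if the mount is not DAX, so this only illustrates where the mapping ends up.
{code}
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class PmemMmapSketch {
  public static void main(String[] args) throws Exception {
    long capacity = 1L << 30; // 1 GB region, example size only
    try (RandomAccessFile raf = new RandomAccessFile("/mnt/pmem0/bucketcache", "rw")) {
      raf.setLength(capacity);
      // When the mount is DAX, the mapping lands in the device's address space,
      // so the cached payload does not consume DRAM.
      MappedByteBuffer region = raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, capacity);
      region.put(0, (byte) 42);
      System.out.println("first byte of the mapped region: " + region.get(0));
    }
  }
}
{code}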