[jira] [Updated] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled

2019-12-10 Thread ramkrishna.s.vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-23066:
---
Release Note: 
The configuration 'hbase.rs.cacheblocksonwrite' enables caching of blocks as they 
are written. Blocks written during compaction were deliberately not cached (since 
that can be very aggressive), as the caching happens as and when the writer 
completes a block.
In cloud environments, which tend to have bigger caches, enabling 
'hbase.rs.prefetchblocksonopen' (a non-aggressive way of proactively caching 
blocks on reader creation) does not help enough, because it takes time to cache 
the blocks of a freshly compacted file.
This feature adds a new configuration, 'hbase.rs.cachecompactedblocksonwrite', 
which when set to 'true' will also cache the blocks created by compaction.
Remember that since this is aggressive caching, the user should have enough cache 
space; if not, it may lead to other active blocks getting evicted.
From the shell this can also be enabled per Column Family, using the format below:
{code}
create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit', 
CONFIGURATION => {'hbase.rs.cachecompactedblocksonwrite' => 'true'}}
{code}
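
For reference, the same override can be set per column family from the Java client. The following is a minimal sketch using the HBase 2.x descriptor builder API; the table and family names are illustrative only.
{code:java}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class CacheCompactedBlocksExample {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      // Override the region server default for this column family only.
      TableDescriptorBuilder table = TableDescriptorBuilder.newBuilder(TableName.valueOf("t1"))
          .setColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("f1"))
              .setConfiguration("hbase.rs.cachecompactedblocksonwrite", "true")
              .build());
      admin.createTable(table.build());
    }
  }
}
{code}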


> Allow cache on write during compactions when prefetching is enabled
> ---
>
> Key: HBASE-23066
> URL: https://issues.apache.org/jira/browse/HBASE-23066
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction, regionserver
>Affects Versions: 1.4.10
>Reporter: Jacob LeBlanc
>Assignee: Jacob LeBlanc
>Priority: Minor
> Fix For: 2.3.0, 1.6.0
>
> Attachments: HBASE-23066.patch, performance_results.png, 
> prefetchCompactedBlocksOnWrite.patch
>
>
> In cases where users care a lot about read performance for tables that are 
> small enough to fit into a cache (or the cache is large enough), 
> prefetchOnOpen can be enabled to make the entire table available in cache 
> after the initial region opening is completed. Any new data can also be 
> guaranteed to be in cache with the cacheBlocksOnWrite setting.
> However, the missing piece is when all blocks are evicted after a compaction. 
> We found very poor performance after compactions for tables under heavy read 
> load and a slower backing filesystem (S3). After a compaction the prefetching 
> threads need to compete with threads servicing read requests and get 
> constantly blocked as a result. 
> This is a proposal to introduce a new cache configuration option that would 
> cache blocks on write during compaction for any column family that has 
> prefetch enabled. This would virtually guarantee all blocks are kept in cache 
> after the initial prefetch on open is completed allowing for guaranteed 
> steady read performance despite a slow backing file system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled

2019-12-10 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992767#comment-16992767
 ] 

ramkrishna.s.vasudevan edited comment on HBASE-23066 at 12/10/19 5:52 PM:
--

[~jacob.leblanc] - Pushed this to master. I will push it to the latest branch-2.x 
tomorrow; I have some issues pulling the code since my VM is not working.
BTW, thanks for the nice patch.


was (Author: ram_krish):
[~jacob.leblanc] - Pushed this to master. Tomorrow will push this to branch-2.x 
(latest). I have some issues in pulling the code - since my VM is not working.

> Allow cache on write during compactions when prefetching is enabled
> ---
>
> Key: HBASE-23066
> URL: https://issues.apache.org/jira/browse/HBASE-23066
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction, regionserver
>Affects Versions: 1.4.10
>Reporter: Jacob LeBlanc
>Assignee: Jacob LeBlanc
>Priority: Minor
> Fix For: 2.3.0, 1.6.0
>
> Attachments: HBASE-23066.patch, performance_results.png, 
> prefetchCompactedBlocksOnWrite.patch
>
>
> In cases where users care a lot about read performance for tables that are 
> small enough to fit into a cache (or the cache is large enough), 
> prefetchOnOpen can be enabled to make the entire table available in cache 
> after the initial region opening is completed. Any new data can also be 
> guaranteed to be in cache with the cacheBlocksOnWrite setting.
> However, the missing piece is when all blocks are evicted after a compaction. 
> We found very poor performance after compactions for tables under heavy read 
> load and a slower backing filesystem (S3). After a compaction the prefetching 
> threads need to compete with threads servicing read requests and get 
> constantly blocked as a result. 
> This is a proposal to introduce a new cache configuration option that would 
> cache blocks on write during compaction for any column family that has 
> prefetch enabled. This would virtually guarantee all blocks are kept in cache 
> after the initial prefetch on open is completed allowing for guaranteed 
> steady read performance despite a slow backing file system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled

2019-12-10 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992767#comment-16992767
 ] 

ramkrishna.s.vasudevan commented on HBASE-23066:


[~jacob.leblanc] - Pushed this to master. I will push it to the latest branch-2.x 
tomorrow; I have some issues pulling the code since my VM is not working.

> Allow cache on write during compactions when prefetching is enabled
> ---
>
> Key: HBASE-23066
> URL: https://issues.apache.org/jira/browse/HBASE-23066
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction, regionserver
>Affects Versions: 1.4.10
>Reporter: Jacob LeBlanc
>Assignee: Jacob LeBlanc
>Priority: Minor
> Fix For: 2.3.0, 1.6.0
>
> Attachments: HBASE-23066.patch, performance_results.png, 
> prefetchCompactedBlocksOnWrite.patch
>
>
> In cases where users care a lot about read performance for tables that are 
> small enough to fit into a cache (or the cache is large enough), 
> prefetchOnOpen can be enabled to make the entire table available in cache 
> after the initial region opening is completed. Any new data can also be 
> guaranteed to be in cache with the cacheBlocksOnWrite setting.
> However, the missing piece is when all blocks are evicted after a compaction. 
> We found very poor performance after compactions for tables under heavy read 
> load and a slower backing filesystem (S3). After a compaction the prefetching 
> threads need to compete with threads servicing read requests and get 
> constantly blocked as a result. 
> This is a proposal to introduce a new cache configuration option that would 
> cache blocks on write during compaction for any column family that has 
> prefetch enabled. This would virtually guarantee all blocks are kept in cache 
> after the initial prefetch on open is completed allowing for guaranteed 
> steady read performance despite a slow backing file system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23349) Reader lock on compacted store files preventing archival of compacted files

2019-12-10 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992326#comment-16992326
 ] 

ramkrishna.s.vasudevan commented on HBASE-23349:


There is no problem, because the ongoing scanners are just closed and opened 
again (reset); it won't physically close or end the ongoing scans.

> Reader lock on compacted store files preventing archival of compacted files
> ---
>
> Key: HBASE-23349
> URL: https://issues.apache.org/jira/browse/HBASE-23349
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.0, 1.6.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 1.6.0
>
>
> refCounts on compacted away store files as low as 1 can also prevent archival.
> {code:java}
> regionserver.HStore - Can't archive compacted file 
> hdfs://{{root-dir}}/hbase/data/default/t1/12a9e1112e0371955b3db8d3ebb2d298/cf1/73b72f5ddfce4a34a9e01afe7b83c1f9
>  because of either isCompactedAway=true or file has reference, 
> isReferencedInReads=true, refCount=1, skipping for now.
> {code}
> We should come up with core code that blocks the reader lock if a client or 
> coprocessor has held the lock for a significantly long amount of time 
> (configurable - mostly the same as the discharger thread interval), or 
> gracefully resolve the reader lock issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23349) Reader lock on compacted store files preventing archival of compacted files

2019-12-06 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989545#comment-16989545
 ] 

ramkrishna.s.vasudevan commented on HBASE-23349:


Sorry for being late here. Good comments and discussion.
If the refCount is the issue, then I think previously the scanners were notified 
that new compacted files had been created, and on receiving that notification 
each scanner just reset its heap.
To keep things simple, based on the timeout config (as discussed here) that 
thread can notify the scanner to reset itself, ensuring the refCount is 
decremented, so that in turn the discharger thread can archive the compacted 
files in its next cycle. Does that make sense?

> Reader lock on compacted store files preventing archival of compacted files
> ---
>
> Key: HBASE-23349
> URL: https://issues.apache.org/jira/browse/HBASE-23349
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.0, 1.6.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 1.6.0
>
> Attachments: HBASE-23349.master.000.patch, 
> HBASE-23349.master.001.patch, HBASE-23349.master.002.patch
>
>
> refCounts on compacted away store files as low as 1 can also prevent archival.
> {code:java}
> regionserver.HStore - Can't archive compacted file 
> hdfs://{{root-dir}}/hbase/data/default/t1/12a9e1112e0371955b3db8d3ebb2d298/cf1/73b72f5ddfce4a34a9e01afe7b83c1f9
>  because of either isCompactedAway=true or file has reference, 
> isReferencedInReads=true, refCount=1, skipping for now.
> {code}
> We should come up with core code that blocks the reader lock if a client or 
> coprocessor has held the lock for a significantly long amount of time 
> (configurable - mostly the same as the discharger thread interval), or 
> gracefully resolve the reader lock issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled

2019-12-04 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988474#comment-16988474
 ] 

ramkrishna.s.vasudevan commented on HBASE-23066:


BTW - I confirmed from the code that it is easy to enable this per table/column 
family as well, using either the shell or the Java client.

> Allow cache on write during compactions when prefetching is enabled
> ---
>
> Key: HBASE-23066
> URL: https://issues.apache.org/jira/browse/HBASE-23066
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction, regionserver
>Affects Versions: 1.4.10
>Reporter: Jacob LeBlanc
>Assignee: Jacob LeBlanc
>Priority: Minor
> Fix For: 2.3.0, 1.6.0
>
> Attachments: HBASE-23066.patch, performance_results.png, 
> prefetchCompactedBlocksOnWrite.patch
>
>
> In cases where users care a lot about read performance for tables that are 
> small enough to fit into a cache (or the cache is large enough), 
> prefetchOnOpen can be enabled to make the entire table available in cache 
> after the initial region opening is completed. Any new data can also be 
> guaranteed to be in cache with the cacheBlocksOnWrite setting.
> However, the missing piece is when all blocks are evicted after a compaction. 
> We found very poor performance after compactions for tables under heavy read 
> load and a slower backing filesystem (S3). After a compaction the prefetching 
> threads need to compete with threads servicing read requests and get 
> constantly blocked as a result. 
> This is a proposal to introduce a new cache configuration option that would 
> cache blocks on write during compaction for any column family that has 
> prefetch enabled. This would virtually guarantee all blocks are kept in cache 
> after the initial prefetch on open is completed allowing for guaranteed 
> steady read performance despite a slow backing file system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled

2019-12-04 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988473#comment-16988473
 ] 

ramkrishna.s.vasudevan commented on HBASE-23066:


[~jacob.leblanc]
Thanks for your detailed write-up. I saw the reply from [~anoop.hbase]; hope it 
helps.

However, since you also have prefetch enabled, your cache was always being loaded 
asynchronously, and that was always helping you in a big way.
Can you give some rough numbers on your cache size and the number of blocks you 
typically see in the cache? Is there a sporadic rise in your block count, and if 
so by how much? Hopefully your cache size is large enough to hold them.

[~jacob.leblanc]
If you are fine with the latest PR, I can merge it and work on the other sub-task 
to make this configuration size-based, so that blocks of all the older files are 
not cached.

> Allow cache on write during compactions when prefetching is enabled
> ---
>
> Key: HBASE-23066
> URL: https://issues.apache.org/jira/browse/HBASE-23066
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction, regionserver
>Affects Versions: 1.4.10
>Reporter: Jacob LeBlanc
>Assignee: Jacob LeBlanc
>Priority: Minor
> Fix For: 2.3.0, 1.6.0
>
> Attachments: HBASE-23066.patch, performance_results.png, 
> prefetchCompactedBlocksOnWrite.patch
>
>
> In cases where users care a lot about read performance for tables that are 
> small enough to fit into a cache (or the cache is large enough), 
> prefetchOnOpen can be enabled to make the entire table available in cache 
> after the initial region opening is completed. Any new data can also be 
> guaranteed to be in cache with the cacheBlocksOnWrite setting.
> However, the missing piece is when all blocks are evicted after a compaction. 
> We found very poor performance after compactions for tables under heavy read 
> load and a slower backing filesystem (S3). After a compaction the prefetching 
> threads need to compete with threads servicing read requests and get 
> constantly blocked as a result. 
> This is a proposal to introduce a new cache configuration option that would 
> cache blocks on write during compaction for any column family that has 
> prefetch enabled. This would virtually guarantee all blocks are kept in cache 
> after the initial prefetch on open is completed allowing for guaranteed 
> steady read performance despite a slow backing file system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23350) Make compaction files cacheonWrite configurable based on threshold

2019-11-28 Thread ramkrishna.s.vasudevan (Jira)
ramkrishna.s.vasudevan created HBASE-23350:
--

 Summary: Make compaction files cacheonWrite configurable based on 
threshold
 Key: HBASE-23350
 URL: https://issues.apache.org/jira/browse/HBASE-23350
 Project: HBase
  Issue Type: Sub-task
  Components: Compaction
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 3.0.0, 2.3.0


As per comment from [~javaman_chen] in the parent JIRA
https://issues.apache.org/jira/browse/HBASE-23066?focusedCommentId=16937361=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16937361
This is to introduce a config that decides whether the resulting compacted file's 
blocks should be added to the cache while writing.
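
A minimal sketch of the kind of size-based check being proposed, under the assumption of a size threshold config; the config key, helper name, and the size estimate passed in are illustrative only, not the final implementation.
{code:java}
import org.apache.hadoop.conf.Configuration;

public class CompactionCacheOnWritePolicy {
  // Hypothetical helper: cache the compacted file's blocks on write only when the
  // feature is enabled and the expected file size is below the (illustrative) threshold.
  static boolean shouldCacheCompactedBlocks(Configuration conf,
      boolean cacheCompactedBlocksOnWrite, long expectedCompactedFileSize) {
    long threshold = conf.getLong(
        "hbase.rs.cachecompactedblocksonwrite.threshold", Long.MAX_VALUE);
    return cacheCompactedBlocksOnWrite && expectedCompactedFileSize <= threshold;
  }
}
{code}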



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled

2019-11-27 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984157#comment-16984157
 ] 

ramkrishna.s.vasudevan commented on HBASE-23066:


[~jacob.leblanc]
Do you want to take up [~javaman_chen]'s angle of adding a threshold based on a 
size-based config in this JIRA, or in another one?
I can do that in another JIRA if you are busy with other things.
[~javaman_chen], [~anoop.hbase] - FYI.


> Allow cache on write during compactions when prefetching is enabled
> ---
>
> Key: HBASE-23066
> URL: https://issues.apache.org/jira/browse/HBASE-23066
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction, regionserver
>Affects Versions: 1.4.10
>Reporter: Jacob LeBlanc
>Assignee: Jacob LeBlanc
>Priority: Minor
> Fix For: 2.3.0, 1.6.0
>
> Attachments: HBASE-23066.patch, performance_results.png, 
> prefetchCompactedBlocksOnWrite.patch
>
>
> In cases where users care a lot about read performance for tables that are 
> small enough to fit into a cache (or the cache is large enough), 
> prefetchOnOpen can be enabled to make the entire table available in cache 
> after the initial region opening is completed. Any new data can also be 
> guaranteed to be in cache with the cacheBlocksOnWrite setting.
> However, the missing piece is when all blocks are evicted after a compaction. 
> We found very poor performance after compactions for tables under heavy read 
> load and a slower backing filesystem (S3). After a compaction the prefetching 
> threads need to compete with threads servicing read requests and get 
> constantly blocked as a result. 
> This is a proposal to introduce a new cache configuration option that would 
> cache blocks on write during compaction for any column family that has 
> prefetch enabled. This would virtually guarantee all blocks are kept in cache 
> after the initial prefetch on open is completed allowing for guaranteed 
> steady read performance despite a slow backing file system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled

2019-11-27 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984143#comment-16984143
 ] 

ramkrishna.s.vasudevan commented on HBASE-23066:


bq.On a side note, (Not related to this issue) when we have cache on write ON 
as well as prefetch also On, do we do the caching part for the flushed files 
twice? When it is written, its already been added to cache. Later as part of 
HFile reader open, the prefetch threads will again do a read and add to cache!
I checked this part. It seems we just read the block, and if it comes from the 
cache we simply return it, because HFileReaderImpl#readBlock() returns early if 
the block is already cached.
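
A self-contained illustration of the pattern described above (not the actual HFileReaderImpl code; the map-based cache and block key are stand-ins): the read path consults the cache first and only reads and caches a block when it is missing, so cache-on-write plus prefetch does not cache the same block twice.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReadBlockCacheDemo {
  private final Map<String, byte[]> blockCache = new ConcurrentHashMap<>();

  byte[] readBlock(String blockKey) {
    byte[] cached = blockCache.get(blockKey);
    if (cached != null) {
      return cached;                        // already cached (e.g. by cache-on-write)
    }
    byte[] fromDisk = new byte[64 * 1024];  // stand-in for the actual filesystem read
    blockCache.put(blockKey, fromDisk);     // cached once, on the first real read
    return fromDisk;
  }
}
{code}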

bq.The comment from @chenxu seems valid. Should we see that angle also?
Ok, we can look at that angle - but should it be part of this JIRA, or should we 
raise another JIRA and address it there?

> Allow cache on write during compactions when prefetching is enabled
> ---
>
> Key: HBASE-23066
> URL: https://issues.apache.org/jira/browse/HBASE-23066
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction, regionserver
>Affects Versions: 1.4.10
>Reporter: Jacob LeBlanc
>Assignee: Jacob LeBlanc
>Priority: Minor
> Fix For: 2.3.0, 1.6.0
>
> Attachments: HBASE-23066.patch, performance_results.png, 
> prefetchCompactedBlocksOnWrite.patch
>
>
> In cases where users care a lot about read performance for tables that are 
> small enough to fit into a cache (or the cache is large enough), 
> prefetchOnOpen can be enabled to make the entire table available in cache 
> after the initial region opening is completed. Any new data can also be 
> guaranteed to be in cache with the cacheBlocksOnWrite setting.
> However, the missing piece is when all blocks are evicted after a compaction. 
> We found very poor performance after compactions for tables under heavy read 
> load and a slower backing filesystem (S3). After a compaction the prefetching 
> threads need to compete with threads servicing read requests and get 
> constantly blocked as a result. 
> This is a proposal to introduce a new cache configuration option that would 
> cache blocks on write during compaction for any column family that has 
> prefetch enabled. This would virtually guarantee all blocks are kept in cache 
> after the initial prefetch on open is completed allowing for guaranteed 
> steady read performance despite a slow backing file system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled

2019-11-26 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982284#comment-16982284
 ] 

ramkrishna.s.vasudevan edited comment on HBASE-23066 at 11/26/19 9:27 AM:
--

[~jacob.leblanc]
Is the patch still applicable on trunk/2.0 branches?

 Any more reviews here? I can commit this if no other reviews are pending here. 
Will wait for another day or so.


was (Author: ram_krish):
[~jacob.leblanc]
Is the patch still applicable on trunk/2.0 branches? Any more reviews here? I 
can commit this if no other reviews are pending here. Will wait for another day 
or so.

> Allow cache on write during compactions when prefetching is enabled
> ---
>
> Key: HBASE-23066
> URL: https://issues.apache.org/jira/browse/HBASE-23066
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction, regionserver
>Affects Versions: 1.4.10
>Reporter: Jacob LeBlanc
>Assignee: Jacob LeBlanc
>Priority: Minor
> Fix For: 2.3.0, 1.6.0
>
> Attachments: HBASE-23066.patch, performance_results.png, 
> prefetchCompactedBlocksOnWrite.patch
>
>
> In cases where users care a lot about read performance for tables that are 
> small enough to fit into a cache (or the cache is large enough), 
> prefetchOnOpen can be enabled to make the entire table available in cache 
> after the initial region opening is completed. Any new data can also be 
> guaranteed to be in cache with the cacheBlocksOnWrite setting.
> However, the missing piece is when all blocks are evicted after a compaction. 
> We found very poor performance after compactions for tables under heavy read 
> load and a slower backing filesystem (S3). After a compaction the prefetching 
> threads need to compete with threads servicing read requests and get 
> constantly blocked as a result. 
> This is a proposal to introduce a new cache configuration option that would 
> cache blocks on write during compaction for any column family that has 
> prefetch enabled. This would virtually guarantee all blocks are kept in cache 
> after the initial prefetch on open is completed allowing for guaranteed 
> steady read performance despite a slow backing file system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled

2019-11-26 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982284#comment-16982284
 ] 

ramkrishna.s.vasudevan commented on HBASE-23066:


[~jacob.leblanc]
Is the patch still applicable on trunk/2.0 branches? Any more reviews here? I 
can commit this if no other reviews are pending here. Will wait for another day 
or so.

> Allow cache on write during compactions when prefetching is enabled
> ---
>
> Key: HBASE-23066
> URL: https://issues.apache.org/jira/browse/HBASE-23066
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction, regionserver
>Affects Versions: 1.4.10
>Reporter: Jacob LeBlanc
>Assignee: Jacob LeBlanc
>Priority: Minor
> Fix For: 2.3.0, 1.6.0
>
> Attachments: HBASE-23066.patch, performance_results.png, 
> prefetchCompactedBlocksOnWrite.patch
>
>
> In cases where users care a lot about read performance for tables that are 
> small enough to fit into a cache (or the cache is large enough), 
> prefetchOnOpen can be enabled to make the entire table available in cache 
> after the initial region opening is completed. Any new data can also be 
> guaranteed to be in cache with the cacheBlocksOnWrite setting.
> However, the missing piece is when all blocks are evicted after a compaction. 
> We found very poor performance after compactions for tables under heavy read 
> load and a slower backing filesystem (S3). After a compaction the prefetching 
> threads need to compete with threads servicing read requests and get 
> constantly blocked as a result. 
> This is a proposal to introduce a new cache configuration option that would 
> cache blocks on write during compaction for any column family that has 
> prefetch enabled. This would virtually guarantee all blocks are kept in cache 
> after the initial prefetch on open is completed allowing for guaranteed 
> steady read performance despite a slow backing file system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23279) Switch default block encoding to ROW_INDEX_V1

2019-11-20 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979010#comment-16979010
 ] 

ramkrishna.s.vasudevan commented on HBASE-23279:


[~vjasani]
You can try enabling another encoding instead of NONE and see if it still fails.
bq.Do we support get closest at or before in hbase2? Its deprecated, no?
It seems to be deprecated, as I remember. But it is still better to verify that 
the encoding is not messing anything up. I hope not.

> Switch default block encoding to ROW_INDEX_V1
> -
>
> Key: HBASE-23279
> URL: https://issues.apache.org/jira/browse/HBASE-23279
> Project: HBase
>  Issue Type: Wish
>Affects Versions: 3.0.0, 2.3.0
>Reporter: Lars Hofhansl
>Assignee: Viraj Jasani
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HBASE-23279.master.000.patch, 
> HBASE-23279.master.001.patch, HBASE-23279.master.002.patch
>
>
> Currently we set both block encoding and compression to NONE.
> ROW_INDEX_V1 has many advantages and (almost) no disadvantages (the hfiles 
> are slightly larger, about 3% or so). I think that would be a better default than 
> NONE.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23296) Support Bucket based L1 Cache

2019-11-18 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976452#comment-16976452
 ] 

ramkrishna.s.vasudevan commented on HBASE-23296:


Good one [~javaman_chen]. We tried a tiered bucket cache, but that was for the 
data blocks themselves, whereas this is for the index blocks. Seems like a good 
improvement.

> Support Bucket based L1 Cache
> -
>
> Key: HBASE-23296
> URL: https://issues.apache.org/jira/browse/HBASE-23296
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache
>Reporter: chenxu
>Priority: Major
>
> LruBlockCache is not suitable in the following scenarios:
> (1) cache size too large (will take too much heap memory, and 
> evictBlocksByHfileName is not so efficient, as HBASE-23277 mentioned)
> (2) block evicted frequently, especially cacheOnWrite & prefetchOnOpen are 
> enabled.
> Since block‘s data is reclaimed by GC, this may affect GC performance.
> So how about enabling a Bucket based L1 Cache.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HBASE-23270) Inter-cluster replication is unaware destination peer cluster's RSGroup to push the WALEdits

2019-11-07 Thread ramkrishna.s.vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-23270:
--

Assignee: Pradeep

> Inter-cluster replication is unaware destination peer cluster's RSGroup to 
> push the WALEdits
> 
>
> Key: HBASE-23270
> URL: https://issues.apache.org/jira/browse/HBASE-23270
> Project: HBase
>  Issue Type: Bug
>Reporter: Pradeep
>Assignee: Pradeep
>Priority: Major
>
> In a source RSGroup enabled HBase cluster where replication is enabled to 
> another destination RSGroup enabled cluster, the replication stream of 
> List go to any node in the destination cluster without the 
> awareness of RSGroup and then gets routed to appropriate node where the 
> region is hosted. This extra hop where the data is received and routed could 
> be of any node in the cluster and no restriction exists to select the node 
> within the same RSGroup.
> Implications: RSGroup owner in the multi-tenant HBase cluster can see 
> performance and throughput deviations because of this unpredictability caused 
> by replication.
> Potential fix: options:
> a) Select a destination node having RSGroup awareness
> b) Group the WAL.Edit list based on region and then by region-servers in 
> which the regions are assigned in the destination. Pass the list WAL.Edit 
> directly to the region-server to avoid extra intermediate hop in the 
> destination cluster during the replication process. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled

2019-10-17 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953614#comment-16953614
 ] 

ramkrishna.s.vasudevan commented on HBASE-23066:


[~busbey] - do you want to have a look at the patch and the charts added by 
[~jacob.leblanc]?

> Allow cache on write during compactions when prefetching is enabled
> ---
>
> Key: HBASE-23066
> URL: https://issues.apache.org/jira/browse/HBASE-23066
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction, regionserver
>Affects Versions: 1.4.10
>Reporter: Jacob LeBlanc
>Assignee: Jacob LeBlanc
>Priority: Minor
> Fix For: 2.3.0, 1.6.0
>
> Attachments: HBASE-23066.patch, performance_results.png, 
> prefetchCompactedBlocksOnWrite.patch
>
>
> In cases where users care a lot about read performance for tables that are 
> small enough to fit into a cache (or the cache is large enough), 
> prefetchOnOpen can be enabled to make the entire table available in cache 
> after the initial region opening is completed. Any new data can also be 
> guaranteed to be in cache with the cacheBlocksOnWrite setting.
> However, the missing piece is when all blocks are evicted after a compaction. 
> We found very poor performance after compactions for tables under heavy read 
> load and a slower backing filesystem (S3). After a compaction the prefetching 
> threads need to compete with threads servicing read requests and get 
> constantly blocked as a result. 
> This is a proposal to introduce a new cache configuration option that would 
> cache blocks on write during compaction for any column family that has 
> prefetch enabled. This would virtually guarantee all blocks are kept in cache 
> after the initial prefetch on open is completed allowing for guaranteed 
> steady read performance despite a slow backing file system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled

2019-10-16 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952549#comment-16952549
 ] 

ramkrishna.s.vasudevan commented on HBASE-23066:


You may have to add a clear release note for this - explaining how to use this 
feature and clearly highlighting that it is used only when prefetch is turned ON.

> Allow cache on write during compactions when prefetching is enabled
> ---
>
> Key: HBASE-23066
> URL: https://issues.apache.org/jira/browse/HBASE-23066
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction, regionserver
>Affects Versions: 1.4.10
>Reporter: Jacob LeBlanc
>Assignee: Jacob LeBlanc
>Priority: Minor
> Fix For: 2.3.0, 1.6.0
>
> Attachments: HBASE-23066.patch, performance_results.png, 
> prefetchCompactedBlocksOnWrite.patch
>
>
> In cases where users care a lot about read performance for tables that are 
> small enough to fit into a cache (or the cache is large enough), 
> prefetchOnOpen can be enabled to make the entire table available in cache 
> after the initial region opening is completed. Any new data can also be 
> guaranteed to be in cache with the cacheBlocksOnWrite setting.
> However, the missing piece is when all blocks are evicted after a compaction. 
> We found very poor performance after compactions for tables under heavy read 
> load and a slower backing filesystem (S3). After a compaction the prefetching 
> threads need to compete with threads servicing read requests and get 
> constantly blocked as a result. 
> This is a proposal to introduce a new cache configuration option that would 
> cache blocks on write during compaction for any column family that has 
> prefetch enabled. This would virtually guarantee all blocks are kept in cache 
> after the initial prefetch on open is completed allowing for guaranteed 
> steady read performance despite a slow backing file system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23107) Avoid temp byte array creation when doing cacheDataOnWrite

2019-10-14 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950971#comment-16950971
 ] 

ramkrishna.s.vasudevan commented on HBASE-23107:


Will have a look at this later today or tomorrow. Thanks.

> Avoid temp byte array creation when doing cacheDataOnWrite
> --
>
> Key: HBASE-23107
> URL: https://issues.apache.org/jira/browse/HBASE-23107
> Project: HBase
>  Issue Type: Improvement
>Reporter: chenxu
>Assignee: chenxu
>Priority: Major
> Attachments: flamegraph_after.svg, flamegraph_before.svg
>
>
> code in HFileBlock.Writer.cloneUncompressedBufferWithHeader
> {code:java}
> ByteBuffer cloneUncompressedBufferWithHeader() {
>   expectState(State.BLOCK_READY);
>   byte[] uncompressedBlockBytesWithHeader = baosInMemory.toByteArray();
>   …
> }
> {code}
> When cacheOnWrite feature enabled, a temp byte array was created in order to 
> copy block’s data, we can avoid this by use of ByteBuffAllocator. This can 
> improve GC performance in write heavy scenarios.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-22608) MVCC's writeEntry didn't complete and make MVCC stuck

2019-10-14 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950963#comment-16950963
 ] 

ramkrishna.s.vasudevan commented on HBASE-22608:


[~openinx]
Are you working on this issue / related issues with IN_MEMORY_COMPACTION?

> MVCC's writeEntry didn't complete and make MVCC stuck
> -
>
> Key: HBASE-22608
> URL: https://issues.apache.org/jira/browse/HBASE-22608
> Project: HBase
>  Issue Type: Bug
>  Components: in-memory-compaction
>Reporter: Guanghao Zhang
>Assignee: Zheng Hu
>Priority: Critical
>
> {code:java}
> 2019-06-20,05:03:44,917 ERROR 
> [RpcServer.default.RWQ.Fifo.write.handler=61,queue=1,port=22600] 
> org.apache.hadoop.hbase.regionserver.HRegion: Asked to modify this region's 
> (xx,,1560481375170.10b01c12d58ce75c9aaf1ac15cc2a7f3.) memStoreSizing to a 
> negative value which is incorrect. Current memStoreSizing=-1686222, 
> delta=1489930
> java.lang.Exception
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.checkNegativeMemStoreDataSize(HRegion.java:1317)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.incMemStoreSize(HRegion.java:1295)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3316)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$MutationBatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3821)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4248)
> at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:4179)
> at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:4109)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:1059)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicBatchOp(RSRpcServices.java:991)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:954)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2833)
> at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42014)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:132)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
> {code}
> See 
> [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L3728]
> {code:java}
> @Override
> public WriteEntry writeMiniBatchOperationsToMemStore(
> final MiniBatchOperationInProgress miniBatchOp, @Nullable 
> WriteEntry writeEntry)
> throws IOException {
>   if (writeEntry == null) {
> writeEntry = region.mvcc.begin();
>   }
>   super.writeMiniBatchOperationsToMemStore(miniBatchOp, 
> writeEntry.getWriteNumber());
>   return writeEntry;
> }
> {code}
> super.writeMiniBatchOperationsToMemStore throw a exception and the new 
> writeEntry cannot be complete and make the MVCC stuck.
>  
> And we meet this problem when enable in-memory compaction. But that should be 
> another issue and need to dig more.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23143) Region Server Crash due to 2 cells out of order ( between 2 DELETEs)

2019-10-11 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949159#comment-16949159
 ] 

ramkrishna.s.vasudevan commented on HBASE-23143:


Reading the cells that you posted and the code:
{code}
if (lastCell != null) {
  int keyComp = comparator.compareOnlyKeyPortion(lastCell, cell);
{code}
Here we pass lastCell as the 'left' and 'cell' as the 'right'. The timestamps are 
compared in swapped order:
{code}
  @Override
  public int compareTimestamps(final long ltimestamp, final long rtimestamp) {
// Swap order we pass into compare so we get DESCENDING order.
return Long.compare(rtimestamp, ltimestamp);
  }
{code}
So since the current cell has the bigger timestamp, we get this exception. But per 
the log, the current cell has a smaller seqId than the lastCell.
So I am suspicious about how things were added to the memstore. First question: 
how is the timestamp set? Is it set by the client in this case?
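
To make the ordering concrete, here is a small self-contained demo of the swapped timestamp compare described above, using the timestamps from the log in this issue (it mirrors the DESCENDING-order compare; it is not the actual CellComparator source).
{code:java}
public class TsOrderDemo {
  // Mirrors the swapped compare: larger (newer) timestamps sort first (DESCENDING order).
  static int compareTimestamps(long ltimestamp, long rtimestamp) {
    return Long.compare(rtimestamp, ltimestamp);
  }

  public static void main(String[] args) {
    long lastCellTs = 1570095165147L;     // lastCell, already written to the file
    long currentCellTs = 1570095189597L;  // current cell being appended (newer)
    int cmp = compareTimestamps(lastCellTs, currentCellTs);
    // cmp > 0: lastCell sorts AFTER the current cell, so the writer sees the new
    // cell as "not lexically larger than previous" and throws the IOException.
    System.out.println("compare(lastCell, currentCell) = " + cmp);
  }
}
{code}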

> Region Server Crash due to 2 cells out of order ( between 2 DELETEs)
> 
>
> Key: HBASE-23143
> URL: https://issues.apache.org/jira/browse/HBASE-23143
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.3.2
>Reporter: Xu Cang
>Priority: Major
> Fix For: 1.4.12, 1.3.7, 1.5.1
>
>
> Region Server Crash due to 2 cells out of order ( between 2 DELETEs)
>  
> Caused by: java.io.IOException: Added a key not lexically larger than 
> previous.
>  Current cell = 
> 00D7F00xxQ10D52v8UY6yV0057F00bPaGT\x00057F00bPaG/0:TABLE1_ID/*1570095189597*/DeleteColumn/vlen=0/seqid=*2128373*,
>  
>  lastCell = 
> 00D7F00xxQ10D52v8UY6yV0057F00bPaGT\x00057F00bPaG/0:TABLE1_ID/*1570095165147*/DeleteColumn/vlen=0/seqid=*2128378*
>  
>  
> I am aware of this JIRA: https://issues.apache.org/jira/browse/HBASE-22862
> Though it's slightly different, HBASE-22862 issue was caused One Delete and 
> One Put.
> This issue I am reporting is caused by 2 Deletes
>  
> Has anyone seen this issue? 
>  
> After I read the code and debugged the test cases.
> In AbstractHFileWriter.java
> {code:java}
> int keyComp = comparator.compareOnlyKeyPortion(lastCell, cell);{code}
> This call will always ignore SequenceId. And time stamps are in the correct 
> order (above case)
> And since these 2 cells have same KEY. The comparison result should be 0.
>  *only possible issue I can think of is, in this code piece: in 
> CellComparator.java:*
> {code:java}
> Bytes.compareTo(left.getRowArray(), left.getRowOffset(), left.getRowLength(),
>  right.getRowArray(), right.getRowOffset(), right.getRowLength());{code}
> The getRowLength() returns a wrong value.
> Or the offset is messed up. (?)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled

2019-10-02 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943319#comment-16943319
 ] 

ramkrishna.s.vasudevan commented on HBASE-23066:


[~jacob.leblanc]
Can you create a PR using github? That will trigger the CI for running the 
tests. And it is easy to merge the patch too. 

> Allow cache on write during compactions when prefetching is enabled
> ---
>
> Key: HBASE-23066
> URL: https://issues.apache.org/jira/browse/HBASE-23066
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction, regionserver
>Affects Versions: 1.4.10
>Reporter: Jacob LeBlanc
>Assignee: Jacob LeBlanc
>Priority: Minor
> Fix For: 1.5.0, 2.3.0
>
> Attachments: HBASE-23066.patch, performance_results.png, 
> prefetchCompactedBlocksOnWrite.patch
>
>
> In cases where users care a lot about read performance for tables that are 
> small enough to fit into a cache (or the cache is large enough), 
> prefetchOnOpen can be enabled to make the entire table available in cache 
> after the initial region opening is completed. Any new data can also be 
> guaranteed to be in cache with the cacheBlocksOnWrite setting.
> However, the missing piece is when all blocks are evicted after a compaction. 
> We found very poor performance after compactions for tables under heavy read 
> load and a slower backing filesystem (S3). After a compaction the prefetching 
> threads need to compete with threads servicing read requests and get 
> constantly blocked as a result. 
> This is a proposal to introduce a new cache configuration option that would 
> cache blocks on write during compaction for any column family that has 
> prefetch enabled. This would virtually guarantee all blocks are kept in cache 
> after the initial prefetch on open is completed allowing for guaranteed 
> steady read performance despite a slow backing file system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled

2019-10-01 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941735#comment-16941735
 ] 

ramkrishna.s.vasudevan commented on HBASE-23066:


bq.As time goes on, HFile will grow larger(because of Compaction), and it's 
data may get colder and colder, In some scenarios, only the latest time window 
data is accessed, so warmup the large HFile seems unnecessary.
Got it, thanks [~javaman_chen]. So more of a size-based threshold.

> Allow cache on write during compactions when prefetching is enabled
> ---
>
> Key: HBASE-23066
> URL: https://issues.apache.org/jira/browse/HBASE-23066
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction, regionserver
>Affects Versions: 1.4.10
>Reporter: Jacob LeBlanc
>Assignee: Jacob LeBlanc
>Priority: Minor
> Fix For: 1.5.0, 2.3.0
>
> Attachments: HBASE-23066.patch, performance_results.png, 
> prefetchCompactedBlocksOnWrite.patch
>
>
> In cases where users care a lot about read performance for tables that are 
> small enough to fit into a cache (or the cache is large enough), 
> prefetchOnOpen can be enabled to make the entire table available in cache 
> after the initial region opening is completed. Any new data can also be 
> guaranteed to be in cache with the cacheBlocksOnWrite setting.
> However, the missing piece is when all blocks are evicted after a compaction. 
> We found very poor performance after compactions for tables under heavy read 
> load and a slower backing filesystem (S3). After a compaction the prefetching 
> threads need to compete with threads servicing read requests and get 
> constantly blocked as a result. 
> This is a proposal to introduce a new cache configuration option that would 
> cache blocks on write during compaction for any column family that has 
> prefetch enabled. This would virtually guarantee all blocks are kept in cache 
> after the initial prefetch on open is completed allowing for guaranteed 
> steady read performance despite a slow backing file system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled

2019-10-01 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941556#comment-16941556
 ] 

ramkrishna.s.vasudevan commented on HBASE-23066:


I think [~jacob.leblanc] is saying that for cloud-related use cases with a bigger 
cache, cache-on-write after compactions should benefit them - considering that 
this feature is disabled by default and is enabled only when prefetch is enabled.
The results attached here show a very positive impact.
[~jacob.leblanc] - do you want to prepare a patch for master?
[~javaman_chen]
bq.  If the compacted HFile greater than this threshold, do not cache it, just 
a suggestion.
Do you mean after every compaction, or in general - whenever the HFile grows 
beyond a certain level, do not cache?


> Allow cache on write during compactions when prefetching is enabled
> ---
>
> Key: HBASE-23066
> URL: https://issues.apache.org/jira/browse/HBASE-23066
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction, regionserver
>Affects Versions: 1.4.10
>Reporter: Jacob LeBlanc
>Assignee: Jacob LeBlanc
>Priority: Minor
> Fix For: 1.5.0, 2.3.0
>
> Attachments: HBASE-23066.patch, performance_results.png, 
> prefetchCompactedBlocksOnWrite.patch
>
>
> In cases where users care a lot about read performance for tables that are 
> small enough to fit into a cache (or the cache is large enough), 
> prefetchOnOpen can be enabled to make the entire table available in cache 
> after the initial region opening is completed. Any new data can also be 
> guaranteed to be in cache with the cacheBlocksOnWrite setting.
> However, the missing piece is when all blocks are evicted after a compaction. 
> We found very poor performance after compactions for tables under heavy read 
> load and a slower backing filesystem (S3). After a compaction the prefetching 
> threads need to compete with threads servicing read requests and get 
> constantly blocked as a result. 
> This is a proposal to introduce a new cache configuration option that would 
> cache blocks on write during compaction for any column family that has 
> prefetch enabled. This would virtually guarantee all blocks are kept in cache 
> after the initial prefetch on open is completed allowing for guaranteed 
> steady read performance despite a slow backing file system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-11288) Splittable Meta

2019-09-30 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940773#comment-16940773
 ] 

ramkrishna.s.vasudevan commented on HBASE-11288:


Thanks for the write-up. It gives a good idea of what is being proposed here.
Since META can now split, all the journal entries added for a region split will 
now be added to ROOT, and a failure of the meta split will be handled similarly 
to that of a normal region, correct? Thanks.


> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-11288) Splittable Meta

2019-09-24 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937054#comment-16937054
 ] 

ramkrishna.s.vasudevan commented on HBASE-11288:


+1 to what [~stack] says. Thanks [~toffer].

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Francis Liu
>Assignee: Francis Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23066) Allow cache on write during compactions when prefetching is enabled

2019-09-24 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937032#comment-16937032
 ] 

ramkrishna.s.vasudevan commented on HBASE-23066:


At first glance the patch looks good to me, [~jacob.leblanc]. Have you tested this 
in your cluster running on AWS?

> Allow cache on write during compactions when prefetching is enabled
> ---
>
> Key: HBASE-23066
> URL: https://issues.apache.org/jira/browse/HBASE-23066
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction, regionserver
>Affects Versions: 1.4.10
>Reporter: Jacob LeBlanc
>Assignee: Jacob LeBlanc
>Priority: Minor
> Fix For: 1.5.0, 2.3.0
>
> Attachments: prefetchCompactedBlocksOnWrite.patch
>
>
> In cases where users care a lot about read performance for tables that are 
> small enough to fit into a cache (or the cache is large enough), 
> prefetchOnOpen can be enabled to make the entire table available in cache 
> after the initial region opening is completed. Any new data can also be 
> guaranteed to be in cache with the cacheBlocksOnWrite setting.
> However, the missing piece is when all blocks are evicted after a compaction. 
> We found very poor performance after compactions for tables under heavy read 
> load and a slower backing filesystem (S3). After a compaction the prefetching 
> threads need to compete with threads servicing read requests and get 
> constantly blocked as a result. 
> This is a proposal to introduce a new cache configuration option that would 
> cache blocks on write during compaction for any column family that has 
> prefetch enabled. This would virtually guarantee all blocks are kept in cache 
> after the initial prefetch on open is completed allowing for guaranteed 
> steady read performance despite a slow backing file system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-23 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936390#comment-16936390
 ] 

ramkrishna.s.vasudevan commented on HBASE-23035:


So on restart you just want to leave the target location as null and allow the 
LB to take care of the location - right?

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.2
>
>
> Now if one RS crashes, the regions will try to use the old location for the 
> region deploy. But one RS only has 3 threads to open regions by default. If an 
> RS has hundreds of regions, the failover is very slow. Assigning to the same RS 
> may give good locality if the DataNode is deployed on the same host, but slower 
> failover makes the availability worse. And the locality is not a big deal when 
> deploying HBase on the cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower

2019-09-17 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931193#comment-16931193
 ] 

ramkrishna.s.vasudevan commented on HBASE-23035:


We were always doing round-robin assignment in the case of SCP, right? I mean for 
non-region-replica cases?

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Major
>
> Now if one RS crashes, the regions will try to use the old location for the 
> region deploy. But one RS only has 3 threads to open regions by default. If an 
> RS has hundreds of regions, the failover is very slow. Assigning to the same RS 
> may give good locality if the DataNode is deployed on the same host, but slower 
> failover makes the availability worse. And the locality is not a big deal when 
> deploying HBase on the cloud.
> This was introduced by HBASE-18946.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (HBASE-22929) MemStoreLAB ChunkCreator may memory leak

2019-09-12 Thread ramkrishna.s.vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan resolved HBASE-22929.

Fix Version/s: 2.1.6
   2.2.1
   2.3.0
   3.0.0
   Resolution: Fixed

Thanks for all the reviews. Pushed to master, branch-2, branch-2.1 and 
branch-2.2.


> MemStoreLAB  ChunkCreator may memory leak
> -
>
> Key: HBASE-22929
> URL: https://issues.apache.org/jira/browse/HBASE-22929
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.2
>Reporter: Yechao Chen
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>
> Attachments: OOM_log.png, hbase-site.xml, hbase_heap_monitor.png, 
> hbase_rs_heap_dump_mat_1.png, 
> hbase_rs_heap_dump_mat_ChunkCreator_chunkIdMap.png, hbase_rs_mem_used.png
>
>
> We use HBase 2.1.2 with MemStoreLAB enabled.
> A RegionServer crashed because of OOM.
> I dumped the heap and found that the ChunkCreator may have a memory leak.
> The heap is 32GB, 
> hbase.regionserver.global.memstore.size=0.4,
> hbase.hregion.memstore.mslab.enabled=true
> hbase.hregion.memstore.chunkpool.initialsize=0.5,
> hbase.hregion.memstore.chunkpool.maxsize=1.0
> BucketCache with offheap



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Assigned] (HBASE-22929) MemStoreLAB ChunkCreator may memory leak

2019-09-12 Thread ramkrishna.s.vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-22929:
--

Assignee: ramkrishna.s.vasudevan

> MemStoreLAB  ChunkCreator may memory leak
> -
>
> Key: HBASE-22929
> URL: https://issues.apache.org/jira/browse/HBASE-22929
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.2
>Reporter: Yechao Chen
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Attachments: OOM_log.png, hbase-site.xml, hbase_heap_monitor.png, 
> hbase_rs_heap_dump_mat_1.png, 
> hbase_rs_heap_dump_mat_ChunkCreator_chunkIdMap.png, hbase_rs_mem_used.png
>
>
> We use HBase 2.1.2 with MemStoreLAB enabled.
> A RegionServer crashed because of OOM.
> I dumped the heap and found that the ChunkCreator may be leaking memory.
> The heap is 32GB, 
> hbase.regionserver.global.memstore.size=0.4,
> hbase.hregion.memstore.mslab.enabled=true
> hbase.hregion.memstore.chunkpool.initialsize=0.5,
> hbase.hregion.memstore.chunkpool.maxsize=1.0
> BucketCache with offheap



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (HBASE-23006) RSGroupBasedLoadBalancer should also try to place replicas for the same region to different region servers

2019-09-11 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928194#comment-16928194
 ] 

ramkrishna.s.vasudevan commented on HBASE-23006:


In many of the cases the LB was not considering the replicas. Good to see this 
getting solved. 

> RSGroupBasedLoadBalancer should also try to place replicas for the same 
> region to different region servers
> --
>
> Key: HBASE-23006
> URL: https://issues.apache.org/jira/browse/HBASE-23006
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment, rsgroup
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.1.7, 2.2.2
>
> Attachments: HBASE-23006-UT.patch
>
>
> Found this when implementing HBASE-22971. TestSCPWithReplicas fails when 
> RSGroupBasedLoadBalancer is enabled.
> And this can be reproduced by a UT on master branch too. I think the problem 
> is that in RSGroupBasedLoadBalancer.retainAssignment we do not consider 
> region replicas.
> We should fix this.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery

2019-09-06 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924202#comment-16924202
 ] 

ramkrishna.s.vasudevan commented on HBASE-22072:


[~aoxiang]
Oh, I missed this. Just to reiterate: if close(true) happens, then the 
memstoreScanners are closed anyway and updateReaders() no longer happens in 2.1.5. 
I think in previous versions that was the problem leading to memstore chunks 
leaking, and the store file ref counting also was not happening correctly.

> High read/write intensive regions may cause long crash recovery
> ---
>
> Key: HBASE-22072
> URL: https://issues.apache.org/jira/browse/HBASE-22072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, Recovery
>Affects Versions: 2.0.0
>Reporter: Pavel
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
>  Labels: compaction
> Fix For: 2.2.0, 2.3.0, 2.0.6, 2.1.5
>
> Attachments: HBASE-22072.HBASE-21879-v1.patch
>
>
> Compaction of a region under high read load may leave compacted files undeleted 
> because of existing scan references:
> INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted 
> file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file 
> has reference, isReferencedInReads=true, refCount=1, skipping for now
> If the region is also under high write load this happens quite often, and the 
> region may have few storefiles and tons of undeleted compacted HDFS files.
> The region keeps all those files (in my case thousands) until the graceful region 
> closing procedure, which ignores existing references and drops obsolete files. 
> It works fine apart from consuming some extra HDFS space, but only in the case of 
> a normal region close. If the region server crashes, the new region server 
> responsible for that overfilled region reads the HDFS folder and tries to deal 
> with all the undeleted files, producing tons of storefiles and compaction tasks and 
> consuming an abnormal amount of memory, which may lead to an OutOfMemory exception 
> and further region server crashes. This stops writes to the region because the number 
> of storefiles reaches the *hbase.hstore.blockingStoreFiles* limit, forces high GC 
> duty and may take hours to compact all the files into a working set of files.
> The workaround is to periodically check the file counts in the HDFS folders and 
> force a region assign for the ones with too many files.
> It would be nice if the regionserver had a setting similar to 
> hbase.hstore.blockingStoreFiles that triggers an attempt to drop undeleted 
> compacted files once the number of files reaches this setting.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery

2019-09-02 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920637#comment-16920637
 ] 

ramkrishna.s.vasudevan commented on HBASE-22072:


[~aoxiang]
Yes. Now it is either updateReaders() or close() that will happen. If close(true) 
has already happened then updateReaders() won't happen. Previously that was possible, 
and now in 2.1.5 it is not possible since we have the closeLock. 

> High read/write intensive regions may cause long crash recovery
> ---
>
> Key: HBASE-22072
> URL: https://issues.apache.org/jira/browse/HBASE-22072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, Recovery
>Affects Versions: 2.0.0
>Reporter: Pavel
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
>  Labels: compaction
> Fix For: 2.2.0, 2.3.0, 2.0.6, 2.1.5
>
> Attachments: HBASE-22072.HBASE-21879-v1.patch
>
>
> Compaction of a region under high read load may leave compacted files undeleted 
> because of existing scan references:
> INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted 
> file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file 
> has reference, isReferencedInReads=true, refCount=1, skipping for now
> If the region is also under high write load this happens quite often, and the 
> region may have few storefiles and tons of undeleted compacted HDFS files.
> The region keeps all those files (in my case thousands) until the graceful region 
> closing procedure, which ignores existing references and drops obsolete files. 
> It works fine apart from consuming some extra HDFS space, but only in the case of 
> a normal region close. If the region server crashes, the new region server 
> responsible for that overfilled region reads the HDFS folder and tries to deal 
> with all the undeleted files, producing tons of storefiles and compaction tasks and 
> consuming an abnormal amount of memory, which may lead to an OutOfMemory exception 
> and further region server crashes. This stops writes to the region because the number 
> of storefiles reaches the *hbase.hstore.blockingStoreFiles* limit, forces high GC 
> duty and may take hours to compact all the files into a working set of files.
> The workaround is to periodically check the file counts in the HDFS folders and 
> force a region assign for the ones with too many files.
> It would be nice if the regionserver had a setting similar to 
> hbase.hstore.blockingStoreFiles that triggers an attempt to drop undeleted 
> compacted files once the number of files reaches this setting.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery

2019-08-30 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919269#comment-16919269
 ] 

ramkrishna.s.vasudevan commented on HBASE-22072:


[~aoxiang]
The patch here has gone into 2.1.5. What I see now is that, since we have the 
closeLock in place, it is either updateReaders() or close() that can happen. As 
per the previous comments, you can see that if close(false) has happened, then 
close(true) is bound to happen when the StoreScanner actually gets closed. So 
if updateReaders() happened, the memstore scanners, though updated, will anyway 
get closed when the final close(true) happens. Previously these could run 
concurrently; now they cannot. So if the issue over in HBASE-22929 can be 
checked with HBase 2.1.5, probably the issue won't be there? Let me know if I am 
missing something.
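
To make the ordering concrete, here is a minimal sketch of the tryLock-based 
exclusion between close() and updateReaders() described above (a simplified, 
hypothetical class, not the actual StoreScanner code; it also closes the incoming 
memstore scanners on the early-return paths, in the spirit of HBASE-22936):
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the close()/updateReaders() exclusion discussed above.
class ScannerSketch {
  private final ReentrantLock closeLock = new ReentrantLock();
  private volatile boolean closing = false;
  private final List<AutoCloseable> memStoreScanners = new ArrayList<>();

  void close(boolean finalClose) {
    if (finalClose) {
      closeLock.lock();            // the final close waits for any updateReaders() in flight
    } else if (!closeLock.tryLock()) {
      return;                      // close(false) gives up; the final close(true) cleans up later
    }
    try {
      closing = true;
      closeScannerList(memStoreScanners);   // memstore scanners are released here in any case
    } finally {
      closeLock.unlock();
    }
  }

  void updateReaders(List<AutoCloseable> newMemStoreScanners) {
    if (!closeLock.tryLock()) {
      closeScannerList(newMemStoreScanners); // close() is running; don't leak the new scanners
      return;
    }
    try {
      if (closing) {
        closeScannerList(newMemStoreScanners);
        return;
      }
      memStoreScanners.addAll(newMemStoreScanners);
    } finally {
      closeLock.unlock();
    }
  }

  private static void closeScannerList(List<AutoCloseable> scanners) {
    for (AutoCloseable s : scanners) {
      try {
        s.close();
      } catch (Exception ignored) {
        // best-effort close in this sketch
      }
    }
    scanners.clear();
  }
}
{code}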

> High read/write intensive regions may cause long crash recovery
> ---
>
> Key: HBASE-22072
> URL: https://issues.apache.org/jira/browse/HBASE-22072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, Recovery
>Affects Versions: 2.0.0
>Reporter: Pavel
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
>  Labels: compaction
> Fix For: 2.2.0, 2.3.0, 2.0.6, 2.1.5
>
> Attachments: HBASE-22072.HBASE-21879-v1.patch
>
>
> Compaction of a region under high read load may leave compacted files undeleted 
> because of existing scan references:
> INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted 
> file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file 
> has reference, isReferencedInReads=true, refCount=1, skipping for now
> If the region is also under high write load this happens quite often, and the 
> region may have few storefiles and tons of undeleted compacted HDFS files.
> The region keeps all those files (in my case thousands) until the graceful region 
> closing procedure, which ignores existing references and drops obsolete files. 
> It works fine apart from consuming some extra HDFS space, but only in the case of 
> a normal region close. If the region server crashes, the new region server 
> responsible for that overfilled region reads the HDFS folder and tries to deal 
> with all the undeleted files, producing tons of storefiles and compaction tasks and 
> consuming an abnormal amount of memory, which may lead to an OutOfMemory exception 
> and further region server crashes. This stops writes to the region because the number 
> of storefiles reaches the *hbase.hstore.blockingStoreFiles* limit, forces high GC 
> duty and may take hours to compact all the files into a working set of files.
> The workaround is to periodically check the file counts in the HDFS folders and 
> force a region assign for the ones with too many files.
> It would be nice if the regionserver had a setting similar to 
> hbase.hstore.blockingStoreFiles that triggers an attempt to drop undeleted 
> compacted files once the number of files reaches this setting.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (HBASE-22862) Region Server crash with: Added a key not lexically larger than previous

2019-08-27 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917394#comment-16917394
 ] 

ramkrishna.s.vasudevan commented on HBASE-22862:


[~openinx]
That is true. Sorry for the confusion. 

> Region Server crash with: Added a key not lexically larger than previous
> 
>
> Key: HBASE-22862
> URL: https://issues.apache.org/jira/browse/HBASE-22862
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.10
> Environment: {code}
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (Zulu 8.31.0.1-linux64) (build 1.8.0_181-b02)
> OpenJDK 64-Bit Server VM (Zulu 8.31.0.1-linux64) (build 25.181-b02, mixed 
> mode)
> {code}
>Reporter: Alex Batyrshin
>Assignee: Zheng Hu
>Priority: Critical
> Attachments: HBASE-22862.UT.v01.patch, HBASE-22862.UT.v02.patch
>
>
> We observe the error "Added a key not lexically larger than previous" that causes 
> most of our region servers to crash in our cluster.
> {code}
> 2019-08-15 18:02:10,554 INFO  [MemStoreFlusher.0] regionserver.HRegion: 
> Flushing 1/1 column families, memstore=56.08 MB
> 2019-08-15 18:02:10,727 WARN  [MemStoreFlusher.0] regionserver.HStore: Failed 
> flushing store file, retrying num=0
> java.io.IOException: Added a key not lexically larger than previous. Current 
> cell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567,
>  lastCell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>at 
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2200)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2125)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:512)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:482)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264)
>at java.lang.Thread.run(Thread.java:748)
> 2019-08-15 18:02:21,776 WARN  [MemStoreFlusher.0] regionserver.HStore: Failed 
> flushing store file, retrying num=9
> java.io.IOException: Added a key not lexically larger than previous. Current 
> cell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567,
>  lastCell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>at 
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314)
>at 
> 

[jira] [Commented] (HBASE-22862) Region Server crash with: Added a key not lexically larger than previous

2019-08-27 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917385#comment-16917385
 ] 

ramkrishna.s.vasudevan commented on HBASE-22862:


bq.  return (0xff & left.getTypeByte()) - (0xff & right.getTypeByte());
[~openinx] - This is correct, right? We need the type to be sorted in reverse 
order, so that Deletes appear before Puts.
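
For illustration only, a standalone sketch (not HBase code) of why the direction 
of this subtraction matters: with KeyValue type codes such as Put=4 and 
DeleteColumn=12, comparing the type bytes in descending order is what makes 
deletes sort ahead of puts when row, family, qualifier and timestamp are equal:
{code:java}
// Standalone illustration of type-byte ordering; the codes below follow
// KeyValue.Type (Put=4, Delete=8, DeleteColumn=12, DeleteFamily=14).
public class TypeOrderSketch {
  static int compareTypeBytes(byte left, byte right) {
    // Reverse (descending) order: larger type codes sort first, so
    // DeleteColumn (12) comes before Put (4) at the same timestamp.
    return (0xff & right) - (0xff & left);
  }

  public static void main(String[] args) {
    byte put = 4, deleteColumn = 12;
    // Negative result => deleteColumn sorts before put.
    System.out.println(compareTypeBytes(deleteColumn, put)); // prints -8
  }
}
{code}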

> Region Server crash with: Added a key not lexically larger than previous
> 
>
> Key: HBASE-22862
> URL: https://issues.apache.org/jira/browse/HBASE-22862
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.10
> Environment: {code}
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (Zulu 8.31.0.1-linux64) (build 1.8.0_181-b02)
> OpenJDK 64-Bit Server VM (Zulu 8.31.0.1-linux64) (build 25.181-b02, mixed 
> mode)
> {code}
>Reporter: Alex Batyrshin
>Assignee: Zheng Hu
>Priority: Critical
> Attachments: HBASE-22862.UT.v01.patch, HBASE-22862.UT.v02.patch
>
>
> We observe the error "Added a key not lexically larger than previous" that causes 
> most of our region servers to crash in our cluster.
> {code}
> 2019-08-15 18:02:10,554 INFO  [MemStoreFlusher.0] regionserver.HRegion: 
> Flushing 1/1 column families, memstore=56.08 MB
> 2019-08-15 18:02:10,727 WARN  [MemStoreFlusher.0] regionserver.HStore: Failed 
> flushing store file, retrying num=0
> java.io.IOException: Added a key not lexically larger than previous. Current 
> cell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567,
>  lastCell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>at 
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2200)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2125)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:512)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:482)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264)
>at java.lang.Thread.run(Thread.java:748)
> 2019-08-15 18:02:21,776 WARN  [MemStoreFlusher.0] regionserver.HStore: Failed 
> flushing store file, retrying num=9
> java.io.IOException: Added a key not lexically larger than previous. Current 
> cell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567,
>  lastCell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>at 
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622)
>at 
> 

[jira] [Commented] (HBASE-22936) Close memStoreScanners in StoreScanner#updateReaders else memory leak

2019-08-27 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917382#comment-16917382
 ] 

ramkrishna.s.vasudevan commented on HBASE-22936:


If anyone wants to put up a patch I can review it. If not, I can prepare one. 
[~aoxiang], [~javaman_chen], [~chenyechao].

> Close memStoreScanners in StoreScanner#updateReaders else memory leak
> -
>
> Key: HBASE-22936
> URL: https://issues.apache.org/jira/browse/HBASE-22936
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Priority: Major
> Fix For: 2.3.0, 2.1.7, 2.2.2
>
>
> Via [~aoxiang] from over on HBASE-22723
> {code}
> +  if (!closeLock.tryLock()) {
> +// no lock acquired.
> +LOG.debug("StoreScanner already has the close lock. There is no need 
> to updateReaders");
> +return;
> +  }
> +  // lock acquired
> +  updateReaders = true;
> +  if (this.closing) {
> +LOG.debug("StoreScanner already closing. There is no need to 
> updateReaders");
> +return;
> +  }
> {code}
> We need to close the memStoreScanners in StoreScanner#updateReaders before these 
> two returns; someone else can take over the task.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery

2019-08-27 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917380#comment-16917380
 ] 

ramkrishna.s.vasudevan commented on HBASE-22072:


Thanks [~stack] for filing. It is HBASE-22936.

> High read/write intensive regions may cause long crash recovery
> ---
>
> Key: HBASE-22072
> URL: https://issues.apache.org/jira/browse/HBASE-22072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, Recovery
>Affects Versions: 2.0.0
>Reporter: Pavel
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
>  Labels: compaction
> Fix For: 2.2.0, 2.3.0, 2.0.6, 2.1.5
>
> Attachments: HBASE-22072.HBASE-21879-v1.patch
>
>
> Compaction of a region under high read load may leave compacted files undeleted 
> because of existing scan references:
> INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted 
> file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file 
> has reference, isReferencedInReads=true, refCount=1, skipping for now
> If the region is also under high write load this happens quite often, and the 
> region may have few storefiles and tons of undeleted compacted HDFS files.
> The region keeps all those files (in my case thousands) until the graceful region 
> closing procedure, which ignores existing references and drops obsolete files. 
> It works fine apart from consuming some extra HDFS space, but only in the case of 
> a normal region close. If the region server crashes, the new region server 
> responsible for that overfilled region reads the HDFS folder and tries to deal 
> with all the undeleted files, producing tons of storefiles and compaction tasks and 
> consuming an abnormal amount of memory, which may lead to an OutOfMemory exception 
> and further region server crashes. This stops writes to the region because the number 
> of storefiles reaches the *hbase.hstore.blockingStoreFiles* limit, forces high GC 
> duty and may take hours to compact all the files into a working set of files.
> The workaround is to periodically check the file counts in the HDFS folders and 
> force a region assign for the ones with too many files.
> It would be nice if the regionserver had a setting similar to 
> hbase.hstore.blockingStoreFiles that triggers an attempt to drop undeleted 
> compacted files once the number of files reaches this setting.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (HBASE-22929) MemStoreLAB ChunkCreator may memory leak

2019-08-27 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916549#comment-16916549
 ] 

ramkrishna.s.vasudevan commented on HBASE-22929:


[~chenyechao]
Are you seeing new chunks that are not from the pool getting added to the 
chunkIdMap? Any chance you have enabled CompactingMemstore?

> MemStoreLAB  ChunkCreator may memory leak
> -
>
> Key: HBASE-22929
> URL: https://issues.apache.org/jira/browse/HBASE-22929
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.2
>Reporter: Yechao Chen
>Priority: Major
> Attachments: OOM_log.png, hbase-site.xml, hbase_heap_monitor.png, 
> hbase_rs_heap_dump_mat_1.png, 
> hbase_rs_heap_dump_mat_ChunkCreator_chunkIdMap.png, hbase_rs_mem_used.png
>
>
> We use HBase 2.1.2 with MemStoreLAB enabled.
> A RegionServer crashed because of OOM.
> I dumped the heap and found that the ChunkCreator may be leaking memory.
> The heap is 32GB, 
> hbase.regionserver.global.memstore.size=0.4,
> hbase.hregion.memstore.mslab.enabled=true
> hbase.hregion.memstore.chunkpool.initialsize=0.5,
> hbase.hregion.memstore.chunkpool.maxsize=1.0
> BucketCache with offheap



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (HBASE-22802) Avoid temp ByteBuffer allocation in FileIOEngine#read

2019-08-23 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913967#comment-16913967
 ] 

ramkrishna.s.vasudevan commented on HBASE-22802:


[~javaman_chen]
In your case, what type of storage device backs the file-based IO engine? 

> Avoid temp ByteBuffer allocation in FileIOEngine#read
> -
>
> Key: HBASE-22802
> URL: https://issues.apache.org/jira/browse/HBASE-22802
> Project: HBase
>  Issue Type: Improvement
>  Components: BucketCache
>Reporter: chenxu
>Assignee: chenxu
>Priority: Major
> Attachments: HBASE-22802-master-v1.patch, profile_mem_alloc.png, 
> profile_mem_alloc_with_pool.png
>
>
> a temp ByteBuffer was allocated each time FileIOEngine#read was called
> {code:java}
> public Cacheable read(BucketEntry be) throws IOException {
>   long offset = be.offset();
>   int length = be.getLength();
>   Preconditions.checkArgument(length >= 0, "Length of read can not be less 
> than 0.");
>   ByteBuffer dstBuffer = ByteBuffer.allocate(length);
>   ...
> }
> {code}
> we can avoid this by use of ByteBuffAllocator#allocate(length) after 
> HBASE-21879



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (HBASE-21879) Read HFile's block to ByteBuffer directly instead of to byte for reducing young gc purpose

2019-07-19 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1694#comment-1694
 ] 

ramkrishna.s.vasudevan commented on HBASE-21879:


OOO for personal reasons. No access to official emails during this period.


> Read HFile's block to ByteBuffer directly instead of to byte for reducing 
> young gc purpose
> --
>
> Key: HBASE-21879
> URL: https://issues.apache.org/jira/browse/HBASE-21879
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HBASE-21879.v1.patch, HBASE-21879.v1.patch, 
> QPS-latencies-before-HBASE-21879.png, gc-data-before-HBASE-21879.png
>
>
> In HFileBlock#readBlockDataInternal,  we have the following: 
> {code}
> @VisibleForTesting
> protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
> long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum, 
> boolean updateMetrics)
>  throws IOException {
>  // .
>   // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with 
> BBPool (offheap).
>   byte [] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize];
>   int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize,
>   onDiskSizeWithHeader - preReadHeaderSize, true, offset + 
> preReadHeaderSize, pread);
>   if (headerBuf != null) {
> // ...
>   }
>   // ...
>  }
> {code}
> In the read path, we still read the block from the HFile into an on-heap byte[], then 
> copy the on-heap byte[] to the offheap bucket cache asynchronously, and in my 
> 100% get performance test I also observed some frequent young GCs. The 
> largest memory footprint in the young gen should be the on-heap block byte[].
> In fact, we can read the HFile's block into a ByteBuffer directly instead of into a 
> byte[] to reduce young GC. We did not implement this before 
> because there was no ByteBuffer reading interface in the older HDFS client, but 2.7+ 
> supports this now, so we can fix this now, I think. 
> Will provide a patch and some perf comparison for this. 
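
As an aside, a minimal sketch of the ByteBuffer read path the description refers 
to (an illustration under the assumption of an HDFS 2.7+ client, not the 
HBASE-21879 patch itself; the path is made up):
{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ByteBufferReadSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    try (FSDataInputStream in = fs.open(new Path("/hbase/example-hfile"))) {
      ByteBuffer block = ByteBuffer.allocateDirect(64 * 1024);  // off-heap destination
      while (block.hasRemaining()) {
        // read(ByteBuffer) is available when the underlying stream supports
        // ByteBufferReadable (HDFS 2.7+ clients); it fills the buffer directly,
        // with no intermediate on-heap byte[].
        if (in.read(block) < 0) {
          break;  // EOF before the buffer was full
        }
      }
      block.flip();
      System.out.println("Read " + block.remaining() + " bytes into a direct buffer");
    }
  }
}
{code}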



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (HBASE-22670) JDK 11 and CellComparator

2019-07-11 Thread ramkrishna.s.vasudevan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-22670:
---
Labels: jdk11  (was: )

> JDK 11 and CellComparator
> -
>
> Key: HBASE-22670
> URL: https://issues.apache.org/jira/browse/HBASE-22670
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
>  Labels: jdk11
>
> This could act as a parent JIRA for analysing JDK 11 and the Comparator impls 
> that we have. 
> The latest JDK has support for SIMD and AVX512, which means it supports 
> vectorization.
> See JDK11's ArraysSupport#mismatch() and vectorizedMismatch().
> We also have BufferMismatch#mismatch(), which is not publicly exposed but 
> uses ArraysSupport#vectorizedMismatch(). 
> Internally vectorizedMismatch() does something similar to what 
> UnsafeComparator#compareToUnsafe does. I will add the details of the 
> study in further comments.
> The JDK also exposes new annotations like @HotSpotIntrinsicCandidate and 
> @ForceInline that help in inlining the intrinsic calls. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (HBASE-22670) JDK 11 and CellComparator

2019-07-10 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881909#comment-16881909
 ] 

ramkrishna.s.vasudevan commented on HBASE-22670:


Though we have ArraysSupport#mismatch(), the impl does some more steps than 
what we have in compareToUnsafe(). For example, ArraysSupport#vectorizedMismatch() does
{code}
long av = U.getLongUnaligned(a, aOffset + bi);
long bv = U.getLongUnaligned(b, bOffset + bi);
{code}
rather than doing a Unsafe#getLong() as in compareToUnsafe(). 
Also, the mismatch() API gives you the index where the mismatch happens, whereas 
compareToUnsafe() directly returns the comparator output.
ArraysSupport#mismatch() also tries an optimization of reading the 
first element and returning there on a mismatch, even before doing the 
getLongUnaligned(). I tried copying the API's impl into the BBUtils class, 
tried doing getLong() instead of getLongUnaligned(), and avoided the first-element 
read as done in ArraysSupport#mismatch().
The JMH results for a 27-byte row key, 3-byte family and a 4-byte qualifier, 
where the qualifier alone changes, for CellComparator#compare() with the 
compareToUnsafe() and mismatch() based impls are as follows.
With compareToUnsafe()
{code}
Comparator.arrayBBCompare  avgt   10  554.920 ±  2.085  ns/op
Comparator.arrayCompareavgt   10  494.358 ±  8.810  ns/op
Comparator.bbArrayCompare  avgt   10  539.219 ±  5.260  ns/op
Comparator.bbCompare   avgt   10  220.743 ± 11.723  ns/op
{code}

With ArraysSupport#mismatch() based impl
{code}
BenchmarkMode  Cnt ScoreError  Units
Comparator.arrayBBCompareavgt   10   511.787 ±  6.902  ns/op
Comparator.arrayCompare  avgt   10   440.026 ± 17.410  ns/op
Comparator.bbArrayCompareavgt   10   510.578 ±  1.209  ns/op
Comparator.bbCompare avgt   10   274.158 ±  1.975  ns/op
{code}

Basically we don't get a significant difference here. 
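
For context on the comparison above, here is a minimal sketch (my own 
illustration, not the HBase comparator or the JMH harness used here) of layering 
a byte-range compare on the public java.util.Arrays#mismatch() from JDK 9+, which 
delegates to ArraysSupport#vectorizedMismatch():
{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MismatchCompareSketch {
  static int compareTo(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
    int i = Arrays.mismatch(a, aOff, aOff + aLen, b, bOff, bOff + bLen);
    if (i < 0) {
      return 0;                        // ranges are identical
    }
    if (i >= Math.min(aLen, bLen)) {
      return aLen - bLen;              // one range is a prefix of the other
    }
    // First differing byte decides the order (unsigned comparison).
    return (a[aOff + i] & 0xff) - (b[bOff + i] & 0xff);
  }

  public static void main(String[] args) {
    byte[] k1 = "row-0001".getBytes(StandardCharsets.UTF_8);
    byte[] k2 = "row-0002".getBytes(StandardCharsets.UTF_8);
    System.out.println(compareTo(k1, 0, k1.length, k2, 0, k2.length)); // negative
  }
}
{code}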



> JDK 11 and CellComparator
> -
>
> Key: HBASE-22670
> URL: https://issues.apache.org/jira/browse/HBASE-22670
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
>
> This could act as a parent JIRA for analysing JDK 11 and the Comparator impls 
> that we have. 
> The latest JDK has support for SIMD and AVX512, which means it supports 
> vectorization.
> See JDK11's ArraysSupport#mismatch() and vectorizedMismatch().
> We also have BufferMismatch#mismatch(), which is not publicly exposed but 
> uses ArraysSupport#vectorizedMismatch(). 
> Internally vectorizedMismatch() does something similar to what 
> UnsafeComparator#compareToUnsafe does. I will add the details of the 
> study in further comments.
> The JDK also exposes new annotations like @HotSpotIntrinsicCandidate and 
> @ForceInline that help in inlining the intrinsic calls. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-22671) ByteBufferUtils#findCommonPrefix() can be safely changed to ArraysSupport#mismatch()

2019-07-10 Thread ramkrishna.s.vasudevan (JIRA)
ramkrishna.s.vasudevan created HBASE-22671:
--

 Summary: ByteBufferUtils#findCommonPrefix() can be safely changed 
to ArraysSupport#mismatch()
 Key: HBASE-22671
 URL: https://issues.apache.org/jira/browse/HBASE-22671
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan


Microbenchmarks reveal that finding the common prefix for encoders can safely 
be replaced with ArraysSupport#mismatch().
The microbenchmark just compares Cells that are backed by array and by BB. 
For a 27-byte common row prefix, the existing BBUtils#findCommonPrefix gives:
{code}
BenchmarkMode  CntScoreError  Units
PrefixComparator.arrayBBCompare  avgt   10  869.897 ±  9.429  ns/op
PrefixComparator.arrayCompareavgt   10  302.074 ± 13.448  ns/op
PrefixComparator.bbArrayCompare  avgt   10  869.369 ±  5.368  ns/op
PrefixComparator.bbCompare   avgt   10  409.479 ±  1.587  ns/op
{code}

The same with the ArraysSupport#mismatch() based change gives this:
{code}
BenchmarkMode  CntScore   Error  Units
PrefixComparator.arrayBBCompare  avgt   10  311.946 ± 1.902  ns/op
PrefixComparator.arrayCompareavgt   10  157.010 ± 4.482  ns/op
PrefixComparator.bbArrayCompare  avgt   10  311.568 ± 1.348  ns/op
PrefixComparator.bbCompare   avgt   10   92.540 ± 0.501  ns/op
{code}

However, note that this comes into play during flushes/compactions and not during 
the read path. 
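
For illustration, a standalone sketch of the idea (not the actual ByteBufferUtils 
change) using the public java.util.Arrays#mismatch(), which wraps 
ArraysSupport#vectorizedMismatch():
{code:java}
import java.util.Arrays;

public class CommonPrefixSketch {
  static int findCommonPrefix(byte[] left, int leftOff, int leftLen,
                              byte[] right, int rightOff, int rightLen) {
    int i = Arrays.mismatch(left, leftOff, leftOff + leftLen,
                            right, rightOff, rightOff + rightLen);
    // -1 means the ranges match entirely, i.e. the common prefix is the full (equal) length;
    // if one range is a proper prefix of the other, mismatch() already returns the shorter length.
    return i < 0 ? Math.min(leftLen, rightLen) : i;
  }

  public static void main(String[] args) {
    byte[] a = {1, 2, 3, 4, 5};
    byte[] b = {1, 2, 3, 9, 9};
    System.out.println(findCommonPrefix(a, 0, a.length, b, 0, b.length)); // prints 3
  }
}
{code}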



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-22670) JDK 11 and CellComparator

2019-07-10 Thread ramkrishna.s.vasudevan (JIRA)
ramkrishna.s.vasudevan created HBASE-22670:
--

 Summary: JDK 11 and CellComparator
 Key: HBASE-22670
 URL: https://issues.apache.org/jira/browse/HBASE-22670
 Project: HBase
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan


This could act as a parent JIRA for analysing JDK 11 and the Comparator impls 
that we have. 
The latest JDK has support for SIMD and AVX512, which means it supports 
vectorization.
See JDK11's ArraysSupport#mismatch() and vectorizedMismatch().
We also have BufferMismatch#mismatch(), which is not publicly exposed but 
uses ArraysSupport#vectorizedMismatch(). 
Internally vectorizedMismatch() does something similar to what 
UnsafeComparator#compareToUnsafe does. I will add the details of the study 
in further comments.

The JDK also exposes new annotations like @HotSpotIntrinsicCandidate and 
@ForceInline that help in inlining the intrinsic calls. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22608) MVCC's writeEntry didn't complete and make MVCC stuck

2019-06-20 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868280#comment-16868280
 ] 

ramkrishna.s.vasudevan commented on HBASE-22608:


Seems like a size accounting issue with in-memory compaction - due to some 
threading issue? Or some wrong accounting.

> MVCC's writeEntry didn't complete and make MVCC stuck
> -
>
> Key: HBASE-22608
> URL: https://issues.apache.org/jira/browse/HBASE-22608
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Priority: Critical
>
> {code:java}
> 2019-06-20,05:03:44,917 ERROR 
> [RpcServer.default.RWQ.Fifo.write.handler=61,queue=1,port=22600] 
> org.apache.hadoop.hbase.regionserver.HRegion: Asked to modify this region's 
> (xx,,1560481375170.10b01c12d58ce75c9aaf1ac15cc2a7f3.) memStoreSizing to a 
> negative value which is incorrect. Current memStoreSizing=-1686222, 
> delta=1489930
> java.lang.Exception
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.checkNegativeMemStoreDataSize(HRegion.java:1317)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.incMemStoreSize(HRegion.java:1295)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3316)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$MutationBatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3821)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4248)
> at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:4179)
> at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:4109)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:1059)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicBatchOp(RSRpcServices.java:991)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:954)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2833)
> at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42014)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:132)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
> {code}
> See 
> [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L3728]
> {code:java}
> @Override
> public WriteEntry writeMiniBatchOperationsToMemStore(
> final MiniBatchOperationInProgress miniBatchOp, @Nullable 
> WriteEntry writeEntry)
> throws IOException {
>   if (writeEntry == null) {
> writeEntry = region.mvcc.begin();
>   }
>   super.writeMiniBatchOperationsToMemStore(miniBatchOp, 
> writeEntry.getWriteNumber());
>   return writeEntry;
> }
> {code}
> super.writeMiniBatchOperationsToMemStore throws an exception, the new 
> writeEntry cannot be completed, and that makes the MVCC stuck.
>  
> And we met this problem when enabling in-memory compaction. But that should be 
> another issue and needs more digging.
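
To illustrate why an uncompleted writeEntry stalls the MVCC read point, here is a 
standalone toy example (a hypothetical MVCC-like class, not HBase code and not 
the eventual fix):
{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

class MiniMvcc {
  static final class WriteEntry {
    final long writeNumber;
    boolean completed;
    WriteEntry(long n) { writeNumber = n; }
  }

  private long nextWriteNumber = 1;
  private long readPoint = 0;
  private final Deque<WriteEntry> queue = new ArrayDeque<>();

  synchronized WriteEntry begin() {
    WriteEntry e = new WriteEntry(nextWriteNumber++);
    queue.addLast(e);
    return e;
  }

  synchronized void complete(WriteEntry e) {
    e.completed = true;
    // The read point only advances over a contiguous prefix of completed entries,
    // so a dangling (never-completed) entry blocks it forever.
    while (!queue.isEmpty() && queue.peekFirst().completed) {
      readPoint = queue.removeFirst().writeNumber;
    }
  }

  synchronized long getReadPoint() { return readPoint; }

  public static void main(String[] args) {
    MiniMvcc mvcc = new MiniMvcc();
    WriteEntry we = mvcc.begin();
    try {
      throw new RuntimeException("simulated failure in writeMiniBatchOperationsToMemStore");
    } catch (RuntimeException e) {
      mvcc.complete(we);          // without this, getReadPoint() would stay at 0 forever
    }
    System.out.println(mvcc.getReadPoint()); // prints 1
  }
}
{code}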



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-22602) Allow storage policy to be set per column family in PE tool

2019-06-19 Thread ramkrishna.s.vasudevan (JIRA)
ramkrishna.s.vasudevan created HBASE-22602:
--

 Summary: Allow storage policy to be set per column family in PE 
tool
 Key: HBASE-22602
 URL: https://issues.apache.org/jira/browse/HBASE-22602
 Project: HBase
  Issue Type: Improvement
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 2.3.0


Currently PE tool does not have support for per column family storage policy 
support. This JIRA is aimed to add that support.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22539) Potential WAL corruption due to Unsafe.copyMemory usage when DBB are in place

2019-06-05 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856391#comment-16856391
 ] 

ramkrishna.s.vasudevan commented on HBASE-22539:


[~wchevreuil]
Have you verified that all the calls that come to 
ByteBufferWriterStream#write() have a len that is always less than the buffSize? 
Because if there is something wrong there, then the sanity code that you have, 
where you directly read from the ByteBuffer 'b' into testBuf, will work fine but 
not the other one. 
I think by default SimpleRpcServer also uses a pool, and that is also offheap. 
This is quite difficult to dig into - great work.

> Potential WAL corruption due to Unsafe.copyMemory usage when DBB are in place
> -
>
> Key: HBASE-22539
> URL: https://issues.apache.org/jira/browse/HBASE-22539
> Project: HBase
>  Issue Type: Bug
>  Components: rpc, wal
>Affects Versions: 2.1.1
>Reporter: Wellington Chevreuil
>Priority: Blocker
>
> Summary
> We had been chasing a WAL corruption issue reported on one of our customers' 
> deployments running release 2.1.1 (CDH 6.1.0). After providing a custom 
> modified jar with the extra sanity checks implemented by HBASE-21401 applied 
> on some code points, plus additional debugging messages, we believe it is 
> related to DirectByteBuffer usage, and Unsafe copy from offheap memory to 
> on-heap array triggered 
> [here|https://github.com/apache/hbase/blob/branch-2.1/hbase-common/src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java#L1157],
>  such as when writing into a non ByteBufferWriter type, as done 
> [here|https://github.com/apache/hbase/blob/branch-2.1/hbase-common/src/main/java/org/apache/hadoop/hbase/io/ByteBufferWriterOutputStream.java#L84].
> More details on the following comment.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22532) There's still too much cpu wasting on validating checksum even if buffer.size=65KB

2019-06-03 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855300#comment-16855300
 ] 

ramkrishna.s.vasudevan commented on HBASE-22532:


Probably we should look at the size of each block getting written and, while reading, 
see what offsets and lengths we pass to HDFS, and then ascertain whether that 
matches the dataLength you got here. Probably we are reading more (approx. 
2 blocks). Good one [~openinx].

> There's still too much cpu wasting on validating checksum even if 
> buffer.size=65KB
> --
>
> Key: HBASE-22532
> URL: https://issues.apache.org/jira/browse/HBASE-22532
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Attachments: async-prof-pid-27827-cpu-3.svg, 
> async-prof-pid-64695-cpu-1.svg
>
>
> After disabled the block cache, and with the following config: 
> {code}
> # Disable the block cache
> hfile.block.cache.size=0
> hbase.ipc.server.allocator.buffer.size=66560
> hbase.ipc.server.reservoir.minimal.allocating.size=0
> {code}
> The ByteBuff for the block is expected to be a SingleByteBuff, which will 
> use the hadoop native lib to validate the checksum, while in the cpu flame 
> graph 
> [async-prof-pid-27827-cpu-3.svg|https://issues.apache.org/jira/secure/attachment/12970683/async-prof-pid-27827-cpu-3.svg],
>  we can still see that about 32% of the CPU is wasted on PureJavaCrc32#update, which 
> means it's not using the faster hadoop native lib.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22531) The HFileReaderImpl#shouldUseHeap return the incorrect true when disabled BlockCache

2019-06-03 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854469#comment-16854469
 ] 

ramkrishna.s.vasudevan commented on HBASE-22531:


Nice one. +1.

> The HFileReaderImpl#shouldUseHeap return the incorrect true when disabled 
> BlockCache 
> -
>
> Key: HBASE-22531
> URL: https://issues.apache.org/jira/browse/HBASE-22531
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Attachments: HBASE-22531.HBASE-21879.v1.patch, 
> async-prof-pid-13311-alloc-4.svg
>
>
> I'm running a benchmark with the block cache disabled for the HBASE-21879 branch. 
> Just curious about why there are still so many heap allocations in the heap allocation 
> flame graph [async-prof-pid-13311-alloc-4.svg | 
> https://issues.apache.org/jira/secure/attachment/12970648/async-prof-pid-13311-alloc-4.svg],
> actually, I've set the following config, which means all allocations should 
> be offheap, while they're not: 
> {code}
> # Disable the block cache
> hfile.block.cache.size=0
> hbase.ipc.server.reservoir.minimal.allocating.size=0   # Let all allocation 
> from pooled allocator. 
> {code}
> Checked the code,  I found the problem here: 
> {code}
>   private boolean shouldUseHeap(BlockType expectedBlockType) {
> if (cacheConf.getBlockCache() == null) {
>   return false;
> } else if (!cacheConf.isCombinedBlockCache()) {
>   // Block to cache in LruBlockCache must be an heap one. So just 
> allocate block memory from
>   // heap for saving an extra off-heap to heap copying.
>   return true;
> }
> return expectedBlockType != null && !expectedBlockType.isData();
>   }
> {code}
> Say, CacheConfig#getBlockCache will return an Optional<BlockCache>, 
> which is always non-null: 
> {code}
>   /**
>* Returns the block cache.
>*
>* @return the block cache, or null if caching is completely disabled
>*/
> public Optional<BlockCache> getBlockCache() {
> return Optional.ofNullable(this.blockCache);
>   }
> {code}
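
For illustration, a standalone toy example of the pitfall the description points 
at (hypothetical names, not the actual CacheConfig fix): an Optional-returning 
getter is itself never null, so a null check always takes the "cache present" 
branch:
{code:java}
import java.util.Optional;

public class OptionalCheckSketch {
  // Stand-in for CacheConfig#getBlockCache(): empty when caching is disabled.
  static Optional<String> getBlockCache(boolean cacheDisabled) {
    return cacheDisabled ? Optional.empty() : Optional.of("LruBlockCache");
  }

  static boolean shouldUseHeapBuggy(boolean cacheDisabled) {
    // The Optional itself is never null, even when empty, so this never returns false.
    return getBlockCache(cacheDisabled) == null ? false : true;
  }

  static boolean shouldUseHeapFixed(boolean cacheDisabled) {
    // Check presence of the value, not nullity of the Optional.
    return getBlockCache(cacheDisabled).isPresent();
  }

  public static void main(String[] args) {
    System.out.println(shouldUseHeapBuggy(true));  // true  (wrong: cache is disabled)
    System.out.println(shouldUseHeapFixed(true));  // false (expected)
  }
}
{code}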



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22483) Maybe it's better to use 65KB as the default buffer size in ByteBuffAllocator

2019-05-31 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16852789#comment-16852789
 ] 

ramkrishna.s.vasudevan commented on HBASE-22483:


Excellent results!! Seems the QPS is more stable and all the p9X latencies are also 
stable. 

> Maybe it's better to use 65KB as the default buffer size in ByteBuffAllocator
> -
>
> Key: HBASE-22483
> URL: https://issues.apache.org/jira/browse/HBASE-22483
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Attachments: 121240.stack, BucketCacheWriter-is-busy.png, 
> checksum-stacktrace.png, with-buffer-size-64KB.png, with-buffer-size-65KB.png
>
>
> There are some reasons why it's better to choose 65KB as the default buffer 
> size: 
> 1. Almost all of the data blocks have a block size of 64KB + delta, where delta 
> is very small and depends on the size of the lastKeyValue. If we use the default 
> hbase.ipc.server.allocator.buffer.size=64KB, then each block will be 
> allocated as a MultiByteBuff: one 64KB DirectByteBuffer plus delta bytes in a 
> HeapByteBuffer, and the HeapByteBuffer will increase the GC pressure. Ideally, we 
> should let the data block be allocated as a SingleByteBuff; it has a simpler 
> data structure, faster access speed, less heap usage... 
> 2. In my benchmark, I found some checksum stack traces (see 
> [checksum-stacktrace.png 
> |https://issues.apache.org/jira/secure/attachment/12969905/checksum-stacktrace.png]).
>  
>  Since the block is a MultiByteBuff, we have to calculate the checksum via 
> a temp heap copy (see HBASE-21917), while if it's a SingleByteBuff, we 
> can speed up the checksum by calling the hadoop checksum in the native lib, which is 
> faster.
> 3. It seems the BucketCacheWriters were always busy because of the higher cost 
> of copying from a MultiByteBuff to a DirectByteBuffer. For a SingleByteBuff, we 
> can just use unsafe array copying, while for a MultiByteBuff we have to copy 
> byte by byte.
> Anyway, I will give a benchmark for this. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache

2019-05-28 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16849899#comment-16849899
 ] 

ramkrishna.s.vasudevan commented on HBASE-22422:


[~openinx]
I just asked a question in the PR. 

> Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
> 
>
> Key: HBASE-22422
> URL: https://issues.apache.org/jira/browse/HBASE-22422
> Project: HBase
>  Issue Type: Sub-task
>  Components: BlockCache
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Attachments: 0001-debug2.patch, 0001-debug2.patch, 0001-debug2.patch, 
> 0001-debug3.patch, 0001-debug4.patch, 
> HBASE-22422-qps-after-fix-the-zero-retain-bug.png, 
> HBASE-22422.HBASE-21879.v01.patch, HBASE-22422.HBASE-21879.v02.patch, 
> LRUBlockCache-getBlock.png, debug.patch, 
> failed-to-check-positive-on-web-ui.png, image-2019-05-15-12-00-03-641.png
>
>
> After running a YCSB scan/get benchmark in our XiaoMi cluster, we found the get 
> QPS dropped from 25000/s to hundreds per second in a cluster with five 
> nodes.  
> After enabling the debug log at the YCSB client side, I found the following 
> stacktrace, see 
> https://issues.apache.org/jira/secure/attachment/12968745/image-2019-05-15-12-00-03-641.png.
>  
> After looking into the stacktrace, I can confirm that the zero refCnt block is 
> an intermediate index block, see [2] http://hbase.apache.org/images/hfilev2.png
> Need a patch to fix this. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22480) Get block from BlockCache once and return this block to BlockCache twice make ref count error.

2019-05-27 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16849319#comment-16849319
 ] 

ramkrishna.s.vasudevan commented on HBASE-22480:


[~aoxiang]
So here there is no negative ref counting, right? As per the v2 patch, maybe we 
should add them to prevBlocks and then return them? 

> Get block from BlockCache once and return this block to BlockCache twice make 
> ref count error.
> --
>
> Key: HBASE-22480
> URL: https://issues.apache.org/jira/browse/HBASE-22480
> Project: HBase
>  Issue Type: Sub-task
>Reporter: binlijin
>Assignee: binlijin
>Priority: Major
> Attachments: HBASE-22480-master-v1.patch, HBASE-22480-master-v2.patch
>
>
> After debugging HBASE-22433, I found that the problem is that we get a block 
> from the BucketCache once and return this block to the BucketCache twice, making the 
> ref count wrong; sometimes the refCount can be negative.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22480) Get block from BlockCache once and return this block to BlockCache twice make ref count error.

2019-05-27 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16849089#comment-16849089
 ] 

ramkrishna.s.vasudevan commented on HBASE-22480:


So here it means that in this code:
{code}
 HFileBlock seekToBlock = reader.getDataBlockIndexReader().seekToDataBlock(key, 
curBlock,
  cacheBlocks, pread, isCompaction, 
reader.getEffectiveEncodingInCache(isCompaction));
  if (seekToBlock == null) {
return false;
  }
{code}
the seekToDataBlock() call returns the same curBlock, and so you want to prevent the 
curBlock from being returned? But just after this the curBlock is anyway updated with a 
new block, right? 

> Get block from BlockCache once and return this block to BlockCache twice make 
> ref count error.
> --
>
> Key: HBASE-22480
> URL: https://issues.apache.org/jira/browse/HBASE-22480
> Project: HBase
>  Issue Type: Sub-task
>Reporter: binlijin
>Assignee: binlijin
>Priority: Major
> Attachments: HBASE-22480-master-v1.patch, HBASE-22480-master-v2.patch
>
>
> After debugging HBASE-22433, I found that the problem is that we get a block 
> from the BucketCache once and return this block to the BucketCache twice, making the 
> ref count wrong; sometimes the refCount can be negative.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22422) Retain an ByteBuff with refCnt=0 when getBlock from LRUCache

2019-05-23 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847259#comment-16847259
 ] 

ramkrishna.s.vasudevan commented on HBASE-22422:


bq. Understand now, it's a concurrent bug in RAMCache, say if thread1 tries to 
getBlock as follows: 
Good one. 

> Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
> 
>
> Key: HBASE-22422
> URL: https://issues.apache.org/jira/browse/HBASE-22422
> Project: HBase
>  Issue Type: Sub-task
>  Components: BlockCache
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Attachments: 0001-debug2.patch, 0001-debug2.patch, 0001-debug2.patch, 
> 0001-debug3.patch, 0001-debug4.patch, HBASE-22422.HBASE-21879.v01.patch, 
> LRUBlockCache-getBlock.png, debug.patch, 
> failed-to-check-positive-on-web-ui.png, image-2019-05-15-12-00-03-641.png
>
>
> After running a YCSB scan/get benchmark in our XiaoMi cluster, we found the get 
> QPS dropped from 25000/s to hundreds per second in a cluster with five 
> nodes.  
> After enabling the debug log at the YCSB client side, I found the following 
> stacktrace, see 
> https://issues.apache.org/jira/secure/attachment/12968745/image-2019-05-15-12-00-03-641.png.
>  
> After looking into the stacktrace, I can confirm that the zero refCnt block is 
> an intermediate index block, see [2] http://hbase.apache.org/images/hfilev2.png
> Need a patch to fix this. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22448) Scan is slow for Multiple Column prefixes

2019-05-22 Thread ramkrishna.s.vasudevan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-22448:
---
Attachment: org.apache.hadoop.hbase.filter.TestSlowColumnPrefix-output.zip

> Scan is slow for Multiple Column prefixes
> -
>
> Key: HBASE-22448
> URL: https://issues.apache.org/jira/browse/HBASE-22448
> Project: HBase
>  Issue Type: Bug
>  Components: Scanners
>Affects Versions: 1.4.8, 1.4.9
>Reporter: Karthick
>Assignee: Zheng Hu
>Priority: Critical
>  Labels: prefix, scan, scanner
> Fix For: 1.5.0, 1.4.10
>
> Attachments: 0001-benchmark-UT.patch, HBaseFileImport.java, 
> org.apache.hadoop.hbase.filter.TestSlowColumnPrefix-output.zip, 
> qualifiers.txt, scanquery.txt
>
>
> While scanning a row (around 10 lakh, i.e. about a million, columns) with 100 
> column prefixes, it takes around 4 seconds in hbase-1.2.5, and when the same 
> query is executed in hbase-1.4.9 it takes around 50 seconds.
> Is there any way to optimise this?
>  
> *P.S:*
> We have applied the patch provided in 
> [-HBASE-21620-|https://jira.apache.org/jira/browse/HBASE-21620] and  
> [-HBASE-21734-|https://jira.apache.org/jira/browse/HBASE-21734] . Attached 
> *qualifiers*.*txt* file which contains the column keys. Use the 
> *HBaseFileImport.java* file provided to populate in your table and use 
> *scanquery.txt* to query.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-22448) Scan is slow for Multiple Column prefixes

2019-05-22 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845781#comment-16845781
 ] 

ramkrishna.s.vasudevan edited comment on HBASE-22448 at 5/22/19 11:22 AM:
--

Attached the output with some sysouts. It seems we are doing a lot of 
SEEK_USING_HINTS for every column that we already visited, for each of the 
cells. And this goes on for every column.


was (Author: ram_krish):
Attached the output with some sysouts. It seems we are doing a lot of 
SEEK_USING_HINTS for every column that we already visited, for each of the 
cells. And this goes on. 

> Scan is slow for Multiple Column prefixes
> -
>
> Key: HBASE-22448
> URL: https://issues.apache.org/jira/browse/HBASE-22448
> Project: HBase
>  Issue Type: Bug
>  Components: Scanners
>Affects Versions: 1.4.8, 1.4.9
>Reporter: Karthick
>Assignee: Zheng Hu
>Priority: Critical
>  Labels: prefix, scan, scanner
> Fix For: 1.5.0, 1.4.10
>
> Attachments: 0001-benchmark-UT.patch, HBaseFileImport.java, 
> qualifiers.txt, scanquery.txt
>
>
> While scanning a row (around 10 lakh, i.e. 1 million, columns) with 100 column 
> prefixes, it takes around 4 seconds in hbase-1.2.5, but when the same query is 
> executed in hbase-1.4.9 it takes around 50 seconds.
> Is there any way to optimise this?
>  
> *P.S:*
> We have applied the patches provided in 
> [-HBASE-21620-|https://jira.apache.org/jira/browse/HBASE-21620] and 
> [-HBASE-21734-|https://jira.apache.org/jira/browse/HBASE-21734]. The attached 
> *qualifiers.txt* file contains the column keys. Use the 
> *HBaseFileImport.java* file provided to populate your table and use 
> *scanquery.txt* to query.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22448) Scan is slow for Multiple Column prefixes

2019-05-22 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845781#comment-16845781
 ] 

ramkrishna.s.vasudevan commented on HBASE-22448:


Attached the output with some sysouts. It seems we are doing a lot of 
SEEK_USING_HINTS for every column that we have already visited, once for each of 
the cells. And this goes on. 
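
For context, a minimal sketch of the kind of single-row, many-prefix scan this 
issue describes (the table, row and prefix values are placeholders, and the API is 
shown roughly as in the 1.4-era client):

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiPrefixScanSketch {
  /** Scans one wide row with many column prefixes; each ColumnPrefixFilter answers
   *  with a seek hint, which is where the repeated SEEK_USING_HINTS show up. */
  static long scanWithPrefixes(Table table, byte[] row, List<String> prefixes) throws Exception {
    List<Filter> filters = new ArrayList<>();
    for (String p : prefixes) {
      filters.add(new ColumnPrefixFilter(Bytes.toBytes(p)));
    }
    // MUST_PASS_ONE: a cell is returned if any one of the prefixes matches.
    Scan scan = new Scan(row);
    scan.setFilter(new FilterList(FilterList.Operator.MUST_PASS_ONE, filters));
    long cells = 0;
    try (ResultScanner scanner = table.getScanner(scan)) {
      for (Result r : scanner) {
        cells += r.rawCells().length;
        break; // only the single wide row is of interest here
      }
    }
    return cells;
  }
}
{code}

Timing this call on 1.2.5 vs 1.4.9 should show the 4 second vs 50 second gap 
reported in the description.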

> Scan is slow for Multiple Column prefixes
> -
>
> Key: HBASE-22448
> URL: https://issues.apache.org/jira/browse/HBASE-22448
> Project: HBase
>  Issue Type: Bug
>  Components: Scanners
>Affects Versions: 1.4.8, 1.4.9
>Reporter: Karthick
>Assignee: Zheng Hu
>Priority: Critical
>  Labels: prefix, scan, scanner
> Fix For: 1.5.0, 1.4.10
>
> Attachments: 0001-benchmark-UT.patch, HBaseFileImport.java, 
> qualifiers.txt, scanquery.txt
>
>
> While scanning a row (around 10 lakh, i.e. 1 million, columns) with 100 column 
> prefixes, it takes around 4 seconds in hbase-1.2.5, but when the same query is 
> executed in hbase-1.4.9 it takes around 50 seconds.
> Is there any way to optimise this?
>  
> *P.S:*
> We have applied the patches provided in 
> [-HBASE-21620-|https://jira.apache.org/jira/browse/HBASE-21620] and 
> [-HBASE-21734-|https://jira.apache.org/jira/browse/HBASE-21734]. The attached 
> *qualifiers.txt* file contains the column keys. Use the 
> *HBaseFileImport.java* file provided to populate your table and use 
> *scanquery.txt* to query.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22448) Scan is slow for Multiple Column prefixes

2019-05-22 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845592#comment-16845592
 ] 

ramkrishna.s.vasudevan commented on HBASE-22448:


Seems so. Previously only the list of ColumnPrefixFilters was doing a comparison 
on the qualifier and prefix, but now, after that, we again seem to do more 
comparisons, particularly when the prefixes added to the filter list are not 
sorted. [~openinx] is that correct?
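
If the unsorted prefixes are indeed the trigger, one low-effort experiment (an 
assumption here, not something verified in this issue) is to hand the prefixes to 
the FilterList in byte-sorted order:

{code}
import java.util.List;
import java.util.TreeSet;
import org.apache.hadoop.hbase.util.Bytes;

public class SortedPrefixes {
  /** Returns the prefixes in byte-wise sorted order. Only an illustration of the
   *  "not sorted" remark above, not a verified workaround. */
  static TreeSet<byte[]> sortPrefixes(List<String> prefixes) {
    TreeSet<byte[]> sorted = new TreeSet<>(Bytes.BYTES_COMPARATOR);
    for (String p : prefixes) {
      sorted.add(Bytes.toBytes(p));
    }
    return sorted; // iterate this set when building the ColumnPrefixFilter list
  }
}
{code}

Whether that actually avoids the extra comparisons would still need a benchmark 
run to confirm.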

> Scan is slow for Multiple Column prefixes
> -
>
> Key: HBASE-22448
> URL: https://issues.apache.org/jira/browse/HBASE-22448
> Project: HBase
>  Issue Type: Bug
>  Components: Scanners
>Affects Versions: 1.4.8, 1.4.9
>Reporter: Karthick
>Assignee: Zheng Hu
>Priority: Critical
>  Labels: prefix, scan, scanner
> Fix For: 1.5.0, 1.4.10
>
> Attachments: 0001-benchmark-UT.patch, HBaseFileImport.java, 
> qualifiers.txt, scanquery.txt
>
>
> While scanning a row (around 10 lakh, i.e. 1 million, columns) with 100 column 
> prefixes, it takes around 4 seconds in hbase-1.2.5, but when the same query is 
> executed in hbase-1.4.9 it takes around 50 seconds.
> Is there any way to optimise this?
>  
> *P.S:*
> We have applied the patches provided in 
> [-HBASE-21620-|https://jira.apache.org/jira/browse/HBASE-21620] and 
> [-HBASE-21734-|https://jira.apache.org/jira/browse/HBASE-21734]. The attached 
> *qualifiers.txt* file contains the column keys. Use the 
> *HBaseFileImport.java* file provided to populate your table and use 
> *scanquery.txt* to query.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery

2019-05-21 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845506#comment-16845506
 ] 

ramkrishna.s.vasudevan commented on HBASE-22072:


[~lhofhansl]
The specific problem here was happening in the 2.0 branches due to close() being 
called twice - once during the scan and once more during the shipped() call that 
releases the block ref in the block cache. In 1.3 that problem does not exist as 
far as I can see, and the test case also did not fail. Do you see any other 
potential issue? I can help here.

> High read/write intensive regions may cause long crash recovery
> ---
>
> Key: HBASE-22072
> URL: https://issues.apache.org/jira/browse/HBASE-22072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, Recovery
>Affects Versions: 2.0.0
>Reporter: Pavel
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
>  Labels: compaction
> Fix For: 2.2.0, 2.3.0, 2.0.6, 2.1.5
>
> Attachments: HBASE-22072.HBASE-21879-v1.patch
>
>
> Compaction of a region under high read load may leave compacted files 
> undeleted because of existing scan references:
> INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted 
> file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file 
> has reference, isReferencedInReads=true, refCount=1, skipping for now
> If the region is also under high write load this happens quite often, and the 
> region may have few storefiles but tons of undeleted compacted hdfs files.
> The region keeps all those files (in my case thousands) until the graceful 
> region closing procedure, which ignores existing references and drops obsolete 
> files. This works fine apart from consuming some extra hdfs space, but only in 
> the case of normal region closing. If the region server crashes, then the new 
> region server responsible for that overfilled region reads the hdfs folder and 
> tries to deal with all the undeleted files, producing tons of storefiles and 
> compaction tasks and consuming an abnormal amount of memory, which may lead to 
> an OutOfMemory exception and further region server crashes. This stops writes 
> to the region because the number of storefiles reaches the 
> *hbase.hstore.blockingStoreFiles* limit, forces high GC duty and may take 
> hours to compact all files into a working set of files.
> A workaround is to periodically check the hdfs folder file counts and force 
> region reassignment for the ones with too many files.
> It would be nice if the regionserver had a setting similar to 
> hbase.hstore.blockingStoreFiles that triggers an attempt to drop undeleted 
> compacted files once the number of files reaches this setting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22433) Corrupt hfile data

2019-05-20 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843720#comment-16843720
 ] 

ramkrishna.s.vasudevan commented on HBASE-22433:


Is this somehow related to 
https://issues.apache.org/jira/browse/HBASE-19511? 
There we forced an on-heap copy to avoid this ref counting issue. Seems the code 
has changed a lot since then. 

> Corrupt hfile data
> --
>
> Key: HBASE-22433
> URL: https://issues.apache.org/jira/browse/HBASE-22433
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: binlijin
>Priority: Critical
>
> We are using version 2.2.0 and encountered corrupt cell data.
> {code}
> 2019-05-15 22:53:59,354 ERROR 
> [regionserver/hb-mbasedata-14:16020-longCompactions-1557048533421] 
> regionserver.CompactSplit: Compaction failed 
> region=mktdm_id_src,9990,1557681762973.255e9adde013e370deb595c59a7285c3., 
> storeName=o, priority=196, startTime=1557931927314
> java.lang.IllegalStateException: Invalid currKeyLen 1700752997 or 
> currValueLen 2002739568. Block offset: 70452918, block length: 66556, 
> position: 42364 (without header).
>  at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.checkKeyValueLen(HFileReaderImpl.java:1182)
>  at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readKeyValueLen(HFileReaderImpl.java:628)
>  at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl._next(HFileReaderImpl.java:1080)
>  at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.next(HFileReaderImpl.java:1097)
>  at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:208)
>  at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:120)
>  at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:644)
>  at 
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:386)
>  at 
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:326)
>  at 
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:65)
>  at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:126)
>  at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1429)
>  at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2231)
>  at 
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:629)
>  at 
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:671)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  
> 2019-05-15 23:14:24,143 ERROR 
> [regionserver/hb-mbasedata-14:16020-longCompactions-1557048533422] 
> regionserver.CompactSplit: Compaction failed 
> region=mktdm_id_src,9fdee4,1557681762973.1782aebb83eae551e7bdfc2bfa13eb3d., 
> storeName=o, priority=194, startTime=1557932726849
> java.lang.RuntimeException: Unknown code 98
>  at org.apache.hadoop.hbase.KeyValue$Type.codeToType(KeyValue.java:274)
>  at org.apache.hadoop.hbase.CellUtil.getCellKeyAsString(CellUtil.java:1307)
>  at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.getMidpoint(HFileWriterImpl.java:383)
>  at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishBlock(HFileWriterImpl.java:343)
>  at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:603)
>  at 
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:376)
>  at 
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.abortWriter(DefaultCompactor.java:98)
>  at 
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.abortWriter(DefaultCompactor.java:42)
>  at 
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:335)
>  at 
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:65)
>  at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:126)
>  at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1429)
>  at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2231)
>  at 
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:629)
>  at 
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:671)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at 

[jira] [Commented] (HBASE-22412) Improve the metrics in ByteBuffAllocator

2019-05-20 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843679#comment-16843679
 ] 

ramkrishna.s.vasudevan commented on HBASE-22412:


[~openinx]
Patch looks good to me. So what was the motivation for doing this? From the example 
case you quoted in the description - you want to show that the actual bytes 
allocated on heap are much smaller than what the allocation count alone suggests?

> Improve the metrics in ByteBuffAllocator
> 
>
> Key: HBASE-22412
> URL: https://issues.apache.org/jira/browse/HBASE-22412
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Attachments: HBASE-22412.HBASE-21879.v1.patch, 
> HBASE-22412.HBASE-21879.v2.patch, HBASE-22412.HBASE-21879.v3.patch, JMX.png, 
> web-UI.png
>
>
> Address the comment in HBASE-22387: 
> bq. The ByteBuffAllocator#getFreeBufferCount will be O(N) complexity, because 
> the buffers here is a ConcurrentLinkedQueue. It's worth filing an issue for 
> this.
> Also I think we should use the allocated bytes instead of the allocation count 
> to evaluate the heap allocation percentage, so that we can decide whether the 
> ByteBuffer is too small and whether we will have higher GC pressure. Assume the 
> case: the buffer size is 64KB, and each time we have a block of 65KB; then each 
> request makes one heap allocation (1KB) and one pool allocation (64KB). If we 
> only consider the allocation count, the heap allocation ratio will be 1 / 
> (1 + 1) = 50%, but if we consider the allocated bytes, the ratio 
> will be 1KB / 65KB = 1.5%.
> If the heap allocation percentage is less than 
> hbase.ipc.server.reservoir.minimal.allocating.size / 
> hbase.ipc.server.allocator.buffer.size, then the allocator works fine; 
> otherwise it's overloaded. 
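
As a side note, the two ways of computing the ratio from the 64KB/65KB example 
above work out as follows (plain arithmetic, no HBase APIs involved):

{code}
public class AllocRatioExample {
  public static void main(String[] args) {
    long bufferSize = 64 * 1024;              // pool buffer size: 64KB
    long blockSize  = 65 * 1024;              // each request needs 65KB
    long heapBytes  = blockSize - bufferSize; // 1KB spills to the heap
    long poolBytes  = bufferSize;             // 64KB comes from the pool

    // Count-based ratio: one heap allocation and one pool allocation per request.
    double byCount = 1.0 / (1 + 1);                                // 0.5   -> 50%
    // Bytes-based ratio: only the spilled bytes count against the heap.
    double byBytes = (double) heapBytes / (heapBytes + poolBytes); // ~0.015 -> ~1.5%

    System.out.printf("by count: %.1f%%, by bytes: %.1f%%%n", byCount * 100, byBytes * 100);
  }
}
{code}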



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery

2019-05-09 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836257#comment-16836257
 ] 

ramkrishna.s.vasudevan commented on HBASE-22072:


Verified in the branch-1 series. This issue does not exist there, because we don't 
have the shipped() call and there is only one version of close(). When it sees 
'closing' as true, updateReaders() does not proceed with updating the various 
scanners in the StoreScanner. The test attached in this patch does not fail in 
branch-1.

> High read/write intensive regions may cause long crash recovery
> ---
>
> Key: HBASE-22072
> URL: https://issues.apache.org/jira/browse/HBASE-22072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, Recovery
>Affects Versions: 2.1.2
>Reporter: Pavel
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
>  Labels: compaction
> Attachments: HBASE-22072.HBASE-21879-v1.patch
>
>
> Compaction of a region under high read load may leave compacted files 
> undeleted because of existing scan references:
> INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted 
> file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file 
> has reference, isReferencedInReads=true, refCount=1, skipping for now
> If the region is also under high write load this happens quite often, and the 
> region may have few storefiles but tons of undeleted compacted hdfs files.
> The region keeps all those files (in my case thousands) until the graceful 
> region closing procedure, which ignores existing references and drops obsolete 
> files. This works fine apart from consuming some extra hdfs space, but only in 
> the case of normal region closing. If the region server crashes, then the new 
> region server responsible for that overfilled region reads the hdfs folder and 
> tries to deal with all the undeleted files, producing tons of storefiles and 
> compaction tasks and consuming an abnormal amount of memory, which may lead to 
> an OutOfMemory exception and further region server crashes. This stops writes 
> to the region because the number of storefiles reaches the 
> *hbase.hstore.blockingStoreFiles* limit, forces high GC duty and may take 
> hours to compact all files into a working set of files.
> A workaround is to periodically check the hdfs folder file counts and force 
> region reassignment for the ones with too many files.
> It would be nice if the regionserver had a setting similar to 
> hbase.hstore.blockingStoreFiles that triggers an attempt to drop undeleted 
> compacted files once the number of files reaches this setting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22072) High read/write intensive regions may cause long crash recovery

2019-05-09 Thread ramkrishna.s.vasudevan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-22072:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.1.5
   2.0.6
   2.3.0
   2.2.0
   Status: Resolved  (was: Patch Available)

> High read/write intensive regions may cause long crash recovery
> ---
>
> Key: HBASE-22072
> URL: https://issues.apache.org/jira/browse/HBASE-22072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, Recovery
>Affects Versions: 2.1.2
>Reporter: Pavel
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
>  Labels: compaction
> Fix For: 2.2.0, 2.3.0, 2.0.6, 2.1.5
>
> Attachments: HBASE-22072.HBASE-21879-v1.patch
>
>
> Compaction of a region under high read load may leave compacted files 
> undeleted because of existing scan references:
> INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted 
> file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file 
> has reference, isReferencedInReads=true, refCount=1, skipping for now
> If the region is also under high write load this happens quite often, and the 
> region may have few storefiles but tons of undeleted compacted hdfs files.
> The region keeps all those files (in my case thousands) until the graceful 
> region closing procedure, which ignores existing references and drops obsolete 
> files. This works fine apart from consuming some extra hdfs space, but only in 
> the case of normal region closing. If the region server crashes, then the new 
> region server responsible for that overfilled region reads the hdfs folder and 
> tries to deal with all the undeleted files, producing tons of storefiles and 
> compaction tasks and consuming an abnormal amount of memory, which may lead to 
> an OutOfMemory exception and further region server crashes. This stops writes 
> to the region because the number of storefiles reaches the 
> *hbase.hstore.blockingStoreFiles* limit, forces high GC duty and may take 
> hours to compact all files into a working set of files.
> A workaround is to periodically check the hdfs folder file counts and force 
> region reassignment for the ones with too many files.
> It would be nice if the regionserver had a setting similar to 
> hbase.hstore.blockingStoreFiles that triggers an attempt to drop undeleted 
> compacted files once the number of files reaches this setting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery

2019-05-08 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835679#comment-16835679
 ] 

ramkrishna.s.vasudevan commented on HBASE-22072:


Pushed to all the branch-2 lines. Need to rebase the patch for the branch-1 series. 
Will resolve it once I push it there. Thanks for all the reviews.

> High read/write intensive regions may cause long crash recovery
> ---
>
> Key: HBASE-22072
> URL: https://issues.apache.org/jira/browse/HBASE-22072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, Recovery
>Affects Versions: 2.1.2
>Reporter: Pavel
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
>  Labels: compaction
> Attachments: HBASE-22072.HBASE-21879-v1.patch
>
>
> Compaction of a region under high read load may leave compacted files 
> undeleted because of existing scan references:
> INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted 
> file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file 
> has reference, isReferencedInReads=true, refCount=1, skipping for now
> If the region is also under high write load this happens quite often, and the 
> region may have few storefiles but tons of undeleted compacted hdfs files.
> The region keeps all those files (in my case thousands) until the graceful 
> region closing procedure, which ignores existing references and drops obsolete 
> files. This works fine apart from consuming some extra hdfs space, but only in 
> the case of normal region closing. If the region server crashes, then the new 
> region server responsible for that overfilled region reads the hdfs folder and 
> tries to deal with all the undeleted files, producing tons of storefiles and 
> compaction tasks and consuming an abnormal amount of memory, which may lead to 
> an OutOfMemory exception and further region server crashes. This stops writes 
> to the region because the number of storefiles reaches the 
> *hbase.hstore.blockingStoreFiles* limit, forces high GC duty and may take 
> hours to compact all files into a working set of files.
> A workaround is to periodically check the hdfs folder file counts and force 
> region reassignment for the ones with too many files.
> It would be nice if the regionserver had a setting similar to 
> hbase.hstore.blockingStoreFiles that triggers an attempt to drop undeleted 
> compacted files once the number of files reaches this setting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21921) Notify users if the ByteBufAllocator is always allocating ByteBuffers from heap which means the increasing GC pressure

2019-05-06 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834396#comment-16834396
 ] 

ramkrishna.s.vasudevan commented on HBASE-21921:


Good one.

> Notify users if the ByteBufAllocator is always allocating ByteBuffers from 
> heap which means the increasing GC pressure
> --
>
> Key: HBASE-21921
> URL: https://issues.apache.org/jira/browse/HBASE-21921
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Minor
> Attachments: HBASE-21921.HBASE-21879.v01.patch, 
> HBASE-21921.HBASE-21879.v02.patch, jmx-metrics.png, web-ui.png
>
>
> As the javadoc of ByteBuffAllocator says: 
> {code}
> There's possible that the desired memory size is large than ByteBufferPool 
> has, we'll downgrade to allocate ByteBuffers from heap which meaning the GC 
> pressure may increase again. Of course, an better way is increasing the 
> ByteBufferPool size if we detected this case. 
> {code}
> So I think we need some messages to remind the user that a larger 
> ByteBufferPool size may be better if the allocator allocates ByteBuffers from 
> heap frequently. 
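
A minimal sketch of the kind of reminder being proposed; the threshold, method 
names and the way the counters are obtained are assumptions for illustration, not 
the actual patch:

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HeapAllocWarning {
  private static final Logger LOG = LoggerFactory.getLogger(HeapAllocWarning.class);

  /** Logs a hint when too many allocated bytes fall back to the heap. */
  static void maybeWarn(long heapAllocBytes, long poolAllocBytes, double threshold) {
    long total = heapAllocBytes + poolAllocBytes;
    if (total == 0) {
      return; // nothing allocated yet
    }
    double heapRatio = (double) heapAllocBytes / total;
    if (heapRatio > threshold) {
      LOG.warn("{}% of allocated bytes came from the heap; consider a larger "
          + "ByteBufferPool (more or bigger buffers) to reduce GC pressure.",
          String.format("%.1f", heapRatio * 100));
    }
  }
}
{code}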



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22090) The HFileBlock#CacheableDeserializer should pass ByteBuffAllocator to the newly created HFileBlock

2019-04-26 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826804#comment-16826804
 ] 

ramkrishna.s.vasudevan commented on HBASE-22090:


bq. private final ByteBuffAllocator allocator;
BucketEntry will have one more reference now. As Anoop said in RB, this may add 
some more overhead. Would it be better to keep BucketEntry inside BucketCache 
only, so that the bucket cache itself can hold the ref to the allocator?
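
To make the suggestion concrete, a generic sketch (class and method names are 
purely illustrative, not the actual BucketCache/BucketEntry code) of the cache 
owning the single allocator reference and handing it to the entry at use time:

{code}
// Illustrative only: the cache keeps the one allocator reference and supplies it
// to entries when they are used, so entries carry no per-entry allocator field.
public class CacheSketch {
  interface Allocator {
    byte[] allocate(int len);
  }

  static class Entry {
    final long offset;
    final int length;

    Entry(long offset, int length) {
      this.offset = offset;
      this.length = length;
    }

    byte[] read(Allocator allocator) {
      // the caller (the cache) passes in its allocator; nothing is stored here
      return allocator.allocate(length);
    }
  }

  private final Allocator allocator;

  CacheSketch(Allocator allocator) {
    this.allocator = allocator;
  }

  byte[] read(Entry entry) {
    return entry.read(allocator); // the single allocator reference lives in the cache
  }
}
{code}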

> The HFileBlock#CacheableDeserializer should pass ByteBuffAllocator to the 
> newly created HFileBlock
> --
>
> Key: HBASE-22090
> URL: https://issues.apache.org/jira/browse/HBASE-22090
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Attachments: HBASE-22090.HBASE-21879.v01.patch
>
>
> In HBASE-22005, we have the following TODO in 
> HFileBlock#CacheableDeserializer:
> {code}
>   public static final class BlockDeserializer implements 
> CacheableDeserializer {
> private BlockDeserializer() {
> }
> @Override
> public HFileBlock deserialize(ByteBuff buf, boolean reuse, MemoryType 
> memType)
> throws IOException {
>// 
>   // TODO make the newly created HFileBlock use the off-heap allocator, 
> Need change the
>   // deserializer or change the deserialize interface.
>   return new HFileBlock(newByteBuff, usesChecksum, memType, offset, 
> nextBlockOnDiskSize, null,
>   ByteBuffAllocator.HEAP);
> }
> {code}
> We should use the global ByteBuffAllocator here rather than the HEAP allocator; 
> as the TODO says, we need to adjust the deserializer interface. 
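
One possible shape of such an adjusted contract, sketched with generic type 
parameters standing in for ByteBuff, ByteBuffAllocator and HFileBlock from the 
snippet above (a hypothetical illustration, not the actual HBase interface):

{code}
import java.io.IOException;

// BUF, ALLOC and BLOCK stand in for ByteBuff, ByteBuffAllocator and HFileBlock.
// The caller passes the global allocator in, so the deserializer no longer has to
// default to the HEAP allocator when rebuilding a block.
public interface AllocatorAwareDeserializer<BUF, ALLOC, BLOCK> {
  BLOCK deserialize(BUF buf, boolean reuse, ALLOC allocator) throws IOException;
}
{code}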



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery

2019-04-18 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821671#comment-16821671
 ] 

ramkrishna.s.vasudevan commented on HBASE-22072:


bq. Also in an earlier comment I have raised some more issues where we just open 
scanners on some files and do not use those scanners as their TTL does not match 
the scan. Well, that is not happening in this issue, but they are still issues.
I have not checked this comment or that part of the code. If the issue is there, I 
think we can fix it in a new issue.

> High read/write intensive regions may cause long crash recovery
> ---
>
> Key: HBASE-22072
> URL: https://issues.apache.org/jira/browse/HBASE-22072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, Recovery
>Affects Versions: 2.1.2
>Reporter: Pavel
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Attachments: HBASE-22072.HBASE-21879-v1.patch
>
>
> Compaction of a region under high read load may leave compacted files 
> undeleted because of existing scan references:
> INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted 
> file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file 
> has reference, isReferencedInReads=true, refCount=1, skipping for now
> If the region is also under high write load this happens quite often, and the 
> region may have few storefiles but tons of undeleted compacted hdfs files.
> The region keeps all those files (in my case thousands) until the graceful 
> region closing procedure, which ignores existing references and drops obsolete 
> files. This works fine apart from consuming some extra hdfs space, but only in 
> the case of normal region closing. If the region server crashes, then the new 
> region server responsible for that overfilled region reads the hdfs folder and 
> tries to deal with all the undeleted files, producing tons of storefiles and 
> compaction tasks and consuming an abnormal amount of memory, which may lead to 
> an OutOfMemory exception and further region server crashes. This stops writes 
> to the region because the number of storefiles reaches the 
> *hbase.hstore.blockingStoreFiles* limit, forces high GC duty and may take 
> hours to compact all files into a working set of files.
> A workaround is to periodically check the hdfs folder file counts and force 
> region reassignment for the ones with too many files.
> It would be nice if the regionserver had a setting similar to 
> hbase.hstore.blockingStoreFiles that triggers an attempt to drop undeleted 
> compacted files once the number of files reaches this setting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery

2019-04-18 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821670#comment-16821670
 ] 

ramkrishna.s.vasudevan commented on HBASE-22072:


bq.But the scanner is still not over. And so the scanner did not get a chance 
to update the readers. So we can not really do this immediate return model.
This is what I tried to check in the code. As per my reading, once a StoreScanner 
calls close(false) in the next() or reseek() flow, it means that at the region 
level no other scan is going to happen through that StoreScanner. Finally, after a 
shipped() call, this store scanner will be closed when the scan completes. So I 
felt it is better that we just don't update the readers in that case, and that is 
why, if there has already been a close() call, we skip updateReaders() itself. The 
other way to look at it is by making 'closing' true in all cases.

> High read/write intensive regions may cause long crash recovery
> ---
>
> Key: HBASE-22072
> URL: https://issues.apache.org/jira/browse/HBASE-22072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, Recovery
>Affects Versions: 2.1.2
>Reporter: Pavel
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Attachments: HBASE-22072.HBASE-21879-v1.patch
>
>
> Compaction of a region under high read load may leave compacted files 
> undeleted because of existing scan references:
> INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted 
> file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file 
> has reference, isReferencedInReads=true, refCount=1, skipping for now
> If the region is also under high write load this happens quite often, and the 
> region may have few storefiles but tons of undeleted compacted hdfs files.
> The region keeps all those files (in my case thousands) until the graceful 
> region closing procedure, which ignores existing references and drops obsolete 
> files. This works fine apart from consuming some extra hdfs space, but only in 
> the case of normal region closing. If the region server crashes, then the new 
> region server responsible for that overfilled region reads the hdfs folder and 
> tries to deal with all the undeleted files, producing tons of storefiles and 
> compaction tasks and consuming an abnormal amount of memory, which may lead to 
> an OutOfMemory exception and further region server crashes. This stops writes 
> to the region because the number of storefiles reaches the 
> *hbase.hstore.blockingStoreFiles* limit, forces high GC duty and may take 
> hours to compact all files into a working set of files.
> A workaround is to periodically check the hdfs folder file counts and force 
> region reassignment for the ones with too many files.
> It would be nice if the regionserver had a setting similar to 
> hbase.hstore.blockingStoreFiles that triggers an attempt to drop undeleted 
> compacted files once the number of files reaches this setting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery

2019-04-17 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819979#comment-16819979
 ] 

ramkrishna.s.vasudevan commented on HBASE-22072:


bq. Is it possible that another thread, performing updateReaders, sees the closing 
flag still false after StoreScanner#close has completed?
As far as I can see - since we have anyway restricted multi-threaded access to the 
'closing' variable and it is always only one thread trying to read it, that thread 
should be able to see the latest copy. Someone can correct me if my understanding 
is wrong here.
BTW thanks [~pKirillov] for the confirmation by testing it in your cluster.

> High read/write intensive regions may cause long crash recovery
> ---
>
> Key: HBASE-22072
> URL: https://issues.apache.org/jira/browse/HBASE-22072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, Recovery
>Affects Versions: 2.1.2
>Reporter: Pavel
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Attachments: HBASE-22072.HBASE-21879-v1.patch
>
>
> Compaction of a region under high read load may leave compacted files 
> undeleted because of existing scan references:
> INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted 
> file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file 
> has reference, isReferencedInReads=true, refCount=1, skipping for now
> If the region is also under high write load this happens quite often, and the 
> region may have few storefiles but tons of undeleted compacted hdfs files.
> The region keeps all those files (in my case thousands) until the graceful 
> region closing procedure, which ignores existing references and drops obsolete 
> files. This works fine apart from consuming some extra hdfs space, but only in 
> the case of normal region closing. If the region server crashes, then the new 
> region server responsible for that overfilled region reads the hdfs folder and 
> tries to deal with all the undeleted files, producing tons of storefiles and 
> compaction tasks and consuming an abnormal amount of memory, which may lead to 
> an OutOfMemory exception and further region server crashes. This stops writes 
> to the region because the number of storefiles reaches the 
> *hbase.hstore.blockingStoreFiles* limit, forces high GC duty and may take 
> hours to compact all files into a working set of files.
> A workaround is to periodically check the hdfs folder file counts and force 
> region reassignment for the ones with too many files.
> It would be nice if the regionserver had a setting similar to 
> hbase.hstore.blockingStoreFiles that triggers an attempt to drop undeleted 
> compacted files once the number of files reaches this setting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HBASE-22072) High read/write intensive regions may cause long crash recovery

2019-04-16 Thread ramkrishna.s.vasudevan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-22072:
--

Assignee: ramkrishna.s.vasudevan

> High read/write intensive regions may cause long crash recovery
> ---
>
> Key: HBASE-22072
> URL: https://issues.apache.org/jira/browse/HBASE-22072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, Recovery
>Affects Versions: 2.1.2
>Reporter: Pavel
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Attachments: HBASE-22072.HBASE-21879-v1.patch
>
>
> Compaction of a region under high read load may leave compacted files 
> undeleted because of existing scan references:
> INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted 
> file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file 
> has reference, isReferencedInReads=true, refCount=1, skipping for now
> If the region is also under high write load this happens quite often, and the 
> region may have few storefiles but tons of undeleted compacted hdfs files.
> The region keeps all those files (in my case thousands) until the graceful 
> region closing procedure, which ignores existing references and drops obsolete 
> files. This works fine apart from consuming some extra hdfs space, but only in 
> the case of normal region closing. If the region server crashes, then the new 
> region server responsible for that overfilled region reads the hdfs folder and 
> tries to deal with all the undeleted files, producing tons of storefiles and 
> compaction tasks and consuming an abnormal amount of memory, which may lead to 
> an OutOfMemory exception and further region server crashes. This stops writes 
> to the region because the number of storefiles reaches the 
> *hbase.hstore.blockingStoreFiles* limit, forces high GC duty and may take 
> hours to compact all files into a working set of files.
> A workaround is to periodically check the hdfs folder file counts and force 
> region reassignment for the ones with too many files.
> It would be nice if the regionserver had a setting similar to 
> hbase.hstore.blockingStoreFiles that triggers an attempt to drop undeleted 
> compacted files once the number of files reaches this setting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22072) High read/write intensive regions may cause long crash recovery

2019-04-16 Thread ramkrishna.s.vasudevan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-22072:
---
Status: Patch Available  (was: Open)

BTW I created the patch against the HBASE-21879 branch; I had that branch checked 
out for some reviews. If the patch is fine I can create patches for the master 
branch. Only the test case would need to be modified a little. 

> High read/write intensive regions may cause long crash recovery
> ---
>
> Key: HBASE-22072
> URL: https://issues.apache.org/jira/browse/HBASE-22072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, Recovery
>Affects Versions: 2.1.2
>Reporter: Pavel
>Priority: Major
> Attachments: HBASE-22072.HBASE-21879-v1.patch
>
>
> Compaction of a region under high read load may leave compacted files 
> undeleted because of existing scan references:
> INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted 
> file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file 
> has reference, isReferencedInReads=true, refCount=1, skipping for now
> If the region is also under high write load this happens quite often, and the 
> region may have few storefiles but tons of undeleted compacted hdfs files.
> The region keeps all those files (in my case thousands) until the graceful 
> region closing procedure, which ignores existing references and drops obsolete 
> files. This works fine apart from consuming some extra hdfs space, but only in 
> the case of normal region closing. If the region server crashes, then the new 
> region server responsible for that overfilled region reads the hdfs folder and 
> tries to deal with all the undeleted files, producing tons of storefiles and 
> compaction tasks and consuming an abnormal amount of memory, which may lead to 
> an OutOfMemory exception and further region server crashes. This stops writes 
> to the region because the number of storefiles reaches the 
> *hbase.hstore.blockingStoreFiles* limit, forces high GC duty and may take 
> hours to compact all files into a working set of files.
> A workaround is to periodically check the hdfs folder file counts and force 
> region reassignment for the ones with too many files.
> It would be nice if the regionserver had a setting similar to 
> hbase.hstore.blockingStoreFiles that triggers an attempt to drop undeleted 
> compacted files once the number of files reaches this setting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery

2019-04-16 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818898#comment-16818898
 ] 

ramkrishna.s.vasudevan commented on HBASE-22072:


Created a patch that now introduces a close lock. I checked the code: close(false) 
happens when the current scanner thread sees there is no data to retrieve, and 
close(true) will anyway happen when the scan finishes the complete fetch of data, 
at the RegionScanner level. So it is updateReaders() and the close(true) call that 
may have happened asynchronously, leading to the case that [~pKirillov] has 
mentioned here.
bq. Notice flushedstoreFileScanners is an ArrayList, neither volatile nor a 
threadsafe one. Rarely, a thread that closes the StoreScanner right after the 
flusher thread executed StoreScanner.updateReaders may not see changes in the 
flushedstoreFileScanners list and keeps an unclosed scanner.
Of this I am not sure. Declaring flushedstoreFileScanners as volatile only ensures 
that the reference is volatile, not the contents of the list; since in this patch 
we do the update under a lock, I think the thread doing close() and the thread 
doing updateReaders() should anyway see the updated contents of the 
flushedstoreFileScanners list.
[~pKirillov]
Can you look at this patch and give your comments? If you feel it is good, can you 
try it in your cluster to see if the problem you described happens again?
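
For readers following along, a compact sketch of the locking pattern described 
above (field and method names are placeholders, not the actual StoreScanner code):

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the close-lock idea: close() and updateReaders() serialize on one lock,
// and updateReaders() becomes a no-op once the scanner has started closing, so no
// new scanner (and no new refCount) is created for a scan that is already done.
public class ScannerCloseSketch {
  private final ReentrantLock closeLock = new ReentrantLock();
  private volatile boolean closing = false;
  private final List<AutoCloseable> scanners = new ArrayList<>();

  public void close() {
    closeLock.lock();
    try {
      closing = true;
      for (AutoCloseable s : scanners) {
        try {
          s.close();
        } catch (Exception e) {
          // log and keep closing the rest
        }
      }
      scanners.clear();
    } finally {
      closeLock.unlock();
    }
  }

  public void updateReaders(List<AutoCloseable> newScanners) {
    closeLock.lock();
    try {
      if (closing) {
        return; // scan already finished; don't open new scanners nobody will close
      }
      scanners.addAll(newScanners);
    } finally {
      closeLock.unlock();
    }
  }
}
{code}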

> High read/write intensive regions may cause long crash recovery
> ---
>
> Key: HBASE-22072
> URL: https://issues.apache.org/jira/browse/HBASE-22072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, Recovery
>Affects Versions: 2.1.2
>Reporter: Pavel
>Priority: Major
> Attachments: HBASE-22072.HBASE-21879-v1.patch
>
>
> Compaction of a region under high read load may leave compacted files 
> undeleted because of existing scan references:
> INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted 
> file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file 
> has reference, isReferencedInReads=true, refCount=1, skipping for now
> If the region is also under high write load this happens quite often, and the 
> region may have few storefiles but tons of undeleted compacted hdfs files.
> The region keeps all those files (in my case thousands) until the graceful 
> region closing procedure, which ignores existing references and drops obsolete 
> files. This works fine apart from consuming some extra hdfs space, but only in 
> the case of normal region closing. If the region server crashes, then the new 
> region server responsible for that overfilled region reads the hdfs folder and 
> tries to deal with all the undeleted files, producing tons of storefiles and 
> compaction tasks and consuming an abnormal amount of memory, which may lead to 
> an OutOfMemory exception and further region server crashes. This stops writes 
> to the region because the number of storefiles reaches the 
> *hbase.hstore.blockingStoreFiles* limit, forces high GC duty and may take 
> hours to compact all files into a working set of files.
> A workaround is to periodically check the hdfs folder file counts and force 
> region reassignment for the ones with too many files.
> It would be nice if the regionserver had a setting similar to 
> hbase.hstore.blockingStoreFiles that triggers an attempt to drop undeleted 
> compacted files once the number of files reaches this setting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-22072) High read/write intensive regions may cause long crash recovery

2019-04-16 Thread ramkrishna.s.vasudevan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-22072:
---
Attachment: HBASE-22072.HBASE-21879-v1.patch

> High read/write intensive regions may cause long crash recovery
> ---
>
> Key: HBASE-22072
> URL: https://issues.apache.org/jira/browse/HBASE-22072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, Recovery
>Affects Versions: 2.1.2
>Reporter: Pavel
>Priority: Major
> Attachments: HBASE-22072.HBASE-21879-v1.patch
>
>
> Compaction of a region under high read load may leave compacted files 
> undeleted because of existing scan references:
> INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted 
> file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file 
> has reference, isReferencedInReads=true, refCount=1, skipping for now
> If the region is also under high write load this happens quite often, and the 
> region may have few storefiles but tons of undeleted compacted hdfs files.
> The region keeps all those files (in my case thousands) until the graceful 
> region closing procedure, which ignores existing references and drops obsolete 
> files. This works fine apart from consuming some extra hdfs space, but only in 
> the case of normal region closing. If the region server crashes, then the new 
> region server responsible for that overfilled region reads the hdfs folder and 
> tries to deal with all the undeleted files, producing tons of storefiles and 
> compaction tasks and consuming an abnormal amount of memory, which may lead to 
> an OutOfMemory exception and further region server crashes. This stops writes 
> to the region because the number of storefiles reaches the 
> *hbase.hstore.blockingStoreFiles* limit, forces high GC duty and may take 
> hours to compact all files into a working set of files.
> A workaround is to periodically check the hdfs folder file counts and force 
> region reassignment for the ones with too many files.
> It would be nice if the regionserver had a setting similar to 
> hbase.hstore.blockingStoreFiles that triggers an attempt to drop undeleted 
> compacted files once the number of files reaches this setting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery

2019-04-16 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818669#comment-16818669
 ] 

ramkrishna.s.vasudevan commented on HBASE-22072:


Am able to reproduce this. Will upload a formal test and then a fix for it.

> High read/write intensive regions may cause long crash recovery
> ---
>
> Key: HBASE-22072
> URL: https://issues.apache.org/jira/browse/HBASE-22072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, Recovery
>Affects Versions: 2.1.2
>Reporter: Pavel
>Priority: Major
>
> Compaction of a region under high read load may leave compacted files 
> undeleted because of existing scan references:
> INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted 
> file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file 
> has reference, isReferencedInReads=true, refCount=1, skipping for now
> If the region is also under high write load this happens quite often, and the 
> region may have few storefiles but tons of undeleted compacted hdfs files.
> The region keeps all those files (in my case thousands) until the graceful 
> region closing procedure, which ignores existing references and drops obsolete 
> files. This works fine apart from consuming some extra hdfs space, but only in 
> the case of normal region closing. If the region server crashes, then the new 
> region server responsible for that overfilled region reads the hdfs folder and 
> tries to deal with all the undeleted files, producing tons of storefiles and 
> compaction tasks and consuming an abnormal amount of memory, which may lead to 
> an OutOfMemory exception and further region server crashes. This stops writes 
> to the region because the number of storefiles reaches the 
> *hbase.hstore.blockingStoreFiles* limit, forces high GC duty and may take 
> hours to compact all files into a working set of files.
> A workaround is to periodically check the hdfs folder file counts and force 
> region reassignment for the ones with too many files.
> It would be nice if the regionserver had a setting similar to 
> hbase.hstore.blockingStoreFiles that triggers an attempt to drop undeleted 
> compacted files once the number of files reaches this setting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery

2019-04-11 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815297#comment-16815297
 ] 

ramkrishna.s.vasudevan commented on HBASE-22072:


Let's see how to take this forward. Will see how we can write a UT for this.

> High read/write intensive regions may cause long crash recovery
> ---
>
> Key: HBASE-22072
> URL: https://issues.apache.org/jira/browse/HBASE-22072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, Recovery
>Affects Versions: 2.1.2
>Reporter: Pavel
>Priority: Major
>
> Compaction of a region under high read load may leave compacted files 
> undeleted because of existing scan references:
> INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted 
> file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file 
> has reference, isReferencedInReads=true, refCount=1, skipping for now
> If the region is also under high write load this happens quite often, and the 
> region may have few storefiles but tons of undeleted compacted hdfs files.
> The region keeps all those files (in my case thousands) until the graceful 
> region closing procedure, which ignores existing references and drops obsolete 
> files. This works fine apart from consuming some extra hdfs space, but only in 
> the case of normal region closing. If the region server crashes, then the new 
> region server responsible for that overfilled region reads the hdfs folder and 
> tries to deal with all the undeleted files, producing tons of storefiles and 
> compaction tasks and consuming an abnormal amount of memory, which may lead to 
> an OutOfMemory exception and further region server crashes. This stops writes 
> to the region because the number of storefiles reaches the 
> *hbase.hstore.blockingStoreFiles* limit, forces high GC duty and may take 
> hours to compact all files into a working set of files.
> A workaround is to periodically check the hdfs folder file counts and force 
> region reassignment for the ones with too many files.
> It would be nice if the regionserver had a setting similar to 
> hbase.hstore.blockingStoreFiles that triggers an attempt to drop undeleted 
> compacted files once the number of files reaches this setting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21879) Read HFile's block to ByteBuffer directly instead of to byte for reducing young gc purpose

2019-04-03 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808685#comment-16808685
 ] 

ramkrishna.s.vasudevan commented on HBASE-21879:


OOO for personal reasons. No access to official emails during this period.


> Read HFile's block to ByteBuffer directly instead of to byte for reducing 
> young gc purpose
> --
>
> Key: HBASE-21879
> URL: https://issues.apache.org/jira/browse/HBASE-21879
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HBASE-21879.v1.patch, HBASE-21879.v1.patch, 
> QPS-latencies-before-HBASE-21879.png, gc-data-before-HBASE-21879.png
>
>
> In HFileBlock#readBlockDataInternal,  we have the following: 
> {code}
> @VisibleForTesting
> protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
> long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum, 
> boolean updateMetrics)
>  throws IOException {
>  // .
>   // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with 
> BBPool (offheap).
>   byte [] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize];
>   int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize,
>   onDiskSizeWithHeader - preReadHeaderSize, true, offset + 
> preReadHeaderSize, pread);
>   if (headerBuf != null) {
> // ...
>   }
>   // ...
>  }
> {code}
> In the read path, we still read the block from the hfile into an on-heap byte[], 
> then copy the on-heap byte[] to the offheap bucket cache asynchronously. In my 
> 100% get performance test, I also observed some frequent young gc; the 
> largest memory footprint in the young gen should be the on-heap block byte[].
> In fact, we can read the HFile's block into a ByteBuffer directly instead of 
> into a byte[] to reduce young gc. We did not implement this before because 
> there was no ByteBuffer reading interface in the older HDFS client, but 2.7+ 
> supports this now, so we can fix it now, I think.
> Will provide a patch and some perf-comparison for this. 
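
A rough sketch of the direct-to-ByteBuffer read described above, assuming the 
underlying HDFS stream implements ByteBufferReadable (2.7+); buffer sizing and 
retry handling are elided:

{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.fs.FSDataInputStream;

public class DirectBlockRead {
  /** Reads onDiskSize bytes at 'offset' straight into a (possibly direct/off-heap)
   *  ByteBuffer, avoiding the intermediate on-heap byte[]. If the wrapped stream
   *  does not support ByteBufferReadable, read(ByteBuffer) throws
   *  UnsupportedOperationException. Assumes dst has capacity >= onDiskSize. */
  static void readBlock(FSDataInputStream in, long offset, ByteBuffer dst, int onDiskSize)
      throws IOException {
    dst.clear();
    dst.limit(onDiskSize);
    in.seek(offset);
    while (dst.hasRemaining()) {
      if (in.read(dst) < 0) {
        throw new IOException("Premature EOF while reading block at offset " + offset);
      }
    }
    dst.flip(); // ready for the decoder / cache to consume
  }
}
{code}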



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery

2019-04-02 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808390#comment-16808390
 ] 

ramkrishna.s.vasudevan commented on HBASE-22072:


Maybe we should have a lock for closing, and updateReaders() should try to 
acquire that lock before trying to update the scanners? If closing is already 
done, then don't do it? 

> High read/write intensive regions may cause long crash recovery
> ---
>
> Key: HBASE-22072
> URL: https://issues.apache.org/jira/browse/HBASE-22072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, Recovery
>Affects Versions: 2.1.2
>Reporter: Pavel
>Priority: Major
>
> Compaction of a region under high read load may leave compacted files 
> undeleted because of existing scan references:
> INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted 
> file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file 
> has reference, isReferencedInReads=true, refCount=1, skipping for now
> If the region is also under high write load this happens quite often, and the 
> region may have few storefiles but tons of undeleted compacted hdfs files.
> The region keeps all those files (in my case thousands) until the graceful 
> region closing procedure, which ignores existing references and drops obsolete 
> files. This works fine apart from consuming some extra hdfs space, but only in 
> the case of normal region closing. If the region server crashes, then the new 
> region server responsible for that overfilled region reads the hdfs folder and 
> tries to deal with all the undeleted files, producing tons of storefiles and 
> compaction tasks and consuming an abnormal amount of memory, which may lead to 
> an OutOfMemory exception and further region server crashes. This stops writes 
> to the region because the number of storefiles reaches the 
> *hbase.hstore.blockingStoreFiles* limit, forces high GC duty and may take 
> hours to compact all files into a working set of files.
> A workaround is to periodically check the hdfs folder file counts and force 
> region reassignment for the ones with too many files.
> It would be nice if the regionserver had a setting similar to 
> hbase.hstore.blockingStoreFiles that triggers an attempt to drop undeleted 
> compacted files once the number of files reaches this setting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-22072) High read/write intensive regions may cause long crash recovery

2019-04-01 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806496#comment-16806496
 ] 

ramkrishna.s.vasudevan commented on HBASE-22072:


bq. updateReaders and the further updateReaders procedure do not consider whether 
the StoreScanner is closing or not.
Thanks [~pKirillov] for the analysis.
So you are saying that the StoreScanner itself is already getting closed, and 
during that time updateReaders is creating a new set of scanners, as part of which 
the refCount increment happens. 
Need to see if this is really possible - can you write a UT to see if this case 
can be reproduced?

> High read/write intensive regions may cause long crash recovery
> ---
>
> Key: HBASE-22072
> URL: https://issues.apache.org/jira/browse/HBASE-22072
> Project: HBase
>  Issue Type: Bug
>  Components: Performance, Recovery
>Affects Versions: 2.1.2
>Reporter: Pavel
>Priority: Major
>
> Compaction of a region under high read load may leave compacted files 
> undeleted because of existing scan references:
> INFO org.apache.hadoop.hbase.regionserver.HStore - Can't archive compacted 
> file hdfs://hdfs-ha/hbase... because of either isCompactedAway=true or file 
> has reference, isReferencedInReads=true, refCount=1, skipping for now
> If the region is also under high write load this happens quite often, and the 
> region may have few storefiles but tons of undeleted compacted hdfs files.
> The region keeps all those files (in my case thousands) until the graceful 
> region closing procedure, which ignores existing references and drops obsolete 
> files. This works fine apart from consuming some extra hdfs space, but only in 
> the case of normal region closing. If the region server crashes, then the new 
> region server responsible for that overfilled region reads the hdfs folder and 
> tries to deal with all the undeleted files, producing tons of storefiles and 
> compaction tasks and consuming an abnormal amount of memory, which may lead to 
> an OutOfMemory exception and further region server crashes. This stops writes 
> to the region because the number of storefiles reaches the 
> *hbase.hstore.blockingStoreFiles* limit, forces high GC duty and may take 
> hours to compact all files into a working set of files.
> A workaround is to periodically check the hdfs folder file counts and force 
> region reassignment for the ones with too many files.
> It would be nice if the regionserver had a setting similar to 
> hbase.hstore.blockingStoreFiles that triggers an attempt to drop undeleted 
> compacted files once the number of files reaches this setting.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21874) Bucket cache on Persistent memory

2019-03-06 Thread ramkrishna.s.vasudevan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-21874:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.x
   Status: Resolved  (was: Patch Available)

Pushed to branch-2 also. Resolving. 

> Bucket cache on Persistent memory
> -
>
> Key: HBASE-21874
> URL: https://issues.apache.org/jira/browse/HBASE-21874
> Project: HBase
>  Issue Type: New Feature
>  Components: BucketCache
>Affects Versions: 3.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0, 2.x
>
> Attachments: HBASE-21874.patch, HBASE-21874.patch, 
> HBASE-21874_V2.patch, HBASE-21874_V4.patch, HBASE-21874_V5.patch, 
> HBASE-21874_V6.patch, Pmem_BC.png
>
>
> Non volatile persistent memory devices are byte addressable like DRAM (for 
> eg. Intel DCPMM). Bucket cache implementation can take advantage of this new 
> memory type and can make use of the existing offheap data structures to serve 
> data directly from this memory area without having to bring the data to 
> onheap.
> The patch is a new IOEngine implementation that works with the persistent 
> memory.
> Note : Here we don't make use of the persistence nature of the device and 
> just make use of the big memory it provides.
> Performance numbers to follow. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21874) Bucket cache on Persistent memory

2019-03-06 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785918#comment-16785918
 ] 

ramkrishna.s.vasudevan commented on HBASE-21874:


Thanks for all the reviews and feedback. Pushed to master.
[~busbey], [~wchevreuil], [~jdcryans], [~elserj], [~vrodionov] & [~anoop.hbase].

> Bucket cache on Persistent memory
> -
>
> Key: HBASE-21874
> URL: https://issues.apache.org/jira/browse/HBASE-21874
> Project: HBase
>  Issue Type: New Feature
>  Components: BucketCache
>Affects Versions: 3.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21874.patch, HBASE-21874.patch, 
> HBASE-21874_V2.patch, HBASE-21874_V4.patch, HBASE-21874_V5.patch, 
> HBASE-21874_V6.patch, Pmem_BC.png
>
>
> Non volatile persistent memory devices are byte addressable like DRAM (for 
> eg. Intel DCPMM). Bucket cache implementation can take advantage of this new 
> memory type and can make use of the existing offheap data structures to serve 
> data directly from this memory area without having to bring the data to 
> onheap.
> The patch is a new IOEngine implementation that works with the persistent 
> memory.
> Note : Here we don't make use of the persistence nature of the device and 
> just make use of the big memory it provides.
> Performance numbers to follow. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21874) Bucket cache on Persistent memory

2019-03-05 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785285#comment-16785285
 ] 

ramkrishna.s.vasudevan commented on HBASE-21874:


The test failures are unrelated and seem to be flaky tests.

> Bucket cache on Persistent memory
> -
>
> Key: HBASE-21874
> URL: https://issues.apache.org/jira/browse/HBASE-21874
> Project: HBase
>  Issue Type: New Feature
>  Components: BucketCache
>Affects Versions: 3.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21874.patch, HBASE-21874.patch, 
> HBASE-21874_V2.patch, HBASE-21874_V4.patch, HBASE-21874_V5.patch, 
> HBASE-21874_V6.patch, Pmem_BC.png
>
>
> Non volatile persistent memory devices are byte addressable like DRAM (for 
> eg. Intel DCPMM). Bucket cache implementation can take advantage of this new 
> memory type and can make use of the existing offheap data structures to serve 
> data directly from this memory area without having to bring the data to 
> onheap.
> The patch is a new IOEngine implementation that works with the persistent 
> memory.
> Note : Here we don't make use of the persistence nature of the device and 
> just make use of the big memory it provides.
> Performance numbers to follow. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21874) Bucket cache on Persistent memory

2019-03-05 Thread ramkrishna.s.vasudevan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-21874:
---
Attachment: HBASE-21874_V6.patch

> Bucket cache on Persistent memory
> -
>
> Key: HBASE-21874
> URL: https://issues.apache.org/jira/browse/HBASE-21874
> Project: HBase
>  Issue Type: New Feature
>  Components: BucketCache
>Affects Versions: 3.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21874.patch, HBASE-21874.patch, 
> HBASE-21874_V2.patch, HBASE-21874_V4.patch, HBASE-21874_V5.patch, 
> HBASE-21874_V6.patch, Pmem_BC.png
>
>
> Non volatile persistent memory devices are byte addressable like DRAM (for 
> eg. Intel DCPMM). Bucket cache implementation can take advantage of this new 
> memory type and can make use of the existing offheap data structures to serve 
> data directly from this memory area without having to bring the data to 
> onheap.
> The patch is a new IOEngine implementation that works with the persistent 
> memory.
> Note : Here we don't make use of the persistence nature of the device and 
> just make use of the big memory it provides.
> Performance numbers to follow. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21874) Bucket cache on Persistent memory

2019-03-05 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784692#comment-16784692
 ] 

ramkrishna.s.vasudevan commented on HBASE-21874:


bq.ExclusiveMemoryMmapIOEngine extends from FileMmapIOEngine which returns 
true, so it is needed.
[~busbey] - The usesSharedMemory override returning false was added just to be 
explicit. But as [~wchevreuil] said, since FileMMapIOEngine implements the 
IOEngine interface, it is false by default, so we can drop that override from 
ExclusiveMemoryMmapIOEngine as well.

> Bucket cache on Persistent memory
> -
>
> Key: HBASE-21874
> URL: https://issues.apache.org/jira/browse/HBASE-21874
> Project: HBase
>  Issue Type: New Feature
>  Components: BucketCache
>Affects Versions: 3.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21874.patch, HBASE-21874.patch, 
> HBASE-21874_V2.patch, HBASE-21874_V4.patch, HBASE-21874_V5.patch, Pmem_BC.png
>
>
> Non volatile persistent memory devices are byte addressable like DRAM (for 
> eg. Intel DCPMM). Bucket cache implementation can take advantage of this new 
> memory type and can make use of the existing offheap data structures to serve 
> data directly from this memory area without having to bring the data to 
> onheap.
> The patch is a new IOEngine implementation that works with the persistent 
> memory.
> Note : Here we don't make use of the persistence nature of the device and 
> just make use of the big memory it provides.
> Performance numbers to follow. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21874) Bucket cache on Persistent memory

2019-03-02 Thread ramkrishna.s.vasudevan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-21874:
---
Status: Open  (was: Patch Available)

> Bucket cache on Persistent memory
> -
>
> Key: HBASE-21874
> URL: https://issues.apache.org/jira/browse/HBASE-21874
> Project: HBase
>  Issue Type: New Feature
>  Components: BucketCache
>Affects Versions: 3.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21874.patch, HBASE-21874.patch, 
> HBASE-21874_V2.patch, HBASE-21874_V4.patch, Pmem_BC.png
>
>
> Non volatile persistent memory devices are byte addressable like DRAM (for 
> eg. Intel DCPMM). Bucket cache implementation can take advantage of this new 
> memory type and can make use of the existing offheap data structures to serve 
> data directly from this memory area without having to bring the data to 
> onheap.
> The patch is a new IOEngine implementation that works with the persistent 
> memory.
> Note : Here we don't make use of the persistence nature of the device and 
> just make use of the big memory it provides.
> Performance numbers to follow. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21874) Bucket cache on Persistent memory

2019-03-02 Thread ramkrishna.s.vasudevan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-21874:
---
Status: Patch Available  (was: Open)

> Bucket cache on Persistent memory
> -
>
> Key: HBASE-21874
> URL: https://issues.apache.org/jira/browse/HBASE-21874
> Project: HBase
>  Issue Type: New Feature
>  Components: BucketCache
>Affects Versions: 3.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21874.patch, HBASE-21874.patch, 
> HBASE-21874_V2.patch, HBASE-21874_V4.patch, HBASE-21874_V5.patch, Pmem_BC.png
>
>
> Non volatile persistent memory devices are byte addressable like DRAM (for 
> eg. Intel DCPMM). Bucket cache implementation can take advantage of this new 
> memory type and can make use of the existing offheap data structures to serve 
> data directly from this memory area without having to bring the data to 
> onheap.
> The patch is a new IOEngine implementation that works with the persistent 
> memory.
> Note : Here we don't make use of the persistence nature of the device and 
> just make use of the big memory it provides.
> Performance numbers to follow. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21874) Bucket cache on Persistent memory

2019-03-02 Thread ramkrishna.s.vasudevan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-21874:
---
Attachment: HBASE-21874_V5.patch

> Bucket cache on Persistent memory
> -
>
> Key: HBASE-21874
> URL: https://issues.apache.org/jira/browse/HBASE-21874
> Project: HBase
>  Issue Type: New Feature
>  Components: BucketCache
>Affects Versions: 3.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21874.patch, HBASE-21874.patch, 
> HBASE-21874_V2.patch, HBASE-21874_V4.patch, HBASE-21874_V5.patch, Pmem_BC.png
>
>
> Non volatile persistent memory devices are byte addressable like DRAM (for 
> eg. Intel DCPMM). Bucket cache implementation can take advantage of this new 
> memory type and can make use of the existing offheap data structures to serve 
> data directly from this memory area without having to bring the data to 
> onheap.
> The patch is a new IOEngine implementation that works with the persistent 
> memory.
> Note : Here we don't make use of the persistence nature of the device and 
> just make use of the big memory it provides.
> Performance numbers to follow. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21874) Bucket cache on Persistent memory

2019-03-02 Thread ramkrishna.s.vasudevan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-21874:
---
Status: Patch Available  (was: Open)

The patch is bigger now because of the refactoring. We now have an abstract 
MmapIOEngine, with ExclusiveMemoryMMapIOEngine (the old FileMMapIOEngine) and 
SharedMemoryMMapIOEngine (the PmemIOEngine) as its subclasses. Both use a 
similar mmap mechanism and differ only in the backing device, which is what 
lets the pmem-backed engine expose SHARED memory, so we went with this more 
abstract approach. It also addresses [~jdcryans]'s comments.

[~busbey] - Thanks for pointing out the xmls and doc that needed changing; we 
had missed them. Let us know what you think of the latest patch.
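
A rough, purely illustrative skeleton of that hierarchy is below. The class 
names follow the patch, but the bodies are simplified and this is not the 
actual HBase code.
{code}
// Illustrative skeleton only -- the real classes carry the mmap setup,
// persistence hooks, etc. The base class owns the common mmap mechanism;
// the subclasses differ only in whether a read is handed out as an on-heap
// copy or as a shared reference into the mapped region.
abstract class MmapIOEngine {
  abstract boolean usesSharedMemory();
}

// Backed by a regular file: reads are copied into exclusive on-heap memory.
class ExclusiveMemoryMmapIOEngine extends MmapIOEngine {
  @Override
  boolean usesSharedMemory() {
    return false;
  }
}

// Backed by a persistent memory (pmem) device: reads can be served as shared
// references into the mapped region, with no on-heap copy.
class SharedMemoryMmapIOEngine extends MmapIOEngine {
  @Override
  boolean usesSharedMemory() {
    return true;
  }
}

public class MmapEngineHierarchySketch {
  public static void main(String[] args) {
    System.out.println("file engine shared? "
        + new ExclusiveMemoryMmapIOEngine().usesSharedMemory());
    System.out.println("pmem engine shared? "
        + new SharedMemoryMmapIOEngine().usesSharedMemory());
  }
}
{code}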

> Bucket cache on Persistent memory
> -
>
> Key: HBASE-21874
> URL: https://issues.apache.org/jira/browse/HBASE-21874
> Project: HBase
>  Issue Type: New Feature
>  Components: BucketCache
>Affects Versions: 3.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21874.patch, HBASE-21874.patch, 
> HBASE-21874_V2.patch, HBASE-21874_V4.patch, Pmem_BC.png
>
>
> Non volatile persistent memory devices are byte addressable like DRAM (for 
> eg. Intel DCPMM). Bucket cache implementation can take advantage of this new 
> memory type and can make use of the existing offheap data structures to serve 
> data directly from this memory area without having to bring the data to 
> onheap.
> The patch is a new IOEngine implementation that works with the persistent 
> memory.
> Note : Here we don't make use of the persistence nature of the device and 
> just make use of the big memory it provides.
> Performance numbers to follow. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21874) Bucket cache on Persistent memory

2019-03-02 Thread ramkrishna.s.vasudevan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-21874:
---
Attachment: HBASE-21874_V4.patch

> Bucket cache on Persistent memory
> -
>
> Key: HBASE-21874
> URL: https://issues.apache.org/jira/browse/HBASE-21874
> Project: HBase
>  Issue Type: New Feature
>  Components: BucketCache
>Affects Versions: 3.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21874.patch, HBASE-21874.patch, 
> HBASE-21874_V2.patch, HBASE-21874_V4.patch, Pmem_BC.png
>
>
> Non volatile persistent memory devices are byte addressable like DRAM (for 
> eg. Intel DCPMM). Bucket cache implementation can take advantage of this new 
> memory type and can make use of the existing offheap data structures to serve 
> data directly from this memory area without having to bring the data to 
> onheap.
> The patch is a new IOEngine implementation that works with the persistent 
> memory.
> Note : Here we don't make use of the persistence nature of the device and 
> just make use of the big memory it provides.
> Performance numbers to follow. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21981) MMaped bucket cache IOEngine does not work with persistence

2019-03-02 Thread ramkrishna.s.vasudevan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-21981:
---
Summary: MMaped bucket cache IOEngine does not work with persistence  (was: 
MMaped bucket cache IOEngines does not work with persistence)

> MMaped bucket cache IOEngine does not work with persistence
> ---
>
> Key: HBASE-21981
> URL: https://issues.apache.org/jira/browse/HBASE-21981
> Project: HBase
>  Issue Type: Bug
>  Components: BucketCache
>Affects Versions: 2.1.3
>Reporter: ramkrishna.s.vasudevan
>Assignee: Anoop Sam John
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.0.5, 2.3.0, 2.1.4
>
>
> The mmap-based IOEngines do not retrieve the data back if 
> 'hbase.bucketcache.persistent.path' is enabled. FileIOEngine works fine; 
> only the FileMMapEngine has this problem.
> The reason is that we don't get the byte buffers back in the proper order 
> when reading the file back in the persistence case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21981) MMaped bucket cache IOEngines does not work with persistence

2019-03-02 Thread ramkrishna.s.vasudevan (JIRA)
ramkrishna.s.vasudevan created HBASE-21981:
--

 Summary: MMaped bucket cache IOEngines does not work with 
persistence
 Key: HBASE-21981
 URL: https://issues.apache.org/jira/browse/HBASE-21981
 Project: HBase
  Issue Type: Bug
  Components: BucketCache
Affects Versions: 2.1.3
Reporter: ramkrishna.s.vasudevan
Assignee: Anoop Sam John
 Fix For: 3.0.0, 2.2.0, 2.0.5, 2.3.0, 2.1.4


The mmap-based IOEngines do not retrieve the data back if 
'hbase.bucketcache.persistent.path' is enabled. FileIOEngine works fine; only 
the FileMMapEngine has this problem.

The reason is that we don't get the byte buffers back in the proper order when 
reading the file back in the persistence case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21879) Read HFile's block to ByteBuffer directly instead of to byte for reducing young gc purpose

2019-02-21 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774760#comment-16774760
 ] 

ramkrishna.s.vasudevan commented on HBASE-21879:


bq.And you can get a ByteBuffer from a netty ByteBuf, by calling the nioBuffer 
method, no different from our ByteBuff. And we have CompositeByteBuf where we 
can have multiple ByteBuf combined.

Thanks [~Apache9]. Yes, having seen some Netty code in recent years, I was 
thinking while typing the above comment that your reply would point to 
nioBuffer or CompositeByteBuf. The ref counting and resource leak detection 
may behave differently though, so I could be wrong there.

Ya, it will be a big project. Cell, the CellComparators and CellUtils all need 
to be changed, and that alone will be a big change; doing it in a separate 
branch will be better.
Thanks for the useful discussions here.
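
To make that concrete, here is a small self-contained sketch (illustrative 
only, not code from this patch) of going from Netty ByteBufs to a plain NIO 
ByteBuffer via a composite buffer and nioBuffer():
{code}
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class NettyNioBridgeSketch {
  public static void main(String[] args) {
    // Two independently allocated Netty buffers.
    ByteBuf first = Unpooled.copiedBuffer("hello ", StandardCharsets.UTF_8);
    ByteBuf second = Unpooled.copiedBuffer("world", StandardCharsets.UTF_8);

    // Combine them without copying: wrappedBuffer builds a composite view
    // over the two ByteBufs (the CompositeByteBuf mentioned above).
    ByteBuf combined = Unpooled.wrappedBuffer(first, second);

    // nioBuffer() exposes the readable bytes as a plain NIO ByteBuffer,
    // which is what ByteBuffer-oriented APIs expect. For a composite with
    // several components this call may have to merge them, i.e. it can copy.
    ByteBuffer nio = combined.nioBuffer();
    System.out.println(StandardCharsets.UTF_8.decode(nio)); // hello world

    // Ref counting: release() drops the reference count so the memory can
    // be reclaimed once it reaches zero.
    combined.release();
  }
}
{code}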


> Read HFile's block to ByteBuffer directly instead of to byte for reducing 
> young gc purpose
> --
>
> Key: HBASE-21879
> URL: https://issues.apache.org/jira/browse/HBASE-21879
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.3.0, 2.1.4
>
> Attachments: QPS-latencies-before-HBASE-21879.png, 
> gc-data-before-HBASE-21879.png
>
>
> In HFileBlock#readBlockDataInternal,  we have the following: 
> {code}
> @VisibleForTesting
> protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
> long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum, 
> boolean updateMetrics)
>  throws IOException {
>  // .
>   // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with 
> BBPool (offheap).
>   byte [] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize];
>   int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize,
>   onDiskSizeWithHeader - preReadHeaderSize, true, offset + 
> preReadHeaderSize, pread);
>   if (headerBuf != null) {
> // ...
>   }
>   // ...
>  }
> {code}
> In the read path,  we still read the block from hfile to on-heap byte[], then 
> copy the on-heap byte[] to offheap bucket cache asynchronously,  and in my  
> 100% get performance test, I also observed some frequent young gc,  The 
> largest memory footprint in the young gen should be the on-heap block byte[].
> In fact, we can read HFile's block to ByteBuffer directly instead of to 
> byte[] for reducing young gc purpose. we did not implement this before, 
> because no ByteBuffer reading interface in the older HDFS client, but 2.7+ 
> has supported this now,  so we can fix this now. I think. 
> Will provide an patch and some perf-comparison for this. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21879) Read HFile's block to ByteBuffer directly instead of to byte for reducing young gc purpose

2019-02-21 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774386#comment-16774386
 ] 

ramkrishna.s.vasudevan commented on HBASE-21879:


However, if at all we need Netty's ref counting mechanism, I believe the 
ResourceLeakDetector cannot be DISABLED.

> Read HFile's block to ByteBuffer directly instead of to byte for reducing 
> young gc purpose
> --
>
> Key: HBASE-21879
> URL: https://issues.apache.org/jira/browse/HBASE-21879
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.3.0, 2.1.4
>
> Attachments: QPS-latencies-before-HBASE-21879.png, 
> gc-data-before-HBASE-21879.png
>
>
> In HFileBlock#readBlockDataInternal,  we have the following: 
> {code}
> @VisibleForTesting
> protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
> long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum, 
> boolean updateMetrics)
>  throws IOException {
>  // .
>   // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with 
> BBPool (offheap).
>   byte [] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize];
>   int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize,
>   onDiskSizeWithHeader - preReadHeaderSize, true, offset + 
> preReadHeaderSize, pread);
>   if (headerBuf != null) {
> // ...
>   }
>   // ...
>  }
> {code}
> In the read path,  we still read the block from hfile to on-heap byte[], then 
> copy the on-heap byte[] to offheap bucket cache asynchronously,  and in my  
> 100% get performance test, I also observed some frequent young gc,  The 
> largest memory footprint in the young gen should be the on-heap block byte[].
> In fact, we can read HFile's block to ByteBuffer directly instead of to 
> byte[] for reducing young gc purpose. we did not implement this before, 
> because no ByteBuffer reading interface in the older HDFS client, but 2.7+ 
> has supported this now,  so we can fix this now. I think. 
> Will provide an patch and some perf-comparison for this. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21879) Read HFile's block to ByteBuffer directly instead of to byte for reducing young gc purpose

2019-02-21 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774380#comment-16774380
 ] 

ramkrishna.s.vasudevan commented on HBASE-21879:


Thanks for the ping here, folks. From the docs we prepared when we did the 
offheaping work, these are the points that were discussed:

Netty's ByteBuf vs NIO ByteBuffers - a JMH comparison showed that NIO BBs were 
17% better. Ideally we should have seen similar performance, but Netty 4.0.23 
had a reference counting and memory leak detection mechanism that was 
preventing the C2 compiler from properly inlining the code. However, Netty 
4.0.4 had the feature to disable the ResourceLeakDetector, which brought the 
performance closer to the NIO case.

Still, the reason we went ahead with NIO - indirectly also the reason this 
JIRA was created - is that HDFS already had an API to pass in an NIO BB and 
read into it, and going with Netty ByteBuf would not allow that to happen 
easily because of the HDFS API. The other advantage is that if we can pass an 
offheap NIO BB, we avoid a copy to onheap once we read from the DFS (a small 
sketch follows below).

[~anoopsamjohn] - Is there anything I missed out here?

But I think the idea of Netty doing the ref counting would help us avoid doing 
the ref counting ourselves, which is adding some complexity. Maybe we missed 
some options - if so, it would be great to know about them. Good one.
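
As an illustration of the HDFS API mentioned above, here is a minimal sketch 
(the path and buffer size are made up; this is not the actual HBase read path) 
of reading directly into an off-heap NIO ByteBuffer:
{code}
import java.nio.ByteBuffer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DirectReadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/tmp/some-hfile");   // hypothetical file

    // Off-heap buffer the block is read into, avoiding an intermediate
    // on-heap byte[].
    ByteBuffer block = ByteBuffer.allocateDirect(64 * 1024);

    try (FSDataInputStream in = fs.open(path)) {
      // read(ByteBuffer) works only when the underlying stream implements
      // ByteBufferReadable (as the newer HDFS client does); otherwise it
      // throws UnsupportedOperationException.
      while (block.hasRemaining()) {
        if (in.read(block) < 0) {
          break;   // end of file before the buffer was filled
        }
      }
    }
    block.flip();
    System.out.println("read " + block.remaining() + " bytes off-heap");
  }
}
{code}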

> Read HFile's block to ByteBuffer directly instead of to byte for reducing 
> young gc purpose
> --
>
> Key: HBASE-21879
> URL: https://issues.apache.org/jira/browse/HBASE-21879
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.3.0, 2.1.4
>
> Attachments: QPS-latencies-before-HBASE-21879.png, 
> gc-data-before-HBASE-21879.png
>
>
> In HFileBlock#readBlockDataInternal,  we have the following: 
> {code}
> @VisibleForTesting
> protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
> long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum, 
> boolean updateMetrics)
>  throws IOException {
>  // .
>   // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with 
> BBPool (offheap).
>   byte [] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize];
>   int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize,
>   onDiskSizeWithHeader - preReadHeaderSize, true, offset + 
> preReadHeaderSize, pread);
>   if (headerBuf != null) {
> // ...
>   }
>   // ...
>  }
> {code}
> In the read path,  we still read the block from hfile to on-heap byte[], then 
> copy the on-heap byte[] to offheap bucket cache asynchronously,  and in my  
> 100% get performance test, I also observed some frequent young gc,  The 
> largest memory footprint in the young gen should be the on-heap block byte[].
> In fact, we can read HFile's block to ByteBuffer directly instead of to 
> byte[] for reducing young gc purpose. we did not implement this before, 
> because no ByteBuffer reading interface in the older HDFS client, but 2.7+ 
> has supported this now,  so we can fix this now. I think. 
> Will provide an patch and some perf-comparison for this. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21874) Bucket cache on Persistent memory

2019-02-21 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773976#comment-16773976
 ] 

ramkrishna.s.vasudevan commented on HBASE-21874:


bq.sysctl -w vm.max_map_count=13
Thanks [~wchevreuil]. That was very useful. So in that case we need not 
configure the buffer size and can just set this value at the OS level.

> Bucket cache on Persistent memory
> -
>
> Key: HBASE-21874
> URL: https://issues.apache.org/jira/browse/HBASE-21874
> Project: HBase
>  Issue Type: New Feature
>  Components: BucketCache
>Affects Versions: 3.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21874.patch, HBASE-21874.patch, 
> HBASE-21874_V2.patch, Pmem_BC.png
>
>
> Non volatile persistent memory devices are byte addressable like DRAM (for 
> eg. Intel DCPMM). Bucket cache implementation can take advantage of this new 
> memory type and can make use of the existing offheap data structures to serve 
> data directly from this memory area without having to bring the data to 
> onheap.
> The patch is a new IOEngine implementation that works with the persistent 
> memory.
> Note : Here we don't make use of the persistence nature of the device and 
> just make use of the big memory it provides.
> Performance numbers to follow. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21916) Abstract an ByteBuffAllocator to allocate/free ByteBuffer in ByteBufferPool

2019-02-19 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772643#comment-16772643
 ] 

ramkrishna.s.vasudevan commented on HBASE-21916:


[~openinx]
Sorry for the time taken here - just got back to this. Thanks for the ping; 
checking your patch and subtasks.

> Abstract an ByteBuffAllocator to allocate/free ByteBuffer in ByteBufferPool
> ---
>
> Key: HBASE-21916
> URL: https://issues.apache.org/jira/browse/HBASE-21916
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.3.0, 2.1.4
>
> Attachments: HBASE-21916.v1.patch, HBASE-21916.v2.patch, 
> HBASE-21916.v3.patch, HBASE-21916.v4.patch, HBASE-21916.v5.patch
>
>
> Now  our read/write path allocate ByteBuffer from the ByteBufferPool, but we 
> need consider the minSizeForReservoirUse for better utilization, those 
> allocate/free api are some static methods,  not so good to use. 
> For HBASE-21879,  we need an universal ByteBuffer allocator to manage all the 
> ByteBuffers through the entire read path, so create this issue. 
> Will upload a patch to abstract an ByteBufAllocator.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21874) Bucket cache on Persistent memory

2019-02-18 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771056#comment-16771056
 ] 

ramkrishna.s.vasudevan commented on HBASE-21874:


{quote}PmemIOEngine only overrides read() method to sign deserializers that 
memory type is shared.
{quote}
Yes, what you say is right. The heavy lifting was already done in HBASE-11425, 
so the change here is quite small, given that the bucket cache engines were 
already doing what was needed. We are also preparing a patch for multi-file 
support.

> Bucket cache on Persistent memory
> -
>
> Key: HBASE-21874
> URL: https://issues.apache.org/jira/browse/HBASE-21874
> Project: HBase
>  Issue Type: New Feature
>  Components: BucketCache
>Affects Versions: 3.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21874.patch, HBASE-21874.patch, 
> HBASE-21874_V2.patch, Pmem_BC.png
>
>
> Non volatile persistent memory devices are byte addressable like DRAM (for 
> eg. Intel DCPMM). Bucket cache implementation can take advantage of this new 
> memory type and can make use of the existing offheap data structures to serve 
> data directly from this memory area without having to bring the data to 
> onheap.
> The patch is a new IOEngine implementation that works with the persistent 
> memory.
> Note : Here we don't make use of the persistence nature of the device and 
> just make use of the big memory it provides.
> Performance numbers to follow. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21874) Bucket cache on Persistent memory

2019-02-16 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16770101#comment-16770101
 ] 

ramkrishna.s.vasudevan commented on HBASE-21874:


bq.Thus, main goal of setting "Direct Mode" here for now is not to use the 
persistence capabilities (although it's probably already working), but just 
have a mean to guarantee we use space from PMem device for caching (and not 
DRAM at all, which can't be guaranteed with "Memory Mode")

Right.

bq.So theoretically, we can already do this with the current FileMmapEngine, no?

Yes, theoretically correct, provided the file is on the pmem device. But 
FileMmapEngine assumes that if mmap cannot keep the file in DRAM, the block 
has to be copied onheap, so the entire block will be copied onheap. Our recent 
tests show that if we use it as an mmap-based file but on AEP, the copy is 
costlier, because we copy a 64K block from the AEP to onheap. So performance 
ends up lower than with what the pmem IOEngine does (see the sketch below).

Thanks [~wchevreuil].
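
To illustrate the copy being described, here is a toy sketch (not HBase code; 
an allocateDirect buffer stands in for the buffer mmapped over the pmem/AEP 
file) contrasting the exclusive copy-to-heap read with a shared, zero-copy 
view:
{code}
import java.nio.ByteBuffer;

public class CopyVsSharedSketch {
  static final int BLOCK = 64 * 1024;

  // "Exclusive" style: pull the 64K block out of the mapped region into a
  // fresh on-heap array -- every cache hit pays this copy.
  static byte[] readExclusive(ByteBuffer mapped, int offset) {
    byte[] onHeap = new byte[BLOCK];
    ByteBuffer src = mapped.duplicate();
    src.position(offset).limit(offset + BLOCK);
    src.get(onHeap);
    return onHeap;
  }

  // "Shared" style: hand out a read-only view over the same mapped bytes --
  // no copy, the block is served from the device-backed address space.
  static ByteBuffer readShared(ByteBuffer mapped, int offset) {
    ByteBuffer view = mapped.duplicate();
    view.position(offset).limit(offset + BLOCK);
    return view.slice().asReadOnlyBuffer();
  }

  public static void main(String[] args) {
    ByteBuffer mapped = ByteBuffer.allocateDirect(4 * BLOCK);
    byte[] copy = readExclusive(mapped, BLOCK);
    ByteBuffer shared = readShared(mapped, BLOCK);
    System.out.println(copy.length + " bytes copied vs "
        + shared.remaining() + " bytes shared in place");
  }
}
{code}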

> Bucket cache on Persistent memory
> -
>
> Key: HBASE-21874
> URL: https://issues.apache.org/jira/browse/HBASE-21874
> Project: HBase
>  Issue Type: New Feature
>  Components: BucketCache
>Affects Versions: 3.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21874.patch, HBASE-21874.patch, 
> HBASE-21874_V2.patch, Pmem_BC.png
>
>
> Non volatile persistent memory devices are byte addressable like DRAM (for 
> eg. Intel DCPMM). Bucket cache implementation can take advantage of this new 
> memory type and can make use of the existing offheap data structures to serve 
> data directly from this memory area without having to bring the data to 
> onheap.
> The patch is a new IOEngine implementation that works with the persistent 
> memory.
> Note : Here we don't make use of the persistence nature of the device and 
> just make use of the big memory it provides.
> Performance numbers to follow. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21874) Bucket cache on Persistent memory

2019-02-15 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769964#comment-16769964
 ] 

ramkrishna.s.vasudevan commented on HBASE-21874:


bq. Can you point the exact place in the patch where you control this?

We need not control this at the Java level; it is controlled at the OS level. 
These devices are configured with DAX (Direct Access mode) at the OS level.

As said in the link - here we use App Direct mode and not Memory mode. Memory 
mode does not give us control over where the cache or the address space 
resides; it may be in the DRAM or Pmem address space. But here we specifically 
ask our cache to reside only in the Pmem area, and once it is mapped into the 
Pmem address space everything is transparent to us.

 

> Bucket cache on Persistent memory
> -
>
> Key: HBASE-21874
> URL: https://issues.apache.org/jira/browse/HBASE-21874
> Project: HBase
>  Issue Type: New Feature
>  Components: BucketCache
>Affects Versions: 3.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21874.patch, HBASE-21874.patch, 
> HBASE-21874_V2.patch, Pmem_BC.png
>
>
> Non volatile persistent memory devices are byte addressable like DRAM (for 
> eg. Intel DCPMM). Bucket cache implementation can take advantage of this new 
> memory type and can make use of the existing offheap data structures to serve 
> data directly from this memory area without having to bring the data to 
> onheap.
> The patch is a new IOEngine implementation that works with the persistent 
> memory.
> Note : Here we don't make use of the persistence nature of the device and 
> just make use of the big memory it provides.
> Performance numbers to follow. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21874) Bucket cache on Persistent memory

2019-02-15 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16769598#comment-16769598
 ] 

ramkrishna.s.vasudevan commented on HBASE-21874:


{quote}Where are your going to keep bucket cache? Not in DRAM definitely, hence 
in NVDIMM (PMEM)?
{quote}
Yes. The cache will reside in Pmem only.
{quote}If you keep data in PMEM and use extended FileMmapIOEngine, where do yo 
you mmap it into? into DRAM? That is strange
{quote}
We use the extended FileMmapIOEngine, but the mmap does not do the memory 
mapping into DRAM; it mmaps into a different address space maintained by the 
NVDIMM. So even with less DRAM capacity, your data is still served from PMEM's 
address space. That is why you can use the SHARED mode in the IOEngine, 
whereas in a plain file mmap case you go with EXCLUSIVE, where you need to 
copy the content to onheap memory.
{quote}My question, regarding file system required on top PMEM has remained 
unanswered.

You rely on file system on top of PMEM
{quote}
Please check the description at [http://pmem.io.|http://pmem.io./]
 
NVDIMMs are only ever addressed as mmapped files, unlike DRAM where you 
directly access memory addresses (see the sketch after the links below).
{quote}You mmap PMEM resided file into RAM
{quote}
No, as explained previously.

Some related public links:

https://software.intel.com/en-us/blogs/2018/10/30/intel-optane-dc-persistent-memory-a-major-advance-in-memory-and-storage-architecture
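
As a rough illustration of "addressed as mmapped files" (a hypothetical 
sketch: the mount point, file name and size are made up, and a real cache 
bigger than 2 GB would need several mappings since FileChannel.map is limited 
to 2 GB per mapping):
{code}
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class PmemMmapSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical path: a file on a DAX-mounted persistent memory device.
    String pmemFile = "/mnt/pmem0/bucketcache";
    long cacheSize = 1024L * 1024 * 1024;   // 1 GB region, for example

    try (RandomAccessFile raf = new RandomAccessFile(pmemFile, "rw");
         FileChannel channel = raf.getChannel()) {
      // mmap the file; with DAX the mapping addresses the device directly,
      // so loads and stores go to persistent memory rather than to a DRAM
      // page cache.
      MappedByteBuffer mapped =
          channel.map(FileChannel.MapMode.READ_WRITE, 0, cacheSize);

      // The cache can now read and write blocks in place through the buffer.
      mapped.put(0, (byte) 42);
      System.out.println("first byte: " + mapped.get(0));
    }
  }
}
{code}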

> Bucket cache on Persistent memory
> -
>
> Key: HBASE-21874
> URL: https://issues.apache.org/jira/browse/HBASE-21874
> Project: HBase
>  Issue Type: New Feature
>  Components: BucketCache
>Affects Versions: 3.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21874.patch, HBASE-21874.patch, 
> HBASE-21874_V2.patch, Pmem_BC.png
>
>
> Non volatile persistent memory devices are byte addressable like DRAM (for 
> eg. Intel DCPMM). Bucket cache implementation can take advantage of this new 
> memory type and can make use of the existing offheap data structures to serve 
> data directly from this memory area without having to bring the data to 
> onheap.
> The patch is a new IOEngine implementation that works with the persistent 
> memory.
> Note : Here we don't make use of the persistence nature of the device and 
> just make use of the big memory it provides.
> Performance numbers to follow. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

