[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16399010#comment-16399010 ]

stack commented on HBASE-11425:
---

bq. We don't have any metric at the BBPool layer. That may be not needed too I believe.

Needed or not needed (smile). File issue if former?

> Cell/DBB end-to-end on the read-path
>
> Key: HBASE-11425
> URL: https://issues.apache.org/jira/browse/HBASE-11425
> Project: HBase
> Issue Type: Umbrella
> Components: regionserver, Scanners
> Reporter: Anoop Sam John
> Assignee: Anoop Sam John
> Priority: Major
> Fix For: 2.0.0
>
> Attachments: BenchmarkTestCode.zip, Benchmarks_Tests.docx, GC pics with evictions_4G heap.png, HBASE-11425-E2E-NotComplete.patch, HBASE-11425.patch, Offheap reads in HBase using BBs_V2.pdf, Offheap reads in HBase using BBs_final.pdf, Screen Shot 2015-10-16 at 5.13.22 PM.png, gc.png, gets.png, heap.png, load.png, median.png, ram.log
>
> Umbrella jira to make sure we can have blocks cached in offheap backed cache.
> In the entire read path, we can refer to this offheap buffer and avoid onheap copying.
> The high level items I can identify as of now are
> 1. Avoid the array() call on BB in read path.. (This is there in many classes. We can handle class by class)
> 2. Support Buffer based getter APIs in cell. In read path we will create a new Cell with backed by BB. Will need in CellComparator, Filter (like SCVF), CPs etc.
> 3. Avoid KeyValue.ensureKeyValue() calls in read path - This make byte copy.
> 4. Remove all CP hooks (which are already deprecated) which deal with KVs. (In read path)
> Will add subtasks under this.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
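Items 1 and 2 of the issue description (avoid array() on the BB, and read through buffer-based getters) can be illustrated with plain java.nio. The sketch below is not HBase code: the class BufferReadSketch and its methods compareTo and copyBytes are hypothetical names chosen for illustration. It only shows why array() is unusable on a direct (offheap) buffer and how absolute get(int) calls allow comparing bytes in place, with a copy made only when the caller explicitly asks for one.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of HBase.
class BufferReadSketch {

    // Compares buf[offset, offset+length) with `other` as unsigned bytes,
    // using absolute gets so it works for heap and direct buffers alike
    // and never touches buf.array().
    public static int compareTo(ByteBuffer buf, int offset, int length, byte[] other) {
        int n = Math.min(length, other.length);
        for (int i = 0; i < n; i++) {
            int a = buf.get(offset + i) & 0xff;
            int b = other[i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return length - other.length;
    }

    // Explicit onheap copy, for the cases where a byte[] is really required.
    // A duplicate is used so the caller's position and limit are untouched.
    public static byte[] copyBytes(ByteBuffer buf, int offset, int length) {
        byte[] out = new byte[length];
        ByteBuffer dup = buf.duplicate();
        dup.position(offset);
        dup.get(out, 0, length);
        return out;
    }

    public static void main(String[] args) {
        byte[] row = "row-0001".getBytes(StandardCharsets.UTF_8);

        // Stand-in for a cached block living in offheap memory.
        ByteBuffer block = ByteBuffer.allocateDirect(64);
        int offset = 16;
        for (int i = 0; i < row.length; i++) {
            block.put(offset + i, row[i]);
        }

        // A direct buffer has no accessible backing array, so array()
        // would throw UnsupportedOperationException here.
        System.out.println("hasArray: " + block.hasArray());

        // In-place comparison, no onheap copy of the row bytes.
        System.out.println("compare:  " + compareTo(block, offset, row.length, row));

        // Copy only on demand.
        System.out.println("copied:   "
            + new String(copyBytes(block, offset, row.length), StandardCharsets.UTF_8));
    }
}
```

The design point is that every consumer on the read path (comparators, filters, coprocessors) needs an (offset, length) view over the buffer rather than a byte[], which is why the description calls for buffer-based getters on Cell.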
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398935#comment-16398935 ]

Anoop Sam John commented on HBASE-11425:

Nothing new very specific to the off heaping as such. We have the hit ratio etc at BC layer. We don't have any metric at the BBPool layer. That may be not needed too I believe.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398748#comment-16398748 ]

stack commented on HBASE-11425:
---

Nice RN [~anoopsamjohn]. Do you have metrics or other attributes operators should watch to see how their offheap config is doing? Thanks.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374801#comment-16374801 ]

Anoop Sam John commented on HBASE-11425:

For the read-path off-heaping doc in our book, not many new things are required. There is not much new for the user to configure. The bucket cache off-heap mode was there before as well; all the changes were internal. Yes, we need to make sure the bucket cache usage material in the book is up to date, with a highlight encouraging use of the off-heap mode. It is even better than the on-heap cache (in the old days we called it the L1 cache). I think we made some changes as part of some of the jiras; need to verify again. On the write path, yes, some new info needs to be written. Let me raise some jiras for these.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374783#comment-16374783 ]

ramkrishna.s.vasudevan commented on HBASE-11425:

At least as far as I know, the read-path documentation is available as a blog, but not the write-path related things. I believe we may need to update it and write about that. Will read this blog once again to see whether the read-path offheap part is up to date or needs some update.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374689#comment-16374689 ]

stack commented on HBASE-11425:
---

[~anoopsamjohn] [~ram_krish] Is there doc on this feature and the write-side offheaping for users? Need it in the refguide even if it is just a paragraph. There is stuff here in Yu Li's blog... Is it stale? https://blogs.apache.org/hbase/entry/offheap-read-path-in-production
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15121247#comment-15121247 ]

stack commented on HBASE-11425:
---

Hurray!
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961754#comment-14961754 ]

Anoop Sam John commented on HBASE-11425:

bq. It should not OOME. If it does, its broke?

Agree fully. We need to fix this; it is a critical bug. Was just trying to understand the flow so that we can reproduce it easily here :-)
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961753#comment-14961753 ]

Anoop Sam John commented on HBASE-11425:

One doubt after seeing the logs: does the OOME come at a time when a compaction is happening? There was one open item we discussed about the shipped() call during compaction. This API gets called in the scan flow after we ship a set of rows back to the client, so all the blocks we came across during the scan, other than the current block, can be released (their ref counts are decremented). But during compaction this call does not happen at all; only a single close happens at the end. We can correct this. I have a quick patch for that. Would you be interested to see it and test with it once, boss?
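The shipped() behaviour described in this comment can be sketched as a small reference-counting model. This is an assumption-laden illustration, not HBase's implementation: ShippedSketch, Block and BlockScanner are hypothetical names, and the only thing taken from the thread is the rule that shipped() releases every block except the current one, while close() releases everything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the ref counting discussed above; not HBase code.
class ShippedSketch {

    // A cached block that may be evicted only while nobody references it.
    static final class Block {
        final String name;
        final AtomicInteger refCount = new AtomicInteger();

        Block(String name) {
            this.name = name;
        }

        boolean evictable() {
            return refCount.get() == 0;
        }
    }

    // A scanner that pins each block it visits until told otherwise.
    static final class BlockScanner {
        private final List<Block> previous = new ArrayList<>();
        private Block current;

        // Move to the next block, pinning it and remembering the old one.
        void moveTo(Block next) {
            next.refCount.incrementAndGet();
            if (current != null) {
                previous.add(current);
            }
            current = next;
        }

        // Called after a batch of rows has been handed to the caller:
        // every block except the current one is unpinned.
        void shipped() {
            for (Block b : previous) {
                b.refCount.decrementAndGet();
            }
            previous.clear();
        }

        // Unpins everything, including the current block.
        void close() {
            shipped();
            if (current != null) {
                current.refCount.decrementAndGet();
                current = null;
            }
        }

        int pinned() {
            return previous.size() + (current == null ? 0 : 1);
        }
    }

    public static void main(String[] args) {
        // Client scan: shipped() after each block keeps at most one block pinned.
        BlockScanner scan = new BlockScanner();
        for (int i = 0; i < 5; i++) {
            scan.moveTo(new Block("scan-" + i));
            scan.shipped();
        }
        System.out.println("scan pinned blocks: " + scan.pinned());
        scan.close();

        // Compaction-like flow: no shipped() until the final close(),
        // so every visited block stays pinned for the whole run.
        BlockScanner compaction = new BlockScanner();
        for (int i = 0; i < 5; i++) {
            compaction.moveTo(new Block("compact-" + i));
        }
        System.out.println("compaction pinned blocks: " + compaction.pinned());
        compaction.close();
    }
}
```

In this model a long-running reader that never calls shipped() keeps every block it has touched non-evictable until close(), which is the pattern the comment suspects during compaction.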
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961738#comment-14961738 ]

ramkrishna.s.vasudevan commented on HBASE-11425:

In the latest tests that we did, I ensured that after the loading was done we simply ran the YCSB read workloads, so that the block cache gets loaded and, at the same time, there are a lot of evictions, since the configured bucket cache is smaller than the available data.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961735#comment-14961735 ]

stack commented on HBASE-11425:
---

This is a different loading. This is me trying to run a suite of ycsb load and workloads but I'd forgotten to disable bucketcache. It should not OOME. If it does, its broke? See the attached image for what is holding objects.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961733#comment-14961733 ]

Anoop Sam John commented on HBASE-11425:

Is the log from the time of workload C (pure reads alone)? I ask because I can see a lot of flush and compaction logs in it. Can you give the exact steps to reproduce, boss? That will help us reproduce and fix it.

What we do is:
1. Start the cluster and pump in the whole data.
2. Stop the cluster so that the whole data is flushed and on disk.
3. Start the cluster again.
4. Do a full table scan so that the entire data is read once and cached in the BC.
5. Run YCSB workload C.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961355#comment-14961355 ]

stack commented on HBASE-11425:
---

My master branch is at this stage:

commit 1458798eb593358fe5415596b2958f2f7e451ea5
Author: stack
Date: Tue Oct 13 15:16:57 2015 -0700

HBASE-14600 Make #testWalRollOnLowReplication looser still
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961322#comment-14961322 ]

stack commented on HBASE-11425:
---

Seems easy for me to reproduce. Just happened again. Here is the log.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959007#comment-14959007 ]

Yu Li commented on HBASE-11425:
---

Thanks for the update [~anoop.hbase], the numbers look great, nice work! It's a pity that we have to wait until 2.0 is released to take advantage of this, maybe months away...
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958968#comment-14958968 ]

Anoop Sam John commented on HBASE-11425:

[~carp84] I have done a quick multi-get test using the PE tool with default settings (i.e. 1 GB of data): multi-get with 100 rows and 25 client threads, each doing the op 10 times.

Avg completion time for each thread:
On heap LRU Cache (L1): 9492 ms
Off heap Bucket Cache (L2): 9596 ms

So it is almost the same.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958674#comment-14958674 ]

ramkrishna.s.vasudevan commented on HBASE-11425:

I could run with even 2G of heap without OOME.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958587#comment-14958587 ]

ramkrishna.s.vasudevan commented on HBASE-11425:

Yes. We can. We will test once with LRU and bucket cache and see how much diff we have. Thanks.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958505#comment-14958505 ]

Yu Li commented on HBASE-11425:

From the "Offheap reads in HBase using BBs_V2.pdf" doc, we could see a fairly big perf gap between BucketCache and LRUCache, excerpt as below:

{noformat}
Multiple Gets with 25 threads
              AvgRT    95th   99th
BucketCache   111.81   130    133
LRUCache      23.49    34     39
{noformat}

But from the latest comments here it seems this data is out of date, right? Mind updating the doc with the latest perf data? Thanks.
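For readers unfamiliar with the AvgRT / 95th / 99th columns, the sketch below shows one common way such figures are derived from raw latency samples (mean plus nearest-rank percentiles). It is an assumption that the doc's numbers were computed this way; the class name and the sample data are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper, not taken from the benchmark code attached to the issue.
final class LatencySummary {

    /** Arithmetic mean of the samples. */
    static double average(long[] samplesMs) {
        long sum = 0;
        for (long s : samplesMs) {
            sum += s;
        }
        return (double) sum / samplesMs.length;
    }

    /** Nearest-rank percentile: the smallest sample with at least p percent of samples at or below it. */
    static long percentile(long[] samplesMs, int p) {
        long[] sorted = samplesMs.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] samples = {21, 25, 19, 40, 23, 22, 35, 24, 20, 26}; // made-up latencies in ms
        System.out.println("avg  = " + average(samples));
        System.out.println("95th = " + percentile(samples, 95));
        System.out.println("99th = " + percentile(samples, 99));
    }
}
```

Other percentile definitions (for example linear interpolation) give slightly different values on small sample sets, so comparing numbers across tools needs the same definition on both sides.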
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958409#comment-14958409 ]

ramkrishna.s.vasudevan commented on HBASE-11425:

Just checking the code bq. Or if it is fitting into the bucket cache, probably that was a file that was trying to get compacted and there was a reader referencing it, and due to the OOME the ref count decrement did not happen and the forceful eviction was failing. This should not be the case. Anyway, logs will be better. Trying to reproduce this case if possible.
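The failure mode discussed above (a reader's reference never being released after an error, so eviction keeps being refused) can be illustrated with a small sketch. This is not the BucketCache implementation; the class and method names are hypothetical, and the sketch ignores the race between a late retain and a concurrent free.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical illustration only, not HBase's BucketCache code.
final class RefCountedBlock {
    private final AtomicInteger readers = new AtomicInteger();
    private volatile boolean freed;

    void retain() {
        readers.incrementAndGet();
    }

    void release() {
        readers.decrementAndGet();
    }

    /** Frees the block only when no reader still refers to it. */
    boolean tryFree() {
        if (readers.get() > 0) {
            return false; // "still referred by N readers, can not be freed now"
        }
        freed = true;
        return true;
    }

    boolean isFreed() {
        return freed;
    }

    /**
     * Runs the reader while holding a reference. The finally block releases
     * the reference even if the reader throws, including Errors such as an
     * OutOfMemoryError.
     */
    static <T> T readWith(RefCountedBlock block, Supplier<T> reader) {
        block.retain();
        try {
            return reader.get();
        } finally {
            block.release();
        }
    }
}
```

If the decrement sat after the read instead of in a finally block, any exception or error thrown during the read would leave the count above zero and every later tryFree() call would be refused, which matches the symptom described in the thread.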
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958253#comment-14958253 ]

stack commented on HBASE-11425:

Workload c. Try 4g. I don't see why 4g should not be enough when 7//regions and 8g of offheap and all cache hits. I could be wrong. I have not dug in...
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958244#comment-14958244 ]

ramkrishna.s.vasudevan commented on HBASE-11425:

BTW - are you seeing this behaviour while doing a pure read workload, i.e. YCSB workload c?
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14958230#comment-14958230 ]

ramkrishna.s.vasudevan commented on HBASE-11425:

bq. Have you fellas run this for a while? I seem to OOME easy enough. I tried with a small heap, 4G, but it OOME'd then had to keep going back up to 16G again though I had a big offheap. I tried to capture the increasing use of heap but this diagram is best I got... I've not done heap analysis.. but in the diagram you can see heap use start to rise and then plummet... now I am crawling doing Full GCs every couple of seconds. Argh!!

We have run some read related workloads for an hour or so but did not observe any OOMEs. But our heap size was set at 32G. This was with YCSB, and we did not observe any full GCs. In the case of the PE tool we only used 9G heap size and 16G offheap size, but did not observe any fluctuations in the heap usage.

bq. 2015-10-14 16:51:10,058 DEBUG [main-BucketCacheWriter-2] bucket.BucketCache: This block 3f0157e7daee45fdb25202c496c95c46_1649898813 is still referred by 1 readers. Can not be freed now

Maybe because of the OOME some error handling is not done. Let us internally install a cluster to test this.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693077#comment-14693077 ]

Hudson commented on HBASE-11425:

FAILURE: Integrated in HBase-TRUNK #6717 (See [https://builds.apache.org/job/HBase-TRUNK/6717/])
HBASE-14188- Read path optimizations after HBASE-11425 profiling- (ramkrishna: rev aa3538f80278f5c0ba1bc8ca903066fa02ac79ec)
* hbase-server/src/test/java/org/apache/hadoop/hbase/filter/FilterAllFilter.java
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660142#comment-14660142 ]

Hudson commented on HBASE-11425:

FAILURE: Integrated in HBase-TRUNK #6701 (See [https://builds.apache.org/job/HBase-TRUNK/6701/])
HBASE-14188 - Read path optimizations after HBASE-11425 profiling (Ram) (ramkrishna: rev 7a9e10dc11877420c53245c403897d746bebc077)
* hbase-common/src/main/java/org/apache/hadoop/hbase/OffheapKeyValue.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderImpl.java
* hbase-common/src/test/java/org/apache/hadoop/hbase/TestOffheapKeyValue.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/filter/FilterAllFilter.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/SizeCachedNoTagsKeyValue.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/SizeCachedKeyValue.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/nio/MultiByteBuff.java
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585129#comment-14585129 ]

Anoop Sam John commented on HBASE-11425:

https://docs.google.com/document/d/1WHLYmccHw28itox4qdeXRgH5SZkt_zruSzngcmmBiUs/edit?usp=sharing
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14584761#comment-14584761 ]

stack commented on HBASE-11425:

[~anoopsamjohn] and [~ram_krish] as per chat this morning, post the original word doc and I'll stick it up in a shared google doc so we can all comment/bang on it... or if you are able, you post it as a google doc. Thanks lads.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523448#comment-14523448 ]

stack commented on HBASE-11425:

[~anoop.hbase] The write up is excellent -- especially the bit where you dumb it down listing out conclusion at end of each test and do up the table with all results. Suggest you do an edit, add a conclusion, and post the jmh files on this issue (I could not open them from the doc) rather than in the doc, and then post a note to dev list pointing at these findings since we are going to build on top of them going forward (we should put it on hbase blog?). Very nice.

Here are some comments on the doc that might help w/ the edit.

Suggest you add sentence after first on why we want to go offheap (this will make the doc 'standalone')

Change: "When we support E2E off heap support and in turn support Cell also backed by off heap memory, we have to make sure to select the best performing data structure/framework for this off heap storage." to "When we implement E2E off heap support, we have to make sure to select the best performing data structure/framework."

s/below test is/below tests are/
s/pros like, our PRC layer/pros such as our RPC layer already/
s/HDFS/an HDFS/

Change "But the NIO Buffer APIs have complaints over its performance and methods not inlineable" to But NIO ByteBuffers can be slow (boundary checks and/or some methods may not inline).

Change: "This make us to think for netty also as it seems better performing. (Really? We have test results below)" to "This makes us look to Netty ByteBuf as a possibly better performing alternative."

On first test, add sentence comparing difference between onheap and offheap runs (onheap looks 30% slower compared to onheap?) Do same comparing jdk8 to jdk7? (Hmm... yeah, would like to see the code... smile)

s/Similar way/Similarly/
s/The next test cmopare/The next test compares/
s/come almost/come out almost/
s/test what I/test that I/
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493832#comment-14493832 ]

ramkrishna.s.vasudevan commented on HBASE-11425:

We will update all the other things as you have said in the doc.

bq. You think it could be different now we have buffer reuse? We could save making a cellblock?

We can try once again with buffer reuse. But writing multiple cells individually to the socket was still the time-consuming factor, which was reduced when we were creating a cell block. Will come back to the other comments shortly. Thanks [~saint@gmail.com].
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493712#comment-14493712 ]

stack commented on HBASE-11425:

Took another read of the doc (and above comments on it).

# (Continuing from comments above), suggest adding to the doc estimation of how many extra objects will be made going this route and Vladimir's grid to show what you are focused on.
# Did you fellas have a look at how others do offheaping or if there were libs you could have made use of? Would have been good to include notes on your findings in here.
# The section on hasArray (if hasArray is false, it seems to imply hasByteBuffer is true) and the discussion of added APIs and when they come into effect and when they throw unsupported exceptions will need a rewrite in light of feedback above and review of recent patches (API method names I think we've cleaned up too).
# Sounds like you fellas looked at netty ByteBuf too. Add in your findings I'd say.
# Would have liked to have more detail around the RPC findings. You think it could be different now we have buffer reuse? We could save making a cellblock?
# Looking at diagrams for perf, I can't tell if more is better or not. Suggest you write up a summary of what the diagrams are showing.
# This feature when on, will be for whole server, right? Can't do by table or region, right?

Thanks lads.
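The hasArray point above, together with the issue's first item (avoid the array() call on BB in the read path), rests on standard java.nio behaviour: a direct (off heap) ByteBuffer reports hasArray() == false and array() throws UnsupportedOperationException, so code must read through the buffer's get methods instead. Below is a minimal sketch of a comparator written that way. The class name and method signature are hypothetical and are not the HBase Cell or CellComparator API.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration, not HBase code: compares byte ranges without
// ever calling array(), so it works for heap and direct buffers alike.
final class BufferCompare {

    /** Lexicographic, unsigned comparison of two ranges using absolute gets. */
    static int compare(ByteBuffer a, int aOff, int aLen,
                       ByteBuffer b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a.get(aOff + i) & 0xff; // absolute get: position is not changed
            int y = b.get(bOff + i) & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return aLen - bLen; // the shorter range sorts first when it is a prefix
    }

    public static void main(String[] args) {
        byte[] row = {'r', 'o', 'w', '1'};
        ByteBuffer heap = ByteBuffer.wrap(row);
        ByteBuffer direct = ByteBuffer.allocateDirect(row.length);
        direct.put(row);
        direct.flip();

        System.out.println(heap.hasArray());   // true
        System.out.println(direct.hasArray()); // false
        System.out.println(compare(heap, 0, 4, direct, 0, 4)); // 0
    }
}
```

A byte-at-a-time loop like this is the simplest form and is likely slower than an array-based comparison; it is shown only to make the array()-free access pattern concrete.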
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484678#comment-14484678 ]

Anoop Sam John commented on HBASE-11425:

Yes, I would say we deal with HBASE-10800 first and then the ServerCell and then come to this Jira.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483658#comment-14483658 ]

stack commented on HBASE-11425:

So, what is going on with this patch now? You want the CellComparator patch in first HBASE-10800? Let me look at ServerCell patch too.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391033#comment-14391033 ] ramkrishna.s.vasudevan commented on HBASE-11425: Also this patch would need certain public-facing APIs to be deprecated and new ones to be added. That can be done as a separate task too. > Cell/DBB end-to-end on the read-path > > > Key: HBASE-11425 > URL: https://issues.apache.org/jira/browse/HBASE-11425 > Project: HBase > Issue Type: Umbrella > Components: regionserver, Scanners >Affects Versions: 0.99.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Attachments: HBASE-11425-E2E-NotComplete.patch, HBASE-11425.patch, > Offheap reads in HBase using BBs_V2.pdf, Offheap reads in HBase using > BBs_final.pdf > > > Umbrella jira to make sure we can have blocks cached in offheap backed cache. > In the entire read path, we can refer to this offheap buffer and avoid onheap > copying. > The high level items I can identify as of now are > 1. Avoid the array() call on BB in read path.. (This is there in many > classes. We can handle class by class) > 2. Support Buffer based getter APIs in cell. In read path we will create a > new Cell with backed by BB. Will need in CellComparator, Filter (like SCVF), > CPs etc. > 3. Avoid KeyValue.ensureKeyValue() calls in read path - This make byte copy. > 4. Remove all CP hooks (which are already deprecated) which deal with KVs. > (In read path) > Will add subtasks under this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391030#comment-14391030 ] ramkrishna.s.vasudevan commented on HBASE-11425: Yes Stack. We have plans to split up the patch. But the most challenging part is how to split it up and in what order. A few tasks like the Cell API changes, Ref Counting, and the MultiByteBuffer class can all be added individually. But the rest are all intertwined. We can think of a strategy to do it. Thanks for having a look. We are working on the review comments in parallel too. > Cell/DBB end-to-end on the read-path > > > Key: HBASE-11425 > URL: https://issues.apache.org/jira/browse/HBASE-11425 > Project: HBase > Issue Type: Umbrella > Components: regionserver, Scanners >Affects Versions: 0.99.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Attachments: HBASE-11425-E2E-NotComplete.patch, HBASE-11425.patch, > Offheap reads in HBase using BBs_V2.pdf, Offheap reads in HBase using > BBs_final.pdf > > > Umbrella jira to make sure we can have blocks cached in offheap backed cache. > In the entire read path, we can refer to this offheap buffer and avoid onheap > copying. > The high level items I can identify as of now are > 1. Avoid the array() call on BB in read path.. (This is there in many > classes. We can handle class by class) > 2. Support Buffer based getter APIs in cell. In read path we will create a > new Cell with backed by BB. Will need in CellComparator, Filter (like SCVF), > CPs etc. > 3. Avoid KeyValue.ensureKeyValue() calls in read path - This make byte copy. > 4. Remove all CP hooks (which are already deprecated) which deal with KVs. > (In read path) > Will add subtasks under this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391000#comment-14391000 ] Anoop Sam John commented on HBASE-11425: Basically we have to split it into Sub patches.. This we posted so that there can be a look and understanding on the general approach.. I think you got to that :-) Any way we got some good feedback.. We are working on that.. The ServerCell stuff am doing. In a day more I can put up some thing so that we can know what is the changes... Pls stay tuned.. -Anoop- > Cell/DBB end-to-end on the read-path > > > Key: HBASE-11425 > URL: https://issues.apache.org/jira/browse/HBASE-11425 > Project: HBase > Issue Type: Umbrella > Components: regionserver, Scanners >Affects Versions: 0.99.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Attachments: HBASE-11425-E2E-NotComplete.patch, HBASE-11425.patch, > Offheap reads in HBase using BBs_V2.pdf, Offheap reads in HBase using > BBs_final.pdf > > > Umbrella jira to make sure we can have blocks cached in offheap backed cache. > In the entire read path, we can refer to this offheap buffer and avoid onheap > copying. > The high level items I can identify as of now are > 1. Avoid the array() call on BB in read path.. (This is there in many > classes. We can handle class by class) > 2. Support Buffer based getter APIs in cell. In read path we will create a > new Cell with backed by BB. Will need in CellComparator, Filter (like SCVF), > CPs etc. > 3. Avoid KeyValue.ensureKeyValue() calls in read path - This make byte copy. > 4. Remove all CP hooks (which are already deprecated) which deal with KVs. > (In read path) > Will add subtasks under this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390984#comment-14390984 ] stack commented on HBASE-11425: --- The patch is too big. I'm 7 pages in on a 13 page review. Could we piecemeal it? > Cell/DBB end-to-end on the read-path > > > Key: HBASE-11425 > URL: https://issues.apache.org/jira/browse/HBASE-11425 > Project: HBase > Issue Type: Umbrella > Components: regionserver, Scanners >Affects Versions: 0.99.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Attachments: HBASE-11425-E2E-NotComplete.patch, HBASE-11425.patch, > Offheap reads in HBase using BBs_V2.pdf, Offheap reads in HBase using > BBs_final.pdf > > > Umbrella jira to make sure we can have blocks cached in offheap backed cache. > In the entire read path, we can refer to this offheap buffer and avoid onheap > copying. > The high level items I can identify as of now are > 1. Avoid the array() call on BB in read path.. (This is there in many > classes. We can handle class by class) > 2. Support Buffer based getter APIs in cell. In read path we will create a > new Cell with backed by BB. Will need in CellComparator, Filter (like SCVF), > CPs etc. > 3. Avoid KeyValue.ensureKeyValue() calls in read path - This make byte copy. > 4. Remove all CP hooks (which are already deprecated) which deal with KVs. > (In read path) > Will add subtasks under this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388541#comment-14388541 ] Hadoop QA commented on HBASE-11425: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12708404/HBASE-11425.patch against master branch at commit f1f4b6618334767d0da0f47965309b21676e7e9f. ATTACHMENT ID: 12708404 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 246 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.1 2.5.2 2.6.0) {color:red}-1 javac{color}. The applied patch generated 63 javac compiler warnings (more than the master's current 46 warnings). {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 14 warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 2050 checkstyle errors (more than the master's current 1926 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. 
The patch introduces the following lines longer than 100: +ByteBufferUtils.copyFromBufferToByteArray(val, getValueBuffer(), getValueOffset(), 0, getValueLength()); +ByteBufferUtils.copyFromBufferToByteArray(fam, getFamilyBuffer(), getFamilyOffset(), 0, getFamilyLength()); +ByteBufferUtils.copyFromBufferToByteArray(qual, getQualifierBuffer(), getQualifierOffset(), 0, getQualifierLength()); +ByteBufferUtils.copyFromBufferToByteArray(row, getRowBuffer(), getRowOffset(), 0, getRowLength()); + ByteBufferUtils.copyFromBufferToByteArray(minimumMidpointArray, right, rightOffset, 0, diffIdx + 1); +ByteBufferUtils.copyFromBufferToByteArray(minimumMidpointArray, left, leftOffset, 0, diffIdx); +ByteBufferUtils.copyFromBufferToByteArray(minimumMidpointArray, right, rightOffset, 0, diffIdx + 1); +return matchingFamily(left, left.getFamilyOffset(), left.getFamilyLength(), buf, offset, length); + cell.getFamilyArray(), cell.getFamilyOffset(), cell.getFamilyLength(), cell.getQualifierArray(), + public static int findCommonPrefixInQualifierPart(Cell left, Cell right, int qualifierCommonPrefix) { {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/13505//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13505//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/13505//artifact/patchprocess/checkstyle-aggregate.html Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13505//artifact/patchprocess/patchJavadocWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/13505//console This message is automatically generated. 
> Cell/DBB end-to-end on the read-path > > > Key: HBASE-11425 > URL: https://issues.apache.org/jira/browse/HBASE-11425 > Project: HBase > Issue Type: Umbrella > Components: regionserver, Scanners >Affects Versions: 0.99.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Attachments: HBASE-11425-E2E-NotComplete.patch, HBASE-11425.patch, > Offheap reads in HBase using BBs_V2.pdf, Offheap reads in HBase using > BBs_final.pdf > > > Umbrella jira to make sure we can have blocks cached in offheap backed cache. > In the entire read path, we can refer to this offheap buffer and avoid onheap > copying. > The high level items I can identify as of now are > 1. Avoid the array() call on BB in read path.. (This is there in many > classes. We can handle class by class) > 2. Support Buffer based getter APIs in cell. In read path we will create a > new Cell with backed by BB. Will need in CellComparator, Filter (like SCVF), > CPs etc. > 3. Avoid KeyValue.ensureKeyValue() calls in read path - This make byte copy. > 4. Remove all CP hooks (which are already deprecated) which deal with KVs. > (In read path) > Will add subtasks under this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388355#comment-14388355 ] ramkrishna.s.vasudevan commented on HBASE-11425: RB link https://reviews.apache.org/r/32687/ > Cell/DBB end-to-end on the read-path > > > Key: HBASE-11425 > URL: https://issues.apache.org/jira/browse/HBASE-11425 > Project: HBase > Issue Type: Umbrella > Components: regionserver, Scanners >Affects Versions: 0.99.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Attachments: HBASE-11425-E2E-NotComplete.patch, HBASE-11425.patch, > Offheap reads in HBase using BBs_V2.pdf, Offheap reads in HBase using > BBs_final.pdf > > > Umbrella jira to make sure we can have blocks cached in offheap backed cache. > In the entire read path, we can refer to this offheap buffer and avoid onheap > copying. > The high level items I can identify as of now are > 1. Avoid the array() call on BB in read path.. (This is there in many > classes. We can handle class by class) > 2. Support Buffer based getter APIs in cell. In read path we will create a > new Cell with backed by BB. Will need in CellComparator, Filter (like SCVF), > CPs etc. > 3. Avoid KeyValue.ensureKeyValue() calls in read path - This make byte copy. > 4. Remove all CP hooks (which are already deprecated) which deal with KVs. > (In read path) > Will add subtasks under this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380193#comment-14380193 ] ramkrishna.s.vasudevan commented on HBASE-11425: Any comments/suggestions on the attached E2E patch? We are refining the patch to cover some more cases where we need to clearly distinguish the getXXXArray and getXXXBB methods. Suggestions on the above patch would help us greatly. > Cell/DBB end-to-end on the read-path > > > Key: HBASE-11425 > URL: https://issues.apache.org/jira/browse/HBASE-11425 > Project: HBase > Issue Type: Umbrella > Components: regionserver, Scanners >Affects Versions: 0.99.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Attachments: HBASE-11425-E2E-NotComplete.patch, Offheap reads in > HBase using BBs_V2.pdf, Offheap reads in HBase using BBs_final.pdf > > > Umbrella jira to make sure we can have blocks cached in offheap backed cache. > In the entire read path, we can refer to this offheap buffer and avoid onheap > copying. > The high level items I can identify as of now are > 1. Avoid the array() call on BB in read path.. (This is there in many > classes. We can handle class by class) > 2. Support Buffer based getter APIs in cell. In read path we will create a > new Cell with backed by BB. Will need in CellComparator, Filter (like SCVF), > CPs etc. > 3. Avoid KeyValue.ensureKeyValue() calls in read path - This make byte copy. > 4. Remove all CP hooks (which are already deprecated) which deal with KVs. > (In read path) > Will add subtasks under this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367500#comment-14367500 ] ramkrishna.s.vasudevan commented on HBASE-11425: Not able to add to RB. The RB tool hangs when we try to add a patch. > Cell/DBB end-to-end on the read-path > > > Key: HBASE-11425 > URL: https://issues.apache.org/jira/browse/HBASE-11425 > Project: HBase > Issue Type: Umbrella > Components: regionserver, Scanners >Affects Versions: 0.99.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Attachments: HBASE-11425-E2E-NotComplete.patch, Offheap reads in > HBase using BBs_V2.pdf, Offheap reads in HBase using BBs_final.pdf > > > Umbrella jira to make sure we can have blocks cached in offheap backed cache. > In the entire read path, we can refer to this offheap buffer and avoid onheap > copying. > The high level items I can identify as of now are > 1. Avoid the array() call on BB in read path.. (This is there in many > classes. We can handle class by class) > 2. Support Buffer based getter APIs in cell. In read path we will create a > new Cell with backed by BB. Will need in CellComparator, Filter (like SCVF), > CPs etc. > 3. Avoid KeyValue.ensureKeyValue() calls in read path - This make byte copy. > 4. Remove all CP hooks (which are already deprecated) which deal with KVs. > (In read path) > Will add subtasks under this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355488#comment-14355488 ] Vladimir Rodionov commented on HBASE-11425: --- {quote} I think No. 4 (DBE - OFF, block compression ON, byte array backed) is going to be faster than No. 5 (DBE - ON, block compression OFF , byte buffer - backed) in any benchmark. {quote} in *almost* any. Scan with lots of skips will be faster when block compression is off, I think. > Cell/DBB end-to-end on the read-path > > > Key: HBASE-11425 > URL: https://issues.apache.org/jira/browse/HBASE-11425 > Project: HBase > Issue Type: Umbrella > Components: regionserver, Scanners >Affects Versions: 0.99.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Attachments: Offheap reads in HBase using BBs_final.pdf > > > Umbrella jira to make sure we can have blocks cached in offheap backed cache. > In the entire read path, we can refer to this offheap buffer and avoid onheap > copying. > The high level items I can identify as of now are > 1. Avoid the array() call on BB in read path.. (This is there in many > classes. We can handle class by class) > 2. Support Buffer based getter APIs in cell. In read path we will create a > new Cell with backed by BB. Will need in CellComparator, Filter (like SCVF), > CPs etc. > 3. Avoid KeyValue.ensureKeyValue() calls in read path - This make byte copy. > 4. Remove all CP hooks (which are already deprecated) which deal with KVs. > (In read path) > Will add subtasks under this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355375#comment-14355375 ] Anoop Sam John commented on HBASE-11425: Having block data in compressed form in the BC is optional thing. In such a case, yes, have to decompress first and at that time, it can be to a byte array backed BB. We are not trying to change that. The change is when the data is cached in the non compressed form (But can be in the DBE form). Then avoiding need for copy. The block can be backed by N offheap buckets. Cells are made out of that. And cells are backed by buffers rather than byte[] then > Cell/DBB end-to-end on the read-path > > > Key: HBASE-11425 > URL: https://issues.apache.org/jira/browse/HBASE-11425 > Project: HBase > Issue Type: Umbrella > Components: regionserver, Scanners >Affects Versions: 0.99.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Attachments: Offheap reads in HBase using BBs_final.pdf > > > Umbrella jira to make sure we can have blocks cached in offheap backed cache. > In the entire read path, we can refer to this offheap buffer and avoid onheap > copying. > The high level items I can identify as of now are > 1. Avoid the array() call on BB in read path.. (This is there in many > classes. We can handle class by class) > 2. Support Buffer based getter APIs in cell. In read path we will create a > new Cell with backed by BB. Will need in CellComparator, Filter (like SCVF), > CPs etc. > 3. Avoid KeyValue.ensureKeyValue() calls in read path - This make byte copy. > 4. Remove all CP hooks (which are already deprecated) which deal with KVs. > (In read path) > Will add subtasks under this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
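Note on the comment above: the idea of cells "backed by buffers rather than byte[]" can be sketched in plain Java. This is only an illustration, assuming a direct ByteBuffer as a stand-in for an offheap bucket; the class and method names (BufferBackedCompare, compare) are hypothetical and are not taken from the HBase patch. It compares two byte ranges with absolute get() calls, so array() is never needed and nothing is copied on heap.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not HBase code: compares byte ranges inside ByteBuffers.
final class BufferBackedCompare {

    // Lexicographic, unsigned comparison of two ranges.
    // Uses absolute get(index), so buffer positions are untouched and
    // the code works for direct (offheap) buffers that have no backing array.
    static int compare(ByteBuffer left, int lOff, int lLen,
                       ByteBuffer right, int rOff, int rLen) {
        int common = Math.min(lLen, rLen);
        for (int i = 0; i < common; i++) {
            int a = left.get(lOff + i) & 0xff;
            int b = right.get(rOff + i) & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return lLen - rLen;
    }

    public static void main(String[] args) {
        // A direct buffer stands in for one offheap bucket holding two row keys.
        ByteBuffer block = ByteBuffer.allocateDirect(16);
        block.put("row-arow-b".getBytes(StandardCharsets.US_ASCII));

        // Direct buffers expose no array(), which is why array() calls must go.
        System.out.println("hasArray = " + block.hasArray());
        System.out.println("row-a vs row-b < 0: " + (compare(block, 0, 5, block, 5, 5) < 0));
    }
}
```

The same loop works unchanged for heap buffers, which is one reason a buffer-based getter can serve both cases.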
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356461#comment-14356461 ] Anoop Sam John commented on HBASE-11425: You mean for all these case, the data is already cached and come from BC? Or direct read from DFS? > Cell/DBB end-to-end on the read-path > > > Key: HBASE-11425 > URL: https://issues.apache.org/jira/browse/HBASE-11425 > Project: HBase > Issue Type: Umbrella > Components: regionserver, Scanners >Affects Versions: 0.99.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Attachments: Offheap reads in HBase using BBs_final.pdf > > > Umbrella jira to make sure we can have blocks cached in offheap backed cache. > In the entire read path, we can refer to this offheap buffer and avoid onheap > copying. > The high level items I can identify as of now are > 1. Avoid the array() call on BB in read path.. (This is there in many > classes. We can handle class by class) > 2. Support Buffer based getter APIs in cell. In read path we will create a > new Cell with backed by BB. Will need in CellComparator, Filter (like SCVF), > CPs etc. > 3. Avoid KeyValue.ensureKeyValue() calls in read path - This make byte copy. > 4. Remove all CP hooks (which are already deprecated) which deal with KVs. > (In read path) > Will add subtasks under this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356459#comment-14356459 ] Anoop Sam John commented on HBASE-11425: You are talking about the case hbase.block.data.cachecompressed = true, right? Yes, this recently added feature allows keeping block data in compressed form in the BC. When such a block is read from the BC, this happens. Step 1: We create a new on-heap buffer and copy the compressed data from the buckets into it, then make an HFileBlock backed by this compressed data. Step 2: We unpack this block. For that we create a new byte[] with size equal to the uncompressed data size of the block, and the compression algorithm decompresses the block data into this new buffer. With our changes we avoid the new buffer and the copy in Step 1. We create a block backed by an MBB, and we have a new InputStream over the MBB that we pass to the decompressor. Yes, Step 2 will still be there. So there is an advantage here also. Does that make sense? > Cell/DBB end-to-end on the read-path > > > Key: HBASE-11425 > URL: https://issues.apache.org/jira/browse/HBASE-11425 > Project: HBase > Issue Type: Umbrella > Components: regionserver, Scanners >Affects Versions: 0.99.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Attachments: Offheap reads in HBase using BBs_final.pdf > > > Umbrella jira to make sure we can have blocks cached in offheap backed cache. > In the entire read path, we can refer to this offheap buffer and avoid onheap > copying. > The high level items I can identify as of now are > 1. Avoid the array() call on BB in read path.. (This is there in many > classes. We can handle class by class) > 2. Support Buffer based getter APIs in cell. In read path we will create a > new Cell with backed by BB. Will need in CellComparator, Filter (like SCVF), > CPs etc. > 3. Avoid KeyValue.ensureKeyValue() calls in read path - This make byte copy. > 4. Remove all CP hooks (which are already deprecated) which deal with KVs. 
> (In read path) > Will add subtasks under this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
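Note on the comment above: the proposed change to the compressed-block path (skip the Step 1 on-heap copy, keep the Step 2 decompression) can be sketched with the JDK alone. This is a rough illustration under stated assumptions, not the patch itself: MultiBufferInputStream is a hypothetical stand-in for the "InputStream over MBB", two direct ByteBuffers stand in for the offheap buckets, and java.util.zip stands in for whatever compression codec the block actually uses.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.InflaterInputStream;

// Hypothetical: reads sequentially across several buffers without merging them.
final class MultiBufferInputStream extends InputStream {
    private final ByteBuffer[] buffers;
    private int current = 0;

    MultiBufferInputStream(ByteBuffer... source) {
        buffers = new ByteBuffer[source.length];
        for (int i = 0; i < source.length; i++) {
            buffers[i] = source[i].duplicate(); // own position, shared bytes, no copy
        }
    }

    // Skips exhausted buffers; returns false once everything has been read.
    private boolean advance() {
        while (current < buffers.length && !buffers[current].hasRemaining()) {
            current++;
        }
        return current < buffers.length;
    }

    @Override
    public int read() {
        return advance() ? (buffers[current].get() & 0xff) : -1;
    }

    @Override
    public int read(byte[] dst, int off, int len) {
        if (len == 0) return 0;
        if (!advance()) return -1;
        int n = Math.min(len, buffers[current].remaining());
        buffers[current].get(dst, off, n);
        return n;
    }
}

final class CachedCompressedBlockDemo {
    // Test helper: produces compressed bytes to play the role of a cached block.
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] tmp = new byte[64];
        while (!deflater.finished()) {
            out.write(tmp, 0, deflater.deflate(tmp));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Splits the compressed bytes over two direct buffers ("N offheap buckets").
    static ByteBuffer[] toDirectChunks(byte[] data) {
        int half = data.length / 2;
        ByteBuffer first = ByteBuffer.allocateDirect(half);
        first.put(data, 0, half);
        first.flip();
        ByteBuffer second = ByteBuffer.allocateDirect(data.length - half);
        second.put(data, half, data.length - half);
        second.flip();
        return new ByteBuffer[] {first, second};
    }

    // Step 2 only: decompress straight from the buckets into an on-heap array.
    static byte[] unpack(ByteBuffer... chunks) {
        try (InputStream in = new InflaterInputStream(new MultiBufferInputStream(chunks))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] tmp = new byte[64];
            int n;
            while ((n = in.read(tmp)) != -1) {
                out.write(tmp, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = "row1/cf:q1/value1 row2/cf:q1/value2".getBytes(StandardCharsets.US_ASCII);
        byte[] restored = unpack(toDirectChunks(compress(raw)));
        System.out.println(new String(restored, StandardCharsets.US_ASCII));
    }
}
```

The decompressed output still lands in a fresh on-heap array, matching the remark that Step 2 stays; only the intermediate copy of the compressed bytes goes away.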
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356290#comment-14356290 ] stack commented on HBASE-11425: --- [~ram_krish] On this "But I would argue not do that because then the user would have two types of Cells - one on the write path and the other cell on the read path.", expecting clients to use BB in a wonky way is not on (smile). I think [~anoop.hbase] cleaned up what is meant. > Cell/DBB end-to-end on the read-path > > > Key: HBASE-11425 > URL: https://issues.apache.org/jira/browse/HBASE-11425 > Project: HBase > Issue Type: Umbrella > Components: regionserver, Scanners >Affects Versions: 0.99.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Attachments: Offheap reads in HBase using BBs_final.pdf > > > Umbrella jira to make sure we can have blocks cached in offheap backed cache. > In the entire read path, we can refer to this offheap buffer and avoid onheap > copying. > The high level items I can identify as of now are > 1. Avoid the array() call on BB in read path.. (This is there in many > classes. We can handle class by class) > 2. Support Buffer based getter APIs in cell. In read path we will create a > new Cell with backed by BB. Will need in CellComparator, Filter (like SCVF), > CPs etc. > 3. Avoid KeyValue.ensureKeyValue() calls in read path - This make byte copy. > 4. Remove all CP hooks (which are already deprecated) which deal with KVs. > (In read path) > Will add subtasks under this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356281#comment-14356281 ] Anoop Sam John commented on HBASE-11425: bq.We discussed on that ServerCell concepts. But I would argue not do that because then the user would have two types of Cells - one on the write path and the other cell on the read path. I would say that would make things more complex and not much ease of use too. No here what I mean is the Cell interface wont change and at client side, the user will still interact with Cell. ServerCell is an extension for Cell which is only at server side. Both in read and write paths. Only thing is that the CP and Filter will get a new type then. ServerCell instead of Cell. > Cell/DBB end-to-end on the read-path > > > Key: HBASE-11425 > URL: https://issues.apache.org/jira/browse/HBASE-11425 > Project: HBase > Issue Type: Umbrella > Components: regionserver, Scanners >Affects Versions: 0.99.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Attachments: Offheap reads in HBase using BBs_final.pdf > > > Umbrella jira to make sure we can have blocks cached in offheap backed cache. > In the entire read path, we can refer to this offheap buffer and avoid onheap > copying. > The high level items I can identify as of now are > 1. Avoid the array() call on BB in read path.. (This is there in many > classes. We can handle class by class) > 2. Support Buffer based getter APIs in cell. In read path we will create a > new Cell with backed by BB. Will need in CellComparator, Filter (like SCVF), > CPs etc. > 3. Avoid KeyValue.ensureKeyValue() calls in read path - This make byte copy. > 4. Remove all CP hooks (which are already deprecated) which deal with KVs. > (In read path) > Will add subtasks under this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
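Note on the comment above: the ServerCell idea (the client keeps seeing Cell, while server-side code such as filters and CPs gets an extension with buffer access) might look roughly like the sketch below. Everything here is an assumption for illustration: the interfaces are cut down to the value part only, and getValueBufferOffset, OffheapCell and ValueFilter are hypothetical names, not the API that was eventually committed.

```java
import java.nio.ByteBuffer;

// Simplified stand-in for the client-visible interface (value part only).
interface Cell {
    byte[] getValueArray();
    int getValueOffset();
    int getValueLength();
}

// Hypothetical server-side extension: adds buffer-based access on top of Cell.
interface ServerCell extends Cell {
    ByteBuffer getValueBuffer();
    int getValueBufferOffset();
}

// A cell whose value lives in a (possibly offheap) buffer.
final class OffheapCell implements ServerCell {
    private final ByteBuffer buf;
    private final int off;
    private final int len;

    OffheapCell(ByteBuffer buf, int off, int len) {
        this.buf = buf;
        this.off = off;
        this.len = len;
    }

    @Override public ByteBuffer getValueBuffer() { return buf; }
    @Override public int getValueBufferOffset() { return off; }
    @Override public int getValueLength() { return len; }

    // Array getter still works for code that only knows Cell, at the cost of a copy.
    @Override public byte[] getValueArray() {
        byte[] copy = new byte[len];
        ByteBuffer view = buf.duplicate();
        view.position(off);
        view.get(copy, 0, len);
        return copy;
    }

    // The copy above starts at index 0.
    @Override public int getValueOffset() { return 0; }
}

// Filter-like consumer: takes the copy-free path when the cell offers one.
final class ValueFilter {
    static boolean firstByteIs(Cell cell, byte expected) {
        if (cell.getValueLength() == 0) {
            return false;
        }
        if (cell instanceof ServerCell) {
            ServerCell sc = (ServerCell) cell;
            return sc.getValueBuffer().get(sc.getValueBufferOffset()) == expected;
        }
        return cell.getValueArray()[cell.getValueOffset()] == expected;
    }
}
```

In this shape, existing callers of the array getters keep working, while server-side code that checks for the extension avoids the copy.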
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356187#comment-14356187 ] ramkrishna.s.vasudevan commented on HBASE-11425: bq.Would be good to get some answers on his questions too. I could take that IA. Get some tests in this area to be more clear on this. bq.If only for illustration of where you are focused, suggest you add to the doc Sure. Would also ensure the other points that were specifically discussed gets added to the doc. > Cell/DBB end-to-end on the read-path > > > Key: HBASE-11425 > URL: https://issues.apache.org/jira/browse/HBASE-11425 > Project: HBase > Issue Type: Umbrella > Components: regionserver, Scanners >Affects Versions: 0.99.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Attachments: Offheap reads in HBase using BBs_final.pdf > > > Umbrella jira to make sure we can have blocks cached in offheap backed cache. > In the entire read path, we can refer to this offheap buffer and avoid onheap > copying. > The high level items I can identify as of now are > 1. Avoid the array() call on BB in read path.. (This is there in many > classes. We can handle class by class) > 2. Support Buffer based getter APIs in cell. In read path we will create a > new Cell with backed by BB. Will need in CellComparator, Filter (like SCVF), > CPs etc. > 3. Avoid KeyValue.ensureKeyValue() calls in read path - This make byte copy. > 4. Remove all CP hooks (which are already deprecated) which deal with KVs. > (In read path) > Will add subtasks under this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356185#comment-14356185 ] ramkrishna.s.vasudevan commented on HBASE-11425: Adding to what Anoop says, we are not trying to focus on where we do compression. That part remains the same. It is only after we start using an Hfileblock we tend to use it in a offheap mode and particularly avoid the copy that happens in the BucketCache every time a block needs to be used. (In this case it is going to be a decompressed block only). But one point to note here is that in the case of DBE it is an encoded block - and we still go on with the encoded block only and the existing logic of decoding the block still works the same way. In the existing code there are two copies that happen here - one from the BucketCache and other in the DBE algo. Now we try to avoid the first one. > Cell/DBB end-to-end on the read-path > > > Key: HBASE-11425 > URL: https://issues.apache.org/jira/browse/HBASE-11425 > Project: HBase > Issue Type: Umbrella > Components: regionserver, Scanners >Affects Versions: 0.99.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Attachments: Offheap reads in HBase using BBs_final.pdf > > > Umbrella jira to make sure we can have blocks cached in offheap backed cache. > In the entire read path, we can refer to this offheap buffer and avoid onheap > copying. > The high level items I can identify as of now are > 1. Avoid the array() call on BB in read path.. (This is there in many > classes. We can handle class by class) > 2. Support Buffer based getter APIs in cell. In read path we will create a > new Cell with backed by BB. Will need in CellComparator, Filter (like SCVF), > CPs etc. > 3. Avoid KeyValue.ensureKeyValue() calls in read path - This make byte copy. > 4. Remove all CP hooks (which are already deprecated) which deal with KVs. > (In read path) > Will add subtasks under this. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
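The difference between copying a cached block and referring to it in place can be sketched with plain java.nio. This is only an illustration under stated assumptions: the class and method names are hypothetical, the direct buffer merely stands in for a BucketCache bucket, and none of this is HBase code.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration; the direct buffer stands in for an offheap cache bucket.
final class BlockViewSketch {
    // Copying path: the block bytes are copied into a fresh on-heap array on every use.
    static byte[] copyToHeap(ByteBuffer cache, int offset, int length) {
        byte[] onHeap = new byte[length];
        ByteBuffer view = cache.duplicate(); // own position/limit, shared content
        view.position(offset);
        view.get(onHeap, 0, length);
        return onHeap;
    }

    // Zero-copy path: return a view over the same offheap memory.
    static ByteBuffer viewInPlace(ByteBuffer cache, int offset, int length) {
        ByteBuffer view = cache.duplicate();
        view.position(offset);
        view.limit(offset + length);
        return view.slice(); // index 0 of the slice is cache[offset]
    }
}
```

The copy is isolated from later changes to the cache memory, while the view is not. That sharing is presumably why the thread goes on to discuss reference counting before eviction.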
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356182#comment-14356182 ]

ramkrishna.s.vasudevan commented on HBASE-11425:

bq. Should bucket size be same as the hfile block size?
Yes. That would be better in many cases; however, the odd block may go beyond the HFile block size.

bq. Can MBB be developed in isolation with tests and refcounting tests apart from main code base? Is that being done?
We need some tests for the refcounting part. Apart from that, they can be individual tasks, as Anoop says.

Regarding the BB and comparators having two paths: that would be the ideal way according to the profiler reports. All the KVs coming from the HFiles are buffer-backed cells, but the cells in the memstore are byte[] backed. So, as mentioned in the doc, if we create only BB-based rows, families and qualifiers, we may have to wrap these byte[]s, which is a costly operation. Also, when creating fake keys it is always better to create them in byte[] rather than in BB, because for BBs we have to do some allocation and then copy the contents. All of this is costly. Hence, when we create a fake key and compare it against a key from an HFile, we have two kinds of cells, one backed by byte[] and another by BB. So if/else based comparisons would be better.

Regarding the Unsafe comparators: they are just the same as the ones for byte[] now.

bq. So, you might want to underline this point. Its BB but WE are managing the position and length to save on object creation and to bypass BB range checking, etc.
Yes. That is the important decision that we had to make. One objective is to reduce object creation and another is to use the same APIs for offset and length.

bq. Client won't be offheaping? If so, could the BB APIs be mixed in to Cell on the server only?
We discussed the ServerCell concept. But I would argue against it, because then the user would have two types of Cells, one on the write path and another on the read path. I would say that makes things more complex and no easier to use. I will try to make a trunk based patch and upload it for reference.
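A minimal sketch of the if/else comparison described above, where a byte[] backed key (for example a fake key or a memstore cell) is compared against a buffer-backed key from an HFile block without wrapping or copying either side. The class and method names are hypothetical and the loops are plain byte-by-byte comparisons, not the Unsafe-based comparators mentioned in the thread.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of a two-path comparator; not the HBase CellComparator.
final class MixedComparatorSketch {
    // Path 1: both sides are arrays. Bytes are compared as unsigned values.
    static int compare(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int d = (a[aOff + i] & 0xff) - (b[bOff + i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return aLen - bLen;
    }

    // Path 2: array against buffer. No wrapper object is allocated for the array,
    // and the buffer is read with absolute gets so its position is never touched.
    static int compare(byte[] a, int aOff, int aLen, ByteBuffer b, int bOff, int bLen) {
        if (b.hasArray()) {
            // Heap buffer: fall back to the array path on its backing array.
            return compare(a, aOff, aLen, b.array(), b.arrayOffset() + bOff, bLen);
        }
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int d = (a[aOff + i] & 0xff) - (b.get(bOff + i) & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return aLen - bLen;
    }
}
```

The branch on hasArray() is the switch discussed in the thread: the array path stays exactly as it is today, and only buffer-backed data takes the second path.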
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355518#comment-14355518 ]

Vladimir Rodionov commented on HBASE-11425:

{quote}
BigBase open source Vladimir?
{quote}
Yes, but it is not Apache. Yet.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355502#comment-14355502 ]

stack commented on HBASE-11425:

I like [~vrodionov]'s grid. If only to illustrate where you are focused, suggest you add it to the doc, [~anoop.hbase]. Would be good to get some answers to his questions too.

BigBase open source, Vladimir?
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355497#comment-14355497 ]

stack commented on HBASE-11425:

bq. I wanted to come to this topic. This is hard coded now. Can this be made configurable? If this can be larger value,like block size, better as per our changes
Yeah, configurable sounds right, but thinking on it more, rare will be the case that the fuzzy hfile block will fit into the BC hard-coded block.

bq. May be that should be tried after a delay?
Would this be a new block type? One that is not backed by BC? No on delay. Flag that it is happening and move on. It'd be an offheap allocation I suppose, so that will be delay enough (smile).

bq. Am I making it clear?
Yes.

bq. Some thing like a ServerCell which extend Cell?
We can't give users an API that has you get data from a BB but you need to use the enclosing Cell to figure where to read from and how much. Users will kick us out!

bq. Yes the extra BB wrapper which has to be created every time one calls getXXXArray().
Would be interesting to see the cost. Would be sweet if there were only one read path... but I'd imagine the perf difference will be too great, so we'll have to have two.

bq. It reads data from BB bypassing the BB APIs.
Add this to the doc.

Thanks.
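The API shape being debated here, where the buffer comes from one getter and the offset and length come from the enclosing cell, can be sketched as follows. The class and its methods are hypothetical stand-ins modelled on the getXXXArray()/getXXXOffset()/getXXXLength() naming used in the thread; this is not the HBase Cell interface.

```java
import java.nio.ByteBuffer;

// Hypothetical buffer-backed cell holding only a value; not the HBase Cell API.
final class BufferBackedCellSketch {
    private final ByteBuffer buf;
    private final int valueOffset;
    private final int valueLength;

    BufferBackedCellSketch(ByteBuffer buf, int valueOffset, int valueLength) {
        this.buf = buf;
        this.valueOffset = valueOffset;
        this.valueLength = valueLength;
    }

    boolean hasArray() {
        return false; // buffer-backed, whether the buffer is direct or on heap
    }

    byte[] getValueArray() {
        throw new UnsupportedOperationException("cell is backed by a ByteBuffer");
    }

    // The returned buffer is shared: its position and limit carry no meaning.
    ByteBuffer getValueBuffer() {
        return buf;
    }

    int getValueOffset() {
        return valueOffset;
    }

    int getValueLength() {
        return valueLength;
    }

    // A caller needs both the buffer and the cell to find the value bytes.
    static byte[] copyValue(BufferBackedCellSketch cell) {
        ByteBuffer b = cell.getValueBuffer();
        byte[] out = new byte[cell.getValueLength()];
        for (int i = 0; i < out.length; i++) {
            out[i] = b.get(cell.getValueOffset() + i); // absolute get, position untouched
        }
        return out;
    }
}
```

No per-cell buffer object is created, which is the saving the authors are after. The cost is the usability concern raised above: the buffer's own position and limit tell the caller nothing.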
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355484#comment-14355484 ]

Vladimir Rodionov commented on HBASE-11425:

{quote}
Would be cool if decompress could be done with native code before we brought the block into BC.
{quote}
bigbase.org does that :). Block cache compression is all native. Unfortunately, I do not have time to continue working on this project now; maybe in the near future.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355478#comment-14355478 ]

Vladimir Rodionov commented on HBASE-11425:

{quote}
The change is when the data is cached in the non compressed form (But can be in the DBE form)
{quote}
[~anoop.hbase], if the goal of this JIRA is performance improvement, have you estimated the following scenarios:
# DBE - ON, block compression - OFF (byte array end-to-end - BA)
# DBE - OFF, block compression - OFF (BA)
# DBE - ON, block compression - ON (BA)
# DBE - OFF, block compression - ON (BA)
# DBE - ON, block compression - OFF (byte buffer end-to-end - BB)
# DBE - OFF, block compression - OFF (BB)
# DBE - ON, block compression - ON (BB)
# DBE - OFF, block compression - ON (BB)

You optimize only use case No. 5. Do you think it is going to be faster than any of the first four (BA)? People like compression, especially if it does not affect benchmark performance too much and helps them with their application. Essentially, you advise users not to use compression and to use only DBE, but all scan operations and get/multi-get take a performance hit with DBE enabled. I think No. 4 (DBE - OFF, block compression - ON, byte array backed) is going to be faster than No. 5 (DBE - ON, block compression - OFF, byte buffer backed) in any benchmark.

These are my words of caution: before you start such a large project, make sure that the benefits you are hoping for are really possible.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355465#comment-14355465 ]

stack commented on HBASE-11425:

[~vrodionov] Sort of. This effort takes us further along a couple of paths. In the foreground is being able to have most of the data offheap when reading. An ancillary benefit is proving that an alternate Cell implementation is possible, one other than KeyValue. After the lads have put the above behind us, we can move on to the next interesting challenges, for example a PrefixTree Cell implementation that keeps the key and value encoded/compressed as we traverse the read path.

Regarding the particular point you raise: yeah, we would currently have to decompress to put this read path on top of it. Would be cool if decompress could be done with native code before we brought the block into BC. TODO
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355463#comment-14355463 ]

Anoop Sam John commented on HBASE-11425:

bq. This ain't right, is it? Usually we have folks hover just below 32G so can do compressed pointers.
I think I have seen in mails that some users have 48G as well, at least some users who were trying PoCs (I met them offline). Anyway, I can change it to ~32G. :-)

bq. Should bucket size be same as the hfile block size?
I wanted to come to this topic. This is hard coded now. Can this be made configurable? If it can be a larger value, like the block size, that is better for our changes.

bq. Can MBB be developed in isolation with tests and refcounting tests apart from main code base? Is that being done?
Yep. When we put up patches, we can make sure to do it this way. Those will be in subtasks.

bq. The eviction is now made more complicated because have to check for non-zero refcount? And what if can't find necessary memory? What happens?
The eviction tries to evict some unused blocks. If all blocks are being read (worst case), the new block cannot be cached. Maybe that should be retried after a delay?

bq. Why not? We copy from the LRU blocks to Cell arrays? Couldn't Cells go against the LRU blocks directly too? Or I have it wrong?
In the LRU we cache the block object itself. It has its own underlying memory. Even if a block that is being read is evicted, the memory area it refers to is not freed. The only thing is that after this read the block will no longer be referenced, and neither will its data area. Am I making it clear?

bq. I don't see a downside listing that we'll be doubling the objects made when offheap reading. Is that right?
In a read that deals with N HFileBlocks, we will have extra MBB objects created for each block. But per cell we won't create any new objects. In comparators etc. we check hasArray() and, based on that, use the buffer or array based APIs. When creating BB-backed cells from an HFileBlock which is backed by an MBB, we try our best to refer to the original BB (an item in the MBB) and not create or duplicate extra BBs. But yes, some extra objects (duplicated BBs) will be there. I can give a count based on a test scenario, Stack. I was in the middle of something else and missed doing this.

bq. have to read from the MemStore so this means that read path can be a mix of onheap and offheap results?
Yes.

bq. or maybe the holes have been plugged by 'Using getXXXArray() would throw UnSupportedOperationException. '?
Yep. If the Cell impl is backed by a BB (on heap or off heap), its getXXXArray APIs will throw UnsupportedOperationException.

bq. So, you might want to underline this point. Its BB but WE are managing the position and length to save on object creation and to bypass BB range checking, etc
Yes, correct.

bq. Client won't be offheaping? If so, could the BB APIs be mixed in to Cell on the server only?
Something like a ServerCell which extends Cell? Sounds reasonable. We have had some discussion along these lines too.

bq. So, why have the switch at all? The hasArray switch? Why not BB it all the time? It would simplify the read path. Disadvantage would be it'd be extra objects?
Yes, the extra BB wrapper, which has to be created every time one calls getXXXArray(). That is an extra object creation plus some operations (like limit and position checks) that happen in the BB classes, which is a bit costly. We had done some unit tests. Ram, do you have the numbers?

bq. When you say this: "Note that even if the HFileBlock is on heap BB we do not support getXXXArray() APIs. " This is only if hasArray returns false, right?
Yes, when hasArray returns false. The point is that when the Cell is backed by a buffer, hasArray will be false (whether DBB or HBB).

bq. Tell us more about the unsafe manipulation of BBs? How's that work?
It reads data from the BB bypassing the BB APIs, reading directly from memory. HBASE-12345 has a patch which adds an Unsafe based compare for data in BB. Reading int/long etc. is added in a similar way. We do the same for bytes in Bytes.java.
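The eviction rule described above (evict only blocks nobody is reading, and decline to cache the new block when every block is in use) can be sketched as follows. Everything here is an assumption made for illustration: the class name, the String keys, the count-based capacity and the insertion-order eviction are stand-ins, not the BucketCache implementation.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical refcount-aware cache; tracks only keys and reference counts.
final class RefCountedCacheSketch {
    private final int capacity;
    // Insertion order doubles as a simplistic eviction order.
    private final Map<String, Integer> refCounts = new LinkedHashMap<>();

    RefCountedCacheSketch(int capacity) {
        this.capacity = capacity;
    }

    // Returns false when the cache is full and every cached block is still being read.
    synchronized boolean cacheBlock(String key) {
        if (refCounts.containsKey(key)) {
            return true;
        }
        if (refCounts.size() >= capacity && !evictOneUnreferenced()) {
            return false; // worst case from the thread: the new block is not cached
        }
        refCounts.put(key, 0);
        return true;
    }

    // A reader takes a reference before using the block's memory.
    synchronized boolean retain(String key) {
        Integer rc = refCounts.get(key);
        if (rc == null) {
            return false;
        }
        refCounts.put(key, rc + 1);
        return true;
    }

    // The reader drops its reference once it is done with the block.
    synchronized void release(String key) {
        Integer rc = refCounts.get(key);
        if (rc != null && rc > 0) {
            refCounts.put(key, rc - 1);
        }
    }

    private boolean evictOneUnreferenced() {
        Iterator<Map.Entry<String, Integer>> it = refCounts.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() == 0) {
                it.remove();
                return true;
            }
        }
        return false;
    }
}
```

The sketch fails fast rather than retrying after a delay, which matches the "flag it and move on" preference voiced elsewhere in the thread.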
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355378#comment-14355378 ]

Anoop Sam John commented on HBASE-11425:

Having block data in compressed form in the BC is optional. In that case, yes, we have to decompress first, and at that point it can go to a byte array backed BB. We are not trying to change that. The change applies when the data is cached in non-compressed form (though it can be in DBE form): then we avoid the need for a copy. The block can be backed by N offheap buckets. Cells are made out of that, and the cells are then backed by buffers rather than byte[].
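How a block backed by N fixed-size buckets can be read without first copying it into one contiguous array is sketched below. The class is a hypothetical stand-in for the MBB mentioned in the thread; the equal bucket size and big-endian byte order (the java.nio default) are assumptions.

```java
import java.nio.ByteBuffer;

// Hypothetical multi-bucket view; all buckets are assumed to be equally sized
// and to use the default big-endian byte order.
final class MultiBucketSketch {
    private final ByteBuffer[] buckets;
    private final int bucketSize;

    MultiBucketSketch(ByteBuffer[] buckets, int bucketSize) {
        this.buckets = buckets;
        this.bucketSize = bucketSize;
    }

    // Map a logical index onto (bucket, index within bucket).
    byte get(int index) {
        return buckets[index / bucketSize].get(index % bucketSize);
    }

    int getInt(int index) {
        int within = index % bucketSize;
        if (within + Integer.BYTES <= bucketSize) {
            // Fast path: the int lies entirely inside one bucket.
            return buckets[index / bucketSize].getInt(within);
        }
        // Slow path: the int straddles a bucket boundary, so assemble it bytewise.
        int v = 0;
        for (int i = 0; i < Integer.BYTES; i++) {
            v = (v << 8) | (get(index + i) & 0xff);
        }
        return v;
    }
}
```

With the 4KB buckets mentioned in the design doc, most fixed-width reads would take the fast path; only values crossing a bucket boundary need the bytewise assembly.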
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355360#comment-14355360 ]

Vladimir Rodionov commented on HBASE-11425:

I am quite skeptical of this whole idea. Here is why: the off heap cache can store blocks in compressed form. That means you won't be able to back an HFileBlock by such a compressed block; you have to decompress it first. From a performance point of view it does not matter whether you decompress into a direct BB (new approach) or into a byte array backed BB (existing). Am I missing anything?
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355297#comment-14355297 ] stack commented on HBASE-11425: --- Thanks for the writeup. Makes it easier discussing this new dev. "Typical used value for max heap size is 32-48 GB." This ain't right, is it? Usually we have folks hover just below 32G so can do compressed pointers. "Each bucket’s size is fixed to 4KB." Should bucket size be same as the hfile block size? Can MBB be developed in isolation with tests and refcounting tests apart from main code base? Is that being done? High-level, general question: So eviction was easy before. When memory pressure just evict until needed memory is made available. The eviction is now made more complicated because have to check for non-zero refcount? And what if can't find necessary memory? What happens? "Note that the LRU Cache does not have this block reference counting happening as that does not deal with BBs and deals with the HFileblock objects directly." Why not? We copy from the LRU blocks to Cell arrays? Couldn't Cells go against the LRU blocks directly too? Or I have it wrong? I don't see a downside listing that we'll be doubling the objects made when offheap reading. Is that right? "Please note that the Cells in the memstore are still KV based (byte [] backed)" ... this is because you are only doing read-path in this JIRA, right? Then again, reading, we have to read from the MemStore so this means that read path can be a mix of onheap and offheap results? On adding new methods to Cell, are there 'holes'? We talked about this in the past and it seemed like there could be strange areas in the Cell API if you did certain calls. If you don't know what I am on about, I'll dig up the old discussion (I think it was on mailing list... Ram you asked for input). ... or maybe the holes have been plugged by 'Using getXXXArray() would throw UnSupportedOperationException. '? 
And "This will make so many short living objects creation also. That is why we decided to go with usage of getXXXOffset() and getXXXLength() API usage also along with buffer based APIs" So, you might want to underline this point. It's BB but WE are managing the position and length to save on object creation and to bypass BB range checking, etc. What does that mean for the 'client'? When you give out a BB, its position, etc., is not to be relied upon. That will be disorientating. Pity you couldn't throw UnsupportedOperationException if they tried to use position, etc. So you need BB AND the Cell to get at content. BB for the array and then Cell for the offset and length... So, this API is for users on client-side? It is going to confuse them when they have a BB but the position and limit are duds. In the client, when would they be doing BB? Never? Client won't be offheaping? If so, could the BB APIs be mixed in to Cell on the server only? So, why have the switch at all? The hasArray switch? Why not BB it all the time? It would simplify the read path. Disadvantage would be extra objects? When you say this: "Note that even if the HFileBlock is on heap BB we do not support getXXXArray() APIs. " This is only if hasArray returns false, right? Yeah, looks like 2.0. Tell us more about the unsafe manipulation of BBs? How's that work? Nice writeup. > Cell/DBB end-to-end on the read-path > > > Key: HBASE-11425 > URL: https://issues.apache.org/jira/browse/HBASE-11425 > Project: HBase > Issue Type: Umbrella > Components: regionserver, Scanners >Affects Versions: 0.99.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Attachments: Offheap reads in HBase using BBs_final.pdf > > > Umbrella jira to make sure we can have blocks cached in offheap backed cache. > In the entire read path, we can refer to this offheap buffer and avoid onheap > copying. > The high level items I can identify as of now are > 1. Avoid the array() call on BB in read path.. (This is there in many > classes. 
We can handle class by class) > 2. Support Buffer based getter APIs in cell. In read path we will create a > new Cell with backed by BB. Will need in CellComparator, Filter (like SCVF), > CPs etc. > 3. Avoid KeyValue.ensureKeyValue() calls in read path - This make byte copy. > 4. Remove all CP hooks (which are already deprecated) which deal with KVs. > (In read path) > Will add subtasks under this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
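The point raised above, that a caller needs both the BB and the Cell's offset and length and must not rely on the buffer's position or limit, can be sketched as follows. This is a minimal illustration under stated assumptions: `BufferBackedRow`, `getRowByteBuffer()` and `copyRow()` are hypothetical names invented here, not the HBase Cell API.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; these names are not taken from the HBase code base.
final class BufferBackedRow {
    private final ByteBuffer buf; // shared block buffer; its position and limit carry no meaning here
    private final int rowOffset;  // where the row bytes start inside buf
    private final int rowLength;  // how many row bytes there are

    BufferBackedRow(ByteBuffer buf, int rowOffset, int rowLength) {
        this.buf = buf;
        this.rowOffset = rowOffset;
        this.rowLength = rowLength;
    }

    ByteBuffer getRowByteBuffer() { return buf; }
    int getRowOffset() { return rowOffset; }
    int getRowLength() { return rowLength; }

    // Copies the row out using absolute gets only, so the buffer's position is never read or moved.
    static byte[] copyRow(BufferBackedRow cell) {
        ByteBuffer bb = cell.getRowByteBuffer();
        byte[] out = new byte[cell.getRowLength()];
        for (int i = 0; i < out.length; i++) {
            out[i] = bb.get(cell.getRowOffset() + i);
        }
        return out;
    }
}
```

Because every read is an absolute `get(int)`, the same buffer can back many cells without slicing or duplicating it per cell, which is the object-creation saving the comment refers to.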
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353001#comment-14353001 ] zhangduo commented on HBASE-11425: -- [~anoopsamjohn] increment/decrement in one place sounds good to me.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352996#comment-14352996 ] Anoop Sam John commented on HBASE-11425: The ref count increment/decrement happens in one place. You will get a better picture when you see the code. Yes, it should be HBASE-13142. Thanks for correcting.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352989#comment-14352989 ] zhangduo commented on HBASE-11425: -- I'm still a little worried about the ref counting part, as I said before. It could be a disaster for later developers, because it is easy to miss a decrement but very hard to tell that a problem is caused by a missing decrement, and even if we know, it is hard to find where we missed it... Let's see the code, maybe we can find a way to handle it cleanly. BTW, there seems to be a typo: do you mean HBASE-13142 at the end of the document? We do not have HBASE-13412 yet.
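One common way to make a missed decrement harder to write is to tie the decrement to a scoped handle. The sketch below is only an illustration of that idea, assuming a hypothetical `RefCountedBlock` with a `Lease` handle; it is not the approach in the attached design document or patch, and none of these names come from HBase.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; class and method names are assumptions made for illustration.
final class RefCountedBlock {
    private final AtomicInteger refCount = new AtomicInteger();

    // The only place the count goes up.
    Lease retain() {
        refCount.incrementAndGet();
        return new Lease(this);
    }

    int refCount() { return refCount.get(); }

    // A lease is assumed to be owned by a single reader thread.
    static final class Lease implements AutoCloseable {
        private final RefCountedBlock block;
        private boolean released;

        private Lease(RefCountedBlock block) { this.block = block; }

        // The only place the count goes down; calling close() twice decrements once.
        @Override
        public void close() {
            if (!released) {
                released = true;
                block.refCount.decrementAndGet();
            }
        }
    }
}
```

Used with try-with-resources (`try (RefCountedBlock.Lease l = block.retain()) { ... }`), the decrement runs on every exit path, including exceptions, so increment and decrement each live in exactly one method.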
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184075#comment-14184075 ] Anoop Sam John commented on HBASE-11425: {quote} Testing with 2 million Cells, with a single cell per row. Writing all cells to a BB/DBB and trying a seek to the last KV (to force compares across all cells in the BB/DBB). Seek code is like what we have in ScannerV3#blockSeek. With RK length 17 bytes (first 13 bytes are the same), getting almost the same result. With RK length 117 bytes (first 113 bytes are the same), the DBB based read is ~3% slower. {quote} Well, in this test the read and compare were from HBB and DBB, and those are almost the same. In our CellComparator we have an Unsafe based optimization. In my old test this was not in use. With Unsafe based reads from HBB#array() [this is what happens in HFileReaderV2/V3] there is a significant perf difference with DBB. With an RK length of 117 bytes and 2 million cells, seeking to the last cell, the DBB test is 50% slower. :( I am thinking of doing Unsafe based compares for data in DBB as well. I just tried Unsafe based access for both DBB and HBB, and then we are in better shape: the DBB based test above is ~12% slower than the old HBB.array() based compares. Will raise a subtask and attach the approach there.
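For readers who want to picture the compare being measured, here is a minimal sketch of a row key comparison that works the same way on heap and direct buffers. It uses plain absolute `ByteBuffer.get(int)` calls, not the Unsafe based access discussed above, so it corresponds to the slower baseline; `BufferCompare` is a hypothetical name.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; this is the plain ByteBuffer path, not the Unsafe one.
final class BufferCompare {
    // Lexicographic comparison of two byte ranges, treating bytes as unsigned.
    static int compare(ByteBuffer a, int aOff, int aLen,
                       ByteBuffer b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a.get(aOff + i) & 0xff; // absolute get: position is untouched
            int y = b.get(bOff + i) & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // All shared bytes are equal: the shorter key sorts first.
        return aLen - bLen;
    }
}
```

With long keys that share a long prefix, as in the 117-byte test, nearly every compare walks the whole prefix byte by byte, which is why the per-byte access cost dominates the measurement.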
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077991#comment-14077991 ] stack commented on HBASE-11425: --- Nice [~anoop.hbase] Mind posting your little test tool so we can run it locally?
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077940#comment-14077940 ] Andrew Purtell commented on HBASE-11425: Does it make a difference if using the BR API instead?
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077898#comment-14077898 ] Anoop Sam John commented on HBASE-11425: BB only. And this is not the actual read perf number, just the seek part, and so reads from BB vs DBB. bq. How about 1kb RKs? Will test more cases.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077868#comment-14077868 ] Andrew Purtell commented on HBASE-11425: This is with BB or BR? How about 1kb RKs?
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077656#comment-14077656 ] Anoop Sam John commented on HBASE-11425: Testing with 2 million Cells, with a single cell per row. Writing all cells to a BB/DBB and trying a seek to the last KV (to force compares across all cells in the BB/DBB). Seek code is like what we have in ScannerV3#blockSeek. With RK length 17 bytes (first 13 bytes are the same), getting almost the same result. With RK length 117 bytes (first 113 bytes are the same), the DBB based read is ~3% slower.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073043#comment-14073043 ] Anoop Sam John commented on HBASE-11425: bq. We need BR instead of BB to work around issues with the BB API: inlining pessimism, range checking and index compensations that cannot be skipped for performance, and related. Yes. So we will write our own HeapBB/DirectBB stuff rather than just wrapping the nio objects.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050050#comment-14050050 ] Anoop Sam John commented on HBASE-11425: Yes, we need ref counting. The MemStoreLAB chunk pool has a similar need and handles it with ref counting. bq. Any idea of how much slower an offheap merge sort will be doing BB#get (or BR#get)? I'm up for doing a bit of measuring. Not done any compare test yet. Will do a plain test doing just key compares (millions of times) reading from DBB and HBB (this will use the unsafe compare). Got pulled into some bug fixes on visibility. Will start again next week. It would be great if you can measure, boss.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047590#comment-14047590 ] ramkrishna.s.vasudevan commented on HBASE-11425: bq. Currently we copy the block cache bytes onheap to guard against the blocks being evicted out from under us. We'll do reference counting? I think this is one major area where we may have to work. If a block is currently being scanned, set a reference on it and do not evict it. On what basis should we select the next block that can be evicted? Need to do some more analysis on this. Interesting!!
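The idea of not evicting a block while it is being scanned can be sketched as an eviction pass that walks candidates oldest first and skips any with a non-zero reference count. This is a toy model under stated assumptions: `RefAwareEvictor` and `Block` are hypothetical, insertion order stands in for LRU order, and returning the number of bytes actually freed (possibly less than requested) is a choice made here, not something the thread decides.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model for illustration; not the block cache or bucket cache implementation.
final class RefAwareEvictor {
    static final class Block {
        final int sizeBytes;
        int refCount; // number of readers currently using this block

        Block(int sizeBytes) { this.sizeBytes = sizeBytes; }
    }

    // Insertion order stands in for "least recently used first".
    private final Map<String, Block> blocks = new LinkedHashMap<>();

    void put(String key, Block block) { blocks.put(key, block); }

    boolean contains(String key) { return blocks.containsKey(key); }

    // Evicts unreferenced blocks, oldest first, until bytesNeeded is reached or candidates run out.
    // Returns the bytes actually freed; the caller must handle a shortfall.
    long evict(long bytesNeeded) {
        long freed = 0;
        Iterator<Map.Entry<String, Block>> it = blocks.entrySet().iterator();
        while (freed < bytesNeeded && it.hasNext()) {
            Block b = it.next().getValue();
            if (b.refCount > 0) {
                continue; // in use by a scanner: leave it in place
            }
            freed += b.sizeBytes;
            it.remove();
        }
        return freed;
    }
}
```

The open question in the comment, which block to pick next, maps to the iteration order here; a real cache would also need the count and the eviction pass to be thread-safe.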
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047396#comment-14047396 ] stack commented on HBASE-11425: --- Thanks for filing this one and yeah, let's finish up the Cell conversions. How are we thinking to back Cells w/ bytes that are back in the block cache? Currently we copy the block cache bytes onheap to guard against the blocks being evicted out from under us. We'll do reference counting? Any idea of how much slower an offheap merge sort will be doing BB#get (or BR#get)? I'm up for doing a bit of measuring.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046730#comment-14046730 ] ramkrishna.s.vasudevan commented on HBASE-11425: All the subtasks under HBASE-7320 are in some way helping this. A few more may be needed here.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046726#comment-14046726 ] ramkrishna.s.vasudevan commented on HBASE-11425: bq. Avoid KeyValue.ensureKeyValue() calls in read path - This make byte copy. This is the idea behind changing to Cells in the read path. There are still some places where this was not achieved. I will link those tasks to this. bq. We need BR instead of BB to work around issues with the BB API Yes. +1 on this. HBASE-10772 and HBASE-10773 are related to this. I can link the related tasks to this JIRA to give a clear picture of the subtasks.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046224#comment-14046224 ] Andrew Purtell commented on HBASE-11425: bq. Now one related Q is we go with BR rather than BB for APIs. +1 We need BR instead of BB to work around issues with the BB API: inlining pessimism, range checking and index compensations that cannot be skipped for performance, and related.
[jira] [Commented] (HBASE-11425) Cell/DBB end-to-end on the read-path
[ https://issues.apache.org/jira/browse/HBASE-11425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046164#comment-14046164 ] Anoop Sam John commented on HBASE-11425: Now one related question is whether we go with BR rather than BB for the APIs.