[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE

2018-09-12 Thread binlijin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611693#comment-16611693
 ] 

binlijin commented on HBASE-16594:
--

bq.Did this get abandoned?

Sorry for the late replay, i do not continue this task and abandon it.

> ROW_INDEX_V2 DBE
> 
>
> Key: HBASE-16594
> URL: https://issues.apache.org/jira/browse/HBASE-16594
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0
>
> Attachments: HBASE-16594-master_v1.patch, HBASE-16594-master_v2.patch
>
>
> See HBASE-16213, ROW_INDEX_V1 DataBlockEncoding.
> ROW_INDEX_V1 is the first version which have no storage optimization, 
> ROW_INDEX_V2 do storage optimization: store every row only once, store column 
> family only once in a HFileBlock.
> ROW_INDEX_V1 is : 
> /** 
>  * Store cells following every row's start offset, so we can binary search to 
> a row's cells. 
>  * 
>  * Format: 
>  * flat cells 
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * integer: dataSize 
>  * 
> */
> ROW_INDEX_V2 is :
>  * row1 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  * row2 qualifier timestamp type value tag
>  * row3 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * column family
>  * integer: dataSize 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE

2018-09-08 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608319#comment-16608319
 ] 

Lars Hofhansl commented on HBASE-16594:
---

Did this get abandoned?

> ROW_INDEX_V2 DBE
> 
>
> Key: HBASE-16594
> URL: https://issues.apache.org/jira/browse/HBASE-16594
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0
>
> Attachments: HBASE-16594-master_v1.patch, HBASE-16594-master_v2.patch
>
>
> See HBASE-16213, ROW_INDEX_V1 DataBlockEncoding.
> ROW_INDEX_V1 is the first version which have no storage optimization, 
> ROW_INDEX_V2 do storage optimization: store every row only once, store column 
> family only once in a HFileBlock.
> ROW_INDEX_V1 is : 
> /** 
>  * Store cells following every row's start offset, so we can binary search to 
> a row's cells. 
>  * 
>  * Format: 
>  * flat cells 
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * integer: dataSize 
>  * 
> */
> ROW_INDEX_V2 is :
>  * row1 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  * row2 qualifier timestamp type value tag
>  * row3 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * column family
>  * integer: dataSize 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE

2017-08-27 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143366#comment-16143366
 ] 

Anoop Sam John commented on HBASE-16594:


+1 on resuming this work.  Once the latest patch is available, can do reviews. 
Ya lets try getting this in for 2.0

> ROW_INDEX_V2 DBE
> 
>
> Key: HBASE-16594
> URL: https://issues.apache.org/jira/browse/HBASE-16594
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0, 1.5.0
>
> Attachments: HBASE-16594-master_v1.patch, HBASE-16594-master_v2.patch
>
>
> See HBASE-16213, ROW_INDEX_V1 DataBlockEncoding.
> ROW_INDEX_V1 is the first version which have no storage optimization, 
> ROW_INDEX_V2 do storage optimization: store every row only once, store column 
> family only once in a HFileBlock.
> ROW_INDEX_V1 is : 
> /** 
>  * Store cells following every row's start offset, so we can binary search to 
> a row's cells. 
>  * 
>  * Format: 
>  * flat cells 
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * integer: dataSize 
>  * 
> */
> ROW_INDEX_V2 is :
>  * row1 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  * row2 qualifier timestamp type value tag
>  * row3 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * column family
>  * integer: dataSize 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE

2017-04-05 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957392#comment-15957392
 ] 

Sean Busbey commented on HBASE-16594:
-

did this get fixed as a part of the parent jira?

> ROW_INDEX_V2 DBE
> 
>
> Key: HBASE-16594
> URL: https://issues.apache.org/jira/browse/HBASE-16594
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16594-master_v1.patch, HBASE-16594-master_v2.patch
>
>
> See HBASE-16213, ROW_INDEX_V1 DataBlockEncoding.
> ROW_INDEX_V1 is the first version which have no storage optimization, 
> ROW_INDEX_V2 do storage optimization: store every row only once, store column 
> family only once in a HFileBlock.
> ROW_INDEX_V1 is : 
> /** 
>  * Store cells following every row's start offset, so we can binary search to 
> a row's cells. 
>  * 
>  * Format: 
>  * flat cells 
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * integer: dataSize 
>  * 
> */
> ROW_INDEX_V2 is :
>  * row1 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  * row2 qualifier timestamp type value tag
>  * row3 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * column family
>  * integer: dataSize 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE

2016-12-29 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15784984#comment-15784984
 ] 

binlijin commented on HBASE-16594:
--

I do not compare the perf with PrefixTree. But PrefixTree encoding is slow and 
hard to improve, so i give up the prefix tree...

> ROW_INDEX_V2 DBE
> 
>
> Key: HBASE-16594
> URL: https://issues.apache.org/jira/browse/HBASE-16594
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16594-master_v1.patch, HBASE-16594-master_v2.patch
>
>
> See HBASE-16213, ROW_INDEX_V1 DataBlockEncoding.
> ROW_INDEX_V1 is the first version which have no storage optimization, 
> ROW_INDEX_V2 do storage optimization: store every row only once, store column 
> family only once in a HFileBlock.
> ROW_INDEX_V1 is : 
> /** 
>  * Store cells following every row's start offset, so we can binary search to 
> a row's cells. 
>  * 
>  * Format: 
>  * flat cells 
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * integer: dataSize 
>  * 
> */
> ROW_INDEX_V2 is :
>  * row1 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  * row2 qualifier timestamp type value tag
>  * row3 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * column family
>  * integer: dataSize 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE

2016-12-27 Thread Chang chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15781916#comment-15781916
 ] 

Chang chen commented on HBASE-16594:


Hi Guys 

How does ROW_INDEX_VX encoder compare to prefix tree? 

Thanks
Chang

> ROW_INDEX_V2 DBE
> 
>
> Key: HBASE-16594
> URL: https://issues.apache.org/jira/browse/HBASE-16594
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16594-master_v1.patch, HBASE-16594-master_v2.patch
>
>
> See HBASE-16213, ROW_INDEX_V1 DataBlockEncoding.
> ROW_INDEX_V1 is the first version which have no storage optimization, 
> ROW_INDEX_V2 do storage optimization: store every row only once, store column 
> family only once in a HFileBlock.
> ROW_INDEX_V1 is : 
> /** 
>  * Store cells following every row's start offset, so we can binary search to 
> a row's cells. 
>  * 
>  * Format: 
>  * flat cells 
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * integer: dataSize 
>  * 
> */
> ROW_INDEX_V2 is :
>  * row1 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  * row2 qualifier timestamp type value tag
>  * row3 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * column family
>  * integer: dataSize 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE

2016-09-13 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15486789#comment-15486789
 ] 

Yu Li commented on HBASE-16594:
---

+1 on the idea, but I think it might be better to supply data of 
EncodedSeekPerformanceTest  and E2E testing just like you did in V1, rather 
than using a special case. Wdyt? [~aoxiang]

Let's also wait for others' thoughts.

> ROW_INDEX_V2 DBE
> 
>
> Key: HBASE-16594
> URL: https://issues.apache.org/jira/browse/HBASE-16594
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16594-master_v1.patch, HBASE-16594-master_v2.patch
>
>
> See HBASE-16213, ROW_INDEX_V1 DataBlockEncoding.
> ROW_INDEX_V1 is the first version which have no storage optimization, 
> ROW_INDEX_V2 do storage optimization: store every row only once, store column 
> family only once in a HFileBlock.
> ROW_INDEX_V1 is : 
> /** 
>  * Store cells following every row's start offset, so we can binary search to 
> a row's cells. 
>  * 
>  * Format: 
>  * flat cells 
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * integer: dataSize 
>  * 
> */
> ROW_INDEX_V2 is :
>  * row1 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  * row2 qualifier timestamp type value tag
>  * row3 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * column family
>  * integer: dataSize 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE

2016-09-11 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15482816#comment-15482816
 ] 

binlijin commented on HBASE-16594:
--

[~anoop.hbase] [~ram_krish] [~saint@gmail.com] mind take a look? Thanks 
very much.

> ROW_INDEX_V2 DBE
> 
>
> Key: HBASE-16594
> URL: https://issues.apache.org/jira/browse/HBASE-16594
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance
>Reporter: binlijin
>Assignee: binlijin
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16594-master_v1.patch, HBASE-16594-master_v2.patch
>
>
> See HBASE-16213, ROW_INDEX_V1 DataBlockEncoding.
> ROW_INDEX_V1 is the first version which have no storage optimization, 
> ROW_INDEX_V2 do storage optimization: store every row only once, store column 
> family only once in a HFileBlock.
> ROW_INDEX_V1 is : 
> /** 
>  * Store cells following every row's start offset, so we can binary search to 
> a row's cells. 
>  * 
>  * Format: 
>  * flat cells 
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * integer: dataSize 
>  * 
> */
> ROW_INDEX_V2 is :
>  * row1 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  * row2 qualifier timestamp type value tag
>  * row3 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * column family
>  * integer: dataSize 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE

2016-09-11 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15482804#comment-15482804
 ] 

binlijin commented on HBASE-16594:
--

I get a part of column family a' data, and test it with ROW_INDEX_V2.
Second the random get qps result is:
{code}
RegionServer Network out is about 1.8GB

8k   NONE  (CPU System/User 7/58)  QPS=167k
8k   Row_Index_V1  (CPU System/User 7/60)  QPS=164k
8k   Row_Index_V2  (CPU System/User 7/52)  QPS=164k

16k  NONE  (CPU System/User 7/59)  QPS=166.5k
16k  Row_Index_V1  (CPU System/User 7/55)  QPS=165.6k
16k  Row_Index_V2  (CPU System/User 7/54)  QPS=165k

32k  NONE  (CPU System/User 7/63)  QPS=165k
32k  Row_Index_V1  (CPU System/User 7/56)  QPS=166k
32k  Row_Index_V2  (CPU System/User 7/54)  QPS=164k

64k  NONE  (CPU System/User 7/65)  QPS=160k
64k  Row_Index_V1  (CPU System/User 7/56)  QPS=165k
64k  Row_Index_V2  (CPU System/User 7/53)  QPS=165k
{code}


> ROW_INDEX_V2 DBE
> 
>
> Key: HBASE-16594
> URL: https://issues.apache.org/jira/browse/HBASE-16594
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance
>Reporter: binlijin
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16594-master_v1.patch, HBASE-16594-master_v2.patch
>
>
> See HBASE-16213, ROW_INDEX_V1 DataBlockEncoding.
> ROW_INDEX_V1 is the first version which have no storage optimization, 
> ROW_INDEX_V2 do storage optimization: store every row only once, store column 
> family only once in a HFileBlock.
> ROW_INDEX_V1 is : 
> /** 
>  * Store cells following every row's start offset, so we can binary search to 
> a row's cells. 
>  * 
>  * Format: 
>  * flat cells 
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * integer: dataSize 
>  * 
> */
> ROW_INDEX_V2 is :
>  * row1 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  * row2 qualifier timestamp type value tag
>  * row3 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * column family
>  * integer: dataSize 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE

2016-09-11 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15482795#comment-15482795
 ] 

binlijin commented on HBASE-16594:
--

I get a part of column family a' data, and test it with ROW_INDEX_V2.
First the detail info is:
{code}
number of rows : 456399

avgKeyLen=56
avgValueLen=11
entries=69742427
length=5609482650

avg cells per row : 69742427/456399=152.8
avg row size: (56+11) * 152.8=10237.6(10k)

COMPRESSION => 'NONE'
BlockSize=8k   DATA_BLOCK_ENCODING => 'NONE’  5671843807
BlockSize=8k   DATA_BLOCK_ENCODING => 'ROW_INDEX_V1’  5683168196
BlockSize=8k   DATA_BLOCK_ENCODING => 'ROW_INDEX_V2’  3354641599

BlockSize=16k  DATA_BLOCK_ENCODING => 'NONE’  5636883803
BlockSize=16k  DATA_BLOCK_ENCODING => 'ROW_INDEX_V1’  5643473654
BlockSize=16k  DATA_BLOCK_ENCODING => 'ROW_INDEX_V2’  3306460265

BlockSize=32k  DATA_BLOCK_ENCODING => 'NONE’  5618631549
BlockSize=32k  DATA_BLOCK_ENCODING => 'ROW_INDEX_V1’  5622842708
BlockSize=32k  DATA_BLOCK_ENCODING => 'ROW_INDEX_V2’  3284154231

BlockSize=64k  DATA_BLOCK_ENCODING => 'NONE’  5609482650(5.22GB)
BlockSize=64k  DATA_BLOCK_ENCODING => 'ROW_INDEX_V1’  5612502105(5.23GB)
BlockSize=64k  DATA_BLOCK_ENCODING => 'ROW_INDEX_V2’  3273791654(3.05GB) -41.6%


COMPRESSION => 'LZO'
BlockSize=8k   DATA_BLOCK_ENCODING => 'NONE’  1.13GB
BlockSize=8k   DATA_BLOCK_ENCODING => 'ROW_INDEX_V1’  1.13GB
BlockSize=8k   DATA_BLOCK_ENCODING => 'ROW_INDEX_V2’  997MB

BlockSize=16k  DATA_BLOCK_ENCODING => 'NONE’  1.03GB
BlockSize=16k  DATA_BLOCK_ENCODING => 'ROW_INDEX_V1’  1.03GB
BlockSize=16k  DATA_BLOCK_ENCODING => 'ROW_INDEX_V2’  884MB

BlockSize=32k  DATA_BLOCK_ENCODING => 'NONE’  981MB
BlockSize=32k  DATA_BLOCK_ENCODING => 'ROW_INDEX_V1’  983MB
BlockSize=32k  DATA_BLOCK_ENCODING => 'ROW_INDEX_V2’  800MB

BlockSize=64k  DATA_BLOCK_ENCODING => 'NONE’  970MB
BlockSize=64k  DATA_BLOCK_ENCODING => 'ROW_INDEX_V1’  971MB
BlockSize=64k  DATA_BLOCK_ENCODING => 'ROW_INDEX_V2’  744MB -23.3%
{code}

> ROW_INDEX_V2 DBE
> 
>
> Key: HBASE-16594
> URL: https://issues.apache.org/jira/browse/HBASE-16594
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance
>Reporter: binlijin
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16594-master_v1.patch, HBASE-16594-master_v2.patch
>
>
> See HBASE-16213, ROW_INDEX_V1 DataBlockEncoding.
> ROW_INDEX_V1 is the first version which have no storage optimization, 
> ROW_INDEX_V2 do storage optimization: store every row only once, store column 
> family only once in a HFileBlock.
> ROW_INDEX_V1 is : 
> /** 
>  * Store cells following every row's start offset, so we can binary search to 
> a row's cells. 
>  * 
>  * Format: 
>  * flat cells 
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * integer: dataSize 
>  * 
> */
> ROW_INDEX_V2 is :
>  * row1 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  * row2 qualifier timestamp type value tag
>  * row3 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * column family
>  * integer: dataSize 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE

2016-09-09 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15476986#comment-15476986
 ] 

binlijin commented on HBASE-16594:
--

The data is too big so only a few cached in LruCache when decompression, so 
need to test it when all data cached in LruCache.

> ROW_INDEX_V2 DBE
> 
>
> Key: HBASE-16594
> URL: https://issues.apache.org/jira/browse/HBASE-16594
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance
>Reporter: binlijin
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16594-master_v1.patch
>
>
> See HBASE-16213, ROW_INDEX_V1 DataBlockEncoding.
> ROW_INDEX_V1 is the first version which have no storage optimization, 
> ROW_INDEX_V2 do storage optimization: store every row only once, store column 
> family only once in a HFileBlock.
> ROW_INDEX_V1 is : 
> /** 
>  * Store cells following every row's start offset, so we can binary search to 
> a row's cells. 
>  * 
>  * Format: 
>  * flat cells 
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * integer: dataSize 
>  * 
> */
> ROW_INDEX_V2 is :
>  * row1 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  * row2 qualifier timestamp type value tag
>  * row3 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * column family
>  * integer: dataSize 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE

2016-09-09 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15476744#comment-15476744
 ] 

binlijin commented on HBASE-16594:
--

The performance on a single regionserver is :
BlockSize=8K  DATA_BLOCK_ENCODING => 'NONE' (CPU 4/42)  37k
BlockSize=16K DATA_BLOCK_ENCODING => 'NONE' (CPU 3/41)  41k
BlockSize=32K DATA_BLOCK_ENCODING => 'NONE' (CPU 3/45)  43k
BlockSize=64K DATA_BLOCK_ENCODING => 'NONE' (CPU 3/46)  36k
BlockSize=32k DATA_BLOCK_ENCODING => 'Row_Index_V1' (CPU 4/45)  45k
BlockSize=32k DATA_BLOCK_ENCODING => 'Row_Index_V2' (CPU 4/48)  64k

> ROW_INDEX_V2 DBE
> 
>
> Key: HBASE-16594
> URL: https://issues.apache.org/jira/browse/HBASE-16594
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance
>Reporter: binlijin
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16594-master_v1.patch
>
>
> See HBASE-16213, ROW_INDEX_V1 DataBlockEncoding.
> ROW_INDEX_V1 is the first version which have no storage optimization, 
> ROW_INDEX_V2 do storage optimization: store every row only once, store column 
> family only once in a HFileBlock.
> ROW_INDEX_V1 is : 
> /** 
>  * Store cells following every row's start offset, so we can binary search to 
> a row's cells. 
>  * 
>  * Format: 
>  * flat cells 
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * integer: dataSize 
>  * 
> */
> ROW_INDEX_V2 is :
>  * row1 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  * row2 qualifier timestamp type value tag
>  * row3 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * column family
>  * integer: dataSize 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE

2016-09-09 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15476741#comment-15476741
 ] 

binlijin commented on HBASE-16594:
--

I do test with one of our very important table. This table have 5 column 
family(why have so many families, this is for history reason.)
I get one region's data and do the random get performance on a regionserver.

This region's detail information is:
number of row : 3463153
5 family: a,b,c,d,f

family a : avgKeyLen=54,avgValueLen=12  entries=234100060  
length=4369389736(4.07GB)
family b : avgKeyLen=53,avgValueLen=10  entries=51913519   
length=981625160(936MB)
family c : avgKeyLen=50,avgValueLen=6   entries=14864860   
length=273820502(261MB)
family d : avgKeyLen=50,avgValueLen=6   entries=141422679  
length=3216604161(3GB)
family f : avgKeyLen=38,avgValueLen=13  entries=73084074   
length=1174375801(1.09GB)

avg cells per row
family a :  67.6
family b :  15
family c :  4.3
family d :  40.8
family f :  21.1


BlockSize=8k  COMPRESSION=LZO RegionSize=9.33GB  DATA_BLOCK_ENCODING => 'NONE'
BlockSize=16k COMPRESSION=LZO RegionSize=8.52GB  DATA_BLOCK_ENCODING => 'NONE'
BlockSize=32k COMPRESSION=LZO RegionSize=7.81GB  DATA_BLOCK_ENCODING => 'NONE'
BlockSize=64k COMPRESSION=LZO RegionSize=7.74GB  DATA_BLOCK_ENCODING => 'NONE'
BlockSize=32k COMPRESSION=LZO RegionSize=7.84GB  DATA_BLOCK_ENCODING => 
'ROW_INDEX_V1'
BlockSize=32k COMPRESSION=LZO RegionSize=6.24GB  DATA_BLOCK_ENCODING => 
'ROW_INDEX_V2'


> ROW_INDEX_V2 DBE
> 
>
> Key: HBASE-16594
> URL: https://issues.apache.org/jira/browse/HBASE-16594
> Project: HBase
>  Issue Type: Sub-task
>  Components: Performance
>Reporter: binlijin
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16594-master_v1.patch
>
>
> See HBASE-16213, ROW_INDEX_V1 DataBlockEncoding.
> ROW_INDEX_V1 is the first version which have no storage optimization, 
> ROW_INDEX_V2 do storage optimization: store every row only once, store column 
> family only once in a HFileBlock.
> ROW_INDEX_V1 is : 
> /** 
>  * Store cells following every row's start offset, so we can binary search to 
> a row's cells. 
>  * 
>  * Format: 
>  * flat cells 
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * integer: dataSize 
>  * 
> */
> ROW_INDEX_V2 is :
>  * row1 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  * row2 qualifier timestamp type value tag
>  * row3 qualifier timestamp type value tag
>  *  qualifier timestamp type value tag
>  *  
>  * integer: number of rows 
>  * integer: row0's offset 
>  * integer: row1's offset 
>  *  
>  * column family
>  * integer: dataSize 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)