[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE
[ https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611693#comment-16611693 ]

binlijin commented on HBASE-16594:
----------------------------------

bq. Did this get abandoned?

Sorry for the late reply. I am no longer working on this task and have abandoned it.

> ROW_INDEX_V2 DBE
> ----------------
>
>                 Key: HBASE-16594
>                 URL: https://issues.apache.org/jira/browse/HBASE-16594
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>            Reporter: binlijin
>            Assignee: binlijin
>            Priority: Major
>             Fix For: 3.0.0, 1.5.0, 2.2.0
>
>         Attachments: HBASE-16594-master_v1.patch, HBASE-16594-master_v2.patch
>
>
> See HBASE-16213, ROW_INDEX_V1 DataBlockEncoding.
> ROW_INDEX_V1 is the first version, which has no storage optimization.
> ROW_INDEX_V2 adds storage optimization: it stores every row only once and
> stores the column family only once in an HFileBlock.
> ROW_INDEX_V1 is:
> /**
>  * Store cells following every row's start offset, so we can binary search to a row's cells.
>  *
>  * Format:
>  * flat cells
>  * integer: number of rows
>  * integer: row0's offset
>  * integer: row1's offset
>  *
>  * integer: dataSize
>  *
>  */
> ROW_INDEX_V2 is:
>  * row1 qualifier timestamp type value tag
>  *      qualifier timestamp type value tag
>  *      qualifier timestamp type value tag
>  * row2 qualifier timestamp type value tag
>  * row3 qualifier timestamp type value tag
>  *      qualifier timestamp type value tag
>  *
>  * integer: number of rows
>  * integer: row0's offset
>  * integer: row1's offset
>  *
>  * column family
>  * integer: dataSize

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
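The ROW_INDEX_V1 layout quoted above (flat cells, then the number of rows, one offset per row, and finally dataSize) can be illustrated with a small sketch. This is not the HBase implementation or the attached patch: the class name RowIndexSketch, the toy cell layout (a 2-byte length-prefixed row key followed by a 2-byte length-prefixed value) and the use of String comparison instead of a byte-wise comparator are all assumptions made for illustration only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a row-offset index; NOT the HBase encoder.
class RowIndexSketch {

    /**
     * Encodes cells (each a {row, value} pair, sorted by row) as:
     * flat cells | int numRows | int offset per row | int dataSize.
     */
    static ByteBuffer encode(List<String[]> sortedCells) {
        ByteBuffer block = ByteBuffer.allocate(1 << 16); // toy fixed capacity
        List<Integer> rowOffsets = new ArrayList<>();
        String previousRow = null;
        for (String[] cell : sortedCells) {
            if (!cell[0].equals(previousRow)) {
                rowOffsets.add(block.position()); // first cell of a new row
                previousRow = cell[0];
            }
            putString(block, cell[0]);
            putString(block, cell[1]);
        }
        int dataSize = block.position(); // size of the flat-cell region
        block.putInt(rowOffsets.size());
        for (int offset : rowOffsets) {
            block.putInt(offset);
        }
        block.putInt(dataSize);
        block.flip();
        return block;
    }

    /** Binary searches the row index; returns the row's start offset or -1. */
    static int seekRow(ByteBuffer block, String row) {
        int dataSize = block.getInt(block.limit() - 4); // trailing integer
        int numRows = block.getInt(dataSize);
        int lo = 0;
        int hi = numRows - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int offset = block.getInt(dataSize + 4 + 4 * mid);
            int cmp = readString(block, offset).compareTo(row);
            if (cmp == 0) {
                return offset;
            }
            if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1;
    }

    private static void putString(ByteBuffer buf, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        buf.putShort((short) bytes.length);
        buf.put(bytes);
    }

    private static String readString(ByteBuffer buf, int offset) {
        int length = buf.getShort(offset);
        byte[] bytes = new byte[length];
        for (int i = 0; i < length; i++) {
            bytes[i] = buf.get(offset + 2 + i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

In this toy layout every cell still repeats its row key, as in ROW_INDEX_V1. ROW_INDEX_V2 as described would write the row key once per row and the column family once per block, which is where the size reduction would come from.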
[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE
[ https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608319#comment-16608319 ]

Lars Hofhansl commented on HBASE-16594:
---------------------------------------

Did this get abandoned?

> ROW_INDEX_V2 DBE
> ----------------
>
>                 Key: HBASE-16594
>                 URL: https://issues.apache.org/jira/browse/HBASE-16594
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>            Reporter: binlijin
>            Assignee: binlijin
>            Priority: Major
>             Fix For: 3.0.0, 1.5.0, 2.2.0
>
>         Attachments: HBASE-16594-master_v1.patch, HBASE-16594-master_v2.patch
[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE
[ https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143366#comment-16143366 ]

Anoop Sam John commented on HBASE-16594:
----------------------------------------

+1 on resuming this work. Once the latest patch is available, I can do reviews.
Yes, let's try getting this in for 2.0.

> ROW_INDEX_V2 DBE
> ----------------
>
>                 Key: HBASE-16594
>                 URL: https://issues.apache.org/jira/browse/HBASE-16594
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: binlijin
>            Assignee: binlijin
>             Fix For: 2.0.0, 1.5.0
>
>         Attachments: HBASE-16594-master_v1.patch, HBASE-16594-master_v2.patch
[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE
[ https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957392#comment-15957392 ]

Sean Busbey commented on HBASE-16594:
-------------------------------------

Did this get fixed as a part of the parent jira?

> ROW_INDEX_V2 DBE
> ----------------
>
>                 Key: HBASE-16594
>                 URL: https://issues.apache.org/jira/browse/HBASE-16594
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: binlijin
>            Assignee: binlijin
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: HBASE-16594-master_v1.patch, HBASE-16594-master_v2.patch
[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE
[ https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15784984#comment-15784984 ]

binlijin commented on HBASE-16594:
----------------------------------

I did not compare the performance with PrefixTree.
PrefixTree encoding is slow and hard to improve, so I gave up on the prefix tree...

> ROW_INDEX_V2 DBE
> ----------------
>
>                 Key: HBASE-16594
>                 URL: https://issues.apache.org/jira/browse/HBASE-16594
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: binlijin
>            Assignee: binlijin
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: HBASE-16594-master_v1.patch, HBASE-16594-master_v2.patch
[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE
[ https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15781916#comment-15781916 ]

Chang chen commented on HBASE-16594:
------------------------------------

Hi Guys,

How does the ROW_INDEX_VX encoder compare to prefix tree?

Thanks
Chang

> ROW_INDEX_V2 DBE
> ----------------
>
>                 Key: HBASE-16594
>                 URL: https://issues.apache.org/jira/browse/HBASE-16594
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: binlijin
>            Assignee: binlijin
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: HBASE-16594-master_v1.patch, HBASE-16594-master_v2.patch
[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE
[ https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15486789#comment-15486789 ]

Yu Li commented on HBASE-16594:
-------------------------------

+1 on the idea, but I think it might be better to supply data from EncodedSeekPerformanceTest and E2E testing, just like you did in V1, rather than using a special case. Wdyt? [~aoxiang]

Let's also wait for others' thoughts.

> ROW_INDEX_V2 DBE
> ----------------
>
>                 Key: HBASE-16594
>                 URL: https://issues.apache.org/jira/browse/HBASE-16594
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: binlijin
>            Assignee: binlijin
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: HBASE-16594-master_v1.patch, HBASE-16594-master_v2.patch
[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE
[ https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15482816#comment-15482816 ]

binlijin commented on HBASE-16594:
----------------------------------

[~anoop.hbase] [~ram_krish] [~saint@gmail.com] would you mind taking a look? Thanks very much.

> ROW_INDEX_V2 DBE
> ----------------
>
>                 Key: HBASE-16594
>                 URL: https://issues.apache.org/jira/browse/HBASE-16594
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: binlijin
>            Assignee: binlijin
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: HBASE-16594-master_v1.patch, HBASE-16594-master_v2.patch
[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE
[ https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15482804#comment-15482804 ]

binlijin commented on HBASE-16594:
----------------------------------

I took part of column family a's data and tested it with ROW_INDEX_V2.
Second, the random get QPS result is:
{code}
RegionServer Network out is about 1.8GB
8k  NONE         (CPU System/User 7/58) QPS=167k
8k  Row_Index_V1 (CPU System/User 7/60) QPS=164k
8k  Row_Index_V2 (CPU System/User 7/52) QPS=164k

16k NONE         (CPU System/User 7/59) QPS=166.5k
16k Row_Index_V1 (CPU System/User 7/55) QPS=165.6k
16k Row_Index_V2 (CPU System/User 7/54) QPS=165k

32k NONE         (CPU System/User 7/63) QPS=165k
32k Row_Index_V1 (CPU System/User 7/56) QPS=166k
32k Row_Index_V2 (CPU System/User 7/54) QPS=164k

64k NONE         (CPU System/User 7/65) QPS=160k
64k Row_Index_V1 (CPU System/User 7/56) QPS=165k
64k Row_Index_V2 (CPU System/User 7/53) QPS=165k
{code}

> ROW_INDEX_V2 DBE
> ----------------
>
>                 Key: HBASE-16594
>                 URL: https://issues.apache.org/jira/browse/HBASE-16594
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: binlijin
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: HBASE-16594-master_v1.patch, HBASE-16594-master_v2.patch
[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE
[ https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15482795#comment-15482795 ]

binlijin commented on HBASE-16594:
----------------------------------

I took part of column family a's data and tested it with ROW_INDEX_V2.
First, the detailed info is:
{code}
number of rows : 456399
avgKeyLen=56 avgValueLen=11 entries=69742427 length=5609482650
avg cells per row : 69742427/456399=152.8
avg row size: (56+11) * 152.8=10237.6(10k)

COMPRESSION => 'NONE'
BlockSize=8k  DATA_BLOCK_ENCODING => 'NONE'         5671843807
BlockSize=8k  DATA_BLOCK_ENCODING => 'ROW_INDEX_V1' 5683168196
BlockSize=8k  DATA_BLOCK_ENCODING => 'ROW_INDEX_V2' 3354641599

BlockSize=16k DATA_BLOCK_ENCODING => 'NONE'         5636883803
BlockSize=16k DATA_BLOCK_ENCODING => 'ROW_INDEX_V1' 5643473654
BlockSize=16k DATA_BLOCK_ENCODING => 'ROW_INDEX_V2' 3306460265

BlockSize=32k DATA_BLOCK_ENCODING => 'NONE'         5618631549
BlockSize=32k DATA_BLOCK_ENCODING => 'ROW_INDEX_V1' 5622842708
BlockSize=32k DATA_BLOCK_ENCODING => 'ROW_INDEX_V2' 3284154231

BlockSize=64k DATA_BLOCK_ENCODING => 'NONE'         5609482650(5.22GB)
BlockSize=64k DATA_BLOCK_ENCODING => 'ROW_INDEX_V1' 5612502105(5.23GB)
BlockSize=64k DATA_BLOCK_ENCODING => 'ROW_INDEX_V2' 3273791654(3.05GB) -41.6%

COMPRESSION => 'LZO'
BlockSize=8k  DATA_BLOCK_ENCODING => 'NONE'         1.13GB
BlockSize=8k  DATA_BLOCK_ENCODING => 'ROW_INDEX_V1' 1.13GB
BlockSize=8k  DATA_BLOCK_ENCODING => 'ROW_INDEX_V2' 997MB

BlockSize=16k DATA_BLOCK_ENCODING => 'NONE'         1.03GB
BlockSize=16k DATA_BLOCK_ENCODING => 'ROW_INDEX_V1' 1.03GB
BlockSize=16k DATA_BLOCK_ENCODING => 'ROW_INDEX_V2' 884MB

BlockSize=32k DATA_BLOCK_ENCODING => 'NONE'         981MB
BlockSize=32k DATA_BLOCK_ENCODING => 'ROW_INDEX_V1' 983MB
BlockSize=32k DATA_BLOCK_ENCODING => 'ROW_INDEX_V2' 800MB

BlockSize=64k DATA_BLOCK_ENCODING => 'NONE'         970MB
BlockSize=64k DATA_BLOCK_ENCODING => 'ROW_INDEX_V1' 971MB
BlockSize=64k DATA_BLOCK_ENCODING => 'ROW_INDEX_V2' 744MB -23.3%
{code}

> ROW_INDEX_V2 DBE
> ----------------
>
>                 Key: HBASE-16594
>                 URL: https://issues.apache.org/jira/browse/HBASE-16594
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: binlijin
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: HBASE-16594-master_v1.patch, HBASE-16594-master_v2.patch
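The percentages quoted in the size comparison above (-41.6% uncompressed and -23.3% with LZO at a 64k block size) follow from a plain relative-change calculation against the 'NONE' encoding. A minimal sketch of that arithmetic, with the hypothetical class name EncodingSavings and helper names that are not part of HBase:

```java
// Hypothetical helper for reproducing the percentages in the comment above.
class EncodingSavings {

    /** Relative change of the encoded size versus the baseline, in percent. */
    static double percentChange(double baselineSize, double encodedSize) {
        return 100.0 * (encodedSize - baselineSize) / baselineSize;
    }

    /** Rounds to one decimal place, as the figures in the comment are. */
    static double round1(double value) {
        return Math.round(value * 10.0) / 10.0;
    }

    public static void main(String[] args) {
        // 64k blocks, no compression: NONE vs ROW_INDEX_V2 (bytes).
        System.out.println(round1(percentChange(5609482650.0, 3273791654.0)));
        // 64k blocks, LZO: NONE vs ROW_INDEX_V2 (MB).
        System.out.println(round1(percentChange(970.0, 744.0)));
        // Average cells per row: entries / number of rows.
        System.out.println(round1(69742427.0 / 456399.0));
    }
}
```

The inputs are taken directly from the comment; only the rounding to one decimal place is an assumption about how the quoted figures were derived.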
[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE
[ https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15476986#comment-15476986 ]

binlijin commented on HBASE-16594:
----------------------------------

The decompressed data is too big, so only a small part of it is cached in the LruCache. We need to test again with all the data cached in the LruCache.

> ROW_INDEX_V2 DBE
> ----------------
>
>                 Key: HBASE-16594
>                 URL: https://issues.apache.org/jira/browse/HBASE-16594
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: binlijin
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: HBASE-16594-master_v1.patch
[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE
[ https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15476744#comment-15476744 ]

binlijin commented on HBASE-16594:
----------------------------------

The performance on a single regionserver is:
BlockSize=8K  DATA_BLOCK_ENCODING => 'NONE'         (CPU 4/42) 37k
BlockSize=16K DATA_BLOCK_ENCODING => 'NONE'         (CPU 3/41) 41k
BlockSize=32K DATA_BLOCK_ENCODING => 'NONE'         (CPU 3/45) 43k
BlockSize=64K DATA_BLOCK_ENCODING => 'NONE'         (CPU 3/46) 36k
BlockSize=32k DATA_BLOCK_ENCODING => 'Row_Index_V1' (CPU 4/45) 45k
BlockSize=32k DATA_BLOCK_ENCODING => 'Row_Index_V2' (CPU 4/48) 64k

> ROW_INDEX_V2 DBE
> ----------------
>
>                 Key: HBASE-16594
>                 URL: https://issues.apache.org/jira/browse/HBASE-16594
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: binlijin
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: HBASE-16594-master_v1.patch
[jira] [Commented] (HBASE-16594) ROW_INDEX_V2 DBE
[ https://issues.apache.org/jira/browse/HBASE-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15476741#comment-15476741 ]

binlijin commented on HBASE-16594:
----------------------------------

I tested with one of our very important tables.
This table has 5 column families (it has so many for historical reasons).
I took one region's data and ran a random get performance test on a regionserver.

This region's detailed information is:
number of rows : 3463153
5 families: a,b,c,d,f
family a : avgKeyLen=54,avgValueLen=12 entries=234100060 length=4369389736(4.07GB)
family b : avgKeyLen=53,avgValueLen=10 entries=51913519 length=981625160(936MB)
family c : avgKeyLen=50,avgValueLen=6 entries=14864860 length=273820502(261MB)
family d : avgKeyLen=50,avgValueLen=6 entries=141422679 length=3216604161(3GB)
family f : avgKeyLen=38,avgValueLen=13 entries=73084074 length=1174375801(1.09GB)

avg cells per row
family a : 67.6
family b : 15
family c : 4.3
family d : 40.8
family f : 21.1

BlockSize=8k  COMPRESSION=LZO RegionSize=9.33GB DATA_BLOCK_ENCODING => 'NONE'
BlockSize=16k COMPRESSION=LZO RegionSize=8.52GB DATA_BLOCK_ENCODING => 'NONE'
BlockSize=32k COMPRESSION=LZO RegionSize=7.81GB DATA_BLOCK_ENCODING => 'NONE'
BlockSize=64k COMPRESSION=LZO RegionSize=7.74GB DATA_BLOCK_ENCODING => 'NONE'
BlockSize=32k COMPRESSION=LZO RegionSize=7.84GB DATA_BLOCK_ENCODING => 'ROW_INDEX_V1'
BlockSize=32k COMPRESSION=LZO RegionSize=6.24GB DATA_BLOCK_ENCODING => 'ROW_INDEX_V2'

> ROW_INDEX_V2 DBE
> ----------------
>
>                 Key: HBASE-16594
>                 URL: https://issues.apache.org/jira/browse/HBASE-16594
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: binlijin
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: HBASE-16594-master_v1.patch
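The "avg cells per row" figures in the comment above appear to be each family's entry count divided by the region's row count (3463153), rounded to one decimal place. A minimal sketch under that assumption, with the hypothetical class name CellsPerRow; it averages over all rows in the region, whether or not a given row has cells in that family:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; assumes avg cells per row = entries / rows in the region.
class CellsPerRow {

    /** Returns entries/rows per family, rounded to one decimal place. */
    static Map<String, Double> perFamily(Map<String, Long> entriesByFamily, long rows) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : entriesByFamily.entrySet()) {
            double average = (double) entry.getValue() / rows;
            result.put(entry.getKey(), Math.round(average * 10.0) / 10.0);
        }
        return result;
    }

    public static void main(String[] args) {
        // Entry counts per family as reported in the comment.
        Map<String, Long> entries = new LinkedHashMap<>();
        entries.put("a", 234100060L);
        entries.put("b", 51913519L);
        entries.put("c", 14864860L);
        entries.put("d", 141422679L);
        entries.put("f", 73084074L);
        System.out.println(perFamily(entries, 3463153L));
    }
}
```

Families with many cells per row, such as a and d here, repeat the same row key most often, so they are the ones where storing each row only once should save the most space.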