[jira] [Updated] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-28 Thread Jonathan Hsieh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-4532:
--

Attachment: hbase-4532-remove-system.out.println.patch

 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-28 Thread Jonathan Hsieh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-4532:
--

Attachment: (was: hbase-4532-remove-system.out.println.patch)

 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-28 Thread Jonathan Hsieh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-4532:
--

Attachment: hbase-4532-remove-system.out.println.patch

Attached file has the system out line removed.

 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch, hbase-4532-remove-system.out.println.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-28 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4532:
--

Fix Version/s: 0.94.0

 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.94.0

 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch, hbase-4532-remove-system.out.println.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-21 Thread Liyin Tang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-4532:
--

Description: 
The previous jira, HBASE-4469, is to avoid the top row seek operation if 
row-col bloom filter is enabled. 
This jira tries to avoid top row seek for all the cases by creating a dedicated 
bloom filter only for delete family

The only subtle use case is when we are interested in the top row with empty 
column.

For example, 
we are interested in row1/cf1:/1/put.
So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
Then it will avoid the top row seek and return a fake kv, which is the last kv 
for this row (createLastOnRowCol).
In this way, we have already missed the real kv we are interested in.


The solution for the above problem is to disable this optimization if we are 
trying to GET/SCAN a row with empty column.


Evaluation from TestSeekOptimization:
Previously:
For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%

So we can only get 10% more seek savings if the ROWCOL bloom filter is 
enabled.[HBASE-4469]



After this change:
For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%

So we can get 10% more seek savings for ALL kinds of bloom filter.

  was:
HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
enabled. 
This jira tries to avoid top row seek for all the cases by creating a dedicated 
bloom filter only for delete family

The only subtle use case is when we are interested in the top row with empty 
column.

For example, 
we are interested in row1/cf1:/1/put.
So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
Then it will avoid the top row seek and return a fake kv, which is the last kv 
for this row (createLastOnRowCol).
In this way, we have already missed the real kv we are interested in.


The solution for the above problem is to disable this optimization if we are 
trying to GET/SCAN a row with empty column.




 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D27.1.patch, D27.1.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 

[jira] [Updated] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-21 Thread Liyin Tang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-4532:
--

Attachment: hbase-4532-89-fb.patch

 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D27.1.patch, D27.1.patch, hbase-4532-89-fb.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-21 Thread Liyin Tang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-4532:
--

Attachment: HBASE-4532-apache-trunk.patch

 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-18 Thread jsichi (John Sichi) (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jsichi (John Sichi) updated HBASE-4532:
---

Attachment: D27.1.patch

Liyin requested code review of [jira] [HBASE-4532] Avoid top row seek by 
dedicated bloom filter for delete family bloom filter.
Reviewers: DUMMY_REVIEWER

  hbase 4532 89

  a href=https://issues.apache.org/jira/browse/HBASE-4469; title=Avoid top 
row seek by looking up bloomfilterdelHBASE-4469/del/a avoids the top 
row seek operation if row-col bloom filter is enabled.
  This jira tries to avoid top row seek for all the cases by creating a 
dedicated bloom filter only for delete family

  The only subtle use case is when we are interested in the top row with empty 
column.

  For example,
  we are interested in row1/cf1:/1/put.
  So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
  Then it will avoid the top row seek and return a fake kv, which is the last 
kv for this row (createLastOnRowCol).
  In this way, we have already missed the real kv we are interested in.

  The solution for the above problem is to disable this optimization if we are 
trying to GET/SCAN a row with empty column.

TEST PLAN
  EMPTY

REVISION DETAIL
  http://reviews.facebook.net/D27

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/KeyValue.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
  src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
  
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java

MANAGE HERALD DIFFERENTIAL RULES
  http://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  http://reviews.facebook.net/herald/transcript/57/

Tip: use the X-Herald-Rules header to filter Herald messages in your client.


 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D27.1.patch, D27.1.patch


 HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
 enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-18 Thread jsichi (John Sichi) (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jsichi (John Sichi) updated HBASE-4532:
---

Attachment: D27.1.patch

Liyin requested code review of [jira] [HBASE-4532] Avoid top row seek by 
dedicated bloom filter for delete family bloom filter.
Reviewers: DUMMY_REVIEWER

  hbase 4532 89

  a href=https://issues.apache.org/jira/browse/HBASE-4469; title=Avoid top 
row seek by looking up bloomfilterdelHBASE-4469/del/a avoids the top 
row seek operation if row-col bloom filter is enabled.
  This jira tries to avoid top row seek for all the cases by creating a 
dedicated bloom filter only for delete family

  The only subtle use case is when we are interested in the top row with empty 
column.

  For example,
  we are interested in row1/cf1:/1/put.
  So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
  Then it will avoid the top row seek and return a fake kv, which is the last 
kv for this row (createLastOnRowCol).
  In this way, we have already missed the real kv we are interested in.

  The solution for the above problem is to disable this optimization if we are 
trying to GET/SCAN a row with empty column.

TEST PLAN
  EMPTY

REVISION DETAIL
  http://reviews.facebook.net/D27

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/KeyValue.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
  src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
  
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java

MANAGE HERALD DIFFERENTIAL RULES
  http://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  http://reviews.facebook.net/herald/transcript/57/

Tip: use the X-Herald-Rules header to filter Herald messages in your client.


 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D27.1.patch, D27.1.patch


 HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
 enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-14 Thread Liyin Tang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-4532:
--

Description: 
HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
enabled. 
This jira tries to avoid top row seek for all the cases by creating a dedicated 
bloom filter only for delete family

The only subtle use case is when we are interested in the top row with empty 
column.

For example, 
we are interested in row1/cf1:/1/put.
So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
Then it will avoid the top row seek and return a fake kv, which is the last kv 
for this row (createLastOnRowCol).
In this way, we have already missed the real kv we are interested in.


The solution for the above problem is to disable this optimization if we are 
trying to GET/SCAN a row with empty column.



  was:
HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
enabled. 
This jira tries to avoid top row seek for all the cases by creating a dedicated 
bloom filter only for row with empty column.

Previous solution is to create the dedicated bloom filter for delete family, 
which does not work if there is a row with empty column.
For example, 
we are interested in row1/cf1:/1/put.
So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
Then it will avoid the top row seek and return a fake kv, which is the last kv 
for this row (createLastOnRowCol).
In this way, we have already missed the real kv we are interested in.


The root cause is that even there is no delete family at top row, we still 
cannot avoid the top row seek.
We can ONLY avoid the top row seek when there is no row with empty column, no 
matter what kind of kv type (delete/deleteCol/deleteFamily/put).
So the current solution is to create the dedicate bloom filter for row with 
empty column.


Summary: Avoid top row seek by dedicated bloom filter for delete family 
bloom filter  (was: Avoid top row seek by dedicated bloom filter for row with 
empty column)

 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

 HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
 enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family

2011-10-13 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4532:
-

Comment: was deleted

(was: OK. I was confused.  I'm +0 on this patch (since I am not familiar with 
what is going on here -- it looks innocuous enough).  Jon you going to commit?)

 Avoid top row seek by dedicated bloom filter for delete family
 --

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

 HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
 enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira