[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2012-03-26 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239167#comment-13239167
 ] 

stack commented on HBASE-4532:
--

This feature looks like its always on (which would make sense).  Can you 
confirm Liyin?  Thanks.

 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.94.0

 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch, hbase-4532-remove-system.out.println.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-12-19 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13172794#comment-13172794
 ] 

Phabricator commented on HBASE-4532:


Liyin has abandoned the revision [jira] [HBASE-4532] Avoid top row seek by 
dedicated bloom filter for delete family bloom filter.

  abandon the stale revision.

REVISION DETAIL
  https://reviews.facebook.net/D27


 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.94.0

 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch, hbase-4532-remove-system.out.println.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-12-19 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13172795#comment-13172795
 ] 

Phabricator commented on HBASE-4532:


Liyin has abandoned the revision [jira] [HBASE-4532] Avoid top row seek by 
dedicated bloom filter for delete family bloom filter.

  abandon the stale revision.

REVISION DETAIL
  https://reviews.facebook.net/D27


 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.94.0

 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch, hbase-4532-remove-system.out.println.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-12-19 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13172799#comment-13172799
 ] 

Phabricator commented on HBASE-4532:


Liyin has abandoned the revision [jira] [HBASE-4532] Avoid top row seek by 
dedicated bloom filter for delete family bloom filter.

  abandon the stale revision.

REVISION DETAIL
  https://reviews.facebook.net/D27


 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.94.0

 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch, hbase-4532-remove-system.out.println.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-11-01 Thread Liyin Tang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13141360#comment-13141360
 ] 

Liyin Tang commented on HBASE-4532:
---

Shall we add an incompatible flag for this jira?
Because adding a new block type is not backward compatible.

 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.94.0

 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch, hbase-4532-remove-system.out.println.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-11-01 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13141362#comment-13141362
 ] 

Ted Yu commented on HBASE-4532:
---

@Liyin:
Can you update Release Notes ?
Thanks

 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.94.0

 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch, hbase-4532-remove-system.out.println.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-11-01 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13141756#comment-13141756
 ] 

Hudson commented on HBASE-4532:
---

Integrated in HBase-TRUNK #2397 (See 
[https://builds.apache.org/job/HBase-TRUNK/2397/])
Fixed CHANGES file for HBASE-4532  HBASE-4611

nspiegelberg : 
Files : 
* /hbase/trunk/CHANGES.txt


 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.94.0

 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch, hbase-4532-remove-system.out.println.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-31 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13140463#comment-13140463
 ] 

Ted Yu commented on HBASE-4532:
---

Looks like CHANGES.txt wasn't updated to include this JIRA.

 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.94.0

 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch, hbase-4532-remove-system.out.println.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-28 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138729#comment-13138729
 ] 

Jonathan Hsieh commented on HBASE-4532:
---

This seems to be checked into trunk now and there seems to be an extraneous 
System.out.println that is causing some of my tests to fail when run from 
maven (apparently maven buffers in memory instead of writing it out as a test 
is executing).

Here's the OOME that maven reports:

Exception in thread ThreadedStreamConsumer java.lang.OutOfMemoryError: Java 
heap spaceat java.util.Arrays.copyOf(Arrays.java:2882)at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)at
 java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)at 
java.lang.StringBuffer.append(StringBuffer.java:224)at 
org.apache.maven.surefire.report.ConsoleOutputFileReporter.writeMessage(ConsoleOutputFileReporter.java:115)at
 
org.apache.maven.surefire.report.MulticastingReporter.writeMessage(MulticastingReporter.java:101)at
 
org.apache.maven.surefire.report.TestSetRunListener.writeTestOutput(TestSetRunListener.java:99)at
 
org.apache.maven.plugin.surefire.booterclient.output.ForkClient.consumeLine(ForkClient.java:132)at
 
org.apache.maven.plugin.surefire.booterclient.output.ThreadedStreamConsumer$Pumper.run(ThreadedStreamConsumer.java:67)at
 java.lang.Thread.run(Thread.java:662) man

I've attached a patch eliminates this issue.


 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-28 Thread Liyin Tang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138799#comment-13138799
 ] 

Liyin Tang commented on HBASE-4532:
---

Thanks Jonathan for the patch. I should remove this line out.

 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch, hbase-4532-remove-system.out.println.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-28 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138822#comment-13138822
 ] 

Jonathan Gray commented on HBASE-4532:
--

Please stop doing multiple commits on the same JIRA! :)  I thought we agreed on 
this, or no?

 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.94.0

 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch, hbase-4532-remove-system.out.println.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-28 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138826#comment-13138826
 ] 

Ted Yu commented on HBASE-4532:
---

This JIRA wasn't closed before applying the addendum.
May I ask why this JIRA was integrated on the 24th without announcement ?

 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.94.0

 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch, hbase-4532-remove-system.out.println.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-28 Thread Jonathan Gray (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138831#comment-13138831
 ] 

Jonathan Gray commented on HBASE-4532:
--

I don't think JIRA being open/closed is the issue, it's more multiple commits.

But yeah, as a separate note, looks like there was no final comment and 
resolution after the commit.

 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.94.0

 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch, hbase-4532-remove-system.out.println.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-28 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138834#comment-13138834
 ] 

Ted Yu commented on HBASE-4532:
---

I would interpret the JIRA not being resolved as anticipation for an addendum 
:-)

 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.94.0

 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch, hbase-4532-remove-system.out.println.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-28 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138992#comment-13138992
 ] 

Hudson commented on HBASE-4532:
---

Integrated in HBase-TRUNK #2380 (See 
[https://builds.apache.org/job/HBase-TRUNK/2380/])
HBASE-4532 remove system.out.println (Jonathan Hsieh)

tedyu : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java


 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.94.0

 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch, hbase-4532-remove-system.out.println.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-28 Thread Liyin Tang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13139016#comment-13139016
 ] 

Liyin Tang commented on HBASE-4532:
---

Thank Ted, Jonathan Gray for committing this. 
I will double check the submitted patch to avoid this problem.

Nice Catch Jonathan Hsieh. Thank you for the patch:)

 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.94.0

 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch, hbase-4532-remove-system.out.println.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-24 Thread Nicolas Spiegelberg (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134277#comment-13134277
 ] 

Nicolas Spiegelberg commented on HBASE-4532:


+1 on commit.  TestHCM is an issue unrelated to this JIRA and shouldn't hold it 
up.  Should use 'git bisect' to figure out where it was introduced and comment 
on that JIRA.

 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134421#comment-13134421
 ] 

Hudson commented on HBASE-4532:
---

Integrated in HBase-TRUNK #2363 (See 
[https://builds.apache.org/job/HBase-TRUNK/2363/])
HBASE-4532 Avoid top row seek by dedicated bloom filter for delete family 
bloom filter

nspiegelberg : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java


 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please 

[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-23 Thread Liyin Tang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13133585#comment-13133585
 ] 

Liyin Tang commented on HBASE-4532:
---

Thanks Ted:)
here is the test results I got.
So the testConnectionUniqueness in TestHCM has been fixed now ?

==
Results :

Tests in error: 
  testConnectionUniqueness(org.apache.hadoop.hbase.client.TestHCM)
  
testOrphanLogCreation(org.apache.hadoop.hbase.master.TestDistributedLogSplitting):
 Unexpected exception, 
expectedorg.apache.hadoop.hbase.regionserver.wal.OrphanHLogAfterSplitException
 but wasjava.lang.NullPointerException
  
testOrphanLogCreation(org.apache.hadoop.hbase.master.TestDistributedLogSplitting)
  
testRecoveredEdits(org.apache.hadoop.hbase.master.TestDistributedLogSplitting): 
/data/users/liyintang/hbase-os-trunk/target/test-data/3d058c80-266a-4164-8143-925d514f016e/09d560d3-254e-4986-abe1-22b876d299f1/4758e332-2ae7-4194-bfea-900ee4a2e3ab/dfs/name1/current/fsimage
 (Too many open files)
  testRecoveredEdits(org.apache.hadoop.hbase.master.TestDistributedLogSplitting)
  testWorkerAbort(org.apache.hadoop.hbase.master.TestDistributedLogSplitting): 
/data/users/liyintang/hbase-os-trunk/target/test-data/3d058c80-266a-4164-8143-925d514f016e/09d560d3-254e-4986-abe1-22b876d299f1/4758e332-2ae7-4194-bfea-900ee4a2e3ab/3949c75c-8c23-4513-b1cc-e94b1bba640b/dfs/name1/current/fsimage
 (Too many open files) 
  testWorkerAbort(org.apache.hadoop.hbase.master.TestDistributedLogSplitting)

Tests run: 1056, Failures: 0, Errors: 7, Skipped: 9

 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-23 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13133599#comment-13133599
 ] 

Ted Yu commented on HBASE-4532:
---

TestHCM wasn't fixed. If the test fails consistently, maybe you can help debug 
it. 
For the other test failures, it seems ulimit on the machine performing tests 
has to
be increased. 

 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-22 Thread Liyin Tang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13133449#comment-13133449
 ] 

Liyin Tang commented on HBASE-4532:
---

For 89-fb, all the unit tests are passed.
For apache-trunk, there are 2 unit tests failed with and without my change:
TestHCM and TestDistributedLogSpliting

 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-22 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13133495#comment-13133495
 ] 

Ted Yu commented on HBASE-4532:
---

Thanks for running the test suites, Liyin.

TRUNK build 2358 passed.
Though I did see this in build 2357:
https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2357/testReport/junit/org.apache.hadoop.hbase.client/TestHCM/testConnectionUniqueness/

Please tell us the subtest that failed for the above two tests.

 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, 
 hbase-4532-89-fb.patch


 The previous jira, HBASE-4469, is to avoid the top row seek operation if 
 row-col bloom filter is enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.
 Evaluation from TestSeekOptimization:
 Previously:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1714 (68.40%), savings: 31.60%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
 enabled.[HBASE-4469]
 
 After this change:
 For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
 optimization: 1458 (58.18%), savings: 41.82%
 So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13132995#comment-13132995
 ] 

jirapos...@reviews.apache.org commented on HBASE-4532:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/
---

(Updated 2011-10-21 19:49:16.120589)


Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, 
Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry 
Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.


Changes
---

Thanks Kannan:) Update the diff to address Kannan's review


Summary
---

HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
enabled. 
This jira tries to avoid top row seek for all the cases by creating a dedicated 
bloom filter only for delete family

The only subtle use case is when we are interested in the top row with empty 
column.

For example, 
we are interested in row1/cf1:/1/put.
So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
Then it will avoid the top row seek and return a fake kv, which is the last kv 
for this row (createLastOnRowCol).
In this way, we have already missed the real kv we are interested in.

The solution for the above problem is to disable this optimization if we are 
trying to GET/SCAN a row with empty column.

This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as 
well. I will submit the patch for apache-trunk later.


This addresses bug HBASE-4532.
https://issues.apache.org/jira/browse/HBASE-4532


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb 
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 
6cf7cce 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 
92070b3 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 
ebb360c 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 
  src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java 
b8bcc65 
  
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 
48e9163 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 

Diff: https://reviews.apache.org/r/2393/diff


Testing
---

Passed all the unit tests


Thanks,

Liyin



 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D27.1.patch, D27.1.patch


 HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
 enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13133002#comment-13133002
 ] 

jirapos...@reviews.apache.org commented on HBASE-4532:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/
---

(Updated 2011-10-21 19:58:05.693922)


Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, 
Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry 
Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.


Summary (updated)
---

The previous jira, HBASE-4469, is to avoid the top row seek operation if 
row-col bloom filter is enabled. 
This jira tries to avoid top row seek for all the cases by creating a dedicated 
bloom filter only for delete family

The only subtle use case is when we are interested in the top row with empty 
column.

For example, 
we are interested in row1/cf1:/1/put.
So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
Then it will avoid the top row seek and return a fake kv, which is the last kv 
for this row (createLastOnRowCol).
In this way, we have already missed the real kv we are interested in.


The solution for the above problem is to disable this optimization if we are 
trying to GET/SCAN a row with empty column.


Evaluation from TestSeekOptimization:
Previously:
For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%

For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%

So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
enabled.[HBASE-4469]



After this change:
For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%

For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%

So we can get about 10% more seek savings for ALL kinds of bloom filter.


This addresses bug HBASE-4532.
https://issues.apache.org/jira/browse/HBASE-4532


Diffs
-

  src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb 
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 
6cf7cce 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 
92070b3 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 
ebb360c 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 
  src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java 
b8bcc65 
  
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 
48e9163 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 

Diff: https://reviews.apache.org/r/2393/diff


Testing
---

Passed all the unit tests


Thanks,

Liyin



 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: 

[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13133003#comment-13133003
 ] 

jirapos...@reviews.apache.org commented on HBASE-4532:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/#review2758
---

Ship it!


+1. One typo below.


src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
https://reviews.apache.org/r/2393/#comment6199

exits - exist


- Kannan


On 2011-10-21 19:58:05, Liyin Tang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2393/
bq.  ---
bq.  
bq.  (Updated 2011-10-21 19:58:05)
bq.  
bq.  
bq.  Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, 
Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry 
Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  The previous jira, HBASE-4469, is to avoid the top row seek operation if 
row-col bloom filter is enabled. 
bq.  This jira tries to avoid top row seek for all the cases by creating a 
dedicated bloom filter only for delete family
bq.  
bq.  The only subtle use case is when we are interested in the top row with 
empty column.
bq.  
bq.  For example, 
bq.  we are interested in row1/cf1:/1/put.
bq.  So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
bq.  Then it will avoid the top row seek and return a fake kv, which is the 
last kv for this row (createLastOnRowCol).
bq.  In this way, we have already missed the real kv we are interested in.
bq.  
bq.  
bq.  The solution for the above problem is to disable this optimization if we 
are trying to GET/SCAN a row with empty column.
bq.  
bq.  
bq.  Evaluation from TestSeekOptimization:
bq.  Previously:
bq.  For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
optimization: 1714 (68.40%), savings: 31.60%
bq.  For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
optimization: 1714 (68.40%), savings: 31.60%
bq.  For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
bq.  
bq.  For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
optimization: 1714 (68.40%), savings: 31.60%
bq.  For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
optimization: 1714 (68.40%), savings: 31.60%
bq.  For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
bq.  
bq.  So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter 
is enabled.[HBASE-4469]
bq.  
bq.  
bq.  
bq.  After this change:
bq.  For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
bq.  For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
bq.  For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
bq.  
bq.  For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
bq.  For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
bq.  For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
bq.  
bq.  So we can get about 10% more seek savings for ALL kinds of bloom filter.
bq.  
bq.  
bq.  This addresses bug HBASE-4532.
bq.  https://issues.apache.org/jira/browse/HBASE-4532
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 
6cf7cce 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 
1f78dd4 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 
3c34f86 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 
2e1d23a 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java 
c4b60e9 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 
92070b3 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 
e4dfc2e 
bq.

[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-21 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13133008#comment-13133008
 ] 

jirapos...@reviews.apache.org commented on HBASE-4532:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/
---

(Updated 2011-10-21 20:01:23.160793)


Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, 
Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry 
Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.


Changes
---

Fix the typo:)


Summary
---

The previous jira, HBASE-4469, is to avoid the top row seek operation if 
row-col bloom filter is enabled. 
This jira tries to avoid top row seek for all the cases by creating a dedicated 
bloom filter only for delete family

The only subtle use case is when we are interested in the top row with empty 
column.

For example, 
we are interested in row1/cf1:/1/put.
So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
Then it will avoid the top row seek and return a fake kv, which is the last kv 
for this row (createLastOnRowCol).
In this way, we have already missed the real kv we are interested in.


The solution for the above problem is to disable this optimization if we are 
trying to GET/SCAN a row with empty column.


Evaluation from TestSeekOptimization:
Previously:
For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%

For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%

So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is 
enabled.[HBASE-4469]



After this change:
For bloom=NONE, compr=NONE total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROW, compr=NONE total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%

For bloom=NONE, compr=GZ total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROW, compr=GZ total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with 
optimization: 1458 (58.18%), savings: 41.82%

So we can get about 10% more seek savings for ALL kinds of bloom filter.


This addresses bug HBASE-4532.
https://issues.apache.org/jira/browse/HBASE-4532


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb 
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 
6cf7cce 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 
92070b3 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 
ebb360c 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 
  src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java 
b8bcc65 
  
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 
48e9163 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 

Diff: https://reviews.apache.org/r/2393/diff


Testing
---

Passed all the unit tests


Thanks,

Liyin



 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: 

[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-20 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131813#comment-13131813
 ] 

jirapos...@reviews.apache.org commented on HBASE-4532:
--



bq.  On 2011-10-20 04:55:44, Jonathan Gray wrote:
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 
792
bq.   https://reviews.apache.org/r/2393/diff/3/?file=51375#file51375line792
bq.  
bq.   is there _ever_ a case someone would not want this turned on?  if 
someone was doing a ton of delete families maybe?  u might not want to pay the 
cost of making this bloom.

Yes, We can disable this by 
conf.setBoolean(IO_STOREFILE_DELETEFAMILY_BLOOM_ENABLED, false);


bq.  On 2011-10-20 04:55:44, Jonathan Gray wrote:
bq.   
src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java, line 
98
bq.   https://reviews.apache.org/r/2393/diff/3/?file=51374#file51374line98
bq.  
bq.   this means the null qualifier?

yes. I have updated the comments:)


bq.  On 2011-10-20 04:55:44, Jonathan Gray wrote:
bq.   src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java, lines 
677-678
bq.   https://reviews.apache.org/r/2393/diff/3/?file=51370#file51370line677
bq.  
bq.   is this right to return IOE and not null like if it doesn't exist in 
the general bloom case?

agreed :)


bq.  On 2011-10-20 04:55:44, Jonathan Gray wrote:
bq.   
src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java, lines 
67-68
bq.   https://reviews.apache.org/r/2393/diff/3/?file=51374#file51374line67
bq.  
bq.   can you describe what an empty column means?  does this mean 
wildcard or does this mean the null column?

yes. I have updated the comments:)


bq.  On 2011-10-20 04:55:44, Jonathan Gray wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java, line 
73
bq.   https://reviews.apache.org/r/2393/diff/3/?file=51378#file51378line73
bq.  
bq.   this enables the creation or the usage?

This enable the creation.


bq.  On 2011-10-20 04:55:44, Jonathan Gray wrote:
bq.   src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java, 
line 26
bq.   https://reviews.apache.org/r/2393/diff/3/?file=51379#file51379line26
bq.  
bq.   it's hard to tell what in this actually changed.  i don't see much 
that actually went down?  and should you also do some tests where you 
enable/disable the delete family bloom to ensure that it's working as expected 
both ways?

It expects no number goes down :) It shows we can avoid the top row seek even 
there is ROW/NONE bloom filter.

Previously, this unit test only enabled the ROWCOL bloom filter for HBASE-4469 
(Avoid top row seek by looking up row_col bloomfilter)

But right now, in the TestBlocksRead, it will check the number seeks for 
ROWCOL, ROW and NONE Bloom filter one by one.  
No matter what Bloom filter the CF is using, we always avoid the top row seek:)


bq.  On 2011-10-20 04:55:44, Jonathan Gray wrote:
bq.   src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java, 
line 401
bq.   https://reviews.apache.org/r/2393/diff/3/?file=51381#file51381line401
bq.  
bq.   you seem to be setting the conf to 0.01 and then retrieving it back?

Yes. I try to be consistent with other bloom filter unit tests. 
So set the same error rate as testBloomFilter() function.


- Liyin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/#review2693
---


On 2011-10-20 03:46:26, Liyin Tang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2393/
bq.  ---
bq.  
bq.  (Updated 2011-10-20 03:46:26)
bq.  
bq.  
bq.  Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, 
Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry 
Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
enabled. 
bq.  This jira tries to avoid top row seek for all the cases by creating a 
dedicated bloom filter only for delete family
bq.  
bq.  The only subtle use case is when we are interested in the top row with 
empty column.
bq.  
bq.  For example, 
bq.  we are interested in row1/cf1:/1/put.
bq.  So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
bq.  Then it will avoid the top row seek and return a fake kv, which is the 
last kv for this row (createLastOnRowCol).
bq.  In this way, we have already missed the real kv we are interested in.
bq.  
bq.  The solution for 

[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-20 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131814#comment-13131814
 ] 

jirapos...@reviews.apache.org commented on HBASE-4532:
--



bq.  On 2011-10-20 05:07:00, Ted Yu wrote:
bq.   
src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java, line 
98
bq.   https://reviews.apache.org/r/2393/diff/3/?file=51374#file51374line98
bq.  
bq.   If hasEmptyColumn is true, shall we be using 
ScanWildcardColumnTracker (as in the if block) ?

I have updated comments for this variable:
   * This variable shows whether there is an null column in the query.
   * There is always a null column in the wildcard column query. 
   * There maybe exits a null column in the explicit column query based on the 
   * first column.


- Liyin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/#review2695
---


On 2011-10-20 03:46:26, Liyin Tang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2393/
bq.  ---
bq.  
bq.  (Updated 2011-10-20 03:46:26)
bq.  
bq.  
bq.  Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, 
Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry 
Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
enabled. 
bq.  This jira tries to avoid top row seek for all the cases by creating a 
dedicated bloom filter only for delete family
bq.  
bq.  The only subtle use case is when we are interested in the top row with 
empty column.
bq.  
bq.  For example, 
bq.  we are interested in row1/cf1:/1/put.
bq.  So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
bq.  Then it will avoid the top row seek and return a fake kv, which is the 
last kv for this row (createLastOnRowCol).
bq.  In this way, we have already missed the real kv we are interested in.
bq.  
bq.  The solution for the above problem is to disable this optimization if we 
are trying to GET/SCAN a row with empty column.
bq.  
bq.  This patch is rebased on 0.89-fb. But it should be the same for 
apache-trunk as well. I will submit the patch for apache-trunk later.
bq.  
bq.  
bq.  This addresses bug HBASE-4532.
bq.  https://issues.apache.org/jira/browse/HBASE-4532
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 
6cf7cce 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 
1f78dd4 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 
3c34f86 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 
2e1d23a 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java 
c4b60e9 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 
92070b3 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 
e4dfc2e 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 
ebb360c 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 
8814812 
bq.src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java 
fb4f2df 
bq.src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java 
b8bcc65 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 
48e9163 
bq.src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 
0eca9b8 
bq.  
bq.  Diff: https://reviews.apache.org/r/2393/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Passed all the unit tests
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Liyin
bq.  
bq.



 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D27.1.patch, D27.1.patch


 HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
 enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 

[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-20 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131821#comment-13131821
 ] 

jirapos...@reviews.apache.org commented on HBASE-4532:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/
---

(Updated 2011-10-20 17:35:38.480668)


Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, 
Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry 
Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.


Changes
---

Thanks Jonathan and Ted's review. Address their comments.


Summary
---

HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
enabled. 
This jira tries to avoid top row seek for all the cases by creating a dedicated 
bloom filter only for delete family

The only subtle use case is when we are interested in the top row with empty 
column.

For example, 
we are interested in row1/cf1:/1/put.
So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
Then it will avoid the top row seek and return a fake kv, which is the last kv 
for this row (createLastOnRowCol).
In this way, we have already missed the real kv we are interested in.

The solution for the above problem is to disable this optimization if we are 
trying to GET/SCAN a row with empty column.

This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as 
well. I will submit the patch for apache-trunk later.


This addresses bug HBASE-4532.
https://issues.apache.org/jira/browse/HBASE-4532


Diffs (updated)
-

  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 
  
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 
48e9163 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 
ebb360c 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 
  src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java 
b8bcc65 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 
92070b3 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 
6cf7cce 
  src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb 
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 

Diff: https://reviews.apache.org/r/2393/diff


Testing
---

Passed all the unit tests


Thanks,

Liyin



 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D27.1.patch, D27.1.patch


 HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
 enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-20 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131893#comment-13131893
 ] 

jirapos...@reviews.apache.org commented on HBASE-4532:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/
---

(Updated 2011-10-20 18:45:51.242661)


Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, 
Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry 
Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.


Summary
---

HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
enabled. 
This jira tries to avoid top row seek for all the cases by creating a dedicated 
bloom filter only for delete family

The only subtle use case is when we are interested in the top row with empty 
column.

For example, 
we are interested in row1/cf1:/1/put.
So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
Then it will avoid the top row seek and return a fake kv, which is the last kv 
for this row (createLastOnRowCol).
In this way, we have already missed the real kv we are interested in.

The solution for the above problem is to disable this optimization if we are 
trying to GET/SCAN a row with empty column.

This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as 
well. I will submit the patch for apache-trunk later.


This addresses bug HBASE-4532.
https://issues.apache.org/jira/browse/HBASE-4532


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 
92070b3 
  src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb 
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 
6cf7cce 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 
ebb360c 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 
  src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java 
b8bcc65 
  
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 
48e9163 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 

Diff: https://reviews.apache.org/r/2393/diff


Testing
---

Passed all the unit tests


Thanks,

Liyin



 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D27.1.patch, D27.1.patch


 HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
 enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-20 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131950#comment-13131950
 ] 

jirapos...@reviews.apache.org commented on HBASE-4532:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/#review2720
---

Ship it!


Looks good!  Want to attach the various branch patches to the JIRA?  (As you 
have them).  I can help rebase on trunk if you'd like, let me know.  Nice work!

- Jonathan


On 2011-10-20 18:45:51, Liyin Tang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2393/
bq.  ---
bq.  
bq.  (Updated 2011-10-20 18:45:51)
bq.  
bq.  
bq.  Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, 
Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry 
Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
enabled. 
bq.  This jira tries to avoid top row seek for all the cases by creating a 
dedicated bloom filter only for delete family
bq.  
bq.  The only subtle use case is when we are interested in the top row with 
empty column.
bq.  
bq.  For example, 
bq.  we are interested in row1/cf1:/1/put.
bq.  So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
bq.  Then it will avoid the top row seek and return a fake kv, which is the 
last kv for this row (createLastOnRowCol).
bq.  In this way, we have already missed the real kv we are interested in.
bq.  
bq.  The solution for the above problem is to disable this optimization if we 
are trying to GET/SCAN a row with empty column.
bq.  
bq.  This patch is rebased on 0.89-fb. But it should be the same for 
apache-trunk as well. I will submit the patch for apache-trunk later.
bq.  
bq.  
bq.  This addresses bug HBASE-4532.
bq.  https://issues.apache.org/jira/browse/HBASE-4532
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 
e4dfc2e 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java 
c4b60e9 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 
92070b3 
bq.src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 
6cf7cce 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 
1f78dd4 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 
3c34f86 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 
2e1d23a 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 
ebb360c 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 
8814812 
bq.src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java 
fb4f2df 
bq.src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java 
b8bcc65 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 
48e9163 
bq.src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 
0eca9b8 
bq.  
bq.  Diff: https://reviews.apache.org/r/2393/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Passed all the unit tests
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Liyin
bq.  
bq.



 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D27.1.patch, D27.1.patch


 HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
 enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row 

[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-20 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13132100#comment-13132100
 ] 

jirapos...@reviews.apache.org commented on HBASE-4532:
--



bq.  On 2011-10-20 19:47:02, Jonathan Gray wrote:
bq.   Looks good!  Want to attach the various branch patches to the JIRA?  (As 
you have them).  I can help rebase on trunk if you'd like, let me know.  Nice 
work!

thanks Jonathan for reviewing. I will submit a  patch against apache trunk :)


- Liyin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/#review2720
---


On 2011-10-20 18:45:51, Liyin Tang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2393/
bq.  ---
bq.  
bq.  (Updated 2011-10-20 18:45:51)
bq.  
bq.  
bq.  Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, 
Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry 
Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
enabled. 
bq.  This jira tries to avoid top row seek for all the cases by creating a 
dedicated bloom filter only for delete family
bq.  
bq.  The only subtle use case is when we are interested in the top row with 
empty column.
bq.  
bq.  For example, 
bq.  we are interested in row1/cf1:/1/put.
bq.  So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
bq.  Then it will avoid the top row seek and return a fake kv, which is the 
last kv for this row (createLastOnRowCol).
bq.  In this way, we have already missed the real kv we are interested in.
bq.  
bq.  The solution for the above problem is to disable this optimization if we 
are trying to GET/SCAN a row with empty column.
bq.  
bq.  This patch is rebased on 0.89-fb. But it should be the same for 
apache-trunk as well. I will submit the patch for apache-trunk later.
bq.  
bq.  
bq.  This addresses bug HBASE-4532.
bq.  https://issues.apache.org/jira/browse/HBASE-4532
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 
e4dfc2e 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java 
c4b60e9 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 
92070b3 
bq.src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 
6cf7cce 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 
1f78dd4 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 
3c34f86 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 
2e1d23a 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 
ebb360c 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 
8814812 
bq.src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java 
fb4f2df 
bq.src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java 
b8bcc65 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 
48e9163 
bq.src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 
0eca9b8 
bq.  
bq.  Diff: https://reviews.apache.org/r/2393/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Passed all the unit tests
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Liyin
bq.  
bq.



 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D27.1.patch, D27.1.patch


 HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
 enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO 

[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-20 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13132344#comment-13132344
 ] 

jirapos...@reviews.apache.org commented on HBASE-4532:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/#review2734
---


done with 2nd pass of review. Final comments inlined. This is looking really 
good.


src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
https://reviews.apache.org/r/2393/#comment6151

getGeneralBloom - getDeleteBloom?



src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
https://reviews.apache.org/r/2393/#comment6152

exits - exist



src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
https://reviews.apache.org/r/2393/#comment6154

tomorrow someone may introduce a new constructor which forgets to 
initialize this variable, and it'll default to false, which is an unsafe 
default for this variable.

Let's do something like:

private boolean hasNullColumn = true; // initialize conservatively



src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
https://reviews.apache.org/r/2393/#comment6153

simplify to:

hasNullColumn = (columnSet.first().length == 0);



src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
https://reviews.apache.org/r/2393/#comment6155

has - was



src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java
https://reviews.apache.org/r/2393/#comment6157

This function only does DeleteFamily Bloom. So, the comment needs to be 
updated to remove ROWCOL  ROW.



src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
https://reviews.apache.org/r/2393/#comment6158

nice enhancement to the test!


- Kannan


On 2011-10-20 18:45:51, Liyin Tang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2393/
bq.  ---
bq.  
bq.  (Updated 2011-10-20 18:45:51)
bq.  
bq.  
bq.  Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, 
Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry 
Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
enabled. 
bq.  This jira tries to avoid top row seek for all the cases by creating a 
dedicated bloom filter only for delete family
bq.  
bq.  The only subtle use case is when we are interested in the top row with 
empty column.
bq.  
bq.  For example, 
bq.  we are interested in row1/cf1:/1/put.
bq.  So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
bq.  Then it will avoid the top row seek and return a fake kv, which is the 
last kv for this row (createLastOnRowCol).
bq.  In this way, we have already missed the real kv we are interested in.
bq.  
bq.  The solution for the above problem is to disable this optimization if we 
are trying to GET/SCAN a row with empty column.
bq.  
bq.  This patch is rebased on 0.89-fb. But it should be the same for 
apache-trunk as well. I will submit the patch for apache-trunk later.
bq.  
bq.  
bq.  This addresses bug HBASE-4532.
bq.  https://issues.apache.org/jira/browse/HBASE-4532
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 
e4dfc2e 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java 
c4b60e9 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 
92070b3 
bq.src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 
6cf7cce 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 
1f78dd4 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 
3c34f86 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 
2e1d23a 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 
ebb360c 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 
8814812 
bq.src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java 
fb4f2df 
bq.src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java 
b8bcc65 
bq.

[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-19 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13130883#comment-13130883
 ] 

jirapos...@reviews.apache.org commented on HBASE-4532:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/#review2615
---


Hi Liyin-- find the first pass review comments inlined. Haven't reviewed the 
test changes yet. Looking fwd to this optimization landing.


src/main/java/org/apache/hadoop/hbase/KeyValue.java
https://reviews.apache.org/r/2393/#comment5908

remove qualifier from the comment, since all we are passing here is row 
and family (no column name).



src/main/java/org/apache/hadoop/hbase/KeyValue.java
https://reviews.apache.org/r/2393/#comment5909

* remove qualifier from comment here too.
* 80 char issue



src/main/java/org/apache/hadoop/hbase/KeyValue.java
https://reviews.apache.org/r/2393/#comment5910

remove qualifier param.



src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
https://reviews.apache.org/r/2393/#comment5913

80 char issues.



src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
https://reviews.apache.org/r/2393/#comment5914

can we enhance HFilePrettyPrinter to report info about the 
DeleteBloomFilter as well (provided HFile is V2).



src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
https://reviews.apache.org/r/2393/#comment5985

even though - even if



src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
https://reviews.apache.org/r/2393/#comment5987

what's the differerence between getPath() (in line 1027) and 
this.writer.getPath()? Did you mean to log the general  delete Bloom filter 
instead?



src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
https://reviews.apache.org/r/2393/#comment6015

not clear where you are using this -1 state



src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
https://reviews.apache.org/r/2393/#comment6017

or this is no - or there is no



src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
https://reviews.apache.org/r/2393/#comment6016

To make sure I understand this...

for HFileV1 case or for HFileV2 + but without this fix, I am guessing 
deleteFamilyCnt will be equal to -1, and the fact that it doesn't have a 
bloomFilter will cause it to return true. That look's fine. Just not obvious.



src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
https://reviews.apache.org/r/2393/#comment6014

space between cnt  !=



src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
https://reviews.apache.org/r/2393/#comment6020

did you intend to initialize bloomTypeLog here as well?



src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
https://reviews.apache.org/r/2393/#comment6021

bloomTypeLog is only initialized for GeneralBloomFilter case. If that's the 
intent, why not move the logging near line 1382?



src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
https://reviews.apache.org/r/2393/#comment6027

In case there is a deleteFamily kv, there are two sub-cases here...

a) we have ROWCOL bloom (in which case there is no DeleteFamilyBloomFilter) 
and we want to use the ROWCOL bloom filter itself.

b) we have a DeleteFamilyBloomFilter.

I don't see us taking advantage of (a) like we used to earlier. Isn't this 
a regression for the ROWCOL bloom case? And if so, TestBlocksRead should have 
caught it, no?



src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
https://reviews.apache.org/r/2393/#comment6023

isSeekToEmptyColumn and useBloom should be separate flags I think.

For example, if the CF had ROWCOL bloom, and the query for looking for 
row/0-length column, then with this change, we won't use the ROWCOL bloom 
filter even when it exists.

Isn't it the case that we want to avoid using only the deleteFamilyBloom 
filter when isSeekToEmptyColumn is true?


- Kannan


On 2011-10-18 20:38:41, Liyin Tang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2393/
bq.  ---
bq.  
bq.  (Updated 2011-10-18 20:38:41)
bq.  
bq.  
bq.  Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, 
Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry 
Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
enabled. 
bq.  This jira tries to 

[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-19 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131190#comment-13131190
 ] 

jirapos...@reviews.apache.org commented on HBASE-4532:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/
---

(Updated 2011-10-20 00:08:14.459108)


Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, 
Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry 
Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.


Changes
---

Thanks for Kannan's review.
Update the diff to address Kannan's comments.


Summary
---

HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
enabled. 
This jira tries to avoid top row seek for all the cases by creating a dedicated 
bloom filter only for delete family

The only subtle use case is when we are interested in the top row with empty 
column.

For example, 
we are interested in row1/cf1:/1/put.
So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
Then it will avoid the top row seek and return a fake kv, which is the last kv 
for this row (createLastOnRowCol).
In this way, we have already missed the real kv we are interested in.

The solution for the above problem is to disable this optimization if we are 
trying to GET/SCAN a row with empty column.

This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as 
well. I will submit the patch for apache-trunk later.


This addresses bug HBASE-4532.
https://issues.apache.org/jira/browse/HBASE-4532


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb 
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 
6cf7cce 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 
92070b3 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 
ebb360c 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 
  src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java 
b8bcc65 
  
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 
48e9163 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 

Diff: https://reviews.apache.org/r/2393/diff


Testing
---

Running all the unit tests now


Thanks,

Liyin



 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D27.1.patch, D27.1.patch


 HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
 enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-19 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131191#comment-13131191
 ] 

jirapos...@reviews.apache.org commented on HBASE-4532:
--



bq.  On 2011-10-19 19:02:46, Kannan Muthukkaruppan wrote:
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 
1058
bq.   https://reviews.apache.org/r/2393/diff/2/?file=50558#file50558line1058
bq.  
bq.   not clear where you are using this -1 state

Even if there is no delete family bloom filter, the Store file will still count 
how many delete family key values and append this information into HFile's File 
info.
So when reading the file, we will know how many delete family kvs.

However, if there is no this delete family field in the file info, 
deleteFamilyCnt shall be set to -1. So the function 
passesDeleteFamilyBloomFilter won't take this into account.


bq.  On 2011-10-19 19:02:46, Kannan Muthukkaruppan wrote:
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 
1217
bq.   https://reviews.apache.org/r/2393/diff/2/?file=50558#file50558line1217
bq.  
bq.   To make sure I understand this...
bq.   
bq.   for HFileV1 case or for HFileV2 + but without this fix, I am 
guessing deleteFamilyCnt will be equal to -1, and the fact that it doesn't have 
a bloomFilter will cause it to return true. That look's fine. Just not obvious.

Yes:) If there is a deleteFamilyCnt and the deleteFamilyCnt is 0, then there is 
no need to check Bloom filter and return false for function 
passesDeleteFamilyBloomFilter(). It means there is no need to seek this store 
file for delete family with the row.

if the deleteFamilyCnt is not initialized properly for some reason, which is 
set to -1, then it needs to check the delete family bloom filter.
So there is no delete family bloom filter, it will return true. It means it is 
possible that there is a delete family for this row.


bq.  On 2011-10-19 19:02:46, Kannan Muthukkaruppan wrote:
bq.   
src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java, line 
238
bq.   https://reviews.apache.org/r/2393/diff/2/?file=50559#file50559line238
bq.  
bq.   In case there is a deleteFamily kv, there are two sub-cases here...
bq.   
bq.   a) we have ROWCOL bloom (in which case there is no 
DeleteFamilyBloomFilter) and we want to use the ROWCOL bloom filter itself.
bq.   
bq.   b) we have a DeleteFamilyBloomFilter.
bq.   
bq.   I don't see us taking advantage of (a) like we used to earlier. 
Isn't this a regression for the ROWCOL bloom case? And if so, TestBlocksRead 
should have caught it, no?

1) Yes, it should the ROWCOL Bloom filter. It can also help to warm up row col 
bloom filter in the cache OR get benefit from block cache.
 I will update the code.
2) There is no regression for the ROWCOL bloom case. It is because we only 
count for data block seek number. 
No matter which bloom filter (delete family or row col), it will return the 
same result. So it won't affect the decision whether to seek to the store file 
file or not.
Please correct me if I am wrong :)


bq.  On 2011-10-19 19:02:46, Kannan Muthukkaruppan wrote:
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java, 
line 111
bq.   https://reviews.apache.org/r/2393/diff/2/?file=50560#file50560line111
bq.  
bq.   isSeekToEmptyColumn and useBloom should be separate flags I think.
bq.   
bq.   For example, if the CF had ROWCOL bloom, and the query for looking 
for row/0-length column, then with this change, we won't use the ROWCOL bloom 
filter even when it exists.
bq.   
bq.   Isn't it the case that we want to avoid using only the 
deleteFamilyBloom filter when isSeekToEmptyColumn is true?

Agree:) I will update the code to pass the scan query matcher to each store 
file scanner. Also this will help us for further optimization.
When the store file scanner has more information about the matcher's status, it 
may help to avoid more unnecessarily seeks.


- Liyin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/#review2615
---


On 2011-10-20 00:08:14, Liyin Tang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2393/
bq.  ---
bq.  
bq.  (Updated 2011-10-20 00:08:14)
bq.  
bq.  
bq.  Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, 
Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry 
Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  

[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-19 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131331#comment-13131331
 ] 

jirapos...@reviews.apache.org commented on HBASE-4532:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/
---

(Updated 2011-10-20 03:46:26.190655)


Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, 
Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry 
Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.


Summary
---

HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
enabled. 
This jira tries to avoid top row seek for all the cases by creating a dedicated 
bloom filter only for delete family

The only subtle use case is when we are interested in the top row with empty 
column.

For example, 
we are interested in row1/cf1:/1/put.
So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
Then it will avoid the top row seek and return a fake kv, which is the last kv 
for this row (createLastOnRowCol).
In this way, we have already missed the real kv we are interested in.

The solution for the above problem is to disable this optimization if we are 
trying to GET/SCAN a row with empty column.

This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as 
well. I will submit the patch for apache-trunk later.


This addresses bug HBASE-4532.
https://issues.apache.org/jira/browse/HBASE-4532


Diffs
-

  src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb 
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 
6cf7cce 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 
92070b3 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 
ebb360c 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 
  src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java 
b8bcc65 
  
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 
48e9163 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 

Diff: https://reviews.apache.org/r/2393/diff


Testing (updated)
---

Passed all the unit tests


Thanks,

Liyin



 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D27.1.patch, D27.1.patch


 HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
 enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-19 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131375#comment-13131375
 ] 

jirapos...@reviews.apache.org commented on HBASE-4532:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/#review2693
---


Nice work Liyin!  I've been wanting this feature for so long!  Some minor 
comments but looks good.


src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
https://reviews.apache.org/r/2393/#comment6069

is this right to return IOE and not null like if it doesn't exist in the 
general bloom case?



src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
https://reviews.apache.org/r/2393/#comment6070

missing leading space



src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
https://reviews.apache.org/r/2393/#comment6071

can you describe what an empty column means?  does this mean wildcard or 
does this mean the null column?



src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
https://reviews.apache.org/r/2393/#comment6072

this means the null qualifier?



src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
https://reviews.apache.org/r/2393/#comment6073

is there _ever_ a case someone would not want this turned on?  if someone 
was doing a ton of delete families maybe?  u might not want to pay the cost of 
making this bloom.



src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
https://reviews.apache.org/r/2393/#comment6074

flipped args



src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java
https://reviews.apache.org/r/2393/#comment6076

this enables the creation or the usage?



src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java
https://reviews.apache.org/r/2393/#comment6075

stale comment from general method



src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
https://reviews.apache.org/r/2393/#comment6077

it's hard to tell what in this actually changed.  i don't see much that 
actually went down?  and should you also do some tests where you enable/disable 
the delete family bloom to ensure that it's working as expected both ways?



src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
https://reviews.apache.org/r/2393/#comment6078

you seem to be setting the conf to 0.01 and then retrieving it back?


- Jonathan


On 2011-10-20 03:46:26, Liyin Tang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2393/
bq.  ---
bq.  
bq.  (Updated 2011-10-20 03:46:26)
bq.  
bq.  
bq.  Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, 
Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry 
Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
enabled. 
bq.  This jira tries to avoid top row seek for all the cases by creating a 
dedicated bloom filter only for delete family
bq.  
bq.  The only subtle use case is when we are interested in the top row with 
empty column.
bq.  
bq.  For example, 
bq.  we are interested in row1/cf1:/1/put.
bq.  So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
bq.  Then it will avoid the top row seek and return a fake kv, which is the 
last kv for this row (createLastOnRowCol).
bq.  In this way, we have already missed the real kv we are interested in.
bq.  
bq.  The solution for the above problem is to disable this optimization if we 
are trying to GET/SCAN a row with empty column.
bq.  
bq.  This patch is rebased on 0.89-fb. But it should be the same for 
apache-trunk as well. I will submit the patch for apache-trunk later.
bq.  
bq.  
bq.  This addresses bug HBASE-4532.
bq.  https://issues.apache.org/jira/browse/HBASE-4532
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 
6cf7cce 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 
1f78dd4 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 
3c34f86 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 
2e1d23a 
bq.

[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-19 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131381#comment-13131381
 ] 

jirapos...@reviews.apache.org commented on HBASE-4532:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/#review2695
---



src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
https://reviews.apache.org/r/2393/#comment6079

If hasEmptyColumn is true, shall we be using ScanWildcardColumnTracker (as 
in the if block) ?


- Ted


On 2011-10-20 03:46:26, Liyin Tang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2393/
bq.  ---
bq.  
bq.  (Updated 2011-10-20 03:46:26)
bq.  
bq.  
bq.  Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, 
Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry 
Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
enabled. 
bq.  This jira tries to avoid top row seek for all the cases by creating a 
dedicated bloom filter only for delete family
bq.  
bq.  The only subtle use case is when we are interested in the top row with 
empty column.
bq.  
bq.  For example, 
bq.  we are interested in row1/cf1:/1/put.
bq.  So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
bq.  Then it will avoid the top row seek and return a fake kv, which is the 
last kv for this row (createLastOnRowCol).
bq.  In this way, we have already missed the real kv we are interested in.
bq.  
bq.  The solution for the above problem is to disable this optimization if we 
are trying to GET/SCAN a row with empty column.
bq.  
bq.  This patch is rebased on 0.89-fb. But it should be the same for 
apache-trunk as well. I will submit the patch for apache-trunk later.
bq.  
bq.  
bq.  This addresses bug HBASE-4532.
bq.  https://issues.apache.org/jira/browse/HBASE-4532
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 
6cf7cce 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 
1f78dd4 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 
3c34f86 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 
2e1d23a 
bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java 
c4b60e9 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 
92070b3 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 
e4dfc2e 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 
ebb360c 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 
8814812 
bq.src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java 
fb4f2df 
bq.src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java 
b8bcc65 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 
48e9163 
bq.src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 
0eca9b8 
bq.  
bq.  Diff: https://reviews.apache.org/r/2393/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Passed all the unit tests
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Liyin
bq.  
bq.



 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D27.1.patch, D27.1.patch


 HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
 enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which 

[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-18 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13130006#comment-13130006
 ] 

jirapos...@reviews.apache.org commented on HBASE-4532:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/
---

(Updated 2011-10-18 20:38:41.990053)


Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, 
Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry 
Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.


Summary (updated)
---

HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
enabled. 
This jira tries to avoid top row seek for all the cases by creating a dedicated 
bloom filter only for delete family

The only subtle use case is when we are interested in the top row with empty 
column.

For example, 
we are interested in row1/cf1:/1/put.
So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
Then it will avoid the top row seek and return a fake kv, which is the last kv 
for this row (createLastOnRowCol).
In this way, we have already missed the real kv we are interested in.

The solution for the above problem is to disable this optimization if we are 
trying to GET/SCAN a row with empty column.

This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as 
well. I will submit the patch for apache-trunk later.


This addresses bug HBASE-4532.
https://issues.apache.org/jira/browse/HBASE-4532


Diffs
-

  src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb 
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 
6cf7cce 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 
a1d7de5 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 
ebb360c 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 
  src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java 
c88b23f 
  
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 
48e9163 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 

Diff: https://reviews.apache.org/r/2393/diff


Testing
---

Running all the unit tests now


Thanks,

Liyin



 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

 HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
 enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-15 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128253#comment-13128253
 ] 

jirapos...@reviews.apache.org commented on HBASE-4532:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/
---

Review request for hbase, Dhruba Borthakur, Michael Stack, Pritam Damania, 
Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin, 
Karthik Ranganathan, and Nicolas Spiegelberg.


Summary
---

HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
enabled. 
This jira tries to avoid top row seek for all the cases by creating a dedicated 
bloom filter only for row with empty column.

Previous solution is to create the dedicated bloom filter for delete family, 
which does not work if there is a row with empty column.
For example, 
we are interested in row1/cf1:/1/put.
So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
Then it will avoid the top row seek and return a fake kv, which is the last kv 
for this row (createLastOnRowCol).
In this way, we have already missed the real kv we are interested in.

The root cause is that even there is no delete family at top row, we still 
cannot avoid the top row seek.
We can ONLY avoid the top row seek when there is no row with empty column, no 
matter what kind of kv type (delete/deleteCol/deleteFamily/put).
So the current solution is to create the dedicate bloom filter for row with 
empty column.

This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as 
well. I will submit the patch for apache-trunk later.


This addresses bug HBASE-4532.
https://issues.apache.org/jira/browse/HBASE-4532


Diffs
-

  src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb 
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 
6cf7cce 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 
a1d7de5 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 
ebb360c 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 
  src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java 
c88b23f 
  
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 
48e9163 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 

Diff: https://reviews.apache.org/r/2393/diff


Testing
---

Running all the unit tests now


Thanks,

Liyin



 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

 HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
 enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-15 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128256#comment-13128256
 ] 

jirapos...@reviews.apache.org commented on HBASE-4532:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/
---

(Updated 2011-10-15 18:33:14.858552)


Review request for hbase, Dhruba Borthakur, Michael Stack, Pritam Damania, 
Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin, 
Karthik Ranganathan, and Nicolas Spiegelberg.


Summary (updated)
---

HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
enabled. 
This jira tries to avoid top row seek for all the cases by creating a dedicated 
bloom filter only for delete family

The only subtle use case is when we are interested in the top row with empty 
column.

For example, 
we are interested in row1/cf1:/1/put.
So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
Then it will avoid the top row seek and return a fake kv, which is the last kv 
for this row (createLastOnRowCol).
In this way, we have already missed the real kv we are interested in.

The solution for the above problem is to disable this optimization if we are 
trying to GET/SCAN a row with empty column.

This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as 
well. I will submit the patch for apache-trunk later.


This addresses bug HBASE-4532.
https://issues.apache.org/jira/browse/HBASE-4532


Diffs
-

  src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb 
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 
6cf7cce 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 
a1d7de5 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 
ebb360c 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 
  src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java 
c88b23f 
  
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 
48e9163 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 

Diff: https://reviews.apache.org/r/2393/diff


Testing
---

Running all the unit tests now


Thanks,

Liyin



 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

 HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
 enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter

2011-10-15 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128261#comment-13128261
 ] 

jirapos...@reviews.apache.org commented on HBASE-4532:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/
---

(Updated 2011-10-15 19:06:53.079380)


Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, 
Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry 
Chen, Liyin, Karthik Ranganathan, and Nicolas Spiegelberg.


Summary
---

HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
enabled. 
This jira tries to avoid top row seek for all the cases by creating a dedicated 
bloom filter only for delete family

The only subtle use case is when we are interested in the top row with empty 
column.

For example, 
we are interested in row1/cf1:/1/put.
So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
bloom filter will say there is NO delete family.
Then it will avoid the top row seek and return a fake kv, which is the last kv 
for this row (createLastOnRowCol).
In this way, we have already missed the real kv we are interested in.

The solution for the above problem is to disable this optimization if we are 
trying to GET/SCAN a row with empty column.

This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as 
well. I will submit the patch for apache-trunk later.


This addresses bug HBASE-4532.
https://issues.apache.org/jira/browse/HBASE-4532


Diffs
-

  src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb 
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 
6cf7cce 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 
a1d7de5 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 
ebb360c 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 
  src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java 
c88b23f 
  
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 
48e9163 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 

Diff: https://reviews.apache.org/r/2393/diff


Testing
---

Running all the unit tests now


Thanks,

Liyin



 Avoid top row seek by dedicated bloom filter for delete family bloom filter
 ---

 Key: HBASE-4532
 URL: https://issues.apache.org/jira/browse/HBASE-4532
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

 HBASE-4469 avoids the top row seek operation if row-col bloom filter is 
 enabled. 
 This jira tries to avoid top row seek for all the cases by creating a 
 dedicated bloom filter only for delete family
 The only subtle use case is when we are interested in the top row with empty 
 column.
 For example, 
 we are interested in row1/cf1:/1/put.
 So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family 
 bloom filter will say there is NO delete family.
 Then it will avoid the top row seek and return a fake kv, which is the last 
 kv for this row (createLastOnRowCol).
 In this way, we have already missed the real kv we are interested in.
 The solution for the above problem is to disable this optimization if we are 
 trying to GET/SCAN a row with empty column.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira