[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239167#comment-13239167 ]

stack commented on HBASE-4532:
--

This feature looks like it's always on (which would make sense). Can you confirm, Liyin? Thanks.

Avoid top row seek by dedicated bloom filter for delete family bloom filter
---

Key: HBASE-4532
URL: https://issues.apache.org/jira/browse/HBASE-4532
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Fix For: 0.94.0
Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, hbase-4532-89-fb.patch, hbase-4532-remove-system.out.println.patch

The previous jira, HBASE-4469, avoids the top row seek operation if the row-col bloom filter is enabled.
This jira tries to avoid the top row seek in all cases by creating a dedicated bloom filter only for delete family.

The only subtle use case is when we are interested in the top row with an empty column.
For example, we are interested in row1/cf1:/1/put, so we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. The delete family bloom filter will say there is NO delete family. It will then avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in.

The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with an empty column.
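The decision described in the issue text can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the attached patches: the class and method names (DeleteFamilySeekSketch, recordDeleteFamily, canSkipTopRowSeek) are hypothetical, and a plain Set stands in for the bloom filter, so it never produces the false positives a real bloom filter can.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch only; these are not HBase's real classes or method names. */
final class DeleteFamilySeekSketch {

    // Stand-in for the dedicated delete family bloom filter of one store file:
    // the rows that carry at least one delete family marker. A real bloom
    // filter can also answer "maybe present" for absent rows; a Set cannot.
    private final Set<String> rowsWithDeleteFamily = new HashSet<>();

    void recordDeleteFamily(String row) {
        rowsWithDeleteFamily.add(row);
    }

    /**
     * Returns true when the seek to the top of the row may be skipped.
     * For an empty column qualifier the optimization is disabled, because
     * skipping would hand back a fake last-on-row-col kv and miss the real one.
     */
    boolean canSkipTopRowSeek(String row, String qualifier) {
        if (qualifier.isEmpty()) {
            return false; // GET/SCAN on an empty column: always do the real seek
        }
        // No delete family marker recorded for this row: nothing to find at the top.
        return !rowsWithDeleteFamily.contains(row);
    }

    public static void main(String[] args) {
        DeleteFamilySeekSketch file = new DeleteFamilySeekSketch();
        file.recordDeleteFamily("row2");
        System.out.println(file.canSkipTopRowSeek("row1", "q1")); // true
        System.out.println(file.canSkipTopRowSeek("row2", "q1")); // false
        System.out.println(file.canSkipTopRowSeek("row1", ""));   // false
    }
}
```

The empty-qualifier check comes first on purpose: it has to win even when the filter reports no delete family for the row.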
Evaluation from TestSeekOptimization:

Previously:
For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is enabled. [HBASE-4469]

After this change:
For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
So we can get about 10% more seek savings for ALL kinds of bloom filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
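The percentages in the TestSeekOptimization evaluation can be rechecked from the raw seek counts. The sketch below is only an arithmetic check; the class and method names are hypothetical and are not part of the test itself.

```java
import java.util.Locale;

/** Hypothetical helper for rechecking the reported seek savings. */
final class SeekSavings {

    // Seeks with the optimization, as a percentage of the unoptimized count.
    static double percentOfBaseline(int withOpt, int withoutOpt) {
        return 100.0 * withOpt / withoutOpt;
    }

    // Share of seeks that the optimization avoided.
    static double savingsPercent(int withOpt, int withoutOpt) {
        return 100.0 - percentOfBaseline(withOpt, withoutOpt);
    }

    static String format(double pct) {
        return String.format(Locale.ROOT, "%.2f%%", pct);
    }

    public static void main(String[] args) {
        int baseline = 2506;
        System.out.println(format(savingsPercent(1714, baseline))); // 31.60%
        System.out.println(format(savingsPercent(1458, baseline))); // 41.82%
        // Gap between the two cases, in percentage points.
        System.out.println(format(savingsPercent(1458, baseline)
                - savingsPercent(1714, baseline)));                 // 10.22%
    }
}
```

The "about 10% more" in the evaluation is therefore a difference of roughly 10.2 percentage points, not a 10% relative improvement.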
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172794#comment-13172794 ]

Phabricator commented on HBASE-4532:

Liyin has abandoned the revision "[jira] [HBASE-4532] Avoid top row seek by dedicated bloom filter for delete family bloom filter".

abandon the stale revision.

REVISION DETAIL
  https://reviews.facebook.net/D27
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172795#comment-13172795 ]

Phabricator commented on HBASE-4532:

Liyin has abandoned the revision "[jira] [HBASE-4532] Avoid top row seek by dedicated bloom filter for delete family bloom filter".

abandon the stale revision.

REVISION DETAIL
  https://reviews.facebook.net/D27
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172799#comment-13172799 ]

Phabricator commented on HBASE-4532:

Liyin has abandoned the revision "[jira] [HBASE-4532] Avoid top row seek by dedicated bloom filter for delete family bloom filter".

abandon the stale revision.

REVISION DETAIL
  https://reviews.facebook.net/D27
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141360#comment-13141360 ]

Liyin Tang commented on HBASE-4532:
---

Shall we add an incompatible flag for this jira? Because adding a new block type is not backward compatible.
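The compatibility concern about a new block type can be illustrated with a small sketch. It assumes a reader that maps each block's type name onto a fixed set of known types and rejects anything else; the enum, its constants, and the type name DELETE_FAMILY_BLOOM_META used here are hypothetical placeholders, not a statement about the actual file format or reader code.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of why a new block type can break older readers. */
final class BlockTypeCompatSketch {

    // Block types known to an older reader (placeholder names).
    enum OldBlockType { DATA, INDEX, BLOOM_META }

    // An older reader can only interpret the types it was built with;
    // Enum.valueOf throws IllegalArgumentException for any other name.
    static OldBlockType parseWithOldReader(String typeName) {
        return OldBlockType.valueOf(typeName);
    }

    // True if every block type in the file is understood by the older reader.
    static boolean oldReaderCanOpen(List<String> blockTypesInFile) {
        try {
            for (String typeName : blockTypesInFile) {
                parseWithOldReader(typeName);
            }
            return true;
        } catch (IllegalArgumentException unknownType) {
            return false;
        }
    }

    public static void main(String[] args) {
        // File written before the change: only known types.
        System.out.println(oldReaderCanOpen(Arrays.asList("DATA", "INDEX", "BLOOM_META")));
        // File written after the change: carries a type the old reader never saw.
        System.out.println(oldReaderCanOpen(Arrays.asList("DATA", "DELETE_FAMILY_BLOOM_META")));
    }
}
```

Under this assumption, files written by the new code are unreadable by older versions, which is the kind of change an incompatible flag would signal.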
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141362#comment-13141362 ]

Ted Yu commented on HBASE-4532:
---

@Liyin:
Can you update Release Notes?

Thanks
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141756#comment-13141756 ]

Hudson commented on HBASE-4532:
---

Integrated in HBase-TRUNK #2397 (See [https://builds.apache.org/job/HBase-TRUNK/2397/])
    Fixed CHANGES file for HBASE-4532 HBASE-4611

nspiegelberg :
Files :
* /hbase/trunk/CHANGES.txt
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13140463#comment-13140463 ]

Ted Yu commented on HBASE-4532:
---

Looks like CHANGES.txt wasn't updated to include this JIRA.
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138729#comment-13138729 ] Jonathan Hsieh commented on HBASE-4532:

This seems to be checked into trunk now, and there seems to be an extraneous System.out.println that is causing some of my tests to fail when run from maven (apparently maven buffers the output in memory instead of writing it out as a test is executing). Here's the OOME that maven reports:

Exception in thread ThreadedStreamConsumer java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2882)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
    at java.lang.StringBuffer.append(StringBuffer.java:224)
    at org.apache.maven.surefire.report.ConsoleOutputFileReporter.writeMessage(ConsoleOutputFileReporter.java:115)
    at org.apache.maven.surefire.report.MulticastingReporter.writeMessage(MulticastingReporter.java:101)
    at org.apache.maven.surefire.report.TestSetRunListener.writeTestOutput(TestSetRunListener.java:99)
    at org.apache.maven.plugin.surefire.booterclient.output.ForkClient.consumeLine(ForkClient.java:132)
    at org.apache.maven.plugin.surefire.booterclient.output.ThreadedStreamConsumer$Pumper.run(ThreadedStreamConsumer.java:67)
    at java.lang.Thread.run(Thread.java:662)

I've attached a patch that eliminates this issue.
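A note on the failure mode reported above: an unconditional System.out.println on a per-seek code path produces output on every call, and that output piles up when the test runner captures it. A common remedy is to remove the line or to route it through a logger behind a level check. The following is only a minimal sketch of that idea, not the attached patch; the class GuardedSeekLogging and the method seekMany are hypothetical, and java.util.logging is used purely to keep the example self-contained (the thread does not show which logging API the HBase code uses).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedSeekLogging {

    /**
     * Hypothetical stand-in for a hot loop that runs once per seek.
     * Returns how many debug messages were actually handed to the logger.
     */
    public static int seekMany(Logger log, int seeks) {
        int logged = 0;
        for (int i = 0; i < seeks; i++) {
            // The guard skips message construction entirely unless
            // FINE-level logging is enabled for this logger.
            if (log.isLoggable(Level.FINE)) {
                log.fine("seek #" + i);
                logged++;
            }
        }
        return logged;
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("seek-sketch");
        log.setLevel(Level.INFO); // FINE messages are disabled at this level
        System.out.println("messages logged at INFO: " + seekMany(log, 100000));
    }
}
```

With the logger at INFO the loop emits nothing, so a test harness that buffers console output has nothing to accumulate; diagnostics can still be turned on by lowering the level to FINE.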
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138799#comment-13138799 ] Liyin Tang commented on HBASE-4532:

Thanks, Jonathan, for the patch. I should have removed this line.
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138822#comment-13138822 ] Jonathan Gray commented on HBASE-4532:

Please stop doing multiple commits on the same JIRA! :) I thought we agreed on this, or no?
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138826#comment-13138826 ] Ted Yu commented on HBASE-4532:

This JIRA wasn't closed before the addendum was applied. May I ask why this JIRA was integrated on the 24th without an announcement?
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138831#comment-13138831 ] Jonathan Gray commented on HBASE-4532:

I don't think the JIRA being open/closed is the issue; it's more the multiple commits. But yeah, as a separate note, it looks like there was no final comment and resolution after the commit.
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138834#comment-13138834 ] Ted Yu commented on HBASE-4532:

I would interpret the JIRA not being resolved as anticipation for an addendum :-)
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13138992#comment-13138992 ] Hudson commented on HBASE-4532:

Integrated in HBase-TRUNK #2380 (See [https://builds.apache.org/job/HBase-TRUNK/2380/])
    HBASE-4532 remove system.out.println (Jonathan Hsieh)

tedyu :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13139016#comment-13139016 ] Liyin Tang commented on HBASE-4532:

Thanks, Ted and Jonathan Gray, for committing this. I will double-check the submitted patch to avoid this problem. Nice catch, Jonathan Hsieh. Thank you for the patch :)
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134277#comment-13134277 ] Nicolas Spiegelberg commented on HBASE-4532:

+1 on commit. The TestHCM failure is unrelated to this JIRA and shouldn't hold it up. We should use 'git bisect' to figure out where it was introduced and comment on that JIRA.
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13134421#comment-13134421 ] Hudson commented on HBASE-4532:

Integrated in HBase-TRUNK #2363 (See [https://builds.apache.org/job/HBase-TRUNK/2363/])
    HBASE-4532 Avoid top row seek by dedicated bloom filter for delete family bloom filter

nspiegelberg :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133585#comment-13133585 ]

Liyin Tang commented on HBASE-4532:
-----------------------------------

Thanks Ted :) Here are the test results I got.
So has testConnectionUniqueness in TestHCM been fixed now?

==
Results :

Tests in error:
  testConnectionUniqueness(org.apache.hadoop.hbase.client.TestHCM)
  testOrphanLogCreation(org.apache.hadoop.hbase.master.TestDistributedLogSplitting): Unexpected exception, expected<org.apache.hadoop.hbase.regionserver.wal.OrphanHLogAfterSplitException> but was<java.lang.NullPointerException>
  testOrphanLogCreation(org.apache.hadoop.hbase.master.TestDistributedLogSplitting)
  testRecoveredEdits(org.apache.hadoop.hbase.master.TestDistributedLogSplitting): /data/users/liyintang/hbase-os-trunk/target/test-data/3d058c80-266a-4164-8143-925d514f016e/09d560d3-254e-4986-abe1-22b876d299f1/4758e332-2ae7-4194-bfea-900ee4a2e3ab/dfs/name1/current/fsimage (Too many open files)
  testRecoveredEdits(org.apache.hadoop.hbase.master.TestDistributedLogSplitting)
  testWorkerAbort(org.apache.hadoop.hbase.master.TestDistributedLogSplitting): /data/users/liyintang/hbase-os-trunk/target/test-data/3d058c80-266a-4164-8143-925d514f016e/09d560d3-254e-4986-abe1-22b876d299f1/4758e332-2ae7-4194-bfea-900ee4a2e3ab/3949c75c-8c23-4513-b1cc-e94b1bba640b/dfs/name1/current/fsimage (Too many open files)
  testWorkerAbort(org.apache.hadoop.hbase.master.TestDistributedLogSplitting)

Tests run: 1056, Failures: 0, Errors: 7, Skipped: 9
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133599#comment-13133599 ]

Ted Yu commented on HBASE-4532:
-------------------------------

TestHCM wasn't fixed. If the test fails consistently, maybe you can help debug it.
For the other test failures, it seems the ulimit on the machine running the tests has to be increased.
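The "Too many open files" errors above point at the per-process file descriptor limit. As a rough aid, and only a sketch under assumptions, the following Java snippet reports how many descriptors the current JVM has open and what its limit is. The class name FdLimitSketch is made up for illustration, and the check relies on the JDK-specific com.sun.management.UnixOperatingSystemMXBean, which is only available on Unix-like platforms.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

// Hypothetical helper, not part of HBase: reports file descriptor usage of this JVM.
class FdLimitSketch {
    /** Returns {open, max} descriptor counts, or null if the platform does not expose them. */
    static long[] fdUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null; // e.g. on Windows, where this bean is not implemented
    }

    public static void main(String[] args) {
        long[] fd = fdUsage();
        if (fd == null) {
            System.out.println("file descriptor counts not available on this platform");
        } else {
            System.out.println("open descriptors: " + fd[0] + ", limit: " + fd[1]);
        }
    }
}
```

If the reported limit is low compared with what a mini-cluster test opens, raising the ulimit for the user running the tests is the likely remedy, as suggested above.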
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133449#comment-13133449 ]

Liyin Tang commented on HBASE-4532:
-----------------------------------

For 89-fb, all the unit tests pass.
For apache-trunk, 2 unit tests fail both with and without my change: TestHCM and TestDistributedLogSplitting.
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133495#comment-13133495 ]

Ted Yu commented on HBASE-4532:
-------------------------------

Thanks for running the test suites, Liyin.
TRUNK build 2358 passed, though I did see this in build 2357:
https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2357/testReport/junit/org.apache.hadoop.hbase.client/TestHCM/testConnectionUniqueness/

Please tell us which subtests failed for the above two tests.
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132995#comment-13132995 ]

jirapos...@reviews.apache.org commented on HBASE-4532:
------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/
-----------------------------------------------------------

(Updated 2011-10-21 19:49:16.120589)

Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.

Changes
-------

Thanks Kannan :)
Updated the diff to address Kannan's review.

Summary
-------

HBASE-4469 avoids the top row seek operation if the row-col bloom filter is enabled.
This jira tries to avoid the top row seek in all cases by creating a dedicated bloom filter only for delete family.

The only subtle use case is when we are interested in the top row with an empty column.

For example, we are interested in row1/cf1:/1/put.
So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM, and the delete family bloom filter will say there is NO delete family.
Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).
In this way, we have already missed the real kv we are interested in.

The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with an empty column.

This patch is rebased on 0.89-fb, but it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.

This addresses bug HBASE-4532.
    https://issues.apache.org/jira/browse/HBASE-4532

Diffs (updated)
---------------

  src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812
  src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8

Diff: https://reviews.apache.org/r/2393/diff

Testing
-------

Passed all the unit tests

Thanks,

Liyin
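The empty-column edge case described in the summary above comes down to key ordering, which a small model can illustrate. The sketch below is not HBase's KeyValue or its comparator: the class KeyOrderSketch, its string fields, and the numeric type codes are assumptions made for illustration. It only mimics the ordering the summary relies on (row, family, qualifier ascending, then newer timestamps first, then larger type codes first), with firstOnRow and lastOnRowCol standing in for the top-row key and the fake kv.

```java
// Simplified model of the key ordering; all names and type codes are illustrative assumptions.
class KeyOrderSketch {
    static final int TYPE_MINIMUM = 0;
    static final int TYPE_PUT = 4;
    static final int TYPE_MAXIMUM = 255;

    final String row, family, qualifier;
    final long timestamp;
    final int type;

    KeyOrderSketch(String row, String family, String qualifier, long timestamp, int type) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
        this.type = type;
    }

    /** Row, family, qualifier ascending; then newer timestamps first; then larger type codes first. */
    static int compare(KeyOrderSketch a, KeyOrderSketch b) {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.family.compareTo(b.family);
        if (c != 0) return c;
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        c = Long.compare(b.timestamp, a.timestamp);
        if (c != 0) return c;
        return Integer.compare(b.type, a.type);
    }

    /** Top-of-row key, written row1/cf1:/MAX_TS/MAXIMUM in the summary. */
    static KeyOrderSketch firstOnRow(String row, String family) {
        return new KeyOrderSketch(row, family, "", Long.MAX_VALUE, TYPE_MAXIMUM);
    }

    /** Fake key that sorts after every real key of (row, family, qualifier). */
    static KeyOrderSketch lastOnRowCol(String row, String family, String qualifier) {
        return new KeyOrderSketch(row, family, qualifier, Long.MIN_VALUE, TYPE_MINIMUM);
    }

    /** True if a scanner that jumps to the fake key for the empty column has already passed `wanted`. */
    static boolean missedAfterSkip(KeyOrderSketch wanted) {
        KeyOrderSketch fake = lastOnRowCol(wanted.row, wanted.family, "");
        return compare(wanted, fake) < 0;
    }

    public static void main(String[] args) {
        KeyOrderSketch emptyCol = new KeyOrderSketch("row1", "cf1", "", 1L, TYPE_PUT);
        KeyOrderSketch namedCol = new KeyOrderSketch("row1", "cf1", "q", 1L, TYPE_PUT);
        System.out.println(missedAfterSkip(emptyCol)); // the row1/cf1:/1/put case
        System.out.println(missedAfterSkip(namedCol)); // a put on a non-empty qualifier
    }
}
```

In this model the put on the empty qualifier sorts between the top-row key and the fake key, so skipping straight to the fake key passes over it, while a put on any non-empty qualifier sorts after the fake key and is still reachable. That is why the optimization is disabled only for GET/SCAN requests that involve the empty column.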
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133002#comment-13133002 ]

jirapos...@reviews.apache.org commented on HBASE-4532:
------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/
-----------------------------------------------------------

(Updated 2011-10-21 19:58:05.693922)

Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.

Summary (updated)
-----------------

The previous jira, HBASE-4469, avoids the top row seek operation if the row-col bloom filter is enabled.
This jira tries to avoid the top row seek in all cases by creating a dedicated bloom filter only for delete family.

The only subtle use case is when we are interested in the top row with an empty column.

For example, we are interested in row1/cf1:/1/put.
So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM, and the delete family bloom filter will say there is NO delete family.
Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).
In this way, we have already missed the real kv we are interested in.

The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with an empty column.

Evaluation from TestSeekOptimization:

Previously:
For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is enabled. [HBASE-4469]

After this change:
For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

So we can get about 10% more seek savings for ALL kinds of bloom filter.

This addresses bug HBASE-4532.
    https://issues.apache.org/jira/browse/HBASE-4532

Diffs
-----

  src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812
  src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8

Diff: https://reviews.apache.org/r/2393/diff

Testing
-------

Passed all the unit tests

Thanks,

Liyin
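The percentages in the evaluation above follow directly from the seek counts, and the "about 10%" figure is the gap between the two savings values. The sketch below redoes that arithmetic; the class SeekSavings and its method names are made up for illustration and are not part of TestSeekOptimization.

```java
import java.util.Locale;

// Illustrative arithmetic only; the seek counts are the ones quoted in the evaluation.
class SeekSavings {
    /** Seeks with the optimization as a percentage of the unoptimized seek count. */
    static double percentOfBaseline(int seeksWith, int seeksWithout) {
        return 100.0 * seeksWith / seeksWithout;
    }

    /** Percentage of seeks saved by the optimization. */
    static double savings(int seeksWith, int seeksWithout) {
        return 100.0 - percentOfBaseline(seeksWith, seeksWithout);
    }

    static String fmt(double pct) {
        return String.format(Locale.ROOT, "%.2f%%", pct);
    }

    public static void main(String[] args) {
        int baseline = 2506;
        System.out.println(fmt(percentOfBaseline(1714, baseline)) + " " + fmt(savings(1714, baseline)));
        System.out.println(fmt(percentOfBaseline(1458, baseline)) + " " + fmt(savings(1458, baseline)));
        // Extra savings once the delete family bloom filter applies to every bloom type.
        System.out.println(fmt(savings(1458, baseline) - savings(1714, baseline)));
    }
}
```

The gap works out to (1714 - 1458) / 2506, a little over 10 percentage points, which is what the summary rounds to "about 10%".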
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133003#comment-13133003 ]

jirapos...@reviews.apache.org commented on HBASE-4532:
------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/#review2758
-----------------------------------------------------------

Ship it!

+1. One typo below.

src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
https://reviews.apache.org/r/2393/#comment6199

    exits -> exist

- Kannan
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13133008#comment-13133008 ] jirapos...@reviews.apache.org commented on HBASE-4532: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ --- (Updated 2011-10-21 20:01:23.160793) Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. Changes --- Fix the typo:) Summary --- The previous jira, HBASE-4469, is to avoid the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. 
Evaluation from TestSeekOptimization: Previously: For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is enabled.[HBASE-4469] After this change: For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% So we can get about 10% more seek savings for ALL kinds of bloom filter. This addresses bug HBASE-4532. 
https://issues.apache.org/jira/browse/HBASE-4532 Diffs (updated) - src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing --- Passed all the unit tests Thanks, Liyin Avoid top row seek by dedicated bloom filter for delete family bloom filter --- Key:
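The percentages in the TestSeekOptimization output quoted in the message above follow directly from the two seek counts on each line. A minimal sketch of that arithmetic, using only the numbers from the message; the class and method names are hypothetical and are not part of HBase or of the patch:

```java
import java.util.Locale;

/** Hypothetical helper; it only re-derives the percentages quoted in the e-mail. */
final class SeekSavings {
    /** Percentage of seeks avoided, given the seek counts without and with the optimization. */
    static double savingsPercent(int seeksWithout, int seeksWith) {
        return 100.0 * (seeksWithout - seeksWith) / seeksWithout;
    }

    /** Formats a percentage with two decimals, as in the test output. */
    static String format(double percent) {
        return String.format(Locale.ROOT, "%.2f%%", percent);
    }

    public static void main(String[] args) {
        int baseline = 2506; // total seeks without optimization, from the message
        System.out.println(format(savingsPercent(baseline, 1714))); // NONE/ROW bloom before the change
        System.out.println(format(savingsPercent(baseline, 1458))); // ROWCOL before, all bloom types after
    }
}
```

The gap between the two results is roughly 10 percentage points, which is the "about 10% more seek savings" mentioned in the message.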
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131813#comment-13131813 ]

jirapos...@reviews.apache.org commented on HBASE-4532:
--

bq. On 2011-10-20 04:55:44, Jonathan Gray wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 792
bq. https://reviews.apache.org/r/2393/diff/3/?file=51375#file51375line792
bq.
bq. is there _ever_ a case someone would not want this turned on? if someone was doing a ton of delete families maybe? u might not want to pay the cost of making this bloom.

Yes, we can disable this by conf.setBoolean(IO_STOREFILE_DELETEFAMILY_BLOOM_ENABLED, false);

bq. On 2011-10-20 04:55:44, Jonathan Gray wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java, line 98
bq. https://reviews.apache.org/r/2393/diff/3/?file=51374#file51374line98
bq.
bq. this means the null qualifier?

Yes. I have updated the comments :)

bq. On 2011-10-20 04:55:44, Jonathan Gray wrote:
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java, lines 677-678
bq. https://reviews.apache.org/r/2393/diff/3/?file=51370#file51370line677
bq.
bq. is this right to return IOE and not null like if it doesn't exist in the general bloom case?

Agreed :)

bq. On 2011-10-20 04:55:44, Jonathan Gray wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java, lines 67-68
bq. https://reviews.apache.org/r/2393/diff/3/?file=51374#file51374line67
bq.
bq. can you describe what an empty column means? does this mean wildcard or does this mean the null column?

Yes. I have updated the comments :)

bq. On 2011-10-20 04:55:44, Jonathan Gray wrote:
bq. src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java, line 73
bq. https://reviews.apache.org/r/2393/diff/3/?file=51378#file51378line73
bq.
bq. this enables the creation or the usage?

This enables the creation.

bq. On 2011-10-20 04:55:44, Jonathan Gray wrote:
bq. src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java, line 26
bq. https://reviews.apache.org/r/2393/diff/3/?file=51379#file51379line26
bq.
bq. it's hard to tell what in this actually changed. i don't see much that actually went down? and should you also do some tests where you enable/disable the delete family bloom to ensure that it's working as expected both ways?

No number is expected to go down :) The test shows that we can avoid the top row seek even when there is a ROW/NONE bloom filter.
Previously, this unit test only enabled the ROWCOL bloom filter, for HBASE-4469 (Avoid top row seek by looking up row_col bloomfilter).
Now TestBlocksRead checks the number of seeks for the ROWCOL, ROW and NONE bloom filters one by one.
No matter which bloom filter the CF is using, we always avoid the top row seek :)

bq. On 2011-10-20 04:55:44, Jonathan Gray wrote:
bq. src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java, line 401
bq. https://reviews.apache.org/r/2393/diff/3/?file=51381#file51381line401
bq.
bq. you seem to be setting the conf to 0.01 and then retrieving it back?

Yes. I tried to be consistent with the other bloom filter unit tests, so I set the same error rate as the testBloomFilter() function.

- Liyin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/#review2693
---

On 2011-10-20 03:46:26, Liyin Tang wrote:
bq.
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2393/
bq. ---
bq.
bq. (Updated 2011-10-20 03:46:26)
bq.
bq.
bq. Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.
bq.
bq.
bq. Summary
bq. ---
bq.
bq. HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled.
bq. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family
bq.
bq. The only subtle use case is when we are interested in the top row with empty column.
bq.
bq. For example,
bq. we are interested in row1/cf1:/1/put.
bq. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.
bq. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).
bq. In this way, we have already missed the real kv we are interested in.
bq.
bq. The solution for
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131814#comment-13131814 ] jirapos...@reviews.apache.org commented on HBASE-4532: -- bq. On 2011-10-20 05:07:00, Ted Yu wrote: bq. src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java, line 98 bq. https://reviews.apache.org/r/2393/diff/3/?file=51374#file51374line98 bq. bq. If hasEmptyColumn is true, shall we be using ScanWildcardColumnTracker (as in the if block) ? I have updated comments for this variable: * This variable shows whether there is an null column in the query. * There is always a null column in the wildcard column query. * There maybe exits a null column in the explicit column query based on the * first column. - Liyin --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/#review2695 --- On 2011-10-20 03:46:26, Liyin Tang wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/2393/ bq. --- bq. bq. (Updated 2011-10-20 03:46:26) bq. bq. bq. Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. bq. bq. bq. Summary bq. --- bq. bq. HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. bq. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family bq. bq. The only subtle use case is when we are interested in the top row with empty column. bq. bq. For example, bq. we are interested in row1/cf1:/1/put. bq. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. bq. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). bq. 
In this way, we have already missed the real kv we are interested in. bq. bq. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. bq. bq. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. bq. bq. bq. This addresses bug HBASE-4532. bq. https://issues.apache.org/jira/browse/HBASE-4532 bq. bq. bq. Diffs bq. - bq. bq.src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb bq.src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 bq.src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 bq.src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df bq.src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 bq. src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 bq.src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 bq. bq. Diff: https://reviews.apache.org/r/2393/diff bq. bq. bq. Testing bq. --- bq. bq. Passed all the unit tests bq. bq. bq. Thanks, bq. bq. Liyin bq. bq. 
Avoid top row seek by dedicated bloom filter for delete family bloom filter --- Key: HBASE-4532 URL: https://issues.apache.org/jira/browse/HBASE-4532 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D27.1.patch, D27.1.patch HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131821#comment-13131821 ] jirapos...@reviews.apache.org commented on HBASE-4532: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ --- (Updated 2011-10-20 17:35:38.480668) Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. Changes --- Thanks to Jonathan and Ted for the review. Addressed their comments. Summary --- HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family. The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. This addresses bug HBASE-4532. 
https://issues.apache.org/jira/browse/HBASE-4532 Diffs (updated) - src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 Diff: https://reviews.apache.org/r/2393/diff Testing --- Passed all the unit tests Thanks, Liyin Avoid top row seek by dedicated bloom filter for delete family bloom filter --- Key: HBASE-4532 URL: https://issues.apache.org/jira/browse/HBASE-4532 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D27.1.patch, D27.1.patch HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. 
So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131893#comment-13131893 ] jirapos...@reviews.apache.org commented on HBASE-4532: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ --- (Updated 2011-10-20 18:45:51.242661) Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. Summary --- HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. This addresses bug HBASE-4532. 
https://issues.apache.org/jira/browse/HBASE-4532 Diffs (updated) - src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing --- Passed all the unit tests Thanks, Liyin Avoid top row seek by dedicated bloom filter for delete family bloom filter --- Key: HBASE-4532 URL: https://issues.apache.org/jira/browse/HBASE-4532 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D27.1.patch, D27.1.patch HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. 
So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131950#comment-13131950 ] jirapos...@reviews.apache.org commented on HBASE-4532: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/#review2720 --- Ship it! Looks good! Want to attach the various branch patches to the JIRA? (As you have them). I can help rebase on trunk if you'd like, let me know. Nice work! - Jonathan On 2011-10-20 18:45:51, Liyin Tang wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/2393/ bq. --- bq. bq. (Updated 2011-10-20 18:45:51) bq. bq. bq. Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. bq. bq. bq. Summary bq. --- bq. bq. HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. bq. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family bq. bq. The only subtle use case is when we are interested in the top row with empty column. bq. bq. For example, bq. we are interested in row1/cf1:/1/put. bq. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. bq. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). bq. In this way, we have already missed the real kv we are interested in. bq. bq. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. bq. bq. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. bq. bq. bq. This addresses bug HBASE-4532. bq. 
https://issues.apache.org/jira/browse/HBASE-4532 bq. bq. bq. Diffs bq. - bq. bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 bq.src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 bq.src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb bq.src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 bq.src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df bq.src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 bq. src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 bq.src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 bq. bq. Diff: https://reviews.apache.org/r/2393/diff bq. bq. bq. Testing bq. --- bq. bq. Passed all the unit tests bq. bq. bq. Thanks, bq. bq. Liyin bq. bq. Avoid top row seek by dedicated bloom filter for delete family bloom filter --- Key: HBASE-4532 URL: https://issues.apache.org/jira/browse/HBASE-4532 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D27.1.patch, D27.1.patch HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. 
This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13132100#comment-13132100 ] jirapos...@reviews.apache.org commented on HBASE-4532: -- bq. On 2011-10-20 19:47:02, Jonathan Gray wrote: bq. Looks good! Want to attach the various branch patches to the JIRA? (As you have them). I can help rebase on trunk if you'd like, let me know. Nice work! thanks Jonathan for reviewing. I will submit a patch against apache trunk :) - Liyin --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/#review2720 --- On 2011-10-20 18:45:51, Liyin Tang wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/2393/ bq. --- bq. bq. (Updated 2011-10-20 18:45:51) bq. bq. bq. Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. bq. bq. bq. Summary bq. --- bq. bq. HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. bq. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family bq. bq. The only subtle use case is when we are interested in the top row with empty column. bq. bq. For example, bq. we are interested in row1/cf1:/1/put. bq. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. bq. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). bq. In this way, we have already missed the real kv we are interested in. bq. bq. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. bq. bq. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. 
I will submit the patch for apache-trunk later. bq. bq. bq. This addresses bug HBASE-4532. bq. https://issues.apache.org/jira/browse/HBASE-4532 bq. bq. bq. Diffs bq. - bq. bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 bq.src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 bq.src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb bq.src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 bq.src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df bq.src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 bq. src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 bq.src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 bq. bq. Diff: https://reviews.apache.org/r/2393/diff bq. bq. bq. Testing bq. --- bq. bq. Passed all the unit tests bq. bq. bq. Thanks, bq. bq. Liyin bq. bq. Avoid top row seek by dedicated bloom filter for delete family bloom filter --- Key: HBASE-4532 URL: https://issues.apache.org/jira/browse/HBASE-4532 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D27.1.patch, D27.1.patch HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. 
This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13132344#comment-13132344 ]

jirapos...@reviews.apache.org commented on HBASE-4532:
--

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/#review2734
---

done with 2nd pass of review. Final comments inlined. This is looking really good.

src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
https://reviews.apache.org/r/2393/#comment6151

getGeneralBloom - getDeleteBloom?

src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
https://reviews.apache.org/r/2393/#comment6152

exits - exist

src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
https://reviews.apache.org/r/2393/#comment6154

tomorrow someone may introduce a new constructor which forgets to initialize this variable, and it'll default to false, which is an unsafe default for this variable. Let's do something like:

    private boolean hasNullColumn = true; // initialize conservatively

src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
https://reviews.apache.org/r/2393/#comment6153

simplify to:

    hasNullColumn = (columnSet.first().length == 0);

src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
https://reviews.apache.org/r/2393/#comment6155

has - was

src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java
https://reviews.apache.org/r/2393/#comment6157

This function only does DeleteFamily Bloom. So, the comment needs to be updated to remove ROWCOL and ROW.

src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
https://reviews.apache.org/r/2393/#comment6158

nice enhancement to the test!

- Kannan

On 2011-10-20 18:45:51, Liyin Tang wrote:
bq.
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2393/
bq. ---
bq.
bq. (Updated 2011-10-20 18:45:51)
bq.
bq.
bq. Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.
bq.
bq.
bq. Summary
bq. ---
bq.
bq. HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled.
bq. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family
bq.
bq. The only subtle use case is when we are interested in the top row with empty column.
bq.
bq. For example,
bq. we are interested in row1/cf1:/1/put.
bq. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.
bq. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).
bq. In this way, we have already missed the real kv we are interested in.
bq.
bq. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.
bq.
bq. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.
bq.
bq.
bq. This addresses bug HBASE-4532.
bq. https://issues.apache.org/jira/browse/HBASE-4532
bq.
bq.
bq. Diffs
bq. -
bq.
bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 bq.src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 bq.src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb bq.src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 bq.src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df bq.src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 bq.
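Kannan's two ScanQueryMatcher suggestions in the review above (a conservative default for hasNullColumn, and deriving it from the first requested column) can be sketched roughly as follows. This is a standalone illustration, not the patch: the class NullColumnCheck and the helper qualifiers() are hypothetical, the wildcard rule is taken from Liyin's earlier comment ("There is always a null column in the wildcard column query"), and Arrays.compareUnsigned (Java 9+) is used here as a stand-in for HBase's own byte-array comparator.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical stand-in for the relevant piece of ScanQueryMatcher; not the patch itself. */
final class NullColumnCheck {
    // Initialize conservatively, as suggested in the review: if a future code
    // path forgets to set this field, the seek optimization stays disabled.
    private boolean hasNullColumn = true;

    /**
     * @param columns explicitly requested qualifiers, sorted in unsigned
     *                lexicographic byte order, or null/empty for a wildcard query
     */
    NullColumnCheck(NavigableSet<byte[]> columns) {
        if (columns == null || columns.isEmpty()) {
            // A wildcard query always covers the null (empty) qualifier.
            hasNullColumn = true;
        } else {
            // The empty qualifier sorts before every other qualifier, so it
            // can only appear as the first element of the set.
            hasNullColumn = (columns.first().length == 0);
        }
    }

    boolean hasNullColumn() {
        return hasNullColumn;
    }

    /** Builds a qualifier set in unsigned lexicographic order (assumed to match HBase's ordering). */
    static NavigableSet<byte[]> qualifiers(String... names) {
        NavigableSet<byte[]> set = new TreeSet<byte[]>(Arrays::compareUnsigned);
        for (String name : names) {
            set.add(name.getBytes(StandardCharsets.UTF_8));
        }
        return set;
    }
}
```

Defaulting the field to true trades a missed optimization for correctness: a query that wrongly reports "no null column" could skip the top row seek and miss the real kv, which is the failure described in the summary.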
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13130883#comment-13130883 ]

jirapos...@reviews.apache.org commented on HBASE-4532:
--

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/#review2615
---

Hi Liyin-- find the first pass review comments inlined. Haven't reviewed the test changes yet. Looking fwd to this optimization landing.

src/main/java/org/apache/hadoop/hbase/KeyValue.java
https://reviews.apache.org/r/2393/#comment5908

remove qualifier from the comment, since all we are passing here is row and family (no column name).

src/main/java/org/apache/hadoop/hbase/KeyValue.java
https://reviews.apache.org/r/2393/#comment5909

* remove qualifier from comment here too.
* 80 char issue

src/main/java/org/apache/hadoop/hbase/KeyValue.java
https://reviews.apache.org/r/2393/#comment5910

remove qualifier param.

src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
https://reviews.apache.org/r/2393/#comment5913

80 char issues.

src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
https://reviews.apache.org/r/2393/#comment5914

can we enhance HFilePrettyPrinter to report info about the DeleteBloomFilter as well (provided HFile is V2).

src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
https://reviews.apache.org/r/2393/#comment5985

"even though" -> "even if"

src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
https://reviews.apache.org/r/2393/#comment5987

what's the difference between getPath() (in line 1027) and this.writer.getPath()? Did you mean to log the general delete Bloom filter instead?

src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
https://reviews.apache.org/r/2393/#comment6015

not clear where you are using this -1 state

src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
https://reviews.apache.org/r/2393/#comment6017

"or this is no" -> "or there is no"

src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
https://reviews.apache.org/r/2393/#comment6016

To make sure I understand this... for the HFileV1 case, or for HFileV2 but without this fix, I am guessing deleteFamilyCnt will be equal to -1, and the fact that it doesn't have a bloomFilter will cause it to return true. That looks fine. Just not obvious.

src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
https://reviews.apache.org/r/2393/#comment6014

space between cnt !=

src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
https://reviews.apache.org/r/2393/#comment6020

did you intend to initialize bloomTypeLog here as well?

src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
https://reviews.apache.org/r/2393/#comment6021

bloomTypeLog is only initialized for GeneralBloomFilter case. If that's the intent, why not move the logging near line 1382?

src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
https://reviews.apache.org/r/2393/#comment6027

In case there is a deleteFamily kv, there are two sub-cases here...

a) we have ROWCOL bloom (in which case there is no DeleteFamilyBloomFilter) and we want to use the ROWCOL bloom filter itself.

b) we have a DeleteFamilyBloomFilter.

I don't see us taking advantage of (a) like we used to earlier. Isn't this a regression for the ROWCOL bloom case? And if so, TestBlocksRead should have caught it, no?

src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
https://reviews.apache.org/r/2393/#comment6023

isSeekToEmptyColumn and useBloom should be separate flags I think.

For example, if the CF had ROWCOL bloom, and the query is looking for a row/0-length column, then with this change, we won't use the ROWCOL bloom filter even when it exists.

Isn't it the case that we want to avoid using only the deleteFamilyBloom filter when isSeekToEmptyColumn is true?

- Kannan

On 2011-10-18 20:38:41, Liyin Tang wrote:
bq.
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2393/
bq. ---
bq.
bq. (Updated 2011-10-18 20:38:41)
bq.
bq.
bq. Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.
bq.
bq.
bq. Summary
bq. ---
bq.
bq. HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled.
bq. This jira tries to
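The point about keeping isSeekToEmptyColumn and useBloom as separate flags can be illustrated with a small sketch. This is not the code from the patch: the class name, the enum, and the choose() helper are hypothetical, and only the two flag names come from the review. It assumes the ROWCOL bloom filter stays usable for an empty-column seek, while the delete family bloom filter must be skipped in that case.

```java
// Hypothetical sketch, not HBase code: which bloom filter may be consulted
// before the top row seek, with the two flags kept independent.
final class BloomChoiceSketch {
    enum Bloom { NONE, ROWCOL, DELETE_FAMILY }

    static Bloom choose(boolean useBloom, boolean isSeekToEmptyColumn,
                        boolean hasRowColBloom, boolean hasDeleteFamilyBloom) {
        if (!useBloom) {
            return Bloom.NONE;            // bloom filters are switched off for this seek
        }
        if (hasRowColBloom) {
            return Bloom.ROWCOL;          // still usable for a row/0-length column query
        }
        if (hasDeleteFamilyBloom && !isSeekToEmptyColumn) {
            return Bloom.DELETE_FAMILY;   // only safe when the empty column is not requested
        }
        return Bloom.NONE;                // fall back to a real seek
    }

    public static void main(String[] args) {
        // Empty-column query on a CF with ROWCOL bloom: the ROWCOL filter is still used.
        System.out.println(choose(true, true, true, false));
        // Empty-column query with only a delete family bloom: no filter is used.
        System.out.println(choose(true, true, false, true));
    }
}
```

Folding both conditions into one flag would make the first case return NONE as well, which is the loss the review comment describes.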
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131190#comment-13131190 ] jirapos...@reviews.apache.org commented on HBASE-4532: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ --- (Updated 2011-10-20 00:08:14.459108) Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. Changes --- Thanks for Kannan's review. Update the diff to address Kannan's comments. Summary --- HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. This addresses bug HBASE-4532. 
https://issues.apache.org/jira/browse/HBASE-4532 Diffs (updated) - src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing --- Running all the unit tests now Thanks, Liyin Avoid top row seek by dedicated bloom filter for delete family bloom filter --- Key: HBASE-4532 URL: https://issues.apache.org/jira/browse/HBASE-4532 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D27.1.patch, D27.1.patch HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. 
So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
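The empty-column problem described in the summary comes down to key ordering, which the following sketch tries to make concrete. It is a simplified model, not HBase's KeyValue: the Key class, the single implicit column family, and the numeric type codes are assumptions made for illustration, and the "fake last" key is assumed to carry the oldest timestamp and the minimum type for the row and empty qualifier.

```java
import java.util.TreeSet;

// Simplified model (assumption): keys sort by row, then qualifier, then newest
// timestamp first, then highest type code first. One column family is implied.
final class EmptyColumnSeekSketch {
    static final class Key {
        final String row; final String qualifier; final long ts; final int type;
        Key(String row, String qualifier, long ts, int type) {
            this.row = row; this.qualifier = qualifier; this.ts = ts; this.type = type;
        }
    }

    static final int MAXIMUM = 255, PUT = 4, MINIMUM = 0;  // illustrative type codes

    // row1/cf1:/MAX_TS/MAXIMUM, the key used for the top row seek.
    static final Key TOP_ROW = new Key("row1", "", Long.MAX_VALUE, MAXIMUM);
    // Stand-in for the fake kv: last possible key for row1 with the empty column.
    static final Key FAKE_LAST = new Key("row1", "", Long.MIN_VALUE, MINIMUM);

    static int compare(Key a, Key b) {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        c = Long.compare(b.ts, a.ts);           // newer timestamps sort first
        if (c != 0) return c;
        return Integer.compare(b.type, a.type); // higher type codes sort first
    }

    // Returns the first stored key at or after the given seek key.
    static Key firstAtOrAfter(Key seekKey) {
        TreeSet<Key> store = new TreeSet<>(EmptyColumnSeekSketch::compare);
        store.add(new Key("row1", "", 1L, PUT));    // row1/cf1:/1/put, the kv we want
        store.add(new Key("row1", "q1", 5L, PUT));  // some other column in the same row
        return store.ceiling(seekKey);
    }

    public static void main(String[] args) {
        Key real = firstAtOrAfter(TOP_ROW);    // the empty-column put is found
        Key after = firstAtOrAfter(FAKE_LAST); // the empty-column put is skipped
        System.out.println("'" + real.qualifier + "' then '" + after.qualifier + "'");
    }
}
```

In this model a real seek from the top row key lands on the empty-column put, whereas continuing from the fake key lands on the next column, so a query for the empty column would miss its kv. That is why the optimization is disabled for GET/SCAN on a row with an empty column.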
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131191#comment-13131191 ]

jirapos...@reviews.apache.org commented on HBASE-4532:
--

bq. On 2011-10-19 19:02:46, Kannan Muthukkaruppan wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 1058
bq. https://reviews.apache.org/r/2393/diff/2/?file=50558#file50558line1058
bq.
bq. not clear where you are using this -1 state

Even if there is no delete family bloom filter, the store file will still count the delete family key values and append that count to the HFile's file info. So when reading the file, we will know how many delete family kvs it contains. However, if the file info has no delete family field, deleteFamilyCnt is set to -1, so the function passesDeleteFamilyBloomFilter won't take the count into account.

bq. On 2011-10-19 19:02:46, Kannan Muthukkaruppan wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 1217
bq. https://reviews.apache.org/r/2393/diff/2/?file=50558#file50558line1217
bq.
bq. To make sure I understand this...
bq.
bq. for the HFileV1 case, or for HFileV2 but without this fix, I am guessing deleteFamilyCnt will be equal to -1, and the fact that it doesn't have a bloomFilter will cause it to return true. That looks fine. Just not obvious.

Yes :) If there is a deleteFamilyCnt and it is 0, then there is no need to check the Bloom filter, and passesDeleteFamilyBloomFilter() returns false. It means there is no need to seek this store file for a delete family on the row. If deleteFamilyCnt was not initialized properly for some reason, i.e. it is set to -1, then the delete family bloom filter needs to be checked. If there is no delete family bloom filter, the function returns true, which means it is possible that there is a delete family for this row.

bq. On 2011-10-19 19:02:46, Kannan Muthukkaruppan wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java, line 238
bq. https://reviews.apache.org/r/2393/diff/2/?file=50559#file50559line238
bq.
bq. In case there is a deleteFamily kv, there are two sub-cases here...
bq.
bq. a) we have ROWCOL bloom (in which case there is no DeleteFamilyBloomFilter) and we want to use the ROWCOL bloom filter itself.
bq.
bq. b) we have a DeleteFamilyBloomFilter.
bq.
bq. I don't see us taking advantage of (a) like we used to earlier. Isn't this a regression for the ROWCOL bloom case? And if so, TestBlocksRead should have caught it, no?

1) Yes, it should use the ROWCOL Bloom filter. That can also help to warm up the row-col bloom filter in the cache, or to benefit from the block cache. I will update the code.
2) There is no regression for the ROWCOL bloom case, because we only count the number of data block seeks. Whichever bloom filter is used (delete family or row-col), it returns the same result, so it won't affect the decision whether to seek the store file or not. Please correct me if I am wrong :)

bq. On 2011-10-19 19:02:46, Kannan Muthukkaruppan wrote:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java, line 111
bq. https://reviews.apache.org/r/2393/diff/2/?file=50560#file50560line111
bq.
bq. isSeekToEmptyColumn and useBloom should be separate flags I think.
bq.
bq. For example, if the CF had ROWCOL bloom, and the query is looking for a row/0-length column, then with this change, we won't use the ROWCOL bloom filter even when it exists.
bq.
bq. Isn't it the case that we want to avoid using only the deleteFamilyBloom filter when isSeekToEmptyColumn is true?

Agree :) I will update the code to pass the scan query matcher to each store file scanner. This will also help with further optimization: when the store file scanner has more information about the matcher's status, it may be able to avoid more unnecessary seeks.

- Liyin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/#review2615
---

On 2011-10-20 00:08:14, Liyin Tang wrote:
bq.
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2393/
bq. ---
bq.
bq. (Updated 2011-10-20 00:08:14)
bq.
bq.
bq. Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.
bq.
bq.
bq. Summary
bq. ---
bq.
bq.
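The three deleteFamilyCnt cases discussed in this reply (0, -1, and a positive count) can be written down as a minimal sketch. The method name passesDeleteFamilyBloomFilter and the -1 sentinel come from the thread; the RowBloom interface, the parameter list, and the class name are hypothetical stand-ins, not the signatures in StoreFile.java.

```java
// Hypothetical sketch of the decision described in the reply above.
final class DeleteFamilyCheckSketch {
    /** Stand-in for a Bloom filter: may give false positives, never false negatives. */
    interface RowBloom {
        boolean mightContain(String row);
    }

    /**
     * Returns true when the store file may hold a delete family kv for the row,
     * so the top row seek cannot be skipped.
     *
     * deleteFamilyCnt: count read from the file info, or -1 if the field is absent.
     * bloom: the delete family bloom filter, or null if the file has none.
     */
    static boolean passesDeleteFamilyBloomFilter(long deleteFamilyCnt, RowBloom bloom, String row) {
        if (deleteFamilyCnt == 0) {
            return false;               // the file is known to hold no delete family kvs
        }
        if (bloom == null) {
            return true;                // nothing to consult (e.g. older files): assume possible
        }
        return bloom.mightContain(row); // unknown (-1) or positive count: ask the filter
    }

    public static void main(String[] args) {
        RowBloom onlyRow1 = row -> row.equals("row1");
        System.out.println(passesDeleteFamilyBloomFilter(0, onlyRow1, "row1"));  // false
        System.out.println(passesDeleteFamilyBloomFilter(-1, null, "row1"));     // true
        System.out.println(passesDeleteFamilyBloomFilter(2, onlyRow1, "row2"));  // false
    }
}
```

Returning true is the conservative answer here: it only costs an extra seek, whereas a wrong false would hide a delete family marker.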
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131331#comment-13131331 ] jirapos...@reviews.apache.org commented on HBASE-4532: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ --- (Updated 2011-10-20 03:46:26.190655) Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. Summary --- HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. This addresses bug HBASE-4532. 
https://issues.apache.org/jira/browse/HBASE-4532 Diffs - src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing (updated) --- Passed all the unit tests Thanks, Liyin Avoid top row seek by dedicated bloom filter for delete family bloom filter --- Key: HBASE-4532 URL: https://issues.apache.org/jira/browse/HBASE-4532 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D27.1.patch, D27.1.patch HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. 
So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131375#comment-13131375 ]

jirapos...@reviews.apache.org commented on HBASE-4532:
--

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/#review2693
---

Nice work Liyin! I've been wanting this feature for so long! Some minor comments but looks good.

src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
https://reviews.apache.org/r/2393/#comment6069

is this right to return IOE and not null like if it doesn't exist in the general bloom case?

src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
https://reviews.apache.org/r/2393/#comment6070

missing leading space

src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
https://reviews.apache.org/r/2393/#comment6071

can you describe what an empty column means? does this mean wildcard or does this mean the null column?

src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
https://reviews.apache.org/r/2393/#comment6072

this means the null qualifier?

src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
https://reviews.apache.org/r/2393/#comment6073

is there _ever_ a case someone would not want this turned on? if someone was doing a ton of delete families maybe? u might not want to pay the cost of making this bloom.

src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
https://reviews.apache.org/r/2393/#comment6074

flipped args

src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java
https://reviews.apache.org/r/2393/#comment6076

this enables the creation or the usage?

src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java
https://reviews.apache.org/r/2393/#comment6075

stale comment from general method

src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
https://reviews.apache.org/r/2393/#comment6077

it's hard to tell what in this actually changed. i don't see much that actually went down? and should you also do some tests where you enable/disable the delete family bloom to ensure that it's working as expected both ways?

src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
https://reviews.apache.org/r/2393/#comment6078

you seem to be setting the conf to 0.01 and then retrieving it back?

- Jonathan

On 2011-10-20 03:46:26, Liyin Tang wrote:
bq.
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2393/
bq. ---
bq.
bq. (Updated 2011-10-20 03:46:26)
bq.
bq.
bq. Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.
bq.
bq.
bq. Summary
bq. ---
bq.
bq. HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled.
bq. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family
bq.
bq. The only subtle use case is when we are interested in the top row with empty column.
bq.
bq. For example,
bq. we are interested in row1/cf1:/1/put.
bq. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.
bq. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).
bq. In this way, we have already missed the real kv we are interested in.
bq.
bq. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.
bq.
bq. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.
bq.
bq.
bq. This addresses bug HBASE-4532.
bq. https://issues.apache.org/jira/browse/HBASE-4532
bq.
bq.
bq. Diffs
bq. -
bq.
bq. src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a
bq.
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13131381#comment-13131381 ] jirapos...@reviews.apache.org commented on HBASE-4532: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/#review2695 --- src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java https://reviews.apache.org/r/2393/#comment6079 If hasEmptyColumn is true, shall we be using ScanWildcardColumnTracker (as in the if block) ? - Ted On 2011-10-20 03:46:26, Liyin Tang wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/2393/ bq. --- bq. bq. (Updated 2011-10-20 03:46:26) bq. bq. bq. Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. bq. bq. bq. Summary bq. --- bq. bq. HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. bq. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family bq. bq. The only subtle use case is when we are interested in the top row with empty column. bq. bq. For example, bq. we are interested in row1/cf1:/1/put. bq. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. bq. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). bq. In this way, we have already missed the real kv we are interested in. bq. bq. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. bq. bq. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. bq. bq. bq. 
This addresses bug HBASE-4532. bq. https://issues.apache.org/jira/browse/HBASE-4532 bq. bq. bq. Diffs bq. - bq. bq.src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb bq.src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a bq.src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 bq.src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c bq.src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 bq.src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df bq.src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 bq. src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 bq.src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 bq. bq. Diff: https://reviews.apache.org/r/2393/diff bq. bq. bq. Testing bq. --- bq. bq. Passed all the unit tests bq. bq. bq. Thanks, bq. bq. Liyin bq. bq. Avoid top row seek by dedicated bloom filter for delete family bloom filter --- Key: HBASE-4532 URL: https://issues.apache.org/jira/browse/HBASE-4532 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D27.1.patch, D27.1.patch HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. 
This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13130006#comment-13130006 ] jirapos...@reviews.apache.org commented on HBASE-4532: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ --- (Updated 2011-10-18 20:38:41.990053) Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. Summary (updated) --- HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. This addresses bug HBASE-4532. 
https://issues.apache.org/jira/browse/HBASE-4532 Diffs - src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java a1d7de5 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java c88b23f src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing --- Running all the unit tests now Thanks, Liyin Avoid top row seek by dedicated bloom filter for delete family bloom filter --- Key: HBASE-4532 URL: https://issues.apache.org/jira/browse/HBASE-4532 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. 
And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13128253#comment-13128253 ]

jirapos...@reviews.apache.org commented on HBASE-4532:
--

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/
---

Review request for hbase, Dhruba Borthakur, Michael Stack, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin, Karthik Ranganathan, and Nicolas Spiegelberg.

Summary
---

HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled.
This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for rows with an empty column.

The previous solution was to create the dedicated bloom filter for delete family, which does not work if there is a row with an empty column.

For example,
we are interested in row1/cf1:/1/put.
So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.
Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).
In this way, we have already missed the real kv we are interested in.

The root cause is that even if there is no delete family at the top row, we still cannot avoid the top row seek.
We can ONLY avoid the top row seek when there is no row with an empty column, no matter what the kv type is (delete/deleteCol/deleteFamily/put).

So the current solution is to create the dedicated bloom filter for rows with an empty column.

This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.

This addresses bug HBASE-4532.
https://issues.apache.org/jira/browse/HBASE-4532 Diffs - src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java a1d7de5 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java c88b23f src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing --- Running all the unit tests now Thanks, Liyin Avoid top row seek by dedicated bloom filter for delete family bloom filter --- Key: HBASE-4532 URL: https://issues.apache.org/jira/browse/HBASE-4532 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. 
And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
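The failure case in the summary above comes down to key ordering, and a small model may make it easier to follow. The sketch below is not HBase code: `Key`, `firstOnRow`, `lastOnRowCol` and `skippedBy` are hypothetical stand-ins. It assumes the usual KeyValue sort order (row, then qualifier, then newer timestamp first, then higher type code first), assumes the type codes shown in the comments, and assumes that the fake kv from createLastOnRowCol carries the oldest timestamp and the minimum type. The column family is left out because the example uses a single family.

```java
public class TopRowSeekSketch {
    // Assumed type codes, modelled on KeyValue.Type (Minimum, Put, Maximum).
    public static final int TYPE_MINIMUM = 0;
    public static final int TYPE_PUT = 4;
    public static final int TYPE_MAXIMUM = 255;

    /** Simplified stand-in for a KeyValue key; not the HBase class. */
    public static final class Key {
        final String row;
        final String qualifier;
        final long timestamp;
        final int type;

        public Key(String row, String qualifier, long timestamp, int type) {
            this.row = row;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.type = type;
        }
    }

    /** Assumed sort order: row, qualifier, newer timestamp first, higher type first. */
    public static int compare(Key a, Key b) {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        c = Long.compare(b.timestamp, a.timestamp);
        if (c != 0) return c;
        return Integer.compare(b.type, a.type);
    }

    /** The "top row" seek key, e.g. row1/cf1:/MAX_TS/MAXIMUM. */
    public static Key firstOnRow(String row) {
        return new Key(row, "", Long.MAX_VALUE, TYPE_MAXIMUM);
    }

    /** Models the fake kv: the last possible key for the given row and column. */
    public static Key lastOnRowCol(String row, String qualifier) {
        return new Key(row, qualifier, Long.MIN_VALUE, TYPE_MINIMUM);
    }

    /** True if a scanner positioned at the fake kv has already passed the real kv. */
    public static boolean skippedBy(Key fake, Key real) {
        return compare(real, fake) < 0;
    }

    public static void main(String[] args) {
        Key fake = lastOnRowCol("row1", "");
        Key emptyColumnPut = new Key("row1", "", 1L, TYPE_PUT);
        Key namedColumnPut = new Key("row1", "q1", 1L, TYPE_PUT);
        System.out.println("empty-column put skipped: " + skippedBy(fake, emptyColumnPut));
        System.out.println("named-column put skipped: " + skippedBy(fake, namedColumnPut));
    }
}
```

Under these assumptions, a put on the empty column sorts between the top-row key and the fake kv, so returning the fake kv jumps past it, while a put on any named column still sorts after the fake kv and is not lost.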
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128256#comment-13128256 ]

jirapos...@reviews.apache.org commented on HBASE-4532:
-------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/
-----------------------------------------------------------

(Updated 2011-10-15 18:33:14.858552)

Review request for hbase, Dhruba Borthakur, Michael Stack, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin, Karthik Ranganathan, and Nicolas Spiegelberg.

Summary (updated)
-----------------

HBASE-4469 avoids the top row seek operation if the row-col bloom filter is enabled. This jira tries to avoid the top row seek in all cases by creating a dedicated bloom filter only for delete family.

The only subtle use case is when we are interested in the top row with an empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. The delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in.

The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with an empty column.

This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.

This addresses bug HBASE-4532.
    https://issues.apache.org/jira/browse/HBASE-4532

Diffs
-----

  src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java a1d7de5
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812
  src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java c88b23f
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8

Diff: https://reviews.apache.org/r/2393/diff

Testing
-------

Running all the unit tests now

Thanks,

Liyin

Avoid top row seek by dedicated bloom filter for delete family bloom filter
---------------------------------------------------------------------------

Key: HBASE-4532
URL: https://issues.apache.org/jira/browse/HBASE-4532
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family.

The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in.

The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.
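The rule in the updated summary (skip the top row seek only when the delete family bloom filter reports no delete family, and never for an empty column) can be sketched as a single guard. This is a minimal illustration under stated assumptions, not the code in the patch: `DeleteFamilyBloomGuard` and its methods are hypothetical names, the filter is assumed to be keyed by row, and an exact `HashSet` stands in for a real bloom filter, so it never produces false positives.

```java
import java.util.HashSet;
import java.util.Set;

public class DeleteFamilyBloomGuard {
    // Exact set used as a stand-in for the delete family bloom filter (assumption).
    private final Set<String> rowsWithDeleteFamily = new HashSet<>();

    /** Records that a delete family marker was written for this row. */
    public void recordDeleteFamily(String row) {
        rowsWithDeleteFamily.add(row);
    }

    /** A real bloom filter may return true for rows that were never added. */
    public boolean mightContainDeleteFamily(String row) {
        return rowsWithDeleteFamily.contains(row);
    }

    /**
     * Decides whether the top row seek can be skipped for a read of
     * row/qualifier. An empty qualifier always forces the seek, because
     * the requested kv may itself sit on the empty column of the row.
     */
    public boolean canSkipTopRowSeek(String row, String qualifier) {
        if (qualifier.isEmpty()) {
            return false; // optimization disabled for empty-column GET/SCAN
        }
        return !mightContainDeleteFamily(row);
    }

    public static void main(String[] args) {
        DeleteFamilyBloomGuard guard = new DeleteFamilyBloomGuard();
        guard.recordDeleteFamily("row2");
        System.out.println(guard.canSkipTopRowSeek("row1", "q1"));
        System.out.println(guard.canSkipTopRowSeek("row1", ""));
        System.out.println(guard.canSkipTopRowSeek("row2", "q1"));
    }
}
```

With a real bloom filter, a false positive only costs an unnecessary seek, so the guard stays safe; a false negative is not possible for this kind of filter.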
[jira] [Commented] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128261#comment-13128261 ]

jirapos...@reviews.apache.org commented on HBASE-4532:
-------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/
-----------------------------------------------------------

(Updated 2011-10-15 19:06:53.079380)

Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin, Karthik Ranganathan, and Nicolas Spiegelberg.

Summary
-------

HBASE-4469 avoids the top row seek operation if the row-col bloom filter is enabled. This jira tries to avoid the top row seek in all cases by creating a dedicated bloom filter only for delete family.

The only subtle use case is when we are interested in the top row with an empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. The delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in.

The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with an empty column.

This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.

This addresses bug HBASE-4532.
    https://issues.apache.org/jira/browse/HBASE-4532

Diffs
-----

  src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java a1d7de5
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812
  src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df
  src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java c88b23f
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8

Diff: https://reviews.apache.org/r/2393/diff

Testing
-------

Running all the unit tests now

Thanks,

Liyin

Avoid top row seek by dedicated bloom filter for delete family bloom filter
---------------------------------------------------------------------------

Key: HBASE-4532
URL: https://issues.apache.org/jira/browse/HBASE-4532
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family.

The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in.

The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.