[jira] [Commented] (HBASE-2794) Utilize ROWCOL bloom filter if multiple columns within same family are requested in a Get
[ https://issues.apache.org/jira/browse/HBASE-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118265#comment-13118265 ]

Ted Yu commented on HBASE-2794:
-------------------------------

Integrated to 0.92 branch and TRUNK.
Thanks for the patch Mikhail.
Thanks for the review Jonathan.

Utilize ROWCOL bloom filter if multiple columns within same family are requested in a Get
-----------------------------------------------------------------------------------------

Key: HBASE-2794
URL: https://issues.apache.org/jira/browse/HBASE-2794
Project: HBase
Issue Type: Improvement
Components: performance
Reporter: Kannan Muthukkaruppan
Assignee: Mikhail Bautin
Fix For: 0.92.0

Noticed the following snippet in StoreFile.java:Scanner:shouldSeek():

{code}
switch(bloomFilterType) {
  case ROW:
    key = row;
    break;
  case ROWCOL:
    if (columns.size() == 1) {
      byte[] col = columns.first();
      key = Bytes.add(row, col);
      break;
    }
    //$FALL-THROUGH$
  default:
    return true;
}
{code}

If columns.size() > 1, then we currently don't take advantage of the bloom filter. We should optimize this to check bloom for each of columns and if none of the columns are present in the bloom avoid opening the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
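For readers following the thread, the optimization proposed in the description (check the Bloom filter once per requested column and skip the file only when every check is negative) can be sketched roughly as below. This is an illustration under stated assumptions, not the committed patch: KeyFilter, ExactKeyFilter, concat and this shouldSeek signature are hypothetical names, and the exact-set filter stands in for a real Bloom filter only so that the demo is deterministic (a real Bloom filter can also return false positives, which would cause an unnecessary seek but never a missed one).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RowColBloomSketch {

    /** Hypothetical stand-in for a store file's ROWCOL Bloom filter (not the HBase API). */
    interface KeyFilter {
        boolean mightContain(byte[] key);
    }

    /** Exact-set filter, used here only to keep the demo deterministic. */
    static final class ExactKeyFilter implements KeyFilter {
        private final Set<ByteBuffer> keys = new HashSet<>();

        void add(byte[] key) {
            // ByteBuffer compares by content, so it works as a byte[] set key.
            keys.add(ByteBuffer.wrap(key.clone()));
        }

        @Override
        public boolean mightContain(byte[] key) {
            return keys.contains(ByteBuffer.wrap(key));
        }
    }

    /** Builds the row+column key, mirroring Bytes.add(row, col) in the quoted snippet. */
    static byte[] concat(byte[] row, byte[] col) {
        byte[] key = new byte[row.length + col.length];
        System.arraycopy(row, 0, key, 0, row.length);
        System.arraycopy(col, 0, key, row.length, col.length);
        return key;
    }

    /**
     * Returns false only when the filter rules out every requested column.
     * With no explicit columns the filter cannot help, so we must seek.
     */
    static boolean shouldSeek(KeyFilter filter, byte[] row, List<byte[]> columns) {
        if (columns.isEmpty()) {
            return true;
        }
        for (byte[] col : columns) {
            if (filter.mightContain(concat(row, col))) {
                return true; // at least one column may be in this file
            }
        }
        return false; // every row+column key was ruled out
    }

    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ExactKeyFilter filter = new ExactKeyFilter();
        filter.add(concat(b("row1"), b("colA")));

        // One of the requested columns is present: the file must be read.
        System.out.println(shouldSeek(filter, b("row1"), Arrays.asList(b("colA"), b("colB"))));
        // None of the requested columns is present: the file can be skipped.
        System.out.println(shouldSeek(filter, b("row1"), Arrays.asList(b("colB"), b("colC"))));
    }
}
```

The loop returns on the first positive check, so the cost of the extra lookups is bounded by the number of requested columns and is paid in full only for files that end up being skipped.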
[jira] [Commented] (HBASE-2794) Utilize ROWCOL bloom filter if multiple columns within same family are requested in a Get
[ https://issues.apache.org/jira/browse/HBASE-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118294#comment-13118294 ]

jirapos...@reviews.apache.org commented on HBASE-2794:
------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2084/#review2226
-----------------------------------------------------------

Ship it!

I'm +0 on committing this. I tried reviewing it, but I don't know this code well. The added unit test is nicely intrusive and the asserts look right. What about Nicolas's performance concerns? How are they addressed by this patch? I'm running a build of the patch, and if that passes I'm +1 on commit.

src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
https://reviews.apache.org/r/2084/#comment5175

    Interesting method name. We should use this pattern everywhere we have to do this.

src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
https://reviews.apache.org/r/2084/#comment5176

    Should we get rid of this javadoc if this is an override? (Let us know; we can do it on commit.)

- Michael

On 2011-09-29 21:05:20, Mikhail Bautin wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2084/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-09-29 21:05:20)
bq.
bq. Review request for hbase.
bq.
bq. Summary
bq. -------
bq.
bq. Previously we only used row-column Bloom filters for scans that only requested one column. We have seen production queries that request up to 200 columns, and with say ~6 store files per store (region / column family combination) this might have resulted in 1200 block read operations in the worst case. With this diff we will be avoiding seeks on store files that we know don't contain the row/column of interest when using an ExplicitColumnTracker. The performance should remain the same for column range queries.
bq.
bq. This addresses bug HBASE-2794.
bq. https://issues.apache.org/jira/browse/HBASE-2794
bq.
bq. Diffs
bq. -----
bq.
bq.   src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8
bq.   src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java f5173c4
bq.   src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java a3d778e
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/AbstractKeyValueScanner.java PRE-CREATION
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 7cbdb98
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 9d9895c
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java 6cdada7
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 4aa72de
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 68cdac5
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java fd9e7ef
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 08d3ba4
bq.   src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java ac2348e
bq.   src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java 32f88fb
bq.   src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java a5d13f7
bq.   src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiColumnScanner.java baee696
bq.   src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/2084/diff
bq.
bq. Testing
bq. -------
bq.
bq. Existing unit tests. A new unit test (TestScanWithBloomError). Load testing using HBaseTest.
bq.
bq. Thanks,
bq.
bq. Mikhail
[jira] [Commented] (HBASE-2794) Utilize ROWCOL bloom filter if multiple columns within same family are requested in a Get
[ https://issues.apache.org/jira/browse/HBASE-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118449#comment-13118449 ]

stack commented on HBASE-2794:
------------------------------

These failed after running full suite but seem unrelated:

{code}
Failed tests:
  testOnlineChangeTableSchema(org.apache.hadoop.hbase.client.TestAdmin)
  testForceSplitMultiFamily(org.apache.hadoop.hbase.client.TestAdmin): expected:<2> but was:<1>

Tests in error:
  testEnableDisableAddColumnDeleteColumn(org.apache.hadoop.hbase.client.TestAdmin): org.apache.hadoop.hbase.TableNotEnabledException: testMasterAdmin
{code}
[jira] [Commented] (HBASE-2794) Utilize ROWCOL bloom filter if multiple columns within same family are requested in a Get
[ https://issues.apache.org/jira/browse/HBASE-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118471#comment-13118471 ]

Mikhail Bautin commented on HBASE-2794:
---------------------------------------

@Michael: I am observing a different set of spuriously failing tests, also seemingly unrelated.

2011-09-29_20_41_15 | tests: 1015, fail: 0, err: 0, skip: 21, time: 6027.3
2011-09-29_23_09_51 | tests: 1012, fail: 0, err: 0, skip: 21, time: 5328.0
2011-09-30_01_44_42 | tests: 1015, fail: 0, err: 0, skip: 21, time: 6338.4
2011-09-30_04_28_29 | tests: 1015, fail: 0, err: 0, skip: 21, time: 6079.2
2011-09-30_07_00_24 | tests: 1015, fail: 1, err: 0, skip: 21, time: 6656.2, failed: Admin
2011-09-30_09_41_53 | tests: 1015, fail: 0, err: 0, skip: 21, time: 5900.8
2011-09-30_12_10_25 | tests: 1004, fail: 1, err: 0, skip: 21, time: 5397.7, failed: DistributedLogSplitting

(Patch applied on top of http://svn.apache.org/repos/asf/hbase/trunk@1176613)
[jira] [Commented] (HBASE-2794) Utilize ROWCOL bloom filter if multiple columns within same family are requested in a Get
[ https://issues.apache.org/jira/browse/HBASE-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118632#comment-13118632 ]

Hudson commented on HBASE-2794:
-------------------------------

Integrated in HBase-TRUNK #2274 (See [https://builds.apache.org/job/HBase-TRUNK/2274/])
HBASE-2794 Utilize ROWCOL bloom filter if multiple columns within same family are requested in a Get (Mikhail Bautin)

tedyu :
Files :
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/AbstractKeyValueScanner.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiColumnScanner.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java