[jira] [Updated] (HBASE-4683) Always cache index and bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-4683: --- Attachment: D1695.1.patch mbautin requested code review of [jira] [HBASE-4683] Test that we always cache index and bloom blocks. Reviewers: JIRA, jdcryans, lhofhansl, Liyin This was reviewed D807 but was not checked in. Submitting unit test as a separate patch, and extending the scope of the test to also handle the case when block cache is enabled for the column family. TEST PLAN Run unit tests REVISION DETAIL https://reviews.facebook.net/D1695 AFFECTED FILES src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestForceCacheImportantBlocks.java MANAGE HERALD DIFFERENTIAL RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/3621/ Tip: use the X-Herald-Rules header to filter Herald messages in your client. Always cache index and bloom blocks --- Key: HBASE-4683 URL: https://issues.apache.org/jira/browse/HBASE-4683 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Assignee: Mikhail Bautin Priority: Minor Fix For: 0.94.0, 0.92.0 Attachments: 0001-Cache-important-block-types.patch, 4683-v2.txt, 4683.txt, D1695.1.patch, D807.1.patch, D807.2.patch, D807.3.patch, HBASE-4683-0.92-v2.patch, HBASE-4683-v3.patch This would add a new boolean config option: hfile.block.cache.datablocks Default would be true. Setting this to false allows HBase in a mode where only index blocks are cached, which is useful for analytical scenarios where a useful working set of the data cannot be expected to fit into the (aggregate) cache. This is the equivalent of setting cacheBlocks to false on all scans (including scans on behalf of gets). I would like to get a general feeling about what folks think about this. The change itself would be simple. Update (Mikhail): we probably don't need a new conf option. Instead, we will make index blocks cached by default. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4683) Always cache index and bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-4683: --- Attachment: D807.3.patch mbautin updated the revision [jira] [HBASE-4683] Always cache index and bloom blocks. Reviewers: jdcryans, lhofhansl, JIRA Rebasing up to r1214519. REVISION DETAIL https://reviews.facebook.net/D807 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java src/main/java/org/apache/hadoop/hbase/regionserver/Store.java src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaConfigured.java src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaMetrics.java src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestForceCacheImportantBlocks.java src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiColumnScanner.java src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java src/test/java/org/apache/hadoop/hbase/regionserver/TestSeekOptimizations.java src/test/java/org/apache/hadoop/hbase/regionserver/metrics/TestSchemaConfigured.java Always cache index and bloom blocks --- Key: HBASE-4683 URL: https://issues.apache.org/jira/browse/HBASE-4683 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Assignee: Mikhail Bautin Priority: Minor Fix For: 0.92.0, 0.94.0 Attachments: 4683-v2.txt, 4683.txt, D807.1.patch, D807.2.patch, D807.3.patch, HBASE-4683-0.92-v2.patch, HBASE-4683-v3.patch This would add a new boolean config option: hfile.block.cache.datablocks Default would be true. Setting this to false allows HBase in a mode where only index blocks are cached, which is useful for analytical scenarios where a useful working set of the data cannot be expected to fit into the (aggregate) cache. This is the equivalent of setting cacheBlocks to false on all scans (including scans on behalf of gets). I would like to get a general feeling about what folks think about this. The change itself would be simple. Update (Mikhail): we probably don't need a new conf option. Instead, we will make index blocks cached by default. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4683) Always cache index and bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4683: -- Attachment: 0001-Cache-important-block-types.patch Attaching the patch rebased on top of r1214519. Always cache index and bloom blocks --- Key: HBASE-4683 URL: https://issues.apache.org/jira/browse/HBASE-4683 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Assignee: Mikhail Bautin Priority: Minor Fix For: 0.92.0, 0.94.0 Attachments: 0001-Cache-important-block-types.patch, 4683-v2.txt, 4683.txt, D807.1.patch, D807.2.patch, D807.3.patch, HBASE-4683-0.92-v2.patch, HBASE-4683-v3.patch This would add a new boolean config option: hfile.block.cache.datablocks Default would be true. Setting this to false allows HBase in a mode where only index blocks are cached, which is useful for analytical scenarios where a useful working set of the data cannot be expected to fit into the (aggregate) cache. This is the equivalent of setting cacheBlocks to false on all scans (including scans on behalf of gets). I would like to get a general feeling about what folks think about this. The change itself would be simple. Update (Mikhail): we probably don't need a new conf option. Instead, we will make index blocks cached by default. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4683) Always cache index and bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-4683: --- Attachment: D807.2.patch mbautin updated the revision [jira] [HBASE-4683] Always cache index and bloom blocks. Reviewers: jdcryans, lhofhansl, JIRA Actually, the metrics changes are important. The unit test uses per-CF metrics to determine whether the blocks have been cached (or not cached, for data blocks), and when running the unit test I discovered some bugs in how table and CF name are passed down to HFile readers/writers, so I had to clean that up a bit. REVISION DETAIL https://reviews.facebook.net/D807 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java src/main/java/org/apache/hadoop/hbase/regionserver/Store.java src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaConfigured.java src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaMetrics.java src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestForceCacheImportantBlocks.java src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiColumnScanner.java src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java src/test/java/org/apache/hadoop/hbase/regionserver/TestSeekOptimizations.java src/test/java/org/apache/hadoop/hbase/regionserver/metrics/TestSchemaConfigured.java Always cache index and bloom blocks --- Key: HBASE-4683 URL: https://issues.apache.org/jira/browse/HBASE-4683 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Assignee: Mikhail Bautin Priority: Minor Fix For: 0.94.0 Attachments: 4683-v2.txt, 4683.txt, D807.1.patch, D807.2.patch, HBASE-4683-v3.patch This would add a new boolean config option: hfile.block.cache.datablocks Default would be true. Setting this to false allows HBase in a mode where only index blocks are cached, which is useful for analytical scenarios where a useful working set of the data cannot be expected to fit into the (aggregate) cache. This is the equivalent of setting cacheBlocks to false on all scans (including scans on behalf of gets). I would like to get a general feeling about what folks think about this. The change itself would be simple. Update (Mikhail): we probably don't need a new conf option. Instead, we will make index blocks cached by default. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4683) Always cache index and bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-4683: -- Fix Version/s: 0.92.0 This needs to be in 0.92 too, the potential performance regression is too big. Always cache index and bloom blocks --- Key: HBASE-4683 URL: https://issues.apache.org/jira/browse/HBASE-4683 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Assignee: Mikhail Bautin Priority: Minor Fix For: 0.92.0, 0.94.0 Attachments: 4683-v2.txt, 4683.txt, D807.1.patch, D807.2.patch, HBASE-4683-v3.patch This would add a new boolean config option: hfile.block.cache.datablocks Default would be true. Setting this to false allows HBase in a mode where only index blocks are cached, which is useful for analytical scenarios where a useful working set of the data cannot be expected to fit into the (aggregate) cache. This is the equivalent of setting cacheBlocks to false on all scans (including scans on behalf of gets). I would like to get a general feeling about what folks think about this. The change itself would be simple. Update (Mikhail): we probably don't need a new conf option. Instead, we will make index blocks cached by default. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4683) Always cache index and bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4683: -- Summary: Always cache index and bloom blocks (was: Always cache index blocks) Always cache index and bloom blocks --- Key: HBASE-4683 URL: https://issues.apache.org/jira/browse/HBASE-4683 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Assignee: Mikhail Bautin Priority: Minor Fix For: 0.94.0 Attachments: 4683-v2.txt, 4683.txt, HBASE-4683-v3.patch This would add a new boolean config option: hfile.block.cache.datablocks Default would be true. Setting this to false allows HBase in a mode where only index blocks are cached, which is useful for analytical scenarios where a useful working set of the data cannot be expected to fit into the (aggregate) cache. This is the equivalent of setting cacheBlocks to false on all scans (including scans on behalf of gets). I would like to get a general feeling about what folks think about this. The change itself would be simple. Update (Mikhail): we probably don't need a new conf option. Instead, we will make index blocks cached by default. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4683) Always cache index and bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-4683: --- Attachment: D807.1.patch mbautin requested code review of [jira] [HBASE-4683] Always cache index and bloom blocks. Reviewers: jdcryans, lhofhansl, JIRA Making sure that we always cache index and bloom blocks, as long as we have block cache, even if the cache is disabled for the CF. Adding a unit test to verify this using metrics. In process of doing this, I've factored out a region creation method in HBaseTestingUtility, and cleaned up a few things in HFile readers/writers. I have also categorized HFile v1 bloom blocks (that are technically meta blocks) as bloom blocks in metrics. TEST PLAN Unit tests REVISION DETAIL https://reviews.facebook.net/D807 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaConfigured.java src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaMetrics.java src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java src/test/java/org/apache/hadoop/hbase/io/hfile/TestForceCacheImportantBlocks.java src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiColumnScanner.java src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java src/test/java/org/apache/hadoop/hbase/regionserver/TestSeekOptimizations.java MANAGE HERALD DIFFERENTIAL RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/1719/ Tip: use the X-Herald-Rules header to filter Herald messages in your client. Always cache index and bloom blocks --- Key: HBASE-4683 URL: https://issues.apache.org/jira/browse/HBASE-4683 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Assignee: Mikhail Bautin Priority: Minor Fix For: 0.94.0 Attachments: 4683-v2.txt, 4683.txt, D807.1.patch, HBASE-4683-v3.patch This would add a new boolean config option: hfile.block.cache.datablocks Default would be true. Setting this to false allows HBase in a mode where only index blocks are cached, which is useful for analytical scenarios where a useful working set of the data cannot be expected to fit into the (aggregate) cache. This is the equivalent of setting cacheBlocks to false on all scans (including scans on behalf of gets). I would like to get a general feeling about what folks think about this. The change itself would be simple. Update (Mikhail): we probably don't need a new conf option. Instead, we will make index blocks cached by default. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira