[jira] [Updated] (HBASE-5763) Fix random failures in TestFSErrorsExposed
[ https://issues.apache.org/jira/browse/HBASE-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5763: -- Resolution: Fixed Status: Resolved (was: Patch Available) Committed everywhere. Fix random failures in TestFSErrorsExposed -- Key: HBASE-5763 URL: https://issues.apache.org/jira/browse/HBASE-5763 Project: HBase Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Attachments: D2739.1.patch, D2739.2.patch, D2739.3.patch, D2739.4.patch, D2793.1.patch, D2793.2.patch, D2793.3.patch, Fix-TestFSErrorsExposed-2012-04-13_18_59_36.patch, Fix-TestFSErrorsExposed-2012-04-16_15_41_24.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5104: -- Attachment: jira-HBASE-5104-Provide-a-reliable-intra-row-paginat-2012-04-16_12_39_42.patch Manually attaching the most recent patch. Provide a reliable intra-row pagination mechanism - Key: HBASE-5104 URL: https://issues.apache.org/jira/browse/HBASE-5104 Project: HBase Issue Type: Bug Reporter: Kannan Muthukkaruppan Assignee: Madhuwanti Vaidya Attachments: D2799.1.patch, D2799.2.patch, D2799.3.patch, jira-HBASE-5104-Provide-a-reliable-intra-row-paginat-2012-04-16_12_39_42.patch, testFilterList.rb Addendum: Doing pagination (retrieving at most limit number of KVs at a particular offset) is currently supported via the ColumnPaginationFilter. However, it is not a very clean way of supporting pagination. Some of the problems with it are: * Normally, one would expect a query with (Filter(A) AND Filter(B)) to have same results as (query with Filter(A)) INTERSECT (query with Filter(B)). This is not the case for ColumnPaginationFilter as its internal state gets updated depending on whether or not Filter(A) returns TRUE/FALSE for a particular cell. * When this Filter is used in combination with other filters (e.g., doing AND with another filter using FilterList), the behavior of the query depends on the order of filters in the FilterList. This is not ideal. * ColumnPaginationFilter is a stateful filter which ends up counting multiple versions of the cell as separate values even if another filter upstream or the ScanQueryMatcher is going to reject the value for other reasons. Seems like we need a reliable way to do pagination. The particular use case that prompted this JIRA is pagination within the same rowKey. For example, for a given row key R, get columns with prefix P, starting at offset X (among columns which have prefix P) and limit Y. 
Some possible fixes might be:
1) Enhance ColumnPrefixFilter to support another constructor which supports limit/offset.
2) Support pagination (limit/offset) at the Scan/Get API level (rather than as a filter) [like SQL].

Original Post: Thanks Jiakai Liu for reporting this issue and doing the initial investigation. Email from Jiakai below:

Assuming that we have an index column family with the following entries:

tag0:001:thread1
...
tag1:001:thread1
tag1:002:thread2
...
tag1:010:thread10
...
tag2:001:thread1
tag2:005:thread5
...

To get threads with tag1 in range [5, 10), I tried the following code:

ColumnPrefixFilter filter1 = new ColumnPrefixFilter(Bytes.toBytes(tag1));
ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit */, 5 /* offset */);
FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
filters.addFilter(filter1);
filters.addFilter(filter2);
Get get = new Get(USER);
get.addFamily(COLUMN_FAMILY);
get.setMaxVersions(1);
get.setFilter(filters);

Somehow it didn't work as expected. It returned the entries as if filter1 were not set. Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. The FilterList filter does not handle this return code properly (it treats it as INCLUDE).
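The order dependence described in the addendum can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: it does not use the HBase filter classes, `CountingPaginator` and `scan` are hypothetical names, and it assumes that an AND combination stops evaluating at the first filter that rejects a column, so a stateful paginator only counts the columns it is actually shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Simplified model of a stateful pagination filter; not the HBase API. */
class PaginationOrderSketch {

    /** Hypothetical paginator that counts every column it is asked about. */
    static final class CountingPaginator implements Predicate<String> {
        private final int limit;
        private final int offset;
        private int seen = 0;

        CountingPaginator(int limit, int offset) {
            this.limit = limit;
            this.offset = offset;
        }

        @Override
        public boolean test(String column) {
            int index = seen++; // state changes on every call
            return index >= offset && index < offset + limit;
        }
    }

    /** AND of all filters, assumed to stop at the first rejection. */
    static List<String> scan(List<String> columns, List<Predicate<String>> filters) {
        List<String> out = new ArrayList<>();
        for (String column : columns) {
            boolean pass = true;
            for (Predicate<String> filter : filters) {
                if (!filter.test(column)) {
                    pass = false;
                    break;
                }
            }
            if (pass) {
                out.add(column);
            }
        }
        return out;
    }

    /** Three tag0 columns followed by ten tag1 columns, in sorted order. */
    static List<String> sampleColumns() {
        List<String> columns = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            columns.add(String.format("tag0:%03d", i));
        }
        for (int i = 1; i <= 10; i++) {
            columns.add(String.format("tag1:%03d", i));
        }
        return columns;
    }

    /** Runs prefix AND pagination (limit 5, offset 5) in the requested order. */
    static List<String> run(boolean prefixFirst) {
        Predicate<String> prefix = c -> c.startsWith("tag1:");
        Predicate<String> page = new CountingPaginator(5, 5);
        List<Predicate<String>> filters = new ArrayList<>();
        if (prefixFirst) {
            filters.add(prefix);
            filters.add(page);
        } else {
            filters.add(page);
            filters.add(prefix);
        }
        return scan(sampleColumns(), filters);
    }

    public static void main(String[] args) {
        System.out.println("prefix first:     " + run(true));
        System.out.println("pagination first: " + run(false));
    }
}
```

In this model the two orders return different pages of tag1 columns, because the paginator's offset is applied either to the tag1 columns only or to every column in the row.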
[jira] [Updated] (HBASE-5763) Fix random failures in TestFSErrorsExposed
[ https://issues.apache.org/jira/browse/HBASE-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5763: -- Attachment: Fix-TestFSErrorsExposed-2012-04-16_15_41_24.patch Attaching trunk patch for Jenkins testing. Fix random failures in TestFSErrorsExposed -- Key: HBASE-5763 URL: https://issues.apache.org/jira/browse/HBASE-5763 Project: HBase Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Attachments: D2739.1.patch, D2739.2.patch, D2739.3.patch, D2739.4.patch, D2793.1.patch, D2793.2.patch, Fix-TestFSErrorsExposed-2012-04-13_18_59_36.patch, Fix-TestFSErrorsExposed-2012-04-16_15_41_24.patch
[jira] [Updated] (HBASE-5684) Make ProcessBasedLocalHBaseCluster run HDFS and make it more robust
[ https://issues.apache.org/jira/browse/HBASE-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5684: -- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Make ProcessBasedLocalHBaseCluster run HDFS and make it more robust --- Key: HBASE-5684 URL: https://issues.apache.org/jira/browse/HBASE-5684 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D2709.1.patch, D2709.2.patch, D2709.3.patch, D2709.4.patch, D2757.1.patch, D2757.2.patch, D2757.3.patch, D2757.4.patch, jira-HBASE-5684-Make-ProcessBasedLocalHBaseCluster-r-2012-04-12_20_42_02.patch Currently ProcessBasedLocalHBaseCluster runs on top of raw local filesystem. We need it to start a process-based HDFS cluster as well. We also need to make the whole thing more stable so we can use it in unit tests.
[jira] [Updated] (HBASE-5763) Fix random failures in TestFSErrorsExposed
[ https://issues.apache.org/jira/browse/HBASE-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5763: -- Attachment: Fix-TestFSErrorsExposed-2012-04-13_18_59_36.patch Fix random failures in TestFSErrorsExposed -- Key: HBASE-5763 URL: https://issues.apache.org/jira/browse/HBASE-5763 Project: HBase Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Attachments: D2739.1.patch, D2739.2.patch, D2739.3.patch, D2739.4.patch, D2793.1.patch, Fix-TestFSErrorsExposed-2012-04-13_18_59_36.patch
[jira] [Updated] (HBASE-5763) Fix random failures in TestFSErrorsExposed
[ https://issues.apache.org/jira/browse/HBASE-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5763: -- Status: Patch Available (was: Open) Fix random failures in TestFSErrorsExposed -- Key: HBASE-5763 URL: https://issues.apache.org/jira/browse/HBASE-5763 Project: HBase Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Attachments: D2739.1.patch, D2739.2.patch, D2739.3.patch, D2739.4.patch, D2793.1.patch, Fix-TestFSErrorsExposed-2012-04-13_18_59_36.patch
[jira] [Updated] (HBASE-5104) Provide a reliable intra-row pagination mechanism
[ https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5104: -- Status: Patch Available (was: Open) Provide a reliable intra-row pagination mechanism - Key: HBASE-5104 URL: https://issues.apache.org/jira/browse/HBASE-5104 Project: HBase Issue Type: Bug Reporter: Kannan Muthukkaruppan Assignee: Madhuwanti Vaidya Attachments: D2799.1.patch, testFilterList.rb Addendum: Doing pagination (retrieving at most limit number of KVs at a particular offset) is currently supported via the ColumnPaginationFilter. However, it is not a very clean way of supporting pagination. Some of the problems with it are: * Normally, one would expect a query with (Filter(A) AND Filter(B)) to have same results as (query with Filter(A)) INTERSECT (query with Filter(B)). This is not the case for ColumnPaginationFilter as its internal state gets updated depending on whether or not Filter(A) returns TRUE/FALSE for a particular cell. * When this Filter is used in combination with other filters (e.g., doing AND with another filter using FilterList), the behavior of the query depends on the order of filters in the FilterList. This is not ideal. * ColumnPaginationFilter is a stateful filter which ends up counting multiple versions of the cell as separate values even if another filter upstream or the ScanQueryMatcher is going to reject the value for other reasons. Seems like we need a reliable way to do pagination. The particular use case that prompted this JIRA is pagination within the same rowKey. For example, for a given row key R, get columns with prefix P, starting at offset X (among columns which have prefix P) and limit Y. Some possible fixes might be: 1) enhance ColumnPrefixFilter to support another constructor which supports limit/offset. 2) Support pagination (limit/offset) at the Scan/Get API level (rather than as a filter) [Like SQL]. 
Original Post: Thanks Jiakai Liu for reporting this issue and doing the initial investigation. Email from Jiakai below:

Assuming that we have an index column family with the following entries:

tag0:001:thread1
...
tag1:001:thread1
tag1:002:thread2
...
tag1:010:thread10
...
tag2:001:thread1
tag2:005:thread5
...

To get threads with tag1 in range [5, 10), I tried the following code:

ColumnPrefixFilter filter1 = new ColumnPrefixFilter(Bytes.toBytes(tag1));
ColumnPaginationFilter filter2 = new ColumnPaginationFilter(5 /* limit */, 5 /* offset */);
FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
filters.addFilter(filter1);
filters.addFilter(filter2);
Get get = new Get(USER);
get.addFamily(COLUMN_FAMILY);
get.setMaxVersions(1);
get.setFilter(filters);

Somehow it didn't work as expected. It returned the entries as if filter1 were not set. Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. The FilterList filter does not handle this return code properly (it treats it as INCLUDE).
[jira] [Updated] (HBASE-5684) Make ProcessBasedLocalHBaseCluster run HDFS and make it more robust
[ https://issues.apache.org/jira/browse/HBASE-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5684: -- Attachment: jira-HBASE-5684-Make-ProcessBasedLocalHBaseCluster-r-2012-04-12_20_42_02.patch Make ProcessBasedLocalHBaseCluster run HDFS and make it more robust --- Key: HBASE-5684 URL: https://issues.apache.org/jira/browse/HBASE-5684 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D2709.1.patch, D2709.2.patch, D2709.3.patch, D2709.4.patch, D2757.1.patch, D2757.2.patch, jira-HBASE-5684-Make-ProcessBasedLocalHBaseCluster-r-2012-04-12_20_42_02.patch Currently ProcessBasedLocalHBaseCluster runs on top of raw local filesystem. We need it to start a process-based HDFS cluster as well. We also need to make the whole thing more stable so we can use it in unit tests.
[jira] [Updated] (HBASE-5684) Make ProcessBasedLocalHBaseCluster run HDFS and make it more robust
[ https://issues.apache.org/jira/browse/HBASE-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5684: -- Status: Patch Available (was: Open) Make ProcessBasedLocalHBaseCluster run HDFS and make it more robust --- Key: HBASE-5684 URL: https://issues.apache.org/jira/browse/HBASE-5684 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D2709.1.patch, D2709.2.patch, D2709.3.patch, D2709.4.patch, D2757.1.patch Currently ProcessBasedLocalHBaseCluster runs on top of raw local filesystem. We need it to start a process-based HDFS cluster as well. We also need to make the whole thing more stable so we can use it in unit tests.
[jira] [Updated] (HBASE-5744) Thrift server metrics should be long instead of int
[ https://issues.apache.org/jira/browse/HBASE-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5744: -- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Thrift server metrics should be long instead of int --- Key: HBASE-5744 URL: https://issues.apache.org/jira/browse/HBASE-5744 Project: HBase Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Attachments: D2679.1.patch, D2685.1.patch, D2685.2.patch, D2685.3.patch, jira-HBASE-5744-89-fb-Thrift-server-metrics-should-b-2012-04-07_21_39_35.patch As we measure our Thrift call latencies in nanoseconds, we need to make latencies long instead of int everywhere.
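The reason nanosecond latencies need a long can be shown with plain Java arithmetic. The class and method names below are hypothetical and are not taken from the HBase Thrift metrics code; the sketch only demonstrates that an int holds a little over two seconds' worth of nanoseconds before it wraps around.

```java
/** Shows why nanosecond latencies overflow an int; names are illustrative only. */
class LatencyOverflowSketch {

    /** Narrowing a nanosecond count to int silently discards the high bits. */
    static int asInt(long nanos) {
        return (int) nanos;
    }

    /** Longest latency, in whole milliseconds, that still fits in an int of nanoseconds. */
    static long maxIntLatencyMillis() {
        return Integer.MAX_VALUE / 1_000_000L;
    }

    public static void main(String[] args) {
        long threeSecondsNanos = 3_000_000_000L;
        System.out.println("as long: " + threeSecondsNanos);
        System.out.println("as int:  " + asInt(threeSecondsNanos)); // wraps to a negative value
        System.out.println("int limit in ms: " + maxIntLatencyMillis());
    }
}
```

Any call slower than roughly 2.1 seconds would therefore be recorded as a wrong, possibly negative, latency if the value were stored in an int.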
[jira] [Updated] (HBASE-5744) Thrift server metrics should be long instead of int
[ https://issues.apache.org/jira/browse/HBASE-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5744: -- Attachment: jira-HBASE-5744-89-fb-Thrift-server-metrics-should-b-2012-04-07_21_39_35.patch The same patch (re-attaching to run a test on Jenkins). Thrift server metrics should be long instead of int --- Key: HBASE-5744 URL: https://issues.apache.org/jira/browse/HBASE-5744 Project: HBase Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Attachments: D2679.1.patch, D2685.1.patch, jira-HBASE-5744-89-fb-Thrift-server-metrics-should-b-2012-04-07_21_39_35.patch As we measure our Thrift call latencies in nanoseconds, we need to make latencies long instead of int everywhere.
[jira] [Updated] (HBASE-5618) SplitLogManager - prevent unnecessary attempts to resubmits
[ https://issues.apache.org/jira/browse/HBASE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5618: -- Status: Patch Available (was: Open) SplitLogManager - prevent unnecessary attempts to resubmits --- Key: HBASE-5618 URL: https://issues.apache.org/jira/browse/HBASE-5618 Project: HBase Issue Type: Improvement Components: wal, zookeeper Reporter: Prakash Khemani Assignee: Prakash Khemani Attachments: 0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch, 0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch, 0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch, 0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch, 0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch Currently, once a watch fires indicating that the task node has been updated (heartbeated) by the worker, the SplitLogManager still takes quite some time before it updates the last-heard-from time. This is because the manager currently schedules another getDataSetWatch() and only after that finishes will it update the task's last-heard-from time. This leads to a large number of zk-BadVersion warnings when resubmission is continuously attempted and fails. Two changes should be made: (1) On a resubmission failure because of BadVersion, the task's lastUpdate time should be bumped. (2) The task's lastUpdate time should be bumped as soon as the nodeDataChanged() watch fires, without waiting for getDataSetWatch() to complete.
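The effect of the two proposed changes can be sketched with a toy timing model. Everything here is hypothetical: `Task`, `onNodeDataChanged`, `onResubmitBadVersion` and `isResubmitDue` are illustrative names rather than SplitLogManager's real fields or methods, and timestamps are passed in explicitly so the behavior is deterministic.

```java
/** Toy model of a split-log task's last-heard-from time; not SplitLogManager code. */
class SplitTaskTimingSketch {

    static final class Task {
        private long lastUpdateMillis;

        Task(long createdAtMillis) {
            this.lastUpdateMillis = createdAtMillis;
        }

        /** Change (2): bump the time as soon as the data-changed watch fires. */
        void onNodeDataChanged(long nowMillis) {
            lastUpdateMillis = nowMillis;
        }

        /** Change (1): a BadVersion failure means the worker updated the node, so bump too. */
        void onResubmitBadVersion(long nowMillis) {
            lastUpdateMillis = nowMillis;
        }

        /** The manager would only consider resubmitting after the timeout has elapsed. */
        boolean isResubmitDue(long nowMillis, long timeoutMillis) {
            return nowMillis - lastUpdateMillis > timeoutMillis;
        }
    }

    public static void main(String[] args) {
        Task stale = new Task(0L);
        System.out.println("without bump: due=" + stale.isResubmitDue(1500L, 1000L));

        Task fresh = new Task(0L);
        fresh.onNodeDataChanged(1400L); // heartbeat observed at t=1400
        System.out.println("with bump:    due=" + fresh.isResubmitDue(1500L, 1000L));
    }
}
```

With the immediate bump, a heartbeat seen shortly before the timeout check suppresses the resubmission attempt that would otherwise fail with BadVersion.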
[jira] [Updated] (HBASE-5730) [89-fb] Make HRegionThriftServer's thread pool bounded
[ https://issues.apache.org/jira/browse/HBASE-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5730: -- Description: This JIRA is for a quick fix in 89-fb to reuse TBoundedThreadPoolServer in HRegionThriftServer. We will address whatever problems HRegionThriftServer still has in trunk in HBASE-5703. [89-fb] Make HRegionThriftServer's thread pool bounded -- Key: HBASE-5730 URL: https://issues.apache.org/jira/browse/HBASE-5730 Project: HBase Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin This JIRA is for a quick fix in 89-fb to reuse TBoundedThreadPoolServer in HRegionThriftServer. We will address whatever problems HRegionThriftServer still has in trunk in HBASE-5703.
[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4218: -- Release Note: Adds a block compression that stores the diff from the previous key only. Good for big keys and small value datasets. Makes writing and scanning slower but because the blocks compressed with this feature stay compressed when in memory up in the block cache, more data is cached. Off by default (DATA_BLOCK_ENCODING=NONE on column descriptor). To enable, set DATA_BLOCK_ENCODING to PREFIX, DIFF or FAST_DIFF on the column descriptor. Set ENCODE_ON_DISK to true on column descriptor to have the encoding in place out in the hfile (on by default). (was: Adds a block compression that stores the diff from the previous key only. Good for big keys and small value datasets. Makes writing and scanning slower but because the blocks compressed with this feature stay compressed when in memory up in the block cache, more data is cached. Off by default. To enable, on the column descriptor set DATA_BLOCK_ENCODING to NONE, PREFIX, DIFF or FAST_DIFF. Set ENCODE_ON_DISK to true on column descriptor to have the encoding in place out in the hfile (on by default).) 
Data Block Encoding of KeyValues (aka delta encoding / prefix compression) --- Key: HBASE-4218 URL: https://issues.apache.org/jira/browse/HBASE-4218 Project: HBase Issue Type: Improvement Components: io Affects Versions: 0.94.0 Reporter: Jacek Migdal Assignee: Mikhail Bautin Labels: compression Fix For: 0.94.0 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 0001-Delta-encoding.patch, 4218-2012-01-14.txt, 4218-v16.txt, 4218.txt, D1659.1.patch, D1659.2.patch, D1659.3.patch, D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, D447.19.patch, D447.2.patch, D447.20.patch, D447.21.patch, D447.22.patch, D447.23.patch, D447.24.patch, D447.25.patch, D447.26.patch, D447.3.patch, D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, Data-block-encoding-2011-12-23.patch, Delta-encoding-2012-01-17_11_09_09.patch, Delta-encoding-2012-01-25_00_45_29.patch, Delta-encoding-2012-01-25_16_32_14.patch, Delta-encoding.patch-2011-12-22_11_52_07.patch, Delta-encoding.patch-2012-01-05_15_16_43.patch, Delta-encoding.patch-2012-01-05_16_31_44.patch, Delta-encoding.patch-2012-01-05_16_31_44_copy.patch, Delta-encoding.patch-2012-01-05_18_50_47.patch, Delta-encoding.patch-2012-01-07_14_12_48.patch, Delta-encoding.patch-2012-01-13_12_20_07.patch, Delta_encoding_with_memstore_TS.patch, open-source.diff A compression for keys. Keys are sorted in HFile and they are usually very similar. Because of that, it is possible to design better compression than general purpose algorithms. It is an additional step designed to be used in memory. It aims to save memory in cache as well as to speed up seeks within HFileBlocks. It should improve performance a lot if key lengths are larger than value lengths. For example, it makes a lot of sense to use it when the value is a counter.
Initial tests on real data (key length = ~90 bytes, value length = 8 bytes) show that I could achieve a decent level of compression:
key compression ratio: 92%
total compression ratio: 85%
LZO on the same data: 85%
LZO after delta encoding: 91%
All this while having much better performance (20-80% faster decompression than LZO). Moreover, it should allow far more efficient seeking, which should improve performance a bit. It seems that simple compression algorithms are good enough. Most of the savings are due to prefix compression, int128 encoding, timestamp diffs and bitfields to avoid duplication. That way, comparisons of compressed data can be much faster than a byte comparator (thanks to prefix compression and bitfields). In order to implement it in HBase, two important changes in design will be needed:
- solidify the interface to HFileBlock / HFileReader Scanner to provide seeking and iterating; access to the uncompressed buffer in HFileBlock will have bad performance
- extend comparators to support comparison assuming that the first N bytes are equal (or some fields are equal)
Link to a discussion about something similar: http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
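The prefix-compression idea in the description can be sketched in a few lines of Java. This is a minimal illustration assuming string keys; it is not the on-disk format of HBase's PREFIX, DIFF or FAST_DIFF encoders, and the `Entry`, `encode`, `decode` and `suffixChars` names are made up for the example. Each key is stored as the number of leading characters it shares with the previous key plus the remaining suffix.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal prefix (delta) encoding of sorted keys; not the HBase encoder format. */
class PrefixEncodingSketch {

    /** One encoded key: shared-prefix length with the previous key, plus the rest. */
    static final class Entry {
        final int shared;
        final String suffix;

        Entry(int shared, String suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    static List<Entry> encode(List<String> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        String prev = "";
        for (String key : sortedKeys) {
            int max = Math.min(prev.length(), key.length());
            int shared = 0;
            while (shared < max && prev.charAt(shared) == key.charAt(shared)) {
                shared++;
            }
            out.add(new Entry(shared, key.substring(shared)));
            prev = key;
        }
        return out;
    }

    static List<String> decode(List<Entry> entries) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (Entry e : entries) {
            String key = prev.substring(0, e.shared) + e.suffix;
            out.add(key);
            prev = key;
        }
        return out;
    }

    /** Number of key characters that still have to be stored after encoding. */
    static int suffixChars(List<Entry> entries) {
        int total = 0;
        for (Entry e : entries) {
            total += e.suffix.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        keys.add("row0001/cf:counter");
        keys.add("row0001/cf:total");
        keys.add("row0002/cf:counter");
        List<Entry> encoded = encode(keys);
        for (Entry e : encoded) {
            System.out.println(e.shared + " + \"" + e.suffix + "\"");
        }
        System.out.println("round trip ok: " + decode(encoded).equals(keys));
    }
}
```

Because sorted keys tend to share long prefixes, most entries shrink to a small integer and a short suffix, which is why the savings grow when keys are long relative to values.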
[jira] [Updated] (HBASE-5469) Add baseline compression efficiency to DataBlockEncodingTool
[ https://issues.apache.org/jira/browse/HBASE-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5469: -- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Add baseline compression efficiency to DataBlockEncodingTool Key: HBASE-5469 URL: https://issues.apache.org/jira/browse/HBASE-5469 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Attachments: D2409.1.patch, D2409.2.patch, jira-HBASE-5469-Add-baseline-compression-efficiency--2012-03-23_15_04_41.patch DataBlockEncodingTool currently does not provide baseline compression efficiency, e.g. Hadoop compression codec applied to unencoded data. E.g. if we are using LZO to compress blocks, we would like to have the following columns in the report (possibly as percentages of raw data size). Baseline K+V in blockcache | Baseline K + V on disk (LZO compressed) | K + V DataBlockEncoded in block cache | K + V DataBlockEncoded + LZOCompressed (on disk) Background: we never store compressed blocks in cache, but we always store encoded data blocks in cache if data block encoding is enabled for the column family.
[jira] [Updated] (HBASE-5469) Add baseline compression efficiency to DataBlockEncodingTool
[ https://issues.apache.org/jira/browse/HBASE-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5469: -- Attachment: jira-HBASE-5469-Add-baseline-compression-efficiency--2012-03-23_15_04_41.patch The exact patch that was committed. Add baseline compression efficiency to DataBlockEncodingTool Key: HBASE-5469 URL: https://issues.apache.org/jira/browse/HBASE-5469 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Attachments: D2409.1.patch, D2409.2.patch, jira-HBASE-5469-Add-baseline-compression-efficiency--2012-03-23_15_04_41.patch DataBlockEncodingTool currently does not provide baseline compression efficiency, e.g. Hadoop compression codec applied to unencoded data. E.g. if we are using LZO to compress blocks, we would like to have the following columns in the report (possibly as percentages of raw data size). Baseline K+V in blockcache | Baseline K + V on disk (LZO compressed) | K + V DataBlockEncoded in block cache | K + V DataBlockEncoded + LZOCompressed (on disk) Background: we never store compressed blocks in cache, but we always store encoded data blocks in cache if data block encoding is enabled for the column family.
[jira] [Updated] (HBASE-4607) Split log worker should terminate properly when waiting for znode
[ https://issues.apache.org/jira/browse/HBASE-4607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4607: -- Resolution: Fixed Status: Resolved (was: Patch Available) The same changes committed in HBASE-5542. Split log worker should terminate properly when waiting for znode - Key: HBASE-4607 URL: https://issues.apache.org/jira/browse/HBASE-4607 Project: HBase Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Fix For: 0.94.0 Attachments: HBASE-4607_SplitLogWorker_should_correct-20111017231456-47a82ef3.patch This is an attempt to fix the fact that SplitLogWorker threads are not being terminated properly in some unit tests. This probably does not happen in production because the master always creates the log-splitting ZK node, but it does happen in 89-fb. Thanks to Prakash Khemani for help on this.
[jira] [Updated] (HBASE-5521) Move compression/decompression to an encoder specific encoding context
[ https://issues.apache.org/jira/browse/HBASE-5521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5521: -- Attachment: HBASE-5521-jira-Move-compression-decompression-to-an-2012-03-19_12_12_32.patch Attaching what has been committed. Move compression/decompression to an encoder specific encoding context -- Key: HBASE-5521 URL: https://issues.apache.org/jira/browse/HBASE-5521 Project: HBase Issue Type: Improvement Reporter: He Yongqiang Assignee: He Yongqiang Fix For: 0.96.0 Attachments: HBASE-5521-jira-Move-compression-decompression-to-an-2012-03-19_12_12_32.patch, HBASE-5521.1.patch, HBASE-5521.D2097.1.patch, HBASE-5521.D2097.10.patch, HBASE-5521.D2097.2.patch, HBASE-5521.D2097.3.patch, HBASE-5521.D2097.4.patch, HBASE-5521.D2097.5.patch, HBASE-5521.D2097.6.patch, HBASE-5521.D2097.7.patch, HBASE-5521.D2097.8.patch, HBASE-5521.D2097.9.patch As part of working on HBASE-5313, we want to add a new columnar encoder/decoder. It makes sense to move compression to be part of encoder/decoder: 1) a scanner for a columnar encoded block can do lazy decompression to a specific part of a key value object 2) avoid an extra bytes copy from encoder to hblock-writer. If there is no encoder specified for a writer, the HBlock.Writer will use a default compression-context to do something very similar to today's code.
[jira] [Updated] (HBASE-5521) Move compression/decompression to an encoder specific encoding context
[ https://issues.apache.org/jira/browse/HBASE-5521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5521: -- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk. Move compression/decompression to an encoder specific encoding context -- Key: HBASE-5521 URL: https://issues.apache.org/jira/browse/HBASE-5521 Project: HBase Issue Type: Improvement Reporter: He Yongqiang Assignee: He Yongqiang Fix For: 0.96.0 Attachments: HBASE-5521-jira-Move-compression-decompression-to-an-2012-03-19_12_12_32.patch, HBASE-5521.1.patch, HBASE-5521.D2097.1.patch, HBASE-5521.D2097.10.patch, HBASE-5521.D2097.2.patch, HBASE-5521.D2097.3.patch, HBASE-5521.D2097.4.patch, HBASE-5521.D2097.5.patch, HBASE-5521.D2097.6.patch, HBASE-5521.D2097.7.patch, HBASE-5521.D2097.8.patch, HBASE-5521.D2097.9.patch As part of working on HBASE-5313, we want to add a new columnar encoder/decoder. It makes sense to move compression to be part of encoder/decoder: 1) a scanner for a columnar encoded block can do lazy decompression to a specific part of a key value object 2) avoid an extra bytes copy from encoder to hblock-writer. If there is no encoder specified for a writer, the HBlock.Writer will use a default compression-context to do something very similar to today's code.
[jira] [Updated] (HBASE-5575) Configure Arcanist lint engine for HBase
[ https://issues.apache.org/jira/browse/HBASE-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5575: -- Attachment: Enabling-lint-2012-03-16_13_40_37.patch Configure Arcanist lint engine for HBase Key: HBASE-5575 URL: https://issues.apache.org/jira/browse/HBASE-5575 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: Enabling-lint-2012-03-16_13_40_37.patch We need to enable Arcanist lint engine in HBase, so that a commit could be checked by running arc lint. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5566) [89-fb] Region server can get stuck getMaster on master failover
[ https://issues.apache.org/jira/browse/HBASE-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5566: -- Reporter: Prakash Khemani (was: Mikhail Bautin) [89-fb] Region server can get stuck getMaster on master failover Key: HBASE-5566 URL: https://issues.apache.org/jira/browse/HBASE-5566 Project: HBase Issue Type: Bug Affects Versions: 0.89-fb Reporter: Prakash Khemani Assignee: Mikhail Bautin Reported by Prakash. We have a retry loop in HRegionServer.getMaster where we do not read the location of the master from ZK, so a region server can get stuck there on master failover. We need to add a unit test to reliably catch this, and fix the bug. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5566) [89-fb] Region server can get stuck getMaster on master failover
[ https://issues.apache.org/jira/browse/HBASE-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5566: -- Description: This is specific to the 89-fb master. We have a retry loop in HRegionServer.getMaster where we do not read the location of the master from ZK, so a region server can get stuck there on master failover. We need to add a unit test to reliably catch this, and fix the bug. was: Reported by Prakash. We have a retry loop in HRegionServer.getMaster where we do not read the location of the master from ZK, so a region server can get stuck there on master failover. We need to add a unit test to reliably catch this, and fix the bug. [89-fb] Region server can get stuck getMaster on master failover Key: HBASE-5566 URL: https://issues.apache.org/jira/browse/HBASE-5566 Project: HBase Issue Type: Bug Affects Versions: 0.89-fb Reporter: Prakash Khemani Assignee: Mikhail Bautin This is specific to the 89-fb master. We have a retry loop in HRegionServer.getMaster where we do not read the location of the master from ZK, so a region server can get stuck there on master failover. We need to add a unit test to reliably catch this, and fix the bug. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5566) [89-fb] Region server can get stuck in getMaster on master failover
[ https://issues.apache.org/jira/browse/HBASE-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5566: -- Summary: [89-fb] Region server can get stuck in getMaster on master failover (was: [89-fb] Region server can get stuck getMaster on master failover) [89-fb] Region server can get stuck in getMaster on master failover --- Key: HBASE-5566 URL: https://issues.apache.org/jira/browse/HBASE-5566 Project: HBase Issue Type: Bug Affects Versions: 0.89-fb Reporter: Prakash Khemani Assignee: Mikhail Bautin This is specific to the 89-fb master. We have a retry loop in HRegionServer.getMaster where we do not read the location of the master from ZK, so a region server can get stuck there on master failover. We need to add a unit test to reliably catch this, and fix the bug. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
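The description of HBASE-5566 above says the retry loop never re-reads the master location from ZK. A minimal sketch of a loop that does re-read it on every attempt could look like the following. The class `MasterLookupSketch`, the method `findMaster`, and the two functional parameters are hypothetical stand-ins, not the actual HRegionServer code or the committed fix.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names come from HBase itself.
class MasterLookupSketch {
    /**
     * Tries up to maxAttempts times to reach a master. The address is
     * re-read from the supplier (standing in for a ZK read) on every
     * iteration, so a failover to a new master is picked up.
     * Returns the address that accepted the connection, or null.
     */
    static String findMaster(Supplier<String> readAddressFromZk,
                             Predicate<String> tryConnect,
                             int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Re-read inside the loop instead of caching the address once.
            String address = readAddressFromZk.get();
            if (address != null && tryConnect.test(address)) {
                return address;
            }
        }
        return null;
    }
}
```

In a unit test, the supplier can return the old master address first and the new one afterwards, which imitates a failover without starting a cluster.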
[jira] [Updated] (HBASE-4542) add filter info to slow query logging
[ https://issues.apache.org/jira/browse/HBASE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4542: -- Resolution: Fixed Status: Resolved (was: Patch Available) add filter info to slow query logging - Key: HBASE-4542 URL: https://issues.apache.org/jira/browse/HBASE-4542 Project: HBase Issue Type: Improvement Affects Versions: 0.89.20100924 Reporter: Kannan Muthukkaruppan Assignee: Madhuwanti Vaidya Attachments: 0001-jira-HBASE-4542-Add-filter-info-to-slow-query-loggin.patch, Add-filter-info-to-slow-query-logging-2012-03-06_14_28_13.patch, D1263.2.patch, D1539.1.patch The slow query log doesn't report filters in effect. For example: {code} (operationTooSlow): \ {processingtimems:3468,client:10.138.43.206:40035,timeRange: [0,9223372036854775807],\ starttimems:1317772005821,responsesize:42411, \ class:HRegionServer,table:myTable,families:{CF1:ALL]},\ row:6c3b8efa132f0219b7621ed1e5c8c70b,queuetimems:0,\ method:get,totalColumns:1,maxVersions:1,storeLimit:-1} {code} The above would suggest that all columns of myTable:CF1 are being requested for the given row, but in reality there could be filters in effect (such as ColumnPrefixFilter, ColumnRangeFilter, TimestampsFilter, etc.). We should enhance the slow query log to capture and report this information.
[jira] [Updated] (HBASE-4542) add filter info to slow query logging
[ https://issues.apache.org/jira/browse/HBASE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4542: -- Fix Version/s: 0.94.0 add filter info to slow query logging - Key: HBASE-4542 URL: https://issues.apache.org/jira/browse/HBASE-4542 Project: HBase Issue Type: Improvement Affects Versions: 0.89.20100924 Reporter: Kannan Muthukkaruppan Assignee: Madhuwanti Vaidya Fix For: 0.94.0 Attachments: 0001-jira-HBASE-4542-Add-filter-info-to-slow-query-loggin.patch, Add-filter-info-to-slow-query-logging-2012-03-06_14_28_13.patch, D1263.2.patch, D1539.1.patch The slow query log doesn't report filters in effect. For example: {code} (operationTooSlow): \ {processingtimems:3468,client:10.138.43.206:40035,timeRange: [0,9223372036854775807],\ starttimems:1317772005821,responsesize:42411, \ class:HRegionServer,table:myTable,families:{CF1:ALL]},\ row:6c3b8efa132f0219b7621ed1e5c8c70b,queuetimems:0,\ method:get,totalColumns:1,maxVersions:1,storeLimit:-1} {code} The above would suggest that all columns of myTable:CF1 are being requested for the given row, but in reality there could be filters in effect (such as ColumnPrefixFilter, ColumnRangeFilter, TimestampsFilter, etc.). We should enhance the slow query log to capture and report this information.
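As an illustration of what HBASE-4542 asks for, the sketch below adds a filter field to the map of values that a slow query log entry is built from. Everything here is an assumption made for illustration: the class `SlowQueryLogSketch`, the method `describe`, the `filter` key, and the `"none"` placeholder are hypothetical and are not taken from the attached patches.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: field names and the "none" placeholder are assumptions.
class SlowQueryLogSketch {
    /**
     * Collects the values to be logged for a slow operation, including a
     * textual description of the filter so that the log no longer implies
     * that every column of the family was requested.
     */
    static Map<String, Object> describe(String table, String row,
                                        String filterDescription) {
        Map<String, Object> entry = new LinkedHashMap<>();
        entry.put("table", table);
        entry.put("row", row);
        // Record the filter explicitly, even when no filter is set.
        entry.put("filter", filterDescription == null ? "none" : filterDescription);
        return entry;
    }
}
```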
[jira] [Updated] (HBASE-5292) getsize per-CF metric incorrectly counts compaction related reads as well
[ https://issues.apache.org/jira/browse/HBASE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5292: -- Attachment: jira-HBASE-5292-Prevent-counting-getSize-on-compacti-2012-03-09_13_26_52.patch Rebased patch for Hadoop QA testing getsize per-CF metric incorrectly counts compaction related reads as well -- Key: HBASE-5292 URL: https://issues.apache.org/jira/browse/HBASE-5292 Project: HBase Issue Type: Bug Affects Versions: 0.89.20100924 Reporter: Kannan Muthukkaruppan Attachments: 0001-jira-HBASE-5292-Prevent-counting-getSize-on-compacti.patch, D1527.1.patch, D1527.2.patch, D1527.3.patch, D1527.4.patch, D1617.1.patch, jira-HBASE-5292-Prevent-counting-getSize-on-compacti-2012-03-09_13_26_52.patch The per-CF getsize metric's intent was to track bytes returned (to HBase clients) per-CF. [Note: We already have metrics to track the number of HFileBlocks read for compaction vs. non-compaction cases -- e.g., compactionblockreadcnt vs. fsblockreadcnt.] Currently, the getsize metric gets updated both for client-initiated Get/Scan operations and for compaction-related reads. The metric is updated in StoreScanner.java:next() when the Scan query matcher returns an INCLUDE* code, via a call to: HRegion.incrNumericMetric(this.metricNameGetsize, copyKv.getLength()); We should not do the above in the case of compactions.
[jira] [Updated] (HBASE-5557) [89-fb] Fix incorrect reader/writer thread interaction in HBaseTest
[ https://issues.apache.org/jira/browse/HBASE-5557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5557: -- Summary: [89-fb] Fix incorrect reader/writer thread interaction in HBaseTest (was: [89-fb] Fix incorrect writer / thread interaction in HBaseTest) [89-fb] Fix incorrect reader/writer thread interaction in HBaseTest --- Key: HBASE-5557 URL: https://issues.apache.org/jira/browse/HBASE-5557 Project: HBase Issue Type: Bug Affects Versions: 0.89-fb Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor In the HBaseTest load test we have a condition where the writer has not written any keys but the reader might attempt to read key 0, resulting in a failure. This bug is specific to 89-fb because it was fixed while open-sourcing HBaseTest as LoadTestTool, and those improvements still have not been back-ported to 89-fb. Doing a temporary fix now and we will get to the back-port later. 12/03/09 14:12:52 INFO utils.MultiThreadedReader: Key = cfcd208495d565ef66e7dff9f98764da:0 12/03/09 14:12:52 ERROR utils.MultiThreadedReader: No data returned, tried to get actions for key = cfcd208495d565ef66e7dff9f98764da:0 12/03/09 14:12:52 INFO utils.MultiThreadedReader: Key = cfcd208495d565ef66e7dff9f98764da:0 12/03/09 14:12:52 INFO utils.MultiThreadedReader: Key = cfcd208495d565ef66e7dff9f98764da:0 12/03/09 14:12:52 ERROR utils.MultiThreadedReader: No data returned, tried to get actions for key = cfcd208495d565ef66e7dff9f98764da:0 12/03/09 14:12:52 ERROR utils.MultiThreadedReader: No data returned, tried to get actions for key = cfcd208495d565ef66e7dff9f98764da:0 12/03/09 14:12:52 INFO utils.MultiThreadedReader: Key = cfcd208495d565ef66e7dff9f98764da:0 12/03/09 14:12:52 ERROR utils.MultiThreadedReader: No data returned, tried to get actions for key = cfcd208495d565ef66e7dff9f98764da:0 12/03/09 14:12:52 ERROR utils.MultiThreadedReader: Aborting run -- found more than three errors
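The failure described in HBASE-5557 comes from a reader choosing key 0 before any key has been written. One way to avoid that, sketched here under assumptions, is to have the writer publish how many keys it has written and have the reader decline to read while that count is zero. The class `KeyRangeSketch` and its methods are hypothetical and are not the HBaseTest or LoadTestTool code.

```java
import java.util.OptionalLong;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of reader/writer coordination in a load test.
class KeyRangeSketch {
    // Number of keys the writer has finished writing: keys 0..count-1 exist.
    private final AtomicLong keysWritten = new AtomicLong(0);

    /** Called by the writer thread after a key has been written. */
    void recordKeyWritten() {
        keysWritten.incrementAndGet();
    }

    /**
     * Called by the reader thread. Returns an empty value while nothing has
     * been written, so the reader never asks for a key that does not exist.
     */
    OptionalLong pickKeyToRead() {
        long written = keysWritten.get();
        if (written == 0) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(ThreadLocalRandom.current().nextLong(written));
    }
}
```

A reader that receives an empty result can sleep briefly and retry instead of counting the attempt as an error.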
[jira] [Updated] (HBASE-5292) getsize per-CF metric incorrectly counts compaction related reads as well
[ https://issues.apache.org/jira/browse/HBASE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5292: -- Resolution: Fixed Fix Version/s: 0.94.0 Status: Resolved (was: Patch Available) getsize per-CF metric incorrectly counts compaction related reads as well -- Key: HBASE-5292 URL: https://issues.apache.org/jira/browse/HBASE-5292 Project: HBase Issue Type: Bug Affects Versions: 0.89.20100924 Reporter: Kannan Muthukkaruppan Fix For: 0.94.0 Attachments: 0001-jira-HBASE-5292-Prevent-counting-getSize-on-compacti.patch, D1527.1.patch, D1527.2.patch, D1527.3.patch, D1527.4.patch, D1617.1.patch, jira-HBASE-5292-Prevent-counting-getSize-on-compacti-2012-03-09_13_26_52.patch The per-CF getsize metric's intent was to track bytes returned (to HBase clients) per-CF. [Note: We already have metrics to track the number of HFileBlocks read for compaction vs. non-compaction cases -- e.g., compactionblockreadcnt vs. fsblockreadcnt.] Currently, the getsize metric gets updated both for client-initiated Get/Scan operations and for compaction-related reads. The metric is updated in StoreScanner.java:next() when the Scan query matcher returns an INCLUDE* code, via a call to: HRegion.incrNumericMetric(this.metricNameGetsize, copyKv.getLength()); We should not do the above in the case of compactions.
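The change requested in HBASE-5292 amounts to guarding the metric update with a check for whether the scan serves a compaction. A minimal sketch of that guard follows. The class `GetSizeMetricSketch`, the `isCompaction` flag, and the counter are hypothetical. The real code calls HRegion.incrNumericMetric from StoreScanner.next(), and how the scanner learns that it is running for a compaction is not shown in the description.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: stands in for the per-CF getsize metric update.
class GetSizeMetricSketch {
    private final AtomicLong getSizeBytes = new AtomicLong();

    /**
     * Called when the query matcher includes a KeyValue in the result.
     * Bytes are counted only for client-initiated reads, not compactions.
     */
    void onKeyValueIncluded(int kvLength, boolean isCompaction) {
        if (!isCompaction) {
            getSizeBytes.addAndGet(kvLength);
        }
    }

    long getSizeBytes() {
        return getSizeBytes.get();
    }
}
```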
[jira] [Updated] (HBASE-5535) Make the functions in task monitor synchronized
[ https://issues.apache.org/jira/browse/HBASE-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5535: -- Attachment: HBASE-5535-Make-the-functions-in-task-monitor-synchr-2012-03-08_16_33_42.patch Liyin's two-line patch from our internal 89-fb repository. Make the functions in task monitor synchronized --- Key: HBASE-5535 URL: https://issues.apache.org/jira/browse/HBASE-5535 Project: HBase Issue Type: Bug Reporter: Liyin Tang Assignee: Liyin Tang Attachments: HBASE-5535-Make-the-functions-in-task-monitor-synchr-2012-03-08_16_33_42.patch There are some potential race conditions in the task monitor, so update the functions in the task monitor to be synchronized. An example of the problem caused by the race condition: ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flush failed for region java.lang.IndexOutOfBoundsException: Index: 1745, Size: 1744 at java.util.ArrayList.add(ArrayList.java:367) at java.util.SubList.add(AbstractList.java:633) at java.util.SubList.add(AbstractList.java:633) at java.util.SubList.add(AbstractList.java:633) at java.util.SubList.add(AbstractList.java:633) at java.util.SubList.add(AbstractList.java:633) at java.util.AbstractList.add(AbstractList.java:91) at org.apache.hadoop.hbase.monitoring.TaskMonitor.createStatus(TaskMonitor.java:74) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1139) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:260) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:234) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:146)
[jira] [Updated] (HBASE-5535) Make the functions in task monitor synchronized
[ https://issues.apache.org/jira/browse/HBASE-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5535: -- Status: Patch Available (was: Open) Make the functions in task monitor synchronized --- Key: HBASE-5535 URL: https://issues.apache.org/jira/browse/HBASE-5535 Project: HBase Issue Type: Bug Reporter: Liyin Tang Assignee: Liyin Tang Attachments: HBASE-5535-Make-the-functions-in-task-monitor-synchr-2012-03-08_16_33_42.patch There are some potential race conditions in the task monitor, so update the functions in the task monitor to be synchronized. An example of the problem caused by the race condition: ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flush failed for region java.lang.IndexOutOfBoundsException: Index: 1745, Size: 1744 at java.util.ArrayList.add(ArrayList.java:367) at java.util.SubList.add(AbstractList.java:633) at java.util.SubList.add(AbstractList.java:633) at java.util.SubList.add(AbstractList.java:633) at java.util.SubList.add(AbstractList.java:633) at java.util.SubList.add(AbstractList.java:633) at java.util.AbstractList.add(AbstractList.java:91) at org.apache.hadoop.hbase.monitoring.TaskMonitor.createStatus(TaskMonitor.java:74) at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1139) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:260) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:234) at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:146)
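The stack trace above shows an ArrayList being modified from TaskMonitor.createStatus while, presumably, another thread was modifying it too. A minimal sketch of the remedy named in the issue title, making the methods that touch the shared list synchronized, is given below. `TaskMonitorSketch`, its fields, and the trimming logic are hypothetical and only imitate the general shape of a bounded task list. This is not the two-line patch attached to the issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a bounded task list guarded by synchronized methods.
class TaskMonitorSketch {
    private final List<String> tasks = new ArrayList<>();
    private final int maxTasks;

    TaskMonitorSketch(int maxTasks) {
        this.maxTasks = maxTasks;
    }

    /** Adds a task and drops the oldest entries once the bound is exceeded. */
    synchronized void createStatus(String description) {
        tasks.add(description);
        if (tasks.size() > maxTasks) {
            // Clearing the sublist view removes the oldest entries in place.
            tasks.subList(0, tasks.size() - maxTasks).clear();
        }
    }

    synchronized int size() {
        return tasks.size();
    }

    /** Starts several writer threads and returns the final list size. */
    static int runConcurrently(int threads, int perThread) {
        TaskMonitorSketch monitor = new TaskMonitorSketch(threads * perThread);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    monitor.createStatus("task " + i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return monitor.size();
    }
}
```

Without the `synchronized` keyword on `createStatus`, concurrent calls can corrupt the list's internal state or lose entries, which is the kind of failure the issue reports.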
[jira] [Updated] (HBASE-4542) add filter info to slow query logging
[ https://issues.apache.org/jira/browse/HBASE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4542: -- Attachment: Add-filter-info-to-slow-query-logging-2012-03-06_14_28_13.patch Rebasing patch on trunk add filter info to slow query logging - Key: HBASE-4542 URL: https://issues.apache.org/jira/browse/HBASE-4542 Project: HBase Issue Type: Improvement Affects Versions: 0.89.20100924 Reporter: Kannan Muthukkaruppan Assignee: Madhuwanti Vaidya Attachments: 0001-jira-HBASE-4542-Add-filter-info-to-slow-query-loggin.patch, Add-filter-info-to-slow-query-logging-2012-03-06_14_28_13.patch, D1263.2.patch, D1539.1.patch The slow query log doesn't report filters in effect. For example: {code} (operationTooSlow): \ {processingtimems:3468,client:10.138.43.206:40035,timeRange: [0,9223372036854775807],\ starttimems:1317772005821,responsesize:42411, \ class:HRegionServer,table:myTable,families:{CF1:ALL]},\ row:6c3b8efa132f0219b7621ed1e5c8c70b,queuetimems:0,\ method:get,totalColumns:1,maxVersions:1,storeLimit:-1} {code} The above would suggest that all columns of myTable:CF1 are being requested for the given row, but in reality there could be filters in effect (such as ColumnPrefixFilter, ColumnRangeFilter, TimestampsFilter, etc.). We should enhance the slow query log to capture and report this information.
[jira] [Updated] (HBASE-5357) Use builder pattern in HColumnDescriptor
[ https://issues.apache.org/jira/browse/HBASE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5357: -- Attachment: Use-builder-pattern-for-HColumnDescriptor-20120223113155-e387d251.patch Use builder pattern in HColumnDescriptor Key: HBASE-5357 URL: https://issues.apache.org/jira/browse/HBASE-5357 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D1851.1.patch, D1851.2.patch, D1851.3.patch, D1851.4.patch, Use-builder-pattern-for-HColumnDescriptor-2012-02-21_19_13_35.patch, Use-builder-pattern-for-HColumnDescriptor-20120223113155-e387d251.patch We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g. {code:java} HFileWriter w = HFile.getWriterBuilder(conf, some common args) .setParameter1(value1) .setParameter2(value2) ... .build(); {code} Each parameter setter being on its own line will make merges/cherry-pick work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. This particular JIRA addresses the HColumnDescriptor refactoring. For StoreFile/HFile refactoring see HBASE-5442. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5357) Use builder pattern in HColumnDescriptor
[ https://issues.apache.org/jira/browse/HBASE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5357: -- Attachment: Use-builder-pattern-for-HColumnDescriptor-2012-02-23_12_42_25.patch Use builder pattern in HColumnDescriptor Key: HBASE-5357 URL: https://issues.apache.org/jira/browse/HBASE-5357 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D1851.1.patch, D1851.2.patch, D1851.3.patch, D1851.4.patch, Use-builder-pattern-for-HColumnDescriptor-2012-02-21_19_13_35.patch, Use-builder-pattern-for-HColumnDescriptor-2012-02-23_12_42_25.patch, Use-builder-pattern-for-HColumnDescriptor-20120223113155-e387d251.patch We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g. {code:java} HFileWriter w = HFile.getWriterBuilder(conf, some common args) .setParameter1(value1) .setParameter2(value2) ... .build(); {code} Each parameter setter being on its own line will make merges/cherry-pick work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. This particular JIRA addresses the HColumnDescriptor refactoring. For StoreFile/HFile refactoring see HBASE-5442. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5357) Use builder pattern in HColumnDescriptor
[ https://issues.apache.org/jira/browse/HBASE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5357: -- Attachment: (was: Use-builder-pattern-for-HColumnDescriptor-2012-02-23_12_42_25.patch) Use builder pattern in HColumnDescriptor Key: HBASE-5357 URL: https://issues.apache.org/jira/browse/HBASE-5357 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D1851.1.patch, D1851.2.patch, D1851.3.patch, D1851.4.patch, Use-builder-pattern-for-HColumnDescriptor-2012-02-21_19_13_35.patch, Use-builder-pattern-for-HColumnDescriptor-2012-02-23_12_42_49.patch, Use-builder-pattern-for-HColumnDescriptor-20120223113155-e387d251.patch We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g. {code:java} HFileWriter w = HFile.getWriterBuilder(conf, some common args) .setParameter1(value1) .setParameter2(value2) ... .build(); {code} Each parameter setter being on its own line will make merges/cherry-pick work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. This particular JIRA addresses the HColumnDescriptor refactoring. For StoreFile/HFile refactoring see HBASE-5442. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5357) Use builder pattern in HColumnDescriptor
[ https://issues.apache.org/jira/browse/HBASE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5357: -- Attachment: Use-builder-pattern-for-HColumnDescriptor-2012-02-23_12_42_49.patch Use builder pattern in HColumnDescriptor Key: HBASE-5357 URL: https://issues.apache.org/jira/browse/HBASE-5357 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D1851.1.patch, D1851.2.patch, D1851.3.patch, D1851.4.patch, Use-builder-pattern-for-HColumnDescriptor-2012-02-21_19_13_35.patch, Use-builder-pattern-for-HColumnDescriptor-2012-02-23_12_42_49.patch, Use-builder-pattern-for-HColumnDescriptor-20120223113155-e387d251.patch We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g. {code:java} HFileWriter w = HFile.getWriterBuilder(conf, some common args) .setParameter1(value1) .setParameter2(value2) ... .build(); {code} Each parameter setter being on its own line will make merges/cherry-pick work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. This particular JIRA addresses the HColumnDescriptor refactoring. For StoreFile/HFile refactoring see HBASE-5442. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
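The builder-style proposal in HBASE-5357 can be illustrated with chained setters that each return the object being configured, so that every parameter sits on its own line and unset parameters keep their defaults. The class `ColumnFamilySketch`, its fields, and its default values are hypothetical placeholders chosen for this sketch. They are not the HColumnDescriptor API or its real defaults.

```java
// Hypothetical sketch of chained setters; not the real HColumnDescriptor.
class ColumnFamilySketch {
    private final String name;
    // Placeholder defaults, assumed for illustration only.
    private int maxVersions = 3;
    private String compression = "NONE";
    private boolean inMemory = false;

    ColumnFamilySketch(String name) {
        this.name = name;
    }

    // Each setter returns this, so calls can be chained one per line.
    ColumnFamilySketch setMaxVersions(int maxVersions) {
        this.maxVersions = maxVersions;
        return this;
    }

    ColumnFamilySketch setCompression(String compression) {
        this.compression = compression;
        return this;
    }

    ColumnFamilySketch setInMemory(boolean inMemory) {
        this.inMemory = inMemory;
        return this;
    }

    @Override
    public String toString() {
        return name + "{maxVersions=" + maxVersions
            + ", compression=" + compression
            + ", inMemory=" + inMemory + "}";
    }
}
```

A caller then writes `new ColumnFamilySketch("cf1").setMaxVersions(1).setCompression("GZ")` and never has to mention `inMemory`, which is the merge-friendliness argument made in the description.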
[jira] [Updated] (HBASE-5442) Use builder pattern in StoreFile and HFile
[ https://issues.apache.org/jira/browse/HBASE-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5442: -- Resolution: Fixed Fix Version/s: 0.94.0 Status: Resolved (was: Patch Available) Committed to trunk. Use builder pattern in StoreFile and HFile -- Key: HBASE-5442 URL: https://issues.apache.org/jira/browse/HBASE-5442 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Fix For: 0.94.0 Attachments: D1893.1.patch, D1893.2.patch, HFile-StoreFile-builder-2012-02-22_22_49_00.patch We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g. {code:java} HFileWriter w = HFile.getWriterBuilder(conf, some common args) .setParameter1(value1) .setParameter2(value2) ... .build(); {code} Each parameter setter being on its own line will make merges/cherry-pick work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. This particular JIRA addresses StoreFile and HFile refactoring. For HColumnDescriptor refactoring see HBASE-5357. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5442) Use builder pattern in StoreFile and HFile
[ https://issues.apache.org/jira/browse/HBASE-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5442: -- Attachment: HFile-StoreFile-builder-2012-02-22_22_49_00.patch Use builder pattern in StoreFile and HFile -- Key: HBASE-5442 URL: https://issues.apache.org/jira/browse/HBASE-5442 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D1893.1.patch, D1893.2.patch, HFile-StoreFile-builder-2012-02-22_22_49_00.patch We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g. {code:java} HFileWriter w = HFile.getWriterBuilder(conf, some common args) .setParameter1(value1) .setParameter2(value2) ... .build(); {code} Each parameter setter being on its own line will make merges/cherry-pick work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. This particular JIRA addresses StoreFile and HFile refactoring. For HColumnDescriptor refactoring see HBASE-5357. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5442) Use builder pattern in StoreFile and HFile
[ https://issues.apache.org/jira/browse/HBASE-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5442: -- Status: Patch Available (was: Open) Use builder pattern in StoreFile and HFile -- Key: HBASE-5442 URL: https://issues.apache.org/jira/browse/HBASE-5442 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D1893.1.patch, D1893.2.patch, HFile-StoreFile-builder-2012-02-22_22_49_00.patch We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g. {code:java} HFileWriter w = HFile.getWriterBuilder(conf, some common args) .setParameter1(value1) .setParameter2(value2) ... .build(); {code} Each parameter setter being on its own line will make merges/cherry-pick work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. This particular JIRA addresses StoreFile and HFile refactoring. For HColumnDescriptor refactoring see HBASE-5357. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5357) Use builder pattern in StoreFile, HFile, and HColumnDescriptor instantiation
[ https://issues.apache.org/jira/browse/HBASE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5357: -- Attachment: Use-builder-pattern-for-HColumnDescriptor-2012-02-21_19_13_35.patch Use builder pattern in StoreFile, HFile, and HColumnDescriptor instantiation Key: HBASE-5357 URL: https://issues.apache.org/jira/browse/HBASE-5357 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D1851.1.patch, Use-builder-pattern-for-HColumnDescriptor-2012-02-21_19_13_35.patch We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g. {code:java} HFileWriter w = HFile.getWriterBuilder(conf, some common args) .setParameter1(value1) .setParameter2(value2) ... .build(); {code} Each parameter setter being on its own line will make merges/cherry-pick work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5357) Use builder pattern in StoreFile, HFile, and HColumnDescriptor instantiation
[ https://issues.apache.org/jira/browse/HBASE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5357: -- Status: Patch Available (was: Open) Use builder pattern in StoreFile, HFile, and HColumnDescriptor instantiation Key: HBASE-5357 URL: https://issues.apache.org/jira/browse/HBASE-5357 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D1851.1.patch, Use-builder-pattern-for-HColumnDescriptor-2012-02-21_19_13_35.patch We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g. {code:java} HFileWriter w = HFile.getWriterBuilder(conf, some common args) .setParameter1(value1) .setParameter2(value2) ... .build(); {code} Each parameter setter being on its own line will make merges/cherry-pick work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5357) Use builder pattern in HColumnDescriptor
[ https://issues.apache.org/jira/browse/HBASE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5357: -- Description: We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g. {code:java} HFileWriter w = HFile.getWriterBuilder(conf, some common args) .setParameter1(value1) .setParameter2(value2) ... .build(); {code} Each parameter setter being on its own line will make merges/cherry-pick work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. This particular JIRA addresses the HColumnDescriptor refactoring. For StoreFile/HFile refactoring see HBASE-5442. was: We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g. {code:java} HFileWriter w = HFile.getWriterBuilder(conf, some common args) .setParameter1(value1) .setParameter2(value2) ... .build(); {code} Each parameter setter being on its own line will make merges/cherry-pick work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. 
Summary: Use builder pattern in HColumnDescriptor (was: Use builder pattern in StoreFile, HFile, and HColumnDescriptor instantiation) Use builder pattern in HColumnDescriptor Key: HBASE-5357 URL: https://issues.apache.org/jira/browse/HBASE-5357 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: D1851.1.patch, Use-builder-pattern-for-HColumnDescriptor-2012-02-21_19_13_35.patch We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g. {code:java} HFileWriter w = HFile.getWriterBuilder(conf, some common args) .setParameter1(value1) .setParameter2(value2) ... .build(); {code} Each parameter setter being on its own line will make merges/cherry-pick work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. This particular JIRA addresses the HColumnDescriptor refactoring. For StoreFile/HFile refactoring see HBASE-5442. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5387) Reuse compression streams in HFileBlock.Writer
[ https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5387: -- Status: Patch Available (was: Open) Reuse compression streams in HFileBlock.Writer -- Key: HBASE-5387 URL: https://issues.apache.org/jira/browse/HBASE-5387 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Critical Fix For: 0.94.0 Attachments: 5387.txt, D1719.1.patch, D1719.2.patch, D1719.3.patch, D1719.4.patch, D1719.5.patch, Fix-deflater-leak-2012-02-10_18_48_45.patch, Fix-deflater-leak-2012-02-11_17_13_10.patch, Fix-deflater-leak-2012-02-12_00_37_27.patch We need to reuse compression streams in HFileBlock.Writer instead of allocating them every time. The motivation is that when using Java's built-in implementation of Gzip, we allocate a new GZIPOutputStream object and an associated native data structure every time we create a compression stream. The native data structure is only deallocated in the finalizer. This is one suspected cause of recent TestHFileBlock failures on Hadoop QA: https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.
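The reuse idea can be illustrated with the JDK alone. The sketch below is an assumption-laden illustration, not the attached patch: it uses a single `java.util.zip.Deflater` (zlib format rather than the gzip container) that is reset between blocks, so the native structure is allocated once and released explicitly with `end()` instead of waiting for a finalizer. The class name `ReusableBlockCompressor` and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical helper: one Deflater shared by all blocks written by one writer. */
final class ReusableBlockCompressor implements AutoCloseable {
    // Allocated once; holds the native zlib state that would otherwise be
    // created per compression stream.
    private final Deflater deflater = new Deflater();

    /** Compresses one block, reusing the shared Deflater. */
    byte[] compressBlock(byte[] block) {
        deflater.reset(); // discard state from the previous block
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            DeflaterOutputStream out = new DeflaterOutputStream(sink, deflater);
            out.write(block);
            out.finish(); // flush compressed data without ending the Deflater
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Releases the native memory deterministically. */
    @Override
    public void close() {
        deflater.end();
    }

    /** Round-trip helper used only to check the sketch. */
    static byte[] decompress(byte[] compressed) {
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The design choice is to pay the native allocation cost once per writer and make its release explicit, which keeps off-heap memory bounded even when many blocks are written between garbage collections.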
[jira] [Updated] (HBASE-5387) Reuse compression streams in HFileBlock.Writer
[ https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5387: -- Status: Open (was: Patch Available) Reuse compression streams in HFileBlock.Writer -- Key: HBASE-5387 URL: https://issues.apache.org/jira/browse/HBASE-5387 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Critical Fix For: 0.94.0 Attachments: 5387.txt, D1719.1.patch, D1719.2.patch, D1719.3.patch, D1719.4.patch, D1719.5.patch, Fix-deflater-leak-2012-02-10_18_48_45.patch, Fix-deflater-leak-2012-02-11_17_13_10.patch, Fix-deflater-leak-2012-02-12_00_37_27.patch We need to reuse compression streams in HFileBlock.Writer instead of allocating them every time. The motivation is that when using Java's built-in implementation of Gzip, we allocate a new GZIPOutputStream object and an associated native data structure every time we create a compression stream. The native data structure is only deallocated in the finalizer. This is one suspected cause of recent TestHFileBlock failures on Hadoop QA: https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.
[jira] [Updated] (HBASE-5387) Reuse compression streams in HFileBlock.Writer
[ https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5387: -- Attachment: Fix-deflater-leak-2012-02-12_00_37_27.patch Reuse compression streams in HFileBlock.Writer -- Key: HBASE-5387 URL: https://issues.apache.org/jira/browse/HBASE-5387 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Critical Fix For: 0.94.0 Attachments: 5387.txt, D1719.1.patch, D1719.2.patch, D1719.3.patch, D1719.4.patch, D1719.5.patch, Fix-deflater-leak-2012-02-10_18_48_45.patch, Fix-deflater-leak-2012-02-11_17_13_10.patch, Fix-deflater-leak-2012-02-12_00_37_27.patch We need to reuse compression streams in HFileBlock.Writer instead of allocating them every time. The motivation is that when using Java's built-in implementation of Gzip, we allocate a new GZIPOutputStream object and an associated native data structure every time we create a compression stream. The native data structure is only deallocated in the finalizer. This is one suspected cause of recent TestHFileBlock failures on Hadoop QA: https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.
[jira] [Updated] (HBASE-5387) Reuse compression streams in HFileBlock.Writer
[ https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5387: -- Attachment: Fix-deflater-leak-2012-02-11_17_13_10.patch Reuse compression streams in HFileBlock.Writer -- Key: HBASE-5387 URL: https://issues.apache.org/jira/browse/HBASE-5387 Project: HBase Issue Type: Bug Affects Versions: 0.94.0 Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Critical Fix For: 0.94.0 Attachments: D1719.1.patch, D1719.2.patch, Fix-deflater-leak-2012-02-10_18_48_45.patch, Fix-deflater-leak-2012-02-11_17_13_10.patch We need to reuse compression streams in HFileBlock.Writer instead of allocating them every time. The motivation is that when using Java's built-in implementation of Gzip, we allocate a new GZIPOutputStream object and an associated native data structure every time we create a compression stream. The native data structure is only deallocated in the finalizer. This is one suspected cause of recent TestHFileBlock failures on Hadoop QA: https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.
[jira] [Updated] (HBASE-5369) Compaction selection based on the hotness of the HFile's block in the block cache
[ https://issues.apache.org/jira/browse/HBASE-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5369: -- Description: HBase reserves a large amount of memory for the block cache, and cached blocks are aged out in LRU fashion. Obviously, we don't want to age out blocks that are still hot. However, when a compaction starts, these hot blocks may be invalidated. Since the block cache already knows which HFiles these hot blocks come from, the compaction selection algorithm could simply skip compacting those HFiles until their cached blocks become cold. was: HBase reserves a large amount of memory for the block cache, and cached blocks are aged out in LRU fashion. Obviously, we don't want to age out blocks that are still hot. However, when a compaction starts, these hot blocks may be invalidated. Since the block cache already knows which HFiles these hot blocks come from, the compaction selection algorithm could simply skip compacting those HFiles until their cached blocks become cold. Furthermore, HBase could compact multiple HFiles into two HFiles, one of which contains only hot blocks that are supposed to be cached directly. Compaction selection based on the hotness of the HFile's block in the block cache - Key: HBASE-5369 URL: https://issues.apache.org/jira/browse/HBASE-5369 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang HBase reserves a large amount of memory for the block cache, and cached blocks are aged out in LRU fashion. Obviously, we don't want to age out blocks that are still hot. However, when a compaction starts, these hot blocks may be invalidated. Since the block cache already knows which HFiles these hot blocks come from, the compaction selection algorithm could simply skip compacting those HFiles until their cached blocks become cold.
[jira] [Updated] (HBASE-5382) Test that we always cache index and bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5382: -- Status: Patch Available (was: Open) Test that we always cache index and bloom blocks Key: HBASE-5382 URL: https://issues.apache.org/jira/browse/HBASE-5382 Project: HBase Issue Type: Test Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: TestForceCacheImportantBlocks-2012-02-10_11_07_15.patch This is a unit test that should have been part of HBASE-4683 but was not committed. The original test was reviewed https://reviews.facebook.net/D807. Submitting unit test as a separate JIRA and patch, and extending the scope of the test to also handle the case when block cache is enabled for the column family. The new review is at https://reviews.facebook.net/D1695. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5382) Test that we always cache index and bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5382: -- Assignee: Mikhail Bautin Test that we always cache index and bloom blocks Key: HBASE-5382 URL: https://issues.apache.org/jira/browse/HBASE-5382 Project: HBase Issue Type: Test Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: TestForceCacheImportantBlocks-2012-02-10_11_07_15.patch This is a unit test that should have been part of HBASE-4683 but was not committed. The original test was reviewed https://reviews.facebook.net/D807. Submitting unit test as a separate JIRA and patch, and extending the scope of the test to also handle the case when block cache is enabled for the column family. The new review is at https://reviews.facebook.net/D1695. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5382) Test that we always cache index and bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5382: -- Attachment: TestForceCacheImportantBlocks-2012-02-10_11_07_15.patch Test that we always cache index and bloom blocks Key: HBASE-5382 URL: https://issues.apache.org/jira/browse/HBASE-5382 Project: HBase Issue Type: Test Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: TestForceCacheImportantBlocks-2012-02-10_11_07_15.patch This is a unit test that should have been part of HBASE-4683 but was not committed. The original test was reviewed https://reviews.facebook.net/D807. Submitting unit test as a separate JIRA and patch, and extending the scope of the test to also handle the case when block cache is enabled for the column family. The new review is at https://reviews.facebook.net/D1695. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5382) Test that we always cache index and bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5382: -- Description: This is a unit test that should have been part of HBASE-4683 but was not committed. The original test was reviewed as part of https://reviews.facebook.net/D807. Submitting unit test as a separate JIRA and patch, and extending the scope of the test to also handle the case when block cache is enabled for the column family. The new review is at https://reviews.facebook.net/D1695. (was: This is a unit test that should have been part of HBASE-4683 but was not committed. The original test was reviewed https://reviews.facebook.net/D807. Submitting unit test as a separate JIRA and patch, and extending the scope of the test to also handle the case when block cache is enabled for the column family. The new review is at https://reviews.facebook.net/D1695.) Test that we always cache index and bloom blocks Key: HBASE-5382 URL: https://issues.apache.org/jira/browse/HBASE-5382 Project: HBase Issue Type: Test Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: TestForceCacheImportantBlocks-2012-02-10_11_07_15.patch This is a unit test that should have been part of HBASE-4683 but was not committed. The original test was reviewed as part of https://reviews.facebook.net/D807. Submitting unit test as a separate JIRA and patch, and extending the scope of the test to also handle the case when block cache is enabled for the column family. The new review is at https://reviews.facebook.net/D1695. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5387) Reuse compression streams in HFileBlock.Writer
[ https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5387: -- Attachment: Fix-deflater-leak-2012-02-10_18_48_45.patch Reuse compression streams in HFileBlock.Writer -- Key: HBASE-5387 URL: https://issues.apache.org/jira/browse/HBASE-5387 Project: HBase Issue Type: Bug Reporter: Mikhail Bautin Attachments: Fix-deflater-leak-2012-02-10_18_48_45.patch We need to reuse compression streams in HFileBlock.Writer instead of allocating them every time. The motivation is that when using Java's built-in implementation of Gzip, we allocate a new GZIPOutputStream object and an associated native data structure every time. This is one suspected cause of recent TestHFileBlock failures on Hadoop QA: https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.
[jira] [Updated] (HBASE-5230) Ensure compactions do not cache-on-write data blocks
[ https://issues.apache.org/jira/browse/HBASE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5230: -- Resolution: Fixed Release Note: Committed into both trunk and 89-fb Status: Resolved (was: Patch Available) Ensure compactions do not cache-on-write data blocks Key: HBASE-5230 URL: https://issues.apache.org/jira/browse/HBASE-5230 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Attachments: D1353.1.patch, D1353.2.patch, D1353.3.patch, D1353.4.patch, Don-t-cache-data-blocks-on-compaction-2012-01-21_00_53_54.patch, Don-t-cache-data-blocks-on-compaction-2012-01-23_10_23_45.patch, Don-t-cache-data-blocks-on-compaction-2012-01-23_15_27_23.patch Create a unit test for HBASE-3976 (making sure we don't cache data blocks on write during compactions even if cache-on-write is generally enabled). This is because we have very different implementations of HBASE-3976 without HBASE-4422 CacheConfig (on top of 89-fb, created by Liyin) and with CacheConfig (presumably it's there but not sure if it even works, since the patch in HBASE-3976 may not have been committed). We need to create a unit test to verify that we don't cache data blocks on write during compactions, and resolve HBASE-3976 so that this new unit test does not fail.
[jira] [Updated] (HBASE-5230) Ensure compactions do not cache-on-write data blocks
[ https://issues.apache.org/jira/browse/HBASE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5230: -- Release Note: (was: Committed into both trunk and 89-fb) Ensure compactions do not cache-on-write data blocks Key: HBASE-5230 URL: https://issues.apache.org/jira/browse/HBASE-5230 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Attachments: D1353.1.patch, D1353.2.patch, D1353.3.patch, D1353.4.patch, Don-t-cache-data-blocks-on-compaction-2012-01-21_00_53_54.patch, Don-t-cache-data-blocks-on-compaction-2012-01-23_10_23_45.patch, Don-t-cache-data-blocks-on-compaction-2012-01-23_15_27_23.patch Create a unit test for HBASE-3976 (making sure we don't cache data blocks on write during compactions even if cache-on-write is generally enabled). This is because we have very different implementations of HBASE-3976 without HBASE-4422 CacheConfig (on top of 89-fb, created by Liyin) and with CacheConfig (presumably it's there but not sure if it even works, since the patch in HBASE-3976 may not have been committed). We need to create a unit test to verify that we don't cache data blocks on write during compactions, and resolve HBASE-3976 so that this new unit test does not fail.
[jira] [Updated] (HBASE-5357) Use builder pattern in StoreFile, HFile, and HColumnDescriptor instantiation
[ https://issues.apache.org/jira/browse/HBASE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5357: -- Description: We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g. {code:java} HFileWriter w = HFile.getWriterBuilder(conf, some common args) .setParameter1(value1) .setParameter2(value2) ... .build(); {code} Each parameter setter being on its own line will make merges/cherry-pick work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. was: We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g. {code:java} HFileWriter w = HFile.getWriterBuilder(conf, some common args) .setParameter1(value1) .setParameter2(value2) ... .instantiate(); {code} Each parameter setter being on the same line will make merges/cherry-pick work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. Use builder pattern in StoreFile, HFile, and HColumnDescriptor instantiation Key: HBASE-5357 URL: https://issues.apache.org/jira/browse/HBASE-5357 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin We have five ways to create an HFile writer, two ways to create a StoreFile writer, and the sets of parameters keep changing, creating a lot of confusion, especially when porting patches across branches. 
The same thing is happening to HColumnDescriptor. I think we should move to a builder pattern solution, e.g. {code:java} HFileWriter w = HFile.getWriterBuilder(conf, some common args) .setParameter1(value1) .setParameter2(value2) ... .build(); {code} Each parameter setter being on its own line will make merges/cherry-pick work properly, we will not have to even mention default parameters again, and we can eliminate a dozen impossible-to-remember constructors. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5010) Filter HFiles based on TTL
[ https://issues.apache.org/jira/browse/HBASE-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5010: -- Resolution: Fixed Fix Version/s: 0.94.0 Assignee: Mikhail Bautin (was: Zhihong Yu) Status: Resolved (was: Patch Available) A follow-up fix was submitted as part of HBASE-5274 to bring the trunk fix for this issue to parity with the 89-fb fix. Resolving. Filter HFiles based on TTL -- Key: HBASE-5010 URL: https://issues.apache.org/jira/browse/HBASE-5010 Project: HBase Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Fix For: 0.94.0 Attachments: 5010.patch, D1017.1.patch, D1017.2.patch, D909.1.patch, D909.2.patch, D909.3.patch, D909.4.patch, D909.5.patch, D909.6.patch In ScanWildcardColumnTracker we have {code:java} this.oldestStamp = EnvironmentEdgeManager.currentTimeMillis() - ttl; ... private boolean isExpired(long timestamp) { return timestamp < oldestStamp; } {code} but this time range filtering does not participate in HFile selection. In one real case this caused next() calls to time out because all KVs in a table had expired, but next() had to iterate over the whole table to find that out. We should be able to filter out those HFiles right away. I think a reasonable approach is to add a default timerange filter to every scan for a CF with a finite TTL and utilize existing filtering in StoreFile.Reader.passesTimerangeFilter.
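The file-level filtering proposed here can be sketched as follows. This is a simplified illustration under stated assumptions, not the committed change: `TtlFileFilterSketch`, `FileInfo` and `selectFiles` are hypothetical names, and each file is assumed to record the maximum timestamp of any KeyValue it contains. A file whose newest cell is already older than `now - ttl` cannot contribute any live data, so the scan can skip it without opening it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of TTL-based file selection; not HBase code. */
final class TtlFileFilterSketch {

    /** Stand-in for per-file metadata; maxTimestamp is the newest cell in the file. */
    static final class FileInfo {
        final String name;
        final long maxTimestamp;

        FileInfo(String name, long maxTimestamp) {
            this.name = name;
            this.maxTimestamp = maxTimestamp;
        }
    }

    /** Same comparison as the per-cell check: strictly older than the cutoff. */
    static boolean isExpired(long timestamp, long oldestStamp) {
        return timestamp < oldestStamp;
    }

    /** Keeps only files that may still contain unexpired cells. */
    static List<FileInfo> selectFiles(List<FileInfo> files, long now, long ttl) {
        long oldestStamp = now - ttl;
        List<FileInfo> kept = new ArrayList<>();
        for (FileInfo f : files) {
            // If even the newest cell has expired, every cell in the file has.
            if (!isExpired(f.maxTimestamp, oldestStamp)) {
                kept.add(f);
            }
        }
        return kept;
    }
}
```

With `now = 1000` and `ttl = 500`, a file whose newest cell has timestamp 100 is dropped and one with timestamp 900 is kept, so a table whose data has entirely expired yields an empty file list instead of a full iteration.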
[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4218: -- Attachment: Delta-encoding-2012-01-25_00_45_29.patch Submitting for Jenkins testing. This corresponds to the latest patch on Phabricator: https://reviews.facebook.net/D447?vs=id=4407whitespace=ignore-all Data Block Encoding of KeyValues (aka delta encoding / prefix compression) --- Key: HBASE-4218 URL: https://issues.apache.org/jira/browse/HBASE-4218 Project: HBase Issue Type: Improvement Components: io Affects Versions: 0.94.0 Reporter: Jacek Migdal Assignee: Mikhail Bautin Labels: compression Fix For: 0.94.0 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 0001-Delta-encoding.patch, 4218-2012-01-14.txt, 4218-v16.txt, 4218.txt, D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, D447.19.patch, D447.2.patch, D447.20.patch, D447.21.patch, D447.22.patch, D447.23.patch, D447.24.patch, D447.25.patch, D447.3.patch, D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, Data-block-encoding-2011-12-23.patch, Delta-encoding-2012-01-17_11_09_09.patch, Delta-encoding-2012-01-25_00_45_29.patch, Delta-encoding.patch-2011-12-22_11_52_07.patch, Delta-encoding.patch-2012-01-05_15_16_43.patch, Delta-encoding.patch-2012-01-05_16_31_44.patch, Delta-encoding.patch-2012-01-05_16_31_44_copy.patch, Delta-encoding.patch-2012-01-05_18_50_47.patch, Delta-encoding.patch-2012-01-07_14_12_48.patch, Delta-encoding.patch-2012-01-13_12_20_07.patch, Delta_encoding_with_memstore_TS.patch, open-source.diff A compression for keys. Keys are sorted in HFile and they are usually very similar. Because of that, it is possible to design better compression than general purpose algorithms, It is an additional step designed to be used in memory. 
It aims to save memory in the cache as well as to speed up seeks within HFileBlocks. It should improve performance a lot if key lengths are larger than value lengths. For example, it makes a lot of sense to use it when the value is a counter. Initial tests on real data (key length = ~90 bytes, value length = 8 bytes) show that I could achieve a decent level of compression: key compression ratio: 92%; total compression ratio: 85%; LZO on the same data: 85%; LZO after delta encoding: 91%. This comes with much better performance (20-80% faster decompression than LZO). Moreover, it should allow far more efficient seeking, which should improve performance a bit. It seems that simple compression algorithms are good enough. Most of the savings are due to prefix compression, int128 encoding, timestamp diffs and bitfields to avoid duplication. That way, comparisons of compressed data can be much faster than a byte comparator (thanks to prefix compression and bitfields). In order to implement it in HBase, two important changes in design will be needed: - solidify the interface to HFileBlock / HFileReader Scanner to provide seeking and iterating; access to the uncompressed buffer in HFileBlock will have bad performance - extend comparators to support comparison assuming that the first N bytes are equal (or some fields are equal) Link to a discussion about something similar: http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
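To make the prefix-compression part of this description concrete, here is a minimal sketch. It is not the encoder from the attached patches: it works on Java strings rather than serialized KeyValue bytes, and the names `PrefixEncodingSketch`, `Entry`, `encode` and `decode` are hypothetical. Each key is stored as the number of leading characters it shares with the previous key plus the remaining suffix, which is where sorted, similar keys save most of their space.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical prefix-encoding sketch over sorted string keys; not HBase code. */
final class PrefixEncodingSketch {

    /** One encoded key: shared-prefix length with the previous key, plus the rest. */
    static final class Entry {
        final int shared;
        final String suffix;

        Entry(int shared, String suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    static List<Entry> encode(List<String> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        String prev = "";
        for (String key : sortedKeys) {
            int limit = Math.min(prev.length(), key.length());
            int shared = 0;
            while (shared < limit && prev.charAt(shared) == key.charAt(shared)) {
                shared++;
            }
            out.add(new Entry(shared, key.substring(shared)));
            prev = key;
        }
        return out;
    }

    static List<String> decode(List<Entry> entries) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (Entry e : entries) {
            // Rebuild the key from the previous key's prefix and the stored suffix.
            String key = prev.substring(0, e.shared) + e.suffix;
            out.add(key);
            prev = key;
        }
        return out;
    }
}
```

Because the shared-prefix length is stored explicitly, a comparator that already knows two keys agree on their first N characters can start comparing at position N, which is the kind of comparator extension the description asks for.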
[jira] [Updated] (HBASE-5230) Ensure compactions do not cache-on-write data blocks
[ https://issues.apache.org/jira/browse/HBASE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5230: -- Issue Type: Improvement (was: Test) Summary: Ensure compactions do not cache-on-write data blocks (was: Unit test to ensure compactions don't cache data on write) Ensure compactions do not cache-on-write data blocks Key: HBASE-5230 URL: https://issues.apache.org/jira/browse/HBASE-5230 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Attachments: D1353.1.patch, D1353.2.patch, D1353.3.patch, D1353.4.patch, Don-t-cache-data-blocks-on-compaction-2012-01-21_00_53_54.patch, Don-t-cache-data-blocks-on-compaction-2012-01-23_10_23_45.patch, Don-t-cache-data-blocks-on-compaction-2012-01-23_15_27_23.patch Create a unit test for HBASE-3976 (making sure we don't cache data blocks on write during compactions even if cache-on-write is generally enabled). This is because we have very different implementations of HBASE-3976 without HBASE-4422 CacheConfig (on top of 89-fb, created by Liyin) and with CacheConfig (presumably it's there but not sure if it even works, since the patch in HBASE-3976 may not have been committed). We need to create a unit test to verify that we don't cache data blocks on write during compactions, and resolve HBASE-3976 so that this new unit test does not fail.
[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4218: -- Attachment: Delta-encoding-2012-01-25_16_32_14.patch Attaching a patch rebased on HBASE-5230 and addressing Jerry's new comment. Data Block Encoding of KeyValues (aka delta encoding / prefix compression) --- Key: HBASE-4218 URL: https://issues.apache.org/jira/browse/HBASE-4218 Project: HBase Issue Type: Improvement Components: io Affects Versions: 0.94.0 Reporter: Jacek Migdal Assignee: Mikhail Bautin Labels: compression Fix For: 0.94.0 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 0001-Delta-encoding.patch, 4218-2012-01-14.txt, 4218-v16.txt, 4218.txt, D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, D447.19.patch, D447.2.patch, D447.20.patch, D447.21.patch, D447.22.patch, D447.23.patch, D447.24.patch, D447.25.patch, D447.26.patch, D447.3.patch, D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, Data-block-encoding-2011-12-23.patch, Delta-encoding-2012-01-17_11_09_09.patch, Delta-encoding-2012-01-25_00_45_29.patch, Delta-encoding-2012-01-25_16_32_14.patch, Delta-encoding.patch-2011-12-22_11_52_07.patch, Delta-encoding.patch-2012-01-05_15_16_43.patch, Delta-encoding.patch-2012-01-05_16_31_44.patch, Delta-encoding.patch-2012-01-05_16_31_44_copy.patch, Delta-encoding.patch-2012-01-05_18_50_47.patch, Delta-encoding.patch-2012-01-07_14_12_48.patch, Delta-encoding.patch-2012-01-13_12_20_07.patch, Delta_encoding_with_memstore_TS.patch, open-source.diff A compression for keys. Keys are sorted in HFile and they are usually very similar. Because of that, it is possible to design better compression than general-purpose algorithms. It is an additional step designed to be used in memory. It aims to save memory in cache as well as to speed up seeks within HFileBlocks.
It should improve performance a lot if key lengths are larger than value lengths. For example, it makes a lot of sense to use it when the value is a counter. Initial tests on real data (key length = ~90 bytes, value length = 8 bytes) show that I could achieve a decent level of compression: key compression ratio: 92%; total compression ratio: 85%; LZO on the same data: 85%; LZO after delta encoding: 91%. This is achieved while having much better performance (20-80% faster decompression than LZO). Moreover, it should allow far more efficient seeking, which should improve performance a bit. It seems that simple compression algorithms are good enough. Most of the savings are due to prefix compression, int128 encoding, timestamp diffs and bitfields to avoid duplication. That way, comparisons of compressed data can be much faster than a byte comparator (thanks to prefix compression and bitfields). In order to implement it in HBase, two important changes in design will be needed: - solidify the interface to HFileBlock / HFileReader Scanner to provide seeking and iterating; access to the uncompressed buffer in HFileBlock will have bad performance - extend comparators to support comparison assuming that the first N bytes are equal (or some fields are equal) Link to a discussion about something similar: http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
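The second design change in the description asks for comparators that can assume the first N bytes of two keys are already known to be equal. A minimal sketch of what such a comparison could look like follows. The class `KnownPrefixComparatorSketch` and the method `compareSkippingPrefix` are hypothetical names, not HBase APIs, and the code assumes plain unsigned lexicographic ordering of raw byte arrays.

```java
/** Hypothetical illustration only; these names are not part of HBase. */
public class KnownPrefixComparatorSketch {

    /**
     * Unsigned lexicographic comparison of two keys whose first
     * {@code knownEqualBytes} bytes are already known to match,
     * so the scan starts after that shared prefix.
     */
    static int compareSkippingPrefix(byte[] left, byte[] right, int knownEqualBytes) {
        int end = Math.min(left.length, right.length);
        for (int i = knownEqualBytes; i < end; i++) {
            int a = left[i] & 0xff; // treat bytes as unsigned
            int b = right[i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        // All compared bytes match: the shorter key sorts first.
        return left.length - right.length;
    }

    public static void main(String[] args) {
        byte[] k1 = {'r', 'o', 'w', '1', 'a'};
        byte[] k2 = {'r', 'o', 'w', '1', 'b'};
        // The caller knows from the prefix-compressed form that 4 bytes are shared.
        System.out.println(Integer.signum(compareSkippingPrefix(k1, k2, 4)));
    }
}
```

With prefix-compressed keys the shared length is stored alongside each key, so the caller could pass it directly instead of rescanning bytes that are guaranteed to be equal.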
[jira] [Updated] (HBASE-3796) Per-Store Entries in Compaction Queue
[ https://issues.apache.org/jira/browse/HBASE-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-3796: -- Release Note: (was: Sorry, it seems like I re-opened the wrong patch instead of HBASE-3976. Restoring the Fixed status.) Per-Store Entries in Compaction Queue - Key: HBASE-3796 URL: https://issues.apache.org/jira/browse/HBASE-3796 Project: HBase Issue Type: Bug Reporter: Nicolas Spiegelberg Assignee: Nicolas Spiegelberg Priority: Minor Fix For: 0.92.1 Attachments: HBASE-3796-fixed.patch, HBASE-3796.patch Although compaction is decided on a per-store basis, right now the CompactSplitThread only deals at the Region level for queueing. Store-level compaction queue entries will give us more visibility into compaction workload + allow us to stop summarizing priorities.
[jira] [Updated] (HBASE-5230) Unit test to ensure compactions don't cache data on write
[ https://issues.apache.org/jira/browse/HBASE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5230: -- Attachment: Don-t-cache-data-blocks-on-compaction-2012-01-23_10_23_45.patch Attaching the most recent patch (rebased on trunk changes -- maybe even identical). Unit test to ensure compactions don't cache data on write - Key: HBASE-5230 URL: https://issues.apache.org/jira/browse/HBASE-5230 Project: HBase Issue Type: Test Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Attachments: D1353.1.patch, D1353.2.patch, D1353.3.patch, Don-t-cache-data-blocks-on-compaction-2012-01-21_00_53_54.patch, Don-t-cache-data-blocks-on-compaction-2012-01-23_10_23_45.patch Create a unit test for HBASE-3976 (making sure we don't cache data blocks on write during compactions even if cache-on-write is generally enabled). This is because we have very different implementations of HBASE-3976 without HBASE-4422 CacheConfig (on top of 89-fb, created by Liyin) and with CacheConfig (presumably it's there but not sure if it even works, since the patch in HBASE-3976 may not have been committed). We need to create a unit test to verify that we don't cache data blocks on write during compactions, and resolve HBASE-3976 so that this new unit test does not fail.
[jira] [Updated] (HBASE-5130) A map-reduce wrapper for HBase test suite (mr-test-runner)
[ https://issues.apache.org/jira/browse/HBASE-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5130: -- Description: We have a tool we call mrunit (but will call mr-test-runner in the open-source version) that runs HBase unit tests on a map-reduce cluster. We need to modify it to use distributed cache to deploy the code on the cluster instead of our internal deployment tool, and open-source it. (was: We have a tool we call mrunit that runs HBase unit tests on a map-reduce cluster. We need modify it to use distributed cache to deploy the code on the cluster instead of our internal deployment tool, and open-source it.) Summary: A map-reduce wrapper for HBase test suite (mr-test-runner) (was: A map-reduce wrapper for HBase test suite (mrunit)) A map-reduce wrapper for HBase test suite (mr-test-runner) Key: HBASE-5130 URL: https://issues.apache.org/jira/browse/HBASE-5130 Project: HBase Issue Type: Test Reporter: Mikhail Bautin Assignee: Mikhail Bautin We have a tool we call mrunit (but will call mr-test-runner in the open-source version) that runs HBase unit tests on a map-reduce cluster. We need to modify it to use distributed cache to deploy the code on the cluster instead of our internal deployment tool, and open-source it.
[jira] [Updated] (HBASE-5230) Unit test to ensure compactions don't cache data on write
[ https://issues.apache.org/jira/browse/HBASE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5230: -- Attachment: Don-t-cache-data-blocks-on-compaction-2012-01-23_15_27_23.patch A new patch addressing Nicolas's comments. Unit test to ensure compactions don't cache data on write - Key: HBASE-5230 URL: https://issues.apache.org/jira/browse/HBASE-5230 Project: HBase Issue Type: Test Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Attachments: D1353.1.patch, D1353.2.patch, D1353.3.patch, D1353.4.patch, Don-t-cache-data-blocks-on-compaction-2012-01-21_00_53_54.patch, Don-t-cache-data-blocks-on-compaction-2012-01-23_10_23_45.patch, Don-t-cache-data-blocks-on-compaction-2012-01-23_15_27_23.patch Create a unit test for HBASE-3976 (making sure we don't cache data blocks on write during compactions even if cache-on-write is generally enabled). This is because we have very different implementations of HBASE-3976 without HBASE-4422 CacheConfig (on top of 89-fb, created by Liyin) and with CacheConfig (presumably it's there but not sure if it even works, since the patch in HBASE-3976 may not have been committed). We need to create a unit test to verify that we don't cache data blocks on write during compactions, and resolve HBASE-3976 so that this new unit test does not fail.
[jira] [Updated] (HBASE-5230) Unit test to ensure compactions don't cache data on write
[ https://issues.apache.org/jira/browse/HBASE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5230: -- Attachment: Don-t-cache-data-blocks-on-compaction-2012-01-21_00_53_54.patch Attaching patch for Jenkins testing. Unit test to ensure compactions don't cache data on write - Key: HBASE-5230 URL: https://issues.apache.org/jira/browse/HBASE-5230 Project: HBase Issue Type: Test Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Attachments: D1353.1.patch, D1353.2.patch, Don-t-cache-data-blocks-on-compaction-2012-01-21_00_53_54.patch Create a unit test for HBASE-3976 (making sure we don't cache data blocks on write during compactions even if cache-on-write is generally enabled). This is because we have very different implementations of HBASE-3976 without HBASE-4422 CacheConfig (on top of 89-fb, created by Liyin) and with CacheConfig (presumably it's there but not sure if it even works, since the patch in HBASE-3976 may not have been committed). We need to create a unit test to verify that we don't cache data blocks on write during compactions, and resolve HBASE-3976 so that this new unit test does not fail.
[jira] [Updated] (HBASE-5230) Unit test to ensure compactions don't cache data on write
[ https://issues.apache.org/jira/browse/HBASE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5230: -- Status: Patch Available (was: Open) Unit test to ensure compactions don't cache data on write - Key: HBASE-5230 URL: https://issues.apache.org/jira/browse/HBASE-5230 Project: HBase Issue Type: Test Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Attachments: D1353.1.patch, D1353.2.patch, Don-t-cache-data-blocks-on-compaction-2012-01-21_00_53_54.patch Create a unit test for HBASE-3976 (making sure we don't cache data blocks on write during compactions even if cache-on-write is generally enabled). This is because we have very different implementations of HBASE-3976 without HBASE-4422 CacheConfig (on top of 89-fb, created by Liyin) and with CacheConfig (presumably it's there but not sure if it even works, since the patch in HBASE-3976 may not have been committed). We need to create a unit test to verify that we don't cache data blocks on write during compactions, and resolve HBASE-3976 so that this new unit test does not fail.
[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4218: -- Attachment: Delta-encoding-2012-01-17_11_09_09.patch Appending a patch that can be applied by Hadoop QA. Data Block Encoding of KeyValues (aka delta encoding / prefix compression) --- Key: HBASE-4218 URL: https://issues.apache.org/jira/browse/HBASE-4218 Project: HBase Issue Type: Improvement Components: io Affects Versions: 0.94.0 Reporter: Jacek Migdal Assignee: Mikhail Bautin Labels: compression Fix For: 0.94.0 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 0001-Delta-encoding.patch, 4218-2012-01-14.txt, 4218-v16.txt, 4218.txt, D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, D447.19.patch, D447.2.patch, D447.20.patch, D447.21.patch, D447.22.patch, D447.23.patch, D447.24.patch, D447.3.patch, D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, Data-block-encoding-2011-12-23.patch, Delta-encoding-2012-01-17_11_09_09.patch, Delta-encoding.patch-2011-12-22_11_52_07.patch, Delta-encoding.patch-2012-01-05_15_16_43.patch, Delta-encoding.patch-2012-01-05_16_31_44.patch, Delta-encoding.patch-2012-01-05_16_31_44_copy.patch, Delta-encoding.patch-2012-01-05_18_50_47.patch, Delta-encoding.patch-2012-01-07_14_12_48.patch, Delta-encoding.patch-2012-01-13_12_20_07.patch, Delta_encoding_with_memstore_TS.patch, open-source.diff A compression for keys. Keys are sorted in HFile and they are usually very similar. Because of that, it is possible to design better compression than general-purpose algorithms. It is an additional step designed to be used in memory. It aims to save memory in cache as well as to speed up seeks within HFileBlocks. It should improve performance a lot if key lengths are larger than value lengths.
For example, it makes a lot of sense to use it when the value is a counter. Initial tests on real data (key length = ~90 bytes, value length = 8 bytes) show that I could achieve a decent level of compression: key compression ratio: 92%; total compression ratio: 85%; LZO on the same data: 85%; LZO after delta encoding: 91%. This is achieved while having much better performance (20-80% faster decompression than LZO). Moreover, it should allow far more efficient seeking, which should improve performance a bit. It seems that simple compression algorithms are good enough. Most of the savings are due to prefix compression, int128 encoding, timestamp diffs and bitfields to avoid duplication. That way, comparisons of compressed data can be much faster than a byte comparator (thanks to prefix compression and bitfields). In order to implement it in HBase, two important changes in design will be needed: - solidify the interface to HFileBlock / HFileReader Scanner to provide seeking and iterating; access to the uncompressed buffer in HFileBlock will have bad performance - extend comparators to support comparison assuming that the first N bytes are equal (or some fields are equal) Link to a discussion about something similar: http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4218: -- Attachment: Delta-encoding.patch-2012-01-13_12_20_07.patch Attaching a patch generated using git format-patch --no-prefix HEAD^..HEAD that can be applied by the normal patch command. Data Block Encoding of KeyValues (aka delta encoding / prefix compression) --- Key: HBASE-4218 URL: https://issues.apache.org/jira/browse/HBASE-4218 Project: HBase Issue Type: Improvement Components: io Affects Versions: 0.94.0 Reporter: Jacek Migdal Assignee: Mikhail Bautin Labels: compression Fix For: 0.94.0 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, D447.19.patch, D447.2.patch, D447.20.patch, D447.21.patch, D447.22.patch, D447.3.patch, D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, Data-block-encoding-2011-12-23.patch, Delta-encoding.patch-2011-12-22_11_52_07.patch, Delta-encoding.patch-2012-01-05_15_16_43.patch, Delta-encoding.patch-2012-01-05_16_31_44.patch, Delta-encoding.patch-2012-01-05_16_31_44_copy.patch, Delta-encoding.patch-2012-01-05_18_50_47.patch, Delta-encoding.patch-2012-01-07_14_12_48.patch, Delta-encoding.patch-2012-01-13_12_20_07.patch, Delta_encoding_with_memstore_TS.patch, open-source.diff A compression for keys. Keys are sorted in HFile and they are usually very similar. Because of that, it is possible to design better compression than general-purpose algorithms. It is an additional step designed to be used in memory. It aims to save memory in cache as well as to speed up seeks within HFileBlocks. It should improve performance a lot if key lengths are larger than value lengths. For example, it makes a lot of sense to use it when the value is a counter.
Initial tests on real data (key length = ~90 bytes, value length = 8 bytes) show that I could achieve a decent level of compression: key compression ratio: 92%; total compression ratio: 85%; LZO on the same data: 85%; LZO after delta encoding: 91%. This is achieved while having much better performance (20-80% faster decompression than LZO). Moreover, it should allow far more efficient seeking, which should improve performance a bit. It seems that simple compression algorithms are good enough. Most of the savings are due to prefix compression, int128 encoding, timestamp diffs and bitfields to avoid duplication. That way, comparisons of compressed data can be much faster than a byte comparator (thanks to prefix compression and bitfields). In order to implement it in HBase, two important changes in design will be needed: - solidify the interface to HFileBlock / HFileReader Scanner to provide seeking and iterating; access to the uncompressed buffer in HFileBlock will have bad performance - extend comparators to support comparison assuming that the first N bytes are equal (or some fields are equal) Link to a discussion about something similar: http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4218: -- Attachment: Delta-encoding.patch-2012-01-07_14_12_48.patch Attaching a patch rebased on trunk changes. Data Block Encoding of KeyValues (aka delta encoding / prefix compression) --- Key: HBASE-4218 URL: https://issues.apache.org/jira/browse/HBASE-4218 Project: HBase Issue Type: Improvement Components: io Affects Versions: 0.94.0 Reporter: Jacek Migdal Assignee: Mikhail Bautin Labels: compression Fix For: 0.94.0 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, D447.19.patch, D447.2.patch, D447.20.patch, D447.3.patch, D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, Data-block-encoding-2011-12-23.patch, Delta-encoding.patch-2011-12-22_11_52_07.patch, Delta-encoding.patch-2012-01-05_15_16_43.patch, Delta-encoding.patch-2012-01-05_16_31_44.patch, Delta-encoding.patch-2012-01-05_16_31_44_copy.patch, Delta-encoding.patch-2012-01-05_18_50_47.patch, Delta-encoding.patch-2012-01-07_14_12_48.patch, Delta_encoding_with_memstore_TS.patch, open-source.diff A compression for keys. Keys are sorted in HFile and they are usually very similar. Because of that, it is possible to design better compression than general-purpose algorithms. It is an additional step designed to be used in memory. It aims to save memory in cache as well as to speed up seeks within HFileBlocks. It should improve performance a lot if key lengths are larger than value lengths. For example, it makes a lot of sense to use it when the value is a counter.
Initial tests on real data (key length = ~90 bytes, value length = 8 bytes) show that I could achieve a decent level of compression: key compression ratio: 92%; total compression ratio: 85%; LZO on the same data: 85%; LZO after delta encoding: 91%. This is achieved while having much better performance (20-80% faster decompression than LZO). Moreover, it should allow far more efficient seeking, which should improve performance a bit. It seems that simple compression algorithms are good enough. Most of the savings are due to prefix compression, int128 encoding, timestamp diffs and bitfields to avoid duplication. That way, comparisons of compressed data can be much faster than a byte comparator (thanks to prefix compression and bitfields). In order to implement it in HBase, two important changes in design will be needed: - solidify the interface to HFileBlock / HFileReader Scanner to provide seeking and iterating; access to the uncompressed buffer in HFileBlock will have bad performance - extend comparators to support comparison assuming that the first N bytes are equal (or some fields are equal) Link to a discussion about something similar: http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4218: -- Attachment: Delta-encoding.patch-2012-01-05_15_16_43.patch Uploading a patch that should apply cleanly. Data Block Encoding of KeyValues (aka delta encoding / prefix compression) --- Key: HBASE-4218 URL: https://issues.apache.org/jira/browse/HBASE-4218 Project: HBase Issue Type: Improvement Components: io Affects Versions: 0.94.0 Reporter: Jacek Migdal Assignee: Mikhail Bautin Labels: compression Fix For: 0.94.0 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, D447.2.patch, D447.3.patch, D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, Data-block-encoding-2011-12-23.patch, Delta-encoding.patch-2011-12-22_11_52_07.patch, Delta-encoding.patch-2012-01-05_15_16_43.patch, Delta_encoding_with_memstore_TS.patch, open-source.diff A compression for keys. Keys are sorted in HFile and they are usually very similar. Because of that, it is possible to design better compression than general-purpose algorithms. It is an additional step designed to be used in memory. It aims to save memory in cache as well as to speed up seeks within HFileBlocks. It should improve performance a lot if key lengths are larger than value lengths. For example, it makes a lot of sense to use it when the value is a counter. Initial tests on real data (key length = ~90 bytes, value length = 8 bytes) show that I could achieve a decent level of compression: key compression ratio: 92%; total compression ratio: 85%; LZO on the same data: 85%; LZO after delta encoding: 91%. This is achieved while having much better performance (20-80% faster decompression than LZO).
Moreover, it should allow far more efficient seeking, which should improve performance a bit. It seems that simple compression algorithms are good enough. Most of the savings are due to prefix compression, int128 encoding, timestamp diffs and bitfields to avoid duplication. That way, comparisons of compressed data can be much faster than a byte comparator (thanks to prefix compression and bitfields). In order to implement it in HBase, two important changes in design will be needed: - solidify the interface to HFileBlock / HFileReader Scanner to provide seeking and iterating; access to the uncompressed buffer in HFileBlock will have bad performance - extend comparators to support comparison assuming that the first N bytes are equal (or some fields are equal) Link to a discussion about something similar: http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4218: -- Attachment: Delta-encoding.patch-2012-01-05_16_31_44.patch Fixing an NPE in EncodedSeekPerformanceTest. Data Block Encoding of KeyValues (aka delta encoding / prefix compression) --- Key: HBASE-4218 URL: https://issues.apache.org/jira/browse/HBASE-4218 Project: HBase Issue Type: Improvement Components: io Affects Versions: 0.94.0 Reporter: Jacek Migdal Assignee: Mikhail Bautin Labels: compression Fix For: 0.94.0 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, D447.19.patch, D447.2.patch, D447.3.patch, D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, Data-block-encoding-2011-12-23.patch, Delta-encoding.patch-2011-12-22_11_52_07.patch, Delta-encoding.patch-2012-01-05_15_16_43.patch, Delta-encoding.patch-2012-01-05_16_31_44.patch, Delta_encoding_with_memstore_TS.patch, open-source.diff A compression for keys. Keys are sorted in HFile and they are usually very similar. Because of that, it is possible to design better compression than general-purpose algorithms. It is an additional step designed to be used in memory. It aims to save memory in cache as well as to speed up seeks within HFileBlocks. It should improve performance a lot if key lengths are larger than value lengths. For example, it makes a lot of sense to use it when the value is a counter. Initial tests on real data (key length = ~90 bytes, value length = 8 bytes) show that I could achieve a decent level of compression: key compression ratio: 92%; total compression ratio: 85%; LZO on the same data: 85%; LZO after delta encoding: 91%. This is achieved while having much better performance (20-80% faster decompression than LZO).
Moreover, it should allow far more efficient seeking, which should improve performance a bit. It seems that simple compression algorithms are good enough. Most of the savings are due to prefix compression, int128 encoding, timestamp diffs and bitfields to avoid duplication. That way, comparisons of compressed data can be much faster than a byte comparator (thanks to prefix compression and bitfields). In order to implement it in HBase, two important changes in design will be needed: - solidify the interface to HFileBlock / HFileReader Scanner to provide seeking and iterating; access to the uncompressed buffer in HFileBlock will have bad performance - extend comparators to support comparison assuming that the first N bytes are equal (or some fields are equal) Link to a discussion about something similar: http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4218: -- Attachment: Delta-encoding.patch-2012-01-05_16_31_44_copy.patch Attaching a patch that applies. (A new unit test is coming for HFile v1 to encoded HFile v2 upgrade, so the patch is not final yet.) Data Block Encoding of KeyValues (aka delta encoding / prefix compression) --- Key: HBASE-4218 URL: https://issues.apache.org/jira/browse/HBASE-4218 Project: HBase Issue Type: Improvement Components: io Affects Versions: 0.94.0 Reporter: Jacek Migdal Assignee: Mikhail Bautin Labels: compression Fix For: 0.94.0 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, D447.19.patch, D447.2.patch, D447.3.patch, D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, Data-block-encoding-2011-12-23.patch, Delta-encoding.patch-2011-12-22_11_52_07.patch, Delta-encoding.patch-2012-01-05_15_16_43.patch, Delta-encoding.patch-2012-01-05_16_31_44.patch, Delta-encoding.patch-2012-01-05_16_31_44_copy.patch, Delta_encoding_with_memstore_TS.patch, open-source.diff A compression for keys. Keys are sorted in HFile and they are usually very similar. Because of that, it is possible to design better compression than general-purpose algorithms. It is an additional step designed to be used in memory. It aims to save memory in cache as well as to speed up seeks within HFileBlocks. It should improve performance a lot if key lengths are larger than value lengths. For example, it makes a lot of sense to use it when the value is a counter.
Initial tests on real data (key length = ~ 90 bytes , value length = 8 bytes) shows that I could achieve decent level of compression: key compression ratio: 92% total compression ratio: 85% LZO on the same data: 85% LZO after delta encoding: 91% While having much better performance (20-80% faster decompression ratio than LZO). Moreover, it should allow far more efficient seeking which should improve performance a bit. It seems that a simple compression algorithms are good enough. Most of the savings are due to prefix compression, int128 encoding, timestamp diffs and bitfields to avoid duplication. That way, comparisons of compressed data can be much faster than a byte comparator (thanks to prefix compression and bitfields). In order to implement it in HBase two important changes in design will be needed: -solidify interface to HFileBlock / HFileReader Scanner to provide seeking and iterating; access to uncompressed buffer in HFileBlock will have bad performance -extend comparators to support comparison assuming that N first bytes are equal (or some fields are equal) Link to a discussion about something similar: http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
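To make the prefix-compression part of the description concrete, here is a minimal sketch that stores each key of a sorted run as the number of leading bytes shared with the previous key plus the remaining suffix. It is an assumption-laden illustration, not the encoder from the attached patches: the class name, the Entry type and the in-memory list layout are all hypothetical, and the int128 encoding, timestamp diffs and bitfields are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PrefixEncodingSketch {
    /** One encoded key: bytes shared with the previous key, plus the rest. */
    public static final class Entry {
        public final int shared;
        public final byte[] suffix;

        public Entry(int shared, byte[] suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    /** Length of the common prefix of a and b. */
    public static int commonPrefix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    /** Encodes keys (expected to be sorted) relative to their predecessor. */
    public static List<Entry> encode(List<byte[]> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int shared = commonPrefix(prev, key);
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            prev = key;
        }
        return out;
    }

    /** Rebuilds the full keys by replaying the entries in order. */
    public static List<byte[]> decode(List<Entry> entries) {
        List<byte[]> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (Entry e : entries) {
            byte[] key = new byte[e.shared + e.suffix.length];
            System.arraycopy(prev, 0, key, 0, e.shared);
            System.arraycopy(e.suffix, 0, key, e.shared, e.suffix.length);
            out.add(key);
            prev = key;
        }
        return out;
    }
}
```

Because decoding depends on the previous key, a scheme like this has to be scanned sequentially from the start of a block, which is presumably one reason the description asks for a proper seek/iterate interface instead of raw access to an uncompressed buffer.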
[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4218: -- Attachment: Delta-encoding.patch-2012-01-05_18_50_47.patch Adding a test that upgrades from HFile v1 to encoded HFile v2. Data Block Encoding of KeyValues (aka delta encoding / prefix compression) --- Key: HBASE-4218 URL: https://issues.apache.org/jira/browse/HBASE-4218 Project: HBase Issue Type: Improvement Components: io Affects Versions: 0.94.0 Reporter: Jacek Migdal Assignee: Mikhail Bautin Labels: compression Fix For: 0.94.0 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, D447.19.patch, D447.2.patch, D447.20.patch, D447.3.patch, D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, Data-block-encoding-2011-12-23.patch, Delta-encoding.patch-2011-12-22_11_52_07.patch, Delta-encoding.patch-2012-01-05_15_16_43.patch, Delta-encoding.patch-2012-01-05_16_31_44.patch, Delta-encoding.patch-2012-01-05_16_31_44_copy.patch, Delta-encoding.patch-2012-01-05_18_50_47.patch, Delta_encoding_with_memstore_TS.patch, open-source.diff
[jira] [Updated] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4218: -- Attachment: Delta-encoding.patch-2011-12-22_11_52_07.patch Appending a new version of patch that should apply using the patch command, compile, and pass TestHeapSize on Jenkins. Delta Encoding of KeyValues (aka prefix compression) - Key: HBASE-4218 URL: https://issues.apache.org/jira/browse/HBASE-4218 Project: HBase Issue Type: Improvement Components: io Affects Versions: 0.94.0 Reporter: Jacek Migdal Assignee: Mikhail Bautin Labels: compression Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 0001-Delta-encoding.patch, D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.2.patch, D447.3.patch, D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, Delta-encoding.patch-2011-12-22_11_52_07.patch, Delta_encoding_with_memstore_TS.patch, open-source.diff
[jira] [Updated] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4218: -- Status: Patch Available (was: Open) Delta Encoding of KeyValues (aka prefix compression) - Key: HBASE-4218 URL: https://issues.apache.org/jira/browse/HBASE-4218 Project: HBase Issue Type: Improvement Components: io Affects Versions: 0.94.0 Reporter: Jacek Migdal Assignee: Mikhail Bautin Labels: compression Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 0001-Delta-encoding.patch, D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.2.patch, D447.3.patch, D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, Delta-encoding.patch-2011-12-22_11_52_07.patch, Delta_encoding_with_memstore_TS.patch, open-source.diff
[jira] [Updated] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4218: -- Status: Open (was: Patch Available) Delta Encoding of KeyValues (aka prefix compression) - Key: HBASE-4218 URL: https://issues.apache.org/jira/browse/HBASE-4218 Project: HBase Issue Type: Improvement Components: io Affects Versions: 0.94.0 Reporter: Jacek Migdal Assignee: Mikhail Bautin Labels: compression Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 0001-Delta-encoding.patch, D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.2.patch, D447.3.patch, D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, Delta-encoding.patch-2011-12-22_11_52_07.patch, Delta_encoding_with_memstore_TS.patch, open-source.diff
[jira] [Updated] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4218: -- Attachment: 0001-Delta-encoding.patch Adding a patch generated by git format-patch --no-prefix, since those auto-generated by Phabricator do not apply with the patch command for some reason. Delta Encoding of KeyValues (aka prefix compression) - Key: HBASE-4218 URL: https://issues.apache.org/jira/browse/HBASE-4218 Project: HBase Issue Type: Improvement Components: io Affects Versions: 0.94.0 Reporter: Jacek Migdal Assignee: Mikhail Bautin Labels: compression Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 0001-Delta-encoding.patch, D447.1.patch, D447.10.patch, D447.11.patch, D447.2.patch, D447.3.patch, D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, Delta_encoding_with_memstore_TS.patch, open-source.diff
[jira] [Updated] (HBASE-4683) Always cache index and bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4683: -- Attachment: 0001-Cache-important-block-types.patch Attaching the patch rebased on top of r1214519. Always cache index and bloom blocks --- Key: HBASE-4683 URL: https://issues.apache.org/jira/browse/HBASE-4683 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Assignee: Mikhail Bautin Priority: Minor Fix For: 0.92.0, 0.94.0 Attachments: 0001-Cache-important-block-types.patch, 4683-v2.txt, 4683.txt, D807.1.patch, D807.2.patch, D807.3.patch, HBASE-4683-0.92-v2.patch, HBASE-4683-v3.patch This would add a new boolean config option, hfile.block.cache.datablocks, defaulting to true. Setting it to false lets HBase run in a mode where only index blocks are cached, which is useful for analytical scenarios where a useful working set of the data cannot be expected to fit into the (aggregate) cache. This is the equivalent of setting cacheBlocks to false on all scans (including scans on behalf of gets). I would like to get a general feeling about what folks think about this. The change itself would be simple. Update (Mikhail): we probably don't need a new conf option. Instead, we will make index blocks cached by default.
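The behavior discussed in this issue (index and bloom blocks always go into the block cache, data blocks only when the read asks for it) can be sketched as a single decision function. Everything here is hypothetical naming for illustration; it is not the code in the attached patch and does not use HBase's own block type or cache configuration classes.

```java
public class BlockCachePolicySketch {
    /** Simplified block categories; a stand-in, not HBase's real block types. */
    public enum BlockKind { DATA, INDEX, BLOOM }

    /**
     * Decides whether a block that was just read should be put in the cache.
     *
     * @param kind            category of the block that was read
     * @param cacheDataBlocks whether the current read wants data blocks cached
     *                        (the role played by a scan's cacheBlocks setting)
     */
    public static boolean shouldCacheOnRead(BlockKind kind, boolean cacheDataBlocks) {
        // Index and bloom blocks are small and reused by almost every read,
        // so they are cached unconditionally; only data blocks are optional.
        return kind != BlockKind.DATA || cacheDataBlocks;
    }
}
```

With a rule like this, a scan that sets cacheBlocks to false still keeps index and bloom blocks hot, which is the effect the originally proposed config option was meant to achieve.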
[jira] [Updated] (HBASE-4683) Always cache index blocks
[ https://issues.apache.org/jira/browse/HBASE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4683: -- Summary: Always cache index blocks (was: Create config option to only cache index blocks) Always cache index blocks - Key: HBASE-4683 URL: https://issues.apache.org/jira/browse/HBASE-4683 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor Fix For: 0.94.0 Attachments: 4683-v2.txt, 4683.txt
[jira] [Updated] (HBASE-5010) Filter HFiles based on TTL
[ https://issues.apache.org/jira/browse/HBASE-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-5010: -- Description: In ScanWildcardColumnTracker we have {code:java} this.oldestStamp = EnvironmentEdgeManager.currentTimeMillis() - ttl; ... private boolean isExpired(long timestamp) { return timestamp < oldestStamp; } {code} but this time range filtering does not participate in HFile selection. In one real case this caused next() calls to time out because all KVs in a table had expired, but next() had to iterate over the whole table to find that out. We should be able to filter out those HFiles right away. I think a reasonable approach is to add a default timerange filter to every scan for a CF with a finite TTL and utilize existing filtering in StoreFile.Reader.passesTimerangeFilter. was: In ScanWildcardColumnTracker we have { this.oldestStamp = EnvironmentEdgeManager.currentTimeMillis() - ttl; ... private boolean isExpired(long timestamp) { return timestamp < oldestStamp; } } but this time range filtering does not participate in HFile selection. In one real case this caused next() calls to time out because all KVs in a table had expired, but next() had to iterate over the whole table to find that out. We should be able to filter out those HFiles right away. I think a reasonable approach is to add a default timerange filter to every scan for a CF with a finite TTL and utilize existing filtering in StoreFile.Reader.passesTimerangeFilter. Filter HFiles based on TTL -- Key: HBASE-5010 URL: https://issues.apache.org/jira/browse/HBASE-5010 Project: HBase Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin
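The idea of skipping whole files whose contents have all expired can be sketched as follows. This is a simplified stand-in, not the StoreFile.Reader.passesTimerangeFilter logic itself: FileInfo and selectFiles are hypothetical names, and the sketch assumes each file records the largest timestamp of any KV it contains.

```java
import java.util.ArrayList;
import java.util.List;

public class TtlFileFilterSketch {
    /** Hypothetical per-file metadata: a name and the newest KV timestamp in the file. */
    public static final class FileInfo {
        public final String name;
        public final long maxTimestamp;

        public FileInfo(String name, long maxTimestamp) {
            this.name = name;
            this.maxTimestamp = maxTimestamp;
        }
    }

    /**
     * Returns only the files that may still contain unexpired KVs.
     * A KV is expired when its timestamp is below now - ttl, so a file whose
     * newest timestamp is below that cutoff cannot contribute any result.
     */
    public static List<FileInfo> selectFiles(List<FileInfo> files, long nowMillis, long ttlMillis) {
        long oldestStamp = nowMillis - ttlMillis;
        List<FileInfo> kept = new ArrayList<>();
        for (FileInfo f : files) {
            if (f.maxTimestamp >= oldestStamp) {
                kept.add(f);
            }
        }
        return kept;
    }
}
```

In the timeout case described in the issue, every file would fall below the cutoff, so a scan could return immediately instead of iterating over the whole table.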
[jira] [Updated] (HBASE-4683) Always cache index and bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4683: -- Summary: Always cache index and bloom blocks (was: Always cache index blocks) Always cache index and bloom blocks --- Key: HBASE-4683 URL: https://issues.apache.org/jira/browse/HBASE-4683 Project: HBase Issue Type: New Feature Reporter: Lars Hofhansl Assignee: Mikhail Bautin Priority: Minor Fix For: 0.94.0 Attachments: 4683-v2.txt, 4683.txt, HBASE-4683-v3.patch
[jira] [Updated] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4218: -- Status: Patch Available (was: Open) Testing current version on Jenkins. Not ready to commit yet -- more testing required. Delta Encoding of KeyValues (aka prefix compression) - Key: HBASE-4218 URL: https://issues.apache.org/jira/browse/HBASE-4218 Project: HBase Issue Type: Improvement Components: io Affects Versions: 0.94.0 Reporter: Jacek Migdal Assignee: Mikhail Bautin Labels: compression Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, D447.1.patch, D447.2.patch, D447.3.patch, D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, Delta_encoding_with_memstore_TS.patch, open-source.diff
[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)
[ https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4908: -- Status: Open (was: Patch Available) HBase cluster test tool (port from 0.89-fb) --- Key: HBASE-4908 URL: https://issues.apache.org/jira/browse/HBASE-4908 Project: HBase Issue Type: Sub-task Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: 0001-HBase-cluster-test-tool.patch, 0002-HBase-cluster-test-tool.patch, D549.1.patch, D549.2.patch, D549.3.patch, D549.4.patch, D549.5.patch, D549.6.patch, D549.7.patch, D549.8.patch, D549.9.patch Porting one of our HBase cluster test tools (a single-process multi-threaded load generator and verifier) from 0.89-fb to trunk. I cleaned up the code a bit compared to what's in 0.89-fb, and discovered that it has some features that I have not tried yet (some kind of a kill test, and some way to run HBase as multiple processes on one machine). The main utility of this piece of code for us has been the HBaseClusterTest command-line tool (called HBaseTest in 0.89-fb), which we usually invoke as a load test in our five-node dev cluster testing, e.g.: hbase org.apache.hadoop.hbase.manual.HBaseTest -load 10:50:100:20 -tn load_test -read 1:10:50:20 -zk zk_quorum -bloom ROWCOL -compression GZIP I will be using this code to load-test the delta encoding patch and making fixes, but I am submitting the patch for early feedback. I will probably try out its other functionality and comment on how it works.
[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)
[ https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4908: -- Attachment: 0003-HBase-cluster-test-tool.patch Attaching the most recent patch. HBase cluster test tool (port from 0.89-fb) --- Key: HBASE-4908 URL: https://issues.apache.org/jira/browse/HBASE-4908 Project: HBase Issue Type: Sub-task Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: 0001-HBase-cluster-test-tool.patch, 0002-HBase-cluster-test-tool.patch, 0003-HBase-cluster-test-tool.patch, D549.1.patch, D549.2.patch, D549.3.patch, D549.4.patch, D549.5.patch, D549.6.patch, D549.7.patch, D549.8.patch, D549.9.patch
[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)
[ https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4908: -- Status: Patch Available (was: Open) One more round of Jenkins testing for the patch. HBase cluster test tool (port from 0.89-fb) --- Key: HBASE-4908 URL: https://issues.apache.org/jira/browse/HBASE-4908 Project: HBase Issue Type: Sub-task Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: 0001-HBase-cluster-test-tool.patch, 0002-HBase-cluster-test-tool.patch, 0003-HBase-cluster-test-tool.patch, D549.1.patch, D549.2.patch, D549.3.patch, D549.4.patch, D549.5.patch, D549.6.patch, D549.7.patch, D549.8.patch, D549.9.patch
[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)
[ https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4908: -- Status: Patch Available (was: Open) HBase cluster test tool (port from 0.89-fb) --- Key: HBASE-4908 URL: https://issues.apache.org/jira/browse/HBASE-4908 Project: HBase Issue Type: Sub-task Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: 0001-HBase-cluster-test-tool.patch, D549.1.patch, D549.2.patch, D549.3.patch, D549.4.patch, D549.5.patch, D549.6.patch, D549.7.patch
[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)
[ https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4908: -- Status: Open (was: Patch Available) HBase cluster test tool (port from 0.89-fb) --- Key: HBASE-4908 URL: https://issues.apache.org/jira/browse/HBASE-4908 Project: HBase Issue Type: Sub-task Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: 0001-HBase-cluster-test-tool.patch, D549.1.patch, D549.2.patch, D549.3.patch, D549.4.patch, D549.5.patch, D549.6.patch, D549.7.patch
[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)
[ https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4908: -- Status: Open (was: Patch Available) HBase cluster test tool (port from 0.89-fb) --- Key: HBASE-4908 URL: https://issues.apache.org/jira/browse/HBASE-4908 Project: HBase Issue Type: Sub-task Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: 0001-HBase-cluster-test-tool.patch, 0002-HBase-cluster-test-tool.patch, D549.1.patch, D549.2.patch, D549.3.patch, D549.4.patch, D549.5.patch, D549.6.patch, D549.7.patch, D549.8.patch
[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)
[ https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4908: -- Status: Patch Available (was: Open) HBase cluster test tool (port from 0.89-fb) --- Key: HBASE-4908 URL: https://issues.apache.org/jira/browse/HBASE-4908 Project: HBase Issue Type: Sub-task Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: 0001-HBase-cluster-test-tool.patch, 0002-HBase-cluster-test-tool.patch, D549.1.patch, D549.2.patch, D549.3.patch, D549.4.patch, D549.5.patch, D549.6.patch, D549.7.patch, D549.8.patch
[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)
[ https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4908: -- Attachment: 0002-HBase-cluster-test-tool.patch Uploading a patch for Jenkins testing. HBase cluster test tool (port from 0.89-fb) --- Key: HBASE-4908 URL: https://issues.apache.org/jira/browse/HBASE-4908 Project: HBase Issue Type: Sub-task Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: 0001-HBase-cluster-test-tool.patch, 0002-HBase-cluster-test-tool.patch, D549.1.patch, D549.2.patch, D549.3.patch, D549.4.patch, D549.5.patch, D549.6.patch, D549.7.patch, D549.8.patch
[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)
[ https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4908: -- Status: Patch Available (was: Open) HBase cluster test tool (port from 0.89-fb) --- Key: HBASE-4908 URL: https://issues.apache.org/jira/browse/HBASE-4908 Project: HBase Issue Type: Sub-task Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: 0001-HBase-cluster-test-tool.patch, D549.1.patch, D549.2.patch, D549.3.patch, D549.4.patch, D549.5.patch, D549.6.patch, D549.7.patch
[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)
[ https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4908: -- Attachment: 0001-HBase-cluster-test-tool.patch HBase cluster test tool (port from 0.89-fb) --- Key: HBASE-4908 URL: https://issues.apache.org/jira/browse/HBASE-4908 Project: HBase Issue Type: Sub-task Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: 0001-HBase-cluster-test-tool.patch, D549.1.patch, D549.2.patch, D549.3.patch, D549.4.patch, D549.5.patch, D549.6.patch, D549.7.patch
[jira] [Updated] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4218: -- Attachment: Delta_encoding_with_memstore_TS.patch Attaching the most recent patch for testing on Jenkins. This is still pending cluster testing. Delta Encoding of KeyValues (aka prefix compression) - Key: HBASE-4218 URL: https://issues.apache.org/jira/browse/HBASE-4218 Project: HBase Issue Type: Improvement Components: io Affects Versions: 0.94.0 Reporter: Jacek Migdal Labels: compression Attachments: D447.1.patch, D447.2.patch, D447.3.patch, D447.4.patch, D447.5.patch, Delta_encoding_with_memstore_TS.patch, open-source.diff A compression scheme for keys. Keys are sorted in an HFile and are usually very similar to their neighbors. Because of that, it is possible to design better compression than general-purpose algorithms provide. It is an additional step designed to be used in memory. It aims to save memory in the cache as well as to speed up seeks within HFileBlocks. It should improve performance a lot if key lengths are larger than value lengths. For example, it makes a lot of sense to use it when the value is a counter. Initial tests on real data (key length = ~90 bytes, value length = 8 bytes) show that I could achieve a decent level of compression: key compression ratio: 92%; total compression ratio: 85%; LZO on the same data: 85%; LZO after delta encoding: 91%. This comes with much better performance (20-80% faster decompression than LZO). Moreover, it should allow far more efficient seeking, which should improve performance a bit. It seems that simple compression algorithms are good enough. Most of the savings are due to prefix compression, int128 encoding, timestamp diffs and bitfields to avoid duplication. That way, comparisons of compressed data can be much faster than a byte comparator (thanks to prefix compression and bitfields).
In order to implement it in HBase, two important design changes will be needed: - solidify the interface to HFileBlock / HFileReader Scanner to provide seeking and iterating; access to the uncompressed buffer in HFileBlock will have bad performance - extend comparators to support comparison assuming that the first N bytes are equal (or some fields are equal) Link to a discussion about something similar: http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
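To make the prefix-compression part of the description above concrete, here is a minimal sketch of the general idea: each key in a sorted run is stored as the number of leading bytes it shares with the previous key plus the remaining suffix. This is not the encoding used in the HBASE-4218 patches; the class name PrefixDeltaSketch, the Entry type, and the sample keys are all hypothetical and exist only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of prefix compression over sorted keys; not the HBase on-disk format. */
final class PrefixDeltaSketch {

    /** One encoded key: how many leading bytes it shares with the previous key, plus the rest. */
    static final class Entry {
        final int sharedPrefixLength;
        final byte[] suffix;

        Entry(int sharedPrefixLength, byte[] suffix) {
            this.sharedPrefixLength = sharedPrefixLength;
            this.suffix = suffix;
        }
    }

    /** Length of the longest common prefix of two byte arrays. */
    static int commonPrefixLength(byte[] a, byte[] b) {
        int max = Math.min(a.length, b.length);
        int i = 0;
        while (i < max && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    /** Encodes each key relative to the key before it (the first key is relative to an empty key). */
    static List<Entry> encode(List<byte[]> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        byte[] previous = new byte[0];
        for (byte[] key : sortedKeys) {
            int shared = commonPrefixLength(previous, key);
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            previous = key;
        }
        return out;
    }

    /** Rebuilds the original keys by copying the shared prefix from the previously decoded key. */
    static List<byte[]> decode(List<Entry> entries) {
        List<byte[]> out = new ArrayList<>();
        byte[] previous = new byte[0];
        for (Entry e : entries) {
            byte[] key = new byte[e.sharedPrefixLength + e.suffix.length];
            System.arraycopy(previous, 0, key, 0, e.sharedPrefixLength);
            System.arraycopy(e.suffix, 0, key, e.sharedPrefixLength, e.suffix.length);
            out.add(key);
            previous = key;
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>();
        // Made-up keys in sorted order; real HBase keys also carry family, qualifier, timestamp and type.
        for (String s : new String[] {"row0001/cf:a", "row0001/cf:b", "row0002/cf:a"}) {
            keys.add(s.getBytes(StandardCharsets.UTF_8));
        }
        for (Entry e : encode(keys)) {
            System.out.println("shared=" + e.sharedPrefixLength + " suffixBytes=" + e.suffix.length);
        }
    }
}
```

Because every key depends on the one before it, decoding is sequential, which is one reason a scanner-style interface for seeking and iterating matters more than raw access to an uncompressed buffer.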
[jira] [Updated] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test
[ https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4863: -- Status: Open (was: Patch Available) Make HBase Thrift server more configurable and add a command-line UI test - Key: HBASE-4863 URL: https://issues.apache.org/jira/browse/HBASE-4863 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, 0002-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, D531.1.patch, D531.2.patch, D531.3.patch, D531.4.patch This started as an internal hotfix where we found out that the Thrift server spawned 15000 threads. To bound the thread pool size I added a custom thread pool server implementation called HBaseThreadPoolServer to the HBase codebase, and made the following parameters configurable both from the command line and as config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. Under an increasing load, the server creates a new thread for every connection until the pool size reaches minWorkerThreads. After that, the server puts new connections into the queue and only creates a new thread when the queue is full. If an attempt to create a new thread fails, the server drops the connection. The default TThreadPoolServer would crash in that case, but it never happened because the thread pool was unbounded, so the server would hang indefinitely, consume a lot of memory, and cause huge latency spikes on the client side. Another part of this fix is refactoring and unit testing of the command-line part of the Thrift server. The logic there is sufficiently complicated, and the existing ThriftServer class does not test that part at all. The new TestThriftServerCmdLine test starts the Thrift server on a random port with various combinations of options and talks to it through the client API from another thread.
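The threading policy described for HBASE-4863 (grow to minWorkerThreads, then queue, then grow to maxWorkerThreads only when the queue is full, then drop) matches the documented behavior of java.util.concurrent.ThreadPoolExecutor with a bounded queue. The sketch below demonstrates that policy with the standard executor; whether HBaseThreadPoolServer is actually built this way is an assumption, and the class name BoundedPoolSketch and the simulate helper are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration of a bounded worker pool; not the HBaseThreadPoolServer code. */
final class BoundedPoolSketch {

    /** Builds a pool whose parameters are named after the settings in the issue description. */
    static ThreadPoolExecutor newPool(int minWorkerThreads, int maxWorkerThreads, int maxQueuedRequests) {
        return new ThreadPoolExecutor(
            minWorkerThreads,                                      // threads created eagerly, one per request
            maxWorkerThreads,                                      // upper bound, reached only when the queue is full
            60L, TimeUnit.SECONDS,                                 // idle timeout for threads above the minimum
            new LinkedBlockingQueue<Runnable>(maxQueuedRequests)); // bounded request queue
    }

    /**
     * Submits the given number of "connections" that all block until released.
     * Returns {pool size, queued requests, dropped requests} observed while they are blocked.
     */
    static int[] simulate(int minWorkerThreads, int maxWorkerThreads, int maxQueuedRequests, int connections) {
        ThreadPoolExecutor pool = newPool(minWorkerThreads, maxWorkerThreads, maxQueuedRequests);
        final CountDownLatch release = new CountDownLatch(1);
        int dropped = 0;
        for (int i = 0; i < connections; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await(); // stand-in for a long-lived client connection
                    } catch (InterruptedException interrupted) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException rejected) {
                dropped++; // pool and queue are both full: the connection is refused
            }
        }
        int[] snapshot = {pool.getPoolSize(), pool.getQueue().size(), dropped};
        release.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException interrupted) {
            Thread.currentThread().interrupt();
        }
        return snapshot;
    }

    public static void main(String[] args) {
        int[] r = simulate(2, 4, 3, 10);
        System.out.println("threads=" + r[0] + " queued=" + r[1] + " dropped=" + r[2]);
    }
}
```

With an unbounded pool the same load would simply keep adding threads, which is the failure mode the hotfix was written to prevent.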
[jira] [Updated] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test
[ https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4863: -- Attachment: 0002-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch Make HBase Thrift server more configurable and add a command-line UI test - Key: HBASE-4863 URL: https://issues.apache.org/jira/browse/HBASE-4863 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, 0002-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, D531.1.patch, D531.2.patch, D531.3.patch, D531.4.patch
[jira] [Updated] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test
[ https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4863: -- Status: Patch Available (was: Open) Make HBase Thrift server more configurable and add a command-line UI test - Key: HBASE-4863 URL: https://issues.apache.org/jira/browse/HBASE-4863 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, D531.1.patch, D531.2.patch, D531.3.patch
[jira] [Updated] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test
[ https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4863: -- Attachment: 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch The same as D531.3.patch but generated using git format-patch --no-prefix HEAD^..HEAD so that it can be applied using the normal patch command. Make HBase Thrift server more configurable and add a command-line UI test - Key: HBASE-4863 URL: https://issues.apache.org/jira/browse/HBASE-4863 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, D531.1.patch, D531.2.patch, D531.3.patch
[jira] [Updated] (HBASE-4809) Per-CF set RPC metrics
[ https://issues.apache.org/jira/browse/HBASE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4809: -- Attachment: HBASE-4809_Per_CF_set_RPC_metrics.patch This corresponds to D483.3.patch. Per-CF set RPC metrics -- Key: HBASE-4809 URL: https://issues.apache.org/jira/browse/HBASE-4809 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Attachments: D483.1.patch, D483.2.patch, D483.3.patch, HBASE-4809_Per_CF_set_RPC_metrics.patch Porting per-CF set metrics for RPC times and response sizes from 0.89-fb to trunk. For each mutation signature (the set of column families involved in an RPC request) we increment several metrics, which lets us monitor access patterns. Guarding against an explosion in the number of metrics is handled in HBASE-4638 (which might even be implemented as part of this JIRA).
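As a rough illustration of what keying metrics by a "mutation signature" could look like, the sketch below derives a canonical name from the set of column families in a request and bumps a few counters under that name. It is not the code from the HBASE-4809 patch: the class CfSetMetricsSketch, the "~" separator, and the metric names numOps, timeMs and responseBytes are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration of per-CF-set RPC metrics; names and layout are not taken from HBase. */
final class CfSetMetricsSketch {

    private final ConcurrentMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    /** Canonical signature for a set of column families: sorted, de-duplicated, joined with '~'. */
    static String signature(Collection<String> columnFamilies) {
        TreeSet<String> sorted = new TreeSet<>(columnFamilies);
        return String.join("~", sorted);
    }

    /** Records one RPC that touched the given column families. */
    void recordRpc(Collection<String> columnFamilies, long timeMs, long responseBytes) {
        String prefix = "cf." + signature(columnFamilies) + ".";
        add(prefix + "numOps", 1L);
        add(prefix + "timeMs", timeMs);
        add(prefix + "responseBytes", responseBytes);
    }

    private void add(String name, long delta) {
        counters.computeIfAbsent(name, key -> new AtomicLong()).addAndGet(delta);
    }

    /** Current value of a metric, or 0 if it has never been incremented. */
    long get(String name) {
        AtomicLong value = counters.get(name);
        return value == null ? 0L : value.get();
    }

    public static void main(String[] args) {
        CfSetMetricsSketch metrics = new CfSetMetricsSketch();
        // The same two families in a different order map to the same signature.
        metrics.recordRpc(Arrays.asList("b", "a"), 5L, 100L);
        metrics.recordRpc(Arrays.asList("a", "b"), 7L, 50L);
        System.out.println("numOps for a~b: " + metrics.get("cf.a~b.numOps"));
    }
}
```

Since every distinct set of families creates its own group of counters, the number of metrics can grow quickly with the number of families, which is the explosion the description defers to HBASE-4638.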
[jira] [Updated] (HBASE-4809) Per-CF set RPC metrics
[ https://issues.apache.org/jira/browse/HBASE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4809: -- Release Note: Testing the patch on Hudson. Status: Patch Available (was: Open) Per-CF set RPC metrics -- Key: HBASE-4809 URL: https://issues.apache.org/jira/browse/HBASE-4809 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Attachments: D483.1.patch, D483.2.patch, D483.3.patch, HBASE-4809_Per_CF_set_RPC_metrics.patch
[jira] [Updated] (HBASE-4795) Fix TestHFileBlock when running on a 32-bit JVM
[ https://issues.apache.org/jira/browse/HBASE-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4795: -- Priority: Minor (was: Major) Fix TestHFileBlock when running on a 32-bit JVM --- Key: HBASE-4795 URL: https://issues.apache.org/jira/browse/HBASE-4795 Project: HBase Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Attachments: D459.1.patch Our Hudson test server seems to run a 32-bit JVM. This patch fixes TestHFileBlock to work correctly on both 64-bit and 32-bit JVMs.
[jira] [Updated] (HBASE-4795) Fix TestHFileBlock when running on a 32-bit JVM
[ https://issues.apache.org/jira/browse/HBASE-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4795: -- Status: Patch Available (was: Open) Fix TestHFileBlock when running on a 32-bit JVM --- Key: HBASE-4795 URL: https://issues.apache.org/jira/browse/HBASE-4795 Project: HBase Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Attachments: D459.1.patch
[jira] [Updated] (HBASE-4795) Fix TestHFileBlock when running on a 32-bit JVM
[ https://issues.apache.org/jira/browse/HBASE-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4795: -- Status: Open (was: Patch Available) Fix TestHFileBlock when running on a 32-bit JVM --- Key: HBASE-4795 URL: https://issues.apache.org/jira/browse/HBASE-4795 Project: HBase Issue Type: Bug Reporter: Mikhail Bautin Assignee: Mikhail Bautin Priority: Minor Attachments: D459.1.patch, D459.2.patch