from:"Mikhail Bautin \(Updated\) \(JIRA\)"

[jira] [Updated] (HBASE-5763) Fix random failures in TestFSErrorsExposed

2012-04-17 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5763:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed everywhere.

 Fix random failures in TestFSErrorsExposed
 --

 Key: HBASE-5763
 URL: https://issues.apache.org/jira/browse/HBASE-5763
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D2739.1.patch, D2739.2.patch, D2739.3.patch, 
 D2739.4.patch, D2793.1.patch, D2793.2.patch, D2793.3.patch, 
 Fix-TestFSErrorsExposed-2012-04-13_18_59_36.patch, 
 Fix-TestFSErrorsExposed-2012-04-16_15_41_24.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5104) Provide a reliable intra-row pagination mechanism

2012-04-16 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5104:
--

Attachment: 
jira-HBASE-5104-Provide-a-reliable-intra-row-paginat-2012-04-16_12_39_42.patch

Manually attaching the most recent patch.

 Provide a reliable intra-row pagination mechanism
 -

 Key: HBASE-5104
 URL: https://issues.apache.org/jira/browse/HBASE-5104
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan
Assignee: Madhuwanti Vaidya
 Attachments: D2799.1.patch, D2799.2.patch, D2799.3.patch, 
 jira-HBASE-5104-Provide-a-reliable-intra-row-paginat-2012-04-16_12_39_42.patch,
  testFilterList.rb


 Addendum:
 Doing pagination (retrieving at most limit number of KVs at a particular 
 offset) is currently supported via the ColumnPaginationFilter. However, it 
 is not a very clean way of supporting pagination.  Some of the problems with 
 it are:
 * Normally, one would expect a query with (Filter(A) AND Filter(B)) to have 
 the same results as (query with Filter(A)) INTERSECT (query with Filter(B)). 
 This is not the case for ColumnPaginationFilter, as its internal state gets 
 updated depending on whether or not Filter(A) returns TRUE/FALSE for a 
 particular cell.
 * When this Filter is used in combination with other filters (e.g., doing AND 
 with another filter using FilterList), the behavior of the query depends on 
 the order of filters in the FilterList. This is not ideal.
 * ColumnPaginationFilter is a stateful filter which ends up counting multiple 
 versions of the cell as separate values even if another filter upstream or 
 the ScanQueryMatcher is going to reject the value for other reasons.
 Seems like we need a reliable way to do pagination. The particular use case 
 that prompted this JIRA is pagination within the same rowKey. For example, 
 for a given row key R, get columns with prefix P, starting at offset X (among 
 columns which have prefix P) and limit Y. Some possible fixes might be:
 1) enhance ColumnPrefixFilter to support another constructor which supports 
 limit/offset.
 2) Support pagination (limit/offset) at the Scan/Get API level (rather than 
 as a filter) [Like SQL].
 Original Post:
 Thanks Jiakai Liu for reporting this issue and doing the initial 
 investigation. Email from Jiakai below:
 Assuming that we have an index column family with the following entries:
 tag0:001:thread1
 ...
 tag1:001:thread1
 tag1:002:thread2
 ...
 tag1:010:thread10
 ...
 tag2:001:thread1
 tag2:005:thread5
 ...
 To get threads with tag1 in range [5, 10), I tried the following code:
 ColumnPrefixFilter filter1 = new ColumnPrefixFilter(Bytes.toBytes(tag1));
 ColumnPaginationFilter filter2 =
     new ColumnPaginationFilter(5 /* limit */, 5 /* offset */);
 FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
 filters.addFilter(filter1);
 filters.addFilter(filter2);
 Get get = new Get(USER);
 get.addFamily(COLUMN_FAMILY);
 get.setMaxVersions(1);
 get.setFilter(filters);
 Somehow it didn't work as expected. It returned the entries as if filter1 
 were not set.
 Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. 
 The FilterList filter does not handle this return code properly (it treats it 
 as INCLUDE).
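
The use case described above (for one row, take the columns with a given prefix, then apply offset and limit among those columns only) can be illustrated without HBase. The following is a minimal sketch in plain Java, assuming the columns are available as a sorted list of qualifier strings. The class and method names are hypothetical and are not HBase API; the second method is only a simplified model of a stateful pager that counts every column it sees, not the actual ColumnPaginationFilter code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a plain-Java model of the two behaviours.
// None of these names are HBase API.
final class IntraRowPaging {

    // Desired semantics: offset and limit count only columns matching the prefix.
    static List<String> pageWithinPrefix(List<String> sortedColumns, String prefix,
                                         int limit, int offset) {
        List<String> page = new ArrayList<>();
        int matched = 0;
        for (String column : sortedColumns) {
            if (!column.startsWith(prefix)) {
                continue; // a rejected column must not advance the counter
            }
            if (matched >= offset && page.size() < limit) {
                page.add(column);
            }
            matched++;
        }
        return page;
    }

    // Simplified model of a stateful pager that counts every column it sees,
    // whether or not the prefix check accepts it.
    static List<String> pageCountingEverything(List<String> sortedColumns, String prefix,
                                               int limit, int offset) {
        List<String> page = new ArrayList<>();
        int seen = 0;
        int admitted = 0;
        for (String column : sortedColumns) {
            boolean inWindow = seen >= offset && admitted < limit;
            seen++;
            if (inWindow) {
                admitted++;
                if (column.startsWith(prefix)) {
                    page.add(column);
                }
            }
        }
        return page;
    }

    public static void main(String[] args) {
        List<String> columns = new ArrayList<>();
        columns.add("tag0:001:thread1");
        for (int i = 1; i <= 10; i++) {
            columns.add(String.format("tag1:%03d:thread%d", i, i));
        }
        columns.add("tag2:001:thread1");
        System.out.println(pageWithinPrefix(columns, "tag1", 5, 5));
        System.out.println(pageCountingEverything(columns, "tag1", 5, 5));
    }
}
```

With the sample data, the two methods return different pages because the leading tag0 column shifts the window in the second one, which is the kind of order and state dependence the description complains about.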


[jira] [Updated] (HBASE-5763) Fix random failures in TestFSErrorsExposed

2012-04-16 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5763:
--

Attachment: Fix-TestFSErrorsExposed-2012-04-16_15_41_24.patch

Attaching trunk patch for Jenkins testing.

 Fix random failures in TestFSErrorsExposed
 --

 Key: HBASE-5763
 URL: https://issues.apache.org/jira/browse/HBASE-5763
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D2739.1.patch, D2739.2.patch, D2739.3.patch, 
 D2739.4.patch, D2793.1.patch, D2793.2.patch, 
 Fix-TestFSErrorsExposed-2012-04-13_18_59_36.patch, 
 Fix-TestFSErrorsExposed-2012-04-16_15_41_24.patch





[jira] [Updated] (HBASE-5684) Make ProcessBasedLocalHBaseCluster run HDFS and make it more robust

2012-04-13 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5684:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk.

 Make ProcessBasedLocalHBaseCluster run HDFS and make it more robust
 ---

 Key: HBASE-5684
 URL: https://issues.apache.org/jira/browse/HBASE-5684
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D2709.1.patch, D2709.2.patch, D2709.3.patch, 
 D2709.4.patch, D2757.1.patch, D2757.2.patch, D2757.3.patch, D2757.4.patch, 
 jira-HBASE-5684-Make-ProcessBasedLocalHBaseCluster-r-2012-04-12_20_42_02.patch


 Currently ProcessBasedLocalHBaseCluster runs on top of the raw local 
 filesystem. We need it to start a process-based HDFS cluster as well. We also 
 need to make the whole thing more stable so we can use it in unit tests.


[jira] [Updated] (HBASE-5763) Fix random failures in TestFSErrorsExposed

2012-04-13 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5763:
--

Attachment: Fix-TestFSErrorsExposed-2012-04-13_18_59_36.patch

 Fix random failures in TestFSErrorsExposed
 --

 Key: HBASE-5763
 URL: https://issues.apache.org/jira/browse/HBASE-5763
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D2739.1.patch, D2739.2.patch, D2739.3.patch, 
 D2739.4.patch, D2793.1.patch, 
 Fix-TestFSErrorsExposed-2012-04-13_18_59_36.patch





[jira] [Updated] (HBASE-5763) Fix random failures in TestFSErrorsExposed

2012-04-13 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5763:
--

Status: Patch Available  (was: Open)

 Fix random failures in TestFSErrorsExposed
 --

 Key: HBASE-5763
 URL: https://issues.apache.org/jira/browse/HBASE-5763
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D2739.1.patch, D2739.2.patch, D2739.3.patch, 
 D2739.4.patch, D2793.1.patch, 
 Fix-TestFSErrorsExposed-2012-04-13_18_59_36.patch





[jira] [Updated] (HBASE-5104) Provide a reliable intra-row pagination mechanism

2012-04-13 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5104:
--

Status: Patch Available  (was: Open)

 Provide a reliable intra-row pagination mechanism
 -

 Key: HBASE-5104
 URL: https://issues.apache.org/jira/browse/HBASE-5104
 Project: HBase
  Issue Type: Bug
Reporter: Kannan Muthukkaruppan
Assignee: Madhuwanti Vaidya
 Attachments: D2799.1.patch, testFilterList.rb


 Addendum:
 Doing pagination (retrieving at most limit number of KVs at a particular 
 offset) is currently supported via the ColumnPaginationFilter. However, it 
 is not a very clean way of supporting pagination.  Some of the problems with 
 it are:
 * Normally, one would expect a query with (Filter(A) AND Filter(B)) to have 
 the same results as (query with Filter(A)) INTERSECT (query with Filter(B)). 
 This is not the case for ColumnPaginationFilter, as its internal state gets 
 updated depending on whether or not Filter(A) returns TRUE/FALSE for a 
 particular cell.
 * When this Filter is used in combination with other filters (e.g., doing AND 
 with another filter using FilterList), the behavior of the query depends on 
 the order of filters in the FilterList. This is not ideal.
 * ColumnPaginationFilter is a stateful filter which ends up counting multiple 
 versions of the cell as separate values even if another filter upstream or 
 the ScanQueryMatcher is going to reject the value for other reasons.
 Seems like we need a reliable way to do pagination. The particular use case 
 that prompted this JIRA is pagination within the same rowKey. For example, 
 for a given row key R, get columns with prefix P, starting at offset X (among 
 columns which have prefix P) and limit Y. Some possible fixes might be:
 1) enhance ColumnPrefixFilter to support another constructor which supports 
 limit/offset.
 2) Support pagination (limit/offset) at the Scan/Get API level (rather than 
 as a filter) [Like SQL].
 Original Post:
 Thanks Jiakai Liu for reporting this issue and doing the initial 
 investigation. Email from Jiakai below:
 Assuming that we have an index column family with the following entries:
 tag0:001:thread1
 ...
 tag1:001:thread1
 tag1:002:thread2
 ...
 tag1:010:thread10
 ...
 tag2:001:thread1
 tag2:005:thread5
 ...
 To get threads with tag1 in range [5, 10), I tried the following code:
 ColumnPrefixFilter filter1 = new ColumnPrefixFilter(Bytes.toBytes(tag1));
 ColumnPaginationFilter filter2 =
     new ColumnPaginationFilter(5 /* limit */, 5 /* offset */);
 FilterList filters = new FilterList(Operator.MUST_PASS_ALL);
 filters.addFilter(filter1);
 filters.addFilter(filter2);
 Get get = new Get(USER);
 get.addFamily(COLUMN_FAMILY);
 get.setMaxVersions(1);
 get.setFilter(filters);
 Somehow it didn't work as expected. It returned the entries as if filter1 
 were not set.
 Turns out the ColumnPrefixFilter returns SEEK_NEXT_USING_HINT in some cases. 
 The FilterList filter does not handle this return code properly (it treats it 
 as INCLUDE).


[jira] [Updated] (HBASE-5684) Make ProcessBasedLocalHBaseCluster run HDFS and make it more robust

2012-04-12 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5684:
--

Attachment: 
jira-HBASE-5684-Make-ProcessBasedLocalHBaseCluster-r-2012-04-12_20_42_02.patch

 Make ProcessBasedLocalHBaseCluster run HDFS and make it more robust
 ---

 Key: HBASE-5684
 URL: https://issues.apache.org/jira/browse/HBASE-5684
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D2709.1.patch, D2709.2.patch, D2709.3.patch, 
 D2709.4.patch, D2757.1.patch, D2757.2.patch, 
 jira-HBASE-5684-Make-ProcessBasedLocalHBaseCluster-r-2012-04-12_20_42_02.patch


 Currently ProcessBasedLocalHBaseCluster runs on top of the raw local 
 filesystem. We need it to start a process-based HDFS cluster as well. We also 
 need to make the whole thing more stable so we can use it in unit tests.


[jira] [Updated] (HBASE-5684) Make ProcessBasedLocalHBaseCluster run HDFS and make it more robust

2012-04-11 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5684:
--

Status: Patch Available  (was: Open)

 Make ProcessBasedLocalHBaseCluster run HDFS and make it more robust
 ---

 Key: HBASE-5684
 URL: https://issues.apache.org/jira/browse/HBASE-5684
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D2709.1.patch, D2709.2.patch, D2709.3.patch, 
 D2709.4.patch, D2757.1.patch


 Currently ProcessBasedLocalHBaseCluster runs on top of the raw local 
 filesystem. We need it to start a process-based HDFS cluster as well. We also 
 need to make the whole thing more stable so we can use it in unit tests.


[jira] [Updated] (HBASE-5744) Thrift server metrics should be long instead of int

2012-04-09 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5744:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk.

 Thrift server metrics should be long instead of int
 ---

 Key: HBASE-5744
 URL: https://issues.apache.org/jira/browse/HBASE-5744
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D2679.1.patch, D2685.1.patch, D2685.2.patch, 
 D2685.3.patch, 
 jira-HBASE-5744-89-fb-Thrift-server-metrics-should-b-2012-04-07_21_39_35.patch


 As we measure our Thrift call latencies in nanoseconds, we need to make 
 latencies long instead of int everywhere.
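
The reason an int is too narrow here is simple arithmetic: Integer.MAX_VALUE nanoseconds is a little over two seconds, so any slower call wraps around. A small plain-Java sketch of that arithmetic follows; the class and method names are made up for illustration and do not come from the Thrift server metrics code.

```java
// Illustration only: why an int cannot hold nanosecond latencies.
final class LatencyWidth {

    // Largest whole number of seconds that fits in an int count of nanoseconds.
    static long maxIntLatencySeconds() {
        return Integer.MAX_VALUE / 1_000_000_000L;
    }

    // Narrows a nanosecond latency to int, as an int-typed metric would have to.
    static int asIntNanos(long nanos) {
        return (int) nanos;
    }

    public static void main(String[] args) {
        long threeSeconds = 3_000_000_000L;
        System.out.println("max seconds in an int: " + maxIntLatencySeconds());
        System.out.println("3 s stored as int nanos: " + asIntNanos(threeSeconds));
    }
}
```

A three-second call stored this way comes out negative, which is why the latency fields need to be long throughout.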


[jira] [Updated] (HBASE-5744) Thrift server metrics should be long instead of int

2012-04-07 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5744:
--

Attachment: 
jira-HBASE-5744-89-fb-Thrift-server-metrics-should-b-2012-04-07_21_39_35.patch

The same patch (re-attaching to run a test on Jenkins).

 Thrift server metrics should be long instead of int
 ---

 Key: HBASE-5744
 URL: https://issues.apache.org/jira/browse/HBASE-5744
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D2679.1.patch, D2685.1.patch, 
 jira-HBASE-5744-89-fb-Thrift-server-metrics-should-b-2012-04-07_21_39_35.patch


 As we measure our Thrift call latencies in nanoseconds, we need to make 
 latencies long instead of int everywhere.


[jira] [Updated] (HBASE-5618) SplitLogManager - prevent unnecessary attempts to resubmits

2012-04-06 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5618:
--

Status: Patch Available  (was: Open)

 SplitLogManager - prevent unnecessary attempts to resubmits
 ---

 Key: HBASE-5618
 URL: https://issues.apache.org/jira/browse/HBASE-5618
 Project: HBase
  Issue Type: Improvement
  Components: wal, zookeeper
Reporter: Prakash Khemani
Assignee: Prakash Khemani
 Attachments: 
 0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch, 
 0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch, 
 0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch, 
 0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch, 
 0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch


 Currently, once a watch fires indicating that the task node has been updated 
 (heartbeated) by the worker, the SplitLogManager still takes quite some time 
 before it updates the last-heard-from time. This is because the manager 
 currently schedules another getDataSetWatch() and only after that finishes 
 will it update the task's last-heard-from time.
 This leads to a large number of zk-BadVersion warnings when resubmission is 
 continuously attempted and fails.
 Two changes should be made:
 (1) On a resubmission failure because of BadVersion, the task's lastUpdate 
 time should be advanced.
 (2) The task's lastUpdate time should be advanced as soon as the 
 nodeDataChanged() watch fires, without waiting for getDataSetWatch() to 
 complete.
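
The two proposed changes can be modeled with a tiny piece of bookkeeping. This is a hedged sketch in plain Java, not the SplitLogManager code: the class `TaskHeartbeat`, its methods, and the timeout parameter are all hypothetical, and time is passed in explicitly so the logic stays easy to follow.

```java
// Hypothetical model of a split-log task's heartbeat bookkeeping.
// None of these names come from SplitLogManager.
final class TaskHeartbeat {
    private long lastUpdateMillis;

    TaskHeartbeat(long createdAtMillis) {
        this.lastUpdateMillis = createdAtMillis;
    }

    // Change (2): record the heartbeat as soon as the data-changed watch fires,
    // before any follow-up read of the node has completed.
    void onNodeDataChanged(long nowMillis) {
        lastUpdateMillis = nowMillis;
    }

    // Change (1): a BadVersion failure on resubmit implies the node was modified
    // since it was last read, so count it as a sign of life.
    void onResubmitBadVersion(long nowMillis) {
        lastUpdateMillis = nowMillis;
    }

    // A task is due for resubmission only if nothing was heard within the timeout.
    boolean isResubmitDue(long nowMillis, long timeoutMillis) {
        return nowMillis - lastUpdateMillis > timeoutMillis;
    }

    public static void main(String[] args) {
        TaskHeartbeat task = new TaskHeartbeat(0L);
        System.out.println(task.isResubmitDue(1_000L, 500L)); // stale
        task.onNodeDataChanged(900L);
        System.out.println(task.isResubmitDue(1_000L, 500L)); // fresh again
    }
}
```

Updating the timestamp in both callbacks keeps a live task from being resubmitted repeatedly while the manager is still waiting on its own read.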


[jira] [Updated] (HBASE-5730) [89-fb] Make HRegionThriftServer's thread pool bounded

2012-04-05 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5730:
--

Description: This JIRA is for a quick fix in 89-fb to reuse 
TBoundedThreadPoolServer in HRegionThriftServer. We will address whatever 
problems HRegionThriftServer still has in trunk in HBASE-5703.

 [89-fb] Make HRegionThriftServer's thread pool bounded
 --

 Key: HBASE-5730
 URL: https://issues.apache.org/jira/browse/HBASE-5730
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin

 This JIRA is for a quick fix in 89-fb to reuse TBoundedThreadPoolServer in 
 HRegionThriftServer. We will address whatever problems HRegionThriftServer 
 still has in trunk in HBASE-5703.


[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-03-27 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4218:
--

Release Note: Adds a block compression that stores the diff from the 
previous key only.  Good for big keys and small value datasets.  Makes writing 
and scanning slower but because the blocks compressed with this feature stay 
compressed when in memory up in the block cache, more data is cached.  Off by 
default (DATA_BLOCK_ENCODING=NONE on column descriptor).  To enable, set 
DATA_BLOCK_ENCODING to PREFIX, DIFF or FAST_DIFF on the column descriptor.  Set 
ENCODE_ON_DISK to true on column descriptor to have the encoding in place out 
in the hfile (on by default).  (was: Adds a block compression that stores the 
diff from the previous key only.  Good for big keys and small value datasets.  
Makes writing and scanning slower but because the blocks compressed with this 
feature stay compressed when in memory up in the block cache, more data is 
cached.  Off by default.  To enable, on the column descriptor set 
DATA_BLOCK_ENCODING to NONE, PREFIX, DIFF or FAST_DIFF.  Set ENCODE_ON_DISK to 
true on column descriptor to have the encoding in place out in the hfile (on by 
default).)

 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-2012-01-14.txt, 4218-v16.txt, 4218.txt, 
 D1659.1.patch, D1659.2.patch, D1659.3.patch, D447.1.patch, D447.10.patch, 
 D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, D447.15.patch, 
 D447.16.patch, D447.17.patch, D447.18.patch, D447.19.patch, D447.2.patch, 
 D447.20.patch, D447.21.patch, D447.22.patch, D447.23.patch, D447.24.patch, 
 D447.25.patch, D447.26.patch, D447.3.patch, D447.4.patch, D447.5.patch, 
 D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, 
 Data-block-encoding-2011-12-23.patch, 
 Delta-encoding-2012-01-17_11_09_09.patch, 
 Delta-encoding-2012-01-25_00_45_29.patch, 
 Delta-encoding-2012-01-25_16_32_14.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta-encoding.patch-2012-01-05_15_16_43.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44_copy.patch, 
 Delta-encoding.patch-2012-01-05_18_50_47.patch, 
 Delta-encoding.patch-2012-01-07_14_12_48.patch, 
 Delta-encoding.patch-2012-01-13_12_20_07.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general-purpose algorithms.
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as to speed up seeks within HFileBlocks. It should 
 improve performance a lot if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when the value is a counter.
 Initial tests on real data (key length = ~90 bytes, value length = 8 bytes) 
 show that I could achieve a decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 while having much better performance (20-80% faster decompression than 
 LZO). Moreover, it should allow far more efficient seeking, which should 
 improve performance a bit.
 It seems that simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase two important changes in design will be 
 needed:
 -solidify interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to uncompressed buffer in HFileBlock will have bad 
 performance
 -extend comparators to support comparison assuming that N first bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
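
The prefix compression idea in the description above can be shown on a toy scale. The sketch below is plain Java and only an assumption-laden illustration: it works on strings rather than the byte arrays HBase uses, it ignores values, timestamps and the other techniques mentioned, and the names `PrefixSketch` and `Entry` are invented here and are not part of any HBase encoder.

```java
import java.util.ArrayList;
import java.util.List;

// Toy prefix compression over sorted keys; not HBase's data block encoder.
final class PrefixSketch {

    // One encoded key: the number of leading characters shared with the
    // previous key, plus the remaining suffix.
    static final class Entry {
        final int shared;
        final String suffix;

        Entry(int shared, String suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    static List<Entry> encode(List<String> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        String prev = "";
        for (String key : sortedKeys) {
            int max = Math.min(prev.length(), key.length());
            int shared = 0;
            while (shared < max && prev.charAt(shared) == key.charAt(shared)) {
                shared++;
            }
            out.add(new Entry(shared, key.substring(shared)));
            prev = key;
        }
        return out;
    }

    static List<String> decode(List<Entry> entries) {
        List<String> keys = new ArrayList<>();
        String prev = "";
        for (Entry e : entries) {
            String key = prev.substring(0, e.shared) + e.suffix;
            keys.add(key);
            prev = key;
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        keys.add("row1/cf:a");
        keys.add("row1/cf:b");
        keys.add("row2/cf:a");
        for (Entry e : encode(keys)) {
            System.out.println(e.shared + " + \"" + e.suffix + "\"");
        }
    }
}
```

Because neighbouring keys in a sorted file tend to share long prefixes, most entries store only a short suffix, which is where the savings for long keys and short values come from.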


[jira] [Updated] (HBASE-5469) Add baseline compression efficiency to DataBlockEncodingTool

2012-03-23 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5469:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk.

 Add baseline compression efficiency to DataBlockEncodingTool
 

 Key: HBASE-5469
 URL: https://issues.apache.org/jira/browse/HBASE-5469
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D2409.1.patch, D2409.2.patch, 
 jira-HBASE-5469-Add-baseline-compression-efficiency--2012-03-23_15_04_41.patch


 DataBlockEncodingTool currently does not provide baseline compression 
 efficiency, e.g. a Hadoop compression codec applied to unencoded data. For 
 example, if we are using LZO to compress blocks, we would like to have the 
 following columns in the report (possibly as percentages of raw data size):
 Baseline K+V in block cache | Baseline K+V on disk (LZO compressed) | 
 K+V DataBlockEncoded in block cache | K+V DataBlockEncoded + LZO compressed 
 (on disk)
 Background: we never store compressed blocks in cache, but we always store 
 encoded data blocks in cache if data block encoding is enabled for the column 
 family.


[jira] [Updated] (HBASE-5469) Add baseline compression efficiency to DataBlockEncodingTool

2012-03-23 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5469:
--

Attachment: 
jira-HBASE-5469-Add-baseline-compression-efficiency--2012-03-23_15_04_41.patch

The exact patch that was committed.

 Add baseline compression efficiency to DataBlockEncodingTool
 

 Key: HBASE-5469
 URL: https://issues.apache.org/jira/browse/HBASE-5469
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D2409.1.patch, D2409.2.patch, 
 jira-HBASE-5469-Add-baseline-compression-efficiency--2012-03-23_15_04_41.patch


 DataBlockEncodingTool currently does not provide baseline compression 
 efficiency, e.g. a Hadoop compression codec applied to unencoded data. For 
 example, if we are using LZO to compress blocks, we would like to have the 
 following columns in the report (possibly as percentages of raw data size):
 Baseline K+V in block cache | Baseline K+V on disk (LZO compressed) | 
 K+V DataBlockEncoded in block cache | K+V DataBlockEncoded + LZO compressed 
 (on disk)
 Background: we never store compressed blocks in cache, but we always store 
 encoded data blocks in cache if data block encoding is enabled for the column 
 family.


[jira] [Updated] (HBASE-4607) Split log worker should terminate properly when waiting for znode

2012-03-22 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4607:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

The same changes committed in HBASE-5542.

 Split log worker should terminate properly when waiting for znode
 -

 Key: HBASE-4607
 URL: https://issues.apache.org/jira/browse/HBASE-4607
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Fix For: 0.94.0

 Attachments: 
 HBASE-4607_SplitLogWorker_should_correct-20111017231456-47a82ef3.patch


 This is an attempt to fix the fact that SplitLogWorker threads are not being 
 terminated properly in some unit tests. This probably does not happen in 
 production because the master always creates the log-splitting ZK node, but 
 it does happen in 89-fb. Thanks to Prakash Khemani for help on this.


[jira] [Updated] (HBASE-5521) Move compression/decompression to an encoder specific encoding context

2012-03-19 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5521:
--

Attachment: 
HBASE-5521-jira-Move-compression-decompression-to-an-2012-03-19_12_12_32.patch

Attaching what has been committed.

 Move compression/decompression to an encoder specific encoding context
 --

 Key: HBASE-5521
 URL: https://issues.apache.org/jira/browse/HBASE-5521
 Project: HBase
  Issue Type: Improvement
Reporter: He Yongqiang
Assignee: He Yongqiang
 Fix For: 0.96.0

 Attachments: 
 HBASE-5521-jira-Move-compression-decompression-to-an-2012-03-19_12_12_32.patch,
  HBASE-5521.1.patch, HBASE-5521.D2097.1.patch, HBASE-5521.D2097.10.patch, 
 HBASE-5521.D2097.2.patch, HBASE-5521.D2097.3.patch, HBASE-5521.D2097.4.patch, 
 HBASE-5521.D2097.5.patch, HBASE-5521.D2097.6.patch, HBASE-5521.D2097.7.patch, 
 HBASE-5521.D2097.8.patch, HBASE-5521.D2097.9.patch


 As part of working on HBASE-5313, we want to add a new columnar 
 encoder/decoder. It makes sense to move compression to be part of 
 encoder/decoder:
 1) a scanner for a columnar encoded block can do lazy decompression to a 
 specific part of a key value object
 2) avoid an extra byte copy from encoder to hblock-writer. 
 If there is no encoder specified for a writer, the HBlock.Writer will use a 
 default compression-context to do something very similar to today's code.


[jira] [Updated] (HBASE-5521) Move compression/decompression to an encoder specific encoding context

2012-03-19 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5521:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk.

 Move compression/decompression to an encoder specific encoding context
 --

 Key: HBASE-5521
 URL: https://issues.apache.org/jira/browse/HBASE-5521
 Project: HBase
  Issue Type: Improvement
Reporter: He Yongqiang
Assignee: He Yongqiang
 Fix For: 0.96.0

 Attachments: 
 HBASE-5521-jira-Move-compression-decompression-to-an-2012-03-19_12_12_32.patch,
  HBASE-5521.1.patch, HBASE-5521.D2097.1.patch, HBASE-5521.D2097.10.patch, 
 HBASE-5521.D2097.2.patch, HBASE-5521.D2097.3.patch, HBASE-5521.D2097.4.patch, 
 HBASE-5521.D2097.5.patch, HBASE-5521.D2097.6.patch, HBASE-5521.D2097.7.patch, 
 HBASE-5521.D2097.8.patch, HBASE-5521.D2097.9.patch


 As part of working on HBASE-5313, we want to add a new columnar 
 encoder/decoder. It makes sense to move compression to be part of 
 encoder/decoder:
 1) a scanner for a columnar encoded block can do lazy decompression to a 
 specific part of a key value object
 2) avoid an extra byte copy from the encoder to the hblock-writer. 
 If there is no encoder specified for a writer, the HBlock.Writer will use a 
 default compression-context to do something very similar to today's code.


[jira] [Updated] (HBASE-5575) Configure Arcanist lint engine for HBase

2012-03-16 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5575:
--

Attachment: Enabling-lint-2012-03-16_13_40_37.patch

 Configure Arcanist lint engine for HBase
 

 Key: HBASE-5575
 URL: https://issues.apache.org/jira/browse/HBASE-5575
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: Enabling-lint-2012-03-16_13_40_37.patch


 We need to enable the Arcanist lint engine in HBase, so that a commit can be 
 checked by running arc lint.


[jira] [Updated] (HBASE-5566) [89-fb] Region server can get stuck getMaster on master failover

2012-03-12 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5566:
--

Reporter: Prakash Khemani  (was: Mikhail Bautin)

 [89-fb] Region server can get stuck getMaster on master failover
 

 Key: HBASE-5566
 URL: https://issues.apache.org/jira/browse/HBASE-5566
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89-fb
Reporter: Prakash Khemani
Assignee: Mikhail Bautin

 Reported by Prakash. We have a retry loop in HRegionServer.getMaster where we 
 do not read the location of the master from ZK, so a region server can get 
 stuck there on master failover. We need to add a unit test to reliably catch 
 this, and fix the bug.
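
 A rough sketch of the intended fix, under stated assumptions: the master address is re-read on every iteration of the retry loop, so a failover is noticed. MasterLocator, readMasterAddress and tryConnect are hypothetical stand-ins for the ZK lookup and the RPC connection attempt. This is not the actual HRegionServer.getMaster code.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical sketch: refresh the master location inside the retry loop. */
final class MasterLocator {
    /**
     * @param readMasterAddress stand-in for reading the current master address from ZK
     * @param tryConnect        stand-in for a connection attempt; true on success
     * @param maxAttempts       upper bound on retries
     * @return the address that accepted the connection, or null if every attempt failed
     */
    static String findMaster(Supplier<String> readMasterAddress,
                             Predicate<String> tryConnect,
                             int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Re-read the address each time instead of reusing a stale one.
            String address = readMasterAddress.get();
            if (address != null && tryConnect.test(address)) {
                return address;
            }
        }
        return null;
    }
}
```

 A real loop would also sleep between attempts; that is omitted here to keep the sketch short.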


[jira] [Updated] (HBASE-5566) [89-fb] Region server can get stuck getMaster on master failover

2012-03-12 Thread Mikhail Bautin (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Mikhail Bautin updated HBASE-5566:
--

Description:
This is specific to the 89-fb master. We have a retry loop in
HRegionServer.getMaster where we do not read the location of the master from
ZK, so a region server can get stuck there on master failover. We need to add a
unit test to reliably catch this, and fix the bug.

was:
Reported by Prakash. We have a retry loop in HRegionServer.getMaster where we
do not read the location of the master from ZK, so a region server can get
stuck there on master failover. We need to add a unit test to reliably catch
this, and fix the bug.

[89-fb] Region server can get stuck getMaster on master failover

Key: HBASE-5566
URL: https://issues.apache.org/jira/browse/HBASE-5566
Project: HBase
Issue Type: Bug
Affects Versions: 0.89-fb
Reporter: Prakash Khemani
Assignee: Mikhail Bautin

This is specific to the 89-fb master. We have a retry loop in
HRegionServer.getMaster where we do not read the location of the master from
ZK, so a region server can get stuck there on master failover. We need to add
a unit test to reliably catch this, and fix the bug.


[jira] [Updated] (HBASE-5566) [89-fb] Region server can get stuck in getMaster on master failover

2012-03-12 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5566:
--

Summary: [89-fb] Region server can get stuck in getMaster on master 
failover  (was: [89-fb] Region server can get stuck getMaster on master 
failover)

 [89-fb] Region server can get stuck in getMaster on master failover
 ---

 Key: HBASE-5566
 URL: https://issues.apache.org/jira/browse/HBASE-5566
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89-fb
Reporter: Prakash Khemani
Assignee: Mikhail Bautin

 This is specific to the 89-fb master. We have a retry loop in 
 HRegionServer.getMaster where we do not read the location of the master from 
 ZK, so a region server can get stuck there on master failover. We need to add 
 a unit test to reliably catch this, and fix the bug.


[jira] [Updated] (HBASE-4542) add filter info to slow query logging

2012-03-09 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4542:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 add filter info to slow query logging
 -

 Key: HBASE-4542
 URL: https://issues.apache.org/jira/browse/HBASE-4542
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.89.20100924
Reporter: Kannan Muthukkaruppan
Assignee: Madhuwanti Vaidya
 Attachments: 
 0001-jira-HBASE-4542-Add-filter-info-to-slow-query-loggin.patch, 
 Add-filter-info-to-slow-query-logging-2012-03-06_14_28_13.patch, 
 D1263.2.patch, D1539.1.patch


 Slow query log doesn't report filters in effect.
 For example:
 {code}
 (operationTooSlow): \
 {processingtimems:3468,client:10.138.43.206:40035,timeRange: 
 [0,9223372036854775807],\
 starttimems:1317772005821,responsesize:42411, \
 class:HRegionServer,table:myTable,families:{CF1:[ALL]},\
 row:6c3b8efa132f0219b7621ed1e5c8c70b,queuetimems:0,\
 method:get,totalColumns:1,maxVersions:1,storeLimit:-1}
 {code}
 The above would suggest that all columns of myTable:CF1 are being requested 
 for the given row. But in reality there could be filters in effect (such as 
 ColumnPrefixFilter, ColumnRangeFilter, TimestampsFilter(), etc.). We should 
 enhance the slow query log to capture and report this information.
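
 A minimal sketch of what the enhanced entry could carry, assuming the log fields are assembled as a map before being printed. SlowQueryLogEntry and the "filter" field name are hypothetical and are not taken from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: include a description of the active filter in the log fields. */
final class SlowQueryLogEntry {
    static Map<String, Object> describe(String table, String row, String filterDescription) {
        // LinkedHashMap keeps the fields in insertion order for readable output.
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("table", table);
        fields.put("row", row);
        // Only report a filter when one is actually in effect.
        if (filterDescription != null) {
            fields.put("filter", filterDescription);
        }
        return fields;
    }
}
```

 With such a field present, a reader of the log can tell a full-family read from one narrowed by a filter.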


[jira] [Updated] (HBASE-4542) add filter info to slow query logging

2012-03-09 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4542:
--

Fix Version/s: 0.94.0

 add filter info to slow query logging
 -

 Key: HBASE-4542
 URL: https://issues.apache.org/jira/browse/HBASE-4542
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.89.20100924
Reporter: Kannan Muthukkaruppan
Assignee: Madhuwanti Vaidya
 Fix For: 0.94.0

 Attachments: 
 0001-jira-HBASE-4542-Add-filter-info-to-slow-query-loggin.patch, 
 Add-filter-info-to-slow-query-logging-2012-03-06_14_28_13.patch, 
 D1263.2.patch, D1539.1.patch


 Slow query log doesn't report filters in effect.
 For example:
 {code}
 (operationTooSlow): \
 {processingtimems:3468,client:10.138.43.206:40035,timeRange: 
 [0,9223372036854775807],\
 starttimems:1317772005821,responsesize:42411, \
 class:HRegionServer,table:myTable,families:{CF1:[ALL]},\
 row:6c3b8efa132f0219b7621ed1e5c8c70b,queuetimems:0,\
 method:get,totalColumns:1,maxVersions:1,storeLimit:-1}
 {code}
 The above would suggest that all columns of myTable:CF1 are being requested 
 for the given row. But in reality there could be filters in effect (such as 
 ColumnPrefixFilter, ColumnRangeFilter, TimestampsFilter(), etc.). We should 
 enhance the slow query log to capture and report this information.


[jira] [Updated] (HBASE-5292) getsize per-CF metric incorrectly counts compaction related reads as well

2012-03-09 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5292:
--

Attachment: 
jira-HBASE-5292-Prevent-counting-getSize-on-compacti-2012-03-09_13_26_52.patch

Rebased patch for Hadoop QA testing

 getsize per-CF metric incorrectly counts compaction related reads as well 
 --

 Key: HBASE-5292
 URL: https://issues.apache.org/jira/browse/HBASE-5292
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89.20100924
Reporter: Kannan Muthukkaruppan
 Attachments: 
 0001-jira-HBASE-5292-Prevent-counting-getSize-on-compacti.patch, 
 D1527.1.patch, D1527.2.patch, D1527.3.patch, D1527.4.patch, D1617.1.patch, 
 jira-HBASE-5292-Prevent-counting-getSize-on-compacti-2012-03-09_13_26_52.patch


 The per-CF getsize metric's intent was to track bytes returned (to HBase 
 clients) per-CF. [Note: We already have metrics to track the number of 
 HFileBlocks read for compaction vs. non-compaction cases -- e.g., 
 compactionblockreadcnt vs. fsblockreadcnt.]
 Currently, the getsize metric gets updated both for client-initiated 
 Get/Scan operations and for compaction-related reads. The metric is 
 updated in StoreScanner.java:next() when the Scan query matcher returns an 
 INCLUDE* code, via a call to:
  HRegion.incrNumericMetric(this.metricNameGetsize, copyKv.getLength());
 We should not do the above in case of compactions.
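
 A small sketch of the guard being asked for, assuming the scanner knows whether it is serving a compaction. GetSizeMetric and the isCompaction flag are hypothetical names. The real change lives in StoreScanner and may look different.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: count returned bytes only for client-initiated reads. */
final class GetSizeMetric {
    private final AtomicLong bytesReturned = new AtomicLong();

    /** Called when the query matcher includes a KeyValue of the given length. */
    void onKeyValueIncluded(int kvLength, boolean isCompaction) {
        // Compaction reads never reach a client, so they are not counted.
        if (!isCompaction) {
            bytesReturned.addAndGet(kvLength);
        }
    }

    long get() {
        return bytesReturned.get();
    }
}
```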


[jira] [Updated] (HBASE-5557) [89-fb] Fix incorrect reader/writer thread interaction in HBaseTest

2012-03-09 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5557:
--

Summary: [89-fb] Fix incorrect reader/writer thread interaction in 
HBaseTest  (was: [89-fb] Fix incorrect writer / thread interaction in HBaseTest)

 [89-fb] Fix incorrect reader/writer thread interaction in HBaseTest
 ---

 Key: HBASE-5557
 URL: https://issues.apache.org/jira/browse/HBASE-5557
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89-fb
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor

 In the HBaseTest load test there is a condition in which the writer has not 
 yet written any keys but the reader attempts to read key 0, resulting in a 
 failure. This bug is specific to 89-fb because it was fixed while 
 open-sourcing HBaseTest as LoadTestTool, and those improvements have not 
 yet been back-ported to 89-fb. Doing a temporary fix now; we will get to 
 the back-port later. 
 12/03/09 14:12:52 INFO utils.MultiThreadedReader: Key = 
 cfcd208495d565ef66e7dff9f98764da:0
 12/03/09 14:12:52 ERROR utils.MultiThreadedReader: No data returned, tried to 
 get actions for key = cfcd208495d565ef66e7dff9f98764da:0
 12/03/09 14:12:52 INFO utils.MultiThreadedReader: Key = 
 cfcd208495d565ef66e7dff9f98764da:0
 12/03/09 14:12:52 INFO utils.MultiThreadedReader: Key = 
 cfcd208495d565ef66e7dff9f98764da:0
 12/03/09 14:12:52 ERROR utils.MultiThreadedReader: No data returned, tried to 
 get actions for key = cfcd208495d565ef66e7dff9f98764da:0
 12/03/09 14:12:52 ERROR utils.MultiThreadedReader: No data returned, tried to 
 get actions for key = cfcd208495d565ef66e7dff9f98764da:0
 12/03/09 14:12:52 INFO utils.MultiThreadedReader: Key = 
 cfcd208495d565ef66e7dff9f98764da:0
 12/03/09 14:12:52 ERROR utils.MultiThreadedReader: No data returned, tried to 
 get actions for key = cfcd208495d565ef66e7dff9f98764da:0
 12/03/09 14:12:52 ERROR utils.MultiThreadedReader: Aborting run -- found more 
 than three errors
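
 One way to avoid the race, sketched under assumptions: the reader only picks keys below the count the writer has published, and skips reading when that count is still zero. KeyRange and its methods are hypothetical and do not mirror the HBaseTest or LoadTestTool code.

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: readers may only pick keys the writer has already written. */
final class KeyRange {
    // Number of keys written so far; keys 0..written-1 are safe to read.
    private final AtomicLong written = new AtomicLong(0);

    /** Called by the writer after each key has been stored. */
    void markWritten() {
        written.incrementAndGet();
    }

    /** Returns a key index that is safe to read, or -1 if nothing has been written yet. */
    long pickReadableKey(Random rng) {
        long limit = written.get();
        if (limit == 0) {
            return -1; // the reader should wait and retry instead of reading key 0
        }
        // nextDouble() is in [0, 1), so the result is in [0, limit - 1].
        return (long) (rng.nextDouble() * limit);
    }
}
```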


[jira] [Updated] (HBASE-5292) getsize per-CF metric incorrectly counts compaction related reads as well

2012-03-09 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5292:
--

   Resolution: Fixed
Fix Version/s: 0.94.0
   Status: Resolved  (was: Patch Available)

 getsize per-CF metric incorrectly counts compaction related reads as well 
 --

 Key: HBASE-5292
 URL: https://issues.apache.org/jira/browse/HBASE-5292
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89.20100924
Reporter: Kannan Muthukkaruppan
 Fix For: 0.94.0

 Attachments: 
 0001-jira-HBASE-5292-Prevent-counting-getSize-on-compacti.patch, 
 D1527.1.patch, D1527.2.patch, D1527.3.patch, D1527.4.patch, D1617.1.patch, 
 jira-HBASE-5292-Prevent-counting-getSize-on-compacti-2012-03-09_13_26_52.patch


 The per-CF getsize metric's intent was to track bytes returned (to HBase 
 clients) per-CF. [Note: We already have metrics to track # of HFileBlock's 
 read for compaction vs. non-compaction cases -- e.g., compactionblockreadcnt 
 vs. fsblockreadcnt.]
 Currently, the getsize metric gets updated for both client initiated 
 Get/Scan operations as well for compaction related reads. The metric is 
 updated in StoreScanner.java:next() when the Scan query matcher returns an 
 INCLUDE* code via a:
  HRegion.incrNumericMetric(this.metricNameGetsize, copyKv.getLength());
 We should not do the above in case of compactions.


[jira] [Updated] (HBASE-5535) Make the functions in task monitor synchronized

2012-03-08 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5535:
--

Attachment: 
HBASE-5535-Make-the-functions-in-task-monitor-synchr-2012-03-08_16_33_42.patch

Liyin's two-line patch from our internal 89-fb repository.

 Make the functions in task monitor synchronized
 ---

 Key: HBASE-5535
 URL: https://issues.apache.org/jira/browse/HBASE-5535
 Project: HBase
  Issue Type: Bug
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: 
 HBASE-5535-Make-the-functions-in-task-monitor-synchr-2012-03-08_16_33_42.patch


 There are some potential race conditions in the task monitor, so update the 
 functions in the task monitor to be synchronized.
 An example of a problem caused by the race condition:
 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flush failed for region 
 java.lang.IndexOutOfBoundsException: Index: 1745, Size: 1744
 at java.util.ArrayList.add(ArrayList.java:367)
 at java.util.SubList.add(AbstractList.java:633)
 at java.util.SubList.add(AbstractList.java:633)
 at java.util.SubList.add(AbstractList.java:633)
 at java.util.SubList.add(AbstractList.java:633)
 at java.util.SubList.add(AbstractList.java:633)
 at java.util.AbstractList.add(AbstractList.java:91)
 at org.apache.hadoop.hbase.monitoring.TaskMonitor.createStatus(TaskMonitor.java:74)
 at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1139)
 at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:260)
 at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:234)
 at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:146)
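
 A simplified sketch of the kind of change described, not the actual two-line patch: every method that touches the shared task list is synchronized on the monitor. SimpleTaskMonitor is a hypothetical stand-in for TaskMonitor.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a task monitor whose list is shared across threads. */
final class SimpleTaskMonitor {
    private final List<String> tasks = new ArrayList<>();

    /** Synchronized so concurrent flushes cannot corrupt the underlying ArrayList. */
    synchronized void createStatus(String description) {
        tasks.add(description);
    }

    /** Returns a copy so callers never iterate the live list outside the lock. */
    synchronized List<String> getTasks() {
        return new ArrayList<>(tasks);
    }
}
```

 Without the synchronized keyword, concurrent add() calls on a plain ArrayList can fail or lose entries, which is consistent with the IndexOutOfBoundsException in the trace above.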


[jira] [Updated] (HBASE-5535) Make the functions in task monitor synchronized

2012-03-08 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5535:
--

Status: Patch Available  (was: Open)

 Make the functions in task monitor synchronized
 ---

 Key: HBASE-5535
 URL: https://issues.apache.org/jira/browse/HBASE-5535
 Project: HBase
  Issue Type: Bug
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: 
 HBASE-5535-Make-the-functions-in-task-monitor-synchr-2012-03-08_16_33_42.patch


 There are some potential race conditions in the task monitor, so update the 
 functions in the task monitor to be synchronized.
 An example of a problem caused by the race condition:
 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flush failed for region 
 java.lang.IndexOutOfBoundsException: Index: 1745, Size: 1744
 at java.util.ArrayList.add(ArrayList.java:367)
 at java.util.SubList.add(AbstractList.java:633)
 at java.util.SubList.add(AbstractList.java:633)
 at java.util.SubList.add(AbstractList.java:633)
 at java.util.SubList.add(AbstractList.java:633)
 at java.util.SubList.add(AbstractList.java:633)
 at java.util.AbstractList.add(AbstractList.java:91)
 at org.apache.hadoop.hbase.monitoring.TaskMonitor.createStatus(TaskMonitor.java:74)
 at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1139)
 at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:260)
 at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:234)
 at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:146)


[jira] [Updated] (HBASE-4542) add filter info to slow query logging

2012-03-06 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4542:
--

Attachment: Add-filter-info-to-slow-query-logging-2012-03-06_14_28_13.patch

Rebasing patch on trunk

 add filter info to slow query logging
 -

 Key: HBASE-4542
 URL: https://issues.apache.org/jira/browse/HBASE-4542
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.89.20100924
Reporter: Kannan Muthukkaruppan
Assignee: Madhuwanti Vaidya
 Attachments: 
 0001-jira-HBASE-4542-Add-filter-info-to-slow-query-loggin.patch, 
 Add-filter-info-to-slow-query-logging-2012-03-06_14_28_13.patch, 
 D1263.2.patch, D1539.1.patch


 Slow query log doesn't report filters in effect.
 For example:
 {code}
 (operationTooSlow): \
 {processingtimems:3468,client:10.138.43.206:40035,timeRange: 
 [0,9223372036854775807],\
 starttimems:1317772005821,responsesize:42411, \
 class:HRegionServer,table:myTable,families:{CF1:[ALL]},\
 row:6c3b8efa132f0219b7621ed1e5c8c70b,queuetimems:0,\
 method:get,totalColumns:1,maxVersions:1,storeLimit:-1}
 {code}
 The above would suggest that all columns of myTable:CF1 are being requested 
 for the given row. But in reality there could be filters in effect (such as 
 ColumnPrefixFilter, ColumnRangeFilter, TimestampsFilter(), etc.). We should 
 enhance the slow query log to capture and report this information.


[jira] [Updated] (HBASE-5357) Use builder pattern in HColumnDescriptor

2012-02-23 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5357:
--

Attachment: 
Use-builder-pattern-for-HColumnDescriptor-20120223113155-e387d251.patch

 Use builder pattern in HColumnDescriptor
 

 Key: HBASE-5357
 URL: https://issues.apache.org/jira/browse/HBASE-5357
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D1851.1.patch, D1851.2.patch, D1851.3.patch, 
 D1851.4.patch, 
 Use-builder-pattern-for-HColumnDescriptor-2012-02-21_19_13_35.patch, 
 Use-builder-pattern-for-HColumnDescriptor-20120223113155-e387d251.patch


 We have five ways to create an HFile writer, two ways to create a StoreFile 
 writer, and the sets of parameters keep changing, creating a lot of 
 confusion, especially when porting patches across branches. The same thing is 
 happening to HColumnDescriptor. I think we should move to a builder pattern 
 solution, e.g.
 {code:java}
   HFileWriter w = HFile.getWriterBuilder(conf, some common args)
   .setParameter1(value1)
   .setParameter2(value2)
   ...
   .build();
 {code}
 Each parameter setter being on its own line will make merges/cherry-pick work 
 properly, we will not have to even mention default parameters again, and we 
 can eliminate a dozen impossible-to-remember constructors.
 This particular JIRA addresses the HColumnDescriptor refactoring. For 
 StoreFile/HFile refactoring see HBASE-5442.
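
 To make the proposal concrete, here is a self-contained sketch of the builder shape for a column-family style descriptor. ColumnFamilyConfig, its fields and its defaults are hypothetical and chosen only for illustration. They are not the HColumnDescriptor API.

```java
/** Hypothetical descriptor built through a fluent builder instead of many constructors. */
final class ColumnFamilyConfig {
    private final String name;
    private final int maxVersions;
    private final boolean inMemory;

    private ColumnFamilyConfig(Builder b) {
        this.name = b.name;
        this.maxVersions = b.maxVersions;
        this.inMemory = b.inMemory;
    }

    static Builder builder(String name) {
        return new Builder(name);
    }

    String getName() { return name; }
    int getMaxVersions() { return maxVersions; }
    boolean isInMemory() { return inMemory; }

    static final class Builder {
        private final String name;
        private int maxVersions = 3;      // illustrative default, set in one place
        private boolean inMemory = false; // illustrative default, set in one place

        private Builder(String name) {
            this.name = name;
        }

        // Each setter returns the builder so calls can sit on separate lines.
        Builder setMaxVersions(int maxVersions) {
            this.maxVersions = maxVersions;
            return this;
        }

        Builder setInMemory(boolean inMemory) {
            this.inMemory = inMemory;
            return this;
        }

        ColumnFamilyConfig build() {
            return new ColumnFamilyConfig(this);
        }
    }
}
```

 A caller that only cares about one parameter writes ColumnFamilyConfig.builder("cf").setMaxVersions(5).build() and never mentions the other defaults, so adding a parameter later changes one line per call site at most.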


[jira] [Updated] (HBASE-5357) Use builder pattern in HColumnDescriptor

2012-02-23 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5357:
--

Attachment: 
Use-builder-pattern-for-HColumnDescriptor-2012-02-23_12_42_25.patch

 Use builder pattern in HColumnDescriptor
 

 Key: HBASE-5357
 URL: https://issues.apache.org/jira/browse/HBASE-5357
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D1851.1.patch, D1851.2.patch, D1851.3.patch, 
 D1851.4.patch, 
 Use-builder-pattern-for-HColumnDescriptor-2012-02-21_19_13_35.patch, 
 Use-builder-pattern-for-HColumnDescriptor-2012-02-23_12_42_25.patch, 
 Use-builder-pattern-for-HColumnDescriptor-20120223113155-e387d251.patch


 We have five ways to create an HFile writer, two ways to create a StoreFile 
 writer, and the sets of parameters keep changing, creating a lot of 
 confusion, especially when porting patches across branches. The same thing is 
 happening to HColumnDescriptor. I think we should move to a builder pattern 
 solution, e.g.
 {code:java}
   HFileWriter w = HFile.getWriterBuilder(conf, some common args)
   .setParameter1(value1)
   .setParameter2(value2)
   ...
   .build();
 {code}
 Each parameter setter being on its own line will make merges/cherry-pick work 
 properly, we will not have to even mention default parameters again, and we 
 can eliminate a dozen impossible-to-remember constructors.
 This particular JIRA addresses the HColumnDescriptor refactoring. For 
 StoreFile/HFile refactoring see HBASE-5442.


[jira] [Updated] (HBASE-5357) Use builder pattern in HColumnDescriptor

2012-02-23 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5357:
--

Attachment: (was: 
Use-builder-pattern-for-HColumnDescriptor-2012-02-23_12_42_25.patch)

 Use builder pattern in HColumnDescriptor
 

 Key: HBASE-5357
 URL: https://issues.apache.org/jira/browse/HBASE-5357
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D1851.1.patch, D1851.2.patch, D1851.3.patch, 
 D1851.4.patch, 
 Use-builder-pattern-for-HColumnDescriptor-2012-02-21_19_13_35.patch, 
 Use-builder-pattern-for-HColumnDescriptor-2012-02-23_12_42_49.patch, 
 Use-builder-pattern-for-HColumnDescriptor-20120223113155-e387d251.patch


 We have five ways to create an HFile writer, two ways to create a StoreFile 
 writer, and the sets of parameters keep changing, creating a lot of 
 confusion, especially when porting patches across branches. The same thing is 
 happening to HColumnDescriptor. I think we should move to a builder pattern 
 solution, e.g.
 {code:java}
   HFileWriter w = HFile.getWriterBuilder(conf, some common args)
   .setParameter1(value1)
   .setParameter2(value2)
   ...
   .build();
 {code}
 Each parameter setter being on its own line will make merges/cherry-pick work 
 properly, we will not have to even mention default parameters again, and we 
 can eliminate a dozen impossible-to-remember constructors.
 This particular JIRA addresses the HColumnDescriptor refactoring. For 
 StoreFile/HFile refactoring see HBASE-5442.


[jira] [Updated] (HBASE-5357) Use builder pattern in HColumnDescriptor

2012-02-23 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5357:
--

Attachment: 
Use-builder-pattern-for-HColumnDescriptor-2012-02-23_12_42_49.patch

 Use builder pattern in HColumnDescriptor
 

 Key: HBASE-5357
 URL: https://issues.apache.org/jira/browse/HBASE-5357
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D1851.1.patch, D1851.2.patch, D1851.3.patch, 
 D1851.4.patch, 
 Use-builder-pattern-for-HColumnDescriptor-2012-02-21_19_13_35.patch, 
 Use-builder-pattern-for-HColumnDescriptor-2012-02-23_12_42_49.patch, 
 Use-builder-pattern-for-HColumnDescriptor-20120223113155-e387d251.patch


 We have five ways to create an HFile writer, two ways to create a StoreFile 
 writer, and the sets of parameters keep changing, creating a lot of 
 confusion, especially when porting patches across branches. The same thing is 
 happening to HColumnDescriptor. I think we should move to a builder pattern 
 solution, e.g.
 {code:java}
   HFileWriter w = HFile.getWriterBuilder(conf, some common args)
   .setParameter1(value1)
   .setParameter2(value2)
   ...
   .build();
 {code}
 Each parameter setter being on its own line will make merges/cherry-pick work 
 properly, we will not have to even mention default parameters again, and we 
 can eliminate a dozen impossible-to-remember constructors.
 This particular JIRA addresses the HColumnDescriptor refactoring. For 
 StoreFile/HFile refactoring see HBASE-5442.


[jira] [Updated] (HBASE-5442) Use builder pattern in StoreFile and HFile

2012-02-23 Thread Mikhail Bautin (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Mikhail Bautin updated HBASE-5442:
--

Resolution: Fixed
Fix Version/s: 0.94.0
Status: Resolved (was: Patch Available)

Committed to trunk.

Use builder pattern in StoreFile and HFile
--

Key: HBASE-5442
URL: https://issues.apache.org/jira/browse/HBASE-5442
Project: HBase
Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Fix For: 0.94.0

Attachments: D1893.1.patch, D1893.2.patch,
HFile-StoreFile-builder-2012-02-22_22_49_00.patch

We have five ways to create an HFile writer, two ways to create a StoreFile
writer, and the sets of parameters keep changing, creating a lot of
confusion, especially when porting patches across branches. The same thing is
happening to HColumnDescriptor. I think we should move to a builder pattern
solution, e.g.
{code:java}
HFileWriter w = HFile.getWriterBuilder(conf, some common args)
.setParameter1(value1)
.setParameter2(value2)
...
.build();
{code}
Each parameter setter being on its own line will make merges/cherry-pick work
properly, we will not have to even mention default parameters again, and we
can eliminate a dozen impossible-to-remember constructors.
This particular JIRA addresses StoreFile and HFile refactoring. For
HColumnDescriptor refactoring see HBASE-5357.

[jira] [Updated] (HBASE-5442) Use builder pattern in StoreFile and HFile

2012-02-22 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5442:
--

Attachment: HFile-StoreFile-builder-2012-02-22_22_49_00.patch

 Use builder pattern in StoreFile and HFile
 --

 Key: HBASE-5442
 URL: https://issues.apache.org/jira/browse/HBASE-5442
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D1893.1.patch, D1893.2.patch, 
 HFile-StoreFile-builder-2012-02-22_22_49_00.patch


 We have five ways to create an HFile writer, two ways to create a StoreFile 
 writer, and the sets of parameters keep changing, creating a lot of 
 confusion, especially when porting patches across branches. The same thing is 
 happening to HColumnDescriptor. I think we should move to a builder pattern 
 solution, e.g.
 {code:java}
   HFileWriter w = HFile.getWriterBuilder(conf, some common args)
   .setParameter1(value1)
   .setParameter2(value2)
   ...
   .build();
 {code}
 Each parameter setter being on its own line will make merges/cherry-pick work 
 properly, we will not have to even mention default parameters again, and we 
 can eliminate a dozen impossible-to-remember constructors.
 This particular JIRA addresses StoreFile and HFile refactoring. For 
 HColumnDescriptor refactoring see HBASE-5357.


[jira] [Updated] (HBASE-5442) Use builder pattern in StoreFile and HFile

2012-02-22 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5442:
--

Status: Patch Available  (was: Open)

 Use builder pattern in StoreFile and HFile
 --

 Key: HBASE-5442
 URL: https://issues.apache.org/jira/browse/HBASE-5442
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D1893.1.patch, D1893.2.patch, 
 HFile-StoreFile-builder-2012-02-22_22_49_00.patch


 We have five ways to create an HFile writer, two ways to create a StoreFile 
 writer, and the sets of parameters keep changing, creating a lot of 
 confusion, especially when porting patches across branches. The same thing is 
 happening to HColumnDescriptor. I think we should move to a builder pattern 
 solution, e.g.
 {code:java}
   HFileWriter w = HFile.getWriterBuilder(conf, some common args)
   .setParameter1(value1)
   .setParameter2(value2)
   ...
   .build();
 {code}
 Each parameter setter being on its own line will make merges/cherry-pick work 
 properly, we will not have to even mention default parameters again, and we 
 can eliminate a dozen impossible-to-remember constructors.
 This particular JIRA addresses StoreFile and HFile refactoring. For 
 HColumnDescriptor refactoring see HBASE-5357.


[jira] [Updated] (HBASE-5357) Use builder pattern in StoreFile, HFile, and HColumnDescriptor instantiation

2012-02-21 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5357:
--

Attachment: 
Use-builder-pattern-for-HColumnDescriptor-2012-02-21_19_13_35.patch

 Use builder pattern in StoreFile, HFile, and HColumnDescriptor instantiation
 

 Key: HBASE-5357
 URL: https://issues.apache.org/jira/browse/HBASE-5357
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D1851.1.patch, 
 Use-builder-pattern-for-HColumnDescriptor-2012-02-21_19_13_35.patch


 We have five ways to create an HFile writer, two ways to create a StoreFile 
 writer, and the sets of parameters keep changing, creating a lot of 
 confusion, especially when porting patches across branches. The same thing is 
 happening to HColumnDescriptor. I think we should move to a builder pattern 
 solution, e.g.
 {code:java}
   HFileWriter w = HFile.getWriterBuilder(conf, some common args)
   .setParameter1(value1)
   .setParameter2(value2)
   ...
   .build();
 {code}
 Each parameter setter being on its own line will make merges/cherry-pick work 
 properly, we will not have to even mention default parameters again, and we 
 can eliminate a dozen impossible-to-remember constructors.


[jira] [Updated] (HBASE-5357) Use builder pattern in StoreFile, HFile, and HColumnDescriptor instantiation

2012-02-21 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5357:
--

Status: Patch Available  (was: Open)

 Use builder pattern in StoreFile, HFile, and HColumnDescriptor instantiation
 

 Key: HBASE-5357
 URL: https://issues.apache.org/jira/browse/HBASE-5357
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D1851.1.patch, 
 Use-builder-pattern-for-HColumnDescriptor-2012-02-21_19_13_35.patch


 We have five ways to create an HFile writer, two ways to create a StoreFile 
 writer, and the sets of parameters keep changing, creating a lot of 
 confusion, especially when porting patches across branches. The same thing is 
 happening to HColumnDescriptor. I think we should move to a builder pattern 
 solution, e.g.
 {code:java}
   HFileWriter w = HFile.getWriterBuilder(conf, some common args)
   .setParameter1(value1)
   .setParameter2(value2)
   ...
   .build();
 {code}
 Each parameter setter being on its own line will make merges/cherry-pick work 
 properly, we will not have to even mention default parameters again, and we 
 can eliminate a dozen impossible-to-remember constructors.


[jira] [Updated] (HBASE-5357) Use builder pattern in HColumnDescriptor

2012-02-21 Thread Mikhail Bautin (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Mikhail Bautin updated HBASE-5357:
--

Description:
We have five ways to create an HFile writer, two ways to create a StoreFile
writer, and the sets of parameters keep changing, creating a lot of confusion,
especially when porting patches across branches. The same thing is happening to
HColumnDescriptor. I think we should move to a builder pattern solution, e.g.

{code:java}
HFileWriter w = HFile.getWriterBuilder(conf, some common args)
.setParameter1(value1)
.setParameter2(value2)
...
.build();
{code}

Each parameter setter being on its own line will make merges/cherry-pick work
properly, we will not have to even mention default parameters again, and we can
eliminate a dozen impossible-to-remember constructors.

This particular JIRA addresses the HColumnDescriptor refactoring. For
StoreFile/HFile refactoring see HBASE-5442.

was:
We have five ways to create an HFile writer, two ways to create a StoreFile
writer, and the sets of parameters keep changing, creating a lot of confusion,
especially when porting patches across branches. The same thing is happening to
HColumnDescriptor. I think we should move to a builder pattern solution, e.g.

{code:java}
HFileWriter w = HFile.getWriterBuilder(conf, some common args)
.setParameter1(value1)
.setParameter2(value2)
...
.build();
{code}

Summary: Use builder pattern in HColumnDescriptor (was: Use builder
pattern in StoreFile, HFile, and HColumnDescriptor instantiation)

Use builder pattern in HColumnDescriptor

Key: HBASE-5357
URL: https://issues.apache.org/jira/browse/HBASE-5357
Project: HBase
Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Attachments: D1851.1.patch,
Use-builder-pattern-for-HColumnDescriptor-2012-02-21_19_13_35.patch

We have five ways to create an HFile writer, two ways to create a StoreFile
writer, and the sets of parameters keep changing, creating a lot of
confusion, especially when porting patches across branches. The same thing is
happening to HColumnDescriptor. I think we should move to a builder pattern
solution, e.g.
{code:java}
HFileWriter w = HFile.getWriterBuilder(conf, some common args)
.setParameter1(value1)
.setParameter2(value2)
...
.build();
{code}
Each parameter setter being on its own line will make merges/cherry-pick work
properly, we will not have to even mention default parameters again, and we
can eliminate a dozen impossible-to-remember constructors.
This particular JIRA addresses the HColumnDescriptor refactoring. For
StoreFile/HFile refactoring see HBASE-5442.

[jira] [Updated] (HBASE-5387) Reuse compression streams in HFileBlock.Writer

2012-02-12 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5387:
--

Status: Patch Available  (was: Open)

 Reuse compression streams in HFileBlock.Writer
 --

 Key: HBASE-5387
 URL: https://issues.apache.org/jira/browse/HBASE-5387
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5387.txt, D1719.1.patch, D1719.2.patch, D1719.3.patch, 
 D1719.4.patch, D1719.5.patch, Fix-deflater-leak-2012-02-10_18_48_45.patch, 
 Fix-deflater-leak-2012-02-11_17_13_10.patch, 
 Fix-deflater-leak-2012-02-12_00_37_27.patch


 We need to reuse compression streams in HFileBlock.Writer instead of 
 allocating them every time. The motivation is that when using Java's built-in 
 implementation of Gzip, we allocate a new GZIPOutputStream object and an 
 associated native data structure every time we create a compression stream. 
 The native data structure is only deallocated in the finalizer. This is one 
 suspected cause of recent TestHFileBlock failures on Hadoop QA: 
 https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.
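
A rough sketch of the reuse idea described above, not the code from the attached patches. It keeps one java.util.zip.Deflater alive, resets it between blocks, and releases its native memory explicitly with end() instead of waiting for a finalizer. The class name ReusableBlockCompressor and the inflate helper are assumptions for this example, and the output is raw zlib/deflate data rather than gzip-framed data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;

// Hypothetical helper; not an HBase class.
final class ReusableBlockCompressor implements AutoCloseable {
    // One Deflater (and one native zlib structure) for the lifetime of the writer.
    private final Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    byte[] compress(byte[] block) {
        deflater.reset();  // discard state left over from the previous block
        buffer.reset();
        // A DeflaterOutputStream given a caller-supplied Deflater does not end() it on close.
        try (DeflaterOutputStream out = new DeflaterOutputStream(buffer, deflater)) {
            out.write(block);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Free the native memory deterministically rather than in a finalizer.
    @Override
    public void close() {
        deflater.end();
    }

    // Round-trip helper for checking the output; needs the original length.
    static byte[] inflate(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] out = new byte[originalLength];
            int n = 0;
            while (n < originalLength && !inflater.finished()) {
                int r = inflater.inflate(out, n, originalLength - n);
                if (r == 0) {
                    break;  // no progress possible; stop instead of spinning
                }
                n += r;
            }
            return Arrays.copyOf(out, n);
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
    }
}
```

The point of the design is that the number of native allocations is tied to the number of writers rather than the number of blocks written.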


[jira] [Updated] (HBASE-5387) Reuse compression streams in HFileBlock.Writer

2012-02-12 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5387:
--

Status: Open  (was: Patch Available)

 Reuse compression streams in HFileBlock.Writer
 --

 Key: HBASE-5387
 URL: https://issues.apache.org/jira/browse/HBASE-5387
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5387.txt, D1719.1.patch, D1719.2.patch, D1719.3.patch, 
 D1719.4.patch, D1719.5.patch, Fix-deflater-leak-2012-02-10_18_48_45.patch, 
 Fix-deflater-leak-2012-02-11_17_13_10.patch, 
 Fix-deflater-leak-2012-02-12_00_37_27.patch


 We need to reuse compression streams in HFileBlock.Writer instead of 
 allocating them every time. The motivation is that when using Java's built-in 
 implementation of Gzip, we allocate a new GZIPOutputStream object and an 
 associated native data structure every time we create a compression stream. 
 The native data structure is only deallocated in the finalizer. This is one 
 suspected cause of recent TestHFileBlock failures on Hadoop QA: 
 https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.


[jira] [Updated] (HBASE-5387) Reuse compression streams in HFileBlock.Writer

2012-02-12 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5387:
--

Attachment: Fix-deflater-leak-2012-02-12_00_37_27.patch

 Reuse compression streams in HFileBlock.Writer
 --

 Key: HBASE-5387
 URL: https://issues.apache.org/jira/browse/HBASE-5387
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5387.txt, D1719.1.patch, D1719.2.patch, D1719.3.patch, 
 D1719.4.patch, D1719.5.patch, Fix-deflater-leak-2012-02-10_18_48_45.patch, 
 Fix-deflater-leak-2012-02-11_17_13_10.patch, 
 Fix-deflater-leak-2012-02-12_00_37_27.patch


 We need to reuse compression streams in HFileBlock.Writer instead of 
 allocating them every time. The motivation is that when using Java's built-in 
 implementation of Gzip, we allocate a new GZIPOutputStream object and an 
 associated native data structure every time we create a compression stream. 
 The native data structure is only deallocated in the finalizer. This is one 
 suspected cause of recent TestHFileBlock failures on Hadoop QA: 
 https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.


[jira] [Updated] (HBASE-5387) Reuse compression streams in HFileBlock.Writer

2012-02-11 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5387:
--

Attachment: Fix-deflater-leak-2012-02-11_17_13_10.patch

 Reuse compression streams in HFileBlock.Writer
 --

 Key: HBASE-5387
 URL: https://issues.apache.org/jira/browse/HBASE-5387
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Critical
 Fix For: 0.94.0

 Attachments: D1719.1.patch, D1719.2.patch, 
 Fix-deflater-leak-2012-02-10_18_48_45.patch, 
 Fix-deflater-leak-2012-02-11_17_13_10.patch


 We need to reuse compression streams in HFileBlock.Writer instead of 
 allocating them every time. The motivation is that when using Java's built-in 
 implementation of Gzip, we allocate a new GZIPOutputStream object and an 
 associated native data structure every time we create a compression stream. 
 The native data structure is only deallocated in the finalizer. This is one 
 suspected cause of recent TestHFileBlock failures on Hadoop QA: 
 https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.


[jira] [Updated] (HBASE-5369) Compaction selection based on the hotness of the HFile's block in the block cache

2012-02-10 Thread Mikhail Bautin (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Mikhail Bautin updated HBASE-5369:
--

Description:
HBase reserves a large amount of memory for the block cache, and cached blocks
are aged out in an LRU fashion. Obviously, we don't want to age out blocks
that are still hot. However, when a compaction starts, these hot blocks may
naturally be invalidated. Since the block cache already knows which HFiles
these hot blocks come from, the compaction selection algorithm could simply
skip compacting these HFiles until their cached blocks become cold.

was:
HBase reserves a large amount of memory for the block cache, and cached blocks
are aged out in an LRU fashion. Obviously, we don't want to age out blocks
that are still hot. However, when a compaction starts, these hot blocks may
naturally be invalidated. Since the block cache already knows which HFiles
these hot blocks come from, the compaction selection algorithm could simply
skip compacting these HFiles until their cached blocks become cold.
Furthermore, HBase could compact multiple HFiles into two HFiles, one of
which contains only hot blocks that are supposed to be cached directly.

Compaction selection based on the hotness of the HFile's block in the block
cache
-

Key: HBASE-5369
URL: https://issues.apache.org/jira/browse/HBASE-5369
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

HBase reserves a large amount of memory for the block cache, and cached blocks
are aged out in an LRU fashion. Obviously, we don't want to age out blocks
that are still hot. However, when a compaction starts, these hot blocks may
naturally be invalidated. Since the block cache already knows which HFiles
these hot blocks come from, the compaction selection algorithm could simply
skip compacting these HFiles until their cached blocks become cold.
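
One way to picture the selection rule proposed above, as a sketch under stated assumptions: the per-file count of cached blocks and the hotThreshold cutoff are hypothetical inputs invented for this example, and nothing here reflects an actual HBase compaction policy or block cache API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not an HBase class.
final class HotnessAwareSelectionSketch {
    /**
     * Returns the candidate files that are cold enough to compact.
     * cachedBlocksPerFile maps a file name to the number of its blocks
     * currently in the block cache (assumed to be available from the cache).
     */
    static List<String> selectColdFiles(List<String> candidates,
                                        Map<String, Integer> cachedBlocksPerFile,
                                        int hotThreshold) {
        List<String> selected = new ArrayList<>();
        for (String file : candidates) {
            // Files with no entry are treated as having nothing cached.
            int cached = cachedBlocksPerFile.getOrDefault(file, 0);
            if (cached < hotThreshold) {
                selected.add(file);  // cold: compacting it invalidates little cached data
            }
            // Hot files are skipped until their cached blocks age out.
        }
        return selected;
    }
}
```

A real policy would also need a bound on how long a hot file can be skipped, since deferring compaction indefinitely lets the number of HFiles grow.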

[jira] [Updated] (HBASE-5382) Test that we always cache index and bloom blocks

2012-02-10 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5382:
--

Status: Patch Available  (was: Open)

 Test that we always cache index and bloom blocks
 

 Key: HBASE-5382
 URL: https://issues.apache.org/jira/browse/HBASE-5382
 Project: HBase
  Issue Type: Test
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: TestForceCacheImportantBlocks-2012-02-10_11_07_15.patch


 This is a unit test that should have been part of HBASE-4683 but was not 
 committed. The original test was reviewed https://reviews.facebook.net/D807. 
 Submitting unit test as a separate JIRA and patch, and extending the scope of 
 the test to also handle the case when block cache is enabled for the column 
 family. The new review is at https://reviews.facebook.net/D1695.


[jira] [Updated] (HBASE-5382) Test that we always cache index and bloom blocks

2012-02-10 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5382:
--

Assignee: Mikhail Bautin

 Test that we always cache index and bloom blocks
 

 Key: HBASE-5382
 URL: https://issues.apache.org/jira/browse/HBASE-5382
 Project: HBase
  Issue Type: Test
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: TestForceCacheImportantBlocks-2012-02-10_11_07_15.patch


 This is a unit test that should have been part of HBASE-4683 but was not 
 committed. The original test was reviewed https://reviews.facebook.net/D807. 
 Submitting unit test as a separate JIRA and patch, and extending the scope of 
 the test to also handle the case when block cache is enabled for the column 
 family. The new review is at https://reviews.facebook.net/D1695.


[jira] [Updated] (HBASE-5382) Test that we always cache index and bloom blocks

2012-02-10 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5382:
--

Attachment: TestForceCacheImportantBlocks-2012-02-10_11_07_15.patch

 Test that we always cache index and bloom blocks
 

 Key: HBASE-5382
 URL: https://issues.apache.org/jira/browse/HBASE-5382
 Project: HBase
  Issue Type: Test
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: TestForceCacheImportantBlocks-2012-02-10_11_07_15.patch


 This is a unit test that should have been part of HBASE-4683 but was not 
 committed. The original test was reviewed https://reviews.facebook.net/D807. 
 Submitting unit test as a separate JIRA and patch, and extending the scope of 
 the test to also handle the case when block cache is enabled for the column 
 family. The new review is at https://reviews.facebook.net/D1695.


[jira] [Updated] (HBASE-5382) Test that we always cache index and bloom blocks

2012-02-10 Thread Mikhail Bautin (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Mikhail Bautin updated HBASE-5382:
--

Description: This is a unit test that should have been part of HBASE-4683
but was not committed. The original test was reviewed as part of
https://reviews.facebook.net/D807. Submitting unit test as a separate JIRA and
patch, and extending the scope of the test to also handle the case when block
cache is enabled for the column family. The new review is at
https://reviews.facebook.net/D1695. (was: This is a unit test that should have
been part of HBASE-4683 but was not committed. The original test was reviewed
https://reviews.facebook.net/D807. Submitting unit test as a separate JIRA and
patch, and extending the scope of the test to also handle the case when block
cache is enabled for the column family. The new review is at
https://reviews.facebook.net/D1695.)

Test that we always cache index and bloom blocks

Key: HBASE-5382
URL: https://issues.apache.org/jira/browse/HBASE-5382
Project: HBase
Issue Type: Test
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Attachments: TestForceCacheImportantBlocks-2012-02-10_11_07_15.patch

This is a unit test that should have been part of HBASE-4683 but was not
committed. The original test was reviewed as part of
https://reviews.facebook.net/D807. Submitting unit test as a separate JIRA
and patch, and extending the scope of the test to also handle the case when
block cache is enabled for the column family. The new review is at
https://reviews.facebook.net/D1695.

[jira] [Updated] (HBASE-5387) Reuse compression streams in HFileBlock.Writer

2012-02-10 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5387:
--

Attachment: Fix-deflater-leak-2012-02-10_18_48_45.patch

 Reuse compression streams in HFileBlock.Writer
 --

 Key: HBASE-5387
 URL: https://issues.apache.org/jira/browse/HBASE-5387
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
 Attachments: Fix-deflater-leak-2012-02-10_18_48_45.patch


 We need to reuse compression streams in HFileBlock.Writer instead of 
 allocating them every time. The motivation is that when using Java's built-in 
 implementation of Gzip, we allocate a new GZIPOutputStream object and an 
 associated native data structure any time. This is one suspected cause of 
 recent TestHFileBlock failures on Hadoop QA: 
 https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.


[jira] [Updated] (HBASE-5230) Ensure compactions do not cache-on-write data blocks

2012-02-09 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5230:
--

  Resolution: Fixed
Release Note: Committed into both trunk and 89-fb
  Status: Resolved  (was: Patch Available)

 Ensure compactions do not cache-on-write data blocks
 

 Key: HBASE-5230
 URL: https://issues.apache.org/jira/browse/HBASE-5230
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D1353.1.patch, D1353.2.patch, D1353.3.patch, 
 D1353.4.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-21_00_53_54.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-23_10_23_45.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-23_15_27_23.patch


 Create a unit test for HBASE-3976 (making sure we don't cache data blocks on 
 write during compactions even if cache-on-write is generally 
 enabled). This is because we have very different implementations of 
 HBASE-3976 without HBASE-4422 CacheConfig (on top of 89-fb, created by Liyin) 
 and with CacheConfig (presumably it's there but not sure if it even works, 
 since the patch in HBASE-3976 may not have been committed). We need to create 
 a unit test to verify that we don't cache data blocks on write during 
 compactions, and resolve HBASE-3976 so that this new unit test does not fail.


[jira] [Updated] (HBASE-5230) Ensure compactions do not cache-on-write data blocks

2012-02-09 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5230:
--

Release Note:   (was: Committed into both trunk and 89-fb)

 Ensure compactions do not cache-on-write data blocks
 

 Key: HBASE-5230
 URL: https://issues.apache.org/jira/browse/HBASE-5230
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D1353.1.patch, D1353.2.patch, D1353.3.patch, 
 D1353.4.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-21_00_53_54.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-23_10_23_45.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-23_15_27_23.patch


 Create a unit test for HBASE-3976 (making sure we don't cache data blocks on 
 write during compactions even if cache-on-write is generally 
 enabled). This is because we have very different implementations of 
 HBASE-3976 without HBASE-4422 CacheConfig (on top of 89-fb, created by Liyin) 
 and with CacheConfig (presumably it's there but not sure if it even works, 
 since the patch in HBASE-3976 may not have been committed). We need to create 
 a unit test to verify that we don't cache data blocks on write during 
 compactions, and resolve HBASE-3976 so that this new unit test does not fail.


[jira] [Updated] (HBASE-5357) Use builder pattern in StoreFile, HFile, and HColumnDescriptor instantiation

2012-02-08 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5357:
--

Description: 
We have five ways to create an HFile writer, two ways to create a StoreFile 
writer, and the sets of parameters keep changing, creating a lot of confusion, 
especially when porting patches across branches. The same thing is happening to 
HColumnDescriptor. I think we should move to a builder pattern solution, e.g.

{code:java}
  HFileWriter w = HFile.getWriterBuilder(conf, some common args)
  .setParameter1(value1)
  .setParameter2(value2)
  ...
  .build();
{code}

Each parameter setter being on its own line will make merges/cherry-pick work 
properly, we will not have to even mention default parameters again, and we can 
eliminate a dozen impossible-to-remember constructors.


  was:
We have five ways to create an HFile writer, two ways to create a StoreFile 
writer, and the sets of parameters keep changing, creating a lot of confusion, 
especially when porting patches across branches. The same thing is happening to 
HColumnDescriptor. I think we should move to a builder pattern solution, e.g.

{code:java}
  HFileWriter w = HFile.getWriterBuilder(conf, some common args)
  .setParameter1(value1)
  .setParameter2(value2)
  ...
  .instantiate();
{code}

Each parameter setter being on the same line will make merges/cherry-pick work 
properly, we will not have to even mention default parameters again, and we can 
eliminate a dozen impossible-to-remember constructors.



 Use builder pattern in StoreFile, HFile, and HColumnDescriptor instantiation
 

 Key: HBASE-5357
 URL: https://issues.apache.org/jira/browse/HBASE-5357
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin

 We have five ways to create an HFile writer, two ways to create a StoreFile 
 writer, and the sets of parameters keep changing, creating a lot of 
 confusion, especially when porting patches across branches. The same thing is 
 happening to HColumnDescriptor. I think we should move to a builder pattern 
 solution, e.g.
 {code:java}
   HFileWriter w = HFile.getWriterBuilder(conf, some common args)
   .setParameter1(value1)
   .setParameter2(value2)
   ...
   .build();
 {code}
 Each parameter setter being on its own line will make merges/cherry-pick work 
 properly, we will not have to even mention default parameters again, and we 
 can eliminate a dozen impossible-to-remember constructors.


[jira] [Updated] (HBASE-5010) Filter HFiles based on TTL

2012-01-26 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5010:
--

   Resolution: Fixed
Fix Version/s: 0.94.0
 Assignee: Mikhail Bautin  (was: Zhihong Yu)
   Status: Resolved  (was: Patch Available)

A follow-up fix was submitted as part of HBASE-5274 to bring the trunk fix for 
this issue to parity with the 89-fb fix. Resolving.

 Filter HFiles based on TTL
 --

 Key: HBASE-5010
 URL: https://issues.apache.org/jira/browse/HBASE-5010
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Fix For: 0.94.0

 Attachments: 5010.patch, D1017.1.patch, D1017.2.patch, D909.1.patch, 
 D909.2.patch, D909.3.patch, D909.4.patch, D909.5.patch, D909.6.patch


 In ScanWildcardColumnTracker we have
 {code:java}
  
   this.oldestStamp = EnvironmentEdgeManager.currentTimeMillis() - ttl;
   ...
   private boolean isExpired(long timestamp) {
     return timestamp < oldestStamp;
   }
 {code}
 but this time range filtering does not participate in HFile selection. In one 
 real case this caused next() calls to time out because all KVs in a table got 
 expired, but next() had to iterate over the whole table to find that out. We 
 should be able to filter out those HFiles right away. I think a reasonable 
 approach is to add a default timerange filter to every scan for a CF with a 
 finite TTL and utilize existing filtering in 
 StoreFile.Reader.passesTimerangeFilter.
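
The file-level check proposed above can be sketched as follows. This is an illustration only: `FileInfo` and `TtlFileFilter` are hypothetical names, and the sketch assumes a cell counts as expired when its timestamp is less than `now - ttl`. It does not reproduce the signature of StoreFile.Reader.passesTimerangeFilter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-file metadata: the newest cell timestamp stored in the file. */
final class FileInfo {
    final String name;
    final long maxTimestamp;

    FileInfo(String name, long maxTimestamp) {
        this.name = name;
        this.maxTimestamp = maxTimestamp;
    }
}

final class TtlFileFilter {
    /** Returns only the files that may still contain unexpired cells. */
    static List<FileInfo> selectFiles(List<FileInfo> files, long nowMillis, long ttlMillis) {
        long oldestStamp = nowMillis - ttlMillis;
        List<FileInfo> selected = new ArrayList<>();
        for (FileInfo f : files) {
            // If even the newest cell is older than oldestStamp, every cell in
            // the file is expired and the file can be skipped without reading it.
            if (f.maxTimestamp >= oldestStamp) {
                selected.add(f);
            }
        }
        return selected;
    }
}
```

With this check, a scan over a table whose cells have all expired selects no files at all, instead of iterating over every KV to discover that nothing is live.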


[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-25 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4218:
--

Attachment: Delta-encoding-2012-01-25_00_45_29.patch

Submitting for Jenkins testing. This corresponds to the latest patch on 
Phabricator: https://reviews.facebook.net/D447?vs=id=4407whitespace=ignore-all


 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-2012-01-14.txt, 4218-v16.txt, 4218.txt, 
 D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, 
 D447.14.patch, D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, 
 D447.19.patch, D447.2.patch, D447.20.patch, D447.21.patch, D447.22.patch, 
 D447.23.patch, D447.24.patch, D447.25.patch, D447.3.patch, D447.4.patch, 
 D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, 
 Data-block-encoding-2011-12-23.patch, 
 Delta-encoding-2012-01-17_11_09_09.patch, 
 Delta-encoding-2012-01-25_00_45_29.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta-encoding.patch-2012-01-05_15_16_43.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44_copy.patch, 
 Delta-encoding.patch-2012-01-05_18_50_47.patch, 
 Delta-encoding.patch-2012-01-07_14_12_48.patch, 
 Delta-encoding.patch-2012-01-13_12_20_07.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression scheme for keys. Keys are sorted in an HFile and are usually very 
 similar. Because of that, it is possible to design better compression than 
 general-purpose algorithms.
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as to speed up seeks within HFileBlocks. It should 
 improve performance a lot if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when the value is a counter.
 Initial tests on real data (key length = ~90 bytes, value length = 8 bytes) 
 show that I could achieve a decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 while having much better performance (20-80% faster decompression than 
 LZO). Moreover, it should allow far more efficient seeking, which should 
 improve performance a bit.
 It seems that simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase, two important changes in design will be 
 needed:
 - solidify the interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to the uncompressed buffer in HFileBlock will have bad 
 performance
 - extend comparators to support comparison assuming that the first N bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
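
To make the prefix compression step in the description above concrete, here is a small self-contained sketch. It is not the encoder from the attached patches: `PrefixCodec` and `Entry` are hypothetical names, and the sketch keeps entries as objects rather than packing them into a block buffer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PrefixCodec {
    /** One encoded key: how many leading bytes it shares with the previous key, plus the rest. */
    static final class Entry {
        final int sharedLen;
        final byte[] suffix;

        Entry(int sharedLen, byte[] suffix) {
            this.sharedLen = sharedLen;
            this.suffix = suffix;
        }
    }

    /** Encodes keys that are already sorted, so neighbours tend to share long prefixes. */
    static List<Entry> encode(List<byte[]> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int max = Math.min(prev.length, key.length);
            int shared = 0;
            while (shared < max && prev[shared] == key[shared]) {
                shared++;
            }
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            prev = key;
        }
        return out;
    }

    /** Rebuilds each key from the previous decoded key and the stored suffix. */
    static List<byte[]> decode(List<Entry> entries) {
        List<byte[]> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (Entry e : entries) {
            byte[] key = new byte[e.sharedLen + e.suffix.length];
            System.arraycopy(prev, 0, key, 0, e.sharedLen);
            System.arraycopy(e.suffix, 0, key, e.sharedLen, e.suffix.length);
            out.add(key);
            prev = key;
        }
        return out;
    }
}
```

For the keys `row1/cf:a`, `row1/cf:b`, `row2/cf:a`, the second entry stores a shared length of 8 and the single suffix byte `b`. Decoding has to walk entries in order, which is why seeking inside an encoded block needs a dedicated scanner rather than direct access to an uncompressed buffer.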


[jira] [Updated] (HBASE-5230) Ensure compactions do not cache-on-write data blocks

2012-01-25 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5230:
--

Issue Type: Improvement  (was: Test)
   Summary: Ensure compactions do not cache-on-write data blocks  (was: 
Unit test to ensure compactions don't cache data on write)

 Ensure compactions do not cache-on-write data blocks
 

 Key: HBASE-5230
 URL: https://issues.apache.org/jira/browse/HBASE-5230
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D1353.1.patch, D1353.2.patch, D1353.3.patch, 
 D1353.4.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-21_00_53_54.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-23_10_23_45.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-23_15_27_23.patch


 Create a unit test for HBASE-3976 (making sure we don't cache data blocks on 
 write during compactions even if cache-on-write is generally enabled). This 
 is because we have very different implementations of 
 HBASE-3976 without HBASE-4422 CacheConfig (on top of 89-fb, created by Liyin) 
 and with CacheConfig (presumably it's there but not sure if it even works, 
 since the patch in HBASE-3976 may not have been committed). We need to create 
 a unit test to verify that we don't cache data blocks on write during 
 compactions, and resolve HBASE-3976 so that this new unit test does not fail.
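
The property such a unit test would check can be shown with a toy model. Everything below is hypothetical: `ModelBlockWriter` and its constructor flags are invented for illustration and are not the CacheConfig or HFile writer API.

```java
import java.util.Set;

/** Toy model of a block writer; every name here is hypothetical. */
final class ModelBlockWriter {
    private final boolean cacheOnWrite;
    private final boolean isCompaction;
    private final Set<String> blockCache;

    ModelBlockWriter(boolean cacheOnWrite, boolean isCompaction, Set<String> blockCache) {
        this.cacheOnWrite = cacheOnWrite;
        this.isCompaction = isCompaction;
        this.blockCache = blockCache;
    }

    /** "Writes" a data block and caches it only when that is allowed. */
    void writeDataBlock(String blockId) {
        // Compaction output is never cached on write, even when the general
        // cache-on-write setting is enabled.
        if (cacheOnWrite && !isCompaction) {
            blockCache.add(blockId);
        }
    }
}
```

A test in this style writes blocks through a flush-like writer and a compaction-like writer with cache-on-write enabled for both, then asserts that only the flush blocks appear in the cache.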


[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-25 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4218:
--

Attachment: Delta-encoding-2012-01-25_16_32_14.patch

Attaching a patch rebased on HBASE-5230 and addressing Jerry's new comment.

 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-2012-01-14.txt, 4218-v16.txt, 4218.txt, 
 D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, 
 D447.14.patch, D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, 
 D447.19.patch, D447.2.patch, D447.20.patch, D447.21.patch, D447.22.patch, 
 D447.23.patch, D447.24.patch, D447.25.patch, D447.26.patch, D447.3.patch, 
 D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, 
 D447.9.patch, Data-block-encoding-2011-12-23.patch, 
 Delta-encoding-2012-01-17_11_09_09.patch, 
 Delta-encoding-2012-01-25_00_45_29.patch, 
 Delta-encoding-2012-01-25_16_32_14.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta-encoding.patch-2012-01-05_15_16_43.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44_copy.patch, 
 Delta-encoding.patch-2012-01-05_18_50_47.patch, 
 Delta-encoding.patch-2012-01-07_14_12_48.patch, 
 Delta-encoding.patch-2012-01-13_12_20_07.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression scheme for keys. Keys are sorted in an HFile and are usually very 
 similar. Because of that, it is possible to design better compression than 
 general-purpose algorithms.
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as to speed up seeks within HFileBlocks. It should 
 improve performance a lot if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when the value is a counter.
 Initial tests on real data (key length = ~90 bytes, value length = 8 bytes) 
 show that I could achieve a decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 while having much better performance (20-80% faster decompression than 
 LZO). Moreover, it should allow far more efficient seeking, which should 
 improve performance a bit.
 It seems that simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase, two important changes in design will be 
 needed:
 - solidify the interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to the uncompressed buffer in HFileBlock will have bad 
 performance
 - extend comparators to support comparison assuming that the first N bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
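
The second design change in the description above, comparing keys when a number of leading bytes are already known to be equal, could look roughly like this. `PrefixAwareComparator` and `compareFrom` are hypothetical names, not part of HBase's comparator classes.

```java
final class PrefixAwareComparator {
    /**
     * Compares a and b as unsigned bytes in lexicographic order, skipping the
     * first commonPrefix bytes. The caller guarantees those bytes are equal
     * in both arrays and that commonPrefix does not exceed either length.
     */
    static int compareFrom(byte[] a, byte[] b, int commonPrefix) {
        int n = Math.min(a.length, b.length);
        for (int i = commonPrefix; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // All compared bytes match, so the shorter key sorts first.
        return a.length - b.length;
    }
}
```

With prefix-compressed keys the shared length is already stored next to each key, so a scanner can pass it as `commonPrefix` and avoid re-reading bytes it knows to be identical.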


[jira] [Updated] (HBASE-3796) Per-Store Entries in Compaction Queue

2012-01-23 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-3796:
--

Release Note:   (was: Sorry, it seems like I re-opened the wrong patch 
instead of HBASE-3976. Restoring the Fixed status.)

 Per-Store Entries in Compaction Queue
 -

 Key: HBASE-3796
 URL: https://issues.apache.org/jira/browse/HBASE-3796
 Project: HBase
  Issue Type: Bug
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
 Fix For: 0.92.1

 Attachments: HBASE-3796-fixed.patch, HBASE-3796.patch


 Although compaction is decided on a per-store basis, right now the 
 CompactSplitThread only deals at the Region level for queueing.  Store-level 
 compaction queue entries will give us more visibility into compaction 
 workload + allow us to stop summarizing priorities.
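
A rough sketch of what store-level queue entries mean in practice. `CompactionRequest` and `CompactionQueue` are hypothetical names, and the convention that a lower number is more urgent is an assumption made for this example; this is not the CompactSplitThread code.

```java
import java.util.PriorityQueue;

/** Hypothetical queue entry identified by region and store, not by region alone. */
final class CompactionRequest implements Comparable<CompactionRequest> {
    final String region;
    final String store;
    final int priority; // assumption: lower value means more urgent

    CompactionRequest(String region, String store, int priority) {
        this.region = region;
        this.store = store;
        this.priority = priority;
    }

    @Override
    public int compareTo(CompactionRequest other) {
        return Integer.compare(priority, other.priority);
    }
}

final class CompactionQueue {
    private final PriorityQueue<CompactionRequest> queue = new PriorityQueue<>();

    void request(String region, String store, int priority) {
        queue.add(new CompactionRequest(region, store, priority));
    }

    /** Returns the most urgent store-level request, or null if the queue is empty. */
    CompactionRequest next() {
        return queue.poll();
    }

    int size() {
        return queue.size();
    }
}
```

Because each store carries its own priority, two stores of the same region can sit at different positions in the queue, and the queue length reflects the number of pending store compactions rather than the number of regions.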


[jira] [Updated] (HBASE-5230) Unit test to ensure compactions don't cache data on write

2012-01-23 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5230:
--

Attachment: Don-t-cache-data-blocks-on-compaction-2012-01-23_10_23_45.patch

Attaching the most recent patch (rebased on trunk changes -- maybe even 
identical).

 Unit test to ensure compactions don't cache data on write
 -

 Key: HBASE-5230
 URL: https://issues.apache.org/jira/browse/HBASE-5230
 Project: HBase
  Issue Type: Test
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D1353.1.patch, D1353.2.patch, D1353.3.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-21_00_53_54.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-23_10_23_45.patch


 Create a unit test for HBASE-3976 (making sure we don't cache data blocks on 
 write during compactions even if cache-on-write is generally enabled). This 
 is because we have very different implementations of 
 HBASE-3976 without HBASE-4422 CacheConfig (on top of 89-fb, created by Liyin) 
 and with CacheConfig (presumably it's there but not sure if it even works, 
 since the patch in HBASE-3976 may not have been committed). We need to create 
 a unit test to verify that we don't cache data blocks on write during 
 compactions, and resolve HBASE-3976 so that this new unit test does not fail.


[jira] [Updated] (HBASE-5130) A map-reduce wrapper for HBase test suite (mr-test-runner)

2012-01-23 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5130:
--

Description: We have a tool we call mrunit (but will call 
mr-test-runner in the open-source version) that runs HBase unit tests on a 
map-reduce cluster. We need to modify it to use distributed cache to deploy the 
code on the cluster instead of our internal deployment tool, and open-source 
it.  (was: We have a tool we call mrunit that runs HBase unit tests on a 
map-reduce cluster. We need to modify it to use distributed cache to deploy the 
code on the cluster instead of our internal deployment tool, and open-source 
it.)
Summary: A map-reduce wrapper for HBase test suite (mr-test-runner)  
(was: A map-reduce wrapper for HBase test suite (mrunit))

 A map-reduce wrapper for HBase test suite (mr-test-runner)
 

 Key: HBASE-5130
 URL: https://issues.apache.org/jira/browse/HBASE-5130
 Project: HBase
  Issue Type: Test
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin

 We have a tool we call mrunit (but will call mr-test-runner in the 
 open-source version) that runs HBase unit tests on a map-reduce cluster. We 
 need to modify it to use distributed cache to deploy the code on the cluster 
 instead of our internal deployment tool, and open-source it.


[jira] [Updated] (HBASE-5230) Unit test to ensure compactions don't cache data on write

2012-01-23 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5230:
--

Attachment: Don-t-cache-data-blocks-on-compaction-2012-01-23_15_27_23.patch

A new patch addressing Nicolas's comments.

 Unit test to ensure compactions don't cache data on write
 -

 Key: HBASE-5230
 URL: https://issues.apache.org/jira/browse/HBASE-5230
 Project: HBase
  Issue Type: Test
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D1353.1.patch, D1353.2.patch, D1353.3.patch, 
 D1353.4.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-21_00_53_54.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-23_10_23_45.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-23_15_27_23.patch


 Create a unit test for HBASE-3976 (making sure we don't cache data blocks on 
 write during compactions even if cache-on-write is generally enabled). This 
 is because we have very different implementations of 
 HBASE-3976 without HBASE-4422 CacheConfig (on top of 89-fb, created by Liyin) 
 and with CacheConfig (presumably it's there but not sure if it even works, 
 since the patch in HBASE-3976 may not have been committed). We need to create 
 a unit test to verify that we don't cache data blocks on write during 
 compactions, and resolve HBASE-3976 so that this new unit test does not fail.


[jira] [Updated] (HBASE-5230) Unit test to ensure compactions don't cache data on write

2012-01-21 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5230:
--

Attachment: Don-t-cache-data-blocks-on-compaction-2012-01-21_00_53_54.patch

Attaching patch for Jenkins testing.

 Unit test to ensure compactions don't cache data on write
 -

 Key: HBASE-5230
 URL: https://issues.apache.org/jira/browse/HBASE-5230
 Project: HBase
  Issue Type: Test
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D1353.1.patch, D1353.2.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-21_00_53_54.patch


 Create a unit test for HBASE-3976 (making sure we don't cache data blocks on 
 write during compactions even if cache-on-write is generally enabled). This 
 is because we have very different implementations of 
 HBASE-3976 without HBASE-4422 CacheConfig (on top of 89-fb, created by Liyin) 
 and with CacheConfig (presumably it's there but not sure if it even works, 
 since the patch in HBASE-3976 may not have been committed). We need to create 
 a unit test to verify that we don't cache data blocks on write during 
 compactions, and resolve HBASE-3976 so that this new unit test does not fail.


[jira] [Updated] (HBASE-5230) Unit test to ensure compactions don't cache data on write

2012-01-21 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5230:
--

Status: Patch Available  (was: Open)

 Unit test to ensure compactions don't cache data on write
 -

 Key: HBASE-5230
 URL: https://issues.apache.org/jira/browse/HBASE-5230
 Project: HBase
  Issue Type: Test
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D1353.1.patch, D1353.2.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-21_00_53_54.patch


 Create a unit test for HBASE-3976 (making sure we don't cache data blocks on 
 write during compactions even if cache-on-write is generally enabled). This 
 is because we have very different implementations of 
 HBASE-3976 without HBASE-4422 CacheConfig (on top of 89-fb, created by Liyin) 
 and with CacheConfig (presumably it's there but not sure if it even works, 
 since the patch in HBASE-3976 may not have been committed). We need to create 
 a unit test to verify that we don't cache data blocks on write during 
 compactions, and resolve HBASE-3976 so that this new unit test does not fail.


[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-17 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4218:
--

Attachment: Delta-encoding-2012-01-17_11_09_09.patch

Appending a patch that can be applied by Hadoop QA.

 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-2012-01-14.txt, 4218-v16.txt, 4218.txt, 
 D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, 
 D447.14.patch, D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, 
 D447.19.patch, D447.2.patch, D447.20.patch, D447.21.patch, D447.22.patch, 
 D447.23.patch, D447.24.patch, D447.3.patch, D447.4.patch, D447.5.patch, 
 D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, 
 Data-block-encoding-2011-12-23.patch, 
 Delta-encoding-2012-01-17_11_09_09.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta-encoding.patch-2012-01-05_15_16_43.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44_copy.patch, 
 Delta-encoding.patch-2012-01-05_18_50_47.patch, 
 Delta-encoding.patch-2012-01-07_14_12_48.patch, 
 Delta-encoding.patch-2012-01-13_12_20_07.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression scheme for keys. Keys are sorted in an HFile and are usually very 
 similar. Because of that, it is possible to design better compression than 
 general-purpose algorithms.
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as to speed up seeks within HFileBlocks. It should 
 improve performance a lot if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when the value is a counter.
 Initial tests on real data (key length = ~90 bytes, value length = 8 bytes) 
 show that I could achieve a decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 while having much better performance (20-80% faster decompression than 
 LZO). Moreover, it should allow far more efficient seeking, which should 
 improve performance a bit.
 It seems that simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase, two important changes in design will be 
 needed:
 - solidify the interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to the uncompressed buffer in HFileBlock will have bad 
 performance
 - extend comparators to support comparison assuming that the first N bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
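
One plausible reading of the "timestamp diffs" mentioned in the description above is to store each timestamp as its difference from the previous one. The sketch below illustrates that reading only; `TimestampDeltas` is a hypothetical name and the patch may encode timestamps differently.

```java
final class TimestampDeltas {
    /** Returns the first timestamp followed by the difference between each pair of neighbours. */
    static long[] encode(long[] timestamps) {
        long[] out = new long[timestamps.length];
        long prev = 0;
        for (int i = 0; i < timestamps.length; i++) {
            out[i] = timestamps[i] - prev;
            prev = timestamps[i];
        }
        return out;
    }

    /** Inverts encode() by accumulating the differences. */
    static long[] decode(long[] deltas) {
        long[] out = new long[deltas.length];
        long prev = 0;
        for (int i = 0; i < deltas.length; i++) {
            prev += deltas[i];
            out[i] = prev;
        }
        return out;
    }
}
```

Cells written close together in time produce small differences, and small numbers need fewer bytes under a variable-length integer encoding than full 8-byte timestamps.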


[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-13 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4218:
--

Attachment: Delta-encoding.patch-2012-01-13_12_20_07.patch

Attaching a patch generated using 

git format-patch --no-prefix HEAD^..HEAD

that can be applied by the normal patch command.

 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, 
 D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, 
 D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, D447.19.patch, 
 D447.2.patch, D447.20.patch, D447.21.patch, D447.22.patch, D447.3.patch, 
 D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, 
 D447.9.patch, Data-block-encoding-2011-12-23.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta-encoding.patch-2012-01-05_15_16_43.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44_copy.patch, 
 Delta-encoding.patch-2012-01-05_18_50_47.patch, 
 Delta-encoding.patch-2012-01-07_14_12_48.patch, 
 Delta-encoding.patch-2012-01-13_12_20_07.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression scheme for keys. Keys are sorted in an HFile and are usually very 
 similar. Because of that, it is possible to design better compression than 
 general-purpose algorithms.
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as to speed up seeks within HFileBlocks. It should 
 improve performance a lot if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when the value is a counter.
 Initial tests on real data (key length = ~90 bytes, value length = 8 bytes) 
 show that I could achieve a decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 while having much better performance (20-80% faster decompression than 
 LZO). Moreover, it should allow far more efficient seeking, which should 
 improve performance a bit.
 It seems that simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase, two important changes in design will be 
 needed:
 - solidify the interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to the uncompressed buffer in HFileBlock will have bad 
 performance
 - extend comparators to support comparison assuming that the first N bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression


[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-07 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4218:
--

Attachment: Delta-encoding.patch-2012-01-07_14_12_48.patch

Attaching a patch rebased on trunk changes.

 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, 
 D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, 
 D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, D447.19.patch, 
 D447.2.patch, D447.20.patch, D447.3.patch, D447.4.patch, D447.5.patch, 
 D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, 
 Data-block-encoding-2011-12-23.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta-encoding.patch-2012-01-05_15_16_43.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44_copy.patch, 
 Delta-encoding.patch-2012-01-05_18_50_47.patch, 
 Delta-encoding.patch-2012-01-07_14_12_48.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general purpose algorithms,
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as speeding seeks within HFileBlocks. It should 
 improve performance a lot, if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when value is a counter.
 Initial tests on real data (key length = ~ 90 bytes , value length = 8 bytes) 
 shows that I could achieve decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 While having much better performance (20-80% faster decompression ratio than 
 LZO). Moreover, it should allow far more efficient seeking which should 
 improve performance a bit.
 It seems that a simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase, two important changes in design will be 
 needed:
 - solidify the interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to the uncompressed buffer in HFileBlock will have bad 
 performance
 - extend comparators to support comparison assuming that the first N bytes are 
 equal (or some fields are equal)
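
The second change above, a comparator that is told how many leading bytes are already known to be equal, might look roughly like the following. The class and method names are hypothetical and the sketch compares plain byte arrays rather than KeyValues; it only illustrates why skipping a known common prefix saves work.

```java
/** Hypothetical sketch of a prefix-aware byte comparison (not the HBase comparator API). */
public class PrefixAwareComparatorSketch {

    /**
     * Unsigned lexicographic comparison that starts at offset commonPrefix.
     * The caller must guarantee that the first commonPrefix bytes of both arrays are equal;
     * this method does not re-check them.
     */
    static int compareSkippingPrefix(byte[] left, byte[] right, int commonPrefix) {
        int n = Math.min(left.length, right.length);
        for (int i = commonPrefix; i < n; i++) {
            int a = left[i] & 0xff; // treat bytes as unsigned, as byte-order key comparison requires
            int b = right[i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        // All compared bytes are equal: the shorter array sorts first.
        return left.length - right.length;
    }
}
```

With prefix-compressed keys the shared prefix length is already stored next to each key, so a seek can pass it straight in instead of rescanning bytes that are known to match.
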
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-05 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4218:
--

Attachment: Delta-encoding.patch-2012-01-05_15_16_43.patch

Uploading a patch that should apply cleanly.

 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, 
 D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, 
 D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, D447.2.patch, 
 D447.3.patch, D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, 
 D447.8.patch, D447.9.patch, Data-block-encoding-2011-12-23.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta-encoding.patch-2012-01-05_15_16_43.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff



[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-05 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4218:
--

Attachment: Delta-encoding.patch-2012-01-05_16_31_44.patch

Fixing an NPE in EncodedSeekPerformanceTest.

 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, 
 D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, 
 D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, D447.19.patch, 
 D447.2.patch, D447.3.patch, D447.4.patch, D447.5.patch, D447.6.patch, 
 D447.7.patch, D447.8.patch, D447.9.patch, 
 Data-block-encoding-2011-12-23.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta-encoding.patch-2012-01-05_15_16_43.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff



[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-05 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4218:
--

Attachment: Delta-encoding.patch-2012-01-05_16_31_44_copy.patch

Attaching a patch that applies. (A new unit test is coming for HFile v1 to 
encoded HFile v2 upgrade, so the patch is not final yet.)

 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, 
 D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, 
 D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, D447.19.patch, 
 D447.2.patch, D447.3.patch, D447.4.patch, D447.5.patch, D447.6.patch, 
 D447.7.patch, D447.8.patch, D447.9.patch, 
 Data-block-encoding-2011-12-23.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta-encoding.patch-2012-01-05_15_16_43.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44_copy.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff



[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-05 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4218:
--

Attachment: Delta-encoding.patch-2012-01-05_18_50_47.patch

Adding a test that upgrades from HFile v1 to encoded HFile v2.

 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-v16.txt, 4218.txt, D447.1.patch, 
 D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, D447.14.patch, 
 D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, D447.19.patch, 
 D447.2.patch, D447.20.patch, D447.3.patch, D447.4.patch, D447.5.patch, 
 D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, 
 Data-block-encoding-2011-12-23.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta-encoding.patch-2012-01-05_15_16_43.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44_copy.patch, 
 Delta-encoding.patch-2012-01-05_18_50_47.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff



[jira] [Updated] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)

2011-12-22 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4218:
--

Attachment: Delta-encoding.patch-2011-12-22_11_52_07.patch

Appending a new version of patch that should apply using the patch command, 
compile, and pass TestHeapSize on Jenkins.

 Delta Encoding of KeyValues  (aka prefix compression)
 -

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, D447.1.patch, D447.10.patch, D447.11.patch, 
 D447.12.patch, D447.13.patch, D447.2.patch, D447.3.patch, D447.4.patch, 
 D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff



[jira] [Updated] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)

2011-12-22 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4218:
--

Status: Patch Available  (was: Open)

 Delta Encoding of KeyValues  (aka prefix compression)
 -

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, D447.1.patch, D447.10.patch, D447.11.patch, 
 D447.12.patch, D447.13.patch, D447.2.patch, D447.3.patch, D447.4.patch, 
 D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff



[jira] [Updated] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)

2011-12-22 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4218:
--

Status: Open  (was: Patch Available)

 Delta Encoding of KeyValues  (aka prefix compression)
 -

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, D447.1.patch, D447.10.patch, D447.11.patch, 
 D447.12.patch, D447.13.patch, D447.2.patch, D447.3.patch, D447.4.patch, 
 D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff



[jira] [Updated] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)

2011-12-21 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4218:
--

Attachment: 0001-Delta-encoding.patch

Adding a patch generated by git format-patch --no-prefix, since those 
auto-generated by Phabricator do not apply with the patch command for some 
reason.

 Delta Encoding of KeyValues  (aka prefix compression)
 -

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, D447.1.patch, D447.10.patch, D447.11.patch, 
 D447.2.patch, D447.3.patch, D447.4.patch, D447.5.patch, D447.6.patch, 
 D447.7.patch, D447.8.patch, D447.9.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff



[jira] [Updated] (HBASE-4683) Always cache index and bloom blocks

2011-12-14 Thread Mikhail Bautin (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Mikhail Bautin updated HBASE-4683:
--

Attachment: 0001-Cache-important-block-types.patch

Attaching the patch rebased on top of r1214519.

Always cache index and bloom blocks
---

Key: HBASE-4683
URL: https://issues.apache.org/jira/browse/HBASE-4683
Project: HBase
Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Mikhail Bautin
Priority: Minor
Fix For: 0.92.0, 0.94.0

Attachments: 0001-Cache-important-block-types.patch, 4683-v2.txt,
4683.txt, D807.1.patch, D807.2.patch, D807.3.patch, HBASE-4683-0.92-v2.patch,
HBASE-4683-v3.patch

This would add a new boolean config option: hfile.block.cache.datablocks
Default would be true.
Setting this to false allows HBase to run in a mode where only index blocks are
cached, which is useful for analytical scenarios where a useful working set
of the data cannot be expected to fit into the (aggregate) cache.
This is the equivalent of setting cacheBlocks to false on all scans
(including scans on behalf of gets).
I would like to get a general feeling about what folks think about this.
The change itself would be simple.
Update (Mikhail): we probably don't need a new conf option. Instead, we will
make index blocks cached by default.
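
The policy described in the update can be sketched as a small decision function. This is an illustration under assumptions, not the code in the attached patches: the enum, class and method names are hypothetical, and bloom blocks are included because the issue title mentions them.

```java
/** Hypothetical sketch of the caching policy discussed in this issue (not the HBase implementation). */
public class BlockCachePolicySketch {

    /** Coarse block categories, assumed here for illustration. */
    enum BlockCategory { DATA, INDEX, BLOOM }

    /**
     * Decides whether a block read on behalf of a scan should go into the block cache.
     * Index and bloom blocks are always cached; data blocks follow the scan's cacheBlocks flag.
     */
    static boolean shouldCacheOnRead(BlockCategory category, boolean cacheDataBlocks) {
        if (category == BlockCategory.INDEX || category == BlockCategory.BLOOM) {
            return true;
        }
        return cacheDataBlocks;
    }
}
```

Under this policy a scan with cacheBlocks set to false still keeps index and bloom blocks cached, which gives the index-only behavior the original proposal wanted without a new configuration option.
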

[jira] [Updated] (HBASE-4683) Always cache index blocks

2011-12-12 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4683:
--

Summary: Always cache index blocks  (was: Create config option to only 
cache index blocks)

 Always cache index blocks
 -

 Key: HBASE-4683
 URL: https://issues.apache.org/jira/browse/HBASE-4683
 Project: HBase
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0

 Attachments: 4683-v2.txt, 4683.txt



[jira] [Updated] (HBASE-5010) Filter HFiles based on TTL

2011-12-12 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5010:
--

Description: 
In ScanWildcardColumnTracker we have

{code:java}
 
  this.oldestStamp = EnvironmentEdgeManager.currentTimeMillis() - ttl;

  ...

  private boolean isExpired(long timestamp) {
    return timestamp < oldestStamp;
  }
{code}

but this time range filtering does not participate in HFile selection. In one 
real case this caused next() calls to time out because all KVs in a table got 
expired, but next() had to iterate over the whole table to find that out. We 
should be able to filter out those HFiles right away. I think a reasonable 
approach is to add a default timerange filter to every scan for a CF with a 
finite TTL and utilize existing filtering in 
StoreFile.Reader.passesTimerangeFilter.


  was:
In ScanWildcardColumnTracker we have

{
this.oldestStamp = EnvironmentEdgeManager.currentTimeMillis() - ttl;

...

  private boolean isExpired(long timestamp) {
    return timestamp < oldestStamp;
  }
}

but this time range filtering does not participate in HFile selection. In one 
real case this caused next() calls to time out because all KVs in a table got 
expired, but next() had to iterate over the whole table to find that out. We 
should be able to filter out those HFiles right away. I think a reasonable 
approach is to add a default timerange filter to every scan for a CF with a 
finite TTL and utilize existing filtering in 
StoreFile.Reader.passesTimerangeFilter.



 Filter HFiles based on TTL
 --

 Key: HBASE-5010
 URL: https://issues.apache.org/jira/browse/HBASE-5010
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin

 In ScanWildcardColumnTracker we have
 {code:java}
  
   this.oldestStamp = EnvironmentEdgeManager.currentTimeMillis() - ttl;
   ...
   private boolean isExpired(long timestamp) {
     return timestamp < oldestStamp;
   }
 {code}
 but this time range filtering does not participate in HFile selection. In one 
 real case this caused next() calls to time out because all KVs in a table got 
 expired, but next() had to iterate over the whole table to find that out. We 
 should be able to filter out those HFiles right away. I think a reasonable 
 approach is to add a default timerange filter to every scan for a CF with a 
 finite TTL and utilize existing filtering in 
 StoreFile.Reader.passesTimerangeFilter.
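
A minimal sketch of the proposed file-level check follows, assuming each HFile exposes the largest timestamp of any KV it contains. The class, method and parameter names are hypothetical; the real change would go through the existing time range filtering rather than a standalone helper like this.

```java
/** Hypothetical sketch of TTL-based HFile selection (not the HBase implementation). */
public class TtlFileFilterSketch {

    /**
     * Returns false when every KV in the file is already expired, so the file can be skipped.
     * Mirrors isExpired(): a KV is expired when its timestamp is strictly less than now - ttl.
     *
     * @param fileMaxTimestamp largest KV timestamp stored in the file (assumed to be known)
     * @param now              current time in milliseconds
     * @param ttlMillis        finite TTL of the column family in milliseconds
     */
    static boolean mayContainLiveCells(long fileMaxTimestamp, long now, long ttlMillis) {
        long oldestStamp = now - ttlMillis;
        return fileMaxTimestamp >= oldestStamp;
    }
}
```

If the newest KV in a file is already older than the TTL cutoff, no KV in that file can pass isExpired(), so the scanner can drop the whole file up front instead of iterating over it.
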


[jira] [Updated] (HBASE-4683) Always cache index and bloom blocks

2011-12-12 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4683:
--

Summary: Always cache index and bloom blocks  (was: Always cache index 
blocks)

 Always cache index and bloom blocks
 ---

 Key: HBASE-4683
 URL: https://issues.apache.org/jira/browse/HBASE-4683
 Project: HBase
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Mikhail Bautin
Priority: Minor
 Fix For: 0.94.0

 Attachments: 4683-v2.txt, 4683.txt, HBASE-4683-v3.patch


 This would add a new boolean config option: hfile.block.cache.datablocks
 Default would be true.
 Setting this to false allows HBase to run in a mode where only index blocks are 
 cached, which is useful for analytical scenarios where a useful working set 
 of the data cannot be expected to fit into the (aggregate) cache.
 This is the equivalent of setting cacheBlocks to false on all scans 
 (including scans on behalf of gets).
 I would like to get a general feeling about what folks think about this.
 The change itself would be simple.
 Update (Mikhail): we probably don't need a new conf option. Instead, we will 
 make index blocks cached by default.


[jira] [Updated] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)

2011-12-12 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4218:
--

Status: Patch Available  (was: Open)

Testing current version on Jenkins. Not ready to commit yet -- more testing 
required.

 Delta Encoding of KeyValues  (aka prefix compression)
 -

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 D447.1.patch, D447.2.patch, D447.3.patch, D447.4.patch, D447.5.patch, 
 D447.6.patch, D447.7.patch, Delta_encoding_with_memstore_TS.patch, 
 open-source.diff


 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general-purpose algorithms.
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as speeding seeks within HFileBlocks. It should 
 improve performance a lot, if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when value is a counter.
 Initial tests on real data (key length = ~90 bytes, value length = 8 bytes) 
 show that I could achieve a decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 This comes with much better performance (20-80% faster decompression than 
 LZO). Moreover, it should allow far more efficient seeking, which should 
 improve performance a bit.
 It seems that simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase two important changes in design will be 
 needed:
 -solidify interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to uncompressed buffer in HFileBlock will have bad 
 performance
 -extend comparators to support comparison assuming that N first bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
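
 To make the prefix-compression part of the description concrete, here is a
 minimal sketch. It is not the encoder from the attached patches: PrefixCodec
 and Entry are hypothetical names, and the sketch covers only the shared-prefix
 idea for sorted keys, leaving out timestamp diffs and bitfields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of prefix compression over sorted keys. */
final class PrefixCodec {
    /** One encoded key: bytes shared with the previous key plus the new suffix. */
    static final class Entry {
        final int sharedPrefixLength;
        final byte[] suffix;

        Entry(int sharedPrefixLength, byte[] suffix) {
            this.sharedPrefixLength = sharedPrefixLength;
            this.suffix = suffix;
        }
    }

    /** Encodes each key relative to the key that precedes it. */
    static List<Entry> encode(List<byte[]> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int limit = Math.min(prev.length, key.length);
            int shared = 0;
            while (shared < limit && prev[shared] == key[shared]) {
                shared++;
            }
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            prev = key;
        }
        return out;
    }

    /** Rebuilds the original keys by replaying prefixes in order. */
    static List<byte[]> decode(List<Entry> entries) {
        List<byte[]> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (Entry e : entries) {
            byte[] key = new byte[e.sharedPrefixLength + e.suffix.length];
            System.arraycopy(prev, 0, key, 0, e.sharedPrefixLength);
            System.arraycopy(e.suffix, 0, key, e.sharedPrefixLength, e.suffix.length);
            out.add(key);
            prev = key;
        }
        return out;
    }
}
```

 Because adjacent keys in a sorted file tend to share long prefixes, most
 entries store only a short suffix. Knowing the shared prefix length up front
 is also what would let a comparator skip the first N bytes, as the second
 design change in the description suggests.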


[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)

2011-12-07 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4908:
--

Status: Open  (was: Patch Available)

 HBase cluster test tool (port from 0.89-fb)
 ---

 Key: HBASE-4908
 URL: https://issues.apache.org/jira/browse/HBASE-4908
 Project: HBase
  Issue Type: Sub-task
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: 0001-HBase-cluster-test-tool.patch, 
 0002-HBase-cluster-test-tool.patch, D549.1.patch, D549.2.patch, D549.3.patch, 
 D549.4.patch, D549.5.patch, D549.6.patch, D549.7.patch, D549.8.patch, 
 D549.9.patch


 Porting one of our HBase cluster test tools (a single-process multi-threaded 
 load generator and verifier) from 0.89-fb to trunk.
 I cleaned up the code a bit compared to what's in 0.89-fb, and discovered 
 that it has some features that I have not tried yet (some kind of a kill 
 test, and some way to run HBase as multiple processes on one machine).
 The main utility of this piece of code for us has been the HBaseClusterTest 
 command-line tool (called HBaseTest in 0.89-fb), which we usually invoke as a 
 load test in our five-node dev cluster testing, e.g.:
 hbase org.apache.hadoop.hbase.manual.HBaseTest -load 10:50:100:20 -tn 
 load_test -read 1:10:50:20 -zk zk_quorum -bloom ROWCOL -compression 
 GZIP
 I will be using this code to load-test the delta encoding patch and making 
 fixes, but I am submitting the patch for early feedback. I will probably try 
 out its other functionality and comment on how it works.


[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)

2011-12-07 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4908:
--

Attachment: 0003-HBase-cluster-test-tool.patch

Attaching the most recent patch.

 HBase cluster test tool (port from 0.89-fb)
 ---

 Key: HBASE-4908
 URL: https://issues.apache.org/jira/browse/HBASE-4908
 Project: HBase
  Issue Type: Sub-task
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: 0001-HBase-cluster-test-tool.patch, 
 0002-HBase-cluster-test-tool.patch, 0003-HBase-cluster-test-tool.patch, 
 D549.1.patch, D549.2.patch, D549.3.patch, D549.4.patch, D549.5.patch, 
 D549.6.patch, D549.7.patch, D549.8.patch, D549.9.patch


 Porting one of our HBase cluster test tools (a single-process multi-threaded 
 load generator and verifier) from 0.89-fb to trunk.
 I cleaned up the code a bit compared to what's in 0.89-fb, and discovered 
 that it has some features that I have not tried yet (some kind of a kill 
 test, and some way to run HBase as multiple processes on one machine).
 The main utility of this piece of code for us has been the HBaseClusterTest 
 command-line tool (called HBaseTest in 0.89-fb), which we usually invoke as a 
 load test in our five-node dev cluster testing, e.g.:
 hbase org.apache.hadoop.hbase.manual.HBaseTest -load 10:50:100:20 -tn 
 load_test -read 1:10:50:20 -zk zk_quorum -bloom ROWCOL -compression 
 GZIP
 I will be using this code to load-test the delta encoding patch and making 
 fixes, but I am submitting the patch for early feedback. I will probably try 
 out its other functionality and comment on how it works.


[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)

2011-12-07 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4908:
--

Status: Patch Available  (was: Open)

One more round of Jenkins testing for the patch.

 HBase cluster test tool (port from 0.89-fb)
 ---

 Key: HBASE-4908
 URL: https://issues.apache.org/jira/browse/HBASE-4908
 Project: HBase
  Issue Type: Sub-task
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: 0001-HBase-cluster-test-tool.patch, 
 0002-HBase-cluster-test-tool.patch, 0003-HBase-cluster-test-tool.patch, 
 D549.1.patch, D549.2.patch, D549.3.patch, D549.4.patch, D549.5.patch, 
 D549.6.patch, D549.7.patch, D549.8.patch, D549.9.patch


 Porting one of our HBase cluster test tools (a single-process multi-threaded 
 load generator and verifier) from 0.89-fb to trunk.
 I cleaned up the code a bit compared to what's in 0.89-fb, and discovered 
 that it has some features that I have not tried yet (some kind of a kill 
 test, and some way to run HBase as multiple processes on one machine).
 The main utility of this piece of code for us has been the HBaseClusterTest 
 command-line tool (called HBaseTest in 0.89-fb), which we usually invoke as a 
 load test in our five-node dev cluster testing, e.g.:
 hbase org.apache.hadoop.hbase.manual.HBaseTest -load 10:50:100:20 -tn 
 load_test -read 1:10:50:20 -zk zk_quorum -bloom ROWCOL -compression 
 GZIP
 I will be using this code to load-test the delta encoding patch and making 
 fixes, but I am submitting the patch for early feedback. I will probably try 
 out its other functionality and comment on how it works.


[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)

2011-12-06 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4908:
--

Status: Patch Available  (was: Open)

 HBase cluster test tool (port from 0.89-fb)
 ---

 Key: HBASE-4908
 URL: https://issues.apache.org/jira/browse/HBASE-4908
 Project: HBase
  Issue Type: Sub-task
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: 0001-HBase-cluster-test-tool.patch, D549.1.patch, 
 D549.2.patch, D549.3.patch, D549.4.patch, D549.5.patch, D549.6.patch, 
 D549.7.patch


 Porting one of our HBase cluster test tools (a single-process multi-threaded 
 load generator and verifier) from 0.89-fb to trunk.
 I cleaned up the code a bit compared to what's in 0.89-fb, and discovered 
 that it has some features that I have not tried yet (some kind of a kill 
 test, and some way to run HBase as multiple processes on one machine).
 The main utility of this piece of code for us has been the HBaseClusterTest 
 command-line tool (called HBaseTest in 0.89-fb), which we usually invoke as a 
 load test in our five-node dev cluster testing, e.g.:
 hbase org.apache.hadoop.hbase.manual.HBaseTest -load 10:50:100:20 -tn 
 load_test -read 1:10:50:20 -zk zk_quorum -bloom ROWCOL -compression 
 GZIP
 I will be using this code to load-test the delta encoding patch and making 
 fixes, but I am submitting the patch for early feedback. I will probably try 
 out its other functionality and comment on how it works.


[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)

2011-12-06 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4908:
--

Status: Open  (was: Patch Available)

 HBase cluster test tool (port from 0.89-fb)
 ---

 Key: HBASE-4908
 URL: https://issues.apache.org/jira/browse/HBASE-4908
 Project: HBase
  Issue Type: Sub-task
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: 0001-HBase-cluster-test-tool.patch, D549.1.patch, 
 D549.2.patch, D549.3.patch, D549.4.patch, D549.5.patch, D549.6.patch, 
 D549.7.patch


 Porting one of our HBase cluster test tools (a single-process multi-threaded 
 load generator and verifier) from 0.89-fb to trunk.
 I cleaned up the code a bit compared to what's in 0.89-fb, and discovered 
 that it has some features that I have not tried yet (some kind of a kill 
 test, and some way to run HBase as multiple processes on one machine).
 The main utility of this piece of code for us has been the HBaseClusterTest 
 command-line tool (called HBaseTest in 0.89-fb), which we usually invoke as a 
 load test in our five-node dev cluster testing, e.g.:
 hbase org.apache.hadoop.hbase.manual.HBaseTest -load 10:50:100:20 -tn 
 load_test -read 1:10:50:20 -zk zk_quorum -bloom ROWCOL -compression 
 GZIP
 I will be using this code to load-test the delta encoding patch and making 
 fixes, but I am submitting the patch for early feedback. I will probably try 
 out its other functionality and comment on how it works.


[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)

2011-12-06 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4908:
--

Status: Open  (was: Patch Available)

 HBase cluster test tool (port from 0.89-fb)
 ---

 Key: HBASE-4908
 URL: https://issues.apache.org/jira/browse/HBASE-4908
 Project: HBase
  Issue Type: Sub-task
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: 0001-HBase-cluster-test-tool.patch, 
 0002-HBase-cluster-test-tool.patch, D549.1.patch, D549.2.patch, D549.3.patch, 
 D549.4.patch, D549.5.patch, D549.6.patch, D549.7.patch, D549.8.patch


 Porting one of our HBase cluster test tools (a single-process multi-threaded 
 load generator and verifier) from 0.89-fb to trunk.
 I cleaned up the code a bit compared to what's in 0.89-fb, and discovered 
 that it has some features that I have not tried yet (some kind of a kill 
 test, and some way to run HBase as multiple processes on one machine).
 The main utility of this piece of code for us has been the HBaseClusterTest 
 command-line tool (called HBaseTest in 0.89-fb), which we usually invoke as a 
 load test in our five-node dev cluster testing, e.g.:
 hbase org.apache.hadoop.hbase.manual.HBaseTest -load 10:50:100:20 -tn 
 load_test -read 1:10:50:20 -zk zk_quorum -bloom ROWCOL -compression 
 GZIP
 I will be using this code to load-test the delta encoding patch and making 
 fixes, but I am submitting the patch for early feedback. I will probably try 
 out its other functionality and comment on how it works.


[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)

2011-12-06 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4908:
--

Status: Patch Available  (was: Open)

 HBase cluster test tool (port from 0.89-fb)
 ---

 Key: HBASE-4908
 URL: https://issues.apache.org/jira/browse/HBASE-4908
 Project: HBase
  Issue Type: Sub-task
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: 0001-HBase-cluster-test-tool.patch, 
 0002-HBase-cluster-test-tool.patch, D549.1.patch, D549.2.patch, D549.3.patch, 
 D549.4.patch, D549.5.patch, D549.6.patch, D549.7.patch, D549.8.patch


 Porting one of our HBase cluster test tools (a single-process multi-threaded 
 load generator and verifier) from 0.89-fb to trunk.
 I cleaned up the code a bit compared to what's in 0.89-fb, and discovered 
 that it has some features that I have not tried yet (some kind of a kill 
 test, and some way to run HBase as multiple processes on one machine).
 The main utility of this piece of code for us has been the HBaseClusterTest 
 command-line tool (called HBaseTest in 0.89-fb), which we usually invoke as a 
 load test in our five-node dev cluster testing, e.g.:
 hbase org.apache.hadoop.hbase.manual.HBaseTest -load 10:50:100:20 -tn 
 load_test -read 1:10:50:20 -zk zk_quorum -bloom ROWCOL -compression 
 GZIP
 I will be using this code to load-test the delta encoding patch and making 
 fixes, but I am submitting the patch for early feedback. I will probably try 
 out its other functionality and comment on how it works.


[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)

2011-12-06 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4908:
--

Attachment: 0002-HBase-cluster-test-tool.patch

Uploading a patch for Jenkins testing.

 HBase cluster test tool (port from 0.89-fb)
 ---

 Key: HBASE-4908
 URL: https://issues.apache.org/jira/browse/HBASE-4908
 Project: HBase
  Issue Type: Sub-task
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: 0001-HBase-cluster-test-tool.patch, 
 0002-HBase-cluster-test-tool.patch, D549.1.patch, D549.2.patch, D549.3.patch, 
 D549.4.patch, D549.5.patch, D549.6.patch, D549.7.patch, D549.8.patch


 Porting one of our HBase cluster test tools (a single-process multi-threaded 
 load generator and verifier) from 0.89-fb to trunk.
 I cleaned up the code a bit compared to what's in 0.89-fb, and discovered 
 that it has some features that I have not tried yet (some kind of a kill 
 test, and some way to run HBase as multiple processes on one machine).
 The main utility of this piece of code for us has been the HBaseClusterTest 
 command-line tool (called HBaseTest in 0.89-fb), which we usually invoke as a 
 load test in our five-node dev cluster testing, e.g.:
 hbase org.apache.hadoop.hbase.manual.HBaseTest -load 10:50:100:20 -tn 
 load_test -read 1:10:50:20 -zk zk_quorum -bloom ROWCOL -compression 
 GZIP
 I will be using this code to load-test the delta encoding patch and making 
 fixes, but I am submitting the patch for early feedback. I will probably try 
 out its other functionality and comment on how it works.


[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)

2011-12-05 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4908:
--

Status: Patch Available  (was: Open)

 HBase cluster test tool (port from 0.89-fb)
 ---

 Key: HBASE-4908
 URL: https://issues.apache.org/jira/browse/HBASE-4908
 Project: HBase
  Issue Type: Sub-task
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: 0001-HBase-cluster-test-tool.patch, D549.1.patch, 
 D549.2.patch, D549.3.patch, D549.4.patch, D549.5.patch, D549.6.patch, 
 D549.7.patch


 Porting one of our HBase cluster test tools (a single-process multi-threaded 
 load generator and verifier) from 0.89-fb to trunk.
 I cleaned up the code a bit compared to what's in 0.89-fb, and discovered 
 that it has some features that I have not tried yet (some kind of a kill 
 test, and some way to run HBase as multiple processes on one machine).
 The main utility of this piece of code for us has been the HBaseClusterTest 
 command-line tool (called HBaseTest in 0.89-fb), which we usually invoke as a 
 load test in our five-node dev cluster testing, e.g.:
 hbase org.apache.hadoop.hbase.manual.HBaseTest -load 10:50:100:20 -tn 
 load_test -read 1:10:50:20 -zk zk_quorum -bloom ROWCOL -compression 
 GZIP
 I will be using this code to load-test the delta encoding patch and making 
 fixes, but I am submitting the patch for early feedback. I will probably try 
 out its other functionality and comment on how it works.


[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)

2011-12-05 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4908:
--

Attachment: 0001-HBase-cluster-test-tool.patch

 HBase cluster test tool (port from 0.89-fb)
 ---

 Key: HBASE-4908
 URL: https://issues.apache.org/jira/browse/HBASE-4908
 Project: HBase
  Issue Type: Sub-task
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: 0001-HBase-cluster-test-tool.patch, D549.1.patch, 
 D549.2.patch, D549.3.patch, D549.4.patch, D549.5.patch, D549.6.patch, 
 D549.7.patch


 Porting one of our HBase cluster test tools (a single-process multi-threaded 
 load generator and verifier) from 0.89-fb to trunk.
 I cleaned up the code a bit compared to what's in 0.89-fb, and discovered 
 that it has some features that I have not tried yet (some kind of a kill 
 test, and some way to run HBase as multiple processes on one machine).
 The main utility of this piece of code for us has been the HBaseClusterTest 
 command-line tool (called HBaseTest in 0.89-fb), which we usually invoke as a 
 load test in our five-node dev cluster testing, e.g.:
 hbase org.apache.hadoop.hbase.manual.HBaseTest -load 10:50:100:20 -tn 
 load_test -read 1:10:50:20 -zk zk_quorum -bloom ROWCOL -compression 
 GZIP
 I will be using this code to load-test the delta encoding patch and making 
 fixes, but I am submitting the patch for early feedback. I will probably try 
 out its other functionality and comment on how it works.


[jira] [Updated] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)

2011-11-28 Thread Mikhail Bautin (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Mikhail Bautin updated HBASE-4218:
--

Attachment: Delta_encoding_with_memstore_TS.patch

Attaching the most recent patch for testing on Jenkins. This is still pending
cluster testing.

Delta Encoding of KeyValues (aka prefix compression)
-

Key: HBASE-4218
URL: https://issues.apache.org/jira/browse/HBASE-4218
Project: HBase
Issue Type: Improvement
Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Labels: compression
Attachments: D447.1.patch, D447.2.patch, D447.3.patch, D447.4.patch,
D447.5.patch, Delta_encoding_with_memstore_TS.patch, open-source.diff

A compression for keys. Keys are sorted in HFile and they are usually very
similar. Because of that, it is possible to design better compression than
general-purpose algorithms.
It is an additional step designed to be used in memory. It aims to save
memory in cache as well as speeding seeks within HFileBlocks. It should
improve performance a lot, if key lengths are larger than value lengths. For
example, it makes a lot of sense to use it when value is a counter.
Initial tests on real data (key length = ~90 bytes, value length = 8 bytes)
show that I could achieve a decent level of compression:
key compression ratio: 92%
total compression ratio: 85%
LZO on the same data: 85%
LZO after delta encoding: 91%
This comes with much better performance (20-80% faster decompression than
LZO). Moreover, it should allow far more efficient seeking, which should
improve performance a bit.
It seems that simple compression algorithms are good enough. Most of the
savings are due to prefix compression, int128 encoding, timestamp diffs and
bitfields to avoid duplication. That way, comparisons of compressed data can
be much faster than a byte comparator (thanks to prefix compression and
bitfields).
In order to implement it in HBase two important changes in design will be
needed:
-solidify interface to HFileBlock / HFileReader Scanner to provide seeking
and iterating; access to uncompressed buffer in HFileBlock will have bad
performance
-extend comparators to support comparison assuming that N first bytes are
equal (or some fields are equal)
Link to a discussion about something similar:
http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression

[jira] [Updated] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-25 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4863:
--

Status: Open  (was: Patch Available)

 Make HBase Thrift server more configurable and add a command-line UI test
 -

 Key: HBASE-4863
 URL: https://issues.apache.org/jira/browse/HBASE-4863
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: 
 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, 
 0002-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, D531.1.patch, 
 D531.2.patch, D531.3.patch, D531.4.patch


 This started as an internal hotfix where we found out that the Thrift server 
 spawned 15000 threads. To bound the thread pool size I added a custom thread 
 pool server implementation called HBaseThreadPoolServer to the HBase codebase, 
 and made the following parameters configurable from both command line and as 
 config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. 
 Under an increasing load, the server creates new threads for every connection 
 before the pool size reaches minWorkerThreads. After that, the server puts 
 new connections into the queue and only creates a new thread when the queue 
 is full. If an attempt to create a new thread fails, the server drops the 
 connection. The default TThreadPoolServer would crash in that case, but it 
 never happened because the thread pool was unbounded, so the server would 
 hang indefinitely, consume a lot of memory, and cause huge latency spikes on 
 the client side.
 Another part of this fix is refactoring and unit testing of the command-line 
 part of the Thrift server. The logic there is sufficiently complicated, and 
 the existing ThriftServer class does not test that part at all. The new 
 TestThriftServerCmdLine test starts the Thrift server on a random port with 
 various combinations of options and talks to it through the client API from 
 another thread.
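
 The thread-creation and queueing behavior described above matches the
 documented semantics of java.util.concurrent.ThreadPoolExecutor with a bounded
 queue, so it can be sketched without any Thrift code. This is an assumption
 made for illustration, not a statement about how HBaseThreadPoolServer is
 implemented; BoundedPoolSketch and its methods are hypothetical names, with
 parameters named after the three settings listed in the description.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a bounded worker pool with a bounded request queue. */
final class BoundedPoolSketch {
    static ThreadPoolExecutor newBoundedPool(
            int minWorkerThreads, int maxWorkerThreads, int maxQueuedRequests) {
        // Threads are created per task until minWorkerThreads is reached,
        // then tasks are queued, and extra threads (up to maxWorkerThreads)
        // are created only once the queue is full.
        return new ThreadPoolExecutor(
                minWorkerThreads,
                maxWorkerThreads,
                60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(maxQueuedRequests));
    }

    /** Submits blocking tasks until one is rejected; returns how many were accepted. */
    static int countAcceptedBeforeRejection(
            ThreadPoolExecutor pool, CountDownLatch release, int attempts) {
        int accepted = 0;
        for (int i = 0; i < attempts; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await(); // simulate a long-lived connection
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                accepted++;
            } catch (RejectedExecutionException e) {
                // Pool and queue are both full: the analogue of dropping
                // the connection instead of growing without bound.
                break;
            }
        }
        return accepted;
    }
}
```

 With minWorkerThreads=2, maxWorkerThreads=4 and maxQueuedRequests=2, the pool
 accepts two tasks on new threads, queues the next two, starts two more threads
 for the following two, and rejects the one after that.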


[jira] [Updated] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-25 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4863:
--

Attachment: 0002-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch

 Make HBase Thrift server more configurable and add a command-line UI test
 -

 Key: HBASE-4863
 URL: https://issues.apache.org/jira/browse/HBASE-4863
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: 
 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, 
 0002-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, D531.1.patch, 
 D531.2.patch, D531.3.patch, D531.4.patch


 This started as an internal hotfix where we found out that the Thrift server 
 spawned 15000 threads. To bound the thread pool size I added a custom thread 
 pool server implementation called HBaseThreadPoolServer to the HBase codebase, 
 and made the following parameters configurable from both command line and as 
 config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. 
 Under an increasing load, the server creates new threads for every connection 
 before the pool size reaches minWorkerThreads. After that, the server puts 
 new connections into the queue and only creates a new thread when the queue 
 is full. If an attempt to create a new thread fails, the server drops the 
 connection. The default TThreadPoolServer would crash in that case, but it 
 never happened because the thread pool was unbounded, so the server would 
 hang indefinitely, consume a lot of memory, and cause huge latency spikes on 
 the client side.
 Another part of this fix is refactoring and unit testing of the command-line 
 part of the Thrift server. The logic there is sufficiently complicated, and 
 the existing ThriftServer class does not test that part at all. The new 
 TestThriftServerCmdLine test starts the Thrift server on a random port with 
 various combinations of options and talks to it through the client API from 
 another thread.


[jira] [Updated] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4863:
--

Status: Patch Available  (was: Open)

 Make HBase Thrift server more configurable and add a command-line UI test
 -

 Key: HBASE-4863
 URL: https://issues.apache.org/jira/browse/HBASE-4863
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: 
 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, D531.1.patch, 
 D531.2.patch, D531.3.patch




[jira] [Updated] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4863:
--

Attachment: 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch

The same as D531.3.patch but generated using git format-patch --no-prefix 
HEAD^..HEAD so that it can be applied using the normal patch command.

 Make HBase Thrift server more configurable and add a command-line UI test
 -

 Key: HBASE-4863
 URL: https://issues.apache.org/jira/browse/HBASE-4863
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: 
 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, D531.1.patch, 
 D531.2.patch, D531.3.patch




[jira] [Updated] (HBASE-4809) Per-CF set RPC metrics

2011-11-20 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4809:
--

Attachment: HBASE-4809_Per_CF_set_RPC_metrics.patch

This corresponds to D483.3.patch.

 Per-CF set RPC metrics
 --

 Key: HBASE-4809
 URL: https://issues.apache.org/jira/browse/HBASE-4809
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D483.1.patch, D483.2.patch, D483.3.patch, 
 HBASE-4809_Per_CF_set_RPC_metrics.patch


 Porting per-CF set metrics for RPC times and response sizes from 0.89-fb to 
 trunk. For each mutation signature (the set of column families involved in an 
 RPC request) we increment several metrics, which allows us to monitor access 
 patterns. Guarding against an explosion in the number of metrics is handled 
 in HBASE-4638 (which might even be implemented as part of this JIRA).
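
 As a rough illustration of the idea, the sketch below derives a signature 
 from the set of column families in a request and increments a few counters 
 under that signature. It is an assumption-laden sketch, not the patch itself: 
 the class name, the "cf." prefix, the "~" separator, and the metric names are 
 all hypothetical, and it does not attempt the metric-count guard discussed in 
 HBASE-4638.

```java
import java.util.Collection;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration; names and key format are not from the patch.
public class PerCfSetMetricsSketch {
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    // Canonical signature: sorted, de-duplicated family names joined by '~',
    // so {"b","a"} and {"a","b"} map to the same metric keys.
    static String signature(Collection<String> families) {
        TreeSet<String> sorted = new TreeSet<>(families);
        return String.join("~", sorted);
    }

    // Records one RPC against the counters for its column-family set.
    void record(Collection<String> families, long rpcTimeMs, long responseSizeBytes) {
        String prefix = "cf." + signature(families) + ".";
        add(prefix + "rpcCount", 1);
        add(prefix + "rpcTimeMs", rpcTimeMs);
        add(prefix + "responseSizeBytes", responseSizeBytes);
    }

    private void add(String name, long delta) {
        counters.computeIfAbsent(name, k -> new AtomicLong()).addAndGet(delta);
    }

    // Returns the current value of a counter, or 0 if it was never touched.
    long get(String name) {
        AtomicLong c = counters.get(name);
        return c == null ? 0L : c.get();
    }

    public static void main(String[] args) {
        PerCfSetMetricsSketch m = new PerCfSetMetricsSketch();
        m.record(java.util.Arrays.asList("b", "a"), 5, 100);
        m.record(java.util.Arrays.asList("a", "b"), 7, 50);
        System.out.println(m.get("cf.a~b.rpcCount"));
    }
}
```

 Because every distinct family set creates its own group of counters, the 
 number of metrics grows with the number of distinct signatures, which is the 
 explosion the description refers to.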


[jira] [Updated] (HBASE-4809) Per-CF set RPC metrics

2011-11-20 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4809:
--

Release Note: Testing the patch on Hudson.
  Status: Patch Available  (was: Open)

 Per-CF set RPC metrics
 --

 Key: HBASE-4809
 URL: https://issues.apache.org/jira/browse/HBASE-4809
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D483.1.patch, D483.2.patch, D483.3.patch, 
 HBASE-4809_Per_CF_set_RPC_metrics.patch




[jira] [Updated] (HBASE-4795) Fix TestHFileBlock when running on a 32-bit JVM

2011-11-15 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4795:
--

Priority: Minor  (was: Major)

 Fix TestHFileBlock when running on a 32-bit JVM
 ---

 Key: HBASE-4795
 URL: https://issues.apache.org/jira/browse/HBASE-4795
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D459.1.patch


 Our Hudson test server seems to run a 32-bit JVM. This patch fixes 
 TestHFileBlock to work correctly on both 64-bit and 32-bit JVMs.


[jira] [Updated] (HBASE-4795) Fix TestHFileBlock when running on a 32-bit JVM

2011-11-15 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4795:
--

Status: Patch Available  (was: Open)

 Fix TestHFileBlock when running on a 32-bit JVM
 ---

 Key: HBASE-4795
 URL: https://issues.apache.org/jira/browse/HBASE-4795
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D459.1.patch




[jira] [Updated] (HBASE-4795) Fix TestHFileBlock when running on a 32-bit JVM

2011-11-15 Thread Mikhail Bautin (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4795:
--

Status: Open  (was: Patch Available)

 Fix TestHFileBlock when running on a 32-bit JVM
 ---

 Key: HBASE-4795
 URL: https://issues.apache.org/jira/browse/HBASE-4795
 Project: HBase
  Issue Type: Bug
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D459.1.patch, D459.2.patch



