[jira] [Updated] (HBASE-4489) Better key splitting in RegionSplitter
[ https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Revell updated HBASE-4489:
---

Release Note: The split algorithm used by RegionSplitter is now a required parameter. Previously there was only one split algorithm, MD5StringSplit, which was the default. MD5StringSplit has been renamed to HexStringSplit and tweaked so that its maximum key is now instead of 7FFF. A new split algorithm, UniformSplit, has been added; it treats keys as arbitrary bytes.

Better key splitting in RegionSplitter
--
Key: HBASE-4489
URL: https://issues.apache.org/jira/browse/HBASE-4489
Project: HBase
Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Dave Revell
Assignee: Dave Revell
Fix For: 0.90.5
Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-branch0.90-v2.patch, HBASE-4489-branch0.90-v3.patch, HBASE-4489-trunk-v1.patch, HBASE-4489-trunk-v2.patch, HBASE-4489-trunk-v3.patch, HBASE-4489-trunk-v4.patch, HBASE-4489-trunk-v5.patch

The RegionSplitter utility allows users to create a pre-split table from the command line or do a rolling split on an existing table. It supports pluggable split algorithms that implement the SplitAlgorithm interface. The only (and default) SplitAlgorithm assumes that keys fall in the range from ASCII string to ASCII string 7FFF. This is not a sane default and seems useless to most users, who are likely to be surprised that all region splits occur within the byte range of ASCII characters. A better default split algorithm would evenly divide the space of all bytes, which is what this patch does. Making a table with five regions would split at \x33\x33..., \x66\x66..., \x99\x99..., \xCC\xCC..., and \xFF\xFF.

-- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
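The even division of the byte space described in HBASE-4489 can be sketched as below. This is a minimal illustration, not the patch's UniformSplit code: it assumes fixed-length keys, and the class name UniformSplitSketch, its methods, and the keyLength parameter are hypothetical. It uses java.math.BigInteger to divide the range from all-\x00 to all-\xFF into equal parts and returns only the interior boundaries, so five regions yield four split keys; the last region runs up to the top of the key space (\xFF\xFF...).

```java
import java.math.BigInteger;

// Hypothetical sketch of uniform byte-space splitting; not HBase's UniformSplit.
public class UniformSplitSketch {

    /**
     * Returns numRegions - 1 split keys that divide the space of all
     * keyLength-byte keys (0x00...00 to 0xFF...FF) into equal ranges.
     */
    public static byte[][] split(int numRegions, int keyLength) {
        if (numRegions < 2 || keyLength < 1) {
            throw new IllegalArgumentException("need numRegions >= 2 and keyLength >= 1");
        }
        // Largest key of this length: 0xFF repeated keyLength times, i.e. 2^(8*keyLength) - 1.
        BigInteger max = BigInteger.ONE.shiftLeft(8 * keyLength).subtract(BigInteger.ONE);
        BigInteger step = max.divide(BigInteger.valueOf(numRegions));
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            splits[i - 1] = toFixedLength(step.multiply(BigInteger.valueOf(i)), keyLength);
        }
        return splits;
    }

    /** Converts a non-negative value to exactly len big-endian bytes (drops the sign byte, left-pads with zeros). */
    static byte[] toFixedLength(BigInteger value, int len) {
        byte[] raw = value.toByteArray();
        byte[] out = new byte[len];
        int n = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - n, out, len - n, n);
        return out;
    }

    /** Renders a key in the \xNN notation used in the issue description. */
    public static String toHex(byte[] key) {
        StringBuilder sb = new StringBuilder();
        for (byte b : key) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (byte[] key : split(5, 2)) {
            System.out.println(toHex(key));
        }
    }
}
```

Running main prints \x33\x33, \x66\x66, \x99\x99 and \xCC\xCC, one per line. Because 0xFF = 5 x 0x33, the all-\xFF maximum is divisible by 5 for any key length, so longer keys show the same repeating pattern (\x33\x33\x33..., and so on).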
[jira] [Updated] (HBASE-4489) Better key splitting in RegionSplitter
[ https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Revell updated HBASE-4489:
---

Attachment: HBASE-4489-trunk-v5.patch

Nicolas, agreed. I've attached a new patch for trunk (v5) that incorporates your feedback. The v3 patch is still the most current patch for 0.90.

Better key splitting in RegionSplitter
--
Key: HBASE-4489
URL: https://issues.apache.org/jira/browse/HBASE-4489
Project: HBase
Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Dave Revell
Assignee: Dave Revell
Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-branch0.90-v2.patch, HBASE-4489-branch0.90-v3.patch, HBASE-4489-trunk-v1.patch, HBASE-4489-trunk-v2.patch, HBASE-4489-trunk-v3.patch, HBASE-4489-trunk-v4.patch, HBASE-4489-trunk-v5.patch
[jira] [Updated] (HBASE-4489) Better key splitting in RegionSplitter
[ https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Revell updated HBASE-4489:
---

Attachment: HBASE-4489-trunk-v4.patch

New patch for trunk with last night's feedback:
- The SplitAlgorithm to use is now a required parameter.
- MD5StringSplit is now called HexStringSplit.
- The ceiling of 7FFF is now , and tests were changed to accommodate this.

No changes are necessary on the 0.90 branch, so the -v3 patch for 0.90 is still current.

Better key splitting in RegionSplitter
--
Key: HBASE-4489
URL: https://issues.apache.org/jira/browse/HBASE-4489
Project: HBase
Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Dave Revell
Assignee: Dave Revell
Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-branch0.90-v2.patch, HBASE-4489-branch0.90-v3.patch, HBASE-4489-trunk-v1.patch, HBASE-4489-trunk-v2.patch, HBASE-4489-trunk-v3.patch, HBASE-4489-trunk-v4.patch
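The v4 change makes the split algorithm a required choice instead of silently falling back to a default. A minimal sketch of that pattern follows, together with a HexStringSplit-style divider for fixed-width hex-string keys. Everything here is hypothetical: SplitPoints is a stand-in, not HBase's SplitAlgorithm interface, the class and method names are invented, and the bounds "0000" to "FFFF" with width 4 are illustrative only, not the range the real HexStringSplit uses.

```java
import java.math.BigInteger;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a required, name-selected split algorithm; not HBase's RegionSplitter.
public class SplitAlgorithmChooser {

    /** Hypothetical stand-in for a pluggable split algorithm; not HBase's SplitAlgorithm. */
    public interface SplitPoints {
        /** Returns the numRegions - 1 interior split keys. */
        String[] split(int numRegions);
    }

    /**
     * HexStringSplit-style divider: keys are fixed-width uppercase hex strings
     * between lo and hi. lo, hi and width are caller-supplied assumptions.
     */
    public static SplitPoints hexSplit(String lo, String hi, int width) {
        BigInteger min = new BigInteger(lo, 16);
        BigInteger range = new BigInteger(hi, 16).subtract(min);
        return numRegions -> {
            if (numRegions < 2) {
                throw new IllegalArgumentException("numRegions must be >= 2");
            }
            String[] keys = new String[numRegions - 1];
            for (int i = 1; i < numRegions; i++) {
                // i-th boundary = lo + range * i / numRegions (integer division)
                BigInteger key = min.add(range.multiply(BigInteger.valueOf(i))
                        .divide(BigInteger.valueOf(numRegions)));
                keys[i - 1] = String.format("%0" + width + "X", key);
            }
            return keys;
        };
    }

    /** Resolves the algorithm by name; there is deliberately no default. */
    public static SplitPoints choose(String name, Map<String, SplitPoints> registry) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException(
                    "a split algorithm is required; choose one of " + registry.keySet());
        }
        SplitPoints algo = registry.get(name);
        if (algo == null) {
            throw new IllegalArgumentException("unknown split algorithm: " + name);
        }
        return algo;
    }

    public static void main(String[] args) {
        Map<String, SplitPoints> registry = new TreeMap<>();
        // Illustrative bounds only; not the range used by the real HexStringSplit.
        registry.put("HexStringSplit", hexSplit("0000", "FFFF", 4));
        SplitPoints algo = choose(args.length > 0 ? args[0] : null, registry);
        System.out.println(String.join(", ", algo.split(4)));
    }
}
```

Passing HexStringSplit as the first argument prints 3FFF, 7FFF, BFFF; omitting it throws IllegalArgumentException listing the known names, which is the fail-fast behavior a required parameter implies. Changing the hi bound changes every boundary, which is why a new ceiling also requires updated test expectations.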
[jira] [Updated] (HBASE-4489) Better key splitting in RegionSplitter
[ https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Revell updated HBASE-4489:
---

Attachment: HBASE-4489-branch0.90-v3.patch, HBASE-4489-trunk-v3.patch

v3 patches with suggested fixes.

Better key splitting in RegionSplitter
--
Key: HBASE-4489
URL: https://issues.apache.org/jira/browse/HBASE-4489
Project: HBase
Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Dave Revell
Assignee: Dave Revell
Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-branch0.90-v2.patch, HBASE-4489-branch0.90-v3.patch, HBASE-4489-trunk-v1.patch, HBASE-4489-trunk-v2.patch, HBASE-4489-trunk-v3.patch
[jira] [Updated] (HBASE-4449) LoadIncrementalHFiles should be able to handle CFs with blooms
[ https://issues.apache.org/jira/browse/HBASE-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Revell updated HBASE-4449:
---

Resolution: Fixed
Status: Resolved (was: Patch Available)

LoadIncrementalHFiles should be able to handle CFs with blooms
--
Key: HBASE-4449
URL: https://issues.apache.org/jira/browse/HBASE-4449
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Dave Revell
Assignee: Dave Revell
Fix For: 0.90.5
Attachments: HBASE-4449-trunk-testsonly.patch, HBASE-4449-v2.patch, HBASE-4449.patch

When LoadIncrementalHFiles loads a store file that crosses region boundaries, it splits the file at the boundary into two store files. If the store file belongs to a column family that has a bloom filter, a java.lang.ArithmeticException: / by zero is raised because ByteBloomFilter() is called with maxKeys of 0.

The included patch assumes that the number of keys in each split child equals the number of keys in the parent's bloom filter (instead of 0). This is an overestimate, but it's safe and easy.
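Why a zero key count fails, and why the overestimate is safe, can be sketched as below. This is a hypothetical illustration, not ByteBloomFilter's code: it uses the textbook bloom-filter sizing formulas (bits m = ceil(-n ln p / (ln 2)^2), hash count about (m / n) ln 2, with n keys and false-positive rate p), and the class and method names are invented. Like the constructor described in the issue, its hash-count step divides by maxKeys, so a child sized for 0 keys raises java.lang.ArithmeticException: / by zero.

```java
// Hypothetical bloom-filter sizing sketch; not HBase's ByteBloomFilter.
public class BloomSizingSketch {

    /** Textbook optimal bit count for maxKeys entries at the given false-positive rate. */
    public static int optimalBitCount(int maxKeys, double errorRate) {
        double ln2 = Math.log(2);
        return (int) Math.ceil(-maxKeys * Math.log(errorRate) / (ln2 * ln2));
    }

    /** Hash-function count; the integer division throws ArithmeticException when maxKeys == 0. */
    public static int hashCount(int bitCount, int maxKeys) {
        int bitsPerKey = bitCount / maxKeys; // "/ by zero" if maxKeys == 0
        return Math.max(1, (int) Math.round(bitsPerKey * Math.log(2)));
    }

    /**
     * The approach described in the issue: size each split child for as many keys
     * as the parent's bloom filter held, instead of 0. This overestimates, since each
     * child receives only part of the parent's keys, but never undersizes the filter.
     * (Assumption for this sketch: the parent filter held at least one key.)
     */
    public static int childMaxKeys(int parentKeyCount) {
        if (parentKeyCount <= 0) {
            throw new IllegalArgumentException("parent bloom filter must hold at least one key");
        }
        return parentKeyCount;
    }

    public static void main(String[] args) {
        double errorRate = 0.01;
        int parentKeys = 1000;

        // Failing path: the child is sized for 0 keys.
        try {
            hashCount(optimalBitCount(0, errorRate), 0);
        } catch (ArithmeticException e) {
            System.out.println("maxKeys=0: " + e);
        }

        // Patched path: the child is sized for the parent's key count.
        int childKeys = childMaxKeys(parentKeys);
        int bits = optimalBitCount(childKeys, errorRate);
        System.out.println("child: maxKeys=" + childKeys + ", bits=" + bits
                + ", hashes=" + hashCount(bits, childKeys));
    }
}
```

With a 1% target rate and 1000 parent keys, each child gets 9586 bits and 6 hash functions. Since each child actually holds fewer keys than that, its real false-positive rate ends up below the target; the cost is only some wasted space.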
[jira] [Updated] (HBASE-4489) Better key splitting in RegionSplitter
[ https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Revell updated HBASE-4489:
---

Attachment: HBASE-4489-branch0.90-v2.patch, HBASE-4489-trunk-v2.patch

New patches ending in -v2. These have new tests for RegionSplitter.

Some weirdness: RegionSplitter.rollingSplit() seems to be broken, so it doesn't have any test cases in my code. I opened HBASE-4567 to focus on this. I also included a test case in TestRegionSplitter.java called reproduceDivByZeroFailure() that reproduces the problem. I think fixing this bug is outside the scope of this ticket.

Better key splitting in RegionSplitter
--
Key: HBASE-4489
URL: https://issues.apache.org/jira/browse/HBASE-4489
Project: HBase
Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Dave Revell
Assignee: Dave Revell
Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-branch0.90-v2.patch, HBASE-4489-trunk-v1.patch, HBASE-4489-trunk-v2.patch
[jira] [Updated] (HBASE-4489) Better key splitting in RegionSplitter
[ https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Revell updated HBASE-4489:
---

Attachment: HBASE-4489-trunk-v1.patch, HBASE-4489-branch0.90-v1.patch

Better key splitting in RegionSplitter
--
Key: HBASE-4489
URL: https://issues.apache.org/jira/browse/HBASE-4489
Project: HBase
Issue Type: Improvement
Reporter: Dave Revell
Assignee: Dave Revell
Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-trunk-v1.patch
[jira] [Updated] (HBASE-4489) Better key splitting in RegionSplitter
[ https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Revell updated HBASE-4489:
---

Affects Version/s: 0.90.4
Status: Patch Available (was: Open)

Better key splitting in RegionSplitter
--
Key: HBASE-4489
URL: https://issues.apache.org/jira/browse/HBASE-4489
Project: HBase
Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Dave Revell
Assignee: Dave Revell
Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-trunk-v1.patch