[jira] [Updated] (HBASE-15357) TableInputFormatBase getSplitKey does not handle signed bytes correctly
[ https://issues.apache.org/jira/browse/HBASE-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nathan Schile updated HBASE-15357:
----------------------------------
    Attachment: 15357-branch-1.v3.txt

Fixes the merge of the v2 patch onto branch-1. There were lines that needed to be deleted.

> TableInputFormatBase getSplitKey does not handle signed bytes correctly
> -----------------------------------------------------------------------
>
> Key: HBASE-15357
> URL: https://issues.apache.org/jira/browse/HBASE-15357
> Project: HBase
> Issue Type: Bug
> Components: mapreduce
> Reporter: Nathan Schile
> Assignee: Nathan Schile
> Fix For: 2.0.0, 1.4.0
> Attachments: 15357-branch-1.v2.txt, 15357-branch-1.v3.txt, 15357.v2.txt, HBASE-15357.patch
>
> When auto-balance is enabled in TableInputFormatBase and the table key is a
> binary key, the getSplitKey method does not function correctly for signed
> bytes. The proposed solution is to use the
> org.apache.hadoop.hbase.util.Bytes#split method to find the split key.
> org.apache.hadoop.hbase.util.Bytes#split is stated to be an expensive
> operation, so if another solution is preferred, that is fine. In addition,
> handling of a split key that is equal to the TableSplit end key is added to
> calculateRebalancedSplits.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-15357) TableInputFormatBase getSplitKey does not handle signed bytes correctly
[ https://issues.apache.org/jira/browse/HBASE-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nathan Schile updated HBASE-15357:
----------------------------------
    Status: Patch Available (was: Open)

> TableInputFormatBase getSplitKey does not handle signed bytes correctly
> -----------------------------------------------------------------------
>
> Key: HBASE-15357
> URL: https://issues.apache.org/jira/browse/HBASE-15357
> Project: HBase
> Issue Type: Bug
> Components: mapreduce
> Reporter: Nathan Schile
> Assignee: Nathan Schile
> Fix For: 2.0.0, 1.4.0
> Attachments: 15357-branch-1.v2.txt, 15357.v2.txt, HBASE-15357.patch
>
> When auto-balance is enabled in TableInputFormatBase and the table key is a
> binary key, the getSplitKey method does not function correctly for signed
> bytes. The proposed solution is to use the
> org.apache.hadoop.hbase.util.Bytes#split method to find the split key.
> org.apache.hadoop.hbase.util.Bytes#split is stated to be an expensive
> operation, so if another solution is preferred, that is fine. In addition,
> handling of a split key that is equal to the TableSplit end key is added to
> calculateRebalancedSplits.
[jira] [Commented] (HBASE-15357) TableInputFormatBase getSplitKey does not handle signed bytes correctly
[ https://issues.apache.org/jira/browse/HBASE-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204484#comment-15204484 ]

Nathan Schile commented on HBASE-15357:
---------------------------------------

What is the next step for this JIRA? Thanks.

> TableInputFormatBase getSplitKey does not handle signed bytes correctly
> -----------------------------------------------------------------------
>
> Key: HBASE-15357
> URL: https://issues.apache.org/jira/browse/HBASE-15357
> Project: HBase
> Issue Type: Bug
> Components: mapreduce
> Reporter: Nathan Schile
> Assignee: Nathan Schile
> Attachments: HBASE-15357.patch
>
> When auto-balance is enabled in TableInputFormatBase and the table key is a
> binary key, the getSplitKey method does not function correctly for signed
> bytes. The proposed solution is to use the
> org.apache.hadoop.hbase.util.Bytes#split method to find the split key.
> org.apache.hadoop.hbase.util.Bytes#split is stated to be an expensive
> operation, so if another solution is preferred, that is fine. In addition,
> handling of a split key that is equal to the TableSplit end key is added to
> calculateRebalancedSplits.
[jira] [Commented] (HBASE-15428) Port auto-balancing to MultiTableInputFormatBase
[ https://issues.apache.org/jira/browse/HBASE-15428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185628#comment-15185628 ]

Nathan Schile commented on HBASE-15428:
---------------------------------------

[~yuzhih...@gmail.com] Sorry, I forgot to mention that I would change TableInputFormatBase#calculateRebalancedSplits to be static.

> Port auto-balancing to MultiTableInputFormatBase
> ------------------------------------------------
>
> Key: HBASE-15428
> URL: https://issues.apache.org/jira/browse/HBASE-15428
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Reporter: Nathan Schile
> Assignee: Nathan Schile
>
> Apache Crunch currently uses
> [MultiTableInputFormatBase|https://github.com/apache/crunch/blob/apache-crunch-0.13.0/crunch-hbase/src/main/java/org/apache/crunch/io/hbase/HBaseSourceTarget.java#L88]
> as the default format for reading HBase data. I would like to use the
> functionality provided by
> [HBASE-12590|https://issues.apache.org/jira/browse/HBASE-12590] "A solution
> for data skew in HBase-Mapreduce Job"; however, it is only available in
> TableInputFormatBase. This JIRA is to port the changes from
> TableInputFormatBase into MultiTableInputFormatBase with respect to
> [HBASE-12590|https://issues.apache.org/jira/browse/HBASE-12590].
>
> I would like to use the
> [TableInputFormatBase#calculateRebalancedSplits|https://github.com/apache/hbase/blob/rel/1.2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java#L381]
> and
> [TableInputFormatBase#getSplitKey|https://github.com/apache/hbase/blob/rel/1.2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java#L454]
> methods from TableInputFormatBase. Is it OK to use those methods directly
> from MultiTableInputFormatBase, or should I move them to a new class? I can
> submit a patch once I get direction on the above question.
[jira] [Created] (HBASE-15428) Port auto-balancing to MultiTableInputFormatBase
Nathan Schile created HBASE-15428:
----------------------------------

    Summary: Port auto-balancing to MultiTableInputFormatBase
    Key: HBASE-15428
    URL: https://issues.apache.org/jira/browse/HBASE-15428
    Project: HBase
    Issue Type: Improvement
    Components: mapreduce
    Reporter: Nathan Schile
    Assignee: Nathan Schile

Apache Crunch currently uses
[MultiTableInputFormatBase|https://github.com/apache/crunch/blob/apache-crunch-0.13.0/crunch-hbase/src/main/java/org/apache/crunch/io/hbase/HBaseSourceTarget.java#L88]
as the default format for reading HBase data. I would like to use the
functionality provided by
[HBASE-12590|https://issues.apache.org/jira/browse/HBASE-12590] "A solution
for data skew in HBase-Mapreduce Job"; however, it is only available in
TableInputFormatBase. This JIRA is to port the changes from
TableInputFormatBase into MultiTableInputFormatBase with respect to
[HBASE-12590|https://issues.apache.org/jira/browse/HBASE-12590].

I would like to use the
[TableInputFormatBase#calculateRebalancedSplits|https://github.com/apache/hbase/blob/rel/1.2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java#L381]
and
[TableInputFormatBase#getSplitKey|https://github.com/apache/hbase/blob/rel/1.2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java#L454]
methods from TableInputFormatBase. Is it OK to use those methods directly
from MultiTableInputFormatBase, or should I move them to a new class? I can
submit a patch once I get direction on the above question.
[jira] [Commented] (HBASE-15357) TableInputFormatBase getSplitKey does not handle signed bytes correctly
[ https://issues.apache.org/jira/browse/HBASE-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173160#comment-15173160 ]

Nathan Schile commented on HBASE-15357:
---------------------------------------

For reference, the current code returns a split point of -4 when the split point should be 124.

{code}
byte[] start = { 120 }; // 'x'
byte[] end = { -128 }; // '€'
byte[] splitPoint = { -4 };
{code}

> TableInputFormatBase getSplitKey does not handle signed bytes correctly
> -----------------------------------------------------------------------
>
> Key: HBASE-15357
> URL: https://issues.apache.org/jira/browse/HBASE-15357
> Project: HBase
> Issue Type: Bug
> Components: mapreduce
> Reporter: Nathan Schile
> Assignee: Nathan Schile
> Attachments: HBASE-15357.patch
>
> When auto-balance is enabled in TableInputFormatBase and the table key is a
> binary key, the getSplitKey method does not function correctly for signed
> bytes. The proposed solution is to use the
> org.apache.hadoop.hbase.util.Bytes#split method to find the split key.
> org.apache.hadoop.hbase.util.Bytes#split is stated to be an expensive
> operation, so if another solution is preferred, that is fine. In addition,
> handling of a split key that is equal to the TableSplit end key is added to
> calculateRebalancedSplits.
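
The numbers in the comment above can be reproduced with a small standalone sketch. It assumes the faulty midpoint comes from averaging two Java `byte` values directly, which matches the reported -4 but is not copied from the HBase source. The class and method names are hypothetical.

```java
// Hypothetical demo class; not part of HBase.
class SignedByteMidpoint {

    // Naive midpoint: Java bytes are signed, so 0x80 is read as -128.
    static byte signedMidpoint(byte start, byte end) {
        return (byte) ((start + end) / 2);
    }

    // Midpoint after masking each byte to its unsigned value (0..255).
    static byte unsignedMidpoint(byte start, byte end) {
        int s = start & 0xFF;
        int e = end & 0xFF;
        return (byte) ((s + e) / 2);
    }

    public static void main(String[] args) {
        byte start = 120;  // 0x78
        byte end = -128;   // 0x80, which is 128 when read as unsigned
        System.out.println(signedMidpoint(start, end));   // prints -4
        System.out.println(unsignedMidpoint(start, end)); // prints 124
    }
}
```

HBase orders row keys as unsigned bytes, so 0x80 sorts after 0x78 and the midpoint has to be taken on the unsigned values.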
[jira] [Commented] (HBASE-15357) TableInputFormatBase getSplitKey does not handle signed bytes correctly
[ https://issues.apache.org/jira/browse/HBASE-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171400#comment-15171400 ]

Nathan Schile commented on HBASE-15357:
---------------------------------------

Review board: https://reviews.apache.org/r/44155/diff/1#index_header

> TableInputFormatBase getSplitKey does not handle signed bytes correctly
> -----------------------------------------------------------------------
>
> Key: HBASE-15357
> URL: https://issues.apache.org/jira/browse/HBASE-15357
> Project: HBase
> Issue Type: Bug
> Components: mapreduce
> Reporter: Nathan Schile
> Assignee: Nathan Schile
> Attachments: HBASE-15357.patch
>
> When auto-balance is enabled in TableInputFormatBase and the table key is a
> binary key, the getSplitKey method does not function correctly for signed
> bytes. The proposed solution is to use the
> org.apache.hadoop.hbase.util.Bytes#split method to find the split key.
> org.apache.hadoop.hbase.util.Bytes#split is stated to be an expensive
> operation, so if another solution is preferred, that is fine. In addition,
> handling of a split key that is equal to the TableSplit end key is added to
> calculateRebalancedSplits.
[jira] [Updated] (HBASE-15357) TableInputFormatBase getSplitKey does not handle signed bytes correctly
[ https://issues.apache.org/jira/browse/HBASE-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nathan Schile updated HBASE-15357:
----------------------------------
    Status: Patch Available (was: Open)

> TableInputFormatBase getSplitKey does not handle signed bytes correctly
> -----------------------------------------------------------------------
>
> Key: HBASE-15357
> URL: https://issues.apache.org/jira/browse/HBASE-15357
> Project: HBase
> Issue Type: Bug
> Components: mapreduce
> Reporter: Nathan Schile
> Assignee: Nathan Schile
> Attachments: HBASE-15357.patch
>
> When auto-balance is enabled in TableInputFormatBase and the table key is a
> binary key, the getSplitKey method does not function correctly for signed
> bytes. The proposed solution is to use the
> org.apache.hadoop.hbase.util.Bytes#split method to find the split key.
> org.apache.hadoop.hbase.util.Bytes#split is stated to be an expensive
> operation, so if another solution is preferred, that is fine. In addition,
> handling of a split key that is equal to the TableSplit end key is added to
> calculateRebalancedSplits.
[jira] [Updated] (HBASE-15357) TableInputFormatBase getSplitKey does not handle signed bytes correctly
[ https://issues.apache.org/jira/browse/HBASE-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nathan Schile updated HBASE-15357:
----------------------------------
    Attachment: HBASE-15357.patch

> TableInputFormatBase getSplitKey does not handle signed bytes correctly
> -----------------------------------------------------------------------
>
> Key: HBASE-15357
> URL: https://issues.apache.org/jira/browse/HBASE-15357
> Project: HBase
> Issue Type: Bug
> Components: mapreduce
> Reporter: Nathan Schile
> Assignee: Nathan Schile
> Attachments: HBASE-15357.patch
>
> When auto-balance is enabled in TableInputFormatBase and the table key is a
> binary key, the getSplitKey method does not function correctly for signed
> bytes. The proposed solution is to use the
> org.apache.hadoop.hbase.util.Bytes#split method to find the split key.
> org.apache.hadoop.hbase.util.Bytes#split is stated to be an expensive
> operation, so if another solution is preferred, that is fine. In addition,
> handling of a split key that is equal to the TableSplit end key is added to
> calculateRebalancedSplits.
[jira] [Created] (HBASE-15357) TableInputFormatBase getSplitKey does not handle signed bytes correctly
Nathan Schile created HBASE-15357:
----------------------------------

    Summary: TableInputFormatBase getSplitKey does not handle signed bytes correctly
    Key: HBASE-15357
    URL: https://issues.apache.org/jira/browse/HBASE-15357
    Project: HBase
    Issue Type: Bug
    Components: mapreduce
    Reporter: Nathan Schile
    Assignee: Nathan Schile

When auto-balance is enabled in TableInputFormatBase and the table key is a
binary key, the getSplitKey method does not function correctly for signed
bytes. The proposed solution is to use the
org.apache.hadoop.hbase.util.Bytes#split method to find the split key.
org.apache.hadoop.hbase.util.Bytes#split is stated to be an expensive
operation, so if another solution is preferred, that is fine. In addition,
handling of a split key that is equal to the TableSplit end key is added to
calculateRebalancedSplits.
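
The idea of an unsigned split key for multi-byte binary keys can be sketched without HBase on the classpath. The following is an illustration only, not the Bytes#split implementation and not the attached patch. It treats each key as an unsigned big-endian integer via `java.math.BigInteger`, right-pads the shorter key with zero bytes (an assumption about how keys of different lengths should line up), and halves the sum. The class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical demo class; not part of HBase.
class UnsignedKeyMidpoint {

    // Returns a key halfway between start and end, comparing bytes as unsigned.
    static byte[] midpoint(byte[] start, byte[] end) {
        int len = Math.max(start.length, end.length);
        // Signum 1 makes BigInteger read the bytes as a non-negative magnitude.
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, len));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(end, len));
        BigInteger mid = lo.add(hi).shiftRight(1);

        // toByteArray() may add a leading zero byte or drop leading zeros,
        // so copy the low-order bytes into a fixed-length result.
        byte[] raw = mid.toByteArray();
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }

    public static void main(String[] args) {
        byte[] single = midpoint(new byte[] { 120 }, new byte[] { -128 });
        System.out.println(Arrays.toString(single)); // prints [124]

        byte[] multi = midpoint(new byte[] { 1, -1 }, new byte[] { 3, 1 });
        System.out.println(Arrays.toString(multi));  // prints [2, -128]
    }
}
```

When the two keys are adjacent, this midpoint equals the start key, so a caller would still have to check the result against the split boundaries before using it.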