[jira] [Updated] (HBASE-15357) TableInputFormatBase getSplitKey does not handle signed bytes correctly

2016-04-29 Thread Nathan Schile (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Schile updated HBASE-15357:
--
Attachment: 15357-branch-1.v3.txt

Fixes merging of the v2 patch on branch-1. Some lines needed to be 
deleted. 

> TableInputFormatBase getSplitKey does not handle signed bytes correctly
> ---
>
> Key: HBASE-15357
> URL: https://issues.apache.org/jira/browse/HBASE-15357
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Reporter: Nathan Schile
>Assignee: Nathan Schile
> Fix For: 2.0.0, 1.4.0
>
> Attachments: 15357-branch-1.v2.txt, 15357-branch-1.v3.txt, 
> 15357.v2.txt, HBASE-15357.patch
>
>
> When auto-balance is enabled in TableInputFormatBase and the table key is a 
> binary key, the getSplitKey method does not function correctly for signed 
> bytes. The proposed solution is to use the 
> org.apache.hadoop.hbase.util.Bytes#split method to find the split key. 
> org.apache.hadoop.hbase.util.Bytes#split is stated to be an expensive 
> operation, so if another solution is preferred, that is fine. In addition, 
> handling of a split key that is equal to the TableSplit end key is added to 
> calculateRebalancedSplits.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-15357) TableInputFormatBase getSplitKey does not handle signed bytes correctly

2016-04-29 Thread Nathan Schile (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Schile updated HBASE-15357:
--
Status: Patch Available  (was: Open)

> TableInputFormatBase getSplitKey does not handle signed bytes correctly
> ---
>
> Key: HBASE-15357
> URL: https://issues.apache.org/jira/browse/HBASE-15357
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Reporter: Nathan Schile
>Assignee: Nathan Schile
> Fix For: 2.0.0, 1.4.0
>
> Attachments: 15357-branch-1.v2.txt, 15357.v2.txt, HBASE-15357.patch
>
>
> When auto-balance is enabled in TableInputFormatBase and the table key is a 
> binary key, the getSplitKey method does not function correctly for signed 
> bytes. The proposed solution is to use the 
> org.apache.hadoop.hbase.util.Bytes#split method to find the split key. 
> org.apache.hadoop.hbase.util.Bytes#split is stated to be an expensive 
> operation, so if another solution is preferred, that is fine. In addition, 
> handling of a split key that is equal to the TableSplit end key is added to 
> calculateRebalancedSplits.





[jira] [Commented] (HBASE-15357) TableInputFormatBase getSplitKey does not handle signed bytes correctly

2016-03-21 Thread Nathan Schile (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204484#comment-15204484
 ] 

Nathan Schile commented on HBASE-15357:
---

What is the next step for this JIRA? Thanks.

> TableInputFormatBase getSplitKey does not handle signed bytes correctly
> ---
>
> Key: HBASE-15357
> URL: https://issues.apache.org/jira/browse/HBASE-15357
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Reporter: Nathan Schile
>Assignee: Nathan Schile
> Attachments: HBASE-15357.patch
>
>
> When auto-balance is enabled in TableInputFormatBase and the table key is a 
> binary key, the getSplitKey method does not function correctly for signed 
> bytes. The proposed solution is to use the 
> org.apache.hadoop.hbase.util.Bytes#split method to find the split key. 
> org.apache.hadoop.hbase.util.Bytes#split is stated to be an expensive 
> operation, so if another solution is preferred, that is fine. In addition, 
> handling of a split key that is equal to the TableSplit end key is added to 
> calculateRebalancedSplits.





[jira] [Commented] (HBASE-15428) Port auto-balancing to MultiTableInputFormatBase

2016-03-08 Thread Nathan Schile (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185628#comment-15185628
 ] 

Nathan Schile commented on HBASE-15428:
---

[~yuzhih...@gmail.com] Sorry, I forgot to mention that I would change 
TableInputFormatBase#calculateRebalancedSplits to be static.

> Port auto-balancing to MultiTableInputFormatBase
> 
>
> Key: HBASE-15428
> URL: https://issues.apache.org/jira/browse/HBASE-15428
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Reporter: Nathan Schile
>Assignee: Nathan Schile
>
> Apache Crunch currently uses 
> [MultiTableInputFormatBase|https://github.com/apache/crunch/blob/apache-crunch-0.13.0/crunch-hbase/src/main/java/org/apache/crunch/io/hbase/HBaseSourceTarget.java#L88]
>  as the default format for reading HBase data. I would like to use the 
> functionality provided by 
> [HBASE-12590|https://issues.apache.org/jira/browse/HBASE-12590] "A solution 
> for data skew in HBase-Mapreduce Job", however it is only available in 
> TableInputFormatBase. This JIRA is to port the changes from 
> TableInputFormatBase into MultiTableInputFormatBase with respect to 
> [HBASE-12590|https://issues.apache.org/jira/browse/HBASE-12590]. 
> I would like to use the [TableInputFormatBase#calculateRebalancedSplits 
> |https://github.com/apache/hbase/blob/rel/1.2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java#L381]
>  and 
> [TableInputFormatBase#getSplitKey|https://github.com/apache/hbase/blob/rel/1.2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java#L454]
>  methods from TableInputFormatBase. Is it ok to use those methods directly 
> from MultiTableInputFormatBase, or should I move them to a new class?  I can 
> submit a patch once I get direction on the above question.





[jira] [Created] (HBASE-15428) Port auto-balancing to MultiTableInputFormatBase

2016-03-08 Thread Nathan Schile (JIRA)
Nathan Schile created HBASE-15428:
-

 Summary: Port auto-balancing to MultiTableInputFormatBase
 Key: HBASE-15428
 URL: https://issues.apache.org/jira/browse/HBASE-15428
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Nathan Schile
Assignee: Nathan Schile


Apache Crunch currently uses 
[MultiTableInputFormatBase|https://github.com/apache/crunch/blob/apache-crunch-0.13.0/crunch-hbase/src/main/java/org/apache/crunch/io/hbase/HBaseSourceTarget.java#L88]
 as the default format for reading HBase data. I would like to use the 
functionality provided by 
[HBASE-12590|https://issues.apache.org/jira/browse/HBASE-12590] "A solution for 
data skew in HBase-Mapreduce Job", however it is only available in 
TableInputFormatBase. This JIRA is to port the changes from 
TableInputFormatBase into MultiTableInputFormatBase with respect to 
[HBASE-12590|https://issues.apache.org/jira/browse/HBASE-12590]. 

I would like to use the [TableInputFormatBase#calculateRebalancedSplits 
|https://github.com/apache/hbase/blob/rel/1.2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java#L381]
 and 
[TableInputFormatBase#getSplitKey|https://github.com/apache/hbase/blob/rel/1.2.0/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java#L454]
 methods from TableInputFormatBase. Is it ok to use those methods directly from 
MultiTableInputFormatBase, or should I move them to a new class?  I can submit 
a patch once I get direction on the above question.





[jira] [Commented] (HBASE-15357) TableInputFormatBase getSplitKey does not handle signed bytes correctly

2016-02-29 Thread Nathan Schile (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173160#comment-15173160
 ] 

Nathan Schile commented on HBASE-15357:
---

For reference, with the inputs below the current code returns a split point 
of -4, when the split point should be 124.

{code}
 byte[] start = { 120 }; // 'x'
 byte[] end = { -128 }; // '€'
 byte[] splitPoint = { -4 };
{code}
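
The numbers above can be reproduced with a small standalone sketch. It is not the TableInputFormatBase code: the class and method names are hypothetical, and it only assumes that the midpoint is taken by averaging the two byte values. Averaging the raw signed bytes gives -4, while masking each byte to its unsigned value first gives 124.

```java
public class SplitPointDemo {

    // Average of the raw Java bytes, which are signed (-128..127).
    // Assumed to mirror the faulty behaviour only in spirit.
    static int signedMidpoint(byte start, byte end) {
        return (start + end) / 2;
    }

    // Average after masking each byte to its unsigned value (0..255),
    // which matches the byte-wise ordering of binary row keys.
    static int unsignedMidpoint(byte start, byte end) {
        return ((start & 0xFF) + (end & 0xFF)) / 2;
    }

    public static void main(String[] args) {
        byte start = 120;   // 0x78
        byte end = -128;    // 0x80, i.e. 128 when read as unsigned
        System.out.println(signedMidpoint(start, end));
        System.out.println(unsignedMidpoint(start, end));
    }
}
```

Run as written, main prints -4 and then 124.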

> TableInputFormatBase getSplitKey does not handle signed bytes correctly
> ---
>
> Key: HBASE-15357
> URL: https://issues.apache.org/jira/browse/HBASE-15357
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Reporter: Nathan Schile
>Assignee: Nathan Schile
> Attachments: HBASE-15357.patch
>
>
> When auto-balance is enabled in TableInputFormatBase and the table key is a 
> binary key, the getSplitKey method does not function correctly for signed 
> bytes. The proposed solution is to use the 
> org.apache.hadoop.hbase.util.Bytes#split method to find the split key. 
> org.apache.hadoop.hbase.util.Bytes#split is stated to be an expensive 
> operation, so if another solution is preferred, that is fine. In addition, 
> handling of a split key that is equal to the TableSplit end key is added to 
> calculateRebalancedSplits.





[jira] [Commented] (HBASE-15357) TableInputFormatBase getSplitKey does not handle signed bytes correctly

2016-02-28 Thread Nathan Schile (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171400#comment-15171400
 ] 

Nathan Schile commented on HBASE-15357:
---

Review board: https://reviews.apache.org/r/44155/diff/1#index_header

> TableInputFormatBase getSplitKey does not handle signed bytes correctly
> ---
>
> Key: HBASE-15357
> URL: https://issues.apache.org/jira/browse/HBASE-15357
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Reporter: Nathan Schile
>Assignee: Nathan Schile
> Attachments: HBASE-15357.patch
>
>
> When auto-balance is enabled in TableInputFormatBase and the table key is a 
> binary key, the getSplitKey method does not function correctly for signed 
> bytes. The proposed solution is to use the 
> org.apache.hadoop.hbase.util.Bytes#split method to find the split key. 
> org.apache.hadoop.hbase.util.Bytes#split is stated to be an expensive 
> operation, so if another solution is preferred, that is fine. In addition, 
> handling of a split key that is equal to the TableSplit end key is added to 
> calculateRebalancedSplits.





[jira] [Updated] (HBASE-15357) TableInputFormatBase getSplitKey does not handle signed bytes correctly

2016-02-28 Thread Nathan Schile (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Schile updated HBASE-15357:
--
Status: Patch Available  (was: Open)

> TableInputFormatBase getSplitKey does not handle signed bytes correctly
> ---
>
> Key: HBASE-15357
> URL: https://issues.apache.org/jira/browse/HBASE-15357
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Reporter: Nathan Schile
>Assignee: Nathan Schile
> Attachments: HBASE-15357.patch
>
>
> When auto-balance is enabled in TableInputFormatBase and the table key is a 
> binary key, the getSplitKey method does not function correctly for signed 
> bytes. The proposed solution is to use the 
> org.apache.hadoop.hbase.util.Bytes#split method to find the split key. 
> org.apache.hadoop.hbase.util.Bytes#split is stated to be an expensive 
> operation, so if another solution is preferred, that is fine. In addition, 
> handling of a split key that is equal to the TableSplit end key is added to 
> calculateRebalancedSplits.





[jira] [Updated] (HBASE-15357) TableInputFormatBase getSplitKey does not handle signed bytes correctly

2016-02-28 Thread Nathan Schile (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nathan Schile updated HBASE-15357:
--
Attachment: HBASE-15357.patch

> TableInputFormatBase getSplitKey does not handle signed bytes correctly
> ---
>
> Key: HBASE-15357
> URL: https://issues.apache.org/jira/browse/HBASE-15357
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Reporter: Nathan Schile
>Assignee: Nathan Schile
> Attachments: HBASE-15357.patch
>
>
> When auto-balance is enabled in TableInputFormatBase and the table key is a 
> binary key, the getSplitKey method does not function correctly for signed 
> bytes. The proposed solution is to use the 
> org.apache.hadoop.hbase.util.Bytes#split method to find the split key. 
> org.apache.hadoop.hbase.util.Bytes#split is stated to be an expensive 
> operation, so if another solution is preferred, that is fine. In addition, 
> handling of a split key that is equal to the TableSplit end key is added to 
> calculateRebalancedSplits.





[jira] [Created] (HBASE-15357) TableInputFormatBase getSplitKey does not handle signed bytes correctly

2016-02-28 Thread Nathan Schile (JIRA)
Nathan Schile created HBASE-15357:
-

 Summary: TableInputFormatBase getSplitKey does not handle signed 
bytes correctly
 Key: HBASE-15357
 URL: https://issues.apache.org/jira/browse/HBASE-15357
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Reporter: Nathan Schile
Assignee: Nathan Schile


When auto-balance is enabled in TableInputFormatBase and the table key is a 
binary key, the getSplitKey method does not function correctly for signed 
bytes. The proposed solution is to use the 
org.apache.hadoop.hbase.util.Bytes#split method to find the split key. 
org.apache.hadoop.hbase.util.Bytes#split is stated to be an expensive operation, 
so if another solution is preferred, that is fine. In addition, handling of a 
split key that is equal to the TableSplit end key is added to 
calculateRebalancedSplits.
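
As a rough illustration of what an unsigned midpoint between two binary keys could look like, here is a standalone sketch that uses only the JDK. It is not the attached patch and does not call or reproduce Bytes#split. The class name, the method name, and the choice to right-pad the shorter key with zero bytes are all assumptions made for the example.

```java
import java.math.BigInteger;
import java.util.Arrays;

public class UnsignedKeyMidpoint {

    // Hypothetical helper: midpoint of two keys read as unsigned big-endian numbers.
    static byte[] midpoint(byte[] start, byte[] end) {
        int len = Math.max(start.length, end.length);
        // Right-pad the shorter key with zero bytes so both have the same width.
        byte[] a = Arrays.copyOf(start, len);
        byte[] b = Arrays.copyOf(end, len);
        // Signum 1 makes BigInteger treat the bytes as an unsigned magnitude.
        BigInteger lo = new BigInteger(1, a);
        BigInteger hi = new BigInteger(1, b);
        BigInteger mid = lo.add(hi).shiftRight(1);
        // toByteArray() may add a leading zero sign byte or drop leading zeros,
        // so copy the low-order bytes into a fixed-width result.
        byte[] raw = mid.toByteArray();
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }

    public static void main(String[] args) {
        byte[] split = midpoint(new byte[] { 120 }, new byte[] { -128 });
        System.out.println(Arrays.toString(split));
    }
}
```

For the single-byte keys 120 and -128 this yields a one-byte key with value 124. The BigInteger arithmetic allocates on every call, which is the kind of cost that makes an approach like this comparatively expensive.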


