[jira] [Commented] (HBASE-12590) A solution for data skew in HBase-Mapreduce Job

2017-10-06 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195123#comment-16195123
 ] 

Hudson commented on HBASE-12590:


Results for branch HBASE-18467, done in 4 hr 24 min and counting
[build #136 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18467/136/]: FAILURE

details (if available):

(x) *{color:red}-1 overall{color}*
Committer, please check your recent inclusion of a patch for this issue.

(x) {color:red}-1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18467/136//General_Nightly_Build_Report/]

(/) {color:green}+1 jdk8 checks{color}
-- For more information [see jdk8 report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-18467/136//JDK8_Nightly_Build_Report/]


(x) {color:red}-1 source release artifact{color}
-- See build output for details.



> A solution for data skew in HBase-Mapreduce Job
> ---
>
> Key: HBASE-12590
> URL: https://issues.apache.org/jira/browse/HBASE-12590
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Reporter: Weichen Ye
> Assignee: Weichen Ye
> Fix For: 2.0.0
>
> Attachments: A Solution for Data Skew in HBase-MapReduce Job 
> (Version2).pdf, A Solution for Data Skew in HBase-MapReduce Job 
> (Version3).pdf, HBase-12590-v1.patch, HBase-12590-v2.patch, 
> HBASE-12590-v3.patch, HBASE-12590-v4.patch
>
>
> 1. Motivation
> In production environments, data skew is very common. An HBase table may
> contain many small regions and a few large regions. Small regions
> waste computing resources: a job that scans a table with 3000
> small regions needs 3000 mappers. Large regions hold up
> the job: if one region in a 100-region table is far larger than the other 99,
> then when we run a job with the table as input, 99 mappers
> complete very quickly and we then wait a long time for the last
> mapper.
> 2. Configuration
> Add three new configuration properties:
> hbase.mapreduce.input.autobalance = true enables “auto balance” in
> HBase-MapReduce jobs. The default value is false.
> hbase.mapreduce.input.autobalance.maxskewratio = 3 (the default). If a region's
> size is larger than 3x the average region size, the region is treated as
> “proportionately too large”.
> hbase.table.row.textkey = true means the row key is text; false means a binary
> row key. It is used to find the middle row key of a large region. The default
> value is true.
> If (region size >= average size*ratio): cut the region into two MR input
> splits.
> If (average size <= region size < average size*ratio): use the region as one MR
> input split.
> If (sum of the sizes of several contiguous regions < average size): combine these
> regions into one MR input split.
> Example:
> See the attachment.
> Reviews are welcome on Review Board:
> https://reviews.apache.org/r/28494/diff/#
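
The three rules in the quoted description can be sketched in a few lines of Python. This is only an illustration derived from the description text, not the code in TableInputFormatBase: the function name plan_splits, the action labels, and the greedy left-to-right grouping of small regions are assumptions made for the example, and the committed patch may choose groups differently.

```python
from typing import List, Tuple


def plan_splits(region_sizes: List[int],
                max_skew_ratio: float = 3.0) -> List[Tuple[str, List[int]]]:
    """Plan MR input splits from region sizes (hypothetical helper).

    Returns (action, region_indexes) pairs, where action is one of
    "first-half", "second-half", "whole", or "combined".
    """
    if not region_sizes:
        return []
    average = sum(region_sizes) / len(region_sizes)
    if average == 0:
        # All regions are empty: nothing to balance, one split per region.
        return [("whole", [i]) for i in range(len(region_sizes))]

    plan: List[Tuple[str, List[int]]] = []
    i = 0
    while i < len(region_sizes):
        size = region_sizes[i]
        if size >= average * max_skew_ratio:
            # Rule 1: a proportionately too large region becomes two splits.
            plan.append(("first-half", [i]))
            plan.append(("second-half", [i]))
            i += 1
        elif size >= average:
            # Rule 2: an ordinary region maps to exactly one split.
            plan.append(("whole", [i]))
            i += 1
        else:
            # Rule 3: merge contiguous small regions while their total
            # size stays below the average (greedy; an assumption).
            group, total = [i], size
            i += 1
            while i < len(region_sizes) and total + region_sizes[i] < average:
                group.append(i)
                total += region_sizes[i]
                i += 1
            plan.append(("combined" if len(group) > 1 else "whole", group))
    return plan
```

For sizes [10, 10, 10, 10, 500, 60] the average is 100 and the skew threshold is 300, so the four small regions share one split, the 500-unit region is cut in two, and the last region keeps its own split: four mappers instead of six, with the slowest one handling roughly half of the large region.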



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HBASE-12590) A solution for data skew in HBase-Mapreduce Job

2017-10-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190787#comment-16190787
 ] 

Hudson commented on HBASE-12590:


SUCCESS: Integrated in Jenkins build HBase-Trunk_matrix #3823 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/3823/])
HBASE-16894 Create more than 1 split per region, generalize HBASE-12590 
(apurtell: rev 16d483f9003ddee71404f37ce7694003d1a18ac4)
* (edit) 
hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
* (edit) 
hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScanBase.java
* (edit) 
hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScan1.java



[jira] [Commented] (HBASE-12590) A solution for data skew in HBase-Mapreduce Job

2017-10-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190768#comment-16190768
 ] 

Hudson commented on HBASE-12590:


SUCCESS: Integrated in Jenkins build HBase-1.4 #940 (See 
[https://builds.apache.org/job/HBase-1.4/940/])
HBASE-16894 Create more than 1 split per region, generalize HBASE-12590 
(apurtell: rev cbbcb2db2f0a94382cb33fef826cbf1a00b5de6e)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/namespace/TestNamespaceAuditor.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScanBase.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScan1.java



[jira] [Commented] (HBASE-12590) A solution for data skew in HBase-Mapreduce Job

2017-10-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190744#comment-16190744
 ] 

Hudson commented on HBASE-12590:


FAILURE: Integrated in Jenkins build HBase-1.5 #84 (See 
[https://builds.apache.org/job/HBase-1.5/84/])
HBASE-16894 Create more than 1 split per region, generalize HBASE-12590 
(apurtell: rev fc783ef04505eab7e58c6abc3ac1f7d7ecce465b)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScan1.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScanBase.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/namespace/TestNamespaceAuditor.java
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java



[jira] [Commented] (HBASE-12590) A solution for data skew in HBase-Mapreduce Job

2017-10-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190661#comment-16190661
 ] 

Hudson commented on HBASE-12590:


FAILURE: Integrated in Jenkins build HBase-2.0 #622 (See 
[https://builds.apache.org/job/HBase-2.0/622/])
HBASE-16894 Create more than 1 split per region, generalize HBASE-12590 
(apurtell: rev 4475ba88c15886bd15c113f2dbd5214600686cfe)
* (edit) 
hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScanBase.java
* (edit) 
hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
* (edit) 
hbase-mapreduce/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScan1.java



[jira] [Commented] (HBASE-12590) A solution for data skew in HBase-Mapreduce Job

2015-03-11 Thread Hudson (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356365#comment-14356365
]

Hudson commented on HBASE-12590:

FAILURE: Integrated in HBase-0.98 #890 (See
[https://builds.apache.org/job/HBase-0.98/890/])
HBASE-13168 Backport HBASE-12590 A solution for data skew in HBase-Mapreduce
Job (tedyu: rev 1b4f8afaec8cd4dfef46154bdceb31ce7ddf5982)
*
hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
*
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScan1.java
*
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScanBase.java


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-12590) A solution for data skew in HBase-Mapreduce Job

2015-03-11 Thread Hudson (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356415#comment-14356415
]

Hudson commented on HBASE-12590:

FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #847 (See
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/847/])
HBASE-13168 Backport HBASE-12590 A solution for data skew in HBase-Mapreduce
Job (tedyu: rev 1b4f8afaec8cd4dfef46154bdceb31ce7ddf5982)
*
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScan1.java
*
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScanBase.java
*
hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java


[jira] [Commented] (HBASE-12590) A solution for data skew in HBase-Mapreduce Job

2015-03-10 Thread Hudson (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356236#comment-14356236
]

Hudson commented on HBASE-12590:

FAILURE: Integrated in HBase-1.0 #795 (See
[https://builds.apache.org/job/HBase-1.0/795/])
HBASE-13168 Backport HBASE-12590 A solution for data skew in HBase-Mapreduce
Job (tedyu: rev 89112e84957558f31c161256aa2d7054f165ca02)
*
hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
*
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScanBase.java
*
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScan1.java


[jira] [Commented] (HBASE-12590) A solution for data skew in HBase-Mapreduce Job

2015-03-10 Thread Hudson (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356251#comment-14356251
]

Hudson commented on HBASE-12590:

SUCCESS: Integrated in HBase-1.1 #276 (See
[https://builds.apache.org/job/HBase-1.1/276/])
HBASE-13168 Backport HBASE-12590 A solution for data skew in HBase-Mapreduce
Job (tedyu: rev 05aef46d942a0196c6c655ab19a160cd7dc56789)
*
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScan1.java
*
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScanBase.java
*
hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java


[jira] [Commented] (HBASE-12590) A solution for data skew in HBase-Mapreduce Job

2014-12-24 Thread Jonathan Hsieh (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258216#comment-14258216
]

Jonathan Hsieh commented on HBASE-12590:

Nice catches. It would be good to port a correct algorithm over into these
places.


[jira] [Commented] (HBASE-12590) A solution for data skew in HBase-Mapreduce Job

2014-12-24 Thread Jonathan Hsieh (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258224#comment-14258224
]

Jonathan Hsieh commented on HBASE-12590:

Thanks [~yeweichen]!


[jira] [Commented] (HBASE-12590) A solution for data skew in HBase-Mapreduce Job

2014-12-24 Thread Hudson (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258282#comment-14258282
]

Hudson commented on HBASE-12590:

SUCCESS: Integrated in HBase-TRUNK #5965 (See
[https://builds.apache.org/job/HBase-TRUNK/5965/])
HBASE-12590 A solution for data skew in HBase-Mapreduce jobs (Weichen Ye)
(jmhsieh: rev a912a56b38fca6aada68dab5ef73613c073cbc6a)
*
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScan1.java
*
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScanBase.java
*
hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java


[jira] [Commented] (HBASE-12590) A solution for data skew in HBase-Mapreduce Job

2014-12-24 Thread Weichen Ye (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14258610#comment-14258610
]

Weichen Ye commented on HBASE-12590:

Thank you [~jmhsieh] for your help and comments! I'll continue working on
HBASE-12716.

And Merry Christmas :)


[jira] [Commented] (HBASE-12590) A solution for data skew in HBase-Mapreduce Job

2014-12-18 Thread Weichen Ye (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251686#comment-14251686
]

Weichen Ye commented on HBASE-12590:

[~j...@cloudera.com]
Hi~
I tried the algorithm in RegionSplitter, but I found a small bug in it.
If the start key is the same length as the end key, and their last bytes are
adjacent in alphabetical order, the algorithm does not calculate a split
point with an additional byte.

This split algorithm is not closely related to data skew in HBase-MapReduce
jobs, so I created two new issues for it:
https://issues.apache.org/jira/browse/HBASE-12716
https://issues.apache.org/jira/browse/HBASE-12717
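
To make the adjacent-key case concrete, here is a minimal sketch of a midpoint calculation that always works with one extra byte of precision. It is not the RegionSplitter algorithm and not the fix proposed in HBASE-12716 or HBASE-12717; the function name mid_key and the big-endian integer approach are assumptions chosen for illustration, and it assumes both keys are bounded (it does not model an empty end key meaning "end of table").

```python
def mid_key(start: bytes, end: bytes) -> bytes:
    """Return a key that sorts strictly between start and end.

    Hypothetical helper: both keys are right-padded with zero bytes to a
    common width that is one byte longer than the longer key, so a
    midpoint exists even when the keys differ only by one in the last byte.
    """
    width = max(len(start), len(end)) + 1  # one extra byte of precision
    lo = int.from_bytes(start.ljust(width, b"\x00"), "big")
    hi = int.from_bytes(end.ljust(width, b"\x00"), "big")
    if lo >= hi:
        raise ValueError("no key fits strictly between start and end")
    mid = (lo + hi) // 2
    # Trailing zero bytes carry no information here, so drop them.
    return mid.to_bytes(width, "big").rstrip(b"\x00")
```

With start b"aa" and end b"ab" (same length, adjacent last bytes) the result is b"aa\x80", a three-byte key between the two. With b"aa" and b"ac" the midpoint needs no extra byte and the result is b"ab".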


[jira] [Commented] (HBASE-12590) A solution for data skew in HBase-Mapreduce Job

2014-12-17 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249605#comment-14249605
]

Hadoop QA commented on HBASE-12590:
---

{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12687680/HBASE-12590-v4.patch
against master branch at commit 99a11390b4758c211af04af2ca0696ac6e3e0aeb.
ATTACHMENT ID: 12687680

{color:green}+1 @author{color}. The patch does not contain any @author
tags.

{color:green}+1 tests included{color}. The patch appears to include 6 new
or modified tests.

{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any
warning messages.

{color:red}-1 checkstyle{color}. The applied patch generated
2086 checkstyle errors (more than the master's current 2084 errors).

{color:green}+1 findbugs{color}. The patch does not introduce any new
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines
longer than 100

{color:green}+1 site{color}. The mvn site goal succeeds with this patch.

{color:red}-1 core tests{color}. The patch failed these unit tests:

org.apache.hadoop.hbase.regionserver.TestPerColumnFamilyFlush

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//artifact/patchprocess/checkstyle-aggregate.html

Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/12105//console

This message is automatically generated.


[jira] [Commented] (HBASE-12590) A solution for data skew in HBase-Mapreduce Job

2014-12-17 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249751#comment-14249751
]

Hadoop QA commented on HBASE-12590:
---

{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12687706/HBASE-12590-v4.patch
against master branch at commit 99a11390b4758c211af04af2ca0696ac6e3e0aeb.
ATTACHMENT ID: 12687706