[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-12-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16276470#comment-16276470
 ] 

Hadoop QA commented on HBASE-19290:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-1 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
51s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
32s{color} | {color:green} branch-1 passed with JDK v1.8.0_152 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} branch-1 passed with JDK v1.7.0_161 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
16s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
48s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m 
10s{color} | {color:red} hbase-server in branch-1 has 1 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} branch-1 passed with JDK v1.8.0_152 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} branch-1 passed with JDK v1.7.0_161 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed with JDK v1.8.0_152 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed with JDK v1.7.0_161 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
42s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
29m 23s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 
2.7.4. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed with JDK v1.8.0_152 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed with JDK v1.7.0_161 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 93m 
54s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}142m 16s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:36a7029 |
| JIRA Issue | HBASE-19290 |
| JIRA Patch URL | 

[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271207#comment-16271207
 ] 

Hudson commented on HBASE-19290:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4138 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/4138/])
HBASE-19290 Reduce zk request when doing split log (binlijin: rev 
8b32d3792934507c774997cd82dc061b75410f83)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/coordination/ZkSplitLogWorkerCoordination.java


> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.branch-1.001.patch, 
> HBASE-19290.master.001.patch, HBASE-19290.master.002.patch, 
> HBASE-19290.master.003.patch, HBASE-19290.master.004.patch, 
> HBASE-19290.master.005.patch, HBASE-19290.master.006.patch, 
> HBASE-19290.master.006.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271002#comment-16271002
 ] 

Hudson commented on HBASE-19290:


FAILURE: Integrated in Jenkins build HBase-2.0 #936 (See 
[https://builds.apache.org/job/HBase-2.0/936/])
HBASE-19290 Reduce zk request when doing split log (binlijin: rev 
64ddce303ee352a3ab1eb9cedf3fad097be7569c)
* (edit) 
hbase-server/src/main/java/org/apache/hadoop/hbase/coordination/ZkSplitLogWorkerCoordination.java


> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.branch-1.001.patch, 
> HBASE-19290.master.001.patch, HBASE-19290.master.002.patch, 
> HBASE-19290.master.003.patch, HBASE-19290.master.004.patch, 
> HBASE-19290.master.005.patch, HBASE-19290.master.006.patch, 
> HBASE-19290.master.006.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-29 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16270571#comment-16270571
 ] 

binlijin commented on HBASE-19290:
--

Push to master and branch-2

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, 
> HBASE-19290.master.004.patch, HBASE-19290.master.005.patch, 
> HBASE-19290.master.006.patch, HBASE-19290.master.006.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16270476#comment-16270476
 ] 

Hadoop QA commented on HBASE-19290:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
22s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
54s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
22s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
26s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
47m 55s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 86m  
7s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}152m  1s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19290 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12899765/HBASE-19290.master.006.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux d9c5afafd53b 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / e67a3699c4 |
| maven | version: Apache Maven 3.5.2 
(138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/10105/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/10105/console |
| Powered by | Apache Yetus 0.6.0   http://yetus.apache.org |


This message was automatically generated.



> Reduce zk request when doing split log
> 

[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16270247#comment-16270247
 ] 

Hadoop QA commented on HBASE-19290:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
48s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
16s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  6m 
42s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
36s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
63m 33s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}137m 50s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}224m 49s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19290 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12899741/HBASE-19290.master.006.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 96502406ebab 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 
14:13:22 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| git revision | master / e67a3699c4 |
| maven | version: Apache Maven 3.5.2 
(138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/10094/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/10094/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/10094/console |
| Powered by | Apache Yetus 0.6.0   

[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-28 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269984#comment-16269984
 ] 

binlijin commented on HBASE-19290:
--

HBASE-19290.master.006.patch fix test failed.

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, 
> HBASE-19290.master.004.patch, HBASE-19290.master.005.patch, 
> HBASE-19290.master.006.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268618#comment-16268618
 ] 

Hadoop QA commented on HBASE-19290:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
17s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
53s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
13s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
20s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
53m 10s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 82m 44s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
12s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}153m 11s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestSplitLogWorker |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19290 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12899576/HBASE-19290.master.005.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 6d39709fe37e 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| git revision | master / 0f33931b2a |
| maven | version: Apache Maven 3.5.2 
(138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/10065/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/10065/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 

[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-28 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268414#comment-16268414
 ] 

binlijin commented on HBASE-19290:
--

bq.Many apologies for the wasted time! (will try to make up for it somehow)
Not at all.
bq.If you haven't committed already, please change taskGrabbed to a boolean 
since its count doesn't matter.
OK, do in HBASE-19290.master.005.patch 

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, 
> HBASE-19290.master.004.patch, HBASE-19290.master.005.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-28 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268365#comment-16268365
 ] 

Appy commented on HBASE-19290:
--

(facepalm) Okay, I accept my thick headedness here. Many apologies for the 
wasted time!
+1
If you haven't committed already, please change taskGrabbed to a boolean since 
it's count doesn't matter.

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, 
> HBASE-19290.master.004.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-27 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268045#comment-16268045
 ] 

binlijin commented on HBASE-19290:
--

bq. Can you please walk me through a full case explaining need of that "if" 
condition and why value of grabbedTask=0 is special.
The task loop is the following stage
(1) getTaskList, get tasks from zookeeper splitLogZNode node.
 issue zookeeper request
(2) for loop trying to grab task for every tasks  
  issue zookeeper request
(3) while loop
  sleep when seq_start == taskReadySeq.get(), else skip
When grabbedTask=0 and skip stage 3 the while loop, regionserver will trying to 
grab tasks again and may not get any tasks and giving pressure to zookeeper, so 
throttling it. 
When grabbedTask=1 and skip stage 3 the while loop, regionserver will trying to 
grab tasks again and if not get any tasks this round so will throttle, and if 
get any task may try grab tasks next round again.


> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, 
> HBASE-19290.master.004.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-27 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16267542#comment-16267542
 ] 

Appy commented on HBASE-19290:
--

+1

I still don't get it. Am starting to feel thick head on this one :). But i am 
willing to side it in order to make progress here. Feel free to commit afa i am 
concerned. But let me try to illustrate my question one last time.

bq. The other splitter is available (gated by the return value from 
calculateAvailableSplitters()), hence another call to grabTask() should be 
allowed, right ?
Of course we should do another call. But the confusing is about throttling. Why 
do we throttle (extra sleep) when grabbedTask=0 and not when grabbedTask =1. 
It's not obvious, at least to me.

bq. But when grabbedTask =1, and we still keep failing to grab tasks, it will 
end the for loop and enter while (seq_start == taskReadySeq.get()) {} loop, do 
this have any problem?
"it will end for loop and enter while" - This happens even when 
grabbedTask = 0. Then why do we need {{if (grabbedTask==0  && ...)}}??
I know you said earlier that taskReadySeq will keep increasing as zk nodes keep 
updating, so that wait() on condition may not provide much of throttling when 
there's lot of churn. But again, that applies to irrespective of value of 
grabbedTask.

Can you please walk me through a full case explaining need of that "if" 
condition and why value of grabbedTask=0 is special.
Thanks.





> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, 
> HBASE-19290.master.004.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263840#comment-16263840
 ] 

Hadoop QA commented on HBASE-19290:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
48s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 6s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  6m 
12s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m  
3s{color} | {color:red} hbase-server: The patch generated 1 new + 5 unchanged - 
0 fixed = 6 total (was 5) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
 3s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
54m  7s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 94m 
24s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}168m 41s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19290 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12898980/HBASE-19290.master.004.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux df6057c46798 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 
12:48:20 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| git revision | master / cdc2bb17ff |
| maven | version: Apache Maven 3.5.2 
(138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HBASE-Build/9984/artifact/patchprocess/diff-checkstyle-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/9984/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 

[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263808#comment-16263808
 ] 

Ted Yu commented on HBASE-19290:


bq. The WAL split speed was stable at 0.2TB/minute

The above is convincing.

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, 
> HBASE-19290.master.004.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263773#comment-16263773
 ] 

Yu Li commented on HBASE-19290:
---

Let me try to add more information:

Once when upgrading the HDFS version, NN had some fencing problem and causing 
all RS aborted one by one. And after HDFS restored and HBase cluster restarted, 
we observed Master threads waiting on zookeeper to return:
{noformat}
"MASTER_SERVER_OPERATIONS-hdpet2mainsem2:60100-28"#2236 prio=5 os_prio=0 
tid=0x7ff526bad800 nid=0xa890 in Object.wait() [0x7ff5150f6000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
- locked <0x0005d9c720d0> (a org.apache.zookeeper.ClientCnxn$Packet)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1470)
at 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:295)
at 
org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenNoWatch(ZKUtil.java:635)
at 
org.apache.hadoop.hbase.coordination.ZKSplitLogManagerCoordination.remainingTasksInCoordination(ZKSplitLogManagerCoordination.java:150)
at 
org.apache.hadoop.hbase.master.SplitLogManager.waitForSplittingCompletion(SplitLogManager.java:353)
- locked <0x0006440826e8> (a 
org.apache.hadoop.hbase.master.SplitLogManager$TaskBatch)
at 
org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:274)
{noformat}

And after investigation we found the root cause is splitWAL znode contains too 
many children and the {{getChildren}} call is too time-consuming.

After some further discussion on how to resolve the issue, we think the most 
efficient way is to reduce the speed of publishing split tasks, or say only 
publish when there's available WAL splitter. Publishing the task aggressively 
could help nothing but slowing down the {{getChildren}} operation on splitWAL 
thus the whole world.

After the patched version went online, we encountered another disaster case 
(unfortunately...) and experienced no more zk contention problem. The WAL split 
speed was stable at 0.2TB/minute

So we don't have any performance testing result, but theory proved by 
observation from real world, and hope this is convincing (smile).

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, 
> HBASE-19290.master.004.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263760#comment-16263760
 ] 

binlijin commented on HBASE-19290:
--

{quote}
The patch is adding throttling, so it's certainly changing the way things work. 
It's also adding the 'if' condition controlling throttling, so the choice is 
definitely being made by you.
I suspect you are using grabbedTask=0 as a proxy for 'failed to grab task' and 
wait on it. But when grabbedTask =1, and we still keep failing to grab tasks, 
there is no throttling for that case! Hopefully that makes my question clearer?
{quote}
But when grabbedTask =1, and we still keep failing to grab tasks, it will end 
the for loop and enter while (seq_start == taskReadySeq.get()) {} loop,  do 
this have any problem? 

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, 
> HBASE-19290.master.004.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263748#comment-16263748
 ] 

binlijin commented on HBASE-19290:
--

bq. The above example lasted for almost an hour. 
  With the patch, roughly how long does log splitting task last ?
We do not record the new numbers and the log do not exists now. But we record 
that we have 7.1TB wals and split it in 40mins.


> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, 
> HBASE-19290.master.004.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263742#comment-16263742
 ] 

Ted Yu commented on HBASE-19290:


Lijin:
The above example lasted for almost an hour.

With the patch, roughly how long does log splitting task last ?

Thanks

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, 
> HBASE-19290.master.004.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263729#comment-16263729
 ] 

binlijin commented on HBASE-19290:
--

We observe that when the cluster is big, regionserver issue too much zookeeper 
request to get availableRSs from rsZNode and also getTaskList from 
splitLogZNode. Have more nodes, the get availableRSs is more heavy.

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, 
> HBASE-19290.master.004.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263720#comment-16263720
 ] 

binlijin commented on HBASE-19290:
--

[~tedyu]
bq. Assuming patch v3 is very close to the version you run in the 2000+ node 
production cluster, can you post some performance numbers (in terms of 
reduction in zookeeper requests) so that we can know its effectiveness ?

We do not record the performance numbers.
But without the patch we can see the HMaster get the zookeeper event very very 
slowly...

HMaster put up split task:

*2017-07-11 20:22:57,608* DEBUG [main-EventThread] 
coordination.SplitLogManagerCoordination: put up splitlog task at znode 
/hbase/splitWAL/WALs%2Fhadoop0448.et2.tbsite.net%2C16020%2C1495647366007-splitting%2Fhadoop0448.et2.tbsite.net%252C16020%252C1495647366007.regiongroup-2.1499768090548

RegionServer grab the task and done it.

*2017-07-11 20:23:33,689* INFO  [SplitLogWorker-hadoop1435:16020] 
coordination.ZkSplitLogWorkerCoordination: worker 
hadoop1435.et2.tbsite.net,16020,1495647366458 acquired task 
/hbase/splitWAL/WALs%2Fhadoop0448.et2.tbsite.net%2C16020%2C1495647366007-splitting%2Fhadoop0448.et2.tbsite.net%252C16020%252C1495647366007.regiongroup-2.1499768090548

*2017-07-11 20:25:47,131* INFO  [RS_LOG_REPLAY_OPS-hadoop1435:16020-1] 
coordination.ZkSplitLogWorkerCoordination: successfully transitioned task 
/hbase/splitWAL/WALs%2Fhadoop0448.et2.tbsite.net%2C16020%2C1495647366007-splitting%2Fhadoop0448.et2.tbsite.net%252C16020%252C1495647366007.regiongroup-2.1499768090548
 to final state DONE hadoop1435.et2.tbsite.net,16020,1495647366458

HMaster get the task done event and delete it:

*2017-07-11 20:49:52,879* INFO  [main-EventThread] 
coordination.SplitLogManagerCoordination: task 
/hbase/splitWAL/WALs%2Fhadoop0448.et2.tbsite.net%2C16020%2C1495647366007-splitting%2Fhadoop0448.et2.tbsite.net%252C16020%252C1495647366007.regiongroup-2.1499768090548
 entered state: DONE hadoop1435.et2.tbsite.net,16020,1495647366458

*2017-07-11 20:49:52,881* INFO  [main-EventThread] 
coordination.SplitLogManagerCoordination: Done splitting 
/hbase/splitWAL/WALs%2Fhadoop0448.et2.tbsite.net%2C16020%2C1495647366007-splitting%2Fhadoop0448.et2.tbsite.net%252C16020%252C1495647366007.regiongroup-2.1499768090548

*2017-07-11 21:19:52,280* DEBUG [main-EventThread] 
coordination.ZKSplitLogManagerCoordination$DeleteAsyncCallback: deleted 
/hbase/splitWAL/WALs%2Fhadoop0448.et2.tbsite.net%2C16020%2C1495647366007-splitting%2Fhadoop0448.et2.tbsite.net%252C16020%252C1495647366007.regiongroup-2.1499768090548


> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, 
> HBASE-19290.master.004.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263699#comment-16263699
 ] 

binlijin commented on HBASE-19290:
--

[~appy]
bq. Yeah, they are definitely clear, but Jingyun Tian's suggestion to improve 
the code further was appropriate and made sense.
Here's the diff on what he was suggesting (and what i was thinking earlier).
Done in HBASE-19290.master.004.patch .


> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263187#comment-16263187
 ] 

Ted Yu commented on HBASE-19290:


Reading the code with patch, it seems calculateAvailableSplitters() tries to 
balance the load on zookeeper by considering tasksInProgress which would be 
incremented in pace with taskGrabbed.

bq. But when grabbedTask =1, and we still keep failing to grab tasks

The other splitter is available (gated by the return value from 
calculateAvailableSplitters()), hence another call to grabTask() should be 
allowed, right ?

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263094#comment-16263094
 ] 

Appy commented on HBASE-19290:
--

[~tedyu] what kind of performance numbers are you looking for?

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263089#comment-16263089
 ] 

Appy commented on HBASE-19290:
--

{quote}
bq. I can see it's doing that. The question really meant - why such interesting 
choice? It's not usual thing to do i.e. throttle for first request and start 
hammering servers after that. If it was something you chose by design - please 
add a comment about the behavior and explaining reasoning; if not by design, 
then probably remove the 'if condition' and always sleep.
 The design is not chose by me, and i do not see this design has any problem, 
there are 2 available splitters, and grabbed one task then try to grab the 
second task and doing split task as fast as possible. My patch do not change 
the design, and just trying to issue less zk request.
{quote}
The patch is adding throttling, so it's certainly changing the way things work. 
It's also adding the 'if' condition controlling throttling, so the choice is 
definitely being made by you.
I suspect you are using grabbedTask=0 as a proxy for 'failed to grab task' and 
wait on it. But when grabbedTask =1, and we still keep failing to grab tasks, 
there is no throttling for that case! Hopefully that makes my question clearer?


> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263086#comment-16263086
 ] 

Ted Yu commented on HBASE-19290:


Lijin:
Assuming patch v3 is very close to the version you run in the 2000+ node 
production cluster, can you post some performance numbers so that we can know 
its effectiveness ?

Thanks

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-22 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263051#comment-16263051
 ] 

Appy commented on HBASE-19290:
--

{quote}
bq. Since expectedTasksPerRS only relay on availableRSs and numTasks, I think 
it is better to move them to getAvailableRSs() and change a name?
I do not think so, and i think getAvailableRSs and calculateAvailableSplitters 
is clear.
{quote}

Yeah, they are definitely clear, but [~tianjingyun]'s suggestion to improve the 
code further was appropriate and made sense.
Here's the diff on what he was suggesting (and what i was thinking earlier).
{noformat}
diff --git 
a/hbase-server/src/main/java/org/apache/hadoop/hbase/coordination/ZkSplitLogWorkerCoordination.java
 
b/hbase-server/src/main/java/org/apache/hadoop/hbase/coordination/ZkSplitLogWorkerCoordination.java
index e7d15bbc16..e7fd93bf46 100644
--- 
a/hbase-server/src/main/java/org/apache/hadoop/hbase/coordination/ZkSplitLogWorkerCoordination.java
+++ 
b/hbase-server/src/main/java/org/apache/hadoop/hbase/coordination/ZkSplitLogWorkerCoordination.java
@@ -318,7 +318,14 @@ public class ZkSplitLogWorkerCoordination extends 
ZKListener implements
 server.getExecutorService().submit(hsh);
   }
 
-  private int getAvailableRSs() {
+  /**
+   * This function calculates how many splitters this RS should create based 
on expected average
+   * tasks per RS and the hard limit upper bound(maxConcurrentTasks) set by 
configuration. 
+   * At any given time, a RS allows spawn MIN(Expected Tasks/RS, Hard Upper 
Bound)
+   * @param numTasks total number of split tasks available
+   * @return number of tasks this RS can grab
+   */
+  private int getNumExpectedTasksPerRS(int numTasks) {
 // at lease one RS(itself) available
 int availableRSs = 1;
 try {
@@ -329,22 +336,17 @@ public class ZkSplitLogWorkerCoordination extends 
ZKListener implements
   // do nothing
   LOG.debug("getAvailableRegionServers got ZooKeeper exception", e);
 }
-return availableRSs;
+int expectedTasksPerRS = (numTasks / availableRSs) + ((numTasks % 
availableRSs == 0) ? 0 : 1);
+return Math.max(1, expectedTasksPerRS); // at least be one
   }
 
   /**
-   * This function calculates how many splitters it could create based on 
expected average tasks per
-   * RS and the hard limit upper bound(maxConcurrentTasks) set by 
configuration. 
-   * At any given time, a RS allows spawn MIN(Expected Tasks/RS, Hard Upper 
Bound)
-   * @param availableRSs current total number of available regionservers
-   * @param numTasks current total number of available tasks
+   * @param expectedTasksPerRS Average number of tasks to be handled by each RS
+   * @return true if more splitters are available, otherwise false.
*/
-  private int calculateAvailableSplitters(int availableRSs, int numTasks) {
-int expectedTasksPerRS = (numTasks / availableRSs) + ((numTasks % 
availableRSs == 0) ? 0 : 1);
-expectedTasksPerRS = Math.max(1, expectedTasksPerRS); // at least be one
-// calculate how many more splitters we could spawn
-return Math.min(expectedTasksPerRS, maxConcurrentTasks)
-- this.tasksInProgress.get();
+  private boolean areSplittersAvailable(int expectedTasksPerRS) {
+return (Math.min(expectedTasksPerRS, maxConcurrentTasks)
+- this.tasksInProgress.get()) > 0;
   }
 
   /**
@@ -425,11 +427,11 @@ public class ZkSplitLogWorkerCoordination extends 
ZKListener implements
 }
   }
   int numTasks = paths.size();
-  int availableRSs = getAvailableRSs();
+  int expectedTasksPerRS = getNumExpectedTasksPerRS(numTasks);
   int taskGrabbed = 0;
   for (int i = 0; i < numTasks; i++) {
 while (!shouldStop) {
-  if (this.calculateAvailableSplitters(availableRSs, numTasks) > 0) {
+  if (this.areSplittersAvailable(expectedTasksPerRS)) {
 LOG.debug("Current region server " + server.getServerName()
 + " is ready to take more tasks, will get task list and try 
grab tasks again.");
 int idx = (i + offset) % paths.size();
{noformat}

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's 

[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-21 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261933#comment-16261933
 ] 

binlijin commented on HBASE-19290:
--

bq.  Since expectedTasksPerRS only relay on availableRSs and numTasks, I think 
it is better to move them to getAvailableRSs() and change a name?
I do not think so,  and i think getAvailableRSs and calculateAvailableSplitters 
is clear.

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-21 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261925#comment-16261925
 ] 

binlijin commented on HBASE-19290:
--

bq.  I can see it's doing that. The question really meant - why such 
interesting choice? It's not usual thing to do i.e. throttle for first request 
and start hammering servers after that. If it was something you chose by design 
- please add a comment about the behavior and explaining reasoning; if not by 
design, then probably remove the 'if condition' and always sleep.
The design is not chose by me, and i do not see this design has any problem, 
there are 2 available splitters, and grabbed one task then try to grab the 
second task and doing split task as fast as possible. My patch do not change 
the design, and just trying to issue less zk request.

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-21 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261597#comment-16261597
 ] 

Appy commented on HBASE-19290:
--

{quote}
bq. So there are 2 available splitters, and one grabbed task, we don't stop 
here and keep hammering zk?
Yes.
{quote}
I can see it's doing that. The question really meant - why such interesting 
choice? It's not usual thing to do i.e. throttle for first request and start 
hammering servers after that. If it was something you chose by design - please 
add a comment about the behavior and explaining reasoning.

bq. The while loop will enter only if when seq_start == taskReadySeq.get(), and 
when every splitLogZNode's children changed the taskReadySeq will increment, so 
it will not enter the while (seq_start == taskReadySeq.get()) {} and kill 
trying to grab task and issue zk request.
Makes sense. thanks.


> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-21 Thread Jingyun Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260541#comment-16260541
 ] 

Jingyun Tian commented on HBASE-19290:
--

{code}
343 int expectedTasksPerRS = (numTasks / availableRSs) + ((numTasks % 
availableRSs == 0) ? 0 : 1);
344 expectedTasksPerRS = Math.max(1, expectedTasksPerRS); // at least 
be one
{code}
Since expectedTasksPerRS only relay on availableRSs and numTasks, I think it is 
better to move them to getAvailableRSs() and change a name?

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260258#comment-16260258
 ] 

Hadoop QA commented on HBASE-19290:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 9s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
56s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
23s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
53s{color} | {color:red} hbase-server: The patch generated 1 new + 5 unchanged 
- 0 fixed = 6 total (was 5) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
25s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
46m 55s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 96m 
24s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}160m 36s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19290 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12898598/HBASE-19290.master.003.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux f0af9476849a 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 8f806ab486 |
| maven | version: Apache Maven 3.5.2 
(138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HBASE-Build/9935/artifact/patchprocess/diff-checkstyle-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/9935/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 

[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-20 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260196#comment-16260196
 ] 

binlijin commented on HBASE-19290:
--

bq.  Why randomize? Can be constant?
No particular reason for randomize, i change it to constant.
bq. So there are 2 available splitters, and one grabbed task, we don't stop 
here and keep hammering zk?
Yes.
bq. Can do it in "if" condition itself?
Yes, it can do, done it.
bq. That while condition is just to handle spurious wakeups. See Object#wait. 
You can definitely remove the second sleep (unless there's a concrete reason 
not to).
The while loop will enter only if when seq_start == taskReadySeq.get(), and 
when every splitLogZNode's children changed the taskReadySeq will increment, so 
it will not enter the while (seq_start == taskReadySeq.get()) {} and kill 
trying to grab task and issue zk request.


> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch, HBASE-19290.master.003.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-20 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260121#comment-16260121
 ] 

Appy commented on HBASE-19290:
--

bq. int sleepTime = RandomUtils.nextInt(0, 100) + 500;
Why randomize? Can be constant? 

bq if (taskGrabbed == 0 && !shouldStop) {
So there are 2 available splitters, and one grabbed task, we don't stop here 
and keep hammering zk?
Probably change taskGrabbed 

{quote}
int idx = (i + offset) % paths.size();
446 // don't call ZKSplitLog.getNodeName() because that will lead to
447 // double encoding of the path name
448 taskGrabbed += 
grabTask(ZNodePaths.joinZNode(watcher.znodePaths.splitLogZNode, 
paths.get(idx))) ? 1 : 0;
{quote}
Can do it in "if" condition itself?

bq. taskReadySeq.wait may not execute because it has condition.
That while condition is just to handle spurious wakeups.  See Object#wait. You 
can definitely remove the second sleep (unless there's a concrete reason not 
to).


> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259078#comment-16259078
 ] 

Hadoop QA commented on HBASE-19290:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
58s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
53s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m  
3s{color} | {color:red} hbase-server: The patch generated 1 new + 5 unchanged - 
0 fixed = 6 total (was 5) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
50s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
53m  4s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}106m 35s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}179m 12s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19290 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12898426/HBASE-19290.master.002.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 034a3598b11e 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 
12:48:20 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| git revision | master / 9b7b83d862 |
| maven | version: Apache Maven 3.5.2 
(138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HBASE-Build/9921/artifact/patchprocess/diff-checkstyle-hbase-server.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/9921/artifact/patchprocess/patch-unit-hbase-server.txt
 |
|  Test Results | 

[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-19 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258913#comment-16258913
 ] 

binlijin commented on HBASE-19290:
--

bq. Add javadoc for availableRSs.
OK.
bq. Grabed -> Grabbed
OK.
bq. Consider using shorter sleep time.
OK.

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-19 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258912#comment-16258912
 ] 

binlijin commented on HBASE-19290:
--

bq. Please add a short commit message explaining 'idea' of the change (makes 
reviewing easy then having to reverse engineer the 'idea')
Add the idea in description.
bq. grabTask only returns 0 or 1. Use boolean please.
OK.
bq. Add @return comment to grabTask.
OK.
bq. We probably don't need {{if (taskGrabed == 0 && !shouldStop) }}, 
taskReadySeq.wait is indefinite wait on availability of next task.
taskReadySeq.wait may not execute because it have condition. 
bq. Can we pull the while loop one level outside and share the 
if(calculateAvailableSplitters(..)) condition.
OK.

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch, 
> HBASE-19290.master.002.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.
> (1) Reduce request to rsZNode, every time calculateAvailableSplitters will 
> get rsZNode's children from zookeeper, when cluster is huge, this is heavy. 
> This patch reduce the request. 
> (2) When the regionserver has max split tasks running, it may still trying to 
> grab task and issue zookeeper request, we should sleep and wait until we can 
> grab tasks again.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-17 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257863#comment-16257863
 ] 

Ted Yu commented on HBASE-19290:


{code}
+   * @param numTasks current total number of available tasks
+   */
+  private int calculateAvailableSplitters(int availableRSs, int numTasks) {
{code}
Add javadoc for availableRSs.
{code}
+  taskGrabed += 
grabTask(ZNodePaths.joinZNode(watcher.znodePaths.splitLogZNode, 
paths.get(idx)));
{code}
Grabed -> Grabbed
{code}
+  while (!shouldStop) {
+Thread.sleep(1000);
{code}
Consider using shorter sleep time.


> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257575#comment-16257575
 ] 

Hadoop QA commented on HBASE-19290:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
9s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 6s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
51s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
59s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
52s{color} | {color:red} hbase-server: The patch generated 1 new + 5 unchanged 
- 0 fixed = 6 total (was 5) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
 6s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
45m 17s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 78m 
21s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}140m 20s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19290 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12898139/HBASE-19290.master.001.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux c5ddf3132a60 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh
 |
| git revision | master / 907b268fd4 |
| maven | version: Apache Maven 3.5.2 
(138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HBASE-Build/9897/artifact/patchprocess/diff-checkstyle-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/9897/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | 

[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-17 Thread Appy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256632#comment-16256632
 ] 

Appy commented on HBASE-19290:
--

- Please add a short commit message explaining 'idea' of the change (makes 
reviewing easy then having to reverse engineer the 'idea')
- grabTask only returns 0 or 1. Use boolean please.
- Add @return comment to grabTask.
- We probably don't need {{if (taskGrabed == 0 && !shouldStop) }}, 
taskReadySeq.wait is indefinite wait on availability of next task.
- Can we pull the while loop one level outside and share the 
{{if(calculateAvailableSplitters(..))}} condition.

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log

2017-11-16 Thread binlijin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256585#comment-16256585
 ] 

binlijin commented on HBASE-19290:
--

The patch is effective and we have test it on our 2000+ nodes production 
cluster.

> Reduce zk request when doing split log
> --
>
> Key: HBASE-19290
> URL: https://issues.apache.org/jira/browse/HBASE-19290
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-19290.master.001.patch
>
>
> We observe once the cluster has 1000+ nodes and when hundreds of nodes abort 
> and doing split log, the split is very very slow, and we find the 
> regionserver and master wait on the zookeeper response, so we need to reduce 
> zookeeper request and pressure for big cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)