[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive
[ https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338365#comment-15338365 ]

Hadoop QA commented on HBASE-14937:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 3s {color} | {color:red} HBASE-14937 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.2.1/precommit-patchnames for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12778212/HBASE-14937.patch |
| JIRA Issue | HBASE-14937 |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/2294/console |
| Powered by | Apache Yetus 0.2.1 http://yetus.apache.org |

This message was automatically generated.

> Make rpc call timeout for replication adaptive
> --
>
> Key: HBASE-14937
> URL: https://issues.apache.org/jira/browse/HBASE-14937
> Project: HBase
> Issue Type: Improvement
> Reporter: Ashish Singhi
> Assignee: Ashish Singhi
> Labels: replication
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-14937.patch
>
>
> When peer cluster replication is disabled and lot of writes are happening in
> active cluster and later on peer cluster replication is enabled then there
> are chances that replication requests to peer cluster may time out.
> This is possible after HBASE-13153 and it can also happen with many and many
> WAL data replication still pending to replicate.
> Approach to this problem will be discussed in the comments.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive
[ https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338363#comment-15338363 ]

Mikhail Antonov commented on HBASE-14937:

Any progress here? Kicked out to 1.4.0.
[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive
[ https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179996#comment-15179996 ]

Hadoop QA commented on HBASE-14937:

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 10s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s {color} | {color:green} master passed with JDK v1.8.0 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s {color} | {color:green} master passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 4m 10s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 52s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} master passed with JDK v1.8.0 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s {color} | {color:green} master passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s {color} | {color:green} the patch passed with JDK v1.8.0 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 4m 12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 24m 11s {color} | {color:green} Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.8.0 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 90m 22s {color} | {color:green} hbase-server in the patch passed with JDK v1.8.0. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 88m 55s {color} | {color:green} hbase-server in the patch passed with JDK v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s {color} | {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 225m 15s {color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12778212/HBASE-14937.patch |
| JIRA Issue | HBASE-14937 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 7dabcf2 |
| findbugs | v3.0.0 |
| JDK v1.7.0_79 Test Results |
[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive
[ https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179760#comment-15179760 ]

Ashish Singhi commented on HBASE-14937:

[~andrew.purt...@gmail.com], how about adding another configuration that limits the callTimeout value, so that users can set the maximum replication RPC timeout according to their needs? The default would be 2 hours.
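A minimal sketch of how such a cap could bound the growing timeout. This is not the attached patch: the class `CappedTimeout`, the method `next`, and the constant name are hypothetical, and the growth rule (multiply by retry count * 2) is borrowed from the snippet reviewed elsewhere in this thread. Only the 2 hour default comes from the comment above.

```java
// Hypothetical sketch: class, method, and constant names are illustrative only.
final class CappedTimeout {
    // 2 hours in milliseconds, the default suggested in the comment (7,200,000 ms).
    static final int DEFAULT_MAX_TIMEOUT_MS = 2 * 60 * 60 * 1000;

    /**
     * Returns the timeout for the next retry: the current timeout multiplied by
     * (retry * 2), but never more than maxTimeoutMs.
     */
    static int next(int currentTimeoutMs, int retry, int maxTimeoutMs) {
        // Compute in long so the multiplication cannot overflow int.
        long candidate = (long) currentTimeoutMs * retry * 2L;
        return (int) Math.min(candidate, (long) maxTimeoutMs);
    }
}
```

With this cap, a timeout of 2880000 ms on retry 4 would grow to 23040000 ms uncapped, but `next` returns 7200000 ms instead.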
[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive
[ https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088494#comment-15088494 ]

Ashish Singhi commented on HBASE-14937:

I tried to reproduce this, but so far I have not been able to observe a case where the remote server is available and we are just sleeping. Can you please describe the scenario? I would like to test it. Thanks.
[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive
[ https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088455#comment-15088455 ]

Andrew Purtell commented on HBASE-14937:

Not convinced waiting longer is better than just retrying. Seems waiting longer can only lead us to be sleeping unnecessarily when the remote is available again.
[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive
[ https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15087715#comment-15087715 ]

Ashish Singhi commented on HBASE-14937:

Ping [~apurtell].
[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive
[ https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15086583#comment-15086583 ]

Ashish Singhi commented on HBASE-14937:

Ping @Andrew Purtell
[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive
[ https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082585#comment-15082585 ]

Ashish Singhi commented on HBASE-14937:

Hi [~andrew.purt...@gmail.com], can you check my reply comment and let me know if it addresses your concern?
[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive
[ https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066370#comment-15066370 ]

Ashish Singhi commented on HBASE-14937:

The RPC call timeout is increased only when we get a CallTimeoutException; for all other exception types the code remains the same.

Now suppose that, due to some issue, we were able to connect to the peer cluster but could not replicate the data, and after a lot of retries the calculated timeout value is, say, 5 hours. If the peer cluster comes back after two hours during this call, the call will resume and succeed, so there is no blocking of replication activity as such.

I tried to simulate this on my local cluster in two ways. First, I brought the peer cluster's HBase service down, so the client was getting ConnectException; hence there was no increase in the RPC timeout value. Second, I kept a debug point in the replication flow in the peer cluster and did not allow replication activity to complete within the set RPC timeout value; the client got CallTimeoutException 2-3 times and, as per the patch, the RPC timeout was increased. Then, on a new call, after the peer cluster received the call, I released the debug point after some time and replication activity began immediately.

Please let me know if this addresses your concerns, or if there is anything else you would like me to check.
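The exception handling described in this comment can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: `TimeoutPolicy` and `nextTimeout` are hypothetical names, the nested `CallTimeoutException` is a local stand-in for the HBase class so that the sketch compiles on its own, and plain doubling is used in place of the patch's actual growth rule.

```java
// Hypothetical sketch: only a call timeout grows the RPC timeout; any other failure leaves it unchanged.
final class TimeoutPolicy {
    /** Local stand-in for HBase's CallTimeoutException (assumption, for self-containment). */
    static final class CallTimeoutException extends java.io.IOException {
        CallTimeoutException(String message) {
            super(message);
        }
    }

    /** Returns the timeout to use for the next replication attempt after the given failure. */
    static int nextTimeout(int currentTimeoutMs, java.io.IOException failure) {
        if (failure instanceof CallTimeoutException) {
            // The peer accepted the call but did not answer in time: allow more time next attempt.
            long doubled = currentTimeoutMs * 2L;
            return (int) Math.min(doubled, (long) Integer.MAX_VALUE);
        }
        // For example java.net.ConnectException when the peer is down: keep the timeout as is.
        return currentTimeoutMs;
    }
}
```

This mirrors the two simulated cases: a `java.net.ConnectException` leaves a 60000 ms timeout at 60000 ms, while a call timeout raises it to 120000 ms.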
[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive
[ https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066373#comment-15066373 ]

Ashish Singhi commented on HBASE-14937:

The patch attached here works only when {{hbase.rpc.client.impl}} is set to {{RpcClientImpl.class}}, and not with {{AbstractRpcClient.class}} as it is by default in the master branch. For details see HBASE-15018; once that is committed, the patch here will work in both cases.
[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive
[ https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15064193#comment-15064193 ]

Andrew Purtell commented on HBASE-14937:

When replication is down, say because of a network partition or temporary issue on one cluster, RPC calls can of course time out. Once the network or cluster is back in operation we want replication activity to resume as quickly as possible. Does this change prevent timely restart of replication activity? Won't we potentially be waiting for a long time for the current call to time out before probing with another? Would the time we might wait unnecessarily increase as the duration of the outage increases, making a long outage a really really long outage?
[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive
[ https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15064339#comment-15064339 ]

Ashish Singhi commented on HBASE-14937:

Andrew, thanks for the comment.

bq. When replication is down, say because of a network partition or temporary issue on one cluster, RPC calls cannot succeed and will time out.
In this case we will get ConnectException, right? Please correct me if I am wrong.
[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive
[ https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065041#comment-15065041 ]

Andrew Purtell commented on HBASE-14937:

bq. In this case we will get ConnectException, right ? Please correct me If I am wrong.
Not necessarily.

Anyway, that's not the core of my concern with this change, which is that we want replication activity to restart as quickly as possible.
[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive
[ https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061683#comment-15061683 ]

Ashish Singhi commented on HBASE-14937:

Attached a patch following the approach mentioned above. On every retry we multiply the timeout by (retry count * 2), unless its value is 0 or it has reached Integer.MAX_VALUE. Please review.
[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive
[ https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061722#comment-15061722 ]

Hadoop QA commented on HBASE-14937:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12778203/HBASE-14937.patch
against master branch at commit d78eddfdc8bad5068600e28a039276cc55063ce2.
ATTACHMENT ID: 12778203

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:red}-1 javac{color}. The patch appears to cause mvn compile goal to fail with Hadoop version 2.4.0.

Compilation errors resume:
[ERROR] COMPILATION ERROR :
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationEndpoint.java:[351,9] constructor Replicator in class org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.Replicator cannot be applied to given types;
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.2:testCompile (default-testCompile) on project hbase-server: Compilation failure
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationEndpoint.java:[351,9] constructor Replicator in class org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.Replicator cannot be applied to given types;
[ERROR] required: java.util.List,int,int
[ERROR] found: java.util.List,int
[ERROR] reason: actual and formal argument lists differ in length
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :hbase-server

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16900//console

This message is automatically generated.
[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive
[ https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061990#comment-15061990 ]

Hadoop QA commented on HBASE-14937:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12778212/HBASE-14937.patch
against master branch at commit d78eddfdc8bad5068600e28a039276cc55063ce2.
ATTACHMENT ID: 12778212

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1)

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.

{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages.

{color:green}+1 checkstyle{color}. The applied patch does not generate new checkstyle errors.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}. The patch passed unit tests in .

{color:green}+1 zombies{color}. No zombie tests found running at the end of the build.

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/16902//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16902//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/16902//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/16902//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/16902//console

This message is automatically generated.
[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive
[ https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15062248#comment-15062248 ]

Ashish Singhi commented on HBASE-14937:

bq. -1 javadoc. The javadoc tool appears to have generated 1 warning messages.
This will be fixed as part of HBASE-15000.
[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive
[ https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15062484#comment-15062484 ]

Ted Yu commented on HBASE-14937:

{code}
this.callTimeout *= callTimeoutRetryCounter * 2;
{code}
Would the timeout increase too fast after several retries?

{code}
LOG.debug("Replication RPC request call timeout " + this.callTimeout
    + " overflows integer value. Setting it to interger max value.");
{code}
Please include the retry count in the above message.

If we continuously get CallTimeoutException, the retry would be performed repeatedly. Should an upper bound be set for the total duration of retries?
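To put numbers on the growth-rate question, here is a small sketch that applies the rule `callTimeout *= callTimeoutRetryCounter * 2` repeatedly. It rests on assumptions not confirmed by the patch: the retry counter is taken to start at 1, the starting timeout is 60000 ms (the 1 minute default for {{hbase.rpc.timeout}} mentioned in this thread), and the value is clamped to Integer.MAX_VALUE as the quoted log message suggests. The class and method names are hypothetical.

```java
// Hypothetical sketch: tabulates timeout growth under "timeout *= retry * 2".
final class TimeoutGrowth {
    /**
     * Returns the timeout (in ms) after each of the given number of retries,
     * clamping at Integer.MAX_VALUE instead of overflowing.
     */
    static long[] schedule(int initialTimeoutMs, int retries) {
        long[] timeouts = new long[retries];
        long timeout = initialTimeoutMs;
        for (int retry = 1; retry <= retries; retry++) {
            // Arithmetic is done in long so the clamp sees the true product.
            timeout = Math.min(timeout * retry * 2L, (long) Integer.MAX_VALUE);
            timeouts[retry - 1] = timeout;
        }
        return timeouts;
    }
}
```

Under these assumptions, `TimeoutGrowth.schedule(60000, 6)` gives 120000, 480000, 2880000, 23040000, 230400000, and then 2147483647 ms: the product on the sixth retry (2764800000) already exceeds the int range, so the timeout grows from 1 minute to the integer maximum within six retries.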
[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive
[ https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063567#comment-15063567 ]

Ashish Singhi commented on HBASE-14937:
---

Thanks Ted for the review.

bq. Would the timeout increase too fast after several retries?
Yes, it might. If the network between the two DCs is very slow, a replication request that contains a mix of mutations and bulk loaded data may take longer to finish, and the timeout value we have provided may not be sufficient.

bq. Please include retry count in above message.
It is already included in the next log message, at info level, just below it.

bq. Should an upper bound be set for the total duration of retries?
I purposely did not set an upper bound, for the reason stated in my first response above. If you would like to have an upper bound, what do you suggest as the maximum number of retries before we stop increasing the timeout value?
[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive
[ https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048997#comment-15048997 ]

Ashish Singhi commented on HBASE-14937:
---

To solve this problem, a client can simply increase the value of {{hbase.rpc.timeout}} as required (the default is 1 minute), but that applies to all RPC requests. Instead, we can make the timeout adaptive: add another configuration, {{hbase.replication.rpc.timeout}}, whose default is the value of {{hbase.rpc.timeout}}, and set it as the call timeout on the replication RPC request. On every {{CallTimeoutException}} we can increase this value by some multiplier, up to a configurable number of times, and use the increased value as the timeout for the next retry of the replication request.

Any other thoughts?
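The proposal above could look roughly like the following minimal sketch. It is plain Java with no HBase dependencies and is not the attached patch: the class name, the method names, and the use of {{java.util.Properties}} in place of the HBase configuration object are all assumptions made for illustration. Only the two configuration keys and the 1 minute default come from this thread.

```java
import java.util.Properties;

/**
 * Illustrative sketch of an adaptive replication call timeout.
 * Hypothetical class; not part of HBase or of the HBASE-14937 patch.
 */
final class AdaptiveCallTimeoutSketch {
    private final int multiplier;
    private final int maxIncreases;
    private int increases = 0;
    private int timeoutMillis;

    AdaptiveCallTimeoutSketch(int initialMillis, int multiplier, int maxIncreases) {
        if (initialMillis <= 0 || multiplier < 1 || maxIncreases < 0) {
            throw new IllegalArgumentException("invalid timeout settings");
        }
        this.timeoutMillis = initialMillis;
        this.multiplier = multiplier;
        this.maxIncreases = maxIncreases;
    }

    /**
     * Reads hbase.replication.rpc.timeout, falling back to hbase.rpc.timeout,
     * and then to 60000 ms. Properties stands in for the real configuration here.
     */
    static int resolveInitialTimeout(Properties conf) {
        String base = conf.getProperty("hbase.rpc.timeout", "60000");
        return Integer.parseInt(conf.getProperty("hbase.replication.rpc.timeout", base));
    }

    /** Timeout to use for the next replication request, in milliseconds. */
    int current() {
        return timeoutMillis;
    }

    /**
     * Called when a replication request times out. Multiplies the timeout
     * until the configured number of increases is used up, clamping the
     * result to Integer.MAX_VALUE instead of letting it overflow.
     */
    int onCallTimeout() {
        if (increases < maxIncreases) {
            long next = (long) timeoutMillis * multiplier;
            timeoutMillis = (int) Math.min(next, Integer.MAX_VALUE);
            increases++;
        }
        return timeoutMillis;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        AdaptiveCallTimeoutSketch timeout =
            new AdaptiveCallTimeoutSketch(resolveInitialTimeout(conf), 2, 3);
        for (int attempt = 1; attempt <= 5; attempt++) {
            System.out.println("after timeout " + attempt + ": "
                + timeout.onCallTimeout() + " ms");
        }
    }
}
```

With a fixed multiplier and a cap on the number of increases, the timeout stops growing once the cap is reached and later retries reuse the last value. Whether to reset the timeout after a successful request is left open here, since the thread does not discuss it.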