[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive

2016-06-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338365#comment-15338365
 ] 

Hadoop QA commented on HBASE-14937:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 3s {color} | {color:red} HBASE-14937 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.2.1/precommit-patchnames for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12778212/HBASE-14937.patch |
| JIRA Issue | HBASE-14937 |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/2294/console |
| Powered by | Apache Yetus 0.2.1   http://yetus.apache.org |


This message was automatically generated.



> Make rpc call timeout for replication adaptive
> --
>
> Key: HBASE-14937
> URL: https://issues.apache.org/jira/browse/HBASE-14937
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ashish Singhi
>Assignee: Ashish Singhi
>  Labels: replication
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-14937.patch
>
>
> When peer cluster replication is disabled while a lot of writes are happening in 
> the active cluster, and peer cluster replication is enabled later on, there is a 
> chance that replication requests to the peer cluster will time out.
> This is possible after HBASE-13153, and it can also happen when a large amount of 
> WAL data is still pending replication.
> The approach to this problem will be discussed in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive

2016-06-18 Thread Mikhail Antonov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338363#comment-15338363
 ] 

Mikhail Antonov commented on HBASE-14937:
-

Any progress here? Kicked out to 1.4.0.


> Make rpc call timeout for replication adaptive
> --
>
> Key: HBASE-14937
> URL: https://issues.apache.org/jira/browse/HBASE-14937
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ashish Singhi
>Assignee: Ashish Singhi
>  Labels: replication
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-14937.patch
>
>
> When peer cluster replication is disabled while a lot of writes are happening in 
> the active cluster, and peer cluster replication is enabled later on, there is a 
> chance that replication requests to the peer cluster will time out.
> This is possible after HBASE-13153, and it can also happen when a large amount of 
> WAL data is still pending replication.
> The approach to this problem will be discussed in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive

2016-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179996#comment-15179996
 ] 

Hadoop QA commented on HBASE-14937:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 10s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s {color} | {color:green} master passed with JDK v1.8.0 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s {color} | {color:green} master passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 4m 10s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 52s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} master passed with JDK v1.8.0 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s {color} | {color:green} master passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s {color} | {color:green} the patch passed with JDK v1.8.0 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 4m 12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 24m 11s {color} | {color:green} Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 59s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.8.0 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 90m 22s {color} | {color:green} hbase-server in the patch passed with JDK v1.8.0. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 88m 55s {color} | {color:green} hbase-server in the patch passed with JDK v1.7.0_79. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s {color} | {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 225m 15s {color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12778212/HBASE-14937.patch |
| JIRA Issue | HBASE-14937 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 7dabcf2 |
| findbugs | v3.0.0 |
| JDK v1.7.0_79  Test Results | 

[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive

2016-03-04 Thread Ashish Singhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15179760#comment-15179760
 ] 

Ashish Singhi commented on HBASE-14937:
---

[~andrew.purt...@gmail.com], how about adding another configuration that caps 
the callTimeout value, so that users can set the maximum replication RPC 
timeout according to their needs? The default would be 2 hours.
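
For illustration only, a minimal sketch of what such a cap could look like. The property name {{replication.rpc.max.call.timeout}} and the class below are hypothetical and are not taken from the patch; only the 2 hour default and the (retry count * 2) growth factor come from this thread.

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;

final class CappedCallTimeout {
    // Hypothetical property name, invented for this sketch.
    static final String MAX_TIMEOUT_KEY = "replication.rpc.max.call.timeout";
    // Default cap of 2 hours, as proposed in the comment.
    static final int DEFAULT_MAX_TIMEOUT_MS = (int) TimeUnit.HOURS.toMillis(2);

    // Reads the cap from the configuration, falling back to the default.
    static int maxTimeoutMs(Properties conf) {
        String raw = conf.getProperty(MAX_TIMEOUT_KEY);
        return raw == null ? DEFAULT_MAX_TIMEOUT_MS : Integer.parseInt(raw.trim());
    }

    // Grows the timeout by (retry count * 2) but never beyond maxMs.
    static int next(int currentMs, int retryCount, int maxMs) {
        if (retryCount < 1) {
            throw new IllegalArgumentException("retryCount must be >= 1");
        }
        if (currentMs <= 0) {
            return currentMs; // no timeout configured, nothing to grow
        }
        long grown = (long) currentMs * retryCount * 2L; // long math avoids int overflow
        return (int) Math.min(grown, (long) maxMs);
    }

    public static void main(String[] args) {
        int cap = maxTimeoutMs(new Properties()); // nothing set, so the 2 hour default applies
        System.out.println(next(2_880_000, 4, cap)); // prints 7200000
    }
}
```

With a cap like this, the worst-case wait on a single call is bounded by the configured maximum instead of by Integer.MAX_VALUE milliseconds.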



[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive

2016-01-07 Thread Ashish Singhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088494#comment-15088494
 ] 

Ashish Singhi commented on HBASE-14937:
---

I tried to reproduce this, but so far I have not seen a case where the remote 
server is available and we are just sleeping. 
Can you please give me the scenario? I would like to test it. Thanks. 



[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive

2016-01-07 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088455#comment-15088455
 ] 

Andrew Purtell commented on HBASE-14937:


I'm not convinced waiting longer is better than just retrying. It seems waiting 
longer can only leave us sleeping unnecessarily once the remote is available 
again.



[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive

2016-01-07 Thread Ashish Singhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15087715#comment-15087715
 ] 

Ashish Singhi commented on HBASE-14937:
---

Ping [~apurtell].



[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive

2016-01-06 Thread Ashish Singhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15086583#comment-15086583
 ] 

Ashish Singhi commented on HBASE-14937:
---

Ping @Andrew Purtell 



[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive

2016-01-04 Thread Ashish Singhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082585#comment-15082585
 ] 

Ashish Singhi commented on HBASE-14937:
---

Hi [~andrew.purt...@gmail.com], can you check my reply and let me know 
whether it addresses your concern?



[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive

2015-12-21 Thread Ashish Singhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066370#comment-15066370
 ] 

Ashish Singhi commented on HBASE-14937:
---

The timeout value of the RPC call is increased only when we get a 
CallTimeoutException; for all other exception types the code remains the same.
Now suppose that, due to some issue, we were able to connect to the peer cluster 
but could not replicate the data, and after a lot of retries the calculated 
timeout value is, say, 5 hours. If the peer cluster comes back two hours into 
that call, the call will resume and succeed, so there is no blocking of 
replication activity as such.
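
The exception handling described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: the nested CallTimeoutException is a local stand-in for the HBase exception of the same name, and the Call interface and replicate method are hypothetical names invented for this sketch.

```java
import java.io.IOException;

final class SelectiveBackoff {
    /** Local stand-in for HBase's CallTimeoutException (assumption for this sketch). */
    static final class CallTimeoutException extends IOException {
        CallTimeoutException(String message) {
            super(message);
        }
    }

    /** Hypothetical abstraction of one replication RPC issued with a given timeout. */
    interface Call {
        void run(int timeoutMs) throws IOException;
    }

    /**
     * Tries the call up to maxAttempts times and returns the timeout that was
     * in effect when it succeeded or when the attempts ran out.
     */
    static int replicate(Call call, int initialTimeoutMs, int maxAttempts) {
        int timeoutMs = initialTimeoutMs;
        int timeoutRetries = 0;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                call.run(timeoutMs);
                return timeoutMs; // replicated successfully
            } catch (CallTimeoutException e) {
                // Only a call timeout grows the timeout for the next attempt.
                timeoutRetries++;
                long grown = (long) timeoutMs * timeoutRetries * 2L;
                timeoutMs = (int) Math.min(grown, (long) Integer.MAX_VALUE);
            } catch (IOException e) {
                // Any other failure (for example java.net.ConnectException when the
                // peer is down) is retried with the timeout left unchanged.
            }
        }
        return timeoutMs;
    }
}
```

In this sketch a peer that is simply unreachable never inflates the timeout, which matches the first simulation described in the next paragraph.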

I tried to simulate this on my local cluster in two ways. First, I brought the 
HBase service on the peer cluster down, so the client was getting 
ConnectException and hence there was no increase in the RPC timeout value. 
Second, I kept a debug point in the replication flow in the peer cluster so 
that replication activity could not complete within the set RPC timeout; the 
client got CallTimeoutException 2-3 times and, as per the patch, the RPC 
timeout was increased. Then, on a new call, after the peer cluster had received 
the call, I released the debug point after some time and replication activity 
began immediately.

Please let me know if this addresses your concerns, or whether there is 
anything else you would like me to check.



[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive

2015-12-21 Thread Ashish Singhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066373#comment-15066373
 ] 

Ashish Singhi commented on HBASE-14937:
---

The patch attached here works only when {{hbase.rpc.client.impl}} is set to 
{{RpcClientImpl.class}}, not {{AbstractRpcClient.class}}, which is the default 
in the master branch. For details see HBASE-15018; once that is committed, the 
patch here will work in both cases.



[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive

2015-12-18 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15064193#comment-15064193
 ] 

Andrew Purtell commented on HBASE-14937:


When replication is down, say because of a network partition or temporary issue 
on one cluster, RPC calls can of course time out. Once the network or cluster 
is back in operation we want replication activity to resume as quickly as 
possible. Does this change prevent timely restart of replication activity? 
Won't we potentially be waiting for a long time for the current call to time out 
before probing with another? Would the time we might wait unnecessarily 
increase as the duration of the outage increases, making a long outage a really 
really long outage?



[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive

2015-12-18 Thread Ashish Singhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15064339#comment-15064339
 ] 

Ashish Singhi commented on HBASE-14937:
---

Andrew, thanks for the comment.
bq. When replication is down, say because of a network partition or temporary 
issue on one cluster, RPC calls cannot succeed and will time out.
In this case we will get a ConnectException, right? Please correct me if I am 
wrong.




[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive

2015-12-18 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065041#comment-15065041
 ] 

Andrew Purtell commented on HBASE-14937:


bq. In this case we will get a ConnectException, right? Please correct me if I 
am wrong.

Not necessarily.

Anyway, that's not the core of my concern with this change, which is that we 
want replication activity to restart as quickly as possible. 



[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive

2015-12-17 Thread Ashish Singhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061683#comment-15061683
 ] 

Ashish Singhi commented on HBASE-14937:
---

Attached a patch as per the approach mentioned above.
On every retry we multiply the timeout by (retry count * 2), unless its value 
is 0 or it has already reached Integer.MAX_VALUE.
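
The growth rule above can be sketched as a small helper. This is a minimal illustration, assuming millisecond timeouts and a 1-based retry counter; the class and method names are hypothetical and the body is not copied from the attached patch.

```java
final class AdaptiveCallTimeout {
    /**
     * Returns the timeout for the next attempt: the current timeout multiplied
     * by (retryCount * 2), clamped to Integer.MAX_VALUE. A timeout of 0 or one
     * that is already saturated is returned unchanged.
     */
    static int next(int currentMs, int retryCount) {
        if (retryCount < 1) {
            throw new IllegalArgumentException("retryCount must be >= 1");
        }
        if (currentMs <= 0 || currentMs == Integer.MAX_VALUE) {
            return currentMs;
        }
        // Multiply in long arithmetic so the overflow check itself cannot overflow.
        long grown = (long) currentMs * retryCount * 2L;
        return grown > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) grown;
    }

    public static void main(String[] args) {
        System.out.println(next(60_000, 1));  // prints 120000
        System.out.println(next(120_000, 2)); // prints 480000
    }
}
```

Doing the multiplication in long avoids the silent wrap-around that an in-place int multiplication would produce near the upper limit.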

Please review.



[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive

2015-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061722#comment-15061722
 ] 

Hadoop QA commented on HBASE-14937:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12778203/HBASE-14937.patch
  against master branch at commit d78eddfdc8bad5068600e28a039276cc55063ce2.
  ATTACHMENT ID: 12778203

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause mvn compile goal to 
fail with Hadoop version 2.4.0.

Compilation errors resume:
[ERROR] COMPILATION ERROR : 
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationEndpoint.java:[351,9] constructor Replicator in class org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.Replicator cannot be applied to given types;
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.2:testCompile (default-testCompile) on project hbase-server: Compilation failure
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/hbase/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationEndpoint.java:[351,9] constructor Replicator in class org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.Replicator cannot be applied to given types;
[ERROR] required: java.util.List,int,int
[ERROR] found: java.util.List,int
[ERROR] reason: actual and formal argument lists differ in length
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hbase-server


Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16900//console

This message is automatically generated.



[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive

2015-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061990#comment-15061990
 ] 

Hadoop QA commented on HBASE-14937:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12778212/HBASE-14937.patch
  against master branch at commit d78eddfdc8bad5068600e28a039276cc55063ce2.
  ATTACHMENT ID: 12778212

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 checkstyle{color}. The applied patch does not generate new 
checkstyle errors.

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:green}+1 site{color}.  The mvn post-site goal succeeds with this 
patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 zombies{color}. No zombie tests found running at the end of 
the build.

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16902//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16902//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16902//artifact/patchprocess/checkstyle-aggregate.html

  Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16902//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/16902//console

This message is automatically generated.



[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive

2015-12-17 Thread Ashish Singhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15062248#comment-15062248
 ] 

Ashish Singhi commented on HBASE-14937:
---

bq. -1 javadoc. The javadoc tool appears to have generated 1 warning messages.
This will be fixed as part of HBASE-15000.



[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive

2015-12-17 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15062484#comment-15062484
 ] 

Ted Yu commented on HBASE-14937:


{code}
this.callTimeout *= callTimeoutRetryCounter * 2;
{code}
Would the timeout increase too fast after several retries?
{code}
LOG.debug("Replication RPC request call timeout " + this.callTimeout
    + " overflows integer value. Setting it to interger max value.");
{code}
Please include the retry count in the above message.

If we continuously get CallTimeoutException, the retry would be performed 
repeatedly. Should an upper bound be set for the total duration of retries?
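
To make the growth concern concrete, here is a minimal sketch, not the patch 
itself. It assumes the retry counter starts at 1 and grows by 1 on each 
timed-out call, and that the starting timeout is the 1-minute default of 
{{hbase.rpc.timeout}}. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class TimeoutGrowthSketch {

  /**
   * Applies "timeout *= retryCounter * 2" once per retry and records the
   * result, clamping to Integer.MAX_VALUE when the product no longer fits
   * in an int. The counter starting at 1 is an assumption.
   */
  static List<Integer> growth(int initialTimeoutMs, int retries) {
    List<Integer> timeouts = new ArrayList<>();
    long timeout = initialTimeoutMs; // long, so the overflow can be detected
    for (int retryCounter = 1; retryCounter <= retries; retryCounter++) {
      timeout = timeout * retryCounter * 2L;
      if (timeout > Integer.MAX_VALUE) {
        timeout = Integer.MAX_VALUE; // same fallback the quoted log line describes
      }
      timeouts.add((int) timeout);
    }
    return timeouts;
  }

  public static void main(String[] args) {
    // 60_000 ms is the 1-minute default mentioned in this issue.
    System.out.println(growth(60_000, 7));
  }
}
```

Under these assumptions the timeout goes 120000, 480000, 2880000, 23040000, 
230400000 ms, and the sixth retry already has to be clamped to 
Integer.MAX_VALUE.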



[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive

2015-12-17 Thread Ashish Singhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063567#comment-15063567
 ] 

Ashish Singhi commented on HBASE-14937:
---

Thanks Ted for the review.

bq. Would the timeout increase too fast after several retries?
Yes, it might. If the network between the two DCs is very slow, a replication 
request containing a mix of mutations and bulk-loaded data may take longer to 
finish than the timeout value we have provided.

bq. Please include the retry count in the above message.
It is already included in the next log message, at info level, just below it.

bq. Should an upper bound be set for the total duration of retries?
I purposely did not set an upper bound, for the reason stated in my first 
response. If you would like an upper bound, what would you suggest as the 
maximum number of retries before we stop increasing the timeout value?
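
One way to reason about such a bound is sketched below. This is only an 
illustration and is not in the patch: it assumes a simple doubling of the 
timeout per retry (not the multiplier used in the patch) and that every 
attempt runs to its full timeout. The class and method names and the budget 
parameter are hypothetical.

```java
public class RetryBudgetSketch {

  /**
   * Counts how many attempts fit into a total time budget when each attempt
   * uses up its whole timeout and the timeout doubles after every attempt.
   */
  static int attemptsWithinBudget(long initialTimeoutMs, long totalBudgetMs) {
    int attempts = 0;
    long spent = 0;
    long timeout = initialTimeoutMs;
    // "timeout > 0" also stops the loop if doubling ever wraps around.
    while (timeout > 0 && timeout <= totalBudgetMs - spent) {
      spent += timeout;
      attempts++;
      timeout *= 2;
    }
    return attempts;
  }

  public static void main(String[] args) {
    // 1-minute initial timeout, 10-minute total budget.
    System.out.println(attemptsWithinBudget(60_000L, 600_000L));
  }
}
```

With a 1-minute initial timeout and a 10-minute budget, three attempts (1, 2 
and 4 minutes) fit; the fourth would need 8 more minutes and is not started.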



[jira] [Commented] (HBASE-14937) Make rpc call timeout for replication adaptive

2015-12-09 Thread Ashish Singhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048997#comment-15048997
 ] 

Ashish Singhi commented on HBASE-14937:
---

To solve this problem, a client can simply increase the value of 
{{hbase.rpc.timeout}} as required (the default is 1 minute), but that applies 
to all RPC requests. Instead, we can make the timeout adaptive by adding 
another configuration, {{hbase.replication.rpc.timeout}}, which defaults to 
the value of {{hbase.rpc.timeout}} and is set as the call timeout on the 
replication RPC request. On every {{CallTimeoutException}}, we can increase 
this value by some multiplier, up to a configurable number of times, and use 
the new timeout for the next retry of the replication request.
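
A rough sketch of this idea follows. It is not the attached patch: it uses a 
plain Map in place of the HBase configuration object so that it stays 
self-contained, and the class name, the multiplier and the maximum number of 
increases are hypothetical parameters chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class AdaptiveReplicationTimeoutSketch {
  static final String RPC_TIMEOUT_KEY = "hbase.rpc.timeout";
  static final String REPLICATION_RPC_TIMEOUT_KEY = "hbase.replication.rpc.timeout";
  static final int DEFAULT_RPC_TIMEOUT_MS = 60_000; // 1 minute, as stated above

  private final int multiplier;   // hypothetical: growth factor per timeout
  private final int maxIncreases; // hypothetical: how often the timeout may grow
  private int callTimeoutMs;
  private int increases = 0;

  AdaptiveReplicationTimeoutSketch(Map<String, String> conf, int multiplier, int maxIncreases) {
    if (multiplier < 1 || maxIncreases < 0) {
      throw new IllegalArgumentException("multiplier must be >= 1 and maxIncreases >= 0");
    }
    // The replication timeout falls back to the general RPC timeout.
    int rpcTimeoutMs = parse(conf.get(RPC_TIMEOUT_KEY), DEFAULT_RPC_TIMEOUT_MS);
    this.callTimeoutMs = parse(conf.get(REPLICATION_RPC_TIMEOUT_KEY), rpcTimeoutMs);
    this.multiplier = multiplier;
    this.maxIncreases = maxIncreases;
  }

  private static int parse(String value, int defaultValue) {
    return value == null ? defaultValue : Integer.parseInt(value.trim());
  }

  /** Timeout to set on the next replication RPC request. */
  int currentTimeoutMs() {
    return callTimeoutMs;
  }

  /** Call when a replication request timed out; returns the timeout for the retry. */
  int onCallTimeout() {
    if (increases < maxIncreases) {
      long next = (long) callTimeoutMs * multiplier;
      callTimeoutMs = (int) Math.min(next, Integer.MAX_VALUE); // clamp instead of overflowing
      increases++;
    }
    return callTimeoutMs;
  }

  public static void main(String[] args) {
    AdaptiveReplicationTimeoutSketch t =
        new AdaptiveReplicationTimeoutSketch(new HashMap<>(), 2, 3);
    System.out.println(t.currentTimeoutMs());
    for (int i = 0; i < 4; i++) {
      System.out.println(t.onCallTimeout());
    }
  }
}
```

In this sketch the timeout stops growing once the configured number of 
increases is reached, so later retries reuse the last value.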

Any other thoughts?

> Make rpc call timeout for replication adaptive
> --
>
> Key: HBASE-14937
> URL: https://issues.apache.org/jira/browse/HBASE-14937
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ashish Singhi
>Assignee: Ashish Singhi
>  Labels: replication
>
> When peer cluster replication is disabled and lot of writes are happening in 
> active cluster and later on peer cluster replication is enabled then there 
> are chances that replication requests to peer cluster may time out.
> This is possible after HBASE-13153 and it can also happen with many and many 
> WAL data replication still pending to replicate.
> Approach to this problem will be discussed in the comments.


