[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-10-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13469908#comment-13469908
 ] 

Hudson commented on HBASE-6550:
---

Integrated in HBase-0.94-security-on-Hadoop-23 #8 (See 
[https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/8/])
HBASE-6860  [replication] HBASE-6550 is too aggressive, DDOSes .META. 
(Revision 1388695)

 Result = FAILURE
jdcryans : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java


 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: Replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.94.2, 0.96.0

 Attachments: 6550-havealook.txt, HBase-6550-0.94.patch, 
 HBase-6550-0.94-v2.patch, HBase-6550-0.94-v3.patch, HBase-6550.patch, 
 HBase-6550-v1.patch, HBase-6550-v3.patch, HBase-6550-v4.patch, 
 HBase-6550-v5.patch, HBase-6550-v6.patch


 ReplicationSink applies replicated WALEdits to the local cluster. It uses the
 native HBase client to insert the mutations. Sometimes processing a batch takes
 a while (for example, because of a region split or a GC pause), and the client
 goes through its retry phase.
 This has two repercussions:
 a) The regionserver handler serving the request (until now, a priority
 handler) is blocked for this period.
 b) The caller may time out and retry anyway, while the handler serving the
 original ReplicationSink request is still working.
 Refactor ReplicationSink to have the following features:
 a) Make it more configurable (its own retry limit, connection timeout, etc.)
 b) Add fail-fast behavior so that it bails out if the caller has timed out or
 if any exception occurs while processing the mutation batch.
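The two proposed features can be illustrated with a small, self-contained Java sketch. It is not the HBase implementation: FailFastSink, its configuration keys (example.sink.retries, example.sink.timeout.ms) and the Callable-based "batch" are hypothetical stand-ins. The sink reads its own retry limit and timeout instead of inheriting client defaults. With the retry limit at its default of 0 it bails out on the first exception, and once its deadline passes it cancels the in-flight attempt and throws, so the handler thread is released rather than working for a caller that has probably already given up.

```java
import java.io.IOException;
import java.util.Properties;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical fail-fast sink: an illustration only, not HBase's ReplicationSink. */
public class FailFastSink implements AutoCloseable {
  // Hypothetical keys for the sink's own retry limit and timeout.
  public static final String RETRIES_KEY = "example.sink.retries";
  public static final String TIMEOUT_MS_KEY = "example.sink.timeout.ms";

  private final int maxRetries;
  private final long timeoutMs;
  // Daemon workers, so a stuck batch never keeps the JVM alive.
  private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
    Thread t = new Thread(r, "example-sink-worker");
    t.setDaemon(true);
    return t;
  });

  public FailFastSink(Properties conf) {
    // Default of 0 retries: fail on the first exception.
    this.maxRetries = Integer.parseInt(conf.getProperty(RETRIES_KEY, "0"));
    this.timeoutMs = Long.parseLong(conf.getProperty(TIMEOUT_MS_KEY, "1000"));
  }

  /**
   * Applies one mutation batch. A failed attempt is retried at most maxRetries
   * times, but never past the deadline: when the deadline passes, the in-flight
   * attempt is cancelled and the call bails out instead of holding the handler.
   */
  public void apply(Callable<Void> batch) throws IOException, InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    Throwable lastFailure = null;
    int attempts = 0;
    while (attempts <= maxRetries) {
      long remaining = deadline - System.nanoTime();
      if (remaining <= 0) {
        break; // the caller has most likely given up; stop working for it
      }
      attempts++;
      Future<Void> result = pool.submit(batch);
      try {
        result.get(remaining, TimeUnit.NANOSECONDS);
        return; // batch applied
      } catch (TimeoutException e) {
        result.cancel(true); // interrupt the worker and fail fast
        throw new IOException("deadline exceeded after " + attempts + " attempt(s)", e);
      } catch (ExecutionException e) {
        lastFailure = e.getCause(); // remember the failure; retry if allowed
      }
    }
    throw new IOException("batch not applied after " + attempts + " attempt(s)", lastFailure);
  }

  @Override
  public void close() {
    pool.shutdownNow();
  }
}
```

As a design note, the sink's timeout would presumably be kept below the caller's own RPC timeout, so the sink gives up before the caller retries; that relationship is an assumption of this sketch, not something stated in the issue.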

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460869#comment-13460869
 ] 

Hudson commented on HBASE-6550:
---

Integrated in HBase-0.94-security #55 (See 
[https://builds.apache.org/job/HBase-0.94-security/55/])
HBASE-6860  [replication] HBASE-6550 is too aggressive, DDOSes .META. 
(Revision 1388695)

 Result = FAILURE
jdcryans : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java


 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: Replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.94.2, 0.96.0

 Attachments: 6550-havealook.txt, HBase-6550-0.94.patch, 
 HBase-6550-0.94-v2.patch, HBase-6550-0.94-v3.patch, HBase-6550.patch, 
 HBase-6550-v1.patch, HBase-6550-v3.patch, HBase-6550-v4.patch, 
 HBase-6550-v5.patch, HBase-6550-v6.patch


[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460878#comment-13460878
 ] 

Hudson commented on HBASE-6550:
---

Integrated in HBase-TRUNK #3368 (See 
[https://builds.apache.org/job/HBase-TRUNK/3368/])
HBASE-6860  [replication] HBASE-6550 is too aggressive, DDOSes .META. 
(Revision 1388694)

 Result = FAILURE
jdcryans : 
Files : 
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java


 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: Replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.94.2, 0.96.0

 Attachments: 6550-havealook.txt, HBase-6550-0.94.patch, 
 HBase-6550-0.94-v2.patch, HBase-6550-0.94-v3.patch, HBase-6550.patch, 
 HBase-6550-v1.patch, HBase-6550-v3.patch, HBase-6550-v4.patch, 
 HBase-6550-v5.patch, HBase-6550-v6.patch


[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-09-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460928#comment-13460928
 ] 

Hudson commented on HBASE-6550:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #186 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/186/])
HBASE-6860  [replication] HBASE-6550 is too aggressive, DDOSes .META. 
(Revision 1388694)

 Result = FAILURE
jdcryans : 
Files : 
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java


 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: Replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.94.2, 0.96.0

 Attachments: 6550-havealook.txt, HBase-6550-0.94.patch, 
 HBase-6550-0.94-v2.patch, HBase-6550-0.94-v3.patch, HBase-6550.patch, 
 HBase-6550-v1.patch, HBase-6550-v3.patch, HBase-6550-v4.patch, 
 HBase-6550-v5.patch, HBase-6550-v6.patch


[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-09-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448253#comment-13448253
 ] 

Hudson commented on HBASE-6550:
---

Integrated in HBase-0.94-security #51 (See 
[https://builds.apache.org/job/HBase-0.94-security/51/])
HBASE-6550 Refactoring ReplicationSink to make it more responsive of 
cluster health (Himanshu Vashishtha) (Revision 1379229)

 Result = FAILURE
larsh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java


 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.2

 Attachments: 6550-havealook.txt, HBase-6550-0.94.patch, 
 HBase-6550-0.94-v2.patch, HBase-6550-0.94-v3.patch, HBase-6550.patch, 
 HBase-6550-v1.patch, HBase-6550-v3.patch, HBase-6550-v4.patch, 
 HBase-6550-v5.patch, HBase-6550-v6.patch


[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-09-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448346#comment-13448346
 ] 

Hudson commented on HBASE-6550:
---

Integrated in HBase-0.94-security-on-Hadoop-23 #7 (See 
[https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/7/])
HBASE-6550 Refactoring ReplicationSink to make it more responsive of 
cluster health (Himanshu Vashishtha) (Revision 1379229)

 Result = FAILURE
larsh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java


 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.2

 Attachments: 6550-havealook.txt, HBase-6550-0.94.patch, 
 HBase-6550-0.94-v2.patch, HBase-6550-0.94-v3.patch, HBase-6550.patch, 
 HBase-6550-v1.patch, HBase-6550-v3.patch, HBase-6550-v4.patch, 
 HBase-6550-v5.patch, HBase-6550-v6.patch


[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-30 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445205#comment-13445205
 ] 

Lars Hofhansl commented on HBASE-6550:
--

@Himanshu: Wanna update the patch? I would like to get 0.94.2 out of the door,
and this should be included.

 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.2

 Attachments: 6550-havealook.txt, HBase-6550-0.94.patch, 
 HBase-6550-0.94-v2.patch, HBase-6550-0.94-v3.patch, HBase-6550.patch, 
 HBase-6550-v1.patch, HBase-6550-v3.patch, HBase-6550-v4.patch, 
 HBase-6550-v5.patch, HBase-6550-v6.patch


[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-30 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445479#comment-13445479
 ] 

Lars Hofhansl commented on HBASE-6550:
--

Thanks for the patch, Himanshu.

 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.2

 Attachments: 6550-havealook.txt, HBase-6550-0.94.patch, 
 HBase-6550-0.94-v2.patch, HBase-6550-0.94-v3.patch, HBase-6550.patch, 
 HBase-6550-v1.patch, HBase-6550-v3.patch, HBase-6550-v4.patch, 
 HBase-6550-v5.patch, HBase-6550-v6.patch


[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445613#comment-13445613
 ] 

Hudson commented on HBASE-6550:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #155 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/155/])
HBASE-6550 Refactoring ReplicationSink to make it more responsive of 
cluster health (Himanshu Vashishtha) (Revision 1379227)

 Result = FAILURE
larsh : 
Files : 
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java


 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.2

 Attachments: 6550-havealook.txt, HBase-6550-0.94.patch, 
 HBase-6550-0.94-v2.patch, HBase-6550-0.94-v3.patch, HBase-6550.patch, 
 HBase-6550-v1.patch, HBase-6550-v3.patch, HBase-6550-v4.patch, 
 HBase-6550-v5.patch, HBase-6550-v6.patch


[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445640#comment-13445640
 ] 

Hudson commented on HBASE-6550:
---

Integrated in HBase-0.94 #443 (See 
[https://builds.apache.org/job/HBase-0.94/443/])
HBASE-6550 Refactoring ReplicationSink to make it more responsive of 
cluster health (Himanshu Vashishtha) (Revision 1379229)

 Result = SUCCESS
larsh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java


 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.2

 Attachments: 6550-havealook.txt, HBase-6550-0.94.patch, 
 HBase-6550-0.94-v2.patch, HBase-6550-0.94-v3.patch, HBase-6550.patch, 
 HBase-6550-v1.patch, HBase-6550-v3.patch, HBase-6550-v4.patch, 
 HBase-6550-v5.patch, HBase-6550-v6.patch


[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-27 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13442791#comment-13442791
 ] 

Jean-Daniel Cryans commented on HBASE-6550:
---

The 0.94 patch uses Threads.newDaemonThreadFactory, which I can't find anywhere.
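For readers unfamiliar with the helper: a daemon thread factory gives pool threads a recognizable name and marks them as daemons, so they never keep the JVM alive. The plain-Java sketch below is a hypothetical stand-in; DaemonThreadFactory is not HBase code, and it does not claim to reproduce the signature or behavior of Threads.newDaemonThreadFactory.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in: names threads "<prefix>-<n>" and marks them as daemons. */
public class DaemonThreadFactory implements ThreadFactory {
  private final String prefix;
  // Thread-safe counter so concurrent pools get unique names.
  private final AtomicInteger counter = new AtomicInteger(1);

  public DaemonThreadFactory(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public Thread newThread(Runnable r) {
    Thread t = new Thread(r, prefix + "-" + counter.getAndIncrement());
    // Daemon threads do not prevent the JVM from exiting.
    t.setDaemon(true);
    return t;
  }
}
```

Such a factory would typically be handed to a pool, for example Executors.newFixedThreadPool(n, new DaemonThreadFactory("sink")).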

 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.2

 Attachments: 6550-havealook.txt, HBase-6550-0.94.patch, 
 HBase-6550.patch, HBase-6550-v1.patch, HBase-6550-v3.patch, 
 HBase-6550-v4.patch


[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-27 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13442794#comment-13442794
 ] 

Jean-Daniel Cryans commented on HBASE-6550:
---

Ah, OK, I see it now; it was added by a recent patch.

 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.2

 Attachments: 6550-havealook.txt, HBase-6550-0.94.patch, 
 HBase-6550.patch, HBase-6550-v1.patch, HBase-6550-v3.patch, 
 HBase-6550-v4.patch


[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13434891#comment-13434891
 ] 

Hadoop QA commented on HBASE-6550:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12541006/HBase-6550-v4.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 9 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.master.TestSplitLogManager

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2579//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2579//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2579//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2579//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2579//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2579//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2579//console

This message is automatically generated.

 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.2

 Attachments: 6550-havealook.txt, HBase-6550.patch, 
 HBase-6550-v1.patch, HBase-6550-v3.patch, HBase-6550-v4.patch




[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-15 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13435287#comment-13435287
 ] 

Jean-Daniel Cryans commented on HBASE-6550:
---

+1 on the latest patch. For 0.94 we'll need a backport, and I'd be +1 on that
only if it's tested on a real cluster, which I volunteer to do if provided with
a patch that applies cleanly :)

 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.2

 Attachments: 6550-havealook.txt, HBase-6550.patch, 
 HBase-6550-v1.patch, HBase-6550-v3.patch, HBase-6550-v4.patch







[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-15 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13435439#comment-13435439
 ] 

Lars Hofhansl commented on HBASE-6550:
--

Awesome. Thanks J-D.

 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.2

 Attachments: 6550-havealook.txt, HBase-6550.patch, 
 HBase-6550-v1.patch, HBase-6550-v3.patch, HBase-6550-v4.patch







[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13434650#comment-13434650
 ] 

Lars Hofhansl commented on HBASE-6550:
--

I like this patch.

Ted will point out that you need to restore the thread's interrupted state when 
you catch InterruptedException, and he would be correct :)
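For reference, the idiom Ted would ask for: catching `InterruptedException` clears the thread's interrupted flag, so the usual fix is to set it again with `Thread.currentThread().interrupt()` so that code further up the stack (for example a region server shutdown path) can still observe it. A minimal, dependency-free sketch; the class and method names are hypothetical and not taken from the patch.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical helper illustrating the "restore the interrupt" idiom.
final class InterruptAwareWait {
  private InterruptAwareWait() {}

  /**
   * Waits up to timeoutMs for an item; returns null on timeout or interrupt.
   * On interrupt, the thread's interrupted status is restored, not swallowed.
   */
  static <T> T pollRestoringInterrupt(BlockingQueue<T> queue, long timeoutMs) {
    try {
      return queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      // Throwing InterruptedException cleared the flag; set it again so
      // callers (e.g. a shutdown path) can still see the interrupt.
      Thread.currentThread().interrupt();
      return null;
    }
  }
}
```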

Let's also get agreement from J-D that the ExecutorService is tuned correctly 
here, because it will be shared between all HTables used by this 
ReplicationSink.
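A shared executor of the kind discussed here might be built roughly as follows. This is only a sketch under assumed parameters (core size 1, a `SynchronousQueue` hand-off, daemon threads, idle core-thread timeout); the class name and thread-name prefix are hypothetical, and the patch's actual pool settings are not reproduced here.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical factory for one pool shared by every table the sink opens.
final class SharedSinkPool {
  private SharedSinkPool() {}

  // Requires maxThreads >= 1 and keepAliveSeconds > 0
  // (allowCoreThreadTimeOut rejects a zero keep-alive).
  static ThreadPoolExecutor create(int maxThreads, long keepAliveSeconds) {
    final AtomicInteger count = new AtomicInteger();
    ThreadFactory daemonFactory = r -> {
      Thread t = new Thread(r, "sink-pool-" + count.incrementAndGet());
      t.setDaemon(true); // the pool alone never keeps the JVM alive
      return t;
    };
    // SynchronousQueue hands each task straight to a thread: a new thread is
    // started (up to maxThreads) when none is idle, and submissions beyond
    // maxThreads busy threads are rejected rather than queued.
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        1, maxThreads, keepAliveSeconds, TimeUnit.SECONDS,
        new SynchronousQueue<Runnable>(), daemonFactory);
    // Let even the single core thread exit when the sink is idle.
    pool.allowCoreThreadTimeOut(true);
    return pool;
  }
}
```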


 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Attachments: 6550-havealook.txt, HBase-6550.patch, 
 HBase-6550-v1.patch, HBase-6550-v3.patch







[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-14 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13434685#comment-13434685
 ] 

Himanshu Vashishtha commented on HBASE-6550:


Ok :)

Re: InterruptedException: even when we are closing the host RS?

Re: J-D: +1 

 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Attachments: 6550-havealook.txt, HBase-6550.patch, 
 HBase-6550-v1.patch, HBase-6550-v3.patch







[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13434698#comment-13434698
 ] 

Lars Hofhansl commented on HBASE-6550:
--

If you'll guarantee in writing that it only ever happens while the RS is being 
closed then it's fine :)

 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Attachments: 6550-havealook.txt, HBase-6550.patch, 
 HBase-6550-v1.patch, HBase-6550-v3.patch







[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-14 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13434706#comment-13434706
 ] 

Jean-Daniel Cryans commented on HBASE-6550:
---

The TPE looks ok.

Shouldn't the conf be cloned? I'm worried about propagating those client-side 
configurations back into the RS. You never know when this can bite us, especially 
in unit tests.
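A minimal sketch of clone-then-decorate, so the sink's client settings never leak into the region server's own configuration. To stay runnable without HBase on the classpath, a `Map<String, String>` stands in for Hadoop's `Configuration` (whose copy constructor, `new Configuration(conf)`, would play the same role). The key names are standard HBase client keys; the values passed in are placeholders, not the patch's defaults.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: decorate a private copy of the configuration, never the shared one.
// Map<String, String> is a stand-in for Hadoop's Configuration here.
final class SinkConf {
  private SinkConf() {}

  static Map<String, String> decorate(Map<String, String> serverConf,
                                      int retries, long pauseMs, int rpcTimeoutMs) {
    // Copy first: the writes below must not reach the region server's conf.
    Map<String, String> sinkConf = new HashMap<>(serverConf);
    sinkConf.put("hbase.client.retries.number", Integer.toString(retries));
    sinkConf.put("hbase.client.pause", Long.toString(pauseMs));
    sinkConf.put("hbase.rpc.timeout", Integer.toString(rpcTimeoutMs));
    return sinkConf;
  }
}
```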

Don't do this:

{code}
LOG.warn("interrupted while terminating: " + e);
{code}

They put a second argument on those calls just for the exceptions. Also try 
having error messages that are more descriptive about the context.
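What J-D is asking for, in sketch form: pass the exception as its own argument so the stack trace is preserved, and make the message say what was being waited on. With the commons-logging `Log` interface that is `LOG.warn(message, e)`; the runnable sketch below uses `java.util.logging` instead to avoid the dependency, and the method name and message text are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch using java.util.logging as a stand-in for commons-logging.
final class ShutdownLogging {
  private static final Logger LOG = Logger.getLogger(ShutdownLogging.class.getName());

  private ShutdownLogging() {}

  static void logInterruptedTermination(String poolName, InterruptedException e) {
    // The exception goes in its own argument so the stack trace is kept,
    // and the message names what was being waited on.
    LOG.log(Level.WARNING,
        "Interrupted while waiting for " + poolName + " to terminate", e);
  }
}
```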

On the nitpick-side of things:
 - Call {{exec}} something more specific to what it is
 - Call {{con}} something more specific to what it is
 - "Its called" should be "It's called"

 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Attachments: 6550-havealook.txt, HBase-6550.patch, 
 HBase-6550-v1.patch, HBase-6550-v3.patch







[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13434761#comment-13434761
 ] 

Lars Hofhansl commented on HBASE-6550:
--

Oh yeah, the conf must be cloned (like in the first version of the patch).
{{exec}} and {{con}} are my fault :)


 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Attachments: 6550-havealook.txt, HBase-6550.patch, 
 HBase-6550-v1.patch, HBase-6550-v3.patch







[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-09 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432144#comment-13432144
 ] 

Lars Hofhansl commented on HBASE-6550:
--

Looks like this should work.

I had something simpler in mind:
# Have a decorated conf (like you do): set client pause/retries and also lower 
the client RPC timeout.
# Create an unmanaged HConnectionImplementation and an Executor.
# For each batch, create a new HTable(connection, executor).
# Apply the batch.
# Close the created HTable.

Seems that would be more readable...?
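Lars's steps might be sketched as follows. Since HBase is not on the classpath here, `SharedConnection` and `BatchTable` are hypothetical stand-ins for the unmanaged connection and for the per-batch `HTable(connection, executor)`; the point is the lifecycle: the connection and executor are created once, while a cheap table object is created, used, and closed for every batch, and closing it leaves the shared executor running.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical stand-in for an unmanaged connection: created once and shared.
final class SharedConnection {
  // Records every mutation that "reached the cluster", for demonstration only.
  final List<String> applied = new ArrayList<>();

  synchronized void send(String mutation) {
    applied.add(mutation);
  }
}

// Hypothetical stand-in for a per-batch HTable built from a shared connection
// and a shared executor. It borrows both and owns neither.
final class BatchTable implements AutoCloseable {
  private final SharedConnection connection;
  private final ExecutorService pool;

  BatchTable(SharedConnection connection, ExecutorService pool) {
    this.connection = connection;
    this.pool = pool;
  }

  void batch(List<String> mutations) throws InterruptedException, ExecutionException {
    List<Future<?>> pending = new ArrayList<>();
    for (String m : mutations) {
      pending.add(pool.submit(() -> connection.send(m)));
    }
    for (Future<?> f : pending) {
      f.get(); // in submission order; a failed task surfaces as ExecutionException
    }
  }

  @Override
  public void close() {
    // Intentionally leaves the pool and the connection open: the sink owns them.
  }
}

final class SinkLifecycle {
  private SinkLifecycle() {}

  // One short-lived table per batch; the connection and pool outlive it.
  static void applyBatch(SharedConnection conn, ExecutorService pool, List<String> batch)
      throws InterruptedException, ExecutionException {
    try (BatchTable table = new BatchTable(conn, pool)) {
      table.batch(batch);
    }
  }
}
```

In this sketch, the pool and connection would be created when the sink starts, `applyBatch` called once per incoming batch, and the pool shut down only when the sink itself stops.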


 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Attachments: HBase-6550-v1.patch







[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-09 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432185#comment-13432185
 ] 

Himanshu Vashishtha commented on HBASE-6550:


I see :)

I'll be glad to make it simpler. But it's not that difficult...  :P
It basically adds two things: a bailout mechanism and, to achieve it, submitting 
a Callable to a RepSink#threadpool.

I wanted to have the bailout functionality for the regionserver handler as part 
of the patch. It gives the handler the opportunity to clean up etc. in case the 
client goes away. Decorating the config solves only half the problem. 
Another way is making similar changes on the master cluster regionserver side 
(decorating its config with a lower RPC timeout etc.), but that's not desirable, 
as that call is not intra-cluster and we want to give it a full try before 
resending the shipment.
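The bail-out behaviour described here can be illustrated with plain `java.util.concurrent`: the handler submits the batch as a `Callable` to the sink's pool and waits only for a bounded time; on timeout it cancels the task (interrupting the worker) and returns instead of staying blocked, while a failure inside the batch propagates as an `ExecutionException`. The names and the timeout are assumptions for illustration, not the patch's code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of a fail-fast sink call.
final class BailingSink {
  private BailingSink() {}

  /** Returns true if the batch finished within timeoutMs, false if we bailed out. */
  static boolean applyOrBail(ExecutorService pool, Callable<Void> batch, long timeoutMs)
      throws ExecutionException {
    Future<Void> f = pool.submit(batch);
    try {
      f.get(timeoutMs, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException e) {
      // The caller has likely given up already; stop working on its behalf.
      f.cancel(true); // interrupts the worker thread if the task is still running
      return false;
    } catch (InterruptedException e) {
      f.cancel(true);
      Thread.currentThread().interrupt(); // preserve the interrupt for our caller
      return false;
    }
  }
}
```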


bq. Create an unmanaged HConnectionImplementation and an Executor
You mean at class level? If another master cluster regionserver calls the 
method via another handler, will it wait then?
Or at method level? 

bq.For each batch create new HTable(connection, executor)
apply batch
close create HTable.

Yes, that also happens in the current patch: it closes the connection and the 
HTable's pool after the batch op.


 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Attachments: HBase-6550-v1.patch







[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-09 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432257#comment-13432257
 ] 

Lars Hofhansl commented on HBASE-6550:
--

ThreadPool is pretty heavyweight (we're not using it in our Salesforce 
app servers at all, but directly use HConnections and Executors as I do here).


 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Attachments: 6550-havealook.txt, HBase-6550-v1.patch







[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-09 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432268#comment-13432268
 ] 

Lars Hofhansl commented on HBASE-6550:
--

I meant HTablePool :)

 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Attachments: 6550-havealook.txt, HBase-6550-v1.patch







[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-09 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432314#comment-13432314
 ] 

Zhihong Ted Yu commented on HBASE-6550:
---

@Lars:
{code}
+this.exec.shutdown();
+try {
+  this.exec.shutdownNow();
{code}
I think something similar to the following should be placed in the try block 
(copied from 
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ExecutorService.html):
{code}
 if (!pool.awaitTermination(60, TimeUnit.SECONDS)) {
   pool.shutdownNow(); // Cancel currently executing tasks
 }
{code}
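For completeness, the full idiom from the linked `ExecutorService` Javadoc is sketched below: shut down, wait, force with `shutdownNow()` if needed, and restore the interrupt if the wait itself is interrupted. The configurable wait and the boolean result are adaptations made for this sketch, not part of the Javadoc example or the patch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of the shutdown-and-await-termination idiom from the ExecutorService Javadoc.
final class PoolShutdown {
  private PoolShutdown() {}

  /** Returns true if the pool terminated within the allotted waits. */
  static boolean shutdownAndAwaitTermination(ExecutorService pool, long waitMs) {
    pool.shutdown(); // stop accepting new tasks; queued and running ones may finish
    try {
      if (!pool.awaitTermination(waitMs, TimeUnit.MILLISECONDS)) {
        pool.shutdownNow(); // interrupt tasks that are still running
        // Give them a moment to respond to the interrupt.
        return pool.awaitTermination(waitMs, TimeUnit.MILLISECONDS);
      }
      return true;
    } catch (InterruptedException ie) {
      pool.shutdownNow(); // re-cancel if we were interrupted while waiting
      Thread.currentThread().interrupt(); // preserve interrupt status
      return false;
    }
  }
}
```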

 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Attachments: 6550-havealook.txt, HBase-6550-v1.patch







[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-09 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432324#comment-13432324
 ] 

Lars Hofhansl commented on HBASE-6550:
--

Thanks Ted, you are right. I did not attach this as a suggested patch, though; 
it was just to explain what I meant in my earlier comments without typing a lot :)


 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Attachments: 6550-havealook.txt, HBase-6550-v1.patch







[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-09 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432340#comment-13432340
 ] 

Himanshu Vashishtha commented on HBASE-6550:


Got it. I like the suggestion of re-using the threadpool for all HTable 
instances. 

bq.Re:Although I would not think that that would be a common problem once the 
timeouts here are short enough.
I think it's good to have, as it makes the sink more responsive, and it also 
protects the master cluster side in case the user doesn't configure a lower 
RPC timeout (for whatever reason) on the slave cluster.

 Refactoring ReplicationSink to make it more responsive of cluster health
 

 Key: HBASE-6550
 URL: https://issues.apache.org/jira/browse/HBASE-6550
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Attachments: 6550-havealook.txt, HBase-6550-v1.patch


