[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13469908#comment-13469908 ] Hudson commented on HBASE-6550: --- Integrated in HBase-0.94-security-on-Hadoop-23 #8 (See [https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/8/]) HBASE-6860 [replication] HBASE-6550 is too aggressive, DDOSes .META. (Revision 1388695) Result = FAILURE jdcryans : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java

Refactoring ReplicationSink to make it more responsive of cluster health
Key: HBASE-6550
URL: https://issues.apache.org/jira/browse/HBASE-6550
Project: HBase
Issue Type: New Feature
Components: Replication
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
Fix For: 0.94.2, 0.96.0
Attachments: 6550-havealook.txt, HBase-6550-0.94.patch, HBase-6550-0.94-v2.patch, HBase-6550-0.94-v3.patch, HBase-6550.patch, HBase-6550-v1.patch, HBase-6550-v3.patch, HBase-6550-v4.patch, HBase-6550-v5.patch, HBase-6550-v6.patch

ReplicationSink replicates the WALEdits in the local cluster, using the native HBase client to insert the mutations. Sometimes processing a batch takes a while (for example, because of a region split or a GC pause) and the client goes through its retry phase. This has two repercussions:
a) The region server handler serving the request (until now, a priority handler) is blocked for this whole period.
b) The caller may time out and will retry anyway, while the handler serving the original ReplicationSink request is still working.
Refactor ReplicationSink to have the following features:
a) Make it more configurable (its own retry limit, connection timeout, etc.).
b) Add a fail-fast behavior so that it bails out if the caller has timed out, or on any exception while processing the mutation batch.
-- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
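The two proposed features can be illustrated with a minimal, self-contained Java sketch that does not depend on the HBase client. The class names `SinkConfig` and `FailFastSink`, the property names `replication.sink.client.retries.number` and `replication.sink.client.ops.timeout`, and their defaults are assumptions chosen for illustration, not necessarily what the committed patch uses. A `Callable<Boolean>` stands in for applying one batch of mutations. The sink reads its own retry limit and timeout instead of inheriting the general client settings. It stops as soon as either budget is spent, so the handler is released rather than retrying for a caller that has probably given up.

```java
import java.io.IOException;
import java.util.Properties;
import java.util.concurrent.Callable;

// Hypothetical sink-specific settings, read separately from the general client config.
final class SinkConfig {
    // Assumed property names, for illustration only.
    static final String RETRIES_KEY = "replication.sink.client.retries.number";
    static final String TIMEOUT_KEY = "replication.sink.client.ops.timeout";

    final int maxRetries;
    final long opTimeoutMs;

    SinkConfig(Properties props) {
        // Small assumed defaults so a slow local cluster does not pin a handler for long.
        this.maxRetries = Integer.parseInt(props.getProperty(RETRIES_KEY, "4"));
        this.opTimeoutMs = Long.parseLong(props.getProperty(TIMEOUT_KEY, "10000"));
    }
}

final class FailFastSink {
    private final SinkConfig config;

    FailFastSink(SinkConfig config) {
        this.config = config;
    }

    /**
     * Applies one batch: one initial attempt plus at most maxRetries retries,
     * never past the operation deadline. When the budget is exhausted the last
     * failure is wrapped in an IOException so the source can resend the batch.
     */
    void applyBatch(Callable<Boolean> batch) throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + config.opTimeoutMs;
        Exception last = null;
        for (int attempt = 0; attempt <= config.maxRetries; attempt++) {
            if (System.currentTimeMillis() >= deadline) {
                break; // the caller has likely timed out already; stop working for it
            }
            try {
                if (Boolean.TRUE.equals(batch.call())) {
                    return; // batch applied
                }
                last = new IllegalStateException("batch rejected on attempt " + attempt);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible upstream
                throw e;
            } catch (Exception e) {
                last = e; // remember the cause and retry if budget remains
            }
        }
        throw new IOException("giving up on replicated batch", last);
    }
}
```

Rethrowing instead of retrying indefinitely relies on the source cluster resending the batch. That matches the description's point that the caller will retry anyway.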
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460869#comment-13460869 ] Hudson commented on HBASE-6550: --- Integrated in HBase-0.94-security #55 (See [https://builds.apache.org/job/HBase-0.94-security/55/]) HBASE-6860 [replication] HBASE-6550 is too aggressive, DDOSes .META. (Revision 1388695) Result = FAILURE jdcryans : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460878#comment-13460878 ] Hudson commented on HBASE-6550: --- Integrated in HBase-TRUNK #3368 (See [https://builds.apache.org/job/HBase-TRUNK/3368/]) HBASE-6860 [replication] HBASE-6550 is too aggressive, DDOSes .META. (Revision 1388694) Result = FAILURE jdcryans : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460928#comment-13460928 ] Hudson commented on HBASE-6550: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #186 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/186/]) HBASE-6860 [replication] HBASE-6550 is too aggressive, DDOSes .META. (Revision 1388694) Result = FAILURE jdcryans : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448253#comment-13448253 ] Hudson commented on HBASE-6550: --- Integrated in HBase-0.94-security #51 (See [https://builds.apache.org/job/HBase-0.94-security/51/]) HBASE-6550 Refactoring ReplicationSink to make it more responsive of cluster health (Himanshu Vashishtha) (Revision 1379229) Result = FAILURE larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448346#comment-13448346 ] Hudson commented on HBASE-6550: --- Integrated in HBase-0.94-security-on-Hadoop-23 #7 (See [https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/7/]) HBASE-6550 Refactoring ReplicationSink to make it more responsive of cluster health (Himanshu Vashishtha) (Revision 1379229) Result = FAILURE larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445205#comment-13445205 ] Lars Hofhansl commented on HBASE-6550: -- @Himanshu: Wanna update the patch? I would like to get 0.94.2 out of the door, and this should be included.
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445479#comment-13445479 ] Lars Hofhansl commented on HBASE-6550: -- Thanks for the patch Himanshu.
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445613#comment-13445613 ] Hudson commented on HBASE-6550: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #155 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/155/]) HBASE-6550 Refactoring ReplicationSink to make it more responsive of cluster health (Himanshu Vashishtha) (Revision 1379227) Result = FAILURE larsh : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445640#comment-13445640 ] Hudson commented on HBASE-6550: --- Integrated in HBase-0.94 #443 (See [https://builds.apache.org/job/HBase-0.94/443/]) HBASE-6550 Refactoring ReplicationSink to make it more responsive of cluster health (Himanshu Vashishtha) (Revision 1379229) Result = SUCCESS larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13442791#comment-13442791 ] Jean-Daniel Cryans commented on HBASE-6550: --- The 0.94 patch uses Threads.newDaemonThreadFactory, which I can't really find anywhere.
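For readers unfamiliar with the kind of helper J-D mentions, a daemon thread factory typically does two things. It names each thread with a prefix, and it marks the thread as a daemon so it cannot keep the JVM alive. The following is a generic Java sketch of that idea under a hypothetical class name, `DaemonThreadFactorySketch`. It is not the source of HBase's `Threads.newDaemonThreadFactory`, whose exact behavior is not shown in this thread.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative stand-in for a "new daemon thread factory" helper; not HBase's code.
final class DaemonThreadFactorySketch implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger(1);

    DaemonThreadFactorySketch(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        // Numbered names make pool threads easy to spot in a thread dump.
        Thread t = new Thread(r, prefix + counter.getAndIncrement());
        // Daemon threads do not keep the JVM alive, so shutdown is not blocked
        // by idle worker threads left in a pool.
        t.setDaemon(true);
        return t;
    }
}
```

Such a factory would usually be passed to an `ExecutorService` constructor so that every pool thread gets the same naming and daemon settings.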
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13442794#comment-13442794 ] Jean-Daniel Cryans commented on HBASE-6550: --- Ah ok I see it now, it was from a recent patch.
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13434891#comment-13434891 ] Hadoop QA commented on HBASE-6550: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12541006/HBase-6550-v4.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).
-1 findbugs. The patch appears to introduce 9 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.master.TestSplitLogManager
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2579//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2579//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2579//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2579//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2579//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2579//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2579//console
This message is automatically generated.
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13435287#comment-13435287 ] Jean-Daniel Cryans commented on HBASE-6550: --- +1 on latest patch. For 0.94 we'll need a backport, and I'd be +1 on that only if it's tested on a real cluster, which I volunteer to do if provided with a patch that applies cleanly :)
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13435439#comment-13435439 ] Lars Hofhansl commented on HBASE-6550: -- Awesome. Thanks J-D.
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434650#comment-13434650 ] Lars Hofhansl commented on HBASE-6550:
I like this patch. Ted will point out that you need to restore the thread's interrupted state when you catch InterruptedException, and he would be correct :) Let's also get agreement from J-D that the ExecutorService is tuned correctly here, because it will be shared between all HTables used by this ReplicationSink.
Attachments: 6550-havealook.txt, HBase-6550.patch, HBase-6550-v1.patch, HBase-6550-v3.patch
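A minimal, JDK-only sketch of the InterruptedException point, assuming the sink waits on its pool with `awaitTermination` while closing (the class and method names are hypothetical, not taken from the patch). A method that throws InterruptedException clears the thread's interrupt status, so the catch block re-asserts it with `Thread.currentThread().interrupt()`. This still matters during regionserver shutdown, because code further up the stack may rely on seeing the interrupt.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class InterruptRestoreSketch {

  /** Waits for the pool to terminate; returns false on timeout or interrupt. */
  static boolean awaitTerminationRestoringInterrupt(ExecutorService pool, long timeoutMs) {
    try {
      return pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      // Throwing InterruptedException cleared the thread's interrupt status;
      // re-assert it so code further up the stack can still see it.
      Thread.currentThread().interrupt();
      return false;
    }
  }

  /** Returns {terminated, interruptStatusAfterCall}. */
  static boolean[] demo() {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    CountDownLatch release = new CountDownLatch(1);
    // A task that blocks until released, so the pool cannot terminate yet.
    pool.execute(() -> {
      try {
        release.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    pool.shutdown();

    Thread.currentThread().interrupt(); // simulate an interrupt, e.g. during RS shutdown
    boolean terminated = awaitTerminationRestoringInterrupt(pool, 5_000);
    boolean stillInterrupted = Thread.interrupted(); // reads and clears the status

    release.countDown(); // let the task finish so the pool can terminate
    return new boolean[] {terminated, stillInterrupted};
  }

  public static void main(String[] args) {
    boolean[] r = demo();
    System.out.println("terminated=" + r[0] + " stillInterrupted=" + r[1]);
  }
}
```

Without the `interrupt()` call in the catch block, the second flag would come back false and the interrupt would be silently lost.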
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434685#comment-13434685 ] Himanshu Vashishtha commented on HBASE-6550:
Ok :)
re: InterruptedException: even when we are closing the host RS?
re: J-D: +1
Attachments: 6550-havealook.txt, HBase-6550.patch, HBase-6550-v1.patch, HBase-6550-v3.patch
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434698#comment-13434698 ] Lars Hofhansl commented on HBASE-6550:
If you'll guarantee in writing that it only ever happens while the RS is being closed, then it's fine :)
Attachments: 6550-havealook.txt, HBase-6550.patch, HBase-6550-v1.patch, HBase-6550-v3.patch
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434706#comment-13434706 ] Jean-Daniel Cryans commented on HBASE-6550:
The TPE looks OK. Shouldn't the conf be cloned? I'm worried about propagating those client-side configurations back into the RS. You never know when this can bite us, especially in unit tests.
Don't do this:
{code}
LOG.warn("interrupted while terminating: " + e);
{code}
They put a second argument on those calls just for the exceptions. Also try having error messages that are more descriptive about the context.
On the nitpick side of things:
- Call {{exec}} something more specific to what it is
- Call {{con}} something more specific to what it is
- "Its called" should be "It's called"
Attachments: 6550-havealook.txt, HBase-6550.patch, HBase-6550-v1.patch, HBase-6550-v3.patch
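To illustrate the logging point, here is a sketch that uses `java.util.logging` as a stand-in for the commons-logging `LOG` in the patch (with commons-logging the equivalent call is `LOG.warn(message, throwable)`). The capturing handler exists only to show the difference: concatenating the exception keeps just its `toString()` in the message, while passing it as the separate argument attaches the full `Throwable`, stack trace included, to the log record. The message wording is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class LogWithThrowableSketch {
  private static final Logger LOG = Logger.getLogger(LogWithThrowableSketch.class.getName());

  /** Keeps every published record so the demo can inspect it. */
  static final class CapturingHandler extends Handler {
    final List<LogRecord> records = new ArrayList<>();
    @Override public void publish(LogRecord record) { records.add(record); }
    @Override public void flush() { }
    @Override public void close() { }
  }

  static List<LogRecord> demo() {
    CapturingHandler handler = new CapturingHandler();
    LOG.setUseParentHandlers(false); // keep the demo off the console
    LOG.addHandler(handler);
    InterruptedException e = new InterruptedException("simulated");

    // Discouraged: the exception is flattened into the message string.
    LOG.warning("interrupted while terminating: " + e);

    // Preferred: describe the context and pass the Throwable separately.
    LOG.log(Level.WARNING,
        "Interrupted while waiting for the replication sink's pool to terminate", e);

    LOG.removeHandler(handler);
    return handler.records;
  }

  public static void main(String[] args) {
    for (LogRecord r : demo()) {
      System.out.println(r.getMessage() + " | thrown=" + r.getThrown());
    }
  }
}
```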
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434761#comment-13434761 ] Lars Hofhansl commented on HBASE-6550:
Oh yeah, the conf must be cloned (like in the first version of the patch). exec and con are my fault :)
Attachments: 6550-havealook.txt, HBase-6550.patch, HBase-6550-v1.patch, HBase-6550-v3.patch
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432144#comment-13432144 ] Lars Hofhansl commented on HBASE-6550:
Looks like this should work. I had something simpler in mind:
# Have a decorated conf (like you do): set client pause/retry and also lower the client RPC timeout.
# Create an unmanaged HConnectionImplementation and an Executor.
# For each batch, create a new HTable(connection, executor).
# Apply the batch.
# Close the created HTable.
Seems that would be more readable...?
Attachments: HBase-6550-v1.patch
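Step 1 of this list, combined with the request that the conf be cloned, amounts to "copy, then decorate". The sketch below uses a plain `Map` as a stand-in for Hadoop's `Configuration` (whose copy constructor `new Configuration(conf)` would play the role of `new HashMap<>(rsConf)`). The keys are standard HBase client settings; the values are hypothetical and are not the ones chosen in the patch.

```java
import java.util.HashMap;
import java.util.Map;

public class DecoratedConfSketch {

  /**
   * Returns a decorated copy of the region server's settings. The input map
   * is never mutated, so client-side tuning cannot leak back into the RS.
   */
  static Map<String, String> decorate(Map<String, String> rsConf) {
    Map<String, String> sinkConf = new HashMap<>(rsConf); // clone first
    sinkConf.put("hbase.client.retries.number", "4");    // fewer retries (hypothetical value)
    sinkConf.put("hbase.client.pause", "100");            // ms between retries (hypothetical)
    sinkConf.put("hbase.rpc.timeout", "10000");           // lower RPC timeout in ms (hypothetical)
    return sinkConf;
  }

  public static void main(String[] args) {
    Map<String, String> rsConf = new HashMap<>();
    rsConf.put("hbase.rpc.timeout", "60000");
    Map<String, String> sinkConf = decorate(rsConf);
    // The RS keeps its own timeout; only the sink's copy is lowered.
    System.out.println(rsConf.get("hbase.rpc.timeout") + " " + sinkConf.get("hbase.rpc.timeout"));
  }
}
```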
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432185#comment-13432185 ] Himanshu Vashishtha commented on HBASE-6550:
I see :) I will be glad to make it simpler. But it's not that difficult... :P It basically adds two things: a bailout mechanism and, to achieve it, submitting a Callable to a RepSink#threadpool. I wanted to have the bailout functionality for the regionserver handler as part of the patch. With this, the handler gets the opportunity to do cleanup etc. in case the client goes away. Decorating the config solves half the purpose. Another way is making similar changes on the master cluster regionserver side (decorating its config with a lower RPC timeout etc.), but that's not desirable as it's not intra-cluster and we want to give it a full try before resending the shipment.
bq. Create an unmanaged HConnectionImplementation and an Executor
You mean at class level? In case another master cluster regionserver calls the method via another handler, will it wait then? Or at method level?
bq. For each batch create new HTable(connection, executor) apply batch close create HTable.
Yes, that also happens in the current patch. It closes the connection and the HTable's pool after the batch op.
Attachments: HBase-6550-v1.patch
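The bailout mechanism described here (submit the work as a `Callable` to a pool owned by the sink, then stop waiting once the caller has probably given up) can be sketched with `Future.get` and a timeout. This is a JDK-only illustration under that assumption; `runWithBailOut` and the sleeping task are hypothetical stand-ins for the patch's batch application, not its actual code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BailOutSketch {

  /**
   * Submits the batch to the pool and waits at most timeoutMs. On timeout or
   * interrupt the task is cancelled (its worker thread is interrupted) and the
   * exception propagates, so the calling handler is freed instead of blocking
   * through the client's retries. A failure inside the batch surfaces at once
   * as an ExecutionException.
   */
  static <T> T runWithBailOut(ExecutorService pool, Callable<T> batch, long timeoutMs)
      throws Exception {
    Future<T> future = pool.submit(batch);
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true);
      throw e;
    } catch (InterruptedException e) {
      future.cancel(true);
      Thread.currentThread().interrupt();
      throw e;
    }
  }

  /** Returns "applied" for a fast batch and "bailed out" for one that is too slow. */
  static String demo(long batchMs, long timeoutMs) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      return runWithBailOut(pool, () -> {
        Thread.sleep(batchMs); // stands in for applying the mutations
        return "applied";
      }, timeoutMs);
    } catch (TimeoutException e) {
      return "bailed out";
    } catch (Exception e) {
      return "failed: " + e;
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println(demo(10, 1_000));  // fast batch
    System.out.println(demo(10_000, 50)); // slow batch
  }
}
```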
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432257#comment-13432257 ] Lars Hofhansl commented on HBASE-6550:
ThreadPool is pretty heavyweight (we're not using it in our Salesforce appservers at all, but directly use HConnections and Executors as I do here).
Attachments: 6550-havealook.txt, HBase-6550-v1.patch
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432268#comment-13432268 ] Lars Hofhansl commented on HBASE-6550:
I meant HTablePool :)
Attachments: 6550-havealook.txt, HBase-6550-v1.patch
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432314#comment-13432314 ] Zhihong Ted Yu commented on HBASE-6550:
@Lars:
{code}
+this.exec.shutdown();
+try {
+  this.exec.shutdownNow();
{code}
I think something similar to the following should be placed in the try block (copied from http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ExecutorService.html):
{code}
if (!pool.awaitTermination(60, TimeUnit.SECONDS)) {
  pool.shutdownNow(); // Cancel currently executing tasks
}
{code}
Attachments: 6550-havealook.txt, HBase-6550-v1.patch
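For reference, a runnable version of the two-phase shutdown from the `ExecutorService` javadoc, adapted so the timeouts are parameters and the result is returned instead of printed (the class name and timeout values are illustrative). Unlike calling `shutdown()` and then `shutdownNow()` back to back, it gives running tasks a grace period before interrupting them.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ShutdownSketch {

  /**
   * Graceful wait first, then shutdownNow() only if tasks are still running.
   * Returns true if the pool terminated.
   */
  static boolean shutdownAndAwaitTermination(ExecutorService pool, long graceMs, long forceMs) {
    pool.shutdown(); // stop accepting new tasks; already queued ones still run
    try {
      if (!pool.awaitTermination(graceMs, TimeUnit.MILLISECONDS)) {
        pool.shutdownNow(); // interrupt tasks that are still executing
        return pool.awaitTermination(forceMs, TimeUnit.MILLISECONDS);
      }
      return true;
    } catch (InterruptedException e) {
      pool.shutdownNow();                 // re-cancel if we were interrupted
      Thread.currentThread().interrupt(); // and preserve the interrupt status
      return false;
    }
  }

  public static void main(String[] args) {
    ExecutorService quick = Executors.newFixedThreadPool(2);
    quick.execute(() -> { });
    System.out.println(shutdownAndAwaitTermination(quick, 1_000, 1_000));

    ExecutorService slow = Executors.newFixedThreadPool(1);
    slow.execute(() -> {
      try {
        Thread.sleep(60_000); // outlives the grace period but responds to interruption
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    System.out.println(shutdownAndAwaitTermination(slow, 100, 5_000));
  }
}
```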
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432324#comment-13432324 ] Lars Hofhansl commented on HBASE-6550:
Thanks Ted. You are right. I did not attach this as a suggested patch, though. It was just to explain what I meant with my earlier comments without typing a lot :)
Attachments: 6550-havealook.txt, HBase-6550-v1.patch
[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health
[ https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432340#comment-13432340 ] Himanshu Vashishtha commented on HBASE-6550:
Got it. I like the suggestion of reusing the thread pool for all HTable instances.
bq. Although I would not think that that would be a common problem once the timeouts here are short enough.
I think it will be good to have, as it makes the sink more responsive, and it also covers the master cluster side: the case where the user doesn't configure a lower RPC timeout (for whatever reason) on the slave cluster.
Attachments: 6550-havealook.txt, HBase-6550-v1.patch
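One way to picture the design agreed on here (one executor created per sink, reused by every short-lived per-batch table object, and shut down only when the sink itself closes) is the sketch below. `BatchWriter` is a hypothetical stand-in for `new HTable(connection, executor)`, not HBase's API, and the thread-name bookkeeping exists only to show that successive batches reuse the same pool threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class SharedPoolSinkSketch implements AutoCloseable {

  // Created once per sink and shared by every per-batch writer.
  private final ExecutorService sharedPool;

  SharedPoolSinkSketch(int threads) {
    this.sharedPool = Executors.newFixedThreadPool(threads);
  }

  /** Hypothetical stand-in for a short-lived HTable bound to the shared pool. */
  private static final class BatchWriter implements AutoCloseable {
    private final ExecutorService pool;

    BatchWriter(ExecutorService pool) {
      this.pool = pool;
    }

    List<Future<String>> write(List<String> rows) {
      List<Future<String>> results = new ArrayList<>();
      for (String row : rows) {
        // A real writer would apply `row`; this task only reports its thread.
        results.add(pool.submit(() -> Thread.currentThread().getName()));
      }
      return results;
    }

    @Override
    public void close() {
      // Deliberately does not shut down the pool: later batches still use it.
    }
  }

  /** Applies one batch and returns the names of the worker threads used. */
  Set<String> applyBatch(List<String> rows) throws Exception {
    Set<String> threads = new HashSet<>();
    try (BatchWriter writer = new BatchWriter(sharedPool)) {
      for (Future<String> f : writer.write(rows)) {
        threads.add(f.get());
      }
    }
    return threads;
  }

  @Override
  public void close() throws InterruptedException {
    sharedPool.shutdown();
    sharedPool.awaitTermination(10, TimeUnit.SECONDS);
  }

  /** Runs several batches through one sink; returns how many distinct threads did the work. */
  static int distinctThreadsAcrossBatches(int threads, int batches) {
    Set<String> all = new HashSet<>();
    try (SharedPoolSinkSketch sink = new SharedPoolSinkSketch(threads)) {
      for (int i = 0; i < batches; i++) {
        all.addAll(sink.applyBatch(Arrays.asList("row-a", "row-b", "row-c")));
      }
    } catch (Exception e) {
      if (e instanceof InterruptedException) {
        Thread.currentThread().interrupt();
      }
      throw new IllegalStateException(e);
    }
    return all.size();
  }

  public static void main(String[] args) {
    // Never more than the pool size, however many batches run.
    System.out.println(distinctThreadsAcrossBatches(2, 5));
  }
}
```

The design choice being illustrated is that the per-batch object is cheap to create and close, while the expensive resource (the thread pool) lives as long as the sink.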