[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
[ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7634: Release Note: This change has an impact on the number of watches set on the ${zookeeper.znode.parent}/rs node in ZK in a replication slave cluster (i.e. a cluster that is being replicated to). Every region server in each master cluster will place a watch on the rs node of each slave node. No additional configuration is necessary for this, but this could potentially have an impact the performance and/or hardware requirements of ZK on very large clusters. Replication handling of changes to peer clusters is inefficient --- Key: HBASE-7634 URL: https://issues.apache.org/jira/browse/HBASE-7634 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.95.2 Reporter: Gabriel Reid Assignee: Gabriel Reid Fix For: 0.98.0, 0.95.2 Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, HBASE-7634.v3.patch, HBASE-7634.v4.patch, HBASE-7634.v5.patch, HBASE-7634.v6.patch The current handling of changes to the region servers in a replication peer cluster is currently quite inefficient. The list of region servers that are being replicated to is only updated if there are a large number of issues encountered while replicating. This can cause it to take quite a while to recognize that a number of the regionserver in a peer cluster are no longer available. A potentially bigger problem is that if a replication peer cluster is started with a small number of regionservers, and then more region servers are added after replication has started, the additional region servers will never be used for replication (unless there are failures on the in-use regionservers). Part of the current issue is that the retry code in ReplicationSource#shipEdits checks a randomly-chosen replication peer regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a replication write has failed on a different randonly-chosen replication peer. If the peer is seen as not down, another randomly-chosen peer is used for writing. A second part of the issue is that changes to the list of region servers in a peer cluster are not detected at all, and are only picked up if a certain number of failures have occurred when trying to ship edits. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
[ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-7634: -- Resolution: Fixed Fix Version/s: 0.95.2 0.98.0 Assignee: Gabriel Reid Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to branch and trunk, thanks for the good work Gabriel. Replication handling of changes to peer clusters is inefficient --- Key: HBASE-7634 URL: https://issues.apache.org/jira/browse/HBASE-7634 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.95.2 Reporter: Gabriel Reid Assignee: Gabriel Reid Fix For: 0.98.0, 0.95.2 Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, HBASE-7634.v3.patch, HBASE-7634.v4.patch, HBASE-7634.v5.patch, HBASE-7634.v6.patch The current handling of changes to the region servers in a replication peer cluster is currently quite inefficient. The list of region servers that are being replicated to is only updated if there are a large number of issues encountered while replicating. This can cause it to take quite a while to recognize that a number of the regionserver in a peer cluster are no longer available. A potentially bigger problem is that if a replication peer cluster is started with a small number of regionservers, and then more region servers are added after replication has started, the additional region servers will never be used for replication (unless there are failures on the in-use regionservers). Part of the current issue is that the retry code in ReplicationSource#shipEdits checks a randomly-chosen replication peer regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a replication write has failed on a different randonly-chosen replication peer. If the peer is seen as not down, another randomly-chosen peer is used for writing. A second part of the issue is that changes to the list of region servers in a peer cluster are not detected at all, and are only picked up if a certain number of failures have occurred when trying to ship edits. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
[ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7634: Attachment: HBASE-7634.v5.patch Updated patch based on reviewboard comments from J-D No real functional changes (other than increased configurability). Most changes are formatting and whitespace. Replication handling of changes to peer clusters is inefficient --- Key: HBASE-7634 URL: https://issues.apache.org/jira/browse/HBASE-7634 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.95.2 Reporter: Gabriel Reid Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, HBASE-7634.v3.patch, HBASE-7634.v4.patch, HBASE-7634.v5.patch The current handling of changes to the region servers in a replication peer cluster is currently quite inefficient. The list of region servers that are being replicated to is only updated if there are a large number of issues encountered while replicating. This can cause it to take quite a while to recognize that a number of the regionserver in a peer cluster are no longer available. A potentially bigger problem is that if a replication peer cluster is started with a small number of regionservers, and then more region servers are added after replication has started, the additional region servers will never be used for replication (unless there are failures on the in-use regionservers). Part of the current issue is that the retry code in ReplicationSource#shipEdits checks a randomly-chosen replication peer regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a replication write has failed on a different randonly-chosen replication peer. If the peer is seen as not down, another randomly-chosen peer is used for writing. A second part of the issue is that changes to the list of region servers in a peer cluster are not detected at all, and are only picked up if a certain number of failures have occurred when trying to ship edits. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
[ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7634: Attachment: HBASE-7634.v6.patch Now without javadoc warnings Replication handling of changes to peer clusters is inefficient --- Key: HBASE-7634 URL: https://issues.apache.org/jira/browse/HBASE-7634 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.95.2 Reporter: Gabriel Reid Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, HBASE-7634.v3.patch, HBASE-7634.v4.patch, HBASE-7634.v5.patch, HBASE-7634.v6.patch The current handling of changes to the region servers in a replication peer cluster is currently quite inefficient. The list of region servers that are being replicated to is only updated if there are a large number of issues encountered while replicating. This can cause it to take quite a while to recognize that a number of the regionserver in a peer cluster are no longer available. A potentially bigger problem is that if a replication peer cluster is started with a small number of regionservers, and then more region servers are added after replication has started, the additional region servers will never be used for replication (unless there are failures on the in-use regionservers). Part of the current issue is that the retry code in ReplicationSource#shipEdits checks a randomly-chosen replication peer regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a replication write has failed on a different randonly-chosen replication peer. If the peer is seen as not down, another randomly-chosen peer is used for writing. A second part of the issue is that changes to the list of region servers in a peer cluster are not detected at all, and are only picked up if a certain number of failures have occurred when trying to ship edits. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
[ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7634: Attachment: HBASE-7634.v4.patch Rebased patch on trunk, sorry for the delay on this one. Replication handling of changes to peer clusters is inefficient --- Key: HBASE-7634 URL: https://issues.apache.org/jira/browse/HBASE-7634 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.95.2 Reporter: Gabriel Reid Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, HBASE-7634.v3.patch, HBASE-7634.v4.patch The current handling of changes to the region servers in a replication peer cluster is currently quite inefficient. The list of region servers that are being replicated to is only updated if there are a large number of issues encountered while replicating. This can cause it to take quite a while to recognize that a number of the regionserver in a peer cluster are no longer available. A potentially bigger problem is that if a replication peer cluster is started with a small number of regionservers, and then more region servers are added after replication has started, the additional region servers will never be used for replication (unless there are failures on the in-use regionservers). Part of the current issue is that the retry code in ReplicationSource#shipEdits checks a randomly-chosen replication peer regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a replication write has failed on a different randonly-chosen replication peer. If the peer is seen as not down, another randomly-chosen peer is used for writing. A second part of the issue is that changes to the list of region servers in a peer cluster are not detected at all, and are only picked up if a certain number of failures have occurred when trying to ship edits. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
[ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7634: Attachment: HBASE-7634.v3.patch Updated patch with formatting issues fix, and made TestReplicationSinkManager#testReportBadSink_DownToZeroSinks deterministic Replication handling of changes to peer clusters is inefficient --- Key: HBASE-7634 URL: https://issues.apache.org/jira/browse/HBASE-7634 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.96.0 Reporter: Gabriel Reid Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, HBASE-7634.v3.patch The current handling of changes to the region servers in a replication peer cluster is currently quite inefficient. The list of region servers that are being replicated to is only updated if there are a large number of issues encountered while replicating. This can cause it to take quite a while to recognize that a number of the regionserver in a peer cluster are no longer available. A potentially bigger problem is that if a replication peer cluster is started with a small number of regionservers, and then more region servers are added after replication has started, the additional region servers will never be used for replication (unless there are failures on the in-use regionservers). Part of the current issue is that the retry code in ReplicationSource#shipEdits checks a randomly-chosen replication peer regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a replication write has failed on a different randonly-chosen replication peer. If the peer is seen as not down, another randomly-chosen peer is used for writing. A second part of the issue is that changes to the list of region servers in a peer cluster are not detected at all, and are only picked up if a certain number of failures have occurred when trying to ship edits. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
[ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7634: Status: Patch Available (was: Open) Replication handling of changes to peer clusters is inefficient --- Key: HBASE-7634 URL: https://issues.apache.org/jira/browse/HBASE-7634 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.96.0 Reporter: Gabriel Reid Attachments: HBASE-7634.patch The current handling of changes to the region servers in a replication peer cluster is currently quite inefficient. The list of region servers that are being replicated to is only updated if there are a large number of issues encountered while replicating. This can cause it to take quite a while to recognize that a number of the regionserver in a peer cluster are no longer available. A potentially bigger problem is that if a replication peer cluster is started with a small number of regionservers, and then more region servers are added after replication has started, the additional region servers will never be used for replication (unless there are failures on the in-use regionservers). Part of the current issue is that the retry code in ReplicationSource#shipEdits checks a randomly-chosen replication peer regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a replication write has failed on a different randonly-chosen replication peer. If the peer is seen as not down, another randomly-chosen peer is used for writing. A second part of the issue is that changes to the list of region servers in a peer cluster are not detected at all, and are only picked up if a certain number of failures have occurred when trying to ship edits. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
[ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7634: Status: Open (was: Patch Available) Current patch cancelled because it doesn't apply cleanly. Replication handling of changes to peer clusters is inefficient --- Key: HBASE-7634 URL: https://issues.apache.org/jira/browse/HBASE-7634 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.96.0 Reporter: Gabriel Reid Attachments: HBASE-7634.patch The current handling of changes to the region servers in a replication peer cluster is currently quite inefficient. The list of region servers that are being replicated to is only updated if there are a large number of issues encountered while replicating. This can cause it to take quite a while to recognize that a number of the regionserver in a peer cluster are no longer available. A potentially bigger problem is that if a replication peer cluster is started with a small number of regionservers, and then more region servers are added after replication has started, the additional region servers will never be used for replication (unless there are failures on the in-use regionservers). Part of the current issue is that the retry code in ReplicationSource#shipEdits checks a randomly-chosen replication peer regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a replication write has failed on a different randonly-chosen replication peer. If the peer is seen as not down, another randomly-chosen peer is used for writing. A second part of the issue is that changes to the list of region servers in a peer cluster are not detected at all, and are only picked up if a certain number of failures have occurred when trying to ship edits. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
[ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7634: Status: Patch Available (was: Open) Replication handling of changes to peer clusters is inefficient --- Key: HBASE-7634 URL: https://issues.apache.org/jira/browse/HBASE-7634 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.96.0 Reporter: Gabriel Reid Attachments: HBASE-7634.patch, HBASE-7634.v2.patch The current handling of changes to the region servers in a replication peer cluster is currently quite inefficient. The list of region servers that are being replicated to is only updated if there are a large number of issues encountered while replicating. This can cause it to take quite a while to recognize that a number of the regionserver in a peer cluster are no longer available. A potentially bigger problem is that if a replication peer cluster is started with a small number of regionservers, and then more region servers are added after replication has started, the additional region servers will never be used for replication (unless there are failures on the in-use regionservers). Part of the current issue is that the retry code in ReplicationSource#shipEdits checks a randomly-chosen replication peer regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a replication write has failed on a different randonly-chosen replication peer. If the peer is seen as not down, another randomly-chosen peer is used for writing. A second part of the issue is that changes to the list of region servers in a peer cluster are not detected at all, and are only picked up if a certain number of failures have occurred when trying to ship edits. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
[ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7634: Attachment: HBASE-7634.v2.patch Updated patch that applies cleanly on current trunk Replication handling of changes to peer clusters is inefficient --- Key: HBASE-7634 URL: https://issues.apache.org/jira/browse/HBASE-7634 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.96.0 Reporter: Gabriel Reid Attachments: HBASE-7634.patch, HBASE-7634.v2.patch The current handling of changes to the region servers in a replication peer cluster is currently quite inefficient. The list of region servers that are being replicated to is only updated if there are a large number of issues encountered while replicating. This can cause it to take quite a while to recognize that a number of the regionserver in a peer cluster are no longer available. A potentially bigger problem is that if a replication peer cluster is started with a small number of regionservers, and then more region servers are added after replication has started, the additional region servers will never be used for replication (unless there are failures on the in-use regionservers). Part of the current issue is that the retry code in ReplicationSource#shipEdits checks a randomly-chosen replication peer regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a replication write has failed on a different randonly-chosen replication peer. If the peer is seen as not down, another randomly-chosen peer is used for writing. A second part of the issue is that changes to the list of region servers in a peer cluster are not detected at all, and are only picked up if a certain number of failures have occurred when trying to ship edits. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
[ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7634: Attachment: HBASE-7634.patch Initial patch to resolve this issue. Adds a watcher to replication peer's list of region servers to respond to changes in the list of region servers. Also changes checking randomly-chosen peer regionservers with the ability to report a bad peer regionserver. When a bad peer regionserver has been reported three times, it is no longer used for replication until the list of replication peer regionservers is refreshed. Replication handling of changes to peer clusters is inefficient --- Key: HBASE-7634 URL: https://issues.apache.org/jira/browse/HBASE-7634 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.96.0 Reporter: Gabriel Reid Attachments: HBASE-7634.patch The current handling of changes to the region servers in a replication peer cluster is currently quite inefficient. The list of region servers that are being replicated to is only updated if there are a large number of issues encountered while replicating. This can cause it to take quite a while to recognize that a number of the regionserver in a peer cluster are no longer available. A potentially bigger problem is that if a replication peer cluster is started with a small number of regionservers, and then more region servers are added after replication has started, the additional region servers will never be used for replication (unless there are failures on the in-use regionservers). Part of the current issue is that the retry code in ReplicationSource#shipEdits checks a randomly-chosen replication peer regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a replication write has failed on a different randonly-chosen replication peer. If the peer is seen as not down, another randomly-chosen peer is used for writing. A second part of the issue is that changes to the list of region servers in a peer cluster are not detected at all, and are only picked up if a certain number of failures have occurred when trying to ship edits. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira