[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

2013-09-20 Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-7634:


Release Note: This change has an impact on the number of watches set on the 
${zookeeper.znode.parent}/rs znode in ZK in a replication slave cluster (i.e. a 
cluster that is being replicated to). Every region server in each master 
cluster will place a watch on the rs znode of each slave cluster. No additional 
configuration is necessary for this, but this could potentially have an impact 
on the performance and/or hardware requirements of ZK on very large clusters.
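As a rough illustration of the scale implied by the release note, the sketch below counts the watches that end up on one slave cluster's rs znode, assuming exactly one watch per region server in each master cluster replicating to it. That assumption is read off the note, not measured, and the class and method names are hypothetical.

```java
import java.util.List;

/**
 * Hypothetical estimate of the watches on a slave cluster's
 * ${zookeeper.znode.parent}/rs znode, assuming every region server in every
 * master cluster replicating to it sets exactly one watch there.
 */
final class RsZnodeWatchEstimate {
    private RsZnodeWatchEstimate() {}

    /** @param regionServersPerMaster region server count of each master cluster */
    static long watchesOnSlaveRsZnode(List<Integer> regionServersPerMaster) {
        long total = 0;
        for (int rsCount : regionServersPerMaster) {
            if (rsCount < 0) {
                throw new IllegalArgumentException("negative region server count");
            }
            total += rsCount; // one watch per master-side region server
        }
        return total;
    }
}
```

For example, two master clusters with 200 and 50 region servers would put about 250 watches on that single znode of the slave cluster's ZK ensemble.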

 Replication handling of changes to peer clusters is inefficient
 ---

 Key: HBASE-7634
 URL: https://issues.apache.org/jira/browse/HBASE-7634
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.95.2
Reporter: Gabriel Reid
Assignee: Gabriel Reid
 Fix For: 0.98.0, 0.95.2

 Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, 
 HBASE-7634.v3.patch, HBASE-7634.v4.patch, HBASE-7634.v5.patch, 
 HBASE-7634.v6.patch


 The current handling of changes to the region servers in a replication peer 
 cluster is quite inefficient. The list of region servers being replicated to 
 is only updated after a large number of errors have been encountered while 
 replicating.

 This can cause it to take quite a while to recognize that some of the 
 regionservers in a peer cluster are no longer available. A potentially bigger 
 problem is that if a replication peer cluster is started with a small number 
 of regionservers and more region servers are added after replication has 
 started, the additional region servers will never be used for replication 
 (unless there are failures on the in-use regionservers).

 Part of the current issue is that the retry code in 
 ReplicationSource#shipEdits checks a randomly-chosen replication peer 
 regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a 
 replication write has failed on a different randomly-chosen replication peer. 
 If the peer is seen as not down, another randomly-chosen peer is used for 
 writing.

 A second part of the issue is that changes to the list of region servers in a 
 peer cluster are not detected at all, and are only picked up after a certain 
 number of failures have occurred when trying to ship edits.
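 To make the problem with added region servers concrete, here is a hypothetical, simplified model of the behaviour described above (not the actual ReplicationSource code): the sink list is a snapshot of the peer's region servers, and it is only re-read after a fixed number of shipping failures, an assumed stand-in for "a certain number of failures". The class name and the refreshAfterFailures parameter are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/**
 * Hypothetical model of the pre-HBASE-7634 behaviour: sinks are a snapshot of
 * the peer's region servers, refreshed only after repeated failures. Not HBase code.
 */
final class StaleSinkListModel {
    private final List<String> peerRegionServers; // "live" view of the peer cluster
    private final int refreshAfterFailures;       // assumed failure threshold
    private final Random random;
    private List<String> sinks;                   // snapshot actually used for shipping
    private int failures;

    StaleSinkListModel(List<String> peerRegionServers, int refreshAfterFailures, Random random) {
        this.peerRegionServers = peerRegionServers;
        this.refreshAfterFailures = refreshAfterFailures;
        this.random = random;
        this.sinks = new ArrayList<>(peerRegionServers);
    }

    /** Picks a random sink from the (possibly stale) snapshot. */
    String chooseSink() {
        return sinks.get(random.nextInt(sinks.size()));
    }

    /** Records a failed ship; this is the only place the snapshot is re-read. */
    void reportFailure() {
        if (++failures >= refreshAfterFailures) {
            sinks = new ArrayList<>(peerRegionServers);
            failures = 0;
        }
    }

    /** Ships the given number of batches without failures; returns the sinks used. */
    Set<String> shipWithoutFailures(int batches) {
        Set<String> used = new HashSet<>();
        for (int i = 0; i < batches; i++) {
            used.add(chooseSink());
        }
        return used;
    }
}
```

 With a peer started as rs1 and rs2 and rs3 added afterwards, any number of successful batches leaves rs3 unused; only after refreshAfterFailures calls to reportFailure() does it become eligible.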

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

2013-08-01 Jean-Daniel Cryans (JIRA)

Jean-Daniel Cryans updated HBASE-7634:
--

   Resolution: Fixed
Fix Version/s: 0.95.2
   0.98.0
 Assignee: Gabriel Reid
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to branch and trunk. Thanks for the good work, Gabriel.

 Replication handling of changes to peer clusters is inefficient
 ---

 Key: HBASE-7634
 URL: https://issues.apache.org/jira/browse/HBASE-7634
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.95.2
Reporter: Gabriel Reid
Assignee: Gabriel Reid
 Fix For: 0.98.0, 0.95.2

 Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, 
 HBASE-7634.v3.patch, HBASE-7634.v4.patch, HBASE-7634.v5.patch, 
 HBASE-7634.v6.patch


[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

2013-07-31 Gabriel Reid (JIRA)

Gabriel Reid updated HBASE-7634:


Attachment: HBASE-7634.v5.patch

Updated patch based on reviewboard comments from J-D

No real functional changes (other than increased configurability). Most changes 
are formatting and whitespace.

 Replication handling of changes to peer clusters is inefficient
 ---

 Key: HBASE-7634
 URL: https://issues.apache.org/jira/browse/HBASE-7634
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.95.2
Reporter: Gabriel Reid
 Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, 
 HBASE-7634.v3.patch, HBASE-7634.v4.patch, HBASE-7634.v5.patch


[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

2013-07-31 Gabriel Reid (JIRA)

Gabriel Reid updated HBASE-7634:


Attachment: HBASE-7634.v6.patch

Now without javadoc warnings

 Replication handling of changes to peer clusters is inefficient
 ---

 Key: HBASE-7634
 URL: https://issues.apache.org/jira/browse/HBASE-7634
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.95.2
Reporter: Gabriel Reid
 Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, 
 HBASE-7634.v3.patch, HBASE-7634.v4.patch, HBASE-7634.v5.patch, 
 HBASE-7634.v6.patch


[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

2013-07-30 Gabriel Reid (JIRA)

Gabriel Reid updated HBASE-7634:


Attachment: HBASE-7634.v4.patch

Rebased patch on trunk; sorry for the delay on this one.

 Replication handling of changes to peer clusters is inefficient
 ---

 Key: HBASE-7634
 URL: https://issues.apache.org/jira/browse/HBASE-7634
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.95.2
Reporter: Gabriel Reid
 Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, 
 HBASE-7634.v3.patch, HBASE-7634.v4.patch


[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

2013-01-25 Gabriel Reid (JIRA)

Gabriel Reid updated HBASE-7634:


Attachment: HBASE-7634.v3.patch

Updated patch with formatting fixes; also made 
TestReplicationSinkManager#testReportBadSink_DownToZeroSinks deterministic.

 Replication handling of changes to peer clusters is inefficient
 ---

 Key: HBASE-7634
 URL: https://issues.apache.org/jira/browse/HBASE-7634
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.96.0
Reporter: Gabriel Reid
 Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, 
 HBASE-7634.v3.patch


[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

2013-01-24 Gabriel Reid (JIRA)

Gabriel Reid updated HBASE-7634:


Status: Patch Available  (was: Open)

 Replication handling of changes to peer clusters is inefficient
 ---

 Key: HBASE-7634
 URL: https://issues.apache.org/jira/browse/HBASE-7634
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.96.0
Reporter: Gabriel Reid
 Attachments: HBASE-7634.patch


[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

2013-01-24 Gabriel Reid (JIRA)

Gabriel Reid updated HBASE-7634:


Status: Open  (was: Patch Available)

Current patch cancelled because it doesn't apply cleanly.

 Replication handling of changes to peer clusters is inefficient
 ---

 Key: HBASE-7634
 URL: https://issues.apache.org/jira/browse/HBASE-7634
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.96.0
Reporter: Gabriel Reid
 Attachments: HBASE-7634.patch


[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

2013-01-24 Gabriel Reid (JIRA)

Gabriel Reid updated HBASE-7634:


Status: Patch Available  (was: Open)

 Replication handling of changes to peer clusters is inefficient
 ---

 Key: HBASE-7634
 URL: https://issues.apache.org/jira/browse/HBASE-7634
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.96.0
Reporter: Gabriel Reid
 Attachments: HBASE-7634.patch, HBASE-7634.v2.patch


[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

2013-01-24 Gabriel Reid (JIRA)

Gabriel Reid updated HBASE-7634:


Attachment: HBASE-7634.v2.patch

Updated patch that applies cleanly on current trunk

 Replication handling of changes to peer clusters is inefficient
 ---

 Key: HBASE-7634
 URL: https://issues.apache.org/jira/browse/HBASE-7634
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.96.0
Reporter: Gabriel Reid
 Attachments: HBASE-7634.patch, HBASE-7634.v2.patch


[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

2013-01-21 Gabriel Reid (JIRA)

Gabriel Reid updated HBASE-7634:


Attachment: HBASE-7634.patch

Initial patch to resolve this issue. Adds a watcher on the replication peer's 
list of region servers so that changes to that list are picked up.

Also replaces the checking of randomly-chosen peer regionservers with the 
ability to report a bad peer regionserver. When a bad peer regionserver has 
been reported three times, it is no longer used for replication until the list 
of replication peer regionservers is refreshed.
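A minimal sketch of the two behaviours described above, under stated assumptions: the ZooKeeper watch on the peer's list of region servers is modelled as a plain callback (onPeerRegionServersChanged), and the class and method names are hypothetical, not the API the patch actually adds. The threshold of three reports comes from the description; clearing the report counts on every refresh is a choice made in this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/**
 * Hypothetical sketch: the sink list is replaced whenever the peer's region
 * server list changes (a ZK watch in HBase, a method call here), and a sink
 * reported bad BAD_SINK_THRESHOLD times is skipped until the next refresh.
 */
final class SinkChooserSketch {
    static final int BAD_SINK_THRESHOLD = 3; // "reported three times"

    private final Random random;
    private final Map<String, Integer> badReports = new HashMap<>();
    private List<String> sinks = new ArrayList<>();

    SinkChooserSketch(Random random) {
        this.random = random;
    }

    /** Called when the peer's list of region servers changes; resets bad-sink state. */
    synchronized void onPeerRegionServersChanged(List<String> regionServers) {
        sinks = new ArrayList<>(regionServers);
        badReports.clear();
    }

    /** Returns a random usable sink, or null if none are left. */
    synchronized String chooseSink() {
        if (sinks.isEmpty()) {
            return null;
        }
        return sinks.get(random.nextInt(sinks.size()));
    }

    /** Records a failed ship; the sink is dropped once it reaches the threshold. */
    synchronized void reportBadSink(String sink) {
        int count = badReports.merge(sink, 1, Integer::sum);
        if (count >= BAD_SINK_THRESHOLD) {
            sinks.remove(sink);
        }
    }

    synchronized int usableSinkCount() {
        return sinks.size();
    }
}
```

Because the report counts are cleared on every refresh here, a dropped regionserver becomes eligible again as soon as the peer's list changes. A real ZooKeeper-based implementation would also need to re-register its watch after each notification, since standard ZooKeeper watches are one-time triggers.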


 Replication handling of changes to peer clusters is inefficient
 ---

 Key: HBASE-7634
 URL: https://issues.apache.org/jira/browse/HBASE-7634
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.96.0
Reporter: Gabriel Reid
 Attachments: HBASE-7634.patch

