[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752757#comment-13752757 ]

Jean-Daniel Cryans commented on HBASE-7709:
---

+1

Infinite loop possible in Master/Master replication
---

Key: HBASE-7709
URL: https://issues.apache.org/jira/browse/HBASE-7709
Project: HBase
Issue Type: Bug
Components: Replication
Affects Versions: 0.94.6, 0.95.1
Reporter: Lars Hofhansl
Assignee: Vasu Mariyala
Fix For: 0.98.0, 0.94.12, 0.96.0
Attachments: 095-trunk.patch, 0.95-trunk-rev1.patch, 0.95-trunk-rev2.patch, 0.95-trunk-rev3.patch, 0.95-trunk-rev4.patch, HBASE-7709.patch, HBASE-7709-rev1.patch, HBASE-7709-rev2.patch, HBASE-7709-rev3.patch, HBASE-7709-rev4.patch, HBASE-7709-rev5.patch

We just discovered the following scenario:
# Cluster A and B are set up in master/master replication.
# By accident we had Cluster C replicate to Cluster A.

Now all edits originating from C will be bouncing between A and B. Forever!
The reason is that when the edit comes in from C the cluster ID is already set and won't be reset.

We have a couple of options here:
# Optionally only support master/master (not cycles of more than two clusters). In that case we can always reset the cluster ID in the ReplicationSource. That means that now cycles > 2 will have the data cycle forever. This is the only option that requires no changes in the HLog format.
# Instead of a single cluster ID per edit, maintain an (unordered) set of the cluster IDs that have seen this edit. Then in ReplicationSource we drop any edit that the sink has seen already. This is the cleanest approach, but it might need a lot of data stored per edit if there are many clusters involved.
# Maintain a configurable counter of the maximum cycle size we want to support. Could default to 10 (or maybe even lower). Store a hop-count in the WAL and have the ReplicationSource increase that hop-count on each hop. If we're over the max, just drop the edit.
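To make the second option concrete, here is a minimal sketch of the "set of cluster IDs" idea applied to the A/B/C scenario above. It is an illustration only, not the code from any of the attached patches: `TrackedEdit`, `LoopFilter` and `LoopDemo` are hypothetical names, and the real change would live in the WAL key and ReplicationSource rather than in standalone classes.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical stand-in for a WAL edit; not an HBase class.
final class TrackedEdit {
    // Option 2: every cluster that has already applied this edit.
    private final Set<UUID> seenBy = new HashSet<>();

    TrackedEdit(UUID originCluster) {
        seenBy.add(originCluster);
    }

    // Record that a cluster has applied the edit locally.
    void markSeenBy(UUID clusterId) {
        seenBy.add(clusterId);
    }

    boolean hasBeenSeenBy(UUID clusterId) {
        return seenBy.contains(clusterId);
    }
}

// Hypothetical source-side check: ship an edit only to peers that have not seen it.
final class LoopFilter {
    private LoopFilter() {}

    static boolean shouldShip(TrackedEdit edit, UUID peerClusterId) {
        return !edit.hasBeenSeenBy(peerClusterId);
    }
}

final class LoopDemo {
    public static void main(String[] args) {
        UUID a = new UUID(0L, 1L);
        UUID b = new UUID(0L, 2L);
        UUID c = new UUID(0L, 3L);

        TrackedEdit edit = new TrackedEdit(c);               // written on C
        System.out.println(LoopFilter.shouldShip(edit, a));  // C -> A: true
        edit.markSeenBy(a);
        System.out.println(LoopFilter.shouldShip(edit, b));  // A -> B: true
        edit.markSeenBy(b);
        System.out.println(LoopFilter.shouldShip(edit, a));  // B -> A: false, loop broken
    }
}
```

With a single cluster ID the edit would still carry only C's ID when B considers shipping it back to A, so nothing would stop it. The cost is the one noted in the description: the set grows with the number of clusters an edit passes through.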
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752762#comment-13752762 ]

stack commented on HBASE-7709:
--

Applied to 0.95 and to trunk. Want this in 0.94 [~lhofhansl]?

[~vasu.mariy...@gmail.com] Thanks boss. Any chance of a release note on this issue?
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752818#comment-13752818 ]

Lars Hofhansl commented on HBASE-7709:
--

Yeah, will sync up with Vasu off line and probably commit to 0.94 soon.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752841#comment-13752841 ]

Lars Hofhansl commented on HBASE-7709:
--

Will commit to 0.94 later today if there are no objections.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752858#comment-13752858 ]

Hadoop QA commented on HBASE-7709:
--

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12600457/7709-0.94-rev6.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests.

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6953//console

This message is automatically generated.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752911#comment-13752911 ]

Hudson commented on HBASE-7709:
---

SUCCESS: Integrated in hbase-0.95 #500 (See [https://builds.apache.org/job/hbase-0.95/500/])
HBASE-7709 Infinite loop possible in Master/Master replication (stack: rev 1518334)
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Mutation.java
* /hbase/branches/0.95/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/WALProtos.java
* /hbase/branches/0.95/hbase-protocol/src/main/protobuf/WAL.proto
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/Import.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ReplicationProtbufUtil.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/BaseRowProcessor.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RowProcessor.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotLogSplitter.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/HLogPerformanceEvaluation.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestMasterReplication.java
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13752920#comment-13752920 ]

Hudson commented on HBASE-7709:
---

SUCCESS: Integrated in HBase-TRUNK #4441 (See [https://builds.apache.org/job/HBase-TRUNK/4441/])
HBASE-7709 Infinite loop possible in Master/Master replication (stack: rev 1518335)
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Mutation.java
* /hbase/trunk/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/WALProtos.java
* /hbase/trunk/hbase-protocol/src/main/protobuf/WAL.proto
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/Import.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ReplicationProtbufUtil.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/BaseRowProcessor.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RowProcessor.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotLogSplitter.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/HLogPerformanceEvaluation.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestMasterReplication.java
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753119#comment-13753119 ]

Hudson commented on HBASE-7709:
---

SUCCESS: Integrated in hbase-0.95-on-hadoop2 #276 (See [https://builds.apache.org/job/hbase-0.95-on-hadoop2/276/])
HBASE-7709 Infinite loop possible in Master/Master replication (stack: rev 1518334)
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Mutation.java
* /hbase/branches/0.95/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/WALProtos.java
* /hbase/branches/0.95/hbase-protocol/src/main/protobuf/WAL.proto
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/Import.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ReplicationProtbufUtil.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/BaseRowProcessor.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RowProcessor.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotLogSplitter.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/HLogPerformanceEvaluation.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestMasterReplication.java
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753140#comment-13753140 ]

Hudson commented on HBASE-7709:
---

SUCCESS: Integrated in HBase-0.94-security #274 (See [https://builds.apache.org/job/HBase-0.94-security/274/])
HBASE-7709 Infinite loop possible in Master/Master replication (Vasu Mariyala) (larsh: rev 1518410)
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/Mutation.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestMasterReplication.java
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753164#comment-13753164 ]

Hudson commented on HBASE-7709:
---

SUCCESS: Integrated in HBase-0.94 #1128 (See [https://builds.apache.org/job/HBase-0.94/1128/])
HBASE-7709 Infinite loop possible in Master/Master replication (Vasu Mariyala) (larsh: rev 1518410)
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/Mutation.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestMasterReplication.java
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753204#comment-13753204 ]

Hudson commented on HBASE-7709:
---
FAILURE: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #700 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/700/])
HBASE-7709 Infinite loop possible in Master/Master replication (stack: rev 1518335)
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Mutation.java
* /hbase/trunk/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/WALProtos.java
* /hbase/trunk/hbase-protocol/src/main/protobuf/WAL.proto
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/Import.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ReplicationProtbufUtil.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/BaseRowProcessor.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RowProcessor.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSink.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotLogSplitter.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/HLogPerformanceEvaluation.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestMasterReplication.java

Infinite loop possible in Master/Master replication
---
Key: HBASE-7709
URL: https://issues.apache.org/jira/browse/HBASE-7709
Project: HBase
Issue Type: Bug
Components: Replication
Affects Versions: 0.94.6, 0.95.1
Reporter: Lars Hofhansl
Assignee: Vasu Mariyala
Fix For: 0.98.0, 0.94.12, 0.96.0
Attachments: 095-trunk.patch, 0.95-trunk-rev1.patch, 0.95-trunk-rev2.patch, 0.95-trunk-rev3.patch, 0.95-trunk-rev4.patch, 7709-0.94-rev6.txt, HBASE-7709.patch, HBASE-7709-rev1.patch, HBASE-7709-rev2.patch, HBASE-7709-rev3.patch, HBASE-7709-rev4.patch, HBASE-7709-rev5.patch

We just discovered the following scenario:
# Cluster A and B are set up in master/master replication.
# By accident we had Cluster C replicate to Cluster A.

Now all edits originating from C will bounce between A and B. Forever!
The reason is that when an edit comes in from C, the cluster ID is already set and won't be reset.

We have a couple of options here:
# Optionally only support master/master (not cycles of more than two clusters). In that case we can always reset the cluster ID in the ReplicationSource. That means that cycles > 2 will now have the data cycle forever. This is the only option that requires no changes in the HLog format.
# Instead of a single cluster ID per edit, maintain an (unordered) set of the cluster IDs that have seen this edit. Then in ReplicationSource we drop any edit that the sink has already seen. This is the cleanest approach, but it might need a lot of data stored per edit if there are many clusters involved.
# Maintain a configurable counter of the maximum cycle size we want to support. Could default to 10 (even maybe even just). Store a hop-count in the WAL, and the ReplicationSource increases that hop-count on each hop. If we're over the max, just drop the edit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
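The second option in the issue description (a set of cluster IDs per edit) can be illustrated with a minimal Java sketch. This is not the committed patch: `TrackedEdit`, `SketchReplicationSource`, `markSeenBy`, `seenBy`, and `shouldShip` are hypothetical names invented for illustration, and the real HLogKey and ReplicationSource classes are not modelled here.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/** Hypothetical stand-in for a WAL edit; not the real HLogKey/WALEdit API. */
final class TrackedEdit {
    // Unordered set of the clusters that have already consumed this edit.
    private final Set<UUID> clusterIds = new HashSet<>();

    /** Records that the given cluster has consumed this edit. */
    void markSeenBy(UUID clusterId) {
        clusterIds.add(clusterId);
    }

    /** Returns true if the given cluster has already consumed this edit. */
    boolean seenBy(UUID clusterId) {
        return clusterIds.contains(clusterId);
    }
}

/** Hypothetical replication source that ships edits to a single peer cluster. */
final class SketchReplicationSource {
    private final UUID localClusterId;
    private final UUID peerClusterId;

    SketchReplicationSource(UUID localClusterId, UUID peerClusterId) {
        this.localClusterId = localClusterId;
        this.peerClusterId = peerClusterId;
    }

    /**
     * Adds the local cluster to the edit's set, then reports whether the edit
     * should be shipped: only when the peer (the sink) has not seen it yet.
     */
    boolean shouldShip(TrackedEdit edit) {
        edit.markSeenBy(localClusterId);
        return !edit.seenBy(peerClusterId);
    }
}
```

In the A/B/C scenario from the description, an edit from C is shipped C to A and A to B, but B drops it instead of sending it back to A, because A is already in the edit's set.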
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13751447#comment-13751447 ]

Ted Yu commented on HBASE-7709:
---
+1
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750192#comment-13750192 ]

Hadoop QA commented on HBASE-7709:
---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12599751/HBASE-7709-rev5.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests.
{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6896//console

This message is automatically generated.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750289#comment-13750289 ]

Vasu Mariyala commented on HBASE-7709:
---
The patch HBASE-7709-rev5.patch is based on 0.94, so Hadoop QA will always fail when applying it to trunk. Can anyone please run the Hadoop QA build for 0.95-trunk-rev4.patch (the patch for trunk and 0.95)?
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750297#comment-13750297 ]

Ted Yu commented on HBASE-7709:
---
Please attach 0.95-trunk-rev4.patch one more time - Hadoop QA picks up the latest attachment
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750559#comment-13750559 ]

Hadoop QA commented on HBASE-7709:
---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/1250/0.95-trunk-rev4.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:red}-1 release audit{color}. The applied patch generated 2 release audit warnings (more than the trunk's current 0 warnings).
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6902//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6902//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6902//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6902//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6902//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6902//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6902//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6902//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6902//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6902//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6902//console

This message is automatically generated.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750617#comment-13750617 ]

Vasu Mariyala commented on HBASE-7709:
---
The release audit warnings are not related to the patch. They are caused by missing licenses in the files below. After correcting the license info in these files, the release audit is successful.
{code}
*** Unapproved licenses:
/home/vmariyala/bigdata-dev/testhbase/hbase-server/src/main/resources/hbase-webapps/static/css/bootstrap-theme.min.css
/home/vmariyala/bigdata-dev/testhbase/hbase-server/src/main/resources/hbase-webapps/static/css/bootstrap-theme.css
***
{code}
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750727#comment-13750727 ]

Jeffrey Zhong commented on HBASE-7709:
---
I reviewed the 0.94 and trunk patches. They both look good to me! +1 from me. Thanks.
In trunk, we currently carry all clusterIds in the replication path; we could optimize this later when there is a need.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749550#comment-13749550 ]

Hadoop QA commented on HBASE-7709:
---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12599751/HBASE-7709-rev5.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests.
{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6866//console

This message is automatically generated.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749520#comment-13749520 ]

Hadoop QA commented on HBASE-7709:
---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12599751/HBASE-7709-rev5.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests.
{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6865//console

This message is automatically generated.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13748685#comment-13748685 ]

Ted Yu commented on HBASE-7709:
---
{code}
+ public void setClusters(Set<UUID> clusterIds) {
{code}
Would setClusterIds() be a better name for the above method?
{code}
+ * @return the set of clusters that have consumed the mutation
{code}
'set of clusters' -> 'set of cluster Ids'
{code}
+ public Set<UUID> getClusters() {
{code}
getClusters -> getClusterIds
{code}
-private UUID clusterId;
+private Set<UUID> clusters;
{code}
clusters -> clusterIds

If you agree with the above comments, please modify names in other places as well.
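For readers following the naming review above, here is a minimal Java sketch of how the accessors might look with the suggested names applied. `ClusterIdHolder` is a hypothetical class used only for illustration; it is not the actual Mutation or HLogKey class from the patch, and the defensive copying shown is an assumption rather than something taken from the patch.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/** Hypothetical holder illustrating the suggested names; not the real HBase class. */
final class ClusterIdHolder {
    // Renamed from "clusters" to "clusterIds", as suggested in the review.
    private Set<UUID> clusterIds = new HashSet<>();

    /**
     * @param clusterIds the set of cluster Ids that have consumed the mutation
     */
    public void setClusterIds(Set<UUID> clusterIds) {
        // Copy defensively so later changes by the caller do not leak in.
        this.clusterIds = new HashSet<>(clusterIds);
    }

    /**
     * @return the set of cluster Ids that have consumed the mutation
     */
    public Set<UUID> getClusterIds() {
        return Collections.unmodifiableSet(clusterIds);
    }
}
```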
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13748982#comment-13748982 ]

Jeffrey Zhong commented on HBASE-7709:
---
I reviewed the trunk patch. One thing I noticed is that the trunk patch deprecates clusterId and related code. I think we should still keep it around. The reason is that one of the semantics of clusterId is the original cluster ID where the changes were generated. This information will be very useful when we build a monitoring dashboard to show how many edits come from each source cluster. Similarly, we could combine the original cluster ID and the write time to determine the replication latency from the source to the current cluster.
The rest looks good. Thanks.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749081#comment-13749081 ]

Vasu Mariyala commented on HBASE-7709:

[~jeffreyz] This cluster information is only stored as part of the HLog and it gets rolled. So do you think it is the place from where we read the information about the originating cluster to build such metrics?
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749166#comment-13749166 ]

Lars Hofhansl commented on HBASE-7709:

This does raise a good point. Maybe we should store the cluster ids in order of traversal. That would later allow us to reconstruct the replication path between clusters and display it in the shell.
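The suggestion to store the cluster ids in order of traversal can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, and nothing here reflects how the attached patches or the HBase shell represent the path. An ordered list still supports the "has this cluster seen the edit" check, and it also allows the route to be printed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;

/** Hypothetical sketch of an ordered replication path; not HBase API. */
final class ReplicationPathSketch {

    /** Returns a new, unmodifiable path with the given cluster appended as the latest hop. */
    static List<UUID> appendHop(List<UUID> path, UUID clusterId) {
        List<UUID> next = new ArrayList<>(path);
        next.add(clusterId);
        return Collections.unmodifiableList(next);
    }

    /** The loop check still works on an ordered list. */
    static boolean alreadyVisited(List<UUID> path, UUID clusterId) {
        return path.contains(clusterId);
    }

    /** Renders the hops in traversal order, separated by " -> ". */
    static String describe(List<UUID> path) {
        StringBuilder sb = new StringBuilder();
        for (UUID id : path) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append(id);
        }
        return sb.toString();
    }
}
```

The trade-off is the one already noted in the issue description: every hop adds another 16-byte UUID to the data stored per edit.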
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13749171#comment-13749171 ]

Ted Yu commented on HBASE-7709:

bq. we should store the cluster ids in order of traversal

+1
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13748263#comment-13748263 ]

Hadoop QA commented on HBASE-7709:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12599256/0.95-trunk-rev2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6850//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6850//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6850//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6850//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6850//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6850//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6850//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6850//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6850//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6850//console

This message is automatically generated.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13746586#comment-13746586 ]

Lars Hofhansl commented on HBASE-7709:

The 0.94 patch looks good. It is a bit large, but then again this is a bad bug to have (when it hits you, you'll have useless load on your cluster forever, throwing your versions off, etc). Nice refactoring of the replication test.

Few nits:
* PREFIX_CLUSTER_KEY in WALEdit could just be '_', right? No need to store that longer prefix everywhere.
* Similarly, maybe make PREFIX_CONSUMED_CLUSTER_IDS in Mutation just _cs.id
* The comment for scopes in WALEdit could be a bit more explicit that we're overloading scopes with the cluster id for backwards compatibility.

+1 otherwise (assuming the full 0.94 test suite passes)

Looking at trunk patch now.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13746611#comment-13746611 ]

Lars Hofhansl commented on HBASE-7709:

In trunk:
* Should repeated UUID clusters = 8 in WAL.proto? Otherwise we can't read old log entries. But maybe that's not a problem...?
* In Import:
{code}
+clusters = new HashSet<UUID>();
+clusters.add(ZKClusterId.getUUIDForCluster(zkw));
{code}
Can be written as {{clusters = Collections.singleton(ZKClusterId.getUUIDForCluster(zkw))}}
* Is this right?
{code}
+ for(UUID clusterId : key.getClusters()) {
uuidBuilder.setLeastSigBits(clusterId.getLeastSignificantBits());
uuidBuilder.setMostSigBits(clusterId.getMostSignificantBits());
+keyBuilder.addClusters(uuidBuilder.build());
{code}
addClusters expects a Set.
* Where is HLogKey.PREFIX_CLUSTER_KEY used? Just to read old versions of WALEdits? Need to discuss if that is necessary. [~stack]? This has to do with upgrading WALEdits from pre 0.95.

Otherwise looks great.
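The two review points about Import and the UUID builder loop can be illustrated without any HBase or protobuf classes. The sketch below uses only java.util; the class and method names are hypothetical. It shows that Collections.singleton yields an immutable one-element set, and that a UUID survives being split into its most and least significant 64-bit halves, which is the pair of values the quoted loop copies into the protobuf builder.

```java
import java.util.Collections;
import java.util.Set;
import java.util.UUID;

/** Hypothetical, JDK-only sketch; ZKClusterId and the protobuf builders are deliberately left out. */
final class SingletonClusterSetSketch {

    /** One-element immutable set, replacing "new HashSet" followed by a single add. */
    static Set<UUID> singleCluster(UUID localClusterId) {
        return Collections.singleton(localClusterId);
    }

    /** Splits a UUID into {mostSigBits, leastSigBits}, the two longs a wire format would carry. */
    static long[] toBits(UUID id) {
        return new long[] { id.getMostSignificantBits(), id.getLeastSignificantBits() };
    }

    /** Rebuilds the UUID from the two longs produced by toBits. */
    static UUID fromBits(long[] bits) {
        return new UUID(bits[0], bits[1]);
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        Set<UUID> clusters = singleCluster(id);
        System.out.println("round trip ok: " + fromBits(toBits(id)).equals(id));
        System.out.println("set size: " + clusters.size());
    }
}
```

One caveat with the singleton form: the returned set rejects add(), so it only fits call sites that never extend the set afterwards.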
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13746781#comment-13746781 ]

Vasu Mariyala commented on HBASE-7709:

Attached the patches for 0.94 (HBASE-7709-rev3.patch) and 0.95, trunk (0.95-trunk-rev2.patch), which address the nits mentioned by Lars.

0.94
a) Changed PREFIX_CLUSTER_KEY to '.' (a period, as column family names can't start with it).
b) PREFIX_CONSUMED_CLUSTER_IDS changed to _cs.id
c) A comment has been added in WALEdit mentioning that it is done for backwards compatibility and has been removed in 0.95.2+ releases.

trunk/0.95
a) From the protobuf documentation:
repeated: this field can be repeated any number of times (including zero) in a well-formed message. The order of the repeated values will be preserved.
optional: a well-formed message can have zero or one of this field (but not more than one).
So does repeated imply it is optional? Also, in WALProtos.java the clusters list is initialized to an empty list in the initFields() method, so we would not get any NullPointerException. Maybe I should do more research on this.
b) clusters in Import has been changed to use singleton.
c) The builder has a method public Builder addClusters(org.apache.hadoop.hbase.protobuf.generated.HBaseProtos.UUID value), which takes a single UUID as the parameter.
d) Yes, this is used only to read the older log entries when migrating from 0.94 to 0.95.2.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13746818#comment-13746818 ]

Himanshu Vashishtha commented on HBASE-7709:

bq. + repeated UUID clusters = 8;
bq. /*
bq. - optional CustomEntryType custom_entry_type = 8;
bq. -
bq. + optional CustomEntryType custom_entry_type = 9;

Is this re-ordering OK because 0.96.0 is not released yet? I think we should have the flexibility to read older edits, as a clean shutdown is a stringent requirement (especially for larger clusters). Also, when replication is enabled, there may be some old logs left to replicate.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13746838#comment-13746838 ]

Vasu Mariyala commented on HBASE-7709:

[~v.himanshu] There is no re-ordering done with the patch. The entry custom_entry_type is and was commented out. I changed the number to 9 just in case someone uncomments it in the future. Please let me know if I missed anything.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13745573#comment-13745573 ]

Vasu Mariyala commented on HBASE-7709:

0.95-trunk-rev1.patch contains the javadoc fix and is the latest patch for the trunk and 0.95 branches. HBASE-7709-rev2.patch is the updated patch on top of 0.94, which addresses the comments made by [~lhofhansl], [~jdcryans] and [~jeffreyz].
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742367#comment-13742367 ]

Vasu Mariyala commented on HBASE-7709:

I ran all the test cases on my local machine with the trunk patch and they are successful. But every time it is run on Jenkins, it throws:

FATAL: Unable to delete script file /tmp/hudson5964600500647866956.sh
hudson.util.IOException2: remote file operation failed: /tmp/hudson5964600500647866956.sh at hudson.remoting.Channel@5ce45886:hadoop1
	at hudson.FilePath.act(FilePath.java:902)
	at hudson.FilePath.act(FilePath.java:879)
	at hudson.FilePath.delete(FilePath.java:1288)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:101)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804)
	at hudson.model.Build$BuildExecution.build(Build.java:199)
	at hudson.model.Build$BuildExecution.doRun(Build.java:160)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586)
	at hudson.model.Run.execute(Run.java:1597)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
	at hudson.model.ResourceController.execute(ResourceController.java:88)
	at hudson.model.Executor.run(Executor.java:247)
Caused by: hudson.remoting.ChannelClosedException: channel is already closed
	at hudson.remoting.Channel.send(Channel.java:516)
	at hudson.remoting.Request.call(Request.java:129)
	at hudson.remoting.Channel.call(Channel.java:714)
	at hudson.FilePath.act(FilePath.java:895)
	... 13 more
Caused by: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2596)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1316)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
	at hudson.remoting.Command.readFrom(Command.java:92)
	at hudson.remoting.ClassicCommandTransport.read(ClassicCommandTransport.java:72)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
FATAL: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
	at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
	at hudson.remoting.Request.call(Request.java:174)
	at hudson.remoting.Channel.call(Channel.java:714)
	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:167)
	at com.sun.proxy.$Proxy40.join(Unknown Source)
	at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:925)
	at hudson.Launcher$ProcStarter.join(Launcher.java:360)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:91)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:60)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804)
	at hudson.model.Build$BuildExecution.build(Build.java:199)
	at hudson.model.Build$BuildExecution.doRun(Build.java:160)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:586)
	at hudson.model.Run.execute(Run.java:1597)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
	at hudson.model.ResourceController.execute(ResourceController.java:88)
	at hudson.model.Executor.run(Executor.java:247)
Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.Request.abort(Request.java:299)
	at hudson.remoting.Channel.terminate(Channel.java:774)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:69)
Caused by: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
	at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2596)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1316)
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742560#comment-13742560 ]

Hadoop QA commented on HBASE-7709:
----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12598500/095-trunk.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
    org.apache.hadoop.hbase.client.TestAdmin

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6788//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6788//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6788//console

This message is automatically generated.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742640#comment-13742640 ]

Hadoop QA commented on HBASE-7709:
----------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12598525/0.95-trunk-rev1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6792//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6792//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6792//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6792//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6792//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6792//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6792//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6792//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6792//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6792//console

This message is automatically generated.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741880#comment-13741880 ]

Hadoop QA commented on HBASE-7709:
----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12598350/HBASE-7709-rev2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests.
{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6775//console

This message is automatically generated.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741909#comment-13741909 ]

Vasu Mariyala commented on HBASE-7709:
--------------------------------------

The 0.95 patch works for trunk as well.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741951#comment-13741951 ]

Hadoop QA commented on HBASE-7709:
----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12598360/HBASE-7709-095.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
    org.apache.hadoop.hbase.regionserver.wal.TestHLog
    org.apache.hadoop.hbase.migration.TestNamespaceUpgrade

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6777//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6777//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6777//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6777//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6777//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6777//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6777//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6777//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6777//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6777//console

This message is automatically generated.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738867#comment-13738867 ]

Jean-Daniel Cryans commented on HBASE-7709:
-------------------------------------------

The logic here is getting big enough that it should be encapsulated, but I can't think right now of a nice way to do it.

WALEdit will need more javadoc.

The reason WALEdit.scopes wasn't instantiated is that you wouldn't need to keep an extra TreeMap around if replication wasn't enabled. It seems this patch changes that assumption. If it makes more sense to always instantiate it, then it should be final; there are a bunch of "if (scopes == null)" checks that aren't needed anymore, and "if (scopes != null)" would always be true.

bq. A -> B -> C -> A replication

I think you meant the last cluster to be B?

It seems we should refactor TestMasterReplication a bit, because with this patch it would just look like the same code is running 3 times (to the untrained eye).
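The trade-off around WALEdit.scopes can be sketched in isolation. The classes below are hypothetical stand-ins (LazyScopes, EagerScopes), not the real WALEdit, and they use String keys instead of byte[] column families to stay self-contained. They only illustrate the point of the review comment: a lazily created map needs null checks, while an always-instantiated final map does not, at the cost of one extra TreeMap per edit.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical sketch of the two field styles discussed in the review; not HBase code. */
class ScopesFieldSketch {

    /** Lazy style: no TreeMap is allocated unless a replication scope is recorded. */
    static final class LazyScopes {
        private NavigableMap<String, Integer> scopes; // stays null until first use

        void setScope(String family, int scope) {
            if (scopes == null) {
                scopes = new TreeMap<>();
            }
            scopes.put(family, scope);
        }

        boolean hasScopes() {
            return scopes != null && !scopes.isEmpty(); // null check required
        }

        boolean isAllocated() {
            return scopes != null;
        }
    }

    /** Eager style: the map always exists, so the field can be final and null checks vanish. */
    static final class EagerScopes {
        private final NavigableMap<String, Integer> scopes = new TreeMap<>();

        void setScope(String family, int scope) {
            scopes.put(family, scope);
        }

        boolean hasScopes() {
            return !scopes.isEmpty();
        }
    }

    public static void main(String[] args) {
        LazyScopes lazy = new LazyScopes();
        EagerScopes eager = new EagerScopes();
        System.out.println("lazy allocated before use: " + lazy.isAllocated());
        lazy.setScope("cf", 1);
        eager.setScope("cf", 1);
        System.out.println("lazy has scopes: " + lazy.hasScopes());
        System.out.println("eager has scopes: " + eager.hasScopes());
    }
}
```

Which style fits depends on whether edits without replication scopes are common enough for the saved allocation to matter; the sketch does not measure that.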
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738999#comment-13738999 ]

Dave Latham commented on HBASE-7709:
------------------------------------

Thanks all for the great work on this.

We currently have a pair of clusters in two datacenters in a master/master setup and want to migrate one of them to a new datacenter. I'm trying to determine whether this patch will be required for us and would love it if someone would be willing to double-check my thinking.

Currently we have A -> B, B -> A.

1. Set up C, create presplit tables with replication_scope enabled on them.
2. Add peer B -> C (New state A -> B, B -> A, B -> C)
3. Copy each table from B -> C
4. Stop applications in A
5. Wait for queues from A -> B to clear
6. Remove peer A -> B (New state B -> A, B -> C)
7. Remove peer B -> A (New state B -> C)
8. Add peer C -> B (New state B -> C, C -> B)
9. Start applications in C

Given that we can live with applications only running in a single datacenter for a period of time, we never need to have writes from one cluster replicate to a downstream loop. Therefore I don't think this patch is required for this migration. Does that sound correct?

So does the state of (A <-> B) -> C still trigger the problem?
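The migration states above can be checked mechanically. The following is a minimal sketch under a simplifying assumption drawn from this thread: without the patch, the risky shape is a master/master pair that also receives edits from a cluster outside the pair. The class name, the "X>Y" link notation, and the helper methods are all made up for this illustration and are not part of HBase.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical checker for replication topologies; not part of HBase. */
class MigrationStateCheck {

    /** Parses links written as "source>sink", for example "A>B". */
    static List<String[]> links(String... pairs) {
        List<String[]> out = new ArrayList<>();
        for (String pair : pairs) {
            out.add(pair.split(">"));
        }
        return out;
    }

    /** True if the directed link from -> to is present. */
    static boolean has(List<String[]> links, String from, String to) {
        for (String[] link : links) {
            if (link[0].equals(from) && link[1].equals(to)) {
                return true;
            }
        }
        return false;
    }

    /** True if some master/master pair receives edits from a cluster outside that pair. */
    static boolean outsiderFeedsPair(List<String[]> links) {
        for (String[] candidate : links) {
            String x = candidate[0];
            String y = candidate[1];
            if (!has(links, y, x)) {
                continue; // x -> y is not part of a two-way pair
            }
            for (String[] link : links) {
                boolean intoPair = link[1].equals(x) || link[1].equals(y);
                boolean fromOutside = !link[0].equals(x) && !link[0].equals(y);
                if (intoPair && fromOutside) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // States taken from the numbered migration steps.
        System.out.println("after step 2: " + outsiderFeedsPair(links("A>B", "B>A", "B>C")));
        System.out.println("after step 6: " + outsiderFeedsPair(links("B>A", "B>C")));
        System.out.println("after step 7: " + outsiderFeedsPair(links("B>C")));
        System.out.println("after step 8: " + outsiderFeedsPair(links("B>C", "C>B")));
        // Step 8 performed before step 6 would leave C feeding the A/B pair.
        System.out.println("step 8 before 6: "
            + outsiderFeedsPair(links("A>B", "B>A", "B>C", "C>B")));
    }
}
```

The check covers only two-cluster cycles, which is the case discussed in this thread; it says nothing about longer cycles.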
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739015#comment-13739015 ]

Lars Hofhansl commented on HBASE-7709:
--------------------------------------

You are correct, you do not need this patch as long as you do step #6 before step #8.

(A <-> B) -> C is fine. C -> (A <-> B) is not.
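Why the direction matters can be seen in a toy model of the scheme described in this issue, where an edit carries only the ID of the cluster it originated from and a source skips a peer only when that peer is the originator. Everything below (class name, "X>Y" link notation, the shipment cap) is an assumption made for illustration; it is not the ReplicationSource code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical, not HBase code): each edit carries only its originating cluster ID. */
class SingleClusterIdModel {

    /** Builds a peer map from links written as "source>sink", for example "A>B". */
    static Map<String, List<String>> topology(String... links) {
        Map<String, List<String>> peers = new HashMap<>();
        for (String link : links) {
            String[] ends = link.split(">");
            peers.computeIfAbsent(ends[0], k -> new ArrayList<>()).add(ends[1]);
        }
        return peers;
    }

    /**
     * Follows one edit written at origin. Returns the number of shipments made before the
     * edit stops travelling, or -1 if it is still travelling after maxShipments.
     */
    static int shipmentsUntilQuiet(Map<String, List<String>> peers, String origin,
                                   int maxShipments) {
        Deque<String> holders = new ArrayDeque<>(); // clusters that still have to forward the edit
        holders.add(origin);
        int shipments = 0;
        while (!holders.isEmpty()) {
            String holder = holders.poll();
            for (String peer : peers.getOrDefault(holder, Collections.emptyList())) {
                if (peer.equals(origin)) {
                    continue; // the only check available: do not send back to the originator
                }
                if (++shipments > maxShipments) {
                    return -1; // treat as an endless loop
                }
                holders.add(peer);
            }
        }
        return shipments;
    }

    public static void main(String[] args) {
        Map<String, List<String>> pairFeedsC = topology("A>B", "B>A", "B>C");
        Map<String, List<String>> cFeedsPair = topology("A>B", "B>A", "C>A");
        System.out.println(shipmentsUntilQuiet(pairFeedsC, "A", 1000)); // prints 2
        System.out.println(shipmentsUntilQuiet(cFeedsPair, "C", 1000)); // prints -1
    }
}
```

In the second topology the edit's ID is C, which matches neither A nor B, so the two masters keep handing it to each other.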
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737365#comment-13737365 ]

Vasu Mariyala commented on HBASE-7709:
--------------------------------------

Thanks [~lhofhansl] and [~jeffreyz] for the review comments. I have attached the patch HBASE-7709-rev1.patch with the fixes based on your comments. I also added a test case for A -> B -> C -> A replication. Please review this patch.

For the 0.95+ and trunk fixes, I would like to either remove or deprecate the setClusterId and getClusterId methods of Mutation, and also of HLogKey, as these are primarily maintained to avoid cyclic replication. Please let me know your opinion on this so that I can work on providing the patches for the same.
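For comparison, here is a toy model of the second option from the issue description, in which each edit carries the set of cluster IDs that have already seen it and a source drops the edit for any peer in that set. This is a sketch of the idea only, with made-up names and "X>Y" link notation; it does not reproduce the attached patches.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model (hypothetical, not HBase code): each edit carries the set of clusters that saw it. */
class ClusterIdSetModel {

    /** One copy of the edit: where it currently sits and which clusters have seen it. */
    static final class Edit {
        final String holder;
        final Set<String> seen;

        Edit(String holder, Set<String> seen) {
            this.holder = holder;
            this.seen = seen;
        }
    }

    /** Builds a peer map from links written as "source>sink", for example "A>B". */
    static Map<String, List<String>> topology(String... links) {
        Map<String, List<String>> peers = new HashMap<>();
        for (String link : links) {
            String[] ends = link.split(">");
            peers.computeIfAbsent(ends[0], k -> new ArrayList<>()).add(ends[1]);
        }
        return peers;
    }

    /** Returns the number of shipments until the edit stops, or -1 if maxShipments is exceeded. */
    static int shipmentsUntilQuiet(Map<String, List<String>> peers, String origin,
                                   int maxShipments) {
        Deque<Edit> pending = new ArrayDeque<>();
        pending.add(new Edit(origin, Collections.singleton(origin)));
        int shipments = 0;
        while (!pending.isEmpty()) {
            Edit edit = pending.poll();
            for (String peer : peers.getOrDefault(edit.holder, Collections.emptyList())) {
                if (edit.seen.contains(peer)) {
                    continue; // the sink has already seen this edit, so drop it
                }
                if (++shipments > maxShipments) {
                    return -1;
                }
                Set<String> seen = new HashSet<>(edit.seen);
                seen.add(peer); // the receiving cluster joins the set
                pending.add(new Edit(peer, seen));
            }
        }
        return shipments;
    }

    public static void main(String[] args) {
        // Three-cluster cycle, as in the added test case.
        System.out.println(shipmentsUntilQuiet(topology("A>B", "B>C", "C>A"), "A", 1000)); // prints 2
        // The accidental setup from the issue description.
        System.out.println(shipmentsUntilQuiet(topology("A>B", "B>A", "C>A"), "C", 1000)); // prints 2
    }
}
```

Because the set only grows and is bounded by the number of clusters, every copy of the edit eventually runs out of peers that have not seen it. The cost, as the issue description notes, is the extra data stored per edit.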
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737363#comment-13737363 ]

Hadoop QA commented on HBASE-7709:
----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12597557/HBASE-7709-rev1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests.
{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6706//console

This message is automatically generated.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737390#comment-13737390 ]

Hadoop QA commented on HBASE-7709:
----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12597567/HBASE-7709-rev1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests.

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6707//console

This message is automatically generated.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736398#comment-13736398 ]

Jeffrey Zhong commented on HBASE-7709:
--------------------------------------

[~vasu.mariy...@gmail.com] I reviewed your patch. Regarding the following lines:

{code}
-      if (!logKey.getClusterId().equals(peerClusterId)) {
+      // don't replicate if the log entries has already been consumed by the peer cluster
+      if (!edit.hasClusterConsumed(peerClusterId)) {
{code}

I think we still need to keep the old check around. The above change may cause issues during an upgrade. For example, say we have an (A -> B, B -> A) replication setup. If we upgrade only cluster A, the change may cause an infinite loop until cluster B is upgraded as well.

The rest of the code looks good to me.
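The upgrade concern raised in this review can be sketched as a check that keeps the old origin comparison next to the new consumed-set lookup. This is a minimal illustration only, not the code from the attached patches: the class name, the method name shouldReplicate and the parameter shapes are hypothetical, and it assumes the sink on the upgraded cluster still records the origin cluster id in the log key as the old format does.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/** Hypothetical sketch; names and signatures are assumptions, not the HBASE-7709 patch. */
final class UpgradeSafeReplicationCheck {

  /**
   * @param originClusterId    cluster id stored in the log key (old single-id format)
   * @param consumedClusterIds cluster ids recorded on the edit (new format); empty when
   *                           the edit arrived from a peer that is not upgraded yet
   * @param peerClusterId      the peer this source is about to ship to
   */
  static boolean shouldReplicate(UUID originClusterId,
                                 Set<UUID> consumedClusterIds,
                                 UUID peerClusterId) {
    // Old check: never send an edit back to the cluster recorded as its origin.
    if (peerClusterId.equals(originClusterId)) {
      return false;
    }
    // New check: skip any peer that is already recorded as having consumed the edit.
    return !consumedClusterIds.contains(peerClusterId);
  }

  public static void main(String[] args) {
    UUID b = UUID.randomUUID();
    // An edit from un-upgraded cluster B carries no consumed-cluster ids.
    Set<UUID> noIds = new HashSet<>();
    System.out.println("ship back to B? " + shouldReplicate(b, noIds, b));
  }
}
```

With only the new check, the edit from the un-upgraded cluster B would be shipped back to B because its consumed set is empty; the retained origin comparison is what blocks it in this sketch.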
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736423#comment-13736423 ]

Vasu Mariyala commented on HBASE-7709:
--------------------------------------

[~jeffreyz] The cluster ids in the WALEdit include every cluster that has consumed the entry, including the cluster on which the entry was first created. When ReplicationSource runs on A, it marks cluster A itself as consumed in the WALEdit. The entry is then sent on to B. ReplicationSource running on B sees that cluster A has already consumed the entry and therefore does not send it back to A.

If there are any scenarios where this would not work, please do let me know.
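The behaviour described in this comment can be checked with a small simulation of the topology from the issue description (A and B in master/master replication, C replicating to A). The sketch below is an assumption-laden model, not HBase code: clusters are plain strings, and countShipments, its parameters and the shipment cap are all hypothetical names introduced for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the two loop checks; not the actual ReplicationSource logic. */
final class ConsumedClusterSetSimulation {

  /** An edit sitting on a cluster, together with the cluster ids recorded on it. */
  private static final class InFlight {
    final String cluster;
    final Set<String> seen;

    InFlight(String cluster, Set<String> seen) {
      this.cluster = cluster;
      this.seen = seen;
    }
  }

  /**
   * Counts how often one edit is shipped before replication stops.
   * With trackAllClusters == false only the origin id is recorded (old behaviour);
   * with true every cluster adds itself before shipping (proposed behaviour).
   * maxShipments is a safety cap so that an endless loop still terminates here.
   */
  static int countShipments(Map<String, List<String>> peers, String origin,
                            boolean trackAllClusters, int maxShipments) {
    Deque<InFlight> queue = new ArrayDeque<>();
    Set<String> initial = new HashSet<>();
    initial.add(origin);
    queue.add(new InFlight(origin, initial));
    int shipments = 0;
    while (!queue.isEmpty() && shipments < maxShipments) {
      InFlight edit = queue.poll();
      Set<String> seen = new HashSet<>(edit.seen);
      if (trackAllClusters) {
        seen.add(edit.cluster); // the source marks its own cluster as consumed
      }
      for (String peer : peers.getOrDefault(edit.cluster, Collections.<String>emptyList())) {
        if (seen.contains(peer) || shipments == maxShipments) {
          continue; // peer already consumed the edit, or the cap was reached
        }
        shipments++;
        queue.add(new InFlight(peer, seen));
      }
    }
    return shipments;
  }

  public static void main(String[] args) {
    Map<String, List<String>> peers = new HashMap<>();
    peers.put("A", Arrays.asList("B"));
    peers.put("B", Arrays.asList("A"));
    peers.put("C", Arrays.asList("A"));
    System.out.println("origin id only:     " + countShipments(peers, "C", false, 100));
    System.out.println("set of cluster ids: " + countShipments(peers, "C", true, 100));
  }
}
```

In this model an edit originating on C keeps bouncing between A and B until the cap is hit when only the origin id is recorded, while the set-based variant stops after the edit has reached A and B once.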
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736472#comment-13736472 ]

Jeffrey Zhong commented on HBASE-7709:
--------------------------------------

[~vasu.mariy...@gmail.com] What you describe works if both clusters A and B are upgraded to the latest bits. Let's say cluster B is NOT upgraded yet while A is. When B sends edits to A, the scope of a WALEdit won't contain cluster ids, so A's WALEdits won't have cluster id B in their scope. Later, the check edit.hasClusterConsumed(peerClusterId) on cluster A will evaluate to false, and B's edits will be sent back to cluster B.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736200#comment-13736200 ]

Lars Hofhansl commented on HBASE-7709:
--------------------------------------

In any event, we should probably have separate issues for the real 0.95+ fix and the backwards compatible 0.94 patch. Maybe we can try to keep the API the same, or at least similar.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735635#comment-13735635 ]

Hadoop QA commented on HBASE-7709:
----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12597230/HBASE-7709.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6689//console

This message is automatically generated.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735742#comment-13735742 ]

Lars Hofhansl commented on HBASE-7709:
--------------------------------------

HadoopQA only works against trunk. But since the trunk patch would presumably be quite a bit different, it wouldn't help with 0.94.
Will have a look at the patch over the weekend.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735748#comment-13735748 ]

Lars Hofhansl commented on HBASE-7709:
--------------------------------------

Patch looks good. Smaller than expected :)

Two comments:
# Maybe change the new methods to addClusterId and getAllClusterIds?
# In ReplicationSink we need to group by unique path, I think (just like I do now by ClusterId), so that the path is maintained during intermediary replication.
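The second review comment, grouping incoming edits by the path they have travelled, could look roughly like the following. This is only a sketch under assumptions: the Edit type, its fields and groupByPath are hypothetical stand-ins rather than the ReplicationSink API, and the path is modelled as an ordered list of cluster ids, which the comment itself does not specify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of grouping replicated edits by their cluster path. */
final class GroupEditsByPathSketch {

  /** Minimal stand-in for a replicated edit: a row key plus the clusters it passed through. */
  static final class Edit {
    final String row;
    final List<String> clusterPath;

    Edit(String row, List<String> clusterPath) {
      this.row = row;
      this.clusterPath = Collections.unmodifiableList(new ArrayList<>(clusterPath));
    }
  }

  /**
   * Groups a batch by cluster path. Arrival order is preserved both across groups
   * (LinkedHashMap) and within each group (ArrayList), so each group can be applied
   * as one batch that carries its path forward unchanged.
   */
  static Map<List<String>, List<Edit>> groupByPath(List<Edit> batch) {
    Map<List<String>, List<Edit>> groups = new LinkedHashMap<>();
    for (Edit edit : batch) {
      List<Edit> group = groups.get(edit.clusterPath);
      if (group == null) {
        group = new ArrayList<>();
        groups.put(edit.clusterPath, group);
      }
      group.add(edit);
    }
    return groups;
  }

  public static void main(String[] args) {
    List<Edit> batch = Arrays.asList(
        new Edit("r1", Arrays.asList("C", "A")),
        new Edit("r2", Arrays.asList("A")),
        new Edit("r3", Arrays.asList("C", "A")));
    for (Map.Entry<List<String>, List<Edit>> entry : groupByPath(batch).entrySet()) {
      System.out.println(entry.getKey() + " -> " + entry.getValue().size() + " edit(s)");
    }
  }
}
```

Using the list itself as the map key relies on List equality being element-wise, so two edits share a group only when they passed through the same clusters in the same order.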
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733611#comment-13733611 ]

Lars Hofhansl commented on HBASE-7709:
--------------------------------------

We also need to keep HBASE-9158 (a bug I just discovered) in mind. Here we need to group the edits by path and apply them strictly in these groups.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732124#comment-13732124 ]

Jean-Daniel Cryans commented on HBASE-7709:
-------------------------------------------

I'm +1 on [~lhofhansl]'s proposition, let's do it in a different jira?
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732750#comment-13732750 ]

Jeffrey Zhong commented on HBASE-7709:
--------------------------------------

I think for 0.95 and onwards, we should store the relay cluster ids along the replication path in HLogKey to solve the issue.

Since the list of relay cluster ids is added to each WALEdit, the storage and network traffic overhead isn't trivial when we have a long replication path. We can use an optimization (mentioned above as adaptive #2): we introduce a 4-byte path checksum field into HLogKey, and a cluster only adds its cluster id to the relay cluster id list when it finds that multiple paths exist from a single cluster id. In most cases, such as a simple replication loop or an acyclic replication path, the relay cluster id list is empty and the overhead is just the 4-byte path checksum.

For 0.94, we can either use Lars's approach (the configuration option hbase.enable.cyclic.replication) or introduce a new configuration option hbase.replication.reset.clusterid, whose value is either a cluster id that the user specifies or *. Only the cluster specified there resets the cluster id to itself. When hbase.replication.reset.clusterid=*, it is equivalent to Lars's approach.

In addition, in 0.94 we can leverage the existing field HLogKey.writeTime to detect a loop when a WALEdit has been in replication for too long (for example, a configurable 30 minutes). We can pass the writeTime as an attribute, the same way the cluster id is passed during replication, so that we can check the original writeTime to see whether we have a possible infinite loop.

Thanks.
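The two 0.94 ideas in this comment, the proposed hbase.replication.reset.clusterid setting and the writeTime staleness check, reduce to two small predicates. The sketch below is illustrative only: the class and method names are hypothetical, the configuration value is passed in as a plain string rather than read from an HBase Configuration, and the 30-minute threshold is simply the example figure from the comment.

```java
/** Hypothetical helpers for the 0.94 proposals; not part of any HBase release. */
final class LoopMitigationSketch {

  /**
   * Decides whether this cluster should reset the cluster id on edits it ships.
   * configuredValue models the proposed hbase.replication.reset.clusterid setting:
   * either one cluster id, or "*" meaning every cluster resets the id.
   */
  static boolean resetsClusterId(String configuredValue, String localClusterId) {
    if (configuredValue == null) {
      return false; // option not set: keep the existing behaviour
    }
    String value = configuredValue.trim();
    return value.equals("*") || value.equals(localClusterId);
  }

  /**
   * Flags an edit whose original write time is older than maxAgeMs. This is only a
   * signal for operators: ordinary replication lag can also make an edit look stale.
   */
  static boolean looksStale(long originalWriteTimeMs, long nowMs, long maxAgeMs) {
    return nowMs - originalWriteTimeMs > maxAgeMs;
  }

  public static void main(String[] args) {
    long thirtyMinutesMs = 30L * 60L * 1000L;
    long now = System.currentTimeMillis();
    System.out.println("reset on this cluster? " + resetsClusterId("*", "cluster-a"));
    System.out.println("stale edit? " + looksStale(now - 2 * thirtyMinutesMs, now, thirtyMinutesMs));
  }
}
```

Because a stale write time cannot distinguish a loop from a lagging peer, a check like this fits detection and alerting better than silently dropping edits.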
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732869#comment-13732869 ]

Lars Hofhansl commented on HBASE-7709:
--------------------------------------

Reading back through the comments here: how about we follow Ted's approach? Instead of writing a boolean, we write the number of hops into the HLogKey. 0 will still be interpreted as false in the old code, and anything greater than 0 as true. Thus we can store the number of hops and still not break the existing code (although that code would not be able to stop the bouncing). In our Salesforce scenario we would limit the hop count to 3 and would be able to support our setup that way.

Yet another option is to make this configurable. At this point we're still able to fully bounce our clusters. So we can do the hop count and optionally (per a config option) store the full path. This might even be applicable to trunk, as the user then has the choice between limiting loops to some maximum with little extra storage, or being precise at the expense of more storage.
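The backward-compatibility argument for the hop count can be illustrated with plain java.io streams: a single byte written as a count reads back as false for 0 and true for any non-zero value when an older reader calls readBoolean. The sketch assumes the flag occupies exactly one byte in the serialized key, which this thread does not confirm, and every name in it (HopCountSketch, shouldShip, the limit of 3) is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch of a hop count stored where a boolean flag used to be. */
final class HopCountSketch {

  /** Writes the hop count as a single byte (assumption: the old flag was one byte). */
  static byte[] encodeHopCount(int hops) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeByte(hops);
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** How an old reader would see the byte: readBoolean() is true for any non-zero value. */
  static boolean readAsLegacyBoolean(byte[] encoded) {
    try {
      return new DataInputStream(new ByteArrayInputStream(encoded)).readBoolean();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** How a new reader would recover the count. */
  static int readHopCount(byte[] encoded) {
    try {
      return new DataInputStream(new ByteArrayInputStream(encoded)).readUnsignedByte();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Ships only while the edit has made fewer hops than the configured maximum. */
  static boolean shouldShip(int hopsSoFar, int maxHops) {
    return hopsSoFar < maxHops;
  }

  public static void main(String[] args) {
    int maxHops = 3; // example limit taken from the comment
    for (int hops = 0; hops <= maxHops; hops++) {
      byte[] encoded = encodeHopCount(hops);
      System.out.println("hops=" + readHopCount(encoded)
          + " legacy=" + readAsLegacyBoolean(encoded)
          + " ship=" + shouldShip(hops, maxHops));
    }
  }
}
```

A one-byte field caps the count at 255, which is far above the limits discussed here; whether the boundary should be "fewer than" or "at most" the maximum is a choice this sketch makes arbitrarily.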
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732880#comment-13732880 ]

Jean-Daniel Cryans commented on HBASE-7709:
-------------------------------------------

My impression regarding configuring the path or hop count is that if you start changing the clusters, the upkeep becomes very expensive, and it's not clear what happens while it's being changed (or if a cluster just goes down).

> Infinite loop possible in Master/Master replication
> ---------------------------------------------------
>
>                 Key: HBASE-7709
>                 URL: https://issues.apache.org/jira/browse/HBASE-7709
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 0.94.6, 0.95.1
>            Reporter: Lars Hofhansl
>             Fix For: 0.98.0, 0.95.2, 0.94.12
>
> We just discovered the following scenario:
> # Cluster A and B are set up in master/master replication
> # By accident we had Cluster C replicate to Cluster A.
> Now all edits originating from C will be bouncing between A and B. Forever!
> The reason is that when the edit comes in from C the cluster ID is already set and won't be reset.
> We have a couple of options here:
> # Optionally only support master/master (not cycles of more than two clusters). In that case we can always reset the cluster ID in the ReplicationSource. That means that now cycles > 2 will have the data cycle forever. This is the only option that requires no changes in the HLog format.
> # Instead of a single cluster id per edit maintain a (unordered) set of cluster ids that have seen this edit. Then in ReplicationSource we drop any edit that the sink has seen already. This is the cleanest approach, but it might need a lot of data stored per edit if there are many clusters involved.
> # Maintain a configurable counter of the maximum cycle size we want to support. Could default to 10 (even maybe even just). Store a hop-count in the WAL and the ReplicationSource increases that hop-count on each hop. If we're over the max, just drop the edit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
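The scenario in the issue description, and the second option listed there (a set of cluster ids per edit), can be illustrated with a small simulation. This is only a sketch and not HBase code: the class, the method names, and the `PEERS` map are hypothetical, and the cap `maxHops` exists only so the looping case terminates.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ReplicationLoopSketch {
    // Hypothetical topology from the issue: A <-> B, plus the accidental C -> A.
    static final Map<String, List<String>> PEERS = Map.of(
            "A", List.of("B"),
            "B", List.of("A"),
            "C", List.of("A"));

    /** Single origin id per edit: a source only skips the peer that originated the edit. */
    static int hopsWithSingleId(String origin, int maxHops) {
        int hops = 0;
        Deque<String> holders = new ArrayDeque<>(List.of(origin)); // clusters holding a copy
        while (!holders.isEmpty() && hops < maxHops) {
            String at = holders.poll();
            for (String peer : PEERS.get(at)) {
                if (!peer.equals(origin)) { // only the originator is filtered out
                    hops++;
                    holders.add(peer);
                }
            }
        }
        return hops;
    }

    /** Set of ids per edit: a source drops the edit if the sink has already seen it. */
    static int hopsWithSeenSet(String origin, int maxHops) {
        int hops = 0;
        Deque<Map.Entry<String, Set<String>>> copies = new ArrayDeque<>();
        copies.add(Map.entry(origin, Set.of(origin)));
        while (!copies.isEmpty() && hops < maxHops) {
            Map.Entry<String, Set<String>> copy = copies.poll();
            for (String peer : PEERS.get(copy.getKey())) {
                if (!copy.getValue().contains(peer)) {
                    Set<String> seen = new HashSet<>(copy.getValue());
                    seen.add(peer); // the sink is recorded before the edit moves on
                    hops++;
                    copies.add(Map.entry(peer, seen));
                }
            }
        }
        return hops;
    }

    public static void main(String[] args) {
        System.out.println("single id, edit from C: " + hopsWithSingleId("C", 50) + " hops (capped)");
        System.out.println("seen set,  edit from C: " + hopsWithSeenSet("C", 50) + " hops");
    }
}
```

With the single id, an edit from C keeps moving until the artificial cap is hit; with the seen set it stops once A and B both hold it.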
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732914#comment-13732914 ]

Jeffrey Zhong commented on HBASE-7709:
--------------------------------------

Given the 0.94 upgrade complications, the configuration approach is a practical one. Basically, we shift the duty of breaking an infinite loop to users via the newly introduced configuration. In the meantime, we have to provide a way to detect a possible infinite loop in 0.94 so that a user can act on it. A max hop counter is better used only for detection, not as the way to break an infinite loop, because, as JD pointed out above, it is error-prone when the replication path changes.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732928#comment-13732928 ]

Lars Hofhansl commented on HBASE-7709:
--------------------------------------

Are you referring to switching via config option from storing just the hop count to storing the path? Yep, an admin would need to make the call and bounce the cluster (no rolling restart when that option is enabled the first time). Not ideal.

I'm just looking for ways to avoid local Salesforce-only patches, but maybe backporting a trunk patch that stores the path would not be so bad (it sets a bad precedent here, though :) )

The hop count we can always do safely (I think). In our case we'd enable the path option and bounce the cluster(s).
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732933#comment-13732933 ]

Vasu Mariyala commented on HBASE-7709:
--------------------------------------

Can we add the ids of the clusters that already contain the change to the scopes variable of the WALEdit? Scopes is a navigable map of byte array to integer, so it could map the byte array of each cluster id to 1 (indicating that the cluster has already received the change).
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732934#comment-13732934 ]

Lars Hofhansl commented on HBASE-7709:
--------------------------------------

[~jeffreyz] The hop count is always safe to do, no? We'd default it to a reasonably large value (say 1000). This should be immune to topology changes.

Without it, edits would bounce within the replication ring *forever*, with no way to stop them (other than disabling replication or deleting the WAL files). That is almost worse than downtime.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732947#comment-13732947 ]

Lars Hofhansl commented on HBASE-7709:
--------------------------------------

The scope idea might just work. The scopes are only read and matched against column families, so it should be fine as long as we prefix the cluster id with something that cannot be contained in a column family name.
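The scope idea discussed above could look roughly like the following. This is a sketch under stated assumptions, not the eventual patch: the class and method names are hypothetical, the marker byte `0x00` is only assumed to be illegal in a column family name, and plain `java.util` types stand in for the WALEdit scopes map (with `Arrays.compareUnsigned` standing in for HBase's byte comparator).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.UUID;

class ScopeMarkerSketch {
    // Hypothetical prefix: assumed to be a byte that a column family name can never contain.
    static final byte CLUSTER_PREFIX = 0x00;

    /** Builds the scopes key for a cluster id: prefix byte followed by the id's text form. */
    static byte[] clusterKey(UUID clusterId) {
        byte[] id = clusterId.toString().getBytes(StandardCharsets.UTF_8);
        byte[] key = new byte[id.length + 1];
        key[0] = CLUSTER_PREFIX;
        System.arraycopy(id, 0, key, 1, id.length);
        return key;
    }

    /** byte[] has no natural ordering, so the map needs an explicit comparator. */
    static NavigableMap<byte[], Integer> newScopes() {
        Comparator<byte[]> byContent = Arrays::compareUnsigned;
        return new TreeMap<>(byContent);
    }

    /** Records that the given cluster has already received the edit. */
    static void markSeen(NavigableMap<byte[], Integer> scopes, UUID clusterId) {
        scopes.put(clusterKey(clusterId), 1);
    }

    /** A source would drop the edit when this returns true for the sink's cluster id. */
    static boolean hasSeen(NavigableMap<byte[], Integer> scopes, UUID clusterId) {
        return scopes.containsKey(clusterKey(clusterId));
    }

    public static void main(String[] args) {
        NavigableMap<byte[], Integer> scopes = newScopes();
        scopes.put("cf".getBytes(StandardCharsets.UTF_8), 1); // a regular column family scope
        UUID sink = UUID.randomUUID();
        markSeen(scopes, sink);
        System.out.println("sink already has the edit: " + hasSeen(scopes, sink));
    }
}
```

Because the marker keys share a prefix that no family name can start with, existing code that looks up scopes by family name would not match them.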
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732984#comment-13732984 ]

Jeffrey Zhong commented on HBASE-7709:
--------------------------------------

[~lhofhansl] The hop count is only safe when using a big value. Let's say you use 1000. In a 3-cluster situation, that means the same data will be re-written 300+ times before we stop. This affects each cluster's performance and slows down regular replication as well.

Yeah, I think the scope idea can fly: a column family name only allows printable characters, so it's possible to come up with a special prefix character for storing the cluster id.
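The "300+ times" figure above follows from dividing the hop limit by the ring size. A tiny sketch makes the arithmetic explicit; the class and method are hypothetical, and it assumes a pure ring where the edit is dropped once the hop count reaches the limit.

```java
class HopCountSketch {
    /** Counts how often each cluster in a ring applies the same edit before the cap drops it. */
    static int[] writesPerCluster(int clusters, int maxHops) {
        int[] writes = new int[clusters];
        int at = 0; // index of the cluster currently shipping the edit
        for (int hop = 1; hop <= maxHops; hop++) {
            at = (at + 1) % clusters; // the edit moves to the next cluster in the ring
            writes[at]++;             // and is written there again
        }
        return writes;
    }

    public static void main(String[] args) {
        int[] writes = writesPerCluster(3, 1000);
        for (int i = 0; i < writes.length; i++) {
            System.out.println("cluster " + i + ": " + writes[i] + " writes");
        }
    }
}
```

For 3 clusters and a limit of 1000, every cluster ends up writing the edit roughly 333 times.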
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732996#comment-13732996 ]

Lars Hofhansl commented on HBASE-7709:
--------------------------------------

[~vasu.mariy...@gmail.com], wanna work out a patch? We can work on that together if you like.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13733131#comment-13733131 ]

Vasu Mariyala commented on HBASE-7709:
--------------------------------------

[~lhofhansl], working on it. Sure, will take your help.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729707#comment-13729707 ]

Jean-Daniel Cryans commented on HBASE-7709:
-------------------------------------------

Ah ok that's a fancy setup you got there. Sounds ok to me.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13730116#comment-13730116 ]

stack commented on HBASE-7709:
------------------------------

What is to be done on this for 0.95.2?
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13728275#comment-13728275 ]

Jean-Daniel Cryans commented on HBASE-7709:
-------------------------------------------

{quote}
So I'd like to introduce a config option: hbase.enable.cyclic.replication. The default is true to maintain the current functionality. If set to false we'd reset the cluster id at each source and hence would only support master-master replication (cycles involving more than 2 nodes would lead to infinite loops).
{quote}

This seems like a lose-lose. The current functionality has the problem that 7709 is about, and setting the config to false would just make it worse?

> Fix For: 0.98.0, 0.95.2, 0.94.11
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13728280#comment-13728280 ]

Lars Hofhansl commented on HBASE-7709:
--------------------------------------

It would allow an A - B - C scenario, which is currently not possible. At the same time it would break setups like A - B - C - A.

> Fix For: 0.98.0, 0.95.2, 0.94.11
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13728432#comment-13728432 ]

Lars Hofhansl commented on HBASE-7709:
--------------------------------------

In fact we will have the following setup: A - B, C - D, E - F, ... (where these are all pairs of DR clusters; we keep them both as master so that a failover for other reasons, even just as an exercise, does not need further configuration).

We sometimes migrate an entire cluster, say A. In that case we'd also replicate A - C. Currently we can't do that, because the data from A would bounce between C and D forever.

> Fix For: 0.98.0, 0.95.2, 0.94.11
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13711868#comment-13711868 ]

Lars Hofhansl commented on HBASE-7709:
--------------------------------------

That would be more flexible, but at the same time more tedious to manage.

> Fix For: 0.98.0, 0.95.2, 0.94.10
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13710575#comment-13710575 ]

Lars Hofhansl commented on HBASE-7709:
--------------------------------------

Any opinions?

> Fix For: 0.98.0, 0.95.2, 0.94.10
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13710629#comment-13710629 ] Jeffrey Zhong commented on HBASE-7709: --
For 0.94, I think it would be better to introduce a configuration setting, hbase.replication.reset.clusterid=clusterId, that the user specifies. Only the cluster specified there resets the clusterId to itself, so we can still support master-master replication involving more than 2 nodes without bumping the log key version.
We could possibly bump the HLogKey version with an upgrade configuration setting such as upgrade.logkey plus two rounds of rolling restarts. Initially the setting is false. The first rolling restart upgrades the RS bits (the new RSs still write the HLogKey in the old version). After all RSs are upgraded, we set the configuration to true and do the second rolling restart. This complicates the upgrade scenario a little and requires that all clusters involved in the replication are upgraded.
Infinite loop possible in Master/Master replication --- Key: HBASE-7709 URL: https://issues.apache.org/jira/browse/HBASE-7709 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.94.6, 0.95.1 Reporter: Lars Hofhansl Fix For: 0.98.0, 0.95.2, 0.94.10
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13704388#comment-13704388 ] Lars Hofhansl commented on HBASE-7709: --
It seems that for 0.94 we can either do option #1 or nothing at all. So I'd like to introduce a config option: hbase.enable.cyclic.replication. The default is true, to maintain the current functionality. If set to false, we'd reset the cluster ID at each source and hence would only support master-master replication (cycles involving more than 2 nodes would lead to infinite loops).
Infinite loop possible in Master/Master replication --- Key: HBASE-7709 URL: https://issues.apache.org/jira/browse/HBASE-7709 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.94.6, 0.95.1 Reporter: Lars Hofhansl Fix For: 0.98.0, 0.95.2, 0.94.10
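The trade-off behind such a switch can be seen in a toy model. The sketch below is a hypothetical Java simulation, not HBase code: the class CyclicReplicationSketch, the method countShipments and the cap parameter are made up for illustration, and the only rule modeled is "never ship an edit to the cluster whose ID is stored with it".

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of cluster-ID based loop prevention; not HBase code.
final class CyclicReplicationSketch {
    /**
     * Follows one edit written on `origin` through the peer graph and counts
     * how many times it is shipped. Returns -1 if it is still being shipped
     * after `cap` shipments, which this model treats as an infinite loop.
     *
     * resetAtSource=false models keeping the originating cluster's ID on the
     * edit; resetAtSource=true models option #1, where each source stamps its
     * own ID on every edit it ships.
     */
    static int countShipments(Map<String, List<String>> peers, String origin,
                              boolean resetAtSource, int cap) {
        // Each entry is {cluster holding the edit, cluster ID stored with it}.
        Deque<String[]> pending = new ArrayDeque<>();
        pending.add(new String[] {origin, origin});
        int shipments = 0;
        while (!pending.isEmpty()) {
            String[] entry = pending.poll();
            String holder = entry[0];
            String storedId = entry[1];
            for (String peer : peers.getOrDefault(holder, List.of())) {
                if (peer.equals(storedId)) {
                    continue; // never ship an edit to the cluster named in it
                }
                if (++shipments > cap) {
                    return -1;
                }
                pending.add(new String[] {peer, resetAtSource ? holder : storedId});
            }
        }
        return shipments;
    }
}
```

In this model, with A and B replicating to each other and C replicating to A, an edit from C never stops when the ID is kept (the call returns -1) and stops after 2 shipments when the ID is reset at each source. For a three-cluster ring A to B to C to A the outcome is reversed: keeping the ID stops the edit after 2 shipments, while resetting it loops forever.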
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646413#comment-13646413 ] Jeffrey Zhong commented on HBASE-7709: -- [~lhofhansl] Sure. Is it all right to implement option #6 for 0.94 and adaptive option #2 for trunk? Infinite loop possible in Master/Master replication --- Key: HBASE-7709 URL: https://issues.apache.org/jira/browse/HBASE-7709 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.94.6, 0.95.1 Reporter: Lars Hofhansl Fix For: 0.98.0, 0.94.8, 0.95.1
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646400#comment-13646400 ] Lars Hofhansl commented on HBASE-7709: -- The proposal sounds good. Are you still planning to work on this [~jeffreyz]? Infinite loop possible in Master/Master replication --- Key: HBASE-7709 URL: https://issues.apache.org/jira/browse/HBASE-7709 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.94.6, 0.95.1 Reporter: Lars Hofhansl Fix For: 0.98.0, 0.94.8, 0.95.1
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13623119#comment-13623119 ] Lars Hofhansl commented on HBASE-7709: -- Sorry, I missed this. I need to read through and digest it. :) In any event, moving to 0.94.8. Infinite loop possible in Master/Master replication --- Key: HBASE-7709 URL: https://issues.apache.org/jira/browse/HBASE-7709 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.95.1, 0.94.6 Reporter: Lars Hofhansl Fix For: 0.95.1, 0.94.7
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13598579#comment-13598579 ] Jeffrey Zhong commented on HBASE-7709: --
Continuing with more proposals...
The disadvantages of option #2 are as obvious as its advantages. Even in cases (maybe the majority of replication use cases) where there is no loop at all and just a long replication queue, the downstream RSs still need to replay and store a long list of clusterIds for each WALEdit. Encoding may help compress the clusterId list when sending but not when storing. Let me first try to show whether we can do better than option #2, and then present an alternative that is good in most cases without needing more storage. Both options are good IMHO.
As we know, a loop is caused by a back-edge in the graph. We can roughly identify one by the fact that a region server sees more than one path from the same source. If that's the case, a loop is likely. Only then do we need to append the current cluster ID to the source cluster ID of a WAL edit for later loop detection. Therefore, in most cases we don't need to store a long clusterId list, namely when there is no loop or a simple master-master-master… cycle setup. I call this updated option #2 "adaptive option #2", because it only needs more storage when there is a need. We can implement it as follows:
1) Maintain a hash string PathChecksum (= Hash(receivedPathChecksum + current clusterId)) for a WAL edit.
2) Each receiving (replaying) region server maintains an in-memory ClusterDistanceMap<clusterId, Set<PathChecksum>> of the checksums seen so far.
2.a) Every time it sees a new PathChecksum (one that isn't in the Set<PathChecksum>), it adds the new PathChecksum to the set. It drops a stale one from the set when it has expired, i.e. when, after a configurable time period, the region server hasn't seen any data coming in from that path.
3) When the size of the Set<PathChecksum> is > 1, append the current cluster ID to the WAL edit for later replication loop detection.
We can use the top 8 bytes of the clusterId to store the PathChecksum and the remaining 8 bytes for the hash of the original clusterId value. After this update, we only pay the cost when there is a need.
As you can imagine, a real-life replication setup normally doesn't involve any complicated graph, so option #2 uses extra storage to deal with situations that most likely won't happen. Therefore, in the following, I want to propose a solution that doesn't change the current WAL format and is good for most cases, including the situation that triggered this JIRA. In extreme cases, it reports errors for an infinite loop. The new proposal (option #6) is as follows:
1) Maintain a hash string PathChecksum (= Hash(receivedPathChecksum + current clusterId)) for a WAL edit.
2) Each receiving (replaying) region server maintains an in-memory ClusterDistanceMap<clusterId, Set<PathChecksum>> of the checksums seen so far.
2.a) Every time it sees a new PathChecksum (one that isn't in the Set<PathChecksum>), it adds the new PathChecksum to the set. It drops a stale one from the set when it has expired, i.e. when, after a configurable time period, the region server hasn't seen any data coming in from that path.
3) When the size of the Set<PathChecksum> is > 1, reset the WAL edit's clusterId to the current clusterId and increment a counter (ResetCounter) that records how many times the current WAL edit's clusterId has been reset.
4) When ResetCounter > 64, report an error. (We could drop the WAL edits as well, because ResetCounter > 64 means we have at least 64 back-edges or duplicated sources. I think such cases would be far too complicated to occur in practice.)
The advantage of the above option is that it can possibly use the existing HLog format to prevent loops in real-life cases. To implement it:
1) We can introduce a new version (3) in HLogKey.
2) Use the top 7 bytes of the UUID to store the PathChecksum, the following 1 byte to store RD, and the remaining 8 bytes for a hash of the original 16-byte UUID value. This doesn't compromise uniqueness, because in most cases we have tens of clusters involved in replication and the collision probability is less than 10^(-18).
3) We can introduce a configuration setting that defaults to false (as suggested by Lars). After we roll out the feature we can turn it on, and turn it off in a revert scenario.
Thanks, -Jeffrey
Infinite loop possible in Master/Master replication --- Key: HBASE-7709 URL: https://issues.apache.org/jira/browse/HBASE-7709 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.95.0, 0.94.6 Reporter: Lars Hofhansl Fix For: 0.95.0, 0.94.7
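The path bookkeeping shared by adaptive option #2 and option #6 (steps 1, 2 and 2.a above) might look something like the following. This is a rough Java sketch under stated assumptions, not the proposed patch: the class PathChecksumTracker and its methods are hypothetical, Objects.hash merely stands in for whatever hash would really be chosen, and timestamps are passed in explicitly to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of the path bookkeeping in the proposal; not HBase code.
final class PathChecksumTracker {
    private final long expiryMillis;
    // source cluster ID -> (path checksum -> time the path was last seen)
    private final Map<String, Map<Long, Long>> pathsBySource = new HashMap<>();

    PathChecksumTracker(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    /** Step 1: fold the current cluster ID into the checksum received with the edit. */
    static long extend(long receivedChecksum, String currentClusterId) {
        // Objects.hash is a placeholder for the real hash function.
        return Objects.hash(receivedChecksum, currentClusterId);
    }

    /**
     * Steps 2 and 2.a: expire stale paths, record this one, and report
     * whether more than one live path from the source is now known
     * (the "size > 1" condition used in step 3).
     */
    boolean recordPath(String sourceClusterId, long pathChecksum, long nowMillis) {
        Map<Long, Long> paths =
            pathsBySource.computeIfAbsent(sourceClusterId, id -> new HashMap<>());
        paths.values().removeIf(lastSeen -> nowMillis - lastSeen > expiryMillis);
        paths.put(pathChecksum, nowMillis);
        return paths.size() > 1;
    }
}
```

What happens when recordPath returns true is where the two variants differ: adaptive option #2 would append the current cluster ID to the edit, while option #6 would reset the edit's clusterId and bump its ResetCounter.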
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13597418#comment-13597418 ] Jeffrey Zhong commented on HBASE-7709: --
I have another idea which IMHO is better. The basic idea is as follows:
1) We maintain a counter value called RD (replication distance), which represents how far a WAL edit has travelled from its source cluster to the current cluster, like the hop-counter mentioned in option 3.
2) Each receiving (replaying) region server maintains an in-memory ClusterDistanceMap<clusterId, MIN(RD)>. Every time it sees a WAL edit with an RD smaller than the one it has seen so far, it updates the internal map with the smaller RD value.
3) Drop all WAL edits from a cluster with an RD greater than the one the current region server has in the ClusterDistanceMap.
Initially we could duplicate data for the first several WAL edits, but that will be corrected soon, so we don't need to persist any data for the failover scenario.
The above idea is similar to option 3 but without always double-replicating data on some clusters. Maintaining the max hop-count is also prone to human error: we may forget to bump up the max hop-count value when more clusters join the replication cycle.
Why does it work? Loop detection: the quick walker will catch up with the slow walker but travels further. When we have infinite loop replication as described in the JIRA, the data from a source must reach the destination by multiple routes with different RDs. Because some loops are involved, the RDs won't be the same, otherwise there would be no loop. Since the RDs differ, we just need to keep the data from the source with the minimum distance.
You may ask about the diamond situation, such as a->b->d and a->c->d, where the data from a will be replicated to d twice. That is because we configured d to receive a's data twice. If there is a loop involved, the looped-back data will be dropped in the way described above. This is a general loop detection strategy, so we can implement it in 0.96 or above.
For 0.94:
1) We can introduce a new version (3) in HLogKey.
2) Use the top two bytes of the UUID to store the RD value and the remaining 14 bytes for a hash of the original 16-byte UUID value. This doesn't compromise uniqueness, because in most cases we have tens of clusters involved in replication and the collision probability is less than 10^(-18). Alternatively, use Ted's suggestion to overload the boolean byte.
3) We can introduce a configuration setting that defaults to true. When we want to revert the new behavior, we can turn it off.
Please let me know what you think. Assigning the ticket to me for now, in case we agree to implement it the way I'm proposing.
Thanks, -Jeffrey
Infinite loop possible in Master/Master replication --- Key: HBASE-7709 URL: https://issues.apache.org/jira/browse/HBASE-7709 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.95.0, 0.94.6 Reporter: Lars Hofhansl Fix For: 0.95.0, 0.94.7
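Steps 2 and 3 of the replication-distance idea above can be illustrated with a small in-memory filter. This is a minimal Java sketch under stated assumptions, not HBase code: the class ReplicationDistanceFilter and its accept method are hypothetical names, and cluster IDs are plain strings for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the MIN(RD) bookkeeping; not HBase code.
final class ReplicationDistanceFilter {
    // source cluster ID -> smallest replication distance seen so far
    private final Map<String, Integer> minDistance = new HashMap<>();

    /**
     * Returns true if an edit from sourceClusterId that arrived with the
     * given replication distance should be applied, false if it should be
     * dropped because a shorter path from that source is already known.
     */
    boolean accept(String sourceClusterId, int rd) {
        // merge stores rd if the source is new, otherwise the smaller of the two.
        int best = minDistance.merge(sourceClusterId, rd, Integer::min);
        return rd == best;
    }
}
```

For example, if edits from one source arrive with distances 3, 5, 1 and 3 in that order, the filter accepts the first, drops the second, accepts the third and drops the fourth. The map is never persisted here, matching the remark that a region server can simply relearn the distances after a restart.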
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13597532#comment-13597532 ] Lars Hofhansl commented on HBASE-7709: -- Hey Jeffrey, that is my option #3 in the description, right? :) Infinite loop possible in Master/Master replication --- Key: HBASE-7709 URL: https://issues.apache.org/jira/browse/HBASE-7709 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.95.0, 0.94.6 Reporter: Lars Hofhansl Assignee: Jeffrey Zhong Fix For: 0.95.0, 0.94.7
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13597611#comment-13597611 ] Jeffrey Zhong commented on HBASE-7709: --
[~lhofhansl] My proposal is similar to option 3 in that we both use a hop-counter (the replication distance in my proposal). However, as I mentioned in the proposal:
{quote} The above idea is similar to option 3 but without always double replicating data on some clusters and maintaining the max-hop is human error-prone if we forget to bump up the max hop-count value when more clusters join in replication cycle. {quote}
In the new proposal, region servers dynamically discover and maintain the MIN(RD) from each cluster and drop all edits from the same cluster that have higher RDs.
Infinite loop possible in Master/Master replication --- Key: HBASE-7709 URL: https://issues.apache.org/jira/browse/HBASE-7709 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.95.0, 0.94.6 Reporter: Lars Hofhansl Assignee: Jeffrey Zhong Fix For: 0.95.0, 0.94.7
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13597636#comment-13597636 ] Lars Hofhansl commented on HBASE-7709: --
Ah yes. Cool. I should read the entire text before replying. Yes, that should work. I like it. As you say, the distance data does not have to be persisted; upon restart an RS would just relearn it. Generally, do you like this better than option #2? Would #2 store too much data? As for 0.94, I like the config option, but it needs to be off by default, so that we can do rolling restarts by default.
Infinite loop possible in Master/Master replication --- Key: HBASE-7709 URL: https://issues.apache.org/jira/browse/HBASE-7709 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.95.0, 0.94.6 Reporter: Lars Hofhansl Assignee: Jeffrey Zhong Fix For: 0.95.0, 0.94.7
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13597711#comment-13597711 ] Enis Soztutar commented on HBASE-7709: --
I like option #2 better than this. It is simpler. Jeff's idea is good, but has the problem of dealing with topology changes. If the topology changes in a way that makes the normal route to a cluster longer, then all the updates afterwards will be dropped unless we somehow clear the cached mappings. This brings the operational burden of cleaning the caches of downstream clusters once the admin changes the topology upstream.
{code} A - B - C is changed to A - B - D - C - B {code}
Orthogonal to this, we should also be dropping the edits at the replication source, not the sink. Otherwise we are doubling the network cost in cyclic cases. #2 also helps here, because we can detect the sink cluster's ID and filter the edit out. We can do a similar dynamic dictionary encoding for storing the set of cluster IDs, as a follow-up optimization.
Infinite loop possible in Master/Master replication --- Key: HBASE-7709 URL: https://issues.apache.org/jira/browse/HBASE-7709 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.95.0, 0.94.6 Reporter: Lars Hofhansl Assignee: Jeffrey Zhong Fix For: 0.95.0, 0.94.7
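Option #2 combined with filtering at the source, as argued for in the comment above, could be sketched along these lines. This is only a hypothetical Java illustration, not the committed fix: the class ClusterIdSetSketch and the methods shouldShip and markSeen are made-up names, and how the set would be encoded in the WAL is left out entirely.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch of option #2 with filtering at the source; not HBase code.
final class ClusterIdSetSketch {
    /** Source side: ship only if the sink cluster has not already seen the edit. */
    static boolean shouldShip(Set<UUID> seenBy, UUID sinkClusterId) {
        return !seenBy.contains(sinkClusterId);
    }

    /** Returns the set to store with the edit once the local cluster has it. */
    static Set<UUID> markSeen(Set<UUID> seenBy, UUID localClusterId) {
        Set<UUID> updated = new HashSet<>(seenBy);
        updated.add(localClusterId);
        return updated;
    }
}
```

Because the check runs before the edit is sent, an edit that has already visited the sink never crosses the network again, and nothing has to be cached or reset on the sink when the topology changes. The cost is the one noted in the description: the stored set grows with the number of clusters an edit passes through.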
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13597734#comment-13597734 ]

Jeffrey Zhong commented on HBASE-7709:

I agree #2 is simpler, but it comes at the cost of replaying and storing more data. As Enis mentioned, my proposal needs special handling when a new cluster joins: either dynamically encoding a special token that tells downstream RSs to reset their internal caches, or asking operators to reset replication. That makes my proposal less appealing.

Infinite loop possible in Master/Master replication

Key: HBASE-7709
URL: https://issues.apache.org/jira/browse/HBASE-7709
Project: HBase
Issue Type: Bug
Components: Replication
Affects Versions: 0.95.0, 0.94.6
Reporter: Lars Hofhansl
Assignee: Jeffrey Zhong
Fix For: 0.95.0, 0.94.7

We just discovered the following scenario:
# Clusters A and B are set up in master/master replication.
# By accident we had Cluster C replicate to Cluster A.
Now all edits originating from C will bounce between A and B. Forever!
The reason is that when an edit comes in from C, the cluster ID is already set and won't be reset.

We have a couple of options here:
# Optionally only support master/master (not cycles of more than two clusters). In that case we can always reset the cluster ID in the ReplicationSource. That means that cycles > 2 will now have the data cycle forever. This is the only option that requires no changes in the HLog format.
# Instead of a single cluster ID per edit, maintain an (unordered) set of the cluster IDs that have seen this edit. Then in ReplicationSource we drop any edit that the sink has already seen. This is the cleanest approach, but it might need a lot of data stored per edit if there are many clusters involved.
# Maintain a configurable counter of the maximum cycle size we want to support. Could default to 10 (even maybe even just). Store a hop-count in the WAL, and the ReplicationSource increases that hop-count on each hop. If we're over the max, just drop the edit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
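Option #2 in the description above (a set of cluster IDs per edit) can be illustrated with a small, self-contained sketch. This is not HBase code: `TrackedEdit`, `markSeenBy`, and `shouldShipTo` are hypothetical names, and the sketch assumes cluster IDs are UUIDs and that the source performs the check before shipping.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.UUID;

/** Hypothetical model of an edit that carries every cluster ID it has visited. */
final class TrackedEdit {
    private final Set<UUID> seenClusters = new LinkedHashSet<>();

    TrackedEdit(UUID originCluster) {
        seenClusters.add(originCluster);
    }

    /** Called when a cluster applies the edit locally. */
    void markSeenBy(UUID clusterId) {
        seenClusters.add(clusterId);
    }

    /** Source-side check: ship only if the sink has not seen the edit yet. */
    boolean shouldShipTo(UUID sinkClusterId) {
        return !seenClusters.contains(sinkClusterId);
    }

    Set<UUID> seenClusters() {
        return Collections.unmodifiableSet(seenClusters);
    }
}

final class SeenSetDemo {
    public static void main(String[] args) {
        UUID a = UUID.randomUUID();
        UUID b = UUID.randomUUID();
        UUID c = UUID.randomUUID();

        TrackedEdit edit = new TrackedEdit(c);         // edit originates on C
        System.out.println(edit.shouldShipTo(a));      // C -> A is allowed
        edit.markSeenBy(a);
        System.out.println(edit.shouldShipTo(b));      // A -> B is allowed
        edit.markSeenBy(b);
        System.out.println(edit.shouldShipTo(a));      // B -> A is refused, so the loop ends
    }
}
```

The per-edit cost grows with the number of clusters the edit passes through, which is the storage concern the description raises.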
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13574115#comment-13574115 ]

Ted Yu commented on HBASE-7709:

TestRowProcessorEndpoint shows another potential implementation for option #3:
{code}
// We can also inject some meta data to the walEdit
KeyValue metaKv = new KeyValue(
    row, HLog.METAFAMILY,
    Bytes.toBytes("num of hops"),
    Bytes.toBytes(hops));
walEdit.add(metaKv);
{code}

Infinite loop possible in Master/Master replication

Key: HBASE-7709
URL: https://issues.apache.org/jira/browse/HBASE-7709
Project: HBase
Issue Type: Bug
Components: Replication
Reporter: Lars Hofhansl
Fix For: 0.96.0, 0.94.6
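The hop-count idea from option #3 can be sketched independently of how the count is stored in the WAL. The sketch below is plain Java with hypothetical names (`HopCountedEdit`, `shouldForward`); it assumes the count starts at 0 and that the maximum is the default of 10 floated in the issue description.

```java
/** Hypothetical hop-count bookkeeping; not HBase's actual WAL or replication code. */
final class HopCountedEdit {
    private final int hops;

    HopCountedEdit(int hops) {
        this.hops = hops;
    }

    int hops() {
        return hops;
    }

    /** Returns the edit as it would be forwarded, with one more hop recorded. */
    HopCountedEdit nextHop() {
        return new HopCountedEdit(hops + 1);
    }
}

final class HopLimitDemo {
    /** Assumed default, taken from the "could default to 10" remark in the description. */
    static final int MAX_HOPS = 10;

    /** An edit is forwarded only while it is still under the configured maximum. */
    static boolean shouldForward(HopCountedEdit edit, int maxHops) {
        return edit.hops() < maxHops;
    }

    /** Counts how often an edit is shipped around an endless cycle before it is dropped. */
    static int simulate(int maxHops) {
        HopCountedEdit edit = new HopCountedEdit(0);
        int shipped = 0;
        while (shouldForward(edit, maxHops)) {
            edit = edit.nextHop();
            shipped++;
        }
        return shipped;
    }

    public static void main(String[] args) {
        System.out.println("shipments before drop: " + simulate(MAX_HOPS));
    }
}
```

Unlike option #2, this bounds the damage rather than preventing it: an edit caught in a bad cycle is still replayed up to the maximum number of times.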
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13569611#comment-13569611 ]

Ian Varley commented on HBASE-7709:

At a minimum, this should be called out in the reference guide replication page. Replication is still a pretty advanced feature, and replication for > 2 clusters even more so; if a patch doesn't go into 0.94.5, it's not the end of the world.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13569623#comment-13569623 ]

Ted Yu commented on HBASE-7709:

Looking at HLogKey#readFields():
{code}
if (version.atLeast(Version.INITIAL)) {
  if (in.readBoolean()) {
{code}
From the javadoc of readBoolean():
Reads one input byte and returns true if that byte is nonzero, false if that byte is zero.

I think there is room to implement option #3 in the description. We can introduce a new version (two, considering compression) where we write, instead of true, the number of hops that the HLog.Entry has gone through, starting with 1. A byte should suffice for this purpose.

+1 on documenting this intricacy for 0.94.x in the refguide.
I think we should create several subtasks for this JIRA.
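The compatibility argument in the comment above can be checked with plain `java.io` streams: a byte written as a hop count (starting at 1) still reads back as `true` for a reader that expects a boolean. The class and method names below are hypothetical and the sketch does not touch HLogKey itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class HopByteDemo {
    /** Writes the hop count into the one-byte slot where a boolean used to be written. */
    static byte[] writeHopCount(int hops) {
        if (hops < 1 || hops > Byte.MAX_VALUE) {
            throw new IllegalArgumentException("hops must be between 1 and 127");
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeByte(hops);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** What a reader that still expects a boolean sees: any nonzero byte is true. */
    static boolean readAsBoolean(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readBoolean();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** What a reader that knows about the new meaning sees: the hop count itself. */
    static int readAsHopCount(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readByte();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] encoded = writeHopCount(3);
        System.out.println("old reader: " + readAsBoolean(encoded));
        System.out.println("new reader: " + readAsHopCount(encoded));
    }
}
```

Starting the count at 1 matters here: a hop count of 0 would be read as `false` by the older code path.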
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13569466#comment-13569466 ]

Lars Hofhansl commented on HBASE-7709:

Any good ideas for 0.94? We cannot change the HLog in a non-backwards-compatible way there.
Maybe in 0.94 we can do something simple along Ian's line of thinking. I don't care if it blows up in this case; even the RSs just aborting is better than an infinite back-and-forth of replication data (which will fill up the memstore with useless versions, forever).
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13565699#comment-13565699 ]

Lars Hofhansl commented on HBASE-7709:

Thanks to [~cody.mar...@gmail.com], [~ivarley], and [~jesse_yates] for finding the issue.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13565706#comment-13565706 ]

Ted Yu commented on HBASE-7709:

Option #2 seems the best.
I think it would be rare for the number of clusters in master/master replication to exceed 10.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13565714#comment-13565714 ]

Lars Hofhansl commented on HBASE-7709:

I'd agree. Need to check if this is possible in 0.94 while keeping the HLog backwards compatible. If that is tricky for 0.94 we might need option #1.
Also, I cannot promise that I will get to this any time soon.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13565752#comment-13565752 ]

Ian Varley commented on HBASE-7709:

Would another option be to do some kind of checking at add_peer time, to make sure no pernicious cycles are detected? I.e. when I add a peer, first walk the graph of current master/peer relationships and refuse to add if I detect a cycle I'm not part of? Would require an API to ask that question, but that's probably a good thing anyway.
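The add_peer check proposed above can be sketched as a graph walk. This is only one possible reading of "a cycle I'm not part of": collect everything reachable from the new peer without passing through the adding cluster, then look for a cycle among those clusters. The peer map, cluster names, and `feedsForeignCycle` are hypothetical; a real check would have to obtain the peer lists from the clusters themselves.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class PeerGraphCheck {
    /**
     * Returns true if adding the edge self -> newPeer would let self feed edits
     * into a replication cycle that does not pass through self.
     */
    static boolean feedsForeignCycle(Map<String, Set<String>> peers, String self, String newPeer) {
        // Step 1: every cluster reachable from newPeer without going through self.
        Set<String> downstream = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(newPeer);
        while (!stack.isEmpty()) {
            String node = stack.pop();
            if (node.equals(self) || !downstream.add(node)) {
                continue;
            }
            for (String next : peers.getOrDefault(node, Collections.emptySet())) {
                stack.push(next);
            }
        }

        // Step 2: Kahn's algorithm on the downstream subgraph; leftovers mean a cycle.
        Map<String, Integer> indegree = new HashMap<>();
        for (String node : downstream) {
            indegree.put(node, 0);
        }
        for (String node : downstream) {
            for (String next : peers.getOrDefault(node, Collections.emptySet())) {
                if (downstream.contains(next)) {
                    indegree.merge(next, 1, Integer::sum);
                }
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : indegree.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        int removed = 0;
        while (!ready.isEmpty()) {
            String node = ready.poll();
            removed++;
            for (String next : peers.getOrDefault(node, Collections.emptySet())) {
                if (downstream.contains(next) && indegree.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        return removed < downstream.size();
    }

    public static void main(String[] args) {
        // A and B replicate to each other; C is about to add A as a peer.
        Map<String, Set<String>> peers = new HashMap<>();
        peers.put("A", Collections.singleton("B"));
        peers.put("B", Collections.singleton("A"));
        System.out.println("refuse C -> A: " + feedsForeignCycle(peers, "C", "A"));
    }
}
```

The walk only follows forward edges (who each cluster replicates to), so it cannot see clusters that replicate into the adding cluster; it covers the scenario from the issue description but not every topology.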
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13565760#comment-13565760 ]

Ian Varley commented on HBASE-7709:

(Because cycles > 2 are still fine, they just have to include all nodes. It can go A -> B -> C -> A; when an edit from A gets to C, it won't re-send to A, and the cycle will stop. The problem is just when it's a cycle from A -> (B -> C -> B).)
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13565794#comment-13565794 ]

Ted Yu commented on HBASE-7709:

The new API would allow specification of more than one cluster, right?
What about (B -> C -> B) -> A, where B replicates to A unidirectionally?

I think option #2 is the general solution.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13565835#comment-13565835 ]

Ted Yu commented on HBASE-7709:

For HLog.Entry:
{code}
public void write(DataOutput dataOutput) throws IOException {
  this.key.write(dataOutput);
  this.edit.write(dataOutput);
}
{code}
where the first integer for WALEdit is versionOrLength. If it is > 0, it is the length. Otherwise it should be -1.
We can introduce a marker (a.k.a. WALEdit.VERSION_3 == -2) after which additional cluster IDs can be serialized.

This is an incompatible change, which should be acceptable for the singularity release.
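The marker idea from the comment above can be sketched with plain `java.io` streams: a reader inspects the first integer and only expects extra cluster IDs when it sees the new marker. This is an illustration, not the WALEdit format. `VERSION_3 = -2` is the value suggested in the comment; the count-then-UUIDs layout after it, and all class and method names, are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

final class WalEditMarkerDemo {
    /** Hypothetical new marker, using the value suggested in the comment. */
    static final int VERSION_3 = -2;

    /** Writes the marker followed by an assumed layout: a count, then each UUID as two longs. */
    static byte[] writeHeader(List<UUID> clusterIds) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(VERSION_3);
            out.writeInt(clusterIds.size());
            for (UUID id : clusterIds) {
                out.writeLong(id.getMostSignificantBits());
                out.writeLong(id.getLeastSignificantBits());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Reads the cluster IDs, or returns an empty list when the first int is not the new marker. */
    static List<UUID> readClusterIds(byte[] bytes) {
        List<UUID> ids = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int versionOrLength = in.readInt();
            if (versionOrLength != VERSION_3) {
                return ids; // older layout: no extra cluster IDs follow the first int
            }
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                long mostSignificant = in.readLong();
                long leastSignificant = in.readLong();
                ids.add(new UUID(mostSignificant, leastSignificant));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<UUID> written = Arrays.asList(UUID.randomUUID(), UUID.randomUUID());
        List<UUID> read = readClusterIds(writeHeader(written));
        System.out.println("round trip ok: " + written.equals(read));
    }
}
```

A reader that predates the marker would not know how to interpret -2, which is why the comment calls this an incompatible change.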
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13565847#comment-13565847 ]

Ian Varley commented on HBASE-7709:

Re: (B -> C -> B) -> A, that's fine; no participants are detecting a cycle they're not part of (A isn't adding any peers, it's the slave). B detects a cycle it's part of (B -> C -> B) and C does as well.

The API would be simple, and would let the caller walk the graph of clusters: ask the peer you're trying to add for all of its peers, then ask each of them in turn, and build up a graph structure that you can interrogate. The only call is "Tell me your current peers."

I suppose this could cause problems if not all clusters can communicate; say, if B is visible to A, and C is visible to B, but C is not visible to A. And I guess there might be race conditions if you try to add peers on multiple clusters simultaneously; there's not really a way to avoid that.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13565851#comment-13565851 ]

Ted Yu commented on HBASE-7709:

What if C -> B -> A is established, and after some time, a potential cycle is formed with B -> C?

Along the lines of my comment above, we can PB the metadata in WALEdit and HLogKey, where cluster Id is declared as repeated in the .proto.
A tool would be provided to convert pre-0.96 WAL files into the new format.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13565869#comment-13565869 ]

Ian Varley commented on HBASE-7709:

Ah, touche. That particular arrangement is still fine (the resulting graph, (B -> C -> B) -> A, doesn't have any bad cycles). However, you raise a good point; if you start with:

A -> B -> C

and then later add C -> B, you'd get:

A -> (B -> C -> B)

which is a bad cycle. And C has no way of knowing about A -> B; as a peer, you only know who you replicate to, not who replicates to you.

A cluster could keep track of who is replicating TO it; in ReplicationSink, we could track all the cluster IDs that have ever sent data in, and report that through the "who do you replicate with" API. So then it would let you build a full graph, because you get the backwards edges.

Of course, there are still plenty of catches: the race conditions, plus the possibility that someone is set up to replicate to you, but they just haven't sent any edits yet. Meh. With this level of complication, a solution in the direction you're talking about (adding info to the WAL) might be safer.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13565874#comment-13565874 ] Ted Yu commented on HBASE-7709: --- Thanks Ian for correcting my example given @ 29/Jan/13 21:53. My point was that replication topology can grow quite complex. If we cannot enumerate all the intricacies, we'd better design something that suits future development.
[jira] [Commented] (HBASE-7709) Infinite loop possible in Master/Master replication
[ https://issues.apache.org/jira/browse/HBASE-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13565876#comment-13565876 ] Ian Varley commented on HBASE-7709: --- Yes, I agree. While it's possible (in theory) to interrogate the actual topology at runtime, a solution that makes such problems impossible is much better.
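For reference, the runtime topology check discussed in this thread could look roughly like the sketch below. It assumes the full peer graph has already been collected somehow (including the "backwards" edges), which the thread notes is itself unreliable. It also assumes one reading of "bad cycle": a cluster is at risk if its edits can reach a replication cycle that the cluster itself is not part of, since the single origin cluster ID then never matches a sink. The class and method names are hypothetical and not part of HBase.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical topology check; the peer map is assumed input, not an HBase API. */
class TopologyCheckSketch {

    /**
     * Returns the clusters whose edits could circulate forever when each edit
     * carries only its origin cluster ID. peers maps a cluster name to the
     * clusters it replicates to.
     */
    static Set<String> originsWithBadCycles(Map<String, List<String>> peers) {
        Set<String> bad = new HashSet<>();
        for (String origin : peers.keySet()) {
            if (dfs(origin, origin, peers, new HashSet<>(), new HashSet<>())) {
                bad.add(origin);
            }
        }
        return bad;
    }

    /** Depth-first search that reports a cycle reachable from origin but not containing it. */
    private static boolean dfs(String node, String origin, Map<String, List<String>> peers,
                               Set<String> onPath, Set<String> done) {
        onPath.add(node);
        for (String next : peers.getOrDefault(node, Collections.<String>emptyList())) {
            if (next.equals(origin) || done.contains(next)) {
                continue; // edits are never shipped back to their origin; finished nodes are cycle-free
            }
            if (onPath.contains(next) || dfs(next, origin, peers, onPath, done)) {
                return true; // found a cycle that the origin's ID cannot break
            }
        }
        onPath.remove(node);
        done.add(node);
        return false;
    }

    public static void main(String[] args) {
        // A -> B -> C plus the later addition C -> B, as in the earlier comment.
        Map<String, List<String>> peers = new HashMap<>();
        peers.put("A", Arrays.asList("B"));
        peers.put("B", Arrays.asList("C"));
        peers.put("C", Arrays.asList("B"));
        System.out.println(originsWithBadCycles(peers));
    }
}
```

Even with such a check, the result is only as good as the collected graph, which is the reason given above for preferring a fix that carries the loop-prevention information in the edit itself.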