[jira] [Comment Edited] (HBASE-9465) Push entries to peer clusters serially
[ https://issues.apache.org/jira/browse/HBASE-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15530835#comment-15530835 ]

Vincent Poon edited comment on HBASE-9465 at 12/28/16 7:12 PM:
---

I don't think you need the code addition above 'continue' here - if readAllEntries() returns true, then lastPositionsForSerialScope should always be empty?
{code}
if (readAllEntriesToReplicateOrNextFile(currentWALisBeingWrittenTo, entries,
    lastPositionsForSerialScope)) {
  for (Map.Entry entry : lastPositionsForSerialScope.entrySet()) {
    waitingUntilCanPush(entry);
  }
  try {
    MetaTableAccessor.updateReplicationPositions(manager.getConnection(), actualPeerId,
        lastPositionsForSerialScope);
  } catch (IOException e) {
    LOG.error("updateReplicationPositions fail", e);
    stopper.stop("updateReplicationPositions fail");
  }
  continue;
}
{code}

was (Author: vincentpoon):
I think you need the code addition above 'continue' here - if readAllEntries... returns true, then lastPositionsForSerialScope should always be empty?
{code}
if (readAllEntriesToReplicateOrNextFile(currentWALisBeingWrittenTo, entries,
    lastPositionsForSerialScope)) {
  for (Map.Entry entry : lastPositionsForSerialScope.entrySet()) {
    waitingUntilCanPush(entry);
  }
  try {
    MetaTableAccessor.updateReplicationPositions(manager.getConnection(), actualPeerId,
        lastPositionsForSerialScope);
  } catch (IOException e) {
    LOG.error("updateReplicationPositions fail", e);
    stopper.stop("updateReplicationPositions fail");
  }
  continue;
}
{code}

> Push entries to peer clusters serially
> --
>
>                 Key: HBASE-9465
>                 URL: https://issues.apache.org/jira/browse/HBASE-9465
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver, Replication
>    Affects Versions: 2.0.0, 1.4.0
>            Reporter: Honghua Feng
>            Assignee: Phil Yang
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: HBASE-9465-branch-1-v1.patch,
> HBASE-9465-branch-1-v1.patch, HBASE-9465-branch-1-v2.patch,
> HBASE-9465-branch-1-v3.patch, HBASE-9465-branch-1-v4.patch,
> HBASE-9465-branch-1-v4.patch, HBASE-9465-v1.patch, HBASE-9465-v2.patch,
> HBASE-9465-v2.patch, HBASE-9465-v3.patch, HBASE-9465-v4.patch,
> HBASE-9465-v5.patch, HBASE-9465-v6.patch, HBASE-9465-v6.patch,
> HBASE-9465-v7.patch, HBASE-9465-v7.patch, HBASE-9465.pdf
>
>
> When region-move or RS failure occurs in master cluster, the hlog entries
> that are not pushed before region-move or RS-failure will be pushed by
> original RS(for region move) or another RS which takes over the remained hlog
> of dead RS(for RS failure), and the new entries for the same region(s) will
> be pushed by the RS which now serves the region(s), but they push the hlog
> entries of a same region concurrently without coordination.
> This treatment can possibly lead to data inconsistency between master and
> peer clusters:
> 1. there are put and then delete written to master cluster
> 2. due to region-move / RS-failure, they are pushed by different
> replication-source threads to peer cluster
> 3. if delete is pushed to peer cluster before put, and flush and
> major-compact occurs in peer cluster before put is pushed to peer cluster,
> the delete is collected and the put remains in peer cluster
> In this scenario, the put remains in peer cluster, but in master cluster the
> put is masked by the delete, hence data inconsistency between master and peer
> clusters

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
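
The put/delete scenario in the issue description above can be sketched as a toy simulation. This is only an illustration under simplifying assumptions, not HBase code: SerialPushSketch, Cell, PeerStore and putVisibleOnPeer are hypothetical names, a single row/column is modeled, and "major compaction" is reduced to dropping delete markers together with the puts they mask.

```java
import java.util.ArrayList;
import java.util.List;

public class SerialPushSketch {

    /** Hypothetical, simplified cell for one row/column: a put or a delete marker. */
    static final class Cell {
        final boolean delete;
        final long timestamp;

        Cell(boolean delete, long timestamp) {
            this.delete = delete;
            this.timestamp = timestamp;
        }
    }

    /** Toy model of a single row/column on the peer cluster (an assumption, not an HBase class). */
    static final class PeerStore {
        private final List<Cell> cells = new ArrayList<>();

        void apply(Cell cell) {
            cells.add(cell);
        }

        /** Simplified major compaction: drop delete markers and every put they mask. */
        void majorCompact() {
            long newestDelete = Long.MIN_VALUE;
            for (Cell c : cells) {
                if (c.delete) {
                    newestDelete = Math.max(newestDelete, c.timestamp);
                }
            }
            final long cutoff = newestDelete;
            cells.removeIf(c -> c.delete || c.timestamp <= cutoff);
        }

        /** A put is visible when no remaining delete marker has a timestamp >= the put's. */
        boolean putVisible() {
            for (Cell c : cells) {
                if (c.delete) {
                    continue;
                }
                boolean masked = false;
                for (Cell d : cells) {
                    if (d.delete && d.timestamp >= c.timestamp) {
                        masked = true;
                    }
                }
                if (!masked) {
                    return true;
                }
            }
            return false;
        }
    }

    /**
     * Replays "put at ts=1, then delete at ts=2" from the master cluster onto the peer.
     * serialOrder=true pushes them in write order; false pushes the delete first and
     * runs a major compaction before the put arrives (step 3 of the description).
     */
    static boolean putVisibleOnPeer(boolean serialOrder) {
        Cell put = new Cell(false, 1L);
        Cell delete = new Cell(true, 2L);
        PeerStore peer = new PeerStore();
        if (serialOrder) {
            peer.apply(put);
            peer.apply(delete);
            peer.majorCompact();
        } else {
            peer.apply(delete);
            peer.majorCompact(); // the delete marker is collected here
            peer.apply(put);     // the late put is no longer masked
        }
        return peer.putVisible();
    }

    public static void main(String[] args) {
        System.out.println("serial push, put visible on peer: " + putVisibleOnPeer(true));
        System.out.println("out-of-order push, put visible on peer: " + putVisibleOnPeer(false));
    }
}
```

In this model the serial push leaves the put invisible on the peer, matching the master cluster, while the out-of-order push leaves it visible, which is the inconsistency the issue describes.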
[jira] [Comment Edited] (HBASE-9465) Push entries to peer clusters serially
[ https://issues.apache.org/jira/browse/HBASE-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412858#comment-15412858 ]

Phil Yang edited comment on HBASE-9465 at 8/9/16 3:20 AM:
--

I cannot see the failure message in Test Results, and the test passes locally, so let's resubmit the patch to retry.

was (Author: yangzhe1991):
Can not seen the failure message in Test Results, and it can pass locally, let's resubmit the patch to retry
[jira] [Comment Edited] (HBASE-9465) Push entries to peer clusters serially
[ https://issues.apache.org/jira/browse/HBASE-9465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398898#comment-15398898 ]

Phil Yang edited comment on HBASE-9465 at 7/29/16 7:41 AM:
---

I read HBASE-15156. I have no idea when it will be committed to master, and it is configurable and disabled by default. Maybe we can keep the logic simpler for now and, after HBASE-15156 is pushed to master, open a follow-up issue to handle this case if needed?

was (Author: yangzhe1991):
Read HBASE-15156 , I have no idea when it will be committed to master and it is configurable and default is disable. Maybe we can keep the logic simpler now and after HBASE-15156 pushed to master, we can open a follow-up issue to handle this case?