[jira] [Updated] (GEODE-9191) PR clear should not miss clearing bucket which lost primary

     [ https://issues.apache.org/jira/browse/GEODE-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaojian Zhou updated GEODE-9191:
---------------------------------
    Labels: GeodeOperationAPI  (was: )

> PR clear should not miss clearing bucket which lost primary
> -----------------------------------------------------------
>
>                 Key: GEODE-9191
>                 URL: https://issues.apache.org/jira/browse/GEODE-9191
>             Project: Geode
>          Issue Type: Sub-task
>            Reporter: Xiaojian Zhou
>            Priority: Major
>              Labels: GeodeOperationAPI
>
> This scenario was found while introducing a GII test case for PR clear. The
> sequence is:
> (1) There are 3 servers: server1 is an accessor, server2 and server3 are
> datastores.
> (2) Shut down server2.
> (3) Send PR clear from server1 (the accessor) and restart server2 at the same
> time. There is a race in which server2 does not receive the
> PartitionedRegionClearMessage.
> (4) server2 finishes GII.
> (5) Only server3 received the PartitionedRegionClearMessage, and it hosts all
> the primary buckets. While the PR clear thread iterates through these primary
> buckets one by one, some of them might lose primary to server2.
> (6) BR.cmnClearRegion returns immediately for such a bucket since it is no
> longer primary, but clearedBuckets.add(localPrimaryBucketRegion.getId());
> is still called. So from the caller's point of view, this bucket is cleared.
> It does not even throw PartitionedRegionPartialClearException.
>
> Proposed fix:
> Before calling cmnClearRegion, we should call BR.doLockForPrimary to make
> sure the bucket is still primary. If it is not, throw an exception. Then
> clearedBuckets.add(localPrimaryBucketRegion.getId()); will not be called for
> this bucket.
>
> The expected behavior in this scenario is to throw
> PartitionedRegionPartialClearException.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
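
The check-before-clear ordering described in the issue can be sketched as follows. This is a minimal, self-contained illustration and not Geode's code: the Bucket class, its lockForPrimary() method, PartialClearException, and clearLocalPrimaries() are all hypothetical stand-ins for BucketRegion, BR.doLockForPrimary, PartitionedRegionPartialClearException, and the PR clear loop. It assumes the primary check simply reports whether the member still holds the primary.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class PartialClearSketch {

  /** Hypothetical stand-in for a bucket region; not Geode's BucketRegion. */
  static class Bucket {
    final int id;
    boolean primary;
    final List<String> entries = new ArrayList<>();

    Bucket(int id, boolean primary) {
      this.id = id;
      this.primary = primary;
      entries.add("value-" + id);
    }

    /** Assumed behavior: returns false if this member no longer holds the primary. */
    boolean lockForPrimary() {
      return primary;
    }

    void clear() {
      entries.clear();
    }
  }

  /** Hypothetical stand-in for PartitionedRegionPartialClearException. */
  static class PartialClearException extends RuntimeException {
    final Set<Integer> cleared;

    PartialClearException(String message, Set<Integer> cleared) {
      super(message);
      this.cleared = cleared;
    }
  }

  /**
   * Clears each bucket only after confirming it is still primary.
   * A bucket id is recorded as cleared only after its clear actually ran.
   */
  static Set<Integer> clearLocalPrimaries(List<Bucket> buckets) {
    Set<Integer> cleared = new LinkedHashSet<>();
    for (Bucket bucket : buckets) {
      if (!bucket.lockForPrimary()) {
        // Primary moved to another member: report a partial clear
        // instead of silently counting this bucket as cleared.
        throw new PartialClearException(
            "bucket " + bucket.id + " lost primary before clear", cleared);
      }
      bucket.clear();
      cleared.add(bucket.id);
    }
    return cleared;
  }

  public static void main(String[] args) {
    List<Bucket> buckets = new ArrayList<>();
    buckets.add(new Bucket(0, true));
    buckets.add(new Bucket(1, false)); // simulates a bucket that lost primary
    try {
      clearLocalPrimaries(buckets);
    } catch (PartialClearException e) {
      System.out.println(e.getMessage() + "; cleared so far: " + e.cleared);
    }
  }
}
```

The point of the ordering is that the cleared set is updated only after the clear has run, so a bucket that lost primary never appears in it and the caller sees an exception rather than a false success.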
[jira] [Updated] (GEODE-9191) PR clear should not miss clearing bucket which lost primary

     [ https://issues.apache.org/jira/browse/GEODE-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaojian Zhou updated GEODE-9191:
---------------------------------
        Parent: GEODE-7665
    Issue Type: Sub-task  (was: Bug)

> PR clear should not miss clearing bucket which lost primary
> -----------------------------------------------------------
>
>                 Key: GEODE-9191
>                 URL: https://issues.apache.org/jira/browse/GEODE-9191
>             Project: Geode
>          Issue Type: Sub-task
>            Reporter: Xiaojian Zhou
>            Priority: Major