liyuheng55555 commented on code in PR #13178:
URL: https://github.com/apache/iotdb/pull/13178#discussion_r1719270294
##########
iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/procedure/impl/region/RemoveRegionPeerProcedure.java:
##########
@@ -68,12 +69,25 @@ public RemoveRegionPeerProcedure(
TConsensusGroupId consensusGroupId,
TDataNodeLocation coordinator,
TDataNodeLocation targetDataNode) {
- super();
this.consensusGroupId = consensusGroupId;
this.coordinator = coordinator;
this.targetDataNode = targetDataNode;
}
+ private void handleTransferLeader(RegionMaintainHandler handler)
+ throws ProcedureException, InterruptedException {
+ LOGGER.info(
+ "[pid{}][RemoveRegion] started, region {} will be removed from
DataNode {}.",
+ getProcId(),
+ consensusGroupId.getId(),
+ targetDataNode.getDataNodeId());
+ handler.forceUpdateRegionCache(consensusGroupId, targetDataNode,
RegionStatus.Removing);
+ List<TDataNodeLocation> excludeDataNode = new ArrayList<>();
+ excludeDataNode.add(targetDataNode);
+ excludeDataNode.add(coordinator);
+ handler.transferRegionLeader(consensusGroupId, targetDataNode,
excludeDataNode);
Review Comment:
If replica = 1, we have to choose coordinator as new leader. Maybe not use
`excludedDataNode` only, but use `excluded` and `notSuggested`
##########
iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/procedure/env/RegionMaintainHandler.java:
##########
@@ -693,6 +702,14 @@ public void transferRegionLeader(TConsensusGroupId
regionId, TDataNodeLocation o
(CONF.getSchemaRegionRatisRpcLeaderElectionTimeoutMaxMs()
+ CONF.getSchemaRegionRatisRpcLeaderElectionTimeoutMinMs())
/ 2;
+ Integer leaderId =
configManager.getLoadManager().getRegionLeaderMap().get(regionId);
+
+ if (leaderId != -1) {
+ // The migrated node is not leader, so we don't need to transfer
temporarily
+ if (originalDataNode.getDataNodeId() != leaderId) {
+ return;
+ }
+ }
Review Comment:
1. `if (leaderId != -1)` useless check
2. inline leaderId
##########
iotdb-core/consensus/src/main/java/org/apache/iotdb/consensus/ratis/RatisClient.java:
##########
@@ -167,4 +210,48 @@ public Action handleAttemptFailure(Event event) {
return defaultPolicy.handleAttemptFailure(event);
}
}
+
+ // This policy is used to raft configuration change
+ private static class RatisEndlessRetryPolicy implements RetryPolicy {
+
+ private static final Logger logger =
LoggerFactory.getLogger(RatisEndlessRetryPolicy.class);
+ private static final RetryPolicy defaultPolicy;
+
+ static {
+ String str = "";
+ // 50, 500ms, 40, 1000ms, 30, 1500ms, 20, 2000ms, 10, 2500ms
Review Comment:
What is this comment's meaning? Describe your algorithm more clearly.
And, why define an algorithm in static chunk?
##########
iotdb-core/consensus/src/main/java/org/apache/iotdb/consensus/ratis/RatisConsensus.java:
##########
@@ -864,14 +869,30 @@ private RatisClient getRaftClient(RaftGroup group) throws
ClientManagerException
}
}
+ private RatisClient getConfRaftClient(RaftGroup group) throws
ClientManagerException {
Review Comment:
get**Reconfiguration**RaftClient. Keep name consistent.
##########
iotdb-core/consensus/src/main/java/org/apache/iotdb/consensus/ratis/RatisClient.java:
##########
@@ -167,4 +210,48 @@ public Action handleAttemptFailure(Event event) {
return defaultPolicy.handleAttemptFailure(event);
}
}
+
+ // This policy is used to raft configuration change
+ private static class RatisEndlessRetryPolicy implements RetryPolicy {
+
+ private static final Logger logger =
LoggerFactory.getLogger(RatisEndlessRetryPolicy.class);
+ private static final RetryPolicy defaultPolicy;
+
+ static {
+ String str = "";
+ // 50, 500ms, 40, 1000ms, 30, 1500ms, 20, 2000ms, 10, 2500ms
+ int basicRetry = 50;
+ int basicSleep = 500;
+ for (int i = 0; i < 5; i++) {
+ str += basicRetry + "," + basicSleep + ",";
+ basicRetry -= 10;
+ basicSleep += 500;
+ }
+
+ defaultPolicy =
+ MultipleLinearRandomRetry.parseCommaSeparated(str.substring(0,
str.length() - 1));
+ }
+
+ RatisEndlessRetryPolicy() {}
+
+ @Override
+ public Action handleAttemptFailure(Event event) {
+ // Ratis guarantees that event.getCause() is instance of IOException.
+ // We should allow RaftException or IOException(StatusRuntimeException,
thrown by gRPC) to be
+ // retried.
+ Optional<Throwable> unexpectedCause =
+ Optional.ofNullable(event.getCause())
+ .filter(RaftException.class::isInstance)
+ .map(Throwable::getCause)
+ .filter(StatusRuntimeException.class::isInstance);
Review Comment:
What is `RaftException or IOException(StatusRuntimeException, thrown by
gRPC)`? Words between () is confusing, StatusRuntimeException is not a kind of
IOException.
##########
iotdb-core/consensus/src/main/java/org/apache/iotdb/consensus/ratis/RatisConsensus.java:
##########
@@ -901,12 +922,21 @@ public void allowStaleRead(ConsensusGroupId
consensusGroupId) {
private class RatisClientPoolFactory implements
IClientPoolFactory<RaftGroup, RatisClient> {
+ private boolean isReConfiguration;
+
+ RatisClientPoolFactory(boolean isReConfiguration) {
+ this.isReConfiguration = isReConfiguration;
+ }
Review Comment:
is**Reconfiguration**
##########
iotdb-core/consensus/src/main/java/org/apache/iotdb/consensus/ratis/RatisConsensus.java:
##########
@@ -122,6 +122,7 @@ class RatisConsensus implements IConsensus {
private final RaftClientRpc clientRpc;
private final IClientManager<RaftGroup, RatisClient> clientManager;
+ private final IClientManager<RaftGroup, RatisClient> reconfClientManager;
Review Comment:
**reconfiguration**ClientManager. We usually not use abbreviation in iotdb.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]