CRZbulabula commented on code in PR #17895:
URL: https://github.com/apache/iotdb/pull/17895#discussion_r3386235561


##########
iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/conf/ConfigNodeConfig.java:
##########
@@ -318,6 +318,16 @@ public class ConfigNodeConfig {
   private long schemaRegionRatisInitialSleepTimeMs = 100;
   private long schemaRegionRatisMaxSleepTimeMs = 10000;
 
+  /**
+   * RatisConsensus protocol, max retry attempts for a configuration change (add/remove peer). Uses
+   * a fixed 2s retry interval; bounding the attempts stops a killed ADDING peer from blocking the
+   * reconfiguration -- and hence a region migration -- forever.
+   */
+  private int configNodeRatisReconfigurationMaxRetryAttempts = 600;
+
+  private int dataRegionRatisReconfigurationMaxRetryAttempts = 600;
+  private int schemaRegionRatisReconfigurationMaxRetryAttempts = 600;
+

Review Comment:
   Agreed. 20 minutes (600 attempts at the fixed 2s retry interval) is too long 
for this failure path. I changed the default reconfiguration retry attempts from 
600 to 15, which is about 30 seconds at the same interval. This keeps the retry 
budget close to the ConfigNode Unknown detection window while leaving a small 
buffer for transient delays. The newly added Ratis region migration ITs have 
also been moved into DailyIT after CI passed.
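
   For reference, the arithmetic behind both defaults can be sketched as follows. This is a minimal illustration only: `ReconfigurationRetryBudget` and `worstCaseWait` are hypothetical names that are not part of IoTDB or Ratis, and the calculation counts only the sleep between attempts, ignoring the time each attempt itself takes.

```java
import java.time.Duration;

/** Hypothetical helper (not IoTDB code): worst-case sleep time of a fixed-interval retry policy. */
final class ReconfigurationRetryBudget {

  // Fixed retry interval, as stated in the Javadoc of the new config fields.
  static final Duration RETRY_INTERVAL = Duration.ofSeconds(2);

  private ReconfigurationRetryBudget() {}

  /** Upper bound on the time spent sleeping between attempts: maxAttempts * interval. */
  static Duration worstCaseWait(int maxAttempts, Duration interval) {
    if (maxAttempts < 0) {
      throw new IllegalArgumentException("maxAttempts must be non-negative");
    }
    return interval.multipliedBy(maxAttempts);
  }

  public static void main(String[] args) {
    // Old default: 600 attempts at 2s each.
    System.out.println(worstCaseWait(600, RETRY_INTERVAL).toMinutes() + " min");
    // New default: 15 attempts at 2s each.
    System.out.println(worstCaseWait(15, RETRY_INTERVAL).getSeconds() + " s");
  }
}
```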



