junkaixue commented on a change in pull request #1740:
URL: https://github.com/apache/helix/pull/1740#discussion_r633037624
##########
File path: helix-core/src/main/java/org/apache/helix/model/LiveInstance.java
##########
@@ -40,7 +40,20 @@
LIVE_INSTANCE,
ZKPROPERTYTRANSFERURL,
RESOURCE_CAPACITY,
- CURRENT_TASK_THREAD_POOL_SIZE
+ CURRENT_TASK_THREAD_POOL_SIZE,
+
+ /** Represents {@link LiveInstanceStatus} */
+ STATUS
+ }
+
+ /**
+ * Represents status of a live instance, eg. PAUSED.
+ */
+ public enum LiveInstanceStatus {
+ NORMAL,
+ PAUSED,
+ ENTERING_PAUSE,
+ EXITING_PAUSE
Review comment:
I dont think we need this. This can be derived as the pausing message is
pending there and not removed with status not in PAUSED.
This kind of status is more like in Cluster Pause Status. Instance level is
not necessary.
##########
File path: helix-core/src/main/java/org/apache/helix/HelixAdmin.java
##########
@@ -367,6 +368,41 @@ void manuallyEnableMaintenanceMode(String clusterName,
boolean enabled, String r
*/
boolean isInMaintenanceMode(String clusterName);
+ /**
+ * Puts a cluster into pause mode, which will pause controller and
participants.
+ * This can be used to retain the cluster state. When this method returns,
it means
+ * the pause signal has been successfully sent, but it does not mean the
cluster has
+ * fully entered pause mode. Because the cluster can take some time to
finish the pause and
+ * process possible pending state transitions if necessary.
+ * <p>
+ * To check the cluster pause status, call {@link
#getClusterPauseStatus(String)}.
+ *
+ * @param clusterName Participant cluster name.
+ * @param cancelPendingST whether or not cancel the pending state
transitions for participants.
+ * @param reason The reason to put the cluster into pause mode.
+ */
+ void enableClusterPauseMode(String clusterName, boolean cancelPendingST,
String reason);
+
+ /**
+ * Gets cluster pause status {@link ClusterPauseStatus}.
+ *
+ * @param clusterName cluster name.
+ * @return {@link ClusterPauseStatus}
+ */
+ ClusterPauseStatus getClusterPauseStatus(String clusterName);
+
+ /**
+ * Gets a cluster out of pause mode. When this method returns, it means
+ * the resume signal has been successfully sent, but it does not mean the
cluster has
+ * fully exited pause mode.
+ * <p>
+ * To check whether the cluster has fully exited pause mode, call
+ * {@link #getClusterPauseStatus(String)}.
+ *
+ * @param clusterName Cluster name.
+ */
+ void disableClusterPauseMode(String clusterName);
Review comment:
Why we need two APIs? enableClusterPauseMode can add an argument with
enabled to be true or false. It follows our convention.
You can have private function to support enable/disable separately. But they
can share the same API.
##########
File path: helix-core/src/main/java/org/apache/helix/model/LiveInstance.java
##########
@@ -40,7 +40,20 @@
LIVE_INSTANCE,
ZKPROPERTYTRANSFERURL,
RESOURCE_CAPACITY,
- CURRENT_TASK_THREAD_POOL_SIZE
+ CURRENT_TASK_THREAD_POOL_SIZE,
+
+ /** Represents {@link LiveInstanceStatus} */
+ STATUS
+ }
+
+ /**
+ * Represents status of a live instance, eg. PAUSED.
+ */
+ public enum LiveInstanceStatus {
+ NORMAL,
Review comment:
We do not need a explicit NORMAL for status. We can just remove the
entry as the normal status for backward compatible.
##########
File path:
helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixAdmin.java
##########
@@ -497,6 +499,67 @@ public boolean isInMaintenanceMode(String clusterName) {
.exists(keyBuilder.maintenance().getPath(), AccessOption.PERSISTENT);
}
+ @Override
+ public void enableClusterPauseMode(String clusterName, boolean
cancelPendingST, String reason) {
+ logger.info("Enable cluster pause mode for cluster: {}. Cancel pending ST:
{}. Reason: {}.",
+ clusterName, cancelPendingST, reason);
+
+ PauseSignal pauseSignal = new PauseSignal("pause");
Review comment:
NIT: we should not hard code it.
##########
File path:
helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixAdmin.java
##########
@@ -497,6 +499,67 @@ public boolean isInMaintenanceMode(String clusterName) {
.exists(keyBuilder.maintenance().getPath(), AccessOption.PERSISTENT);
}
+ @Override
+ public void enableClusterPauseMode(String clusterName, boolean
cancelPendingST, String reason) {
+ logger.info("Enable cluster pause mode for cluster: {}. Cancel pending ST:
{}. Reason: {}.",
+ clusterName, cancelPendingST, reason);
+
+ PauseSignal pauseSignal = new PauseSignal("pause");
+ pauseSignal.setPauseCluster(Boolean.toString(true));
+ pauseSignal.setCancelPendingST(cancelPendingST);
+ pauseSignal.setFromHost(NetworkUtil.getLocalhostName());
+ pauseSignal.setTriggerTime(Instant.now().toString());
Review comment:
Question: why not use System.currenttimemillis
##########
File path: helix-core/src/main/java/org/apache/helix/model/PauseSignal.java
##########
@@ -28,7 +28,11 @@
public class PauseSignal extends HelixProperty {
public enum PauseSignalProperty {
- REASON
+ REASON,
+ PAUSE_CLUSTER,
Review comment:
Question: I am not quite sure what do you mean the PAUSE_CLUSTER? Can
you elaborate more on each of the entry. FROM_HOST is something I am not clear
as well.
Also do we need to explicitly put CANCEL_PENDING_ST or not. We can just make
cluster status as PAUSING/EXITING something.
##########
File path: helix-core/src/main/java/org/apache/helix/model/PauseSignal.java
##########
@@ -28,7 +28,11 @@
public class PauseSignal extends HelixProperty {
public enum PauseSignalProperty {
Review comment:
I would suggest not mess up with Cluster Status and Action you would
like to perform. Better to have two enums to hold them separately. One for user
command one for cluster status.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]