huizhilu commented on a change in pull request #1740:
URL: https://github.com/apache/helix/pull/1740#discussion_r633039726
##########
File path: helix-core/src/main/java/org/apache/helix/model/PauseSignal.java
##########
@@ -28,7 +28,11 @@
public class PauseSignal extends HelixProperty {
public enum PauseSignalProperty {
Review comment:
We don't want to put cluster status into the PAUSE znode. The PAUSE
znode is only created/updated by users. Controller does not write any cluster
status to it.
The PAUSE znode will be deleted by controller after cluster fully exits.
##########
File path: helix-core/src/main/java/org/apache/helix/model/LiveInstance.java
##########
@@ -40,7 +40,20 @@
LIVE_INSTANCE,
ZKPROPERTYTRANSFERURL,
RESOURCE_CAPACITY,
- CURRENT_TASK_THREAD_POOL_SIZE
+ CURRENT_TASK_THREAD_POOL_SIZE,
+
+ /** Represents {@link LiveInstanceStatus} */
+ STATUS
+ }
+
+ /**
+ * Represents status of a live instance, eg. PAUSED.
+ */
+ public enum LiveInstanceStatus {
+ NORMAL,
Review comment:
Yeah that's correct. I've thought about it. It is put there to help in
the return of java/rest api. If the cluster has fully exited pause, what will
the query result look like? "NORMAL" is helping with it. I just wanted to make
it as a constant instead of hardcoding in the API.
##########
File path:
helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixAdmin.java
##########
@@ -497,6 +499,67 @@ public boolean isInMaintenanceMode(String clusterName) {
.exists(keyBuilder.maintenance().getPath(), AccessOption.PERSISTENT);
}
+ @Override
+ public void enableClusterPauseMode(String clusterName, boolean
cancelPendingST, String reason) {
+ logger.info("Enable cluster pause mode for cluster: {}. Cancel pending ST:
{}. Reason: {}.",
+ clusterName, cancelPendingST, reason);
+
+ PauseSignal pauseSignal = new PauseSignal("pause");
Review comment:
It's the same as the other API: `disableCluster()`...
Will put "pause" as a default pause id in `PauseSignal`
##########
File path: helix-core/src/main/java/org/apache/helix/HelixAdmin.java
##########
@@ -367,6 +368,41 @@ void manuallyEnableMaintenanceMode(String clusterName,
boolean enabled, String r
*/
boolean isInMaintenanceMode(String clusterName);
+ /**
+ * Puts a cluster into pause mode, which will pause controller and
participants.
+ * This can be used to retain the cluster state. When this method returns,
it means
+ * the pause signal has been successfully sent, but it does not mean the
cluster has
+ * fully entered pause mode. Because the cluster can take some time to
finish the pause and
+ * process possible pending state transitions if necessary.
+ * <p>
+ * To check the cluster pause status, call {@link
#getClusterPauseStatus(String)}.
+ *
+ * @param clusterName Participant cluster name.
+ * @param cancelPendingST whether or not cancel the pending state
transitions for participants.
+ * @param reason The reason to put the cluster into pause mode.
+ */
+ void enableClusterPauseMode(String clusterName, boolean cancelPendingST,
String reason);
+
+ /**
+ * Gets cluster pause status {@link ClusterPauseStatus}.
+ *
+ * @param clusterName cluster name.
+ * @return {@link ClusterPauseStatus}
+ */
+ ClusterPauseStatus getClusterPauseStatus(String clusterName);
+
+ /**
+ * Gets a cluster out of pause mode. When this method returns, it means
+ * the resume signal has been successfully sent, but it does not mean the
cluster has
+ * fully exited pause mode.
+ * <p>
+ * To check whether the cluster has fully exited pause mode, call
+ * {@link #getClusterPauseStatus(String)}.
+ *
+ * @param clusterName Cluster name.
+ */
+ void disableClusterPauseMode(String clusterName);
Review comment:
I understand there are many similar APIs in Helix. However, to me,
passing boolean parameters to differentiate the behaviors is really a code
smell. Ref:
https://softwareengineering.stackexchange.com/questions/208224/is-it-better-to-have-one-method-that-takes-a-bool-as-a-parameter-or-two-methods
I would try to avoid that. So I prefer two methods.
##########
File path:
helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixAdmin.java
##########
@@ -497,6 +499,67 @@ public boolean isInMaintenanceMode(String clusterName) {
.exists(keyBuilder.maintenance().getPath(), AccessOption.PERSISTENT);
}
+ @Override
+ public void enableClusterPauseMode(String clusterName, boolean
cancelPendingST, String reason) {
+ logger.info("Enable cluster pause mode for cluster: {}. Cancel pending ST:
{}. Reason: {}.",
+ clusterName, cancelPendingST, reason);
+
+ PauseSignal pauseSignal = new PauseSignal("pause");
+ pauseSignal.setPauseCluster(Boolean.toString(true));
+ pauseSignal.setCancelPendingST(cancelPendingST);
+ pauseSignal.setFromHost(NetworkUtil.getLocalhostName());
+ pauseSignal.setTriggerTime(Instant.now().toString());
Review comment:
`Instant.now()` gives a human readable format at UTC zone. The epoch
timestamp really does not help being written to znode, because we usually will
directly read the znode. So a readable format will really help.
The trade-off is also added to the design doc.
##########
File path: helix-core/src/main/java/org/apache/helix/model/PauseSignal.java
##########
@@ -28,7 +28,11 @@
public class PauseSignal extends HelixProperty {
public enum PauseSignalProperty {
- REASON
+ REASON,
+ PAUSE_CLUSTER,
Review comment:
These fields are representing the user's action. It'd be a record to
help tracking and debugging.
PAUSE_CLUSTER: whether enable/disable pause mode
FROM_HOST: where the pause signal is sent from, eg. Helix REST or a user's
laptop that directly calls java api.
CANCEL_PENDING_ST: whether to cancel pending st. Controller will read this
field to determine to send cancel messages.
##########
File path: helix-core/src/main/java/org/apache/helix/model/LiveInstance.java
##########
@@ -40,7 +40,20 @@
LIVE_INSTANCE,
ZKPROPERTYTRANSFERURL,
RESOURCE_CAPACITY,
- CURRENT_TASK_THREAD_POOL_SIZE
+ CURRENT_TASK_THREAD_POOL_SIZE,
+
+ /** Represents {@link LiveInstanceStatus} */
+ STATUS
+ }
+
+ /**
+ * Represents status of a live instance, eg. PAUSED.
+ */
+ public enum LiveInstanceStatus {
+ NORMAL,
+ PAUSED,
+ ENTERING_PAUSE,
+ EXITING_PAUSE
Review comment:
Yes it could be derived. But do we want to hard code those status in the
API? I thought about it. I think it's cleaner to define the status here in live
instance, which could be used for controller and participant.
When exiting pause, once controller receives the exit notification, the
controller status will marked as "EXITING_PAUSE" in LEADER.
So I think this enum will help.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]