jiajunwang commented on a change in pull request #1227:
URL: https://github.com/apache/helix/pull/1227#discussion_r474349029
##########
File path:
helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java
##########
@@ -623,6 +623,13 @@ void reset() {
}
}
+ // some thing like STATE_TRANSITION.Key specific pool are not shut down.
+ for (String msgType : _executorMap.keySet()) {
+ ExecutorService pool = _executorMap.remove(msgType);
Review comment:
If the pool is created in user logic and passed in to construct the
executor, then executor shall not shutdown it here.
##########
File path: helix-core/src/main/java/org/apache/helix/task/TaskDriver.java
##########
@@ -969,17 +969,29 @@ public TaskState pollForJobState(String workflowName,
String jobName, long timeo
Set<TaskState> allowedStates = new HashSet<>(Arrays.asList(states));
// Wait for state
long st = System.currentTimeMillis();
+ long ct = 0;
do {
Thread.sleep(timeToSleep);
ctx = getWorkflowContext(workflowName);
+ ct = System.currentTimeMillis();
} while ((ctx == null || ctx.getJobState(jobName) == null
|| !allowedStates.contains(ctx.getJobState(jobName)))
- && System.currentTimeMillis() < st + timeout);
+ && ct < st + timeout);
if (ctx == null || !allowedStates.contains(ctx.getJobState(jobName))) {
Review comment:
I guess you can simplify this if-else. It is not quite readable. Please
avoid the redundant check and add some comments here.
##########
File path: helix-core/src/test/java/org/apache/helix/TestHelper.java
##########
@@ -794,7 +797,12 @@ public static boolean verify(Verifier verifier, long
timeout) throws Exception {
long start = System.currentTimeMillis();
do {
boolean result = verifier.verify();
- if (result || (System.currentTimeMillis() - start) > timeout) {
+ long now = System.currentTimeMillis();
+ if (result || (now - start) > timeout) {
Review comment:
This code looks chaotic. Please simplify.
boolean result = verifier.verify();
boolean timeout = (System.currentTimeMillis() - start) > timeout;
if (result || timeout) {
if (timeout) {
LOG.error()
}
return result;
}
##########
File path:
helix-core/src/main/java/org/apache/helix/tools/ClusterVerifiers/BestPossibleExternalViewVerifier.java
##########
@@ -224,6 +224,9 @@ public boolean verifyByZkCallback(long timeout) {
@Override
protected synchronized boolean verifyState() {
try {
+ if (_isLogMore) {
Review comment:
I guess this helps your debug, but are you suggesting we check this into
the master branch? In general, I don't like optional code in master.
##########
File path:
helix-core/src/main/java/org/apache/helix/tools/ClusterStateVerifier.java
##########
@@ -181,6 +181,12 @@ private BestPossAndExtViewZkVerifier(HelixZkClient
zkClient, String clusterName,
this.resources = resources;
}
+ public void close() {
Review comment:
Is this class deprecated? Instead of modifying it, can we just change
the callers to use the newer verifier?
##########
File path: helix-core/src/main/java/org/apache/helix/task/TaskDriver.java
##########
@@ -72,7 +72,7 @@
private static final Logger LOG = LoggerFactory.getLogger(TaskDriver.class);
/** Default time out for monitoring workflow or job state */
- private final static int DEFAULT_TIMEOUT = 5 * 60 * 1000; /* 5 mins */
+ private final static int DEFAULT_TIMEOUT = 15 * 60 * 1000; /* 15 mins, same
as default in TestNg.xml */
Review comment:
I think this impacts our features. ANd this number is not randomly set.
Please don't change the default value. If it fails the test, I think we can
just use the method in which you can specify the timeout explicitly.
And, I do want to understand why it will help the test. Could you please
clarify?
##########
File path: helix-core/src/test/conf/testng.xml
##########
@@ -18,7 +18,7 @@
~ under the License.
-->
<!DOCTYPE suite SYSTEM "http://testng.org/testng-1.0.dtd">
-<suite name="Suite" time-out="300000">
+<suite name="Suite" time-out="900000">
Review comment:
Does any test definitely need more than 5 minutes?
##########
File path:
helix-core/src/main/java/org/apache/helix/tools/ClusterVerifiers/ZkHelixClusterVerifier.java
##########
@@ -48,6 +50,10 @@
private static Logger LOG =
LoggerFactory.getLogger(ZkHelixClusterVerifier.class);
protected static int DEFAULT_TIMEOUT = 300 * 1000;
protected static int DEFAULT_PERIOD = 500;
+ // COOL_DOWN before starting vefiyByPool
+ // The goal is to make sure waiting for controller pipeline starts at least
one cycle
+ // to update ideal state.
+ protected static int DEFAULT_COOLDOWN = 2 * 1000;
Review comment:
I think this is another workaround to avoid issue
https://github.com/apache/helix/issues/526. If this is the case, then it would
be an overkill to add to the general verifier logic. Because not all verifying
logic depends on the pipeline running.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]