jiajunwang commented on a change in pull request #1227:
URL: https://github.com/apache/helix/pull/1227#discussion_r474349029



##########
File path: 
helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java
##########
@@ -623,6 +623,13 @@ void reset() {
       }
     }
 
+    // some thing like STATE_TRANSITION.Key specific pool are not shut down.
+    for (String msgType : _executorMap.keySet()) {
+      ExecutorService pool = _executorMap.remove(msgType);

Review comment:
       If the pool is created in user logic and passed in to construct the 
executor, then executor shall not shutdown it here. 

##########
File path: helix-core/src/main/java/org/apache/helix/task/TaskDriver.java
##########
@@ -969,17 +969,29 @@ public TaskState pollForJobState(String workflowName, 
String jobName, long timeo
     Set<TaskState> allowedStates = new HashSet<>(Arrays.asList(states));
     // Wait for state
     long st = System.currentTimeMillis();
+    long ct = 0;
     do {
       Thread.sleep(timeToSleep);
       ctx = getWorkflowContext(workflowName);
+      ct =  System.currentTimeMillis();
     } while ((ctx == null || ctx.getJobState(jobName) == null
         || !allowedStates.contains(ctx.getJobState(jobName)))
-        && System.currentTimeMillis() < st + timeout);
+        && ct < st + timeout);
 
     if (ctx == null || !allowedStates.contains(ctx.getJobState(jobName))) {

Review comment:
       I guess you can simplify this if-else. It is not quite readable. Please 
avoid the redundant check and add some comments here.

##########
File path: helix-core/src/test/java/org/apache/helix/TestHelper.java
##########
@@ -794,7 +797,12 @@ public static boolean verify(Verifier verifier, long 
timeout) throws Exception {
     long start = System.currentTimeMillis();
     do {
       boolean result = verifier.verify();
-      if (result || (System.currentTimeMillis() - start) > timeout) {
+      long now = System.currentTimeMillis();
+      if (result || (now - start) > timeout) {

Review comment:
       This code looks chaotic. Please simplify.
   
   boolean result = verifier.verify();
   boolean timeout = (System.currentTimeMillis() - start) > timeout;
   if (result || timeout) {
     if (timeout) {
       LOG.error()
     }
     return result;
   }

##########
File path: 
helix-core/src/main/java/org/apache/helix/tools/ClusterVerifiers/BestPossibleExternalViewVerifier.java
##########
@@ -224,6 +224,9 @@ public boolean verifyByZkCallback(long timeout) {
   @Override
   protected synchronized boolean verifyState() {
     try {
+      if (_isLogMore) {

Review comment:
       I guess this helps your debug, but are you suggesting we check this into 
the master branch? In general, I don't like optional code in master.

##########
File path: 
helix-core/src/main/java/org/apache/helix/tools/ClusterStateVerifier.java
##########
@@ -181,6 +181,12 @@ private BestPossAndExtViewZkVerifier(HelixZkClient 
zkClient, String clusterName,
       this.resources = resources;
     }
 
+    public void close() {

Review comment:
       Is this class deprecated? Instead of modifying it, can we just change 
the callers to use the newer verifier?

##########
File path: helix-core/src/main/java/org/apache/helix/task/TaskDriver.java
##########
@@ -72,7 +72,7 @@
   private static final Logger LOG = LoggerFactory.getLogger(TaskDriver.class);
 
   /** Default time out for monitoring workflow or job state */
-  private final static int DEFAULT_TIMEOUT = 5 * 60 * 1000; /* 5 mins */
+  private final static int DEFAULT_TIMEOUT = 15 * 60 * 1000; /* 15 mins, same 
as default in TestNg.xml */

Review comment:
       I think this impacts our features. ANd this number is not randomly set. 
Please don't change the default value. If it fails the test, I think we can 
just use the method in which you can specify the timeout explicitly.
   
   And, I do want to understand why it will help the test. Could you please 
clarify?

##########
File path: helix-core/src/test/conf/testng.xml
##########
@@ -18,7 +18,7 @@
   ~ under the License.
   -->
 <!DOCTYPE suite SYSTEM "http://testng.org/testng-1.0.dtd";>
-<suite name="Suite" time-out="300000">
+<suite name="Suite" time-out="900000">

Review comment:
       Does any test definitely need more than 5 minutes?

##########
File path: 
helix-core/src/main/java/org/apache/helix/tools/ClusterVerifiers/ZkHelixClusterVerifier.java
##########
@@ -48,6 +50,10 @@
   private static Logger LOG = 
LoggerFactory.getLogger(ZkHelixClusterVerifier.class);
   protected static int DEFAULT_TIMEOUT = 300 * 1000;
   protected static int DEFAULT_PERIOD = 500;
+  // COOL_DOWN before starting vefiyByPool
+  // The goal is to make sure waiting for controller pipeline starts at least 
one cycle
+  // to update ideal state.
+  protected static int DEFAULT_COOLDOWN = 2 * 1000;

Review comment:
       I think this is another workaround to avoid issue 
https://github.com/apache/helix/issues/526. If this is the case, then it would 
be an overkill to add to the general verifier logic. Because not all verifying 
logic depends on the pipeline running.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to