alirezazamani commented on a change in pull request #1422:
URL: https://github.com/apache/helix/pull/1422#discussion_r500415345



##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -122,30 +122,37 @@ public void updatePreviousAssignedTasksStatus(
       Set<Integer> donePartitions = new TreeSet<>();
       for (int pId : pSet) {
         final String pName = pName(jobResource, pId);
-        TaskPartitionState currState = updateJobContextAndGetTaskCurrentState(currStateOutput,
+        TaskPartitionState currState = getTaskCurrentState(currStateOutput,
             jobResource, pId, pName, instance, jobCtx, jobTgtState);
 
-        if (!instance.equals(jobCtx.getAssignedParticipant(pId))) {
-          LOG.warn(
-              "Instance {} does not match the assigned participant for pId {} in the job context. Skipping task scheduling.",
-              instance, pId);
-          continue;
-        }
-
         // Check for pending state transitions on this (partition, instance). If there is a pending
         // state transition, we prioritize this pending state transition and set the assignment from
         // this pending state transition, essentially "waiting" until this pending message clears
+        // If there is a pending message, we should not continue to update the context because, from
+        // the controller's perspective, the state transition has not been completed yet if the pending
+        // message still exists.
+        // If the context gets updated here, the controller might remove the job from RunTimeJobDAG,
+        // which can cause the task's CurrentState not to be removed when there is a pending message
+        // for that task.
         Message pendingMessage =
             currStateOutput.getPendingMessage(jobResource, new Partition(pName), instance);
-        if (pendingMessage != null && !pendingMessage.getToState().equals(currState.name())) {

Review comment:
   It is very complicated to explain:
   1- The controller sends a RUNNING -> COMPLETED message.
   2- The participant sets the current state to COMPLETED, but the pending message has not been removed yet.
   3- In the next pipeline run (the pending message's toState is COMPLETED and the current state is COMPLETED), the condition of this if statement is not satisfied. Hence, the controller decides on COMPLETED -> DROPPED without considering the pending message.
   4- The controller marks the job as completed and removes it from the DAG (so it will never consider this job again unless a controller switch happens, etc.).
   5- Since the controller still sees the pending message, the message generation phase does not act on the new COMPLETED -> DROPPED decision, so the CS (current state) will never become DROPPED.
   The conclusion is that the controller should not treat a state transition as done until the pending message is gone from ZK.
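To make steps 2 and 3 concrete, here is a minimal, self-contained Java sketch contrasting the removed guard (`pendingMessage != null && !toState.equals(currState)`) with the rule argued for above (any pending message means the transition is not done yet). `TaskState`, `PendingMessage`, `oldGuardWaits`, and `newGuardWaits` are hypothetical stand-ins, not Helix's real `TaskPartitionState`/`Message` API; the real code compares the message's `toState` string against `currState.name()`, which is simplified here to an enum comparison. `newGuardWaits` only illustrates the argument and is not necessarily the code that was merged in this PR.

```java
import java.util.Optional;

public class PendingMessageGuard {

  /** Simplified task states; a hypothetical stand-in for Helix's TaskPartitionState. */
  enum TaskState { RUNNING, COMPLETED, DROPPED }

  /** Hypothetical pending message: only its target state matters for this sketch. */
  static final class PendingMessage {
    final TaskState toState;

    PendingMessage(TaskState toState) {
      this.toState = toState;
    }
  }

  /** Removed guard: wait only if the pending message targets a different state. */
  static boolean oldGuardWaits(Optional<PendingMessage> pending, TaskState currState) {
    return pending.isPresent() && pending.get().toState != currState;
  }

  /** Guard argued for in the review: any pending message means "not done yet". */
  static boolean newGuardWaits(Optional<PendingMessage> pending) {
    return pending.isPresent();
  }

  public static void main(String[] args) {
    // Step 2 of the scenario: the participant has already written COMPLETED to the
    // current state, but the RUNNING -> COMPLETED message is still in ZK.
    Optional<PendingMessage> pending = Optional.of(new PendingMessage(TaskState.COMPLETED));
    TaskState currState = TaskState.COMPLETED;

    // Old guard falls through (prints false), letting the controller proceed as in step 3.
    System.out.println("old guard waits: " + oldGuardWaits(pending, currState));
    // New guard keeps waiting (prints true) until the pending message is gone.
    System.out.println("new guard waits: " + newGuardWaits(pending));
  }
}
```

With the old guard, the matching `toState` makes the condition false, which is exactly the gap described in step 3; the stricter guard closes it at the cost of waiting one extra pipeline run for the message to be cleaned up.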
