alirezazamani commented on a change in pull request #1422:
URL: https://github.com/apache/helix/pull/1422#discussion_r500415345
##########
File path:
helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -122,30 +122,37 @@ public void updatePreviousAssignedTasksStatus(
Set<Integer> donePartitions = new TreeSet<>();
for (int pId : pSet) {
final String pName = pName(jobResource, pId);
- TaskPartitionState currState = updateJobContextAndGetTaskCurrentState(currStateOutput,
+ TaskPartitionState currState = getTaskCurrentState(currStateOutput,
jobResource, pId, pName, instance, jobCtx, jobTgtState);
- if (!instance.equals(jobCtx.getAssignedParticipant(pId))) {
- LOG.warn(
- "Instance {} does not match the assigned participant for pId {} in the job context. Skipping task scheduling.",
- instance, pId);
- continue;
- }
-
// Check for pending state transitions on this (partition, instance). If there is a pending
// state transition, we prioritize this pending state transition and set the assignment from
// this pending state transition, essentially "waiting" until this pending message clears
+ // If there is a pending message, we should not continue to update the context because from
+ // controller prospective, state transition has not been completed yet if pending message
+ // still existed.
+ // If context gets updated here, controller might remove the job from RunTimeJobDAG which
+ // can cause the task's CurrentState not being removed when there is a pending message for
+ // that task.
Message pendingMessage =
currStateOutput.getPendingMessage(jobResource, new Partition(pName), instance);
- if (pendingMessage != null && !pendingMessage.getToState().equals(currState.name())) {
Review comment:
It is fairly complicated to explain:
1- The controller sends the RUNNING to COMPLETED message.
2- The participant sets the current state to COMPLETED, but the pending
message has not been removed yet.
3- In the next pipeline run (pending message toState is COMPLETED,
currentState is COMPLETED), this if statement is not satisfied. Hence, we
decide on COMPLETED to DROPPED without considering the pending message.
4- The controller marks the job as completed and removes it from the DAG (so
we will never consider this job again unless the controller switches, etc.).
5- Since the controller still sees the pending message, the message
generation phase does not act on the new decision (COMPLETED to DROPPED), so
the CS (current state) will never become DROPPED.
The conclusion is that the controller should not treat a state transition as
done until the pending message is gone from ZK.
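
The scenario in step 3 can be illustrated with a minimal standalone sketch. It is not Helix code: the class `PendingMessageGuard` and both method names are hypothetical, states are modeled as plain strings, and a `null` toState stands for "no pending message". The first method mirrors the condition on the removed line of the diff; the second is one possible reading of the conclusion above (any pending message means "keep waiting"), which may differ from what the PR actually implements.

```java
/**
 * Hypothetical illustration of the race described in the review comment.
 * Nothing here is part of the Helix API.
 */
final class PendingMessageGuard {

  /**
   * Mirrors the removed condition in the diff: the controller only waits
   * when the pending message targets a state other than the current state.
   */
  static boolean waitsUnderOldCheck(String pendingToState, String currentState) {
    return pendingToState != null && !pendingToState.equals(currentState);
  }

  /**
   * Assumed reading of the reviewer's conclusion: the transition is not
   * done while any pending message still exists, whatever its toState.
   */
  static boolean waitsUnderNewCheck(String pendingToState) {
    return pendingToState != null;
  }

  public static void main(String[] args) {
    // Step 3: the participant already reports COMPLETED, but the
    // RUNNING -> COMPLETED message has not been removed yet.
    String currentState = "COMPLETED";
    String pendingToState = "COMPLETED";

    // The old check does not wait, so the controller moves on to the
    // COMPLETED -> DROPPED decision and can drop the job from the DAG.
    System.out.println("old check waits: "
        + waitsUnderOldCheck(pendingToState, currentState));

    // The stricter check keeps waiting until the message is gone.
    System.out.println("new check waits: "
        + waitsUnderNewCheck(pendingToState));
  }
}
```

Under these assumptions, the old condition lets the pipeline proceed in exactly the window where the current state and the message's toState already match, which is the window the five steps above describe.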
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]