alirezazamani opened a new issue #1143:
URL: https://github.com/apache/helix/issues/1143


   The cache has a flag called 
_existsLiveInstanceOrCurrentStateChange, which is mainly used by the Task 
Framework (TF) logic to check whether a task's target partition has moved to a 
new instance. However, there is a possible race condition here. Since 
CurrentState and Message live in two different folders and are updated 
separately, if a cache refresh happens between the two updates, we might lose 
the notification that the target partition has moved. (This theory has been 
confirmed with a test, and it may be a reason for some existing flaky tests for 
targeted jobs.) Also, if the currentState has changed and we still have a 
pending message for the partition, we do not make any decision for this 
partition/task. To resolve this issue, we might also want to consider message 
changes for this flag. The code could look something like this:
   ```
     private void refreshClusterStateChangeFlags(Set<HelixConstants.ChangeType> propertyRefreshed) {
       _existsLiveInstanceOrCurrentStateChange =
           _propertyDataChangedMap.get(HelixConstants.ChangeType.CURRENT_STATE).getAndSet(false)
               || _propertyDataChangedMap.get(HelixConstants.ChangeType.MESSAGE).getAndSet(false)
               || propertyRefreshed.contains(HelixConstants.ChangeType.CURRENT_STATE)
               || propertyRefreshed.contains(HelixConstants.ChangeType.LIVE_INSTANCE);
     }
   ```
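
To illustrate the effect of the extra MESSAGE term, here is a minimal, self-contained sketch. It is not the actual Helix cache: `ChangeFlagSketch`, its local `ChangeType` enum, `notifyDataChange`, and both `refresh...` methods are hypothetical stand-ins, and the "without message" variant is only an assumed baseline (the proposed expression minus the MESSAGE term), not a copy of the existing code.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the cache; NOT the real Helix class.
class ChangeFlagSketch {
  // Simplified stand-in for HelixConstants.ChangeType (assumption).
  enum ChangeType { CURRENT_STATE, MESSAGE, LIVE_INSTANCE }

  // One "data changed" marker per change type, set by notifications.
  private final Map<ChangeType, AtomicBoolean> changed = new EnumMap<>(ChangeType.class);

  ChangeFlagSketch() {
    for (ChangeType t : ChangeType.values()) {
      changed.put(t, new AtomicBoolean(false));
    }
  }

  // Records that a change notification arrived for the given type.
  void notifyDataChange(ChangeType type) {
    changed.get(type).set(true);
  }

  // Assumed baseline: the flag ignores MESSAGE changes.
  boolean refreshWithoutMessage(Set<ChangeType> propertyRefreshed) {
    return changed.get(ChangeType.CURRENT_STATE).getAndSet(false)
        || propertyRefreshed.contains(ChangeType.CURRENT_STATE)
        || propertyRefreshed.contains(ChangeType.LIVE_INSTANCE);
  }

  // Proposed variant: a MESSAGE change also raises the flag.
  // Note that || short-circuits, so when CURRENT_STATE was marked,
  // the MESSAGE marker is not cleared until a later refresh.
  boolean refreshWithMessage(Set<ChangeType> propertyRefreshed) {
    return changed.get(ChangeType.CURRENT_STATE).getAndSet(false)
        || changed.get(ChangeType.MESSAGE).getAndSet(false)
        || propertyRefreshed.contains(ChangeType.CURRENT_STATE)
        || propertyRefreshed.contains(ChangeType.LIVE_INSTANCE);
  }
}
```

In this sketch, if only a MESSAGE notification has arrived before a refresh, `refreshWithoutMessage` returns false while `refreshWithMessage` returns true, which is the case the proposal is meant to cover.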




