somandal commented on issue #3014:
URL: https://github.com/apache/helix/issues/3014#issuecomment-2807506987

   @junkaixue @zpinto it also looks like there is some behavior change 
regarding DROPPED state transitions
   
   We have a scenario where a state transition throws an exception, so the 
partition goes into the ERROR state. After that, we see the partition receive 
an ERROR -> DROPPED state transition:
   
   ```
   2025-04-15T16:52:40.1170674Z 16:52:39.924 ERROR [Server_localhost_22001 - 
SegmentOnlineOfflineStateModel] [HelixTaskExecutor-message_handle_thread_78] 
SegmentOnlineOfflineStateModel.onBecomeDroppedFromError() : 
ZnRecord=661f7f6c-5c85-440e-86c5-350adf2ee5ca, {CREATE_TIMESTAMP=1744735959917, 
ClusterEventName=PeriodicalRebalance, EXECUTE_START_TIMESTAMP=1744735959924, 
EXE_SESSION_ID=10000195d57000b, FROM_STATE=ERROR, 
MSG_ID=661f7f6c-5c85-440e-86c5-350adf2ee5ca, MSG_STATE=read, 
MSG_TYPE=STATE_TRANSITION, PARTITION_NAME=mytable__0__0__20250415T1643Z, 
READ_TIMESTAMP=1744735959921, RESOURCE_NAME=mytable_REALTIME, 
RESOURCE_TAG=mytable_REALTIME, RETRY_COUNT=3, SRC_NAME=localhost_20000, 
SRC_SESSION_ID=10000195d570003, STATE_MODEL_DEF=SegmentOnlineOfflineStateModel, 
STATE_MODEL_FACTORY_NAME=DEFAULT, TGT_NAME=Server_localhost_22001, 
TGT_SESSION_ID=10000195d57000b, TO_STATE=DROPPED}{}{}, Stat=Stat {_version=0, 
_creationTime=1744735959918, _modifiedTime=1744735959918, _ephemeralOwner=0}
   ```
   
   On debugging I do see it hit this new code:
   
   ```
         // Look through the current state map and add DROPPED message if the instance is not in the
         // resourceStateMap. This instance may not have had been dropped by the rebalance strategy.
         // This check is required to ensure that the instances removed from the ideal state stateMap
         // are properly dropped.
         for (String instance : currentStateMap.keySet()) {
           if (!instanceStateMap.containsKey(instance)) {
             instanceStateMap.put(instance, HelixDefinedState.DROPPED.name());
           }
         }
   ```
   
   I've attached some screenshots of the fields in the debugger. In Helix 
1.3.1, I see the same scenario (the instanceStateMap is empty for that 
partition and the current state has an entry in ERROR), but a DROPPED 
transition is never sent because the above code doesn't exist there.
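
To make the scenario concrete, here is a minimal standalone sketch that replays the loop quoted above on plain maps, using the values seen in the debugger (current state has one instance in ERROR, instanceStateMap is empty). The class name `DroppedSketch`, the method `addDropped`, and the hard-coded `"DROPPED"` string are hypothetical stand-ins for illustration; this is not the actual Helix code path and it has no Helix dependency.

```java
import java.util.HashMap;
import java.util.Map;

public class DroppedSketch {
    // Assumption: stand-in for HelixDefinedState.DROPPED.name(), hard-coded to avoid a Helix dependency.
    static final String DROPPED = "DROPPED";

    // Replays the quoted loop: any instance that has a current state but is
    // missing from the computed instanceStateMap gets a DROPPED target state.
    static Map<String, String> addDropped(Map<String, String> currentStateMap,
                                          Map<String, String> instanceStateMap) {
        Map<String, String> result = new HashMap<>(instanceStateMap);
        for (String instance : currentStateMap.keySet()) {
            if (!result.containsKey(instance)) {
                result.put(instance, DROPPED);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Current state as seen in the debugger: the partition is in ERROR on one instance.
        Map<String, String> currentStateMap = new HashMap<>();
        currentStateMap.put("Server_localhost_22001", "ERROR");

        // The computed instanceStateMap is empty for that partition.
        Map<String, String> instanceStateMap = new HashMap<>();

        // Prints {Server_localhost_22001=DROPPED}
        System.out.println(addDropped(currentStateMap, instanceStateMap));
    }
}
```

With the loop present, the ERROR replica ends up with a DROPPED target state, which matches the ERROR -> DROPPED message in the log. Without the loop, the map stays empty and no transition would be generated, which matches what I see on 1.3.1.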
   
   Can you folks elaborate on this behavior change and the reasoning behind 
it? Is this a bug or intended? Thanks!
   
   <img width="1223" alt="Image" src="https://github.com/user-attachments/assets/feceecfe-a1ad-4122-9719-19fc97c1adb9" />
   
   <img width="1259" alt="Image" src="https://github.com/user-attachments/assets/c609d638-bc08-4168-923b-322fb7205470" />
   
   <img width="1287" alt="Image" src="https://github.com/user-attachments/assets/6790b27b-f2d3-42ae-83fd-c676d9146617" />


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

