somandal commented on issue #3014:
URL: https://github.com/apache/helix/issues/3014#issuecomment-2807506987
@junkaixue @zpinto it also looks like there is a behavior change around DROPPED state transitions.
We have a scenario where a state transition throws an exception, so the partition goes into the ERROR state. After that, we see the partition receive an ERROR -> DROPPED state transition:
```
2025-04-15T16:52:40.1170674Z 16:52:39.924 ERROR [Server_localhost_22001 -
SegmentOnlineOfflineStateModel] [HelixTaskExecutor-message_handle_thread_78]
SegmentOnlineOfflineStateModel.onBecomeDroppedFromError() :
ZnRecord=661f7f6c-5c85-440e-86c5-350adf2ee5ca, {CREATE_TIMESTAMP=1744735959917,
ClusterEventName=PeriodicalRebalance, EXECUTE_START_TIMESTAMP=1744735959924,
EXE_SESSION_ID=10000195d57000b, FROM_STATE=ERROR,
MSG_ID=661f7f6c-5c85-440e-86c5-350adf2ee5ca, MSG_STATE=read,
MSG_TYPE=STATE_TRANSITION, PARTITION_NAME=mytable__0__0__20250415T1643Z,
READ_TIMESTAMP=1744735959921, RESOURCE_NAME=mytable_REALTIME,
RESOURCE_TAG=mytable_REALTIME, RETRY_COUNT=3, SRC_NAME=localhost_20000,
SRC_SESSION_ID=10000195d570003, STATE_MODEL_DEF=SegmentOnlineOfflineStateModel,
STATE_MODEL_FACTORY_NAME=DEFAULT, TGT_NAME=Server_localhost_22001,
TGT_SESSION_ID=10000195d57000b, TO_STATE=DROPPED}{}{}, Stat=Stat {_version=0,
_creationTime=1744735959918, _modifiedTime=1744735959918, _ephemeralOwner=0}
```
While debugging, I can see it hit this new code:
```
// Look through the current state map and add DROPPED message if the instance is not in the
// resourceStateMap. This instance may not have had been dropped by the rebalance strategy.
// This check is required to ensure that the instances removed from the ideal state stateMap
// are properly dropped.
for (String instance : currentStateMap.keySet()) {
  if (!instanceStateMap.containsKey(instance)) {
    instanceStateMap.put(instance, HelixDefinedState.DROPPED.name());
  }
}
```
I've attached some screenshots of the fields in the debugger. In Helix 1.3.1, I see the same scenario (the instanceStateMap is empty for that partition and the current state has an ERROR entry), but a DROPPED transition is never sent because the above code doesn't exist there.
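For reference, here is a minimal standalone sketch of the effect as I understand it, using plain maps instead of Helix classes. The class and method names are hypothetical, `DROPPED` is hard-coded as a stand-in for `HelixDefinedState.DROPPED.name()`, and the real code mutates `instanceStateMap` in place rather than returning a copy:

```java
import java.util.HashMap;
import java.util.Map;

public class DroppedBackfillSketch {
  // Stand-in for HelixDefinedState.DROPPED.name(), so the sketch has no Helix dependency.
  static final String DROPPED = "DROPPED";

  // Mirrors the loop quoted above: any instance that has a current state but no
  // entry in the computed state map is assigned DROPPED.
  static Map<String, String> backfillDropped(Map<String, String> currentStateMap,
      Map<String, String> instanceStateMap) {
    Map<String, String> result = new HashMap<>(instanceStateMap);
    for (String instance : currentStateMap.keySet()) {
      if (!result.containsKey(instance)) {
        result.put(instance, DROPPED);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Current state: the partition is in ERROR on this instance.
    Map<String, String> currentStateMap = new HashMap<>();
    currentStateMap.put("Server_localhost_22001", "ERROR");

    // Computed assignment for the partition is empty, as seen in the debugger.
    Map<String, String> instanceStateMap = new HashMap<>();

    // Prints {Server_localhost_22001=DROPPED}
    System.out.println(backfillDropped(currentStateMap, instanceStateMap));
  }
}
```

With an empty computed map and an ERROR entry in the current state, the instance ends up mapped to DROPPED, which matches the ERROR -> DROPPED message in the log above.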
Can you elaborate on this behavior change and the reasoning behind it? Is this a bug or intended? Thanks!
<img width="1223" alt="Image"
src="https://github.com/user-attachments/assets/feceecfe-a1ad-4122-9719-19fc97c1adb9"
/>
<img width="1259" alt="Image"
src="https://github.com/user-attachments/assets/c609d638-bc08-4168-923b-322fb7205470"
/>
<img width="1287" alt="Image"
src="https://github.com/user-attachments/assets/6790b27b-f2d3-42ae-83fd-c676d9146617"
/>