jiajunwang commented on a change in pull request #1500:
URL: https://github.com/apache/helix/pull/1500#discussion_r514911447
##########
File path:
helix-core/src/main/java/org/apache/helix/messaging/handling/HelixTaskExecutor.java
##########
@@ -804,40 +809,66 @@ public void onMessage(String instanceName, List<Message>
messages,
// skip the following operations for the no-op messages.
continue;
}
- // create message handlers, if handlers not found, leave its state as NEW
+
NotificationContext msgWorkingContext = changeContext.clone();
+ MessageHandler msgHandler = null;
try {
- MessageHandler msgHandler = createMessageHandler(message,
msgWorkingContext);
- if (msgHandler == null) {
- // Failed to create message handler, skip processing this message in
this callback.
- // The same message process will be retried in the next round.
- continue;
- }
- if (message.getMsgType().equals(MessageType.STATE_TRANSITION.name())
|| message.getMsgType()
- .equals(MessageType.STATE_TRANSITION_CANCELLATION.name())) {
- if (validateAndProcessStateTransitionMessage(message, instanceName,
manager,
- stateTransitionHandlers, msgHandler)) {
- // Need future process by triggering state transition
- String msgTarget =
- getMessageTarget(message.getResourceName(),
message.getPartitionName());
- stateTransitionHandlers.put(msgTarget, msgHandler);
- stateTransitionContexts.put(msgTarget, msgWorkingContext);
- } else {
- // skip the following operations for the invalid/expired state
transition messages.
- continue;
- }
+ // create message handlers, if handlers not found but no exception,
leave its state as NEW
+ msgHandler = createMessageHandler(message, msgWorkingContext);
+ } catch (Exception ex) {
+ // Failed to create message handler and there is an Exception.
+ int remainingRetryCount = message.getRetryCount();
+ LOG.error(
+ "Exception happens when creating Message Handler for message {}.
Current remaining retry count is {}.",
+ message.getMsgId(), remainingRetryCount);
+ // Set the message retry count to avoid infinite retrying.
+ message.setRetryCount(remainingRetryCount - 1);
+ message.setExecuteSessionId(sessionId);
Review comment:
It is a good point.
There were some discussions earlier yesterday about whether we want this
retry count. So the decision is that we want to allow several retries, but
would be a very small number. Note that this PR won't really add retry to state
transition messages, since the default value of retry count is 0. We will
tackle this later in the other PR.
Now regarding if we want to use an in-memory object instead of ZK, it is
totally doable. But I don't prefer doing so for 2 reasons.
1. This ZK write won't make it worse. The retry is not infinite after this
change. Compare with the current behavior that we keep retrying infinitely, the
new retry logic is much safer. And we can set 0 to the retry count, so no retry
at all. It is purely optional.
2. In-memory count won't work if the participant instance is reset. If the
retry count is really large, then in some corner case (like frequent
participant reset) the retry will never stop.
In addition, the in-memory design requires us to track the messages, it does
not exist on the participant side today. So definitely, it will increase the
complicity of code. There will be more corner cases that we need to consider.
Like memory leakage, etc. Given the benefit of doing so is not critical, I
prefer not to choose this option.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]