[
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044885#comment-16044885
]
Guozhang Wang commented on KAFKA-5415:
--
[~apurva] [~hachikuji] The root cause is in a couple folds:
1) First in {{TransactionStateManager#appendTransactionToLog}}, when we
received an error while trying to append to log (in our case it is
`NotEnoughReplicas`), we will reset the pending state: {{metadata.pendingState
= None}}. This behavior is correct for all its callers EXCEPT the one from
{{TransactionMarkerChannelManager#tryAppendToLog}}, in which {{appendCallback}}
will retry appending again but this time the pending state has been reset.
So when the retry finally succeed, it will call
{{TransactionMetadata#completeTransitionTo}} we will throw an exception since
the pending state is `None`.
2) This exception is thrown all the way to {{KafkaApi#handleFetchRequest}},
because it is the thread that handles the follower fetch request, and then
increment the HW, and then call {{tryCompleteDelayedRequests}} which will throw
the exception. This exception will be handled in {{handleError}}, in which we
do not print ANY logging but simply returns an error to the follower fetcher.
And hence from the logs it is a mystery that it seems the callback was not
called; in fact it did get called by throw an exception that get silently
swallowed.
The key log entries the disclose this is in worker7:
{code}
[2017-06-09 01:16:54,132] DEBUG Partition [__transaction_state,37] on broker 1:
High watermark for partition [__transaction_state,37] updated to 86 [0 : 14667]
(kafka.cluster.Partition)
[2017-06-09 01:16:54,132] DEBUG [Replica Manager on Broker 1]: Request key
__transaction_state-37 unblocked 0 fetch requests. (kafka.server.ReplicaManager)
{code}
Note it does not have any "unblocked x produce requests" after "unblocked 0
fetch requests", while in {{tryCompleteDelayedRequests}} we always try to
complete fetch, then produce, then delete requests; and in worker2:
{code}
[2017-06-09 01:16:54,136] ERROR [ReplicaFetcherThread-0-1]: Error for partition
[__consumer_offsets,22] to broker
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced
an unexpected error when processing the request
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,136] ERROR [ReplicaFetcherThread-0-1]: Error for partition
[__consumer_offsets,8] to broker
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced
an unexpected error when processing the request
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition
[__consumer_offsets,21] to broker
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced
an unexpected error when processing the request
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition
[__consumer_offsets,4] to broker
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced
an unexpected error when processing the request
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition
[__consumer_offsets,7] to broker
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced
an unexpected error when processing the request
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition
[__consumer_offsets,9] to broker
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced
an unexpected error when processing the request
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition
[__consumer_offsets,46] to broker
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced
an unexpected error when processing the request
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition
[__consumer_offsets,25] to broker
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced
an unexpected error when processing the request
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition
[__consumer_offsets,35] to broker
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced
an unexpected error when processing the request
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition
[__consumer_offsets,41] to broker
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced
an unexpected error when processing the request
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for