Lianet Magrans created KAFKA-16185:
--------------------------------------

             Summary: Fix client reconciliation of same assignment received in 
different epochs 
                 Key: KAFKA-16185
                 URL: https://issues.apache.org/jira/browse/KAFKA-16185
             Project: Kafka
          Issue Type: Sub-task
          Components: clients, consumer
            Reporter: Lianet Magrans
            Assignee: Lianet Magrans


Currently, the intention in the client state machine is that the client always 
reconciles whatever it has pending that has not been removed by the coordinator.

There is still an edge case where this does not happen and the client can get 
stuck: it is JOINING/RECONCILING with a pending (delayed) reconciliation, and 
it receives the same assignment again, but in a new epoch (e.g. after being 
FENCED). The first time it receives the assignment it takes no action, because 
it already has that assignment pending to reconcile. When the reconciliation 
completes, however, it discards the result because the epoch has changed. This 
is wrong. Note that after sending the assignment with the new epoch once, the 
broker keeps sending null assignments, so the client is not given the 
assignment again. 

Here is a sample sequence that leaves the client stuck in JOINING:
- client joins, epoch 0
- client receives assignment tp1 in epoch 1 and is stuck RECONCILING
- member gets FENCED on the coordinator, which bumps the epoch to 2
- client tries to rejoin (JOINING), sending epoch 0
- a new member is added to the group (group epoch bumped to 3); the client 
receives the same assignment it is currently trying to reconcile (tp1), but 
with epoch 3
- the previous reconciliation completes but discards its result, because it 
notices that memberHasRejoined (memberEpochOnReconciliationStart != 
memberEpoch). The client is stuck JOINING, with the server sending a null 
target assignment because it hasn't changed since the last one sent (tp1)
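
The sequence above can be replayed with a small standalone model. This is only 
a sketch and not the actual client code: the class ReconciliationSketch, its 
Member type, the method names and the keepResultIfSameTarget toggle are all 
hypothetical, and the "keep the result if the target assignment is unchanged" 
check is just one possible way to avoid the discard, not necessarily the fix 
that will be applied.

```java
import java.util.Set;

// Hypothetical, simplified model of the member state machine described above.
class ReconciliationSketch {
    enum State { JOINING, RECONCILING, STABLE }

    static class Member {
        // Assumption: illustrative toggle for one possible fix, not a real config.
        final boolean keepResultIfSameTarget;
        int memberEpoch = 0;
        State state = State.JOINING;
        Set<String> pending;      // target assignment waiting to be reconciled
        Set<String> reconciling;  // assignment the in-flight reconciliation started with
        int memberEpochOnReconciliationStart;
        Set<String> owned = Set.of();

        Member(boolean keepResultIfSameTarget) {
            this.keepResultIfSameTarget = keepResultIfSameTarget;
        }

        // A null target models the broker sending "no change" (null assignment).
        void onHeartbeatResponse(int epoch, Set<String> target) {
            memberEpoch = epoch;
            if (target == null || target.equals(pending)) {
                return; // nothing new, or already pending: take no action
            }
            pending = target;
            reconciling = target;
            memberEpochOnReconciliationStart = epoch;
            state = State.RECONCILING; // reconciliation starts but is delayed
        }

        void onFenced() {
            memberEpoch = 0;       // rejoin with epoch 0
            owned = Set.of();
            state = State.JOINING; // pending assignment is left in place
        }

        void onReconciliationComplete() {
            boolean memberHasRejoined = memberEpochOnReconciliationStart != memberEpoch;
            boolean sameTarget = reconciling != null && reconciling.equals(pending);
            if (memberHasRejoined && !(keepResultIfSameTarget && sameTarget)) {
                return; // result discarded: state and pending stay as they were
            }
            owned = reconciling;
            pending = null;
            state = State.STABLE;
        }
    }

    // Replays the sequence from this issue and returns the final member state.
    static State replay(boolean keepResultIfSameTarget) {
        Member m = new Member(keepResultIfSameTarget);
        m.onHeartbeatResponse(1, Set.of("tp1")); // assignment tp1, epoch 1
        m.onFenced();                            // fenced, rejoins with epoch 0
        m.onHeartbeatResponse(3, Set.of("tp1")); // same assignment, epoch 3
        m.onReconciliationComplete();            // delayed reconciliation finishes
        m.onHeartbeatResponse(3, null);          // broker now sends null assignments
        return m.state;
    }

    public static void main(String[] args) {
        System.out.println("epoch check only: " + replay(false));
        System.out.println("with same-target check: " + replay(true));
    }
}
```

In this model replay(false) ends in JOINING, which mirrors the stuck client, 
while replay(true) ends in STABLE because the completed reconciliation is kept 
when the pending target is still the one that was reconciled.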

(We should end up with a test similar to the existing 
#testDelayedReconciliationResultDiscardedIfMemberRejoins, but covering the case 
where the member receives the same assignment after being fenced and rejoining)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
