hudeqi created KAFKA-16543:
------------------------------

             Summary: There may be ambiguous deletions in the `cleanupGroupMetadata` when the generation of the group is less than or equal to 0
                 Key: KAFKA-16543
                 URL: https://issues.apache.org/jira/browse/KAFKA-16543
             Project: Kafka
          Issue Type: Bug
          Components: group-coordinator
    Affects Versions: 3.6.2
            Reporter: hudeqi
            Assignee: hudeqi

In the `cleanupGroupMetadata` method, a tombstone message is written to delete the group's MetadataKey only when the group is in the Dead state and its generation is greater than 0. The code comment explains: 'We avoid writing the tombstone when the generationId is 0, since this group is only using Kafka for offset storage.' This means that groups that use Kafka only for offset storage are not supposed to be deleted.

However, this does not cover every case. For example, Flink commits offsets with a generationId equal to -1. If ApiKeys.DELETE_GROUPS is called to delete such a group, Flink's group metadata will never be deleted. Yet the preceding logic in the same method has already cleaned up the commitKey entries by writing tombstone messages for `removedOffsets`.

The actual result is that the group no longer exists (since the offsets have been cleaned up, the group cannot be added back to the `groupMetadataCache` unless offsets are committed again with the same group name), but the corresponding group metadata still exists in __consumer_offsets. Deleting the group therefore does not completely clean up its related information.

The group's state is set to Dead only in the following three situations:

1. The group information is unloaded.
2. The group is deleted by ApiKeys.DELETE_GROUPS.
3. All offsets of the group have expired or been removed.

Therefore, since the group is already in the Dead state and has been removed from the `groupMetadataCache`, why not directly clean up all of the group's information, even if the group is only used for storing offsets?
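The condition described in this report can be illustrated with a small, self-contained Java model. This is only a sketch and not Kafka's actual code: the class `TombstoneSketch`, the methods `currentTombstones` and `proposedTombstones`, the `GroupState` enum, and the string key formats are all hypothetical names chosen for illustration. It assumes only what the description states, namely that offset keys are always tombstoned and that the group metadata key is tombstoned only when the group is Dead and its generation is greater than 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified model of the tombstone decision; hypothetical, not Kafka's real implementation. */
final class TombstoneSketch {

    enum GroupState { EMPTY, STABLE, DEAD }

    /** Behavior as described in the report: metadata tombstone needs DEAD and generation > 0. */
    static List<String> currentTombstones(String groupId, GroupState state, int generationId,
                                          List<String> removedOffsetKeys) {
        // Offset commit keys are tombstoned regardless of the generation.
        List<String> tombstones = new ArrayList<>(removedOffsetKeys);
        if (state == GroupState.DEAD && generationId > 0) {
            tombstones.add("group-metadata:" + groupId);
        }
        return tombstones;
    }

    /** Behavior suggested by the report: a DEAD group is always cleaned up completely. */
    static List<String> proposedTombstones(String groupId, GroupState state, int generationId,
                                           List<String> removedOffsetKeys) {
        List<String> tombstones = new ArrayList<>(removedOffsetKeys);
        if (state == GroupState.DEAD) {
            // generationId is deliberately ignored here.
            tombstones.add("group-metadata:" + groupId);
        }
        return tombstones;
    }

    public static void main(String[] args) {
        // A group that committed offsets with generationId -1, then was deleted (state DEAD).
        List<String> offsets = Arrays.asList("offset:flink-job:topic-0");
        System.out.println(currentTombstones("flink-job", GroupState.DEAD, -1, offsets));
        System.out.println(proposedTombstones("flink-job", GroupState.DEAD, -1, offsets));
    }
}
```

With a generation of -1, the first call returns only the offset key, so the metadata record would be left behind in __consumer_offsets, while the second call also includes the group metadata key. For a generation greater than 0 the two methods agree.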