Hi there,
I have been learning Flink-related technologies recently and am currently using the combination of Flink CDC + Kafka + Flink. The pipeline looks like this: Flink CDC captures change data from multiple database tables and forwards it to Kafka, and Flink consumes the data for subsequent computation.

I have run into a problem: the captured multi-table data is strongly related, for example main tables and sub-tables whose changes usually occur within a single database transaction. My initial idea was to use Kafka partitions and route records by the transactionid of the CDC messages, so that a single consumer thread processes all the data belonging to one transaction. This handles the common case.

However, there is another case: a row in a single table may change several times in a short period, with each change belonging to a different transaction. Those changes can then land in different partitions and be consumed by different consumers, so there is some probability that the events are processed out of order. The only solution I can think of so far is to write all the data to a single partition, but that is far too inefficient.

So I would like to ask: is there a flaw in my thinking? Are there better solutions, and at which stage of the pipeline should they be applied?

Thanks,
Viktor
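
P.S. For concreteness, here is roughly what the transactionid-based routing I described looks like. This is only a simplified sketch: the class name TransactionIdPartitioner is made up, and it assumes the producer puts the CDC transactionid into the Kafka record key.

```java
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

/**
 * Sends every record of one transaction to the same partition by
 * hashing the transactionid, which is carried as the record key.
 * (Hypothetical sketch; assumes the key is never null.)
 */
public class TransactionIdPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionCountForTopic(topic);
        // keyBytes holds the serialized transactionid of the CDC message,
        // so all changes from one transaction hash to one partition.
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public void close() { }
}
```

The producer is configured with this class via the partitioner.class property, and each CDC event is sent with its transactionid as the record key. As described above, this keeps one transaction together but does not order changes to the same row across different transactions.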
