Hi, we're currently evaluating Debezium and Kafka as a CDC system for our Postgres database in order to build an indexing solution (e.g. Solr or OpenSearch).
Debezium captures changes per table and propagates each table's changes into its own dedicated Kafka topic. The ingested tables have multiple relationships of all kinds (1:1, 1:n, n:m), and aggregated data from those tables should eventually be reflected in composite documents.

To illustrate this, assume the following heavily simplified table layout:

- customer(id, name, firstname)
- communication(id, customer_id, value)
- label(id, customer_id, value)
- (maybe more dependent tables related to customers 1:n or 1:1 or even n:m; not considered here for brevity)

All tables are streamed into Kafka topics as described above. Now, in order to produce an output customer aggregated with all associated communication and label values (i.e. aggregated as lists), what would be the most elegant solution leveraging Kafka Streams?

Note that customers do not necessarily have any communication or label at all, thus non-key joins are out of the game as far as I understand.

Our initial (naive) solution was to re-key the dependent KTables and then left-join them, but that involves a lot of steps and intermediate state stores, especially considering that the example above is heavily simplified:

    KTable<Long, Customer> customers = streamBuilder.table(customerTopic, Consumed.with(<Serdes>));
    KTable<Long, Comm> comm = streamBuilder.table(commTopic, Consumed.with(<Serdes>));
    KTable<Long, Label> labels = streamBuilder.table(labelTopic, Consumed.with(<Serdes>));

    // Re-keying: group the dependent tables by customer_id
    KTable<Long, GroupedComm> groupedComm = comm.groupBy(...).aggregate(...);
    KTable<Long, GroupedLabel> groupedLabels = labels.groupBy(...).aggregate(...);

    // Join: left joins keep customers without any communication or label
    KTable<Long, AggregateCustomer> aggregated = customers
        .leftJoin(groupedComm, ...)
        .leftJoin(groupedLabels, ...)
        .groupBy(...)
        .aggregate(...);

Are there more efficient / less naive / more elegant / simpler solutions to this? Co-grouping the input streams did not yield the expected results...

Best wishes,
Karsten
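
P.S. To make the desired result concrete, here is a minimal plain-Java sketch of the output we are after. It deliberately does not use Kafka Streams at all; it only illustrates the left-join semantics (every customer appears, with possibly empty lists). The class name, the record types and the `aggregate` helper are hypothetical names made up for this illustration, and the sample values are invented. It assumes Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of the desired result only; no Kafka Streams involved.
class CustomerAggregationSketch {

    // Hypothetical row types mirroring the simplified tables above.
    record Customer(long id, String name, String firstname) {}
    record Communication(long id, long customerId, String value) {}
    record Label(long id, long customerId, String value) {}

    // Hypothetical shape of the composite document to be indexed.
    record AggregateCustomer(Customer customer,
                             List<String> communications,
                             List<String> labels) {}

    static Map<Long, AggregateCustomer> aggregate(List<Customer> customers,
                                                  List<Communication> comms,
                                                  List<Label> labels) {
        Map<Long, AggregateCustomer> result = new LinkedHashMap<>();
        // Start from the customers so that every customer appears (left-join semantics).
        for (Customer c : customers) {
            result.put(c.id(), new AggregateCustomer(c, new ArrayList<>(), new ArrayList<>()));
        }
        // Attach dependent rows; rows pointing at an unknown customer are skipped.
        for (Communication comm : comms) {
            AggregateCustomer agg = result.get(comm.customerId());
            if (agg != null) {
                agg.communications().add(comm.value());
            }
        }
        for (Label label : labels) {
            AggregateCustomer agg = result.get(label.customerId());
            if (agg != null) {
                agg.labels().add(label.value());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Customer> customers = List.of(
                new Customer(1L, "Doe", "Jane"),
                new Customer(2L, "Roe", "Max")); // has no communication and no label
        List<Communication> comms = List.of(
                new Communication(10L, 1L, "jane@example.com"),
                new Communication(11L, 1L, "+49 000"));
        List<Label> labels = List.of(new Label(20L, 1L, "vip"));

        // Customer 2 is still present, with two empty lists.
        aggregate(customers, comms, labels).values().forEach(System.out::println);
    }
}
```

The point is only that customer 2 must still show up in the output with empty lists, which is why we ended up with left joins on re-keyed tables in the topology above.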