Hello Kafka community,

Refining step 2, I have some questions:

- Is the non-determinism of the KTable join a real problem?
- How is the KTable join implemented?
- Do you think the solution outlined below is a step in the right direction?
- Will a future version of the KTable join implement such a strategy, on demand or through configuration?

Thank you,

1. Kafka Streams, part 1:
- Map topic1 into: key = the join key between topic1 and topic2; value = topic1Or2UnionPayload.
- Map topic2 into: key = the join key between topic1 and topic2; value = topic1Or2UnionPayload.
- Merge the two mapped streams into a single stream; elements that need to be joined have identical keys in both mapped streams.
- Write the result to a new topic.

2. Transformation, part 2 (Kafka Streams + Processor):

- Consume the topic produced above, so that the mapped messages from both sides arrive on a single pipe, linearised per join key.
- groupBy the join key.
- Pipe each resulting stream into a Processor that stores the data for both sides of the join and, as soon as it has both, starts emitting joined records.
- Write the joined records to the result topic (the safe join result).

On Tue, 14 Jul 2020 at 12:43, Dumitru-Nicolae Marasoui <nicolae.maras...@kaluza.com> wrote:

> Hello kafka community,
>
> As I understand it, in a Kafka Streams join that involves a KTable, “the KTable lookup is done on the current KTable state, and thus, out-of-order records can yield non-deterministic result” [1].
>
> Does the solution below, involving an intermediate topic, sound right to you?
>
> 1. Kafka Streams, part 1:
>
> - Map topic1 into: key = the join key between topic1 and topic2; value = topic1Or2UnionPayload.
> - Map topic2 into: key = the join key between topic1 and topic2; value = topic1Or2UnionPayload.
> - Merge the two mapped streams into a single stream; elements that need to be joined have identical keys in both mapped streams.
> - Write the result to a new topic.
>
> 2. Transformation, part 2 (a Kafka consumer/processor, or Kafka Streams if a suitable stateful transformation can be applied):
>
> - Consume the topic produced above, so that the mapped messages from both sides arrive on a single pipe, linearised per join key.
> - State is needed, most likely a database (if Kafka Streams is applicable, great: it already embeds one).
> - When both sides for the same join key are in the state, a pair can be emitted downstream.
>
> Does this make sense to you? Do you have experience with other approaches that you would like to share? Thank you
>
> [1] https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Join+Semantics
>
> Dumitru-Nicolae Marasoui
>
> Software Engineer
>
> w kaluza.com <https://www.kaluza.com/>
>
> LinkedIn <https://www.linkedin.com/company/kaluza> | Twitter <https://twitter.com/Kaluza_tech>
>
> Kaluza Ltd. registered in England and Wales No. 08785057
>
> VAT No. 100119879
>
> Help save paper - do you need to print this email?

--
Dumitru-Nicolae Marasoui
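The two steps above can be checked in isolation with a plain-Java model, without running Kafka. This is a minimal in-memory sketch under stated assumptions, not the Kafka Streams API: the names `SafeJoinSketch`, `Side`, `UnionPayload`, `Keyed`, `Joined` and `BufferingJoiner` are hypothetical, a `List` stands in for the merged intermediate topic, and a `HashMap` stands in for the Processor's state store. Step 1 maps each topic's records into one `(joinKey, UnionPayload)` shape; step 2 buffers the latest value of each side per join key and emits a joined record once both sides are present.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java, in-memory sketch of the two-step "safe join"; NOT the Kafka Streams API.
// All names here are hypothetical and chosen for illustration only.
public class SafeJoinSketch {

    // Which input a record came from (stands in for topic1 / topic2).
    enum Side { TOPIC1, TOPIC2 }

    // Hypothetical stand-in for topic1Or2UnionPayload: one value type for either topic.
    record UnionPayload(Side side, String value) {}

    // A record re-keyed by the join key, as it would sit on the intermediate topic.
    record Keyed(String joinKey, UnionPayload payload) {}

    // A joined output record destined for the result topic.
    record Joined(String joinKey, String topic1Value, String topic2Value) {}

    // Step 1: map a source record into the shared (joinKey, UnionPayload) shape.
    static Keyed toUnion(Side side, String joinKey, String value) {
        return new Keyed(joinKey, new UnionPayload(side, value));
    }

    // Step 2: per-join-key buffer; the HashMap stands in for a processor's state store.
    static class BufferingJoiner {
        private final Map<String, Map<Side, String>> state = new HashMap<>();

        // Remember the latest value for this side; once both sides are buffered,
        // emit a joined record (and re-emit on every later update to either side).
        List<Joined> process(Keyed record) {
            Map<Side, String> sides =
                state.computeIfAbsent(record.joinKey(), k -> new HashMap<>());
            sides.put(record.payload().side(), record.payload().value());

            List<Joined> out = new ArrayList<>();
            if (sides.containsKey(Side.TOPIC1) && sides.containsKey(Side.TOPIC2)) {
                out.add(new Joined(record.joinKey(),
                                   sides.get(Side.TOPIC1),
                                   sides.get(Side.TOPIC2)));
            }
            return out;
        }
    }

    public static void main(String[] args) {
        // Step 1 output: both topics mapped and merged into one stream.
        // The interleaving is arbitrary; for "k1", topic2's record arrives first.
        List<Keyed> merged = List.of(
            toUnion(Side.TOPIC2, "k1", "b1"),
            toUnion(Side.TOPIC1, "k2", "a2"),
            toUnion(Side.TOPIC1, "k1", "a1"),
            toUnion(Side.TOPIC2, "k2", "b2"));

        // Step 2: feed the merged stream, in order, through the buffering joiner.
        BufferingJoiner joiner = new BufferingJoiner();
        for (Keyed record : merged) {
            for (Joined joined : joiner.process(record)) {
                System.out.println(joined);
            }
        }
    }
}
```

Because the intermediate topic is keyed by the join key, records for the same key should land in the same partition under the default key-based partitioning, which is what gives the processor a single ordered sequence per key. In this sketch the first joined record for a key is emitted when the second side arrives, whichever side that is; later updates to either side re-emit the pair with the latest values. The sketch never evicts buffered state, so a real processor would likely need some retention or cleanup policy, which is left open here.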