Hello kafka community, Writing black on white to be more visible, This is a thought on making join more clear to me and less prone to concurrency issues that would be risky, not knowing the underlying implementation of join: Waiting your feedback, Thanks,
1. kafka streams 1: map topic1 in: key: the key as the join key between topic1 and topic2; value: topic1Or2UnionPayload, map topic2 in key: the key as the join key between topic1 and topic2; value: topic1Or2UnionPayload, merge mapped topics above into a single stream; the keys are identical for elements that need to be joined on both mapped topics/streams write result to a new topic 2. kafka streams 2 (with join processor) input the topic result from above with both mapped messages coming on a single pipe, linearised by join key groupBy join key pipe each outcoming stream into a Processor that will store data for both sides of the join and as soon as it has both will start emitting joined records write joined records to the result topic (the safe join result) On Wed, 15 Jul 2020 at 11:25, Dumitru-Nicolae Marasoui < nicolae.maras...@kaluza.com> wrote: > Hello kafka community, > Refining the step 2 and some questions: > - is the indeterminism of the ktable join a real problem? > - how is the ktable join implemented? > - do you think the solution outlined is a step in the right direction? > - does the ktable join implement such a strategy in a future version upon > configuration/demand? > Thank you, > 1. kafka streams 1: > > - map topic1 in: key: the key as the join key between topic1 and > topic2; value: topic1Or2UnionPayload, > - map topic2 in key: the key as the join key between topic1 and > topic2; value: topic1Or2UnionPayload, > - merge mapped topics above into a single stream; the keys are > identical for elements that need to be joined on both mapped topics/streams > - write result to a new topic > - > > 2. transformation part 2: (kafka kafka streams+processor): > > - input the topic result from above with both mapped messages coming > on a single pipe, linearised by join key > - groupBy join key > - pipe each outcoming stream into a Processor that will store data for > both sides of the join and as soon as it has both will start emitting > joined records > - write joined records to the result topic (the safe join result) > - > > > On Tue, 14 Jul 2020 at 12:43, Dumitru-Nicolae Marasoui < > nicolae.maras...@kaluza.com> wrote: > >> Hello kafka community, >> As I understand it, a kafka-streams join that involves a kTable: “the >> KTable lookup is done on the current KTable state, and thus, out-of-order >> records can yield non-deterministic result” [1] >> >> Does the solution below involving an intermediate topic sound right >> to you? >> 1. kafka streams 1: >> >> - map topic1 in: key: the key as the join key between topic1 and >> topic2; value: topic1Or2UnionPayload, >> - map topic2 in key: the key as the join key between topic1 and >> topic2; value: topic1Or2UnionPayload, >> - merge mapped topics above into a single stream; the keys are >> identical for elements that need to be joined on both mapped >> topics/streams >> - write result to a new topic >> - >> >> 2. transformation part 2: (kafka consumer/processor or kafka streams if a >> suitable stateful transformation can be applied): >> >> - input the topic result from above with both mapped messages coming >> on a single pipe, linearised by join key >> - a state is need, likely a database (in case kafka-streams is >> applicable, great, it already embeds one) >> - when any two sides for the same join key are in the state, a pair >> can be emitted downstream >> - >> >> Does this make sense to you? Do you have any other experiences of >> possible approaches that you would like to share? Thank you >> >> [1] >> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Join+Semantics >> >> >> Dumitru-Nicolae Marasoui >> >> Software Engineer >> >> >> >> w kaluza.com <https://www.kaluza.com/> >> >> LinkedIn <https://www.linkedin.com/company/kaluza> | Twitter >> <https://twitter.com/Kaluza_tech> >> >> Kaluza Ltd. registered in England and Wales No. 08785057 >> >> VAT No. 100119879 >> >> Help save paper - do you need to print this email? >> > > > -- > > Dumitru-Nicolae Marasoui > > Software Engineer > > > > w kaluza.com <https://www.kaluza.com/> > > LinkedIn <https://www.linkedin.com/company/kaluza> | Twitter > <https://twitter.com/Kaluza_tech> > > Kaluza Ltd. registered in England and Wales No. 08785057 > > VAT No. 100119879 > > Help save paper - do you need to print this email? > -- Dumitru-Nicolae Marasoui Software Engineer w kaluza.com <https://www.kaluza.com/> LinkedIn <https://www.linkedin.com/company/kaluza> | Twitter <https://twitter.com/Kaluza_tech> Kaluza Ltd. registered in England and Wales No. 08785057 VAT No. 100119879 Help save paper - do you need to print this email?