Re: ktable join & data loss / indeterminism risk prevention

Dumitru-Nicolae Marasoui Wed, 15 Jul 2020 03:26:21 -0700

Hello kafka community,
Refining the step 2 and some questions:
- is the indeterminism of the ktable join a real problem?
- how is the ktable join implemented?
- do you think the solution outlined is a step in the right direction?
- does the ktable join implement such a strategy in a future version upon
configuration/demand?
Thank you,
1. kafka streams 1:


   - map topic1 in: key: the key as the join key between topic1 and topic2;
   value: topic1Or2UnionPayload,
   - map topic2 in key: the key as the join key between topic1 and topic2;
   value: topic1Or2UnionPayload,
   - merge mapped topics above into a single stream; the keys are identical
   for elements that need to be joined on both mapped topics/streams
   - write result to a new topic
   -

2. transformation part 2: (kafka kafka streams+processor):

   - input the topic result from above with both mapped messages coming on
   a single pipe, linearised by join key
   - groupBy join key
   - pipe each outcoming stream into a Processor that will store data for
   both sides of the join and as soon as it has both will start emitting
   joined records
   - write joined records to the result topic (the safe join result)
   -


On Tue, 14 Jul 2020 at 12:43, Dumitru-Nicolae Marasoui <
nicolae.maras...@kaluza.com> wrote:

> Hello kafka community,
> As I understand it, a kafka-streams join that involves a kTable: “the
> KTable lookup is done on the current KTable state, and thus, out-of-order
> records can yield non-deterministic result” [1]
>
> Does the solution below involving an intermediate topic sound right to you?
> 1. kafka streams 1:
>
>    - map topic1 in: key: the key as the join key between topic1 and
>    topic2; value: topic1Or2UnionPayload,
>    - map topic2 in key: the key as the join key between topic1 and
>    topic2; value: topic1Or2UnionPayload,
>    - merge mapped topics above into a single stream; the keys are
>    identical for elements that need to be joined on both mapped topics/streams
>    - write result to a new topic
>    -
>
> 2. transformation part 2: (kafka consumer/processor or kafka streams if a
> suitable stateful transformation can be applied):
>
>    - input the topic result from above with both mapped messages coming
>    on a single pipe, linearised by join key
>    - a state is need, likely a database (in case kafka-streams is
>    applicable, great, it already embeds one)
>    - when any two sides for the same join key are in the state, a pair
>    can be emitted downstream
>    -
>
> Does this make sense to you? Do you have any other experiences of possible
> approaches that you would like to share? Thank you
>
> [1]
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Join+Semantics
>
>
> Dumitru-Nicolae Marasoui
>
> Software Engineer
>
>
>
> w kaluza.com <https://www.kaluza.com/>
>
> LinkedIn <https://www.linkedin.com/company/kaluza> | Twitter
> <https://twitter.com/Kaluza_tech>
>
> Kaluza Ltd. registered in England and Wales No. 08785057
>
> VAT No. 100119879
>
> Help save paper - do you need to print this email?
>


-- 

Dumitru-Nicolae Marasoui

Software Engineer



w kaluza.com <https://www.kaluza.com/>

LinkedIn <https://www.linkedin.com/company/kaluza> | Twitter
<https://twitter.com/Kaluza_tech>

Kaluza Ltd. registered in England and Wales No. 08785057

VAT No. 100119879

Help save paper - do you need to print this email?

Re: ktable join & data loss / indeterminism risk prevention

Reply via email to