Hello Kafka community,
I am restating my proposal as plain, unquoted text so that it is easier to read.
The idea is to make the join easier for me to reason about and less exposed
to concurrency issues, which I consider risky as long as I do not know how
the join is implemented underneath. The two steps are listed below.
I look forward to your feedback,
Thanks,

1. Kafka Streams application 1:
   - map topic1 into: key = the join key shared by topic1 and topic2;
     value = topic1Or2UnionPayload
   - map topic2 into: key = the join key shared by topic1 and topic2;
     value = topic1Or2UnionPayload
   - merge the two mapped streams into a single stream; records that need to
     be joined carry the same key in both mapped streams
   - write the result to a new topic
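
To make step 1 concrete, here is a minimal model of it in plain Python. It is not Kafka Streams code: the `UnionPayload` class, the `map_to_union` and `merge` helpers, and the sample records are hypothetical names used only for illustration. The round-robin interleaving in `merge` is also an assumption, since a real stream merge gives no ordering guarantee between its two inputs.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import Any, Callable, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class UnionPayload:
    """Hypothetical stand-in for topic1Or2UnionPayload: a value tagged with its source."""
    side: str   # "topic1" or "topic2"
    value: Any


def map_to_union(
    records: Iterable[Any],
    side: str,
    join_key_of: Callable[[Any], Any],
) -> Iterator[Tuple[Any, UnionPayload]]:
    """Re-key each record by its join key and wrap the value in the union type."""
    for record in records:
        yield join_key_of(record), UnionPayload(side, record)


_MISSING = object()


def merge(*streams: Iterable[Any]) -> Iterator[Any]:
    """Interleave the streams round-robin (an assumed order, for illustration only)."""
    for group in zip_longest(*streams, fillvalue=_MISSING):
        for item in group:
            if item is not _MISSING:
                yield item


# Sample input: the join key lives in a different field on each side.
topic1 = [{"customer_id": "c1", "name": "Ann"}, {"customer_id": "c2", "name": "Bob"}]
topic2 = [{"customer": "c1", "total": 10}]

merged = list(merge(
    map_to_union(topic1, "topic1", lambda r: r["customer_id"]),
    map_to_union(topic2, "topic2", lambda r: r["customer"]),
))

for key, payload in merged:
    print(key, payload.side, payload.value)
```

After this step, every record that has to take part in the same join carries the same key, whichever topic it came from.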

2. Kafka Streams application 2 (with a join processor):
   - read the topic written above, in which both kinds of mapped messages
     arrive on a single pipe, linearised by join key
   - group by join key
   - pipe each resulting stream into a Processor that stores data for both
     sides of the join and, as soon as it has both, starts emitting joined
     records
   - write the joined records to the result topic (the safe join result)
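
The processor in step 2 could behave roughly like the sketch below. This is again plain Python rather than the Kafka Streams Processor API, and `JoinProcessor`, `join_stream`, and the sample records are hypothetical. I am assuming that the state keeps only the latest value per side and per key, and that every update is emitted once both sides are present; other retention or emission rules are equally possible.

```python
from typing import Any, Dict, Iterable, Iterator, Optional, Tuple

SIDES = ("topic1", "topic2")


class JoinProcessor:
    """Keeps the latest value of each side per join key (assumed semantics)."""

    def __init__(self) -> None:
        # join key -> {side -> latest value seen for that side}
        self._state: Dict[Any, Dict[str, Any]] = {}

    def process(self, key: Any, side: str, value: Any) -> Optional[Tuple[Any, Any, Any]]:
        """Store the record; return a joined record once both sides are known."""
        if side not in SIDES:
            raise ValueError(f"unknown side: {side!r}")
        sides = self._state.setdefault(key, {})
        sides[side] = value
        if all(s in sides for s in SIDES):
            return key, sides["topic1"], sides["topic2"]
        return None


def join_stream(
    merged: Iterable[Tuple[Any, str, Any]],
) -> Iterator[Tuple[Any, Any, Any]]:
    """Feed the single merged stream through one processor, in arrival order."""
    processor = JoinProcessor()
    for key, side, value in merged:
        result = processor.process(key, side, value)
        if result is not None:
            yield result


# Records as they might be read from the intermediate topic: (key, side, value).
merged_topic = [
    ("c1", "topic1", "Ann"),
    ("c2", "topic2", 7),
    ("c1", "topic2", 10),    # both sides of c1 are now known: first joined record
    ("c1", "topic1", "Anna"),  # update of one side: joined record is emitted again
]

joined = list(join_stream(merged_topic))
print(joined)
```

Because the processor sees one stream only, its output depends solely on the order of records in the intermediate topic, so replaying that topic should reproduce the same joined records.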

On Wed, 15 Jul 2020 at 11:25, Dumitru-Nicolae Marasoui <
nicolae.maras...@kaluza.com> wrote:

> Hello Kafka community,
> Refining step 2, and some questions:
> - is the non-determinism of the KTable join a real problem?
> - how is the KTable join implemented?
> - do you think the solution outlined is a step in the right direction?
> - could the KTable join implement such a strategy in a future version,
> enabled by configuration or on demand?
> Thank you,
> 1. Kafka Streams application 1:
>
>    - map topic1 into: key: the join key between topic1 and topic2;
>    value: topic1Or2UnionPayload,
>    - map topic2 into: key: the join key between topic1 and topic2;
>    value: topic1Or2UnionPayload,
>    - merge the mapped topics above into a single stream; the keys are
>    identical for elements that need to be joined on both mapped topics/streams
>    - write the result to a new topic
>
> 2. transformation part 2 (Kafka Streams + Processor):
>
>    - read the topic written above, with both mapped messages coming
>    on a single pipe, linearised by join key
>    - groupBy join key
>    - pipe each resulting stream into a Processor that stores data for
>    both sides of the join and, as soon as it has both, starts emitting
>    joined records
>    - write joined records to the result topic (the safe join result)
>
>
> On Tue, 14 Jul 2020 at 12:43, Dumitru-Nicolae Marasoui <
> nicolae.maras...@kaluza.com> wrote:
>
>> Hello Kafka community,
>> As I understand it, in a Kafka Streams join that involves a KTable, “the
>> KTable lookup is done on the current KTable state, and thus, out-of-order
>> records can yield non-deterministic result” [1].
>>
>> Does the solution below, which involves an intermediate topic, sound right
>> to you?
>> 1. Kafka Streams application 1:
>>
>>    - map topic1 into: key: the join key between topic1 and topic2;
>>    value: topic1Or2UnionPayload,
>>    - map topic2 into: key: the join key between topic1 and topic2;
>>    value: topic1Or2UnionPayload,
>>    - merge the mapped topics above into a single stream; the keys are
>>    identical for elements that need to be joined on both mapped
>>    topics/streams
>>    - write the result to a new topic
>>
>> 2. transformation part 2 (a Kafka consumer/processor, or Kafka Streams if a
>> suitable stateful transformation can be applied):
>>
>>    - read the topic written above, with both mapped messages coming
>>    on a single pipe, linearised by join key
>>    - a state store is needed, likely a database (if Kafka Streams is
>>    applicable, great, it already embeds one)
>>    - as soon as both sides for the same join key are in the state, a pair
>>    can be emitted downstream
>>
>> Does this make sense to you? Do you have experience with other
>> possible approaches that you would like to share? Thank you
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Join+Semantics
>>
>>
>> Dumitru-Nicolae Marasoui
>>
>> Software Engineer
>>
>>
>>
>> w kaluza.com <https://www.kaluza.com/>
>>
>> LinkedIn <https://www.linkedin.com/company/kaluza> | Twitter
>> <https://twitter.com/Kaluza_tech>
>>
>> Kaluza Ltd. registered in England and Wales No. 08785057
>>
>> VAT No. 100119879
>>
>> Help save paper - do you need to print this email?
>>
>
>


