Hi,

Yes, your reasoning is correct. If you use two binary joins, the data of the first two streams will be buffered twice: once in the window state of the first join, and again inside the joined records that the second join buffers. Unioning all three streams and joining them in a custom ProcessFunction would reduce the amount of required state.
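To make the difference concrete, here is a minimal plain-Java sketch that counts how many input records each approach keeps buffered for a single window. It is a simplified model and not Flink API: `JoinStateSketch`, `Event`, `KeyBuffer` and the method names are hypothetical, it counts records rather than bytes, it assumes inner-join semantics, and it assumes the second binary join stores each joined (a, b) pair in full. In an actual Flink job the per-key buffers would presumably live in keyed state (for example `ListState`) inside a `KeyedProcessFunction` and be cleared by a timer when the window ends.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Flink API: counts records buffered for ONE window.
class JoinStateSketch {

    /** One input record: source stream tag ("a", "b" or "c"), join key, payload. */
    static final class Event {
        final String source, key, payload;
        Event(String source, String key, String payload) {
            this.source = source; this.key = key; this.payload = payload;
        }
    }

    /** Per-key buffer for the union approach: every input record is stored once. */
    static final class KeyBuffer {
        final List<Event> a = new ArrayList<>();
        final List<Event> b = new ArrayList<>();
        final List<Event> c = new ArrayList<>();

        void add(Event e) {
            switch (e.source) {
                case "a": a.add(e); break;
                case "b": b.add(e); break;
                case "c": c.add(e); break;
                default: throw new IllegalArgumentException("unknown source: " + e.source);
            }
        }

        int size() { return a.size() + b.size() + c.size(); }

        /** Inner join at window end: emits every (a, b, c) combination for this key. */
        List<String> join() {
            List<String> out = new ArrayList<>();
            for (Event x : a)
                for (Event y : b)
                    for (Event z : c)
                        out.add(x.payload + "|" + y.payload + "|" + z.payload);
            return out;
        }
    }

    /** Union approach: group the unioned stream by key (the keyBy step). */
    static Map<String, KeyBuffer> bufferUnion(List<Event> window) {
        Map<String, KeyBuffer> state = new HashMap<>();
        for (Event e : window) {
            state.computeIfAbsent(e.key, k -> new KeyBuffer()).add(e);
        }
        return state;
    }

    /** Records held in state by the union approach. */
    static int unionStateSize(List<Event> window) {
        int total = 0;
        for (KeyBuffer kb : bufferUnion(window).values()) total += kb.size();
        return total;
    }

    /** Input records held by two cascaded binary joins, (a JOIN b) JOIN c. */
    static int cascadedStateSize(List<Event> window) {
        int total = 0;
        for (KeyBuffer kb : bufferUnion(window).values()) {
            total += kb.a.size() + kb.b.size();      // join 1 buffers a and b
            total += 2 * kb.a.size() * kb.b.size();  // join 2 buffers each (a, b) pair: 2 records
            total += kb.c.size();                    // join 2 also buffers c
        }
        return total;
    }

    public static void main(String[] args) {
        List<Event> window = Arrays.asList(
            new Event("a", "k1", "a1"), new Event("b", "k1", "b1"), new Event("c", "k1", "c1"));
        System.out.println("union:    " + unionStateSize(window));
        System.out.println("cascaded: " + cascadedStateSize(window));
        System.out.println("joined:   " + bufferUnion(window).get("k1").join());
    }
}
```

For one record per stream sharing a key, this model counts 3 buffered records for the union approach and 5 for the cascaded joins, because a and b are held once by the first join and once more inside the joined pair. When several a and b records share a key, the second join in this model buffers one pair per combination, so the gap widens.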
Best, Fabian

On Sat, 24 Nov 2018 at 14:08, Gagan Agrawal <agrawalga...@gmail.com> wrote:

> Hi,
>
> I want to do a window join on multiple Kafka streams (say a, b, c) on a
> common field present in all 3 streams and apply a custom function to the
> joined stream. As I understand it, we can join only 2 streams at a time
> via the DataStream API. So maybe I need to join a and b first and then
> join the resulting stream with c. I want to understand how the stream
> state would be stored in the backend. Since I will be joining streams a
> and b first, I believe both streams will be stored in the state backend
> for the window time. Then the join of the first joined stream (of a and
> b) with c will result in storage of all 3 streams for the windowed
> period. Does that mean streams a and b are stored twice in the state
> backend?
>
> Let's say that instead of using the built-in join API, I union all 3
> streams (after transforming them to a common schema), keyBy the stream
> on the common field, and apply a process function where I implement the
> join on my own and store the streams in some state backend. Will that be
> more storage efficient, as I will be saving the 3 streams just once
> instead of twice?
>
> Gagan