I am experiencing strange behavior with Flink's windowed stream join.
I want to do a windowed (processing-time) join between two partitioned
streams, reading the data from a socket.
I have two cases: 1. the data rate on the socket is relatively low (say 1K
tuples/s); 2. the data rate is high (say 37K tuples/s).
The number of tuples read from the socket is the same in both cases.
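For reference, here is a minimal pure-Python model of what I understand a tumbling processing-time windowed inner join to compute (the record layout, key, and 5-second window below are illustrative only; the real job uses Flink's DataStream join API, not this code):

```python
# Minimal model of a tumbling processing-time windowed inner join.
# Record layout (arrival_time_sec, key) and the 5-second window are
# illustrative, not taken from the actual job.
from collections import defaultdict

def windowed_join(left, right, key_fn, window_fn):
    """Join records whose keys match AND that land in the same window."""
    lbuckets = defaultdict(list)
    rbuckets = defaultdict(list)
    for rec in left:
        lbuckets[(window_fn(rec), key_fn(rec))].append(rec)
    for rec in right:
        rbuckets[(window_fn(rec), key_fn(rec))].append(rec)
    # within each (window, key) bucket, the join emits the full cross product
    pairs = []
    for bucket, lrecs in lbuckets.items():
        for l in lrecs:
            for r in rbuckets.get(bucket, []):
                pairs.append((l, r))
    return pairs

window = lambda rec: rec[0] // 5   # 5-second tumbling windows
key = lambda rec: rec[1]
left = [(0, "a"), (1, "a"), (6, "b")]
right = [(2, "a"), (3, "a"), (7, "b")]
out = windowed_join(left, right, key, window)
print(len(out))  # 2*2 matches in window 0 plus 1*1 in window 1 -> 5
```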
Firstly, the size of the join output is much higher in case 2, although the
number of input tuples is the same in both cases. For example, in case 1 the
overall output size is 500M, while in case 2 it is 20G. I couldn't work out
the logic behind this.
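The only model I could come up with is that a higher ingest rate packs more tuples into each fixed-length processing-time window, and the per-window join output scales with the product of the two sides' window occupancy. A quick sanity check of that idea (all numbers below are hypothetical, not my real rates or window length):

```python
# Hypothetical sanity check: same total tuple count per stream, different
# ingest rates, fixed processing-time window length. Numbers illustrative.
total_tuples = 1_000_000          # per stream, identical in both cases
window_sec = 10

for rate in (1_000, 37_000):      # tuples/sec arriving on the socket
    per_window = rate * window_sec          # tuples landing in one window
    windows = total_tuples // per_window    # windows needed to drain stream
    # worst case (single key): each window emits per_window**2 join pairs
    pairs = windows * per_window ** 2
    print(f"rate={rate:>6}/s  windows={windows:>4}  join pairs={pairs:,}")
```

Under this model the slower stream spreads the same tuples over many small windows, while the fast stream concentrates them into a few huge windows, so the total pair count grows even though the input is identical.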
Secondly, in both cases Flink ingests all the data from the socket (more or
less) as soon as it is available, so throughput at the source is high.
However, especially in case 2, I have to wait a long time after the data has
been ingested by the source operator: the data from the socket is acquired,
the socket goes idle, and then I wait a long time before the actual output
reaches the sink. My question is: if this behavior is normal and all the
acquired data is buffered somewhere inside Flink, why is backpressure not
applied to the source operator? If the system cannot process all of its
input at high speed, it should slow down the rate at which it reads from
the socket.