Hi, I am experiencing strange flink stream windowed join behavior.
I want to do windowed (processing time) join between two partitioned streams. I read data from socket. I have two cases: 1. data speed in socket is relatively slow (say 1K ps) 2. data speed in socket is high (say 37K). The number of tuples read from socket is same in both cases to both cases. Firstly, the size of output (of join operation) is much higher in case 2 although the number of tuples are same in both cases. For example, in case-1, the overall output size is 500M and in case 2 it is 20G. I couldn't get the logic behind this. Secondly, in both cases, flink ingests all data from socket (more or less) as soon as it is available. So, it has high throughput. However, especially in case 2, I have to wait long time after data is ingested from source operator. So, the data from socket is acquired and socket gets idle, and then I have to wait long time to get actual output to sink. My question is that, if this behavior is normal, and all the data acquired stays somewhere inside flink, why backpressure is not applied to source operators? I mean if the system cannot compute all inputs with high speed, then it should lower the reading speed from socket. Thanks Davood