Hi Arvid,

The second picture shows the metrics of the upstream operator. The upstream operator has a parallelism of 150, as you can see in the first picture. I expect bytes sent to be about 9 * bytes received, since 9 downstream operators are connected to it.
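To make the arithmetic concrete, here is a rough sketch of the two cases in Python. It is a toy model, not Flink code: the function name `expected_bytes_sent`, the record count, and the record size are all made up for illustration, and it ignores serialization overhead and network buffering. It only compares what the upstream would emit if each record went to exactly one subtask per downstream operator (rebalance) versus to every subtask (broadcast).

```python
def expected_bytes_sent(num_records, record_size, downstream_ops, parallelism, mode):
    """Toy model of total bytes emitted by the upstream operator.

    All parameters are hypothetical; this is not how Flink computes its metrics.
    """
    total = 0
    for _ in range(downstream_ops):
        if mode == "rebalance":
            # Round-robin: each record lands on exactly one downstream subtask.
            per_subtask = [0] * parallelism
            for i in range(num_records):
                per_subtask[i % parallelism] += record_size
            total += sum(per_subtask)
        elif mode == "broadcast":
            # Every record is copied to every downstream subtask.
            total += num_records * record_size * parallelism
        else:
            raise ValueError(f"unknown mode: {mode}")
    return total


# Assumed numbers for illustration only.
num_records, record_size = 1_000, 100
bytes_received = num_records * record_size

rebalance = expected_bytes_sent(num_records, record_size, 9, 150, "rebalance")
broadcast = expected_bytes_sent(num_records, record_size, 9, 150, "broadcast")

print(rebalance // bytes_received)  # ratio I expect with rebalance
print(broadcast // bytes_received)  # ratio that matches what the UI shows
```

Under these assumptions the rebalance ratio comes out to 9 and the broadcast ratio to 150 * 9, which is why the number in the UI looks like broadcast behavior to me.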
Hi Caizhi,

Let me create a minimal reproducible DAG and update here.

On Thu, Dec 16, 2021 at 4:03 AM Arvid Heise <ar...@apache.org> wrote:

> Hi,
>
> Could you please clarify which operator we see in the second picture?
>
> If you are showing the upstream operator, then this has only parallelism
> 1, so there shouldn't be multiple subtasks.
> If you are showing the downstream operator, then the metric would refer to
> the HASH and not REBALANCE.
>
> On Tue, Dec 14, 2021 at 2:55 AM Caizhi Weng <tsreape...@gmail.com> wrote:
>
>> Hi!
>>
>> This doesn't seem to be the expected behavior. Rebalance shuffle should
>> send each record to one of the parallel subtasks, not all of them.
>>
>> If possible, could you please explain what your Flink job is doing and
>> preferably share your user code so that others can look into this case?
>>
>> tao xiao <xiaotao...@gmail.com> wrote on Sat, Dec 11, 2021 at 01:11:
>>
>>> Hi team,
>>>
>>> I have one operator that is connected to another 9 downstream operators
>>> using rebalance. Each operator has a parallelism of 150 [1]. I assume each
>>> message in the upstream operator is sent to one of the parallel instances
>>> of each of the 9 receiving operators, so the total bytes sent should be
>>> roughly 9 times the bytes received in the upstream operator metric.
>>> However, the Flink UI shows that the bytes sent is much higher than 9
>>> times. It is about 150 * 9 * bytes received [2]. This looks to me like
>>> every message is duplicated to each parallel instance of all receiving
>>> operators, like what broadcast does. Is this correct?
>>>
>>> [1] https://imgur.com/cGyb0QO
>>> [2] https://imgur.com/SFqPiJA
>>> --
>>> Regards,
>>> Tao
>>>
>>
--
Regards,
Tao