Re: [DISCUSS] FLIP-266: Simplify network memory configurations for TaskManager

2023-01-02 Thread Yuxin Tan
ory. If the setting is too small, the task cannot be started. If the setting is too large, there may be a waste of resources. As far as possible, Flink framework can automatically set a reasonable value, but I have a small

Re: [DISCUSS] FLIP-266: Simplify network memory configurations for TaskManager

2022-12-29 Thread Yuxin Tan
nly related to the parallelism of the task, but also to the complexity of the task DAG. The more complex a DAG is, shuffle write and shuffle read require larger buffers. How can we determine how many RS and IG a DAG has?

Re: [DISCUSS] FLIP-266: Simplify network memory configurations for TaskManager

2022-12-28 Thread Yuxin Tan
rmine how many RS and IG a DAG has? Best JasonLee Replied Message | From | Yuxin Tan | | Date | 12/28/2022 18:29 | | To | | | Subject | Re: [DISCUSS]

Re: [DISCUSS] FLIP-266: Simplify network memory configurations for TaskManager

2022-12-28 Thread Yuxin Tan
value, but I have a small problem. Network memory is not only related to the parallelism of the task, but also to the complexity of the task DAG. The more complex a DAG is, shuffle write and shuffle read require larger buffers. How can we determine how ma

Re: [DISCUSS] FLIP-266: Simplify network memory configurations for TaskManager

2022-12-28 Thread Roman Khachatryan
to the complexity of the task DAG. The more complex a DAG is, shuffle write and shuffle read require larger buffers. How can we determine how many RS and IG a DAG has? Best JasonLee Replied Message ---- | From | Yuxin Tan | | Date |

Re: [DISCUSS] FLIP-266: Simplify network memory configurations for TaskManager

2022-12-28 Thread Yuxin Tan
uffle write and shuffle read require larger buffers. How can we determine how many RS and IG a DAG has? Best JasonLee Replied Message | From | Yuxin Tan | | Date | 12/28/2022 18:29 | | To | | | Subject | Re: [DISCUSS] F

Re: [DISCUSS] FLIP-266: Simplify network memory configurations for TaskManager

2022-12-28 Thread JasonLee
a DAG has? Best JasonLee Replied Message | From | Yuxin Tan | | Date | 12/28/2022 18:29 | | To | | | Subject | Re: [DISCUSS] FLIP-266: Simplify network memory configurations for TaskManager | Hi, Roman Thanks for the reply. ExclusiveBuffersPerChannel and FloatingBuffersPerGate

Re: [DISCUSS] FLIP-266: Simplify network memory configurations for TaskManager

2022-12-28 Thread Yuxin Tan
Hi, Roman Thanks for the reply. ExclusiveBuffersPerChannel and FloatingBuffersPerGate are obtained from configurations; they are not calculated. I have described them in the FLIP motivation section. > 3. Each gate requires at least one buffer... The timeout exception occurs when the

Re: [DISCUSS] FLIP-266: Simplify network memory configurations for TaskManager

2022-12-27 Thread Roman Khachatryan
Hi everyone, Thanks for the proposal and the discussion. I couldn't find many details on how exactly the values of ExclusiveBuffersPerChannel and FloatingBuffersPerGate are calculated. I guess that - the threshold evaluation is done on JM - floating buffers calculation is done on TM based on the

Re: [DISCUSS] FLIP-266: Simplify network memory configurations for TaskManager

2022-12-26 Thread Yuxin Tan
Hi, Weihua Thanks for your suggestions. > 1. How about reducing ExclusiveBuffersPerChannel to 1 first when the total buffer is not enough? I think it's a good idea. Will try and check the results in PoC. Before all read buffers use floating buffers, I will try to use (ExclusiveBuffersPerChannel
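
A minimal sketch of the fallback idea raised in this exchange, not the FLIP's actual implementation: step the per-channel exclusive buffer count down until one gate's total requirement fits under a threshold, and only then let all reads use floating buffers. The function name, the threshold parameter, and the step-down order are assumptions made for illustration; the defaults of 2 exclusive buffers per channel and 8 floating buffers per gate mirror Flink's documented configuration defaults.

```python
def plan_gate_buffers(num_channels, max_required_buffers,
                      exclusive_per_channel=2, floating_per_gate=8):
    """Return (exclusive_per_channel, total_required) for one input gate.

    Hypothetical helper: tries the configured exclusive count first, then
    smaller counts down to 1, and finally 0 (all reads use floating buffers).
    """
    for exclusive in range(exclusive_per_channel, 0, -1):
        required = num_channels * exclusive + floating_per_gate
        if required <= max_required_buffers:
            return exclusive, required
    # No exclusive count fits under the threshold: floating buffers only.
    return 0, floating_per_gate


if __name__ == "__main__":
    # Small gate: the configured exclusive count fits.
    print(plan_gate_buffers(10, 1000))
    # Larger gate: only one exclusive buffer per channel fits.
    print(plan_gate_buffers(600, 1000))
    # Very large gate: fall back to floating buffers only.
    print(plan_gate_buffers(5000, 1000))
```

Under these assumptions, the requirement of a single gate stays bounded by the threshold however many input channels it has, which is the property the discussion is after.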

Re: [DISCUSS] FLIP-266: Simplify network memory configurations for TaskManager

2022-12-26 Thread Weihua Hu
Hi Yuxin, Thanks for the proposal. "Insufficient number of network buffers" exceptions also bother us. It's too hard for users to figure out how much network buffer they really need. It relates to partitioner type, parallelism, and slots per TaskManager. Since streaming jobs are our primary

Re: [DISCUSS] FLIP-266: Simplify network memory configurations for TaskManager

2022-12-26 Thread Yuxin Tan
Hi, all Thanks to everyone for the replies and feedback! After combining everyone's comments, the main concerns and corresponding adjustments are as follows. @Guowei Ma, Thanks for your feedback. > should we introduce a _new_ non-orthogonal

Re: [DISCUSS] FLIP-266: Simplify network memory configurations for TaskManager

2022-12-26 Thread Zhu Zhu
Hi Yuxin, Thanks for creating this FLIP. It's good if Flink does not require users to set a very large network memory, or tune the advanced (hard-to-understand) per-channel/per-gate buffer configs, to avoid "Insufficient number of network buffers" exceptions which can easily happen for large

Re: [DISCUSS] FLIP-266: Simplify network memory configurations for TaskManager

2022-12-25 Thread Yanfei Lei
Hi Yuxin, Thanks for the proposal! After reading the FLIP, I have some questions about the default value. This FLIP seems to introduce a *new* config option (taskmanager.memory.network.required-buffer-per-gate.max) to control the network memory usage. 1. Is this configuration at the job level or

Re: [DISCUSS] FLIP-266: Simplify network memory configurations for TaskManager

2022-12-24 Thread Dong Lin
Hi Yuxin, Thanks for proposing the FLIP! The motivation section makes sense. But it seems that the proposed change section mixes the proposed config with the evaluation results. It is a bit hard to understand what configs are proposed and how to describe these configs to users. Given that the

Re: [DISCUSS] FLIP-266: Simplify network memory configurations for TaskManager

2022-12-24 Thread Romit Mahanta
If this improves the performance, +1. On Sat, 24 Dec, 2022, 5:47 pm Guowei Ma, wrote: > Hi, > Thank you very much for driving this FLIP in order to improve user > usability. > > I understand that a key goal of this FLIP is to adjust the memory > requirements of shuffle to a more reasonable range.

Re: [DISCUSS] FLIP-266: Simplify network memory configurations for TaskManager

2022-12-24 Thread Guowei Ma
Hi, Thank you very much for driving this FLIP in order to improve user usability. I understand that a key goal of this FLIP is to adjust the memory requirements of shuffle to a more reasonable range. Through this adaptive range adjustment, the memory efficiency can be improved under the premise

Re: [DISCUSS] FLIP-266: Simplify network memory configurations for TaskManager

2022-12-22 Thread Lijie Wang
Hi, Thanks for driving this FLIP, +1 for the proposed changes. Limiting the maximum value of shuffle read memory is very useful when using the adaptive batch scheduler. Currently, the adaptive batch scheduler may cause a large number of input channels in a certain TM, so we generally

Re: [DISCUSS] FLIP-266: Simplify network memory configurations for TaskManager

2022-12-19 Thread Xintong Song
Thanks for the proposal, Yuxin. +1 for the proposed changes. I think these are indeed helpful usability improvements. Best, Xintong On Mon, Dec 19, 2022 at 3:36 PM Yuxin Tan wrote: > Hi, devs, > > I'd like to start a discussion about FLIP-266: Simplify network memory > configurations for

[DISCUSS] FLIP-266: Simplify network memory configurations for TaskManager

2022-12-18 Thread Yuxin Tan
Hi, devs, I'd like to start a discussion about FLIP-266: Simplify network memory configurations for TaskManager[1]. When using Flink, users may encounter the following issues that affect usability. 1. The job may fail with an "Insufficient number of network buffers" exception. 2. Flink network