Re: [Discuss] Let's Session Cluster JobManager take a breather (FLIP-257: Flink JobManager Process Split)

2022-09-19 Thread Shammon FY
Thanks @XintongSong and sorry for replying late. Also thanks Zhengyu for bringing this discussion up. We use Flink session clusters to run OLAP queries at ByteDance, and have supported several users in production. Our largest single cluster has 2000 cores, and we focus on the job scheduling

Fwd: [Discuss] Let's Session Cluster JobManager take a breather (FLIP-257: Flink JobManager Process Split)

2022-09-16 Thread Shammon FY
Thanks @XintongSong and sorry for replying late. Also thanks Zhengyu for bringing this discussion up. We use Flink session clusters to run OLAP queries at ByteDance, and have supported several users in production. Our largest single cluster has 2000 cores, and we focus on the job scheduling

Re: [Discuss] Let's Session Cluster JobManager take a breather (FLIP-257: Flink JobManager Process Split)

2022-08-29 Thread Zheng Yu Chen
Thanks for the community's feedback and suggestions. In fact, the problem I want to solve is how to reduce the current workload of the JobManager (as the title says, the focus is on reducing the JobManager's workload). As my first idea, I thought of reducing the resource overhead of the JobManager

Re: [Discuss] Let's Session Cluster JobManager take a breather (FLIP-257: Flink JobManager Process Split)

2022-08-28 Thread Xintong Song
Sorry for joining the discussion late. And thanks Zhengyu for bringing this discussion up. I think this is an interesting topic. I actually had something similar in mind for a long time. I haven't carried it out for the same concerns as others already mentioned, that the benefit for this effort

Re: [Discuss] Let's Session Cluster JobManager take a breather (FLIP-257: Flink JobManager Process Split)

2022-08-26 Thread Zheng Yu Chen
Hi Chesnay, I have also considered the method you mentioned. If we deploy some load balancing or intelligent scheduling in front of multiple SessionClusters, this may cause the following problems: ● Insufficient resource utilization. When we distribute these resources on each cluster, the job

Re: [Discuss] Let's Session Cluster JobManager take a breather (FLIP-257: Flink JobManager Process Split)

2022-08-23 Thread David Morávek
Hi Zheng, Thanks for the write-up! I tend to agree with Chesnay that this introduces additional complexity to an already complex deployment model. One of the main focuses in this area is to reduce feature sparsity and to have fewer high-quality options. Example efforts are deprecation (and

Re: [Discuss] Let's Session Cluster JobManager take a breather (FLIP-257: Flink JobManager Process Split)

2022-08-18 Thread Zheng Yu Chen
You're right, this does add to the complexity of their communication and coordination. I understand that what you mean is similar to nginx: load balancing across different SessionClusters at the front, rather than adding one more component. In fact, I have tried this myself, and it seems to solve the problem of

Re: [Discuss] Let's Session Cluster JobManager take a breather (FLIP-257: Flink JobManager Process Split)

2022-08-18 Thread Zheng Yu Chen
Thank you for your interest in this program. This proposal only takes effect for Session Cluster Mode, and will not take effect for Application Mode. I would like to answer some of your previous questions about patterns and practical cases first, and then discuss

Re: [Discuss] Let's Session Cluster JobManager take a breather (FLIP-257: Flink JobManager Process Split)

2022-08-18 Thread Zheng Yu Chen
Thank you for your interest in this program. This proposal only takes effect for Session Cluster Mode, and will not take effect for Application Mode. I would like to answer some of your previous questions about patterns and practical cases first, and then discuss some implementation details later.

Re: [Discuss] Let's Session Cluster JobManager take a breather (FLIP-257: Flink JobManager Process Split)

2022-08-17 Thread Chesnay Schepler
To be honest I'm terrified at the idea of splitting the Dispatcher into several processes, even more so if this is supposed to be opt-in and specific to session mode. It would fragment the coordination layer even more than it already is, and make ops more complicated (yet another set of

Re: [Discuss] Let's Session Cluster JobManager take a breather (FLIP-257: Flink JobManager Process Split)

2022-08-17 Thread Matthias Pohl
Hi Conrad, thanks for reaching out to the community with your proposal. I looked through FLIP-257 [1]. Your motivation sounds interesting. Can you elaborate a bit more on the concrete use cases you have in mind? How do these match the use cases of the two favored execution modes, i.e. Flink's

[Discuss] Let's Session Cluster JobManager take a breather (FLIP-257: Flink JobManager Process Split)

2022-08-16 Thread Zheng Yu Chen
Hi community, I think this title should be quite interesting. The idea is to reduce the workload of the JobManager and make the SessionCluster [2] more stable while running jobs. I designed a plan for splitting the JobManager in FLIP-257 [1]: