Hi Sushruth,

If your jobs need significantly different configurations, then I would
suggest thinking about dedicated clusters per job. That way you can
configure each cluster to work best for the respective job. Of course,
running multiple clusters instead of a single one comes at the cost of
additional overhead, since you pay for multiple sets of Flink processes.
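
For example, if you are deploying on YARN (just an assumption on my
side; the same idea applies to the other deployment targets, e.g. a
standalone application cluster per job), such a per-job cluster with a
job-specific configuration can be started directly from the CLI:

./bin/flink run -t yarn-per-job \
  -p 8 \
  -Dtaskmanager.numberOfTaskSlots=2 \
  -Dtaskmanager.memory.process.size=4g \
  path/to/your-job.jar

The parallelism, slot and memory values here are of course made up; the
point is that every job gets TaskManagers sized for its own workload.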

If you don't want to or cannot use per-job clusters, then there is not
much else you can do to control how the resources of a session cluster
are distributed among different jobs beyond what Roman has already said.
The most effective measures are to reduce the parallelism of the jobs
which need fewer resources, or to split chains up into units whose
operators consume/require the same set of resources (CPU, memory). In
the future, this problem will most likely be solved by FLIP-53 [1],
which allows specifying resource requirements for operators and, thus,
for the slots a job needs.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
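
To illustrate the parallelism and chaining knobs in the DataStream API,
here is a rough, untested sketch (the pipeline itself is just a
placeholder for your real job):

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SlotLayoutSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(1L, 2L, 3L)   // stand-in for your real source
            .map(x -> x * x)           // pretend this is the expensive operator
            .returns(Types.LONG)       // make the lambda's output type explicit
            .setParallelism(8)         // more subtasks for the heavy part
            .disableChaining()         // split the chain at this point
            .print()                   // stand-in for your real sink
            .setParallelism(1);        // the cheap part runs with low parallelism

        env.execute("slot-layout-sketch");
    }
}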

Cheers,
Till

On Fri, Mar 12, 2021 at 12:20 PM Roman Khachatryan <ro...@apache.org> wrote:

> Hi,
>
> Do I understand correctly that:
> 1. The workload varies across jobs but stays the same for a given job?
> 2. With a small number of slots per TM, you are concerned about uneven
> resource utilization when running low- and high-intensive jobs on the
> same cluster simultaneously?
>
> If so, wouldn't reducing the parallelism of the low-intensive jobs help?
> Other options to consider are putting the subtasks of the high-intensive
> jobs into different slot-sharing groups, or breaking operator chains
> explicitly [1].
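>
> Roughly like this (an untested sketch; "events" and the map function
> are just placeholders for your real pipeline):
>
> events
>     .map(new ExpensiveMapFunction())
>     .slotSharingGroup("heavy")   // these subtasks get their own slots
>     .disableChaining();          // break the chain here, see [1]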
>
> There are also a number of improvements coming in the 1.13 release:
> [2][3][4].
>
> I'm pulling in Till and Robert, who know this area better.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/#task-chaining-and-resource-groups
> [2] https://issues.apache.org/jira/browse/FLINK-21267
> [3] https://issues.apache.org/jira/browse/FLINK-10404
> [4] https://issues.apache.org/jira/browse/FLINK-14187
>
> Regards,
> Roman
>
> On Fri, Mar 12, 2021 at 5:03 AM Sush Bankapura
> <sushrutha.bankap...@man-es.com> wrote:
> >
> > Hi,
> >
> > We have multiple jobs that need to be deployed to a Flink cluster.
> > Parallelism varies from job to job and depends on the type of work being
> > done, as do the memory requirements. All jobs currently use the same state
> > backend. Since the workload handled by each job is different, the scaling
> > pattern also varies. We run all our jobs in a single Flink cluster (7 VMs
> > with the same instance configuration).
> >
> > Most of what I have read in the Flink documentation suggests one of the
> > following approaches for setting the number of task slots
> > (taskmanager.numberOfTaskSlots):
> >
> > 1. As a rule of thumb, a good default number of task slots is the number
> > of CPU cores. With hyper-threading, each slot then takes two or more
> > hardware thread contexts. If you are doing any blocking I/O operations in
> > the Flink job, it is suggested to have more slots than cores.
> >
> > 2. A Flink cluster needs exactly as many task slots as the highest
> parallelism used in the job. No need to calculate how many tasks (with
> varying parallelism) a program contains in total.
> >
> > I did not find documentation on the task slot setting for the scenario I
> > have described. Setting a lower value for the task slots seems to work
> > better for jobs which process high volumes of traffic, but this becomes
> > inefficient if those slots end up assigned to jobs which work on lower
> > volumes of traffic.
> >
> > Depending on the workload handled by each Flink job, it seems that we
> > would need to set up as many clusters as there are distinct workloads.
> >
> > 1. Is this the only option available?
> > 2. Are there any guidelines on deciding on the number of task slots in
> such an environment?
> >
> > Thanks,
> > Sushruth
>
