Hi Sonam,

I am not a long-standing Flink user (3 months only) so perhaps others will have 
a more authoritative view.

I am using Flink on Kubernetes, and have had some good success with the Google 
Flink operator 
(https://github.com/GoogleCloudPlatform/flink-on-k8s-operator).  It provides 
Custom Resource Definitions (CRDs) so that you can define your Flink clusters 
in YAML and deploy them using kustomize.
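For reference, a minimal FlinkCluster manifest looks roughly like the sketch 
below. I'm writing the field names from memory, so check them against the 
operator's sample manifests; the image tag, jar path and class name are just 
placeholders:

```yaml
# Sketch of a FlinkCluster custom resource for the GoogleCloudPlatform
# flink-on-k8s-operator (field names from memory; verify against the
# operator's examples/ directory).
apiVersion: flinkoperator.k8s.io/v1beta1
kind: FlinkCluster
metadata:
  name: flink-example
spec:
  image:
    name: flink:1.12.1          # placeholder image tag
  jobManager:
    resources:
      limits:
        memory: "1Gi"
  taskManager:
    replicas: 1                 # one-or-more task-managers
  job:
    # Placeholder jar/class; the operator creates the job-submitter
    # Kubernetes job from this section.
    jarFile: /opt/flink/examples/streaming/WordCount.jar
    className: org.apache.flink.streaming.examples.wordcount.WordCount
    parallelism: 1
```

Applying this with kubectl (or via kustomize) produces the pods shown below.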

The result is:

- A Flink cluster consisting of a job-manager and one or more task-managers.
- A Kubernetes job which acts as the Flink "client" to submit the job to the 
  job-manager (the "job-submitter").

e.g.

flink-example-job-submitter-g4s6g   0/1     Completed   0          6d15h
flink-example-jobmanager-0          1/1     Running     3          6d15h
flink-example-taskmanager-0         1/1     Running     3          6d15h

This all seems in keeping with Flink's "Per-Job Mode" deployment option 
(https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/overview/#per-job-mode)

Note: I'm only just getting into state persistence and recovery, so there is 
still some work to do, but I think this is largely a matter of understanding 
and configuration.
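For what it's worth, the persistence side seems to come down to a handful of 
options in flink-conf.yaml, something like the fragment below. The bucket 
names are hypothetical, and I'd double-check the option names against the 
Flink 1.13 configuration docs:

```yaml
# flink-conf.yaml fragment (illustrative values; bucket names are
# hypothetical placeholders).
state.backend: rocksdb
state.checkpoints.dir: s3://my-bucket/flink/checkpoints
state.savepoints.dir: s3://my-bucket/flink/savepoints
execution.checkpointing.interval: 60s

# Kubernetes-native high availability (Flink 1.12+), so the job-manager
# can recover job metadata after a restart:
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
high-availability.storageDir: s3://my-bucket/flink/ha
```

With the operator, these can be set under the FlinkCluster's flinkProperties 
section rather than baked into the image.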

Hope that helps

Paul

> On 17 Jun 2021, at 23:55, Sonam Mandal <soman...@linkedin.com> wrote:
> 
> Hello,
> 
> We are exploring running multiple Flink clusters within a Kubernetes cluster 
> such that each Flink cluster can run with a specified Flink image version. 
> Since the Flink Job Graph needs to be compatible with the Flink version 
> running in the Flink cluster, this brings a challenge in how we ensure that 
> the SQL job graph or Flink job jars are compatible with the Flink cluster 
> users want to run them on.
> 
> E.g. if the Flink cluster is running version 1.12.1, the job graph generated 
> from the SQL must be created using compatible 1.12.1 Flink libraries. 
> Otherwise, we see issues with deserialization etc.
> 
> Is there a recommended way to handle this scenario today? 
> 
> Thanks,
> Sonam
