To date there seems to be almost zero documentation on running Beam pipelines on Flink using the Go SDK. The official documentation has a small tutorial on the local loopback mode, but there are crucial gaps when it comes to actually running pipelines on a Flink cluster, and I've been unsuccessful at running a job. Pretty much all Beam/Flink documentation is targeted at Python.
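For concreteness, this is roughly the wiring I've been attempting — a sketch, not a working setup. The job server flags are from the `apache/beam_flink1.16_job_server` image, the Go flags from the SDK's universal runner; the host names and the worker-pool address are my guesses, since that's exactly the part I can't find documented:

```shell
# Start the Beam job service, pointed at the Flink cluster
# ("flink-jobmanager" is my in-cluster service name).
docker run --net=host apache/beam_flink1.16_job_server:latest \
    --flink-master=flink-jobmanager:8081 \
    --artifacts-dir=/tmp/beam-artifacts  # shared volume; this is what I'd like to replace with GCS

# Submit the compiled Go pipeline to the job service. With EXTERNAL, the
# environment_config address is presumably where an SDK-harness worker pool
# listens -- but the Go SDK has no --worker_pool mode to run one.
go run ./mypipeline \
    --runner=universal \
    --endpoint=localhost:8099 \
    --environment_type=EXTERNAL \
    --environment_config=localhost:50000
```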
In particular:

* With the Python SDK, the preferred setup appears to be running the SDK harness as a sidecar in the task manager (though some documents say the job manager). The Python SDK has a `--worker_pool` option for this that does not exist in the Go SDK.
* Some examples suggest using `--environment_type=DOCKER`, but on Kubernetes this requires running Docker inside a pod, which is complex and not well documented. It seems more appropriate to use `--environment_type=EXTERNAL`, but there's no documentation on how to wire this up.
* There is also a lack of information on how to run the Beam job service (though I found an example somewhere).
* There's no information on how to do artifact staging without a shared volume, which is awkward on Kubernetes; I'd prefer an object store such as GCS.
* The job service image only goes up to Flink 1.16 (`apache/beam_flink1.16_job_server`); I'm not sure why it isn't being kept in step with Flink releases.

Does anyone know what the state of this support is? Is it even possible to run Go pipelines today? Is nobody using Beam + Go + Flink? That would explain not just the dearth of documentation, but the almost total absence of mentions on StackOverflow, GitHub, the Beam mailing lists, and the Flink mailing lists. This is frustrating, considering that Flink appears to be the only way to orchestrate Beam pipelines on Kubernetes.

Alexander.
