To date there seems to be almost zero documentation on running Beam pipelines on Flink using the Go SDK. The official documentation has a small tutorial on the local loopback mode, but there are crucial gaps when it comes to actually running pipelines on a Flink cluster, and I've been unsuccessful at running a job. Pretty much all Beam/Flink documentation is targeted at Python.
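For concreteness, this is roughly the wiring I've been attempting — a sketch, not a working setup. The job server flags are from the `apache/beam_flink1.16_job_server` image, the Go flags from the SDK's universal runner; the host names and the worker-pool address are my guesses, since that's exactly the part I can't find documented:

```shell
# Start the Beam job service, pointed at the Flink cluster
# ("flink-jobmanager" is my in-cluster service name).
docker run --net=host apache/beam_flink1.16_job_server:latest \
    --flink-master=flink-jobmanager:8081 \
    --artifacts-dir=/tmp/beam-artifacts  # shared volume; this is what I'd like to replace with GCS

# Submit the compiled Go pipeline to the job service. With EXTERNAL, the
# environment_config address is presumably where an SDK-harness worker pool
# listens -- but the Go SDK has no --worker_pool mode to run one.
go run ./mypipeline \
    --runner=universal \
    --endpoint=localhost:8099 \
    --environment_type=EXTERNAL \
    --environment_config=localhost:50000
```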
In particular:

* With the Python SDK, the preferred setup appears to be running the SDK harness as a sidecar in the task manager (though some documents say the job manager). The Python SDK has a `--worker_pool` option for this that does not exist in the Go SDK.
* Some examples suggest using `--environment_type=DOCKER`, but on Kubernetes this requires running Docker inside a pod, which is complex and not well documented. It seems more appropriate to use `--environment_type=EXTERNAL`, but there's no documentation on how to wire this up.
* There is also a lack of information on how to run the Beam job service (though I found an example somewhere).
* There's no information on how to do artifact staging without a shared volume, which is awkward on Kubernetes; I'd prefer an object store such as GCS.
* The job service image only goes up to Flink 1.16 (`apache/beam_flink1.16_job_server`); I'm not sure why it isn't being kept in step with Flink releases.

Does anyone know what the state of this support is? Is it even possible to run Go pipelines today? Is nobody using Beam + Go + Flink? That would explain not just the dearth of documentation, but the almost total absence of mentions on StackOverflow, GitHub, the Beam mailing lists, and the Flink mailing lists. This is frustrating, considering that Flink appears to be the only way to orchestrate Beam pipelines on Kubernetes.

Alexander.
