Hi Kyle,

Thanks for the response!
On Wed, Aug 26, 2020 at 5:28 PM Kyle Weaver <[email protected]> wrote:

>> - With the Flink operator, I was able to submit a Beam job, but hit the
>> issue that I need Docker installed on my Flink nodes. I haven't yet tried
>> changing the operator's yaml files to add Docker inside them.
>
> Running Beam workers via Docker on the Flink nodes is not recommended (and
> probably not even possible), since the Flink nodes are themselves already
> running inside Docker containers. Running workers as sidecars avoids that
> problem. For example:
> https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/examples/beam/with_job_server/beam_flink_cluster.yaml#L17-L20

The main problem with the sidecar approach is that I can't offer the Flink
cluster as a "service" to which anybody can submit jobs with their own
custom containers - the container version is fixed. Do I understand that
correctly? The Docker-in-Docker approach does seem viable, and it is
mentioned in the Beam Flink K8s design doc
<https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/edit#heading=h.dtj1gnks47dq>.

>> I also haven't tried this
>> <https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/docs/beam_guide.md>
>> yet because it implies submitting jobs using "kubectl apply", which is
>> weird - why not just submit it through the Flink job server?
>
> I'm guessing it goes through k8s for monitoring purposes. I see no reason
> it shouldn't be possible to submit to the job server directly through
> Python, network permitting, though I haven't tried this.
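That would be great - submitting directly from Python is exactly what I'd
like to do. For reference, here is roughly what I had in mind - just a
sketch, untested; the service name, the 8099 job server port and the 50000
worker pool port are assumptions based on the operator's beam_guide and
sidecar examples:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Assumes the Beam job server deployed by the operator is reachable from
    # my machine, e.g. via
    #   kubectl port-forward service/<job-server-service> 8099:8099
    # and that the task managers run the Beam worker pool sidecar on :50000.
    options = PipelineOptions([
        "--runner=PortableRunner",
        "--job_endpoint=localhost:8099",         # Beam job server, not the Flink REST API
        "--environment_type=EXTERNAL",           # use the sidecar worker pool, no Docker needed
        "--environment_config=localhost:50000",  # where the sidecar listens, from the TM's view
    ])

    with beam.Pipeline(options=options) as p:
        (p | beam.Create(["hello", "world"]) | beam.Map(print))

If something like that works against the operator-deployed job server, it
would avoid the "kubectl apply" detour entirely.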
> On Wed, Aug 26, 2020 at 4:10 PM Eugene Kirpichov <[email protected]>
> wrote:
>
>> Hi folks,
>>
>> I'm still working with Pachama <https://pachama.com/> right now; we have
>> a Kubernetes Engine cluster on GCP and want to run Beam Python batch
>> pipelines with custom containers against it.
>> Flink and Cloud Dataflow are the two options; Cloud Dataflow doesn't
>> support custom containers for batch pipelines yet, so we're going with
>> Flink.
>>
>> I'm struggling to find complete documentation on how to do this. There
>> seems to be a lot of conflicting or incomplete information: several ways
>> to deploy Flink, several ways to get Beam working with it, bizarre
>> StackOverflow questions, and no documentation explaining a complete
>> working example.
>>
>> == My requests ==
>> * Could people briefly share their working setup? It would be good to
>> know which directions are promising.
>> * It would be particularly helpful if someone could volunteer an hour of
>> their time to talk to me about their working Beam/Flink/k8s setup. It's
>> for a good cause (fixing the planet :) ), and on my side I volunteer to
>> write up the findings to share with the community so others suffer less.
>>
>> == Appendix: My findings so far ==
>> There are multiple ways to deploy Flink on k8s:
>> - The GCP marketplace Flink operator
>> <https://cloud.google.com/blog/products/data-analytics/open-source-processing-engines-for-kubernetes>
>> (couldn't get it to work) and the respective CLI version
>> <https://github.com/GoogleCloudPlatform/flink-on-k8s-operator> (buggy,
>> but I got it working)
>> - https://github.com/lyft/flinkk8soperator (haven't tried)
>> - Flink's native k8s support
>> <https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/native_kubernetes.html>
>> (super easy to get working)
>>
>> I confirmed that my Flink cluster was operational by running a simple
>> Wordcount job, initiated from my machine. However, I wasn't yet able to
>> get Beam working:
>>
>> - With the Flink operator, I was able to submit a Beam job, but hit the
>> issue that I need Docker installed on my Flink nodes. I haven't yet
>> tried changing the operator's yaml files to add Docker inside them. I
>> also haven't tried this
>> <https://github.com/GoogleCloudPlatform/flink-on-k8s-operator/blob/master/docs/beam_guide.md>
>> yet because it implies submitting jobs using "kubectl apply", which is
>> weird - why not just submit it through the Flink job server?
>>
>> - With Flink's native k8s support, I tried two things:
>>   - Creating a fat portable jar using --output_executable_path. The jar
>> is huge (200+MB) and takes forever to upload to my Flink cluster - this
>> is a non-starter. But if I actually upload it, I then hit the same
>> missing-Docker issue. Haven't tried fixing that yet.
>>   - Simply running my pipeline with --runner=FlinkRunner
>> --environment_type=DOCKER --flink_master=$PUBLIC_IP:8081. The Java
>> process appears to send 1+GB of data somewhere, but the job never even
>> starts.
>>
>> I looked at a few conference talks:
>> - https://www.cncf.io/wp-content/uploads/2020/02/CNCF-Webinar_-Apache-Flink-on-Kubernetes-Operator-1.pdf
>> - seems to imply that I need to add a Beam worker "sidecar" to the Flink
>> workers, and that I need to submit my job using "kubectl apply".
>> - https://www.youtube.com/watch?v=8k1iezoc5Sc - also mentions the
>> sidecar, as well as the fat jar option.
>>
>> --
>> Eugene Kirpichov
>> http://www.linkedin.com/in/eugenekirpichov

--
Eugene Kirpichov
http://www.linkedin.com/in/eugenekirpichov
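P.S. For concreteness, the two attempts with Flink's native k8s support were
roughly the following, spelled out as pipeline options (again just a sketch;
the container image and output path below are placeholders, not the exact
values):

    import os
    from apache_beam.options.pipeline_options import PipelineOptions

    flink_master = os.environ["PUBLIC_IP"] + ":8081"  # the cluster's REST endpoint

    # (a) Direct submission: Beam asks the Flink nodes to start SDK workers
    # via Docker, which is where the missing-Docker problem shows up.
    direct_submit = PipelineOptions([
        "--runner=FlinkRunner",
        "--flink_master=" + flink_master,
        "--environment_type=DOCKER",
        "--environment_config=gcr.io/<our-project>/<custom-sdk-image>",  # placeholder image
    ])

    # (b) Building the fat portable jar instead of submitting right away -
    # this is the 200+MB artifact mentioned above.
    fat_jar = PipelineOptions([
        "--runner=FlinkRunner",
        "--flink_master=" + flink_master,
        "--environment_type=DOCKER",
        "--output_executable_path=/tmp/my-pipeline-bundle.jar",  # placeholder path
    ])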
