Hi Luke and Kyle,

Thanks, I think that makes sense. If I run a dedicated Beam job server, I
assume I use the PortableRunner
<https://github.com/apache/beam/blob/2aab1b04c2f5002527e0f2d25075b282feb7c054/sdks/python/apache_beam/runners/portability/portable_runner.py#L253>
rather than the FlinkRunner?
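If so, I assume the submission would look something like the sketch below?
(The endpoints are placeholders for my setup; 8099 is just the job server's
default gRPC port, and the EXTERNAL/DOCKER choice depends on how the workers
are deployed.)

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Point the pipeline at a long-running, dedicated Beam job server
    # instead of bundling the uber jar into every submission.
    options = PipelineOptions([
        "--runner=PortableRunner",
        "--job_endpoint=localhost:8099",         # placeholder job server address
        "--environment_type=EXTERNAL",           # placeholder; DOCKER also works
        "--environment_config=localhost:50000",  # placeholder worker pool address
    ])

    with beam.Pipeline(options=options) as p:
        p | beam.Create([1, 2, 3]) | beam.Map(print)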
J

On Tue, Aug 3, 2021 at 2:41 PM Kyle Weaver <[email protected]> wrote:

> Hi Jeremy, good to hear from you.
>
> The Beam->Flink job server translates a Beam pipeline from its Beam
> representation to a Flink job. The purpose of --flink_submit_uber_jar is to
> bundle the Beam pipeline and the job server together, so that the
> translation will happen within the Flink job manager rather than in an
> external Beam job server. --flink_submit_uber_jar adds some operational
> convenience because you don't have to start up a job server, but it comes
> at the cost of you having to upload the entire job server jar to Flink.
> Right now the best way to avoid this cost is to start a dedicated job
> server and submit your Beam Python job to that rather than using
> --flink_submit_uber_jar.
>
> On Tue, Aug 3, 2021 at 2:06 PM Jeremy Lewi <[email protected]> wrote:
>
>> Hi Folks,
>>
>> I'm running Beam Python on Flink on Kubernetes. One thing I'm noticing is
>> that it takes a really long time for jobs to start. It looks like this
>> slowdown is due to the cost of uploading the Flink Beam uber jar (~225 MB)
>> to the job server.
>>
>> Is there any way to speed this up?
>>
>> 1. Can the JAR be cached in the Flink job manager and Flink task manager
>> to be reused across runs?
>> 2. Is it possible to bake the JAR into my docker images and avoid
>> uploading it on each run?
>> 3. Should I run a dedicated Beam job server separate from the Flink
>> cluster?
>>
>> Thanks
>> J
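P.S. For contrast, what I'm running today (the slow path Kyle describes)
looks roughly like this; the Flink master address is a placeholder for my
cluster:

    from apache_beam.options.pipeline_options import PipelineOptions

    # Current setup: FlinkRunner with --flink_submit_uber_jar, which uploads
    # the ~225 MB job server jar to Flink on every run.
    options = PipelineOptions([
        "--runner=FlinkRunner",
        "--flink_master=flink-jobmanager:8081",  # placeholder REST endpoint
        "--flink_submit_uber_jar",
        "--environment_type=DOCKER",
    ])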
