Hi Luke and Kyle,

Thanks, I think that makes sense. If I run a dedicated Beam job server, I
assume I should use the PortableRunner
<https://github.com/apache/beam/blob/2aab1b04c2f5002527e0f2d25075b282feb7c054/sdks/python/apache_beam/runners/portability/portable_runner.py#L253>
rather than the FlinkRunner?
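
Concretely, I'm picturing something like the sketch below (the host name is
a placeholder for wherever the dedicated job server ends up listening, and
8099 is just the job server's default job port):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Assumes a dedicated Beam job server is already running and reachable at
# beam-job-server:8099 (placeholder host; 8099 is the job server's default
# job port).
options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=beam-job-server:8099",
    "--environment_type=DOCKER",  # or EXTERNAL/PROCESS, depending on setup
])

# Trivial pipeline just to exercise the submission path.
with beam.Pipeline(options=options) as p:
    _ = p | beam.Create([1, 2, 3]) | beam.Map(print)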

J

On Tue, Aug 3, 2021 at 2:41 PM Kyle Weaver <[email protected]> wrote:

> Hi Jeremy, good to hear from you.
>
> The Beam->Flink job server translates a Beam pipeline from its Beam
> representation into a Flink job. The purpose of --flink_submit_uber_jar is
> to bundle the Beam pipeline and the job server together, so that the
> translation happens within the Flink job manager rather than in an
> external Beam job server. --flink_submit_uber_jar adds some operational
> convenience because you don't have to start up a job server, but it comes
> at the cost of having to upload the entire job server jar to Flink. Right
> now the best way to avoid this cost is to start a dedicated job server and
> submit your Beam Python job to that rather than using
> --flink_submit_uber_jar.
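>
> As a rough sketch (the image tag below is just an example and should match
> your Flink version; host names are placeholders): start the job server
> once, e.g. from its published Docker image,
>
>   docker run --net=host apache/beam_flink1.13_job_server:latest \
>       --flink-master=flink-jobmanager:8081
>
> and then submit your pipeline with --runner=PortableRunner and
> --job_endpoint=<job-server-host>:8099 (8099 is the job server's default
> port) instead of passing --flink_submit_uber_jar.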
>
> On Tue, Aug 3, 2021 at 2:06 PM Jeremy Lewi <[email protected]> wrote:
>
>> Hi Folks,
>>
>> I'm running Beam Python on Flink on Kubernetes. One thing I'm noticing is
>> that it takes a really long time for jobs to start. It looks like the
>> slowdown is due to the cost of uploading the Beam Flink uber jar (~225 MB)
>> to the job server.
>>
>> Is there any way to speed this up?
>>
>> 1. Can the JAR be cached in the Flink job manager and Flink task manager
>> to be reused across runs?
>> 2. Is it possible to bake the JAR into my Docker images and avoid
>> uploading it on each run?
>> 3. Should I run a dedicated Beam job server separate from the Flink
>> cluster?
>>
>> Thanks
>> J
>>
>
