Hi Sandeep,

Can you please share your DAG? Does your job read from a source and write to some sink?
Thanks a lot!

On Mon, Aug 10, 2020 at 9:27 AM Kathula, Sandeep <[email protected]> wrote:

> Hi,
>
> We started a Beam application with the Flink runner with parallelism 50.
> It is a *stateless application*. With the initial parallelism of 50, our
> application is able to process up to *50,000 records* per second. After a
> week, we took a savepoint and restarted from the savepoint with a
> parallelism of *18*. We are seeing that our application is only able to
> process *7,000 records* per second, but we expected almost 18,000 records
> per second. Records processed per task manager were almost *half* of what
> each task manager used to process when we ran with 50 of them.
>
> When we started a new application with 18 pods without any savepoint, it
> is able to process ~18,500 records per second. This problem *occurs only
> when we downscale after taking a savepoint*. We ported the same
> application to a plain *Flink application without Apache Beam*, and there
> *it scales well without any issues* after restarting from a savepoint
> with lower parallelism. So the problem should be with Apache Beam or some
> config we are passing to Beam/Flink. We are using the following config:
>
> numberOfExecutionRetries=2
> externalizedCheckpointsEnabled=true
> retainExternalizedCheckpointsOnCancellation=true
>
> We didn't set any maxParallelism in our Beam application; we only
> specified parallelism.
>
> Beam version - 2.19
> Flink version - 1.9
>
> Any suggestions/help would be appreciated.
>
> Thanks
> Sandeep Kathula
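Since maxParallelism isn't set in the quoted config, one thing worth trying is pinning it explicitly and comparing throughput after the downscaled restore. Below is a minimal sketch (not your actual job; the class name, the parallelism values, and the maxParallelism of 128 are illustrative) of how the quoted options plus an explicit maxParallelism can be set on the Flink runner's FlinkPipelineOptions in Beam 2.19:

    // Illustrative sketch only: sets the options from the quoted config plus
    // an explicit maxParallelism on Beam's FlinkPipelineOptions.
    import org.apache.beam.runners.flink.FlinkPipelineOptions;
    import org.apache.beam.runners.flink.FlinkRunner;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class PipelineSetup {
      public static void main(String[] args) {
        FlinkPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);
        options.setRunner(FlinkRunner.class);

        // The options from the quoted config:
        options.setNumberOfExecutionRetries(2);
        options.setExternalizedCheckpointsEnabled(true);
        options.setRetainExternalizedCheckpointsOnCancellation(true);

        // Parallelism as in the downscaled run; the maxParallelism value (128)
        // is only an example. Pinning maxParallelism fixes the number of key
        // groups across savepoint/restore, independent of operator parallelism.
        options.setParallelism(18);
        options.setMaxParallelism(128);

        Pipeline pipeline = Pipeline.create(options);
        // ... apply your transforms here ...
        pipeline.run().waitUntilFinish();
      }
    }

The same options should also be accepted as command-line flags (e.g. --parallelism=18 --maxParallelism=128) via PipelineOptionsFactory, if you'd rather not set them programmatically.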
