Hi Derek - could you explain where the "3000 connections" number comes
from, i.e. how you measured it? It's weird that 5-6 workers would use
3000 connections.

On Wed, Mar 14, 2018 at 3:50 AM Derek Chan <[email protected]> wrote:

> Hi,
>
> We are new to Beam and need some help.
>
> We are working on a flow that ingests events and writes the aggregated
> counts to a database. The input rate is rather low (~2000 messages per
> sec), but the processing is relatively heavy, so we need to scale out
> to 5~6 nodes. The output (via JDBC) is aggregated, so the volume is also
> low. But because of the number of workers, it keeps 3000 connections
> open to the database and keeps hitting the database connection limits.
>
> Is there a way that we can reduce the concurrency only at the output
> stage? (In Spark we would have done a repartition/coalesce).
>
> And, if it matters, we are using Apache Beam 2.2 via Scio, on Google
> Dataflow.
>
> Thank you in advance!
>
>
>
>
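
A note on the repartition/coalesce idea in the question: one common way to
bound write concurrency is to assign each record to one of a small, fixed
number of shards and write one shard at a time per writer. The sketch below
shows only that sharding step in plain Python. It is not Beam or Scio code,
and NUM_SHARDS, shard_for and group_by_shard are hypothetical names chosen
for illustration. In a Beam pipeline the analogous step would presumably be
keying elements by shard number and applying GroupByKey before the JDBC
write; whether that actually caps the number of open connections on Dataflow
is an assumption that the thread does not confirm.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 10  # hypothetical cap on concurrent database writers


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a stable shard index in the range [0, num_shards)."""
    # crc32 is deterministic across processes, unlike Python's built-in hash().
    return zlib.crc32(key.encode("utf-8")) % num_shards


def group_by_shard(records, num_shards: int = NUM_SHARDS):
    """Group (key, count) records by shard so each shard has a single writer."""
    shards = defaultdict(list)
    for key, count in records:
        shards[shard_for(key, num_shards)].append((key, count))
    return dict(shards)


if __name__ == "__main__":
    # Made-up aggregated counts, only to exercise the functions.
    records = [(f"event-{i}", i) for i in range(1000)]
    shards = group_by_shard(records)
    # However many records arrive, there are never more than NUM_SHARDS groups.
    print(len(shards) <= NUM_SHARDS)
```

With this shape the number of groups, and so the number of writers that need
a connection at the same time, is bounded by NUM_SHARDS rather than by the
worker or thread count.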
