Re: Reducing database connection with JdbcIO

Jean-Baptiste Onofré Wed, 14 Mar 2018 08:01:47 -0700

Hi Derek,

I think you could be interested in:


https://github.com/apache/beam/pull/4461

related to BEAM-3500.

I introduced an internal poolable datasource.

I hope it could help.

Regards
JB

On 14/03/2018 11:49, Derek Chan wrote:

Hi,

We are new to Beam and need some help.
We are working on a flow that ingests events and writes the aggregated counts to a database. The input rate is rather low (~2000 messages per sec), but the processing is relatively heavy, so we need to scale out to 5~6 nodes. The output (via JDBC) is aggregated, so the volume is also low. But because of the number of workers, it keeps 3000 connections to the database and keeps hitting the database connection limit.
Is there a way that we can reduce the concurrency only at the output stage? (In Spark we would have done a repartition/coalesce.)
And, if it matters, we are using Apache Beam 2.2 via Scio, on Google Dataflow.
Thank you in advance!
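
The repartition/coalesce idea in the question can be illustrated with a minimal, library-free sketch: assign every aggregated record to one of a small, fixed number of shards, so that only one writer (and therefore one connection) is needed per shard. This is not the approach taken in the pull request above and does not use the Beam API; the class name OutputSharding, the methods shardFor and groupByShard, the sample records, and the shard count of 2 are all hypothetical. In a Beam pipeline the analogous step would presumably be keying the elements and grouping by key before the JDBC write, but whether a given runner then limits write parallelism to the number of keys is an assumption that should be verified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OutputSharding {

    /** Maps a record to one of numShards buckets; floorMod keeps the result non-negative. */
    static int shardFor(Object record, int numShards) {
        return Math.floorMod(record.hashCode(), numShards);
    }

    /** Groups records by shard so that each shard can be flushed by a single writer. */
    static <T> Map<Integer, List<T>> groupByShard(List<T> records, int numShards) {
        Map<Integer, List<T>> shards = new TreeMap<>();
        for (T record : records) {
            shards.computeIfAbsent(shardFor(record, numShards), k -> new ArrayList<>())
                  .add(record);
        }
        return shards;
    }

    public static void main(String[] args) {
        // Hypothetical aggregated output records.
        List<String> aggregates = Arrays.asList(
                "page_views=120", "signups=7", "errors=3", "logins=54", "purchases=11");

        // Hypothetical cap on the number of concurrent database writers.
        int numShards = 2;

        Map<Integer, List<String>> shards = groupByShard(aggregates, numShards);

        // At most numShards batches exist, independent of how many upstream
        // workers produced the records.
        shards.forEach((shard, batch) ->
                System.out.println("shard " + shard + " -> " + batch));
    }
}
```

The point of the design is that the number of batches is bounded by the shard count rather than by the number of workers, which is the property a connection limit cares about.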
