Hi Derek - could you explain where the "3000 connections" number comes from, i.e. how you measured it? It's weird that 5-6 workers would use 3000 connections.
On Wed, Mar 14, 2018 at 3:50 AM Derek Chan <[email protected]> wrote:
> Hi,
>
> We are new to Beam and need some help.
>
> We are working on a flow to ingest events and writes the aggregated
> counts to a database. The input rate is rather low (~2000 message per
> sec), but the processing is relatively heavy, that we need to scale out
> to 5~6 nodes. The output (via JDBC) is aggregated, so the volume is also
> low. But because of the number of workers, it keeps 3000 connections to
> the database and it keeps hitting the database connection limits.
>
> Is there a way that we can reduce the concurrency only at the output
> stage? (In Spark we would have done a repartition/coalesce).
>
> And, if it matters, we are using Apache Beam 2.2 via Scio, on Google
> Dataflow.
>
> Thank you in advance!
