I wonder if this is https://issues.apache.org/jira/browse/BEAM-3245 , but in that JIRA it is unclear exactly in what cases it is broken and how much. +Thomas Groh <[email protected]> +Kenn Knowles <[email protected]> Dataflow Runner normally doesn't leak DoFn's if their ProcessElement threw an exception, right?
On Wed, Mar 14, 2018 at 12:24 PM Romain Manni-Bucau <[email protected]> wrote: > side note: try to do a thread dump on the workers maybe > > > Romain Manni-Bucau > @rmannibucau <https://twitter.com/rmannibucau> | Blog > <https://rmannibucau.metawerx.net/> | Old Blog > <http://rmannibucau.wordpress.com> | Github > <https://github.com/rmannibucau> | LinkedIn > <https://www.linkedin.com/in/rmannibucau> | Book > <https://www.packtpub.com/application-development/java-ee-8-high-performance> > > 2018-03-14 20:21 GMT+01:00 Eugene Kirpichov <[email protected]>: > >> "Jdbcio will create for each prepared statement new connection" - this is >> not the case: the connection is created in @Setup and deleted in @Teardown. >> >> https://github.com/apache/beam/blob/v2.3.0/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L503 >> >> https://github.com/apache/beam/blob/v2.3.0/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L631 >> >> Something else must be going wrong. >> >> On Wed, Mar 14, 2018 at 12:11 PM Aleksandr <[email protected]> >> wrote: >> >>> Hello, we had similar problem. Current jdbcio will cause alot of >>> connection errors. >>> >>> Typically you have more than one prepared statement. Jdbcio will create >>> for each prepared statement new connection(and close only in teardown) So >>> it is possible that connection will get timeot or in case in case of auto >>> scaling you will get to many connections to sql. >>> Our solution was to create connection pool in setup and get connection >>> and return back to pool in processElement. >>> >>> Best Regards, >>> Aleksandr Gortujev. >>> >>> 14. märts 2018 8:52 PM kirjutas kuupäeval "Jean-Baptiste Onofré" < >>> [email protected]>: >>> >>> Agree especially using the current JdbcIO impl that creates connection >>> in the @Setup. Or it means that @Teardown is never called ? >>> >>> Regards >>> JB >>> Le 14 mars 2018, à 11:40, Eugene Kirpichov <[email protected]> a >>> écrit: >>>> >>>> Hi Derek - could you explain where does the "3000 connections" number >>>> come from, i.e. how did you measure it? It's weird that 5-6 workers would >>>> use 3000 connections. >>>> >>>> On Wed, Mar 14, 2018 at 3:50 AM Derek Chan <[email protected]> wrote: >>>> >>>>> Hi, >>>>> >>>>> We are new to Beam and need some help. >>>>> >>>>> We are working on a flow to ingest events and writes the aggregated >>>>> counts to a database. The input rate is rather low (~2000 message per >>>>> sec), but the processing is relatively heavy, that we need to scale out >>>>> to 5~6 nodes. The output (via JDBC) is aggregated, so the volume is >>>>> also >>>>> low. But because of the number of workers, it keeps 3000 connections to >>>>> the database and it keeps hitting the database connection limits. >>>>> >>>>> Is there a way that we can reduce the concurrency only at the output >>>>> stage? (In Spark we would have done a repartition/coalesce). >>>>> >>>>> And, if it matters, we are using Apache Beam 2.2 via Scio, on Google >>>>> Dataflow. >>>>> >>>>> Thank you in advance! >>>>> >>>>> >>>>> >>>>> >>> >
