"Jdbcio will create for each prepared statement new connection" - this is
not the case: the connection is created in @Setup and deleted in @Teardown.
https://github.com/apache/beam/blob/v2.3.0/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L503
https://github.com/apache/beam/blob/v2.3.0/sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcIO.java#L631
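
For reference, a condensed sketch of that lifecycle (names abbreviated from
the linked v2.3.0 source, not the verbatim Beam code; the real WriteFn builds
the DataSource from a serializable configuration and batches its writes):

import java.sql.Connection;
import java.sql.PreparedStatement;
import javax.sql.DataSource;
import org.apache.beam.sdk.transforms.DoFn;

// One connection per DoFn instance, not per statement.
class WriteFnSketch<T> extends DoFn<T, Void> {
  private final DataSource dataSource;
  private final String statement;
  private transient Connection connection;
  private transient PreparedStatement preparedStatement;

  WriteFnSketch(DataSource dataSource, String statement) {
    this.dataSource = dataSource;
    this.statement = statement;
  }

  @Setup
  public void setup() throws Exception {
    connection = dataSource.getConnection();              // opened once
    preparedStatement = connection.prepareStatement(statement);
  }

  @ProcessElement
  public void processElement(ProcessContext c) throws Exception {
    preparedStatement.clearParameters();                  // statement is reused
    // ... bind parameters from c.element() and execute ...
  }

  @Teardown
  public void teardown() throws Exception {
    if (preparedStatement != null) preparedStatement.close();
    if (connection != null) connection.close();           // closed once
  }
}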

Something else must be going wrong.

On Wed, Mar 14, 2018 at 12:11 PM Aleksandr <[email protected]> wrote:

> Hello, we had a similar problem. The current JdbcIO will cause a lot of
> connection errors.
>
> Typically you have more than one prepared statement. JdbcIO will create a
> new connection for each prepared statement (and close it only in teardown),
> so a connection may time out, or, in the case of autoscaling, you may open
> too many connections to the SQL server.
> Our solution was to create a connection pool in setup, then get a
> connection from the pool and return it to the pool in processElement.
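>
> A sketch of that approach (assuming HikariCP; the class name, URL and
> statement below are made-up placeholders, not our production code):
>
> import java.sql.Connection;
> import java.sql.PreparedStatement;
> import com.zaxxer.hikari.HikariConfig;
> import com.zaxxer.hikari.HikariDataSource;
> import org.apache.beam.sdk.transforms.DoFn;
>
> class PooledJdbcWriteFn extends DoFn<String, Void> {
>   private final String jdbcUrl;
>   private transient HikariDataSource pool;
>
>   PooledJdbcWriteFn(String jdbcUrl) { this.jdbcUrl = jdbcUrl; }
>
>   @Setup
>   public void setup() {
>     HikariConfig config = new HikariConfig();
>     config.setJdbcUrl(jdbcUrl);
>     config.setMaximumPoolSize(2);  // hard cap per DoFn instance
>     pool = new HikariDataSource(config);
>   }
>
>   @ProcessElement
>   public void processElement(ProcessContext c) throws Exception {
>     // try-with-resources returns the connection to the pool
>     // instead of closing the physical socket.
>     try (Connection conn = pool.getConnection();
>          PreparedStatement ps =
>              conn.prepareStatement("INSERT INTO events (v) VALUES (?)")) {
>       ps.setString(1, c.element());
>       ps.executeUpdate();
>     }
>   }
>
>   @Teardown
>   public void teardown() {
>     if (pool != null) pool.close();
>   }
> }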
>
> Best Regards,
> Aleksandr Gortujev.
>
> On Mar 14, 2018 at 8:52 PM, "Jean-Baptiste Onofré" <
> [email protected]> wrote:
>
> Agreed, especially since the current JdbcIO impl creates the connection in
> @Setup. Or does it mean that @Teardown is never called?
>
> Regards
> JB
> On Mar 14, 2018, at 11:40, Eugene Kirpichov <[email protected]> wrote:
>>
>> Hi Derek - could you explain where the "3000 connections" number comes
>> from, i.e. how did you measure it? It's weird that 5-6 workers would
>> use 3000 connections.
>>
>> On Wed, Mar 14, 2018 at 3:50 AM Derek Chan <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> We are new to Beam and need some help.
>>>
>>> We are working on a flow that ingests events and writes the aggregated
>>> counts to a database. The input rate is rather low (~2000 messages per
>>> sec), but the processing is heavy enough that we need to scale out to
>>> 5-6 nodes. The output (via JDBC) is aggregated, so its volume is also
>>> low. But because of the number of workers, the pipeline keeps 3000
>>> connections to the database open and keeps hitting the database
>>> connection limit.
>>>
>>> Is there a way that we can reduce the concurrency only at the output
>>> stage? (In Spark we would have done a repartition/coalesce).
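>>>
>>> Something like the following is what we are after (a hypothetical
>>> sketch of the pattern we have in mind, not code from our pipeline:
>>> funnel the output through a small fixed key space so that at most
>>> NUM_SHARDS groups are written concurrently):
>>>
>>> import java.util.concurrent.ThreadLocalRandom;
>>> import org.apache.beam.sdk.transforms.DoFn;
>>> import org.apache.beam.sdk.transforms.GroupByKey;
>>> import org.apache.beam.sdk.transforms.ParDo;
>>> import org.apache.beam.sdk.values.KV;
>>> import org.apache.beam.sdk.values.PCollection;
>>>
>>> class CoalesceForWrite {
>>>   static final int NUM_SHARDS = 4;
>>>
>>>   static PCollection<KV<Integer, Iterable<String>>> apply(
>>>       PCollection<String> aggregated) {
>>>     return aggregated
>>>         .apply("AssignShard",
>>>             ParDo.of(new DoFn<String, KV<Integer, String>>() {
>>>               @ProcessElement
>>>               public void processElement(ProcessContext c) {
>>>                 // Random shard in [0, NUM_SHARDS) bounds write fan-out.
>>>                 int shard = ThreadLocalRandom.current().nextInt(NUM_SHARDS);
>>>                 c.output(KV.of(shard, c.element()));
>>>               }
>>>             }))
>>>         .apply("OnePerShard", GroupByKey.<Integer, String>create());
>>>   }
>>> }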
>>>
>>> And, if it matters, we are using Apache Beam 2.2 via Scio, on Google
>>> Dataflow.
>>>
>>> Thank you in advance!
>>>
