Re: Reducing database connection with JdbcIO

Romain Manni-Bucau Wed, 14 Mar 2018 12:22:02 -0700

A pool only makes sense if you can get a singleton per JVM/datasource
(not even per pipeline in this case) - there is a discussion about that
on dev@, more general than just IO.
A pool of size one without any validation config is like having a single
connection that you reuse for each bundle if it is still open - but it
requires a new jar ;).
I think a validation strategy can make sense, and limiting the concurrency
as well, since an RDBMS will not behave better with hundreds of clients
than with a few dozen.
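
To make the "validation" part concrete, here is a rough sketch of a single
reused connection that is checked before each reuse and reopened when the
check fails. It uses only standard JDBC (Connection.isValid). The class name
ValidatedConnection and the opener callback are made up for illustration;
they are not part of JdbcIO.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.Callable;

/** Hypothetical helper: keeps one connection and reopens it when validation fails. */
final class ValidatedConnection {
    // Assumption: the caller supplies how to open a connection,
    // e.g. () -> DriverManager.getConnection(url, user, password).
    private final Callable<Connection> opener;
    private final int validationTimeoutSeconds;
    private Connection current;

    ValidatedConnection(Callable<Connection> opener, int validationTimeoutSeconds) {
        this.opener = opener;
        this.validationTimeoutSeconds = validationTimeoutSeconds;
    }

    /** Returns the cached connection if it still validates, otherwise opens a new one. */
    synchronized Connection get() throws Exception {
        if (current == null || !isUsable(current)) {
            closeQuietly(current);
            current = opener.call();
        }
        return current;
    }

    private boolean isUsable(Connection c) {
        try {
            // Standard JDBC 4 check; returns false for closed or broken connections.
            return c.isValid(validationTimeoutSeconds);
        } catch (SQLException e) {
            return false;
        }
    }

    private static void closeQuietly(Connection c) {
        if (c == null) {
            return;
        }
        try {
            c.close();
        } catch (SQLException ignored) {
            // The connection is being discarded anyway.
        }
    }
}
```

A real pool gives you this plus a maximum size, which is the knob that caps
how many clients one JVM can open against the database.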


The current workaround can be to set a datasource supplier that uses a
correctly configured pool, probably obtained from a singleton in your app
code.
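
A minimal sketch of such a singleton, assuming you key the pools by JDBC URL.
SharedDataSources and its get method are hypothetical names, not Beam API,
and the factory is wherever you build your pooled DataSource (HikariCP,
commons-dbcp2, ...) with a max size and validation configured. How you hand
the supplier to JdbcIO depends on your Beam version, so that wiring is left
out here.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;
import javax.sql.DataSource;

/** Hypothetical helper: at most one DataSource per JDBC URL in this JVM (class loader). */
final class SharedDataSources {
    // Static state is shared by every DoFn instance loaded by the same class loader.
    private static final ConcurrentMap<String, DataSource> POOLS = new ConcurrentHashMap<>();

    private SharedDataSources() {
    }

    /**
     * Returns the shared DataSource for the given URL, building it on first use.
     * computeIfAbsent runs the factory at most once per key, even when several
     * threads ask for the same URL at the same time.
     */
    static DataSource get(String jdbcUrl, Supplier<DataSource> factory) {
        return POOLS.computeIfAbsent(jdbcUrl, url -> factory.get());
    }
}
```

Every caller in the worker JVM then calls SharedDataSources.get(url, factory)
and ends up on the same pool, so the number of connections per worker is
bounded by the pool size instead of by the number of DoFn instances.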



Romain Manni-Bucau
@rmannibucau <https://twitter.com/rmannibucau> |  Blog
<https://rmannibucau.metawerx.net/> | Old Blog
<http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> |
LinkedIn <https://www.linkedin.com/in/rmannibucau> | Book
<https://www.packtpub.com/application-development/java-ee-8-high-performance>

2018-03-14 20:11 GMT+01:00 Aleksandr <[email protected]>:

> Hello, we had a similar problem. The current JdbcIO will cause a lot of
> connection errors.
>
> Typically you have more than one prepared statement. JdbcIO will create a
> new connection for each prepared statement (and close it only in
> teardown), so it is possible that a connection will time out, or in the
> case of autoscaling you will get too many connections to the SQL database.
> Our solution was to create a connection pool in setup, then get a
> connection and return it to the pool in processElement.
>
> Best Regards,
> Aleksandr Gortujev.
>
> On 14 March 2018 at 8:52 PM, "Jean-Baptiste Onofré" <
> [email protected]> wrote:
>
> Agreed, especially with the current JdbcIO impl, which creates the
> connection in @Setup. Or does it mean that @Teardown is never called?
>
> Regards
> JB
> On 14 March 2018, at 11:40, Eugene Kirpichov <[email protected]> wrote:
>>
>> Hi Derek - could you explain where the "3000 connections" number comes
>> from, i.e. how did you measure it? It's weird that 5-6 workers would
>> use 3000 connections.
>>
>> On Wed, Mar 14, 2018 at 3:50 AM Derek Chan <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> We are new to Beam and need some help.
>>>
>>> We are working on a flow to ingest events and write the aggregated
>>> counts to a database. The input rate is rather low (~2000 messages per
>>> sec), but the processing is relatively heavy, so we need to scale out
>>> to 5~6 nodes. The output (via JDBC) is aggregated, so the volume is also
>>> low. But because of the number of workers, it keeps 3000 connections to
>>> the database and keeps hitting the database connection limits.
>>>
>>> Is there a way that we can reduce the concurrency only at the output
>>> stage? (In Spark we would have done a repartition/coalesce).
>>>
>>> And, if it matters, we are using Apache Beam 2.2 via Scio, on Google
>>> Dataflow.
>>>
>>> Thank you in advance!
>>>
>>>
>>>
>>>
>
