On Fri, Feb 9, 2018 at 4:14 PM, Shay Rojansky <r...@roji.org> wrote:
> I'm a bit late to this thread, sorry if I'm slightly rehashing things. I'd
> like to go back to basics on this.
> Unless I'm mistaken, at least in the Java and .NET world, clients are
> almost always expected to have their own connection pooling, either
> implemented inside the driver (ADO.NET model) or as a separate modular
> component (JDBC). This approach has a few performance advantages:
> 1. "Opening" a new pooled connection is virtually free - no TCP connection
> needs to be opened, no I/O, no startup packet, nothing (only a tiny bit of
> 2. Important client state can be associated with physical connections. For
> example, prepared statements can be tracked on the physical connection, and
> persisted when the connection is returned to the pool. The next time the
> physical connection is handed out by the pool, if the user tries to
> server-prepare a statement, we can check on the connection whether it has
> already been prepared in a "previous lifetime", and if so, there is no need
> to prepare it again. This is vital for scenarios with short-lived (pooled)
> connections, such as web applications. Npgsql does this.
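
To make point 2 concrete, here is a minimal sketch of the idea in Python.
It is not Npgsql's actual implementation; PhysicalConnection, Pool and the
prepare_round_trips counter are hypothetical names, and no real network
I/O happens. It only shows prepared-statement state surviving a round trip
through the pool.

```python
class PhysicalConnection:
    """Stand-in for a real server connection (hypothetical, no network I/O)."""

    def __init__(self):
        self.prepared = {}            # SQL text -> server-side statement name
        self.prepare_round_trips = 0  # how often we "contacted the server"

    def prepare(self, sql):
        # Only go to the server if this physical connection has not already
        # prepared the statement in a "previous lifetime".
        name = self.prepared.get(sql)
        if name is None:
            name = f"stmt_{len(self.prepared)}"
            self.prepare_round_trips += 1  # a real driver would send Parse here
            self.prepared[sql] = name
        return name


class Pool:
    """Toy pool: hands out idle physical connections before creating new ones."""

    def __init__(self):
        self._idle = []

    def open(self):
        # "Opening" a pooled connection is just a list pop when one is idle.
        return self._idle.pop() if self._idle else PhysicalConnection()

    def close(self, conn):
        # Returning to the pool keeps the prepared-statement map intact.
        self._idle.append(conn)


pool = Pool()
sql = "SELECT * FROM users WHERE id = $1"

first = pool.open()
first.prepare(sql)   # first lifetime: one round trip to prepare
pool.close(first)

second = pool.open() # same physical connection, new logical lifetime
second.prepare(sql)  # already prepared, so no extra round trip
```

In this sketch the second logical connection reuses the same physical
connection, so the prepare counter stays at one.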
> Regarding the problem of idle connections being kept open by clients, I'd
> argue it's a client-side problem. If the client is using a connection pool,
> the pool should be configurable to close idle connections after a certain
> time (I think this is relatively standard behavior). If the client isn't
> using a pool, it seems to be the application's responsibility to release
> connections when they're no longer needed.
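
The idle-timeout behavior described above could look roughly like the
following sketch. IdlePool, idle_timeout and prune are names I made up for
illustration, not any particular driver's API, and connections are plain
strings so the example runs without a server. The clock is injectable so
the behavior can be checked without sleeping.

```python
import time


class IdlePool:
    """Toy pool that closes connections idle for idle_timeout seconds or more."""

    def __init__(self, idle_timeout, clock=time.monotonic):
        self.idle_timeout = idle_timeout
        self._clock = clock
        self._idle = []   # list of (connection, time it was returned)
        self.closed = []  # connections we have "closed"

    def release(self, conn):
        # Remember when the connection went idle.
        self._idle.append((conn, self._clock()))

    def prune(self):
        # Close every connection that has been idle for too long.
        now = self._clock()
        keep = []
        for conn, returned_at in self._idle:
            if now - returned_at >= self.idle_timeout:
                self.closed.append(conn)  # a real pool would call conn.close()
            else:
                keep.append((conn, returned_at))
        self._idle = keep

    def idle_count(self):
        return len(self._idle)


# Usage with a fake clock, so no real waiting is needed.
now = [0.0]
pool = IdlePool(idle_timeout=30.0, clock=lambda: now[0])
pool.release("conn-a")  # idle since t=0
now[0] = 20.0
pool.release("conn-b")  # idle since t=20
now[0] = 35.0
pool.prune()            # conn-a idle for 35s is closed, conn-b (15s) is kept
```

A real pool would typically run prune from a background timer and also
keep a configurable minimum number of connections open.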
> The one drawback is that the pooling is application-specific, so it can't
> be shared by multiple applications/hosts. So in some scenarios it may make
> sense to use both client pooling and proxy/server pooling.
> To sum it up, I would argue that connection pooling should first and
> foremost be considered as a client feature, rather than a proxy feature
> (pgpool) or server feature (the PostgreSQL pooling being discussed here).
> This isn't to say server-side pooling has no value though.
Recently, I did a large amount of parallel data processing where the
results were stored in PG. I had about 1000 workers, each with its own PG
connection. As you pointed out, application pooling doesn't make sense in
this scenario. I tried pgpool and pgbouncer, and both ended up as the
bottleneck. Overall throughput was not great, but it was highest without a
pooler. That aligns with Konstantin's benchmarks too. As far as I know,
server-side pooling is the only way to increase throughput for this use
case without upgrading hardware.
I hope this PR gets accepted!