Re: Named Prepared statement problems and possible solutions

2023-06-08 Thread Jan Wieck

On 6/8/23 15:57, Dave Cramer wrote:


Apparently this is coming in pgbouncer Support of prepared statements by 
knizhnik · Pull Request #845 · pgbouncer/pgbouncer (github.com) 



I am quite interested in that patch. Considering how pgbouncer works 
internally I am very curious.



Jan





Re: Named Prepared statement problems and possible solutions

2023-06-08 Thread Dave Cramer
On Thu, 8 Jun 2023 at 15:49, Jan Wieck  wrote:

> On 6/8/23 13:31, Dave Cramer wrote:
> >
> > On Thu, 8 Jun 2023 at 11:22, Konstantin Knizhnik  > > wrote:
> >
>
> > So it will be responsibility of client to remember text of prepared
> > query to be able to resend it when statement doesn't exists at
> server?
> > IMHO very strange decision. Why not to handle it in connection
> > pooler (doesn't matter - external or embedded)?
> >
> >
> > I may be myopic but in the JDBC world and I assume others we have a
> > `PreparedStatement` object which has the text of the query.
> > The text is readily available to us.
> >
> > Also again from the JDBC point of view we have use un-named statements
> > normally and then name them after 5 uses so we already have embedded
> > logic on how to deal with PreparedStatements
>
> The entire problem only surfaces when using a connection pool of one
> sort or another. Without one the session is persistent to the client.
>
> At some point I created a "functional" proof of concept for a connection
> pool that did a mapping of the client side name to a pool managed server
> side name. It kept track of which query was known by a server. It kept a
> hashtable of poolname+username+query MD5 sums. On each prepare request
> it would look up if that query is known, add a query-client reference in
> another hashtable and so on. On a Bind/Exec message it would check that
> the server has the query prepared and issue a P message if not. What was
> missing was to keep track of no longer needed queries and deallocate them.
>
> As said, it was a POC. Since it was implemented in Tcl it performed
> miserable, but I got it to the point of being able to pause & resume and
> the whole thing did work with prepared statements on the transaction
> level. So it was a full functioning POC.
>
> What makes this design appealing to me is that it is completely
> transparent to every existing client that uses the extended query
> protocol for server side prepared statements.
>

Apparently this is coming in pgbouncer Support of prepared statements by
knizhnik · Pull Request #845 · pgbouncer/pgbouncer (github.com)


Dave

>
>
> Jan
>
>


Re: Named Prepared statement problems and possible solutions

2023-06-08 Thread Jan Wieck

On 6/8/23 13:31, Dave Cramer wrote:


On Thu, 8 Jun 2023 at 11:22, Konstantin Knizhnik > wrote:





So it will be responsibility of client to remember text of prepared
query to be able to resend it when statement doesn't exists at server?
IMHO very strange decision. Why not to handle it in connection
pooler (doesn't matter - external or embedded)?


I may be myopic but in the JDBC world and I assume others we have a 
`PreparedStatement` object which has the text of the query.

The text is readily available to us.

Also again from the JDBC point of view we have use un-named statements 
normally and then name them after 5 uses so we already have embedded 
logic on how to deal with PreparedStatements


The entire problem only surfaces when using a connection pool of one 
sort or another. Without one the session is persistent to the client.


At some point I created a "functional" proof of concept for a connection 
pool that did a mapping of the client side name to a pool managed server 
side name. It kept track of which query was known by a server. It kept a 
hashtable of poolname+username+query MD5 sums. On each prepare request 
it would look up if that query is known, add a query-client reference in 
another hashtable and so on. On a Bind/Exec message it would check that 
the server has the query prepared and issue a P message if not. What was 
missing was to keep track of no longer needed queries and deallocate them.


As said, it was a POC. Since it was implemented in Tcl it performed 
miserable, but I got it to the point of being able to pause & resume and 
the whole thing did work with prepared statements on the transaction 
level. So it was a full functioning POC.


What makes this design appealing to me is that it is completely 
transparent to every existing client that uses the extended query 
protocol for server side prepared statements.



Jan





Re: Named Prepared statement problems and possible solutions

2023-06-08 Thread Dave Cramer
On Thu, 8 Jun 2023 at 11:22, Konstantin Knizhnik  wrote:

>
>
> On 08.06.2023 6:18 PM, Dave Cramer wrote:
>
>
>
> On Thu, 8 Jun 2023 at 11:15, Jan Wieck  wrote:
>
>> On 6/8/23 10:56, Dave Cramer wrote:
>> >
>> >
>> >
>> >
>> > On Thu, 8 Jun 2023 at 10:31, Jan Wieck > > > wrote:
>> >
>> > On 6/8/23 09:53, Jan Wieck wrote:
>> >  > On 6/8/23 09:21, Dave Cramer wrote:
>> >  > The server doesn't know about all the clients of the pooler, does
>> > it? It
>> >  > has no way of telling if/when a client disconnects from the
>> pooler.
>> >
>> > Another problem that complicates doing it in the server is that the
>> > information require to (re-)prepare a statement in a backend that
>> > currently doesn't have it needs to be kept in shared memory. This
>> > includes the query string itself. Doing that without shared memory
>> in a
>> > pooler that is multi-threaded or based on async-IO is much simpler
>> and
>> > allows for easy ballooning.
>> >
>> >
>> > I don't expect the server to re-prepare the statement. If the server
>> > responds with "statement doesn't exist" the client would send a prepare.
>>
>> Are you proposing a new libpq protocol version?
>>
>
> I believe we would need to add this to the protocol, yes.
>
>
>
> So it will be responsibility of client to remember text of prepared query
> to be able to resend it when statement doesn't exists at server?
> IMHO very strange decision. Why not to handle it in connection pooler
> (doesn't matter - external or embedded)?
>

I may be myopic but in the JDBC world and I assume others we have a
`PreparedStatement` object which has the text of the query.
The text is readily available to us.

Also again from the JDBC point of view we have use un-named statements
normally and then name them after 5 uses so we already have embedded logic
on how to deal with PreparedStatements

Dave


Re: Named Prepared statement problems and possible solutions

2023-06-08 Thread Konstantin Knizhnik



On 08.06.2023 6:18 PM, Dave Cramer wrote:



On Thu, 8 Jun 2023 at 11:15, Jan Wieck  wrote:

On 6/8/23 10:56, Dave Cramer wrote:
>
>
>
>
> On Thu, 8 Jun 2023 at 10:31, Jan Wieck  > wrote:
>
>     On 6/8/23 09:53, Jan Wieck wrote:
>      > On 6/8/23 09:21, Dave Cramer wrote:
>      > The server doesn't know about all the clients of the
pooler, does
>     it? It
>      > has no way of telling if/when a client disconnects from
the pooler.
>
>     Another problem that complicates doing it in the server is
that the
>     information require to (re-)prepare a statement in a backend
that
>     currently doesn't have it needs to be kept in shared memory.
This
>     includes the query string itself. Doing that without shared
memory in a
>     pooler that is multi-threaded or based on async-IO is much
simpler and
>     allows for easy ballooning.
>
>
> I don't expect the server to re-prepare the statement. If the
server
> responds with "statement doesn't exist" the client would send a
prepare.

Are you proposing a new libpq protocol version?


I believe we would need to add this to the protocol, yes.



So it will be responsibility of client to remember text of prepared 
query to be able to resend it when statement doesn't exists at server?
IMHO very strange decision. Why not to handle it in connection pooler 
(doesn't matter - external or embedded)?


Re: Named Prepared statement problems and possible solutions

2023-06-08 Thread Dave Cramer
On Thu, 8 Jun 2023 at 11:15, Jan Wieck  wrote:

> On 6/8/23 10:56, Dave Cramer wrote:
> >
> >
> >
> >
> > On Thu, 8 Jun 2023 at 10:31, Jan Wieck  > > wrote:
> >
> > On 6/8/23 09:53, Jan Wieck wrote:
> >  > On 6/8/23 09:21, Dave Cramer wrote:
> >  > The server doesn't know about all the clients of the pooler, does
> > it? It
> >  > has no way of telling if/when a client disconnects from the
> pooler.
> >
> > Another problem that complicates doing it in the server is that the
> > information require to (re-)prepare a statement in a backend that
> > currently doesn't have it needs to be kept in shared memory. This
> > includes the query string itself. Doing that without shared memory
> in a
> > pooler that is multi-threaded or based on async-IO is much simpler
> and
> > allows for easy ballooning.
> >
> >
> > I don't expect the server to re-prepare the statement. If the server
> > responds with "statement doesn't exist" the client would send a prepare.
>
> Are you proposing a new libpq protocol version?
>

I believe we would need to add this to the protocol, yes.

Dave

>
>
> Jan
>


Re: Named Prepared statement problems and possible solutions

2023-06-08 Thread Jan Wieck

On 6/8/23 10:56, Dave Cramer wrote:





On Thu, 8 Jun 2023 at 10:31, Jan Wieck > wrote:


On 6/8/23 09:53, Jan Wieck wrote:
 > On 6/8/23 09:21, Dave Cramer wrote:
 > The server doesn't know about all the clients of the pooler, does
it? It
 > has no way of telling if/when a client disconnects from the pooler.

Another problem that complicates doing it in the server is that the
information require to (re-)prepare a statement in a backend that
currently doesn't have it needs to be kept in shared memory. This
includes the query string itself. Doing that without shared memory in a
pooler that is multi-threaded or based on async-IO is much simpler and
allows for easy ballooning.


I don't expect the server to re-prepare the statement. If the server 
responds with "statement doesn't exist" the client would send a prepare.


Are you proposing a new libpq protocol version?


Jan




Re: Named Prepared statement problems and possible solutions

2023-06-08 Thread Dave Cramer
On Thu, 8 Jun 2023 at 10:31, Jan Wieck  wrote:

> On 6/8/23 09:53, Jan Wieck wrote:
> > On 6/8/23 09:21, Dave Cramer wrote:
> > The server doesn't know about all the clients of the pooler, does it? It
> > has no way of telling if/when a client disconnects from the pooler.
>
> Another problem that complicates doing it in the server is that the
> information require to (re-)prepare a statement in a backend that
> currently doesn't have it needs to be kept in shared memory. This
> includes the query string itself. Doing that without shared memory in a
> pooler that is multi-threaded or based on async-IO is much simpler and
> allows for easy ballooning.
>
>
I don't expect the server to re-prepare the statement. If the server
responds with "statement doesn't exist" the client would send a prepare.

Dave


Re: Named Prepared statement problems and possible solutions

2023-06-08 Thread Dave Cramer
On Thu, 8 Jun 2023 at 09:53, Jan Wieck  wrote:

> On 6/8/23 09:21, Dave Cramer wrote:
> >
> >
> > On Thu, Jun 8, 2023 at 8:43 AM Jan Wieck  > > wrote:
> >
> > On 6/8/23 02:15, Konstantin Knizhnik wrote:
> >
> >  > There is a PR with support of prepared statement support to
> > pgbouncer:
> >  > https://github.com/pgbouncer/pgbouncer/pull/845
> > 
> >  > any feedback, reviews and suggestions are welcome.
> >
> > I was about to say that the support would have to come from the
> pooler
> > as it is possible to have multiple applications in different
> languages
> > connecting to the same pool(s).
> >
> >
> > Why from the pooler? If it were done at the server every client could
> > use it?
>
> The server doesn't know about all the clients of the pooler, does it? It
> has no way of telling if/when a client disconnects from the pooler.
>

Why does it have to know if the client disconnects ? It just keeps a cache
of prepared statements.
In large apps it is very likely there will be another client wanting to use
the statement

Dave

>
>


Re: Named Prepared statement problems and possible solutions

2023-06-08 Thread Jan Wieck

On 6/8/23 09:53, Jan Wieck wrote:

On 6/8/23 09:21, Dave Cramer wrote:
The server doesn't know about all the clients of the pooler, does it? It
has no way of telling if/when a client disconnects from the pooler.


Another problem that complicates doing it in the server is that the 
information require to (re-)prepare a statement in a backend that 
currently doesn't have it needs to be kept in shared memory. This 
includes the query string itself. Doing that without shared memory in a 
pooler that is multi-threaded or based on async-IO is much simpler and 
allows for easy ballooning.



Jan





Re: Named Prepared statement problems and possible solutions

2023-06-08 Thread Jan Wieck

On 6/8/23 09:21, Dave Cramer wrote:



On Thu, Jun 8, 2023 at 8:43 AM Jan Wieck > wrote:


On 6/8/23 02:15, Konstantin Knizhnik wrote:

 > There is a PR with support of prepared statement support to
pgbouncer:
 > https://github.com/pgbouncer/pgbouncer/pull/845

 > any feedback, reviews and suggestions are welcome.

I was about to say that the support would have to come from the pooler
as it is possible to have multiple applications in different languages
connecting to the same pool(s).


Why from the pooler? If it were done at the server every client could 
use it?


The server doesn't know about all the clients of the pooler, does it? It 
has no way of telling if/when a client disconnects from the pooler.



Jan




Re: Named Prepared statement problems and possible solutions

2023-06-08 Thread Konstantin Knizhnik



On 08.06.2023 3:43 PM, Jan Wieck wrote:

On 6/8/23 02:15, Konstantin Knizhnik wrote:


There is a PR with support of prepared statement support to pgbouncer:
https://github.com/pgbouncer/pgbouncer/pull/845
any feedback, reviews and suggestions are welcome.


I was about to say that the support would have to come from the pooler 
as it is possible to have multiple applications in different languages 
connecting to the same pool(s)


Ideally, support should be provided by both sides: only pooler knows 
mapping between clients and postgres backends and only server knows
which queries require session semantic and which not (in principle it is 
possible to make connection pooler to determine it, but it is very 
non-trivial).

.

I can certainly give this a try, possibly over the weekend. I have a 
TPC-C that can use prepared statements plus pause/resume. That might 
be a good stress for it.




By the way, I have done some small benchmarking of different connection 
poolers for Postgres.
Benchmark was very simple: I just create small pgbench database with 
scale 10 and then

run read-only queries with 100 clients:

pgbench -c 100 -P 10 -T 100 -S -M prepared postgres


Number of connections to the database was limited in an all pooler
configurations to 10. I have tested only transaction mode. If pooler 
supports prepared statements, I have also tested them.
Just for reference I also include results with direct connection to 
Postgres.
All benchamrking was done at my notebook, so it is not quite 
representative scenario.



Direct:
Connections  Prepared TPS
10   yes   135507
10   no 73218
100  yes79042
100  no 59245

Pooler: (100 client connections, 10 server connections, transaction mode)
Pooler Prepared TPS
pgbouncer  no  65029
pgbouncer-ps   no  65570
pgbouncer-ps   yes 65825
odysseyyes 18351
odysseyno  21299
pgagrolno  29673
pgcat  no  23247


Re: Named Prepared statement problems and possible solutions

2023-06-08 Thread Dave Cramer
On Thu, Jun 8, 2023 at 8:43 AM Jan Wieck  wrote:

> On 6/8/23 02:15, Konstantin Knizhnik wrote:
>
> > There is a PR with support of prepared statement support to pgbouncer:
> > https://github.com/pgbouncer/pgbouncer/pull/845
> > any feedback, reviews and suggestions are welcome.
>
> I was about to say that the support would have to come from the pooler
> as it is possible to have multiple applications in different languages
> connecting to the same pool(s).


Why from the pooler? If it were done at the server every client could use
it?

>
> Dave

>
> --
Dave Cramer


Re: Named Prepared statement problems and possible solutions

2023-06-08 Thread Jan Wieck

On 6/8/23 02:15, Konstantin Knizhnik wrote:


There is a PR with support of prepared statement support to pgbouncer:
https://github.com/pgbouncer/pgbouncer/pull/845
any feedback, reviews and suggestions are welcome.


I was about to say that the support would have to come from the pooler 
as it is possible to have multiple applications in different languages 
connecting to the same pool(s).


I can certainly give this a try, possibly over the weekend. I have a 
TPC-C that can use prepared statements plus pause/resume. That might be 
a good stress for it.



Best Regards, Jan




Re: Named Prepared statement problems and possible solutions

2023-06-08 Thread Dave Cramer
Hi Konstantin,

Yes, I ran into Euler at pgcon and he mentioned this. I intend to test it.
I'd still like to see my proposal in the server.

Dave Cramer


On Thu, 8 Jun 2023 at 02:15, Konstantin Knizhnik  wrote:

>
>
> On 07.06.2023 10:48 PM, Dave Cramer wrote:
>
> Greetings,
>
> At pgcon last week I was speaking to some people about the problem we have
> with connection pools and named prepared statements.
>
> For context pgjdbc (and others) use un-named statements and then switch to
> named statements after using the statement N (default 5) times. In session
> mode this is not a problem. When the connection is closed by the
> application the pools generally issue "DISCARD ALL" and close all prepared
> statements. The next time the connection is opened the statement is
> prepared and all works as it should.
>
> However one of the more interesting use cases for pgbouncer is to use
> "TRANSACTION MODE" to manage idle sessions. In transaction mode the
> connection is returned to the pool after each transaction. There are usage
> patterns in large applications where clients have client pools and
> subsequently have large numbers of connections open. Sometimes in the
> thousands, unfortunately many of these are idle connections. Using
> transaction mode reduces the number of real connections to the database in
> many cases by orders of magnitude.
>
> Unfortunately this is incompatible with named prepared statements. From
> the client's point of view they have one session and named prepared
> statements are session objects. From one transaction to the next the
> physical connection can change along with the attached prepared statements.
>
> The idea that was discussed is when we prepare the statement we cache it
> in a statement cache and return a queryid much like the queryid used in
> pg_stat_statements.  Instead of executing the statement name we would
> execute the queryid.
>
> If the queryid did not exist, attempting to execute it would cause an
> error and cause the running transaction to fail. Retrieving the statement
> from the query cache would have to happen before the attempt to execute it
> and return an error to the client subsequently the client could re-prepare
> the statement and execute. This would have to happen in such a way as to
> not cause the transaction to fail.
>
> The one other idea that was proposed was to cache the statements in the
> client. However this does nothing to address the issue of managing idle
> connections.
>
> Regards,
> Dave Cramer
>
>
>
> There is a PR with support of prepared statement support to pgbouncer:
> https://github.com/pgbouncer/pgbouncer/pull/845
> any feedback, reviews and suggestions are welcome.
>


Re: Named Prepared statement problems and possible solutions

2023-06-07 Thread Konstantin Knizhnik



On 07.06.2023 10:48 PM, Dave Cramer wrote:

Greetings,

At pgcon last week I was speaking to some people about the problem we 
have with connection pools and named prepared statements.


For context pgjdbc (and others) use un-named statements and then 
switch to named statements after using the statement N (default 5) 
times. In session mode this is not a problem. When the connection is 
closed by the application the pools generally issue "DISCARD ALL" and 
close all prepared statements. The next time the connection is opened 
the statement is prepared and all works as it should.


However one of the more interesting use cases for pgbouncer is to use 
"TRANSACTION MODE" to manage idle sessions. In transaction mode the 
connection is returned to the pool after each transaction. There are 
usage patterns in large applications where clients have client pools 
and subsequently have large numbers of connections open. Sometimes in 
the thousands, unfortunately many of these are idle connections. Using 
transaction mode reduces the number of real connections to the 
database in many cases by orders of magnitude.


Unfortunately this is incompatible with named prepared statements. 
From the client's point of view they have one session and named 
prepared statements are session objects. From one transaction to the 
next the physical connection can change along with the attached 
prepared statements.


The idea that was discussed is when we prepare the statement we cache 
it in a statement cache and return a queryid much like the queryid 
used in pg_stat_statements. Instead of executing the statement name we 
would execute the queryid.


If the queryid did not exist, attempting to execute it would cause an 
error and cause the running transaction to fail. Retrieving the 
statement from the query cache would have to happen before the attempt 
to execute it and return an error to the client subsequently the 
client could re-prepare the statement and execute. This would have to 
happen in such a way as to not cause the transaction to fail.


The one other idea that was proposed was to cache the statements in 
the client. However this does nothing to address the issue of managing 
idle connections.


Regards,
Dave Cramer



There is a PR with support of prepared statement support to pgbouncer:
https://github.com/pgbouncer/pgbouncer/pull/845
any feedback, reviews and suggestions are welcome.

Named Prepared statement problems and possible solutions

2023-06-07 Thread Dave Cramer
Greetings,

At pgcon last week I was speaking to some people about the problem we have
with connection pools and named prepared statements.

For context pgjdbc (and others) use un-named statements and then switch to
named statements after using the statement N (default 5) times. In session
mode this is not a problem. When the connection is closed by the
application the pools generally issue "DISCARD ALL" and close all prepared
statements. The next time the connection is opened the statement is
prepared and all works as it should.

However one of the more interesting use cases for pgbouncer is to use
"TRANSACTION MODE" to manage idle sessions. In transaction mode the
connection is returned to the pool after each transaction. There are usage
patterns in large applications where clients have client pools and
subsequently have large numbers of connections open. Sometimes in the
thousands, unfortunately many of these are idle connections. Using
transaction mode reduces the number of real connections to the database in
many cases by orders of magnitude.

Unfortunately this is incompatible with named prepared statements. From the
client's point of view they have one session and named prepared statements
are session objects. From one transaction to the next the physical
connection can change along with the attached prepared statements.

The idea that was discussed is when we prepare the statement we cache it in
a statement cache and return a queryid much like the queryid used in
pg_stat_statements.  Instead of executing the statement name we would
execute the queryid.

If the queryid did not exist, attempting to execute it would cause an error
and cause the running transaction to fail. Retrieving the statement from
the query cache would have to happen before the attempt to execute it and
return an error to the client subsequently the client could re-prepare the
statement and execute. This would have to happen in such a way as to not
cause the transaction to fail.

The one other idea that was proposed was to cache the statements in the
client. However this does nothing to address the issue of managing idle
connections.

Regards,
Dave Cramer