Re: Cached/global query plans, autopreparation

2018-03-02 Thread Pavel Stehule
2018-03-02 21:51 GMT+01:00 Tom Lane :

> Andres Freund  writes:
> > On 2018-03-02 15:29:09 -0500, Bruce Momjian wrote:
> >> While I have heard people complain about how other databases cache
> >> prepare plans, I have heard few complaints about the Postgres approach,
> >> and I haven't even heard of people asking to control the documented
> "five
> >> or more" behavior.
>
> > This *constantly* is a problem.
>
> Yeah, I've certainly heard complaints about it.  I do agree with
> Bruce's conclusion that we should try to improve that behavior;
> but it's not entirely clear how.  (A user-frobbable knob isn't
> necessarily the best answer.)
>

Can this problem be reduced if we count the number of possible paths?

Maybe it can work for some simple queries, which are the majority in pgbench.

When I migrated from Oracle, there was an issue with slow planning of very
complex views - optimization on most common values would probably work well
there.

I still have an idea about an optimization based not on searching for the
best plan for one parameter vector, but on searching for the best plan
across all possible vectors - i.e. the best worst-case plan.

I don't think this issue is solvable without changing the optimization
method.

Or don't lose time on probably useless work and move forward to dynamic
execution - for example, dynamically switching from nested loop to hash
join to merge join ...



> regards, tom lane


Re: Cached/global query plans, autopreparation

2018-03-02 Thread Tom Lane
Andres Freund  writes:
> On 2018-03-02 15:29:09 -0500, Bruce Momjian wrote:
>> While I have heard people complain about how other databases cache
>> prepare plans, I have heard few complaints about the Postgres approach,
>> and I haven't even heard of people asking to control the documented "five
>> or more" behavior.

> This *constantly* is a problem.

Yeah, I've certainly heard complaints about it.  I do agree with
Bruce's conclusion that we should try to improve that behavior;
but it's not entirely clear how.  (A user-frobbable knob isn't
necessarily the best answer.)

regards, tom lane



Re: Cached/global query plans, autopreparation

2018-03-02 Thread Andres Freund
On 2018-03-02 15:29:09 -0500, Bruce Momjian wrote:
> Postgres uses a conservative method for reusing plans with previous
> constants, as described in the PREPARE manual page:
> 
>   https://www.postgresql.org/docs/10/static/sql-prepare.html
>   Prepared statements can use generic plans rather than re-planning with
>   each set of supplied EXECUTE values. This occurs immediately for prepared
>   statements with no parameters; otherwise it occurs only after five or more
>   executions produce plans whose estimated cost average (including planning
>   overhead) is more expensive than the generic plan cost estimate. Once
>   a generic plan is chosen, it is used for the remaining lifetime of the
>   prepared statement. Using EXECUTE values which are rare in columns with
>   many duplicates can generate custom plans that are so much cheaper than
>   the generic plan, even after adding planning overhead, that the generic
>   plan might never be used.
> 
> While I have heard people complain about how other databases cache
> prepare plans, I have heard few complaints about the Postgres approach,
> and I haven't even heard of people asking to control the documented "five
> or more" behavior.

This *constantly* is a problem.


Greetings,

Andres Freund



Re: Cached/global query plans, autopreparation

2018-03-02 Thread Bruce Momjian
On Thu, Feb 15, 2018 at 03:00:17PM +0100, Shay Rojansky wrote:
> Just wanted to say that I've seen more than 10% improvement in some real-world
> application when preparation was done properly. Also, I'm assuming that
> implementing this wouldn't involve "rewriting substantial part of Postgres
> core", and that even 10% is quite a big gain, especially if it's a
> transparent/free one as far as the user is concerned (no application changes).

I would like to step back on this issue.  Ideally, every query would get
re-optimized because we can only be sure the plan is optimal when we use
supplied constants to generate the plan.  But, of course, parsing and
planning take time, so ideally there would be a way to avoid them.  The
question is always how much time will be saved by avoiding
parsing/planning, and what risk is there of suboptimal plans.

Postgres uses a conservative method for reusing plans with previous
constants, as described in the PREPARE manual page:

https://www.postgresql.org/docs/10/static/sql-prepare.html
Prepared statements can use generic plans rather than re-planning with
each set of supplied EXECUTE values. This occurs immediately for prepared
statements with no parameters; otherwise it occurs only after five or more
executions produce plans whose estimated cost average (including planning
overhead) is more expensive than the generic plan cost estimate. Once
a generic plan is chosen, it is used for the remaining lifetime of the
prepared statement. Using EXECUTE values which are rare in columns with
many duplicates can generate custom plans that are so much cheaper than
the generic plan, even after adding planning overhead, that the generic
plan might never be used.
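
The documented heuristic can be restated as a toy model. The sketch below
is only an illustration of the quoted paragraph, not the actual
plancache.c code; the class name, the fixed PLANNING_OVERHEAD surcharge,
and the cost numbers are all assumptions for the example:

```python
# Toy model of the documented plan-choice heuristic: plan custom for the
# first five executions, then compare the average custom cost (plus a
# planning surcharge) against the generic plan's estimated cost and lock
# in the generic plan only if it is not more expensive.
class PreparedStatement:
    CUSTOM_PLAN_TRIES = 5          # the documented "five or more"
    PLANNING_OVERHEAD = 1000.0     # assumed surcharge per custom plan

    def __init__(self, generic_cost):
        self.generic_cost = generic_cost
        self.custom_costs = []     # estimated cost of each custom plan
        self.use_generic = False

    def choose_plan(self, custom_cost):
        """Return 'generic' or 'custom' for this execution."""
        if self.use_generic:
            return "generic"
        self.custom_costs.append(custom_cost)
        if len(self.custom_costs) >= self.CUSTOM_PLAN_TRIES:
            avg = sum(self.custom_costs) / len(self.custom_costs)
            if avg + self.PLANNING_OVERHEAD >= self.generic_cost:
                self.use_generic = True   # used for remaining lifetime
        return "custom"
```

Note the "rare values" case from the quoted paragraph: if each custom plan
is much cheaper than the generic estimate even after the surcharge, the
generic plan is never chosen.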

While I have heard people complain about how other databases cache
prepare plans, I have heard few complaints about the Postgres approach,
and I haven't even heard of people asking to control the documented "five
or more" behavior.

I also know that other database products have more sophisticated prepare
usage, but they might have higher parse/plan overhead, or they might be
more flexible in handling specialized workloads, which Postgres might
not want to handle, given the costs/complexity/overhead.

So, the directions for improvement are:

1  Improve the existing "five or more" behavior
2  Automatically prepare queries that are not sent as prepared queries
3  Share plans among sessions

While #1 would be nice, #2 increases the number of applications that can
silently benefit from prepared queries, and #3 increases the number of
cases in which query plans can be reused.  The issue with #3 is that the
constants used are no longer local to the session (which is the same
issue with connection poolers reusing prepared plans).  When different
sessions with potentially more varied constants reuse plans, the
probability of suboptimal plans increases.

I think the fact that pgbench shows a 2x improvement for prepared
statements, while real-world reports are around a 10% improvement, means
we need to have a better understanding of exactly what workloads can
benefit from this, and a comprehensive analysis of all three areas of
improvement.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Cached/global query plans, autopreparation

2018-02-15 Thread Jorge Solórzano
On Thu, Feb 15, 2018 at 8:00 AM, Shay Rojansky  wrote:

> I am an author of one of the proposals (autoprepare, which is in the
>> commit fest now), but I think that sooner or later Postgres has to come
>> to a solution with shared DB caches/prepared plans.
>> Please correct me if I am wrong, but it seems to me that most other
>> top DBMSes have something like this.
>> Such a decision can provide a lot of different advantages:
>> 1. Better memory utilization: no need to store the same data N times,
>> where N is the number of backends, and spend time warming the cache.
>> 2. The optimizer can spend more time choosing a better plan which can
>> then be used by all clients. Even now the compilation time of some
>> queries exceeds their execution time several times over.
>> 3. It is simpler to add facilities for query plan tuning and maintenance
>> (storing, comparing, ...)
>> 4. It makes it possible to control the amount of memory used by caches.
>> Right now the catalog cache for a DB with hundreds of thousands of
>> tables and indexes, multiplied by hundreds of backends, can consume
>> terabytes of memory.
>> 5. Shared caches can simplify the invalidation mechanism.
>> 6. Almost all enterprise systems working with Postgres have to use some
>> kind of connection pooling (pgbouncer, pgpool, ...), which almost
>> excludes the possibility of using prepared statements. This can slow
>> down performance by up to two times.
>>
>
> Just wanted to say I didn't see this email before my previous response,
> but I agree with all of the above. The last point is particularly
> important, especially for short-lived connection scenarios, the most
> typical of which is web.
>
>
>> There is just one (but very important) problem which needs to be solved:
>> access to the shared cache should be synchronized.
>> But there are a lot of other shared resources in Postgres (procarray,
>> shared buffers, ...). So I do not think that it is an unsolvable problem
>> or that it must degrade performance.
>>
>> So it seems to be obvious that shared caches/plans can provide a lot of
>> advantages. But the value of these advantages for real customers is
>> still not clear to me.
>> Using the -M prepared protocol in a pgbench workload can improve speed
>> up to two times. But I have asked real Postgres users at Avito, Yandex
>> and MyOffice, and they told me that on their workloads the advantage of
>> prepared statements is about 10%. A 10% performance improvement is
>> definitely not good compensation for rewriting a substantial part of
>> Postgres core...
>>
>
> Just wanted to say that I've seen more than 10% improvement in some
> real-world application when preparation was done properly. Also, I'm
> assuming that implementing this wouldn't involve "rewriting substantial
> part of Postgres core", and that even 10% is quite a big gain, especially
> if it's a transparent/free one as far as the user is concerned (no
> application changes).
>


A 10% improvement in the real world can be pretty significant. I don't
know how complicated it would be to implement this in Postgres core, but
how about adding it to the GSoC 2018 ideas[1]?

[1] https://wiki.postgresql.org/wiki/GSoC_2018


Re: Cached/global query plans, autopreparation

2018-02-15 Thread Shay Rojansky
>
> I am an author of one of the proposals (autoprepare, which is in the
> commit fest now), but I think that sooner or later Postgres has to come
> to a solution with shared DB caches/prepared plans.
> Please correct me if I am wrong, but it seems to me that most other top
> DBMSes have something like this.
> Such a decision can provide a lot of different advantages:
> 1. Better memory utilization: no need to store the same data N times,
> where N is the number of backends, and spend time warming the cache.
> 2. The optimizer can spend more time choosing a better plan which can
> then be used by all clients. Even now the compilation time of some
> queries exceeds their execution time several times over.
> 3. It is simpler to add facilities for query plan tuning and maintenance
> (storing, comparing, ...)
> 4. It makes it possible to control the amount of memory used by caches.
> Right now the catalog cache for a DB with hundreds of thousands of tables
> and indexes, multiplied by hundreds of backends, can consume terabytes of
> memory.
> 5. Shared caches can simplify the invalidation mechanism.
> 6. Almost all enterprise systems working with Postgres have to use some
> kind of connection pooling (pgbouncer, pgpool, ...), which almost
> excludes the possibility of using prepared statements. This can slow
> down performance by up to two times.
>

Just wanted to say I didn't see this email before my previous response, but
I agree with all of the above. The last point is particularly important,
especially for short-lived connection scenarios, the most typical of which
is web.


> There is just one (but very important) problem which needs to be solved:
> access to the shared cache should be synchronized.
> But there are a lot of other shared resources in Postgres (procarray,
> shared buffers, ...). So I do not think that it is an unsolvable problem
> or that it must degrade performance.
>
> So it seems to be obvious that shared caches/plans can provide a lot of
> advantages. But the value of these advantages for real customers is
> still not clear to me.
> Using the -M prepared protocol in a pgbench workload can improve speed up
> to two times. But I have asked real Postgres users at Avito, Yandex and
> MyOffice, and they told me that on their workloads the advantage of
> prepared statements is about 10%. A 10% performance improvement is
> definitely not good compensation for rewriting a substantial part of
> Postgres core...
>

Just wanted to say that I've seen more than 10% improvement in some
real-world application when preparation was done properly. Also, I'm
assuming that implementing this wouldn't involve "rewriting substantial
part of Postgres core", and that even 10% is quite a big gain, especially
if it's a transparent/free one as far as the user is concerned (no
application changes).
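
The synchronization concern quoted above ("access to shared cache should
be synchronized") can be illustrated with a minimal lock-guarded shared
cache. This is only a toy model using Python threads: a real Postgres
implementation would live in shared memory, handle invalidation, and need
much finer-grained locking than a single mutex; all names here are
illustrative:

```python
import threading

# Toy shared plan cache consulted by many "sessions" (threads) under one
# lock: the first session to ask for a query plans it, everyone else
# reuses the cached result.
class SharedPlanCache:
    def __init__(self):
        self._lock = threading.Lock()
        self._plans = {}          # query text -> cached "plan"
        self.plan_builds = 0      # how many times we actually planned

    def get_plan(self, sql, planner):
        with self._lock:          # all sessions serialize here
            plan = self._plans.get(sql)
            if plan is None:
                plan = planner(sql)    # plan once, share with everyone
                self._plans[sql] = plan
                self.plan_builds += 1
            return plan
```

The single lock makes the contention trade-off visible: every execution
pays for serialization so that planning happens only once per query text.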


Re: Cached/global query plans, autopreparation

2018-02-15 Thread Shay Rojansky
>
> > Well, the issue is that implementing this is a major piece of work. This
> > post doesn't offer either resources nor a simpler way to do so. There's
> > no huge debate about the benefit of having a global plan cache, so I'm
> > not that surprised there's not a huge debate about a post arguing that
> > we should have one.
>
> Actually, I'm pretty darn skeptical about the value of such a cache for
> most use-cases.  But as long as it can be turned off and doesn't leave
> residual overhead nor massive added code cruft, I won't stand in the way
> of someone else investing their time in it.
>
> In any case, as you say, it's moot until somebody steps up to do it.
>

Well, looking at previous conversations and also at the comment above it
doesn't seem like there's a consensus on whether this feature would even be
beneficial... The point of my email above was to have that conversation
before looking into implementation. Tom, I'm especially interested in
understanding why you think this cache wouldn't help most use-cases: I see
many applications which don't prepare (e.g. because they use data access
layers/O/RMs which don't do it or expose it), and implementing this in the
driver seems like the wrong way (although Npgsql and JDBC do it, at least
some other languages don't).

In addition, there are also various options/possibilities here and there
seems no consensus about that either:

* How should statement plan caching be done for unprepared statements, what
strategy should be used? Should a threshold number of unprepared executions
be used before PostgreSQL decides to prepare? Should there be a maximum
number of autoprepared statements, ejecting the least-recently used one to
make room for a new one? Or something else?
* Should the cached plans be shared across connections ("global" cached
statements)? Are the savings from global caching greater than the cost of
the contention? The savings include both (a) not having to re-prepare the
same statement N times on different connections (typically just a one-time
application warm-up cost), and (b) not having the memory duplication of N
identical statements across connections (a constant cost, not warm-up - but
not sure how significant this is). Note that the global/shared discussion
is a bit orthogonal to the general autopreparation conversation - the
latter has value with or without the former.
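
The first bullet's strategy (a threshold number of unprepared executions,
then auto-preparation, with LRU ejection at a maximum) is roughly what the
Npgsql and JDBC drivers reportedly do today, and can be sketched as
follows. The class, threshold, cache size, and naming scheme are
illustrative assumptions, not any driver's actual code:

```python
from collections import OrderedDict

# Toy autoprepare sketch: count unprepared executions per SQL text; once
# a statement crosses the threshold, auto-prepare it, evicting the
# least-recently used prepared statement when the cache is full.
class AutoPrepareCache:
    def __init__(self, threshold=5, max_prepared=2):
        self.threshold = threshold
        self.max_prepared = max_prepared
        self.usage = {}                    # sql -> execution count
        self.prepared = OrderedDict()      # sql -> statement name, LRU order

    def execute(self, sql):
        """Return 'prepared' if this execution uses an auto-prepared plan."""
        if sql in self.prepared:
            self.prepared.move_to_end(sql)         # mark as recently used
            return "prepared"
        self.usage[sql] = self.usage.get(sql, 0) + 1
        if self.usage[sql] >= self.threshold:
            if len(self.prepared) >= self.max_prepared:
                self.prepared.popitem(last=False)  # evict LRU statement
            self.prepared[sql] = f"_auto_{len(self.usage)}"
            return "prepared"
        return "unprepared"
```

Doing the same thing server-side would answer the first bullet without
every driver re-implementing it; the second bullet (sharing across
connections) is, as noted, an orthogonal question.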

Essentially I think it's a good idea to have a conversation about all this
before anyone jumps into implementation.


Re: Cached/global query plans, autopreparation

2018-02-15 Thread Konstantin Knizhnik



On 13.02.2018 20:13, Shay Rojansky wrote:

Hi all,

Was wondering if anyone has a reaction to my email below about 
statement preparation, was it too long? :)


(and sorry for top-posting)

On Tue, Feb 6, 2018 at 9:27 PM, Shay Rojansky wrote:


Hi all.

Various versions of having PostgreSQL caching and/or autopreparing
statement plans have been discussed

(https://www.postgresql.org/message-id/op.t9ggb3wacigqcu%40apollo13.peufeu.com,
https://www.postgresql.org/message-id/8e76d8fc-8b8c-14bd-d4d1-e9cf193a74f5%40postgrespro.ru),
without clear conclusions or even an agreement on what might be
worthwhile to implement. I wanted to bring this up again from a
PostgreSQL driver maintainer's perspective (I'm the owner of
Npgsql, the open source .NET driver), apologies in advance if I'm
repeating things or I've missed crucial information. Below I'll
describe three relevant issues and what I've done to deal with them.

When the same statement is rerun, preparing it has a very
significant performance boost. However, in short-lived connection
scenarios it's frequently not possible to benefit from this -
think of a typical webapp which allocates a connection from a
pool, runs a query and then returns the connection. To make sure
prepared statements are used, Npgsql's connection pool doesn't
send DISCARD ALL when a connection is returned (to avoid wiping
out the connections), and maintains an internal table mapping SQL
(and parameter types) to a PostgreSQL statement name. The next
time the application attempts to prepare the same SQL, the
prepared statement is found in the table and no preparation needs
to occur. This means that prepared statements persist across
pooled connection open/close, and are never discarded unless the
user uses a specific API. While this works, the disadvantages are
that:
1. This kind of mechanism needs to be implemented again and again,
in each driver:
2. It relies on Npgsql's internal pooling, which can track
persistent prepared statements on physical connections. If an
external pool is used (e.g. pgpool), this isn't really possible.
3. It complicates resetting the session state (instead of DISCARD
ALL, a combination of all other reset commands except DEALLOCATE
ALL needs to be sent). This is minor.
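
The pool-level table described above (SQL plus parameter types mapped to a
server-side statement name, surviving logical open/close because DISCARD
ALL is never sent) can be sketched roughly like this. The class and naming
are illustrative assumptions, not Npgsql's actual internals:

```python
# Toy sketch of a physical connection that keeps prepared statements
# alive across pooled open/close: the driver looks up (sql, param types)
# and reuses the existing server-side statement name instead of
# re-preparing.
class PhysicalConnection:
    def __init__(self):
        self._statements = {}    # (sql, param_types) -> statement name
        self._counter = 0

    def prepare(self, sql, param_types=()):
        """Return an existing statement name, or 'prepare' a new one."""
        key = (sql, tuple(param_types))
        name = self._statements.get(key)
        if name is None:
            self._counter += 1
            name = f"_s{self._counter}"
            # a real driver would send Parse/PREPARE to the server here
            self._statements[key] = name
        return name

    def reset_session(self):
        # instead of DISCARD ALL: reset other session state but keep
        # self._statements (the DEALLOCATE ALL that DISCARD ALL implies
        # would wipe the prepared statements)
        pass
```

The lookup key includes parameter types because the same SQL text prepared
with different parameter types is a different server-side statement.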

The second issue is that many applications don't work directly
against the database API (ADO.NET in .NET, JDBC
in Java). If any sort of O/RM or additional layer is used, there's
a good chance that that layer doesn't prepare in any way, and
indeed hides your access to the database API's preparation method.
Two examples from the .NET world are dapper (a very popular
micro-O/RM) and Entity Framework. In order to provide the best
possible performance in these scenarios, Npgsql has an opt-in
feature whereby it tracks how many times a given statement was
executed, and once it passes a certain threshold automatically
prepares it. An LRU cache is then used to determine which prepared
statements to discard, to avoid explosion. In effect, statement
auto-preparation is implemented in the driver. I know that the
JDBC driver also implements such a mechanism (it was actually the
inspiration for the Npgsql feature). The issues with this are:

1. As above, this has to be implemented by every driver (and is
quite complex to do well)
2. There's a possible missed opportunity in having a single plan
on the server, as each connection has its own (the "global plan"
option). Many apps out there send the same statements across many
connections so this seems relevant - but I don't know if the gains
outweigh the contention impact in PostgreSQL.

Finally, since quite a few (most?) other databases include
autopreparation (SQL Server, Oracle...), users porting their
applications - which don't explicitly prepare - experience a big
performance drop. It can rightly be said that porting an
application across databases isn't a trivial task and that
adjustments need to be made, but from experience I can say that
PostgreSQL is losing quite a few users to this.

The above issues could be helped by having PostgreSQL cache on its
side (especially the second issue, which is the most important).
Ideally, any adopted solution would be transparent and not require
any modification to applications. It would also not impact
explicitly-prepared statements in any way.

Note that I'm not arguing for any specific implementation on the
PostgreSQL side (e.g. global or not), but just describing a need
and hoping to restart a conversation 

Re: Cached/global query plans, autopreparation

2018-02-14 Thread he...@visionlink.org
Any idea on how feasible it would be as an extension, or is the work too
central to abstract that way?

 Chet Henry
Senior Software Developer - Dev Ops Liaison
VisionLink, Inc.
3101 Iris Ave, Ste 240
Boulder, CO 80301
  he...@visionlink.org



On Wed, Feb 14, 2018 at 2:19 PM, Tom Lane  wrote:

> Andres Freund  writes:
> > On 2018-02-13 09:13:09 -0800, Shay Rojansky wrote:
> >> Was wondering if anyone has a reaction to my email below about statement
> >> preparation, was it too long? :)
>
> > Well, the issue is that implementing this is a major piece of work. This
> > post doesn't offer either resources nor a simpler way to do so. There's
> > no huge debate about the benefit of having a global plan cache, so I'm
> > not that surprised there's not a huge debate about a post arguing that
> > we should have one.
>
> Actually, I'm pretty darn skeptical about the value of such a cache for
> most use-cases.  But as long as it can be turned off and doesn't leave
> residual overhead nor massive added code cruft, I won't stand in the way
> of someone else investing their time in it.
>
> In any case, as you say, it's moot until somebody steps up to do it.
>
> regards, tom lane
>
>


Re: Cached/global query plans, autopreparation

2018-02-14 Thread Tom Lane
Andres Freund  writes:
> On 2018-02-13 09:13:09 -0800, Shay Rojansky wrote:
>> Was wondering if anyone has a reaction to my email below about statement
>> preparation, was it too long? :)

> Well, the issue is that implementing this is a major piece of work. This
> post doesn't offer either resources nor a simpler way to do so. There's
> no huge debate about the benefit of having a global plan cache, so I'm
> not that surprised there's not a huge debate about a post arguing that
> we should have one.

Actually, I'm pretty darn skeptical about the value of such a cache for
most use-cases.  But as long as it can be turned off and doesn't leave
residual overhead nor massive added code cruft, I won't stand in the way
of someone else investing their time in it.

In any case, as you say, it's moot until somebody steps up to do it.

regards, tom lane



Re: Cached/global query plans, autopreparation

2018-02-14 Thread he...@visionlink.org
Coming from a PHP application I have several of the same concerns and
wishes.  Given that PHP cannot share any resources between requests, it
would be impossible to accomplish what you have in .NET.  We still prepare
statements, though, for use in result sets and other loops (ORM driven).  I'm
wondering if it is possible to solve this as an extension to Postgres?  If
Citus could refactor their code into an extension, surely this is simple in
comparison.  I would be willing to help but PHP has made me soft, so it
will take a bit longer.

Thanks,

Chet Henry

On Tue, Feb 13, 2018 at 10:13 AM, Shay Rojansky  wrote:

> Hi all,
>
> Was wondering if anyone has a reaction to my email below about statement
> preparation, was it too long? :)
>
> (and sorry for top-posting)
>
> On Tue, Feb 6, 2018 at 9:27 PM, Shay Rojansky  wrote:
>
>> Hi all.
>>
>> Various versions of having PostgreSQL caching and/or autopreparing
>> statement plans have been discussed
>> (https://www.postgresql.org/message-id/op.t9ggb3wacigqcu%40apollo13.peufeu.com,
>> https://www.postgresql.org/message-id/8e76d8fc-8b8c-14bd-d4d1-e9cf193a74f5%40postgrespro.ru),
>> without clear conclusions or even an agreement
>> on what might be worthwhile to implement. I wanted to bring this up again
>> from a PostgreSQL driver maintainer's perspective (I'm the owner of Npgsql,
>> the open source .NET driver), apologies in advance if I'm repeating things
>> or I've missed crucial information. Below I'll describe three relevant
>> issues and what I've done to deal with them.
>>
>> When the same statement is rerun, preparing it has a very significant
>> performance boost. However, in short-lived connection scenarios it's
>> frequently not possible to benefit from this - think of a typical webapp
>> which allocates a connection from a pool, runs a query and then returns the
>> connection. To make sure prepared statements are used, Npgsql's connection
>> pool doesn't send DISCARD ALL when a connection is returned (to avoid
>> wiping out the connections), and maintains an internal table mapping SQL
>> (and parameter types) to a PostgreSQL statement name. The next time the
>> application attempts to prepare the same SQL, the prepared statement is
>> found in the table and no preparation needs to occur. This means that
>> prepared statements persist across pooled connection open/close, and are
>> never discarded unless the user uses a specific API. While this works, the
>> disadvantages are that:
>> 1. This kind of mechanism needs to be implemented again and again, in
>> each driver:
>> 2. It relies on Npgsql's internal pooling, which can track persistent
>> prepared statements on physical connections. If an external pool is used
>> (e.g. pgpool), this isn't really possible.
>> 3. It complicates resetting the session state (instead of DISCARD ALL, a
>> combination of all other reset commands except DEALLOCATE ALL needs to be
>> sent). This is minor.
>>
>> The second issue is that many applications don't work directly against
>> the database API (ADO.NET in .NET, JDBC in Java). If any sort of O/RM or
>> additional layer is used, there's a good chance that that layer doesn't
>> prepare in any way, and indeed hides your access to the database API's
>> preparation method. Two examples from the .NET world are dapper (a very
>> popular micro-O/RM) and Entity Framework. In order to provide the best
>> possible performance in these scenarios, Npgsql has an opt-in feature
>> whereby it tracks how many times a given statement was executed, and once
>> it passes a certain threshold automatically prepares it. An LRU cache is
>> then used to determine which prepared statements to discard, to avoid
>> explosion. In effect, statement auto-preparation is implemented in the
>> driver. I know that the JDBC driver also implements such a mechanism (it
>> was actually the inspiration for the Npgsql feature). The issues with this
>> are:
>>
>> 1. As above, this has to be implemented by every driver (and is quite
>> complex to do well)
>> 2. There's a possible missed opportunity in having a single plan on the
>> server, as each connection has its own (the "global plan" option). Many
>> apps out there send the same statements across many connections so this
>> seems relevant - but I don't know if the gains outweigh the contention
>> impact in PostgreSQL.
>>
>> Finally, since quite a few (most?) other databases include
>> autopreparation (SQL Server, Oracle...), users porting their applications -
>> which don't explicitly prepare - experience a big performance drop. It can
>> rightly be said that porting an application across databases isn't a
>> trivial task and that adjustments need to be made, but from experience I
>> can say that PostgreSQL is losing quite a few users to this.
>>
>> The above issues could be helped by having PostgreSQL cache on its side
>> (especially the second issue, which is the most important). Ideally, any
>> adopted solution would be 

Re: Cached/global query plans, autopreparation

2018-02-13 Thread Shay Rojansky
Hi all,

Was wondering if anyone has a reaction to my email below about statement
preparation, was it too long? :)

(and sorry for top-posting)

On Tue, Feb 6, 2018 at 9:27 PM, Shay Rojansky  wrote:

> Hi all.
>
> Various versions of having PostgreSQL caching and/or autopreparing
> statement plans have been discussed
> (https://www.postgresql.org/message-id/op.t9ggb3wacigqcu%40apollo13.peufeu.com,
> https://www.postgresql.org/message-id/8e76d8fc-8b8c-14bd-d4d1-e9cf193a74f5%40postgrespro.ru),
> without clear conclusions or even an
> agreement on what might be worthwhile to implement. I wanted to bring this
> up again from a PostgreSQL driver maintainer's perspective (I'm the owner
> of Npgsql, the open source .NET driver), apologies in advance if I'm
> repeating things or I've missed crucial information. Below I'll describe
> three relevant issues and what I've done to deal with them.
>
> When the same statement is rerun, preparing it has a very significant
> performance boost. However, in short-lived connection scenarios it's
> frequently not possible to benefit from this - think of a typical webapp
> which allocates a connection from a pool, runs a query and then returns the
> connection. To make sure prepared statements are used, Npgsql's connection
> pool doesn't send DISCARD ALL when a connection is returned (to avoid
> wiping out the connections), and maintains an internal table mapping SQL
> (and parameter types) to a PostgreSQL statement name. The next time the
> application attempts to prepare the same SQL, the prepared statement is
> found in the table and no preparation needs to occur. This means that
> prepared statements persist across pooled connection open/close, and are
> never discarded unless the user uses a specific API. While this works, the
> disadvantages are that:
> 1. This kind of mechanism needs to be implemented again and again, in each
> driver:
> 2. It relies on Npgsql's internal pooling, which can track persistent
> prepared statements on physical connections. If an external pool is used
> (e.g. pgpool), this isn't really possible.
> 3. It complicates resetting the session state (instead of DISCARD ALL, a
> combination of all other reset commands except DEALLOCATE ALL needs to be
> sent). This is minor.
>
> The second issue is that many applications don't work directly against the
> database API (ADO.NET in .NET, JDBC in Java). If any sort of O/RM or
> additional layer is used, there's a good chance that that layer doesn't
> prepare in any way, and indeed hides your access to the database API's
> preparation method. Two examples from the .NET world are dapper (a very
> popular micro-O/RM) and Entity Framework. In order to provide the best
> possible performance in these scenarios, Npgsql has an opt-in feature
> whereby it tracks how many times a given statement was executed, and once
> it passes a certain threshold automatically prepares it. An LRU cache is
> then used to determine which prepared statements to discard, to avoid
> explosion. In effect, statement auto-preparation is implemented in the
> driver. I know that the JDBC driver also implements such a mechanism (it
> was actually the inspiration for the Npgsql feature). The issues with this
> are:
>
> 1. As above, this has to be implemented by every driver (and is quite
> complex to do well)
> 2. There's a possible missed opportunity in having a single plan on the
> server, as each connection has its own (the "global plan" option). Many
> apps out there send the same statements across many connections so this
> seems relevant - but I don't know if the gains outweigh the contention
> impact in PostgreSQL.
>
> Finally, since quite a few (most?) other databases include autopreparation
> (SQL Server, Oracle...), users porting their applications - which don't
> explicitly prepare - experience a big performance drop. It can rightly be
> said that porting an application across databases isn't a trivial task and
> that adjustments need to be made, but from experience I can say that
> PostgreSQL is losing quite a few users to this.
>
> The above issues could be helped by having PostgreSQL cache on its side
> (especially the second issue, which is the most important). Ideally, any
> adopted solution would be transparent and not require any modification to
> applications. It would also not impact explicitly-prepared statements in
> any way.
>
> Note that I'm not arguing for any specific implementation on the
> PostgreSQL side (e.g. global or not), but just describing a need and hoping
> to restart a conversation that will lead somewhere.
>
> (and thanks for reading this overly long message!)
>
> Shay
>