On Tue, Aug 9, 2016 at 8:50 AM, Vladimir Sitnikov <sitnikov.vladi...@gmail.com> wrote:
> Shay> There are many scenarios where connections are very short-lived
> (think about webapps where a pooled connection is allocated per-request and
> reset in between)
>
> Why the connection is reset in between in the first place?
> In pgjdbc we do not reset per-connection statement cache, thus we easily
> reuse named statements for pooled connections.

A DISCARD ALL is sent when the connection is returned to the pool, to prevent state leakage etc. You make a valid comment though - there's already a new feature in the Npgsql dev branch which allows prepared statements to be persisted across open/close on pooled connections; it's interesting to learn that this is standard behavior in pgjdbc. At least up to now, the logic was not to implicitly keep holding server-side resources across a pooled connection close, because these may become a drain on the server etc.

More importantly, unless I'm mistaken pgbouncer also sends DISCARD ALL to clean the connection state, as do other pooling solutions. That unfortunately means you can't always depend on prepared statements persisting after the connection is closed.

> Shay> There are also many scenarios where you're not necessarily going to
> send the same query multiple times in a single connection lifespan, so
> preparing is again out of the question.
>
> Can you list at least one scenario of that kind, so we can code it into
> pgbench (or alike) and validate "simple vs prepared" performance?

Again, in a world where prepared statements aren't persisted across connections (e.g. pgbouncer), this scenario is extremely common. Any scenario where you open a relatively short-lived connection and execute something once is problematic - imagine a simple web service which needs to insert a single record into a table.

> Shay> And more generally, there's no reason for a basic, non-prepared
> execution to be slower than it can be.
>
> That's too generic.
> If the performance for "end-to-end cases" is just
> fine, then it is not worth optimizing further. Typical application best
> practice is to reuse SQL text (for both security and performance point of
> views), so in typical applications I've seen, query text was reused, thus
> it naturally was handled by server-prepared logic.

I don't see how reusing SQL text affects security in any way.

But here's the more important general point. We're driver developers, not application developers. I don't really know what performance is "just fine" for each of my users, or what is not worth optimizing further. Users may follow best practices, or they may not, for various reasons. They may be porting code over from SqlServer, where prepare is almost never used (because SqlServer caches plans implicitly), or they may simply not be very good programmers. The API for executing non-prepared statements is there, we support it and PostgreSQL supports it - it just happens not to be very fast. Of course we could say "screw everyone not preparing statements", but that doesn't seem like a very good way to treat users, especially since the fix isn't that hard.

> Let me highlight another direction: current execution of server-prepared
> statement requires some copying of "parse tree" (or whatever). I bet it
> would be much better investing in removal of that copying rather than
> investing into "make one-time queries faster" thing. If we could make
> "Exec" processing faster, it would immediately improve tons of applications.

I don't understand what exactly you're proposing here, but if you have some unrelated optimization that would speed up prepared statements, by all means that's great. It's just unrelated to this thread.

> Shay> Of course we can choose a different query to benchmark instead of
> SELECT 1 - feel free to propose one (or several).
>
> I've tried pgbench -M prepared, and it is way faster than pgbench -M
> simple.
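To make the pooling point above concrete, here is a minimal Python sketch of the reset-on-release behavior. All class and method names here are hypothetical illustrations, not any real driver's API; the only grounded detail is that DISCARD ALL drops server-side session state (prepared statements, temp tables, session settings), so a driver-side statement cache must be invalidated along with it.

```python
# Hypothetical sketch of pool reset behavior; not any real driver's API.
class PooledConnection:
    def __init__(self):
        self.prepared = {}        # client-side cache: statement name -> SQL text
        self.sent = []            # commands "sent" to the server (stubbed out)

    def execute(self, sql):
        self.sent.append(sql)     # a real driver would write to the socket here

    def prepare(self, name, sql):
        self.execute(f"PREPARE {name} AS {sql}")
        self.prepared[name] = sql


class Pool:
    def __init__(self):
        self._idle = [PooledConnection()]

    def acquire(self):
        return self._idle.pop()

    def release(self, conn):
        # DISCARD ALL drops prepared statements, temp tables, session
        # settings, etc. on the server, so the client-side statement
        # cache must be cleared as well.
        conn.execute("DISCARD ALL")
        conn.prepared.clear()
        self._idle.append(conn)


pool = Pool()
conn = pool.acquire()
conn.prepare("s1", "SELECT 1")
pool.release(conn)   # server state is gone: the next user must re-prepare
```

This is exactly why, with a pooler in front of the server (or a pool that resets state), a statement prepared in one "logical" connection can't be assumed to exist in the next.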
> Once again: all cases I have in mind would benefit from reusing
> server-prepared statements. In other words, after some warmup the
> application would use just Bind-Execute-Sync kind of messages, and it would
> completely avoid Parse/Describe ones.

Of course that's the ideal scenario. It's just not the *only* scenario for all users - they may not have prepared statements persisting across open/close as detailed above, or their code may simply not be preparing statements at the moment. Why not help them out for free?

> Shay> FYI in Npgsql specifically describe isn't used to get any knowledge
> about parameters - users must populate the correct parameters or query
> execution fails.
>
> I think the main reason to describe for pgjdbc is to get result oids.
> pgjdbc is not "full binary", thus it has to be careful which fields it
> requests in binary format.
> That indeed slows down "unknown queries", but as the query gets reused,
> pgjdbc switches to server-prepared execution, and eliminates parse-describe
> overheads completely.

Npgsql has a very different architecture: it does not attempt to decide on its own when to prepare statements and when not to - it's a very simple driver that does exactly what the user requests. If the user requests that a statement be prepared, it's prepared; otherwise it's not.

FYI, if it's *result* OIDs you're concerned with, the Npgsql solution is to always request binary results, unless the user explicitly requests certain fields in text (via a special API). This means Npgsql never breaks the query into two roundtrips - Parse/Bind/Describe/Execute/Sync are always sent in a single roundtrip.
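For readers not steeped in the frontend/backend protocol, here is a sketch (in Python, just to keep the byte layout visible; function names are illustrative) of the single-roundtrip pipeline being discussed. The message layout follows the documented PostgreSQL protocol: every message is a 1-byte type, an int32 length that includes itself, then a body; the five extended-query messages are written into one buffer so only one roundtrip is needed.

```python
import struct

def msg(mtype: bytes, body: bytes = b"") -> bytes:
    # Wire format: 1-byte message type, int32 length (includes itself), body.
    return mtype + struct.pack("!i", 4 + len(body)) + body

def cstr(s: str) -> bytes:
    # Protocol strings are NUL-terminated.
    return s.encode("utf-8") + b"\x00"

def one_roundtrip(sql: str, params=()) -> bytes:
    """Pipeline Parse/Bind/Describe/Execute/Sync into a single buffer."""
    # Parse: unnamed statement, query text, zero pre-specified parameter OIDs.
    parse = msg(b"P", cstr("") + cstr(sql) + struct.pack("!h", 0))
    # Bind: unnamed portal onto the unnamed statement.
    bind = cstr("") + cstr("")
    bind += struct.pack("!h", 0)                    # 0 param format codes -> all text
    bind += struct.pack("!h", len(params))
    for p in params:
        bind += struct.pack("!i", len(p)) + p       # int32 length + value bytes
    bind += struct.pack("!hh", 1, 1)                # one result format code: binary
    # Describe the unnamed portal (this is where result OIDs/formats come back).
    describe = msg(b"D", b"P" + cstr(""))
    # Execute the unnamed portal with no row limit.
    execute = msg(b"E", cstr("") + struct.pack("!i", 0))
    return parse + msg(b"B", bind) + describe + execute + msg(b"S")

buf = one_roundtrip("SELECT 1")
```

After warmup with a *named* statement, the Parse and Describe messages drop out and only Bind-Execute-Sync remains, which is the steady state Vladimir describes; the whole argument in this thread is about how fast the unprepared path above can be made.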