Re: [HACKERS] Slowness of extended protocol

Shay Rojansky Mon, 15 Aug 2016 06:58:28 -0700

On Sat, Aug 13, 2016 at 11:20 PM, Vladimir Sitnikov <
[email protected]> wrote:


> Tatsuo>Interesting. What would happen if a user changes some of GUC
> parameters? Subsequent session accidentally inherits the changed GUC
> parameter?
>
> Yes, that is what happens.
> The idea is not to mess with gucs.
>

Wow... That is... insane...


> Tatsuo>There's nothing wrong with DICARD ALL
> Tatsuo>"DISCARD ALL" is perfect for this goal.
>
> It looks like you mean: "let's reset the connection state just in case".
> I see where it might help: e.g. when providing database access to random
> people who might screw a connection in every possible way.
>

It's not about random people screwing up connections, it's about
maintaining isolation between different connections... When using an
out-of-process pool (e.g. pgbouncer/pgpool), it's entirely reasonable for
more than one app to use use the same pool. It's entirely possible for one
app to change a GUC and then the other app goes boom. It's especially hard
to debug too, because it depends on who got the physical connection before
you.

Even with in-process pools it's standard practice (and a good idea) to
reset state. The idea is that pooling, much like a cache (see below!), is
supposed to speed up your application without having any other visible
effects. In other words, running your app with or without a pool should be
identical except for performance aspects. If some part of a big complex
application modified state, and then some other part happens to get that
physical connection, it's extremely hard to make sense of things.

One note - in Npgsql's implementation of persistent prepared statements,
instead of sending DISCARD ALL Npgsql sends the commands listed in
https://www.postgresql.org/docs/current/static/sql-discard.html, except for
DEALLOCATE ALL. This cleans up state changes but leaves prepared statements
in place.

Just in case: do you think one should reset CPU caches, OS filesystem
> caches, DNS caches, bounce the application, and bounce the OS in addition
> to "discard all"?
> Why reset only "connection properties" when we can reset everything to its
> pristine state?
>

Um, because all the caches you mention are, well, caches - by definition
they're not supposed to have any visible impact on any application, except
for making it faster. This is somewhat similar to the CPU reordering you
keep coming back to - it's totally invisible. GUCs are the exact opposite -
you use them to modify how PostgreSQL behaves. So it makes perfect sense to
reset them.

Just in case: PostgreSQL does not execute "discard all" on its own.
>

Of course it doesn't - it doesn't know anything about connection pooling,
it only knows about physical connections. When would it execute "discard
all" on its own?


> If you mean "pgpool is exactly like reconnect to postgresql but faster
> since connection is already established", then OK, that might work in some
> cases (e.g. when response times/throughput are not important), however why
> forcing "you must always start from scratch" execution model?
>

To enforce isolation, which is maybe the most important way for programs to
be reliable - but this is a general theme which you don't seem to agree
with. Regardless, resetting state doesn't have to have a necessary effect
on response times/throughput.


> Developers are not that dumb, and they can understand that "changing gucs
> at random is bad".
>

This has nothing to do with random, developers may have a legitimate reason
to modify a GUC. In fact, GUCs are there precisely so that developers can
change them...

Also, session state is not only about GUCs. DISCARD ALL also releases
locks, resets authorization, resets sequence state (currval/nextval), etc.
etc. All these things should not leak across connections.

> Please read again. PreparedStatement is the only way to execute statements
> in JDBC API. There's no API that allows user to specify "use
> server-prepared here".
> Well, there's non-prepared API in JDBC, however it misses "bind
> variables" support,
> so if bind variables required, developer would use PreparedStatement.

> Java's PreparedStatement does not have an option to distinguish which
statements
> should be server-prepared and which should not.

This is something new - maybe it's part of the misunderstanding here. To
me, the term "prepared statements" always means "server-prepared
statements"; this seems to be supported by the documentation you quote: "If
the driver supports precompilation, the method prepareStatement will send
the statement to the database for precompilation". I don't understand the
concept of a prepared statements which isn't server-prepared - do you have
some sort of driver-only prepared statements, which

Regardless of any optimizations you may be doing, in every database driver
I've ever seen in my life, "prepared" simply means "server-prepared". And
in every driver I've ever seen, there's an explicit API for that. Therefore
server-prepare is something that's exposed to the developer, rather than
some internal detail left to the driver.

> Suppose backend can handle 20 server-prepared statements at most (if
using more it would run out of memory).
> Suppose an application has 100 statements with ".prepare()" call.
> I think it is reasonable for the DB driver to figure out which statements
are most important and server-prepare just "20 most important ones", > and
leave the rest 80 as regular non-prepared statements.

I seriously don't find it reasonable at all. LRU caching sometimes works
well, and sometimes works very badly. That depends on the application and
on the domain. Here's an example. Imagine two applications running against
the same pool (or two large components within the same application). Each
one has its set of "hot" statements, but unfortunately not all of them can
fit prepared statement quota (20 in your example). Each connection fills
the cache, ejecting the other component's prepared statements from the
cache. As we switch between the two components, statements have to be
reprepared.

Manual work here would allow picking truly hottest statements from each
app/component, and preparing those - allowing the hottest statements across
both components to always remain live. This is exactly what is meant by
"programmer knowledge" which the driver doesn't have - it only has its
recently-used logic which sometimes breaks down.

> Shay> I'm going to repeat what I said
> Shay> before and it would be good to get some reaction to this. Every
software
> Shay> component in the stack has a role, and maintaining those
separations is
> Shay> what keeps things simple and sane
>
> You might be missing my comments on CPU, x86, etc.
> My reaction is: almost every existing component is extremely hard to
> reason about.

> For instance: CPU has certain number of registers, it has certain
> amount of L1/L2/...
> caches and so on.
> Do you mean each and every developer should explicitly specify which
> program variable should use register and which one should go into L2
cache?

You keep making this mistaken analogy... Caches (and CPU reordering) are
invisible, prepared statements aren't - they can trigger bugs (because of
DDL), and much more importantly, they take up resources on the PostgreSQL
side. Period. The L2 cache is already there and takes no additional
resources whether you use it or not. It's invisible.

Re: [HACKERS] Slowness of extended protocol

Reply via email to