Re: [HACKERS] Slowness of extended protocol

Shay Rojansky Tue, 16 Aug 2016 12:16:56 -0700

Halfway through this mail I suddenly understood something, please read all
the way down before responding...

On Tue, Aug 16, 2016 at 2:16 PM, Vladimir Sitnikov <
[email protected]> wrote:

> Shay> your analogy breaks down. Of course L2 was invented to improve
> performance,
> Shay> but that doesn't mean that all caches are the same. More precisely,
> what I
> Shay> find objectionable about your approach is not any caching - it's the
> Shay> implicit or automatic preparation of statements. This practice isn't
> Shay> invisible in that a) it may cause errors that wouldn't have been
> there
> Shay> otherwise (e.g. because of DDL),
>
> Long-lived named server-prepared statements cause problems even if
> server-prepared statements are created manually by developers.
>
> Could you please stop saying "automatic preparation causes ~DDL issues"?
>

I never said that... As I've said many times, the problem is errors caused
by something the user never asked for. If I server-prepare a statement and
then get an error, it's a result of my own action.

Shay> As I said above, I think this is a critical point of misunderstand
> between
> Shay> us. The developers tells the driver which statements should be
> Shay> server-prepared by calling .prepareStatement(). I'm guessing you
> have a
> Shay> totally different understanding here.
>
> Please, quote the document you got that "developers tell the driver which
> statements should be server-prepared by calling ..." from. It never
> works like that.
> Neither in Java, nor in C#. I would admit I've no C# experience, but I did
> find documentation on IDbCommand.Prepare() and examined it.
>

> The proper way to say is "by calling .Prepare() developer passes the
> intention that
> he might be using the same query multiple times".
> That is it. It never means "driver must absolutely use server-prepare
> in the response
> to .Prepare() call".
>
> The same goes for Java's PreparedStatement.
> It never means "the driver must use server-prepared features".
>
> As Microsoft lists in the .Prepare() documentation, modern versions of
> MSSQL just ignore .Prepare() and cache statements automatically.
>
> It is not a developer's business which statements should be in the
> database cache.
> Neither developer should care which statements reside in the driver cache.
>

I'm really baffled here.

First, I never said prepared statements *must* be server-prepared. You're
completely correct that databases APIs don't *require* this, because they
by definition cover many databases and drivers. In Sqlite there's no
server, so there can be no server-prepared statement.

However, where there *is* a server which supports prepared statements as an
optimization, it's completely unthinkable to me that a driver wouldn't
implement prepare as server-prepare. Nobody forces you to do it - it just
seems unthinkable to do otherwise. This reason for this is that if
server-prepared statements are supported by your database, we expect them
to be a significant optimization (otherwise why would they exist), and
therefore not using them when the user calls "prepare" seems like...
foolishness. In other words, whatever client-side "precompilation" or other
optimization is possible is probably going to be negligible when compared
to server-preparation (this seems to be the case with PostgreSQL at the
very least), so why *not* map the database API's prepare method to
server-prepared statements?

I'm going to requote the API note which you quoted above on
Connection.prepareStatement (
https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html):

> This method is optimized for handling parametric SQL statements that
benefit from precompilation. If the driver supports precompilation, the
method prepareStatement will send the statement to the database for
precompilation. Some drivers may not support precompilation.

Again, my understanding of English may be flawed, or maybe my logic
circuits are malfunctioning. But I read this the following way:
1. preparedStatement is about precompilation.
2. If a driver supports precompilation (i.e. preparation),
"prepareStatement will send the statement to the *database* for
precompilation". Note that the API explicitly mentioned sending *to the
database* - server preparation...
3. A driver may not support precompilation (i.e. preparation). This could
be because it simply hasn't implemented it yet, or because the backend
doesn't support it, or for any other reason. In this case it's a noop,
which doesn't really change anything in this discussion.

A compliant implementation (that is a driver) could just assemble full SQL
> by concatenating the parameters on each execution and send it via 'Q'
> simple
> execute message.
>

I think I may have understood the problem here - there's definitely a Java
vs. C# issue difference this conversation.

>From reading the Java docs, I now realize that JDBC only seems to support
parameters in prepared statements. In other words, the parameters
capability is coupled with the capability of precompilation, i.e. the
ability to execute the same query multiple times with better performance.
In ADO.NET things are different. You can use parameters on a statement
regardless of whether it's prepared or not. In other words, the *only* use
of preparation is to speed up frequently-used statements - this makes it
very natural to understand preparation as "server preparation".

So I now understand why in JDBC it could make sense for some prepared
statements to be server-prepared, and for others not to. The JDBC API
design seems very problematic to me - there are perfectly good reasons to
use parameters without preparation (prevent SQL injection) and also the
other way around, so coupling them doesn't seem like a good idea. But that
isn't really relevant to this conversation.

I hope this also makes you understand why, at least in ADO.NET, it makes
perfect sense to simply understand preparation as server-preparation - the
only reason prepare exists is to improve performance.

I'm going to skip most of the CPU reordering/caching arguments here as I
don't think they're relevant. Not everything is the same as CPU registers,
and we can find different behaviors depending on the components and layer
we choose to compare with. There are many scenarios in which programmers
are expected to specify caching explicitly. I'm going to leave this behind.

> 0) Microsoft's documentation on ".Prepare()" says "prepare call does not
> have
> any effect since sql server optimizes the statements automatically".
> So applications ported from MSSQL might miss ".Prepare()" altogether,
> thus it will
> under-perform and blame PostgreSQL for no reason.
> Does it sound fair? I do not think so.
>

Of course it's fair. They're porting code from one database to another,
it's their responsibility to do the work. There are many other aspects
besides preparation which would perform better in one database and worst in
another. SQL itself differs from database to database, so it's totally
unrealistic to expect the same code to simply run. The same goes for your
Oracle example.

Regarding "regardless...by CPU or not", you are off.
>
I've listed a link to a concurrency bug in Linux kernel.
> It manifested itself only on some recent Intel CPUs.
> In other words, the kernel was fine otherwise, but if running modern
> CPU, it might hang.
> Of course it was bug in the kernel, but the trigger condition was
> "smarter CPU" that
> was able to do more "reorderings" that were possible with older ones.
>

Again, as I wrote, the bug was already there. Reordering just made it
manifest - it could in theory have been manifested otherwise. You need to
separate between a problem the programmer is responsible for, and a problem
someone else is responsible for, regardless of why it's manifested.

Shay> It does only if you do it in a roundtrip of its own. When you close a
> Shay> pooled connection in Npgsql, the reset query is written to an
> internal
> Shay> buffer but not sent. The first query that actually gets sent by the
> user
> Shay> after opening will therefore have the reset query prepended to it
> Shay> (basically the reset query is batched)
>
> That is clever.
> Is it something specific to Npgsql pool? Does it work with all poolers
> that operate
> on top of Npgsql connections?
>

Thanks. It's specific to the Npgsql pool, which is also in-process. ADO.NET
tends heavily to in-process pools, and also couples the pool to the driver
- I'm not aware of driver-independent pools which you can use with any
driver, like in Java. Basically if you use Npgsql, you use the Npgsql pool.

Re: [HACKERS] Slowness of extended protocol

Reply via email to