Shay>I don't know much about the Java world, but both pgbouncer and pgpool
(the major pools?)

In the Java world, HikariCP is a very good connection pool.
Neither pgbouncer nor pgpool is required.
The architecture is:  application <=> HikariCP <=> pgjdbc <=> PostgreSQL

The idea is HikariCP pools pgjdbc connections, and server-prepared
statements are per-pgjdbc connection, so there is no reason to "discard
all" or "deallocate all" or whatever.

Shay>send DISCARD ALL by default. That is a fact, and it has nothing to do
with any bugs or issues pgbouncer may have.

That is configurable. If pgbouncer/pgpool defaults to "wrong
configuration", why should we (driver developers, backend developers) try
to accommodate that?

Shay> What? Do you mean you do implicit savepoints and autorollback too?

I mean that.
On top of that it enables opt-in psql-like ON_ERROR_ROLLBACK functionality
so it could automatically roll back the latest statement if it failed.
For instance, that might simplify migration from other DBs that have the
same "rollback just one statement on error" semantics by default.

Shay>How does the driver decide when to do a savepoint?

By default it would set a savepoint when there's an open
transaction and the query to be executed has been previously described.

In other words, the default mode is to protect the user from the "cached plan
cannot change result type" error.

Shay>Is it on every single command?

Exactly (modulo the above). If the user manually sets "autosave=always", every
command would get prepended with a savepoint, which is rolled back on failure.

Shay>If you do a savepoint on every single command, that surely would impact
Shay>performance even without extra roundtrips...?

My voltmeter tells me that the overhead is just 3us for a simple "SELECT"
statement (see ).
I think it is acceptable to have it enabled by default; however, it would be
nice if the database did not require a savepoint dance to heal its "not
implemented" "cache query cannot change result type" error.

Shay>I'm not aware of other drivers that implicitly prepare statements,
Shay>and definitely of no drivers that internally create savepoints and
Shay>roll them back without explicit user APIs

Glad to hear that.
Are you aware of other drivers that translate "insert into table(a,b,c)
values(?,?,?)" =into=> a multi-values "insert into table(a,b,c)
values(?,?,?),(?,?,?),..." statement on the fly?

That gives a 2-3x performance boost (that includes client result processing
as well!) for batched inserts, since "bind/exec" message processing is not
that fast. That is why I say that investing into "bind/exec performance"
makes more sense than investing into "improved execution of non-prepared
statements".
Shay>you're doing something very different - not necessarily wrong - and not
Shay>attempt to impose your ideas on everyone as if it's the only true way
Shay>to write a db driver.

1) Feel free to pick ideas you like

2) I don't say "it is the only true way". I would change my mind if someone
suggested a better solution. Everybody makes mistakes, and I have no
problem with admitting "I made a mistake" and moving forward.
As they say: "Don't cling to a mistake just because you spent a lot of time
making it"

However, no one has suggested a case where the application issues lots of unique
SQL statements.
The suggested alternative "a new message for non-prepared extended query"
might shave 5-10us per query for those who are too lazy to implement a query
cache. However, that is just 5-10ms per 1000 queries. Would that be
noticeable by the end users? I don't think so.

Having a query cache can easily shave 5-10+ms for each query, which is why I
suggest driver developers implement a query cache first, and only then
ask for new performance-related messages.

3) For "performance related" issues, a business case and a voltmeter is
required to prove there's an issue.

Shay>But the second query, which should be invalidated, has already been
Shay>sent to the server (because of batching), and boom

-- Doctor, it hurts me when I do that
-- Don't do that

When doing batched SQL, some of the queries might fail with "duplicate
key" or "constraint violation" errors. There's already an API that covers those
kinds of cases and reports which statements succeeded (if any).
In the case you described (one query in a batch somehow invalidates the
subsequent one), it would just resort to generic error handling.
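The API alluded to is JDBC's batch result reporting: `Statement.executeBatch()` (or `BatchUpdateException.getUpdateCounts()` when the driver continues past a failure) returns per-statement update counts, with the sentinel constants `EXECUTE_FAILED` and `SUCCESS_NO_INFO`. A small sketch of interpreting them:

```java
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class BatchResults {
    /**
     * Returns the zero-based indices of batch entries that failed, given the
     * update counts from executeBatch() or
     * BatchUpdateException.getUpdateCounts().
     */
    public static List<Integer> failedIndices(int[] updateCounts) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < updateCounts.length; i++) {
            if (updateCounts[i] == Statement.EXECUTE_FAILED) {
                failed.add(i);
            }
        }
        return failed;
    }
}
```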

Shay>When would you send this ValidatePreparedStatement?
Shay>Before each execution as a separate roundtrip?
Shay>That would kill performance.

Why do you think the performance would degrade? Once again: the current
problem is that the client thinks it knows which columns will be returned by a
particular server-prepared statement, but the table might get changed behind
the scenes (e.g. an online schema upgrade), so the error occurs. That "return
type" is already validated by the database (the time is already spent on
validation), so there is virtually no additional work.

So it would be fine if a separate validate message were invented, or just
some connection property that made "cached query cannot change result
type" a non-fatal error (as in "it would not kill the transaction").

For instance: "in case of cached query cannot change result type error the
backend would skip processing the subsequent bind/exec kind of messages for
the problematic statement, and it will keep the transaction in open state.
The client would have to re-send relevant prepare/bind/execute". That would
prevent user from receiving the error and that approach does not require

This particular topic, "how to fix the <<cannot change result type>> error",
might be better split into its own conversation.

