On 9 March 2016 at 04:12, Robert Haas <robertmh...@gmail.com> wrote:
> I think we have a general problem with the server lacking
> certain capabilities that make it easy to implement a high-quality
> JDBC driver. And I think it would be good to work on figuring out how
> to fix that.
There are a few frustrations, to be sure, but I'm not sure there's actually
a ton server-side that drastically limits the driver.
One of the worst problems (IMO) is in the driver architecture its self. It
attempts to prevent blocking by guestimating the server's send buffer state
and its recv buffer state, trying to stop them filling and causing the
server to block on writes. It should just avoid blocking on its own send
buffer, which it can control with confidence. Or use some of Java's rather
good concurrency/threading features to simultaneously consume data from the
receive buffer and write to the send buffer when needed, like pgjdbc-ng
does. This makes making use of the pipelining features in Pg's protocol way
harder and less efficient than it should be - but then, PgJDBC still does
this better than libpq, which can't pipeline queries at all.
There certainly are server/protocol frustrations.
QUERY CANCEL RACES
Query cancellation sucks badly. Not because it requires a new connection,
though that's unfortunate, but because cancel is backend-level not
statement-level. A statement cancellation key returned as an immediate
response to the Execute message would be quite handy, so we could include
it in cancel requests and eliminate the race by having the cancel request
be a no-op if the statement cancel key doesn't match the currently running
EARLY CONNECTION CHARSETS
There's no way to know the charset of early connection error messages,
which is a flaw in the protocol that isn't specific to PgJDBC its self.
Similarly, you can't specify the text encoding of usernames, passwords, etc
sent to the server.
We also have no way to set GUCs per-query, and we need it for
statement_timeout. I really wish Parse and Execute messages allowed
statement-scoped GUCs to be passed at the protocol level. This would be
very, very helpful. The driver can probably work around it by fetching and
then SETing statement_timeout, running the query, then reSETing it
afterwards in a piplelined set of queries, but .... yuck. Also, log spam
GENERATED KEYS AND RETURNING
To get generated keys we have to hack the statement text. There's no
protocol-level equivalent, like we have for row-count limits in the v3
protocol. The ability to specify the set of returned columns at the
protocol level would be very nice. That said, hacking the statement text
isn't *too* bad, mostly because few people are going to do their own
RETURNING statement *and* request generated keys from the driver, the only
time this becomes an issue.
STRING TYPE ISSUES
PgJDBC can work around Pg's IMO somewhat overzealous type checks for string
types by passing string parameters as being of unknown-type. The JDBC
interface offers us no easy way to differentiate between "this parameter is
a real textual value" and "this parameter is a string representation of
something that might be another type". We can do it with setObject and
extension class wrappers, but then the user has to import the JDBC driver's
classes directly, use PgJDBC-specific API, etc. The people who have the
most problem with our current behaviour are those least able to do that,
users who're behind a query generation layer or ORM. I'd like to just make
stringtype=unspecified the default in PgJDBC and be done with it; users can
still specify an explicit cast to 'text' in the SQL if they want
psqlODBC would benefit from protocol-level SAVEPOINT and ROLLBACK TO
SAVEPOINT, mostly to reduce logspam and parser overhead. PgJDBC would be
able to use this to emulate other DBMSes error handling behaviour too, when
requested by a client. (Yes, I know about the correctness and performance
issues, but you tell that to someone who just wants to Port Their Stuff
>From Oracle But Can't Change The Code).
server_version_num should be GUC_REPORT and it's really annoying that it
isn't. I never agreed with the arguments about why that wasn't changed, and
I still want it changed.
LOST TYPMOD, NULLABILITY INFO
The server throws away typmod and nullability knowledge as soon as you do
anything with a column. This is frustrating for the driver's metadata API
support. Having result columns marked non-null in Describe would be handy.
The protocol offers no way to lazily fetch large values like BYTEA. Many
vendors can fetch small results and return a handle that gets larger
results from the server on-demand. This means that many clients expect that
SELECT * FROM my_table_with_100MB_bytea_column;
will not fetch all those bytea values to the client until/unless they're
actually accessed. They don't have to generate new and different queries
each time. ORMs in particular benefit from this. Ideally we'd have the
protocol-level ability to return a handle to the relevant TOAST entry that
clients can then fetch using further protocol messages on-demand so long as
they're on the same session, haven't committed or rolled back, and haven't
run another statement. This would make working with big binary objects in
the DB considerably more practical.
I'm sure there are others I haven't remembered or run into in there too.
See https://wiki.postgresql.org/wiki/Todo#Wire_Protocol_Changes for some.
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services