[HACKERS] Fwd: Proper query implementation for Postgresql driver

Shay Rojansky Mon, 29 Sep 2014 22:10:35 -0700

Hi again, a few days ago I sent a question (below) about using the extended
query protocol efficiently (especially for non-prepared statements). Your
responses spawned a good discussion between the npgsql developers, it's
here if you wish to look/weigh in:
https://github.com/npgsql/npgsql/issues/370


The idea of using extended query protocol for non-prepared queries raised
another "best practices" question, which I'm hoping you can help with. To
get rid of text encoding (which is locale-dependent, inefficient etc.) for
certain fields, it seems that we have to get rid of it for *all* fields.
This is because we send queries without knowing their result columns in
advance, and would like to pack all messages into a single packet to avoid
roundtrips. In other words, when we send a Bind message we don't yet have a
RowDescription, and so we can' specify on an individual columnar basis
which fields should be returned as binary and which as text. We're left
with the option of either going "full binary" or having to support text
encoding for *all* fields - the abolition of which was the whole point.

We can split the query process into two roundtrips - wait for the
RowDescription and only then send Bind - in order to pick-and-choose
text/binary on a columnar basis, but this is a non-trivial performance hit
which we want to avoid.

Regarding the option of going "full binary"; selecting all types from
pg_type shows hundreds of types. We'd have to provide binary encode/decode
implementation for most (?) of them. In addition, say a new type is added
(via an extension for example); text encoding at least had the advantage of
not forcing us to support everything: the unknown type would be transferred
as text and we'd provide that text to the user as an unprocessed string.
Going full binary seems to eliminate this possibility.

I'd like to understand what we're supposed to do, as a Postgresql driver.
Are we supposed to:
1) Go full binary and implement all types (but what to do about unknown
ones)?
2) Do two roundtrips for queries, thereby hurting performance?
3) Support text encoding for all fields, and manage somehow with issues
such as locale variations (by forcing the locale to be culture invariant,
as we do now)?

Thanks again for your help!

Shay

---------- Forwarded message ----------
From: Shay Rojansky <[email protected]>
Date: Sun, Sep 28, 2014 at 11:53 AM
Subject: Proper query implementation for Postgresql driver
To: [email protected]


Hi.

I'm a core developer on npgsql (the Postgresql .NET provider), we're hard
at work on a major 3.0 version. I have a fundamental question that's been
bugging me for a while and would like to get your opinion on it.

Npgsql currently supports three basic query modes: simple, client-side
parameters and prepared. The first two use the Postgresql simple query flow
(client-side parameters means the user specifies parameters
programmatically, just like with prepared queries, but the actual
substitution work is done client-side). Prepared uses the Postgresql
extended query flow.

According to the Postgresql docs (49.2.2), the simple query flow, "the
format of the retrieved values is always text". This creates a burden where
npgsql needs to parse textual (and locale-specific!) info (e.g. dates,
money). The situation is even worse when doing client-side parameters,
since npgsql has to *create* textual representations that match what
Postgresql is expecting. The workaround for this issue up to now has been
to switch to culture-invariant formatting (e.g. lc_monetary=C), but this
approach imposes the setting on users and affects functions in ways they
don't necessarily want.

I would, in theory, love to switch the entire thing to binary and thereby
avoid all textual parsing once and for all. If I understand correctly, this
means all queries must be implemented as extended queries, with numerous
extra client-server roundtrips - which are a bit hard to stomach. Section
49.1.2 of the manual also states that the unnamed prepared statement and
portal are optimized for the case of executing a query only once, hinting
that this is the proper way to do things - but this optimization still
cannot not eliminate the extra roundtrips mentioned above (PREPARE, BIND,
EXECUTE).

Can someone please let me know what the recommended/best practice here
would be?

Thanks,

Shay

[HACKERS] Fwd: Proper query implementation for Postgresql driver

Reply via email to