Thanks for all the answers.

Tom:

> FWIW, I'd go with text results, especially if you already have code to
> deal with that.  PG's on-the-wire binary formats are more efficient to
> process in some absolute sense, but they're hardly free: you need to
> consider byte endianness for integers and floats, integer vs float
> encoding for timestamps, the difference between PG's timestamp
> representation and whatever native timestamps are on your platform,
> etc etc.  It's not a trivial amount of code to deal with.  And in the
> end I think the efficiency gain is pretty marginal compared to the raw
> costs of data transfer, especially if you're not on the same physical
> machine as the server.

In my mind, data transfer was actually another good reason to switch to
binary. I don't know what PG's binary timestamp looks like, but isn't it a
safe assumption that it's much more economical (and easier to parse) than
any textual representation? By the way, we already encode integers and
floats as binary (and handle endianness) in certain contexts (parameters of
prepared statements). I don't underestimate the effort of a binary
implementation, but if it's a one-time systematic effort it seems to be
worth it?
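To illustrate the economy argument, here's a minimal Python sketch of decoding the binary timestamp, assuming the documented on-the-wire format of a big-endian int64 counting microseconds since 2000-01-01 (the integer-datetimes default build; this is illustration, not driver code):

```python
import struct
from datetime import datetime, timedelta, timezone

PG_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)

def decode_pg_timestamptz(wire_bytes: bytes) -> datetime:
    # Binary timestamptz is a big-endian int64: microseconds since
    # 2000-01-01 (assuming the server uses integer datetimes, the default).
    (micros,) = struct.unpack("!q", wire_bytes)
    return PG_EPOCH + timedelta(microseconds=micros)

# 8 bytes on the wire, vs. 25+ characters of locale-sensitive text like
# "2014-09-30 07:46:00.123456+00" that must be parsed field by field.
wire = struct.pack("!q", 86_400_000_000)  # one day past the PG epoch
print(decode_pg_timestamptz(wire))
```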

> Having said that, there has been some talk of letting client libraries
> supply a "whitelist" of data types that they'd like to receive in binary.
> We could do that (modulo questions of whether it's worth incurring a
> protocol version change for), but I'm unclear on what the use case really
> is for that.  Wouldn't you then have to provide an inconsistent API to
> users of your driver, that is some things are presented in text and others
> not?  Is that really a great thing?

Whitelisting binary types would solve the issue entirely for us here. As a
driver, I wouldn't be exposing the whitelist to users in any way; I would
simply use it to tell PostgreSQL (ideally at the session level) which types
we want to receive as binary. The user would access the data in the usual
way, with no perceivable API change as far as I can see.

Abhijit:

> If you're willing to hand the user an unprocessed string, why can't that
> be the binary encoding just as well as text?

I might be mistaken, but in the case of textual encoding I can just hand
over the text, as I got it from PG, to the user and let them deal with it.
With binary, I get a blob of bytes that has no meaning to anyone. I can
hand it over to the user, but they can't be expected to be able to do
anything with it...
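To make the contrast concrete, a toy Python illustration (not driver code) of why text is a safe pass-through while binary requires per-type knowledge:

```python
import struct

# Text result for an int4: the driver can hand the string straight to the
# user, who can interpret it without any type-specific help.
text_value = b"12345".decode("utf-8")   # "12345" -- meaningful as-is

# Binary result for the same int4: a big-endian 4-byte integer.
wire = struct.pack("!i", 12345)         # b'\x00\x0009' -- opaque bytes
# Without a decoder keyed on the column's type OID, the user can't be
# expected to do anything with this.
(binary_value,) = struct.unpack("!i", wire)
assert binary_value == int(text_value)
```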

Tom and Atri:

> TW, libpqtypes (http://libpqtypes.esilo.com) might be worth
> studying as well.  I've not used it myself, but it claims to
> offer datatype-extensible processing of binary formats.

Thanks for the suggestion, I'll take a look. Since we're a pure .NET
implementation, actually using libpqtypes won't be possible, but there's
definitely something to learn here. Although given how the protocol
currently looks, I can't really see what could be done to provide "magical"
support for binary encoding of arbitrary, unknown types...?

Craig:

> Even if you can't get rid of text support, dropping simple query
> protocol support and the need to support client-side parameter binding
> may well be a pleasant improvement.

I definitely agree when it comes to dropping client-side parameter binding
(and there seems to be agreement on that between the devs). But for the
case of non-parameterized queries, there doesn't seem to be any benefit to
using the extended protocol over the simple one (if you're still doing
text), is there?

> It's also possible for a type not to have send/recv functions, i.e. to
> support text-only use.

In that case, what would be the behavior of selecting such a type with an
extended query that specifies "all results in binary"? A PG error?

> You could reasonably require that all user defined extension types must
> support binary I/O. This will probably be fine in practice. As you said,
> though, users would then have to install a plugin for nPgSQL for each
> custom type they wished to use because nPgSQL won't otherwise know what
> to do with the binary data.

This is the only real remaining point of difficulty for me. We could bite
the bullet, sit down and implement binary for everything built-in; but
eliminating the free use of extensions seems like a no-go.

> That's pretty much what PgJDBC does, playing with extra_float_digits,
> client_encoding, TimeZone, etc.
> It's not lovely.

It definitely isn't... And we have user complaints on several counts as
well. The problem is that messing around with extra_float_digits,
lc_monetary and the rest also affects some PostgreSQL functions which do
text conversion... With extra_float_digits set, "SELECT format('%s',
0.28::double precision)" returns 0.280000000000000027 rather than 0.28. The
point is that the hacks we're doing to support textual *wire* encoding also
impact non-wire functionality, affecting users in unwanted ways.
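The underlying issue is the usual round-trip-digits tradeoff for IEEE doubles; a quick Python illustration of the same arithmetic (not PG itself):

```python
# The double nearest to 0.28 is not exactly 0.28. At default precision the
# extra bits stay hidden; asking for enough significant digits to round-trip
# the value exactly (which is what raising extra_float_digits does) exposes
# them in the text rendering.
print(f"{0.28:.15g}")   # prints "0.28"
print(f"{0.28:.17g}")   # prints "0.28000000000000003"
```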

> I would like to be able to specify a top-level option at Bind/Execute
> time that asks the server to send binary for built-in types only, or for
> a list of oids that we register ourselves as understanding binary for at
> a session level. That would require a protocol change, though.

I absolutely agree something is missing. I'd go more with the whitelisting
approach Tom mentioned above: a session-level list of OIDs for which binary
encoding is requested in all cases throughout the session. It would cause
somewhat less breakage - no actual message format changes (although you
suddenly start to get binary return values in "unexpected" places, i.e.
results of extended queries which requested "all text"). It would solve the
problem in an ideal way.
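To sketch what this would mean on the driver side (a hypothetical illustration in Python - the session-level whitelist message does not exist in the current protocol, and the decoder table below is just a sample):

```python
import struct

# Hypothetical driver-side registry: the OIDs this driver knows how to
# decode in binary. Under the proposed whitelist, the driver would send
# exactly this set of OIDs to the server once per session, then decode
# those columns as binary and pass everything else through as text.
BINARY_DECODERS = {
    23: lambda b: struct.unpack("!i", b)[0],   # int4: big-endian int32
    20: lambda b: struct.unpack("!q", b)[0],   # int8: big-endian int64
    700: lambda b: struct.unpack("!f", b)[0],  # float4: big-endian float
    701: lambda b: struct.unpack("!d", b)[0],  # float8: big-endian double
}

def decode_column(type_oid: int, wire_bytes: bytes):
    decoder = BINARY_DECODERS.get(type_oid)
    if decoder is not None:
        return decoder(wire_bytes)
    # Unknown/extension type: fall back to text, exactly as today.
    return wire_bytes.decode("utf-8")

assert decode_column(23, struct.pack("!i", 42)) == 42
assert decode_column(0, b"some-extension-value") == "some-extension-value"
```

The point is that the fallback branch preserves today's behavior for extension types the driver has never heard of, which is exactly what "full binary" takes away.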

Shay

On Tue, Sep 30, 2014 at 7:46 AM, Craig Ringer <cr...@2ndquadrant.com> wrote:

> On 09/30/2014 01:09 PM, Shay Rojansky wrote:
>
> > The idea of using extended query protocol for non-prepared queries
> > raised another "best practices" question, which I'm hoping you can help
> > with. To get rid of text encoding (which is locale-dependent,
> > inefficient etc.) for certain fields, it seems that we have to get rid
> > of it for *all* fields. This is because we send queries without knowing
> > their result columns in advance, and would like to pack all messages
> > into a single packet to avoid roundtrips. In other words, when we send a
> > Bind message we don't yet have a RowDescription, and so we can't specify
> > on an individual columnar basis which fields should be returned as
> > binary and which as text. We're left with the option of either going
> > "full binary" or having to support text encoding for *all* fields - the
> > abolition of which was the whole point.
>
> Even if you can't get rid of text support, dropping simple query
> protocol support and the need to support client-side parameter binding
> may well be a pleasant improvement.
>
> > We can split the query process into two roundtrips - wait for the
> > RowDescription and only then send Bind - in order to pick-and-choose
> > text/binary on a columnar basis, but this is a non-trivial performance
> > hit which we want to avoid.
>
> It may be worth doing exactly this if you're doing large batches where
> it might be a real win to use as many binary parameters as possible, but
> I agree that you wouldn't want to do it for one-shot queries.
>
> > Regarding the option of going "full binary"; selecting all types from
> > pg_type shows hundreds of types. We'd have to provide binary
> > encode/decode implementation for most (?) of them.
>
> All of them except 'internal' and 'unknown', really. There are some you
> may not see in common queries, but you'll still run into them when
> you're looking at the system catalogs.
>
> Quite a few are binary compatible with each other, though, so you'll
> need fewer individual implementations than you might expect. Take a look
> at the castmethod in pg_cast to identify groups of binary compatible types.
>
> > In addition, say a
> > new type is added (via an extension for example); text encoding at least
> > had the advantage of not forcing us to support everything: the unknown
> > type would be transferred as text and we'd provide that text to the user
> > as an unprocessed string. Going full binary seems to eliminate this
> > possibility.
>
> It does.
>
> > I'd like to understand what we're supposed to do, as a Postgresql
> > driver. Are we supposed to:
> > 1) Go full binary and implement all types (but what to do about unknown
> > ones)?
>
> It's also possible for a type not to have send/recv functions, i.e. to
> support text-only use.
>
> From the docs
> (http://www.postgresql.org/docs/9.3/static/sql-createtype.html):
>
> "The support functions input_function and output_function are required,
> while the functions receive_function, send_function, ... are optional."
>
> However, no built-in type lacks binary I/O functions.
>
> You could reasonably require that all user defined extension types must
> support binary I/O. This will probably be fine in practice. As you said,
> though, users would then have to install a plugin for nPgSQL for each
> custom type they wished to use because nPgSQL won't otherwise know what
> to do with the binary data.
>
> > 2) Do two roundtrips for queries, thereby hurting performance?
> > 3) Support text encoding for all fields, and manage somehow with issues
> > such as locale variations (by forcing the locale to be culture
> > invariant, as we do now)?
>
> That's pretty much what PgJDBC does, playing with extra_float_digits,
> client_encoding, TimeZone, etc.
>
> It's not lovely.
>
> I would like to be able to specify a top-level option at Bind/Execute
> time that asks the server to send binary for built-in types only, or for
> a list of oids that we register ourselves as understanding binary for at
> a session level. That would require a protocol change, though.
>
> It might be worth taking some time to think about how we can help
> clients get best use out of the binary protocol and noting it on the
> TODO page for when we do protocol revision 4.
>
> --
>  Craig Ringer                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
