Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-25 Thread Peter Eisentraut
On Tue, 2012-01-24 at 20:13 -0500, Tom Lane wrote:
 Yeah.  In both cases, the (proposed) new output format is
 self-identifying *to clients that know what to look for*.
 Unfortunately it would only be the most anally-written pre-existing
 client code that would be likely to spit up on the unexpected
 variations.  What's much more likely to happen, and did happen in the
 bytea case, is silent data corruption. 

The problem in the bytea case is that the client libraries are written
to ignore encoding errors.  No amount of protocol versioning will help
you in that case.




Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-25 Thread Marko Kreen
On Tue, Jan 24, 2012 at 09:33:52PM -0500, Robert Haas wrote:
 Furthermore, while we haven't settled the question of exactly what a
 good negotiation facility would look like, we seem to agree that a GUC
 isn't it.  I think that means this isn't going to happen for 9.2, so
 we should mark this patch Returned with Feedback and return to this
 topic for 9.3.

Simply extending the text/bin flags should be a quite
uncontroversial first step.  How to express the
capability in the startup packet, I leave to others to decide.

But my proposal would be the following:

bit 0 : text/bin
bit 1..15 : format version number, maps to best formats in some
Postgres version.  

It does not solve the resultset problem, where I'd like to say
"gimme well-known types in optimal representation, others in text".
I don't know the perfect solution for that, but I suspect the
biggest danger here is the urge to go to maximal complexity
immediately.  So perhaps a good idea would be to simply give one
additional bit (0x8000?) in the result flag to say that only
well-known types should be optimized.  That should cover 95%
of use-cases, and we can design a more flexible packet format
when we know more about actual needs.
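
For concreteness, a minimal C sketch of one way the proposed 16-bit
code could be laid out - assuming bit 0 is the text/bin flag, bits
1..14 carry the version number, and the top bit (0x8000) is the
well-known-types flag.  Every name here is invented for illustration;
nothing like this exists in libpq or the server today.

  #include <stdint.h>

  /* Hypothetical layout of the proposed 16-bit format code:
   * bit 0 = text(0)/binary(1), bits 1..14 = format version,
   * bit 15 (0x8000) = "optimize well-known types only". */
  #define FMT_BINARY      0x0001
  #define FMT_WELLKNOWN   0x8000

  static uint16_t
  make_format_code(int binary, unsigned version, int wellknown_only)
  {
      uint16_t code = (uint16_t) ((version & 0x3FFF) << 1);

      if (binary)
          code |= FMT_BINARY;
      if (wellknown_only)
          code |= FMT_WELLKNOWN;
      return code;
  }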

libpq suggestions:

  PQsetformatcodes(bool)
only if it's called with TRUE does it start interpreting
text/bin codes as non-bools.  IOW, we stay compatible
with old code using -1 as TRUE.
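
For illustration, usage might look like this - PQsetformatcodes() is
hypothetical, a sketch of the suggested opt-in rather than an existing
libpq call:

  /* Hypothetical opt-in: after this call, format codes other than 0/1
   * are interpreted per the scheme above; old code that never calls
   * this keeps the historical "any nonzero value is binary" behavior. */
  PQsetformatcodes(conn, 1);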

protocol suggestions:

  On startup, the server sends the highest text/bin code it supports,
  and gives an error if it finds a higher code than that.
  Poolers/proxies with different server versions in the pool
  will simply advertise the lowest common code.


Small Q&A, to put obvious aspects into writing
--

* Does that mean we need to keep old formats around infinitely?

Yes.  On-wire formats have *much* higher visibility than
on-disk formats.  Also, except for some basic types, they are
not parsed in adapters but in client code.  libpq offers the
least help in that respect.

Basically - changing on-wire formatting is a big deal,
don't do it willy-nilly.


* Does that mean we cannot turn on new formats automatically?

Yes.  That should be obvious.


-- 
marko




Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-25 Thread Tom Lane
Marko Kreen mark...@gmail.com writes:
 On Tue, Jan 24, 2012 at 09:33:52PM -0500, Robert Haas wrote:
 Furthermore, while we haven't settled the question of exactly what a
 good negotiation facility would look like, we seem to agree that a GUC
 isn't it.  I think that means this isn't going to happen for 9.2, so
 we should mark this patch Returned with Feedback and return to this
 topic for 9.3.

 Simply extending the text/bin flags should be a quite
 uncontroversial first step.  How to express the
 capability in the startup packet, I leave to others to decide.

 But my proposal would be the following:

 bit 0 : text/bin
 bit 1..15 : format version number, maps to best formats in some
 Postgres version.  

 It does not solve the resultset problem, where I'd like to say
 "gimme well-known types in optimal representation, others in text".
 I don't know the perfect solution for that, but I suspect the
 biggest danger here is the urge to go to maximal complexity
 immediately.  So perhaps a good idea would be to simply give one
 additional bit (0x8000?) in the result flag to say that only
 well-known types should be optimized.  That should cover 95%
 of use-cases, and we can design a more flexible packet format
 when we know more about actual needs.

Huh?  How can that work?  If we decide to change the representation of
some other well known type, say numeric, how do we decide whether a
client setting that bit is expecting that change or not?

regards, tom lane



Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-25 Thread Marko Kreen
On Wed, Jan 25, 2012 at 10:23:14AM -0500, Tom Lane wrote:
 Marko Kreen mark...@gmail.com writes:
  On Tue, Jan 24, 2012 at 09:33:52PM -0500, Robert Haas wrote:
  Furthermore, while we haven't settled the question of exactly what a
  good negotiation facility would look like, we seem to agree that a GUC
  isn't it.  I think that means this isn't going to happen for 9.2, so
  we should mark this patch Returned with Feedback and return to this
  topic for 9.3.
 
  Simply extending the text/bin flags should be a quite
  uncontroversial first step.  How to express the
  capability in the startup packet, I leave to others to decide.
 
  But my proposal would be the following:
 
  bit 0 : text/bin
  bit 1..15 : format version number, maps to best formats in some
  Postgres version.  
 
  It does not solve the resultset problem, where I'd like to say
  "gimme well-known types in optimal representation, others in text".
  I don't know the perfect solution for that, but I suspect the
  biggest danger here is the urge to go to maximal complexity
  immediately.  So perhaps a good idea would be to simply give one
  additional bit (0x8000?) in the result flag to say that only
  well-known types should be optimized.  That should cover 95%
  of use-cases, and we can design a more flexible packet format
  when we know more about actual needs.
 
 Huh?  How can that work?  If we decide to change the representation of
 some other well known type, say numeric, how do we decide whether a
 client setting that bit is expecting that change or not?

It sets that bit *and* version code - which means that it is
up-to-date with all well-known type formats in that version.

The key here is to sanely define the well-known types
and document them, so clients can stay up to date with them.

Variants:
- All built-in and contrib types in some Postgres version
- All built-in types in some Postgres version
- Most common types (text, numeric, bytes, int, float, bool, ..)

Also, as we have only one bit, the set of types cannot be
extended.  (Unless we provide more bits for that, but that
may get too confusing?)


Basically, I see 2 scenarios here:

1) Client knows the result types and can set the
text/bin/version code safely, without further restrictions.

2) There is a generic framework that does not know query contents
but can be expected to track Postgres versions closely.
Such a framework cannot say "binary" for results safely,
but *could* do it for some well-defined subset of types.


Of course, it may be that 2) is not worth supporting, as
frameworks can throw errors on their own if they find a
format that they cannot parse.  Then the user needs
to either register their own parser, or simply turn off
optimized formats to get the plain-text values.
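
As a sketch of what scenario 2) could look like on the framework side -
request the optimized representation only for a fixed, documented set
of well-known types and leave everything else in text (the OID list
and the helper are illustrative only, not a proposed API):

  #include <stddef.h>
  #include <postgres_ext.h>       /* for the Oid typedef */

  /* Hypothetical "well-known" set: bool, int8, int2, int4, text,
   * float4, float8, numeric (by their pg_type OIDs). */
  static const Oid wellknown_types[] = {16, 20, 21, 23, 25, 700, 701, 1700};

  static int
  framework_wants_optimized(Oid typoid)
  {
      for (size_t i = 0; i < sizeof(wellknown_types) / sizeof(Oid); i++)
          if (wellknown_types[i] == typoid)
              return 1;
      return 0;                   /* everything else stays in text */
  }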

-- 
marko




Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-25 Thread Tom Lane
Marko Kreen mark...@gmail.com writes:
 On Wed, Jan 25, 2012 at 10:23:14AM -0500, Tom Lane wrote:
 Huh?  How can that work?  If we decide to change the representation of
 some other well known type, say numeric, how do we decide whether a
 client setting that bit is expecting that change or not?

 It sets that bit *and* version code - which means that it is
 up-to-date with all well-known type formats in that version.

Then why bother with the bit in the format code?  If you've already done
some other negotiation to establish what datatype formats you will
accept, this doesn't seem to be adding any value.

 Basically, I see 2 scenarios here:

 1) Client knows the result types and can set the
 text/bin/version code safely, without further restrictions.

 2) There is a generic framework that does not know query contents
 but can be expected to track Postgres versions closely.
 Such a framework cannot say "binary" for results safely,
 but *could* do it for some well-defined subset of types.

The hole in approach (2) is that it supposes that the client side knows
the specific datatypes in a query result in advance.  While this is
sometimes workable for application-level code that knows what query it's
issuing, it's really entirely untenable for a framework or library.
The only way that a framework can deal with arbitrary queries is to
introduce an extra round trip (Describe step) to see what datatypes
the query will produce so it can decide what format codes to issue
... and that will pretty much eat up any time savings you might get
from a more efficient representation.

You really want to do the negotiation once, at connection setup, and
then be able to process queries without client-side prechecking of what
data types will be sent back.
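
For reference, that extra round trip looks roughly like this with
today's libpq, assuming stmt_name was already prepared with PQprepare()
(PQdescribePrepared() is real; can_decode_binary() is a hypothetical
framework lookup, and since PQexecPrepared() exposes only a single
all-columns result format, the sketch requests binary only when every
column is decodable):

  #include <libpq-fe.h>

  extern int can_decode_binary(Oid typoid);   /* hypothetical */

  PGresult *
  exec_with_best_format(PGconn *conn, const char *stmt_name)
  {
      /* The Describe here is the extra round trip in question. */
      PGresult *desc = PQdescribePrepared(conn, stmt_name);
      int       all_binary = 1;

      for (int i = 0; i < PQnfields(desc); i++)
          if (!can_decode_binary(PQftype(desc, i)))
              all_binary = 0;
      PQclear(desc);

      /* resultFormat: 0 = all text, 1 = all binary */
      return PQexecPrepared(conn, stmt_name, 0, NULL, NULL, NULL, all_binary);
  }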

regards, tom lane



Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-25 Thread Robert Haas
On Wed, Jan 25, 2012 at 11:40 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Marko Kreen mark...@gmail.com writes:
 On Wed, Jan 25, 2012 at 10:23:14AM -0500, Tom Lane wrote:
 Huh?  How can that work?  If we decide to change the representation of
 some other well known type, say numeric, how do we decide whether a
 client setting that bit is expecting that change or not?

 It sets that bit *and* version code - which means that it is
 up-to-date with all well-known type formats in that version.

 Then why bother with the bit in the format code?  If you've already done
 some other negotiation to establish what datatype formats you will
 accept, this doesn't seem to be adding any value.

 Basically, I see 2 scenarios here:

 1) Client knows the result types and can set the
 text/bin/version code safely, without further restrictions.

 2) There is a generic framework that does not know query contents
 but can be expected to track Postgres versions closely.
 Such a framework cannot say "binary" for results safely,
 but *could* do it for some well-defined subset of types.

 The hole in approach (2) is that it supposes that the client side knows
 the specific datatypes in a query result in advance.  While this is
 sometimes workable for application-level code that knows what query it's
 issuing, it's really entirely untenable for a framework or library.
 The only way that a framework can deal with arbitrary queries is to
 introduce an extra round trip (Describe step) to see what datatypes
 the query will produce so it can decide what format codes to issue
 ... and that will pretty much eat up any time savings you might get
 from a more efficient representation.

 You really want to do the negotiation once, at connection setup, and
 then be able to process queries without client-side prechecking of what
 data types will be sent back.

What might work is for clients to advertise a list of capability
strings, like compact_array_format, at connection startup time.  The
server can then adjust its behavior based on that list.  But the
problem with that is that as we make changes to the wire protocol, the
list of capabilities clients need to advertise could get pretty long
in a hurry.  A simpler alternative is to have the client send a server
version along with the initial connection attempt and have the server
do its best not to use any features that weren't present in that
server version - but that seems to leave user-defined types out in the
cold.
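
A sketch of what the capability-string idea might look like on the
server side, assuming the client sent a comma-separated list in some
(hypothetical) startup parameter - the parameter, the strings, and the
function are all invented:

  #include <stdbool.h>
  #include <string.h>

  /* Capabilities this server build knows about. */
  static const char *server_caps[] = {"compact_array_format", NULL};

  /* Enable a feature only if the client advertised it and the server
   * knows it.  (A real implementation would tokenize the list rather
   * than substring-match.) */
  static bool
  capability_enabled(const char *client_caps_csv, const char *cap)
  {
      for (int i = 0; server_caps[i] != NULL; i++)
          if (strcmp(server_caps[i], cap) == 0)
              return client_caps_csv != NULL &&
                     strstr(client_caps_csv, cap) != NULL;
      return false;
  }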

I reiterate my previous view that we don't have time to engineer a
good solution to this problem right now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-25 Thread Marko Kreen
On Wed, Jan 25, 2012 at 11:40:28AM -0500, Tom Lane wrote:
 Marko Kreen mark...@gmail.com writes:
  On Wed, Jan 25, 2012 at 10:23:14AM -0500, Tom Lane wrote:
  Huh?  How can that work?  If we decide to change the representation of
  some other well known type, say numeric, how do we decide whether a
  client setting that bit is expecting that change or not?
 
  It sets that bit *and* version code - which means that it is
  up-to-date with all well-known type formats in that version.
 
 Then why bother with the bit in the format code?  If you've already done
 some other negotiation to establish what datatype formats you will
 accept, this doesn't seem to be adding any value.

The other negotiation is done via Postgres release notes...

I specifically want to avoid any sort of per-connection
negotiation, except for the max format version supported,
because it will mess up multiplexed usage of a single connection.
Then they need to either disable advanced formats completely,
or still do it per-query somehow (via GUCs?), which is a mess.

Also, I don't see any market for flexible negotiations;
instead I see that people want 2 things:

- Updated formats to be easily available
- Old apps not to break

I might be mistaken here - if so, please correct me -
but currently I'm designing for simplicity.

  Basically, I see 2 scenarios here:
 
  1) Client knows the result types and can set the
  text/bin/version code safely, without further restrictions.
 
  2) There is a generic framework that does not know query contents
  but can be expected to track Postgres versions closely.
  Such a framework cannot say "binary" for results safely,
  but *could* do it for some well-defined subset of types.
 
 The hole in approach (2) is that it supposes that the client side knows
 the specific datatypes in a query result in advance.  While this is
 sometimes workable for application-level code that knows what query it's
 issuing, it's really entirely untenable for a framework or library.

No, the list of well-known types is documented and fixed.
The bit is specifically for frameworks, so that they can say
"I support all well-known types in Postgres version X.Y".

Note that I said above that the list cannot be extended, but that
is wrong.  When this bit and the version code are taken together,
they clearly define the list "as in version X.Y".  So considering
that the client should not send any higher version than the server
supports, the server always knows which list the client refers to.
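
For illustration, the server-side check this implies could be as small
as the following sketch - the variable and constant names are invented,
though ereport() is the server's real error-reporting interface:

  /* Hypothetical: reject format versions newer than this server knows. */
  if (client_format_version > SERVER_MAX_FORMAT_VERSION)
      ereport(ERROR,
              (errcode(ERRCODE_PROTOCOL_VIOLATION),
               errmsg("unsupported format version %d (server supports up to %d)",
                      client_format_version, SERVER_MAX_FORMAT_VERSION)));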

-- 
marko




Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-25 Thread Tom Lane
Marko Kreen mark...@gmail.com writes:
 On Wed, Jan 25, 2012 at 11:40:28AM -0500, Tom Lane wrote:
 Then why bother with the bit in the format code?  If you've already done
 some other negotiation to establish what datatype formats you will
 accept, this doesn't seem to be adding any value.

 The other negotiation is done via Postgres release notes...

That is really not going to work if the requirement is to not break old
apps.  They haven't read the release notes.

 I specifically want to avoid any sort of per-connection
 negotiation, except for the max format version supported,
 because it will mess up multiplexed usage of a single connection.
 Then they need to either disable advanced formats completely,
 or still do it per-query somehow (via GUCs?), which is a mess.

Hmm, that adds yet another level of not-obvious-how-to-meet requirement.
I tend to concur with Robert that we are not close to a solution.

 No, the list of well-known types is documented and fixed.
 The bit is specifically for frameworks, so that they can say
 "I support all well-known types in Postgres version X.Y".

So in other words, if we have a client that contains a framework that
knows about version N, and we connect it up to a server that speaks
version N+1, it suddenly loses the ability to use any version-N
optimizations?  That does not meet my idea of not breaking old apps.

regards, tom lane



Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-25 Thread Merlin Moncure
On Wed, Jan 25, 2012 at 11:24 AM, Marko Kreen mark...@gmail.com wrote:
 I specifically want to avoid any sort of per-connection
 negotiation, except for the max format version supported,
 because it will mess up multiplexed usage of a single connection.
 Then they need to either disable advanced formats completely,
 or still do it per-query somehow (via GUCs?), which is a mess.

Being able to explicitly pick a format version other than the one the
application was specifically written against adds a lot of complexity
and needs to be justified.  Maybe you're trying to translate data
between two differently versioned servers?  I'm trying to understand
the motive behind your wanting finer-grained control over picking the
format version...

merlin



Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-25 Thread Marko Kreen
On Wed, Jan 25, 2012 at 12:58:15PM -0500, Tom Lane wrote:
 Marko Kreen mark...@gmail.com writes:
  On Wed, Jan 25, 2012 at 11:40:28AM -0500, Tom Lane wrote:
  Then why bother with the bit in the format code?  If you've already done
  some other negotiation to establish what datatype formats you will
  accept, this doesn't seem to be adding any value.
 
  The other negotiation is done via Postgres release notes...
 
 That is really not going to work if the requirement is to not break old
 apps.  They haven't read the release notes.

Yes, but they also keep requesting the old formats, so everything is fine?
Note that formats are under the full control of the client; the server has
no way to send newer formats to a client that has not requested them.

  I specifically want to avoid any sort of per-connection
  negotiation, except for the max format version supported,
  because it will mess up multiplexed usage of a single connection.
  Then they need to either disable advanced formats completely,
  or still do it per-query somehow (via GUCs?), which is a mess.
 
 Hmm, that adds yet another level of not-obvious-how-to-meet requirement.
 I tend to concur with Robert that we are not close to a solution.

Well, my simple scheme seems to work fine with such a requirement.

[My scheme - the client-supplied 16-bit type code is the only thing
that decides the format.]

  No, the list of well-known types is documented and fixed.
  The bit is specifically for frameworks, so that they can say
  "I support all well-known types in Postgres version X.Y".
 
 So in other words, if we have a client that contains a framework that
 knows about version N, and we connect it up to a server that speaks
 version N+1, it suddenly loses the ability to use any version-N
 optimizations?  That does not meet my idea of not breaking old apps.

That is up to the Postgres maintainers to decide - whether they want
to phase out some type from the list.  But my main point was that
it's OK to add types to the list.  I missed that aspect in my
previous mail.

-- 
marko




Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-25 Thread Marko Kreen
On Wed, Jan 25, 2012 at 12:54:00PM -0600, Merlin Moncure wrote:
 On Wed, Jan 25, 2012 at 11:24 AM, Marko Kreen mark...@gmail.com wrote:
  I specifically want to avoid any sort of per-connection
  negotiation, except for the max format version supported,
  because it will mess up multiplexed usage of a single connection.
  Then they need to either disable advanced formats completely,
  or still do it per-query somehow (via GUCs?), which is a mess.
 
 Being able to explicitly pick a format version other than the one the
 application was specifically written against adds a lot of complexity
 and needs to be justified.  Maybe you're trying to translate data
 between two differently versioned servers?  I'm trying to understand
 the motive behind your wanting finer-grained control over picking the
 format version...

You mean if a client was written with version-N formats, but connects
to a server with version N-1 formats?  True, simply not supporting
such a case simplifies the client-side API.

But note that it does not change anything at the protocol level; it's purely
client-API specific.  It may well be that some higher-level APIs
(JDBC, Npgsql, Psycopg) may support such a downgrade, but with lower-level
APIs (raw libpq), it may be optional whether the client wants to
support such usage or not.

-- 
marko




Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-25 Thread Merlin Moncure
On Wed, Jan 25, 2012 at 1:24 PM, Marko Kreen mark...@gmail.com wrote:
 On Wed, Jan 25, 2012 at 12:54:00PM -0600, Merlin Moncure wrote:
 On Wed, Jan 25, 2012 at 11:24 AM, Marko Kreen mark...@gmail.com wrote:
  I specifically want to avoid any sort of per-connection
  negotiation, except for the max format version supported,
  because it will mess up multiplexed usage of a single connection.
  Then they need to either disable advanced formats completely,
  or still do it per-query somehow (via GUCs?), which is a mess.

 Being able to explicitly pick a format version other than the one the
 application was specifically written against adds a lot of complexity
 and needs to be justified.  Maybe you're trying to translate data
 between two differently versioned servers?  I'm trying to understand
 the motive behind your wanting finer-grained control over picking the
 format version...

 You mean if a client was written with version-N formats, but connects
 to a server with version N-1 formats?  True, simply not supporting
 such a case simplifies the client-side API.

 But note that it does not change anything at the protocol level; it's purely
 client-API specific.  It may well be that some higher-level APIs
 (JDBC, Npgsql, Psycopg) may support such a downgrade, but with lower-level
 APIs (raw libpq), it may be optional whether the client wants to
 support such usage or not.

well, I see the following cases:
1) Vserver > Vapplication: server downgrades wire formats to
the application's version
2) Vapplication > Vlibpq > Vserver: since the application is
reading/writing formats the server can't understand, an error should
be raised if they are used in either direction
3) Vlibpq >= Vapplication > Vserver: same as above, but libpq can
'upconvert' a low-version wire format to the application's wire format or
error otherwise.

By far, the most common cause of problems (both in terms of severity
and frequency) is case #1.  #3 allows a 'compatibility mode' via
libpq, but that comes at a significant cost in complexity, since libpq
needs to be able to translate wire formats up (but not down).  #2/3 is
a less common problem though, as it's more likely the application can
be adjusted to get up to speed; so to keep things simple we can maybe
just error out in those scenarios.

In the database, we need to maintain outdated send/recv functions
basically forever and as much as possible try and translate old wire
format data to and from newer backend structures (maybe in very
specific cases that will be impossible such that the application is
SOL, but that should be rare).  All send/recv functions, including
user created ones need to be stamped with a version token (database
version?).  With the versions of the application, libpq, and all
server functions, we can determine all wire formats as long as we
assume the application's targeted database version represents all the
wire formats it was using.

My good ideas stop there: the exact mechanics of how the usable set of
functions are determined, how exactly the adjusted type look ups will
work, etc. would all have to be sorted out.  Most of the nastier parts
though (protocol changes notwithstanding) are not in libpq, but the
server.  There's just no quick fix on the client side I can see.

merlin



Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-25 Thread Marko Kreen
On Wed, Jan 25, 2012 at 01:43:03PM -0600, Merlin Moncure wrote:
 On Wed, Jan 25, 2012 at 1:24 PM, Marko Kreen mark...@gmail.com wrote:
  On Wed, Jan 25, 2012 at 12:54:00PM -0600, Merlin Moncure wrote:
  On Wed, Jan 25, 2012 at 11:24 AM, Marko Kreen mark...@gmail.com wrote:
   I specifically want to avoid any sort of per-connection
   negotiation, except for the max format version supported,
   because it will mess up multiplexed usage of a single connection.
   Then they need to either disable advanced formats completely,
   or still do it per-query somehow (via GUCs?), which is a mess.
 
  Being able to explicitly pick a format version other than the one the
  application was specifically written against adds a lot of complexity
  and needs to be justified.  Maybe you're trying to translate data
  between two differently versioned servers?  I'm trying to understand
  the motive behind your wanting finer-grained control over picking the
  format version...
 
  You mean if a client was written with version-N formats, but connects
  to a server with version N-1 formats?  True, simply not supporting
  such a case simplifies the client-side API.

  But note that it does not change anything at the protocol level; it's purely
  client-API specific.  It may well be that some higher-level APIs
  (JDBC, Npgsql, Psycopg) may support such a downgrade, but with lower-level
  APIs (raw libpq), it may be optional whether the client wants to
  support such usage or not.
 
 well, I see the following cases:
 1) Vserver > Vapplication: server downgrades wire formats to
 the application's version
 2) Vapplication > Vlibpq > Vserver: since the application is
 reading/writing formats the server can't understand, an error should
 be raised if they are used in either direction
 3) Vlibpq >= Vapplication > Vserver: same as above, but libpq can
 'upconvert' a low-version wire format to the application's wire format or
 error otherwise.

I don't see why you special-case libpq here.  There is no reason
libpq cannot pass older/newer formats through.  The only thing that
matters is the parser/formatter version.  If that is done in libpq,
then the app version does not matter.  If it's done in the app, then
the libpq version does not matter.

 By far, the most common cause of problems (both in terms of severity
 and frequency) is case #1.  #3 allows a 'compatibility mode' via
 libpq, but that comes at a significant cost in complexity, since libpq
 needs to be able to translate wire formats up (but not down).  #2/3 is
 a less common problem though, as it's more likely the application can
 be adjusted to get up to speed; so to keep things simple we can maybe
 just error out in those scenarios.

I don't like the idea of conversion.  Instead, either the client
writes values through an API that picks the format based on the server
version, or it writes them for a specific version only.  In the latter
case it cannot work with an older server - unless the fixed version is
the baseline.

 In the database, we need to maintain outdated send/recv functions
 basically forever and as much as possible try and translate old wire
 format data to and from newer backend structures (maybe in very
 specific cases that will be impossible such that the application is
 SOL, but that should be rare).  All send/recv functions, including
 user created ones need to be stamped with a version token (database
 version?).  With the versions of the application, libpq, and all
 server functions, we can determine all wire formats as long as we
 assume the application's targeted database version represents all the
 wire formats it was using.
 
 My good ideas stop there: the exact mechanics of how the usable set of
 functions are determined, how exactly the adjusted type look ups will
 work, etc. would all have to be sorted out.  Most of the nastier parts
 though (protocol changes notwithstanding) are not in libpq, but the
 server.  There's just no quick fix on the client side I can see.

It does not need to be complex - just bring the version number to
the i/o function and let it decide whether it cares about it or not.
Most functions will not.  Only those that we want to change in a
compatible manner need to look at it.

But I don't see that there is a danger of having regular changes in wire
formats.  So most of the functions will ignore the versioning,
including the ones that don't care about compatibility.
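
As a sketch of that shape (server-side pseudo-C; the wire_version
argument and the two helpers are hypothetical - today's send functions
receive no such parameter):

  #include "postgres.h"
  #include "utils/array.h"

  /* Hypothetical: an output function that consults the negotiated
   * wire-format version.  A type whose format never changed would
   * simply ignore wire_version. */
  static bytea *
  array_send_versioned(ArrayType *arr, int wire_version)
  {
      if (wire_version >= 2 && array_elements_fixed_width(arr))
          return array_send_compact(arr);    /* newer, smaller encoding */
      return array_send_baseline(arr);       /* v1 on-wire encoding */
  }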

But seriously - on-wire compatibility is a good thing, do not fear it...

-- 
marko




Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-25 Thread Merlin Moncure
On Wed, Jan 25, 2012 at 2:29 PM, Marko Kreen mark...@gmail.com wrote:
 well, I see the following cases:
 1) Vserver > Vapplication: server downgrades wire formats to
 the application's version
 2) Vapplication > Vlibpq > Vserver: since the application is
 reading/writing formats the server can't understand, an error should
 be raised if they are used in either direction
 3) Vlibpq >= Vapplication > Vserver: same as above, but libpq can
 'upconvert' a low-version wire format to the application's wire format or
 error otherwise.

 I don't see why you special-case libpq here.  There is no reason
 libpq cannot pass older/newer formats through.  The only thing that
 matters is the parser/formatter version.  If that is done in libpq,
 then the app version does not matter.  If it's done in the app, then
 the libpq version does not matter.

Only because if the app is targeting wire format N, but the server can
only handle N-1, libpq has the opportunity to fix it up.  That could
be just overthinking it, though.

 By far, the most common cause of problems (both in terms of severity
 and frequency) is case #1.  #3 allows a 'compatibility mode' via
 libpq, but that comes at a significant cost in complexity, since libpq
 needs to be able to translate wire formats up (but not down).  #2/3 is
 a less common problem though, as it's more likely the application can
 be adjusted to get up to speed; so to keep things simple we can maybe
 just error out in those scenarios.

 I don't like the idea of conversion.  Instead, either the client
 writes values through an API that picks the format based on the server
 version, or it writes them for a specific version only.  In the latter
 case it cannot work with an older server - unless the fixed version is
 the baseline.

ok.  another point about that: libpq isn't really part of the solution
anyways since there are other popular fully native protocol consumers,
including (and especially) jdbc, but also python, node.js etc etc.

that's why I was earlier insisting on a protocol bump, so that we
could in the new protocol force application version to be advertised.
v3 would remain caveat emptor for wire formats but v4 would not.

 In the database, we need to maintain outdated send/recv functions
 basically forever and as much as possible try and translate old wire
 format data to and from newer backend structures (maybe in very
 specific cases that will be impossible such that the application is
 SOL, but that should be rare).  All send/recv functions, including
 user created ones need to be stamped with a version token (database
 version?).  With the versions of the application, libpq, and all
 server functions, we can determine all wire formats as long as we
 assume the application's targeted database version represents all the
 wire formats it was using.

 My good ideas stop there: the exact mechanics of how the usable set of
 functions are determined, how exactly the adjusted type look ups will
 work, etc. would all have to be sorted out.  Most of the nastier parts
 though (protocol changes notwithstanding) are not in libpq, but the
 server.  There's just no quick fix on the client side I can see.

 It does not need to be complex - just bring the version number to
 the i/o function and let it decide whether it cares about it or not.
 Most functions will not.  Only those that we want to change in a
 compatible manner need to look at it.

well, maybe instead of passing the version number around, the server
installs the proper compatibility send/recv functions just once at
session startup, so your code isn't littered with stuff like
if (version < n) do this; else do this;?

 But seriously - on-wire compatibility is a good thing, do not fear it...

sure -- but for postgres I just don't think it's realistic, especially
for the binary wire formats.  a json based data payload could give it
to you (and I'm only half kidding) :-).

merlin



Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-25 Thread Marko Kreen
On Wed, Jan 25, 2012 at 02:50:09PM -0600, Merlin Moncure wrote:
 On Wed, Jan 25, 2012 at 2:29 PM, Marko Kreen mark...@gmail.com wrote:
  well, I see the following cases:
  1) Vserver > Vapplication: server downgrades wire formats to
  the application's version
  2) Vapplication > Vlibpq > Vserver: since the application is
  reading/writing formats the server can't understand, an error should
  be raised if they are used in either direction
  3) Vlibpq >= Vapplication > Vserver: same as above, but libpq can
  'upconvert' a low-version wire format to the application's wire format or
  error otherwise.
 
  I don't see why you special-case libpq here.  There is no reason
  libpq cannot pass older/newer formats through.  The only thing that
  matters is the parser/formatter version.  If that is done in libpq,
  then the app version does not matter.  If it's done in the app, then
  the libpq version does not matter.
 
 Only because if the app is targeting wire format N, but the server can
 only handle N-1, libpq has the opportunity to fix it up.  That could
 be just overthinking it, though.

I think it's overthinking.  The value should be formatted/parsed just
once.  The server side must support processing different versions.
Whether the client side supports downgrading is up to client-side
programmers.

If you want to write a compatible client, you have a choice of using
a proper wrapper API, or simply writing baseline formatting, ignoring
format changes in new versions.

Both are valid approaches and I think we should keep it that way.

  By far, the most common cause of problems (both in terms of severity
  and frequency) is case #1.  #3 allows a 'compatibility mode' via
  libpq, but that comes at a significant cost in complexity, since libpq
  needs to be able to translate wire formats up (but not down).  #2/3 is
  a less common problem though, as it's more likely the application can
  be adjusted to get up to speed; so to keep things simple we can maybe
  just error out in those scenarios.
 
  I don't like the idea of conversion.  Instead, either the client
  writes values through an API that picks the format based on the server
  version, or it writes them for a specific version only.  In the latter
  case it cannot work with an older server - unless the fixed version is
  the baseline.
 
 ok.  another point about that: libpq isn't really part of the solution
 anyways since there are other popular fully native protocol consumers,
 including (and especially) jdbc, but also python, node.js etc etc.
 
 that's why I was earlier insisting on a protocol bump, so that we
 could in the new protocol force application version to be advertised.
 v3 would remain caveat emptor for wire formats but v4 would not.

We can bump major/minor anyway to inform clients about new
functionality.  I don't particularly care about that.  What
I'm interested in is what the actual type negotiation looks like.

It might be possible we could get away without bumping anything.
But I have not thought about that angle too deeply yet.

  In the database, we need to maintain outdated send/recv functions
  basically forever and as much as possible try and translate old wire
  format data to and from newer backend structures (maybe in very
  specific cases that will be impossible such that the application is
  SOL, but that should be rare).  All send/recv functions, including
  user created ones need to be stamped with a version token (database
  version?).  With the versions of the application, libpq, and all
  server functions, we can determine all wire formats as long as we
  assume the application's targeted database version represents all the
  wire formats it was using.
 
  My good ideas stop there: the exact mechanics of how the usable set of
  functions are determined, how exactly the adjusted type look ups will
  work, etc. would all have to be sorted out.  Most of the nastier parts
  though (protocol changes notwithstanding) are not in libpq, but the
  server.  There's just no quick fix on the client side I can see.
 
  It does not need to be complex - just bring the version number to
  the i/o function and let it decide whether it cares about it or not.
  Most functions will not.  Only those that we want to change in a
  compatible manner need to look at it.
 
 well, maybe instead of passing the version number around, the server
 installs the proper compatibility send/recv functions just once at
 session startup, so your code isn't littered with stuff like
 if (version < n) do this; else do this;?

Seems confusing.  Note that type i/o functions are user-callable.
How should they act then?

Also note that if()s are needed only for types that want to change their
on-wire formatting.  Considering the mess an incompatible on-wire format
change can cause, it's a good price to pay.

  But seriously - on-wire compatibility is a good thing, do not fear it...
 
 sure -- but for postgres I just don't think it's realistic, especially
 for the binary wire formats.  a json based data payload could give it
 to you (and I'm only half kidding) :-)

Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-25 Thread Mikko Tiihonen

On 01/25/2012 06:40 PM, Tom Lane wrote:

Marko Kreen mark...@gmail.com writes:

On Wed, Jan 25, 2012 at 10:23:14AM -0500, Tom Lane wrote:

Huh?  How can that work?  If we decide to change the representation of
some other well known type, say numeric, how do we decide whether a
client setting that bit is expecting that change or not?



It sets that bit *and* version code - which means that it is
up-to-date with all well-known type formats in that version.


Then why bother with the bit in the format code?  If you've already done
some other negotiation to establish what datatype formats you will
accept, this doesn't seem to be adding any value.


Basically, I see 2 scenarios here:



1) Client knows the result types and can set the
text/bin/version code safely, without further restrictions.



2) There is a generic framework that does not know query contents
but can be expected to track Postgres versions closely.
Such a framework cannot say "binary" for results safely,
but *could* do it for some well-defined subset of types.


The hole in approach (2) is that it supposes that the client side knows
the specific datatypes in a query result in advance.  While this is
sometimes workable for application-level code that knows what query it's
issuing, it's really entirely untenable for a framework or library.
The only way that a framework can deal with arbitrary queries is to
introduce an extra round trip (Describe step) to see what datatypes
the query will produce so it can decide what format codes to issue
... and that will pretty much eat up any time savings you might get
from a more efficient representation.


This is pretty much what the jdbc driver already does, since it does not
have 100% coverage of even the current binary formats.  The first time you
execute a query it requests text encoding, but caches the Describe results.
The next time, it sets the binary bits on all returned columns that it knows
how to decode.


You really want to do the negotiation once, at connection setup, and
then be able to process queries without client-side prechecking of what
data types will be sent back.


I think my original minor_version patch tried to do that.  It introduced a
per-connection setting for the version.  The server GUC_REPORTed the maximum
supported minor_version but defaulted to the baseline wire format.
The jdbc client could bump minor_version to a higher supported value
(with an error if the value was larger than what the server advertised).

A way was provided for the application using the jdbc driver to
override the requested minor_version in the rare event that something
broke (rare, because the jdbc driver generally does not expose the
wire encoding to applications).

Now if pgbouncer and other pooling solutions reset minor_version
to 0, then it should work.

Scenarios where the other end is too old to know about minor_version:
Vserver > Vlibpq => client does nothing -> use baseline version
Vlibpq > Vserver => no supported_minor_version in GUC_REPORT -> use baseline

Normal 9.2+ scenarios:
Vserver > Vlibpq => libpq sets minor_version to the largest that it supports
   -> libpq-requested version used
Vlibpq > Vserver => libpq notices that the server-supported value is lower than
   its own, so it sets minor_version to the server-supported value
   -> server version used
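
The clamping rule in these scenarios boils down to a min(); a sketch
using the real PQparameterStatus() call, with the GUC name taken from
the patch discussion (everything else is invented):

  #include <stdlib.h>
  #include <libpq-fe.h>

  /* Pick the highest minor_version both sides understand. */
  static int
  negotiate_minor_version(PGconn *conn, int client_max)
  {
      const char *reported = PQparameterStatus(conn, "supported_minor_version");

      /* No report means a pre-9.2 server: stay on the baseline format. */
      if (reported == NULL)
          return 0;

      int server_max = atoi(reported);
      return (client_max < server_max) ? client_max : server_max;
  }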

For a perl driver that exposes the wire format to the application by
default, I can envision that the driver needs to add a new API that
applications must use to explicitly bump minor_version up, instead of
defaulting to the largest supported by the driver as in jdbc/libpq.

The reason why I proposed an incrementing minor_version instead of bit flags
for new encodings was that it takes less space and is easier to document and
understand, so that exposing it to applications is possible.

But how do we handle postgres extensions that change their wire format?
Maybe we do need to have oid:minor_version,oid:ver,oid_ver as the
negotiated version variable?

-Mikko



Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-24 Thread Robert Haas
On Mon, Jan 23, 2012 at 5:49 PM, Merlin Moncure mmonc...@gmail.com wrote:
 I'm not sure that you're getting anything with that user-facing
 complexity.  The only realistic case I can see for explicit control of
 wire formats chosen is to defend your application from format changes
 in the server when upgrading the server and/or libpq.  This isn't a
 "let's get better compression" problem, this is an "I upgraded my
 database and my application broke" problem.

 Fixing this problem in non-documentation fashion is going to require a
 full protocol change, period.

Our current protocol allocates a 2-byte integer for the purposes of
specifying the type of each parameter, and another 2-byte integer for
the purpose of specifying the result type... but only one bit is
really needed at present: text or binary.  If we revise the protocol
version at some point, we might want to use some of that bit space to
allow some more fine-grained negotiation of the protocol version.  So,
for example, we might define the top 5 bits as reserved (always pass
zero), the next bit as a text/binary flag, and the remaining 10 bits
as a 10-bit format version number.  When a change like this comes
along, we can bump the highest binary format version recognized by the
server, and clients who request the new version can get it.
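
In code form, that layout might look like this (purely illustrative
macros; the names are invented):

  #include <stdint.h>

  #define FORMAT_BINARY_FLAG    (1 << 10)    /* 0 = text, 1 = binary  */
  #define FORMAT_VERSION_MASK   0x03FF       /* bits 0-9: version     */
  /* bits 11-15: reserved, always sent as zero */

  static uint16_t
  format_code(int binary, unsigned version)
  {
      return (uint16_t) ((binary ? FORMAT_BINARY_FLAG : 0) |
                         (version & FORMAT_VERSION_MASK));
  }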

Alternatively, we might conclude that a 2-byte integer for each
parameter is overkill and try to cut back... but the point is there's
a bunch of unused bitspace there now.  In theory we could even do
something like this without bumping the protocol version, since the
documentation seems clear that any value other than 0 and 1 yields
undefined behavior, but in practice that seems like it might be a bit
too edgy.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-24 Thread Merlin Moncure
On Tue, Jan 24, 2012 at 8:26 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Jan 23, 2012 at 5:49 PM, Merlin Moncure mmonc...@gmail.com wrote:
 I'm not sure that you're getting anything with that user-facing
 complexity.  The only realistic case I can see for explicit control of
 wire formats chosen is to defend your application from format changes
 in the server when upgrading the server and/or libpq.  This isn't a
 "let's get better compression" problem, this is an "I upgraded my
 database and my application broke" problem.

 Fixing this problem in non-documentation fashion is going to require a
 full protocol change, period.

 Our current protocol allocates a 2-byte integer for the purposes of
 specifying the type of each parameter, and another 2-byte integer for
 the purpose of specifying the result type... but only one bit is
 really needed at present: text or binary.  If we revise the protocol
 version at some point, we might want to use some of that bit space to
 allow some more fine-grained negotiation of the protocol version.  So,
 for example, we might define the top 5 bits as reserved (always pass
 zero), the next bit as a text/binary flag, and the remaining 10 bits
 as a 10-bit format version number.  When a change like this comes
 along, we can bump the highest binary format version recognized by the
 server, and clients who request the new version can get it.

 Alternatively, we might conclude that a 2-byte integer for each
 parameter is overkill and try to cut back... but the point is there's
 a bunch of unused bitspace there now.  In theory we could even do
 something like this without bumping the protocol version, since the
 documentation seems clear that any value other than 0 and 1 yields
 undefined behavior, but in practice that seems like it might be a bit
 too edgy.

Yeah.  But again, this isn't a contract between libpq and the server,
but between the application and the server...unless you want libpq to
do format translation to something the application can understand (but
even then the application is still involved).  I'm not very
enthusiastic about encouraging libpq application authors to pass
format #defines for every single parameter and consumed datum to get
future proofing on wire formats.  So I'd vote against any format code
beyond the text/binary switch that currently exists (which, by the
way, while useful, is one of the great sins of libpq that we have to
deal with basically forever).  While wire formatting is granular down
to the type level, applications should not have to deal with that.
They should Just Work.  So who decides what format code to stuff into
the protocol?  Where are the codes defined?

I'm very much in the camp that sometime, presumably during connection
startup, the protocol accepts a non-#defined-in-libpq token (database
version?) from the application that describes to the server what wire
formats can be used and the server sends one back.  There probably has
to be some additional facilities for non-core types but let's put that
aside for the moment.  Those two tokens allow the server to pick the
highest supported wire format (text and binary!) that everybody
understands.  The server's token is useful if we're being fancy and we
want libpq to translate an older server's wire format to a newer one
for the application.  This of course means moving some of the type
system into the client, which is something we might not want to do
since among other things it puts a heavy burden on non-libpq driver
authors (but then again, they can always stay on the v3 protocol,
which can benefit from being frozen in terms of wire formats).

merlin



Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-24 Thread Robert Haas
On Tue, Jan 24, 2012 at 11:16 AM, Merlin Moncure mmonc...@gmail.com wrote:
 Our current protocol allocates a 2-byte integer for the purposes of
 specifying the type of each parameter, and another 2-byte integer for
 the purpose of specifying the result type... but only one bit is
 really needed at present: text or binary.  If we revise the protocol
 version at some point, we might want to use some of that bit space to
 allow some more fine-grained negotiation of the protocol version.  So,
 for example, we might define the top 5 bits as reserved (always pass
 zero), the next bit as a text/binary flag, and the remaining 10 bits
 as a 10-bit format version number.  When a change like this comes
 along, we can bump the highest binary format version recognized by the
 server, and clients who request the new version can get it.

 Alternatively, we might conclude that a 2-byte integer for each
 parameter is overkill and try to cut back... but the point is there's
 a bunch of unused bitspace there now.  In theory we could even do
 something like this without bumping the protocol version, since the
 documentation seems clear that any value other than 0 and 1 yields
 undefined behavior, but in practice that seems like it might be a bit
 too edgy.

 Yeah.  But again, this isn't a contract between libpq and the server,
 but between the application and the server...

I don't see how this is relevant.  The text/binary format flag is
there in both libpq and the underlying protocol.

  So I'd vote against any format code
 beyond the text/binary switch that currently exists (which, by the
 way, while useful, is one of the great sins of libpq that we have to
 deal with basically forever).  While wire formatting is granular down
 to the type level, applications should not have to deal with that.
 They should Just Work.  So who decides what format code to stuff into
 the protocol?  Where are the codes defined?

 I'm very much in the camp that sometime, presumably during connection
 startup, the protocol accepts a non-#defined-in-libpq token (database
 version?) from the application that describes to the server what wire
 formats can be used and the server sends one back.  There probably has
 to be some additional facilities for non-core types but let's put that
 aside for the moment.  Those two tokens allow the server to pick the
 highest supported wire format (text and binary!) that everybody
 understands.  The server's token is useful if we're being fancy and we
 want libpq to translate an older server's wire format to a newer one
 for the application.  This of course means moving some of the type
 system into the client, which is something we might not want to do
 since among other things it puts a heavy burden on non-libpq driver
 authors (but then again, they can always stay on the v3 protocol,
 which can benefit from being frozen in terms of wire formats).

I think it's sensible for the server to advertise a version to the
client, but I don't see how you can dismiss add-on types so blithely.
The format used to represent any given type is logically a property of
that type, and only for built-in types is that associated with the
server version.

I do wonder whether we are making a mountain out of a mole-hill here,
though.  If I properly understand the proposal on the table, which
it's possible that I don't, but if I do, the new format is
self-identifying: when the optimization is in use, it sets a bit that
previously would always have been clear.  So if we just go ahead and
change this, clients that have been updated to understand the new
format will work just fine.  The server uses the proposed optimization
only for arrays that meet certain criteria, so any properly updated
client must still be able to handle the case where that bit isn't set.
 On the flip side, clients that aren't expecting the new optimization
might break.  But that's, again, no different than what happened when
we changed the default bytea output format.  If you get bit, you
either update your client or shut off the optimization and deal with
the performance consequences of so doing.  In fact, the cases are
almost perfectly analogous, because in each case the proposal was
based on the size of the output format being larger than necessary,
and wanting to squeeze it down to a smaller size for compactness.

And more generally, does anyone really expect that we're never going
to change the output format of any type we support ever again, without
retaining infinite backward compatibility?  I didn't hear any screams
of outrage when we updated the hyphenation rules for contrib/isbn -
well, ok, there were some howls, but that was because the rules were
still incomplete and US-centric, not so much because people thought it
was unacceptable for the hyphenation rules to be different in major
release N+1 than they were in major release N.  If the IETF goes and
defines a new standard for formatting IPv6 addresses, we're likely to
eventually support it via the inet and 

Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-24 Thread Merlin Moncure
On Tue, Jan 24, 2012 at 11:55 AM, Robert Haas robertmh...@gmail.com wrote:
 I do wonder whether we are making a mountain out of a mole-hill here,
 though.  If I properly understand the proposal on the table, which
 it's possible that I don't, but if I do, the new format is
 self-identifying: when the optimization is in use, it sets a bit that
 previously would always have been clear.  So if we just go ahead and
 change this, clients that have been updated to understand the new
 format will work just fine.  The server uses the proposed optimization
 only for arrays that meet certain criteria, so any properly updated
 client must still be able to handle the case where that bit isn't set.
  On the flip side, clients that aren't expecting the new optimization
 might break.  But that's, again, no different than what happened when
 we changed the default bytea output format.  If you get bit, you
 either update your client or shut off the optimization and deal with
 the performance consequences of so doing.

Well, the bytea experience was IMNSHO a complete disaster (It was
earlier mentioned that jdbc clients were silently corrupting bytea
datums) and should be held up as an example of how *not* to do things;
it's better to avoid having to depend on the GUC or defensive
programmatic intervention to prevent further occurrences of
application failure since the former doesn't work and the latter won't
be reliably done.  Waiting for applications to break in the field only
to point affected users at the GUC is weak sauce.  It's creating a
user culture that is terrified of database upgrades, which hurts
everybody.

Database apps tend to have long lives in computer terms such that they
can greatly outlive the service life of a particular postgres dot
release or even the programmers who originally wrote the application.
I'm not too concerned about the viability of a programming department
with Robert Haas at the helm, but what about when he leaves?  What
about critical 3rd party software that is no longer maintained?

In regards to the array optimization, I think it's great -- but if you
truly want to avoid blowing up user applications, it needs to be
disabled automatically.

merlin



Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-24 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes:
 On Tue, Jan 24, 2012 at 11:55 AM, Robert Haas robertmh...@gmail.com wrote:
 I do wonder whether we are making a mountain out of a mole-hill here,
 though.  If I properly understand the proposal on the table, which
 it's possible that I don't, but if I do, the new format is
 self-identifying: when the optimization is in use, it sets a bit that
 previously would always have been clear.  So if we just go ahead and
 change this, clients that have been updated to understand the new
 format will work just fine.  The server uses the proposed optimization
 only for arrays that meet certain criteria, so any properly updated
 client must still be able to handle the case where that bit isn't set.
  On the flip side, clients that aren't expecting the new optimization
 might break.  But that's, again, no different than what happened when
 we changed the default bytea output format.  If you get bit, you
 either update your client or shut off the optimization and deal with
 the performance consequences of so doing.

 Well, the bytea experience was IMNSHO a complete disaster (It was
 earlier mentioned that jdbc clients were silently corrupting bytea
 datums) and should be held up as an example of how *not* to do things;

Yeah.  In both cases, the (proposed) new output format is
self-identifying *to clients that know what to look for*.  Unfortunately
it would only be the most anally-written pre-existing client code that
would be likely to spit up on the unexpected variations.  What's much
more likely to happen, and did happen in the bytea case, is silent data
corruption.  The lack of redundancy in binary data makes this even more
likely, and the documentation situation makes it even worse.  If we had
had a clear binary-data format spec from day one that told people that
they must check for unexpected contents of the flag field and fail, then
maybe we could get away with considering not doing so to be a
client-side bug ... but I really don't think we have much of a leg to
stand on given the poor documentation we've provided.
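
The check such an anally-written client would need is tiny -- the
problem is that almost no pre-existing code has it.  A sketch, using
the fact that the historical array format's only flag bit is the
has-null bit:

  #include <stdint.h>

  /* Flag bits this client was coded against: just has-null. */
  #define ARR_KNOWN_FLAGS 0x1

  static int check_array_flags(uint32_t flags)
  {
      if (flags & ~ARR_KNOWN_FLAGS)
          return -1;   /* unexpected variation: fail, don't misparse */
      return 0;
  }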

 In regards to the array optimization, I think it's great -- but if you
 truly want to avoid blowing up user applications, it needs to be
 disabled automatically.

Right.  We need to fix things so that this format will not be sent to
clients unless the client code has indicated ability to accept it.
A GUC is a really poor proxy for that.

regards, tom lane



Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-24 Thread Robert Haas
On Tue, Jan 24, 2012 at 8:13 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, the bytea experience was IMNSHO a complete disaster (It was
 earlier mentioned that jdbc clients were silently corrupting bytea
 datums) and should be held up as an example of how *not* to do things;

 Yeah.  In both cases, the (proposed) new output format is
 self-identifying *to clients that know what to look for*.  Unfortunately
 it would only be the most anally-written pre-existing client code that
 would be likely to spit up on the unexpected variations.  What's much
 more likely to happen, and did happen in the bytea case, is silent data
 corruption.  The lack of redundancy in binary data makes this even more
 likely, and the documentation situation makes it even worse.  If we had
 had a clear binary-data format spec from day one that told people that
 they must check for unexpected contents of the flag field and fail, then
 maybe we could get away with considering not doing so to be a
 client-side bug ... but I really don't think we have much of a leg to
 stand on given the poor documentation we've provided.

 In regards to the array optimization, I think it's great -- but if you
 truly want to avoid blowing up user applications, it needs to be
 disabled automatically.

 Right.  We need to fix things so that this format will not be sent to
 clients unless the client code has indicated ability to accept it.
 A GUC is a really poor proxy for that.

OK.  It seems clear to me at this point that there is no appetite for
this patch in its present form:

https://commitfest.postgresql.org/action/patch_view?id=715

Furthermore, while we haven't settled the question of exactly what a
good negotiation facility would look like, we seem to agree that a GUC
isn't it.  I think that means this isn't going to happen for 9.2, so
we should mark this patch Returned with Feedback and return to this
topic for 9.3.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-23 Thread Robert Haas
On Mon, Jan 23, 2012 at 9:59 AM, Marko Kreen mark...@gmail.com wrote:
 On Sun, Jan 22, 2012 at 11:47 PM, Mikko Tiihonen
 mikko.tiiho...@nitorcreations.com wrote:
 * introduced a new GUC variable array_output copying the current
  bytea_output type, with values full (old value) and
  smallfixed (new default)
 * added documentation for the new GUC variable

 If this variable changes protocol-level layout
 and is user-settable, shouldn't it be GUC_REPORT?

 Now that I think about it, same applies to bytea_output?

 You could say the problem does not appear if the client
 always accepts the server default.  But how can
 the client know the default?  If the client is required
 to do SHOW before it can talk to the server, then that
 seems to hint that those vars should be GUC_REPORT.

 Same story when clients are always expected to set
 the vars to their preferred values.  Then you get
 clients with different settings on one server.
 This breaks transaction-pooling setups (pgbouncer).
 Again, such protocol-changing tunables should be
 GUC_REPORT.

Probably so.  But I think we need not introduce quite so many new
threads on this patch.  This is, I think, at least thread #4, and
that's making the discussion hard to follow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-23 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Jan 23, 2012 at 9:59 AM, Marko Kreen mark...@gmail.com wrote:
 Now that I think about it, same applies to bytea_output?

 Probably so.  But I think we need not introduce quite so many new
 threads on this patch.  This is, I think, at least thread #4, and
 that's making the discussion hard to follow.

Well, this is independent of the proposed patch, so I think a separate
thread is okay.  The question is: shouldn't bytea_output be marked
GUC_REPORT?  I think that probably it should be, though I wonder
whether we're not too late.  Clients relying on it to be transmitted are
not going to work with existing 9.0 or 9.1 releases; so maybe changing
it to be reported going forward would just make things worse.
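
For illustration, here is what a client has to do today, against the
existing libpq API (PQparameterStatus() answers only for GUC_REPORT
variables, so for bytea_output it returns NULL and a round trip is
needed):

  #include <stdio.h>
  #include <libpq-fe.h>

  /* Learn the server's bytea_output setting. */
  static const char *bytea_output_setting(PGconn *conn)
  {
      static char buf[16];
      const char *v = PQparameterStatus(conn, "bytea_output");

      if (v != NULL)
          return v;     /* would work if the GUC were GUC_REPORT */

      /* Not reported on any current release: ask explicitly. */
      PGresult *res = PQexec(conn, "SHOW bytea_output");
      if (PQresultStatus(res) != PGRES_TUPLES_OK)
      {
          PQclear(res);
          return NULL;
      }
      snprintf(buf, sizeof(buf), "%s", PQgetvalue(res, 0, 0));
      PQclear(res);
      return buf;       /* "hex" or "escape" */
  }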

regards, tom lane



Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-23 Thread Marko Kreen
On Mon, Jan 23, 2012 at 11:20:52AM -0500, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Mon, Jan 23, 2012 at 9:59 AM, Marko Kreen mark...@gmail.com wrote:
  Now that I think about it, same applies to bytea_output?
 
  Probably so.  But I think we need not introduce quite so many new
  threads on this patch.  This is, I think, at least thread #4, and
  that's making the discussion hard to follow.
 
 Well, this is independent of the proposed patch, so I think a separate
 thread is okay.  The question is shouldn't bytea_output be marked
 GUC_REPORT?  I think that probably it should be, though I wonder
 whether we're not too late.  Clients relying on it to be transmitted are
 not going to work with existing 9.0 or 9.1 releases; so maybe changing
 it to be reported going forward would just make things worse.


Well, in a complex setup it can change under you at will,
but as clients can process the data without knowing the
server state, maybe it's not a big problem.  (Unless there
are old clients in the mix...)

Perhaps we can leave it as-is?

But this leaves the question of future policy for
data format changes in the protocol.  Note that I'm talking
about both text and binary formats here together,
although we could have different policies for them.

Also note that any kind of per-session flag is basically a GUC.


Question 1 - how does client know about which format data is?

1) new format is detectable from lossy GUC
2) new format is detectable from GUC_REPORT
3) new format is detectable from Postgres version
4) new format was requested in query (V4 proto)
5) new format is detectable from data (\x in bytea)

1. obviously does not work.
2. works, but requires changes across all infrastructure.
3. works and is simple, but painful.
4. is good, but in the future
5. is good, now (see the sketch below)
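
For bytea, the autodetection behind option 5 is a two-character test,
since the 9.0 hex format always begins with \x (sketch; a real client
would also validate the rest of the string):

  typedef enum { BYTEA_HEX, BYTEA_ESCAPE } bytea_format;

  /* Sniff which text representation the server sent. */
  static bytea_format detect_bytea_format(const char *text)
  {
      if (text[0] == '\\' && text[1] == 'x')
          return BYTEA_HEX;            /* 9.0+ hex format */
      return BYTEA_ESCAPE;             /* old octal-escape format */
  }

Newer libpq's PQunescapeBytea() does essentially this sniffing, which
is how updated clients coped with the bytea_output change.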


Question 2 - how does client request new format?

1) Postgres new version forces it.
2) GUC_REPORT + non-detectable data
3) Lossy GUC + autodetectable data
4) GUC_REPORT + autodetectable data
5) Per-request data (V4 proto)

1. is painful
2. is painful - all infra components need to know about the GUC.
3. and 4. are both ugly and non-maintainable in the long term.  The only
   difference is that with 3) the infrastructure can give slight
   guarantees that it does not change under the client.
5. seems good...


Btw, it does not seem that per-request metainfo change requires
a major version.  The client can just send an extra metainfo packet
before bind+execute, if it knows the server version is good enough.
For older servers it can simply skip the extra info.  [Oh yeah,
that requires that the data format is always autodetectable.]
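
Purely to illustrate the shape of that, with an invented message type
and field layout (a real implementation would serialize field by
field, following the v3 framing rules, rather than sending a struct):

  #include <stdint.h>

  /*
   * Hypothetical "metainfo" packet a client could send before
   * Bind/Execute when it knows the server is new enough; older
   * servers simply never see it.  Invented layout.
   */
  struct format_request
  {
      char     type;      /* invented message type byte, e.g. 'f' */
      uint32_t len;       /* message length, per v3 framing */
      uint16_t text_ver;  /* highest text format version accepted */
      uint16_t bin_ver;   /* highest binary format version accepted */
  };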

My conclusions:

1. Any change in data format should be compatible with old data.
   IOW - if the client requested the new data format, it should always
   accept the old format too.

2. Can we postpone minor data format changes on the wire until there
   is a proper way for clients to request on-the-wire formats?

-- 
marko




Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-23 Thread Tom Lane
Marko Kreen mark...@gmail.com writes:
 [ bytea_output doesn't need to be GUC_REPORT because format is autodetectable ]

Fair enough.  Anyway we're really about two years too late to revisit that.

 Btw, it does not seem that per-request metainfo change requires
 a major version.  The client can just send an extra metainfo packet
 before bind+execute, if it knows the server version is good enough.

That is nonsense.  You're changing the protocol, and then saying
that clients should consult the server version instead of the
protocol version to know what to do.

 2. Can we postpone minor data format changes on the wire until there
 is a proper way for clients to request on-the-wire formats?

I think that people are coming around to that position, ie, we need
a well-engineered solution to the versioning problem *first*, and
should not accept incompatible minor improvements until we have that.

regards, tom lane



Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-23 Thread A.M.

On Jan 23, 2012, at 2:49 PM, Tom Lane wrote:

 Marko Kreen mark...@gmail.com writes:
 [ bytea_output doesn't need to be GUC_REPORT because format is 
 autodetectable ]
 
 Fair enough.  Anyway we're really about two years too late to revisit that.
 
 Btw, it does not seem that per-request metainfo change requires
 a major version.  The client can just send an extra metainfo packet
 before bind+execute, if it knows the server version is good enough.
 
 That is nonsense.  You're changing the protocol, and then saying
 that clients should consult the server version instead of the
 protocol version to know what to do.
 
 2. Can we postpone minor data format changes on the wire until there
   is a proper way for clients to request on-the-wire formats?
 
 I think that people are coming around to that position, ie, we need
 a well-engineered solution to the versioning problem *first*, and
 should not accept incompatible minor improvements until we have that.

One simple way clients could detect the binary encoding at startup would be to 
pass known test parameters and match against the returned values. If the client 
cannot match the response, then it should choose the text representation.
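
A sketch of that probe using only existing libpq calls; the expected
bytes here are the current binary encoding of an int4 (four big-endian
bytes):

  #include <string.h>
  #include <libpq-fe.h>

  /* Returns 1 if the server's binary int4 matches what this client
   * was coded against; on 0, fall back to text format. */
  static int server_binary_int4_ok(PGconn *conn)
  {
      static const unsigned char expect[4] = {0, 0, 0, 1};
      int ok = 0;

      PGresult *res = PQexecParams(conn, "SELECT 1::int4",
                                   0, NULL, NULL, NULL, NULL,
                                   1 /* binary result format */);
      if (PQresultStatus(res) == PGRES_TUPLES_OK &&
          PQgetlength(res, 0, 0) == 4 &&
          memcmp(PQgetvalue(res, 0, 0), expect, 4) == 0)
          ok = 1;
      PQclear(res);
      return ok;
  }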

Alternatively, the 16-bit int in the Bind and RowDescription messages could be 
incremented to indicate a new format and then clients can specify the highest 
version of the binary format which they support.

Cheers,
M 


Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-23 Thread Merlin Moncure
On Mon, Jan 23, 2012 at 2:00 PM, A.M. age...@themactionfaction.com wrote:
 One simple way clients could detect the binary encoding at startup would be 
 to pass known test parameters and match against the returned values. If the 
 client cannot match the response, then it should choose the text 
 representation.

 Alternatively, the 16-bit int in the Bind and RowDescription messages could 
 be incremented to indicate a new format and then clients can specify the 
 highest version of the binary format which they support.

Prefer the version.  But why send this over and over with each bind?
Wouldn't you negotiate that when connecting? Most likely, optionally,
doing as much as you can from the server version?  Personally I'm not
really enthusiastic about a solution that adds an unavoidable penalty
to all queries.
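
Doing as much as you can from the server version needs nothing new at
all -- a sketch against the existing API, with an obviously
hypothetical cutoff:

  #include <libpq-fe.h>

  /* Highest server version whose wire formats this application was
   * coded against (hypothetical value). */
  #define CODED_AGAINST 90104   /* i.e. 9.1.4 */

  /* Decide once, at connect time, whether to request binary. */
  static int use_binary_formats(PGconn *conn)
  {
      /* A newer server might emit representations this client has
       * never seen, so stay on text format for those. */
      return PQserverVersion(conn) <= CODED_AGAINST;
  }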

Also, a small nit: this problem is not specific to binary formats.
Text formats can and do change, albeit rarely, with predictable
headaches for the client.  I see no reason to deal with text/binary
differently.  The only difference between text/binary wire formats in
my eyes is that the text formats are documented.

merlin



Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-23 Thread A.M.

On Jan 23, 2012, at 4:45 PM, Merlin Moncure wrote:

 On Mon, Jan 23, 2012 at 2:00 PM, A.M. age...@themactionfaction.com wrote:
 One simple way clients could detect the binary encoding at startup would be 
 to pass known test parameters and match against the returned values. If the 
 client cannot match the response, then it should choose the text 
 representation.
 
 Alternatively, the 16-bit int in the Bind and RowDescription messages could 
 be incremented to indicate a new format and then clients can specify the 
 highest version of the binary format which they support.
 
 Prefer the version.  But why send this over and over with each bind?
 Wouldn't you negotiate that when connecting? Most likely, optionally,
 doing as much as you can from the server version?  Personally I'm not
 really enthusiastic about a solution that adds an unavoidable penalty
 to all queries.
 
 Also, a small nit: this problem is not specific to binary formats.
 Text formats can and do change, albeit rarely, with predictable
 headaches for the client.  I see no reason to deal with text/binary
 differently.  The only difference between text/binary wire formats in
 my eyes is that the text formats are documented.
 
 merlin


In terms of backwards compatibility (to support the widest range of clients), 
wouldn't it make sense to freeze each format option? That way, an updated text 
version could also assume a new int16 format identifier. The client would 
simply pass its preferred format. This could also allow for multiple in-flight 
formats; for example, if a client anticipates a large in-bound bytea column, it 
could specify format X, which indicates the server should gzip the result 
before sending. That same format may not be preferable on a different request.
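
One way to read "freeze each format option" in code; everything past
the existing codes 0 and 1 is invented here:

  /* Format codes become a growing enumeration rather than a boolean;
   * once assigned, a code's wire representation never changes. */
  typedef enum pg_format_code
  {
      PG_FORMAT_TEXT      = 0,  /* existing */
      PG_FORMAT_BINARY_V1 = 1,  /* existing binary, frozen as-is */
      PG_FORMAT_BINARY_V2 = 2,  /* invented: e.g. compact arrays */
      PG_FORMAT_BINARY_GZ = 3   /* invented: compressed bytea */
  } pg_format_code;

  /* Client advertises the highest code it supports per column;
   * the server answers with the highest code both sides know. */
  static pg_format_code negotiate(pg_format_code client_max,
                                  pg_format_code server_max)
  {
      return client_max < server_max ? client_max : server_max;
  }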

Cheers,
M


Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements

2012-01-23 Thread Merlin Moncure
On Mon, Jan 23, 2012 at 4:12 PM, A.M. age...@themactionfaction.com wrote:
 On Jan 23, 2012, at 4:45 PM, Merlin Moncure wrote:
 Prefer the version.  But why send this over and over with each bind?
 Wouldn't you negotiate that when connecting? Most likely, optionally,
 doing as much as you can from the server version?  Personally I'm not
 really enthusiastic about a solution that adds an unavoidable penalty
 to all queries.

 In terms of backwards compatibility (to support the widest range of clients), 
 wouldn't it make sense to freeze each format option? That way, an updated 
 text version could also assume a new int16 format identifier. The client 
 would simply pass its preferred format. This could also allow for multiple 
 in-flight formats; for example, if a client anticipates a large in-bound 
 bytea column, it could specify format X, which indicates the server should 
 gzip the result before sending. That same format may not be preferable on a 
 different request.

hm.  well, I'd say that you're much better off if you can hold to the
principle that newer versions of the format are always better and
should be used if both the application and the server agree.  Using
your example, since you can already do something like:

select zlib_compress(byteacol) from foo;

I'm not sure that you're getting anything with that user-facing
complexity.  The only realistic case I can see for explicit control of
the wire formats chosen is to defend your application from format
changes when upgrading the server and/or libpq.  This isn't a
"let's get better compression" problem, this is an "I upgraded my
database and my application broke" problem.

Fixing this problem in anything other than documentation is going to
require a full protocol change, period.  It's the only way we can
safely get all the various players (libpq, jdbc, etc.) on the same
page without breaking/recompiling millions of lines of old code that
is currently in production.  The new protocol should *require*, at
minimum, the application, not libpq, to explicitly send the version
of the database it was coded against.  That's just not getting sent
now, and without that information there's no realistic way to prevent
application breakage -- depending on the libpq version is useless
since it can be upgraded, and there's always jdbc to deal with.

merlin
