Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
On tis, 2012-01-24 at 20:13 -0500, Tom Lane wrote:
> Yeah. In both cases, the (proposed) new output format is
> self-identifying *to clients that know what to look for*.
> Unfortunately it would only be the most anally-written pre-existing
> client code that would be likely to spit up on the unexpected
> variations. What's much more likely to happen, and did happen in the
> bytea case, is silent data corruption.

The problem in the bytea case is that the client libraries are written to ignore encoding errors. No amount of protocol versioning will help you in that case.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
On Tue, Jan 24, 2012 at 09:33:52PM -0500, Robert Haas wrote:
> Furthermore, while we haven't settled the question of exactly what a
> good negotiation facility would look like, we seem to agree that a GUC
> isn't it. I think that means this isn't going to happen for 9.2, so we
> should mark this patch Returned with Feedback and return to this topic
> for 9.3.

Simply extending the text/bin flags should be a quite uncontroversial first step. How to express the capability in the startup packet I leave to others to decide, but my proposal would be the following:

  bit 0     : text/bin
  bit 1..15 : format version number, maps to best formats in some
              Postgres version

It does not solve the resultset problem, where I'd like to say "gimme well-known types in optimal representation, others in text". I don't know the perfect solution for that, but I suspect the biggest danger here is the urge to go to maximal complexity immediately. So perhaps a good idea would be to simply give one additional bit (0x8000?) in the result flag to say that only well-known types should be optimized. That should cover 95% of use-cases, and we can design a more flexible packet format when we know more about actual needs.

libpq suggestion:

  PQsetformatcodes(bool) - only if it's called with TRUE does libpq
  start interpreting text/bin codes as non-bools. IOW, we stay
  compatible with old code using -1 as TRUE.

protocol suggestion:

  On startup the server sends the highest supported text/bin codes, and
  gives an error if it finds a higher code than supported.
  Poolers/proxies with different server versions in the pool will simply
  give the lowest common code out.

Small QA, to put obvious aspects into writing:

* Does that mean we need to keep old formats around indefinitely?

  Yes. On-wire formats have *much* higher visibility than on-disk
  formats. Also, except for some basic types, they are not parsed in
  adapters but in client code; libpq offers the least help in that
  respect. Basically: changing on-wire formatting is a big deal, don't
  do it willy-nilly.

* Does that mean we cannot turn on new formats automatically?

  Yes. Should be obvious..

-- marko
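The 16-bit code proposed above can be sketched concretely. This is a toy model in Python, not libpq code; all names are invented, and the exact bit layout is one possible reading of the proposal (if bit 15 carries the well-known-only flag, the version field can only use bits 1..14):

```python
# Toy model of the proposed 16-bit result-format code (hypothetical layout):
#   bit 0      - text (0) / binary (1)
#   bits 1..14 - format version number ("best formats in Postgres version X")
#   bit 15     - 0x8000: optimize only well-known types, send the rest as text

BINARY_FLAG     = 0x0001
WELL_KNOWN_ONLY = 0x8000
VERSION_SHIFT   = 1
VERSION_MASK    = 0x3FFF

def make_format_code(binary, version, well_known_only=False):
    code = (version & VERSION_MASK) << VERSION_SHIFT
    if binary:
        code |= BINARY_FLAG
    if well_known_only:
        code |= WELL_KNOWN_ONLY
    return code

def parse_format_code(code):
    return {
        "binary": bool(code & BINARY_FLAG),
        "version": (code >> VERSION_SHIFT) & VERSION_MASK,
        "well_known_only": bool(code & WELL_KNOWN_ONLY),
    }

def check_format_code(client_code, server_max_version):
    # Per the proposal, the server errors out on a code higher than it
    # supports; a pooler would simply advertise the lowest common version.
    f = parse_format_code(client_code)
    if f["version"] > server_max_version:
        raise ValueError("format version %d not supported" % f["version"])
    return f
```

For example, `make_format_code(True, 3, well_known_only=True)` would ask for binary output, version-3 formats, optimized well-known types only.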
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
Marko Kreen mark...@gmail.com writes:
> Simply extending the text/bin flags should be quite uncontroversial
> first step. How to express the capability in startup packet, I leave
> to others to decide. But my proposal would be following:
>   bit 0     : text/bin
>   bit 1..15 : format version number, maps to best formats in some
>               Postgres version.
> It does not solve the resultset problem, where I'd like to say gimme
> well-known types in optimal representation, others in text. I don't
> know the perfect solution for that, but I suspect the biggest danger
> here is the urge to go to maximal complexity immediately. So perhaps
> the good idea would simply give one additional bit (0x8000?) in result
> flag to say that only well-known types should be optimized. That
> should cover 95% of use-cases, and we can design more flexible packet
> format when we know more about actual needs.

Huh? How can that work? If we decide to change the representation of some other well known type, say numeric, how do we decide whether a client setting that bit is expecting that change or not?

			regards, tom lane
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
On Wed, Jan 25, 2012 at 10:23:14AM -0500, Tom Lane wrote:
> Huh? How can that work? If we decide to change the representation of
> some other well known type, say numeric, how do we decide whether a
> client setting that bit is expecting that change or not?

It sets that bit *and* the version code - which means that it is up-to-date with all well-known type formats in that version.

The key here is to sanely define the well-known types and document them, so clients can be up to date with them. Variants:

- All built-in and contrib types in some Postgres version
- All built-in types in some Postgres version
- Most common types (text, numeric, bytea, int, float, bool, ...)

Also, as we have only one bit, the set of types cannot be extended. (Unless we provide more bits for that, but that may get too confusing?)

Basically, I see 2 scenarios here:

1) Client knows the result types and can set the text/bin/version code safely, without further restrictions.

2) There is a generic framework that does not know query contents, but can be expected to track Postgres versions closely. Such a framework cannot say "binary" for results safely, but *could* do it for some well-defined subset of types.

Of course it may be that 2) is not worth supporting, as frameworks can throw errors on their own if they find a format that they cannot parse. Then the user needs to either register their own parser, or simply turn off optimized formats to get the plain-text values.

-- marko
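Scenario (2) amounts to a per-column fallback rule on the server side. A minimal sketch (hypothetical Python, with type names standing in for OIDs and an invented well-known list):

```python
# Hypothetical server-side format selection per result column, following
# scenario (2): a framework sets the well-known-only bit, so only types
# on the documented "well-known" list are sent in the optimized binary
# format; everything else falls back to text.

WELL_KNOWN_TYPES = {"text", "numeric", "bytea", "int4", "float8", "bool"}

def column_format(type_name, binary_requested, well_known_only):
    if not binary_requested:
        return "text"
    if well_known_only and type_name not in WELL_KNOWN_TYPES:
        return "text"   # type unknown to the framework: safe fallback
    return "binary"
```

So a framework asking for binary with the well-known-only bit set would still receive, say, an extension type like hstore as text it can pass through untouched.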
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
Marko Kreen mark...@gmail.com writes:
> It sets that bit *and* version code - which means that it is
> up-to-date with all well-known type formats in that version.

Then why bother with the bit in the format code? If you've already done some other negotiation to establish what datatype formats you will accept, this doesn't seem to be adding any value.

> Basically, I see 2 scenarios here:
> 1) Client knows the result types and can set the text/bin/version
>    code safely, without further restrictions.
> 2) There is generic framework, that does not know query contents but
>    can be expected to track Postgres versions closely. Such framework
>    cannot say binary for results safely, but *could* do it for some
>    well-defined subset of types.

The hole in approach (2) is that it supposes that the client side knows the specific datatypes in a query result in advance. While this is sometimes workable for application-level code that knows what query it's issuing, it's really entirely untenable for a framework or library. The only way that a framework can deal with arbitrary queries is to introduce an extra round trip (Describe step) to see what datatypes the query will produce, so it can decide what format codes to issue ... and that will pretty much eat up any time savings you might get from a more efficient representation. You really want to do the negotiation once, at connection setup, and then be able to process queries without client-side prechecking of what data types will be sent back.

			regards, tom lane
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
On Wed, Jan 25, 2012 at 11:40 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> The only way that a framework can deal with arbitrary queries is to
> introduce an extra round trip (Describe step) to see what datatypes
> the query will produce so it can decide what format codes to issue
> ... and that will pretty much eat up any time savings you might get
> from a more efficient representation. You really want to do the
> negotiation once, at connection setup, and then be able to process
> queries without client-side prechecking of what data types will be
> sent back.

What might work is for clients to advertise a list of capability strings, like "compact_array_format", at connection startup time. The server can then adjust its behavior based on that list. But the problem with that is that as we make changes to the wire protocol, the list of capabilities clients need to advertise could get pretty long in a hurry.

A simpler alternative is to have the client send a server version along with the initial connection attempt, and have the server do its best not to use any features that weren't present in that server version - but that seems to leave user-defined types out in the cold.

I reiterate my previous view that we don't have time to engineer a good solution to this problem right now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
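The capability-string idea above reduces to a set intersection at startup. A sketch under invented names (the capability strings and function are hypothetical, not any actual protocol field):

```python
# Hypothetical capability-string negotiation: the client advertises the
# wire-format features it understands in the startup packet, and the
# server only uses a feature if the client listed it. Anything not
# advertised is served in the old (baseline) format.

SERVER_CAPABILITIES = {"compact_array_format", "new_numeric_format"}

def negotiate_capabilities(client_advertised):
    # Effective feature set = what both sides understand; unknown client
    # strings (e.g. from a newer client) are silently ignored.
    return SERVER_CAPABILITIES & set(client_advertised)
```

This also illustrates Robert's objection: the advertised list grows by one string for every future wire-format change.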
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
On Wed, Jan 25, 2012 at 11:40:28AM -0500, Tom Lane wrote:
> Then why bother with the bit in the format code? If you've already
> done some other negotiation to establish what datatype formats you
> will accept, this doesn't seem to be adding any value.

The other negotiation is done via Postgres release notes...

I specifically want to avoid any sort of per-connection negotiation, except the max format version supported, because it will mess up multiplexed usage of a single connection. Then they need to either disable advanced formats completely, or still do it per-query somehow (via GUCs?), which is a mess.

Also I don't see any market for flexible negotiations; instead I see that people want 2 things:

- Updated formats are easily available
- Old apps do not break

I might be mistaken here - then please correct me - but currently I'm designing for simplicity.

> The hole in approach (2) is that it supposes that the client side
> knows the specific datatypes in a query result in advance. While
> this is sometimes workable for application-level code that knows what
> query it's issuing, it's really entirely untenable for a framework or
> library.

No, the list of well-known types is documented and fixed. The bit is specifically for frameworks, so that they can say "I support all well-known types in Postgres version X.Y".

Note: I said that the list cannot be extended, but that is wrong. When this bit and the version code are taken together, they clearly define the list "as in version X.Y". So considering that the client should not send any higher version than the server supports, the server always knows what list the client refers to.

-- marko
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
Marko Kreen mark...@gmail.com writes:
> The other negotiation is done via Postgres release notes...

That is really not going to work if the requirement is to not break old apps. They haven't read the release notes.

> I specifically want to avoid any sort of per-connection negotiation,
> except the max format version supported, because it will mess up
> multiplexed usage of single connection. Then they need to either
> disable advanced formats completely, or still do it per-query somehow
> (via GUCs?) which is mess.

Hmm, that adds yet another level of not-obvious-how-to-meet requirement. I tend to concur with Robert that we are not close to a solution.

> No, the list of well-known types is documented and fixed. The bit is
> specifically for frameworks, so that they can say I support all
> well-known types in Postgres version X.Y.

So in other words, if we have a client that contains a framework that knows about version N, and we connect it up to a server that speaks version N+1, it suddenly loses the ability to use any version-N optimizations? That does not meet my idea of not breaking old apps.

			regards, tom lane
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
On Wed, Jan 25, 2012 at 11:24 AM, Marko Kreen mark...@gmail.com wrote:
> I specifically want to avoid any sort of per-connection negotiation,
> except the max format version supported, because it will mess up
> multiplexed usage of single connection. Then they need to either
> disable advanced formats completely, or still do it per-query somehow
> (via GUCs?) which is mess.

Being able to explicitly pick a format version other than the one the application was specifically written against adds a lot of complexity and needs to be justified. Maybe you're trying to translate data between two differently versioned servers? I'm trying to understand the motive behind your wanting finer-grained control of picking the format version...

merlin
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
On Wed, Jan 25, 2012 at 12:58:15PM -0500, Tom Lane wrote:
> That is really not going to work if the requirement is to not break
> old apps. They haven't read the release notes.

Yes, but they also keep requesting the old formats, so everything is fine? Note that formats are under full control of the client; the server has no way to send newer formats to a client that has not requested them.

> Hmm, that adds yet another level of not-obvious-how-to-meet
> requirement. I tend to concur with Robert that we are not close to a
> solution.

Well, my simple scheme seems to work fine with such a requirement. [My scheme: the client-supplied 16-bit type code is the only thing that decides the format.]

> So in other words, if we have a client that contains a framework that
> knows about version N, and we connect it up to a server that speaks
> version N+1, it suddenly loses the ability to use any version-N
> optimizations? That does not meet my idea of not breaking old apps.

That is up to the Postgres maintainers to decide, whether they want to phase out some type from the list. But my main point was that it's OK to add types to the list. I missed that aspect in my previous mail.

-- marko
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
On Wed, Jan 25, 2012 at 12:54:00PM -0600, Merlin Moncure wrote:
> Being able to explicitly pick format version other than the one the
> application was specifically written against adds a lot of complexity
> and needs to be justified. Maybe you're trying to translate data
> between two differently versioned servers? I'm trying to understand
> the motive behind your wanting finer grained control of picking
> format version...

You mean if the client was written with version-N formats, but connects to a server with version N-1 formats?

True, simply not supporting such a case simplifies the client-side API. But note that it does not change anything on the protocol level; it's purely client-API specific. It may well be that some higher-level APIs (JDBC, Npgsql, Psycopg) may support such a downgrade, but with lower-level APIs (raw libpq), it may be optional whether the client wants to support such usage or not.

-- marko
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
On Wed, Jan 25, 2012 at 1:24 PM, Marko Kreen mark...@gmail.com wrote:
> You mean if client has written with version N formats, but connects
> to server with version N-1 formats? True, simply not supporting such
> case simplifies client-side API. But note that it does not change
> anything on protocol level, it's purely client-API specific.

well, I see the following cases:

1) Vserver > Vapplication: server downgrades wire formats to the application's version
2) Vapplication > Vlibpq > Vserver: since the application is reading/writing formats the server can't understand, an error should be raised if they are used in either direction
3) Vlibpq >= Vapplication > Vserver: same as above, but libpq can 'upconvert' low-version wire format to the application's wire format, or error otherwise.

By far, the most common cause of problems (both in terms of severity and frequency) is case #1. #3 allows a 'compatibility mode' via libpq, but that comes at a significant cost of complexity, since libpq needs to be able to translate wire formats up (but not down). #2/#3 is a less common problem though, as it's more likely the application can be adjusted to get up to speed: so to keep things simple we can maybe just error out in those scenarios.

In the database, we need to maintain outdated send/recv functions basically forever, and as much as possible try to translate old wire format data to and from newer backend structures (maybe in very specific cases that will be impossible, such that the application is SOL, but that should be rare). All send/recv functions, including user-created ones, need to be stamped with a version token (database version?). With the versions of the application, libpq, and all server functions, we can determine all wire formats, as long as we assume the application's targeted database version represents all the wire formats it was using.

My good ideas stop there: the exact mechanics of how the usable set of functions is determined, how exactly the adjusted type lookups will work, etc. would all have to be sorted out. Most of the nastier parts though (protocol changes notwithstanding) are not in libpq, but in the server. There's just no quick fix on the client side I can see.

merlin
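The version-stamped send/recv idea above could be modeled as a registry keyed by (type, version stamp), where lookup falls back to the newest stamp not exceeding the negotiated version. A sketch with entirely hypothetical names and wire formats:

```python
# Hypothetical registry of version-stamped send functions: each entry
# carries the format version that introduced it, and a client that
# negotiated version V gets the newest format not newer than V.

SEND_FUNCS = {
    ("int4[]", 1): lambda val: "old-array-wire-format",
    ("int4[]", 2): lambda val: "compact-array-wire-format",
}

def lookup_send_func(type_name, negotiated_version):
    best = None
    for (tname, stamp), func in SEND_FUNCS.items():
        if tname == type_name and stamp <= negotiated_version:
            if best is None or stamp > best[0]:
                best = (stamp, func)
    if best is None:
        raise LookupError("no send function for %s" % type_name)
    return best[1]
```

A version-1 client thus keeps getting the old array format even from a server that also carries the compact one, which is the "old formats kept around indefinitely" property discussed earlier in the thread.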
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
On Wed, Jan 25, 2012 at 01:43:03PM -0600, Merlin Moncure wrote:
> well, I see the following cases:
> 1) Vserver > Vapplication: server downgrades wire formats to
>    application's version
> 2) Vapplication > Vlibpq > Vserver: since the application is
>    reading/writing formats the server can't understand, an error
>    should be raised if they are used in either direction
> 3) Vlibpq >= Vapplication > Vserver: same as above, but libpq can
>    'upconvert' low version wire format to application's wire format
>    or error otherwise.

I don't see why you special-case libpq here. There is no reason libpq cannot pass older/newer formats through. The only thing that matters is the parser/formatter version. If that is done in libpq, then the app version does not matter. If it's done in the app, then the libpq version does not matter.

> #3 allows a 'compatibility mode' via libpq, but that comes at
> significant cost of complexity since libpq needs to be able to
> translate wire formats up (but not down).

I don't like the idea of conversion. Instead, either the client writes values through an API that picks the format based on the server version, or it writes them for a specific version only. In the latter case it cannot work with an older server - unless the fixed version is the baseline.

> In the database, we need to maintain outdated send/recv functions
> basically forever and as much as possible try and translate old wire
> format data to and from newer backend structures. All send/recv
> functions, including user created ones need to be stamped with a
> version token (database version?).

It does not need to be complex - just bring the version number to the i/o function and let it decide whether it cares about it or not. Most functions will not; only those that we want to change in a compatible manner need to look at it.

But I don't see that there is a danger of having regular changes in wire formats. So most of the functions will ignore the versioning, including the ones that don't care about compatibility.

But seriously - on-wire compatibility is a good thing, do not fear it...

-- marko
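The "bring the version number to the i/o function" approach could look like the following sketch. The types are real Postgres names, but the send functions and both wire formats shown are invented for illustration; actual binary array formats differ:

```python
# Sketch of send functions that receive the negotiated format version.
# Most types ignore the `version` argument entirely; a type whose wire
# format changed branches on it once. Formats here are invented.

def int4_send(value, version):
    # never changed: ignores the version argument
    return value.to_bytes(4, "big", signed=True)

def array_send(values, version):
    if version >= 2:
        # hypothetical compact format: bare elements, no per-element header
        return b"".join(int4_send(v, version) for v in values)
    # hypothetical baseline format: length-prefixed elements
    return b"".join(
        len(int4_send(v, version)).to_bytes(4, "big") + int4_send(v, version)
        for v in values
    )
```

This keeps the versioning cost local: only the handful of functions whose formats ever change carry a branch, which is Marko's point.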
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
On Wed, Jan 25, 2012 at 2:29 PM, Marko Kreen mark...@gmail.com wrote:
> I don't see why you special-case libpq here. There is no reason libpq
> cannot pass older/newer formats through. Only thing that matters is
> parser/formatter version. If that is done in libpq, then app version
> does not matter. If it's done in app, then libpq version does not
> matter.

Only because if the app is targeting wire format N, but the server can only handle N-1, libpq has the opportunity to fix it up. That could be just overthinking it, though.

> I don't like the idea of conversion. Instead either client writes
> values through API that picks format based on server version, or it
> writes them for specific version only. In latter case it cannot work
> with older server. Unless the fixed version is the baseline.

ok. another point about that: libpq isn't really part of the solution anyway, since there are other popular fully native protocol consumers, including (and especially) jdbc, but also python, node.js etc etc. that's why I was earlier insisting on a protocol bump, so that we could in the new protocol force the application version to be advertised. v3 would remain caveat emptor for wire formats, but v4 would not.

> It does not need to be complex - just bring the version number to i/o
> function and let it decide whether it cares about it or not. Most
> functions will not; only those that we want to change in compatible
> manner need to look at it.

well, maybe instead of passing the version number around, the server installs the proper compatibility send/recv functions just once at session startup, so your code isn't littered with stuff like "if (version > n) do this; else do that;"?

> But seriously - on-wire compatibility is good thing, do not fear
> it...

sure -- but for postgres I just don't think it's realistic, especially for the binary wire formats. a json based data payload could give it to you (and I'm only half kidding) :-).

merlin
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
On Wed, Jan 25, 2012 at 02:50:09PM -0600, Merlin Moncure wrote: On Wed, Jan 25, 2012 at 2:29 PM, Marko Kreen mark...@gmail.com wrote: well, I see the following cases: 1) Vserver > Vapplication: server downgrades wire formats to the application's version 2) Vapplication > Vlibpq > Vserver: since the application is reading/writing formats the server can't understand, an error should be raised if they are used in either direction 3) Vlibpq >= Vapplication > Vserver: same as above, but libpq can 'upconvert' low version wire format to the application's wire format or error otherwise. I don't see why you special-case libpq here. There is no reason libpq cannot pass older/newer formats through. The only thing that matters is the parser/formatter version. If that is done in libpq, then the app version does not matter. If it's done in the app, then the libpq version does not matter. Only because if the app is targeting wire format N, but the server can only handle N-1, libpq has the opportunity to fix it up. That could just be overthinking it though. I think it's overthinking. The value should be formatted/parsed just once. The server side must support processing different versions. Whether the client side supports downgrading is up to client-side programmers. If you want to write a compatible client, you have a choice of using a proper wrapper API, or simply writing baseline formatting, ignoring format changes in new versions. Both are valid approaches and I think we should keep it that way. By far, the most common cause of problems (both in terms of severity and frequency) is case #1. #3 allows a 'compatibility mode' via libpq, but that comes at significant cost of complexity since libpq needs to be able to translate wire formats up (but not down). #2/#3 is a less common problem though, as it's more likely the application can be adjusted to get up to speed: so to keep things simple we can maybe just error out in those scenarios. I don't like the idea of conversion. 
Instead, either the client writes values through an API that picks the format based on server version, or it writes them for a specific version only. In the latter case it cannot work with an older server. Unless the fixed version is the baseline. ok. another point about that: libpq isn't really part of the solution anyway, since there are other popular fully native protocol consumers, including (and especially) jdbc, but also python, node.js, etc. that's why I was earlier insisting on a protocol bump, so that we could in the new protocol force the application version to be advertised. v3 would remain caveat emptor for wire formats but v4 would not. We can bump major/minor anyway to inform clients about new functionality. I don't particularly care about that. What I'm interested in is what the actual type negotiation looks like. It might be possible we could get away without bumping anything. But I have not thought about that angle too deeply yet. In the database, we need to maintain outdated send/recv functions basically forever and as much as possible try and translate old wire format data to and from newer backend structures (maybe in very specific cases that will be impossible such that the application is SOL, but that should be rare). All send/recv functions, including user-created ones, need to be stamped with a version token (database version?). With the versions of the application, libpq, and all server functions, we can determine all wire formats as long as we assume the application's targeted database version represents all the wire formats it was using. My good ideas stop there: the exact mechanics of how the usable set of functions are determined, how exactly the adjusted type look ups will work, etc. would all have to be sorted out. Most of the nastier parts though (protocol changes notwithstanding) are not in libpq, but the server. There's just no quick fix on the client side I can see. 
It does not need to be complex - just bring the version number to the i/o function and let it decide whether it cares about it or not. Most functions will not. Only those that we want to change in a compatible manner need to look at it. well, maybe instead of passing the version number around, the server installs the proper compatibility send/recv functions just once on session start up so your code isn't littered with stuff like if (version > n) do this; else do that;? Seems confusing. Note that type i/o functions are user-callable. How should they act then? Also note that if()s are needed only for types that want to change their on-wire formatting. Considering the mess an incompatible on-wire format change can cause, it's a good price to pay. But seriously - on-wire compatibility is a good thing, do not fear it... sure -- but for postgres I just don't think it's realistic, especially for the binary wire formats. a json based data payload could give it to you (and I'm only half
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
On 01/25/2012 06:40 PM, Tom Lane wrote: Marko Kreen mark...@gmail.com writes: On Wed, Jan 25, 2012 at 10:23:14AM -0500, Tom Lane wrote: Huh? How can that work? If we decide to change the representation of some other well known type, say numeric, how do we decide whether a client setting that bit is expecting that change or not? It sets that bit *and* the version code - which means that it is up-to-date with all well-known type formats in that version. Then why bother with the bit in the format code? If you've already done some other negotiation to establish what datatype formats you will accept, this doesn't seem to be adding any value. Basically, I see 2 scenarios here: 1) Client knows the result types and can set the text/bin/version code safely, without further restrictions. 2) There is a generic framework that does not know query contents but can be expected to track Postgres versions closely. Such a framework cannot say binary for results safely, but *could* do it for some well-defined subset of types. The hole in approach (2) is that it supposes that the client side knows the specific datatypes in a query result in advance. While this is sometimes workable for application-level code that knows what query it's issuing, it's really entirely untenable for a framework or library. The only way that a framework can deal with arbitrary queries is to introduce an extra round trip (Describe step) to see what datatypes the query will produce so it can decide what format codes to issue ... and that will pretty much eat up any time savings you might get from a more efficient representation. This is pretty much what the jdbc driver already does, since it does not have 100% coverage of even current binary formats. The first time you execute a query it requests text encoding, but caches the Describe results. Next time it sets the binary bits on all return columns that it knows how to decode. 
You really want to do the negotiation once, at connection setup, and then be able to process queries without client-side prechecking of what data types will be sent back. I think my original minor_version patch tried to do that. It introduced a per-connection setting for version. The server GUC_REPORTed the maximum supported minor_version but defaulted to the baseline wire format. The jdbc client could bump the minor_version to a supported higher value (error if the value is larger than what the server advertised). A way was provided for the application using the jdbc driver to override the requested minor_version in the rare event that something broke (rare, because the jdbc driver generally does not expose the wire-encoding to applications). Now if pgbouncer and other pooling solutions would reset the minor_version to 0 then it should work. Scenarios where the other end is too old to know about the minor_version: Vserver > Vlibpq => client does nothing - use baseline version; Vlibpq > Vserver => no supported_minor_version in GUC_REPORT - use baseline. Normal 9.2+ scenarios: Vserver > Vlibpq => libpq sets minor_version to the largest that it supports - libpq requested version used; Vlibpq > Vserver => libpq notices that the server supported value is lower than its own, so it sets minor_version to the server supported value - server version used. For a perl driver that exposes the wire format to the application by default, I can envision that the driver needs to add a new API that applications need to use to explicitly bump the minor_version up, instead of defaulting to the largest supported by the driver as in jdbc/libpq. The reason why I proposed an incrementing minor_version instead of bit flags of new encodings was that it takes less space and is easier to document and understand, so that exposing it to applications is possible. But how to handle postgres extensions that change their wire-format? Maybe we do need to have oid:minor_version,oid:ver,oid_ver as the negotiated version variable? 
-Mikko
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
On Mon, Jan 23, 2012 at 5:49 PM, Merlin Moncure mmonc...@gmail.com wrote: I'm not sure that you're getting anything with that user-facing complexity. The only realistic case I can see for explicit control of wire formats chosen is to defend your application from format changes in the server when upgrading the server and/or libpq. This isn't a 'let's get better compression' problem, this is an 'I upgraded my database and my application broke' problem. Fixing this problem in non-documentation fashion is going to require a full protocol change, period. Our current protocol allocates a 2-byte integer for the purposes of specifying the type of each parameter, and another 2-byte integer for the purpose of specifying the result type... but only one bit is really needed at present: text or binary. If we revise the protocol version at some point, we might want to use some of that bit space to allow some more fine-grained negotiation of the protocol version. So, for example, we might define the top 5 bits as reserved (always pass zero), the next bit as a text/binary flag, and the remaining 10 bits as a 10-bit format version number. When a change like this comes along, we can bump the highest binary format version recognized by the server, and clients who request the new version can get it. Alternatively, we might conclude that a 2-byte integer for each parameter is overkill and try to cut back... but the point is there's a bunch of unused bitspace there now. In theory we could even do something like this without bumping the protocol version, since the documentation seems clear that any value other than 0 and 1 yields undefined behavior, but in practice that seems like it might be a bit too edgy. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
On Tue, Jan 24, 2012 at 8:26 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 23, 2012 at 5:49 PM, Merlin Moncure mmonc...@gmail.com wrote: I'm not sure that you're getting anything with that user-facing complexity. The only realistic case I can see for explicit control of wire formats chosen is to defend your application from format changes in the server when upgrading the server and/or libpq. This isn't a 'let's get better compression' problem, this is an 'I upgraded my database and my application broke' problem. Fixing this problem in non-documentation fashion is going to require a full protocol change, period. Our current protocol allocates a 2-byte integer for the purposes of specifying the type of each parameter, and another 2-byte integer for the purpose of specifying the result type... but only one bit is really needed at present: text or binary. If we revise the protocol version at some point, we might want to use some of that bit space to allow some more fine-grained negotiation of the protocol version. So, for example, we might define the top 5 bits as reserved (always pass zero), the next bit as a text/binary flag, and the remaining 10 bits as a 10-bit format version number. When a change like this comes along, we can bump the highest binary format version recognized by the server, and clients who request the new version can get it. Alternatively, we might conclude that a 2-byte integer for each parameter is overkill and try to cut back... but the point is there's a bunch of unused bitspace there now. In theory we could even do something like this without bumping the protocol version, since the documentation seems clear that any value other than 0 and 1 yields undefined behavior, but in practice that seems like it might be a bit too edgy. Yeah. 
But again, this isn't a contract between libpq and the server, but between the application and the server...unless you want libpq to do format translation to something the application can understand (but even then the application is still involved). I'm not very enthusiastic about encouraging libpq application authors to pass format #defines for every single parameter and consumed datum to get future-proofing on wire formats. So I'd vote against any format code beyond the text/binary switch that currently exists (which, by the way, while useful, is one of the great sins of libpq that we have to deal with basically forever). While wire formatting is granular down to the type level, applications should not have to deal with that. They should Just Work. So who decides what format code to stuff into the protocol? Where are the codes defined? I'm very much in the camp that sometime, presumably during connection startup, the protocol accepts a non-#defined-in-libpq token (database version?) from the application that describes to the server what wire formats can be used, and the server sends one back. There probably have to be some additional facilities for non-core types, but let's put that aside for the moment. Those two tokens allow the server to pick the highest supported wire format (text and binary!) that everybody understands. The server's token is useful if we're being fancy and we want libpq to translate an older server's wire format to a newer one for the application. This of course means moving some of the type system into the client, which is something we might not want to do since among other things it puts a heavy burden on non-libpq driver authors (but then again, they can always stay on the v3 protocol, which can benefit from being frozen in terms of wire formats). merlin
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
On Tue, Jan 24, 2012 at 11:16 AM, Merlin Moncure mmonc...@gmail.com wrote: Our current protocol allocates a 2-byte integer for the purposes of specifying the type of each parameter, and another 2-byte integer for the purpose of specifying the result type... but only one bit is really needed at present: text or binary. If we revise the protocol version at some point, we might want to use some of that bit space to allow some more fine-grained negotiation of the protocol version. So, for example, we might define the top 5 bits as reserved (always pass zero), the next bit as a text/binary flag, and the remaining 10 bits as a 10-bit format version number. When a change like this comes along, we can bump the highest binary format version recognized by the server, and clients who request the new version can get it. Alternatively, we might conclude that a 2-byte integer for each parameter is overkill and try to cut back... but the point is there's a bunch of unused bitspace there now. In theory we could even do something like this without bumping the protocol version, since the documentation seems clear that any value other than 0 and 1 yields undefined behavior, but in practice that seems like it might be a bit too edgy. Yeah. But again, this isn't a contract between libpq and the server, but between the application and the server... I don't see how this is relevant. The text/binary format flag is there in both libpq and the underlying protocol. So I'd vote against any format code beyond the text/binary switch that currently exists (which, by the way, while useful, is one of the great sins of libpq that we have to deal with basically forever). While wire formatting is granular down to the type level, applications should not have to deal with that. They should Just Work. So who decides what format code to stuff into the protocol? Where are the codes defined? 
I'm very much in the camp that sometime, presumably during connection startup, the protocol accepts a non-#defined-in-libpq token (database version?) from the application that describes to the server what wire formats can be used and the server sends one back. There probably has to be some additional facilities for non-core types but let's put that aside for the moment. Those two tokens allow the server to pick the highest supported wire format (text and binary!) that everybody understands. The server's token is useful if we're being fancy and we want libpq to translate an older server's wire format to a newer one for the application. This of course means moving some of the type system into the client, which is something we might not want to do since among other things it puts a heavy burden on non-libpq driver authors (but then again, they can always stay on the v3 protocol, which can benefit from being frozen in terms of wire formats). I think it's sensible for the server to advertise a version to the client, but I don't see how you can dismiss add-on types so blithely. The format used to represent any given type is logically a property of that type, and only for built-in types is that associated with the server version. I do wonder whether we are making a mountain out of a mole-hill here, though. If I properly understand the proposal on the table, which it's possible that I don't, but if I do, the new format is self-identifying: when the optimization is in use, it sets a bit that previously would always have been clear. So if we just go ahead and change this, clients that have been updated to understand the new format will work just fine. The server uses the proposed optimization only for arrays that meet certain criteria, so any properly updated client must still be able to handle the case where that bit isn't set. On the flip side, clients that aren't expecting the new optimization might break. 
But that's, again, no different than what happened when we changed the default bytea output format. If you get bit, you either update your client or shut off the optimization and deal with the performance consequences of so doing. In fact, the cases are almost perfectly analogous, because in each case the proposal was based on the size of the output format being larger than necessary, and wanting to squeeze it down to a smaller size for compactness. And more generally, does anyone really expect that we're never going to change the output format of any type we support ever again, without retaining infinite backward compatibility? I didn't hear any screams of outrage when we updated the hyphenation rules for contrib/isbn - well, ok, there were some howls, but that was because the rules were still incomplete and US-centric, not so much because people thought it was unacceptable for the hyphenation rules to be different in major release N+1 than they were in major release N. If the IETF goes and defines a new standard for formatting IPv6 addresses, we're likely to eventually support it via the inet and
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
On Tue, Jan 24, 2012 at 11:55 AM, Robert Haas robertmh...@gmail.com wrote: I do wonder whether we are making a mountain out of a mole-hill here, though. If I properly understand the proposal on the table, which it's possible that I don't, but if I do, the new format is self-identifying: when the optimization is in use, it sets a bit that previously would always have been clear. So if we just go ahead and change this, clients that have been updated to understand the new format will work just fine. The server uses the proposed optimization only for arrays that meet certain criteria, so any properly updated client must still be able to handle the case where that bit isn't set. On the flip side, clients that aren't expecting the new optimization might break. But that's, again, no different than what happened when we changed the default bytea output format. If you get bit, you either update your client or shut off the optimization and deal with the performance consequences of so doing. Well, the bytea experience was IMNSHO a complete disaster (It was earlier mentioned that jdbc clients were silently corrupting bytea datums) and should be held up as an example of how *not* to do things; it's better to avoid having to depend on the GUC or defensive programmatic intervention to prevent further occurrences of application failure since the former doesn't work and the latter won't be reliably done. Waiting for applications to break in the field only to point affected users at the GUC is weak sauce. It's creating a user culture that is terrified of database upgrades which hurts everybody. Database apps tend to have long lives in computer terms such that they can greatly outlive the service life of a particular postgres dot release or even the programmers who originally wrote the application. I'm not too concerned about the viability of a programming department with Robert Haas at the helm, but what about when he leaves? 
What about critical 3rd party software that is no longer maintained? In regards to the array optimization, I think it's great -- but if you truly want to avoid blowing up user applications, it needs to be disabled automatically. merlin
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
Merlin Moncure mmonc...@gmail.com writes: On Tue, Jan 24, 2012 at 11:55 AM, Robert Haas robertmh...@gmail.com wrote: I do wonder whether we are making a mountain out of a mole-hill here, though. If I properly understand the proposal on the table, which it's possible that I don't, but if I do, the new format is self-identifying: when the optimization is in use, it sets a bit that previously would always have been clear. So if we just go ahead and change this, clients that have been updated to understand the new format will work just fine. The server uses the proposed optimization only for arrays that meet certain criteria, so any properly updated client must still be able to handle the case where that bit isn't set. On the flip side, clients that aren't expecting the new optimization might break. But that's, again, no different than what happened when we changed the default bytea output format. If you get bit, you either update your client or shut off the optimization and deal with the performance consequences of so doing. Well, the bytea experience was IMNSHO a complete disaster (It was earlier mentioned that jdbc clients were silently corrupting bytea datums) and should be held up as an example of how *not* to do things; Yeah. In both cases, the (proposed) new output format is self-identifying *to clients that know what to look for*. Unfortunately it would only be the most anally-written pre-existing client code that would be likely to spit up on the unexpected variations. What's much more likely to happen, and did happen in the bytea case, is silent data corruption. The lack of redundancy in binary data makes this even more likely, and the documentation situation makes it even worse. If we had had a clear binary-data format spec from day one that told people that they must check for unexpected contents of the flag field and fail, then maybe we could get away with considering not doing so to be a client-side bug ... 
but I really don't think we have much of a leg to stand on given the poor documentation we've provided. In regards to the array optimization, I think it's great -- but if you truly want to avoid blowing up user applications, it needs to be disabled automatically. Right. We need to fix things so that this format will not be sent to clients unless the client code has indicated ability to accept it. A GUC is a really poor proxy for that. regards, tom lane
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
On Tue, Jan 24, 2012 at 8:13 PM, Tom Lane t...@sss.pgh.pa.us wrote: Well, the bytea experience was IMNSHO a complete disaster (It was earlier mentioned that jdbc clients were silently corrupting bytea datums) and should be held up as an example of how *not* to do things; Yeah. In both cases, the (proposed) new output format is self-identifying *to clients that know what to look for*. Unfortunately it would only be the most anally-written pre-existing client code that would be likely to spit up on the unexpected variations. What's much more likely to happen, and did happen in the bytea case, is silent data corruption. The lack of redundancy in binary data makes this even more likely, and the documentation situation makes it even worse. If we had had a clear binary-data format spec from day one that told people that they must check for unexpected contents of the flag field and fail, then maybe we could get away with considering not doing so to be a client-side bug ... but I really don't think we have much of a leg to stand on given the poor documentation we've provided. In regards to the array optimization, I think it's great -- but if you truly want to avoid blowing up user applications, it needs to be disabled automatically. Right. We need to fix things so that this format will not be sent to clients unless the client code has indicated ability to accept it. A GUC is a really poor proxy for that. OK. It seems clear to me at this point that there is no appetite for this patch in its present form: https://commitfest.postgresql.org/action/patch_view?id=715 Furthermore, while we haven't settled the question of exactly what a good negotiation facility would look like, we seem to agree that a GUC isn't it. I think that means this isn't going to happen for 9.2, so we should mark this patch Returned with Feedback and return to this topic for 9.3. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
On Mon, Jan 23, 2012 at 9:59 AM, Marko Kreen mark...@gmail.com wrote: On Sun, Jan 22, 2012 at 11:47 PM, Mikko Tiihonen mikko.tiiho...@nitorcreations.com wrote: * introduced a new GUC variable array_output copying the current bytea_output type, with values full (old value) and smallfixed (new default) * added documentation for the new GUC variable If this variable changes protocol-level layout and is user-settable, shouldn't it be GUC_REPORT? Now that I think about it, the same applies to bytea_output? You could say the problem does not appear if the clients always accept the server default. But how can the client know the default? If the client is required to do SHOW before it can talk to the server, then that seems to hint those vars should be GUC_REPORT. Same story when clients are always expected to set the vars to their preferred values. Then you get clients with different settings on one server. This breaks transaction-pooling setups (pgbouncer). Again, such protocol-changing tunables should be GUC_REPORT. Probably so. But I think we need not introduce quite so many new threads on this patch. This is, I think, at least thread #4, and that's making the discussion hard to follow. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
Robert Haas robertmh...@gmail.com writes: On Mon, Jan 23, 2012 at 9:59 AM, Marko Kreen mark...@gmail.com wrote: Now that I think about it, same applies to bytea_output? Probably so. But I think we need not introduce quite so many new threads on this patch. This is, I think, at least thread #4, and that's making the discussion hard to follow. Well, this is independent of the proposed patch, so I think a separate thread is okay. The question is shouldn't bytea_output be marked GUC_REPORT? I think that probably it should be, though I wonder whether we're not too late. Clients relying on it to be transmitted are not going to work with existing 9.0 or 9.1 releases; so maybe changing it to be reported going forward would just make things worse. regards, tom lane
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
On Mon, Jan 23, 2012 at 11:20:52AM -0500, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Jan 23, 2012 at 9:59 AM, Marko Kreen mark...@gmail.com wrote: Now that I think about it, same applies to bytea_output? Probably so. But I think we need not introduce quite so many new threads on this patch. This is, I think, at least thread #4, and that's making the discussion hard to follow. Well, this is independent of the proposed patch, so I think a separate thread is okay. The question is shouldn't bytea_output be marked GUC_REPORT? I think that probably it should be, though I wonder whether we're not too late. Clients relying on it to be transmitted are not going to work with existing 9.0 or 9.1 releases; so maybe changing it to be reported going forward would just make things worse. Well, in a complex setup it can change under you at will, but as clients can process the data without knowing the server state, maybe it's not a big problem. (Unless there are old clients in the mix...) Perhaps we can leave it as-is? But this leaves the question of future policy for data format changes in the protocol. Note I'm talking about both text and binary formats here together, although we could have different policies for them. Also note that any kind of per-session flag is basically a GUC. Question 1 - how does the client know which format the data is in? 1) new format is detectable from lossy GUC 2) new format is detectable from GUC_REPORT 3) new format is detectable from Postgres version 4) new format was requested in query (V4 proto) 5) new format is detectable from data (\x in bytea) 1. obviously does not work. 2. works, but requires changes across all infrastructure. 3. works and is simple, but painful. 4. is good, but in the future. 5. is good, now. Question 2 - how does the client request the new format? 1) Postgres new version forces it. 2) GUC_REPORT + non-detectable data 3) Lossy GUC + autodetectable data 4) GUC_REPORT + autodetectable data 5) Per-request data (V4 proto) 1.
is painful 2. is painful - all infra components need to know about the GUC. 34. are both ugly and non-maintanable in long term. Only difference is that with 3) the infrastructure can give slight guarantees that it does not change under client. 4. seems good... Btw, it does not seems that per-request metainfo change requires major version. It just client can send extra metainfo packet before bind+execute, if it knows server version is good enough. For older servers it can simply skip the extra info. [Oh yeah, that requires data format is autodetectable, always.] My conclusions: 1. Any change in data format should be compatible with old data. IOW - if client requested new data format, it should always accept old format too. 2. Can we postpone minor data format changes on the wire until there is proper way for clients to request on-the-wire formats? -- marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
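Marko's option 5 works for bytea because the text output is self-identifying: hex output (the 9.0+ default) always begins with the two characters \x, which cannot start valid escape-format output. A minimal client-side sketch of that autodetection (illustrative Python, not libpq code):

```python
def decode_bytea_text(s: str) -> bytes:
    """Decode bytea text output, autodetecting the format from the data.

    Hex output always starts with \\x; anything else is escape format.
    """
    if s.startswith('\\x'):
        return bytes.fromhex(s[2:])
    out = bytearray()
    i = 0
    while i < len(s):
        if s[i] != '\\':                  # plain printable character
            out.append(ord(s[i]))
            i += 1
        elif s[i + 1] == '\\':            # doubled backslash
            out.append(ord('\\'))
            i += 2
        else:                             # \NNN octal escape
            out.append(int(s[i + 1:i + 4], 8))
            i += 4
    return bytes(out)
```

Because the decoder keys off the data itself, it works regardless of the server's bytea_output setting, with no GUC_REPORT needed.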
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
Marko Kreen <mark...@gmail.com> writes:
> [ bytea_output doesn't need to be GUC_REPORT because the format is
> autodetectable ]

Fair enough. Anyway, we're really about two years too late to revisit that.

> Btw, it does not seem that a per-request metainfo change requires a new
> major protocol version. The client can simply send an extra metainfo
> packet before bind+execute, if it knows the server version is good
> enough.

That is nonsense. You're changing the protocol, and then saying that clients should consult the server version instead of the protocol version to know what to do.

> 2. Can we postpone minor data format changes on the wire until there is
> a proper way for clients to request on-the-wire formats?

I think that people are coming around to that position, ie, we need a well-engineered solution to the versioning problem *first*, and should not accept incompatible minor improvements until we have that.

regards, tom lane
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
On Jan 23, 2012, at 2:49 PM, Tom Lane wrote:
> [...]
> I think that people are coming around to that position, ie, we need a
> well-engineered solution to the versioning problem *first*, and should
> not accept incompatible minor improvements until we have that.

One simple way clients could detect the binary encoding at startup would be to pass known test parameters and match against the returned values. If the client cannot match the response, then it should fall back to the text representation.

Alternatively, the 16-bit int in the Bind and RowDescription messages could be incremented to indicate a new format, and then clients could specify the highest version of the binary format which they support.

Cheers, M
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
On Mon, Jan 23, 2012 at 2:00 PM, A.M. <age...@themactionfaction.com> wrote:
> One simple way clients could detect the binary encoding at startup
> would be to pass known test parameters and match against the returned
> values. If the client cannot match the response, then it should choose
> the text representation.
>
> Alternatively, the 16-bit int in the Bind and RowDescription messages
> could be incremented to indicate a new format and then clients can
> specify the highest version of the binary format which they support.

Prefer the version. But why send this over and over with each bind? Wouldn't you negotiate that when connecting, most likely optionally, doing as much as you can from the server version? Personally, I'm not really enthusiastic about a solution that adds an unavoidable penalty to all queries.

Also, a small nit: this problem is not specific to binary formats. Text formats can and do change, albeit rarely, with predictable headaches for the client. I see no reason to deal with text and binary differently. The only difference between the text and binary wire formats, in my eyes, is that the text formats are documented.

merlin
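Merlin's connect-time alternative amounts to picking a wire-format revision once, from the reported server_version, instead of per Bind. A hypothetical sketch (the version-to-format table and the revision numbers are invented for illustration; no such negotiation exists):

```python
# What this client understands; capped against what the server can emit.
CLIENT_MAX_FORMAT = 2

# Invented table: which wire-format revision each server release introduced.
SERVER_FORMAT_BY_VERSION = [
    ((9, 2), 2),   # e.g. the proposed array-format change
    ((9, 0), 1),   # e.g. bytea hex output
    ((0, 0), 0),
]

def negotiate_format(server_version: str) -> int:
    """Pick the newest wire-format revision both sides support,
    using only the server_version string reported at connect time."""
    major, minor = (int(x) for x in server_version.split('.')[:2])
    for min_ver, fmt in SERVER_FORMAT_BY_VERSION:
        if (major, minor) >= min_ver:
            return min(fmt, CLIENT_MAX_FORMAT)
    return 0
```

The one-time cost at connect is the appeal here; the weakness, as Tom notes upthread, is that it keys off the server version rather than the protocol version.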
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
On Jan 23, 2012, at 4:45 PM, Merlin Moncure wrote:
> Prefer the version. But why send this over and over with each bind?
> Wouldn't you negotiate that when connecting, most likely optionally,
> doing as much as you can from the server version? Personally, I'm not
> really enthusiastic about a solution that adds an unavoidable penalty
> to all queries.

In terms of backwards compatibility (to support the widest range of clients), wouldn't it make sense to freeze each format option? That way, an updated text format could also take a new int16 format identifier, and the client would simply pass its preferred format.

This could also allow for multiple in-flight formats; for example, if a client anticipates a large inbound bytea column, it could specify format X, which indicates that the server should gzip the result before sending. That same format may not be preferable on a different request.

Cheers, M
Re: GUC_REPORT for protocol tunables was: Re: [HACKERS] Optimize binary serialization format of arrays with fixed size elements
On Mon, Jan 23, 2012 at 4:12 PM, A.M. <age...@themactionfaction.com> wrote:
> In terms of backwards compatibility (to support the widest range of
> clients), wouldn't it make sense to freeze each format option? That
> way, an updated text version could also assume a new int16 format
> identifier. The client would simply pass its preferred format.
>
> This could also allow for multiple in-flight formats; for example, if
> a client anticipates a large in-bound bytea column, it could specify
> format X which indicates the server should gzip the result before
> sending. That same format may not be preferable on a different request.

hm. Well, I'd say that you're much better off if you can hold to the principle that newer versions of the format are always better, and should be used whenever both the application and the server agree. Using your example, since you can already do something like

select zlib_compress(byteacol) from foo;

I'm not sure that you're getting anything for that user-facing complexity.

The only realistic case I can see for explicit control of the wire formats is to defend your application from format changes in the server when upgrading the server and/or libpq. This isn't a "let's get better compression" problem; this is an "I upgraded my database and my application broke" problem. Fixing this problem in a non-documentation fashion is going to require a full protocol change, period. It's the only way we can safely get all the various players (libpq, JDBC, etc.) on the same page without breaking/recompiling millions of lines of old code that is currently in production.

The new protocol should *require* at minimum the application, not libpq, to explicitly send the version of the database it was coded against. That's just not getting sent now, and without that information there's no realistic way to prevent application breakage; depending on libpq versions is useless since it can be upgraded, and there's always JDBC to deal with.

merlin
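Merlin's requirement could ride on the existing startup packet, which already carries arbitrary key/value parameters. A sketch of a v3 StartupMessage carrying a hypothetical parameter naming the server version the application was coded against (the parameter name is invented; no such option exists in any released protocol):

```python
import struct

def startup_packet(params: dict[str, str]) -> bytes:
    """Build a v3 StartupMessage: int32 length, int32 protocol version
    (196608 = 3.0), NUL-terminated key/value pairs, then a final NUL."""
    body = struct.pack('!i', 196608)
    for k, v in params.items():
        body += k.encode() + b'\x00' + v.encode() + b'\x00'
    body += b'\x00'
    return struct.pack('!i', len(body) + 4) + body

# The application (not libpq) declares the server version it targets,
# so the server could keep emitting formats that application understands.
pkt = startup_packet({
    'user': 'app',
    'database': 'app',
    'format_target_version': '9.1',   # invented parameter name
})
```

The point of putting it here is that the startup packet is the one message every client stack already sends before any data flows, so a pooler or proxy in the middle sees it too.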