Re: Stability of queryid in minor versions
On Sat, Apr 20, 2024 at 01:56:48PM +1200, David Rowley wrote: > Thanks for the review. I've now pushed this, backpatching to 12. You've split that into two separate paragraphs with 2d3389c28c5c. Thanks for the commit. -- Michael signature.asc Description: PGP signature
Re: Stability of queryid in minor versions
On Tue, 16 Apr 2024 at 15:16, Michael Paquier wrote: > > On Tue, Apr 16, 2024 at 02:04:22PM +1200, David Rowley wrote: > > It makes sense to talk about the hashing variations closer to the > > object identifier part. > > Mostly what I had in mind. A separate for the new part you are > adding at the end of the first part feels a bit more natural split > here. Feel free to discard my comment if you think that's not worth > it. Thanks for the review. I've now pushed this, backpatching to 12. David
Re: Stability of queryid in minor versions
On Tue, Apr 16, 2024 at 02:04:22PM +1200, David Rowley wrote: > It makes sense to talk about the hashing variations closer to the > object identifier part. Mostly what I had in mind. A separate for the new part you are adding at the end of the first part feels a bit more natural split here. Feel free to discard my comment if you think that's not worth it. -- Michael signature.asc Description: PGP signature
Re: Stability of queryid in minor versions
On Tue, 16 Apr 2024 at 12:10, Michael Paquier wrote: > Not sure that this is an improvement in clarity. There are a few > bullet points that treat about the instability of the query ID, and > your patch is now mixing the query ID being different for two > mostly-identical queries on the same host with larger conditions like > the environment involved. Perhaps it would be better to move the last > sentence of the first ("Furthermore, it is not safe..") with > the part you are adding about replication in this paragraph. Yeah, I think this is better. I think the attached is what you mean. It makes sense to talk about the hashing variations closer to the object identifier part. David diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml index 44dd4db7ce..38c4ce3241 100644 --- a/doc/src/sgml/pgstatstatements.sgml +++ b/doc/src/sgml/pgstatstatements.sgml @@ -636,18 +636,21 @@ machine architecture and other facets of the platform. Furthermore, it is not safe to assume that queryid will be stable across major versions of PostgreSQL. + Two servers participating in replication based on physical WAL replay can + be expected to have identical queryid values for + the same query. However, logical replication schemes do not promise to + keep replicas identical in all relevant details, so + queryid will not be a useful identifier for + accumulating costs across a set of logical replicas. + If in doubt, direct testing is recommended. - As a rule of thumb, queryid values can be assumed to be - stable and comparable only so long as the underlying server version and - catalog metadata details stay exactly the same. Two servers - participating in replication based on physical WAL replay can be expected - to have identical queryid values for the same query. - However, logical replication schemes do not promise to keep replicas - identical in all relevant details, so queryid will - not be a useful identifier for accumulating costs across a set of logical - replicas. If in doubt, direct testing is recommended. + Generally, it can be assumed that queryid values + are stable between minor version releases of PostgreSQL, + providing that instances are running on the same machine architecture and + the catalog metadata details match. Compatibility will only be broken + between minor versions as a last resort.
Re: Stability of queryid in minor versions
On Mon, Apr 15, 2024 at 02:54:52PM +1200, David Rowley wrote: > pg_stat_statements will consider two > apparently-identical > queries to be distinct, if they reference a table that was dropped > and recreated between the executions of the two queries. > + Two servers participating in replication based on physical WAL replay can > + be expected to have identical queryid values > for > + the same query. However, logical replication schemes do not promise to > + keep replicas identical in all relevant details, so > + queryid will not be a useful identifier for > + accumulating costs across a set of logical replicas. > The hashing process is also sensitive to differences in > machine architecture and other facets of the platform. > Furthermore, it is not safe to assume that > queryid > will be stable across major versions of > PostgreSQL. > + If in doubt, direct testing is recommended. > Not sure that this is an improvement in clarity. There are a few bullet points that treat about the instability of the query ID, and your patch is now mixing the query ID being different for two mostly-identical queries on the same host with larger conditions like the environment involved. Perhaps it would be better to move the last sentence of the first ("Furthermore, it is not safe..") with the part you are adding about replication in this paragraph. > > - As a rule of thumb, queryid values can be > assumed to be > - stable and comparable only so long as the underlying server version and > - catalog metadata details stay exactly the same. Two servers > - participating in replication based on physical WAL replay can be expected > - to have identical queryid values for the same > query. > - However, logical replication schemes do not promise to keep replicas > - identical in all relevant details, so queryid > will > - not be a useful identifier for accumulating costs across a set of logical > - replicas. If in doubt, direct testing is recommended. > + Generally, it can be assumed that queryid > values > + are stable between minor version releases of > PostgreSQL, > + providing that instances are running on the same machine architecture and > + the catalog metadata details match. Compatibility will only be broken > + between minor versions as a last resort. This split is cleaner. -- Michael signature.asc Description: PGP signature
Re: Stability of queryid in minor versions
On Mon, 15 Apr 2024 at 14:09, Michael Paquier wrote: > > On Mon, Apr 15, 2024 at 01:31:47PM +1200, David Rowley wrote: > > I'm unsure if "Rule of thumb" is the correct way to convey that. We > > can't really write "We endeavour to", as who is "We". Maybe something > > like "Generally, it can be assumed that queryid is stable between all > > minor versions of a major version of ..., providing that > reasons>". > > It sounds to me that the term "best-effort" is adapted here? Like in > "The compatibility of query IDs is preserved across minor versions on > a best-effort basis. It is possible that the post-parse-analysis tree > changes across minor releases, impacting the value of queryid for the > same query run across two different minor versions.". I had another try and ended up pushing the logical / physical replica details up to the paragraph above. It seems more relevant to mention this in the section which details reasons why the queryid can be unstable due to metadata variations. I think keeping the 2nd paragraph for reasons it's stable is a good separation of responsibility. I didn't include the "best-effort" word, but here's what I did come up with. David diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml index 44dd4db7ce..302421306e 100644 --- a/doc/src/sgml/pgstatstatements.sgml +++ b/doc/src/sgml/pgstatstatements.sgml @@ -632,22 +632,25 @@ pg_stat_statements will consider two apparently-identical queries to be distinct, if they reference a table that was dropped and recreated between the executions of the two queries. + Two servers participating in replication based on physical WAL replay can + be expected to have identical queryid values for + the same query. However, logical replication schemes do not promise to + keep replicas identical in all relevant details, so + queryid will not be a useful identifier for + accumulating costs across a set of logical replicas. The hashing process is also sensitive to differences in machine architecture and other facets of the platform. Furthermore, it is not safe to assume that queryid will be stable across major versions of PostgreSQL. + If in doubt, direct testing is recommended. - As a rule of thumb, queryid values can be assumed to be - stable and comparable only so long as the underlying server version and - catalog metadata details stay exactly the same. Two servers - participating in replication based on physical WAL replay can be expected - to have identical queryid values for the same query. - However, logical replication schemes do not promise to keep replicas - identical in all relevant details, so queryid will - not be a useful identifier for accumulating costs across a set of logical - replicas. If in doubt, direct testing is recommended. + Generally, it can be assumed that queryid values + are stable between minor version releases of PostgreSQL, + providing that instances are running on the same machine architecture and + the catalog metadata details match. Compatibility will only be broken + between minor versions as a last resort.
Re: Stability of queryid in minor versions
On Sun, Apr 14, 2024 at 7:03 PM David Rowley wrote: > On Mon, 15 Apr 2024 at 13:37, David G. Johnston > wrote: > > Seems we can improve things by simply removing the "rule of thumb" > sentence altogether. The prior paragraph states the things the queryid > depends upon at the level of detail the reader needs. > > I don't think that addresses the following, which I mentioned earlier: > > > but not stable across *major* versions does *not* mean stable across > > *minor* versions. The reader is just left guessing if that's true. > > The base assumption here is that changes in the things we don't mention do not influence the queryid. We didn't mention minor versions, changing them doesn't influence the queryid. Now, reading that entire paragraph is a bit of a challenge IMO, and agree, as I subsequently noted, that the sentence you pointed out could be reworked. I stand by my statement that removing the sentence about "rule of thumb" altogether is a win. The prior paragraph should be sufficient - it is technically at the moment but am not opposed to rewording. David J.
Re: Stability of queryid in minor versions
On Mon, Apr 15, 2024 at 01:31:47PM +1200, David Rowley wrote: > I think a soft guarantee in the docs for it being stable in minor > versions would be ok then. > > I'm unsure if "Rule of thumb" is the correct way to convey that. We > can't really write "We endeavour to", as who is "We". Maybe something > like "Generally, it can be assumed that queryid is stable between all > minor versions of a major version of ..., providing that reasons>". It sounds to me that the term "best-effort" is adapted here? Like in "The compatibility of query IDs is preserved across minor versions on a best-effort basis. It is possible that the post-parse-analysis tree changes across minor releases, impacting the value of queryid for the same query run across two different minor versions.". -- Michael signature.asc Description: PGP signature
Re: Stability of queryid in minor versions
On Mon, 15 Apr 2024 at 13:37, David G. Johnston wrote: > Seems we can improve things by simply removing the "rule of thumb" sentence > altogether. The prior paragraph states the things the queryid depends upon > at the level of detail the reader needs. I don't think that addresses the following, which I mentioned earlier: > but not stable across *major* versions does *not* mean stable across > *minor* versions. The reader is just left guessing if that's true. David
Re: Stability of queryid in minor versions
On Sun, Apr 14, 2024 at 6:32 PM David Rowley wrote: > On Mon, 15 Apr 2024 at 13:19, Tom Lane wrote: > > > > Michael Paquier writes: > > > On Mon, Apr 15, 2024 at 11:20:16AM +1200, David Rowley wrote: > > >> 1. We cannot change Node enums in minor versions > > >> 2. We're *unlikely* to add fields to Node types in minor versions, and > > >> if we did we'd likely be leaving them out of the jumble calc, plus it > > >> seems highly unlikely any new field we wedged into the padding would > > >> relate at all to the parsed query. > > > > > Since 16 these new fields would be added by default unless the node > > > attribute query_jumble_ignore is appended to it. > > > > They'd also be written/read by outfuncs/readfuncs, thereby breaking > > stored views/rules if the Node is one that can appear in a parsetree. > > So the bar to making such a change in a stable branch would be very > > high. > > I think a soft guarantee in the docs for it being stable in minor > versions would be ok then. > > I'm unsure if "Rule of thumb" is the correct way to convey that. We > can't really write "We endeavour to", as who is "We". Maybe something > like "Generally, it can be assumed that queryid is stable between all > minor versions of a major version of ..., providing that reasons>". > > So, there are three kinds of dependencies: Product Machine User Data The user data dependencies are noted as being OID among other things The machine dependencies are the architecture and other facets The product dependencies are not enumerated but can be simply stated to be internals stable throughout a major version. A minimal rewording of the last sentence in the prior paragraph could be: Lastly, the queryid depends upon aspects of PostgreSQL internals that can only change with each major version release. I'm disinclined to note minor releases here given the above wording. Sure, like with lots of things, circumstances may require us to break a policy, but we don't seem to make that point everywhere we conceive it could happen. David J.
Re: Stability of queryid in minor versions
On Sun, Apr 14, 2024 at 4:20 PM David Rowley wrote: > > I've drafted a patch which I think improves things, but it probably > needs more work and opinions. > > Seems we can improve things by simply removing the "rule of thumb" sentence altogether. The prior paragraph states the things the queryid depends upon at the level of detail the reader needs. The sentence "Two servers participating in replication based on physical WAL replay can be expected to have identical queryid values for the same query." apparently assumes that to participate both servers must share the same machine architecture. I am under the impression that this is only an advisory, not a requirement. Rather, two servers participating in physical replication will be ensured that the catalog metadata and major versions are identical. This is not the case for servers related via logical replication. David J.
Re: Stability of queryid in minor versions
On Mon, 15 Apr 2024 at 13:19, Tom Lane wrote: > > Michael Paquier writes: > > On Mon, Apr 15, 2024 at 11:20:16AM +1200, David Rowley wrote: > >> 1. We cannot change Node enums in minor versions > >> 2. We're *unlikely* to add fields to Node types in minor versions, and > >> if we did we'd likely be leaving them out of the jumble calc, plus it > >> seems highly unlikely any new field we wedged into the padding would > >> relate at all to the parsed query. > > > Since 16 these new fields would be added by default unless the node > > attribute query_jumble_ignore is appended to it. > > They'd also be written/read by outfuncs/readfuncs, thereby breaking > stored views/rules if the Node is one that can appear in a parsetree. > So the bar to making such a change in a stable branch would be very > high. I think a soft guarantee in the docs for it being stable in minor versions would be ok then. I'm unsure if "Rule of thumb" is the correct way to convey that. We can't really write "We endeavour to", as who is "We". Maybe something like "Generally, it can be assumed that queryid is stable between all minor versions of a major version of ..., providing that ". David
Re: Stability of queryid in minor versions
On Mon, 15 Apr 2024 at 12:04, Michael Paquier wrote: > Since 16 these new fields would be added by default unless the node > attribute query_jumble_ignore is appended to it. I agree that this > may not be entirely intuitive when it comes to force compatibility > across the same major version. Could there be cases where it is worth > breaking compatibility and include something more in the jumbling, > though? I've not seen the case in recent years even in stable > branches. I think that's a valid possible situation which could result in a change. however, I think it's much less likely than it used to be because we'd have to accidentally have used query_jumble_ignore, whereas before all the struct parsing stuff went in, we could have just forgotten to jumble the field when adding a new field to a parse struct. David
Re: Stability of queryid in minor versions
Michael Paquier writes: > On Mon, Apr 15, 2024 at 11:20:16AM +1200, David Rowley wrote: >> 1. We cannot change Node enums in minor versions >> 2. We're *unlikely* to add fields to Node types in minor versions, and >> if we did we'd likely be leaving them out of the jumble calc, plus it >> seems highly unlikely any new field we wedged into the padding would >> relate at all to the parsed query. > Since 16 these new fields would be added by default unless the node > attribute query_jumble_ignore is appended to it. They'd also be written/read by outfuncs/readfuncs, thereby breaking stored views/rules if the Node is one that can appear in a parsetree. So the bar to making such a change in a stable branch would be very high. regards, tom lane
Re: Stability of queryid in minor versions
On Sun, Apr 14, 2024 at 9:01 PM David Rowley wrote: > On Mon, 15 Apr 2024 at 11:47, Peter Geoghegan wrote: > > Technically we don't promise that WAL records won't change in minor > > versions. In fact, the docs specifically state that the format of any > > WAL record might change, and that users should upgrade standbys first > > on general principle (though I imagine few do). We try hard to avoid > > changing the format of WAL records in point releases, of course, but > > strictly speaking there is no guarantee. This situation seems similar > > (though much lower stakes) to me. Query normalization isn't perfect -- > > there's a trade-off. > Where does WAL fit into this? And why would a WAL format change the > computed value? It doesn't. I just compared the two situations, which seem analogous. -- Peter Geoghegan
Re: Stability of queryid in minor versions
On Mon, 15 Apr 2024 at 11:47, Peter Geoghegan wrote: > > On Sun, Apr 14, 2024 at 7:20 PM David Rowley wrote: > > It's the "underlying server version" that I think needs some > > clarification. It's unclear if the minor version must match or just > > the major version number. The preceding paragraph does mention: > > > > "Furthermore, it is not safe to assume that queryid will be stable > > across major versions of PostgreSQL." > > > > but not stable across *major* versions does *not* mean stable across > > *minor* versions. The reader is just left guessing if that's true. > > Technically we don't promise that WAL records won't change in minor > versions. In fact, the docs specifically state that the format of any > WAL record might change, and that users should upgrade standbys first > on general principle (though I imagine few do). We try hard to avoid > changing the format of WAL records in point releases, of course, but > strictly speaking there is no guarantee. This situation seems similar > (though much lower stakes) to me. Query normalization isn't perfect -- > there's a trade-off. set compute_query_id = 'on'; explain (costs off, verbose) select oid from pg_class; QUERY PLAN - Index Only Scan using pg_class_oid_index on pg_catalog.pg_class Output: oid Query Identifier: -8748805461085747951 (3 rows) As far as I understand query ID; it's based on the parse nodes and values in the system catalogue tables and is calculated on the local server. Computed values are susceptible to variations in hash values calculated by different CPU architectures. Where does WAL fit into this? And why would a WAL format change the computed value? David
Re: Stability of queryid in minor versions
On Sun, Apr 14, 2024 at 8:04 PM Michael Paquier wrote: > Assuming that a query ID will be always stable across major versions > is overconfident, I think. As Peter said, like for WAL, we may face > cases where a slight breakage for a subset of queries could be > justified, and pg_stat_statement would be able to cope with that by > discarding the oldest entries in its hash tables. If there was a minor break in compatibility, that either went unnoticed, or was considered too minor to matter, then pg_stat_statements would be in exactly the same position as any external tool that uses its queryid values to accumulate query costs. While external tools can't understand the provenance of old queryid values, pg_stat_statements can't either. -- Peter Geoghegan
Re: Stability of queryid in minor versions
On Mon, Apr 15, 2024 at 11:20:16AM +1200, David Rowley wrote: > I was recently asked internally about the stability guarantees we > offer for queryid. My answer consisted of: > > 1. We cannot change Node enums in minor versions > 2. We're *unlikely* to add fields to Node types in minor versions, and > if we did we'd likely be leaving them out of the jumble calc, plus it > seems highly unlikely any new field we wedged into the padding would > relate at all to the parsed query. Since 16 these new fields would be added by default unless the node attribute query_jumble_ignore is appended to it. I agree that this may not be entirely intuitive when it comes to force compatibility across the same major version. Could there be cases where it is worth breaking compatibility and include something more in the jumbling, though? I've not seen the case in recent years even in stable branches. > Maybe the paragraph starting with "Consumers of" can detail the > reasons queryid might be unstable and the following paragraph can > describe the scenario for when the queryid can generally assumed to be > stable. > > > As a rule of thumb, queryid values can be > assumed to be > - stable and comparable only so long as the underlying server version and > - catalog metadata details stay exactly the same. Two servers > + stable and comparable only between PostgreSQL > instances > + which are running the same major version of > PostgreSQL > + and are running on the same machine architecture and catalog metadata > details match. Two servers > participating in replication based on physical WAL replay can be expected > to have identical queryid values for the same > query. > However, logical replication schemes do not promise to keep replicas Assuming that a query ID will be always stable across major versions is overconfident, I think. As Peter said, like for WAL, we may face cases where a slight breakage for a subset of queries could be justified, and pg_stat_statement would be able to cope with that by discarding the oldest entries in its hash tables. -- Michael signature.asc Description: PGP signature
Re: Stability of queryid in minor versions
On Sun, Apr 14, 2024 at 7:20 PM David Rowley wrote: > It's the "underlying server version" that I think needs some > clarification. It's unclear if the minor version must match or just > the major version number. The preceding paragraph does mention: > > "Furthermore, it is not safe to assume that queryid will be stable > across major versions of PostgreSQL." > > but not stable across *major* versions does *not* mean stable across > *minor* versions. The reader is just left guessing if that's true. Technically we don't promise that WAL records won't change in minor versions. In fact, the docs specifically state that the format of any WAL record might change, and that users should upgrade standbys first on general principle (though I imagine few do). We try hard to avoid changing the format of WAL records in point releases, of course, but strictly speaking there is no guarantee. This situation seems similar (though much lower stakes) to me. Query normalization isn't perfect -- there's a trade-off. > Maybe the paragraph starting with "Consumers of" can detail the > reasons queryid might be unstable and the following paragraph can > describe the scenario for when the queryid can generally assumed to be > stable. I think that it would be reasonable to say that we strive to not break the format in point releases. Fundamentally, if pg_stat_statements sees a hard queryid format change (e.g. due to a major version upgrade), then pg_stat_statements throws away the accumulated query stats without being asked to. -- Peter Geoghegan