Re: Stability of queryid in minor versions

2024-04-19 Thread Michael Paquier
On Sat, Apr 20, 2024 at 01:56:48PM +1200, David Rowley wrote:
> Thanks for the review. I've now pushed this, backpatching to 12.

You've split that into two separate paragraphs with 2d3389c28c5c.
Thanks for the commit.
--
Michael


signature.asc
Description: PGP signature


Re: Stability of queryid in minor versions

2024-04-19 Thread David Rowley
On Tue, 16 Apr 2024 at 15:16, Michael Paquier  wrote:
>
> On Tue, Apr 16, 2024 at 02:04:22PM +1200, David Rowley wrote:
> > It makes sense to talk about the hashing variations closer to the
> > object identifier part.
>
> Mostly what I had in mind.  A separate  for the new part you are
> adding at the end of the first part feels a bit more natural split
> here.  Feel free to discard my comment if you think that's not worth
> it.

Thanks for the review. I've now pushed this, backpatching to 12.

David




Re: Stability of queryid in minor versions

2024-04-15 Thread Michael Paquier
On Tue, Apr 16, 2024 at 02:04:22PM +1200, David Rowley wrote:
> It makes sense to talk about the hashing variations closer to the
> object identifier part.

Mostly what I had in mind.  A separate  for the new part you are
adding at the end of the first part feels a bit more natural split
here.  Feel free to discard my comment if you think that's not worth
it.
--
Michael


signature.asc
Description: PGP signature


Re: Stability of queryid in minor versions

2024-04-15 Thread David Rowley
On Tue, 16 Apr 2024 at 12:10, Michael Paquier  wrote:
> Not sure that this is an improvement in clarity.  There are a few
> bullet points that treat about the instability of the query ID, and
> your patch is now mixing the query ID being different for two
> mostly-identical queries on the same host with larger conditions like
> the environment involved.  Perhaps it would be better to move the last
> sentence of the first  ("Furthermore, it is not safe..") with
> the part you are adding about replication in this paragraph.

Yeah, I think this is better.  I think the attached is what you mean.

It makes sense to talk about the hashing variations closer to the
object identifier part.

David
diff --git a/doc/src/sgml/pgstatstatements.sgml 
b/doc/src/sgml/pgstatstatements.sgml
index 44dd4db7ce..38c4ce3241 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -636,18 +636,21 @@
machine architecture and other facets of the platform.
Furthermore, it is not safe to assume that 
queryid
will be stable across major versions of 
PostgreSQL.
+   Two servers participating in replication based on physical WAL replay can
+   be expected to have identical queryid values for
+   the same query.  However, logical replication schemes do not promise to
+   keep replicas identical in all relevant details, so
+   queryid will not be a useful identifier for
+   accumulating costs across a set of logical replicas.
+   If in doubt, direct testing is recommended.
   
 
   
-   As a rule of thumb, queryid values can be 
assumed to be
-   stable and comparable only so long as the underlying server version and
-   catalog metadata details stay exactly the same.  Two servers
-   participating in replication based on physical WAL replay can be expected
-   to have identical queryid values for the same 
query.
-   However, logical replication schemes do not promise to keep replicas
-   identical in all relevant details, so queryid 
will
-   not be a useful identifier for accumulating costs across a set of logical
-   replicas.  If in doubt, direct testing is recommended.
+   Generally, it can be assumed that queryid values
+   are stable between minor version releases of 
PostgreSQL,
+   providing that instances are running on the same machine architecture and
+   the catalog metadata details match.  Compatibility will only be broken
+   between minor versions as a last resort.
   
 
   


Re: Stability of queryid in minor versions

2024-04-15 Thread Michael Paquier
On Mon, Apr 15, 2024 at 02:54:52PM +1200, David Rowley wrote:
> pg_stat_statements will consider two 
> apparently-identical
> queries to be distinct, if they reference a table that was dropped
> and recreated between the executions of the two queries.
> +   Two servers participating in replication based on physical WAL replay can
> +   be expected to have identical queryid values 
> for
> +   the same query.  However, logical replication schemes do not promise to
> +   keep replicas identical in all relevant details, so
> +   queryid will not be a useful identifier for
> +   accumulating costs across a set of logical replicas.
> The hashing process is also sensitive to differences in
> machine architecture and other facets of the platform.
> Furthermore, it is not safe to assume that 
> queryid
> will be stable across major versions of 
> PostgreSQL.
> +   If in doubt, direct testing is recommended.
>

Not sure that this is an improvement in clarity.  There are a few
bullet points that treat about the instability of the query ID, and
your patch is now mixing the query ID being different for two
mostly-identical queries on the same host with larger conditions like
the environment involved.  Perhaps it would be better to move the last
sentence of the first  ("Furthermore, it is not safe..") with
the part you are adding about replication in this paragraph.

>
> -   As a rule of thumb, queryid values can be 
> assumed to be
> -   stable and comparable only so long as the underlying server version and
> -   catalog metadata details stay exactly the same.  Two servers
> -   participating in replication based on physical WAL replay can be expected
> -   to have identical queryid values for the same 
> query.
> -   However, logical replication schemes do not promise to keep replicas
> -   identical in all relevant details, so queryid 
> will
> -   not be a useful identifier for accumulating costs across a set of logical
> -   replicas.  If in doubt, direct testing is recommended.
> +   Generally, it can be assumed that queryid 
> values
> +   are stable between minor version releases of 
> PostgreSQL,
> +   providing that instances are running on the same machine architecture and
> +   the catalog metadata details match.  Compatibility will only be broken
> +   between minor versions as a last resort.

This split is cleaner.
--
Michael


signature.asc
Description: PGP signature


Re: Stability of queryid in minor versions

2024-04-14 Thread David Rowley
On Mon, 15 Apr 2024 at 14:09, Michael Paquier  wrote:
>
> On Mon, Apr 15, 2024 at 01:31:47PM +1200, David Rowley wrote:
> > I'm unsure if "Rule of thumb" is the correct way to convey that. We
> > can't really write "We endeavour to", as who is "We".  Maybe something
> > like "Generally, it can be assumed that queryid is stable between all
> > minor versions of a major version of ..., providing that  > reasons>".
>
> It sounds to me that the term "best-effort" is adapted here?  Like in
> "The compatibility of query IDs is preserved across minor versions on
> a best-effort basis.  It is possible that the post-parse-analysis tree
> changes across minor releases, impacting the value of queryid for the
> same query run across two different minor versions.".

I had another try and ended up pushing the logical / physical replica
details up to the paragraph above. It seems more relevant to mention
this in the section which details reasons why the queryid can be
unstable due to metadata variations.  I think keeping the 2nd
paragraph for reasons it's stable is a good separation of
responsibility.  I didn't include the "best-effort" word, but here's
what I did come up with.

David
diff --git a/doc/src/sgml/pgstatstatements.sgml 
b/doc/src/sgml/pgstatstatements.sgml
index 44dd4db7ce..302421306e 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -632,22 +632,25 @@
pg_stat_statements will consider two 
apparently-identical
queries to be distinct, if they reference a table that was dropped
and recreated between the executions of the two queries.
+   Two servers participating in replication based on physical WAL replay can
+   be expected to have identical queryid values for
+   the same query.  However, logical replication schemes do not promise to
+   keep replicas identical in all relevant details, so
+   queryid will not be a useful identifier for
+   accumulating costs across a set of logical replicas.
The hashing process is also sensitive to differences in
machine architecture and other facets of the platform.
Furthermore, it is not safe to assume that 
queryid
will be stable across major versions of 
PostgreSQL.
+   If in doubt, direct testing is recommended.
   
 
   
-   As a rule of thumb, queryid values can be 
assumed to be
-   stable and comparable only so long as the underlying server version and
-   catalog metadata details stay exactly the same.  Two servers
-   participating in replication based on physical WAL replay can be expected
-   to have identical queryid values for the same 
query.
-   However, logical replication schemes do not promise to keep replicas
-   identical in all relevant details, so queryid 
will
-   not be a useful identifier for accumulating costs across a set of logical
-   replicas.  If in doubt, direct testing is recommended.
+   Generally, it can be assumed that queryid values
+   are stable between minor version releases of 
PostgreSQL,
+   providing that instances are running on the same machine architecture and
+   the catalog metadata details match.  Compatibility will only be broken
+   between minor versions as a last resort.
   
 
   


Re: Stability of queryid in minor versions

2024-04-14 Thread David G. Johnston
On Sun, Apr 14, 2024 at 7:03 PM David Rowley  wrote:

> On Mon, 15 Apr 2024 at 13:37, David G. Johnston
>  wrote:
> > Seems we can improve things by simply removing the "rule of thumb"
> sentence altogether.  The prior paragraph states the things the queryid
> depends upon at the level of detail the reader needs.
>
> I don't think that addresses the following, which I mentioned earlier:
>
> > but not stable across *major* versions does *not* mean stable across
> > *minor* versions. The reader is just left guessing if that's true.
>
>
The base assumption here is that changes in the things we don't mention do
not influence the queryid.  We didn't mention minor versions, changing them
doesn't influence the queryid.

Now, reading that entire paragraph is a bit of a challenge IMO, and agree,
as I subsequently noted, that the sentence you pointed out could be
reworked.  I stand by my statement that removing the sentence about "rule
of thumb" altogether is a win.  The prior paragraph should be sufficient -
it is technically at the moment but am not opposed to rewording.

David J.


Re: Stability of queryid in minor versions

2024-04-14 Thread Michael Paquier
On Mon, Apr 15, 2024 at 01:31:47PM +1200, David Rowley wrote:
> I think a soft guarantee in the docs for it being stable in minor
> versions would be ok then.
> 
> I'm unsure if "Rule of thumb" is the correct way to convey that. We
> can't really write "We endeavour to", as who is "We".  Maybe something
> like "Generally, it can be assumed that queryid is stable between all
> minor versions of a major version of ..., providing that  reasons>".

It sounds to me that the term "best-effort" is adapted here?  Like in
"The compatibility of query IDs is preserved across minor versions on
a best-effort basis.  It is possible that the post-parse-analysis tree
changes across minor releases, impacting the value of queryid for the
same query run across two different minor versions.".
--
Michael


signature.asc
Description: PGP signature


Re: Stability of queryid in minor versions

2024-04-14 Thread David Rowley
On Mon, 15 Apr 2024 at 13:37, David G. Johnston
 wrote:
> Seems we can improve things by simply removing the "rule of thumb" sentence 
> altogether.  The prior paragraph states the things the queryid depends upon 
> at the level of detail the reader needs.

I don't think that addresses the following, which I mentioned earlier:

> but not stable across *major* versions does *not* mean stable across
> *minor* versions. The reader is just left guessing if that's true.

David




Re: Stability of queryid in minor versions

2024-04-14 Thread David G. Johnston
On Sun, Apr 14, 2024 at 6:32 PM David Rowley  wrote:

> On Mon, 15 Apr 2024 at 13:19, Tom Lane  wrote:
> >
> > Michael Paquier  writes:
> > > On Mon, Apr 15, 2024 at 11:20:16AM +1200, David Rowley wrote:
> > >> 1. We cannot change Node enums in minor versions
> > >> 2. We're *unlikely* to add fields to Node types in minor versions, and
> > >> if we did we'd likely be leaving them out of the jumble calc, plus it
> > >> seems highly unlikely any new field we wedged into the padding would
> > >> relate at all to the parsed query.
> >
> > > Since 16 these new fields would be added by default unless the node
> > > attribute query_jumble_ignore is appended to it.
> >
> > They'd also be written/read by outfuncs/readfuncs, thereby breaking
> > stored views/rules if the Node is one that can appear in a parsetree.
> > So the bar to making such a change in a stable branch would be very
> > high.
>
> I think a soft guarantee in the docs for it being stable in minor
> versions would be ok then.
>
> I'm unsure if "Rule of thumb" is the correct way to convey that. We
> can't really write "We endeavour to", as who is "We".  Maybe something
> like "Generally, it can be assumed that queryid is stable between all
> minor versions of a major version of ..., providing that  reasons>".
>
>
So, there are three kinds of dependencies:

Product
Machine
User Data

The user data dependencies are noted as being OID among other things
The machine dependencies are the architecture and other facets
The product dependencies are not enumerated but can be simply stated to be
internals stable throughout a major version.

A minimal rewording of the last sentence in the prior paragraph could be:

Lastly, the queryid depends upon aspects of PostgreSQL internals that can
only change with each major version release.

I'm disinclined to note minor releases here given the above wording.  Sure,
like with lots of things, circumstances may require us to break a policy,
but we don't seem to make that point everywhere we conceive it could happen.

David J.


Re: Stability of queryid in minor versions

2024-04-14 Thread David G. Johnston
On Sun, Apr 14, 2024 at 4:20 PM David Rowley  wrote:

>
> I've drafted a patch which I think improves things, but it probably
> needs more work and opinions.
>
>
Seems we can improve things by simply removing the "rule of thumb" sentence
altogether.  The prior paragraph states the things the queryid depends upon
at the level of detail the reader needs.

The sentence "Two servers participating in replication based on physical
WAL replay can be expected to have identical queryid values for the same
query." apparently assumes that to participate both servers must share the
same machine architecture.  I am under the impression that this is only an
advisory, not a requirement.  Rather, two servers participating in physical
replication will be ensured that the catalog metadata and major versions
are identical.  This is not the case for servers related via logical
replication.

David J.


Re: Stability of queryid in minor versions

2024-04-14 Thread David Rowley
On Mon, 15 Apr 2024 at 13:19, Tom Lane  wrote:
>
> Michael Paquier  writes:
> > On Mon, Apr 15, 2024 at 11:20:16AM +1200, David Rowley wrote:
> >> 1. We cannot change Node enums in minor versions
> >> 2. We're *unlikely* to add fields to Node types in minor versions, and
> >> if we did we'd likely be leaving them out of the jumble calc, plus it
> >> seems highly unlikely any new field we wedged into the padding would
> >> relate at all to the parsed query.
>
> > Since 16 these new fields would be added by default unless the node
> > attribute query_jumble_ignore is appended to it.
>
> They'd also be written/read by outfuncs/readfuncs, thereby breaking
> stored views/rules if the Node is one that can appear in a parsetree.
> So the bar to making such a change in a stable branch would be very
> high.

I think a soft guarantee in the docs for it being stable in minor
versions would be ok then.

I'm unsure if "Rule of thumb" is the correct way to convey that. We
can't really write "We endeavour to", as who is "We".  Maybe something
like "Generally, it can be assumed that queryid is stable between all
minor versions of a major version of ..., providing that ".

David




Re: Stability of queryid in minor versions

2024-04-14 Thread David Rowley
On Mon, 15 Apr 2024 at 12:04, Michael Paquier  wrote:
> Since 16 these new fields would be added by default unless the node
> attribute query_jumble_ignore is appended to it.  I agree that this
> may not be entirely intuitive when it comes to force compatibility
> across the same major version.  Could there be cases where it is worth
> breaking compatibility and include something more in the jumbling,
> though?  I've not seen the case in recent years even in stable
> branches.

I think that's a valid possible situation which could result in a
change. however, I think it's much less likely than it used to be
because we'd have to accidentally have used query_jumble_ignore,
whereas before all the struct parsing stuff went in, we could have
just forgotten to jumble the field when adding a new field to a parse
struct.

David




Re: Stability of queryid in minor versions

2024-04-14 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Apr 15, 2024 at 11:20:16AM +1200, David Rowley wrote:
>> 1. We cannot change Node enums in minor versions
>> 2. We're *unlikely* to add fields to Node types in minor versions, and
>> if we did we'd likely be leaving them out of the jumble calc, plus it
>> seems highly unlikely any new field we wedged into the padding would
>> relate at all to the parsed query.

> Since 16 these new fields would be added by default unless the node
> attribute query_jumble_ignore is appended to it.

They'd also be written/read by outfuncs/readfuncs, thereby breaking
stored views/rules if the Node is one that can appear in a parsetree.
So the bar to making such a change in a stable branch would be very
high.

regards, tom lane




Re: Stability of queryid in minor versions

2024-04-14 Thread Peter Geoghegan
On Sun, Apr 14, 2024 at 9:01 PM David Rowley  wrote:
> On Mon, 15 Apr 2024 at 11:47, Peter Geoghegan  wrote:
> > Technically we don't promise that WAL records won't change in minor
> > versions. In fact, the docs specifically state that the format of any
> > WAL record might change, and that users should upgrade standbys first
> > on general principle (though I imagine few do). We try hard to avoid
> > changing the format of WAL records in point releases, of course, but
> > strictly speaking there is no guarantee. This situation seems similar
> > (though much lower stakes) to me. Query normalization isn't perfect --
> > there's a trade-off.

> Where does WAL fit into this? And why would a WAL format change the
> computed value?

It doesn't. I just compared the two situations, which seem analogous.

-- 
Peter Geoghegan




Re: Stability of queryid in minor versions

2024-04-14 Thread David Rowley
On Mon, 15 Apr 2024 at 11:47, Peter Geoghegan  wrote:
>
> On Sun, Apr 14, 2024 at 7:20 PM David Rowley  wrote:
> > It's the "underlying server version" that I think needs some
> > clarification. It's unclear if the minor version must match or just
> > the major version number. The preceding paragraph does mention:
> >
> > "Furthermore, it is not safe to assume that queryid will be stable
> > across major versions of PostgreSQL."
> >
> > but not stable across *major* versions does *not* mean stable across
> > *minor* versions. The reader is just left guessing if that's true.
>
> Technically we don't promise that WAL records won't change in minor
> versions. In fact, the docs specifically state that the format of any
> WAL record might change, and that users should upgrade standbys first
> on general principle (though I imagine few do). We try hard to avoid
> changing the format of WAL records in point releases, of course, but
> strictly speaking there is no guarantee. This situation seems similar
> (though much lower stakes) to me. Query normalization isn't perfect --
> there's a trade-off.

set compute_query_id = 'on';
explain (costs off, verbose) select oid from pg_class;
   QUERY PLAN
-
 Index Only Scan using pg_class_oid_index on pg_catalog.pg_class
   Output: oid
 Query Identifier: -8748805461085747951
(3 rows)

As far as I understand query ID; it's based on the parse nodes and
values in the system catalogue tables and is calculated on the local
server. Computed values are susceptible to variations in hash values
calculated by different CPU architectures.

Where does WAL fit into this? And why would a WAL format change the
computed value?

David




Re: Stability of queryid in minor versions

2024-04-14 Thread Peter Geoghegan
On Sun, Apr 14, 2024 at 8:04 PM Michael Paquier  wrote:
> Assuming that a query ID will be always stable across major versions
> is overconfident, I think.  As Peter said, like for WAL, we may face
> cases where a slight breakage for a subset of queries could be
> justified, and pg_stat_statement would be able to cope with that by
> discarding the oldest entries in its hash tables.

If there was a minor break in compatibility, that either went
unnoticed, or was considered too minor to matter, then
pg_stat_statements would be in exactly the same position as any
external tool that uses its queryid values to accumulate query costs.
While external tools can't understand the provenance of old queryid
values, pg_stat_statements can't either.


-- 
Peter Geoghegan




Re: Stability of queryid in minor versions

2024-04-14 Thread Michael Paquier
On Mon, Apr 15, 2024 at 11:20:16AM +1200, David Rowley wrote:
> I was recently asked internally about the stability guarantees we
> offer for queryid.  My answer consisted of:
> 
> 1. We cannot change Node enums in minor versions
> 2. We're *unlikely* to add fields to Node types in minor versions, and
> if we did we'd likely be leaving them out of the jumble calc, plus it
> seems highly unlikely any new field we wedged into the padding would
> relate at all to the parsed query.

Since 16 these new fields would be added by default unless the node
attribute query_jumble_ignore is appended to it.  I agree that this
may not be entirely intuitive when it comes to force compatibility
across the same major version.  Could there be cases where it is worth 
breaking compatibility and include something more in the jumbling,
though?  I've not seen the case in recent years even in stable
branches.

> Maybe the paragraph starting with "Consumers of" can detail the
> reasons queryid might be unstable and the following paragraph can
> describe the scenario for when the queryid can generally assumed to be
> stable.
>
>
> As a rule of thumb, queryid values can be 
> assumed to be
> -   stable and comparable only so long as the underlying server version and
> -   catalog metadata details stay exactly the same.  Two servers
> +   stable and comparable only between PostgreSQL 
> instances
> +   which are running the same major version of 
> PostgreSQL
> +   and are running on the same machine architecture and catalog metadata 
> details match.  Two servers
> participating in replication based on physical WAL replay can be expected
> to have identical queryid values for the same 
> query.
> However, logical replication schemes do not promise to keep replicas

Assuming that a query ID will be always stable across major versions
is overconfident, I think.  As Peter said, like for WAL, we may face
cases where a slight breakage for a subset of queries could be
justified, and pg_stat_statement would be able to cope with that by
discarding the oldest entries in its hash tables.
--
Michael


signature.asc
Description: PGP signature


Re: Stability of queryid in minor versions

2024-04-14 Thread Peter Geoghegan
On Sun, Apr 14, 2024 at 7:20 PM David Rowley  wrote:
> It's the "underlying server version" that I think needs some
> clarification. It's unclear if the minor version must match or just
> the major version number. The preceding paragraph does mention:
>
> "Furthermore, it is not safe to assume that queryid will be stable
> across major versions of PostgreSQL."
>
> but not stable across *major* versions does *not* mean stable across
> *minor* versions. The reader is just left guessing if that's true.

Technically we don't promise that WAL records won't change in minor
versions. In fact, the docs specifically state that the format of any
WAL record might change, and that users should upgrade standbys first
on general principle (though I imagine few do). We try hard to avoid
changing the format of WAL records in point releases, of course, but
strictly speaking there is no guarantee. This situation seems similar
(though much lower stakes) to me. Query normalization isn't perfect --
there's a trade-off.

> Maybe the paragraph starting with "Consumers of" can detail the
> reasons queryid might be unstable and the following paragraph can
> describe the scenario for when the queryid can generally assumed to be
> stable.

I think that it would be reasonable to say that we strive to not break
the format in point releases. Fundamentally, if pg_stat_statements
sees a hard queryid format change (e.g. due to a major version
upgrade), then pg_stat_statements throws away the accumulated query
stats without being asked to.

-- 
Peter Geoghegan