subject:"Feature improvement\: can we add queryId for pg_catalog.pg_stat

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-07-14 Thread Magnus Hagander

On Wed, Jul 14, 2021 at 6:36 AM Peter Eisentraut
 wrote:
>
> On 13.07.21 10:58, Magnus Hagander wrote:
> > Maybe "each distinct database id, each distinct user id, each distinct
> > query id, and whether it is a top level statement or not"?
> >
> > Or maybe "each distinct combination of database id, user id, query id
> > and whether it's a top level statement or not"?
>
> Okay, now I understand what is meant here.  The second one sounds good
> to me.

Thanks, will push a fix like that.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-07-13 Thread Peter Eisentraut


On 13.07.21 10:58, Magnus Hagander wrote:

Maybe "each distinct database id, each distinct user id, each distinct
query id, and whether it is a top level statement or not"?

Or maybe "each distinct combination of database id, user id, query id
and whether it's a top level statement or not"?


Okay, now I understand what is meant here.  The second one sounds good 
to me.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-07-13 Thread Julien Rouhaud

On Tue, Jul 13, 2021 at 10:58:12AM +0200, Magnus Hagander wrote:
> 
> Maybe a problem for the readability of it is that the three other
> fields are listed by their description and not by their fieldname, and
> toplevel is fieldname?

I think so too.

> Maybe "each distinct database id, each distinct user id, each distinct
> query id, and whether it is a top level statement or not"?
> 
> Or maybe "each distinct combination of database id, user id, query id
> and whether it's a top level statement or not"?

I like the 2nd one better.  What about "and its top level status"?  It would
keep the sentence short and the full description is right after if needed.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-07-13 Thread Magnus Hagander

On Tue, Jul 13, 2021 at 8:10 AM Julien Rouhaud  wrote:
>
> On Mon, Jul 12, 2021 at 10:02:59PM +0200, Peter Eisentraut wrote:
> > On 22.04.21 11:23, Julien Rouhaud wrote:
> > >  The statistics gathered by the module are made available via a
> > >  view named pg_stat_statements.  This view
> > > -   contains one row for each distinct database ID, user ID and query
> > > -   ID (up to the maximum number of distinct statements that the module
> > > +   contains one row for each distinct database ID, user ID, query ID and
> > > +   toplevel (up to the maximum number of distinct statements that the 
> > > module
> > >  can track).  The columns of the view are shown in
> > >  .
> >
> > I'm having trouble parsing this new sentence.  It now says essentially
> >
> > "This view contains one row for each distinct database ID, each distinct
> > user ID, each distinct query ID, and each distinct toplevel."
>
> Isn't it each distinct permutation of all those fields?

Maybe a problem for the readability of it is that the three other
fields are listed by their description and not by their fieldname, and
toplevel is fieldname?

Maybe "each distinct database id, each distinct user id, each distinct
query id, and whether it is a top level statement or not"?

Or maybe "each distinct combination of database id, user id, query id
and whether it's a top level statement or not"?

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-07-13 Thread Julien Rouhaud

On Mon, Jul 12, 2021 at 10:02:59PM +0200, Peter Eisentraut wrote:
> On 22.04.21 11:23, Julien Rouhaud wrote:
> >  The statistics gathered by the module are made available via a
> >  view named pg_stat_statements.  This view
> > -   contains one row for each distinct database ID, user ID and query
> > -   ID (up to the maximum number of distinct statements that the module
> > +   contains one row for each distinct database ID, user ID, query ID and
> > +   toplevel (up to the maximum number of distinct statements that the 
> > module
> >  can track).  The columns of the view are shown in
> >  .
> 
> I'm having trouble parsing this new sentence.  It now says essentially
> 
> "This view contains one row for each distinct database ID, each distinct
> user ID, each distinct query ID, and each distinct toplevel."

Isn't it each distinct permutation of all those fields?

> That last part doesn't make sense.

I'm not sure what you mean by that.  Maybe it's not really self explanatory
without referring to what toplevel is, which is a bool flag stating whether the
statement was exected as a top level statement or not.

So every distinct permutation of (dbid, userid, queryid) can indeed be stored
twice, if pg_stat_statements.track is set to all.  However in practice most
statements are not executed both as top level and nested statements.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-07-12 Thread Peter Eisentraut


On 22.04.21 11:23, Julien Rouhaud wrote:

 The statistics gathered by the module are made available via a
 view named pg_stat_statements.  This view
-   contains one row for each distinct database ID, user ID and query
-   ID (up to the maximum number of distinct statements that the module
+   contains one row for each distinct database ID, user ID, query ID and
+   toplevel (up to the maximum number of distinct statements that the module
 can track).  The columns of the view are shown in
 .


I'm having trouble parsing this new sentence.  It now says essentially

"This view contains one row for each distinct database ID, each distinct 
user ID, each distinct query ID, and each distinct toplevel."


That last part doesn't make sense.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-05-17 Thread Magnus Hagander

On Fri, Apr 23, 2021 at 12:40 PM Fujii Masao
 wrote:
>
>
>
> On 2021/04/23 19:11, Magnus Hagander wrote:
> > On Fri, Apr 23, 2021 at 12:04 PM Fujii Masao
> >  wrote:
> >>
> >>
> >>
> >> On 2021/04/23 18:46, Magnus Hagander wrote:
> >>> On Fri, Apr 23, 2021 at 9:10 AM Fujii Masao  
> >>> wrote:
> 
> 
> 
>  On 2021/04/22 18:23, Julien Rouhaud wrote:
> > On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote:
> >>
> >> I found another small issue in pg_stat_statements docs. The following
> >> description in the docs should be updated so that toplevel is included?
> >>
> >>> This view contains one row for each distinct database ID, user ID and 
> >>> query ID
> >
> > Indeed!  I'm adding Magnus in Cc.
> >
> > PFA a patch to fix at, and also mention that toplevel will only
> > contain True values if pg_stat_statements.track is set to top.
> 
>  Thanks for the patch! LGTM.
> >>>
> >>> Agreed, in general. But going by the example a few lines down, I
> >>> changed the second part to:
> >>>   True if the query was executed as a top level statement
> >>> +   (if pg_stat_statements.track is set to
> >>> +   all, otherwise always false)
> >>
> >> Isn't this confusing? Users may mistakenly read this as that the toplevel
> >> column always indicates false if pg_stat_statements.track is not "all".
> >
> > Hmm. I think you're right. It should say "always true", shouldn't it?
>
> You're thinking something like the following?
>
>  True if the query was executed as a top level statement
>  (if pg_stat_statements.track is set to
>  top, always true)
>
> > So not just confusing, but completely wrong? :)
>
> Yeah :)

Ugh. I completely lost track of this email.

I've applied the change suggested above with another slight reordering
of the words:

+   (always true if pg_stat_statements.track is set to
+   top)


> I'm fine with the original wording by Julien.
> Of course, the parameter name should be corrected as you did, though.
>
> Or what about the following?
>
>  True if the query was executed as a top level statement
>  (this can be false only if
>  pg_stat_statements.track is set to
>  all and nested statements are also tracked)

I found my suggestion, once the final reordering of words was done,
easier to parse.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-23 Thread Fujii Masao





On 2021/04/23 19:11, Magnus Hagander wrote:

On Fri, Apr 23, 2021 at 12:04 PM Fujii Masao
 wrote:




On 2021/04/23 18:46, Magnus Hagander wrote:

On Fri, Apr 23, 2021 at 9:10 AM Fujii Masao  wrote:




On 2021/04/22 18:23, Julien Rouhaud wrote:

On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote:


I found another small issue in pg_stat_statements docs. The following
description in the docs should be updated so that toplevel is included?


This view contains one row for each distinct database ID, user ID and query ID


Indeed!  I'm adding Magnus in Cc.

PFA a patch to fix at, and also mention that toplevel will only
contain True values if pg_stat_statements.track is set to top.


Thanks for the patch! LGTM.


Agreed, in general. But going by the example a few lines down, I
changed the second part to:
  True if the query was executed as a top level statement
+   (if pg_stat_statements.track is set to
+   all, otherwise always false)


Isn't this confusing? Users may mistakenly read this as that the toplevel
column always indicates false if pg_stat_statements.track is not "all".


Hmm. I think you're right. It should say "always true", shouldn't it?


You're thinking something like the following?

True if the query was executed as a top level statement
(if pg_stat_statements.track is set to
top, always true)


So not just confusing, but completely wrong? :)


Yeah :)

I'm fine with the original wording by Julien.
Of course, the parameter name should be corrected as you did, though.

Or what about the following?

True if the query was executed as a top level statement
(this can be false only if
pg_stat_statements.track is set to
all and nested statements are also tracked)

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-23 Thread Fujii Masao





On 2021/04/23 18:46, Magnus Hagander wrote:

On Fri, Apr 23, 2021 at 9:10 AM Fujii Masao  wrote:




On 2021/04/22 18:23, Julien Rouhaud wrote:

On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote:


I found another small issue in pg_stat_statements docs. The following
description in the docs should be updated so that toplevel is included?


This view contains one row for each distinct database ID, user ID and query ID


Indeed!  I'm adding Magnus in Cc.

PFA a patch to fix at, and also mention that toplevel will only
contain True values if pg_stat_statements.track is set to top.


Thanks for the patch! LGTM.


Agreed, in general. But going by the example a few lines down, I
changed the second part to:
 True if the query was executed as a top level statement
+   (if pg_stat_statements.track is set to
+   all, otherwise always false)


Isn't this confusing? Users may mistakenly read this as that the toplevel
column always indicates false if pg_stat_statements.track is not "all".



(changes the wording, but also the name of the parameter is
pg_stat_statements.track, not pg_stat_statements.toplevel (that's the
column, not the parameter). Same error in the commit msg except there
you called it pg_stat_statements.top - but that one needed some more
fix as well)

With those changes, applied. Thanks!


Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-23 Thread Magnus Hagander

On Fri, Apr 23, 2021 at 12:04 PM Fujii Masao
 wrote:
>
>
>
> On 2021/04/23 18:46, Magnus Hagander wrote:
> > On Fri, Apr 23, 2021 at 9:10 AM Fujii Masao  
> > wrote:
> >>
> >>
> >>
> >> On 2021/04/22 18:23, Julien Rouhaud wrote:
> >>> On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote:
> 
>  I found another small issue in pg_stat_statements docs. The following
>  description in the docs should be updated so that toplevel is included?
> 
> > This view contains one row for each distinct database ID, user ID and 
> > query ID
> >>>
> >>> Indeed!  I'm adding Magnus in Cc.
> >>>
> >>> PFA a patch to fix at, and also mention that toplevel will only
> >>> contain True values if pg_stat_statements.track is set to top.
> >>
> >> Thanks for the patch! LGTM.
> >
> > Agreed, in general. But going by the example a few lines down, I
> > changed the second part to:
> >  True if the query was executed as a top level statement
> > +   (if pg_stat_statements.track is set to
> > +   all, otherwise always false)
>
> Isn't this confusing? Users may mistakenly read this as that the toplevel
> column always indicates false if pg_stat_statements.track is not "all".

Hmm. I think you're right. It should say "always true", shouldn't it?
So not just confusing, but completely wrong? :)


-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-23 Thread Magnus Hagander

On Fri, Apr 23, 2021 at 9:10 AM Fujii Masao  wrote:
>
>
>
> On 2021/04/22 18:23, Julien Rouhaud wrote:
> > On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote:
> >>
> >> I found another small issue in pg_stat_statements docs. The following
> >> description in the docs should be updated so that toplevel is included?
> >>
> >>> This view contains one row for each distinct database ID, user ID and 
> >>> query ID
> >
> > Indeed!  I'm adding Magnus in Cc.
> >
> > PFA a patch to fix at, and also mention that toplevel will only
> > contain True values if pg_stat_statements.track is set to top.
>
> Thanks for the patch! LGTM.

Agreed, in general. But going by the example a few lines down, I
changed the second part to:
True if the query was executed as a top level statement
+   (if pg_stat_statements.track is set to
+   all, otherwise always false)

(changes the wording, but also the name of the parameter is
pg_stat_statements.track, not pg_stat_statements.toplevel (that's the
column, not the parameter). Same error in the commit msg except there
you called it pg_stat_statements.top - but that one needed some more
fix as well)

With those changes, applied. Thanks!

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-23 Thread Fujii Masao





On 2021/04/22 18:23, Julien Rouhaud wrote:

On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote:


I found another small issue in pg_stat_statements docs. The following
description in the docs should be updated so that toplevel is included?


This view contains one row for each distinct database ID, user ID and query ID


Indeed!  I'm adding Magnus in Cc.

PFA a patch to fix at, and also mention that toplevel will only
contain True values if pg_stat_statements.track is set to top.


Thanks for the patch! LGTM.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-22 Thread Julien Rouhaud

On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote:
> 
> I found another small issue in pg_stat_statements docs. The following
> description in the docs should be updated so that toplevel is included?
> 
> > This view contains one row for each distinct database ID, user ID and query 
> > ID

Indeed!  I'm adding Magnus in Cc.

PFA a patch to fix at, and also mention that toplevel will only
contain True values if pg_stat_statements.track is set to top.
>From a1fe3e32ca3c59b059fd87b3070201ce4e7d0e70 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Thu, 22 Apr 2021 17:14:45 +0800
Subject: [PATCH v1] doc: Mention that toplevel is part of pg_stat_statements
 key.

While at it, also document that toplevel can only be true if
pg_stat_statements.top is set to top.

Author: Julien Rouhaud
Reported-by: Fuji Masao
Discussion: https://postgr.es/m/a878d5ea-64a7-485e-5d2f-177618ebc...@oss.nttdata.com
---
 doc/src/sgml/pgstatstatements.sgml | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index a0c315aa97..561452c624 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -46,8 +46,8 @@
   
The statistics gathered by the module are made available via a
view named pg_stat_statements.  This view
-   contains one row for each distinct database ID, user ID and query
-   ID (up to the maximum number of distinct statements that the module
+   contains one row for each distinct database ID, user ID, query ID and
+   toplevel (up to the maximum number of distinct statements that the module
can track).  The columns of the view are shown in
.
   
@@ -93,6 +93,8 @@
   
   
True if the query was executed as a top level statement
+   (requires pg_stat_statements.toplevel to be set to
+   all to see false values)
   
  
 
-- 
2.30.1

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-21 Thread Fujii Masao





On 2021/04/21 1:22, Bruce Momjian wrote:

On Wed, Apr 14, 2021 at 02:33:26PM -0400, Bruce Momjian wrote:

On Tue, Apr 13, 2021 at 01:30:16PM -0400, Álvaro Herrera wrote:

On 2021-Apr-12, Bruce Momjian wrote:


OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I
have not changed any of the APIs which existed before this feature was
added, and are called "queryid" or "queryId" --- it is kind of a mess.
I assume I should leave those unchanged.  It will also need a catversion
bump.


I think it is fine actually.  These names appear in structs Query and
PlannedStmt, and every single member of those already uses camelCase
naming.  Changing those to use "query_id" would look out of place.
You did change the one in PgBackendStatus to st_query_id, which also
matches the naming style in that struct, so that looks fine also.

So I'm -1 on Julien's first proposed change, and +1 on his second
proposed change (the name of the first argument of
pgstat_report_query_id should be query_id).


Thanks for your analysis.  Updated patch attached with the change
suggested above.


Patch applied.


I found another small issue in pg_stat_statements docs. The following
description in the docs should be updated so that toplevel is included?


This view contains one row for each distinct database ID, user ID and query ID


Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-20 Thread Bruce Momjian

On Wed, Apr 14, 2021 at 02:33:26PM -0400, Bruce Momjian wrote:
> On Tue, Apr 13, 2021 at 01:30:16PM -0400, Álvaro Herrera wrote:
> > On 2021-Apr-12, Bruce Momjian wrote:
> > 
> > > OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I
> > > have not changed any of the APIs which existed before this feature was
> > > added, and are called "queryid" or "queryId" --- it is kind of a mess. 
> > > I assume I should leave those unchanged.  It will also need a catversion
> > > bump.
> > 
> > I think it is fine actually.  These names appear in structs Query and
> > PlannedStmt, and every single member of those already uses camelCase
> > naming.  Changing those to use "query_id" would look out of place.
> > You did change the one in PgBackendStatus to st_query_id, which also
> > matches the naming style in that struct, so that looks fine also.
> > 
> > So I'm -1 on Julien's first proposed change, and +1 on his second
> > proposed change (the name of the first argument of
> > pgstat_report_query_id should be query_id).
> 
> Thanks for your analysis.  Updated patch attached with the change
> suggested above.

Patch applied.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-15 Thread Julien Rouhaud

On Wed, Apr 14, 2021 at 02:33:26PM -0400, Bruce Momjian wrote:
> On Tue, Apr 13, 2021 at 01:30:16PM -0400, Álvaro Herrera wrote:
> > 
> > So I'm -1 on Julien's first proposed change, and +1 on his second
> > proposed change (the name of the first argument of
> > pgstat_report_query_id should be query_id).
> 
> Thanks for your analysis.  Updated patch attached with the change
> suggested above.

Thanks Bruce.  It looks good to me.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-14 Thread Bruce Momjian

On Tue, Apr 13, 2021 at 01:30:16PM -0400, Álvaro Herrera wrote:
> On 2021-Apr-12, Bruce Momjian wrote:
> 
> > OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I
> > have not changed any of the APIs which existed before this feature was
> > added, and are called "queryid" or "queryId" --- it is kind of a mess. 
> > I assume I should leave those unchanged.  It will also need a catversion
> > bump.
> 
> I think it is fine actually.  These names appear in structs Query and
> PlannedStmt, and every single member of those already uses camelCase
> naming.  Changing those to use "query_id" would look out of place.
> You did change the one in PgBackendStatus to st_query_id, which also
> matches the naming style in that struct, so that looks fine also.
> 
> So I'm -1 on Julien's first proposed change, and +1 on his second
> proposed change (the name of the first argument of
> pgstat_report_query_id should be query_id).

Thanks for your analysis.  Updated patch attached with the change
suggested above.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.



query_id.diff.gz
Description: application/gzip

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-13 Thread Alvaro Herrera

On 2021-Apr-12, Bruce Momjian wrote:

> OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I
> have not changed any of the APIs which existed before this feature was
> added, and are called "queryid" or "queryId" --- it is kind of a mess. 
> I assume I should leave those unchanged.  It will also need a catversion
> bump.

I think it is fine actually.  These names appear in structs Query and
PlannedStmt, and every single member of those already uses camelCase
naming.  Changing those to use "query_id" would look out of place.
You did change the one in PgBackendStatus to st_query_id, which also
matches the naming style in that struct, so that looks fine also.

So I'm -1 on Julien's first proposed change, and +1 on his second
proposed change (the name of the first argument of
pgstat_report_query_id should be query_id).

-- 
Álvaro Herrera   Valdivia, Chile

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-13 Thread Julien Rouhaud

On Mon, Apr 12, 2021 at 10:12:46PM -0400, Bruce Momjian wrote:
> On Thu, Apr  8, 2021 at 01:01:42PM -0400, Bruce Momjian wrote:
> > On Thu, Apr  8, 2021 at 12:51:06PM -0400, Álvaro Herrera wrote:
> > > On 2021-Apr-08, Bruce Momjian wrote:
> > > 
> > > > pg_stat_activity.queryid is new, but I can imagine cases where you would
> > > > join pg_stat_activity to pg_stat_statements to get an estimate of how
> > > > long the query will take --- having one using an underscore and another
> > > > one not seems odd.
> > > 
> > > OK.  So far, you have one vote for queryid (your own) and two votes for
> > > query_id (mine and Julien's).  And even yourself were hesitating about
> > > it earlier in the thread.
> > 
> > OK, if people are fine with pg_stat_activity.query_id not matching
> > pg_stat_statements.queryid, I am fine with that.  I just don't want
> > someone to say it was a big mistake later.  ;-)
> 
> OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I
> have not changed any of the APIs which existed before this feature was
> added, and are called "queryid" or "queryId" --- it is kind of a mess. 
> I assume I should leave those unchanged.  It will also need a catversion
> bump.

-   uint64  st_queryid;
+   uint64  st_query_id;

I thought we would internally keep queryid/queryId, at least for the variable
names as this is the name of the saved field in PlannedStmt.

-extern void pgstat_report_queryid(uint64 queryId, bool force);
+extern void pgstat_report_query_id(uint64 queryId, bool force);

But if we don't then it should be "uint64 query_id".

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-12 Thread Bruce Momjian

On Thu, Apr  8, 2021 at 01:01:42PM -0400, Bruce Momjian wrote:
> On Thu, Apr  8, 2021 at 12:51:06PM -0400, Álvaro Herrera wrote:
> > On 2021-Apr-08, Bruce Momjian wrote:
> > 
> > > pg_stat_activity.queryid is new, but I can imagine cases where you would
> > > join pg_stat_activity to pg_stat_statements to get an estimate of how
> > > long the query will take --- having one using an underscore and another
> > > one not seems odd.
> > 
> > OK.  So far, you have one vote for queryid (your own) and two votes for
> > query_id (mine and Julien's).  And even yourself were hesitating about
> > it earlier in the thread.
> 
> OK, if people are fine with pg_stat_activity.query_id not matching
> pg_stat_statements.queryid, I am fine with that.  I just don't want
> someone to say it was a big mistake later.  ;-)

OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I
have not changed any of the APIs which existed before this feature was
added, and are called "queryid" or "queryId" --- it is kind of a mess. 
I assume I should leave those unchanged.  It will also need a catversion
bump.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.



query_id.diff.gz
Description: application/gzip

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-08 Thread Bruce Momjian

On Thu, Apr  8, 2021 at 12:51:06PM -0400, Álvaro Herrera wrote:
> On 2021-Apr-08, Bruce Momjian wrote:
> 
> > pg_stat_activity.queryid is new, but I can imagine cases where you would
> > join pg_stat_activity to pg_stat_statements to get an estimate of how
> > long the query will take --- having one using an underscore and another
> > one not seems odd.
> 
> OK.  So far, you have one vote for queryid (your own) and two votes for
> query_id (mine and Julien's).  And even yourself were hesitating about
> it earlier in the thread.

OK, if people are fine with pg_stat_activity.query_id not matching
pg_stat_statements.queryid, I am fine with that.  I just don't want
someone to say it was a big mistake later.  ;-)

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-08 Thread Alvaro Herrera

On 2021-Apr-08, Bruce Momjian wrote:

> pg_stat_activity.queryid is new, but I can imagine cases where you would
> join pg_stat_activity to pg_stat_statements to get an estimate of how
> long the query will take --- having one using an underscore and another
> one not seems odd.

OK.  So far, you have one vote for queryid (your own) and two votes for
query_id (mine and Julien's).  And even yourself were hesitating about
it earlier in the thread.

-- 
Álvaro Herrera   Valdivia, Chile

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-08 Thread Bruce Momjian

On Fri, Apr  9, 2021 at 12:38:29AM +0800, Julien Rouhaud wrote:
> On Thu, Apr 08, 2021 at 11:34:25AM -0400, Bruce Momjian wrote:
> > 
> > OK, let's get some details.  First, pg_stat_statements.queryid already
> > exists (no underscore), and I don't think anyone wants to change that. 
> > 
> > pg_stat_activity.queryid is new, but I can imagine cases where you would
> > join pg_stat_activity to pg_stat_statements to get an estimate of how
> > long the query will take --- having one using an underscore and another
> > one not seems odd.
> 
> Indeed, and also being able to join with a USING clause rather than an ON 
> could
> also save some keystrokes.  But unfortunately, we already have (userid, dbid)
> on pg_stat_statements side vs (usesysid, datid) on pg_stat_activity side, so
> this unfortunately won't fix all the oddities.

Wow, good point.  Shame they don't match.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-08 Thread Julien Rouhaud

On Thu, Apr 08, 2021 at 11:34:25AM -0400, Bruce Momjian wrote:
> 
> OK, let's get some details.  First, pg_stat_statements.queryid already
> exists (no underscore), and I don't think anyone wants to change that. 
> 
> pg_stat_activity.queryid is new, but I can imagine cases where you would
> join pg_stat_activity to pg_stat_statements to get an estimate of how
> long the query will take --- having one using an underscore and another
> one not seems odd.

Indeed, and also being able to join with a USING clause rather than an ON could
also save some keystrokes.  But unfortunately, we already have (userid, dbid)
on pg_stat_statements side vs (usesysid, datid) on pg_stat_activity side, so
this unfortunately won't fix all the oddities.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-08 Thread Bruce Momjian

On Wed, Apr  7, 2021 at 11:27:04PM -0400, Álvaro Herrera wrote:
> On 2021-Apr-07, Bruce Momjian wrote:
> 
> > On Thu, Apr  8, 2021 at 10:38:08AM +0800, Julien Rouhaud wrote:
> 
> > > Thanks!  And I agree with using query_id in the new field names while 
> > > keeping
> > > queryid for pg_stat_statements to avoid unnecessary query breakage.
> > 
> > I think we need more feedback from the group.  Do people agree with the
> > idea above?  The question is what to call:
> > 
> > GUC compute_queryid
> > pg_stat_activity.queryid
> > pg_stat_statements.queryid
> > 
> > using "queryid" or "query_id", and do they have to match?
> 
> Seems a matter of personal preference.  Mine is to have the underscore
> everywhere in backend code (where this is new), and let it without the
> underscore in pg_stat_statements to avoid breaking existing code.  Seems
> to match what Julien is saying.

OK, let's get some details.  First, pg_stat_statements.queryid already
exists (no underscore), and I don't think anyone wants to change that. 

pg_stat_activity.queryid is new, but I can imagine cases where you would
join pg_stat_activity to pg_stat_statements to get an estimate of how
long the query will take --- having one using an underscore and another
one not seems odd.  Also, looking at the existing pg_stat_activity
columns, those don't use underscores before the "id" unless there is a
modifier before the "id", e.g. "pid", "xid":

SELECT  attname
FROMpg_namespace JOIN pg_class ON (pg_namespace.oid = relnamespace)
JOIN pg_attribute ON (pg_class.oid = pg_attribute.attrelid)
WHERE   nspname = 'pg_catalog' AND
relname = 'pg_stat_activity' AND
attname ~ 'id$';
   attname
-
 backend_xid
 datid
 leader_pid
 pid
 queryid
 usesysid

We don't have a modifier before queryid.

If people like query_id, and I do too, I am thinking we just keep
query_id as the GUC (compute_query_id), and just accept that the GUC and
SQL levels will not match.  This is exactly what we have now.  I brought
it up to be sure this is what we want,

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-08 Thread Bruce Momjian

On Thu, Apr  8, 2021 at 09:31:27PM +0800, Julien Rouhaud wrote:
> On Thu, Apr 08, 2021 at 08:27:20PM +0800, Julien Rouhaud wrote:
> > On Thu, Apr 08, 2021 at 05:46:07PM +0530, Amit Kapila wrote:
> > > 
> > > @@ -1421,8 +1421,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
> > > /* Setting debug_query_string for individual workers */
> > > debug_query_string = queryDesc->sourceText;
> > > 
> > > -   /* Report workers' query for monitoring purposes */
> > > +   /* Report workers' query and queryId for monitoring purposes */
> > > pgstat_report_activity(STATE_RUNNING, debug_query_string);
> > > +   pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
> > > 
> > > 
> > > Below lines down in ParallelQueryMain, we call ExecutorStart which
> > > will report queryid, so do we need it here?
> > 
> > Correct, it's not actually needed.  The overhead should be negligible but 
> > let's
> > get rid of it.  Updated fix patchset attached.
> 
> Sorry I messed up the last commit, v4 is ok.

Patch applied, thanks.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-08 Thread Julien Rouhaud

On Thu, Apr 08, 2021 at 08:27:20PM +0800, Julien Rouhaud wrote:
> On Thu, Apr 08, 2021 at 05:46:07PM +0530, Amit Kapila wrote:
> > 
> > @@ -1421,8 +1421,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
> > /* Setting debug_query_string for individual workers */
> > debug_query_string = queryDesc->sourceText;
> > 
> > -   /* Report workers' query for monitoring purposes */
> > +   /* Report workers' query and queryId for monitoring purposes */
> > pgstat_report_activity(STATE_RUNNING, debug_query_string);
> > +   pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
> > 
> > 
> > Below lines down in ParallelQueryMain, we call ExecutorStart which
> > will report queryid, so do we need it here?
> 
> Correct, it's not actually needed.  The overhead should be negligible but 
> let's
> get rid of it.  Updated fix patchset attached.

Sorry I messed up the last commit, v4 is ok.
>From d74523dfb76e7583c27166ec10d72670654c3b7b Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Thu, 8 Apr 2021 13:59:43 +0800
Subject: [PATCH v4 1/3] Ignore parallel workers in pg_stat_statements.

Oversight in 4f0b0966c8 which exposed queryid in parallel workers.  Counters
are aggregated by the main backend process so parallel workers would report
duplicated activity, and could also report activity for the wrong entry as they
are only aware of the top level queryid.

Author: Julien Rouhaud
Reported-by: Andres Freund
Discussion: https://postgr.es/m/20210408051735.lfbdzun5zdlax...@alap3.anarazel.de
---
 contrib/pg_stat_statements/pg_stat_statements.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index fc2677643b..dbd0d41d88 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -47,6 +47,7 @@
 #include 
 #include 
 
+#include "access/parallel.h"
 #include "catalog/pg_authid.h"
 #include "common/hashfn.h"
 #include "executor/instrument.h"
@@ -278,8 +279,9 @@ static bool pgss_save;			/* whether to save stats across shutdown */
 
 
 #define pgss_enabled(level) \
+	(!IsParallelWorker() && \
 	(pgss_track == PGSS_TRACK_ALL || \
-	(pgss_track == PGSS_TRACK_TOP && (level) == 0))
+	(pgss_track == PGSS_TRACK_TOP && (level) == 0)))
 
 #define record_gc_qtexts() \
 	do { \
-- 
2.30.1

>From 61ff6d226761fcc8f2a28fe8e313382d1d46f098 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Thu, 8 Apr 2021 20:05:14 +0800
Subject: [PATCH v4 2/3] Fix thinko in pg_stat_get_activity when retrieving the
 queryid.

---
 src/backend/utils/adt/pgstatfuncs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9fa4a93162..182b15e3f2 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -917,7 +917,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			if (beentry->st_queryid == 0)
 nulls[29] = true;
 			else
-values[29] = DatumGetUInt64(beentry->st_queryid);
+values[29] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
-- 
2.30.1

>From c3ab7472f1f75c9a26f1b77d748aa267e3483d1e Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Thu, 8 Apr 2021 20:25:19 +0800
Subject: [PATCH v4 3/3] Remove unnecessary call to pgstat_report_queryid().

Reported-by: Amit Kapila
---
 src/backend/executor/execParallel.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index d104a19767..4fca8782b2 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1426,7 +1426,6 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 
 	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
-	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
-- 
2.30.1

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-08 Thread Julien Rouhaud

On Thu, Apr 08, 2021 at 05:46:07PM +0530, Amit Kapila wrote:
> 
> @@ -1421,8 +1421,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
> /* Setting debug_query_string for individual workers */
> debug_query_string = queryDesc->sourceText;
> 
> -   /* Report workers' query for monitoring purposes */
> +   /* Report workers' query and queryId for monitoring purposes */
> pgstat_report_activity(STATE_RUNNING, debug_query_string);
> +   pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
> 
> 
> Below lines down in ParallelQueryMain, we call ExecutorStart which
> will report queryid, so do we need it here?

Correct, it's not actually needed.  The overhead should be negligible but let's
get rid of it.  Updated fix patchset attached.
>From d74523dfb76e7583c27166ec10d72670654c3b7b Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Thu, 8 Apr 2021 13:59:43 +0800
Subject: [PATCH v3 1/3] Ignore parallel workers in pg_stat_statements.

Oversight in 4f0b0966c8 which exposed queryid in parallel workers.  Counters
are aggregated by the main backend process so parallel workers would report
duplicated activity, and could also report activity for the wrong entry as they
are only aware of the top level queryid.

Author: Julien Rouhaud
Reported-by: Andres Freund
Discussion: https://postgr.es/m/20210408051735.lfbdzun5zdlax...@alap3.anarazel.de
---
 contrib/pg_stat_statements/pg_stat_statements.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index fc2677643b..dbd0d41d88 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -47,6 +47,7 @@
 #include 
 #include 
 
+#include "access/parallel.h"
 #include "catalog/pg_authid.h"
 #include "common/hashfn.h"
 #include "executor/instrument.h"
@@ -278,8 +279,9 @@ static bool pgss_save;			/* whether to save stats across shutdown */
 
 
 #define pgss_enabled(level) \
+	(!IsParallelWorker() && \
 	(pgss_track == PGSS_TRACK_ALL || \
-	(pgss_track == PGSS_TRACK_TOP && (level) == 0))
+	(pgss_track == PGSS_TRACK_TOP && (level) == 0)))
 
 #define record_gc_qtexts() \
 	do { \
-- 
2.30.1

>From 61ff6d226761fcc8f2a28fe8e313382d1d46f098 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Thu, 8 Apr 2021 20:05:14 +0800
Subject: [PATCH v3 2/3] Fix thinko in pg_stat_get_activity when retrieving the
 queryid.

---
 src/backend/utils/adt/pgstatfuncs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9fa4a93162..182b15e3f2 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -917,7 +917,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			if (beentry->st_queryid == 0)
 nulls[29] = true;
 			else
-values[29] = DatumGetUInt64(beentry->st_queryid);
+values[29] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
-- 
2.30.1

>From bc3b5470d12aab282e1b6f6b0b2f4afcc65b7dcf Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Thu, 8 Apr 2021 20:25:19 +0800
Subject: [PATCH v3 3/3] Remove unnecessary call to pgstat_report_queryid().

Reported-by: Amit Kapila
---
 src/backend/executor/execParallel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index d104a19767..bfd6155509 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1426,7 +1426,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
 
 	/* Report workers' query and queryId for monitoring purposes */
 	pgstat_report_activity(STATE_RUNNING, debug_query_string);
-	pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
+	//pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
 
 	/* Attach to the dynamic shared memory area. */
 	area_space = shm_toc_lookup(toc, PARALLEL_KEY_DSA, false);
-- 
2.30.1

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-08 Thread Amit Kapila

On Thu, Apr 8, 2021 at 9:47 AM Julien Rouhaud  wrote:
>
> On Wed, Apr 07, 2021 at 02:12:11PM -0400, Bruce Momjian wrote:
> >
> > Patch applied.  I am ready to adjust this with any improvements people
> > might have.  Thank you for all the good feedback we got on this, and I
> > know many users have waited a long time for this feature.
>
> Thanks a lot Bruce and everyone!  I hope that the users who waited a long time
> for this will find everything they need.
>

@@ -1421,8 +1421,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Setting debug_query_string for individual workers */
debug_query_string = queryDesc->sourceText;

-   /* Report workers' query for monitoring purposes */
+   /* Report workers' query and queryId for monitoring purposes */
pgstat_report_activity(STATE_RUNNING, debug_query_string);
+   pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);


Below lines down in ParallelQueryMain, we call ExecutorStart which
will report queryid, so do we need it here?

-- 
With Regards,
Amit Kapila.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-08 Thread Julien Rouhaud

On Thu, Apr 08, 2021 at 11:36:48PM +1200, Thomas Munro wrote:
> Hi Julien, Bruce,
> 
> A warning appears on 32 bit systems:
> 
> In file included from pgstatfuncs.c:15:
> pgstatfuncs.c: In function 'pg_stat_get_activity':
> ../../../../src/include/postgres.h:593:29: warning: cast to pointer
> from integer of different size [-Wint-to-pointer-cast]
>   593 | #define DatumGetPointer(X) ((Pointer) (X))
>   | ^
> ../../../../src/include/postgres.h:678:42: note: in expansion of macro
> 'DatumGetPointer'
>   678 | #define DatumGetUInt64(X) (* ((uint64 *) DatumGetPointer(X)))
>   |  ^~~
> pgstatfuncs.c:920:18: note: in expansion of macro 'DatumGetUInt64'
>   920 | values[29] = DatumGetUInt64(beentry->st_queryid);
>   |  ^~

Wow, that's really embarrassing :(

> Hmm, maybe this should be UInt64GetDatum()?

Yes definitely.  I'm attaching the previous patch for force_parallel_mode to
not forget it + a new one for this issue.
>From d74523dfb76e7583c27166ec10d72670654c3b7b Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Thu, 8 Apr 2021 13:59:43 +0800
Subject: [PATCH v2 1/2] Ignore parallel workers in pg_stat_statements.

Oversight in 4f0b0966c8 which exposed queryid in parallel workers.  Counters
are aggregated by the main backend process so parallel workers would report
duplicated activity, and could also report activity for the wrong entry as they
are only aware of the top level queryid.

Author: Julien Rouhaud
Reported-by: Andres Freund
Discussion: https://postgr.es/m/20210408051735.lfbdzun5zdlax...@alap3.anarazel.de
---
 contrib/pg_stat_statements/pg_stat_statements.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index fc2677643b..dbd0d41d88 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -47,6 +47,7 @@
 #include 
 #include 
 
+#include "access/parallel.h"
 #include "catalog/pg_authid.h"
 #include "common/hashfn.h"
 #include "executor/instrument.h"
@@ -278,8 +279,9 @@ static bool pgss_save;			/* whether to save stats across shutdown */
 
 
 #define pgss_enabled(level) \
+	(!IsParallelWorker() && \
 	(pgss_track == PGSS_TRACK_ALL || \
-	(pgss_track == PGSS_TRACK_TOP && (level) == 0))
+	(pgss_track == PGSS_TRACK_TOP && (level) == 0)))
 
 #define record_gc_qtexts() \
 	do { \
-- 
2.30.1

>From 61ff6d226761fcc8f2a28fe8e313382d1d46f098 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Thu, 8 Apr 2021 20:05:14 +0800
Subject: [PATCH v2 2/2] Fix thinko in pg_stat_get_activity when retrieving the
 queryid.

---
 src/backend/utils/adt/pgstatfuncs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9fa4a93162..182b15e3f2 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -917,7 +917,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			if (beentry->st_queryid == 0)
 nulls[29] = true;
 			else
-values[29] = DatumGetUInt64(beentry->st_queryid);
+values[29] = UInt64GetDatum(beentry->st_queryid);
 		}
 		else
 		{
-- 
2.30.1

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-08 Thread Thomas Munro

Hi Julien, Bruce,

A warning appears on 32 bit systems:

In file included from pgstatfuncs.c:15:
pgstatfuncs.c: In function 'pg_stat_get_activity':
../../../../src/include/postgres.h:593:29: warning: cast to pointer
from integer of different size [-Wint-to-pointer-cast]
  593 | #define DatumGetPointer(X) ((Pointer) (X))
  | ^
../../../../src/include/postgres.h:678:42: note: in expansion of macro
'DatumGetPointer'
  678 | #define DatumGetUInt64(X) (* ((uint64 *) DatumGetPointer(X)))
  |  ^~~
pgstatfuncs.c:920:18: note: in expansion of macro 'DatumGetUInt64'
  920 | values[29] = DatumGetUInt64(beentry->st_queryid);
  |  ^~

Hmm, maybe this should be UInt64GetDatum()?

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-07 Thread Julien Rouhaud

On Wed, Apr 07, 2021 at 02:12:11PM -0400, Bruce Momjian wrote:
> 
> Patch applied.  I am ready to adjust this with any improvements people
> might have.  Thank you for all the good feedback we got on this, and I
> know many users have waited a long time for this feature.

Thanks a lot Bruce and everyone!  I hope that the users who waited a long time
for this will find everything they need.

Just to validate that this patchset also allows user to use pg_stat_statements,
any additional third-party module and the new added infrastructure with the
queryid algorithm of their choice, I created a POC extension ([1]) which works
as expected.

Basically:

SHOW shared_preload_libraries;
 shared_preload_libraries
--
 pg_stat_statements, pg_queryid
(1 row)

SET pg_queryid.use_object_names TO on;
SET pg_queryid.ignore_schema TO on;

CREATE SCHEMA ns1; CREATE TABLE ns1.tbl1(id integer);
CREATE SCHEMA ns2; CREATE TABLE ns2.tbl1(id integer);

SET search_path TO ns1;
SELECT COUNT(*) FROM tbl1;
SET search_path TO ns2;
SELECT COUNT(*) FROM tbl1;

SELECT queryid, query, calls
FROM public.pg_stat_statements
WHERE query LIKE '%tbl%';
   queryid   |   query   | calls
-+---+---
 4629593225724429059 | SELECT count(*) from tbl1 | 2
(1 row)

So whether that's a good idea to do that or not, users now have a choice.

[1]: https://github.com/rjuju/pg_queryid

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-07 Thread Alvaro Herrera

On 2021-Apr-07, Bruce Momjian wrote:

> On Thu, Apr  8, 2021 at 10:38:08AM +0800, Julien Rouhaud wrote:

> > Thanks!  And I agree with using query_id in the new field names while 
> > keeping
> > queryid for pg_stat_statements to avoid unnecessary query breakage.
> 
> I think we need more feedback from the group.  Do people agree with the
> idea above?  The question is what to call:
> 
>   GUC compute_queryid
>   pg_stat_activity.queryid
>   pg_stat_statements.queryid
> 
> using "queryid" or "query_id", and do they have to match?

Seems a matter of personal preference.  Mine is to have the underscore
everywhere in backend code (where this is new), and let it without the
underscore in pg_stat_statements to avoid breaking existing code.  Seems
to match what Julien is saying.

-- 
Álvaro Herrera   Valdivia, Chile
"La libertad es como el dinero; el que no la sabe emplear la pierde" (Alvarez)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-07 Thread Bruce Momjian

On Thu, Apr  8, 2021 at 10:38:08AM +0800, Julien Rouhaud wrote:
> On Wed, Apr 07, 2021 at 10:31:01PM -0400, Bruce Momjian wrote:
> > On Wed, Apr  7, 2021 at 08:54:02PM -0400, Bruce Momjian wrote:
> > > > Unless I'm missing something this will output the query id in the next 
> > > > log
> > > > line?  The new code should be added before the newline is output, and 
> > > > the comma
> > > > should also be output before the queryid.
> > > 
> > > Yes, correct, updated patch attached.
> > 
> > Patch applied.
> 
> Thanks!  And I agree with using query_id in the new field names while keeping
> queryid for pg_stat_statements to avoid unnecessary query breakage.

I think we need more feedback from the group.  Do people agree with the
idea above?  The question is what to call:

GUC compute_queryid
pg_stat_activity.queryid
pg_stat_statements.queryid

using "queryid" or "query_id", and do they have to match?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-07 Thread Julien Rouhaud

On Wed, Apr 07, 2021 at 10:31:01PM -0400, Bruce Momjian wrote:
> On Wed, Apr  7, 2021 at 08:54:02PM -0400, Bruce Momjian wrote:
> > > Unless I'm missing something this will output the query id in the next log
> > > line?  The new code should be added before the newline is output, and the 
> > > comma
> > > should also be output before the queryid.
> > 
> > Yes, correct, updated patch attached.
> 
> Patch applied.

Thanks!  And I agree with using query_id in the new field names while keeping
queryid for pg_stat_statements to avoid unnecessary query breakage.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-07 Thread Bruce Momjian

On Wed, Apr  7, 2021 at 08:54:02PM -0400, Bruce Momjian wrote:
> > Unless I'm missing something this will output the query id in the next log
> > line?  The new code should be added before the newline is output, and the 
> > comma
> > should also be output before the queryid.
> 
> Yes, correct, updated patch attached.

Patch applied.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-07 Thread Bruce Momjian

On Wed, Apr  7, 2021 at 08:54:02PM -0400, Bruce Momjian wrote:
> > > I am also confused about the inconsistency of calling the GUC
> > > compute_query_id (with underscore), but pg_stat_activity.queryid.  If we
> > > make it pg_stat_activity.query_id, it doesn't match most of the other
> > > *id columsns in the table, leader_pid, usesysid, backend_xid.  Is that
> > > OK?I know I suggested pg_stat_activity.query_id, but maybe I was wrong.
> > 
> > Mmm, most of the columns in pg_stat_activity do have a "_", so using 
> > query_id
> > would make more sense.
> 
> OK, let me work on a patch to change that part.

Uh, it is 'queryid' in pg_stat_statements:

https://www.postgresql.org/docs/13/pgstatstatements.html

queryid bigint
Internal hash code, computed from the statement's parse tree

I am not sure if we should have pg_stat_activity use underscore, or the
GUC use underscore.  The problem is that queryid can easily look like
quer-yid.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-07 Thread Bruce Momjian

On Thu, Apr  8, 2021 at 08:47:48AM +0800, Julien Rouhaud wrote:
> On Wed, Apr 07, 2021 at 07:38:35PM -0400, Bruce Momjian wrote:
> > On Wed, Apr  7, 2021 at 07:01:25PM -0400, Tom Lane wrote:
> > > Bruce Momjian  writes:
> > > > Uh, I think your patch missed a few things.  First, you use "%zd"
> > > > (size_t) for the printf string, but calls to pgstat_get_my_queryid() in
> > > > src/backend/utils/error/elog.c used "%ld".  Which is correct?  I see
> > > > pgstat_get_my_queryid() as returning uint64, but I didn't think a uint64
> > > > fits in a BIGINT SQL column.
> > > 
> > > Neither is correct.  Project standard these days for printing [u]int64
> > > is to write "%lld" or "%llu", with an explicit (long long) cast on
> > > the printf argument.
> > 
> > Yep, got it.  The attached patch fixes all the calls to use %lld, and
> > adds casts.  In implementing cvslog, I noticed that internally we pass
> > the hash as uint64, but output as int64, which I think is a requirement
> > for how pg_stat_statements has output it, and the use of bigint.  Is
> > that OK?
> 
> Indeed, this is due to how we expose the value in SQL.  The original 
> discussion
> is at
> https://www.postgresql.org/message-id/cah2-wzkuemfamy3onoxli+g67sjoky65cg9z1qohsyhceu8...@mail.gmail.com.
> As far as I know this is OK, as we want to show consistent values everywhere.

OK, yes, I do remember the discussion.  I was wondering if there should
be a C comment about this anywhere.

> > I am also confused about the inconsistency of calling the GUC
> > compute_query_id (with underscore), but pg_stat_activity.queryid.  If we
> > make it pg_stat_activity.query_id, it doesn't match most of the other
> > *id columsns in the table, leader_pid, usesysid, backend_xid.  Is that
> > OK?I know I suggested pg_stat_activity.query_id, but maybe I was wrong.
> 
> Mmm, most of the columns in pg_stat_activity do have a "_", so using query_id
> would make more sense.

OK, let me work on a patch to change that part.

> @@ -2967,6 +2967,10 @@ write_csvlog(ErrorData *edata)
> 
>   appendStringInfoChar(, '\n');
> 
> + /* query id */
> + appendStringInfo(, "%lld", (long long) pgstat_get_my_queryid());
> + appendStringInfoChar(, ',');
> +
> 
>   /* If in the syslogger process, try to write messages direct to file */
>   if (MyBackendType == B_LOGGER)
>   write_syslogger_file(buf.data, buf.len, LOG_DESTINATION_CSVLOG);
>
> Unless I'm missing something this will output the query id in the next log
> line?  The new code should be added before the newline is output, and the 
> comma
> should also be output before the queryid.

Yes, correct, updated patch attached.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

>From 2e4beaef3690f7b61b91264a95a270e5410dad53 Mon Sep 17 00:00:00 2001
From: Bruce Momjian 
Date: Wed, 7 Apr 2021 20:51:10 -0400
Subject: [PATCH] csv squash commit

---
 doc/src/sgml/config.sgml   |  4 +++-
 doc/src/sgml/file-fdw.sgml |  3 ++-
 src/backend/utils/error/elog.c | 12 
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 963824d050..ea5cf3a2dc 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7310,7 +7310,8 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
 character count of the error position therein,
 location of the error in the PostgreSQL source code
 (if log_error_verbosity is set to verbose),
-application name, backend type, and process ID of parallel group leader.
+application name, backend type, process ID of parallel group leader,
+and query id.
 Here is a sample table definition for storing CSV-format log output:
 
 
@@ -7341,6 +7342,7 @@ CREATE TABLE postgres_log
   application_name text,
   backend_type text,
   leader_pid integer,
+  query_id bigint,
   PRIMARY KEY (session_id, session_line_num)
 );
 
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index 2e21806f48..5b98782064 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -266,7 +266,8 @@ CREATE FOREIGN TABLE pglog (
   location text,
   application_name text,
   backend_type text,
-  leader_pid integer
+  leader_pid integer,
+  query_id bigint
 ) SERVER pglog
 OPTIONS ( filename 'log/pglog.csv', format 'csv' );
 
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 1cf71a649b..a1ebe06d5b 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2716,11 +2716,11 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 break;
 			case 'Q':
 if (padding != 0)
-	appendStringInfo(buf, "%*ld", padding,
-			pgstat_get_my_queryid());
+	appendStringInfo(buf, "%*lld", padding,
+			(long long) pgstat_get_my_queryid());
 else
-

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-07 Thread Julien Rouhaud

On Wed, Apr 07, 2021 at 07:38:35PM -0400, Bruce Momjian wrote:
> On Wed, Apr  7, 2021 at 07:01:25PM -0400, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > Uh, I think your patch missed a few things.  First, you use "%zd"
> > > (size_t) for the printf string, but calls to pgstat_get_my_queryid() in
> > > src/backend/utils/error/elog.c used "%ld".  Which is correct?  I see
> > > pgstat_get_my_queryid() as returning uint64, but I didn't think a uint64
> > > fits in a BIGINT SQL column.
> > 
> > Neither is correct.  Project standard these days for printing [u]int64
> > is to write "%lld" or "%llu", with an explicit (long long) cast on
> > the printf argument.
> 
> Yep, got it.  The attached patch fixes all the calls to use %lld, and
> adds casts.  In implementing cvslog, I noticed that internally we pass
> the hash as uint64, but output as int64, which I think is a requirement
> for how pg_stat_statements has output it, and the use of bigint.  Is
> that OK?

Indeed, this is due to how we expose the value in SQL.  The original discussion
is at
https://www.postgresql.org/message-id/cah2-wzkuemfamy3onoxli+g67sjoky65cg9z1qohsyhceu8...@mail.gmail.com.
As far as I know this is OK, as we want to show consistent values everywhere.

> I am also confused about the inconsistency of calling the GUC
> compute_query_id (with underscore), but pg_stat_activity.queryid.  If we
> make it pg_stat_activity.query_id, it doesn't match most of the other
> *id columsns in the table, leader_pid, usesysid, backend_xid.  Is that
> OK?I know I suggested pg_stat_activity.query_id, but maybe I was wrong.

Mmm, most of the columns in pg_stat_activity do have a "_", so using query_id
would make more sense.

@@ -2967,6 +2967,10 @@ write_csvlog(ErrorData *edata)

appendStringInfoChar(, '\n');

+   /* query id */
+   appendStringInfo(, "%lld", (long long) pgstat_get_my_queryid());
+   appendStringInfoChar(, ',');
+

/* If in the syslogger process, try to write messages direct to file */
if (MyBackendType == B_LOGGER)
write_syslogger_file(buf.data, buf.len, LOG_DESTINATION_CSVLOG);


Unless I'm missing something this will output the query id in the next log
line?  The new code should be added before the newline is output, and the comma
should also be output before the queryid.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-07 Thread Bruce Momjian

On Wed, Apr  7, 2021 at 07:01:25PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > Uh, I think your patch missed a few things.  First, you use "%zd"
> > (size_t) for the printf string, but calls to pgstat_get_my_queryid() in
> > src/backend/utils/error/elog.c used "%ld".  Which is correct?  I see
> > pgstat_get_my_queryid() as returning uint64, but I didn't think a uint64
> > fits in a BIGINT SQL column.
> 
> Neither is correct.  Project standard these days for printing [u]int64
> is to write "%lld" or "%llu", with an explicit (long long) cast on
> the printf argument.

Yep, got it.  The attached patch fixes all the calls to use %lld, and
adds casts.  In implementing cvslog, I noticed that internally we pass
the hash as uint64, but output as int64, which I think is a requirement
for how pg_stat_statements has output it, and the use of bigint.  Is
that OK?

I am also confused about the inconsistency of calling the GUC
compute_query_id (with underscore), but pg_stat_activity.queryid.  If we
make it pg_stat_activity.query_id, it doesn't match most of the other
*id columsns in the table, leader_pid, usesysid, backend_xid.  Is that
OK?I know I suggested pg_stat_activity.query_id, but maybe I was wrong.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

>From 2d2bafb3c4c205ef5899074447f316d157345faf Mon Sep 17 00:00:00 2001
From: Bruce Momjian 
Date: Wed, 7 Apr 2021 19:29:09 -0400
Subject: [PATCH] csv squash commit

---
 doc/src/sgml/config.sgml   |  4 +++-
 doc/src/sgml/file-fdw.sgml |  3 ++-
 src/backend/utils/error/elog.c | 12 
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 963824d050..ea5cf3a2dc 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7310,7 +7310,8 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
 character count of the error position therein,
 location of the error in the PostgreSQL source code
 (if log_error_verbosity is set to verbose),
-application name, backend type, and process ID of parallel group leader.
+application name, backend type, process ID of parallel group leader,
+and query id.
 Here is a sample table definition for storing CSV-format log output:
 
 
@@ -7341,6 +7342,7 @@ CREATE TABLE postgres_log
   application_name text,
   backend_type text,
   leader_pid integer,
+  query_id bigint,
   PRIMARY KEY (session_id, session_line_num)
 );
 
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index 2e21806f48..5b98782064 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -266,7 +266,8 @@ CREATE FOREIGN TABLE pglog (
   location text,
   application_name text,
   backend_type text,
-  leader_pid integer
+  leader_pid integer,
+  query_id bigint
 ) SERVER pglog
 OPTIONS ( filename 'log/pglog.csv', format 'csv' );
 
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 1cf71a649b..1ac18d9a55 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2716,11 +2716,11 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 break;
 			case 'Q':
 if (padding != 0)
-	appendStringInfo(buf, "%*ld", padding,
-			pgstat_get_my_queryid());
+	appendStringInfo(buf, "%*lld", padding,
+			(long long) pgstat_get_my_queryid());
 else
-	appendStringInfo(buf, "%ld",
-			pgstat_get_my_queryid());
+	appendStringInfo(buf, "%lld",
+			(long long) pgstat_get_my_queryid());
 break;
 			default:
 /* format error - ignore it */
@@ -2967,6 +2967,10 @@ write_csvlog(ErrorData *edata)
 
 	appendStringInfoChar(, '\n');
 
+	/* query id */
+	appendStringInfo(, "%lld", (long long) pgstat_get_my_queryid());
+	appendStringInfoChar(, ',');
+
 	/* If in the syslogger process, try to write messages direct to file */
 	if (MyBackendType == B_LOGGER)
 		write_syslogger_file(buf.data, buf.len, LOG_DESTINATION_CSVLOG);
-- 
2.20.1

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-07 Thread Tom Lane

Bruce Momjian  writes:
> Uh, I think your patch missed a few things.  First, you use "%zd"
> (size_t) for the printf string, but calls to pgstat_get_my_queryid() in
> src/backend/utils/error/elog.c used "%ld".  Which is correct?  I see
> pgstat_get_my_queryid() as returning uint64, but I didn't think a uint64
> fits in a BIGINT SQL column.

Neither is correct.  Project standard these days for printing [u]int64
is to write "%lld" or "%llu", with an explicit (long long) cast on
the printf argument.

regards, tom lane

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-07 Thread Bruce Momjian

On Thu, Apr  8, 2021 at 05:56:25AM +0800, Julien Rouhaud wrote:
> On Wed, Apr 07, 2021 at 04:22:55PM -0400, Bruce Momjian wrote:
> > aOn Wed, Apr  7, 2021 at 04:15:50PM -0400, Tom Lane wrote:
> > > Bruce Momjian  writes:
> > > > Patch applied.  I am ready to adjust this with any improvements people
> > > > might have.  Thank you for all the good feedback we got on this, and I
> > > > know many users have waited a long time for this feature.
> > > 
> > > For starters, you could try to make the buildfarm green again.
> > 
> > Wow, that's odd.  The cfbot was green, so I never even looked at the
> > buildfarm.  I will look at that now, and the CVS log issue.
> 
> Sorry about that.  The issue came from animals with jit_above_cost = 0
> outputting more lines than expected.  I fixed that by using the same query as
> before in explain.sql, as they don't generate any JIT output.

Yes, I just came to the same conclusion, that 'SELECT 1' didn't generate
the proper output lines to allow explain_filter() to strip out the JIT
lines.  I have applied your patch for this, which should fix the build
farm. (I see my first green report now.)

> I also added the queryid to the csvlog output and fixed the documentation that
> mention how to create a table to access the data.

Uh, I think your patch missed a few things.  First, you use "%zd"
(size_t) for the printf string, but calls to pgstat_get_my_queryid() in
src/backend/utils/error/elog.c used "%ld".  Which is correct?  I see
pgstat_get_my_queryid() as returning uint64, but I didn't think a uint64
fits in a BIGINT SQL column.

Also, you missed the SGML paragraph doc change, but you correctly
changed the SQL table definition.

I am attaching my version of the patch.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

>From d6b0d010678bab519cc8a54893ff6c4affd34422 Mon Sep 17 00:00:00 2001
From: Bruce Momjian 
Date: Wed, 7 Apr 2021 18:37:25 -0400
Subject: [PATCH] csv squash commit

---
 doc/src/sgml/config.sgml   | 4 +++-
 doc/src/sgml/file-fdw.sgml | 3 ++-
 src/backend/utils/error/elog.c | 4 
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 963824d050..ea5cf3a2dc 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7310,7 +7310,8 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
 character count of the error position therein,
 location of the error in the PostgreSQL source code
 (if log_error_verbosity is set to verbose),
-application name, backend type, and process ID of parallel group leader.
+application name, backend type, process ID of parallel group leader,
+and query id.
 Here is a sample table definition for storing CSV-format log output:
 
 
@@ -7341,6 +7342,7 @@ CREATE TABLE postgres_log
   application_name text,
   backend_type text,
   leader_pid integer,
+  query_id bigint,
   PRIMARY KEY (session_id, session_line_num)
 );
 
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index 2e21806f48..2b531277de 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -266,7 +266,8 @@ CREATE FOREIGN TABLE pglog (
   location text,
   application_name text,
   backend_type text,
-  leader_pid integer
+  leader_pid integer,
+  query_id bigint,
 ) SERVER pglog
 OPTIONS ( filename 'log/pglog.csv', format 'csv' );
 
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 1cf71a649b..c3c2045580 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2967,6 +2967,10 @@ write_csvlog(ErrorData *edata)
 
 	appendStringInfoChar(, '\n');
 
+	/* query id */
+	appendStringInfo(, "%ld", pgstat_get_my_queryid());
+	appendStringInfoChar(, ',');
+
 	/* If in the syslogger process, try to write messages direct to file */
 	if (MyBackendType == B_LOGGER)
 		write_syslogger_file(buf.data, buf.len, LOG_DESTINATION_CSVLOG);
-- 
2.20.1

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-07 Thread Julien Rouhaud

On Thu, Apr 08, 2021 at 05:56:25AM +0800, Julien Rouhaud wrote:
> 
> I also added the queryid to the csvlog output and fixed the documentation that
> mention how to create a table to access the data.

Note that I chose to output a 0 queryid if none has been computed rather that
outputting nothing.  Let me know if that's not the wanted behavior.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-07 Thread Julien Rouhaud

On Wed, Apr 07, 2021 at 04:22:55PM -0400, Bruce Momjian wrote:
> aOn Wed, Apr  7, 2021 at 04:15:50PM -0400, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > Patch applied.  I am ready to adjust this with any improvements people
> > > might have.  Thank you for all the good feedback we got on this, and I
> > > know many users have waited a long time for this feature.
> > 
> > For starters, you could try to make the buildfarm green again.
> 
> Wow, that's odd.  The cfbot was green, so I never even looked at the
> buildfarm.  I will look at that now, and the CVS log issue.

Sorry about that.  The issue came from animals with jit_above_cost = 0
outputting more lines than expected.  I fixed that by using the same query as
before in explain.sql, as they don't generate any JIT output.

I also added the queryid to the csvlog output and fixed the documentation that
mention how to create a table to access the data.
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 963824d050..a9d0d63a57 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7341,6 +7341,7 @@ CREATE TABLE postgres_log
   application_name text,
   backend_type text,
   leader_pid integer,
+  queryid bigint,
   PRIMARY KEY (session_id, session_line_num)
 );
 
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index 2e21806f48..27827146f1 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -266,7 +266,8 @@ CREATE FOREIGN TABLE pglog (
   location text,
   application_name text,
   backend_type text,
-  leader_pid integer
+  leader_pid integer,
+  queryid bigint
 ) SERVER pglog
 OPTIONS ( filename 'log/pglog.csv', format 'csv' );
 
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 1cf71a649b..d27fb999d9 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2965,6 +2965,9 @@ write_csvlog(ErrorData *edata)
appendStringInfo(, "%d", leader->pid);
}
 
+   appendStringInfoChar(, ',');
+   appendStringInfo(, "%zd", pgstat_get_my_queryid());
+
appendStringInfoChar(, '\n');
 
/* If in the syslogger process, try to write messages direct to file */
diff --git a/src/test/regress/expected/explain.out 
b/src/test/regress/expected/explain.out
index 4c578d4f5e..cda28098ba 100644
--- a/src/test/regress/expected/explain.out
+++ b/src/test/regress/expected/explain.out
@@ -478,11 +478,11 @@ select jsonb_pretty(
 
 rollback;
 set compute_query_id = on;
-select explain_filter('explain (verbose) select 1');
- explain_filter 
-
- Result  (cost=N.N..N.N rows=N width=N)
-   Output: N
+select explain_filter('explain (verbose) select * from int8_tbl i8');
+ explain_filter 
+
+ Seq Scan on public.int8_tbl i8  (cost=N.N..N.N rows=N width=N)
+   Output: q1, q2
  Query Identifier: N
 (3 rows)
 
diff --git a/src/test/regress/sql/explain.sql b/src/test/regress/sql/explain.sql
index 468caf4037..3f9ae9843a 100644
--- a/src/test/regress/sql/explain.sql
+++ b/src/test/regress/sql/explain.sql
@@ -105,4 +105,4 @@ select jsonb_pretty(
 rollback;
 
 set compute_query_id = on;
-select explain_filter('explain (verbose) select 1');
+select explain_filter('explain (verbose) select * from int8_tbl i8');

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-07 Thread Bruce Momjian

aOn Wed, Apr  7, 2021 at 04:15:50PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > Patch applied.  I am ready to adjust this with any improvements people
> > might have.  Thank you for all the good feedback we got on this, and I
> > know many users have waited a long time for this feature.
> 
> For starters, you could try to make the buildfarm green again.

Wow, that's odd.  The cfbot was green, so I never even looked at the
buildfarm.  I will look at that now, and the CVS log issue.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-07 Thread Tom Lane

Bruce Momjian  writes:
> Patch applied.  I am ready to adjust this with any improvements people
> might have.  Thank you for all the good feedback we got on this, and I
> know many users have waited a long time for this feature.

For starters, you could try to make the buildfarm green again.

regards, tom lane

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-07 Thread Justin Pryzby

On Wed, Apr 07, 2021 at 02:12:11PM -0400, Bruce Momjian wrote:
> Patch applied.  I am ready to adjust this with any improvements people
> might have.  Thank you for all the good feedback we got on this, and I
> know many users have waited a long time for this feature.

If you support log_line_prefix 'Q', then you should also add to write_csvlog().

-- 
Justin

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-07 Thread Bruce Momjian

On Wed, Apr  7, 2021 at 08:57:26PM +0800, Julien Rouhaud wrote:
> On Wed, Apr 07, 2021 at 06:15:27PM +0530, Nitin Jadhav wrote:
> > 
> > I feel we should merge both of the conditions as it is done in
> > pgstat_report_xact_timestamp(). Probably we can write a common comment to
> > explain both the conditions.
> > 
> > [...]
> > 
> > Thanks for the explanation. Please add a comment explaining why there is no
> > loop.
> 
> PFA v24.

Patch applied.  I am ready to adjust this with any improvements people
might have.  Thank you for all the good feedback we got on this, and I
know many users have waited a long time for this feature.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-07 Thread Julien Rouhaud

On Wed, Apr 07, 2021 at 06:15:27PM +0530, Nitin Jadhav wrote:
> 
> I feel we should merge both of the conditions as it is done in
> pgstat_report_xact_timestamp(). Probably we can write a common comment to
> explain both the conditions.
> 
> [...]
> 
> Thanks for the explanation. Please add a comment explaining why there is no
> loop.

PFA v24.
>From 29eda2d08f3ed38bbf443898dfad645f5d279d96 Mon Sep 17 00:00:00 2001
From: Bruce Momjian 
Date: Mon, 22 Mar 2021 17:43:22 -0400
Subject: [PATCH v24 1/3] Move pg_stat_statements query jumbling to core.

A new compute_query_id GUC is also added, to control whether a query identifier
should be computed by the core.  It's thefore now possible to disable core
queryid computation and use pg_stat_statements with a different algorithm to
compute the query identifier by using third-party module.

To ensure that a single source of query identifier can be used and is well
defined, modules that calculate a query identifier should throw an error if
compute_query_id is enabled or if a query idenfitier was already calculated.
---
 .../pg_stat_statements/pg_stat_statements.c   | 805 +
 .../pg_stat_statements.conf   |   1 +
 doc/src/sgml/config.sgml  |  25 +
 doc/src/sgml/pgstatstatements.sgml|  20 +-
 src/backend/parser/analyze.c  |  14 +-
 src/backend/tcop/postgres.c   |   6 +-
 src/backend/utils/misc/Makefile   |   1 +
 src/backend/utils/misc/guc.c  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c  | 834 ++
 src/include/parser/analyze.h  |   4 +-
 src/include/utils/guc.h   |   1 +
 src/include/utils/queryjumble.h   |  58 ++
 13 files changed, 995 insertions(+), 785 deletions(-)
 create mode 100644 src/backend/utils/misc/queryjumble.c
 create mode 100644 src/include/utils/queryjumble.h

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 1141d2b067..0f8bac0cca 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -8,24 +8,9 @@
  * a shared hashtable.  (We track only as many distinct queries as will fit
  * in the designated amount of shared memory.)
  *
- * As of Postgres 9.2, this module normalizes query entries.  Normalization
- * is a process whereby similar queries, typically differing only in their
- * constants (though the exact rules are somewhat more subtle than that) are
- * recognized as equivalent, and are tracked as a single entry.  This is
- * particularly useful for non-prepared queries.
- *
- * Normalization is implemented by fingerprinting queries, selectively
- * serializing those fields of each query tree's nodes that are judged to be
- * essential to the query.  This is referred to as a query jumble.  This is
- * distinct from a regular serialization in that various extraneous
- * information is ignored as irrelevant or not essential to the query, such
- * as the collations of Vars and, most notably, the values of constants.
- *
- * This jumble is acquired at the end of parse analysis of each query, and
- * a 64-bit hash of it is stored into the query's Query.queryId field.
- * The server then copies this value around, making it available in plan
- * tree(s) generated from the query.  The executor can then use this value
- * to blame query costs on the proper queryId.
+ * Starting in Postgres 9.2, this module normalized query entries.  As of
+ * Postgres 14, the normalization is done by the core if compute_query_id is
+ * enabled, or optionally by third-party modules.
  *
  * To facilitate presenting entries to users, we create "representative" query
  * strings in which constants are replaced with parameter symbols ($n), to
@@ -116,8 +101,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
-#define JUMBLE_SIZE1024	/* query serialization buffer size */
-
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -237,40 +220,6 @@ typedef struct pgssSharedState
 	pgssGlobalStats stats;		/* global statistics for pgss */
 } pgssSharedState;
 
-/*
- * Struct for tracking locations/lengths of constants during normalization
- */
-typedef struct pgssLocationLen
-{
-	int			location;		/* start offset in query text */
-	int			length;			/* length in bytes, or -1 to ignore */
-} pgssLocationLen;
-
-/*
- * Working state for computing a query jumble and producing a normalized
- * query string
- */
-typedef struct pgssJumbleState
-{
-	/* Jumble of current query tree */
-	unsigned char *jumble;
-
-	/* Number of bytes used in jumble[] */
-	Size		jumble_len;
-
-	/* Array of locations of constants that should be

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-07 Thread Nitin Jadhav

>
> On Tue, Apr 06, 2021 at 11:41:52AM -0400, Alvaro Herrera wrote:
> > On 2021-Apr-06, Nitin Jadhav wrote:
> >
> > > I have reviewed the code. Here are a few minor comments.
> > >
> > > 1.
> > > +void
> > > +pgstat_report_queryid(uint64 queryId, bool force)
> > > +{
> > > + volatile PgBackendStatus *beentry = MyBEEntry;
> > > +
> > > + if (!beentry)
> > > + return;
> > > +
> > > + /*
> > > + * if track_activities is disabled, st_queryid should already have
> been
> > > + * reset
> > > + */
> > > + if (!pgstat_track_activities)
> > > + return;
> > >
> > > The above two conditions can be clubbed together in a single condition.
> >
> > I wonder if it wouldn't make more sense to put the assignment *after* we
> > have checked the second condition.
> All other pgstat_report_* functions do the assignment before doing any
> test on
> beentry and/or pgstat_track_activities, I think we should keep this code
> consistent.


I agree about this.

Thanks and Regards,
Nitin Jadhav


On Tue, Apr 6, 2021 at 9:18 PM Julien Rouhaud  wrote:

> On Tue, Apr 06, 2021 at 11:41:52AM -0400, Alvaro Herrera wrote:
> > On 2021-Apr-06, Nitin Jadhav wrote:
> >
> > > I have reviewed the code. Here are a few minor comments.
> > >
> > > 1.
> > > +void
> > > +pgstat_report_queryid(uint64 queryId, bool force)
> > > +{
> > > + volatile PgBackendStatus *beentry = MyBEEntry;
> > > +
> > > + if (!beentry)
> > > + return;
> > > +
> > > + /*
> > > + * if track_activities is disabled, st_queryid should already have
> been
> > > + * reset
> > > + */
> > > + if (!pgstat_track_activities)
> > > + return;
> > >
> > > The above two conditions can be clubbed together in a single condition.
> >
> > I wonder if it wouldn't make more sense to put the assignment *after* we
> > have checked the second condition.
>
> All other pgstat_report_* functions do the assignment before doing any
> test on
> beentry and/or pgstat_track_activities, I think we should keep this code
> consistent.
>

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-07 Thread Nitin Jadhav

>
> >
> > 1.
> > +void
> > +pgstat_report_queryid(uint64 queryId, bool force)
> > +{
> > + volatile PgBackendStatus *beentry = MyBEEntry;
> > +
> > + if (!beentry)
> > + return;
> > +
> > + /*
> > + * if track_activities is disabled, st_queryid should already have been
> > + * reset
> > + */
> > + if (!pgstat_track_activities)
> > + return;
> >
> > The above two conditions can be clubbed together in a single condition.
> Right, I just kept it separate as the comment is only relevant for the 2nd
> test.  I'm fine with merging both if needed.


I feel we should merge both of the conditions as it is done in
pgstat_report_xact_timestamp(). Probably we can write a common comment to
explain both the conditions.

> 2.
> > +/* --
> > + * pgstat_get_my_queryid() -
> > + *
> > + * Return current backend's query identifier.
> > + */
> > +uint64
> > +pgstat_get_my_queryid(void)
> > +{
> > + if (!MyBEEntry)
> > + return 0;
> > +
> > + return MyBEEntry->st_queryid;
> > +}
> >
> > Is it safe to directly read the data from MyBEEntry without
> > calling pgstat_begin_read_activity() and pgstat_end_read_activity().
> Kindly
> > ref pgstat_get_backend_current_activity() for more information. Kindly
> let
> > me know if I am wrong.
> This field is only written by a backend for its own entry.
> pg_stat_get_activity already has required protection, so the rest of the
> calls
> to read that field shouldn't have any risk of reading torn values on
> platform
> where this isn't an atomic operation due to concurrent write, as it will be
> from the same backend that originally wrote it.  It avoids some overhead to
> retrieve the queryid, but if people think it's worth having the loop (or a
> comment explaining why there's no loop) I'm also fine with it.


Thanks for the explanation. Please add a comment explaining why there is no
loop.

Thanks and Regards,
Nitin Jadhav

On Tue, Apr 6, 2021 at 8:40 PM Julien Rouhaud  wrote:

> On Tue, Apr 06, 2021 at 08:05:19PM +0530, Nitin Jadhav wrote:
> >
> > 1.
> > +void
> > +pgstat_report_queryid(uint64 queryId, bool force)
> > +{
> > + volatile PgBackendStatus *beentry = MyBEEntry;
> > +
> > + if (!beentry)
> > + return;
> > +
> > + /*
> > + * if track_activities is disabled, st_queryid should already have been
> > + * reset
> > + */
> > + if (!pgstat_track_activities)
> > + return;
> >
> > The above two conditions can be clubbed together in a single condition.
>
> Right, I just kept it separate as the comment is only relevant for the 2nd
> test.  I'm fine with merging both if needed.
>
> > 2.
> > +/* --
> > + * pgstat_get_my_queryid() -
> > + *
> > + * Return current backend's query identifier.
> > + */
> > +uint64
> > +pgstat_get_my_queryid(void)
> > +{
> > + if (!MyBEEntry)
> > + return 0;
> > +
> > + return MyBEEntry->st_queryid;
> > +}
> >
> > Is it safe to directly read the data from MyBEEntry without
> > calling pgstat_begin_read_activity() and pgstat_end_read_activity().
> Kindly
> > ref pgstat_get_backend_current_activity() for more information. Kindly
> let
> > me know if I am wrong.
>
> This field is only written by a backend for its own entry.
> pg_stat_get_activity already has required protection, so the rest of the
> calls
> to read that field shouldn't have any risk of reading torn values on
> platform
> where this isn't an atomic operation due to concurrent write, as it will be
> from the same backend that originally wrote it.  It avoids some overhead to
> retrieve the queryid, but if people think it's worth having the loop (or a
> comment explaining why there's no loop) I'm also fine with it.
>

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-06 Thread Julien Rouhaud

On Tue, Apr 06, 2021 at 11:41:52AM -0400, Alvaro Herrera wrote:
> On 2021-Apr-06, Nitin Jadhav wrote:
> 
> > I have reviewed the code. Here are a few minor comments.
> > 
> > 1.
> > +void
> > +pgstat_report_queryid(uint64 queryId, bool force)
> > +{
> > + volatile PgBackendStatus *beentry = MyBEEntry;
> > +
> > + if (!beentry)
> > + return;
> > +
> > + /*
> > + * if track_activities is disabled, st_queryid should already have been
> > + * reset
> > + */
> > + if (!pgstat_track_activities)
> > + return;
> > 
> > The above two conditions can be clubbed together in a single condition.
> 
> I wonder if it wouldn't make more sense to put the assignment *after* we
> have checked the second condition.

All other pgstat_report_* functions do the assignment before doing any test on
beentry and/or pgstat_track_activities, I think we should keep this code
consistent.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-06 Thread Alvaro Herrera

On 2021-Apr-06, Nitin Jadhav wrote:

> I have reviewed the code. Here are a few minor comments.
> 
> 1.
> +void
> +pgstat_report_queryid(uint64 queryId, bool force)
> +{
> + volatile PgBackendStatus *beentry = MyBEEntry;
> +
> + if (!beentry)
> + return;
> +
> + /*
> + * if track_activities is disabled, st_queryid should already have been
> + * reset
> + */
> + if (!pgstat_track_activities)
> + return;
> 
> The above two conditions can be clubbed together in a single condition.

I wonder if it wouldn't make more sense to put the assignment *after* we
have checked the second condition.

-- 
Álvaro Herrera   Valdivia, Chile

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-06 Thread Julien Rouhaud

On Tue, Apr 06, 2021 at 08:05:19PM +0530, Nitin Jadhav wrote:
> 
> 1.
> +void
> +pgstat_report_queryid(uint64 queryId, bool force)
> +{
> + volatile PgBackendStatus *beentry = MyBEEntry;
> +
> + if (!beentry)
> + return;
> +
> + /*
> + * if track_activities is disabled, st_queryid should already have been
> + * reset
> + */
> + if (!pgstat_track_activities)
> + return;
> 
> The above two conditions can be clubbed together in a single condition.

Right, I just kept it separate as the comment is only relevant for the 2nd
test.  I'm fine with merging both if needed.

> 2.
> +/* --
> + * pgstat_get_my_queryid() -
> + *
> + * Return current backend's query identifier.
> + */
> +uint64
> +pgstat_get_my_queryid(void)
> +{
> + if (!MyBEEntry)
> + return 0;
> +
> + return MyBEEntry->st_queryid;
> +}
> 
> Is it safe to directly read the data from MyBEEntry without
> calling pgstat_begin_read_activity() and pgstat_end_read_activity(). Kindly
> ref pgstat_get_backend_current_activity() for more information. Kindly let
> me know if I am wrong.

This field is only written by a backend for its own entry.
pg_stat_get_activity already has required protection, so the rest of the calls
to read that field shouldn't have any risk of reading torn values on platform
where this isn't an atomic operation due to concurrent write, as it will be
from the same backend that originally wrote it.  It avoids some overhead to
retrieve the queryid, but if people think it's worth having the loop (or a
comment explaining why there's no loop) I'm also fine with it.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-06 Thread Nitin Jadhav

I have reviewed the code. Here are a few minor comments.

1.
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ if (!beentry)
+ return;
+
+ /*
+ * if track_activities is disabled, st_queryid should already have been
+ * reset
+ */
+ if (!pgstat_track_activities)
+ return;

The above two conditions can be clubbed together in a single condition.

2.
+/* --
+ * pgstat_get_my_queryid() -
+ *
+ * Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+ if (!MyBEEntry)
+ return 0;
+
+ return MyBEEntry->st_queryid;
+}

Is it safe to directly read the data from MyBEEntry without
calling pgstat_begin_read_activity() and pgstat_end_read_activity(). Kindly
ref pgstat_get_backend_current_activity() for more information. Kindly let
me know if I am wrong.

Thanks and Regards,
Nitin Jadhav

On Mon, Apr 5, 2021 at 10:46 PM Bruce Momjian  wrote:

> On Sun, Apr  4, 2021 at 10:18:50PM +0800, Julien Rouhaud wrote:
> > On Fri, Apr 02, 2021 at 01:33:28PM +0800, Julien Rouhaud wrote:
> > > On Thu, Apr 01, 2021 at 03:27:11PM -0400, Bruce Momjian wrote:
> > > >
> > > > OK, I am happy with your design decisions, thanks.
> > >
> > > Thanks!  While double checking I noticed that I failed to remove a
> (now)
> > > useless include of pgstat.h in nodeGatherMerge.c in last version.  I'm
> > > attaching v22 to fix that, no other change.
> >
> > There was a conflict since e1025044c (Split backend status and progress
> related
> > functionality out of pgstat.c).
> >
> > Attached v23 is a rebase against current HEAD, and I also added a few
> > UINT64CONST() macro usage for consistency.
>
> Thanks.  I struggled with merging the statistics collection changes into
> my cluster file encryption branches because my patch made changes to
> code that moved to another C file.
>
> I plan to apply this tomorrow.
>
> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
>
>   If only the physical world exists, free will is an illusion.
>
>
>
>

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-05 Thread Bruce Momjian

On Sun, Apr  4, 2021 at 10:18:50PM +0800, Julien Rouhaud wrote:
> On Fri, Apr 02, 2021 at 01:33:28PM +0800, Julien Rouhaud wrote:
> > On Thu, Apr 01, 2021 at 03:27:11PM -0400, Bruce Momjian wrote:
> > > 
> > > OK, I am happy with your design decisions, thanks.
> > 
> > Thanks!  While double checking I noticed that I failed to remove a (now)
> > useless include of pgstat.h in nodeGatherMerge.c in last version.  I'm
> > attaching v22 to fix that, no other change.
> 
> There was a conflict since e1025044c (Split backend status and progress 
> related
> functionality out of pgstat.c).
> 
> Attached v23 is a rebase against current HEAD, and I also added a few
> UINT64CONST() macro usage for consistency.

Thanks.  I struggled with merging the statistics collection changes into
my cluster file encryption branches because my patch made changes to
code that moved to another C file.

I plan to apply this tomorrow.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-04 Thread Julien Rouhaud

On Fri, Apr 02, 2021 at 01:33:28PM +0800, Julien Rouhaud wrote:
> On Thu, Apr 01, 2021 at 03:27:11PM -0400, Bruce Momjian wrote:
> > 
> > OK, I am happy with your design decisions, thanks.
> 
> Thanks!  While double checking I noticed that I failed to remove a (now)
> useless include of pgstat.h in nodeGatherMerge.c in last version.  I'm
> attaching v22 to fix that, no other change.

There was a conflict since e1025044c (Split backend status and progress related
functionality out of pgstat.c).

Attached v23 is a rebase against current HEAD, and I also added a few
UINT64CONST() macro usage for consistency.
>From 29eda2d08f3ed38bbf443898dfad645f5d279d96 Mon Sep 17 00:00:00 2001
From: Bruce Momjian 
Date: Mon, 22 Mar 2021 17:43:22 -0400
Subject: [PATCH v23 1/3] Move pg_stat_statements query jumbling to core.

A new compute_query_id GUC is also added, to control whether a query identifier
should be computed by the core.  It's thefore now possible to disable core
queryid computation and use pg_stat_statements with a different algorithm to
compute the query identifier by using third-party module.

To ensure that a single source of query identifier can be used and is well
defined, modules that calculate a query identifier should throw an error if
compute_query_id is enabled or if a query idenfitier was already calculated.
---
 .../pg_stat_statements/pg_stat_statements.c   | 805 +
 .../pg_stat_statements.conf   |   1 +
 doc/src/sgml/config.sgml  |  25 +
 doc/src/sgml/pgstatstatements.sgml|  20 +-
 src/backend/parser/analyze.c  |  14 +-
 src/backend/tcop/postgres.c   |   6 +-
 src/backend/utils/misc/Makefile   |   1 +
 src/backend/utils/misc/guc.c  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c  | 834 ++
 src/include/parser/analyze.h  |   4 +-
 src/include/utils/guc.h   |   1 +
 src/include/utils/queryjumble.h   |  58 ++
 13 files changed, 995 insertions(+), 785 deletions(-)
 create mode 100644 src/backend/utils/misc/queryjumble.c
 create mode 100644 src/include/utils/queryjumble.h

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 1141d2b067..0f8bac0cca 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -8,24 +8,9 @@
  * a shared hashtable.  (We track only as many distinct queries as will fit
  * in the designated amount of shared memory.)
  *
- * As of Postgres 9.2, this module normalizes query entries.  Normalization
- * is a process whereby similar queries, typically differing only in their
- * constants (though the exact rules are somewhat more subtle than that) are
- * recognized as equivalent, and are tracked as a single entry.  This is
- * particularly useful for non-prepared queries.
- *
- * Normalization is implemented by fingerprinting queries, selectively
- * serializing those fields of each query tree's nodes that are judged to be
- * essential to the query.  This is referred to as a query jumble.  This is
- * distinct from a regular serialization in that various extraneous
- * information is ignored as irrelevant or not essential to the query, such
- * as the collations of Vars and, most notably, the values of constants.
- *
- * This jumble is acquired at the end of parse analysis of each query, and
- * a 64-bit hash of it is stored into the query's Query.queryId field.
- * The server then copies this value around, making it available in plan
- * tree(s) generated from the query.  The executor can then use this value
- * to blame query costs on the proper queryId.
+ * Starting in Postgres 9.2, this module normalized query entries.  As of
+ * Postgres 14, the normalization is done by the core if compute_query_id is
+ * enabled, or optionally by third-party modules.
  *
  * To facilitate presenting entries to users, we create "representative" query
  * strings in which constants are replaced with parameter symbols ($n), to
@@ -116,8 +101,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
-#define JUMBLE_SIZE1024	/* query serialization buffer size */
-
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -237,40 +220,6 @@ typedef struct pgssSharedState
 	pgssGlobalStats stats;		/* global statistics for pgss */
 } pgssSharedState;
 
-/*
- * Struct for tracking locations/lengths of constants during normalization
- */
-typedef struct pgssLocationLen
-{
-	int			location;		/* start offset in query text */
-	int			length;			/* length in bytes, or -1 to ignore */
-} pgssLocationLen;
-
-/*
- * Working state for computing a query

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-01 Thread Julien Rouhaud

On Thu, Apr 01, 2021 at 03:27:11PM -0400, Bruce Momjian wrote:
> 
> OK, I am happy with your design decisions, thanks.

Thanks!  While double checking I noticed that I failed to remove a (now)
useless include of pgstat.h in nodeGatherMerge.c in last version.  I'm
attaching v22 to fix that, no other change.
>From 819a45faf520dfd60b4fe3e9aea71e3a2b69 Mon Sep 17 00:00:00 2001
From: Bruce Momjian 
Date: Mon, 22 Mar 2021 17:43:22 -0400
Subject: [PATCH v22 1/3] Move pg_stat_statements query jumbling to core.

A new compute_query_id GUC is also added, to control whether a query identifier
should be computed by the core.  It's thefore now possible to disable core
queryid computation and use pg_stat_statements with a different algorithm to
compute the query identifier by using third-party module.

To ensure that a single source of query identifier can be used and is well
defined, modules that calculate a query identifier should throw an error if
compute_query_id is enabled or if a query idenfitier was already calculated.
---
 .../pg_stat_statements/pg_stat_statements.c   | 805 +
 .../pg_stat_statements.conf   |   1 +
 doc/src/sgml/config.sgml  |  25 +
 doc/src/sgml/pgstatstatements.sgml|  20 +-
 src/backend/parser/analyze.c  |  14 +-
 src/backend/tcop/postgres.c   |   6 +-
 src/backend/utils/misc/Makefile   |   1 +
 src/backend/utils/misc/guc.c  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c  | 834 ++
 src/include/parser/analyze.h  |   4 +-
 src/include/utils/guc.h   |   1 +
 src/include/utils/queryjumble.h   |  58 ++
 13 files changed, 995 insertions(+), 785 deletions(-)
 create mode 100644 src/backend/utils/misc/queryjumble.c
 create mode 100644 src/include/utils/queryjumble.h

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 62cccbfa44..bd8c96728c 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -8,24 +8,9 @@
  * a shared hashtable.  (We track only as many distinct queries as will fit
  * in the designated amount of shared memory.)
  *
- * As of Postgres 9.2, this module normalizes query entries.  Normalization
- * is a process whereby similar queries, typically differing only in their
- * constants (though the exact rules are somewhat more subtle than that) are
- * recognized as equivalent, and are tracked as a single entry.  This is
- * particularly useful for non-prepared queries.
- *
- * Normalization is implemented by fingerprinting queries, selectively
- * serializing those fields of each query tree's nodes that are judged to be
- * essential to the query.  This is referred to as a query jumble.  This is
- * distinct from a regular serialization in that various extraneous
- * information is ignored as irrelevant or not essential to the query, such
- * as the collations of Vars and, most notably, the values of constants.
- *
- * This jumble is acquired at the end of parse analysis of each query, and
- * a 64-bit hash of it is stored into the query's Query.queryId field.
- * The server then copies this value around, making it available in plan
- * tree(s) generated from the query.  The executor can then use this value
- * to blame query costs on the proper queryId.
+ * Starting in Postgres 9.2, this module normalized query entries.  As of
+ * Postgres 14, the normalization is done by the core if compute_query_id is
+ * enabled, or optionally by third-party modules.
  *
  * To facilitate presenting entries to users, we create "representative" query
  * strings in which constants are replaced with parameter symbols ($n), to
@@ -114,8 +99,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
-#define JUMBLE_SIZE1024	/* query serialization buffer size */
-
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -235,40 +218,6 @@ typedef struct pgssSharedState
 	pgssGlobalStats stats;		/* global statistics for pgss */
 } pgssSharedState;
 
-/*
- * Struct for tracking locations/lengths of constants during normalization
- */
-typedef struct pgssLocationLen
-{
-	int			location;		/* start offset in query text */
-	int			length;			/* length in bytes, or -1 to ignore */
-} pgssLocationLen;
-
-/*
- * Working state for computing a query jumble and producing a normalized
- * query string
- */
-typedef struct pgssJumbleState
-{
-	/* Jumble of current query tree */
-	unsigned char *jumble;
-
-	/* Number of bytes used in jumble[] */
-	Size		jumble_len;
-
-	/* Array of locations of constants that should be removed */
-	pgssLocationLen

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-01 Thread Bruce Momjian

On Fri, Apr  2, 2021 at 02:28:02AM +0800, Julien Rouhaud wrote:
> Unless I'm missing something query_string isn't a global variable, it's a
> parameter passed to exec_simple_query() from postgresMain().
> 
> It's then passed to the stats collector to be able to be displayed in
> pg_stat_activity through pgstat_report_activity() a bit like what I do for the
> queryid.
> 
> There's a global variable debug_query_string, but it's only for debugging
> purpose.
> 
> > > I wonder if the query hash
> > > should be a global variable too --- this would more clearly match how we
> > > handle top-level info like query_string.  Digging into the stats system
> > > to get top-level info does seem odd.
> 
> The main difference is that there's a single top level query_string,
> even if it contains multiple statements.  But there would be multiple queryid
> calculated in that case and we don't want to change it during a top level
> multi-statements execution, so we can't use the same approach.
> 
> Also, the query_string is directly logged from this code path, while the
> queryid is logged as a log_line_prefix, and almost all the code there also
> retrieve information from some shared structure.
> 
> And since it also has to be available in pg_stat_activity, having a single
> source of truth looked like a better approach.
> 
> > Also, if you go in that direction, make sure the hash it set in the same
> > places the query string is set, though I am unclear how extensions would
> > handle that.
> 
> It should be transparent for application, it's extracting the first queryid
> seen for each top level statement and export it.  The rest of the code still
> continue to see the queryid that corresponds to the really executed single
> statement.

OK, I am happy with your design decisions, thanks.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-01 Thread Julien Rouhaud

On Thu, Apr 01, 2021 at 01:59:15PM -0400, Bruce Momjian wrote:
> On Thu, Apr  1, 2021 at 01:56:42PM -0400, Bruce Momjian wrote:
> > You are using:
> > 
> > /* --
> >  * pgstat_get_my_queryid() -
> >  *
> >  *  Return current backend's query identifier.
> >  */
> > uint64
> > pgstat_get_my_queryid(void)
> > {
> > if (!MyBEEntry)
> > return 0;
> > 
> > return MyBEEntry->st_queryid;
> > }
> > 
> > Looking at log_statement:
> > 
> > /* Log immediately if dictated by log_statement */
> > if (check_log_statement(parsetree_list))
> > {
> > ereport(LOG,
> > (errmsg("statement: %s", query_string),
> >  errhidestmt(true),
> >  errdetail_execute(parsetree_list)));
> > was_logged = true;
> > }
> > 
> > it uses the global variable query_string.

Unless I'm missing something query_string isn't a global variable, it's a
parameter passed to exec_simple_query() from postgresMain().

It's then passed to the stats collector to be able to be displayed in
pg_stat_activity through pgstat_report_activity() a bit like what I do for the
queryid.

There's a global variable debug_query_string, but it's only for debugging
purpose.

> > I wonder if the query hash
> > should be a global variable too --- this would more clearly match how we
> > handle top-level info like query_string.  Digging into the stats system
> > to get top-level info does seem odd.

The main difference is that there's a single top level query_string,
even if it contains multiple statements.  But there would be multiple queryid
calculated in that case and we don't want to change it during a top level
multi-statements execution, so we can't use the same approach.

Also, the query_string is directly logged from this code path, while the
queryid is logged as a log_line_prefix, and almost all the code there also
retrieve information from some shared structure.

And since it also has to be available in pg_stat_activity, having a single
source of truth looked like a better approach.

> Also, if you go in that direction, make sure the hash it set in the same
> places the query string is set, though I am unclear how extensions would
> handle that.

It should be transparent for application, it's extracting the first queryid
seen for each top level statement and export it.  The rest of the code still
continue to see the queryid that corresponds to the really executed single
statement.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-01 Thread Bruce Momjian

On Thu, Apr  1, 2021 at 01:56:42PM -0400, Bruce Momjian wrote:
> You are using:
> 
>   /* --
>* pgstat_get_my_queryid() -
>*
>*  Return current backend's query identifier.
>*/
>   uint64
>   pgstat_get_my_queryid(void)
>   {
>   if (!MyBEEntry)
>   return 0;
>   
>   return MyBEEntry->st_queryid;
>   }
> 
> Looking at log_statement:
> 
>   /* Log immediately if dictated by log_statement */
>   if (check_log_statement(parsetree_list))
>   {
>   ereport(LOG,
>   (errmsg("statement: %s", query_string),
>errhidestmt(true),
>errdetail_execute(parsetree_list)));
>   was_logged = true;
>   }
> 
> it uses the global variable query_string.  I wonder if the query hash
> should be a global variable too --- this would more clearly match how we
> handle top-level info like query_string.  Digging into the stats system
> to get top-level info does seem odd.

Also, if you go in that direction, make sure the hash it set in the same
places the query string is set, though I am unclear how extensions would
handle that.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-01 Thread Bruce Momjian

On Thu, Apr  1, 2021 at 11:30:15PM +0800, Julien Rouhaud wrote:
> On Thu, Apr 01, 2021 at 11:05:24PM +0800, Julien Rouhaud wrote:
> > On Wed, Mar 31, 2021 at 11:18:45AM -0300, Alvaro Herrera wrote:
> > > On 2021-Mar-31, Bruce Momjian wrote:
> > > > 
> > > > I assume it is since Alvaro didn't reply.  I am planning to apply this
> > > > soon.
> > > 
> > > I'm afraid I don't know enough about how parallel query works to make a
> > > good assessment on this being a good approach or not -- and no time at
> > > present to figure it all out.
> > 
> > I'm far from being an expert either, but at the time I wrote it and
> > looking at the code around it probably seemed sensible.  We could directly 
> > call
> > pgstat_get_my_queryid() in ExecSerializePlan() rather than passing it from 
> > the
> > various callers though, at least there would be a single source for it.
> 
> Here's a v21 that includes the mentioned change.

You are using:

/* --
 * pgstat_get_my_queryid() -
 *
 *  Return current backend's query identifier.
 */
uint64
pgstat_get_my_queryid(void)
{
if (!MyBEEntry)
return 0;

return MyBEEntry->st_queryid;
}

Looking at log_statement:

/* Log immediately if dictated by log_statement */
if (check_log_statement(parsetree_list))
{
ereport(LOG,
(errmsg("statement: %s", query_string),
 errhidestmt(true),
 errdetail_execute(parsetree_list)));
was_logged = true;
}

it uses the global variable query_string.  I wonder if the query hash
should be a global variable too --- this would more clearly match how we
handle top-level info like query_string.  Digging into the stats system
to get top-level info does seem odd.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-01 Thread Julien Rouhaud

On Thu, Apr 01, 2021 at 11:05:24PM +0800, Julien Rouhaud wrote:
> On Wed, Mar 31, 2021 at 11:18:45AM -0300, Alvaro Herrera wrote:
> > On 2021-Mar-31, Bruce Momjian wrote:
> > > 
> > > I assume it is since Alvaro didn't reply.  I am planning to apply this
> > > soon.
> > 
> > I'm afraid I don't know enough about how parallel query works to make a
> > good assessment on this being a good approach or not -- and no time at
> > present to figure it all out.
> 
> I'm far from being an expert either, but at the time I wrote it and
> looking at the code around it probably seemed sensible.  We could directly 
> call
> pgstat_get_my_queryid() in ExecSerializePlan() rather than passing it from the
> various callers though, at least there would be a single source for it.

Here's a v21 that includes the mentioned change.
>From 819a45faf520dfd60b4fe3e9aea71e3a2b69 Mon Sep 17 00:00:00 2001
From: Bruce Momjian 
Date: Mon, 22 Mar 2021 17:43:22 -0400
Subject: [PATCH v21 1/3] Move pg_stat_statements query jumbling to core.

A new compute_query_id GUC is also added, to control whether a query identifier
should be computed by the core.  It's thefore now possible to disable core
queryid computation and use pg_stat_statements with a different algorithm to
compute the query identifier by using third-party module.

To ensure that a single source of query identifier can be used and is well
defined, modules that calculate a query identifier should throw an error if
compute_query_id is enabled or if a query idenfitier was already calculated.
---
 .../pg_stat_statements/pg_stat_statements.c   | 805 +
 .../pg_stat_statements.conf   |   1 +
 doc/src/sgml/config.sgml  |  25 +
 doc/src/sgml/pgstatstatements.sgml|  20 +-
 src/backend/parser/analyze.c  |  14 +-
 src/backend/tcop/postgres.c   |   6 +-
 src/backend/utils/misc/Makefile   |   1 +
 src/backend/utils/misc/guc.c  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c  | 834 ++
 src/include/parser/analyze.h  |   4 +-
 src/include/utils/guc.h   |   1 +
 src/include/utils/queryjumble.h   |  58 ++
 13 files changed, 995 insertions(+), 785 deletions(-)
 create mode 100644 src/backend/utils/misc/queryjumble.c
 create mode 100644 src/include/utils/queryjumble.h

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 62cccbfa44..bd8c96728c 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -8,24 +8,9 @@
  * a shared hashtable.  (We track only as many distinct queries as will fit
  * in the designated amount of shared memory.)
  *
- * As of Postgres 9.2, this module normalizes query entries.  Normalization
- * is a process whereby similar queries, typically differing only in their
- * constants (though the exact rules are somewhat more subtle than that) are
- * recognized as equivalent, and are tracked as a single entry.  This is
- * particularly useful for non-prepared queries.
- *
- * Normalization is implemented by fingerprinting queries, selectively
- * serializing those fields of each query tree's nodes that are judged to be
- * essential to the query.  This is referred to as a query jumble.  This is
- * distinct from a regular serialization in that various extraneous
- * information is ignored as irrelevant or not essential to the query, such
- * as the collations of Vars and, most notably, the values of constants.
- *
- * This jumble is acquired at the end of parse analysis of each query, and
- * a 64-bit hash of it is stored into the query's Query.queryId field.
- * The server then copies this value around, making it available in plan
- * tree(s) generated from the query.  The executor can then use this value
- * to blame query costs on the proper queryId.
+ * Starting in Postgres 9.2, this module normalized query entries.  As of
+ * Postgres 14, the normalization is done by the core if compute_query_id is
+ * enabled, or optionally by third-party modules.
  *
  * To facilitate presenting entries to users, we create "representative" query
  * strings in which constants are replaced with parameter symbols ($n), to
@@ -114,8 +99,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
-#define JUMBLE_SIZE1024	/* query serialization buffer size */
-
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -235,40 +218,6 @@ typedef struct pgssSharedState
 	pgssGlobalStats stats;		/* global statistics for pgss */
 } pgssSharedState;
 
-/*
- * Struct for tracking locations/lengths of constants during normalization
- */
-typedef

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-04-01 Thread Julien Rouhaud

On Wed, Mar 31, 2021 at 11:18:45AM -0300, Alvaro Herrera wrote:
> On 2021-Mar-31, Bruce Momjian wrote:
> > 
> > I assume it is since Alvaro didn't reply.  I am planning to apply this
> > soon.
> 
> I'm afraid I don't know enough about how parallel query works to make a
> good assessment on this being a good approach or not -- and no time at
> present to figure it all out.

I'm far from being an expert either, but at the time I wrote it and
looking at the code around it probably seemed sensible.  We could directly call
pgstat_get_my_queryid() in ExecSerializePlan() rather than passing it from the
various callers though, at least there would be a single source for it.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-31 Thread Alvaro Herrera

On 2021-Mar-31, Bruce Momjian wrote:

> On Wed, Mar 31, 2021 at 11:25:32AM +0800, Julien Rouhaud wrote:
> > On Thu, Mar 25, 2021 at 10:36:38AM +0800, Julien Rouhaud wrote:
> > > On Wed, Mar 24, 2021 at 01:02:00PM -0300, Alvaro Herrera wrote:

> > > > I find it odd that there's executor code that acquires the current query
> > > > ID from pgstat, after having been put there by planner or ExecutorStart
> > > > itself.  Seems like a modularity violation.  I wonder if it would make
> > > > more sense to have the value maybe in struct EState (or perhaps there's
> > > > a better place -- but I don't think they have a way to reach the
> > > > QueryDesc anyhow), put there by ExecutorStart, so that places such as
> > > > execParallel, nodeGather etc don't have to fetch it from pgstat but from
> > > > EState.
> > > 
> > > The current queryid is already available in the Estate, as the underlying
> > > PlannedStmt contains it.  The problem is that we want to display the top 
> > > level
> > > queryid, not the current query one, and the top level queryid is held in
> > > pgstat.
> > 
> > So is the current approach ok?  If not I'm afraid that detecting and caching
> > the top level queryid in the executor parts would lead to some code
> > duplication.
> 
> I assume it is since Alvaro didn't reply.  I am planning to apply this
> soon.

I'm afraid I don't know enough about how parallel query works to make a
good assessment on this being a good approach or not -- and no time at
present to figure it all out.

-- 
Álvaro Herrera39°49'30"S 73°17'W
"I think my standards have lowered enough that now I think 'good design'
is when the page doesn't irritate the living f*ck out of me." (JWZ)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-31 Thread Bruce Momjian

On Wed, Mar 31, 2021 at 11:25:32AM +0800, Julien Rouhaud wrote:
> On Thu, Mar 25, 2021 at 10:36:38AM +0800, Julien Rouhaud wrote:
> > On Wed, Mar 24, 2021 at 01:02:00PM -0300, Alvaro Herrera wrote:
> > > On 2021-Mar-24, Julien Rouhaud wrote:
> > > 
> > > > From e08c9d5fc86ba722844d97000798de868890aba3 Mon Sep 17 00:00:00 2001
> > > > From: Bruce Momjian 
> > > > Date: Mon, 22 Mar 2021 17:43:23 -0400
> > > > Subject: [PATCH v20 2/3] Expose queryid in pg_stat_activity and
> > > 
> > > >  src/backend/executor/execMain.c   |   9 ++
> > > >  src/backend/executor/execParallel.c   |  14 ++-
> > > >  src/backend/executor/nodeGather.c |   3 +-
> > > >  src/backend/executor/nodeGatherMerge.c|   4 +-
> > > 
> > > Hmm...
> > > 
> > > I find it odd that there's executor code that acquires the current query
> > > ID from pgstat, after having been put there by planner or ExecutorStart
> > > itself.  Seems like a modularity violation.  I wonder if it would make
> > > more sense to have the value maybe in struct EState (or perhaps there's
> > > a better place -- but I don't think they have a way to reach the
> > > QueryDesc anyhow), put there by ExecutorStart, so that places such as
> > > execParallel, nodeGather etc don't have to fetch it from pgstat but from
> > > EState.
> > 
> > The current queryid is already available in the Estate, as the underlying
> > PlannedStmt contains it.  The problem is that we want to display the top 
> > level
> > queryid, not the current query one, and the top level queryid is held in
> > pgstat.
> 
> So is the current approach ok?  If not I'm afraid that detecting and caching
> the top level queryid in the executor parts would lead to some code
> duplication.

I assume it is since Alvaro didn't reply.  I am planning to apply this
soon.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-30 Thread Julien Rouhaud

On Thu, Mar 25, 2021 at 10:36:38AM +0800, Julien Rouhaud wrote:
> On Wed, Mar 24, 2021 at 01:02:00PM -0300, Alvaro Herrera wrote:
> > On 2021-Mar-24, Julien Rouhaud wrote:
> > 
> > > From e08c9d5fc86ba722844d97000798de868890aba3 Mon Sep 17 00:00:00 2001
> > > From: Bruce Momjian 
> > > Date: Mon, 22 Mar 2021 17:43:23 -0400
> > > Subject: [PATCH v20 2/3] Expose queryid in pg_stat_activity and
> > 
> > >  src/backend/executor/execMain.c   |   9 ++
> > >  src/backend/executor/execParallel.c   |  14 ++-
> > >  src/backend/executor/nodeGather.c |   3 +-
> > >  src/backend/executor/nodeGatherMerge.c|   4 +-
> > 
> > Hmm...
> > 
> > I find it odd that there's executor code that acquires the current query
> > ID from pgstat, after having been put there by planner or ExecutorStart
> > itself.  Seems like a modularity violation.  I wonder if it would make
> > more sense to have the value maybe in struct EState (or perhaps there's
> > a better place -- but I don't think they have a way to reach the
> > QueryDesc anyhow), put there by ExecutorStart, so that places such as
> > execParallel, nodeGather etc don't have to fetch it from pgstat but from
> > EState.
> 
> The current queryid is already available in the Estate, as the underlying
> PlannedStmt contains it.  The problem is that we want to display the top level
> queryid, not the current query one, and the top level queryid is held in
> pgstat.

So is the current approach ok?  If not I'm afraid that detecting and caching
the top level queryid in the executor parts would lead to some code
duplication.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-25 Thread Bruce Momjian

On Wed, Mar 24, 2021 at 11:20:49PM +0800, Julien Rouhaud wrote:
> On Wed, Mar 24, 2021 at 08:13:40AM -0400, Bruce Momjian wrote:
> > I have no local modifications.  Please modify the patch I posted and
> > repost your version, thanks.
> 
> Ok!  I used the last version of the patch you sent and addressed the following
> comments from earlier messages in attached v20:
> 
> - copyright year to 2021
> - s/has has been compute/has been compute/
> - use the name CleanQuerytext in the first commit

My apologies --- yes, I made those two changes after I posted my version
of the patch.  I should have reposted my version with those changes.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-24 Thread Julien Rouhaud

On Wed, Mar 24, 2021 at 01:02:00PM -0300, Alvaro Herrera wrote:
> On 2021-Mar-24, Julien Rouhaud wrote:
> 
> > From e08c9d5fc86ba722844d97000798de868890aba3 Mon Sep 17 00:00:00 2001
> > From: Bruce Momjian 
> > Date: Mon, 22 Mar 2021 17:43:23 -0400
> > Subject: [PATCH v20 2/3] Expose queryid in pg_stat_activity and
> 
> >  src/backend/executor/execMain.c   |   9 ++
> >  src/backend/executor/execParallel.c   |  14 ++-
> >  src/backend/executor/nodeGather.c |   3 +-
> >  src/backend/executor/nodeGatherMerge.c|   4 +-
> 
> Hmm...
> 
> I find it odd that there's executor code that acquires the current query
> ID from pgstat, after having been put there by planner or ExecutorStart
> itself.  Seems like a modularity violation.  I wonder if it would make
> more sense to have the value maybe in struct EState (or perhaps there's
> a better place -- but I don't think they have a way to reach the
> QueryDesc anyhow), put there by ExecutorStart, so that places such as
> execParallel, nodeGather etc don't have to fetch it from pgstat but from
> EState.

The current queryid is already available in the Estate, as the underlying
PlannedStmt contains it.  The problem is that we want to display the top level
queryid, not the current query one, and the top level queryid is held in
pgstat.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-24 Thread Alvaro Herrera

On 2021-Mar-24, Julien Rouhaud wrote:

> From e08c9d5fc86ba722844d97000798de868890aba3 Mon Sep 17 00:00:00 2001
> From: Bruce Momjian 
> Date: Mon, 22 Mar 2021 17:43:23 -0400
> Subject: [PATCH v20 2/3] Expose queryid in pg_stat_activity and

>  src/backend/executor/execMain.c   |   9 ++
>  src/backend/executor/execParallel.c   |  14 ++-
>  src/backend/executor/nodeGather.c |   3 +-
>  src/backend/executor/nodeGatherMerge.c|   4 +-

Hmm...

I find it odd that there's executor code that acquires the current query
ID from pgstat, after having been put there by planner or ExecutorStart
itself.  Seems like a modularity violation.  I wonder if it would make
more sense to have the value maybe in struct EState (or perhaps there's
a better place -- but I don't think they have a way to reach the
QueryDesc anyhow), put there by ExecutorStart, so that places such as
execParallel, nodeGather etc don't have to fetch it from pgstat but from
EState.

-- 
Álvaro Herrera   Valdivia, Chile
"We're here to devour each other alive"(Hobbes)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-24 Thread Julien Rouhaud

On Wed, Mar 24, 2021 at 08:13:40AM -0400, Bruce Momjian wrote:
> On Wed, Mar 24, 2021 at 04:51:40PM +0800, Julien Rouhaud wrote:
> > > but if you do that it'd be better to avoid
> > > introducing a function with one name and renaming it in your next
> > > commit.
> > 
> > Oops, I apparently messed a fixup when working on it.  Bruce, should I take
> > care of that of do you want to?  I think you have some local modifications
> > already I'd rather not miss some changes.
> 
> I have no local modifications.  Please modify the patch I posted and
> repost your version, thanks.

Ok!  I used the last version of the patch you sent and addressed the following
comments from earlier messages in attached v20:

- copyright year to 2021
- s/has has been compute/has been compute/
- use the name CleanQuerytext in the first commit

I didn't change the position of queryid in pg_stat_get_activity(), as the
"real" order is actually define in system_views.sql when creating
pg_stat_activity view.  Adding the new fields at the end of
pg_stat_get_activity() helps to keep the C code simpler and less bug prone, so
I think it's best to continue this way.

I also used the previous commit message if that helps.
>From 5df95cb13c3505d654bd480c8978fe6f5eba00bb Mon Sep 17 00:00:00 2001
From: Bruce Momjian 
Date: Mon, 22 Mar 2021 17:43:22 -0400
Subject: [PATCH v20 1/3] Move pg_stat_statements query jumbling to core.

A new compute_query_id GUC is also added, to control whether a query identifier
should be computed by the core.  It's thefore now possible to disable core
queryid computation and use pg_stat_statements with a different algorithm to
compute the query identifier by using third-party module.

To ensure that a single source of query identifier can be used and is well
defined, modules that calculate a query identifier should throw an error if
compute_query_id is enabled or if a query idenfitier was already calculated.
---
 .../pg_stat_statements/pg_stat_statements.c   | 805 +
 .../pg_stat_statements.conf   |   1 +
 doc/src/sgml/config.sgml  |  25 +
 doc/src/sgml/pgstatstatements.sgml|  20 +-
 src/backend/parser/analyze.c  |  14 +-
 src/backend/tcop/postgres.c   |   6 +-
 src/backend/utils/misc/Makefile   |   1 +
 src/backend/utils/misc/guc.c  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c  | 834 ++
 src/include/parser/analyze.h  |   4 +-
 src/include/utils/guc.h   |   1 +
 src/include/utils/queryjumble.h   |  58 ++
 13 files changed, 995 insertions(+), 785 deletions(-)
 create mode 100644 src/backend/utils/misc/queryjumble.c
 create mode 100644 src/include/utils/queryjumble.h

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 62cccbfa44..bd8c96728c 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -8,24 +8,9 @@
  * a shared hashtable.  (We track only as many distinct queries as will fit
  * in the designated amount of shared memory.)
  *
- * As of Postgres 9.2, this module normalizes query entries.  Normalization
- * is a process whereby similar queries, typically differing only in their
- * constants (though the exact rules are somewhat more subtle than that) are
- * recognized as equivalent, and are tracked as a single entry.  This is
- * particularly useful for non-prepared queries.
- *
- * Normalization is implemented by fingerprinting queries, selectively
- * serializing those fields of each query tree's nodes that are judged to be
- * essential to the query.  This is referred to as a query jumble.  This is
- * distinct from a regular serialization in that various extraneous
- * information is ignored as irrelevant or not essential to the query, such
- * as the collations of Vars and, most notably, the values of constants.
- *
- * This jumble is acquired at the end of parse analysis of each query, and
- * a 64-bit hash of it is stored into the query's Query.queryId field.
- * The server then copies this value around, making it available in plan
- * tree(s) generated from the query.  The executor can then use this value
- * to blame query costs on the proper queryId.
+ * Starting in Postgres 9.2, this module normalized query entries.  As of
+ * Postgres 14, the normalization is done by the core if compute_query_id is
+ * enabled, or optionally by third-party modules.
  *
  * To facilitate presenting entries to users, we create "representative" query
  * strings in which constants are replaced with parameter symbols ($n), to
@@ -114,8 +99,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-24 Thread Bruce Momjian

On Wed, Mar 24, 2021 at 04:51:40PM +0800, Julien Rouhaud wrote:
> > but if you do that it'd be better to avoid
> > introducing a function with one name and renaming it in your next
> > commit.
> 
> Oops, I apparently messed a fixup when working on it.  Bruce, should I take
> care of that of do you want to?  I think you have some local modifications
> already I'd rather not miss some changes.

I have no local modifications.  Please modify the patch I posted and
repost your version, thanks.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-24 Thread Julien Rouhaud

On Wed, Mar 24, 2021 at 05:12:35AM -0300, Alvaro Herrera wrote:
> On 2021-Mar-22, Bruce Momjian wrote:
> 
> > diff --git a/src/include/catalog/pg_proc.dat 
> > b/src/include/catalog/pg_proc.dat
> > index e259531f60..9550de0798 100644
> > --- a/src/include/catalog/pg_proc.dat
> > +++ b/src/include/catalog/pg_proc.dat
> > @@ -5249,9 +5249,9 @@
> >proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
> >proretset => 't', provolatile => 's', proparallel => 'r',
> >prorettype => 'record', proargtypes => 'int4',
> > -  proallargtypes => 
> > '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4}',
> > -  proargmodes => 
> > '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
> > -  proargnames => 
> > '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
> > +  proallargtypes => 
> > '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
> > +  proargmodes => 
> > '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
> > +  proargnames => 
> > '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,queryid}',
> 
> BTW why do you put the queryid column at the end of the column list
> here?  It seems awkward.  Can we put it perhaps between state and query?

I thought that it would be better to have it at the end as it can always be
NULL (and will be by default), which I guess was also the reason to have
leader_pid there.  I'm all in favor to have queryid near the query, and
while at it leader_pid near the pid.

> > -const char *clean_querytext(const char *query, int *location, int *len);
> > +const char *CleanQuerytext(const char *query, int *location, int *len);
> >  JumbleState *JumbleQuery(Query *query, const char *querytext);
> 
> I think pushing in more than one commit is a reasonable approach if they
> are well-contained

They should, as I incrementally built on top of the first one.  I also just
double checked the patchset and each new commit compiles and passes the
regression tests.

> but if you do that it'd be better to avoid
> introducing a function with one name and renaming it in your next
> commit.

Oops, I apparently messed a fixup when working on it.  Bruce, should I take
care of that of do you want to?  I think you have some local modifications
already I'd rather not miss some changes.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-24 Thread Alvaro Herrera

On 2021-Mar-22, Bruce Momjian wrote:

> diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
> index e259531f60..9550de0798 100644
> --- a/src/include/catalog/pg_proc.dat
> +++ b/src/include/catalog/pg_proc.dat
> @@ -5249,9 +5249,9 @@
>proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
>proretset => 't', provolatile => 's', proparallel => 'r',
>prorettype => 'record', proargtypes => 'int4',
> -  proallargtypes => 
> '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4}',
> -  proargmodes => 
> '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
> -  proargnames => 
> '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
> +  proallargtypes => 
> '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
> +  proargmodes => 
> '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
> +  proargnames => 
> '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,queryid}',

BTW why do you put the queryid column at the end of the column list
here?  It seems awkward.  Can we put it perhaps between state and query?


> -const char *clean_querytext(const char *query, int *location, int *len);
> +const char *CleanQuerytext(const char *query, int *location, int *len);
>  JumbleState *JumbleQuery(Query *query, const char *querytext);

I think pushing in more than one commit is a reasonable approach if they
are well-contained, but if you do that it'd be better to avoid
introducing a function with one name and renaming it in your next
commit.

-- 
Álvaro Herrera   Valdivia, Chile
"Just treat us the way you want to be treated + some extra allowance
 for ignorance."(Michael Brusser)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-23 Thread Julien Rouhaud

On Tue, Mar 23, 2021 at 12:27:10PM -0400, Bruce Momjian wrote:
> 
> No, I was thinking of just doing a single commit.  Should I do three
> commits?  I posted it as three patches since that is how it was posted
> by the author, and reviewing is easier.  It also will need a catversion
> bump.

Yes, I originally split the commit because it was easier to write this way and
it seemed better to send different patches too to ease review.

I think that it would make sense to commit the first patch separately, but I'm
fine with a single commit if you prefer.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-23 Thread Julien Rouhaud

On Tue, Mar 23, 2021 at 10:34:38AM -0400, Bruce Momjian wrote:
> On Tue, Mar 23, 2021 at 02:36:27PM +0800, Julien Rouhaud wrote:
> > 
> > Is that an oversight in ca3b37487be333a1d241dab1bbdd17a211a88f43, at least 
> > for
> > non .po files?
> 
> No, I don't think so.  We don't change the Free Software Foundation
> copyrights, and the .po files get loaded from another repository
> occasionally.  The hex/sha copyrights came from patches developed in
> 2020 but committed in 2021.  These will mostly be corrected in 2022.

Ok, thanks for the clarification!

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-23 Thread Bruce Momjian

On Tue, Mar 23, 2021 at 12:12:03PM -0300, Álvaro Herrera wrote:
> On 2021-Mar-22, Bruce Momjian wrote:
> 
> > --- a/doc/src/sgml/ref/explain.sgml
> > +++ b/doc/src/sgml/ref/explain.sgml
> > @@ -136,8 +136,10 @@ ROLLBACK;
> >the output column list for each node in the plan tree, schema-qualify
> >table and function names, always label variables in expressions with
> >their range table alias, and always print the name of each trigger 
> > for
> > -  which statistics are displayed.  This parameter defaults to
> > -  FALSE.
> > +  which statistics are displayed.  The query identifier will also be
> > +  displayed if one has been compute, see  > +  linkend="guc-compute-query-id"/> for more details.  This parameter
> > +  defaults to FALSE.
> 
> Typo here, "has been computed".

Good catch, fixed.

> Is the intention to commit each of these patches separately?

No, I was thinking of just doing a single commit.  Should I do three
commits?  I posted it as three patches since that is how it was posted
by the author, and reviewing is easier.  It also will need a catversion
bump.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-23 Thread Alvaro Herrera

On 2021-Mar-22, Bruce Momjian wrote:

> --- a/doc/src/sgml/ref/explain.sgml
> +++ b/doc/src/sgml/ref/explain.sgml
> @@ -136,8 +136,10 @@ ROLLBACK;
>the output column list for each node in the plan tree, schema-qualify
>table and function names, always label variables in expressions with
>their range table alias, and always print the name of each trigger for
> -  which statistics are displayed.  This parameter defaults to
> -  FALSE.
> +  which statistics are displayed.  The query identifier will also be
> +  displayed if one has been compute, see  +  linkend="guc-compute-query-id"/> for more details.  This parameter
> +  defaults to FALSE.

Typo here, "has been computed".

Is the intention to commit each of these patches separately?

-- 
Álvaro Herrera   Valdivia, Chile

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-23 Thread Bruce Momjian

On Tue, Mar 23, 2021 at 02:36:27PM +0800, Julien Rouhaud wrote:
> On Mon, Mar 22, 2021 at 08:43:40PM -0400, Bruce Momjian wrote:
> > On Mon, Mar 22, 2021 at 05:17:15PM -0700, Zhihong Yu wrote:
> > > Hi,
> > > For queryjumble.c :
> > > 
> > > + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
> > > 
> > > The year should be updated.
> > > Same with queryjumble.h
> > 
> > Thanks, fixed.
> 
> Thanks also for taking care of that.  While at it I see that current HEAD has 
> a
> lot of files with the same problem:
> 
> $ git grep "\-2020"
> config/config.guess:#   Copyright 1992-2020 Free Software Foundation, Inc.
> config/config.guess:Copyright 1992-2020 Free Software Foundation, Inc.
> config/config.sub:#   Copyright 1992-2020 Free Software Foundation, Inc.
> config/config.sub:Copyright 1992-2020 Free Software Foundation, Inc.
> contrib/pageinspect/gistfuncs.c: * Copyright (c) 2014-2020, PostgreSQL Global 
> Development Group
> src/backend/rewrite/rewriteSearchCycle.c: * Portions Copyright (c) 1996-2020, 
> PostgreSQL Global Development Group
> src/backend/utils/adt/jsonbsubs.c: * Portions Copyright (c) 1996-2020, 
> PostgreSQL Global Development Group
> src/bin/pg_archivecleanup/po/de.po:# Copyright (C) 2019-2020 PostgreSQL 
> Global Development Group
> src/bin/pg_rewind/po/de.po:# Copyright (C) 2015-2020 PostgreSQL Global 
> Development Group
> src/bin/pg_rewind/po/de.po:# Peter Eisentraut , 
> 2015-2020.
> src/common/hex.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global 
> Development Group
> src/common/sha1.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global 
> Development Group
> src/common/sha1_int.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global 
> Development Group
> src/include/common/hex.h: * Portions Copyright (c) 1996-2020, PostgreSQL 
> Global Development Group
> src/include/common/sha1.h: * Portions Copyright (c) 1996-2020, PostgreSQL 
> Global Development Group
> src/include/port/pg_iovec.h: * Portions Copyright (c) 1996-2020, PostgreSQL 
> Global Development Group
> src/include/rewrite/rewriteSearchCycle.h: * Portions Copyright (c) 1996-2020, 
> PostgreSQL Global Development Group
> src/interfaces/ecpg/preproc/po/de.po:# Copyright (C) 2009-2020 PostgreSQL 
> Global Development Group
> src/interfaces/ecpg/preproc/po/de.po:# Peter Eisentraut 
> , 2009-2020.
> 
> Is that an oversight in ca3b37487be333a1d241dab1bbdd17a211a88f43, at least for
> non .po files?

No, I don't think so.  We don't change the Free Software Foundation
copyrights, and the .po files get loaded from another repository
occasionally.  The hex/sha copyrights came from patches developed in
2020 but committed in 2021.  These will mostly be corrected in 2022.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-23 Thread Julien Rouhaud

On Mon, Mar 22, 2021 at 08:43:40PM -0400, Bruce Momjian wrote:
> On Mon, Mar 22, 2021 at 05:17:15PM -0700, Zhihong Yu wrote:
> > Hi,
> > For queryjumble.c :
> > 
> > + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
> > 
> > The year should be updated.
> > Same with queryjumble.h
> 
> Thanks, fixed.

Thanks also for taking care of that.  While at it I see that current HEAD has a
lot of files with the same problem:

$ git grep "\-2020"
config/config.guess:#   Copyright 1992-2020 Free Software Foundation, Inc.
config/config.guess:Copyright 1992-2020 Free Software Foundation, Inc.
config/config.sub:#   Copyright 1992-2020 Free Software Foundation, Inc.
config/config.sub:Copyright 1992-2020 Free Software Foundation, Inc.
contrib/pageinspect/gistfuncs.c: * Copyright (c) 2014-2020, PostgreSQL Global 
Development Group
src/backend/rewrite/rewriteSearchCycle.c: * Portions Copyright (c) 1996-2020, 
PostgreSQL Global Development Group
src/backend/utils/adt/jsonbsubs.c: * Portions Copyright (c) 1996-2020, 
PostgreSQL Global Development Group
src/bin/pg_archivecleanup/po/de.po:# Copyright (C) 2019-2020 PostgreSQL Global 
Development Group
src/bin/pg_rewind/po/de.po:# Copyright (C) 2015-2020 PostgreSQL Global 
Development Group
src/bin/pg_rewind/po/de.po:# Peter Eisentraut , 2015-2020.
src/common/hex.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global 
Development Group
src/common/sha1.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global 
Development Group
src/common/sha1_int.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global 
Development Group
src/include/common/hex.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global 
Development Group
src/include/common/sha1.h: * Portions Copyright (c) 1996-2020, PostgreSQL 
Global Development Group
src/include/port/pg_iovec.h: * Portions Copyright (c) 1996-2020, PostgreSQL 
Global Development Group
src/include/rewrite/rewriteSearchCycle.h: * Portions Copyright (c) 1996-2020, 
PostgreSQL Global Development Group
src/interfaces/ecpg/preproc/po/de.po:# Copyright (C) 2009-2020 PostgreSQL 
Global Development Group
src/interfaces/ecpg/preproc/po/de.po:# Peter Eisentraut , 
2009-2020.

Is that an oversight in ca3b37487be333a1d241dab1bbdd17a211a88f43, at least for
non .po files?

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-23 Thread Julien Rouhaud

On Mon, Mar 22, 2021 at 05:55:54PM -0400, Bruce Momjian wrote:
> 
> OK, after reading the entire thread, I don't think there are any
> remaining open issues with this patch and I think this is ready for
> committing.  I have adjusted the doc section of the patches, attached. 
> I have marked myself as committer in the commitfest app and hope to
> apply it in the next few days based on feedback.

Thanks a lot Bruce!

I looked at the changes in the attached patches and that's a clear
improvements, thanks a lot for that.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-22 Thread Bruce Momjian

On Mon, Mar 22, 2021 at 05:17:15PM -0700, Zhihong Yu wrote:
> Hi,
> For queryjumble.c :
> 
> + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
> 
> The year should be updated.
> Same with queryjumble.h

Thanks, fixed.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-22 Thread Zhihong Yu

Hi,
For queryjumble.c :

+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group

The year should be updated.
Same with queryjumble.h

Cheers

On Mon, Mar 22, 2021 at 2:56 PM Bruce Momjian  wrote:

> On Sat, Mar 20, 2021 at 02:12:34PM +0800, Julien Rouhaud wrote:
> > On Fri, Mar 19, 2021 at 12:53:18PM -0400, Bruce Momjian wrote:
> > >
> > > Well, given we don't really want to support multiple query id types
> > > being generated or displayed, the "error out" above should fix it.
> > >
> > > Let's do this --- tell extensions to error out if the query id is
> > > already set, either by compute_query_id or another extension.  If an
> > > extension wants to generate its own query id and store is internal to
> > > the extension, that is fine, but the server-displayed query id should
> be
> > > generated once and never overwritten by an extension.
> >
> > Agreed, this will ensure that you won't dynamically change the queryid
> source.
> >
> > We should also document that changing it requires a restart and calling
> > pg_stat_statements_reset() afterwards.
> >
> > v19 adds some changes, plus extra documentation for pg_stat_statements
> about
> > the requirement for a queryid to be calculated, and a note that all
> documented
> > details only apply for in-core source.  I'm not sure if this is still
> the best
> > place to document those details anymore though.
>
> OK, after reading the entire thread, I don't think there are any
> remaining open issues with this patch and I think this is ready for
> committing.  I have adjusted the doc section of the patches, attached.
> I have marked myself as committer in the commitfest app and hope to
> apply it in the next few days based on feedback.
>
> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
>
>   If only the physical world exists, free will is an illusion.
>
>

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-22 Thread Bruce Momjian

On Sat, Mar 20, 2021 at 02:12:34PM +0800, Julien Rouhaud wrote:
> On Fri, Mar 19, 2021 at 12:53:18PM -0400, Bruce Momjian wrote:
> > 
> > Well, given we don't really want to support multiple query id types
> > being generated or displayed, the "error out" above should fix it. 
> > 
> > Let's do this --- tell extensions to error out if the query id is
> > already set, either by compute_query_id or another extension.  If an
> > extension wants to generate its own query id and store is internal to
> > the extension, that is fine, but the server-displayed query id should be
> > generated once and never overwritten by an extension.
> 
> Agreed, this will ensure that you won't dynamically change the queryid source.
> 
> We should also document that changing it requires a restart and calling
> pg_stat_statements_reset() afterwards.
> 
> v19 adds some changes, plus extra documentation for pg_stat_statements about
> the requirement for a queryid to be calculated, and a note that all documented
> details only apply for in-core source.  I'm not sure if this is still the best
> place to document those details anymore though.

OK, after reading the entire thread, I don't think there are any
remaining open issues with this patch and I think this is ready for
committing.  I have adjusted the doc section of the patches, attached. 
I have marked myself as committer in the commitfest app and hope to
apply it in the next few days based on feedback.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

>From 5ab783123fc11e948963de7a7c3e6428051e3315 Mon Sep 17 00:00:00 2001
From: Bruce Momjian 
Date: Mon, 22 Mar 2021 17:43:22 -0400
Subject: [PATCH] qid-01-jumble_over_master squash commit

---
 .../pg_stat_statements/pg_stat_statements.c   | 805 +
 .../pg_stat_statements.conf   |   1 +
 doc/src/sgml/config.sgml  |  25 +
 doc/src/sgml/pgstatstatements.sgml|  20 +-
 src/backend/parser/analyze.c  |  14 +-
 src/backend/tcop/postgres.c   |   6 +-
 src/backend/utils/misc/Makefile   |   1 +
 src/backend/utils/misc/guc.c  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c (new)| 834 ++
 src/include/parser/analyze.h  |   4 +-
 src/include/utils/guc.h   |   1 +
 src/include/utils/queryjumble.h (new) |  58 ++
 13 files changed, 995 insertions(+), 785 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 62cccbfa44..bd8c96728c 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -8,24 +8,9 @@
  * a shared hashtable.  (We track only as many distinct queries as will fit
  * in the designated amount of shared memory.)
  *
- * As of Postgres 9.2, this module normalizes query entries.  Normalization
- * is a process whereby similar queries, typically differing only in their
- * constants (though the exact rules are somewhat more subtle than that) are
- * recognized as equivalent, and are tracked as a single entry.  This is
- * particularly useful for non-prepared queries.
- *
- * Normalization is implemented by fingerprinting queries, selectively
- * serializing those fields of each query tree's nodes that are judged to be
- * essential to the query.  This is referred to as a query jumble.  This is
- * distinct from a regular serialization in that various extraneous
- * information is ignored as irrelevant or not essential to the query, such
- * as the collations of Vars and, most notably, the values of constants.
- *
- * This jumble is acquired at the end of parse analysis of each query, and
- * a 64-bit hash of it is stored into the query's Query.queryId field.
- * The server then copies this value around, making it available in plan
- * tree(s) generated from the query.  The executor can then use this value
- * to blame query costs on the proper queryId.
+ * Starting in Postgres 9.2, this module normalized query entries.  As of
+ * Postgres 14, the normalization is done by the core if compute_query_id is
+ * enabled, or optionally by third-party modules.
  *
  * To facilitate presenting entries to users, we create "representative" query
  * strings in which constants are replaced with parameter symbols ($n), to
@@ -114,8 +99,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
-#define JUMBLE_SIZE1024	/* query serialization buffer size */
-
 /*
  * Extension version number, for supporting older extension versions' objects
  */
@@ -235,40 +218,6 @@ typedef struct pgssSharedState

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-20 Thread Hannu Krosing

It would be really convenient if user-visible serialisations of the query
id had something that identifies the computation method.

maybe prefix 'N' for internal, 'S' for pg_stat_statements etc.

This would immediately show in logs at what point the id calculator was
changed

On Fri, Mar 19, 2021 at 5:54 PM Bruce Momjian  wrote:

> On Fri, Mar 19, 2021 at 10:27:51PM +0800, Julien Rouhaud wrote:
> > On Fri, Mar 19, 2021 at 09:29:06AM -0400, Bruce Momjian wrote:
> > > OK, that makes perfect sense.  I think the best solution is to document
> > > that compute_query_id just controls the built-in computation of the
> > > query id, and that extensions can also compute it if this is off, and
> > > pg_stat_activity and log_line_prefix will display built-in or extension
> > > computed query ids.
> >
> > So the last version of the patch should implement that behavior right?
> It's
> > just missing some explicit guidance that third-party extensions should
> only
> > calculate a queryid if compute_query_id is off
>
> Yes, I think we are now down to just how the extensions should be told
> to behave, and how we document this --- see the email I just sent.
>
> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
>
>   If only the physical world exists, free will is an illusion.
>
>
>
>

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-20 Thread Julien Rouhaud

On Fri, Mar 19, 2021 at 12:53:18PM -0400, Bruce Momjian wrote:
> 
> Well, given we don't really want to support multiple query id types
> being generated or displayed, the "error out" above should fix it. 
> 
> Let's do this --- tell extensions to error out if the query id is
> already set, either by compute_query_id or another extension.  If an
> extension wants to generate its own query id and store is internal to
> the extension, that is fine, but the server-displayed query id should be
> generated once and never overwritten by an extension.

Agreed, this will ensure that you won't dynamically change the queryid source.

We should also document that changing it requires a restart and calling
pg_stat_statements_reset() afterwards.

v19 adds some changes, plus extra documentation for pg_stat_statements about
the requirement for a queryid to be calculated, and a note that all documented
details only apply for in-core source.  I'm not sure if this is still the best
place to document those details anymore though.
>From bcc76fdcff0ac867b087706a14141ceadcb371bf Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Wed, 14 Oct 2020 02:11:37 +0800
Subject: [PATCH v19 1/3] Move pg_stat_statements query jumbling to core.

A new compute_query_id GUC is also added, to control whether a query identifier
should be computed by the core.  It's thefore now possible to disable core
queryid computation and use pg_stat_statements with a different algorithm to
compute the query identifier by using third-party module.

To ensure that a single source of query identifier can be used and is well
defined, modules that calculate a query identifier should throw an error if
compute_query_id is enabled or if a query idenfitier was already calculated.

Author: Julien Rouhaud
Reviewed-by: Bruce Momjian
Discussion: https://postgr.es/m/CA+8PKvQnMfOE-c3YLRwxOsCYXQDyP8VXs6CDtMZp1V4=d4l...@mail.gmail.com
---
 .../pg_stat_statements/pg_stat_statements.c   | 805 +
 .../pg_stat_statements.conf   |   1 +
 doc/src/sgml/config.sgml  |  26 +
 doc/src/sgml/pgstatstatements.sgml|  20 +-
 src/backend/parser/analyze.c  |  14 +-
 src/backend/tcop/postgres.c   |   6 +-
 src/backend/utils/misc/Makefile   |   1 +
 src/backend/utils/misc/guc.c  |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/backend/utils/misc/queryjumble.c  | 834 ++
 src/include/parser/analyze.h  |   4 +-
 src/include/utils/guc.h   |   1 +
 src/include/utils/queryjumble.h   |  58 ++
 13 files changed, 996 insertions(+), 785 deletions(-)
 create mode 100644 src/backend/utils/misc/queryjumble.c
 create mode 100644 src/include/utils/queryjumble.h

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 62cccbfa44..498f2aa376 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -8,24 +8,9 @@
  * a shared hashtable.  (We track only as many distinct queries as will fit
  * in the designated amount of shared memory.)
  *
- * As of Postgres 9.2, this module normalizes query entries.  Normalization
- * is a process whereby similar queries, typically differing only in their
- * constants (though the exact rules are somewhat more subtle than that) are
- * recognized as equivalent, and are tracked as a single entry.  This is
- * particularly useful for non-prepared queries.
- *
- * Normalization is implemented by fingerprinting queries, selectively
- * serializing those fields of each query tree's nodes that are judged to be
- * essential to the query.  This is referred to as a query jumble.  This is
- * distinct from a regular serialization in that various extraneous
- * information is ignored as irrelevant or not essential to the query, such
- * as the collations of Vars and, most notably, the values of constants.
- *
- * This jumble is acquired at the end of parse analysis of each query, and
- * a 64-bit hash of it is stored into the query's Query.queryId field.
- * The server then copies this value around, making it available in plan
- * tree(s) generated from the query.  The executor can then use this value
- * to blame query costs on the proper queryId.
+ * As of Postgres 9.2, this module normalizes query entries.  As of Postgres
+ * 14, the normalization is done by the core if compute_query_id is enabled,
+ * or optionally by third-party modules.
  *
  * To facilitate presenting entries to users, we create "representative" query
  * strings in which constants are replaced with parameter symbols ($n), to
@@ -114,8 +99,6 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
 #define USAGE_DEALLOC_PERCENT	5	/* free this % of entries at once */
 #define IS_STICKY(c)	((c.calls[PGSS_PLAN] + c.calls[PGSS_EXEC]) == 0)
 
-#define JUMBLE_SIZE1024	/*

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-19 Thread Julien Rouhaud

On Fri, Mar 19, 2021 at 08:10:54PM -0400, Bruce Momjian wrote:
> On Sat, Mar 20, 2021 at 01:03:16AM +0100, Hannu Krosing wrote:
> > It would be really convenient if user-visible serialisations of the query id
> > had something that identifies the computation method.
> > 
> > maybe prefix 'N' for internal, 'S' for pg_stat_statements etc.
> > 
> > This would immediately show in logs at what point the id calculator was 
> > changed
> 
> Yeah, but it an integer, and I don't think we want to change that.

Also, with Bruce's approach to ask extensions to error out if they would
overwrite a queryid the only way to change the calculation method is a restart.
So only one source can exist in the system.

Hopefully that's a big enough hammer that administrators will know what method
they're using.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-19 Thread Bruce Momjian

On Sat, Mar 20, 2021 at 01:03:16AM +0100, Hannu Krosing wrote:
> It would be really convenient if user-visible serialisations of the query id
> had something that identifies the computation method.
> 
> maybe prefix 'N' for internal, 'S' for pg_stat_statements etc.
> 
> This would immediately show in logs at what point the id calculator was 
> changed

Yeah, but it an integer, and I don't think we want to change that.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-19 Thread Bruce Momjian

On Fri, Mar 19, 2021 at 10:27:51PM +0800, Julien Rouhaud wrote:
> On Fri, Mar 19, 2021 at 09:29:06AM -0400, Bruce Momjian wrote:
> > OK, that makes perfect sense.  I think the best solution is to document
> > that compute_query_id just controls the built-in computation of the
> > query id, and that extensions can also compute it if this is off, and
> > pg_stat_activity and log_line_prefix will display built-in or extension
> > computed query ids.
> 
> So the last version of the patch should implement that behavior right?  It's
> just missing some explicit guidance that third-party extensions should only
> calculate a queryid if compute_query_id is off

Yes, I think we are now down to just how the extensions should be told
to behave, and how we document this --- see the email I just sent.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-19 Thread Bruce Momjian

On Fri, Mar 19, 2021 at 10:35:21PM +0800, Julien Rouhaud wrote:
> On Fri, Mar 19, 2021 at 02:54:16PM +0100, Hannu Krosing wrote:
> > On Fri, Mar 19, 2021 at 2:29 PM Bruce Momjian  wrote:
> > The log-spam could be mitigated by logging it just once per connection
> > the first time it is overridden
> 
> Yes, but it might still generate a significant amount of additional lines.
> 
> If extensions authors follow the recommendations and only calculate a queryid
> when compute_query_id is off, it shoule be easy to check that you have
> everything setup properly.

Seems extensions that want to generate their own query id should just
error out with a message to the log file if compute_query_id is set ---
that should fix the entire issue --- but see below.

> > Also, we could ask the extensions to expose the "method name" in a 
> > read-only GUC
> > 
> > so one can do
> > 
> > SHOW compute_query_id_method;
> > 
> > and get the name of method use
> > 
> > compute_query_id_method
> > 
> > builtin
> > 
> > And it may even dynamically change to indicate the overriding of builtin
> > 
> > compute_query_id_method
> > ---
> > fancy_compute_query_id (overrides builtin)
> 
> This could be nice, but I'm not sure that it would work well if someones
> install multiple extensions that calculate a queryid (which would be silly but
> still), or load another one at runtime.

Well, given we don't really want to support multiple query id types
being generated or displayed, the "error out" above should fix it. 

Let's do this --- tell extensions to error out if the query id is
already set, either by compute_query_id or another extension.  If an
extension wants to generate its own query id and store is internal to
the extension, that is fine, but the server-displayed query id should be
generated once and never overwritten by an extension.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-19 Thread Julien Rouhaud

On Fri, Mar 19, 2021 at 02:54:16PM +0100, Hannu Krosing wrote:
> On Fri, Mar 19, 2021 at 2:29 PM Bruce Momjian  wrote:
> >
> > OK, that makes perfect sense.  I think the best solution is to document
> > that compute_query_id just controls the built-in computation of the
> > query id, and that extensions can also compute it if this is off, and
> > pg_stat_activity and log_line_prefix will display built-in or extension
> > computed query ids.
> >
> > It might be interesting someday to check if the hook changed a
> > pre-computed query id and warn the user in the logs, but that could
> > cause more log-spam problems than help.
> 
> The log-spam could be mitigated by logging it just once per connection
> the first time it is overridden

Yes, but it might still generate a significant amount of additional lines.

If extensions authors follow the recommendations and only calculate a queryid
when compute_query_id is off, it shoule be easy to check that you have
everything setup properly.

> Also, we could ask the extensions to expose the "method name" in a read-only 
> GUC
> 
> so one can do
> 
> SHOW compute_query_id_method;
> 
> and get the name of method use
> 
> compute_query_id_method
> 
> builtin
> 
> And it may even dynamically change to indicate the overriding of builtin
> 
> compute_query_id_method
> ---
> fancy_compute_query_id (overrides builtin)

This could be nice, but I'm not sure that it would work well if someones
install multiple extensions that calculate a queryid (which would be silly but
still), or load another one at runtime.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-19 Thread Julien Rouhaud

On Fri, Mar 19, 2021 at 09:29:06AM -0400, Bruce Momjian wrote:
> On Fri, Mar 19, 2021 at 11:16:50AM +0800, Julien Rouhaud wrote:
> > Now that I'm back on the code I remember why I did it this way.  It's
> > unfortunately not really possible to make things work this way.
> > 
> > pg_stat_statements' post_parse_analyze_hook relies on a queryid already 
> > being
> > computed, as it's where we know where the constants are recorded.  It means:
> > 
> > - we have to call post_parse_analyze_hook *after* doing core queryid
> >   calculation
> > - if users want to use a third party module to calculate a queryid, they'll
> >   have to make sure that the module's post_parse_analyze_hook is called
> >   *before* pg_stat_statements' one.
> > - even if they do so, they'll still have to pay the price of core queryid
> >   calculation
> 
> OK, that makes perfect sense.  I think the best solution is to document
> that compute_query_id just controls the built-in computation of the
> query id, and that extensions can also compute it if this is off, and
> pg_stat_activity and log_line_prefix will display built-in or extension
> computed query ids.

So the last version of the patch should implement that behavior right?  It's
just missing some explicit guidance that third-party extensions should only
calculate a queryid if compute_query_id is off

> It might be interesting someday to check if the hook changed a
> pre-computed query id and warn the user in the logs, but that could
> cause more log-spam problems than help.  I am a little worried that
> someone might have compute_query_id enabled and then install an
> extension that overwrites it, but we will just have to document this
> issue.  Hopefully extensions will be clear that they are computing their
> own query id.

I agree.  And hopefully they will split the queryid calculation from the rest
of the extension so that users can use the combination they want.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-19 Thread Hannu Krosing

On Fri, Mar 19, 2021 at 2:29 PM Bruce Momjian  wrote:
>
> OK, that makes perfect sense.  I think the best solution is to document
> that compute_query_id just controls the built-in computation of the
> query id, and that extensions can also compute it if this is off, and
> pg_stat_activity and log_line_prefix will display built-in or extension
> computed query ids.
>
> It might be interesting someday to check if the hook changed a
> pre-computed query id and warn the user in the logs, but that could
> cause more log-spam problems than help.

The log-spam could be mitigated by logging it just once per connection
the first time it is overridden

Also, we could ask the extensions to expose the "method name" in a read-only GUC

so one can do

SHOW compute_query_id_method;

and get the name of method use

compute_query_id_method

builtin

And it may even dynamically change to indicate the overriding of builtin

compute_query_id_method
---
fancy_compute_query_id (overrides builtin)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-19 Thread Bruce Momjian

On Fri, Mar 19, 2021 at 11:16:50AM +0800, Julien Rouhaud wrote:
> On Thu, Mar 18, 2021 at 03:23:49PM -0400, Bruce Momjian wrote:
> > On Fri, Mar 19, 2021 at 02:06:56AM +0800, Julien Rouhaud wrote:
> > 
> > The above text is the part that made me think an extension could display
> > a query id even if disabled by the GUC.
> 
> With the last version of the patch I sent it was the case.
> 
> > Oh, OK.  I can see an extension setting the query id on its own --- we
> > can't prevent that from happening.  It is probably enough to tell
> > extensions to honor the GUC, since they would want it enabled so it
> > displays in pg_stat_activity and log_line_prefix.
> 
> Ok.  So no new hook, and we keep using post_parse_analyze_hook as the official
> way to have custom queryid implementation, with this new behavior:
> 
> > > - if some extension calculates a queryid during post_parse_analyze_hook, 
> > > we
> > >   will always reset it.
> > 
> > OK, good.
> 
> Now that I'm back on the code I remember why I did it this way.  It's
> unfortunately not really possible to make things work this way.
> 
> pg_stat_statements' post_parse_analyze_hook relies on a queryid already being
> computed, as it's where we know where the constants are recorded.  It means:
> 
> - we have to call post_parse_analyze_hook *after* doing core queryid
>   calculation
> - if users want to use a third party module to calculate a queryid, they'll
>   have to make sure that the module's post_parse_analyze_hook is called
>   *before* pg_stat_statements' one.
> - even if they do so, they'll still have to pay the price of core queryid
>   calculation

OK, that makes perfect sense.  I think the best solution is to document
that compute_query_id just controls the built-in computation of the
query id, and that extensions can also compute it if this is off, and
pg_stat_activity and log_line_prefix will display built-in or extension
computed query ids.

It might be interesting someday to check if the hook changed a
pre-computed query id and warn the user in the logs, but that could
cause more log-spam problems than help.  I am a little worried that
someone might have compute_query_id enabled and then install an
extension that overwrites it, but we will just have to document this
issue.  Hopefully extensions will be clear that they are computing their
own query id.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-18 Thread Julien Rouhaud

On Thu, Mar 18, 2021 at 03:23:49PM -0400, Bruce Momjian wrote:
> On Fri, Mar 19, 2021 at 02:06:56AM +0800, Julien Rouhaud wrote:
> 
> The above text is the part that made me think an extension could display
> a query id even if disabled by the GUC.

With the last version of the patch I sent it was the case.

> Oh, OK.  I can see an extension setting the query id on its own --- we
> can't prevent that from happening.  It is probably enough to tell
> extensions to honor the GUC, since they would want it enabled so it
> displays in pg_stat_activity and log_line_prefix.

Ok.  So no new hook, and we keep using post_parse_analyze_hook as the official
way to have custom queryid implementation, with this new behavior:

> > - if some extension calculates a queryid during post_parse_analyze_hook, we
> >   will always reset it.
> 
> OK, good.

Now that I'm back on the code I remember why I did it this way.  It's
unfortunately not really possible to make things work this way.

pg_stat_statements' post_parse_analyze_hook relies on a queryid already being
computed, as it's where we know where the constants are recorded.  It means:

- we have to call post_parse_analyze_hook *after* doing core queryid
  calculation
- if users want to use a third party module to calculate a queryid, they'll
  have to make sure that the module's post_parse_analyze_hook is called
  *before* pg_stat_statements' one.
- even if they do so, they'll still have to pay the price of core queryid
  calculation

So it would be very hard to configure and will be too expensive.  I think that
we have to choose to either we make compute_query_id only trigger core
calculation (like it was in previous patch version), or introduce a new hook.

> I think compute_query_id works, and is shorter.

WFM.

> OK, good to know.  I can run some tests here if people would like me to.

+1.  A read only pgbench will be some kind od worse case scenario that can be
used I think.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-18 Thread Bruce Momjian

On Fri, Mar 19, 2021 at 02:06:56AM +0800, Julien Rouhaud wrote:
> On Thu, Mar 18, 2021 at 09:47:29AM -0400, Bruce Momjian wrote:
> > On Thu, Mar 18, 2021 at 07:29:56AM +0800, Julien Rouhaud wrote:
> > > Note exactly.  Right now a custom queryid can be computed even if
> > > compute_queryid is off, if some extension does that in 
> > > post_parse_analyze_hook.

The above text is the part that made me think an extension could display
a query id even if disabled by the GUC.

> > The docs are going to say that you have to enable compute_queryid to see
> > the query id in pg_stat_activity and log_line_prefix, but if you install
> > an extension, the query id will be visible even if you don't have
> > compute_queryid enabled.  I think you need to only honor the hook if
> > compute_queryid is enabled, and update the pg_stat_statements docs to
> > say you have to enable compute_queryid for pg_stat_statements to work.
> 
> I'm confused, what you described really looks like what I described.
> 
> Let me try to clarify:
> 
> - if compute_queryid is off, a queryid should never be seen no matter how hard
>   an extension tries

Oh, OK.  I can see an extension setting the query id on its own --- we
can't prevent that from happening.  It is probably enough to tell
extensions to honor the GUC, since they would want it enabled so it
displays in pg_stat_activity and log_line_prefix.

> - if compute_queryid is on, the calculation will be done by the core
>   (using pgss JumbeQuery) unless an extension computed one already.  The only
>   way to know what algorithm is used is to check the list of extension loaded.

OK.

> - if some extension calculates a queryid during post_parse_analyze_hook, we
>   will always reset it.

OK, good.

> Is that the approach you want?

Yes, I think so.

> Note that the only way to not honor the hook is iff the new GUC is disabled is
> to have a new queryid_hook, as we can't stop calling post_parse_analyze_hook 
> if
> the new GUC is off, and we don't want to pay the queryid calculation overhead
> if the admin explicitly said it wasn't needed.

Right, let's just get the extensions to honor the GUC --- we don't need
to block them or anything.

> > Also, should it be compute_queryid or compute_query_id?
> 
> Maybe compute_query_identifier?

I think compute_query_id works, and is shorter.

> > Also, the overhead of computing the query id was reported as 2% --- that
> > seems quite high for what it does.  Do we know why it is so high?
> 
> The 2% was a worst case scenario, for a query with a single join over
> ridiculously small pg_class and pg_attribute, in read only.  The whole 
> workload
> was in shared buffers so the planning and execution is quite fast.  Adding 
> some
> complexity in the query really limited the overhead.
> 
> Note that this was done on an old laptop with quite slow CPU.  Maybe
> someone with a better hardware than a 5/6yo laptop could get some more
> realistic results (I unfortunately don't have anything to try on).

OK, good to know.  I can run some tests here if people would like me to.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-18 Thread Julien Rouhaud

On Thu, Mar 18, 2021 at 09:47:29AM -0400, Bruce Momjian wrote:
> On Thu, Mar 18, 2021 at 07:29:56AM +0800, Julien Rouhaud wrote:
> > On Wed, Mar 17, 2021 at 06:32:16PM -0400, Bruce Momjian wrote:
> > > OK, is that what everyone wants?  I think that is what the patch already
> > > does.
> > 
> > Note exactly.  Right now a custom queryid can be computed even if
> > compute_queryid is off, if some extension does that in 
> > post_parse_analyze_hook.
> > 
> > I'm assuming that what Robert was thinking was more like:
> > 
> > if (compute_queryid)
> > {
> > if (queryid_hook)
> > queryId = queryid_hook(...);
> > else
> > queryId = JumbeQuery(...);
> > }
> > else
> > queryId = 0;
> > 
> > And that should be done *after* post_parse_analyse_hook so that it's clear 
> > that
> > this hook is no longer the place to compute queryid.
> > 
> > Is that what should be done?
> 
> No, I don't think so.  I think having extensions change behavior
> controlled by GUCs is a bad interface.
> 
> The docs are going to say that you have to enable compute_queryid to see
> the query id in pg_stat_activity and log_line_prefix, but if you install
> an extension, the query id will be visible even if you don't have
> compute_queryid enabled.  I think you need to only honor the hook if
> compute_queryid is enabled, and update the pg_stat_statements docs to
> say you have to enable compute_queryid for pg_stat_statements to work.

I'm confused, what you described really looks like what I described.

Let me try to clarify:

- if compute_queryid is off, a queryid should never be seen no matter how hard
  an extension tries

- if compute_queryid is on, the calculation will be done by the core
  (using pgss JumbeQuery) unless an extension computed one already.  The only
  way to know what algorithm is used is to check the list of extension loaded.

- if some extension calculates a queryid during post_parse_analyze_hook, we
  will always reset it.

Is that the approach you want?

Note that the only way to not honor the hook is iff the new GUC is disabled is
to have a new queryid_hook, as we can't stop calling post_parse_analyze_hook if
the new GUC is off, and we don't want to pay the queryid calculation overhead
if the admin explicitly said it wasn't needed.

> Also, should it be compute_queryid or compute_query_id?

Maybe compute_query_identifier?

> Also, the overhead of computing the query id was reported as 2% --- that
> seems quite high for what it does.  Do we know why it is so high?

The 2% was a worst case scenario, for a query with a single join over
ridiculously small pg_class and pg_attribute, in read only.  The whole workload
was in shared buffers so the planning and execution is quite fast.  Adding some
complexity in the query really limited the overhead.

Note that this was done on an old laptop with quite slow CPU.  Maybe
someone with a better hardware than a 5/6yo laptop could get some more
realistic results (I unfortunately don't have anything to try on).

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-18 Thread Bruce Momjian

On Thu, Mar 18, 2021 at 07:29:56AM +0800, Julien Rouhaud wrote:
> On Wed, Mar 17, 2021 at 06:32:16PM -0400, Bruce Momjian wrote:
> > OK, is that what everyone wants?  I think that is what the patch already
> > does.
> 
> Note exactly.  Right now a custom queryid can be computed even if
> compute_queryid is off, if some extension does that in 
> post_parse_analyze_hook.
> 
> I'm assuming that what Robert was thinking was more like:
> 
> if (compute_queryid)
> {
> if (queryid_hook)
> queryId = queryid_hook(...);
> else
> queryId = JumbeQuery(...);
> }
> else
> queryId = 0;
> 
> And that should be done *after* post_parse_analyse_hook so that it's clear 
> that
> this hook is no longer the place to compute queryid.
> 
> Is that what should be done?

No, I don't think so.  I think having extensions change behavior
controlled by GUCs is a bad interface.

The docs are going to say that you have to enable compute_queryid to see
the query id in pg_stat_activity and log_line_prefix, but if you install
an extension, the query id will be visible even if you don't have
compute_queryid enabled.  I think you need to only honor the hook if
compute_queryid is enabled, and update the pg_stat_statements docs to
say you have to enable compute_queryid for pg_stat_statements to work.

Also, should it be compute_queryid or compute_query_id?

Also, the overhead of computing the query id was reported as 2% --- that
seems quite high for what it does.  Do we know why it is so high?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-17 Thread Julien Rouhaud

On Wed, Mar 17, 2021 at 06:32:16PM -0400, Bruce Momjian wrote:
> On Wed, Mar 17, 2021 at 04:04:44PM -0400, Robert Haas wrote:
> > On Wed, Mar 17, 2021 at 12:48 PM Julien Rouhaud  wrote:
> > > I originally suggested to make it clearer by having an enum GUC rather 
> > > than a
> > > boolean, say compute_queryid = [ none | core | external ], and if set to
> > > external then a hook would be explicitely called.  Right now, "none" and
> > > "external" are binded with compute_queryid = off, and depends on whether 
> > > an
> > > extension is computing a queryid during post_parse_analyse_hook.
> > 
> > I would just make it a Boolean and have a hook. The Boolean controls
> > whether it gets computed at all, and the hook lets an external module
> > override the way it gets computed.
> 
> OK, is that what everyone wants?  I think that is what the patch already
> does.

Note exactly.  Right now a custom queryid can be computed even if
compute_queryid is off, if some extension does that in post_parse_analyze_hook.

I'm assuming that what Robert was thinking was more like:

if (compute_queryid)
{
if (queryid_hook)
queryId = queryid_hook(...);
else
queryId = JumbeQuery(...);
}
else
queryId = 0;

And that should be done *after* post_parse_analyse_hook so that it's clear that
this hook is no longer the place to compute queryid.

Is that what should be done?

> I think having multiple queryids used in a single cluster is much too
> confusing to support.  You would have to label and control which queryid
> is displayed by pg_stat_activity and log_line_prefix, and that seems too
> confusing and not useful.

I agree.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2021-03-17 Thread Bruce Momjian

On Wed, Mar 17, 2021 at 04:04:44PM -0400, Robert Haas wrote:
> On Wed, Mar 17, 2021 at 12:48 PM Julien Rouhaud  wrote:
> > I originally suggested to make it clearer by having an enum GUC rather than 
> > a
> > boolean, say compute_queryid = [ none | core | external ], and if set to
> > external then a hook would be explicitely called.  Right now, "none" and
> > "external" are binded with compute_queryid = off, and depends on whether an
> > extension is computing a queryid during post_parse_analyse_hook.
> 
> I would just make it a Boolean and have a hook. The Boolean controls
> whether it gets computed at all, and the hook lets an external module
> override the way it gets computed.

OK, is that what everyone wants?  I think that is what the patch already
does.

I think having multiple queryids used in a single cluster is much too
confusing to support.  You would have to label and control which queryid
is displayed by pg_stat_activity and log_line_prefix, and that seems too
confusing and not useful.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

1 2 3 >

1 - 100 of 221 matches

Mail list logo