Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-13 Thread Amit Kapila
On Thu, Aug 10, 2023 at 8:32 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Based on recent discussions, I updated the patch set. I did not reply one by
> one because there are many posts, but thank you for the many suggestions!
>
> The following shows what I changed.
>
> 1.
> This feature is now enabled by default. Instead, an
> "--exclude-logical-replication-slots" option was added. (Per suggestions
> like [1].)
>

AFAICS, we don't have any concrete agreement on such an option, but my
vote is to not have one, as we don't have a similar option for any
other object. I understand that it could be convenient for some use
cases where some of the logical slots are not yet caught up w.r.t. WAL
and users want to upgrade without those slots, but I am not sure that
is really the case. Does anyone else have an opinion on this point?

-- 
With Regards,
Amit Kapila.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-13 Thread Amit Kapila
On Mon, Aug 14, 2023 at 7:57 AM Masahiko Sawada  wrote:
>
> On Sat, Aug 12, 2023, 15:20 Amit Kapila  wrote:
>>
>> I don't think we need the complexity of version-specific checks if we
>> do what we do in get_control_data(). Basically, invoke
>> version-specific pg_replslotdata to get version-specific slot
>> information. There has been a proposal for a tool like that [1]. Do
>> you have something better in mind? If so, can you please explain the
>> same a bit more?
>
>
> Yeah, we need something like pg_replslotdata. If there are other useful
> use cases for this tool, it would be good to have it. But I'm not sure there
> are any other than the pg_upgrade use case.
>
> Another idea (which might have already been discussed, though) is that we
> check whether the latest shutdown checkpoint LSN in the control file matches
> the confirmed_flush_lsn in the pg_replication_slots view. That way, we can
> ensure that the slot has consumed all WAL records before the last shutdown.
> We don't need to worry about WAL records generated after starting the old
> cluster during the upgrade, at least for logical replication slots.
>

Right, this is somewhat closer to what the patch is already doing. But
remember that in this case we need to remember and use the latest
checkpoint from the control file before the old cluster is started,
because otherwise the latest checkpoint location could even be updated
during the upgrade. So, instead of reading from WAL, we need to change
the patch so that it relies on the control file's latest LSN. I would
prefer this idea over inventing a new API/tool like pg_replslotdata.
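
For illustration, a minimal SQL sketch of the kind of check being discussed,
assuming the "Latest checkpoint location" has been captured from the old
cluster's pg_controldata before the cluster is started (the literal LSN below
is only a placeholder):

    SELECT slot_name, confirmed_flush_lsn
    FROM pg_replication_slots
    WHERE slot_type = 'logical'
      AND confirmed_flush_lsn <> '0/1549F28'::pg_lsn;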

The other point you and Bruce seem to be favoring is that instead of
dumping/restoring slots via pg_dump, we remember the required
information of slots retrieved during their validation in pg_upgrade
itself and use that to create the slots in the new cluster. Though I
am not aware of similar treatment for other objects we restore, in
this case it seems reasonable, especially because slots are not stored
in the catalog and we anyway already need to retrieve the required
information to validate them, so trying to retrieve it again via
pg_dump doesn't seem useful unless I am missing something. Does this
match your understanding?

Yet another thing I am trying to consider is whether we can allow
upgrading slots from 16 or 15 to later versions. As of now, the patch
has the following check:
getLogicalReplicationSlots()
{
...
+ /* Check whether we should dump or not */
+ if (fout->remoteVersion < 17)
+ return;
...
}

If we decide to use the existing view pg_replication_slots, then can we
consider upgrading slots from prior versions to 17? If we want to
invent a new API similar to pg_replslotdata, then we can't do this
because it won't exist in prior versions, but OTOH using the existing
view pg_replication_slots can allow us to fetch slot info from older
versions as well. So, I think it is worth considering.
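
For illustration, a minimal sketch of the slot information that could be
fetched through the existing view on an older cluster (the exact column list
pg_upgrade would need is an assumption here; two_phase only exists in v14 and
later):

    SELECT slot_name, plugin, two_phase, confirmed_flush_lsn
    FROM pg_replication_slots
    WHERE slot_type = 'logical' AND NOT temporary;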

Thoughts?

-- 
With Regards,
Amit Kapila.




Re: WIP: new system catalog pg_wait_event

2023-08-13 Thread Michael Paquier
On Thu, Aug 10, 2023 at 08:09:34PM +0200, Drouvot, Bertrand wrote:
> Agree that's worth it given the fact that iterating one more time is not that
> costly here.

I have reviewed v4, and finished by putting my hands on it to see what
I could do.

+printf $wc "\telement = (wait_event_element *) 
palloc(sizeof(wait_event_element));\n";
+
+printf $wc "\telement->wait_event_type = \"%s\";\n", $last;
+printf $wc "\telement->wait_event_name = \"%s\";\n", $wev->[1];
+printf $wc "\telement->wait_event_description = \"%s\";\n\n", 
$new_desc;
+
+printf $wc "\twait_event = lappend(wait_event, element);\n\n";
+}

This is simpler than the previous versions; still, I am not much of a
fan of implying the use of a list and these pallocs.  There are two
things that we could do:
- Hide that behind a macro defined in wait_event_funcs.c.
- Feed the data generated here into a static structure, like:
+static const struct
+{
+   const char *type;
+   const char *name;
+   const char *description;
+}

After experimenting with both, I've found the latter a tad cleaner, so
the attached version does that.

+   description texte
This one was difficult to see..

I am not sure that "pg_wait_event" is a good idea for the name of the
new view.  How about "pg_wait_events" instead, in plural form?  There
is more than one wait event listed.
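
For reference, a usage sketch against the proposed view, using the plural name
suggested here and the type/name/description columns from the pg_proc entry in
the patch:

    SELECT type, name, description
    FROM pg_wait_events
    ORDER BY type, name;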

One log entry in Solution.pm has missed the addition of a reference to
wait_event_funcs_data.c.

And..  src/backend/Makefile missed two updates for maintainer-clean & co,
no?

One thing that the patch is still missing is the handling of custom
wait events for extensions.  So this still requires more code.  I was
thinking about listing these events as:
- Type: "Extension"
- Name: What a module has registered.
- Description: "Custom wait event \"%Name%\" defined by extension".

For now I am attaching a v5.
--
Michael
From 4f0172e216bfa7b929b9ca3465d66088b2ac1566 Mon Sep 17 00:00:00 2001
From: bdrouvotAWS 
Date: Sat, 5 Aug 2023 12:39:42 +
Subject: [PATCH v5] Add catalog pg_wait_events

Adding a new system view, namely pg_wait_event, that describes the wait
events.
---
 src/include/catalog/pg_proc.dat   |  6 ++
 src/include/utils/meson.build |  4 +-
 src/backend/Makefile  |  3 +-
 src/backend/catalog/system_views.sql  |  3 +
 src/backend/utils/activity/.gitignore |  1 +
 src/backend/utils/activity/Makefile   |  8 ++-
 .../activity/generate-wait_event_types.pl | 46 +++--
 src/backend/utils/activity/meson.build|  1 +
 src/backend/utils/activity/wait_event_funcs.c | 69 +++
 src/test/regress/expected/rules.out   |  4 ++
 src/test/regress/expected/sysviews.out| 16 +
 src/test/regress/sql/sysviews.sql |  4 ++
 doc/src/sgml/system-views.sgml| 64 +
 src/tools/msvc/Solution.pm|  3 +-
 src/tools/msvc/clean.bat  |  1 +
 15 files changed, 223 insertions(+), 10 deletions(-)
 create mode 100644 src/backend/utils/activity/wait_event_funcs.c

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 6996073989..1a942c678c 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5417,6 +5417,12 @@
   proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
   proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,gss_delegation,leader_pid,query_id}',
   prosrc => 'pg_stat_get_activity' },
+{ oid => '8403', descr => 'describe wait events',
+  proname => 'pg_get_wait_events', procost => '10', prorows => '100',
+  proretset => 't', provolatile => 's', prorettype => 'record',
+  proargtypes => '', proallargtypes => '{text,text,text}',
+  proargmodes => '{o,o,o}', proargnames => '{type,name,description}',
+  prosrc => 'pg_get_wait_events' },
 { oid => '3318',
   descr => 'statistics: information about progress of backends running maintenance command',
   proname => 'pg_stat_get_progress_info', prorows => '100', proretset => 't',
diff --git a/src/include/utils/meson.build b/src/include/utils/meson.build
index 6de5d93799..c179478611 100644
--- a/src/include/utils/meson.build
+++ b/src/include/utils/meson.build
@@ -1,6 +1,6 @@
 # Copyright (c) 2022-2023, PostgreSQL Global Development Group
 
-wait_event_output = ['wait_event_types.h', 'pgstat_wait_event.c']
+wait_event_output = ['wait_event_types.h', 'pgstat_wait_event.c', 'wait_event_funcs_data.c']
 wait_event_target = custom_target('wait_event_names',
   input: files('../../backend/utils/activity/wait_event_names.txt'),
   output: 

Re: Naive handling of inequalities by nbtree initial positioning code

2023-08-13 Thread Peter Geoghegan
On Sun, Aug 13, 2023 at 5:50 PM Peter Geoghegan  wrote:
> All that it would take to fix the problem is per-attribute
> BTScanInsertData.nextkey values. There is no reason why "nextkey"
> semantics should only work for the last attribute in the insertion
> scan key. Under this scheme, _bt_first() would be taught to set up the
> insertion scan key with (say) nextkey=true for the "four > 2"
> attribute, and nextkey=false for the other 3 attributes (since we do
> that whenever >= or = are used). It would probably also make sense to
> generalize this approach to handle (say) a third query that had a
> "four < 2" inequality, but otherwise matched the first two queries. So
> we wouldn't literally use multiple "nextkey" fields to do this.

Actually, that can't work when there are a huge number of index tuples
with the same values for "four" (enough to span many internal pages).
So we'd need specialized knowledge of the data type (probably from an
opclass support function) to transform "four > 2" into "four >= 3" up
front. Alternatively, we could do roughly the same thing via an initial
index probe. The latter approach would be needed for continuous data
types, where the transformation isn't possible at all.

The probing approach could work by finding an initial position in the
same way as we currently locate an initial leaf page -- the way that I
complained about earlier on, but with an important twist. Once we'd
established that the first "four" value in the index > 2 really was 3
(or whatever it turned out to be), we could fill that value into a new
insertion scan key. It would then be possible to do another descent of
the index, skipping over most of the leaf pages that we'll access
needlessly right now. (Actually, we'd only do all that when it seemed
likely to allow us to skip a significant number of intermediate leaf
pages -- which is what we saw in my test case.)

This is directly related to skip scan. The former approach is more or
less what the MDAM paper calls "dense" access (which is naturally
limited to discrete data types like integer), while the latter probing
approach is what it calls "sparse" access. Skip scan performs this
process repeatedly, most of the time, but we'd only skip once here.

In fact, if my example had used (say) "four > 1" instead, then it
would have made sense to skip multiple times -- not just once, after
an initial descent. Because then we'd have had to consider matches for
both "two=1 and four=2" and "two=1 and four=3" (there aren't any
"two=1 and four=4" matches so we'd then be done).

In fact, had there been no mention of the "four" column in the query
whatsoever (which is how we tend to think of skip scan), then a decent
implementation of skip scan would effectively behave as if the query
had been written "two=1 and four > -inf and ...", while following the
same general approach. (Or "two=1 and four < +inf and ...", if this
was a similar looking backwards scan.)

--
Peter Geoghegan




proposal: jsonb_populate_array

2023-08-13 Thread Pavel Stehule
Hi

Currently, there is no native functionality for converting a json(b) value to
an array.

https://stackoverflow.com/questions/76894960/unable-to-assign-text-value-to-variable-in-pgsql/76896112#76896112

It should not be too hard to implement a native function jsonb_populate_array:

jsonb_populate_array(anyarray, jsonb) returns anyarray

Usage:

select jsonb_populate_array(null::text[], '["cust_full_name","cust_email"]')
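
For comparison, the closest thing that works today (a sketch using existing
functions, not the proposed one) is to unnest and re-aggregate:

    select array_agg(value) as arr
    from jsonb_array_elements_text('["cust_full_name","cust_email"]'::jsonb);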

Comments, notes?

Regards

Pavel


Re: Support to define custom wait events for extensions

2023-08-13 Thread Masahiro Ikeda

On 2023-08-14 08:06, Michael Paquier wrote:

On Thu, Aug 10, 2023 at 05:37:55PM +0900, Michael Paquier wrote:

This looks correct, but perhaps we need to think harder about the
custom event names and define a convention when more of this stuff is
added to the core modules.


Okay, I have put my hands on that, fixing a couple of typos, polishing
a couple of comments, clarifying the docs and applying an indentation.
And here is a v4.

Any thoughts or comments?  I'd like to apply that soon, so as we are
able to move on with the wait event catalog and assigning custom wait
events to the other in-core modules.


Thanks! I confirmed the changes, and all tests passed.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: pgbench with libevent?

2023-08-13 Thread Thomas Munro
On Mon, Aug 14, 2023 at 12:35 PM Fabien COELHO  wrote:
> > Pgbench is managing client I/Os manually with select or poll. Much of this
> > could be managed by libevent.
>
> Or maybe libuv (used by nodejs?).
>
> From preliminary testing, libevent seems not too good at the fine-grained
> time management that is used for throttling, whereas libuv advertises that
> it is good at it, although what it actually does remains to be seen.

Do you think our WaitEventSet stuff could be good here, if made
frontend-friendly?




Re: Report planning memory in EXPLAIN ANALYZE

2023-08-13 Thread Andrey Lepikhov

On 14/8/2023 06:53, David Rowley wrote:

On Thu, 10 Aug 2023 at 20:33, Ashutosh Bapat
 wrote:

My point is what's relevant here is how much net memory planner asked
for.


But that's not what your patch is reporting. All you're reporting is
the difference in memory that's *currently* palloc'd from before and
after the planner ran.  If we palloc'd 600 exabytes then pfree'd it
again, your metric won't change.

I'm struggling a bit to understand your goals here.  If your goal is
to make a series of changes that reduces the amount of memory that's
palloc'd at the end of planning, then your patch seems to suit that
goal, but per the quote above, it seems you care about how many bytes
are palloc'd during planning and your patch does not seem to track that.

With your patch as it is, to improve the metric you're reporting we
could go off and do things like pfree Paths once createplan.c is done,
but really, why would we do that? Just making the "Planning Memory"
metric look better doesn't seem like a worthy goal.

Instead, if we reported the context's mem_allocated, then it would
give us the flexibility to make changes to the memory context code to
have the metric look better. It might also alert us to planner
inefficiencies and problems with new code that may cause a large spike
in the amount of memory that gets allocated.  Now, I'm not saying we
should add a patch that shows mem_allocated. I'm just questioning if
your proposed patch meets the goals you're trying to achieve.  I just
suggested that you might want to consider something else as a metric
for your memory usage reduction work.
Really, the current approach with the final value of consumed memory
smooths out peaks of memory consumption. I recall examples like massive
million-element arrays or reparameterization with many partitions where
the optimizer consumes much additional memory during planning.
Ideally, to dive into planner issues, we should have something like the
in-progress reporting that vacuum has, reporting on memory consumption
at each subquery and join level. But that looks like too much for
typical queries.


--
regards,
Andrey Lepikhov
Postgres Professional





Extending SMgrRelation lifetimes

2023-08-13 Thread Thomas Munro
Hi,

SMgrRelationData objects don't currently have a defined lifetime, so
it's hard to know when the result of smgropen() might become a
dangling pointer.  This has caused a few bugs in the past, and the
usual fix is to just call smgropen() more often and not hold onto
pointers.  If you're doing that frequently enough, the hash table
lookups can show up in profiles.  I'm interested in this topic for
more than just micro-optimisations, though: in order to be able to
batch/merge smgr operations, I'd like to be able to collect them in
data structures that survive more than just a few lines of code.
(Examples to follow in later emails).

The simplest idea seems to be to tie object lifetime to transactions
using the existing AtEOXact_SMgr() mechanism.  In recovery, the
obvious corresponding time would be the commit/abort record that
destroys the storage.

This could be achieved by extending smgrrelease().  That was a
solution to the same problem in a narrower context: we didn't want
CFIs to randomly free SMgrRelations, but we needed to be able to
force-close fds in other backends, to fix various edge cases.

The new idea is to overload smgrrelease(it) so that it also clears the
owner, which means that AtEOXact_SMgr() will eventually smgrclose(it),
unless it is re-owned by a relation before then.  That choice stems
from the complete lack of information available via sinval in the case
of an overflow.  We must (1) close all descriptors because any file
might have been unlinked, (2) keep all pointers valid and yet (3) not
leak dropped smgr objects forever.  In this patch, smgrreleaseall()
achieves those goals.

Proof-of-concept patch attached.  Are there holes in this scheme?
Better ideas welcome.  In terms of spelling, another option would be
to change the behaviour of smgrclose() to work as described, ie it
would call smgrrelease() and then also disown, so we don't have to
change most of those callers, and then add a new function
smgrdestroy() for the few places that truly need it.  Or something
like that.

Other completely different ideas I've bounced around with various
hackers and decided against: references counts, "holder" objects that
can be an "owner" (like Relation, but when you don't have a Relation)
but can re-open on demand.  Seemed needlessly complicated.

While studying this I noticed a minor thinko in smgrrelease() in
15+16, so here's a fix for that also.  I haven't figured out a
sequence that makes anything bad happen, but we should really
invalidate smgr_targblock when a relfilenode is reused, since it might
point past the end.  This becomes more obvious once smgrrelease() is
used for truncation, as proposed here.
From 844e56e9ea9a56e04813886a6f7ded19a3af7f90 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Mon, 14 Aug 2023 11:01:28 +1200
Subject: [PATCH 1/2] Invalidate smgr_targblock on release.

In rare circumstances involving relfilenode reuse, it may be possible
for smgr_targblock to finish up pointing past the end.

Oversight in b74e94dc.  Back-patch to 15.
---
 src/backend/storage/smgr/smgr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index f76c4605db..65e7436306 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -295,6 +295,7 @@ smgrrelease(SMgrRelation reln)
 	{
 		smgrsw[reln->smgr_which].smgr_close(reln, forknum);
 		reln->smgr_cached_nblocks[forknum] = InvalidBlockNumber;
+		reln->smgr_targblock = InvalidBlockNumber;
 	}
 }
 
-- 
2.39.2

From b18fea5f263c46f87b84e242ff228cf1ab97b3e8 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Thu, 10 Aug 2023 18:16:31 +1200
Subject: [PATCH 2/2] Give SMgrRelation pointers a well-defined lifetime.

After calling smgropen(), it was not clear how long you could continue
to use the result, because various code paths including cache
invalidation could call smgrclose().

Guarantee that the object won't be freed until the end of the current
transaction, or in recovery, the commit/abort record that destroys the
storage.

This is achieved by making most places that previously called
smgrclose() use smgrrelease() instead, and giving smgrrelease() the
additional task of 'disowning' the relation so that it is closed at
end-of-transaction.  That allows all pointers to remain valid, even in
the case of sinval overflow where we have no information about which
relations have been dropped or truncated.
---
 src/backend/access/heap/heapam_handler.c |  6 +++---
 src/backend/access/heap/visibilitymap.c  |  2 +-
 src/backend/access/transam/xlogutils.c   |  6 +++---
 src/backend/catalog/storage.c|  2 +-
 src/backend/commands/cluster.c   |  4 ++--
 src/backend/commands/dbcommands.c|  2 +-
 src/backend/commands/sequence.c  |  2 +-
 src/backend/commands/tablecmds.c |  4 ++--
 src/backend/storage/buffer/bufmgr.c  |  4 ++--
 src/backend/storage/smgr/smgr.c  | 21 ++--
 

Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-13 Thread Masahiko Sawada
On Sat, Aug 12, 2023, 15:20 Amit Kapila  wrote:

> On Fri, Aug 11, 2023 at 11:38 PM Bruce Momjian  wrote:
> >
> > On Fri, Aug 11, 2023 at 10:46:31AM +0530, Amit Kapila wrote:
> > > On Thu, Aug 10, 2023 at 7:07 PM Masahiko Sawada 
> wrote:
> > > > What I imagined is that we do this check before
> > > > check_and_dump_old_cluster() while the server is 'off'. Reading the
> > > > slot state file would be simple and I guess we would not need a tool
> > > > or cli program for that. We need to expose ReplicationSlotOnDisk,
> > > > though.
> > >
> > > Won't that require a lot of version-specific checks as across versions
> > > the file format could be different? For the case of the control file,
> > > we use version-specific pg_controldata (for the old cluster, the
> > > corresponding version's pg_controldata) utility to read the old
> > > version control file. I thought we need something similar here if we
> > > want to do what you are suggesting.
> >
> > You mean the slot file format?
>
> Yes.
>
> >
> >  We will need that complexity somewhere,
> > so why not in pg_upgrade?
> >
>
> I don't think we need the complexity of version-specific checks if we
> do what we do in get_control_data(). Basically, invoke
> version-specific pg_replslotdata to get version-specific slot
> information. There has been a proposal for a tool like that [1]. Do
> you have something better in mind? If so, can you please explain the
> same a bit more?
>

Yeah, we need something like pg_replslotdata. If there are other useful
use cases for this tool, it would be good to have it. But I'm not sure there
are any other than the pg_upgrade use case.

Another idea (which might have already been discussed, though) is that we
check whether the latest shutdown checkpoint LSN in the control file matches
the confirmed_flush_lsn in the pg_replication_slots view. That way, we can
ensure that the slot has consumed all WAL records before the last shutdown.
We don't need to worry about WAL records generated after starting the old
cluster during the upgrade, at least for logical replication slots.

Regards,


Re: pg_waldump vs. all-zeros WAL files; server creation of such files

2023-08-13 Thread Michael Paquier
On Sat, Aug 12, 2023 at 08:15:31PM -0700, Noah Misch wrote:
> The attached 010_zero.pl, when run as part of the pg_waldump test suite, fails
> at today's master (c36b636) and v15 (1bc19df).  It passes at v14 (5a32af3).
> Command "pg_waldump --start 0/0100 --end 0/01000100" fails as follows:
> 
>   pg_waldump: error: WAL segment size must be a power of two between
>   1 MB and 1 GB, but the WAL file "00010002" header
>   specifies 0 bytes

So this depends on the ordering of the entries retrieved by readdir()
as much as the segments produced by the backend.

> Where it fails, the server has created an all-zeros WAL file under that name.
> Where it succeeds, that file doesn't exist at all.  Two decisions to make:
> 
> - Should a clean server shutdown ever leave an all-zeros WAL file?  I think
>   yes, it's okay to let that happen.

It doesn't hurt to leave that around.  On the contrary, it makes any
follow-up startup cheaper the bigger the segment size.

> - Should "pg_waldump --start $X --end $Y" open files not needed for the
>   requested range?  I think no.

So this is a case where identify_target_directory() is called with a
fname of NULL.  Agreed that search_directory could be smarter with the
files it should scan, especially if we have start and/or end LSNs at
hand to filter out what we'd like to be in the data folder.
--
Michael




Re: Naive handling of inequalities by nbtree initial positioning code

2023-08-13 Thread Peter Geoghegan
On Sun, Aug 13, 2023 at 5:50 PM Peter Geoghegan  wrote:
> select * from tenk1
> where
>   two = 1
>   and four = 3
>   and hundred = 91
>   and thousand = 891
>   and tenthous = 1891;
>
> The query returns one row, and touches 3 buffers/pages (according to
> EXPLAIN ANALYZE with buffers). The overheads here make perfect sense:
> there's one root page access, one leaf page access, and a single heap
> page access. Clearly the nbtree initial positioning code is able to
> descend to the exact leaf page (and page offset) where the first
> possible match could be found. Pretty standard stuff.

I probably should have made this first query use "four >= 3" instead
of using "four = 3" (while still using "four > 2" for the second,
"bad" query). The example works a bit better that way because now the
queries are logically equivalent, and yet still have this big
disparity. (We get 4 buffer hits for the "good" >= query, but 16
buffer hits for the equivalent "bad" > query.)
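
Spelled out, the adjusted "good" query would be:

    select * from tenk1
    where
      two = 1
      and four >= 3
      and hundred = 91
      and thousand = 891
      and tenthous = 1891;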

-- 
Peter Geoghegan




Naive handling of inequalities by nbtree initial positioning code

2023-08-13 Thread Peter Geoghegan
Suppose that I create the following index on the tenk1 table from the
regression tests:

create index on tenk1 (two, four, hundred, thousand, tenthous);

Now the following query will be able to use index quals for each
column that appears in my composite index:

select * from tenk1
where
  two = 1
  and four = 3
  and hundred = 91
  and thousand = 891
  and tenthous = 1891;

The query returns one row, and touches 3 buffers/pages (according to
EXPLAIN ANALYZE with buffers). The overheads here make perfect sense:
there's one root page access, one leaf page access, and a single heap
page access. Clearly the nbtree initial positioning code is able to
descend to the exact leaf page (and page offset) where the first
possible match could be found. Pretty standard stuff.

But if I rewrite this query to use an inequality, the picture changes.
If I replace "four = 3" with "four > 2", I get a query that is very
similar to the original (that returns the same single row):

select * from tenk1
where
  two = 1
  and four > 2
  and hundred = 91
  and thousand = 891
  and tenthous = 1891;

This time our query touches 16 buffers/pages. That's a total of 15
index pages accessed, of which 14 are leaf pages. We'll needlessly
plow through an extra 13 leaf pages, before finally arriving at the
first leaf page that might *actually* have a real match for us.

We can and should find a way for the second query to descend to the
same leaf page directly, so that the physical access patterns match
those that we saw with the first query. Only the first query can use
an insertion scan key with all 4 attributes filled in to find its
initial scan position. The second query uses an insertion scan key
with values set for the first 2 index columns (on two and four) only.
EXPLAIN offers no hint that this is what happens -- the "Index Cond:"
shown for each query is practically identical. It seems to me that
Markus Winand had a very good point when he complained that we don't
expose this difference directly (e.g., by identifying which columns
appear in "Access predicates" and which columns are merely "Index
filter predicates") [1]. That would make these kinds of issues a lot
more obvious.

The nbtree code is already capable of tricks that are close enough to
what I'm thinking of here. Currently, nbtree's initial positioning
code will set BTScanInsertData.nextkey=false for the first query
(where BTScanInsertData.keysz=4), and BTScanInsertData.nextkey=true
for the second query (where BTScanInsertData.keysz=2 right now). So
the second query I came up with does at least manage to locate the
leaf page where "four = 3" tuples begin, even today -- its "four > 2"
inequality is at least "handled efficiently".  The inefficiencies come
from how nbtree handles the remaining two index columns when building
an insertion scan key for our initial descent. nbtree will treat the
inequality as making it unsafe to include further values for the
remaining two attributes, which is the real source of the extra leaf
page scans (though of course the later attributes are still usable as
search-type scan keys). But it's *not* unsafe to position ourselves on
the right leaf page from the start. Not really.

All that it would take to fix the problem is per-attribute
BTScanInsertData.nextkey values. There is no reason why "nextkey"
semantics should only work for the last attribute in the insertion
scan key. Under this scheme, _bt_first() would be taught to set up the
insertion scan key with (say) nextkey=true for the "four > 2"
attribute, and nextkey=false for the other 3 attributes (since we do
that whenever >= or = are used). It would probably also make sense to
generalize this approach to handle (say) a third query that had a
"four < 2" inequality, but otherwise matched the first two queries. So
we wouldn't literally use multiple "nextkey" fields to do this.

The most general approach seems to require that we teach insertion
scan key routines like _bt_compare() about "goback" semantics, which
must also work at the attribute granularity. So we'd probably replace
both "nextkey" and "goback" with something new and more general. I
already wrote a patch (still in the CF queue) to teach nbtree
insertion scan keys about "goback" semantics [2] (whose use would
still be limited to backwards scans), so that we'd avoid needlessly
accessing extra pages in so-called boundary cases (which seems like a
much less important problem than the one I've highlighted here).

That existing patch already removed code in _bt_first that handled
"stepping back" once we're on the leaf level. ISTM that the right
place to do stuff like that is in routines like _bt_search,
_bt_binsrch, and _bt_compare -- not in _bt_first. The code around
_bt_compare seems like it would benefit from having more of this sort
of context. Having the full picture matters both when searching
internal pages and leaf pages.

Thoughts? Was this issue discussed at some point in the past?

[1] 

Re: Make psql's query canceling test simple by using signal() routine of IPC::Run

2023-08-13 Thread Fabien COELHO


Bonjour Michaël,


On Sun, Aug 13, 2023 at 11:22:33AM +0200, Fabien COELHO wrote:

Test run is ok on my Ubuntu laptop.


I have a few comments about this patch.


Argh, sorry!

I looked at what was removed (a lot) from the previous version, not what 
was remaining and should also have been removed.


--
Fabien.

Re: pgbench with libevent?

2023-08-13 Thread Fabien COELHO



Pgbench is managing client I/Os manually with select or poll. Much of this
could be managed by libevent.


Or maybe libuv (used by nodejs?).

From preliminary testing, libevent seems not too good at the fine-grained
time management that is used for throttling, whereas libuv advertises that
it is good at it, although what it actually does remains to be seen.


Note: libev had no updates in 8 years.

--
Fabien.




Re: Report planning memory in EXPLAIN ANALYZE

2023-08-13 Thread David Rowley
On Thu, 10 Aug 2023 at 20:33, Ashutosh Bapat
 wrote:
> My point is what's relevant here is how much net memory planner asked
> for.

But that's not what your patch is reporting. All you're reporting is
the difference in memory that's *currently* palloc'd from before and
after the planner ran.  If we palloc'd 600 exabytes then pfree'd it
again, your metric won't change.

I'm struggling a bit to understand your goals here.  If your goal is
to make a series of changes that reduces the amount of memory that's
palloc'd at the end of planning, then your patch seems to suit that
goal, but per the quote above, it seems you care about how many bytes
are palloc'd during planning and your patch does not seem to track that.

With your patch as it is, to improve the metric you're reporting we
could go off and do things like pfree Paths once createplan.c is done,
but really, why would we do that? Just making the "Planning Memory"
metric look better doesn't seem like a worthy goal.

Instead, if we reported the context's mem_allocated, then it would
give us the flexibility to make changes to the memory context code to
have the metric look better. It might also alert us to planner
inefficiencies and problems with new code that may cause a large spike
in the amount of memory that gets allocated.  Now, I'm not saying we
should add a patch that shows mem_allocated. I'm just questioning if
your proposed patch meets the goals you're trying to achieve.  I just
suggested that you might want to consider something else as a metric
for your memory usage reduction work.

David




Re: run pgindent on a regular basis / scripted manner

2023-08-13 Thread Michael Paquier
On Sun, Aug 13, 2023 at 10:33:21AM -0400, Andrew Dunstan wrote:
> After I'd been caught by this once or twice I implemented a git hook test
> for that too - in fact it was the first hook I did. It's not perfect but
> it's saved me a couple of times:
> 
> check_catalog_version () {

I find that pretty cool.  Thanks for sharing.
--
Michael




Re: Make psql's query canceling test simple by using signal() routine of IPC::Run

2023-08-13 Thread Michael Paquier
On Sun, Aug 13, 2023 at 11:22:33AM +0200, Fabien COELHO wrote:
> Test run is ok on my Ubuntu laptop.

I have a few comments about this patch.

On HEAD and even after this patch, we still have the following:
SKIP:
{
    skip "cancel test requires a Unix shell", 2 if $windows_os;

Could the SKIP be removed for $windows_os?  If not, this had better be
documented because the reason for the skip becomes incorrect.

The comment at the top of the SKIP block still states the following:
# There is, as of this writing, no documented way to get the PID of
# the process from IPC::Run.  As a workaround, we have psql print its
# own PID (which is the parent of the shell launched by psql) to a
# file.

This is also incorrect.
--
Michael




Re: Support to define custom wait events for extensions

2023-08-13 Thread Michael Paquier
On Thu, Aug 10, 2023 at 05:37:55PM +0900, Michael Paquier wrote:
> This looks correct, but perhaps we need to think harder about the
> custom event names and define a convention when more of this stuff is
> added to the core modules.

Okay, I have put my hands on that, fixing a couple of typos, polishing
a couple of comments, clarifying the docs and applying an indentation.
And here is a v4.

Any thoughts or comments?  I'd like to apply that soon, so as we are
able to move on with the wait event catalog and assigning custom wait
events to the other in-core modules.
--
Michael
From 83e40dace75d0df91e8b7f617bd03d02bd8450d4 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 14 Aug 2023 07:48:15 +0900
Subject: [PATCH v4] Change custom wait events to use shared hash tables

Currently, the names of custom wait events must be registered by each
backend, requiring all of them to link to the shared memory area of an
extension, even if it is not loaded with shared_preload_libraries.

This patch relaxes the constraints by storing the wait events and their
names in hash tables in shared memory.  First, this has the advantage of
simplifying the registration of custom wait events to a single routine
call that returns an event ID, either an existing or a newly-allocated one:
uint32 WaitEventExtensionNew(const char *wait_event_name);

Any other backend looking at the wait event information, for instance
via pg_stat_activity, is then able to automatically know about the new
event name.

The code changes done in worker_spi show how things are simplified:
- worker_spi_init() is gone.
- No more shmem hooks.
- The custom wait event ID is cached in the process that needs to set
it, with one single call to WaitEventExtensionNew() to retrieve it.

Per suggestion from Andres Freund.

Author: Masahiro Ikeda
Discussion: https://postgr.es/m/20230801032349.aaiuvhtrcvvcw...@awork3.anarazel.de
---
 src/include/utils/wait_event.h|  18 +-
 src/backend/storage/lmgr/lwlocknames.txt  |   1 +
 src/backend/utils/activity/wait_event.c   | 217 +++---
 .../utils/activity/wait_event_names.txt   |   1 +
 .../modules/worker_spi/t/001_worker_spi.pl|  18 +-
 .../modules/worker_spi/worker_spi--1.0.sql|   5 -
 src/test/modules/worker_spi/worker_spi.c  | 109 +
 doc/src/sgml/monitoring.sgml  |   5 +-
 doc/src/sgml/xfunc.sgml   |  26 +--
 src/tools/pgindent/typedefs.list  |   2 +
 10 files changed, 164 insertions(+), 238 deletions(-)

diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index aad8bc08fa..3cffae23e1 100644
--- a/src/include/utils/wait_event.h
+++ b/src/include/utils/wait_event.h
@@ -44,12 +44,14 @@ extern PGDLLIMPORT uint32 *my_wait_event_info;
  * Use this category when the server process is waiting for some condition
  * defined by an extension module.
  *
- * Extensions can define their own wait events in this category.  First,
- * they should call WaitEventExtensionNew() to get one or more wait event
- * IDs that are allocated from a shared counter.  These can be used directly
- * with pgstat_report_wait_start() or equivalent.  Next, each individual
- * process should call WaitEventExtensionRegisterName() to associate a wait
- * event string to the number allocated previously.
+ * Extensions can define their own wait events in this category.  They should
+ * call WaitEventExtensionNew() with a wait event string.  If the wait event
+ * associated the string is already allocated, it returns the wait event info.
+ * If not, it will get one wait event ID that is allocated from a shared
+ + counter, associate the string to the number in the shared dynamic hash and
+ * return the wait event info.
+ *
+ * The ID retrieved can be used with pgstat_report_wait_start() or equivalent.
  */
 typedef enum
 {
@@ -60,9 +62,7 @@ typedef enum
 extern void WaitEventExtensionShmemInit(void);
 extern Size WaitEventExtensionShmemSize(void);
 
-extern uint32 WaitEventExtensionNew(void);
-extern void WaitEventExtensionRegisterName(uint32 wait_event_info,
-		   const char *wait_event_name);
+extern uint32 WaitEventExtensionNew(const char *wait_event_name);
 
 /* --
  * pgstat_report_wait_start() -
diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt
index b34b6afecd..7af3e0ba1a 100644
--- a/src/backend/storage/lmgr/lwlocknames.txt
+++ b/src/backend/storage/lmgr/lwlocknames.txt
@@ -53,3 +53,4 @@ XactTruncationLock	44
 # 45 was XactTruncationLock until removal of BackendRandomLock
 WrapLimitsVacuumLock46
 NotifyQueueTailLock	47
+WaitEventExtensionLock  48
diff --git a/src/backend/utils/activity/wait_event.c b/src/backend/utils/activity/wait_event.c
index b3596ece80..14750233d4 100644
--- a/src/backend/utils/activity/wait_event.c
+++ b/src/backend/utils/activity/wait_event.c
@@ -45,6 +45,41 @@ uint32	   *my_wait_event_info = _my_wait_event_info;
 #define 

Re: run pgindent on a regular basis / scripted manner

2023-08-13 Thread Andrew Dunstan


On 2023-08-12 Sa 20:53, Peter Geoghegan wrote:

On Sat, Aug 12, 2023 at 5:20 PM Tom Lane  wrote:

Hm.  I was envisioning that we should expect committers to deal
with this, not original patch submitters.  So that's an argument
against including it in the CI tests.  But I'm in favor of anything
we can do to make it more painless for committers to fix up patch
indentation.



I agree with this.



Making it a special responsibility for committers comes with the same
set of problems that we see with catversion bumps. People are much
more likely to forget to do something that must happen last.



After I'd been caught by this once or twice I implemented a git hook 
test for that too - in fact it was the first hook I did. It's not 
perfect but it's saved me a couple of times:



check_catalog_version () {

    # only do this on master
    test  "$branch" = "master" || return 0

    case "$files" in
    *src/include/catalog/catversion.h*)
    return 0;
    ;;
    *src/include/catalog/*)
    ;;
    *)
    return 0;
    ;;
    esac

    # changes include catalog but not catversion.h, so warn about it
    {
    echo 'Commit on master alters catalog but catversion not bumped'
    echo 'It can be forced with git commit --no-verify'
    } >&2

    exit 1
}


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


pgbench with libevent?

2023-08-13 Thread Fabien COELHO


Hello devs,

Pgbench is managing client I/Os manually with select or poll. Much of
this could be managed by libevent.


Pros:

1. libevent is portable, stable, and widely used (eg Chromium, Memcached,
PgBouncer).

2. libevent implements more I/O wait methods, which may be more efficient on
some platforms (eg FreeBSD kqueue, Windows wepoll in libevent 2.2 alpha), and
hides portability issues.

3. it would remove significant portions of unattractive pgbench code, esp. in
threadRun and around the socket/poll abstraction and portability layer.

4. depending on the number of event loops, the client load could be shared
more evenly. Currently a thread only manages its own clients, so some client
I/Os may be waiting to be processed while other threads are available to
process them.

Cons:

1. it adds a libevent dependency to postgres. This may be a no-go from the
start.

2. this is a significant refactoring project which implies a new internal
architecture and adds new code to process and generate appropriate events.

3. libevent's ability to function efficiently in a highly multithreaded
environment is unclear. Should there be one event queue which generates a
shared work queue? Or several event queues, one per thread (which would
remove the sharing pro)? Or something in between? Some experiments and
configurability may be desirable. This may also have an impact on the pgbench
user interface and output depending on the result, eg there may be
specialized event and worker threads, some statistics may be slightly
different, new options may be needed…

4. libevent development seems sluggish: the last bugfix was published 3 years
ago and version 2.2 has been baking for years, but the development seems
lively (+100 contributors).

Neutral?

1. BSD 3-clause license.

Is pros > cons, or not? Other thoughts, pros, cons?

--
Fabien.

Re: pgbench: allow to exit immediately when any client is aborted

2023-08-13 Thread Fabien COELHO



Hello Yugo-san,


I attached the updated patch v3 including the changes above, a test,
and a fix for the typo you pointed out.


I'm sorry but the test in the previous patch was incorrect.
I attached the correct one.


About pgbench exit on abort v3:

Patch does not "git apply", but is ok with "patch" although there are some
minor warnings.

Compiles. Code is ok. Tests are ok.

About Test:

The test code is simple and gets an error quickly, after only a few
transactions; good. I would use a minimal "-c 2 -j 2 -t 10" instead of
"-c 4 -j 4 -T 10".


--
Fabien.




Re: Make psql's query canceling test simple by using signal() routine of IPC::Run

2023-08-13 Thread Fabien COELHO



Hello Yugo-san,


Currently, psql's test of query cancelling (src/bin/psql/t/020_cancel.pl)
gets the PPID of a running psql by using the "\!" meta command, and sends
SIGINT to the process by using "kill". However, IPC::Run provides a signal()
routine that sends a signal to a running process, so I think we can rewrite
the test in a simpler fashion using this routine, as in the attached patch.

What do you think about it?


I'm the one who pointed out signal(), so I'm a little biased;
nevertheless, regarding the patch:


Patch applies with "patch".

Test code is definitely much simpler.

Test run is ok on my Ubuntu laptop.

Let's drop 25 lines of perl!

--
Fabien.




Re: Ignore 2PC transaction GIDs in query jumbling

2023-08-13 Thread Michael Paquier
On Sun, Aug 13, 2023 at 02:48:22PM +0800, Julien Rouhaud wrote:
> On Sun, Aug 13, 2023 at 03:25:33PM +0900, Michael Paquier wrote:
>> Perhaps not as much, actually, because I was just reminded that
>> DEALLOCATE is something that pg_stat_statements ignores.  So this
>> makes the introduction of tests harder.
> 
> Maybe it's time to revisit that?  According to [1] the reason why
> pg_stat_statements currently ignores DEALLOCATE is because it indeed bloated
> the entries but also because at that time the suggestion for jumbling only 
> this
> one was really hackish.

Good point.  The argument of the other thread does not really hold
much these days now that query jumbling can happen for all the utility
nodes.

> Now that we do have a sensible approach to jumble utility statements, I think
> it would be beneficial to be able to track those, for instance to easily
> diagnose a driver that doesn't rely on the extended protocol.

Fine by me.  Would you like to write a patch?  I began typing an
embryo of a patch a few days ago, and have not yet looked at the rest.
Please see the attached.

>> Anyway, I guess that your own
>> extension modules have a need for a query ID compiled with these
>> fields ignored?
> 
> My module has a need to track statements and still work efficiently after 
> that.
> So anything that can lead to virtually an infinite number of 
> pg_stat_statements
> entries is a problem for me, and I guess to all the other modules / tools that
> track pg_stat_statements.  I guess the reason why my module is still ignoring
> DEALLOCATE is because it existed before pg 9.4 (with a homemade queryid as it
> wasn't exposed before that), and it just stayed there since with the rest of
> still problematic statements.

Yeah, probably.
--
Michael
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2565348303..b7aeebe3b4 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3925,8 +3925,10 @@ typedef struct ExecuteStmt
 typedef struct DeallocateStmt
 {
 	NodeTag		type;
-	char	   *name;			/* The name of the plan to remove */
-	/* NULL means DEALLOCATE ALL */
+	/* The name of the plan to remove.  NULL means DEALLOCATE ALL. */
+	char	   *name pg_node_attr(query_jumble_ignore);
+	/* token location, or -1 if unknown */
+	int			location pg_node_attr(query_jumble_location);
 } DeallocateStmt;
 
 /*
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index b3bdf947b6..680c7f3683 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -11936,6 +11936,7 @@ DeallocateStmt: DEALLOCATE name
 		DeallocateStmt *n = makeNode(DeallocateStmt);
 
 		n->name = $2;
+		n->location = @2;
 		$$ = (Node *) n;
 	}
 | DEALLOCATE PREPARE name
@@ -11943,6 +11944,7 @@ DeallocateStmt: DEALLOCATE name
 		DeallocateStmt *n = makeNode(DeallocateStmt);
 
 		n->name = $3;
+		n->location = @3;
 		$$ = (Node *) n;
 	}
 | DEALLOCATE ALL
@@ -11950,6 +11952,7 @@ DeallocateStmt: DEALLOCATE name
 		DeallocateStmt *n = makeNode(DeallocateStmt);
 
 		n->name = NULL;
+		n->location = -1;
 		$$ = (Node *) n;
 	}
 | DEALLOCATE PREPARE ALL
@@ -11957,6 +11960,7 @@ DeallocateStmt: DEALLOCATE name
 		DeallocateStmt *n = makeNode(DeallocateStmt);
 
 		n->name = NULL;
+		n->location = -1;
 		$$ = (Node *) n;
 	}
 		;
diff --git a/contrib/pg_stat_statements/expected/utility.out b/contrib/pg_stat_statements/expected/utility.out
index 93735d5d85..823d78802b 100644
--- a/contrib/pg_stat_statements/expected/utility.out
+++ b/contrib/pg_stat_statements/expected/utility.out
@@ -472,6 +472,45 @@ SELECT pg_stat_statements_reset();
  
 (1 row)
 
+-- Execution statements
+SELECT 1 as a;
+ a 
+---
+ 1
+(1 row)
+
+PREPARE stat_select AS SELECT $1 AS a;
+EXECUTE stat_select (1);
+ a 
+---
+ 1
+(1 row)
+
+DEALLOCATE stat_select;
+PREPARE stat_select AS SELECT $1 AS a;
+EXECUTE stat_select (2);
+ a 
+---
+ 2
+(1 row)
+
+DEALLOCATE PREPARE stat_select;
+DEALLOCATE ALL;
+DEALLOCATE PREPARE ALL;
+SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
+ calls | rows | query 
+---+--+---
+ 2 |2 | PREPARE stat_select AS SELECT $1 AS a
+ 1 |1 | SELECT $1 as a
+ 1 |1 | SELECT pg_stat_statements_reset()
+(3 rows)
+
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset 
+--
+ 
+(1 row)
+
 -- SET statements.
 -- These use two different strings, still they count as one entry.
 SET work_mem = '1MB';
diff --git a/contrib/pg_stat_statements/sql/utility.sql b/contrib/pg_stat_statements/sql/utility.sql
index 87666d9135..5f7d4a467f 100644
--- a/contrib/pg_stat_statements/sql/utility.sql
+++ b/contrib/pg_stat_statements/sql/utility.sql
@@ -237,6 +237,19 @@ DROP DOMAIN domain_stats;
 SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";

Re: Ignore 2PC transaction GIDs in query jumbling

2023-08-13 Thread Julien Rouhaud
On Sun, Aug 13, 2023 at 03:25:33PM +0900, Michael Paquier wrote:
> On Tue, Aug 01, 2023 at 10:46:58AM +0800, Julien Rouhaud wrote:
> >
> > Agreed, it should be as trivial to implement as for the 2pc commands :)
>
> Perhaps not as much, actually, because I was just reminded that
> DEALLOCATE is something that pg_stat_statements ignores.  So this
> makes harder the introduction of tests.

Maybe it's time to revisit that?  According to [1] the reason why
pg_stat_statements currently ignores DEALLOCATE is because it indeed bloated
the entries but also because at that time the suggestion for jumbling only this
one was really hackish.

Now that we do have a sensible approach to jumble utility statements, I think
it would be beneficial to be able to track those, for instance to easily
diagnose a driver that doesn't rely on the extended protocol.

> Anyway, I guess that your own
> extension modules have a need for a query ID compiled with these
> fields ignored?

My module has a need to track statements and still work efficiently after that.
So anything that can lead to virtually an infinite number of pg_stat_statements
entries is a problem for me, and I guess to all the other modules / tools that
track pg_stat_statements.  I guess the reason why my module is still ignoring
DEALLOCATE is because it existed before pg 9.4 (with a homemade queryid as it
wasn't exposed before that), and it just stayed there since with the rest of
still problematic statements.

> For now, I have applied the 2PC bits independently, as of 638d42a.

Thanks!

[1] 
https://www.postgresql.org/message-id/flat/alpine.DEB.2.10.1404011631560.2557%40sto




Re: Ignore 2PC transaction GIDs in query jumbling

2023-08-13 Thread Michael Paquier
On Tue, Aug 01, 2023 at 10:46:58AM +0800, Julien Rouhaud wrote:
> On Tue, Aug 01, 2023 at 11:37:49AM +0900, Michael Paquier wrote:
>> On Tue, Aug 01, 2023 at 10:22:09AM +0800, Julien Rouhaud wrote:
>>> Looking at the rest of the ignored patterns, the only remaining one would be
>>> DEALLOCATE, which AFAICS doesn't have a query_jumble_ignore tag for now.
>>
>> This one seems to be simple as well with a location field, looking
>> quickly at its Node.
> 
> Agreed, it should be as trivial to implement as for the 2pc commands :)

Perhaps not as much, actually, because I was just reminded that
DEALLOCATE is something that pg_stat_statements ignores.  So this
makes the introduction of tests harder.  Anyway, I guess that your own
extension modules have a need for a query ID compiled with these
fields ignored? 

For now, I have applied the 2PC bits independently, as of 638d42a.
--
Michael

