Re: Debian 12 gcc warning

2023-08-28 Thread David Rowley
On Tue, 29 Aug 2023 at 07:37, Bruce Momjian  wrote:
> nargs = 0;
> foreach(lc, args)
> {
> actual_arg_types[nargs++] = exprType((Node *) lfirst(lc));
> }

Does it still produce the warning if you form the above more like this?

nargs = list_length(args);
for (int i = 0; i < nargs; i++)
actual_arg_types[i] = exprType((Node *) list_nth(args, i));

I'm just not sure if it's unable to figure out that at least the first
nargs elements are set, or if it won't be happy until all 100 elements
are set.

David




Re: Debian 12 gcc warning

2023-08-28 Thread Bruce Momjian
On Tue, Aug 29, 2023 at 11:55:48AM +1200, David Rowley wrote:
> On Tue, 29 Aug 2023 at 07:37, Bruce Momjian  wrote:
> > nargs = 0;
> > foreach(lc, args)
> > {
> > actual_arg_types[nargs++] = exprType((Node *) lfirst(lc));
> > }
> 
> Does it still produce the warning if you form the above more like this?
> 
> nargs = list_length(args);
> for (int i = 0; i < nargs; i++)
> actual_arg_types[i] = exprType((Node *) list_nth(args, i));
> 
> I'm just not sure if it's unable to figure out that at least the first
> nargs elements are set, or if it won't be happy until all 100 elements
> are set.

I applied the attached patch but got the same warning:

clauses.c: In function ‘recheck_cast_function_args’:
clauses.c:4297:19: warning: ‘actual_arg_types’ may be used uninitialized [-Wmaybe-uninitialized]
 4297 |         rettype = enforce_generic_type_consistency(actual_arg_types,
 4298 |                                                    declared_arg_types,
 4299 |                                                    nargs,
 4300 |                                                    funcform->prorettype,
 4301 |                                                    false);
In file included from clauses.c:45:
../../../../src/include/parser/parse_coerce.h:82:17: note: by argument 1 of type ‘const Oid *’ {aka ‘const unsigned int *’} to ‘enforce_generic_type_consistency’ declared here
   82 | extern Oid  enforce_generic_type_consistency(const Oid *actual_arg_types,
clauses.c:4279:33: note: ‘actual_arg_types’ declared here
 4279 |         Oid             actual_arg_types[FUNC_MAX_ARGS];

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index da258968b8..6cf020acef 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -4279,15 +4279,19 @@ recheck_cast_function_args(List *args, Oid result_type,
 	Oid			actual_arg_types[FUNC_MAX_ARGS];
 	Oid			declared_arg_types[FUNC_MAX_ARGS];
 	Oid			rettype;
-	ListCell   *lc;
+//	ListCell   *lc;
 
 	if (list_length(args) > FUNC_MAX_ARGS)
 		elog(ERROR, "too many function arguments");
-	nargs = 0;
-	foreach(lc, args)
-	{
-		actual_arg_types[nargs++] = exprType((Node *) lfirst(lc));
-	}
+//	nargs = 0;
+//	foreach(lc, args)
+//	{
+//		actual_arg_types[nargs++] = exprType((Node *) lfirst(lc));
+//	}
+	nargs = list_length(args);
+	for (int i = 0; i < nargs; i++)
+	actual_arg_types[i] = exprType((Node *) list_nth(args, i));
+
 	Assert(nargs == pronargs);
 	memcpy(declared_arg_types, proargtypes, pronargs * sizeof(Oid));
 	rettype = enforce_generic_type_consistency(actual_arg_types,


Re: Should the archiver process always make sure that the timeline history files exist in the archive?

2023-08-28 Thread Jimmy Yih
Thanks for the insightful response! I have attached an updated patch
that moves the proposed logic to the end of StartupXLOG where it seems
more correct to do this. It also helps with backporting (if it's
needed) since the archiver process only has access to shared memory
starting from Postgres 14.

Kyotaro Horiguchi  wrote:
> A. The OP suggests archiving the timeline history file for the current
>  timeline every time the archiver starts. However, I don't think we
>  want to keep archiving the same file over and over. (Granted, we're
>  not always perfect at avoiding that..)

With the updated proposed patch, we'll be checking if the current
timeline history file needs to be archived at the end of StartupXLOG
if archiving is enabled. If it detects that a .ready or .done file
already exists, then it won't do anything (which will be the common
case). I agree though that this may be an excessive check since it'll
be a no-op the majority of the time. However, it shouldn't execute
often and seems like a quick safe preventive measure. Could you give
more details on why this would be too cumbersome?
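
(To illustrate the shape of that check, here is a rough sketch built
from the existing xlogarchive.c helpers; the placement and the variable
name newTLI are assumptions for illustration, not the attached patch
itself:)

	if (XLogArchivingActive() && newTLI > 1)	/* timeline 1 has no history file */
	{
		char		histfname[MAXFNAMELEN];

		/* history file for the timeline recovery ended on */
		TLHistoryFileName(histfname, newTLI);

		/* no-op if a .ready or .done marker already exists */
		if (!XLogArchiveIsReadyOrDone(histfname))
			XLogArchiveNotify(histfname);
	}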

Kyotaro Horiguchi  wrote:
> B. Given that the steps valid, I concur to what is described in the
>  test script provided: standbys don't really need that history file
>  for the initial TLI (though I have yet to fully verify this).  If the
>  walreceiver just overlooks a fetch error for this file, the standby
>  can successfully start. (Just skipping the first history file seems
>  to work, but it feels a tad aggressive to me.)

This was my initial thought as well, but I wasn't sure if it was okay
to overlook the fetch error. Initial testing and brainstorming seem
to show that it's okay. I think the main downside is that these new
standbys will not have their initial timeline history files, which can
be useful for administration. I've attached a patch that attempts this
approach, in case we want to switch to it as the solution. The patch
contains an updated TAP test as well, to better showcase the issue and
fix.

Kyotaro Horiguchi  wrote:
> C. If those steps aren't valid, we might want to add a note stating
>  that -X none basebackups do need the timeline history file for the
>  initial TLI.

The difficult thing about only documenting this is that it forces the
user to manually store and track the timeline history files. It can be
a bit cumbersome for WAL archiving users to recognize this scenario
when they're just trying to optimize their basebackups by using
-Xnone. But then again -Xnone does seem like it's designed for
advanced users so this might be okay.

Kyotaro Horiguchi  wrote:
> And don't forget to enable archive mode before the latest timeline
> switch if any.

This might not be reasonable since a user could've been using
streaming replication and doing failover/failbacks as part of general
high availability to manage their Postgres without knowing they were
going to enable WAL archiving later on. The user would need to
configure archiving and force a failover which may not be
straightforward.

Regards,
Jimmy Yih

v2-0001-Archive-current-timeline-history-file-after-recovery.patch
Description:  v2-0001-Archive-current-timeline-history-file-after-recovery.patch


v1-0001-Allow-recovery-to-proceed-when-initial-timeline-hist.patch
Description:  v1-0001-Allow-recovery-to-proceed-when-initial-timeline-hist.patch


Re: Debian 12 gcc warning

2023-08-28 Thread Bruce Momjian
On Mon, Aug 28, 2023 at 07:10:38PM -0400, Bruce Momjian wrote:
> On Tue, Aug 29, 2023 at 07:30:15AM +0900, Michael Paquier wrote:
> > On Mon, Aug 28, 2023 at 03:37:20PM -0400, Bruce Momjian wrote:
> > > I don't see a clean way of avoiding the warning except by initializing
> > > the array, which seems wasteful.
> > 
> > Or just initialize the array with a {0}?
> 
> Uh, doesn't that set all elements to zero?  See:
> 
>   
> https://stackoverflow.com/questions/2589749/how-to-initialize-array-to-0-in-c

FYI, that does stop the warning.
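
For reference, the change being tested is just a zero initializer on
the declaration, along these lines (a minimal sketch, not necessarily
the exact form that would be committed):

	/* {0} zero-initializes the whole array, silencing -Wmaybe-uninitialized */
	Oid			actual_arg_types[FUNC_MAX_ARGS] = {0};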

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Wrong usage of pqMsg_Close message code?

2023-08-28 Thread Tatsuo Ishii
> Yeah, both of you are right here.  Anyway, it seems to me that there
> is a bit more going on in protocol.h.  I have noticed two more things
> that are incorrect:
> - HandleParallelMessage is missing a message for 'P', but I think that
> we should have a code for it as well as part of the parallel query
> protocol.

I did not know this. Why is this not explained in the frontend/backend
protocol document?

> - PqMsg_Terminate can be sent by the frontend *and* the backend, see
> fe-connect.c and parallel.c.  However, protocol.h documents it as a
> frontend-only code.

I do not blame protocol.h because our frontend/backend protocol
document also states that it's a frontend-only message. Whoever
started to use 'X' in the backend should have added that to the
documentation.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Eager page freeze criteria clarification

2023-08-28 Thread Peter Geoghegan
On Mon, Aug 28, 2023 at 4:15 PM Melanie Plageman
 wrote:
> On Mon, Aug 28, 2023 at 5:06 PM Peter Geoghegan  wrote:
> > What I've described is a scheme for deciding to not freeze where that
> > would usually happen -- a scheme for *vetoing* page freezing -- rather
> > than a scheme for deciding to freeze. On its own, what I suggested
> > cannot help at all. It assumes a world in which we're already deciding
> > to freeze much more frequently, based on whatever other criteria. It's
> > intended to complement something like the LSN scheme.
>
> I like the direction of this idea. It could avoid repeatedly freezing
> a page that is being modified over and over. I tried it out with a
> short-running version of workload I (approximating a work queue) --
> and it prevents unnecessary freezing.

I see a lot of value in it as an enabler of freezing at the earliest
opportunity, which is usually the way to go.

> The problem with this criterion is that if you freeze a page and then
> ever modify it again -- even once, vacuum will not be allowed to
> freeze it again until it is forced to.

Right. Like you, I was thinking of something that dampened VACUUM's
newfound enthusiasm for freezing, in whatever way -- not a discrete
rule. A mechanism with a sense of proportion about what it meant for
the page to look a certain way. The strength of the signal would
perhaps be highest (i.e. most discouraging of further freezing) on a
page that has only a few relatively recent unfrozen tuples. XID age
could actually be useful here!

> Perhaps we could use a very
> strict LSN cutoff for pages which contain any frozen tuples -- for
> example: only freeze a page containing a mix of frozen and unfrozen
> tuples if it has not been modified since before the last vacuum of the
> relation ended.

You might also need to account for things like interactions with the
FSM. If the new unfrozen tuple packs the page to the brim, and the new
row is from an insert, then maybe the mechanism shouldn't dampen our
enthusiasm for freezing the page at all. Similarly, on a page that has
no unfrozen tuples at all just yet, it would make sense for the
algorithm to be most enthusiastic about freezing those pages that can
fit no more tuples. Perhaps we'd be willing to accept the cost of an
extra FPI with such a page, for example.

It will take time to validate these sorts of ideas thoroughly, of
course. I agree with what you said upthread about it also being a
question of values and priorities.
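
(As a strawman, the quoted criterion could be written roughly like
this; the function and the last_vacuum_end_lsn bookkeeping are
hypothetical and do not exist in the tree today:)

static bool
freeze_mixed_page_allowed(Page page, XLogRecPtr last_vacuum_end_lsn)
{
	/*
	 * Only re-freeze a page that already contains frozen tuples if it has
	 * not been modified since before the previous VACUUM of the relation
	 * finished.
	 */
	return PageGetLSN(page) < last_vacuum_end_lsn;
}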

-- 
Peter Geoghegan




Re: Query execution in Perl TAP tests needs work

2023-08-28 Thread Noah Misch
On Mon, Aug 28, 2023 at 05:29:56PM +1200, Thomas Munro wrote:
> CI shows Windows
> consuming 4 CPUs at 100% for a full 10 minutes to run a test suite
> that finishes in 2-3 minutes everywhere else with the same number of
> CPUs.  Could there be an event handling snafu in IPC::Run or elsewhere
> nearby?  It seems like there must be either a busy loop or a busted
> sleep/wakeup... somewhere?

That smells like this one:
https://github.com/cpan-authors/IPC-Run/issues/166#issuecomment-1288190929

> As an experiment, I hacked up a not-good-enough-to-share experiment
> where $node->safe_psql() would automatically cache a BackgroundPsql
> object and reuse it, and the times for that test dropped ~51 -> ~9s on
> Windows, and ~7 -> ~2s on the Unixen.

Nice!

> suppose there are quite a few ways we could do better:
> 
> 1.  Don't fork anything at all: open (and cache) a connection directly
> from Perl.
> 1a.  Write xsub or ffi bindings for libpq.  Or vendor (parts) of the
> popular Perl xsub library?
> 1b.  Write our own mini pure-perl pq client module.  Or vendor (parts)
> of some existing one.
> 2.  Use long-lived psql sessions.
> 2a.  Something building on BackgroundPsql.
> 2b.  Maybe give psql or a new libpq-wrapper a new low level stdio/pipe
> protocol that is more fun to talk to from Perl/machines?

(2a) seems adequate and easiest to achieve.  If DBD::Pg were under a
compatible license, I'd say use it as the vendor for (1a).  Maybe we can get
it relicensed?  Building a separate way of connecting from Perl would be sad.




Re: Logger process and "LOG: could not close client or listen socket: Bad file descriptor"

2023-08-28 Thread Peter Geoghegan
On Sun, Aug 27, 2023 at 5:52 PM Michael Paquier  wrote:
> From what I can see, this is a rather old issue, because
> ListenSocket[] is filled with PGINVALID_SOCKET *after* starting the
> syslogger.  It seems to me that we should just initialize the array
> before starting the syslogger, so that we don't get these incorrect
> logs?
>
> Thoughts?  Please see the attached.

Agreed, this is very annoying. I'm going to start using your patch
with the feature branch I'm working on. Hopefully that won't be
necessary for too much longer.


-- 
Peter Geoghegan




Re: Debian 12 gcc warning

2023-08-28 Thread John Naylor
On Tue, Aug 29, 2023 at 6:56 AM David Rowley  wrote:
>
> I'm just not sure if it's unable to figure out that at least the first
> nargs elements are set, or if it won't be happy until all 100 elements
> are set.

It looks like the former, since I can silence it on gcc 13 / -O1 by doing:

/* keep compiler quiet */
actual_arg_types[0] = InvalidOid;

--
John Naylor
EDB: http://www.enterprisedb.com


Re: POC, WIP: OR-clause support for indexes

2023-08-28 Thread a.rybakina
Thank you for your interest in this problem and for your help, and I'm
sorry that I didn't respond to this email for a long time. To be honest,
I wanted to investigate the problems in more detail before giving a
clearer answer, but unfortunately I have not found anything significant
yet.


On 21.08.2023 01:26, Peter Geoghegan wrote:

There was actually support for OR lists in index AMs prior to
ScalarArrayOpExpr. Even though ScalarArrayOpExpr don't really seem all
that related to bitmap scans these days (since at least nbtree knows
how to execute them "natively"), that wasn't always the case.
ScalarArrayOpExpr were invented the same year that bitmap index scans
were first added (2005), and seem more or less related to that work.
See commits bc843d39, 5b051852, 1e9a6ba5, and 290166f9 (all from
2005). Particularly the last one, which has a commit message that
heavily suggests that my interpretation is correct.

Back in 2003, commit 9888192f removed (or at least simplified) what
were then called "CNF/DNF CONVERSION ROUTINES". Prior to that point
the optimizer README had something about leaving clause lists
un-normalized leading to selectivity estimation problems. Bear in mind
that this is a couple of years before ScalarArrayOpExpr was first
invented. Apparently even back then "The OR-of-ANDs format is useful
for indexscan implementation". It's possible that that old work will
offer some hints on what to do now.
In a way it's not surprising that work in this area would have some
impact on selectivities. The surprising part is the extent of the
problem, I suppose.

I see that a lot of the things in this area are just used by BitmapOr
clauses, such as build_paths_for_OR() -- but you're not necessarily
able to use any of that stuff. Also, choose_bitmap_and() has some
stuff about how it compensates to avoid "too-small selectivity that
makes a redundant AND step look like it reduces the total cost". It
also mentions some problems with match_join_clauses_to_index() +
extract_restriction_or_clauses(). Again, this might be a good place to
look for more clues.
I agree with your suggestion to look for the source of the selectivity
error in these places. But honestly, no matter how many times I have
looked, no sensible explanation of what could cause the problem has
emerged yet. I keep looking; maybe I'll find something.
EXPLAIN (COSTS OFF) SELECT * FROM tenk1 WHERE thousand = 42 AND
(tenthous = 1 OR tenthous = 3 OR tenthous = 42);
-                                                         QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
- Bitmap Heap Scan on tenk1
-   Recheck Cond: (((thousand = 42) AND (tenthous = 1)) OR ((thousand = 42) AND (tenthous = 3)) OR ((thousand = 42) AND (tenthous = 42)))
-   ->  BitmapOr
-         ->  Bitmap Index Scan on tenk1_thous_tenthous
-               Index Cond: ((thousand = 42) AND (tenthous = 1))
-         ->  Bitmap Index Scan on tenk1_thous_tenthous
-               Index Cond: ((thousand = 42) AND (tenthous = 3))
-         ->  Bitmap Index Scan on tenk1_thous_tenthous
-               Index Cond: ((thousand = 42) AND (tenthous = 42))
-(9 rows)
+                               QUERY PLAN
+-------------------------------------------------------------------------
+ Index Scan using tenk1_thous_tenthous on tenk1
+   Index Cond: ((thousand = 42) AND (tenthous = ANY (ARRAY[1, 3, 42])))
+(2 rows)


I think that we currently over-rely on BitmapOr for OR clauses. It's
useful that they're so general, of course, but ISTM that we shouldn't
even try to use a BitmapOr in simple cases. Things like the "WHERE
thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42)"
tenk1 query that you brought up probably shouldn't even have a
BitmapOr path (which I guess they don't with you patch). Note that I
recently discussed the same query at length with Tomas Vondra on the
ongoing thread for his index filter patch (you probably knew that
already).
I think so too, but in my opinion it's still quite difficult to find a
stable enough optimization to implement this. But I will finish the
current OR->ANY optimization, given that something interesting has
appeared.


Re: subscription/015_stream sometimes breaks

2023-08-28 Thread Amit Kapila
On Mon, Aug 28, 2023 at 5:35 AM Peter Smith  wrote:
>
> On Fri, Aug 25, 2023 at 8:15 PM Amit Kapila  wrote:
>
> IMO there are inconsistencies in the second patch that was pushed.
>
> 1. In the am_xxx functions, why is there Assert 'in_use' only for the
> APPLY / PARALLEL_APPLY workers but not for TABLESYNC workers?
>
> 2. In the am_xxx functions there is now Assert 'in_use', so why are we
> still using macros to check again what we already asserted is not
> possible? (Or, if the checking overkill was a deliberate choice then
> why is there no isLeaderApplyWorker macro?)
>
> ~
>
> PSA a small patch to address these.
>

I find your suggestions reasonable. Alvaro, do you have any comments?

-- 
With Regards,
Amit Kapila.




Re: Query execution in Perl TAP tests needs work

2023-08-28 Thread Thomas Munro
On Tue, Aug 29, 2023 at 1:48 PM Noah Misch  wrote:
> https://github.com/cpan-authors/IPC-Run/issues/166#issuecomment-1288190929

Interesting.  But that shows a case with no pipes connected, using
select() as a dumb sleep and ignoring SIGCHLD.  In our usage we have
pipes connected, and I think select() should return when the child's
output pipes become readable due to EOF.  I guess something about that
might be b0rked on Windows?  I see there is an extra helper process
doing socket<->pipe conversion (hah, that explains an extra ~10ms at
the start in my timestamps)... I can't really follow that code, but
perhaps the parent forgot to close the far end of the socket pair, so
there is no EOF?




Re: Query execution in Perl TAP tests needs work

2023-08-28 Thread Thomas Munro
On Tue, Aug 29, 2023 at 1:23 AM Andrew Dunstan  wrote:
> I like the idea of using a pure perl pq implementation, not least because it 
> could expand our ability to test things at the protocol level. Not sure how 
> much work it would be. I'm willing to help if we want to go that way.

Cool.  Let's see what others think.

And assuming we can pick *something* vaguely efficient and find a Perl
hacker to implement it, a related question is how to expose it to our
test suites.

Should we figure out how to leave all our tests unchanged, by teaching
$node->psql() et al to do caching implicitly?  Or should we make it
explicit, with $conn = $node->connect(), and $conn->do_stuff()?  And
if the latter, should do_stuff be DBI style or something that suits us
better?




Re: persist logical slots to disk during shutdown checkpoint

2023-08-28 Thread vignesh C
On Mon, 28 Aug 2023 at 18:56, Amit Kapila  wrote:
>
> On Thu, Aug 24, 2023 at 11:44 AM vignesh C  wrote:
> >
>
> The patch looks mostly good to me. I have made minor changes which are
> as follows: (a) removed the autovacuum =off and
> wal_receiver_status_interval = 0 setting as those doesn't seem to be
> required for the test; (b) changed a few comments and variable names
> in the code and test;
>
> Shall we change the test file name from always_persist to
> save_logical_slots_shutdown and move to recovery/t/ as this test is
> about verification after the restart of the server?

That makes sense. The attached v6 version has the changes for the
same. Apart from this, I have also fixed: a) pgindent issues, b)
perltidy issues, c) one variable rename (flush_lsn_changed to
confirmed_flush_has_changed), and d) a few comments in the test file.
Thanks to Peter for providing a few offline comments.

Regards,
Vignesh
From 7cb9d7c874397cacbb4fdd6c411f1d93570265f8 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v6] Persist logical slots to disk during a shutdown checkpoint
 if required.

It's entirely possible for a logical slot to have a confirmed_flush LSN
higher than the last value saved on disk while not being marked as dirty.
Currently, it is not a major problem but a later patch adding support for
the upgrade of slots relies on that value being properly persisted to disk.

It can also help with avoiding processing the same transactions again in
some boundary cases after the clean shutdown and restart. Say, we process
some transactions for which we didn't send anything downstream (the
changes got filtered) but the confirm_flush LSN is updated due to
keepalives. As we don't flush the latest value of confirm_flush LSN,
it may lead to processing the same changes again.

Author: Julien Rouhaud, Vignesh C, Kuroda Hayato based on suggestions by
Ashutosh Bapat
Reviewed-by: Amit Kapila, Peter Smith
Discussion: http://postgr.es/m/CAA4eK1JzJagMmb_E8D4au=gyqkxox0afnbm1fbp7sy7t4yw...@mail.gmail.com
Discussion: http://postgr.es/m/tyapr01mb58664c81887b3af2eb6b16e3f5...@tyapr01mb5866.jpnprd01.prod.outlook.com
---
 src/backend/access/transam/xlog.c |   2 +-
 src/backend/replication/slot.c|  29 +++--
 src/include/replication/slot.h|  13 ++-
 src/test/recovery/meson.build |   1 +
 .../t/038_save_logical_slots_shutdown.pl  | 101 ++
 5 files changed, 133 insertions(+), 13 deletions(-)
 create mode 100644 src/test/recovery/t/038_save_logical_slots_shutdown.pl

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f6f8adc72a..f26c8d18a6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7039,7 +7039,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index bb09c4010f..c075f76317 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -109,7 +109,8 @@ static void ReplicationSlotDropPtr(ReplicationSlot *slot);
 /* internal persistency functions */
 static void RestoreSlotFromDisk(const char *name);
 static void CreateSlotOnDisk(ReplicationSlot *slot);
-static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel);
+static void SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel,
+		   bool is_shutdown);
 
 /*
  * Report shared-memory space needed by ReplicationSlotsShmemInit.
@@ -321,6 +322,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	slot->candidate_xmin_lsn = InvalidXLogRecPtr;
 	slot->candidate_restart_valid = InvalidXLogRecPtr;
 	slot->candidate_restart_lsn = InvalidXLogRecPtr;
+	slot->last_saved_confirmed_flush = InvalidXLogRecPtr;
 
 	/*
 	 * Create the slot on disk.  We haven't actually marked the slot allocated
@@ -783,7 +785,7 @@ ReplicationSlotSave(void)
 	Assert(MyReplicationSlot != NULL);
 
 	sprintf(path, "pg_replslot/%s", NameStr(MyReplicationSlot->data.name));
-	SaveSlotToPath(MyReplicationSlot, path, ERROR);
+	SaveSlotToPath(MyReplicationSlot, path, ERROR, false);
 }
 
 /*
@@ -1572,11 +1574,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1601,7 +1602,7 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
-		SaveSlotToPath(s, path, LOG);
+		SaveSlotToPa

Re: Strange presentaion related to inheritance in \d+

2023-08-28 Thread Kyotaro Horiguchi
At Mon, 28 Aug 2023 13:36:00 +0200, Alvaro Herrera  
wrote in 
> On 2023-Aug-28, Kyotaro Horiguchi wrote:
> 
> > But with these tables:
> > 
> > create table p (a int, b int not null default 0);
> > create table c1 (a int, b int not null NO INHERIT default 1) inherits (p);
> > 
> > I get:
> > 
> > > Not-null constraints:
> > >"c1_b_not_null" NOT NULL "b" *NO INHERIT*
> > 
> > Here, "NO INHERIT" is mapped from connoinherit, and conislocal and
> > "coninhcount <> 0" align with "local" and "inherited". For a clearer
> picture, those values for c1 are as follows.
> 
> Hmm, I think the bug here is that we let you create a constraint in c1
> that is NOINHERIT.  If the parent already has one INHERIT constraint
> in that column, then the child must have that one also; it's not
> possible to have both a constraint that inherits and one that doesn't.

Yeah, I had the same question about the coexisting of the two.

> I understand that there are only three possibilities for a NOT NULL
> constraint in a column:
> 
> - There's a NO INHERIT constraint.  A NO INHERIT constraint is always
>   defined locally in that table.  In this case, if there is a parent
>   relation, then it must either not have a NOT NULL constraint in that
>   column, or it may also have a NO INHERIT one.  Therefore, it's
>   correct to print NO INHERIT and nothing else.  We could also print
>   "(local)" but I see no point in doing that.
> 
> - A constraint comes inherited from one or more parent tables and has no
>   local definition.  In this case, the constraint always inherits
>   (otherwise, the parent wouldn't have given it to this table).  So
>   printing "(inherited)" and nothing else is correct.
> 
> - A constraint can have a local definition and also be inherited.  In
>   this case, printing "(local, inherited)" is correct.
> 
> Have I missed other cases?

Seems correct. I don't see another case, given that NO INHERIT is
disallowed when a table has an inherited constraint.

> The NO INHERIT bit is part of the syntax, which is why I put it in
> uppercase and not marked it for translation.  The other two are
> informational, so they are translatable.

Given the conditions above, I agree with you.

Attached is the initial version of the patch. It prevents "CREATE
TABLE" from executing if there is an inconsistent not-null constraint.
I also noticed that "ALTER TABLE t ADD NOT NULL c NO INHERIT" silently
ignores the "NO INHERIT" part, and fixed that as well.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index b534da7d80..39fbb0f151 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2533,7 +2533,7 @@ AddRelationNewConstraints(Relation rel,
 			 * update its catalog status and we're done.
 			 */
 			if (AdjustNotNullInheritance1(RelationGetRelid(rel), colnum,
-		  cdef->inhcount))
+		  cdef->inhcount, cdef->is_no_inherit))
 continue;
 
 			/*
diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index 2a725f6280..bd589c0e7c 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -669,7 +669,8 @@ extractNotNullColumn(HeapTuple constrTup)
  * If no not-null constraint is found for the column, return false.
  */
 bool
-AdjustNotNullInheritance1(Oid relid, AttrNumber attnum, int count)
+AdjustNotNullInheritance1(Oid relid, AttrNumber attnum, int count,
+		  bool is_no_inherit)
 {
 	HeapTuple	tup;
 
@@ -681,6 +682,14 @@ AdjustNotNullInheritance1(Oid relid, AttrNumber attnum, int count)
 
 		pg_constraint = table_open(ConstraintRelationId, RowExclusiveLock);
 		conform = (Form_pg_constraint) GETSTRUCT(tup);
+
+		if (is_no_inherit)
+			ereport(ERROR,
+	errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+	errmsg("cannot change NO INHERIT status of inherited NOT NULL constraint \"%s\" on relation \"%s\"",
+		   NameStr(conform->conname), get_rel_name(relid)));
+
+
 		if (count > 0)
 			conform->coninhcount += count;
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 47c900445c..ae5ef6e115 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -350,8 +350,8 @@ static void truncate_check_perms(Oid relid, Form_pg_class reltuple);
 static void truncate_check_activity(Relation rel);
 static void RangeVarCallbackForTruncate(const RangeVar *relation,
 		Oid relId, Oid oldRelId, void *arg);
-static List *MergeAttributes(List *schema, List *supers, char relpersistence,
-			 bool is_partition, List **supconstr,
+static List *MergeAttributes(List *schema, List *supers, List *nnconstr,
+			 char relpersistence, bool is_partition, List **supconstr,
 			 List **supnotnulls);
 static bool MergeCheckConstraint(List *constraints, char *name, Node *expr);
 static void MergeAttributesIntoExisting(Relation child_rel, Relation parent_rel);
@@ -873,7 +873,7 @@ DefineRelati

Re: Debian 12 gcc warning

2023-08-28 Thread Tristan Partin

On Mon Aug 28, 2023 at 2:37 PM CDT, Bruce Momjian wrote:

I don't see a clean way of avoiding the warning except by initializing
the array, which seems wasteful.


For what it's worth, we recently committed a patch[0] that initialized 
an array due to a similar warning being generated on Fedora 38 (gcc 
(GCC) 13.2.1 20230728 (Red Hat 13.2.1-1)).


[0]: 
https://github.com/postgres/postgres/commit/4a8fef0d733965c1a1836022f8a42ab1e83a721f

--
Tristan Partin
Neon (https://neon.tech)




Re: Autogenerate some wait events code and documentation

2023-08-28 Thread Drouvot, Bertrand

Hi,

On 8/28/23 10:04 AM, Michael Paquier wrote:

On Mon, Jul 17, 2023 at 10:16:02AM +0900, Michael Paquier wrote:

So you mean to switch a line that now looks like that:
WAIT_EVENT_FOO_BAR   FooBar"Waiting on Foo Bar."
To that:
FOO_BAR   "Waiting on Foo Bar."
Or even that:
WAIT_EVENT_FOO_BAR   "Waiting on Foo Bar."

Sure, it is an improvement for any wait events that use WAIT_EVENT_
when searching them, but this adds more magic into the LWLock and Lock
areas if the same conversion is applied there.  Or am I right to
assume that you'd mean to *not* do any of that for these two classes?
These can be treated as exceptions in the script when generating the
wait event names from the enum elements, of course.


I have looked again at that, and switching wait_event_names.txt to use
two columns made of the typedef definitions and the docs like this is
not a problem:
FOO_BAR   "Waiting on Foo Bar."

WAIT_EVENT_ is appended to the typedef definitions in the script.  The
wait event names like "FooBar" are generated from the enums by
splitting using their underscores and doing some lc().  Lock and
LWLock don't need to change.  This way, it is easy to grep the wait
events from the source code and match them with wait_event_names.txt.

Thoughts or comments?


Agree that done that way one could easily grep the events from the
source code and match them with wait_event_names.txt. Then I don't
think the "search" issue in the code is still a concern with the
current proposal.

FWIW, I'm getting:

$ git am v3-000*
Applying: Rename wait events with more consistent camelcase style
Applying: Remove column for wait event names in wait_event_names.txt
error: patch failed: src/backend/utils/activity/wait_event_names.txt:261
error: src/backend/utils/activity/wait_event_names.txt: patch does not apply
Patch failed at 0002 Remove column for wait event names in wait_event_names.txt

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

2023-08-28 Thread Heikki Linnakangas

On 29/08/2023 01:28, Michael Paquier wrote:


In case you've not noticed:
https://www.postgresql.org/message-id/zovvuqe0rdj2s...@paquier.xyz
But it does not really matter now ;)


Ah sorry, missed that thread.


Yes, so it seems. Syslogger is started before the ListenSockets array is
initialized, so it's still all zeros. When syslogger is forked, the child
process tries to close all the listen sockets, which are all zeros. So
syslogger calls close(0) MAXLISTEN (64) times. Attached patch moves the
array initialization earlier.


Yep, I've reached the same conclusion.  Wouldn't it be cleaner to move
the callback registration of CloseServerPorts() closer to the array
initialization, though?


Ok, pushed that way.

I checked the history of this: it goes back to commit 9a86f03b4e in
version 13. The SysLogger_Start() call used to be later, after setting
up ListenSockets, but that commit moved it. So I backpatched this to v13.



The first close(0) actually does have an effect: it closes stdin, which is
fd 0. That is surely accidental, but I wonder if we should indeed close
stdin in child processes? The postmaster process doesn't do anything with
stdin either, although I guess a library might try to read a passphrase from
stdin before starting up, for example.


We would have heard about that, wouldn't we?  I may be missing
something of course, but on HEAD, the array initialization is done
before starting any child processes, and the syslogger is the first
one forked.


Yes, syslogger is the only process that closes stdin. After moving the 
initialization, it doesn't close it either.


Thinking about this some more, the ListenSockets array is a bit silly 
anyway. We fill the array starting from index 0, always append to the 
end, and never remove entries from it. It would seem more 
straightforward to keep track of the used size of the array. Currently 
we always loop through the unused parts too, and e.g. 
ConfigurePostmasterWaitSet() needs to walk the array to count how many 
elements are in use.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-28 Thread Amit Kapila
On Mon, Aug 28, 2023 at 1:01 PM Peter Smith  wrote:
>
> Hi, here are my review comments for v26-0003
>
> It seems I must defend some of my previous suggestions from v25* [1],
> so here goes...
>
> ==
> src/bin/pg_upgrade/check.c
>
> 1. check_and_dump_old_cluster
>
> CURRENT CODE (with v26-0003 patch applied)
>
> /* Extract a list of logical replication slots */
> get_old_cluster_logical_slot_infos();
>
> ...
>
> /*
> * Logical replication slots can be migrated since PG17. See comments atop
> * get_old_cluster_logical_slot_infos().
> */
> if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
> {
> check_old_cluster_for_lost_slots();
>
> /*
> * Do additional checks if a live check is not required. This requires
> * that confirmed_flush_lsn of all the slots is the same as the latest
> * checkpoint location, but it would be satisfied only when the server
> * has been shut down.
> */
> if (!live_check)
> check_old_cluster_for_confirmed_flush_lsn();
> }
>
>
> SUGGESTION
>
> /*
>  * Logical replication slots can be migrated since PG17. See comments atop
>  * get_old_cluster_logical_slot_infos().
>  */
> if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700) // NOTE 1a.
> {
>   /* Extract a list of logical replication slots */
>   get_old_cluster_logical_slot_infos();
>
>   if (count_old_cluster_slots()) // NOTE 1b.
>   {
> check_old_cluster_for_lost_slots();
>
> /*
>  * Do additional checks if a live check is not required. This requires
>  * that confirmed_flush_lsn of all the slots is the same as the latest
>  * checkpoint location, but it would be satisfied only when the server
>  * has been shut down.
>  */
> if (!live_check)
>   check_old_cluster_for_confirmed_flush_lsn();
>   }
> }
>

I think a slightly better way to achieve this is to combine the code
from check_old_cluster_for_lost_slots() and
check_old_cluster_for_confirmed_flush_lsn() into
check_old_cluster_for_valid_slots(). That will even save us a new
connection for the second check.
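
(Sketched out, the combined function might look something like the
skeleton below; the body and helper usage are assumptions, not taken
from the patch:)

static void
check_old_cluster_for_valid_slots(bool live_check)
{
	/* a single connection now serves both checks */
	PGconn	   *conn = connectToServer(&old_cluster, "template1");

	/* 1. error out if any logical slot has been invalidated ("lost") */

	/*
	 * 2. unless this is a live check, error out if any slot's
	 * confirmed_flush_lsn does not match the latest checkpoint location
	 */

	PQfinish(conn);
}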

Also, I think we can simplify another check in the patch:
@@ -1446,8 +1446,10 @@ check_new_cluster_logical_replication_slots(void)
char   *wal_level;

/* Logical slots can be migrated since PG17. */
-   if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
-   nslots = count_old_cluster_logical_slots();
+   if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+   return;
+
+   nslots = count_old_cluster_logical_slots();


-- 
With Regards,
Amit Kapila.




Re: Logger process and "LOG: could not close client or listen socket: Bad file descriptor"

2023-08-28 Thread Heikki Linnakangas

On 29/08/2023 06:18, Peter Geoghegan wrote:

On Sun, Aug 27, 2023 at 5:52 PM Michael Paquier  wrote:

From what I can see, this is a rather old issue, because
ListenSocket[] is filled with PGINVALID_SOCKET *after* starting the
syslogger.  It seems to me that we should just initialize the array
before starting the syslogger, so that we don't get these incorrect
logs?

Thoughts?  Please see the attached.


Agreed, this is very annoying. I'm going to start using your patch
with the feature branch I'm working on. Hopefully that won't be
necessary for too much longer.


Just to close the loop on this thread: I committed and backpatched 
Michael's fix.


Discussion on other thread at 
https://www.postgresql.org/message-id/9caed67f-f93e-5701-8c25-265a2b139ed0%40iki.fi.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Logger process and "LOG: could not close client or listen socket: Bad file descriptor"

2023-08-28 Thread Michael Paquier
On Tue, Aug 29, 2023 at 09:27:48AM +0300, Heikki Linnakangas wrote:
> Just to close the loop on this thread: I committed and backpatched Michael's
> fix.

Thanks!
--
Michael


signature.asc
Description: PGP signature


Re: Autogenerate some wait events code and documentation

2023-08-28 Thread Michael Paquier
On Tue, Aug 29, 2023 at 08:17:10AM +0200, Drouvot, Bertrand wrote:
> Agree that done that way one could easily grep the events from the
> source code and match them with wait_event_names.txt. Then I don't
> think the "search" issue in the code is still a concern with the
> current proposal.

It would still be possible to append WAIT_EVENT_ to the first column of
the file.  I'd just rather keep it shorter.

> FWIW, I'm getting:
> 
> $ git am v3-000*
> Applying: Rename wait events with more consistent camelcase style
> Applying: Remove column for wait event names in wait_event_names.txt
> error: patch failed: src/backend/utils/activity/wait_event_names.txt:261
> error: src/backend/utils/activity/wait_event_names.txt: patch does not apply
> Patch failed at 0002 Remove column for wait event names in 
> wait_event_names.txt

That may be a bug in the matrix because of bb90022, as git am can be
easily pissed.  I am attaching a new patch series, but it does not
seem to matter here.

I have double-checked the docs generated, while on it, and I am not
seeing anything missing, particularly for the LWLock and Lock parts..
--
Michael
From 7e221db0fe26f9d36df90202934f4177daa86ff7 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 28 Aug 2023 14:47:13 +0900
Subject: [PATCH v4 1/2] Rename wait events with more consistent camelcase
 style

This will help in the introduction of more simplifications with the
generation of wait event data using wait_event_names.txt.  The event
names used the same case-insensitive characters, hence applying lower()
or upper() to the monitoring queries allows the detection of the same
events as before this change.
---
 src/backend/libpq/pqmq.c  |  2 +-
 src/backend/storage/ipc/shm_mq.c  |  6 +-
 .../utils/activity/wait_event_names.txt   | 90 +--
 3 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/src/backend/libpq/pqmq.c b/src/backend/libpq/pqmq.c
index 253181f47a..38b042804c 100644
--- a/src/backend/libpq/pqmq.c
+++ b/src/backend/libpq/pqmq.c
@@ -182,7 +182,7 @@ mq_putmessage(char msgtype, const char *s, size_t len)
 			break;
 
 		(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
-		 WAIT_EVENT_MQ_PUT_MESSAGE);
+		 WAIT_EVENT_MESSAGE_QUEUE_PUT_MESSAGE);
 		ResetLatch(MyLatch);
 		CHECK_FOR_INTERRUPTS();
 	}
diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index d134a09a19..06d6b73527 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -1017,7 +1017,7 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 			 * cheaper than setting one that has been reset.
 			 */
 			(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
-			 WAIT_EVENT_MQ_SEND);
+			 WAIT_EVENT_MESSAGE_QUEUE_SEND);
 
 			/* Reset the latch so we don't spin. */
 			ResetLatch(MyLatch);
@@ -1163,7 +1163,7 @@ shm_mq_receive_bytes(shm_mq_handle *mqh, Size bytes_needed, bool nowait,
 		 * setting one that has been reset.
 		 */
 		(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
-		 WAIT_EVENT_MQ_RECEIVE);
+		 WAIT_EVENT_MESSAGE_QUEUE_RECEIVE);
 
 		/* Reset the latch so we don't spin. */
 		ResetLatch(MyLatch);
@@ -1252,7 +1252,7 @@ shm_mq_wait_internal(shm_mq *mq, PGPROC **ptr, BackgroundWorkerHandle *handle)
 
 		/* Wait to be signaled. */
 		(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
-		 WAIT_EVENT_MQ_INTERNAL);
+		 WAIT_EVENT_MESSAGE_QUEUE_INTERNAL);
 
 		/* Reset the latch so we don't spin. */
 		ResetLatch(MyLatch);
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 13774254d2..0cace8f40a 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -45,15 +45,15 @@
 Section: ClassName - WaitEventActivity
 
 WAIT_EVENT_ARCHIVER_MAIN	ArchiverMain	"Waiting in main loop of archiver process."
-WAIT_EVENT_AUTOVACUUM_MAIN	AutoVacuumMain	"Waiting in main loop of autovacuum launcher process."
-WAIT_EVENT_BGWRITER_HIBERNATE	BgWriterHibernate	"Waiting in background writer process, hibernating."
-WAIT_EVENT_BGWRITER_MAIN	BgWriterMain	"Waiting in main loop of background writer process."
+WAIT_EVENT_AUTOVACUUM_MAIN	AutovacuumMain	"Waiting in main loop of autovacuum launcher process."
+WAIT_EVENT_BGWRITER_HIBERNATE	BgwriterHibernate	"Waiting in background writer process, hibernating."
+WAIT_EVENT_BGWRITER_MAIN	BgwriterMain	"Waiting in main loop of background writer process."
 WAIT_EVENT_CHECKPOINTER_MAIN	CheckpointerMain	"Waiting in main loop of checkpointer process."
 WAIT_EVENT_LOGICAL_APPLY_MAIN	LogicalApplyMain	"Waiting in main loop of logical replication apply process."
 WAIT_EVENT_LOGICAL_LAUNCHER_MAIN	LogicalLauncherMain	"Waiting in main loop of logical replication launcher process."
 WAIT_EVENT_LOGICAL_PARALLEL_APPLY_MAIN	LogicalParallelApplyMain	"Waiting in main

Re: Query execution in Perl TAP tests needs work

2023-08-28 Thread Thomas Munro
On Tue, Aug 29, 2023 at 1:48 PM Noah Misch  wrote:
> On Mon, Aug 28, 2023 at 05:29:56PM +1200, Thomas Munro wrote:
> > 1.  Don't fork anything at all: open (and cache) a connection directly
> > from Perl.
> > 1a.  Write xsub or ffi bindings for libpq.  Or vendor (parts) of the
> > popular Perl xsub library?
> > 1b.  Write our own mini pure-perl pq client module.  Or vendor (parts)
> > of some existing one.
> > 2.  Use long-lived psql sessions.
> > 2a.  Something building on BackgroundPsql.
> > 2b.  Maybe give psql or a new libpq-wrapper a new low level stdio/pipe
> > protocol that is more fun to talk to from Perl/machines?
>
> (2a) seems adequate and easiest to achieve.  If DBD::Pg were under a
> compatible license, I'd say use it as the vendor for (1a).  Maybe we can get
> it relicensed?  Building a separate way of connecting from Perl would be sad.

Here's my minimal POC of 2a.  It only changes ->safe_psql() and not
the various other things like ->psql() and ->poll_query_until().
Hence use of amcheck/001_verify_heapam as an example: it runs a lot of
safe_psql() queries.  It fails in all kinds of ways if enabled
generally, which would take some investigation (some tests require
there to be no connections at various times, so we'd probably need to
insert disable/re-enable calls at various places).
From 0f6f4b1bdc69a10b1048731d5b1d744567978cce Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 29 Aug 2023 18:12:07 +1200
Subject: [PATCH] XXX cache psql processes


diff --git a/contrib/amcheck/t/001_verify_heapam.pl b/contrib/amcheck/t/001_verify_heapam.pl
index 46d5b53181..316e876c5d 100644
--- a/contrib/amcheck/t/001_verify_heapam.pl
+++ b/contrib/amcheck/t/001_verify_heapam.pl
@@ -18,6 +18,7 @@ $node = PostgreSQL::Test::Cluster->new('test');
 $node->init;
 $node->append_conf('postgresql.conf', 'autovacuum=off');
 $node->start;
+$node->enable_auto_background_psql(1);
 $node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
 
 #
diff --git a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
index 924b57ab21..53ee69c801 100644
--- a/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
+++ b/src/test/perl/PostgreSQL/Test/BackgroundPsql.pm
@@ -221,7 +221,7 @@ sub query
 	$output = $self->{stdout};
 
 	# remove banner again, our caller doesn't care
-	$output =~ s/\n$banner$//s;
+	$output =~ s/\n?$banner$//s;
 
 	# clear out output for the next query
 	$self->{stdout} = '';
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 227c34ab4d..f53844e25e 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -986,6 +986,8 @@ sub stop
 
 	local %ENV = $self->_get_env();
 
+	$self->_close_auto_background_psqls;
+
 	$mode = 'fast' unless defined $mode;
 	return 1 unless defined $self->{_pid};
 
@@ -1025,6 +1027,8 @@ sub reload
 
 	local %ENV = $self->_get_env();
 
+	$self->_close_auto_background_psqls;
+
 	print "### Reloading node \"$name\"\n";
 	PostgreSQL::Test::Utils::system_or_bail('pg_ctl', '-D', $pgdata,
 		'reload');
@@ -1049,6 +1053,8 @@ sub restart
 
 	local %ENV = $self->_get_env(PGAPPNAME => undef);
 
+	$self->_close_auto_background_psqls;
+
 	print "### Restarting node \"$name\"\n";
 
 	# -w is now the default but having it here does no harm and helps
@@ -1349,7 +1355,9 @@ sub new
 		_logfile_base =>
 		  "$PostgreSQL::Test::Utils::log_path/${testname}_${name}",
 		_logfile =>
-		  "$PostgreSQL::Test::Utils::log_path/${testname}_${name}.log"
+		  "$PostgreSQL::Test::Utils::log_path/${testname}_${name}.log",
+		_auto_background_psqls => {},
+		_use_auto_background_psqls => 0
 	};
 
 	if ($params{install_path})
@@ -1508,6 +1516,39 @@ sub _get_env
 	return (%inst_env);
 }
 
+# Private routine to close and forget any psql processes we may be
+# holding onto.
+sub _close_auto_background_psqls
+{
+	my ($self) = @_;
+	foreach my $dbname (keys %{$self->{_auto_background_psqls}})
+	{
+		my $psql = $self->{_auto_background_psqls}{$dbname};
+		delete $self->{_auto_background_psqls}{$dbname};
+		$psql->quit;
+	}
+}
+
+=pod
+
+=item enable_auto_background_psql()
+
+Enable or disable the automatic reuse of long-lived psql sessions.
+
+=cut
+
+sub enable_auto_background_psql
+{
+	my $self = shift;
+	my $value = shift;
+
+	$self->{_use_auto_background_psqls} = $value;
+	if (!$value)
+	{
+		$self->_close_auto_background_psqls;
+	}
+}
+
 # Private routine to get an installation path qualified command.
 #
 # IPC::Run maintains a cache, %cmd_cache, mapping commands to paths.  Tests
@@ -1744,13 +1785,32 @@ sub safe_psql
 
 	my ($stdout, $stderr);
 
-	my $ret = $self->psql(
-		$dbname, $sql,
-		%params,
-		stdout => \$stdout,
-		stderr => \$stderr,
-		on_error_die => 1,
-		on_error_stop => 1);
+
+	# If there are no special parameters, and re-use of sessions
+	# hasn't been disabled, create or re-use a long-lived psql
+	# process.
+	if (!%params && $self->{_use_auto_background_psqls})

Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

2023-08-28 Thread Heikki Linnakangas

On 29/08/2023 09:21, Heikki Linnakangas wrote:

Thinking about this some more, the ListenSockets array is a bit silly
anyway. We fill the array starting from index 0, always append to the
end, and never remove entries from it. It would seem more
straightforward to keep track of the used size of the array. Currently
we always loop through the unused parts too, and e.g.
ConfigurePostmasterWaitSet() needs to walk the array to count how many
elements are in use.


Like this.

--
Heikki Linnakangas
Neon (https://neon.tech)
From 796280f07dd5dbf50897c9895715ab5e2dbf187b Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Tue, 29 Aug 2023 09:53:08 +0300
Subject: [PATCH 1/1] Refactor ListenSocket array.

Keep track of the used size of the array. That avoids looping through
the whole array in a few places. It doesn't matter from a performance
point of view since the array is small anyway, but this feels less
surprising and is a little less code. Now that we have an explicit
NumListenSockets variable that is statically initialized to 0, we
don't need the loop to initialize the array.

Allocate the array in PostmasterContext. The array isn't needed in
child processes, so this allows reusing that memory. We could easily
make the array resizable now, but we haven't heard any complaints
about the current 64 sockets limit.
---
 src/backend/libpq/pqcomm.c  | 18 +++
 src/backend/postmaster/postmaster.c | 76 +++--
 src/include/libpq/libpq.h   |  2 +-
 3 files changed, 36 insertions(+), 60 deletions(-)

diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 8d217b0645..48ae7704fb 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -311,8 +311,9 @@ socket_close(int code, Datum arg)
  * specified.  For TCP ports, hostName is either NULL for all interfaces or
  * the interface to listen on, and unixSocketDir is ignored (can be NULL).
  *
- * Successfully opened sockets are added to the ListenSocket[] array (of
- * length MaxListen), at the first position that isn't PGINVALID_SOCKET.
+ * Successfully opened sockets are appended to the ListenSockets[] array.  The
+ * current size of the array is *NumListenSockets, it is updated to reflect
+ * the opened sockets.  MaxListen is the allocated size of the array.
  *
  * RETURNS: STATUS_OK or STATUS_ERROR
  */
@@ -320,7 +321,7 @@ socket_close(int code, Datum arg)
 int
 StreamServerPort(int family, const char *hostName, unsigned short portNumber,
  const char *unixSocketDir,
- pgsocket ListenSocket[], int MaxListen)
+ pgsocket ListenSockets[], int *NumListenSockets, int MaxListen)
 {
 	pgsocket	fd;
 	int			err;
@@ -335,7 +336,6 @@ StreamServerPort(int family, const char *hostName, unsigned short portNumber,
 	struct addrinfo *addrs = NULL,
 			   *addr;
 	struct addrinfo hint;
-	int			listen_index = 0;
 	int			added = 0;
 	char		unixSocketPath[MAXPGPATH];
 #if !defined(WIN32) || defined(IPV6_V6ONLY)
@@ -401,12 +401,7 @@ StreamServerPort(int family, const char *hostName, unsigned short portNumber,
 		}
 
 		/* See if there is still room to add 1 more socket. */
-		for (; listen_index < MaxListen; listen_index++)
-		{
-			if (ListenSocket[listen_index] == PGINVALID_SOCKET)
-break;
-		}
-		if (listen_index >= MaxListen)
+		if (*NumListenSockets == MaxListen)
 		{
 			ereport(LOG,
 	(errmsg("could not bind to all requested addresses: MAXLISTEN (%d) exceeded",
@@ -573,7 +568,8 @@ StreamServerPort(int family, const char *hostName, unsigned short portNumber,
 	(errmsg("listening on %s address \"%s\", port %d",
 			familyDesc, addrDesc, (int) portNumber)));
 
-		ListenSocket[listen_index] = fd;
+		ListenSockets[*NumListenSockets] = fd;
+		(*NumListenSockets)++;
 		added++;
 	}
 
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index d7bfb28ff3..2659329b82 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -226,7 +226,8 @@ int			ReservedConnections;
 
 /* The socket(s) we're listening to. */
 #define MAXLISTEN	64
-static pgsocket ListenSocket[MAXLISTEN];
+static int NumListenSockets = 0;
+static pgsocket *ListenSockets = NULL;
 
 /* still more option variables */
 bool		EnableSSL = false;
@@ -587,7 +588,6 @@ PostmasterMain(int argc, char *argv[])
 	int			status;
 	char	   *userDoption = NULL;
 	bool		listen_addr_saved = false;
-	int			i;
 	char	   *output_config_variable = NULL;
 
 	InitProcessGlobals();
@@ -1141,17 +1141,6 @@ PostmasterMain(int argc, char *argv[])
  errmsg("could not remove file \"%s\": %m",
 		LOG_METAINFO_DATAFILE)));
 
-	/*
-	 * Initialize input sockets.
-	 *
-	 * Mark them all closed, and set up an on_proc_exit function that's
-	 * charged with closing the sockets again at postmaster shutdown.
-	 */
-	for (i = 0; i < MAXLISTEN; i++)
-		ListenSocket[i] = PGINVALID_SOCKET;
-
-	on_proc_exit(CloseServerPorts, 0);
-
 	/*
 	 * If enabled, start up syslogger collection subp

Re: RFC: Logging plan of the running query

2023-08-28 Thread torikoshia

On 2023-08-26 21:03, James Coleman wrote:

On Fri, Aug 25, 2023 at 7:43 AM James Coleman  wrote:


On Thu, Aug 17, 2023 at 10:02 AM torikoshia 
 wrote:

>
> On 2023-06-16 01:34, James Coleman wrote:
> > Attached is v28
> > which sets ProcessLogQueryPlanInterruptActive to false in errfinish
> > when necessary. Once built with those two patches I'm simply running
> > `make check`.
>
> With v28-0001 and v28-0002 patch, I confirmed backend processes consume
> huge
> amount of memory and under some environments they were terminated by OOM
> killer.
>
> This was because memory was allocated from existing memory contexts and
> they
> were not freed after ProcessLogQueryPlanInterrupt().
> Updated the patch to use dedicated memory context for
> ProcessLogQueryPlanInterrupt().
>
> Applying attached patch and v28-0002 patch, `make check` successfully
> completed after 20min and 50GB of logs on my environment.
>
> >>> On 2023-06-15 01:48, James Coleman wrote:
> >>> > The tests have been running since last night, but have been apparently
> >>> > hung now for many hours.
>
> I don't know if this has anything to do with the hung you faced, but I
> thought
> it might be possible that the large amount of memory usage resulted in
> swapping, which caused a significant delay in processing.

Ah, yes, I think that could be a possible explanation. I was delaying
on this thread because I wasn't comfortable with having caused an
issue once (even if I couldn't easily reproduce) without at least some
theory as to the cause (and a fix).

> If possible, I would be very grateful if you could try to reproduce this
> with
> the v29 patch.

I'll kick off some testing.



I don't have time to investigate what's happening here, but 24 hours
later the first "make check" is still running, and at first glance it
seems to have the same behavior I'd seen that first time. The test
output is to this point:

# parallel group (5 tests):  index_including create_view
index_including_gist create_index create_index_spgist
ok 66+ create_index26365 ms
ok 67+ create_index_spgist 27675 ms
ok 68+ create_view  1235 ms
ok 69+ index_including  1102 ms
ok 70+ index_including_gist 1633 ms
# parallel group (16 tests):  create_aggregate create_cast errors
roleattributes drop_if_exists hash_func typed_table create_am
infinite_recurse

and it hasn't progressed past that point since at least ~16 hours ago
(the first several hours of the run I wasn't monitoring it).

I haven't connected up gdb yet, and won't be able to until maybe
tomorrow, but here's the ps output for postgres processes that are
running:

admin3213249  0.0  0.0 196824 20552 ?Ss   Aug25   0:00
/home/admin/postgresql-test/bin/postgres -D
/home/admin/postgresql-test-data
admin3213250  0.0  0.0 196964  7284 ?Ss   Aug25   0:00
postgres: checkpointer
admin3213251  0.0  0.0 196956  4276 ?Ss   Aug25   0:00
postgres: background writer
admin3213253  0.0  0.0 196956  8600 ?Ss   Aug25   0:00
postgres: walwriter
admin3213254  0.0  0.0 198424  7316 ?Ss   Aug25   0:00
postgres: autovacuum launcher
admin3213255  0.0  0.0 198412  5488 ?Ss   Aug25   0:00
postgres: logical replication launcher
admin3237967  0.0  0.0   2484   516 pts/4S+   Aug25   0:00
/bin/sh -c echo "# +++ regress check in src/test/regress +++" &&
PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/bin:/home/admin/postgres/src/test/regress:$PATH"
LD_LIBRARY_PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/lib"
INITDB_TEMPLATE='/home/admin/postgres'/tmp_install/initdb-template
../../../src/test/regress/pg_regress --temp-instance=./tmp_check
--inputdir=. --bindir= --dlpath=. --max-concurrent-tests=20
--schedule=./parallel_schedule
admin3237973  0.0  0.0 197880 20688 pts/4S+   Aug25   0:00
postgres -D /home/admin/postgres/src/test/regress/tmp_check/data -F -c
listen_addresses= -k /tmp/pg_regress-7mmGUa
admin3237976  0.0  0.1 198332 44608 ?Ss   Aug25   0:00
postgres: checkpointer
admin3237977  0.0  0.0 198032  4640 ?Ss   Aug25   0:00
postgres: background writer
admin3237979  0.0  0.0 197880  8580 ?Ss   Aug25   0:00
postgres: walwriter
admin3237980  0.0  0.0 199484  7608 ?Ss   Aug25   0:00
postgres: autovacuum launcher
admin3237981  0.0  0.0 199460  5488 ?Ss   Aug25   0:00
postgres: logical replication launcher
admin3243644  0.0  0.2 252400 74656 ?Ss   Aug25   0:01
postgres: admin regression [local] SELECT waiting
admin3243645  0.0  0.1 205480 33992 ?Ss   Aug25   0:00
postgres: admin regression [local] SELECT waiting
admin3243654 99.9  0.0 203156 31504 ?Rs   Aug25 1437:49
postgres: admin regression [local] VACUUM
admin3243655  0.0  0.1 212036 38504 ?Ss   Aug25   0:00
post

Strange presentation related to inheritance in \d+

2023-08-28 Thread Kyotaro Horiguchi
While translating a message, I found a questionable behavior in \d+,
introduced by a recent commit b0e96f3119. In short, the current code
hides the constraint's origin when "NO INHERIT" is used.

For these tables:

create table p (a int, b int not null default 0);
create table c1 (a int, b int not null default 1) inherits (p);

The output from "\d+ c1" contains the lines:

> Not-null constraints:
>"c1_b_not_null" NOT NULL "b" *(local, inherited)*

But with these tables:

create table p (a int, b int not null default 0);
create table c1 (a int, b int not null NO INHERIT default 1) inherits (p);

I get:

> Not-null constraints:
>"c1_b_not_null" NOT NULL "b" *NO INHERIT*

Here, "NO INHERIT" is mapped from connoinherit, and conislocal and
"coninhcount <> 0" align with "local" and "inherited". For a clearer
picture, those values for c1 are as follows.

=# SELECT co.conname, at.attname, co.connoinherit, co.conislocal, 
co.coninhcount FROM pg_catalog.pg_constraint co JOIN pg_catalog.pg_attribute at 
ON (at.attnum = co.conkey[1]) WHERE co.contype = 'n' AND co.conrelid = 
'c1'::pg_catalog.regclass AND at.attrelid = 'c1'::pg_catalog.regclass ORDER BY 
at.attnum;
conname| attname | connoinherit | conislocal | coninhcount 
---+-+--++-
 c1_b_not_null | b   | t| t  |   1

It feels off to me, but I couldn't find any discussion about it. Is it
the intended behavior? I believe it's more appropriate to show the
origins even when the constraint is specified as NO INHERIT.

==
If not, the following change might be possible, which is quite simple.

> Not-null constraints:
>"c1_b_not_null" NOT NULL "b" NO INHERIT(local, inherited)

However, it looks somewhat strange as the information in parentheses
is not secondary to "NO INHERIT".  Thus, perhaps a clearer or more
proper representation would be:

>"c1_b_not_null" NOT NULL "b" (local, inherited, not inheritable)

That being said, I haven't come up with a simple way to do this for now.
(Note that we need to translate the punctuation and the words.)

There's no need to account for all combinations. "Local" and
"inherited" cannot both be false at the same time, and the combination (local
& !inherited) is not displayed. Given these factors, we're left with 6
possible combinations, which I don't think are too much hassle:

(local, inherited, not inheritable)
(inherited, not inheritable) # I didn't figure out how to cause this.
(not inheritable)
(local, inherited)
(inherited)
"" (empty string, means local)

A potential solution that comes to mind is presenting the attributes
in a space-separated list after a colon, as attached. (Honestly, I'm not
fond of the format and the final term, though.)

>"c1_b_not_null" NOT NULL "b": local inherited uninheritable

In 0001, I did wonder about hiding "local" when it's not inherited,
but this behavior follows the existing code.

In 0002, I'm not completely satisfied with the location, but the standard
regression test suite seems more suitable for this check than the TAP
test suite used for testing psql.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 6c8511499d5dbfc769c38b32292d415fa8982707 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Mon, 28 Aug 2023 14:38:58 +0900
Subject: [PATCH 1/2] Fix not-null constraint representation in \d+

The recent commit b0e96f3119 added a description of not-null
constraints to the output of the \d+ command. It hid the constraints' origin
when they are marked as NO INHERIT.  Show their origin irrespective of
the NO INHERIT state.
---
 src/bin/psql/describe.c   | 22 +++---
 src/test/regress/expected/constraints.out | 12 ++---
 src/test/regress/expected/create_table.out|  6 +--
 .../regress/expected/create_table_like.out|  6 +--
 src/test/regress/expected/foreign_data.out| 44 +--
 src/test/regress/expected/generated.out   |  2 +-
 src/test/regress/expected/inherit.out | 30 ++---
 7 files changed, 66 insertions(+), 56 deletions(-)

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index bac94a338c..3bf1c0cb97 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -3079,16 +3079,26 @@ describeOneTableDetails(const char *schemaname,
 			/* Might be an empty set - that's ok */
 			for (i = 0; i < tuples; i++)
 			{
+bool		noinherit = PQgetvalue(result, i, 2)[0] == 't';
 bool		islocal = PQgetvalue(result, i, 3)[0] == 't';
 bool		inherited = PQgetvalue(result, i, 4)[0] == 't';
 
-printfPQExpBuffer(&buf, "\"%s\" NOT NULL \"%s\"%s",
+printfPQExpBuffer(&buf, "\"%s\" NOT NULL \"%s\"",
   PQgetvalue(result, i, 0),
-  PQgetvalue(result, i, 1),
-  PQgetvalue(result, i, 2)[0] == 't' ?
-  " NO INHERIT" :
-  islocal && inherited ? _(" (local, inherited)") :
-  inherited ? _(" (inherited)") : "");
+  PQgetvalue(result, i, 1));
+			

Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-08-28 Thread John Naylor
On Sun, Aug 27, 2023 at 7:53 PM Masahiko Sawada 
wrote:
>
> I've updated the regression tests for tidstore so that it uses SQL
> functions to add blocks/offsets and dump its contents. The new test
> covers the same test coverages but it's executed using SQL functions
> instead of executing all tests in one SQL function.

This is much nicer and more flexible, thanks! A few questions/comments:

tidstore_dump_tids() returns a string -- is it difficult to turn this into
a SRF, or is it just a bit more work?

The lookup test seems fine for now. The output would look nicer with an
"order by tid".

I think we could have the SQL function tidstore_create() take a boolean for
shared memory. That would allow ad-hoc testing without a recompile, if I'm
not mistaken.

+SELECT tidstore_set_block_offsets(blk, array_agg(offsets.off)::int2[])
+  FROM blocks, offsets
+  GROUP BY blk;
+ tidstore_set_block_offsets
+
+
+
+
+
+
+(5 rows)

Calling a void function multiple times leads to vertical whitespace, which
looks a bit strange and may look better with some output, even if
irrelevant:

-SELECT tidstore_set_block_offsets(blk, array_agg(offsets.off)::int2[])
+SELECT row_number() over(order by blk), tidstore_set_block_offsets(blk,
array_agg(offsets.off)::int2[])

 row_number | tidstore_set_block_offsets
+
  1 |
  2 |
  3 |
  4 |
  5 |
(5 rows)

--
John Naylor
EDB: http://www.enterprisedb.com


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-28 Thread Peter Smith
Hi, here are my comments for patch v26-0002.

==
1. About the PG17 limitation

In my previous review of v25-0002, I suggested that the PG17
limitation should be documented atop one of the source files. See
[1]#3, [1]#7, [1]#10

I just wanted to explain the reason for that suggestion.

Currently, all the new version checks have a comment like "/* Logical
slots can be migrated since PG17. */". I felt that it would be better
if those comments said something more like "/* Logical slots can be
migrated since PG17. See XYZ for details. */". I don't really care
*where* the main explanation lives, but I thought since it is
referenced from multiple places it might be easier to find if it was
atop some file instead of just in a function comment. YMMV.

==
2. Do version checking in check_and_dump_old_cluster instead of inside
get_old_cluster_logical_slot_infos

check_and_dump_old_cluster - Should check version before calling
get_old_cluster_logical_slot_infos
get_old_cluster_logical_slot_infos - Keep a sanity check Assert if you
wish (or do nothing -- e.g. see #3 below)

Refer to [1]#4, [1]#8

Isn't it self-evident from the file/function names what kind of logic
they are intended to have in them? Sure, there may be some exceptions
but unless it is difficult to implement I think most people would
reasonably assume:

- checking code should be in file "check.c"
-- e.g. a function called 'check_and_dump_old_cluster' ought to be
*checking* stuff

- info fetching code should be in file "info.c"

~~

Another motivation for this suggestion becomes more obvious later with
patch 0003. By checking at the "higher" level (in check.c) it means
multiple related functions can all be called under one version check.
Less checking means less code and/or simpler code. For example,
multiple redundant calls to get_old_cluster_count_slots() can be
avoided in patch 0003 by writing *less* code, than v26* currently has.

==
3. count_old_cluster_logical_slots

I think there is nothing special in this logic that will crash if PG
version <= 1600. Keep the Assert for sanity checking if you wish, but
this is already guarded by the call in pg_upgrade.c so perhaps it is
overkill.

--
[1] My review of v25-0002 -
https://www.postgresql.org/message-id/CAHut%2BPtQcou3Bfm9A5SbhFuo2uKK-6u4_j_59so3skAi8Ns03A%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-28 Thread Peter Smith
Hi, here are my review comments for v26-0003

It seems I must defend some of my previous suggestions from v25* [1],
so here goes...

==
src/bin/pg_upgrade/check.c

1. check_and_dump_old_cluster

CURRENT CODE (with v26-0003 patch applied)

/* Extract a list of logical replication slots */
get_old_cluster_logical_slot_infos();

...

/*
* Logical replication slots can be migrated since PG17. See comments atop
* get_old_cluster_logical_slot_infos().
*/
if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
{
check_old_cluster_for_lost_slots();

/*
* Do additional checks if a live check is not required. This requires
* that confirmed_flush_lsn of all the slots is the same as the latest
* checkpoint location, but it would be satisfied only when the server
* has been shut down.
*/
if (!live_check)
check_old_cluster_for_confirmed_flush_lsn();
}


SUGGESTION

/*
 * Logical replication slots can be migrated since PG17. See comments atop
 * get_old_cluster_logical_slot_infos().
 */
if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700) // NOTE 1a.
{
  /* Extract a list of logical replication slots */
  get_old_cluster_logical_slot_infos();

  if (count_old_cluster_slots()) // NOTE 1b.
  {
check_old_cluster_for_lost_slots();

/*
 * Do additional checks if a live check is not required. This requires
 * that confirmed_flush_lsn of all the slots is the same as the latest
 * checkpoint location, but it would be satisfied only when the server
 * has been shut down.
 */
if (!live_check)
  check_old_cluster_for_confirmed_flush_lsn();
  }
}

~~

Benefits:

1a.
One version check instead of multiple.

~

1b.
Upfront slot counting means
- only call 1 time to count_old_cluster_slots().
- unnecessary calls to other check* functions are avoided

~

1c.
get_old_cluster_logical_slot_infos
- No version check is needed.

check_old_cluster_for_lost_slots
- Call to count_old_cluster_slots is not needed
- Quick exit not needed.

check_old_cluster_for_confirmed_flush_lsn
- Call to count_old_cluster_slots is not needed
- Quick exit not needed.

~~~

2. check_old_cluster_for_lost_slots

+ /* Quick exit if the cluster does not have logical slots. */
+ if (count_old_cluster_logical_slots() == 0)
+ return;

Refer [1]#4. Can remove this because #1b above.

~~~

3. check_old_cluster_for_confirmed_flush_lsn

+ /* Quick exit if the cluster does not have logical slots. */
+ if (count_old_cluster_logical_slots() == 0)
+ return;

Refer [1]#5. Can remove this because #1b above.

~~~

4. .../t/003_logical_replication_slots.pl

/shipped/replicated/

Kuroda-san 26/8 wrote:
You meant to say s/replicated/shipped/, right? Fixed.

No, I meant what I wrote for [1]#7. I was referring to the word
"shipped" in the message 'check changes are shipped to the
subscriber'. Now there are 2 places to change instead of one.

--
[1] my review of v25-0003.
https://www.postgresql.org/message-id/CAHut%2BPsdkhcVG5GY4ZW0DMUF8FG%3DWvjaGN%2BNA4XFLrzxWSQXVA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Autogenerate some wait events code and documentation

2023-08-28 Thread Michael Paquier
On Mon, Jul 17, 2023 at 10:16:02AM +0900, Michael Paquier wrote:
> So you mean to switch a line that now looks like that:
> WAIT_EVENT_FOO_BAR   FooBar"Waiting on Foo Bar."
> To that:
> FOO_BAR   "Waiting on Foo Bar."
> Or even that:
> WAIT_EVENT_FOO_BAR   "Waiting on Foo Bar."
> 
> Sure, it is an improvement for any wait events that use WAIT_EVENT_
> when searching them, but this adds more magic into the LWLock and Lock
> areas if the same conversion is applied there.  Or am I right to
> assume that you'd mean to *not* do any of that for these two classes?
> These can be treated as exceptions in the script when generating the
> wait event names from the enum elements, of course.

I have looked again at that, and switching wait_event_names.txt to use
two columns made of the typedef definitions and the docs, like the
following, is not a problem:
FOO_BAR   "Waiting on Foo Bar."

WAIT_EVENT_ is prepended to the typedef definitions in the script.  The
wait event names like "FooBar" are generated from the enums by
splitting using their underscores and doing some lc().  Lock and
LWLock don't need to change.  This way, it is easy to grep the wait
events from the source code and match them with wait_event_names.txt.
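
As an illustration (not part of the patch), a monitoring query that keeps
matching the same event across the camelcase rename only needs to compare
case-insensitively; the event name below is just an example:

```
SELECT pid, wait_event_type, wait_event
FROM pg_stat_activity
WHERE lower(wait_event) = lower('BgWriterMain');
```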

Thoughts or comments?
--
Michael
From 794f8a95ae3b9d25f3293cbf063cf70c5e8f5197 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 28 Aug 2023 14:47:13 +0900
Subject: [PATCH v3 1/2] Rename wait events with more consistent camelcase
 style

This will help in the introduction of more simplifications with the
generation of wait event data using wait_event_names.txt.  The renamed event
names differ from the old ones only in character case, hence applying lower()
or upper() to the monitoring queries allows the detection of the same
events as before this change.
---
 src/backend/libpq/pqmq.c  |  2 +-
 src/backend/storage/ipc/shm_mq.c  |  6 +-
 .../utils/activity/wait_event_names.txt   | 90 +--
 3 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/src/backend/libpq/pqmq.c b/src/backend/libpq/pqmq.c
index 253181f47a..38b042804c 100644
--- a/src/backend/libpq/pqmq.c
+++ b/src/backend/libpq/pqmq.c
@@ -182,7 +182,7 @@ mq_putmessage(char msgtype, const char *s, size_t len)
 			break;
 
 		(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
-		 WAIT_EVENT_MQ_PUT_MESSAGE);
+		 WAIT_EVENT_MESSAGE_QUEUE_PUT_MESSAGE);
 		ResetLatch(MyLatch);
 		CHECK_FOR_INTERRUPTS();
 	}
diff --git a/src/backend/storage/ipc/shm_mq.c b/src/backend/storage/ipc/shm_mq.c
index d134a09a19..06d6b73527 100644
--- a/src/backend/storage/ipc/shm_mq.c
+++ b/src/backend/storage/ipc/shm_mq.c
@@ -1017,7 +1017,7 @@ shm_mq_send_bytes(shm_mq_handle *mqh, Size nbytes, const void *data,
 			 * cheaper than setting one that has been reset.
 			 */
 			(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
-			 WAIT_EVENT_MQ_SEND);
+			 WAIT_EVENT_MESSAGE_QUEUE_SEND);
 
 			/* Reset the latch so we don't spin. */
 			ResetLatch(MyLatch);
@@ -1163,7 +1163,7 @@ shm_mq_receive_bytes(shm_mq_handle *mqh, Size bytes_needed, bool nowait,
 		 * setting one that has been reset.
 		 */
 		(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
-		 WAIT_EVENT_MQ_RECEIVE);
+		 WAIT_EVENT_MESSAGE_QUEUE_RECEIVE);
 
 		/* Reset the latch so we don't spin. */
 		ResetLatch(MyLatch);
@@ -1252,7 +1252,7 @@ shm_mq_wait_internal(shm_mq *mq, PGPROC **ptr, BackgroundWorkerHandle *handle)
 
 		/* Wait to be signaled. */
 		(void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
-		 WAIT_EVENT_MQ_INTERNAL);
+		 WAIT_EVENT_MESSAGE_QUEUE_INTERNAL);
 
 		/* Reset the latch so we don't spin. */
 		ResetLatch(MyLatch);
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 4d74f0068e..99ed44cbc8 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -45,15 +45,15 @@
 Section: ClassName - WaitEventActivity
 
 WAIT_EVENT_ARCHIVER_MAIN	ArchiverMain	"Waiting in main loop of archiver process."
-WAIT_EVENT_AUTOVACUUM_MAIN	AutoVacuumMain	"Waiting in main loop of autovacuum launcher process."
-WAIT_EVENT_BGWRITER_HIBERNATE	BgWriterHibernate	"Waiting in background writer process, hibernating."
-WAIT_EVENT_BGWRITER_MAIN	BgWriterMain	"Waiting in main loop of background writer process."
+WAIT_EVENT_AUTOVACUUM_MAIN	AutovacuumMain	"Waiting in main loop of autovacuum launcher process."
+WAIT_EVENT_BGWRITER_HIBERNATE	BgwriterHibernate	"Waiting in background writer process, hibernating."
+WAIT_EVENT_BGWRITER_MAIN	BgwriterMain	"Waiting in main loop of background writer process."
 WAIT_EVENT_CHECKPOINTER_MAIN	CheckpointerMain	"Waiting in main loop of checkpointer process."
 WAIT_EVENT_LOGICAL_APPLY_MAIN	LogicalApplyMain	"Waiting in main loop of logical replication apply process."
 WAIT_EVENT_LOGICAL_LAUNCHER_MAIN	LogicalLauncherMain	"Waiting in main loop of lo

Re: New WAL record to detect the checkpoint redo location

2023-08-28 Thread Dilip Kumar
On Mon, Aug 28, 2023 at 5:14 AM Michael Paquier  wrote:
>
> On Fri, Aug 25, 2023 at 11:08:25AM +0530, Dilip Kumar wrote:
> > Here is the updated version of the patch.
>
> The concept of the patch looks sound to me.  I have a few comments.

Thanks for the review

> +* This special record, however, is not required when we doing a 
> shutdown
> +* checkpoint, as there will be no concurrent wal insertions during 
> that
> +* time.  So, the shutdown checkpoint LSN will be the same as
> +* checkpoint-redo LSN.
>
> This is missing "are", as in "when we are doing a shutdown
> checkpoint".

Fixed

> -   freespace = INSERT_FREESPACE(curInsert);
> -   if (freespace == 0)
>
> The variable "freespace" can be moved within the if block introduced
> by this patch when calculating the REDO location for the shutdown
> case.  And you can do the same with curInsert?

Done, I have also moved code related to computing curInsert in the
same if (shutdown) block.

> -* Compute new REDO record ptr = location of next XLOG record.
> -*
> -* NB: this is NOT necessarily where the checkpoint record itself 
> will be,
> -* since other backends may insert more XLOG records while we're off 
> doing
> -* the buffer flush work.  Those XLOG records are logically after the
> -* checkpoint, even though physically before it.  Got that?
> +* In case of shutdown checkpoint, compute new REDO record
> +* ptr = location of next XLOG record.
>
> It seems to me that keeping around this comment is important,
> particularly for the case where we have a shutdown checkpoint and we
> expect nothing to run, no?

I removed this mainly because the new comments[1] that introduce the
CHECKPOINT_REDO record already explain the problem: the redo location
and the actual checkpoint record are not at the same place, and that is
because of concurrent xlog insertions.  Those comments go into more
detail by also stating that in the case of a shutdown checkpoint there
will not be any concurrent insertion, so the shutdown checkpoint redo
will be at the same place.  So I feel keeping the old comment is not
required.  And we explain it where we set this for a non-shutdown
checkpoint, because for a shutdown checkpoint that statement is not
correct anyway: the checkpoint record will be at the same location and
there will be no concurrent wal insertion.  What do you think?

[1]
+ /*
+ * Insert a dummy CHECKPOINT_REDO record and set start LSN of this record
+ * as checkpoint.redo.  Although we have the checkpoint record that also
+ * contains the exact redo lsn, there might have been some other records
+ * that got inserted between the redo lsn and the actual checkpoint
+ * record.  So when processing the wal, we cannot rely on the checkpoint
+ * record if we want to stop at the checkpoint-redo LSN.
+ *
+ * This special record, however, is not required when we are doing a
+ * shutdown checkpoint, as there will be no concurrent wal insertions
+ * during that time.  So, the shutdown checkpoint LSN will be the same as
+ * checkpoint-redo LSN.
+ */

>
> How about adding a test in pg_walinspect?  There is an online
> checkpoint running there, so you could just add something like that
> to check that the REDO record is at the expected LSN stored in the
> control file:
> +-- Check presence of REDO record.
> +SELECT redo_lsn FROM pg_control_checkpoint() \gset
> +SELECT start_lsn = :'redo_lsn'::pg_lsn AS same_lsn, record_type
> +  FROM pg_get_wal_record_info(:'redo_lsn');
> --

Done, thanks.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


v4-0001-New-WAL-record-for-checkpoint-redo-location.patch
Description: Binary data


Re: Return value of pg_promote()

2023-08-28 Thread Michael Paquier
On Mon, Aug 28, 2023 at 11:50:45AM +0530, Ashutosh Sharma wrote:
> Thanks for reviewing the patch and adding a CF entry for it. PFA patch
> that addresses your review comments.

That looks OK seen from here.  Perhaps others have more comments?

> And... Sorry for the delayed response. I totally missed it.

No problem.
--
Michael


signature.asc
Description: PGP signature


Re: Support prepared statement invalidation when result types change

2023-08-28 Thread jian he
On Sat, Aug 26, 2023 at 1:58 AM Jelte Fennema  wrote:
>
> The cached plan for a prepared statements can get invalidated when DDL
> changes the tables used in the query, or when search_path changes. When
> this happens the prepared statement can still be executed, but it will
> be replanned in the new context. This means that the prepared statement
> will do something different e.g. in case of search_path changes it will
> select data from a completely different table. This won't throw an
> error, because it is considered the responsibility of the operator and
> query writers that the query will still do the intended thing.
>
> However, we would throw an error if the the result of the query is of a
> different type than it was before:
> ERROR: cached plan must not change result type
>
> This requirement was not documented anywhere and it
> can thus be a surprising error to hit. But it's actually not needed for
> this to be an error, as long as we send the correct RowDescription there
> does not have to be a problem for clients when the result types or
> column counts change.
>
> This patch starts to allow a prepared statement to continue to work even
> when the result type changes.
>
> Without this change all clients that automatically prepare queries as a
> performance optimization will need to handle or avoid the error somehow,
> often resulting in deallocating and re-preparing queries when its
> usually not necessary. With this change connection poolers can also
> safely prepare the same query only once on a connection and share this
> one prepared query across clients that prepared that exact same query.
>
> Some relevant previous discussions:
> [1]: 
> https://www.postgresql.org/message-id/flat/CAB%3DJe-GQOW7kU9Hn3AqP1vhaZg_wE9Lz6F4jSp-7cm9_M6DyVA%40mail.gmail.com
> [2]: 
> https://stackoverflow.com/questions/2783813/postgres-error-cached-plan-must-not-change-result-type
> [3]: 
> https://stackoverflow.com/questions/42119365/how-to-avoid-cached-plan-must-not-change-result-type-error
> [4]: https://github.com/pgjdbc/pgjdbc/pull/451
> [5]: https://github.com/pgbouncer/pgbouncer/pull/845#discussion_r1305295551
> [6]: https://github.com/jackc/pgx/issues/927
> [7]: 
> https://elixirforum.com/t/postgrex-errors-with-cached-plan-must-not-change-result-type-during-migration/51235/2
> [8]: https://github.com/rails/rails/issues/12330

For a prepared statement with no parameters, I tested many cases (add column,
change column data type, rename column, set default, set not null), and it
worked as expected.
With parameters it also works; there is only a tiny issue with error reporting.

prepstmt2 | PREPARE prepstmt2(bigint) AS SELECT * FROM pcachetest
WHERE q1 = $1; | {bigint}| {bigint,bigint,bigint}
ERROR:  column "q1" does not exist at character 61
HINT:  Perhaps you meant to reference the column "pcachetest.x1".
STATEMENT:  execute prepstmt2(1);

I think "character 61" refer to "PREPARE prepstmt2(bigint) AS SELECT *
FROM pcachetest WHERE q1 = $1;"
so maybe the STATEMENT is slightly misleading.
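
For reference, a minimal sketch of how that report can be reproduced (the
table definition is a guess; the point is that q1 has been renamed before
the EXECUTE, so the replan fails and the reported character offset points
into the original PREPARE text):

```
CREATE TABLE pcachetest (q1 bigint, q2 bigint, q3 bigint);
PREPARE prepstmt2(bigint) AS SELECT * FROM pcachetest WHERE q1 = $1;
ALTER TABLE pcachetest RENAME COLUMN q1 TO x1;
EXECUTE prepstmt2(1);
-- ERROR:  column "q1" does not exist
-- HINT:  Perhaps you meant to reference the column "pcachetest.x1".
```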




Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-08-28 Thread Amul Sul
On Thu, Aug 24, 2023 at 9:36 AM Vaibhav Dalvi <
vaibhav.da...@enterprisedb.com> wrote:

> Hi Amul,
>
>
> On Wed, Aug 2, 2023 at 4:06 PM Amul Sul  wrote:
>
>> Hi,
>>
>> Currently, we have an option to drop the expression of stored generated
>> columns
>> as:
>>
>> ALTER [ COLUMN ] column_name DROP EXPRESSION [ IF EXISTS ]
>>
>> But don't have support to update that expression. The attached patch
>> provides
>> that as:
>>
>> ALTER [ COLUMN ] column_name SET EXPRESSION expression
>>
>> +1 to the idea.
>

Thank you.


> 3. The AlteredTableInfo structure has member Relation, So need to pass
> parameter Relation separately?
>
>> static ObjectAddress ATExecColumnExpression(AlteredTableInfo *tab,
>> Relation rel,
>>  const char *colName, Node *newDefault,
>>  bool missing_ok, LOCKMODE lockmode);
>
>
Yeah, but I think we can leave it as is, since other AT routines do the same.

Thanks for the review comments; I have fixed those in the attached version.
In addition to that, I extended the syntax to have the STORED keyword as
suggested by Vik.

Regards,
Amul


v2-0001-Prerequisite-changes-rename-functions-enum.patch
Description: Binary data


v2-0002-Allow-to-change-generated-column-expression.patch
Description: Binary data


[PATCH v1] PQputCopyEnd never returns 0, fix the inaccurate comment

2023-08-28 Thread Junwang Zhao
PQputCopyEnd returns 1 or -1, never 0.  I guess the comment was
copied from PQputCopyData's comment; this should be fixed.

-- 
Regards
Junwang Zhao


v1-0001-PQputCopyEnd-never-returns-0-fix-the-inaccurate-c.patch
Description: Binary data


Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-08-28 Thread Amul Sul
On Fri, Aug 25, 2023 at 5:35 AM Vik Fearing  wrote:

> On 8/2/23 12:35, Amul Sul wrote:
> > Hi,
> >
> > Currently, we have an option to drop the expression of stored generated
> > columns
> > as:
> >
> > ALTER [ COLUMN ] column_name DROP EXPRESSION [ IF EXISTS ]
> >
> > But don't have support to update that expression. The attached patch
> > provides
> > that as:
> >
> > ALTER [ COLUMN ] column_name SET EXPRESSION expression
>
> I love this idea.  It is something that the standard SQL language is
> lacking and I am submitting a paper to correct that based on this.  I
> will know in October what the committee thinks of it.  Thanks!
>
>
Great, thank you so much.


> > Note that this form of ALTER is meant to work for the column which is
> > already generated.
>
> Why?  SQL does not have a way to convert a non-generated column into a
> generated column, and this seems like as good a way as any.
>

Well, I had the same thought, but Peter Eisentraut thinks that we
should have that in a separate patch, and I am fine with that.

> To keep the code flow simple, I have renamed the existing function that
> was
> > in use for DROP EXPRESSION so that it can be used for SET EXPRESSION as
> well,
> > which is a similar design as SET/DROP DEFAULT. I kept this renaming code
> > changes in a separate patch to minimize the diff in the main patch.
>
> I don't like this part of the patch at all.  Not only is the
> documentation only half baked, but the entire concept of the two
> commands is different.  Especially since I believe the command should
> also create a generated column from a non-generated one.
>

I am not sure I understood this: why would that break the documentation,
even if we later allow non-generated columns to be made generated?  This
keeps the code flow simple and doesn't cause any issue for a future
extension that allows non-generated columns too.


>
> Is is possible to compare the old and new expressions and no-op if they
> are the same?
>
>
> psql (17devel)
> Type "help" for help.
>
> postgres=# create table t (c integer generated always as (null) stored);
> CREATE TABLE
> postgres=# select relfilenode from pg_class where oid = 't'::regclass;
>   relfilenode
> -
> 16384
> (1 row)
>
> postgres=# alter table t alter column c set expression (null);
> ALTER TABLE
> postgres=# select relfilenode from pg_class where oid = 't'::regclass;
>   relfilenode
> -
> 16393
> (1 row)
>
>
> I am not saying we should make every useless case avoid rewriting the
> table, but if there are simple wins, we should take them.  (I don't know
> how feasible this is.)
>

I think that is feasible, but I am not sure we want to add extra code
for that case; it doesn't really break anything, it just makes the
system do extra work for a careless user action.


>
> I think repeating the STORED keyword should be required here to
> future-proof virtual generated columns.
>

Agree, added in the v2 version posted a few minutes ago.

Regards,
Amul


Re: Strange presentation related to inheritance in \d+

2023-08-28 Thread Alvaro Herrera
On 2023-Aug-28, Kyotaro Horiguchi wrote:

> But with these tables:
> 
> create table p (a int, b int not null default 0);
> create table c1 (a int, b int not null NO INHERIT default 1) inherits (p);
> 
> I get:
> 
> > Not-null constraints:
> >"c1_b_not_null" NOT NULL "b" *NO INHERIT*
> 
> Here, "NO INHERIT" is mapped from connoinherit, and conislocal and
> "coninhcount <> 0" align with "local" and "inherited". For a clearer
> picture, those values for c1 are as follows.

Hmm, I think the bug here is that we let you create a constraint in c1
that is NOINHERIT.  If the parent already has one INHERIT constraint
in that column, then the child must have that one also; it's not
possible to have both a constraint that inherits and one that doesn't.

I understand that there are only three possibilities for a NOT NULL
constraint in a column:

- There's a NO INHERIT constraint.  A NO INHERIT constraint is always
  defined locally in that table.  In this case, if there is a parent
  relation, then it must either not have a NOT NULL constraint in that
  column, or it may also have a NO INHERIT one.  Therefore, it's
  correct to print NO INHERIT and nothing else.  We could also print
  "(local)" but I see no point in doing that.

- A constraint comes inherited from one or more parent tables and has no
  local definition.  In this case, the constraint always inherits
  (otherwise, the parent wouldn't have given it to this table).  So
  printing "(inherited)" and nothing else is correct.

- A constraint can have a local definition and also be inherited.  In
  this case, printing "(local, inherited)" is correct.

Have I missed other cases?
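
For concreteness, a minimal sketch of tables that should land in each of
these cases (the table names c2 and q are made up here):

```
CREATE TABLE p  (a int, b int NOT NULL);               -- plain local constraint
CREATE TABLE c1 (a int, b int NOT NULL) INHERITS (p);  -- local, inherited
CREATE TABLE c2 () INHERITS (p);                       -- inherited, no local definition
CREATE TABLE q  (a int, b int NOT NULL NO INHERIT);    -- NO INHERIT, always local
```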


The NO INHERIT bit is part of the syntax, which is why I put it in
uppercase and not marked it for translation.  The other two are
informational, so they are translatable.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"The important things in the world are problems with society that we don't
understand at all. The machines will become more complicated but they won't
be more complicated than the societies that run them."(Freeman Dyson)




Wrong usage of pqMsg_Close message code?

2023-08-28 Thread Pavel Stehule
Hi

I am working with the protocol and reading related code.

I found one strange thing in the routine EndCommand:

void
EndCommand(const QueryCompletion *qc, CommandDest dest,
		   bool force_undecorated_output)
{
	char		completionTag[COMPLETION_TAG_BUFSIZE];
	Size		len;

	switch (dest)
	{
		case DestRemote:
		case DestRemoteExecute:
		case DestRemoteSimple:

			len = BuildQueryCompletionString(completionTag, qc,
											 force_undecorated_output);
			pq_putmessage(PqMsg_Close, completionTag, len + 1);

		case DestNone:

The message code used here is PqMsg_Close, but that message is sent from the
client side. Shouldn't PqMsg_CommandComplete be used here instead?

Regards

Pavel


Re: [PATCH v1] PQputCopyEnd never returns 0, fix the inaccurate comment

2023-08-28 Thread Aleksander Alekseev
Hi Junwang,

> PQputCopyEnd returns 1 or -1, never 0, I guess the comment was
> copy/paste from PQputCopyData's comment, this should be fixed.

The patch LGTM but I wonder whether we should also change all the
existing calls of PQputCopyEnd() from:

```
PQputCopyEnd(...) <= 0
```

... to:

```
PQputCopyEnd(...) < 0
```

Given the circumstances, checking for equality to zero seems to be at
least strange.


On top of that, none of the PQputCopyData() callers cares whether the
function returns 0 or -1; both are treated the same way. I suspect the
function does some extra work no one asked to do and no one cares
about. Perhaps this function should be refactored too for consistency.

Thoughts?

-- 
Best regards,
Aleksander Alekseev




RE: persist logical slots to disk during shutdown checkpoint

2023-08-28 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

I also tested logical slots on a physical standby. PSA the script.
confirmed_flush_lsn for such slots was successfully persisted.

# Topology

In this test, the nodes are connected to each other as follows.

node1 --(physical replication)-->node2--(logical replication)-->node3

# Test method

The attached script performs the following steps:

1. Construct the above configuration
2. Insert data on node1
3. Read confirmed_flush_lsn on node2 (a)
4. Restart node2
5. Read confirmed_flush_lsn again on node2 (b)
6. Compare (a) and (b)

# result

Before patching, (a) and (b) were different values, which means that logical
slots on a physical standby were not saved at shutdown.

```
slot_name | confirmed_flush_lsn 
---+-
 sub   | 0/30003E8
(1 row)

waiting for server to shut down done
server stopped
waiting for server to start done
server started
 slot_name | confirmed_flush_lsn 
---+-
 sub   | 0/3D8
(1 row)
```

After patching, (a) and (b) became the same value. The v4 patch worked well even
when the node is a physical standby.

```
slot_name | confirmed_flush_lsn 
---+-
 sub   | 0/30003E8
(1 row)

waiting for server to shut down done
server stopped
waiting for server to start done
server started
 slot_name | confirmed_flush_lsn 
---+-
 sub   | 0/30003E8
(1 row)
```
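
For reference, the values shown above can be read on node2 with a query
along these lines (only the two columns shown are needed):

```
SELECT slot_name, confirmed_flush_lsn
FROM pg_replication_slots;
```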

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



test_for_physical_standby.sh
Description: test_for_physical_standby.sh


Re: proposal: psql: show current user in prompt

2023-08-28 Thread Pavel Stehule
Hi


>>
>>
>> But afaict there's no problem with using pqParseInput3() and
>> PQexecFinish() even if the message isn't handled as part of the
>> transaction. Some other messages that pqParseInput3 handles which are
>> not part of the transaction are 'N' (Notice) and 'K' (secret key).
>>
>
> I have to recheck it
>

here is a new version based on the usage of PQexecFinish

Regards

Pavel


>
> Regards
>
> Pavel
>
>
From 4567473ebd7fec9dc836793e8e80244b22e5c96b Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Mon, 24 Jul 2023 20:18:16 +0200
Subject: [PATCH 2/2] Implementation of %N prompt placeholder

It is based on forcing the "role" GUC to be reported to the client.
---
 doc/src/sgml/ref/psql-ref.sgml | 19 +-
 src/bin/psql/command.c | 13 +
 src/bin/psql/prompt.c  | 35 ++
 src/bin/psql/settings.h|  1 +
 src/bin/psql/startup.c | 32 +++
 5 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 182e58353f..b5da83013e 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -4567,7 +4567,24 @@ testdb=> INSERT INTO my_table VALUES (:'content');
 The port number at which the database server is listening.
   
 
-  
+  
+%N
+
+ 
+  The database role name. This value is specified by command
+  SET ROLE. Until execution of this command
+  the value is set to the database session user name.
+ 
+
+ 
+  This substitution requires PostgreSQL
+  version 16 and up. When you use older version, the empty string
+  is used instead.
+ 
+
+  
+
+  
 %n
 
  
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 1300869d79..bf67febc11 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -3861,6 +3861,7 @@ SyncVariables(void)
 {
 	char		vbuf[32];
 	const char *server_version;
+	PGresult   *result;
 
 	/* get stuff from connection */
 	pset.encoding = PQclientEncoding(pset.db);
@@ -3890,6 +3891,18 @@ SyncVariables(void)
 	/* send stuff to it, too */
 	PQsetErrorVerbosity(pset.db, pset.verbosity);
 	PQsetErrorContextVisibility(pset.db, pset.show_context);
+
+	/* link role GUC when it is needed for prompt */
+	if (pset.prompt_shows_role)
+		result = PQlinkParameterStatus(pset.db, "role");
+	else
+		result = PQunlinkParameterStatus(pset.db, "role");
+
+	if (PQresultStatus(result) != PGRES_COMMAND_OK)
+		pg_log_info("cannot set REPORT flag on configuration variable \"role\": %s",
+	PQerrorMessage(pset.db));
+
+	PQclear(result);
 }
 
 /*
diff --git a/src/bin/psql/prompt.c b/src/bin/psql/prompt.c
index 969cd9908e..b0f5158c59 100644
--- a/src/bin/psql/prompt.c
+++ b/src/bin/psql/prompt.c
@@ -165,6 +165,41 @@ get_prompt(promptStatus_t status, ConditionalStack cstack)
 	if (pset.db)
 		strlcpy(buf, session_username(), sizeof(buf));
 	break;
+	/* DB server user role */
+case 'N':
+	if (pset.db)
+	{
+		int			minServerMajor;
+		int			serverMajor;
+		const char *rolename = NULL;
+
+		/*
+		 * This feature requires GUC "role" to be marked
+		 * as GUC_REPORT. Without it is hard to specify fallback
+		 * result. Returning empty value can be messy, returning
+		 * PQuser like session_username can be messy too.
+		 * Exec query is not too practical too, because it doesn't
+		 * work when session is not in transactional state, and
+		 * CURRENT_ROLE returns different result when role is not
+		 * explicitly specified by SET ROLE.
+		 */
+		minServerMajor = 1600;
+		serverMajor = PQserverVersion(pset.db) / 100;
+		if (serverMajor >= minServerMajor)
+		{
+			rolename  = PQparameterStatus(pset.db, "role");
+
+			/* fallback when role is not set yet */
+			if (rolename && strcmp(rolename, "none") == 0)
+rolename = session_username();
+		}
+
+		if (rolename)
+			strlcpy(buf, rolename, sizeof(buf));
+		else
+			buf[0] = '\0';
+	}
+	break;
 	/* backend pid */
 case 'p':
 	if (pset.db)
diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h
index 1106954236..cb7c12bd1d 100644
--- a/src/bin/psql/settings.h
+++ b/src/bin/psql/settings.h
@@ -154,6 +154,7 @@ typedef struct _psqlSettings
 	PGVerbosity verbosity;		/* current error verbosity level */
 	bool		show_all_results;
 	PGContextVisibility show_context;	/* current context display level */
+	bool		prompt_shows_role;
 } PsqlSettings;
 
 extern PsqlSettings pset;
diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
index 5a28b6f713..0dac396525 100644
--- a/src/bin/psql/startup.c
+++ b/src/bin/psql/startup.c
@@ -1094,10 +1094,40 @@ histcontrol_hook(const char *newval)
 	return true;
 }
 
+static void
+prompt_needs_role_parameter_status(void)
+{

Re: Wrong usage of pqMsg_Close message code?

2023-08-28 Thread Aleksander Alekseev
Hi Pavel,

> The message code used here is PqMsg_Close, but that message is sent from the
> client side. Shouldn't PqMsg_CommandComplete be used here instead?

It seems so. This change was introduced in f4b54e1ed98 [1]:

```
--- a/src/backend/tcop/dest.c
+++ b/src/backend/tcop/dest.c
@@ -176,7 +176,7 @@ EndCommand(const QueryCompletion *qc, CommandDest
dest, bool force_undecorated_o

len = BuildQueryCompletionString(completionTag, qc,

  force_undecorated_output);
-   pq_putmessage('C', completionTag, len + 1);
+   pq_putmessage(PqMsg_Close, completionTag, len + 1);

case DestNone:
case DestDebug
```

It should have been PqMsg_CommandComplete.

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f4b54e1ed98

-- 
Best regards,
Aleksander Alekseev




Re: Return value of pg_promote()

2023-08-28 Thread Laurenz Albe
On Thu, 2023-08-17 at 09:37 +0900, Michael Paquier wrote:
> I have just noticed that we do not have a CF entry for this proposal,
> so I have added one with Laurenz as author:
> https://commitfest.postgresql.org/44/4504/

I have changed the author to Fujii Masao.

Yours,
Laurenz Albe




Re: Disabling Heap-Only Tuples

2023-08-28 Thread Laurenz Albe
On Thu, 2023-08-24 at 18:23 +0200, Matthias van de Meent wrote:
> On Wed, 19 Jul 2023 at 15:13, Thom Brown  wrote:
> > 
> > On Wed, 19 Jul 2023, 13:58 Laurenz Albe,  wrote:
> > > I agree that the name "max_local_update" could be improved.
> > > Perhaps "avoid_hot_above_size_mb".
> > 
> > Or "hot_table_size_threshold" or "hot_update_limit"?
> 
> Although I like these names, it doesn't quite cover the use of the
> parameter for me, as updated tuples prefer to be inserted on the same
> page as the old tuple regardless of whether HOT applies.
> 
> How about 'local_update_limit'?

I agree with your concern.  I cannot think of a better name than yours.

Yours,
Laurenz Albe




Re: abi-compliance-checker

2023-08-28 Thread Peter Eisentraut

On 10.06.23 22:24, Andres Freund wrote:

Looks like we ought to add --exported-interfaces-only?


Btw., this option requires libabigail 2.1, which isn't available 
everywhere yet.  For example, Debian oldstable (used on Cirrus) doesn't 
have it yet.  So I'll leave this patch set as is for now.  If it turns 
out that this is the right option and we want to proceed with this patch 
set, we might need to think about a version check or something.






Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

2023-08-28 Thread Daniel Gustafsson
> On 23 Aug 2023, at 23:12, Daniel Gustafsson  wrote:
> 
>> On 23 Aug 2023, at 23:02, Tom Lane  wrote:
>> 
>> Daniel Gustafsson  writes:
>>> On 23 Aug 2023, at 21:22, Andres Freund  wrote:
 I think there's more effective ways to make this cheaper. The basic thing
 would be to use libpq instead of forking of psql to make a connection
 check.
>> 
>>> I had it in my head that not using libpq in pg_regress was a deliberate 
>>> choice,
>>> but I fail to find a reference to it in the archives.
>> 
>> I have a vague feeling that you are right about that.  Perhaps the
>> concern was that under "make installcheck", pg_regress might be
>> using a build-tree copy of libpq rather than the one from the
>> system under test.  As long as we're just trying to ping the server,
>> that shouldn't matter too much I think ... unless we hit problems
>> with, say, a different default port number or socket path compiled into
>> one copy vs. the other?  That seems like it's probably a "so don't
>> do that" case, though.
> 
> Ah yes, that does ring a familiar bell.  I agree that using it for pinging the
> server should be safe either way, but we should document the use-with-caution
> in pg_regress.c if/when we go down that path.  I'll take a stab at changing 
> the
> psql retry loop for pinging tomorrow to see what it would look like.

Attached is a patch with a quick PoC for using PQping instead of using psql for
connection checks in pg_regress.  In order to see performance it also includes
a diag output for "Time to first test" which contains all setup costs.  This
might not make it into a commit but it was quite helpful in hacking so I left
it in for now.

The patch incorporates Andres' idea for finer granularity of checks by checking
TICKS times per second rather than once per second; it also shifts the
pg_usleep around to require just one ping in most cases compared to two today.

On my relatively tired laptop this speeds up pg_regress setup by 100+ms, with
much bigger wins on Windows in the CI.  While it does add a dependency on
libpq, I think it's a fairly decent price to pay for running tests faster.

--
Daniel Gustafsson



v1-0001-Speed-up-pg_regress-server-testing.patch
Description: Binary data



Re: cataloguing NOT NULL constraints

2023-08-28 Thread Peter Eisentraut

On 25.08.23 13:38, Alvaro Herrera wrote:

I have now pushed this again.  Hopefully it'll stick this time.

We may want to make some further tweaks to the behavior in some cases --
for example, don't disallow ALTER TABLE DROP NOT NULL when the
constraint is both inherited and has a local definition; the other
option is to mark the constraint as no longer having a local definition.
I left it the other way because that's what CHECK does; maybe we would
like to change both at once.

I ran it through CI, and the pg_upgrade test with a dump from 14's
regression test database and everything worked well, but it's been a
while since I tested the sepgsql part of it, so that might the first
thing to explode.


It looks like we forgot about domain constraints?  For example,

create domain testdomain as int not null;

should create a row in pg_constraint?
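
A quick way to check would be a catalog query like the one below, assuming
such a row would be linked to the domain via contypid, as existing domain
CHECK constraints are:

```
CREATE DOMAIN testdomain AS int NOT NULL;

SELECT conname, contype
FROM pg_constraint
WHERE contypid = 'testdomain'::regtype;
```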





Re: proposal: psql: show current user in prompt

2023-08-28 Thread Pavel Stehule
po 28. 8. 2023 v 13:58 odesílatel Pavel Stehule 
napsal:

> Hi
>
>
>>>
>>>
>>> But afaict there's no problem with using pqParseInput3() and
>>> PQexecFinish() even if the message isn't handled as part of the
>>> transaction. Some other messages that pqParseInput3 handles which are
>>> not part of the transaction are 'N' (Notice) and 'K' (secret key).
>>>
>>
>> I have to recheck it
>>
>
> here is new version based on usage of PQexecFinish
>

Here it is with a protocol test.

Regards

Pavel


>
> Regards
>
> Pavel
>
>
>>
>> Regards
>>
>> Pavel
>>
>>
>
From 4567473ebd7fec9dc836793e8e80244b22e5c96b Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Mon, 24 Jul 2023 20:18:16 +0200
Subject: [PATCH 2/3] Implementation of %N prompt placeholder

It is based on forcing the "role" GUC to be reported to the client.
---
 doc/src/sgml/ref/psql-ref.sgml | 19 +-
 src/bin/psql/command.c | 13 +
 src/bin/psql/prompt.c  | 35 ++
 src/bin/psql/settings.h|  1 +
 src/bin/psql/startup.c | 32 +++
 5 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 182e58353f..b5da83013e 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -4567,7 +4567,24 @@ testdb=> INSERT INTO my_table VALUES (:'content');
 The port number at which the database server is listening.
   
 
-  
+  
+%N
+
+ 
+  The database role name. This value is specified by command
+  SET ROLE. Until execution of this command
+  the value is set to the database session user name.
+ 
+
+ 
+  This substitution requires PostgreSQL
+  version 16 and up. When you use older version, the empty string
+  is used instead.
+ 
+
+  
+
+  
 %n
 
  
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 1300869d79..bf67febc11 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -3861,6 +3861,7 @@ SyncVariables(void)
 {
 	char		vbuf[32];
 	const char *server_version;
+	PGresult   *result;
 
 	/* get stuff from connection */
 	pset.encoding = PQclientEncoding(pset.db);
@@ -3890,6 +3891,18 @@ SyncVariables(void)
 	/* send stuff to it, too */
 	PQsetErrorVerbosity(pset.db, pset.verbosity);
 	PQsetErrorContextVisibility(pset.db, pset.show_context);
+
+	/* link role GUC when it is needed for prompt */
+	if (pset.prompt_shows_role)
+		result = PQlinkParameterStatus(pset.db, "role");
+	else
+		result = PQunlinkParameterStatus(pset.db, "role");
+
+	if (PQresultStatus(result) != PGRES_COMMAND_OK)
+		pg_log_info("cannot set REPORT flag on configuration variable \"role\": %s",
+	PQerrorMessage(pset.db));
+
+	PQclear(result);
 }
 
 /*
diff --git a/src/bin/psql/prompt.c b/src/bin/psql/prompt.c
index 969cd9908e..b0f5158c59 100644
--- a/src/bin/psql/prompt.c
+++ b/src/bin/psql/prompt.c
@@ -165,6 +165,41 @@ get_prompt(promptStatus_t status, ConditionalStack cstack)
 	if (pset.db)
 		strlcpy(buf, session_username(), sizeof(buf));
 	break;
+	/* DB server user role */
+case 'N':
+	if (pset.db)
+	{
+		int			minServerMajor;
+		int			serverMajor;
+		const char *rolename = NULL;
+
+		/*
+		 * This feature requires GUC "role" to be marked
+		 * as GUC_REPORT. Without it is hard to specify fallback
+		 * result. Returning empty value can be messy, returning
+		 * PQuser like session_username can be messy too.
+		 * Exec query is not too practical too, because it doesn't
+		 * work when session is not in transactional state, and
+		 * CURRENT_ROLE returns different result when role is not
+		 * explicitly specified by SET ROLE.
+		 */
+		minServerMajor = 1600;
+		serverMajor = PQserverVersion(pset.db) / 100;
+		if (serverMajor >= minServerMajor)
+		{
+			rolename  = PQparameterStatus(pset.db, "role");
+
+			/* fallback when role is not set yet */
+			if (rolename && strcmp(rolename, "none") == 0)
+rolename = session_username();
+		}
+
+		if (rolename)
+			strlcpy(buf, rolename, sizeof(buf));
+		else
+			buf[0] = '\0';
+	}
+	break;
 	/* backend pid */
 case 'p':
 	if (pset.db)
diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h
index 1106954236..cb7c12bd1d 100644
--- a/src/bin/psql/settings.h
+++ b/src/bin/psql/settings.h
@@ -154,6 +154,7 @@ typedef struct _psqlSettings
 	PGVerbosity verbosity;		/* current error verbosity level */
 	bool		show_all_results;
 	PGContextVisibility show_context;	/* current context display level */
+	bool		prompt_shows_role;
 } PsqlSettings;
 
 extern PsqlSettings pset;
diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
index 5a28b6f713..0dac396525 100644
--- a/src/bin/psql/startup.c
+++ b/src/bin/psql/startup.c
@@ -109

RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-28 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thank you for reviewing! PSA new version patch set.

> ==
> 1. About the PG17 limitation
> 
> In my previous review of v25-0002, I suggested that the PG17
> limitation should be documented atop one of the source files. See
> [1]#3, [1]#7, [1]#10
> 
> I just wanted to explain the reason for that suggestion.
> 
> Currently, all the new version checks have a comment like "/* Logical
> slots can be migrated since PG17. */". I felt that it would be better
> if those comments said something more like "/* Logical slots can be
> migrated since PG17. See XYZ for details. */". I don't really care
> *where* the main explanation lives, but I thought since it is
> referenced from multiple places it might be easier to find if it was
> atop some file instead of just in a function comment. YMMV.
> 
> ==
> 2. Do version checking in check_and_dump_old_cluster instead of inside
> get_old_cluster_logical_slot_infos
> 
> check_and_dump_old_cluster - Should check version before calling
> get_old_cluster_logical_slot_infos
> get_old_cluster_logical_slot_infos - Keep a sanity check Assert if you
> wish (or do nothing -- e.g. see #3 below)
> 
> Refer to [1]#4, [1]#8
> 
> Isn't it self-evident from the file/function names what kind of logic
> they are intended to have in them? Sure, there may be some exceptions
> but unless it is difficult to implement I think most people would
> reasonably assume:
> 
> - checking code should be in file "check.c"
> -- e.g. a function called 'check_and_dump_old_cluster' ought to be
> *checking* stuff
> 
> - info fetching code should be in file "info.c"
> 
> ~~
> 
> Another motivation for this suggestion becomes more obvious later with
> patch 0003. By checking at the "higher" level (in check.c) it means
> multiple related functions can all be called under one version check.
> Less checking means less code and/or simpler code. For example,
> multiple redundant calls to get_old_cluster_count_slots() can be
> avoided in patch 0003 by writing *less* code, than v26* currently has.

IIUC Amit disagreed with these points, so I will keep my code as is until he
posts his opinion.

> 3. count_old_cluster_logical_slots
> 
> I think there is nothing special in this logic that will crash if PG
> version <= 1600. Keep the Assert for sanity checking if you wish, but
> this is already guarded by the call in pg_upgrade.c so perhaps it is
> overkill.

Your point is right.
I have checked some version-specific functions like
check_for_aclitem_data_type_usage()
and check_for_user_defined_encoding_conversions(); they do not have an Assert(),
so I removed it. As for free_db_and_rel_infos(), the Assert() ensures that the new
cluster does not have logical slots, so I kept it.

Also, I found that get_loadable_libraries() always reads pg_replication_slots,
even if the old cluster is older than PG17. This led to additional checks for
logical decoding output plugins. Moreover, clusters prior to PG12 could not be
upgraded because they do not have the wal_status attribute.
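
(For illustration only: the kind of query that needs wal_status, a
pg_replication_slots column that old servers do not have, would look like
this.)

```
SELECT slot_name, plugin, wal_status
FROM pg_replication_slots;
```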

I think the checking should be done only when old_cluster is >= PG17, so fixed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v27-0001-Persist-to-disk-logical-slots-during-a-shutdown-.patch
Description:  v27-0001-Persist-to-disk-logical-slots-during-a-shutdown-.patch


v27-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
Description:  v27-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch


v27-0003-pg_upgrade-Add-check-function-for-logical-replic.patch
Description:  v27-0003-pg_upgrade-Add-check-function-for-logical-replic.patch


RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-28 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thank you for reviewing!

> 1. check_and_dump_old_cluster
> 
> CURRENT CODE (with v26-0003 patch applied)
> 
> /* Extract a list of logical replication slots */
> get_old_cluster_logical_slot_infos();
> 
> ...
> 
> /*
> * Logical replication slots can be migrated since PG17. See comments atop
> * get_old_cluster_logical_slot_infos().
> */
> if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
> {
> check_old_cluster_for_lost_slots();
> 
> /*
> * Do additional checks if a live check is not required. This requires
> * that confirmed_flush_lsn of all the slots is the same as the latest
> * checkpoint location, but it would be satisfied only when the server
> * has been shut down.
> */
> if (!live_check)
> check_old_cluster_for_confirmed_flush_lsn();
> }
> 
> 
> SUGGESTION
> 
> /*
>  * Logical replication slots can be migrated since PG17. See comments atop
>  * get_old_cluster_logical_slot_infos().
>  */
> if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700) // NOTE 1a.
> {
>   /* Extract a list of logical replication slots */
>   get_old_cluster_logical_slot_infos();
> 
>   if (count_old_cluster_slots()) // NOTE 1b.
>   {
> check_old_cluster_for_lost_slots();
> 
> /*
>  * Do additional checks if a live check is not required. This requires
>  * that confirmed_flush_lsn of all the slots is the same as the latest
>  * checkpoint location, but it would be satisfied only when the server
>  * has been shut down.
>  */
> if (!live_check)
>   check_old_cluster_for_confirmed_flush_lsn();
>   }
> }
> 
> ~~
> 
> Benefits:
> 
> 1a.
> One version check instead of multiple.
> 
> ~
> 
> 1b.
> Upfront slot counting means
> - only call 1 time to count_old_cluster_slots().
> - unnecessary calls to other check* functions are avoided
> 
> ~
> 
> 1c.
> get_old_cluster_logical_slot_infos
> - No version check is needed.
> 
> check_old_cluster_for_lost_slots
> - Call to count_old_cluster_slots is not needed
> - Quick exit not needed.
> 
> check_old_cluster_for_confirmed_flush_lsn
> - Call to count_old_cluster_slots is not needed
> - Quick exit not needed.
> 
> ~~~
> 
> 2. check_old_cluster_for_lost_slots
> 
> + /* Quick exit if the cluster does not have logical slots. */
> + if (count_old_cluster_logical_slots() == 0)
> + return;
> 
> Refer [1]#4. Can remove this because #1b above.
>
> 3. check_old_cluster_for_confirmed_flush_lsn
> 
> + /* Quick exit if the cluster does not have logical slots. */
> + if (count_old_cluster_logical_slots() == 0)
> + return;
> 
> Refer [1]#5. Can remove this because #1b above.

IIUC Amit disagreed with these points, so I will keep my code as is until he posts
his opinion.

> 4. .../t/003_logical_replication_slots.pl
> 
> /shipped/replicated/
> 
> Kuroda-san 26/8 wrote:
> You meant to say s/replicated/shipped/, right? Fixed.
> 
> No, I meant what I wrote for [1]#7. I was referring to the word
> "shipped" in the message 'check changes are shipped to the
> subscriber'. Now there are 2 places to change instead of one.
>

Oh, sorry about that. Both places have been fixed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





Re: Support prepared statement invalidation when result types change

2023-08-28 Thread Konstantin Knizhnik




On 25.08.2023 8:57 PM, Jelte Fennema wrote:

The cached plan for a prepared statements can get invalidated when DDL
changes the tables used in the query, or when search_path changes. When
this happens the prepared statement can still be executed, but it will
be replanned in the new context. This means that the prepared statement
will do something different e.g. in case of search_path changes it will
select data from a completely different table. This won't throw an
error, because it is considered the responsibility of the operator and
query writers that the query will still do the intended thing.

However, we would throw an error if the result of the query is of a
different type than it was before:
ERROR: cached plan must not change result type

This requirement was not documented anywhere and it
can thus be a surprising error to hit. But it's actually not needed for
this to be an error, as long as we send the correct RowDescription there
does not have to be a problem for clients when the result types or
column counts change.

This patch starts to allow a prepared statement to continue to work even
when the result type changes.

Without this change all clients that automatically prepare queries as a
performance optimization will need to handle or avoid the error somehow,
often resulting in deallocating and re-preparing queries when its
usually not necessary. With this change connection poolers can also
safely prepare the same query only once on a connection and share this
one prepared query across clients that prepared that exact same query.

Some relevant previous discussions:
[1]: 
https://www.postgresql.org/message-id/flat/CAB%3DJe-GQOW7kU9Hn3AqP1vhaZg_wE9Lz6F4jSp-7cm9_M6DyVA%40mail.gmail.com
[2]: 
https://stackoverflow.com/questions/2783813/postgres-error-cached-plan-must-not-change-result-type
[3]: 
https://stackoverflow.com/questions/42119365/how-to-avoid-cached-plan-must-not-change-result-type-error
[4]: https://github.com/pgjdbc/pgjdbc/pull/451
[5]: https://github.com/pgbouncer/pgbouncer/pull/845#discussion_r1305295551
[6]: https://github.com/jackc/pgx/issues/927
[7]: 
https://elixirforum.com/t/postgrex-errors-with-cached-plan-must-not-change-result-type-during-migration/51235/2
[8]: https://github.com/rails/rails/issues/12330



The following assignment of formats is not correct:

 /* Do nothing if portal won't return tuples */
 if (portal->tupDesc == NULL)
+    {
+        /*
+         * For SELECT like queries we delay filling in the tupDesc until after
+         * PortalRunUtility, because we don't know what rows an EXECUTE
+         * command will return. Because we delay setting tupDesc, we also need
+         * to delay setting formats. We do this in a pretty hacky way, by
+         * temporarily setting the portal formats to the passed in formats.
+         * Then once we fill in tupDesc, we call PortalSetResultFormat again
+         * with portal->formats to fill in the final formats value.
+         */
+        if (portal->strategy == PORTAL_UTIL_SELECT)
+            portal->formats = formats;
      return;


because it is created in another memory context:

postgres.c:
    /* Done storing stuff in portal's context */
    MemoryContextSwitchTo(oldContext);
    ...
 /* Get the result format codes */
    numRFormats = pq_getmsgint(input_message, 2);
    if (numRFormats > 0)
    {
        rformats = palloc_array(int16, numRFormats);
        for (int i = 0; i < numRFormats; i++)
            rformats[i] = pq_getmsgint(input_message, 2);
    }



It has to be copied as below:

    portal->formats = (int16 *)
    MemoryContextAlloc(portal->portalContext,
                       natts * sizeof(int16));
    memcpy(portal->formats, formats, natts * sizeof(int16));


or alternatively MemoryContextSwitchTo(oldContext) should be moved after
the initialization of rformats.





Is pg_regress --use-existing used by anyone or is it broken?

2023-08-28 Thread Daniel Gustafsson
When looking at pg_regress I noticed that the --use-existing support didn't
seem to work.  ISTM that the removal/creation of test databases and roles
doesn't run since the conditional is reversed.  There is also no support for
using a non-default socket directory with PG_REGRESS_SOCK_DIR.  The attached
hack fixes these and allows the tests to execute for me, but even with that the
test_setup suite fails due to the tablespace not being dropped and recreated
like databases and roles.

Is it me who is too thick to get it working, or is it indeed broken?  If it's
the latter, it's been like that for a long time which seems to indicate that it
isn't really used and should probably be removed rather than fixed?

Does anyone here use it?

--
Daniel Gustafsson



pg_regress_use_existing.diff
Description: Binary data


Re: Query execution in Perl TAP tests needs work

2023-08-28 Thread Andrew Dunstan


On 2023-08-28 Mo 01:29, Thomas Munro wrote:

Hi,

Every time we run a SQL query, we fork a new psql process and a new
cold backend process.  It's not free on Unix, and quite a lot worse on
Windows, at around 70ms per query.  Take amcheck/001_verify_heapam for
example.  It runs 272 subtests firing off a stream of queries, and
completes in ~51s on Windows (!), and ~6-9s on the various Unixen, on
CI.

Here are some timestamps I captured from CI by instrumenting various
Perl and C bits:

0.000s: IPC::Run starts
0.023s:   postmaster socket sees connection
0.025s:   postmaster has created child process
0.033s: backend starts running main()
0.039s: backend has reattached to shared memory
0.043s: backend connection authorized message
0.046s: backend has executed and logged query
0.070s: IPC::Run returns

I expected process creation to be slow on that OS, but it seems like
something happening at the end is even slower.  CI shows Windows
consuming 4 CPUs at 100% for a full 10 minutes to run a test suite
that finishes in 2-3 minutes everywhere else with the same number of
CPUs.  Could there be an event handling snafu in IPC::Run or elsewhere
nearby?  It seems like there must be either a busy loop or a busted
sleep/wakeup... somewhere?  But even if there's a weird bug here
waiting to be discovered and fixed, I guess it'll always be too slow
at ~10ms per process spawned, with two processes to spawn, and it's
bad enough on Unix.

As an experiment, I hacked up a not-good-enough-to-share experiment
where $node->safe_psql() would automatically cache a BackgroundPsql
object and reuse it, and the times for that test dropped ~51 -> ~9s on
Windows, and ~7 -> ~2s on the Unixen.  But even that seems non-ideal
(well it's certainly non-ideal the way I hacked it up anyway...).  I
suppose there are quite a few ways we could do better:

1.  Don't fork anything at all: open (and cache) a connection directly
from Perl.
1a.  Write xsub or ffi bindings for libpq.  Or vendor (parts) of the
popular Perl xsub library?
1b.  Write our own mini pure-perl pq client module.  Or vendor (parts)
of some existing one.
2.  Use long-lived psql sessions.
2a.  Something building on BackgroundPsql.
2b.  Maybe give psql or a new libpq-wrapper a new low level stdio/pipe
protocol that is more fun to talk to from Perl/machines?

In some other languages one can do FFI pretty easily so we could use
the in-tree libpq without extra dependencies:


import ctypes
libpq = ctypes.cdll.LoadLibrary("/path/to/libpq.so")
libpq.PQlibVersion()

17

... but it seems you can't do either static C bindings or runtime FFI
from Perl without adding a new library/package dependency.  I'm not
much of a Perl hacker so I don't have any particular feeling.  What
would be best?

This message brought to you by the Lorax.


Thanks for raising this. Windows test times have bothered me for ages.

The standard perl DBI library has a connect_cached method. Of course we 
don't want to be dependent on it, especially if we might have changed 
libpq in what we're testing, and it would place a substantial new burden 
on testers like buildfarm owners.


I like the idea of using a pure perl pq implementation, not least 
because it could expand our ability to test things at the protocol 
level. Not sure how much work it would be. I'm willing to help if we 
want to go that way.


Yes, you need an external library to use FFI in perl, but there's one
that's pretty tiny. There is also FFI::Platypus, but it involves building
a library. OTOH, that's the one that's available standard on my Fedora and
Ubuntu systems. I haven't tried using either. Maybe we could use some logic
that would use the FFI interface if it's available, and fall back on
current usage.



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: persist logical slots to disk during shutdown checkpoint

2023-08-28 Thread Amit Kapila
On Thu, Aug 24, 2023 at 11:44 AM vignesh C  wrote:
>

The patch looks mostly good to me. I have made minor changes, which are
as follows: (a) removed the autovacuum = off and
wal_receiver_status_interval = 0 settings, as those don't seem to be
required for the test; (b) changed a few comments and variable names
in the code and test.

Shall we change the test file name from always_persist to
save_logical_slots_shutdown and move it to recovery/t/, as this test is
about verification after a restart of the server?

-- 
With Regards,
Amit Kapila.


v5-0001-Persist-to-disk-logical-slots-during-a-shutdown-c.patch
Description: Binary data


Re: Make error messages about WAL segment size more consistent

2023-08-28 Thread Peter Eisentraut

On 22.08.23 16:26, Aleksander Alekseev wrote:

Hi Peter,


This started out as a small patch to make pg_controldata use the logging
API instead of printf statements, and then it became a larger patch to
adjust error and warning messages about invalid WAL segment sizes
(IsValidWalSegSize()) across the board.


Thanks for working on this.


I went through and made the
primary messages more compact and made the detail messages uniform.  In
initdb.c and pg_resetwal.c, I use the newish option_parse_int() to
simplify some of the option parsing.  For the backend GUC
wal_segment_size, I added a GUC check hook to do the verification
instead of coding it in bootstrap.c.  This might be overkill, but that
way the check is in the right place and it becomes more self-documenting.


I reviewed the code and tested it on Linux and MacOS with Autotools
and Meson. The patch LGTM.


Thanks, committed.





Re: generic plans and "initial" pruning

2023-08-28 Thread Robert Haas
On Fri, Aug 11, 2023 at 9:50 AM Amit Langote  wrote:
> After removing the unnecessary cleanup code from most node types’ ExecEnd* 
> functions, one thing I’m tempted to do is remove the functions that do 
> nothing else but recurse to close the outerPlan, innerPlan child nodes.  We 
> could instead have ExecEndNode() itself recurse to close outerPlan, innerPlan 
> child nodes at the top, which preserves the close-child-before-self behavior 
> for Gather* nodes, and close node type specific cleanup functions for nodes 
> that do have any local cleanup to do.  Perhaps, we could even use 
> planstate_tree_walker() called at the top instead of the usual bottom so that 
> nodes with a list of child subplans like Append also don’t need to have their 
> own ExecEnd* functions.
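
(As a rough illustration of the quoted idea, and not actual patch code, an
ExecEndNode() that recurses into the generic child links first might look
like the sketch below; only node types with node-specific cleanup would keep
an ExecEnd* case, and the identifiers used here already exist in the
executor.)

void
ExecEndNode(PlanState *node)
{
    if (node == NULL)
        return;

    /* close the generic child nodes before the node itself */
    if (outerPlanState(node))
        ExecEndNode(outerPlanState(node));
    if (innerPlanState(node))
        ExecEndNode(innerPlanState(node));

    switch (nodeTag(node))
    {
        case T_AggState:
            ExecEndAgg((AggState *) node);
            break;

            /* ... only node types with real cleanup keep a case here ... */

        default:
            /* nothing to do beyond closing the children */
            break;
    }
}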

I think 0001 needs to be split up. Like, this is code cleanup:

-   /*
-* Free the exprcontext
-*/
-   ExecFreeExprContext(&node->ss.ps);

This is providing for NULL pointers where we don't currently:

-   list_free_deep(aggstate->hash_batches);
+   if (aggstate->hash_batches)
+   list_free_deep(aggstate->hash_batches);

And this is the early return mechanism per se:

+   if (!ExecPlanStillValid(estate))
+   return aggstate;

I think at least those 3 kinds of changes deserve to be in separate
patches with separate commit messages explaining the rationale behind
each e.g. "Remove unnecessary cleanup calls in ExecEnd* functions.
These calls are no longer required, because . Removing them
saves a few CPU cycles and simplifies planned refactoring, so do
that."

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Prevent psql \watch from running queries that return no rows

2023-08-28 Thread Daniel Gustafsson
> On 22 Aug 2023, at 23:23, Greg Sabino Mullane  wrote:
> 
> Thank you for the feedback, everyone. Attached is version 4 of the patch, 
> featuring a few tests and minor rewordings.

I had another look, and did some playing around with this and I think this
version is ready to go in, so I will try to get that sorted shortly.

--
Daniel Gustafsson





Re: [PATCH v1] PQputCopyEnd never returns 0, fix the inaccurate comment

2023-08-28 Thread Junwang Zhao
On Mon, Aug 28, 2023 at 7:48 PM Aleksander Alekseev
 wrote:
>
> Hi Junwang,
>
> > PQputCopyEnd returns 1 or -1, never 0, I guess the comment was
> > copy/paste from PQputCopyData's comment, this should be fixed.
>
> The patch LGTM but I wonder whether we should also change all the
> existing calls of PQputCopyEnd() from:
>
> ```
> PQputCopyEnd(...) <= 0
> ```
>
> ... to:
>
> ```
> PQputCopyEnd(...) < 0
> ```
>
> Given the circumstances, checking for equality to zero seems to be at
> least strange.
>

Yeah, that makes sense to me, or maybe just `PQputCopyEnd(...) == -1`.
Let's wait for some other opinions.
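
To make that concrete, the caller-side change under discussion would look
roughly like this (a sketch only; report_copy_error() is a hypothetical
stand-in for whatever error handling a real call site uses):

    /* current style: checks for 0, which PQputCopyEnd() can never return */
    if (PQputCopyEnd(conn, NULL) <= 0)
        report_copy_error(conn);

    /* proposed: test only the failure value that can actually occur */
    if (PQputCopyEnd(conn, NULL) < 0)
        report_copy_error(conn);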

>
> On top of that, none of the PQputCopyData() callers cares whether the
> function returns 0 or -1, both are treated the same way. I suspect the
> function does some extra work no one asked to do and no one cares
> about. Perhaps this function should be refactored too for consistency.
>
> Thoughts?
>
> --
> Best regards,
> Aleksander Alekseev



-- 
Regards
Junwang Zhao




Commitfest manager for September

2023-08-28 Thread Peter Eisentraut

I would like to be the commitfest manager for CF 2023-09.




Re: RFC: Logging plan of the running query

2023-08-28 Thread James Coleman
On Mon, Aug 28, 2023 at 3:01 AM torikoshia  wrote:
>
> On 2023-08-26 21:03, James Coleman wrote:
> > On Fri, Aug 25, 2023 at 7:43 AM James Coleman  wrote:
> >>
> >> On Thu, Aug 17, 2023 at 10:02 AM torikoshia
> >>  wrote:
> >> >
> >> > On 2023-06-16 01:34, James Coleman wrote:
> >> > > Attached is v28
> >> > > which sets ProcessLogQueryPlanInterruptActive to false in errfinish
> >> > > when necessary. Once built with those two patches I'm simply running
> >> > > `make check`.
> >> >
> >> > With v28-0001 and v28-0002 patch, I confirmed backend processes consume
> >> > huge
> >> > amount of memory and under some environments they were terminated by OOM
> >> > killer.
> >> >
> >> > This was because memory was allocated from existing memory contexts and
> >> > they
> >> > were not freed after ProcessLogQueryPlanInterrupt().
> >> > Updated the patch to use dedicated memory context for
> >> > ProcessLogQueryPlanInterrupt().
> >> >
> >> > Applying attached patch and v28-0002 patch, `make check` successfully
> >> > completed after 20min and 50GB of logs on my environment.
> >> >
> >> > >>> On 2023-06-15 01:48, James Coleman wrote:
> >> > >>> > The tests have been running since last night, but have been 
> >> > >>> > apparently
> >> > >>> > hung now for many hours.
> >> >
> >> > I don't know if this has anything to do with the hung you faced, but I
> >> > thought
> >> > it might be possible that the large amount of memory usage resulted in
> >> > swapping, which caused a significant delay in processing.
> >>
> >> Ah, yes, I think that could be a possible explanation. I was delaying
> >> on this thread because I wasn't comfortable with having caused an
> >> issue once (even if I couldn't easily reproduce) without at least some
> >> theory as to the cause (and a fix).
> >>
> >> > If possible, I would be very grateful if you could try to reproduce this
> >> > with
> >> > the v29 patch.
> >>
> >> I'll kick off some testing.
> >>
> >
> > I don't have time to investigate what's happening here, but 24 hours
> > later the first "make check" is still running, and at first glance it
> > seems to have the same behavior I'd seen that first time. The test
> > output is to this point:
> >
> > # parallel group (5 tests):  index_including create_view
> > index_including_gist create_index create_index_spgist
> > ok 66+ create_index26365 ms
> > ok 67+ create_index_spgist 27675 ms
> > ok 68+ create_view  1235 ms
> > ok 69+ index_including  1102 ms
> > ok 70+ index_including_gist 1633 ms
> > # parallel group (16 tests):  create_aggregate create_cast errors
> > roleattributes drop_if_exists hash_func typed_table create_am
> > infinite_recurse
> >
> > and it hasn't progressed past that point since at least ~16 hours ago
> > (the first several hours of the run I wasn't monitoring it).
> >
> > I haven't connected up gdb yet, and won't be able to until maybe
> > tomorrow, but here's the ps output for postgres processes that are
> > running:
> >
> > admin3213249  0.0  0.0 196824 20552 ?Ss   Aug25   0:00
> > /home/admin/postgresql-test/bin/postgres -D
> > /home/admin/postgresql-test-data
> > admin3213250  0.0  0.0 196964  7284 ?Ss   Aug25   0:00
> > postgres: checkpointer
> > admin3213251  0.0  0.0 196956  4276 ?Ss   Aug25   0:00
> > postgres: background writer
> > admin3213253  0.0  0.0 196956  8600 ?Ss   Aug25   0:00
> > postgres: walwriter
> > admin3213254  0.0  0.0 198424  7316 ?Ss   Aug25   0:00
> > postgres: autovacuum launcher
> > admin3213255  0.0  0.0 198412  5488 ?Ss   Aug25   0:00
> > postgres: logical replication launcher
> > admin3237967  0.0  0.0   2484   516 pts/4S+   Aug25   0:00
> > /bin/sh -c echo "# +++ regress check in src/test/regress +++" &&
> > PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/bin:/home/admin/postgres/src/test/regress:$PATH"
> > LD_LIBRARY_PATH="/home/admin/postgres/tmp_install/home/admin/postgresql-test/lib"
> > INITDB_TEMPLATE='/home/admin/postgres'/tmp_install/initdb-template
> > ../../../src/test/regress/pg_regress --temp-instance=./tmp_check
> > --inputdir=. --bindir= --dlpath=. --max-concurrent-tests=20
> > --schedule=./parallel_schedule
> > admin3237973  0.0  0.0 197880 20688 pts/4S+   Aug25   0:00
> > postgres -D /home/admin/postgres/src/test/regress/tmp_check/data -F -c
> > listen_addresses= -k /tmp/pg_regress-7mmGUa
> > admin3237976  0.0  0.1 198332 44608 ?Ss   Aug25   0:00
> > postgres: checkpointer
> > admin3237977  0.0  0.0 198032  4640 ?Ss   Aug25   0:00
> > postgres: background writer
> > admin3237979  0.0  0.0 197880  8580 ?Ss   Aug25   0:00
> > postgres: walwriter
> > admin3237980  0.0  0.0 199484  7608 ?Ss   Aug25   0:00
> > postgres: autovacuum launcher
> > admin3237981

Re: Disabling Heap-Only Tuples

2023-08-28 Thread Matthias van de Meent
On Wed, 19 Jul 2023 at 14:58, Laurenz Albe  wrote:
>
> On Thu, 2023-07-06 at 22:18 +0200, Matthias van de Meent wrote:
> > On Wed, 5 Jul 2023 at 19:55, Thom Brown  wrote:
> > >
> > > On Wed, 5 Jul 2023 at 18:05, Matthias van de Meent
> > >  wrote:
> > > > So what were you thinking of? A session GUC? A table option?
> > >
> > > Both.
> >
> > Here's a small patch implementing a new table option max_local_update
> > (name very much bikesheddable). Value is -1 (default, disabled) or the
> > size of the table in MiB that you still want to allow to update on the
> > same page. I didn't yet go for a GUC as I think that has too little
> > control on the impact on the system.
> >
> > I decided that max_local_update would be in MB because there is no
> > reloption value that can contain MaxBlockNumber and -1/disabled; and 1
> > MiB seems like enough granularity for essentially all use cases.
> >
> > The added regression tests show how this feature works, that the new
> > feature works, and validate that lock levels are acceptable
> > (ShareUpdateExclusiveLock, same as for updating fillfactor).
>
> I have looked at your patch, and I must say that I like it.  Having
> a size limit is better than my original idea of just "on" or "off".
> Essentially, it is "try to shrink the table if it grows above a limit".
>
> The patch builds fine and passes all regression tests.
>
> Documentation is missing.

Yes, the first patch was a working proof-of-concept. Here's a new one
with documentation.

> I agree that the name "max_local_update" could be improved.
> Perhaps "avoid_hot_above_size_mb".
>
> --- a/src/include/utils/rel.h
> +++ b/src/include/utils/rel.h
> @@ -342,6 +342,7 @@ typedef struct StdRdOptions
> int parallel_workers;   /* max number of parallel workers */
> StdRdOptIndexCleanup vacuum_index_cleanup;  /* controls index vacuuming */
> boolvacuum_truncate;/* enables vacuum to truncate a relation 
> */
> +   int max_local_update;   /* Updates to pages after this block must 
> go through the VM */
>  } StdRdOptions;
>
>  #define HEAP_MIN_FILLFACTOR10
>
> In the comment, it should be FSM, not VM, right?

Good catch.

In this new patch, I've updated a few comments to get mostly within
line length limits; the name of the storage parameter is now
"local_update_limit", as per discussion on naming.
I've also added local_update_limit to psql's autocomplete file, and
added documentation on how the parameter behaves - including warnings
- in create_table.sgml.

Kind regards,

Matthias van de Meent


v1-0001-Add-heap-reloption-local_update_limit.patch
Description: Binary data


Re: A failure in 031_recovery_conflict.pl on Debian/s390x

2023-08-28 Thread Christoph Berg
Re: Andres Freund
> > Thanks.  I realised that it's easy enough to test that theory about
> > cleanup locks by hacking ConditionalLockBufferForCleanup() to return
> > false randomly.  Then the test occasionally fails as described.  Seems
> > like we'll need to fix that test, but it's not evidence of a server
> > bug, and my signal handler refactoring patch is in the clear.  Thanks
> > for testing it!
> 
> WRT fixing the test: I think just using VACUUM FREEZE ought to do the job?
> After changing all the VACUUMs to VACUUM FREEZEs, 031_recovery_conflict.pl
> passes even after I make ConditionalLockBufferForCleanup() fail 100%.

I have now applied the last two patches to postgresql-17 to see if the
build is more stable. (So far I had only tried it in manual tests.)

Fwiw this is also causing pain on PostgreSQL 16:

https://pgdgbuild.dus.dg-i.net/view/Snapshot/job/postgresql-16-binaries-snapshot/1011/architecture=s390x,distribution=sid/consoleText

Most of the failing builds in
https://pgdgbuild.dus.dg-i.net/view/Snapshot/job/postgresql-16-binaries-snapshot/
are on s390x and likely due to this problem.

This should be fixed before the 16 release.

Christoph




Re: Eager page freeze criteria clarification

2023-08-28 Thread Melanie Plageman
On Fri, Jul 28, 2023 at 3:27 PM Melanie Plageman
 wrote:
> On Fri, Jul 28, 2023 at 3:00 PM Peter Geoghegan  wrote:
> > > Is this test meant to guard against unnecessary freezing or to avoid
> > > freezing when the cost is too high? That is, are we trying to
> > > determine how likely it is that the page has been recently modified
> > > and avoid eager freezing when it would be pointless (because the page
> > > will soon be modified again)?
> >
> > Sort of. This cost of freezing over time is weirdly nonlinear, so it's
> > hard to give a simple answer.
> >
> > The justification for the FPI trigger optimization is that FPIs are
> > overwhelmingly the cost that really matters when it comes to freezing
> > (and vacuuming in general) -- so we might as well make the best out of
> > a bad situation when pruning happens to get an FPI. There can easily
> > be a 10x or more cost difference (measured in total WAL volume)
> > between freezing without an FPI and freezing with an FPI.
> ...
> > In 16 VACUUM just "makes the best
> > out of a bad situation" when an FPI was already required during
> > pruning. We have already "paid for the privilege" of writing some WAL
> > for the page at that point, so it's reasonable to not squander a
> > window of opportunity to avoid future FPIs in future VACUUM
> > operations, by freezing early.
> >
> > We're "taking a chance" on being able to get freezing out of the way
> > early when an FPI triggers freezing. It's not guaranteed to work out
> > in each individual case, of course, but even if we assume it's fairly
> > unlikely to work out (which is very pessimistic) it's still very
> > likely a good deal.
> >
> > This strategy (the 16 strategy of freezing eagerly because we already
> > got an FPI) seems safer than a strategy involving freezing eagerly
> > because we won't get an FPI as a result. If for no other reason than
> > this: with the approach in 16 we already know for sure that we'll have
> > written an FPI anyway. It's hard to imagine somebody being okay with
> > the FPIs, but not being okay with the other extra WAL.
>
> I see. I don't have an opinion on the "best of a bad situation"
> argument. Though, I think it is worth amending the comment in the code
> to include this explanation.
>
> But, ISTM that there should also be some independent heuristic to
> determine whether or not it makes sense to freeze the page. That could
> be related to whether or not it will be cheap to do so (in which case
> we can check if we will have to emit an FPI as part of the freeze
> record) or it could be related to whether or not the freezing is
> likely to be pointless (we are likely to update the page again soon).
>
> It sounds like it was discussed before, but I'd be interested in
> revisiting it and happy to test out various ideas.

Hi, in service of "testing various ideas", I've benchmarked the
heuristics proposed in this thread, as well as a few others that Andres and
I considered, for determining whether or not to opportunistically freeze
a page during vacuum. Note that this heuristic would be in addition to
the existing criterion that we only opportunistically freeze pages that
can be subsequently set all frozen in the visibility map.

I believe that there are two goals that should dictate whether or not we
should perform opportunistic freezing:

  1. Is it cheap? For example, if the buffer is already dirty, then no
  write amplification occurs, since it must be written out anyway.
  Freezing is also less expensive if we can do it without emitting an
  FPI.

  2. Will it be effective; that is, will it stay frozen?
  Opportunistically freezing a page that will immediately be modified is
  a waste.

The current heuristic on master meets neither of these goals: it freezes
a page if pruning emitted an FPI for it. This doesn't evaluate whether
or not freezing itself would be cheap, but rather attempts to hide
freezing behind an expensive operation. Furthermore, it often fails to
freeze cold data and may indiscriminately freeze hot data.

For the second goal, I've relied on past data to predict future
behavior, so I tried several criteria to estimate the likelihood that a
page will not be imminently modified. What was most effective was
Andres' suggestion of comparing the page LSN to the insert LSN at the
end of the last vacuum of that table; this approximates whether the page
has been recently modified, which is a decent proxy for whether it'll be
modified in the future. To do this, we need to save that insert LSN
somewhere. In the attached WIP patch, I saved it in the table stats, for
now -- knowing that those are not crash-safe.
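
As a sketch of how those two goals could combine during vacuum's per-page
processing (illustrative only, not the WIP patch itself; buffer_is_dirty()
is a stand-in for however dirtiness is tested, and last_vacuum_insert_lsn is
assumed to be the insert LSN saved at the end of the previous vacuum):

static bool
should_freeze_opportunistically(Buffer buf, Page page,
                                XLogRecPtr last_vacuum_insert_lsn)
{
    /* Goal 1: cheap -- only piggyback on a buffer we must write anyway */
    if (!buffer_is_dirty(buf))
        return false;

    /*
     * Goal 2: effective -- skip pages modified since the last vacuum's
     * insert LSN, since recent modification suggests more modification soon.
     */
    if (PageGetLSN(page) >= last_vacuum_insert_lsn)
        return false;

    return true;
}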

Other discarded heuristic ideas included comparing the next transaction
ID at the end of the vacuum of a relation to the visibility cutoff xid
in the page -- but that wasn't effective for freezing data from bulk
loads.

The algorithms I evaluated all attempt to satisfy goal (1) by freezing
only if the buffer is already dirty and also by considering whether or
not 

Re: Wrong usage of pqMsg_Close message code?

2023-08-28 Thread Pavel Stehule
Hi

On Mon, 28 Aug 2023 at 14:00, Aleksander Alekseev <aleksan...@timescale.com> wrote:

> Hi Pavel,
>
> > There is message PqMsgClose, but this should be used from client side.
> Here should be used PqMsg_CommandComplete instead?
>
> It seems so. This change was introduced in f4b54e1ed98 [1]:
>
> ```
> --- a/src/backend/tcop/dest.c
> +++ b/src/backend/tcop/dest.c
> @@ -176,7 +176,7 @@ EndCommand(const QueryCompletion *qc, CommandDest
> dest, bool force_undecorated_o
>
> len = BuildQueryCompletionString(completionTag, qc,
>
>   force_undecorated_output);
> -   pq_putmessage('C', completionTag, len + 1);
> +   pq_putmessage(PqMsg_Close, completionTag, len + 1);
>
> case DestNone:
> case DestDebug
> ```
>
> It should have been PqMsg_CommandComplete.
>
> [1]:
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f4b54e1ed98


here is a patch - all tests passed

Regards

Pavel

>
>
> --
> Best regards,
> Aleksander Alekseev
>
diff --git a/src/backend/tcop/dest.c b/src/backend/tcop/dest.c
index 06d1872b9a..bd6085b7ed 100644
--- a/src/backend/tcop/dest.c
+++ b/src/backend/tcop/dest.c
@@ -176,7 +176,7 @@ EndCommand(const QueryCompletion *qc, CommandDest dest, bool force_undecorated_o
 
 			len = BuildQueryCompletionString(completionTag, qc,
 			 force_undecorated_output);
-			pq_putmessage(PqMsg_Close, completionTag, len + 1);
+			pq_putmessage(PqMsg_CommandComplete, completionTag, len + 1);
 
 		case DestNone:
 		case DestDebug:


Re: Wrong usage of pqMsg_Close message code?

2023-08-28 Thread Aleksander Alekseev
Hi,

> here is a patch - all tests passed

LGTM and added to the nearest CF just in case [1].

[1]: https://commitfest.postgresql.org/44/4523/

-- 
Best regards,
Aleksander Alekseev




Re: Commitfest manager for September

2023-08-28 Thread Aleksander Alekseev
Hi Peter,

> I would like to be the commitfest manager for CF 2023-09.

Many thanks for volunteering! If at some point you will require a bit
of help please let me know.

-- 
Best regards,
Aleksander Alekseev




Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-08-28 Thread Masahiko Sawada
On Mon, Aug 28, 2023 at 4:20 PM John Naylor
 wrote:
>
> On Sun, Aug 27, 2023 at 7:53 PM Masahiko Sawada  wrote:
> >
> > I've updated the regression tests for tidstore so that it uses SQL
> > functions to add blocks/offsets and dump its contents. The new test
> > covers the same test coverages but it's executed using SQL functions
> > instead of executing all tests in one SQL function.
>
> This is much nicer and more flexible, thanks! A few questions/comments:
>
> tidstore_dump_tids() returns a string -- is it difficult to turn this into a 
> SRF, or is it just a bit more work?

It's not difficult. I've changed it in v42 patch.

>
> The lookup test seems fine for now. The output would look nicer with an 
> "order by tid".

Agreed.

>
> I think we could have the SQL function tidstore_create() take a boolean for 
> shared memory. That would allow ad-hoc testing without a recompile, if I'm 
> not mistaken.

Agreed.

>
> +SELECT tidstore_set_block_offsets(blk, array_agg(offsets.off)::int2[])
> +  FROM blocks, offsets
> +  GROUP BY blk;
> + tidstore_set_block_offsets
> +
> +
> +
> +
> +
> +
> +(5 rows)
>
> Calling a void function multiple times leads to vertical whitespace, which 
> looks a bit strange and may look better with some output, even if irrelevant:
>
> -SELECT tidstore_set_block_offsets(blk, array_agg(offsets.off)::int2[])
> +SELECT row_number() over(order by blk), tidstore_set_block_offsets(blk, 
> array_agg(offsets.off)::int2[])
>
>  row_number | tidstore_set_block_offsets
> +
>   1 |
>   2 |
>   3 |
>   4 |
>   5 |
> (5 rows)

Yes, it looks better.

I've attached v42 patch set. I improved tidstore regression test codes
in addition of imcorporating the above comments.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


v42-ART.tar.gz
Description: GNU Zip compressed data


Re: Fix error handling in be_tls_open_server()

2023-08-28 Thread Jacob Champion
On Thu, Aug 24, 2023 at 6:25 PM Michael Paquier  wrote:
> LD_PRELOAD is the only thing I can think about, but that's very fancy.
> Even with that, having a certificate with a NULL peer_cn could prove
> to be useful in the SSL suite to stress more patterns around it?

+1. Last we tried it, OpenSSL didn't want to create a certificate with
an embedded null, but maybe things have changed?

--Jacob




Re: Disabling Heap-Only Tuples

2023-08-28 Thread Robert Haas
On Mon, Aug 28, 2023 at 10:52 AM Matthias van de Meent
 wrote:
> In this new patch, I've updated a few comments to get mostly within
> line length limits; the name of the storage parameter is now
> "local_update_limit", as per discussion on naming.
> I've also added local_update_limit to psql's autocomplete file, and
> added documentation on how the parameter behaves - including warnings
> - in create_table.sgml.

I feel like this is the sort of setting that experts will sometimes be
able to use to improve the situation, and non-experts will have great
difficulty using. It relies on the user to know what size limit will
work out well, which probably involves knowing how much real data is
in the table, and how that's going to change over time, and probably
also some things about how PostgreSQL does space management
internally. I don't know that I'd be able to guide a non-expert user
in how to make effective use of this as a tool.

I don't know exactly what to propose, but I would definitely like it
if we could come up with something with which a casual user would be
less likely to shoot themselves in the foot and more likely to derive
a benefit.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Commitfest manager for September

2023-08-28 Thread Melanie Plageman
On Mon, Aug 28, 2023 at 11:36 AM Aleksander Alekseev
 wrote:
>
> Hi Peter,
>
> > I would like to be the commitfest manager for CF 2023-09.
>
> Many thanks for volunteering! If at some point you will require a bit
> of help please let me know.

I too had planned to volunteer to help. I volunteer to do a
triage/summary of patch statuses, as has been done occasionally in the
past [1].
Have folks found this helpful in the past?

[1] 
https://www.postgresql.org/message-id/CAM-w4HOFOUNuOZSpsCfH_ir7dqJNdA1pxkxfaVEvLk5sn6HhsQ%40mail.gmail.com




Re: cataloguing NOT NULL constraints

2023-08-28 Thread Alvaro Herrera
On 2023-Aug-28, Peter Eisentraut wrote:

> It looks like we forgot about domain constraints?  For example,
> 
> create domain testdomain as int not null;
> 
> should create a row in pg_constraint?

Well, at some point I purposefully left them out; they were sufficiently
different from the ones in tables that doing both things at the same
time was not saving any effort.  I guess we could try to bake them in
now.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Doing what he did amounts to sticking his fingers under the hood of the
implementation; if he gets his fingers burnt, it's his problem."  (Tom Lane)




Re: Disabling Heap-Only Tuples

2023-08-28 Thread Matthias van de Meent
On Mon, 28 Aug 2023 at 17:14, Robert Haas  wrote:
>
> On Mon, Aug 28, 2023 at 10:52 AM Matthias van de Meent
>  wrote:
> > In this new patch, I've updated a few comments to get mostly within
> > line length limits; the name of the storage parameter is now
> > "local_update_limit", as per discussion on naming.
> > I've also added local_update_limit to psql's autocomplete file, and
> > added documentation on how the parameter behaves - including warnings
> > - in create_table.sgml.
>
> I feel like this is the sort of setting that experts will sometimes be
> able to use to improve the situation, and non-experts will have great
> difficulty using. It relies on the user to know what size limit will
> work out well, which probably involves knowing how much real data is
> in the table, and how that's going to change over time, and probably
> also some things about how PostgreSQL does space management
> internally. I don't know that I'd be able to guide a non-expert user
> in how to make effective use of this as a tool.

Agreed on all points. But isn't that true for most tools for bloat
prevention and/or detection? E.g. fillfactor, autovacuum_*, ...

> I don't know exactly what to propose, but I would definitely like it
> if we could come up with something with which a casual user would be
> less likely to shoot themselves in the foot and more likely to derive
> a benefit.

I'd prefer that too, but for lack of other work in this area this seems
like it fills a niche that would otherwise require extremely expensive
locking over a long time for CLUSTER, superuser+pg_repack, or manual
scripts that update tuples until they're located on a different page
(begin; update tuple WHERE ctid > '(12,0)' returning ctid; ...;
commit;). I agree this is very minimal and can definitely be used as a
footgun, but with the description that it can be a footgun I don't
think it's (much) worse than the current situation - a user should
only reach for this once they've realized they actually have an issue.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

2023-08-28 Thread Jeff Janes
On Thu, Aug 24, 2023 at 10:05 AM Heikki Linnakangas  wrote:

> On 24/08/2023 15:48, Thomas Munro wrote:
> > LGTM.  I vaguely recall thinking that it might be better to keep
> > EXEC_BACKEND and !EXEC_BACKEND working the same which might be why I
> > didn't try this one, but it looks fine with the comment to explain, as
> > you have it.  (It's a shame we can't use O_CLOFORK.)
>
> Yeah, O_CLOFORK would be nice..
>
> Committed, thanks!
>
>
Since this commit, I'm getting a lot (63 per restart) of messages:

 LOG:  could not close client or listen socket: Bad file descriptor

All I have to do to get the message is turn logging_collector = on and
restart.

The close failure condition existed before the commit; it just wasn't
logged before.  So, did the extra logging added here just uncover a
pre-existing bug?

The LOG message is sent to the terminal, not to the log file.

Cheers,

Jeff


Re: Disabling Heap-Only Tuples

2023-08-28 Thread Robert Haas
On Mon, Aug 28, 2023 at 11:50 AM Matthias van de Meent
 wrote:
> Agreed on all points. But isn't that true for most most tools on bloat
> prevention and/or detection? E.g. fillfactor, autovacuum_*, ...

Not nearly to the same extent, IMHO. A lot of those parameters can be
left alone forever and you lose nothing. That's not so here.

> I'd prefer that too, but by lack of other work in this area this seems
> like it fills a niche that would otherwise require extremely expensive
> locking over a long time for CLUSTER, superuser+pg_repack, or manual
> scripts that update tuples until they're located on a different page
> (begin; update tuple WHERE ctid > '(12,0)' returning ctid; ...;
> commit;). I agree this is very minimal and can definitely be used as a
> footgun, but with the description that it can be a footgun I don't
> think it's (much) worse than the current situation - a user should
> only reach for this once they've realized they actually have an issue.

Well, I sort of expected that counter-argument, but I'm not sure that I buy it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Commitfest manager for September

2023-08-28 Thread Jesper Pedersen

Hi Melanie,

On 8/28/23 11:42, Melanie Plageman wrote:

I too had planned to volunteer to help. I volunteer to do a
triage/summary of patch statuses, as has been done occasionally in the
past [1].
Have folks found this helpful in the past?



Having a summary to begin with is very helpful for reviewers.


Best regards,

 Jesper






Re: Eager page freeze criteria clarification

2023-08-28 Thread Robert Haas
On Mon, Aug 28, 2023 at 10:00 AM Melanie Plageman
 wrote:
> For the second goal, I've relied on past data to predict future
> behavior, so I tried several criteria to estimate the likelihood that a
> page will not be imminently modified. What was most effective was
> Andres' suggestion of comparing the page LSN to the insert LSN at the
> end of the last vacuum of that table; this approximates whether the page
> has been recently modified, which is a decent proxy for whether it'll be
> modified in the future. To do this, we need to save that insert LSN
> somewhere. In the attached WIP patch, I saved it in the table stats, for
> now -- knowing that those are not crash-safe.

I wonder what the real plan here is for where to store this. It's not
obvious that we need this to be crash-safe; it's after all only for
use by a heuristic, and there's no actual breakage if the heuristic
goes wrong. At the same time, it doesn't exactly feel like a
statistic.

Then there's the question of whether it's the right metric. My first
reaction is to think that it sounds pretty good. One thing I really
like about it is that if the table is being vacuumed frequently, then
we freeze less aggressively, and if the table is being vacuumed
infrequently, then we freeze more aggressively. That seems like a very
desirable property. It also seems broadly good that this metric
doesn't really care about reads. If there are a lot of reads on the
system, or no reads at all, it doesn't really change the chances that
a certain page is going to be written again soon, and since reads
don't change the insert LSN, here again it seems to do the right
thing. I'm a little less clear about whether it's good that it doesn't
really depend on wall-clock time. Certainly, that's desirable from the
point of view of not wanting to have to measure wall-clock time in
places where we otherwise wouldn't have to, which tends to end up
being expensive. However, if I were making all of my freezing
decisions manually, I might be more freeze-positive on a low-velocity
system where writes are more stretched out across time than on a
high-velocity system where we're blasting through the LSN space at a
higher rate. But maybe that's not a very important consideration, and
I don't know what we'd do about it anyway.

>Page Freezes/Page Frozen (less is better)
>
> |   | Master |     (1) |     (2) |     (3) |     (4) |     (5) |
> |---|--------|---------|---------|---------|---------|---------|
> | A |  28.50 |    3.89 |    1.08 |    1.15 |    1.10 |    1.10 |
> | B |   1.00 |    1.06 |    1.65 |    1.03 |    1.59 |    1.00 |
> | C |    N/A |    1.00 |    1.00 |    1.00 |    1.00 |    1.00 |
> | D |   2.00 | 5199.15 | 5276.85 | 4830.45 | 5234.55 | 2193.55 |
> | E |   7.90 |    3.21 |    2.73 |    2.70 |    2.69 |    2.43 |
> | F |    N/A |    1.00 |    1.00 |    1.00 |    1.00 |    1.00 |
> | G |    N/A |    1.00 |    1.00 |    1.00 |    1.00 |    1.00 |
> | H |    N/A |    1.00 |    1.00 |    1.00 |    1.00 |    1.00 |
> | I |    N/A |   42.00 |   42.00 |     N/A |   41.00 |     N/A |

Hmm. I would say that the interesting rows here are A, D, and I, with
rows C and E deserving honorable mention. In row A, master is bad. In
row D, your algorithms are all bad, really bad. I don't quite
understand how it can be that bad, actually. Row I looks bad for
algorithms 1, 2, and 4: they freeze pages because it looks cheap, but
the work doesn't really pay off.

>% Frozen at end of run
>
> |   | Master | (1) | (2) | (3) | (4) | (5) |
> |---|--------|-----|-----|-----|-----|-----|
> | A |      0 |   1 |  99 |   0 |  81 |   0 |
> | B |     71 |  96 |  99 |   3 |  98 |   2 |
> | C |      0 |   9 | 100 |   6 |  92 |   5 |
> | D |      0 |   1 |   1 |   1 |   1 |   1 |
> | E |      0 |  63 | 100 |  68 | 100 |  67 |
> | F |      0 |   5 |  14 |   6 |  14 |   5 |
> | G |      0 | 100 | 100 |  92 | 100 |  67 |
> | H |      0 |  11 | 100 |   9 |  86 |   5 |
> | I |      0 | 100 | 100 |   0 | 100 |   0 |

So all of the algorithms here, but especially 1, 2, and 4, freeze a
lot more often than master.

If I understand correctly, we'd like to see small numbers for B, D,
and I, and large numbers for the other workloads. None of the
algorithms seem to achieve that. (3) and (5) seem like they always
behave as well or better than master, but they produce small numbers
for A, C, F, and H. (1), (2), and (4) regress B and I relative to
master but do better than (3) and (5) on A, C, and the latter two also
on E.

B is such an important benchmarking workload that I'd be loathe to
regress it, so if I had to pick on the basis of this data, my vote
would be (3) or (5), provided whatever is happening with (D) in the
previous metric is not as bad as it looks. What's your reason for
preferring (4) and (5) over (2) and (3)? I'm not clear that these
numbers give us much of an idea whether 10% or 33% or something else
is better in general.

To be honest, having now spent more time l

Re: Support prepared statement invalidation when result types change

2023-08-28 Thread Jelte Fennema
On Mon, 28 Aug 2023 at 15:05, Konstantin Knizhnik  wrote:
> The following assignment of format is not corrects:
>
> It has to be copied as below:
>
>  portal->formats = (int16 *)
>  MemoryContextAlloc(portal->portalContext,
> natts * sizeof(int16));
>  memcpy(portal->formats, formats, natts * sizeof(int16));

I attached a new version of the patch where I now did this. But I also
moved the code around quite a bit, since all this tupDesc/format
delaying is only needed for exec_simple_query. The original changes
actually broke some prepared statements that were using protocol level
Bind messages.


v2-0001-Support-prepared-statement-invalidation-when-resu.patch
Description: Binary data


v2-0002-Completely-remove-fixed_result-from-CachedPlanSou.patch
Description: Binary data


Re: Eager page freeze criteria clarification

2023-08-28 Thread Peter Geoghegan
On Mon, Aug 28, 2023 at 7:00 AM Melanie Plageman
 wrote:
> I believe that there are two goals that should dictate whether or not we
> should perform opportunistic freezing:
>
>   1. Is it cheap? For example, if the buffer is already dirty, then no
>   write amplification occurs, since it must be written out anyway.
>   Freezing is also less expensive if we can do it without emitting an
>   FPI.
>
>   2. Will it be effective; that is, will it stay frozen?
>   Opportunistically freezing a page that will immediately be modified is
>   a waste.
>
> The current heuristic on master meets neither of these goals: it freezes
> a page if pruning emitted an FPI for it. This doesn't evaluate whether
> or not freezing itself would be cheap, but rather attempts to hide
> freezing behind an expensive operation.

The goal is to minimize the number of FPIs over time, in general.
That's hardly the same thing as hiding the cost.

> Furthermore, it often fails to
> freeze cold data and may indiscriminately freeze hot data.

You need to account for the cost of not freezing, too. Controlling the
overall freeze debt at the level of whole tables (and the level of the
whole system) is very important. In fact it's probably the single most
important thing. A good model might end up increasing the cost of
freezing on a simple quantitative basis, while making life much better
overall. Huge balloon payments for freezing are currently a huge
problem.

Performance stability might come with a cost in some cases. There
isn't necessarily anything wrong with that at all.

-- 
Peter Geoghegan




Re: Support prepared statement invalidation when result types change

2023-08-28 Thread Jelte Fennema
On Mon, 28 Aug 2023 at 11:27, jian he  wrote:
> With parameters, it also works, only a tiny issue with error reporting.
>
> prepstmt2 | PREPARE prepstmt2(bigint) AS SELECT * FROM pcachetest
> WHERE q1 = $1; | {bigint}| {bigint,bigint,bigint}
> ERROR:  column "q1" does not exist at character 61
> HINT:  Perhaps you meant to reference the column "pcachetest.x1".
> STATEMENT:  execute prepstmt2(1);
>
> I think "character 61" refer to "PREPARE prepstmt2(bigint) AS SELECT *
> FROM pcachetest WHERE q1 = $1;"
> so maybe the STATEMENT is slightly misleading.

Could you share the full set of commands that cause the reporting
issue? I don't think my changes should impact this reporting, so I'm
curious if this is a new issue, or an already existing one.




Re: Fix error handling in be_tls_open_server()

2023-08-28 Thread Sergey Shinderuk

On 28.08.2023 18:12, Jacob Champion wrote:

On Thu, Aug 24, 2023 at 6:25 PM Michael Paquier  wrote:

LD_PRELOAD is the only thing I can think about, but that's very fancy.
Even with that, having a certificate with a NULL peer_cn could prove
to be useful in the SSL suite to stress more patterns around it?


+1. Last we tried it, OpenSSL didn't want to create a certificate with
an embedded null, but maybe things have changed?



To embed a null byte into the Subject, I first generated a regular 
certificate request in the DER (binary) format, then manually inserted 
a null byte into the file and recomputed the checksum. Like this:

https://security.stackexchange.com/a/58845

I'll try to add a client certificate lacking a CN to the SSL test suite.

--
Sergey Shinderuk    https://postgrespro.com/





PostgreSQL 16 RC1 release announcement draft

2023-08-28 Thread Jonathan S. Katz

Hi,

Attached is the PostgreSQL 16 RC1 release announcement draft.

Currently there is only one item in it, as there was only one open item 
marked as closed. If there are any other fixes for the RC1 that were 
specific to v16 and should be included in the announcement, please let 
me know.


Please provide all feedback no later than August 31, 2023 @ 12:00 UTC 
(and preferably before that).


Thanks,

Jonathan
The PostgreSQL Global Development Group announces that the first release
candidate of PostgreSQL 16 is now available for download. As a release
candidate, PostgreSQL 16 RC 1 will be mostly identical to the initial release of
PostgreSQL 16, though some more fixes may be applied prior to the general
availability of PostgreSQL 16.

The planned date for the general availability of PostgreSQL 16 is
September 14, 2023. Please see the "Release Schedule" section for more details.

Upgrading to PostgreSQL 16 RC 1
---

To upgrade to PostgreSQL 16 RC 1 from earlier versions of PostgreSQL, you will
need to use a major version upgrade strategy, e.g. `pg_upgrade` or
`pg_dump` / `pg_restore`. For more information, please visit the documentation
section on 
[upgrading](https://www.postgresql.org/docs/16/static/upgrading.html):

[https://www.postgresql.org/docs/16/static/upgrading.html](https://www.postgresql.org/docs/16/static/upgrading.html)

Changes Since 16 Beta 3
---

Several bug fixes were applied for PostgreSQL 16 during the Beta 3 period. These
include:

* Fixed performance regression when running `COPY` concurrently on a single
table.

For a detailed list of fixes, please visit the
[open 
items](https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items#resolved_before_16rc1)
page.

Release Schedule


This is the first release candidate for PostgreSQL 16. Unless an issue is
discovered that warrants a delay or to produce an additional release candidate,
PostgreSQL 16 should be made generally available on September 14, 2023.

For further information please see the
[Beta Testing](https://www.postgresql.org/developer/beta/) page.

Links
-

* [Download](https://www.postgresql.org/download/)
* [Beta Testing Information](https://www.postgresql.org/developer/beta/)
* [PostgreSQL 16 Beta Release 
Notes](https://www.postgresql.org/docs/16/release-16.html)
* [PostgreSQL 16 Open 
Issues](https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items)
* [Feature 
Matrix](https://www.postgresql.org/about/featurematrix/#configuration-management)
* [Submit a Bug](https://www.postgresql.org/account/submitbug/)
* [Follow @postgresql on Twitter](https://twitter.com/postgresql)
* [Donate](https://www.postgresql.org/about/donate/)


OpenPGP_signature
Description: OpenPGP digital signature


Re: list of acknowledgments for PG16

2023-08-28 Thread Vik Fearing

On 8/22/23 16:24, Peter Eisentraut wrote:

On 22.08.23 15:29, Tom Lane wrote:

Alvaro Herrera  writes:

Yeah, I've been proposing this kind of thing for many years; the
problem, until not long ago, was that the tooling was unable to process
non-Latin1 characters in all the output formats that we use.  But
tooling has changed and the oldest platforms have disappeared, so maybe
it works now; do you want to inject some Chinese, Cyrillic, Japanese
names and give it a spin?  At least HTML and PDF need to work correctly.


I'm pretty sure the PDF toolchain still fails on non-Latin1 characters.
At least it does the way I have it installed; maybe adding some
non-default dependencies would help?


See here: 
https://www.postgresql.org/message-id/f58a0973-6e06-65de-8fb8-b3b93518b...@2ndquadrant.com


I applied that patch, and it works for Cyrillic text, but not for 
Japanese.  I am trying to figure out how to make it use a secondary 
font, but that might take me a while.

--
Vik Fearing





Re: Eager page freeze criteria clarification

2023-08-28 Thread Peter Geoghegan
On Mon, Aug 28, 2023 at 7:00 AM Melanie Plageman
 wrote:
> To do this, we need to save that insert LSN
> somewhere. In the attached WIP patch, I saved it in the table stats, for
> now -- knowing that those are not crash-safe.

What patch? You didn't actually attach one.

> Other discarded heuristic ideas included comparing the next transaction
> ID at the end of the vacuum of a relation to the visibility cutoff xid
> in the page -- but that wasn't effective for freezing data from bulk
> loads.

I've long emphasized the importance of designs that just try to avoid
disaster. With that in mind, I wonder: have you thought about
conditioning page freezing on whether or not there are already some
frozen tuples on the page? You could perhaps give some weight to
whether or not the page already has at least one or two preexisting
frozen tuples when deciding on whether to freeze it once again now.
You'd be more eager about freezing pages that have no frozen tuples
whatsoever, compared to what you'd do with an otherwise equivalent
page that has no unfrozen tuples.
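
(For what it's worth, the input signal for that kind of weighting could be
gathered with something like the helper below; it is purely illustrative,
but the page and tuple macros it uses all exist today.)

static int
count_frozen_tuples(Page page)
{
    OffsetNumber off;
    OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
    int          nfrozen = 0;

    for (off = FirstOffsetNumber; off <= maxoff; off = OffsetNumberNext(off))
    {
        ItemId          itemid = PageGetItemId(page, off);
        HeapTupleHeader htup;

        if (!ItemIdIsNormal(itemid))
            continue;           /* skip unused, dead and redirect items */

        htup = (HeapTupleHeader) PageGetItem(page, itemid);
        if (HeapTupleHeaderXminFrozen(htup))
            nfrozen++;
    }

    return nfrozen;
}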

Small mistakes are inevitable here. They have to be okay. What's not
okay is a small mistake that effectively becomes a big mistake because
it gets repeated across each VACUUM operation, again and again,
ceaselessly. You can probably be quite aggressive about freezing, to
good effect -- provided you can be sure that the downside when it
turns out to have been a bad idea is self-limiting, in whatever way.
Making more small mistakes might actually be a good thing --
especially if it can dramatically lower the chances of ever making any
really big mistakes.

VACUUM is not a passive observer of the system. It's just another
component in the system. So what VACUUM sees in the table depends in
no small part on what previous VACUUMs actually did. It follows that
VACUUM should be concerned about how what it does might either help or
hinder future VACUUMs. My preexisting frozen tuples suggestion is
really just an example of how that principle might be applied. Many
variations on the same general idea are possible.

There are already various extremely weird feedback loops where what
VACUUM did last time affects what it'll do this time. These are
vicious circles. So ISTM that there is a lot to be said for disrupting
those vicious circles, and even deliberately engineering heuristics
that have the potential to create virtuous circles for the right
workload.

-- 
Peter Geoghegan




Debian 12 gcc warning

2023-08-28 Thread Bruce Momjian
On Debian 12, gcc version 12.2.0 (Debian 12.2.0-14) generates a warning
on PG 13 to current, but only with -O1 optimization level, and not at
-O0/-O2/-O3:

clauses.c: In function ‘recheck_cast_function_args’:
clauses.c:4293:19: warning: ‘actual_arg_types’ may be used 
uninitialized [-Wmaybe-uninitialized]
 4293 | rettype = 
enforce_generic_type_consistency(actual_arg_types,
  |   
^~
 4294 | 
   declared_arg_types,
  | 
   ~~~
 4295 | 
   nargs,
  | 
   ~~
 4296 | 
   funcform->prorettype,
  | 
   ~
 4297 | 
   false);
  | 
   ~~
In file included from clauses.c:45:
../../../../src/include/parser/parse_coerce.h:82:17: note: by argument 
1 of type ‘const Oid *’ {aka ‘const unsigned int *’} to 
‘enforce_generic_type_consistency’ declared here
   82 | extern Oid  enforce_generic_type_consistency(const Oid 
*actual_arg_types,
  | ^~~~
clauses.c:4279:33: note: ‘actual_arg_types’ declared here
 4279 | Oid actual_arg_types[FUNC_MAX_ARGS];
  | ^~~~

The code is:

static void
recheck_cast_function_args(List *args, Oid result_type,
                           Oid *proargtypes, int pronargs,
                           HeapTuple func_tuple)
{
    Form_pg_proc funcform = (Form_pg_proc) GETSTRUCT(func_tuple);
    int         nargs;
    Oid         actual_arg_types[FUNC_MAX_ARGS];
    Oid         declared_arg_types[FUNC_MAX_ARGS];
    Oid         rettype;
    ListCell   *lc;

    if (list_length(args) > FUNC_MAX_ARGS)
        elog(ERROR, "too many function arguments");
    nargs = 0;
    foreach(lc, args)
    {
        actual_arg_types[nargs++] = exprType((Node *) lfirst(lc));
    }
    Assert(nargs == pronargs);
    memcpy(declared_arg_types, proargtypes, pronargs * sizeof(Oid));
--> rettype = enforce_generic_type_consistency(actual_arg_types,
                                               declared_arg_types,
                                               nargs,
                                               funcform->prorettype,
                                               false);
    /* let's just check we got the same answer as the parser did ... */

I don't see a clean way of avoiding the warning except by initializing
the array, which seems wasteful.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Eager page freeze criteria clarification

2023-08-28 Thread Robert Haas
On Mon, Aug 28, 2023 at 3:09 PM Peter Geoghegan  wrote:
> I've long emphasized the importance of designs that just try to avoid
> disaster. With that in mind, I wonder: have you thought about
> conditioning page freezing on whether or not there are already some
> frozen tuples on the page? You could perhaps give some weight to
> whether or not the page already has at least one or two preexisting
> frozen tuples when deciding on whether to freeze it once again now.
> You'd be more eager about freezing pages that have no frozen tuples
> whatsoever, compared to what you'd do with an otherwise equivalent
> page that already has some frozen tuples.

I'm sure this could be implemented, but it's unclear to me why you
would expect it to perform well. Freezing a page that has no frozen
tuples yet isn't cheaper than freezing one that does, so for this idea
to be a win, the presence of frozen tuples on the page would have to
be a signal that the page is likely to be modified again in the near
future. In general, I don't see any reason why we should expect that
to be the case. One could easily construct a workload where it is the
case -- for instance, set up one table T1 where 90% of the tuples are
repeatedly updated and the other 10% are never touched, and another
table T2 that is insert-only. Once frozen, the never-updated tuples in
T1 become sentinels that we can use to know that the table isn't
insert-only. But I don't think that's very interesting: you can
construct a test case like this for any proposed criterion, just by
structuring the test workload so that whatever criterion is being
tested is a perfect predictor of whether the page will be modified
soon.

What really matters here is finding a criterion that is likely to
perform well in general, on a test case not known to us beforehand.
This isn't an entirely feasible goal, because just as you can
construct a test case where any given criterion performs well, so you
can also construct one where any given criterion performs poorly. But
I think a rule that has a clear theory of operation must be preferable
to one that doesn't. The theory that Melanie and Andres are advancing
is that a page that has been modified recently (in insert-LSN-time) is
more likely to be modified again soon than one that has not i.e. the
near future will be like the recent past. I'm not sure what the theory
behind the rule you propose here might be; if you articulated it
somewhere in your email, I seem to have missed it.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: Eager page freeze criteria clarification

2023-08-28 Thread Melanie Plageman
On Mon, Aug 28, 2023 at 12:26 PM Robert Haas  wrote:
>
> On Mon, Aug 28, 2023 at 10:00 AM Melanie Plageman
>  wrote:
> Then there's the question of whether it's the right metric. My first
> reaction is to think that it sounds pretty good. One thing I really
> like about it is that if the table is being vacuumed frequently, then
> we freeze less aggressively, and if the table is being vacuumed
> infrequently, then we freeze more aggressively. That seems like a very
> desirable property. It also seems broadly good that this metric
> doesn't really care about reads. If there are a lot of reads on the
> system, or no reads at all, it doesn't really change the chances that
> a certain page is going to be written again soon, and since reads
> don't change the insert LSN, here again it seems to do the right
> thing. I'm a little less clear about whether it's good that it doesn't
> really depend on wall-clock time. Certainly, that's desirable from the
> point of view of not wanting to have to measure wall-clock time in
> places where we otherwise wouldn't have to, which tends to end up
> being expensive. However, if I were making all of my freezing
> decisions manually, I might be more freeze-positive on a low-velocity
> system where writes are more stretched out across time than on a
> high-velocity system where we're blasting through the LSN space at a
> higher rate. But maybe that's not a very important consideration, and
> I don't know what we'd do about it anyway.

By low-velocity, do you mean lower overall TPS? In that case, wouldn't you be
less likely to run into xid wraparound and thus need less aggressive
opportunistic freezing?

> >Page Freezes/Page Frozen (less is better)
> >
> > |   | Master | (1) | (2) | (3) | (4) | (5) |
> > |---++-+-+-+-+-|
> > | A |  28.50 |3.89 |1.08 |1.15 |1.10 |1.10 |
> > | B |   1.00 |1.06 |1.65 |1.03 |1.59 |1.00 |
> > | C |N/A |1.00 |1.00 |1.00 |1.00 |1.00 |
> > | D |   2.00 | 5199.15 | 5276.85 | 4830.45 | 5234.55 | 2193.55 |
> > | E |   7.90 |3.21 |2.73 |2.70 |2.69 |2.43 |
> > | F |N/A |1.00 |1.00 |1.00 |1.00 |1.00 |
> > | G |N/A |1.00 |1.00 |1.00 |1.00 |1.00 |
> > | H |N/A |1.00 |1.00 |1.00 |1.00 |1.00 |
> > | I |N/A |   42.00 |   42.00 | N/A |   41.00 | N/A |
>
> Hmm. I would say that the interesting rows here are A, D, and I, with
> rows C and E deserving honorable mention. In row A, master is bad.

So, this is where the caveat about absolute number of page freezes
matters. In algorithm A, master only did 57 page freezes (spread across
the various pgbench tables). At the end of the run, 2 pages were still
frozen.

> In row D, your algorithms are all bad, really bad. I don't quite
> understand how it can be that bad, actually.

So, I realize now that this test was poorly designed. I meant it to be a
worst case scenario, but I think one critical part was wrong. In this
example one client is going at full speed inserting a row and then
updating it. Then another rate-limited client is deleting old data
periodically to keep the table at a constant size. I meant to bulk load
the table with enough data that the delete job would have data to delete
from the start. With the default autovacuum settings, over the course of
45 minutes, I usually saw around 40 autovacuums of the table. Due to the
rate limiting, the first autovacuum of the table ends up freezing many
pages that are deleted soon after. Thus the total number of page freezes
is very high.

I will redo benchmarking of workload D and start the table with the
number of rows which the DELETE job seeks to maintain. My back of the
envelope math says that this will mean ratios closer to a dozen (and not
5000).

Also, I had doubled checkpoint timeout, which likely led master to
freeze so few pages (2 total freezes, neither of which were still frozen
at the end of the run). This is an example where master's overall low
number of page freezes makes it difficult to compare to the alternatives
using a ratio.

I didn't initially question the numbers because it seems like freezing
data and then deleting it right after would naturally be one of the
worst cases for opportunistic freezing, but certainly not this bad.

> Row I looks bad for algorithms 1, 2, and 4: they freeze pages because
> it looks cheap, but the work doesn't really pay off.

Yes, the work queue example looks like it is hard to handle.

> >% Frozen at end of run
> >
> > |   | Master | (1) | (2) | (3) |  (4) | (5) |
> > |---++-+-+-+--+-+
> > | A |  0 |   1 |  99 |   0 |   81 |   0 |
> > | B | 71 |  96 |  99 |   3 |   98 |   2 |
> > | C |  0 |   9 | 100 |   6 |   92 |   5 |
> > | D |  0 |   1 |   1 |   1 |1 |   1 |
> > | E |  0 |  63 | 100 |  68 |  100 |  67 |
> > | F

Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

2023-08-28 Thread Heikki Linnakangas

On 28/08/2023 18:55, Jeff Janes wrote:

Since this commit, I'm getting a lot (63 per restart) of messages:

  LOG:  could not close client or listen socket: Bad file descriptor
All I have to do to get the message is turn logging_collector = on and 
restart.


The close failure condition existed before the commit, it just wasn't 
logged before.  So, did the extra logging added here just uncover a  
pre-existing bug?


Yes, so it seems. Syslogger is started before the ListenSockets array is 
initialized, so it's still all zeros. When syslogger is forked, the child 
process tries to close all the listen sockets, which are all zeros. So 
syslogger calls close(0) MAXLISTEN (64) times. Attached patch moves the 
array initialization earlier.


The first close(0) actually does have an effect: it closes stdin, which 
is fd 0. That is surely accidental, but I wonder if we should indeed 
close stdin in child processes? The postmaster process doesn't do 
anything with stdin either, although I guess a library might try to read 
a passphrase from stdin before starting up, for example.


--
Heikki Linnakangas
Neon (https://neon.tech)
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 41bccb46a87..6b9e10bffa0 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -596,6 +596,9 @@ PostmasterMain(int argc, char *argv[])
 
 	IsPostmasterEnvironment = true;
 
+	for (i = 0; i < MAXLISTEN; i++)
+		ListenSocket[i] = PGINVALID_SOCKET;
+
 	/*
 	 * Start our win32 signal implementation
 	 */
@@ -1176,12 +1179,9 @@ PostmasterMain(int argc, char *argv[])
 	/*
 	 * Establish input sockets.
 	 *
-	 * First, mark them all closed, and set up an on_proc_exit function that's
-	 * charged with closing the sockets again at postmaster shutdown.
+	 * Set up an on_proc_exit function that's charged with closing the sockets
+	 * again at postmaster shutdown.
 	 */
-	for (i = 0; i < MAXLISTEN; i++)
-		ListenSocket[i] = PGINVALID_SOCKET;
-
 	on_proc_exit(CloseServerPorts, 0);
 
 	if (ListenAddresses)


Re: Eager page freeze criteria clarification

2023-08-28 Thread Peter Geoghegan
On Mon, Aug 28, 2023 at 1:17 PM Robert Haas  wrote:
> I'm sure this could be implemented, but it's unclear to me why you
> would expect it to perform well. Freezing a page that has no frozen
> tuples yet isn't cheaper than freezing one that does, so for this idea
> to be a win, the presence of frozen tuples on the page would have to
> be a signal that the page is likely to be modified again in the near
> future. In general, I don't see any reason why we should expect that
> to be the case.

What I've described is a scheme for deciding not to freeze where that
would usually happen -- a scheme for *vetoing* page freezing -- rather
than a scheme for deciding to freeze. On its own, what I suggested
cannot help at all. It assumes a world in which we're already deciding
to freeze much more frequently, based on whatever other criteria. It's
intended to complement something like the LSN scheme.

> What really matters here is finding a criterion that is likely to
> perform well in general, on a test case not known to us beforehand.
> This isn't an entirely feasible goal, because just as you can
> construct a test case where any given criterion performs well, so you
> can also construct one where any given criterion performs poorly. But
> I think a rule that has a clear theory of operation must be preferable
> to one that doesn't. The theory that Melanie and Andres are advancing
> is that a page that has been modified recently (in insert-LSN-time) is
> more likely to be modified again soon than one that has not i.e. the
> near future will be like the recent past.

I don't think that it's all that useful on its own. You just cannot
ignore the fact that the choice to not freeze now doesn't necessarily
mean that you get to revisit that choice in the near future.
Particularly with large tables, the opportunities to freeze at all are
few and far between -- if for no other reason than the general design
of autovacuum scheduling. Worse still, any unfrozen all-visible pages
can just accumulate as all-visible pages, until the next aggressive
VACUUM happens whenever. How can that not be extremely important?

That isn't an argument against a scheme that uses LSNs (many kinds of
information might be weighed) -- it's an argument in favor of paying
attention to the high level cadence of VACUUM. That much seems
essential. I think that there might well be room for having several
complementary schemes like the LSN scheme. Or one big scheme that
weighs multiple factors together, if you prefer. That all seems
basically reasonable to me.

Adaptive behavior is important with something as complicated as this.
Adaptive schemes all seem to involve trial and error. The cost of
freezing too much is relatively well understood, and can be managed
sensibly. So we should err in that direction -- a direction that is
relatively easy to understand, to notice, and to pull back from having
gone too far. Putting off freezing for a very long time is a source of
much of the seemingly intractable complexity in this area.

Another way of addressing that is getting rid of aggressive VACUUM as
a concept. But I'm not going to revisit that topic now, or likely
ever.

-- 
Peter Geoghegan




Re: Wrong usage of pqMsg_Close message code?

2023-08-28 Thread Tatsuo Ishii
>> Hi Pavel,
>>
>> > There is message PqMsgClose, but this should be used from client side.
>> Here should be used PqMsg_CommandComplete instead?
>>
>> It seems so. This change was introduced in f4b54e1ed98 [1]:
>>
>> ```
>> --- a/src/backend/tcop/dest.c
>> +++ b/src/backend/tcop/dest.c
>> @@ -176,7 +176,7 @@ EndCommand(const QueryCompletion *qc, CommandDest
>> dest, bool force_undecorated_o
>>
>> len = BuildQueryCompletionString(completionTag, qc,
>>
>>   force_undecorated_output);
>> -   pq_putmessage('C', completionTag, len + 1);
>> +   pq_putmessage(PqMsg_Close, completionTag, len + 1);
>>
>> case DestNone:
>> case DestDebug
>> ```
>>
>> It should have been PqMsg_CommandComplete.
>>
>> [1]:
>> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f4b54e1ed98
> 
> 
> here is a patch - all tests passed

I think EndReplicationCommand needs to be fixed as well.
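
For reference, a sketch of what the corrected call sites might look like --
the EndCommand() arguments are taken from the diff quoted above, while the
EndReplicationCommand() arguments are assumed here:

/*
 * src/backend/tcop/dest.c -- sketch of the one-character fixes.  Both
 * messages are CommandComplete; PqMsg_Close happens to be the same byte
 * ('C'), which is why the mix-up was harmless on the wire but still wrong.
 */

/* In EndCommand(): */
pq_putmessage(PqMsg_CommandComplete, completionTag, len + 1);

/* In EndReplicationCommand() (arguments assumed): */
pq_putmessage(PqMsg_CommandComplete, commandTag, strlen(commandTag) + 1);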

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Allow specifying a dbname in pg_basebackup connection string

2023-08-28 Thread Tristen Raab
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

Hello,

I've reviewed your patch and it applies and builds without error. When testing 
this patch I was slightly confused as to what its purpose was; after testing it, 
I now understand. Initially, I thought this was a change to add database-level 
replication. I would suggest some clarifications to the documentation such as 
changing:

"supplying a specific database name in the connection string won't cause 
PostgreSQL to behave any differently."

to 

"supplying a specific database name in the connection string won't cause 
pg_basebackup to behave any differently."

I believe this better illustrates that we are referring to the actual 
pg_basebackup utility, and that this parameter is only used by proxies and has 
no impact on what pg_basebackup is actually doing. It would also remove the 
confusion about database replication that I had at first.

There is also a small typo in the same documentation:

"However, if you are connecting to PostgreSQL through a proxy, then it's 
possible that this proxy does use the supplied databasename to make certain 
decisions, such as to which cluster to route the connection."

"databasename" is just missing a space.

Other than that, everything looks good.

Regards,

Tristen

Re: A failure in 031_recovery_conflict.pl on Debian/s390x

2023-08-28 Thread Thomas Munro
On Tue, Aug 29, 2023 at 1:58 AM Christoph Berg  wrote:
> This should be fixed before the 16 release.

Here's what I was thinking of doing for this, given where we are in
the release schedule:

* commit the signal-refactoring patch in master only
* plan to back-patch it into 16 in a later point release (it's much
harder to back-patch further than that)
* look into what we can do to suppress the offending test for 16 in the meantime

But looking into that just now, I am curious about something.  The
whole area of recovery conflicts has been buggy forever, and the tests
were only developed fairly recently by Melanie and Andres (April
2022), and then backpatched to all releases.  They were disabled again
in release branches 10-14 (discussion at
https://postgr.es/m/3447060.1652032...@sss.pgh.pa.us):

+plan skip_all => "disabled until after minor releases, due to instability";

Now your mainframe build bot is regularly failing for whatever timing
reason, in 16 and master.  That's quite useful because your tests have
made us or at least me a lot more confident that the fix is good (one
of the reasons progress has been slow is that failures in CI and BF
were pretty rare and hard to analyse).  But... I wonder why it isn't
failing for you in 15?  Are you able to check if it ever has?  I
suppose we could go and do the "disabled due to instability" thing in
15 and 16, and then later we'll un-disable it in 16 when we back-patch
the fix into it.




Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

2023-08-28 Thread Michael Paquier
On Mon, Aug 28, 2023 at 11:52:15PM +0300, Heikki Linnakangas wrote:
> On 28/08/2023 18:55, Jeff Janes wrote:
>> Since this commit, I'm getting a lot (63 per restart) of messages:
>> 
>>   LOG:  could not close client or listen socket: Bad file descriptor
>> All I have to do to get the message is turn logging_collector = on and
>> restart.
>> 
>> The close failure condition existed before the commit, it just wasn't
>> logged before.  So, did the extra logging added here just uncover a
>> pre-existing bug?

In case you've not noticed:
https://www.postgresql.org/message-id/zovvuqe0rdj2s...@paquier.xyz
But it does not really matter now ;)

> Yes, so it seems. Syslogger is started before the ListenSockets array is
> initialized, so it's still all zeros. When syslogger is forked, the child
> process tries to close all the listen sockets, which are all zeros. So
> syslogger calls close(0) MAXLISTEN (64) times. Attached patch moves the
> array initialization earlier.

Yep, I've reached the same conclusion.  Wouldn't it be cleaner to move
the callback registration of CloseServerPorts() closer to the array
initialization, though?

> The first close(0) actually does have an effect: it closes stdin, which is
> fd 0. That is surely accidental, but I wonder if we should indeed close
> stdin in child processes? The postmaster process doesn't do anything with
> stdin either, although I guess a library might try to read a passphrase from
> stdin before starting up, for example.

We would have heard about that, wouldn't we?  I may be missing
something of course, but on HEAD, the array initialization is done
before starting any child processes, and the syslogger is the first
one forked.
--
Michael


signature.asc
Description: PGP signature


Re: Debian 12 gcc warning

2023-08-28 Thread Michael Paquier
On Mon, Aug 28, 2023 at 03:37:20PM -0400, Bruce Momjian wrote:
> I don't see a clean way of avoiding the warning except by initializing
> the array, which seems wasteful.

Or just initialize the array with a {0}?
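
A standalone sketch of that suggestion (Oid and FUNC_MAX_ARGS are stubbed
in here so the snippet compiles on its own):

#include <stdio.h>

#define FUNC_MAX_ARGS 100       /* stand-in for the value in pg_config_manual.h */
typedef unsigned int Oid;       /* stand-in for the real typedef */

int
main(void)
{
    /* "= {0}" zero-initializes every element, so the compiler can no longer
     * claim the array may be used uninitialized -- at the cost of writing
     * FUNC_MAX_ARGS zeroes on every call. */
    Oid         actual_arg_types[FUNC_MAX_ARGS] = {0};

    printf("%u\n", actual_arg_types[FUNC_MAX_ARGS - 1]);   /* prints 0 */
    return 0;
}
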
--
Michael


signature.asc
Description: PGP signature


Re: Commitfest manager for September

2023-08-28 Thread Michael Paquier
On Mon, Aug 28, 2023 at 11:42:22AM -0400, Melanie Plageman wrote:
> I too had planned to volunteer to help. I volunteer to do a
> triage/summary of patch statuses, as has been done occasionally in the
> past [1].
> Have folks found this helpful in the past?

With the number of patches in place, getting any help with triaging is
very welcome, but I cannot speak for Peter.  FWIW, with your
experience, I think that you are a good fit, as is Aleksander.
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH v1] PQputCopyEnd never returns 0, fix the inaccurate comment

2023-08-28 Thread Michael Paquier
On Mon, Aug 28, 2023 at 09:46:07PM +0800, Junwang Zhao wrote:
> Yeah, it makes sense to me, or maybe just `PQputCopyEnd(...) == -1`,
> let's wait for some other opinions.

One can argue that PQputCopyEnd() returning 0 could be possible in an
older version of libpq that these callers are linking to, but that has
never been the case from what I can see (just checked down to 8.2 now).
Anyway, changing these callers may create some backpatching conflicts,
so I'd leave them as they are and just fix the comment.
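
For what it's worth, a minimal sketch of the caller pattern being
discussed -- the helper name is made up, while PQputCopyEnd() and
PQerrorMessage() are the real libpq calls:

#include <stdio.h>
#include <stdlib.h>
#include "libpq-fe.h"

/*
 * PQputCopyEnd() returns 1 on success and -1 on failure; it never returns
 * 0, so "< 0" and "== -1" are equivalent tests here.
 */
static void
end_copy_or_die(PGconn *conn)
{
    if (PQputCopyEnd(conn, NULL) < 0)
    {
        fprintf(stderr, "ending COPY failed: %s", PQerrorMessage(conn));
        exit(1);
    }
}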
--
Michael


signature.asc
Description: PGP signature


Re: Wrong usage of pqMsg_Close message code?

2023-08-28 Thread Michael Paquier
On Tue, Aug 29, 2023 at 06:12:00AM +0900, Tatsuo Ishii wrote:
> I think EndReplicationCommand needs to be fixed as well.

Yeah, both of you are right here.  Anyway, it seems to me that there
is a bit more going on in protocol.h.  I have noticed two more things
that are incorrect:
- HandleParallelMessage is missing a message for 'P', but I think that
we should have a code for it as well as part of the parallel query
protocol.
- PqMsg_Terminate can be sent by the frontend *and* the backend, see
fe-connect.c and parallel.c.  However, protocol.h documents it as a
frontend-only code.
--
Michael


signature.asc
Description: PGP signature


Re: Debian 12 gcc warning

2023-08-28 Thread Bruce Momjian
On Tue, Aug 29, 2023 at 07:30:15AM +0900, Michael Paquier wrote:
> On Mon, Aug 28, 2023 at 03:37:20PM -0400, Bruce Momjian wrote:
> > I don't see a clean way of avoiding the warning except by initializing
> > the array, which seems wasteful.
> 
> Or just initialize the array with a {0}?

Uh, doesn't that set all elements to zero?  See:


https://stackoverflow.com/questions/2589749/how-to-initialize-array-to-0-in-c

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Eager page freeze criteria clarification

2023-08-28 Thread Melanie Plageman
On Mon, Aug 28, 2023 at 5:06 PM Peter Geoghegan  wrote:
>
> On Mon, Aug 28, 2023 at 1:17 PM Robert Haas  wrote:
> > I'm sure this could be implemented, but it's unclear to me why you
> > would expect it to perform well. Freezing a page that has no frozen
> > tuples yet isn't cheaper than freezing one that does, so for this idea
> > to be a win, the presence of frozen tuples on the page would have to
> > be a signal that the page is likely to be modified again in the near
> > future. In general, I don't see any reason why we should expect that
> > to be the case.
>
> What I've described in a scheme for deciding to not freeze where that
> would usually happen -- a scheme for *vetoing* page freezing -- rather
> than a scheme for deciding to freeze. On its own, what I suggested
> cannot help at all. It assumes a world in which we're already deciding
> to freeze much more frequently, based on whatever other criteria. It's
> intended to complement something like the LSN scheme.

I like the direction of this idea. It could avoid repeatedly freezing
a page that is being modified over and over. I tried it out with a
short-running version of workload I (approximating a work queue) --
and it prevents unnecessary freezing.

The problem with this criterion is that if you freeze a page and then
ever modify it again, even once, vacuum will not be allowed to
freeze it again until it is forced to. Perhaps we could use a very
strict LSN cutoff for pages which contain any frozen tuples -- for
example: only freeze a page containing a mix of frozen and unfrozen
tuples if it has not been modified since before the last vacuum of the
relation ended.
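
A sketch of that stricter rule, with made-up function and variable names
(only XLogRecPtr is a real PostgreSQL type), might look like:

#include "postgres.h"
#include "access/xlogdefs.h"    /* XLogRecPtr */

/*
 * Hypothetical decision helper: a page that already contains frozen tuples
 * is only frozen again if it has not been modified since before the last
 * vacuum of the relation ended; other pages use the ordinary eager cutoff.
 */
static bool
should_freeze_page(bool page_has_frozen_tuples,
                   XLogRecPtr page_lsn,
                   XLogRecPtr prev_vacuum_end_lsn,
                   XLogRecPtr eager_cutoff_lsn)
{
    if (page_has_frozen_tuples)
        return page_lsn < prev_vacuum_end_lsn;  /* strict cutoff */

    return page_lsn < eager_cutoff_lsn;         /* normal eager-freeze criterion */
}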

- Melanie




Re: Return value of pg_promote()

2023-08-28 Thread Michael Paquier
On Mon, Aug 28, 2023 at 02:09:37PM +0200, Laurenz Albe wrote:
> On Thu, 2023-08-17 at 09:37 +0900, Michael Paquier wrote:
> > I have just noticed that we do not have a CF entry for this proposal,
> > so I have added one with Laurenz as author:
> > https://commitfest.postgresql.org/44/4504/
> 
> I have changed the author to Fujii Masao.

Still incorrect, as the author is Ashutosh Sharma.  Fujii-san has
provided some feedback, though.
--
Michael


signature.asc
Description: PGP signature


Eliminate redundant tuple visibility check in vacuum

2023-08-28 Thread Melanie Plageman
While working on a set of patches to combine the freeze and visibility
map WAL records into the prune record, I wrote the attached patches
reusing the tuple visibility information collected in heap_page_prune()
back in lazy_scan_prune().

heap_page_prune() collects the HTSV_Result for every tuple on a page
and saves it in an array used by heap_prune_chain(). If we make that
array available to lazy_scan_prune(), it can use it when collecting
stats for vacuum and determining whether or not to freeze tuples.
This avoids calling HeapTupleSatisfiesVacuum() again on every tuple in
the page.

It also gets rid of the retry loop in lazy_scan_prune().

- Melanie
From bdb432b220571086077085b29d3c90a3a1c68c47 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Thu, 3 Aug 2023 15:37:39 -0400
Subject: [PATCH v1 2/2] Reuse heap_page_prune() tuple visibility statuses

heap_page_prune() obtains the HTSV_Result (tuple visibility status)
returned from HeapTupleSatisfiesVacuum() for every tuple on the page and
stores them in an array. By making this array available to
heap_page_prune()'s caller lazy_scan_prune(), we can avoid an additional
call to HeapTupleSatisfiesVacuum() when freezing the tuples and
recording LP_DEAD items for vacuum. This saves resources and eliminates
the possibility that vacuuming corrupted data results in a hang due to
endless retry looping.

We pass the HTSV array from heap_page_prune() back to lazy_scan_prune()
in PruneResult. Now that we are passing in PruneResult to
heap_page_prune(), we can move the other output parameters passed
individually into heap_page_prune() into the PruneResult.
---
 src/backend/access/heap/pruneheap.c  | 68 +++-
 src/backend/access/heap/vacuumlazy.c | 66 ---
 src/include/access/heapam.h  | 14 +-
 3 files changed, 68 insertions(+), 80 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 47b9e20915..b5bc126399 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -65,16 +65,6 @@ typedef struct
 	 * 1. Otherwise every access would need to subtract 1.
 	 */
 	bool		marked[MaxHeapTuplesPerPage + 1];
-
-	/*
-	 * Tuple visibility is only computed once for each tuple, for correctness
-	 * and efficiency reasons; see comment in heap_page_prune() for details.
-	 * This is of type int8[], instead of HTSV_Result[], so we can use -1 to
-	 * indicate no visibility has been computed, e.g. for LP_DEAD items.
-	 *
-	 * Same indexing as ->marked.
-	 */
-	int8		htsv[MaxHeapTuplesPerPage + 1];
 } PruneState;
 
 /* Local functions */
@@ -83,7 +73,7 @@ static HTSV_Result heap_prune_satisfies_vacuum(PruneState *prstate,
 			   Buffer buffer);
 static int	heap_prune_chain(Buffer buffer,
 			 OffsetNumber rootoffnum,
-			 PruneState *prstate);
+			 PruneState *prstate, PruneResult *presult);
 static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
 static void heap_prune_record_redirect(PruneState *prstate,
 	   OffsetNumber offnum, OffsetNumber rdoffnum);
@@ -191,10 +181,25 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 
 	if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
 	{
+		PruneResult presult;
+
 		/* OK, try to get exclusive buffer lock */
 		if (!ConditionalLockBufferForCleanup(buffer))
 			return;
 
+		/* Initialize prune result fields */
+		presult.hastup = false;
+		presult.has_lpdead_items = false;
+		presult.all_visible = true;
+		presult.all_frozen = true;
+		presult.visibility_cutoff_xid = InvalidTransactionId;
+
+		/*
+		 * nnewlpdead only includes those items which were newly set to
+		 * LP_DEAD during pruning.
+		 */
+		presult.nnewlpdead = 0;
+
 		/*
 		 * Now that we have buffer lock, get accurate information about the
 		 * page's free space, and recheck the heuristic about whether to
@@ -202,11 +207,8 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		 */
 		if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
 		{
-			int			ndeleted,
-		nnewlpdead;
-
-			ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
-	   limited_ts, &nnewlpdead, NULL);
+			int			ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
+   limited_ts, &presult, NULL);
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
@@ -222,9 +224,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * tracks ndeleted, since it will set the same LP_DEAD items to
 			 * LP_UNUSED separately.
 			 */
-			if (ndeleted > nnewlpdead)
+			if (ndeleted > presult.nnewlpdead)
 pgstat_update_heap_dead_tuples(relation,
-			   ndeleted - nnewlpdead);
+			   ndeleted - presult.nnewlpdead);
 		}
 
 		/* And release buffer lock */
@@ -253,12 +255,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * either have been set by TransactionIdLimitedForOldSnapshots, or
  * Inval
