Re: GUC for temporarily disabling event triggers

2023-09-06 Thread Michael Paquier
On Wed, Sep 06, 2023 at 10:23:55PM +0200, Daniel Gustafsson wrote:
> > On 6 Sep 2023, at 16:22, Robert Haas  wrote:
>> I usually prefer to give things a positive sense, talking about
>> whether things are enabled rather than disabled. I'd do event_triggers
>> = off | on, like we have for row_security. YMMV, though.
> 
> Fair enough, I don't have strong opinions and I do agree that making this work
> like row_security is a good thing for consistency.  Done in the attached.

This point has been raised a couple of months ago:
https://www.postgresql.org/message-id/ZC0s%2BBRMqRupDInQ%40paquier.xyz

+SET event_triggers = 'on';
+CREATE POLICY pguc ON event_trigger_test USING (FALSE);
+DROP POLICY pguc ON event_trigger_test;

This provides checks for the start, end and drop events.  Shouldn't
table_rewrite also be covered?

+   GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE

I am a bit surprised by these two additions.  Setting this GUC at
file-level can be useful, as is documenting it in the sample file if
it provides some control of how a statement behaves, no?

+   Allow temporarily disabling execution of event triggers in order to
+   troubleshoot and repair faulty event triggers. All event triggers will
+   be disabled by setting it to true. Setting the value
+   to false will not disable any event triggers, this
+   is the default value. Only superusers and users with the appropriate
+   SET privilege can change this setting.

Event triggers are disabled when this GUC is set to false, while true,
the default, allows them to fire.  The values are reversed in this
description.
--
Michael




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-06 Thread Peter Smith
Hi, here are my code review comments for the patch v32-0002

==
src/bin/pg_upgrade/check.c

1. check_new_cluster_logical_replication_slots

+ res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
+ max_replication_slots = atoi(PQgetvalue(res, 0, 0));
+
+ if (PQntuples(res) != 1)
+ pg_fatal("could not determine max_replication_slots");

Shouldn't the PQntuples check be *before* the PQgetvalue and
assignment to max_replication_slots?

~~~

2. check_new_cluster_logical_replication_slots

+ res = executeQueryOrDie(conn, "SHOW wal_level;");
+ wal_level = PQgetvalue(res, 0, 0);
+
+ if (PQntuples(res) != 1)
+ pg_fatal("could not determine wal_level");

Shouldn't the PQntuples check be *before* the PQgetvalue and
assignment to wal_level?
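
Just to make the suggestion explicit, the reordering would amount to
something like this for the second query (only the quoted lines are
rearranged; the same applies to the max_replication_slots query):

  res = executeQueryOrDie(conn, "SHOW wal_level;");

  /* Validate the result set before reading from it. */
  if (PQntuples(res) != 1)
      pg_fatal("could not determine wal_level");

  wal_level = PQgetvalue(res, 0, 0);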

~~~

3. check_old_cluster_for_valid_slots

I saw that similar code with scripts like this is doing PG_REPORT:

pg_log(PG_REPORT, "fatal");

but that PG_REPORT is missing from this function.
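
For reference, the pattern used by the other check_* functions looks
roughly like this; the pg_fatal() wording below is made up for
illustration and is not taken from the patch:

  if (script)
  {
      fclose(script);
      pg_log(PG_REPORT, "fatal");
      pg_fatal("The source cluster contains logical replication slots that cannot be upgraded.\n"
               "A list of the problematic slots is in the file:\n"
               "    %s", output_path);
  }
  else
      check_ok();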

==
src/bin/pg_upgrade/function.c

4. get_loadable_libraries

@@ -42,11 +43,12 @@ library_name_compare(const void *p1, const void *p2)
  ((const LibraryInfo *) p2)->dbnum;
 }

-
 /*
  * get_loadable_libraries()

~

Removing that blank line (above this function) should not be included
in the patch.

~~~

5. get_loadable_libraries

+ /*
+ * Allocate a memory for extensions and logical replication output
+ * plugins.
+ */
+ os_info.libraries = pg_malloc_array(LibraryInfo,
+ totaltups + count_old_cluster_logical_slots());

/Allocate a memory/Allocate memory/

~~~

6. get_loadable_libraries
+ /*
+ * Store the name of output plugins as well. There is a possibility
+ * that duplicated plugins are set, but the consumer function
+ * check_loadable_libraries() will avoid checking the same library, so
+ * we do not have to consider their uniqueness here.
+ */
+ for (slotno = 0; slotno < slot_arr->nslots; slotno++)

/Store the name/Store the names/

==
src/bin/pg_upgrade/info.c

7. get_old_cluster_logical_slot_infos

+ i_slotname = PQfnumber(res, "slot_name");
+ i_plugin = PQfnumber(res, "plugin");
+ i_twophase = PQfnumber(res, "two_phase");
+ i_caughtup = PQfnumber(res, "caughtup");
+ i_conflicting = PQfnumber(res, "conflicting");
+
+ for (slotnum = 0; slotnum < num_slots; slotnum++)
+ {
+ LogicalSlotInfo *curr = &slotinfos[slotnum];
+
+ curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
+ curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
+ curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
+ curr->caughtup = (strcmp(PQgetvalue(res, slotnum, i_caughtup), "t") == 0);
+ curr->conflicting = (strcmp(PQgetvalue(res, slotnum, i_conflicting),
"t") == 0);
+ }

Saying "tup" always looks like it should be something tuple-related.
IMO it will be better to call all these "caught_up" instead of
"caughtup":

"caughtup" ==> "caught_up"
i_caughtup ==> i_caught_up
curr->caughtup ==> curr->caught_up

~~~

8. print_slot_infos

+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+ int slotnum;
+
+ if (slot_arr->nslots > 1)
+ pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ {
+ LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+ pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+slot_info->slotname,
+slot_info->plugin,
+slot_info->two_phase);
+ }
+}

Although it makes no functional difference, it might be neater if the
for loop is also within that "if (slot_arr->nslots > 1)" condition.
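
Something along these lines, i.e. just moving the loop under the existing
condition (a sketch based on the quoted code, not a tested diff):

  if (slot_arr->nslots > 1)
  {
      pg_log(PG_VERBOSE, "Logical replication slots within the database:");

      for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
      {
          LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];

          pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
                 slot_info->slotname,
                 slot_info->plugin,
                 slot_info->two_phase);
      }
  }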

==
src/bin/pg_upgrade/pg_upgrade.h

9.
+/*
+ * Structure to store logical replication slot information
+ */
+typedef struct
+{
+ char*slotname; /* slot name */
+ char*plugin; /* plugin */
+ bool two_phase; /* can the slot decode 2PC? */
+ bool caughtup; /* Is confirmed_flush_lsn the same as latest
+ * checkpoint LSN? */
+ bool conflicting; /* Is the slot usable? */
+} LogicalSlotInfo;

9a.
+ bool caughtup; /* Is confirmed_flush_lsn the same as latest
+ * checkpoint LSN? */

caughtup ==> caught_up

~

9b.
+ bool conflicting; /* Is the slot usable? */

The field name has the opposite meaning of the wording of the comment.
(e.g. it is usable when it is NOT conflicting, right?).

Maybe there needs to be a better field name, or a better comment, or both.
AFAICT from other code, the pg_fatal message always interprets
'conflicting' as 'lost', so maybe the field should be called that?
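
For example, one possible shape (the field names and comment wording below
are only suggestions, not the committed structure):

  bool    caught_up;   /* is confirmed_flush_lsn equal to the latest
                        * checkpoint LSN? */
  bool    invalid;     /* true if the slot was invalidated ("lost") and
                        * is therefore unusable */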

--
Kind Regards,
Peter Smith.
Fujitsu Australia




psql help message contains excessive indentations

2023-09-06 Thread Kyotaro Horiguchi
This topic is extracted from [1].

As mentioned there, in psql, running \? displays the following lines.

 >  \gdesc describe result of query, without executing it
 >  \gexec execute query, then execute each value in its result
 >  \gset [PREFIX] execute query and store result in psql variables
 >  \gx [(OPTIONS)] [FILE] as \g, but forces expanded output mode
 >  \q quit psql
 >  \watch [[i=]SEC] [c=N] [m=MIN]
!>  execute query every SEC seconds, up to N times
!>  stop if less than MIN rows are returned

The last two lines definitely have some extra indentation.
I've attached a patch that fixes this.

[1] 
https://www.postgresql.org/message-id/20230830.103356.1909699532104167477.horikyota@gmail.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index 38c165a627..f8eeef8e4e 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -201,8 +201,8 @@ slashUsage(unsigned short int pager)
 	HELP0("  \\gx [(OPTIONS)] [FILE] as \\g, but forces expanded output mode\n");
 	HELP0("  \\q quit psql\n");
 	HELP0("  \\watch [[i=]SEC] [c=N] [m=MIN]\n");
-	HELP0("  execute query every SEC seconds, up to N times\n");
-	HELP0("  stop if less than MIN rows are returned\n");
+	HELP0(" execute query every SEC seconds, up to N times\n");
+	HELP0(" stop if less than MIN rows are returned\n");
 	HELP0("\n");
 
 	HELP0("Help\n");


Re: persist logical slots to disk during shutdown checkpoint

2023-09-06 Thread vignesh C
On Wed, 6 Sept 2023 at 12:09, Dilip Kumar  wrote:
>
> Other than that the patch LGTM.

pgindent reported that the new comments added need to be re-adjusted,
here is an updated patch for the same.
I also verified the following: a) patch applies neatly on HEAD b) make
check-world passes c) pgindent looks good d) pgperltidy was fine e)
meson test runs were successful. Also checked that CFBot run was fine
for the last patch.

Regards,
Vignesh
From d748cd6d78c2ae5fc38552166572834876381336 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v9] Persist logical slots to disk during a shutdown checkpoint
 if required.

It's entirely possible for a logical slot to have a confirmed_flush LSN
higher than the last value saved on disk while not being marked as dirty.
Currently, it is not a major problem but a later patch adding support for
the upgrade of slots relies on that value being properly persisted to disk.

It can also help avoid processing the same transactions again in some
boundary cases after the clean shutdown and restart. Say, we process some
transactions for which we didn't send anything downstream (the changes got
filtered) but the confirm_flush LSN is updated due to keepalives. As we
don't flush the latest value of confirm_flush LSN, it may lead to
processing the same changes again without this patch.

Author: Vignesh C, Julien Rouhaud, Kuroda Hayato based on suggestions by
Ashutosh Bapat
Reviewed-by: Amit Kapila, Peter Smith, Ashutosh Bapat
Discussion: http://postgr.es/m/CAA4eK1JzJagMmb_E8D4au=gyqkxox0afnbm1fbp7sy7t4yw...@mail.gmail.com
Discussion: http://postgr.es/m/tyapr01mb58664c81887b3af2eb6b16e3f5...@tyapr01mb5866.jpnprd01.prod.outlook.com
---
 src/backend/access/transam/xlog.c |   2 +-
 src/backend/replication/slot.c|  34 +-
 src/include/replication/slot.h|   5 +-
 src/test/recovery/meson.build |   1 +
 .../t/038_save_logical_slots_shutdown.pl  | 102 ++
 5 files changed, 139 insertions(+), 5 deletions(-)
 create mode 100644 src/test/recovery/t/038_save_logical_slots_shutdown.pl

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f6f8adc72a..f26c8d18a6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7039,7 +7039,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index bb09c4010f..2d22b4e956 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -321,6 +321,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	slot->candidate_xmin_lsn = InvalidXLogRecPtr;
 	slot->candidate_restart_valid = InvalidXLogRecPtr;
 	slot->candidate_restart_lsn = InvalidXLogRecPtr;
+	slot->last_saved_confirmed_flush = InvalidXLogRecPtr;
 
 	/*
 	 * Create the slot on disk.  We haven't actually marked the slot allocated
@@ -1573,10 +1574,10 @@ restart:
  * Flush all replication slots to disk.
  *
  * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * location. is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1601,6 +1602,31 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
+
+		/*
+		 * We won't ensure that the slot is persisted after the
+		 * confirmed_flush LSN is updated as that could lead to frequent
+		 * writes.  However, we need to ensure that we do persist the slots at
+		 * the time of shutdown whose confirmed_flush LSN is changed since we
+		 * last saved the slot to disk. This will help in avoiding retreat of
+		 * the confirmed_flush LSN after restart. At other times, the
+		 * walsender keeps saving the slot from time to time as the
+		 * replication progresses, so there is no clear advantage of flushing
+		 * additional slots at the time of checkpoint.
+		 */
+		if (is_shutdown && SlotIsLogical(s))
+		{
+			SpinLockAcquire(&s->mutex);
+			if (s->data.invalidated == RS_INVAL_NONE &&
+s->data.confirmed_flush != s->last_saved_confirmed_flush)
+			{
+s->just_dirtied = true;
+s->dirty = true;
+			}
+
+			SpinLockRelease(&s->mutex);
+		}
+
 		SaveSlotToPath(s, path, LOG);
 	}
 	LWLockRelease(ReplicationSlotAllocationLock);
@@ -1873,11 +1899,12 @@ SaveSlotToPath(ReplicationSlot *slot, const char *dir, int elevel)
 
 	/*
 	 * Successfully wrote, unset dirty bit, unless somebody dirtied again
-	 * already.
+	 * already and remember the confirmed_flush LSN value.
 	 */
 	

Re: Unlogged relation copy is not fsync'd

2023-09-06 Thread Michael Paquier
On Tue, Sep 05, 2023 at 02:20:18PM -0400, Robert Haas wrote:
> The general rule throughout the system is that the init-fork of an
> unlogged relation is treated the same as a permanent relation: it is
> WAL-logged and fsyncd. But the other forks of an unlogged relation are
> neither WAL-logged nor fsync'd ... except in the case of a clean
> shutdown, when we fsync even that data.
> 
> In other words, somehow it feels like we ought to be trying to defer
> the fsync here until a clean shutdown actually occurs, instead of
> performing it immediately. Admittedly, the bookkeeping seems like a
> problem, so maybe this is the best we can do, but it's clearly worse
> than what we do in other cases.

That's where we usually rely more on RegisterSyncRequest() and
register_dirty_segment() so that the flush of the dirty segments can
happen when it should, but we don't go through shared buffers when
copying all the forks of a relation file across tablespaces..
--
Michael




Re: pg_basebackup: Always return valid temporary slot names

2023-09-06 Thread Michael Paquier
On Thu, Sep 07, 2023 at 01:23:33PM +0900, Michael Paquier wrote:
> PQbackendPID() returns a signed value, likely coming from the fact
> that it was thought to be OK back in the days where PIDs were always
> defined with less bits.  The fix is OK taken in isolation, so I am
> going to apply it in a few minutes as I'm just passing by..

Actually, correcting myself, pid_max cannot be higher than 2^22 on 64b
machines even these days (per man 5 proc).
--
Michael




Re: pg_basebackup: Always return valid temporary slot names

2023-09-06 Thread Michael Paquier
On Tue, Sep 05, 2023 at 02:06:14PM +0200, Daniel Gustafsson wrote:
>> For that purpose it's actually more secure to use all bits for random
>> data, instead of keeping one bit always 0.
> 
> If it's common practice to submit a pid which isn't a pid, I wonder if longer
> term it's worth inventing a value for be_pid which means "unknown pid" such
> that consumers can make informed calls when reading it?  Not the job of this
> patch to do so, but maybe food for thought.

Perhaps.

>> So, while I agree that putting a negative value in the process ID field of
>> BackendData, is arguably incorrect. Given the simplicity of the fix on
>> the pg_basebackup side, I think addressing it in pg_basebackup is
>> easier than fixing this in all connection poolers.
> 
> Since the value in the temporary slotname isn't used to convey meaning, but
> merely to ensure uniqueness, I don't think it's unreasonable to guard against
> malformed input (i.e. a negative integer).

PQbackendPID() returns a signed value, likely coming from the fact
that it was thought to be OK back in the days where PIDs were always
defined with less bits.  The fix is OK taken in isolation, so I am
going to apply it in a few minutes as I'm just passing by..

Saying that, I agree with the point that we should also use %u in
psql's prompt.c to get a correct PID if the 32nd bit is set, and move
on.
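
Roughly the following idea; the variable names here are invented for
illustration and are not taken from the patch:

  int     be_pid = PQbackendPID(conn);
  char    slotname[64];

  /*
   * Print the PID as unsigned so that a value with the high bit set does
   * not render as a negative number in the generated slot name.
   */
  snprintf(slotname, sizeof(slotname), "pg_basebackup_%u",
           (unsigned int) be_pid);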
--
Michael




Re: A failure in 031_recovery_conflict.pl on Debian/s390x

2023-09-06 Thread Thomas Munro
On Tue, Aug 8, 2023 at 11:08 AM Andres Freund  wrote:
> On 2023-08-07 12:57:40 +0200, Christoph Berg wrote:
> > v8 worked better. It succeeded a few times (at least 12, my screen
> > scrollback didn't catch more) before erroring like this:
>
> > [10:21:58.410](0.151s) ok 15 - startup deadlock: logfile contains 
> > terminated connection due to recovery conflict
> > [10:21:58.463](0.053s) not ok 16 - startup deadlock: stats show conflict on 
> > standby
> > [10:21:58.463](0.000s)
> > [10:21:58.463](0.000s) #   Failed test 'startup deadlock: stats show 
> > conflict on standby'
> > #   at t/031_recovery_conflict.pl line 332.
> > [10:21:58.463](0.000s) #  got: '0'
> > # expected: '1'
>
> Hm, that could just be a "harmless" race. Does it still happen if you apply
> the attached patch in addition?

Do you intend that as the fix, or just proof of what's wrong?  It
looks pretty reasonable to me.  I will go ahead and commit this unless
you want to change something/do it yourself.  As far as I know so far,
this is the last thing preventing the mighty Debian/s390x build box
from passing consistently, which would be nice.
From cbab5797fde1db70464f157b4bb9321ab40ad025 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Thu, 7 Sep 2023 15:28:47 +1200
Subject: [PATCH] Update recovery conflict stats immediately.

031_recovery_conflict.pl would occasionally fail to see a stats change,
leading to intermittent failures.  Since recovery conflicts are rare,
just flush them immediately.

No back-patch for now, but that will be needed if/when the test is
re-enabled in back-branches.

Author: Andres Freund 
Reported-by: Christoph Berg 
Discussion: https://postgr.es/m/20230807230821.gylo74fox445hu24%40awork3.anarazel.de

diff --git a/src/backend/utils/activity/pgstat_database.c b/src/backend/utils/activity/pgstat_database.c
index d04426f53f..cc2a5724e7 100644
--- a/src/backend/utils/activity/pgstat_database.c
+++ b/src/backend/utils/activity/pgstat_database.c
@@ -81,12 +81,22 @@ void
 pgstat_report_recovery_conflict(int reason)
 {
 	PgStat_StatDBEntry *dbentry;
+	PgStat_EntryRef *entry_ref;
+	PgStatShared_Database *sharedent;
 
 	Assert(IsUnderPostmaster);
 	if (!pgstat_track_counts)
 		return;
 
-	dbentry = pgstat_prep_database_pending(MyDatabaseId);
+	/*
+	 * Update the shared stats directly - recovery conflicts should never be
+	 * common enough for that to be a problem.
+	 */
+	entry_ref =
+		pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE, MyDatabaseId, InvalidOid, false);
+
+	sharedent = (PgStatShared_Database *) entry_ref->shared_stats;
+	dbentry = &sharedent->stats;
 
 	switch (reason)
 	{
@@ -116,6 +126,8 @@ pgstat_report_recovery_conflict(int reason)
 			dbentry->conflict_startup_deadlock++;
 			break;
 	}
+
+	pgstat_unlock_entry(entry_ref);
 }
 
 /*
-- 
2.41.0



Re: Synchronizing slots from primary to standby

2023-09-06 Thread shveta malik
On Fri, Aug 25, 2023 at 2:15 PM Alvaro Herrera  wrote:
>
> Wait a minute ...
>
> From bac0fbef8b203c530e5117b0b7cfee13cfab78b9 Mon Sep 17 00:00:00 2001
> From: Bharath Rupireddy 
> Date: Sat, 22 Jul 2023 10:17:48 +
> Subject: [PATCH v13 1/2] Allow logical walsenders to wait for physical
>  standbys
>
> @@ -2498,6 +2500,13 @@ ReorderBufferProcessTXN(ReorderBuffer *rb, 
> ReorderBufferTXN *txn,
> }
> else
> {
> +   /*
> +* Before we send out the last set of changes to 
> logical decoding
> +* output plugin, wait for specified streaming 
> replication standby
> +* servers (if any) to confirm receipt of WAL upto 
> commit_lsn.
> +*/
> +   WaitForStandbyLSN(commit_lsn);
>
> OK, so we call this new function frequently enough -- once per
> transaction, if I read this correctly?  So ...
>
> +void
> +WaitForStandbyLSN(XLogRecPtr wait_for_lsn)
> +{
>  ...
>
> +   /* "*" means all logical walsenders should wait for physical 
> standbys. */
> +   if (strcmp(synchronize_slot_names, "*") != 0)
> +   {
> +   boolshouldwait = false;
> +
> +   rawname = pstrdup(synchronize_slot_names);
> +   SplitIdentifierString(rawname, ',', &elemlist);
> +
> +   foreach (l, elemlist)
> +   {
> +   char *name = lfirst(l);
> +   if (strcmp(name, 
> NameStr(MyReplicationSlot->data.name)) == 0)
> +   {
> +   shouldwait = true;
> +   break;
> +   }
> +   }
> +
> +   pfree(rawname);
> +   rawname = NULL;
> +   list_free(elemlist);
> +   elemlist = NIL;
> +
> +   if (!shouldwait)
> +   return;
> +   }
> +
> +   rawname = pstrdup(standby_slot_names);
> +   SplitIdentifierString(rawname, ',', &elemlist);
>
> ... do we really want to be doing the GUC string parsing every time
> through it?  This sounds like it could be a bottleneck, or at least slow
> things down.  Maybe we should think about caching this somehow.
>

Yes, these parsed lists are now cached. Please see v15
(https://www.postgresql.org/message-id/CAJpy0uAuzbzvcjpnzFTiWuDBctnH-SDZC6AZabPX65x9GWBrjQ%40mail.gmail.com)
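
For reference, the general shape of such caching might look like the sketch
below; the function and variable names are invented for illustration and are
not from the v15 patch, and a real version would also need to allocate in a
suitably long-lived memory context:

  static char *cached_guc_value = NULL;  /* pristine copy, for change detection */
  static char *cached_parse_buf = NULL;  /* buffer the cached list points into */
  static List *cached_slots = NIL;

  static List *
  get_standby_slot_list(void)
  {
      if (cached_guc_value == NULL ||
          strcmp(cached_guc_value, standby_slot_names) != 0)
      {
          if (cached_guc_value)
              pfree(cached_guc_value);
          if (cached_parse_buf)
              pfree(cached_parse_buf);
          list_free(cached_slots);

          cached_guc_value = pstrdup(standby_slot_names);
          cached_parse_buf = pstrdup(standby_slot_names);
          cached_slots = NIL;

          /* SplitIdentifierString() scribbles on its input, hence the extra copy */
          (void) SplitIdentifierString(cached_parse_buf, ',', &cached_slots);
      }

      return cached_slots;
  }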

thanks
Shveta




Re: A failure in 031_recovery_conflict.pl on Debian/s390x

2023-09-06 Thread Thomas Munro
On Sun, Aug 13, 2023 at 9:00 AM Andres Freund  wrote:
> On 2023-08-12 15:50:24 +1200, Thomas Munro wrote:
> > Thanks.  I realised that it's easy enough to test that theory about
> > cleanup locks by hacking ConditionalLockBufferForCleanup() to return
> > false randomly.  Then the test occasionally fails as described.  Seems
> > like we'll need to fix that test, but it's not evidence of a server
> > bug, and my signal handler refactoring patch is in the clear.  Thanks
> > for testing it!
>
> WRT fixing the test: I think just using VACUUM FREEZE ought to do the job?
> After changing all the VACUUMs to VACUUM FREEZEs, 031_recovery_conflict.pl
> passes even after I make ConditionalLockBufferForCleanup() fail 100%.

I pushed that change (master only for now, like the other change).




Re: A minor adjustment to get_cheapest_path_for_pathkeys

2023-09-06 Thread Richard Guo
On Wed, Sep 6, 2023 at 8:50 PM Robert Haas  wrote:

> On Wed, Sep 6, 2023 at 2:45 AM Andy Fan  wrote:
> > In my opinion,  we can do some stuff to improve the ROI.
> > -  Authors should do as much as possible,  mainly a better commit
> > message.  As for this patch, the commit message is " Adjustment
> > to get_cheapest_path_for_pathkeys"  which I don't think matches
> > our culture.
>
> I agree. I don't think the patch submitter is obliged to try to write
> a good commit message, but people who contribute regularly or are
> posting large stacks of complex patches are probably well-advised to
> try. It makes life easier for committers and even for reviewers trying
> to make sense of their patches.


Fair point.  So I had a go at writing a commit message for this patch as
attached.  Thanks for all the reviews.

Thanks
Richard


v2-0001-Reorder-the-tests-in-get_cheapest_path_for_pathkeys.patch
Description: Binary data


Re: Add 'worker_type' to pg_stat_subscription

2023-09-06 Thread Peter Smith
On Wed, Sep 6, 2023 at 9:49 AM Nathan Bossart  wrote:
>
> On Wed, Sep 06, 2023 at 09:02:21AM +1200, Peter Smith wrote:
> > On Sat, Sep 2, 2023 at 7:41 AM Nathan Bossart  
> > wrote:
> >> I see that the table refers to "leader apply workers".  Would those show up
> >> as parallel apply workers in the view?  Can we add another worker type for
> >> those?
> >
> > Internally there are only 3 worker types: A "leader" apply worker is
> > basically the same as a regular apply worker, except it has other
> > parallel apply workers associated with it.
> >
> > I felt that pretending there are 4 types in the view would be
> > confusing. Instead, I just removed the word "leader". Now there are:
> > "apply worker"
> > "parallel apply worker"
> > "table synchronization worker"
>
> Okay.  Should we omit "worker" for each of the types?  Since these are the
> values for the "worker_type" column, it seems a bit redundant.  For
> example, we don't add "backend" to the end of each value for backend_type
> in pg_stat_activity.
>
> I wonder if we could add the new field to the end of
> pg_stat_get_subscription() so that we could simplify this patch a bit.  At
> the moment, a big chunk of it is dedicated to reordering the values.
>

Modified as suggested. PSA v3.

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v3-0001-Add-worker_type-to-pg_stat_subscription.patch
Description: Binary data


Re: A failure in 031_recovery_conflict.pl on Debian/s390x

2023-09-06 Thread Thomas Munro
I have now disabled the test in 15 and 16 (like the older branches).
I'll see about getting the fixes into master today, and we can
contemplate back-patching later, after we've collected a convincing
volume of test results from the build farm, CI and hopefully your
s390x master snapshot builds (if that is a thing).




Re: should frontend tools use syncfs() ?

2023-09-06 Thread Nathan Bossart
On Tue, Sep 05, 2023 at 05:08:53PM -0700, Nathan Bossart wrote:
> I've committed 0001 for now.  I plan to commit the rest in the next couple
> of days.

Committed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Can a role have indirect ADMIN OPTION on another role?

2023-09-06 Thread David G. Johnston
On Wed, Sep 6, 2023 at 1:55 PM Ashutosh Sharma  wrote:

> But what if roleB doesn't want to give roleA access to
> the certain objects it owns.


Not doable - roleA can always pretend they are roleB one way or another
since roleA made roleB.

David J.


Re: Performance degradation on concurrent COPY into a single relation in PG16.

2023-09-06 Thread Tom Lane
Andres Freund  writes:
> On 2023-08-16 13:15:46 +0200, Alvaro Herrera wrote:
>> Since the wins from this patch were replicated and it has been pushed, I
>> understand that this open item can be marked as closed, so I've done
>> that.

> Thanks!

It turns out that this patch is what's making buildfarm member
chipmunk fail in contrib/pg_visibility [1].  That's easily reproduced
by running the test with shared_buffers = 10MB.  I didn't dig further
than the "git bisect" result:

$ git bisect bad
82a4edabd272f70d044faec8cf7fd1eab92d9991 is the first bad commit
commit 82a4edabd272f70d044faec8cf7fd1eab92d9991
Author: Andres Freund 
Date:   Mon Aug 14 09:54:03 2023 -0700

hio: Take number of prior relation extensions into account

but I imagine the problem is that the patch's more aggressive
relation-extension heuristic is causing the table to have more
pages than the test case expects.  Is it really a good idea
for such a heuristic to depend on shared_buffers?  If you don't
want to change the heuristic then we'll have to find a way to
tweak this test to avoid it.

regards, tom lane

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunk&dt=2023-09-06%2014%3A14%3A51




Re: Eliminate redundant tuple visibility check in vacuum

2023-09-06 Thread Melanie Plageman
On Wed, Sep 6, 2023 at 1:04 PM Robert Haas  wrote:
>
> On Thu, Aug 31, 2023 at 6:29 PM Melanie Plageman
>  wrote:
> > I have changed this.
>
> I spent a bunch of time today looking at this, thinking maybe I could
> commit it. But then I got cold feet.
>
> With these patches applied, PruneResult ends up being declared in
> heapam.h, with a comment that says /* State returned from pruning */.
> But that comment isn't accurate. The two new members that get added to
> the structure by 0002, namely nnewlpdead and htsv, are in fact state
> that is returned from pruning. But the other 5 members aren't. They're
> just initialized to constant values by pruning and then filled in for
> real by the vacuum logic. That's extremely weird. It would be fine if
> heap_page_prune() just grew a new output argument that only returned
> the HTSV results, or perhaps it could make sense to bundle any
> existing out parameters together into a struct and then add new things
> to that struct instead of adding even more parameters to the function
> itself. But there doesn't seem to be any good reason to muddle
> together the new output parameters for heap_page_prune() with a bunch
> of state that is currently internal to vacuumlazy.c.
>
> I realize that the shape of the patches probably stems from the fact
> that they started out life as part of a bigger patch set. But to be
> committed independently, they need to be shaped in a way that makes
> sense independently, and I don't think this qualifies. On the plus
> side, it seems to me that it's probably not that much work to fix this
> issue and that the result would likely be a smaller patch than what
> you have now, which is something.

Yeah, I think this is a fair concern. I have addressed it in the
attached patches.

I thought a lot about whether or not adding a PruneResult which
contains only the output parameters and result of heap_page_prune() is
annoying since we have so many other *Prune* data structures. I
decided it's not annoying. In some cases, four outputs don't merit a
new structure. In this case, I think it declutters the code a bit --
independent of any other patches I may be writing :)

- Melanie
From 11f5d895e4efe7b3b3a7bbc58efbb4d6c40274c0 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Wed, 6 Sep 2023 14:57:20 -0400
Subject: [PATCH v3 1/2] Move heap_page_prune output parameters into struct

Add PruneResult, a structure containing the output parameters and result
of heap_page_prune(). Reorganizing the results of heap_page_prune() into
a struct simplifies the function signature and provides a location for
future commits to store additional output parameters.

Note that heap_page_prune() no longer NULL checks off_loc, the current
tuple's offset in the line pointer array. It will never be NULL now that
it is a member of a required output parameter, PruneResult.

Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA%40mail.gmail.com
---
 src/backend/access/heap/pruneheap.c  | 47 +++-
 src/backend/access/heap/vacuumlazy.c | 19 +--
 src/include/access/heapam.h  | 20 ++--
 src/tools/pgindent/typedefs.list |  1 +
 4 files changed, 45 insertions(+), 42 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 18193efa23..f0847795a7 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -155,15 +155,13 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		 */
 		if (PageIsFull(page) || PageGetHeapFreeSpace(page) < minfree)
 		{
-			int			ndeleted,
-		nnewlpdead;
+			PruneResult presult;
 
-			ndeleted = heap_page_prune(relation, buffer, vistest,
-	   &nnewlpdead, NULL);
+			heap_page_prune(relation, buffer, vistest, );
 
 			/*
 			 * Report the number of tuples reclaimed to pgstats.  This is
-			 * ndeleted minus the number of newly-LP_DEAD-set items.
+			 * presult.ndeleted minus the number of newly-LP_DEAD-set items.
 			 *
 			 * We derive the number of dead tuples like this to avoid totally
 			 * forgetting about items that were set to LP_DEAD, since they
@@ -175,9 +173,9 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 			 * tracks ndeleted, since it will set the same LP_DEAD items to
 			 * LP_UNUSED separately.
 			 */
-			if (ndeleted > nnewlpdead)
+			if (presult.ndeleted > presult.nnewlpdead)
 pgstat_update_heap_dead_tuples(relation,
-			   ndeleted - nnewlpdead);
+			   presult.ndeleted - presult.nnewlpdead);
 		}
 
 		/* And release buffer lock */
@@ -204,21 +202,15 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
  * (see heap_prune_satisfies_vacuum and
  * HeapTupleSatisfiesVacuum).
  *
- * Sets *nnewlpdead for caller, indicating the number of items that were
- * newly set LP_DEAD during prune operation.
- *
- * off_loc is the offset location required by the caller to use in error
- * callback.
- *
- * Returns the number of tuples deleted 

Re: GUC for temporarily disabling event triggers

2023-09-06 Thread Daniel Gustafsson
> On 6 Sep 2023, at 16:22, Robert Haas  wrote:
> On Wed, Sep 6, 2023 at 4:50 AM Daniel Gustafsson  wrote:

>> Fair enough, how about disable_event_trigger instead as per the attached?
> 
> I usually prefer to give things a positive sense, talking about
> whether things are enabled rather than disabled. I'd do event_triggers
> = off | on, like we have for row_security. YMMV, though.

Fair enough, I don't have strong opinions and I do agree that making this work
like row_security is a good thing for consistency.  Done in the attached.

--
Daniel Gustafsson



v8-0001-Add-GUC-for-temporarily-disabling-event-triggers.patch
Description: Binary data


Re: Eager page freeze criteria clarification

2023-09-06 Thread Robert Haas
On Wed, Sep 6, 2023 at 12:20 PM Peter Geoghegan  wrote:
> This was a case where index vacuuming was never required. It's just a
> simple and easy to recreate example of what I think of as a more
> general problem.

OK.

> Why wouldn't we expect a table to have some pages that ought to be
> frozen right away, and others where freezing should in theory be put
> off indefinitely? I think that that's very common.

Oh, I see. I agree that that's pretty common. I think what that means
in practice is that we need to avoid relying too much on
relation-level statistics to guide behavior with respect to individual
relation pages. On the other hand, I don't think it means that we
can't look at relation-wide or even system-wide statistics at all.
Sure, those statistics may not be perfect, but some things are not
practical to track on a page granularity, and having some
course-grained information can, I think, be better than having nothing
at all, if you're careful about how much and in what way you rely on
it.

> As you know, I am particularly concerned about the tendency of
> unfrozen all-visible pages to accumulate without bound (at least
> without bound expressed in physical units such as pages). The very
> fact that pages are being set all-visible by VACUUM can be seen as a
> part of a high-level systemic problem -- a problem that plays out over
> time, across multiple VACUUM operations. So even if the cost of
> setting pages all-visible happened to be much lower than the cost of
> freezing (which it isn't), setting pages all-visible without freezing
> has unique downsides.

I generally agree with all of that.

> If VACUUM freezes too aggressively, then (pretty much by definition)
> we can be sure that the next VACUUM will scan the same pages -- there
> may be some scope for VACUUM to "learn from its mistake" when we err
> in the direction of over-freezing. But when VACUUM makes the opposite
> mistake (doesn't freeze when it should have), it won't scan those same
> pages again for a long time, by design. It therefore has no plausible
> way of "learning from its mistakes" before it becomes an extremely
> expensive and painful lesson (which happens whenever the next
> aggressive VACUUM takes place). This is in large part a consequence of
> the way that VACUUM dutifully sets pages all-visible whenever
> possible. That behavior interacts badly with many workloads, over
> time.

I think this is an insightful commentary with which I partially agree.
As I see it, the difference is that when you make the mistake of
marking something all-visible or freezing it too aggressively, you
incur a price that you pay almost immediately. When you make the
mistake of not marking something all-visible when it would have been
best to do so, you incur a price that you pay later, when the next
VACUUM happens. When you make the mistake of not marking something
all-frozen when it would have been best to do so, you incur a price
that you pay even later, not at the next VACUUM but at some VACUUM
further off. So there are different trade-offs. When you pay the price
for a mistake immediately or nearly immediately, it can potentially
harm the performance of the foreground workload, if you're making a
lot of mistakes. That sucks. On the other hand, when you defer paying
the price until some later bulk operation, the costs of all of your
mistakes get added up and then you pay the whole price all at once,
which means you can be suddenly slapped with an enormous bill that you
weren't expecting. That sucks, too, just in a different way.

> VACUUM simply ignores such second-order effects. Perhaps it would be
> practical to address some of the issues in this area by avoiding
> setting pages all-visible without freezing them, in some general
> sense. That at least creates a kind of symmetry between mistakes in
> the direction of under-freezing and mistakes in the direction of
> over-freezing. That might enable VACUUM to course-correct in either
> direction.
>
> Melanie is already planning on combining the WAL records (PRUNE,
> FREEZE_PAGE, and VISIBLE). Perhaps that'll weaken the argument for
> setting unfrozen pages all-visible even further.

Yeah, so I think the question here is whether it's ever a good idea to
mark a page all-visible without also freezing it. If it's not, then we
should either mark fewer pages all-visible, or freeze more of them.
Maybe I'm all wet here, but I think it depends on the situation. If a
page is already dirty and has had an FPI since the last checkpoint,
then it's pretty appealing to freeze whenever we mark all-visible. We
still have to consider whether the incremental CPU cost and WAL volume
are worth it, but assuming those costs are small enough not to be a
big problem, it seems like a pretty good bet. Making a page
un-all-visible has some cost, but making a page un-all-frozen really
doesn't, so cool. On the other hand, if we have a page that isn't
dirty, hasn't had a recent FPI, and doesn't need pruning, but which

Simple CustomScan example

2023-09-06 Thread Chris Cleveland
I'm trying to write a custom scan. It's pretty confusing. I've read the
documentation at https://www.postgresql.org/docs/current/custom-scan.html,
and I've scanned the code in Citus Columnar and in Timescale, both of which
are quite complex.

Is anyone aware of code with a simple example of a custom scan? Or maybe a
tutorial?

-- 
Chris Cleveland
312-339-2677 mobile


Re: Release notes wording about logical replication as table owner

2023-09-06 Thread Bruce Momjian
On Wed, Sep  6, 2023 at 09:29:25PM +0200, Magnus Hagander wrote:
> We have:
> 
> "This improves security and now requires subscription owners to be
> either superusers or to have SET ROLE permissions on all tables in the
> replication set. The previous behavior of performing all operations as
> the subscription owner can be enabled with the subscription
> run_as_owner option."
> 
> How does one have SET ROLE permissions on a table? I think that's
> supposed to be:
> 
> "subscription owners be either superusers or to have SET ROLE
> permissions on all roles owning tables in the replication set."
> 
> Or something like that? Or can someone suggest a better wording?

You are exactly correct.  Patch attached and applied.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/release-16.sgml b/doc/src/sgml/release-16.sgml
index 0fa0d25fe1..5f174e99f6 100644
--- a/doc/src/sgml/release-16.sgml
+++ b/doc/src/sgml/release-16.sgml
@@ -1818,7 +1818,7 @@ Author: Robert Haas 
This improves security and now requires subscription
owners to be either superusers or to have SET ROLE
-   permissions on all tables in the replication set.
+   permission on all roles owning tables in the replication set.
The previous behavior of performing all operations as the
subscription owner can be enabled with the subscription run_as_owner


Release notes wording about logical replication as table owner

2023-09-06 Thread Magnus Hagander
We have:

"This improves security and now requires subscription owners to be
either superusers or to have SET ROLE permissions on all tables in the
replication set. The previous behavior of performing all operations as
the subscription owner can be enabled with the subscription
run_as_owner option."

How does one have SET ROLE permissions on a table? I think that's
supposed to be:

"subscription owners be either superusers or to have SET ROLE
permissions on all roles owning tables in the replication set."

Or something like that? Or can someone suggest a better wording?

//Magnus




Re: Obsolete reference to pg_relation in comment

2023-09-06 Thread Bruce Momjian
On Wed, Jul 26, 2023 at 05:14:08PM -0400, Tom Lane wrote:
> Nathan Bossart  writes:
> > On Wed, Jul 26, 2023 at 06:48:51PM +0100, Dagfinn Ilmari Mannsåker wrote:
> >>   * All accesses to pg_largeobject and its index make use of a single 
> >> Relation
> >> - * reference, so that we only need to open pg_relation once per 
> >> transaction.
> >> + * reference, so that we only need to open pg_class once per transaction.
> >>   * To avoid problems when the first such reference occurs inside a
> >>   * subtransaction, we execute a slightly klugy maneuver to assign 
> >> ownership of
> >>   * the Relation reference to TopTransactionResourceOwner.
> 
> > Hm.  Are you sure this is actually referring to pg_class?  It seems
> > unlikely given pg_relation was renamed 14 years before this comment was
> > added, and the code appears to be ensuring that pg_largeobject and its
> > index are opened at most once per transaction.
> 
> I believe it is just a typo/thinko for pg_class, but there's more not
> to like about this comment.  First, once we've made a relcache entry
> it would typically stay valid across uses, so it's far from clear that
> this coding actually prevents many catalog accesses in typical cases.
> Second, when we do have to rebuild the relcache entry, there's a lot
> more involved than just a pg_class fetch; we at least need to read
> pg_attribute, and I think there may be other catalogs that we'd read
> along the way, even for a system catalog that lacks complicated
> features.  (pg_index would presumably get looked at, for instance.)
> 
> I think we should reword this to just generically claim that holding
> the Relation reference open for the whole transaction reduces overhead.

How is this attached patch?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/src/backend/storage/large_object/inv_api.c b/src/backend/storage/large_object/inv_api.c
new file mode 100644
index 84e543e..cc9c335
*** a/src/backend/storage/large_object/inv_api.c
--- b/src/backend/storage/large_object/inv_api.c
***
*** 58,68 
  bool		lo_compat_privileges;
  
  /*
!  * All accesses to pg_largeobject and its index make use of a single Relation
!  * reference, so that we only need to open pg_relation once per transaction.
!  * To avoid problems when the first such reference occurs inside a
!  * subtransaction, we execute a slightly klugy maneuver to assign ownership of
!  * the Relation reference to TopTransactionResourceOwner.
   */
  static Relation lo_heap_r = NULL;
  static Relation lo_index_r = NULL;
--- 58,68 
  bool		lo_compat_privileges;
  
  /*
!  * All accesses to pg_largeobject and its index make use of a single
!  * Relation reference.  To guarantee that the relcache entry remains
!  * in the cache, on the first reference inside a subtransaction, we
!  * execute a slightly klugy maneuver to assign ownership of the
!  * Relation reference to TopTransactionResourceOwner.
   */
  static Relation lo_heap_r = NULL;
  static Relation lo_index_r = NULL;


Re: information_schema and not-null constraints

2023-09-06 Thread Vik Fearing

On 9/6/23 05:40, Tom Lane wrote:
> Vik Fearing  writes:
>> On 9/6/23 02:53, Tom Lane wrote:
>>> What solution do you propose?  Starting to enforce the spec's rather
>>> arbitrary requirement that constraint names be unique per-schema is
>>> a complete nonstarter.  Changing the set of columns in a spec-defined
>>> view is also a nonstarter, or at least we've always taken it as such.
>>
>> I both semi-agree and semi-disagree that these are nonstarters.  One of
>> them has to give.
>
> [ shrug... ] if you stick to a SQL-compliant schema setup, then the
> information_schema views will serve for introspection.  If you don't,
> they won't, and you'll need to look at Postgres-specific catalog data.

As someone who regularly asks people to cite chapter and verse of the
standard, do you not see this as a problem?

If there is /one thing/ I wish we were 100% compliant on, it's
information_schema.

> This compromise has served for twenty years or so, and I'm not in a
> hurry to change it.

Has it?  Or is this just the first time someone has complained?

> I think the odds of changing to the spec's
> restriction without enormous pushback are nil, and I do not think
> that the benefit could possibly be worth the ensuing pain to users.

That is a valid opinion, and probably one that will win out for quite a
while.

> (It's not even the absolute pain level that is a problem, so much
> as the asymmetry: the pain would fall exclusively on users who get
> no benefit, because they weren't relying on these views anyway.
> If you think that's an easy sell, you're mistaken.)

I am curious how many people we are selling this to.  In my career as a
consultant, I have never once come across anyone specifying their own
constraint names.  That is certainly anecdotal, and by no means means it
doesn't happen, but my personal experience says that it is very low.

And since our generated names obey the spec (see ChooseConstraintName()
which even says some apps depend on this), I don't see making this
change being a big problem in the real world.

Mind you, I am not pushing (right now) to make this change; I am just
saying that it is the right thing to do.

> It could possibly be a little more palatable to add column(s) to the
> information_schema views, but I'm having a hard time seeing how that
> moves the needle.  The situation would still be precisely describable
> as "if you stick to a SQL-compliant schema setup, then the standard
> columns of the information_schema views will serve for introspection.
> If you don't, they won't, and you'll need to look at Postgres-specific
> columns".  That doesn't seem like a big improvement.  Also, given your
> point about normalization, how would we define the additions exactly?

This is precisely my point.
--
Vik Fearing





Re: Configurable FP_LOCK_SLOTS_PER_BACKEND

2023-09-06 Thread Jeremy Schneider
On 8/8/23 3:04 PM, Andres Freund wrote:
> On 2023-08-08 16:44:37 -0400, Robert Haas wrote:
>> On Mon, Aug 7, 2023 at 6:05 PM Andres Freund  wrote:
>>> I think the biggest flaw of the locking scheme is that the LockHash locks
>>> protect two, somewhat independent, things:
>>> 1) the set of currently lockable objects, i.e. the entries in the hash 
>>> table [partition]
>>> 2) the state of all the locks [in a partition]
>>>
>>> It'd not be that hard to avoid the shared hashtable lookup in a number of
>>> cases, e.g. by keeping LOCALLOCK entries around for longer, as I suggest
>>> above.  But we can't, in general, avoid the lock on the partition anyway, as
>>> the each lock's state is also protected by the partition lock.
>>
>> Yes, and that's a huge problem. The main selling point of the whole
>> fast-path mechanism is to ease the pressure on the lock manager
>> partition locks, and if we did something like what you described in
>> the previous email without changing the locking regimen, we'd bring
>> all of that contention back. I'm pretty sure that would suck.
> 
> Yea - I tried to outline how I think we could implement the fastpath locking
> scheme in a less limited way in the earlier email, that I had quoted above
> this bit.  Here I was pontificating on what we possibly should do in addition
> to that. I think even if we had "unlimited" fastpath locking, there's still
> enough pressure on the lock manager locks that it's worth improving the
> overall locking scheme.


Has anyone considered whether increasing NUM_LOCK_PARTITIONS to
something bigger than 16 might offer cheap/easy/small short-term
improvements while folks continue to think about the bigger long-term ideas?

cf.
https://www.postgresql.org/message-id/flat/VI1PR05MB62031A41186ACC3FC91ACFC70%40VI1PR05MB6206.eurprd05.prod.outlook.com

I haven't looked deeply into it myself yet. Didn't see a mention in this
thread or in Matt's gitlab research ticket. Maybe it doesn't actually
help. Anyway Alexander Pyhalov's email about LWLock optimization and
NUM_LOCK_PARTITIONS is out there, and I wondered about this.
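
For concreteness, the experiment amounts to bumping the define in
src/include/storage/lwlock.h along these lines (shown only to make the
question concrete, not as a recommendation):

  /* currently LOG2_NUM_LOCK_PARTITIONS is 4, i.e. 16 partitions */
  #define LOG2_NUM_LOCK_PARTITIONS  6   /* 64 partitions */
  #define NUM_LOCK_PARTITIONS  (1 << LOG2_NUM_LOCK_PARTITIONS)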

-Jeremy


-- 
Jeremy Schneider
Performance Engineer
Amazon Web Services





Re: Can a role have indirect ADMIN OPTION on another role?

2023-09-06 Thread Robert Haas
On Wed, Sep 6, 2023 at 1:33 PM Ashutosh Sharma  wrote:
> Actually I have one more question. With this new design, assuming that
> createrole_self_grant is set to 'set, inherit' in postgresql.conf and
> if roleA creates roleB. So, in this case, roleA will inherit
> permissions of roleB which means roleA will have access to objects
> owned by roleB. But what if roleB doesn't want to give roleA access to
> the certain objects it owns. As an example let's say that roleB
> creates a table 't' and by default (with this setting) roleA will have
> access to this table, but for some reason roleB does not want roleA to
> have access to it. So what's the option for roleB? I've tried running
> "revoke select on table t from roleA" but that doesn't seem to be
> working. the only option that works is roleA himself set inherit
> option on roleB to false - "grant roleB to roleA with inherit false;"

It doesn't matter what roleB wants. roleA is strictly more powerful
than roleB and can do whatever they want to roleB or roleB's objects
regardless of how roleB feels about it.

In the same way, the superuser is strictly more powerful than either
roleA or roleB and can override any security control that either one
of them put in place.

Neither roleB nor roleA has any right to hide their data from the
superuser, and roleB has no right to hide data from roleA. It's a
hierarchy. If you're on top, you're in charge, and that's it.

Here again, it can't really meaningfully work in any other way.
Suppose you were to add a feature to allow roleB to hide data from
roleA. Given that roleA has the ability to change roleB's password,
how could that possibly work? When you give one user the ability to
administer another user, that includes the right to change that user's
password, change whether they can log in, drop the role, give the
privileges of that role to themselves or other users, and a whole
bunch of other super-powerful stuff. You can't really give someone
that level of power over another account and, at the same time, expect
the account being administered to be able to keep the more powerful
account from doing stuff. It just can't possibly work. If you want
roleB to be able to resist roleA, you have to give roleA less power.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: information_schema and not-null constraints

2023-09-06 Thread Alvaro Herrera
On 2023-Sep-04, Alvaro Herrera wrote:

> In reference to [1], 0001 attached to this email contains the updated
> view definitions that I propose.

Given the downthread discussion, I propose the attached.  There are no
changes to v2, other than dropping the test part.

We can improve the situation for domains separately and likewise for
testing.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
>From 870533e728cbbcc878cf10cef12b3357a51db5fd Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 4 Sep 2023 18:05:50 +0200
Subject: [PATCH v3] Update information_schema definition for not-null
 constraints
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Now that we have catalogued not-null constraints, our information_schema
definition must be updated to grab those rather than fabricate synthetic
definitions.

Note that we still don't have catalog rows for not-null constraints on
domains, but we've never had not-null constraints listed in
information_schema, so that's a problem to be solved separately.

Co-authored-by: Peter Eisentraut 
Co-authored-by: Álvaro Herrera 
Discussion: https://postgr.es/m/202309041710.psytrxlsiqex@alvherre.pgsql
---
 src/backend/catalog/information_schema.sql | 74 --
 src/include/catalog/catversion.h   |  2 +-
 2 files changed, 28 insertions(+), 48 deletions(-)

diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index a06ec7a0a8..c402cca7f4 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -444,22 +444,19 @@ CREATE VIEW check_constraints AS
 WHERE pg_has_role(coalesce(c.relowner, t.typowner), 'USAGE')
   AND con.contype = 'c'
 
-UNION
+UNION ALL
 -- not-null constraints
-
-SELECT CAST(current_database() AS sql_identifier) AS constraint_catalog,
-   CAST(n.nspname AS sql_identifier) AS constraint_schema,
-   CAST(CAST(n.oid AS text) || '_' || CAST(r.oid AS text) || '_' || CAST(a.attnum AS text) || '_not_null' AS sql_identifier) AS constraint_name, -- XXX
-   CAST(a.attname || ' IS NOT NULL' AS character_data)
- AS check_clause
-FROM pg_namespace n, pg_class r, pg_attribute a
-WHERE n.oid = r.relnamespace
-  AND r.oid = a.attrelid
-  AND a.attnum > 0
-  AND NOT a.attisdropped
-  AND a.attnotnull
-  AND r.relkind IN ('r', 'p')
-  AND pg_has_role(r.relowner, 'USAGE');
+SELECT current_database()::information_schema.sql_identifier AS constraint_catalog,
+   rs.nspname::information_schema.sql_identifier AS constraint_schema,
+   con.conname::information_schema.sql_identifier AS constraint_name,
+   format('CHECK (%s IS NOT NULL)', at.attname)::information_schema.character_data AS check_clause
+ FROM pg_constraint con
+LEFT JOIN pg_namespace rs ON rs.oid = con.connamespace
+LEFT JOIN pg_class c ON c.oid = con.conrelid
+LEFT JOIN pg_type t ON t.oid = con.contypid
+LEFT JOIN pg_attribute at ON (con.conrelid = at.attrelid AND con.conkey[1] = at.attnum)
+ WHERE pg_has_role(coalesce(c.relowner, t.typowner), 'USAGE'::text)
+   AND con.contype = 'n';
 
 GRANT SELECT ON check_constraints TO PUBLIC;
 
@@ -826,6 +823,20 @@ CREATE VIEW constraint_column_usage AS
 AND r.relkind IN ('r', 'p')
 AND NOT a.attisdropped
 
+	UNION ALL
+
+	/* not-null constraints */
+SELECT DISTINCT nr.nspname, r.relname, r.relowner, a.attname, nc.nspname, c.conname
+  FROM pg_namespace nr, pg_class r, pg_attribute a, pg_namespace nc, pg_constraint c
+  WHERE nr.oid = r.relnamespace
+AND r.oid = a.attrelid
+AND r.oid = c.conrelid
+AND a.attnum = c.conkey[1]
+AND c.connamespace = nc.oid
+AND c.contype = 'n'
+AND r.relkind in ('r', 'p')
+AND not a.attisdropped
+
 UNION ALL
 
 /* unique/primary key/foreign key constraints */
@@ -1828,6 +1839,7 @@ CREATE VIEW table_constraints AS
CAST(r.relname AS sql_identifier) AS table_name,
CAST(
  CASE c.contype WHEN 'c' THEN 'CHECK'
+WHEN 'n' THEN 'CHECK'
 WHEN 'f' THEN 'FOREIGN KEY'
 WHEN 'p' THEN 'PRIMARY KEY'
 WHEN 'u' THEN 'UNIQUE' END
@@ -1852,38 +1864,6 @@ CREATE VIEW table_constraints AS
   AND c.contype NOT IN ('t', 'x')  -- ignore nonstandard constraints
   AND r.relkind IN ('r', 'p')
   AND (NOT pg_is_other_temp_schema(nr.oid))
-  AND (pg_has_role(r.relowner, 'USAGE')
-   -- SELECT privilege omitted, per SQL standard
-   OR has_table_privilege(r.oid, 'INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER')
-   OR 

Re: Can a role have indirect ADMIN OPTION on another role?

2023-09-06 Thread Ashutosh Sharma
On Wed, Sep 6, 2023 at 9:03 PM Robert Haas  wrote:
>
> On Wed, Sep 6, 2023 at 11:14 AM Ashutosh Sharma  wrote:
> > In PG-16, I see that we have made a lot of changes in the area roles
> > and privileges. I have a question related to this and here is my
> > question:
> >
> > Let's say there is a roleA who creates roleB and then roleB creates
> > another role, say roleC. By design, A can administer B and B can
> > administer C. But, can A administer C although it has not created C?
>
> Ultimately, yes, because A can get access to all of B's privileges,
> which include administering C. However, A might or might not have B's
> privileges by default, depending on the value of createrole_self_grant
> in effect at the time when B was created. So, depending on the
> situation, A might (or might not) need to do something like GRANT
> roleB to roleA or SET ROLE roleB in order to be able to actually
> execute the administration commands in question.
>
> IMHO, it really couldn't reasonably work in any other way. Consider
> that A's right to administer B includes the right to change B's
> password. If the superuser wants users A and B that can't interfere
> with each other, the superuser should create both of those accounts
> themselves instead of letting one create the other.
>

Thank you for the clarification. This is very helpful.

Actually I have one more question. With this new design, assuming that
createrole_self_grant is set to 'set, inherit' in postgresql.conf, if
roleA creates roleB then roleA will inherit roleB's privileges, which
means roleA will have access to objects owned by roleB. But what if
roleB doesn't want to give roleA access to certain objects it owns? As
an example, say roleB creates a table 't'; by default (with this
setting) roleA will have access to this table, but for some reason
roleB does not want roleA to have access to it. So what's the option
for roleB? I've tried running "revoke select on table t from roleA",
but that doesn't seem to be working. The only option that works is for
roleA himself to set the inherit option on its membership in roleB to
false - "grant roleB to roleA with inherit false;"
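
A minimal sketch of the scenario, assuming the roles are allowed to create
objects in the current schema (all names are illustrative):

-- as a superuser
CREATE ROLE roleA LOGIN CREATEROLE;

-- as roleA, with the GUC in effect, create roleB; roleA is then
-- implicitly granted membership in roleB with SET and INHERIT
SET ROLE roleA;
SET createrole_self_grant = 'set, inherit';
CREATE ROLE roleB LOGIN;

-- as roleB, create a table
SET ROLE roleB;
CREATE TABLE t (n int);

-- roleA can read t through the inherited membership, and
-- REVOKE SELECT ON t FROM roleA changes nothing because roleA holds no
-- table-level grant; the workaround mentioned above is:
RESET ROLE;
SET ROLE roleA;
GRANT roleB TO roleA WITH INHERIT FALSE;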

--
With Regards,
Ashutosh Sharma.




Re: Eliminate redundant tuple visibility check in vacuum

2023-09-06 Thread Robert Haas
On Thu, Aug 31, 2023 at 6:29 PM Melanie Plageman
 wrote:
> I have changed this.

I spent a bunch of time today looking at this, thinking maybe I could
commit it. But then I got cold feet.

With these patches applied, PruneResult ends up being declared in
heapam.h, with a comment that says /* State returned from pruning */.
But that comment isn't accurate. The two new members that get added to
the structure by 0002, namely nnewlpdead and htsv, are in fact state
that is returned from pruning. But the other 5 members aren't. They're
just initialized to constant values by pruning and then filled in for
real by the vacuum logic. That's extremely weird. It would be fine if
heap_page_prune() just grew a new output argument that only returned
the HTSV results, or perhaps it could make sense to bundle any
existing out parameters together into a struct and then add new things
to that struct instead of adding even more parameters to the function
itself. But there doesn't seem to be any good reason to muddle
together the new output parameters for heap_page_prune() with a bunch
of state that is currently internal to vacuumlazy.c.
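
To illustrate the second option, a rough sketch of what that bundling could
look like (struct and field names here are hypothetical, not taken from the
patch, and the existing heap_page_prune() parameters are elided):

/*
 * Hypothetical bundle containing only what pruning itself computes; the
 * per-VACUUM counters would stay private to vacuumlazy.c.
 */
typedef struct PruneOutputs
{
	int			ndeleted;		/* tuples removed by pruning */
	int			nnewlpdead;		/* newly LP_DEAD line pointers */
	int8		htsv[MaxHeapTuplesPerPage + 1]; /* HTSV result per offset */
} PruneOutputs;

extern void heap_page_prune(Relation relation, Buffer buffer,
							struct GlobalVisState *vistest,
							PruneOutputs *outputs);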

I realize that the shape of the patches probably stems from the fact
that they started out life as part of a bigger patch set. But to be
committed independently, they need to be shaped in a way that makes
sense independently, and I don't think this qualifies. On the plus
side, it seems to me that it's probably not that much work to fix this
issue and that the result would likely be a smaller patch than what
you have now, which is something.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-06 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

Thank you for reviewing! 

> 
> I also think referring to the conflicting field would be better not
> only for the purpose of avoiding extra code but also to give accurate
> information about invalidated slots for which we want to give an
> error.

Fixed.

> Additionally, I think we should try to avoid writing a new function
> strtoLSN as that adds a maintainability burden. We can probably send
> the value fetched from pg_controldata in the query for comparison with
> confirmed_flush LSN.

Changed like that. LogicalSlotInfo was also updated to have the Boolean.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-06 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

> 
> I think it would be better to write problematic slots in the script
> file like we are doing in the function
> check_for_composite_data_type_usage()->check_for_data_types_usage()
> and give a message suggesting what the user can do as we are doing in
> check_for_composite_data_type_usage(). That will be helpful for the
> user to take necessary action.

Done. I wondered how we should output the list of slots, because there are two
types of problems, but for now I used the same file for both. If you have a
better approach, please let me know.

> A few other comments:
> =
> 1.
> @@ -189,6 +199,8 @@ check_new_cluster(void)
>  {
>   get_db_and_rel_infos(_cluster);
>   get_db_and_rel_infos(&new_cluster);
> + check_new_cluster_logical_replication_slots();
> +
>   check_new_cluster_is_empty();
> 
>   check_loadable_libraries();
> 
> Why check_new_cluster_logical_replication_slots is done before
> check_new_cluster_is_empty? At least check_new_cluster_is_empty()
> would be much quicker to return an error if any. I think if we don't
> have a specific reason to position this new check, we can do it at the
> end after check_for_new_tablespace_dir() to avoid breaking the order
> of existing checks.

Moved to the bottom.

> 2. Shall we rename get_db_and_rel_infos() to
> get_db_rel_and_slot_infos() or something like that as that function
> now fetches the slot information as well?

Fixed. Comments were updated as well.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Re: Debian 12 gcc warning

2023-09-06 Thread Bruce Momjian
On Wed, Aug 30, 2023 at 11:34:22AM -0400, Bruce Momjian wrote:
> Good news, I was able to prevent the bug by causing compiling of
> clauses.c to use -O2 by adding this to src/Makefile.custom:
> 
>   clauses.o : CFLAGS+=-O2
> 
> Here is my submitted bug report:
> 
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111240

I got this reply on the bug report:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111240#c5

Richard Biener 2023-08-31 09:46:44 UTC

Confirmed.

rettype_58 = enforce_generic_type_consistency (_arg_types, 
_arg_types, 0, _56, 0);

and we reach this on the args == 0 path where indeed actual_arg_types
is uninitialized and our heuristic says that a const qualified pointer
is an input and thus might be read.  So you get a maybe-uninitialized
diagnostic at the call.

GCC doesn't know that the 'nargs' argument relates to the array and
that at most 'nargs' (zero here) arguments are inspected.

So I think it works as designed, we have some duplicate bugreports
complaining about this "heuristic".

We are exposing this to ourselves by optimizing the args == 0 case
(skipping the initialization loop and constant propagating the
nargs argument).  Aka jump-threading.

I think we just have to assume this incorrect warning will be around for
a while.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Eager page freeze criteria clarification

2023-09-06 Thread Peter Geoghegan
On Wed, Sep 6, 2023 at 7:46 AM Robert Haas  wrote:
> It's interesting that when you raised the threshold the rate of
> vacuuming dropped to zero. I haven't seen that behavior. pgbench does
> rely very, very heavily on HOT-pruning, so VACUUM only cleans up a
> small percentage of the dead tuples that get created. However, it's
> still needed for index vacuuming, and I'm not sure what would cause it
> not to trigger for that purpose.

This was a case where index vacuuming was never required. It's just a
simple and easy to recreate example of what I think of as a more
general problem.

> > > And I'm not sure why we
> > > want to do that. If the table is being vacuumed a lot, it's probably
> > > also being modified a lot, which suggests that we ought to be more
> > > cautious about freezing, rather than the reverse.
> >
> > Why wouldn't it be both things at the same time, for the same table?
>
> Both of what things?

Why wouldn't we expect a table to have some pages that ought to be
frozen right away, and others where freezing should in theory be put
off indefinitely? I think that that's very common.

> > Why not also avoid setting pages all-visible? The WAL records aren't
> > too much smaller than most freeze records these days -- 64 bytes on
> > most systems. I realize that the rules for FPIs are a bit different
> > when page-level checksums aren't enabled, but fundamentally it's the
> > same situation. No?
>
> It's an interesting point. AFAIK, whether or not page-level checksums
> are enabled doesn't really matter here. But it seems fair to ask - if
> freezing is too aggressive, why is setting all-visible not also too
> aggressive? I don't have a good answer to that question right now.

As you know, I am particularly concerned about the tendency of
unfrozen all-visible pages to accumulate without bound (at least
without bound expressed in physical units such as pages). The very
fact that pages are being set all-visible by VACUUM can be seen as a
part of a high-level systemic problem -- a problem that plays out over
time, across multiple VACUUM operations. So even if the cost of
setting pages all-visible happened to be much lower than the cost of
freezing (which it isn't), setting pages all-visible without freezing
has unique downsides.

If VACUUM freezes too aggressively, then (pretty much by definition)
we can be sure that the next VACUUM will scan the same pages -- there
may be some scope for VACUUM to "learn from its mistake" when we err
in the direction of over-freezing. But when VACUUM makes the opposite
mistake (doesn't freeze when it should have), it won't scan those same
pages again for a long time, by design. It therefore has no plausible
way of "learning from its mistakes" before it becomes an extremely
expensive and painful lesson (which happens whenever the next
aggressive VACUUM takes place). This is in large part a consequence of
the way that VACUUM dutifully sets pages all-visible whenever
possible. That behavior interacts badly with many workloads, over
time.

VACUUM simply ignores such second-order effects. Perhaps it would be
practical to address some of the issues in this area by avoiding
setting pages all-visible without freezing them, in some general
sense. That at least creates a kind of symmetry between mistakes in
the direction of under-freezing and mistakes in the direction of
over-freezing. That might enable VACUUM to course-correct in either
direction.

Melanie is already planning on combining the WAL records (PRUNE,
FREEZE_PAGE, and VISIBLE). Perhaps that'll weaken the argument for
setting unfrozen pages all-visible even further.

-- 
Peter Geoghegan




Re: Can a role have indirect ADMIN OPTION on another role?

2023-09-06 Thread Robert Haas
On Wed, Sep 6, 2023 at 11:14 AM Ashutosh Sharma  wrote:
> In PG-16, I see that we have made a lot of changes in the area roles
> and privileges. I have a question related to this and here is my
> question:
>
> Let's say there is a roleA who creates roleB and then roleB creates
> another role, say roleC. By design, A can administer B and B can
> administer C. But, can A administer C although it has not created C?

Ultimately, yes, because A can get access to all of B's privileges,
which include administering C. However, A might or might not have B's
privileges by default, depending on the value of createrole_self_grant
in effect at the time when B was created. So, depending on the
situation, A might (or might not) need to do something like GRANT
roleB to roleA or SET ROLE roleB in order to be able to actually
execute the administration commands in question.

IMHO, it really couldn't reasonably work in any other way. Consider
that A's right to administer B includes the right to change B's
password. If the superuser wants users A and B that can't interfere
with each other, the superuser should create both of those accounts
themselves instead of letting one create the other.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: remaining sql/json patches

2023-09-06 Thread Alvaro Herrera
0001 is quite mysterious to me.  I've been reading it but I'm not sure I
grok it, so I don't have anything too intelligent to say about it at
this point.  But here are my thoughts anyway.

Assert()ing that a pointer is not null, and in the next line
dereferencing that pointer, is useless: the process would crash anyway
at the time of dereference, so the Assert() adds no value.  Better to
leave the assert out.  (This appears both in ExecExprEnableErrorSafe and
ExecExprDisableErrorSafe).

Is it not a problem to set just the node type, and not reset the
contents of the node to zeroes, in ExecExprEnableErrorSafe?  I'm not
sure if it's possible to enable error-safe on a node two times with an
error reported in between; would that result in the escontext filled
with junk the second time around?  That might be dangerous.  Maybe a
simple cross-check is to verify (assert) in ExecExprEnableErrorSafe()
that the struct is already all-zeroes, so that if this happens, we'll
get reports about it.  (After all, there are very few nodes that handle
the SOFT_ERROR_OCCURRED case).

Do we need to have the ->details_wanted flag turned on?  Maybe if we're
having ExecExprEnableErrorSafe() as a generic tool, it should receive
the boolean to use as an argument.

Why palloc the escontext always, and not just when
ExecExprEnableErrorSafe is called?  (At Disable time, just memset it to
zero, and next time it is enabled for that node, we don't need to
allocate it again, just set the nodetype.)
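
A rough sketch of that lazy-allocation idea, assuming ExprState carries an
escontext pointer as in the patch under discussion (this is an illustration
of the suggestion, not the patch's actual code):

void
ExecExprEnableErrorSafe(ExprState *state)
{
	if (state->escontext == NULL)
		state->escontext = makeNode(ErrorSaveContext);	/* zeroed, tag set */
	else
		state->escontext->type = T_ErrorSaveContext;	/* reuse, was memset */
}

void
ExecExprDisableErrorSafe(ExprState *state)
{
	/* wipe any saved error state so a later Enable starts from a clean slate */
	memset(state->escontext, 0, sizeof(ErrorSaveContext));
}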

ExecExprEnableErrorSafe() is a strange name for this operation.  Maybe
you mean ExecExprEnableSoftErrors()?  Maybe it'd be better to leave it
as NULL initially, so that for the majority of cases we don't even
allocate it.

In 0002 you're adding soft-error support for a bunch of existing
operations, in addition to introducing SQL/JSON query functions.  Maybe
the soft-error stuff should be done separately in a preparatory patch.

I think functions such as populate_array_element() that can now save
soft errors and which currently do not have a return value, should
acquire a convention to let caller know that things failed: maybe return
false if SOFT_ERROR_OCCURRED().  Otherwise it appears that, for instance
populate_array_dim_jsonb() can return happily if an error occurs when
parsing the last element in the array.  Splitting 0002 to have a
preparatory patch where all such soft-error-saving changes are
introduced separately would help review that this is indeed being
handled by all their callers.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Por suerte hoy explotó el califont porque si no me habría muerto
 de aburrido"  (Papelucho)




Re: Eager page freeze criteria clarification

2023-09-06 Thread Robert Haas
On Fri, Sep 1, 2023 at 9:07 PM Peter Geoghegan  wrote:
> When I reported this a couple of years ago, I noticed that autovacuum
> would spin whenever I set autovacuum_vacuum_scale_factor to 0.02. But
> autovacuum would *never* run (outside of antiwraparound autovacuums)
> when it was set just a little higher (perhaps 0.03 or 0.04). So there
> was some inflection point at which its behavior totally changed.

I think that tables have a "natural" amount of bloat that is a
function of the workload. If you remove all the bloat by, say, running
VACUUM FULL, they quickly bloat again to that point, and then
stabilize (hopefully). It seems like if you set the scale factor to a
value below that "natural" amount of bloat, you might well spin,
especially on a very write-heavy workload like pgbench, because you're
basically trying to squeeze water from a stone on that point.

It's interesting that when you raised the threshold the rate of
vacuuming dropped to zero. I haven't seen that behavior. pgbench does
rely very, very heavily on HOT-pruning, so VACUUM only cleans up a
small percentage of the dead tuples that get created. However, it's
still needed for index vacuuming, and I'm not sure what would cause it
not to trigger for that purpose.

> > And I'm not sure why we
> > want to do that. If the table is being vacuumed a lot, it's probably
> > also being modified a lot, which suggests that we ought to be more
> > cautious about freezing, rather than the reverse.
>
> Why wouldn't it be both things at the same time, for the same table?

Both of what things?

> Why not also avoid setting pages all-visible? The WAL records aren't
> too much smaller than most freeze records these days -- 64 bytes on
> most systems. I realize that the rules for FPIs are a bit different
> when page-level checksums aren't enabled, but fundamentally it's the
> same situation. No?

It's an interesting point. AFAIK, whether or not page-level checksums
are enabled doesn't really matter here. But it seems fair to ask - if
freezing is too aggressive, why is setting all-visible not also too
aggressive? I don't have a good answer to that question right now.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Eager page freeze criteria clarification

2023-09-06 Thread Robert Haas
On Wed, Sep 6, 2023 at 1:09 AM Andres Freund  wrote:
> Yea, it'd be useful to have a reasonably approximate wall clock time for the
> last modification of a page. We just don't have infrastructure for determining
> that. We'd need an LSN->time mapping (xid->time wouldn't be particularly
> useful, I think).
>
> A very rough approximate modification time can be computed by assuming an even
> rate of WAL generation, and using the LSN at the time of the last vacuum and
> the time of the last vacuum, to compute the approximate age.
>
> For a while I thought that'd not give us anything that just using LSNs gives
> us, but I think it might allow coming up with a better cutoff logic: Instead
> of using a cutoff like "page LSN is older than 10% of the LSNs since the last
> vacuum of the table", it would allow us to approximate "page has not been
> modified in the last 15 seconds" or such.  I think that might help avoid
> unnecessary freezing on tables with very frequent vacuuming.

Yes. I'm uncomfortable with the last-vacuum-LSN approach mostly
because of the impact on very frequently vacuumed tables, and
secondarily because of the impact on very infrequently vacuumed
tables.

Downthread, I proposed using the RedoRecPtr of the latest checkpoint
rather than the LSN of the previous vacuum. I still like that idea.
It's a value that we already have, with no additional bookkeeping. It
varies over a much narrower range than the interval between vacuums on
a table. The vacuum interval could be as short as tens of seconds as
long as years, while the checkpoint interval is almost always going to
be between a few minutes at the low end and some tens of minutes at
the high end, hours at the very most. That's quite appealing. Also, I
think the time between checkpoints actually matters here, because in
some sense we're looking to get dirty, already-FPI'd pages frozen
before they get written out, or before a new FPI becomes necessary,
and checkpoints are one way for the first of those things to happen
and the only way for the second one to happen.
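
For reference, a back-of-the-envelope sketch of the constant-WAL-rate
approximation described upthread (purely illustrative, not code from any
posted patch):

/*
 * Approximate a page's age by assuming WAL has been generated at a
 * constant rate since the table's last vacuum.
 */
static double
approx_page_age_secs(XLogRecPtr page_lsn, XLogRecPtr current_lsn,
					 XLogRecPtr last_vacuum_lsn, TimestampTz last_vacuum_time,
					 TimestampTz now)
{
	double		secs_since_vacuum = (now - last_vacuum_time) / 1000000.0;
	double		wal_bytes_per_sec =
		(current_lsn - last_vacuum_lsn) / Max(secs_since_vacuum, 1.0);

	/* WAL written since the page was last touched, divided by the rate */
	return (current_lsn - page_lsn) / Max(wal_bytes_per_sec, 1.0);
}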

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: GUC for temporarily disabling event triggers

2023-09-06 Thread Robert Haas
On Wed, Sep 6, 2023 at 4:50 AM Daniel Gustafsson  wrote:
> > On 5 Sep 2023, at 17:29, Robert Haas  wrote:
> > On Tue, Sep 5, 2023 at 8:12 AM Daniel Gustafsson  wrote:
>
> >> The attached version of the patch replaces it with a boolean flag for 
> >> turning
> >> off all event triggers, and I also renamed it to the debug_xxx "GUC 
> >> namespace"
> >> which seemed more appropriate.
> >
> > I don't care for the debug_xxx renaming, myself. I think that should
> > be reserved for things where we believe there's no reason to ever use
> > it in production/real life, or for things whose purpose is to emit
> > debugging messages. Neither is the case here.
>
> Fair enough, how about disable_event_trigger instead as per the attached?

I usually prefer to give things a positive sense, talking about
whether things are enabled rather than disabled. I'd do event_triggers
= off | on, like we have for row_security. YMMV, though.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: generic plans and "initial" pruning

2023-09-06 Thread Robert Haas
On Wed, Sep 6, 2023 at 5:12 AM Amit Langote  wrote:
> Attached updated patches.  Thanks for the review.

I think 0001 looks ready to commit. I'm not sure that the commit
message needs to mention future patches here, since this code cleanup
seems like a good idea regardless, but if you feel otherwise, fair
enough.

On 0002, some questions:

- In ExecEndLockRows, is the call to EvalPlanQualEnd a concern? i.e.
Does that function need any adjustment?
- In ExecEndMemoize, should there be a null-test around
MemoryContextDelete(node->tableContext) as we have in
ExecEndRecursiveUnion, ExecEndSetOp, etc.?

I wonder how we feel about setting pointers to NULL after freeing the
associated data structures. The existing code isn't consistent about
doing that, and making it do so would be a fairly large change that
would bloat this patch quite a bit. On the other hand, I think it's a
good practice as a general matter, and we do do it in some ExecEnd
functions.

On 0003, I have some doubt about whether we really have all the right
design decisions in detail here:

- Why have this weird rule where sometimes we return NULL and other
times the planstate? Is there any point to such a coding rule? Why not
just always return the planstate?

- Is there any point to all of these early exit cases? For example, in
ExecInitBitmapAnd, why exit early if initialization fails? Why not
just plunge ahead and if initialization failed the caller will notice
that and when we ExecEndNode some of the child node pointers will be
NULL but who cares? The obvious disadvantage of this approach is that
we're doing a bunch of unnecessary initialization, but we're also
speeding up the common case where we don't need to abort by avoiding a
branch that will rarely be taken. I'm not quite sure what the right
thing to do is here.

- The cases where we call ExecGetRangeTableRelation or
ExecOpenScanRelation are a bit subtler ... maybe initialization that
we're going to do later is going to barf if the tuple descriptor of
the relation isn't what we thought it was going to be. In that case it
becomes important to exit early. But if that's not actually a problem,
then we could apply the same principle here also -- don't pollute the
code with early-exit cases, just let it do its thing and sort it out
later. Do you know what the actual problems would be here if we didn't
exit early in these cases?

- Depending on the answers to the above points, one thing we could
think of doing is put an early exit case into ExecInitNode itself: if
(unlikely(!ExecPlanStillValid(whatever)) return NULL. Maybe Andres or
someone is going to argue that that checks too often and is thus too
expensive, but it would be a lot more maintainable than having similar
checks strewn throughout the ExecInit* functions. Perhaps it deserves
some thought/benchmarking. More generally, if there's anything we can
do to centralize these checks in fewer places, I think that would be
worth considering. The patch isn't terribly large as it stands, so I
don't necessarily think that this is a critical issue, but I'm just
wondering if we can do better. I'm not even sure that it would be too
expensive to just initialize the whole plan always, and then just do
one test at the end. That's not OK if the changed tuple descriptor (or
something else) is going to crash or error out in a funny way or
something before initialization is completed, but if it's just going
to result in burning a few CPU cycles in a corner case, I don't know
if we should really care.
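
A sketch of the centralized check being floated here (ExecPlanStillValid is
used informally, as in the sentence above; this is not code from the posted
patch):

PlanState *
ExecInitNode(Plan *node, EState *estate, int eflags)
{
	PlanState  *result;

	if (node == NULL)
		return NULL;

	/* one validity test for every node type instead of per-node early exits */
	if (unlikely(!ExecPlanStillValid(estate)))
		return NULL;

	switch (nodeTag(node))
	{
			/* ... dispatch to the per-node ExecInit* functions as today ... */
		default:
			elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));
			result = NULL;		/* keep compiler quiet */
			break;
	}

	return result;
}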

- The "At this point" comments don't give any rationale for why we
shouldn't have received any such invalidation messages. That  makes
them fairly useless; the Assert by itself clarifies that you think
that case shouldn't happen. The comment's job is to justify that
claim.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Output affected rows in EXPLAIN

2023-09-06 Thread Tom Lane
Damir Belyalov  writes:
> I created a patch that makes EXPLAIN output the rows affected by
> INSERT/UPDATE/DELETE.
> Despite the fact that the commands in an EXPLAIN ANALYZE query are executed as
> usual, EXPLAIN doesn't report the affected-row count the way those commands do.
> The patch fixes this problem.

This creates a bug, not fixes one.  It's intentional that "insert into a"
is shown as returning zero rows, because that's what it did.  If you'd
written "insert ... returning", you'd have gotten a different result:

=# explain analyze insert into a values (1);
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Insert on a  (cost=0.00..0.01 rows=0 width=0) (actual time=0.015..0.016 rows=0 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
 Planning Time: 0.015 ms
 Execution Time: 0.027 ms
(4 rows)

=# explain analyze insert into a values (1) returning *;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Insert on a  (cost=0.00..0.01 rows=1 width=4) (actual time=0.026..0.028 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=1)
 Planning Time: 0.031 ms
 Execution Time: 0.051 ms
(4 rows)

Now admittedly, if you want to know the number of rows that went to disk,
you have to infer that from the number of rows emitted by the
ModifyTable's child plan.  But that's a matter for documentation
(and I'm pretty sure it's documented someplace).

regards, tom lane




Re: pg_stats and range statistics

2023-09-06 Thread jian he
hi. I played around with the 2023-Apr 4 latest patch.

+lower(ARRAY[numrange(1.1,2.2),numrange(3.3,4.4)])
should be
+
ranges_lower(ARRAY[numrange(1.1,2.2),numrange(3.3,4.4)])

+upper(ARRAY[numrange(1.1,2.2),numrange(3.3,4.4)])
should be
+
ranges_upper(ARRAY[numrange(1.1,2.2),numrange(3.3,4.4)])

https://www.postgresql.org/docs/current/catalog-pg-type.html
There is no association between numrange and its base type numeric,
so with the template "anyarray ranges_lower(anyarray)" I don't think we can
take a numrange array as input and return a numeric array.

https://www.postgresql.org/docs/current/extend-type-system.html#EXTEND-TYPES-POLYMORPHIC
>> When the return value of a function is declared as a polymorphic type, there 
>> must be at least one argument position that is also >> polymorphic, and the 
>> actual data type(s) supplied for the polymorphic arguments determine the 
>> actual result type for that call.


regression=# select
ranges_lower(ARRAY[numrange(1.1,2.2),numrange(3.3,4.4),
numrange(5.5,6.6)]);
 ranges_lower
---
 {1.1,3.3,5.5}
(1 row)
regression=# \gdesc
Column|Type
--+
 ranges_lower | numrange[]
(1 row)

I don't think you can cast literal ' {1.1,3.3,5.5}' to numrange[].




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-06 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thank you for reviewing!

> 
> ==
> src/bin/pg_upgrade/controldata.c
> 
> 1. get_control_data
> 
> + if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
> + {
> + bool have_error = false;
> +
> + p = strchr(p, ':');
> +
> + if (p == NULL || strlen(p) <= 1)
> + pg_fatal("%d: controldata retrieval problem", __LINE__);
> +
> + p++; /* remove ':' char */
> +
> + p = strpbrk(p, "01234567890ABCDEF");
> +
> + if (p == NULL || strlen(p) <= 1)
> + pg_fatal("%d: controldata retrieval problem", __LINE__);
> +
> + cluster->controldata.chkpnt_latest =
> + strtoLSN(p, &have_error);
> 
> 1a.
> The declaration assignment of 'have_error' is redundant because it
> gets overwritten before it is checked anyhow.
> 
> ~
> 
> 1b.
> IMO that first check logic should also be shifted to be *inside* the
> strtoLSN and it would just return have_error true. This eliminates
> having 2x pg_fatal that have the same purpose.
> 
> ~~~
> 
> 2. strtoLSN
> 
> +/*
> + * Convert String to XLogRecPtr.
> + *
> + * This function is ported from pg_lsn_in_internal(). The function cannot be
> + * called from client binaries.
> + */
> +XLogRecPtr
> +strtoLSN(const char *str, bool *have_error)
> 
> SUGGESTION (comment wording)
> This function is ported from pg_lsn_in_internal() which cannot be
> called from client binaries.

These changes are reverted.

> src/bin/pg_upgrade/function.c
> 
> 3. struct plugin_list
> 
> +typedef struct plugin_list
> +{
> + int dbnum;
> + char*plugin;
> + struct plugin_list *next;
> +} plugin_list;
> 
> I found that name confusing. IMO should be like 'plugin_list_elem'.
> 
> e.g. it gets too strange in subsequent code:
> + plugin_list *newentry = (plugin_list *) pg_malloc(sizeof(plugin_list));
> 
> ~~~
> 
> 4. is_plugin_unique
> 
> +/* Has the given plugin already been listed? */
> +static bool
> +is_plugin_unique(plugin_list_head *listhead, const char *plugin)
> +{
> + plugin_list *point;
> +
> + /* Quick return if the head is NULL */
> + if (listhead == NULL)
> + return true;
> +
> + /* Seek the plugin list */
> + for (point = listhead->head; point; point = point->next)
> + {
> + if (strcmp(point->plugin, plugin) == 0)
> + return false;
> + }
> +
> + return true;
> +}
> 
> What's the meaning of the name 'point'? Maybe something generic like
> 'cur' or similar is better?
> 
> ~~~
> 
> 5. get_output_plugins
> 
> +/*
> + * Load the list of unique output plugins.
> + *
> + * XXX: Currently, we extract the list of unique output plugins, but this may
> + * be overkill. The list is used for two purposes - 1) to allocate the 
> minimal
> + * memory for the library list and 2) to skip storing duplicated plugin 
> names.
> + * However, the consumer check_loadable_libraries() can avoid double checks
> for
> + * the same library. The above means that we can arrange output plugins 
> without
> + * considering their uniqueness, so that we can remove this function.
> + */
> +static plugin_list_head *
> +get_output_plugins(void)
> +{
> + plugin_list_head *head = NULL;
> + int dbnum;
> +
> + /* Quick return if there are no logical slots to be migrated. */
> + if (count_old_cluster_logical_slots() == 0)
> + return NULL;
> +
> + for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
> + {
> + LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
> + int slotnum;
> +
> + for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
> + {
> + LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
> +
> + /* Add to the list if the plugin has not been listed yet */
> + if (is_plugin_unique(head, slot->plugin))
> + add_plugin_list_item(&head, dbnum, slot->plugin);
> + }
> + }
> +
> + return head;
> +}
> 
> About the XXX. Yeah, since the uniqueness seems checked later anyway
> all this extra code seems overkill. Instead of all the extra code you
> just need a comment to mention how it will be sorted and checked
> later.
> 
> But even if you prefer to keep it, I thought those 2 functions
> 'is_plugin_unique()' and 'add_plugin_list_item()' could have been
> combined to just have 'add_plugin_list_unique_item()'. Since order
> does not matter, such a function would just add items to the end of
> the list (after finding uniqueness) instead of to the head.

Based on suggestions from you and Hou [1], I no longer check their uniqueness,
so these functions and structures have been removed.

> 6. get_loadable_libraries
> 
>   FirstNormalObjectId);
> +
>   totaltups += PQntuples(ress[dbnum]);
> ~
> 
> The extra blank line in the existing code is not needed in this patch.

Removed.

> 7. get_loadable_libraries
> 
>   int rowno;
> + plugin_list *point;
> 
> ~
> 
> Same as a prior comment #4. What's the meaning of the name 'point'?

The variable was removed.

> 8. get_loadable_libraries
> +
> + /*
> + * If the old cluster has logical replication slots, plugins used by
> + * them must be also stored. It must be done only once, so do it at
> + * dbnum == 0 case.
> + */
> + if (output_plugins == NULL)
> + continue;
> +
> + if (dbnum != 0)
> + continue;

RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-06 Thread Hayato Kuroda (Fujitsu)
Dear Hou,

Thank you for giving comments! PSA new version.
0001 is updated based on the forked thread.

> 
> 1.
> 
> + res = executeQueryOrDie(conn, "SHOW wal_level;");
> + wal_level = PQgetvalue(res, 0, 0);
> 
> + res = executeQueryOrDie(conn, "SHOW wal_level;");
> + wal_level = PQgetvalue(res, 0, 0);
> 
> I think it would be better to do a sanity check using PQntuples() before
> calling PQgetvalue() in above places.

Added.

> 2.
> 
> +/*
> + * Helper function for get_old_cluster_logical_slot_infos()
> + */
> +static WALAvailability
> +GetWALAvailabilityByString(const char *str)
> +{
> + WALAvailability status = WALAVAIL_INVALID_LSN;
> +
> + if (strcmp(str, "reserved") == 0)
> + status = WALAVAIL_RESERVED;
> 
> Not a comment, but I am wondering if we could use conflicting field to do this
> check, so that we could avoid the new conversion function and structure
> movement. What do you think ?

I checked pg_get_replication_slots() and agreed that pg_replication_slots.conflicting
indicates whether the slot is usable or not. I can use the attribute instead of
porting WALAvailability. Fixed.

> 3.
> 
> + curr->confirmed_flush = strtoLSN(
> +                                  PQgetvalue(res,
> +                                             slotnum,
> +                                             i_confirmed_flush),
> +                                  &have_error);
> 
> The indention looks a bit unusual.

The part is not needed anymore.

> 4.
> +  * XXX: As mentioned in comments atop get_output_plugins(), we may
> not
> +  * have to consider the uniqueness of entries. If so, we can use
> +  * count_old_cluster_logical_slots() instead of plugin_list_length().
> +  */
> 
> I think check_loadable_libraries() will avoid loading the same library, so it
> seems fine to skip duplicating the plugins and we can save some codes.
> 
> 
>   /* Did the library name change?  Probe it. */
>   if (libnum == 0 || strcmp(lib, os_info.libraries[libnum -
> 1].name) != 0)
> 
> 
> But if we think duplicating them would be better, I feel we could use the
> SimpleStringList to store and duplicate the plugin name. get_output_plugins 
> can
> return an array of the stringlist, each stringlist includes the plugins names
> in one db. I shared a rough POC patch to show how it works, the intention is 
> to
> avoid introducing our new plugin list API.

Actually I do not like that style either. Peter also said that we can skip
checking the uniqueness, so I removed it.

> 5.
> 
> + os_info.libraries = (LibraryInfo *) pg_malloc(
> +
> (totaltups + plugin_list_length(output_plugins)) *
> sizeof(LibraryInfo));
> 
> If we think this looks too long, maybe using pg_malloc_array can help.
>

I checked the whole patch and used these shorter macros where the line exceeded
80 columns.

Also, I found a cfbot failure [1] but I could not find the reason.
I will keep investigating it.

[1]: https://cirrus-ci.com/task/4634769732927488

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v32-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patch
Description:  v32-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patch


v32-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
Description:  v32-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch


Re: Output affected rows in EXPLAIN

2023-09-06 Thread Daniel Gustafsson
> On 6 Sep 2023, at 14:49, Damir Belyalov  wrote

> The patch fixes this problem.

Given that EXPLAIN ANALYZE has worked like this for a very long time, which problem
is it you have identified?

I'm also not convinced that the added "EXPLAIN" in the below plan is an
improvement in any way.

postgres=# explain (analyze) select * from t;
                                          QUERY PLAN
------------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..35.50 rows=2550 width=4) (actual time=0.064..0.075 rows=5 loops=1)
 Planning Time: 1.639 ms
 Execution Time: 0.215 ms
(3 rows)

EXPLAIN
postgres=#

--
Daniel Gustafsson





Re: Optimize planner memory consumption for huge arrays

2023-09-06 Thread Ashutosh Bapat
Hi Lepikhov,

Thanks for using my patch and I am glad that you found it useful.

On Mon, Sep 4, 2023 at 10:56 AM Lepikhov Andrei
 wrote:
>
> Hi, hackers,
>
> Looking at the planner behaviour with the memory consumption patch [1], I 
> figured out that arrays increase memory consumption by the optimizer 
> significantly. See init.sql in attachment.
> The point here is that the planner does small memory allocations for each 
> element during estimation. As a result, it looks like the planner consumes 
> about 250 bytes for each integer element.

I guess the numbers you mentioned in init.sql are total memory used by
the planner (as reported by the patch in the thread) when planning
that query and not memory consumed by Const nodes themselves. Am I
right? I think the measurements need to be explained better and also
the realistic scenario you are trying to optimize.

I guess, the reason you think that partitioning will increase the
memory consumed is because each partition will have the clause
translated for it. Selectivity estimation for each partition will
create those many Const nodes and hence consume memory. Am I right?
Can you please measure the memory consumed with and without your
patch.

>
> It is maybe not a problem most of the time. However, in the case of 
> partitions, memory consumption multiplies by each partition. Such a corner 
> case looks weird, but the fix is simple. So, why not?

With vectorized operations becoming the norm these days, it's possible
to have thousands of elements in the array of an ANY or IN clause. It
will also be common to have thousands of partitions. But I think what we
need to do here is to write a selectivity estimation function which
takes a Const array and returns the selectivity without requiring a
Const node to be created for each element.

>
> The diff in the attachment is proof of concept showing how to reduce wasting 
> of memory. Having benchmarked a bit, I didn't find any overhead.
>

You might want to include your benchmarking results as well.

-- 
Best Wishes,
Ashutosh Bapat




Re: A minor adjustment to get_cheapest_path_for_pathkeys

2023-09-06 Thread Andy Fan
On Wed, Sep 6, 2023 at 8:50 PM Robert Haas  wrote:

> On Wed, Sep 6, 2023 at 2:45 AM Andy Fan  wrote:
> > I guess the *valuable* sometimes means the effort we pay is greater
> > than the benefit we get,  As for this patch,  the benefit is not huge (it
> > is possible the compiler already does that). and the most effort we pay
> > should be committer's attention, who needs to grab the patch, write the
> > correct  commit and credit to the author and push it.  I'm not sure if
> > Aleksander is worrying that this kind of patch will grab too much of
> > the committer's attention and I do see there are lots of patches like
> > this.
>
> Very fair point. However, as you said in your follow-up email, Richard
> Guo has done a lot of good work in this area already, so it makes
> sense to pay a bit more attention to his suggestions.
>

Agreed.


>
> > In my opinion,  we can do some stuff to improve the ROI.
> > -  Authors should do as much as possible,  mainly a better commit
> > message.  As for this patch, the commit message is " Adjustment
> > to get_cheapest_path_for_pathkeys"  which I don't think matches
> > our culture.
>
> I agree. I don't think the patch submitter is obliged to try to write
> a good commit message, but people who contribute regularly or are
> posting large stacks of complex patches are probably well-advised to
> try. It makes life easier for committers and even for reviewers trying
> to make sense of their patches.
>

Fair enough.


> > Actually I also want to know what "Ready for Committer" is designed for,
> > and when/who can mark a patch as "Ready for Committer" ?
>
> Any reviewer who feels that this is the case. It's not binding on
> anyone; it's an opinion.
>

Glad to know that.  I have marked myself as a reviewer and marked this entry
as "Ready for Committer".

https://commitfest.postgresql.org/44/4286/

-- 
Best Regards
Andy Fan


Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-09-06 Thread Masahiko Sawada
On Wed, Sep 6, 2023 at 3:23 PM John Naylor  wrote:
>
>
> On Mon, Aug 28, 2023 at 9:44 PM Masahiko Sawada  wrote:
> >
> > I've attached v42 patch set. I improved tidstore regression test codes
> > in addition of imcorporating the above comments.
>
> Seems fine at a glance, thanks. I will build on this to implement 
> variable-length values.

Thanks.

> I have already finished one prerequisite which is: public APIs passing 
> pointers to values.

Great!

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: A minor adjustment to get_cheapest_path_for_pathkeys

2023-09-06 Thread Robert Haas
On Wed, Sep 6, 2023 at 2:45 AM Andy Fan  wrote:
> I guess the *valuable* sometimes means the effort we pay is greater
> than the benefit we get,  As for this patch,  the benefit is not huge (it
> is possible the compiler already does that). and the most effort we pay
> should be committer's attention, who needs to grab the patch, write the
> correct  commit and credit to the author and push it.  I'm not sure if
> Aleksander is worrying that this kind of patch will grab too much of
> the committer's attention and I do see there are lots of patches like
> this.

Very fair point. However, as you said in your follow-up email, Richard
Guo has done a lot of good work in this area already, so it makes
sense to pay a bit more attention to his suggestions.

> In my opinion,  we can do some stuff to improve the ROI.
> -  Authors should do as much as possible,  mainly a better commit
> message.  As for this patch, the commit message is " Adjustment
> to get_cheapest_path_for_pathkeys"  which I don't think matches
> our culture.

I agree. I don't think the patch submitter is obliged to try to write
a good commit message, but people who contribute regularly or are
posting large stacks of complex patches are probably well-advised to
try. It makes life easier for committers and even for reviewers trying
to make sense of their patches.

> Actually I also want to know what "Ready for Committer" is designed for,
> and when/who can mark a patch as "Ready for Committer" ?

Any reviewer who feels that this is the case. It's not binding on
anyone; it's an opinion.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Output affected rows in EXPLAIN

2023-09-06 Thread Damir Belyalov
I created a patch that makes EXPLAIN output the rows affected by
INSERT/UPDATE/DELETE.
Despite the fact that the commands in an EXPLAIN ANALYZE query are executed as
usual, EXPLAIN doesn't report the affected-row count the way those commands do.
The patch fixes this problem.

Examples:
explain analyze insert into a values (1);
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Insert on a  (cost=0.00..0.01 rows=0 width=0) (actual time=0.076..0.077 rows=0 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=1)
 Planning Time: 0.025 ms
 Execution Time: 0.412 ms
(4 rows)

INSERT 0 1

                                          QUERY PLAN
----------------------------------------------------------------------------------------------------
 Update on a  (cost=0.00..35.50 rows=0 width=0) (actual time=0.059..0.060 rows=0 loops=1)
   ->  Seq Scan on a  (cost=0.00..35.50 rows=2550 width=10) (actual time=0.012..0.013 rows=7 loops=1)
 Planning Time: 0.142 ms
 Execution Time: 0.666 ms
(4 rows)

UPDATE 7

explain analyze delete from a where n = 1;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------------
 Delete on a  (cost=0.00..41.88 rows=0 width=0) (actual time=0.147..0.147 rows=0 loops=1)
   ->  Seq Scan on a  (cost=0.00..41.88 rows=13 width=6) (actual time=0.120..0.123 rows=7 loops=1)
 Filter: (n = 1)
 Planning Time: 1.073 ms
 Execution Time: 0.178 ms
(5 rows)

DELETE 7

EXPLAIN queries without ANALYZE don't affect rows, so the output number is
0.

explain update a set n = 2;
                         QUERY PLAN
------------------------------------------------------------
 Update on a  (cost=0.00..35.50 rows=0 width=0)
   ->  Seq Scan on a  (cost=0.00..35.50 rows=2550 width=10)
(2 rows)

UPDATE 0

Maybe there is no need to add this row when EXPLAIN is run without ANALYZE,
so that is open for discussion.
I also haven't fixed the regression tests yet.

Regards,
Damir Belyalov
Postgres Professional
From c6cbc6fa9ddf24f29bc19ff115224dd76e351db1 Mon Sep 17 00:00:00 2001
From: Damir Belyalov 
Date: Tue, 5 Sep 2023 15:04:01 +0300
Subject: [PATCH] Output affected rows in EXPLAIN.

---
 src/backend/commands/explain.c | 10 +-
 src/backend/tcop/cmdtag.c  |  2 +-
 src/backend/tcop/pquery.c  |  8 +++-
 src/backend/tcop/utility.c | 27 ++-
 src/bin/psql/common.c  |  5 +++--
 src/include/commands/explain.h |  3 ++-
 src/include/tcop/cmdtag.h  |  1 +
 src/include/tcop/cmdtaglist.h  |  3 +++
 src/interfaces/libpq/fe-exec.c |  4 +++-
 9 files changed, 55 insertions(+), 8 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8570b14f62..453e545ba5 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -162,7 +162,7 @@ static void escape_yaml(StringInfo buf, const char *str);
  */
 void
 ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
-			 ParamListInfo params, DestReceiver *dest)
+			 ParamListInfo params, DestReceiver *dest, uint64 *processed)
 {
 	ExplainState *es = NewExplainState();
 	TupOutputState *tstate;
@@ -173,6 +173,9 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	bool		timing_set = false;
 	bool		summary_set = false;
 
+	if (processed)
+		*processed = 0;
+
 	/* Parse options list. */
 	foreach(lc, stmt->options)
 	{
@@ -311,6 +314,9 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	end_tup_output(tstate);
 
 	pfree(es->str->data);
+
+	if (processed)
+		*processed = es->es_processed;
 }
 
 /*
@@ -649,6 +655,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	 */
 	INSTR_TIME_SET_CURRENT(starttime);
 
+	es->es_processed += queryDesc->estate->es_processed;
+
 	ExecutorEnd(queryDesc);
 
 	FreeQueryDesc(queryDesc);
diff --git a/src/backend/tcop/cmdtag.c b/src/backend/tcop/cmdtag.c
index 4bd713a0b4..9e6fdbd8af 100644
--- a/src/backend/tcop/cmdtag.c
+++ b/src/backend/tcop/cmdtag.c
@@ -146,7 +146,7 @@ BuildQueryCompletionString(char *buff, const QueryCompletion *qc,
 	 */
 	if (command_tag_display_rowcount(tag) && !nameonly)
 	{
-		if (tag == CMDTAG_INSERT)
+		if (tag == CMDTAG_INSERT || tag == CMDTAG_EXPLAIN_INSERT)
 		{
 			*bufp++ = ' ';
 			*bufp++ = '0';
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5565f200c3..ba0b33cc67 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -775,7 +775,13 @@ PortalRun(Portal portal, long count, bool isTopLevel, bool run_once,
 if (qc && portal->qc.commandTag != CMDTAG_UNKNOWN)
 {
 	CopyQueryCompletion(qc, &portal->qc);
-	qc->nprocessed = nprocessed;
+	if (portal->qc.commandTag == CMDTAG_EXPLAIN ||
+		portal->qc.commandTag == CMDTAG_EXPLAIN_INSERT ||
+		portal->qc.commandTag == CMDTAG_EXPLAIN_UPDATE ||
+		

make add_paths_to_append_rel aware of startup cost

2023-09-06 Thread Andy Fan
Hi:

This thread is a refactor of thread [1] for easier communication.

Currently add_paths_to_append_rel overlooked the startup cost for creating
append path, so it may have lost some optimization chances.  After a
glance,
the following 4 identifiers can be impacted.

Identifier 1:

SELECT .. FROM v1
UNION ALL
SELECT .. FROM v2
LIMIT 3;

Identifier 2:

SELECT * FROM p .. LIMIT 3;  p is a partitioned table.

Identifier 3:
SELECT * FROM p JOIN q using (partkey) LIMIT 3;

If we did the partition-wise-join, then we lost the chances for a better
plan.

Identifier 4:  -- EXISTS implies LIMIT 1;
SELECT * FROM foo
WHERE EXISTS
(SELECT 1 FROM append_rel_v_not_pullup_able WHERE xxx);

However, after I completed my patch and wanted to build some real
queries to prove my idea, I found it is hard to build a case for
Identifiers 2/3/4.  But the improvement for Identifier 1 is easy, and
my real user case at work is Identifier 1 as well.

So a patch is attached for this case, it will use fractional costs
rather than total costs if needed. The following items needs more
attention during development.

- We shouldn't do the optimization if there are still more tables to join,
  the reason is similar to has_multiple_baserels(root) in
  set_subquery_pathlist. But the trouble here is we may inherit multiple
  levels to build an appendrel, so I have to keep the 'top_relids' all the
  time and compare it with PlannerInfo.all_baserels. If they are the same,
  then it is the case we want to optimize.

- add_partial_path doesn't consider the startup costs by design, I didn't
  rethink too much about this, but the idea of "choose a path which
  let each worker produces the top-N tuples as fast as possible" looks
  reasonable, and even though add_partial_path doesn't consider startup
  cost, it is still possible that childrel keeps more than 1 partial paths
  due to any reasons except startup_cost, for example PathKey. then we
  still have chances to choose the cheapest fractional path among
  them. The same strategy also applies to mixed partial and non-partial
  append paths.

- Due to the complexity of add_paths_to_append_rel, 3 arguments have
 to be added to get_cheapest_fractional_path...

  Path *
get_cheapest_fractional_path(RelOptInfo *rel, double tuple_fraction,
bool allow_parameterized, bool look_partial,
bool must_parallel_safe)


Cases can be improved.

Here is the simplest test case,  but it will not be hard to provide more
cases for Identifier 1.

 (select * from tenk1 order by hundred)
 UNION ALL
 (select * from tenk1 order by hundred)
 limit 3;

master: 8.096ms.
patched: 0.204ms.

The below user case should be more reasonable for real business.

with a as (select * from t1 join t2..),
b as (select * from t1 join t3 ..)
select * from a union all select * from b
limit 3;

The patch would also have an impact on Identifiers 2/3/4. Even though I can't
make a demo SQL query which benefits from this patch, I added
some test cases for code coverage purposes.

Any feedback is welcome!

[1]
https://www.postgresql.org/message-id/flat/CAKU4AWqEnzhUTxopVhENC3vs6NnYV32+e6GSBtp1rAv0ZNX=m...@mail.gmail.com


-- 
Best Regards
Andy Fan


v1-0001-make-add_paths_to_append_rel-aware-of-startup-cos.patch
Description: Binary data


Re: How to add a new pg oid?

2023-09-06 Thread jacktby jacktby


> On 6 Sep 2023, at 18:50, jacktby jacktby  wrote:
> 
> 
> 
>> On 6 Sep 2023, at 18:19, jacktby jacktby  wrote:
>> 
>> 
>> 
>>> On 6 Sep 2023, at 01:47, David G. Johnston  wrote:
>>> 
>>> OIDs don't exist independently of the data they are associated with.  Give 
>>> more context if you want a better answer.  Or just go look at the source 
>>> code commits for when the last time something needing an OID got added to 
>>> the core catalog.
>>> 
>>> David J.
>>>  
>> 
>> { oid => '111', array_type_oid => '6099', descr => 'similarity columns',
>>   typname => 'similarity_columns', typlen => '-1', typlen => '-1', typbyval 
>> => 'f', typcategory => 'U',
>>   typinput => 'byteain', typoutput => 'byteaout', typreceive => 'bytearecv',
>>   typsend => 'byteasend', typalign => 'i', typstorage => 'x' },
>> 
>> I added the above into pg_type.dat. Then I ran “make install” and
>> restarted pg. And then did the following:
>> postgres=# SELECT typname from pg_type where typname like '%similarity%';
>>  typname 
>> -
>> (0 rows)
>> 
>> I can’t get the type I added. What else do I need to do?
> I added the below in bootstrap.c:
> static const struct typinfo TypInfo[] = {
>   {"similarity_columns", SimilarityColumns, 0, -1, false, TYPALIGN_INT, 
> TYPSTORAGE_EXTENDED, InvalidOid,
>F_BYTEAIN, F_BYTEAOUT},
> ….
> }
> And then “make install” and restarted pg, but still:
> postgres=# SELECT typname from pg_type where typname like '%similarity%';
>  typname 
> -
> (0 rows)
> 
> Please give me help.
After initdb, I got it. Thanks.



Re: Transaction timeout

2023-09-06 Thread Andrey M. Borodin
Thanks for looking into this!

> On 6 Sep 2023, at 13:16, Fujii Masao  wrote:
> 
> While testing v4 patch, I noticed it doesn't handle the COMMIT AND CHAIN case 
> correctly.
> When COMMIT AND CHAIN is executed, I believe the transaction timeout counter 
> should reset
> and start from zero with the next transaction. However, it appears that the 
> current
> v4 patch doesn't reset the counter in this scenario. Can you confirm this?
Yes, I was not aware of this feature. I'll test and fix this.

> With the v4 patch, I found that timeout errors no longer occur during the 
> idle in
> transaction phase. Instead, they occur when the next statement is executed. 
> Is this
> the intended behavior?
AFAIR I tested that the behaviour of "idle in transaction" was intact.
I'll check that again.

> I thought some users might want to use the transaction timeout
> feature to prevent prolonged transactions and promptly release resources 
> (e.g., locks)
> in case of a timeout, similar to idle_in_transaction_session_timeout.
Yes, this is exactly how I was expecting the feature to behave: free up
max_connections slots held by long-hanging transactions.

Thanks for your findings, I'll check and post new version!


Best regards, Andrey Borodin.



Re: pgbnech: allow to cancel queries during benchmark

2023-09-06 Thread Yugo NAGATA
Hi Fabien,

On Thu, 10 Aug 2023 12:32:44 +0900
Yugo NAGATA  wrote:

> On Wed, 9 Aug 2023 11:18:43 +0200 (CEST)
> Fabien COELHO  wrote:
> 
> > 
> > I forgot, about the test:
> > 
> > I think that it should be integrated in the existing 
> > "001_pgbench_with_server.pl" script, because a TAP script is pretty 
> > expensive as it creates a cluster, starts it… before running the test.
> 
> Ok. I'll integrate the test into 001.
>  
> > I'm surprise that IPC::Run does not give any access to the process number. 
> > Anyway, its object interface seems to allow sending signal:
> > 
> > $h->signal("...")
> > 
> > So the code could be simplified to use that after a small delay.
> 
> Thank you for your information.
> 
> I didn't know $h->signal() and  I mimicked the way of
> src/bin/psql/t/020_cancel.pl to send SIGINT to a running program. I don't
> know why the psql test doesn't use the interface, I'll investigate whether
> this can be used in our purpose, anyway.

I have attached the updated patch v3. The changes since the previous
patch include the following:

I removed the unnecessary condition (&& false) that you
pointed out in [1].

The test was rewritten by using IPC::Run signal() and integrated
to "001_pgbench_with_server.pl". This test is skipped on Windows
because SIGINT causes to terminate the test itself as discussed
in [2] about query cancellation test in psql.

I added some comments to describe how query cancellation is
handled as I explained in [1].

Also, I found the previous patch didn't work on Windows, so I fixed it.
On non-Windows systems, a thread waiting for a response to a long query can
be interrupted by SIGINT, but on Windows, threads do not return from
waiting until the queries they are running are cancelled. This is because,
when the signal is received, the system just creates a new thread to
execute the callback function specified by setup_cancel_handler, and the
other threads continue to run [3]. Therefore, the queries have to be
cancelled in the callback function.

[1] 
https://www.postgresql.org/message-id/a58388ac-5411-4760-ea46-71324d8324cb%40mines-paristech.fr
[2] 
https://www.postgresql.org/message-id/20230906004524.2fd6ee049f8a6c6f2690b99c%40sraoss.co.jp
[3] https://learn.microsoft.com/en-us/windows/console/handlerroutine

Regards,
Yugo Nagata

> 
> -- 
> Yugo NAGATA 
> 
> 


-- 
Yugo NAGATA 
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 713e8a06bb..1e396865f7 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -596,6 +596,7 @@ typedef enum
 typedef struct
 {
 	PGconn	   *con;			/* connection handle to DB */
+	PGcancel   *cancel;			/* query cancel */
 	int			id;/* client No. */
 	ConnectionStateEnum state;	/* state machine's current state. */
 	ConditionalStack cstack;	/* enclosing conditionals state */
@@ -638,6 +639,8 @@ typedef struct
  * here */
 } CState;
 
+CState	*client_states;		/* status of all clients */
+
 /*
  * Thread state
  */
@@ -837,6 +840,10 @@ static void add_socket_to_set(socket_set *sa, int fd, int idx);
 static int	wait_on_socket_set(socket_set *sa, int64 usecs);
 static bool socket_has_input(socket_set *sa, int fd, int idx);
 
+#ifdef WIN32
+static void pgbench_cancel_callback(void);
+#endif
+
 /* callback used to build rows for COPY during data loading */
 typedef void (*initRowMethod) (PQExpBufferData *sql, int64 curr);
 
@@ -3639,6 +3646,7 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 		st->state = CSTATE_ABORTED;
 		break;
 	}
+	st->cancel = PQgetCancel(st->con);
 
 	/* reset now after connection */
 	now = pg_time_now();
@@ -4670,6 +4678,18 @@ disconnect_all(CState *state, int length)
 		finishCon(&state[i]);
 }
 
+/* send cancel requests to all connections */
+static void
+cancel_all()
+{
+	for (int i = 0; i < nclients; i++)
+	{
+		char errbuf[1];
+		if (client_states[i].cancel != NULL)
+			(void) PQcancel(client_states[i].cancel, errbuf, sizeof(errbuf));
+	}
+}
+
 /*
  * Remove old pgbench tables, if any exist
  */
@@ -7146,6 +7166,9 @@ main(int argc, char **argv)
 		}
 	}
 
+	/* enable threads to access the status of all clients */
+	client_states = state;
+
 	/* other CState initializations */
 	for (i = 0; i < nclients; i++)
 	{
@@ -7358,6 +7381,37 @@ threadRun(void *arg)
 	StatsData	last,
 aggs;
 
+	/*
+	 * Query cancellation is handled only in thread #0.
+	 *
+	 * On Windows, a callback function is set in which query cancel requests
+	 * are sent to all benchmark queries running in the backend.
+	 *
+	 * On non-Windows, any callback function is not set. When SIGINT is
+	 * received, CancelRequested is just set, and only thread #0 is interrupted
+	 * and returns from waiting input from the backend. After that, the thread
+	 * sends cancel requests to all benchmark queries.
+	 */
+	if (thread->tid == 0)
+#ifdef WIN32
+		setup_cancel_handler(pgbench_cancel_callback);
+#else
+		setup_cancel_handler(NULL);
+#endif
+
+#if defined(ENABLE_THREAD_SAFETY) && 

Re: information_schema and not-null constraints

2023-09-06 Thread Peter Eisentraut

On 05.09.23 18:24, Alvaro Herrera wrote:

On 2023-Sep-05, Peter Eisentraut wrote:


The following information schema views are affected by the not-null
constraint catalog entries:

1. CHECK_CONSTRAINTS
2. CONSTRAINT_COLUMN_USAGE
3. DOMAIN_CONSTRAINTS
4. TABLE_CONSTRAINTS

Note that 1 and 3 also contain domain constraints.


After looking at what happens for domain constraints in older versions
(I tested 15, but I suppose this applies everywhere), I notice that we
don't seem to handle them anywhere that I can see.  My quick exercise is
just

create domain nnint as int not null;
create table foo (a nnint);

and then verify that this constraint shows nowhere -- it's not in
DOMAIN_CONSTRAINTS for starters, which is I think the most obvious place.
And nothing is shown in CHECK_CONSTRAINTS nor TABLE_CONSTRAINTS either.

Did this ever work in the past?  I tested with 9.3 and didn't see
anything there either.


No, this was never implemented.  (As I wrote in my other message on the 
other thread, arguably a bit buggy.)  We could fix this separately, 
unless we are going to implement catalogued domain not-null constraints 
soon.
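
For anyone reproducing Alvaro's example above, a check like the following
(shown only as an illustration) returns no rows on the released versions:

SELECT constraint_name, domain_name
  FROM information_schema.domain_constraints
 WHERE domain_name = 'nnint';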






Re: pg_upgrade and logical replication

2023-09-06 Thread vignesh C
On Wed, 19 Jul 2023 at 12:47, Michael Paquier  wrote:
>
> On Wed, May 10, 2023 at 05:59:24PM +1000, Peter Smith wrote:
> > 1. ALTER SUBSCRIPTION name ADD TABLE (relid = XYZ, state = 'x' [, lsn = 
> > 'X/Y'])
> >
> > I was a bit confused by this relation 'state' mentioned in multiple
> > places. IIUC the pg_upgrade logic is going to reject anything with a
> > non-READY (not 'r') state anyhow, so what is the point of having all
> > the extra grammar/parse_subscription_options etc to handle setting the
> > state when only possible value must be 'r'?
>
> We are just talking about the handling of an extra DefElem in an
> extensible grammar pattern, so adding the state field does not
> represent much maintenance work.  I'm OK with the addition of this
> field in the data set dumped, FWIW, on the ground that it can be
> useful for debugging purposes when looking at --binary-upgrade dumps,
> and because we aim at copying catalog contents from one cluster to
> another.
>
> Anyway, I am not convinced that we have any need for a parse-able
> grammar at all, because anything that's presented on this thread is
> aimed at being used only for the internal purpose of an upgrade in a
> --binary-upgrade dump with a direct catalog copy in mind, and having a
> grammar would encourage abuses of it outside of this context.  I think
> that we should aim for simpler than what's proposed by the patch,
> actually, with either a single SQL function à-la-binary_upgrade() that
> adds the contents of a relation.  Or we can be crazier and just create
> INSERT queries for pg_subscription_rel to provide an exact copy of the
> catalog contents.  A SQL function would be more consistent with other
> objects types that use similar tricks, see
> binary_upgrade_create_empty_extension() that does something similar
> for some pg_extension records.  So, this function would require in
> input 4 arguments:
> - The subscription name or OID.
> - The relation OID.
> - Its LSN.
> - Its sync state.

Added a SQL function to handle the insertion and removed the "ALTER
SUBSCRIPTION ... ADD TABLE" command that was added.
Attached patch has the changes for the same.

Regards,
Vignesh
From ce0e041bf120f3615ec7a02187ce27e9922688d2 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Wed, 6 Sep 2023 10:07:42 +0530
Subject: [PATCH v6] Optionally preserve the full subscription's state during
 pg_upgrade

Previously, only the subscription metadata information was preserved.  Without
the list of relations and their state it's impossible to re-enable the
subscriptions without missing some records as the list of relations can only be
refreshed after enabling the subscription (and therefore starting the apply
worker).  Even if we added a way to refresh the subscription while enabling a
publication, we still wouldn't know which relations are new on the publication
side, and therefore should be fully synced, and which shouldn't.

Similarly, the subscriptions' replication origins are needed to ensure
that we don't replicate anything twice.

To fix this problem, this patch teaches pg_dump in binary upgrade mode to
restore the content of pg_subscription_rel from the old cluster by using
binary_upgrade_create_sub_rel_state SQL function, and also provides an
additional LSN parameter for CREATE SUBSCRIPTION to restore the underlying
replication origin remote LSN.  The new binary_upgrade_create_sub_rel_state
SQL function and the new LSN parameter are not exposed to users and only
accepted in binary upgrade mode.

The new SQL binary_upgrade_create_sub_rel_state function has the following
syntax:
SELECT binary_upgrade_create_sub_rel_state(subname text, relid oid, state char [,sublsn pg_lsn])

In the above, subname is the subscription name, relid is the relation
identifier, state is the state of the relation, and sublsn is optional,
defaulting to NULL/InvalidXLogRecPtr if not provided. pg_dump will retrieve
these values (subname, relid, state and sublsn) from the old cluster.
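
For illustration only (the OID and LSN below are made-up values), a dump
produced with --binary-upgrade could then emit calls along the lines of:

SELECT binary_upgrade_create_sub_rel_state('mysub', 16384, 'r', '0/1A2B3C0');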

This mode is optional and not enabled by default.  A new
--preserve-subscription-state option is added to pg_upgrade to use it.  For
now, pg_upgrade will check that all the subscriptions have a valid replication
origin remote_lsn, and that all underlying relations are in 'r' (ready) state,
and will error out if that's not the case, logging the reason for the failure.

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: https://postgr.es/m/20230217075433.u5mjly4d5cr4hcfe@jrouhaud
---
 doc/src/sgml/ref/pgupgrade.sgml  |  23 +++
 src/backend/catalog/pg_subscription.c|  64 +++
 src/backend/commands/subscriptioncmds.c  |  10 +-
 src/bin/pg_dump/common.c |  22 +++
 src/bin/pg_dump/pg_backup.h  |   2 +
 src/bin/pg_dump/pg_dump.c| 128 -
 src/bin/pg_dump/pg_dump.h|  15 ++
 src/bin/pg_upgrade/check.c   |  79 
 src/bin/pg_upgrade/dump.c|   3 +-
 src/bin/pg_upgrade/meson.build   |   1 +
 

Re: Avoid a possible null pointer (src/backend/utils/adt/pg_locale.c)

2023-09-06 Thread Ranier Vilela
Em seg., 4 de set. de 2023 às 11:27, Ranier Vilela 
escreveu:

> Em dom., 3 de set. de 2023 às 22:01, Michael Paquier 
> escreveu:
>
>> On Sat, Sep 02, 2023 at 09:13:11AM -0300, Ranier Vilela wrote:
>> > I tried to keep the same behavior as before.
>> > Note that if the locale equals COLLPROVIDER_LIBC,
>> > the message to the user will be the same.
>>
>> -/* shouldn't happen */
>> -elog(ERROR, "unsupported collprovider: %c", locale->provider);
>> +elog(ERROR, "collprovider '%c' does not support
>> pg_strnxfrm_prefix()",
>> + locale->provider);
>>
>> Based on what's written here, these messages could be better because
>> full sentences are not encouraged in error messages, even for
>> non-translated elogs:
>> https://www.postgresql.org/docs/current/error-style-guide.html
>>
>> Perhaps something like "could not use collprovider %c: no support for
>> %s", where the function name is used, would be more consistent.
>>
> Sure.
> I have no objection to the wording of the message.
> If there is consensus, I can send another patch.
>
I think no one objected.

v1 attached.
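
If it follows Michael's suggested wording above, the changed elog would look
roughly like this (a sketch, not a quote from the patch):

elog(ERROR, "could not use collprovider %c: no support for %s",
	 locale->provider, "pg_strnxfrm_prefix");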

best regards,
Ranier Vilela


v1-0001-Avoid-a-possible-dereference-a-null-pointer.patch
Description: Binary data


Re: How to add a new pg oid?

2023-09-06 Thread jacktby jacktby


> 2023年9月6日 18:19,jacktby jacktby  写道:
> 
> 
> 
>> 2023年9月6日 01:47,David G. Johnston  写道:
>> 
>> OIDs don't exist independently of the data they are associated with.  Give 
>> more context if you want a better answer.  Or just go look at the source 
>> code commits for when the last time something needing an OID got added to 
>> the core catalog.
>> 
>> David J.
>>  
> 
> { oid => '111', array_type_oid => '6099', descr => 'similarity columns',
>   typname => 'similarity_columns', typlen => '-1', typlen => '-1', typbyval 
> => 'f', typcategory => 'U',
>   typinput => 'byteain', typoutput => 'byteaout', typreceive => 'bytearecv',
>   typsend => 'byteasend', typalign => 'i', typstorage => 'x' },
> 
> I add above into pg_type.dat. And then I execute “make install” and 
> restart pg. And then do the below:
> postgres=# SELECT typname from pg_type where typname like '%similarity%';
>  typname 
> -
> (0 rows)
> 
> I can’t get the type I added. What else do I need to do?
I added the below in bootstrap.c:
static const struct typinfo TypInfo[] = {
{"similarity_columns", SimilarityColumns, 0, -1, false, TYPALIGN_INT, 
TYPSTORAGE_EXTENDED, InvalidOid,
 F_BYTEAIN, F_BYTEAOUT},
….
}
And then “make install” and restart pg, but still:
postgres=# SELECT typname from pg_type where typname like '%similarity%';
 typname 
-
(0 rows)

Please give me help.

Re: Vectorization of some functions and improving pg_list interface

2023-09-06 Thread Yura Sokolov

06.09.2023 13:24, Yura Sokolov wrote:

24.08.2023 17:07, Maxim Orlov wrote:

Hi!

Recently, I've been playing around with pg_lists and realize how 
annoying (maybe, I was a bit tired) some stuff related to the lists.

For an example, see this code
List *l1 = list_make4(1, 2, 3, 4),
  *l2 = list_make4(5, 6, 7, 8),
  *l3 = list_make4(9, 0, 1, 2);
ListCell *lc1, *lc2, *lc3;

forthree(lc1, l1, lc2, l2, lc3, l3) {
...
}

list_free(l1);
list_free(l2);
list_free(l3);

There are several questions:
1) Why do I need to specify the number of elements in the list in the 
function name?

Compiler already knew how much arguments do I use.
2) Why I have to call free for every list? I don't know how to call it 
right, for now I call it vectorization.

     Why not to use simple wrapper to "vectorize" function args?

So, my proposal is:
1) Add a simple macro to "vectorize" functions.
2) Use this macro to "vectorize" list_free and list_free_deep functions.
3) Use this macro to "vectorize" bms_free function.
4) "Vectorize" list_makeN functions.

For this V1 version, I do not remove all list_makeN calls in order to 
reduce diff, but I'll address

this in future, if it will be needed.

In my view, one thing still waiting to be improved if foreach loop. It 
is not very handy to have a bunch of
similar calls foreach, forboth, forthree and etc. It will be ideal to 
have single foreach interface, but I don't know how

to do it without overall interface of the loop.

Any opinions are very welcome!


Given the use case doesn't assume "zero" arguments, it is possible to 
implement "lists_free" with just macro expansion (the following code is not 
checked, but close to valid):


#define VA_FOR_EACH(invoke, join, ...) \
 CppConcat(VA_FOR_EACH_, VA_ARGS_NARGS(__VA_ARGS__))( \
     invoke, join, __VA_ARGS__)
#define VA_FOR_EACH_1(invoke, join, a1) \
 invoke(a1)
#define VA_FOR_EACH_2(invoke, join, a1, a2) \
 invoke(a1) join() invoke(a2)
#define VA_FOR_EACH_3(invoke, join, a1, a2, a3) \
 invoke(a1) join() invoke(a2) join() invoke(a3)
... up to 63 args

#define VA_SEMICOLON() ;

#define lists_free(...) \
 VA_FOR_EACH(list_free, VA_SEMICOLON, __VA_ARGS__)

#define lists_free_deep(...) \
 VA_FOR_EACH(list_free_deep, VA_SEMICOLON, __VA_ARGS__)

There could be a couple of issues with msvc, but they are solvable.


Given we could use C99 compound literals, list construction could be 
implemented without C varargs functions as well:


List *
list_make_impl(NodeTag t, int n, ListCell *datums)
{
List   *list = new_list(t, n);
memcpy(list->elements, datums, sizeof(ListCell)*n);
return list;
}

#define VA_COMMA() ,

#define list_make__m(Tag, type, ...) \
   list_make_impl(Tag, VA_ARGS_NARGS(__VA_ARGS__), \
 ((ListCell[]){ \
   VA_FOR_EACH(list_make_##type##_cell, VA_COMMA, __VA_ARGS__) \
 }))


#define list_make(...) list_make__m(T_List, ptr, __VA_ARGS__)
#define list_make_int(...) list_make__m(T_IntList, int, __VA_ARGS__)
#define list_make_oid(...) list_make__m(T_OidList, oid, __VA_ARGS__)
#define list_make_xid(...) list_make__m(T_XidList, xid, __VA_ARGS__)

(code is not checked)
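
A quick usage sketch, under the same "not checked" caveat (l1/l2 and the
pointer arguments are just placeholders):

List	   *l1 = list_make(ptr1, ptr2);
List	   *l2 = list_make_int(1, 2, 3);

/* ... use the lists ... */

lists_free(l1, l2);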

If zero arguments (no arguments) should be supported, it is tricky 
because of msvc, but solvable.





Re: Vectorization of some functions and improving pg_list interface

2023-09-06 Thread Yura Sokolov

24.08.2023 17:07, Maxim Orlov wrote:

Hi!

Recently, I've been playing around with pg_lists and realize how 
annoying (maybe, I was a bit tired) some stuff related to the lists.

For an example, see this code
List *l1 = list_make4(1, 2, 3, 4),
  *l2 = list_make4(5, 6, 7, 8),
  *l3 = list_make4(9, 0, 1, 2);
ListCell *lc1, *lc2, *lc3;

forthree(lc1, l1, lc2, l2, lc3, l3) {
...
}

list_free(l1);
list_free(l2);
list_free(l3);

There are several questions:
1) Why do I need to specify the number of elements in the list in the 
function name?

Compiler already knew how much arguments do I use.
2) Why I have to call free for every list? I don't know how to call it 
right, for now I call it vectorization.

     Why not to use simple wrapper to "vectorize" function args?

So, my proposal is:
1) Add a simple macro to "vectorize" functions.
2) Use this macro to "vectorize" list_free and list_free_deep functions.
3) Use this macro to "vectorize" bms_free function.
4) "Vectorize" list_makeN functions.

For this V1 version, I do not remove all list_makeN calls in order to 
reduce diff, but I'll address

this in future, if it will be needed.

In my view, one thing still waiting to be improved if foreach loop. It 
is not very handy to have a bunch of
similar calls foreach, forboth, forthree and etc. It will be ideal to 
have single foreach interface, but I don't know how

to do it without overall interface of the loop.

Any opinions are very welcome!


Given the use case doesn't assume "zero" arguments, it is possible to 
implement "lists_free" with just macro expansion (the following code is not 
checked, but close to valid):


#define VA_FOR_EACH(invoke, join, ...) \
CppConcat(VA_FOR_EACH_, VA_ARGS_NARGS(__VA_ARGS__))( \
invoke, join, __VA_ARGS__)
#define VA_FOR_EACH_1(invoke, join, a1) \
invoke(a1)
#define VA_FOR_EACH_2(invoke, join, a1, a2) \
invoke(a1) join() invoke(a2)
#define VA_FOR_EACH_3(invoke, join, a1, a2, a3) \
invoke(a1) join() invoke(a2) join() invoke(a3)
... up to 63 args

#define VA_SEMICOLON() ;

#define lists_free(...) \
VA_FOR_EACH(list_free, VA_SEMICOLON, __VA_ARGS__)

#define lists_free_deep(...) \
VA_FOR_EACH(list_free_deep, VA_SEMICOLON, __VA_ARGS__)

There could be a couple of issues with msvc, but they are solvable.

--

Regards,
Yura




Re: How to add a new pg oid?

2023-09-06 Thread jacktby jacktby


> 2023年9月6日 01:47,David G. Johnston  写道:
> 
> OIDs don't exist independently of the data they are associated with.  Give 
> more context if you want a better answer.  Or just go look at the source code 
> commits for when the last time something needing an OID got added to the core 
> catalog.
> 
> David J.
>  

{ oid => '111', array_type_oid => '6099', descr => 'similarity columns',
  typname => 'similarity_columns', typlen => '-1', typlen => '-1', typbyval => 
'f', typcategory => 'U',
  typinput => 'byteain', typoutput => 'byteaout', typreceive => 'bytearecv',
  typsend => 'byteasend', typalign => 'i', typstorage => 'x' },

I add above into pg_type.dat. And then I execute “make install” and restart 
pg. And then do the below:
postgres=# SELECT typname from pg_type where typname like '%similarity%';
 typname 
-
(0 rows)

I can’t get the type I added. What else do I need to do?

Re: [Regression] Incorrect filename in test case comment

2023-09-06 Thread Suraj Kharage
Thanks Daniel and Michael.

On Wed, Sep 6, 2023 at 1:52 PM Daniel Gustafsson  wrote:

> > On 6 Sep 2023, at 10:19, Michael Paquier  wrote:
> >
> > On Wed, Sep 06, 2023 at 10:48:32AM +0530, Suraj Kharage wrote:
> >> While browsing the test cases, found that the incorrect filename was
> there
> >> in the test case comment.
> >> The below commit added the custom hash opclass in insert.sql,
> >
> > --- part_part_test_int4_ops and part_test_text_ops in insert.sql.
> > +-- part_part_test_int4_ops and part_test_text_ops in test_setup.sql.
> >
> > Good catch, but part_part_test_int4_ops should be renamed to
> > part_test_int4_ops, removing the first "part_", no?
>
> Ah, seems we came to same conclusion when looking simultaneously, I just
> pushed
> the fix with the typo fix.
>
> --
> Daniel Gustafsson
>
>

-- 
Thanks & Regards,
Suraj kharage,
edbpostgres.com


Re: Improving the heapgetpage function improves performance in common scenarios

2023-09-06 Thread Quan Zongliang




On 2023/9/6 17:07, John Naylor wrote:


On Wed, Sep 6, 2023 at 2:50 PM Quan Zongliang  wrote:


 > If not optimized(--enable-debug CFLAGS='-O0'), there is a clear
 > difference. When the compiler does the optimization, the performance is
 > similar. I think the compiler does a good enough optimization with
 > "pg_attribute_always_inline" and the last two constant parameters when
 > calling heapgetpage_collect.

So as we might expect, more specialization (Andres' patch) has no 
apparent downsides in this workload. (While I'm not sure of the point of 
testing at -O0, I think we can conclude that less-bright compilers will 
show some improvement with either patch.)


If you agree, do you want to withdraw your patch from the commit fest?


Ok.

--
John Naylor
EDB: http://www.enterprisedb.com 






Can a role have indirect ADMIN OPTION on another role?

2023-09-06 Thread Ashutosh Sharma
Hi,

In PG-16, I see that we have made a lot of changes in the area of roles
and privileges. I have a question related to this:

Let's say there is a roleA who creates roleB and then roleB creates
another role, say roleC. By design, A can administer B and B can
administer C. But, can A administer C although it has not created C?
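
For concreteness, the scenario in SQL (role names invented, and assuming
roleA and roleB both have CREATEROLE):

-- connected as roleA (assumed to have CREATEROLE)
CREATE ROLE roleB CREATEROLE;
-- now connected as roleB
CREATE ROLE roleC;
-- question: can roleA GRANT roleC to someone, or ALTER/DROP roleC?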

--
With Regards,
Ashutosh Sharma.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-06 Thread Amit Kapila
On Wed, Sep 6, 2023 at 11:01 AM Peter Smith  wrote:
>
> On Tue, Sep 5, 2023 at 7:34 PM Hayato Kuroda (Fujitsu)
>  wrote:
> >
> > Also, for simplifying codes, only a first-met invalidated slot is output in 
> > the
> > check_old_cluster_for_valid_slots(). Warning messages int the function were
> > removed. I think it may be enough because check_new_cluster_is_empty() do
> > similar thing. Please tell me if it should be reverted...
> >
>
> Another possible idea is to show all the WARNINGS but only when in verbose 
> mode.
>

I think it would be better to write problematic slots in the script
file like we are doing in the function
check_for_composite_data_type_usage()->check_for_data_types_usage()
and give a message suggesting what the user can do as we are doing in
check_for_composite_data_type_usage(). That will be helpful for the
user to take necessary action.

A few other comments:
=
1.
@@ -189,6 +199,8 @@ check_new_cluster(void)
 {
  get_db_and_rel_infos(_cluster);

+ check_new_cluster_logical_replication_slots();
+
  check_new_cluster_is_empty();

  check_loadable_libraries();

Why check_new_cluster_logical_replication_slots is done before
check_new_cluster_is_empty? At least check_new_cluster_is_empty()
would be much quicker to return an error if any. I think if we don't
have a specific reason to position this new check, we can do it at the
end after check_for_new_tablespace_dir() to avoid breaking the order
of existing checks.

2. Shall we rename get_db_and_rel_infos() to
get_db_rel_and_slot_infos() or something like that as that function
now fetches the slot information as well?

-- 
With Regards,
Amit Kapila.




Re: Introduce join_info_array for direct lookups of SpecialJoinInfo by ojrelid

2023-09-06 Thread Richard Guo
On Mon, May 8, 2023 at 10:30 AM Richard Guo  wrote:

> I'd like to devise a test query that shows performance gain from this
> patch, but I'm not sure how to do that.  May need help here.
>

I've been trying for some time but still haven't been able to come up
with a test case that shows the performance improvement of this patch.
My best guess is that situations that can benefit from direct lookups of
SpecialJoinInfo are pretty rare, and the related codes are not in the
critical path.  So for now I think I'd better withdraw this patch to
avoid people wasting time reviewing it.

Thanks
Richard


Re: Improving the heapgetpage function improves performance in common scenarios

2023-09-06 Thread John Naylor
On Wed, Sep 6, 2023 at 2:50 PM Quan Zongliang 
wrote:

> If not optimized(--enable-debug CFLAGS='-O0'), there is a clear
> difference. When the compiler does the optimization, the performance is
> similar. I think the compiler does a good enough optimization with
> "pg_attribute_always_inline" and the last two constant parameters when
> calling heapgetpage_collect.

So as we might expect, more specialization (Andres' patch) has no apparent
downsides in this workload. (While I'm not sure of the point of testing at
-O0, I think we can conclude that less-bright compilers will show some
improvement with either patch.)

If you agree, do you want to withdraw your patch from the commit fest?

--
John Naylor
EDB: http://www.enterprisedb.com


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-06 Thread Amit Kapila
On Wed, Sep 6, 2023 at 8:47 AM Zhijie Hou (Fujitsu)
 wrote:
>
> On Tuesday, September 5, 2023 3:35 PM Kuroda, Hayato/黒田 隼人 
>  wrote:
> >
> > Dear Hou-san,
> >
> > > Based on this, it’s possible that the slots we get each time when
> > > checking wal_status are different, because they may get changed in between
> > these checks.
> > > This may not cause serious problems for now, because we will either
> > > copy all the slots including ones invalidated when upgrading or we
> > > report ERROR. But I feel it's better to get consistent result each
> > > time we check the slots to close the possibility for problems in the
> > > future. So, I feel we could centralize the check for wal_status and
> > > slots fetch, so that even if some slots status changed after that, it 
> > > won't have
> > a risk to affect our check. What do you think ?
> >
> > Thank you for giving the suggestion! I agreed that to centralize checks, 
> > and I
> > had already started to modify. Here is the updated patch.
> >
> > In this patch all slot infos are extracted in the
> > get_old_cluster_logical_slot_infos(),
> > upcoming functions uses them. Based on the change, two attributes
> > confirmed_flush and wal_status were added in LogicalSlotInfo.
> >
> > IIUC we cannot use strcut List in the client codes, so structures and 
> > related
> > functions are added in the function.c. These are used for extracting unique
> > plugins, but it may be overkill because check_loadable_libraries() handle
> > duplicated entries. If we can ignore duplicated entries, these functions 
> > can be
> > removed.
> >
> > Also, for simplifying codes, only a first-met invalidated slot is output in 
> > the
> > check_old_cluster_for_valid_slots(). Warning messages int the function were
> > removed. I think it may be enough because check_new_cluster_is_empty() do
> > similar thing. Please tell me if it should be reverted...
>
> Thank for updating the patch ! here are few comments.
>
> 1.
>
> +   res = executeQueryOrDie(conn, "SHOW wal_level;");
> +   wal_level = PQgetvalue(res, 0, 0);
>
> +   res = executeQueryOrDie(conn, "SHOW wal_level;");
> +   wal_level = PQgetvalue(res, 0, 0);
>
> I think it would be better to do a sanity check using PQntuples() before
> calling PQgetvalue() in above places.
>
> 2.
>
> +/*
> + * Helper function for get_old_cluster_logical_slot_infos()
> + */
> +static WALAvailability
> +GetWALAvailabilityByString(const char *str)
> +{
> +   WALAvailability status = WALAVAIL_INVALID_LSN;
> +
> +   if (strcmp(str, "reserved") == 0)
> +   status = WALAVAIL_RESERVED;
>
> Not a comment, but I am wondering if we could use conflicting field to do this
> check, so that we could avoid the new conversion function and structure
> movement. What do you think ?
>

I also think referring to the conflicting field would be better not
only for the purpose of avoiding extra code but also to give accurate
information about invalidated slots for which we want to give an
error.

Additionally, I think we should try to avoid writing a new function
strtoLSN as that adds a maintainability burden. We can probably send
the value fetched from pg_controldata in the query for comparison with
confirmed_flush LSN.
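
For example, something along these lines (the LSN literal is only a
placeholder for the "Latest checkpoint location" taken from pg_controldata
of the old cluster):

SELECT slot_name
  FROM pg_replication_slots
 WHERE slot_type = 'logical'
   AND confirmed_flush_lsn <> '0/12345678'::pg_lsn;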

-- 
With Regards,
Amit Kapila.




Re: GUC for temporarily disabling event triggers

2023-09-06 Thread Daniel Gustafsson
> On 5 Sep 2023, at 17:29, Robert Haas  wrote:
> On Tue, Sep 5, 2023 at 8:12 AM Daniel Gustafsson  wrote:

>> The attached version of the patch replaces it with a boolean flag for turning
>> off all event triggers, and I also renamed it to the debug_xxx "GUC 
>> namespace"
>> which seemed more appropriate.
> 
> I don't care for the debug_xxx renaming, myself. I think that should
> be reserved for things where we believe there's no reason to ever use
> it in production/real life, or for things whose purpose is to emit
> debugging messages. Neither is the case here.

Fair enough, how about disable_event_trigger instead as per the attached?

--
Daniel Gustafsson



v7-0001-Add-GUC-for-temporarily-disabling-event-triggers.patch
Description: Binary data


Re: Synchronizing slots from primary to standby

2023-09-06 Thread shveta malik
On Fri, Sep 1, 2023 at 1:59 PM Peter Smith  wrote:
>
> Hi Shveta. Here are some comments for patch v14-0002
>
> The patch is large, so my code review is a WIP... more later next week...
>

Thanks Peter for the feedback. I have tried to address most of these
in v15. Please find my response inline for the ones which I have not
addressed.

> ==
> GENERAL
>
> 1. Patch size
>
> The patch is 2700 lines.  Is it possible to break this up into smaller
> self-contained parts to make the reviews more manageable?
>

Currently, patches are created based on work done on primary and
standby. Patch 001 for primary-side implementation and 002 for standby
side. Let me think more on this and see if the changes can be
segregated further.


>
> 26.
> +typedef struct SlotSyncWorker
> +{
> + /* Time at which this worker was launched. */
> + TimestampTz launch_time;
> +
> + /* Indicates if this slot is used or free. */
> + bool in_use;
> +
> + /* The slot in worker pool to which it is attached */
> + int slot;
> +
> + /* Increased every time the slot is taken by new worker. */
> + uint16 generation;
> +
> + /* Pointer to proc array. NULL if not running. */
> + PGPROC*proc;
> +
> + /* User to use for connection (will be same as owner of subscription). */
> + Oid userid;
> +
> + /* Database id to connect to. */
> + Oid dbid;
> +
> + /* Count of Database ids it manages */
> + uint32 dbcount;
> +
> + /* DSA for dbids */
> + dsa_area *dbids_dsa;
> +
> + /* dsa_pointer for database ids it manages */
> + dsa_pointer dbids_dp;
> +
> + /* Mutex to access dbids in dsa */
> + slock_t mutex;
> +
> + /* Info about slot being monitored for worker's naptime purpose */
> + SlotSyncWorkerWatchSlot monitor;
> +} SlotSyncWorker;
>
> There seems an awful lot about this struct which is common with
> 'LogicalRepWorker' struct.
>
> It seems a shame not to make use of the commonality instead of all the
> cut/paste here.
>
> E.g. Can it be rearranged so all these common fields are shared:
> - launch_time
> - in_use
> - slot
> - generation
> - proc
> - userid
> - dbid
>
> ==
> src/include/storage/lwlock.h

Sure, I had this in mind along with previous comments where it was
suggested to merge similar functions like
WaitForReplicationWorkerAttach, WaitForSlotSyncWorkerAttach etc. That
merging is only possible if we merge the common parts of these
structures. This is WIP and will be addressed in the next version.
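
As a rough sketch of the direction (names are placeholders, nothing final),
the shared fields could be hoisted into a common struct embedded by both
workers:

typedef struct WorkerHeader
{
	TimestampTz launch_time;
	bool		in_use;
	int			slot;
	uint16		generation;
	PGPROC	   *proc;
	Oid			userid;
	Oid			dbid;
} WorkerHeader;

typedef struct SlotSyncWorker
{
	WorkerHeader hdr;	/* fields shared with LogicalRepWorker */
	/* slot-sync specific members (dbcount, dbids_dsa, mutex, ...) follow */
} SlotSyncWorker;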

>
> 27.
>   LWTRANCHE_LAUNCHER_HASH,
> - LWTRANCHE_FIRST_USER_DEFINED
> + LWTRANCHE_FIRST_USER_DEFINED,
> + LWTRANCHE_SLOT_SYNC_DSA
>  } BuiltinTrancheIds;
>
> Isn't 'LWTRANCHE_FIRST_USER_DEFINED' supposed to the be last enum?
>
> ==
> src/test/recovery/meson.build
>
> ==
> src/test/recovery/t/051_slot_sync.pl
>

I have currently removed this file from the patch. Please see my
comments (pt 3) here:
https://mail.google.com/mail/u/0/?ik=52d5778aba=om=msg-a:r-2984462571505788980

thanks
Shveta

> 28.
> +
> +# Copyright (c) 2021, PostgreSQL Global Development Group
> +
> +use strict;
>
> Wrong copyright date
>
> ~~~
>
> 29.
> +my $node_primary = PostgreSQL::Test::Cluster->new('primary');
> +my $node_phys_standby = PostgreSQL::Test::Cluster->new('phys_standby');
> +my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
>
> 29a.
> Can't all the subroutines be up-front? Then this can move to be with
> the other node initialisation code that comets next.
>
> ~
>
> 29b.
> Add a comment something like # Setup nodes
>
> ~~~
>
> 30.
> +# Check conflicting status in pg_replication_slots.
> +sub check_slots_conflicting_status
> +{
> + my $res = $node_phys_standby->safe_psql(
> + 'postgres', qq(
> + select bool_and(conflicting) from pg_replication_slots;));
> +
> + is($res, 't',
> + "Logical slot is reported as conflicting");
> +}
>
> Doesn't bool_and() mean returns false if only some but not all slots
> are conflicting - is that intentional?> Or is this sub-routine only
> expecting to test one slot, in which case maybe the SQL should include
> also the 'slot_name'?
>
> ~~~
>
> 31.
> +$node_primary->start;
> +$node_primary->psql('postgres', q{SELECT
> pg_create_physical_replication_slot('pslot1');});
> +
> +$node_primary->backup('backup');
> +
> +$node_phys_standby->init_from_backup($node_primary, 'backup',
> has_streaming => 1);
> +$node_phys_standby->append_conf('postgresql.conf', q{
> +synchronize_slot_names = '*'
> +primary_slot_name = 'pslot1'
> +hot_standby_feedback = off
> +});
> +$node_phys_standby->start;
> +
> +$node_primary->safe_psql('postgres', "CREATE TABLE t1 (a int PRIMARY KEY)");
> +$node_primary->safe_psql('postgres', "INSERT INTO t1 VALUES (1), (2), (3)");
>
> The comments seem mostly to describe details about what are the
> expectations at each test step.
>
> IMO there also needs to be a larger "overview" comment to describe
> more generally *what* this is testing, and *how* it is testing it.
> e.g. it is hard to understand the test without being already familiar
> with the patch.
>
> --
> 

Re: [Regression] Incorrect filename in test case comment

2023-09-06 Thread Daniel Gustafsson
> On 6 Sep 2023, at 10:19, Michael Paquier  wrote:
> 
> On Wed, Sep 06, 2023 at 10:48:32AM +0530, Suraj Kharage wrote:
>> While browsing the test cases, found that the incorrect filename was there
>> in the test case comment.
>> The below commit added the custom hash opclass in insert.sql,
> 
> --- part_part_test_int4_ops and part_test_text_ops in insert.sql.
> +-- part_part_test_int4_ops and part_test_text_ops in test_setup.sql.
> 
> Good catch, but part_part_test_int4_ops should be renamed to
> part_test_int4_ops, removing the first "part_", no?

Ah, seems we came to same conclusion when looking simultaneously, I just pushed
the fix with the typo fix.

--
Daniel Gustafsson





Re: [Regression] Incorrect filename in test case comment

2023-09-06 Thread Daniel Gustafsson
> On 6 Sep 2023, at 07:18, Suraj Kharage  wrote:

> we haven't changed the filename in other test cases.
> Did the same in the attached patch.

Pushed (along with a small typo fix), thanks!

--
Daniel Gustafsson





Re: [Regression] Incorrect filename in test case comment

2023-09-06 Thread Michael Paquier
On Wed, Sep 06, 2023 at 10:48:32AM +0530, Suraj Kharage wrote:
> While browsing the test cases, found that the incorrect filename was there
> in the test case comment.
> The below commit added the custom hash opclass in insert.sql,

--- part_part_test_int4_ops and part_test_text_ops in insert.sql.
+-- part_part_test_int4_ops and part_test_text_ops in test_setup.sql.

Good catch, but part_part_test_int4_ops should be renamed to
part_test_int4_ops, removing the first "part_", no?
--
Michael


signature.asc
Description: PGP signature


Re: Transaction timeout

2023-09-06 Thread Fujii Masao




On 2022/12/19 5:53, Andrey Borodin wrote:

On Wed, Dec 7, 2022 at 1:30 PM Andrey Borodin  wrote:

I hope to address other feedback on the weekend.


Thanks for implementing this feature!

While testing the v4 patch, I noticed it doesn't handle the COMMIT AND CHAIN
case correctly.
When COMMIT AND CHAIN is executed, I believe the transaction timeout counter 
should reset
and start from zero with the next transaction. However, it appears that the 
current
v4 patch doesn't reset the counter in this scenario. Can you confirm this?
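
To make the expectation concrete (assuming the GUC is named
transaction_timeout as in the patch):

SET transaction_timeout = '5s';
BEGIN;
-- ... about 4 seconds of work ...
COMMIT AND CHAIN;   -- expectation: the 5s timer restarts for the chained transaction
-- ... another 4 seconds of work should then not time out ...
COMMIT;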

With the v4 patch, I found that timeout errors no longer occur during the idle 
in
transaction phase. Instead, they occur when the next statement is executed. Is 
this
the intended behavior? I thought some users might want to use the transaction 
timeout
feature to prevent prolonged transactions and promptly release resources (e.g., 
locks)
in case of a timeout, similar to idle_in_transaction_session_timeout.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Improving the heapgetpage function improves performance in common scenarios

2023-09-06 Thread Quan Zongliang




On 2023/9/6 15:50, Quan Zongliang wrote:



On 2023/9/5 18:46, John Naylor wrote:


On Tue, Sep 5, 2023 at 4:27 PM Quan Zongliang  wrote:


 > Here's how I test it
 >     EXPLAIN ANALYZE SELECT * FROM orders;

Note that EXPLAIN ANALYZE has quite a bit of overhead, so it's not 
good for these kinds of tests.


 > I'll also try Andres Freund's test method next.

Commit f691f5b80a85 from today removes another source of overhead in 
this function, so I suggest testing against that, if you wish to test 
again.



Test with the latest code of the master branch, see the attached results.

If not optimized(--enable-debug CFLAGS='-O0'), there is a clear 
difference. When the compiler does the optimization, the performance is 
similar. I think the compiler does a good enough optimization with 
"pg_attribute_always_inline" and the last two constant parameters when 
calling heapgetpage_collect.
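
To illustrate what I mean (a simplified sketch only, not the actual heapam
code; the two helper calls are hypothetical):

static pg_attribute_always_inline int
collect_sketch(Page page, int nlines, bool check_serializable, bool all_visible)
{
	int			ntup = 0;

	for (int i = 0; i < nlines; i++)
	{
		/* branches on constant arguments disappear in each inlined copy */
		if (!all_visible && !item_is_visible(page, i))	/* hypothetical */
			continue;
		if (check_serializable)
			check_serializability(page, i);				/* hypothetical */
		ntup++;
	}
	return ntup;
}

/* call sites pass constants, e.g. collect_sketch(page, nlines, false, true) */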


One note: the first execution time in the attached results is not included
in the average.





--
John Naylor
EDB: http://www.enterprisedb.com 






Re: Improving the heapgetpage function improves performance in common scenarios

2023-09-06 Thread Quan Zongliang



On 2023/9/5 18:46, John Naylor wrote:


On Tue, Sep 5, 2023 at 4:27 PM Quan Zongliang  wrote:


 > Here's how I test it
 >     EXPLAIN ANALYZE SELECT * FROM orders;

Note that EXPLAIN ANALYZE has quite a bit of overhead, so it's not good 
for these kinds of tests.


 > I'll also try Andres Freund's test method next.

Commit f691f5b80a85 from today removes another source of overhead in 
this function, so I suggest testing against that, if you wish to test again.



Test with the latest code of the master branch, see the attached results.

If not optimized(--enable-debug CFLAGS='-O0'), there is a clear 
difference. When the compiler does the optimization, the performance is 
similar. I think the compiler does a good enough optimization with 
"pg_attribute_always_inline" and the last two constant parameters when 
calling heapgetpage_collect.




--
John Naylor
EDB: http://www.enterprisedb.com 

Re: A minor adjustment to get_cheapest_path_for_pathkeys

2023-09-06 Thread Andy Fan
>
> I guess *valuable* sometimes means the effort we pay is greater
> than the benefit we get. As for this patch, the benefit is not huge (it
> is possible the compiler already does that), and the main cost is the
> committer's attention: someone needs to grab the patch, write a correct
> commit message, credit the author, and push it.  I'm not sure if
> Aleksander is worrying that this kind of patch will grab too much of
> the committers' attention, and I do see there are lots of patches like
> this.
>

I forgot to mention that the past contributions of the author should be a
factor as well.  Richard has provided lots of performance improvements,
bug fixes, and code reviews, so I believe more attention from committers
is a reasonable request.

-- 
Best Regards
Andy Fan


Re: Extract numeric filed in JSONB more effectively

2023-09-06 Thread Andy Fan
>
>
> based on v13.
> IMHO, it might be a good idea to write some comments on
> jsonb_object_field_internal. especially the second boolean argument.
> something like "some case, we just want return JsonbValue rather than
> Jsonb. to return JsonbValue, make as_jsonb be false".
>

OK, I will propose "return a JsonbValue when as_jsonb is false".


> I am not sure "jsonb_object_field_start" is a good name, so far I only
> come up with "jsonb_object_field_to_jsonbvalues".


Yes, I think it is a good idea.  Putting the JsonbValue in the name can
compensate for the imprecision of "internal" as a return type.  I am
wondering whether we should rename jsonb_finish_numeric to
jsonbvalue_to_numeric as well.


>
linitial(jsonb_start_func->args) =
> makeRelabelType(linitial(jsonb_start_func->args),
>INTERNALOID, 0,
>InvalidOid,
>COERCE_IMPLICIT_CAST);
>
> if no need, output typmod (usually -1), so here should be -1 rather than 0?


I agree. -1 is better than 0.
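
That is, the call above would become (same snippet, only the typmod changed):

linitial(jsonb_start_func->args) =
	makeRelabelType(linitial(jsonb_start_func->args),
					INTERNALOID, -1,
					InvalidOid,
					COERCE_IMPLICIT_CAST);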

Thanks for the code-level review again! I want to wait a bit longer to
gather more feedback.  I'm willing to pick a better name, but I hope to
avoid renaming it to A and then renaming it back shortly after.

-- 
Best Regards
Andy Fan


Re: [PATCH] psql: Add tab-complete for optional view parameters

2023-09-06 Thread Peter Eisentraut
I noticed that on this commitfest entry 
(https://commitfest.postgresql.org/44/4491/), the reviewers were 
assigned by the patch author (presumably because they had previously 
contributed to this thread).  Unless these individuals know about that, 
this is unlikely to work out.  It's better to remove the reviewer 
entries and let people sign up on their own.  (You can nudge potential 
candidates, of course.)



On 07.08.23 20:49, Christoph Heiss wrote:


Hi all,
sorry for the long delay.

On Mon, Jan 09, 2023 at 04:32:09PM +0100, Jim Jones wrote:

However, an "ALTER TABLE  S" does not complete the open
parenthesis "(" from "SET (", as suggested in "ALTER VIEW  ".

postgres=# ALTER VIEW w SET
Display all 187 possibilities? (y or n)

Is it intended to behave like this? If so, an "ALTER VIEW 
RES" does complete the open parenthesis -> "RESET (".


On Sun, Jan 29, 2023 at 10:19:12AM +, Mikhail Gribkov wrote:

The patch have a potential, although I have to agree with Jim Jones,
it obviously have a problem with "alter view  set"
handling.
[..]
I think it may worth looking at "alter materialized view"  completion
tree and making "alter view" the same way.


Thank you both for reviewing/testing and the suggestions. Yeah,
definitely, that sounds very sensible.

I've attached a new revision, rebased and addressing the above by
aligning it with how "ALTER MATERIALIZED VIEW" works, such that "SET ("
and "SET SCHEMA" won't compete anymore. So that should now work more
like expected.

postgres=# ALTER MATERIALIZED VIEW m
ALTER COLUMN CLUSTER ON   DEPENDS ON EXTENSION
NO DEPENDS ON EXTENSION  OWNER TO RENAME
RESET (  SET

postgres=# ALTER MATERIALIZED VIEW m SET
(ACCESS METHODSCHEMA   TABLESPACE
WITHOUT CLUSTER

postgres=# ALTER VIEW v
ALTER COLUMN  OWNER TO  RENAMERESET (   SET

postgres=# ALTER VIEW v SET
(   SCHEMA

postgres=# ALTER VIEW v SET (
CHECK_OPTION  SECURITY_BARRIER  SECURITY_INVOKER

On Fri, Jan 06, 2023 at 12:18:44PM +, Dean Rasheed wrote:

Hmm, I don't think we should be offering "check_option" as a tab
completion for CREATE VIEW at all, since that would encourage users to
use non-SQL-standard syntax, rather than CREATE VIEW ... WITH
[CASCADED|LOCAL] CHECK OPTION.


Left that part in for now. I would argue that it is a well-documented
combination and as such users would expect it to turn up in the
tab-complete as well. OTOH not against removing it either, if there are
others voicing the same opinion ..

Thanks,
Christoph






Re: A minor adjustment to get_cheapest_path_for_pathkeys

2023-09-06 Thread Andy Fan
On Wed, Sep 6, 2023 at 2:45 PM Andy Fan  wrote:

>
> I like this consultation, so +1 from me :)
>

s/consultation/conclusion.


-- 
Best Regards
Andy Fan


Re: A minor adjustment to get_cheapest_path_for_pathkeys

2023-09-06 Thread Andy Fan
On Wed, Sep 6, 2023 at 4:06 AM Robert Haas  wrote:

> On Tue, Sep 5, 2023 at 12:05 PM Aleksander Alekseev
>  wrote:
> > Now when we continue reviewing the patch, could you please elaborate a
> > bit on why you think it's worth committing?
>
> Well, why not? The test he's proposing to move earlier doesn't involve
> calling a function, so it should be cheaper than the one he's moving
> it past, which does.
>
> I mean, I don't know whether the savings are going to be measurable on
> a benchmark, but I guess I don't particularly see why it matters. Why
> write a function that says "this thing is cheaper so we test it first"
> and then perform a cheaper test afterwards? That's just silly. We can
> either change the comment to say "we do this first for no reason even
> though it would be more sensible to do the cheap test first" or we can
> reorder the tests to match the principle set forth in the existing
> comment.
>
> I mean, unless there's some reason why it *isn't* cheaper. In that
> case we should have a different conversation...
>

I like this consultation, so +1 from me :)

I guess *valuable* sometimes means the effort we pay is greater
than the benefit we get. As for this patch, the benefit is not huge (it
is possible the compiler already does that), and the main cost is the
committer's attention: someone needs to grab the patch, write a correct
commit message, credit the author, and push it.  I'm not sure if
Aleksander is worrying that this kind of patch will grab too much of
the committers' attention, and I do see there are lots of patches like
this.

In my opinion,  we can do some stuff to improve the ROI.
-  Authors should do as much as possible,  mainly a better commit
message.  As for this patch, the commit message is " Adjustment
to get_cheapest_path_for_pathkeys"  which I don't think matches
our culture.
- Authors can provide more refactor code if possible. like 8b26769bc

.
- After others reviewers read the patch and think it is good to commit
with the rule above,  who can mark the commitfest entry as "Ready
for Committer".  Whenever a committer wants some non mental stress,
they can pick this and commit this.

Actually I also want to know what "Ready for Committer" is designed for,
and when/who can mark a patch as "Ready for Committer" ?

-- 
Best Regards
Andy Fan


Re: persist logical slots to disk during shutdown checkpoint

2023-09-06 Thread Dilip Kumar
On Wed, Sep 6, 2023 at 12:01 PM Amit Kapila  wrote:
>
> On Wed, Sep 6, 2023 at 9:57 AM Dilip Kumar  wrote:
> >
> > On Wed, Sep 6, 2023 at 9:47 AM Amit Kapila  wrote:
> > >
> > > On Tue, Sep 5, 2023 at 6:00 PM Dilip Kumar  wrote:
> >
> > >
> > > Right, it can change and in that case, the check related to
> > > confirm_flush LSN will fail during the upgrade. However, the point is
> > > that if we don't take spinlock, we need to properly write comments on
> > > why unlike in other places it is safe here to check these values
> > > without spinlock.
> >
> > I agree with that, but now also it is not true that we are alway
> > reading this under the spin lock for example[1][2], we can see we are
> > reading this without spin lock.
> > [1]
> > StartLogicalReplication
> > {
> > /*
> > * Report the location after which we'll send out further commits as the
> > * current sentPtr.
> > */
> > sentPtr = MyReplicationSlot->data.confirmed_flush;
> > }
> > [2]
> > LogicalIncreaseRestartDecodingForSlot
> > {
> > /* candidates are already valid with the current flush position, apply */
> > if (updated_lsn)
> > LogicalConfirmReceivedLocation(slot->data.confirmed_flush);
> > }
> >
>
> These are accessed only in walsender and confirmed_flush is always
> updated by walsender. So, this is clearly okay.

Hmm, that's a valid point.

> >  We can do that but I feel we have to be careful for
> > > all future usages of these variables, so, having spinlock makes them
> > > follow the normal coding pattern which I feel makes it more robust.
> > > Yes, marking dirty via common function also has merits but personally,
> > > I find it better to follow the normal coding practice of checking the
> > > required fields under spinlock. The other possibility is to first
> > > check if we need to mark the slot dirty under spinlock, then release
> > > the spinlock, and then call the common MarkDirty function, but again
> > > that will be under the assumption that these flags won't change.
> >
> > Thats true, but we are already making the assumption because now also
> > we are taking the spinlock and taking a decision of marking the slot
> > dirty.  And after that we are releasing the spin lock and if we do not
> > have guarantee that it can not concurrently change the many things can
> > go wrong no?
> >
>
> Also, note that invalidated field could be updated by startup process
> but that is only possible on standby, so it is safe but again that
> would be another assumption.

Okay, so I also agree to go with the current patch, because, as you
said above, if we access this without a spinlock outside the walsender
then we would be making a new exception, and I agree with the decision
of not making a new exception.
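
Just to spell out the kind of pattern being discussed in code form (a rough
sketch; last_saved_confirmed_flush stands in for whatever the patch actually
compares against):

/* read the fields of interest under the spinlock ... */
SpinLockAcquire(&slot->mutex);
needs_save = (slot->data.invalidated == RS_INVAL_NONE &&
			  slot->data.confirmed_flush != last_saved_confirmed_flush);
SpinLockRelease(&slot->mutex);

/* ... and only then mark the slot dirty outside the lock */
if (needs_save)
	ReplicationSlotMarkDirty();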

Other than that the patch LGTM.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: persist logical slots to disk during shutdown checkpoint

2023-09-06 Thread Amit Kapila
On Wed, Sep 6, 2023 at 9:57 AM Dilip Kumar  wrote:
>
> On Wed, Sep 6, 2023 at 9:47 AM Amit Kapila  wrote:
> >
> > On Tue, Sep 5, 2023 at 6:00 PM Dilip Kumar  wrote:
>
> >
> > Right, it can change and in that case, the check related to
> > confirm_flush LSN will fail during the upgrade. However, the point is
> > that if we don't take spinlock, we need to properly write comments on
> > why unlike in other places it is safe here to check these values
> > without spinlock.
>
> I agree with that, but now also it is not true that we are alway
> reading this under the spin lock for example[1][2], we can see we are
> reading this without spin lock.
> [1]
> StartLogicalReplication
> {
> /*
> * Report the location after which we'll send out further commits as the
> * current sentPtr.
> */
> sentPtr = MyReplicationSlot->data.confirmed_flush;
> }
> [2]
> LogicalIncreaseRestartDecodingForSlot
> {
> /* candidates are already valid with the current flush position, apply */
> if (updated_lsn)
> LogicalConfirmReceivedLocation(slot->data.confirmed_flush);
> }
>

These are accessed only in walsender and confirmed_flush is always
updated by walsender. So, this is clearly okay.

>  We can do that but I feel we have to be careful for
> > all future usages of these variables, so, having spinlock makes them
> > follow the normal coding pattern which I feel makes it more robust.
> > Yes, marking dirty via common function also has merits but personally,
> > I find it better to follow the normal coding practice of checking the
> > required fields under spinlock. The other possibility is to first
> > check if we need to mark the slot dirty under spinlock, then release
> > the spinlock, and then call the common MarkDirty function, but again
> > that will be under the assumption that these flags won't change.
>
> Thats true, but we are already making the assumption because now also
> we are taking the spinlock and taking a decision of marking the slot
> dirty.  And after that we are releasing the spin lock and if we do not
> have guarantee that it can not concurrently change the many things can
> go wrong no?
>

Also, note that invalidated field could be updated by startup process
but that is only possible on standby, so it is safe but again that
would be another assumption.

> Anyway said that, I do not have any strong objection against what we
> are doing now.  There were discussion around making the code so that
> it can use common function and I was suggesting how it could be
> achieved but I am not against the current way either.
>

Okay, thanks for looking into it.

-- 
With Regards,
Amit Kapila.




Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-09-06 Thread John Naylor
On Mon, Aug 28, 2023 at 9:44 PM Masahiko Sawada 
wrote:
>
> I've attached v42 patch set. I improved tidstore regression test codes
> in addition of imcorporating the above comments.

Seems fine at a glance, thanks. I will build on this to implement
variable-length values. I have already finished one prerequisite which is:
public APIs passing pointers to values.

--
John Naylor
EDB: http://www.enterprisedb.com