date:20230823

On Wed, Aug 23, 2023 at 07:32:06PM -0700, Nathan Bossart wrote:
> On Thu, Aug 24, 2023 at 10:22:49AM +0900, Ian Lawrence Barwick wrote:
>> Looking at the code, this is happening because
>> "pgstat_fetch_stat_local_beentry()"
>> expects to be passed the backend ID as an integer representing a 1-based 
>> index
>> referring to "localBackendStatusTable", but "pg_stat_get_backend_subxact()"
>> is presumably intended to take the actual BackendId , as per other
>> "pg_stat_get_XXX()"
>> functions.
> 
> Yes, this was changed in d7e39d7, but 10ea0f9 seems to have missed the
> memo.

BTW I'd argue that this is a bug in v16 that we should try to fix before
GA, so I've added an open item [0].  I assigned it to Robert (CC'd) since
he was the committer, but I'm happy to pick it up.

[0] https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items#Open_Issues

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-23 Thread Amit Kapila

On Thu, Aug 24, 2023 at 7:55 AM Peter Smith  wrote:
>
> ==
> src/bin/pg_upgrade/info.c
>
> 4. get_logical_slot_infos
>
> +/*
> + * get_logical_slot_infos()
> + *
> + * Higher level routine to generate LogicalSlotInfoArr for all databases.
> + */
> +void
> +get_logical_slot_infos(ClusterInfo *cluster)
> +{
> + int dbnum;
> +
> + /* Logical slots can be migrated since PG17. */
> + if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
> + return;
>
> It is no longer clear to me what is the purpose of these version checks.
>
> As mentioned in comment #2 above, I don't think we need to check the 
> new_cluster >= 1700, because this patch is for PG17 by definition.
>
> OTOH, I also don't recognise the reason why there has to be a PG17 
> restriction on the 'old_cluster' version. Such a restriction seems to cripple 
> the usefulness of this patch (eg. cannot even upgrade slots from PG16 to 
> PG17), and there is no explanation given for it. If there is some valid 
> incompatibility reason why only PG17 old_cluster slots can be upgraded then 
> it ought to be described in detail and probably also mentioned in the PG DOCS.
>

One of the main reasons is that slots prior to v17 won't persist
confirm_flush_lsn as discussed in the email thread [1] which means it
will always fail even if we allow to upgrade from versions prior to
v17. Now, there is an argument that let's backpatch what's being
discussed in [1] and then we will be able to upgrade slots from the
prior version. Normally, we don't backatch new enhancements, so even
if we want to do that in this case, a separate argument has to be made
for it. We have already discussed this point in this thread. We can
probably add a comment in the patch where we do version checks so that
it will be a bit easier to understand the reason.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1JzJagMmb_E8D4au%3DGYQkxox0AfNBm1FbP7sy7t4YWXPQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.

Re: Oversight in reparameterize_path_by_child leading to executor crash

2023-08-23 Thread Richard Guo

On Thu, Aug 24, 2023 at 1:44 AM Tom Lane  wrote:

> Richard Guo  writes:
> > If we go with the "tablesample scans can't be reparameterized" approach
> > in the back branches, I'm a little concerned that what if we find more
> > cases in the futrue where we need modify RTEs for reparameterization.
> > So I spent some time seeking and have managed to find one: there might
> > be lateral references in a scan path's restriction clauses, and
> > currently reparameterize_path_by_child fails to adjust them.
>
> Hmm, this seems completely wrong to me.  By definition, such clauses
> ought to be join clauses not restriction clauses, so how are we getting
> into this state?  IOW, I agree this is clearly buggy but I think the
> bug is someplace else.

If the clause contains PHVs that syntactically belong to a rel and
meanwhile have lateral references to other rels, then it may become a
restriction clause with lateral references.  Take the query shown
upthread as an example,

select count(*) from prt1 t1 left join lateral
(select t1.b as t1b, t2.* from prt2 t2) s
on t1.a = s.b where s.t1b = s.a;

The clause 's.t1b = s.a' would become 'PHV(t1.b) = t2.a' after we have
pulled up the subquery.  The PHV in it syntactically belongs to 't2' and
laterally refers to 't1'.  So this clause is actually a restriction
clause for rel 't2', and will be put into the baserestrictinfo of t2
rel.  But it also has lateral reference to rel 't1', which we need to
adjust in reparameterize_path_by_child for partitionwise join.

Thanks
Richard

Re: pg_stat_get_backend_subxact() and backend IDs?

On Thu, Aug 24, 2023 at 10:22:49AM +0900, Ian Lawrence Barwick wrote:
> Looking at the code, this is happening because
> "pgstat_fetch_stat_local_beentry()"
> expects to be passed the backend ID as an integer representing a 1-based index
> referring to "localBackendStatusTable", but "pg_stat_get_backend_subxact()"
> is presumably intended to take the actual BackendId , as per other
> "pg_stat_get_XXX()"
> functions.

Yes, this was changed in d7e39d7, but 10ea0f9 seems to have missed the
memo.

> Assuming I am not misunderstanding something here (always a
> possibility, apologies
> in advance if this is merely noise), what is actually needed is a function 
> which
> accepts a BackendId (as per "pgstat_fetch_stat_beentry()"), but returns a
> LocalPgBackendStatus (as per "pgstat_fetch_stat_local_beentry()") like the
> attached, clumsily named "pgstat_fetch_stat_backend_local_beentry()".

I think you are right.  The relevant information is only available in
LocalPgBackendStatus, but there's presently no helper function for
obtaining the "local" status with the BackendId.

> +LocalPgBackendStatus *
> +pgstat_fetch_stat_backend_local_beentry(BackendId beid)
> +{
> + LocalPgBackendStatus key;
> +
> + pgstat_read_current_status();
> +
> + /*
> +  * Since the localBackendStatusTable is in order by backend_id, we can 
> use
> +  * bsearch() to search it efficiently.
> +  */
> + key.backend_id = beid;
> +
> + return (LocalPgBackendStatus *) bsearch(, localBackendStatusTable,
> + 
> localNumBackends,
> + 
> sizeof(LocalPgBackendStatus),
> + 
> cmp_lbestatus);
> +}

We could probably modify pgstat_fetch_stat_beentry() to use this new
function.  I suspect we'll want to work on the naming, too.  Maybe we could
name them pg_stat_fetch_local_beentry_by_index() and
pg_stat_fetch_local_beentry_by_backendid().

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-23 Thread Peter Smith

Thanks for the updated patches.

Here are some review comments for the patch v24-0002

==
doc/src/sgml/ref/pgupgrade.sgml

1.
+ 
+  
+   All slots on the old cluster must be usable, i.e., there are no
slots
+   whose wal_status is
lost (see
+   ).
+  
+ 
+ 
+  
+   confirmed_flush_lsn (see )
+   of all slots on the old cluster must be the same as the latest
+   checkpoint location.
+  
+ 

It might be more tidy to change the way those links (e.g. "See section
54.19") are presented:

1a.
SUGGESTION
All slots on the old cluster must be usable, i.e., there are no slots whose
pg_replication_slots.wal_status
is lost.

~

1b.
SUGGESTION
pg_replication_slots.confirmed_flush_lsn
of all slots on the old cluster must be the same as the latest checkpoint
location.

==
src/bin/pg_upgrade/check.c

2.
+ /* Logical replication slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(new_cluster.major_version) >= 1700)
+ check_new_cluster_logical_replication_slots();
+

Does it even make sense to check the new_cluster version? IIUC pg_upgrade
*always* updates to the current PG version, which must be 1700 by
definition, because this only is a PG17 patch, right?

For example, see check_cluster_versions() function where it does this check:

/* Only current PG version is supported as a target */
if (GET_MAJOR_VERSION(new_cluster.major_version) !=
GET_MAJOR_VERSION(PG_VERSION_NUM))
pg_fatal("This utility can only upgrade to PostgreSQL version %s.",
PG_MAJORVERSION);

==
src/bin/pg_upgrade/function.c

3.
os_info.libraries = (LibraryInfo *) pg_malloc(totaltups *
sizeof(LibraryInfo));
totaltups = 0;

for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
{
PGresult   *res = ress[dbnum];
int ntups;
int rowno;

ntups = PQntuples(res);
for (rowno = 0; rowno < ntups; rowno++)
{
char   *lib = PQgetvalue(res, rowno, 0);

os_info.libraries[totaltups].name = pg_strdup(lib);
os_info.libraries[totaltups].dbnum = dbnum;

totaltups++;
}
PQclear(res);
}

~

Although this was not introduced by your patch, I do not understand why the
'totaltups' variable gets reset to zero and then re-incremented in these
loops.

In other words, how is it possible for the end result of 'totaltups' to be
any different from what was already calculated earlier in this function?

IMO totaltups = 0; and totaltups++; is just redundant code.

==
src/bin/pg_upgrade/info.c

4. get_logical_slot_infos

+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+void
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+ int dbnum;
+
+ /* Logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+ return;

It is no longer clear to me what is the purpose of these version checks.

As mentioned in comment #2 above, I don't think we need to check the
new_cluster >= 1700, because this patch is for PG17 by definition.

OTOH, I also don't recognise the reason why there has to be a PG17
restriction on the 'old_cluster' version. Such a restriction seems to
cripple the usefulness of this patch (eg. cannot even upgrade slots from
PG16 to PG17), and there is no explanation given for it. If there is some
valid incompatibility reason why only PG17 old_cluster slots can be
upgraded then it ought to be described in detail and probably also
mentioned in the PG DOCS.

~~~

5. count_logical_slots

+/*
+ * count_logical_slots()
+ *
+ * Sum up and return the number of logical replication slots for all
databases.
+ */
+int
+count_logical_slots(ClusterInfo *cluster)
+{
+ int dbnum;
+ int slot_count = 0;
+
+ /* Quick exit if the version is prior to PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+ return 0;
+
+ for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+ slot_count += cluster->dbarr.dbs[dbnum].slot_arr.nslots;
+
+ return slot_count;
+}

Same as the previous comment #4. I had doubts about the intent/need for
this cluster version checking.

--
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: pgbench: allow to exit immediately when any client is aborted

2023-08-23 Thread Yugo NAGATA

On Thu, 24 Aug 2023 09:15:51 +0900 (JST)
Tatsuo Ishii  wrote:

> >> I start to think this behavior is ok and consistent with previous
> >> behavior of pgbench because serialization (and dealock) errors have
> >> been treated specially from other types of errors, such as accessing
> >> non existing tables. However, I suggest to add more sentences to the
> >> explanation of this option so that users are not confused by this.
> >> 
> >> + 
> >> +  --exit-on-abort
> >> +  
> >> +   
> >> +Exit immediately when any client is aborted due to some error. 
> >> Without
> >> +this option, even when a client is aborted, other clients could 
> >> continue
> >> +their run as specified by -t or 
> >> -T option,
> >> +and pgbench will print an incomplete 
> >> results
> >> +in this case.
> >> +   
> >> +  
> >> + 
> >> +
> >> 
> >> What about inserting "Note that serialization failures or deadlock
> >> failures will not abort client.  See  >> linkend="failures-and-retries"/> for more information." into the end
> >> of this paragraph?
> > 
> > --exit-on-abort is related to "abort" of a client instead of error or
> > failure itself, so rather I wonder a bit that mentioning 
> > serialization/deadlock
> > failures might be  confusing. However, if users may think of such failures 
> > from
> > "abort", it could be beneficial to add the sentences with some modification 
> > as
> > below.
> 
> I myself confused by this and believe that adding extra paragraph is
> beneficial to users.

Ok.

> >  "Note that serialization failures or deadlock failures does not abort the
> >   client, so they are not affected by this option.
> >   See  for more information."
> 
> "does not" --> "do not".

Oops. I attached the updated patch.

> >> BTW, I think:
> >> Exit immediately when any client is aborted due to some error. 
> >> Without
> >> 
> >> should be:
> >> Exit immediately when any client is aborted due to some errors. 
> >> Without
> >> 
> >> (error vs. erros)
> > 
> > Well, I chose "some" to mean "unknown or unspecified", not "an unspecified 
> > amount
> > or number of", so singular form "error" is used. 
> 
> Ok.
> 
> > Instead, should we use "due to a error"?
> 
> I don't think so.
> 
> >> Also:
> >> + --exit-on-abort is specified . Otherwise in the 
> >> worst
> >> 
> >> There is an extra space between "specified" and ".".
> > 
> > Fixed.
> > 
> > Also, I fixed the place of the description in the documentation
> > to alphabetical order
> > 
> > Attached is the updated patch. 
> 
> Looks good to me. If there's no objection, I will commit this next week.
> 
> Best reagards,
> --
> Tatsuo Ishii
> SRA OSS LLC
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp


-- 
Yugo NAGATA 
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 6c5c8afa6d..95cc9027c3 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -768,6 +768,24 @@ pgbench  options  d
   
  
 
+ 
+  --exit-on-abort
+  
+   
+Exit immediately when any client is aborted due to some error. Without
+this option, even when a client is aborted, other clients could continue
+their run as specified by -t or -T option,
+and pgbench will print an incomplete results
+in this case.
+   
+   
+Note that serialization failures or deadlock failures do not abort the
+client, so they are not affected by this option.
+See  for more information.
+   
+  
+ 
+
  
   --failures-detailed
   
@@ -985,7 +1003,8 @@ pgbench  options  d
benchmark such as initial connection failures also exit with status 1.
Errors during the run such as database errors or problems in the script
will result in exit status 2. In the latter case,
-   pgbench will print partial results.
+   pgbench will print partial results if
+   --exit-on-abort option is not specified.
   
  
 
@@ -2801,14 +2820,17 @@ statement latencies in milliseconds, failures and retries:
  start a connection to the database server / the socket for connecting
  the client to the database server has become invalid). In such cases
  all clients of this thread stop while other threads continue to work.
+ However, --exit-on-abort is specified, all of the
+ threads stop immediately in this case.

  
  

  Direct client errors. They lead to immediate exit from
  pgbench with the corresponding error message
- only in the case of an internal pgbench
- error (which are supposed to never occur...). Otherwise in the worst
+ in the case of an internal pgbench
+ error (which are supposed to never occur...) or when
+ --exit-on-abort is specified. Otherwise in the worst
  case they only lead to the abortion of the failed client while

Re: In-placre persistance change of a relation

2023-08-23 Thread Kyotaro Horiguchi

Thank you for looking this!

At Mon, 14 Aug 2023 12:38:48 -0700, Nathan Bossart  
wrote in 
> I think there are some good ideas here.  I started to take a look at the
> patches, and I've attached a rebased version of the patch set.  Apologies
> if I am repeating any discussions from upthread.
> 
> First, I tested the time difference in ALTER TABLE SET UNLOGGED/LOGGED with
> the patch applied, and the results looked pretty impressive.
> 
>   before:
>   postgres=# alter table test set unlogged;
>   ALTER TABLE
>   Time: 5108.071 ms (00:05.108)
>   postgres=# alter table test set logged;
>   ALTER TABLE
>   Time: 6747.648 ms (00:06.748)
> 
>   after:
>   postgres=# alter table test set unlogged;
>   ALTER TABLE
>   Time: 25.609 ms
>   postgres=# alter table test set logged;
>   ALTER TABLE
>   Time: 1241.800 ms (00:01.242)

Thanks for confirmation. The difference between the both directions is
that making a table logged requires to emit WAL records for the entire
content.

> My first question is whether 0001 is a prerequisite to 0002.  I'm assuming
> it is, but the reason wasn't immediately obvious to me.  If it's just

In 0002, if a backend crashes after creating an init fork file but
before the associated commit, a lingering fork file could result in
data loss on the next startup. Thus, an utterly reliable file cleanup
mechanism is essential. 0001 also addresses the orphan storage files
issue arising from ALTER TABLE and similar commands.

> nice-to-have, perhaps we could simplify the patch set a bit.  I see that
> Heikki had some general concerns with the marker file approach [0], so
> perhaps it is at least worth brainstorming some alternatives if we _do_
> need it.

The rationale behind the file-based implementation is that any
leftover init fork file from a crash needs to be deleted before the
reinit(INIT) process kicks in, which happens irrelevantly to WAL,
before the start of crash recovery. I could implement it separately
from the reinit module, but I didn't since that results in almost a
duplication.

As commented in xlog.c, the purpose of the pre-recovery reinit CLEANUP
phase is to ensure hot standbys don't encounter erroneous unlogged
relations.  Based on that requirement, we need a mechanism to
guarantee that additional crucial operations are executed reliably at
the next startup post-crash, right before recovery kicks in (or reinit
CLEANUP). 0001 persists this data on a per-operation basis tightly
bonded to their target objects.

I could turn this into something like undo longs in a simple form, but
I'd rather not craft a general-purpose undo log system for this unelss
it's absolutely necessary.


> [0] https://postgr.es/m/9827ebd3-de2e-fd52-4091-a568387b1fc2%40iki.fi

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

pg_stat_get_backend_subxact() and backend IDs?

2023-08-23 Thread Ian Lawrence Barwick

Hi

I was playing around with "pg_stat_get_backend_subxact()" (commit 10ea0f924)
and see it emits NULL values for some backends, e.g.:

postgres=# \pset null NULL
Null display is "NULL".

postgres=#  SELECT id, pg_stat_get_backend_pid(id), s.*,
   pg_stat_get_backend_activity (id)
  FROM pg_stat_get_backend_idset() id
   JOIN LATERAL pg_stat_get_backend_subxact(id) AS s ON TRUE;
 id  | pg_stat_get_backend_pid | subxact_count |
subxact_overflowed |pg_stat_get_backend_activity

-+-+---++
   1 | 3175972 | 0 | f
 | 
   2 | 3175973 | 0 | f
 | 
   3 | 3177889 | 0 | f
 | SELECT id, pg_stat_get_backend_pid(id), s.*,  +
 | |   |
 | pg_stat_get_backend_activity (id) +
 | |   |
 |FROM pg_stat_get_backend_idset() id+
 | |   |
 | JOIN LATERAL pg_stat_get_backend_subxact(id) AS s ON TRUE;
   4 | 3176027 | 5 | f
 | savepoint s4;
 256 | 3175969 |  NULL | NULL
 | 
 258 | 3175968 |  NULL | NULL
 | 
 259 | 3175971 |  NULL | NULL
 | 
(7 rows)

Reading through the thread [1], it looks like 0/false are intended to be
returned for non-backend processes too [2], so it seems odd that NULL/NULL is
getting returned in some cases, especially as that's what's returned if a
non-existent backend ID is provided.

[1] 
https://www.postgresql.org/message-id/flat/CAFiTN-uvYAofNRaGF4R%2Bu6_OrABdkqNRoX7V6%2BPP3H_0HuYMwg%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/flat/CAFiTN-ut0uwkRJDQJeDPXpVyTWD46m3gt3JDToE02hTfONEN%3DQ%40mail.gmail.com#821f6f40e91314066390efd06d71d5ac

Looking at the code, this is happening because
"pgstat_fetch_stat_local_beentry()"
expects to be passed the backend ID as an integer representing a 1-based index
referring to "localBackendStatusTable", but "pg_stat_get_backend_subxact()"
is presumably intended to take the actual BackendId , as per other
"pg_stat_get_XXX()"
functions.

Also, the comment for "pgstat_fetch_stat_local_beentry()" says:

   Returns NULL if the argument is out of range (no current caller does that).

so the last part is currently incorrect.

Assuming I am not misunderstanding something here (always a
possibility, apologies
in advance if this is merely noise), what is actually needed is a function which
accepts a BackendId (as per "pgstat_fetch_stat_beentry()"), but returns a
LocalPgBackendStatus (as per "pgstat_fetch_stat_local_beentry()") like the
attached, clumsily named "pgstat_fetch_stat_backend_local_beentry()".

Regards

Ian Barwick
commit d4292d3347658f61855fd84827954f587838a324
Author: Ian Barwick 
Date:   Thu Aug 24 10:16:51 2023 +0900

pg_stat_get_backend_subxact(): handle backend ID correctly

diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 38f91a495b..bce7836264 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -1136,6 +1136,39 @@ pgstat_fetch_stat_local_beentry(int beid)
 }
 
 
+/* --
+ * pgstat_fetch_stat_backend_local_beentry() -
+ *
+ *	Like pgstat_fetch_stat_local_beentry() but takes the BackendId of the
+ *  desired session.
+ *
+ *	Returns NULL if the given beid doesn't identify any known session.
+ *
+ *	NB: caller is responsible for a check if the user is permitted to see
+ *	this info (especially the querystring).
+ * --
+ */
+
+LocalPgBackendStatus *
+pgstat_fetch_stat_backend_local_beentry(BackendId beid)
+{
+	LocalPgBackendStatus key;
+
+	pgstat_read_current_status();
+
+	/*
+	 * Since the localBackendStatusTable is in order by backend_id, we can use
+	 * bsearch() to search it efficiently.
+	 */
+	key.backend_id = beid;
+
+	return (LocalPgBackendStatus *) bsearch(, localBackendStatusTable,
+			localNumBackends,
+			sizeof(LocalPgBackendStatus),
+			cmp_lbestatus);
+}
+
+
 /* --
  * pgstat_fetch_stat_numbackends() -
  *
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 2b9742ad21..8479ea130b 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -727,7 +727,7 @@ pg_stat_get_backend_subxact(PG_FUNCTION_ARGS)
 
 	BlessTupleDesc(tupdesc);
 
-	if ((local_beentry = pgstat_fetch_stat_local_beentry(beid)) != NULL)
+	if ((local_beentry = pgstat_fetch_stat_backend_local_beentry(beid)) != NULL)
 	{
 		/* Fill values and NULLs */
 		values[0] = Int32GetDatum(local_beentry->backend_subxact_count);
diff --git

Re: pg_rewind WAL segments deletion pitfall

2023-08-23 Thread Kyotaro Horiguchi

At Wed, 23 Aug 2023 13:44:52 +0200, Alexander Kukushkin  
wrote in 
> On Tue, 22 Aug 2023 at 07:32, Michael Paquier  wrote:
> > I don't like much this patch.  While it takes correctly advantage of
> > the backward record read logic from SimpleXLogPageRead() able to
> > handle correctly timeline jumps, it creates a hidden dependency in the
> > code between the hash table from filemap.c and the page callback.
> > Wouldn't it be simpler to build a list of the segment names using the
> > information from WALOpenSegment and build this list in
> > findLastCheckpoint()?
> 
> I think the first version of the patch more or less did that. Not
> necessarily a list, but a hash table of WAL file names that we want to
> keep. But Kyotaro Horiguchi didn't like it and suggested creating entries
> in the filemap.c hash table instead.
> But, I agree, doing it directly from the findLastCheckpoint() makes the
> code easier to understand.
...
> > +   /*
> > +* Some entries (WAL segments) already have an action assigned
> > +* (see SimpleXLogPageRead()).
> > +*/
> > +   if (entry->action == FILE_ACTION_UNDECIDED)
> > +   entry->action = decide_file_action(entry);
> >
> > This change makes me a bit uneasy, per se my previous comment with the
> > additional code dependencies.
> >
> 
> We can revert to the original approach (see
> v1-0001-pg_rewind-wal-deletion.patch from the very first email) if you like.

On the other hand, that approach brings in another source that
suggests the way that file should be handled. I still think that
entry->action should be the only source. However, it seems I'm in the
minority here. So I'm not tied to that approach.

> > I think that this scenario deserves a test case.  If one wants to
> > emulate a delay in WAL archiving, it is possible to set
> > archive_command to a command that we know will fail, for instance.
> >
> 
> Yes, I totally agree, it is on our radar, but meanwhile please see the new
> version, just to check if I correctly understood your idea.

Agreed.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Let's make PostgreSQL multi-threaded

2023-08-23 Thread Mark Woodward

On Mon, Jun 12, 2023 at 5:17 PM Heikki Linnakangas  wrote:

> On 10/06/2023 21:01, Hannu Krosing wrote:
> > On Mon, Jun 5, 2023 at 4:52 PM Heikki Linnakangas 
> wrote:
>
> <<>>

>
> > * The backend code would be more complex.
> > -- this is still the case
>
> I don't quite buy that. A multi-threaded model isn't inherently more
> complex than a multi-process model. Just different. Sure, the transition
> period will be more complex, when we need to support both models. But in
> the long run, if we can remove the multi-process mode, we can make a lot
> of things *simpler*.
>

If I may weigh in here:
Making a previously unthreaded process able to handle multiple threads, is
a tedious process.

>
> > -- even more worrisome is that all extensions also need to be rewritten
>
> "rewritten" is an exaggeration. Yes, extensions will need adapt, similar
> to the core code. But I hope it will be pretty mechanical work, marking
> global variables as thread-local and such. Many extensions will work
> with little to no changes.
>

I can tell you from experience it isn't that easy.  In my career I have
taken a few "old" technologies and made them multithreaded and it is really
a complex and laborious undertaking.
Many operations that you do just fine without threads will break in a
multithreaded system. You need to make sure every function in every library
that you use is "thread safe."  Take a file handle, if you read, seek, or
write a file handle you are fine in a single process, but this breaks in a
multithreaded environment if the file handle is shared. That's a very
simple example. Openssl operations will almost certainly break and you will
need to rewrite your ssl stuff and protect some things with mutexes. When
you fork() a lot is essentially duplicated (COW) between the parent and
child that will ultimately be shared in a threaded model. Decades old
assumptions in the design and architecture will break and you will need to
rethink what you are doing and how it is done. You will need to change file
handling to get beyond the 1024 file limit in calls like "select." There is
a LOT of this kind of stuff, it is not mechanical. I even call into
question "Many extensions will work with little to no changes" as those too
will need to be audited for thread safety.  Think about loading extensions,
extensions are typically not loaded until they are used. In a
multi-threaded model, a shared library will only be loaded once. Think
about memory management, you will have multiple threads fighting over the
global heap as they allocate memory.  The list is virtually endless.

>
> > -- and many incompatibilities will be silent and take potentially years
> to find
>
> IMO this is the most scary part of all this. I'm optimistic that we can
> have enough compiler support and tooling to catch most issues. But we
> don't know for sure at this point.
>

We absolutely do not know and it *is* very scary.

>
> > * Terminating backend processes allows the OS to cleanly and quickly
> > free all resources, protecting against memory and file descriptor
> > leaks and making backend shutdown cheaper and faster
> > -- still true
>
> Yep. I'm not too worried about PostgreSQL code, our memory contexts and
> resource owners are very good at stopping leaks. But 3rd party libraries
> could pose hard problems. IIRC we still have a leak with the LLVM JIT
> code, for example. We should fix that anyway, of course, but the
> multi-process model is more forgiving with leaks like that.
>
> Again, we believe that this is true.

> --
> Heikki Linnakangas
> Neon (https://neon.tech)
>
>
>
>

Re: pgbench: allow to exit immediately when any client is aborted

2023-08-23 Thread Tatsuo Ishii

>> I start to think this behavior is ok and consistent with previous
>> behavior of pgbench because serialization (and dealock) errors have
>> been treated specially from other types of errors, such as accessing
>> non existing tables. However, I suggest to add more sentences to the
>> explanation of this option so that users are not confused by this.
>> 
>> + 
>> +  --exit-on-abort
>> +  
>> +   
>> +Exit immediately when any client is aborted due to some error. 
>> Without
>> +this option, even when a client is aborted, other clients could 
>> continue
>> +their run as specified by -t or 
>> -T option,
>> +and pgbench will print an incomplete 
>> results
>> +in this case.
>> +   
>> +  
>> + 
>> +
>> 
>> What about inserting "Note that serialization failures or deadlock
>> failures will not abort client.  See > linkend="failures-and-retries"/> for more information." into the end
>> of this paragraph?
> 
> --exit-on-abort is related to "abort" of a client instead of error or
> failure itself, so rather I wonder a bit that mentioning 
> serialization/deadlock
> failures might be  confusing. However, if users may think of such failures 
> from
> "abort", it could be beneficial to add the sentences with some modification as
> below.

I myself confused by this and believe that adding extra paragraph is
beneficial to users.

>  "Note that serialization failures or deadlock failures does not abort the
>   client, so they are not affected by this option.
>   See  for more information."

"does not" --> "do not".

>> BTW, I think:
>> Exit immediately when any client is aborted due to some error. 
>> Without
>> 
>> should be:
>> Exit immediately when any client is aborted due to some errors. 
>> Without
>> 
>> (error vs. erros)
> 
> Well, I chose "some" to mean "unknown or unspecified", not "an unspecified 
> amount
> or number of", so singular form "error" is used. 

Ok.

> Instead, should we use "due to a error"?

I don't think so.

>> Also:
>> + --exit-on-abort is specified . Otherwise in the 
>> worst
>> 
>> There is an extra space between "specified" and ".".
> 
> Fixed.
> 
> Also, I fixed the place of the description in the documentation
> to alphabetical order
> 
> Attached is the updated patch. 

Looks good to me. If there's no objection, I will commit this next week.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: pg_upgrade - a function parameter shadows global 'new_cluster'

2023-08-23 Thread Peter Smith

On Wed, Aug 23, 2023 at 6:00 PM Daniel Gustafsson  wrote:

> > On 23 Aug 2023, at 03:28, Peter Smith  wrote:
>
> > PSA a small patch to remove the unnecessary parameter, and so eliminate
> this shadowing.
>
> Agreed, applied. Thanks!
>
>
Thanks for pushing!

--
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Hi,

On 2023-08-23 18:32:26 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > There are other potential uses for libpq in pg_regress though - I'd e.g. 
> > like
> > to have a "monitoring" session open, which we could use to detect that the
> > server crashed (by waiting for the FD to be become invalid). Where the
> > connection default issue could matter more?
> 
> Meh.  I don't find that idea compelling enough to justify adding
> restrictions on what test scenarios will work.  It's seldom hard to
> tell from the test output whether the server crashed.

I find it pretty painful to wade through a several-megabyte regression.diffs
to find the cause of a crash. I think we ought to use
restart_after_crash=false, since after a crash there's no hope for the tests
to succeed, but even in that case, we end up with a lot of pointless contents
in regression.diffs. If we instead realized that we shouldn't start further
tests, we'd limit that by a fair bit.

Greetings,

Andres Freund

Re: meson uses stale pg_config_paths.h left over from make

On Thu, 24 Aug 2023 at 00:52, David Rowley  wrote:
> Are there any objections to the attached being applied?

Pushed.

David

Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Andres Freund  writes:
> On 2023-08-23 17:55:53 -0400, Tom Lane wrote:
>> The trouble with that approach is that in "make installcheck", we
>> don't really want to assume we know what the installed libpq's default
>> connection parameters are.  So we don't explicitly know where that
>> libpq will connect.

> Stepping back: I don't think installcheck matters for the concrete use of
> libpq we're discussing - the only time we wait for server startup is the
> non-installcheck case.

Oh, that's an excellent point.  So for the immediately proposed use-case,
there's no issue.  (We don't have a mode where we try to start a server
using already-installed executables.)

> There are other potential uses for libpq in pg_regress though - I'd e.g. like
> to have a "monitoring" session open, which we could use to detect that the
> server crashed (by waiting for the FD to be become invalid). Where the
> connection default issue could matter more?

Meh.  I don't find that idea compelling enough to justify adding
restrictions on what test scenarios will work.  It's seldom hard to
tell from the test output whether the server crashed.

> I was wondering if we could create an unambiguous connection info, but that
> seems like it'd be hard to do, without creating cross version hazards.

Hmm, we don't expect the regression test suite to work against other
server versions, so maybe that could be made to work --- that is, we
could run the psql under test and get a full set of connection
parameters out of it?  But I'm still not finding this worth the
trouble.

> What's the reason we don't force psql to come from the same build as
> pg_regress?

Because the point of installcheck is to check the installed binaries
--- including the installed psql and libpq.

(Thinks for a bit...)  Maybe we should add pg_regress to the installed
fileset, and use that copy not the in-tree copy for installcheck?
Then we could assume it's using the same libpq as psql.  IIRC there
have already been suggestions to do that for the benefit of PGXS
testing.

regards, tom lane

Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Hi,

On 2023-08-23 17:55:53 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2023-08-23 17:02:51 -0400, Tom Lane wrote:
> >> ... unless we hit problems with, say, a different default port number or
> >> socket path compiled into one copy vs. the other?  That seems like it's
> >> probably a "so don't do that" case, though.
>
> > If we were to find such a case, it seems we could just add whatever missing
> > parameter to the connection string? I think we would likely already hit such
> > problems though, the psql started by an installcheck pg_regress might use 
> > the
> > system libpq, I think?
>
> The trouble with that approach is that in "make installcheck", we
> don't really want to assume we know what the installed libpq's default
> connection parameters are.  So we don't explicitly know where that
> libpq will connect.

Stepping back: I don't think installcheck matters for the concrete use of
libpq we're discussing - the only time we wait for server startup is the
non-installcheck case.

There are other potential uses for libpq in pg_regress though - I'd e.g. like
to have a "monitoring" session open, which we could use to detect that the
server crashed (by waiting for the FD to be become invalid). Where the
connection default issue could matter more?

I was wondering if we could create an unambiguous connection info, but that
seems like it'd be hard to do, without creating cross version hazards.

> As I said, we might be able to start treating installed-libpq-not-
> compatible-with-build as a "don't do it" case.  Another idea is to try
> to ensure that pg_regress uses the same libpq that the psql-under-test
> does; but I'm not sure how to implement that.

I don't think that's likely to work, psql could use a libpq with a different
soversion. We could dlopen() libpq, etc, but that seems way too complicated.

What's the reason we don't force psql to come from the same build as
pg_regress?

Greetings,

Andres Freund

Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Andres Freund  writes:
> On 2023-08-23 17:02:51 -0400, Tom Lane wrote:
>> ... unless we hit problems with, say, a different default port number or
>> socket path compiled into one copy vs. the other?  That seems like it's
>> probably a "so don't do that" case, though.

> If we were to find such a case, it seems we could just add whatever missing
> parameter to the connection string? I think we would likely already hit such
> problems though, the psql started by an installcheck pg_regress might use the
> system libpq, I think?

The trouble with that approach is that in "make installcheck", we
don't really want to assume we know what the installed libpq's default
connection parameters are.  So we don't explicitly know where that
libpq will connect.

As I said, we might be able to start treating installed-libpq-not-
compatible-with-build as a "don't do it" case.  Another idea is to try
to ensure that pg_regress uses the same libpq that the psql-under-test
does; but I'm not sure how to implement that.

regards, tom lane

Re: Fix error handling in be_tls_open_server()

> On 23 Aug 2023, at 23:47, Jacob Champion  wrote:
> 
> On Wed, Aug 23, 2023 at 6:23 AM Daniel Gustafsson  wrote:
>> This has the smell of a theoretical problem, I can't really imagine a
>> certificate where which would produce this.  Have you been able to trigger 
>> it?
>> 
>> Wouldn't a better fix be to error out on len == -1 as in the attached, maybe
>> with a "Shouldn't happen" comment?
> 
> Using "cert clientname=DN" in the HBA should let you issue Subjects
> without Common Names. Or, if you're using a certificate to authorize
> the connection rather than authenticate the user (for example
> "scram-sha-256 clientcert=verify-ca" in your HBA), then the certs you
> distribute could even be SAN-only with a completely empty Subject.

That's a good point, didn't think about those.  In those cases the original
patch by Sergey seems along the right lines.

--
Daniel Gustafsson

Re: Fix error handling in be_tls_open_server()

2023-08-23 Thread Jacob Champion

On Wed, Aug 23, 2023 at 6:23 AM Daniel Gustafsson  wrote:
> This has the smell of a theoretical problem, I can't really imagine a
> certificate where which would produce this.  Have you been able to trigger it?
>
> Wouldn't a better fix be to error out on len == -1 as in the attached, maybe
> with a "Shouldn't happen" comment?

Using "cert clientname=DN" in the HBA should let you issue Subjects
without Common Names. Or, if you're using a certificate to authorize
the connection rather than authenticate the user (for example
"scram-sha-256 clientcert=verify-ca" in your HBA), then the certs you
distribute could even be SAN-only with a completely empty Subject.

--Jacob

Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Hi,

On 2023-08-23 23:55:15 +0300, Nazir Bilal Yavuz wrote:
> On Wed, 23 Aug 2023 at 09:58, Andres Freund  wrote:
> > I'm hoping to push this fairly soon, as I'll be on vacation the last week of
> > August. I'll be online intermittently though, if there are issues, I can 
> > react
> > (very limited connectivity for middday Aug 29th - midday Aug 31th though). 
> > I'd
> > appreciate a quick review or two.
> 
> Patch looks good to me besides some minor points.

Thanks for looking!


> v3-0004-ci-Prepare-to-make-compute-resources-for-CI-confi.patch:
> diff --git a/.cirrus.star b/.cirrus.star
> +"""The main function is executed by cirrus-ci after loading
> .cirrus.yml and can
> +extend the CI definition further.
> +
> +As documented in .cirrus.yml, the final CI configuration is composed of
> +
> +1) the contents of this file
> 
> Instead of '1) the contents of this file' comment,  '1) the contents
> of .cirrus.yml file' could be better since this comment appears in
> .cirrus.star file.

Good catch.

> +if repo_config_url != None:
> +print("loading additional configuration from
> \"{}\"".format(repo_config_url))
> +output += config_from(repo_config_url)
> +else:
> +output += "n# REPO_CI_CONFIG_URL was not set\n"
> 
> Possible typo at output += "n# REPO_CI_CONFIG_URL was not set\n".

Fixed.


> v3-0008-ci-switch-tasks-to-debugoptimized-build.patch:
> Just thinking of possible optimizations and thought can't we create
> something like 'buildtype: xxx' to override default buildtype using
> .cirrus.star? This could be better for PG developers. For sure that
> could be the subject of another patch.

We could, but I'm not sure what the use would be?

Greetings,

Andres Freund

Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Hi,

On 2023-08-23 17:02:51 -0400, Tom Lane wrote:
> Daniel Gustafsson  writes:
> > On 23 Aug 2023, at 21:22, Andres Freund  wrote:
> >> I think there's more effective ways to make this cheaper. The basic thing
> >> would be to use libpq instead of forking of psql to make a connection
> >> check.
>
> > I had it in my head that not using libpq in pg_regress was a deliberate 
> > choice,
> > but I fail to find a reference to it in the archives.
>
> I have a vague feeling that you are right about that.  Perhaps the
> concern was that under "make installcheck", pg_regress might be
> using a build-tree copy of libpq rather than the one from the
> system under test.  As long as we're just trying to ping the server,
> that shouldn't matter too much I think

Or perhaps the opposite? That an installcheck pg_regress run might use the
system libpq, which doesn't have the symbols, or such?

Either way, with a function like PQping(), which existing in well beyond the
supported branches, that shouldn't be an issue?

> ... unless we hit problems with, say, a different default port number or
> socket path compiled into one copy vs. the other?  That seems like it's
> probably a "so don't do that" case, though.

If we were to find such a case, it seems we could just add whatever missing
parameter to the connection string? I think we would likely already hit such
problems though, the psql started by an installcheck pg_regress might use the
system libpq, I think?

Greetings,

Andres Freund

Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

> On 23 Aug 2023, at 23:02, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>> On 23 Aug 2023, at 21:22, Andres Freund  wrote:
>>> I think there's more effective ways to make this cheaper. The basic thing
>>> would be to use libpq instead of forking of psql to make a connection
>>> check.
> 
>> I had it in my head that not using libpq in pg_regress was a deliberate 
>> choice,
>> but I fail to find a reference to it in the archives.
> 
> I have a vague feeling that you are right about that.  Perhaps the
> concern was that under "make installcheck", pg_regress might be
> using a build-tree copy of libpq rather than the one from the
> system under test.  As long as we're just trying to ping the server,
> that shouldn't matter too much I think ... unless we hit problems
> with, say, a different default port number or socket path compiled into
> one copy vs. the other?  That seems like it's probably a "so don't
> do that" case, though.

Ah yes, that does ring a familiar bell.  I agree that using it for pinging the
server should be safe either way, but we should document the use-with-caution
in pg_regress.c if/when we go down that path.  I'll take a stab at changing the
psql retry loop for pinging tomorrow to see what it would look like.

--
Daniel Gustafsson

Re: PostgreSQL 16 release announcement draft

On Thu, 24 Aug 2023 at 05:55, Jonathan S. Katz  wrote:
> We could add something about 1349d2790 -- do you have suggested wording?

I think it's worth a mention. See the text added in square brackets below:

PostgreSQL 16 improves the performance of existing PostgreSQL functionality
through new query planner optimizations. In this latest release, the query
planner can parallelize  `FULL` and `RIGHT` joins, [generate more
optimal plans for
queries containing aggregate functions with a `DISTINCT` or `ORDER BY` clause,]
utilize incremental sorts for `SELECT DISTINCT` queries, and optimize
window function
executions so they execute more efficiently. It also introduces
`RIGHT` and `OUTER`
"anti-joins", which enable users to identify rows not present in a joined table.

Thanks

David

Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Daniel Gustafsson  writes:
> On 23 Aug 2023, at 21:22, Andres Freund  wrote:
>> I think there's more effective ways to make this cheaper. The basic thing
>> would be to use libpq instead of forking of psql to make a connection
>> check.

> I had it in my head that not using libpq in pg_regress was a deliberate 
> choice,
> but I fail to find a reference to it in the archives.

I have a vague feeling that you are right about that.  Perhaps the
concern was that under "make installcheck", pg_regress might be
using a build-tree copy of libpq rather than the one from the
system under test.  As long as we're just trying to ping the server,
that shouldn't matter too much I think ... unless we hit problems
with, say, a different default port number or socket path compiled into
one copy vs. the other?  That seems like it's probably a "so don't
do that" case, though.

regards, tom lane

Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

2023-08-23 Thread Nazir Bilal Yavuz

Hi,

Thanks for the patch!

On Wed, 23 Aug 2023 at 09:58, Andres Freund  wrote:
> I'm hoping to push this fairly soon, as I'll be on vacation the last week of
> August. I'll be online intermittently though, if there are issues, I can react
> (very limited connectivity for middday Aug 29th - midday Aug 31th though). I'd
> appreciate a quick review or two.

Patch looks good to me besides some minor points.

v3-0004-ci-Prepare-to-make-compute-resources-for-CI-confi.patch:
diff --git a/.cirrus.star b/.cirrus.star
+"""The main function is executed by cirrus-ci after loading
.cirrus.yml and can
+extend the CI definition further.
+
+As documented in .cirrus.yml, the final CI configuration is composed of
+
+1) the contents of this file

Instead of '1) the contents of this file' comment,  '1) the contents
of .cirrus.yml file' could be better since this comment appears in
.cirrus.star file.

+if repo_config_url != None:
+print("loading additional configuration from
\"{}\"".format(repo_config_url))
+output += config_from(repo_config_url)
+else:
+output += "n# REPO_CI_CONFIG_URL was not set\n"

Possible typo at output += "n# REPO_CI_CONFIG_URL was not set\n".

v3-0008-ci-switch-tasks-to-debugoptimized-build.patch:
Just thinking of possible optimizations and thought can't we create
something like 'buildtype: xxx' to override default buildtype using
.cirrus.star? This could be better for PG developers. For sure that
could be the subject of another patch.

Regards,
Nazir Bilal Yavuz
Microsoft

Re: BUG #18059: Unexpected error 25001 in stored procedure

I wrote:
> I started to code this, and immediately noticed that transformStmt()
> already has a companion function analyze_requires_snapshot() that
> returns "true" in the cases of interest ... except that it does
> not return true for T_CallStmt.  Perhaps that was intentional to
> begin with, but it is very hard to believe that it isn't a bug now,
> since transformCallStmt can invoke nearly arbitrary processing via
> transformExpr().  What semantic anomalies, if any, do we risk if CALL
> processing forces a transaction start?  (I rather imagine it does
> already, somewhere later on...)

I poked around some more, and determined that there should not be any
new semantic anomalies if analyze_requires_snapshot starts returning
true for CallStmts, because ExecuteCallStmt already acquires and
releases a snapshot before invoking the procedure (at least in the
non-atomic case, which is the one of interest here).  I spent some
time trying to devise a test case showing it's broken, and did not
succeed: the fact that we disallow sub-SELECTs in CALL arguments makes
it a lot harder than I'd expected to reach anyplace that would require
having a transaction snapshot set.  Nonetheless, I have very little
faith that such a scenario doesn't exist today, and even less that
we won't add one in future.  The only real reason I can see for not
setting a snapshot here is as a micro-optimization.  While that's
not without value, it seems hard to argue that CALL deserves an
optimization that SELECT doesn't get.

I also realized that ReturnStmts are likewise missing from
analyze_requires_snapshot().  This is probably unreachable, because
ReturnStmt can only appear in a SQL-language function and I can't
think of a scenario where we'd be parsing one outside a transaction.
Nonetheless it seems hard to argue that this is an optimization
we need to keep.

Hence I propose the attached patch, which invents
stmt_requires_parse_analysis() and makes analyze_requires_snapshot()
into an alias for it, so that all these statement types are treated
alike.  I made the adjacent comments a lot more opinionated, too,
in hopes that future additions won't overlook these concerns.

The actual bug fix is in plancache.c.  I decided to invert the tests
in plancache.c, because the macro really needed renaming anyway and
it seemed to read better this way.  I also noticed that
ResetPlanCache() already tries to optimize away invalidation of
utility statements, but that logic seems no longer necessary ---
what's more, it's outright wrong for CALL, because that does need
invalidation and won't get it.  (I have not tried to build a test
case proving that that's broken, but surely it is.)

Barring objections, this needs to be back-patched as far as v11.

regards, tom lane

diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 4006632092..bc8176f545 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -335,6 +335,11 @@ transformStmt(ParseState *pstate, Node *parseTree)
 	}
 #endif			/* RAW_EXPRESSION_COVERAGE_TEST */

+	/*
+	 * Caution: when changing the set of statement types that have non-default
+	 * processing here, see also stmt_requires_parse_analysis() and
+	 * analyze_requires_snapshot().
+	 */
 	switch (nodeTag(parseTree))
 	{
 			/*
@@ -421,14 +426,22 @@ transformStmt(ParseState *pstate, Node *parseTree)
 }

 /*
- * analyze_requires_snapshot
- *		Returns true if a snapshot must be set before doing parse analysis
- *		on the given raw parse tree.
+ * stmt_requires_parse_analysis
+ *		Returns true if parse analysis will do anything non-trivial
+ *		with the given raw parse tree.
+ *
+ * Generally, this should return true for any statement type for which
+ * transformStmt() does more than wrap a CMD_UTILITY Query around it.
+ * When it returns false, the caller may assume that there is no situation
+ * in which parse analysis of the raw statement could need to be re-done.
  *
- * Classification here should match transformStmt().
+ * Currently, since the rewriter and planner do nothing for CMD_UTILITY
+ * Queries, a false result means that the entire parse analysis/rewrite/plan
+ * pipeline will never need to be re-done.  If that ever changes, callers
+ * will likely need adjustment.
  */
 bool
-analyze_requires_snapshot(RawStmt *parseTree)
+stmt_requires_parse_analysis(RawStmt *parseTree)
 {
 	bool		result;

@@ -442,6 +455,7 @@ analyze_requires_snapshot(RawStmt *parseTree)
 		case T_UpdateStmt:
 		case T_MergeStmt:
 		case T_SelectStmt:
+		case T_ReturnStmt:
 		case T_PLAssignStmt:
 			result = true;
 			break;
@@ -452,12 +466,12 @@ analyze_requires_snapshot(RawStmt *parseTree)
 		case T_DeclareCursorStmt:
 		case T_ExplainStmt:
 		case T_CreateTableAsStmt:
-			/* yes, because we must analyze the contained statement */
+		case T_CallStmt:
 			result = true;
 			break;

 		default:
-			/* other utility statements don't have any real parse analysis */
+			/* all other

Re: False "pg_serial": apparent wraparound” in logs

2023-08-23 Thread Imseih (AWS), Sami

Hi,

I dug a bit into this and what looks to be happening is the comparison
of the page containing the latest cutoff xid could falsely be reported
as in the future of the last page number because the latest
page number of the Serial slru is only set when the page is
initialized [1].

So under the correct conditions, such as in the repro, the serializable
XID has moved past the last page number, therefore to the next checkpoint
which triggers a CheckPointPredicate, it will appear that the slru
has wrapped around.

It seems what may be needed here is to advance the
latest_page_number during SerialSetActiveSerXmin and if
we are using the SLRU. See below:


diff --git a/src/backend/storage/lmgr/predicate.c 
b/src/backend/storage/lmgr/predicate.c
index 1af41213b4..6946ed21b4 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -992,6 +992,9 @@ SerialSetActiveSerXmin(TransactionId xid)

serialControl->tailXid = xid;

+   if (serialControl->headPage > 0)
+   SerialSlruCtl->shared->latest_page_number = SerialPage(xid);
+
LWLockRelease(SerialSLRULock);
}

[1] 
https://github.com/postgres/postgres/blob/master/src/backend/access/transam/slru.c#L306

Regards,

Sami

From: "Imseih (AWS), Sami" 
Date: Tuesday, August 22, 2023 at 7:56 PM
To: "pgsql-hack...@postgresql.org" 
Subject: False "pg_serial": apparent wraparound” in logs

Hi,

I Recently encountered a situation on the field in which the message
“could not truncate directory "pg_serial": apparent wraparound”
was logged even through there was no danger of wraparound. This
was on a brand new cluster and only took a few minutes to see
the message in the logs.

Reading on some history of this error message, it appears that there
was work done to improve SLRU truncation and associated wraparound
log messages [1]. The attached repro on master still shows that this message
can be logged incorrectly.

The repro runs updates with 90 threads in serializable mode and kicks
off a “long running” select on the same table in serializable mode.

As soon as the long running select commits, the next checkpoint fails
to truncate the SLRU and logs the error message.

Besides the confusing log message, there may also also be risk with
pg_serial getting unnecessarily bloated and depleting the disk space.

Is this a bug?

[1] 
https://www.postgresql.org/message-id/flat/20190202083822.GC32531%40gust.leadboat.com

Regards,

Sami Imseih
Amazon Web Services (AWS)

Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

> On 23 Aug 2023, at 21:22, Andres Freund  wrote:
> On 2023-08-23 14:48:26 +0200, Daniel Gustafsson wrote:

>> I'll do another pass, but below are a few small comments so far.
>> 
>> I don't know Windows to know the implications, but should the below file have
>> some sort of warning about not doing that for production/shared systems, only
>> for dedicated test instances?
> 
> Ah, I should have explained that: I'm not planning to apply
> - regress: Check for postgres startup completion more often
> - ci: windows: Disabling write cache flushing during test
> right now. Compared to the other patches the wins are much smaller and/or more
> work is needed to make them good.
> 
> I think it might be worth going for
> - ci: switch tasks to debugoptimized build
> because that provides a fair bit of gain. But it might be more hurtful than
> helpful due to costing more when ccache doesn't work...

Gotcha.

>> +++ b/src/tools/ci/windows_write_cache.ps1
>> @@ -0,0 +1,20 @@
>> +# Define the write cache to be power protected. This reduces the rate of 
>> cache
>> +# flushes, which seems to help metadata heavy workloads on NTFS. We're just
>> +# testing here anyway, so ...
>> +#
>> +# Let's do so for all disks, this could be useful beyond cirrus-ci.
>> 
>> One thing in 0010 caught my eye, and while not introduced in this patchset it
>> might be of interest here.  In the below hunks we loop X ticks around
>> system(psql), with the loop assuming the server can come up really quickly 
>> and
>> sleeping if it doesn't.  On my systems I always reach the pg_usleep after
>> failing the check, but if I reverse the check such it first sleeps and then
>> checks I only need to check once instead of twice.
> 
> I think there's more effective ways to make this cheaper. The basic thing
> would be to use libpq instead of forking of psql to make a connection
> check.

I had it in my head that not using libpq in pg_regress was a deliberate choice,
but I fail to find a reference to it in the archives.

>> It's a micro-optimization, but if we're changing things here to chase cycles 
>> it
>> might perhaps be worth doing?
> 
> I wouldn't quite call not waiting for 1s for the server to start, when it does
> so within a few ms, chasing cycles ;). For short tests it's a substantial
> fraction of the overall runtime...

Absolutely, I was referring to shifting the sleep before the test to avoid the
extra test, not the reduction of the pg_usleep.  Reducing the sleep is a clear
win.

--
Daniel Gustafsson

Make documentation builds reproducible

Somewhere at PGCon, I forgot exactly where, maybe in the same meeting 
where we talked about getting rid of distprep, we talked about that the 
documentation builds are not reproducible (in the sense of 
https://reproducible-builds.org/).  This is easily fixable, the fix is 
available upstream 
(https://github.com/docbook/xslt10-stylesheets/issues/54) but not 
released.  We can backpatch that into our customization layer.  The 
attached patch shows it.


I had actually often wanted this during development.  When making 
documentation tooling changes, it's useful to be able to compare the 
output before and after, and this will eliminate false positives in that.


This patch addresses both the HTML and the FO output.  The man output is 
already reproducible AFAICT.  Note that the final PDF output is 
currently not reproducible; that's a different issue that needs to be 
fixed in FOP.  (See 
https://wiki.debian.org/ReproducibleBuilds/TimestampsInPDFGeneratedByApacheFOP.)From 9f2c7262445b6868c1d2c9f2d75ac1153b7c2a64 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 23 Aug 2023 16:38:44 +0200
Subject: [PATCH] Make documentation builds reproducible

---
 doc/src/sgml/stylesheet-fo.xsl  | 259 
 doc/src/sgml/stylesheet-html-common.xsl | 234 +
 2 files changed, 493 insertions(+)

diff --git a/doc/src/sgml/stylesheet-fo.xsl b/doc/src/sgml/stylesheet-fo.xsl
index 5e7e132480..aff717ddbc 100644
--- a/doc/src/sgml/stylesheet-fo.xsl
+++ b/doc/src/sgml/stylesheet-fo.xsl
@@ -1,4 +1,8 @@
 
+http://docbook.sourceforge.net/release/xsl/current/common/entities.ent;>
+%common.entities;
+]>
 http://www.w3.org/1999/XSL/Transform;
 version="1.0"
 xmlns:fo="http://www.w3.org/1999/XSL/Format;>
@@ -138,4 +142,259 @@
   
 
 
+
+
+
+
+
+  
+  
+  
+
+  
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  http://www.renderx.com/XSL/Extensions; 
xmlns:axf="http://www.antennahouse.com/names/XSL/Extensions;>
+
+  
+
+ientry-
+
+
+  
+
+
+  true
+
+
+
+  
+
+  
+
+
+
+
+
+  
+
+  
+  
+  
+  
+
+  
+
+
+  
+
+  
+
+  
+  
+
+  
+
+  
+  
+  
+  
+
+  
+
+
+
+  
+
+
+
+  
+ 
+ 
+ 
+ 
+  
+
+
+  
+
+  
+
+  
+ 
+ 
+ 
+ 
+  
+  
+   
+   
+   
+   
+  
+
+  
+
+
+
+  
+  
+  
+
+  
+
+  
+  
+
+  
+
+  
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+  ientry-
+  
+
+  
+  
+
+  
+
+  
+ (
+
+  
+
+ 
+
+  
+  
+
+  
+  
+
+  
+  
+ 
+  
+  
+
+  
+
+  
+  
+
+  
+
+)
+  
+
+
+
+   
+  
+  
+
+  
+
+
+
+
+
+
+  
+
+  
+
+
+  
+
+  
+
+
+
+
+
+  
+
+ientry-
+
+  
+
+
+  
+
+
+
+  (
+  
+
+  
+   
+  
+
+
+  
+
+
+  
+
+
+  
+
+
+  
+
+  
+
+
+  
+
+  
+  )
+
+
+  
+
+
+
+
 
diff --git a/doc/src/sgml/stylesheet-html-common.xsl 
b/doc/src/sgml/stylesheet-html-common.xsl
index a368e0e199..7f541c0988 100644
--- a/doc/src/sgml/stylesheet-html-common.xsl
+++ b/doc/src/sgml/stylesheet-html-common.xsl
@@ -436,4 +436,238 @@ set   toc,title
   
 
 
+
+
+
+
+
+  
+  
+  
+
+  
+  
+  
+
+  
+  
+
+ientry-
+
+
+  
+
+
+  
+
+  
+
+  
+  
+
+  
+
+  
+
+  
+
+
+
+  
+
+  
+
+
+
+
+  
+
+  
+  
+
+  
+
+
+
+
+  
+
+  
+
+
+
+  
+
+
+
+
+
+  
+
+  
+  
+
+  
+
+  
+
+
+
+
+
+  
+  
+
+
+
+
+
+  
+
+  
+
+
+
+  
+
+  
+
+
+
+  
+  
+  
+
+  
+
+  
+  
+
+  
+
+  
+  
+
+  
+
+  
+
+

Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

Hi,

On 2023-08-23 14:48:26 +0200, Daniel Gustafsson wrote:
> > On 23 Aug 2023, at 08:58, Andres Freund  wrote:
>
> > I'm hoping to push this fairly soon, as I'll be on vacation the last week of
> > August. I'll be online intermittently though, if there are issues, I can 
> > react
> > (very limited connectivity for middday Aug 29th - midday Aug 31th though). 
> > I'd
> > appreciate a quick review or two.
>
> I've been reading over these and the thread, and while not within my area of
> expertise, nothing really sticks out.

Thanks!


> I'll do another pass, but below are a few small comments so far.
>
> I don't know Windows to know the implications, but should the below file have
> some sort of warning about not doing that for production/shared systems, only
> for dedicated test instances?

Ah, I should have explained that: I'm not planning to apply
- regress: Check for postgres startup completion more often
- ci: windows: Disabling write cache flushing during test
right now. Compared to the other patches the wins are much smaller and/or more
work is needed to make them good.

I think it might be worth going for
- ci: switch tasks to debugoptimized build
because that provides a fair bit of gain. But it might be more hurtful than
helpful due to costing more when ccache doesn't work...


> +++ b/src/tools/ci/windows_write_cache.ps1
> @@ -0,0 +1,20 @@
> +# Define the write cache to be power protected. This reduces the rate of 
> cache
> +# flushes, which seems to help metadata heavy workloads on NTFS. We're just
> +# testing here anyway, so ...
> +#
> +# Let's do so for all disks, this could be useful beyond cirrus-ci.
>
> One thing in 0010 caught my eye, and while not introduced in this patchset it
> might be of interest here.  In the below hunks we loop X ticks around
> system(psql), with the loop assuming the server can come up really quickly and
> sleeping if it doesn't.  On my systems I always reach the pg_usleep after
> failing the check, but if I reverse the check such it first sleeps and then
> checks I only need to check once instead of twice.

I think there's more effective ways to make this cheaper. The basic thing
would be to use libpq instead of forking of psql to make a connection
check. Medium term, I think we should invent a way for pg_ctl and other
tooling (including pg_regress) to wait for the service to come up. E.g. having
a named pipe that postmaster opens once the server is up, which should allow
multiple clients to use select/epoll/... to wait for it without looping.

ISTM making pg_regress use libpq w/ PQping() should be a pretty simple patch?
The non-polling approach obviously is even better, but also requires more
thought (and documentation and ...).


> @@ -2499,7 +2502,7 @@ regression_main(int argc, char *argv[],
>   else
>   wait_seconds = 60;
>
> - for (i = 0; i < wait_seconds; i++)
> + for (i = 0; i < wait_seconds * WAITS_PER_SEC; i++)
>   {
>   /* Done if psql succeeds */
>   fflush(NULL);
> @@ -2519,7 +2522,7 @@ regression_main(int argc, char *argv[],
>outputdir);
>   }
>
> - pg_usleep(100L);
> + pg_usleep(100L / WAITS_PER_SEC);
>   }
>   if (i >= wait_seconds)
>   {
>
> It's a micro-optimization, but if we're changing things here to chase cycles 
> it
> might perhaps be worth doing?

I wouldn't quite call not waiting for 1s for the server to start, when it does
so within a few ms, chasing cycles ;). For short tests it's a substantial
fraction of the overall runtime...

Greetings,

Andres Freund

Re: Make all Perl warnings fatal


On 21.08.23 17:51, Andrew Dunstan wrote:
Still, I guess that might not matter too much since apart from plperl we 
only use perl for building / testing.


Regarding the dangers mentioned, I guess we can undo it if it proves a 
nuisance.


+1 to getting rid if the unnecessary call to getprotobyname().


I have committed the latter part.

The rest would still depend on some fixes for the MSVC build system, so 
I'll hold that until we decide what to do with that.

Re: PostgreSQL 16 release announcement draft

2023-08-23 Thread Jonathan S. Katz

On 8/23/23 8:02 AM, David Rowley wrote:

On Wed, 23 Aug 2023 at 22:21, jian he wrote:

PostgreSQL 16 improves the performance of existing PostgreSQL functionality
through new query planner optimizations. In this latest release, the query
planner can parallelize `FULL` and `RIGHT` joins, utilize incremental sorts for
`SELECT DISTINCT` queries, and execute window functions more efficiently. It
also introduces `RIGHT` and `OUTER` "anti-joins", which enable users to identify
rows not present in a joined table.

I think "utilize incremental sorts is for" something like select
my_avg(distinct one),my_sum(one) from (values(1),(3)) t(one);
so it's not the same as `SELECT DISTINCT` queries?
ref:
https://git.postgresql.org/cgit/postgresql.git/commit/?id=1349d2790bf48a4de072931c722f39337e72055e

The incremental sorts for DISTINCT will likely be a reference to
3c6fc5820, so, not the same thing as 1349d2790. I don't see anything
there relating to 1349d2790.

We could add something about 1349d2790 -- do you have suggested wording?

also
"the query planner ., and execute window functions more efficiently."
since the query planner doesn't execute anything. probably "and
optimize window functions execution"?

Yeah, that or "and optimize window functions so they execute more
efficiently" is likely an improvement there.

Modified. See updated announcement, with other incorporated changes.

Reminder that the window to submit changes closes at **August 26, 12:00
UTC**.

Thanks,

Jonathan

September 14, 2023 - The PostgreSQL Global Development Group today announced the
release of PostgreSQL 16, the latest version of the world's most advanced open
source database.

PostgreSQL 16 raises its performance, with notable improvements to query
parallelism, bulk data loading, and logical replication. There are many features
in this release for developers and administrators alike, including more SQL/JSON
syntax, new monitoring stats for your workloads, and greater flexibility in
defining access control rules for management of policies across large fleets.

PostgreSQL, an innovative data management system known for its reliability and
robustness, benefits from over 25 years of open source development from a global
developer community and has become the preferred open source relational database
for organizations of all sizes.

### Performance Improvements

PostgreSQL 16 improves the performance of existing PostgreSQL functionality
through new query planner optimizations. In this latest release, the query
planner can parallelize `FULL` and `RIGHT` joins, utilize incremental sorts for
`SELECT DISTINCT` queries, and optimize window function executions so they
execute more efficiently. It also introduces `RIGHT` and `OUTER` "anti-joins",
which enable users to identify rows not present in a joined table.

This release includes improvements for bulk loading using `COPY` in both single
and concurrent operations, with tests showing up to a 300% performance
improvement in some cases. PostgreSQL adds support for load balancing in clients
that use `libpq`, and improvements to vacuum strategy that reduce the necessity
of full-table freezes. Additionally, PostgreSQL 16 introduces CPU acceleration
using `SIMD` in both x86 and ARM architectures, resulting in performance gains
when processing ASCII and JSON strings, and performing array and subtransaction
searches.

### Logical replication

Logical replication lets PostgreSQL users stream data to other PostgreSQL
instances or subscribers that can interpret the PostgreSQL logical replication
protocol. In PostgreSQL 16, users can perform logical decoding from a standby
instance, meaning a standby can publish logical changes to other servers. This
provides developers with new workload distribution options – for example, using
a standby rather than the busier primary to logically replicate changes to
downstream systems.

Additionally, there are several performance improvements in PostgreSQL 16 to
logical replication. Subscribers can now apply large transactions using parallel
workers. For tables that do not have a `PRIMARY KEY`, subscribers can use B-tree
indexes instead of sequential scans to find rows. Under certain conditions,
users can also speed up initial table synchronization using the binary format.

There are several access control improvements to logical replication in
PostgreSQL 16, including the new predefined role pg_create_subscription, which
grants users the ability to create a new logical subscription. Finally, this
release begins adding support for bidirectional logical replication, introducing
functionality to replicate data between two tables from different publishers.

### Developer Experience

PostgreSQL 16 adds more syntax from the SQL/JSON standard, including
constructors and predicates such as `JSON_ARRAY()`, `JSON_ARRAYAGG()`, and
`IS JSON`. This release also introduces the ability to use underscores for
thousands separators (e.g. `5_432_000`) and

Re: Oversight in reparameterize_path_by_child leading to executor crash

Richard Guo  writes:
> If we go with the "tablesample scans can't be reparameterized" approach
> in the back branches, I'm a little concerned that what if we find more
> cases in the futrue where we need modify RTEs for reparameterization.
> So I spent some time seeking and have managed to find one: there might
> be lateral references in a scan path's restriction clauses, and
> currently reparameterize_path_by_child fails to adjust them.

Hmm, this seems completely wrong to me.  By definition, such clauses
ought to be join clauses not restriction clauses, so how are we getting
into this state?  IOW, I agree this is clearly buggy but I think the
bug is someplace else.

regards, tom lane

Re: using SysCacheGetAttrNotNull in place of heap_getattr

Alvaro Herrera  writes:
> I was about to push a quick patch to replace the use of heap_getattr()
> in get_primary_key_attnos() with SysCacheGetAttrNotNull(), because that
> makes the code a few lines shorter and AFAICS there's no downside.
> However, I realized that the commit that added the function
> (d435f15fff3c) did not make any such change at all -- it only changed
> SysCacheGetAttr calls to use the new function, but no heap_getattr.
> And we don't seem to have added such calls after.

Seems to me it'd be more consistent to invent a wrapper function
heap_getattr_notnull() that adds the same sort of error check,
instead of abusing the syscache function as you suggest.  For one
thing, then the functionality could be used whether there's a
suitable syscache or not.

regards, tom lane

Re: PG 16 draft release notes ready

2023-08-23 Thread Jeff Davis

On Tue, 2023-08-22 at 22:23 -0400, Bruce Momjian wrote:
> > I notice that this item is still listed:
> > 
> >  * Determine the ICU default locale from the environment (Jeff
> > Davis)
> > 
> > But that was reverted as part of 2535c74b1a.
> 
> The original commit is:
> 
> Author: Jeff Davis 
> 2023-03-10 [c45dc7ffb] initdb: derive encoding from locale
> for ICU; similar to
> 
> and I don't see that reverted by 2535c74b1a.  Is that a problem?

c45dc7ffb causes initdb to choose the encoding based on the environment
for ICU just like libc, and that was not reverted, so in v16:

  $ export LANG=en_US
  $ initdb -D data --locale-provider=icu --icu-locale=en
  ...
  The default database encoding has accordingly been set to "LATIN1".

Whereas previously in v15 that would cause an error like:

  initdb: error: encoding mismatch
  initdb: detail: The encoding you selected (UTF8) and the encoding
that the selected locale uses (LATIN1) do not match...

"Determine the ICU default locale from the environment" to me refers to
what happened in 27b62377b4, where initdb would select an ICU locale if
one was not provided. 2535c74b1a reverted that, so in v16:

  $ initdb -D data --locale-provider=icu
  initdb: error: ICU locale must be specified

Just like in v15.

Regards,
Jeff Davis

Re: add timing information to pg_upgrade

On Tue, Aug 22, 2023 at 07:06:23AM -0700, Nathan Bossart wrote:
> On Tue, Aug 22, 2023 at 11:49:33AM +0200, Peter Eisentraut wrote:
>> Let's change MESSAGE_WIDTH to 62 in v16, and then pursue the larger
>> restructuring leisurely.
> 
> Sounds good.  I plan to commit this within the next couple of days.

Here's the patch.  I'm going to run a couple of tests before committing,
but I don't think anything else is required.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From e2f65937666ec31004ec7668c475dd33c4ab0a91 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 23 Aug 2023 09:31:11 -0700
Subject: [PATCH 1/1] bump MESSAGE_WIDTH to 62

---
 src/bin/pg_upgrade/pg_upgrade.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 3eea0139c7..7afa96716e 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -22,7 +22,7 @@
 #define MAX_STRING			1024
 #define QUERY_ALLOC			8192
 
-#define MESSAGE_WIDTH		60
+#define MESSAGE_WIDTH		62
 
 #define GET_MAJOR_VERSION(v)	((v) / 100)
 
-- 
2.25.1

Re: Oversight in reparameterize_path_by_child leading to executor crash

2023-08-23 Thread Ashutosh Bapat

On Wed, Aug 23, 2023 at 11:07 AM Richard Guo  wrote:
>
>
> On Tue, Aug 22, 2023 at 10:39 PM Tom Lane  wrote:
>>
>> Richard Guo  writes:
>> > I'm wondering if we can instead adjust the 'inner_req_outer' in
>> > create_nestloop_path before we perform the check to work around this
>> > issue for the back branches, something like
>> > +   /*
>> > +* Adjust the parameterization information, which refers to the topmost
>> > +* parent.
>> > +*/
>> > +   inner_req_outer =
>> > +   adjust_child_relids_multilevel(root, inner_req_outer,
>> > +  outer_path->parent->relids,
>> > +  
>> > outer_path->parent->top_parent_relids);
>>
>> Mmm ... at the very least you'd need to not do that when top_parent_relids
>> is unset.
>
>
> Hmm, adjust_child_relids_multilevel would just return the given relids
> unchanged when top_parent_relids is unset, so I think it should be safe
> to call it even for non-other rel.
>
>>
>> ...  But I think the risk/reward ratio for messing with this in the
>> back branches is unattractive in any case: to fix a corner case that
>> apparently nobody uses in the field, we risk breaking any number of
>> mainstream parameterized-path cases.  I'm content to commit the v5 patch
>> (or a successor) into HEAD, but at this point I'm not sure I even want
>> to risk it in v16, let alone perform delicate surgery to get it to work
>> in older branches.  I think we ought to go with the "tablesample scans
>> can't be reparameterized" approach in v16 and before.
>
>
> If we go with the "tablesample scans can't be reparameterized" approach
> in the back branches, I'm a little concerned that what if we find more
> cases in the futrue where we need modify RTEs for reparameterization.
> So I spent some time seeking and have managed to find one: there might
> be lateral references in a scan path's restriction clauses, and
> currently reparameterize_path_by_child fails to adjust them.
>
> regression=# explain (verbose, costs off)
> select count(*) from prt1 t1 left join lateral
> (select t1.b as t1b, t2.* from prt2 t2) s
> on t1.a = s.b where s.t1b = s.a;
>   QUERY PLAN
> --
>  Aggregate
>Output: count(*)
>->  Append
>  ->  Nested Loop
>->  Seq Scan on public.prt1_p1 t1_1
>  Output: t1_1.a, t1_1.b
>->  Index Scan using iprt2_p1_b on public.prt2_p1 t2_1
>  Output: t2_1.b
>  Index Cond: (t2_1.b = t1_1.a)
>  Filter: (t1.b = t2_1.a)
>  ->  Nested Loop
>->  Seq Scan on public.prt1_p2 t1_2
>  Output: t1_2.a, t1_2.b
>->  Index Scan using iprt2_p2_b on public.prt2_p2 t2_2
>  Output: t2_2.b
>  Index Cond: (t2_2.b = t1_2.a)
>  Filter: (t1.b = t2_2.a)
>  ->  Nested Loop
>->  Seq Scan on public.prt1_p3 t1_3
>  Output: t1_3.a, t1_3.b
>->  Index Scan using iprt2_p3_b on public.prt2_p3 t2_3
>  Output: t2_3.b
>  Index Cond: (t2_3.b = t1_3.a)
>  Filter: (t1.b = t2_3.a)
> (24 rows)
>
> The clauses in 'Filter' are not adjusted correctly.  Var 't1.b' still
> refers to the top parent.
>
> For this query it would just give wrong results.
>
> regression=# set enable_partitionwise_join to off;
> SET
> regression=# select count(*) from prt1 t1 left join lateral
> (select t1.b as t1b, t2.* from prt2 t2) s
> on t1.a = s.b where s.t1b = s.a;
>  count
> ---
>100
> (1 row)
>
> regression=# set enable_partitionwise_join to on;
> SET
> regression=# select count(*) from prt1 t1 left join lateral
> (select t1.b as t1b, t2.* from prt2 t2) s
> on t1.a = s.b where s.t1b = s.a;
>  count
> ---
>  5
> (1 row)
>
>
> For another query it would error out during planning.
>
> regression=# explain (verbose, costs off)
> select count(*) from prt1 t1 left join lateral
> (select t1.b as t1b, t2.* from prt2 t2) s
> on t1.a = s.b where s.t1b = s.b;
> ERROR:  variable not found in subplan target list
>
> To fix it we may need to modify RelOptInfos for Path, BitmapHeapPath,
> ForeignPath and CustomPath, and modify IndexOptInfos for IndexPath.  It
> seems that that is not easily done without postponing reparameterization
> of paths until createplan.c.

Maybe I am missing something here but why aren't we copying these
restrictinfos in the path somewhere?  I have a similar question for
tablesample clause as well.

-- 
Best Wishes,
Ashutosh Bapat

Re: pgbench: allow to exit immediately when any client is aborted

2023-08-23 Thread Yugo NAGATA

On Sat, 19 Aug 2023 19:51:56 +0900 (JST)
Tatsuo Ishii  wrote:

> > I have tested the v4 patch with default_transaction_isolation =
> > 'repeatable read'.
> > 
> > pgbench --exit-on-abort -N -p 11002 -c 10 -T 30 test
> > pgbench (17devel, server 15.3)
> > starting vacuum...end.
> > transaction type: 
> > scaling factor: 1
> > query mode: simple
> > number of clients: 10
> > number of threads: 1
> > maximum number of tries: 1
> > duration: 30 s
> > number of transactions actually processed: 64854
> > number of failed transactions: 4 (0.006%)
> > latency average = 4.628 ms (including failures)
> > initial connection time = 49.526 ms
> > tps = 2160.827556 (without initial connection time)
> > 
> > Depite the 4 failed transactions (could not serialize access due to
> > concurrent update) pgbench ran normally, rather than aborting the
> > test. Is this an expected behavior?

Yes. --exit-on-abort changes a behaviour when a client is **aborted**
due to an error, and serialization errors do not cause abort, so
it is not affected.

> I start to think this behavior is ok and consistent with previous
> behavior of pgbench because serialization (and dealock) errors have
> been treated specially from other types of errors, such as accessing
> non existing tables. However, I suggest to add more sentences to the
> explanation of this option so that users are not confused by this.
> 
> + 
> +  --exit-on-abort
> +  
> +   
> +Exit immediately when any client is aborted due to some error. 
> Without
> +this option, even when a client is aborted, other clients could 
> continue
> +their run as specified by -t or -T 
> option,
> +and pgbench will print an incomplete 
> results
> +in this case.
> +   
> +  
> + 
> +
> 
> What about inserting "Note that serialization failures or deadlock
> failures will not abort client.  See  linkend="failures-and-retries"/> for more information." into the end
> of this paragraph?

--exit-on-abort is related to "abort" of a client instead of error or
failure itself, so rather I wonder a bit that mentioning serialization/deadlock
failures might be  confusing. However, if users may think of such failures from
"abort", it could be beneficial to add the sentences with some modification as
below.

 "Note that serialization failures or deadlock failures does not abort the
  client, so they are not affected by this option.
  See  for more information."

> BTW, I think:
> Exit immediately when any client is aborted due to some error. Without
> 
> should be:
> Exit immediately when any client is aborted due to some errors. 
> Without
> 
> (error vs. erros)

Well, I chose "some" to mean "unknown or unspecified", not "an unspecified 
amount
or number of", so singular form "error" is used. 
Instead, should we use "due to a error"?

> Also:
> + --exit-on-abort is specified . Otherwise in the 
> worst
> 
> There is an extra space between "specified" and ".".

Fixed.

Also, I fixed the place of the description in the documentation
to alphabetical order

Attached is the updated patch. 

Regards,
Yugo Nagata

> 
> Best reagards,
> --
> Tatsuo Ishii
> SRA OSS LLC
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp


-- 
Yugo NAGATA 
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 6c5c8afa6d..4175cf4ccd 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -768,6 +768,24 @@ pgbench  options  d
   
  
 
+ 
+  --exit-on-abort
+  
+   
+Exit immediately when any client is aborted due to some error. Without
+this option, even when a client is aborted, other clients could continue
+their run as specified by -t or -T option,
+and pgbench will print an incomplete results
+in this case.
+   
+   
+Note that serialization failures or deadlock failures does not abort the
+client, so they are not affected by this option.
+See  for more information.
+   
+  
+ 
+
  
   --failures-detailed
   
@@ -985,7 +1003,8 @@ pgbench  options  d
benchmark such as initial connection failures also exit with status 1.
Errors during the run such as database errors or problems in the script
will result in exit status 2. In the latter case,
-   pgbench will print partial results.
+   pgbench will print partial results if
+   --exit-on-abort option is not specified.
   
  
 
@@ -2801,14 +2820,17 @@ statement latencies in milliseconds, failures and retries:
  start a connection to the database server / the socket for connecting
  the client to the database server has become invalid). In such cases
  all clients of this thread stop while other threads continue to work.
+ However, --exit-on-abort is specified, all of the
+ threads stop immediately in this case.

Re: [PATCH] Add function to_oct

On Tue, Aug 22, 2023 at 04:57:02PM +0200, Peter Eisentraut wrote:
> On 22.08.23 16:26, Nathan Bossart wrote:
>> > I don't understand the reason for this handling of negative values.  I 
>> > would
>> > expect that, say, to_hex(-1234) would return '-' || to_hex(1234).
>> For this patch set, I was trying to retain the current behavior, which is
>> to return the two's complement representation.  I'm open to changing this
>> if there's agreement on what the output should be.
> 
> Ah, if there is existing behavior then we should probably keep it.

Committed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

using SysCacheGetAttrNotNull in place of heap_getattr

2023-08-23 Thread Alvaro Herrera

I was about to push a quick patch to replace the use of heap_getattr()
in get_primary_key_attnos() with SysCacheGetAttrNotNull(), because that
makes the code a few lines shorter and AFAICS there's no downside.
However, I realized that the commit that added the function
(d435f15fff3c) did not make any such change at all -- it only changed
SysCacheGetAttr calls to use the new function, but no heap_getattr.
And we don't seem to have added such calls after.

Essentially the possibly contentious point is that the tuple we'd be
deforming did not come from syscache, but from a systable scan, so
calling a syscache function on it could be seen as breaking some API.
(Of course, this only works if there is a syscache on the given
relation.)

But we do have precedent: for example RelationGetFKeyList uses a sysscan
to feed DeconstructFkConstraintRow(), which extracts several attributes
that way using the CONSTROID syscache.

Does anybody think this could be a problem, if we extended it to be more
widely used?

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Los cuentos de hadas no dan al niño su primera idea sobre los monstruos.
Lo que le dan es su primera idea de la posible derrota del monstruo."
   (G. K. Chesterton)

Re: Unlogged relations and WAL-logging

2023-08-23 Thread Heikki Linnakangas


On 07/07/2023 18:21, Heikki Linnakangas wrote:

On 28/01/2022 15:57, Robert Haas wrote:

4. Also, the smgrwrite() calls are performed before WAL-logging the
pages, so the page that's written to disk has 0/0 as the LSN, not the
LSN of the WAL record. That's harmless too, but seems a bit sloppy.


That is also a mistake on my part.


I'm still sitting on these fixes. I think the patch I posted still makes
sense, but I got carried away with a more invasive approach that
introduces a whole new set of functions for bulk-creating a relation,
which would handle WAL-logging, smgrimmedsync() and all that (see
below). We have some repetitive, error-prone code in all the index build
functions for that. But that's not backpatchable, so I'll rebase the
original approach next week.


Committed this fix to master and v16. Didn't seem worth backpatching 
further than that, given that there is no live user-visible issue here.



5. In heapam_relation_set_new_filenode(), we do this:



/*
 * If required, set up an init fork for an unlogged table so that it can
 * be correctly reinitialized on restart.  An immediate sync is required
 * even if the page has been logged, because the write did not go 
through
 * shared_buffers and therefore a concurrent checkpoint may have moved 
the
 * redo pointer past our xlog record.  Recovery may as well remove it
 * while replaying, for example, XLOG_DBASE_CREATE or XLOG_TBLSPC_CREATE
 * record. Therefore, logging is necessary even if wal_level=minimal.
 */
if (persistence == RELPERSISTENCE_UNLOGGED)
{
Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
   rel->rd_rel->relkind == RELKIND_MATVIEW ||
   rel->rd_rel->relkind == RELKIND_TOASTVALUE);
smgrcreate(srel, INIT_FORKNUM, false);
log_smgrcreate(newrnode, INIT_FORKNUM);
smgrimmedsync(srel, INIT_FORKNUM);
}


The comment doesn't make much sense, we haven't written nor WAL-logged
any page here, with nor without the buffer cache. It made more sense
before commit fa0f466d53.


Well, it seems to me (and perhaps I am just confused) that complaining
that there's no page written here might be a technicality. The point
is that there's no synchronization between the work we're doing here
-- which is creating a fork, not writing a page -- and any concurrent
checkpoint. So we both need to log it, and also sync it immediately.


I see. I pushed the fix from the other thread that makes smgrcreate()
call register_dirty_segment (commit 4b4798e13). I believe that makes
this smgrimmedsync() unnecessary. If a concurrent checkpoint happens
with a redo pointer greater than this WAL record, it must've received
the fsync request created by smgrcreate(). That depends on the fact that
we write the WAL record *after* smgrcreate(). Subtle..

Hmm, we have a similar smgrimmedsync() call after index build, because
we have written pages directly with smgrextend(skipFsync=true). If no
checkpoints have occurred during the index build, we could call
register_dirty_segment() instead of smgrimmedsync(). That would avoid
the fsync() latency when creating an index on an empty or small index.

This is all very subtle to get right though. That's why I'd like to
invent a new bulk-creation facility that would handle this stuff, and
make the callers less error-prone.


Having a more generic and less error-prone bulk-creation mechanism is 
still on my long TODO list..


--
Heikki Linnakangas
Neon (https://neon.tech)

Re: Fix error handling in be_tls_open_server()

> On 1 Aug 2023, at 16:44, Sergey Shinderuk  wrote:

> A static analyzer reported a possible pfree(NULL) in be_tls_open_server().

This has the smell of a theoretical problem, I can't really imagine a
certificate where which would produce this.  Have you been able to trigger it?

Wouldn't a better fix be to error out on len == -1 as in the attached, maybe
with a "Shouldn't happen" comment?

--
Daniel Gustafsson

X509_NAME_get_text_by_NID.diff
Description: Binary data

meson uses stale pg_config_paths.h left over from make

I've been having some problems running the regression tests using
meson on Windows.  This seems to be down to initdb being compiled with
a version of pg_config_paths.h left over from the msvc build which had
been used on that source tree previously.

Generally when there are files left over the meson build script will
detect this and ask you to run make maintainer-clean.  That's useful
on Linux, but on Windows you're just left to manually remove the
conflicting files which are listed.  Unfortunately, pg_config_paths.h
wasn't listed, which I think was missed because it's not generated
during configure, but during the actual make build. (see
src/port/Makefile)

Linux users are unlikely to experience this issue as they're likely
just going to run make maintainer-clean as instructed by the meson
error message.

The attached patch adds pg_config_paths.h to the generated_sources_ac
variable so that if that file exists, meson will provide an error
message to mention this. i.e:

> meson.build:2953:2: ERROR: Problem encountered:
> 
> Non-clean source code directory detected.

> To build with meson the source tree may not have an in-place, ./configure
> style, build configured. You can have both meson and ./configure style builds
> for the same source tree by building out-of-source / VPATH with
> configure. Alternatively use a separate check out for meson based builds.


> Conflicting files in source directory:
>  C:/Users//pg_src/src/port/pg_config_paths.h

> The conflicting files need to be removed, either by removing the files listed
> above, or by running configure and then make maintainer-clean.

Are there any objections to the attached being applied?

David


pg_config_paths_meson.patch
Description: Binary data

Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?

> On 23 Aug 2023, at 08:58, Andres Freund  wrote:

> I'm hoping to push this fairly soon, as I'll be on vacation the last week of
> August. I'll be online intermittently though, if there are issues, I can react
> (very limited connectivity for middday Aug 29th - midday Aug 31th though). I'd
> appreciate a quick review or two.

I've been reading over these and the thread, and while not within my area of
expertise, nothing really sticks out.

I'll do another pass, but below are a few small comments so far.

I don't know Windows to know the implications, but should the below file have
some sort of warning about not doing that for production/shared systems, only
for dedicated test instances?

+++ b/src/tools/ci/windows_write_cache.ps1
@@ -0,0 +1,20 @@
+# Define the write cache to be power protected. This reduces the rate of cache
+# flushes, which seems to help metadata heavy workloads on NTFS. We're just
+# testing here anyway, so ...
+#
+# Let's do so for all disks, this could be useful beyond cirrus-ci.

One thing in 0010 caught my eye, and while not introduced in this patchset it
might be of interest here.  In the below hunks we loop X ticks around
system(psql), with the loop assuming the server can come up really quickly and
sleeping if it doesn't.  On my systems I always reach the pg_usleep after
failing the check, but if I reverse the check such it first sleeps and then
checks I only need to check once instead of twice.

@@ -2499,7 +2502,7 @@ regression_main(int argc, char *argv[],
else
wait_seconds = 60;

-   for (i = 0; i < wait_seconds; i++)
+   for (i = 0; i < wait_seconds * WAITS_PER_SEC; i++)
{
/* Done if psql succeeds */
fflush(NULL);
@@ -2519,7 +2522,7 @@ regression_main(int argc, char *argv[],
 outputdir);
}

-   pg_usleep(100L);
+   pg_usleep(100L / WAITS_PER_SEC);
}
if (i >= wait_seconds)
{

It's a micro-optimization, but if we're changing things here to chase cycles it
might perhaps be worth doing?

--
Daniel Gustafsson

Re: PostgreSQL 16 release announcement draft

On Wed, 23 Aug 2023 at 22:21, jian he  wrote:
>
> >>>
> PostgreSQL 16 improves the performance of existing PostgreSQL functionality
> through new query planner optimizations. In this latest release, the query
> planner can parallelize  `FULL` and `RIGHT` joins, utilize incremental sorts 
> for
> `SELECT DISTINCT` queries, and execute window functions more efficiently. It
> also introduces `RIGHT` and `OUTER` "anti-joins", which enable users to 
> identify
> rows not present in a joined table.
> >>>
>
> I think "utilize incremental sorts is for" something like select
> my_avg(distinct one),my_sum(one) from (values(1),(3)) t(one);
> so it's not the same as `SELECT DISTINCT` queries?
> ref: 
> https://git.postgresql.org/cgit/postgresql.git/commit/?id=1349d2790bf48a4de072931c722f39337e72055e

The incremental sorts for DISTINCT will likely be a reference to
3c6fc5820, so, not the same thing as 1349d2790.  I don't see anything
there relating to 1349d2790.

> also
>  "the query planner ., and execute window functions more efficiently."
> since the query planner doesn't execute anything. probably "and
> optimize window functions execution"?

Yeah, that or "and optimize window functions so they execute more
efficiently" is likely an improvement there.

David

Re: pg_rewind WAL segments deletion pitfall

2023-08-23 Thread Alexander Kukushkin

Hi,



On Tue, 22 Aug 2023 at 07:32, Michael Paquier  wrote:

>
>
> I don't like much this patch.  While it takes correctly advantage of
> the backward record read logic from SimpleXLogPageRead() able to
> handle correctly timeline jumps, it creates a hidden dependency in the
> code between the hash table from filemap.c and the page callback.
> Wouldn't it be simpler to build a list of the segment names using the
> information from WALOpenSegment and build this list in
> findLastCheckpoint()?


I think the first version of the patch more or less did that. Not
necessarily a list, but a hash table of WAL file names that we want to
keep. But Kyotaro Horiguchi didn't like it and suggested creating entries
in the filemap.c hash table instead.
But, I agree, doing it directly from the findLastCheckpoint() makes the
code easier to understand.



>   Also, I am wondering if we should be smarter
> with any potential conflict handling between the source and the
> target, rather than just enforcing a FILE_ACTION_NONE for all these
> files.  In short, could it be better to copy the WAL file from the
> source if it exists there?
>

Before the switchpoint these files are supposed to be the same on the old
primary, new primary, and also in the archive. Also, if there is a
restore_command postgres will fetch the same file from the archive even if
it already exists in pg_wal, which effectively discards all pg_rewind
efforts on copying WAL files.


>
> +   /*
> +* Some entries (WAL segments) already have an action assigned
> +* (see SimpleXLogPageRead()).
> +*/
> +   if (entry->action == FILE_ACTION_UNDECIDED)
> +   entry->action = decide_file_action(entry);
>
> This change makes me a bit uneasy, per se my previous comment with the
> additional code dependencies.
>

We can revert to the original approach (see
v1-0001-pg_rewind-wal-deletion.patch from the very first email) if you like.


> I think that this scenario deserves a test case.  If one wants to
> emulate a delay in WAL archiving, it is possible to set
> archive_command to a command that we know will fail, for instance.
>

Yes, I totally agree, it is on our radar, but meanwhile please see the new
version, just to check if I correctly understood your idea.

Regards,
--
Alexander Kukushkin
From 1a8504419f7fafc04443c09d86807dccffc750f8 Mon Sep 17 00:00:00 2001
From: Alexander Kukushkin 
Date: Wed, 23 Aug 2023 13:40:47 +0200
Subject: [PATCH v4] Be more picky with WAL segment deletion in pg_rewind

Make pg_rewind to be a bit wiser in terms of creating filemap:
preserve on the target all WAL segments that contain records between the
last common checkpoint and the point of divergence.

Co-authored-by: Polina Bungina 
---
 src/bin/pg_rewind/filemap.c   | 19 ++-
 src/bin/pg_rewind/filemap.h   |  1 +
 src/bin/pg_rewind/parsexlog.c | 22 ++
 src/bin/pg_rewind/pg_rewind.c |  6 +++---
 4 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c
index bd5c598e20..a0c8dbc1b6 100644
--- a/src/bin/pg_rewind/filemap.c
+++ b/src/bin/pg_rewind/filemap.c
@@ -795,7 +795,12 @@ decide_file_actions(void)
 	filehash_start_iterate(filehash, );
 	while ((entry = filehash_iterate(filehash, )) != NULL)
 	{
-		entry->action = decide_file_action(entry);
+		/*
+		 * Some entries (WAL segments) already have an action assigned
+		 * (see findLastCheckpoint() function).
+		 */
+		if (entry->action == FILE_ACTION_UNDECIDED)
+			entry->action = decide_file_action(entry);
 	}
 
 	/*
@@ -818,6 +823,18 @@ decide_file_actions(void)
 	return filemap;
 }
 
+/*
+ * Prevent a given file deletion during rewind
+ */
+void
+preserve_file(char *filepath)
+{
+	file_entry_t *entry;
+
+	entry = insert_filehash_entry(filepath);
+	entry->action = FILE_ACTION_NONE;
+}
+
 
 /*
  * Helper function for filemap hash table.
diff --git a/src/bin/pg_rewind/filemap.h b/src/bin/pg_rewind/filemap.h
index 48f240dff1..47c7a7fd09 100644
--- a/src/bin/pg_rewind/filemap.h
+++ b/src/bin/pg_rewind/filemap.h
@@ -109,5 +109,6 @@ extern void process_target_wal_block_change(ForkNumber forknum,
 extern filemap_t *decide_file_actions(void);
 extern void calculate_totals(filemap_t *filemap);
 extern void print_filemap(filemap_t *filemap);
+extern void preserve_file(char *filepath);
 
 #endif			/* FILEMAP_H */
diff --git a/src/bin/pg_rewind/parsexlog.c b/src/bin/pg_rewind/parsexlog.c
index 27782237d0..40c7073522 100644
--- a/src/bin/pg_rewind/parsexlog.c
+++ b/src/bin/pg_rewind/parsexlog.c
@@ -176,6 +176,10 @@ findLastCheckpoint(const char *datadir, XLogRecPtr forkptr, int tliIndex,
 	char	   *errormsg;
 	XLogPageReadPrivate private;
 
+	/* Track WAL segments opened while searching a checkpoint */
+	XLogSegNo	segno = 0;
+	TimeLineID	tli = 0;
+
 	/*
 	 * The given fork pointer points to the end of the last common record,
 	 * which is not necessarily the beginning of the next record, if the
@@ -217,6 +221,24

Re: list of acknowledgments for PG16

2023-08-23 Thread Etsuro Fujita

On Tue, Aug 22, 2023 at 6:33 PM Peter Eisentraut  wrote:
>   As usual, please check for problems such as wrong sorting, duplicate
> names in different variants, or names in the wrong order etc.  (Our
> convention is given name followed by surname.)

I went through Japanese names on the list.  I think they are all in
the right order (ie, the given-name-followed-by-surname order).

Thanks!

Best regards,
Etsuro Fujita

Re: MERGE ... RETURNING

2023-08-23 Thread Dean Rasheed

Updated version attached, fixing an uninitialized-variable warning
from the cfbot.

Regards,
Dean
diff --git a/doc/src/sgml/dml.sgml b/doc/src/sgml/dml.sgml
new file mode 100644
index cbbc5e2..7f65f6e
--- a/doc/src/sgml/dml.sgml
+++ b/doc/src/sgml/dml.sgml
@@ -283,10 +283,15 @@ DELETE FROM products;
RETURNING
   
 
+  
+   MERGE
+   RETURNING
+  
+
   
Sometimes it is useful to obtain data from modified rows while they are
being manipulated.  The INSERT, UPDATE,
-   and DELETE commands all have an
+   DELETE, and MERGE commands all have an
optional RETURNING clause that supports this.  Use
of RETURNING avoids performing an extra database query to
collect the data, and is especially valuable when it would otherwise be
@@ -339,6 +344,24 @@ DELETE FROM products
 
   
 
+  
+   In a MERGE, the data available to RETURNING is
+   the content of the source row plus the content of the inserted, updated, or
+   deleted target row.  In addition, the merge support functions
+and 
+   may be used to return information about which merge action was executed for
+   each row.  Since it is quite common for the source and target to have many
+   of the same columns, specifying RETURNING * can lead to a
+   lot of duplicated columns, so it is often more useful to just return the
+   target row.  For example:
+
+MERGE INTO products p USING new_products n ON p.product_no = n.product_no
+  WHEN NOT MATCHED THEN INSERT VALUES (n.product_no, n.name, n.price)
+  WHEN MATCHED THEN UPDATE SET name = n.name, price = n.price
+  RETURNING pg_merge_action(), p.*;
+
+  
+
   
If there are triggers () on the target table,
the data available to RETURNING is the row as modified by
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index be2f54c..e79e2ad
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21772,6 +21772,100 @@ SELECT count(*) FROM sometable;
 
  
 
+ 
+  Merge Support Functions
+
+  
+   MERGE
+   RETURNING
+  
+
+  
+   The merge support functions shown in
+may be used in the
+   RETURNING list of a  command
+   to return additional information about the action taken for each row.
+  
+
+  
+Merge Support Functions
+
+
+ 
+  
+   
+Function
+   
+   
+Description
+   
+  
+ 
+
+ 
+  
+   
+
+ pg_merge_action
+
+pg_merge_action ( )
+text
+   
+   
+Returns the action performed on the current row ('INSERT',
+'UPDATE', or 'DELETE').
+   
+  
+
+  
+   
+
+ pg_merge_when_clause_number
+
+pg_merge_when_clause_number ( )
+integer
+   
+   
+Returns the 1-based index of the WHEN clause executed
+for the current row.
+   
+  
+ 
+
+  
+
+  
+   Example:
+
+  
+
+  
+   Note that these functions can only be used in the RETURNING
+   list of a MERGE command.  It is an error to use them in
+   any other part of a query.
+  
+
+ 
+
  
   Subquery Expressions
 
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
new file mode 100644
index fe8def4..51a15ca
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -1387,9 +1387,9 @@
  to a client upon the
  completion of an SQL command, usually a
  SELECT but it can be an
- INSERT, UPDATE, or
- DELETE command if the RETURNING
- clause is specified.
+ INSERT, UPDATE,
+ DELETE, or MERGE command if the
+ RETURNING clause is specified.
 
 
  The fact that a result set is a relation means that a query can be used
diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
new file mode 100644
index f55e901..6803240
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -1020,8 +1020,8 @@ INSERT INTO mytable VALUES (1,'one'), (2
 
 
 
- If the command does return rows (for example SELECT,
- or INSERT/UPDATE/DELETE
+ If the command does return rows (for example SELECT, or
+ INSERT/UPDATE/DELETE/MERGE
  with RETURNING), there are two ways to proceed.
  When the command will return at most one row, or you only care about
  the first row of output, write the command as usual but add
@@ -1149,6 +1149,7 @@ SELECT select_expressionsexpressions INTO STRICT target;
 UPDATE ... RETURNING expressions INTO STRICT target;
 DELETE ... RETURNING expressions INTO STRICT target;
+MERGE ... RETURNING expressions INTO STRICT target;
 
 
  where target can be a record variable, a row
@@ -1159,8 +1160,8 @@ DELETE ... RETURNING expres
  INTO clause) just as described above,
  and the plan is cached in the same way.
  This works for SELECT,
- INSERT/UPDATE/DELETE with
- RETURNING, and certain utility commands
+ INSERT/UPDATE/DELETE/MERGE
+ with RETURNING, and certain utility commands
  that return row sets, such as EXPLAIN.
  Except for the

Re: Synchronizing slots from primary to standby

2023-08-23 Thread Dilip Kumar

On Wed, Aug 23, 2023 at 3:38 PM shveta malik  wrote:
>
I have reviewed the v12-0002 patch and I have some comments. I see the
latest version posted sometime back and if any of this comment is
already fixed in this version then feel free to ignore that.

In general, code still needs a lot more comments to make it readable
and in some places, code formatting is also not as per PG standard so
that needs to be improved.
There are some other specific comments as listed below

1.
@@ -925,7 +936,7 @@ ApplyLauncherRegister(void)
  memset(, 0, sizeof(bgw));
  bgw.bgw_flags = BGWORKER_SHMEM_ACCESS |
  BGWORKER_BACKEND_DATABASE_CONNECTION;
- bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
+ bgw.bgw_start_time = BgWorkerStart_ConsistentState;

What is the reason for this change, can you add some comments?

2.
ApplyLauncherShmemInit(void)
 {
  bool found;
+ bool foundSlotSync;

Is there any specific reason to launch the sync worker from the
logical launcher instead of making this independent?
I mean in the future if we plan to sync physical slots as well then it
wouldn't be an expandable design.

3.
+ /*
+ * Remember the old dbids before we stop and cleanup this worker
+ * as these will be needed in order to relaunch the worker.
+ */
+ copied_dbcnt = worker->dbcount;
+ copied_dbids = (Oid *)palloc0(worker->dbcount * sizeof(Oid));
+
+ for (i = 0; i < worker->dbcount; i++)
+ copied_dbids[i] = worker->dbids[i];

probably we can just memcpy the memory containing the dbids.

4.
+ /*
+ * If worker is being reused, and there is vacancy in dbids array,
+ * just update dbids array and dbcount and we are done.
+ * But if dbids array is exhausted, stop the worker, reallocate
+ * dbids in dsm, relaunch the worker with same set of dbs as earlier
+ * plus the new db.
+ */

Why do we need to relaunch the worker, can't we just use dsa_pointer
to expand the shared memory whenever required?

5.

+static bool
+WaitForSlotSyncWorkerAttach(SlotSyncWorker *worker,
+uint16 generation,
+BackgroundWorkerHandle *handle)

this function is an exact duplicate of WaitForReplicationWorkerAttach
except for the LWlock, why don't we use the same function by passing
the LWLock as a parameter

6.
+/*
+ * Attach Slot-sync worker to worker-slot assigned by launcher.
+ */
+void
+slotsync_worker_attach(int slot)

this is also very similar to the logicalrep_worker_attach function.

Please check other similar functions and reuse them wherever possible

Also, why this function is not registering the cleanup function on shmmem_exit?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: Remove distprep


On 14.07.23 10:56, Peter Eisentraut wrote:

On 14.07.23 09:54, Peter Eisentraut wrote:
diff --git a/src/tools/pginclude/cpluspluscheck 
b/src/tools/pginclude/cpluspluscheck

index 4e09c4686b..287395887c 100755
--- a/src/tools/pginclude/cpluspluscheck
+++ b/src/tools/pginclude/cpluspluscheck
@@ -134,6 +134,9 @@ do
  test "$f" = src/interfaces/ecpg/preproc/preproc.h && continue
  test "$f" = src/test/isolation/specparse.h && continue

+    # FIXME
+    test "$f" = src/backend/utils/adt/jsonpath_internal.h && continue
+
  # ppport.h is not under our control, so we can't make it 
standalone.

  test "$f" = src/pl/plperl/ppport.h && continue


Hm, what's that about?


Don't remember ... ;-)  I removed this.


Ah, there was a reason.  The headerscheck and cpluspluscheck targets 
need jsonpath_gram.h to be built first.  Which previously happened 
indirectly somehow?  I have fixed this in the new patch version.  I also 
fixed the issue that Álvaro reported nearby.


Apparently, the headerscheck and cpluspluscheck targets still didn't 
work right in the Cirrus jobs.  Here is an updated patch to address 
that.  This is also rebased over some recent changes that affected this 
patch (generated wait events stuff).
From c386175ab4194aa2b17a8374f73a9de4aa0edb56 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 23 Aug 2023 12:15:42 +0200
Subject: [PATCH v4] Remove distprep

A PostgreSQL release tarball contains a number of prebuilt files, in
particular files produced by bison, flex, perl, and well as html and
man documentation.  We have done this consistent with established
practice at the time to not require these tools for building from a
tarball.  Some of these tools were hard to get, or get the right
version of, from time to time, and shipping the prebuilt output was a
convenience to users.

Now this has at least two problems:

One, we have to make the build system(s) work in two modes: Building
from a git checkout and building from a tarball.  This is pretty
complicated, but it works so far for autoconf/make.  It does not
currently work for meson; you can currently only build with meson from
a git checkout.  Making meson builds work from a tarball seems very
difficult or impossible.  One particular problem is that since meson
requires a separate build directory, we cannot make the build update
files like gram.h in the source tree.  So if you were to build from a
tarball and update gram.y, you will have a gram.h in the source tree
and one in the build tree, but the way things work is that the
compiler will always use the one in the source tree.  So you cannot,
for example, make any gram.y changes when building from a tarball.
This seems impossible to fix in a non-horrible way.

Second, there is increased interest nowadays in precisely tracking the
origin of software.  We can reasonably track contributions into the
git tree, and users can reasonably track the path from a tarball to
packages and downloads and installs.  But what happens between the git
tree and the tarball is obscure and in some cases non-reproducible.

The solution for both of these issues is to get rid of the step that
adds prebuilt files to the tarball.  The tarball now only contains
what is in the git tree (*).  Getting the additional build
dependencies is no longer a problem nowadays, and the complications to
keep these dual build modes working are significant.  And of course we
want to get the meson build system working universally.

This commit removes the make distprep target altogether.  The make
dist target continues to do its job, it just doesn't call distprep
anymore.

(*) - The tarball also contains the INSTALL file that is built at make
dist time, but not by distprep.  This is unchanged for now.

The make maintainer-clean target, whose job it is to remove the
prebuilt files in addition to what make distclean does, is now just an
alias to make distprep.  (In practice, it is probably obsolete given
that git clean is available.)

The following programs are now hard build requirements in configure
(they were already required by meson.build):

- bison
- flex
- perl
---
 GNUmakefile.in|  6 +-
 config/perl.m4|  9 +-
 config/programs.m4| 21 +
 configure | 62 ++
 contrib/cube/Makefile |  7 +-
 contrib/fuzzystrmatch/Makefile|  9 +-
 contrib/seg/Makefile  |  7 +-
 doc/Makefile  |  2 +-
 doc/src/Makefile  |  2 +-
 doc/src/sgml/Makefile |  9 +-
 doc/src/sgml/installation.sgml| 83 ---
 meson.build   |  2 +-
 src/Makefile  |  5 +-
 src/Makefile.global.in| 30 +++
 src/backend/Makefile  | 68

Re: PostgreSQL 16 release announcement draft

2023-08-23 Thread jian he

>>>
PostgreSQL 16 improves the performance of existing PostgreSQL functionality
through new query planner optimizations. In this latest release, the query
planner can parallelize  `FULL` and `RIGHT` joins, utilize incremental sorts for
`SELECT DISTINCT` queries, and execute window functions more efficiently. It
also introduces `RIGHT` and `OUTER` "anti-joins", which enable users to identify
rows not present in a joined table.
>>>

I think "utilize incremental sorts is for" something like select
my_avg(distinct one),my_sum(one) from (values(1),(3)) t(one);
so it's not the same as `SELECT DISTINCT` queries?
ref: 
https://git.postgresql.org/cgit/postgresql.git/commit/?id=1349d2790bf48a4de072931c722f39337e72055e

also
 "the query planner ., and execute window functions more efficiently."
since the query planner doesn't execute anything. probably "and
optimize window functions execution"?

Re: Synchronizing slots from primary to standby

2023-08-23 Thread shveta malik

On Thu, Aug 17, 2023 at 4:09 PM shveta malik  wrote:
>
> On Thu, Aug 17, 2023 at 11:55 AM shveta malik  wrote:
> >
> > On Thu, Aug 17, 2023 at 11:44 AM Drouvot, Bertrand
> >  wrote:
> > >
> > > Hi,
> > >
> > > On 8/14/23 11:52 AM, shveta malik wrote:
> > >
> > > >
> > > > We (myself and Ajin) performed the tests to compute the lag in standby
> > > > slots as compared to primary slots with different number of slot-sync
> > > > workers configured.
> > > >
> > >
> > > Thanks!
> > >
> > > > 3 DBs were created, each with 30 tables and each table having one
> > > > logical-pub/sub configured. So this made a total of 90 logical
> > > > replication slots to be synced. Then the workload was run for aprox 10
> > > > mins. During this workload, at regular intervals, primary and standby
> > > > slots' lsns were captured (from pg_replication_slots) and compared. At
> > > > each capture, the intent was to know how much is each standby's slot
> > > > lagging behind corresponding primary's slot by taking the distance
> > > > between confirmed_flush_lsn of primary and standby slot. Then we took
> > > > the average (integer value) of this distance over the span of 10 min
> > > > workload
> > >
> > > Thanks for the explanations, make sense to me.
> > >
> > > > and this is what we got:
> > > >
> > > > With max_slot_sync_workers=1, average-lag =  42290.3563
> > > > With max_slot_sync_workers=2, average-lag =  24585.1421
> > > > With max_slot_sync_workers=3, average-lag =  14964.9215
> > > >
> > > > This shows that more workers have better chances to keep logical
> > > > replication slots in sync for this case.
> > > >
> > >
> > > Agree.
> > >
> > > > Another statistics if it interests you is, we ran a frequency test as
> > > > well (this by changing code, unit test sort of) to figure out the
> > > > 'total number of times synchronization done' with different number of
> > > > sync-slots workers configured. Same 3 DBs setup with each DB having 30
> > > > logical replication slots. With 'max_slot_sync_workers' set at 1, 2
> > > > and 3; total number of times synchronization done was 15874, 20205 and
> > > > 23414 respectively. Note: this is not on the same machine where we
> > > > captured lsn-gap data, it is on  a little less efficient machine but
> > > > gives almost the same picture
> > > >
> > > > Next we are planning to capture this data for a lesser number of slots
> > > > like 10,30,50 etc. It may happen that the benefit of multi-workers
> > > > over single workers in such cases could be less, but let's have the
> > > > data to verify that.
> > > >
> > >
> > > Thanks a lot for those numbers and for the testing!
> > >
> > > Do you think it would make sense to also get the number of using
> > > the pg_failover_slots module? (and compare the pg_failover_slots numbers 
> > > with the
> > > "one worker" case here). Idea is to check if the patch does introduce
> > > some overhead as compare to pg_failover_slots.
> > >
> >
> > Yes, definitely. We will work on that and share the numbers soon.
> >
>
> We are working on these tests. Meanwhile attaching the patches which
> attempt to implement below functionalities:
>
> 1) Remove extra slots on standby if those no longer exist on primary
> or are no longer part of synchronize_slot_names.
> 2) Make synchronize_slot_names user-modifiable. And due to change in
> 'synchronize_slot_names', if dbids list is reduced, then take care of
> removal of extra/old db-ids (if any) from workers db-list.
>
> Thanks Ajin for working on 1. Both the above changes are in
> patch-0002. There is a test failure in the recovery module due to
> these new changes, I am looking into it and will fix it in the next
> version.
>
> Improvements in pipeline:
> a) standby slots should not be consumable.
> b) optimization of query which standby sends to primary. Currently it
> has dbid filter and slot-name filter. Slot-name filter can be
> optimized to have only those slots which belong to DBs assigned to the
> worker rather than having all 'synchronize_slot_names'.
> c) analyze if the naptime of the slot-sync worker can be auto-tuned.
> If there is no activity going on (i.e. slots are not advancing on
> primary) then increase naptime of slot-sync worker on standby and
> decrease it again when activity starts.
>

Please find the patches attached. 0002 has below changes:

1) The naptime of the worker is now tuned as per the activity on
primary. Each worker starts with a naptime of 10ms and if no activity
is observed on primary for some time, then naptime is increased to
10sec. And if activity is observed again, naptime is reduced back to
10ms. Each worker does it by choosing one slot (first one assigned to
it) for monitoring purposes. If there is no change in lsn of that slot
for say over 5 sync-checks, naptime is increased to 10sec and as soon
as a change is observed, naptime is reduced back to 10ms.

2) The query sent by standby to primary to get slot info is written
better. The query has filters : where DBID in

Re: EBCDIC sorting as a use case for ICU rules


On 17.07.23 10:10, Daniel Verite wrote:

Peter Eisentraut wrote:


You can use whitespace in the rules.  For example,

CREATE COLLATION ebcdic (provider='icu', locale='und',
rules=$$


Nice, it's clearly better that the piece of code I had in the
previous patch.
It can also be made more compact by grouping consecutive
code points, for instance <*a-r for 'a' to 'r'
I changed it that way, and also moved '^' before '[' and ']',
since according to [1], '^' is at location 0xB0 and '[' and ']'
at 0xBA and 0xBB.

Updated patch attached.


Committed with some editing.  I moved the existing rules example from 
the CREATE COLLATION page into the new section you created, so we have a 
simple example followed by the complex example.

Re: pg_rewind WAL segments deletion pitfall

2023-08-23 Thread torikoshia


On 2023-08-22 14:32, Michael Paquier wrote:
Thanks for your review!


On Fri, Aug 18, 2023 at 03:40:57PM +0900, torikoshia wrote:

Thanks for the patch, I've marked this as ready-for-committer.

BTW, this issue can be considered a bug, right?
I think it would be appropriate to provide backpatch.


Hmm, I agree that there is a good argument in back-patching as we have
the WAL files between the redo LSN and the divergence LSN, but
pg_rewind is not smart enough to keep them around.  If the archives of
the primary were not able to catch up, the old primary is as good as
kaput, and restore_command won't help here.


True.
I also imagine that in the typical failover scenario where the target 
cluster was shut down soon after the divergence and pg_rewind was 
executed without much time, we can avoid this kind of 'requested WAL 
segment has already removed' error by preventing pg_rewind from deleting 
necessary WALs.




I don't like much this patch.  While it takes correctly advantage of
the backward record read logic from SimpleXLogPageRead() able to
handle correctly timeline jumps, it creates a hidden dependency in the
code between the hash table from filemap.c and the page callback.
Wouldn't it be simpler to build a list of the segment names using the
information from WALOpenSegment and build this list in
findLastCheckpoint()?  Also, I am wondering if we should be smarter
with any potential conflict handling between the source and the
target, rather than just enforcing a FILE_ACTION_NONE for all these
files.  In short, could it be better to copy the WAL file from the
source if it exists there?

+   /*
+* Some entries (WAL segments) already have an action assigned
+* (see SimpleXLogPageRead()).
+*/
+   if (entry->action == FILE_ACTION_UNDECIDED)
+   entry->action = decide_file_action(entry);

This change makes me a bit uneasy, per se my previous comment with the
additional code dependencies.

I think that this scenario deserves a test case.  If one wants to
emulate a delay in WAL archiving, it is possible to set
archive_command to a command that we know will fail, for instance.
--
Michael


Bungina, are you going to respond to these comments?

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

Re: persist logical slots to disk during shutdown checkpoint

2023-08-23 Thread vignesh C

On Wed, 23 Aug 2023 at 14:21, Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Vignesh,
>
> > Here is a patch to persist to disk logical slots during a shutdown
> > checkpoint if the updated confirmed_flush_lsn has not yet been
> > persisted.
>
> Thanks for making the patch with different approach! Here are comments.
>
> 01. RestoreSlotFromDisk
>
> ```
> slot->candidate_xmin_lsn = InvalidXLogRecPtr;
> slot->candidate_restart_lsn = InvalidXLogRecPtr;
> slot->candidate_restart_valid = InvalidXLogRecPtr;
> +   slot->last_persisted_confirmed_flush = InvalidXLogRecPtr;
> ```
>
> last_persisted_confirmed_flush was set to InvalidXLogRecPtr, but isn't it 
> better
> to use cp.slotdata. confirmed_flush? Assuming that the server is shut down 
> immediately,
> your patch forces to save.
>
> 02. t/002_always_persist.pl
>
> The original author of the patch is me, but I found that the test could pass
> without your patch. This is because pg_logical_slot_get_changes()->
> pg_logical_slot_get_changes_guts(confirm = true)  always mark the slot as 
> dirty.
> IIUC we must use the logical replication system to verify the persistence.
> Attached test can pass only when patch is applied.

Here are few other comments that I noticed:

1) I too noticed that the test passes both with and without patch:
diff --git a/contrib/test_decoding/meson.build
b/contrib/test_decoding/meson.build
index 7b05cc25a3..12afb9ea8c 100644
--- a/contrib/test_decoding/meson.build
+++ b/contrib/test_decoding/meson.build
@@ -72,6 +72,7 @@ tests += {
   'tap': {
 'tests': [
   't/001_repl_stats.pl',
+  't/002_always_persist.pl',

2) change checkPoint to checkpoint:
2.a) checkPoint should be checkpoint to maintain consistency across the file:
+# Shutdown the node once to do shutdown checkpoint
+$node->stop();
+
+# Fetch checkPoint from the control file itself
+my ($stdout, $stderr) = run_command([ 'pg_controldata', $node->data_dir ]);
+my @control_data = split("\n", $stdout);
+my $latest_checkpoint = undef;

2.b) similarly here:
+die "No checkPoint in control file found\n"
+  unless defined($latest_checkpoint);

2.c) similarly here too:
+# Compare confirmed_flush_lsn and checkPoint
+ok($confirmed_flush eq $latest_checkpoint,
+   "Check confirmed_flush is same as latest checkpoint location");

3) change checkpoint to "Latest checkpoint location":
3.a) We should change "No checkPoint in control file found\n" to:
"Latest checkpoint location not found in control file\n" as there are
many checkpoint entries in control data

+foreach (@control_data)
+{
+   if ($_ =~ /^Latest checkpoint location:\s*(.*)$/mg)
+   {
+   $latest_checkpoint = $1;
+   last;
+   }
+}
+die "No checkPoint in control file found\n"
+  unless defined($latest_checkpoint);

3.b) We should change "Fetch checkPoint from the control file itself" to:
"Fetch Latest checkpoint location from the control file"

+# Fetch checkPoint from the control file itself
+my ($stdout, $stderr) = run_command([ 'pg_controldata', $node->data_dir ]);
+my @control_data = split("\n", $stdout);
+my $latest_checkpoint = undef;
+foreach (@control_data)
+{

Regards,
Vignesh

RE: persist logical slots to disk during shutdown checkpoint

2023-08-23 Thread Hayato Kuroda (Fujitsu)

Dear Vignesh,

> Here is a patch to persist to disk logical slots during a shutdown
> checkpoint if the updated confirmed_flush_lsn has not yet been
> persisted.

Thanks for making the patch with different approach! Here are comments.

01. RestoreSlotFromDisk

```
slot->candidate_xmin_lsn = InvalidXLogRecPtr;
slot->candidate_restart_lsn = InvalidXLogRecPtr;
slot->candidate_restart_valid = InvalidXLogRecPtr;
+   slot->last_persisted_confirmed_flush = InvalidXLogRecPtr;
```

last_persisted_confirmed_flush was set to InvalidXLogRecPtr, but isn't it better
to use cp.slotdata. confirmed_flush? Assuming that the server is shut down 
immediately,
your patch forces to save.

02. t/002_always_persist.pl

The original author of the patch is me, but I found that the test could pass
without your patch. This is because pg_logical_slot_get_changes()->
pg_logical_slot_get_changes_guts(confirm = true)  always mark the slot as dirty.
IIUC we must use the logical replication system to verify the persistence.
Attached test can pass only when patch is applied.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



another_test.patch
Description: another_test.patch

Re: subscription/015_stream sometimes breaks

2023-08-23 Thread Alvaro Herrera

On 2023-Aug-23, Alvaro Herrera wrote:

> And the reason 0002 does not remove the zeroing of ->proc is that the
> tests gets stuck when I do that, and the reason for that looks to be
> some shoddy coding in WaitForReplicationWorkerAttach, so I propose we
> change that too, as in 0003.

Hmm, actually the test got stuck when I ran it repeatedly with this
0003.  I guess there might be other places that depend on ->proc being
set to NULL on exit.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Most hackers will be perfectly comfortable conceptualizing users as entropy
 sources, so let's move on."   (Nathaniel Smith)

Re: [dsm] comment typo

> On 21 Aug 2023, at 12:15, Junwang Zhao  wrote:
> 
> On Mon, Aug 21, 2023 at 5:16 PM Daniel Gustafsson  wrote:
>> 
>>> On 18 Aug 2023, at 11:10, Junwang Zhao  wrote:
>>> 
>>> In the following sentence, I believe either 'the' or 'a' should be kept, not
>>> both. I here keep the 'the', but feel free to change.
>> 
>>> * handle: The handle of an existing object, or for DSM_OP_CREATE, the
>>> - *a new handle the caller wants created.
>>> + *new handle the caller wants created.
>> 
>> Since the handle doesn't exist for DSM_OP_CREATE, both "a handle" and "the
>> handle" seems a tad misleading, how about "the identifier for the new handle 
>> the
>> caller wants created"?
>> 
> 
> Sounds great

Done that way, thanks!

--
Daniel Gustafsson

Re: MERGE ... RETURNING

2023-08-23 Thread Dean Rasheed

On Tue, 25 Jul 2023 at 21:46, Gurjeet Singh  wrote:
>
> There seems to be other use cases which need us to invent a method to
> expose a command-specific alias. See Tatsuo Ishii's call for help in
> his patch for Row Pattern Recognition [1].
>

I think that's different though, because in that example "START" is a
row from the table, and "price" is a table column, so using the table
alias syntax "START.price" makes sense, to refer to a value from the
table.

In this case "MERGE" and "action" have nothing to do with table rows
or columns, so saying "MERGE.action" doesn't fit the pattern.

> I performed the review vo v9 patch by comparing it to v8 and v7
> patches, and then comparing to HEAD.
>

Many thanks for looking at this.

> The v9 patch is more or less a complete revert to v7 patch, plus the
> planner support for calling the merge-support functions in subqueries,
> parser catching use of merge-support functions outside MERGE command,
> and name change for one of the support functions.
>

Yes, that's a fair summary.

> But reverting to v7 also means that some of my gripes with that
> version also return; e.g. invention of EXPR_KIND_MERGE_RETURNING. And
> as noted in v7 review, I don't have a better proposal.
>

True, but I think that it's in keeping with the purpose of the
ParseExprKind enumeration:

/*
 * Expression kinds distinguished by transformExpr().  Many of these are not
 * semantically distinct so far as expression transformation goes; rather,
 * we distinguish them so that context-specific error messages can be printed.
 */

which matches what the patch is using EXPR_KIND_MERGE_RETURNING for.

> - * Uplevel PlaceHolderVars and aggregates are replaced, too.
> + * Uplevel PlaceHolderVars, aggregates, GROUPING() expressions and merge
> + * support functions are replaced, too.
>
> Needs Oxford comma: s/GROUPING() expressions and/GROUPING() expressions, and/
>

Added.

> +pg_merge_action(PG_FUNCTION_ARGS)
> +{
> ...
> +relaction = mtstate->mt_merge_action;
> +if (relaction)
> +{
> ..
> +}
> +
> +PG_RETURN_NULL();
> +}
>
> Under what circumstances would the relaction be null? Is it okay to
> return NULL from this function if relaction is null, or is it better
> to throw an error? These questions apply to the
> pg_merge_when_clause_number() function as well.
>

Yes, it's really a "should never happen" situation, so I've converted
it to elog() an error. Similarly, commandType should never be
CMD_NOTHING in pg_merge_action(), so that also now throws an error.
Also, the planner code now throws an error if it sees a merge support
function outside a MERGE. Again, that should never happen, due to the
parser check, but it seems better to be sure, and catch it early.

While at it, I tidied up the planner code a bit, making the merge
support function handling more like the other cases in
replace_correlation_vars_mutator(), and making
replace_outer_merge_support_function() more like its neighbouring
functions, such as replace_outer_grouping(). In particular, it is now
only called if it is a reference to an upper-level MERGE, not for
local references, which matches the pattern used in the neighbouring
functions.

Finally, I have added some new RLS code and tests, to apply SELECT
policies to new rows inserted by MERGE INSERT actions, if a RETURNING
clause is specified, to make it consistent with a plain INSERT ...
RETURNING command (see commit c2e08b04c9).

Regards,
Dean
diff --git a/doc/src/sgml/dml.sgml b/doc/src/sgml/dml.sgml
new file mode 100644
index cbbc5e2..7f65f6e
--- a/doc/src/sgml/dml.sgml
+++ b/doc/src/sgml/dml.sgml
@@ -283,10 +283,15 @@ DELETE FROM products;
RETURNING

+  
+   MERGE
+   RETURNING
+  
+

Sometimes it is useful to obtain data from modified rows while they are
being manipulated.  The INSERT, UPDATE,
-   and DELETE commands all have an
+   DELETE, and MERGE commands all have an
optional RETURNING clause that supports this.  Use
of RETURNING avoids performing an extra database query to
collect the data, and is especially valuable when it would otherwise be
@@ -339,6 +344,24 @@ DELETE FROM products

+  
+   In a MERGE, the data available to RETURNING is
+   the content of the source row plus the content of the inserted, updated, or
+   deleted target row.  In addition, the merge support functions
+and 
+   may be used to return information about which merge action was executed for
+   each row.  Since it is quite common for the source and target to have many
+   of the same columns, specifying RETURNING * can lead to a
+   lot of duplicated columns, so it is often more useful to just return the
+   target row.  For example:
+
+MERGE INTO products p USING new_products n ON p.product_no = n.product_no
+  WHEN NOT MATCHED THEN INSERT VALUES (n.product_no, n.name, n.price)
+  WHEN MATCHED THEN UPDATE SET name = n.name, price = n.price
+  RETURNING pg_merge_action(), p.*;
+
+  
+

If there are triggers () on the target

Re: initdb caching during tests

> On 23 Aug 2023, at 03:17, Andres Freund  wrote:
> On 2023-08-22 23:47:24 +0200, Daniel Gustafsson wrote:

>> My only small gripe is that I keep thinking about template databases for 
>> CREATE
>> DATABASE when reading the error messages in this patch, which is clearly not
>> related to what this does.
>> 
>> +   note("initializing database system by copying initdb template");
>> 
>> I personally would've used cache instead of template in the user facing parts
>> to keep concepts separated, but thats personal taste.
> 
> I am going back and forth on that one (as one can notice with $subject). It
> doesn't quite seem like a cache, as it's not "created" on demand and only
> usable when the exactly same parameters are used repeatedly. But template is
> overloaded as you say...

That's a fair point, cache is not a good word to describe a stored copy of
something prefabricated.  Let's go with template, we can always refine in-tree
if a better wording comes along.

--
Daniel Gustafsson

Re: subscription/015_stream sometimes breaks

2023-08-23 Thread Alvaro Herrera

On 2023-Aug-23, Zhijie Hou (Fujitsu) wrote:

> [1]--
>   LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
> 
>   workers = logicalrep_workers_find(MyLogicalRepWorker->subid, 
> true);
>   foreach(lc, workers)
>   {
>   LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
> 
> **if (isParallelApplyWorker(w))
>   logicalrep_worker_stop_internal(w, SIGTERM);
>   }

Hmm, I think if worker->in_use is false, we shouldn't consult the rest
of the struct at all, so I propose to add the attached 0001 as a minimal
fix.

In fact, I'd go further and propose that if we do take that stance, then
we don't need clear out the contents of this struct at all, so let's
not.  That's 0002.

And the reason 0002 does not remove the zeroing of ->proc is that the
tests gets stuck when I do that, and the reason for that looks to be
some shoddy coding in WaitForReplicationWorkerAttach, so I propose we
change that too, as in 0003.

Thoughts?

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
>From a28775fc5900cd740556a864e1826ceda268b794 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 23 Aug 2023 09:25:35 +0200
Subject: [PATCH 1/3] Don't use LogicalRepWorker until ->in_use is verified

---
 src/backend/replication/logical/launcher.c | 4 ++--
 src/include/replication/worker_internal.h  | 8 ++--
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 72e44d5a02..ec5156aa41 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -862,7 +862,7 @@ logicalrep_sync_worker_count(Oid subid)
 	{
 		LogicalRepWorker *w = >workers[i];
 
-		if (w->subid == subid && isTablesyncWorker(w))
+		if (isTablesyncWorker(w) && w->subid == subid)
 			res++;
 	}
 
@@ -889,7 +889,7 @@ logicalrep_pa_worker_count(Oid subid)
 	{
 		LogicalRepWorker *w = >workers[i];
 
-		if (w->subid == subid && isParallelApplyWorker(w))
+		if (isParallelApplyWorker(w) && w->subid == subid)
 			res++;
 	}
 
diff --git a/src/include/replication/worker_internal.h b/src/include/replication/worker_internal.h
index a428663859..8f4bed0958 100644
--- a/src/include/replication/worker_internal.h
+++ b/src/include/replication/worker_internal.h
@@ -327,8 +327,10 @@ extern void pa_decr_and_wait_stream_block(void);
 extern void pa_xact_finish(ParallelApplyWorkerInfo *winfo,
 		   XLogRecPtr remote_lsn);
 
-#define isParallelApplyWorker(worker) ((worker)->type == WORKERTYPE_PARALLEL_APPLY)
-#define isTablesyncWorker(worker) ((worker)->type == WORKERTYPE_TABLESYNC)
+#define isParallelApplyWorker(worker) ((worker)->in_use && \
+	   (worker)->type == WORKERTYPE_PARALLEL_APPLY)
+#define isTablesyncWorker(worker) ((worker)->in_use && \
+   (worker)->type == WORKERTYPE_TABLESYNC)
 
 static inline bool
 am_tablesync_worker(void)
@@ -339,12 +341,14 @@ am_tablesync_worker(void)
 static inline bool
 am_leader_apply_worker(void)
 {
+	Assert(MyLogicalRepWorker->in_use);
 	return (MyLogicalRepWorker->type == WORKERTYPE_APPLY);
 }
 
 static inline bool
 am_parallel_apply_worker(void)
 {
+	Assert(MyLogicalRepWorker->in_use);
 	return isParallelApplyWorker(MyLogicalRepWorker);
 }
 
-- 
2.30.2

>From 36336e6d8ef2913def59a53b6bc083a40181a85a Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 23 Aug 2023 09:53:51 +0200
Subject: [PATCH 2/3] don't clean up unnecessarily

---
 src/backend/replication/logical/launcher.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index ec5156aa41..f9d6ebf2d8 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -795,12 +795,6 @@ logicalrep_worker_cleanup(LogicalRepWorker *worker)
 
 	worker->in_use = false;
 	worker->proc = NULL;
-	worker->dbid = InvalidOid;
-	worker->userid = InvalidOid;
-	worker->subid = InvalidOid;
-	worker->relid = InvalidOid;
-	worker->leader_pid = InvalidPid;
-	worker->parallel_apply = false;
 }
 
 /*
-- 
2.30.2

>From 9dced5b0898be22004d442cb2893848663f967ba Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 23 Aug 2023 09:55:13 +0200
Subject: [PATCH 3/3] Don't rely on proc being NULL either

---
 src/backend/replication/logical/launcher.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index f9d6ebf2d8..8bfceb5c27 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -201,11 +201,18 @@ WaitForReplicationWorkerAttach(LogicalRepWorker *worker,
 
 		LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
 
-		/* Worker either died or has started. Return false if died. */
-		if

Re: pg_upgrade - a function parameter shadows global 'new_cluster'

> On 23 Aug 2023, at 03:28, Peter Smith  wrote:

> PSA a small patch to remove the unnecessary parameter, and so eliminate this 
> shadowing.

Agreed, applied. Thanks!

--
Daniel Gustafsson

Re: list of acknowledgments for PG16

2023-08-23 Thread Denis Laxalde


Peter Eisentraut a écrit :

The list of acknowledgments for the PG16 release notes has been
committed.  It should show up here sometime:
.
   As usual, please check for problems such as wrong sorting, duplicate
names in different variants, or names in the wrong order etc.  (Our
convention is given name followed by surname.)



"Gabriele Varrazzo" is mentioned in commit 
0032a5456708811ca95bd80a538f4fb72ad0dd20 but it should be "Daniele 
Varrazzo" (per Discussion link in commit message); the later is already 
in the list.From c2e685b51f89d80e6e937afaaa0f8d1231fc4d2c Mon Sep 17 00:00:00 2001
From: Denis Laxalde 
Date: Wed, 23 Aug 2023 09:10:28 +0200
Subject: [PATCH] Remove a wrong name in acknowledgments

---
 doc/src/sgml/release-16.sgml | 1 -
 1 file changed, 1 deletion(-)

diff --git a/doc/src/sgml/release-16.sgml b/doc/src/sgml/release-16.sgml
index db889127fe..3e8106d22b 100644
--- a/doc/src/sgml/release-16.sgml
+++ b/doc/src/sgml/release-16.sgml
@@ -3968,7 +3968,6 @@ Author: Andres Freund 
 Farias de Oliveira
 Florin Irion
 Franz-Josef Färber
-Gabriele Varrazzo
 Garen Torikian
 Georgios Kokolatos
 Gilles Darold
-- 
2.39.2

Re: Cirrus-ci is lowering free CI cycles - what to do with cfbot, etc?