date:20250220

On Fri, Feb 7, 2025 at 5:01 AM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Tue, 4 Feb 2025 22:20:51 -0800,
>   Masahiko Sawada  wrote:
>
> >> I was just looking at bit at this series of patch labelled with v31,
> >> to see what is happening here.
> >>
> >> In 0001, we have that:
> >>
> >> +   /* format-specific routines */
> >> +   const CopyToRoutine *routine;
> >> [...]
> >> -   CopySendEndOfRow(cstate);
> >> +   cstate->routine->CopyToOneRow(cstate, slot);
> >>
> >> Having a callback where the copy state is processed once per row is
> >> neat in terms of design for the callbacks and what extensions can do,
> >> and this is much better than what 2889fd23be5 has attempted (later
> >> reverted in 1aa8324b81fa) because we don't do indirect function calls
> >> for each attribute.  Still, I have a question here: what happens for a
> >> COPY TO that involves one attribute, a short field size like an int2
> >> and many rows (the more rows the more pronounced the effect, of
> >> course)?  Could this level of indirection still be the cause of some
> >> regressions in a case like that?  This is the worst case I can think
> >> about, on top of my mind, and I am not seeing tests with few
> >> attributes like this one, where we would try to make this callback as
> >> hot as possible.  This is a performance-sensitive area.
> >
> > FYI when Sutou-san last measured the performance[1], it showed a
> > slight speed up even with fewer columns (5 columns) in both COPY TO
> > and COPY FROM cases. The callback design has not changed since then.
> > But it would be a good idea to run the benchmark with a table having a
> > single small size column.
> >
> > [1] 
> > https://www.postgresql.org/message-id/20241114.161948.1677325020727842666.kou%40clear-code.com
>
> I measured v31 patch set with 1,6,11,16,21,26,31 int2
> columns. See the attached PDF for 0001 and 0002 result.
>
> 0001 - to:
>
> It's faster than master when the number of rows are
> 1,000,000-5,000,000.
>
> It's almost same as master when the number of rows are
> 6,000,000-10,000,000.
>
> There is no significant slow down when the number of columns
> is 1.
>
> 0001 - from:
>
> 0001 doesn't change COPY FROM code. So the differences are
> not real difference.
>
> 0002 - to:
>
> 0002 doesn't change COPY TO code. So "0001 - to" and "0002 -
> to" must be the same result. But 0002 is faster than master
> for all cases. It shows that the CopyToOneRow() approach
> doesn't have significant slow down.
>
> 0002 - from:
>
> 0002 changes COPY FROM code. So this may have performance
> impact.
>
> It's almost same as master when data is smaller
> ((1,000,000-2,000,000 rows) or (3,000,000 rows and 1,6,11,16
> columns)).
>
> It's faster than master when data is larger.
>
> There is no significant slow down by 0002.
>

Thank you for sharing the benchmark results. That looks good to me.

Looking at the 0001 patch again, I have a question: we have
CopyToTextLikeOneRow() for both CSV and text format:

+/* Implementation of the per-row callback for text format */
+static void
+CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+   CopyToTextLikeOneRow(cstate, slot, false);
+}
+
+/* Implementation of the per-row callback for CSV format */
+static void
+CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
+{
+   CopyToTextLikeOneRow(cstate, slot, true);
+}

These two functions pass different is_csv value to that function,
which is used as follows:

+   if (is_csv)
+   CopyAttributeOutCSV(cstate, string,
+
 cstate->opts.force_quote_flags[attnum - 1]);
+   else
+   CopyAttributeOutText(cstate, string);

However, we can know whether the format is CSV or text by checking
cstate->opts.csv_mode instead of passing is_csv. That way, we can
directly call CopyToTextLikeOneRow() but not via CopyToCSVOneRow() or
CopyToTextOneRow(). It would not help performance since we already
inline CopyToTextLikeOneRow(), but it looks simpler.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: BitmapHeapScan streaming read user and prelim refactoring

On Fri, Feb 21, 2025 at 11:17 AM Melanie Plageman
 wrote:
> [v31-0001-Delay-extraction-of-TIDBitmap-per-page-offsets.patch]

Nice patch, seems like a straightforward win with the benefits you
explained: less work done under a lock, less work done in the second
iterator if the rest of this stuff doesn't make it in yet, and less
space used assuming it does.  My only comment is that I'm not sure
about this type:

+/*
+ * The tuple OffsetNumbers extracted from a single page in the bitmap.
+ */
+typedef OffsetNumber TBMOffsets[TBM_MAX_TUPLES_PER_PAGE];

It's used for a stack object, but it's also used as a degraded pointer here:

+extern void tbm_extract_page_tuple(TBMIterateResult *iteritem,
+
TBMOffsets offsets);

Maybe a personal style question but degraded pointers aren't my
favourite C feature.  A couple of alternatives, I am not sure how good
they are:  Would this output variable be better wrapped up in a new
struct for better API clarity?  If so, shouldn't such a struct also
hold ntuples itself, since that really goes with the offsets it's
holding?  If so, could iteritem->ntuples then be replaced with a
boolean iteritem->lossy, since all you really need to be able to check
is whether it has offsets to give you, instead of using -1 and -2 for
that?  Or as a variation without a new struct, would it be better to
take that output array as a plain pointer to OffsetNumber and
max_offsets, and return ntuples (ie how many were filled in, cf commit
f6bef362), for the caller to use in a loop or stash somewhere, with an
additional comment that TBM_MAX_TUPLES_PER_PAGE is guaranteed to be
enough, for convenience eg for arrays on the stack?

Re: Improve CRC32C performance on SSE4.2

2025-02-20 Thread John Naylor

On Fri, Feb 21, 2025 at 1:24 AM Devulapalli, Raghuveer
 wrote:
>
>
> > Now, there is no equivalent on MSVC and workarounds are fragile [1].
> > Maybe we could only assert initialization happened for the backend and for
> > frontend either
> > - add a couple manual initializations to to the frontend programs where we 
> > don't
> > want to lose performance for non-gcc/clang.
> > - require CRC on x86-64 MSVC since Windows 10 is EOL soon, going by Thomas
> > M.'s earlier findings on popcount (also SSE4.2) [2]
> >
> > The first is less risky but less tidy.
>
> Agree, let me think about this but not sure if I have any useful suggestions 
> here. MSVC is unfortunately not my strong suit :/

Here's another idea to make it more automatic: Give up on initializing
every capability at once. The first time we call CRC, it will be
uninitialized, so this part:

if (pg_cpucap & PGCPUCAP_CRC32C)
  return COMP_CRC32C_HW(crc, data, len);
else
  return pg_comp_crc32c_sb8(crc, data, len);

...will call the SB8 path. Inside there, do the check:

#if defined(HAVE_CRC_RUNTIME)
// separate init bit for each capability
if (unlikely(pg_cpucap & PGCPUCAP_CRC32C_INIT == 0))
{
  pg_cpucap_crc32c(); // also sets PGCPUCAP_CRC32C_INIT
  if (pg_cpucap & PGCPUCAP_CRC32C)
return COMP_CRC32C_HW(crc, data, len);
}
#endif
// ...fallthrough to SB8

--
John Naylor
Amazon Web Services

Re: GetRelationPath() vs critical sections

On Fri, Feb 21, 2025 at 6:20 PM Thomas Munro  wrote:
> #define PROCNUMBER_BITS 18
> #define MAX_BACKENDS ((1 << PROCNUMBER_BITS) - 1)
> #define PROCNUMBER_CHARS DECIMAL_DIGITS_FOR_BITS(PROCNUMBER_BITS)
>
> ... with a little helper ported to preprocessor hell from Hacker's
> Delight magic[1] for that last bit.  See attached.  But if that's a
> bit too nuts...

Erm, wait of course you just need:

#define DECIMAL_DIGITS(n) \
   ((n) < 10 ? 1 : \
(n) < 100 ? 2 : \
(n) < 1000 ? 3 : \
(n) < 1 ? 4 : \
(n) < 10 ? 5 : \
(n) < 100 ? 6 : \
(n) < 1000 ? 7 : \
(n) < 1 ? 8 : \
(n) < 10 ? 9 : 10)

#define PROCNUMBER_BITS 18
#define MAX_BACKENDS ((1 << PROCNUMBER_BITS) - 1)
#define PROCNUMBER_CHARS DECIMAL_DIGITS(MAX_BACKENDS)

Re: GetRelationPath() vs critical sections

Thomas Munro  writes:
> On Fri, Feb 21, 2025 at 9:28 AM Andres Freund  wrote:
>> The patch curently uses a hardcoded 6 for the length of MAX_BACKENDS. Does
>> anybody have a good idea for how to either
>> 
>> a) derive 6 from MAX_BACKENDS in a way that can be used in a C array size

This all seems quite over-the-top to me.  Just allocate 10 characters,
which is certainly enough for the output of a %u format spec, and
call it good.

(Yeah, I know that's not as much fun, but ...)

regards, tom lane

Re: BgBufferSync(): clarification about reusable_buffers variable

Hi Marcel,


On Sat, Feb 8, 2025 at 6:24 AM M vd H  wrote:
>
> I have been reading the source code of the BgWriter, and there is some
> code in BgBufferSync() that I don't fully understand.
>
> In BgBufferSync(), we have the following code:
>
> while (num_to_scan > 0 && reusable_buffers < upcoming_alloc_est)
> {
> intsync_state = SyncOneBuffer(next_to_clean, true,
>wb_context);
>
> if (++next_to_clean >= NBuffers)
> {
> next_to_clean = 0;
> next_passes++;
> }
> num_to_scan--;
>
> if (sync_state & BUF_WRITTEN)
> {
> reusable_buffers++;
> if (++num_written >= bgwriter_lru_maxpages)
> {
> PendingBgWriterStats.maxwritten_clean++;
> break;
> }
> }
> else if (sync_state & BUF_REUSABLE)
> reusable_buffers++;
> }
>
>
> In SyncOneBuffer(), we lock the bufHdr and then check if both the
> refcount and usage_count are zero. If so, we mark the return value as
> BUF_REUSABLE.
> My understanding is that this means that the buffer could be reused
> when am empty shared buffer is needed by a backend. However, in the
> code above, we seem to track these in the reusable_buffers variable.
> But that variable is always incremented when the buffer was written in
> SyncOneBuffer() even though that buffer might have a non-zero refcount
> or non-zero usage_count.

I also think tha reusable_buffers keep track of the number of reusable
buffers.  BgBufferSync() calls SyncOneBuffer() with skip_recently_used
= true. In that case, if SyncOneBuffer() finds the buffer with
refcount or usage_count non-zero, it just unlocks the header and
returns. Hence when called from BgBufferSync(), SyncOneBuffer() would
write a buffer only when it is not used. Hence the result would be 0
or BUF_REUSABLE or BUF_REUSABLE | BUF_WRITTEN. It can never be just
BUF_WRITTEN.

I guess, a patch like the one attached will be more readable and clear.

I ran pgbench for 5 minutes with this patch applied and didn't see the
Assert failing. But I don't think that's a good enough test to cover
all scenarios.

--
Best Wishes,
Ashutosh Bapat
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 80b0d0c5ded..bdacd5ad980 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -3414,17 +3414,19 @@ BgBufferSync(WritebackContext *wb_context)
 		}
 		num_to_scan--;
 
+		if (sync_state & BUF_REUSABLE)
+			reusable_buffers++;
+
 		if (sync_state & BUF_WRITTEN)
 		{
-			reusable_buffers++;
+			Assert(sync_state & BUF_REUSABLE);
+
 			if (++num_written >= bgwriter_lru_maxpages)
 			{
 PendingBgWriterStats.maxwritten_clean++;
 break;
 			}
 		}
-		else if (sync_state & BUF_REUSABLE)
-			reusable_buffers++;
 	}
 
 	PendingBgWriterStats.buf_written_clean += num_written;

Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

On Thu, Feb 20, 2025 at 2:07 PM Noah Misch  wrote:
>
> On Wed, Feb 19, 2025 at 10:48:29AM -0800, Masahiko Sawada wrote:
> > Thank you for reviewing the patches. I've fixed these issues and
> > attached the updated patches.
>
> Looks good.
>
> > I have one question about the 0001 patch; since we add
> > 'default_char_signedness' field to ControlFileData do we need to bump
> > PG_CONTROL_VERSION? We have comments about bumping PG_CONTROL_VERSION
> > when changing CheckPoint struct or DBState enum so it seems likely but
> > I'd like to confirm just in case that we need to bump
> > PG_CONTROL_VERSION also when changing ControlFileData.
>
> Yes.  (I'm not aware of value we get from having distinct control file version
> and catalog version, but we do have both.)
>
> > If we need, can
> > we bump it to 1800? or 1701?
>
> I'd do 1800.  The pattern seems to be to bump to 1800 for the first pg_control
> change of the v18 cycle, then 1801, then 1802 for the third change of the
> cycle.  That's based on this history:
>
> git log -U0 -p src/include/catalog/pg_control.h | grep -E '^(Date|\+#define 
> PG_CONTROL_VERSION)'

Thank you for the confirmation. That makes sense to me.

I'll push these patches with version bumps, barring any objections or
further comments.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: Trigger more frequent autovacuums of heavy insert tables

On Wed, Feb 19, 2025 at 4:59 PM Nathan Bossart  wrote:
>
> On Wed, Feb 19, 2025 at 04:36:05PM -0500, Melanie Plageman wrote:
> > This makes me think I should also not cap relallfrozen when using it
> > in relation_needs_vacanalyze(). There I cap it to relallvisible and
> > relallvisible is capped to relpages. One of the ideas behind letting
> > people modify these stats in pg_class is that they can change a single
> > field to see what the effect on their system is, right?
>
> Right.  Capping these values to reflect reality seems like it could make
> that more difficult.

Attache v7 doesn't cap the result for manual stats updating done with
relation_statistics_update(). I did, however, keep the cap for the
places where vacuum/analyze/create index update the stats. There the
number for relallfrozen is coming directly from visibilitymap_count(),
so it should be correct. I could perhaps add an assert instead, but I
didn't think that really made sense. An assert is meant to help the
developer and what could the developer do about the visibility map
being corrupted.

> >> Should we allow manipulating relallfrozen like we do relallvisible?  My
> >> assumption is that would even be required for the ongoing statistics
> >> import/export work.
> >
> > Why would it be required for the statistics import/export work?
>
> It's probably not strictly required, but my naive expectation would be that
> we'd handle relallfrozen just like relallvisible, which appears to be
> dumped in the latest stats import/export patch.  Is there any reason we
> shouldn't do the same for relallfrozen?

Nope I don't think so, but I also don't know about how people are
envisioning using a manually updated relallvisible.

- Melanie
From 26dac7d2d9768c072913c911776d4f679bb21126 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Thu, 16 Jan 2025 16:31:55 -0500
Subject: [PATCH v7 2/2] Trigger more frequent autovacuums with relallfrozen

Calculate the insert threshold for triggering an autovacuum of a
relation based on the number of unfrozen pages. By only considering the
"active" (unfrozen) portion of the table when calculating how many
tuples to add to the insert threshold, we can trigger more frequent
vacuums of insert-heavy tables and increase the chances of vacuuming
those pages when they still reside in shared buffers.

Reviewed-by: Greg Sabino Mullane
---
 doc/src/sgml/config.sgml  | 16 +--
 src/backend/postmaster/autovacuum.c   | 27 ---
 src/backend/utils/misc/postgresql.conf.sample |  4 +--
 3 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 007746a4429..cdd477e62c3 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8767,14 +8767,14 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;


 
- Specifies a fraction of the table size to add to
- autovacuum_vacuum_insert_threshold
- when deciding whether to trigger a VACUUM.
- The default is 0.2 (20% of table size).
- This parameter can only be set in the postgresql.conf
- file or on the server command line;
- but the setting can be overridden for individual tables by
- changing table storage parameters.
+Specifies a fraction of the active (unfrozen) table size to add to
+autovacuum_vacuum_insert_threshold
+when deciding whether to trigger a VACUUM.
+The default is 0.2 (20% of active table size).
+This parameter can only be set in the postgresql.conf
+file or on the server command line;
+but the setting can be overridden for individual tables by
+changing table storage parameters.
 

   
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index ade2708b59e..050dbd43a22 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2938,7 +2938,6 @@ relation_needs_vacanalyze(Oid relid,
 {
 	bool		force_vacuum;
 	bool		av_enabled;
-	float4		reltuples;		/* pg_class.reltuples */
 
 	/* constants from reloptions or GUC variables */
 	int			vac_base_thresh,
@@ -3052,7 +3051,11 @@ relation_needs_vacanalyze(Oid relid,
 	 */
 	if (PointerIsValid(tabentry) && AutoVacuumingActive())
 	{
-		reltuples = classForm->reltuples;
+		float4		pcnt_unfrozen = 1;
+		float4		reltuples = classForm->reltuples;
+		int32		relpages = classForm->relpages;
+		int32		relallfrozen = classForm->relallfrozen;
+
 		vactuples = tabentry->dead_tuples;
 		instuples = tabentry->ins_since_vacuum;
 		anltuples = tabentry->mod_since_analyze;
@@ -3061,11 +3064,29 @@ relation_needs_vacanalyze(Oid relid,
 		if (reltuples < 0)
 			reltuples = 0;
 
+		/*
+		 * If we have data for relallfrozen, calculate the unfrozen percentage
+		 * of the table to modify insert scale factor. This helps us decide
+		 * whether or not to vacuum an insert-heavy tabl

Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding

2025-02-20 Thread Ajin Cherian

On Thu, Feb 20, 2025 at 3:08 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Ajin,
>
> > I compared the patch 1 which does not employ a hash cache and has the
> > overhead of starting a transaction every time the filter is checked.
> >
> > I created a test setup of 10 million inserts in 3 different scenarios:
> > 1. All inserts on unpublished tables
> > 2. Half of the inserts on unpublished table and half on pupblished table
> > 3. All inserts on published tables.
> >
> > The percentage improvement in the new optimized patch compared to the
> > old patch is:
> >
> > No transactions in publication: 85.39% improvement
> > Half transactions in publication: 72.70% improvement
> > All transactions in publication: 48.47% improvement
> >
> > Attaching a graph to show the difference.
>
> I could not find any comparisons with HEAD. Can you clarify the 
> throughput/latency/memory
> usage with HEAD?

Here's the difference in latency with head. Again 10 million inserts
in 3 scenarios: All transactions on unpublished tables, half of the
transactions on unpublished tables and all transactions on published
tables
Conclusion:
The patched code with 100 transaction throttling significantly
improves performance, reducing execution time by ~69% when no
published transactions are involved, ~43% with partial published
transactions, and ~15% in all published transactions.
Attaching a graph showing the performance differences.

I will run tests comparing memory and throughput as well.

regards,
Ajin Cherian
Fujitsu Australia

Re: Add “FOR UPDATE NOWAIT” lock details to the log.

2025-02-20 Thread Fujii Masao





On 2025/02/20 15:27, Yuki Seino wrote:


On 2025/02/19 1:08, Fujii Masao wrote:

Just an idea, but how about updating ConditionalXactLockTableWait() to do the 
followings?
- Use LockAcquireExtended() instead of LockAcquire() to retrieve the LOCALLOCK 
value.
- Generate a log message about the lock holders, from the retrieved the 
LOCALLOCK value.
- Return the generated log message to the caller (heap_lock_tuple()), allowing 
it to log the message.

Thank you for your suggestion. I have two issues to discuss:


### 1. Issue with LockAcquireExtended()

It appears that in the dontWait case, LockAcquireExtended() is removing the 
local lock (locallock).

```
if (locallock->nLocks == 0)
     RemoveLocalLock(locallock);
```

Instead, the removal of locallock should be handled by 
ConditionalXactLockTableWait().


RemoveLocalLock() doesn't seem to remove the necessary LocalLock information
for generating the lock failure log message. Is that correct?

One idea I considered is using the LocalLock returned by LockAcquireExtended(),
but this might complicate the code more than expected. A simpler approach could
be adding a new argument, like logLockFailure, to LockAcquireExtended(),
allowing it to log the failure message when set to true. This change would
affect LockAcquireExtended() and some ConditionalLockXXX() functions,
but it doesn't seem too invasive.

As for LockAcquire(), I don't think it needs a new argument. If any
ConditionalLockXXX() functions call LockAcquire(), we could modify them to
call LockAcquireExtended() instead, enabling logging when needed.



### 2. Issue with the LOCK TABLE Case

For the LOCK TABLE scenario, RangeVarGetRelidExtended() calls 
ConditionalLockRelationOid(), which seems to require a similar modification.


Yes, if we want to support LOCK TABLE NOWAIT and other NOWAIT cases,
these additional updates would be required. However, I suggest focusing
on row-level locks with SELECT ... NOWAIT first, then expanding to
other cases later.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: pg_recvlogical requires -d but not described on the documentation

On Fri, Feb 21, 2025 at 9:56 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear hackers,
>
> Hi, I found the issue $SUBJECT. According to the doc [1]:
>
> ```
> -d dbname
> --dbname=dbname
> The database to connect to. See the description of the actions for what this 
> means in detail.
> The dbname can be a connection string. If so, connection string parameters 
> will override any
> conflicting command line options. Defaults to the user name.
> ```
>
> IIUC final statement implies --dbname can be omitted. If it is mandatory, it 
> should be described
> in the synopsis part - not written now. However, the command would fail when 
> -d is missed.
>
> ```
> $ pg_recvlogical -U postgres --start -S test -f -
> pg_recvlogical: error: no database specified
> pg_recvlogical: hint: Try "pg_recvlogical --help" for more information.
> ```
>
> ISTM the inconsistency is introduced since the initial commit. I think they 
> should be unified either
> 1) update the doc or 2) accept when -d is not specified. Personally, I like 
> 2nd approach, pg_recvlogical
> can follow the normal connection rule. I.e.,
>

Given that the discrepancy has survived so long, it seems that users
always pass -d. And to some extent, requiring users to specify a
database instead of defaulting to one is safer practice. This avoids
users fetching changes from unexpected database/slot and cause further
database inconsistencies on the receiver side. I would just fix
documentation in this case.

> a) Use PGDATABASE if specified
> b) Check 'U' and PGUSER and use it if specified
> c) Otherwise use the current OS user name

If we want to go this route, it will be good to unify the treatment of
-d option at one place in code and reuse it everywhere. Thanks to your
investigations, we are seeing too many descripancies in -d option
being used in many utilities. Fixing all at once would be good. Also
it will be good to similarly unify documentation by writing -d
documentation at only place and reusing it everywhere. But I don't
know whether our documentation allows modularization and code-reuse.

--
Best Wishes,
Ashutosh Bapat

Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints

2025-02-20 Thread Alvaro Herrera

Hello,

Thanks!

I noticed a typo 'constrint' in several places; fixed in the attached.

I discovered that this sequence, taken from added regression tests,

CREATE TABLE notnull_tbl1 (a int);
ALTER TABLE notnull_tbl1 ADD CONSTRAINT nn_parent not null a not valid;
CREATE TABLE notnull_chld (a int);
ALTER TABLE notnull_chld ADD CONSTRAINT nn_child not null a not valid;
ALTER TABLE notnull_chld INHERIT notnull_tbl1;
ALTER TABLE notnull_tbl1 VALIDATE CONSTRAINT nn_parent;

does mark the constraint as validated in the child, but only in
pg_constraint -- pg_attribute continues to be marked as 'i', so if you
try to use it for a PK, it fails:

alter table notnull_chld add constraint foo primary key (a);
ERROR:  column "a" of table "notnull_chld" is marked as NOT VALID NOT NULL 
constrint

I thought that was going to be a quick fix, so I tried to do so; since
we already have a function 'set_attnotnull', I thought it was the
perfect tool to changing attnotnull.  However, it's not great, because
since that part of the code is already doing the validation, I don't
want it to queue the validation again, so the API needs a tweak; I
changed it to receiving separately which new value to update attnotnull
to, and whether to queue validation.  With that change it works
correctly, but it is a bit ugly at the callers' side.  Maybe it works to
pass two booleans instead?  Please have a look at whether that can be
improved.


I also noticed the addition of function getNNConnameForAttnum(), which
does pretty much the same as findNotNullConstraintAttnum(), only it
ignores all validate constraints instead of ignoring all non-validated
constraints.  So after looking at the callers of the existing function
and wondering which ones of them really wanted only the validated
constraints?  It turns that the answer is none of them.  So I decided to
remove the check for that, and instead we need to add checks to every
caller of both findNotNullConstraintAttnum() and findNotNullConstraint()
so that it acts appropriately when a non-validated constraint is
returned.  I added a few elog(WARNING)s when this happens; running the
tests I notice that none of them fire.  I'm pretty sure this indicates
holes in testing: we have no test cases for these scenarios, and we
should have them for assurance that we're doing the right things.  I
recommend that you go over those WARNINGs, add test cases that make them
fire, and then fix the code so that the test cases do the right thing.
Also, just to be sure, please go over _all_ the callers of
those two functions and make sure all cases are covered by tests that
catch invalid constraints.


I also noticed that in the one place where getNNConnameForAttnum() was
called, we were passing the parent table's column number.  But in child
tables, even in partitions, the column numbers can differ from parent to
children.  So we need to walk down the hierarchy using the column name,
not the column number.  This would have become visible if the test cases
had included inheritance trees with varying column shapes.


The docs continue to say this:
  This form adds a new constraint to a table using the same constraint
  syntax as CREATE 
TABLE, plus the option NOT
  VALID, which is currently only allowed for foreign key
  and CHECK constraints.
which is missing to indicate that NOT VALID is valid for NOT NULL.

Also I think the docs for attnotnull in catalogs.sgml are a bit too
terse; I would write "The value 't' indicates that a not-null constraint
exists for the column; 'i' for an invalid constraint, 'f' for none."
which please feel free to use if you want, but if you want to come up
with your own wording, that's great too.


The InsertOneNull() function used in bootstrap would not test values for
nullness in presence of invalid constraints.  This change is mostly
pro-forma, since we don't expect invalid constraints during bootstrap,
but it seemed better to be tidy.

I have not looked at the pg_dump code yet, so the changes included here
are just pgindent.

Thank you!

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
>From ac8e690c64d354faad421a48f5adf0e26a15202b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= 
Date: Thu, 20 Feb 2025 13:05:35 +0100
Subject: [PATCH] fixup

---
 src/backend/bootstrap/bootstrap.c |   4 +-
 src/backend/catalog/pg_constraint.c   |  17 +-
 src/backend/commands/tablecmds.c  | 229 ++
 src/bin/pg_dump/pg_dump.c |  15 +-
 src/bin/pg_dump/pg_dump.h |   2 +-
 src/test/regress/expected/constraints.out |  10 +-
 src/test/regress/sql/constraints.sql  |   8 +-
 7 files changed, 131 insertions(+), 154 deletions(-)

diff --git a/src/backend/bootstrap/bootstrap.c 
b/src/backend/bootstrap/bootstrap.c
index 1e95dc32f46..919972dc409 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -582,10 +582,8 @@ DefineAttr(char *name, char *

Re: Reset the output buffer after sending from WalSndWriteData

On Thu, Feb 20, 2025 at 12:50 PM Markus Wanner
 wrote:
>
> Hi,
>
> I recently stumbled over an issue with an unintentional re-transmission.
> While this clearly was our fault in the output plugin code, I think the
> walsender's exposed API could easily be hardened to prevent the bad
> consequence from this mistake.
>
> Does anything speak against the attached one line patch?

According to the documentation[1], OutputPluginPrepareWrite() has to
be called before OutputPluginWrite(). When it comes to walsender
codes, we reset the ctx->out buffer in WalSndPrepareWrite() called via
OutputPluginPrepareWrite(). Could you share the case where you faced
the unintentional re-transmission error?

Regards,

[1] 
https://www.postgresql.org/docs/devel/logicaldecoding-output-plugin.html#LOGICALDECODING-OUTPUT-PLUGIN-OUTPUT

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: Add Pipelining support in psql

2025-02-20 Thread Michael Paquier

On Tue, Feb 18, 2025 at 06:34:20PM +0100, Anthonin Bonnefoy wrote:
> On Tue, Feb 18, 2025 at 8:23 AM Michael Paquier  wrote:
>> The tests in psql.sql are becoming really long.  Perhaps it would be
>> better to split that into its own file, say psql_pipeline.sql?  The
>> input file is already 2k lines, you are adding 15% more lines to that.
> 
> Agreed, I wasn't sure if this was enough to warrant a dedicated test
> file. This is now separated in psql_pipeline.sql.

You have forgotten the expected output.  Not a big issue as the input
was sent.

>> What is the reasoning here behind this restriction?  \gx is a wrapper
>> of \g with expanded mode on, but it is also possible to call \g with
>> expanded=on, bypassing this restriction.
> 
> The issue is that \gx enables expanded mode for the duration of the
> query and immediately reset it in sendquery_cleanup. With pipelining,
> the command is piped and displaying is done by either \endpipeline or
> \getresults, so the flag change has no impact. Forbidding it was a way
> to make it clearer that it won't have the expected effect. If we
> wanted a similar feature, this would need to be done with something
> like \endpipelinex or \getresultsx.

Hmm, okay.  If one wants one mode or the other it is always possible
to force one with \pset expanded when getting the results.  Not sure
if there is any need for new specific commands for these two printing
the results.  Another option would be to authorize the command to run,
but perhaps your option is just better as per the enforced behavior in
the output.  So fine by me.  There is coverage so we'll know if there
are arguments in favor of authorizing the command, if need be.

> I've split the patch and created the 3 special variables:
> PIPELINE_SYNC_COUNT, PIPELINE_COMMAND_COUNT, PIPELINE_RESULT_COUNT.

Thanks.  Looks sensible now.

> For requested_results, I don't think there's value in exposing it
> since it is used as an exit condition and thus will always be 0
> outside of ExecQueryAndProcessResults.

I've been playing with this patch and this configuration:
\set PROMPT1 
'=(pipeline=%P,sync=%:PIPELINE_SYNC_COUNT:,cmd=%:PIPELINE_COMMAND_COUNT:,res=%:PIPELINE_RESULT_COUNT:)%#'
 

That's long, but seeing the evolution of the pipeline status is pretty
cool depending on the meta-commands used.

While testing, I have been able to run into an assertion failure by
adding some tests in psql.sql to check for the case of inactive
branches for \if.  For example:
--- a/src/test/regress/sql/psql.sql
+++ b/src/test/regress/sql/psql.sql
@@ -1047,11 +1047,15 @@ select \if false \\ (bogus \else \\ 42 \endif \\ 
forty_two;
\echo arg1 arg2 arg3 arg4 arg5
\echo arg1
\encoding arg1
+   \endpipeline
\errverbose

And the report:
+psql: mainloop.c:513: MainLoop: Assertion `conditional_active(cond_stack)' 
failed.

We should have tests for all new six meta-commands in psql.sql.
MainLoop() is wrong when in pipeline mode for inactive branches.
--
Michael


signature.asc
Description: PGP signature

Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing

2025-02-20 Thread Bertrand Drouvot

Hi,

On Thu, Feb 20, 2025 at 02:37:18PM +0900, Michael Paquier wrote:
> On Wed, Feb 19, 2025 at 09:24:41AM +, Bertrand Drouvot wrote:
> > That's right but that would mean almost duplicating the pg_stat_wal related
> > stuff (because it can't be removed from the doc until the fields are gone). 
> > I 
> > think it's simpler to do it as proposed initially (the end result is the 
> > same).
> 
> After sleeping on it, I still saw no reason to not apply the changes
> from 0002 in wal.sgml to describe that the stats for the writes/fsyncs
> are in pg_stat_io rather than pg_stat_wal for the "WAL configuration"
> section, so done that.

I see, I did not get that you wanted to get rid of the pg_stat_wal part before
removing the fields. Makes sense that way to "just" replace the pg_stat_wal
by pg_stat_io first in the doc as you've done in 4538bd3f1dd.

> and I've moved that as a paragraph not under the tuning part, to make
> it a more general statement with its link to "WAL configuration",
> which is a very popular link for pg_stat_io.

Makes more sense, agree.

> Attached is the rest, as of v3-0002.

LGTM.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Commitfest app release on Feb 17 with many improvements

2025-02-20 Thread Tatsuo Ishii

> At the FOSDEM dev meeting we discussed potential improvements to the
> commitfest app and how to handle deploying future changes with minimal
> disruption to existing workflows. We're going to try a new approach:
> announcing a new commitfest release ~2 weeks in advance, including
> release notes. That way people can try out the changes on the staging
> environment and raise concerns/objections. So hereby such an
> announcement:
> 
> The next commitfest app release will take place on February 17 and
> will contain the following user facing changes:
> 
> 1. Showing CFBot status on patch and commitfest pages
> 2. Showing a link to the CFBot github diff on patch and commitfest pages
> 3. Showing total additions/deletions of the most recent patchset on
> patch and commitfest pages
> 4. Showing additions/deletions of the first patch of the most recent
> patchset on the patch page
> 5. Showing the version and number of patches of the most recent
> patchset on the patch page
> 6. Canonical links for the patch page are now /patch/{patchid} instead
> of /{cfid}/{patchid}
> 7. If you subscribed for emails or patches where you're an author, you
> will now get an email when your patch needs a rebase.
> 8. Clicking headers will toggle sorting, instead of always enabling it
> 9. Allow sorting patches by name
> 10. Allow sorting patches by total additions/deletions (added together
> for total lines changed)
> 11. Add ID column to commitfest page (allows sorting)
> 12. Allow sorting using the header of closed patches too
> 13. Fix dropdown direction for dropdowns at the bottom of the page
> (they now drop up)
> 14. The previous GiHub and CirrusCI links are now integrated into the
> new CFBot status row of the patch page
> 15. The previous GitHub checkout instructions are now integrated into
> the new CFBot status row as a button that copies the commands to the
> clipboard
> 
> You can try out these changes on the staging environment[1] with
> pgtest:pgtest as user & password for the http auth. If you want to
> make edits you need to create a new account, read-only doesn't need an
> account.
> 
> [1]: https://commitfest-test.postgresql.org/

I noticed some of entries are shown with the author field being empty.
e.g. https://commitfest.postgresql.org/patch/5525/

Is this normal?
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: Commitfest app release on Feb 17 with many improvements

On Thu, Feb 20, 2025 at 10:53 PM Tatsuo Ishii  wrote:
> I noticed some of entries are shown with the author field being empty.
> e.g. https://commitfest.postgresql.org/patch/5525/

When the layout of https://commitfest.postgresql.org/52/ changed,
cfbot's web scraping logic could no longer find the authors :-(.
That's a stupid problem to have, that we are going to solve with a new
machine-friendly API, but let me try to fix it quickly...

Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-02-20 Thread Shubham Khanna

On Wed, Feb 19, 2025 at 4:23 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Shubham,
>
> Thanks for updating the patch!
>
> > I have added a test case for non-dry-run mode to ensure that
> > replication slots, subscriptions, and publications work as expected
> > when '--all' is specified. Additionally, I have split the test file
> > into two parts:
> > 1) Command-line sanity checks – Validates the correctness of option parsing.
> > 2) '--all' functionality and behavior – Verifies the health of
> > subscriptions and ensures that --all specific scenarios, such as
> > non-template databases not being subscribed to, are properly handled.
> > This should improve test coverage while keeping the structure manageable.
>
> TBH, I feel your change does not separate the test file into the two parts. 
> ISTM
> you just added some validation checks and a test how --all option works.
> Ashutosh, does it match your suggestion?
>
> Anyway, here are my comments.
>
> 01.
> ```
> +   int num_rows;
> ```
>
> I think num_rows is not needed, we can do like:
>
> ```
> /* Process the query result */
> -   num_rows = PQntuples(res);
> -   for (int i = 0; i < num_rows; i++)
> +   for (int i = 0; i < PQntuples(res); i+
> ```
> And
> ```
> /* Error if no databases were found on the source server */
> -   if (num_rows == 0)
> +   if (num_dbs == 0)
> ```
>

Fixed.

> 02.
> ```
> +   opt.all = false;
> ```
>
> This must be done after initializing recovery_timeout.
>

Fixed.

> 03.
> ```
> +# Set up node S1 as standby linking to node P
> +$node_p->backup('backup_3');
> +my $node_s1 = PostgreSQL::Test::Cluster->new('node_s1');
> ```
>
> I do not like the name "S1" because this is the second standby sever
> from the node_p.
>

Fixed.

> 04.
> ```
> +# Verify that only user databases got subscriptions (not template databases)
> +my @user_dbs = ($db1, $db2);
> +foreach my $dbname (@user_dbs)
> +{
> +   my $sub_exists =
> + $node_s1->safe_psql($dbname, "SELECT count(*) FROM 
> pg_subscription;");
> +   is($sub_exists, '3', "Subscription created successfully for $dbname");
> +}
> ```
>
> Hmm, what do you want to check here? pg_subscription is a global catalog so 
> that
> very loop will see exactly the same content. Also, 'postgres' is also the 
> user database.
> I feel you must ensure that all three databases (postgres, $db1, and $db2) 
> have a
> subscription here.
>

Fixed.

> 05.
> ```
> +# Verify replication is working
> +$node_p->safe_psql($db1, "INSERT INTO tbl1 VALUES ('replication test');");
> +
> +$result = $node_s1->safe_psql($db1, "SELECT * FROM tbl1 ORDER BY 1");
> +is( $result, qq(first row
> +replication test
> +second row), "replication successful in $db1");
> +
> +$node_s1->stop;
>  # Run pg_createsubscriber on node S.  --verbose is used twice
>  # to show more information.
>  command_ok(
> ```
>
> I'm not sure the part is not needed. This have already been checked in other 
> parts, right?
>

Fixed.

The attached patch contains the suggested changes.

Thanks and regards,
Shubham Khanna.


v10-0001-Enhance-pg_createsubscriber-to-fetch-and-append-.patch
Description: Binary data

Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-02-20 Thread Shubham Khanna

On Thu, Feb 20, 2025 at 1:23 PM Peter Smith  wrote:
>
> Hi Shubham, Here are some review comments for v9-0001.
>
> ==
> Commit message
>
> 1.
> You've changed the option to '--all' but the comment message still
> refers to '-all-databases'
>

Fixed.

> ==
> doc/src/sgml/ref/pg_createsubscriber.sgml
>
> 2.
> +  
> +   Automatically fetch all non-template databases from the source server
> +   and create subscriptions for databases with the same names on the
> +   target server.
>
> I don't see what is "automatic" about this, nor do I know really what
> "fetch" of databases is trying to convey.
>
> Isn't it simpler to just say like below?
>
> SUGGESTION
> For all source server non-template databases create subscriptions for...
>

Fixed.

> ==
> src/bin/pg_basebackup/pg_createsubscriber.c
>
> 3.
>   int recovery_timeout; /* stop recovery after this time */
> + bool all; /* --all option was specified */
>  };
>
>
> IMO 'all' is too generic for a field name; it has almost no meaning --
> all what? I think the original 'all_databases' name was better. Or
> 'all_dbs', but not just 'all'.
>
> ~~~

Fixed.

>
> 4.
> + if (opt.all)
> + {
> + if (num_dbs > 0)
> + {
> + pg_log_error("%s cannot be used with %s", "--all", "--database");
> + pg_log_error_hint("Try \"%s --help\" for more information.", progname);
> + exit(1);
> + }
> +
> + if (num_pubs > 0)
> + {
> + pg_log_error("%s cannot be used with %s", "--all", "--publication");
> + pg_log_error_hint("Try \"%s --help\" for more information.", progname);
> + exit(1);
> + }
> +
> + if (num_replslots > 0)
> + {
> + pg_log_error("%s cannot be used with %s", "--all", "--replication-slot");
> + pg_log_error_hint("Try \"%s --help\" for more information.", progname);
> + exit(1);
> + }
> +
> + if (num_subs > 0)
> + {
> + pg_log_error("%s cannot be used with %s", "--all", "--subscription");
> + pg_log_error_hint("Try \"%s --help\" for more information.", progname);
> + exit(1);
> + }
>
> 4a
> If bad combinations exist then IMO it is most likely that the --all
> came before the incompatible option. So the parameters should be the
> other way around so the result is "-- cannot be used with --all".
>
> ~

Fixed.

>
> 4b.
> You could eliminate all this code duplication by using a variable for
> the bad option,
>
> SUGGESTION
> char *bad_switch = NULL;
>
> if (num_dbs > 0)bad_switch = "--database";
> else if (num_pubs > 0)  bad_switch = "--publication";
> else if (num_replslots > 0) bad_switch = "--replication_slot";
> else if (num_subs > 0)  bad_switch = "--subscription";
>
> if (bad_switch)
> {
>   pg_log_error("%s cannot be used with --all", bad_switch);
>   pg_log_error_hint("Try \"%s --help\" for more information.", progname);
>   exit(1);
> }
>
> ~
>

Fixed.

> 4c.
> There is a double-blank line after this code fragment.
>

Fixed.

> ==
> .../t/040_pg_createsubscriber.pl
>
> 5.
> +# run pg_createsubscriber with '--database' and '--all' and verify the
> +# failure
> +command_fails_like(
> + [
> + 'pg_createsubscriber',
> + '--verbose',
> + '--dry-run',
> + '--pgdata' => $node_s->data_dir,
> + '--publisher-server' => $node_p->connstr($db1),
> + '--socketdir' => $node_s->host,
> + '--subscriber-port' => $node_s->port,
> + '--database' => $db1,
> + '--all',
> + ],
> + qr/--all cannot be used with --database/,
> + 'fail if --all is used with --database');
> +
>
> There is no difference anymore in the logic/error if the options are
> given in order "--all --database"  or if they are in order "--database
> --all". So you no longer need to have separate test cases for
> different ordering.
>
> ~~~

Fixed.

>
> 6.
> +# Verify that only user databases got subscriptions (not template databases)
> +my @user_dbs = ($db1, $db2);
> +foreach my $dbname (@user_dbs)
> +{
> + my $sub_exists =
> +   $node_s1->safe_psql($dbname, "SELECT count(*) FROM pg_subscription;");
> + is($sub_exists, '3', "Subscription created successfully for $dbname");
> +}
>
> AFAICT this is just checking a global subscription count in a loop so
> it will be 3, 3, 3. Why?
>
> I guess you intended to verify that relevant subscriptions were
> created for each of the target databases. But this code is not doing
> that.
>
> ~~~

Fixed.

>
> 7.
> +# Verify replication is working
> +$node_p->safe_psql($db1, "INSERT INTO tbl1 VALUES ('replication test');");
> +
> +$result = $node_s1->safe_psql($db1, "SELECT * FROM tbl1 ORDER BY 1");
> +is( $result, qq(first row
> +replication test
> +second row), "replication successful in $db1");
> +
> +$node_s1->stop;
>  # Run pg_createsubscriber on node S.  --verbose is used twice
>  # to show more information.
>  command_ok(
> @@ -431,8 +590,9 @@ $result = $node_s->safe_psql($db1,
>  is($result, qq(0), 'failover slot was removed');
>
>  # Check result in database $db1
> -$result = $node_s->safe_psql($db1, 'SELECT * FROM tbl1');
> +$result = $node_s->safe_psql($db1, 'SELECT * FROM tbl1 ORDER BY 1');
>  is( $result, qq(first row
> +replicat

Re: Statistics Import and Export

2025-02-20 Thread vignesh C

On Thu, 20 Feb 2025 at 15:09, Jeff Davis  wrote:
>
> On Wed, 2025-02-12 at 19:00 -0800, Jeff Davis wrote:
> > I'm still reviewing v48, but I intend to commit something soon.
>
> Committed with some revisions on top of v48:

I was checking buildfarm for another commit of mine, while checking I
noticed there is a failure in crake at [1].  I felt it might be
related to this commit as it had passed with earlier runs:
--- /home/andrew/bf/root/upgrade.crake/HEAD/origin-REL9_2_STABLE.sql.fixed
2025-02-20 04:43:40.461092087 -0500
+++ 
/home/andrew/bf/root/upgrade.crake/HEAD/converted-REL9_2_STABLE-to-HEAD.sql.fixed
2025-02-20 04:43:40.463092092 -0500
@@ -184,21 +184,87 @@
 --
 SELECT * FROM pg_catalog.pg_restore_relation_stats(
  'relation', '"MySchema"."Foo"'::regclass,
- 'version', '90224'::integer,
- 'relpages', '0'::integer,
- 'reltuples', '0'::real,
+ 'version', '00'::integer,
+ 'relpages', '1'::integer,
+ 'reltuples', '1'::real,
  'relallvisible', '0'::integer

[1] - 
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=crake&dt=2025-02-20%2009%3A32%3A03&stg=xversion-upgrade-REL9_2_STABLE-HEAD

Regards,
Vignesh

Re: ReplicationSlotRelease() crashes when the instance is in the single user mode

2025-02-20 Thread Bertrand Drouvot

Hi,

On Thu, Feb 20, 2025 at 02:22:41AM +, Hayato Kuroda (Fujitsu) wrote:
> Dear Michael,
> 

Thanks for the report and the patch!

> > I did not check how these call behave individually, just a few
> > comments while putting my eyes on the patch.
> > 
> > +   if (!IsUnderPostmaster)
> > +   elog(ERROR,
> > +"slot operation is prohibited in the single user 
> > mode");
> > 
> > elog() should not be used for failures that can be user-facing as this
> > would not provide any translation.
> 
> I intentionally used elog() because I thought single user mode is not 
> user-facing.
> But it is OK for me to use ereport() instead.

Yeah, I think it's also about using ereport for cases that we think can happen (
and so needs to be translated). In this case, it clearly can happen so ereport()
is better.

> > I'd suggest rewording the error message to provide some more context,
> > as well, say:
> > "cannot use %s in single-user mode", "function_name"
> 
> Fixed. PSA new version

=== 1

Nit:

"cannot use function %s in single-user mode", "function_name"" maybe? (that 
would
somehow match the existing ""user-defined types cannot use subscripting 
function %s")

=== 2

+ "pg_copy_replication_slot")));

s/pg_copy_replication_slot/copy_replication_slot/ maybe? As it is the function
that would report the error actually (could be called from
pg_copy_logical_replication_slot_[a|b|c] and 
pg_copy_physical_replication_slot_[a|b]
though).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Commitfest app release on Feb 17 with many improvements

2025-02-20 Thread Tatsuo Ishii

>> On Thu, Feb 20, 2025 at 10:53 PM Tatsuo Ishii  wrote:
>> > I noticed some of entries are shown with the author field being empty.
>> > e.g. https://commitfest.postgresql.org/patch/5525/
>>
>> When the layout of https://commitfest.postgresql.org/52/ changed,
>> cfbot's web scraping logic could no longer find the authors :-(.
>> That's a stupid problem to have, that we are going to solve with a new
>> machine-friendly API, but let me try to fix it quickly...
> 
> I believe Tatsuo was talking about the author field on the commitfest
> app, not the cfbot.

Yes, I was talking abot the commitfest app.

> Assuming I'm correct there, I think for the link you shared simply no
> Author was added when the user created the entry. In the next
> commitfest release (March 4th), that should not happen anymor (or at
> least not by accident) because then it will start to autofill yourself
> as author when you submit a patch.

Oh, I thought the field was automatically created but looks like it
was wrong assumption. Thanks for the explanation.

Best reagards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

RE: create subscription with (origin = none, copy_data = on)

2025-02-20 Thread Zhijie Hou (Fujitsu)

On Wednesday, January 29, 2025 8:19 PM Shlok Kyal  
wrote:
> 
> I have addressed the comments. Here is an updated patch.

Thanks for updating the patch. The patches look mostly OK to me, I only have
one minor comments in 0002.

1.

+CREATE PUBLICATION pub_b_c_2 FOR TABLE tab_part2_1;
+));
+
+($result, $stdout, $stderr) = $node_C->psql(
+   'postgres', "
+   CREATE SUBSCRIPTION sub_b_c CONNECTION '$node_B_connstr' PUBLICATION 
pub_b_c WITH (origin = none, copy_data = on);
+");

The naming style of new publications and subscriptions doesn't seem consistent
with existing ones in 030_origin.

Best Regards,
Hou zj

Assert when executing query on partitioned table

2025-02-20 Thread Dmitry Koval


Hi!

I got an Assert when executing an "INSERT ... ON CONFLICT ... UPDATE 
..." query on partitioned table. Managed to reproduce this situation.


Reproduction order.
---

1) Apply the patch 
[v1-0001-Triggering-Assert-on-query-with-ON-CONFLICT.patch] to "master" 
branch.


2) Build postgres with key "--enable-injection-points" + build an 
extension "injection_points" (src/test/modules/injection_points).


3) Run isolation test onconflict.spec:
make check -C src/test/modules/injection_points

Assert is triggered in postgres with stack, see attached file [stack.txt].

Clarification.
--
In the query "INSERT ... ON CONFLICT ... UPDATE ..." when executing 
INSERT, a conflict is triggered. But when trying to execute UPDATE, our 
tuple has already been moved to another partition and Assert is triggered.


Fixing.
---
I suggest replace Assert with an error message, see 
[v1-0001-draft-of-fix.patch]. This is not a final fix as I am confused 
by the comment for Assert: "we don't support an UPDATE of INSERT ON 
CONFLICT for a partitioned table".

(Why "don't support an UPDATE"?
 It's not forbidden by syntax or errors ...)

--
With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.com
#0  0x72313029eb1c in pthread_kill () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x72313024526e in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x7231302288ff in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x5e3fccec9ff8 in ExceptionalCondition (conditionName=0x5e3fcd0b49c0 
"!ItemPointerIndicatesMovedPartitions(&tmfd.ctid)", fileName=0x5e3fcd0b3fe0 
"nodeModifyTable.c", lineNumber=2797) at assert.c:66
#4  0x5e3fcca2264f in ExecOnConflictUpdate (context=0x7ffd08eee220, 
resultRelInfo=0x5e3fce06a310, conflictTid=0x7ffd08eee0cc, 
excludedSlot=0x5e3fce069640, canSetTag=true, returning=0x7ffd08eee0d8) at 
nodeModifyTable.c:2797
#5  0x5e3fcca1fd6f in ExecInsert (context=0x7ffd08eee220, 
resultRelInfo=0x5e3fce06a310, slot=0x5e3fce069640, canSetTag=true, 
inserted_tuple=0x0, insert_destrel=0x0) at nodeModifyTable.c:1133
#6  0x5e3fcca250e3 in ExecModifyTable (pstate=0x5e3fce068cf0) at 
nodeModifyTable.c:4324
#7  0x5e3fcc9e0454 in ExecProcNodeFirst (node=0x5e3fce068cf0) at 
execProcnode.c:469
#8  0x5e3fcc9d2f5b in ExecProcNode (node=0x5e3fce068cf0) at 
../../../src/include/executor/executor.h:272
#9  0x5e3fcc9d5de8 in ExecutePlan (queryDesc=0x5e3fce10edf0, 
operation=CMD_INSERT, sendTuples=false, numberTuples=0, 
direction=ForwardScanDirection, dest=0x5e3fce1149e8) at execMain.c:1675
#10 0x5e3fcc9d3618 in standard_ExecutorRun (queryDesc=0x5e3fce10edf0, 
direction=ForwardScanDirection, count=0) at execMain.c:364
#11 0x5e3fcc9d3479 in ExecutorRun (queryDesc=0x5e3fce10edf0, 
direction=ForwardScanDirection, count=0) at execMain.c:301
#12 0x5e3f0ab0 in ProcessQuery (plan=0x5e3fce042778, 
sourceText=0x5e3fce040ef0 "INSERT INTO t_int VALUES (1, 11, 111) ON CONFLICT 
(i) DO UPDATE SET x = excluded.x;", params=0x0, queryEnv=0x0, 
dest=0x5e3fce1149e8, qc=0x7ffd08eee670) at pquery.c:160
#13 0x5e3f25e0 in PortalRunMulti (portal=0x5e3fce0bf8a0, 
isTopLevel=true, setHoldSnapshot=false, dest=0x5e3fce1149e8, 
altdest=0x5e3fce1149e8, qc=0x7ffd08eee670) at pquery.c:1271
#14 0x5e3f1aec in PortalRun (portal=0x5e3fce0bf8a0, 
count=9223372036854775807, isTopLevel=true, dest=0x5e3fce1149e8, 
altdest=0x5e3fce1149e8, qc=0x7ffd08eee670) at pquery.c:787
#15 0x5e3fcccba5bf in exec_simple_query (query_string=0x5e3fce040ef0 
"INSERT INTO t_int VALUES (1, 11, 111) ON CONFLICT (i) DO UPDATE SET x = 
excluded.x;") at postgres.c:1271
#16 0x5e3fcccbf954 in PostgresMain (dbname=0x5e3fce078fa0 
"isolation_regression", username=0x5e3fce078f88 "kovdb") at postgres.c:4691
#17 0x5e3fcccb5fdf in BackendMain (startup_data=0x7ffd08eee8f0 "", 
startup_data_len=4) at backend_startup.c:107
#18 0x5e3fccbc24d7 in postmaster_child_launch (child_type=B_BACKEND, 
child_slot=2, startup_data=0x7ffd08eee8f0 "", startup_data_len=4, 
client_sock=0x7ffd08eee950) at launch_backend.c:274
#19 0x5e3fccbc8ec3 in BackendStartup (client_sock=0x7ffd08eee950) at 
postmaster.c:3519
#20 0x5e3fccbc646c in ServerLoop () at postmaster.c:1688
#21 0x5e3fccbc5d64 in PostmasterMain (argc=8, argv=0x5e3fcdffe970) at 
postmaster.c:1386
#22 0x5e3fcca67011 in main (argc=8, argv=0x5e3fcdffe970) at main.c:230
From 02b89799a2f139a3cab41b1df761103cf8627098 Mon Sep 17 00:00:00 2001
From: Koval Dmitry 
Date: Wed, 19 Feb 2025 13:09:33 +0300
Subject: [PATCH v1] Triggering Assert on query with ON CONFLICT

---
 src/backend/executor/nodeModifyTable.c|  2 ++
 src/test/modules/injection_points/Makefile| 11 +++
 .../injection_points/specs/onconflict.spec| 30 +++
 3 files changed, 38 insertions(+), 5 deletions(-)
 create mode 100644 src/test/modules/injection_points/specs/onconflict.spec

diff --git a/src/backend/executor/nodeModifyTable.c

Re: Commitfest app release on Feb 17 with many improvements

On Thu, Feb 20, 2025 at 11:53 PM Jelte Fennema-Nio  wrote:
> On Thu, 20 Feb 2025 at 11:07, Thomas Munro  wrote:
> > When the layout of https://commitfest.postgresql.org/52/ changed,
> > cfbot's web scraping logic could no longer find the authors :-(.
> > That's a stupid problem to have, that we are going to solve with a new
> > machine-friendly API, but let me try to fix it quickly...
>
> I believe Tatsuo was talking about the author field on the commitfest
> app, not the cfbot.

Oh.  Well the author was also missing from the commit messages, and
that's now fixed.

Re: Statistics Import and Export

2025-02-20 Thread Jeff Davis

On Wed, 2025-02-12 at 19:00 -0800, Jeff Davis wrote:
> I'm still reviewing v48, but I intend to commit something soon.

Committed with some revisions on top of v48:

* removed the short option -X, leaving the long option "--statistics-
only" with the same meaning.

* removed the redundant --with-statistics option for pg_upgrade,
because that's the default anyway.

* removed an unnecessary enum TocEntryType and cleaned up the API to
just pass the desired prefix directly to _printTocEntry().

* stabilized the 002_pg_upgrade test by turning off autovacuum before
the first pg_dumpall (we still want it to run before that to collect
stats).

* stabilized the 027_stream_regress recovery test by specifying --no-
statistics when comparing the data on primary and standby

* fixed the cross-version upgrade tests by using the
adjust_old_dumpfile to replace the version specifier with 00 in the
argument list to pg_restore_* functions.

Regards,
Jeff Davis

RE: Proposal: Filter irrelevant change before reassemble transactions during logical decoding

Dear Ajin,

Here are my comments. I must play with patches to understand more.

01.
```
extern bool ReorderBufferFilterByRelFileLocator(ReorderBuffer *rb, 
TransactionId xid,

XLogRecPtr lsn, RelFileLocator *rlocator,

ReorderBufferChangeType change_type,

bool has_tuple);
```

Can you explain why "has_tuple is needed? All callers is set to true.

02.
```
static Relation
ReorderBufferGetRelation(ReorderBuffer *rb, RelFileLocator *rlocator,
 bool has_tuple)
```

Hmm, the naming is bit confusing for me. This operation is mostly not related 
with
the reorder buffer. How about "GetPossibleDecodableRelation" or something?

03.
```
if (IsToastRelation(relation))
{
Oid real_reloid = InvalidOid;
char   *toast_name = RelationGetRelationName(relation);
/* pg_toast_ len is 9 */
char   *start_ch = &toast_name[9];

real_reloid = pg_strtoint32(start_ch);
entry->relid = real_reloid;
}
```

It is bit hacky for me. How about using sscanf like attached?

04.

IIUC, toast tables always require the filter_change() call twice, is it right?
I understood like below:

1. ReorderBufferFilterByRelFileLocator() tries to filter the change at outside 
the
   transaction. The OID indicates the pg_toast_xxx table.
2. pgoutput_filter_change() cannot find the table from the hash. It returns 
false
   with cache_valid=false.
3. ReorderBufferFilterByRelFileLocator() starts a transaction and get its 
relation.
4. The function recognizes the relation seems toast and get parent oid.
5. The function tries to filter the change in the transaction, with the parent 
oid.
6. pgoutput_filter_change()->get_rel_sync_entry() enters the parent relation to 
the
   hash and return determine the filtable or not.
7. After sometime, the same table is modified. But the toast table is not 
stored in
   the hash so that whole 1-6 steps are required.

I feel this may affect the perfomance when many toast is modified. How about 
skiping
the check for toasted ones? ISTM IsToastNamespace() can be used for the 
decision.

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Re: Commitfest app release on Feb 17 with many improvements

2025-02-20 Thread Jelte Fennema-Nio

On Thu, 20 Feb 2025 at 11:07, Thomas Munro  wrote:
>
> On Thu, Feb 20, 2025 at 10:53 PM Tatsuo Ishii  wrote:
> > I noticed some of entries are shown with the author field being empty.
> > e.g. https://commitfest.postgresql.org/patch/5525/
>
> When the layout of https://commitfest.postgresql.org/52/ changed,
> cfbot's web scraping logic could no longer find the authors :-(.
> That's a stupid problem to have, that we are going to solve with a new
> machine-friendly API, but let me try to fix it quickly...

I believe Tatsuo was talking about the author field on the commitfest
app, not the cfbot.

Assuming I'm correct there, I think for the link you shared simply no
Author was added when the user created the entry. In the next
commitfest release (March 4th), that should not happen anymor (or at
least not by accident) because then it will start to autofill yourself
as author when you submit a patch.

RE: ReplicationSlotRelease() crashes when the instance is in the single user mode

2025-02-20 Thread Zhijie Hou (Fujitsu)

On Thursday, February 20, 2025 10:23 AM Hayato Kuroda (Fujitsu) 
 wrote:
> 
> Dear Michael,
> 
> > I did not check how these call behave individually, just a few
> > comments while putting my eyes on the patch.
> >
> > +   if (!IsUnderPostmaster)
> > +   elog(ERROR,
> > +"slot operation is prohibited in the single user
> mode");
> >
> > elog() should not be used for failures that can be user-facing as this
> > would not provide any translation.
> 
> I intentionally used elog() because I thought single user mode is not
> user-facing.
> But it is OK for me to use ereport() instead.
> 
> > I'd suggest rewording the error message to provide some more context,
> > as well, say:
> > "cannot use %s in single-user mode", "function_name"
> 
> Fixed. PSA new version

I'm curious about the scope of the restrictions we plan to add. For example,
the current patch does not include checks in the functions used for consuming
changes (such as pg_logical_slot_get_changes). Was this omission intentional?

Best Regards,
Hou zj

Re: Non-text mode for pg_dumpall

2025-02-20 Thread jian he

hi.
about 0001

/*
 * connectDatabase
 *
 * Make a database connection with the given parameters.  An
 * interactive password prompt is automatically issued if required.
 *
 * If fail_on_error is false, we return NULL without printing any message
 * on failure, but preserve any prompted password for the next try.
 *
 * On success, the global variable 'connstr' is set to a connection string
 * containing the options used.
 */
PGconn *
connectDatabase(const char *dbname, const char *connection_string,
const char *pghost, const char *pgport, const char *pguser,
trivalue prompt_password, bool fail_on_error, const
char *progname,
const char **connstr, int *server_version)
do the comments need to change? since no
global variable 'connstr' in common_dumpall_restore.c
maybe we need some words to explain server_version, (i don't have a
huge opinion though).


/*-
 *
 * common_dumpall_restore.c
 *
 * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * This is a common file for pg_dumpall and pg_restore.
 * src/bin/pg_dump/common_dumpall_restore.c
 *
 *-
 */

may change to

/*-
 *
 * common_dumpall_restore.c
 * This is a common file for pg_dumpall and pg_restore.
 *
 * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * IDENTIFICATION
 *src/bin/pg_dump/common_dumpall_restore.c
 *
 *-
 */
so the style aligns with most other files.
(we can apply the same logic to src/bin/pg_dump/common_dumpall_restore.h)


in src/bin/pg_dump/pg_dumpall.c
#include "common_dumpall_restore.h"
imply include "pg_backup.h".
so in src/bin/pg_dump/pg_dumpall.c, we don't need include "pg_backup.h"


attached are minor cosmetic changes for v19.


v19_minor_change.no-cfbot
Description: Binary data

Re: Add Pipelining support in psql

2025-02-20 Thread Anthonin Bonnefoy

On Thu, Feb 20, 2025 at 9:02 AM Michael Paquier  wrote:
> You have forgotten the expected output.  Not a big issue as the input
> was sent.

I was writing the mail with the missing file when you sent this mail.
This is fixed.

> While testing, I have been able to run into an assertion failure by
> adding some tests in psql.sql to check for the case of inactive
> branches for \if.  For example:
> --- a/src/test/regress/sql/psql.sql
> +++ b/src/test/regress/sql/psql.sql
> @@ -1047,11 +1047,15 @@ select \if false \\ (bogus \else \\ 42 \endif \\ 
> forty_two;
> \echo arg1 arg2 arg3 arg4 arg5
> \echo arg1
> \encoding arg1
> +   \endpipeline
> \errverbose
>
> And the report:
> +psql: mainloop.c:513: MainLoop: Assertion `conditional_active(cond_stack)' 
> failed.
>
> We should have tests for all new six meta-commands in psql.sql.
> MainLoop() is wrong when in pipeline mode for inactive branches.

Ha yeah, I forgot about the inactive branches. I've added the new
commands and fixed the behaviour.

A small issue I've noticed while testing: When a pipeline has at least
one queue command, pqClearConnErrorState isn't called in
PQsendQueryStart and errors are appended. For example:

\startpipeline
select 1 \bind \g
select 1;
PQsendQuery not allowed in pipeline mode
select 1;
PQsendQuery not allowed in pipeline mode
PQsendQuery not allowed in pipeline mode

This looks more like an issue on libpq's side as there's no way to
reset or advance the errorReported from ExecQueryAndProcessResults
(plus PQerrorMessage seems to ignore errorReported). I've added an
additional test to track this behaviour for now as this would probably
be better discussed in a dedicated thread.

v09-0002-Add-prompt-interpolation-and-variables-for-psql-.patch
Description: Binary data

v09-0001-Add-pipelining-support-in-psql.patch
Description: Binary data

Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning

On Tue, Feb 4, 2025 at 4:07 PM Ashutosh Bapat
 wrote:
>
> If we are not interested in saving memory, there is a simpler way to
> improve planning time by adding a hash table per equivalence class to
> store the derived clauses, instead of a linked list, when the number
> of derived clauses is higher than a threshold (say 32 same as the
> threshold for join_rel_list. Maybe that approach will yield stable
> planning time.
>

PFA patchset with conflict, with latest HEAD, resolved. I have dropped
0005 from the previous patchset since it didn't show any significant
performance difference. Other than these two things, the patchset is
same as the previous one.


-- 
Best Wishes,
Ashutosh Bapat
From 8704d0ca210e763547d4eea61b00d5a13cac8a9e Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Thu, 26 Sep 2024 12:06:25 +0530
Subject: [PATCH 1/6] RestrictInfo hash table interface

Introduces the RestrictInfo hash table interface and adds a pointer to
RestrictInfo hash table in PlannerInfo. RestrictInfo::rinfo_serial and
RestrictInfo::required_relids are used as key into the hash table.

This commit does not add any code which uses the interface itself.
Adding a new member to PlannerInfo increases it size and may have slight
impact due to cacheline faults when accessing this huge structure.
Similarly the interface code increases object file size which again may
have some impact on performance. This is a separate commit to assess any
performance or memory impact this change has.

Ashutosh Bapat
---
 src/backend/optimizer/util/restrictinfo.c | 143 +-
 src/include/nodes/pathnodes.h |   3 +
 src/include/optimizer/restrictinfo.h  |   3 +
 src/tools/pgindent/typedefs.list  |   2 +
 4 files changed, 150 insertions(+), 1 deletion(-)

diff --git a/src/backend/optimizer/util/restrictinfo.c b/src/backend/optimizer/util/restrictinfo.c
index a80083d2323..9279062a877 100644
--- a/src/backend/optimizer/util/restrictinfo.c
+++ b/src/backend/optimizer/util/restrictinfo.c
@@ -19,7 +19,7 @@
 #include "optimizer/clauses.h"
 #include "optimizer/optimizer.h"
 #include "optimizer/restrictinfo.h"
-
+#include "common/hashfn.h"
 
 static Expr *make_sub_restrictinfos(PlannerInfo *root,
 	Expr *clause,
@@ -676,3 +676,144 @@ join_clause_is_movable_into(RestrictInfo *rinfo,
 
 	return true;
 }
+
+/* 
+ * RestrictInfo hash table interface.
+ *
+ * For storing and retrieving child RestrictInfos. Though the interface is used
+ * for child RestrictInfos, it can be used for parent RestrictInfos as well.
+ * Hence the names of functions and structures do not mention child in it unless
+ * necessary.
+ *
+ * RestrictInfo::rinfo_serial is unique for a set of RestrictInfos, all of
+ * which, are versions of same condition.  RestrictInfo::required_relids
+ * identifies the relations to which the RestrictInfo is applicable.  Together
+ * they are used as a key into the hash table since they uniquely identify a
+ * RestictInfo.
+ * 
+ */
+
+/*
+ * Hash key for RestrictInfo hash table.
+ */
+typedef struct
+{
+	int			rinfo_serial;	/* Same as RestrictInfo::rinfo_serial */
+	Relids		required_relids;	/* Same as RestrictInfo::required_relids */
+} rinfo_tab_key;
+
+/* Hash table entry for RestrictInfo hash table. */
+typedef struct rinfo_tab_entry
+{
+	rinfo_tab_key key;			/* Key must be first. */
+	RestrictInfo *rinfo;
+} rinfo_tab_entry;
+
+/*
+ * rinfo_tab_key_hash
+ *		Computes hash of RestrictInfo hash table key.
+ */
+static uint32
+rinfo_tab_key_hash(const void *key, Size size)
+{
+	rinfo_tab_key *rtabkey = (rinfo_tab_key *) key;
+	uint32		result;
+
+	Assert(sizeof(rinfo_tab_key) == size);
+
+	/* Combine hashes of all components of the key. */
+	result = hash_bytes_uint32(rtabkey->rinfo_serial);
+	result = hash_combine(result, bms_hash_value(rtabkey->required_relids));
+
+	return result;
+}
+
+/*
+ * rinfo_tab_key_match
+ *		Match function for RestrictInfo hash table.
+ */
+static int
+rinfo_tab_key_match(const void *key1, const void *key2, Size size)
+{
+	rinfo_tab_key *rtabkey1 = (rinfo_tab_key *) key1;
+	rinfo_tab_key *rtabkey2 = (rinfo_tab_key *) key2;
+	int			result;
+
+	Assert(sizeof(rinfo_tab_key) == size);
+
+	result = rtabkey1->rinfo_serial - rtabkey2->rinfo_serial;
+	if (result)
+		return result;
+
+	return !bms_equal(rtabkey1->required_relids, rtabkey2->required_relids);
+}
+
+/*
+ * get_child_rinfo_hash
+ *		Returns the child RestrictInfo hash table from PlannerInfo, creating it if
+ *		necessary.
+ */
+static HTAB *
+get_child_rinfo_hash(PlannerInfo *root)
+{
+	if (!root->child_rinfo_hash)
+	{
+		HASHCTL		hash_ctl = {0};
+
+		hash_ctl.keysize = sizeof(rinfo_tab_key);
+		hash_ctl.entrysize = sizeof(rinfo_tab_entry);
+		hash_ctl.hcxt = root->planner_cxt;
+		hash_ctl.hash = rinfo_tab_key_hash;
+		hash_ctl.match = rinfo_tab_key_match;
+
+		root->ch

Re: pg_upgrade: Support for upgrading to checksums enabled

2025-02-20 Thread Robert Treat

On Tue, Aug 27, 2024 at 5:57 PM Nathan Bossart  wrote:
>
> On Mon, Aug 26, 2024 at 08:23:44AM +0200, Peter Eisentraut wrote:
> > The purpose of this patch is to allow using pg_upgrade between clusters that
> > have different checksum settings.  When upgrading between instances with
> > different checksum settings, the --copy (default) mode automatically sets
> > (or unsets) the checksum on the fly.
> >
> > This would be particularly useful if we switched to enabling checksums by
> > default, as [0] proposes, but it's also useful without that.
>
> Given enabling checksums can be rather expensive, I think it makes sense to
> add a way to do it during pg_upgrade versus asking folks to run
> pg_checksums separately.  I'd anticipate arguments against enabling
> checksums automatically, but as you noted, we can move it to a separate
> option (e.g., --copy --enable-checksums).  Disabling checksums with
> pg_checksums is fast because it just updates pg_control, so I don't see any
> need for --disable-checkums in pg_upgrade.
>

The repercussions of either enabling or disabling checksums on
accident is quite high (not for pg_upgrade, but for $futureDBA), so
ISTM an explicit flag for BOTH is the right way to go. In that
scenario, pg_upgrade would check to make sure the clusters match and
then make the appropriate suggestion. In the case someone did
something like --enable-checksums and --link, again, we'd toss an
error that --copy mode is required to --enable-checksums.

Robert Treat
https://xzilla.net

Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-02-20 Thread Peter Smith

On Thu, Feb 20, 2025 at 10:26 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Shubham,
>
> Thanks for updating the patch quickly!
>
> > > 04.
> > > ```
> > > +# Verify that only user databases got subscriptions (not template 
> > > databases)
> > > +my @user_dbs = ($db1, $db2);
> > > +foreach my $dbname (@user_dbs)
> > > +{
> > > +   my $sub_exists =
> > > + $node_s1->safe_psql($dbname, "SELECT count(*) FROM
> > pg_subscription;");
> > > +   is($sub_exists, '3', "Subscription created successfully for 
> > > $dbname");
> > > +}
> > > ```
> > >
> > > Hmm, what do you want to check here? pg_subscription is a global catalog 
> > > so
> > that
> > > very loop will see exactly the same content. Also, 'postgres' is also the 
> > > user
> > database.
> > > I feel you must ensure that all three databases (postgres, $db1, and 
> > > $db2) have
> > a
> > > subscription here.
> > >
> >
> > Fixed.
>
> My point was that the loop does not have meaning because pg_subscription
> is a global one. I and Peter considered changes like [1] is needed. It ensures
> that each databases have a subscription.
> Note: [1] cannot pass the test as-is because $db1 and $db2 contains special
> characters. Please escape appropriately.

Yes. Some test is still needed to confirm the expected subscriptions
all get created for respective dbs. But, the current test loop just
isn't doing it properly.

>
> Other comments are listed in below.
>
> 01.
> ```
> + -a
> + --all-dbs
> ```
>
> Peter's comment [2] does not say that option name should be changed.
> The scope of his comment is only in the .c file.

Yes, that's correct. My v9 suggestion to change 'all' to 'all_dbs' was
referring only to the field name of struct CreateSubscriberOptions,
nothing else. Not the usage help, not the error messages, not the
docs, not the tests, not the commit message.

==
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: Add Pipelining support in psql

2025-02-20 Thread Michael Paquier

On Thu, Feb 20, 2025 at 10:29:33AM +0100, Anthonin Bonnefoy wrote:
> Ha yeah, I forgot about the inactive branches. I've added the new
> commands and fixed the behaviour.

And I did not notice that it was as simple as forcing the status in
the routines for the new meta-commands, as we do for the existing
ones.  Noted.

> This looks more like an issue on libpq's side as there's no way to
> reset or advance the errorReported from ExecQueryAndProcessResults
> (plus PQerrorMessage seems to ignore errorReported). I've added an
> additional test to track this behaviour for now as this would probably
> be better discussed in a dedicated thread.

I am not sure if we should change that, actually, as it does not feel
completely wrong to stack these errors.  That's a bit confusing,
sure.  Perhaps a new libpq API to retrieve stacked errors when we are
in pipeline mode would be more adapted?  The design would be
different.

Anyway, I've stared at the result processing code for a couple of
hours, and the branches we're taking for the pipeline modes seem to be
rather right the way you have implemented them.  The docs, comments
and tests needed quite a few tweaks and edits to be more consistent.
There were some grammar mistakes, some frenchisms.

I'm hoping that there won't be any issues, but let's be honest, I am
definitely sure there will be some more tuning required.  It comes
down to if we want this set of features, and I do to be able to expand
tests in core with the extended query protocol and pipelines, knowing
that there is an ask for out-of-core projects.  This one being
reachable with a COPY gave me a huge smile:
+message type 0x5a arrived from server while idle

So let's take one step here, I have applied the main patch.  I am
really excited by the possibilities all this stuff offers.

Attached are the remaining pieces, split here because they are
different bullet points:
- Tests for implicit transactions with various commands, with some
edits. 
- Prompt support, with more edits.

I'm putting these on standby for a few days, to let the buildfarm
digest the main change.
--
Michael
From 035f19d120b181094b00dee21acb0b4ce4cfc1e1 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 21 Feb 2025 11:27:36 +0900
Subject: [PATCH v10 1/2] Add tests with implicit transaction blocks and
 pipelines

This checks interactions with $subject for the following commands that
have dedicated behaviors in transaction blocks:
- SET LOCAL
- REINDEX CONCURRENTLY
- VACUUM
- Subtransactions
---
 src/test/regress/expected/psql_pipeline.out | 112 
 src/test/regress/sql/psql_pipeline.sql  |  72 +
 2 files changed, 184 insertions(+)

diff --git src/test/regress/expected/psql_pipeline.out src/test/regress/expected/psql_pipeline.out
index bdf5a99d09b0..f4603d2b66ae 100644
--- src/test/regress/expected/psql_pipeline.out
+++ src/test/regress/expected/psql_pipeline.out
@@ -585,5 +585,117 @@ PQsendQuery not allowed in pipeline mode
 1
 (1 row)
 
+--
+-- Pipelines and transaction blocks
+--
+-- SET LOCAL will issue a warning when modifying a GUC outside of a
+-- transaction block.  The change will still be valid as a pipeline
+-- runs within an implicit transaction block.  Sending a sync will
+-- commit the implicit transaction block. The first command after a
+-- sync will not be seen as belonging to a pipeline.
+\startpipeline
+SET LOCAL statement_timeout='1h' \bind \g
+SHOW statement_timeout \bind \g
+\syncpipeline
+SHOW statement_timeout \bind \g
+SET LOCAL statement_timeout='2h' \bind \g
+SHOW statement_timeout \bind \g
+\endpipeline
+WARNING:  SET LOCAL can only be used in transaction blocks
+ statement_timeout 
+---
+ 1h
+(1 row)
+
+ statement_timeout 
+---
+ 0
+(1 row)
+
+ statement_timeout 
+---
+ 2h
+(1 row)
+
+-- REINDEX CONCURRENTLY fails if not the first command in a pipeline.
+\startpipeline
+SELECT $1 \bind 1 \g
+REINDEX TABLE CONCURRENTLY psql_pipeline \bind \g
+SELECT $1 \bind 2 \g
+\endpipeline
+ ?column? 
+--
+ 1
+(1 row)
+
+ERROR:  REINDEX CONCURRENTLY cannot run inside a transaction block
+-- REINDEX CONCURRENTLY works if it is the first command in a pipeline.
+\startpipeline
+REINDEX TABLE CONCURRENTLY psql_pipeline \bind \g
+SELECT $1 \bind 2 \g
+\endpipeline
+ ?column? 
+--
+ 2
+(1 row)
+
+-- Subtransactions are not allowed in a pipeline.
+\startpipeline
+SAVEPOINT a \bind \g
+SELECT $1 \bind 1 \g
+ROLLBACK TO SAVEPOINT a \bind \g
+SELECT $1 \bind 2 \g
+\endpipeline
+ERROR:  SAVEPOINT can only be used in transaction blocks
+-- LOCK fails as the first command in a pipeline, as not seen in an
+-- implicit transaction block.
+\startpipeline
+LOCK psql_pipeline \bind \g
+SELECT $1 \bind 2 \g
+\endpipeline
+ERROR:  LOCK TABLE can only be used in transaction blocks
+-- LOCK succeeds as it is not the first command in a pipeline,
+-- seen in an implicit transaction block.
+\startpipeline
+SELECT $1 \bind 1 \g
+LOCK psql_p

Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding

2025-02-20 Thread Peter Smith

Hi Ajin,

Some review comments for patch v14-0001.

==
src/backend/replication/logical/reorderbuffer.c

1.
+ *   We also try and filter changes that are not relevant for logical decoding
+ *   as well as give the option for plugins to filter changes in advance.
+ *   Determining whether to filter a change requires information about the
+ *   relation from the catalog, requring a transaction to be started.
+ *   When most changes in a transaction are unfilterable, the overhead of
+ *starting a transaction for each record is significant. To reduce this
+ *overhead a hash cache of relation file locators is created. Even then a
+ *hash search for every record before recording has considerable overhead
+ *especially for use cases where most tables in an instance are
not filtered.
+ *To further reduce this overhead a simple approach is used to suspend
+ *filtering for a certain number of changes CHANGES_THRESHOLD_FOR_FILTER
+ *when an unfilterable change is encountered. In other words, continue
+ *filtering changes if the last record was filtered out. If an unfilterable
+ *change is found, skip filtering the next CHANGES_THRESHOLD_FOR_FILTER
+ *changes.
+ *

1a.
/try and filter/try to filter/

~

1b.
There is some leading whitespace problem happening (spaces instead of tabs?)

~

1c.
Minor rewording

SUGGESTION (e.g. anyway this should be identical to the commit message text)

Determining whether to filter a change requires information about the
relation and the publication from the catalog, which means a
transaction must be started. But, the overhead of starting a
transaction for each record is significant. To reduce this overhead a
hash cache of relation file locators is used to remember which
relations are filterable.

Even so, doing a hash search for every record has considerable
overhead, especially for scenarios where most tables in an instance
are published. To further reduce overheads a simple approach is used:
When an unfilterable change is encountered we suspend filtering for a
certain number (CHANGES_THRESHOLD_FOR_FILTER) of subsequent changes.
In other words, continue filtering until an unfilterable change is
encountered; then skip filtering the next CHANGES_THRESHOLD_FOR_FILTER
changes, before attempting filtering again.

~~~

2.
+/*
+ * After encountering a change that cannot be filtered out, filtering is
+ * temporarily suspended. Filtering resumes after processing every 100 changes.
+ * This strategy helps to minimize the overhead of performing a hash table
+ * search for each record, especially when most changes are not filterable.
+ */
+#define CHANGES_THRESHOLD_FOR_FILTER 100

Maybe you can explain where this magic number comes from.

SUGGESTION
The CHANGES_THRESHOLD_FOR_FILTER value of 100 was chosen as the best
trade-off value after performance tests were carried out using
candidate values 10, 50, 100, and 200.

~~~

ReorderBufferQueueChange:

3.
+ /*
+ * If filtering was suspended and we've crossed the change threshold,
+ * attempt to filter again
+ */
+ if (!rb->can_filter_change && (++rb->unfiltered_changes_count
+ >= CHANGES_THRESHOLD_FOR_FILTER))
+ {
+ rb->can_filter_change = true;
+ rb->unfiltered_changes_count = 0;
+ }
+

/If filtering was suspended/If filtering is currently suspended/

~~~

ReorderBufferGetRelation:

4.
+static Relation
+ReorderBufferGetRelation(ReorderBuffer *rb, RelFileLocator *rlocator,
+ bool has_tuple)

My suggested [2-#4] name change 'ReorderBufferGetRelationForDecoding'
is not done yet. I saw Kuroda-san also said this name was confusing
[1-#02], and suggested something similar
'GetPossibleDecodableRelation'.

~~~

RelFileLocatorCacheInvalidateCallback:

5.
+ /*
+ * If relid is InvalidOid, signaling a complete reset, we must remove
+ * all entries, otherwise just remove the specific relation's entry.
+ * Always remove negative cache entries.
+ */
+ if (relid == InvalidOid || /* complete reset */
+ entry->relid == InvalidOid || /* invalid cache entry */
+ entry->relid == relid) /* individual flushed relation */
+ {
+ if (hash_search(RelFileLocatorFilterCache,
+ &entry->key,
+ HASH_REMOVE,
+ NULL) == NULL)
+ elog(ERROR, "hash table corrupted");
+ }


5a.
IMO the relid *parameter* should be mentioned explicitly to
disambiguate relid from the entry->relid.

/If relid is InvalidOid, signaling a complete reset,/If a complete
reset is requested (when 'relid' parameter is InvalidOid),/

~

5b.
/Always remove negative cache entries./Remove any invalid cache
entries (these are indicated by invalid entry->relid)/

~~~

ReorderBufferFilterByRelFileLocator:

6.
I previously [2-#7] had suggested this function code could be
refactored to share some common return logic. It is not done, but OTOH
there is no reply, so I don't know if it was overlooked or simply
rejected.

==
src/include/replication/reorderbuffer.h

7.
+ /* should we try to filter the change? */
+ bool can_filter_change;
+
+ /* number of changes after a failed attem

RE: Statistics Import and Export

Dear members,

I hope I'm in the correct thread. I found the commit 1fd1bd8 - it is so cool.
I have a question for it.

ISTM commit message said that no need to do ANALYZE again.

```
Add support to pg_dump for dumping stats, and use that during
pg_upgrade so that statistics are transferred during upgrade. In most
cases this removes the need for a costly re-analyze after upgrade.
```

But pgupgrade.sgml [2] and source code [3] said that statistics must be updated.
Did I miss something, or you have been updating this?

[2]:
```
 Because optimizer statistics are not transferred by 
pg_upgrade, you will
 be instructed to run a command to regenerate that information at the end
 of the upgrade.  You might need to set connection parameters to
 match your new cluster.
```
[3]:
```
pg_log(PG_REPORT,
   "Optimizer statistics are not transferred by pg_upgrade.\n"
   "Once you start the new server, consider running:\n"
   "%s/vacuumdb %s--all --analyze-in-stages", 
new_cluster.bindir, user_specification.data);
```

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Re: Extend postgres_fdw_get_connections to return remote backend pid

2025-02-20 Thread Fujii Masao





On 2025/02/21 0:54, Sagar Shedge wrote:

Hi Fujii,


Naming is always tricky, but remote_backend_pid feels a bit too long.

Would remote_pid be sufficient?
Point looks valid. I had another perspective is to align the naming
convention to pg_backend_pid(). remote_pid is not helping to identify
whether pid belongs to postgres backend or not. Does this make sense?
Or I'm fine to go with concise name like `remote_pid`


I initially thought "remote_pid" was sufficient since the postgres_fdw
connection clearly corresponds to a backend process. However, I'm fine
with keeping "remote_backend_pid" as the column name for now. If we find
a better name later, we can rename it.



How about: "PID of the remote backend handling the connection" instead?

Updated in v2.


Thanks for updating the patch!

You still need to update the documentation. Could you add descriptions
for postgres_fdw_get_connections()?



Wouldn't it be better to return the result of PQbackendPID() instead of NULL

even when the connection is closed, for debugging purposes? This way,
users can see which remote backend previously handled the "closed" connection,
which might be helpful for troubleshooting.
Agree. Updated logic to return backend pid always except when pid is 0
which indicates no backend attached to session. Returning 0 found
misleading. What's your thought on this?


Your approach makes sense to me.



The postgres_fdw regression test failed on my MacBook with the following diff:

I updated tests to make it more stable. Let me know if it's still
failing on your setup.


Yes, the regression test passed successfully on my machine.


--- dropped.
-SELECT server_name, user_name = CURRENT_USER as "user_name = CURRENT_USER", 
valid, used_in_xact, closed
+-- dropped. remote_backend_pid will continue to return available as it fetch 
remote
+-- server backend pid from cached connections.
+SELECT server_name, user_name = CURRENT_USER as "user_name = CURRENT_USER", 
valid, used_in_xact, closed,
+CASE WHEN remote_backend_pid IS NOT NULL then 'available' ELSE 'not available' 
END AS remote_backend_pid

Instead of checking whether remote_backend_pid is NOT NULL, how about
verifying that it actually matches a remote backend's PID? For example:

remote_backend_pid = ANY(SELECT pid FROM pg_stat_activity WHERE backend_type = 'client 
backend' AND pid <> pg_backend_pid()) AS "remote_backend_pid = remote 
pg_stat_activity.pid"


-SELECT CASE WHEN closed IS NOT true THEN 1 ELSE 0 END
+SELECT server_name, CASE WHEN closed IS NOT true THEN 'false' ELSE 'true' END 
AS remote_conn_closed,
+CASE WHEN remote_backend_pid IS NOT NULL then 'available' ELSE 'not available' 
END AS remote_backend_pid

Similarly, instead of checking if remote_backend_pid is NOT NULL,
how about verifying it against pg_stat_activity?

remote_backend_pid = (SELECT pid FROM pg_stat_activity WHERE 
application_name = 'fdw_conn_check')

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

2025-02-20 Thread Peter Smith

On Thu, Feb 20, 2025 at 5:44 PM Shubham Khanna
 wrote:
>
> On Mon, Feb 17, 2025 at 9:49 AM Peter Smith  wrote:
> >
...
> > ==
> > 1 GENERAL. Option Name?
> >
> > Wondering why the patch is introducing more terminology like
> > "cleanup"; if we are dropping publications then why not say "drop"?
> > Also I am not sure if "existing" means anything because you cannot
> > cleanup/drop something that is not "existing".
> >
> > IOW, why not call this the "--drop-publications" option?
> >
>
> I have retained the option name as '--cleanup-existing-publications'
> for now. However, I understand the concern regarding terminology and
> will revise it in the next patch version once there is a consensus on
> the final name.
>

BTW, Is it even correct to assume that the user has to choose between
(a) clean everything or (b) clean nothing? That is why I felt
something that mentions only ‘publication’ gives more flexibility.
Later, when some DROP  feature might be added then we can
make additional --drop-XXX switches, or a --drop-all switch for
cleaning everything

//

Some more review comments for patch v8-0001

==
Commit message

1.
To prevent accidental removal of user-created publications on the subscriber,
this cleanup process only targets publications that existed on the source
server and were replicated to the subscriber.
Any manually created publications on the subscriber will remain intact.

~

Is this scenario (manually created publications on the subscriber)
possible to do?
If yes, then there should be a test case for it
If not, then maybe this whole paragraph needs rewording.

~~~

2.
This feature is optional and only takes effect when the
'--cleanup-existing-publications' option is explicitly specified.
Since publication cleanup is not required when upgrading streaming replication
clusters, this option provides users with the flexibility to enable or skip the
cleanup process as needed.

~

AFAICT, this is pretty much just saying that the switch is optional
and that the feature only does something when the switch is specified
by the user. But, doesn't all that just go without saying?

==
src/bin/pg_basebackup/pg_createsubscriber.c

3.
+ /* Drop all existing publications if requested */
+ if (drop_publications)
+ {
+ pg_log_info("Dropping all existing publications in database \"%s\"",
+ dbinfo[i].dbname);
+ drop_all_publications(conn, dbinfo[i].dbname);
+ }
+
  /*
  * Since the publication was created before the consistent LSN, it is
  * available on the subscriber when the physical replica is promoted.
  * Remove publications from the subscriber because it has no use.
+ * Additionally, drop publications existed before this command if
+ * requested.
  */
  drop_publication(conn, &dbinfo[i]);
~

3a.
The logic of this code seems strange - e.g. it is optionally dropping
all publications, and then dropping yet another one again. Won't that
be already dropped as part of the drop_all?

~

3b.
Anyway, perhaps the logic should be encapsulated in a new helper
check_and_drop_existing_publications() function to match the existing
'check_and_drop_existing_subscriptions()

~

3c.
I felt logs like "Dropping all existing publications in database
\"%s\"" should be logged within the function drop_all_publications,
not at the caller.

~~~

_drop_one_publication:

4.
-/*
- * Remove publication if it couldn't finish all steps.
- */
+/* Helper function to drop a single publication by name. */
 static void
-drop_publication(PGconn *conn, struct LogicalRepInfo *dbinfo)
+_drop_one_publication(PGconn *conn, const char *pubname, const char *dbname)

A better name for this helper might be drop_publication_by_name()

~~~

5.
+ appendPQExpBuffer(query, "DROP PUBLICATION %s", pubname);
+ pg_log_debug("Executing: %s", query->data);
+ pg_log_info("Dropping publication %s in database \"%s\"", pubname,
dbinfo->dbname);

That debug seems more typically written as:
pg_log_debug("command is: %s", query->data);

~~~

drop_all_publications:

6.
+/* Drop all publications in the database. */
+static void
+drop_all_publications(PGconn *conn, const char *dbname)
+{
+ PGresult   *res;
+ int count = 0;

The 'count' variable seems unused.

~~~

7.
+ for (int i = 0; i < PQntuples(res); i++)
+ {
+ char*pubname_esc = PQescapeIdentifier(conn, PQgetvalue(res, i, 0),
+ strlen(PQgetvalue(res, i, 0)));
+
+ _drop_one_publication(conn, pubname_esc, dbname);
+ PQfreemem(pubname_esc);
+ count++;
+ }

Instead of escaping the name here, and also escaping it in
drop_publication(), that could be done inside the common helper
function.

==
.../t/040_pg_createsubscriber.pl

8.
+# Create publications to test their removal
+$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub FOR ALL TABLES;");
+$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub2 FOR ALL TABLES;");

8a.
If you are going to number the publications then it would be better to
number all of them so there is a consistent naming.

e.g. test_pub1,test_pub2; instead of test_pub,test_pub2

Re: generic plans and "initial" pruning

2025-02-20 Thread Amit Langote

On Fri, Feb 21, 2025 at 3:04 PM Tom Lane  wrote:
>
> Amit Langote  writes:
> > I pushed the final piece yesterday.
>
> trilobite reports that this fails under -DCLOBBER_CACHE_ALWAYS:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2025-02-20%2019%3A37%3A12

Looking, thanks for the heads up.


-- 
Thanks, Amit Langote

Re: Missing [NO] INDENT flag in XMLSerialize backward parsing

2025-02-20 Thread Michael Paquier

On Thu, Feb 20, 2025 at 02:27:42PM +0100, Jim Jones wrote:
> This patch adds the missing [NO] INDENT flag to XMLSerialize backward
> parsing.

   if (xexpr->op == IS_XMLSERIALIZE)
+  {
   appendStringInfo(buf, " AS %s",
format_type_with_typemod(xexpr->type,
 xexpr->typmod));
+  if (xexpr->indent)
+  appendStringInfoString(buf, " INDENT");
+  else
+  appendStringInfoString(buf, " NO INDENT");
+  }

Good catch, we are forgetting this option in ruleutils.c.  Will fix 
down to v16 where this option has been introduced as you are
proposing, with NO INDENT showing up in the default case.  The three
expected outputs look OK as written..
--
Michael


signature.asc
Description: PGP signature

Re: Virtual generated columns

2025-02-20 Thread Richard Guo

On Thu, Feb 20, 2025 at 12:25 AM Dean Rasheed  wrote:
> On Wed, 19 Feb 2025 at 01:42, Dean Rasheed  wrote:
> > One thing I don't like about this is that it's introducing more code
> > duplication between pullup_replace_vars() and
> > ReplaceVarsFromTargetList(). Those already had a lot of code in common
> > before RETURNING OLD/NEW was added, and this is duplicating even more
> > code. I think it'd be better to refactor so that they share common
> > code, since it has become quite complex, and it would be better to
> > have just one place to maintain.

Yeah, it's annoying that the two replace_rte_variables callbacks have
so much code duplication.  I think it's a win to make them share
common code.  What do you think about making this refactor a separate
patch, as it doesn't seem directly related to the bug fix here?

> I've been doing some more testing of this, and attached is another
> update, improving a few comments and adding regression tests based on
> the cases discussed so far here.

Hmm, there are some issues with v4 as far as I can see.

* In pullup_replace_vars_callback, the varlevelsup of the newnode is
adjusted before its nullingrels is updated.  This can cause problems.
If the newnode is not a Var/PHV, we adjust its nullingrels with
add_nulling_relids, and this function only works for level-zero vars.
As a result, we may fail to put the varnullingrels into the
expression.

I think we should insist that ReplaceVarFromTargetList generates the
replacement expression with varlevelsup = 0, and that the caller is
responsible for adjusting the varlevelsup if needed.  This may need
some changes to ReplaceVarsFromTargetList_callback too.

* When expanding whole-tuple references, it is possible that some
fields are expanded as Consts rather than Vars, considering dropped
columns.  I think we need to check for this when generating the fields
for a RowExpr.

* The expansion of virtual generated columns occurs after subquery
pullup, which can lead to issues.  This was an oversight on my part.
Initially, I believed it wasn't possible for an RTE_RELATION RTE to
have 'lateral' set to true, so I assumed it would be safe to expand
virtual generated columns after subquery pullup.  However, upon closer
look, this doesn't seem to be the case: if a subquery had a LATERAL
marker, that would be propagated to any of its child RTEs, even for
RTE_RELATION child RTE if this child rel has sampling info (see
pull_up_simple_subquery).

* Not an issue but I think that maybe we can share some common code
between expand_virtual_generated_columns and
expand_generated_columns_internal on how we build the generation
expressions for a virtual generated column.

I've worked on these issues and attached are the updated patches.
0001 expands virtual generated columns in the planner.  0002 refactors
the code to eliminate code duplication in the replace_rte_variables
callback functions.

> One of the new regression tests fails, which actually appears to be a
> pre-existing grouping sets bug, due to the fact that only non-Vars are
> wrapped in PHVs. This bug can be triggered without virtual generated
> columns:

Interesting. I'll take a look at this issue.

Thanks
Richard

v5-0001-Expand-virtual-generated-columns-in-the-planner.patch
Description: Binary data

v5-0002-Eliminate-code-duplication-in-replace_rte_variables-callbacks.patch
Description: Binary data

Re: generic plans and "initial" pruning

Amit Langote  writes:
> I pushed the final piece yesterday.

trilobite reports that this fails under -DCLOBBER_CACHE_ALWAYS:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2025-02-20%2019%3A37%3A12

regards, tom lane

Re: create subscription with (origin = none, copy_data = on)

2025-02-20 Thread Shlok Kyal

On Thu, 20 Feb 2025 at 16:32, Zhijie Hou (Fujitsu)
 wrote:
>
> On Wednesday, January 29, 2025 8:19 PM Shlok Kyal  
> wrote:
> >
> > I have addressed the comments. Here is an updated patch.
>
> Thanks for updating the patch. The patches look mostly OK to me, I only have
> one minor comments in 0002.
>
> 1.
>
> +CREATE PUBLICATION pub_b_c_2 FOR TABLE tab_part2_1;
> +));
> +
> +($result, $stdout, $stderr) = $node_C->psql(
> +   'postgres', "
> +   CREATE SUBSCRIPTION sub_b_c CONNECTION '$node_B_connstr' PUBLICATION 
> pub_b_c WITH (origin = none, copy_data = on);
> +");
>
> The naming style of new publications and subscriptions doesn't seem consistent
> with existing ones in 030_origin.
>

I have addressed the comments and attached the updated v9 patch.
I have also combined the 0001 and 0002 patch and updated the commit
message as per off list discussion.

Thanks and Regards
Shlok Kyal
From 9ecf7c638c41c161da36b8862e26f42449d55641 Mon Sep 17 00:00:00 2001
From: Hou Zhijie 
Date: Tue, 21 Jan 2025 16:18:34 +0800
Subject: [PATCH v9] Fix a WARNING for data origin discrepancies.

Previously, a WARNING was issued at the time of defining a subscription
with origin=NONE only when the publisher subscribed to the same table from
other publishers, indicating potential data origination from different
origins. However, the publisher can subscribe to the partition ancestors
or partition children of the table from other publishers, which could also
result in mixed-origin data inclusion. So, give a WARNING in those cases
as well.

Reported-by: Sergey Tatarintsev 
Author: Hou Zhijie 
Author: Shlok Kyal 
Reviewed-by: Vignesh C 
Reviewed-by: Amit Kapila 
Backpatch-through: 16, where it was introduced
---
 doc/src/sgml/ref/create_subscription.sgml |  12 +--
 src/backend/commands/subscriptioncmds.c   |  15 ++--
 src/test/subscription/t/030_origin.pl | 104 ++
 3 files changed, 120 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index 6cf7d4f9a1a..57dec28a5df 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -534,12 +534,14 @@ CREATE SUBSCRIPTION subscription_name
 # substitute  below with your publication name(s) to be queried
 SELECT DISTINCT PT.schemaname, PT.tablename
-FROM pg_publication_tables PT,
+FROM pg_publication_tables PT
+ JOIN pg_class C ON (C.relname = PT.tablename)
+ JOIN pg_namespace N ON (N.nspname = PT.schemaname),
  pg_subscription_rel PS
- JOIN pg_class C ON (C.oid = PS.srrelid)
- JOIN pg_namespace N ON (N.oid = C.relnamespace)
-WHERE N.nspname = PT.schemaname AND
-  C.relname = PT.tablename AND
+WHERE C.relnamespace = N.oid AND
+  (PS.srrelid = C.oid OR
+  C.oid IN (SELECT relid FROM pg_partition_ancestors(PS.srrelid) UNION
+SELECT relid FROM pg_partition_tree(PS.srrelid))) AND
   PT.pubname IN ();
 
 
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 2d8a71ca1e1..4aec73bcc6b 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -2083,11 +2083,12 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId)
 }
 
 /*
- * Check and log a warning if the publisher has subscribed to the same table
- * from some other publisher. This check is required only if "copy_data = true"
- * and "origin = none" for CREATE SUBSCRIPTION and
- * ALTER SUBSCRIPTION ... REFRESH statements to notify the user that data
- * having origin might have been copied.
+ * Check and log a warning if the publisher has subscribed to the same table,
+ * its partition ancestors (if it's a partition), or its partition children (if
+ * it's a partitioned table), from some other publishers. This check is
+ * required only if "copy_data = true" and "origin = none" for CREATE
+ * SUBSCRIPTION and ALTER SUBSCRIPTION ... REFRESH statements to notify the
+ * user that data having origin might have been copied.
  *
  * This check need not be performed on the tables that are already added
  * because incremental sync for those tables will happen through WAL and the
@@ -2117,7 +2118,9 @@ check_publications_origin(WalReceiverConn *wrconn, List *publications,
 		   "SELECT DISTINCT P.pubname AS pubname\n"
 		   "FROM pg_publication P,\n"
 		   " LATERAL pg_get_publication_tables(P.pubname) GPT\n"
-		   " JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
+		   " JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid OR"
+		   " GPT.relid IN (SELECT relid FROM pg_partition_ancestors(PS.srrelid) UNION"
+		   "   SELECT relid FROM pg_partition_tree(PS.srrelid))),\n"
 		   " pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)\n"
 		   "WHERE C.oid = GPT.relid AND P.pubname IN (");
 	GetPublicationsStr(publications, &cmd, true);
diff --git a/sr

Re: Proposal - Allow extensions to set a Plan Identifier

2025-02-20 Thread Sami Imseih

I put together patches to do as is being proposed.

v1-0001:

1. Adds a planId field in PlannedStmt
2. Added an st_plan_id fields in PgBackendStatus
3. APIs to report and to retrieve a planId to PgBackendStatus

An extension is able to set a planId in PlannedStmt directly,
and while they can do the same for PgBackendStatus, I felt it is better
to provide similar APIs for this as we do for queryId. Unless the extension
passed force=true to the API, this will ensure that a planId already set
either by a top-level statement or by another extension cannot be
set when a plan is active.

Also, the extension does not need to worry about resetting the planId at
the end of execution as this will follow the same mechanism as is currently
being done for queryId. This will remove responsibility from the extension
author to have to manage this aspect.

The extension should normally compute the planId in the planner or ExecutorStart
hook. If the planId is computed in the planner_hook, the extension can set the
planId in plannedStmt making the planId available to subsequent executions of
a cached plan.

What this patch does not cover is adding a "Plan Identifier" to explain output
or to logs. Also, there is no core GUC like compute_query_id, since it does not
make sense if we're not computing a planId in core.

v2-0001:

This adds a planId to pg_stat_get_activity ( not pg_stat_activity ).
An extension
can offer its own view, similar to pg_stat_activity, which can include planId.

Note that there are no documentation updates required here as we don't
have per-field documentation for pg_stat_get_activity.

These 2 patches can probably be combined , but will leave them like
this for now.

Looking forward to feedback.

Regards

Sami


v1-0002-add-planId-to-pg_stat_get_activity.patch
Description: Binary data


v1-0001-Allow-plugins-to-set-a-64-bit-plan-identifier.patch
Description: Binary data

Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding

2025-02-20 Thread Amit Kapila

On Fri, Feb 21, 2025 at 7:57 AM Ajin Cherian  wrote:
>
> On Fri, Feb 21, 2025 at 12:57 PM Ajin Cherian  wrote:
> >
> > Conclusion:
> > The patched code with 100 transaction throttling significantly
> > improves performance, reducing execution time by ~69% when no
> > published transactions are involved, ~43% with partial published
> > transactions, and ~15% in all published transactions.
> > Attaching a graph showing the performance differences.
> >
>
> In these tests, I also see an increased performance with the patch
> even when all transactions are published. I will investigate why this
> happens and update.
>

Yes, it is important to investigate this because in the best case, it
should match with HEAD. One thing you can verify is whether the
changes processed on the server are exactly for the published table,
it shouldn't happen that it is processing both published and
unpublished changes. If the server is processing for both tables then
it is expected that the patch performs better. I think you can verify
before starting each test and after finishing each test whether the
slot is pointing at the appropriate location for the next test or
create a new slot for each with the required location.

-- 
With Regards,
Amit Kapila.

Re: Virtual generated columns

2025-02-20 Thread jian he

On Wed, Feb 19, 2025 at 11:25 PM Dean Rasheed  wrote:
>
> One of the new regression tests fails, which actually appears to be a
> pre-existing grouping sets bug, due to the fact that only non-Vars are
> wrapped in PHVs. This bug can be triggered without virtual generated
> columns:
>
> CREATE TABLE t (a int, b int);
> INSERT INTO t VALUES (1, 1);
>
> SELECT * FROM (SELECT a, a AS b FROM t) AS vt
> GROUP BY GROUPING SETS (a, b)
> HAVING b = 1;
>
>  a | b
> ---+---
>  1 |
> (1 row)
>
> whereas the result should be
>
>  a | b
> ---+---
>|  1
> (1 row)
>
> For reference, this code dates back to 90947674fc.
>

sorry for the noise.
i misunderstood your message.
you’ve already mentioned this problem.

in struct pullup_replace_vars_context
adding a field (bool wrap_vars) and setting it appropriately in
function pullup_replace_vars_callback
seems to solve this problem.


v4-0001-fix-expanding-virtual-generated-columns-with-g.no-cfbot
Description: Binary data

Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

On Wed, Feb 19, 2025 at 4:23 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Shubham,
>
> Thanks for updating the patch!
>
> > I have added a test case for non-dry-run mode to ensure that
> > replication slots, subscriptions, and publications work as expected
> > when '--all' is specified. Additionally, I have split the test file
> > into two parts:
> > 1) Command-line sanity checks – Validates the correctness of option parsing.
> > 2) '--all' functionality and behavior – Verifies the health of
> > subscriptions and ensures that --all specific scenarios, such as
> > non-template databases not being subscribed to, are properly handled.
> > This should improve test coverage while keeping the structure manageable.
>
> TBH, I feel your change does not separate the test file into the two parts. 
> ISTM
> you just added some validation checks and a test how --all option works.
> Ashutosh, does it match your suggestion?
>

Thanks Hayato for pointing it out. The test changes don't match my
expectations. As you rightly pointed out, I expected two (actually
three if needed) separate test files one for argument validation and
one for testing --database scenarios (the existing tests) and one more
for testing same scenarios when --all is specified. Right now all it
does is "# Verify that only user databases got subscriptions (not
template databases)". I expected testing the actual replication as
well like tests between lines around 527 and 582 in the latest
patchset. Those tests are performed when --database is subscribed. We
need similar tests performed when --all is specified. I didn't find
any of those tests being performed on node_s2. Given that the tests
for --databases and --all will be very similar, having them in the
same test file makes more sense. We also seem to be missing
$node_s2->teardown_node, do we?

-- 
Best Wishes,
Ashutosh Bapat

Re: GetRelationPath() vs critical sections

On Fri, Feb 21, 2025 at 9:28 AM Andres Freund  wrote:
> The patch curently uses a hardcoded 6 for the length of MAX_BACKENDS. Does
> anybody have a good idea for how to either
>
> a) derive 6 from MAX_BACKENDS in a way that can be used in a C array size

Do we even check the binary digits?  BTW I see a place in lwlock.c
that still talks about 2^23-1, looks like it is out of date.  Hmmm, I
wonder if it would be better to start by declaring how many bits we
want to use, given that is our real constraint.  And then:

#define PROCNUMBER_BITS 18
#define MAX_BACKENDS ((1 << PROCNUMBER_BITS) - 1)
#define PROCNUMBER_CHARS DECIMAL_DIGITS_FOR_BITS(PROCNUMBER_BITS)

... with a little helper ported to preprocessor hell from Hacker's
Delight magic[1] for that last bit.  See attached.  But if that's a
bit too nuts...

> b) Use a static assert to check that it fits?

Right, easy stuff like sizeof(CppString2(MAX_BACKENDS)) - 1 can only
work if the token is a decimal number.  I suppose you could just use
constants:

#define MAX_BACKENDS 0x3
#define PROCNUMBER_BITS 18
#define PROCNUMBER_CHARS 6

... and then use the previous magic to statically assert their
relationship inside one translation unit, to keep it out of a widely
included header.

[1] 
https://lemire.me/blog/2021/06/03/computing-the-number-of-digits-of-an-integer-even-faster/
#include 

#define DECIMAL_DIGITS_TABLE(i) \
  (i == 0 ? 9 : \
   i == 1 ? 99 : \
   i == 2 ? 999 : \
   i == 3 ?  : \
   i == 4 ? 9 : \
   i == 5 ? 99 : \
   i == 6 ? 999 : \
   i == 7 ?  : \
   i == 8 ? 9 : 0 )
#define DECIMAL_DIGITS_FOR_BITS(bits) \
	(1 + \
	 ((9 * (bits)) >> 5) + \
	 (((1 << (bits)) - 1) > DECIMAL_DIGITS_TABLE((9 * (bits)) >> 5)))

#define PROCNUMBER_BITS 18
#define MAX_BACKENDS ((1 << PROCNUMBER_BITS) - 1)
#define PROCNUMBER_CHARS DECIMAL_DIGITS_FOR_BITS(PROCNUMBER_BITS)

int
main()
{
	printf("PROCNUMBER_BITS = %d\n", PROCNUMBER_BITS);
	printf("PROCNUMBER_CHARS = %d\n", PROCNUMBER_CHARS);
	printf("MAX_BACKENDS = %x\n", MAX_BACKENDS);

	for (unsigned int i = 1; i <= 32; ++i)
		printf("bits = %u, max = %u, digits = %d\n",
			   i,
			   (1 << i) - 1,
			   DECIMAL_DIGITS_FOR_BITS(i));
}

Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

On Fri, Feb 21, 2025 at 5:18 AM Peter Smith  wrote:
>
> On Thu, Feb 20, 2025 at 10:26 PM Hayato Kuroda (Fujitsu)
>  wrote:
> >
> > Dear Shubham,
> >
> > Thanks for updating the patch quickly!
> >
> > > > 04.
> > > > ```
> > > > +# Verify that only user databases got subscriptions (not template 
> > > > databases)
> > > > +my @user_dbs = ($db1, $db2);
> > > > +foreach my $dbname (@user_dbs)
> > > > +{
> > > > +   my $sub_exists =
> > > > + $node_s1->safe_psql($dbname, "SELECT count(*) FROM
> > > pg_subscription;");
> > > > +   is($sub_exists, '3', "Subscription created successfully for 
> > > > $dbname");
> > > > +}
> > > > ```
> > > >
> > > > Hmm, what do you want to check here? pg_subscription is a global 
> > > > catalog so
> > > that
> > > > very loop will see exactly the same content. Also, 'postgres' is also 
> > > > the user
> > > database.
> > > > I feel you must ensure that all three databases (postgres, $db1, and 
> > > > $db2) have
> > > a
> > > > subscription here.
> > > >
> > >
> > > Fixed.
> >
> > My point was that the loop does not have meaning because pg_subscription
> > is a global one. I and Peter considered changes like [1] is needed. It 
> > ensures
> > that each databases have a subscription.
> > Note: [1] cannot pass the test as-is because $db1 and $db2 contains special
> > characters. Please escape appropriately.
>
> Yes. Some test is still needed to confirm the expected subscriptions
> all get created for respective dbs. But, the current test loop just
> isn't doing it properly.
>
> >
> > Other comments are listed in below.
> >
> > 01.
> > ```
> > + -a
> > + --all-dbs
> > ```
> >
> > Peter's comment [2] does not say that option name should be changed.
> > The scope of his comment is only in the .c file.
>
> Yes, that's correct. My v9 suggestion to change 'all' to 'all_dbs' was
> referring only to the field name of struct CreateSubscriberOptions,
> nothing else. Not the usage help, not the error messages, not the
> docs, not the tests, not the commit message.

+1. We don't want yet another option to specify all databases. :)

-- 
Best Wishes,
Ashutosh Bapat

pg_recvlogical requires -d but not described on the documentation

Dear hackers,

Hi, I found the issue $SUBJECT. According to the doc [1]:

```
-d dbname
--dbname=dbname
The database to connect to. See the description of the actions for what this 
means in detail.
The dbname can be a connection string. If so, connection string parameters will 
override any
conflicting command line options. Defaults to the user name.
```

IIUC final statement implies --dbname can be omitted. If it is mandatory, it 
should be described
in the synopsis part - not written now. However, the command would fail when -d 
is missed.

```
$ pg_recvlogical -U postgres --start -S test -f -
pg_recvlogical: error: no database specified
pg_recvlogical: hint: Try "pg_recvlogical --help" for more information.
```

ISTM the inconsistency is introduced since the initial commit. I think they 
should be unified either
1) update the doc or 2) accept when -d is not specified. Personally, I like 2nd 
approach, pg_recvlogical
can follow the normal connection rule. I.e., 

a) Use PGDATABASE if specified
b) Check 'U' and PGUSER and use it if specified
c) Otherwise use the current OS user name

Attached patch tries to implement the 2nd approach. How do you think?

[1]: https://www.postgresql.org/docs/devel/app-pgrecvlogical.html

Best regards,
Hayato Kuroda
FUJITSU LIMITED



0001-pg_recvlogical-accept-if-d-is-not-specified.patch
Description: 0001-pg_recvlogical-accept-if-d-is-not-specified.patch

Re: Make COPY format extendable: Extract COPY TO format implementations

2025-02-20 Thread Sutou Kouhei

Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Thu, 20 Feb 2025 15:28:26 -0800,
  Masahiko Sawada  wrote:

> Looking at the 0001 patch again, I have a question: we have
> CopyToTextLikeOneRow() for both CSV and text format:
> 
> +/* Implementation of the per-row callback for text format */
> +static void
> +CopyToTextOneRow(CopyToState cstate, TupleTableSlot *slot)
> +{
> +   CopyToTextLikeOneRow(cstate, slot, false);
> +}
> +
> +/* Implementation of the per-row callback for CSV format */
> +static void
> +CopyToCSVOneRow(CopyToState cstate, TupleTableSlot *slot)
> +{
> +   CopyToTextLikeOneRow(cstate, slot, true);
> +}
> 
> These two functions pass different is_csv value to that function,
> which is used as follows:
> 
> +   if (is_csv)
> +   CopyAttributeOutCSV(cstate, string,
> +
>  cstate->opts.force_quote_flags[attnum - 1]);
> +   else
> +   CopyAttributeOutText(cstate, string);
> 
> However, we can know whether the format is CSV or text by checking
> cstate->opts.csv_mode instead of passing is_csv. That way, we can
> directly call CopyToTextLikeOneRow() but not via CopyToCSVOneRow() or
> CopyToTextOneRow(). It would not help performance since we already
> inline CopyToTextLikeOneRow(), but it looks simpler.

This means the following, right?

1. We remove CopyToTextOneRow() and CopyToCSVOneRow()
2. We remove "bool is_csv" parameter from CopyToTextLikeOneRow()
   and use cstate->opts.csv_mode in CopyToTextLikeOneRow()
   instead of is_csv
3. We use CopyToTextLikeOneRow() for
   CopyToRoutineText::CopyToOneRow and
   CopyToRoutineCSV::CopyToOneRow

If we use this approach, we can't remove the following
branch in compile time:

+   if (is_csv)
+   CopyAttributeOutCSV(cstate, string,
+   
cstate->opts.force_quote_flags[attnum - 1]);
+   else
+   CopyAttributeOutText(cstate, string);

We can remove the branch in compile time with the current
approach (constant argument + inline).

It may have a negative performance impact because the "if"
is used many times with large data. (That's why we choose
the constant argument + inline approach in this thread.)

Thanks,
-- 
kou

Re: generic plans and "initial" pruning

2025-02-20 Thread Amit Langote

On Wed, Feb 12, 2025 at 8:53 PM Amit Langote  wrote:
> On Thu, Feb 6, 2025 at 11:35 AM Amit Langote  wrote:
> > Per cfbot-ci, the new test case output in 0002 needed to be updated.
> >
> > I plan to push 0001 tomorrow, barring any objections.
>
> I pushed that last Friday. With bb3ec16e, d47cbf47, and cbc12791 now in:
>
> * Pruning information is now stored separately from parent plan nodes
> in PlannedStmt.
>
> * Initial runtime pruning occurs as a separate step, independent of
> and before plan initialization in InitPlan().
>
> * The RT indexes of unprunable relations and those of partitions that
> survive initial pruning are stored in a global bitmapset in EState,
> allowing us to avoid work that was previously done for pruned
> partitions. This was difficult before because initial pruning wasn’t
> performed before the parent plan node was initialized, meaning that
> the work we aimed to save had already been done.
>
> The final remaining piece is to skip taking locks on partitions pruned
> during initial pruning, and the attached patch addresses that.
>
> I’d like to commit the patch next week, barring objections.

I pushed the final piece yesterday.

Thank you all who have commented on this thread, reviewed the patches
in its various incarnations, and offered advice here or offlist.

-- 
Thanks, Amit Langote

Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding

2025-02-20 Thread Ajin Cherian

On Fri, Feb 21, 2025 at 12:57 PM Ajin Cherian  wrote:
>
> On Thu, Feb 20, 2025 at 3:08 PM Hayato Kuroda (Fujitsu)
>  wrote:
> >
> > Dear Ajin,
> >
> > > I compared the patch 1 which does not employ a hash cache and has the
> > > overhead of starting a transaction every time the filter is checked.
> > >
> > > I created a test setup of 10 million inserts in 3 different scenarios:
> > > 1. All inserts on unpublished tables
> > > 2. Half of the inserts on unpublished table and half on pupblished table
> > > 3. All inserts on published tables.
> > >
> > > The percentage improvement in the new optimized patch compared to the
> > > old patch is:
> > >
> > > No transactions in publication: 85.39% improvement
> > > Half transactions in publication: 72.70% improvement
> > > All transactions in publication: 48.47% improvement
> > >
> > > Attaching a graph to show the difference.
> >
> > I could not find any comparisons with HEAD. Can you clarify the 
> > throughput/latency/memory
> > usage with HEAD?
>
> Here's the difference in latency with head. Again 10 million inserts
> in 3 scenarios: All transactions on unpublished tables, half of the
> transactions on unpublished tables and all transactions on published
> tables
> Conclusion:
> The patched code with 100 transaction throttling significantly
> improves performance, reducing execution time by ~69% when no
> published transactions are involved, ~43% with partial published
> transactions, and ~15% in all published transactions.
> Attaching a graph showing the performance differences.
>

In these tests, I also see an increased performance with the patch
even when all transactions are published. I will investigate why this
happens and update.

regards,
Ajin Cherian
Fujitsu Australia

RE: ReplicationSlotRelease() crashes when the instance is in the single user mode

Dear Amit,

> Also, what about pg_replication_origin_* APIs? Do we want to restrict
> those as well if we are restricting slot operations? I don't see any
> solid theory presented in this thread on why we should add new checks
> in multiple APIs restricting those in single-user mode.

As David [1] and documentation [2] described, single user mode is typically used
for initialization, debugging and recovery purpose - not for normal purposes.
I think doing replication stuffs is not intended. Tom also considered like that 
[4].

The error I reported seemed to be introduced seven years ago (0ce5cf2), and no
one has reported till now. This also implies that there are no reasons to 
support
it.

Ideally, functions described in [5] can be banned in the single-user mode, 
except
the pg_drop_replication_slot(). Thought?

[1]: 
https://www.postgresql.org/message-id/CAKFQuwbnBkGZAM%2B5b1DWmbqU5W7b1r-nQsw87BukrUC5WLrJXg%40mail.gmail.com
[2]: https://www.postgresql.org/docs/devel/app-postgres.html
[3]: 
https://www.postgresql.org/message-id/3b2f809f-326c-38dd-7a9e-897f957a4eb1%40enterprisedb.com
[4]: https://www.postgresql.org/message-id/25024.1525789919%40sss.pgh.pa.us
[5]: 
https://www.postgresql.org/docs/devel/functions-admin.html#FUNCTIONS-REPLICATION

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Re: Enhancing Memory Context Statistics Reporting

2025-02-20 Thread Rahila Syed

Hi,

Please find attached the updated patches after some cleanup and test
fixes.

Thank you,
Rahila Syed

On Tue, Feb 18, 2025 at 6:35 PM Rahila Syed  wrote:

> Hi,
>
>>
>> Thanks for updating the patch!
>>
>> The below comments would be a bit too detailed at this stage, but I’d
>> like to share the points I noticed.
>>
>> Thanks for sharing the detailed comments. I have incorporated some of them
> into the new version of the patch. I will include the rest when I refine
> and
> comment the code further.
>
> Meanwhile, I have fixed the following outstanding issues:
>
> 1.  Currently one DSA  is created per backend when the first request for
>> statistics is made and remains for the lifetime of the server.
>> I think I should add logic to periodically destroy DSAs, when memory
>> context statistics are not being *actively* queried from the backend,
>> as determined by the statistics timestamp.
>>
>
> After an offline discussion with Andres and Tomas, I have fixed this to
> use
> only one DSA for all the publishing backends/processes. Each backend
>  allocates smaller chunks of memory within the DSA while publishing
> statistics.
> These chunks are tracked independently by each backend, ensuring that two
> publishing backends/processes do not block each other despite using the
> same
> DSA. This approach eliminates the overhead of creating multiple DSAs,
> one for each backend.
>
> I am not destroying the DSA area because it stores the previously
> published
> statistics for each process. This allows the system to display older
> statistics
> when the latest data cannot be retrieved within a reasonable time.
> Only the most recently updated statistics are kept, while all earlier ones
> are freed using dsa_free by each backend when they are no longer needed.
> .
>
>> 2. The two issues reported by Fujii-san here: [1].
>> i. I have proposed a fix for the first issue here [2].
>> ii. I am able to reproduce the second issue. This happens when we try
>> to query statistics of a backend running infinite_recurse.sql. While I am
>> working on finding a root-cause, I think it happens due to some memory
>> being overwritten due to to stack-depth violation, as the issue is not
>> seen
>> when I reduce the max_stack_depth to 100kb.
>>  }
>>  }
>>
>
> The second issue is also resolved by using smaller allocations within a
> DSA.
> Previously, it occurred because a few statically allocated strings were
> placed
> within a single large chunk of DSA allocation. I have changed this to use
> dynamically allocated chunks with dsa_allocate0 within the same DSA.
>
> Please find attached updated and rebased patches.
>
> Thank you,
> Rahila Syed
>


v14-0002-Function-to-report-memory-context-statistics.patch
Description: Binary data


v14-0001-Preparatory-changes-for-reporting-memory-context-sta.patch
Description: Binary data

Missing [NO] INDENT flag in XMLSerialize backward parsing

2025-02-20 Thread Jim Jones

Hi,

This patch adds the missing [NO] INDENT flag to XMLSerialize backward
parsing. For example:

CREATE VIEW v1 AS
SELECT
  xmlserialize(
    DOCUMENT '42'::xml AS text
    INDENT);

\sv v1
CREATE OR REPLACE VIEW public.v1 AS
 SELECT XMLSERIALIZE(DOCUMENT '42'::xml AS
 text INDENT) AS "xmlserialize"

SELECT * FROM v1;
  xmlserialize
-
   +
   42+
 
(1 row)


The NO INDENT flag is added by default if no explicit indentation
flag was originally provided:

CREATE VIEW v2 AS
SELECT
  xmlserialize(
    DOCUMENT '42'::xml AS text
    NO INDENT);

\sv v2
CREATE OR REPLACE VIEW public.v2 AS
 SELECT XMLSERIALIZE(DOCUMENT '42'::xml AS text NO
INDENT) AS "xmlserialize"


CREATE VIEW v3 AS
SELECT
  xmlserialize(
    DOCUMENT '42'::xml AS text);

\sv v3
CREATE OR REPLACE VIEW public.v3 AS
 SELECT XMLSERIALIZE(DOCUMENT '42'::xml AS text NO
INDENT) AS "xmlserialize"


Regression tests were updated accordingly.

Best regards, Jim

From ff1b261ef74ae381eb946ec675e0f116c728cee0 Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Thu, 20 Feb 2025 14:08:04 +0100
Subject: [PATCH v1] Fix missing [NO] INDENT flag in XMLSerialize backward
 parsing

This patch adds the missing [NO] INDENT flag to XMLSerialize
backward parsing. For example:

CREATE VIEW v1 AS
SELECT
  xmlserialize(
DOCUMENT '42'::xml AS text
INDENT);

\sv v1
CREATE OR REPLACE VIEW public.v1 AS
 SELECT XMLSERIALIZE(DOCUMENT '42'::xml AS
 text INDENT) AS "xmlserialize"

SELECT * FROM v1;
  xmlserialize
-
   +
   42+
 
(1 row)

The NO INDENT flag is added by default if no explicit indentation
flag was originally provided. Regression tests have been updated.

Regression tests were updated accordingly.
---
 src/backend/utils/adt/ruleutils.c   |  7 +++
 src/test/regress/expected/xml.out   | 16 ++--
 src/test/regress/expected/xml_1.out | 10 ++
 src/test/regress/expected/xml_2.out | 16 ++--
 src/test/regress/sql/xml.sql|  2 ++
 5 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 54dad97555..d11a8a20ee 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -10142,9 +10142,16 @@ get_rule_expr(Node *node, deparse_context *context,
 	}
 }
 if (xexpr->op == IS_XMLSERIALIZE)
+{
 	appendStringInfo(buf, " AS %s",
 	 format_type_with_typemod(xexpr->type,
 			  xexpr->typmod));
+	if (xexpr->indent)
+		appendStringInfoString(buf, " INDENT");
+	else
+		appendStringInfoString(buf, " NO INDENT");
+}
+
 if (xexpr->op == IS_DOCUMENT)
 	appendStringInfoString(buf, " IS DOCUMENT");
 else
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 2e9616acda..bcc743f485 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -822,21 +822,25 @@ CREATE VIEW xmlview6 AS SELECT xmlpi(name foo, 'bar');
 CREATE VIEW xmlview7 AS SELECT xmlroot(xml '', version no value, standalone yes);
 CREATE VIEW xmlview8 AS SELECT xmlserialize(content 'good' as char(10));
 CREATE VIEW xmlview9 AS SELECT xmlserialize(content 'good' as text);
+CREATE VIEW xmlview10 AS SELECT xmlserialize(document '42' AS text indent);
+CREATE VIEW xmlview11 AS SELECT xmlserialize(document '42' AS character varying no indent);
 SELECT table_name, view_definition FROM information_schema.views
   WHERE table_name LIKE 'xmlview%' ORDER BY 1;
- table_name |  view_definition   
-+
+ table_name |view_definition
++---
  xmlview1   |  SELECT xmlcomment('test'::text) AS xmlcomment;
+ xmlview10  |  SELECT XMLSERIALIZE(DOCUMENT '42'::xml AS text INDENT) AS "xmlserialize";
+ xmlview11  |  SELECT (XMLSERIALIZE(DOCUMENT '42'::xml AS character varying NO INDENT))::character varying AS "xmlserialize";
  xmlview2   |  SELECT XMLCONCAT('hello'::xml, 'you'::xml) AS "xmlconcat";
  xmlview3   |  SELECT XMLELEMENT(NAME element, XMLATTRIBUTES(1 AS ":one:", 'deuce' AS two), 'content&') AS "xmlelement";
- xmlview4   |  SELECT XMLELEMENT(NAME employee, XMLFOREST(name AS name, age AS age, salary AS pay)) AS "xmlelement" +
+ xmlview4   |  SELECT XMLELEMENT(NAME employee, XMLFOREST(name AS name, age AS age, salary AS pay)) AS "xmlelement"+
 |FROM emp;
  xmlview5   |  SELECT XMLPARSE(CONTENT 'x'::text STRIP WHITESPACE) AS "xmlparse";
  xmlview6   |  SELECT XMLPI(NAME foo, 'bar'::text) AS "xmlpi";

Re: Non-text mode for pg_dumpall

2025-02-20 Thread Mahendra Singh Thalor

On Thu, 20 Feb 2025 at 14:48, jian he  wrote:
>
> hi.
> about 0001
>
> /*
>  * connectDatabase
>  *
>  * Make a database connection with the given parameters.  An
>  * interactive password prompt is automatically issued if required.
>  *
>  * If fail_on_error is false, we return NULL without printing any message
>  * on failure, but preserve any prompted password for the next try.
>  *
>  * On success, the global variable 'connstr' is set to a connection string
>  * containing the options used.
>  */
> PGconn *
> connectDatabase(const char *dbname, const char *connection_string,
> const char *pghost, const char *pgport, const char
*pguser,
> trivalue prompt_password, bool fail_on_error, const
> char *progname,
> const char **connstr, int *server_version)
> do the comments need to change? since no
> global variable 'connstr' in common_dumpall_restore.c
> maybe we need some words to explain server_version, (i don't have a
> huge opinion though).

Fixed.

>
>
>
/*-
>  *
>  * common_dumpall_restore.c
>  *
>  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
>  * Portions Copyright (c) 1994, Regents of the University of California
>  *
>  * This is a common file for pg_dumpall and pg_restore.
>  * src/bin/pg_dump/common_dumpall_restore.c
>  *
>
 *-
>  */
>
> may change to
>
>
/*-
>  *
>  * common_dumpall_restore.c
>  * This is a common file for pg_dumpall and pg_restore.
>  *
>  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
>  * Portions Copyright (c) 1994, Regents of the University of California
>  *
>  * IDENTIFICATION
>  *src/bin/pg_dump/common_dumpall_restore.c
>  *
>
 *-
>  */
> so the style aligns with most other files.

Fixed.

> (we can apply the same logic to src/bin/pg_dump/common_dumpall_restore.h)

We are already doing the same in the .h file.

>
>
> in src/bin/pg_dump/pg_dumpall.c
> #include "common_dumpall_restore.h"
> imply include "pg_backup.h".
> so in src/bin/pg_dump/pg_dumpall.c, we don't need include "pg_backup.h"

Fixed. Also I removed some extra .h files from the patch.

>
>
> attached are minor cosmetic changes for v19.

- /* return number of errors */
> - if (AH->n_errors)
> - n_errors = AH->n_errors;
> -
>   /* AH may be freed in CloseArchive? */
>   CloseArchive(AH);

As per this comment, we can't return AH->n_errors as this might already be
freed so we should copy before CloseArchive.

Here, I am attaching updated patches for review and testing.

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com


v20_0001_move-common-code-of-pg_dumpall-and-pg_restore-to-new_file.patch
Description: Binary data


v20_0002_pg_dumpall-with-non-text_format-20th_feb.patch
Description: Binary data

Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY

2025-02-20 Thread Міхаіл Нікалаеў

Hello!

Just rebased.
From 2db0af6a7b5ba485464ad5a59ce106a6e438d41a Mon Sep 17 00:00:00 2001
From: nkey 
Date: Thu, 20 Feb 2025 14:50:58 +0300
Subject: [PATCH v8 2/4] Modify the infer_arbiter_indexes function to consider 
 both indisvalid and indisready indexes. Ensure that at least one indisvalid 
 index is still required.

The change ensures that all concurrent transactions utilize the same set of indexes as arbiters. This uniformity is required to avoid conditions that could lead to "duplicate key value violates unique constraint" errors during UPSERT operations.

The patch resolves the issues in the following specs:
* reindex_concurrently_upsert
* index_concurrently_upsert
* index_concurrently_upsert_predicate

Despite the patch, the following specs are still affected:
* reindex_concurrently_upsert_partitioned
* reindex_concurrently_upsert_on_constraint
---
 src/backend/optimizer/util/plancat.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index 71abb01f655..b6b55b7cbff 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -720,6 +720,7 @@ infer_arbiter_indexes(PlannerInfo *root)
 
 	/* Results */
 	List	   *results = NIL;
+	bool	   foundValid = false;
 
 	/*
 	 * Quickly return NIL for ON CONFLICT DO NOTHING without an inference
@@ -813,7 +814,13 @@ infer_arbiter_indexes(PlannerInfo *root)
 		idxRel = index_open(indexoid, rte->rellockmode);
 		idxForm = idxRel->rd_index;
 
-		if (!idxForm->indisvalid)
+		/*
+		 * We need to consider both indisvalid and indisready indexes because
+		 * them may become indisvalid before execution phase. It is required
+		 * to keep set of indexes used as arbiter to be the same for all
+		 * concurrent transactions.
+		 */
+		if (!idxForm->indisready)
 			goto next;
 
 		/*
@@ -835,10 +842,9 @@ infer_arbiter_indexes(PlannerInfo *root)
 		 errmsg("ON CONFLICT DO UPDATE not supported with exclusion constraints")));
 
 			results = lappend_oid(results, idxForm->indexrelid);
-			list_free(indexList);
+			foundValid |= idxForm->indisvalid;
 			index_close(idxRel, NoLock);
-			table_close(relation, NoLock);
-			return results;
+			break;
 		}
 		else if (indexOidFromConstraint != InvalidOid)
 		{
@@ -939,6 +945,7 @@ infer_arbiter_indexes(PlannerInfo *root)
 			goto next;
 
 		results = lappend_oid(results, idxForm->indexrelid);
+		foundValid |= idxForm->indisvalid;
 next:
 		index_close(idxRel, NoLock);
 	}
@@ -946,7 +953,8 @@ next:
 	list_free(indexList);
 	table_close(relation, NoLock);
 
-	if (results == NIL)
+	/* It is required to have at least one indisvalid index during the planning. */
+	if (results == NIL || !foundValid)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
  errmsg("there is no unique or exclusion constraint matching the ON CONFLICT specification")));
-- 
2.43.0

From e7ad602f92185f373c3ab1af3baac9cee0416191 Mon Sep 17 00:00:00 2001
From: nkey 
Date: Thu, 20 Feb 2025 14:52:23 +0300
Subject: [PATCH v8 4/4] Modify the ExecInitPartitionInfo function to consider 
 partitioned indexes that are potentially processed by REINDEX CONCURRENTLY as
   arbiters as well.

This is necessary to ensure that all concurrent transactions use the same set of arbiter indexes.

The patch resolves the issues in the following specs:
* reindex_concurrently_upsert_partitioned
---
 src/backend/executor/execPartition.c | 119 ---
 1 file changed, 107 insertions(+), 12 deletions(-)

diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b6e89d0620d..50e58be657f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -486,6 +486,48 @@ ExecFindPartition(ModifyTableState *mtstate,
 	return rri;
 }
 
+/*
+ * IsIndexCompatibleAsArbiter
+ * 		Checks if the indexes are identical in terms of being used
+ * 		as arbiters for the INSERT ON CONFLICT operation by comparing
+ * 		them to the provided arbiter index.
+ *
+ * Returns the true if indexes are compatible.
+ */
+static bool
+IsIndexCompatibleAsArbiter(Relation	arbiterIndexRelation,
+		   IndexInfo  *arbiterIndexInfo,
+		   Relation	indexRelation,
+		   IndexInfo  *indexInfo)
+{
+	int i;
+
+	if (arbiterIndexInfo->ii_Unique != indexInfo->ii_Unique)
+		return false;
+	/* it is not supported for cases of exclusion constraints. */
+	if (arbiterIndexInfo->ii_ExclusionOps != NULL || indexInfo->ii_ExclusionOps != NULL)
+		return false;
+	if (arbiterIndexRelation->rd_index->indnkeyatts != indexRelation->rd_index->indnkeyatts)
+		return false;
+
+	for (i = 0; i < indexRelation->rd_index->indnkeyatts; i++)
+	{
+		int			arbiterAttoNo = arbiterIndexRelation->rd_index->indkey.values[i];
+		int			attoNo = indexRelation->rd_index->indkey.values[i];
+		if (arbiterAttoNo != attoNo)
+			return false;
+	}
+
+	if (list_difference(RelationGetIndexExpressions(arbite

Proposal: pg_is_volatile function

2025-02-20 Thread Andrew Farries

I'd like to propose a new function `pg_is_volatile` that would test and return
the volatility of its argument expression. Example uses of the function would
be:

pg_is_volatile(1) -> false
pg_is_volatile(random()) -> true

The motivation for the proposal is to allow testing of column default
expressions for new columns added with `ALTER TABLE ... ADD COLUMN` before
adding the column. This is to determine whether the column default will be able
to take advantage of the fast-path optimization for non-volatile column
defaults, or whether a full table rewrite will be required.

For a schema migration tool, it's desirable for the tool to assess the
volatility of a column default for a new column before adding it. The tool can
then decide on the most appropriate way to add the column, either doing so
directly for a non-volatile default, or issuing a warning or using some other
method in the case of a volatile default.

The documentation for this function would be as follows:

```
  
   

 pg_is_volatile

pg_is_volatile ( "any" )
boolean
   
   
Tests whether the argument expression contains volatile functions (see
). This can be useful to determine
whether the expression can be used as a column default without causing
a table rewrite.
   
  
```

I believe the implementation of this function would be straightforward with a
new function in `src/backend/utils/adt/misc.c` delegating to the existing
`contain_volatile_functions_after_planning` function in
`src/backend/optimizer/util/clauses.c`.

Re: IPC::Run::time[r|out] vs our TAP tests

2025-02-20 Thread Daniel Gustafsson

> On 20 Feb 2025, at 14:06, Andrew Dunstan  wrote:

> Actually, since ok() and friends return true iff the test succeeds, instead of
> +ok(! $self->{timeout}->is_expired, 'psql query_until did not time out');
> +return undef if $self->{timeout}->is_expired;
> you can avoid doing the same test twice and say:
>  ok(! $self->{timeout}->is_expired, 'psql query_until did not time out') 
> || return undef;
> although for slightly technical reasons perlcritic disapproves of "return 
> undef" and prefers that you just write a bare "return" so we should also fix 
> that.
> Sorry for taking a second bite at the cherry.

Not at all, I agree that this is an improvement so fixed in the attached along
with a fresh pgperltidy.

--
Daniel Gustafsson



v4-0002-Report-test-failure-rather-than-aborting-in-case-.patch
Description: Binary data


v4-0001-Configure-interactive-instance-to-restart-timer.patch
Description: Binary data

Re: Fix logging for invalid recovery timeline

2025-02-20 Thread David Steele


On 2/19/25 19:45, Michael Paquier wrote:

On Wed, Feb 19, 2025 at 05:35:18PM +, David Steele wrote:

I like this idea but I would prefer to get the patch committed as-is first.
The reason is that I'm hoping to see this batch-patched (since it is a bug)
and that is less likely if the message wording is change.


(Had this thread flagged as a TODO for some time, sorry for not
chiming in earlier.)


No worries. Thank you for having a look.


Your idea would be perfect going forward, though.


We have a few logs that already track this information, but perhaps
that's better to track this extra element in the FATAL if you have
log_min_messages at fatal where LOG would not show up?  Feel free to
propose a separate patch if you think that this can be improved.


Benoit -- this was your idea. Did you want to submit a patch yourself?


At a27048cbcb58, the check is done based on the copy of the checkpoint
record, and the log is generated based on the checkpoint data in the
control file.  4a92a1c3d1c3 has begun retrieving the replay TLI from
the backup label.  v13 and v14 have this issue, but I'm not really
tempted to poke at the beast more than necessary as this code had a
lot of changes in the last couple of years, with few to no complaints
as far as I am aware.


Fair enough -- this is subtle enough that I doubt almost anyone is going 
to notice and the general content of the error message is not really 
changed by the incorrect values.



Applied down to v15 where we have xlogrecovery.c, then.  Thanks for
the report!


Thank you for the commit!

Regards,
-David

Re: Statistics Import and Export

2025-02-20 Thread Andrew Dunstan




On 2025-02-20 Th 4:39 AM, Jeff Davis wrote:

On Wed, 2025-02-12 at 19:00 -0800, Jeff Davis wrote:

I'm still reviewing v48, but I intend to commit something soon.

Committed with some revisions on top of v48:

* removed the short option -X, leaving the long option "--statistics-
only" with the same meaning.

* removed the redundant --with-statistics option for pg_upgrade,
because that's the default anyway.

* removed an unnecessary enum TocEntryType and cleaned up the API to
just pass the desired prefix directly to _printTocEntry().

* stabilized the 002_pg_upgrade test by turning off autovacuum before
the first pg_dumpall (we still want it to run before that to collect
stats).

* stabilized the 027_stream_regress recovery test by specifying --no-
statistics when comparing the data on primary and standby

* fixed the cross-version upgrade tests by using the
adjust_old_dumpfile to replace the version specifier with 00 in the
argument list to pg_restore_* functions.



The buildfarm doesn't like with this. 



The conversion regexes are wrong for versions < 10, where the major 
version is '9.x', but that just seems to be the tip of the iceberg.



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: Restrict publishing of partitioned table with a foreign table as partition

2025-02-20 Thread vignesh C

On Tue, 18 Feb 2025 at 15:59, Shlok Kyal  wrote:
>
> On Mon, 17 Feb 2025 at 20:13, vignesh C  wrote:
> >
> > On Fri, 14 Feb 2025 at 12:59, Shlok Kyal  wrote:
> > >
> > > I have used the changes suggested by you. Also I have updated the
> > > comments and the function name.
> >
> > There is another concurrency issue possible:
> > +/* Check if a partitioned table has a foreign partition */
> > +bool
> > +check_partrel_has_foreign_table(Form_pg_class relform)
> > +{
> > +   boolhas_foreign_tbl = false;
> > +
> > +   if (relform->relkind == RELKIND_PARTITIONED_TABLE)
> > +   {
> > +   List   *relids = NIL;
> > +
> > +   relids = find_all_inheritors(relform->oid, NoLock, NULL);
> >
> > Create a publication with publish_via_partition_root as true, hold the
> > execution after check_partrel_has_foreign_table execution finishes.
> > Then parallely execute the following:
> > CREATE TABLE t1(id int) PARTITION BY RANGE(id);
> > CREATE TABLE part1 PARTITION OF t1 FOR VALUES FROM (0) TO (5);
> > CREATE TABLE part2 PARTITION OF t1 FOR VALUES FROM (5) TO (15)
> > PARTITION BY RANGE(id);
> > CREATE FOREIGN TABLE part2_1 PARTITION OF part2 FOR VALUES FROM (10)
> > TO (15) SERVER fdw
> >
> > Now both the partitioned table having foreign table and a publication
> > will be created.
> >
>
> Hi Vignesh,
>
> I have addressed the above issue. If we take a ShareLock on the
> pg_class, we won't be able to create table concurrently, which may
> address the issue. Thoughts?
> I have attached the v8 patch here.

Since pg_class is a very common table it is not a good idea to take
ShareLock on it. Will it be possible to use pg_partitioned_table table
instead?

Regards,
Vignesh

Re: BitmapHeapScan streaming read user and prelim refactoring

On Wed, Feb 19, 2025 at 6:03 PM Thomas Munro  wrote:
>
> On Fri, Feb 14, 2025 at 9:32 AM Andres Freund  wrote:
> > I think we'll need to add some logic in read stream that only disables 
> > advice
> > after a longer sequential sequence. Writing logic for that shouldn't be too
> > hard, I think? Determining the concrete cutoffs is probably harder, 
> > although I
> > think even fairly simplistic logic will be "good enough".
>
> (Sorry for taking time to respond, I had to try some bogus things
> first before I hit on a principled answer.)
>
> Yeah, it is far too stupid.  I think I have figured out the ideal
> cutoff: just keep issuing advice for a given sequential run of blocks
> until the pread() end of the stream catches up with the *start* of it,
> if ever (ie until the kernel sees the actual sequential reads).

So like fadvise blocks 2,3,4,5, but then we see pread block 2 so stop
fadvising for that run of blocks?

> That turned out to require only a small tweak and one new variable.  It
> avoids those stalls on reads of sequential clusters >
> io_combine_limit, with no change in behaviour for pure sequential
> streams and random streams containing sequential clusters <=
> io_combine_limit that I'd previously been fixating on.

Hmm. My understanding must be incorrect because I don't see how this
would behave differently for these two IO patterns

fadvise 2,3,4,5, pread 2

fadvise 2,100,717,999, pread 2

> I've added a
> patch for that to the v2 of my read stream improvements series[1] for
> experimentation... will post shortly.

I don't think you've posted those yet. I'd like to confirm that these
work as expected before we merge the bitmap heap scan code.

> I also see a couple of other reasons why streaming BHS can be less
> aggressive with I/O than master: the silly arbitrary buffer queue cap,
> there's already a patch in the series for that but I have a slightly
> better one now and plan to commit it today, and silly heuristics for
> look-ahead distance reduction also related to sequential detection,
> which I'll explain with a patch on the other thread.  All of these are
> cases where I was basically a bit too chicken to open the throttle all
> the way in early versions.  Will post those at [1] too after lunch...
> more soon...

I'd like to hear more about the buffer queue cap, as I can't say I
remember what that is.

I'd also be interested to see how these other patches affect the BHS
performance.

Other than that, we need to decide on effective_io_concurrency
defaults, which has also been discussed on this thread.

- Melanie

Re: Extend postgres_fdw_get_connections to return remote backend pid

2025-02-20 Thread Sagar Shedge

Hi Fujii,

> Naming is always tricky, but remote_backend_pid feels a bit too long.
Would remote_pid be sufficient?
Point looks valid. I had another perspective is to align the naming
convention to pg_backend_pid(). remote_pid is not helping to identify
whether pid belongs to postgres backend or not. Does this make sense?
Or I'm fine to go with concise name like `remote_pid`

> "three values" should be changed to "four values".
Done. Good catch!!

> How about: "PID of the remote backend handling the connection" instead?
Updated in v2.

> Wouldn't it be better to return the result of PQbackendPID() instead of NULL
even when the connection is closed, for debugging purposes? This way,
users can see which remote backend previously handled the "closed" connection,
which might be helpful for troubleshooting.
Agree. Updated logic to return backend pid always except when pid is 0
which indicates no backend attached to session. Returning 0 found
misleading. What's your thought on this?

> The postgres_fdw regression test failed on my MacBook with the following diff:
I updated tests to make it more stable. Let me know if it's still
failing on your setup.


On Wed, Feb 19, 2025 at 2:19 PM Fujii Masao  wrote:
>
>
>
> On 2025/02/19 2:06, Sagar Shedge wrote:
> > Hi Fujii,
> >
> > On Tue, Feb 18, 2025 at 10:25 PM Fujii Masao
> >  wrote:
> >
> >>
> >> I assume you're planning to extend postgres_fdw_get_connections() to
> >> also return the result of PQbackendPID(entry->conn).
> >> However, the patch you attached doesn't seem to include that change.
> >> Did you attach the wrong patch?
> >
> > My bad!!
> > You are right. I was going through an old discussion and attached the
> > same old patch file.
> >
> > Please refer to the patch file to return the remote backend pid.
>
> Thanks for the patch!
>
> Here are my review comments:
>
> The documentation needs to be updated.
>
>
> +OUT closed boolean, OUT remote_backend_pid INTEGER)
>
> Naming is always tricky, but remote_backend_pid feels a bit too long.
> Would remote_pid be sufficient?
>
>
>   * For API version 1.2 and later, this function takes an input parameter
>   * to check a connection status and returns the following
>   * additional values along with the three values from version 1.1:
>
> "three values" should be changed to "four values".
>
>
> + * - remote_backend_pid - return remote server backend pid
>
> For consistency with other field comments, "return" seems unnecessary.
> How about: "PID of the remote backend handling the connection" instead?
>
>
> +   /*
> +* If a connection status is not closed and remote 
> backend
> +* ID is valid, return remote backend ID. Otherwise, 
> return NULL.
> +*/
> +   remote_backend_pid = PQbackendPID(entry->conn);
> +   if ((is_conn_closed != 1) && (remote_backend_pid != 
> 0))
> +   values[i++] = remote_backend_pid;
>
> Wouldn't it be better to return the result of PQbackendPID() instead of NULL
> even when the connection is closed, for debugging purposes? This way,
> users can see which remote backend previously handled the "closed" connection,
> which might be helpful for troubleshooting.
>
>
> The postgres_fdw regression test failed on my MacBook with the following diff:
>
> --- /Users/postgres/pgsql/git/contrib/postgres_fdw/expected/postgres_fdw.out  
>   2025-02-19 12:53:27
> +++ /Users/postgres/pgsql/git/contrib/postgres_fdw/results/postgres_fdw.out   
>   2025-02-19 17:40:04
> @@ -12443,7 +12443,7 @@
> FROM postgres_fdw_get_connections(true);
>server_name | remote_conn_closed | remote_backend_pid
>   -++
> - loopback| false  | available
> + loopback| true   | available
>   (1 row)
>
>   -- After terminating the remote backend, since the connection is closed,
> @@ -12458,7 +12458,7 @@
> FROM postgres_fdw_get_connections(true);
>server_name | remote_conn_closed | remote_backend_pid
>   -++
> - loopback| true   | not available
> + loopback| true   | available
>   (1 row)
>
>
> Regards,
>
> --
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION
>


-- 
Sagar Dilip Shedge,
SDE AWS


v02_add_remote_backend_pid.patch
Description: Binary data

Re: Strange assertion in procarray.c

2025-02-20 Thread Mihail Nikalayeu

Hello!

Noah, I have noticed you already disabled runningcheck for isolation tests
already in injection_points[0].
The whole patch here was about to make it default for all types of tests
for injection_points.

Seems like we may close this entry.

Attached patch is just to put a rebased version here for history reasons.
[0]:
https://github.com/postgres/postgres/blob/b3ac4aa83458b1e3cc8299508a8c3e0e1490cb23/src/test/modules/injection_points/meson.build#L52
From b3ac4aa83458b1e3cc8299508a8c3e0e1490cb23 Mon Sep 17 00:00:00 2001
From: nkey 
Date: Thu, 20 Feb 2025 18:04:20 +0300
Subject: [PATCH v3] Test to reproduce issue with crash caused by passing throw
  injection points during transaction aborting (caused by call to 
 injection_init_shmem).

---
 src/backend/utils/time/snapmgr.c  |  2 +
 src/test/modules/injection_points/Makefile|  2 +-
 .../injection_points/expected/crash.out   | 26 ++
 src/test/modules/injection_points/meson.build |  1 +
 .../modules/injection_points/specs/crash.spec | 47 +++
 5 files changed, 77 insertions(+), 1 deletion(-)
 create mode 100644 src/test/modules/injection_points/expected/crash.out
 create mode 100644 src/test/modules/injection_points/specs/crash.spec

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 8f1508b1ee2..3d018c3a1e8 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -64,6 +64,7 @@
 #include "utils/resowner.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
+#include "utils/injection_point.h"
 
 
 /*
@@ -388,6 +389,7 @@ InvalidateCatalogSnapshot(void)
 		pairingheap_remove(&RegisteredSnapshots, &CatalogSnapshot->ph_node);
 		CatalogSnapshot = NULL;
 		SnapshotResetXmin();
+		INJECTION_POINT("invalidate_catalog_snapshot_end");
 	}
 }
 
diff --git a/src/test/modules/injection_points/Makefile b/src/test/modules/injection_points/Makefile
index e680991f8d4..82e0e772b80 100644
--- a/src/test/modules/injection_points/Makefile
+++ b/src/test/modules/injection_points/Makefile
@@ -14,7 +14,7 @@ PGFILEDESC = "injection_points - facility for injection points"
 REGRESS = injection_points hashagg reindex_conc
 REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
 
-ISOLATION = basic inplace syscache-update-pruned
+ISOLATION = basic inplace syscache-update-pruned crash
 
 TAP_TESTS = 1
 
diff --git a/src/test/modules/injection_points/expected/crash.out b/src/test/modules/injection_points/expected/crash.out
new file mode 100644
index 000..7d7f298c786
--- /dev/null
+++ b/src/test/modules/injection_points/expected/crash.out
@@ -0,0 +1,26 @@
+Parsed test spec with 4 sessions
+
+starting permutation: s4_attach_locally wx1 rxwy2 c1 ry3 c2 c3
+injection_points_set_local
+--
+  
+(1 row)
+
+step s4_attach_locally: SELECT injection_points_attach('invalidate_catalog_snapshot_end', 'wait');
+injection_points_attach
+---
+   
+(1 row)
+
+step wx1: update D1 set id = id + 1;
+step rxwy2: update D2 set id = (select id+1 from D1);
+step c1: COMMIT;
+step ry3: select id from D2;
+id
+--
+ 1
+(1 row)
+
+step c2: COMMIT;
+ERROR:  could not serialize access due to read/write dependencies among transactions
+step c3: COMMIT;
diff --git a/src/test/modules/injection_points/meson.build b/src/test/modules/injection_points/meson.build
index d61149712fd..f4981a8c118 100644
--- a/src/test/modules/injection_points/meson.build
+++ b/src/test/modules/injection_points/meson.build
@@ -45,6 +45,7 @@ tests += {
   'isolation': {
 'specs': [
   'basic',
+  'crash',
   'inplace',
   'syscache-update-pruned',
 ],
diff --git a/src/test/modules/injection_points/specs/crash.spec b/src/test/modules/injection_points/specs/crash.spec
new file mode 100644
index 000..55e6a5434ab
--- /dev/null
+++ b/src/test/modules/injection_points/specs/crash.spec
@@ -0,0 +1,47 @@
+setup
+{
+ create table D1 (id int primary key not null);
+ create table D2 (id int primary key not null);
+ insert into D1 values (1);
+ insert into D2 values (1);
+
+ CREATE EXTENSION injection_points;
+}
+
+teardown
+{
+ DROP TABLE D1, D2;
+ DROP EXTENSION injection_points;
+}
+
+session s1
+setup		{ BEGIN ISOLATION LEVEL SERIALIZABLE;}
+step wx1	{ update D1 set id = id + 1; }
+step c1		{ COMMIT; }
+
+session s2
+setup		{
+	BEGIN ISOLATION LEVEL SERIALIZABLE;
+}
+step rxwy2	{ update D2 set id = (select id+1 from D1); }
+step c2		{ COMMIT; }
+
+session s3
+setup		{ BEGIN ISOLATION LEVEL SERIALIZABLE; }
+step ry3	{ select id from D2; }
+step c3		{ COMMIT; }
+
+session s4
+setup	{
+	SELECT injection_points_set_local();
+}
+step s4_attach_locally	{ SELECT injection_points_attach('invalidate_catalog_snapshot_end', 'wait'); }
+
+permutation
+	s4_attach_locally
+	wx1
+	rxwy2
+	c1
+	ry3
+	c2
+	c3
\ No newline at end of file
-- 
2.43.0

Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-02-20 Thread Daniel Gustafsson

> On 19 Feb 2025, at 15:13, Daniel Gustafsson  wrote:

> Unless something shows up I plan to commit this sometime tomorrow to allow it
> ample time in the tree before the freeze.

I spent a few more hours staring at this, and ran it through a number of CI and
local builds, without anything showing up.  Pushed to master with the first set
of buildfarm animals showing green builds.

--
Daniel Gustafsson

Re: BitmapHeapScan streaming read user and prelim refactoring

On Sun, Feb 16, 2025 at 7:29 AM Tomas Vondra  wrote:
>
> On 2/16/25 02:15, Tomas Vondra wrote:
> >
> > ...
> >
> > OK, I've uploaded the results to the github repository as usual
> >
> >   https://github.com/tvondra/bitmapscan-tests/tree/main/20250214-184807
> >
> > and I've generated the same PDF reports, with the colored comparison.
> >
> > If you compare the pivot tables (I opened the "same" PDF from the two
> > runs and flip between them using alt-tab, which makes the interesting
> > regions easy to spot), the change is very clear.
> >
> > Disabling the sequential detection greatly reduces the scope of
> > regressions. That looks pretty great, IMO.
> >
> > It also seems to lose some speedups, especially with io_combine_limit=1
> > and eic=1. I'm not sure why, if that's expected, etc.

Yea, I don't know why these would change with vs without sequential detection.
Honestly, for serial bitmap heap scan, I would assume that eic 1,
io_combine_limit 1 would be the same as master (i.e. issuing at least
the one fadvise and no additional latency introduced by trying to
combine IOs). So both the speedup and slowdown are a bit surprising to
me.

> > There still remain areas of regression, but most of them are for cases
> > that'd use index scan (tiny fraction of rows scanned), or with
> > read-ahead=4096 (and not for the lower settings).
> >
> > The read-ahead dependence is actually somewhat interesting, because I
> > realized the RAID array has this set to 8192 by default, i.e. even
> > higher than 4096 where it regresses. I suppose mdadm does that, or
> > something, I don't know how the default is calculated. But I assume it
> > depends on the number of devices, so larger arrays might have even
> > higher read-ahead values.

Are the readahead regressions you are talking about the ones with
io_combine_limit > 1 and effective_io_concurrency 0 (at higher
readahead values)? With these, we expect that no fadvises and higher
readahead values means the OS will do good readahead for us. But
somehow the read combining seems to be interfering with this.

This is curious. It seems like it must have something to do with the
way the kernel is calculating readahead size. While the linux kernel
readahead readme [1] is not particularly easy to understand, there are
some parts that stick out to me:

* The size of the async tail is determined by subtracting the size that
 * was explicitly requested from the determined request size, unless
 * this would be less than zero - then zero is used.

I wonder if somehow the larger IO requests are foiling readahead.
Honestly, I don't know how far we can get trying to figure this out.
And explicitly reverse engineering this may backfire in other ways.

- Melanie

[1] https://github.com/torvalds/linux/blob/master/mm/readahead.c

Re: Improve pgindent exclude handling: ignore empty lines

2025-02-20 Thread Andrew Dunstan




On 2025-02-18 Tu 3:02 PM, Zsolt Parragi wrote:

Hello

You are right, when I was writing the commit message I was thinking
about trim, but the script only uses chomp.

I edited the commit message to correctly describe the new behavior:
ignoring only the empty line, as that was my main intention with the
patch.





pushed, thanks.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: BitmapHeapScan streaming read user and prelim refactoring

On Wed, Feb 19, 2025 at 8:29 AM Jakub Wartak
 wrote:
>
> On Fri, Feb 14, 2025 at 7:16 PM Andres Freund  wrote:
>
> > Melanie has worked on this a fair bit, fwiw.
> >
> > My current thinking is that we'd want something very roughly like TCP
> > BBR. Basically, it predicts the currently available bandwidth not just via
> > lost packets - the traditional approach - but also by building a continually
> > updated model of "bytes in flight" and latency and uses that to predict what
> > the achievable bandwidth is.[..]
>
> Sadly that doesn't sound like PG18, right? (or I missed some thread,
> I've tried to watch Melanie's presentation though )

Yes, I spent about a year researching this. My final algorithm didn't
make it into a presentation, but the basic idea was to track how the
prefetch distance was affecting throughput on a per IO basis and push
the prefetch distance up when doing so was increasing throughput and
down when throughput isn't increasing. Doing this in a sawtooth
pattern ultimately would allow the prefetch distance to adapt to
changing system resources.

The actual algorithm was more complicated than this, but that was the
basic premise. It worked well in simulations.

- Melanie

Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-02-20 Thread Jacob Champion

On Thu, Feb 20, 2025 at 8:30 AM Daniel Gustafsson  wrote:
> I spent a few more hours staring at this, and ran it through a number of CI 
> and
> local builds, without anything showing up.  Pushed to master with the first 
> set
> of buildfarm animals showing green builds.

Thank you!! And _huge thanks_ to everyone who's reviewed and provided
feedback. I'm going to start working with Andrew on getting the new
tests going in the buildfarm.

If you've been reading along and would like to get started with OAuth
validators and flows, but don't know where to start, please reach out.
The proof of the new APIs will be in the using, and the best time to
tell me if you hate those APIs is now :D

--Jacob

Re: Proposal: pg_is_volatile function

2025-02-20 Thread Pavel Stehule

Hi

čt 20. 2. 2025 v 13:48 odesílatel Andrew Farries 
napsal:

> I'd like to propose a new function `pg_is_volatile` that would test and
> return
> the volatility of its argument expression. Example uses of the function
> would
> be:
>
> pg_is_volatile(1) -> false
> pg_is_volatile(random()) -> true
>
> The motivation for the proposal is to allow testing of column default
> expressions for new columns added with `ALTER TABLE ... ADD COLUMN` before
> adding the column. This is to determine whether the column default will be
> able
> to take advantage of the fast-path optimization for non-volatile column
> defaults, or whether a full table rewrite will be required.
>
> For a schema migration tool, it's desirable for the tool to assess the
> volatility of a column default for a new column before adding it. The tool
> can
> then decide on the most appropriate way to add the column, either doing so
> directly for a non-volatile default, or issuing a warning or using some
> other
> method in the case of a volatile default.
>
> The documentation for this function would be as follows:
>
> ```
>   
>
> 
>  pg_is_volatile
> 
> pg_is_volatile ( "any" )
> boolean
>
>
> Tests whether the argument expression contains volatile functions (see
> ). This can be useful to determine
> whether the expression can be used as a column default without causing
> a table rewrite.
>
>   
> ```
>
> I believe the implementation of this function would be straightforward
> with a
> new function in `src/backend/utils/adt/misc.c` delegating to the existing
> `contain_volatile_functions_after_planning` function in
> `src/backend/optimizer/util/clauses.c`.
>
>
If this feature can be implemented, then it needs to be implemented like a
pseudo function, it cannot not be a classic function.

But for your use case you should probably check if the function is stable
too? So maybe you should check if the expression is immutable.

Maybe this function can be designed like pg_get_expr_volatility(expr) and
returns v, s or i

Probably this functionality should not be used for any other cases, so I
can imagine different and maybe more generic solutions. We can introduce
the function
`pg_does_relation_rewrite(sqlstr)`. Then the external tool can have less
knowledge about pg internals, and doesn't need to parse the command to
separate the expression.

Regards

Pavel

Re: GetRelationPath() vs critical sections

2025-02-20 Thread Andres Freund

Hi,

On 2025-02-20 14:00:10 +1300, Thomas Munro wrote:
> On Wed, Feb 19, 2025 at 3:35 PM Andres Freund  wrote:
> > After thinking about this for an embarassingly long time, I think there's
> > actually a considerably better answer for a case like this: A function that
> > returns a fixed-length string by value:
> >
> > - Compilers can fairly easily warn about on-stack values that goes out of
> >   scope
> >
> > - Because we don't need to free the memory anymore, some code that that
> >   previously needed to explicitly free the memory doesn't need to anymore
> >   (c.f. AbortBufferIO()).
> >
> > - The max lenght isn't that long, so it's actually reasonably efficient,
> >   likely commonly cheaper than psprintf.
> 
> I like it!

Does anybody have opinions about whether we should keep a backward compatible
interface in place or not?

Via https://codesearch.debian.net/ I tried to look for
references.

Unfortunately I had to exclude "relpath" as there are just too many
independent hits, due to the python function of the same name. For
relpathperm(), relpathbackend(), GetRelationPath() there looks to be just
fincore.  There also are two copies of our code, but those we don't need to
care about (libpg-query and ruby-pg-query), since they're just going to copy
the new code when updating.

I also looked for matches including relpath that had "fork" elsewhere in the
file, it's hard to see a potential use of relpath() not using fork in the
arguments or such.

Which makes me think it's not worth having a backward compatible interface?

Greetings,

Andres Freund

RE: SIMD optimization for list_sort

2025-02-20 Thread Devulapalli, Raghuveer

> Note that our current implemention is highly optimized for low-cardinality 
> inputs.
> This is needed for aggregate queries. I found this write-up of a couple 
> scalar and
> vectorized sorts, and they show this library doing very poorly on very-low
> cardinality inputs. I would look into that before trying to get something in 
> shape to
> share.
> 
> https://github.com/Voultapher/sort-research-
> rs/blob/main/writeup/intel_avx512/text.md

That write up is fairly old and those perf problems has subsequently been 
fixed. See https://github.com/intel/x86-simd-sort/pull/127 and 
https://github.com/intel/x86-simd-sort/pull/168. I still suggest measuring perf 
here for thoroughness. 

> 
> There is also the question of hardware support. It seems AVX-512 is not
> supported well on client side, where most developers work. And availability of
> any flavor is not guaranteed on server either.
> Something to keep in mind.

simd-sort also works on avx2 which is widely available. I would suggest 
benchmarking on one of the client laptops to measure the perf. 

Raghuveer

Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

On Wed, Feb 19, 2025 at 1:56 AM Bertrand Drouvot
 wrote:
>
> Hi,

Thank you for looking at the patches.

>
> On Mon, Feb 17, 2025 at 12:07:56PM -0800, Masahiko Sawada wrote:
> > On Fri, Feb 14, 2025 at 2:35 AM Bertrand Drouvot
> >  wrote:
> > > Yeap, that was exactly my point when I mentioned the custodian thread 
> > > (taking
> > > into account Tom's comment quoted above).
> > >
> >
> > I've written PoC patches to have the online wal_level change work use
> > a more generic infrastructure. These patches are still in PoC state
> > but seem like a good direction to me. Here is a brief explanation for
> > each patch.
>
> Thanks for the patches!
>
> > * The 0001 patch introduces "reserved background worker slots". We
> > allocate max_process_workers + BGWORKER_CLASS_RESERVED at startup, and
> > if the number of running bgworker exceeds max_worker_processes, only
> > workers using the reserved slots can be launched. We can request to
> > use the reserved slots by adding BGWORKER_CLASS_RESERVED flag at
> > bgworker registration.
>
> I had a quick look at 0001 and I think the way that's implemented is 
> reasonnable.
> I thought this could be defined through a GUC so that extensions can benefit
> from it. But OTOH the core code should ensure the value is > as the number of
> reserved slots needed by the core so not using a GUC looks ok to me.

Interesting idea. I kept the reserved slots only for internal use but
it would be worth considering to use GUC instead.

> > * The 0002 patch introduces "bgtask worker". The bgtask infrastructure
> > is designed to execute internal tasks in background in
> > one-worker-per-one-task style. Internally, bgtask workers use the
> > reserved bgworker so it's guaranteed that they can launch.
>
> Yeah.
>
> > The
> > internal tasks that we can request are predefined and this patch has a
> > dummy task as a placeholder. This patch implements only the minimal
> > functionality for the online wal_level change work. I've not tested if
> > this bgtask infrastructure can be used for tasks that we wanted to
> > offload to the custodian worker.
>
> Again, I had a quick look and looks simple enough of our need here. It "just"
> executes "(void) InternalBgTasks[type].func()" and then exists. That's, I 
> think,
> a good starting point to add more tasks in the future (if we want to).

Yeah, we might want to extend it further, for example to pass an
argument to the background task or to ask multiple tasks for the
single bgtask worker. As far as I can read the custodian patch set,
the work of removing temp files seems not to require any argument
though.

>
> > * The 0003 patch makes wal_level a SIGHUP parameter. We do the online
> > wal_level change work using the bgtask infrastructure. There are no
> > major changes from the previous version other than that.
>
> It replaces the dummy task introduced in 0002 by the one that suits our needs
> here (through the new BgTaskWalLevelChange() function).
>
> The design looks reasonable to me. Waiting to see if others disagree before
> looking more closely at the code.

Thanks.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: Proposal: pg_is_volatile function

Pavel Stehule  writes:
> čt 20. 2. 2025 v 13:48 odesílatel Andrew Farries 
> napsal:
>> I believe the implementation of this function would be straightforward
>> with a
>> new function in `src/backend/utils/adt/misc.c` delegating to the existing
>> `contain_volatile_functions_after_planning` function in
>> `src/backend/optimizer/util/clauses.c`.

> If this feature can be implemented, then it needs to be implemented like a
> pseudo function, it cannot not be a classic function.

Yeah, this doesn't seem anywhere near as straightforward to implement
as all that.  For one thing, you'd probably rather that the argument
expression not be evaluated at all, which an ordinary function could
not prevent.  But there's also semantic questions about where exactly
in the planning process we ought to try to capture volatility.
Here are a couple of examples:

pg_is_volatile(CASE WHEN 0 > 1 THEN random() ELSE 0 END)

We'd get different answers if we allow const-folding to happen
before examining the expression than if we don't.

create function myf() returns int as 'select 1' language sql;

pg_is_volatile(myf())

myf() is marked volatile (by default), but after function inlining
the expression would just be a constant.  Which answer do you want?

For the specific use-case of determining what ALTER TABLE will do,
we'd want to determine the answer the same way ALTER TABLE will.
But I can foresee people trying to use the function for other
purposes and then complaining because they want some other answer.

On the whole, I disapprove of this whole approach to figuring out
what ALTER TABLE will do: there are too many moving parts in that
question, and this can answer only one part.  It leaves far too
much to the user to know about what other things will affect their
results.

What we have speculated about in the past is extending EXPLAIN
so that it can be applied to ALTER TABLE and other complicated
utility commands, and then for ALTER TABLE one bit of info it would
give you is whether a table rewrite (or even a table scan) is
required.  Obviously, that's a major project, and so nobody's
tackled it yet AFAIK.

regards, tom lane

Patch Review Workshop

2025-02-20 Thread Paul Jungwirth

Hello Hackers,

I'm organizing a Postgres Patch Review Workshop as a way to help new Postgres contributors get
experience and tips reviewing patches. Personally, I've been trying to review patches for years, but
I've missed most commitfests, and I've never been very confident in my feedback. Reviewing patches
is hard! But perhaps by working together it can be easier. So this workshop will be like pair
programming for patch review.

The plan is to form teams of 2-3 people, who will pick a patch and review it. Then we'll meet in a
larger group, and teams will spend 10-20 minutes presenting the patch and their feedback. We can
schedule more than one meeting if there is enough interest. Probably 3-4 groups can present in one
meeting. Perhaps we can have some veteran committers attend and review the reviews.

The March commitfest is starting soon, so I'd like to hear from people who want to participate. I'll
build a list and play matchmaker (but if you already know who you want to work with, let me know).
Please use this link to sign up:
https://docs.google.com/forms/d/e/1FAIpQLSeCI0V5xUgt-HOIZ1oeyEX-ElwopPY27WwwfpBqJ__RP7XC8A/viewform

If you are a longtime Postgres contributor and would be willing to attend the final meeting to offer
advice about the reviews, I'd love to get your help too! Please use that link to sign up.

We also have a Discord channel, #patch-review-workshop, on the PostgreSQL Hacking server:
https://discord.com/channels/1258108670710124574/1342147702804447354

Later, once you've sent your review to pgsql-hackers, send me a link. Then we'll schedule the larger
meeting(s), based on how many people completed the "assignment". I expect these will happen in late
March or early April.

Thanks to Robert Haas and Andres Freund for inspiring this! They did something similar at PgConf
2024 in Vancouver. I was at that workshop, and it was a great experience. I'm not planning to vet
patches ahead of time as they did---which sounds like a ton of work. Pick whatever patch you like.

I also don't want to reinvent anything from the commitfest app. My goal is just to help people find
partners and then present their feedback. Please add yourself as a reviewer in the commitfest app,
so that people don't all review the same thing!

Here are some links about reviewing patches:

- https://wiki.postgresql.org/wiki/Reviewing_a_Patch
- Reviewing Postgres Patches for Fun and Profit, David Steele:
https://www.youtube.com/watch?v=FzNXFJ2-r0s

Yours,

--
Paul ~{:-)
p...@illuminatedcomputing.com

Re: Improvement of var_eq_non_const()

Teodor Sigaev  writes:
> I'd like to suggest to improve var_eq_non_const() by using knowledge of MCV 
> and 
> estimate the selectivity as quadratic mean of non-null fraction divided by 
> number of distinct values (as it was before) and set of MCV selectivities.

What's the statistical interpretation of this calculation (that is,
the average MCV selectivity)?  Maybe it's better, but without any
context it seems like a pretty random thing to do.  In particular,
it seems like this could give radically different answers depending
on how many MCVs we chose to store, and I'm not sure we could argue
that the result gets more accurate with more MCVs stored.

regards, tom lane

RE: Improve CRC32C performance on SSE4.2

2025-02-20 Thread Devulapalli, Raghuveer


> Now, there is no equivalent on MSVC and workarounds are fragile [1].
> Maybe we could only assert initialization happened for the backend and for
> frontend either
> - add a couple manual initializations to to the frontend programs where we 
> don't
> want to lose performance for non-gcc/clang.
> - require CRC on x86-64 MSVC since Windows 10 is EOL soon, going by Thomas
> M.'s earlier findings on popcount (also SSE4.2) [2]
> 
> The first is less risky but less tidy.

Agree, let me think about this but not sure if I have any useful suggestions 
here. MSVC is unfortunately not my strong suit :/ 

Raghuveer

Re: New buildfarm animals with FIPS mode enabled

2025-02-20 Thread Mark Wong

On Tue, Feb 18, 2025 at 02:41:18PM +0100, Álvaro Herrera wrote:
> On 2025-Feb-17, Daniel Gustafsson wrote:
> 
> > On 17 Feb 2025, at 20:23, Tom Lane  wrote:
> 
> > > Obviously, we could talk about extending the regression tests'
> > > support for these cases, but I'm really dubious that it's worth
> > > the work.
> > 
> > Agreed.
> 
> This means that unless Mark is willing to install OpenSSL 3 from source,
> these buildfarm animals are not viable.  I'll wait for Mark to confirm,
> but given the number of animals he maintains, I think it's not really
> feasible to have some which require individual patching work.

Yeah, I can install OpenSSL 3 from source.  I'm trying make better use
of Ansible to help.

Regards,
Mark

broken munhttps://github.com/munin-monitoring/contrib/issues/1483in plugins for PostgreSQL 17

2025-02-20 Thread Pavel Stehule

Hi

Can somebody contact munin maintainers?

The some plugins for PostgreSQL are particulari broken for PostgreSQL 15,
but almost worked.
When we migrated to PostgreSQL 17, then two plugins were fully broken.

https://github.com/munin-monitoring/contrib/issues/1483

Unfortunately, the munin doesn't look too healthy.

Regards

Pavel

Re: GetRelationPath() vs critical sections

2025-02-20 Thread Noah Misch

On Thu, Feb 20, 2025 at 12:40:57PM -0500, Andres Freund wrote:
> On 2025-02-20 14:00:10 +1300, Thomas Munro wrote:
> > On Wed, Feb 19, 2025 at 3:35 PM Andres Freund  wrote:
> > > After thinking about this for an embarassingly long time, I think there's
> > > actually a considerably better answer for a case like this: A function 
> > > that
> > > returns a fixed-length string by value:
> > >
> > > - Compilers can fairly easily warn about on-stack values that goes out of
> > >   scope
> > >
> > > - Because we don't need to free the memory anymore, some code that that
> > >   previously needed to explicitly free the memory doesn't need to anymore
> > >   (c.f. AbortBufferIO()).
> > >
> > > - The max lenght isn't that long, so it's actually reasonably efficient,
> > >   likely commonly cheaper than psprintf.
> > 
> > I like it!

Works for me.

> Unfortunately I had to exclude "relpath" as there are just too many
> independent hits, due to the python function of the same name. For
> relpathperm(), relpathbackend(), GetRelationPath() there looks to be just
> fincore.

PGXN has few hits, and some of these are false positives or otherwise
irrelevant:

$ grep -re '[^.]\(relpath[a-z]*\|GetRelationPath\)(' | sed 's/-[^:]*/:/'|sort -u
db2_fdw::extern char *GetRelationPath(Oid dbNode, Oid spcNode, Oid relNode,
jsoncdc::pub fn GetRelationPath(dbNode: Oid, spcNode: Oid, relNode: Oid,
openbarter::path = relpath(frame.f_code.co_filename, refdir)  # 
relative to refdir
pg_bulkload::#define relpath(rnode, forknum)relpath((rnode))
pg_bulkload::   fname = relpath(bknode, MAIN_FORKNUM);
pg_bulkload::   fname = relpath(rnode, MAIN_FORKNUM);
pg_repack::#define relpath(rnode, forknum)  relpath((rnode))
plv8::  def _to_relpath(self, abspath, _):
plv8::  def _to_relpath(self, abspath, test_root):
plv8::  yield self._to_relpath(abspath, test_root)
tblsize_nolock::relationpath = relpath(*rfn);

> Which makes me think it's not worth having a backward compatible interface?

Agreed.  Even if 100% of those matches had to change, that's below standard
level of breakage for a major release.

Re: GetRelationPath() vs critical sections

Andres Freund  writes:
> Does anybody have opinions about whether we should keep a backward compatible
> interface in place or not?

I vote for "not" --- doesn't seem like there'll be much external
code affected, and we make comparably-sized API breaks all the time.

As a matter of style, I wonder if it'd be better to have these
functions write into a caller-supplied variable.  That seems more
in keeping with most other places in Postgres, and it would save
a copying step in cases where the caller needs the result on the
heap.  I realize that returning structs has been in C for decades,
but that doesn't mean I want some of our APIs doing it one way and
some the other.

regards, tom lane

Re: BUG #18815: Logical replication worker Segmentation fault

2025-02-20 Thread Sergey Belyashov

I have applied the patch. Replication now works. Thank you.

Regards

чт, 20 февр. 2025 г. в 00:55, Tom Lane :
>
> Sergey Belyashov  writes:
> > The 4th patch is not applicable for the current REL_17_STABLE branch.
> > There are a lot of differences from master in the worker.c.
>
> If you want to try the committed v17 patch, see
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=788baa9a25ae83b576621166367c868d3329b4d4
>
> Thanks again for the report!
>
> regards, tom lane

Re: Proposal: pg_is_volatile function

2025-02-20 Thread Pavel Stehule

Hi


> What we have speculated about in the past is extending EXPLAIN
> so that it can be applied to ALTER TABLE and other complicated
> utility commands, and then for ALTER TABLE one bit of info it would
> give you is whether a table rewrite (or even a table scan) is
> required.  Obviously, that's a major project, and so nobody's
> tackled it yet AFAIK.
>

I though same idea, using EXPLAIN for this purpose can be nice and intuitive

Regards

Pavel



>
> regards, tom lane
>

Re: Restrict copying of invalidated replication slots

On Wed, Feb 19, 2025 at 3:46 AM Shlok Kyal  wrote:
>
> On Tue, 18 Feb 2025 at 15:26, Zhijie Hou (Fujitsu)
>  wrote:
> >
> > On Monday, February 17, 2025 7:31 PM Shlok Kyal  
> > wrote:
> > >
> > > On Thu, 13 Feb 2025 at 15:54, vignesh C  wrote:
> > > >
> > > > On Tue, 4 Feb 2025 at 15:27, Shlok Kyal  
> > > > wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > Currently, we can copy an invalidated slot using the function
> > > > > 'pg_copy_logical_replication_slot'. As per the suggestion in the
> > > > > thread [1], we should prohibit copying of such slots.
> > > > >
> > > > > I have created a patch to address the issue.
> > > >
> > > > This patch does not fix all the copy_replication_slot scenarios
> > > > completely, there is a very corner concurrency case where an
> > > > invalidated slot still gets copied:
> > > > +   /* We should not copy invalidated replication slots */
> > > > +   if (src_isinvalidated)
> > > > +   ereport(ERROR,
> > > > +
> > > > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > > > +errmsg("cannot copy an invalidated
> > > > replication slot")));
> > > >
> > > > Consider the following scenario:
> > > > step 1) Set up streaming replication between the primary and standby 
> > > > nodes.
> > > > step 2) Create a logical replication slot (test1) on the standby node.
> > > > step 3) Have a breakpoint in InvalidatePossiblyObsoleteSlot if cause
> > > > is RS_INVAL_WAL_LEVEL, no need to hold other invalidation causes or
> > > > add a sleep in InvalidatePossiblyObsoleteSlot function like below:
> > > > if (cause == RS_INVAL_WAL_LEVEL)
> > > > {
> > > > while (bsleep)
> > > > sleep(1);
> > > > }
> > > > step 4) Reduce wal_level on the primary to replica and restart the 
> > > > primary
> > > node.
> > > > step 5) SELECT 'copy' FROM pg_copy_logical_replication_slot('test1',
> > > > 'test2');  -- It will wait till the lock held by
> > > > InvalidatePossiblyObsoleteSlot is released while trying to create a
> > > > slot.
> > > > step 6) Increase wal_level back to logical on the primary node and
> > > > restart the primary.
> > > > step 7) Now allow the invalidation to happen (continue the breakpoint
> > > > held at step 3), the replication control lock will be released and the
> > > > invalidated slot will be copied
> > > >
> > > > After this:
> > > > postgres=# SELECT 'copy' FROM
> > > > pg_copy_logical_replication_slot('test1', 'test2');  ?column?
> > > > --
> > > >  copy
> > > > (1 row)
> > > >
> > > > -- The invalidated slot (test1) is copied successfully:
> > > > postgres=# select * from pg_replication_slots ;
> > > >  slot_name |plugin | slot_type | datoid | database | temporary
> > > > | active | active_pid | xmin | catalog_xmin | restart_lsn |
> > > > confirmed_flush_lsn | wal_status | safe_wal_size | two_phas
> > > > e |  inactive_since  | conflicting |
> > > > invalidation_reason   | failover | synced
> > > >
> > > ---+---+---++--+---+
> > > ++--+--+-+---
> > > --++---+-
> > > >
> > > --+--+-+--
> > > --+--+
> > > >  test1 | test_decoding | logical   |  5 | postgres | f
> > > > | f  ||  |  745 | 0/4029060   | 0/4029098
> > > >  | lost   |   | f
> > > >   | 2025-02-13 15:26:54.666725+05:30 | t   |
> > > > wal_level_insufficient | f| f
> > > >  test2 | test_decoding | logical   |  5 | postgres | f
> > > > | f  ||  |  745 | 0/4029060   | 0/4029098
> > > >  | reserved   |   | f
> > > >   | 2025-02-13 15:30:30.477836+05:30 | f   |
> > > >  | f| f
> > > > (2 rows)
> > > >
> > > > -- A subsequent attempt to decode changes from the invalidated slot
> > > > (test2) fails:
> > > > postgres=# SELECT data FROM pg_logical_slot_get_changes('test2', NULL,
> > > > NULL);
> > > > WARNING:  detected write past chunk end in TXN 0x5e77e6c6f300
> > > > ERROR:  logical decoding on standby requires "wal_level" >= "logical"
> > > > on the primary
> > > >
> > > > -- Alternatively, the following error may occur:
> > > > postgres=# SELECT data FROM pg_logical_slot_get_changes('test2', NULL,
> > > > NULL);
> > > > WARNING:  detected write past chunk end in TXN 0x582d1b2d6ef0
> > > > data
> > > > 
> > > >  BEGIN 744
> > > >  COMMIT 744
> > > > (2 rows)
> > > >
> > > > This is an edge case that can occur under specific conditions
> > > > involving replication slot invalidation when there is a huge lag
> > > > between primary and standby.
> > > > There might be a similar concurrency case for wal_removed too.
> > > >
> > >
> > > Hi Vignesh,
> > >
> > > Thanks for reviewing the patch.
> >
> > Thanks for updating the patch. I have a question

Re: [PoC] Federated Authn/z with OAUTHBEARER

Daniel Gustafsson  writes:
> I spent a few more hours staring at this, and ran it through a number of CI 
> and
> local builds, without anything showing up.  Pushed to master with the first 
> set
> of buildfarm animals showing green builds.

After doing an in-tree "make check", I see

$ git status
On branch master
Your branch is up to date with 'origin/master'.

Untracked files:
  (use "git add ..." to include in what will be committed)
src/test/modules/oauth_validator/oauth_hook_client

Looks like we're short a .gitignore entry.  (It does appear that
"make clean" cleans it up, at least.)

regards, tom lane

Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-02-20 Thread Jacob Champion

On Thu, Feb 20, 2025 at 12:07 PM Tom Lane  wrote:
> Looks like we're short a .gitignore entry.  (It does appear that
> "make clean" cleans it up, at least.)

So we are! Sorry about that. The attached patch gets in-tree builds
clean for me again.

Thanks,
--Jacob


fix-gitignore.patch
Description: Binary data

Re: GetRelationPath() vs critical sections

2025-02-20 Thread Andres Freund

Hi,

On 2025-02-20 14:11:16 -0500, Tom Lane wrote:
> As a matter of style, I wonder if it'd be better to have these
> functions write into a caller-supplied variable.

I wondered about that too, it does however make some code more awkward,
e.g. because there's not a good surrounding block to put the variable in.

I mildly prefer the return-by-value approach, but not more.


> That seems more in keeping with most other places in Postgres, and it would
> save a copying step in cases where the caller needs the result on the heap.

I don't think there are many cases where we have to then copy the value to the
heap, fwiw, with a more complete patch.


It's perhaps worth noting that on most (all?) architectures the *caller* will
reserve space for the return value of functions that return structs by value.


In general it'll be easier for the compiler to optimize code where it knows
the lifetime of the relevant memory, which is harder for a function that gets
passed a "target" memory location than when it's returning by value. But of
course it's very hard to believe this ever matters for the case at hand.


The patch curently uses a hardcoded 6 for the length of MAX_BACKENDS. Does
anybody have a good idea for how to either

a) derive 6 from MAX_BACKENDS in a way that can be used in a C array size

b) Use a static assert to check that it fits?

Otherwise I'll just put it in the test that my prior version already
added. Not as good as compile-time, but should still make it easy to find the
issue, if we ever increase/decrease MAX_BACKENDS.


In the attached I

1) changed all the GetRelationPath()/relpath* uses to the new way, the names
   now are unchanged
2) introduced a PROCNUMBER_CHARS, instead hardcoding 6 in the length
3) renamed RelPathString to RelPathStr and change the path member to str

I also added a second patch changing _mdfd_segpath() to do something similar,
because it felt somewhat silly to allocate memory after the prior change. Not
entirely sure it's worth it.

Greetings,

Andres Freund
>From e25fe65ef66e536ef6074a3f273c5cecd27f65b7 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Thu, 20 Feb 2025 15:12:28 -0500
Subject: [PATCH v2 1/2] Change relpath() etc to return path by value

Discussion: https://postgr.es/m/xeri5mla4b5syjd5a25nok5iez2kr3bm26j2qn4u7okzof2bmf@kwdh2vf7npra
---
 src/include/common/relpath.h  | 52 -
 src/common/relpath.c  | 77 ++-
 src/backend/access/rmgrdesc/smgrdesc.c| 10 +--
 src/backend/access/rmgrdesc/xactdesc.c|  6 +-
 src/backend/access/transam/xlogutils.c| 29 +++
 src/backend/backup/basebackup_incremental.c   | 10 +--
 src/backend/catalog/catalog.c |  6 +-
 src/backend/catalog/storage.c |  4 +-
 .../replication/logical/reorderbuffer.c   |  4 +-
 src/backend/storage/buffer/bufmgr.c   | 48 +---
 src/backend/storage/buffer/localbuf.c |  6 +-
 src/backend/storage/smgr/md.c | 59 ++
 src/backend/utils/adt/dbsize.c| 10 +--
 src/bin/pg_rewind/filemap.c   |  7 +-
 src/test/regress/expected/misc_functions.out  | 11 +++
 src/test/regress/regress.c| 29 +++
 src/test/regress/sql/misc_functions.sql   |  8 ++
 src/tools/pgindent/typedefs.list  |  1 +
 18 files changed, 218 insertions(+), 159 deletions(-)

diff --git a/src/include/common/relpath.h b/src/include/common/relpath.h
index 5162362e7d8..72fe5ee77d5 100644
--- a/src/include/common/relpath.h
+++ b/src/include/common/relpath.h
@@ -77,13 +77,61 @@ extern PGDLLIMPORT const char *const forkNames[];
 extern ForkNumber forkname_to_number(const char *forkName);
 extern int	forkname_chars(const char *str, ForkNumber *fork);
 
+
+/*
+ * MAX_BACKENDS is 2^18-1, we don't want to include postmaster.h here and
+ * there's no obvious way to derive PROCNUMBER_CHARS from it
+ * anyway. Crosschecked in test_relpath().
+ */
+#define PROCNUMBER_CHARS	6
+
+/*
+ * The longest possible relation path lengths is from the following format:
+ * sprintf(rp.path, "%s/%u/%s/%u/t%d_%u",
+ * PG_TBLSPC_DIR, spcOid,
+ * TABLESPACE_VERSION_DIRECTORY,
+ * dbOid, procNumber, relNumber);
+ *
+ * Note this does *not* include the trailing null-byte, to make it easier to
+ * combine it with other lengths.
+ */
+#define REL_PATH_STR_MAXLEN \
+	( \
+		sizeof(PG_TBLSPC_DIR) - 1 \
+		+ sizeof((char)'/') \
+		+ OIDCHARS /* spcOid */ \
+		+ sizeof((char)'/') \
+		+ sizeof(TABLESPACE_VERSION_DIRECTORY) - 1 \
+		+ sizeof((char)'/') \
+		+ OIDCHARS /* dbOid */ \
+		+ sizeof((char)'/') \
+		+ sizeof((char)'t') /* temporary table indicator */ \
+		+ PROCNUMBER_CHARS /* procNumber */ \
+		+ sizeof((char)'_') \
+		+ OIDCHARS /* relNumber */ \
+		+ sizeof((char)'_') \
+		+ FORKNAMECHARS /* forkNames[forkNumber] */ \
+	)
+
+/*
+ * String of the exact length required to represent a relation path. We return
+

Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-02-20 Thread Daniel Gustafsson

> On 20 Feb 2025, at 21:21, Jacob Champion  
> wrote:
> 
> On Thu, Feb 20, 2025 at 12:07 PM Tom Lane  wrote:
>> Looks like we're short a .gitignore entry.  (It does appear that
>> "make clean" cleans it up, at least.)
> 
> So we are! Sorry about that. The attached patch gets in-tree builds
> clean for me again.

Fixed, thanks for the report!

--
Daniel Gustafsson

Reset the output buffer after sending from WalSndWriteData

2025-02-20 Thread Markus Wanner


Hi,

I recently stumbled over an issue with an unintentional re-transmission. 
While this clearly was our fault in the output plugin code, I think the 
walsender's exposed API could easily be hardened to prevent the bad 
consequence from this mistake.


Does anything speak against the attached one line patch?

Best Regards

MarkusFrom 65dd24ed975cc513e4e0de5e175dff16b4b0f6d4 Mon Sep 17 00:00:00 2001
From: Markus Wanner 
Date: Thu, 20 Feb 2025 21:01:45 +0100
Subject: [PATCH] Reset the output buffer after sending from WalSndWriteData

While not strictly necessary, clearing the buffer right away after
sending prevents an accidential re-delivery in case WalSndWriteData
is called again.
---
 src/backend/replication/walsender.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 446d10c1a7d..96884ce152f 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1563,6 +1563,9 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 	/* output previously gathered data in a CopyData packet */
 	pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);
 
+	/* reset the output buffer to prevent re-sending */
+	resetStringInfo(ctx->out);
+
 	CHECK_FOR_INTERRUPTS();
 
 	/* Try to flush pending output to the client */
-- 
2.39.5

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

2025-02-20 Thread Alena Rybakina


Hi!

On 09.02.2025 18:38, Alexander Korotkov wrote:

Also, aren't we too restrictive while requiring is_simple_values_sequence()?
For instance, I believe cases like this (containing Var) could be transformed 
too.

select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), 
(1)));

I'm still working on it.

Also, I think there is quite a code duplication about construction of
SAOP between match_orclause_to_indexcol() and convert_VALUES_to_ANY()
functions.  I would like to see a refactoring as a separate first
patch, which extracts the common part into a function.


Done.

I have attached a patch. In addition to the transfer, I added the 
process of searching for a suitable operator and type for the left 
expression for input expressions: const and left expression, since they 
may differ from the declared types. Additionally, we convert the left 
expr to a type suitable for the found operator.


--
Regards,
Alena Rybakina
Postgres Professional
From 252e6a0b9a0b187e21a0721931f185d20061e66f Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Thu, 20 Feb 2025 23:30:01 +0300
Subject: [PATCH 2/2] Add an implementation of the x IN VALUES to 'x = ANY
 [...]', a special case of the ScalarArrayOpExpr operation.

It will simplify the query tree, eliminating the appearance of
an unnecessary join.

Since VALUES describes a relational table, and the value of such
a list is a table row, the optimizer will likely face an underestimation
problem due to the inability to estimate cardinality through
MCV statistics. The cardinality evaluation mechanism
can work with the array inclusion check operation.
If the array is small enough (< 100 elements), it will perform
a statistical evaluation element by element.

We perform the transformation in the convert_ANY_sublink_to_join
if VALUES RTE is proper and the transformation is convertible.
The conversion is only possible for operations on scalar values.

Authors: Andrei Lepikhov ,
	 Alena Rybakina 
---
 src/backend/optimizer/plan/subselect.c| 106 ++
 src/backend/optimizer/prep/prepjointree.c |   7 +
 src/include/optimizer/subselect.h |   2 +
 src/test/regress/expected/subselect.out   | 443 ++
 src/test/regress/sql/subselect.sql| 155 
 5 files changed, 713 insertions(+)

diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 8230cbea3c3..08c4648f936 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1214,6 +1214,112 @@ inline_cte_walker(Node *node, inline_cte_walker_context *context)
 	return expression_tree_walker(node, inline_cte_walker, context);
 }
 
+/*
+ * The function traverses the tree looking for elements of type var.
+ * If it finds it, it returns true.
+ */
+static bool
+values_simplicity_check_walker(Node *node, void *ctx)
+{
+	if (node == NULL)
+	{
+		return false;
+	}
+	else if(IsA(node, Var))
+		return true;
+	else if(IsA(node, Query))
+		return query_tree_walker((Query *) node,
+ values_simplicity_check_walker,
+ (void*) ctx,
+ QTW_EXAMINE_RTES_BEFORE);
+
+	return expression_tree_walker(node, values_simplicity_check_walker,
+  (void *) ctx);
+}
+
+/*
+ * Designed in analogy with is_simple_values
+ */
+static bool
+is_simple_values_sequence(Query *query)
+{
+	RangeTblEntry *rte;
+
+	/* In theory removing (altering) part of restrictions */
+	if (list_length(query->targetList) > 1 ||
+		query->limitCount != NULL || query->limitOffset != NULL ||
+		query->sortClause != NIL ||
+		list_length(query->rtable) != 1)
+		return false;
+
+	rte = linitial_node(RangeTblEntry,query->rtable);
+
+	/* Permanent restrictions */
+	if (rte->rtekind != RTE_VALUES ||
+		list_length(rte->values_lists) <= 1 ||
+		contain_volatile_functions((Node *) query))
+		return false;
+
+	/*
+	 * Go to the query tree to be sure that expression doesn't
+	 * have any Var type elements.
+	 */
+	return !expression_tree_walker((Node *) (rte->values_lists),
+   values_simplicity_check_walker,
+   NULL);
+}
+
+/*
+ * Transform appropriate testexpr and const VALUES expression to SaOpExpr.
+ *
+ * Return NULL, if transformation isn't allowed.
+ */
+ScalarArrayOpExpr *
+convert_VALUES_to_ANY(Query *query, Node *testexpr)
+{
+	RangeTblEntry *rte;
+	Node	   *leftop;
+	Oid			matchOpno;
+	ListCell   *lc;
+	Oid			inputcollid;
+	bool		have_param = false;
+	List	   *consts = NIL;
+
+	/* Extract left side of SAOP from test epression */
+
+	if (!IsA(testexpr, OpExpr) ||
+		list_length(((OpExpr *) testexpr)->args) != 2 ||
+		!is_simple_values_sequence(query))
+		return NULL;
+
+	rte = linitial_node(RangeTblEntry,query->rtable);
+	leftop = linitial(((OpExpr *) testexpr)->args);
+	matchOpno = ((OpExpr *) testexpr)->opno;
+	inputcollid = linitial_oid(rte->colcollations);
+
+	foreach (lc, rte->values_lists)
+	{
+		List *elem = lfirst(lc);
+		Node *value = linitial(elem);
+
+		value = eval_const_expressions(NULL, va

Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

2025-02-20 Thread Noah Misch

On Wed, Feb 19, 2025 at 10:48:29AM -0800, Masahiko Sawada wrote:
> Thank you for reviewing the patches. I've fixed these issues and
> attached the updated patches.

Looks good.

> I have one question about the 0001 patch; since we add
> 'default_char_signedness' field to ControlFileData do we need to bump
> PG_CONTROL_VERSION? We have comments about bumping PG_CONTROL_VERSION
> when changing CheckPoint struct or DBState enum so it seems likely but
> I'd like to confirm just in case that we need to bump
> PG_CONTROL_VERSION also when changing ControlFileData.

Yes.  (I'm not aware of value we get from having distinct control file version
and catalog version, but we do have both.)

> If we need, can
> we bump it to 1800? or 1701?

I'd do 1800.  The pattern seems to be to bump to 1800 for the first pg_control
change of the v18 cycle, then 1801, then 1802 for the third change of the
cycle.  That's based on this history:

git log -U0 -p src/include/catalog/pg_control.h | grep -E '^(Date|\+#define 
PG_CONTROL_VERSION)'

RE: ReplicationSlotRelease() crashes when the instance is in the single user mode

Dear hackers,

Thanks everyone for giving comments! PSA new version.
What's new:

- Message format was modified to {"cannot use function %s in single-user mode", 
"function_name"}
- Reporting funcname was adjusted based on the parameters. ternary operator was 
used.
- Guard was added for functions in logicalfunction.c.

For now, functions for replication origin and replication messages were 
retained.
I can handle them after the discussion.

Best regards,
Hayato Kuroda
FUJITSU LIMITED



v3-0001-Prohibit-slot-operations-while-in-the-single-user.patch
Description:  v3-0001-Prohibit-slot-operations-while-in-the-single-user.patch

Re: IPC::Run::time[r|out] vs our TAP tests

2025-02-20 Thread Andrew Dunstan



On 2025-02-19 We 6:56 PM, Daniel Gustafsson wrote:

On 19 Feb 2025, at 23:08, Andrew Dunstan wrote:
On 2024-10-31 Th 6:18 PM, Heikki Linnakangas wrote:

Thanks for review!


Perhaps sommething like this:

"Close the psql session and clean up resources. Each psql session must be closed with 
C before the end of the test.
Returns TRUE if psql exited successfully (i.e. with zero exit code), otherwise 
returns FALSE and reports a test failure."

Would that be accurate?

I would be OK with Heikki's version.

Fixed.


The patches have bitrotted slightly.

The attached rebases over current HEAD and passes check-world locally for me.


Also this is wrong, I think:

 isnt($self->{timeout}->is_expired, 'psql query_until timed out');

I think it should be

 ok(! $self->{timeout}->is_expired, 'psql query_until did not time out');

Fixed.



Actually, since ok() and friends return true iff the test succeeds, 
instead of


+    ok(! $self->{timeout}->is_expired, 'psql query_until did not time 
out');

+    return undef if $self->{timeout}->is_expired;

you can avoid doing the same test twice and say:

 ok(! $self->{timeout}->is_expired, 'psql query_until did not time 
out') || return undef;


although for slightly technical reasons perlcritic disapproves of 
"return undef" and prefers that you just write a bare "return" so we 
should also fix that.


Sorry for taking a second bite at the cherry.


cheers

andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com

Re: BitmapHeapScan streaming read user and prelim refactoring

On Wed, Feb 19, 2025 at 4:14 PM Melanie Plageman
 wrote:
>
> Attached v30 makes the tuple offset extraction happen later as you
> suggested. It turns out that none of the users need to worry much
> about allocating and freeing -- I was able to have all users make an
> offsets array on the stack. Personally I don't think we should worry
> about making a smaller array when fewer offsets are needed.
>
> Even without the read stream API, in bitmap heap scan, delaying
> extraction of the offsets lets us skip doing so for the prefetch
> iterator -- so that's kind of cool. Anyway, we are going to get rid of
> the prefetch iterator, so I won't belabor the point.
>
> I was worried the patch would be too big of a change to the tidbitmap
> API for this late in the cycle, but it turned out to be a small patch.
> I'd be interested in what folks think.

Attached v31 has some updates after self-review and some off-list
feedback from Andres. They're minor changes, so I won't enumerate
them.

- Melanie
From e85497578afa4b3031050652daeec6fc6c8953c4 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Thu, 20 Feb 2025 12:24:49 -0500
Subject: [PATCH v31 1/4] Delay extraction of TIDBitmap per page offsets

Pages from the bitmap created by the TIDBitmap API can be exact or
lossy. Exact pages extract all the tuple offsets from the bitmap and put
them in an array for the convenience of the caller.

This was done in tbm_private|shared_iterate(). However, as long as
tbm_private|shared_iterate() set a reference to the PagetableEntry in
the TBMIterateResult, the offset extraction can be done later.

Waiting to extract the tuple offsets has a few benefits. For the shared
iterator case, it allows us to extract the offsets after dropping the
shared iterator state lock, reducing time spent holding a contended
lock.

Separating the iteration step and extracting the offsets later also
allows us to avoid extracting the offsets for prefetched blocks. Those
offsets were never used, so the overhead of extracting and storing them
was wasted.

The real motivation for this change, however, is that future commits
will make bitmap heap scan use the read stream API. This requires a
TBMIterateResult per issued block. By removing the array of tuple
offsets from the TBMIterateResult and only extracting the offsets when
they are used, we reduce the memory required for per buffer data
substantially.

Suggested-by: Thomas Munro
Discussion: https://postgr.es/m/CA%2BhUKGLHbKP3jwJ6_%2BhnGi37Pw3BD5j2amjV3oSk7j-KyCnY7Q%40mail.gmail.com
---
 src/backend/access/gin/ginget.c  | 16 ++--
 src/backend/access/heap/heapam_handler.c | 10 -
 src/backend/nodes/tidbitmap.c| 52 +---
 src/include/access/gin_private.h |  1 +
 src/include/nodes/tidbitmap.h| 42 ---
 5 files changed, 80 insertions(+), 41 deletions(-)

diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index 63dd1f3679f..9fbe178ad47 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -845,6 +845,15 @@ entryGetItem(GinState *ginstate, GinScanEntry entry,
 	break;
 }
 
+/*
+ * Exact pages need their tuple offsets extracted.
+ * tbm_extract_page_tuple() will set ntuples to the correct
+ * number.
+ */
+if (entry->matchResult->ntuples == -2)
+	tbm_extract_page_tuple(entry->matchResult,
+		   entry->matchTupleOffsets);
+
 /*
  * Reset counter to the beginning of entry->matchResult. Note:
  * entry->offset is still greater than matchResult->ntuples if
@@ -884,20 +893,21 @@ entryGetItem(GinState *ginstate, GinScanEntry entry,
  * page. If that's > advancePast, so are all the other
  * offsets, so just go back to the top to get the next page.
  */
-if (entry->matchResult->offsets[entry->matchResult->ntuples - 1] <= advancePastOff)
+if (entry->matchTupleOffsets[entry->matchResult->ntuples - 1] <=
+	advancePastOff)
 {
 	entry->offset = entry->matchResult->ntuples;
 	continue;
 }
 
 /* Otherwise scan to find the first item > advancePast */
-while (entry->matchResult->offsets[entry->offset] <= advancePastOff)
+while (entry->matchTupleOffsets[entry->offset] <= advancePastOff)
 	entry->offset++;
 			}
 
 			ItemPointerSet(&entry->curItem,
 		   entry->matchResult->blockno,
-		   entry->matchResult->offsets[entry->offset]);
+		   entry->matchTupleOffsets[entry->offset]);
 			entry->offset++;
 
 			/* Done unless we need to reduce the result */
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c0bec014154..408d4b44240 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -2127,6 +2127,7 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
 	Snapshot	snapshot;
 	int			ntup;
 	TBMIterateResult *tbmres;
+	TBMOffsets	offsets;
 
 	Assert(scan->rs_flags & SO_TYPE_BI

Re: Draft for basic NUMA observability

2025-02-20 Thread Bertrand Drouvot

Hi Jakub,

On Mon, Feb 17, 2025 at 01:02:04PM +0100, Jakub Wartak wrote:
> On Thu, Feb 13, 2025 at 4:28 PM Bertrand Drouvot
>  wrote:
> 
> Hi Bertrand,
> 
> Thanks for playing with this!
> 
> > Which makes me wonder if using numa_move_pages()/move_pages is the right 
> > approach. Would be curious to know if you observe the same behavior though.
> 
> You are correct, I'm observing identical behaviour, please see attached.

Thanks for confirming!

> 
> We probably would need to split it to some separate and new view
> within the pg_buffercache extension, but that is going to be slow, yet
> still provide valid results.

Yup.

> In the previous approach that
> get_mempolicy() was allocating on 1st access, but it was slow not only
> because it was allocating but also because it was just 1 syscall per
> 1x addr (yikes!). I somehow struggle to imagine how e.g. scanning
> (really allocating) a 128GB buffer cache in future won't cause issues
> - that's like 16-17mln (* 2) syscalls to be issued when not using
> move_pages(2)

Yeah, get_mempolicy() not working on a range is not great.

> > But maybe we could use get_mempolicy() only on "valid" buffers i.e 
> > ((buf_state & BM_VALID) && (buf_state & BM_TAG_VALID)), thoughts?
> 
> Different perspective: I wanted to use the same approach in the new
> pg_shmemallocations_numa, but that won't cut it there. The other idea
> that came to my mind is to issue move_pages() from the backend that
> has already used all of those pages. That literally mean on of the
> below ideas:
> 1. from somewhere like checkpointer / bgwriter?
> 2. add touching memory on backend startup like always (sic!)
> 3. or just attempt to read/touch memory addr just before calling
> move_pages().  E.g. this last options is just two lines:
> 
> if(os_page_ptrs[blk2page+j] == 0) {
> +volatile uint64 touch pg_attribute_unused();
> os_page_ptrs[blk2page+j] = (char *)BufHdrGetBlock(bufHdr) +
> (os_page_size*j);
> +touch = *(uint64 *)os_page_ptrs[blk2page+j];
> }
> 
> and it seems to work while still issuing much less syscalls with
> move_pages() across backends, well at least here.

One of the main issue I see with 1. and 2. is that we would not get accurate
results should the kernel decides to migrate the pages. Indeed, the process 
doing
the move_pages() call needs to have accessed the pages more recently than any
kernel migrations to see accurate locations.

OTOH, one of the main issue that I see with 3. is that the monitoring could
probably influence the kernel's decision to start pages migration (I'm not 100%
sure but I could imagine it may influence the kernel's decision due to having to
read/touch the pages). 

But I'm thinking: do we really need to know the page location of every single 
page?
I think what we want to see is if the pages are "equally" distributed on all
the nodes or are somehow "stuck" to one (or more) nodes. In that case what about
using get_mempolicy() but on a subset of the buffer cache? (say every Nth buffer
or contiguous chunks). We could create a new function that would accept a
"sampling distance" as parameter for example, thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Add XMLNamespaces to XMLElement

2025-02-20 Thread Jim Jones

I've attached v6, which addresses two issues:

* Fixes a bug where XMLNAMESPACES declarations within a view were being
serialized as XMLATTRIBUTES.
* Prevents the makeString function from being called with a NULL
parameter - discussed in this thread [1].

Best regards, Jim

[1]
https://www.postgresql.org/message-id/CAFj8pRC24FEBNfTUrDgAK8f2nqDVvzWCuq%3DR%3DT19nUjL9GuLBA%40mail.gmail.com
From 93ea0e1b07763bca7ffb1d4cf88cc380640604cb Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Thu, 20 Feb 2025 09:58:34 +0100
Subject: [PATCH v6] Add XMLNamespaces option to XMLElement

This patch adds the scoped option XMLNamespaces to XMLElement,
as described in ISO/IEC 9075-14:2023, 11.2 XML lexically scoped
options:

xmlnamespaces(uri AS prefix, ...)
xmlnamespaces(DEFAULT uri, ...)
xmlnamespaces(NO DEFAULT, ...)

* prefix: Namespace's prefix.
* uri:Namespace's URI.
* DEFAULT prefix: Specifies the DEFAULT namespace to use within
the scope of a namespace declaration.
* NO DEFAULT: Specifies that NO DEFAULT namespace is to be
used within the scope of a namespace declaration.

== Examples ==

SELECT xmlelement(NAME "foo", xmlnamespaces('http:/x.y' AS bar));
  xmlelement
--
 

SELECT xmlelement(NAME "foo", xmlnamespaces(DEFAULT 'http:/x.y'));
xmlelement
--
 

 SELECT xmlelement(NAME "foo", xmlnamespaces(NO DEFAULT));
   xmlelement
-
---
 doc/src/sgml/func.sgml  |  57 +-
 src/backend/parser/gram.y   | 100 +--
 src/backend/parser/parse_clause.c   |   7 +-
 src/backend/parser/parse_expr.c |  80 +
 src/backend/utils/adt/ruleutils.c   |  79 +++--
 src/backend/utils/adt/xml.c |  53 +-
 src/include/nodes/primnodes.h   |   4 +-
 src/include/utils/xml.h |   6 +
 src/test/regress/expected/xml.out   | 259 
 src/test/regress/expected/xml_1.out | 197 +
 src/test/regress/expected/xml_2.out | 259 
 src/test/regress/sql/xml.sql| 133 ++
 12 files changed, 1200 insertions(+), 34 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index df32ee0bf5..3a59c0d8c2 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14538,7 +14538,10 @@ SELECT xmlconcat('', '
 
 
-xmlelement ( NAME name , XMLATTRIBUTES ( attvalue  AS attname  , ... )  , content , ... ) xml
+xmlelement ( NAME name
+, XMLATTRIBUTES ( attvalue  AS attname  , ... ) 
+, XMLNAMESPACES ( {regular-nsuri AS nsprefix | DEFAULT default-nsuri | NO DEFAULT} , ... ) 
+, content , ... ) xml
 
 
 
@@ -14551,7 +14554,39 @@ SELECT xmlconcat('', 'PostgreSQL data type.  The
  argument(s) within XMLATTRIBUTES generate attributes
  of the XML element; the content value(s) are
- concatenated to form its content.
+ concatenated to form its content. The arguments within XMLNAMESPACES
+ constuct namespace declarations from values provided in nsuri
+ and nsprefix, which correspond to the URI of a namespace and
+ its prefix, respectively. The option DEFAULT can be used to set the
+ default namespace declaration (without a prefix) to the URI provided in default-nsuri.
+ The option NO DEFAULT states that a namespace scope has no default namespace. A valid
+ XMLNAMESPACES item must fulfill the following conditions:
+
+
+
+ 
+  Only a single DEFAULT declaration item within the same scope.
+ 
+
+
+ 
+  No two nsuri can be equal within the same scope.
+ 
+
+
+ 
+  No nsprefix can be equal to xml or xmlns,
+  and no nsuri can be equal to http://www.w3.org/2000/xmlns/
+  or to http://www.w3.org/XML/1998/namespace, as they are already bouned to standard XML declarations.
+ 
+
+
+ 
+  The value of a regular-nsuri cannot be a zero-length string.
+ 
+
+   
+
 
 
 
@@ -14574,6 +14609,24 @@ SELECT xmlelement(name foo, xmlattributes(current_date as bar), 'cont', 'ent');
  xmlelement
 -
  content
+
+SELECT xmlelement(NAME "foo:root", xmlnamespaces('http:/foo.bar/' AS foo), 'content');
+
+   xmlelement
+-
+ content
+
+ SELECT xmlelement(NAME "root", xmlnamespaces(DEFAULT 'http:/foo.bar/'), 'content');
+
+ xmlelement
+-
+ content
+
+ SELECT xmlelement(NAME "root", xmlnamespaces(NO DEFAULT), 'content');
+
+  xmlelement
+---
+ content
 ]]>
 
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 7d99c9355c..399f12b86c 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -136,6 +136,12 @@ typedef struct KeyActions
 	KeyAction *deleteAction;
 } KeyActions;
 
+typedef struct XmlElementOpts
+{
+	List	   *xml_at

RE: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.