Re: fix: propagate M4 env variable to flex subprocess

2025-06-30 Thread Peter Eisentraut

On 17.06.25 07:15, Peter Eisentraut wrote:

On 28.05.25 20:42, J. Javier Maestro wrote:
On Wed, May 28, 2025 at 6:08 PM Andres Freund wrote:


    Hi,

    On 2025-05-17 23:32:24 -0400, J. Javier Maestro wrote:
 > On Tue, May 13, 2025 at 11:54 AM Andres Freund wrote:
 > > Bilal, I think you wrote this originally, do you recall?
 > >
 > > It seems like an issue beyond just M4...
 > >
 >
 > IIRC the rest of the tools in the environment have ways to be
    specified via
 > Meson options (BISON, FLEX, PERL) so the only issue I see is Flex
    not being
 > able to find the specific m4 binary. What other issue(s) are you
 > considering?

    PATH, LD_LIBRARY_PATH, ...

    I think this really should just add to the environment, rather than
    supplant it.


Ah, understood. That definitely looks like a better option.

    Do you want to write a patch like that? Otherwise I can.


Sure, I've attached the new patch. Let me know what you think, and if 
it's OK, what are the next steps to get the patch merged in main!


This patch looks right to me.

I would wait for the PG19 branching at this point, unless there is a 
concrete need for backpatching.


committed





Re: Skipping schema changes in publication

2025-06-30 Thread shveta malik
Few more comments on 002:

5)
+GetAllTablesPublicationRelations(Oid pubid, bool pubviaroot)
 {

+ List       *exceptlist;
+
+ exceptlist = GetPublicationRelations(pubid, PUBLICATION_PART_ALL);


a) Here, we are assuming that the list provided by
GetPublicationRelations() will be the except-tables list only, but there
is no validation of that.
b) We are using GetPublicationRelations() to get the relations which
are excluded from the publication. The name of the function and the
comments atop it are not in alignment with this usage.

Suggestion:
We can have a new GetPublicationExcludeRelations() function for the
concerned usage. The existing logic of GetPublicationRelations() can
be shifted to a new internal function which will accept an
'except-flag' as well. Both GetPublicationRelations() and
GetPublicationExcludeRelations() can call that new function, passing
'except-flag' as false and true respectively. The new internal
function will validate 'prexcept' against the except-flag passed and
return the results, as sketched below.
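
Something along these lines (only a sketch, reusing the scan shape of
the existing GetPublicationRelations(); 'prexcept' stands for the new
except-flag column added by your patch):

static List *
GetPublicationRelationsInternal(Oid pubid, PublicationPartOpt pub_partopt,
                                bool except_flag)
{
    List       *result = NIL;
    Relation    pubrelsrel;
    ScanKeyData scankey;
    SysScanDesc scan;
    HeapTuple   tup;

    pubrelsrel = table_open(PublicationRelRelationId, AccessShareLock);

    ScanKeyInit(&scankey,
                Anum_pg_publication_rel_prpubid,
                BTEqualStrategyNumber, F_OIDEQ,
                ObjectIdGetDatum(pubid));

    scan = systable_beginscan(pubrelsrel, PublicationRelPrpubidIndexId,
                              true, NULL, 1, &scankey);

    while (HeapTupleIsValid(tup = systable_getnext(scan)))
    {
        Form_pg_publication_rel pubrel;

        pubrel = (Form_pg_publication_rel) GETSTRUCT(tup);

        /* validate prexcept against the caller's except-flag */
        if (pubrel->prexcept != except_flag)
            continue;

        result = GetPubPartitionOptionRelations(result, pub_partopt,
                                                pubrel->prrelid);
    }

    systable_endscan(scan);
    table_close(pubrelsrel, AccessShareLock);

    return result;
}

List *
GetPublicationRelations(Oid pubid, PublicationPartOpt pub_partopt)
{
    return GetPublicationRelationsInternal(pubid, pub_partopt, false);
}

List *
GetPublicationExcludeRelations(Oid pubid, PublicationPartOpt pub_partopt)
{
    return GetPublicationRelationsInternal(pubid, pub_partopt, true);
}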

6)
Before your patch002, GetTopMostAncestorInPublication() was checking
pg_publication_rel and pg_publication_namespace to find out if the
table in the ancestor-list is part of a given publication. Neither
pg_publication_rel nor pg_publication_namespace has entries for "for
all tables" publications. That means
GetTopMostAncestorInPublication() was originally not checking whether
the given puboid is a "for all tables" publication to see if a rel
belongs to that particular pub or not.

But now with the current change, we do check whether the pub is an
all-tables pub; if so, we return relid and set ancestor_level
(provided the table is not part of the except list).  IIUC, the
results in the two cases may differ. Is that the intention? Let me
know if my understanding is wrong.

thanks
Shveta




Re: NUMA shared memory interleaving

2025-06-30 Thread Jakub Wartak
Hi Tomas!

On Fri, Jun 27, 2025 at 6:41 PM Tomas Vondra  wrote:

> I agree we should improve the behavior on NUMA systems. But I'm not sure
> this patch does far enough, or perhaps the approach seems a bit too
> blunt, ignoring some interesting stuff.
>
> AFAICS the patch essentially does the same thing as
>
>numactl --interleave=all
>
> except that it only does that to shared memory, not to process private
> memory (as if we called numa_set_localalloc). Which means it has some of
> the problems people observe with --interleave=all.
>
> In particular, this practically guarantees that (with 4K memory pages),
> each buffer hits multiple NUMA nodes, because the first half will go
> to node N, while the second half goes to node (N+1).
>
> That doesn't seem great. It's likely better than a misbalanced system
> with everything allocated on a single NUMA node, but I don't see how it
> could be better than "balanced" properly warmed up system where the
> buffers are not split like this.
>
> But OK, admittedly this only happens for 4K memory pages, and serious
> systems with a lot of memory are likely to use huge pages, which makes
> this less of an issue (only the buffers crossing the page boundaries
> might get split).
>
>
> My bigger comment however is that the approach focuses on balancing the
> nodes (i.e. ensuring each node gets a fair share of shared memory), and
> is entirely oblivious to the internal structure of the shared memory.
>
> * It interleaves the shared segment, but it has many pieces - shared
> buffers are the largest but not the only one. Does it make sense to
> interleave all the other pieces?
>
> * Some of the pieces are tightly related. For example, we talk about
> shared buffers as if it was one big array, but it actually is two arrays
> - blocks and descriptors. Even if buffers don't get split between nodes
> (thanks to huge pages), there's no guarantee the descriptor for the
> buffer does not end on a different node.
>
> * In fact, the descriptors are so much smaller than blocks that it's
> practically guaranteed all descriptors will end up on a single node.
>
>
> I could probably come up with a couple more similar items, but I think
> you get the idea. I do think making Postgres NUMA-aware will require
> figuring out how to distribute (or not distribute) different parts of
> the shared memory, and do that explicitly. And do that in a way that
> allows us to do other stuff in NUMA-aware way, e.g. have a separate
> freelists and clocksweep for each NUMA node, etc.

I do understand what you mean, but I'm *NOT* stating here that it
makes PG fully "NUMA-aware"; I actually try to avoid doing so with
each sentence. This is only about the imbalance problem specifically.
I think we could build those follow-up optimizations as separate
patches in this or follow-up threads. If we did it all in one
giant 0001 (without a split), the very first question would be to
quantify the impact of each of those optimizations (for which we would
probably need more GUCs?). Here I'm just showing that the very first
baby step - interleaving - helps avoid interconnect saturation in some
cases too.

Anyway, even putting aside the fact that local malloc()s would be
interleaved, adjusting systemd startup scripts to just include
`numactl --interleave=all` sounds like a dirty hack, not proper UX.

Also please note that:
* I do not have a lot of time to dedicate to this, yet I was always
kind of interested in researching it and wondering why we couldn't do
it for such a long time; hence the previous observability work and now
$subject (note it is not claiming to be full-blown NUMA awareness,
just some basic NUMA interleaving as a first [well, second?] step).
* I've raised this question in the first post: "How to name this GUC
(numa or numa_shm_interleave)?" I still have no idea, but `numa`
simply looks better, and we could just add way more stuff to it over
time (in PG19 or future versions?). Does that sound good?

> That's something numa_interleave_memory simply can't do for us, and I
> suppose it might also have other downsides on large instances. I mean,
> doesn't it have to create a separate mapping for each memory page?
> Wouldn't that be a bit inefficient/costly for big instances?

No? What kind of mapping do you have in mind? I think our shared
memory on the kernel side is just a single VMA (contiguous memory
region), on which we technically execute mbind() (libnuma is just a
wrapper around it). I have not observed any kind of regressions;
actually quite the opposite. I'm also not sure what you mean by 'big
instances' (AFAIK 1-2TB shared_buffers might even fail to start).
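
For reference, a minimal standalone sketch of what this effectively
does on the kernel side (an illustration, not the patch code):
numa_interleave_memory() is essentially mbind(MPOL_INTERLEAVE) applied
once to the whole region.

/* build with: cc demo.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int
main(void)
{
    size_t  sz = 1UL << 30;     /* stand-in for the shmem segment */
    void   *mem;

    if (numa_available() < 0)
    {
        fprintf(stderr, "no NUMA support\n");
        return 1;
    }

    /* one contiguous anonymous mapping, i.e. a single VMA */
    mem = mmap(NULL, sz, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
    {
        perror("mmap");
        return 1;
    }

    /* one policy call for the whole region, no per-page mappings */
    numa_interleave_memory(mem, sz, numa_all_nodes_ptr);

    /* pages are then placed round-robin across nodes at first touch */
    memset(mem, 0, sz);
    return 0;
}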

> Of course, I'm not saying all this as a random passerby - I've been
> working on a similar patch for a while, based on Andres' experimental
> NUMA branch. It's far from complete/perfect, more of a PoC quality, but
> I hope to share it on the mailing list sometime soon.

Cool, I didn't know Andres's branch was public till now, I know h

Re: Conflict detection for update_deleted in logical replication

2025-06-30 Thread Amit Kapila
On Fri, Jun 27, 2025 at 7:58 AM Zhijie Hou (Fujitsu)
 wrote:
>
> Here is the V43 patch set which includes the following changes:
>

Few minor comments:
1.
@@ -29645,8 +29651,10 @@ postgres=# SELECT '0/0'::pg_lsn +
pd.segment_number * ps.setting::int + :offset
 Copies an existing logical replication slot
 named src_slot_name to a logical replication
 slot named dst_slot_name, optionally changing
-the output plugin and persistence.  The copied logical slot starts
-from the same LSN as the source logical slot.  Both
+the output plugin and persistence.  The name cannot be
+pg_conflict_detection as it is reserved for
+the conflict detection.  The copied logical slot starts from the same
+LSN as the source logical slot.  Both

/The name/The new slot name

2. The server version checks can be modified to 19000, as a new branch
has now been created.

3.
This ensures that if the launcher loses track of the slot after
+   * a restart, it will remember to drop the slot when it is no longer
+   * requested by any subscription.

The connection between the part of the sentence before the comma and
the rest of the sentence is not clear. How about writing
it as: "Acquire the conflict detection slot at startup to ensure it
can be dropped if no longer needed after a restart."?

-- 
With Regards,
Amit Kapila.




Improve error reporting in 027_stream_regress test

2025-06-30 Thread Nazir Bilal Yavuz
Hi,

This patch aims to improve the 027_stream_regress test's regression
test error reporting, per Andres' suggestion [1]. It basically dumps 50
lines from the head and tail of the regression.diffs file into the
failure message if both primary and standby are alive and the
regression test failed.

I used the diag() function to dump regression.diffs to the failure
message. I am not sure if that is the best function to use; open to
suggestions.

One disadvantage of this patch is that regress_log_027_stream_regress
contains both the complete regression.diffs file and its head+tail.

[1] 
https://www.postgresql.org/message-id/k46gdpibwaevxgb3cefgkl4weykcggal6evlbg5pcei4aswtli%40wrs732j5co3p

--
Regards,
Nazir Bilal Yavuz
Microsoft
From 8c1c69b42607b09dddc74ec25540b6afb48e0bd8 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Tue, 17 Jun 2025 16:08:20 +0300
Subject: [PATCH v1] Improve error reporting in 027_stream_regress test

Previously, the 027_stream_regress test only reported that the regression
test had failed, without showing the actual error. The detailed failure
information was hidden in the regression.diffs file.

This commit improves the situation by including the head and tail of
regression.diffs directly in the failure message if both the primary and
standby are alive. This helps quickly identify the root cause without
needing to open extra files.

Suggested-by: Andres Freund 
---
 src/test/perl/PostgreSQL/Test/Cluster.pm  | 19 
 src/test/recovery/t/027_stream_regress.pl | 56 +++
 2 files changed, 75 insertions(+)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 49b2c86b29c..f5bb78bc138 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -290,6 +290,25 @@ sub connstr
 
 =pod
 
+=item $node->is_alive()
+
+Check if the node is alive.
+
+=cut
+
+sub is_alive {
+my ($self) = @_;
+
+my $host = $self->host;
+my $port = $self->port;
+	my $null = File::Spec->devnull;
+
+my $cmd = "pg_isready -h $host -p $port";
+	return !system("$cmd >$null 2>&1");
+}
+
+=pod
+
 =item $node->raw_connect()
 
 Open a raw TCP or Unix domain socket connection to the server. This is
diff --git a/src/test/recovery/t/027_stream_regress.pl b/src/test/recovery/t/027_stream_regress.pl
index 83def062d11..2bbc8947064 100644
--- a/src/test/recovery/t/027_stream_regress.pl
+++ b/src/test/recovery/t/027_stream_regress.pl
@@ -81,6 +81,8 @@ my $rc =
 	  . "--max-concurrent-tests=20 "
 	  . "--inputdir=../regress "
 	  . "--outputdir=\"$outputdir\"");
+my $primary_alive = $node_primary->is_alive;
+my $standby_alive = $node_standby_1->is_alive;
 if ($rc != 0)
 {
 	# Dump out the regression diffs file, if there is one
@@ -90,9 +92,17 @@ if ($rc != 0)
 		print "=== dumping $diffs ===\n";
 		print slurp_file($diffs);
 		print "=== EOF ===\n";
+
+		# Dump 50 lines from head and tail of regression diffs to failure message
+		if ($primary_alive && $standby_alive)
+		{
+			regression_log_helper($diffs, 50);
+		}
 	}
 }
 is($rc, 0, 'regression tests pass');
+is($primary_alive, 1, 'primary is alive after the regression tests');
+is($standby_alive, 1, 'standby is alive after the regression tests');
 
 # Clobber all sequences with their next value, so that we don't have
 # differences between nodes due to caching.
@@ -181,4 +191,50 @@ UPDATE|t), 'check contents of pg_stat_statements on regression database');
 $node_standby_1->stop;
 $node_primary->stop;
 
+sub regression_log_helper
+{
+	my ($diff_file, $lines_to_dump) = @_;
+	my @lines;
+
+	open my $fh, '<', $diff_file or die "couldn't open file: $diff_file\n";
+
+	# Read all lines to process them below
+	while (my $line = <$fh>)
+	{
+		push @lines, $line;
+	}
+	close $fh;
+
+	my $line_count = scalar @lines;
+
+
+	# If the diff_file has fewer lines than (2 * $lines_to_dump), dump the entire file
+	if ($line_count <= (2 * $lines_to_dump))
+	{
+		diag("\n=== dumping $diff_file ===\n");
+		foreach my $line (@lines)
+		{
+			diag($line);
+		}
+	}
+	else
+	{
+		diag(
+			"\n=== dumping $lines_to_dump lines from head of $diff_file ===\n"
+		);
+		for my $i (0 .. $lines_to_dump - 1)
+		{
+			diag($lines[$i]);
+		}
+		diag(
+			"\n=== dumping $lines_to_dump lines from tail of $diff_file ===\n"
+		);
+		for my $i ($line_count - $lines_to_dump .. $line_count - 1)
+		{
+			diag($lines[$i]);
+		}
+	}
+	diag("=== EOF ===\n\n");
+}
+
 done_testing();
-- 
2.49.0



Re: A concurrent VACUUM FULL?

2025-06-30 Thread Erik Nordström
On Mon, Jun 30, 2025 at 12:03 PM Antonin Houska  wrote:

> Erik Nordström  wrote:
>
> > Hi hackers,
> >
> > I've been looking at the code for CLUSTER/VACUUM FULL, and whether it is
> possible to do a concurrent version of it using a
> > multi-transactional approach similar to concurrent reindexing and
> partition detach.
> >
> > The idea would be to hold weaker locks in TX1 when doing the heap
> rewrite (essentially allow reads but prevent writes), and then do the
> > actual heap swap in a second TX2 transaction.
>
> Patch [1] is in the queue that allows both reads and writes. (An exclusive
> lock is acquired there for the swaps, but it should be held for a very
> short time.)
>
>
That sounds great. Do you know if there's anything I can do to help?

- Erik


> --
> Antonin Houska
> Web: https://www.cybertec-postgresql.com
>
> [1] https://commitfest.postgresql.org/patch/5117/
>


Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

2025-06-30 Thread Daniil Davydov
Hi,

On Mon, Jun 30, 2025 at 3:47 PM Yugo Nagata  wrote:
>
> On Fri, 27 Jun 2025 18:53:02 +0700
> Daniil Davydov <3daniss...@gmail.com> wrote:
> > This patch fixes postgres behavior if I first create a function and
> > then try to CREATE OR REPLACE it in concurrent transactions.
> > But if the function doesn't exist and I try to call CREATE OR REPLACE
> > in concurrent transactions, I will get an error.
> > I wrote about it in this thread [1] and Tom Lane said that this
> > behavior is kinda expected.
> > Just in case, I decided to mention it here anyway - perhaps you will
> > have other thoughts on this matter.
> >
> > [1] 
> > https://www.postgresql.org/message-id/flat/CAJDiXghv2JF5zbLyyybokWKM%2B-GYsTG%2Bhw7xseLNgJOJwf0%2B8w%40mail.gmail.com
>
> I agree with Tom Lane that the behavior is expected, although it would be 
> better
> if the error message were more user-friendly. We could improve it by checking 
> the
> unique constraint before calling index_insert in CatalogIndexInsert.
>

As far as I understand, unique constraint checking is specific to
each index access method.
Thus, to implement the proposed idea, you would have to create a
separate callback for the check_unique function.
It doesn't seem like a very neat solution, but there aren't many other
options left.

I would suggest intercepting the error (via PG_CATCH), and if it has
the ERRCODE_UNIQUE_VIOLATION code, changing the error message (more
precisely, throwing another error with the desired message).
If we catch an error during the CatalogTupleInsert call, we can be
sure that the problem is concurrent execution, because before the
insertion we checked that such a tuple does not exist.
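
Roughly, a sketch of what I have in mind (only a sketch; the variables
around the insertion are illustrative):

MemoryContext oldcxt = CurrentMemoryContext;

PG_TRY();
{
    CatalogTupleInsert(rel, tup);
}
PG_CATCH();
{
    ErrorData  *edata;

    /* the failing code may have switched memory contexts */
    MemoryContextSwitchTo(oldcxt);
    edata = CopyErrorData();

    if (edata->sqlerrcode != ERRCODE_UNIQUE_VIOLATION)
        PG_RE_THROW();

    /* swallow the index's error and report the real cause instead */
    FlushErrorState();
    ereport(ERROR,
            (errcode(ERRCODE_DUPLICATE_FUNCTION),
             errmsg("function was concurrently created by another transaction")));
}
PG_END_TRY();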

What do you think? And in general, are you going to fix this behavior
within this thread?

P.S.
Thank you for updating the patch.

--
Best regards,
Daniil Davydov




amcheck: support for GiST

2025-06-30 Thread Andrey Borodin
Hello hackers!

In PG19 we took a step toward improving amcheck and added GIN support.
This is a thread that continues previous work [0].

Please find attached two new steps for amcheck:
1. A function to verify GiST integrity. This patch is in decent shape, simply
rebased from the previous year.
2. Support on pg_amcheck's side for this function. This patch did not receive
as much review attention before and, perhaps, should be extended to support
the existing GIN functions.

I'll put this thread into July commitfest.

Thanks!


Best regards, Andrey Borodin.

[0] https://postgr.es/m/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1%40yandex-team.ru




v2025-06-30-0002-Add-GiST-support-to-pg_amcheck.patch
Description: Binary data


v2025-06-30-0001-Add-gist_index_check-function-to-verify-.patch
Description: Binary data


Re: IPC/MultixactCreation on the Standby server

2025-06-30 Thread Andrey Borodin


> On 28 Jun 2025, at 21:24, Andrey Borodin  wrote:
> 
> This seems to be fixing issue for me.

ISTM I was wrong: there is a possible recovery conflict with snapshot.

REDO:
frame #2: 0x00010179a0c8 postgres`pg_usleep(microsec=100) at 
pgsleep.c:50:10
frame #3: 0x00010144c108 
postgres`WaitExceedsMaxStandbyDelay(wait_event_info=134217772) at 
standby.c:248:2
frame #4: 0x00010144a63c 
postgres`ResolveRecoveryConflictWithVirtualXIDs(waitlist=0x000126008200, 
reason=PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, wait_event_info=134217772, 
report_waiting=true) at standby.c:384:8
frame #5: 0x00010144a4f4 
postgres`ResolveRecoveryConflictWithSnapshot(snapshotConflictHorizon=1214, 
isCatalogRel=false, locator=(spcOid = 1663, dbOid = 5, relNumber = 16384)) at 
standby.c:490:2
frame #6: 0x000100e4d3f8 
postgres`heap_xlog_prune_freeze(record=0x000135808e60) at heapam.c:9208:4
frame #7: 0x000100e4d204 postgres`heap2_redo(record=0x000135808e60) 
at heapam.c:10353:4
frame #8: 0x000100f1548c 
postgres`ApplyWalRecord(xlogreader=0x000135808e60, 
record=0x000138058060, replayTLI=0x00016f0425b0) at 
xlogrecovery.c:1991:2
frame #9: 0x000100f13ff0 postgres`PerformWalRecovery at 
xlogrecovery.c:1822:4
frame #10: 0x000100ef7940 postgres`StartupXLOG at xlog.c:5821:3
frame #11: 0x000101364334 
postgres`StartupProcessMain(startup_data=0x, 
startup_data_len=0) at startup.c:258:2


SELECT:
frame #10: 0x000102a14684 postgres`GetMultiXactIdMembers(multi=278, 
members=0x00016d4f9498, from_pgupgrade=false, isLockOnly=false) at 
multixact.c:1493:6
frame #11: 0x000102991814 postgres`MultiXactIdGetUpdateXid(xmax=278, 
t_infomask=4416) at heapam.c:7478:13
frame #12: 0x000102985450 
postgres`HeapTupleGetUpdateXid(tuple=0x0001043e5c60) at heapam.c:7519:9
frame #13: 0x0001029a0360 
postgres`HeapTupleSatisfiesMVCC(htup=0x00016d4f9590, 
snapshot=0x00015b07b930, buffer=69) at heapam_visibility.c:1090:10
frame #14: 0x00010299fbc8 
postgres`HeapTupleSatisfiesVisibility(htup=0x00016d4f9590, 
snapshot=0x00015b07b930, buffer=69) at heapam_visibility.c:1772:11
frame #15: 0x000102982954 
postgres`page_collect_tuples(scan=0x00014b009648, 
snapshot=0x00015b07b930, page="", buffer=69, block=6, lines=228, 
all_visible=false, check_serializable=false) at heapam.c:480:12

page_collect_tuples() holds a lock on the buffer while examining tuple
visibility, with InterruptHoldoffCount > 0. The tuple visibility check
might need more WAL to be replayed: we have to wait until the next MX is
filled in. But replaying it might need the buffer lock, or have a
snapshot conflict with the caller of page_collect_tuples().


Please find attached a dirty test; it reproduces the problem on my
machine (a startup deadlock, so when reproduced it takes 180s, versus
normally passing in 10s).
Also, there is a fix: checking for recovery conflicts when falling back
to the case 2 MX read.

I do not feel comfortable with using interrupts while
InterruptHoldoffCount > 0, so I need help from someone more
knowledgeable about our interrupts machinery to tell me if what I'm
proposing is OK. (Álvaro?)


Also, I've modified the code to make race condition more reproducible.

multi = GetNewMultiXactId(nmembers, &offset);
/* random sleep so the WAL order differs from the order of usage on pages */
if (rand() % 2 == 0)
    pg_usleep(1000);
(void) XLogInsert(RM_MULTIXACT_ID, XLOG_MULTIXACT_CREATE_ID);

Perhaps, I can build a fast injection points test if we want it.


Best regards, Andrey Borodin.



v5-0001-Make-next-multixact-sleep-timed-with-a-recovery-c.patch
Description: Binary data


Re: track generic and custom plans in pg_stat_statements

2025-06-30 Thread Sami Imseih
rebased patch.

The only changes are to the tests, due to the revert of nested query
tracking in f85f6ab051b.

Regards,

Sami


v10-0001-Add-plan_cache-counters-to-pg_stat_statements.patch
Description: Binary data


Re: A concurrent VACUUM FULL?

2025-06-30 Thread Álvaro Herrera
On 2025-Jun-30, Erik Nordström wrote:

> On Mon, Jun 30, 2025 at 12:03 PM Antonin Houska  wrote:

> > Patch [1] is in the queue that allows both reads and writes. (An exclusive
> > lock is acquired here for the swaps, but that should be held for very short
> > time.)
>
> That sounds great. Do you know if there's anything I can do to help?

It would be very valuable if you could review the code, test it under the
weirdest conditions you can imagine or just under normal conditions,
proof-read the documentation, try to see if anything is missing that
should be there, and so on.  Everything that you would expect from a new
feature released as part of the next Postgres release.  Any
problem/crash/abnormality that you report before the patch is
included in Postgres is one less issue that we'll have to deal with
after the release.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"El miedo atento y previsor es la madre de la seguridad" (E. Burke)




Re: Report replica identity in pg_publication_tables

2025-06-30 Thread Amit Kapila
On Mon, Jun 30, 2025 at 3:44 PM Ashutosh Bapat
 wrote:
>
> Hi All,
>
> The commit message in the patch says it all, but let me repeat it here.
>

You forgot to attach the patch.

> When debugging issues with logical replication, replica identity
> property of tables in publication is often useful, for example, to
> determine the amount of data logged for an UPDATE or DELETE on a given
> table.
>

I think it can help to determine what is logged for the DELETE or
UPDATE operation, but not the exact amount of data. Can you please
explain with an example how such information can help with debugging?

> Given a set of publications that a WAL sender is using,
> pg_publication_tables can be used to get the list of tables whose
> changes will be replicated including the columns of those tables and
> row
> filters. But the replica identity of those tables needs to be
> separately found out by querying pg_class or joining pg_class with
> pg_publication_tables on schemaname and relname. Adding the replica
> identity column to pg_publication_tables avoids this extra step.
>
> The replica identity of a given table is not a property of
> > publication, per se, so it's arguable whether it should be included
> in pg_publication_tables or not.
>

Right, discussing the use case a bit more might help us to find if
this is the right place to add 'replica identity' information.

-- 
With Regards,
Amit Kapila.




Re: Improve explicit cursor handling in pg_stat_statements

2025-06-30 Thread Sami Imseih
rebased patch.

Regards,

Sami


v6-0001-Normalize-variable-fetch-sizes-in-a-FETCH-command.patch
Description: Binary data


RE: Check for existing replication slot in pg_createsubscriber

2025-06-30 Thread Hayato Kuroda (Fujitsu)
Dear Amit, Zane,

> Okay, I find your case a good reason to add such a check, apart from
> making the code consistent in terms of these checks. One thing I was
> thinking is whether it makes sense to add these checks only in
> --dry-run mode because we normally don't expect such conflicts.
> Otherwise, each such check adds an additional network round-trip.

I think there are two things which can be checked on the command side.
One is to check for duplication of names. It can be done by connecting
to the nodes once and running SQL. To avoid the extra round-trip, this
could be added for dry-run mode only.

Another one is slot-name validation. For now, it completely relies on
the publisher side, but it is better to detect problems earlier.
Currently ReplicationSlotValidateName() does the validation, and we
could move it under the common/ directory so that both the server and
client side can use the function. It does very fundamental validation
of the string and could be run in both dry-run and normal mode.
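
To illustrate, the fundamental validation is small enough to move. A
sketch of the rules ReplicationSlotValidateName() enforces today (1 to
NAMEDATALEN-1 characters; only lower-case letters, digits, and
underscores), written so a frontend could call it too:

#include <stdbool.h>
#include <string.h>

#define NAMEDATALEN 64          /* normally from pg_config_manual.h */

static bool
slot_name_is_valid(const char *name)
{
    size_t      len = strlen(name);

    if (len == 0 || len >= NAMEDATALEN)
        return false;

    for (const char *p = name; *p; p++)
    {
        if (!((*p >= 'a' && *p <= 'z') ||
              (*p >= '0' && *p <= '9') ||
              *p == '_'))
            return false;
    }
    return true;
}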

Best regards,
Hayato Kuroda
FUJITSU LIMITED 



Report replica identity in pg_publication_tables

2025-06-30 Thread Ashutosh Bapat
Hi All,

The commit message in the patch says it all, but let me repeat it here.

When debugging issues with logical replication, replica identity
property of tables in publication is often useful, for example, to
determine the amount of data logged for an UPDATE or DELETE on a given
table.

Given a set of publications that a WAL sender is using,
pg_publication_tables can be used to get the list of tables whose
changes will be replicated including the columns of those tables and
row
filters. But the replica identity of those tables needs to be
separately found out by querying pg_class or joining pg_class with
pg_publication_tables on schemaname and relname. Adding the replica
identity column to pg_publication_tables avoids this extra step.

The replica identity of a given table is not a property of
publication, per se, so it's arguable whether it should be included
in pg_publication_tables or not. But the output seems to be useful.
E.g. from the tests

SELECT * FROM pg_publication_tables WHERE pubname = 'testpub6';
 pubname  | schemaname |      tablename      | attnames | rowfilter | replica_identity
----------+------------+---------------------+----------+-----------+------------------
 testpub6 | public     | rf_tbl_abcd_part_pk | {a,b}    | (b > 99)  | default
(1 row)

This line gives all the information related to logical replication of
table rf_tbl_abcd_part_pk together.

-- 
Best Wishes,
Ashutosh Bapat




Re: A concurrent VACUUM FULL?

2025-06-30 Thread Erik Nordström
On Mon, Jun 30, 2025 at 1:46 PM Álvaro Herrera  wrote:

> On 2025-Jun-30, Erik Nordström wrote:
>
> > On Mon, Jun 30, 2025 at 12:03 PM Antonin Houska  wrote:
>
> > > Patch [1] is in the queue that allows both reads and writes. (An
> exclusive
> > > lock is acquired here for the swaps, but that should be held for very
> short
> > > time.)
> >
> > That sounds great. Do you know if there's anything I can do to help?
>
> It would be very valuable if you can review the code, test it under the
> weirdest conditions you can imagine or just under normal conditions,
> proof-read the documentation, try to see if anything is missing that
> should be there, and so on.  Everything that you would expect from a new
> feature released as part of the next Postgres release.  Any
> problems/crashes/ abnormalities that you report before the patch is
> included in Postgres, is one less issue that we'll have to deal with
> after the release.
>
>
I'll do my best to test the feature. One question I have, though, is why
not start with supporting concurrent reads but not writes? That would
already be a win and make the patch simpler.

Best,

- Erik


> --
> Álvaro Herrera PostgreSQL Developer  —
> https://www.EnterpriseDB.com/
> "El miedo atento y previsor es la madre de la seguridad" (E. Burke)
>


Re: NUMA shared memory interleaving

2025-06-30 Thread Jakub Wartak
On Mon, Jun 30, 2025 at 12:55 PM Jakub Wartak
 wrote:
[..]
> > FWIW while I think the patch doesn't go far enough, there's one area
> > where I think it probably goes way too far - configurability. I agree
> > it's reasonable to allow running on a subset of nodes, e.g. to split the
> > system between multiple instances etc. But do we need to configure that
> > from Postgres? Aren't people likely to already use something like
> > containers or k8 anyway?
> > I think we should just try to inherit this from
> > the environment, i.e. determine which nodes we're allowed to run, and
> > use that. Maybe we'll find we need to be smarter, but I think we can
> > leave that for later.
>
> That's what "numa=all" is all about (take whatever is there in the
> OS/namespace)

My error, that should be: that's what "numa=AUTO" is all about (..)

-J.




Re: A concurrent VACUUM FULL?

2025-06-30 Thread Antonin Houska
Erik Nordström  wrote:

> On Mon, Jun 30, 2025 at 1:46 PM Álvaro Herrera  wrote:
> 
>  On 2025-Jun-30, Erik Nordström wrote:
> 
>  > On Mon, Jun 30, 2025 at 12:03 PM Antonin Houska  wrote:
> 
>  > > Patch [1] is in the queue that allows both reads and writes. (An 
> exclusive
>  > > lock is acquired here for the swaps, but that should be held for very 
> short
>  > > time.)
>  >
>  > That sounds great. Do you know if there's anything I can do to help?
> 
>  It would be very valuable if you can review the code, test it under the
>  weirdest conditions you can imagine or just under normal conditions,
>  proof-read the documentation, try to see if anything is missing that
>  should be there, and so on.  Everything that you would expect from a new
>  feature released as part of the next Postgres release.  Any
>  problems/crashes/ abnormalities that you report before the patch is
>  included in Postgres, is one less issue that we'll have to deal with
>  after the release.
> 
> I'll do my best to test the feature.

Thanks. I've noticed that the patch set needs rebase. I'll try to prepare a
new version today.

> One question I have, though, is why not start with supporting concurrent 
> reads but not writes? That would
> already be a win and make the patch simpler.

It occurred to me at some point too, but I think it would be a rather
different implementation. So if we were to support both read-only and
read-write modes, the amount of code would be even higher.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




Re: Report replica identity in pg_publication_tables

2025-06-30 Thread Ashutosh Bapat
On Mon, Jun 30, 2025 at 5:17 PM Amit Kapila  wrote:
>
> On Mon, Jun 30, 2025 at 3:44 PM Ashutosh Bapat
>  wrote:
> >
> > Hi All,
> >
> > The commit message in the patch says it all, but let me repeat it here.
> >
>
> You forgot to attach the patch.

Sorry. Here it is
>
> > When debugging issues with logical replication, replica identity
> > property of tables in publication is often useful, for example, to
> > determine the amount of data logged for an UPDATE or DELETE on a given
> > table.
> >
>
> I think it can help to determine what is logged for the DELETE or
> UPDATE operation, but not the exact amount of data. Can you please
> explain with an example how such information can help with debugging?

No. The change itself won't tell the amount of data that will be
logged. But given a publication, it will tell which of the tables
being published by that publication are using replica identity full,
which causes more columns to be logged compared to replica identity
default or index.

>
> > Given a set of publications that a WAL sender is using,
> > pg_publication_tables can be used to get the list of tables whose
> > changes will be replicated including the columns of those tables and
> > row
> > filters. But the replica identity of those tables needs to be
> > separately found out by querying pg_class or joining pg_class with
> > pg_publication_tables on schemaname and relname. Adding the replica
> > identity column to pg_publication_tables avoids this extra step.
> >
> > The replica identity of a given table is not a property of
> > publication, per se, so it's arguable whether it should be included
> > in pg_publication_tables or not.
> >
>
> Right, discussing the use case a bit more might help us to find if
> this is the right place to add 'replica identity' information.

In our case, we had multiple replication slots, each with a different
publication. One of the slots was lagging because of the amount of
data being sent through the slot. For that, we wanted to know which
tables are being published through the corresponding publication and
what the replica identity of each of those tables is.
pg_publication_tables gave us the list of tables, but in order to get
their replica identities we needed to join it with pg_class again.
pg_publication_tables already joins pg_class. Exposing replica
identity through pg_publication_tables makes it more convenient to get
all the information related to a table's replication through that
publication in a single line, without much code change or runtime
cost.

-- 
Best Wishes,
Ashutosh Bapat
From 534135e55ea228a42f735f9dd3cc3bead9b12f70 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Fri, 27 Jun 2025 12:25:31 +0530
Subject: [PATCH] Report replica identity property of tables in
 pg_publication_tables

When debugging issues with logical replication, replica identity
property of tables in publication is often useful, for example, to
determine the amount of data logged for an UPDATE or DELETE on a table.

Given a set of publications that a WAL sender is using,
pg_publication_tables can be used to get the list of tables whose
changes will be replicated including the columns of those tables and row
filters. But the replica identity of those tables needs to be separately
found out by querying pg_class or joining pg_class with
pg_publication_tables. Adding replica identity column to
pg_publication_tables avoids this extra step.

The replica identity for a given table does not change with the
publication, so the information will be repeated as many times as the
number of publications a given table is part of. But the repetition is
worth the convenience.

Ashutosh Bapat
---
 doc/src/sgml/system-views.sgml|  9 +++
 src/backend/catalog/system_views.sql  |  9 ++-
 src/test/regress/expected/publication.out | 96 ++-
 src/test/regress/expected/rules.out   |  9 ++-
 src/test/regress/sql/publication.sql  |  9 +++
 5 files changed, 109 insertions(+), 23 deletions(-)

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 986ae1f543d..8ec1b7ba499 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -2558,6 +2558,15 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
Expression for the table's publication qualifying condition
   
  
+
+ 
+  
+   replica_identity text
+  
+  
+   Replica identity setting of the table.
+  
+ 
 

   
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 08f780a2e63..34fca1bba54 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -388,7 +388,14 @@ CREATE VIEW pg_publication_tables AS
   WHERE a.attrelid = GPT.relid AND
 a.attnum = ANY(GPT.attrs)
 ) AS attnames,
-pg_get_expr(GPT.qual, GPT.relid) AS rowfilter
+pg_get_expr(GPT.qual, GPT.relid) AS rowfilter,
+case C.relrepli

RE: pg_logical_slot_get_changes waits continously for a partial WAL record spanning across 2 pages

2025-06-30 Thread Hayato Kuroda (Fujitsu)
Dear Vignesh,

> I was unable to reproduce the same test failure on the PG17 branch,
> even after running the test around 500 times. However, on the master
> branch, the failure consistently reproduces approximately once in
> every 50 runs. I also noticed that while the buildfarm has reported
> multiple failures for this test for the master branch, none of them
> appear to be on the PG17 branch. I'm not yet sure why this discrepancy
> exists.

I was also not able to reproduce it as-is. After analyzing a bit more, I
found that on PG17 the workload cannot generate an FPI_FOR_HINT. That
type of WAL record is longer than a page, so on HEAD there was a
possibility that the WAL record could be flushed partially. But on PG17
that could not happen, so the OVERWRITE_CONTRECORD would not appear.

I modified the test code like [1] and confirmed that the same hang could
happen on PG17. It generates a long record which can cross the page
boundary and can be flushed partially.

[1]: 
```
--- a/src/test/recovery/t/046_checkpoint_logical_slot.pl
+++ b/src/test/recovery/t/046_checkpoint_logical_slot.pl
@@ -123,6 +123,10 @@ $node->safe_psql('postgres',
 $node->safe_psql('postgres',
q{select injection_points_wakeup('checkpoint-before-old-wal-removal')});
 
+# Generate a long WAL record
+$node->safe_psql('postgres',
+   q{select pg_logical_emit_message(false, '', repeat('123456789', 
1000))});
```

Best regards,
Hayato Kuroda
FUJITSU LIMITED



Re: pg_logical_slot_get_changes waits continously for a partial WAL record spanning across 2 pages

2025-06-30 Thread vignesh C
On Mon, 30 Jun 2025 at 17:41, Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Vignesh,
>
> > I was unable to reproduce the same test failure on the PG17 branch,
> > even after running the test around 500 times. However, on the master
> > branch, the failure consistently reproduces approximately once in
> > every 50 runs. I also noticed that while the buildfarm has reported
> > multiple failures for this test for the master branch, none of them
> > appear to be on the PG17 branch. I'm not yet sure why this discrepancy
> > exists.
>
> I was also not able to reproduce it as-is. After analyzing a bit more, I
> found that on PG17 the workload cannot generate an FPI_FOR_HINT. That
> type of WAL record is longer than a page, so on HEAD there was a
> possibility that the WAL record could be flushed partially. But on PG17
> that could not happen, so the OVERWRITE_CONTRECORD would not appear.
>
> I modified the test code like [1] and confirmed that the same hang could
> happen on PG17. It generates a long record which can cross the page
> boundary and can be flushed partially.
>
> [1]:
> ```
> --- a/src/test/recovery/t/046_checkpoint_logical_slot.pl
> +++ b/src/test/recovery/t/046_checkpoint_logical_slot.pl
> @@ -123,6 +123,10 @@ $node->safe_psql('postgres',
>  $node->safe_psql('postgres',
> q{select 
> injection_points_wakeup('checkpoint-before-old-wal-removal')});
>
> +# Generate a long WAL record
> +$node->safe_psql('postgres',
> +   q{select pg_logical_emit_message(false, '', repeat('123456789', 
> 1000))});
> ```

Thanks, Kuroda-san. I’ve prepared a similar test that doesn’t rely on
injection points. The issue reproduced consistently across all
branches up to PG13. You can use the attached
049_slot_get_changes_wait_continously_pg17.pl script (found in the
049_slot_get_changes_wait_continously_pg17.zip file) to verify this.
Just copy the script to src/test/recovery and run the test to observe
the problem.
The included patch addresses the issue. Use
v3_PG17-0001-Fix-infinite-wait-when-reading-partially-written-.patch
for PG17, PG16, and PG15, and
v3_PG14-0001-Fix-infinite-wait-when-reading-partially-written-.patch
for PG14 and PG13.

Regards,
Vignesh


v3_PG14-0001-Fix-infinite-wait-when-reading-partially-written-.patch
Description: Binary data


v3-0001-Fix-infinite-wait-when-reading-partially-written-.patch
Description: Binary data


v3_PG17-0001-Fix-infinite-wait-when-reading-partially-written-.patch
Description: Binary data


Re: [PATCH] Support for basic ALTER TABLE progress reporting.

2025-06-30 Thread Jiří Kavalík
Hi, thank you for the review!


On Fri, Jun 6, 2025 at 10:37 AM jian he  wrote:

> On Mon, Jun 2, 2025 at 3:35 PM Jiří Kavalík  wrote:
> > What I changed:
> >
> > `commands/tablecmds.c`
> > - start and end reporting inside `ATRewriteTables()`
> > - report blocks total, blocks and tuples scanned and possibly tuples
> written in `ATRewriteTable`
> > - add at least phase info in `validateForeignKeyConstraint`, possibly
> more if the check cannot be done by left join
> >
>
> hi.
> similar to DoCopyTo, where processed is uint64:
> in ATRewriteTable, shouldn't numTuples also be uint64, not int?


> + pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_TUPLES_WRITTEN,
> + numTuples);
> +
>

Yes, that makes sense; updated patch attached.



> Later we may do constraint checks in  ``foreach(l, tab->constraints)``.
> putting it after table_tuple_insert would be more appropriate, IMHO.


As I understand it, if we get an error from any of the constraints, the
ALTER fails anyway so there seems to be no window for mismatch. But it
might make sense to move it into the same block where `table_tuple_insert`
is anyway.


>
> static void
> ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE
> lockmode,
> AlterTableUtilityContext *context)
> {
> ListCell   *ltab;
> /* Go through each table that needs to be checked or rewritten */
> foreach(ltab, *wqueue)
> {
> AlteredTableInfo *tab = (AlteredTableInfo *) lfirst(ltab);
> /* Relations without storage may be ignored here */
> if (!RELKIND_HAS_STORAGE(tab->relkind))
> continue;
> /* Start progress reporting */
> pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER,
> tab->relid);
> pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
> PROGRESS_CLUSTER_COMMAND_ALTER_TABLE);
> }
>
> Perhaps this is a bit early; we haven't completed the error checks yet.
> We have several "ereport(ERROR..." calls below.
> maybe place it before ATRewriteTable, IMHO.
>

There are three different branches under it, two of them containing
`ATRewriteTable()` calls. So I picked a place before that branching instead
of starting it in two separate places. It seems simpler, but that's my only
argument, so I am open to other opinions (for both this and where to put the
"tuples_written" update).

Best regards
jkavalik
From e5a60149438308d4d5e107f9cc36d5d2f659188d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ji=C5=99=C3=AD=20Kaval=C3=ADk?= 
Date: Sun, 25 May 2025 23:23:56 +0200
Subject: [PATCH] ALTER TABLE progress support

---
 doc/src/sgml/monitoring.sgml | 13 +++
 doc/src/sgml/ref/alter_table.sgml| 10 ++
 src/backend/catalog/storage.c|  7 
 src/backend/catalog/system_views.sql |  2 ++
 src/backend/commands/tablecmds.c | 54 
 src/include/commands/progress.h  |  2 ++
 src/test/regress/expected/rules.out  |  2 ++
 7 files changed, 84 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 4265a22d4de..09307c5f490 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -400,7 +400,7 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34 
  0:00 postgres: ser
  
   
pg_stat_progress_clusterpg_stat_progress_cluster
   One row for each backend running
-   CLUSTER or VACUUM FULL, showing 
current progress.
+   CLUSTER, VACUUM FULL or 
ALTER TABLE, showing current progress.
See .
   
  
@@ -5492,7 +5492,7 @@ FROM pg_stat_get_backend_idset() AS backendid;
PostgreSQL has the ability to report the 
progress of
certain commands during command execution.  Currently, the only commands
which support progress reporting are ANALYZE,
-   CLUSTER,
+   CLUSTER, ALTER TABLE,
CREATE INDEX, VACUUM,
COPY,
and  (i.e., replication
@@ -5738,8 +5738,9 @@ FROM pg_stat_get_backend_idset() AS backendid;
   
 
   
-   Whenever CLUSTER or VACUUM FULL is
-   running, the pg_stat_progress_cluster view will
+   Whenever CLUSTER, VACUUM FULL
+   or ALTER TABLE is running,
+   the pg_stat_progress_cluster view will
contain a row for each backend that is currently running either command.
The tables below describe the information that will be reported and
provide information about how to interpret it.
@@ -5801,7 +5802,7 @@ FROM pg_stat_get_backend_idset() AS backendid;
command text
   
   
-   The command that is running. Either CLUSTER or 
VACUUM FULL.
+   The command that is running. Either CLUSTER, 
VACUUM FULL or ALTER TABLE.
   
  
 
@@ -5884,7 +5885,7 @@ FROM pg_stat_get_backend_idset() AS backendid;
   
 
   
-   CLUSTER and VACUUM FULL Phases
+   CLUSTER, VACUUM FULL and ALTER TABLE Phases

 
 
diff --git a/doc/src/sgml/ref/alter_table.sgml 
b/doc/src/sgml/ref/alter_table.sgml
index d63f3a621ac..228f27ac5fc 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.s

Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options

2025-06-30 Thread Krasiyan Andreev
Hi,
Patch applies and compiles, all included tests pass, and the performance
gain is really impressive. I have been using the latest versions for
months with real data and didn't find any bugs, so it is definitely
ready for committer status.

On Mon, Jun 30, 2025 at 8:26, Tatsuo Ishii wrote:

> Attached is the v15 patch to fix CFbot complains.
> Other than that, nothing has been changed since v14.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS K.K.
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp
>


[PATCH] Prevent replacement of a function if it's used in an index expression and is not IMMUTABLE

2025-06-30 Thread sundayjiang(蒋浩天)
Hi hackers,

The purpose of this patch is to prevent replacing a function via `CREATE OR 
REPLACE FUNCTION` with a new definition that is not marked as `IMMUTABLE`, if 
the existing function is referenced by an index expression.

Replacing such functions may lead to index corruption or runtime semantic 
inconsistencies, especially when the function’s output is not stable for the 
same input.

This patch is motivated by our analysis of both recent and earlier PostgreSQL 
security vulnerabilities, notably:

- CVE-2020-25695: A privilege escalation issue caused by non-IMMUTABLE 
expressions.
- CVE-2024-1713: A similar privilege escalation issue related to the `plv8` 
extension, akin to CVE-2020-25695.

Although these CVE vulnerabilities have been fixed, we believe this patch 
enforces a stricter rule that further enhances PostgreSQL’s robustness:

If a function is used in an index, it can only be replaced if it is declared as 
`IMMUTABLE`.

This strategy aligns with PostgreSQL’s established assumption that `IMMUTABLE` 
functions are safe to use in indexes and their behavior should remain 
consistent after updates.
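
To sketch the kind of check we mean (simplified; the attached patch may
structure it differently), the CREATE OR REPLACE path can scan pg_depend
for indexes that reference the function being replaced:

static bool
function_used_by_index(Oid funcoid)
{
    bool        result = false;
    Relation    depRel;
    ScanKeyData key[2];
    SysScanDesc scan;
    HeapTuple   tup;

    depRel = table_open(DependRelationId, AccessShareLock);

    /* find pg_depend entries that reference this function */
    ScanKeyInit(&key[0],
                Anum_pg_depend_refclassid,
                BTEqualStrategyNumber, F_OIDEQ,
                ObjectIdGetDatum(ProcedureRelationId));
    ScanKeyInit(&key[1],
                Anum_pg_depend_refobjid,
                BTEqualStrategyNumber, F_OIDEQ,
                ObjectIdGetDatum(funcoid));

    scan = systable_beginscan(depRel, DependReferenceIndexId, true,
                              NULL, 2, key);

    while (HeapTupleIsValid(tup = systable_getnext(scan)))
    {
        Form_pg_depend dep = (Form_pg_depend) GETSTRUCT(tup);

        /* is the dependent object an index (e.g. an expression index)? */
        if (dep->classid == RelationRelationId &&
            get_rel_relkind(dep->objid) == RELKIND_INDEX)
        {
            result = true;
            break;
        }
    }

    systable_endscan(scan);
    table_close(depRel, AccessShareLock);

    return result;
}

If this returns true and the new definition is not marked IMMUTABLE,
CREATE OR REPLACE FUNCTION would raise an error.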

We hope this contribution benefits the community, and we welcome your valuable 
feedback.

Sincerely, 
xiaojiluo (Tencent Yunding Lab) 

0001-Prevent-replacement-of-a-function-if-it-s-used-in-an.patch
Description: Binary data


Re: Historic snapshot doesn't track txns committed in BUILDING_SNAPSHOT state

2025-06-30 Thread cca5507
Hi,


If I understand correctly, this may break the basic principle:


The aim is to build a snapshot that behaves the same as a freshly taken 
MVCC snapshot would have at the time the XLogRecord was generated.


--
Regards,
ChangAo Chen



-- Original --
From: "Ajin Cherian" 
https://www.postgresql.org/message-id/Zrmh7X8jYCbFYXjH%40ip-10-97-1-34.eu-west-3.compute.internal

I see this problem is similar to the bug reported in [1], and your fix
also addresses the issue reported there. Although I like your approach
of tracking changes starting from the BUILDING_SNAPSHOT state, I'd
like to suggest an alternative.

While debugging that issue, my plan was not to track catalog changes
prior to SNAPBUILD_CONSISTENT, but instead to ensure we don't use
snapshots built before SNAPBUILD_CONSISTENT, since we don't track
catalog changes in those states. We should discard previously built
snapshots and rebuild them once we reach the SNAPBUILD_CONSISTENT
state. At that point, all necessary transactions would have been
committed, and builder->xmin would have advanced enough to decode all
transactions from then on.

The problem is that previously built snapshots hang around without the
latest xmin and xmax, and we tend to reuse them. We should ensure that
all txn->base_snapshot and builder->snapshot snapshots built in the
SNAPBUILD_FULL_SNAPSHOT state are rebuilt once we reach
SNAPBUILD_CONSISTENT. For this, we need to track when the snapshot was
built. There is already a field in ReorderBufferTXN -
'base_snapshot_lsn' which we can use. If base_snapshot_lsn <
builder->start_decoding_at, then we should rebuild the snapshot. Just
a thought.

regards,
Ajin Cherian
Fujitsu Australia

[1] - 
https://www.postgresql.org/message-id/18509-983f064d174ea880%40postgresql.org

Re: doc: explain pgstatindex fragmentation

2025-06-30 Thread Peter Eisentraut

On 24.01.25 15:41, Frédéric Yhuel wrote:



On 1/24/25 14:58, Laurenz Albe wrote:

On Fri, 2025-01-24 at 13:34, Bertrand Drouvot wrote:
+ Since indexes have a default fillfactor of 90, this should be around 0.9 for
+ newly built indexes

I think 0.9 should be replaced by 90 (that's the actual kind of
output we'd get).




Damn! I missed that one too...

But having said that, I'm not sure we should mention that 90 figure,
because it depends on the amount of data indexed (I mean, if the index
has very few leaf pages, say < 5, then it's easy to be << 90 since
it's an average). That's probably not the majority of indexes, though,
so maybe just nuance the sentence a bit.


Sorry about the 0.9.

Perhaps the wording could be more careful: ... this should be around
90 for most newly built indexes of non-negligible size.



It looks good to me (apart from the typo). v4 attached


committed





Re: tab complete for COPY populated materialized view TO

2025-06-30 Thread Fujii Masao




On 2025/04/11 4:37, Kirill Reshke wrote:

On Fri, 11 Apr 2025 at 00:33, David G. Johnston
 wrote:


They are supported for the From variant; valid completions need only satisfy 
one of to/from, not both.



Thank you. If so, then WFM, and I don't have any more objections.


I've pushed the patch. Thanks!

Regards,

--
Fujii Masao
NTT DATA Japan Corporation





A concurrent VACUUM FULL?

2025-06-30 Thread Erik Nordström
Hi hackers,

I've been looking at the code for CLUSTER/VACUUM FULL, and whether it is
possible to do a concurrent version of it using a multi-transactional
approach similar to concurrent reindexing and partition detach.

The idea would be to hold weaker locks in TX1 when doing the heap rewrite
(essentially allow reads but prevent writes), and then do the actual heap
swap in a second TX2 transaction.

Having experimented a bit with this approach, I found that reindexing is an
issue because that happens after the new heap has been swapped in. The
current reindex during a heap swap effectively blocks reads so if one
starts a new transaction after swapping heaps, it will block reads for a
long time.

This made me think about two ways to handle this:

1. Rebuild indexes on the temporary heap in TX1 and then swap in the new
indexes along with the new heap in TX2.

2. Do a concurrent index rebuild after the heap swap.

Of the two approaches above, (2) seems easiest to implement, but the
downside is that indexes would be invalid while they are rebuilt.
Therefore, (1) seems to be the more desirable one, because all the heavy
lifting would be done in TX1 on the temporary heap.

Does anyone have a sense of whether approach (1) is feasible or whether
there are any major blockers?

Is this worth pursuing at all or am I missing something?

Best regards,

Erik
-- 
Database Architect, Timescale


Re: Logical Replication of sequences

2025-06-30 Thread Nisha Moond
On Fri, Jun 27, 2025 at 8:50 AM shveta malik  wrote:
>
> On Wed, Jun 25, 2025 at 7:42 PM Shlok Kyal  wrote:
> > Hi Vignesh,
> >
> > I tested with all patches applied. I have a comment:
> >
> > Let's consider the following case:
> > On the publisher, create a publication pub1 on all sequences. The
> > publication has sequence s1. The current value of s1 is 2,
> > and on the subscriber we have a subscription on pub1 and sequence s1
> > has value 5. Now we run:
> > "ALTER SUBSCRIPTION sub1 REFRESH PUBLICATION SEQUENCES"
> >
> > Now on subscriber currval still show '5':
> > postgres=# select currval('s1');
> >  currval
> > -
> >5
> > (1 row)
> >
> > But when we do nextval on s1 on the subscriber, we get '3', which is
> > correct assuming the sequence is synced:
> > postgres=# select nextval('s1');
> >  nextval
> > -
> >3
> > (1 row)
> > postgres=# select currval('s1');
> >  currval
> > -
> >3
> > (1 row)
> >
> > Is this behaviour expected? I feel the initial "select
> > currval('s1');" should have displayed '2'. Thoughts?
>
> As per docs at [1], currval returns the value most recently obtained
> by nextval for this sequence in the current session. So behaviour is
> in alignment with docs. I tested this across multiple sessions,
> regardless of whether sequence synchronization occurs or not. The
> behavior remains consistent across sessions. As an example, if in
> session2 the sequence has advanced to a value of 10 by calling
> nextval, the currval in session1 still shows the previous value of 3,
> which was obtained by nextval in session1. So, it seems expected to
> me. But let's wait for Vignesh's comments on this.
>
> [1]: https://www.postgresql.org/docs/current/functions-sequence.html
>

I also agree. It is expected behavior, as currval returns the last
nextval generated in the same session. I've also observed this across
multiple sessions.

--
Thanks,
Nisha




Re: Logical Replication of sequences

2025-06-30 Thread Nisha Moond
On Wed, Jun 25, 2025 at 9:26 AM shveta malik  wrote:
>
> On Tue, Jun 24, 2025 at 6:44 PM Shlok Kyal  wrote:
> >
> >
> > 1. Initially, I have created a publication on sequence s1.
> > postgres=# CREATE PUBLICATION pub1 FOR ALL SEQUENCES;
> > CREATE PUBLICATION
> > postgres=# ALTER PUBLICATION pub1 SET TABLE t1;
> > ALTER PUBLICATION
> > postgres=# \d s1
> >  Sequence "public.s1"
> >   Type  | Start | Minimum |   Maximum   | Increment | Cycles? | 
> > Cache
> > +---+-+-+---+-+---
> >  bigint | 1 |   1 | 9223372036854775807 | 1 | no  | 
> > 1
> > Publications:
> > "pub1"
> > postgres=# select * from pg_publication_rel;
> >   oid  | prpubid | prrelid | prqual | prattrs
> > ---+-+-++-
> >  16415 |   16414 |   16388 ||
> > (1 row)
> >
> > Here, we can set the publication to TABLE or TABLES FOR SCHEMA. Should
> > this be allowed?
> > If a publication is created with FOR ALL TABLES, such an operation is
> > not allowed.
> >
>
> Good catch. IMO, this should not be allowed as currently we strictly
> support either ALL SEQUENCES or ALL SEQUENCES with ALL TABLES alone.
>

+1

A similar situation existed for the ALTER PUBLICATION ... ADD ...
command as reported in [1] (point #3).
This has been addressed in v20250630, where similar to ALL TABLES, ADD
or SET operations are now disallowed for ALL SEQUENCES publications.

[1] 
https://www.postgresql.org/message-id/CABdArM7h1qQLUb_S7i6MrLPEtHXnX%2BY2fPQaSnqhCdHktcQk5Q%40mail.gmail.com

--
Thanks,
Nisha




Re: pg_logical_slot_get_changes waits continously for a partial WAL record spanning across 2 pages

2025-06-30 Thread Dilip Kumar
On Fri, Jun 27, 2025 at 9:29 PM vignesh C  wrote:
>
> On Fri, 27 Jun 2025 at 07:05, Michael Paquier  wrote:
> >
> > On Thu, Jun 26, 2025 at 05:25:42PM +0530, vignesh C wrote:
> > > On Thu, 26 Jun 2025 at 06:22, Michael Paquier  wrote:
> > >> So you are suggesting the addition of an extra ReadPageInternal() that
> > >> forces a read of only the header, performs the checks on the header,
> > >> then reads the rest.  After reading SizeOfXLogShortPHD worth of data,
> > >> shouldn't the checks on xlp_rem_len be done a bit earlier than what
> > >> you are proposing in this patch?
> > >
> > > Modified
> >
> > It seems to me that this assert can be moved after the second page
> > read:
> > Assert(SizeOfXLogShortPHD <= readOff);
>
> I felt this Assert can be here to ensure we’ve read SizeOfXLogShortPHD
> before checking the page header contents. But for the second page
> read, the following existing Assert should be enough:
> Assert(pageHeaderSize <= readOff);

Thanks make sense.

> >
> > Coming back to the point of Kuroda-san about performance, did you do
> > some checks related to that and did you measure any difference?  I
> > suspect none of that because in most cases we are just going to fetch
> > the next page and we would trigger the fast-exit path of
> > ReadPageInternal() on the second call when fetching the rest.  I still
> > need to get an idea of all that by myself, probably with various
> > lengths of logical message records.
>
> The test execution times are measured in microseconds. In the results
> table below, the first row indicates the message size, and each value
> represents the median of 5 test runs.
> The attached script was used to run these tests. In the attached
> script the MSG_SIZE variable in the script should be changed to 1000 (1
> page), 10000 (2 pages approx), 25000 (3 pages approx), 50000 (6 pages
> approx), 100000 (12 pages approx), 1000000 (122 pages approx), 10000000
> (1220 pages approx) and 100000000 (12207 pages approx) and be run.
> Test execution time can be taken from run_*.dat files that will be
> generated.
>
> Size  | 1000   | 10000  | 25000   | 50000   | 100000  | 1000000
> ------|--------|--------|---------|---------|---------|--------
> Head  | 9297.1 | 9895.4 | 10844.2 | 12946.5 | 16945.1 | 86187.1
> Patch | 9222.7 | 9889   | 10897.1 | 12904.2 | 16858.4 | 87115.5
>
> Size  | 10000000 | 100000000
> ------|----------|----------
> HEAD  | 804965.6 | 331639.7
> Patch | 804942.6 | 321198.6
>
> The performance results show that the patch does not introduce any
> noticeable overhead across varying message sizes; I felt there was no
> impact from the additional page header read.

Yeah, that doesn't seem like a regression.

-- 
Regards,
Dilip Kumar
Google




Re: Error with DEFAULT VALUE in temp table

2025-06-30 Thread Антуан Виолин
> Hi everyone,
> I found a bug: when you drop a temporary table with a DEFAULT VALUE, the
> row for this default value in pg_event_trigger_dropped_objects() contains
> “false” in the column “is_temporary”. But if you look at the column “name_obj”,
> you see “for pg_temp.table_name”. So PostgreSQL knows that it is temporary.
>
> Cheers
>
> Antoine Violin
>
Hi everyone,
I made a patch for this problem: I changed event_trigger.c so that the
temporariness of a dropped DEFAULT VALUE is determined correctly.

Cheers

Antoine Violin
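
A minimal reproduction sketch (object names are made up for illustration):

CREATE FUNCTION report_drops() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT * FROM pg_event_trigger_dropped_objects() LOOP
        RAISE NOTICE 'type=% identity=% is_temporary=%',
            r.object_type, r.object_identity, r.is_temporary;
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER report_drops_trg ON sql_drop
    EXECUTE FUNCTION report_drops();

CREATE TEMP TABLE t (a int DEFAULT 1);
DROP TABLE t;
-- Before the patch, the "default value" row is reported with
-- is_temporary = false even though its identity refers to pg_temp.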
From aefa629be36f74b3067317ed67874f6afc73 Mon Sep 17 00:00:00 2001
From: Antoine Violin 
Date: Mon, 30 Jun 2025 10:20:52 +0700
Subject: [PATCH] Changed event_trigger for DEFAULT VALUE
---
 src/backend/commands/event_trigger.c | 137 ---
 1 file changed, 80 insertions(+), 57 deletions(-)

diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 31b2f51b589..e3aa1d36c26 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -21,6 +21,7 @@
 #include "catalog/dependency.h"
 #include "catalog/indexing.h"
 #include "catalog/objectaccess.h"
+#include "catalog/pg_attrdef.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_auth_members.h"
 #include "catalog/pg_database.h"
@@ -1276,6 +1277,69 @@ trackDroppedObjectsNeeded(void)
  * objects "reentrantly".
  */
 
+static bool
+process_catalog_object(Oid classId, Oid objectId, SQLDropObject *obj)
+{
+	Relation	catalog;
+	HeapTuple	tuple;
+
+	catalog = table_open(classId, AccessShareLock);
+	tuple = get_catalog_object_by_oid(catalog,
+	  get_object_attnum_oid(classId),
+	  objectId);
+
+	if (tuple)
+	{
+		AttrNumber	attnum;
+		Datum		datum;
+		bool		isnull;
+
+		attnum = get_object_attnum_namespace(classId);
+		if (attnum != InvalidAttrNumber)
+		{
+			datum = heap_getattr(tuple, attnum,
+ RelationGetDescr(catalog), &isnull);
+			if (!isnull)
+			{
+Oid			namespaceId;
+
+namespaceId = DatumGetObjectId(datum);
+/* temp objects are only reported if they are my own */
+if (isTempNamespace(namespaceId))
+{
+	obj->schemaname = "pg_temp";
+	obj->istemp = true;
+}
+else if (isAnyTempNamespace(namespaceId))
+{
+	table_close(catalog, AccessShareLock);
+	return false;
+}
+else
+{
+	obj->schemaname = get_namespace_name(namespaceId);
+	obj->istemp = false;
+}
+			}
+		}
+
+		if (get_object_namensp_unique(classId) &&
+			obj->address.objectSubId == 0)
+		{
+			attnum = get_object_attnum_name(classId);
+			if (attnum != InvalidAttrNumber)
+			{
+datum = heap_getattr(tuple, attnum,
+	 RelationGetDescr(catalog), &isnull);
+if (!isnull)
+	obj->objname = pstrdup(NameStr(*DatumGetName(datum)));
+			}
+		}
+	}
+
+	table_close(catalog, AccessShareLock);
+	return true;
+}
 /*
  * Register one object as being dropped by the current command.
  */
@@ -1302,7 +1366,6 @@ EventTriggerSQLDropAddObject(const ObjectAddress *object, bool original, bool no
 	obj->address = *object;
 	obj->original = original;
 	obj->normal = normal;
-
 	/*
 	 * Obtain schema names from the object's catalog tuple, if one exists;
 	 * this lets us skip objects in temp schemas.  We trust that
@@ -1311,66 +1374,26 @@ EventTriggerSQLDropAddObject(const ObjectAddress *object, bool original, bool no
 	 */
 	if (is_objectclass_supported(object->classId))
 	{
-		Relation	catalog;
-		HeapTuple	tuple;
-
-		catalog = table_open(obj->address.classId, AccessShareLock);
-		tuple = get_catalog_object_by_oid(catalog,
-		  get_object_attnum_oid(object->classId),
-		  obj->address.objectId);
-
-		if (tuple)
+		if (!process_catalog_object(object->classId, object->objectId, obj))
 		{
-			AttrNumber	attnum;
-			Datum		datum;
-			bool		isnull;
-
-			attnum = get_object_attnum_namespace(obj->address.classId);
-			if (attnum != InvalidAttrNumber)
-			{
-datum = heap_getattr(tuple, attnum,
-	 RelationGetDescr(catalog), &isnull);
-if (!isnull)
-{
-	Oid			namespaceId;
-
-	namespaceId = DatumGetObjectId(datum);
-	/* temp objects are only reported if they are my own */
-	if (isTempNamespace(namespaceId))
-	{
-		obj->schemaname = "pg_temp";
-		obj->istemp = true;
-	}
-	else if (isAnyTempNamespace(namespaceId))
-	{
-		pfree(obj);
-		table_close(catalog, AccessShareLock);
-		MemoryContextSwitchTo(oldcxt);
-		return;
-	}
-	else
-	{
-		obj->schemaname = get_namespace_name(namespaceId);
-		obj->istemp = false;
-	}
-}
-			}
-
-			if (get_object_namensp_unique(obj->address.classId) &&
-obj->address.objectSubId == 0)
+			pfree(obj);
+			MemoryContextSwitchTo(oldcxt);
+			return;
+		}
+	}
+	else if (object->classId == AttrDefaultRelationId)
+	{
+		ObjectAddress relAddress;
+		relAddress = GetAttrDefaultColumnAddress(object->objectId);
+		if (OidIsValid(relAddress.objectId))
+		{
+			if(!process_catalog_object(relAddress.classId, rel

Re: speedup COPY TO for partitioned table.

2025-06-30 Thread torikoshia

Thanks for updating the patch!

On 2025-06-27 16:14, jian he wrote:

On Thu, Jun 26, 2025 at 9:43 AM torikoshia  
wrote:


After applying the patch, blank lines exist between these statements as
below. Do we really need these blank lines?

```
  scan_rel = table_open(scan_oid,
AccessShareLock);

  CopyThisRelTo(cstate, scan_rel, cstate->rel,
&processed);

  table_close(scan_rel, AccessShareLock);
```


we can remove these empty new lines.
actually, I realized we don't need to use AccessShareLock here—we can
use NoLock instead, since BeginCopyTo has already acquired AccessShareLock
via find_all_inheritors.


That makes sense.
I think it would be helpful to add a comment explaining why NoLock is 
safe here — for example:


  /* We already got the needed lock */

In fact, in other places where table_open(..., NoLock) is used, similar
explanatory comments are often included (the above comment is one of them).



> +/*
> + * rel: the relation from which the actual data will be copied.
> + * root_rel: if not NULL, it indicates that we are copying partitioned
> relation
> + * data to the destination, and "rel" is the partition of "root_rel".
> + * processed: number of tuples processed.
> +*/
> +static void
> +CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,

This comment only describes the parameters. Wouldn't it be better to add a
brief summary of what this function does overall?



what do you think of the following?

/*
 * CopyThisRelTo:
 * This will scan a single table (which may be a partition) and export
 * its rows to a COPY destination.
 *
 * rel: the relation from which the actual data will be copied.
 * root_rel: if not NULL, it indicates that we are copying partitioned relation
 * data to the destination, and "rel" is the partition of "root_rel".
 * processed: number of tuples processed.
*/
static void
CopyThisRelTo(CopyToState cstate, Relation rel, Relation root_rel,
  uint64 *processed)


I think it would be better to follow the style of nearby functions in 
the same file. For example:


  /*
   * Scan a single table (which may be a partition) and export
   * its rows to the COPY destination.
   */

Also, regarding the function name CopyThisRelTo() — I wonder if the 
"This" is really necessary?

Maybe something simpler like CopyRelTo() is enough.

What do you think?


Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.




Re: add function for creating/attaching hash table in DSM registry

2025-06-30 Thread Rahila Syed
Hi,


> Here is what I have staged for commit.
>
Thank you for sharing the updated patch.
This looks good to me. I used the GetNamedDSA added by this patch in my
code and it worked fine.

 /* XXX: Should we verify params matches what table was created with? */

Are you planning to address the above before you commit?  It seems like a
helpful check since GetNamedDshash
takes the params as arguments. That said, I don't have a strong preference
either way.

 I have a question: is there a way to remove the entries from the registry
and free the space?
 For example, if a user decides to call dshash_destroy, the dshash entry in
the registry would no longer be needed.

Thank you,
Rahila Syed


Re: Typo in pg_stat_activity definition

2025-06-30 Thread Daisuke Higuchi
 Thank you for comments!

>> One thing that you are forgetting is that the regression
>> tests are going to fail: rules.out reports the definition of
>> pg_stat_activity, and the characters casing matters in this case.
>
>I doubt it --- the regression tests will show deparsing results,
>which shouldn't change.

Yes, applying this patch will not cause the 'rules' test to fail.
Even if the pg_stat_activity view definition information in
system_views.sql is capitalized (inconsistency is fixed) by this patch, it
will be lowercased when this view is installed in the database. So, the test
result does not change.
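
One way to confirm this is to look at the stored (deparsed) definition,
which is independent of the casing used in system_views.sql:

SELECT definition FROM pg_views WHERE viewname = 'pg_stat_activity';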

Just to be sure, I ran 'make check' and 'make check-world', and I
confirmed that all tests passed.

Regards.


Report bytes and transactions actually sent downstream

2025-06-30 Thread Ashutosh Bapat
Hi All,
In a recent logical replication issue, there were multiple replication
slots involved, each using a different publication. Thus the amount of
data that was replicated through each slot was expected to be
different. However, total_bytes and total_txns were reported the same
for all the replication slots, since those counters track what is added to
the reorder buffer rather than what is actually sent. One of the slots started
lagging and we were trying to figure out whether it's the WAL sender
slowing down or the consumer (in this case Debezium). The lagging
slot then showed total_txns and total_bytes lesser than other slots
giving an impression that the WAL sender is processing the data
slowly. Had pg_stat_replication_slots reported the amount of data
actually sent downstream, it would have been easier to compare it with
the amount of data received by the consumer and thus pinpoint the
bottleneck.

Here's a patch to do the same. It adds two columns to
pg_stat_replication_slots:
- sent_txns: The total number of transactions sent downstream.
- sent_bytes: The total number of bytes sent downstream in data messages.
sent_bytes includes only the bytes sent as part of 'd' messages and does
not include keepalive or CopyDone messages, for example. But those are
very few and can be ignored. If others feel those are important to
include, we can make that change.
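
With the patch applied, the comparison described above could then be done
with something like this (column names as added by the patch):

SELECT slot_name, total_txns, total_bytes, sent_txns, sent_bytes
FROM pg_stat_replication_slots;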

Plugins may choose not to send an empty transaction downstream. It's
better to increment the sent_txns counter in the plugin code when it
actually sends a BEGIN message, for example in pgoutput_send_begin()
and pg_output_begin(). This means that every plugin will need to be
modified to increment the counter for it to be reported correctly.

I first thought of incrementing sent_bytes in OutputPluginWrite()
which is a central function for all logical replication message
writes. But that calls LogicalDecodingContext::write() which may
further add bytes to the message e.g. WalSndWriteData() and
LogicalOutputWrite(). So it's better to increment the counter in
implementations of LogicalDecodingContext::write(), so that we count
the exact number of bytes. These implementations are within core code
so they won't miss updating sent_bytes.

I think we should rename total_txns and total_bytes to reordered_txns
and reordered_bytes respectively, and also update the documentation
accordingly to make better sense of those numbers. But these patches
do not contain that change. If others feel the same way, I will
provide a patch with that change.

-- 
Best Wishes,
Ashutosh Bapat
From 1a385b30d6cb4ca111cbcc16ea14017c08f9a579 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Fri, 27 Jun 2025 09:16:23 +0530
Subject: [PATCH] Report data sent statistics in pg_stat_replication_slots

pg_stat_replication_slots reports the total number of transactions and bytes
added to the reorder buffer. This is the amount of WAL that is processed by the
WAL sender. This data undergoes filtering and logical decoding before sending to
the downstream. Hence the amount of WAL added to the reorder buffer does not
serve as a good metric for the amount of data sent downstream. Knowing the
amount of data sent downstream is useful when debugging slowly moving logical
replication.

This patch adds two new columns to pg_stat_replication_slots:
- sent_txns: The total number of transactions sent downstream.
- sent_bytes: The total number of bytes sent downstream in data messages

Ashutosh Bapat
---
 contrib/test_decoding/expected/stats.out  | 54 ++-
 contrib/test_decoding/sql/stats.sql   | 12 +++--
 contrib/test_decoding/t/001_repl_stats.pl | 14 ++---
 contrib/test_decoding/test_decoding.c |  1 +
 doc/src/sgml/monitoring.sgml  | 27 ++
 src/backend/catalog/system_views.sql  |  2 +
 src/backend/replication/logical/logical.c | 10 +++-
 .../replication/logical/logicalfuncs.c|  2 +
 .../replication/logical/reorderbuffer.c   |  2 +
 src/backend/replication/pgoutput/pgoutput.c   |  1 +
 src/backend/replication/walsender.c   |  2 +
 src/backend/utils/activity/pgstat_replslot.c  |  2 +
 src/backend/utils/adt/pgstatfuncs.c   | 14 +++--
 src/include/catalog/pg_proc.dat   |  6 +--
 src/include/pgstat.h  |  2 +
 src/include/replication/reorderbuffer.h   |  8 +++
 src/test/recovery/t/006_logical_decoding.pl   | 12 ++---
 src/test/regress/expected/rules.out   |  4 +-
 18 files changed, 126 insertions(+), 49 deletions(-)

diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index de6dc416130..867a8506051 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -37,28 +37,34 @@ SELECT pg_stat_force_next_flush();
  
 (1 row)
 
-SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots ORDER BY slot_name;
-   slot_name| spill

Re: Logical Replication of sequences

2025-06-30 Thread Nisha Moond
On Wed, Jun 25, 2025 at 3:10 PM Shlok Kyal  wrote:
>
>
> 4. Since we are not adding sequences in the list 'sub_remove_rels',
> should we only palloc for (the count of no. of tables)? Is it worth
> the effort?
> /*
> * Rels that we want to remove from subscription and drop any slots
> * and origins corresponding to them.
> */
> sub_remove_rels = palloc(subrel_count * sizeof(SubRemoveRels));
>

The sub_remove_rels array allocates memory for all relations in the
subscription, even though it only uses entries for those that are
actually removed.
This may result in unnecessary allocation, even when only tables
are involved. OTOH, as it's short-lived, pre-allocating can
help with performance.
This requires further analysis, I plan to handle this in the next version.

--
Thanks,
Nisha




Re: A concurrent VACUUM FULL?

2025-06-30 Thread Antonin Houska
Erik Nordström  wrote:

> Hi hackers,
> 
> I've been looking at the code for CLUSTER/VACUUM FULL, and whether it is 
> possible to do a concurrent version of it using a
> multi-transactional approach similar to concurrent reindexing and partition 
> detach.
> 
> The idea would be to hold weaker locks in TX1 when doing the heap rewrite 
> (essentially allow reads but prevent writes), and then do the
> actual heap swap in a second TX2 transaction.

Patch [1], which allows both reads and writes, is in the queue. (An exclusive
lock is acquired for the swaps, but that should be held for a very short
time.)

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

[1] https://commitfest.postgresql.org/patch/5117/




Re: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION

2025-06-30 Thread Yugo Nagata
On Fri, 27 Jun 2025 18:53:02 +0700
Daniil Davydov <3daniss...@gmail.com> wrote:

> Hi,
> 
> On Thu, Jun 5, 2025 at 5:21 PM Yugo Nagata  wrote:
> >
> > I've attached updated patches.
> >
> 
> I have some comments on v4-0001 patch :

Thank you for your comments!

> 1)
> heap_freetuple should be called for every tuple that we get from
> SearchSysCacheCopy3.
> But if tuple is valid after the first SearchSysCacheCopy3, we
> overwrite the old pointer (by the second SearchSysCacheCopy3 call) and
> forget to free it.
> I suggest adding heap_freetuple call before the second SearchSysCacheCopy3 
> call.

Good catches. Fixed.

> 2)
> +Form_pg_proc oldproc = (Form_pg_proc) GETSTRUCT(oldtup);
> +Datum   proargnames;
> +boolisnull;
> +const char *dropcmd;
> Strange alignment. I guess you should keep the same alignment as in
> deleted declarations.

Fixed.

I've attached patches including these fixes.

> 3)
> This patch fixes postgres behavior if I first create a function and
> then try to CREATE OR REPLACE it in concurrent transactions.
> But if the function doesn't exist and I try to call CREATE OR REPLACE
> in concurrent transactions, I will get an error.
> I wrote about it in this thread [1] and Tom Lane said that this
> behavior is kinda expected.
> Just in case, I decided to mention it here anyway - perhaps you will
> have other thoughts on this matter.
> 
> [1] 
> https://www.postgresql.org/message-id/flat/CAJDiXghv2JF5zbLyyybokWKM%2B-GYsTG%2Bhw7xseLNgJOJwf0%2B8w%40mail.gmail.com

I agree with Tom Lane that the behavior is expected, although it would be
better if the error message were more user-friendly. We could improve it by
checking the unique constraint before calling index_insert in CatalogIndexInsert.
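
For reference, a sketch of the race this fix addresses (session layout is
illustrative; f() already exists):

-- session 1
BEGIN;
CREATE OR REPLACE FUNCTION f() RETURNS int LANGUAGE sql AS 'SELECT 1';

-- session 2 (blocks on session 1's transaction, then fails with
-- "tuple concurrently updated" once session 1 commits)
CREATE OR REPLACE FUNCTION f() RETURNS int LANGUAGE sql AS 'SELECT 2';

-- session 1
COMMIT;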

Regards,
Yugo Nagata

-- 
Yugo Nagata 
>From 7c9146f221a253aa3a150f6b6d5600ed0225e4e1 Mon Sep 17 00:00:00 2001
From: Yugo Nagata 
Date: Thu, 5 Jun 2025 15:32:37 +0900
Subject: [PATCH v5 3/3] Improve error reporting for concurrent updates on
 system catalog tuples

Previously, when multiple sessions attempted to modify the same system
catalog tuple concurrently due to insufficient locking, DDL commands could
fail with an internal error:

ERROR:  tuple concurrently updated

This commit improves the behavior by reporting a more appropriate and
user-facing error message in such cases, making it easier for users to
understand the cause of the failure.
---
 src/backend/access/heap/heapam.c  | 34 ---
 .../expected/syscache-update-pruned.out   |  2 +-
 2 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0dcd6ee817e..712b7ded6d2 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3194,6 +3194,7 @@ simple_heap_delete(Relation relation, ItemPointer tid)
 {
 	TM_Result	result;
 	TM_FailureData tmfd;
+	bool is_catalog = IsCatalogRelation(relation);
 
 	result = heap_delete(relation, tid,
 		 GetCurrentCommandId(true), InvalidSnapshot,
@@ -3211,11 +3212,23 @@ simple_heap_delete(Relation relation, ItemPointer tid)
 			break;
 
 		case TM_Updated:
-			elog(ERROR, "tuple concurrently updated");
+			if (is_catalog)
+ereport(ERROR,
+		(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+		 errmsg("operation failed due to a concurrent command"),
+		 errdetail("Another command modified the same object in a concurrent session.")));
+			else
+elog(ERROR, "tuple concurrently updated");
 			break;
 
 		case TM_Deleted:
-			elog(ERROR, "tuple concurrently deleted");
+			if (is_catalog)
+ereport(ERROR,
+		(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+		 errmsg("operation failed due to a concurrent command"),
+		 errdetail("Another command dropped the same object in a concurrent session.")));
+			else
+elog(ERROR, "tuple concurrently deleted");
 			break;
 
 		default:
@@ -4486,6 +4499,7 @@ simple_heap_update(Relation relation, ItemPointer otid, HeapTuple tup,
 	TM_Result	result;
 	TM_FailureData tmfd;
 	LockTupleMode lockmode;
+	bool is_catalog = IsCatalogRelation(relation);
 
 	result = heap_update(relation, otid, tup,
 		 GetCurrentCommandId(true), InvalidSnapshot,
@@ -4503,11 +4517,23 @@ simple_heap_update(Relation relation, ItemPointer otid, HeapTuple tup,
 			break;
 
 		case TM_Updated:
-			elog(ERROR, "tuple concurrently updated");
+			if (is_catalog)
+ereport(ERROR,
+		(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+		 errmsg("operation failed due to a concurrent command"),
+		 errdetail("Another command modified the same object in a concurrent session.")));
+			else
+elog(ERROR, "tuple concurrently updated");
 			break;
 
 		case TM_Deleted:
-			elog(ERROR, "tuple concurrently deleted");
+			if (is_catalog)
+ereport(ERROR,
+		(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+		 errmsg("operation failed due to a concurrent command

Re: Missing program_XXX calling in pgbench tests

2025-06-30 Thread Peter Eisentraut

On 12.06.25 05:23, Hayato Kuroda (Fujitsu) wrote:

+1 to focusing on the 0001 patch.

Since this isn't a bug fix, I'm not sure back-patching is strictly necessary.
That said, it does improve consistency and test coverage, e.g., by adding checks
like help text length, so I'd be fine with back-patching if others see value in 
it.


Initially I thought this was helpful even for back branches, but it is not
100% needed.
No objections even if it is only applied to master - it can check new features 
in
future.


committed





Re: pg_get_multixact_members not documented

2025-06-30 Thread Sami Imseih
On Mon, Jun 30, 2025 at 7:29 AM Ashutosh Bapat
 wrote:
>
> On Sat, Jun 28, 2025 at 4:02 AM Nathan Bossart  
> wrote:
> >
> > On Thu, Jun 12, 2025 at 03:10:56PM -0500, Nathan Bossart wrote:
> > > Barring comments/objections, I'll plan on committing/back-patching this in
> > > the near future.
> >
> > Here is what I have staged for commit.  I ended up moving it to the
> > "Transaction ID and Snapshot Information Functions" table, which is what I
> > think you had originally proposed.  Creating a new section for this seemed
> > unnecessary, and this table already has one multixact-related function.
>
> WFM.

Correct, I originally proposed the "Transaction ID and Snapshot
Information Functions" section, but as stated in [0],
pg_get_multixact_members does not fit the description of that section,
which reads: "The main use of these functions is to determine which
transactions were committed between two snapshots."

IMO, it makes more sense to keep this function in a new section as we
have in v4.

[0] 
https://www.postgresql.org/message-id/CAA5RZ0vtSPzx%2B8HW1gohcWPgURX2VST5j3Nqh2OL6h9dSk0HSg%40mail.gmail.com

--
Sami




Re: Periodic FSM vacuum doesn't happen in one-pass strategy vacuum.

2025-06-30 Thread Melanie Plageman
On Tue, Jun 24, 2025 at 6:59 PM Melanie Plageman
 wrote:
>
> So, I think we should commit the fix you proposed.
>
> The only question I have left is implementation: should we have
> ndeleted as an output parameter of lazy_scan_prune() or have
> lazy_scan_prune() return it (instead of void)?
>
> In <= 16, heap_page_prune() returned the number of tuples deleted, so
> I thought of maybe having lazy_scan_prune() do this. Though, maybe it
> is confusing to have one result returned as the return value and the
> others returned in output parameters unless there is something more
> special about ndeleted. With heap_page_prune(), I think it was the
> return value because that was kind of what heap_page_prune()
> "accomplished".

Hi Sawada-san,

Just checking what you thought about this. We probably want to get
this committed and backported relatively soon. I'm happy to help with
that if needed but just want to make sure we are on the same page
about the fix.

- Melanie




Re: pg_get_multixact_members not documented

2025-06-30 Thread Nathan Bossart
On Mon, Jun 30, 2025 at 04:08:04PM +0300, Sami Imseih wrote:
> Correct, I originally proposed the "Transaction ID and Snapshot
> Information Functions" section, but as stated in [0],
> pg_get_multixact_members does not fit the description of the section as
> it is described as  "The main use of these functions is to determine
> which transactions were committed between two snapshots."

Well, is that description correct?  I've used several of these functions,
but I'm not sure I've ever used them for this purpose.

Looking again, pg_get_multixact_members() might need to be added to this
list of exceptions:

However, the functions shown in Table 9.84, except age and mxid_age,
use a 64-bit type xid8 that does not wrap around during the life of an
installation and can be converted to xid by casting if required; see
Section 67.1 for details.
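
For example, the down-cast mentioned there is simply:

SELECT pg_current_xact_id()::xid;   -- xid8 narrowed to xid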

I'm also a little wary of adding a new section on the back-branches because
it will change some section numbers.

-- 
nathan




Re: A concurrent VACUUM FULL?

2025-06-30 Thread wenhui qiu
Hi Erik Nordström,
In online production environments, blocking writes is generally
unacceptable in most cases. The only acceptable approach is to allow
concurrent read/write operations, with brief locks permitted only during
the final steps of the process. See pg-osc's implementation
(https://github.com/shayonj/pg-osc) for a non-blocking approach to VACUUM
FULL operations.



On Mon, Jun 30, 2025 at 8:03 PM Antonin Houska  wrote:

> Erik Nordström  wrote:
>
> > On Mon, Jun 30, 2025 at 1:46 PM Álvaro Herrera 
> wrote:
> >
> >  On 2025-Jun-30, Erik Nordström wrote:
> >
> >  > On Mon, Jun 30, 2025 at 12:03 PM Antonin Houska 
> wrote:
> >
> >  > > Patch [1] is in the queue that allows both reads and writes. (An
> exclusive
> >  > > lock is acquired here for the swaps, but that should be held for
> very short
> >  > > time.)
> >  >
> >  > That sounds great. Do you know if there's anything I can do to help?
> >
> >  It would be very valuable if you can review the code, test it under the
> >  weirdest conditions you can imagine or just under normal conditions,
> >  proof-read the documentation, try to see if anything is missing that
> >  should be there, and so on.  Everything that you would expect from a new
> >  feature released as part of the next Postgres release.  Any
> >  problems/crashes/ abnormalities that you report before the patch is
> >  included in Postgres, is one less issue that we'll have to deal with
> >  after the release.
> >
> > I'll do my best to test the feature.
>
> Thanks. I've noticed that the patch set needs rebase. I'll try to prepare a
> new version today.
>
> > One question I have, though, is why not start with supporting concurrent
> reads but not writes? That would
> > already be a win and make the patch simpler.
>
> It occurred to me at some point too, but I think it would be rather a
> different implementation. So if we were to support both read-only and
> read-write modes, the amount of code would be even higher.
>
> --
> Antonin Houska
> Web: https://www.cybertec-postgresql.com
>
>
>


Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-06-30 Thread Jacob Champion
On Mon, Jun 30, 2025 at 10:02 AM Daniel Gustafsson  wrote:
> > On 30 Jun 2025, at 18:58, Andres Freund  wrote:
> > Probably just needs to be added to the installed_targets list.
>
> Thanks for the report, I'll take a look today to get it fixed.

Thanks both!

Looking at the installed_targets stuff, though... why do we use `meson
install --no-rebuild` in combination with `depends:
installed_targets`? Can't we just use Meson's dependency tracking
during installation, and avoid this hazard?

--Jacob




Re: BackendKeyData is mandatory?

2025-06-30 Thread Jelte Fennema-Nio
On Mon, 30 Jun 2025 at 19:59, Jacob Champion
 wrote:
>
> On Wed, Jun 25, 2025 at 12:12 AM Jelte Fennema-Nio  wrote:
> > Attached is an attempt at implementing the above. I did not test it
> > against these systems though.
>
> With 0001, psycopg2 appears to function again when talking to a server
> that doesn't send a cancellation key, so that's good.

Great!

> It looks like Heikki has an open item for this, so I'll defer to him

Oh... Sorry for the confusion. I added that open item to the list (so
it would not be missed), and I added Heikki as the owner because he
committed the original patch. I checked now, and since Heikki hasn't
sent an email to hackers since June 10th, I'm not sure that was the
correct decision. So feel free to take ownership of the open item. I
changed it to "Unknown" for now (I did the same for some of the other
protocol docs related patches).

> for copyediting preferences around the comments and commit messages. I
> do have a problem with this part of the new documentation:
>
> +[...] If no
> +BackendKeyData is sent by the server, then that means that the 
> backend
> +does not support canceling queries using the CancelRequest messages.
>
> I don't think we really know that it means that, yet. I'd prefer something 
> like
>
> Beginning with PostgreSQL 18, libpq assumes that servers do not
> support query cancellation if they do not send BackendKeyData.
>
> in the hope that either anyone affected will complain at us to revert,
> or it'll become the de facto meaning after some time.

I understand your intent, but I don't like that the protocol docs
would be saying anything about libpq specific behaviour. I'd like to
be explicit about protocol behaviour, without considering the specific
client. That we weren't before is exactly how we got into the current
mess (maybe those IETF folks are on to something with their MUST/SHOULD/etc
stuff).

So I still prefer my wording over yours, especially since no-one has
been able to think of any reasonable way a production system could
utilize an "all zeros" cancellation key (except for having only one
connection, which automatically makes it non-production afaict). But I
do understand your concern, so how about this wording:

Since protocol version 3.2, if the server sent no BackendKeyData, then
that means that the backend does not support canceling queries using
the CancelRequest messages. In protocol versions before 3.2 the
behaviour is undefined if the client receives no BackendKeyData.

That way we define the behavior as "sensible" in 3.2, while still
allowing for the rare case that someone somewhere relied on the "all
zeros" cancel message being sent in libpq for PG17. And if it's a big
enough problem, we could then still change libpq to continue sending
"all zeros" if the server used protocol 3.0.

> 0002 seems to result in a connection hang if the server sends a bigger
> key. I think this may be a latent bug in the original patch -- if
> getBackendKeyData() fails, no effort is made to clean up state or
> manage the connection buffer. We just... return.

Hmm... Yeah, seems like that needs a bit more work then.

> Also, what should we do if the server sends a zero-length key?

Given that that only applies to protocol 3.2, I'd like to define that
strictly. How about simply not allowing that and throwing an error in
libpq in that case? e.g. by defining the secret so that it should be
at minimum 4 and at most 256 bytes?
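
A rough sketch of what such a check could look like on the libpq side
(names and the error-handling path are illustrative, not the actual patch):

/* hypothetical bounds check, using the proposed 4..256-byte limits */
if (keyLen < 4 || keyLen > 256)
{
    libpq_append_conn_error(conn,
                            "received invalid BackendKeyData key length: %d",
                            keyLen);
    goto error_return;      /* must also clean up state, per the review */
}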




Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-30 Thread Tomas Vondra
On 6/27/25 19:33, Bertrand Drouvot wrote:
> Hi,
> 
> On Fri, Jun 27, 2025 at 04:52:08PM +0200, Tomas Vondra wrote:
>> Here's three small patches, that should handle the issue
> 
> Thanks for the patches!
> 
>> 0001 - Adds the batching into pg_numa_query_pages, so that the callers
>> don't need to do anything.
>>
>> The batching doesn't seem to cause any performance regression. 32-bit
>> systems can't use that much memory anyway, and on 64-bit systems the
>> batch is sufficiently large (1024).
> 
> === 1
> 
> -#define pg_numa_touch_mem_if_required(ro_volatile_var, ptr) \
> +#define pg_numa_touch_mem_if_required(ptr) \
> 
> Looks unrelated, should be in 0002?
> 

Of course, I merged it into the wrong patch. Here's a v2 that fixes
this, and also reworded some of the comments and commit messages a
little bit.

In particular it now uses "chunking" instead of "batching". I believe
batching is "combining multiple requests into a single one", but we're
doing exactly the opposite - splitting a large request into smaller
ones, which is what "chunking" does.

> === 2
> 
> I thought that it would be better to provide a batch size only in the 32-bit
> case (see [1]), but I now think it makes sense to also provide (a larger) one
> for non 32-bit (as you did) due to the CFI added in 0003 (as it's also good to
> have it for non 32-bit).
> 

Agreed, I think the CFI is a good thing to have.
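
For reference, the shape of the chunked query loop with the CFI would be
roughly this (a sketch, not the committed code):

/* split the status query into chunks, staying interruptible in between */
for (uint64 next = 0; next < count; next += NUMA_QUERY_CHUNK_SIZE)
{
    unsigned long chunk = Min(count - next, NUMA_QUERY_CHUNK_SIZE);

    if (numa_move_pages(pid, chunk, &pages[next], NULL, &status[next], 0) < 0)
        return -1;

    CHECK_FOR_INTERRUPTS();
}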

>> 0002 - Silences the valgrind about the memory touching. It replaces the
>> macro with a static inline function, and adds suppressions for both
>> 32-bit and 64-bits. The 32-bit may be a bit pointless, because on my
>> rpi5 valgrind produces about a bunch of other stuff anyway. But doesn't
>> hurt.
>>
>> The function now looks like this:
>>
>>   static inline void
>>   pg_numa_touch_mem_if_required(void *ptr)
>>   {
>>   volatile uint64 touch pg_attribute_unused();
>>   touch = *(volatile uint64 *) ptr;
>>   }
>>
>> I did a lot of testing on multiple systems to check replacing the macro
>> with a static inline function still works - and it seems it does. But if
>> someone thinks the function won't work, I'd like to know.
> 
> LGTM.
> >> 0003 - While working on these patches, it occurred to me we could/should
>> add CHECK_FOR_INTERRUPTS() into the batch loop. This querying can take
>> quite a bit of time, so letting people interrupt it seems reasonable.
>> It wasn't possible with just one call into the kernel, but with the
>> batching we can add a CFI.
> 
> Yeah, LGTM.
> 

Thanks!

I plan to push this tomorrow morning.

-- 
Tomas Vondra
From d5dd0631c5c5233cabe2000f310a3f902230a284 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Fri, 27 Jun 2025 12:43:20 +0200
Subject: [PATCH v2 1/3] Limit the size of numa_move_pages requests

There's a kernel bug in do_pages_stat(), affecting systems combining
64-bit kernel and 32-bit user space. The function splits the request
into chunks of 16 pointers, but forgets the pointers are 32-bit when
advancing to the next chunk. Some of the pointers get skipped, and
memory after the array is interpreted as pointers. The result is that
the produced status of memory pages is mostly bogus.

Systems combining 64-bit and 32-bit environments like this might seem
rare, but that's not the case - all 32-bit Debian packages are built in
a 32-bit chroot on a system with a 64-bit kernel.

This is a long-standing kernel bug (since 2010), affecting pretty much
all kernels, so it'll take time until all systems get a fixed kernel.
Luckily, we can work around the issue by chunking the requests the same
way do_pages_stat() does, at least on affected systems. We don't know
what kernel a 32-bit build will run on, so all 32-bit builds use chunks
of 16 elements (the largest chunk before hitting the issue).

64-bit builds are not affected by this issue, and so could work without
the chunking. But chunking has other advantages, so we apply chunking
even for 64-bit builds, with chunks of 1024 elements.

Reported-by: Christoph Berg 
Author: Christoph Berg 
Author: Bertrand Drouvot 
Discussion: https://postgr.es/m/aetdozlmtzdda...@msg.df7cb.de
---
 src/port/pg_numa.c | 50 +-
 1 file changed, 49 insertions(+), 1 deletion(-)

diff --git a/src/port/pg_numa.c b/src/port/pg_numa.c
index 4b487a2a4e8..d5935207d0a 100644
--- a/src/port/pg_numa.c
+++ b/src/port/pg_numa.c
@@ -29,6 +29,19 @@
 #include <numa.h>
 #include <numaif.h>
 
+/*
+ * numa_move_pages() chunk size, has to be <= 16 to work around a kernel bug
+ * in do_pages_stat() (chunked by DO_PAGES_STAT_CHUNK_NR). By using the same
+ * chunk size, we make it work even on unfixed kernels.
+ *
+ * 64-bit system are not affected by the bug, and so use much larger chunks.
+ */
+#if SIZEOF_SIZE_T == 4
+#define NUMA_QUERY_CHUNK_SIZE 16
+#else
+#define NUMA_QUERY_CHUNK_SIZE 1024
+#endif
+
 /* libnuma requires initialization as per numa(3) on Linux */
 int
 pg_numa_init(void)
@@ -42,11 +55,46 @@ pg_numa_init(void)
  * We use move_pages(2) syscall here - ins

Re: postmaster uses more CPU in 18 beta1 with io_method=io_uring

2025-06-30 Thread Jim Nasby
+#if defined(HAVE_LIBURING_QUEUE_INIT_MEM) && defined(IORING_SETUP_NO_MMAP) && 1

Is that && 1 intentional?

Nit:
+ "mmap(%zu) to determine io_uring_queue_init_mem() support has failed: %m",
IMHO that would read better without "has".

+ /* FIXME: This should probably not stay at DEBUG1? */
+ elog(DEBUG1,
+ "can use combined memory mapping for io_uring, each ring needs %d bytes",
+ ret);
Assuming my read that this is only executed at postmaster start is correct,
I agree that NOTICE would also be reasonable. Though I'm not sure what a
user could actually do with the info...

+ elog(DEBUG1,
+ "can't use combined memory mapping for io_uring, kernel or liburing too
old");
OTOH this message would definitely be of interest to users; I'd say it
should at least be NOTICE, possibly even WARNING. It'd also be good to have
a HINT either explaining the downside or pointing to the docs.

+ * Memory for rings needs to be allocated to the page boundary,
+ * reserve space. Luckily it does not need to be aligned to hugepage
+ * boundaries, even if huge pages are used.
Is "reserve space" left over from something else?
AFAICT pgaio_uring_ring_shmem_size() isn't even reserving space...

On Mon, Jun 30, 2025 at 11:28 AM Andres Freund  wrote:

> Hi,
>
> On 2025-06-05 14:32:10 -0400, Andres Freund wrote:
> > On 2025-06-05 12:47:52 -0400, Tom Lane wrote:
> > > Andres Freund  writes:
> > > > I think this is a big enough pitfall that it's, obviously assuming
> the patch
> > > > has a sensible complexity, worth fixing this in 18. RMT, anyone,
> what do you
> > > > think?
> > >
> > > Let's see the patch ... but yeah, I'd rather not ship 18 like this.
> >
> > I've attached a first draft.
> >
> > I can't make heads or tails of the ordering in configure.ac, so the
> function
> > test is probably in the wrong place.
>
> Any comments on that patch?  I'd hoped for some review comments... Unless
> I'll
> hear otherwise, I'll just do a bit more polish and push..
>
> Greetings,
>
> Andres
>
>
>


Re: Fix some inconsistencies with open-coded visibilitymap_set() callers

2025-06-30 Thread Robert Haas
On Thu, Jun 26, 2025 at 3:56 PM Melanie Plageman
 wrote:
> Here is a rebased version of this (it had some conflicts with recent commits).

One general complaint is that the commit message complains about the
status quo but isn't real clear about what the patch is actually
fixing.

I'm pretty concerned about this change:

 /*
  * If the page is all visible, need to clear that, unless we're only
  * going to add further frozen rows to it.
  *
  * If we're only adding already frozen rows to a previously empty
  * page, mark it as all-visible.
  */
 if (PageIsAllVisible(page) && !(options & HEAP_INSERT_FROZEN))
 {
 all_visible_cleared = true;
 PageClearAllVisible(page);
 visibilitymap_clear(relation,
 BufferGetBlockNumber(buffer),
 vmbuffer, VISIBILITYMAP_VALID_BITS);
 }
-else if (all_frozen_set)
-PageSetAllVisible(page);

One problem is that this falsifies the comment -- the second sentence
in the comment refers to code that the patch deletes. But also, the
all_frozen_set flag is used a bit further down when setting
xlrec->flags, and now we'll set XLH_INSERT_ALL_FROZEN_SET when in fact
we didn't set the all-frozen bit. It seems like the intention here was
that the existing XLOG_HEAP2_MULTI_INSERT record should cover setting
the all-frozen bit, if we do that, and it seems like you're changing
that somehow, but it's not clear to me why we'd want to change that,
and even if we do, and it doesn't appear to me that you've chased down
all the loose ends i.e. if that's the plan, then I'd expect the redo
routine to be modified and XLH_INSERT_ALL_FROZEN_SET to be deleted.

I think this might actually be an underlying design problem with the
patch -- heap_page_set_vm_and_log() seems to want to be in charge of
both what we do with the heap page and what we do with the
corresponding VM bit, but that runs contrary to the caller's need to
bundle the VM bit changes with some WAL record that is doing other
things to the heap page.

heap_page_set_vm_and_log() says that it shouldn't be called during
recovery but doesn't assert that it isn't.

-/*
- * It should never be the case that the visibility map page is set
- * while the page-level bit is clear, but the reverse is allowed (if
- * checksums are not enabled).  Regardless, set both bits so that we
- * get back in sync.
- *
- * NB: If the heap page is all-visible but the VM bit is not set, we
- * don't need to dirty the heap page.  However, if checksums are
- * enabled, we do need to make sure that the heap page is dirtied
- * before passing it to visibilitymap_set(), because it may be logged.
- * Given that this situation should only happen in rare cases after a
- * crash, it is not worth optimizing.
- */
-PageSetAllVisible(page);
-MarkBufferDirty(buf);
-old_vmbits = visibilitymap_set(vacrel->rel, blkno, buf,
-   InvalidXLogRecPtr,
-   vmbuffer, presult.vm_conflict_horizon,
-   flags);
+old_vmbits = heap_page_set_vm_and_log(vacrel->rel, blkno, buf,
+  vmbuffer,
presult.vm_conflict_horizon,
+  flags);

Why are we deleting a huge comment here explaining the intent of the
code when, AFAICS, the intent of the new code is the same? Even if the
intent of the new code isn't the same, why no comment?

-/*
- * If data checksums are enabled (or wal_log_hints=on), we
- * need to protect the heap page from being torn.
- *
- * If not, then we must *not* update the heap page's LSN. In
- * this case, the FPI for the heap page was omitted from the
- * WAL record inserted above, so it would be incorrect to
- * update the heap page's LSN.
- */
-if (XLogHintBitIsNeeded())
-{
-Page        heapPage = BufferGetPage(heapBuf);
-
-PageSetLSN(heapPage, recptr);
-}

This is another lengthy explanatory comment that goes away with this
patch, but this time the code goes, too. But why? Presumably, either
(1) this change is wrong, or (2) the preexisting code is wrong, or (3)
the patch is trying to change the contract between visibilitymap_set()
and its callers. If it's (3), then I think there should be more
explanation of that contract change. It is unclear to me why you've
rearranged the header comment for visibilitymap_set() in the way that
you have: some of the material that used to be at the top is now at
the bottom, but it's if anything a little less

Re: A concurrent VACUUM FULL?

2025-06-30 Thread DINESH NAIR
Hi Erik,

Currently the first suggested approach, "Rebuild indexes on the temporary heap
in TX1 and then swap in the new indexes along with the new heap in TX2.", sounds
good.

It would be great if we are able to perform concurrent reads and writes.
In OLTP environments, will it lead to slower queries or other query
performance issues?


Thanks

Dinesh Nair



From: Erik Nordström 
Sent: Monday, June 30, 2025 3:19 PM
To: PostgreSQL Hackers 
Subject: A concurrent VACUUM FULL?

Hi hackers,

I've been looking at the code for CLUSTER/VACUUM FULL, and whether it is 
possible to do a concurrent version of it using a multi-transactional approach 
similar to concurrent reindexing and partition detach.

The idea would be to hold weaker locks in TX1 when doing the heap rewrite 
(essentially allow reads but prevent writes), and then do the actual heap swap 
in a second TX2 transaction.

Having experimented a bit with this approach, I found that reindexing is an 
issue because that happens after the new heap has been swapped in. The current 
reindex during a heap swap effectively blocks reads so if one starts a new 
transaction after swapping heaps, it will block reads for a long time.

This made me think about two ways to handle this:

1. Rebuild indexes on the temporary heap in TX1 and then swap in the new 
indexes along with the new heap in TX2.

2. Do a concurrent index rebuild after the heap swap.

Of the two approaches above, (2) seems easiest to implement, but the downside 
is that indexes would be invalid while they are rebuilt. Therefore, (1)
seems to be the more desirable one because all the heavy lifting would be done 
in TX1 on the temporary heap.

Does anyone have a sense of whether approach (1) is feasible or whether there 
are any major blockers?

Is this worth pursuing at all or am I missing something?

Best regards,

Erik
--
Database Architect, Timescale


Re: BackendKeyData is mandatory?

2025-06-30 Thread Jacob Champion
On Wed, Jun 25, 2025 at 12:12 AM Jelte Fennema-Nio  wrote:
> Attached is an attempt at implementing the above. I did not test it
> against these systems though.

With 0001, psycopg2 appears to function again when talking to a server
that doesn't send a cancellation key, so that's good.

It looks like Heikki has an open item for this, so I'll defer to him
for copyediting preferences around the comments and commit messages. I
do have a problem with this part of the new documentation:

+[...] If no
+BackendKeyData is sent by the server, then that means that the backend
+does not support canceling queries using the CancelRequest messages.

I don't think we really know that it means that, yet. I'd prefer something like

Beginning with PostgreSQL 18, libpq assumes that servers do not
support query cancellation if they do not send BackendKeyData.

in the hope that either anyone affected will complain at us to revert,
or it'll become the de facto meaning after some time.

> I also added small commit that checks for the 256 byte key length limit
> described in the protocol documentation. This thread made me remember
> that we talked about that during PGConf.dev.

0002 seems to result in a connection hang if the server sends a bigger
key. I think this may be a latent bug in the original patch -- if
getBackendKeyData() fails, no effort is made to clean up state or
manage the connection buffer. We just... return.

Also, what should we do if the server sends a zero-length key?

Thanks,
--Jacob




Re: pg_get_multixact_members not documented

2025-06-30 Thread Nathan Bossart
On Mon, Jun 30, 2025 at 06:19:10PM +0300, Sami Imseih wrote:
> Perhaps we should think about removing this description, what do you think?

I think it's a good topic for another patch/thread.  Chances are it's not
the only description that could be updated.

>> Looking again, pg_get_multixact_members() might need to be added to this
>> list of exceptions:
>>
>> However, the functions shown in Table 9.84, except age and mxid_age,
>> use a 64-bit type xid8 that does not wrap around during the life of 
>> an
>> installation and can be converted to xid by casting if required; see
>> Section 67.1 for details.
> 
> This function returns an xid and not an int8 such as for example
> ```
> { oid => '3819', descr => 'view members of a multixactid',
>   proname => 'pg_get_multixact_members', prorows => '1000', proretset => 't',
>   provolatile => 'v', prorettype => 'record', proargtypes => 'xid',
>   proallargtypes => '{xid,xid,text}', proargmodes => '{i,o,o}',
>   proargnames => '{multixid,xid,mode}', prosrc => 'pg_get_multixact_members' 
> },
> ```
> ```
> { oid => '2943', descr => 'get current transaction ID',
>   proname => 'txid_current', provolatile => 's', proparallel => 'u',
>   prorettype => 'int8', proargtypes => '', prosrc => 'pg_current_xact_id' },
> ```
> am I missing something?

That's what I mean.  I think it should say

However, the functions shown in Table 9.84, except age, mxid_age, and
pg_get_multixact_members, use a 64-bit type xid8 that...

I noticed that this list of exceptions doesn't exist on v13-v15, presumably
because the docs for age() and mxid_age() were only back-patched to v16
(see commits 48b5aa3 and 15afb7d), which is strange because I believe those
functions are much older.  I don't see any discussion about this choice,
either.  We should probably back-patch those commits to v13 as a
prerequisite to applying this patch.

-- 
nathan




Re: [PATCH] Fix OAuth feature detection on OpenBSD+Meson

2025-06-30 Thread Jacob Champion
On Wed, Jun 25, 2025 at 3:46 AM Nazir Bilal Yavuz  wrote:
> I wanted to experiment with it.

That was fast, thank you!

> First, I got the current list of
> features from upstream, then disabled the auto features, then
> explicitly enabled these features. I did this only for FreeBSD to show
> my idea roughly.
>
> There are two patches; 0001 disables auto features for all of the
> tasks and 0002 explicitly enables these features for FreeBSD.

Just from a stylistic perspective, I think I'd prefer for
`--auto-features=disabled` to be folded into the beginning of the
MESON_FEATURES variable. Or is there another reason to control them
fully independently?

Also: how do we ensure that none of the previously enabled features
were missed in this list, for all the OSes we've modified? (I'm a
Meson novice.)

> What do you think about this approach? If you are okay with this, I
> can apply it to all CI tasks (and possibly create another thread for
> it).

Seems good to me. I think a new top-level thread would be good too, to
get more eyes on the changes.

Thanks!
--Jacob




Re: AIO v2.5

2025-06-30 Thread Andres Freund
Hi,

On 2025-06-30 08:58:21 +0200, Antonin Houska wrote:
> Andres Freund  wrote:
> 
> > On 2025-03-13 11:53:03 +0100, Antonin Houska wrote:
> > > Attached are a few proposals for minor comment fixes.
> > 
> > Thanks, applied.
> 
> After reading the code a bit more, I noticed that the 'cb_flags' argument of
> PgAioHandleCallbackStage is not really used, at least in the existing
> callbacks. Is there an intention to use it in the future (the patches for
> async write do not seem to indicate so), or is this only a leftover from
> previous versions of the patch?

Thanks for reading through it!

It seemed more symmetric to have it for stage as well as the completion
callback - and I still think that...


> Besides that, I suggest a minor comment fix.

Thanks. Pushed that fix.

Greetings,

Andres Freund




Re: add function for creating/attaching hash table in DSM registry

2025-06-30 Thread Nathan Bossart
On Mon, Jun 30, 2025 at 01:29:22PM +0530, Rahila Syed wrote:
>  /* XXX: Should we verify params matches what table was created with? */
> 
> Are you planning to address the above before you commit?  It seems like a
> helpful check since GetNamedDshash takes the params as arguments. That
> said, I don't have a strong preference either way.

I was not planning on this, primarily because I'm not sure about comparing
the function pointers.  Note the following comment above the declaration of
dshash_parameters:

 * Compare, hash, and copy functions must be supplied even when attaching,
 * because we can't safely share function pointers between backends in general.
 * The user data pointer supplied to the create and attach functions will be
 * passed to these functions.

> I have a question: is there a way to remove the entries from the registry
> and free the space?  For example, if a user decides to call
> dshash_destroy the dshash entry in the registry would no longer be
> needed.

See the following thread:


https://postgr.es/m/flat/CAAdDe3N%3Dj8mbkJJhmU6hTQRUXKEQMoJWsQz7JZyVK%3DrDWnVdiA%40mail.gmail.com

-- 
nathan




Re: Improve error reporting in 027_stream_regress test

2025-06-30 Thread Andres Freund
Hi,

On 2025-06-30 16:01:04 +0300, Nazir Bilal Yavuz wrote:
> This patch aims to improve 027_stream_regress test's regression test
> error reporting per Andres' suggestion [1].

Thanks for working on that!


One thing I don't yet like is that I think we should report whether the primary
is alive *before* reporting the diff, and skip reporting the diff if the primary
crashed. It's not interesting to report a long diff if the server crashed IMO.

Greetings,

Andres Freund




Re: Issue with custom operator in simple case

2025-06-30 Thread Maxim Orlov
AFAICS, we have the following problem constructions:

a IS DISTINCT FROM b
a IS NOT DISTINCT FROM b
a IN (...)
a NOT IN (...)
CASE a WHEN b THEN ... ELSE d END
NULLIF(a, b)
JOIN USING / NATURAL JOIN


As a quick recap, the following options are available to us.
1) Add the "USING operator" clause.
Rejected due to:
* Non-standard syntax.
* ruleutils.c could not safely convert this back to the source text.
* In case "IS DISTINCT FROM" on composite types, using only the eq operator
is maybe not enough.

2) Using default btree opclass/opfamily operators.
Rejected due to:
 * Adding yet another way of selecting an operator to the existing
   func_select_candidate and opfamily logic seems too complicated and not safe.
 * May not work in some cases.

3) CVE-2018-1058 revert.
Rejected due to obvious reasons.

In my humble opinion, the best way to solve the issue is (1), if it's even
worth solving. Although it uses non-standard syntax, it does not break the
existing one.
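
Purely for illustration, option (1) could look something like this
(hypothetical, non-standard syntax; not implemented anywhere):

-- hypothetical "USING OPERATOR" clause, schema and operator made up
CASE a WHEN b USING OPERATOR(myschema.=) THEN x ELSE y END
a IS DISTINCT FROM b USING OPERATOR(myschema.=)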

-- 
Best regards,
Maxim Orlov.


Re: postmaster uses more CPU in 18 beta1 with io_method=io_uring

2025-06-30 Thread Burd, Greg



> On Jun 30, 2025, at 12:27 PM, Andres Freund  wrote:
> 
> Hi,
> 
> On 2025-06-05 14:32:10 -0400, Andres Freund wrote:
>> On 2025-06-05 12:47:52 -0400, Tom Lane wrote:
>>> Andres Freund  writes:
 I think this is a big enough pitfall that it's, obviously assuming the 
 patch
 has a sensible complexity, worth fixing this in 18. RMT, anyone, what do 
 you
 think?
>>> 
>>> Let's see the patch ... but yeah, I'd rather not ship 18 like this.
>> 
>> I've attached a first draft.
>> 
>> I can't make heads or tails of the ordering in configure.ac, so the function
>> test is probably in the wrong place.
> 
> Any comments on that patch?  I'd hoped for some review comments... Unless I'll
> hear otherwise, I'll just do a bit more polish and push..

Thanks for doing this work!

I just read through the v1 patch and it looks good.  I have just a few small 
nit-picky questions:

+ #if defined(HAVE_LIBURING_QUEUE_INIT_MEM) && defined(IORING_SETUP_NO_MMAP) && 1

The '1' looks like cruft, or am I missing something?

+ /* FIXME: This should probably not stay at DEBUG1? */

Worth fixing before pushing?

Also, this returns 'Size' but uses 'size_t' within the function. I assume
that's intentional?

+ static Size
+ pgaio_uring_ring_shmem_size(void)

The next, similar, function below this one returns 'size_t'.

Finally, and this may be me missing something everyone else knows is convention.

+ * XXX: We allocate memory for all PgAioUringContext instances and, if

Is there any reason to keep the 'XXX'?  You ask yourself a question in that 
comment, do you know the answer or was that a request to reviewers for 
feedback? :)

I hope that is helpful.

-greg


> 
> Greetings,
> 
> Andres




GNU/Hurd portability patches

2025-06-30 Thread Michael Banck
Hi,

please find attached the current patches required to get master built
and the testsuites run on Debian's hurd-i386 port. I have not had the
time to test the hurd-amd64 port in the same fashion, but will do so
next.

As mentioned in this thread[1], one needs a fairly recent kernel for the
high-resolution timers in order to avoid regression test failures, and
the buildfarm client run does not like debug_parallel_query.


Michael

[1] 
https://www.postgresql.org/message-id/685a3ccd.170a0220.2d27e6.2232%40mx.google.com
>From 2cf70fa676bacf97c24fd0b80bab6fec06b9f2db Mon Sep 17 00:00:00 2001
From: Michael Banck 
Date: Tue, 10 Jun 2025 12:23:17 +0200
Subject: [PATCH 1/2] Make safeguard against incorrect fd flags for fsync more
 portable.

The current code assumed that O_RDONLY is defined as 0, but this is not
universally the case. To fix this, mask the flags with O_ACCMODE and
assert that they are O_RDONLY for directories or not O_RDONLY for files.

Co-authored-by: Samuel Thibault
---
 src/backend/storage/file/fd.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 0e8299dd556..cd1e661974a 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -411,14 +411,12 @@ pg_fsync(int fd)
 	{
 		int			desc_flags = fcntl(fd, F_GETFL);
 
-		/*
-		 * O_RDONLY is historically 0, so just make sure that for directories
-		 * no write flags are used.
-		 */
+		desc_flags &= O_ACCMODE;
+
 		if (S_ISDIR(st.st_mode))
-			Assert((desc_flags & (O_RDWR | O_WRONLY)) == 0);
+			Assert(desc_flags == O_RDONLY);
 		else
-			Assert((desc_flags & (O_RDWR | O_WRONLY)) != 0);
+			Assert(desc_flags != O_RDONLY);
 	}
 	errno = 0;
 #endif
-- 
2.39.5

>From 1dec060f19bc3bea23ec3b2b97f93b0b60fdd63e Mon Sep 17 00:00:00 2001
From: Michael Banck 
Date: Tue, 10 Jun 2025 23:34:14 +0200
Subject: [PATCH 2/2] Make sure IOV_MAX is defined.

We stopped defining IOV_MAX on non-Windows systems since 75357ab9 under
the assumption that every non-Windows system defines it in limits.h
already. However, the GNU (Hurd) system does not define this limit.

As POSIX does not mandate IOV_MAX be defined, define it to 16 if it is
undefined.

Co-authored-by: Christoph Berg
---
 src/include/port/pg_iovec.h | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/include/port/pg_iovec.h b/src/include/port/pg_iovec.h
index df40c7208be..d29721bc7e4 100644
--- a/src/include/port/pg_iovec.h
+++ b/src/include/port/pg_iovec.h
@@ -21,9 +21,6 @@
 
 #else
 
-/* POSIX requires at least 16 as a maximum iovcnt. */
-#define IOV_MAX 16
-
 /* Define our own POSIX-compatible iovec struct. */
 struct iovec
 {
@@ -33,6 +30,11 @@ struct iovec
 
 #endif
 
+/* POSIX requires at least 16 as a maximum iovcnt. */
+#ifndef IOV_MAX
+#define IOV_MAX 16
+#endif
+
 /*
  * Define a reasonable maximum that is safe to use on the stack in arrays of
  * struct iovec and other small types.  The operating system could limit us to
-- 
2.39.5



Re: pg_get_multixact_members not documented

2025-06-30 Thread Sami Imseih
> Sure, I am not against keeping the function in an existing section, but
> we should remove the description mentioned above for clarity.

to be clear, I am suggesting we just remove the second sentence
in the description. Therefore, instead of:

"The functions shown in Table 9.84 provide server transaction information
in an exportable form. The main use of these functions is to determine which
transactions were committed between two snapshots."

mention only:

"The functions shown in Table 9.84 provide server transaction
information in an exportable form."

I don't think we need to add anything else.

--
Sami




Adding wait events statistics

2025-06-30 Thread Bertrand Drouvot
Hi hackers,

Wait events are useful to know what backends are waiting for when there is/was
a performance issue: for this we can sample pg_stat_activity at regular 
intervals
and record historical data. That’s how it is commonly used.

It could also be useful to observe the engine/backends behavior over time and
help answer questions like:

* Is the engine’s wait pattern the same over time?
* Is application’s "A" wait pattern the same over time?
* I observe a peak in wait event "W": is it because "W" is now waiting longer or
is it because it is hit more frequently?
* I observe a peak in some of them (say for example MultiXact%), is it due to a
workload change?

For the above use cases, we need a way to track the wait events that occur 
between
samples: please find attached a proof of concept patch series doing so.

The patch series is made of:

0001 - It generates the WaitClassTable[], a lookup table that will be used by
the wait events statistics. 

The array is indexed by classId (derived from the PG_WAIT_* constants); it
handles gaps in the class ID numbering and provides metadata for wait events.

This new array is generated in generate-wait_event_types.pl, so that:

* it now needs lwlocklist.h and wait_classes.h as input parameters
* WAIT_EVENT_CLASS_MASK and WAIT_EVENT_ID_MASK have been moved away from 
wait_event.c

In passing it adds several new macros that will be used by 0002.
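
As background for reviewers: a wait_event_info packs the class and the event
id into a single uint32, so a lookup needs only two mask operations. A minimal
sketch using those masks (the mask values are the ones currently in
wait_event.c; the helper names are made up):

    #define WAIT_EVENT_CLASS_MASK 0xFF000000
    #define WAIT_EVENT_ID_MASK    0x0000FFFF

    static inline uint32 wait_class(uint32 wait_event_info)
    {
        return wait_event_info & WAIT_EVENT_CLASS_MASK; /* a PG_WAIT_* value */
    }

    static inline uint32 wait_id(uint32 wait_event_info)
    {
        return wait_event_info & WAIT_EVENT_ID_MASK;    /* event within class */
    }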

0002 -  It adds wait events statistics

It adds a new stat kind PGSTAT_KIND_WAIT_EVENT for the wait event statistics.

This new statistic kind is a fixed one because we know the maximum number of
wait events. Indeed:

 * it does not take into account custom wait events, as extensions have all
 they need at their disposal to create custom stats on their own wait events
 should they want to (limited by WAIT_EVENT_CUSTOM_HASH_MAX_SIZE though).

 * it does not take into account LWLock > LWTRANCHE_FIRST_USER_DEFINED for the
 same reasons as above. That said, there is no maximum limitation in
 LWLockNewTrancheId().

 * we don't want to allocate memory in some places where the counters are
 incremented (see 4feba03d8b9). We could still use the same implementation as
 for backend statistics (i.e., make use of flush_static_cb) if we really
 want/need to switch to variable stats.

For the moment only the counters are added (an array of currently 285
counters); we'll study/discuss adding the timings once the counters are fully
done.

I think we’d have more discussion/debate around the timings (should we add them
by default, add a new GUC, enable them at compilation time?…), that’s why only
the counters are in this patch series.

I think it makes sense as the counters have merit on their own. We currently 
have
273 wait events but the array is 285 long: the reason is that some wait event
classes have "holes".

A few questions:

 * Do we need to serialize the stats based on their names (as for
 PGSTAT_KIND_REPLSLOT)? This question is the same as "is the ordering preserved
 if file stats format is not changed": I think the answer is yes (see 
f98dbdeb51d)
 , which means there is no need for to_serialized_name/from_serialized_name.

 * What if a new wait event is added? We'd need to change the stats file format,
 unless the wait event stats kind becomes a variable one, or we change a bit
 the way fixed stats are written to / read from the stats file (we could add a
 new field in PgStat_KindInfo, for example).

Note: for some backends the wait events stats are not flushed (walwriter for
example), so we need to find additional places to flush the wait events stats.

0003 - It adds the pg_stat_wait_event view

It also adds documentation and regression tests.

0004 - switching PGSTAT_KIND_WAIT_EVENT to variable sized

It might be better for PGSTAT_KIND_WAIT_EVENT to be a variable sized stats kind.
That way:

* It would be easier to add custom wait events if we want to
* It would be possible to add new wait events without having to change the stats
file format

So adding 0004 to see what it would look like to have a variable sized stats 
kind
instead and decide how we want to move forward. 

It uses the uint32 wait_event_info as the hash key, while the hash key type is
defined as uint64: that should not be an issue, but the patch does explicit
casting anyway.

That said, it might be better to use all 64 bits for the hash key (i.e. not
leave the upper half full of zeroes; better hash distribution?): we could
imagine using something like:

((uint64) wait_event_info) | (((uint64) wait_event_info) << 32)

for the hash key.
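
Wrapped in a helper (name hypothetical), that suggestion would look like:

    static inline uint64 wait_event_hash_key(uint32 wait_event_info)
    {
        /* duplicate the 32-bit value into both halves of the 64-bit key */
        return ((uint64) wait_event_info) | (((uint64) wait_event_info) << 32);
    }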

If we decide to go that way (meaning with the variable sized kind) then a new patch
series will be provided and will not implement the fixed one but will start
directly with the variable one.

The more I think about it, the more I think we should go for the variable sized
proposal: that's more flexible.

Remarks:

* If we want to add some "ereport" in waitEventIncrementCounter() then we’d need
to take care of the race condition

Re: Periodic FSM vacuum doesn't happen in one-pass strategy vacuum.

2025-06-30 Thread Masahiko Sawada
On Mon, Jun 30, 2025 at 10:20 PM Melanie Plageman
 wrote:
>
> On Tue, Jun 24, 2025 at 6:59 PM Melanie Plageman
>  wrote:
> >
> > So, I think we should commit the fix you proposed.
> >
> > The only question I have left is implementation: should we have
> > ndeleted as an output parameter of lazy_scan_prune() or have
> > lazy_scan_prune() return it (instead of void)?
> >
> > In <= 16, heap_page_prune() returned the number of tuples deleted, so
> > I thought of maybe having lazy_scan_prune() do this. Though, maybe it
> > is confusing to have one result returned as the return value and the
> > others returned in output parameters unless there is something more
> > special about ndeleted. With heap_page_prune(), I think it was the
> > return value because that was kind of what heap_page_prune()
> > "accomplished".
>
> Hi Sawada-san,
>
> Just checking what you thought about this. We probably want to get
> this committed and backported relatively soon. I'm happy to help with
> that if needed but just want to make sure we are on the same page
> about the fix.
>

Sorry for the late response, I was unable to work on this last week.
I'll check your reply and the solution tomorrow, and will get back to
you with my thoughts.

Regards,


-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




array_random

2025-06-30 Thread jian he
hi.

context: [1].
the attached patch already posted in [1].
I don't want to hijack another thread. so I post it in a separate thread.

The attached patch introduces a new function: array_random.
array_random description:
Returns an array filled with random values in the range min <= x <= max,
having dimensions of the lengths specified by dims. The optional lbounds
argument supplies lower-bound values for each dimension (which default
 to all 1).

array_random function signature:
array_random(min int4, max int4, dims int[] [, lbounds int[]]) -> int[]
array_random(min int8, max int8, dims int[] [, lbounds int[]]) -> int8[]
array_random(min numeric, max numeric, dims int[] [, lbounds int[]])
-> numeric[]

demo:
SELECT array_random(1, 6, array[2,5], array[2,4]);
 array_random
--
 [2:3][4:8]={{6,2,2,5,4},{4,5,6,4,6}}

The reasons for adding array_random are:
1. It is better than array_fill: it can fill both random and constant
values (for a constant fill, pass the same value as min and max; see the
example below).
2. Building a multi-dimensional PL/pgSQL function equivalent to
array_random is neither efficient nor easy.
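
For example, a constant fill per point 1 (deterministic, since min = max):

SELECT array_random(1, 1, array[2,3]);
   array_random
-------------------
 {{1,1,1},{1,1,1}}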

[1] 
https://www.postgresql.org/message-id/CACJufxGRCP19Rm66%3DTSBwmEuVr92FwL_e6YFjmCpJrgu6Km9hQ%40mail.gmail.com
From 0fb93c0edf7c100178794396bc5f09a9696e03ce Mon Sep 17 00:00:00 2001
From: jian he 
Date: Mon, 30 Jun 2025 23:02:57 +0800
Subject: [PATCH v1 1/1] array_random

We cannot use a function signature like array_random(anyelement, anyelement,
int[] [, int[]]) because currently we cannot resolve the conflict for
array_random(1, 2::bigint), where the first argument should be promoted to
bigint.

For example:
create or replace function polyf(x anyelement, y anyelement) returns anyelement as $$
select x + 1
$$ language sql;

select polyf(1, 2::bigint);
ERROR:  function polyf(integer, bigint) does not exist

select polyf(1::bigint, 2);
ERROR:  function polyf(bigint, integer) does not exist

So, we define three separate functions for array_random, similar to the approach
used for the random() function.
Now it looks like:
\df array_random
                                                          List of functions
   Schema   |     Name     | Result data type |                                  Argument data types                                  | Type
------------+--------------+------------------+---------------------------------------------------------------------------------------+------
 pg_catalog | array_random | bigint[]         | min bigint, max bigint, dims integer[], lbounds integer[] DEFAULT NULL::integer[]      | func
 pg_catalog | array_random | integer[]        | min integer, max integer, dims integer[], lbounds integer[] DEFAULT NULL::integer[]    | func
 pg_catalog | array_random | numeric[]        | min numeric, max numeric, dims integer[], lbounds integer[] DEFAULT NULL::integer[]    | func
(3 rows)

original discussion: https://postgr.es/m/87plssezpc@163.com
discussion: https://postgr.es/m/XXX
---
 doc/src/sgml/func.sgml   |  36 
 src/backend/catalog/system_functions.sql |  21 ++
 src/backend/utils/adt/arrayfuncs.c   | 258 +++
 src/include/catalog/pg_proc.dat  |  12 ++
 src/test/regress/expected/arrays.out |  85 
 src/test/regress/sql/arrays.sql  |  27 +++
 6 files changed, 439 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 298791858be..a653454b424 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -20654,6 +20654,42 @@ SELECT NULLIF(value, '(none)') ...

   
 
+  
+   
+
+ array_random
+
+
+array_random ( min integer, max integer,
+dims integer[] , lbounds integer[])
+integer[]
+   
+
+   
+array_random ( min bigint, max bigint,
+dims integer[] , lbounds integer[])
+bigint[]
+   
+
+   
+array_random ( min numeric, max numeric,
dims integer[] , lbounds integer[])
+numeric[]
+   
+
+   
+Returns an array populated with random values, each in the range
+min <= x <= max.
+The array has dimensions specified by dims.
+The optional fourth argument (lbounds) supplies lower-bound values for each dimension (which default to all 1).
+See  also.
+   
+   
+array_random(1, 10::bigint, '{2}'::int[])
+{3,3}
+   
+  
+
   

 
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 566f308e443..b3819a0cbdd 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -73,6 +73,27 @@ CREATE OR REPLACE FUNCTION
  VOLATILE PARALLEL RESTRICTED STRICT COST 1
 AS 'drandom_normal';
 
+CREATE OR REPLACE FUNCTION
+ array_random(min integer, max integer, dims integer[], lbounds integer[] DEFAULT NULL)
+ RETURNS integer[]
+ LANGUAGE internal
+ VOLATI

Re: Collation & ctype method table, and extension hooks

2025-06-30 Thread Jeff Davis
On Sun, 2025-06-29 at 12:43 +0200, Peter Eisentraut wrote:
> I wish we could take this further and also run the "ctype is c" case 
> through the method table.  Right now, there are still a bunch of 
> open-coded special cases all over the place, which could be unified. 
> I 
> guess this isn't any worse than before, but maybe this could be a
> future 
> project?

+1. A few things need to be sorted out, but I don't see any major
problem with that.

> Patch 0003 I don't understand.  It replaces type safety by no type 
> safety, and it doesn't have any explanation or comments.  I suppose
> you 
> have further plans in this direction, but until we have seen those
> and 
> have more clarification and explanation, I would hold this back.

Part of it is simply #include cleanliness, because we can't do v16-0004
if we have the provider-specific details in the union. I don't really
like the idea of including ICU headers (indirectly) so many places.
Another part is that I'd like to abstract the providers more completely
-- I've alluded to that a few times but I haven't made an independent
proposal for that yet. Also, the union doesn't offer a lot of type
safety, so I don't see it as a big loss.

But it's not critical right now either, so I won't push for it.

> Patch 0004 seems ok.  But maybe you could explain this better in the 
> commit message, like remove includes from pg_locale.h but instead put
> them in the .c files as needed, and explain why this is possible or 
> suitable now.

It goes with v16-0003, so I will hold this back for now as well.

Regards,
Jeff Davis





Re: NUMA shared memory interleaving

2025-06-30 Thread Tomas Vondra



On 6/30/25 12:55, Jakub Wartak wrote:
> Hi Tomas!
> 
> On Fri, Jun 27, 2025 at 6:41 PM Tomas Vondra  wrote:
> 
>> I agree we should improve the behavior on NUMA systems. But I'm not sure
>> this patch does far enough, or perhaps the approach seems a bit too
>> blunt, ignoring some interesting stuff.
>>
>> AFAICS the patch essentially does the same thing as
>>
>>numactl --interleave=all
>>
>> except that it only does that to shared memory, not to process private
>> memory (as if we called numa_set_localalloc). Which means it has some of
>> the problems people observe with --interleave=all.
>>
>> In particular, this practically guarantees that (with 4K memory pages),
>> each buffer hits multiple NUMA nodes. Because with the first half will
>> do to node N, while the second half goes to node (N+1).
>>
>> That doesn't seem great. It's likely better than a misbalanced system
>> with everything allocated on a single NUMA node, but I don't see how it
>> could be better than "balanced" properly warmed up system where the
>> buffers are not split like this.
>>
>> But OK, admittedly this only happens for 4K memory pages, and serious
>> systems with a lot of memory are likely to use huge pages, which makes
>> this less of an issue (only the buffers crossing the page boundaries
>> might get split).
>>
>>
>> My bigger comment however is that the approach focuses on balancing the
>> nodes (i.e. ensuring each node gets a fair share of shared memory), and
>> is entirely oblivious to the internal structure of the shared memory.
>>
>> * It interleaves the shared segment, but it has many pieces - shared
>> buffers are the largest but not the only one. Does it make sense to
>> interleave all the other pieces?
>>
>> * Some of the pieces are tightly related. For example, we talk about
>> shared buffers as if it was one big array, but it actually is two arrays
>> - blocks and descriptors. Even if buffers don't get split between nodes
>> (thanks to huge pages), there's no guarantee the descriptor for the
>> buffer does not end on a different node.
>>
>> * In fact, the descriptors are so much smaller that blocks that it's
>> practically guaranteed all descriptors will end up on a single node.
>>
>>
>> I could probably come up with a couple more similar items, but I think
>> you get the idea. I do think making Postgres NUMA-aware will require
>> figuring out how to distribute (or not distribute) different parts of
>> the shared memory, and do that explicitly. And do that in a way that
>> allows us to do other stuff in NUMA-aware way, e.g. have a separate
>> freelists and clocksweep for each NUMA node, etc.
> 
> I do understand what you mean, but I'm *NOT* stating here that it
> makes PG fully "NUMA-aware". I actually try to avoid doing so with
> each sentence. This is only about the imbalance problem specifically.
> I think we could build those follow-up optimizations as separate
> patches in this or follow-up threads. If we would do it all in one
> giant 0001 (without split) the very first question would be to
> quantify the impact of each of those optimizations (for which we would
> probably need more GUCs?). Here I'm just showing that the very first
> baby step - interleaving - helps avoid interconnect saturation in some
> cases too.
> 
> Anyway, even putting the fact that local mallocs() would be
> interleaved, adjusting systemd startup scripts to just include
> `numactl --interleave=all` sounds like some dirty hack not like proper
> UX.
> 

I wasn't suggesting to do "numactl --interleave=all". My argument was
simply that doing numa_interleave_memory() has most of the same issues,
because it's oblivious to what's stored in the shared memory. Sure, the
fact that local memory is not interleaved too is an improvement.
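
To make "oblivious" concrete: the call in question amounts to the following
sketch (assuming libnuma, with an anonymous mapping standing in for the shared
segment):

    #include <numa.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t sz = 1UL << 24;      /* 16MB stand-in for the shared segment */
        void  *seg = mmap(NULL, sz, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (seg != MAP_FAILED && numa_available() != -1)
            /* round-robin the pages across all allowed nodes, with no
             * regard for which structures end up where in the segment */
            numa_interleave_memory(seg, sz, numa_all_nodes_ptr);
        return 0;
    }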

But I just don't see how this could be 0001, followed by some later
improvements. ISTM the improvements would have to largely undo 0001
first, and it would be nontrivial if an optimization needs to do that
only for some part of the shared memory.

> Also please note that:
> * I do not have lot of time to dedicate towards it, yet I was kind of
> always interested in researching that and wondering why we couldn't it
> for such long time, therefore the previous observability work and now
> $subject (note it is not claiming to be full blown NUMA awareness,
> just some basic NUMA interleave as first [well, second?] step).

Sorry, I appreciate the time you spent working on these features. It
wasn't my intention to dunk on your patch. I'm afraid this is an example
of how reactions on -hackers are often focused on pointing out issues. I
apologize for that, I should have realized it earlier.

I certainly agree it'd be good to improve the NUMA support, otherwise I
wouldn't be messing with Andres' PoC patches myself.

> * I've raised this question in the first post "How to name this GUC
> (numa or numa_shm_interleave) ?" I still have no idea, but `numa`,
> simply looks better, and we could just add way more stuff to it 

Re: ALTER TABLE ALTER CONSTRAINT misleading error message

2025-06-30 Thread Álvaro Herrera
On 2025-Jun-30, Álvaro Herrera wrote:

> > Just one note: Jian's patch doesn't handle the same issue for TRIGGER
> > case, so that part might still need to be addressed.
> 
> Okay, here's my take on this, wherein I reworded the proposed error
> message.  I also handled the NOT VALID case of a constraint trigger; maybe my
> patch is too focused on that specific bit and instead we should handle
> also NO INHERIT and NOT ENFORCED cases, not really sure (it's certainly
> not an important part of this patch).

For ease of review, here's the three patches.  0001 solves the main
problem with ALTER objtype ALTER CONSTRAINT NOT VALID.

I propose to put 0001 in 18 and 19, and leave 0002 and 0003 (as a single
commit) for 19 only, since it's been like that for ages and there have
been zero complaints before my own in the other thread.  I put 0002 as a
separate one just for review, to show that these errors we throw are
nothing new: these commands would also fail if we don't patch this code,
they're just using bad grammar, which is then fixed by 0003.


I think I should list Amul as the fourth co-author of 0001.  That would
make the longest list of coauthorship for a patch that small.  Or I
could just say: "Author: Postgres Global Development Group".

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
>From 484b22d971be13d0c7f5a88d40aff85720a85203 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= 
Date: Mon, 30 Jun 2025 17:07:04 +0200
Subject: [PATCH v5 1/3] Fix error message for ALTER CONSTRAINT ... NOT VALID
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Trying to alter a constraint so that it becomes NOT VALID results in an
error that assumes the constraint is a foreign key.  This is potentially
wrong, so give a more generic error message.

While at it, give CREATE CONSTRAINT TRIGGER a better error message as
well.

Co-authored-by: jian he 
Co-authored-by: Fujii Masao 
Co-authored-by: Álvaro Herrera 
Discussion: https://postgr.es/m/CACJufxHSp2puxP=q8ztugl1f+heapnzqfbzy5zngujugwjb...@mail.gmail.com
---
 src/backend/parser/gram.y | 4 
 src/test/regress/expected/constraints.out | 5 +
 src/test/regress/expected/foreign_key.out | 2 +-
 src/test/regress/sql/constraints.sql  | 3 +++
 4 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 50f53159d58..527c7ea75e3 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -2668,6 +2668,10 @@ alter_table_cmd:
 		c->alterDeferrability = true;
 	if ($4 & CAS_NO_INHERIT)
 		c->alterInheritability = true;
+	if ($4 & CAS_NOT_VALID)
+		ereport(ERROR,
+errmsg("constraints cannot be altered to be NOT VALID"),
+parser_errposition(@4));
 	processCASbits($4, @4, "FOREIGN KEY",
 	&c->deferrable,
 	&c->initdeferred,
diff --git a/src/test/regress/expected/constraints.out b/src/test/regress/expected/constraints.out
index b5592617d97..ccea883cffd 100644
--- a/src/test/regress/expected/constraints.out
+++ b/src/test/regress/expected/constraints.out
@@ -748,6 +748,11 @@ ALTER TABLE unique_tbl ALTER CONSTRAINT unique_tbl_i_key ENFORCED;
 ERROR:  cannot alter enforceability of constraint "unique_tbl_i_key" of relation "unique_tbl"
 ALTER TABLE unique_tbl ALTER CONSTRAINT unique_tbl_i_key NOT ENFORCED;
 ERROR:  cannot alter enforceability of constraint "unique_tbl_i_key" of relation "unique_tbl"
+-- can't make an existing constraint NOT VALID
+ALTER TABLE unique_tbl ALTER CONSTRAINT unique_tbl_i_key NOT VALID;
+ERROR:  constraints cannot be altered to be NOT VALID
+LINE 1: ...ABLE unique_tbl ALTER CONSTRAINT unique_tbl_i_key NOT VALID;
+ ^
 DROP TABLE unique_tbl;
 --
 -- EXCLUDE constraints
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index 6a8f3959345..f9bd252444f 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -1359,7 +1359,7 @@ LINE 1: ...e ALTER CONSTRAINT fktable_fk_fkey NOT DEFERRABLE INITIALLY ...
 ALTER TABLE fktable ALTER CONSTRAINT fktable_fk_fkey NO INHERIT;
 ERROR:  constraint "fktable_fk_fkey" of relation "fktable" is not a not-null constraint
 ALTER TABLE fktable ALTER CONSTRAINT fktable_fk_fkey NOT VALID;
-ERROR:  FOREIGN KEY constraints cannot be marked NOT VALID
+ERROR:  constraints cannot be altered to be NOT VALID
 LINE 1: ...ER TABLE fktable ALTER CONSTRAINT fktable_fk_fkey NOT VALID;
  ^
 ALTER TABLE fktable ALTER CONSTRAINT fktable_fk_fkey ENFORCED NOT ENFORCED;
diff --git a/src/test/regress/sql/constraints.sql b/src/test/regress/sql/constraints.sql
index 12668f0e0ce..7487723ab84 100644
--- a/src/test/regress/sql/constraints.sql
+++ b/src/test/regress/sql/constraints.sql
@@ -537,6 +537,9 @@ CREA

Re: Improve the performance of Unicode Normalization Forms.

2025-06-30 Thread Jeff Davis
On Tue, 2025-06-24 at 18:20 +0300, Alexander Borisov wrote:
> That's what we're aiming for - to implement the fastest approach.

Awesome! Thank you for clarifying this as a goal. Having the fastest
open-source Unicode normalization would be a great thing to highlight
when this is done.

Regards,
Jeff Davis





Re: Issue with custom operator in simple case

2025-06-30 Thread David G. Johnston
On Mon, Jun 30, 2025 at 8:49 AM Maxim Orlov  wrote:

> 3) CVE-2018-1058 revert.
> Rejected due to obvious reasons.
>
>
Not revert, but maybe try again at convincing people that DBAs should be
given agency here to opt out of a security system that breaks functioning
code without providing, in reality, any immediate security benefit.

David J.


Re: ALTER TABLE ALTER CONSTRAINT misleading error message

2025-06-30 Thread Álvaro Herrera
On 2025-Jun-27, Fujii Masao wrote:

> On 2025/06/27 22:30, Álvaro Herrera wrote:
> > On 2025-06-27, Fujii Masao wrote:
> > 
> > > To make this distinction, I just started thinking it's better to raise
> > > the error
> > > in ATExecAlterConstraint() rather than in gram.y. I've attached a draft
> > > patch that
> > > follows this approach.
> > 
> > Hmm I don't like this very much, it feels very kludgy. I think if we want 
> > to consider this in scope for pg18 I would much prefer to use the patch I 
> > mentioned near the beginning of the thread.
> 
> Are you referring to v2-0001-trial.patch proposed at [1]?

Yeah.

> This patch raises an error if ENFORCED, NOT ENFORCED, INHERIT, or NO INHERIT
> is specified. But those options are currently accepted, so these checks seem
> unnecessary for now.

Sure, the patch was developed before those options were added, so I
meant to reference the general approach rather than the specifics.

> Also, the patch raises an error for NOT VALID after calling processCASbits(),
> there's no need to call processCASbits() in the first place. It would be
> cleaner to check for NOT VALID and raise an error before calling it.

Yeah, I remember having this in mind when I read Amul's patch back in
the day, too.

> Jian He already proposed this approach in a patch at [2].
>
> It looks like Jian's patch at [2] is an updated version of the one you 
> referred to,
> so you may agree with that approach. Thought?

Ah, okay, yes I like this approach better.

> Just one note: Jian's patch doesn't handle the same issue for TRIGGER
> case, so that part might still need to be addressed.

Okay, here's my take on this, wherein I reworded the proposed error
message.  I also handled the NOT VALID case of a constraint trigger; maybe my
patch is too focused on that specific bit and instead we should handle
also NO INHERIT and NOT ENFORCED cases, not really sure (it's certainly
not an important part of this patch).

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Entristecido, Wutra (canción de Las Barreras)
echa a Freyr a rodar
y a nosotros al mar"
>From 10b9ac219ba4d01dcdf83c2fb9d7d6b344bd74f2 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= 
Date: Mon, 30 Jun 2025 17:07:04 +0200
Subject: [PATCH v4] Fix error message for ALTER CONSTRAINT ... NOT VALID
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Trying to alter a constraint so that it becomes NOT VALID results in an
error that assumes the constraint is a foreign key.  This is potentially
wrong, so give a more generic error message.

While at it, give CREATE CONSTRAINT TRIGGER a better error message as
well.

Co-authored-by: jian he 
Co-authored-by: Fujii Masao 
Co-authored-by: Álvaro Herrera 
Discussion: https://postgr.es/m/CACJufxHSp2puxP=q8ztugl1f+heapnzqfbzy5zngujugwjb...@mail.gmail.com
---
 src/backend/parser/gram.y | 9 +
 src/test/regress/expected/constraints.out | 5 +
 src/test/regress/expected/foreign_key.out | 2 +-
 src/test/regress/expected/triggers.out| 9 +
 src/test/regress/sql/constraints.sql  | 3 +++
 src/test/regress/sql/triggers.sql | 7 +++
 6 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 50f53159d58..8a1712a099d 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -2668,6 +2668,10 @@ alter_table_cmd:
 		c->alterDeferrability = true;
 	if ($4 & CAS_NO_INHERIT)
 		c->alterInheritability = true;
+	if ($4 & CAS_NOT_VALID)
+		ereport(ERROR,
+errmsg("constraints cannot be altered to be NOT VALID"),
+parser_errposition(@4));
 	processCASbits($4, @4, "FOREIGN KEY",
 	&c->deferrable,
 	&c->initdeferred,
@@ -6036,6 +6040,11 @@ CreateTrigStmt:
 {
 	CreateTrigStmt *n = makeNode(CreateTrigStmt);
 
+	if (($11 & CAS_NOT_VALID) != 0)
+		ereport(ERROR,
+errmsg("cannot create a NOT VALID constraint trigger"),
+parser_errposition(@11));
+
 	n->replace = $2;
 	if (n->replace) /* not supported, see CreateTrigger */
 		ereport(ERROR,
diff --git a/src/test/regress/expected/constraints.out b/src/test/regress/expected/constraints.out
index b5592617d97..ccea883cffd 100644
--- a/src/test/regress/expected/constraints.out
+++ b/src/test/regress/expected/constraints.out
@@ -748,6 +748,11 @@ ALTER TABLE unique_tbl ALTER CONSTRAINT unique_tbl_i_key ENFORCED;
 ERROR:  cannot alter enforceability of constraint "unique_tbl_i_key" of relation "unique_tbl"
 ALTER TABLE unique_tbl ALTER CONSTRAINT unique_tbl_i_key NOT ENFORCED;
 ERROR:  cannot alter enforceability of constraint "unique_tbl_i_key" of relation "unique_tbl"
+-- can't make an existing constraint NOT VALID
+ALTER TABLE unique_tbl ALTER CONSTRAINT unique_tbl_i_key NOT VALID;
+ERROR:  constraints cannot be altered to be NOT VALID
+LINE 1: ...ABLE uniq

Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-06-30 Thread Andres Freund
Hi,

On 2025-05-02 10:42:34 -0700, Jacob Champion wrote:
> Great, thanks. I'll push it soon.

I just noticed that I think the dependencies on the meson build aren't quite
sufficient:

andres@awork3:/srv/dev/build/postgres/m-dev-assert$ ninja install-quiet
[2205/2205 1 100%] Generating install-quiet with a custom command
FAILED: install-quiet
/usr/bin/python3 /home/andres/src/meson/meson.py install --quiet --no-rebuild

ERROR: File 'src/interfaces/libpq-oauth/libpq-oauth.a' could not be found
ninja: build stopped: subcommand failed.


Probably just needs to be added to the installed_targets list.

Greetings,

Andres Freund




Re: fix: propagate M4 env variable to flex subprocess

2025-06-30 Thread J. Javier Maestro
On Mon, Jun 30, 2025 at 11:25 AM Peter Eisentraut 
wrote:

> On 17.06.25 07:15, Peter Eisentraut wrote:
> > On 28.05.25 20:42, J. Javier Maestro wrote:
> >> On Wed, May 28, 2025 at 6:08 PM Andres Freund  >> > wrote:
> >>
> >> Hi,
> >>
> >> On 2025-05-17 23:32:24 -0400, J. Javier Maestro wrote:
> >>  > On Tue, May 13, 2025 at 11:54 AM Andres Freund
> >> mailto:and...@anarazel.de>> wrote:
> >>  > > Bilal, I think you wrote this originally, do you recall?
> >>  > >
> >>  > > It seems like an issue beyond just M4...
> >>  > >
> >>  >
> >>  > IIRC the rest of the tools in the environment have ways to be
> >> specified via
> >>  > Meson options (BISON, FLEX, PERL) so the only issue I see is Flex
> >> not being
> >>  > able to find the specific m4 binary. What other issue(s) are you
> >>  > considering?
> >>
> >> PATH, LD_LIBRARY_PATH, ...
> >>
> >> I think this really should just add to the environment, rather than
> >> supplant
> >> it.
> >>
> >>
> >> Ah, understood. That definitely looks like a better option.
> >>
> >> Do you want to write a patch like that? Otherwise I can.
> >>
> >>
> >> Sure, I've attached the new patch. Let me know what you think, and if
> >> it's OK, what are the next steps to get the patch merged in main!
> >
> > This patch looks right to me.
> >
> > I would wait for the PG19 branching at this point, unless there is a
> > concrete need for backpatching.
>
> committed
>

Thanks!

Javier


Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-06-30 Thread Daniel Gustafsson
> On 30 Jun 2025, at 18:58, Andres Freund  wrote:
> 
> Hi,
> 
> On 2025-05-02 10:42:34 -0700, Jacob Champion wrote:
>> Great, thanks. I'll push it soon.
> 
> I just noticed that I think the dependencies on the meson build aren't quite
> sufficient:
> 
> andres@awork3:/srv/dev/build/postgres/m-dev-assert$ ninja install-quiet
> [2205/2205 1 100%] Generating install-quiet with a custom command
> FAILED: install-quiet
> /usr/bin/python3 /home/andres/src/meson/meson.py install --quiet --no-rebuild
> 
> ERROR: File 'src/interfaces/libpq-oauth/libpq-oauth.a' could not be found
> ninja: build stopped: subcommand failed.
> 
> Probably just needs to be added to the installed_targets list.

Thanks for the report, I'll take a look today to get it fixed.

--
Daniel Gustafsson





Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

2025-06-30 Thread Bertrand Drouvot
Hi,

On Mon, Jun 30, 2025 at 12:21:44PM +0530, shveta malik wrote:
> On Mon, Jun 30, 2025 at 11:16 AM Masahiko Sawada  
> wrote:
> >
> > I'm not sure if there are users who want to disable the new behavior
> > and use only the current behavior. I think we can focus on the new API
> > and automatic behavior at this stage. If we find out there are certain
> > use cases, we can revisit this idea.
> >
> 
> Okay, sounds good to me.

Same here.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Fix OAuth feature detection on OpenBSD+Meson

2025-06-30 Thread Nazir Bilal Yavuz
Hi,

On Mon, 30 Jun 2025 at 17:45, Jacob Champion
 wrote:
>
> On Wed, Jun 25, 2025 at 3:46 AM Nazir Bilal Yavuz  wrote:
> > I wanted to experiment with it.
>
> That was fast, thank you!
>
> > First, I got the current list of
> > features from upstream, then disabled the auto features, then
> > explicitly enabled these features. I did this only for FreeBSD to show
> > my idea roughly.
> >
> > There are two patches; 0001 disables auto features for all of the
> > tasks and 0002 explicitly enables these features for FreeBSD.
>
> Just from a stylistic perspective, I think I'd prefer for
> `--auto-features=disabled` to be folded into the beginning of the
> MESON_FEATURES variable. Or is there another reason to control them
> fully independently?

No, I had thought people would prefer to control auto-features from
one central place, but apparently I was wrong :) I will update the
patch accordingly.
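
Something along these lines, with illustrative feature names only (the actual
per-task list is what the updated patch will spell out):

    MESON_FEATURES="-Dauto_features=disabled -Dssl=openssl -Dicu=enabled"
    meson setup build ${MESON_FEATURES}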

> Also: how do we ensure that none of the previously enabled features
> were missed in this list, for all the OSes we've modified? (I'm a
> Meson novice.)

I am not sure; I do not know of a way to ensure that. I just
manually checked the features at the end of the 'meson setup' command's
output. I think this should be enough, but of course an automated way
would be better if there is one.

> > What do you think about this approach? If you are okay with this, I
> > can apply it to all CI tasks (and possibly create another thread for
> > it).
>
> Seems good to me. I think a new top-level thread would be good too, to
> get more eyes on the changes.

Thanks! I will create the thread soon.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




Re: New function normal_rand_array function to contrib/tablefunc.

2025-06-30 Thread jian he
hi.

I don't want to disrupt this thread too much.
so I created a separate thread ([1]) for my attached patch.

[1] 
https://www.postgresql.org/message-id/CACJufxF8_VzCFRHRt4OHHF74QtB8tj5Z%3Ddjsy7Y31OHKG5s1-w%40mail.gmail.com




Re: pg_get_multixact_members not documented

2025-06-30 Thread Sami Imseih
> On Mon, Jun 30, 2025 at 04:08:04PM +0300, Sami Imseih wrote:
> > Correct, I originally proposed the "Transaction ID and Snapshot
> > Information Functions" section, but as stated in [0],
> > pg_get_multixact_members does not fit the description of the section as
> > it is described as  "The main use of these functions is to determine
> > which transactions were committed between two snapshots."
>
> Well, is that description correct?  I've used several of these functions,
> but I'm not sure I've ever used them for this purpose.

I suppose the description is not applicable to all of these functions;
for example
pg_current_xact_id_if_assigned()  returns the current transactions xid, so the
description does not apply. Neither does it apply to pg_snapshot_xip().

this description has been around since 8.3 from what I can tell, from a time
when the inventory of functions in this section was much smaller and most
required a snapshot as an argument.

Perhaps we should think about removing this description, what do you think?

> Looking again, pg_get_multixact_members() might need to be added to this
> list of exceptions:
>
> However, the functions shown in Table 9.84, except age and mxid_age,
> use a 64-bit type xid8 that does not wrap around during the life of an
> installation and can be converted to xid by casting if required; see
> Section 67.1 for details.

This function returns an xid and not an int8, as can be seen by comparing its
pg_proc entry with, for example, txid_current's:
```
{ oid => '3819', descr => 'view members of a multixactid',
  proname => 'pg_get_multixact_members', prorows => '1000', proretset => 't',
  provolatile => 'v', prorettype => 'record', proargtypes => 'xid',
  proallargtypes => '{xid,xid,text}', proargmodes => '{i,o,o}',
  proargnames => '{multixid,xid,mode}', prosrc => 'pg_get_multixact_members' },
```
```
{ oid => '2943', descr => 'get current transaction ID',
  proname => 'txid_current', provolatile => 's', proparallel => 'u',
  prorettype => 'int8', proargtypes => '', prosrc => 'pg_current_xact_id' },
```
am I missing something?
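
For reference, the call shape (output values here are illustrative only):

```
SELECT * FROM pg_get_multixact_members('100'::xid);
 xid | mode
-----+------
 101 | sh
 102 | sh
```

i.e. the xid output column really is of type xid, not xid8.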

> I'm also a little wary of adding a new section on the back-branches because
> it will change some section numbers.

Sure, I am not against keeping the function in an existing section, but
we should remove the description mentioned above for clarity.

--
Sami




Re: Allow ON CONFLICT DO UPDATE to return EXCLUDED values

2025-06-30 Thread Dean Rasheed
On Tue, 24 Jun 2025 at 19:49, Dean Rasheed  wrote:
>
> The attached patch allows EXCLUDED values to appear in the RETURNING
> list of INSERT ... ON CONFLICT DO UPDATE.
>
> I still have a lot more testing to do, and docs to update, but so far
> the results look promising. I'll add this to the next CF.
>

v2 attached, now with docs and more test cases.

Regards,
Dean
From 8983ab34fa9fc6ad5e0df91fd33b4060234a2a48 Mon Sep 17 00:00:00 2001
From: Dean Rasheed 
Date: Mon, 23 Jun 2025 17:20:02 +0100
Subject: [PATCH v2] Allow EXCLUDED in RETURNING list of INSERT ON CONFLICT DO
 UPDATE.

This allows an INSERT ... ON CONFLICT DO UPDATE command to return the
excluded values, when conflicts occur. For rows that do not conflict,
the values returned are NULL.

This builds on the infrastructure added to support OLD/NEW values in
the RETURNING list, except that in the case of EXCLUDED, there is no
need for the special syntax to change the name of the excluded
pseudo-relation in the RETURNING list. (That was required for OLD/NEW,
because those names were already used in some situations, such as
trigger functions, but there is no such problem with excluded.)
---
 doc/src/sgml/dml.sgml | 13 +++
 doc/src/sgml/ref/insert.sgml  |  8 ++
 src/backend/executor/execExpr.c   | 44 +++--
 src/backend/executor/execExprInterp.c | 21 -
 src/backend/executor/execPartition.c  | 14 +++
 src/backend/executor/nodeModifyTable.c| 51 +++---
 src/backend/optimizer/path/allpaths.c |  3 +-
 src/backend/optimizer/plan/setrefs.c  | 77 ++--
 src/backend/optimizer/prep/prepjointree.c | 21 +++--
 src/backend/optimizer/prep/preptlist.c|  9 +-
 src/backend/optimizer/util/clauses.c  |  2 +-
 src/backend/optimizer/util/var.c  | 11 ++-
 src/backend/parser/analyze.c  | 17 +++-
 src/backend/parser/parse_expr.c   |  2 +-
 src/backend/parser/parse_relation.c   |  4 +-
 src/backend/rewrite/rewriteHandler.c  | 42 +
 src/backend/rewrite/rewriteManip.c| 69 --
 src/include/executor/execExpr.h   |  7 +-
 src/include/nodes/execnodes.h |  2 +
 src/include/nodes/primnodes.h | 26 --
 src/include/parser/parse_node.h   |  7 +-
 src/include/rewrite/rewriteManip.h|  4 +-
 src/test/regress/expected/arrays.out  | 26 --
 .../regress/expected/generated_stored.out | 48 ++
 .../regress/expected/generated_virtual.out| 48 ++
 src/test/regress/expected/inherit.out |  9 +-
 src/test/regress/expected/insert_conflict.out | 92 +--
 src/test/regress/expected/returning.out   | 22 +++--
 src/test/regress/expected/rules.out   | 61 
 src/test/regress/expected/triggers.out| 74 ---
 src/test/regress/expected/updatable_views.out | 27 +-
 src/test/regress/sql/arrays.sql   |  9 +-
 src/test/regress/sql/generated_stored.sql |  4 +
 src/test/regress/sql/generated_virtual.sql|  4 +
 src/test/regress/sql/inherit.sql  |  3 +-
 src/test/regress/sql/insert_conflict.sql  | 29 +++---
 src/test/regress/sql/returning.sql|  8 +-
 src/test/regress/sql/rules.sql| 27 ++
 src/test/regress/sql/triggers.sql | 17 ++--
 src/test/regress/sql/updatable_views.sql  | 12 ++-
 src/tools/pgindent/typedefs.list  |  1 +
 41 files changed, 717 insertions(+), 258 deletions(-)

diff --git a/doc/src/sgml/dml.sgml b/doc/src/sgml/dml.sgml
index 458aee788b7..4712a49071b 100644
--- a/doc/src/sgml/dml.sgml
+++ b/doc/src/sgml/dml.sgml
@@ -392,6 +392,19 @@ UPDATE products SET price = price * 1.10
the new values may be non-NULL.
   
 
+  
+   In an INSERT with an ON CONFLICT DO UPDATE
+   clause, it is also possible to return the values excluded, if there is a
+   conflict.  For example:
+
+INSERT INTO products AS p SELECT * FROM new_products
+  ON CONFLICT (product_no) DO UPDATE SET name = excluded.name
+  RETURNING p.product_no, p.name, excluded.price;
+
+   Excluded values will be NULL for rows that do not
+   conflict.
+  
+
   
If there are triggers () on the target table,
the data available to RETURNING is the row as modified by
diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml
index 3f139917790..2f4345e710c 100644
--- a/doc/src/sgml/ref/insert.sgml
+++ b/doc/src/sgml/ref/insert.sgml
@@ -343,6 +343,14 @@ INSERT INTO table_name [ AS ON CONFLICT DO UPDATE clause, the old
 values may be non-NULL.

+
+   
+If the INSERT has an
+ON CONFLICT DO UPDATE clause, a column name or
+* may be qualified using
+EXCLUDED to return the excluded values; if there
+is no conflict, the excluded values will be NULL.
+   
   
  
 
diff --git a/src/backend/execu

Re: postmaster uses more CPU in 18 beta1 with io_method=io_uring

2025-06-30 Thread Andres Freund
Hi,

On 2025-06-05 14:32:10 -0400, Andres Freund wrote:
> On 2025-06-05 12:47:52 -0400, Tom Lane wrote:
> > Andres Freund  writes:
> > > I think this is a big enough pitfall that it's, obviously assuming the 
> > > patch
> > > has a sensible complexity, worth fixing this in 18. RMT, anyone, what do 
> > > you
> > > think?
> > 
> > Let's see the patch ... but yeah, I'd rather not ship 18 like this.
> 
> I've attached a first draft.
> 
> I can't make heads or tails of the ordering in configure.ac, so the function
> test is probably in the wrong place.

Any comments on that patch?  I'd hoped for some review comments... Unless I'll
hear otherwise, I'll just do a bit more polish and push..

Greetings,

Andres




Re: A concurrent VACUUM FULL?

2025-06-30 Thread Álvaro Herrera
On 2025-Jun-30, DINESH  NAIR wrote:

> In OLTP environments will it lead to slowing of the queries or query
> performance issues 

Sure, to some extent, but ideally you wouldn't use it in a recurring
fashion, only as an emergency way out of a really serious bloat problem
(so it's not something that should be impacting your production on a
regular basis); also, performance should improve for the system overall,
compared to the state before compacting the table.

I suggest you try pg_squeeze (a single run of it in a table, not
scheduled runs) and report back how the system performs for you in the
period when it is executing.  I expect that the impact of REPACK is
going to be largely the same as that of pg_squeeze, because the
implementation is very similar.
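
A one-off run looks roughly like this (signature from memory, so treat it as a
sketch and check the pg_squeeze documentation for the exact arguments):

    CREATE EXTENSION pg_squeeze;
    -- process a single table once; no scheduled/background runs
    SELECT squeeze.squeeze_table('public', 'bloated_table', NULL);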

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
Y una voz del caos me habló y me dijo
"Sonríe y sé feliz, podría ser peor".
Y sonreí. Y fui feliz.
Y fue peor.




Re: Proposal to allow DELETE/UPDATE on partitioned tables with unsupported foreign partitions

2025-06-30 Thread Shirisha Shirisha
Hi all,

> The proposed change would make the behavior consistent with the cases
> for INSERT/COPY into partitioned tables with non-insertable
> foreign-table partitions, so +1 in general.


Thanks for the initial feedback on making this behavior consistent with
INSERT/COPY.

Just wanted to follow up on the patch; any additional feedback or
suggestions for improvement would be very helpful.

Thanks and Regards,
Shirisha
Broadcom Inc.


On Thu, Jun 12, 2025 at 3:59 PM Etsuro Fujita 
wrote:

> Hi,
>
> On Thu, Jun 12, 2025 at 1:47 PM Shirisha Shirisha
>  wrote:
> > We’d like to propose a change to improve DELETE and UPDATE behavior on
> partitioned tables
> > containing foreign partitions.
> >
> > Currently, DELETE or UPDATE (D/U) on a partitioned table with foreign
> partitions fails with an error
> > as below, if the FDW does not support the operation:
> >
> > `ERROR: cannot delete from foreign table`
> >
> > This failure occurs during executor initialization
> (`ExecInitModifyTable`), where PostgreSQL scans
> > all partitions of the target table and checks whether each one supports
> the requested operation.
> > If any foreign partition's FDW lacks support for D/U, the operation is
> rejected outright, even if that
> > partition would not be affected by the query.
> >
> >  As a result, DELETE/UPDATE operations are blocked even when they only
> target non-foreign partitions.
> > This means that the system errors out without considering whether
> foreign partitions are actually involved in the operation.
> > Even if no matching rows exist in a foreign partition, the operation
> still fails.
> >
> > This behavior presents a usability hurdle as it forces the user to work
> around this limitation by issuing D/U
> > statements separately on each individual child partition. This is
> cumbersome and breaks the workflow of managing such tables via the root.
> >
> > We are proposing a patch that would allow users to have a better
> workflow by continuing to perform D/U via root partition
> > even in presence of foreign partitions not implementing D/U API.
> > The key change is to defer the FDW check for foreign partitions from
> `ExecInitModifyTable` to `ExecDelete` and `ExecUpdate`.
> > This would ensure that the foreign partitions are checked only when they
> are actually targeted by the operation.
>
> The proposed change would make the behavior consistent with the cases
> for INSERT/COPY into partitioned tables with non-insertable
> foreign-table partitions, so +1 in general.  (I have not looked at the
> patch in detail yet.)
>
> Best regards,
> Etsuro Fujita
>



Re: Tags in the commitfest app: How to use them and what tags to add?

2025-06-30 Thread Jacob Champion
On Mon, Jun 23, 2025 at 12:01 PM David G. Johnston
 wrote:
>
> Yes, categories, and give each category its own line in the table.

I'm headed in the opposite direction. Let me elaborate with some very
strong opinions about the existing tags. (No one has to share my
strong opinions.)

- Help - Bikeshedding
- Good First Review
- Missing Tests

This category of tag is the best. It is completely new information,
not captured anywhere else in the UI, that is useful at the top level
and helps drive reviews forward by helping the community find
interesting things.

- Bugfix
- Security

I'm not excited about these tags, but in the absence of a bug tracker,
I'm glad we have them. Now it doesn't matter what "section" your patch
is in; if you realize it needs to be treated as a bug fix, or it gains
some sort of security caveat, you can tag them as such.

- dblink
- PL/Perl
- postgres_fdw

I don't like these at all. You can already search the patches with the
search box, so introducing a community norm that adds a bunch of
"which code did I touch" tags serves to clutter the UI and give new
CFMs an excuse to spend a bunch of time categorizing as opposed to
moving patches forward. Put the target of your patch in the entry
title somewhere -- and if it touches ten sections, I didn't really
want to see ten tags anyways.

- Backport

I am completely against this tag in particular. We have this
information already in its own Version column (though it's kind of sad
it's not sortable). If that column isn't working for people now, I
really doubt that moving the information to tags is going to help in
any way; we can either clarify the Version labels or make the meaning
discoverable.

--Jacob




Re: pg_dump --with-* options

2025-06-30 Thread Jeff Davis
On Mon, 2025-06-23 at 13:38 -0400, Robert Haas wrote:


> What confuses me about what you've written here specifically is that
> pg_dump and pg_restore are different programs with different option
> sets. So when you say we need both --with-statistics and
> --no-statistics, I guess that's true, but we're not talking about the
> same executable in both cases. It seems to me that pg_restore should
> restore everything that was dumped, but that there should be (as
> there
> are) various --no-whatever switches to skip unwanted items. But
> pg_dump should have dump a reasonable set of things by default, and
> the user should be able to add to that or subtract from it.

True, we could have different options for pg_dump and pg_restore, but
to me that seems a little strange because so many of the other options
overlap. I figured that would be confusing, but maybe it's fine.

Regards,
Jeff Davis





Re: regdatabase

2025-06-30 Thread Nathan Bossart
Committed.

-- 
nathan




Re: Tags in the commitfest app: How to use them and what tags to add?

2025-06-30 Thread Jelte Fennema-Nio
On Mon, 30 Jun 2025 at 22:48, Jacob Champion
 wrote:
> I would also like to request that CFMs be given the ability to add and
> edit (but maybe not delete?) tags.

CC: Alvaro. Since he gave me this access, so hopefully he can do the
same for the CFM group.




Re: On disable_cost

2025-06-30 Thread Robert Haas
On Mon, Jun 30, 2025 at 8:14 PM Michael Paquier  wrote:
> In short, thanks all for the work done here!

I'm glad to hear that you found it helpful!

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Correct comment atop PublicationPartOpt

2025-06-30 Thread Amit Kapila
On Tue, Jul 1, 2025 at 9:20 AM shveta malik  wrote:
>
> I found the wrong function name mentioned in the comment atop
> PublicationPartOpt.
>
> Comment wrongly mentioned the function name GetRelationPublications()
> for PublicationPartOpt usage instead of GetPublicationRelations().
> Corrected the comment.
>

LGTM.

-- 
With Regards,
Amit Kapila.




Re: Periodic FSM vacuum doesn't happen in one-pass strategy vacuum.

2025-06-30 Thread Masahiko Sawada
On Mon, Jun 30, 2025 at 10:20 PM Melanie Plageman
 wrote:
>
> On Tue, Jun 24, 2025 at 6:59 PM Melanie Plageman
>  wrote:
> >
> > So, I think we should commit the fix you proposed.

I agree with your analysis.

> >
> > The only question I have left is implementation: should we have
> > ndeleted as an output parameter of lazy_scan_prune() or have
> > lazy_scan_prune() return it (instead of void)?
> >
> > In <= 16, heap_page_prune() returned the number of tuples deleted, so
> > I thought of maybe having lazy_scan_prune() do this. Though, maybe it
> > is confusing to have one result returned as the return value and the
> > others returned in output parameters unless there is something more
> > special about ndeleted. With heap_page_prune(), I think it was the
> > return value because that was kind of what heap_page_prune()
> > "accomplished".

Given your comment, I now think it makes sense to have
lazy_scan_prune() return the number of deleted tuples as one of the
main jobs of this function is to delete tuples by HOT pruning. I've
updated the patch accordingly.
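
In other words, roughly this shape (parameter list elided, not the actual
patch hunk):

    -static void
    +static int
     lazy_scan_prune(LVRelState *vacrel, Buffer buf, ...)

returning ndeleted to the caller instead of exposing it only through an output
parameter.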

> Just checking what you thought about this. We probably want to get
> this committed and backported relatively soon. I'm happy to help with
> that if needed but just want to make sure we are on the same page
> about the fix.

I've attached the updated patches for master and backbranches (v17 and
v18). Please review these patches.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


REL17_v3-0001-Fix-missing-FSM-vacuum-opportunities-on-tables-wi.patch
Description: Binary data


master_v3-0001-Fix-missing-FSM-vacuum-opportunities-on-tables-wi.patch
Description: Binary data


REL18_v3-0001-Fix-missing-FSM-vacuum-opportunities-on-tables-wi.patch
Description: Binary data


Re: Restrict publishing of partitioned table with a foreign table as partition

2025-06-30 Thread Ajin Cherian
On Mon, Jun 9, 2025 at 3:58 PM Shlok Kyal  wrote:
>
> On Wed, 4 Jun 2025 at 16:12, Ajin Cherian  wrote:
> >
> > On Tue, May 20, 2025 at 2:33 AM Shlok Kyal  wrote:
> > >
> > > This approach seems better to me. I have created a patch with the
> > > above approach.
> > >
> > > Thanks and Regards,
> > > Shlok Kyal
> >
> > Some quick comments on the patch:
> > 1. In doc/src/sgml/ref/create_subscription.sgml:
> > +has partitioned table with foreign table as partition. If this 
> > scenario is
> > +detected we ERROR is logged to the user.
> > +   
> > +
> >
> > Should be: "If this scenario is detected an ERROR is logged to the
> > user." (remove "we").
> >
> > In src/backend/commands/subscriptioncmds.c:
> > 2. The comment header:
> > + * This function is in charge of detecting if publisher with
> > + * publish_via_partition_root=true publishes a partitioned table that has a
> > + * foreign table as a partition.
> >
> > Add "and throw an error if found" at the end of that sentence to
> > correctly describe what the function does.
> >
> > 3.
> > +appendStringInfoString(&cmd,
> > +   "SELECT DISTINCT P.pubname AS pubname "
> > +   "from pg_catalog.pg_publication p, LATERAL "
> > +   "pg_get_publication_tables(p.pubname) gpt, 
> > LATERAL "
> > +   "pg_partition_tree(gpt.relid) gt JOIN
> > pg_catalog.pg_foreign_table ft ON "
> > +   "ft.ftrelid = gt.relid WHERE p.pubviaroot
> > = true AND  p.pubname IN (");
> >
> > use FROM rather than from to maintain SQL style consistency.
> >
> > 4.
> > +errdetail_plural("The subscription being created on a
> > publication (%s) with publish_via_root_partition = true and contains
> > partitioned tables with foreign table as partition ",
> > + "The subscription being created on
> > publications (%s) with publish_via_root_partition = true and contains
> > partitioned tables with foreign table as partition ",
> > + list_length(publist), pubnames->data),
> >
> > I think you meant "publish_via_partition_root" here and not
> > "publish_via_root_partition ".
> >
>
> I have addressed all the comments and attached the updated patch.


Hi Shlok,

Some more comments:
1.
+ Replication is not supported for foreign tables.  When used as partitions
+ of partitioned tables, publishing of the partitioned table is only allowed
+ if the publish_via_partition_root is set to
+ false.

In patch: "When used as partitions of partitioned tables, publishing
of the partitioned table is only allowed if the
publish_via_partition_root is set to
false.

Change to: When foreign tables are used as partitions of partitioned
tables, publishing of the partitioned table is only allowed if the
publish_via_partition_root is set to
false.

2.
+   
+When using a subscription parameter copy_data = true, corresponding
+publications are checked if it has publish_via_partition_root = true and
+has partitioned table with foreign table as partition. If this scenario is
+detected an ERROR is logged to the user.
+   

In patch: "When using a subscription parameter copy_data = true, ..."
Change to: "When using the subscription parameter copy_data = true, ..."

In patch: "and has partitioned table with foreign table as partition."
Change to: "and has a partitioned table with a foreign table as its partition"

3.
+ * check_publications_foreign_parts
+ * Check if the publications, on which subscriber is subscribing, publishes any
+ * partitioned table that has an foreign table as its partition and has
+ * publish_via_partition_root set as true. The check is performed only if
+ * copy_data is set as true for the subscription.

In patch: "publishes any partitioned table that has an foreign table
as its partition"
Change to: "publishes any partitioned table that has a foreign table
as its partition"

4.
+ * DML data changes are not published for data in foreign tables,
+ * and yet the tablesync worker is not smart enough to omit data from
+ * foreign tables when they are partitions of partitioned tables.

change to:"Although DML changes to foreign tables are excluded from
publication, the tablesync worker will still attempt to copy data from
foreign table partitions during initial table synchronization."

5.
+ * When publish_via_partition_root is false, each partition published for
+ * replication is listed individually in pg_subscription_rel, and we
+ * don't add partitions that are foreign tables, so this check is not
+ * needed.

In patch: "so this check is not needed"
Change to: "so this function is not called for such tables"

6.
+errdetail_plural("The subscription being created on a publication (%s) with publish_via_partition_root = true and contains partitioned tables with foreign table as partition ",
+ "The subscription being created on publ

Re: pgsql: Introduce pg_shmem_allocations_numa view

2025-06-30 Thread Bertrand Drouvot
Hi,

On Mon, Jun 30, 2025 at 08:56:43PM +0200, Tomas Vondra wrote:
> In particular it now uses "chunking" instead of "batching". I believe
> batching is "combining multiple requests into a single one", but we're
> doing exactly the opposite - splitting a large request into smaller
> ones. Which is what "chunking" does.

I do agree that "chunking" is more appropriate here.
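
For anyone following along, a minimal sketch of what "chunking" means in
this context (illustrative only; CHUNK_PAGES and the helper name are made
up, not the committed code, and error handling is elided):

#include <numaif.h>
#include <stddef.h>

#define CHUNK_PAGES 16

/*
 * Ask move_pages(2) for the NUMA node of each page in fixed-size
 * chunks rather than in one huge request, sidestepping the
 * do_pages_stat() inaccuracy for large requests.  Passing nodes ==
 * NULL makes the kernel report the node each page currently resides
 * on instead of moving anything.
 */
static int
query_numa_status_chunked(void **pages, int *status, size_t npages)
{
	size_t		off;

	for (off = 0; off < npages; off += CHUNK_PAGES)
	{
		size_t		n = npages - off < CHUNK_PAGES ? npages - off : CHUNK_PAGES;

		if (move_pages(0, n, pages + off, NULL, status + off, 0) != 0)
			return -1;
	}
	return 0;
}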

> I plan to push this tomorrow morning.

Thanks!

LGTM, just 2 nits about the commit messages:

For 0001:

Is it worth adding a link to the kernel bug report, or mentioning that it
can be found in the discussion?

For 0003:

"
But with the chunking, introduced to work around the do_pages_stat()
bug"

Do you have in mind quoting the hex commit object name that will be
generated by 0001?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Logical Replication of sequences

2025-06-30 Thread shveta malik
On Mon, Jun 30, 2025 at 3:21 PM Nisha Moond  wrote:
>
> Tab-completion is not supported after a comma (,) in any other cases.
> For example, the following commands are valid, but tab-completion does
> not work after the comma:
>
> CREATE PUBLICATION pub7 FOR TABLE t1, TABLES IN SCHEMA public;
> CREATE PUBLICATION pub7 FOR TABLES IN SCHEMA public, TABLES IN SCHEMA schema2;
>
> I feel we can keep the behavior consistent in this case too. Thoughts?
>

Yes, let's keep the behaviour the same. No need to make a change.

thanks
Shveta




Re: Fix typo in comments on ExecInsertIndexTuples

2025-06-30 Thread Amit Langote
On Tue, Jul 1, 2025 at 11:42 AM Amit Langote  wrote:
> Hi Nagata-san,
>
> On Tue, Jul 1, 2025 at 11:02 AM Yugo Nagata  wrote:
> >
> > Hi,
> >
> > I found typos in the comments of ExecInsertIndexTuples
> > that mention TUUI_{All, Summarizing} but they should be
> > TU_*.
> >
> > I've attached a patch to fix it.
>
> Good catch and thanks for the patch.  Will push shortly.

Done and backpatched to v16, where the faulty comments were introduced.

-- 
Thanks, Amit Langote




Re: back-patch documentation for age() and mxid_age()

2025-06-30 Thread Robert Haas
On Mon, Jun 30, 2025 at 5:34 PM Nathan Bossart  wrote:
> These functions have been around for a while, but commits 48b5aa3 and
> 15afb7d were only back-patched to v16.  Any objections if I apply them down
> to v13 now?

Seems fine to me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Conflict detection for update_deleted in logical replication

2025-06-30 Thread Dilip Kumar
On Mon, Jun 30, 2025 at 6:59 PM Zhijie Hou (Fujitsu)
 wrote:
>
> On Mon, Jun 30, 2025 at 7:22 PM Amit Kapila wrote:
> >

I was looking at 0001; it mostly looks fine to me except for this one
case.  Here we need to ensure that the commit timestamp is acquired
after marking the flag.  Don't you think we need to enforce strict
statement ordering using a memory barrier?  Or do we think it's not
required, and if so, why?

RecordTransactionCommitPrepared()
{
..
+ MyProc->delayChkptFlags |= DELAY_CHKPT_IN_COMMIT;
+
+ /*
+ * Note it is important to set committs value after marking ourselves as
+ * in the commit critical section (DELAY_CHKPT_IN_COMMIT). This is because
+ * we want to ensure all transactions that have acquired commit timestamp
+ * are finished before we allow the logical replication client to advance
+ * its xid which is used to hold back dead rows for conflict detection.
+ * See maybe_advance_nonremovable_xid.
+ */
+ committs = GetCurrentTimestamp();
}

-- 
Regards,
Dilip Kumar
Google




Re: Conflict detection for update_deleted in logical replication

2025-06-30 Thread Dilip Kumar
On Tue, Jul 1, 2025 at 10:31 AM Dilip Kumar  wrote:
>
> On Mon, Jun 30, 2025 at 6:59 PM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > On Mon, Jun 30, 2025 at 7:22 PM Amit Kapila wrote:
> > >
>
> I was looking at 0001; it mostly looks fine to me except for this one
> case.  Here we need to ensure that the commit timestamp is acquired
> after marking the flag.  Don't you think we need to enforce strict
> statement ordering using a memory barrier?  Or do we think it's not
> required, and if so, why?
>
> RecordTransactionCommitPrepared()
> {
> ..
> + MyProc->delayChkptFlags |= DELAY_CHKPT_IN_COMMIT;
> +
> + /*
> + * Note it is important to set committs value after marking ourselves as
> + * in the commit critical section (DELAY_CHKPT_IN_COMMIT). This is because
> + * we want to ensure all transactions that have acquired commit timestamp
> + * are finished before we allow the logical replication client to advance
> + * its xid which is used to hold back dead rows for conflict detection.
> + * See maybe_advance_nonremovable_xid.
> + */
> + committs = GetCurrentTimestamp();
> }

I'm unsure whether the function call inherently acts as a memory
barrier, preventing the compiler from reordering these operations.
This needs to be confirmed.
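
If it does turn out an explicit barrier is wanted, a minimal sketch could
look like this (not from the patch; it assumes a full fence, via
pg_memory_barrier() from port/atomics.h, is the right strength):

	MyProc->delayChkptFlags |= DELAY_CHKPT_IN_COMMIT;

	/*
	 * Sketch: a full fence keeps both the compiler and the CPU from
	 * reordering the timestamp acquisition before the flag store.
	 */
	pg_memory_barrier();

	committs = GetCurrentTimestamp();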


-- 
Regards,
Dilip Kumar
Google




Re: pg_dumpall dumps global objects with --statistics-only or --no-schema

2025-06-30 Thread Corey Huinker
> Since pg_dumpall treats global objects as schema-level content, it currently
> includes them with --schema-only but skips them with --data-only. By that
> logic, it should also skip them when either --statistics-only or --no-schema
> is used.
> Thoughts?

+1, pending resolution of the defaults issue.

At first glance, this looks like a good candidate for the same refactor
that we did to pg_dump in 96a81c1be929, where we abandoned using the
schema_only and data_only flags to determine what should be dumped in favor
of dumpSchema, dumpData, and eventually dumpStatistics. This made for a
cleaner interface because each test was described in terms of what was
wanted, rather than in terms of the opposite not being the case, and all of
the double/triple-negative boolean backflips were concentrated right after
the option-conflict resolution.
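
To make the pattern concrete, here is a sketch of what it could look like
transplanted to pg_dumpall (variable names follow pg_dump; the exact option
set and the dumpRoles() call site are illustrative, not a proposal):

	/* resolve the negatives once, right after the option-conflict checks */
	bool		dumpSchema = !data_only && !statistics_only;
	bool		dumpData = !schema_only && !statistics_only;
	bool		dumpStatistics = !schema_only && !data_only;

	...

	/* each site then tests for what is wanted, not what is not the case */
	if (dumpSchema)
		dumpRoles(conn);	/* global objects follow the schema flag */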

However, all prospective do_this_thing branches are used exactly once, so
there are no code savings to be had and no clarity to be gained; this
patch is fine as is.


Re: Inline non-SQL SRFs using SupportRequestSimplify

2025-06-30 Thread Paul Jungwirth

On 9/3/24 09:42, Tom Lane wrote:

Paul Jungwirth  writes:

Here are new patches using a new SupportRequestInlineSRF request type. They 
include patches and
documentation.


I took a look through this.  I feel like we're still some way away
from having something committable.  I've got two main complaint
areas:

1. It doesn't seem like integrating this into
inline_set_returning_function was the right thing after all, or
maybe just the way you did it isn't right.

> ...

2. The documentation needs to be a great deal more explicit
about what the function is supposed to return.


Thanks for the review . . . and your patience waiting for an update!

I tried a few refactoring approaches, and the nicest seemed to be keeping the shared parts in 
inline_set_returning_function and having it call out to either inline_sql_set_returning_function or 
inline_set_returning_function_with_support. The first patch just refactors and doesn't yet add 
inline_set_returning_function_with_support; the second patch then adds the new functionality.


The refactor lets us share lots of pre-condition checks, as well as parameter substitution into the 
Query result. In some cases the refactor changes the order of things, but all of those changes 
looked safe to me. I didn't love passing a SysCache HeapTuple into another function, but it does 
make the cleanup a little easier, since now we can always release it in the same place.


The first patch gave me a wacky diff, but I couldn't get git to make something less fragmented. The 
idea is simple though: move part of inline_set_returning_function into 
inline_sql_set_returning_function, and call that instead.


Rebased to 0836683a89.

Yours,

--
Paul  ~{:-)
p...@illuminatedcomputing.com
From 2a0d2bb6cc6fe25cab1065411ed4db17cc394c53 Mon Sep 17 00:00:00 2001
From: "Paul A. Jungwirth" 
Date: Tue, 24 Jun 2025 19:10:15 -0700
Subject: [PATCH v2 1/2] Move some things outside of
 inline_set_returning_function.

Added a new inline_sql_set_returning_function in preparation for
inline_set_returning_function_with_support. Then
inline_set_returning_function can call both, handling their shared needs
itself.

Author: Paul A. Jungwirth 
---
 src/backend/optimizer/util/clauses.c | 212 +++
 1 file changed, 122 insertions(+), 90 deletions(-)

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 26a3e050086..a0dc5dfcd61 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -5049,29 +5049,21 @@ evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
 
 
 /*
- * inline_set_returning_function
- *		Attempt to "inline" a set-returning function in the FROM clause.
- *
- * "rte" is an RTE_FUNCTION rangetable entry.  If it represents a call of a
- * set-returning SQL function that can safely be inlined, expand the function
- * and return the substitute Query structure.  Otherwise, return NULL.
+ * inline_sql_set_returning_function
  *
- * We assume that the RTE's expression has already been put through
- * eval_const_expressions(), which among other things will take care of
- * default arguments and named-argument notation.
+ * This implements inline_set_returning_function for sql-language functions.
+ * It parses the body (or uses the pre-parsed body if available).
+ * It allocates its own temporary MemoryContext for the parsing, then copies
+ * the result into the caller's context.
  *
- * This has a good deal of similarity to inline_function(), but that's
- * for the non-set-returning case, and there are enough differences to
- * justify separate functions.
+ * Returns NULL if the function couldn't be inlined.
  */
-Query *
-inline_set_returning_function(PlannerInfo *root, RangeTblEntry *rte)
+static Query *
+inline_sql_set_returning_function(PlannerInfo *root, RangeTblEntry *rte,
+  RangeTblFunction *rtfunc,
+  FuncExpr *fexpr, Oid func_oid, HeapTuple func_tuple,
+  Form_pg_proc funcform)
 {
-	RangeTblFunction *rtfunc;
-	FuncExpr   *fexpr;
-	Oid			func_oid;
-	HeapTuple	func_tuple;
-	Form_pg_proc funcform;
 	char	   *src;
 	Datum		tmp;
 	bool		isNull;
@@ -5088,37 +5080,6 @@ inline_set_returning_function(PlannerInfo *root, RangeTblEntry *rte)
 
 	Assert(rte->rtekind == RTE_FUNCTION);
 
-	/*
-	 * It doesn't make a lot of sense for a SQL SRF to refer to itself in its
-	 * own FROM clause, since that must cause infinite recursion at runtime.
-	 * It will cause this code to recurse too, so check for stack overflow.
-	 * (There's no need to do more.)
-	 */
-	check_stack_depth();
-
-	/* Fail if the RTE has ORDINALITY - we don't implement that here. */
-	if (rte->funcordinality)
-		return NULL;
-
-	/* Fail if RTE isn't a single, simple FuncExpr */
-	if (list_length(rte->functions) != 1)
-		return NULL;
-	rtfunc = (RangeTblFunction *) linitial(rte->functions);
-
-	if (!IsA(rtfunc->funcexpr, FuncExpr))
-		return NULL;
-	fexpr = (FuncExpr *) rtfunc->funcexpr;
-
-	func_oid =

Re: pg_get_multixact_members not documented

2025-06-30 Thread Nathan Bossart
On Mon, Jun 30, 2025 at 02:54:28PM -0500, Nathan Bossart wrote:
> I noticed that this list of exceptions doesn't exist on v13-v15, presumably
> because the docs for age() and mxid_age() were only back-patched to v16
> (see commits 48b5aa3 and 15afb7d), which is strange because I believe those
> functions are much older.  I don't see any discussion about this choice,
> either.  We should probably back-patch those commits to v13 as a
> prerequisite to applying this patch.

Spun off a new thread for this:

https://postgr.es/m/aGMCxHxLfeMdQk8m%40nathan

-- 
nathan




Re: [PoC] Federated Authn/z with OAUTHBEARER

2025-06-30 Thread Andres Freund
Hi,

On 2025-07-01 00:52:49 +0200, Daniel Gustafsson wrote:
> > On 30 Jun 2025, at 20:33, Jacob Champion  
> > wrote:
> > 
> > On Mon, Jun 30, 2025 at 10:02 AM Daniel Gustafsson  wrote:
> >>> On 30 Jun 2025, at 18:58, Andres Freund  wrote:
> >>> Probably just needs to be added to the installed_targets list.
> >> 
> >> Thanks for the report, I'll take a look today to get it fixed.
> > 
> > Thanks both!
> > 
> > Looking at the installed_targets stuff, though... why do we use `meson
> > install --no-rebuild` in combination with `depends:
> > installed_targets`? Can't we just use Meson's dependency tracking
> > during installation, and avoid this hazard?

I don't think that's really possible - the dependency tracking is useful to
generate granular *rebuild* information, but doesn't help with the first
build.

If we had dependency generation for the install target it could be helpful to
discover missing dependencies though.


> I suspect it is because without --no-rebuild the quiet target isn't entirely
> quiet.

No - the issue is that you're not allowed to run ninja while ninja is running,
as that would corrupt its tracking (and build things multiple times). meson
install without --no-rebuild would run ninja to build things...


> Still, I was unable to make something that works in all build combinations
> while keeping --no-rebuild (which isn't to say it can't be done).

Hm, what problem did you encounter? I don't think there should be any
difficulty?

Greetings,

Andres Freund




Correct comment atop PublicationPartOpt

2025-06-30 Thread shveta malik
Hi,

I found a wrong function name mentioned in the comment atop
PublicationPartOpt.

The comment wrongly mentioned GetRelationPublications() as the user of
PublicationPartOpt instead of GetPublicationRelations(). I've corrected
the comment.

thanks
Shveta


v1-0001-Comment-correction-atop-PublicationPartOpt.patch
Description: Binary data


Re: Tags in the commitfest app: How to use them and what tags to add?

2025-06-30 Thread Jacob Champion
On Mon, Jun 30, 2025 at 1:20 PM Jacob Champion
 wrote:
> This category of tag is the best. It is completely new information,
> not captured anywhere else in the UI, that is useful at the top level
> and helps drive reviews forward by helping the community find
> interesting things.

Oh, and while I'm thinking about it, I'd like to propose two new tags
in this vein:

- "help, I'm stuck in perma-rebase hell"
- "this is my first contribution to the project"

I would also like to request that CFMs be given the ability to add and
edit (but maybe not delete?) tags.

--Jacob



