Re: per backend WAL statistics

2025-02-27 Thread Michael Paquier
On Thu, Feb 27, 2025 at 07:47:09AM +, Bertrand Drouvot wrote:
> That's how I did it initially but decided to move it to pgstat_backend.c. The
> reason was that it's fully linked to "per backend" stats and that there is
> no SQL api on top of it (while I think that's the case for almost all the ones
> in pgstatfuncs.c). Thoughts?

Okay by me with pgstat_fetch_stat_backend in parallel, why not
exposing this part as well..  Perhaps that could be useful for some
extension?  I'd rather have out-of-core code do these lookups with the
same sanity checks in place for the procnumber and slot lookups.

The name was inconsistent with the rest of the file, so I have settled
to a pgstat_fetch_stat_backend_by_pid() to be more consistent.  A
second thing is to properly initialize bktype if defined by the
caller.
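For instance, an out-of-core module could then do lookups like this (just an
illustrative sketch based on the signature described above, with minimal error
handling):

#include "postgres.h"
#include "miscadmin.h"
#include "pgstat.h"

static void
report_backend_wal(int pid)
{
	BackendType		bktype;
	PgStat_Backend *stats;

	stats = pgstat_fetch_stat_backend_by_pid(pid, &bktype);
	if (stats == NULL)
		elog(NOTICE, "no backend statistics for PID %d", pid);
	else
		elog(NOTICE, "backend %d (%s) generated %lld WAL records",
			 pid, GetBackendTypeDesc(bktype),
			 (long long) stats->wal_counters.wal_records);
}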

> Makes sense. I did not have in mind to submit a new patch version (to at least
> implement the above) without getting your final thoughts on your first
> comment.
> But since a rebase is needed anyway, then please find attached a new version.
> It just implements your last comment.

Attached is a rebased version of the rest.
--
Michael
From 42fd2860e3f06d525936187f46aad06892073078 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Mon, 6 Jan 2025 10:00:00 +
Subject: [PATCH v12] per backend WAL statistics

Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics) we can more easily add per backend statistics.

This commit adds per backend WAL statistics using the same layer as pg_stat_wal,
except that it is now possible to know how much WAL activity is happening in each
backend rather than an overall aggregate of all the activity.  A function called
pg_stat_get_backend_wal() is added to access this data depending on the
PID of a backend.

The same limitation as in 9aea73fc61 persists, meaning that Auxiliary processes
are not included in this set of statistics.

XXX: bump catalog version
XXX: bump of stats file format not required, as backend stats do not
persist on disk.
---
 src/include/catalog/pg_proc.dat |  7 +++
 src/include/pgstat.h| 37 ++--
 src/include/utils/pgstat_internal.h |  3 +-
 src/backend/utils/activity/pgstat_backend.c | 64 +
 src/backend/utils/activity/pgstat_wal.c |  1 +
 src/backend/utils/adt/pgstatfuncs.c | 26 -
 src/test/regress/expected/stats.out | 14 +
 src/test/regress/sql/stats.sql  |  6 ++
 doc/src/sgml/monitoring.sgml| 19 ++
 9 files changed, 156 insertions(+), 21 deletions(-)

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cd9422d0bacf..3e35f8b8e99a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5954,6 +5954,13 @@
   proargmodes => '{o,o,o,o,o}',
   proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
   prosrc => 'pg_stat_get_wal' },
+{ oid => '8037', descr => 'statistics: backend WAL activity',
+  proname => 'pg_stat_get_backend_wal', provolatile => 'v',
+  proparallel => 'r', prorettype => 'record', proargtypes => 'int4',
+  proallargtypes => '{int4,int8,int8,numeric,int8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o}',
+  proargnames => '{backend_pid,wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
+  prosrc => 'pg_stat_get_backend_wal' },
 { oid => '6248', descr => 'statistics: information about WAL prefetching',
   proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
   provolatile => 'v', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 4aad10b0b6d5..06359b9157d2 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -340,24 +340,6 @@ typedef struct PgStat_IO
 	PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
 } PgStat_IO;
 
-typedef struct PgStat_Backend
-{
-	TimestampTz stat_reset_timestamp;
-	PgStat_BktypeIO io_stats;
-} PgStat_Backend;
-
-/* -
- * PgStat_BackendPending	Non-flushed backend stats.
- * -
- */
-typedef struct PgStat_BackendPending
-{
-	/*
-	 * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
-	 */
-	PgStat_PendingIO pending_io;
-} PgStat_BackendPending;
-
 typedef struct PgStat_StatDBEntry
 {
 	PgStat_Counter xact_commit;
@@ -500,6 +482,25 @@ typedef struct PgStat_WalStats
 	TimestampTz stat_reset_timestamp;
 } PgStat_WalStats;
 
+typedef struct PgStat_Backend
+{
+	TimestampTz stat_reset_timestamp;
+	PgStat_BktypeIO io_stats;
+	PgStat_WalCounters wal_counters;
+} PgStat_Backend;
+
+/* -
+ * PgStat_BackendPending	Non-flushed backend stats.
+ * -
+ */
+typedef struct PgStat_BackendPending
+{
+	/*
+	 * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
+	 */
+	PgStat_PendingIO pending_io;
+} PgStat_BackendPending;
+
 /*
  * Functions in pgstat.c
  */
diff --git a/src/include/utils/pgstat_internal.h b/sr

Re: Disabling vacuum truncate for autovacuum

2025-02-27 Thread Nathan Bossart
On Tue, Feb 18, 2025 at 01:56:09PM -0600, Nathan Bossart wrote:
> On Mon, Jan 27, 2025 at 03:38:39PM -0500, Robert Haas wrote:
>> Also, how sure are we that turning this off globally is a solid idea?
>> Off-hand, it doesn't sound that bad: there are probably situations in
>> which autovacuum never truncates anything anyway just because the tail
>> blocks of the relations always contain at least 1 tuple. But we should
>> be careful not to add a setting that is far more likely to get people
>> into trouble than to get them out of it. It would be good to hear what
>> other people think about the risk vs. reward tradeoff in this case.
> 
> My first reaction is that a global setting is probably fine most of the
> time.  I'm sure it's possible to get into bad situations if you try hard
> enough, but that's not a unique characteristic.  There are probably many
> situations where the truncation is wasted effort because we'll just end up
> extending the relation shortly afterwards, anyway.  In any case, it's
> already possible to achieve $SUBJECT with a trivial script that sets
> storage parameters on all tables, so IMHO it would be silly to withhold a
> global setting that does the same thing just on principle.

I spent some time on this one today.  A couple of notes:

* Since the reloption is a Boolean, there isn't a good way to tell whether
  it is actually set for the table or if it just inherited the default
  value.  This is important to know because we want the reloption to
  override the GUC.  I considered a bunch of different ways to handle this,
  but everything felt like a cowboy hack.  The cleanest cowboy hack I could
  come up with is an optional offset field in relopt_parse_elt that points
  to a variable that stores whether the option was explicitly set (see the
  sketch after this list).

* I didn't see a good GUC category for vacuum_truncate.  I suppose we could
  create a new one, but for now I've just stashed it into the autovacuum
  one.  Bikeshedding welcome.
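To make the first point above a bit more concrete, here is roughly the shape I
mean (the field name isset_offset and the extra StdRdOptions member are only
illustrative; the attached patch may spell them differently):

typedef struct
{
	const char *optname;		/* option's name */
	relopt_type opttype;		/* option's datatype */
	int			offset;			/* offset of field in result struct */
	int			isset_offset;	/* optional offset of a bool recording whether
								 * the option was explicitly set; 0 means
								 * "not tracked" */
} relopt_parse_elt;

/* ... and a parse-table entry that uses it: */
{"vacuum_truncate", RELOPT_TYPE_BOOL,
 offsetof(StdRdOptions, vacuum_truncate),
 offsetof(StdRdOptions, vacuum_truncate_set)},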

Thoughts?

-- 
nathan
>From e360b56acd1d3bd05c9df6cfc4586e51edce357e Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 27 Feb 2025 21:09:37 -0600
Subject: [PATCH v2 1/1] Add vacuum_truncate GUC.

---
 doc/src/sgml/config.sgml  | 22 +++
 doc/src/sgml/ref/create_table.sgml| 10 +
 doc/src/sgml/ref/vacuum.sgml  |  3 ++-
 src/backend/access/common/reloptions.c| 11 --
 src/backend/commands/vacuum.c | 17 ++
 src/backend/utils/misc/guc_tables.c   |  9 
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/reloptions.h   |  1 +
 src/include/commands/vacuum.h |  1 +
 src/include/utils/rel.h   |  1 +
 10 files changed, 60 insertions(+), 16 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e55700f35b8..069ac35762f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8934,6 +8934,28 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH 
csv;

   
 
+  
+   vacuum_truncate (boolean)
+   
+vacuum_truncate configuration 
parameter
+   
+   
+   
+
+ Enables or disables vacuum to try to truncate off any empty pages at
+ the end of the table.  The default value is true.
+ If true, VACUUM and autovacuum
+ do the truncation and the disk space for the truncated pages is
+ returned to the operating system.  Note that the truncation requires
+ an ACCESS EXCLUSIVE lock on the table.  The
+ TRUNCATE parameter of
+ VACUUM, if
+ specified, overrides the value of this parameter.  The setting can be
+ overridden for individual tables by changing table storage parameters.
+
+   
+  
+
  
 
 
diff --git a/doc/src/sgml/ref/create_table.sgml 
b/doc/src/sgml/ref/create_table.sgml
index 0a3e520f215..3c2315b1a8e 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1688,15 +1688,7 @@ WITH ( MODULUS numeric_literal, REM
 
 
  
-  Enables or disables vacuum to try to truncate off any empty pages
-  at the end of this table. The default value is true.
-  If true, VACUUM and
-  autovacuum do the truncation and the disk space for
-  the truncated pages is returned to the operating system.
-  Note that the truncation requires ACCESS EXCLUSIVE
-  lock on the table. The TRUNCATE parameter
-  of VACUUM, if 
specified, overrides the value
-  of this option.
+  Per-table value for  parameter.
  
 

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 971b1237d47..bd5dcaf86a5 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -265,7 +265,8 @@ VACUUM [ ( option [, ...] ) ] [ vacuum_truncate
+  and is the default unless 
+  is set to false or 

Re: Confine vacuum skip logic to lazy_scan_skip

2025-02-27 Thread Thomas Munro
On Fri, Feb 28, 2025 at 11:58 AM Melanie Plageman
 wrote:
> On Thu, Feb 27, 2025 at 1:08 PM Tom Lane  wrote:
> > I wonder if it'd be a good idea to add something like
> >
> > Assert(stream->distance == 1);
> > Assert(stream->pending_read_nblocks == 0);
> > Assert(stream->per_buffer_data_size == 0);
> > +   Assert(per_buffer_data == NULL);
> >
> > in read_stream_next_buffer.  I doubt that this will shut Coverity
> > up, but it would help to catch caller coding errors, i.e. passing
> > a per_buffer_data pointer when there's no per-buffer data.
>
> I think this is a good stopgap. I was discussing adding this assert
> off-list with Thomas and he wanted to detail his more ambitious plans
> for type safety improvements in the read stream API. Less on the order
> of a redesign and more like a separate read_stream_next_buffer()s for
> when there is per buffer data and when there isn't. And a by-value and
> by-reference version for the one where there is data.

Here's what I had in mind.  Is it better?
From 68e9424b590051959142917459eb4ea074589b79 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 28 Feb 2025 10:48:29 +1300
Subject: [PATCH 1/2] Improve API for retrieving data from read streams.

Dealing with the per_buffer_data argument to read_stream_next_buffer()
has proven a bit clunky.  Provide some new wrapper functions/macros:

buffer = read_stream_get_buffer(rs);
buffer = read_stream_get_buffer_and_value(rs, &my_int);
buffer = read_stream_get_buffer_and_pointer(rs, &my_pointer_to_int);

These improve readability and type safety via assertions.
---
 contrib/pg_prewarm/pg_prewarm.c  |  4 +-
 contrib/pg_visibility/pg_visibility.c|  6 +--
 src/backend/access/heap/heapam.c |  2 +-
 src/backend/access/heap/heapam_handler.c |  2 +-
 src/backend/access/heap/vacuumlazy.c |  6 +--
 src/backend/storage/aio/read_stream.c| 12 +
 src/backend/storage/buffer/bufmgr.c  |  4 +-
 src/include/storage/read_stream.h| 64 
 8 files changed, 87 insertions(+), 13 deletions(-)

diff --git a/contrib/pg_prewarm/pg_prewarm.c b/contrib/pg_prewarm/pg_prewarm.c
index a2f0ac4af0c..f6ae266d7b0 100644
--- a/contrib/pg_prewarm/pg_prewarm.c
+++ b/contrib/pg_prewarm/pg_prewarm.c
@@ -208,11 +208,11 @@ pg_prewarm(PG_FUNCTION_ARGS)
 			Buffer		buf;
 
 			CHECK_FOR_INTERRUPTS();
-			buf = read_stream_next_buffer(stream, NULL);
+			buf = read_stream_get_buffer(stream);
 			ReleaseBuffer(buf);
 			++blocks_done;
 		}
-		Assert(read_stream_next_buffer(stream, NULL) == InvalidBuffer);
+		Assert(read_stream_get_buffer(stream) == InvalidBuffer);
 		read_stream_end(stream);
 	}
 
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 7f268a18a74..e7187a46c9d 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -556,7 +556,7 @@ collect_visibility_data(Oid relid, bool include_pd)
 			Buffer		buffer;
 			Page		page;
 
-			buffer = read_stream_next_buffer(stream, NULL);
+			buffer = read_stream_get_buffer(stream);
 			LockBuffer(buffer, BUFFER_LOCK_SHARE);
 
 			page = BufferGetPage(buffer);
@@ -569,7 +569,7 @@ collect_visibility_data(Oid relid, bool include_pd)
 
 	if (include_pd)
 	{
-		Assert(read_stream_next_buffer(stream, NULL) == InvalidBuffer);
+		Assert(read_stream_get_buffer(stream) == InvalidBuffer);
 		read_stream_end(stream);
 	}
 
@@ -752,7 +752,7 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
 		0);
 
 	/* Loop over every block in the relation. */
-	while ((buffer = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
+	while ((buffer = read_stream_get_buffer(stream)) != InvalidBuffer)
 	{
 		bool		check_frozen = all_frozen;
 		bool		check_visible = all_visible;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fa7935a0ed3..86f280069e0 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -609,7 +609,7 @@ heap_fetch_next_buffer(HeapScanDesc scan, ScanDirection dir)
 
 	scan->rs_dir = dir;
 
-	scan->rs_cbuf = read_stream_next_buffer(scan->rs_read_stream, NULL);
+	scan->rs_cbuf = read_stream_get_buffer(scan->rs_read_stream);
 	if (BufferIsValid(scan->rs_cbuf))
 		scan->rs_cblock = BufferGetBlockNumber(scan->rs_cbuf);
 }
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index e78682c3cef..7487896b06c 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -1010,7 +1010,7 @@ heapam_scan_analyze_next_block(TableScanDesc scan, ReadStream *stream)
 	 * re-acquire sharelock for each tuple, but since we aren't doing much
 	 * work per tuple, the extra lock traffic is probably better avoided.
 	 */
-	hscan->rs_cbuf = read_stream_next_buffer(stream, NULL);
+	hscan->rs_cbuf = read_stream_get_buffer(stream);
 	if (!BufferIsValid(hscan

Re: Confine vacuum skip logic to lazy_scan_skip

2025-02-27 Thread Thomas Munro
On Fri, Feb 28, 2025 at 2:29 PM Thomas Munro  wrote:
> On Fri, Feb 28, 2025 at 11:58 AM Melanie Plageman
>  wrote:
> > On Thu, Feb 27, 2025 at 1:08 PM Tom Lane  wrote:
> > > I wonder if it'd be a good idea to add something like
> > >
> > > Assert(stream->distance == 1);
> > > Assert(stream->pending_read_nblocks == 0);
> > > Assert(stream->per_buffer_data_size == 0);
> > > +   Assert(per_buffer_data == NULL);
> > >
> > > in read_stream_next_buffer.  I doubt that this will shut Coverity
> > > up, but it would help to catch caller coding errors, i.e. passing
> > > a per_buffer_data pointer when there's no per-buffer data.
> >
> > I think this is a good stopgap. I was discussing adding this assert
> > off-list with Thomas and he wanted to detail his more ambitious plans
> > for type safety improvements in the read stream API. Less on the order
> > of a redesign and more like a separate read_stream_next_buffer()s for
> > when there is per buffer data and when there isn't. And a by-value and
> > by-reference version for the one where there is data.
>
> Here's what I had in mind.  Is it better?

Here's a slightly better one.  I think when you use
read_stream_get_buffer_and_value(stream, &value), or
read_stream_put_value(stream, space, value), then we should assert
that sizeof(value) strictly matches the available space, as shown.  But,
new in v2, if you use read_stream_get_buffer_and_pointer(stream,
&pointer), then sizeof(*pointer) should only have to  be <= the
storage space, not ==, because someone might plausibly want to make
per_buffer_data_size variable at runtime (ie decide when they
construct the stream), and then be able to retrieve a pointer to the
start of a struct with a flexible array or something like that.  In v1
I was just trying to assert that it was a
pointer-to-a-pointer-to-something and no more (in a confusing
compile-time assertion), but v2 is simpler, and is happy with a
pointer to a pointer to something that doesn't exceed the space
(run-time assertion).
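For anyone skimming, here is a rough illustration (not the actual patch text)
of how the value-copying wrapper can carry that assertion; the accessor
read_stream_per_buffer_data_size() is invented here just to express the check
against the stream's per_buffer_data_size:

#define read_stream_get_buffer_and_value(stream, vp) \
	read_stream_get_buffer_and_value_impl((stream), (vp), sizeof(*(vp)))

static inline Buffer
read_stream_get_buffer_and_value_impl(ReadStream *stream,
									  void *value, size_t size)
{
	void	   *per_buffer_data;
	Buffer		buffer = read_stream_next_buffer(stream, &per_buffer_data);

	/* the value's type must exactly match the stream's per-buffer data size */
	Assert(size == read_stream_per_buffer_data_size(stream));
	if (BufferIsValid(buffer))
		memcpy(value, per_buffer_data, size);
	return buffer;
}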
From b2dd9c90f970a889deea2c2e9e16097e4e06ece8 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 28 Feb 2025 10:48:29 +1300
Subject: [PATCH v2 1/2] Improve API for retrieving data from read streams.

Dealing with the per_buffer_data argument to read_stream_next_buffer()
has proven a bit clunky.  Provide some new wrapper functions/macros:

buffer = read_stream_get_buffer(rs);
buffer = read_stream_get_buffer_and_value(rs, &my_int);
buffer = read_stream_get_buffer_and_pointer(rs, &my_pointer_to_int);

These improve readability and type safety via assertions.
---
 contrib/pg_prewarm/pg_prewarm.c  |  4 +-
 contrib/pg_visibility/pg_visibility.c|  6 +--
 src/backend/access/heap/heapam.c |  2 +-
 src/backend/access/heap/heapam_handler.c |  2 +-
 src/backend/access/heap/vacuumlazy.c |  6 +--
 src/backend/storage/aio/read_stream.c| 12 ++
 src/backend/storage/buffer/bufmgr.c  |  4 +-
 src/include/storage/read_stream.h| 55 
 8 files changed, 78 insertions(+), 13 deletions(-)

diff --git a/contrib/pg_prewarm/pg_prewarm.c b/contrib/pg_prewarm/pg_prewarm.c
index a2f0ac4af0c..f6ae266d7b0 100644
--- a/contrib/pg_prewarm/pg_prewarm.c
+++ b/contrib/pg_prewarm/pg_prewarm.c
@@ -208,11 +208,11 @@ pg_prewarm(PG_FUNCTION_ARGS)
 			Buffer		buf;
 
 			CHECK_FOR_INTERRUPTS();
-			buf = read_stream_next_buffer(stream, NULL);
+			buf = read_stream_get_buffer(stream);
 			ReleaseBuffer(buf);
 			++blocks_done;
 		}
-		Assert(read_stream_next_buffer(stream, NULL) == InvalidBuffer);
+		Assert(read_stream_get_buffer(stream) == InvalidBuffer);
 		read_stream_end(stream);
 	}
 
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 7f268a18a74..e7187a46c9d 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -556,7 +556,7 @@ collect_visibility_data(Oid relid, bool include_pd)
 			Buffer		buffer;
 			Page		page;
 
-			buffer = read_stream_next_buffer(stream, NULL);
+			buffer = read_stream_get_buffer(stream);
 			LockBuffer(buffer, BUFFER_LOCK_SHARE);
 
 			page = BufferGetPage(buffer);
@@ -569,7 +569,7 @@ collect_visibility_data(Oid relid, bool include_pd)
 
 	if (include_pd)
 	{
-		Assert(read_stream_next_buffer(stream, NULL) == InvalidBuffer);
+		Assert(read_stream_get_buffer(stream) == InvalidBuffer);
 		read_stream_end(stream);
 	}
 
@@ -752,7 +752,7 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
 		0);
 
 	/* Loop over every block in the relation. */
-	while ((buffer = read_stream_next_buffer(stream, NULL)) != InvalidBuffer)
+	while ((buffer = read_stream_get_buffer(stream)) != InvalidBuffer)
 	{
 		bool		check_frozen = all_frozen;
 		bool		check_visible = all_visible;
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fa7935a0ed3..86f280069e0 100644
--- a/src/backend/access/h

Re: Log connection establishment timings

2025-02-27 Thread Bertrand Drouvot
Hi,

On Thu, Feb 27, 2025 at 05:55:19PM -0500, Melanie Plageman wrote:
> On Thu, Feb 27, 2025 at 11:30 AM Andres Freund  wrote:
> >
> > On 2025-02-27 11:08:04 -0500, Melanie Plageman wrote:
> >
> > > However, since that is a bigger project (with more refactoring, etc),
> > > he suggested that we change log_connections to a GUC_LIST
> > > (ConfigureNamesString) with options like "none", "received",
> > > "authenticated", "authorized", "all".
> >
> > Yep.
> 
> I've done a draft of this in attached v7 (see 0001).

Thanks for the patch!

> It still needs polishing (e.g. I need to figure out where to put the new guc 
> hook
> functions and enums and such)

yeah, I wonder if it would make sense to create new dedicated
"connection_logging" file(s).

>, but I want to see if this is a viable direction forward.
> 
> I'm worried the list of possible connection log messages could get
> unwieldy if we add many more.

Thinking out loud, I'm not sure we'd add "many more" but maybe what we could do
is to introduce some predefined groups like:

'basic' (that would be equivalent to 'received, + timings introduced in 0002')
'security' (equivalent to 'authenticated,authorized')

Not saying we need to create this kind of predefined groups now, just an idea
in case we'd add many more: that could "help" the user experience, or are you
more concerned about the code maintenance?

Just did a quick test (did not look closely at the code), and it looks like
this:

'all': does not report the "received" (but does report authenticated and 
authorized)
'received': does not work?
'authenticated': works
'authorized': works

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Extend postgres_fdw_get_connections to return remote backend pid

2025-02-27 Thread Sagar Shedge
> For now, I think it makes sense to keep postgres_fdw_get_connections()
> aligned with the current implementation. Otherwise, explaining what
> remote_backend_pid = NULL means could be confusing, especially since
> pipeline mode isn't supported yet in postgres_fdw.

Makes sense. I have created a patch according to the above suggestions.
Please review.

-- 
Sagar Dilip Shedge,
SDE AWS


v2-0001-Extend-postgres_fdw_get_connections-to-return-rem.patch
Description: Binary data


Re: Adding support for SSLKEYLOGFILE in the frontend

2025-02-27 Thread Abhishek Chanda
Attached is a v6 with O_NOFOLLOW removed; I noticed that it failed on
Windows. Please let me know if I should do this in any other way that
is portable across platforms.

Thanks

On Thu, Feb 27, 2025 at 11:39 PM Abhishek Chanda
 wrote:
>
> Thanks for the review, Daniel. Please find v5 of this patch attached.
> I tried to bump up the minimum OpenBSD version in installation.sgml,
> do I need to add this anywhere else? Please let me know if this needs
> anything else.
>
> Thanks
>
> On Thu, Feb 20, 2025 at 4:25 PM Daniel Gustafsson  wrote:
> >
> > > On 20 Feb 2025, at 04:36, Abhishek Chanda  wrote:
> > > Please find attached a new version of this patch. I rebased on master,
> > > added better error reporting and skipped the permissions check on
> > > windows. Please let me know if this needs any changes. I tested this
> > > locally using meson running all TAP tests.
> >
> > Thanks for the new version, a few comments on this one from reading:
> >
> > +./src/test/ssl/key.txt
> > The toplevel .gitignore should not be used for transient test output, there 
> > is
> > a .gitignore in src/test/ssl for that.  The key.txt file should also be 
> > placed
> > in PostgreSQL::Test::Utils::tempdir and then cleaned up after the test.
> >
> >
> > +  > xreflabel="sslkeylogfile">
> > The documentation should state that the file will use the NSS format.
> >
> >
> > +   if (log_file == NULL) {
> > +   libpq_append_conn_error(conn, "could not open ssl key log 
> > ...
> > +   }
> > This raise an error for not being able to operate on the file, but it 
> > doesn't
> > error out from the function and instead keeps going with more operations on 
> > the
> > file which couldn't be opened.
> >
> >
> > +   if (chmod(conn->sslkeylogfile, 0600) == -1) {
> > Rather than opening and try to set permissions separately, why not use 
> > open(2)
> > and set the umask?  We probably also want to set O_NOFOLLOW.
> >
> >
> > +   if (conn->sslkeylogfile && strlen(conn->sslkeylogfile) > 0)
> > +   SSL_CTX_set_keylog_callback(SSL_context, SSL_CTX_keylog_cb);
> > This callback does exist in OpenSSL 1.1.1 but it's only available in 
> > LibreSSL
> > from OpenBSD 7.1 and onwards where we support 7.0 as the earliest version.
> > This feature seems like a good enough reason to bump the minimum LibreSSL
> > version, and 7.1 is well out support (7.5 goes EOL next), but it would need 
> > to
> > get done here.
> >
> >
> > +# Skip file mode check on Windows
> > +return 1 if $windows_os;
> > It should be up to each individual test to determine what to skip, not the
> > library code (and it should use SKIP:).  src/test/ssl/t/001_ssltests.pl is 
> > an
> > example which skips permissions checks on Windows.  Also, I'm not sure we 
> > need
> > a generic function in the testlibrary for something so basic.
> >
> >
> > +print STDERR sprintf("%s mode must be %04o\n",
> > +   $path, $expected_file_mode);
> > Test code should not print anything to stdout/stderr, it should use the TAP
> > compliant methods like diag() etc for this.
> >
> >
> > +   "connect with server root certand sslkeylogfile=key.txt");
> > s/certand/cert and/ ?
> >
> > --
> > Daniel Gustafsson
> >
>
>
> --
> Thanks and regards
> Abhishek Chanda



-- 
Thanks and regards
Abhishek Chanda


v6-0001-Add-support-for-dumping-SSL-keylog-to-a-file.patch
Description: Binary data


Re: Disabling vacuum truncate for autovacuum

2025-02-27 Thread Laurenz Albe
On Thu, 2025-02-27 at 21:35 -0600, Nathan Bossart wrote:
> I spent some time on this one today.  A couple of notes:
> 
> * Since the reloption is a Boolean, there isn't a good way to tell whether
>   it is actually set for the table or if it just inherited the default
>   value.  This is important to know because we want the reloption to
>   override the GUC.

I agree with that, without being able to think of a better solution.

> * I didn't see a good GUC category for vacuum_truncate.  I suppose we could
>   create a new one, but for now I've just stashed it into the autovacuum
>   one.  Bikeshedding welcome.

Why not use "Client Connection Defaults / Statement Behavior", just like for
"vacuum_freeze_min_age"?  I don't think that "autovacuum" is appropriate,
since it applies to manual VACUUM as well.

Yours,
Laurenz Albe




Re: Restrict copying of invalidated replication slots

2025-02-27 Thread Amit Kapila
On Fri, Feb 28, 2025 at 5:10 AM Masahiko Sawada  wrote:
>
> On Thu, Feb 27, 2025 at 12:52 AM Amit Kapila  wrote:
> >
> > On Thu, Feb 27, 2025 at 10:47 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Tue, Feb 25, 2025 at 7:33 PM Amit Kapila  
> > > wrote:
> > > >
> > > > AFAICU, InvalidateObsoleteReplicationSlots() is not serialized with
> > > > this operation. So, isn't it possible that the source slot exists at
> > > > the later position in ReplicationSlotCtl->replication_slots and the
> > > > loop traversing slots is already ahead from the point where the newly
> > > > copied slot is created?
> > >
> > > Good point. I think it's possible.
> > >
> > > > If so, the newly created slot won't be marked
> > > > as invalid whereas the source slot will be marked as invalid. I agree
> > > > that even in such a case, at a later point, the newly created slot
> > > > will also be marked as invalid.
> > >
> > > The wal_status of the newly created slot would immediately become
> > > 'lost' and the next checkpoint will invalidate it. Do we need to do
> > > something to deal with this case?
> > >
> >
> > + /* Check if source slot became invalidated during the copy operation */
> > + if (second_slot_contents.data.invalidated != RS_INVAL_NONE)
> > + ereport(ERROR,
> > + errmsg("cannot copy replication slot \"%s\"",
> > +NameStr(*src_name)),
> > + errdetail("The source replication slot was invalidated during the
> > copy operation."));
> >
> > How about adding a similar test as above for MyReplicationSlot? That
> > should suffice the need because we have already acquired the new slot
> > by this time and invalidation should signal this process before
> > marking the new slot as invalid.
>
> IIUC in the scenario you mentioned, the loop traversing slots already
> passed the position of newly created slot in
> ReplicationSlotCtl->replication_slots array, so
> MyReplicationSlot->data.invalidated is still RS_INVAL_NONE, no?
>

Right, I don't have a simpler solution for this apart from making it
somehow serialize this operation with
InvalidateObsoleteReplicationSlots(). I don't think we need to go that
far to handle the scenario being discussed. It is better to add a
comment for this race and why it won't harm.

-- 
With Regards,
Amit Kapila.




Re: NOT ENFORCED constraint feature

2025-02-27 Thread Amul Sul
On Thu, Feb 27, 2025 at 4:48 PM Álvaro Herrera  wrote:
>
> On 2025-Feb-27, Amul Sul wrote:
>
> > Attached is the rebased patch set against the latest master head,
> > which also includes a *new* refactoring patch (0001). In this patch,
> > I’ve re-added ATExecAlterChildConstr(), which is required for the main
> > feature patch (0008) to handle recursion from different places while
> > altering enforceability.
>
> I think you refer to ATExecAlterConstrEnforceability, which claims
> (falsely) that it is a subroutine to ATExecAlterConstrRecurse; in
> reality it is called from ATExecAlterConstraintInternal or at least
> that's what I see in your 0008.  So I wonder if you haven't confused
> yourself here.  If nothing else, that comment needs fixing.  I didn't
> review these patches.
>

Yeah, that was intentional. I wanted to avoid recursion again by
hitting ATExecAlterChildConstr() at the end of
ATExecAlterConstraintInternal(). Also, I realized the value doesn’t
matter since recurse = false is explicitly set inside the
cmdcon->alterEnforceability condition. I wasn’t fully satisfied with
how we handled the recursion decision (code design), so I’ll give it
more thought. If I don’t find a better approach, I’ll add clearer
comments to explain the reasoning.


Regards,
Amul




Re: Statistics Import and Export

2025-02-27 Thread Corey Huinker
>
> +--- error: relation is wrong type
> +SELECT pg_catalog.pg_restore_relation_stats(
> +'relation', 0::oid,
> +'relpages', 17::integer,
> +'reltuples', 400.0::real,
> +'relallvisible', 4::integer);
>
> Why do you need to specify all the stats (relpages, reltuples, etc)?
> To exercise this you could just do:
> select pg_catalog.pg_restore_relation_stats('relation', 0::oid);
>

In the above case, it's historical inertia in that the pg_set_* call
required all those parameters, as well as a fear that the code - now or in
the future - might evaluate "can anything actually change from this call"
and short circuit out before actually trying to make sense of the reg_class
oid. But we can assuage that fear with just one of the three stat
parameters, and I'll adjust accordingly.


> Since I haven't been following along with this feature development, I
> don't think I can get comfortable enough with all of the changes in
> this test diff to commit them. I can't really say if this is the set
> of tests that is representative and sufficient for this feature.
>

That's fine, I hadn't anticipated that you'd review this patch, let alone
commit it.


> If you agree with me that the failure tests could be shorter, I'm
> happy to commit that, but I don't really feel comfortable assessing
> what the right set of full tests is.


The set of tests is as short as I feel comfortable with. I'll give the
parameter lists one more pass and repost.


Re: Log connection establishment timings

2025-02-27 Thread Bertrand Drouvot
Hi,

On Thu, Feb 27, 2025 at 11:14:56AM -0500, Andres Freund wrote:
> I don't think the timing overhead is a relevant factor here - compared to the
> fork of a new connection or performing authentication the cost of taking a few
> timestamps is negligible. A timestamp costs 10s to 100s of cycles, a fork many
> many millions. Even if you have a really slow timestamp function, it's still
> going to be way way cheaper.

That's a very good point, it has to be put in perspective. The difference in
scale is so significant that the timing collection shouldn't be a concern.
Fair point!

Now I'm wondering: what if connection handling were based on a multi-threaded
model?

I think we would reach the same conclusion, as thread creation overhead is
still substantial (allocating stack space, initializing thread state, and other
kernel-level operations) compared to even a really slow timestamp function.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Licence preamble update

2025-02-27 Thread Noah Misch
On Thu, Feb 27, 2025 at 04:56:05PM +, Dave Page wrote:
> Per some brief discussion on the core list, the attached patch updates the
> licence preamble to more accurately reflect the use of Postgres vs.
> PostgreSQL (see https://www.postgresql.org/about/policies/project-name/ for
> background from many years ago).

> --- a/COPYRIGHT
> +++ b/COPYRIGHT
> @@ -1,5 +1,5 @@
>  PostgreSQL Database Management System
> -(formerly known as Postgres, then as Postgres95)
> +(also known as Postgres, formerly as Postgres95)
>  
>  Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group

I'm not seeing this change as aligned with
https://www.postgresql.org/about/policies/project-name/, which says Postgres
"is an alias or nickname and is not the official name of the project."  The
official product name did change Postgres -> Postgres95 -> PostgreSQL, with
"Postgres" holding the status of a nickname since Postgres95 became the
official name.  Today's text matches that history, and the proposed text
doesn't.  Can you share more from the brief discussion?  Changing a license
file is an eyebrow-raising event, so we should do it only if the win is clear.
There may be an argument for making this change, but I'm missing it currently.




Re: Reduce TupleHashEntryData struct size by half

2025-02-27 Thread David Rowley
On Thu, 13 Feb 2025 at 14:01, Jeff Davis  wrote:
> Attached a new patchset that doesn't change the API for
> ExecCopySlotMinimalTuple(). Instead, I'm using
> ExecFetchSlotMinimalTuple(), which avoids the extra memcpy if the slot
> is TTSOpsMinimalTuple, which is what HashAgg uses. I separated out the
> change to use ExecFetchSlotMinimalTuple() into 0004 to illustrate the
> performance impact.

Some review comments:

* my_log2() takes a "long" parameter type but transitionSpace is a
"Size". These types aren't the same width on all platforms we support.
Maybe pg_nextpower2_size_t() is a better option.

* Should the following have MAXALIGN(tupleSize) on the right side of
the expression?

tupleChunkSize = tupleSize

I understand this was missing before, but bump.c does consume at
least MAXALIGN() in BumpAlloc().

* while (16 * maxBlockSize > work_mem * 1024L)

The 1024L predates the change made in 041e8b95. "1024L" needs to be
replaced with "(Size) 1024"

Maybe you could just replace the while loop and the subsequent "if" check with:

/*
 * Like CreateWorkExprContext(), use smaller sizings for smaller work_mem,
 * to avoid large jumps in memory usage.
 */
maxBlockSize = pg_prevpower2_size_t(work_mem * (Size) 1024 / 16);

/* But no bigger than ALLOCSET_DEFAULT_MAXSIZE */
maxBlockSize = Min(maxBlockSize, ALLOCSET_DEFAULT_MAXSIZE);

/* and no smaller than ALLOCSET_DEFAULT_INITSIZE */
maxBlockSize = Max(maxBlockSize, ALLOCSET_DEFAULT_INITSIZE);

I believe that gives the same result without the looping.

* In hash_create_memory(), can you get rid of the minContextSize and
initBlockSize variables?

* Is it worth an Assert() check that additionalsize > 0?

 * The amount of space available is the additionalsize requested in the call
 * to BuildTupleHashTable(). If additionalsize was specified as zero, no
 * additional space is available and this function should not be called.
 */
static inline void *
TupleHashEntryGetAdditional(TupleHashTable hashtable, TupleHashEntry entry)
{
return (char *) entry->firstTuple - hashtable->additionalsize;
}
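
i.e. something like (illustrative):

static inline void *
TupleHashEntryGetAdditional(TupleHashTable hashtable, TupleHashEntry entry)
{
	Assert(hashtable->additionalsize > 0);

	return (char *) entry->firstTuple - hashtable->additionalsize;
}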

Benchmarks:

I was also experimenting with the performance of this using the
following test case:

create table c (a int not null);
insert into c select a from generate_Series(1,1000) a;
vacuum freeze analyze c;

Query: select a,count(*) from c group by a;

Average TPS over 10x 10 second runs with -M prepared

master: 3653.9853988
v7-0001: 3741.815979
v7-0001+0002: 3737.4313064
v7-0001+0002+0003: 3834.6271706
v7-0001+0002+0003+0004: 3912.1158887

I also tried out changing hash_agg_check_limits() so that it no longer
calls MemoryContextMemAllocated and instead uses ->mem_allocated
directly and with all the other patches got:

v7-0001+0002+0003+0004+extra: 4049.0732381

We probably shouldn't do exactly that, as it'd be better not to access
that internal field from outside the memory context code.  A static
inline function in memutils.h that handles the non-recursive callers
might be nice.
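Something like this is roughly what I have in mind (name invented for
illustration only):

static inline Size
MemoryContextMemAllocatedSelf(MemoryContext context)
{
	Assert(MemoryContextIsValid(context));

	/* like MemoryContextMemAllocated(), but without recursing into children */
	return context->mem_allocated;
}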

I've attached my results of running your test in graph form. I also
see a small regression for these small-scale tests. I'm wondering if
it's worth mocking up some code to see what the performance would be
like without the additional memcpy() that's new to
LookupTupleHashEntry_internal().  How about hacking something up that
adds an additionalsize field to TupleDesc and then set that field to
your additional size and have heap_form_minimal_tuple() allocate that
much extra memory?

David


Remove extra Sort node above a btree-compatible Index Scan

2025-02-27 Thread Alexandra Wang
Hello hackers,

While reviewing Mark Dilger’s patch series "Index AM API cleanup" [1],
I noticed some unexpected plan diffs when running the xtree and treeb
test modules. Specifically, an additional Sort node appears on top of
an Index Only Scan, even when the index key and sort key are the same.
For example:

--- src/test/modules/xtree/expected/interval.out
+++ src/test/modules/xtree/results/interval.out
@@ -375,10 +375,12 @@
 SET enable_seqscan TO false;
 EXPLAIN (COSTS OFF)
 SELECT f1 FROM INTERVAL_TBL_OF r1 ORDER BY f1;
- QUERY PLAN
-
- Index Only Scan using interval_tbl_of_f1_idx on interval_tbl_of r1
-(1 row)
+QUERY PLAN
+--
+ Sort
+   Sort Key: f1
+   ->  Index Only Scan using interval_tbl_of_f1_idx on interval_tbl_of r1
+(3 rows)


I’ve attached a patch that removes this unnecessary Sort node for
B-tree-compatible indexes. This change should:
- Reduce the number of test failures in the xtree module from 43 to 30
- Reduce the size of regression.diffs from 234K to 123K

Since pathkey comparison is a hot path in query planning and exercised
by many test queries, I plan to gather more performance metrics.
However, in a simple test running make installcheck with and without
the patch, I observed no negative impact on the runtime of the
regression test suite (which doesn’t include other btree-like indexes)
and a positive impact on the same regression tests for xtree.

Regression tests (same plans):
w/o patch:
make installcheck  1.36s user 2.21s system 12% cpu 28.018 total
w/ patch:
make installcheck  1.32s user 2.12s system 12% cpu 28.089 total

xtree tests:
w/o patch (inferior plan w/ extra sort node):
make installcheck  1.52s user 2.42s system 10% cpu 36.817 total
w/ patch (good plan):
make installcheck  1.52s user 2.48s system 12% cpu 32.201 total

I’ve marked the patch as no-cfbot, as it applies only on top of the
aforementioned patch series [1].

Thoughts?

[1]
https://www.postgresql.org/message-id/a5dfb7cd-7a89-48ab-a913-e5304eee0854%40eisentraut.org

Best,
Alex


v1-0001-Remove-unnecessary-Sort-node-above-Index-Scan-of-.patch.no-cfbot
Description: Binary data


Re: Remove extra Sort node above a btree-compatible Index Scan

2025-02-27 Thread Peter Geoghegan
On Fri, Feb 28, 2025 at 12:23 AM Alexandra Wang
 wrote:
> While reviewing Mark Dilger’s patch series "Index AM API cleanup" [1],
> I noticed some unexpected plan diffs when running the xtree and treeb
> test modules. Specifically, an additional Sort node appears on top of
> an Index Only Scan, even when the index key and sort key are the same.
> For example:
>
> --- src/test/modules/xtree/expected/interval.out
> +++ src/test/modules/xtree/results/interval.out

What's the xtree module? There's no such test module?

-- 
Peter Geoghegan




Re: Adding support for SSLKEYLOGFILE in the frontend

2025-02-27 Thread Abhishek Chanda
Thanks for the review, Daniel. Please find v5 of this patch attached.
I tried to bump up the minimum OpenBSD version in installation.sgml,
do I need to add this anywhere else? Please let me know if this needs
anything else.

Thanks

On Thu, Feb 20, 2025 at 4:25 PM Daniel Gustafsson  wrote:
>
> > On 20 Feb 2025, at 04:36, Abhishek Chanda  wrote:
> > Please find attached a new version of this patch. I rebased on master,
> > added better error reporting and skipped the permissions check on
> > windows. Please let me know if this needs any changes. I tested this
> > locally using meson running all TAP tests.
>
> Thanks for the new version, a few comments on this one from reading:
>
> +./src/test/ssl/key.txt
> The toplevel .gitignore should not be used for transient test output, there is
> a .gitignore in src/test/ssl for that.  The key.txt file should also be placed
> in PostgreSQL::Test::Utils::tempdir and then cleaned up after the test.
>
>
> +  xreflabel="sslkeylogfile">
> The documentation should state that the file will use the NSS format.
>
>
> +   if (log_file == NULL) {
> +   libpq_append_conn_error(conn, "could not open ssl key log ...
> +   }
> This raise an error for not being able to operate on the file, but it doesn't
> error out from the function and instead keeps going with more operations on 
> the
> file which couldn't be opened.
>
>
> +   if (chmod(conn->sslkeylogfile, 0600) == -1) {
> Rather than opening and try to set permissions separately, why not use open(2)
> and set the umask?  We probably also want to set O_NOFOLLOW.
>
>
> +   if (conn->sslkeylogfile && strlen(conn->sslkeylogfile) > 0)
> +   SSL_CTX_set_keylog_callback(SSL_context, SSL_CTX_keylog_cb);
> This callback does exist in OpenSSL 1.1.1 but it's only available in LibreSSL
> from OpenBSD 7.1 and onwards where we support 7.0 as the earliest version.
> This feature seems like a good enough reason to bump the minimum LibreSSL
> version, and 7.1 is well out support (7.5 goes EOL next), but it would need to
> get done here.
>
>
> +# Skip file mode check on Windows
> +return 1 if $windows_os;
> It should be up to each individual test to determine what to skip, not the
> library code (and it should use SKIP:).  src/test/ssl/t/001_ssltests.pl is an
> example which skips permissions checks on Windows.  Also, I'm not sure we need
> a generic function in the testlibrary for something so basic.
>
>
> +print STDERR sprintf("%s mode must be %04o\n",
> +   $path, $expected_file_mode);
> Test code should not print anything to stdout/stderr, it should use the TAP
> compliant methods like diag() etc for this.
>
>
> +   "connect with server root certand sslkeylogfile=key.txt");
> s/certand/cert and/ ?
>
> --
> Daniel Gustafsson
>


-- 
Thanks and regards
Abhishek Chanda


v5-0001-Add-support-for-dumping-SSL-keylog-to-a-file.patch
Description: Binary data


Re: Log connection establishment timings

2025-02-27 Thread Bertrand Drouvot
Hi,

On Thu, Feb 27, 2025 at 11:08:04AM -0500, Melanie Plageman wrote:
> I was just talking to Andres off-list and he mentioned that the volume
> of log_connections messages added in recent releases can really be a
> problem for users. He said ideally we would emit one message which
> consolidated these (and make sure we did so for failed connections too
> detailing the successfully completed stages).
> 
> However, since that is a bigger project (with more refactoring, etc),
> he suggested that we change log_connections to a GUC_LIST
> (ConfigureNamesString) with options like "none", "received,
> "authenticated", "authorized", "all".
> 
> Then we could add one like "established" for the final message and
> timings my patch set adds. I think the overhead of an additional log
> message being emitted probably outweighs the overhead of taking those
> additional timings.
> 
> String GUCs are a lot more work than enum GUCs, so I was thinking if
> there is a way to do it as an enum.
> 
> I think we want the user to be able to specify a list of all the log
> messages they want included, not just have each one include the
> previous ones. So, then it probably has to be a list right? There is
> no good design that would fit as an enum.

Interesting idea... Yeah, that would sound weird with an enum. I could think
about providing an enum per possible combination, but that would generate
something like 2^N enum values and wouldn't be really user friendly (it would
also double each time we'd want to add a new possible "value", quickly becoming
unmanageable).

So yeah, I can't think of anything better than GUC_LIST.

> > I think that's somehow also around code maintenance (not only LOC), say for 
> > example
> > if we want to add more "child_type" in the check (no need to remember to 
> > update both
> > locations).
> 
> I didn't include checking the child_type in that function since it is
> unrelated to instr_time, so it sadly wouldn't help with that. We could
> macro-ize the child_type check were we to add another child_type.

Yup but my idea was to put all those line:

"
if (Log_connections &&
(child_type == B_BACKEND || child_type == B_WAL_SENDER))
{
instr_time  fork_time = ((BackendStartupData *) startup_data)->fork_time;

conn_timing.fork_duration = INSTR_TIME_GET_DURATION_SINCE(fork_time);
}
"

into a dedicated helper function.
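Something like this, just as a sketch (the helper name and placement are only a
suggestion):

static void
record_fork_duration(BackendType child_type, const void *startup_data)
{
	if (Log_connections &&
		(child_type == B_BACKEND || child_type == B_WAL_SENDER))
	{
		instr_time	fork_time = ((const BackendStartupData *) startup_data)->fork_time;

		conn_timing.fork_duration = INSTR_TIME_GET_DURATION_SINCE(fork_time);
	}
}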

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [BUG]: the walsender does not update its IO statistics until it exits

2025-02-27 Thread Michael Paquier
On Wed, Feb 26, 2025 at 09:48:50AM +, Bertrand Drouvot wrote:
> Yeah I think that makes sense, done that way in the attached.
> 
> Speaking about physical walsender, I moved the test to 001_stream_rep.pl 
> instead
> (would also fail without the fix).

Hmm.  I was doing some more checks with this patch, and on closer look
I am wondering if the location you have chosen for the stats reports
is too aggressive: this requires a LWLock for the WAL sender backend
type taken in exclusive mode, with each step of WalSndLoop() taken
roughly each time a record or a batch of records is sent.  A single
installcheck with a primary/standby setup can lead to up to 50k stats
report calls.

With smaller records, the loop can become hotter, can't it?  Also,
there can be a high number of WAL senders on a single node, and I've
heard of some customers with complex logical decoding deployments with
dozens of logical WAL senders.  Isn't there a risk of having this code
path become a point of contention?  It seems to me that we should
benchmark this change more carefully, perhaps even reduce the
frequency of the report calls.
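For instance, even something as simple as throttling the reports in
WalSndLoop() would already cap the lock traffic (purely illustrative; the
interval and variable names are made up):

#define WALSND_STATS_REPORT_INTERVAL_MS 1000

static TimestampTz last_stats_report = 0;
TimestampTz	now = GetCurrentTimestamp();

if (TimestampDifferenceExceeds(last_stats_report, now,
							   WALSND_STATS_REPORT_INTERVAL_MS))
{
	/* do the stats reports the patch currently issues on every iteration */
	last_stats_report = now;
}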
--
Michael


signature.asc
Description: PGP signature


RE: ReplicationSlotRelease() crashes when the instance is in the single user mode

2025-02-27 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

> I understand that we may not have a clear use case for this to work in
> single-user mode. But how will we define the boundary in similar
> cases? I mean, we should have some rule for such exposed functions,
> and it should be followed uniformly. Now, if one needs a bigger or
> complex change to make the function work in single-user mode, then it
> is easy to take an exception from what is currently being followed in
> the code. However, if the change is as trivial as you proposed in the
> first email, why not go with that and make this function work in
> single-user mode?

Hmm, this opinion is the complete opposite of the other reviewers'. I want to
ask them again how they feel. I also added Tom, who pointed this out in the
initial thread.

Question: how do you feel about restricting the slot-related SQL functions
during single-user mode? Nobody has considered use cases for it; we do not have
a concrete theory or similar cases to follow. And the band-aid needed to make it
work seems very small.

Best regards,
Hayato Kuroda
FUJITSU LIMITED



Re: ReplicationSlotRelease() crashes when the instance is in the single user mode

2025-02-27 Thread Amit Kapila
On Thu, Feb 27, 2025 at 1:29 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> >
> > Which other functions do we see similar restrictions? I checked
> > "sequence manipulation functions" (1), and "Transaction ID and
> > Snapshot Information Functions" (2) but couldn't see similar
> > restrictions.
> >
> > (1) - https://www.postgresql.org/docs/current/functions-sequence.html
> > (2) -
> > https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-INF
> > O-SNAPSHOT
>
> I grepped sources and could not find explicit limitations neither. So...this 
> might
> be overkill.
>

Yes, this is my worry.

> But I still think the restriction is OK for the slot - no need to do
> some efforts for accepting single-user mode, just close the cover.
>

I understand that we may not have a clear use case for this to work in
single-user mode. But how will we define the boundary in similar
cases? I mean, we should have some rule for such exposed functions,
and it should be followed uniformly. Now, if one needs a bigger or
complex change to make the function work in single-user mode, then it
is easy to take an exception from what is currently being followed in
the code. However, if the change is as trivial as you proposed in the
first email, why not go with that and make this function work in
single-user mode?

-- 
With Regards,
Amit Kapila.




Re: Remove extra Sort node above a btree-compatible Index Scan

2025-02-27 Thread Tom Lane
Alexandra Wang  writes:
> I’ve attached a patch that removes this unnecessary Sort node for
> B-tree-compatible indexes.

This does not look right at all.  You can't just ignore the opfamily
etc. while deciding whether two pathkeys represent the same sort
ordering, as you did in create_mergejoin_plan().  I don't like
pathkeys_have_same_sortop() either.  The pathkey data structures
were designed to let pointer comparison be sufficient for deciding
whether pathkeys are equivalent: see the "canonical pathkey" stuff
in pathkeys.c.  I think that this patch may be band-aiding over some
breakage of that concept, but you've not provided enough context to
figure out what the underlying problem is.

regards, tom lane




RE: SIMD optimization for list_sort

2025-02-27 Thread R, Rakshit
Hi All,

Thank you so much for the feedback. 

> I don't think "another extension might use it someday" makes a very strong 
> case, 
> particularly for something that requires a new dependency.

The x86-simdsort library is an optional dependency in Postgres. Also, the new
list sort implementation which uses the x86-simdsort library does not impact
any of the existing workflows in Postgres. We have at least one use case where
pgvector gets a gain of 7-10% in HNSW index build time using the SIMD sort
implementation. Hence we feel that this patch could be upstreamed further.
Open to further discussion on the same.


> Note that our current implemention is highly optimized for low-cardinality 
> inputs.
> This is needed for aggregate queries. I found this write-up of a 
> couple scalar and vectorized sorts, and they show this library doing 
> very poorly on very-low cardinality inputs. I would look into that 
> before trying to get something in shape to share.
> 
> https://github.com/Voultapher/sort-research-rs/blob/main/writeup/intel_avx512/text.md

We ran our extension to stress list sort with low-cardinality inputs. For example,
for an array of size 100k having repeated values in the range 1-10, we still see
a gain of around 20% in throughput.
We will collect more data for low cardinality inputs and with AVX2 too.

Thanks and regards
Rakshit Rajeev


Re: Get rid of WALBufMappingLock

2025-02-27 Thread Michael Paquier
On Wed, Feb 26, 2025 at 01:48:47PM +0200, Alexander Korotkov wrote:
> Thank you for offering the help.  Updated version of patch is attached
> (I've added one memory barrier there just in case).  I would
> appreciate if you could run it on batta, hachi or similar hardware.

Doing a revert of the revert done in 3fb58625d18f proves that
reproducing the error is not really difficult.  I've done a make
installcheck-world USE_MODULE_DB=1 -j N without assertions, and the
point that saw a failure quickly is one of the tests of pgbench:
PANIC:  could not find WAL buffer for 0/19D366

This one happened for the test "concurrent OID generation" and CREATE
TYPE.  Of course, as it is a race condition, it is random, but it's
taking me only a couple of minutes to see the original issue on my
buildfarm host.  With assertion failures enabled, same story, and same
failure from the pgbench TAP test.

Saying that, I have also done similar tests with your v12 for a couple
of hours and this looks stable under installcheck-world.  I can see
that you've reworked quite a bit the surroundings of InitializedFrom
in this one.  If you apply that once again at some point, the
buildfarm will be judge in the long-term, but I am rather confident by
saying that the situation looks better here, at least.

One thing I would consider doing if you want to gain confidence is
tests like the one I saw causing problems with pgbench, with DDL
patterns stressing specific paths like this CREATE TYPE case.
--
Michael


signature.asc
Description: PGP signature


Re: [PROPOSAL] : Disallow use of empty column name in (column_name '') in ALTER or CREATE of foreign table.

2025-02-27 Thread Robert Haas
On Thu, Dec 19, 2024 at 8:29 AM Robert Haas  wrote:
> Cool. I don't want to commit this right before my vacation but I'll
> look into it in the new year if nobody beats me to it.

Nobody has beaten me to it, but I thought of one potential issue.
Suppose that somebody has a foreign table that exists today with one
of these properties set to the empty string. Such a foreign table is
not usable for queries, because the queries won't make it past scan.l
on the remote side, but it's allowed to exist. With this patch, it's
not allowed to exist any more. That would be fine if no such objects
exist in the wild, but if they do, people will get a pg_dump or
pg_upgrade failure when they try to upgrade to a release that includes
this change. Does that mean that we shouldn't commit this patch?

I'm inclined to think that it's probably still OK, because (1) few
users are likely to have such objects, (2) the workaround is easy,
just drop the object or fix it to do something useful, just as users
already need to do in plenty of other, similar cases and (3) unclear
error messages are not great and future users might benefit from this
error message being better. However, one could take the opposite
position and say that (C1) it is unlikely that anyone will try to do
this in the first place and (C2) the error message that is currently
generated when you try to do this isn't really all that unclear, and
therefore it's not worth creating even a minor dump-and-reload hazard;
and that position also seems pretty defensible to me.

Other opinions?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

2025-02-27 Thread Robert Haas
On Mon, Feb 17, 2025 at 5:44 PM Peter Geoghegan  wrote:
> I need more feedback about this. I don't understand your perspective here.
>
> If I commit the skip scan patch, but don't have something like this
> instrumentation in place, it seems quite likely that users will
> complain about how opaque its behavior is. While having this
> instrumentation isn't quite a blocker to committing the skip scan
> patch, it's not too far off, either. I want to be pragmatic. Any
> approach that's deemed acceptable is fine by me, provided it
> implements approximately the same behavior as the patch that I wrote
> implements.
>
> Where is this state that tracks the number of index searches going to
> live, if not in IndexScanDesc? I get why you don't particularly care
> for that. But I don't understand what the alternative you favor looks
> like.

+1 for having some instrumentation. I do not agree with Tom that these
are numbers that only Peter Geoghegan and 2-3 other people will ever
understand. I grant that it's not going to make sense to everyone, but
the number of people to which it will make sense I would guess is
probably in the hundreds or the thousands rather than the single
digits. Good documentation could help.

So, where should we store that information?

The thing that's odd about using IndexScanDesc is that it doesn't
contain any other instrumentation currently, or at least not that I
can see. Everything else that EXPLAIN prints in terms of index scan is
printed by show_instrumentation_count() from planstate->instrument. So
it seems reasonable to think maybe this should be part of
planstate->instrument, too, but there seem to be at least two problems
with that idea. First, that struct has just four counters (ntuples,
ntuples2, nfiltered, nfiltered2). For an index-only scan, all four of
them are in use -- a plain index scan does not use ntuples2 -- but I
think this optimization can apply to both index and index-only scans,
so we just don't have room. Second, those existing counters are used
for things that we can count in the executor, but the executor won't
know how many index searches occur down inside the AM. So I see the
following possibilities:

1. Add a new field to struct Instrumentation. Make
index_getnext_slot() and possibly other similar functions return a
count of index searches via a new parameter, and use that to bump the
new struct Instrumentation counter. It's a little ugly to have to add
a special-purpose parameter for this, but it doesn't look like there
are tons and tons of calls so maybe it's OK.

2. Add a new field to BufferUsage. Then the AM can bump this field and
explain.c will know about it the same way it knows about other changes
to pgBufferUsage. However, this is not really about buffer usage and
the new field seems utterly unlike the other things that are in that
structure, so this seems really bad to me.

3. Add a new field to IndexScanDesc, as you originally proposed. This
seems similar to #1: it's still shoveling instrumentation data around
in a way that we don't currently, but instead of shoveling it through
a new parameter, we shovel it through a new structure member. Either
way, the new thing (parameter or structure member) doesn't really look
like it belongs with what's already there, so it seems like
conservation of ugliness.

4. Add a new field to the btree-specific structure referenced by the
IndexScanDesc's opaque pointer. I think this is what Matthias was
proposing. It doesn't seem particularly hard to implement, and seems
similar to #1 and #3.

It is not clear to me that any of #1, #3, and #4 are radically better
than any of the other ones, with the following exception: it would be
a poor idea to choose #4 over #3 if this field will ultimately be used
for a bunch of different AMs, and a poor idea to choose #3 over #4 if
it's always going to be interesting only for btree. I'll defer to you
on which of those things is the case, but with the request that you
think about what is practically likely to happen and not advocate too
vigorously based on an argument that makes prominent use of the phrase
"in theory". To be honest, I don't really like any of these options
very much: they all seem a tad inelegant. But sometimes that is a
necessary evil when inventing something new. I believe if I were
implementing this myself I would probably try #1 first; if that ended
up seeming too ugly, then I would fall back to #3 or #4.
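
As a toy illustration of the shape of #1 -- every name below is a placeholder,
this is not actual PostgreSQL code -- the AM-facing call would report its
search count through an out-parameter and the caller would fold that into the
executor-side instrumentation:

#include <stdio.h>

typedef struct ToyInstrumentation
{
    double      nsearches;      /* the hypothetical new counter */
} ToyInstrumentation;

/* stand-in for index_getnext_slot(): reports searches via *nsearches */
static int
toy_getnext(int *nsearches)
{
    *nsearches = 5;             /* e.g. one descent per skipped prefix value */
    return 0;                   /* no more tuples, for brevity */
}

int
main(void)
{
    ToyInstrumentation instr = {0};
    int         n;

    (void) toy_getnext(&n);
    instr.nsearches += n;       /* the executor, not the AM, bumps the counter */
    printf("Index Searches: %.0f\n", instr.nsearches);
    return 0;
}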

Does that help?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




RE: Improve CRC32C performance on SSE4.2

2025-02-27 Thread Devulapalli, Raghuveer
Hi John, 

Going back to your earlier comment: 

> > > I'm not a fan of exposing these architecture-specific details to
> > > places that consult the capabilities. That requires things like
+#define PGCPUCAP_CRC32C pg_cpu_have(PG_CPU_FEATURE_SSE42)

> > Isn't that one thing currently pg_cpu_have(FEATURE)?
> 
> No, I mean the one thing would be "do we have hardware CRC?", and each
> architecture would set (and check if needed) the same slot.
> 
> I'm not 100% sure this is the better way, and there may be a disadvantage I
> haven't thought of, but this makes the most sense to me. I'm willing to hear
> reasons to do it differently, but I'm thinking those reasons need to be 
> weighed
> against the loss of abstraction.

IMO, CPU SIMD (SSE4.2, AVX, etc.) features are a module of their own, separate 
from the capabilities/features supported in postgres (CRC32C, bitcount, etc.). 
Exposing architecture-specific details to the features that need to check 
against them is a way to make code more modular and reusable. It is reasonable 
to expect developers writing SIMD specific functions to naturally understand 
the runtime architecture requirements for that function which is easily 
accessible by just checking the value of pg_cpu_have(PG_CPU_FEATURE_..). If 
another feature in postgres requires SSE4.2, then the CPU initialization module 
doesn't need to be modified at all. They just have to worry about their feature 
and its CPU requirements. 
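
As a self-contained sketch of that pattern (pg_cpu_have() and
PG_CPU_FEATURE_SSE42 are the names from the patch; everything else below is
illustrative and untested, not the actual initialization code):

#include <stdbool.h>
#include <stdio.h>

typedef enum
{
    PG_CPU_FEATURE_SSE42,
    PG_CPU_FEATURE_PCLMUL,
    PG_CPU_FEATURE_NUM
} pg_cpu_feature;

static bool pg_cpucap[PG_CPU_FEATURE_NUM];

/* filled in once at startup by the architecture-specific init code */
static void
pg_cpucap_initialize(void)
{
    pg_cpucap[PG_CPU_FEATURE_SSE42] = true;    /* pretend CPUID said yes */
}

static bool
pg_cpu_have(pg_cpu_feature feature)
{
    return pg_cpucap[feature];
}

/* the CRC32C module checks the raw CPU capability itself; the CPU-init
 * module never needs to know that CRC32C exists */
int
main(void)
{
    pg_cpucap_initialize();
    puts(pg_cpu_have(PG_CPU_FEATURE_SSE42) ?
         "use the SSE4.2 CRC32C implementation" :
         "use the scalar CRC32C implementation");
    return 0;
}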

> Just so we're on the same page, the current separate files do initialization, 
> not
> dispatch. I'm seeing conflicting design goals
> here:
> 
> 1. Have multiple similar dispatch functions with the only difference (for now)
> being certain names (not in the patch, just discussed).
> 2. Put initialization routines in the same file, even though they have 
> literally
> nothing in common.

Sorry for not being clear. I will move the initialization routines to separate 
files. Just haven’t gotten to it yet with v10. 

> 
> > This also makes it easier to add more implementations in the future without
> having to make the dispatch function work for both ARM and x86.
> 
> I don't quite understand, since with 0001 it already works (at least in 
> theory) for 3
> architectures with hardware CRC, plus those without, and I don't see it 
> changing
> with more architectures.

The reason it's working across all architectures as of now is that there is 
only one runtime check for CRC32C feature across x86, ARM and LongArch. (PCLMUL 
dispatch happens inside the SSE42). If and when we add the AVX-512 
implementation, won't the dispatch logic look different for x86 and ARM? 
Along with that, the header file pg_crc32c.h will also look a lot messier with 
a whole bunch of nested #ifdefs.  IMO, the header file should be devoid of any 
architecture dispatch logic and simply contain declarations of various 
implementations (see 
https://github.com/r-devulap/postgressql/blob/cf80f7375f24d2fb5cd29e95dc73d4988fc6d066/src/include/port/pg_crc32c.h
 for example). The dispatch logic should be handled separately in a C file. 

> 
> +/* Access to pg_cpucap for modules that need runtime CPUID information
> +*/ bool pg_cpu_have(int feature_id) {
> +return pg_cpucap[feature_id];
>  }
> 
> Putting this in a separate translation unit may have to do with the decision to turn 
> v8's
> #defines into an enum? In any case, this means doing a function call to decide
> which function to call. That makes no sense.

The reason it is a separate translation unit is that I intended pg_cpucap 
to be a read only variable outside of pg_cpucap.c. If the function overhead is 
not preferred, then I can get rid of it.

> 
> The goal of v9/10 was to centralize the initialization from cpuid etc, but it 
> also did
> a major re-architecture of the runtime-checks at the same time. That made review
> more difficult.

I assumed we wanted to move the runtime checks to a central place and that 
needed this re-architecture. 

> 
> +#if defined(__x86_64__) || defined ...
> +pg_cpucap_x86();
> +#elif defined(__aarch64__) || defined ...
> +pg_cpucap_arm();
> +#endif
> 
> I view this another conflict in design goals: After putting everything in the 
> same
> file, the routines still have different names and have to be guarded 
> accordingly. I
> don't really see the advantage, especially since these guard symbols don't 
> even
> match the ones that guard the actual initialization steps. I tried to imply 
> that in
> my last review, but maybe I should have been more explicit.
> 
> I think the least painful step is to take the x86 initialization from v10, 
> which is
> looking great, but
> - keep separate initialization files
> - don't whack around the runtime representation, at least not in the same 
> patch

Sure. I can fix that in v11. 

Raghuveer






Re: Update docs for UUID data type

2025-02-27 Thread Masahiko Sawada
On Thu, Feb 27, 2025 at 1:26 PM Andy Alsup  wrote:
>
> I've submitted it for the up-coming commitfest. The link is: 
> https://commitfest.postgresql.org/patch/5604/
> Thanks for all your help in reviewing these changes.

Thank you for the patch!

Regarding the 0001 patch, I think we can put uuidv4() and
get_random_uuid() in the same row since they are effectively identical
functions. For example, we have precedent such as char_length() and
character_length().

The 0002 patch looks good to me.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Small memory fixes for pg_createsubcriber

2025-02-27 Thread Michael Paquier
On Thu, Feb 27, 2025 at 10:23:31AM -0300, Ranier Vilela wrote:
> Yeah, I also think it would look good like this.

It's the least confusing option, indeed.  I've reduced a bit the diffs
and done that down to v16 for the pg_upgrade part where this exists.

Double-checking the tree, it does not seem that we have similar holes
now..  I hope that I'm not wrong.
--
Michael




Re: new commitfest transition guidance

2025-02-27 Thread Robert Haas
On Thu, Feb 27, 2025 at 11:14 AM Tom Lane  wrote:
> Robert Haas  writes:
> > But let's not make the mistake of saying "we're not going to move
> > things automatically because we want to find out if the authors are
> > still interested" and then getting really concerned when some stuff
> > doesn't get moved. That's missing the whole point.
>
> +1.  Having a significant fraction of patches drop off the radar
> was the desired outcome, wasn't it?

Well, not if the authors really are still actively caring about them.
But there's been a funny evolution of our thinking about CommitFests
over the years. When I first started managing CommitFests, I evicted
patches that the author didn't update -- following a review -- within
four days. Not four business days - four calendar days. After all, it
was a CommitFest -- it was supposed to be for patches that were ready
to be committed. That policy was, I think, viewed by many as too
draconian, and it probably was. But now we've gotten to a point where
the one month gap between CommitFest N and CommitFest N+1 is thought
to be so short that it might be unreasonable to expect a patch author
to move their patch forward sometime during that time. And I think
that's clearly going too far the other way.

Perhaps even way, way too far the other way. I think it's completely
fine if somebody has a patch that they update occasionally as they
have time, and it putters along for a few years and it finally either gets
committed or it doesn't. I one hundred percent support part-time
developers having the opportunity to participate as and when their
schedule permits and I think that is an important part of being a good
and welcoming open source community. But there are also people who are
working very hard and very actively to progress patches and staying on
top of the CF status and any new review emails every day, and that is
ALSO a great way to do development, and it's reasonable to treat those
cases differently. I'm not sure exactly how we can best do that, but
it makes my head hurt every time I find something in the CF where the
patch author was like "well, I'll get back to this at some point" and
then three CFs later it's still sitting there in "Needs Review" or
something.

Probably not moving things forward automatically is only one part of
that problem -- we could very much use some better tooling for judging
how active certain patches are and, if not so active, whether that's
more a lack of reviewers or more author inactivity. But we're not
doing ourselves any favors by working super-hard to keep everything
that anybody might potentially care about in the same bucket as things
that are actually active.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Confine vacuum skip logic to lazy_scan_skip

2025-02-27 Thread Ranier Vilela
Hi.

Em ter., 18 de fev. de 2025 às 11:31, Melanie Plageman <
melanieplage...@gmail.com> escreveu:

> On Sun, Feb 16, 2025 at 1:12 PM Tom Lane  wrote:
> >
> > Thomas Munro  writes:
> > > Thanks!  It's green again.
> >
> > The security team's Coverity instance complained about this patch:
> >
> > *** CID 1642971:  Null pointer dereferences  (FORWARD_NULL)
> >
> /srv/coverity/git/pgsql-git/postgresql/src/backend/access/heap/vacuumlazy.c:
> 1295 in lazy_scan_heap()
> > 1289buf = read_stream_next_buffer(stream,
> &per_buffer_data);
> > 1290
> > 1291/* The relation is exhausted. */
> > 1292if (!BufferIsValid(buf))
> > 1293break;
> > 1294
> > >>> CID 1642971:  Null pointer dereferences  (FORWARD_NULL)
> > >>> Dereferencing null pointer "per_buffer_data".
> > 1295blk_info = *((uint8 *) per_buffer_data);
> > 1296CheckBufferIsPinnedOnce(buf);
> > 1297page = BufferGetPage(buf);
> > 1298blkno = BufferGetBlockNumber(buf);
> > 1299
> > 1300vacrel->scanned_pages++;
> >
> > Basically, Coverity doesn't understand that a successful call to
> > read_stream_next_buffer must set per_buffer_data here.  I don't
> > think there's much chance of teaching it that, so we'll just
> > have to dismiss this item as "intentional, not a bug".
>
> Is this easy to do? Like is there a list of things from coverity to ignore?
>
> > I do have a suggestion: I think the "per_buffer_data" variable
> > should be declared inside the "while (true)" loop not outside.
> > That way there is no chance of a value being carried across
> > iterations, so that if for some reason read_stream_next_buffer
> > failed to do what we expect and did not set per_buffer_data,
> > we'd be certain to get a null-pointer core dump rather than
> > accessing data from a previous buffer.
>
> Done and pushed. Thanks!
>
Per Coverity.

CID 1592454: (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
8. var_deref_op: Dereferencing null pointer per_buffer_data.

I think that function *read_stream_next_buffer* can return
an invalid per_buffer_data pointer, with a valid buffer.

Sorry if I'm wrong, but the function is very suspicious.

Attached a patch, which tries to fix.

best regards,
Ranier Vilela


fix-possible-invalid-pointer-read_stream.patch
Description: Binary data


Re: Confine vacuum skip logic to lazy_scan_skip

2025-02-27 Thread Andres Freund
Hi,

On 2025-02-27 14:32:28 -0300, Ranier Vilela wrote:
> Hi.
> 
> Em ter., 18 de fev. de 2025 às 11:31, Melanie Plageman <
> melanieplage...@gmail.com> escreveu:
> 
> > On Sun, Feb 16, 2025 at 1:12 PM Tom Lane  wrote:
> > >
> > > Thomas Munro  writes:
> > > > Thanks!  It's green again.
> > >
> > > The security team's Coverity instance complained about this patch:
> > >
> > > *** CID 1642971:  Null pointer dereferences  (FORWARD_NULL)
> > >
> > /srv/coverity/git/pgsql-git/postgresql/src/backend/access/heap/vacuumlazy.c:
> > 1295 in lazy_scan_heap()
> > > 1289buf = read_stream_next_buffer(stream,
> > &per_buffer_data);
> > > 1290
> > > 1291/* The relation is exhausted. */
> > > 1292if (!BufferIsValid(buf))
> > > 1293break;
> > > 1294
> > > >>> CID 1642971:  Null pointer dereferences  (FORWARD_NULL)
> > > >>> Dereferencing null pointer "per_buffer_data".
> > > 1295blk_info = *((uint8 *) per_buffer_data);
> > > 1296CheckBufferIsPinnedOnce(buf);
> > > 1297page = BufferGetPage(buf);
> > > 1298blkno = BufferGetBlockNumber(buf);
> > > 1299
> > > 1300vacrel->scanned_pages++;
> > >
> > > Basically, Coverity doesn't understand that a successful call to
> > > read_stream_next_buffer must set per_buffer_data here.  I don't
> > > think there's much chance of teaching it that, so we'll just
> > > have to dismiss this item as "intentional, not a bug".
> >
> > Is this easy to do? Like is there a list of things from coverity to ignore?
> >
> > > I do have a suggestion: I think the "per_buffer_data" variable
> > > should be declared inside the "while (true)" loop not outside.
> > > That way there is no chance of a value being carried across
> > > iterations, so that if for some reason read_stream_next_buffer
> > > failed to do what we expect and did not set per_buffer_data,
> > > we'd be certain to get a null-pointer core dump rather than
> > > accessing data from a previous buffer.
> >
> > Done and pushed. Thanks!
> >
> Per Coverity.
> 
> CID 1592454: (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
> 8. var_deref_op: Dereferencing null pointer per_buffer_data.

That's exactly what the messages you quoted are discussing, no?


> Sorry if I'm wrong, but the function is very suspicious.

How so?



> diff --git a/src/backend/storage/aio/read_stream.c 
> b/src/backend/storage/aio/read_stream.c
> index 04bdb5e6d4..18e9b4f3c4 100644
> --- a/src/backend/storage/aio/read_stream.c
> +++ b/src/backend/storage/aio/read_stream.c
> @@ -666,6 +666,8 @@ read_stream_next_buffer(ReadStream *stream, void 
> **per_buffer_data)
>   
> READ_BUFFERS_ISSUE_ADVICE : 0)))
>   {
>   /* Fast return. */
> + if (per_buffer_data)
> + *per_buffer_data = 
> get_per_buffer_data(stream, oldest_buffer_index);
>   return buffer;
>   }

A few lines above:
Assert(stream->per_buffer_data_size == 0);

The fast path isn't used when per buffer data is used.  Adding a check for
per_buffer_data and assigning something to it is nonsensical.

Greetings,

Andres Freund




Re: Log connection establishment timings

2025-02-27 Thread Andres Freund
Hi,

On 2025-02-27 11:08:04 -0500, Melanie Plageman wrote:
> I was just talking to Andres off-list and he mentioned that the volume
> of log_connections messages added in recent releases can really be a
> problem for users.

For some added color: I've seen plenty systems where almost all the log volume
is log_connection messages, which they *have* to enable for various regulatory
reasons.  It'd still be a lot if we just emitted one message for each
connection, but logging three (and possibly four with $thread), for each
connection makes it substantially worse.


> He said ideally we would emit one message which consolidated these (and make
> sure we did so for failed connections too detailing the successfully
> completed stages).

A combined message would also not *quite* replace all use-cases, e.g. if you
want to debug arriving connections or auth problems you do want the additional
messages.  But yea, for normal operation, I do think most folks want just one
message.


> However, since that is a bigger project (with more refactoring, etc),
> he suggested that we change log_connections to a GUC_LIST
> (ConfigureNamesString) with options like "none", "received,
> "authenticated", "authorized", "all".

Yep.


> Then we could add one like "established" for the final message and
> timings my patch set adds. I think the overhead of an additional log
> message being emitted probably outweighs the overhead of taking those
> additional timings.

To bikeshed a bit: "established" could be the TCP connection establishment
just as well. I'd go for "completed" or "timings".


> String GUCs are a lot more work than enum GUCs, so I was thinking if
> there is a way to do it as an enum.
> 
> I think we want the user to be able to specify a list of all the log
> messages they want included, not just have each one include the
> previous ones. So, then it probably has to be a list right? There is
> no good design that would fit as an enum.

I don't see a way to comfortably shove this into an enum either.
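
To sketch what that might look like for users (hypothetical syntax -- the GUC
is a boolean today, and "completed" is just the name suggested above):

# postgresql.conf, with log_connections turned into a list
log_connections = 'received, authenticated, authorized'

# or, once a consolidated message with timings exists, just that one:
log_connections = 'completed'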

Greetings,

Andres Freund




Re: Confine vacuum skip logic to lazy_scan_skip

2025-02-27 Thread Andres Freund
Hi,

On 2025-02-27 12:44:24 -0500, Andres Freund wrote:
> > CID 1592454: (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
> > 8. var_deref_op: Dereferencing null pointer per_buffer_data.
> 
> That's exactly what the messages you quoted are discussing, no?

Ah, no, it isn't. But I still think the coverity alert and the patch don't
make sense, as per the below:

> > diff --git a/src/backend/storage/aio/read_stream.c 
> > b/src/backend/storage/aio/read_stream.c
> > index 04bdb5e6d4..18e9b4f3c4 100644
> > --- a/src/backend/storage/aio/read_stream.c
> > +++ b/src/backend/storage/aio/read_stream.c
> > @@ -666,6 +666,8 @@ read_stream_next_buffer(ReadStream *stream, void 
> > **per_buffer_data)
> > 
> > READ_BUFFERS_ISSUE_ADVICE : 0)))
> > {
> > /* Fast return. */
> > +   if (per_buffer_data)
> > +   *per_buffer_data = 
> > get_per_buffer_data(stream, oldest_buffer_index);
> > return buffer;
> > }
> 
> A few lines above:
>   Assert(stream->per_buffer_data_size == 0);
> 
> The fast path isn't used when per buffer data is used.  Adding a check for
> per_buffer_data and assigning something to it is nonsensical.

Greetings,

Andres Freund




Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?

2025-02-27 Thread Peter Geoghegan
On Sat, Feb 8, 2025 at 8:47 AM Michail Nikolaev
 wrote:
> Just some commit messages + few cleanups.

I'm worried about this:

+These longer pin lifetimes can cause buffer exhaustion with messages like "no
+unpinned buffers available" when the index has many pages that have similar
+ordering; but future work can figure out how to best work that out.

I think that we should have some kind of upper bound on the number of
pins that can be acquired at any one time, in order to completely
avoid these problems. Solving that problem will probably require GiST
expertise that I don't have right now.

-- 
Peter Geoghegan




Re: Statistics Import and Export

2025-02-27 Thread Greg Sabino Mullane
I know I'm coming late to this, but I would like us to rethink having
statistics dumped by default. I was caught by this today, as I was doing
two dumps in a row, but the output changed between runs solely because the
stats got updated. It got me thinking about all the use cases of pg_dump
I've seen over the years. I think this has the potential to cause a lot of
problems for things like automated scripts. It certainly violates the
principle of least astonishment to have dumps change when no user
interaction has happened.

> Alternatively, we could put stats into SECTION_POST_DATA,


No, that would make the above-mentioned problem much worse.

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support


Re: Disabling vacuum truncate for autovacuum

2025-02-27 Thread Gurjeet Singh
On Mon, Jan 27, 2025 at 1:55 AM Laurenz Albe  wrote:
>
> On Thu, 2025-01-23 at 22:33 -0800, Gurjeet Singh wrote:
> > > > I am also wondering if having an autovacuum setting to control it would 
> > > > be
> > > > a good idea for a feature.
> > >
> > > I'm all for that.
> >
> > Please see attached an initial patch to disable truncation behaviour in
> > autovacuum. This patch retains the default behavior of autovacuum truncating
> > relations. The user is allowed to change the behaviour and disable relation
> > truncations system-wide by setting autovacuum_disable_vacuum_truncate = 
> > true.
> > Better parameter names welcome :-)
>
> I hope it is possible to override the global setting with the 
> "vacuum_truncate"
> option on an individual table.

The current patch behaviour is that if autovacuum_vacuum_truncate is false, then
autovacuum will _not_ truncate any relations. If the parameter's value is true
(the default), then the relation's reloption will be honored.

A table-owner, or the database-owner, may enable truncation of a table, as they
may be trying to be nice and return the unused disk space back to the
OS/filesystem. But if the sysadmin/DBA (who is ultimately responsible for the
health of the entire db instance, as well as of any replicas of the db
instance),
wants to disable truncation across all databases to ensure that the replication
does not get stuck, then IMHO Postgres should empower the sysadmin to make
that decision, and override the relation-level setting enabled by the table-
owner or the database-owner.
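
Concretely, the interaction I have in mind (stats_heavy_tbl is a made-up table
name, and the GUC is the one proposed in this thread, not an existing setting):

-- table owner opts in to truncation for one table (existing reloption)
ALTER TABLE stats_heavy_tbl SET (vacuum_truncate = true);

-- DBA disables truncation instance-wide; under the patch behaviour
-- described above, this overrides the per-table reloption
ALTER SYSTEM SET autovacuum_vacuum_truncate = off;
SELECT pg_reload_conf();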

> > One additional improvement I can think of is to emit a WARNING or NOTICE 
> > message
> > that truncate operation is being skipped, perhaps only if the truncation
> > would've freed up space over a certain threshold.
>
> Interesting idea, but I think it is independent from this patch.

Agreed. I'll consider writing a separate patch for this.

> > Perhaps there's value in letting this parameter be specified at database 
> > level,
> > but I'm not able to think of a reason why someone would want to disable this
> > behaviour on just one database. So leaving the parameter context to be the 
> > same
> > as most other autovacuum parameters: SIGHUP.
>
> I can imagine setting that on only a certain database. Different databases
> typically have different applications, which have different needs.

Makes sense. I don't think anything special needs to be done in the patch to
address this.

> Eventually, the patch should have documentation and regression tests.

Documentation added. Pointers on if, where, and what kind of tests to add will
be appreciated.

On Mon, Jan 27, 2025 at 12:38 PM Robert Haas  wrote:
>
> On Mon, Jan 27, 2025 at 4:55 AM Laurenz Albe  wrote:
> > My suggestion for the parameter name is "autovacuum_disable_truncate".
>
> Then it would have a different name than the reloption, and the
> opposite sense. In most cases, we've been able to keep those matching
> -- autovacuum vs. autovacuum_enabled being, I believe, the only
> current mismatch.

If we want to maintain the convention of autovacuum parameter names being of
the format "autovacuum_", then I believe the name
autovacuum_vacuum_truncate (boolean) would be better, as compared to my original
proposal (autovacuum_disable_vacuum_truncate), or Laurenz's proposal above. The
default value should be true, to match the current autovacuum behaviour.

> Also, how sure are we that turning this off globally is a solid idea?
> Off-hand, it doesn't sound that bad: there are probably situations in
> which autovacuum never truncates anything anyway just because the tail
> blocks of the relations always contain at least 1 tuple. But we should
> be careful not to add a setting that is far more likely to get people
> into trouble than to get them out of it. It would be good to hear what
> other people think about the risk vs. reward tradeoff in this case.

Taking silence from others to be a sign of no opposition, I'm moving forward
with the patch.

On Tue, Feb 18, 2025 at 11:56 AM Nathan Bossart
 wrote:
>
> On Mon, Jan 27, 2025 at 03:38:39 PM -0500, Robert Haas wrote:
> > Also, how sure are we that turning this off globally is a solid idea?

> In any case, it's
> already possible to achieve $SUBJECT with a trivial script that sets
> storage parameters on all tables, so IMHO it would be silly to withhold a
> global setting that does the same thing just on principle.

+1

For documentation of this GUC, I borrowed heavily from the relevant sections of
CREATE TABLE and VACUUM docs.

There are 3 ways I wrote one of the sentences in the docs. I picked the last
one, as it is concise and clearer than the others. If others feel a different
choice of words would be better, I'm all ears.

 If false, autovacuum will not perform the
 truncation, even if the vacuum_truncate option has
 been set to true for the table being processed.

 If false, autovacuum will not perform the
 truncation, and it ignores the v

Re: Statistics Import and Export

2025-02-27 Thread Corey Huinker
On Thu, Feb 27, 2025 at 10:01 PM Corey Huinker 
wrote:

> +--- error: relation is wrong type
>> +SELECT pg_catalog.pg_restore_relation_stats(
>> +'relation', 0::oid,
>> +'relpages', 17::integer,
>> +'reltuples', 400.0::real,
>> +'relallvisible', 4::integer);
>>
>> Why do you need to specify all the stats (relpages, reltuples, etc)?
>> To exercise this you could just do:
>> select pg_catalog.pg_restore_relation_stats('relation', 0::oid);
>>
>
> In the above case, it's historical inertia in that the pg_set_* call
> required all those parameters, as well as a fear that the code - now or in
> the future - might evaluate "can anything actually change from this call"
> and short-circuit out before actually trying to make sense of the regclass
> oid. But we can assuage that fear with just one of the three stat
> parameters, and I'll adjust accordingly.
>

* reduced relstats parameters specified to the minimum needed to verify the
error and avoid a theoretical future logic short-circuit described above.
* version parameter usage reduced to absolute minimum - verifying that it
is accepted and ignored, though Melanie's patch may introduce a need to
bring it back in a place or two.

84 lines deleted. Not great, not terrible.

I suppose if we really trusted the TAP test databases to have "one of
everything" in terms of tables with all the datatypes, and sufficient rows
to generate interesting stats, plus some indexes of each, then we could get
rid of those two, but I feel very strongly that it would be a minor savings
at a major cost to clarity.
From d0dfca62eb8ce27fb4bfce77bd2ee835738899a8 Mon Sep 17 00:00:00 2001
From: Corey Huinker 
Date: Wed, 26 Feb 2025 21:02:44 -0500
Subject: [PATCH v2] Organize and deduplicate statistics import tests.

Many changes, refactorings, and rebasings have taken their toll on the
statistics import tests. Now that things appear more stable and the
pg_set_* functions are gone in favor of using pg_restore_* in all cases,
it's safe to remove duplicates, combine tests where possible, and make
the test descriptions a bit more descriptive and uniform.

Additionally, parameters that were not strictly needed to demonstrate
the purpose(s) of a test were removed to reduce clutter.
---
 src/test/regress/expected/stats_import.out | 680 -
 src/test/regress/sql/stats_import.sql  | 554 +++--
 2 files changed, 466 insertions(+), 768 deletions(-)

diff --git a/src/test/regress/expected/stats_import.out b/src/test/regress/expected/stats_import.out
index 1f150f7b08d..7bd7bfb3e7b 100644
--- a/src/test/regress/expected/stats_import.out
+++ b/src/test/regress/expected/stats_import.out
@@ -13,19 +13,48 @@ CREATE TABLE stats_import.test(
 tags text[]
 ) WITH (autovacuum_enabled = false);
 CREATE INDEX test_i ON stats_import.test(id);
+--
+-- relstats tests
+--
+--- error: relation is wrong type
+SELECT pg_catalog.pg_restore_relation_stats(
+'relation', 0::oid,
+'relpages', 17::integer);
+WARNING:  argument "relation" has type "oid", expected type "regclass"
+ERROR:  "relation" cannot be NULL
+-- error: relation not found
+SELECT pg_catalog.pg_restore_relation_stats(
+'relation', 0::oid::regclass,
+'relpages', 17::integer);
+ERROR:  could not open relation with OID 0
+-- error: odd number of variadic arguments cannot be pairs
+SELECT pg_restore_relation_stats(
+'relation', 'stats_import.test'::regclass,
+'relallvisible');
+ERROR:  variadic arguments must be name/value pairs
+HINT:  Provide an even number of variadic arguments that can be divided into pairs.
+-- error: argument name is NULL
+SELECT pg_restore_relation_stats(
+'relation', 'stats_import.test'::regclass,
+NULL, '17'::integer);
+ERROR:  name at variadic position 3 is NULL
+-- error: argument name is not a text type
+SELECT pg_restore_relation_stats(
+'relation', '0'::oid::regclass,
+17, '17'::integer);
+ERROR:  name at variadic position 3 has type "integer", expected type "text"
 -- starting stats
 SELECT relpages, reltuples, relallvisible
 FROM pg_class
-WHERE oid = 'stats_import.test'::regclass;
+WHERE oid = 'stats_import.test_i'::regclass;
  relpages | reltuples | relallvisible 
 --+---+---
-0 |-1 | 0
+1 | 0 | 0
 (1 row)
 
-BEGIN;
 -- regular indexes have special case locking rules
-SELECT
-pg_catalog.pg_restore_relation_stats(
+BEGIN;
+SELECT pg_catalog.pg_restore_relation_stats(
 'relation', 'stats_import.test_i'::regclass,
 'relpages', 18::integer);
  pg_restore_relation_stats 
@@ -50,32 +79,6 @@ WHERE relation = 'stats_import.test_i'::regclass AND
 (1 row)
 
 COMMIT;
-SELECT
-pg_catalog.pg_restore_relation_stats(
-'relation', 'stats_import.test_i'::regclass,
-'relpages', 19::integer );
- pg_restore_relation_stats 

- t
-(1 row)
-
--- clear
-SE

Re: Make COPY format extendable: Extract COPY TO format implementations

2025-02-27 Thread Sutou Kouhei
Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Thu, 27 Feb 2025 15:24:26 -0800,
  Masahiko Sawada  wrote:

> Pushed the 0001 patch.

Thanks!

> Regarding the 0002 patch, I realized we stopped exposing
> NextCopyFromRawFields() function:
> 
>  --- a/src/include/commands/copy.h
> +++ b/src/include/commands/copy.h
> @@ -107,8 +107,6 @@ extern CopyFromState BeginCopyFrom(ParseState
> *pstate, Relation rel, Node *where
>  extern void EndCopyFrom(CopyFromState cstate);
>  extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
>  Datum *values, bool *nulls);
> -extern bool NextCopyFromRawFields(CopyFromState cstate,
> - char ***fields, int *nfields);
> 
> I think that this change is not relevant with the refactoring and
> probably we should keep it exposed as extension might be using it.
> Considering that we added pg_attribute_always_inline to the function,
> does it work even if we omit pg_attribute_always_inline to its
> function declaration in the copy.h file?

Unfortunately, no. The inline + constant argument
optimization requires "static".

How about the following?

static pg_attribute_always_inline bool
NextCopyFromRawFieldsInternal(CopyFromState cstate, char ***fields,
                              int *nfields, bool is_csv)
{
...
}

bool
NextCopyFromRawFields(CopyFromState cstate, char ***fields, int *nfields)
{
    return NextCopyFromRawFieldsInternal(cstate, fields, nfields,
                                         cstate->opts.csv_mode);
}


Thanks,
-- 
kou




Re: Improve documentation regarding custom settings, placeholders, and the administrative functions

2025-02-27 Thread Zhang Mingli
On Feb 28, 2025 at 02:27 +0800, David G. Johnston , 
wrote:
> On Thu, Feb 6, 2025 at 8:04 AM Zhang Mingli  wrote:
> > On Feb 6, 2025 at 10:49 +0800, David G. Johnston 
> > , wrote:
> > >
> > > Yes, and when it does it is doing so in lieu of an error, and thus it is 
> > > not returning the value of the setting.  It does not mean the value taken 
> > > on by the named parameter is NULL, which is not possible. i.e., SHOW will 
> > > never produce NULL.
> > Hi,
> >
> > OK, LGTM.
> >
>
> Thank you for the review.  If you would like to add yourself as the reviewer 
> and mark this ready to commit here is the commitfest entry.
> https://commitfest.postgresql.org/patch/5548/
> David J.
Hi, Done.


--
Zhang Mingli
HashData


Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

2025-02-27 Thread Peter Geoghegan
On Thu, Feb 27, 2025 at 3:42 PM Robert Haas  wrote:
> +1 for having some instrumentation. I do not agree with Tom that these
> are numbers that only Peter Geoghegan and 2-3 other people will ever
> understand. I grant that it's not going to make sense to everyone, but
> the number of people to which it will make sense I would guess is
> probably in the hundreds or the thousands rather than the single
> digits.

I agree that it's likely to be of interest to only a minority of
users. But a fairly large minority.

> Good documentation could help.

It's easy to produce an example that makes intuitive sense. For
example, with skip scan that has a qual such as "WHERE a BETWEEN 1 and
5 AND b = 12345", it is likely that EXPLAIN ANALYZE will show "Index
Searches: 5" -- one search per "a" value. Such an example might be
more useful than my original pgbench_accounts example.
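
Something like this, say (table, data, and output are all hypothetical and
abridged; the "Index Searches" line is what the patch adds):

CREATE TABLE sales (a int, b int);
CREATE INDEX sales_a_b_idx ON sales (a, b);
-- load rows with "a" taking values 1..5

EXPLAIN (ANALYZE, COSTS OFF)
SELECT * FROM sales WHERE a BETWEEN 1 AND 5 AND b = 12345;

 Index Scan using sales_a_b_idx on sales (actual rows=5 loops=1)
   Index Cond: ((a >= 1) AND (a <= 5) AND (b = 12345))
   Index Searches: 5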

Do you think that that would help?

> So, where should we store that information?
>
> The thing that's odd about using IndexScanDesc is that it doesn't
> contain any other instrumentation currently, or at least not that I
> can see.

It is unique right now, but perhaps only because this is the first
piece of instrumentation that:

A. We must track at the level of an individual index scan -- not at
the level of an index relation, like pgstat_count_index_scan(), nor at
the level of the whole system, like BufferUsage state.

AND

B. Requires that we count something that fundamentally lives inside
the index AM -- something that we cannot reasonably track/infer from
the executor proper (more on that below, in my response to your scheme
#1).

> First, that struct has just four counters (ntuples,
> ntuples2, nfiltered, nfiltered2). For an index-only scan, all four of
> them are in use -- a plain index scan does not use ntuples2 -- but I
> think this optimization can apply to both index and index-only scans,
> so we just don't have room.

Right, index-only scans have exactly the same requirements as plain
index scans/bitmap index scans.

> 1. Add a new field to struct Instrumentation. Make
> index_getnext_slot() and possibly other similar functions return a
> count of index searches via a new parameter, and use that to bump the
> new struct Instrumentation counter. It's a little ugly to have to add
> a special-purpose parameter for this, but it doesn't look like there
> are tons and tons of calls so maybe it's OK.

That seems like the strongest possible alternative to the original
scheme used in the current draft patch (scheme #3).

This scheme #1 has the same issue as scheme #3, though: it still
requires an integer counter that tracks the number of index searches
(something a bit simpler than that, such as a boolean flag set once
per amgettuple call, won't do). This is due to there being no fixed
limit on the number of index searches required during any single
amgettuple call: in general the index AM may perform quite a few index
searches before it is able to return the first tuple to the scan (or
before it can return the next tuple).

The only difference that I can see between scheme #1 and scheme #3 is
that the former requires 2 counters instead of just 1. And, we'd still
need to have 1 out of the 2 counters located either in IndexScanDesc
itself (just like scheme #3), or located in some other struct that can
at least be accessed through IndexScanDesc (like the index AM opaque
state, per scheme #4). After all, *every* piece of state known to any
amgettuple routine must ultimately come from IndexScanDesc (or from
backend global state, per scheme #2).

(Actually, I suppose it is technically possible to avoid storing
anything in IndexScanDesc by inventing another amgettuple argument,
just for this. That seems like a distinction without a difference,
though.)

> 2. Add a new field to BufferUsage. Then the AM can bump this field and
> explain.c will know about it the same way it knows about other changes
> to pgBufferUsage. However, this is not really about buffer usage and
> the new field seems utterly unlike the other things that are in that
> structure, so this seems really bad to me.

I agree. The use of global variables seems quite inappropriate for
something like this. It'll result in wrong output whenever an index
scan uses an InitPlan in its qual when that InitPlan is itself a plan
that contains an index scan (this is already an issue with "Buffers:"
instrumentation, but this would be much worse).

> 3. Add a new field to IndexScanDesc, as you originally proposed. This
> seems similar to #1: it's still shoveling instrumentation data around
> in a way that we don't currently, but instead of shoveling it through
> a new parameter, we shovel it through a new structure member. Either
> way, the new thing (parameter or structure member) doesn't really look
> like it belongs with what's already there, so it seems like
> conservation of ugliness.

Perhaps a comment noting why the new counter lives in IndexScanDesc would help?

> 4. Add a new field to th

Re: Update docs for UUID data type

2025-02-27 Thread Andy Alsup
Masahiko,

I have combined the gen_random_uuid() and uuidv4() into a single row, as
you suggested. Please find the v5 patch, which has been squashed into a
single commit.

Best regards,
Andy Alsup

On Thu, Feb 27, 2025 at 5:02 PM Masahiko Sawada 
wrote:

> On Thu, Feb 27, 2025 at 1:26 PM Andy Alsup  wrote:
> >
> > I've submitted it for the up-coming commitfest. The link is:
> https://commitfest.postgresql.org/patch/5604/
> > Thanks for all your help in reviewing these changes.
>
> Thank you for the patch!
>
> Regarding the 0001 patch, I think we can put uuidv4() and
> get_random_uuid() in the same row since they are effectively identical
> functions. For example, we have precedent such as char_length() and
> character_length().
>
> The 0002 patch looks good to me.
>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com
>


v5-0001-Docs-for-UUID-funcs-formatted-in-table-and-UUID-d.patch
Description: Binary data


Re: Amcheck verification of GiST and GIN

2025-02-27 Thread Kirill Reshke
On Sat, 22 Feb 2025 at 03:51, Mark Dilger  wrote:
>
>
>
> > On Feb 21, 2025, at 12:50 PM, Mark Dilger  
> > wrote:
> >
> > Could one of the patch authors take a look?
>
> I turned the TAP test which triggers the error into a regression test that 
> does likewise, for ease of stepping through the test, if anybody should want 
> to do that.  I'm attaching that patch here, but please note that I'm not 
> expecting this to be committed.

Hi!
Your efforts are much appreciated!
I used this patch to derive a smaller repro.

> this seems to either be a bug in the checking code complaining about 
> perfectly valid tuple order,

I'm doubtful this is the case. I have added some more logging to
gin_index_check, and here is output after running attached:
```
DEBUG:  processing entry tree page at blk 2, maxoff: 125

DEBUG:  comparing for offset 78 category 0
DEBUG:  comparing for offset 79 category 2
DEBUG:  comparing for offset 80 category 3
DEBUG:  comparing for offset 81 category 0
LOG:  index "ginidx" has wrong tuple order on entry tree page, block
2, offset 81, rightlink 4294967295
DEBUG:  comparing for offset 82 category 0

DEBUG:  comparing for offset 100 category 0
DEBUG:  comparing for offset 101 category 2
DEBUG:  comparing for offset 102 category 3
DEBUG:  comparing for offset 103 category 0
LOG:  index "ginidx" has wrong tuple order on entry tree page, block
2, offset 103, rightlink 4294967295
DEBUG:  comparing for offset 104 category 0
DEBUG:  comparing for offset 105 category 0
```
So, we have an entry tree page where there is a tuple at offset 80
with gin tuple category = 3, and then it goes back to category 0. And
there is one more such pattern on the same page.
The ginCompareEntries function compares the gin tuple categories first.
I do not understand how this could be a valid order on the page, given
how ginCompareEntries is used in the `ginget.c` logic. Maybe I'm missing
something vital about GIN.
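
For reference, the ordering property I'm relying on, reduced to a toy (this is
not the real ginCompareEntries(), just the "category before key" rule):

#include <stdio.h>

/* categories compare as plain integers; keys only break ties within the
 * same category */
static int
toy_compare_entries(int cat_a, int key_a, int cat_b, int key_b)
{
    if (cat_a != cat_b)
        return (cat_a < cat_b) ? -1 : 1;
    return (key_a < key_b) ? -1 : (key_a > key_b);
}

int
main(void)
{
    /* offset 80 has category 3, offset 81 goes back to category 0 */
    printf("%d\n", toy_compare_entries(3, 0, 0, 42));    /* prints 1 */
    return 0;
}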


-- 
Best regards,
Kirill Reshke


0001-Much-smaller-repro.patch
Description: Binary data


Re: SQLFunctionCache and generic plans

2025-02-27 Thread Pavel Stehule
Hi

čt 27. 2. 2025 v 21:45 odesílatel Tom Lane  napsal:

> Pavel Stehule  writes:
> > čt 27. 2. 2025 v 20:52 odesílatel Tom Lane  napsal:
> >> So taken together, our results are all over the map, anywhere
> >> from 7% speedup to 7% slowdown.  My usual rule of thumb is that
>
> > Where do you see 7% speedup? Few lines up you wrote 0.7% faster.
>
> Alexander got that on the fx4 case, according to his response a
> few messages ago [1].  It'd be good if someone else could reproduce
> that, because right now we have two "it's slower" results versus
> only one "it's faster".
>

ok

here is a profile from master

6.98%  postgres  postgres   [.] hash_bytes
   6.30%  postgres  postgres   [.] palloc0
   3.57%  postgres  postgres   [.] SearchCatCacheInternal
   3.29%  postgres  postgres   [.] AllocSetAlloc
   2.65%  postgres  plpgsql.so [.] exec_stmts
   2.55%  postgres  postgres   [.] expression_tree_walker_impl
   2.34%  postgres  postgres   [.] _SPI_execute_plan
   2.13%  postgres  postgres   [.] CheckExprStillValid
   2.02%  postgres  postgres   [.] fmgr_info_cxt_security
   1.89%  postgres  postgres   [.] ExecInitFunc
   1.51%  postgres  postgres   [.] ExecInterpExpr
   1.48%  postgres  postgres   [.] ResourceOwnerForget
   1.44%  postgres  postgres   [.] AllocSetReset
   1.35%  postgres  postgres   [.] MemoryContextCreate
   1.30%  postgres  plpgsql.so [.] plpgsql_exec_function
   1.29%  postgres  libc.so.6  [.] __memcmp_sse2
   1.24%  postgres  postgres   [.] MemoryContextDelete
   1.13%  postgres  postgres   [.] check_stack_depth
   1.11%  postgres  postgres   [.] AllocSetContextCreateInternal
   1.09%  postgres  postgres   [.] resolve_polymorphic_argtypes
   1.08%  postgres  postgres   [.] hash_search_with_hash_value
   1.07%  postgres  postgres   [.] standard_ExecutorStart
   1.07%  postgres  postgres   [.] ExprEvalPushStep
   1.04%  postgres  postgres   [.] ExecInitExprRec
   0.95%  postgres  plpgsql.so [.] plpgsql_estate_setup
   0.91%  postgres  postgres   [.] ExecReadyInterpretedExp

and from patched

   7.08%  postgres  postgres   [.] hash_bytes
   6.25%  postgres  postgres   [.] palloc0
   3.52%  postgres  postgres   [.] SearchCatCacheInternal
   3.30%  postgres  postgres   [.] AllocSetAlloc
   2.39%  postgres  postgres   [.] expression_tree_walker_impl
   2.37%  postgres  plpgsql.so [.] exec_stmts
   2.15%  postgres  postgres   [.] _SPI_execute_plan
   2.10%  postgres  postgres   [.] CheckExprStillValid
   1.94%  postgres  postgres   [.] fmgr_info_cxt_security
   1.71%  postgres  postgres   [.] ExecInitFunc
   1.41%  postgres  postgres   [.] AllocSetReset
   1.40%  postgres  postgres   [.] ExecInterpExpr
   1.38%  postgres  postgres   [.] ExprEvalPushStep
   1.34%  postgres  postgres   [.] ResourceOwnerForget
   1.31%  postgres  postgres   [.] MemoryContextDelete
   1.24%  postgres  libc.so.6  [.] __memcmp_sse2
   1.21%  postgres  postgres   [.] MemoryContextCreate
   1.18%  postgres  postgres   [.] AllocSetContextCreateInternal
   1.17%  postgres  postgres   [.] hash_search_with_hash_value
   1.13%  postgres  postgres   [.] resolve_polymorphic_argtypes
   1.13%  postgres  plpgsql.so [.] plpgsql_exec_function
   1.03%  postgres  postgres   [.] standard_ExecutorStart
   0.98%  postgres  postgres   [.] ExecInitExprRec
   0.96%  postgres  postgres   [.] check_stack_depth

so it looks like there is only one significant difference:

ExprEvalPushStep: 1.07% vs 1.38%

Regards

Pavel

compiled without assertions on gcc 15 with -O2

vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
stepping : 7
microcode : 0x2f
cpu MHz : 2691.102
cache size : 6144 KB





> regards, tom lane
>
> [1]
> https://www.postgresql.org/message-id/e5724d1ba8398c7ff20ead1de73b4db4%40postgrespro.ru
>


Re: Statistics Import and Export

2025-02-27 Thread Jeff Davis
On Tue, 2025-02-25 at 11:11 +0530, Ashutosh Bapat wrote:
> So the dumped statistics are not restored exactly. The reason for
> this
> is that the table statistics are dumped before dumping the ALTER TABLE ... ADD
> CONSTRAINT command, which changes the statistics. I think all the
> pg_restore_relation_stats() calls should be dumped after all the
> schema and data modifications have been done. OR what's the point in
> dumping statistics only to get rewritten even before restore
> finishes.

In your example, it's not so bad because the stats are actually better:
the index is built after the data is present, and therefore relpages
and reltuples are correct.

The problem is more clear if you use --no-data. If you load data,
ANALYZE, pg_dump --no-data, then reload the sql file, then the stats
are lost.

That workflow is very close to what pg_upgrade does. We solved the
problem for pg_upgrade in commit 71b66171d0 by simply not updating the
statistics when building an index and IsBinaryUpgrade.

To solve the issue with dump --no-data, I propose that we change the
test in 71b66171d0 to only update the stats if the physical relpages is
non-zero.

Patch attached:

 * If the dump is --no-data, or during pg_upgrade, the table will be
empty, so the physical relpages will be zero and the restored stats
won't be overwritten.

 * If (like in your example) the dump includes data, the new stats are
based on real data, so they are better anyway. This is sort of like the
case where autoanalyze kicks in.

 * If the dump is --statistics-only, then there won't be any indexes
created in the SQL file, so when you restore the stats, they will
remain until you do something else to change them.

 * If your example really is a problem, you'd need to dump first with
--no-statistics, and then with --statistics-only, and restore the two
SQL files in order.


Alternatively, we could put stats into SECTION_POST_DATA, which was
already discussed[*], and we decided against it (though there was not a
clear consensus).

Regards,
Jeff Davis

*:
https://www.postgresql.org/message-id/1798867.1712376328%40sss.pgh.pa.us
From 86c03cb525b49d24019c5c0ea8ec36bb82b3c58a Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Thu, 27 Feb 2025 17:06:00 -0800
Subject: [PATCH v1] Do not update stats on empty table when building index.

We previously fixed this for binary upgrade in 71b66171d0, but a
similar problem exists when using pg_dump --no-data without pg_upgrade
involved. Fix both problems by not updating the stats when the table
has no pages.

Reported-by: Ashutosh Bapat 
Discussion: https://postgr.es/m/CAExHW5vf9D+8-a5_BEX3y=2y_xY9hiCxV1=c+fnxdvfprwv...@mail.gmail.com
---
 src/backend/catalog/index.c| 17 +++--
 src/test/regress/expected/stats_import.out | 22 +-
 src/test/regress/sql/stats_import.sql  | 12 
 3 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index f37b990c81d..1a3fdeab350 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2833,11 +2833,7 @@ index_update_stats(Relation rel,
 	if (reltuples == 0 && rel->rd_rel->reltuples < 0)
 		reltuples = -1;
 
-	/*
-	 * Don't update statistics during binary upgrade, because the indexes are
-	 * created before the data is moved into place.
-	 */
-	update_stats = reltuples >= 0 && !IsBinaryUpgrade;
+	update_stats = reltuples >= 0;
 
 	/*
 	 * Finish I/O and visibility map buffer locks before
@@ -2850,7 +2846,16 @@ index_update_stats(Relation rel,
 	{
 		relpages = RelationGetNumberOfBlocks(rel);
 
-		if (rel->rd_rel->relkind != RELKIND_INDEX)
+		/*
+		 * Don't update statistics when the relation is completely empty. This
+		 * is important during binary upgrade, because at the time the schema
+		 * is loaded, the data has not yet been moved into place. It's also
+		 * useful when restoring a dump containing only schema and statistics.
+		 */
+		if (relpages == 0)
+			update_stats = false;
+
+		if (update_stats && rel->rd_rel->relkind != RELKIND_INDEX)
 			visibilitymap_count(rel, &relallvisible, NULL);
 	}
 
diff --git a/src/test/regress/expected/stats_import.out b/src/test/regress/expected/stats_import.out
index 1f150f7b08d..4c81fb60c91 100644
--- a/src/test/regress/expected/stats_import.out
+++ b/src/test/regress/expected/stats_import.out
@@ -12,14 +12,34 @@ CREATE TABLE stats_import.test(
 arange int4range,
 tags text[]
 ) WITH (autovacuum_enabled = false);
+SELECT
+pg_catalog.pg_restore_relation_stats(
+'relation', 'stats_import.test'::regclass,
+'relpages', 18::integer,
+	'reltuples', 21::real,
+	'relallvisible', 24::integer);
+ pg_restore_relation_stats 
+---
+ t
+(1 row)
+
+-- creating an index on an empty table shouldn't overwrite stats
 CREATE INDEX test_i ON stats_import.test(id);
+SELECT relpages, reltuples, relallvisible
+FROM pg_class
+WHERE oid = 'stats_import.test'::regclass;
+ relpa

Re: Update docs for UUID data type

2025-02-27 Thread Andy Alsup
I've submitted it for the up-coming commitfest. The link is:
https://commitfest.postgresql.org/patch/5604/
Thanks for all your help in reviewing these changes.

Best Regards,
Andy Alsup

On Thu, Feb 27, 2025 at 1:58 PM Laurenz Albe 
wrote:

> On Wed, 2025-02-26 at 22:11 -0500, Andy Alsup wrote:
> > Please find the latest patch files attached.
>
> This is good to go.  If you add it to the commitfest, I'm happy to
> mark it "ready for committer".
>
> Yours,
> Laurenz Albe
>


Re: moving some code out of explain.c

2025-02-27 Thread Tom Lane
Robert Haas  writes:
> OK, here is v2. One slightly unfortunate thing about this is that we
> end up with a line that is over 80 characters:

> extern DestReceiver *CreateExplainSerializeDestReceiver(struct
> ExplainState *es);

> Aside from perhaps shortening the function name, I don't see how to avoid 
> that.

Can't get terribly excited about that.  But speaking of
CreateExplainSerializeDestReceiver, I see that a declaration
of it is still there at the bottom of explain.h:

extern DestReceiver *CreateExplainSerializeDestReceiver(ExplainState *es);

#endif  /* EXPLAIN_H */

Oversight I assume?

regards, tom lane




Re: Should work_mem be stable for a prepared statement?

2025-02-27 Thread David Rowley
On Fri, 28 Feb 2025 at 07:42, Jeff Davis  wrote:
> https://www.postgresql.org/message-id/CAJVSvF6s1LgXF6KB2Cz68sHzk%2Bv%2BO_vmwEkaon%3DH8O9VcOr-tQ%40mail.gmail.com
>
> James pointed out something interesting, which is that a prepared
> statement enforces the work_mem limit at execution time, which might be
> different from the work_mem at the time the statement was prepared.

There's a similar but not quite the same situation with the enable_*
GUCs. The executor isn't going to pick up a new value for these like
it will for work_mem, but I think portions of the same argument can be
made, i.e. someone might not like that turning off enable_seqscan
after doing PREPARE and EXECUTE once does not invalidate their plan.
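
For example (t is a made-up table; forcing a generic plan just makes the
effect deterministic):

SET plan_cache_mode = force_generic_plan;
PREPARE q AS SELECT count(*) FROM t WHERE a > $1;
EXECUTE q(0);               -- generic plan is built and cached here
SET enable_seqscan = off;
EXECUTE q(0);               -- reuses the cached plan; the new setting only
                            -- matters if something invalidates the plan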

> My first reaction is that it's not right because the costing for the
> plan is completely bogus with a different work_mem. It would make more
> sense to me if we either (a) enforced work_mem as it was at the time of
> planning; or (b) replanned if executed with a different work_mem
> (similar to how we replan sometimes with different parameters).

If we were to fix this then a) effectively already happens for the
enable_* GUCs, so b) would be the only logical way to fix.

> But I'm not sure whether someone might be relying on the existing
> behavior?

It looks like there was a bit of discussion on this topic about 18
years ago in [1], but it didn't seem to end with a very conclusive
outcome. I did learn that we once didn't have a method to invalidate
cached plans, so perhaps the current behaviour is a remnant of the
previous lack of infrastructure.

David

[1] https://www.postgresql.org/message-id/15168.1174410673%40sss.pgh.pa.us




Re: moving some code out of explain.c

2025-02-27 Thread Robert Haas
On Thu, Feb 27, 2025 at 4:12 PM Tom Lane  wrote:
> +1, but how about explain_dr.h too?  It doesn't seem though that
> we can avoid #include "executor/instrument.h" there, so it'd be
> a little asymmetrical.  But the executor inclusion doesn't seem
> nearly as much almost-circular.

OK, here is v2. One slightly unfortunate thing about this is that we
end up with a line that is over 80 characters:

extern DestReceiver *CreateExplainSerializeDestReceiver(struct
ExplainState *es);

Aside from perhaps shortening the function name, I don't see how to avoid that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v2-0001-Avoid-including-explain.h-in-explain_format.h-and.patch
Description: Binary data


Re: Restrict copying of invalidated replication slots

2025-02-27 Thread Masahiko Sawada
On Thu, Feb 27, 2025 at 12:52 AM Amit Kapila  wrote:
>
> On Thu, Feb 27, 2025 at 10:47 AM Masahiko Sawada  
> wrote:
> >
> > On Tue, Feb 25, 2025 at 7:33 PM Amit Kapila  wrote:
> > >
> > > AFAICU, InvalidateObsoleteReplicationSlots() is not serialized with
> > > this operation. So, isn't it possible that the source slot exists at
> > > the later position in ReplicationSlotCtl->replication_slots and the
> > > loop traversing slots is already ahead from the point where the newly
> > > copied slot is created?
> >
> > Good point. I think it's possible.
> >
> > > If so, the newly created slot won't be marked
> > > as invalid whereas the source slot will be marked as invalid. I agree
> > > that even in such a case, at a later point, the newly created slot
> > > will also be marked as invalid.
> >
> > The wal_status of the newly created slot would immediately become
> > 'lost' and the next checkpoint will invalidate it. Do we need to do
> > something to deal with this case?
> >
>
> + /* Check if source slot became invalidated during the copy operation */
> + if (second_slot_contents.data.invalidated != RS_INVAL_NONE)
> + ereport(ERROR,
> + errmsg("cannot copy replication slot \"%s\"",
> +NameStr(*src_name)),
> + errdetail("The source replication slot was invalidated during the
> copy operation."));
>
> How about adding a similar test as above for MyReplicationSlot? That
> should suffice the need because we have already acquired the new slot
> by this time and invalidation should signal this process before
> marking the new slot as invalid.

IIUC in the scenario you mentioned, the loop traversing slots already
passed the position of the newly created slot in the
ReplicationSlotCtl->replication_slots array, so
MyReplicationSlot->data.invalidated is still RS_INVAL_NONE, no?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Should work_mem be stable for a prepared statement?

2025-02-27 Thread Tom Lane
David Rowley  writes:
> On Fri, 28 Feb 2025 at 07:42, Jeff Davis  wrote:
>> My first reaction is that it's not right because the costing for the
>> plan is completely bogus with a different work_mem. It would make more
>> sense to me if we either (a) enforced work_mem as it was at the time of
>> planning; or (b) replanned if executed with a different work_mem
>> (similar to how we replan sometimes with different parameters).

> If we were to fix this then a) effectively already happens for the
> enable_* GUCs, so b) would be the only logical way to fix.

Given that nobody's complained about this for twenty-plus years,
I can't get excited about adding complexity to do either thing.

regards, tom lane




Re: Should work_mem be stable for a prepared statement?

2025-02-27 Thread Jeff Davis
On Thu, 2025-02-27 at 17:04 -0500, Tom Lane wrote:
> Given that nobody's complained about this for twenty-plus years,
> I can't get excited about adding complexity to do either thing.

I had in mind some refactoring in this area, which ideally would not
add complexity. It might provide some nice benefits, but would
introduce this behavior change, which makes it slightly more than a
refactoring.

It sounds like the behavior change would be desirable or at least
neutral. I will have to try it out and see if the refactoring is a net
improvement or turns into a mess.

Regards,
Jeff Davis





Re: Improve documentation regarding custom settings, placeholders, and the administrative functions

2025-02-27 Thread David G. Johnston
On Thu, Feb 6, 2025 at 8:04 AM Zhang Mingli  wrote:

> On Feb 6, 2025 at 10:49 +0800, David G. Johnston <
> david.g.johns...@gmail.com>, wrote:
>
>
> Yes, and when it does it is doing so in lieu of an error, and thus it is
> not returning the value of the setting.  It does not mean the value taken
> on by the named parameter is NULL, which is not possible. i.e., SHOW will
> never produce NULL.
>
> Hi,
>
> OK, LGTM.
>
>
Thank you for the review.  If you would like to add yourself as the
reviewer and mark this ready to commit, here is the commitfest entry.

https://commitfest.postgresql.org/patch/5548/

David J.
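
To illustrate the distinction being made here, a minimal sketch (the
parameter name my.placeholder is hypothetical and assumed to be unset):

-- current_setting() with missing_ok = true returns NULL in lieu of an error:
SELECT current_setting('my.placeholder', true);   -- returns NULL
-- SHOW never produces NULL; an unknown parameter is an error instead:
SHOW my.placeholder;
-- ERROR:  unrecognized configuration parameter "my.placeholder"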


Re: Should work_mem be stable for a prepared statement?

2025-02-27 Thread Greg Sabino Mullane
On Thu, Feb 27, 2025 at 1:42 PM Jeff Davis  wrote:

> It would make more sense to me if we either (a) enforced work_mem as it
> was at the time of planning; or (b) replanned if executed with a different
> work_mem (similar to how we replan sometimes with different parameters).
>

Definitely (b).

But I'm not sure whether someone might be relying on the existing behavior?
>

I cannot fathom a reason why.

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support


Re: moving some code out of explain.c

2025-02-27 Thread Tom Lane
Robert Haas  writes:
> The thing that was bugging me a bit is that explain_format.h includes
> explain.h.

Yeah, I did not like that at all either -- it makes things a bit
circular, and I fear IWYU is going to make stupid recommendations
like not including explain.h in explain.c.

Did you look at avoiding that with our standard trick of writing
detail-free struct declarations?  That is, explain_format.h
would need

struct ExplainState;    /* avoid including explain.h here */

and then s/ExplainState/struct ExplainState/ in all the function
declarations.  You'd still need to get List from someplace, but
it could be gotten by including primnodes.h or even pg_list.h.

regards, tom lane
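
A minimal standalone sketch of that forward-declaration trick; the names
are illustrative, not the actual explain headers:

#include <stdio.h>

/* "header" side: a detail-free declaration suffices for functions that
 * only pass the struct around by pointer */
struct ExplainState;            /* avoid including the "real" header here */
extern void note_indent(struct ExplainState *es);

/* only the file that actually dereferences the struct needs its body */
struct ExplainState
{
    int         indent;
};

void
note_indent(struct ExplainState *es)
{
    printf("indent = %d\n", es->indent);
}

int
main(void)
{
    struct ExplainState es = {.indent = 2};

    note_indent(&es);
    return 0;
}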




Re: Statistics Import and Export commit related issue.

2025-02-27 Thread Corey Huinker
On Tue, Feb 25, 2025 at 1:31 AM jian he  wrote:

> hi.
> the thread "Statistics Import and Export" is quite hot.
> so a new thread for some minor issue i found.
>
>
> static int
> _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH)
> {
>
> if (strcmp(te->desc, "STATISTICS DATA") == 0)
> {
> if (!ropt->dumpStatistics)
> return 0;
> else
> res = REQ_STATS;
> }
>
> /* If it's statistics and we don't want statistics, maybe ignore it */
> if (!ropt->dumpStatistics && strcmp(te->desc, "STATISTICS DATA") == 0)
> return 0;
> }
> As you can see, there is some duplicate code there.
> in _tocEntryRequired, we can move `` if (strcmp(te->desc, "STATISTICS
> DATA") == 0)`` after switch (curSection).
> so pg_dump --section option does not influence statistics dump.
> attached is the proposed change.
>

+1, and I think the patch provided is good as-is.


>
>
> in dumpTableData_copy, we have:
> pg_log_info("dumping contents of table \"%s.%s\"",
> tbinfo->dobj.namespace->dobj.name, classname);
> dumpRelationStats add a pg_log_info would be a good thing.
> commit:
> https://git.postgresql.org/cgit/postgresql.git/commit/?id=1fd1bd871012732e3c6c482667d2f2c56f1a9395
> didn't have any pg_log_info.
>

I don't have a strong opinion on this one, but I lean against doing it. I
think table data got a log message because dumping table data is an
operation that takes up a major % of the dump's runtime, whereas statistics
are just some catalog operations. There are log messages for the global
catalog operations (ex. read in all user defined functions, read in all
inheritance relationships), but not per-table catalog operations. If we end
up going to a global export of attribute statistics then I'd definitely
want a log message for that operation.


>
>
> https://www.postgresql.org/docs/devel/app-pgrestore.html
> """
> --jobs=number-of-jobs
> Run the most time-consuming steps of pg_restore — those that load
> data, create indexes, or create constraints — concurrently, using up
> to number-of-jobs concurrent sessions.
> """
> we may need to change the above sentence?
> pg_restore restore STATISTICS DATA also doing it concurrently if
> --jobs is specified.


I also have no strong opinion here, but I think the sentence is fine as is
because it specifies "most time-consuming" and a single catalog row
operation is much smaller than a COPY load, or the sequential scans needed
for index builds and constraint verification.


Re: SQL:2023 JSON simplified accessor support

2025-02-27 Thread Alexandra Wang
Hello hackers,

On Thu, Feb 27, 2025 at 9:46 AM Alexandra Wang 
wrote:

> Summary of changes:
>
> v8-0001 through v8-0005:
> Refactoring and preparatory steps for the actual implementation.
>
> v8-0006 (Implement read-only dot notation for JSONB):
> I removed the vars field (introduced in v7) from JsonbSubWorkspace
> after realizing that JsonPathVariable is not actually needed for
> dot-notation.
>
> v8-0007 (Allow wildcard member access for JSONB):
> I'm aware that the #if 0 in check_indirection() is not ideal. I
> haven’t removed it yet because I’m still reviewing other cases—beyond
> our JSONB simplified accessor use case—where this check should remain
> strict. I’ll post an additional patch to address this.
>

I made the following minor changes in v9:
- More detailed commit messages
- Additional tests
- Use "?column?" as the column name for trailing .*.

Other than that, the patches remain the same as the previous
version:
0001 - 0005: preparation steps for the actual implementation and do
not change or add new behavior.
0006: jsonb dot notation and sliced subscripting
0007: jsonb wildcard member access

Thanks,
Alex


v9-0003-Export-jsonPathFromParseResult.patch
Description: Binary data


v9-0002-Allow-Generic-Type-Subscripting-to-Accept-Dot-Not.patch
Description: Binary data


v9-0004-Extract-coerce_jsonpath_subscript.patch
Description: Binary data


v9-0001-Allow-transformation-of-only-a-sublist-of-subscri.patch
Description: Binary data


v9-0006-Implement-read-only-dot-notation-for-jsonb.patch
Description: Binary data


v9-0007-Allow-wild-card-member-access-for-jsonb.patch
Description: Binary data


v9-0005-Eanble-String-node-as-field-accessors-in-generic-.patch
Description: Binary data


Re: Make COPY format extendable: Extract COPY TO format implementations

2025-02-27 Thread Masahiko Sawada
On Tue, Feb 25, 2025 at 6:08 PM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Tue, 25 Feb 2025 17:14:43 -0800,
>   Masahiko Sawada  wrote:
>
> > I've attached updated patches.
>
> Thanks.
>
> I found one more missing last ".":
>
> 0002:
>
> > --- a/src/backend/commands/copyfrom.c
> > +++ b/src/backend/commands/copyfrom.c
>
> > @@ -106,6 +106,145 @@ typedef struct CopyMultiInsertInfo
>
> > +/*
> > + * Built-in format-specific routines. One-row callbacks are defined in
> > + * copyfromparse.c
> > + */
>
> copyfromparse.c -> copyfromparse.c.
>
>
> Could you push them?
>

Pushed the 0001 patch.

Regarding the 0002 patch, I realized we stopped exposing the
NextCopyFromRawFields() function:

 --- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -107,8 +107,6 @@ extern CopyFromState BeginCopyFrom(ParseState
*pstate, Relation rel, Node *where
 extern void EndCopyFrom(CopyFromState cstate);
 extern bool NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 Datum *values, bool *nulls);
-extern bool NextCopyFromRawFields(CopyFromState cstate,
- char ***fields, int *nfields);

I think that this change is not relevant to the refactoring, and we
should probably keep it exposed, as extensions might be using it.
Considering that we added pg_attribute_always_inline to the function,
does it still work if we omit pg_attribute_always_inline from its
function declaration in the copy.h file?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
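
For illustration, a minimal standalone sketch of the pattern in question,
assuming a GCC/Clang-style always_inline (this is not the actual copy
code): the "header" declaration omits the attribute while the definition
keeps it, and an out-of-line symbol is still emitted for callers in other
files.

#include <stdio.h>

/* "copy.h" side: plain extern declaration, no inline attribute */
extern int add_one(int x);

/* "copyfromparse.c" side: the definition carries the attribute, so callers
 * in the same file can be force-inlined, while the extern declaration above
 * still causes a normal out-of-line symbol to be emitted for other files */
__attribute__((always_inline)) inline int
add_one(int x)
{
    return x + 1;
}

int
main(void)
{
    printf("%d\n", add_one(41));
    return 0;
}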




Re: Statistics Import and Export

2025-02-27 Thread Melanie Plageman
On Wed, Feb 26, 2025 at 9:19 PM Corey Huinker  wrote:
>>>
>>> +1, let's shorten those queries.  The coast is probably pretty
>>> clear now if you want to go do that.
>>
>>
>> On it.

So, I started reviewing this and my original thought about shortening
the queries testing pg_restore_relation_stats() wasn't included in
your patch.

For example:

+--- error: relation is wrong type
+SELECT pg_catalog.pg_restore_relation_stats(
+'relation', 0::oid,
+'relpages', 17::integer,
+'reltuples', 400.0::real,
+'relallvisible', 4::integer);

Why do you need to specify all the stats (relpages, reltuples, etc)?
To exercise this you could just do:
select pg_catalog.pg_restore_relation_stats('relation', 0::oid);

Since I haven't been following along with this feature development, I
don't think I can get comfortable enough with all of the changes in
this test diff to commit them. I can't really say if this is the set
of tests that is representative and sufficient for this feature.

If you agree with me that the failure tests could be shorter, I'm
happy to commit that, but I don't really feel comfortable assessing
what the right set of full tests is.

- Melanie




Re: moving some code out of explain.c

2025-02-27 Thread Robert Haas
On Tue, Feb 18, 2025 at 1:11 PM Peter Geoghegan  wrote:
> On Tue, Feb 18, 2025 at 1:04 PM Robert Haas  wrote:
> > I think in cases like this there is a tendency to want to achieve an
> > even split of the original file, but in practice I think that tends
> > not to be difficult to achieve. These two patches combined move about
> > 1000 lines of code into separate files, which I think is actually
> > quite a good result for a refactoring of this sort. We might want to
> > split it up even more, but I don't feel like that has to be done in
> > the first set of patches or that all such patches have to come from
> > me. So I propose to do just this much for now.
> >
> > Comments?
>
> Seems like a good idea to me.

Thanks for the quick response! I have committed these patches.

I see that the Redis-FDW test is failing; it will now need to include
"commands/explain_format.h" -- unless we  something, of course.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: moving some code out of explain.c

2025-02-27 Thread Peter Geoghegan
On Thu, Feb 27, 2025 at 1:24 PM Robert Haas  wrote:
> Thanks for the quick response! I have committed these patches.

I recently did something similar myself when I moved all of the nbtree
preprocessing code into its own file.

The obvious downside is that it tends to make "git blame" much less
useful. It was definitely worth it, though.

-- 
Peter Geoghegan




Should work_mem be stable for a prepared statement?

2025-02-27 Thread Jeff Davis
As a part of this discussion:

https://www.postgresql.org/message-id/CAJVSvF6s1LgXF6KB2Cz68sHzk%2Bv%2BO_vmwEkaon%3DH8O9VcOr-tQ%40mail.gmail.com

James pointed out something interesting, which is that a prepared
statement enforces the work_mem limit at execution time, which might be
different from the work_mem at the time the statement was prepared.

For instance:

   SET work_mem='1GB';
   PREPARE foo AS ...; -- plans using 1GB limit
   SET work_mem='1MB';
   EXECUTE foo; -- enforces 1MB limit

My first reaction is that it's not right because the costing for the
plan is completely bogus with a different work_mem. It would make more
sense to me if we either (a) enforced work_mem as it was at the time of
planning; or (b) replanned if executed with a different work_mem
(similar to how we replan sometimes with different parameters).

But I'm not sure whether someone might be relying on the existing
behavior?

If we were to implement (a) or (b), we couldn't use the work_mem global
directly, we'd need to save it in the plan, and enforce using the
plan's saved value. But that might be a good change anyway. In theory
we might need to do something similar for hash_mem_multiplier, too.

Regards,
Jeff Davis
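
One minimal way to observe the current behavior from psql, assuming a
throwaway table t with enough rows that a 64kB limit forces an on-disk
sort (the Sort Method lines shown in comments are indicative):

CREATE TABLE t AS SELECT g AS id FROM generate_series(1, 1000000) g;

SET work_mem = '256MB';
PREPARE foo AS SELECT id FROM t ORDER BY id;
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF) EXECUTE foo;
-- Sort Method: quicksort  (plan built and cached under the 256MB setting)

SET work_mem = '64kB';
EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF) EXECUTE foo;
-- same cached plan, but now: Sort Method: external merge  Disk: ...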





Re: SQLFunctionCache and generic plans

2025-02-27 Thread Tom Lane
Pavel Stehule  writes:
> On Thu, Feb 27, 2025 at 1:25 PM Alexander Pyhalov <
> a.pyha...@postgrespro.ru> wrote:
>>> Unfortunately, there is about 5% slowdown for inlined code, and for
>>> just plpgsql code too.

>> Hi. I've tried to reproduce slowdown and couldn't.

> I'll try to get profiles.

I tried to reproduce this too.  What I got on my usual development
workstation (RHEL8/gcc 8.5.0 on x86_64) was:

fx2 example: v6 patch about 2.4% slower than HEAD
fx4 example: v6 patch about 7.3% slower than HEAD

I was quite concerned after that result, but then I tried it on
another machine (macOS/clang 16.0.0 on Apple M1) and got:

fx2 example: v6 patch about 0.2% slower than HEAD
fx4 example: v6 patch about 0.7% faster than HEAD

(These are average-of-three-runs tests on --disable-cassert
builds; I trust you guys were not doing performance tests on
assert-enabled builds?)

So taken together, our results are all over the map, anywhere
from 7% speedup to 7% slowdown.  My usual rule of thumb is that
you can see up to 2% variation in this kind of microbenchmark even
when "nothing has changed", just due to random build details like
whether critical loops cross a cacheline or not.  7% is pretty
well above that threshold, but maybe it's just random build
variation anyway.

Furthermore, since neither example involves functions.c at all
(fx2 would be inlined, and fx4 isn't SQL-language), it's hard
to see how the patch would directly affect either example unless
it were adding overhead to plancache.c.  And I don't see any
changes there that would amount to meaningful overhead for the
existing use-case with a raw parse tree.

So right at the moment I'm inclined to write this off as
measurement noise.  Perhaps it'd be worth checking a few
more platforms, though.

regards, tom lane




Re: SQLFunctionCache and generic plans

2025-02-27 Thread Tom Lane
Pavel Stehule  writes:
> On Thu, Feb 27, 2025 at 8:52 PM Tom Lane  wrote:
>> So taken together, our results are all over the map, anywhere
>> from 7% speedup to 7% slowdown.  My usual rule of thumb is that

> Where do you see 7% speedup? Few lines up you wrote 0.7% faster.

Alexander got that on the fx4 case, according to his response a
few messages ago [1].  It'd be good if someone else could reproduce
that, because right now we have two "it's slower" results versus
only one "it's faster".

regards, tom lane

[1] 
https://www.postgresql.org/message-id/e5724d1ba8398c7ff20ead1de73b4db4%40postgrespro.ru




Re: SQLFunctionCache and generic plans

2025-02-27 Thread Tom Lane
Alexander Pyhalov  writes:
> Now sql functions plans are actually saved. The most of it is a 
> simplified version of plpgsql plan cache. Perhaps, I've missed 
> something.

A couple of thoughts about v6:

* I don't think it's okay to just summarily do this:

/* It's stale; unlink and delete */
fcinfo->flinfo->fn_extra = NULL;
MemoryContextDelete(fcache->fcontext);
fcache = NULL;

when fmgr_sql sees that the cache is stale.  If we're
doing a nested call of a recursive SQL function, this'd be
cutting the legs out from under the outer recursion level.
plpgsql goes to some lengths to do reference-counting of
function cache entries, and I think you need the same here.
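
For illustration only, a toy standalone version of that reference-counting
idea (not plpgsql's actual code): an entry marked stale while nested calls
are in progress is only freed by the last caller to release it.

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

typedef struct FuncCacheEntry
{
    int         use_count;      /* number of in-progress calls using this entry */
    bool        stale;          /* set when an invalidation arrives */
} FuncCacheEntry;

static void
release_entry(FuncCacheEntry *entry)
{
    /* only the last user of a stale entry may actually free it */
    if (--entry->use_count == 0 && entry->stale)
        free(entry);
}

static void
call_function(FuncCacheEntry *entry, int depth)
{
    entry->use_count++;

    if (depth == 1)
        entry->stale = true;    /* pretend an invalidation arrives mid-recursion */

    if (depth > 0)
        call_function(entry, depth - 1);    /* nested (recursive) call */

    /* still safe to look at the entry: inner levels never freed it */
    printf("depth %d done, use_count=%d, stale=%d\n",
           depth, entry->use_count, (int) entry->stale);

    release_entry(entry);
}

int
main(void)
{
    FuncCacheEntry *entry = calloc(1, sizeof(*entry));

    call_function(entry, 2);
    printf("outer call completed without touching freed memory\n");
    return 0;
}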

* I don't like much of anything about 0004.  It's messy and it
gives up all the benefit of plan caching in some pretty-common
cases (anywhere where the user was sloppy about what data type
is being returned).  I wonder if we couldn't solve that with
more finesse by applying check_sql_fn_retval() to the query tree
after parse analysis, but before we hand it to plancache.c?
This is different from what happens now because we'd be applying
it before not after query rewrite, but I don't think that
query rewrite ever changes the targetlist results.  Another
point is that the resultTargetList output might change subtly,
but I don't think we care there either: I believe we only examine
that output for its result data types and resjunk markers.
(This is nonobvious and inadequately documented, but for sure we
couldn't try to execute that tlist --- it's never passed through
the planner.)

* One diff that caught my eye was

-   if (!ActiveSnapshotSet() &&
-   plansource->raw_parse_tree &&
-   analyze_requires_snapshot(plansource->raw_parse_tree))
+   if (!ActiveSnapshotSet() && StmtPlanRequiresRevalidation(plansource))

Because StmtPlanRequiresRevalidation uses
stmt_requires_parse_analysis, this is basically throwing away the
distinction between stmt_requires_parse_analysis and
analyze_requires_snapshot.  I do not think that's okay, for the
reasons explained in analyze_requires_snapshot.  We should expend the
additional notational overhead to keep those things separate.

* I'm also not thrilled by teaching StmtPlanRequiresRevalidation
how to do something equivalent to stmt_requires_parse_analysis for
Query trees.  The entire point of the current division of labor is
for it *not* to know that, but keep the knowledge of the properties
of different statement types in parser/analyze.c.  So it looks to me
like we need to add a function to parser/analyze.c that does this.
Not quite sure what to call it though.
querytree_requires_parse_analysis() would be a weird name, because
if it's a Query tree then it's already been through parse analysis.
Maybe querytree_requires_revalidation()?

regards, tom lane




Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

2025-02-27 Thread Peter Geoghegan
On Mon, Feb 17, 2025 at 5:44 PM Peter Geoghegan  wrote:
> I need more feedback about this. I don't understand your perspective here.

Attached is a version of the patch that will apply cleanly against
HEAD. (This is from v26 of my skip scan patch, which is why I've
skipped so many version numbers compared to the last patch posted on
this thread.)

I still haven't changed anything about the implementation, since this
is just to keep CFTester happy.

-- 
Peter Geoghegan


v26-0001-Show-index-search-count-in-EXPLAIN-ANALYZE.patch
Description: Binary data


Re: [Doc] Improve hostssl related descriptions and option presentation

2025-02-27 Thread David G. Johnston
On Mon, Apr 22, 2024 at 2:20 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> Thoughts anyone?
>
> On Thu, Feb 1, 2024 at 3:47 PM David G. Johnston <
> david.g.johns...@gmail.com> wrote:
>
>> Motivated by a recent complaint [1] I found the hostssl related material
>> in our docs quite verbose and even repetitive.  Some of that is normal
>> since we have both an overview/walk-through section as well as a reference
>> section.  But the overview in particular was self-repetitive.  Here is a
>> first pass at some improvements.  The commit message contains both the
>> intent of the patch and some thoughts/questions still being considered.
>>
>> David J.
>>
>> [1]
>> https://www.postgresql.org/message-id/170672433664.663.11895343120533141715%40wrigleys.postgresql.org
>>
>
Withdrawn as this now has conflicts from recent changes and no one seems
interested in voicing an opinion on this for or against.

David J.


Re: Proposal: "query_work_mem" GUC, to distribute working memory to the query's individual operators

2025-02-27 Thread James Hunter
On Wed, Feb 26, 2025 at 4:09 PM Jeff Davis  wrote:
>
> My idea was that we'd attach work_mem to each Path node and Plan node
> at create time. For example, in create_agg_path() it could do:
>
>   pathnode->path.node_work_mem = work_mem;
>
> And then add to copy_generic_path_info():
>
>   dest->node_work_mem = src->node_work_mem;
>
> (and similarly for other nodes; at least those that care about
> work_mem)
>
> Then, at execution time, use node->ps.ss.plan.node_work_mem instead of
> work_mem.

This is essentially what patches 2 and 3 do. Some comments:

First, there's no need to set the workmem_limit at Path time, since
it's not needed until the Plan is init-ted into a PlanState. So I set
it on the Plan but not on the Path.

Second, the logic to assign a workmem_limit to the Agg node is a bit
more complicated than in your example, because the Agg could be either
a Hash or a Sort. If it's a Hash, it gets work_mem * hash_mem_multiplier;
and if it's a Sort it gets either 0 or work_mem.

We can adjust the logic so that it gets work_mem instead of 0, by
pushing the complexity out of the original workmem_limit assignment
and into later code blocks -- e.g., in an extension -- but we still
need to decide whether the Agg is a Hash or a Sort. This is why Patch
2 does:

switch (agg->aggstrategy)
{
case AGG_HASHED:
case AGG_MIXED:

/*
 * Because nodeAgg.c will combine all AGG_HASHED nodes into a
 * single phase, it's easier to store the hash working-memory
 * limit on the first AGG_{HASHED,MIXED} node, and set it to zero
 * for all subsequent AGG_HASHED nodes.
 */
agg->plan.workmem_limit = is_first ?
get_hash_memory_limit() / 1024 : 0;
break;
case AGG_SORTED:

/*
 * Also store the sort-output working-memory limit on the first
 * AGG_SORTED node, and set it to zero for all subsequent
 * AGG_SORTED nodes.
 *
 * We'll need working-memory to hold the "sort_out" only if this
 * isn't the last Agg node (in which case there's no one to sort
 * our output).
 */
agg->plan.workmem_limit = *is_first_sort && !is_last ?
work_mem : 0;

*is_first_sort = false;
break;
default:
break;
}

Notice that the logic also sets the limit to 0 on certain Agg nodes --
this can be avoided, at the cost of added complexity later. The added
complexity is because, for example, for Hash Aggs, all Hash tables
share the same overall workmem_limit. So any workmem_limit set on
subsequent hash Agg nodes would be ignored, which means that setting
that limit adds complexity by obscuring the underlying logic.

> Similarly, we could track the node_mem_wanted, which would be the
> estimated amount of memory the node would use if unlimited memory were
> available. I believe that's already known (or a simple calculation) at
> costing time, and we can propagate it from the Path to the Plan the
> same way.

And this is what Patch 3 does. As you point out, this is already
known, or, if not, it's a simple calculation. This is all that Patch 3
does.

> (A variant of this approach could carry the values into the PlanState
> nodes as well, and the executor would that value instead.)

That's not needed, though, and would violate existing PG conventions:
we don't copy anything from Plan to PlanState, because the assumption
is that the PlanState always has access to its corresponding Plan.
(The reason we copy from Path to Plan, I believe, is that we drop all
Paths, to save memory; because we generally have many more Paths than
Plans.)

> Extensions like yours could have a GUC like ext.query_work_mem and use
> planner_hook to modify plan after standard_planner is done, walking the
> tree and adjusting each Plan node's node_work_mem to obey
> ext.query_work_mem. Another extension might hook in at path generation
> time with set_rel_pathlist_hook or set_join_pathlist_hook to create
> paths with lower node_work_mem settings that total up to
> ext.query_work_mem.

I don't set workmem_limit at Path time, because it's not needed
there; but I do set the workmem (estimate) at Path time, exactly so
that future optimizer hooks could make use of a Path's workmem
(estimate) to decide between different Paths.

Patch 3 sets workmem (estimate) on the Path and copies it to the Plan.
As you know, there's deliberately not a 1-1 correspondence between
Path and Plan (the way there is generally a 1-1 correspondence between
Plan and PlanState), so Patch 3 has to do some additional work to
propagate the workmem (estimate) from Path to Plan. You can see
existing examples of similar work inside file createplan.c. Creating a
Plan from a Path is not generally as simple as just copying the Path's
fields over; there are lots of special cases.

Although Patch 3 sets workmem (estimate) on the Plan inside
createplan.c, Patch 2 doesn't set workmem_limit inside createplan.c.
An earlier draft of the patchset *did* set it there, but because of
all the special casing in createplan.c, this ended up becoming
difficult.

Re: Confine vacuum skip logic to lazy_scan_skip

2025-02-27 Thread Ranier Vilela
On Thu, Feb 27, 2025 at 2:49 PM Andres Freund  wrote:

> Hi,
>
> On 2025-02-27 12:44:24 -0500, Andres Freund wrote:
> > > CID 1592454: (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
> > > 8. var_deref_op: Dereferencing null pointer per_buffer_data.
> >
> > That's exactly what the messages you quoted are discussing, no?
>
> Ah, no, it isn't. But I still think the coverity alert and the patch don't
> make sense, as per the below:
>
> > > diff --git a/src/backend/storage/aio/read_stream.c
> b/src/backend/storage/aio/read_stream.c
> > > index 04bdb5e6d4..18e9b4f3c4 100644
> > > --- a/src/backend/storage/aio/read_stream.c
> > > +++ b/src/backend/storage/aio/read_stream.c
> > > @@ -666,6 +666,8 @@ read_stream_next_buffer(ReadStream *stream, void
> **per_buffer_data)
> > >
>  READ_BUFFERS_ISSUE_ADVICE : 0)))
> > > {
> > > /* Fast return. */
> > > +   if (per_buffer_data)
> > > +   *per_buffer_data =
> get_per_buffer_data(stream, oldest_buffer_index);
> > > return buffer;
> > > }
> >
> > A few lines above:
> >   Assert(stream->per_buffer_data_size == 0);
> >
> > The fast path isn't used when per buffer data is used.  Adding a check
> for
> > per_buffer_data and assigning something to it is nonsensical.
>
Perhaps.

But the combination of the fast path and the void **per_buffer_data
parameter is, in my opinion, a serious risk at runtime.

best regards,
Ranier Vilela


Re: Docs for pg_basebackup needs v17 note for incremental backup

2025-02-27 Thread David G. Johnston
On Thu, Feb 6, 2025 at 2:46 PM Daniel Gustafsson  wrote:

> I'd be happy to help getting this in, do you have a suggested wording?
>
>
Thank you.

Attached.

David J.
From fc9768fa1de17529cddc3f3ac1fba208f7500083 Mon Sep 17 00:00:00 2001
From: "David G. Johnston" 
Date: Thu, 27 Feb 2025 10:58:57 -0700
Subject: [PATCH] Document pg_basebackup incremental backup requires v17
 server.

---
 doc/src/sgml/ref/pg_basebackup.sgml | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index c2d721208b..9659f76042 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -1005,10 +1005,11 @@ PostgreSQL documentation
 
   
pg_basebackup works with servers of the same
-   or an older major version, down to 9.1. However, WAL streaming mode (-X
-   stream) only works with server version 9.3 and later, and tar format
+   or older major version, down to 9.1. However, WAL streaming mode (-X
+   stream) only works with server version 9.3 and later, the tar format
(--format=tar) only works with server version 9.5
-   and later.
+   and later, and incremental backup (--incremental) only works
+   with server version 17 and later.
   
 
   
-- 
2.34.1



Re: Confine vacuum skip logic to lazy_scan_skip

2025-02-27 Thread Tom Lane
Andres Freund  writes:
> Ah, no, it isn't. But I still think the coverity alert and the patch don't
> make sense, as per the below:

Coverity's alert makes perfect sense if you posit that Coverity
doesn't assume that this read_stream_next_buffer call will
only be applied to a stream that has per_buffer_data_size > 0.
(Even if it did understand that, I wouldn't assume that it's
smart enough to see that the fast path will never be taken.)

I wonder if it'd be a good idea to add something like

Assert(stream->distance == 1);
Assert(stream->pending_read_nblocks == 0);
Assert(stream->per_buffer_data_size == 0);
+   Assert(per_buffer_data == NULL);

in read_stream_next_buffer.  I doubt that this will shut Coverity
up, but it would help to catch caller coding errors, i.e. passing
a per_buffer_data pointer when there's no per-buffer data.

On the whole I doubt we can get rid of this warning without some
significant redesign of the read_stream API, and I don't think
it's worth the trouble.  Coverity is a tool not a requirement.
I'm content to just dismiss the warning.

regards, tom lane




Re: Confine vacuum skip logic to lazy_scan_skip

2025-02-27 Thread Melanie Plageman
On Thu, Feb 27, 2025 at 1:08 PM Tom Lane  wrote:
>
> I wonder if it'd be a good idea to add something like
>
> Assert(stream->distance == 1);
> Assert(stream->pending_read_nblocks == 0);
> Assert(stream->per_buffer_data_size == 0);
> +   Assert(per_buffer_data == NULL);
>
> in read_stream_next_buffer.  I doubt that this will shut Coverity
> up, but it would help to catch caller coding errors, i.e. passing
> a per_buffer_data pointer when there's no per-buffer data.

I think this is a good stopgap. I was discussing adding this assert
off-list with Thomas and he wanted to detail his more ambitious plans
for type safety improvements in the read stream API. Less on the order
of a redesign and more like a separate read_stream_next_buffer()s for
when there is per buffer data and when there isn't. And a by-value and
by-reference version for the one where there is data.

I'll plan to add this assert tomorrow if that discussion doesn't materialize.

- Melanie




Re: Document How Commit Handles Aborted Transactions

2025-02-27 Thread David G. Johnston
Ahmed,

Thank you for the review.

I'm a bit confused by the reports of apply and compile errors.  I didn't
touch anything involving "varlist" and see no errors then or now in my
Meson ninja build.  Nor does the CI report any.

On Fri, Feb 14, 2025 at 2:01 PM Ahmed Ashour  wrote:

> Feedback:
> -
> - The patch was manually applied due to conflicts in `advanced.sgml` and
> `commit.sgml`.
> - Fixed invalid SGML structure by wrapping `` in
> ``.
>

If those errors were/are real this wouldn't be ready to commit.  But as
they seem to be a local environment issue on your end, and you agree with
the content, I'll keep it ready to commit.

David J.


Re: moving some code out of explain.c

2025-02-27 Thread Robert Haas
On Thu, Feb 27, 2025 at 2:21 PM Álvaro Herrera  wrote:
> On 2025-Feb-27, Robert Haas wrote:
> > I see that the Redis-FDW test is failing; it will now need to include
> > "commands/explain_format.h" -- unless we  something, of course.
>
> I wonder if it was really a useful idea to move the declarations of
> those functions from explain.h to the new explain_format.h file.  It
> seems that this new file now has a bunch of declarations that have
> always been something of a public interface, together with others that
> are only of internal explain.c interest, such as
> ExplainOpenSetAsideGroup() and friends.

I'm not completely certain about what I did with the header files, but
for a somewhat different reason than what you mention here.

I don't see ExplainOpenSetAsideGroup() as being any different from
anything else that I put in explain_format.h -- it's something that
code might want to call if it's generating EXPLAIN output, and
otherwise not. But maybe I'm missing something -- what makes you see
it as different from the other stuff?

The thing that was bugging me a bit is that explain_format.h includes
explain.h. It would be nice if those were more independent of each
other, but I'm not sure if there's a reasonably clean way of
accomplishing that. When deciding what to do there, it's also worth
keeping in mind that there may be more opportunities to move stuff out
of explain.c, so we might not want to get too dogmatic about the
header file organization just yet. But, I'm certainly open to ideas.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] Add regression tests of ecpg command notice (error / warning)

2025-02-27 Thread Ryo Kanbayashi
On Tue, Feb 18, 2025 at 12:49 PM Ryo Kanbayashi
 wrote:
>
> On Thu, Feb 13, 2025 at 10:49 PM Fujii Masao
>  wrote:
> >
> >
> >
> > On 2025/02/06 8:57, Ryo Kanbayashi wrote:
> > > On Wed, Feb 5, 2025 at 9:31 PM Ryo Kanbayashi  
> > > wrote:
> > >>
> > >> Hi hackers,
> > >>
> > >> When I wrote patch of ecpg command notice bug, I recognized needs of
> > >> regression tests for ecpg command notices and I say that I write the
> > >> tests.
> >
> > Thanks for working on this!
> >
> >
> > >> I explain about implementation of this patch.
> > >>
> > >> What is this patch
> > >> - add regression tests which test ecpg command notices such as warning
> > >> and errors
> > >> - test notices implemented in ecpg.addons file
> > >>
> > >> Basic policy on implementation
> > >> - do in a way that matches the method using the existing pg_regress
> > >> command as much as possible
> > >> - avoid methods that increase the scope of influence
> > >>
> > >> Next, I list answers to points that are likely to be pointed out in
> > >> advance below :)
> > >> - shell scripts and bat files is used due to ...
> > >>  avoid non zero exit code of ecpg command makes tests failure
> > >>  avoid increasing C code for executing binary which cares cross 
> > >> platform
> > >> - python code is used because I couldn't write meson.build
> > >> appropriately describe dependency about materials which is used on
> > >> tests without it. please help me...
> > >> - as you said, kick this kind of tests by pg_regress accompanied with
> > >> needless PG server process execution. but pg_regress doesn't execute
> > >> test without it and making pg_regress require modification which has
> > >> not small scope of influence
> >
> > Wouldn't it be simpler to use the existing TAP test mechanism,
> > as shown in the attached patch? Please note that this patch is very WIP,
> > so there would be many places that need further implementation and 
> > refinement.
>
> Fujii San,
>
> Thank you for reviewing and indication of better implementation.
> I rewrite my patch based on your reference implementation :)

Fujii San and other hackers,

I have rewritten my patch in TAP test style :)
The build files are also updated.

Commitfest entry:
https://commitfest.postgresql.org/patch/5543/

---
Great regards,
Ryo Kanbayashi
NTT Open Source Software Center


ecpg-notice-regress-patch-tap-ver.patch
Description: Binary data


Re: moving some code out of explain.c

2025-02-27 Thread Tom Lane
Robert Haas  writes:
> On Thu, Feb 27, 2025 at 3:05 PM Tom Lane  wrote:
>> Did you look at avoiding that with our standard trick of writing
>> detail-free struct declarations?

> OK, here's a patch that does that. Thoughts?

+1, but how about explain_dr.h too?  It doesn't seem though that
we can avoid #include "executor/instrument.h" there, so it'd be
a little asymmetrical.  But the executor inclusion doesn't seem
nearly as much almost-circular.

regards, tom lane




Re: moving some code out of explain.c

2025-02-27 Thread Robert Haas
On Thu, Feb 27, 2025 at 3:05 PM Tom Lane  wrote:
> Robert Haas  writes:
> > The thing that was bugging me a bit is that explain_format.h includes
> > explain.h.
>
> Yeah, I did not like that at all either -- it makes things a bit
> circular, and I fear IWYU is going to make stupid recommendations
> like not including explain.h in explain.c.
>
> Did you look at avoiding that with our standard trick of writing
> detail-free struct declarations?  That is, explain_format.h
> would need
>
> struct ExplainState;/* avoid including explain.h here */
>
> and then s/ExplainState/struct ExplainState/ in all the function
> declarations.  You'd still need to get List from someplace, but
> it could be gotten by including primnodes.h or even pg_list.h.

OK, here's a patch that does that. Thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v1-0001-Avoid-including-commands-explain.h-in-commands-ex.patch
Description: Binary data


Re: [PROPOSAL] : Disallow use of empty column name in (column_name '') in ALTER or CREATE of foreign table.

2025-02-27 Thread Robert Haas
On Thu, Feb 27, 2025 at 4:08 PM Tom Lane  wrote:
> I was down on it to start with, on the grounds that it wasn't adding
> enough ease-of-use to justify even a relatively small amount of added
> code.  I'm not sure that the dump-reload failure angle is much of a
> concern in reality, but I would have voted not to commit before and
> I still do.

OK, fair. Anyone want to argue the other side?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: moving some code out of explain.c

2025-02-27 Thread Álvaro Herrera
On 2025-Feb-27, Robert Haas wrote:

> I see that the Redis-FDW test is failing; it will now need to include
> "commands/explain_format.h" -- unless we  something, of course.

I wonder if it was really a useful idea to move the declarations of
those functions from explain.h to the new explain_format.h file.  It
seems that this new file now has a bunch of declarations that have
always been something of a public interface, together with others that
are only of internal explain.c interest, such as
ExplainOpenSetAsideGroup() and friends.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: SQLFunctionCache and generic plans

2025-02-27 Thread Pavel Stehule
On Thu, Feb 27, 2025 at 8:52 PM Tom Lane  wrote:

> Pavel Stehule  writes:
> > On Thu, Feb 27, 2025 at 1:25 PM Alexander Pyhalov <
> > a.pyha...@postgrespro.ru> wrote:
> >>> Unfortunately, there is about 5% slowdown for inlined code, and for
> >>> just plpgsql code too.
>
> >> Hi. I've tried to reproduce slowdown and couldn't.
>
> > I'll try to get profiles.
>
> I tried to reproduce this too.  What I got on my usual development
> workstation (RHEL8/gcc 8.5.0 on x86_64) was:
>
> fx2 example: v6 patch about 2.4% slower than HEAD
> fx4 example: v6 patch about 7.3% slower than HEAD
>
> I was quite concerned after that result, but then I tried it on
> another machine (macOS/clang 16.0.0 on Apple M1) and got:
>
> fx2 example: v6 patch about 0.2% slower than HEAD
> fx4 example: v6 patch about 0.7% faster than HEAD
>
> (These are average-of-three-runs tests on --disable-cassert
> builds; I trust you guys were not doing performance tests on
> assert-enabled builds?)
>
> So taken together, our results are all over the map, anywhere
> from 7% speedup to 7% slowdown.  My usual rule of thumb is that
>

Where do you see 7% speedup? Few lines up you wrote 0.7% faster.


you can see up to 2% variation in this kind of microbenchmark even
> when "nothing has changed", just due to random build details like
> whether critical loops cross a cacheline or not.  7% is pretty
> well above that threshold, but maybe it's just random build
> variation anyway.
>
> Furthermore, since neither example involves functions.c at all
> (fx2 would be inlined, and fx4 isn't SQL-language), it's hard
> to see how the patch would directly affect either example unless
> it were adding overhead to plancache.c.  And I don't see any
> changes there that would amount to meaningful overhead for the
> existing use-case with a raw parse tree.
>
> So right at the moment I'm inclined to write this off as
> measurement noise.  Perhaps it'd be worth checking a few
> more platforms, though.
>
> regards, tom lane
>


Re: Simplify the logic a bit (src/bin/scripts/reindexdb.c)

2025-02-27 Thread Álvaro Herrera
On 2025-Feb-14, Ranier Vilela wrote:

> Attached is the prototype version v1.
> What do you think?

I think this is still a bit confused.  The new function's comment says
"prepare the list of tables to ..." but what it really receives is a
list of _indexes_ (as evidenced by the fact that they're compared to
pg_index.indexrelid).  So on input the user_list is an index list, and
on output it has been changed into a list of tables, and the list of
indexes is the function's return value?  That seems quite weird.  I
propose to change the API of this new function thus:

static void
get_parallel_tabidx_list(PGconn *conn,
SimpleStringList *index_list,
SimpleStringList *table_list,
bool echo);

where index_list is inout and the table_list is just an output argument.

I would also remove the 'type' argument, because I don't see a need to
keep it.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"After a quick R of TFM, all I can say is HOLY CR** THAT IS COOL! PostgreSQL was
amazing when I first started using it at 7.2, and I'm continually astounded by
learning new features and techniques made available by the continuing work of
the development team."
Berend Tober, http://archives.postgresql.org/pgsql-hackers/2007-08/msg01009.php




Re: Update docs for UUID data type

2025-02-27 Thread Laurenz Albe
On Wed, 2025-02-26 at 22:11 -0500, Andy Alsup wrote:
> Please find the latest patch files attached.

This is good to go.  If you add it to the commitfest, I'm happy to
mark it "ready for committer".

Yours,
Laurenz Albe




Re: [PROPOSAL] : Disallow use of empty column name in (column_name '') in ALTER or CREATE of foreign table.

2025-02-27 Thread Tom Lane
Robert Haas  writes:
> Nobody has beaten me to it, but I thought of one potential issue.
> Suppose that somebody has a foreign table that exists today with one
> of these properties set to the empty string. Such a foreign table is
> not usable for queries, because the queries won't make it past scan.l
> on the remote side, but it's allowed to exist.

Hmm.  A query not using that column would work just fine, so it's
at least arguable that there are such beasts in reality and the
user hasn't noticed the bug.  (I doubt this, but if we're worrying
about improbable failures...)
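
For concreteness, a sketch of the kind of object under discussion,
assuming a postgres_fdw server s and a matching remote table remote_tab:

CREATE FOREIGN TABLE ft (a int OPTIONS (column_name ''), b int)
    SERVER s OPTIONS (table_name 'remote_tab');

SELECT b FROM ft;   -- fine: the deparsed remote query never names column a
SELECT a FROM ft;   -- fails, with the remote side reporting something like:
                    -- ERROR:  zero-length delimited identifier at or near """"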

> With this patch, it's
> not allowed to exist any more. That would be fine if no such objects
> exist in the wild, but if they do, people will get a pg_dump or
> pg_upgrade failure when they try to upgrade to a release that includes
> this change. Does that mean that we shouldn't commit this patch?

I was down on it to start with, on the grounds that it wasn't adding
enough ease-of-use to justify even a relatively small amount of added
code.  I'm not sure that the dump-reload failure angle is much of a
concern in reality, but I would have voted not to commit before and
I still do.

regards, tom lane




Re: Log connection establishment timings

2025-02-27 Thread Melanie Plageman
On Thu, Feb 27, 2025 at 11:30 AM Andres Freund  wrote:
>
> On 2025-02-27 11:08:04 -0500, Melanie Plageman wrote:
>
> > However, since that is a bigger project (with more refactoring, etc),
> > he suggested that we change log_connections to a GUC_LIST
> > (ConfigureNamesString) with options like "none", "received,
> > "authenticated", "authorized", "all".
>
> Yep.

I've done a draft of this in attached v7 (see 0001). It still needs
polishing (e.g. I need to figure out where to put the new guc hook
functions and enums and such), but I want to see if this is a viable
direction forward.

I'm worried the list of possible connection log messages could get
unwieldy if we add many more.

- Melanie
From 6f7f1a2e3ea660596bee0348691e192b8c2fce24 Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Thu, 27 Feb 2025 17:33:38 -0500
Subject: [PATCH v7 2/2] Add connection establishment duration logging

Add durations for several key parts of connection establishment when
log_connections is enabled.

For a new incoming connection, starting from when the postmaster gets a
socket from accept() and ending when the forked child backend is first
ready for query, there are multiple steps that could each take longer
than expected due to external factors. Provide visibility into
authentication and fork duration as well as the end-to-end connection
establishment time with logging.

To make this portable, the timestamps captured in the postmaster (socket
creation time, fork initiation time) are passed through the ClientSocket
and BackendStartupData structures instead of simply saved in backend
local memory inherited by the child process.

Reviewed-by: Bertrand Drouvot 
Reviewed-by: Fujii Masao 
Reviewed-by: Jelte Fennema-Nio 
Reviewed-by: Jacob Champion 
Reviewed-by: Guillaume Lelarge 
Discussion: https://postgr.es/m/flat/CAAKRu_b_smAHK0ZjrnL5GRxnAVWujEXQWpLXYzGbmpcZd3nLYw%40mail.gmail.com
---
 doc/src/sgml/config.sgml|  2 +-
 src/backend/postmaster/launch_backend.c | 23 +++
 src/backend/postmaster/postmaster.c | 12 +++-
 src/backend/tcop/postgres.c | 21 +
 src/backend/utils/init/globals.c|  2 ++
 src/backend/utils/init/postinit.c   | 12 
 src/include/libpq/libpq-be.h|  2 ++
 src/include/miscadmin.h |  2 ++
 src/include/portability/instr_time.h|  9 +
 src/include/postmaster/postmaster.h |  8 
 src/include/tcop/backend_startup.h  |  5 +
 src/tools/pgindent/typedefs.list|  1 +
 12 files changed, 97 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c927efd22d1..82b06179a38 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7324,7 +7324,7 @@ local0.*/var/log/postgresql

 Causes each attempted connection to the server to be logged, as well as
 successful completion of both client authentication (if necessary) and
-authorization.
+authorization. Also logs connection establishment component durations.

 

diff --git a/src/backend/postmaster/launch_backend.c b/src/backend/postmaster/launch_backend.c
index 47375e5bfaa..3fe68601899 100644
--- a/src/backend/postmaster/launch_backend.c
+++ b/src/backend/postmaster/launch_backend.c
@@ -232,6 +232,11 @@ postmaster_child_launch(BackendType child_type, int child_slot,
 
 	Assert(IsPostmasterEnvironment && !IsUnderPostmaster);
 
+	/* Capture time Postmaster initiates fork for logging */
+	if ((log_connections & LOG_CONNECTION_TIMINGS) &&
+		(child_type == B_BACKEND || child_type == B_WAL_SENDER))
+		INSTR_TIME_SET_CURRENT(((BackendStartupData *) startup_data)->fork_time);
+
 #ifdef EXEC_BACKEND
 	pid = internal_forkexec(child_process_kinds[child_type].name, child_slot,
 			startup_data, startup_data_len, client_sock);
@@ -240,6 +245,15 @@ postmaster_child_launch(BackendType child_type, int child_slot,
 	pid = fork_process();
 	if (pid == 0)/* child */
 	{
+		/* Calculate total fork duration in child backend for logging */
+		if ((log_connections & LOG_CONNECTION_TIMINGS) &&
+			(child_type == B_BACKEND || child_type == B_WAL_SENDER))
+		{
+			instr_time	fork_time = ((BackendStartupData *) startup_data)->fork_time;
+
+			conn_timing.fork_duration = INSTR_TIME_GET_DURATION_SINCE(fork_time);
+		}
+
 		/* Close the postmaster's sockets */
 		ClosePostmasterPorts(child_type == B_LOGGER);
 
@@ -618,6 +632,15 @@ SubPostmasterMain(int argc, char *argv[])
 	/* Read in the variables file */
 	read_backend_variables(argv[2], &startup_data, &startup_data_len);
 
+	/* Calculate total fork duration in child backend for logging */
+	if ((log_connections & LOG_CONNECTION_READY) &&
+		(child_type == B_BACKEND || child_type == B_WAL_SENDER))
+	{
+		instr_time	fork_time = ((BackendStartupData *) startup_data)->fork_time;
+
+		conn_timing.fork_duration = INSTR_TIME_GET_DURATION_SINCE(fork_time);
+	}
+
 	/* Cl

Re: long-standing data loss bug in initial sync of logical replication

2025-02-27 Thread Masahiko Sawada
On Thu, Feb 27, 2025 at 12:14 AM Zhijie Hou (Fujitsu)
 wrote:
>
> On Monday, February 24, 2025 5:50 PM Amit Kapila  
> wrote:
> >
> > On Wed, Dec 11, 2024 at 12:37 PM Masahiko Sawada
> >  wrote:
> > >
> > > I confirmed that the proposed patch fixes these issues. I have one
> > > question about the patch:
> > >
> > > In the main loop in SnapBuildDistributeSnapshotAndInval(), we have the
> > > following code:
> > >
> > >/*
> > > * If we don't have a base snapshot yet, there are no changes in 
> > > this
> > > * transaction which in turn implies we don't yet need a snapshot 
> > > at
> > > * all. We'll add a snapshot when the first change gets queued.
> > > *
> > > * NB: This works correctly even for subtransactions because
> > > * ReorderBufferAssignChild() takes care to transfer the base
> > snapshot
> > > * to the top-level transaction, and while iterating the 
> > > changequeue
> > > * we'll get the change from the subtxn.
> > > */
> > >if (!ReorderBufferXidHasBaseSnapshot(builder->reorder, txn->xid))
> > >continue;
> > >
> > > Is there any case where we need to distribute inval messages to
> > > transactions that don't have the base snapshot yet but eventually need
> > > the inval messages?
> > >
> >
> > Good point. It is mentioned that for snapshots: "We'll add a snapshot
> > when the first change gets queued.". I think we achieve this via
> > builder->committed.xip array such that when we set a base snapshot for
> > a transaction, we use that array to form a snapshot. However, I don't
> > see any such consideration for invalidations. Now, we could either
> > always add invalidations to xacts that don't have base_snapshot yet or
> > have a mechanism similar committed.xid array. But it is better to
> > first reproduce the problem.
>
> I think distributing invalidations to a transaction that has not yet built a
> base snapshot is un-necessary. This is because, during the process of building
> its base snapshot, such a transaction will have already recorded the XID of 
> the
> transaction that altered the publication information into its array of
> committed XIDs. Consequently, it will reflect the latest changes in the 
> catalog
> from the beginning. In the context of logical decoding, this scenario is
> analogous to decoding a new transaction initiated after the catalog-change
> transaction has been committed.
>
> The original issue arises because the catalog cache was constructed using an
> outdated snapshot that could not reflect the latest catalog changes. However,
> this is not a problem in cases without a base snapshot. Since the existing
> catalog cache should have been invalidated upon decoding the committed
> catalog-change transaction, the subsequent transactions will construct a new
> cache with the latest snapshot.

I've also concluded it's not necessary but the reason and analysis
might be somewhat different. IIUC in the original issue (looking at
Andres's reproducer[1]), the fact that when replaying a
non-catalog-change transaction, the walsender constructed the snapshot
that doesn't reflect the catalog change is fine because the first
change of that transaction was made before the catalog change. The
problem is that the walsender process absorbed the invalidation
message when replaying the change that happened before the catalog
change, and ended up continuing to replay the subsequent changes with
that snapshot. That is why we concluded that we need to distribute the
invalidation messages to concurrently decoded transactions so that we
can invalidate the cache again at that point. As the comment
mentioned, the base snapshot is set before queuing any changes, so if
the transaction doesn't have the base snapshot yet, there must be no
queued change that happened before the catalog change. The
transactions that initiated after the catalog change don't have this
issue.

Regards,

[1] 
https://www.postgresql.org/message-id/20231118025445.crhaeeuvoe2g5dv6%40awork3.anarazel.de

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Add -k/--link option to pg_combinebackup

2025-02-27 Thread vignesh C
On Fri, 21 Feb 2025 at 03:50, Israel Barth Rubio  wrote:
>
> There are some failures on compiler warnings and Windows jobs that
> seem unrelated with this patch.
>
> Besides that, there were failures when executing `010_links.pl` in the
> Linux autoconf jobs. As discussed in a side chat with Euler and Robert,
> and pointed out by Robert, Cirrus CI uses a very small
> `--with-segsize-blocks`. My tests in the V2 patch were assuming 1GB
> segments, so they were failing in Cirrus CI.
>
> The patch V3, which I'm attaching in this reply, implements some logic
> to handle any segment size.

A few comments:
1) The new file 010_links.pl should also be included in meson.build:
'tests': [
  't/010_pg_basebackup.pl',
  't/011_in_place_tablespace.pl',
  't/020_pg_receivewal.pl',
  't/030_pg_recvlogical.pl',
  't/040_pg_createsubscriber.pl',
],

2) Since it is a new file, "Copyright (c) 2021-2025" should be
"Copyright (c) 2025":
+++ b/src/bin/pg_combinebackup/t/010_links.pl
@@ -0,0 +1,212 @@
+# Copyright (c) 2021-2025, PostgreSQL Global Development Group
+#
+# This test aims to validate that hard links are created as expected in the
+# output directory, when running pg_combinebackup with --link mode.

3) Since it is a single statement, no need of braces here:
+   /* Warn about the possibility of compromising the backups,
when link mode */
+   if (opt.copy_method == COPY_METHOD_LINK)
+   {
+   pg_log_warning("--link mode was used; any
modifications to the output "
+  "directory may
destructively modify input directories");
+   }

Regards,
Vignesh




Re: suspicious lockup on widowbird in AdvanceXLInsertBuffer (could it be due to 6a2275b8953?)

2025-02-27 Thread Alexander Korotkov
On Thu, Feb 27, 2025 at 12:24 AM Peter Geoghegan  wrote:
>
> On Wed, Feb 26, 2025 at 5:16 PM Tomas Vondra  wrote:
> > Oh, I forgot about that. I guess if the theory is the commit might be
> > causing this, and it's been already reverted, I should just restart the
> > instance, so that it tests with the revert.
>
> According to the revert commit's commit message there was also
> problems on batta at the time. Might have been the same issue that you
> saw on widowbird. Alexander?

Yes, the backtrace posted by Tomas from widowbird looks quite the same
as the one posted by Michael from batta.  My apologies for committing an
insufficiently tested patch.

--
Regards,
Alexander Korotkov
Supabase




Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints

2025-02-27 Thread Rushabh Lathia
Thanks, Alvaro, for the fixup patch.




On Fri, Feb 21, 2025 at 11:43 AM Alvaro Herrera 
wrote:

> Hello,
>
> Thanks!
>
> I noticed a typo 'constrint' in several places; fixed in the attached.
>
> I discovered that this sequence, taken from added regression tests,
>
> CREATE TABLE notnull_tbl1 (a int);
> ALTER TABLE notnull_tbl1 ADD CONSTRAINT nn_parent not null a not valid;
> CREATE TABLE notnull_chld (a int);
> ALTER TABLE notnull_chld ADD CONSTRAINT nn_child not null a not valid;
> ALTER TABLE notnull_chld INHERIT notnull_tbl1;
> ALTER TABLE notnull_tbl1 VALIDATE CONSTRAINT nn_parent;
>
> does mark the constraint as validated in the child, but only in
> pg_constraint -- pg_attribute continues to be marked as 'i', so if you
> try to use it for a PK, it fails:
>
> alter table notnull_chld add constraint foo primary key (a);
> ERROR:  column "a" of table "notnull_chld" is marked as NOT VALID NOT NULL
> constrint
>
> I thought that was going to be a quick fix, so I tried to do so; since
> we already have a function 'set_attnotnull', I thought it was the
> perfect tool for changing attnotnull.  However, it's not great:
> since that part of the code is already doing the validation, I don't
> want it to queue the validation again, so the API needs a tweak; I
> changed it to receive separately which new value to update attnotnull
> to, and whether to queue validation.  With that change it works
> correctly, but it is a bit ugly on the callers' side.  Maybe it would
> work to pass two booleans instead?  Please have a look at whether that can be
> improved.
>

I haven't given much thought to improving the API, but I'll look into it now.
Also, I'm considering renaming AdjustNotNullInheritance() since it now also
checks for invalid NOT NULL constraints. What do you think?
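
For concreteness, the two call shapes being weighed could look roughly like
this (hypothetical arguments only, not the actual set_attnotnull() declaration
in tablecmds.c):

/*
 * Option A: pass the new attnotnull value ('t', 'i' or 'f') plus a flag
 * saying whether to queue validation of existing rows.
 */
set_attnotnull(wqueue, rel, attnum, 't' /* new value */,
               false /* queue_validation */, lockmode);

/*
 * Option B: two booleans -- one for whether the constraint is valid,
 * one for whether to queue validation.
 */
set_attnotnull(wqueue, rel, attnum, true /* is_valid */,
               false /* queue_validation */, lockmode);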


>
> I also noticed the addition of function getNNConnameForAttnum(), which
> does pretty much the same as findNotNullConstraintAttnum(), only it
> ignores all validated constraints instead of ignoring all non-validated
> constraints.  So after looking at the callers of the existing function
> and wondering which of them really wanted only the validated
> constraints, it turns out that the answer is none of them.  So I decided to
> remove the check for that, and instead we need to add checks to every
> caller of both findNotNullConstraintAttnum() and findNotNullConstraint()
> so that it acts appropriately when a non-validated constraint is
> returned.  I added a few elog(WARNING)s when this happens; running the
> tests I notice that none of them fire.  I'm pretty sure this indicates
> holes in testing: we have no test cases for these scenarios, and we
> should have them for assurance that we're doing the right things.  I
> recommend that you go over those WARNINGs, add test cases that make them
> fire, and then fix the code so that the test cases do the right thing.
> Also, just to be sure, please go over _all_ the callers of
> those two functions and make sure all cases are covered by tests that
> catch invalid constraints.
>
>
I reviewed the added WARNING in the fixup patch and addressed the case in
AdjustNotNullInheritance() and ATExecSetNotNull(). I also added the related
test case to the test suite.

Regarding the WARNING in MergeAttributeIntoExisting(), it won't be triggered
since this block executes only when the child table has ATTRIBUTE_NOTNULL_FALSE.
Let me know if I'm missing anything.


>
> The docs continue to say this:
>   This form adds a new constraint to a table using the same constraint
>   syntax as CREATE
> TABLE, plus the option NOT
>   VALID, which is currently only allowed for foreign key
>   and CHECK constraints.
> which fails to indicate that NOT VALID is also valid for NOT NULL.
>
> Also I think the docs for attnotnull in catalogs.sgml are a bit too
> terse; I would write "The value 't' indicates that a not-null constraint
> exists for the column; 'i' for an invalid constraint, 'f' for none."
> which please feel free to use if you want, but if you want to come up
> with your own wording, that's great too.
>
>
I have made the changes as per the suggestions.



Regards,
Rushabh Lathia
www.EnterpriseDB.com


0001-Convert-pg_attribut.attnotnull-to-char-type.patch
Description: Binary data


0002-Support-NOT-VALID-and-VALIDATE-CONSTRAINT-for-named-.patch
Description: Binary data


0003-Documentation-changes.patch
Description: Binary data


Re: vacuumdb changes for stats import/export

2025-02-27 Thread John Naylor
On Wed, Feb 5, 2025 at 4:44 AM Nathan Bossart  wrote:
> [v2]

I started looking just at 0001, and it seems like a fairly
straightforward rearrangement. I found this comment quite hard to
read:

+ * 'found_objs' should be a fully qualified list of objects to process, as
+ * returned by a previous call to vacuum_one_database().  If *found_objs is
+ * NULL, it is set to the results of the catalog query discussed below.  If
+ * found_objs is NULL, the results of the catalog query are not returned.
+ *
+ * If *found_objs is NULL, this function performs a catalog query to retrieve
+ * the list of tables to process.  When 'objects' is NULL, all tables in the

I had to read it several times before I noticed the difference between
"* found_objs" and "*found_objs". Maybe some extra spacing and breaks
would help, or other reorganization.
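
For what it's worth, a tiny self-contained toy spelling out the pointer
semantics the comment is trying to describe might make the wording easier to
settle on (names and types here are hypothetical, not the real
vacuum_one_database() signature):

#include <stdio.h>

/* Toy stand-in for the found_objs parameter semantics. */
static void
process_tables(char ***found_objs)
{
	static char *catalog_result[] = {"public.t1", "public.t2", NULL};

	if (found_objs == NULL)
	{
		/* Caller does not want the list back: run the query, then discard. */
		puts("ran catalog query, results not returned");
	}
	else if (*found_objs == NULL)
	{
		/* Caller wants the list: run the query, hand it back via *found_objs. */
		*found_objs = catalog_result;
		puts("ran catalog query, results returned to caller");
	}
	else
	{
		/* Caller supplied a list from a previous call: reuse it, no query. */
		puts("reused caller-supplied list, no catalog query");
	}
}

int
main(void)
{
	char	  **found = NULL;

	process_tables(&found);		/* found_objs != NULL, *found_objs == NULL */
	process_tables(&found);		/* *found_objs != NULL */
	process_tables(NULL);		/* found_objs == NULL */
	return 0;
}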

-- 
John Naylor
Amazon Web Services




Re: [BUG]: the walsender does not update its IO statistics until it exits

2025-02-27 Thread Andres Freund
On 2025-02-27 12:09:46 +0900, Michael Paquier wrote:
> I suspect that there would be cases where a single stats kind should be able
> to handle both transactional and non-transactional flush cases.  Splitting
> that across two stats kinds would lead to a lot of duplication.

Agreed. Table stats will be the prime case where that's important.




Re: Some read stream improvements

2025-02-27 Thread Thomas Munro
On Thu, Feb 27, 2025 at 11:19 AM Thomas Munro  wrote:
> On Wed, Feb 26, 2025 at 10:55 PM Andres Freund  wrote:
> > I was working on expanding tests for AIO and as part of that wrote a test
> > for temp tables -- our coverage is fairly awful, there were many times
> > during AIO development where I knew I had trivially reachable temp table
> > specific bugs but all tests passed.
> >
> > The test for that does trigger the problem described above and is fixed
> > by the patches in this thread (which I included in the other thread):
>
> Thanks.  Alright, I'm assuming that you don't have any objections to
> the way I restyled that API, so I'm going to go ahead and push some of
> these shortly, and then follow up with a few newer patches that
> simplify and improve the look-ahead and advice control.  More very
> soon.

Ugh, I realised in another round of self-review that that version
could exceed the soft limit by a small amount if the registered
callback pins more buffers underneath it, so not pushed yet.  I think
I see how to fix that (namely the alternative design that a comment
already contemplated), more soon...




Re: Some read stream improvements

2025-02-27 Thread Andres Freund
On 2025-02-27 11:19:55 +1300, Thomas Munro wrote:
> On Wed, Feb 26, 2025 at 10:55 PM Andres Freund  wrote:
> > I was working on expanding tests for AIO and as part of that wrote a test
> > for temp tables -- our coverage is fairly awful, there were many times
> > during AIO development where I knew I had trivially reachable temp table
> > specific bugs but all tests passed.
> >
> > The test for that does trigger the problem described above and is fixed
> > by the patches in this thread (which I included in the other thread):
> 
> Thanks.  Alright, I'm assuming that you don't have any objections to
> the way I restyled that API, so I'm going to go ahead and push some of
> these shortly, and then follow up with a few newer patches that
> simplify and improve the look-ahead and advice control.  More very
> soon.

Indeed, no objections, rather the opposite. Thanks!

Greetings,

Andres Freund




Re: pg_dump, pg_dumpall, pg_restore: Add --no-policies option

2025-02-27 Thread Greg Sabino Mullane
Re-reviewed this patch: still compiles, tests pass, and does what it says
on the tin. +1

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support


bug: ALTER TABLE ADD VIRTUAL GENERATED COLUMN with table rewrite

2025-02-27 Thread jian he
hi.


bug demo:
CREATE TABLE gtest20a (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
* 2) VIRTUAL);
ALTER TABLE gtest20a ADD COLUMN c float8 DEFAULT RANDOM() CHECK (b < 60);
ERROR:  no generation expression found for column number 2 of table
"pg_temp_17306"

The issue is that in ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
we need to use the existing relation (tab->relid) to get the generated
expression, not OIDNewHeap, since we do not populate the generated
expressions associated with OIDNewHeap.

The attached patch fixes this issue.
From f247e3b8005e17b6446b243f7d03f7d02a5a82d7 Mon Sep 17 00:00:00 2001
From: jian he 
Date: Thu, 27 Feb 2025 22:54:03 +0800
Subject: [PATCH v1 1/1] fix ALTER TABLE ADD VIRTUAL GENERATED COLUMN when
 table rewrite

demo:
CREATE TABLE gtest20a (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) VIRTUAL);
ALTER TABLE gtest20a ADD COLUMN c float8 DEFAULT RANDOM() CHECK (b < 60);
ERROR:  no generation expression found for column number 2 of table "pg_temp_17306"

In ATRewriteTable, the pg_attrdef default expression entries corresponding to
the OIDNewHeap relation (if valid) are not populated, so OIDNewHeap cannot be
used to call expand_generated_columns_in_expr or build_generation_expression.
Therefore, in ATRewriteTable we can only use the existing relation to expand
the generated expression.

discussion: https://postgr.es/m/
---
 src/backend/commands/tablecmds.c| 2 +-
 src/test/regress/expected/generated_virtual.out | 4 
 src/test/regress/sql/generated_virtual.sql  | 3 +++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index ce7d115667e..7d870c186ca 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6149,7 +6149,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap)
 		{
 			case CONSTR_CHECK:
 needscan = true;
-con->qualstate = ExecPrepareExpr((Expr *) expand_generated_columns_in_expr(con->qual, newrel ? newrel : oldrel, 1), estate);
+con->qualstate = ExecPrepareExpr((Expr *) expand_generated_columns_in_expr(con->qual, oldrel, 1), estate);
 break;
 			case CONSTR_FOREIGN:
 /* Nothing to do here */
diff --git a/src/test/regress/expected/generated_virtual.out b/src/test/regress/expected/generated_virtual.out
index b339fbcebfa..107562f10ff 100644
--- a/src/test/regress/expected/generated_virtual.out
+++ b/src/test/regress/expected/generated_virtual.out
@@ -647,6 +647,10 @@ INSERT INTO gtest20a (a) VALUES (10);
 INSERT INTO gtest20a (a) VALUES (30);
 ALTER TABLE gtest20a ADD CHECK (b < 50);  -- fails on existing row
 ERROR:  check constraint "gtest20a_b_check" of relation "gtest20a" is violated by some row
+--table rewrite cases.
+ALTER TABLE gtest20a ADD COLUMN c float8 DEFAULT RANDOM() CHECK (b < 50); -- fails on existing row
+ERROR:  check constraint "gtest20a_b_check" of relation "gtest20a" is violated by some row
+ALTER TABLE gtest20a ADD COLUMN c float8 DEFAULT RANDOM() CHECK (b < 61); --ok.
 CREATE TABLE gtest20b (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) VIRTUAL);
 INSERT INTO gtest20b (a) VALUES (10);
 INSERT INTO gtest20b (a) VALUES (30);
diff --git a/src/test/regress/sql/generated_virtual.sql b/src/test/regress/sql/generated_virtual.sql
index c80630c11a5..09fb1c477e9 100644
--- a/src/test/regress/sql/generated_virtual.sql
+++ b/src/test/regress/sql/generated_virtual.sql
@@ -319,6 +319,9 @@ CREATE TABLE gtest20a (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) VIRT
 INSERT INTO gtest20a (a) VALUES (10);
 INSERT INTO gtest20a (a) VALUES (30);
 ALTER TABLE gtest20a ADD CHECK (b < 50);  -- fails on existing row
+--table rewrite cases.
+ALTER TABLE gtest20a ADD COLUMN c float8 DEFAULT RANDOM() CHECK (b < 50); -- fails on existing row
+ALTER TABLE gtest20a ADD COLUMN c float8 DEFAULT RANDOM() CHECK (b < 61); --ok.
 
 CREATE TABLE gtest20b (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) VIRTUAL);
 INSERT INTO gtest20b (a) VALUES (10);
-- 
2.34.1



Re: Add -k/--link option to pg_combinebackup

2025-02-27 Thread Robert Haas
On Thu, Feb 27, 2025 at 4:50 AM vignesh C  wrote:
> Few comments:
> 1) The new file 010_links.pl that was added should also be included in meson.build:
> 'tests': [
>   't/010_pg_basebackup.pl',
>   't/011_in_place_tablespace.pl',
>   't/020_pg_receivewal.pl',
>   't/030_pg_recvlogical.pl',
>   't/040_pg_createsubscriber.pl',
> ],

Also give it a different number than any existing file -- if we
already have an 010 in that directory then make this one something
else. 012, 050, or whatever makes most sense.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_dump, pg_dumpall, pg_restore: Add --no-policies option

2025-02-27 Thread vignesh C
On Wed, 15 Jan 2025 at 13:21, jian he  wrote:
>
> hi.
>
> around src/bin/pg_dump/pg_dump.c line 1117
> right after "ropt->no_comments = dopt.no_comments;"
> we also need to add
> ropt->no_policies = dopt.no_policies;
> ?
>
> overall looks good to me.
> The tests seem wrong per
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5499
> I have no idea how to improve the test.

Here is a rebased version along with the test failure fixes; please
accept the change if you are OK with it.

Regards,
Vignesh
From 2df5cd6cb3f0679f239b3aede3a98e122be2 Mon Sep 17 00:00:00 2001
From: Vignesh 
Date: Thu, 27 Feb 2025 15:32:45 +0530
Subject: [PATCH v3] pg_dump, pg_dumpall, pg_restore: Add --no-policies option

Add --no-policies option to control row level security policy handling
in dump and restore operations. When this option is used, both CREATE
POLICY commands and ALTER TABLE ... ENABLE ROW LEVEL SECURITY commands
are excluded from dumps and skipped during restores.

This is useful in scenarios where policies need to be redefined in the
target system or when moving data between environments with different
security requirements.
---
 doc/src/sgml/ref/pg_dump.sgml|  9 +
 doc/src/sgml/ref/pg_dumpall.sgml |  9 +
 doc/src/sgml/ref/pg_restore.sgml | 10 ++
 src/bin/pg_dump/pg_backup.h  |  2 ++
 src/bin/pg_dump/pg_backup_archiver.c |  7 +++
 src/bin/pg_dump/pg_dump.c|  6 ++
 src/bin/pg_dump/pg_dumpall.c |  5 +
 src/bin/pg_dump/pg_restore.c |  4 
 src/bin/pg_dump/t/002_pg_dump.pl | 15 +++
 9 files changed, 67 insertions(+)

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 1975054d7bf..0ae40f9be58 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1105,6 +1105,15 @@ PostgreSQL documentation
   
  
 
+ 
+  --no-policies
+  
+   
+Do not dump row security policies.
+   
+  
+ 
+
  
   --no-publications
   
diff --git a/doc/src/sgml/ref/pg_dumpall.sgml b/doc/src/sgml/ref/pg_dumpall.sgml
index c2fa5be9519..ae5afb3c7d5 100644
--- a/doc/src/sgml/ref/pg_dumpall.sgml
+++ b/doc/src/sgml/ref/pg_dumpall.sgml
@@ -441,6 +441,15 @@ exclude database PATTERN
   
  
 
+ 
+  --no-policies
+  
+   
+Do not dump row security policies.
+   
+  
+ 
+
  
   --no-publications
   
diff --git a/doc/src/sgml/ref/pg_restore.sgml b/doc/src/sgml/ref/pg_restore.sgml
index 199ea3345f3..35140187807 100644
--- a/doc/src/sgml/ref/pg_restore.sgml
+++ b/doc/src/sgml/ref/pg_restore.sgml
@@ -723,6 +723,16 @@ PostgreSQL documentation
   
  
 
+ 
+  --no-policies
+  
+   
+Do not output commands to restore row security policies, even if
+the archive contains them.
+   
+  
+ 
+
  
   --no-publications
   
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index e783cc68d89..ada80c6cf9a 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -111,6 +111,7 @@ typedef struct _restoreOptions
 	int			column_inserts;
 	int			if_exists;
 	int			no_comments;	/* Skip comments */
+	int			no_policies;	/* Skip row security policies */
 	int			no_publications;	/* Skip publication entries */
 	int			no_security_labels; /* Skip security label entries */
 	int			no_subscriptions;	/* Skip subscription entries */
@@ -183,6 +184,7 @@ typedef struct _dumpOptions
 	int			no_comments;
 	int			no_security_labels;
 	int			no_publications;
+	int			no_policies;	/* Skip row security policies */
 	int			no_subscriptions;
 	int			no_toast_compression;
 	int			no_unlogged_table_data;
diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 632077113a4..7453d5a6fdf 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -188,6 +188,7 @@ dumpOptionsFromRestoreOptions(RestoreOptions *ropt)
 	dopt->disable_dollar_quoting = ropt->disable_dollar_quoting;
 	dopt->dump_inserts = ropt->dump_inserts;
 	dopt->no_comments = ropt->no_comments;
+	dopt->no_policies = ropt->no_policies;
 	dopt->no_publications = ropt->no_publications;
 	dopt->no_security_labels = ropt->no_security_labels;
 	dopt->no_subscriptions = ropt->no_subscriptions;
@@ -2966,6 +2967,12 @@ _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH)
 	if (ropt->no_comments && strcmp(te->desc, "COMMENT") == 0)
 		return 0;
 
+	/* If it's a policy, maybe ignore it */
+	if (ropt->no_policies &&
+		(strcmp(te->desc, "POLICY") == 0 ||
+		 strcmp(te->desc, "ROW SECURITY") == 0))
+		return 0;
+
 	/*
 	 * If it's a publication or a table part of a publication, maybe ignore
 	 * it.
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 7c38c89bf08..6a39464709f 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -508,6 +508

Patch: Cover POSITION(bytea,bytea) with tests

2025-02-27 Thread Aleksander Alekseev
Hi,

This function is currently not covered by any tests. The proposed
patch fixes this.

--
Best regards,
Aleksander Alekseev


v1-0001-Cover-POSITION-bytea-bytea-with-tests.patch
Description: Binary data


Re: Proposal: Limitations of palloc inside checkpointer

2025-02-27 Thread Maxim Orlov
I tried to implement idea (4); the patch is attached.

But there is a problem: when we release the lock, call RememberSyncRequest(),
and then acquire the lock again, CompactCheckpointerRequestQueue() may have
been called in the meantime and the state of the request array may have
changed. Of course, we could recheck num_requests after retaking the lock and
restart the algorithm all over again, but that is not a great idea, since we
could get stuck in this loop if someone keeps pushing requests into the queue.
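
To make that concrete, the shape of the problem is roughly the following (a
sketch of the idea only, not the attached patch; names as in checkpointer.c
and sync.c):

LWLockAcquire(CheckpointerCommLock, LW_EXCLUSIVE);
while (CheckpointerShmem->num_requests > 0)
{
	CheckpointerRequest req;

	req = CheckpointerShmem->requests[--CheckpointerShmem->num_requests];

	/* Drop the lock so backends are not blocked while we absorb... */
	LWLockRelease(CheckpointerCommLock);
	RememberSyncRequest(&req.ftag, req.type);
	LWLockAcquire(CheckpointerCommLock, LW_EXCLUSIVE);

	/*
	 * ...but while the lock was released, a backend may have run
	 * CompactCheckpointerRequestQueue(), rewriting requests[] and
	 * num_requests, so the index bookkeeping above is now stale.
	 * Rechecking num_requests and restarting could spin forever while
	 * requests keep arriving.
	 */
}
LWLockRelease(CheckpointerCommLock);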

As for case (3): in fact, the described problem happens only with high
enough values of NBuffers, so the user already expects Postgres to use a huge
amount of RAM. Is it really a problem if it demands some more to process the
sync requests?

-- 
Best regards,
Maxim Orlov.


v1-0001-AbsorbSyncRequests-incrementally-instead-of-doing.patch
Description: Binary data


Re: proposal - plpgsql - support standard syntax for named arguments for cursors

2025-02-27 Thread Pavel Stehule
Hi

út 25. 2. 2025 v 6:32 odesílatel Pavel Stehule 
napsal:

> Hi
>
> po 24. 2. 2025 v 21:05 odesílatel Gilles Darold 
> napsal:
>
>> Review:
>>
>> This patch claims to add the SQL/PSM named-argument syntax to cursors, and
>> that is exactly what it does.
>>
>>   It compiles without error against the current master code and all tests
>> pass successfully.
>>
>> I think it could be ready to be committed.
>>
>>
>> Note for the committer: does it make sense to mention in the
>> documentation that this standard SQL/PSM syntax is preferred over the PG
>> syntax?
>>
>
I modified the doc in the same manner as the function's named arguments are described.

Regards

Pavel


>
>>
>> Best regards,
>>
>
> Thank you for review
>
> Pavel
>
>
>>
>> --
>> Gilles Darold
>> http://www.darold.net/
>>
>>
From aa39b0ad4fe87c33ae889bbcfa9a81e973605fd8 Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Thu, 27 Feb 2025 16:35:52 +0100
Subject: [PATCH 2/2] Separate old (proprietary) syntax to own para with note
 so it is supported just for compatibility reasons.

---
 doc/src/sgml/plpgsql.sgml | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index 60af57712b7..2acdf43b4a1 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -3317,7 +3317,7 @@ OPEN curs1 FOR EXECUTE format('SELECT * FROM %I WHERE col1 = $1',tabname) USING
  Opening a Bound Cursor
 
 
-OPEN bound_cursorvar  (  argument_name { := | => }  argument_value , ... ) ;
+OPEN bound_cursorvar  (  argument_name { => | := }  argument_value , ... ) ;
 
 
  
@@ -3340,7 +3340,7 @@ OPEN bound_cursorvar  (  positional
   or named notation.  In positional
   notation, all arguments are specified in order.  In named notation,
-  each argument's name is specified using := to
+  each argument's name is specified using => to
   separate it from the argument expression. Similar to calling
   functions, described in , it
   is also allowed to mix positional and named notation.
@@ -3351,11 +3351,18 @@ OPEN bound_cursorvar  (  
 OPEN curs2;
 OPEN curs3(42);
-OPEN curs3(key := 42);
 OPEN curs3(Key => 42);
 
  
 
+ 
+  An older syntax based on := is supported for backward
+  compatibility:
+
+OPEN curs3(key := 42);
+
+ 
+
  
   Because variable substitution is done on a bound cursor's query,
   there are really two ways to pass values into the cursor: either
-- 
2.48.1

From 0ac81502c81c9359b7f9ecbd77e5edd34f410a15 Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Sat, 8 Feb 2025 07:45:58 +0100
Subject: [PATCH 1/2] allow to use standard syntax for named arguments for
 plpgsql cursor arguments

---
 doc/src/sgml/plpgsql.sgml |  3 ++-
 src/pl/plpgsql/src/pl_gram.y  | 13 -
 src/test/regress/expected/plpgsql.out |  3 ++-
 src/test/regress/sql/plpgsql.sql  |  3 ++-
 4 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index 78e4983139b..60af57712b7 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -3317,7 +3317,7 @@ OPEN curs1 FOR EXECUTE format('SELECT * FROM %I WHERE col1 = $1',tabname) USING
  Opening a Bound Cursor
 
 
-OPEN bound_cursorvar  (  argument_name :=  argument_value , ... ) ;
+OPEN bound_cursorvar  (  argument_name { := | => }  argument_value , ... ) ;
 
 
  
@@ -3352,6 +3352,7 @@ OPEN bound_cursorvar  (   42);
 
  
 
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 8048e040f81..dc0ae113184 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -3955,9 +3955,12 @@ read_cursor_args(PLpgSQL_var *cursor, int until, YYSTYPE *yylvalp, YYLTYPE *yyll
 	tok2;
 		int			arglocation;
 
-		/* Check if it's a named parameter: "param := value" */
+		/*
+		 * Check if it's a named parameter: "param := value"
+		 * or "param => value"
+		 */
 		plpgsql_peek2(&tok1, &tok2, &arglocation, NULL, yyscanner);
-		if (tok1 == IDENT && tok2 == COLON_EQUALS)
+		if (tok1 == IDENT && (tok2 == COLON_EQUALS || tok2 == EQUALS_GREATER))
 		{
 			char	   *argname;
 			IdentifierLookup save_IdentifierLookup;
@@ -3983,11 +3986,11 @@ read_cursor_args(PLpgSQL_var *cursor, int until, YYSTYPE *yylvalp, YYLTYPE *yyll
 		 parser_errposition(*yyllocp)));
 
 			/*
-			 * Eat the ":=". We already peeked, so the error should never
-			 * happen.
+			 * Eat the ":=" and the "=>". We already peeked, so the error should
+			 * never happen.
 			 */
 			tok2 = yylex(yylvalp, yyllocp, yyscanner);
-			if (tok2 != COLON_EQUALS)
+			if (tok2 != COLON_EQUALS && tok2 != EQUALS_GREATER)
 yyerror(yyllocp, NULL, yyscanner, "syntax error");
 
 			any_named = true;
diff --git a/src/test/regress/expected/plpgsql.out b/src/test/regress/expected/plpgsql.out
index 0a6945581bd..31aa806e491 100644
--- a/src/test/r

Re: Draft for basic NUMA observability

2025-02-27 Thread Bertrand Drouvot
Hi,

On Wed, Feb 26, 2025 at 09:38:20AM +0100, Jakub Wartak wrote:
> On Mon, Feb 24, 2025 at 3:06 PM Andres Freund  wrote:
> >
> > Why is this done before we even have gotten -2 back? Even if we need it, it
> > seems like we ought to defer this until necessary.
> 
> Not fixed yet: maybe we could even do a `static` with
> `has_this_run_earlier` and just perform this work only once, the first
> time through?

Not sure I get your idea, could you share what the code would look like?
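Or is it something along these lines that you have in mind? (a minimal
sketch, assuming the work in question is backend-local and safe to skip once
it has been done):

static bool has_this_run_earlier = false;

if (!has_this_run_earlier)
{
	/* ... do the one-time setup work here ... */
	has_this_run_earlier = true;
}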

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



