Re: [PATCH] allow pg_current_logfile() execution under pg_monitor role

2024-02-13 Thread Daniel Gustafsson
> On 13 Feb 2024, at 22:29, Nathan Bossart  wrote:
> 
> On Mon, Feb 12, 2024 at 09:49:45AM -0600, Nathan Bossart wrote:
>> Okay.  I'll plan on committing this in the next few days.
> 
> Here is what I have staged for commit.

LGTM.

--
Daniel Gustafsson





Re: Allow passing extra options to initdb for tests

2024-02-13 Thread Robert Haas
On Tue, Feb 6, 2024 at 4:24 PM Peter Eisentraut  wrote:
> I think this can be useful for a wide variety of uses, like running all
> tests with checksums enabled, or with JIT enabled, or with different GUC
> settings, or with different locale settings.  (The existing pg_regress
> --no-locale option is essentially a special case of this, but it only
> provides one particular locale setting, not things like changing the
> default provider etc.)
>
> Of course, not all tests are going to pass with arbitrary options, but
> it is useful to run this against specific test suites.

+1.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




A failure in t/001_rep_changes.pl

2024-02-13 Thread Bharath Rupireddy
Hi,

I recently observed an assertion failure twice in t/001_rep_changes.pl
on HEAD, with the backtrace [1], on my dev EC2 c5.4xlarge instance [2].
Unfortunately, I have not been able to reproduce it since, and I haven't
had a chance to dive deep into it. However, I'm posting it here for the
record, in case something can be derived from the backtrace.

[1] t/001_rep_changes.pl

2024-01-31 12:24:38.474 UTC [840166]
pg_16435_sync_16393_7330237333761601891 STATEMENT:
DROP_REPLICATION_SLOT pg_16435_sync_16393_7330237333761601891 WAIT
TRAP: failed Assert("list->head != INVALID_PGPROCNO"), File:
"../../../../src/include/storage/proclist.h", Line: 101, PID: 840166
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(ExceptionalCondition+0xbb)[0x55c8edf6b8f9]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(+0x6637de)[0x55c8edd517de]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(ConditionVariablePrepareToSleep+0x85)[0x55c8edd51b91]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(ReplicationSlotAcquire+0x142)[0x55c8edcead6b]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(ReplicationSlotDrop+0x51)[0x55c8edceb47f]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(+0x60da71)[0x55c8edcfba71]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(exec_replication_command+0x47e)[0x55c8edcfc96a]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(PostgresMain+0x7df)[0x55c8edd7d644]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(+0x5ab50c)[0x55c8edc9950c]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(+0x5aab21)[0x55c8edc98b21]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(+0x5a70de)[0x55c8edc950de]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(PostmasterMain+0x1534)[0x55c8edc949db]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(+0x459c47)[0x55c8edb47c47]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f19fe629d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f19fe629e40]
postgres: publisher: walsender ubuntu postgres [local]
DROP_REPLICATION_SLOT(_start+0x25)[0x55c8ed7c4565]
2024-01-31 12:24:38.476 UTC [840168]
pg_16435_sync_16390_7330237333761601891 LOG:  statement: SELECT
a.attnum,   a.attname,   a.atttypid,   a.attnum =
ANY(i.indkey)  FROM pg_catalog.pg_attribute a  LEFT JOIN
pg_catalog.pg_index i   ON (i.indexrelid =
pg_get_replica_identity_index(16391)) WHERE a.attnum >
0::pg_catalog.int2   AND NOT a.attisdropped AND a.attgenerated = ''
AND a.attrelid = 16391 ORDER BY a.attnum

[2] Linux ip-000-00-0-000 6.2.0-1018-aws #18~22.04.1-Ubuntu SMP Wed
Jan 10 22:54:16 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: About a recently-added message

2024-02-13 Thread Kyotaro Horiguchi
At Wed, 14 Feb 2024 16:26:52 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> > "wal_level" must be >= logical.
..
> > wal_level must be set to "replica" or "logical" at server start.
..
> I suggest making the quoting policy consistent.

Just after this, I found another inconsistency regarding quotation.

> 'dbname' must be specified in "%s".

The use of single quotes doesn't seem to comply with our standard.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

2024-02-13 Thread Andrei Lepikhov

On 14/2/2024 13:32, Ashutosh Bapat wrote:

On Wed, Feb 14, 2024 at 9:50 AM Andrei Lepikhov
 wrote:


On 30/1/2024 12:44, Ashutosh Bapat wrote:

Thanks Vignesh. PFA patches rebased on the latest HEAD. The patch
addressing Amit's comments is still a separate patch for him to
review.

Thanks for this improvement. Working with partitions, I frequently see
peaks of memory consumption during planning. So, maybe one more case can
be resolved here.
Patch 0001 looks good. I'm not sure about free_child_sjinfo_members. Do
we really need it as a separate routine? It might be better to inline
this code.


try_partitionwise_join() is already 200 lines long. A separate
function is better than adding more lines to try_partitionwise_join().
Also, someone who wants to call build_child_join_sjinfo() outside
try_partitionwise_join() may find free_child_sjinfo_members() handy.

Makes sense, thanks.

Patch 0002 adds valuable comments, and I'm OK with that.

Also, as I remember, some extensions, such as pg_hint_plan, call
build_child_join_sjinfo. It is OK to break the interface with a major
version. But what if they need child_sjinfo a bit longer and collect
links to this structure? I don't think it is a real stopper, but it is
worth additional analysis.


If these extensions call build_child_join_sjinfo() and do not call
free_child_sjinfo_members, they can keep their child sjinfo as long as
they want. I didn't understand your concern.
Nothing special. This patch makes external code responsible for
allocation of the structure, which adds more ways to get something
wrong. The current code looks good.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: Why is subscription/t/031_column_list.pl failing so much?

2024-02-13 Thread vignesh C
On Wed, 7 Feb 2024 at 16:27, vignesh C  wrote:
>
> On Wed, 7 Feb 2024 at 15:21, Amit Kapila  wrote:
> >
> > On Tue, Feb 6, 2024 at 8:21 PM Tom Lane  wrote:
> > >
> > > Amit Kapila  writes:
> > > > Yeah, I was worried about that. The other idea I have previously
> > > > thought was to change Alter Subscription to Drop+Create Subscription.
> > > > That should also help in bringing stability without losing any
> > > > functionality.
> > >
> > > Hm, why would that fix it?
> > >
> >
> > Because for new subscriptions, we will start reading WAL from the
> > latest WAL insert pointer on the publisher which will be after the
> > point where publication is created.
>
> I was able to reproduce the issue consistently with the changes shared
> by Tom Lane at [1].
> I have made changes to change ALTER SUBSCRIPTION to DROP+CREATE
> SUBSCRIPTION and verified that the test has passed consistently for
> >50 runs that I ran. Also the test execution time increased for this
> case is very negligibly:
> Without patch: 7.991 seconds
> With test change patch:   8.121 seconds
>
> The test changes for the same are attached.

Alternatively, this could also be fixed with the changes proposed by
Amit at [1]. In this case we ignore publications that are not found
for the purpose of computing RelSyncEntry attributes. We won't mark
such an entry as valid till all the publications are loaded without
anything missing. This means we won't publish operations on tables
corresponding to that publication till we find such a publication,
which seems okay.

At [2], Tomas had raised a performance issue: when the publications are
invalid, we are forced to reload them for every replicated change/row.
How about keeping the default behaviour as it is and providing a new
option, skip_not_exist_publication, while creating/altering a
subscription? If skip_not_exist_publication is specified, we will
ignore missing publications; the publications will be kept invalid and
get validated later.

The attached patch has the changes for the same. Thoughts?
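
To illustrate, a rough sketch of how the proposed option might be used
(the option name is taken from the attached patch; the connection string
and publication names are placeholders, and this syntax is not committed):

-- hypothetical example; pub_not_yet_created does not exist on the publisher yet
CREATE SUBSCRIPTION sub1
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION pub_existing, pub_not_yet_created
    WITH (skip_not_exist_publication = true);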

[1] - 
https://www.postgresql.org/message-id/CAA4eK1%2BT-ETXeRM4DHWzGxBpKafLCp__5bPA_QZfFQp7-0wj4Q%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/dc08add3-10a8-738b-983a-191c7406707b%40enterprisedb.com

Regards,
Vignesh
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 406a3c2dd1..404577725e 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -74,6 +74,7 @@ GetSubscription(Oid subid, bool missing_ok)
 	sub->passwordrequired = subform->subpasswordrequired;
 	sub->runasowner = subform->subrunasowner;
 	sub->failover = subform->subfailover;
+	sub->skipnotexistpub = subform->subskipnotexistpub;
 
 	/* Get conninfo */
 	datum = SysCacheGetAttrNotNull(SUBSCRIPTIONOID,
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 6791bff9dd..bdd367d272 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1359,7 +1359,8 @@ REVOKE ALL ON pg_subscription FROM public;
 GRANT SELECT (oid, subdbid, subskiplsn, subname, subowner, subenabled,
   subbinary, substream, subtwophasestate, subdisableonerr,
 			  subpasswordrequired, subrunasowner, subfailover,
-  subslotname, subsynccommit, subpublications, suborigin)
+  subskipnotexistpub, subslotname, subsynccommit, subpublications,
+  suborigin)
 ON pg_subscription TO public;
 
 CREATE VIEW pg_stat_subscription_stats AS
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index a05d69922d..bd59efc73a 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -72,6 +72,7 @@
 #define SUBOPT_FAILOVER0x2000
 #define SUBOPT_LSN	0x4000
 #define SUBOPT_ORIGIN0x8000
+#define SUBOPT_SKIP_NOT_EXISTS_PUB  0x0001
 
 /* check if the 'val' has 'bits' set */
 #define IsSet(val, bits)  (((val) & (bits)) == (bits))
@@ -97,6 +98,7 @@ typedef struct SubOpts
 	bool		passwordrequired;
 	bool		runasowner;
 	bool		failover;
+	bool		skipnotexistpub;
 	char	   *origin;
 	XLogRecPtr	lsn;
 } SubOpts;
@@ -159,6 +161,8 @@ parse_subscription_options(ParseState *pstate, List *stmt_options,
 		opts->runasowner = false;
 	if (IsSet(supported_opts, SUBOPT_FAILOVER))
 		opts->failover = false;
+	if (IsSet(supported_opts, SUBOPT_SKIP_NOT_EXISTS_PUB))
+		opts->skipnotexistpub = false;
 	if (IsSet(supported_opts, SUBOPT_ORIGIN))
 		opts->origin = pstrdup(LOGICALREP_ORIGIN_ANY);
 
@@ -316,6 +320,15 @@ parse_subscription_options(ParseState *pstate, List *stmt_options,
 			opts->specified_opts |= SUBOPT_FAILOVER;
 			opts->failover = defGetBoolean(defel);
 		}
+		else if (IsSet(supported_opts, SUBOPT_SKIP_NOT_EXISTS_PUB) &&
+ strcmp(defel->defname, "skip_not_exist_publication") == 0)
+		{
+			if (IsSet(opts->specified_opts, 

About a recently-added message

2024-02-13 Thread Kyotaro Horiguchi
A recent commit added the following message:

> "wal_level" must be >= logical.

The use of the term "logical" here is a bit confusing, as it's unclear
whether it's meant to be a natural language word or a token. (I
believe it to be a token.)

On the contrary, we already have the following message:

> wal_level must be set to "replica" or "logical" at server start.

These two messages have conflicting policies about quoting variable
names and enum values.

I suggest making the quoting policy consistent.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Parent/child context relation in pg_get_backend_memory_contexts()

2024-02-13 Thread Michael Paquier
On Fri, Jan 19, 2024 at 05:41:45PM +0900, torikoshia wrote:
> We already have additional description below the table which explains each
> column of the system view. For example pg_locks:
> https://www.postgresql.org/docs/devel/view-pg-locks.html

I was reading the patch, and using an int[] as the representation of the
path of context IDs up to the top-most parent looks a bit strange to me:
the parent -> child relationship is preserved, visibly, only by the order
of the elements in this array, which is made of temporary IDs compiled
on-the-fly during the function execution.  Am I the only one finding that
a bit strange?  Could it be better to use a different data type for this
path and perhaps switch to the names of the contexts involved?

It is possible to retrieve this information with some WITH RECURSIVE as
well, as mentioned upthread.  Perhaps we could consider documenting
these tricks?
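
For reference, a rough sketch of that WITH RECURSIVE trick over the
existing pg_backend_memory_contexts view (note that "parent" is only a
context name, not a unique ID, so the join can be ambiguous for
duplicated names, which is part of what the patch tries to address):

-- sketch only: walks from TopMemoryContext down, building a name path
WITH RECURSIVE ctx AS (
    SELECT name, parent, level, total_bytes, ARRAY[name] AS path
    FROM pg_backend_memory_contexts
    WHERE parent IS NULL
  UNION ALL
    SELECT c.name, c.parent, c.level, c.total_bytes, ctx.path || c.name
    FROM pg_backend_memory_contexts c
    JOIN ctx ON c.parent = ctx.name AND c.level = ctx.level + 1
)
SELECT path, total_bytes FROM ctx ORDER BY path;
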
--
Michael




Re: DSA_ALLOC_NO_OOM doesn't work

2024-02-13 Thread Robert Haas
On Tue, Feb 13, 2024 at 7:53 PM Heikki Linnakangas  wrote:
> However, I must say that the dsm_impl_op() interface is absolutely
> insane. It's like someone looked at ioctl() and thought, "hey that's a
> great idea!".

As the person who wrote that code, this made me laugh.

I agree it's not the prettiest interface, but I thought that was OK
considering that it should only ever have a very limited number of
callers. I believe I did it this way in the interest of code
compactness. Since there are four DSM implementations, I wanted the
implementation-specific code to be short and all in one place, and
jamming it all into one function served that purpose. Also, there's a
bunch of logic that is shared by multiple operations - detach and
destroy tend to be similar, and so do create and attach, and there are
even things that are shared across all operations, like the snprintf
at the top of dsm_impl_posix() or the slightly larger amount of
boilerplate at the top of dsm_impl_sysv().

I'm not particularly opposed to refactoring this to make it nicer, but
my judgement was that splitting it up into one function per operation
per implementation, say, would have involved a lot of duplication of
small bits of code that might then get out of sync with each other
over time. By doing it this way, the logic is a bit tangled -- or
maybe more than a bit -- but there's very little duplication because
each implementation gets jammed into the smallest possible box. I'm OK
with somebody deciding that I got the trade-offs wrong there, but I
will be interested to see the number of lines added vs. removed in any
future refactoring patch.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: index prefetching

2024-02-13 Thread Robert Haas
On Thu, Feb 8, 2024 at 3:18 AM Melanie Plageman
 wrote:
> - kill prior tuple
>
> This optimization doesn't work with index prefetching with the current
> design. Kill prior tuple relies on alternating between fetching a
> single index tuple and visiting the heap. After visiting the heap we
> can potentially kill the immediately preceding index tuple. Once we
> fetch multiple index tuples, enqueue their TIDs, and later visit the
> heap, the next index page we visit may not contain all of the index
> tuples deemed killable by our visit to the heap.

Is this maybe just a bookkeeping problem? A Boolean that says "you can
kill the prior tuple" is well-suited if and only if the prior tuple is
well-defined. But perhaps it could be replaced with something more
sophisticated that tells you which tuples are eligible to be killed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add system identifier to backup manifest

2024-02-13 Thread Amul Sul
On Fri, Feb 2, 2024 at 12:03 AM Robert Haas  wrote:

> On Thu, Feb 1, 2024 at 2:18 AM Amul Sul  wrote:
> > I intended to minimize the out param of parse_manifest_file(), which
> currently
> > returns manifest_files_hash and manifest_wal_range, and I need two more
> --
> > manifest versions and the system identifier.
>
> Sure, but you could do context.ht = manifest_data->files instead of
> context.manifest = manifest_data. The question isn't whether you
> should return the whole manifest_data from parse_manifest_file -- I
> agree with that decision -- but rather whether you should feed the
> whole thing through into the context, or just the file hash.
>
> > Yeah, we can do that, but I think it is a bit inefficient to have
> strcmp()
> > check for the pg_control file on each verify_backup_file() call,
> despite, we
> > know that path.  Also, I think, we need additional handling to ensure
> that the
> > system identifier has been verified in verify_backup_file(), what if the
> > pg_control file itself missing from the backup -- might be a rare case,
> but
> > possible.
> >
> > For now, we can do the system identifier validation after
> > verify_backup_directory().
>
> Yes, that's another option, but I don't think it's as good.
>
> Suppose you do it that way. Then what will happen when the file is
> altogether missing or inaccessible? I think verify_backup_file() will
> complain, and then you'll have to do something ugly to avoid having
> verify_system_identifier() emit the same complaint all over again.
> Remember, unless you find some complicated way of passing data around,
> it won't know whether verify_backup_file() emitted a warning or not --
> it can redo the stat() and see what happens, but it's not absolutely
> guaranteed to be the same as what happened before. Making sure that
> you always emit any given complaint once rather than twice or zero
> times is going to be tricky.
>
> It seems more natural to me to just accept the (small) cost of a
> strcmp() in verify_backup_file(). If the initial stat() fails, it
> emits whatever complaint is appropriate and returns and the logic to
> check the system identifier is never reached. If it succeeds, you can
> proceed to try to open the file and do what you need to do.
>

Ok, I did it that way in the attached version. I have passed the control
file's full path, which we get in verify_backup_file(), as a second argument
to verify_system_identifier(), but it doesn't do anything useful with it,
since we use get_controlfile() to open the control file, which takes the
directory as input and computes the full path on its own.

Regards,
Amul


v6-0002-Add-system-identifier-to-backup-manifest.patch
Description: Binary data


v6-0001-pg_verifybackup-code-refactor.patch
Description: Binary data


Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-13 Thread Michael Paquier
On Wed, Feb 14, 2024 at 02:08:51PM +0900, Sutou Kouhei wrote:
> I understand the feeling. SQL uses "prepared" for "prepared
> statement". There are similar function names such as
> InputFunctionCall()/InputFunctionCallSafe()/DirectInputFunctionCallSafe(). 
> They
> execute (call) an input function but they use "call" not
> "execute" for it... So "Execute...Call..." may be
> redundant...
> 
> How about InputFunctionCallSafeWithInfo(),
> InputFunctionCallSafeInfo() or
> InputFunctionCallInfoCallSafe()?

WithInfo() would not be a new thing.  There are a couple of APIs named
like this when manipulating catalogs, so that sounds kind of a good
choice from here.
--
Michael




Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

2024-02-13 Thread Ashutosh Bapat
On Wed, Feb 14, 2024 at 9:50 AM Andrei Lepikhov
 wrote:
>
> On 30/1/2024 12:44, Ashutosh Bapat wrote:
> > Thanks Vignesh. PFA patches rebased on the latest HEAD. The patch
> > addressing Amit's comments is still a separate patch for him to
> > review.
> Thanks for this improvement. Working with partitions, I frequently see
> peaks of memory consumption during planning. So, maybe one more case can
> be resolved here.
> Patch 0001 looks good. I'm not sure about free_child_sjinfo_members. Do
> we really need it as a separate routine? It might be better to inline
> this code.

try_partitionwise_join() is already 200 lines long. A separate
function is better than adding more lines to try_partitionwise_join().
Also, someone who wants to call build_child_join_sjinfo() outside
try_partitionwise_join() may find free_child_sjinfo_members() handy.

> Patch 0002 adds valuable comments, and I'm OK with that.
>
> Also, as I remember, some extensions, such as pg_hint_plan, call
> build_child_join_sjinfo. It is OK to break the interface with a major
> version. But what if they need child_sjinfo a bit longer and collect
> links to this structure? I don't think it is a real stopper, but it is
> worth additional analysis.

If these extensions call build_child_join_sjinfo() and do not call
free_child_sjinfo_members, they can keep their child sjinfo as long as
they want. I didn't understand your concern.

-- 
Best Wishes,
Ashutosh Bapat




Re: RFC: Logging plan of the running query

2024-02-13 Thread torikoshia

On 2024-02-13 11:30, torikoshia wrote:


I'll update the patch, including other points such as removing the code
that ensures no page locks are held.


Updated the patch.


Using injection point support we should be able to add tests for
testing pg_log_query_plan behaviour when there are page locks held or
when auto_explain (with instrumentation) and pg_log_query_plan() work
on the same query plan. Use an injection point to make the backend
running the query wait at a suitable point to delay its execution, and
fire pg_log_query_plan() from another backend. Maybe the same test could
examine the server log file to see if the plan is indeed output to the
server log file.


The attached patch uses injection points as below:

- There may be more points to inject, but I added an injection point at
ExecutorRun(), which seems to be the first interruption point where
plans can be reliably displayed.
- At the injection point, it would be possible to wait for some duration
and fire pg_log_query_plan() as you suggested. However, I'm not sure
what duration is appropriate considering the variety of testing
environments. Instead, the attached patch calls
HandleLogQueryPlanInterrupt() directly and sets InterruptPending.
- The tests check the output of both pg_log_query_plan() and
auto_explain, and verify that the logged plans are the same.
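
For context, a sketch of how the proposed function is meant to be invoked
(assuming a signature analogous to pg_log_backend_memory_contexts(pid), as
in the attached patch; this is proposed, not committed, functionality):

-- ask active backends other than ourselves to log their current plans
SELECT pg_log_query_plan(pid)
FROM pg_stat_activity
WHERE state = 'active' AND pid <> pg_backend_pid();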



--
Regards,

--
Atsushi Torikoshi
NTT DATA Group CorporationFrom 1dcac4fb4291e3b92733494624cbb090dff7aded Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Wed, 14 Feb 2024 14:41:04 +0900
Subject: [PATCH v36] Add function to log the plan of the query

Currently, we have to wait for the query execution to finish
to check its plan. This is not so convenient when
investigating long-running queries on production environments
where we cannot use debuggers.
To improve this situation, this patch adds
pg_log_query_plan() function that requests to log the
plan of the specified backend process.

By default, only superusers are allowed to request to log the
plans because allowing any users to issue this request at an
unbounded rate would cause lots of log messages and which can
lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its plan at LOG_SERVER_ONLY level, so
that these plans will appear in the server log but not be sent
to the client.

Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar,
Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby, Kyotaro
Horiguchi, Robert Treat, Alena Rybakina, Ashutosh Bapat,
Jian He

Co-authored-by: James Coleman 

Discussion: https://www.postgresql.org/message-id/cf8501bcd95ba4d727cbba886ba9eea8%40oss.nttdata.com
Discussion: https://www.postgresql.org/message-id/flat/d68c3ae31672664876b22d2dcbb526d2%40postgrespro.ru
---
 contrib/auto_explain/Makefile |   2 +
 contrib/auto_explain/auto_explain.c   |  23 +--
 contrib/auto_explain/t/001_auto_explain.pl|  35 
 doc/src/sgml/func.sgml|  50 +
 src/backend/access/transam/xact.c |  17 ++
 src/backend/catalog/system_functions.sql  |   2 +
 src/backend/commands/explain.c| 180 +-
 src/backend/executor/execMain.c   |  19 ++
 src/backend/storage/ipc/procsignal.c  |   4 +
 src/backend/storage/lmgr/lock.c   |   9 +-
 src/backend/tcop/postgres.c   |   4 +
 src/backend/utils/init/globals.c  |   2 +
 src/include/catalog/pg_proc.dat   |   6 +
 src/include/commands/explain.h|   9 +
 src/include/miscadmin.h   |   1 +
 src/include/storage/lock.h|   2 -
 src/include/storage/procsignal.h  |   1 +
 src/include/tcop/pquery.h |   2 +-
 src/include/utils/elog.h  |   1 +
 .../injection_points/injection_points.c   |  11 ++
 src/test/regress/expected/misc_functions.out  |  54 +-
 src/test/regress/sql/misc_functions.sql   |  41 +++-
 22 files changed, 427 insertions(+), 48 deletions(-)

diff --git a/contrib/auto_explain/Makefile b/contrib/auto_explain/Makefile
index efd127d3ca..64fe0e0573 100644
--- a/contrib/auto_explain/Makefile
+++ b/contrib/auto_explain/Makefile
@@ -8,6 +8,8 @@ PGFILEDESC = "auto_explain - logging facility for execution plans"
 
 TAP_TESTS = 1
 
+EXTRA_INSTALL = src/test/modules/injection_points
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 677c135f59..e041b10b0e 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -401,26 +401,9 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
 			es->format = auto_explain_log_format;
 			es->settings = auto_explain_log_settings;
 
-			ExplainBeginOutput(es);
-			ExplainQueryText(es, queryDesc);
-			ExplainQueryParameters(es, queryDesc->params, auto_explain_log_parameter_max_length);
-			ExplainPrintPlan(es, 

Re: Propagate pathkeys from CTEs up to the outer query

2024-02-13 Thread Andrei Lepikhov

On 29/1/2024 10:18, Richard Guo wrote:

In [1] we've reached a conclusion that for a MATERIALIZED CTE it's okay
to 'allow our statistics or guesses for the sub-query to subsequently
influence what the upper planner does'.  Commit f7816aec23 exposes
column statistics to the upper planner.  In the light of that, here is a
patch that exposes info about the ordering of the CTE result to the
upper planner.

This patch was initially posted in that same thread and has received
some comments from Tom in [2].  Due to the presence of multiple patches
in that thread, it has led to confusion.  So fork a new thread here
specifically dedicated to discussing the patch about exposing pathkeys
from CTEs to the upper planner.
I like this approach. It looks good initially, but such features need
more opinions/views/time to analyse corner cases.
It goes alongside an item on my back burner: pulling parameterisation
through the GROUP BY and the query-block fence up to the join-search
code of the parent query.
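
For reference, a hypothetical example (the table name is a placeholder) of
the kind of query the patch targets: the materialized CTE's result is
already sorted on "a", so once its pathkeys are exposed, the outer ORDER BY
(or a merge join above it) may not need an extra Sort:

-- sketch only; "t" is a hypothetical table with columns a and b
WITH c AS MATERIALIZED (
    SELECT a, b FROM t ORDER BY a
)
SELECT * FROM c ORDER BY a;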


--
regards,
Andrei Lepikhov
Postgres Professional





Re: Allow passing extra options to initdb for tests

2024-02-13 Thread Ian Lawrence Barwick
Wed, 7 Feb 2024 at 12:51, Ian Lawrence Barwick :
>
> Tue, 6 Feb 2024 at 19:54, Peter Eisentraut :
> >
> > I'm proposing here a way to pass extra options to initdb when run
> > internally during test setup in pg_regress or
> > PostgreSQL::Test::Cluster's init (which covers just about all test
> > suites other than initdb's own tests).
> >
> > For example:
> >
> >  make check PG_TEST_INITDB_EXTRA_OPTS='-k -c work_mem=50MB'
> >
> > We currently document at [0] a way to pass custom server settings to the
> > tests via PGOPTIONS.  But this only works for pg_regress and only for
> > options that can be changed at run time.  My proposal can set initdb
> > options, and since initdb has the -c option now, it can set any GUC
> > parameter as well.
> >
> > I think this can be useful for a wide variety of uses, like running all
> > tests with checksums enabled, or with JIT enabled, or with different GUC
> > settings, or with different locale settings.  (The existing pg_regress
> > --no-locale option is essentially a special case of this, but it only
> > provides one particular locale setting, not things like changing the
> > default provider etc.)
> >
> > Of course, not all tests are going to pass with arbitrary options, but
> > it is useful to run this against specific test suites.
> >
> > This patch also updates the documentation at [0] to explain the new
> > method and distinguish it from the previously documented methods.
>
> +1 for this, I recently ran into an issue with the regression tests for an
> extension where it would have been very useful to provide some initdb
> options.
>
> Patch works as expected after a quick initial test.

I had a longer look at this and can't find any issues with the code or
documentation changes.

I did wonder whether it would be worth mentioning that any initdb
options set in "PG_TEST_INITDB_EXTRA_OPTS" will override those
which can be set by pg_regress, but of the four ("--no-clean", "--no-sync",
"--debug" and "--no-locale"), only the optional  "--no-locale" can actually
be overridden, so it doesn't seem important.

Regards

Ian Barwick




RE: pg_upgrade and logical replication

2024-02-13 Thread Hayato Kuroda (Fujitsu)
Dear Vignesh,

Thanks for verifying the fix!

> Your proposal to change the tests in the following order: a) failure
> due to the insufficient max_replication_slot b) failure because the
> pg_subscription_rel has 'd' state c) successful upgrade. looks good to
> me.

Right.

> I have also verified that your changes fixes the issue as the
> successful upgrade is moved to the end and the old cluster is no
> longer used after upgrade.

Yeah, it is the same as my expectation.

> One minor suggestion:
> There is an extra line break here, this can be removed:
> @@ -181,139 +310,5 @@ is($result, qq(1),
> "check the data is synced after enabling the subscription for
> the table that was in init state"
>  );
> 
> -# cleanup
>

Removed.

PSA a new version patch.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



v2-0001-Fix-testcase.patch
Description: v2-0001-Fix-testcase.patch


Re: pg_upgrade and logical replication

2024-02-13 Thread vignesh C
On Wed, 14 Feb 2024 at 09:07, Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Justin,
>
> > pg_upgrade/t/004_subscription.pl says
> >
> > |my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
> >
> > ..but I think maybe it should not.
> >
> > When you try to use --link, it fails:
> > https://cirrus-ci.com/task/4669494061170688
> >
> > |Adding ".old" suffix to old global/pg_control ok
> > |
> > |If you want to start the old cluster, you will need to remove
> > |the ".old" suffix from
> > /tmp/cirrus-ci-build/build/testrun/pg_upgrade/004_subscription/data/t_004_su
> > bscription_old_sub_data/pgdata/global/pg_control.old.
> > |Because "link" mode was used, the old cluster cannot be safely
> > |started once the new cluster has been started.
> > |...
> > |
> > |postgres: could not find the database system
> > |Expected to find it in the directory
> > "/tmp/cirrus-ci-build/build/testrun/pg_upgrade/004_subscription/data/t_004_s
> > ubscription_old_sub_data/pgdata",
> > |but could not open file
> > "/tmp/cirrus-ci-build/build/testrun/pg_upgrade/004_subscription/data/t_004_s
> > ubscription_old_sub_data/pgdata/global/pg_control": No such file or 
> > directory
> > |# No postmaster PID for node "old_sub"
> > |[19:36:01.396](0.250s) Bail out!  pg_ctl start failed
> >
>
> Good catch! The primal reason of the failure is to reuse the old cluster, 
> even after
> the successful upgrade. The documentation said [1]:
>
> >
> If you use link mode, the upgrade will be much faster (no file copying) and 
> use less
> disk space, but you will not be able to access your old cluster once you 
> start the new
> cluster after the upgrade.
> >
>
> > You could rename pg_control.old to avoid that immediate error, but that 
> > doesn't
> > address the essential issue that "the old cluster cannot be safely started 
> > once
> > the new cluster has been started."
>
> Yeah, I agreed that it should be avoided to access to the old cluster after 
> the upgrade.
> IIUC, pg_upgrade would be run third times in 004_subscription.
>
> 1. successful upgrade
> 2. failure due to the insufficient max_replication_slot
> 3. failure because the pg_subscription_rel has 'd' state
>
> And old instance is reused in all of runs. Therefore, the most reasonable fix 
> is to
> change the ordering of tests, i.e., "successful upgrade" should be done at 
> last.
>
> Attached patch modified the test accordingly. Also, it contains some 
> optimizations.

Your proposal to change the tests to the following order looks good to
me: a) failure due to insufficient max_replication_slots, b) failure
because pg_subscription_rel has the 'd' state, c) successful upgrade.
I have also verified that your changes fix the issue, as the
successful upgrade is moved to the end and the old cluster is no
longer used after the upgrade.

One minor suggestion:
There is an extra line break here, this can be removed:
@@ -181,139 +310,5 @@ is($result, qq(1),
"check the data is synced after enabling the subscription for
the table that was in init state"
 );

-# cleanup

Regards,
Vignesh




Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-13 Thread Sutou Kouhei
Hi,

In 
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Wed, 14 Feb 2024 12:28:38 +0900,
  Michael Paquier  wrote:

>> How about the attached patch approach? If it's a desired
>> approach, I can also write a separated patch for COPY TO.
> 
> Hmm, I have not studied that much, but my first impression was that we
> would not require any new facility in fmgr.c, but perhaps you're right
> and it's more elegant to pass a InitFunctionCallInfoData this way.

I'm not familiar with the fmgr.c-related code base, but it
seems that we abstract the {,binary-}input function calls via
fmgr.c. So I think it's better that we follow that design.
(If there is someone who knows the fmgr.c-related code base,
please help us.)

> PrepareInputFunctionCallInfo() looks OK as a name, but I'm less a fan
> of PreparedInputFunctionCallSafe() and its "Prepared" part.  How about
> something like ExecuteInputFunctionCallSafe()?

I understand the feeling. SQL uses "prepared" for "prepared
statement". There are similar function names such as
InputFunctionCall()/InputFunctionCallSafe()/DirectInputFunctionCallSafe(). They
execute (call) an input function but they use "call" not
"execute" for it... So "Execute...Call..." may be
redundant...

How about InputFunctionCallSafeWithInfo(),
InputFunctionCallSafeInfo() or
InputFunctionCallInfoCallSafe()?

> I may be able to look more at that next week, and I would surely check
> the impact of that with a simple COPY query throttled by CPU (more
> rows and more attributes the better).

Thanks!

>Note that I've reverted 06bd311bce24 for
> the moment, as this is just getting in the way of the main patch, and
> that was non-optimal once there is a per-row callback.

Thanks for sharing the information. I'll rebase on master
when I create the v15 patch.


>> diff --git a/src/backend/commands/copyfrom.c 
>> b/src/backend/commands/copyfrom.c
>> index 41f6bc43e4..a43c853e99 100644
>> --- a/src/backend/commands/copyfrom.c
>> +++ b/src/backend/commands/copyfrom.c
>> @@ -1691,6 +1691,10 @@ BeginCopyFrom(ParseState *pstate,
>>  /* We keep those variables in cstate. */
>>  cstate->in_functions = in_functions;
>>  cstate->typioparams = typioparams;
>> +if (cstate->opts.binary)
>> +cstate->fcinfo = PrepareInputFunctionCallInfo();
>> +else
>> +cstate->fcinfo = PrepareReceiveFunctionCallInfo();
> 
> Perhaps we'd better avoid more callbacks like that, for now.

I'll not use a callback for this, and I'll not change this part
after we introduce Copy{To,From}Routine. Like cstate->in_functions and
cstate->typioparams, cstate->fcinfo isn't used by some custom COPY
format handlers, such as an Apache Arrow handler, but it will always
be allocated. That's a bit wasteful for those handlers, but we may not
care about it. So we can always use the "if (cstate->opts.binary)"
condition here.

BTW... This part was wrong... Sorry... It should be:


if (cstate->opts.binary)
cstate->fcinfo = PrepareReceiveFunctionCallInfo();
else
cstate->fcinfo = PrepareInputFunctionCallInfo();


Thanks,
-- 
kou




Re: SQL:2011 application time

2024-02-13 Thread jian he
Hi
more minor issues.

+ FindFKComparisonOperators(
+ fkconstraint, tab, i, fkattnum,
+ _check_ok, _pfeqop_item,
+ pktypoid[i], fktypoid[i], opclasses[i],
+ is_temporal, false,
+ [i], [i], [i]);
+ }
+ if (is_temporal) {
+ pkattnum[numpks] = pkperiodattnum;
+ pktypoid[numpks] = pkperiodtypoid;
+ fkattnum[numpks] = fkperiodattnum;
+ fktypoid[numpks] = fkperiodtypoid;

- pfeqop = get_opfamily_member(opfamily, opcintype, fktyped,
- eqstrategy);
- if (OidIsValid(pfeqop))
- {
- pfeqop_right = fktyped;
- ffeqop = get_opfamily_member(opfamily, fktyped, fktyped,
- eqstrategy);
- }
- else
- {
- /* keep compiler quiet */
- pfeqop_right = InvalidOid;
- ffeqop = InvalidOid;
- }
+ FindFKComparisonOperators(
+ fkconstraint, tab, numpks, fkattnum,
+ _check_ok, _pfeqop_item,
+ pkperiodtypoid, fkperiodtypoid, opclasses[numpks],
+ is_temporal, true,
+ [numpks], [numpks], [numpks]);
+ numfks += 1;
+ numpks += 1;
+ }

The opening curly brace should be on the next line. Also, do you think
it's a good idea to add the following in the `if (is_temporal)` branch?
`
Assert(OidIsValid(fkperiodtypoid) && OidIsValid(pkperiodtypoid));
Assert(OidIsValid(pkperiodattnum > 0 && fkperiodattnum > 0));
`

In the `if (is_temporal)` branch, you can set FindFKComparisonOperators's
10th argument (is_temporal) to true, since you are already in the
`if (is_temporal)` branch.

maybe we need some extra comments on
`
+ numfks += 1;
+ numpks += 1;
`
since it might not be that evident?

Do you think listing the arguments line by line (with good
indentation) is a good format? Like:
FindFKComparisonOperators(fkconstraint,
tab,
i,
fkattnum,
_check_ok,
_pfeqop_item,
pktypoid[i],
fktypoid[i],
opclasses[i],
false,
false,
[i],
[i],
[i]);




Re: BitmapHeapScan streaming read user and prelim refactoring

2024-02-13 Thread Mark Dilger



> On Feb 13, 2024, at 3:11 PM, Melanie Plageman  
> wrote:

Thanks for the patch...

> Attached is a patch set which refactors BitmapHeapScan such that it
> can use the streaming read API [1]. It also resolves the long-standing
> FIXME in the BitmapHeapScan code suggesting that the skip fetch
> optimization should be pushed into the table AMs. Additionally, it
> moves table scan initialization to after the index scan and bitmap
> initialization.
> 
> patches 0001-0002 are assorted cleanup needed later in the set.
> patches 0003 moves the table scan initialization to after bitmap creation
> patch 0004 is, I think, a bug fix. see [2].
> patches 0005-0006 push the skip fetch optimization into the table AMs
> patches 0007-0009 change the control flow of BitmapHeapNext() to match
> that required by the streaming read API
> patch 0010 is the streaming read code not yet in master
> patch 0011 is the actual bitmapheapscan streaming read user.
> 
> patches 0001-0009 apply on top of master but 0010 and 0011 must be
> applied on top of a commit before a 21d9c3ee4ef74e2 (until a rebased
> version of the streaming read API is on the mailing list).

I followed your lead and applied them to 
6a8ffe812d194ba6f4f26791b6388a4837d17d6c.  `make check` worked fine, though I 
expect you know that already.

> The caveat is that these patches introduce breaking changes to two
> table AM functions for bitmapheapscan: table_scan_bitmap_next_block()
> and table_scan_bitmap_next_tuple().

You might want an independent perspective on how much of a hassle those 
breaking changes are, so I took a stab at that.  Having written a custom 
proprietary TAM for postgresql 15 here at EDB, and having ported it and 
released it for postgresql 16, I thought I'd try porting it to the the above 
commit with your patches.  Even without your patches, I already see breaking 
changes coming from commit f691f5b80a85c66d715b4340ffabb503eb19393e, which 
creates a similar amount of breakage for me as does your patches.  Dealing with 
the combined breakage might amount to a day of work, including testing, half of 
which I think I've already finished.  In other words, it doesn't seem like a 
big deal.

Were postgresql 17 shaping up to be compatible with TAMs written for 16, your 
patch would change that qualitatively, but since things are already 
incompatible, I think you're in the clear.

> A TBMIterateResult used to be threaded through both of these functions
> and used in BitmapHeapNext(). This patch set removes all references to
> TBMIterateResults from BitmapHeapNext. Because the streaming read API
> requires the callback to specify the next block, BitmapHeapNext() can
> no longer pass a TBMIterateResult to table_scan_bitmap_next_block().
> 
> More subtly, table_scan_bitmap_next_block() used to return false if
> there were no more visible tuples on the page or if the block that was
> requested was not valid. With these changes,
> table_scan_bitmap_next_block() will only return false when the bitmap
> has been exhausted and the scan can end. In order to use the streaming
> read API, the user must be able to request the blocks it needs without
> requiring synchronous feedback per block. Thus, this table AM function
> must change its meaning.
> 
> I think the way the patches are split up could be improved. I will
> think more about this. There are also probably a few mistakes with
> which comments are updated in which patches in the set.

I look forward to the next version of the patch set.  Thanks again for working 
on this.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning

2024-02-13 Thread Andrei Lepikhov

On 30/1/2024 12:44, Ashutosh Bapat wrote:

Thanks Vignesh. PFA patches rebased on the latest HEAD. The patch
addressing Amit's comments is still a separate patch for him to
review.
Thanks for this improvement. Working with partitions, I frequently see 
peaks of memory consumption during planning. So, maybe one more case can 
be resolved here.
Patch 0001 looks good. I'm not sure about free_child_sjinfo_members. Do 
we really need it as a separate routine? It might be better to inline 
this code.

Patch 0002 adds valuable comments, and I'm OK with that.

Also, as I remember, some extensions, such as pg_hint_plan, call 
build_child_join_sjinfo. It is OK to break the interface with a major 
version. But what if they need child_sjinfo a bit longer and collect 
links to this structure? I don't think it is a real stopper, but it is 
worth additional analysis.


--
regards,
Andrei Lepikhov
Postgres Professional





RE: Synchronizing slots from primary to standby

2024-02-13 Thread Zhijie Hou (Fujitsu)
On Wednesday, February 14, 2024 10:40 AM Amit Kapila  
wrote:
> 
> On Tue, Feb 13, 2024 at 9:25 PM Bertrand Drouvot
>  wrote:
> >
> > On Tue, Feb 13, 2024 at 05:20:35PM +0530, Amit Kapila wrote:
> > > On Tue, Feb 13, 2024 at 4:59 PM Bertrand Drouvot
> > >  wrote:
> > > > - 84% of the slotsync.c code is covered, the parts that are not
> > > > are mainly related to "errors".
> > > >
> > > > Worth to try to extend the coverage? (I've in mind 731, 739, 766,
> > > > 778, 786, 796,
> > > > 808)
> > > >
> > >
> > > All these additional line numbers mentioned by you are ERROR paths.
> > > I think if we want we can easily cover most of those but I am not
> > > sure if there is a benefit to cover each error path.
> >
> > Yeah, I think 731, 739 and one among the remaining ones mentioned
> > up-thread should be enough, thoughts?
> >
> 
> I don't know how beneficial those selective ones would be but if I have to 
> pick a
> few among those then I would pick the ones at 731 and 808. The reason is that
> 731 is related to cascading standby restriction which we may uplift in the 
> future
> and at that time one needs to be careful about the behavior, for 808 as well, 
> in
> the future, we may have a separate GUC for slot_db_name. These may not be
> good enough reasons as to why we add tests for these ERROR cases but not for
> others, however, if we have to randomly pick a few among all ERROR paths,
> these seem better to me than others.

Here is the V87 patch that adds tests for the suggested cases.

Best Regards,
Hou zj


v87-0001-Add-a-slot-synchronization-function.patch
Description: v87-0001-Add-a-slot-synchronization-function.patch


RE: pg_upgrade and logical replication

2024-02-13 Thread Hayato Kuroda (Fujitsu)
Dear Justin,

> pg_upgrade/t/004_subscription.pl says
> 
> |my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
> 
> ..but I think maybe it should not.
> 
> When you try to use --link, it fails:
> https://cirrus-ci.com/task/4669494061170688
> 
> |Adding ".old" suffix to old global/pg_control ok
> |
> |If you want to start the old cluster, you will need to remove
> |the ".old" suffix from
> /tmp/cirrus-ci-build/build/testrun/pg_upgrade/004_subscription/data/t_004_su
> bscription_old_sub_data/pgdata/global/pg_control.old.
> |Because "link" mode was used, the old cluster cannot be safely
> |started once the new cluster has been started.
> |...
> |
> |postgres: could not find the database system
> |Expected to find it in the directory
> "/tmp/cirrus-ci-build/build/testrun/pg_upgrade/004_subscription/data/t_004_s
> ubscription_old_sub_data/pgdata",
> |but could not open file
> "/tmp/cirrus-ci-build/build/testrun/pg_upgrade/004_subscription/data/t_004_s
> ubscription_old_sub_data/pgdata/global/pg_control": No such file or directory
> |# No postmaster PID for node "old_sub"
> |[19:36:01.396](0.250s) Bail out!  pg_ctl start failed
> 

Good catch! The root cause of the failure is that the old cluster is
reused, even after the successful upgrade. The documentation says [1]:

>
If you use link mode, the upgrade will be much faster (no file copying) and use 
less
disk space, but you will not be able to access your old cluster once you start 
the new
cluster after the upgrade.
>

> You could rename pg_control.old to avoid that immediate error, but that 
> doesn't
> address the essential issue that "the old cluster cannot be safely started 
> once
> the new cluster has been started."

Yeah, I agree that accessing the old cluster after the upgrade should be
avoided. IIUC, pg_upgrade is run three times in 004_subscription.

1. successful upgrade
2. failure due to the insufficient max_replication_slot
3. failure because the pg_subscription_rel has 'd' state

And the old instance is reused in all of the runs. Therefore, the most
reasonable fix is to change the ordering of the tests, i.e., the
"successful upgrade" should be done last.

The attached patch modifies the test accordingly. Also, it contains some
optimizations. It passes the test on my env:

```
pg_upgrade]$ PG_TEST_PG_UPGRADE_MODE='--link' PG_TEST_TIMEOUT_DEFAULT=10 make 
check PROVE_TESTS='t/004_subscription.pl'
...
# +++ tap check in src/bin/pg_upgrade +++
t/004_subscription.pl .. ok
All tests successful.
Files=1, Tests=14,  9 wallclock secs ( 0.03 usr  0.00 sys +  0.55 cusr  1.08 
csys =  1.66 CPU)
Result: PASS
```

What do you think?

[1]: https://www.postgresql.org/docs/devel/pgupgrade.html

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 




0001-Fix-testcase.patch
Description: 0001-Fix-testcase.patch


Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-13 Thread Michael Paquier
On Tue, Feb 13, 2024 at 05:33:40PM +0900, Sutou Kouhei wrote:
> Hi,
> 
> In <20240209192705.5qdilvviq3py2...@awork3.anarazel.de>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Fri, 9 Feb 2024 11:27:05 -0800,
>   Andres Freund  wrote:
> 
>>> +static void
>>> +CopyFromTextInFunc(CopyFromState cstate, Oid atttypid,
>>> +  FmgrInfo *finfo, Oid *typioparam)
>>> +{
>>> +   Oid func_oid;
>>> +
>>> +   getTypeInputInfo(atttypid, _oid, typioparam);
>>> +   fmgr_info(func_oid, finfo);
>>> +}
>> 
>> FWIW, we should really change the copy code to initialize 
>> FunctionCallInfoData
>> instead of re-initializing that on every call, realy makes a difference
>> performance wise.
> 
> How about the attached patch approach? If it's a desired
> approach, I can also write a separated patch for COPY TO.

Hmm, I have not studied that much, but my first impression was that we
would not require any new facility in fmgr.c, but perhaps you're right
and it's more elegant to pass a InitFunctionCallInfoData this way.

PrepareInputFunctionCallInfo() looks OK as a name, but I'm less a fan
of PreparedInputFunctionCallSafe() and its "Prepared" part.  How about
something like ExecuteInputFunctionCallSafe()?

I may be able to look more at that next week, and I would surely check
the impact of that with a simple COPY query throttled by CPU (more
rows and more attributes the better).

>>> +   cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
>>> +   /* Set read attribute callback */
>>> +   if (cstate->opts.csv_mode)
>>> +   cstate->copy_read_attributes = CopyReadAttributesCSV;
>>> +   else
>>> +   cstate->copy_read_attributes = CopyReadAttributesText;
>>> +}
>> 
>> Isn't this precisely repeating the mistake of 2889fd23be56?
> 
> What do you think about the approach in my previous mail's
> attachments?
> https://www.postgresql.org/message-id/flat/20240209.163205.704848659612151781.kou%40clear-code.com#dbb1f8d7f2f0e8fe3c7e37a757fcfc54
>
> If it's a desired approach, I can prepare a v15 patch set
> based on the v14 patch set and the approach.

Yes, this one looks like it's using the right angle: we don't rely
anymore in cstate to decide which CopyReadAttributes to use, the
routines do that instead.  Note that I've reverted 06bd311bce24 for
the moment, as this is just getting in the way of the main patch, and
that was non-optimal once there is a per-row callback.

> diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
> index 41f6bc43e4..a43c853e99 100644
> --- a/src/backend/commands/copyfrom.c
> +++ b/src/backend/commands/copyfrom.c
> @@ -1691,6 +1691,10 @@ BeginCopyFrom(ParseState *pstate,
>   /* We keep those variables in cstate. */
>   cstate->in_functions = in_functions;
>   cstate->typioparams = typioparams;
> + if (cstate->opts.binary)
> + cstate->fcinfo = PrepareInputFunctionCallInfo();
> + else
> + cstate->fcinfo = PrepareReceiveFunctionCallInfo();

Perhaps we'd better avoid more callbacks like that, for now.
--
Michael




Re: speed up a logical replica setup

2024-02-13 Thread Euler Taveira
On Fri, Feb 9, 2024, at 3:27 AM, vignesh C wrote:
> On Wed, 7 Feb 2024 at 10:24, Euler Taveira  wrote:
> >
> > On Fri, Feb 2, 2024, at 6:41 AM, Hayato Kuroda (Fujitsu) wrote:
> >
> > Thanks for updating the patch!
> 
> Thanks for the updated patch, few comments:
> Few comments:
> 1) Cleanup function handler flag should be reset, i.e.
> dbinfo->made_replslot = false; should be there else there will be an
> error during  drop replication slot cleanup in error flow:

Why? drop_replication_slot() is basically called by atexit().

> 2) Cleanup function handler flag should be reset, i.e.
> dbinfo->made_publication = false; should be there else there will be
> an error during drop publication cleanup in error flow:

Ditto. drop_publication() is basically called by atexit().

> 
> 3) Cleanup function handler flag should be reset, i.e.
> dbinfo->made_subscription = false; should be there else there will be
> an error during drop publication cleanup in error flow:

Ditto. drop_subscription() is only called by atexit().

> 4) I was not sure if drop_publication is required here, as we will not
> create any publication in subscriber node:
> +   if (dbinfo[i].made_subscription)
> +   {
> +   conn = connect_database(dbinfo[i].subconninfo);
> +   if (conn != NULL)
> +   {
> +   drop_subscription(conn, [i]);
> +   if (dbinfo[i].made_publication)
> +   drop_publication(conn, [i]);
> +   disconnect_database(conn);
> +   }
> +   }

setup_subscriber() explains the reason.

/*
 * Since the publication was created before the consistent LSN, it is
 * available on the subscriber when the physical replica is promoted.
 * Remove publications from the subscriber because it has no use.
 */
drop_publication(conn, [i]);

I changed the referred code a bit because it is not reliable. Since
made_subscription is not set until we create the subscription, the
publications that were created on the primary and replicated to the standby
won't be removed on the subscriber. Instead, it should rely on the recovery
state to decide whether to drop them.

> 5) The connection should be disconnected in case of error case:
> +   /* secure search_path */
> +   res = PQexec(conn, ALWAYS_SECURE_SEARCH_PATH_SQL);
> +   if (PQresultStatus(res) != PGRES_TUPLES_OK)
> +   {
> +   pg_log_error("could not clear search_path: %s",
> PQresultErrorMessage(res));
> +   return NULL;
> +   }
> +   PQclear(res);

connect_database() is usually followed by a NULL test and exit(1) if it cannot
connect. It should be added for correctness but it is not a requirement.

> 7) These includes are not required:
> 7.a) #include 
> 7.b) #include 
> 7.c) #include 
> 7.d) #include "access/xlogdefs.h"
> 7.e) #include "catalog/pg_control.h"
> 7.f) #include "common/file_utils.h"
> 7.g) #include "utils/pidfile.h"

Good catch. I was about to review the include files.

> 8) POSTMASTER_STANDBY and POSTMASTER_FAILED are not being used, is it
> required or kept for future purpose:
> +enum WaitPMResult
> +{
> +   POSTMASTER_READY,
> +   POSTMASTER_STANDBY,
> +   POSTMASTER_STILL_STARTING,
> +   POSTMASTER_FAILED
> +};

I just copied verbatim from pg_ctl. We should remove the superfluous states.

> 9) pg_createsubscriber should be kept after pg_basebackup to maintain
> the consistent order:

Ok.

> 
> 10) dry-run help message is not very clear, how about something
> similar to pg_upgrade's message like "check clusters only, don't
> change any data":

$ /tmp/pgdevel/bin/pg_archivecleanup --help | grep dry-run
  -n, --dry-run   dry run, show the names of the files that would be
$ /tmp/pgdevel/bin/pg_combinebackup --help | grep dry-run
  -n, --dry-run don't actually do anything
$ /tmp/pgdevel/bin/pg_resetwal --help | grep dry-run
  -n, --dry-run  no update, just show what would be done
$ /tmp/pgdevel/bin/pg_rewind --help | grep dry-run
  -n, --dry-run  stop before modifying anything
$ /tmp/pgdevel/bin/pg_upgrade --help | grep check  
  -c, --check   check clusters only, don't change any data

I used the same sentence as pg_rewind but I'm fine with pg_upgrade or
pg_combinebackup sentences.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: POC, WIP: OR-clause support for indexes

2024-02-13 Thread Andrei Lepikhov

On 13/2/2024 17:03, Andrei Lepikhov wrote:

On 13/2/2024 07:00, jian he wrote:

The time is the last result of the 10 iterations.
I'm not sure about the origins of such behavior, but it seems to be an
issue with parallel workers, not this specific optimization.
Having written that, it stayed on my back burner. To close that issue, I
explored get_restriction_qual_cost(). A close look shows us that "x IN
(..)" is cheaper than its equivalent "x=N1 OR ...". Just the numbers:


ANY: startup_cost = 0.0225; total_cost = 0.005
OR: startup_cost==0; total_cost = 0.0225

Expression total_cost is calculated per tuple. In your example, we have 
many tuples, so the low cost of expression per tuple dominates over the 
significant startup cost.


According to the above, SAOP adds 6250 to the cost of the SeqScan, while
OR adds 13541. So, the total cost of the query with SAOP is less than
with OR, and the optimizer doesn't choose heavyweight parallel workers.
And that is the answer.


So, this example is more about the subtle balance between 
parallel/sequential execution, which can vary from one platform to another.
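
For concreteness, a minimal sketch (with a hypothetical table "t" and
column "x") of the two equivalent predicate forms being compared; the IN
list is transformed by the parser into a ScalarArrayOpExpr, i.e.
x = ANY ('{1,2,3,4}'):

EXPLAIN SELECT * FROM t WHERE x IN (1, 2, 3, 4);
EXPLAIN SELECT * FROM t WHERE x = 1 OR x = 2 OR x = 3 OR x = 4;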


--
regards,
Andrei Lepikhov
Postgres Professional





Re: speed up a logical replica setup

2024-02-13 Thread Euler Taveira
On Thu, Feb 8, 2024, at 12:04 AM, Hayato Kuroda (Fujitsu) wrote:
> >
> Remember the target server was a standby (read only access). I don't expect an
> application trying to modify it; unless it is a buggy application.
> >
> 
> What if the client modifies the data just after the promotion?
> Naively considered, all the changes can be accepted, but are there any issues?

If someone modifies data after promotion, fine; she has to deal with conflicts,
if any. IMO it is solved by adding one or two sentences in the documentation.

> >
> Regarding
> GUCs, almost all of them is PGC_POSTMASTER (so it cannot be modified unless 
> the
> server is restarted). The ones that are not PGC_POSTMASTER, does not affect 
> the
> pg_createsubscriber execution [1].
> >
> 
> IIUC,  primary_conninfo and primary_slot_name is PGC_SIGHUP.

Ditto.

> >
> I'm just pointing out that this case is a different from pg_upgrade (from 
> which
> this idea was taken). I'm not saying that's a bad idea. I'm just arguing that
> you might be preventing some access read only access (monitoring) when it is
> perfectly fine to connect to the database and execute queries. As I said
> before, the current UI allows anyone to setup the standby to accept only local
> connections. Of course, it is an extra step but it is possible. However, once
> you apply v16-0007, there is no option but use only local connection during 
> the
> transformation. Is it an acceptable limitation?
> >
> 
> My remained concern is written above. If they do not problematic we may not 
> have
> to restrict them for now. At that time, changes 
> 
> 1) overwriting a port number,
> 2) setting listen_addresses = ''

It can be implemented later if people are excited by it.

> are not needed, right? IIUC inconsistency of -P may be still problematic.

I still think we shouldn't have only the transformed primary_conninfo as an
option.

> >
> pglogical_create_subscriber does nothing [2][3].
> >
> 
> Oh, thanks.
> Just to confirm - pglogical set shared_preload_libraries to '', should we 
> follow or not?

The in-core logical replication does not require any library to be loaded.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Improve WALRead() to suck data directly from WAL buffers when possible

2024-02-13 Thread Andres Freund
Hi,

On 2024-02-12 17:33:24 -0800, Jeff Davis wrote:
> On Mon, 2024-02-12 at 15:36 -0800, Andres Freund wrote:
> >
> > It doesn't really seem like a necessary, or even particularly useful,
> > part. You couldn't just call WALRead() for that, since the caller
> > would need
> > to know the range up to which WAL is valid but not yet flushed as
> > well. Thus
> > the caller would need to first use WaitXLogInsertionsToFinish() or
> > something
> > like it anyway - and then there's no point in doing the WALRead()
> > anymore.
>
> I follow until the last part. Did you mean "and then there's no point
> in doing the WaitXLogInsertionsToFinish() in WALReadFromBuffers()
> anymore"?

Yes, not sure what happened in my brain there.


> For now, should I assert that the requested WAL data is before the
> Flush pointer or assert that it's before the Write pointer?

Yes, I think that'd be good.


> > Note that for replicating unflushed data, we *still* might need to
> > fall back
> > to reading WAL data from disk. In which case not asserting in
> > WALRead() would
> > just make it hard to find bugs, because not using
> > WaitXLogInsertionsToFinish()
> > would appear to work as long as data is in wal buffers, but as soon
> > as we'd
> > fall back to on-disk (but unflushed) data, we'd send bogus WAL.
>
> That makes me wonder whether my previous idea[1] might matter: when
> some buffers have been evicted, should WALReadFromBuffers() keep going
> through the loop and return the end portion of the requested data
> rather than the beginning?

I still doubt that that will help very often, but it'll take some
experimentation to figure it out, I guess.


> We can sort that out when we get closer to replicating unflushed WAL.

+1

Greetings,

Andres Freund




Re: confusing / inefficient "need_transcoding" handling in copy

2024-02-13 Thread Sutou Kouhei
Hi,

In 
  "Re: confusing / inefficient "need_transcoding" handling in copy" on Wed, 14 
Feb 2024 06:56:16 +0900,
  Michael Paquier  wrote:

> We have a couple of non-ASCII characters in the tests, but I suspect
> that this one will not be digested correctly everywhere, even if
> EUC_JP should be OK to use for the check.  How about writing an
> arbitrary sequence of bytes into a temporary file that gets used for 
> the COPY FROM instead?  See for example how we do that with
> abs_builddir in copy.sql.

It makes sense. How about the attached patch?


Thanks,
-- 
kou
>From 6eb9669f97c54f8b85fac63db40ad80664692d12 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei 
Date: Wed, 14 Feb 2024 11:44:13 +0900
Subject: [PATCH v2] Add a test for invalid encoding for COPY FROM

The test data use an UTF-8 character (U+3042 HIRAGANA LETTER A) but
the test specifies EUC_JP. So it's an invalid data.
---
 src/test/regress/expected/copyencoding.out | 13 +
 src/test/regress/parallel_schedule |  2 +-
 src/test/regress/sql/copyencoding.sql  | 15 +++
 3 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 src/test/regress/expected/copyencoding.out
 create mode 100644 src/test/regress/sql/copyencoding.sql

diff --git a/src/test/regress/expected/copyencoding.out b/src/test/regress/expected/copyencoding.out
new file mode 100644
index 00..32a9d918fa
--- /dev/null
+++ b/src/test/regress/expected/copyencoding.out
@@ -0,0 +1,13 @@
+--
+-- Test cases for COPY WITH (ENCODING)
+--
+-- directory paths are passed to us in environment variables
+\getenv abs_builddir PG_ABS_BUILDDIR
+CREATE TABLE test (t text);
+\set utf8_csv :abs_builddir '/results/copyencoding_utf8.csv'
+-- U+3042 HIRAGANA LETTER A
+COPY (SELECT E'\u3042') TO :'utf8_csv' WITH (FORMAT csv, ENCODING 'UTF8');
+COPY test FROM :'utf8_csv' WITH (FORMAT csv, ENCODING 'EUC_JP');
+ERROR:  invalid byte sequence for encoding "EUC_JP": 0xe3 0x81
+CONTEXT:  COPY test, line 1
+DROP TABLE test;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 1d8a414eea..238cef28c4 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -36,7 +36,7 @@ test: geometry horology tstypes regex type_sanity opr_sanity misc_sanity comment
 # execute two copy tests in parallel, to check that copy itself
 # is concurrent safe.
 # --
-test: copy copyselect copydml insert insert_conflict
+test: copy copyselect copydml copyencoding insert insert_conflict
 
 # --
 # More groups of parallel tests
diff --git a/src/test/regress/sql/copyencoding.sql b/src/test/regress/sql/copyencoding.sql
new file mode 100644
index 00..89e2124996
--- /dev/null
+++ b/src/test/regress/sql/copyencoding.sql
@@ -0,0 +1,15 @@
+--
+-- Test cases for COPY WITH (ENCODING)
+--
+
+-- directory paths are passed to us in environment variables
+\getenv abs_builddir PG_ABS_BUILDDIR
+
+CREATE TABLE test (t text);
+
+\set utf8_csv :abs_builddir '/results/copyencoding_utf8.csv'
+-- U+3042 HIRAGANA LETTER A
+COPY (SELECT E'\u3042') TO :'utf8_csv' WITH (FORMAT csv, ENCODING 'UTF8');
+COPY test FROM :'utf8_csv' WITH (FORMAT csv, ENCODING 'EUC_JP');
+
+DROP TABLE test;
-- 
2.43.0



Re: Synchronizing slots from primary to standby

2024-02-13 Thread Amit Kapila
On Tue, Feb 13, 2024 at 9:25 PM Bertrand Drouvot
 wrote:
>
> On Tue, Feb 13, 2024 at 05:20:35PM +0530, Amit Kapila wrote:
> > On Tue, Feb 13, 2024 at 4:59 PM Bertrand Drouvot
> >  wrote:
> > > - 84% of the slotsync.c code is covered, the parts that are not are mainly
> > > related to "errors".
> > >
> > > Worth to try to extend the coverage? (I've in mind 731, 739, 766, 778, 
> > > 786, 796,
> > > 808)
> > >
> >
> > All these additional line numbers mentioned by you are ERROR paths. I
> > think if we want we can easily cover most of those but I am not sure
> > if there is a benefit to cover each error path.
>
> Yeah, I think 731, 739 and one among the remaining ones mentioned up-thread 
> should
> be enough, thoughts?
>

I don't know how beneficial those selective ones would be but if I
have to pick a few among those then I would pick the ones at 731 and
808. The reason is that 731 is related to cascading standby
restriction which we may uplift in the future and at that time one
needs to be careful about the behavior, for 808 as well, in the
future, we may have a separate GUC for slot_db_name. These may not be
good enough reasons as to why we add tests for these ERROR cases but
not for others, however, if we have to randomly pick a few among all
ERROR paths, these seem better to me than others.

-- 
With Regards,
Amit Kapila.




Re: Improve WALRead() to suck data directly from WAL buffers when possible

2024-02-13 Thread Jeff Davis
On Tue, 2024-02-13 at 22:47 +0530, Bharath Rupireddy wrote:
> c) If planning to replicate unflushed data, then ensure all the
> WALRead() callers wait until startptr+count is past the current
> insert
> position with WaitXLogInsertionsToFinish().

WALRead() can't read past the Write pointer, so there's no point in
calling WaitXLogInsertionsToFinish(), right?


Regards,
Jeff Davis





Re: Improve WALRead() to suck data directly from WAL buffers when possible

2024-02-13 Thread Jeff Davis
Attached 2 patches.

Per Andres's suggestion, 0001 adds an:
  Assert(startptr + count <= LogwrtResult.Write)

Though if we want to allow the caller (e.g. in an extension) to
determine the valid range, perhaps using WaitXLogInsertionsToFinish(),
then the check is wrong. Maybe we should just get rid of that code
entirely and trust the caller to request a reasonable range?

On Mon, 2024-02-12 at 17:33 -0800, Jeff Davis wrote:
> That makes me wonder whether my previous idea[1] might matter: when
> some buffers have been evicted, should WALReadFromBuffers() keep
> going
> through the loop and return the end portion of the requested data
> rather than the beginning?
> [1] 
> https://www.postgresql.org/message-id/2b36bf99e762e65db0dafbf8d338756cf5fa6ece.ca...@j-davis.com

0002 is to illustrate the above idea. It's a strange API so I don't
intend to commit it in this form, but I think we will ultimately need
to do something like it when we want to replicate unflushed data.

The idea is that data past the Write pointer is always (and only)
available in the WAL buffers, so WALReadFromBuffers() should always
return it. That way we can always safely fall through to ordinary
WALRead(), which can only see before the Write pointer. There's also
data before the Write pointer that could be in the WAL buffers, and we
might as well copy that, too, if it's not evicted.

If some buffers are evicted, it will fill in the *end* of the buffer,
leaving a gap at the beginning. The nice thing is that if there is any
gap, it will be before the Write pointer, so we can always fall back to
WALRead() to fill the gap and it should always succeed.

Regards,
Jeff Davis

From f890362e9f5cefd04ee3f9406c7fcafb6a277e45 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Tue, 13 Feb 2024 11:17:08 -0800
Subject: [PATCH 1/2] Add assert to WALReadFromBuffers().

Per suggestion from Andres.
---
 src/backend/access/transam/xlog.c | 24 ++--
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4e14c242b1..50c347a679 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1710,12 +1710,13 @@ GetXLogBuffer(XLogRecPtr ptr, TimeLineID tli)
  * of bytes read successfully.
  *
  * Fewer than 'count' bytes may be read if some of the requested WAL data has
- * already been evicted from the WAL buffers, or if the caller requests data
- * that is not yet available.
+ * already been evicted.
  *
  * No locks are taken.
  *
- * The 'tli' argument is only used as a convenient safety check so that
+ * Caller should ensure that it reads no further than LogwrtResult.Write
+ * (which should have been updated by the caller when determining how far to
+ * read). The 'tli' argument is only used as a convenient safety check so that
  * callers do not read from WAL buffers on a historical timeline.
  */
 Size
@@ -1724,26 +1725,13 @@ WALReadFromBuffers(char *dstbuf, XLogRecPtr startptr, Size count,
 {
 	char	   *pdst = dstbuf;
 	XLogRecPtr	recptr = startptr;
-	XLogRecPtr	upto;
-	Size		nbytes;
+	Size		nbytes = count;
 
 	if (RecoveryInProgress() || tli != GetWALInsertionTimeLine())
 		return 0;
 
 	Assert(!XLogRecPtrIsInvalid(startptr));
-
-	/*
-	 * Don't read past the available WAL data.
-	 *
-	 * Check using local copy of LogwrtResult. Ordinarily it's been updated by
-	 * the caller when determining how far to read; but if not, it just means
-	 * we'll read less data.
-	 *
-	 * XXX: the available WAL could be extended to the WAL insert pointer by
-	 * calling WaitXLogInsertionsToFinish().
-	 */
-	upto = Min(startptr + count, LogwrtResult.Write);
-	nbytes = upto - startptr;
+	Assert(startptr + count <= LogwrtResult.Write);
 
 	/*
 	 * Loop through the buffers without a lock. For each buffer, atomically
-- 
2.34.1

From 6fc92fa74a033c881624177365afbbffc37ed873 Mon Sep 17 00:00:00 2001
From: Jeff Davis 
Date: Tue, 13 Feb 2024 16:56:46 -0800
Subject: [PATCH 2/2] WALReadFromBuffers: read end of the requested range

---
 src/backend/access/transam/xlog.c   | 109 
 src/backend/replication/walsender.c |  14 ++--
 src/include/access/xlog.h   |   2 +-
 3 files changed, 70 insertions(+), 55 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 50c347a679..ea55e1b77b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1706,11 +1706,21 @@ GetXLogBuffer(XLogRecPtr ptr, TimeLineID tli)
 }
 
 /*
- * Read WAL data directly from WAL buffers, if available. Returns the number
- * of bytes read successfully.
+ * Read WAL data directly from WAL buffers, if available.
  *
- * Fewer than 'count' bytes may be read if some of the requested WAL data has
- * already been evicted.
+ * Some pages in the requested range may already be evicted from the WAL
+ * buffers, in which case this function continues on and reads the *end* of
+ * 

Re: Small fix on query_id_enabled

2024-02-13 Thread Yugo NAGATA
On Wed, 14 Feb 2024 07:21:54 +0900
Michael Paquier  wrote:

> On Tue, Feb 13, 2024 at 11:23:47PM +0800, Julien Rouhaud wrote:
> >  +1!
> 
> Okay, applied as-is, then.

Thank you!

Regards,
Yugo Nagata

> --
> Michael


-- 
Yugo NAGATA 




Re: make add_paths_to_append_rel aware of startup cost

2024-02-13 Thread Thomas Munro
On Thu, Oct 5, 2023 at 9:07 PM David Rowley  wrote:
> Thanks. Pushed.

FYI somehow this plan from a8a968a8212e flipped in this run:

=== dumping 
/home/bf/bf-build/mylodon/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/regression.diffs
===
diff -U3 
/home/bf/bf-build/mylodon/HEAD/pgsql/src/test/regress/expected/union.out
/home/bf/bf-build/mylodon/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/results/union.out
--- /home/bf/bf-build/mylodon/HEAD/pgsql/src/test/regress/expected/union.out
2024-01-15 00:31:13.947555940 +
+++ 
/home/bf/bf-build/mylodon/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/results/union.out
2024-02-14 00:06:17.075584839 +
@@ -1447,9 +1447,9 @@
->  Append
  ->  Nested Loop
Join Filter: (t1.tenthous = t2.tenthous)
-   ->  Seq Scan on tenk1 t1
+   ->  Seq Scan on tenk2 t2
->  Materialize
- ->  Seq Scan on tenk2 t2
+ ->  Seq Scan on tenk1 t1
  ->  Result
 (8 rows)

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2024-02-14%2000%3A01%3A03




Re: Fix incorrect PG_GETARG in pgcrypto

2024-02-13 Thread Michael Paquier
On Tue, Feb 13, 2024 at 05:36:36PM +0900, Michael Paquier wrote:
> You've indeed grabbed some historical inconsistencies here.  Please
> note that your patch has reversed diffs (for example, the SQL
> definition of pgp_sym_encrypt_bytea uses bytea,text,text as arguments
> and your resulting patch shows how HEAD does the job with
> bytea,bytea,bytea), but perhaps you have generated it with a command
> like `git diff -R`? 

The reversed part of the patch put aside, I've double-checked
your patch, and the inconsistencies all seem to be addressed in this
area.

The symmetry that we have now between the bytea and text versions of
the functions is stunning, but I cannot really get excited about
merging all of them either as it would imply a bump of pgcrypto to
update the prosrc of these functions, and we have to maintain runtime
compatibility with older versions.
--
Michael


signature.asc
Description: PGP signature


Re: When extended query protocol ends?

2024-02-13 Thread Tatsuo Ishii
Hi Dave,

>> From [1] I think the JDBC driver sends something like below if
>> autosave=always option is specified.
>>
>> "BEGIN READ ONLY" Parse/Bind/Eexecute (in the extended query protocol)
>> "SAVEPOINT PGJDBC_AUTOSAVE" (in the simple query protocol)
>>
>> It seems the SAVEPOINT is sent without finishing the extended query
>> protocol (i.e. without Sync message). Is it possible for the JDBC
>> driver to issue a Sync message before sending SAVEPOINT in simple
>> query protocol? Or you can send SAVEPOINT using the extended query
>> protocol.
>>
>> [1]
>> https://www.pgpool.net/pipermail/pgpool-general/2023-December/009051.html
> 
> 
> Can you ask the OP what version of the driver they are using. From what I
> can tell we send BEGIN using SimpleQuery.

Sure. I will get back once I get the JDBC version.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Collation version tracking for macOS

2024-02-13 Thread Thomas Munro
On Tue, Feb 13, 2024 at 9:25 AM Jeff Davis  wrote:
> On Sun, 2024-02-11 at 22:04 +0530, Robert Haas wrote:
> > "icu_multilib must be loaded via shared_preload_libraries.
> > icu_multilib ignores any ICU library with a major version greater
> > than
> > that with which PostgreSQL was built."
> >
> > It's not clear from reading this whether the second sentence here is
> > a
> > regrettable implementation restriction or design behavior. If it's
> > design behavior, what's the point of it?
>
> That restriction came from Thomas's (uncommitted) work on the same
> problem. I believe the reasoning was that we don't know whether future
> versions of ICU might break something that we're doing, though perhaps
> there's a better way.

Right, to spell that out more fully:  We compile and link against one
particular ICU library that is present at compile time, and there is a
place in that multi-lib patch that assigns the function pointers from
that version to variables of the function pointer type that we expect.
Compilation would fail if ICU ever changed relevant function
prototypes in a future release, and then we'd have to come up with
some trampoline/wrapper scheme to wallpaper over differences.  That's
why I think it's safe to use dlsym() to access function pointers for
versions up to and including the one whose headers we were compiled
against, but not later ones which we haven't tested in that way.

Sadly I won't be able to work on multi-lib ICU support again in this
cycle.  I think we managed to prove that dlopen works for this, and
learn some really interesting stuff about Unicode and ICU evolution,
but we still have to come up with the right model, catalogues and DDL
etc, for a nice user experience.  What I was most recently
experimenting with based on earlier discussions was the idea of
declaring separate providers: icu72 and icu68 could both exist and you
could create extra indexes and then drop the originals as a
no-downtime upgrade path.  I have a pet theory that you could usefully
support multi-version libc locales too if you're prepared to make
certain assumptions (short version: take the collation definition
files from any older version of your OS, compile with newer version's
localedef, give it a name like "en_US@ubuntu18", and assume/pray they
didn't change stuff that wasn't expressed in the definition file), so
I was working on a generalisation slightly wider than just
multi-version ICU.




Re: Patch: Add parse_type Function

2024-02-13 Thread David E. Wheeler
On Feb 12, 2024, at 15:55, David E. Wheeler  wrote:

> Happy to move it wherever makes the most sense.

Update with a bunch of the suggested changes. Some questions still open from 
the previous post, though. 

Best,

David



v5-0001-Add-parse_type-SQL-function.patch
Description: Binary data


Re: Document efficient self-joins / UPDATE LIMIT techniques.

2024-02-13 Thread Corey Huinker
On Tue, Feb 13, 2024 at 11:51 AM Joel Jacobson  wrote:

> On Tue, Feb 13, 2024, at 10:28, Laurenz Albe wrote:
> > On Mon, 2024-02-12 at 12:24 -0500, Corey Huinker wrote:
> >> > Do you plan to add it to the commitfest?  If yes, I'd set it "ready
> for committer".
> >>
> >> Commitfest entry reanimated.
> >
> > Truly... you created a revenant in the already closed commitfest.
> >
> > I closed that again and added a new entry in the open commitfest.
> >
> > Yours,
> > Laurenz Albe
>
> This thread reminded me of the old discussion "LIMIT for UPDATE and
> DELETE" from 2014 [1].
>
> Back in 2014, it was considered a "fringe feature" by some. It is thought
> to be more commonplace today?
>
> /Joel
>
> [1]
> https://www.postgresql.org/message-id/flat/CADB9FDf-Vh6RnKAMZ4Rrg_YP9p3THdPbji8qe4qkxRuiOwm%3Dmg%40mail.gmail.com


This patch came out of a discussion at the last PgCon with the person who
made the "fringe feature" quote, who seemed quite supportive of documenting
the technique. The comment may have been in regards to actually
implementing a LIMIT clause on UPDATE and DELETE, which isn't in the SQL
standard and would be difficult to implement as the two statements have no
concept of ordering. Documenting the workaround would alleviate some
interest in implementing a nonstandard feature.

As for whether it's commonplace, when I was a consultant I had a number of
customers who bemoaned how large updates caused big replica lag, basically
punishing access to records they do care about in order to properly archive
or backfill records they don't care about. I used the
technique a lot, putting the update/delete in a loop, and often running
multiple copies of the same script at times when I/O contention was low,
but if load levels rose it was trivial to just kill a few of the scripts
until things calmed down.
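
For readers who haven't seen the workaround being documented here, a minimal
sketch of the batching technique follows; the table, column, and batch size
are hypothetical, and the documentation patch in this thread is the
authoritative wording:

-- Hypothetical example: archive rows in limited batches to keep replica lag low.
-- Run this repeatedly (e.g. from a client-side loop) until it reports 0 rows.
WITH batch AS (
    SELECT ctid
    FROM big_table
    WHERE NOT archived
    LIMIT 10000
    FOR UPDATE SKIP LOCKED  -- optional: lets several copies of the script coexist
)
UPDATE big_table t
SET archived = true
FROM batch
WHERE t.ctid = batch.ctid;

Pausing between iterations, or simply killing some of the concurrent scripts,
is what keeps I/O contention and replica lag under control.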


Re: Small fix on query_id_enabled

2024-02-13 Thread Michael Paquier
On Tue, Feb 13, 2024 at 11:23:47PM +0800, Julien Rouhaud wrote:
>  +1!

Okay, applied as-is, then.
--
Michael


signature.asc
Description: PGP signature


Re: confusing / inefficient "need_transcoding" handling in copy

2024-02-13 Thread Michael Paquier
On Thu, Feb 08, 2024 at 05:25:01PM +0900, Sutou Kouhei wrote:
> In <20240206222445.hzq22pb2nye7r...@awork3.anarazel.de>
>   "Re: confusing / inefficient "need_transcoding" handling in copy" on Tue, 6 
> Feb 2024 14:24:45 -0800,
>   Andres Freund  wrote:
> 
>> One unfortunate issue: We don't have any tests verifying that COPY FROM
>> catches encoding issues.
> 
> How about the attached patch for it?
>
> +CREATE TABLE test (t text);
> +COPY test FROM stdin WITH (ENCODING 'EUC_JP');
> +こんにちは
> +\.
> +
> +DROP TABLE test;

We have a couple of non-ASCII characters in the tests, but I suspect
that this one will not be digested correctly everywhere, even if
EUC_JP should be OK to use for the check.  How about writing an
arbitrary sequence of bytes into a temporary file that gets used for 
the COPY FROM instead?  See for example how we do that with
abs_builddir in copy.sql.
--
Michael


signature.asc
Description: PGP signature


Re: Transaction timeout

2024-02-13 Thread Alexander Korotkov
Hi!

On Wed, Jan 31, 2024 at 11:57 AM Andrey Borodin  wrote:
> > On 31 Jan 2024, at 14:27, Japin Li  wrote:
> >
> > LGTM.
> >
> > If there is no other objections, I'll change it to ready for committer
> > next Monday.
>
> I think we have a quorum, so I decided to go ahead and flipped status to RfC. 
> Thanks!

I checked this patch.  Generally it looks good.  I've slightly revised it.

I think there is one unaddressed concern from Andres Freund [1] about
the overhead this patch adds through extra branches and function calls
in the case where transaction_timeout is disabled.  I tried to measure
the overhead of this patch using a pgbench script containing 20
semicolons (20 empty statements in 20 empty transactions).  I didn't
manage to find measurable overhead or a change in the performance
profile (I used XCode Instruments on my x86 MacBook).  One thing I
still found possible to do is to avoid unconditional calls to
get_timeout_active(TRANSACTION_TIMEOUT).  Instead, I made the
transaction_timeout assign hook responsible for disabling the timeout
when the GUC disables it.

I removed the TODO comment from _doSetFixedOutputState().  I think
backup restore is the operation where slow commands and slow
transactions are expected, and it's natural to disable
transaction_timeout among other timeouts there.  And the existing
comment clarifies that.

Also I made some grammar fixes to docs and comments.

I'm going to push this if there are no objections.

Links.
1. 
https://www.postgresql.org/message-id/20221206011050.s6hapukjqha35hud%40alap3.anarazel.de

--
Regards,
Alexander Korotkov


0001-Introduce-transaction_timeout-v26.patch
Description: Binary data


Re: pg_upgrade and logical replication

2024-02-13 Thread Michael Paquier
On Tue, Feb 13, 2024 at 03:05:14PM -0600, Justin Pryzby wrote:
> On Tue, Jan 02, 2024 at 03:58:25PM +0530, Amit Kapila wrote:
> > Pushed.
> 
> pg_upgrade/t/004_subscription.pl says
> 
> |my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
> 
> ..but I think maybe it should not.
> 
> When you try to use --link, it fails:
> https://cirrus-ci.com/task/4669494061170688

Thanks.  It is the kind of thing we don't want to lose sight of, so I
have taken this occasion to create a wiki page for the open items of
17, and added this one to it:
https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] allow pg_current_logfile() execution under pg_monitor role

2024-02-13 Thread Nathan Bossart
On Mon, Feb 12, 2024 at 09:49:45AM -0600, Nathan Bossart wrote:
> Okay.  I'll plan on committing this in the next few days.

Here is what I have staged for commit.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From bfe542c5d7b3c981e75ac6551abb34fbdf646eea Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 13 Feb 2024 15:12:36 -0600
Subject: [PATCH v2 1/1] Allow pg_monitor to execute pg_current_logfile().

We allow roles with privileges of pg_monitor to execute functions
like pg_ls_logdir(), so it seems natural that such roles would also
be able to execute this function.

Bumps catversion.

Co-authored-by: Pavlo Golub
Discussion: https://postgr.es/m/CAK7ymcLmEYWyQkiCZ64WC-HCzXAB0omM%3DYpj9B3rXe8vUAFMqw%40mail.gmail.com
---
 doc/src/sgml/func.sgml   |  5 +
 src/backend/catalog/system_functions.sql |  4 
 src/include/catalog/catversion.h |  2 +-
 src/test/regress/expected/misc_functions.out | 20 
 src/test/regress/sql/misc_functions.sql  | 11 +++
 5 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 11d537b341..c4e5b4967e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -23735,6 +23735,11 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);
 .
 The result reflects the contents of
 the current_logfiles file.
+   
+   
+This function is restricted to superusers and roles with privileges of
+the pg_monitor role by default, but other users can
+be granted EXECUTE to run the function.

   
 
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 346cfb98a0..fe2bb50f46 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -777,6 +777,10 @@ GRANT EXECUTE ON FUNCTION pg_ls_logicalmapdir() TO pg_monitor;
 
 GRANT EXECUTE ON FUNCTION pg_ls_replslotdir(text) TO pg_monitor;
 
+GRANT EXECUTE ON FUNCTION pg_current_logfile() TO pg_monitor;
+
+GRANT EXECUTE ON FUNCTION pg_current_logfile(text) TO pg_monitor;
+
 GRANT pg_read_all_settings TO pg_monitor;
 
 GRANT pg_read_all_stats TO pg_monitor;
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 9fc8ac9290..80a4c19565 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
  */
 
 /*			mmddN */
-#define CATALOG_VERSION_NO	202401301
+#define CATALOG_VERSION_NO	202402131
 
 #endif
diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index 7c15477104..d5f61dfad9 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -683,3 +683,23 @@ SELECT gist_stratnum_identity(18::smallint);
  18
 (1 row)
 
+-- pg_current_logfile
+CREATE ROLE regress_current_logfile;
+-- not available by default
+SELECT has_function_privilege('regress_current_logfile',
+  'pg_current_logfile()', 'EXECUTE');
+ has_function_privilege 
+
+ f
+(1 row)
+
+GRANT pg_monitor TO regress_current_logfile;
+-- role has privileges of pg_monitor and can execute the function
+SELECT has_function_privilege('regress_current_logfile',
+  'pg_current_logfile()', 'EXECUTE');
+ has_function_privilege 
+
+ t
+(1 row)
+
+DROP ROLE regress_current_logfile;
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index 851dad90f4..928b04db7f 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -254,3 +254,14 @@ FROM pg_walfile_name_offset('0/0'::pg_lsn + :segment_size - 1),
 -- test stratnum support functions
 SELECT gist_stratnum_identity(3::smallint);
 SELECT gist_stratnum_identity(18::smallint);
+
+-- pg_current_logfile
+CREATE ROLE regress_current_logfile;
+-- not available by default
+SELECT has_function_privilege('regress_current_logfile',
+  'pg_current_logfile()', 'EXECUTE');
+GRANT pg_monitor TO regress_current_logfile;
+-- role has privileges of pg_monitor and can execute the function
+SELECT has_function_privilege('regress_current_logfile',
+  'pg_current_logfile()', 'EXECUTE');
+DROP ROLE regress_current_logfile;
-- 
2.25.1



Re: [PATCH] Add native windows on arm64 support

2024-02-13 Thread Dave Cramer
On Tue, 13 Feb 2024 at 12:52, Andres Freund  wrote:

> Hi,
>
> On 2024-02-13 12:49:33 -0500, Dave Cramer wrote:
> > > I think I might have been on to something - if my human emulation of a
> > > preprocessor isn't wrong, we'd end up with
> > >
> > > #define S_UNLOCK(lock)  \
> > > do { _ReadWriteBarrier(); (*(lock)) = 0; } while (0)
> > >
> > > on msvc + arm. And that's entirely insufficient - _ReadWriteBarrier()
> just
> > > limits *compiler* level reordering, not CPU level reordering.  I think
> it's
> > > even insufficient on x86[-64], but it's definitely insufficient on arm.
> > >
> > In fact ReadWriteBarrier has been deprecated _ReadWriteBarrier |
> Microsoft
> > Learn
> > <
> https://learn.microsoft.com/en-us/cpp/intrinsics/readwritebarrier?view=msvc-170
> >
>
> I'd just ignore that, that's just pushing towards more modern stuff that's
> more applicable to C++ than C.
>
>
> > I did try using atomic_thread_fence as per atomic_thread_fence -
> > cppreference.com
> > 
>
> The semantics of atomic_thread_fence are, uh, very odd.  I'd just use
> MemoryBarrier().
>
> #define S_UNLOCK(lock)  \
do { MemoryBarrier(); (*(lock)) = 0; } while (0)

#endif

Has no effect.

I have no idea if that is what you meant that I should do ?

Dave


Re: DSA_ALLOC_NO_OOM doesn't work

2024-02-13 Thread Thomas Munro
On Wed, Feb 14, 2024 at 3:23 AM Heikki Linnakangas  wrote:
> On 29/01/2024 14:06, Heikki Linnakangas wrote:
> > If you call dsa_allocate_extended(DSA_ALLOC_NO_OOM), it will still
> > ereport an error if you run out of space (originally reported at [0]).
> >
> > Attached patch adds code to test_dsa.c to demonstrate that:
> >
> > postgres=# select test_dsa_basic();
> > ERROR:  could not resize shared memory segment "/PostgreSQL.1312700148"
> > to 1075843072 bytes: No space left on device
> >
> > [0] https://github.com/pgvector/pgvector/issues/434#issuecomment-1912744489

Right, DSA_ALLOC_NO_OOM handles the case where there aren't any more
DSM slots (which used to be more common before we ramped up some
constants) and the case where max_total_segment_size (as self-imposed
limit) would be exceeded, but there is nothing to deal with failure to
allocate at the DSM level, and yeah that just isn't a thing it can do.
Not surprisingly, this observation comes from a Docker user: its
64MB default size limit on the /dev/shm mountpoint breaks parallel
query as discussed on list a few times (see also
https://github.com/docker-library/postgres/issues/416).

This is my mistake, introduced in commit 16be2fd10019 where I failed
to pass that down into DSM code.  The only user of DSA_ALLOC_NO_OOM in
core code so far is in dshash.c, where we just re-throw after some
cleanup, commit 4569715b, so you could leak that control object due to
this phenomenon.

> I wrote the attached patch to address this, in a fairly straightforward
> or naive way. The problem was that even though dsa_allocate() had code
> to check the return value of dsm_create(), and return NULL instead of
> ereport(ERROR) if the DSA_ALLOC_NO_OOM was set, dsm_create() does not in
> fact return NULL on OOM. To fix, I added a DSM_CREATE_NO_OOM option to
> dsm_create(), and I also had to punch that through to dsm_impl_op().

Yeah, makes total sense.

> This is a little unpolished, but if we want to backpatch something
> narrow now, this would probably be the right approach.
>
> However, I must say that the dsm_impl_op() interface is absolutely
> insane. It's like someone looked at ioctl() and thought, "hey that's a
> great idea!". It mixes all different operations like creating or
> destroying a DSM segment together into one function call, and the return
> value is just a boolean, even though the function could fail for many
> different reasons, and the callers do in fact care about the reason. In
> a more natural interface, the different operations would have very
> different function signatures.

Yeah.  It also manages to channel some of shmat() et al's negative beauty.

Anyway, that leads to this treatment of errnos in your patch:

+        errno = 0;
         if (dsm_impl_op(DSM_OP_CREATE, seg->handle, size, &seg->impl_private,
-                        &seg->mapped_address, &seg->mapped_size, ERROR))
+                        &seg->mapped_address, &seg->mapped_size,
+                        ERROR, impl_flags))
             break;
+        if ((flags & DSM_CREATE_NO_OOM) != 0 && (errno == ENOMEM || errno == ENOSPC))

... which seems reasonable given the constraints.  Another idea might
be to write something different in one of those output parameters to
distinguish out-of-memory, but nothing really seems to fit...

> I think we must refactor that. It might be best to leave this
> DSA_ALLOC_NO_OOM bug unfixed in backpatches, and fix it on top of the
> refactorings on master only. Later, we can backpatch the refactorings
> too if we're comfortable with it; extensions shouldn't be using the
> dsm_impl_op() interface directly.

Yeah, that sounds true.  We don't do a good job of nailing down the
public API of PostgreSQL but dsm_impl_op() is a rare case that is very
obviously not intended to be called by anyone else.  On the other
hand, if we really want to avoid changing the function prototype on
principle, perhaps we could make a new operation DSM_OP_CREATE_NO_OOM
instead?




Re: pg_upgrade and logical replication

2024-02-13 Thread Justin Pryzby
On Tue, Jan 02, 2024 at 03:58:25PM +0530, Amit Kapila wrote:
> Pushed.

pg_upgrade/t/004_subscription.pl says

|my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';

..but I think maybe it should not.

When you try to use --link, it fails:
https://cirrus-ci.com/task/4669494061170688

|Adding ".old" suffix to old global/pg_control ok
|
|If you want to start the old cluster, you will need to remove
|the ".old" suffix from 
/tmp/cirrus-ci-build/build/testrun/pg_upgrade/004_subscription/data/t_004_subscription_old_sub_data/pgdata/global/pg_control.old.
|Because "link" mode was used, the old cluster cannot be safely
|started once the new cluster has been started.
|...
|
|postgres: could not find the database system
|Expected to find it in the directory 
"/tmp/cirrus-ci-build/build/testrun/pg_upgrade/004_subscription/data/t_004_subscription_old_sub_data/pgdata",
|but could not open file 
"/tmp/cirrus-ci-build/build/testrun/pg_upgrade/004_subscription/data/t_004_subscription_old_sub_data/pgdata/global/pg_control":
 No such file or directory
|# No postmaster PID for node "old_sub"
|[19:36:01.396](0.250s) Bail out!  pg_ctl start failed

You could rename pg_control.old to avoid that immediate error, but that doesn't
address the essential issue that "the old cluster cannot be safely started once
the new cluster has been started."

-- 
Justin




Re: Fix overflow hazard in interval rounding

2024-02-13 Thread Tom Lane
Joseph Koshakow  writes:
> On Tue, Feb 13, 2024 at 1:46 PM Tom Lane  wrote:
>> (We'd need ereport in back branches, but this problem seems to
>> me to probably not be worth back-patching.)

> Agreed, this seems like a pretty rare overflow/underflow.

OK, pushed to HEAD only.  I converted the second steps to be like
"a -= a%b" instead of "a = (a/b)*b" to make it a little clearer
that they don't have their own risks of overflow.  Maybe it's a
shade faster that way too, not sure.

regards, tom lane




Re: Fix overflow hazard in interval rounding

2024-02-13 Thread Joseph Koshakow
On Tue, Feb 13, 2024 at 1:46 PM Tom Lane  wrote:

>I think you need to use ereturn not ereport here; see other error
>cases in AdjustIntervalForTypmod.

Attached is an updated patch that makes this adjustment.

>(We'd need ereport in back branches, but this problem seems to
>me to probably not be worth back-patching.)

Agreed, this seems like a pretty rare overflow/underflow.

Thanks,
Joe Koshakow
From 470aa9c8898b4e4ebbad97d6e421377b9a3e03cf Mon Sep 17 00:00:00 2001
From: Joseph Koshakow 
Date: Tue, 13 Feb 2024 13:06:13 -0500
Subject: [PATCH] Fix overflow hazard in interval rounding

This commit fixes overflow/underflow hazards present in the interval
rounding code used to parse intervals.
---
 src/backend/utils/adt/timestamp.c  | 18 ++
 src/test/regress/expected/interval.out |  9 +
 src/test/regress/sql/interval.sql  |  4 
 3 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index c38f88dba7..97566d7e3b 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -1509,17 +1509,19 @@ AdjustIntervalForTypmod(Interval *interval, int32 typmod,
 
 			if (interval->time >= INT64CONST(0))
 			{
-interval->time = ((interval->time +
-   IntervalOffsets[precision]) /
-  IntervalScales[precision]) *
-	IntervalScales[precision];
+if (pg_add_s64_overflow(interval->time, IntervalOffsets[precision], >time))
+	ereturn(escontext, false,
+			(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+			 errmsg("interval out of range")));
+interval->time = (interval->time / IntervalScales[precision]) * IntervalScales[precision];
 			}
 			else
 			{
-interval->time = -(((-interval->time +
-	 IntervalOffsets[precision]) /
-	IntervalScales[precision]) *
-   IntervalScales[precision]);
+if (pg_sub_s64_overflow(IntervalOffsets[precision], interval->time, >time))
+	ereturn(escontext, false,
+			(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+			 errmsg("interval out of range")));
+interval->time = -((interval->time / IntervalScales[precision]) * IntervalScales[precision]);
 			}
 		}
 	}
diff --git a/src/test/regress/expected/interval.out b/src/test/regress/expected/interval.out
index b79b6fcd4d..055930ccac 100644
--- a/src/test/regress/expected/interval.out
+++ b/src/test/regress/expected/interval.out
@@ -929,6 +929,15 @@ SELECT interval '1 2:03:04.5678' minute to second(2);
  1 day 02:03:04.57
 (1 row)
 
+-- these should fail as out-of-range
+SELECT interval '2562047788:00:54.775807' SECOND(2);
+ERROR:  interval out of range
+LINE 1: SELECT interval '2562047788:00:54.775807' SECOND(2);
+^
+SELECT interval '-2562047788:00:54.775807' SECOND(2);
+ERROR:  interval out of range
+LINE 1: SELECT interval '-2562047788:00:54.775807' SECOND(2);
+^
 -- test casting to restricted precision (bug #14479)
 SELECT f1, f1::INTERVAL DAY TO MINUTE AS "minutes",
   (f1 + INTERVAL '1 month')::INTERVAL MONTH::INTERVAL YEAR AS "years"
diff --git a/src/test/regress/sql/interval.sql b/src/test/regress/sql/interval.sql
index 5566ad0e51..d945a13714 100644
--- a/src/test/regress/sql/interval.sql
+++ b/src/test/regress/sql/interval.sql
@@ -270,6 +270,10 @@ SELECT interval '1 2:03:04.5678' hour to second(2);
 SELECT interval '1 2.3456' minute to second(2);
 SELECT interval '1 2:03.5678' minute to second(2);
 SELECT interval '1 2:03:04.5678' minute to second(2);
+-- these should fail as out-of-range
+SELECT interval '2562047788:00:54.775807' SECOND(2);
+SELECT interval '-2562047788:00:54.775807' SECOND(2);
+
 
 -- test casting to restricted precision (bug #14479)
 SELECT f1, f1::INTERVAL DAY TO MINUTE AS "minutes",
-- 
2.34.1



Re: index prefetching

2024-02-13 Thread Peter Geoghegan
On Tue, Feb 13, 2024 at 2:01 PM Tomas Vondra
 wrote:
> On 2/7/24 22:48, Melanie Plageman wrote:
> I admit I haven't thought about kill_prior_tuple until you pointed out.
> Yeah, prefetching separates (de-synchronizes) the two scans (index and
> heap) in a way that prevents this optimization. Or at least makes it
> much more complex :-(

Another thing that argues against doing this is that we might not need
to visit any more B-Tree leaf pages when there is a LIMIT n involved.
We could end up scanning a whole extra leaf page (including all of its
tuples) for want of the ability to "push down" a LIMIT to the index AM
(that's not what happens right now, but it isn't really needed at all
right now).

This property of index scans is fundamental to how index scans work.
Pinning an index page as an interlock against concurrently TID
recycling by VACUUM is directly described by the index API docs [1],
even (the docs actually use terms like "buffer pin" rather than
something more abstract sounding). I don't think that anything
affecting that behavior should be considered an implementation detail
of the nbtree index AM as such (nor any particular index AM).

I think that it makes sense to put the index AM in control here --
that almost follows from what I said about the index AM API. The index
AM already needs to be in control, in about the same way, to deal with
kill_prior_tuple (plus it helps with the  LIMIT issue I described).

There doesn't necessarily need to be much code duplication to make
that work. Offhand I suspect it would be kind of similar to how
deletion of LP_DEAD-marked index tuples by non-nbtree index AMs gets
by with generic logic implemented by
index_compute_xid_horizon_for_tuples -- that's all that we need to
determine a snapshotConflictHorizon value for recovery conflict
purposes. Note that index_compute_xid_horizon_for_tuples() reads
*index* pages, despite not being aware of the caller's index AM and
index tuple format.

(The only reason why nbtree needs a custom solution is because it has
posting list tuples to worry about, unlike GiST and unlike Hash, which
consistently use unadorned generic IndexTuple structs with heap TID
represented in the standard/generic way only. While these concepts
probably all originated in nbtree, they're still not nbtree
implementation details.)

> > Having disabled kill_prior_tuple is why the mvcc test fails. Perhaps
> > there is an easier way to fix this, as I don't think the mvcc test
> > failed on Tomas' version.
> >
>
> I kinda doubt it worked correctly, considering I simply ignored the
> optimization. It's far more likely it just worked by luck.

The test that did fail will have only revealed that the
kill_prior_tuple wasn't operating as  expected -- which isn't the same
thing as giving wrong answers.

Note that there are various ways that concurrent TID recycling might
prevent _bt_killitems() from setting LP_DEAD bits. It's totally
unsurprising that breaking kill_prior_tuple in some way could be
missed. Andres wrote the MVCC test in question precisely because
certain aspects of kill_prior_tuple were broken for months without
anybody noticing.

[1] https://www.postgresql.org/docs/devel/index-locking.html
-- 
Peter Geoghegan




Re: Fix overflow hazard in interval rounding

2024-02-13 Thread Andres Freund
Hi,

On 2024-02-13 13:31:22 -0500, Joseph Koshakow wrote:
> Attached is a patch that fixes some overflow/underflow hazards that I
> discovered in the interval rounding code.

Random, mildly related thought: I wonder if it's time to, again, look at
enabling -ftrapv in assert enabled builds.  I had looked at that a few years
back, and fixed a number of instances, but not all I think. But I think we are
a lot closer to avoiding signed overflows everywhere, and it'd be nice to find
overflow hazards more easily. Many places are broken even with -fwrapv
semantics (which we don't have on all compilers!). Trapping on such overflows
makes it far easier to find problems with tools like sqlsmith.

Greetings,

Andres Freund




Re: CI and test improvements

2024-02-13 Thread Justin Pryzby
On Wed, Jan 31, 2024 at 11:59:21AM +0100, Alvaro Herrera wrote:
> On 2024-Jan-31, vignesh C wrote:
> 
> > Are we planning to do anything more on this? I was not sure if we
> > should move this to next commitfest or return it.
> 
> Well, the patches don't apply anymore since .cirrus.tasks.yml has been
> created.  However, I'm sure we still want [some of] the improvements
> to the tests in [1].  I can volunteer to rebase the patches in time for the
> March commitfest, if Justin is not available to do so.  If you can
> please move it forward to the March cf and set it WoA, I'd appreciate
> that.

The patches are rebased.  A couple were merged since I last rebased them
~10 months ago.  The freebsd patch will probably be obsoleted by a patch
of Thomas.

On Mon, Mar 13, 2023 at 07:39:52AM +0100, Peter Eisentraut wrote:
> On 03.02.23 15:26, Justin Pryzby wrote:
> > 9baf41674ad pg_upgrade: tap test: exercise --link and --clone
> 
> This seems like a good idea.

On Wed, Mar 15, 2023 at 10:58:41AM +0100, Peter Eisentraut wrote:
> > [PATCH 4/8] pg_upgrade: tap test: exercise --link and --clone
> 
> I haven't been able to get any changes to the test run times outside
> of noise from this.  But some more coverage is sensible in any case.
> 
> I'm concerned that with this change, the only platform that tests
> --copy is Windows, but Windows has a separate code path for copy.  So
> we should leave one Unix platform to test --copy.  Maybe have FreeBSD
> test --link and macOS test --clone and leave the others with --copy?

I addressed Peter's comments, but haven't heard further.

The patch to show HTML docs artifacts may be waiting for Andres' patch
to convert CompilerWarnings to meson.

It may also be waiting on cfbot to avoid squishing all the patches
together.

I sent various patches to cfbot but haven't heard back.
https://www.postgresql.org/message-id/flat/20220409021853.gp24...@telsasoft.com
https://www.postgresql.org/message-id/flat/20220623193125.gb22...@telsasoft.com
https://github.com/justinpryzby/cfbot/commits/master
https://github.com/macdice/cfbot/pulls

-- 
Justin
>From 37a697bf2ff2e9e24c7b8cb2df289f21e1fca924 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 25 May 2022 21:53:22 -0500
Subject: [PATCH 1/6] cirrus/windows: add compiler_warnings_script

The goal is to fail due to warnings only after running tests.
(At least historically, it's too slow to run a separate windows VM to
compile with -Werror.)

https://www.postgresql.org/message-id/20220212212310.f645c6vw3njkgxka%40alap3.anarazel.de

I'm not sure how to write this test in windows shell; it's also easy to
write something that doesn't work in posix sh, since windows shell is
interpretting && and ||...

See also:
8a1ce5e54f6d144e4f8e19af7c767b026ee0c956
https://cirrus-ci.com/task/6241060062494720
https://cirrus-ci.com/task/6496366607204352

ci-os-only: windows
---
 .cirrus.tasks.yml | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index e4e1bcfeb99..2d397448851 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -561,12 +561,21 @@ task:
 
   build_script: |
 vcvarsall x64
-ninja -C build
+ninja -C build |tee build.txt
+REM Since pipes lose the exit status of the preceding command, rerun the compilation
+REM without the pipe, exiting now if it fails, to avoid trying to run checks
+ninja -C build > nul
 
   check_world_script: |
 vcvarsall x64
 meson test %MTEST_ARGS% --num-processes %TEST_JOBS%
 
+  # This should be last, so check_world is run even if there are warnings
+  always:
+compiler_warnings_script:
+  # this avoids using metachars which would be interpretted by the windows shell
+  - sh -c 'if grep ": warning " build.txt; then exit 1; fi; exit 0'
+
   on_failure:
 <<: *on_failure_meson
 crashlog_artifacts:
-- 
2.42.0

>From 71d3bdad00bd0a4967db5a394247a54eb0f8a1fa Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 30 Jul 2022 19:25:51 -0500
Subject: [PATCH 2/6] cirrus/002_pg_upgrade: exercise --link and --clone

This increases code coverage (and maybe accelerates the test).

See also: b059a2409faf5833b3ba7792e247d6466c9e8090

linux,
macos,
//-os-only: freebsd
---
 .cirrus.tasks.yml | 4 
 1 file changed, 4 insertions(+)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index 2d397448851..53be0ce15e4 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -138,6 +138,8 @@ task:
 CPPFLAGS: -DRELCACHE_FORCE_RELEASE -DCOPY_PARSE_PLAN_TREES -DWRITE_READ_PARSE_PLAN_TREES -DRAW_EXPRESSION_COVERAGE_TEST -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
 CFLAGS: -Og -ggdb
 
+PG_TEST_PG_UPGRADE_MODE: --link
+
   <<: *freebsd_task_template
 
   depends_on: SanityCheck
@@ -431,6 +433,8 @@ task:
 CFLAGS: -Og -ggdb
 CXXFLAGS: -Og -ggdb
 
+PG_TEST_PG_UPGRADE_MODE: --clone
+
   <<: *macos_task_template
 
   depends_on: SanityCheck
-- 
2.42.0

>From 78dd05a15f1b0f476eaa29c8f8ceb70964e009a0 Mon Sep 

Re: index prefetching

2024-02-13 Thread Tomas Vondra
On 2/7/24 22:48, Melanie Plageman wrote:
> ...
> 
> Attached is a patch which implements a real queue and fixes some of
> the issues with the previous version. It doesn't pass tests yet and
> has issues. Some are bugs in my implementation I need to fix. Some are
> issues we would need to solve in the streaming read API. Some are
> issues with index prefetching generally.
> 
> Note that these two patches have to be applied before 21d9c3ee4e
> because Thomas hasn't released a rebased version of the streaming read
> API patches yet.
> 

Thanks for working on this, and for investigating the various issues.

> Issues
> ---
> - kill prior tuple
> 
> This optimization doesn't work with index prefetching with the current
> design. Kill prior tuple relies on alternating between fetching a
> single index tuple and visiting the heap. After visiting the heap we
> can potentially kill the immediately preceding index tuple. Once we
> fetch multiple index tuples, enqueue their TIDs, and later visit the
> heap, the next index page we visit may not contain all of the index
> tuples deemed killable by our visit to the heap.
> 

I admit I haven't thought about kill_prior_tuple until you pointed out.
Yeah, prefetching separates (de-synchronizes) the two scans (index and
heap) in a way that prevents this optimization. Or at least makes it
much more complex :-(

> In our case, we could try and fix this by prefetching only heap blocks
> referred to by index tuples on the same index page. Or we could try
> and keep a pool of index pages pinned and go back and kill index
> tuples on those pages.
> 

I think restricting the prefetching to a single index page would not be
a huge issue performance-wise - that's what the initial patch version
(implemented at the index AM level) did, pretty much. The prefetch queue
would get drained as we approach the end of the index page, but luckily
index pages tend to have a lot of entries. But it'd put an upper bound
on the prefetch distance (much lower than the e_i_c maximum 1000, but
I'd say common values are 10-100 anyway).

But how would we know we're on the same index page? That knowledge is
not available outside the index AM - the executor or indexam.c does not
know this, right? Presumably we could expose this, somehow, but it seems
like a violation of the abstraction ...

The same thing affects keeping multiple index pages pinned, for TIDs
that are yet to be used by the index scan. We'd need to know when to
release a pinned page, once we're done with processing all items.

FWIW I haven't tried to implementing any of this, so maybe I'm missing
something and it can be made to work in a nice way.


> Having disabled kill_prior_tuple is why the mvcc test fails. Perhaps
> there is an easier way to fix this, as I don't think the mvcc test
> failed on Tomas' version.
> 

I kinda doubt it worked correctly, considering I simply ignored the
optimization. It's far more likely it just worked by luck.


> - switching scan directions
> 
> If the index scan switches directions on a given invocation of
> IndexNext(), heap blocks may have already been prefetched and read for
> blocks containing tuples beyond the point at which we want to switch
> directions.
> 
> We could fix this by having some kind of streaming read "reset"
> callback to drop all of the buffers which have been prefetched which
> are now no longer needed. We'd have to go backwards from the last TID
> which was yielded to the caller and figure out which buffers in the
> pgsr buffer ranges are associated with all of the TIDs which were
> prefetched after that TID. The TIDs are in the per_buffer_data
> associated with each buffer in pgsr. The issue would be searching
> through those efficiently.
> 

Yeah, that's roughly what I envisioned in one of my previous messages
about this issue - walking back the TIDs read from the index and added
to the prefetch queue.

> The other issue is that the streaming read API does not currently
> support backwards scans. So, if we switch to a backwards scan from a
> forwards scan, we would need to fallback to the non streaming read
> method. We could do this by just setting the TID queue size to 1
> (which is what I have currently implemented). Or we could add
> backwards scan support to the streaming read API.
> 

What do you mean by "support for backwards scans" in the streaming read
API? I imagined it naively as

1) drop all requests in the streaming read API queue

2) walk back all "future" requests in the TID queue

3) start prefetching as if from scratch

Maybe there's a way to optimize this and reuse some of the work more
efficiently, but my assumption is that the scan direction does not
change very often, and that we process many items in between.


> - mark and restore
> 
> Similar to the issue with switching the scan direction, mark and
> restore requires us to reset the TID queue and streaming read queue.
> For now, I've hacked in something to the PlannerInfo and Plan to set
> the TID queue size 

Re: Fix overflow hazard in interval rounding

2024-02-13 Thread Tom Lane
Joseph Koshakow  writes:
> Attached is a patch that fixes some overflow/underflow hazards that I
> discovered in the interval rounding code.

I think you need to use ereturn not ereport here; see other error
cases in AdjustIntervalForTypmod.

(We'd need ereport in back branches, but this problem seems to
me to probably not be worth back-patching.)

regards, tom lane




Fix overflow hazard in interval rounding

2024-02-13 Thread Joseph Koshakow
Hi all,

Attached is a patch that fixes some overflow/underflow hazards that I
discovered in the interval rounding code.

The lines look a bit long, but I did run the following before committing:
`$ curl https://buildfarm.postgresql.org/cgi-bin/typedefs.pl -o
src/tools/pgindent/typedefs.list && src/tools/pgindent/pgindent
src/backend/utils/adt/timestamp.c`

Thanks,
Joe Koshakow
From 389b0d1e3f3cca6fca1e45fdd202b1ca066326c2 Mon Sep 17 00:00:00 2001
From: Joseph Koshakow 
Date: Tue, 13 Feb 2024 13:06:13 -0500
Subject: [PATCH] Fix overflow hazard in interval rounding

This commit fixes overflow/underflow hazards present in the interval
rounding code used to parse intervals.
---
 src/backend/utils/adt/timestamp.c  | 18 ++
 src/test/regress/expected/interval.out |  9 +
 src/test/regress/sql/interval.sql  |  5 +
 3 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index c38f88dba7..a3b65a755f 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -1509,17 +1509,19 @@ AdjustIntervalForTypmod(Interval *interval, int32 typmod,
 
 			if (interval->time >= INT64CONST(0))
 			{
-interval->time = ((interval->time +
-   IntervalOffsets[precision]) /
-  IntervalScales[precision]) *
-	IntervalScales[precision];
+if (pg_add_s64_overflow(interval->time, IntervalOffsets[precision], >time))
+	ereport(ERROR,
+			errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+			errmsg("interval out of range"));
+interval->time = (interval->time / IntervalScales[precision]) * IntervalScales[precision];
 			}
 			else
 			{
-interval->time = -(((-interval->time +
-	 IntervalOffsets[precision]) /
-	IntervalScales[precision]) *
-   IntervalScales[precision]);
+if (pg_sub_s64_overflow(IntervalOffsets[precision], interval->time, &interval->time))
+	ereport(ERROR,
+			errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+			errmsg("interval out of range"));
+interval->time = -((interval->time / IntervalScales[precision]) * IntervalScales[precision]);
 			}
 		}
 	}
diff --git a/src/test/regress/expected/interval.out b/src/test/regress/expected/interval.out
index b79b6fcd4d..055930ccac 100644
--- a/src/test/regress/expected/interval.out
+++ b/src/test/regress/expected/interval.out
@@ -929,6 +929,15 @@ SELECT interval '1 2:03:04.5678' minute to second(2);
  1 day 02:03:04.57
 (1 row)
 
+-- these should fail as out-of-range
+SELECT interval '2562047788:00:54.775807' SECOND(2);
+ERROR:  interval out of range
+LINE 1: SELECT interval '2562047788:00:54.775807' SECOND(2);
+^
+SELECT interval '-2562047788:00:54.775807' SECOND(2);
+ERROR:  interval out of range
+LINE 1: SELECT interval '-2562047788:00:54.775807' SECOND(2);
+^
 -- test casting to restricted precision (bug #14479)
 SELECT f1, f1::INTERVAL DAY TO MINUTE AS "minutes",
   (f1 + INTERVAL '1 month')::INTERVAL MONTH::INTERVAL YEAR AS "years"
diff --git a/src/test/regress/sql/interval.sql b/src/test/regress/sql/interval.sql
index 5566ad0e51..838da2cc13 100644
--- a/src/test/regress/sql/interval.sql
+++ b/src/test/regress/sql/interval.sql
@@ -270,6 +270,11 @@ SELECT interval '1 2:03:04.5678' hour to second(2);
 SELECT interval '1 2.3456' minute to second(2);
 SELECT interval '1 2:03.5678' minute to second(2);
 SELECT interval '1 2:03:04.5678' minute to second(2);
+-- these should fail as out-of-range
+SELECT interval '2562047788:00:54.775807' SECOND(2);
+SELECT interval '-2562047788:00:54.775807' SECOND(2);
+
+
 
 -- test casting to restricted precision (bug #14479)
 SELECT f1, f1::INTERVAL DAY TO MINUTE AS "minutes",
-- 
2.34.1



Re: glibc qsort() vulnerability

2024-02-13 Thread Nathan Bossart
On Tue, Feb 13, 2024 at 09:43:18AM +0100, Mats Kindahl wrote:
> Maybe we should change to use the original version equivalent to the inline
> function above since that works better with surrounding code?

I don't think that's necessary.  We just need to be cognizant of it when
using inlined sorts, which are pretty rare at the moment.  Your patches
should still be a net improvement in many cases because most qsorts use a
function pointer to the comparator.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Feature request support MS Entra ID Authentication from On-premises PostreSQL server

2024-02-13 Thread Trevor Kohlman
Hi Andrew,
Thank you very much for your email. Additional info:
This is what I have been able to setup for the Azure Flexserver PostgreSQL:
[image: image.png]
And this is what I am trying to do (I just drew the bottom two diagrams), so
that we have one way to log in for all users and/or apps.
[image: image.png]
Flexserver PostgreSQL has an MS extension for PostgreSQL that provides
the pgaadauth extension, which I think takes care of the login info.

On Sun, Feb 11, 2024 at 4:12 PM  wrote:

> Azure Postgres login authentication :
>
>
>
> This is how I do it for the Azure PostgreSQL, I will have to test to see
> if it will log in the same way, as I need to be able to get the token from
> Azure and pass that in as the password for the User/group account in the
> on-prem database.
>
>
>
> Thanks for the link,
>
> If anyone else has been able to authenticate on-prem PostgreSQL against
> Microsoft Entra ID and has the steps to do this, that would also be good
> news.
>
>
>
> *From:* Andrew Dunstan 
> *Sent:* Sunday, February 11, 2024 8:02 AM
> *To:* rs.tr...@gmail.com; pgsql-hackers@lists.postgresql.org
> *Subject:* Re: Feature request support MS Entra ID Authentication from
> On-premises PostreSQL server
>
>
>
>
>
> On 2024-02-10 Sa 12:26, rs.tr...@gmail.com wrote:
>
> Hi all,
>
>
>
> Don’t know if I got this to the right group.
>
>
>
> Proposal Template For a New Feature
>
> One-line Summary:  Feature request: natively integrated support for Azure
> Microsoft Entra ID authentication from an on-premises PostgreSQL server.
>
>
>
> Business Use-case: Explain the problem that you are trying to solve with
> the proposal.
>
> Using the new authentication method (Entra ID) vs. the LDAP method for
> on-premises PostgreSQL server databases.
>
>
>
> User impact with the change:
>
> Trying to streamline accounts so we only have one place for users and
> accounts, for onboarding and offboarding, and our ecosystem is starting to
> move to Azure, but we still have on-premises PostgreSQL servers.
>
>
>
> Our Security groups want us to use new Authentication methods and have
> integration into MS Entra ID.
>
>
>
> I know that I can from the Azure PostgreSQL log in with Azure Entra ID
> with psql.exe and pgAdmin 4 and have this working for the Azure PostgreSQl
> database.
>
> But have not found a way to do this with our On-premises PostgreSQL server
> databases.
>
> There may already be a method for doing this, but I have not found it, and
> I am very new to PostgreSQL.
>
>
>
>
>
> What is the difference between this and ActiveDirectory? AD is already
> usable as an authentication mechanism. See for example
> 
> 
>
>
>
> cheers
>
>
>
> andrew
>
> --
>
> Andrew Dunstan
>
> EDB: https://www.enterprisedb.com
>
>


Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

2024-02-13 Thread David Benjamin
On Mon, Feb 12, 2024 at 9:38 AM Daniel Gustafsson  wrote:

> > On 11 Feb 2024, at 19:19, David Benjamin  wrote:
> > It turns out the parts that came from the OpenSSL socket BIO were a
> no-op, and in fact PostgreSQL is relying on it being a no-op. Instead, it's
> cleaner to just define a custom BIO the normal way, which then leaves the
> more standard BIO_get_data mechanism usable. This also avoids the risk that
> a future OpenSSL will add a new BIO_ctrl to the socket type, with libssl
> calling into it, and then break some assumptions made by PostgreSQL.
>
> +   case BIO_CTRL_FLUSH:
> +   /* libssl expects all BIOs to support BIO_flush. */
> +   res = 1;
> +   break;
>
> Will this always be true?  Returning 1 implies that we have flushed all
> data on
> the socket, but what if we just before called BIO_set_retry..XX()?
>

The real one is also just a no-op. :-)
https://github.com/openssl/openssl/blob/master/crypto/bio/bss_sock.c#L215

This is used in places like buffer BIO or the FILE* BIO, where BIO_write
might accept data, but stage it into a buffer, to be flushed later. libssl
ends up calling BIO_flush at the end of every flight, which in turn means
that BIOs used with libssl need to support it, even if to return true
because there's nothing to flush. (Arguably TCP sockets could have used a
flush concept, to help control Nagle's algorithm, but for better or worse,
that's a socket-wide TCP_NODELAY option, rather than an explicit flush
call.)

BIO_set_retry.. behaves like POSIX I/O, where a failed EWOULDBLOCK write is
as if you never wrote to the socket at all and doesn't impact socket state.
That is, the data hasn't been accepted yet. It's not expected for BIO_flush
to care about the rejected write data. (Also I don't believe libssl will
ever trigger this case.) It's confusing because unlike an EWOULDBLOCK
errno, BIO_set_retry.. *is* itself BIO state, but that's just because the
BIO calling convention is goofy and didn't just return the error out of the
return value. So OpenSSL just stashes the bit on the BIO itself, for you to
query out immediately afterwards.
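
For a plain socket-style BIO there genuinely is nothing to flush, so the
ctrl callback can be as small as this (a sketch to illustrate the point,
not the actual patch):

static long
my_sock_ctrl(BIO *b, int cmd, long num, void *ptr)
{
    switch (cmd)
    {
        case BIO_CTRL_FLUSH:
            /* nothing is ever buffered, so report "flushed" */
            return 1;
        default:
            /* every other ctrl is unsupported */
            return 0;
    }
}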


> > I've attached a patch which does that. The existing SSL tests pass with
> it, tested on Debian stable. (Though it took me a few iterations to figure
> out how to run the SSL tests, so it's possible I've missed something.)
>
> We've done a fair bit of work on making them easier to run, so I'm curious
> if
> you saw any room for improvements there as someone coming to them for the
> first
> time?
>

 A lot of my time was just trying to figure out how to run the tests in the
first place, so perhaps documentation? But I may just have been looking in
the wrong spot and honestly didn't really know what I was doing. I can try
to summarize what I did (from memory), and perhaps that can point to
possible improvements?

- I looked in the repository for instructions on running the tests and
couldn't find any. At this point, I hadn't found src/test/README.
- I found
https://wiki.postgresql.org/wiki/Developer_FAQ#How_do_I_test_my_changes.3F,
linked from https://www.postgresql.org/developer/
- I ran the configure build with --enable-cassert, ran make check, tests
passed.
- I wrote my patch and then spent a while intentionally adding bugs to see
if the tests would catch it (I wasn't sure whether there was ssl test
coverage), finally concluding that I wasn't running any ssl tests
- I looked some more and found src/test/ssl/README
- I reconfigured with --enable-tap-tests and ran make check
PG_TEST_EXTRA=ssl per those instructions, but the SSL tests still weren't
running
- I grepped for PG_TEST_EXTRA and found references in the CI config, but
using the meson build
- I installed meson, mimicked a few commands from the CI. That seemed to
work.
- I tried running only the ssl tests, looking up how you specify individual
tests in meson, to make my compile/test cycles a bit faster, but they
failed.
- I noticed that the first couple "tests" were named like setup tasks, and
guessed that the ssl tests depended on this setup to run. But by then I
just gave up and waited out the whole test suite per run. :-)

Once I got it running, it was quite smooth. I just wasn't sure how to do it.

David


Re: [PATCH] Add native windows on arm64 support

2024-02-13 Thread Andres Freund
Hi,

On 2024-02-13 12:49:33 -0500, Dave Cramer wrote:
> > I think I might have been on to something - if my human emulation of a
> > preprocessor isn't wrong, we'd end up with
> >
> > #define S_UNLOCK(lock)  \
> > do { _ReadWriteBarrier(); (*(lock)) = 0; } while (0)
> >
> > on msvc + arm. And that's entirely insufficient - _ReadWriteBarrier() just
> > limits *compiler* level reordering, not CPU level reordering.  I think it's
> > even insufficient on x86[-64], but it's definitely insufficient on arm.
> >
> In fact _ReadWriteBarrier has been deprecated; see the "_ReadWriteBarrier |
> Microsoft Learn" page.
> 

I'd just ignore that, that's just pushing towards more modern stuff that's
more applicable to C++ than C.


> I did try using atomic_thread_fence as per atomic_thread_fence -
> cppreference.com
> 

The semantics of atomic_thread_fence are, uh, very odd.  I'd just use
MemoryBarrier().
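
I.e. roughly (an untested sketch):

#define S_UNLOCK(lock)	\
	do { MemoryBarrier(); (*(lock)) = 0; } while (0)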

Greetings,

Andres Freund




Re: [PATCH] Add native windows on arm64 support

2024-02-13 Thread Dave Cramer
Dave Cramer
www.postgres.rocks


On Mon, 12 Feb 2024 at 16:19, Andres Freund  wrote:

> Hi,
>
> On 2024-02-12 12:50:12 -0800, Andres Freund wrote:
> > On 2024-02-12 13:28:40 -0500, Andrew Dunstan wrote:
> > I wonder if this indicates that we are either missing memory barriers
> > somewhere or that the memory barriers we end up with on msvc + arm aren't
> > correct?  Either could explain why the problem doesn't occur when
> building
> > with optimizations.
>
> I think I might have been on to something - if my human emulation of a
> preprocessor isn't wrong, we'd end up with
>
> #define S_UNLOCK(lock)  \
> do { _ReadWriteBarrier(); (*(lock)) = 0; } while (0)
>
> on msvc + arm. And that's entirely insufficient - _ReadWriteBarrier() just
> limits *compiler* level reordering, not CPU level reordering.  I think it's
> even insufficient on x86[-64], but it's definitely insufficient on arm.
>
In fact _ReadWriteBarrier has been deprecated; see the "_ReadWriteBarrier |
Microsoft Learn" page.


I did try using atomic_thread_fence as per atomic_thread_fence -
cppreference.com



However, for some reason, including #include <stdatomic.h>
causes a bunch of compiler errors.

c:\Program Files\Microsoft Visual
Studio\2022\Community\VC\Tools\MSVC\14.38.33130\include\vcruntime_c11_stdatomic.h(36):
error C2061: syntax error: identifier 'atomic_bool'

Dave


Re: Improve WALRead() to suck data directly from WAL buffers when possible

2024-02-13 Thread Bharath Rupireddy
On Tue, Feb 13, 2024 at 5:06 AM Andres Freund  wrote:
>
> I doubt there's a sane way to use WALRead() without *first* ensuring that the
> range of data is valid. I think we're better off moving that responsibility
> explicitly to the caller and adding an assertion verifying that.
>
> It doesn't really seem like a necessary, or even particularly useful,
> part. You couldn't just call WALRead() for that, since the caller would need
> to know the range up to which WAL is valid but not yet flushed as well. Thus
> the caller would need to first use WaitXLogInsertionsToFinish() or something
> like it anyway - and then there's no point in doing the WALRead() anymore.
>
> Note that for replicating unflushed data, we *still* might need to fall back
> to reading WAL data from disk. In which case not asserting in WALRead() would
> just make it hard to find bugs, because not using WaitXLogInsertionsToFinish()
> would appear to work as long as data is in wal buffers, but as soon as we'd
> fall back to on-disk (but unflushed) data, we'd send bogus WAL.

Callers of WALRead() do a good amount of work to figure out what has
been flushed, but they can still end up reading unflushed and/or invalid
data; see the comment [1] around the WALRead() call sites as well as a
recent thread [2] for more details.

IIUC, here's the summary of the discussion that has happened so far:
a) If only replicating flushed data, then ensure all the WALRead()
callers read how much ever is valid out of startptr+count. Fix
provided in [2] can help do that.
b) If only replicating flushed data, then ensure all the
WALReadFromBuffers() callers read how much ever is valid out of
startptr+count. Current and expected WALReadFromBuffers() callers will
anyway determine how much of it is flushed and can validly be read.
c) If planning to replicate unflushed data, then ensure all the
WALRead() callers wait until startptr+count is past the current insert
position with WaitXLogInsertionsToFinish().
d) If planning to replicate unflushed data, then ensure all the
WALReadFromBuffers() callers wait until startptr+count is past the
current insert position with WaitXLogInsertionsToFinish().
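
For example, a caller following rule (d) would look roughly like this (a
sketch; the names are the ones discussed above, exact signatures in the
patch may differ):

/*
 * Sketch of rule (d): hypothetical caller that replicates unflushed WAL.
 */
static void
read_unflushed_wal(char *buf, XLogRecPtr startptr, Size count, TimeLineID tli)
{
	XLogRecPtr	endptr = startptr + count;

	/* wait until all in-progress insertions up to endptr have finished */
	WaitXLogInsertionsToFinish(endptr);

	/* only now is [startptr, endptr) safe to copy out of the WAL buffers */
	WALReadFromBuffers(buf, startptr, count, tli);
}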

Adding an assertion or error in WALReadFromBuffers() for ensuring the
callers do follow the above set of rules is easy. We can just do
Assert(startptr+count <= LogwrtResult.Flush).

However, adding a similar assertion or error in WALRead() gets
trickier as it's being called from many places - walsenders, backends,
external tools etc. even when the server is in recovery. Therefore,
determining the actual valid LSN is a bit of a challenge.

What I think is the best way:
- Try and get the fix provided for (a) at [2].
- Implement both (c) and (d).
- Have the assertion in WALReadFromBuffers() ensuring the callers wait
until startptr+count is past the current insert position with
WaitXLogInsertionsToFinish().
- Have a comment around WALRead() to ensure the callers are requesting
the WAL that's written to the disk because it's hard to determine
what's written to disk as this gets called in many scenarios - when
server is in recovery, for walsummarizer etc.
- In the new test module, demonstrate how one can implement reading
unflushed data with WALReadFromBuffers() and/or WALRead() +
WaitXLogInsertionsToFinish().

Thoughts?

[1]
/*
 * Even though we just determined how much of the page can be validly read
 * as 'count', read the whole page anyway. It's guaranteed to be
 * zero-padded up to the page boundary if it's incomplete.
 */
if (!WALRead(state, cur_page, targetPagePtr, XLOG_BLCKSZ, tli,
 &errinfo))

[2] 
https://www.postgresql.org/message-id/CALj2ACWBRFac2TingD3PE3w2EBHXUHY3%3DAEEZPJmqhpEOBGExg%40mail.gmail.com

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Document efficient self-joins / UPDATE LIMIT techniques.

2024-02-13 Thread Joel Jacobson
On Tue, Feb 13, 2024, at 10:28, Laurenz Albe wrote:
> On Mon, 2024-02-12 at 12:24 -0500, Corey Huinker wrote:
>> > Do you plan to add it to the commitfest?  If yes, I'd set it "ready for 
>> > committer".
>> 
>> Commitfest entry reanimated. 
>
> Truly... you created a revenant in the already closed commitfest.
>
> I closed that again and added a new entry in the open commitfest.
>
> Yours,
> Laurenz Albe

This thread reminded me of the old discussion "LIMIT for UPDATE and DELETE" 
from 2014 [1].

Back in 2014, it was considered a "fringe feature" by some. Is it thought to be
more commonplace today?

/Joel

[1] 
https://www.postgresql.org/message-id/flat/CADB9FDf-Vh6RnKAMZ4Rrg_YP9p3THdPbji8qe4qkxRuiOwm%3Dmg%40mail.gmail.com




Re: logical decoding and replication of sequences, take 2

2024-02-13 Thread Robert Haas
On Sun, Jan 28, 2024 at 1:07 AM Tomas Vondra
 wrote:
> Right, locks + apply in commit order gives us this guarantee (I can't
> think of a case where it wouldn't be the case).

I couldn't find any cases of inadequate locking other than the one I mentioned.

> Doesn't the whole logical replication critically depend on the commit
> order? If you decide to arbitrarily reorder/delay the transactions, all
> kinds of really bad things can happen. That's a generic problem, it
> applies to all kinds of objects, not just sequences - a parallel apply
> would need to detect this sort of dependencies (e.g. INSERT + DELETE of
> the same key), and do something about it.

Yes, but here I'm not just talking about the commit order. I'm talking
about the order of applying non-transactional operations relative to
commits.

Consider:

T1: CREATE SEQUENCE s;
T2: BEGIN;
T2: SELECT nextval('s');
T3: SELECT nextval('s');
T2: ALTER SEQUENCE s INCREMENT 2;
T2: SELECT nextval('s');
T2: COMMIT;

The commit order is T1 < T3 < T2, but T3 makes no transactional
changes, so the commit order is really just T1 < T2. But it's
completely wrong to say that all we need to do is apply T1 before we
apply T2. The correct order of application is:

1. T1.
2. T2's first nextval
3. T3's nextval
4. T2's transactional changes (i.e. the ALTER SEQUENCE INCREMENT and
the subsequent nextval)

In other words, the fact that some sequence changes are
non-transactional creates ordering hazards that don't exist if there
are no non-transactional changes. So in that way, sequences are
different from table modifications, where applying the transactions in
order of commit is all we need to do. Here we need to apply the
transactions in order of commit and also apply the non-transactional
changes at the right point in the sequence. Consider the following
alternative apply sequence:

1. T1.
2. T2's transactional changes (i.e. the ALTER SEQUENCE INCREMENT and
the subsequent nextval)
3. T3's nextval
4. T2's first nextval

That's still in commit order. It's also wrong.

Imagine that you commit this patch and someone later wants to do
parallel logical apply. So every time they finish decoding a
transaction, they stick it in a queue to be applied by the next
available worker. But, non-transactional changes are very simple, so
we just directly apply those in the main process. Well, kaboom! But
now this can happen with the above example.

1. Decode T1. Add to queue for apply.
2. Before the (idle) apply worker has a chance to pull T1 out of the
queue, decode the first nextval and try to apply it.

Oops. We're trying to apply a modification to a sequence that hasn't
been created yet. I'm not saying that this kind of hypothetical is a
reason not to commit the patch. But it seems like we're not on the
same page about what the ordering requirements are here. I'm just
making the argument that those non-transactional operations actually
act like mini-transactions. They need to happen at the right time
relative to the real transactions. A non-transactional operation needs
to be applied after any transactions that commit before it is logged,
and before any transactions that commit after it's logged.

> Yes, I think this is a bug in handling of owned sequences - from the
> moment the "ALTER TABLE ... SET UNLOGGED" is executed, the two sessions
> generate duplicate values (until the S1 is committed, at which point the
> values generated in S2 get "forgotten").
>
> It seems we end up updating both relfilenodes, which is clearly wrong.
>
> Seems like a bug independent of the decoding, IMO.

Yeah.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Synchronizing slots from primary to standby

2024-02-13 Thread Bertrand Drouvot
Hi,

On Tue, Feb 13, 2024 at 05:20:35PM +0530, Amit Kapila wrote:
> On Tue, Feb 13, 2024 at 4:59 PM Bertrand Drouvot
>  wrote:
> > - 84% of the slotsync.c code is covered, the parts that are not are mainly
> > related to "errors".
> >
> > Worth to try to extend the coverage? (I've in mind 731, 739, 766, 778, 786, 
> > 796,
> > 808)
> >
> 
> All these additional line numbers mentioned by you are ERROR paths. I
> think if we want we can easily cover most of those but I am not sure
> if there is a benefit to cover each error path.

Yeah, I think 731, 739 and one among the remaining ones mentioned up-thread 
should
be enough, thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Small fix on query_id_enabled

2024-02-13 Thread Julien Rouhaud
On Tue, Feb 13, 2024 at 05:28:32PM +0900, Michael Paquier wrote:
> On Tue, Feb 13, 2024 at 01:13:43AM +0900, Yugo NAGATA wrote:
> > I attached an updated patch that adds comments noting to use 
> > IsQueryIdEnabled()
> > instead of accessing the variables directly.
>
> Sounds good to me, thanks.

 +1!




Re: brininsert optimization opportunity

2024-02-13 Thread Tomas Vondra
On 1/8/24 16:51, Alvaro Herrera wrote:
> On 2023-Dec-12, Tomas Vondra wrote:
> 
>> I propose we do a much simpler thing instead - allow the cache to be
>> initialized / cleaned up repeatedly, and make sure it gets reset at
>> convenient place (typically after index_insert calls that don't go
>> through execIndexing). That'd mean the cleanup does not need to happen
>> very far from the index_insert(), which makes the reasoning much easier.
>> 0002 does this.
> 
> I'm not in love with this 0002 patch; I think the layering after 0001 is
> correct in that the insert_cleanup call should remain in validate_index
> and called after the whole thing is done, but 0002 changes things so
> that now every table AM has to worry about doing this correctly; and a
> bug of omission will not be detected unless you have a BRIN index on
> such a table and happen to use CREATE INDEX CONCURRENTLY.  So a
> developer has essentially zero chance to do things correctly, which I
> think we'd rather avoid.
> 

True. If the AM code does not need to worry about this kind of stuff,
that would be good / less error prone.

One thing that is not very clear to me is that I don't think there's a
very good way to determine which places need the cleanup call. Because
it depends on (a) what kind of index is used and (b) what happens in the
code called earlier (which may easily do arbitrary stuff). Which means
we have to call the cleanup whenever the code *might* have done inserts
into the index. Maybe it's not such an issue in practice, though.

> So I think we should do 0001 and perhaps some further tweaks to the
> original brininsert optimization commit: I think the aminsertcleanup
> callback should receive the indexRelation as first argument; and also I
> think it's not index_insert_cleanup() job to worry about whether
> ii_AmCache is NULL or not, but instead the callback should be invoked
> always, and then it's aminsertcleanup job to do nothing if ii_AmCache is
> NULL.  That way, the index AM API doesn't have to worry about which
> parts of IndexInfo (or the indexRelation) is aminsertcleanup going to
> care about.  If we decide to change this, then the docs also need a bit
> of tweaking I think.
> 

Yeah, passing the indexRelation to the am callback seems reasonable.
It's more consistent what we do for other callbacks, and perhaps the
callback might need the indexRelation.

I don't quite see why we should invoke the callback with ii_AmCache=NULL
though. If there's nothing cached, why bother? It just means all cleanup
callbacks have to do this NULL check on their own.

> Lastly, I kinda disagree with the notion that only some of the callers
> of aminsert should call aminsertcleanup, even though btree doesn't have
> an aminsertcleanup and thus it can't affect TOAST or catalogs.  Maybe we
> can turn index_insert_cleanup into an inline function, which can quickly
> do nothing if aminsertcleanup isn't defined.  Then we no longer have the
> layering violation where we assume that btree doesn't care.  But the
> proposed change in this paragraph can be maybe handled separately to
> avoid confusing things with the bugfix in the two paragraphs above.
> 

After thinking about this a bit more I agree with you - we should call
the cleanup from each place calling aminsert, even if it's for nbtree
(or other index types that don't require cleanup at the moment).
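
Presumably the wrapper itself can be trivial - something like this (just a
sketch, with the indexRelation argument you suggested, and leaving aside
the ii_AmCache question):

static inline void
index_insert_cleanup(Relation indexRelation, IndexInfo *indexInfo)
{
	/* no-op for AMs (like btree) that don't provide the callback */
	if (indexRelation->rd_indam->aminsertcleanup != NULL)
		indexRelation->rd_indam->aminsertcleanup(indexRelation, indexInfo);
}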

I wonder if there's a nice way to check this in assert-enabled builds?
Could we tweak nbtree (or index AM in general) to check that all places
that called aminsert also called aminsertcleanup?

For example, I was thinking we might add a flag to IndexInfo (separate
from the ii_AmCache), tracking if aminsert() was called, and then later
check the aminsertcleanup() got called too. The problem however is
there's no obviously convenient place for this check, because IndexInfo
is not freed explicitly ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Where can I find the doxyfile?

2024-02-13 Thread Bruce Momjian
On Mon, Feb  5, 2024 at 05:29:08PM +, John Morris wrote:
> >> It seems something you want to keep for your personal use.
> >> I think this filter is the kind of genius idea that it's OK if you
> >> can do it at home
> 
> I don’t agree with your characterization.
> 
> The purpose of the filter is to bring existing Postgres comments into the
> doxygen output. While I haven’t done a full survey, the majority of Postgres
> code has comments describing functions, globals, macros and structure fields.
> 
> Currently, those comments are thrown away. They do not appear in the existing
> Doxygen output.

I have found it very strange that a tool like doxygen, which can create
all sorts of call graphs, just ignores some comments.  The comments
above functions are very important.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: When extended query protocol ends?

2024-02-13 Thread Dave Cramer
HI Tatsuo,



On Mon, 29 Jan 2024 at 20:15, Tatsuo Ishii  wrote:

> Hello Dave,
>
> > Tatsuo Ishii  writes:
> >> Below is outputs from "pgproto" command coming with Pgpool-II.
> >> (Lines starting "FE" represents a message from frontend to backend.
> >> Lines starting "BE" represents a message from backend to frontend.)
> >
> >> FE=> Parse(stmt="", query="SET extra_float_digits = 3")
> >> FE=> Bind(stmt="", portal="")
> >> FE=> Execute(portal="")
> >> FE=> Parse(stmt="", query="SET extra_float_digits = 3")
> >> FE=> Bind(stmt="", portal="")
> >> FE=> Execute(portal="")
> >> FE=> Query (query="SET extra_float_digits = 3")
> >> <= BE ParseComplete
> >> <= BE BindComplete
> >> <= BE CommandComplete(SET)
> >> <= BE ParseComplete
> >> <= BE BindComplete
> >> <= BE CommandComplete(SET)
> >> <= BE CommandComplete(SET)
> >> <= BE ReadyForQuery(I)
> >> FE=> Terminate
> >
> >> As you can see, two "SET extra_float_digits = 3" is sent in the
> >> extended query protocol, then one "SET extra_float_digits = 3" follows
> >> in the simple query protocol. No sync message is sent. However, I get
> >> ReadyForQuery at the end. It seems the extended query protocol is
> >> ended by a simple query protocol message instead of a sync message.
> >
> >> My question is, is this legal in terms of fronted/backend protocol?
> >
> > I think it's poor practice, at best.  You should end the
> > extended-protocol query cycle before invoking simple query.
>
> From [1] I think the JDBC driver sends something like below if
> autosave=always option is specified.
>
> "BEGIN READ ONLY" Parse/Bind/Eexecute (in the extended query protocol)
> "SAVEPOINT PGJDBC_AUTOSAVE" (in the simple query protocol)
>
> It seems the SAVEPOINT is sent without finishing the extended query
> protocol (i.e. without Sync message). Is it possible for the JDBC
> driver to issue a Sync message before sending SAVEPOINT in simple
> query protocol? Or you can send SAVEPOINT using the extended query
> protocol.
>
> [1]
> https://www.pgpool.net/pipermail/pgpool-general/2023-December/009051.html


Can you ask the OP what version of the driver they are using? From what I
can tell, we send BEGIN using SimpleQuery.

Dave


Re: DSA_ALLOC_NO_OOM doesn't work

2024-02-13 Thread Heikki Linnakangas

(moving to pgsql-hackers)

On 29/01/2024 14:06, Heikki Linnakangas wrote:

> If you call dsa_allocate_extended(DSA_ALLOC_NO_OOM), it will still
> ereport an error if you run out of space (originally reported at [0]).
> 
> Attached patch adds code to test_dsa.c to demonstrate that:
> 
> postgres=# select test_dsa_basic();
> ERROR:  could not resize shared memory segment "/PostgreSQL.1312700148"
> to 1075843072 bytes: No space left on device
> 
> [0] https://github.com/pgvector/pgvector/issues/434#issuecomment-1912744489


I wrote the attached patch to address this, in a fairly straightforward 
or naive way. The problem was that even though dsa_allocate() had code 
to check the return value of dsm_create(), and return NULL instead of 
ereport(ERROR) if the DSA_ALLOC_NO_OOM was set, dsm_create() does not in 
fact return NULL on OOM. To fix, I added a DSM_CREATE_NO_OOM option to 
dsm_create(), and I also had to punch that through to dsm_impl_op().


This is a little unpolished, but if we want to backpatch something 
narrow now, this would probably be the right approach.


However, I must say that the dsm_impl_op() interface is absolutely 
insane. It's like someone looked at ioctl() and thought, "hey that's a 
great idea!". It mixes all different operations like creating or 
destroying a DSM segment together into one function call, and the return 
value is just a boolean, even though the function could fail for many 
different reasons, and the callers do in fact care about the reason. In 
a more natural interface, the different operations would have very 
different function signatures.
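
For instance, something along these lines (only a sketch; the names and
exact parameters are invented):

extern void *dsm_impl_create(dsm_handle handle, Size request_size, int elevel);
extern void *dsm_impl_attach(dsm_handle handle, Size *mapped_size);
extern void dsm_impl_detach(dsm_handle handle, void *addr, Size mapped_size);
extern void dsm_impl_destroy(dsm_handle handle);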


I think we must refactor that. It might be best to leave this 
DSA_ALLOC_NO_OOM bug unfixed in backpatches, and fix it on top of the 
refactorings on master only. Later, we can backpatch the refactorings 
too if we're comfortable with it; extensions shouldn't be using the 
dsm_impl_op() interface directly.


(I skimmed through the thread where the DSM code was added, but didn't 
see any mention of why dsm_impl_op() signature is like that: 
https://www.postgresql.org/message-id/CA%2BTgmoaDqDUgt%3D4Zs_QPOnBt%3DEstEaVNP%2B5t%2Bm%3DFPNWshiPR3A%40mail.gmail.com)


--
Heikki Linnakangas
Neon (https://neon.tech)
From b60f877ad67e70b60915b2b25d0dbae6972c3536 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Tue, 13 Feb 2024 14:22:10 +0200
Subject: [PATCH 1/2] Add test

Discussion: https://www.postgresql.org/message-id/5efa4a5e-2b8b-42dd-80ed-f920718cf...@iki.fi
---
 src/test/modules/test_dsa/test_dsa.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/src/test/modules/test_dsa/test_dsa.c b/src/test/modules/test_dsa/test_dsa.c
index 844316dec2b..f37eb57e99d 100644
--- a/src/test/modules/test_dsa/test_dsa.c
+++ b/src/test/modules/test_dsa/test_dsa.c
@@ -51,6 +51,19 @@ test_dsa_basic(PG_FUNCTION_ARGS)
 	for (int i = 0; i < 100; i++)
 	{
 		dsa_free(a, p[i]);
+		p[i] = InvalidDsaPointer;
+	}
+
+	for (int i = 0; i < 100; i++)
+		p[i] = dsa_allocate_extended(a, 1024*1024*1024, DSA_ALLOC_NO_OOM | DSA_ALLOC_HUGE);
+
+	for (int i = 0; i < 100; i++)
+	{
+		if (p[i] != InvalidDsaPointer)
+		{
+			dsa_free(a, p[i]);
+			p[i] = InvalidDsaPointer;
+		}
 	}
 
 	dsa_detach(a);
-- 
2.39.2

From fb20d55111d726dca247b2111a175f30d412e35a Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Tue, 13 Feb 2024 15:52:48 +0200
Subject: [PATCH 2/2] Fix DSA_ALLOC_NO_OOM

---
 src/backend/access/common/session.c   |  2 +-
 src/backend/access/transam/parallel.c |  2 +-
 src/backend/storage/ipc/dsm.c | 43 +-
 src/backend/storage/ipc/dsm_impl.c| 84 ---
 src/backend/utils/mmgr/dsa.c  |  2 +-
 src/include/storage/dsm.h |  1 +
 src/include/storage/dsm_impl.h|  4 +-
 7 files changed, 97 insertions(+), 41 deletions(-)

diff --git a/src/backend/access/common/session.c b/src/backend/access/common/session.c
index 3f2256f4915..5d4fe6dbb7a 100644
--- a/src/backend/access/common/session.c
+++ b/src/backend/access/common/session.c
@@ -102,7 +102,7 @@ GetSessionDsmHandle(void)
 
 	/* Set up segment and TOC. */
 	size = shm_toc_estimate(&estimator);
-	seg = dsm_create(size, DSM_CREATE_NULL_IF_MAXSEGMENTS);
+	seg = dsm_create(size, DSM_CREATE_NULL_IF_MAXSEGMENTS | DSM_CREATE_NO_OOM);
 	if (seg == NULL)
 	{
 		MemoryContextSwitchTo(old_context);
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 849a03e4b65..df1e5e7145d 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -312,7 +312,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	 */
 	segsize = shm_toc_estimate(&pcxt->estimator);
 	if (pcxt->nworkers > 0)
-		pcxt->seg = dsm_create(segsize, DSM_CREATE_NULL_IF_MAXSEGMENTS);
+		pcxt->seg = dsm_create(segsize, DSM_CREATE_NULL_IF_MAXSEGMENTS | DSM_CREATE_NO_OOM);
 	if (pcxt->seg != NULL)
 		pcxt->toc = shm_toc_create(PARALLEL_MAGIC,
    dsm_segment_address(pcxt->seg),
diff --git a/src/backend/storage/ipc/dsm.c 

Re: Printing backtrace of postgres processes

2024-02-13 Thread Ashutosh Bapat
On Mon, Feb 12, 2024 at 6:52 AM Ashutosh Bapat
 wrote:
>
> >
> > > We defer actual action triggered by a signal till CHECK_FOR_INTERRUPTS
> > > is called. I understand that we can't do that here since we want to
> > > capture the backtrace at that moment and can't wait till next CFI. But
> > > printing the backend can surely wait till next CFI right?
> >
> > Delaying the call of backtrace() to happen during a CFI() would be
> > safe, yes, and writing data to stderr would not really be an issue as
> > at least the data would be sent somewhere.  That's less useful, but
> > we do that for memory contexts.
>
> Memory contexts do not change more or less till next CFI, but stack
> traces do. So I am not sure whether it is desirable to wait to capture
> backtrace till next CFI. Given that the user can not time a call to
> pg_log_backend() exactly, so whether it captures the backtrace exactly
> at when interrupt happens or at the next CFI may not matter much in
> practice.
>

Thinking more about this I have following thoughts/questions:

1. Whether getting a backtrace at CFI is good enough?
Backtrace is required to know what a process is doing when it's stuck
or its behaviour is unexpected, etc. PostgreSQL code has CFIs sprinkled in
almost all the tight loops. Whether those places are enough to cover
most of the cases that the user of this feature would care about?

2. tools like gdb, strace can be used to get the stack trace of any
process, so do we really need this tool?
Most OSes provide such tools, but maybe these are useful in
Kubernetes-like environments; I am not sure.

3. tools like gdb and strace are able to capture stack trace at any
point during execution. Can we use the same mechanism instead of
relying on CFI?

4. tools like gdb and strace can capture more than just stack trace
e.g. variable values, values of registers etc. Are we planning to add
those facilities as well? Or will this feature be useful
without those facilities?

The feature may be more useful if it can provide PostgreSQL-specific
details which an external tool cannot.
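
To make question 1 concrete, the CFI-based flow I understand is being
discussed would look roughly like this (a hypothetical sketch, not the
actual patch):

/* set in the signal handler, acted upon at the next CFI */
static volatile sig_atomic_t print_backtrace_pending = false;

static void
HandleLogBacktraceInterrupt(void)
{
	print_backtrace_pending = true;
	SetLatch(MyLatch);
}

/* ... and somewhere inside ProcessInterrupts() */
if (print_backtrace_pending)
{
	print_backtrace_pending = false;
	set_backtrace(NULL, 0);		/* assuming set_backtrace() can log it */
}

So the stack we report is whatever the backend happens to be doing at that
CFI, not at the instant the signal arrived.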

-- 
Best Wishes,
Ashutosh Bapat




Re: recently added jsonpath method change jsonb_path_query, jsonb_path_query_first immutability

2024-02-13 Thread Jeevan Chalke
On Sat, Feb 10, 2024 at 10:55 PM Andrew Dunstan  wrote:

>
> On 2024-02-08 Th 21:02, Jeevan Chalke wrote:
>
>
>
> On Thu, Feb 8, 2024 at 2:22 PM jian he 
> wrote:
>
>> On Thu, Feb 8, 2024 at 1:27 PM Jeevan Chalke
>>  wrote:
>> >
>> >
>> >
>> > On Wed, Feb 7, 2024 at 9:13 PM jian he 
>> wrote:
>> >>
>> >> On Wed, Feb 7, 2024 at 7:36 PM Jeevan Chalke
>> >>  wrote:
>> >> > Added checkTimezoneIsUsedForCast() check where ever we are casting
>> timezoned to non-timezoned types and vice-versa.
>> >>
>> >> https://www.postgresql.org/docs/devel/functions-json.html
>> >> above Table 9.51. jsonpath Filter Expression Elements, the Note
>> >> section, do we also need to rephrase it?
>> >
>> >
>> > OK. Added a line for the same.
>> >
>>
>> diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
>> index 6788ba8..37ae2d1 100644
>> --- a/doc/src/sgml/func.sgml
>> +++ b/doc/src/sgml/func.sgml
>> @@ -18240,7 +18240,11 @@ ERROR:  jsonpath member accessor can only be
>> applied to an object
>>timestamptz, and time to
>> timetz.
>>However, all but the first of these conversions depend on the
>> current
>> setting, and thus can only be
>> performed
>> -  within timezone-aware jsonpath functions.
>> +  within timezone-aware jsonpath functions.  Similarly,
>> other
>> +  date/time-related methods that convert string to the date/time
>> types
>> +  also do the casting and may involve the current
>> +  .  To preserve the immutability,
>> those can
>> +  only be performed within timezone-aware jsonpath
>> functions.
>>   
>>  
>>
>> my proposed minor changes:
>> -  within timezone-aware jsonpath functions.
>> +  within timezone-aware jsonpath functions. Similarly,
>> other
>> +  date/time-related methods that convert string to the date/time
>> types
>> +  also do the casting and may involve the current
>> +   setting. Those conversions can
>> +  only be performed within timezone-aware jsonpath
>> functions.
>> I don't have a strong opinion, though.
>>
>
> That seems fine as well. Let's leave that to the committer.
>
> I edited slightly to my taste, and committed the patch. Thanks.
>

Thank you, Andrew and Jian.


>
> cheers
>
>
> andrew
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>

-- 
Jeevan Chalke

*Principal, Manager, Product Development*



edbpostgres.com


RE: speed up a logical replica setup

2024-02-13 Thread Hayato Kuroda (Fujitsu)
Dear Shubham,

Thanks for testing the patch!

> 
> I tried verifying few scenarios by using 5 databases and came across
> the following errors:
> 
> ./pg_createsubscriber -D ../new_standby -P "host=localhost port=5432
> dbname=postgres" -S "host=localhost port=9000 dbname=postgres"  -d db1
> -d db2 -d db3 -d db4 -d db5
> 
> pg_createsubscriber: error: publisher requires 6 wal sender
> processes, but only 5 remain
> pg_createsubscriber: hint: Consider increasing max_wal_senders to at least 7.
> 
> It is successful only with 7 wal senders, so we can change error
> messages accordingly.
> 
> 
> pg_createsubscriber: error: publisher requires 6 replication slots,
> but only 5 remain
> pg_createsubscriber: hint: Consider increasing max_replication_slots
> to at least 7.
> 
> It is successful only with 7 replication slots, so we can change error
> messages accordingly.

I'm not the original author, but I don't think it is needed. The hint message
already suggests changing it to 7. According to the doc [1], the primary
message should be factual and the hint message should be used for suggestions.
I felt the current code followed that style. Thoughts?

New patch is available in [2].

[1]: https://www.postgresql.org/docs/devel/error-style-guide.html
[2]: 
https://www.postgresql.org/message-id/TYCPR01MB12077A6BB424A025F04A8243DF54F2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



RE: speed up a logical replica setup

2024-02-13 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

I've replied for trackability.

> Further comments for v17.
> 
> 01.
> This program assumes that the target server has same major version with this.
> Because the target server would be restarted by same version's pg_ctl command.
> I felt it should be ensured by reading the PG_VERSION.

Still investigating.

> 02.
> pg_upgrade checked the version of using executables, like pg_ctl, postgres, 
> and
> pg_resetwal. I felt it should be as well.

Still investigating.

> 03. get_bin_directory
> ```
>   if (find_my_exec(path, full_path) < 0)
>   {
>   pg_log_error("The program \"%s\" is needed by %s but was not
> found in the\n"
>"same directory as \"%s\".\n",
>"pg_ctl", progname, full_path);
> ```
> 
> s/"pg_ctl"/progname

The message was updated.

> 04.
> Missing canonicalize_path()?

I found that find_my_exec() calls canonicalize_path(), so there is no need to do it again.

> 05.
> Assuming that the target server is a cascade standby, i.e., it has a role as
> another primary. In this case, I thought the child node would not work. 
> Because
> pg_createsubscriber runs pg_resetwal and all WAL files would be discarded at 
> that
> time. I have not tested, but should the program detect it and exit earlier?

Still investigating.

> 06.
> wait_for_end_recovery() waits forever even if the standby has been 
> disconnected
> from the primary, right? should we check the status of the replication via
> pg_stat_wal_receiver?

Still investigating.

> 07.
> The cleanup function has couple of bugs.
> 
> * If subscriptions have been created on the database, the function also tries 
> to
>   drop a publication. But it leads an ERROR because it has been already 
> dropped.
>   See setup_subscriber().
> * If the subscription has been created, drop_replication_slot() leads an 
> ERROR.
>   Because the subscriber tried to drop the subscription while executing DROP
> SUBSCRIPTION.

Only drop_publication() was removed.

> 08.
> I found that all messages (ERROR, WARNING, INFO, etc...) would output to 
> stderr,
> but I felt it should be on stdout. Is there a reason? pg_dump outputs 
> messages to
> stderr, but the motivation might be to avoid confusion with dumps.

Still investigating.

> 09.
> I'm not sure the cleanup for subscriber is really needed. Assuming that there
> are two databases, e.g., pg1 pg2 , and we fail to create a subscription on 
> pg2.
> This can happen when the subscription which has the same name has been
> already
> created on the primary server.
> In this case a subscirption pn pg1 would be removed. But what is a next step?
> Since a timelineID on the standby server is larger than the primary (note that
> the standby has been promoted once), we cannot resume the physical replication
> as-is. IIUC the easiest method to retry is removing a cluster once and 
> restarting
> from pg_basebackup. If so, no need to cleanup the standby because it is
> corrupted.
> We just say "Please remove the cluster and recreate again".

I still think it should be done, but it is not done yet.

New patch can be available in [1].

[1]: 
https://www.postgresql.org/message-id/TYCPR01MB12077A6BB424A025F04A8243DF54F2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



RE: speed up a logical replica setup

2024-02-13 Thread Hayato Kuroda (Fujitsu)
Dear Vignesh,

Since the original author seems bit busy, I updated the patch set.

> 1) Cleanup function handler flag should be reset, i.e.
> dbinfo->made_replslot = false; should be there else there will be an
> error during  drop replication slot cleanup in error flow:
> +static void
> +drop_replication_slot(PGconn *conn, LogicalRepInfo *dbinfo, const
> char *slot_name)
> +{
> +   PQExpBuffer str = createPQExpBuffer();
> +   PGresult   *res;
> +
> +   Assert(conn != NULL);
> +
> +   pg_log_info("dropping the replication slot \"%s\" on database
> \"%s\"", slot_name, dbinfo->dbname);
> +
> +   appendPQExpBuffer(str, "SELECT
> pg_drop_replication_slot('%s')", slot_name);
> +
> +   pg_log_debug("command is: %s", str->data);

Fixed.

> 2) Cleanup function handler flag should be reset, i.e.
> dbinfo->made_publication = false; should be there else there will be
> an error during drop publication cleanup in error flow:
> +/*
> + * Remove publication if it couldn't finish all steps.
> + */
> +static void
> +drop_publication(PGconn *conn, LogicalRepInfo *dbinfo)
> +{
> +   PQExpBuffer str = createPQExpBuffer();
> +   PGresult   *res;
> +
> +   Assert(conn != NULL);
> +
> +   pg_log_info("dropping publication \"%s\" on database \"%s\"",
> dbinfo->pubname, dbinfo->dbname);
> +
> +   appendPQExpBuffer(str, "DROP PUBLICATION %s", dbinfo->pubname);
> +
> +   pg_log_debug("command is: %s", str->data);
> 
> 3) Cleanup function handler flag should be reset, i.e.
> dbinfo->made_subscription = false; should be there else there will be
> an error during drop publication cleanup in error flow:
> +/*
> + * Remove subscription if it couldn't finish all steps.
> + */
> +static void
> +drop_subscription(PGconn *conn, LogicalRepInfo *dbinfo)
> +{
> +   PQExpBuffer str = createPQExpBuffer();
> +   PGresult   *res;
> +
> +   Assert(conn != NULL);
> +
> +   pg_log_info("dropping subscription \"%s\" on database \"%s\"",
> dbinfo->subname, dbinfo->dbname);
> +
> +   appendPQExpBuffer(str, "DROP SUBSCRIPTION %s",
> dbinfo->subname);
> +
> +   pg_log_debug("command is: %s", str->data);

Fixed.

> 4) I was not sure if drop_publication is required here, as we will not
> create any publication in subscriber node:
> +   if (dbinfo[i].made_subscription)
> +   {
> +   conn = connect_database(dbinfo[i].subconninfo);
> +   if (conn != NULL)
> +   {
> +   drop_subscription(conn, [i]);
> +   if (dbinfo[i].made_publication)
> +   drop_publication(conn, [i]);
> +   disconnect_database(conn);
> +   }
> +   }

Removed. But I'm not sure the cleanup is really meaningful.
See [1].

> 5) The connection should be disconnected in case of error case:
> +   /* secure search_path */
> +   res = PQexec(conn, ALWAYS_SECURE_SEARCH_PATH_SQL);
> +   if (PQresultStatus(res) != PGRES_TUPLES_OK)
> +   {
> +   pg_log_error("could not clear search_path: %s",
> PQresultErrorMessage(res));
> +   return NULL;
> +   }
> +   PQclear(res);

PQfinish() was added.

> 6) There should be a line break before postgres_fe inclusion, to keep
> it consistent:
> + *-
> + */
> +#include "postgres_fe.h"
> +
> +#include 

Added.

> 7) These includes are not required:
> 7.a) #include 
> 7.b) #include 
> 7.c) #include 
> 7.d) #include "access/xlogdefs.h"
> 7.e) #include "catalog/pg_control.h"
> 7.f) #include "common/file_utils.h"
> 7.g) #include "utils/pidfile.h"

Removed.

> + * src/bin/pg_basebackup/pg_createsubscriber.c
> + *
> + *-
> + */
> +#include "postgres_fe.h"
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "access/xlogdefs.h"
> +#include "catalog/pg_authid_d.h"
> +#include "catalog/pg_control.h"
> +#include "common/connect.h"
> +#include "common/controldata_utils.h"
> +#include "common/file_perm.h"
> +#include "common/file_utils.h"
> +#include "common/logging.h"
> +#include "common/restricted_token.h"
> +#include "fe_utils/recovery_gen.h"
> +#include "fe_utils/simple_list.h"
> +#include "getopt_long.h"
> +#include "utils/pidfile.h"
> 
> 8) POSTMASTER_STANDBY and POSTMASTER_FAILED are not being used, is it
> required or kept for future purpose:
> +enum WaitPMResult
> +{
> +   POSTMASTER_READY,
> +   POSTMASTER_STANDBY,
> +   POSTMASTER_STILL_STARTING,
> +   POSTMASTER_FAILED
> +};

I think they are here because WaitPMResult was just ported from pg_ctl.c.
I renamed the enumeration and removed the unnecessary values.

> 9) pg_createsubscriber should be kept after pg_basebackup to maintain
> the consistent order:
> diff --git 

Re: table inheritance versus column compression and storage settings

2024-02-13 Thread Ashutosh Bapat
On Mon, Feb 12, 2024 at 8:48 PM Peter Eisentraut  wrote:
>
> On 08.02.24 08:20, Ashutosh Bapat wrote:
> > On Wed, Feb 7, 2024 at 12:47 PM Ashutosh Bapat
> >  wrote:
> >
> >> 0001 fixes compression inheritance
> >> 0002 fixes storage inheritance
> >>
> >
> > The first patch does not update compression_1.out which makes CI
> > unhappy. Here's patchset fixing that.
>
> The changed behavior looks good to me.  The tests are good, the code
> changes are pretty straightforward.
>
> Did you by any change check that pg_dump dumps the resulting structures
> correctly?  I notice in tablecmds.c that ALTER COLUMN SET STORAGE
> recurses but ALTER COLUMN SET COMPRESSION does not.  I don't understand
> why that is, and I wonder whether it affects pg_dump.
>

I used src/bin/pg_upgrade/t/002_pg_upgrade.pl to test dump and restore
by leaving back the new objects created in compression.sql and
inherit.sql.

COMPRESSION is set using ALTER TABLE ONLY so it affects only the
parent and should not propagate to children. A child inherits the
parent first and then changes compression property. For example
```
CREATE TABLE public.cmparent1 (
f1 text
);
ALTER TABLE ONLY public.cmparent1 ALTER COLUMN f1 SET COMPRESSION pglz;

CREATE TABLE public.cminh1 (
f1 text
)
INHERITS (public.cmparent1);
ALTER TABLE ONLY public.cminh1 ALTER COLUMN f1 SET COMPRESSION lz4;
```

Same is true with the STORAGE parameter. Example
```
CREATE TABLE public.stparent1 (
a text
);
ALTER TABLE ONLY public.stparent1 ALTER COLUMN a SET STORAGE PLAIN;

CREATE TABLE public.stchild1 (
a text
)
INHERITS (public.stparent1);
ALTER TABLE ONLY public.stchild1 ALTER COLUMN a SET STORAGE PLAIN;
```

I don't think pg_dump would be affected by the difference you noted.

-- 
Best Wishes,
Ashutosh Bapat




RE: Synchronizing slots from primary to standby

2024-02-13 Thread Zhijie Hou (Fujitsu)
On Tuesday, February 13, 2024 7:30 PM Bertrand Drouvot 
 wrote:
> 
> On Tue, Feb 13, 2024 at 04:08:23AM +, Zhijie Hou (Fujitsu) wrote:
> > On Tuesday, February 13, 2024 9:16 AM Zhijie Hou (Fujitsu)
>  wrote:
> > >
> > > Here is the new version patch which addressed above and most of
> > > Bertrand's comments.
> > >
> > > TODO: trying to add one test for the case the slot is valid on
> > > primary while the synced slots is invalidated on the standby.
> >
> > Here is the V85_2 patch set that added the test and fixed one typo,
> > there are no other code changes.
> 
> Thanks!
> 
> Out of curiosity I ran a code coverage and the result for slotsync.c can be 
> found
> in [1].
> 
> It appears that:
> 
> - only one function is not covered (slotsync_failure_callback()).

Thanks for the test! I think slotsync_failure_callback can be covered more
easily in the next slotsync worker patch on worker exit; I will post that
after rebasing.

Best Regards,
Hou zj




RE: Synchronizing slots from primary to standby

2024-02-13 Thread Zhijie Hou (Fujitsu)
On Tuesday, February 13, 2024 2:51 PM Amit Kapila  
wrote:
> 
> On Tue, Feb 13, 2024 at 9:38 AM Zhijie Hou (Fujitsu) 
> wrote:
> >
> > Here is the V85_2 patch set that added the test and fixed one typo,
> > there are no other code changes.
> >
> 
> Few comments on the latest changes:

Thanks for the comments.

> ==
> 1.
> +# Confirm that the invalidated slot has been dropped.
> +$standby1->wait_for_log(qr/dropped replication slot "lsub1_slot" of
> +dbid 5/,  $log_offset);
> 
> Is it okay to hardcode dbid 5? I am a bit worried that it can lead to 
> instability in
> the test.
> 
> 2.
> +check_primary_info(WalReceiverConn *wrconn, int elevel) {
> ..
> + bool primary_info_valid;
> 
> I don't think for 0001, we need an elevel as an argument, so let's remove it.
> Additionally, can we change the variable name primary_info_valid to
> primary_slot_valid? Also, can we change the function name to
> validate_remote_info() as the remote can be both primary or standby?
> 
> 3.
> +SyncReplicationSlots(WalReceiverConn *wrconn) {
> +PG_ENSURE_ERROR_CLEANUP(slotsync_failure_callback,
> +PointerGetDatum(wrconn));  {  check_primary_info(wrconn, ERROR);
> +
> + synchronize_slots(wrconn);
> + }
> + PG_END_ENSURE_ERROR_CLEANUP(slotsync_failure_callback,
> PointerGetDatum(wrconn));
> +
> + walrcv_disconnect(wrconn);
> 
> It is better to disconnect in the caller where we have made the connection.

All above comments look good to me.
Here is the V86 patch that addresses the above. This version also includes
some other minor changes:

1. Added a few comments for the temporary slot creation and XLogGetOldestSegno.
2. Adjusted the doc for the SQL function.
3. Reordered two error messages in the slot create function.
4. Fixed a few typos.

Thanks Shveta for off-list discussions.

Best Regards,
Hou zj


v86-0001-Add-a-slot-synchronization-function.patch
Description: v86-0001-Add-a-slot-synchronization-function.patch


Re: meson vs Cygwin

2024-02-13 Thread Marco Atzeri

On 22/11/2023 13:02, Andrew Dunstan wrote:


I've been trying to get meson working with Cygwin. On a fresh install 
(Cygwin 3.4.9, gcc 11.4.0, meson 1.0.2, ninja 1.11.1) I get a bunch of 
errors like this:


ERROR:  incompatible library 
"/home/andrew/bf/buildroot/HEAD/pgsql.build/tmp_install/home/andrew/bf/buildroot/HEAD/inst/lib/postgresql/plperl.dll": missing magic block


Similar things happen if I try to build with python.

I'm not getting the same on a configure/make build. Not sure what would 
be different.



cheers


andrew



Hi Andrew,
Sorry for jumping on this request so late.

How are you configuring the build?

Marco Atzeri
Postgresql package manager for Cygwin







Re: Synchronizing slots from primary to standby

2024-02-13 Thread Amit Kapila
On Tue, Feb 13, 2024 at 4:59 PM Bertrand Drouvot
 wrote:
>
> On Tue, Feb 13, 2024 at 04:08:23AM +, Zhijie Hou (Fujitsu) wrote:
> > On Tuesday, February 13, 2024 9:16 AM Zhijie Hou (Fujitsu) 
> >  wrote:
> > >
> > > Here is the new version patch which addressed above and most of Bertrand's
> > > comments.
> > >
> > > TODO: trying to add one test for the case the slot is valid on primary 
> > > while the
> > > synced slots is invalidated on the standby.
> >
> > Here is the V85_2 patch set that added the test and fixed one typo,
> > there are no other code changes.
>
> Thanks!
>
> Out of curiosity I ran a code coverage and the result for slotsync.c can be
> found in [1].
>
> It appears that:
>
> - only one function is not covered (slotsync_failure_callback()).
> - 84% of the slotsync.c code is covered, the parts that are not are mainly
> related to "errors".
>
> Worth to try to extend the coverage? (I've in mind 731, 739, 766, 778, 786, 
> 796,
> 808)
>

All these additional line numbers mentioned by you are ERROR paths. I
think if we want we can easily cover most of those but I am not sure
if there is a benefit to cover each error path.

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-02-13 Thread Bertrand Drouvot
Hi,

On Tue, Feb 13, 2024 at 04:08:23AM +, Zhijie Hou (Fujitsu) wrote:
> On Tuesday, February 13, 2024 9:16 AM Zhijie Hou (Fujitsu) 
>  wrote:
> > 
> > Here is the new version patch which addressed above and most of Bertrand's
> > comments.
> > 
> > TODO: trying to add one test for the case the slot is valid on primary 
> > while the
> > synced slots is invalidated on the standby.
> 
> Here is the V85_2 patch set that added the test and fixed one typo,
> there are no other code changes.

Thanks!

Out of curiosity I ran a code coverage and the result for slotsync.c can be
found in [1].

It appears that:

- only one function is not covered (slotsync_failure_callback()).
- 84% of the slotsync.c code is covered, the parts that are not are mainly
related to "errors".

Worth to try to extend the coverage? (I've in mind 731, 739, 766, 778, 786, 796,
808)

[1]: 
https://htmlpreview.github.io/?https://raw.githubusercontent.com/bdrouvot/pg_code_coverage/main/src/backend/replication/logical/slotsync.c.gcov.html

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Add sortsupport for range types and btree_gist

2024-02-13 Thread Bernd Helmle
On Monday, 12.02.2024 at 21:00 +0800, jian he wrote:
> + 
> +  Per default btree_gist builts
> GiST indexe with
> +  sortsupport in sorted
> mode. This usually results in a
> +  much better index quality and smaller index sizes by much faster
> index built speed. It is still
> +  possible to revert to buffered built strategy by using the
> buffering parameter
> +  when creating the index.
> + 
> +
> I believe `built` |`builts` should be `build`.

Right, Fixed.

> Also
> maybe we can simply copy some texts from
> https://www.postgresql.org/docs/current/gist-implementation.html.
> how about the following:
>   
>    The sorted method is only available if each of the opclasses used
> by the
>    index provides a sortsupport function, as
> described
>    in .  If they do, this method
> is
>    usually the best, so it is used by default.
>   It is still possible to change to a buffered build strategy by
> using
> the buffering parameter
>   to the CREATE INDEX command.
>   

Hmm, not sure what you are trying to achieve with this. The opclasses in
btree_gist provide sortsupport, but by reading the above I would get
the impression they're still optional.

> 
> you've changed contrib/btree_gist/meson.build, seems we also need to
> change contrib/btree_gist/Makefile
> 

Oh, good catch. I'm so focused on meson already that I totally forgot
the good old Makefile. Fixed.

> gist_point_sortsupport have `if (ssup->abbreviate)`,  does
> range_gist_sortsupport also this part?
> I think the `if(ssup->abbreviate)` part is optional?
> Can we add some comments on it?

I've thought about abbreviated-key support but put that aside for
later. I wanted to focus on general sortsupport first, so I postponed
abbreviated keys for another round.

If we agree that this patch needs support for abbreviated keys now, I
can certainly work on it.
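
For readers unfamiliar with the SortSupport abbreviation hooks, here is a rough,
hypothetical sketch of the extra wiring an `if (ssup->abbreviate)` branch adds
(toy pass-by-value keys and made-up function names; this is not part of the
posted patch):

#include "postgres.h"
#include "fmgr.h"
#include "utils/sortsupport.h"

/* Full (authoritative) comparator for a toy int64-keyed opclass. */
static int
example_cmp_full(Datum x, Datum y, SortSupport ssup)
{
	int64		a = DatumGetInt64(x);
	int64		b = DatumGetInt64(y);

	return (a > b) - (a < b);
}

/* With a pass-by-value key, the "abbreviation" can simply be the value itself. */
static Datum
example_abbrev_convert(Datum original, SortSupport ssup)
{
	return original;
}

/* Never bail out of abbreviation in this toy example. */
static bool
example_abbrev_abort(int memtupcount, SortSupport ssup)
{
	return false;
}

PG_FUNCTION_INFO_V1(example_sortsupport);

Datum
example_sortsupport(PG_FUNCTION_ARGS)
{
	SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);

	ssup->comparator = example_cmp_full;

	if (ssup->abbreviate)
	{
		/*
		 * ssup->comparator is then applied to abbreviated keys, while the
		 * full comparator resolves ties; with an identity conversion the
		 * same function serves both roles.
		 */
		ssup->abbrev_converter = example_abbrev_convert;
		ssup->abbrev_abort = example_abbrev_abort;
		ssup->abbrev_full_comparator = example_cmp_full;
	}

	PG_RETURN_VOID();
}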

Thanks for your review,

Bernd

diff --git a/contrib/btree_gist/Makefile b/contrib/btree_gist/Makefile
index 7ac2df26c10..68190ac5e46 100644
--- a/contrib/btree_gist/Makefile
+++ b/contrib/btree_gist/Makefile
@@ -34,7 +34,7 @@ DATA = btree_gist--1.0--1.1.sql \
btree_gist--1.1--1.2.sql btree_gist--1.2.sql btree_gist--1.2--1.3.sql \
btree_gist--1.3--1.4.sql btree_gist--1.4--1.5.sql \
btree_gist--1.5--1.6.sql btree_gist--1.6--1.7.sql \
-   btree_gist--1.7--1.8.sql
+   btree_gist--1.7--1.8.sql btree_gist--1.8--1.9.sql
 PGFILEDESC = "btree_gist - B-tree equivalent GiST operator classes"
 
 REGRESS = init int2 int4 int8 float4 float8 cash oid timestamp timestamptz \
diff --git a/contrib/btree_gist/btree_bit.c b/contrib/btree_gist/btree_bit.c
index 6790f22b4b6..67ea5d9e182 100644
--- a/contrib/btree_gist/btree_bit.c
+++ b/contrib/btree_gist/btree_bit.c
@@ -6,7 +6,7 @@
 #include "btree_gist.h"
 #include "btree_utils_var.h"
 #include "utils/builtins.h"
-#include "utils/bytea.h"
+#include "utils/sortsupport.h"
 #include "utils/varbit.h"
 
 
@@ -19,10 +19,33 @@ PG_FUNCTION_INFO_V1(gbt_bit_picksplit);
 PG_FUNCTION_INFO_V1(gbt_bit_consistent);
 PG_FUNCTION_INFO_V1(gbt_bit_penalty);
 PG_FUNCTION_INFO_V1(gbt_bit_same);
+PG_FUNCTION_INFO_V1(gbt_bit_sortsupport);
+PG_FUNCTION_INFO_V1(gbt_varbit_sortsupport);
 
 
 /* define for comparison */
 
+static int
+gbt_bit_ssup_cmp(Datum x, Datum y, SortSupport ssup)
+{
+	GBT_VARKEY *key1 = PG_DETOAST_DATUM(x);
+	GBT_VARKEY *key2 = PG_DETOAST_DATUM(y);
+
+	GBT_VARKEY_R arg1 = gbt_var_key_readable(key1);
+	GBT_VARKEY_R arg2 = gbt_var_key_readable(key2);
+	Datum result;
+
+	/* lower is always equal to upper, so just compare those fields */
+	result = DirectFunctionCall2(bitcmp,
+ PointerGetDatum(arg1.lower),
+ PointerGetDatum(arg2.lower));
+
+	GBT_FREE_IF_COPY(key1, x);
+	GBT_FREE_IF_COPY(key2, y);
+
+	return DatumGetInt32(result);
+}
+
 static bool
 gbt_bitgt(const void *a, const void *b, Oid collation, FmgrInfo *flinfo)
 {
@@ -208,3 +231,25 @@ gbt_bit_penalty(PG_FUNCTION_ARGS)
 	PG_RETURN_POINTER(gbt_var_penalty(result, o, n, PG_GET_COLLATION(),
 	  , fcinfo->flinfo));
 }
+
+Datum
+gbt_bit_sortsupport(PG_FUNCTION_ARGS)
+{
+	SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
+
+	ssup->comparator = gbt_bit_ssup_cmp;
+	ssup->ssup_extra = NULL;
+
+	PG_RETURN_VOID();
+}
+
+Datum
+gbt_varbit_sortsupport(PG_FUNCTION_ARGS)
+{
+	SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
+
+	ssup->comparator = gbt_bit_ssup_cmp;
+	ssup->ssup_extra = NULL;
+
+	PG_RETURN_VOID();
+}
\ No newline at end of file
diff --git a/contrib/btree_gist/btree_bool.c b/contrib/btree_gist/btree_bool.c
index 8b2af129b52..5058dba2986 100644
--- a/contrib/btree_gist/btree_bool.c
+++ b/contrib/btree_gist/btree_bool.c
@@ -6,6 +6,7 @@
 #include "btree_gist.h"
 #include "btree_utils_num.h"
 #include "common/int.h"
+#include "utils/sortsupport.h"
 
 typedef struct boolkey
 {
@@ -23,6 +24,16 @@ PG_FUNCTION_INFO_V1(gbt_bool_picksplit);
 PG_FUNCTION_INFO_V1(gbt_bool_consistent);
 PG_FUNCTION_INFO_V1(gbt_bool_penalty);
 

Re: A new strategy for pull-up correlated ANY_SUBLINK

2024-02-13 Thread Alexander Korotkov
Hi!

> If the changes of Alena are ok, can you merge the changes and post an
> updated version so that CFBot can apply the patch and verify the
> changes. As currently CFBot is trying to apply only Alena's changes
> and failing with the following at [1]:

I think this is a very nice and pretty simple optimization.  I've
merged the changes by Alena, and slightly revised the code changes in
convert_ANY_sublink_to_join().  I'm going to push this if there are no
objections.

--
Regards,
Alexander Korotkov


0001-Pull-up-ANY-SUBLINK-with-the-necessary-lateral-su-v6.patch
Description: Binary data


Re: Add publisher and subscriber to glossary documentation.

2024-02-13 Thread Shlok Kyal
Hi,
I addressed the comments and updated the patch.

> Should these be "publisher node" and "subscriber node" instead?  Do we
> want to define the term "node"?  I think in everyday conversations we
> use "node" quite a lot, so maybe we do need an entry for it.  (Maybe
> just  suffices, plus add under instance
> "also called a node".)

Modified

> +   Publisher
> +   
> +
> +  A node where publication is defined.
> +  It replicates the publication changes to the subscriber node.
>
> Apart from deciding what to do with "node", what are "changes"?  This
> doesn't seem very specific.

Modified

> +   Subscriber
> +   
> +
> + A node where subscription is defined.
> + It subscribe to one or more publications on a publisher node and pull 
> the data
> + from the publications they subscribe to.
>
> Same issues as above, plus there are some grammar issues.

Modified

> I think these definitions should use the term "logical replication",
> which we don't currently define.  We do have "replication" where we
> provide an overview of "logical replication".  Maybe that's enough, but
> we should consider whether we want a separate definition of logical
> replication (I'm leaning towards not having one, but it's worth asking.)

Modified. Added the term "logical replication" to the definitions and
used a reference to "replication".

Thanks and Regards,
Shlok Kyal


v2-0001-Add-publisher-and-subscriber-to-glossary-document.patch
Description: Binary data


Re: POC, WIP: OR-clause support for indexes

2024-02-13 Thread Andrei Lepikhov

On 12/2/2024 17:51, Alena Rybakina wrote:

On 12.02.2024 12:01, Andrei Lepikhov wrote:

On 12/2/2024 15:55, Alena Rybakina wrote:
As I understand it, match_clauses_to_index is necessary if you have a 
RestrictInfo (rinfo1) variable, so maybe we should run it after the 
make_restrictonfo procedure, otherwise calling it, I think, is useless.
I think you should explain your note in more detail. Before this call,
we had already called make_restrictinfo() and built rinfo1, hadn't we?


I got it, I think. I was confused by the “else“ block where we can
process the index; otherwise we move on to the next element.


I think maybe the “else“ block that creates the restrictinfo and the indexpaths
list should be moved to a separate function, or we could just remove the "else"?
IMO, it is a matter of taste. But if you are really confused, maybe it
will make it easier for someone else to understand. So, changed.
I think we need to check that rinfo->clause is not empty, because if it
is we can miss calling the build_paths_for_OR function. We should add it there:


restriction_is_saop_clause(RestrictInfo *restrictinfo)
{
     if (IsA(restrictinfo->clause, ScalarArrayOpExpr))
I wonder if we should add an assertion here, not a NULL check. In what case
could we get a NULL clause here? But I added it to be safe.
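
For illustration only, the shape being discussed might look roughly like this
inside restrictinfo.c (a hypothetical body showing both options; the actual
patch may differ):

/* Hypothetical sketch, not the posted patch. */
bool
restriction_is_saop_clause(RestrictInfo *restrictinfo)
{
	/* Either assert, if a NULL clause is believed impossible here ... */
	Assert(restrictinfo->clause != NULL);

	/* ... or check defensively, as was added "to be safe". */
	if (restrictinfo->clause == NULL)
		return false;

	return IsA(restrictinfo->clause, ScalarArrayOpExpr);
}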


By the way, I think we need to add a check that the clauseset is not
empty (if (!clauseset.nonempty)), otherwise we could get an error. I
noticed the same check in the build_paths_for_OR function.

I don't. Feel free to provide a counterexample.

--
regards,
Andrei Lepikhov
Postgres Professional
From 2b088a1bd662c614ee491a797d2fcb67e2fb2391 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 24 Jan 2024 14:07:17 +0700
Subject: [PATCH 2/2] Teach generate_bitmap_or_paths to build BitmapOr paths
 over SAOP clauses.

Likewise OR clauses, discover SAOP array and try to split its elements
between smaller sized arrays to fit a set of partial indexes.
---
 src/backend/optimizer/path/indxpath.c | 301 ++
 src/backend/optimizer/util/predtest.c |  46 
 src/backend/optimizer/util/restrictinfo.c |  13 +
 src/include/optimizer/optimizer.h |  16 ++
 src/include/optimizer/restrictinfo.h  |   1 +
 src/test/regress/expected/select.out  | 282 
 src/test/regress/sql/select.sql   |  82 ++
 src/tools/pgindent/typedefs.list  |   1 +
 8 files changed, 689 insertions(+), 53 deletions(-)

diff --git a/src/backend/optimizer/path/indxpath.c 
b/src/backend/optimizer/path/indxpath.c
index 32c6a8bbdc..56b04541db 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -32,6 +32,7 @@
 #include "optimizer/paths.h"
 #include "optimizer/prep.h"
 #include "optimizer/restrictinfo.h"
+#include "utils/array.h"
 #include "utils/lsyscache.h"
 #include "utils/selfuncs.h"
 
@@ -1220,11 +1221,174 @@ build_paths_for_OR(PlannerInfo *root, RelOptInfo *rel,
return result;
 }
 
+/*
+ * Building index paths over SAOP clause differs from the logic of OR clauses.
+ * Here we iterate across all the array elements and split them to SAOPs,
+ * corresponding to different indexes. We must match each element to an index.
+ */
+static List *
+build_paths_for_SAOP(PlannerInfo *root, RelOptInfo *rel, RestrictInfo *rinfo,
+List *other_clauses)
+{
+   List   *result = NIL;
+   List   *predicate_lists = NIL;
+   ListCell   *lc;
+   PredicatesData *pd;
+   ScalarArrayOpExpr  *saop = (ScalarArrayOpExpr *) rinfo->clause;
+
+   Assert(IsA(saop, ScalarArrayOpExpr) && saop->useOr);
+
+   if (!IsA(lsecond(saop->args), Const))
+   /*
+* Has it practical outcome to merge arrays which couldn't 
constantified
+* before that step?
+*/
+   return NIL;
+
+   /* Collect predicates */
+   foreach(lc, rel->indexlist)
+   {
+   IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);
+
+   /* Take into consideration partial indexes supporting bitmap 
scans */
+   if (!index->amhasgetbitmap || index->indpred == NIL || 
index->predOK)
+   continue;
+
+   pd = palloc0(sizeof(PredicatesData));
+   pd->id = foreach_current_index(lc);
+   /* The trick with reducing recursion is stolen from 
predicate_implied_by */
+   pd->predicate = list_length(index->indpred) == 1 ?
+   
(Node *) linitial(index->indpred) :
+   
(Node *) index->indpred;
+   predicate_lists = lappend(predicate_lists, (void *) pd);
+   }
+
+   /* Split the array data according to index predicates. */
+   if (predicate_lists == NIL ||
+ 

Re: POC, WIP: OR-clause support for indexes

2024-02-13 Thread Andrei Lepikhov

On 13/2/2024 07:00, jian he wrote:

+ newa = makeNode(ArrayExpr);
+ /* array_collid will be set by parse_collate.c */
+ newa->element_typeid = scalar_type;
+ newa->array_typeid = array_type;
+ newa->multidims = false;
+ newa->elements = aexprs;
+ newa->location = -1;

I am confused by the comments `array_collid will be set by
parse_collate.c`, can you further explain it?
I wonder if the second paragraph of the comments on commit b310b6e is
enough to dive into the details.



if OR expression right arm is not plain Const, but with collation
specification, eg.
`where a  = 'a' collate "C" or a = 'b' collate "C";`

then the rightop is not Const, it will be CollateExpr, it will not be
used in transformation.
Yes, it is done for simplicity right now. I'm not sure about corner 
cases of merging such expressions.




set enable_or_transformation to on;
explain(timing off, analyze, costs off)
select count(*) from test where (x = 1 or x = 2 or x = 3 or x = 4 or x
= 5 or x = 6 or x = 7 or x = 8 or x = 9 ) \watch i=0.1 c=10
35.376 ms

The time is the last result of the 10 iterations.

The reason here is parallel workers.
If you look at the plans, you will find parallel workers without the
optimization and none of them with the optimization:


Gather  (cost=1000.00..28685.37 rows=87037 width=12)
(actual rows=90363 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test
 Filter: ((x = 1) OR (x = 2) OR (x = 3) OR (x = 4) OR (x = 5) 
OR (x = 6) OR (x = 7) OR (x = 8) OR (x = 9))


Seq Scan on test  (cost=0.02..20440.02 rows=90600 width=12)
  (actual rows=90363 loops=1)
   Filter: (x = ANY ('{1,2,3,4,5,6,7,8,9}'::integer[]))

With 90363 tuples actually returned, we estimate 87037 (less precisely)
without the transformation and 90600 (more precisely) with the transformation.
But if you play with parallel_tuple_cost and parallel_setup_cost, you 
will end up having these parallel workers:


 Gather  (cost=0.12..11691.03 rows=90600 width=12)
 (actual rows=90363 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test
 Filter: (x = ANY ('{1,2,3,4,5,6,7,8,9}'::integer[]))
 Rows Removed by Filter: 303212

That gives a gain of about 25% on my laptop.
I'm not sure about the origin of this behavior, but it seems to be an
issue with parallel workers, not with this specific optimization.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: Synchronizing slots from primary to standby

2024-02-13 Thread shveta malik
On Fri, Feb 9, 2024 at 10:04 AM Amit Kapila  wrote:
>
> +reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
> {
> ...
> + /*
> + * Find the oldest existing WAL segment file.
> + *
> + * Normally, we can determine it by using the last removed segment
> + * number. However, if no WAL segment files have been removed by a
> + * checkpoint since startup, we need to search for the oldest segment
> + * file currently existing in XLOGDIR.
> + */
> + oldest_segno = XLogGetLastRemovedSegno() + 1;
> +
> + if (oldest_segno == 1)
> + {
> + TimeLineID cur_timeline;
> +
> + GetWalRcvFlushRecPtr(NULL, _timeline);
> + oldest_segno = XLogGetOldestSegno(cur_timeline);
> ...
> ...
>
> This means that if the restart_lsn of the slot is from the prior
> timeline then the standby needs to wait for longer times to sync the
> slot. Ideally, it should be okay because I don't think even if
> restart_lsn of the slot may be from some prior timeline than the
> current flush timeline on standby, how often that case can happen?

I tested this behaviour on the v85 patch, and it is working as expected, i.e.
if the remote_slot's lsn belongs to a prior timeline, then executing the
pg_sync_replication_slots() function creates a temporary slot and
waits for the primary to catch up. Once the primary catches up, the next
execution of the SQL function persists the slot and syncs it.

Setup: primary-->standby1-->standby2

Steps:
1) Insert data on primary. It gets replicated to both standbys.
2) Create logical slot on primary and execute
pg_sync_replication_slots() on standby1. The slot gets synced and
persisted on standby1.
3) Shutdown standby2.
4) Insert data on primary. It gets replicated to standby1.
5) Shutdown primary and promote standby1.
6) Insert some data on standby1/new primary directly.
7) Start standby2: It now needs to catch up on the old data of timeline1
(from step 4) plus the new data of timeline2 (from step 6). It does that. On
reaching the end of the old timeline, walreceiver gets restarted and
starts streaming using the new timeline.
8) Execute pg_sync_replication_slots() on standby2 to sync the slot.
Now remote_slot's lsn belongs to a prior timeline on standby2. In my
test-run, remote_slot's lsn belonged to segno=4 on standby2, while the
oldest segno of current_timline(2) was 6. Thus it created the slot
locally with lsn belonging to the oldest segno 6 of cur_timeline(2)
but did not persist it as remote_slot's lsn was behind.
9) Now on standby1/new-primary, advance the logical slot by calling
pg_replication_slot_advance().
10) Execute pg_sync_replication_slots() again on standby2; now the
local temporary slot gets persisted, as the restart_lsn of the primary has
caught up.

thanks
Shveta




Re: Improve WALRead() to suck data directly from WAL buffers when possible

2024-02-13 Thread Bharath Rupireddy
On Tue, Feb 13, 2024 at 1:03 AM Jeff Davis  wrote:
>
> For 0004, we need to resolve why callers are using XLOG_BLCKSZ and we
> can fix that independently, as discussed here:
>
> https://www.postgresql.org/message-id/CALj2ACV=C1GZT9XQRm4iN1NV1T=hla_hsgwnx2y5-g+mswd...@mail.gmail.com

Thanks. I started a new thread for this -
https://www.postgresql.org/message-id/CALj2ACWBRFac2TingD3PE3w2EBHXUHY3%3DAEEZPJmqhpEOBGExg%40mail.gmail.com.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Document efficient self-joins / UPDATE LIMIT techniques.

2024-02-13 Thread Laurenz Albe
On Mon, 2024-02-12 at 12:24 -0500, Corey Huinker wrote:
> > Do you plan to add it to the commitfest?  If yes, I'd set it "ready for 
> > committer".
> 
> Commitfest entry reanimated. 

Truly... you created a revenant in the already closed commitfest.

I closed that again and added a new entry in the open commitfest.

Yours,
Laurenz Albe




Re: Improve documentation of upgrade for streaming replication section.

2024-02-13 Thread Amit Kapila
On Mon, Feb 12, 2024 at 7:59 AM Amit Kapila  wrote:
>
> On Fri, Feb 9, 2024 at 9:32 PM Shubham Khanna
>  wrote:
> >
> > Currently the documentation of upgrade for streaming replication
> > section says that logical replication slots will be copied
> > irrespective of the version, but the logical replication slots are
> > copied only if the old primary is version 17.0 or later. The changes
> > for the same are in the attached patch.
> >
>
> LGTM. I'll push this tomorrow unless I see any comments or objections
> for the same.
>

Pushed!

-- 
With Regards,
Amit Kapila.




Re: glibc qsort() vulnerability

2024-02-13 Thread Mats Kindahl
On Tue, Feb 13, 2024 at 12:41 AM Andres Freund  wrote:

> Hi,
>
> On 2024-02-12 17:04:23 -0600, Nathan Bossart wrote:
> > On Mon, Feb 12, 2024 at 01:31:30PM -0800, Andres Freund wrote:
> > > One thing that's worth checking is if this ends up with *worse* code
> when the
> > > comparators are inlined. I think none of the changed comparators will
> end up
> > > getting used with an inlined sort, but ...
> >
> > Yeah, AFAICT the only inlined sorts are in tuplesort.c and bufmgr.c, and
> > the patches don't touch those files.
> >
> > > The reason we could end up with worse code is that when inlining the
> > > comparisons would make less sense for the compiler. Consider e.g.
> > > return DO_COMPARE(a, b) < 0 ?
> > > (DO_COMPARE(b, c) < 0 ? b : (DO_COMPARE(a, c) < 0 ? c : a))
> > > : (DO_COMPARE(b, c) > 0 ? b : (DO_COMPARE(a, c) < 0 ? a :
> c));
> > >
> > > With a naive implementation the compiler will understand it only cares
> about
> > > a < b, not about the other possibilities. I'm not sure that's still
> true with
> > > the more complicated optimized version.
> >
> > You aren't kidding [0].  Besides perhaps adding a comment in
> > sort_template.h, is there anything else you think we should do about this
> > now?
>
> I'd add also a comment to the new functions. I think it's fine otherwise. I
> wish there were formulation that'd be optimal for both cases, but this way
> we
> can at least adapt all places at once if either find a better formulation
> or
> change all our sorts to happen via an inline implementation of qsort or
> such.
>

I suspect that this has to do with the fact that we "abuse" the type system
by performing arithmetic on booleans converted to integers, and the
compilers do not have rules for dealing with these.

For example, take the inline function "static inline int cmp(int a, int b) {
return a < b ? -1 : a > b ? 1 : 0; }".

Which trivially can be rewritten by the compiler using very basic rules, as
follows:

DO_COMPARE(a,b)
  (a < b ? -1 : a > b ? 1 : 0) < 0
  a < b ? (-1 < 0) : a > b ? (1 < 0) : (0 < 0)
  a < b ? true : a > b ? false : false
  a < b ? true : false
  a < b

When it comes to (a < b) - (a > b), there are no such rules, since it
is not a very common case.

Clang fares better for this case and at least shows the two alternatives as
equal.

Maybe we should switch to the original version, equivalent to the inline
function above, since that works better with the surrounding code?
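
For reference, here is a minimal sketch of the two spellings under discussion,
side by side (note that for the usual ascending-order convention the
branch-free form needs the operand order shown here):

/* Branching form: trivially simplifies when the caller only needs cmp(a, b) < 0. */
static inline int
cmp_branching(int a, int b)
{
	return a < b ? -1 : a > b ? 1 : 0;
}

/* Branch-free form: relies on each comparison evaluating to 0 or 1. */
static inline int
cmp_branchless(int a, int b)
{
	return (a > b) - (a < b);
}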

Best wishes,
Mats Kindahl


>
> Greetings,
>
> Andres
>


Re: Fix incorrect PG_GETARG in pgcrypto

2024-02-13 Thread Michael Paquier
On Mon, Feb 12, 2024 at 11:30:40PM -0500, shihao zhong wrote:
> I'd like to bring to your attention that I recently identified some
> functions in pgcrypto that are using PG_GETARG functions in a way that
> doesn't match the expected function signature of the stored
> procedures. This patch proposes a solution to address these
> inconsistencies and ensure proper alignment.

You've indeed grabbed some historical inconsistencies here.  Please
note that your patch has reversed diffs (for example, the SQL
definition of pgp_sym_encrypt_bytea uses bytea,text,text as arguments
and your resulting patch shows how HEAD does the job with
bytea,bytea,bytea), but perhaps you have generated it with a command
like `git diff -R`? 
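
(As an aside for readers, the class of mismatch being discussed looks roughly
like the toy, hypothetical function below, which is not the actual pgcrypto
code: the fmgr accessor macro on the C side has to match the argument type
declared in the SQL definition.)

#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(example_opts_len);

/*
 * Toy example: the SQL definition declares the argument as text, so the C
 * code must fetch it with the text accessor, not the bytea one.
 */
Datum
example_opts_len(PG_FUNCTION_ARGS)
{
	text	   *options = PG_GETARG_TEXT_PP(0);	/* correct for a "text" argument */

	PG_RETURN_INT32(strlen(text_to_cstring(options)));
}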
--
Michael


signature.asc
Description: PGP signature


Re: Make COPY format extendable: Extract COPY TO format implementations

2024-02-13 Thread Sutou Kouhei
Hi,

In <20240209192705.5qdilvviq3py2...@awork3.anarazel.de>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Fri, 9 Feb 2024 11:27:05 -0800,
  Andres Freund  wrote:

>> +static void
>> +CopyFromTextInFunc(CopyFromState cstate, Oid atttypid,
>> +   FmgrInfo *finfo, Oid *typioparam)
>> +{
>> +Oid func_oid;
>> +
>> +getTypeInputInfo(atttypid, _oid, typioparam);
>> +fmgr_info(func_oid, finfo);
>> +}
> 
> FWIW, we should really change the copy code to initialize FunctionCallInfoData
> instead of re-initializing that on every call, realy makes a difference
> performance wise.

How about the attached patch approach? If it's a desired
approach, I can also write a separate patch for COPY TO.

>> +cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
>> +/* Set read attribute callback */
>> +if (cstate->opts.csv_mode)
>> +cstate->copy_read_attributes = CopyReadAttributesCSV;
>> +else
>> +cstate->copy_read_attributes = CopyReadAttributesText;
>> +}
> 
> Isn't this precisely repeating the mistake of 2889fd23be56?

What do you think about the approach in my previous mail's
attachments?
https://www.postgresql.org/message-id/flat/20240209.163205.704848659612151781.kou%40clear-code.com#dbb1f8d7f2f0e8fe3c7e37a757fcfc54

If it's a desired approach, I can prepare a v15 patch set
based on the v14 patch set and the approach.


I'll reply to the other comments later...


Thanks,
-- 
kou
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 41f6bc43e4..a43c853e99 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1691,6 +1691,10 @@ BeginCopyFrom(ParseState *pstate,
 	/* We keep those variables in cstate. */
 	cstate->in_functions = in_functions;
 	cstate->typioparams = typioparams;
+	if (cstate->opts.binary)
+		cstate->fcinfo = PrepareInputFunctionCallInfo();
+	else
+		cstate->fcinfo = PrepareReceiveFunctionCallInfo();
 	cstate->defmap = defmap;
 	cstate->defexprs = defexprs;
 	cstate->volatile_defexprs = volatile_defexprs;
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index 906756362e..e372e5efb8 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -853,6 +853,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 num_defaults = cstate->num_defaults;
 	FmgrInfo   *in_functions = cstate->in_functions;
 	Oid		   *typioparams = cstate->typioparams;
+	FunctionCallInfoBaseData *fcinfo = cstate->fcinfo;
 	int			i;
 	int		   *defmap = cstate->defmap;
 	ExprState **defexprs = cstate->defexprs;
@@ -953,12 +954,13 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 			 * If ON_ERROR is specified with IGNORE, skip rows with soft
 			 * errors
 			 */
-			else if (!InputFunctionCallSafe(_functions[m],
-			string,
-			typioparams[m],
-			att->atttypmod,
-			(Node *) cstate->escontext,
-			[m]))
+			else if (!PreparedInputFunctionCallSafe(fcinfo,
+	_functions[m],
+	string,
+	typioparams[m],
+	att->atttypmod,
+	(Node *) cstate->escontext,
+	[m]))
 			{
 cstate->num_errors++;
 return true;
@@ -1958,7 +1960,7 @@ CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
 	if (fld_size == -1)
 	{
 		*isnull = true;
-		return ReceiveFunctionCall(flinfo, NULL, typioparam, typmod);
+		return PreparedReceiveFunctionCall(cstate->fcinfo, flinfo, NULL, typioparam, typmod);
 	}
 	if (fld_size < 0)
 		ereport(ERROR,
@@ -1979,8 +1981,8 @@ CopyReadBinaryAttribute(CopyFromState cstate, FmgrInfo *flinfo,
 	cstate->attribute_buf.data[fld_size] = '\0';
 
 	/* Call the column type's binary input converter */
-	result = ReceiveFunctionCall(flinfo, >attribute_buf,
- typioparam, typmod);
+	result = PreparedReceiveFunctionCall(cstate->fcinfo, flinfo, >attribute_buf,
+		 typioparam, typmod);
 
 	/* Trouble if it didn't eat the whole buffer */
 	if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
diff --git a/src/backend/utils/fmgr/fmgr.c b/src/backend/utils/fmgr/fmgr.c
index e48a86be54..b0b5310219 100644
--- a/src/backend/utils/fmgr/fmgr.c
+++ b/src/backend/utils/fmgr/fmgr.c
@@ -1672,6 +1672,73 @@ DirectInputFunctionCallSafe(PGFunction func, char *str,
 	return true;
 }
 
+/*
+ * Prepare callinfo for PreparedInputFunctionCallSafe to reuse one callinfo
+ * instead of initializing it for each call. This is for performance.
+ */
+FunctionCallInfoBaseData *
+PrepareInputFunctionCallInfo(void)
+{
+	FunctionCallInfoBaseData *fcinfo;
+
+	fcinfo = (FunctionCallInfoBaseData *) palloc(SizeForFunctionCallInfo(3));
+	InitFunctionCallInfoData(*fcinfo, NULL, 3, InvalidOid, NULL, NULL);
+	fcinfo->args[0].isnull = false;
+	fcinfo->args[1].isnull = false;
+	fcinfo->args[2].isnull = false;
+	return fcinfo;
+}
+
+/*
+ * Call 

Re: Small fix on query_id_enabled

2024-02-13 Thread Michael Paquier
On Tue, Feb 13, 2024 at 01:13:43AM +0900, Yugo NAGATA wrote:
> I attached an updated patch that adds comments noting to use 
> IsQueryIdEnabled()
> instead of accessing the variables directly.

Sounds good to me, thanks.
--
Michael


signature.asc
Description: PGP signature