Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-07-19 Thread Amit Kapila
On Wed, Jul 19, 2023 at 7:33 PM Hayato Kuroda (Fujitsu) wrote:
>
> > 2.
> > + /*
> > + * Dump logical replication slots if needed.
> > + *
> > + * XXX We cannot dump replication slots at the same time as the schema
> > + * dump because we need to separate the timing of restoring
> > + * replication slots and other objects. Replication slots, in
> > + * particular, should not be restored before executing the pg_resetwal
> > + * command because it will remove WALs that are required by the slots.
> > + */
> > + if (user_opts.include_logical_slots)
> >
> > Can you explain this point a bit more with some example scenarios?
> > Basically, if we had sent all the WAL before the upgrade then why do
> > we need to worry about the timing of pg_resetwal?
>
> OK, let me give an example here. Should it be described in the source?
>
> Assuming that there is a valid logical replication slot as follows:
>
> ```
> postgres=# select slot_name, plugin, restart_lsn, wal_status, two_phase from pg_replication_slots;
>  slot_name |    plugin     | restart_lsn | wal_status | two_phase
> -----------+---------------+-------------+------------+-----------
>  test      | test_decoding | 0/15665A8   | reserved   | f
> (1 row)
>
> postgres=# select * from pg_current_wal_lsn();
>  pg_current_wal_lsn
> --------------------
>  0/15665E0
> (1 row)
> ```
>
> Now let's run pg_resetwal on the server. The existing WAL segment file
> is purged and the server moves on to the next segment.
>
> ```
> $ pg_ctl stop -D data_N1/
> waiting for server to shut down.... done
> server stopped
> $ pg_resetwal -l 000000010000000000000002 data_N1/
> Write-ahead log reset
> $ pg_ctl start -D data_N1/ -l N1.log
> waiting for server to start.... done
> server started
> ```
>
> After that the logical slot cannot move forward anymore because the required
> WALs are removed, whereas the wal_status is still "reserved".
>
> ```
> postgres=# select slot_name, plugin, restart_lsn, wal_status, two_phase from pg_replication_slots;
>  slot_name |    plugin     | restart_lsn | wal_status | two_phase
> -----------+---------------+-------------+------------+-----------
>  test      | test_decoding | 0/15665A8   | reserved   | f
> (1 row)
>
> postgres=# select * from pg_current_wal_lsn();
>  pg_current_wal_lsn
> --------------------
>  0/2028328
> (1 row)
>
> postgres=# select * from pg_logical_slot_get_changes('test', NULL, NULL);
> ERROR:  requested WAL segment pg_wal/000000010000000000000001 has already been removed
> ```
>
> pg_upgrade runs pg_dump and then pg_resetwal, so dumping slots must be done
> separately to avoid the above error.
>

Okay, so the point is that if we create the slot in the new cluster
before pg_resetwal then its restart_lsn will be set to the current LSN
position which will later be reset by pg_resetwal. So, we won't be
able to use such a slot, right?
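
For illustration, the failure mode being avoided looks roughly like this
(a minimal sketch; the slot name, plugin, and LSNs are made up):

```sql
-- Hypothetical ordering problem: slot created in the new cluster
-- *before* pg_resetwal runs.
SELECT pg_create_logical_replication_slot('sub_slot', 'pgoutput');
-- restart_lsn now points into the current WAL, e.g. 0/1566A00.
-- pg_resetwal then discards that WAL and switches to a new segment,
-- so any later decoding attempt fails with:
--   ERROR:  requested WAL segment ... has already been removed
```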

-- 
With Regards,
Amit Kapila.




Re: Report distinct wait events when waiting for WAL "operation"

2023-07-19 Thread Bharath Rupireddy
On Mon, Jul 17, 2023 at 10:25 PM Andres Freund  wrote:
>
> Hi,
>
> Therefore I'm proposing that LWLockAcquireOrWait() and LWLockWaitForVar() not
> use the "generic" LWLockReportWaitStart(), but use caller provided wait
> events.  The attached patch adds two new wait events for the existing callers.

+1 for having separate wait events for WAL insert lock acquisition and
waiting for WAL insertions to finish. However, I don't think we need to
pass wait events to LWLockAcquireOrWait and LWLockWaitForVar; we can
just use the wait events directly in those functions. Since these two
functions are used only for acquiring the WAL insert lock and waiting
for WAL insertions to finish, they aren't multipurpose functions.

> I waffled a bit about which wait event section to add these to. Ended up with
> "IPC", but would be happy to change that.
>
> WAIT_EVENT_WAL_WAIT_INSERT  WALWaitInsert   "Waiting for WAL record to be copied into buffers."
> WAIT_EVENT_WAL_WAIT_WRITE   WALWaitWrite    "Waiting for WAL buffers to be written or flushed to disk."

IPC seems okay to me. If not, how about the PG_WAIT_LWLOCK event
class? Or, have WAIT_EVENT_WAL_WAIT_WRITE  under PG_WAIT_IO and the
other under PG_WAIT_IPC?

> ┌────────────────┬─────────────────┬───────────────┬───────┐
> │  backend_type  │ wait_event_type │  wait_event   │ count │
> ├────────────────┼─────────────────┼───────────────┼───────┤
> │ client backend │ IPC             │ WALWaitInsert │    22 │
> │ client backend │ LWLock          │ WALInsert     │    13 │
> │ client backend │ LWLock          │ WALBufMapping │     5 │
> │ walwriter      │ (null)          │ (null)        │     1 │
> │ client backend │ Client          │ ClientRead    │     1 │
> │ client backend │ (null)          │ (null)        │     1 │
> └────────────────┴─────────────────┴───────────────┴───────┘
>
> even though they are very different
>
> FWIW, the former is bottlenecked by the number of WAL insertion locks, the
> second is bottlenecked by copying WAL into buffers due to needing to flush
> them.

This separation looks clean and gives much more info.
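
For reference, a wait-event breakdown like the table quoted above can be
collected by sampling pg_stat_activity (a minimal sketch, not taken from
the original mail):

```sql
-- Sample the current wait events of all backends; repeating the
-- sampling (e.g. with \watch) builds up a profile like the one above.
SELECT backend_type, wait_event_type, wait_event, count(*)
  FROM pg_stat_activity
 GROUP BY backend_type, wait_event_type, wait_event
 ORDER BY count(*) DESC;
```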

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Re: Report distinct wait events when waiting for WAL "operation"

2023-07-19 Thread Michael Paquier
On Wed, Jul 19, 2023 at 06:49:57PM +0530, Amit Kapila wrote:
> On Mon, Jul 17, 2023 at 10:26 PM Andres Freund  wrote:
>> FWIW, the former is bottlenecked by the number of WAL insertion locks, the
>> second is bottlenecked by copying WAL into buffers due to needing to flush
>> them.
> 
> This gives a better idea of what's going on. +1 for separating these waits.

+ * As this is not used to wait for lwlocks themselves, the caller has to
+ * provide a wait event to be reported.
  */
 bool
-LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
+LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval,
+uint32 wait_event_info)

Makes sense to me to do this split, nice!  And this gives more
flexibility for out-of-core callers, while on it.
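
For example, a hypothetical out-of-core caller could now do something
like this (a sketch; MySharedState and the surrounding code are
invented, only the LWLockWaitForVar signature comes from the patch):

```c
typedef struct MySharedState
{
	LWLock		lock;
	uint64		current_pos;	/* protected by lock */
} MySharedState;

static void
wait_for_update(MySharedState *st, uint64 old_pos)
{
	uint64		new_pos;

	/* Returns once the lock is free or current_pos differs from old_pos;
	 * while waiting, pg_stat_activity shows the caller-chosen wait event. */
	(void) LWLockWaitForVar(&st->lock, &st->current_pos,
							old_pos, &new_pos,
							WAIT_EVENT_WAL_WAIT_INSERT);
}
```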
--
Michael




Re: Do we want to enable foreign key constraints on subscriber?

2023-07-19 Thread Amit Kapila
On Wed, Jul 19, 2023 at 12:21 PM Kyotaro Horiguchi wrote:
>
> There's an issue brought up in the -bugs list [1]. Since triggers are
> deactivated on a subscriber by default, foreign key constraints don't
> fire for replicated changes. The docs state this is done to prevent
> repetitive data propagation between tables on subscribers. But foreign
> key triggers don't contribute to this issue.
>

Right and recent reports indicate that this does cause inconvenience for users.

> My understanding is that constraint triggers, including ones created
> using the "CREATE CONSTRAINT TRIGGER" command, aren't supposed to alter
> data.
>

I also think so. You need to update the docs for this.

Peter E., do you remember if there is any specific problem in enabling
such triggers by default on the apply side? The only thing that I can
think of is that the current behavior keeps the trigger-firing rules
the same for all kinds of triggers, which has merits, but OTOH it
causes inconvenience to users, especially for foreign-key checks.
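
For context, the existing per-table workaround is to change when the FK
triggers fire; the apply worker runs with session_replication_role =
'replica', so (a sketch; the RI trigger name is system-generated and
shown only for illustration):

```sql
-- Fire this constraint trigger even in replica sessions, i.e. during
-- logical replication apply on the subscriber.
ALTER TABLE orders ENABLE ALWAYS TRIGGER "RI_ConstraintTrigger_c_16400";
-- ENABLE REPLICA TRIGGER would instead fire it *only* in replica sessions.
```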

--
With Regards,
Amit Kapila.




Re: Row pattern recognition

2023-07-19 Thread Tatsuo Ishii
> Hello,
> 
> Thanks for working on this! We're interested in RPR as well, and I've
> been trying to get up to speed with the specs, to maybe make myself
> useful.

Thank you for being interested in this.

> 19075-5 discusses that, at least; not sure about other parts of the spec.

Thanks for the info. Unfortunately I don't have 19075-5 though.

>> Maybe an insane idea but what about rewriting MATCH_RECOGNIZE clause
>> into Window clause with RPR?
> 
> Are we guaranteed to always have an equivalent window clause? There seem
> to be many differences between the two, especially when it comes to ONE
> ROW/ALL ROWS PER MATCH.

You are right. I am not 100% sure if the rewriting is possible at this
point.

> To add onto what Vik said above:
> 
>>> It seems RPR in the standard is quite complex. I think we can start
>>> with a small subset of RPR then we could gradually enhance the
>>> implementation.
>> 
>> I have no problem with that as long as we don't paint ourselves into a 
>> corner.
> 
> To me, PATTERN looks like an area where we may want to support a broader
> set of operations in the first version.

Me too but...

> The spec has a bunch of
> discussion around cases like empty matches, match order of alternation
> and permutation, etc., which are not possible to express or test with
> only the + quantifier. Those might be harder to get right in a v2, if we
> don't at least keep them in mind for v1?

Currently my patch has a limitation for the sake of a simple
implementation: a pattern like "A+" is parsed and analyzed in the raw
parser. This makes subsequent processing much easier because the
pattern element variable (in this case "A") and the quantifier (in this
case "+") are already identified by the raw parser. However, many more
cases are allowed in the standard, as you already pointed out. For
those cases we should probably give up parsing PATTERN items in the raw
parser, and instead have the raw parser just accept the elements as
Sconst?
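
For illustration, some SQL:2016 PATTERN constructs that go beyond a
single element with a "+" quantifier (none of these are handled by the
current patch):

```sql
PATTERN ( A B? C* )           -- optional (?) and zero-or-more (*) quantifiers
PATTERN ( A | B C )           -- alternation
PATTERN ( PERMUTE(A, B, C) )  -- all permutations of A, B and C
PATTERN ( ^ A+ $ )            -- partition start/end anchors
```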

>> +static List *
>> +transformPatternClause(ParseState *pstate, WindowClause *wc, WindowDef *windef)
>> +{
>> +   List*patterns;
> 
> My compiler complains about the `patterns` variable here, which is
> returned without ever being initialized. (The caller doesn't seem to use
> it.)

Will fix.

>> +-- basic test using PREV
>> +SELECT company, tdate, price, rpr(price) OVER w FROM stock
>> + WINDOW w AS (
>> + PARTITION BY company
>> + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
>> + INITIAL
>> + PATTERN (START UP+ DOWN+)
>> + DEFINE
>> +  START AS TRUE,
>> +  UP AS price > PREV(price),
>> +  DOWN AS price < PREV(price)
>> +);
> 
> nitpick: IMO the tests should be making use of ORDER BY in the window
> clauses.

Right. Will fix.

> This is a very big feature. I agree with you that MEASURES seems like a
> very important "next piece" to add. Are there other areas where you
> would like reviewers to focus on right now (or avoid)?

Any comments, especially on the PREV/NEXT implementation part, are
welcome. Currently the DEFINE expression like "price > PREV(price)" is
prepared in ExecInitWindowAgg using ExecInitExpr, tweaking var->varno
in the Var node so that PREV uses OUTER_VAR and NEXT uses INNER_VAR.
Then the expression is evaluated in ExecWindowAgg using ExecEvalExpr,
setting the previous row's TupleSlot to ExprContext->ecxt_outertuple
and the next row's TupleSlot to ExprContext->ecxt_innertuple. I think
this is a temporary hack and should be gotten rid of before v1 is
committed. Better idea?

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: Make message at end-of-recovery less scary.

2023-07-19 Thread Kyotaro Horiguchi
At Mon, 17 Jul 2023 15:20:30 +0300, Aleksander Alekseev wrote in
> Thanks for working on this, it bugged me for a while. I noticed that
> cfbot is not happy with the patch so I rebased it.
> postgresql:pg_waldump test suite didn't pass after the rebase. I fixed
> it too. Other than that the patch LGTM so I'm not changing its status
> from "Ready for Committer".

Thanks for the rebasing.

> It looks like the patch was moved between the commitfests since
> 2020... If there is anything that may help merging it into PG17 please
> let me know.

This might just be too much, or there might be some doubts about this...

This change basically makes a zero-length record be considered as the
normal end of WAL.

The most controversial point in the design, I think, is the criterion
for an error condition. The assumption is that the WAL is sound if all
bytes following a complete record, up to the next page boundary, are
zeroed out. This is slightly narrower than the original criterion,
which merely checked that the next record is zero-length. Naturally,
there might be instances where that page has been blown out due to
device failure or some other reason. Despite this, I believe it is
preferable to always issuing a warning (at the LOG level, though)
about potential WAL corruption.
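
In pseudo-code, the criterion is roughly the following (a sketch with
illustrative names, not the patch's actual code):

```c
/* The WAL is taken to end normally iff every byte after the last
 * complete record, up to the next page boundary, is zero. */
static bool
ends_cleanly(const char *end_of_last_record, const char *page_boundary)
{
	for (const char *p = end_of_last_record; p < page_boundary; p++)
	{
		if (*p != 0)
			return false;		/* warn about possible WAL corruption */
	}
	return true;				/* normal end of WAL */
}
```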

I've adjusted the condition for muting repeated log messages at the
same LSN, changing it from ==LOG to <=WARNING. This is simply a
consequence of following the change of "real" warnings from LOG to
WARNING. I believe this is acceptable even without considering the
aforementioned change, as any single retriable (

Re: Support worker_spi to execute the function dynamically.

2023-07-19 Thread Bharath Rupireddy
On Thu, Jul 20, 2023 at 10:09 AM Michael Paquier  wrote:
>
> On Thu, Jul 20, 2023 at 09:43:37AM +0530, Bharath Rupireddy wrote:
> > +1. However, a comment above helps one to understand why some GUCs are
> > defined before if (!process_shared_preload_libraries_in_progress). As
> > this is an example extension, it will help understand the reasoning
> > better. I know we will have it in the commit message, but a direct comment
> > helps:
> >
> > /*
> >  * Note that this GUC is defined irrespective of worker_spi shared 
> > library
> >  * presence in shared_preload_libraries. It's possible to create the
> >  * worker_spi extension and use functions without it being specified in
> >  * shared_preload_libraries. If we return from here without defining 
> > this
> >  * GUC, the dynamic workers launched by worker_spi_launch() will keep
> >  * crashing and restarting.
> >  */
>
> WFM to be more talkative here and document things, but I don't think
> that's it.  How about a simple "These GUCs are defined even if this
> library is not loaded with shared_preload_libraries, for
> worker_spi_launch()."

LGTM.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Support worker_spi to execute the function dynamically.

2023-07-19 Thread Michael Paquier
On Thu, Jul 20, 2023 at 09:43:37AM +0530, Bharath Rupireddy wrote:
> +1. However, a comment above helps one to understand why some GUCs are
> defined before if (!process_shared_preload_libraries_in_progress). As
> this is an example extension, it will help understand the reasoning
> better. I know we will have it in the commit message, but a direct comment
> helps:
> 
> /*
>  * Note that this GUC is defined irrespective of worker_spi shared library
>  * presence in shared_preload_libraries. It's possible to create the
>  * worker_spi extension and use functions without it being specified in
>  * shared_preload_libraries. If we return from here without defining this
>  * GUC, the dynamic workers launched by worker_spi_launch() will keep
>  * crashing and restarting.
>  */

WFM to be more talkative here and document things, but I don't think
that's it.  How about a simple "These GUCs are defined even if this
library is not loaded with shared_preload_libraries, for
worker_spi_launch()."
--
Michael




Re: FATAL: operator class "xxxx" does not exist for access method "btree"

2023-07-19 Thread mao zhang
Fixed!

Tom Lane wrote on Wed, Jul 19, 2023 at 11:10:

> mao zhang  writes:
> > running bootstrap script ... 2023-07-19 09:40:47.083 CST [2808392] FATAL:  operator class "key_ops" does not exist for access method "btree"
>
> I'm not sure what you find so mysterious about that error message.
>
> >   Oid global_key_id;
> > ...
> >
> > DECLARE_UNIQUE_INDEX(pg_bm_client_global_keys_args_oid_index, 8063, BmClientGlobalKeysArgsOidIndexId, on pg_bm_client_global_keys_args using btree(global_key_id key_ops));
>
> If global_key_id is an OID, why aren't you declaring its index
> with opclass oid_ops, rather than the quite nonexistent "key_ops"?
>
> regards, tom lane
>
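
Following Tom's suggestion, the corrected declaration would presumably
be (a sketch based on the fragment quoted above):

```c
DECLARE_UNIQUE_INDEX(pg_bm_client_global_keys_args_oid_index, 8063,
					 BmClientGlobalKeysArgsOidIndexId,
					 on pg_bm_client_global_keys_args using btree(global_key_id oid_ops));
```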


Re: Support worker_spi to execute the function dynamically.

2023-07-19 Thread Bharath Rupireddy
On Thu, Jul 20, 2023 at 9:25 AM Michael Paquier  wrote:
>
> > In my understanding, the restriction is not required. So, I think it's
> > better to change the behavior.
> > (v1-0001-Support-worker_spi-to-execute-the-function-dynamical.patch)
> >
> > What do you think?
>
> +1.  I'm OK to lift this restriction with a SIGHUP GUC for the
> database name, even if that's not a pattern to encourage in a template
> module.  Will do so, if there are no objections.

+1. However, a comment above helps one to understand why some GUCs are
defined before if (!process_shared_preload_libraries_in_progress). As
this is an example extension, it will help understand the reasoning
better. I know we will have it in the commit message, but a direct comment
helps:

/*
 * Note that this GUC is defined irrespective of worker_spi shared library
 * presence in shared_preload_libraries. It's possible to create the
 * worker_spi extension and use functions without it being specified in
 * shared_preload_libraries. If we return from here without defining this
 * GUC, the dynamic workers launched by worker_spi_launch() will keep
 * crashing and restarting.
 */

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-07-19 Thread Peter Smith
Hi, I had a look at the latest 0003 patch (v20-0003).

Although this patch was recently modified, the updates are mostly only
to make it compatible with the updated v20-0002 patch. Specifically,
the v20-0003 updates did not yet address my review comments from
v17-0003 [1].

Anyway, this post is just a reminder so the earlier review doesn't get
forgotten.

--
[1] v17-0003 review -
https://www.postgresql.org/message-id/CAHut%2BPuMAiO_X_Kw6ud-jr5WOm%2Brpkdu7CppDU6mu%3DgY7UVMzQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-07-19 Thread Amit Kapila
On Thu, Jul 20, 2023 at 8:02 AM Peter Smith  wrote:
>
> On Tue, Jul 18, 2023 at 1:54 AM Melih Mutlu  wrote:
> >
> > Hi,
> >
> > PFA updated patches. Rebased 0003 with minor changes. Addressed Peter's 
> > reviews for 0001 and 0002 with some small comments below.
> >
> > Peter Smith wrote on Mon, 10 Jul 2023 at 10:09:
> >>
> >> 6. LogicalRepApplyLoop
> >>
> >> + /*
> >> + * apply_dispatch() may have gone into apply_handle_commit()
> >> + * which can call process_syncing_tables_for_sync.
> >> + *
> >> + * process_syncing_tables_for_sync decides whether the sync of
> >> + * the current table is completed. If it is completed,
> >> + * streaming must be already ended. So, we can break the loop.
> >> + */
> >> + if (MyLogicalRepWorker->is_sync_completed)
> >> + {
> >> + endofstream = true;
> >> + break;
> >> + }
> >> +
> >>
> >> and
> >>
> >> + /*
> >> + * If is_sync_completed is true, this means that the tablesync
> >> + * worker is done with synchronization. Streaming has already been
> >> + * ended by process_syncing_tables_for_sync. We should move to the
> >> + * next table if needed, or exit.
> >> + */
> >> + if (MyLogicalRepWorker->is_sync_completed)
> >> + endofstream = true;
> >>
> >> ~
> >>
> >> Instead of those code fragments above assigning 'endofstream' as a
> >> side-effect, would it be the same (but tidier) to just modify the
> >> other "breaking" condition below:
> >>
> >> BEFORE:
> >> /* Check if we need to exit the streaming loop. */
> >> if (endofstream)
> >> break;
> >>
> >> AFTER:
> >> /* Check if we need to exit the streaming loop. */
> >> if (endofstream || MyLogicalRepWorker->is_sync_completed)
> >> break;
> >
> >
> > First place you mentioned also breaks the infinite loop. Such an if 
> > statement is needed there with or without endofstream assignment.
> >
> > I think if there is a flag to break a loop, using that flag to indicate 
> > that we should exit the loop seems more appropriate to me. I see that it 
> > would be a bit tidier without endofstream = true lines, but I feel like it 
> > would also be less readable.
> >
> > I don't have a strong opinion though. I'm just keeping them as they are for 
> > now, but I can change them if you disagree.
> >
>
> I felt it was slightly sneaky to re-use the existing variable as a
> convenient way to do what you want. But, I don’t feel strongly enough
> on this point to debate it -- maybe see later if others have an
> opinion about this.
>

I feel it is okay to use the existing variable 'endofstream' here but
shall we have an assertion that it is a tablesync worker?
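
Something like the following, presumably (a sketch of the suggested
assertion, applied to the AFTER form quoted above):

```c
/* Check if we need to exit the streaming loop. */
if (endofstream || MyLogicalRepWorker->is_sync_completed)
{
	/* is_sync_completed can only be set by a tablesync worker */
	Assert(!MyLogicalRepWorker->is_sync_completed ||
		   am_tablesync_worker());
	break;
}
```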

> >>
> >>
> >> 10b.
> >> All the other tablesync-related fields of this struct are named as
> >> relXXX, so I wonder if is better for this to follow the same pattern.
> >> e.g. 'relsync_completed'
> >
> >
> > Don't those start with 'rel' because they're related to the relation that
> > the tablesync worker is syncing? is_sync_completed is not a
> > relation-specific field. I'm okay with changing the name but feel like
> > relsync_completed would be misleading.
>
> My reading of the code is slightly different: Only these fields have
> the prefix ‘rel’ and they are all grouped under the comment “/* Used
> for initial table synchronization. */” because AFAIK only these fields
> are TWS specific (not used for other kinds of workers).
>
> Since this new flag field is also TWS-specific, therefore IMO it
> should follow the same consistent name pattern. But, if you are
> unconvinced, maybe see later if others have an opinion about it.
>

+1 to use the prefix 'rel' here as the sync is specific to the
relation. Even during apply phase, we will apply the relation-specific
changes. See should_apply_changes_for_rel().

-- 
With Regards,
Amit Kapila.




Re: Support worker_spi to execute the function dynamically.

2023-07-19 Thread Michael Paquier
On Thu, Jul 20, 2023 at 11:15:51AM +0900, Masahiro Ikeda wrote:
> While I'm working on the thread[1], I found that the function of
> worker_spi module fails if 'shared_preload_libraries' doesn't have
> worker_spi.

I guess that you were patching worker_spi to register dynamically a
wait event and embed that in a TAP test or similar without loading it
in shared_preload_libraries?  FWIW, you could use a trick like what I
am attaching here to load a wait event dynamically with the custom
wait event API.  You would need to make worker_spi_init_shmem() a bit
more aggressive with an extra hook to reserve a shmem area size, but
that's enough to show the custom wait event in the same backend as the
one that launches a worker_spi dynamically, while demonstrating how
the API can be used in this case.

> In my understanding, the restriction is not required. So, I think it's
> better to change the behavior.
> (v1-0001-Support-worker_spi-to-execute-the-function-dynamical.patch)
> 
> What do you think?

+1.  I'm OK to lift this restriction with a SIGHUP GUC for the
database name, even if that's not a pattern to encourage in a template
module.  Will do so, if there are no objections.
--
Michael
From 2fcd773cd4009cffd626640e1553d4efdceb0777 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 20 Jul 2023 12:48:07 +0900
Subject: [PATCH] Add tweak to allow worker_spi to register wait_event
 dynamically

---
 src/test/modules/worker_spi/worker_spi.c | 35 +++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/src/test/modules/worker_spi/worker_spi.c b/src/test/modules/worker_spi/worker_spi.c
index b87d6c75d8..4e3da93129 100644
--- a/src/test/modules/worker_spi/worker_spi.c
+++ b/src/test/modules/worker_spi/worker_spi.c
@@ -48,11 +48,20 @@ PG_FUNCTION_INFO_V1(worker_spi_launch);
 
 PGDLLEXPORT void worker_spi_main(Datum main_arg) pg_attribute_noreturn();
 
+static void worker_spi_init_shmem(void);
+
 /* GUC variables */
 static int	worker_spi_naptime = 10;
 static int	worker_spi_total_workers = 2;
 static char *worker_spi_database;
 
+typedef struct WorkerSpiState
+{
+	uint32 wait_event_info;
+} WorkerSpiState;
+
+/* the pointer to the shared memory */
+static WorkerSpiState *worker_spi_state = NULL;
 
 typedef struct worktable
 {
@@ -149,6 +158,8 @@ worker_spi_main(Datum main_arg)
 	/* We're now ready to receive signals */
 	BackgroundWorkerUnblockSignals();
 
+	worker_spi_init_shmem();
+
 	/* Connect to our database */
 	BackgroundWorkerInitializeConnection(worker_spi_database, NULL, 0);
 
@@ -199,7 +210,7 @@ worker_spi_main(Datum main_arg)
 		(void) WaitLatch(MyLatch,
 		 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 		 worker_spi_naptime * 1000L,
-		 WAIT_EVENT_EXTENSION);
+		 worker_spi_state->wait_event_info);
 		ResetLatch(MyLatch);
 
 		CHECK_FOR_INTERRUPTS();
@@ -346,6 +357,26 @@ _PG_init(void)
 	}
 }
 
+static void
+worker_spi_init_shmem(void)
+{
+	bool found = false;
+
+	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
+	worker_spi_state = ShmemInitStruct("worker_spi",
+	   sizeof(WorkerSpiState),
+	   &found);
+	if (!found)
+	{
+		/* First time through ... */
+		worker_spi_state->wait_event_info = WaitEventExtensionNew();
+	}
+	LWLockRelease(AddinShmemInitLock);
+
+	WaitEventExtensionRegisterName(worker_spi_state->wait_event_info,
+   "worker_spi_custom");
+}
+
 /*
  * Dynamically launch an SPI worker.
  */
@@ -358,6 +389,8 @@ worker_spi_launch(PG_FUNCTION_ARGS)
 	BgwHandleStatus status;
 	pid_t		pid;
 
+	worker_spi_init_shmem();
+
 	memset(&worker, 0, sizeof(worker));
 	worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
 		BGWORKER_BACKEND_DATABASE_CONNECTION;
-- 
2.40.1





Re: Issue in _bt_getrootheight

2023-07-19 Thread Gurjeet Singh
On Tue, Jul 11, 2023 at 9:35 AM Ahmed Ibrahim wrote:
>
> We have been working on the pg_adviser extension whose goal is to suggest 
> indexes by creating virtual/hypothetical indexes and see how it affects the 
> query cost.
>
> The hypothetical index shouldn't take any space on the disk (allocates 0 
> pages) so we give it the flag INDEX_CREATE_SKIP_BUILD.
> But the problem comes from here when the function get_relation_info is called 
> in planning stage, it tries to calculate the B-Tree height by calling 
> function _bt_getrootheight, but the B-Tree is not built at all, and its 
> metadata page (which is block 0 in our case) doesn't exist, so this returns 
> error that it cannot read the page (since it doesn't exist).
>
> I tried to debug the code and found that this feature was introduced in 
> version 9.3 under this commit [1]. I think that in the code we need to check 
> if it's a B-Tree index AND the index is built/have some pages, then we can go 
> and calculate it otherwise just put it to -1

> I mean instead of this
> if (info->relam == BTREE_AM_OID)
> {
> /* For btrees, get tree height while we have the index open */
> info->tree_height = _bt_getrootheight(indexRelation);
> }
> else
> {
>  /* For other index types, just set it to "unknown" for now */
> info->tree_height = -1;
> }
>
> The first line should be
> if (info->relam == BTREE_AM_OID && info->pages > 0)
> or use the storage manager (smgr) to know if the first block exists.

I think the better method would be to calculate the index height
*after* get_relation_info_hook is called. That way, instead of the
server guessing whether or not an index is hypothetical it can rely on
the index adviser's notion of which index is hypothetical. The hook
implementer has the opportunity to not only mark the
indexOptInfo->hypothetical = true, but also calculate the tree height,
if they can.
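
An adviser's hook implementation could then look roughly like this (a
sketch; the adviser-side helpers are invented for illustration):

```c
/* Matches get_relation_info_hook_type from optimizer/plancat.h. */
static void
adviser_get_relation_info(PlannerInfo *root, Oid relationObjectId,
						  bool inhparent, RelOptInfo *rel)
{
	ListCell   *lc;

	foreach(lc, rel->indexlist)
	{
		IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);

		if (adviser_is_hypothetical(index->indexoid))	/* invented helper */
		{
			index->hypothetical = true;
			/* Estimate the height ourselves: the metapage of a
			 * hypothetical index does not exist, so the server must
			 * not try to read it. */
			index->tree_height = adviser_estimate_height(index);	/* invented */
		}
	}
}
```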

Please see attached the patch that does this. Let me know if this patch helps.

Best regards,
Gurjeet
http://Gurje.et


calculate-index-height-after-calling-get_relation_info_hook.patch
Description: Binary data


Re: logicalrep_message_type throws an error

2023-07-19 Thread Amit Kapila
On Thu, Jul 20, 2023 at 9:10 AM Amit Kapila  wrote:
>
> > On Wed, Jul 19, 2023 at 10:08 AM Ashutosh Bapat wrote:
> >
> > On Wed, Jul 19, 2023 at 9:01 AM Amit Kapila  wrote:
> > >
> > > On Tue, Jul 18, 2023 at 10:27 AM Masahiko Sawada wrote:
> > > >
> > > > Or can we use snprintf() writing "??? (%d)" to a fixed length char[8 +
> > > > 11] allocated on the stack instead?
> > > >
> > >
> > > In the above size calculation, shouldn't it be 7 + 11 where 7 is for
> > > (3 (???) + 1 for space + 2 for () + 1 for terminating null char) and
> > > 11 is for %d? BTW, this avoids dynamic allocation of the err string in
> > > logicalrep_message_type() but we can't return a locally allocated
> > > string, so do you think we should change the prototype of the function
> > > to get this as an argument and then use it both for valid and invalid
> > > cases?
> >
> > There are other places in the code which do something similar by using
> > statically allocated buffers like static char xya[SIZE]. We could do
> > that here. The caller may decide whether to pstrdup this buffer
> > further or just use it one time e.g. as an elog or printf argument.
> >
>
> Okay, changed it accordingly.
>

oops, forgot to attach the patch.

-- 
With Regards,
Amit Kapila.


v3-0001-Fix-the-display-of-UNKNOWN-message-type-in-apply-.patch
Description: Binary data


Re: logicalrep_message_type throws an error

2023-07-19 Thread Amit Kapila
On Wed, Jul 19, 2023 at 10:08 AM Ashutosh Bapat wrote:
>
> On Wed, Jul 19, 2023 at 9:01 AM Amit Kapila  wrote:
> >
> > On Tue, Jul 18, 2023 at 10:27 AM Masahiko Sawada wrote:
> > >
> > > Or can we use snprintf() writing "??? (%d)" to a fixed length char[8 +
> > > 11] allocated on the stack instead?
> > >
> >
> > In the above size calculation, shouldn't it be 7 + 11 where 7 is for
> > (3 (???) + 1 for space + 2 for () + 1 for terminating null char) and
> > 11 is for %d? BTW, this avoids dynamic allocation of the err string in
> > logicalrep_message_type() but we can't return a locally allocated
> > string, so do you think we should change the prototype of the function
> > to get this as an argument and then use it both for valid and invalid
> > cases?
>
> There are other places in the code which do something similar by using
> statically allocated buffers like static char xya[SIZE]. We could do
> that here. The caller may decide whether to pstrdup this buffer
> further or just use it one time e.g. as an elog or printf argument.
>

Okay, changed it accordingly. Currently, we call it only from
errcontext, so it looks reasonable to me to use static here.
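
i.e. something along these lines (a sketch of the approach being
discussed, not necessarily the committed patch):

```c
const char *
logicalrep_message_type(LogicalRepMsgType action)
{
	static char err_unknown[20];	/* fits "??? (%d)": 7 bytes + up to 11 for %d */

	switch (action)
	{
		case LOGICAL_REP_MSG_INSERT:
			return "INSERT";
			/* ... other known message types ... */
		default:
			snprintf(err_unknown, sizeof(err_unknown), "??? (%d)", action);
			return err_unknown;
	}
}
```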

> As I said before, we should not even print message type in the error
> context because it's unknown. Repeating that twice is useless. That
> will need some changes to apply_error_callback() though.
> But I am fine with "???" as well.
>

I think in the end it won't make a big difference. So, I would like to
go with Sawada-San's suggestion to keep the message type consistent in
actual error and error context unless that requires bigger changes.

-- 
With Regards,
Amit Kapila.




Re: Use COPY for populating all pgbench tables

2023-07-19 Thread Michael Paquier
On Wed, Jul 19, 2023 at 01:03:21PM -0500, Tristan Partin wrote:
> Didn't actually include the changes in the previous patch.

-initGenerateDataClientSide(PGconn *con)
+initBranch(PQExpBufferData *sql, int64 curr)
 {
-   PQExpBufferData sql;
+   /* "filler" column defaults to NULL */
+   printfPQExpBuffer(sql,
+ INT64_FORMAT "\t0\t\n",
+ curr + 1);
+}
+
+static void
+initTeller(PQExpBufferData *sql, int64 curr)
+{
+   /* "filler" column defaults to NULL */
+   printfPQExpBuffer(sql,
+ INT64_FORMAT "\t" INT64_FORMAT "\t0\t\n",
+ curr + 1, curr / ntellers + 1);

Hmm.  Something's not right here, see:
=# select count(*) has_nulls from pgbench_accounts where filler is null;
 has_nulls 
-----------
         0
(1 row)
=# select count(*) > 0 has_nulls from pgbench_tellers where filler is null;
 has_nulls 
-----------
 f
(1 row)
=# select count(*) > 0 has_nulls from pgbench_branches where filler is null;
 has_nulls 
-----------
 f
(1 row)

Note as well this comment in initCreateTables():
/*
 * Note: TPC-B requires at least 100 bytes per row, and the "filler"
 * fields in these table declarations were intended to comply with that.
 * The pgbench_accounts table complies with that because the "filler"
 * column is set to blank-padded empty string. But for all other tables
 * the columns default to NULL and so don't actually take any space.  We
 * could fix that by giving them non-null default values.  However, that
 * would completely break comparability of pgbench results with prior
 * versions. Since pgbench has never pretended to be fully TPC-B compliant
 * anyway, we stick with the historical behavior.
 */

So this patch causes pgbench to not stick with its historical
behavior, and the change is incompatible with the comments because the
tellers and branches tables don't use NULL for their filler attribute
anymore.
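
If the historical behavior is to be kept, the tellers/branches data
rows would presumably need to emit \N, COPY's text-format NULL marker,
for the filler column (a sketch; tellers shown, branches analogous):

```c
/* keep "filler" NULL: \N is the NULL marker in COPY text format */
printfPQExpBuffer(sql,
				  INT64_FORMAT "\t" INT64_FORMAT "\t0\t\\N\n",
				  curr + 1, curr / ntellers + 1);
```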
--
Michael




Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-07-19 Thread Peter Smith
Some review comments for patch v20-0002

==
src/backend/replication/logical/tablesync.c

1. finish_sync_worker
/*
 * Exit routine for synchronization worker.
 *
 * If reuse_worker is false, the worker will not be reused and exit.
 */

~

IMO the "will not be reused" part doesn't need saying -- it is
self-evident from the fact "reuse_worker is false".

SUGGESTION
If reuse_worker is false, at the conclusion of this function the
worker process will exit.

~~~

2. finish_sync_worker

- StartTransactionCommand();
- ereport(LOG,
- (errmsg("logical replication table synchronization worker for
subscription \"%s\", table \"%s\" has finished",
- MySubscription->name,
- get_rel_name(MyLogicalRepWorker->relid;
- CommitTransactionCommand();
-
  /* Find the leader apply worker and signal it. */
  logicalrep_worker_wakeup(MyLogicalRepWorker->subid, InvalidOid);

- /* Stop gracefully */
- proc_exit(0);
+ if (!reuse_worker)
+ {
+ StartTransactionCommand();
+ ereport(LOG,
+ (errmsg("logical replication table synchronization worker for
subscription \"%s\" has finished",
+ MySubscription->name)));
+ CommitTransactionCommand();
+
+ /* Stop gracefully */
+ proc_exit(0);
+ }

In the HEAD code the log message came *before* it signalled to the
apply leader. Won't it be better to keep the logic in that same order?

~~~

3. process_syncing_tables_for_sync

- finish_sync_worker();
+ /* Sync worker has completed synchronization of the current table. */
+ MyLogicalRepWorker->is_sync_completed = true;
+
+ ereport(LOG,
+ (errmsg("logical replication table synchronization worker for
subscription \"%s\", relation \"%s\" with relid %u has finished",
+ MySubscription->name,
+ get_rel_name(MyLogicalRepWorker->relid),
+ MyLogicalRepWorker->relid)));
+ CommitTransactionCommand();

IIUC it is only the "table synchronization" part that is finished
here; not the whole "table synchronization worker" (compared to the
finish_sync_worker function), so maybe the word "worker" should not
be in this message.

~~~

4. TablesyncWorkerMain

+ if (MyLogicalRepWorker->is_sync_completed)
+ {
+ /* tablesync is done unless a table that needs syncning is found */
+ done = true;

SUGGESTION (Typo "syncning" and minor rewording.)
This tablesync worker is 'done' unless another table that needs
syncing is found.

~

5.
+ /* Found a table for next iteration */
+ finish_sync_worker(true);
+
+ StartTransactionCommand();
+ ereport(LOG,
+ (errmsg("logical replication worker for subscription \"%s\" will be
reused to sync table \"%s\" with relid %u.",
+ MySubscription->name,
+ get_rel_name(MyLogicalRepWorker->relid),
+ MyLogicalRepWorker->relid)));
+ CommitTransactionCommand();
+
+ done = false;
+ break;
+ }
+ LWLockRelease(LogicalRepWorkerLock);

5a.
IMO it seems better to put this ereport *inside* the
finish_sync_worker() function alongside the similar log for when the
worker is not reused.

~

5b.
Isn't there a missing call to that LWLockRelease, if the 'break' happens?

==
src/backend/replication/logical/worker.c

6. LogicalRepApplyLoop

Refer to [1] for my reply to a previous review comment

~~~

7. InitializeLogRepWorker

  if (am_tablesync_worker())
  ereport(LOG,
- (errmsg("logical replication worker for subscription \"%s\", table
\"%s\" has started",
+ (errmsg("logical replication worker for subscription \"%s\", table
\"%s\" with relid %u has started",
  MySubscription->name,
- get_rel_name(MyLogicalRepWorker->relid;
+ get_rel_name(MyLogicalRepWorker->relid),
+ MyLogicalRepWorker->relid)));

But this is certainly a tablesync worker so the message here should
say "logical replication table synchronization worker" like the HEAD
code used to do.

It seems this mistake was introduced in patch v20-0001.

==
src/include/replication/worker_internal.h

8.
Refer to [1] for my reply to a previous review comment

--
[1] Replies to previous 0002 comments --
https://www.postgresql.org/message-id/CAHut%2BPtiAtGJC52SGNdobOah5ctYDDhWWKd%3DuP%3DrkRgXzg5rdg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-07-19 Thread Peter Smith
On Tue, Jul 18, 2023 at 1:54 AM Melih Mutlu  wrote:
>
> Hi,
>
> PFA updated patches. Rebased 0003 with minor changes. Addressed Peter's 
> reviews for 0001 and 0002 with some small comments below.
>
> Peter Smith wrote on Mon, 10 Jul 2023 at 10:09:
>>
>> 6. LogicalRepApplyLoop
>>
>> + /*
>> + * apply_dispatch() may have gone into apply_handle_commit()
>> + * which can call process_syncing_tables_for_sync.
>> + *
>> + * process_syncing_tables_for_sync decides whether the sync of
>> + * the current table is completed. If it is completed,
>> + * streaming must be already ended. So, we can break the loop.
>> + */
>> + if (MyLogicalRepWorker->is_sync_completed)
>> + {
>> + endofstream = true;
>> + break;
>> + }
>> +
>>
>> and
>>
>> + /*
>> + * If is_sync_completed is true, this means that the tablesync
>> + * worker is done with synchronization. Streaming has already been
>> + * ended by process_syncing_tables_for_sync. We should move to the
>> + * next table if needed, or exit.
>> + */
>> + if (MyLogicalRepWorker->is_sync_completed)
>> + endofstream = true;
>>
>> ~
>>
>> Instead of those code fragments above assigning 'endofstream' as a
>> side-effect, would it be the same (but tidier) to just modify the
>> other "breaking" condition below:
>>
>> BEFORE:
>> /* Check if we need to exit the streaming loop. */
>> if (endofstream)
>> break;
>>
>> AFTER:
>> /* Check if we need to exit the streaming loop. */
>> if (endofstream || MyLogicalRepWorker->is_sync_completed)
>> break;
>
>
> First place you mentioned also breaks the infinite loop. Such an if statement 
> is needed there with or without endofstream assignment.
>
> I think if there is a flag to break a loop, using that flag to indicate that 
> we should exit the loop seems more appropriate to me. I see that it would be 
> a bit tidier without endofstream = true lines, but I feel like it would also 
> be less readable.
>
> I don't have a strong opinion though. I'm just keeping them as they are for 
> now, but I can change them if you disagree.
>

I felt it was slightly sneaky to re-use the existing variable as a
convenient way to do what you want. But, I don’t feel strongly enough
on this point to debate it -- maybe see later if others have an
opinion about this.

>>
>>
>> 10b.
>> All the other tablesync-related fields of this struct are named as
>> relXXX, so I wonder if is better for this to follow the same pattern.
>> e.g. 'relsync_completed'
>
>
> Don't those start with 'rel' because they're related to the relation that the
> tablesync worker is syncing? is_sync_completed is not a relation-specific
> field. I'm okay with changing the name but feel like relsync_completed would
> be misleading.

My reading of the code is slightly different: Only these fields have
the prefix ‘rel’ and they are all grouped under the comment “/* Used
for initial table synchronization. */” because AFAIK only these fields
are TWS specific (not used for other kinds of workers).

Since this new flag field is also TWS-specific, therefore IMO it
should follow the same consistent name pattern. But, if you are
unconvinced, maybe see later if others have an opinion about it.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Support worker_spi to execute the function dynamically.

2023-07-19 Thread Masahiro Ikeda

Hi,

While I'm working on the thread[1], I found that the function of
worker_spi module fails if 'shared_preload_libraries' doesn't have
worker_spi.

The reason is that the database name is NULL because the database name
is initialized only when process_shared_preload_libraries_in_progress
is true.

```
psql=# SELECT worker_spi_launch(1) ;
2023-07-20 11:00:56.491 JST [1179891] LOG:  worker_spi worker 1 initialized with schema1.counted
2023-07-20 11:00:56.491 JST [1179891] FATAL:  cannot read pg_class without having selected a database at character 22
2023-07-20 11:00:56.491 JST [1179891] QUERY:  select count(*) from pg_namespace where nspname = 'schema1'
2023-07-20 11:00:56.491 JST [1179891] STATEMENT:  select count(*) from pg_namespace where nspname = 'schema1'
2023-07-20 11:00:56.492 JST [1179095] LOG:  background worker "worker_spi" (PID 1179891) exited with exit code 1

```

In my understanding, the restriction is not required. So, I think it's
better to change the behavior.
(v1-0001-Support-worker_spi-to-execute-the-function-dynamical.patch)

What do you think?

[1] Support to define custom wait events for extensions
https://www.postgresql.org/message-id/flat/b9f5411acda0cf15c8fbb767702ff43e%40oss.nttdata.com

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

From c6e60c66c4066b4a01981ffae5a168901e7283eb Mon Sep 17 00:00:00 2001
From: Masahiro Ikeda 
Date: Thu, 20 Jul 2023 10:34:50 +0900
Subject: [PATCH] Support worker_spi to execute the function dynamically.

Currently, the database name to connect to is initialized only
when process_shared_preload_libraries_in_progress is true.
This means that worker_spi_launch() fails if shared_preload_libraries
is empty, because the database to connect to is NULL.

The patch changes this so that the database name is always initialized
when _PG_init() is called. We can then call worker_spi_launch() and
launch SPI workers dynamically.
---
 src/test/modules/worker_spi/worker_spi.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/test/modules/worker_spi/worker_spi.c b/src/test/modules/worker_spi/worker_spi.c
index 7227cfaa45..ccc38a36d3 100644
--- a/src/test/modules/worker_spi/worker_spi.c
+++ b/src/test/modules/worker_spi/worker_spi.c
@@ -296,6 +296,15 @@ _PG_init(void)
 			NULL,
 			NULL);
 
+	DefineCustomStringVariable("worker_spi.database",
+			   "Database to connect to.",
+			   NULL,
+			   &worker_spi_database,
+			   "postgres",
+			   PGC_SIGHUP,
+			   0,
+			   NULL, NULL, NULL);
+
 	if (!process_shared_preload_libraries_in_progress)
 		return;
 
@@ -312,15 +321,6 @@ _PG_init(void)
 			NULL,
 			NULL);
 
-	DefineCustomStringVariable("worker_spi.database",
-			   "Database to connect to.",
-			   NULL,
-			   &worker_spi_database,
-			   "postgres",
-			   PGC_POSTMASTER,
-			   0,
-			   NULL, NULL, NULL);
-
 	MarkGUCPrefixReserved("worker_spi");
 
 	/* set up common data for all our workers */
-- 
2.25.1



Re: pg_recvlogical prints bogus error when interrupted

2023-07-19 Thread Michael Paquier
On Wed, Jul 19, 2023 at 01:46:02PM +0530, Bharath Rupireddy wrote:
> Printing LSN on signal exit won't be correct - if signal is received
> before cur_record_lsn gets assigned, we will be showing an old LSN if
> it was previously assigned or invalid LSN if it wasn't assigned
> previously. Signal arrival and processing are indeterministic, so we
> can't always show the right info.

I think that there's an argument to be made because cur_record_lsn
will be set before coming back to the beginning of the replay loop
when a stop is triggered by a signal.

> Instead, we can just be simple in the messaging without an lsn like
> pg_receivewal.

Anyway, I'm OK with simple for now as it looks that you don't feel
about that either, and the patch is enough to fix the report of this
thread.  And one would get periodic information in --verbose mode
depending the sync message frequency, as well.

So, I have applied v6 after fixing two issues with it:
- I have kept the reason as an argument of prepareToTerminate(), to be
able to take advantage of the switch structure where compilers would
generate a warning if adding a new value to StreamStopReason.
- cur_record_lsn was not initialized at the beginning of
StreamLogicalLog(), which is where the compiler complains about the
case of receiving a signal before entering the replay loop, after
establishing a connection.
--
Michael




Re: pg_recvlogical prints bogus error when interrupted

2023-07-19 Thread Michael Paquier
On Wed, Jul 19, 2023 at 01:33:15PM +0530, Bharath Rupireddy wrote:
> I think the delay is expected for the reason specified below and is
> not because of any of the changes in v5. As far as CTRL+C is
> concerned, it is a clean exit and hence we can't escape the while(1)
> loop.

Yes, that's also what I am expecting.  That's costly when replaying a
large change chunk, but we also want a clean exit on a signal as the
code comments document, so..
--
Michael




Re: Performance degradation on concurrent COPY into a single relation in PG16.

2023-07-19 Thread David Rowley
On Wed, 19 Jul 2023 at 23:14, Dean Rasheed  wrote:
> Hmm, I'm somewhat sceptical about this second patch. It's not obvious
> why adding such tests would speed it up, and indeed, testing on my
> machine with 50M rows, I see a noticeable speed-up from patch 1, and a
> slow-down from patch 2:

I noticed that 6fcda9aba added quite a lot of conditions that need to
be checked before we process a normal decimal integer string. I think
we could likely do better and code it to assume that most strings will
be decimal and put that case first rather than last.

In the attached, I've changed that for the 32-bit version only.  A
more complete patch would need to do the 16 and 64-bit versions too.

-- setup
create table a (a int);
insert into a select x from generate_series(1,1000)x;
copy a to '~/a.dump';

-- test
truncate a; copy a from '/tmp/a.dump';

master @ ab29a7a9c
Time: 2152.633 ms (00:02.153)
Time: 2121.091 ms (00:02.121)
Time: 2100.995 ms (00:02.101)
Time: 2101.724 ms (00:02.102)
Time: 2103.949 ms (00:02.104)

master + pg_strtoint32_base_10_first.patch
Time: 2061.464 ms (00:02.061)
Time: 2035.513 ms (00:02.036)
Time: 2028.356 ms (00:02.028)
Time: 2043.218 ms (00:02.043)
Time: 2037.035 ms (00:02.037) (~3.6% faster)

Without that, we need to check if the first digit is '0' a total of 3
times and also check if the 2nd digit is any of 'x', 'X', 'o', 'O',
'b' or 'B'. It seems to be coded with the assumption that hex strings
are the most likely. I think decimals are the most likely by far.

David
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 471fbb7ee6..60c90f7252 100644
--- a/src/backend/utils/adt/numutils.c
+++ b/src/backend/utils/adt/numutils.c
@@ -314,113 +314,119 @@ pg_strtoint32_safe(const char *s, Node *escontext)
else if (*ptr == '+')
ptr++;
 
-   /* process digits */
-   if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
+   /* process decimal digits */
+   if (isdigit((unsigned char) ptr[0]) &&
+   (isdigit((unsigned char) ptr[1]) || ptr[1] == '_' || ptr[1] == '\0' || isspace(ptr[1])))
{
-   firstdigit = ptr += 2;
+   firstdigit = ptr;
 
while (*ptr)
{
-   if (isxdigit((unsigned char) *ptr))
+   if (isdigit((unsigned char) *ptr))
{
-   if (unlikely(tmp > -(PG_INT32_MIN / 16)))
+   if (unlikely(tmp > -(PG_INT32_MIN / 10)))
goto out_of_range;
 
-   tmp = tmp * 16 + hexlookup[(unsigned char) *ptr++];
+   tmp = tmp * 10 + (*ptr++ - '0');
}
else if (*ptr == '_')
{
-   /* underscore must be followed by more digits */
+   /* underscore may not be first */
+   if (unlikely(ptr == firstdigit))
+   goto invalid_syntax;
+   /* and it must be followed by more digits */
ptr++;
-   if (*ptr == '\0' || !isxdigit((unsigned char) *ptr))
+   if (*ptr == '\0' || !isdigit((unsigned char) 
*ptr))
goto invalid_syntax;
}
else
break;
}
}
-   else if (ptr[0] == '0' && (ptr[1] == 'o' || ptr[1] == 'O'))
+   /* process hex digits */
+   else if (ptr[0] == '0' && (ptr[1] == 'x' || ptr[1] == 'X'))
{
firstdigit = ptr += 2;
 
while (*ptr)
{
-   if (*ptr >= '0' && *ptr <= '7')
+   if (isxdigit((unsigned char) *ptr))
{
-   if (unlikely(tmp > -(PG_INT32_MIN / 8)))
+   if (unlikely(tmp > -(PG_INT32_MIN / 16)))
goto out_of_range;
 
-   tmp = tmp * 8 + (*ptr++ - '0');
+   tmp = tmp * 16 + hexlookup[(unsigned char) *ptr++];
}
else if (*ptr == '_')
{
/* underscore must be followed by more digits */
ptr++;
-   if (*ptr == '\0' || *ptr < '0' || *ptr > '7')
+   if (*ptr == '\0' || !isxdigit((unsigned char) *ptr))
goto invalid_syntax;
}
else
break;
}
}
-   else if (ptr[0] == '0' && (ptr[1] == 'b' || ptr[1] == 'B'))

Re: [PATCH] Support SK_SEARCHNULL / SK_SEARCHNOTNULL for heap-only scans

2023-07-19 Thread Jacob Champion
On Mon, Feb 27, 2023 at 12:24 AM Heikki Linnakangas  wrote:
> On 22/02/2023 15:03, Aleksander Alekseev wrote:
> > If memory serves I noticed that WHERE ... IS NULL queries don't even
> > hit HeapKeyTest() and I was curious where the check for NULLs is
> > actually made. As I understand, SeqNext() in nodeSeqscan.c simply
> > iterates over all the tuples it can find and pushes them to the parent
> > node. We could get a slightly better performance for certain queries
> > if SeqNext() did the check internally.
>
> Right, it might be faster to perform the NULL-checks before checking
> visibility, for example. Arbitrary quals cannot be evaluated before
> checking visibility, but NULL checks could be.

Hi Heikki,

There's quite a bit of work left to do, but I wanted to check if the
attached patch (0002, based on top of Aleks' 0001 from upthread) was
going in the direction you were thinking. This patch pushes down any
forced-null and not-null Vars as ScanKeys. It doesn't remove the
redundant quals after turning them into ScanKeys, so it's needlessly
inefficient, but there's still a decent speedup for some of the basic
benchmarks in 0003.

Plans look something like this:

# EXPLAIN SELECT * FROM t WHERE i IS NULL;
                        QUERY PLAN
---------------------------------------------------------
 Seq Scan on t  (cost=0.00..1393.00 rows=49530 width=4)
   Scan Cond: (i IS NULL)
   Filter: (i IS NULL)
(3 rows)

# EXPLAIN SELECT * FROM t WHERE i = 3;
                     QUERY PLAN
-----------------------------------------------------
 Seq Scan on t  (cost=0.00..1643.00 rows=1 width=4)
   Scan Cond: (i IS NOT NULL)
   Filter: (i = 3)
(3 rows)

The non-nullable case worries me a bit because so many things imply IS
NOT NULL. I think I need to do some sort of cost analysis using the
null_frac statistics -- it probably only makes sense to push an
implicit SK_SEARCHNOTNULL down to the AM layer if some fraction of
rows would actually be filtered out -- but I'm not really sure how to
choose a threshold.
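
The per-column statistics are available from pg_stats, e.g. (a sketch
of where the numbers for such a threshold would come from):

```sql
-- Fraction of NULLs in t.i as estimated by the last ANALYZE; an
-- implicit SK_SEARCHNOTNULL pushdown only pays off when this is
-- large enough that the AM-level check filters a useful share of rows.
SELECT null_frac
  FROM pg_stats
 WHERE schemaname = 'public' AND tablename = 't' AND attname = 'i';
```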

It would also be neat if `COUNT(col)` could push down
SK_SEARCHNOTNULL, but I think that would require a new support
function to rewrite the plan for an aggregate.

Am I on the right track?

Thanks,
--Jacob
From 45ed1948b6aac4fc8a268a77211327786fd4315f Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Tue, 6 Jun 2023 15:57:19 -0700
Subject: [PATCH 3/3] WIP: naive benchmarks

Three benchmarks (bench1-3) for varying degrees of selectivity depending
on the table used. Two benchmarks (bench4-5) for 1.0 selectivity, on
both a single column and a larger number of columns.

$ psql -f ./init postgres
$ for i in 1 2 3 4; do for t in zero twenty fifty eighty hundred; do
echo "= $i: $t ="
pgbench -n -f ./bench$i -T5 -c1 -j1 -Dtable=$t postgres
done; done
$ pgbench -n -f ./bench5 -T5 -c1 -j1 postgres
---
 bench1 |  1 +
 bench2 |  1 +
 bench3 |  1 +
 bench4 |  1 +
 bench5 |  4 
 init   | 37 +
 6 files changed, 45 insertions(+)
 create mode 100644 bench1
 create mode 100644 bench2
 create mode 100644 bench3
 create mode 100644 bench4
 create mode 100644 bench5
 create mode 100755 init

diff --git a/bench1 b/bench1
new file mode 100644
index 00..9cb32d4fcb
--- /dev/null
+++ b/bench1
@@ -0,0 +1 @@
+SELECT COUNT(i) FROM :table;
diff --git a/bench2 b/bench2
new file mode 100644
index 00..1377bae0f5
--- /dev/null
+++ b/bench2
@@ -0,0 +1 @@
+SELECT COUNT(*) FROM :table WHERE i IS NOT NULL;
diff --git a/bench3 b/bench3
new file mode 100644
index 00..524a140c0a
--- /dev/null
+++ b/bench3
@@ -0,0 +1 @@
+SELECT COUNT(*) FROM :table WHERE i IS NULL;
diff --git a/bench4 b/bench4
new file mode 100644
index 00..c8216537db
--- /dev/null
+++ b/bench4
@@ -0,0 +1 @@
+SELECT COUNT(*) FROM :table WHERE i > 0;
diff --git a/bench5 b/bench5
new file mode 100644
index 00..90a77741f6
--- /dev/null
+++ b/bench5
@@ -0,0 +1,4 @@
+SELECT COUNT(*) FROM wide
+ WHERE i <> 0 AND j <> 0 AND k <> 0 AND l <> 0 AND m <> 0 AND n <> 0 AND o <> 0
+   AND p <> 0 AND q <> 0 AND r <> 0 AND s <> 0 AND t <> 0 AND u <> 0 AND v <> 0
+   AND w <> 0 AND x <> 0 AND y <> 0 AND z <> 0;
diff --git a/init b/init
new file mode 100755
index 00..98c4a0acdb
--- /dev/null
+++ b/init
@@ -0,0 +1,37 @@
+-- Tables are named after the percentage of values that are non-NULL.
+DROP TABLE IF EXISTS zero;
+DROP TABLE IF EXISTS twenty;
+DROP TABLE IF EXISTS fifty;
+DROP TABLE IF EXISTS eighty;
+DROP TABLE IF EXISTS hundred;
+
+DROP TABLE IF EXISTS wide;
+
+\if :{?scale}
+\else
+  \set scale 1
+\endif
+
+CREATE TABLE zero AS
+  SELECT NULL::int AS i FROM generate_series(1, 10 * :scale);
+
+CREATE TABLE twenty AS
+  SELECT CASE WHEN random() < 0.2 THEN i ELSE NULL END AS i
+FROM generate_series(1, 10 * :scale) i;
+
+CREATE TABLE fifty AS
+  SELECT CASE WHEN random() < 0.5 THEN i ELSE NULL END AS i
+FROM generate_series(1, 10 * :scale) i;
+
+CREATE TABLE eighty AS
+  

Re: Support to define custom wait events for extensions

2023-07-19 Thread Michael Paquier
On Wed, Jul 19, 2023 at 11:16:34AM -0500, Tristan Partin wrote:
> On Tue Jul 11, 2023 at 6:29 PM CDT, Michael Paquier wrote:
>> This style comes from LWLockRegisterTranche() in lwlock.c.  Do you
>> think that it would be more adapted to change that to
>> pg_nextpower2_size_t() with a Size?  We could do that for the existing
>> code on HEAD as an improvement.
> 
> Yes, I think that would be the most correct code. At the very least,
> using a uint32 instead of an int, would be preferrable.

Would you like to send a patch on a new thread about that?

>> Hmm, okay, that would nice to hear about to make sure that the
>> approach taken on this thread is able to cover what you are looking
>> for.  So you mean that Neon has been using something similar to
>> register wait events in the backend?  Last time I looked at the Neon
>> repo, I did not get the impression that there was a custom patch for
>> Postgres in this area.  All the in-core code paths using
>> WAIT_EVENT_EXTENSION would gain from the APIs added here, FWIW.
> 
> I did some investigation into the Neon fork[0], and couldn't find any
> commits that seemed related. Perhaps this is on our wishlist instead of
> something we already have implemented. I have CCed Heikki to talk some
> more about how this would fit in at Neon.
> 
> [0]: https://github.com/neondatabase/postgres

Anybody with complex out-of-core extensions have wanted more
monitoring capabilities for wait events without relying on the core
backend.  To be honest, I would not be surprised to see this stuff on
more than one wishlist.
--
Michael




Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

2023-07-19 Thread Michael Paquier
On Wed, Jul 19, 2023 at 11:21:17AM -0700, David Zhang wrote:
> 1) simply start server from a base backup
> 
> FATAL:  could not find recovery.signal or standby.signal when recovering
> with backup_label
> 
> HINT:  If you are restoring from a backup, touch
> "/media/david/disk1/pg_backup1/recovery.signal" or
> "/media/david/disk1/pg_backup1/standby.signal" and add required recovery
> options.

Note the difference when --write-recovery-conf is specified, where a
standby.signal is created with a primary_conninfo in
postgresql.auto.conf.  So, yes, that's expected by default with the
patch.

> 2) touch a recovery.signal file and then try to start the server, the
> following error was encountered:
> 
> FATAL:  must specify restore_command when standby mode is not enabled

Yes, that's also something expected in the scope of the v1 posted.
The idea behind this restriction is that specifying recovery.signal is
equivalent to asking for archive recovery, but not specifying
restore_command is equivalent to not provide any options to be able to
recover.  See validateRecoveryParameters() and note that this
restriction exists since the beginning of times, introduced in commit
66ec2db.  I tend to agree that there is something to be said about
self-contained backups taken from pg_basebackup, though, as these
would fail if no restore_command is specified, and this restriction has
been in place since before Postgres introduced replication and easier
ways to take base backups.  As a whole, I think that there is a good argument
in favor of removing this restriction for the case where archive
recovery is requested if users have all their WAL in pg_wal/ to be
able to recover up to a consistent point, keeping these GUC
restrictions if requesting a standby (not recovery.signal, only
standby.signal).

> 3) touch a standby.signal file, then the server successfully started,
> however, it operates in standby mode, whereas the intended behavior was for
> it to function as a primary server.

standby.signal implies that the server will start in standby mode.  If
one wants to deploy a new primary, which implies a timeline jump at
the end of recovery, recovery.signal needs to be specified instead.
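
As a sketch of the two setups (paths and the restore_command are
illustrative):

```
# Standby: follows a primary, no timeline switch at the end of recovery.
$ touch $PGDATA/standby.signal

# New primary: archive recovery, ending with a timeline switch.
$ touch $PGDATA/recovery.signal
$ echo "restore_command = 'cp /mnt/archive/%f %p'" >> $PGDATA/postgresql.conf
```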

We need more discussions and more opinions, but the discussion has
stalled for a few months now.  In case, I am adding Thomas Munro in CC
who has mentioned to me at PGcon that he was interested in this patch
(this thread's problem is not directly related to the fact that the
checkpointer now runs in crash recovery, though).

For now, I am attaching a rebased v2.
--
Michael
From 9945a86161b1cf23f7002a8f1f4bce20061693d6 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 20 Jul 2023 07:59:56 +0900
Subject: [PATCH v2] Strengthen use of ArchiveRecoveryRequested and
 InArchiveRecovery

---
 src/backend/access/transam/xlogrecovery.c | 112 +++---
 src/bin/pg_basebackup/t/010_pg_basebackup.pl  |   3 +-
 src/bin/pg_rewind/t/008_min_recovery_point.pl |   1 +
 3 files changed, 75 insertions(+), 41 deletions(-)

diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index becc2bda62..86899452b4 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -125,15 +125,19 @@ static TimeLineID curFileTLI;
 
 /*
  * When ArchiveRecoveryRequested is set, archive recovery was requested,
- * ie. signal files were present.  When InArchiveRecovery is set, we are
- * currently recovering using offline XLOG archives.  These variables are only
- * valid in the startup process.
+ * ie. signal files or backup_label were present.  When InArchiveRecovery is
+ * set, we are currently recovering using offline XLOG archives.  These
+ * variables are only valid in the startup process.  Note that the presence of
+ * a backup_label file forces archive recovery even if there is no signal
+ * file.
  *
  * When ArchiveRecoveryRequested is true, but InArchiveRecovery is false, we're
  * currently performing crash recovery using only XLOG files in pg_wal, but
  * will switch to using offline XLOG archives as soon as we reach the end of
  * WAL in pg_wal.
-*/
+ *
+ * InArchiveRecovery should never be set without ArchiveRecoveryRequested.
+ */
 bool		ArchiveRecoveryRequested = false;
 bool		InArchiveRecovery = false;
 
@@ -540,42 +544,7 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
 	readRecoverySignalFile();
 	validateRecoveryParameters();
 
-	if (ArchiveRecoveryRequested)
-	{
-		if (StandbyModeRequested)
-			ereport(LOG,
-	(errmsg("entering standby mode")));
-		else if (recoveryTarget == RECOVERY_TARGET_XID)
-			ereport(LOG,
-	(errmsg("starting point-in-time recovery to XID %u",
-			recoveryTargetXid)));
-		else if (recoveryTarget == RECOVERY_TARGET_TIME)
-			ereport(LOG,
-	(errmsg("starting point-in-time recovery to %s",
-			timestamptz_to_str(recoveryTargetTime;
-		else if (recoveryTarget == RECOVERY_TARGET_NAME)
-			

Re: Move un-parenthesized syntax docs to "compatibility" for few SQL commands

2023-07-19 Thread Nathan Bossart
On Fri, Jun 30, 2023 at 11:25:57AM -0700, Nathan Bossart wrote:
> I think these patches are in decent shape, so I'd like to commit them soon,
> but I will wait at least a couple more weeks in case anyone has additional
> feedback.

Committed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Use of additional index columns in rows filtering

2023-07-19 Thread Peter Geoghegan
On Wed, Jul 19, 2023 at 1:46 PM Jeff Davis  wrote:
> On Wed, 2023-07-19 at 20:03 +0200, Tomas Vondra wrote:
> > Makes sense, I also need to think about maybe not having duplicate
> > clauses in the two lists. What annoys me about that is that it
> > partially prevents the cost-based reordering done by
> > order_qual_clauses(). So maybe we should have three lists ... Also,
> > some of the expressions could be fairly expensive.
>
> Can we just calculate the costs of the pushdown and do it when it's a
> win? If the random_page_cost savings exceed the costs from evaluating
> the clause earlier, then push down.

My patch that teaches nbtree to execute ScalarArrayOps intelligently
(by dynamically choosing to not re-descend the btree to perform
another "primitive index scan" when the data we need is located on the
same leaf page as the current ScalarArrayOps arrays) took an
interesting turn recently -- one that seems related.

I found that certain kinds of queries are dramatically faster once you
teach the optimizer to accept that multi-column ScalarArrayOps can be
trusted to return tuples in logical/index order, at least under some
circumstances. For example:

pg@regression: [583930]=# create index order_by_saop on
tenk1(two,four,twenty);
CREATE INDEX

pg@regression: [583930]=# EXPLAIN (ANALYZE, BUFFERS)
select ctid, thousand from tenk1
where two in (0,1) and four in (1,2) and twenty in (1,2)
order by two, four, twenty limit 20;

This shows "Buffers: shared hit=1377" on HEAD, versus "Buffers: shared
hit=13" with my patch. All because we can safely terminate the scan
early now. The vast majority of the buffer hits the patch will avoid
are against heap pages, even though I started out with the intention
of eliminating unnecessary repeat index page accesses.

Note that build_index_paths() currently refuses to allow SAOP clauses
to be recognized as ordered with a multi-column index and a query with
a clause for more than the leading column -- that is something that
the patch needs to address (to get this particular improvement, at
least). Allowing such an index path to have useful pathkeys is
typically safe (in the sense that it won't lead to wrong answers to
queries), but we still make a conservative assumption that they can
lead to wrong answers. There are comments about "equality constraints"
that describe the restrictions right now.

But it's not just the question of basic correctness -- the optimizer
is very hesitant to use multi-column SAOPs, even in cases that don't
care about ordering. So it's also (I think, implicitly) a question of
*risk*. The risk of getting very inefficient SAOP execution in nbtree,
when it turns out that a huge number of "primitive index scans" are
needed. But, if nbtree is taught to "coalesce together" primitive
index scans at runtime, that risk goes way down.

Anyway, this seems related to what you're talking about because the
relationship between selectivity and ordering seems particularly
important in this context. And because it suggests that there is at
least some scope for adding "run time insurance" to the executor,
which is valuable in the optimizer if it bounds the potential
downside. If you can be practically certain that there is essentially
zero risk of serious problems when the costing miscalculates (for a
limited subset of cases), then life might be a lot easier -- clearly
we should be biased in one particular direction with a case that has
that kind of general profile.

My current understanding of the optimizer side of things --
particularly things like costing for "filter quals/qpquals" versus
costing for "true index quals" -- is rather basic. That will have to
change. Curious to hear your thoughts (if any) on how what you're
discussing may relate to what I need to do with my patch. Right now my
patch assumes that making SAOP clauses into proper index quals (that
usually preserve index ordering) is an unalloyed good (when safe!).
This assumption is approximately true on average, as far as I can
tell. But it's probably quite untrue in various specific cases, that
somebody is bound to care about.

--
Peter Geoghegan




Re: Extracting cross-version-upgrade knowledge from buildfarm client

2023-07-19 Thread Andrew Dunstan


On 2023-07-19 We 16:44, Andrew Dunstan wrote:



On 2023-07-19 We 15:20, Andrew Dunstan wrote:



On 2023-07-19 We 12:05, Alvaro Herrera wrote:



Maybe we need to make AdjustUpgrade just look at the major version,
something like:

    $old_version = PostgreSQL::Version->new($old_version->major);

It seems like that does work, but if we do that, then we also need to
change this line:

if ($old_version lt '9.5')
to
if ($old_version < '9.5')

otherwise you get some really mysterious failures about trying to drop
public.=>, which is in fact no longer accepted syntax since 9.5; and the
stringwise comparison returns the wrong value here.



That seems odd. String comparison like that is supposed to work. I 
will do some tests.




TBH I'm getting a sense of discomfort with the idea of having developed
a Postgres-version-number Perl module, and in the only place where we
can use it, have to settle for numeric comparison instead.



These comparisons only look like that. They are overloaded in 
PostgreSQL::Version.




The result you report suggests to me that somehow the old version is no 
longer a PostgreSQL::Version object.  Here's the patch I suggest:



diff --git a/src/test/perl/PostgreSQL/Test/AdjustUpgrade.pm b/src/test/perl/PostgreSQL/Test/AdjustUpgrade.pm

index a241d2ceff..d7a7383deb 100644
--- a/src/test/perl/PostgreSQL/Test/AdjustUpgrade.pm
+++ b/src/test/perl/PostgreSQL/Test/AdjustUpgrade.pm
@@ -74,6 +74,11 @@ values are arrayrefs to lists of statements to be run in those databases.

 sub adjust_database_contents
 {
    my ($old_version, %dbnames) = @_;
+
+   die "wrong type for \$old_version\n"
+ unless $old_version->isa("PostgreSQL::Version");
+   $old_version = PostgreSQL::Version->new($old_version->major);
+
    my $result = {};

    # remove dbs of modules known to cause pg_upgrade to fail


Do you still see errors with that?





Just realized it would need to be applied in all three exported routines.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: Use of additional index columns in rows filtering

2023-07-19 Thread Jeff Davis
On Wed, 2023-07-19 at 20:03 +0200, Tomas Vondra wrote:
> Makes sense, I also need to think about maybe not having duplicate
> clauses in the two lists. What annoys me about that is that it
> partially prevents the cost-based reordering done by
> order_qual_clauses(). So maybe we should have three lists ... Also,
> some of the expressions could be fairly expensive.

Can we just calculate the costs of the pushdown and do it when it's a
win? If the random_page_cost savings exceed the costs from evaluating
the clause earlier, then push down.

> BTW could you double-check how I expanded the index_getnext_slot()? I
> recall I wasn't entirely confident the result is correct, and I
> wanted
> to try getting rid of the "while (true)" loop.

I suggest refactoring slightly to have the two loops in different
functions (rather than nested loops in the same function) to make
control flow a bit more clear. I'm not sure if the new function for the
inner loop should be defined in nodeIndexscan.c or indexam.c; I suppose
it depends on how clean the signature looks.
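
For illustration, one possible shape for the inner loop as its own
function; the qual-check helper is hypothetical and stands in for
whatever the patch evaluates against the index tuple:

```c
/*
 * Fetch TIDs until one satisfies the index-only quals; return NULL
 * once the scan is exhausted.  (Sketch only, not the actual patch.)
 */
static ItemPointer
index_getnext_tid_filtered(IndexScanDesc scan, ScanDirection direction)
{
	ItemPointer tid;

	while ((tid = index_getnext_tid(scan, direction)) != NULL)
	{
		if (IndexTupleSatisfiesIndexQuals(scan))	/* hypothetical helper */
			return tid;
	}

	return NULL;
}
```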

Also please expand the tests a bit to show more EXPLAIN plans that
illustrate the different cases.

Regards,
Jeff Davis





Re: Extracting cross-version-upgrade knowledge from buildfarm client

2023-07-19 Thread Andrew Dunstan


On 2023-07-19 We 15:20, Andrew Dunstan wrote:



On 2023-07-19 We 12:05, Alvaro Herrera wrote:



Maybe we need to make AdjustUpgrade just look at the major version,
something like:

    $old_version = PostgreSQL::Version->new($old_version->major);

It seems like that does work, but if we do that, then we also need to
change this line:

if ($old_version lt '9.5')
to
if ($old_version < '9.5')

otherwise you get some really mysterious failures about trying to drop
public.=>, which is in fact no longer accepted syntax since 9.5; and the
stringwise comparison returns the wrong value here.



That seems odd. String comparison like that is supposed to work. I 
will do some tests.




TBH I'm getting a sense of discomfort with the idea of having developed
a Postgres-version-number Perl module, and in the only place where we
can use it, have to settle for numeric comparison instead.



These comparisons only look like that. They are overloaded in 
PostgreSQL::Version.




The result you report suggests to me that somehow the old version is no 
longer a PostgreSQL::Version object.  Here's the patch I suggest:



diff --git a/src/test/perl/PostgreSQL/Test/AdjustUpgrade.pm b/src/test/perl/PostgreSQL/Test/AdjustUpgrade.pm

index a241d2ceff..d7a7383deb 100644
--- a/src/test/perl/PostgreSQL/Test/AdjustUpgrade.pm
+++ b/src/test/perl/PostgreSQL/Test/AdjustUpgrade.pm
@@ -74,6 +74,11 @@ values are arrayrefs to lists of statements to be run in those databases.

 sub adjust_database_contents
 {
    my ($old_version, %dbnames) = @_;
+
+   die "wrong type for \$old_version\n"
+ unless $old_version->isa("PostgreSQL::Version");
+   $old_version = PostgreSQL::Version->new($old_version->major);
+
    my $result = {};

    # remove dbs of modules known to cause pg_upgrade to fail


Do you still see errors with that?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: Giving more detail in pg_upgrade errormessage

2023-07-19 Thread Daniel Gustafsson
> On 18 Jul 2023, at 18:04, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>> Looking at the upgrade question in [0] made me realize that we discard
>> potentially useful information for troubleshooting.  When we check if the
>> cluster is properly shut down we might as well include the status from
>> pg_controldata in the errormessage as per the trivial (but yet untested)
>> proposed diff.
> 
>> Is there a reason not to be verbose here as users might copy/paste this 
>> output
>> when asking for help?
> 
> Agreed, but I think you need to chomp the string's trailing newline,
> or it'll look ugly.  You might as well do that further up and remove
> the newlines from the comparison strings, too.

Yeah, the previous diff was mostly a sketch.  The attached strips the
trailing newline and, because of that, makes the comparisons a bit
neater in the process.  Will apply this trivial but seemingly useful
change unless objected to.
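
For the archives, roughly the shape of the change (a reconstructed
sketch; the exact strings and variable names in the attached patch may
differ):

```c
/* bufline holds the "Database cluster state" line of pg_controldata output */
(void) pg_strip_crlf(bufline);

if (strcmp(bufline, "Database cluster state:               shut down") != 0)
	pg_fatal("database server was not shut down cleanly; state was: \"%s\"",
			 bufline);
```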

--
Daniel Gustafsson



v2-0001-pg_upgrade-include-additional-detail-in-cluster-c.patch
Description: Binary data


Re: Adding argument names to aggregate functions

2023-07-19 Thread Daniel Gustafsson
> On 19 Jul 2023, at 19:32, Dagfinn Ilmari Mannsåker  wrote:
> 
> Daniel Gustafsson  writes:
> 
>> This patch no longer applied but had a fairly trivial conflict so I've 
>> attached
>> a rebased v3 addressing the conflict in the hopes of getting this further.
> 
> Thanks for the heads-up!  Turns out the conflict was due to the new
> json(b)_object_agg(_unique)(_strict) functions, which should also have
> proargnames added.  Here's an updated patch that does that.

Great, thanks!  I had a quick look at this while rebasing (as well as your
updated patch) and it seems like a good idea to add this.  Unless there are
objections I will look at getting this in.

--
Daniel Gustafsson





Re: Extracting cross-version-upgrade knowledge from buildfarm client

2023-07-19 Thread Andrew Dunstan


On 2023-07-19 We 12:05, Alvaro Herrera wrote:

On 2023-Jul-19, Andrew Dunstan wrote:


On 2023-07-19 We 07:05, Alvaro Herrera wrote:

I just hit a snag testing this.  It turns out that the
PostgreSQL::Version comparison stuff believes that 16beta2 < 16, which
sounds reasonable.  However, because of that, the AdjustUpgrade.pm
stanza that tries to drop tables public.gtest_normal_child{2} in
versions earlier than 16 fails, because by 16 these tables are dropped
in the test itself rather than left to linger, as was the case in
versions 15 and earlier.

The buildfarm module assumes that no adjustments are necessary if the old
and new versions are the same (e.g. HEAD to HEAD). And it never passes in a
version like '16beta2'. It extracts the version number from the branch name,
e.g. REL_16_STABLE => 16.

Hmm, OK, but I'm not testing the same versions -- I'm testing 16beta2 to
17devel.



Yeah, but you asked why the buildfarm didn't see this effect, and the 
answer is that it never uses version arguments like '16beta2'.






I can fix this either by using DROP IF EXISTS in that stanza, or by
making AdjustUpgrade use 'version <= 15'.  Any opinions on which to
prefer?

The trouble is this could well break the next time someone puts in a test
like this.

Hmm, I don't understand what you mean.



I want to prevent things like this from happening in the future if 
someone puts a test in the development branch with  "if ($oldversion < nn)".






Maybe we need to make AdjustUpgrade just look at the major version,
something like:

    $old_version = PostgreSQL::Version->new($old_version->major);

It seems like that does work, but if we do that, then we also need to
change this line:

if ($old_version lt '9.5')
to
if ($old_version < '9.5')

otherwise you get some really mysterious failures about trying to drop
public.=>, which is in fact no longer accepted syntax since 9.5; and the
stringwise comparison returns the wrong value here.



That seems odd. String comparison like that is supposed to work. I will 
do some tests.





TBH I'm getting a sense of discomfort with the idea of having developed
a Postgres-version-number Perl module, and in the only place where we
can use it, have to settle for numeric comparison instead.



These comparisons only look like that. They are overloaded in 
PostgreSQL::Version.
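
A minimal sketch of what the overloading provides, assuming
PostgreSQL::Version overloads both <=> and cmp; note that this only
works while the value is still an object, since a plain string falls
back to stringwise semantics:

```perl
use PostgreSQL::Version;

my $old = PostgreSQL::Version->new('16beta2');

# Both forms dispatch to the module's version-aware comparison.
print "numeric says older\n" if $old < 16;
print "string says older\n"  if $old lt '16';

# Normalizing to the major version, as suggested upthread:
$old = PostgreSQL::Version->new($old->major);    # now compares equal to 16
```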



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: Atomic ops for unlogged LSN

2023-07-19 Thread John Morris
>> Why don't we just use a barrier around reading the value? It's not like
>> CreateCheckPoint() is frequent?

One reason is that a barrier isn’t needed, and adding unnecessary barriers can 
also be confusing.

With respect to the “debug only” comment in the original code, whichever value 
is written to the structure during a checkpoint will be reset when restarting. 
Nobody will ever see that value. We could just as easily write a zero.

Shutting down is a different story. It isn’t stated, but we require the 
unlogged LSN be quiescent before shutting down. This patch doesn’t change that 
requirement.

We could also argue that memory ordering doesn’t matter when filling in a 
conventional, unlocked structure.  But the fact that we have only two cases, 
1) checkpoint, where the value is ignored, and 2) shutdown, where it is 
quiescent, makes the additional argument unnecessary.
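
To make the two cases concrete, here is a sketch assuming unloggedLSN
becomes a pg_atomic_uint64 as this patch proposes; the shapes follow
xlog.c but are illustrative rather than the patch itself:

```c
/* Allocator: a full atomic read-modify-write, returns the pre-increment value. */
XLogRecPtr
GetFakeLSNForUnloggedRel(void)
{
	return pg_atomic_fetch_add_u64(&XLogCtl->unloggedLSN, 1);
}

/*
 * Case 1 (checkpoint): the copied value is discarded after a restart.
 * Case 2 (shutdown): all activity has quiesced.  Either way a plain,
 * unordered read suffices, hence no barrier.
 */
checkPoint.unloggedLSN = pg_atomic_read_u64(&XLogCtl->unloggedLSN);
```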

Would you be more comfortable if I added comments describing the two cases? My 
intent was to be minimalistic, but the original code could use better 
explanation.


Re: Requiring recovery.signal or standby.signal when recovering with a backup_label

2023-07-19 Thread David Zhang

On 2023-07-16 6:27 p.m., Michael Paquier wrote:


Delete a backup_label from a fresh base backup can easily lead to data
corruption, as the startup process would pick up as LSN to start
recovery from the control file rather than the backup_label file.
This would happen if a checkpoint updates the redo LSN in the control
file while a backup happens and the control file is copied after the
checkpoint, for instance.  If one wishes to deploy a new primary from
a base backup, recovery.signal is the way to go, making sure that the
new primary is bumped into a new timeline once recovery finishes, on
top of making sure that the startup process starts recovery from a
position where the cluster would be able to achieve a consistent
state.

Thanks a lot for sharing this information.


How would you rewrite that?  I am not sure how many details we want to
put here in terms of differences between recovery.signal and
standby.signal, still we surely should mention these are the two
possible choices.


Honestly, I can't convince myself to mention the backup_label here too. 
But, I can share some information regarding my testing of the patch and 
the corresponding results.


To assess the impact of the patch, I executed the following commands 
both before and after applying it:


pg_basebackup -h localhost -p 5432 -U david -D pg_backup1

pg_ctl -D pg_backup1 -l /tmp/logfile start

Before the patch, no issues were encountered when starting an 
independent primary server.



However, after applying the patch, I observed the following behavior 
when starting from the base backup:


1) simply start server from a base backup

FATAL:  could not find recovery.signal or standby.signal when recovering 
with backup_label


HINT:  If you are restoring from a backup, touch 
"/media/david/disk1/pg_backup1/recovery.signal" or 
"/media/david/disk1/pg_backup1/standby.signal" and add required recovery 
options.


2) touch a recovery.signal file and then try to start the server, the 
following error was encountered:


FATAL:  must specify restore_command when standby mode is not enabled

3) touch a standby.signal file: the server then started successfully, 
however it operates in standby mode, whereas the intended behavior was 
for it to function as a primary server.



Best regards,

David








Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2023-07-19 Thread Justin Pryzby
On Thu, Jun 01, 2023 at 08:50:50AM -0400, Michael Paquier wrote:
> On Wed, May 31, 2023 at 06:35:34PM -0500, Justin Pryzby wrote:
> > What do you think the comment ought to say ?  It already says:
> >
> > src/backend/catalog/heap.c-  * Make a dependency link to force 
> > the relation to be deleted if its
> > src/backend/catalog/heap.c-  * access method is.
> 
> This is the third location where we rely on the fact that
> RELKIND_HAS_TABLE_AM() does not include RELKIND_PARTITIONED_TABLE, so
> it seems worth documenting what we are relying on as a comment?  Say:
>  * Make a dependency link to force the relation to be deleted if its
>  * access method is.
>  *
>  * No need to add an explicit dependency for the toast table, as the
>  * main table depends on it.  Partitioned tables have a table access
>  * method defined, and RELKIND_HAS_TABLE_AM ignores them.

You said that this location "relies on" the macro not including
partitioned tables, but I would say the opposite: the places that use
RELKIND_HAS_TABLE_AM() and do *not* say "or relkind==PARTITIONED_TABLE"
are the ones that "rely on" that...

Anyway, this updates various comments.  No other changes.

-- 
Justin
From e96a2a109d25bb9ed7cc9c0bf80c0824128dceac Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 7 Mar 2021 00:11:38 -0600
Subject: [PATCH] Allow specifying access method of partitioned tables..

..to be inherited by partitions

See also:
ca4103025dfe26eaaf6a500dec9170fbb176eebc
8586bf7ed8889f39a59dd99b292014b73be85342
ebfe2dbd6b624e2a428e14b7ee9322cc096f63f7 - prevent DROP AM

Authors: Justin Pryzby, Soumyadeep Chakraborty
---
 doc/src/sgml/ref/alter_table.sgml   |  4 ++
 doc/src/sgml/ref/create_table.sgml  |  9 ++-
 src/backend/catalog/heap.c  |  6 +-
 src/backend/commands/tablecmds.c| 89 +++--
 src/backend/utils/cache/lsyscache.c | 22 ++
 src/backend/utils/cache/relcache.c  |  5 ++
 src/bin/pg_dump/pg_dump.c   |  3 +-
 src/bin/pg_dump/t/002_pg_dump.pl| 34 ++
 src/include/catalog/pg_class.h  |  3 +-
 src/include/utils/lsyscache.h   |  1 +
 src/test/regress/expected/create_am.out | 62 -
 src/test/regress/sql/create_am.sql  | 25 +--
 12 files changed, 213 insertions(+), 50 deletions(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index d4d93eeb7c6..d32d4c44b10 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -719,6 +719,10 @@ WITH ( MODULUS numeric_literal, REM
  
   This form changes the access method of the table by rewriting it. See
for more information.
+  When applied to a partitioned table, there is no data to rewrite, but any
+  partitions created afterwards with
+  CREATE TABLE PARTITION OF will use that access method,
+  unless overridden by a USING clause.
  
 

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 10ef699fab9..b20d272b151 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1295,9 +1295,12 @@ WITH ( MODULUS numeric_literal, REM
   This optional clause specifies the table access method to use to store
   the contents for the new table; the method needs be an access method of
   type TABLE. See  for more
-  information.  If this option is not specified, the default table access
-  method is chosen for the new table. See  for more information.
+  information.  If this option is not specified, a default table access
+  method is chosen for the new table.
+  When creating a partition, the default table access method is the
+  access method of its parent.
+  For other tables, the default is determined by
+  .
  
 

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 4c30c7d461f..82e003cabc1 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -1450,9 +1450,11 @@ heap_create_with_catalog(const char *relname,
 		 * access method is.
 		 *
 		 * No need to add an explicit dependency for the toast table, as the
-		 * main table depends on it.
+		 * main table depends on it.  Partitioned tables have an access method
+		 * defined, but are not handled by RELKIND_HAS_TABLE_AM.
 		 */
-		if (RELKIND_HAS_TABLE_AM(relkind) && relkind != RELKIND_TOASTVALUE)
+		if ((RELKIND_HAS_TABLE_AM(relkind) && relkind != RELKIND_TOASTVALUE) ||
+relkind == RELKIND_PARTITIONED_TABLE)
 		{
 			ObjectAddressSet(referenced, AccessMethodRelationId, accessmtd);
			add_exact_object_address(&referenced, addrs);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 4dc029f91f1..bc343539aeb 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -573,6 +573,7 @@ static ObjectAddress ATExecClusterOn(Relation rel, const char *indexName,
 	 

Re: Use of additional index columns in rows filtering

2023-07-19 Thread Tomas Vondra
On 7/19/23 19:17, Jeff Davis wrote:
> On Wed, 2023-07-19 at 11:16 +0200, Tomas Vondra wrote:
>> I wonder if Andres was right (in the index prefetch thread) that
>> splitting regular index scans and index-only scans may not be ideal.
>> In
>> a way, this patch moves those nodes closer, both in capability and
>> code
>> (because now both use index_getnext_tid etc.).
> 
> Yeah. I could also imagine decomposing the index scan node into more
> pieces, but I don't think it would work out to be a clean data flow.
> Either way, probably out of scope for this patch.
> 

OK

> For this patch I think we should just tweak the EXPLAIN output so that
> it's a little more clear what parts are index-only (at least if VM bit
> is set) and what parts need to go to the heap.
> 

Makes sense, I also need to think about maybe not having duplicate
clauses in the two lists. What annoys me about that is that it
partially prevents the cost-based reordering done by
order_qual_clauses(). So maybe we should have three lists ... Also,
some of the expressions could be fairly expensive.

BTW could you double-check how I expanded the index_getnext_slot()? I
recall I wasn't entirely confident the result is correct, and I wanted
to try getting rid of the "while (true)" loop.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Use COPY for populating all pgbench tables

2023-07-19 Thread Tristan Partin

Didn't actually include the changes in the previous patch.

--
Tristan Partin
Neon (https://neon.tech)
From 5b934691b88b3b2c5675bc778b0a10e9eeff3dbe Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Tue, 23 May 2023 11:48:16 -0500
Subject: [PATCH v5] Use COPY instead of INSERT for populating all tables

COPY is a better interface for bulk loading and/or high latency
connections. Previously COPY was only used for the pgbench_accounts
table because the amount of data was so much larger than the other
tables. However, as already said there are also gains to be had in the
high latency case, such as loading data across continents.
---
 doc/src/sgml/ref/pgbench.sgml |   8 +-
 src/bin/pgbench/pgbench.c | 153 --
 2 files changed, 94 insertions(+), 67 deletions(-)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 850028557d..4424595cc6 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -231,10 +231,10 @@ pgbench  options  d
 extensively through a COPY.
 pgbench uses the FREEZE option with version 14 or later
 of PostgreSQL to speed up
-subsequent VACUUM, unless partitions are enabled.
-Using g causes logging to print one message
-every 100,000 rows while generating data for the
-pgbench_accounts table.
+subsequent VACUUM. If partitions are enabled for
+the pgbench_accounts table, the FREEZE option is
+not enabled. Using g causes logging to print one
+message every 100,000 rows while generating data for all tables.


 With G (server-side data generation),
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 087baa7d57..8cc2e2bca4 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -4859,17 +4859,46 @@ initTruncateTables(PGconn *con)
 	 "pgbench_tellers");
 }
 
-/*
- * Fill the standard tables with some data generated and sent from the client
- */
+typedef void initRow(PQExpBufferData *sql, int64 curr);
+
 static void
-initGenerateDataClientSide(PGconn *con)
+initBranch(PQExpBufferData *sql, int64 curr)
+{
+	/* "filler" column defaults to NULL */
+	printfPQExpBuffer(sql,
+	  INT64_FORMAT "\t0\t\n",
+	  curr + 1);
+}
+
+static void
+initTeller(PQExpBufferData *sql, int64 curr)
+{
+	/* "filler" column defaults to NULL */
+	printfPQExpBuffer(sql,
+	  INT64_FORMAT "\t" INT64_FORMAT "\t0\t\n",
+	  curr + 1, curr / ntellers + 1);
+}
+
+static void
+initAccount(PQExpBufferData *sql, int64 curr)
 {
+	/* "filler" column defaults to blank padded empty string */
+	printfPQExpBuffer(sql,
+	  INT64_FORMAT "\t" INT64_FORMAT "\t0\t\n",
+	  curr + 1, curr / naccounts + 1);
+}
+
+static void
+initPopulateTable(PGconn *con, const char *table, int64 base, initRow *init_row)
+{
+	int			n;
+	int			k;
+	int			chars = 0;
+	PGresult	*res;
 	PQExpBufferData sql;
-	PGresult   *res;
-	int			i;
-	int64		k;
-	char	   *copy_statement;
+	char 		copy_statement[256];
+	const char *copy_statement_fmt = "copy %s from stdin";
+	const int64 total = base * scale;
 
 	/* used to track elapsed time and estimate of the remaining time */
 	pg_time_usec_t start;
@@ -4878,50 +4907,24 @@ initGenerateDataClientSide(PGconn *con)
 	/* Stay on the same line if reporting to a terminal */
 	char		eol = isatty(fileno(stderr)) ? '\r' : '\n';
 
-	fprintf(stderr, "generating data (client-side)...\n");
-
-	/*
-	 * we do all of this in one transaction to enable the backend's
-	 * data-loading optimizations
-	 */
-	executeStatement(con, "begin");
-
-	/* truncate away any old data */
-	initTruncateTables(con);
-
 	initPQExpBuffer(&sql);
 
 	/*
-	 * fill branches, tellers, accounts in that order in case foreign keys
-	 * already exist
+	 * Use COPY with FREEZE on v14 and later without partitioning.  Remember
+	 * that partitioning only applies to pgbench_accounts.
 	 */
-	for (i = 0; i < nbranches * scale; i++)
-	{
-		/* "filler" column defaults to NULL */
-		printfPQExpBuffer(&sql,
-		  "insert into pgbench_branches(bid,bbalance) values(%d,0)",
-		  i + 1);
-		executeStatement(con, sql.data);
-	}
+	if (PQserverVersion(con) >= 14) {
+		const bool is_pgbench_accounts = strcmp(table, "pgbench_accounts") == 0;
 
-	for (i = 0; i < ntellers * scale; i++)
-	{
-		/* "filler" column defaults to NULL */
-		printfPQExpBuffer(&sql,
-		  "insert into pgbench_tellers(tid,bid,tbalance) values (%d,%d,0)",
-		  i + 1, i / ntellers + 1);
-		executeStatement(con, sql.data);
+		if (!is_pgbench_accounts || partitions == 0)
+			copy_statement_fmt = "copy %s from stdin with (freeze on)";
 	}
 
-	/*
-	 * accounts is big enough to be worth using COPY and tracking runtime
-	 */
-
-	/* use COPY with FREEZE on v14 and later without partitioning */
-	if (partitions == 0 && PQserverVersion(con) >= 14)
-		copy_statement = "copy pgbench_accounts from stdin with 

Re: Use COPY for populating all pgbench tables

2023-07-19 Thread Tristan Partin

On Wed Jul 12, 2023 at 10:52 PM CDT, Michael Paquier wrote:

On Wed, Jul 12, 2023 at 09:29:35AM -0500, Tristan Partin wrote:
> On Wed Jul 12, 2023 at 1:06 AM CDT, Michael Paquier wrote:
>> This would use the freeze option only on pgbench_accounts when no
>> partitioning is defined, but my point was a bit different.  We could
>> use the FREEZE option on the teller and branch tables as well, no?
>> Okay, the impact is limited compared to accounts in terms of amount of
>> data loaded, but perhaps some people like playing with large scaling
>> factors where this could show a benefit in the initial data loading.
> 
> Perhaps, should they all be keyed off the same option? Seemed like in
> your previous comment you wanted multiple options. Sorry for not reading
> your comment correctly.

I would have thought that --partition should only apply to the
pgbench_accounts table, while FREEZE should apply where it is possible
to use it, aka all the COPY queries except when pgbench_accounts is a
partition.  Would you do something different, like not applying FREEZE
to pgbench_tellers and pgbench_branches, as these have far fewer tuples
than pgbench_accounts?


Michael,

I think I completely misunderstood you. From what I can tell, you want
to use FREEZE wherever possible. I think the new patch covers your
comment and fixes the documentation. I am hoping that I have finally
gotten what you are looking for.

--
Tristan Partin
Neon (https://neon.tech)
From 6fcc685fad9640818da7c8c5ff8a10d3fcf1aaed Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Tue, 23 May 2023 11:48:16 -0500
Subject: [PATCH v4] Use COPY instead of INSERT for populating all tables

COPY is a better interface for bulk loading and/or high latency
connections. Previously COPY was only used for the pgbench_accounts
table because the amount of data was so much larger than the other
tables. However, as already said there are also gains to be had in the
high latency case, such as loading data across continents.
---
 src/bin/pgbench/pgbench.c | 155 ++
 1 file changed, 90 insertions(+), 65 deletions(-)

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 087baa7d57..6db1500151 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -4859,17 +4859,46 @@ initTruncateTables(PGconn *con)
 	 "pgbench_tellers");
 }
 
-/*
- * Fill the standard tables with some data generated and sent from the client
- */
+typedef void initRow(PQExpBufferData *sql, int64 curr);
+
 static void
-initGenerateDataClientSide(PGconn *con)
+initBranch(PQExpBufferData *sql, int64 curr)
+{
+	/* "filler" column defaults to NULL */
+	printfPQExpBuffer(sql,
+	  INT64_FORMAT "\t0\t\n",
+	  curr + 1);
+}
+
+static void
+initTeller(PQExpBufferData *sql, int64 curr)
+{
+	/* "filler" column defaults to NULL */
+	printfPQExpBuffer(sql,
+	  INT64_FORMAT "\t" INT64_FORMAT "\t0\t\n",
+	  curr + 1, curr / ntellers + 1);
+}
+
+static void
+initAccount(PQExpBufferData *sql, int64 curr)
+{
+	/* "filler" column defaults to blank padded empty string */
+	printfPQExpBuffer(sql,
+	  INT64_FORMAT "\t" INT64_FORMAT "\t0\t\n",
+	  curr + 1, curr / naccounts + 1);
+}
+
+static void
+initPopulateTable(PGconn *con, const char *table, int64 base, initRow *init_row)
 {
+	int			n;
+	int			k;
+	int			chars = 0;
+	PGresult	*res;
 	PQExpBufferData sql;
-	PGresult   *res;
-	int			i;
-	int64		k;
-	char	   *copy_statement;
+	char 		copy_statement[256];
+	const char *copy_statement_fmt;
+	const int64 total = base * scale;
 
 	/* used to track elapsed time and estimate of the remaining time */
 	pg_time_usec_t start;
@@ -4878,50 +4907,22 @@ initGenerateDataClientSide(PGconn *con)
 	/* Stay on the same line if reporting to a terminal */
 	char		eol = isatty(fileno(stderr)) ? '\r' : '\n';
 
-	fprintf(stderr, "generating data (client-side)...\n");
-
-	/*
-	 * we do all of this in one transaction to enable the backend's
-	 * data-loading optimizations
-	 */
-	executeStatement(con, "begin");
-
-	/* truncate away any old data */
-	initTruncateTables(con);
-
 	initPQExpBuffer(&sql);
 
 	/*
-	 * fill branches, tellers, accounts in that order in case foreign keys
-	 * already exist
+	 * Use COPY with FREEZE on v14 and later without partitioning.  Remember
+	 * that partitioning only applies to pgbench_accounts.
 	 */
-	for (i = 0; i < nbranches * scale; i++)
-	{
-		/* "filler" column defaults to NULL */
-		printfPQExpBuffer(&sql,
-		  "insert into pgbench_branches(bid,bbalance) values(%d,0)",
-		  i + 1);
-		executeStatement(con, sql.data);
-	}
-
-	for (i = 0; i < ntellers * scale; i++)
-	{
-		/* "filler" column defaults to NULL */
-		printfPQExpBuffer(&sql,
-		  "insert into pgbench_tellers(tid,bid,tbalance) values (%d,%d,0)",
-		  i + 1, i / ntellers + 1);
-		executeStatement(con, sql.data);
-	}
-
-	/*
-	 * accounts is big enough to be worth using COPY and tracking runtime
-	 */
-
-	/* use COPY with FREEZE on v14 and later without 

Re: psql: Add role's membership options to the \du+ command

2023-07-19 Thread Pavel Luzanov

On 19.07.2023 19:47, Tom Lane wrote:

And done, with some minor editorialization.


Thanks to everyone who participated in the work.
Special thanks to David for moving forward this patch for a long time, 
and to Tom for taking commit responsibilities.


--
Pavel Luzanov
Postgres Professional: https://postgrespro.com





Re: harmonize password reuse in vacuumdb, clusterdb, and reindexdb

2023-07-19 Thread Nathan Bossart
On Wed, Jun 28, 2023 at 10:24:09PM -0700, Nathan Bossart wrote:
> On Wed, Jun 28, 2023 at 09:20:03PM -0700, Gurjeet Singh wrote:
>> The comment on top of connect_utils.c:connectDatabase() seems pertinent:
>> 
>>> (Callers should not pass
>>> * allow_password_reuse=true unless reconnecting to the same database+user
>>> * as before, else we might create password exposure hazards.)
>> 
>> The callers of {cluster|reindex}_one_database() (which in turn call
>> connectDatabase()) clearly pass different database names in successive
>> calls to these functions. So the patch seems to be in conflict with
>> the recommendation in the comment.
>> 
>> [ ... ]
> 
> The same commit that added this comment (ff402ae) also set the
> allow_password_reuse parameter to true in vacuumdb's connectDatabase()
> calls.  I found a message from the corresponding thread that provides some
> additional detail [0].  I wonder if this comment should instead recommend
> against using the allow_password_reuse flag unless reconnecting to the same
> host/port/user target.  Connecting to different databases with the same
> host/port/user information seems okay.  Maybe I am missing something... 

I added Tom here since it looks like he was the original author of this
comment.  Tom, do you have any concerns with updating the comment for
connectDatabase() in src/fe_utils/connect_utils.c like this?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Adding argument names to aggregate functions

2023-07-19 Thread Dagfinn Ilmari Mannsåker
Daniel Gustafsson  writes:

> This patch no longer applied but had a fairly trivial conflict so I've 
> attached
> a rebased v3 addressing the conflict in the hopes of getting this further.

Thanks for the heads-up!  Turns out the conflict was due to the new
json(b)_object_agg(_unique)(_strict) functions, which should also have
proargnames added.  Here's an updated patch that does that.
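
For what it's worth, a quick way to eyeball the result once the patch
is applied (a plain catalog query, not part of the patch):

```sql
SELECT proname, proargnames
FROM pg_proc
WHERE proname IN ('string_agg', 'json_object_agg', 'jsonb_object_agg');
```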

- ilmari

From 2da3bada4f2a9425cbaa925a51f78773e4e16dfb Mon Sep 17 00:00:00 2001
From: Dagfinn Ilmari Mannsåker 
Date: Mon, 27 Feb 2023 13:06:29 +
Subject: [PATCH v4] Add argument names to multi-argument aggregates

This makes it easier to see which way around the arguments go when
using \dfa.  This is particularly relevant for string_agg(), but add
it to json(b)_object_agg() too for good measure.
---
 src/include/catalog/pg_proc.dat | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 6996073989..3e283671dc 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -4992,7 +4992,7 @@
 { oid => '3538', descr => 'concatenate aggregate input into a string',
   proname => 'string_agg', prokind => 'a', proisstrict => 'f',
   prorettype => 'text', proargtypes => 'text text',
-  prosrc => 'aggregate_dummy' },
+  proargnames => '{value,delimiter}', prosrc => 'aggregate_dummy' },
 { oid => '3543', descr => 'aggregate transition function',
   proname => 'bytea_string_agg_transfn', proisstrict => 'f',
   prorettype => 'internal', proargtypes => 'internal bytea bytea',
@@ -5004,7 +5004,7 @@
 { oid => '3545', descr => 'concatenate aggregate input into a bytea',
   proname => 'string_agg', prokind => 'a', proisstrict => 'f',
   prorettype => 'bytea', proargtypes => 'bytea bytea',
-  prosrc => 'aggregate_dummy' },
+  proargnames => '{value,delimiter}', prosrc => 'aggregate_dummy' },
 
 # To ASCII conversion
 { oid => '1845', descr => 'encode text from DB encoding to ASCII text',
@@ -8953,21 +8953,22 @@
 { oid => '3197', descr => 'aggregate input into a json object',
   proname => 'json_object_agg', prokind => 'a', proisstrict => 'f',
   provolatile => 's', prorettype => 'json', proargtypes => 'any any',
-  prosrc => 'aggregate_dummy' },
+  proargnames => '{key,value}', prosrc => 'aggregate_dummy' },
 { oid => '6280', descr => 'aggregate non-NULL input into a json object',
   proname => 'json_object_agg_strict', prokind => 'a', proisstrict => 'f',
   provolatile => 's', prorettype => 'json', proargtypes => 'any any',
-  prosrc => 'aggregate_dummy' },
+  proargnames => '{key,value}', prosrc => 'aggregate_dummy' },
 { oid => '6281',
   descr => 'aggregate input into a json object with unique keys',
   proname => 'json_object_agg_unique', prokind => 'a', proisstrict => 'f',
   provolatile => 's', prorettype => 'json', proargtypes => 'any any',
-  prosrc => 'aggregate_dummy' },
+  proargnames => '{key,value}', prosrc => 'aggregate_dummy' },
 { oid => '6282',
   descr => 'aggregate non-NULL input into a json object with unique keys',
   proname => 'json_object_agg_unique_strict', prokind => 'a',
   proisstrict => 'f', provolatile => 's', prorettype => 'json',
-  proargtypes => 'any any', prosrc => 'aggregate_dummy' },
+  proargtypes => 'any any', proargnames => '{key,value}',
+  prosrc => 'aggregate_dummy' },
 { oid => '3198', descr => 'build a json array from any inputs',
   proname => 'json_build_array', provariadic => 'any', proisstrict => 'f',
   provolatile => 's', prorettype => 'json', proargtypes => 'any',
@@ -9881,22 +9882,22 @@
   prosrc => 'jsonb_object_agg_finalfn' },
 { oid => '3270', descr => 'aggregate inputs into jsonb object',
   proname => 'jsonb_object_agg', prokind => 'a', proisstrict => 'f',
-  prorettype => 'jsonb', proargtypes => 'any any',
+  prorettype => 'jsonb', proargtypes => 'any any', proargnames => '{key,value}',
   prosrc => 'aggregate_dummy' },
 { oid => '6288', descr => 'aggregate non-NULL inputs into jsonb object',
   proname => 'jsonb_object_agg_strict', prokind => 'a', proisstrict => 'f',
-  prorettype => 'jsonb', proargtypes => 'any any',
+  prorettype => 'jsonb', proargtypes => 'any any', proargnames => '{key,value}',
   prosrc => 'aggregate_dummy' },
 { oid => '6289',
   descr => 'aggregate inputs into jsonb object checking key uniqueness',
   proname => 'jsonb_object_agg_unique', prokind => 'a', proisstrict => 'f',
-  prorettype => 'jsonb', proargtypes => 'any any',
+  prorettype => 'jsonb', proargtypes => 'any any', proargnames => '{key,value}',
   prosrc => 'aggregate_dummy' },
 { oid => '6290',
   descr => 'aggregate non-NULL inputs into jsonb object checking key uniqueness',
   proname => 'jsonb_object_agg_unique_strict', prokind => 'a',
   proisstrict => 'f', prorettype => 'jsonb', proargtypes => 'any any',
-  prosrc => 'aggregate_dummy' },
+  proargnames => '{key,value}', prosrc => 'aggregate_dummy' },
 { oid => '3271', descr => 'build 

Re: There should be a way to use the force flag when restoring databases

2023-07-19 Thread Gurjeet Singh
On Tue, Jul 18, 2023 at 12:53 AM Joan  wrote:
>
> Since postgres 13 there's the option to do a FORCE when dropping a database 
> (so it disconnects current users) Documentation here: 
> https://www.postgresql.org/docs/current/sql-dropdatabase.html
>
> I am currently using dir format for the output
>pg_dump -d "bdname" -F d -j 4 -v -f /tmp/dir
>
> And restoring the database with
>   pg_restore -d postgres -C -c --exit-on-error -F d -j 3 -v /tmp/dir
>
> Having an option to add the FORCE option to either the generated dump by 
> pg_dump, or in the pg_restore would be very useful when restoring the 
> databases to another servers so it would avoid having to do scripting.
>
> In my specific case I am using this to refresh periodically a development 
> environment with data from production servers for a small database (~200M).

Making force-drop a part of pg_dump output may be dangerous, and would
not provide much flexibility at restore time.

Adding a force option to pg_restore feels like providing the right tradeoff.

The docs for 'pg_restore --create` say "Create the database before
restoring into it. If --clean is also specified, drop and recreate the
target database before connecting to it."

If we provided a force option, it might then additionally say: "If the
--clean and --force options are specified, the DROP DATABASE ... WITH
FORCE command will be used to drop the database."
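
For example, the restore from above could then become (the --force
flag is hypothetical, mirroring Joan's command):

```
pg_restore -d postgres -C -c --force --exit-on-error -F d -j 3 -v /tmp/dir
```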

Using WITH FORCE is not a guarantee, as the DROP DATABASE docs clarify
the conditions under which it may fail.

Best regards,
Gurjeet
http://Gurje.et




Re: Use of additional index columns in rows filtering

2023-07-19 Thread Jeff Davis
On Wed, 2023-07-19 at 11:16 +0200, Tomas Vondra wrote:
> I wonder if Andres was right (in the index prefetch thread) that
> splitting regular index scans and index-only scans may not be ideal.
> In
> a way, this patch moves those nodes closer, both in capability and
> code
> (because now both use index_getnext_tid etc.).

Yeah. I could also imagine decomposing the index scan node into more
pieces, but I don't think it would work out to be a clean data flow.
Either way, probably out of scope for this patch.

For this patch I think we should just tweak the EXPLAIN output so that
it's a little more clear what parts are index-only (at least if VM bit
is set) and what parts need to go to the heap.

Regards,
Jeff Davis





Re: Fix last unitialized memory warning

2023-07-19 Thread Tristan Partin

On Sun Jul 9, 2023 at 2:23 AM CDT, Peter Eisentraut wrote:

On 06.07.23 15:41, Tristan Partin wrote:
> On Thu Jul 6, 2023 at 3:21 AM CDT, Peter Eisentraut wrote:
>> On 05.07.23 23:06, Tristan Partin wrote:
>>> Thanks for following up. My system is Fedora 38. I can confirm this is
>>> still happening on master.
>>>
>>> $ gcc --version
>>> gcc (GCC) 13.1.1 20230614 (Red Hat 13.1.1-4)
>>> Copyright (C) 2023 Free Software Foundation, Inc.
>>> This is free software; see the source for copying conditions.  There is NO
>>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>> $ meson setup build --buildtype=release
>>
>> This buildtype turns on -O3 warnings.  We have usually opted against
>> chasing warnings in -O3 level because there are often some
>> false-positive uninitialized variable warnings with every new compiler.
>>
>> Note that we have set the default build type to debugoptimized, for that
>> reason.
> 
> Good to know, thanks.
> 
> Regarding the original patch, do you think it is good to be applied?


That patch looks reasonable.  But I can't actually reproduce the 
warning, even with gcc-13.  I do get the warning from plpgsql.  Can you 
show the warning you are seeing?


Here is the full warning that the original patch suppresses.

[1360/1876] Compiling C object src/bin/pgbench/pgbench.p/pgbench.c.o
In function ‘coerceToInt’,
   inlined from ‘evalStandardFunc’ at ../src/bin/pgbench/pgbench.c:2607:11:
../src/bin/pgbench/pgbench.c:2032:17: warning: ‘vargs[0].type’ may be used 
uninitialized [-Wmaybe-uninitialized]
2032 | if (pval->type == PGBT_INT)
 | ^~
../src/bin/pgbench/pgbench.c: In function ‘evalStandardFunc’:
../src/bin/pgbench/pgbench.c:2240:22: note: ‘vargs’ declared here
2240 | PgBenchValue vargs[MAX_FARGS];
 |  ^
In function ‘coerceToInt’,
   inlined from ‘evalStandardFunc’ at ../src/bin/pgbench/pgbench.c:2607:11:
../src/bin/pgbench/pgbench.c:2034:32: warning: ‘vargs[0].u.ival’ may be used 
uninitialized [-Wmaybe-uninitialized]
2034 | *ival = pval->u.ival;
 | ~~~^
../src/bin/pgbench/pgbench.c: In function ‘evalStandardFunc’:
../src/bin/pgbench/pgbench.c:2240:22: note: ‘vargs’ declared here
2240 | PgBenchValue vargs[MAX_FARGS];
 |  ^
In function ‘coerceToInt’,
   inlined from ‘evalStandardFunc’ at ../src/bin/pgbench/pgbench.c:2607:11:
../src/bin/pgbench/pgbench.c:2039:40: warning: ‘vargs[0].u.dval’ may be used 
uninitialized [-Wmaybe-uninitialized]
2039 | double  dval = rint(pval->u.dval);
 |^~
../src/bin/pgbench/pgbench.c: In function ‘evalStandardFunc’:
../src/bin/pgbench/pgbench.c:2240:22: note: ‘vargs’ declared here
2240 | PgBenchValue vargs[MAX_FARGS];
 |  ^

--
Tristan Partin
Neon (https://neon.tech)




Re: psql: Add role's membership options to the \du+ command

2023-07-19 Thread David G. Johnston
On Wed, Jul 19, 2023 at 9:47 AM Tom Lane  wrote:

> I wrote:
> > Alvaro Herrera  writes:
> >> +1 for backpatching to 16, given that it's a psql-only change that
> >> pertains to a backend change that was done in the 16 timeframe.
>
> > Agreed.  In the interests of moving things along, I'll take point
> > on getting this committed.
>
> And done, with some minor editorialization.  I'll go mark the
> open item as closed.
>
>
Thank You!

David J.


Re: psql: Add role's membership options to the \du+ command

2023-07-19 Thread Tom Lane
I wrote:
> Alvaro Herrera  writes:
>> +1 for backpatching to 16, given that it's a psql-only change that
>> pertains to a backend change that was done in the 16 timeframe.

> Agreed.  In the interests of moving things along, I'll take point
> on getting this committed.

And done, with some minor editorialization.  I'll go mark the
open item as closed.

regards, tom lane




Re: Row pattern recognition

2023-07-19 Thread Jacob Champion
Hello,

Thanks for working on this! We're interested in RPR as well, and I've
been trying to get up to speed with the specs, to maybe make myself
useful.

On 6/27/23 17:58, Tatsuo Ishii wrote:
> Yes. (I think the standard calls the window frame the "full window
> frame" in the context of RPR, to contrast it with the subset of the
> frame rows restricted by RPR. The paper I referred to as [2] claims
> that the latter window frame is called "reduced window frame" in the
> standard, but I wasn't able to find the term in the standard.)

19075-5 discusses that, at least; not sure about other parts of the spec.

> Maybe an insane idea but what about rewriting MATCH_RECOGNIZE clause
> into Window clause with RPR?

Are we guaranteed to always have an equivalent window clause? There seem
to be many differences between the two, especially when it comes to ONE
ROW/ALL ROWS PER MATCH.

--

To add onto what Vik said above:

>> It seems RPR in the standard is quite complex. I think we can start
>> with a small subset of RPR then we could gradually enhance the
>> implementation.
> 
> I have no problem with that as long as we don't paint ourselves into a 
> corner.

To me, PATTERN looks like an area where we may want to support a broader
set of operations in the first version. The spec has a bunch of
discussion around cases like empty matches, match order of alternation
and permutation, etc., which are not possible to express or test with
only the + quantifier. Those might be harder to get right in a v2, if we
don't at least keep them in mind for v1?

> +static List *
> +transformPatternClause(ParseState *pstate, WindowClause *wc, WindowDef *windef)
> +{
> +   List*patterns;

My compiler complains about the `patterns` variable here, which is
returned without ever being initialized. (The caller doesn't seem to use
it.)

> +-- basic test using PREV
> +SELECT company, tdate, price, rpr(price) OVER w FROM stock
> + WINDOW w AS (
> + PARTITION BY company
> + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> + INITIAL
> + PATTERN (START UP+ DOWN+)
> + DEFINE
> +  START AS TRUE,
> +  UP AS price > PREV(price),
> +  DOWN AS price < PREV(price)
> +);

nitpick: IMO the tests should be making use of ORDER BY in the window
clauses.
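
For instance, the quoted test could order each partition explicitly
(sketch only; column names taken from the test above):

```sql
WINDOW w AS (
  PARTITION BY company
  ORDER BY tdate
  ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  INITIAL
  PATTERN (START UP+ DOWN+)
  DEFINE
    START AS TRUE,
    UP AS price > PREV(price),
    DOWN AS price < PREV(price)
)
```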

This is a very big feature. I agree with you that MEASURES seems like a
very important "next piece" to add. Are there other areas where you
would like reviewers to focus on right now (or avoid)?

Thanks!
--Jacob




Re: Support to define custom wait events for extensions

2023-07-19 Thread Tristan Partin

On Tue Jul 11, 2023 at 6:29 PM CDT, Michael Paquier wrote:

On Tue, Jul 11, 2023 at 12:39:52PM -0500, Tristan Partin wrote:
> Given the context of our last conversation, I assume this code was
> copied from somewhere else. Since this is new code, I think it would
> make more sense if newalloc was a uint16 or size_t.

This style comes from LWLockRegisterTranche() in lwlock.c.  Do you
think that it would be more adapted to change that to
pg_nextpower2_size_t() with a Size?  We could do that for the existing
code on HEAD as an improvement.


Yes, I think that would be the most correct code. At the very least,
using a uint32 instead of an int would be preferable.


> From what I understand, Neon differs from upstream in some way related
> to this patch. I am trying to ascertain how that is. I hope to provide
> more feedback when I learn more about it.

Hmm, okay, that would nice to hear about to make sure that the
approach taken on this thread is able to cover what you are looking
for.  So you mean that Neon has been using something similar to
register wait events in the backend?  Last time I looked at the Neon
repo, I did not get the impression that there was a custom patch for
Postgres in this area.  All the in-core code paths using
WAIT_EVENT_EXTENSION would gain from the APIs added here, FWIW.


I did some investigation into the Neon fork[0], and couldn't find any
commits that seemed related. Perhaps this is on our wishlist instead of
something we already have implemented. I have CCed Heikki to talk some
more about how this would fit in at Neon.

[0]: https://github.com/neondatabase/postgres

--
Tristan Partin
Neon (https://neon.tech)




Re: Extracting cross-version-upgrade knowledge from buildfarm client

2023-07-19 Thread Alvaro Herrera
On 2023-Jul-19, Andrew Dunstan wrote:

> 
> On 2023-07-19 We 07:05, Alvaro Herrera wrote:
> > I just hit a snag testing this.  It turns out that the
> > PostgreSQL::Version comparison stuff believes that 16beta2 < 16, which
> > sounds reasonable.  However, because of that, the AdjustUpgrade.pm
> > stanza that tries to drop tables public.gtest_normal_child{2} in
> > versions earlier than 16 fails, because by 16 these tables are dropped
> > in the test itself rather than left to linger, as was the case in
> > versions 15 and earlier.

> The buildfarm module assumes that no adjustments are necessary if the old
> and new versions are the same (e.g. HEAD to HEAD). And it never passes in a
> version like '16beta2'. It extracts the version number from the branch name,
> e.g. REL_16_STABLE => 16.

Hmm, OK, but I'm not testing the same versions -- I'm testing 16beta2 to
17devel.

> > I can fix this either by using DROP IF EXISTS in that stanza, or by
> > making AdjustUpgrade use 'version <= 15'.  Any opinions on which to
> > prefer?
> 
> The trouble is this could well break the next time someone puts in a test
> like this.

Hmm, I don't understand what you mean.

> Maybe we need to make AdjustUpgrade just look at the major version,
> something like:
> 
>    $old_version = PostgreSQL::Version->new($old_version->major);

It seems like that does work, but if we do that, then we also need to
change this line:

if ($old_version lt '9.5')
to
if ($old_version < '9.5')

otherwise you get some really mysterious failures about trying to drop
public.=>, which is in fact no longer accepted syntax since 9.5; and the
stringwise comparison returns the wrong value here.

TBH I'm getting a sense of discomfort with the idea of having developed
a Postgres-version-number Perl module, and in the only place where we
can use it, have to settle for numeric comparison instead.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
¡Ay, ay, ay!  Con lo mucho que yo lo quería (bis)
se fue de mi vera ... se fue para siempre, pa toíta ... pa toíta la vida
¡Ay Camarón! ¡Ay Camarón!(Paco de Lucía)
>From 6922ab1621f4429b059e82845ea9d6143f3eb9cd Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 19 Jul 2023 17:54:09 +0200
Subject: [PATCH] Use numeric comparison instead of PostgreSQL::Version

This seems like the wrong approach to deal with the problem.

Discussion: https://postgr.es/m/20230719110504.zbu74o54bqqlsufb@alvherre.pgsql
---
 src/bin/pg_upgrade/t/002_pg_upgrade.pl | 2 +-
 src/test/perl/PostgreSQL/Test/AdjustUpgrade.pm | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index a5688a1cf2..75c9ee9b73 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -242,7 +242,7 @@ if (defined($ENV{oldinstall}))
 	do { $dbnames{$_} = 1; }
 	  foreach split /\s+/s, $dbnames;
 	my $adjust_cmds =
-	  adjust_database_contents($oldnode->pg_version, %dbnames);
+	  adjust_database_contents($oldnode->pg_version->major, %dbnames);
 
 	foreach my $updb (keys %$adjust_cmds)
 	{
diff --git a/src/test/perl/PostgreSQL/Test/AdjustUpgrade.pm b/src/test/perl/PostgreSQL/Test/AdjustUpgrade.pm
index a241d2ceff..301acbd01c 100644
--- a/src/test/perl/PostgreSQL/Test/AdjustUpgrade.pm
+++ b/src/test/perl/PostgreSQL/Test/AdjustUpgrade.pm
@@ -200,7 +200,7 @@ sub adjust_database_contents
 			'drop function oldstyle_length(integer, text)');
 	}
 
-	if ($old_version lt '9.5')
+	if ($old_version < '9.5')
 	{
 		# cope with changes of underlying functions
 		_add_st(
-- 
2.39.2



Re: Support to define custom wait events for extensions

2023-07-19 Thread Tristan Partin

Thanks for continuing to work on this patchset. I only have
prose-related comments.


To support custom wait events, it add 2 APIs to define new wait events
for extensions dynamically.


Remove the "it" here.


The APIs are
* WaitEventExtensionNew()
* WaitEventExtensionRegisterName()



These are similar to the existing LWLockNewTrancheId() and
LWLockRegisterTranche().


This sentence seems like it could be removed given the API names have
changed during the development of this patch.


First, extensions should call WaitEventExtensionNew() to get one
or more new wait event, which IDs are allocated from a shared
counter.  Next, each individual process can use the wait event with
WaitEventExtensionRegisterName() to associate that a wait event
string to the associated name.


This portion of the commit message is a copy-paste of the function
comment. Whatever you do in the function comment (which I commented on
below), just do here as well.


+ so an wait event might be reported as just 
extension
+ rather than the extension-assigned name.


s/an/a


+   
+Custom Wait Events for Add-ins


This would be the second use of "Add-ins" ever, according to my search.
Should this be "Extensions" instead?


+
+Add-ins can define custom wait events that the wait event type is


s/that/where


+Extension.
+
+
+First, add-ins should get new one or more wait events by calling:


"one or more" doesn't seem to make sense grammatically here.


+
+uint32 WaitEventExtensionNew(void)
+
+Next, each individual process can use them to associate that


Remove "that".


+a wait event string to the associated name by calling:
+
+void WaitEventExtensionRegisterName(uint32 wait_event_info, const char 
*wait_event_name);
+
+An example can be found in
+
src/test/modules/test_custom_wait_events/test_custom_wait_events.c
+in the PostgreSQL source tree.
+
+   



+ * Register a dynamic wait event name for extension in the lookup table
+ * of the current process.


Inserting an "an" before "extension" would make this read better.


+/*
+ * Return the name of an wait event ID for extension.
+ */


s/an/a


+   /*
+* It's an user-defined wait event, so look in 
WaitEventExtensionNames[].
+* However, it's possible that the name has never been registered by
+* calling WaitEventExtensionRegisterName() in the current process, in
+* which case give up and return "extension".
+*/


s/an/a

"extension" seems very similar to "Extension". Instead of returning a
string here, could we just error? This seems like a programmer error on
the part of the extension author.


+ * Extensions can define their wait events. First, extensions should call
+ * WaitEventExtensionNew() to get one or more wait events, which IDs are
+ * allocated from a shared counter.  Next, each individual process can use
+ * them with WaitEventExtensionRegisterName() to associate that a wait
+ * event string to the associated name.


An "own" before "wait events" in the first sentence would increase
clarity. "where" instead of "which" in the next sentence. Remove "that"
after "associate" in the third sentence.

--
Tristan Partin
Neon (https://neon.tech)




Re: psql: Add role's membership options to the \du+ command

2023-07-19 Thread Tom Lane
Alvaro Herrera  writes:
> I tried this out.  It looks good to me, and I like it.  Not translating
> the labels seems correct to me.
> +1 for backpatching to 16, given that it's a psql-only change that
> pertains to a backend change that was done in the 16 timeframe.

Agreed.  In the interests of moving things along, I'll take point
on getting this committed.

> Regarding the controversy of showing SET for previous versions, I think
> it's clearer if it's shown, because ultimately what the user really
> wants to know is if the role can be SET to; they don't want to have to
> learn from memory in which version they can SET because the column is
> empty and in which version they have to look for the label.

Seems reasonable.  I'll go with that interpretation unless there's
pretty quick pushback.

regards, tom lane




Re: [RFC] Add jit deform_counter

2023-07-19 Thread Dmitry Dolgov
> On Tue, Jul 18, 2023, 3:32 PM Daniel Gustafsson  wrote
>> Here is the patch with the proposed variation.
>
> This version still leaves non-text EXPLAIN formats with timing which doesn't
> add up.  Below are JSON and XML examples:

Good point. For the structured formats it should be represented via a nested
level. I'll try to do this and the other proposed changes as soon as I get
back.


Re: Let's make PostgreSQL multi-threaded

2023-07-19 Thread Ashutosh Bapat
I think the planner would also benefit from threads. There are many tasks
in the planner that are independent and can be scheduled using a dependency
graph. They are too small to be parallelized through separate backends
but large enough to be performed by threads. Planning queries involving
partitioned tables takes a long time (seconds), especially when there are
thousands of partitions. That kind of planning would benefit immensely
from threading. Of course, we could use backends that pull tasks from a
queue, but sharing the PlannerInfo and its substructure is easier through
the same address space than through shared memory.

On Sat, Jun 10, 2023 at 5:25 AM Bruce Momjian  wrote:
>
> On Wed, Jun  7, 2023 at 06:38:38PM +0530, Ashutosh Bapat wrote:
> > With multiple processes, we can use all the available cores (at least
> > theoretically if all those processes are independent). But is that
> > guaranteed with single process multi-thread model? Google didn't throw
> > any definitive answer to that. Usually it depends upon the OS and
> > architecture.
> >
> > Maybe a good start is to start using threads instead of parallel
> > workers e.g. for parallel vacuum, parallel query and so on while
> > leaving the processes for connections and leaders. that itself might
> > take significant time. Based on that experience move to a completely
> > threaded model. Based on my experience with other similar products, I
> > think we will settle on a multi-process multi-thread model.
>
> I think we have a few known problems that we might be able to solve
> without threads, but can help us eventually move to threads if we find
> it useful:
>
> 1)  Use threads for background workers rather than processes
> 2)  Allow sessions to be stopped and started by saving their state
>
> Ideally we would solve the problem of making shared structures
> resizable, but I am not sure how that can be easily done without
> threads.
>
> --
>   Bruce Momjian  https://momjian.us
>   EDB  https://enterprisedb.com
>
>   Only you can decide what is important to you.



-- 
Best Wishes,
Ashutosh Bapat




Re: Remove backend warnings from SSL tests

2023-07-19 Thread Tom Lane
I wrote:
> Aleksander Alekseev  writes:
>> Alternatively we could get rid of
>> ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS entirely since its practical
>> value seems to be debatable.

> Strong -1 on that, for the reason given above.

Perhaps an alternative could be to expend some more sweat on the
mechanism, so that it's actually enforced (with ERROR) during
"make installcheck", but not in "make check" or TAP tests?
I'm not sure how to make that work exactly.

regards, tom lane




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-07-19 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

Thanks for reviewing! The patch is available at [1].

> Few comments/questions
> 
> 1.
> +check_for_parameter_settings(ClusterInfo *new_cluster)
> {
> ...
> +
> + res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
> + max_replication_slots = atoi(PQgetvalue(res, 0, 0));
> +
> + if (max_replication_slots == 0)
> + pg_fatal("max_replication_slots must be greater than 0");
> ...
> }
> 
> Won't it be better to verify that the value of "max_replication_slots"
> is greater than the number of logical slots we are planning to copy
> from old on the new cluster? Similar to this, I thought whether we
> need to check the value of max_wal_senders? But, I guess one can
> simply decode from slots by using APIs, so not sure about that. What
> do you think?

Agreed about verifying max_replication_slots. There are several ways to add
the check, so I chose the simplest one - store the number of slots in a
global variable and compare it with max_replication_slots.
As for max_wal_senders, I don't think it should be checked. As you said, there
is a possibility that a user-defined background worker uses the slot and
consumes WAL.
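
For illustration, the comparison being discussed amounts to checking numbers
like these (a SQL sketch; the patch does the equivalent in C):

```
-- Old cluster: logical slots that would be migrated.
SELECT count(*) FROM pg_replication_slots WHERE slot_type = 'logical';

-- New cluster: this setting must be at least the count above.
SHOW max_replication_slots;
```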

> 2.
> + /*
> + * Dump logical replication slots if needed.
> + *
> + * XXX We cannot dump replication slots at the same time as the schema
> + * dump because we need to separate the timing of restoring
> + * replication slots and other objects. Replication slots, in
> + * particular, should not be restored before executing the pg_resetwal
> + * command because it will remove WALs that are required by the slots.
> + */
> + if (user_opts.include_logical_slots)
> 
> Can you explain this point a bit more with some example scenarios?
> Basically, if we had sent all the WAL before the upgrade then why do
> we need to worry about the timing of pg_resetwal?

OK, I can give an example here. Should it be described in the source?

Assuming that there is a valid logical replication slot as follows:

```
postgres=# select slot_name, plugin, restart_lsn, wal_status, two_phase from 
pg_replication_slots;
 slot_name |plugin | restart_lsn | wal_status | two_phase 
---+---+-++---
 test  | test_decoding | 0/15665A8   | reserved   | f
(1 row)

postgres=# select * from pg_current_wal_lsn();
 pg_current_wal_lsn 

 0/15665E0
(1 row)
```

And here let's execute the pg_resetwal to the pg server.
The existing wal segment file is purged and moved to next seg.

```
$ pg_ctl stop -D data_N1/
waiting for server to shut down done
server stopped
$ pg_resetwal -l 00010002 data_N1/
Write-ahead log reset
$ pg_ctl start -D data_N1/ -l N1.log 
waiting for server to start done
server started
```

After that the logical slot cannot move forward anymore because the required WALs
are removed, whereas the wal_status is still "reserved".

```
postgres=# select slot_name, plugin, restart_lsn, wal_status, two_phase from 
pg_replication_slots;
 slot_name |plugin | restart_lsn | wal_status | two_phase 
---+---+-++---
 test  | test_decoding | 0/15665A8   | reserved   | f
(1 row)

postgres=# select * from pg_current_wal_lsn();
 pg_current_wal_lsn 

 0/2028328
(1 row)

postgres=# select * from pg_logical_slot_get_changes('test', NULL, NULL);
ERROR:  requested WAL segment pg_wal/00010001 has already been 
removed
```

pg_upgrade runs pg_dump and then pg_resetwal, so dumping slots must be done
separately to avoid the above error.

> 3. I see that you are trying to ensure that all the WAL has been
> consumed for a slot except for shutdown_checkpoint in patch 0003 but
> do we need to think of any interaction with restart_lsn
> (MyReplicationSlot->data.restart_lsn) which is the start point to read
> WAL for decoding by walsender?

Currently I'm not sure it should be considered. Do you have something in mind?

candidate_restart_lsn for the slot is set only when XLOG_RUNNING_XACTS is decoded
(LogicalIncreaseRestartDecodingForSlot()), and is set as restart_lsn later. So
there are few opportunities to update the value, and we cannot determine an
acceptable boundary.

Furthermore, I think the restart point does not affect the result of replicating
changes on the subscriber because it is always behind confirmed_flush.

[1]: 
https://www.postgresql.org/message-id/TYAPR01MB5866E9ED5B8C5AD7F7AC062FF539A%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-07-19 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thank you for reviewing! PSA new version patchset.

> ==
> Commit message
> 
> 1.
> For pg_dump this commit includes a new option called
> "--logical-replication-slots-only".
> This option can be used to dump logical replication slots. When this option is
> specified, the slot_name, plugin, and two_phase parameters are extracted from
> pg_replication_slots. An SQL file is then generated which executes
> pg_create_logical_replication_slot() with the extracted parameters.
> 
> ~
> 
> This part doesn't do the actual execution, so maybe slightly reword this.
> 
> BEFORE
> An SQL file is then generated which executes
> pg_create_logical_replication_slot() with the extracted parameters.
> 
> SUGGESTION
> An SQL file that executes pg_create_logical_replication_slot() with
> the extracted parameters is generated.

Changed.

> 2.
> For pg_upgrade, when '--include-logical-replication-slots' is
> specified, it executes
> pg_dump with the new "--logical-replication-slots-only" option and
> restores from the
> dump. Note that we cannot dump replication slots at the same time as the schema
> dump because we need to separate the timing of restoring replication slots and
> other objects. Replication slots, in particular, should not be restored before
> executing the pg_resetwal command because it will remove WALs that are required
> by the slots.
> 
> ~~~
> 
> Maybe "restores from the dump" can be described more?
> 
> BEFORE
> ...and restores from the dump.
> 
> SUGGESTION
> ...and restores the slots using the
> pg_create_logical_replication_slots() statements that the dump
> generated (see above).

Fixed.

> src/bin/pg_dump/pg_dump.c
> 
> 3. help
> 
> +
> + /*
> + * The option --logical-replication-slots-only is used only by pg_upgrade
> + * and should not be called by users, which is why it is not listed.
> + */
>   printf(_("  --no-commentsdo not dump comments\n"));
> ~
> 
> /not listed./not exposed by the help./

Fixed.

> 4. getLogicalReplicationSlots
> 
> + /* Check whether we should dump or not */
> + if (fout->remoteVersion < 16)
> + return;
> 
> PG16 is already in beta. I think this should now be changed to 17, right?

That's right, fixed.

> src/bin/pg_upgrade/check.c
> 
> 5. check_new_cluster
> 
> + /*
> + * Do additional works if --include-logical-replication-slots is required.
> + * These must be done before check_new_cluster_is_empty() because the
> + * slot_arr attribute of the new_cluster will be checked in the function.
> + */
> 
> SUGGESTION (minor rewording/grammar)
> Do additional work if --include-logical-replication-slots was
> specified. This must be done before check_new_cluster_is_empty()
> because the slot_arr attribute of the new_cluster will be checked in
> that function.

Fixed.

> 6. check_new_cluster_is_empty
> 
> + /*
> + * If --include-logical-replication-slots is required, check the
> + * existence of slots.
> + */
> + if (user_opts.include_logical_slots)
> + {
> + LogicalSlotInfoArr *slot_arr = &new_cluster.dbarr.dbs[dbnum].slot_arr;
> +
> + /* if nslots > 0, report just first entry and exit */
> + if (slot_arr->nslots)
> + pg_fatal("New cluster database \"%s\" is not empty: found logical
> replication slot \"%s\"",
> + new_cluster.dbarr.dbs[dbnum].db_name,
> + slot_arr->slots[0].slotname);
> + }
> +
> 
> 6a.
> There are a number of places in this function using
> "new_cluster.dbarr.dbs[dbnum].XXX"
> 
> It is OK but maybe it would be tidier to up-front assign a local
> variable for this?
> 
> DbInfo *pDbInfo = &new_cluster.dbarr.dbs[dbnum];

Seems better, fixed.

> 6b.
> The above code adds an unnecessary blank line in the loop that was not
> there previously.

Removed.

> 7. check_for_parameter_settings
> 
> +/*
> + * Verify parameter settings for creating logical replication slots
> + */
> +static void
> +check_for_parameter_settings(ClusterInfo *new_cluster)
> 
> 7a.
> I felt this might have some missing words so it was meant to say:
> 
> SUGGESTION
> Verify the parameter settings necessary for creating logical replication 
> slots.

Changed.

> 7b.
> Maybe you can give this function a better name because there is no
> hint in this generic name that it has anything to do with replication
> slots.

Renamed to check_for_logical_replication_slots(). What do you think?

> 8.
> + /* --include-logical-replication-slots can be used since PG16. */
> + if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1500)
> + return;
> 
> PG16 is already in beta, so the version number (1500) and the comment
> mentioning PG16 are outdated aren't they?

Right, fixed.

> src/bin/pg_upgrade/info.c
> 
> 9.
>  static void print_rel_infos(RelInfoArr *rel_arr);
> -
> +static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
> 
> The removal of the existing blank line seems not a necessary part of this 
> patch.

Added the blank line back.

> 10. get_logical_slot_infos_per_db
> 
> + char query[QUERY_ALLOC];
> +
> + query[0] = '\0'; /* initialize query string to empty */
> +
> + 

Re: Remove backend warnings from SSL tests

2023-07-19 Thread Tom Lane
Aleksander Alekseev  writes:
>> When looking at a patch in the CFBot I realized that the SSL tests generate
>> backend warnings under ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS

> Good catch. I can confirm that the patch corrects the named WARNINGs
> appearing with:
> CPPFLAGS="-DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS"
> There are plenty of similar warnings left however.

Yeah.  We have not worried about making TAP tests clean under this
restriction, because the point of it is to limit the hazards of
running "make installcheck" against a cluster containing useful data.
TAP tests always use one-off test clusters, so there is no hazard
to guard against.

If we wanted to extend the rule to TAP tests as well, I think we'd
have to upgrade the WARNING to an ERROR, because otherwise we'll
never find all the violations.  Not clear to me that it's worth
the trouble, though.  And it's definitely not worth the trouble to
fix only one TAP suite.

> Alternatively we could get rid of
> ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS entirely since its practical
> value seems to be debatable.

Strong -1 on that, for the reason given above.

regards, tom lane




pg_rewind fails with in-place tablespace

2023-07-19 Thread 赵锐(惜元)
Hello postgres hackers,
Recently I encountered an issue: pg_rewind fails when dealing with an in-place
tablespace. The problem seems to be that pg_rewind treats the in-place
tablespace as a symbolic link, while in fact it should be treated as a
directory.

Here is the output of pg_rewind:

pg_rewind: error: file "pg_tblspc/16385" is of different type in source and target

To help reproduce the failure, I have attached a TAP test. I am also pleased
to say that I have identified a solution for this problem, which I have
included in the patch.
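
For anyone unfamiliar with them, an in-place tablespace can be created for
reproduction using the developer GUC allow_in_place_tablespaces (a sketch; the
attached TAP test is the authoritative reproducer):

```
SET allow_in_place_tablespaces = true;
-- With an empty LOCATION, the tablespace is created as a plain directory
-- under pg_tblspc/<oid> instead of a symbolic link to an external path.
CREATE TABLESPACE regress_inplace LOCATION '';
```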
 Thank you for your attention to this matter.
Best regards,
Rui Zhao


0001-pg_rewind-fails-with-in-place-tablespace.patch
Description: Binary data


Re: Report distinct wait events when waiting for WAL "operation"

2023-07-19 Thread Amit Kapila
On Mon, Jul 17, 2023 at 10:26 PM Andres Freund  wrote:
>
> Previously it was e.g. not really possible to distinguish that something like
> this:
>
> ┌┬─┬┬───┐
> │  backend_type  │ wait_event_type │ wait_event │ count │
> ├┼─┼┼───┤
> │ client backend │ LWLock  │ WALInsert  │32 │
> │ client backend │ (null)  │ (null) │ 9 │
> │ walwriter  │ IO  │ WALWrite   │ 1 │
> │ client backend │ Client  │ ClientRead │ 1 │
> │ client backend │ LWLock  │ WALWrite   │ 1 │
> └┴─┴┴───┘
>
> is a workload with a very different bottleneck than this:
>
> ┌┬─┬───┬───┐
> │  backend_type  │ wait_event_type │  wait_event   │ count │
> ├┼─┼───┼───┤
> │ client backend │ IPC │ WALWaitInsert │22 │
> │ client backend │ LWLock  │ WALInsert │13 │
> │ client backend │ LWLock  │ WALBufMapping │ 5 │
> │ walwriter  │ (null)  │ (null)│ 1 │
> │ client backend │ Client  │ ClientRead│ 1 │
> │ client backend │ (null)  │ (null)│ 1 │
> └┴─┴───┴───┘
>
> even though they are very different
>
> FWIW, the former is bottlenecked by the number of WAL insertion locks, the
> second is bottlenecked by copying WAL into buffers due to needing to flush
> them.
>

This gives a better idea of what's going on. +1 for separating these waits.
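
Profiles like the ones quoted above can be collected by repeatedly sampling
pg_stat_activity with something along these lines (an illustration; the exact
query used in the thread is not shown):

```
SELECT backend_type, wait_event_type, wait_event, count(*)
FROM pg_stat_activity
GROUP BY 1, 2, 3
ORDER BY count(*) DESC;
```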

-- 
With Regards,
Amit Kapila.


Re: Disabling Heap-Only Tuples

2023-07-19 Thread Thom Brown
On Wed, 19 Jul 2023, 13:58 Laurenz Albe,  wrote:

> On Thu, 2023-07-06 at 22:18 +0200, Matthias van de Meent wrote:
> > On Wed, 5 Jul 2023 at 19:55, Thom Brown  wrote:
> > >
> > > On Wed, 5 Jul 2023 at 18:05, Matthias van de Meent
> > >  wrote:
> > > > So what were you thinking of? A session GUC? A table option?
> > >
> > > Both.
> >
> > Here's a small patch implementing a new table option max_local_update
> > (name very much bikesheddable). Value is -1 (default, disabled) or the
> > size of the table in MiB that you still want to allow to update on the
> > same page. I didn't yet go for a GUC as I think that has too little
> > control on the impact on the system.
> >
> > I decided that max_local_update would be in MB because there is no
> > reloption value that can contain MaxBlockNumber and -1/disabled; and 1
> > MiB seems like enough granularity for essentially all use cases.
> >
> > The added regression tests show how this feature works, that the new
> > feature works, and validate that lock levels are acceptable
> > (ShareUpdateExclusiveLock, same as for updating fillfactor).
>
> I have looked at your patch, and I must say that I like it.  Having
> a size limit is better than my original idea of just "on" or "off".
> Essentially, it is "try to shrink the table if it grows above a limit".
>
> The patch builds fine and passes all regression tests.
>
> Documentation is missing.
>
> I agree that the name "max_local_update" could be improved.
> Perhaps "avoid_hot_above_size_mb".
>

Or "hot_table_size_threshold" or "hot_update_limit"?

Thom


Re: Extracting cross-version-upgrade knowledge from buildfarm client

2023-07-19 Thread Andrew Dunstan


On 2023-07-19 We 07:05, Alvaro Herrera wrote:

> I just hit a snag testing this.  It turns out that the
> PostgreSQL::Version comparison stuff believes that 16beta2 < 16, which
> sounds reasonable.  However, because of that, the AdjustUpgrade.pm
> stanza that tries to drop tables public.gtest_normal_child{2} in
> versions earlier than 16 fails, because by 16 these tables are dropped
> in the test itself rather than left to linger, as was the case in
> versions 15 and earlier.
>
> So, if you try to run the pg_upgrade test with a dump created by
> 16beta2, it will fail to drop these tables (because they don't exist)
> and the whole test fails.  Why hasn't the buildfarm detected this
> problem?  I see that Drongo is happy, but I don't understand why.
> Apparently, the AdjustUpgrade.pm stuff leaves no trace.

The buildfarm module assumes that no adjustments are necessary if the
old and new versions are the same (e.g. HEAD to HEAD). And it never
passes in a version like '16beta2'. It extracts the version number from
the branch name, e.g. REL_16_STABLE => 16.

> I can fix this either by using DROP IF EXISTS in that stanza, or by
> making AdjustUpgrade use 'version <= 15'.  Any opinions on which to
> prefer?

The trouble is this could well break the next time someone puts in a
test like this.

Maybe we need to make AdjustUpgrade just look at the major version,
something like:

   $old_version = PostgreSQL::Version->new($old_version->major);

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: Disabling Heap-Only Tuples

2023-07-19 Thread Laurenz Albe
On Thu, 2023-07-06 at 22:18 +0200, Matthias van de Meent wrote:
> On Wed, 5 Jul 2023 at 19:55, Thom Brown  wrote:
> > 
> > On Wed, 5 Jul 2023 at 18:05, Matthias van de Meent
> >  wrote:
> > > So what were you thinking of? A session GUC? A table option?
> > 
> > Both.
> 
> Here's a small patch implementing a new table option max_local_update
> (name very much bikesheddable). Value is -1 (default, disabled) or the
> size of the table in MiB that you still want to allow to update on the
> same page. I didn't yet go for a GUC as I think that has too little
> control on the impact on the system.
> 
> I decided that max_local_update would be in MB because there is no
> reloption value that can contain MaxBlockNumber and -1/disabled; and 1
> MiB seems like enough granularity for essentially all use cases.
> 
> The added regression tests show how this feature works, that the new
> feature works, and validate that lock levels are acceptable
> (ShareUpdateExclusiveLock, same as for updating fillfactor).

I have looked at your patch, and I must say that I like it.  Having
a size limit is better than my original idea of just "on" or "off".
Essentially, it is "try to shrink the table if it grows above a limit".

The patch builds fine and passes all regression tests.

Documentation is missing.

I agree that the name "max_local_update" could be improved.
Perhaps "avoid_hot_above_size_mb".

--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -342,6 +342,7 @@ typedef struct StdRdOptions
int parallel_workers;   /* max number of parallel workers */
StdRdOptIndexCleanup vacuum_index_cleanup;  /* controls index vacuuming */
boolvacuum_truncate;/* enables vacuum to truncate a relation */
+   int max_local_update;   /* Updates to pages after this block must 
go through the VM */
 } StdRdOptions;
 
 #define HEAP_MIN_FILLFACTOR10

In the comment, it should be FSM, not VM, right?

Other than that, I see nothing wrong.

Yours,
Laurenz Albe




Re: Remove backend warnings from SSL tests

2023-07-19 Thread Aleksander Alekseev
Hi,

> When looking at a patch in the CFBot I realized that the SSL tests generate
> backend warnings under ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS

Good catch. I can confirm that the patch corrects the named WARNINGs
appearing with:

CPPFLAGS="-DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS"

There are plenty of similar warnings left however.

Before:

```
$ grep -r WARNING ./build/ 2>/dev/null | grep 'regression test cases
should have names' | wc -l
463
```

After:

```
$ grep -r WARNING ./build/ 2>/dev/null | grep 'regression test cases
should have names' | wc -l
403
```

Maybe we should address them too. In order to prevent this from
happening in the future, perhaps we should start throwing ERRORs
instead of WARNINGs and make sure this is tested by cfbot.

Alternatively we could get rid of
ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS entirely since its practical
value seems to be debatable.

The patch was added to the nearest commitfest [1].

Thoughts?

[1]: https://commitfest.postgresql.org/44/4451/

-- 
Best regards,
Aleksander Alekseev




Re: remaining sql/json patches

2023-07-19 Thread Amit Langote
On Wed, Jul 19, 2023 at 5:17 PM Amit Langote  wrote:
> On Wed, Jul 19, 2023 at 12:53 AM Alvaro Herrera  
> wrote:
> > On 2023-Jul-18, Amit Langote wrote:
> > > b6e1157e7d Don't include CaseTestExpr in JsonValueExpr.formatted_expr
> >
> > I feel a bit uneasy about this one.  It seems to assume that
> > formatted_expr is always set, but at the same time it's not obvious that
> > it is.  (Maybe this aspect just needs some more commentary).
>
> Hmm, I agree that the comments about formatted_expr could be improved
> further, for which I propose the attached.  Actually, staring some
> more at this, I'm inclined to change makeJsonValueExpr() to allow
> callers to pass it the finished 'formatted_expr' rather than set it by
> themselves.

Hmm, after looking some more, it may not be entirely right that
formatted_expr is always set in the code paths that call
transformJsonValueExpr().  Will look at this some more tomorrow.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: Allow pg_archivecleanup to remove backup history files

2023-07-19 Thread torikoshia

On 2023-07-19 13:58, Michael Paquier wrote:

> On Fri, Jun 30, 2023 at 03:48:43PM +0900, Michael Paquier wrote:
> > I have begun cleaning up my board, and applied 0001 for the moment.
>
> And a few weeks later..  I have come around this thread and applied
> 0002 and 0003.
>
> The flow of 0002 was straight-forward.  My main issue was in 0003,
> actually, where the TAP tests were kind of confusing as written:
> - There was no cleanup of the files still present after a single
>   command check, which could easily mess up the tests.
> - The --dry-run case was using the list of WAL files for the extension
>   pattern checks, hardcoding names based on the position of its array.
>   I have switched that to use a third list of files, instead.
>
> The result looked OK and that can be extended easily for more
> patterns or more commands, so applied 0003 after doing these
> adjustments, coupled with a pgperltidy run, a pgperlcritic check and
> an indentation.
> --
> Michael

Thanks for reviewing and applying the patches!

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION




Re: RFC: logical publication via inheritance root?

2023-07-19 Thread Aleksander Alekseev
Hi,

> v3 also fixes a nasty uninitialized stack variable, along with a bad
> collation assumption I made.

I decided to take a closer look at 0001.

Since pg_get_relation_publishing_info() is exposed to the users I
think it should be described in a bit more detail than:

```
+  descr => 'get information on how a relation will be published via a list of publications',
```

This description in \df+ output doesn't seem to be particularly
useful. Also the function should be documented. In order to accomplish
all this it could make sense to reconsider the signature of the
function and/or split it into several separate functions.

The volatility is declared as STABLE. This is probably correct. At
least at first glance I don't see any calls of VOLATILE functions and
off the top of my head can't give an example when it will not behave
as STABLE. This being said, a second opinion would be appreciated.

process_relation_publications() misses a brief comment before the
declaration. What are the arguments, what is the return value, are
there any pre/postconditions (locks, memory), etc.

Otherwise 0001 is in a decent shape, it passes make
installcheck-world, etc. I would suggest focusing on delivering this
part, assuming there will be no push-back to the refactorings and
slight test improvements. If 0002 could be further decomposed into
separate iterative improvements this could be helpful.

-- 
Best regards,
Aleksander Alekseev




Remove backend warnings from SSL tests

2023-07-19 Thread Daniel Gustafsson
When looking at a patch in the CFBot I realized that the SSL tests generate
backend warnings under ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS due to roles
and databases not following the regression test naming convention.  While not
impacting the tested functionality, it's pretty silly to have warnings in the
test logs which can be avoided since those can throw off users debugging a test
failure.

The attached renames all roles with a regress_ prefix and databases with a
regression_ prefix to match the convention, and regenerates the certificates to
match.  With this I get a clean warning-free testrun.  There are no functional
changes included, just changed names (and comments to match).
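
For context, the convention being enforced looks like this when the define is
enabled (names illustrative, warning wording approximate):

```
CREATE ROLE ssltestuser;          -- WARNING: role names in regression tests
                                  -- should start with "regress_"
CREATE ROLE regress_ssltestuser;  -- follows the convention, no warning
```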

--
Daniel Gustafsson



v1-0001-Remove-backend-warnings-when-running-SSL-tests.patch
Description: Binary data


Re: Performance degradation on concurrent COPY into a single relation in PG16.

2023-07-19 Thread Dean Rasheed
On Wed, 19 Jul 2023 at 09:24, Masahiko Sawada  wrote:
>
> > > 2) pg_strtoint32_safe() got substantially slower, mainly due
> > >to
> > > faff8f8e47f Allow underscores in integer and numeric constants.
> > > 6fcda9aba83 Non-decimal integer literals
> >
> > Agreed.
> >
> I have made some progress on dealing with performance regression on
> single client COPY. I've attached a patch to fix 2). With the patch I
> shared[1] to deal with 1), single client COPY performance seems to be
> now as good as (or slightly better than) PG15 . Here are the results
> (averages of 5 times) of loading 50M rows via COPY:
>

Hmm, I'm somewhat sceptical about this second patch. It's not obvious
why adding such tests would speed it up, and indeed, testing on my
machine with 50M rows, I see a noticeable speed-up from patch 1, and a
slow-down from patch 2:


PG15


7390.461 ms
7497.655 ms
7485.850 ms
7406.336 ms

HEAD


8388.707 ms
8283.484 ms
8391.638 ms
8363.306 ms

HEAD + P1
=

7255.128 ms
7185.319 ms
7197.822 ms
7191.176 ms

HEAD + P2
=

8687.164 ms
8654.907 ms
8641.493 ms
8668.865 ms

HEAD + P1 + P2
==

7780.126 ms
7786.427 ms
7775.047 ms
7785.938 ms


So for me at least, just applying patch 1 gives the best results, and
makes it slightly faster than PG15 (possibly due to 6b423ec677).
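
For reference, a reproduction would look something like the following (a
sketch; the table layout is an assumption, and the thread used its own
attached test script):

```
CREATE TABLE x (i int, j int);
INSERT INTO x SELECT g, g FROM generate_series(1, 50000000) g;
COPY x TO '/tmp/x.data';
TRUNCATE x;
\timing on
COPY x FROM '/tmp/x.data';
```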

Regards,
Dean




Re: Extracting cross-version-upgrade knowledge from buildfarm client

2023-07-19 Thread Alvaro Herrera
I just hit a snag testing this.  It turns out that the
PostgreSQL::Version comparison stuff believes that 16beta2 < 16, which
sounds reasonable.  However, because of that, the AdjustUpgrade.pm
stanza that tries to drop tables public.gtest_normal_child{2} in
versions earlier than 16 fails, because by 16 these tables are dropped
in the test itself rather than left to linger, as was the case in
versions 15 and earlier.

So, if you try to run the pg_upgrade test with a dump created by
16beta2, it will fail to drop these tables (because they don't exist)
and the whole test fails.  Why hasn't the buildfarm detected this
problem?  I see that Drongo is happy, but I don't understand why.
Apparently, the AdjustUpgrade.pm stuff leaves no trace.

I can fix this either by using DROP IF EXISTS in that stanza, or by
making AdjustUpgrade use 'version <= 15'.  Any opinions on which to
prefer?


(Well, except that the tests added by c66a7d75e65 a few days ago fail
for some different reason -- the tests want pg_upgrade to fail, but it
doesn't fail for me.)

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
 It does it in a really, really complicated way
 why does it need to be complicated?
 Because it's MakeMaker.




Re: Add a perl function in Cluster.pm to generate WAL

2023-07-19 Thread Bharath Rupireddy
On Fri, Jun 16, 2023 at 8:00 AM Michael Paquier  wrote:
>
> On Thu, Jun 15, 2023 at 01:40:15PM +0900, Kyotaro Horiguchi wrote:
> > + "CREATE TABLE tt (); DROP TABLE tt; SELECT 
> > pg_switch_wal();");
> >
> > At least since 11, we can utilize pg_logical_emit_message() for this
> > purpose. It's more lightweight and seems appropriate, not only because
> > it doesn't cause any side effects but also bacause we don't have to
> > worry about name conflicts.
>
> Making this as cheap as possible by design is a good concept for a
> common routine.  +1.

While it seems reasonable to use pg_logical_emit_message, it won't
work for all the cases - what if someone wants to advance WAL by a few
WAL files? I think pg_switch_wal() is the way, no? For instance, the
replslot_limit.pl test increases the WAL in a very calculated way - it
increases by 5 WAL files. So, -1 to use pg_logical_emit_message.
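
For reference, the lightweight per-iteration alternative that was suggested
would look roughly like this (a sketch, not taken from either patch version):

```
-- Emit a tiny WAL record without creating any catalog objects, then force
-- a segment switch; repeating the pair advances WAL one segment at a time.
SELECT pg_logical_emit_message(false, 'advance_wal', 'x');
SELECT pg_switch_wal();
```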

I understand the naming conflicts for the table name used ('tt' in
this case). If the table name 'tt' looks so simple and easy for
someone to have tables with that name in their tests file, we can
generate a random table name in advance_wal, something like in the
attached v2 patch.

> > -  SELECT 'finished';",
> > - timeout => $PostgreSQL::Test::Utils::timeout_default));
> > -is($result[1], 'finished', 'check if checkpoint command is not blocked');
> > -
> > +$node_primary2->advance_wal(1);
> > +$node_primary2->safe_psql('postgres', 'CHECKPOINT;');
> >
> > This test anticipates that the checkpoint could get blocked. Shouldn't
> > we keep the timeout?
>
> Indeed, this would partially invalidate what's getting tested in light
> of 1816a1c6 where we run a secondary command after the checkpoint.  So
> the last SELECT should remain around.

Changed.

> > -$node_primary->safe_psql(
> > -'postgres', "create table retain_test(a int);
> > - select pg_switch_wal();
> > - insert into retain_test values(1);
> > - checkpoint;");
> > +$node_primary->advance_wal(1);
> > +$node_primary->safe_psql('postgres', "checkpoint;");
> >
> > The original test generated some WAL after the segment switch, which
> > appears to be a significant characteristics of the test.
>
> Still it does not matter for this specific case?  The logical slot has
> been already invalidated, so we don't care much about logical changes
> in WAL, do we?

Correct, the slot has already been invalidated and the test is
verifying that WAL isn't retained by the invalidated slot, so
essentially what it needs is to generate "some" WAL. So, using
advance_wal there seems fine to me. CFBot doesn't complain about anything -
https://github.com/BRupireddy/postgres/tree/add_a_TAP_test_function_to_generate_WAL_v2.

Attached the v2 patch. Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 959bd5c7dd00be05c98ab5a68406ac37af61a52f Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Wed, 19 Jul 2023 09:58:11 +
Subject: [PATCH v2] Add a TAP test function to generate WAL

This commit adds a perl function in Cluster.pm to generate WAL.
Some TAP tests are now using their own way to generate WAL.
Generalizing this functionality enables multiple TAP tests to
reuse the functionality.
---
 src/test/perl/PostgreSQL/Test/Cluster.pm  | 24 ++
 src/test/recovery/t/001_stream_rep.pl |  6 +--
 src/test/recovery/t/019_replslot_limit.pl | 48 +--
 .../t/035_standby_logical_decoding.pl |  7 +--
 4 files changed, 40 insertions(+), 45 deletions(-)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 5e161dbee6..583c63f3db 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -3120,6 +3120,30 @@ sub create_logical_slot_on_standby
 
 =pod
 
+=item $node->advance_wal($n)
+
+Advance WAL of given node by $n segments
+
+=cut
+
+sub advance_wal
+{
+	my ($self, $n) = @_;
+
+	# Generate a random table name.
+	my $table_name = $self->safe_psql('postgres',
+			"SELECT substr(md5(random()::text), 1, 10)::text;");
+
+	# Advance by $n segments (= (wal_segment_size * $n) bytes).
+	for (my $i = 0; $i < $n; $i++)
+	{
+		$self->safe_psql('postgres',
+			qq{CREATE TABLE "$table_name"(); DROP TABLE "$table_name"; SELECT pg_switch_wal();});
+	}
+}
+
+=pod
+
 =back
 
 =cut
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index 0c72ba0944..0e256dab8d 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -496,11 +496,7 @@ $node_primary->safe_psql('postgres',
 my $segment_removed = $node_primary->safe_psql('postgres',
 	'SELECT pg_walfile_name(pg_current_wal_lsn())');
 chomp($segment_removed);
-$node_primary->psql(
-	'postgres', "
-	CREATE TABLE tab_phys_slot (a int);
-	

Re: Fix a comment in basic_archive about NO_INSTALLCHECK

2023-07-19 Thread Bharath Rupireddy
On Thu, Apr 6, 2023 at 9:26 AM Michael Paquier  wrote:
>
> On Mon, Apr 03, 2023 at 08:56:10AM +0530, Bharath Rupireddy wrote:
> > It looks like comments in make file and meson file about not running
> > basic_archive tests in NO_INSTALLCHECK mode are wrong. The comments say the
> > module needs to be loaded via shared_preload_libraries=basic_archive, but
> > it actually doesn't. The custom file needs archive related parameters and
> > wal_level=replica. Here's a patch correcting that comment.
>
> Wouldn't it be better to also set shared_preload_libraries in
> basic_archive.conf?  It is true that the test works fine if setting
> only archive_library, which would cause the library with its
> _PG_init() to be loaded in the archiver process.  However the GUC
> basic_archive.archive_directory is missing from the backends.

Hm, I think the other backends will still see the value of the GUC
without shared_preload_libraries=basic_archive. You can verify it by
adding SHOW basic_archive.archive_directory; to basic_archive.sql. The
basic_archive library gets loaded by archiver via _PG_init. It's the
archiver defining a custom GUC variable which will propagate to all
the postgres processes via set_config_option_ext. Therefore, we don't
need shared_preload_libraries=basic_archive.

#3  0x7f75306406b6 in _PG_init () at basic_archive.c:86
#4  0x562652d0c87c in internal_load_library (
libname=0x5626549102d8
"/home/ubuntu/postgres/tmp_install/home/ubuntu/postgres/inst/lib/basic_archive.so")
at dfmgr.c:289
#5  0x562652d0c1e7 in load_external_function
(filename=0x562654930698 "basic_archive",
funcname=0x562652eca81b "_PG_archive_module_init",
signalNotFound=false, filehandle=0x0) at dfmgr.c:116
#6  0x562652a3a400 in LoadArchiveLibrary () at pgarch.c:841
#7  0x562652a39489 in PgArchiverMain () at pgarch.c:256
#8  0x562652a353de in AuxiliaryProcessMain
(auxtype=ArchiverProcess) at auxprocess.c:145
#9  0x562652a40b8e in StartChildProcess (type=ArchiverProcess) at
postmaster.c:5341
#10 0x562652a3e529 in process_pm_child_exit () at postmaster.c:3072
#11 0x562652a3c329 in ServerLoop () at postmaster.c:1767
#12 0x562652a3bc52 in PostmasterMain (argc=8, argv=0x56265490e1e0)
at postmaster.c:1462
#13 0x5626528efbbf in main (argc=8, argv=0x56265490e1e0) at main.c:198
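
That is easy to check from any regular session (assuming basic_archive.conf
sets archive_library and basic_archive.archive_directory as usual):

```
-- Succeeds in a normal backend even though shared_preload_libraries does
-- not mention basic_archive: the archiver's _PG_init() defined the custom
-- GUC, which per the explanation above propagates to the other processes.
SHOW basic_archive.archive_directory;
```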

> Saying that, updating the comments about the dependency with
> archive_library and the module's GUC is right.

Thanks. Any thoughts on the v1 patch attached upthread?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-07-19 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

> I have studied this a bit more and it seems that is true for physical
> walsenders where we set the state of walsender as WALSNDSTATE_STOPPING
> in XLogSendPhysical, then the checkpointer finishes writing checkpoint
> record and then postmaster sends SIGUSR2 for walsender to exit. IIUC,
> this whole logic of different stop states has been introduced in
> commit c6c3334364 based on the discussion in the thread [1]. As per my
> understanding, logical walsenders don't seem to be waiting for
> shutdown checkpoint record and finishes before even we LOG that
> record. It seems that the behavior of logical walsenders is different
> from physical walsenders where we wait for them to send even the final
> shutdown checkpoint record before they finish.

Yes, you are right. Physical walsenders wait for the checkpointer to exit, but
logical ones exit before the checkpointer does. This is because a logical
walsender may generate WAL by executing replication commands like
START_REPLICATION and CREATE_REPLICATION_SLOT, and those records may be
written after the shutdown checkpoint record. This leads to a PANIC.

> If so, then we won't be
> able to switchover to logical subscribers even in case of a clean
> shutdown. Am, I missing something?
> 
> [1] -
> https://www.postgresql.org/message-id/CAHGQGwEsttg9P9LOOavoc9d6VB1zV
> mYgfBk%3DLjsk-UL9cEf-eA%40mail.gmail.com

Based on the above, we are considering delaying the shutdown of logical
walsenders. The preliminary workflow is:

1. When a logical walsender receives the signal from the checkpointer, it
   consumes all of the WAL records, changes its state to WALSNDSTATE_STOPPING,
   and stops doing anything.
2. Then the checkpointer does the shutdown checkpoint.
3. After that, the postmaster sends a signal to the walsenders, same as in the
   current implementation.
4. Finally, the logical walsenders process the shutdown checkpoint record and
   update the confirmed_lsn after the acknowledgement from the subscriber.
   Note that logical walsenders don't have to send the shutdown checkpoint
   record to the subscriber; the following keepalive will help us increment
   the confirmed_lsn.
5. Once all tasks are done, they exit.

This mechanism ensures that the confirmed_lsn of active slots is the same as
the current WAL location of the old publisher, so the 0003 patch would become
simpler. We would not have to calculate the acceptable difference anymore.

One thing we must consider is that no WAL must be generated while decoding the
shutdown checkpoint record; that would cause a PANIC. IIUC the record leads to
SnapBuildSerializationPoint(), which just serializes the snapbuild or restores
from it, so the change may be acceptable. Thoughts?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: psql: Add role's membership options to the \du+ command

2023-07-19 Thread Alvaro Herrera
I tried this out.  It looks good to me, and I like it.  Not translating
the labels seems correct to me.

+1 for backpatching to 16, given that it's a psql-only change that
pertains to a backend change that was done in the 16 timeframe.

Regarding the controversy of showing SET for previous versions, I think
it's clearer if it's shown, because ultimately what the user really
wants to know is if the role can be SET to; they don't want to have to
learn from memory in which version they can SET because the column is
empty and in which version they have to look for the label.
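
For reference, the membership options that the display is built on can be
cross-checked directly against PG16's pg_auth_members (an illustration, not
the query the patch uses):

```
SELECT roleid::regrole AS role,
       member::regrole AS member,
       admin_option, inherit_option, set_option
FROM pg_auth_members;
```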

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Small aircraft do not crash frequently ... usually only once!"
  (ponder, http://thedailywtf.com/)




Re: A minor adjustment to get_cheapest_path_for_pathkeys

2023-07-19 Thread Richard Guo
On Tue, Jul 11, 2023 at 8:16 PM Aleksander Alekseev <aleksan...@timescale.com> wrote:

> Hi,
>
> > The check for parallel_safe should be even cheaper than cost comparison
> > so I think it's better to do that first.  The attached patch does this
> > and also updates the comment to mention the requirement about being
> > parallel-safe.
>
> The patch was marked as "Needs review" so I decided to take a look.


Thanks for the review!


> I see the reasoning behind the proposed change, but I'm not convinced
> that there will be any measurable performance improvements. Firstly,
> compare_path_costs() is rather cheap. Secondly, require_parallel_safe
> is `false` in most of the cases. Last but not least, one should prove
> that this particular place is a bottleneck under given loads. I doubt
> it is. Most of the time it's a network, disk I/O or locks.
>
> So unless the author can provide benchmarks that show measurable
> benefits of the change I suggest rejecting it.


Hmm, I doubt that there would be any measurable performance gains from
this minor tweak.  I think this tweak is more about being cosmetic.  But
I'm OK if it is deemed unnecessary and thus rejected.

Another change in this patch is to mention the requirement about being
parallel-safe in the comment.

  *   Find the cheapest path (according to the specified criterion) that
- *   satisfies the given pathkeys and parameterization.
+ *   satisfies the given pathkeys and parameterization, and is parallel-safe
+ *   if required.

Maybe this is something that is worthwhile to do?

Thanks
Richard


Re: Use of additional index columns in rows filtering

2023-07-19 Thread Tomas Vondra



On 7/19/23 01:22, Jeff Davis wrote:
> On Wed, 2023-07-19 at 00:36 +0200, Tomas Vondra wrote:
>>> * I'm confused about the relationship of an IOS to an index filter.
>>> It
>>> seems like the index filter only works for an ordinary index scan?
>>> Why
>>> is that?
>>
>> What would it do for IOS?
> 
> The way it's presented is slightly confusing. If you have table x with
> and index on column i, then:
> 
>   EXPLAIN (ANALYZE, BUFFERS)
> SELECT i, j FROM x WHERE i = 7 and (i % 1000 = 7);
> 
>Index Scan using x_idx on x  (cost=0.42..8.45 rows=1 width=8)
> (actual time=0.094..0.098 rows=1 loops=1)
>  Index Cond: (i = 7)
>  Index Filter: ((i % 1000) = 7)
> 
> But if you remove "j" from the target list, you get:
> 
>   EXPLAIN (ANALYZE, BUFFERS)
> SELECT i FROM x WHERE i = 7 and (i % 1000 = 7);
> 
>Index Only Scan using x_idx on x  (cost=0.42..4.45 rows=1 width=4)
> (actual time=0.085..0.088 rows=1 loops=1)
>   Index Cond: (i = 7)
>   Filter: ((i % 1000) = 7)
> 
> That confused me at first because the "Filter" in the second plan means
> the same thing as the "Index Filter" in the first plan. Should we call
> the filter in an IOS an "Index Filter" too? Or is that redundant?
> 

I agree the naming in explain is a bit confusing.

I wonder if Andres was right (in the index prefetch thread) that
splitting regular index scans and index-only scans may not be ideal. In
a way, this patch moves those nodes closer, both in capability and code
(because now both use index_getnext_tid etc.).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Avoid stack frame setup in performance critical routines using tail calls

2023-07-19 Thread Andres Freund
Hi,

David and I were chatting about this patch, in the context of his bump
allocator patch.  Attached is a rebased version that is also split up into two
steps, and a bit more polished.

I wasn't sure what a good test was. I ended up measuring
  COPY pgbench_accounts TO '/dev/null' WITH (FORMAT 'binary');
of a scale 1 database with pgbench:

c=1;pgbench -q -i -s1 && pgbench -n -c$c -j$c -t100 -f <(echo "COPY pgbench_accounts TO '/dev/null' WITH (FORMAT 'binary');")

average latency
HEAD:   33.865 ms
01: 32.820 ms
02: 29.934 ms

The server was pinned to the one core, turbo mode disabled.  That's a pretty
nice win, I'd say.  And I don't think this is actually the most allocator
bound workload, I just tried something fairly random...


Greetings,

Andres Freund
>From 4b97070e32ed323ae0f083bf865717c6d271ce53 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 18 Jul 2023 18:55:58 -0700
Subject: [PATCH v2 1/4] Optimize palloc() etc to allow sibling calls

The reason palloc() etc previously couldn't use sibling calls is that they did
check the returned value (to e.g. raise an error when the allocation
fails). Push the error handling down into the memory context implementation -
they can avoid performing the check at all in the hot code paths.
---
 src/include/nodes/memnodes.h  |   4 +-
 src/include/utils/memutils_internal.h |  29 +++-
 src/backend/utils/mmgr/alignedalloc.c |  11 +-
 src/backend/utils/mmgr/aset.c |  23 ++--
 src/backend/utils/mmgr/generation.c   |  14 +-
 src/backend/utils/mmgr/mcxt.c | 186 ++
 src/backend/utils/mmgr/slab.c |  12 +-
 7 files changed, 102 insertions(+), 177 deletions(-)

diff --git a/src/include/nodes/memnodes.h b/src/include/nodes/memnodes.h
index ff6453bb7ac..47d2932e6ed 100644
--- a/src/include/nodes/memnodes.h
+++ b/src/include/nodes/memnodes.h
@@ -57,10 +57,10 @@ typedef void (*MemoryStatsPrintFunc) (MemoryContext context, void *passthru,
 
 typedef struct MemoryContextMethods
 {
-	void	   *(*alloc) (MemoryContext context, Size size);
+	void	   *(*alloc) (MemoryContext context, Size size, int flags);
 	/* call this free_p in case someone #define's free() */
 	void		(*free_p) (void *pointer);
-	void	   *(*realloc) (void *pointer, Size size);
+	void	   *(*realloc) (void *pointer, Size size, int flags);
 	void		(*reset) (MemoryContext context);
 	void		(*delete_context) (MemoryContext context);
 	MemoryContext (*get_chunk_context) (void *pointer);
diff --git a/src/include/utils/memutils_internal.h b/src/include/utils/memutils_internal.h
index 2d107bbf9d4..3a050819263 100644
--- a/src/include/utils/memutils_internal.h
+++ b/src/include/utils/memutils_internal.h
@@ -19,9 +19,9 @@
 #include "utils/memutils.h"
 
 /* These functions implement the MemoryContext API for AllocSet context. */
-extern void *AllocSetAlloc(MemoryContext context, Size size);
+extern void *AllocSetAlloc(MemoryContext context, Size size, int flags);
 extern void AllocSetFree(void *pointer);
-extern void *AllocSetRealloc(void *pointer, Size size);
+extern void *AllocSetRealloc(void *pointer, Size size, int flags);
 extern void AllocSetReset(MemoryContext context);
 extern void AllocSetDelete(MemoryContext context);
 extern MemoryContext AllocSetGetChunkContext(void *pointer);
@@ -36,9 +36,9 @@ extern void AllocSetCheck(MemoryContext context);
 #endif
 
 /* These functions implement the MemoryContext API for Generation context. */
-extern void *GenerationAlloc(MemoryContext context, Size size);
+extern void *GenerationAlloc(MemoryContext context, Size size, int flags);
 extern void GenerationFree(void *pointer);
-extern void *GenerationRealloc(void *pointer, Size size);
+extern void *GenerationRealloc(void *pointer, Size size, int flags);
 extern void GenerationReset(MemoryContext context);
 extern void GenerationDelete(MemoryContext context);
 extern MemoryContext GenerationGetChunkContext(void *pointer);
@@ -54,9 +54,9 @@ extern void GenerationCheck(MemoryContext context);
 
 
 /* These functions implement the MemoryContext API for Slab context. */
-extern void *SlabAlloc(MemoryContext context, Size size);
+extern void *SlabAlloc(MemoryContext context, Size size, int flags);
 extern void SlabFree(void *pointer);
-extern void *SlabRealloc(void *pointer, Size size);
+extern void *SlabRealloc(void *pointer, Size size, int flags);
 extern void SlabReset(MemoryContext context);
 extern void SlabDelete(MemoryContext context);
 extern MemoryContext SlabGetChunkContext(void *pointer);
@@ -75,7 +75,7 @@ extern void SlabCheck(MemoryContext context);
  * part of a fully-fledged MemoryContext type.
  */
 extern void AlignedAllocFree(void *pointer);
-extern void *AlignedAllocRealloc(void *pointer, Size size);
+extern void *AlignedAllocRealloc(void *pointer, Size size, int flags);
 extern MemoryContext AlignedAllocGetChunkContext(void *pointer);
 extern Size AlignedAllocGetChunkSpace(void *pointer);
 
@@ -133,4 +133,19 @@ extern void 

Re: unrecognized node type while displaying a Path due to dangling pointer

2023-07-19 Thread Ashutosh Bapat
On Tue, Jul 18, 2023 at 5:05 AM David Rowley  wrote:
>
> On Mon, 17 Jul 2023 at 15:31, Tom Lane  wrote:
> > > I also didn't do anything about ExtensibleNode types. I assume just
> > > copying the ExtensibleNode isn't good enough. To flat copy the actual
> > > node I think would require adding a new function to
> > > ExtensibleNodeMethods.
> >
> > Yeah, the problem I've got with this approach is that flat-copying
> > FDW and Custom paths would require extending the respective APIs.
> > While that's a perfectly reasonable ask if we only need to do this
> > in HEAD, it would be a nonstarter for released branches.  Is it
> > okay to only fix this issue in HEAD?
>
> CustomPaths, I didn't think about those. That certainly makes it more
> complex.  I also now see the header comment for struct CustomPath
> mentioning that we don't copy Paths:

Somewhere upthread Tom suggested using a dummy projection path. Add a
projection path on top of the input path and add the projection path to the
output rel's list. That will work, right?

There's some shallow copying code in reparameterize_path_by_childrel()
but that's very specific to the purpose there and doesn't consider
Custom or Foreign paths.

-- 
Best Wishes,
Ashutosh Bapat




Re: Support to define custom wait events for extensions

2023-07-19 Thread Bharath Rupireddy
On Wed, Jul 19, 2023 at 11:49 AM Masahiro Ikeda
 wrote:
>
> I updated the patch since the cfbot found a bug.
> * v7-0001-Support-custom-wait-events-for-extensions.patch

Thanks for working on this feature. +1. I've wanted this capability
for a while because extensions have many different wait loops for
different reasons, and a single wait event type isn't enough to tell
them apart.

I don't think we need a separate test extension to demonstrate the use
of custom wait events; you can leverage the sample extension
worker_spi, because that's where extension authors look first when
writing a new extension. Also, it simplifies your patch a lot. I don't
mind adding a few things to worker_spi for the sake of demonstrating
and testing custom wait events.
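
To illustrate, here is a rough sketch of how worker_spi's main loop
could report a custom wait event with the proposed APIs (a sketch
against the v7 patch, not committed code; the event name
"WorkerSpiMain" is made up, and in practice the ID would be allocated
once, e.g. from a shared-memory initialization path, rather than by
every backend):

```
static uint32 worker_spi_wait_event = 0;

/* once per process, before entering the main loop */
if (worker_spi_wait_event == 0)
    worker_spi_wait_event = WaitEventExtensionNew();
WaitEventExtensionRegisterName(worker_spi_wait_event, "WorkerSpiMain");

/* in the main loop, report the custom event while sleeping */
(void) WaitLatch(MyLatch,
                 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                 worker_spi_naptime * 1000L,
                 worker_spi_wait_event);
ResetLatch(MyLatch);
```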

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Performance degradation on concurrent COPY into a single relation in PG16.

2023-07-19 Thread Masahiko Sawada
On Wed, Jul 12, 2023 at 5:40 PM Masahiko Sawada  wrote:
>
> On Wed, Jul 12, 2023 at 3:52 AM Andres Freund  wrote:
> >
> > Hi,
> >
> > On 2023-07-03 11:55:13 +0900, Masahiko Sawada wrote:
> > > While testing PG16, I observed that in PG16 there is a big performance
> > > degradation in concurrent COPY into a single relation with 2 - 16
> > > clients in my environment. I've attached a test script that measures
> > > the execution time of COPYing 5GB data in total to the single relation
> > > while changing the number of concurrent insertions, in PG16 and PG15.
> > > Here are the results on my environment (EC2 instance, RHEL 8.6, 128
> > > vCPUs, 512GB RAM):
> > >
> > > * PG15 (4b15868b69)
> > > PG15: nclients = 1, execution time = 14.181
> > >
> > > * PG16 (c24e9ef330)
> > > PG16: nclients = 1, execution time = 17.112
> >
> > > The relevant commit is 00d1e02be2 "hio: Use ExtendBufferedRelBy() to
> > > extend tables more efficiently". With commit 1cbbee0338 (the previous
> > > commit of 00d1e02be2), I got a better numbers, it didn't have a better
> > > scalability, though:
> > >
> > > PG16: nclients = 1, execution time = 17.444
> >
> > I think the single client case is indicative of an independent regression, 
> > or
> > rather regressions - it can't have anything to do with the fallocate() issue
> > and reproduces before that too in your numbers.
>
> Right.
>
> >
> > 1) COPY got slower, due to:
> > 9f8377f7a27 Add a DEFAULT option to COPY  FROM
> >
> > This added a new palloc()/free() to every call to NextCopyFrom(). It's not 
> > at
> > all clear to me why this needs to happen in NextCopyFrom(), particularly
> > because it's already stored in CopyFromState?
>
> Yeah, it seems to me that we can palloc the bool array once and use it
> for the entire COPY FROM. With the attached small patch, the
> performance becomes much better:
>
> 15:
> 14.70500 sec
>
> 16:
> 17.42900 sec
>
> 16 w/ patch:
> 14.85600 sec
>
> >
> >
> > 2) pg_strtoint32_safe() got substantially slower, mainly due
> > to
> > faff8f8e47f Allow underscores in integer and numeric constants.
> > 6fcda9aba83 Non-decimal integer literals
>
> Agreed.
>
> >
> > pinned to one cpu, turbo mode disabled, I get the following best-of-three 
> > times for
> >   copy test from '/tmp/tmp_4.data'
> > (too impatient to use the larger file every time)
> >
> > 15:
> > 6281.107 ms
> >
> > HEAD:
> > 7000.469 ms
> >
> > backing out 9f8377f7a27:
> > 6433.516 ms
> >
> > also backing out faff8f8e47f, 6fcda9aba83:
> > 6235.453 ms
> >
> >
> > I suspect 1) can relatively easily be fixed properly. But 2) seems much
> > harder. The changes increased the number of branches substantially, that's
> > gonna cost in something as (previously) tight as pg_strtoint32().
>
> I'll look at how to fix 2).

I have made some progress on dealing with the performance regression
on single-client COPY. I've attached a patch to fix 2). With the patch
I shared[1] to deal with 1), single-client COPY performance now seems
to be as good as (or slightly better than) PG15. Here are the results
(averages of 5 runs) of loading 50M rows via COPY:

15:
7.609 sec

16:
8.637 sec

16 w/ two patches:
7.179 sec
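
To make the idea easier to follow, the general shape of such a fix
could be a decimal-only fast path that bails out to the existing full
parser for signs, underscores, non-decimal bases, or overflow. This is
a sketch of the concept only (assuming PostgreSQL's c.h types), not
necessarily what fix_numeric.patch does:

```
static bool
parse_int32_fast(const char *s, int32 *result)
{
    int64   acc = 0;

    if (!isdigit((unsigned char) *s))
        return false;           /* sign, 0x prefix, etc.: take slow path */

    do
    {
        acc = acc * 10 + (*s - '0');
        if (acc > PG_INT32_MAX)
            return false;       /* overflow: let the slow path report it */
    } while (isdigit((unsigned char) *++s));

    if (*s != '\0')
        return false;           /* underscore or trailing junk: slow path */

    *result = (int32) acc;
    return true;
}
```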

Regards,

[1] 
https://www.postgresql.org/message-id/CAD21AoBb9Sbddh%2BnQk1BxUFaRHaa%2BfL8fCzO-Lvxqb6ZcpAHqw%40mail.gmail.com

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


fix_numeric.patch


Re: remaining sql/json patches

2023-07-19 Thread Amit Langote
On Wed, Jul 19, 2023 at 12:53 AM Alvaro Herrera  wrote:
> On 2023-Jul-18, Amit Langote wrote:
>
> > Attached updated patches.  In 0002, I removed the mention of the
> > RETURNING clause in the JSON(), JSON_SCALAR() documentation, which I
> > had forgotten to do in the last version which removed its support in
> > code.
>
> > I think 0001 looks ready to go.  Alvaro?
>
> It looks reasonable to me.

Thanks for taking another look.

I will push this tomorrow.

> > Also, I've been wondering if it isn't too late to apply the following
> > to v16 too, so as to make the code look similar in both branches:
>
> Hmm.
>
> > 785480c953 Pass constructName to transformJsonValueExpr()
>
> I think 785480c953 can easily be considered a bugfix on 7081ac46ace8, so
> I agree it's better to apply it to 16.

OK.

> > b6e1157e7d Don't include CaseTestExpr in JsonValueExpr.formatted_expr
>
> I feel a bit uneasy about this one.  It seems to assume that
> formatted_expr is always set, but at the same time it's not obvious that
> it is.  (Maybe this aspect just needs some more commentary).

Hmm, I agree that the comments about formatted_expr could be improved
further, for which I propose the attached.  Actually, staring some
more at this, I'm inclined to change makeJsonValueExpr() to allow
callers to pass it the finished 'formatted_expr' rather than setting
it themselves.
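
Presumably that would give makeJsonValueExpr() a shape like the
following (a sketch of the suggestion, not the actual patch):

```
JsonValueExpr *
makeJsonValueExpr(Expr *raw_expr, Expr *formatted_expr,
                  JsonFormat *format)
{
    JsonValueExpr *jve = makeNode(JsonValueExpr);

    /* the caller now supplies the finished coercion expression */
    jve->raw_expr = raw_expr;
    jve->formatted_expr = formatted_expr;
    jve->format = format;

    return jve;
}
```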

>  I agree
> that it would be better to make both branches identical, because if
> there's a problem, we are better equipped to get a fix done to both.
>
> As for the removal of makeCaseTestExpr(), I agree -- of the six callers
> of makeNode(CaseTestExpr), only two of them would be able to use the new
> function, so it doesn't look of general enough usefulness.

OK, so you agree with back-patching this one too, though perhaps only
after applying something like the aforementioned patch.  Just to be
sure, would the good practice in this case be to squash the fixup
patch into b6e1157e7d before back-patching?


--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


0001-Code-review-for-commit-b6e1157e7d.patch
Description: Binary data


Re: pg_recvlogical prints bogus error when interrupted

2023-07-19 Thread Bharath Rupireddy
On Wed, Jul 19, 2023 at 1:41 PM Michael Paquier  wrote:
>
> > 3. pg_log_info("end position %X/%X reached on signal", ...  For a
> > signal, "end position" is a bit vague wording, and I think we can just
> > say pg_log_info("received interrupt signal, exiting"); like
> > pg_receivewal does. We really can't have a valid stop_lsn for a signal
> > exit because that depends on when the signal arrives in the while loop.
> > If someone wants to know the last received LSN, they can look at the
> > other messages that pg_recvlogical emits - pg_recvlogical:
> > confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0 (slot myslot).
>
> +case STREAM_STOP_SIGNAL:
> +pg_log_info("received interrupt signal, exiting");
> +break;
>
> Still it is useful to report the location we have finished with when
> stopping on a signal, no?  Why couldn't we use "lsn" here, aka
> cur_record_lsn?

Printing the LSN on signal exit won't be correct - if the signal is
received before cur_record_lsn gets assigned, we would show either a
stale LSN (if it was assigned previously) or an invalid LSN (if it
wasn't). Signal arrival and processing are nondeterministic, so we
can't always show the right info. Instead, we can keep the messaging
simple, without an LSN, like pg_receivewal.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: pg_recvlogical prints bogus error when interrupted

2023-07-19 Thread Michael Paquier
On Wed, Jul 19, 2023 at 12:07:21PM +0530, Bharath Rupireddy wrote:
> 1. I don't think we need a stop_lsn variable; the cur_record_lsn can
> help if it's defined outside the loop. With this, the patch can
> further be simplified as attached v6.

Okay by me.

> 2. And, I'd prefer not to assume the stop_reason is signal by default.
> While it works for now because the only remaining place where
> time_to_abort is set to true is the signal handler, it is not
> extensible if another exit condition that sets time_to_abort to true
> comes along in the future.

Yeah, I was also wondering whether this should be set by the signal
handler itself in this case, storing the reason statically in a
specific sig_atomic_t.
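
Something along these lines, perhaps (a sketch; STREAM_STOP_NONE and
the handler name are assumptions, not code from the patch):

```
static volatile sig_atomic_t time_to_abort = false;
static volatile sig_atomic_t stop_reason = STREAM_STOP_NONE;

static void
sigexit_handler(SIGNAL_ARGS)
{
    /* both are sig_atomic_t, since they are written in signal context */
    stop_reason = STREAM_STOP_SIGNAL;
    time_to_abort = true;
}
```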

> 3. pg_log_info("end position %X/%X reached on signal", ...  For a
> signal, "end position" is a bit vague wording, and I think we can just
> say pg_log_info("received interrupt signal, exiting"); like
> pg_receivewal does. We really can't have a valid stop_lsn for a signal
> exit because that depends on when the signal arrives in the while loop.
> If someone wants to know the last received LSN, they can look at the
> other messages that pg_recvlogical emits - pg_recvlogical:
> confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0 (slot myslot).

+case STREAM_STOP_SIGNAL:
+pg_log_info("received interrupt signal, exiting");
+break;

Still it is useful to report the location we have finished with when
stopping on a signal, no?  Why couldn't we use "lsn" here, aka
cur_record_lsn?

> 4. With v5, it was taking a while to exit after the first CTRL+C, see
> multiple CTRL+Cs at the end:
> ubuntu::~/postgres/inst/bin$ ./pg_recvlogical --slot=lsub1_repl_slot
> --file=lsub1.data --start --verbose
> pg_recvlogical: starting log streaming at 0/0 (slot lsub1_repl_slot)
> pg_recvlogical: streaming initiated
> pg_recvlogical: confirming write up to 0/0, flush to 0/0 (slot 
> lsub1_repl_slot)
> pg_recvlogical: confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0
> (slot lsub1_repl_slot)
> pg_recvlogical: confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0
> (slot lsub1_repl_slot)
> pg_recvlogical: confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0
> (slot lsub1_repl_slot)
> ^Cpg_recvlogical: end position 0/2867D70 reached on signal
> ^C^C^C^C^C^C^C^C^C^C^C^C
> ^C^C^C^C^C^C^C^C^C^C^C^C

Isn't that the case where we are in the middle of a long change but
need to stop cleanly?  That sounds expected to me.
--
Michael




Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

2023-07-19 Thread Önder Kalacı
Hi Masahiko, Amit, all

> I've updated the patch.

I think the flow is much nicer now compared to HEAD. I really don't
have any comments regarding the accuracy of the code changes; all
looks good to me. Overall, I cannot see any behavioral changes, as you
already alluded to.

Maybe a few minor notes regarding the comments:

>  /*
> + * And must reference the remote relation column. This is because if it
> + * doesn't, the sequential scan is favorable over index scan in most
> + * cases.
> + */


I think the reader might have lost the context by this point (or may
lose it in the future due to another refactor, etc.). So maybe start with:

/* And the leftmost index field must refer to the  ...


Also, now in IsIndexUsableForReplicaIdentityFull() some of the conditions
have comments and some don't. Should we comment the missing ones as
well, maybe such as:

> /* partial indexes are not supported */
> if (indexInfo->ii_Predicate != NIL)
>
and,

> /* all indexes must have at least one attribute */
> Assert(indexInfo->ii_NumIndexAttrs >= 1);




>
>>
>> BTW, IsIndexOnlyExpression() is not necessary but the current code
>> still works fine. So do we need to backpatch it to PG16? I'm thinking
>> we can apply it to only HEAD.
>
> Either way is fine but I think if we backpatch it then the code
> remains consistent and the backpatching would be easier.
>

Yeah, I also have a slight preference for backporting. It could make
the code easier to maintain in the future in case of further
backports, at the cost of making it slightly harder for you now :)

Thanks,
Onder


Re: pg_recvlogical prints bogus error when interrupted

2023-07-19 Thread Bharath Rupireddy
On Wed, Jul 19, 2023 at 12:07 PM Bharath Rupireddy
 wrote:
>
> 4. With v5, it was taking a while to exit after the first CTRL+C, see
> multiple CTRL+Cs at the end:
> ubuntu::~/postgres/inst/bin$ ./pg_recvlogical --slot=lsub1_repl_slot
> --file=lsub1.data --start --verbose
> pg_recvlogical: starting log streaming at 0/0 (slot lsub1_repl_slot)
> pg_recvlogical: streaming initiated
> pg_recvlogical: confirming write up to 0/0, flush to 0/0 (slot 
> lsub1_repl_slot)
> pg_recvlogical: confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0
> (slot lsub1_repl_slot)
> pg_recvlogical: confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0
> (slot lsub1_repl_slot)
> pg_recvlogical: confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0
> (slot lsub1_repl_slot)
> ^Cpg_recvlogical: end position 0/2867D70 reached on signal
> ^C^C^C^C^C^C^C^C^C^C^C^C
> ^C^C^C^C^C^C^C^C^C^C^C^C

I think the delay is expected for the reason specified below, and is
not because of any of the changes in v5. As far as CTRL+C is
concerned, it is a clean exit, and hence we can't escape the while (1)
loop until the drain completes.

/*
 * We're doing a client-initiated clean exit and have sent CopyDone to
 * the server. Drain any messages, so we don't miss a last-minute
 * ErrorResponse. The walsender stops generating XLogData records once
 * it sees CopyDone, so expect this to finish quickly. After CopyDone,
 * it's too late for sendFeedback(), even if this were to take a long
 * time. Hence, use synchronous-mode PQgetCopyData().
 */
while (1)
{
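    /*
     * (A sketch of how the loop continues - paraphrased for this digest,
     * not the exact pg_recvlogical source.)  Block in PQgetCopyData()
     * until the server ends the COPY, then pick up any trailing error.
     */
    char   *copybuf = NULL;
    int     r = PQgetCopyData(conn, &copybuf, 0);   /* 0 = synchronous */

    if (r == -1)                /* COPY stream ended cleanly */
        break;
    if (r == -2)                /* connection dropped mid-drain */
    {
        pg_log_error("could not read COPY data: %s", PQerrorMessage(conn));
        break;
    }
    PQfreemem(copybuf);         /* discard any last-minute data rows */
}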

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Adding argument names to aggregate functions

2023-07-19 Thread Daniel Gustafsson
This patch no longer applies but had a fairly trivial conflict, so I've
attached a rebased v3 addressing the conflict in the hope of getting
this further along.

--
Daniel Gustafsson



v3-0001-Add-argument-names-to-multi-argument-aggregates.patch


Re: pg_upgrade and logical replication

2023-07-19 Thread Michael Paquier
On Wed, May 10, 2023 at 05:59:24PM +1000, Peter Smith wrote:
> 1. ALTER SUBSCRIPTION name ADD TABLE (relid = XYZ, state = 'x' [, lsn = 
> 'X/Y'])
> 
> I was a bit confused by this relation 'state' mentioned in multiple
> places. IIUC the pg_upgrade logic is going to reject anything with a
> non-READY (not 'r') state anyhow, so what is the point of having all
> the extra grammar/parse_subscription_options etc to handle setting the
> state when the only possible value is 'r'?

We are just talking about the handling of an extra DefElem in an
extensible grammar pattern, so adding the state field does not
represent much maintenance work.  I'm OK with the addition of this
field in the data set dumped, FWIW, on the ground that it can be
useful for debugging purposes when looking at --binary-upgrade dumps,
and because we aim at copying catalog contents from one cluster to
another.

Anyway, I am not convinced that we have any need for a parse-able
grammar at all, because anything that's presented on this thread is
aimed at being used only for the internal purpose of an upgrade in a
--binary-upgrade dump with a direct catalog copy in mind, and having a
grammar would encourage abuses of it outside of this context.  I think
that we should aim for something simpler than what's proposed by the
patch, actually: either a single SQL function à-la-binary_upgrade()
that adds the contents of a relation, or we can be crazier and just
create INSERT queries for pg_subscription_rel to provide an exact copy
of the catalog contents.  A SQL function would be more consistent with
other object types that use similar tricks; see
binary_upgrade_create_empty_extension(), which does something similar
for some pg_extension records.  So, this function would require four
input arguments (a sketch follows the list):
- The subscription name or OID.
- The relation OID.
- Its LSN.
- Its sync state.
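
In a --binary-upgrade dump, that could look something like this (the
function name binary_upgrade_add_sub_rel_state() is only a placeholder
to illustrate the shape, and the values are made up):

```
-- one call per pg_subscription_rel row being carried over
SELECT binary_upgrade_add_sub_rel_state('mysub',        -- subscription
                                        16384,          -- relation OID
                                        '0/1566A80',    -- remote LSN
                                        'r');           -- sync state (READY)
```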

> 2. state vs. relstate
> 
> I still feel code readability suffers a bit by calling some fields/vars
> a generic 'state' instead of the more descriptive 'relstate'. Maybe
> it's just me.
> 
> Previously commented same (see [1]#3, #4, #5)

Agreed to be more careful with the naming here.
--
Michael




Do we want to enable foreign key constraints on subscriber?

2023-07-19 Thread Kyotaro Horiguchi
Hello.

There's an issue brought up in the -bugs list [1]. Since triggers are
deactivated on a subscriber by default, foreign key constraints don't
fire for replicated changes. The docs state this is done to prevent
repetitive data propagation between tables on subscribers. But foreign
key triggers don't contribute to this issue.

My understanding is that constraint triggers, including ones created
using the "CREATE CONSTRAINT TRIGGER" command, aren't supposed to alter
data. If this holds true, I propose that we modify the function
CreateTrigger() to enable constraint triggers on subscribers, as
attached. The function CreateTrigger() can choose the value for the
parameter "trigger_fires_when" of CreateTriggerFiringOn() based on
whether constraintOid is valid or not.

What do you think about this change?


A reproducer follows. The last UPDATE successfully propagates to the
subscriber, removing a row that couldn't be locally removed on the
subscriber due to the referential constraint.

Publisher:
CREATE TABLE t (a int not null, b bool not null);
ALTER TABLE t REPLICA IDENTITY FULL;
INSERT INTO t VALUES (0, true), (1, true), (2, true);
CREATE PUBLICATION p1 FOR TABLE t WHERE (b IS true);

Subscriber:
CREATE TABLE t (a int primary key, b bool);
CREATE TABLE t1 (a int references t(a) ON UPDATE CASCADE);
CREATE SUBSCRIPTION s1 CONNECTION 'host=/tmp port=5432' PUBLICATION p1;
SELECT pg_sleep(0.5);
INSERT INTO t1 VALUES (2);

== trigger correctly fires
Subscriber:
DELETE FROM t WHERE a = 2;
> ERROR:  update or delete on table "t" violates foreign key constraint 
> "t1_a_fkey" on table "t1"
> DETAIL:  Key (a)=(2) is still referenced from table "t1".

== trigger doesn't fire
Publisher:
UPDATE t SET b = false WHERE a = 2;

Subscriber:
SELECT * FROM t;   -- (2 doesn't exist)


regards.

[1]: https://www.postgresql.org/message-id/18019-21e3fdb5d9057...@postgresql.org
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 52177759ab..c0eb8ea203 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -171,7 +171,10 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString,
 		CreateTriggerFiringOn(stmt, queryString, relOid, refRelOid,
 			  constraintOid, indexOid, funcoid,
 			  parentTriggerOid, whenClause, isInternal,
-			  in_partition, TRIGGER_FIRES_ON_ORIGIN);
+			  in_partition,
+			  OidIsValid(constraintOid) ?
+			  TRIGGER_FIRES_ALWAYS:
+			  TRIGGER_FIRES_ON_ORIGIN);
 }
 
 /*


Re: pg_recvlogical prints bogus error when interrupted

2023-07-19 Thread Bharath Rupireddy
On Wed, Jul 19, 2023 at 8:04 AM Michael Paquier  wrote:
>
> It took me some time to come back to this one, but attached is what I
> had in mind.  This stuff has three reasons to stop: keepalive, end LSN
> or signal.  This makes the code easier to follow.
>
> Thoughts or comments?

Thanks. I have some comments on v5:

1. I don't think we need a stop_lsn variable; the cur_record_lsn can
help if it's defined outside the loop. With this, the patch can
further be simplified as attached v6.

2. And, I'd prefer not to assume the stop_reason is signal by default.
While it works for now because the only remaining place where
time_to_abort is set to true is the signal handler, it is not
extensible if another exit condition that sets time_to_abort to true
comes along in the future.

3. pg_log_info("end position %X/%X reached on signal", ...  For a
signal, "end position" is a bit vague wording, and I think we can just
say pg_log_info("received interrupt signal, exiting"); like
pg_receivewal does. We really can't have a valid stop_lsn for a signal
exit because that depends on when the signal arrives in the while loop.
If someone wants to know the last received LSN, they can look at the
other messages that pg_recvlogical emits - pg_recvlogical:
confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0 (slot myslot).

4. With v5, it was taking a while to exit after the first CTRL+C, see
multiple CTRL+Cs at the end:
ubuntu::~/postgres/inst/bin$ ./pg_recvlogical --slot=lsub1_repl_slot
--file=lsub1.data --start --verbose
pg_recvlogical: starting log streaming at 0/0 (slot lsub1_repl_slot)
pg_recvlogical: streaming initiated
pg_recvlogical: confirming write up to 0/0, flush to 0/0 (slot lsub1_repl_slot)
pg_recvlogical: confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0
(slot lsub1_repl_slot)
pg_recvlogical: confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0
(slot lsub1_repl_slot)
pg_recvlogical: confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0
(slot lsub1_repl_slot)
^Cpg_recvlogical: end position 0/2867D70 reached on signal
^C^C^C^C^C^C^C^C^C^C^C^C
^C^C^C^C^C^C^C^C^C^C^C^C

5. FWIW, on HEAD we'd get the following, and the patch emits a better message:
ubuntu:~/postgres/inst/bin$ ./pg_recvlogical --slot=lsub1_repl_slot
--file=lsub1.data --start --dbname='host=localhost port=5432
dbname=postgres user=ubuntu' --verbose
pg_recvlogical: starting log streaming at 0/0 (slot lsub1_repl_slot)
pg_recvlogical: streaming initiated
pg_recvlogical: confirming write up to 0/0, flush to 0/0 (slot lsub1_repl_slot)
pg_recvlogical: confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0
(slot lsub1_repl_slot)
pg_recvlogical: confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0
(slot lsub1_repl_slot)
pg_recvlogical: confirming write up to 0/2BFFFD0, flush to 0/2BFFFD0
(slot lsub1_repl_slot)
^Cpg_recvlogical: error: unexpected termination of replication stream:

Attaching the v6 patch with the above changes. Thoughts?

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v6-0001-Fix-pg_recvlogical-error-message-upon-SIGINT-SIGT.patch


Re: Flush SLRU counters in checkpointer process

2023-07-19 Thread Anthonin Bonnefoy
I think I've managed to reproduce the issue. The test I had added to
check SLRU flushes was the one failing in the regression suite.

SELECT SUM(flushes) > :slru_flushes_before FROM pg_stat_slru;
 ?column?
--
 t

The origin seems to be a race condition on have_slrustats (see
https://github.com/postgres/postgres/blob/c8e1ba736b2b9e8c98d37a5b77c4ed31baf94147/src/backend/utils/activity/pgstat_slru.c#L161-L162).
I will try to get a new patch with improved test stability.
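
One way to make the assertion deterministic might be to force pending
statistics out before comparing the counters (an assumption about a
possible fix, not the actual follow-up patch):

```
-- flush backend-local pending stats, then force an SLRU flush
SELECT pg_stat_force_next_flush();
CHECKPOINT;
SELECT SUM(flushes) > :slru_flushes_before FROM pg_stat_slru;
```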


On Mon, Jul 3, 2023 at 3:18 PM Daniel Gustafsson  wrote:

> > On 3 Mar 2023, at 09:06, Anthonin Bonnefoy <
> anthonin.bonne...@datadoghq.com> wrote:
> >
> > Here's the patch rebased with Andres' suggestions.
> > Happy to update it if there's any additional change required.
>
> This patch crashes 031_recovery_conflict with a SIGInvalid on Windows;
> can you please investigate and see what might be going on there?  The
> test passed about 4 days ago on Windows, so unless it's the CI being
> flaky it should be due to a recent change.
>
> If you don't have access to a Windows environment, you can run your own
> instrumented builds in your Github account with the CI files in the
> postgres repo.
>
> --
> Daniel Gustafsson
>
>


Re: Support to define custom wait events for extensions

2023-07-19 Thread Michael Paquier
On Wed, Jul 19, 2023 at 12:52:10PM +0900, Masahiro Ikeda wrote:
> I would like to change the wait event names of contrib modules, for
> example postgres_fdw. But I think it's better to do so after the APIs
> are committed.

Agreed to do things one step at a time here.  Let's focus on the core
APIs and facilities first.
--
Michael




Re: Support to define custom wait events for extensions

2023-07-19 Thread Masahiro Ikeda

On 2023-07-19 12:52, Masahiro Ikeda wrote:

> Hi,
>
> I updated the patches.
> * v6-0001-Support-custom-wait-events-for-extensions.patch


I updated the patch since the cfbot found a bug.
* v7-0001-Support-custom-wait-events-for-extensions.patch

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

From e5d63913cbba9f20098252cc1e415eda4d32f5ad Mon Sep 17 00:00:00 2001
From: Masahiro Ikeda 
Date: Wed, 19 Jul 2023 12:28:12 +0900
Subject: [PATCH] Support custom wait events for extensions.

To support custom wait events, this adds two APIs that let extensions
define new wait events dynamically.

The APIs are
* WaitEventExtensionNew()
* WaitEventExtensionRegisterName()

These are similar to the existing LWLockNewTrancheId() and
LWLockRegisterTranche().

First, extensions should call WaitEventExtensionNew() to get one
or more new wait events, whose IDs are allocated from a shared
counter.  Next, each individual process can call
WaitEventExtensionRegisterName() to associate a name string with
a wait event ID.

Note that this includes a test module, which perhaps should be split
later into its own commit.
---
 doc/src/sgml/monitoring.sgml  |  10 +-
 doc/src/sgml/xfunc.sgml   |  23 ++
 src/backend/storage/ipc/ipci.c|   3 +
 .../activity/generate-wait_event_types.pl |   7 +-
 src/backend/utils/activity/wait_event.c   | 171 +++-
 .../utils/activity/wait_event_names.txt   |   2 +-
 src/include/utils/wait_event.h|  25 ++
 src/test/modules/Makefile |   1 +
 src/test/modules/meson.build  |   1 +
 .../test_custom_wait_events/.gitignore|   4 +
 .../modules/test_custom_wait_events/Makefile  |  23 ++
 .../test_custom_wait_events/meson.build   |  33 +++
 .../test_custom_wait_events/t/001_basic.pl| 137 ++
 .../test_custom_wait_events--1.0.sql  |  39 +++
 .../test_custom_wait_events.c | 253 ++
 .../test_custom_wait_events.control   |   3 +
 16 files changed, 718 insertions(+), 17 deletions(-)
 create mode 100644 src/test/modules/test_custom_wait_events/.gitignore
 create mode 100644 src/test/modules/test_custom_wait_events/Makefile
 create mode 100644 src/test/modules/test_custom_wait_events/meson.build
 create mode 100644 src/test/modules/test_custom_wait_events/t/001_basic.pl
 create mode 100644 src/test/modules/test_custom_wait_events/test_custom_wait_events--1.0.sql
 create mode 100644 src/test/modules/test_custom_wait_events/test_custom_wait_events.c
 create mode 100644 src/test/modules/test_custom_wait_events/test_custom_wait_events.control

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 588b720f57..3aee5c243f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1117,12 +1117,12 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 

 
- Extensions can add LWLock types to the list shown in
- .  In some cases, the name
+ Extensions can add Extension and LWLock types
+ to the list shown in  and
+ . In some cases, the name
  assigned by an extension will not be available in all server processes;
- so an LWLock wait event might be reported as
- just extension rather than the
- extension-assigned name.
+ so a wait event might be reported as just extension
+ rather than the extension-assigned name.
 

  
diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml
index 9620ea9ae3..bc210cd03b 100644
--- a/doc/src/sgml/xfunc.sgml
+++ b/doc/src/sgml/xfunc.sgml
@@ -3453,6 +3453,29 @@ if (!ptr)
 

 
+   
+Custom Wait Events for Add-ins
+
+
+Add-ins can define custom wait events whose wait event type is
+Extension.
+
+
+First, add-ins should get one or more new wait events by calling:
+
+uint32 WaitEventExtensionNew(void)
+
+Next, each individual process can associate a name string with
+a wait event ID by calling:
+
+void WaitEventExtensionRegisterName(uint32 wait_event_info, const char *wait_event_name);
+
+An example can be found in
+src/test/modules/test_custom_wait_events/test_custom_wait_events.c
+in the PostgreSQL source tree.
+
+   
+

 Using C++ for Extensibility
 
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index cc387c00a1..5551afffc0 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -49,6 +49,7 @@
 #include "storage/spin.h"
 #include "utils/guc.h"
 #include "utils/snapmgr.h"
+#include "utils/wait_event.h"
 
 /* GUCs */
 int			shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -142,6 +143,7 @@ CalculateShmemSize(int *num_semaphores)
 	size = add_size(size, SyncScanShmemSize());
 	size = add_size(size, AsyncShmemSize());
 	size = add_size(size, StatsShmemSize());
+	size = add_size(size,