Re: Should the archiver process always make sure that the timeline history files exist in the archive?

2023-08-16 Thread Jimmy Yih
Hello pgsql-hackers,

After doing some more debugging on the matter, I believe this issue might be a
minor regression from commit 5332b8cec541. Prior to that commit, the archiver
process, when first started on a previously promoted primary, would have all of
the timeline history files marked as ready for immediate archiving. Under that
behavior, none of the failure scenarios I mentioned would be possible (barring
someone manually deleting the timeline history files). With that in mind, I
decided to look more into my Question 1 and created a patch proposal.
The attached patch will try to archive the current timeline history file if it
has not been archived yet when the archiver process starts up.
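For reference, the shape of the proposed change is roughly the following (a
pseudocode sketch only, with approximate names; the attached patch is
authoritative):

```
/* Sketch: at archiver startup, before entering the main loop.
 * If we are on a timeline > 1 and the current timeline's history
 * file has no .ready/.done archive status entry, mark it ready so
 * that it gets archived like any other WAL file. */
if (currentTimeLine > 1)
{
    char    histfname[MAXFNAMELEN];

    TLHistoryFileName(histfname, currentTimeLine);
    if (!XLogArchiveIsReadyOrDone(histfname))
        XLogArchiveNotify(histfname);   /* create the .ready file */
}
```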

Regards,
Jimmy Yih


From: Jimmy Yih 
Sent: Wednesday, August 9, 2023 5:00 PM
To: pgsql-hack...@postgresql.org
Subject: Should the archiver process always make sure that the timeline history 
files exist in the archive?

Hello pgsql-hackers,

While testing out some WAL archiving and PITR scenarios, it was observed that
enabling WAL archiving for the first time on a primary that was on a timeline
higher than 1 would not initially archive the timeline history file for the
timeline it was currently on. While this might be okay for most use cases, there
are scenarios where this leads to unexpected failures that seem to expose some
flaws in the logic.

Scenario 1:
Take a backup of a primary on timeline 2 with `pg_basebackup -Xnone`. Create a
standby from that backup that continuously restores from the WAL archives; the
standby will not contain the timeline 2 history file. The standby
will operate normally, but if you try to create a cascading standby off it
using streaming replication, the cascading standby's WAL receiver will
continuously FATAL trying to request the timeline 2 history file that the main
standby does not have.

Scenario 2:
Take a backup of a primary on timeline 2 with `pg_basebackup -Xnone`. Then try
to create a new node by doing PITR with recovery_target_timeline set to
'current' or 'latest' which will succeed. However, doing PITR with
recovery_target_timeline = '2' will fail since it is unable to find the timeline
2 history file in the WAL archives. This is a bit contradictory: we allow
'current' and 'latest' to recover, but explicitly setting
recovery_target_timeline to the control file's timeline id ends in failure.
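Concretely, the only knob that differs between the succeeding and failing
recoveries is recovery_target_timeline (restore_command and paths below are
placeholders):

```
# postgresql.conf on the node restored from the -Xnone backup;
# /path/to/archive is a placeholder for the real WAL archive
restore_command = 'cp /path/to/archive/%f %p'

recovery_target_timeline = 'latest'     # succeeds
#recovery_target_timeline = 'current'   # also succeeds
#recovery_target_timeline = '2'         # fails: 00000002.history not in archive
```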

Attached is a patch containing two TAP tests that demonstrate the scenarios.

My questions are:
1. Why doesn't the archiver process try to archive timeline history files when
   WAL archiving is first configured, and/or continually check for them (maybe
   when the archiver process starts, before its main loop)?
2. Why does explicitly setting the recovery_target_timeline to the control
   file's timeline id not follow the same logic as recovery_target_timeline set
   to 'current'?
3. Why does a cascaded standby require the timeline history file of its control
   file's timeline id (startTLI) when the main replica is able to operate fine
   without the timeline history file?

Note that my initial observations came from testing with pgBackRest (copying
pg_wal/ during backup is disabled by default) but using `pg_basebackup -Xnone`
reproduced the issues similarly and is what I present in the TAP tests. At the
moment, the only workaround I can think of is to manually run the
archive_command on the missing timeline history file(s).
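As an illustration of that workaround (a sketch with placeholder paths,
assuming an archive_command of `cp %p /path/to/archive/%f`), the history file
can be copied into the archive by hand:

```shell
# Manually run the equivalent of archive_command on a timeline history
# file that never made it into the archive.  PGDATA and ARCHIVE are
# placeholders; this sketch creates throwaway directories so it can run
# anywhere.
PGDATA=${PGDATA:-$(mktemp -d)/data}
ARCHIVE=${ARCHIVE:-$(mktemp -d)/archive}
mkdir -p "$PGDATA/pg_wal" "$ARCHIVE"

HISTFILE=00000002.history   # history file for timeline 2
# Simulate the file existing in pg_wal (on a real system it already does).
[ -f "$PGDATA/pg_wal/$HISTFILE" ] || printf '1\t0/0\tdummy\n' > "$PGDATA/pg_wal/$HISTFILE"

# The manual "archive_command" invocation:
cp "$PGDATA/pg_wal/$HISTFILE" "$ARCHIVE/$HISTFILE"
```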

Are these valid issues that should be looked into or are they expected? Scenario
2 seems like it could be easily fixed if we determine that the
recovery_target_timeline numeric value is equal to the control file's timeline
id (compare rtli and recoveryTargetTLI in validateRecoveryParameters()?) but I
wasn't sure whether the opposite is true, i.e. whether 'current' and 'latest'
should instead be made to require retrieving the timeline history files, to
help prevent Scenario 1.

Regards,
Jimmy Yih


0001-Archive-current-timeline-history-file-on-archiver-st.patch
Description:  0001-Archive-current-timeline-history-file-on-archiver-st.patch


Fix typo in src/interfaces/libpq/po/zh_CN.po

2023-08-16 Thread Zhang Mingli
Hi. The Chinese words there are okay, but `Unix-domian` should be `Unix-domain`.

v1-0001-Fix-typo-src-interfaces-libpq-po-zh_CN.po.patch
Description: Binary data

Zhang Mingli
HashData  https://www.hashdata.xyz



Re: pg_logical_emit_message() misses a XLogFlush()

2023-08-16 Thread Michael Paquier
On Tue, Aug 15, 2023 at 09:16:53PM -0700, Andres Freund wrote:
> To be clear: I don't just object to backpatching, I also object to making
> existing invocations flush WAL in HEAD. I do not at all object to adding a
> parameter that indicates flushing, or a separate function to do so. The latter
> might be better, because it allows you to flush a group of messages, rather
> than a single one.

For the latter, am I getting it right that you mean a function
completely outside of the scope of LogLogicalMessage() and
pg_logical_emit_message()?  Say, a single pg_wal_flush(lsn)?

I am a bit concerned by that, because anybody calling LogLogicalMessage()
directly, or the existing SQL function, may never think about durability even
though they want a message flush in the code calling it.  Adding an argument
does not do much for the SQL function if it has a DEFAULT, but it does address
my first concern about the C part.

Anyway, attached is a patch to add a 4th argument "flush" that
defaults to false.  Thoughts about this version are welcome.
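For clarity, usage with the attached v3 would look something like this (the
fourth argument defaults to false, keeping the existing behavior):

```sql
-- Existing behavior: no flush; a non-transactional message may not yet
-- be on disk when the call returns.
SELECT pg_logical_emit_message(false, 'prefix', 'message');

-- With the proposed 4th argument: flush the WAL before returning,
-- making the non-transactional message durable.
SELECT pg_logical_emit_message(false, 'prefix', 'message', true);
```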
--
Michael
From c70c001539b91422d4090d4e02d10473536f9cc7 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 16 Aug 2023 16:49:52 +0900
Subject: [PATCH v3] Add flush argument to pg_logical_emit_message()

The default is false, to not flush messages.  If set to true, the
message is flushed before returning to the caller.
---
 src/include/catalog/pg_proc.dat   |  4 ++--
 src/include/replication/message.h |  3 ++-
 src/backend/catalog/system_functions.sql  | 20 +++
 .../replication/logical/logicalfuncs.c|  3 ++-
 src/backend/replication/logical/message.c | 13 ++--
 doc/src/sgml/func.sgml|  6 +-
 contrib/test_decoding/expected/messages.out   |  5 +++--
 contrib/test_decoding/sql/messages.sql|  5 +++--
 8 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 6996073989..142e0c3fea 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11139,11 +11139,11 @@
   prosrc => 'pg_replication_slot_advance' },
 { oid => '3577', descr => 'emit a textual logical decoding message',
   proname => 'pg_logical_emit_message', provolatile => 'v', proparallel => 'u',
-  prorettype => 'pg_lsn', proargtypes => 'bool text text',
+  prorettype => 'pg_lsn', proargtypes => 'bool text text bool',
   prosrc => 'pg_logical_emit_message_text' },
 { oid => '3578', descr => 'emit a binary logical decoding message',
   proname => 'pg_logical_emit_message', provolatile => 'v', proparallel => 'u',
-  prorettype => 'pg_lsn', proargtypes => 'bool text bytea',
+  prorettype => 'pg_lsn', proargtypes => 'bool text bytea bool',
   prosrc => 'pg_logical_emit_message_bytea' },
 
 # event triggers
diff --git a/src/include/replication/message.h b/src/include/replication/message.h
index 6ce7f2038b..0f168d572c 100644
--- a/src/include/replication/message.h
+++ b/src/include/replication/message.h
@@ -30,7 +30,8 @@ typedef struct xl_logical_message
 #define SizeOfLogicalMessage	(offsetof(xl_logical_message, message))
 
 extern XLogRecPtr LogLogicalMessage(const char *prefix, const char *message,
-	size_t size, bool transactional);
+	size_t size, bool transactional,
+	bool flush);
 
 /* RMGR API */
 #define XLOG_LOGICAL_MESSAGE	0x00
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 07c0d89c4f..714f5da941 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -446,6 +446,26 @@ LANGUAGE INTERNAL
 VOLATILE ROWS 1000 COST 1000
 AS 'pg_logical_slot_peek_binary_changes';
 
+CREATE OR REPLACE FUNCTION pg_catalog.pg_logical_emit_message(
+transactional boolean,
+prefix text,
+message text,
+flush boolean DEFAULT false)
+RETURNS pg_lsn
+LANGUAGE INTERNAL
+VOLATILE STRICT
+AS 'pg_logical_emit_message_text';
+
+CREATE OR REPLACE FUNCTION pg_catalog.pg_logical_emit_message(
+transactional boolean,
+prefix text,
+message bytea,
+flush boolean DEFAULT false)
+RETURNS pg_lsn
+LANGUAGE INTERNAL
+VOLATILE STRICT
+AS 'pg_logical_emit_message_bytea';
+
 CREATE OR REPLACE FUNCTION pg_create_physical_replication_slot(
 IN slot_name name, IN immediately_reserve boolean DEFAULT false,
 IN temporary boolean DEFAULT false,
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 55a24c02c9..a472a0d873 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -362,10 +362,11 @@ pg_logical_emit_message_bytea(PG_FUNCTION_ARGS)
 	bool		transactional = PG_GETARG_BOOL(0);
 	char	   *prefix = text_to_cstring(PG_GETARG_TEXT_PP(1));
 	bytea	   *data = PG_GETARG_BYTEA_PP(2);
+	bool		flush = PG_GETARG_BOOL(3);
 	XLogRecPtr	lsn;
 
 	lsn = LogLogicalMessage(

Re: Return value of pg_promote()

2023-08-16 Thread Michael Paquier
On Thu, Jun 08, 2023 at 04:53:50PM +0530, Ashutosh Sharma wrote:
> Thanks for sharing your thoughts, Laurenz and Fujii-san. I've prepared
> a patch that makes pg_promote error out if it couldn't send SIGUSR1 to
> the postmaster or if the postmaster died in the middle of standby
> promotion. PFA. Please note that now (with this patch) pg_promote only
> returns false if the standby could not be promoted within the given
> wait time. In case of any kind of failure, it just reports an error
> based on the type of failure that occurred.

 if (kill(PostmasterPid, SIGUSR1) != 0)
 {
-ereport(WARNING,
-(errmsg("failed to send signal to postmaster: %m")));
 (void) unlink(PROMOTE_SIGNAL_FILE);
-PG_RETURN_BOOL(false);
+ereport(ERROR,
+(errmsg("failed to send signal to postmaster: %m")));
 }

Shouldn't you assign an error code to this one rather than the
default one for internal errors, like ERRCODE_SYSTEM_ERROR?
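That is, something like the following fragment (not compilable on its own):

```
ereport(ERROR,
        (errcode(ERRCODE_SYSTEM_ERROR),
         errmsg("failed to send signal to postmaster: %m")));
```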

 /* return immediately if waiting was not requested */
@@ -744,7 +743,9 @@ pg_promote(PG_FUNCTION_ARGS)
  * necessity for manual cleanup of all postmaster children.
  */
 if (rc & WL_POSTMASTER_DEATH)
-PG_RETURN_BOOL(false);
+ereport(FATAL,
+(errcode(ERRCODE_ADMIN_SHUTDOWN),
+ errmsg("terminating connection due to unexpected postmaster exit")));

I would add an errcontext here, to let somebody know that the
connection died while waiting for the promotion to be processed, say
"while waiting on promotion".
--
Michael




Re: cataloguing NOT NULL constraints

2023-08-16 Thread Peter Eisentraut

On 15.08.23 11:57, Dean Rasheed wrote:

Something else I noticed when reading the SQL standard is that a
user-defined CHECK (col IS NOT NULL) constraint should be recognised
by the system as also making the column not null (setting its
"nullability characteristic" to "known not nullable"). I think that's
more than just an artefact of how they say NOT NULL constraints should
be implemented, because the effect of such a CHECK constraint should
be exposed in the "columns" view of the information schema -- the
value of "is_nullable" should be "NO" if the column is "known not
nullable".
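The divergence Dean describes can be observed directly (a quick illustration;
PostgreSQL currently derives is_nullable only from the column's NOT NULL
marking, so the CHECK constraint is not reflected):

```sql
CREATE TABLE t_nn (a int CHECK (a IS NOT NULL));

SELECT is_nullable
  FROM information_schema.columns
 WHERE table_name = 't_nn' AND column_name = 'a';
-- currently returns YES; under the standard's "known not nullable"
-- reading it would be NO
```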


Nullability determination is different from not-null constraints.  The 
nullability characteristic of a column can be derived from multiple 
sources, including not-null constraints, check constraints, primary key 
constraints, domain constraints, as well as more complex rules in case 
of views, joins, etc.  But this is all distinct and separate from the 
issue of not-null constraints that we are discussing here.





Re: pgbench - adding pl/pgsql versions of tests

2023-08-16 Thread Fabien COELHO



Hello Nathan,


I'm unclear about what variety of scripts could be provided given the tables
made available with pgbench. ISTM that other scenarios would involve both an
initialization and associated scripts, and any such proposal would be barred
because it would open the door to anything.


Why's that?


Just a wild guess based on 19 years of occasional contributions to pg and 
pgbench in particular:-)


I'm not aware of any project policy that prohibits such enhancements to 
pgbench.


Attempts at extending pgbench often fall under "you can do it outside (eg
with a custom script), so there is no need to put it in pgbench, as it would
add to the maintenance burden with a weak benefit, proven by the fact that it
is not there already".


It might take some effort to gather consensus on a proposal like this, 
but IMHO that doesn't mean we shouldn't try.


Done it in the past. Probably will do it again in the future:-)

If the prevailing wisdom is that we shouldn't add more built-in scripts 
because there is an existing way to provide custom ones, then it's not 
clear that we should proceed with $SUBJECT, anyway.


I'm afraid there is that argument. I do not think that this policy is good
wrt $SUBJECT. ISTM that having an easy way to test something with a PL/pgSQL
function would help promote the language by advertising/showing the potential
performance benefit (or not, depending). Just one function would be enough
for that.
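For instance, a single function wrapping the standard TPC-B-like transaction
is roughly what such a built-in could look like (a sketch only, not an actual
patch; it assumes the standard pgbench_* tables created by "pgbench -i"):

```sql
-- PL/pgSQL version of pgbench's TPC-B-like transaction.
CREATE OR REPLACE FUNCTION pgbench_transaction(
    p_aid int, p_bid int, p_tid int, p_delta int) RETURNS int
LANGUAGE plpgsql AS $$
DECLARE
    v_abalance int;
BEGIN
    UPDATE pgbench_accounts SET abalance = abalance + p_delta WHERE aid = p_aid;
    SELECT abalance INTO v_abalance FROM pgbench_accounts WHERE aid = p_aid;
    UPDATE pgbench_tellers  SET tbalance = tbalance + p_delta WHERE tid = p_tid;
    UPDATE pgbench_branches SET bbalance = bbalance + p_delta WHERE bid = p_bid;
    INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
        VALUES (p_tid, p_bid, p_aid, p_delta, CURRENT_TIMESTAMP);
    RETURN v_abalance;
END;
$$;
```

A matching custom script (run with something like `pgbench -f tpcb-func.sql`)
would \set :aid, :bid, :tid and :delta with the usual random() ranges and end
with SELECT pgbench_transaction(:aid, :bid, :tid, :delta);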


--
Fabien.




regexp_replace weirdness amounts to a bug?

2023-08-16 Thread Erik Rijkers

Hello,

The following surprised me enough to think it might be a bug:
(17devel)

select
   regexp_replace('Abc Def'
  , '([a-z]) ([A-Z])'
  , '\1 ' || lower('\2')  );

regexp_replace

 Abc Def
(1 row)

-- 'Abc Def' got
-- 'Abc def' expected

What do you think?

Thanks,

Erik Rijkers





Re: regexp_replace weirdness amounts to a bug?

2023-08-16 Thread Malthe
Hi Erik,

The regexp doesn't match your string because you're not allowing for any
repeat characters; try adding a '+'.

On Wed, 16 Aug 2023 at 09:07, Erik Rijkers  wrote:
>
> Hello,
>
> The following surprised me enough to think it might be a bug:
> (17devel)
>
> select
> regexp_replace('Abc Def'
>, '([a-z]) ([A-Z])'
>, '\1 ' || lower('\2')  );
>
> regexp_replace
> 
>   Abc Def
> (1 row)
>
> -- 'Abc Def' got
> -- 'Abc def' expected
>
> What do you think?
>
> Thanks,
>
> Erik Rijkers
>
>
>




Re: Normalization of utility queries in pg_stat_statements

2023-08-16 Thread jian he
On Wed, Mar 8, 2023 at 2:19 PM Michael Paquier  wrote:
>
> On Fri, Mar 03, 2023 at 09:37:27AM +0900, Michael Paquier wrote:
> > Thanks for double-checking, applied 0001 to finish this part of the
> > work.  I am attaching the remaining bits as of the attached, combined
> > into a single patch.
>
> Doing so as a single patch was not feeling right as this actually
> fixes issues with the location calculations for the Const node, so I
> have split that into three commits and finally applied the whole.
>
> As a bonus, please see attached a patch to apply the normalization to
> CALL statements using the new automated infrastructure.  OUT
> parameters can be passed to a procedure, hence I guess that these had
> better be silenced as well.  This is not aimed at being integrated,
> just for reference.
> --
> Michael

I tested it. All works fine normally; there is only one corner case:
set pg_stat_statements.track =  'all';
drop table if Exists cp_test;
CREATE TABLE cp_test (a int, b text);
CREATE or REPLACE PROCEDURE ptest1(x text) LANGUAGE SQL AS $$ INSERT
INTO cp_test VALUES (1, x); $$;

CREATE or REPLACE PROCEDURE ptest3(y text)
LANGUAGE SQL
AS $$
CALL ptest1(y);
CALL ptest1($1);
$$;
SELECT pg_stat_statements_reset();

CALL ptest3('b');

SELECT calls, toplevel, rows, query FROM pg_stat_statements ORDER BY
query COLLATE "C";
returns:
 calls | toplevel | rows |   query
---+--+--+
 1 | t|0 | CALL ptest3($1)
 2 | f|2 | INSERT INTO cp_test VALUES ($2, x)
 1 | t|1 | SELECT pg_stat_statements_reset()

Here, the intermediate CALL part is optimized away. Or should I expect
CALL ptest1($1) to also appear in pg_stat_statements?




Add 'worker_type' to pg_stat_subscription

2023-08-16 Thread Peter Smith
Hi hackers,

Earlier this year I proposed a small change for the pg_stat_subscription view:

--
...it would be very useful to have an additional "kind" attribute for
this view. This will save the user from needing to do mental
gymnastics every time just to recognise what kind of process they are
looking at.
--

At that time Amit replied [1] that this could be posted as a separate
enhancement thread.

Now that LogicalRepWorkerType has recently been pushed [2] (a change in the
same area of the code), it seemed the right time to resurrect my
pg_stat_subscription proposal.

PSA patch v1.

Thoughts?

--
[1] 
https://www.postgresql.org/message-id/CAA4eK1JO54%3D3s0KM9iZGSrQmmfzk9PEOKkW8TXjo2OKaKrSGCA%40mail.gmail.com
[2] 
https://github.com/postgres/postgres/commit/2a8b40e3681921943a2989fd4ec6cdbf8766566c

Kind Regards,
Peter Smith.
Fujitsu Australia


v1-0001-Add-worker_type-to-pg_stat_subscription.patch
Description: Binary data


Re: Allow parallel plan for referential integrity checks?

2023-08-16 Thread Juan José Santamaría Flecha
On Thu, Aug 10, 2023 at 5:06 PM Juan José Santamaría Flecha <
juanjo.santama...@gmail.com> wrote:

> On Tue, Jul 4, 2023 at 9:45 AM Daniel Gustafsson  wrote:
>
>> As there is no new patch submitted I will go ahead and do that, please
>> feel
>> free to resubmit when there is renewed interest in working on this.
>>
>
> Recently I restored a database from a directory format backup and having
> this feature would have been quite useful. So, I would like to resume the
> work on this patch, unless OP or someone else is already on it.
>

Since no one has advised me otherwise, I'll pick up the work on the patch.
First, I'll try to address all previous comments.

On Mon, Feb 14, 2022 at 3:34 PM Robert Haas  wrote:

> On Mon, Feb 7, 2022 at 5:26 AM Frédéric Yhuel 
> wrote:
> > I noticed that referential integrity checks aren't currently
> > parallelized. Is it on purpose?
>
> It's not 100% clear to me that it is safe. But on the other hand, it's
> also not 100% clear to me that it is unsafe.
>
> Generally, I only added CURSOR_OPT_PARALLEL_OK in places where I was
> confident that nothing bad would happen, and this wasn't one of those
> places. It's something of a nested-query environment -- your criterion
> #6. How do we know that the surrounding query isn't already parallel?
> Perhaps because it must be DML, but I think it must be more important
> to support parallel DML than to support this.
>

Currently I don't see a scenario where the validation might run inside
another parallel query, but I could be missing something. Suggestions on
some edge cases to test are more than welcome.

We already have a case where parallel operations can occur when adding a
table constraint, and that's the index creation for a primary key. Take for
example:

postgres=# SET max_parallel_maintenance_workers TO 4;
postgres=# SET parallel_setup_cost TO 0;
postgres=# SET parallel_tuple_cost TO 0;
postgres=# SET parallel_leader_participation TO 0;
postgres=# SET min_parallel_table_scan_size TO 0;
postgres=# SET client_min_messages TO debug1;
postgres=# CREATE TABLE parallel_pk_table (a int) WITH (autovacuum_enabled
= off);
postgres=# ALTER TABLE parallel_pk_table ADD PRIMARY KEY (a);
DEBUG:  ALTER TABLE / ADD PRIMARY KEY will create implicit index
"parallel_pk_table_pkey" for table "parallel_pk_table"
DEBUG:  building index "parallel_pk_table_pkey" on table
"parallel_pk_table" with request for 1 parallel workers
...
DEBUG:  index "parallel_pk_table_pkey" can safely use deduplication
DEBUG:  verifying table "parallel_pk_table"

This should be better documented. Maybe "parallel index build" [1] could
have its own section, so it can be cited from other sections?


> I'm not sure what the right thing to do here is, but I think it would
> be good if your patch included a test case.
>

I have included a basic test for parallel ADD PRIMARY KEY and ADD FOREIGN
KEY.

On Sat, Mar 19, 2022 at 1:57 AM Imseih (AWS), Sami 
wrote:

> It would be valuable to add logging to ensure that the ActiveSnapshot and
> TransactionSnapshot
> is the same for the leader and the workers. This logging could be tested
> in the TAP test.
>

That machinery is used in any parallel query; I'm not sure this patch should
be responsible for carrying that requirement.

> Also, inside RI_Initial_Check you may want to set max_parallel_workers to
> max_parallel_maintenance_workers.
>
> Currently the work_mem is set to maintenance_work_mem. This will also
> require
> a doc change to call out.
>

Done.

PFA the patch.

[1] https://www.postgresql.org/docs/current/sql-createindex.html

Regards,

Juan José Santamaría Flecha


v2-0001-Allow-parallel-plan-for-referential-integrity-checks.patch
Description: Binary data


Re: postgres_fdw: wrong results with self join + enable_nestloop off

2023-08-16 Thread Etsuro Fujita
Hi,

On Tue, Aug 15, 2023 at 11:02 PM Önder Kalacı  wrote:
> The commit[1] seems to break some queries in Citus[2], which is an extension 
> which relies on set_join_pathlist_hook.
>
> Although the comment says /*Finally, give extensions a chance to manipulate 
> the path list.*/  we use it to extract lots of information about the joins 
> and do the planning based on the information.
>
> Now,  for some joins where consider_join_pushdown=false, we cannot get the 
> information that we used to get, which prevents doing distributed planning 
> for certain queries.
>
> We wonder if it is possible to allow extensions to access the join info under 
> all circumstances, as it used to be? Basically, removing the additional check:
>
> diff --git a/src/backend/optimizer/path/joinpath.c 
> b/src/backend/optimizer/path/joinpath.c
> index 03b3185984..080e76cbe9 100644
> --- a/src/backend/optimizer/path/joinpath.c
> +++ b/src/backend/optimizer/path/joinpath.c
> @@ -349,8 +349,7 @@ add_paths_to_joinrel(PlannerInfo *root,
> /*
>  * 6. Finally, give extensions a chance to manipulate the path list.
>  */
> -   if (set_join_pathlist_hook &&
> -   consider_join_pushdown)
> +   if (set_join_pathlist_hook)
> set_join_pathlist_hook(root, joinrel, outerrel, innerrel,
>jointype, &extra);

Maybe we could do so by leaving to extensions the decision whether
they replace joins with pseudoconstant clauses, but I am not sure that
that is a good idea, because that would require the authors to modify
and recompile their extensions to fix the issue...  So I fixed the
core side.

I am not familiar with the Citus extension, but are such pseudoconstant
clauses handled within it?

Thanks for the report!

Best regards,
Etsuro Fujita




Re: Test case for parameterized remote path in postgres_fdw

2023-08-16 Thread Etsuro Fujita
Hi Richard,

On Wed, Aug 16, 2023 at 9:41 AM Richard Guo  wrote:
> On Tue, Aug 15, 2023 at 7:50 PM Etsuro Fujita  wrote:
>> So we should have modified the second one as well?  Attached is a
>> small patch for that.

> Agreed, nice catch!  +1 to the patch.

Thanks for looking!

Best regards,
Etsuro Fujita




Re: pg_logical_emit_message() misses a XLogFlush()

2023-08-16 Thread Tomas Vondra



On 8/16/23 06:13, Andres Freund wrote:
> Hi,
> 
> On 2023-08-16 03:20:53 +0200, Tomas Vondra wrote:
>> On 8/16/23 02:33, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2023-08-16 06:58:56 +0900, Michael Paquier wrote:
 On Tue, Aug 15, 2023 at 11:37:32AM +0200, Tomas Vondra wrote:
> Shouldn't the flush be done only for non-transactional messages? The
> transactional case will be flushed by regular commit flush.

 Indeed, that would be better.  I am sending an updated patch.

 I'd like to backpatch that, would there be any objections to that?
>>>
>>> Yes, I object. This would completely cripple the performance of some uses of
>>> logical messages - a slowdown of several orders of magnitude. It's not clear
>>> to me that flushing would be the right behaviour if it weren't released, but
>>> it certainly doesn't seem right to make such a change in a minor release.
>>>
>>
>> So are you objecting to adding the flush in general, or just to the
>> backpatching part?
> 
> Both, I think. I don't object to adding a way to trigger flushing, but I think
> it needs to be optional.
> 
> 
>> IMHO we either guarantee durability of non-transactional messages, in which
>> case this would be a clear bug - and I'd say a fairly serious one.  I'm
>> curious what the workload that'd see order of magnitude of slowdown does
>> with logical messages I've used it, but even if such workload exists, would
>> it really be enough to fix any other durability bug?
> 
> Not sure what you mean with the last sentence?
> 

Sorry, I meant to write "enough not to fix any other durability bug".

That is, either we must not lose messages, in which case it's a bug and
we should fix that. Or it's acceptable for the intended use cases, in
which case there's nothing to fix.

To me losing messages seems like a bad thing, but if the users are aware
of it and are fine with it ... I'm simply arguing that if we conclude
this is a durability bug, we should not leave it unfixed because it
might have performance impact.

> I've e.g. used non-transactional messages for:
> 
> - A non-transactional queuing system. Where sometimes one would dump a portion
>   of tables into messages, with something like
>   SELECT pg_logical_emit_message(false, 'app:', to_json(r)) FROM r;
>   Obviously flushing after every row would be bad.
> 
>   This is useful when you need to coordinate with other systems in a
>   non-transactional way. E.g. letting other parts of the system know that
>   files on disk (or in S3 or ...) were created/deleted, since a database
>   rollback wouldn't unlink/revive the files.
> 
> - Audit logging, when you want to log in a way that isn't undone by rolling
>   back transaction - just flushing every pg_logical_emit_message() would
>   increase the WAL flush rate many times, because instead of once per
>   transaction, you'd now flush once per modified row. It'd basically make it
>   impractical to use for such things.
> 
> - Optimistic locking. Emitting things that need to be locked on logical
>   replicas, to be able to commit on the primary. A pre-commit hook would wait
>   for the WAL to be replayed sufficiently - but only once per transaction, not
>   once per object.
> 

How come the possibility of losing messages is not an issue for these
use cases? I mean, surely auditors would not like that, and I guess
forgetting locks might be bad too.

> 
>> Or perhaps we don't want to guarantee durability for such messages, in
>> which case we don't need to fix it at all (even in master).
> 
> Well, I can see adding an option to flush, or perhaps a separate function to
> flush, to master.
>>
>> The docs are not very clear on what to expect, unfortunately. It says
>> that non-transactional messages are "written immediately" which I could
>> interpret in either way.
> 
> Yea, the docs certainly should be improved, regardless what we end up with.
> 

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: regexp_replace weirdness amounts to a bug?

2023-08-16 Thread Alvaro Herrera
On 2023-Aug-16, Erik Rijkers wrote:

> Hello,
> 
> The following surprised me enough to think it might be a bug:
> (17devel)
> 
> select
>regexp_replace('Abc Def'
>   , '([a-z]) ([A-Z])'
>   , '\1 ' || lower('\2')  );
> 
> regexp_replace
> 
>  Abc Def

What's happening here is that lower() is applied to the literal \2, and the
expansion of \2 to 'D' occurs afterwards, when lower() has already executed.
already executed.  Note this other example, where the literal part of
the replacement string is correctly lowercased:

select
   regexp_replace('Abc Def'
  , '([a-z]) ([A-Z])'
  , '\1 ' || lower('D\2D'));
 regexp_replace 

 Abc dDdef
(1 row)

I don't know how to achieve what you want.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
#error "Operator lives in the wrong universe"
  ("Use of cookies in real-time system development", M. Gleixner, M. Mc Guire)




Re: cataloguing NOT NULL constraints

2023-08-16 Thread Peter Eisentraut

I have two small patches that you can integrate into your patch set:

The first just changes the punctuation of "Not-null constraints" in the 
psql output to match what the documentation mostly uses.


The second has some changes to ddl.sgml to reflect that not-null 
constraints are now named and can be operated on like other constraints. 
 You might want to read that again to make sure it matches your latest 
intentions, but I think it catches all the places that are required to 
change.

From 324f0050eee51c47e4c558867e6cc832652b39bb Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 16 Aug 2023 10:26:09 +0200
Subject: [PATCH 1/2] fixup! Have psql print the NOT NULL constraints on \d+

---
 contrib/test_decoding/expected/ddl.out|  8 +-
 src/bin/psql/describe.c   |  2 +-
 src/test/regress/expected/create_table.out|  6 +-
 .../regress/expected/create_table_like.out| 10 +--
 src/test/regress/expected/foreign_data.out| 84 +--
 src/test/regress/expected/generated.out   |  2 +-
 src/test/regress/expected/identity.out|  2 +-
 src/test/regress/expected/publication.out |  6 +-
 .../regress/expected/replica_identity.out |  6 +-
 src/test/regress/expected/rowsecurity.out |  2 +-
 10 files changed, 64 insertions(+), 64 deletions(-)

diff --git a/contrib/test_decoding/expected/ddl.out b/contrib/test_decoding/expected/ddl.out
index 95a0722c33..bcd1f74b2b 100644
--- a/contrib/test_decoding/expected/ddl.out
+++ b/contrib/test_decoding/expected/ddl.out
@@ -492,7 +492,7 @@ WITH (user_catalog_table = true)
  options  | text[]  |   |  |   
   | extended |  | 
 Indexes:
 "replication_metadata_pkey" PRIMARY KEY, btree (id)
-Not null constraints:
+Not-null constraints:
 "replication_metadata_id_not_null" NOT NULL "id"
 "replication_metadata_relation_not_null" NOT NULL "relation"
 Options: user_catalog_table=true
@@ -509,7 +509,7 @@ ALTER TABLE replication_metadata RESET (user_catalog_table);
  options  | text[]  |   |  |   
   | extended |  | 
 Indexes:
 "replication_metadata_pkey" PRIMARY KEY, btree (id)
-Not null constraints:
+Not-null constraints:
 "replication_metadata_id_not_null" NOT NULL "id"
 "replication_metadata_relation_not_null" NOT NULL "relation"
 
@@ -525,7 +525,7 @@ ALTER TABLE replication_metadata SET (user_catalog_table = true);
  options  | text[]  |   |  |   
   | extended |  | 
 Indexes:
 "replication_metadata_pkey" PRIMARY KEY, btree (id)
-Not null constraints:
+Not-null constraints:
 "replication_metadata_id_not_null" NOT NULL "id"
 "replication_metadata_relation_not_null" NOT NULL "relation"
 Options: user_catalog_table=true
@@ -547,7 +547,7 @@ ALTER TABLE replication_metadata SET (user_catalog_table = false);
  rewritemeornot | integer |   |  | 
 | plain|  | 
 Indexes:
 "replication_metadata_pkey" PRIMARY KEY, btree (id)
-Not null constraints:
+Not-null constraints:
 "replication_metadata_id_not_null" NOT NULL "id"
 "replication_metadata_relation_not_null" NOT NULL "relation"
 Options: user_catalog_table=false
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index d1dc8fa066..4d36e0cfd8 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -3072,7 +3072,7 @@ describeOneTableDetails(const char *schemaname,
tuples = PQntuples(result);
 
if (tuples > 0)
-   printTableAddFooter(&cont, _("Not null constraints:"));
+   printTableAddFooter(&cont, _("Not-null constraints:"));
 
/* Might be an empty set - that's ok */
for (i = 0; i < tuples; i++)
diff --git a/src/test/regress/expected/create_table.out b/src/test/regress/expected/create_table.out
index 3f6516c3f8..477e8839e9 100644
--- a/src/test/regress/expected/create_table.out
+++ b/src/test/regress/expected/create_table.out
@@ -854,7 +854,7 @@ drop table test_part_coll_posix;
  b  | integer |   | not null | 1   | plain|  | 
 Partition of: parted FOR VALUES IN ('b')
 Partition constraint: ((a IS NOT NULL) AND (a = 'b'::text))
-Not null constraints:
+Not-null constraints:
 "part_b_b_not_null" NOT NULL "b"
 
 -- Both partition bound and partition key in describe output
@@ -867,7 +867,7 @@ Not null constraints:
 Partition of: parted FOR VALUES IN ('c')
 Partition constraint: ((a IS NOT NULL) AND (a = 'c'::text))
 Partition key: RANGE (b)
-Not null constraints:
+Not-null constraints:
 "part_c_b_not_null" NOT NULL "b"
 Partitions: part_c_1_10 FOR VALUES FROM (1) TO (10)
 
@@ -880,7 +880,7 @@ Partitions: part_c

Re: Todo: Teach planner to evaluate multiple windows in the optimal order

2023-08-16 Thread John Naylor
On Mon, Jul 3, 2023 at 5:15 PM David Rowley  wrote:
>
> On Fri, 30 Jun 2023 at 18:45, John Naylor 
wrote:

> I looked over your patch and don't see anything to report aside from
> the unfinished/undecided part around the tiebreak function for
> tuplesort_begin_index_hash().

I went ahead and added a degenerate function, just for consistency -- might
also head off possible alarms from code analysis tools.

> I also ran the benchmark script [1] with the patch from [2] and
> calculated the speedup with [2] with and without your v3 patch. I've
> attached two graphs with the benchmark results. Any value >100%
> indicates that performing the sort for the ORDER BY at the same time
> as the WindowAgg improves performance, whereas anything < 100%
> indicates a regression.  The bars in blue show the results without
> your v3 patch and the red bars show the results with your v3 patch.
> Looking at the remaining regressions it does not really feel like
> we've found the culprit for the regressions.  Certainly, v3 helps, but
> I just don't think it's to the level we'd need to make the window sort
> changes a good idea.
>
> I'm not sure exactly how best to proceed here. I think the tiebreak
> stuff is worth doing regardless, so maybe that can just go in to
> eliminate that as a factor and we or I can continue to see what else
> is to blame.

Thanks for testing again. Sounds good, I removed a now-invalidated comment,
pgindent'd, and pushed.

--
John Naylor
EDB: http://www.enterprisedb.com


RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-16 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

> > > It was primarily for upgrade purposes only. So, as we can't see a good
> > > reason to go via pg_dump let's do it in upgrade unless someone thinks
> > > otherwise.
> >
> > Removed the new option in pg_dump and modified pg_upgrade to directly
> > use the slot info to restore the slots in the new cluster.
> 
> In this version, creations of logical slots are serialized, whereas old ones
> were parallelised per db. Do you think it should be parallelized again? I
> have tested locally and it felt harmless. Also, this approach allows logging
> the executed SQLs.

I updated the patch to allow parallel execution. Workers are launched per
slot; each one connects to the new node via psql and executes
pg_create_logical_replication_slot().
Moreover, the following points were changed for 0002.

* Ensured to log executed SQLs for creating slots.
* Fixed an issue where 'unreserved' slots could not be upgraded. This change
  was not an expected one. Related discussion is at [1].
* Added checks for output plugin libraries. pg_upgrade ensures that plugins
  referred to by old slots are installed in the new executable directory.

[1]: 
https://www.postgresql.org/message-id/TYAPR01MB5866FD3F7992A46D0457F0E6F50BA%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v21-0001-Always-persist-to-disk-logical-slots-during-a-sh.patch
Description:  v21-0001-Always-persist-to-disk-logical-slots-during-a-sh.patch


v21-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
Description:  v21-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch


v21-0003-pg_upgrade-Add-check-function-for-logical-replic.patch
Description:  v21-0003-pg_upgrade-Add-check-function-for-logical-replic.patch


Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-08-16 Thread John Naylor
On Tue, Aug 15, 2023 at 6:53 PM John Naylor 
wrote:
>
> On Tue, Aug 15, 2023 at 9:34 AM Masahiko Sawada 
wrote:
>
> > BTW cfbot reported that some regression tests failed due to OOM. I've
> > attached the patch to fix it.
>
> Seems worth doing now rather than later, so added this and squashed most
of the rest together.

This segfaults because of a mistake fixing a rebase conflict, so v40
attached.

--
John Naylor
EDB: http://www.enterprisedb.com


v40-ART.tar.gz
Description: application/gzip


Re: Performance degradation on concurrent COPY into a single relation in PG16.

2023-08-16 Thread Alvaro Herrera
Hello,

On 2023-Aug-12, Andres Freund wrote:

> On 2023-08-08 12:45:05 +0900, Masahiko Sawada wrote:

> > > Any chance you could re-run your benchmark? I don't see as much of a
> > > regression vs 16 as you...
> > 
> > Sure. The results are promising for me too:
> >
> >  nclients = 1, execution time = 13.743
> >  nclients = 2, execution time = 7.552
> >  nclients = 4, execution time = 4.758
> >  nclients = 8, execution time = 3.035
> >  nclients = 16, execution time = 2.172
> >  nclients = 32, execution time = 1.959
> > nclients = 64, execution time = 1.819
> > nclients = 128, execution time = 1.583
> > nclients = 256, execution time = 1.631
> 
> Nice. We are consistently better than both 15 and "post integer parsing 16".

Since the wins from this patch were replicated and it has been pushed, I
understand that this open item can be marked as closed, so I've done
that.

Thanks,

-- 
Álvaro Herrera               Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"There are two moments in a man's life in which he should not
speculate: when he can afford it and when he cannot" (Mark Twain)




Re: walsender "wakeup storm" on PG16, likely because of bc971f4025c (Optimize walsender wake up logic using condition variables)

2023-08-16 Thread Antonin Houska
Thomas Munro  wrote:

> On Tue, Aug 15, 2023 at 2:23 AM Tomas Vondra
>  wrote:
> > I'm not familiar with the condition variable code enough to have an
> > opinion, but the patch seems to resolve the issue for me - I can no
> > longer reproduce the high CPU usage.
> 
> Thanks, pushed.

I am trying to understand this patch (commit 5ffb7c7750) because I use a
condition variable in an extension. One particular problem occurred to me;
please consider:

ConditionVariableSleep() gets interrupted, so AbortTransaction() calls
ConditionVariableCancelSleep(), but the signal was sent in between. Shouldn't
at least AbortTransaction() and AbortSubTransaction() check the return value
of ConditionVariableCancelSleep(), and re-send the signal if needed?

Note that I'm just thinking about such a problem, did not try to reproduce it.
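Sketched as pseudocode, the check being asked about would look roughly like
this (an illustration of the question only, not tested code; the API names
match the condition-variable interface discussed in the thread, but where
exactly such a check would live is precisely what is being asked):

```
/* pseudocode, e.g. somewhere in AbortTransaction() / AbortSubTransaction() */
if (ConditionVariableCancelSleep())     /* true: we absorbed a signal */
    ConditionVariableSignal(cv);        /* forward it to another waiter */
```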

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-16 Thread Zhijie Hou (Fujitsu)
On Wednesday, August 16, 2023 6:25 PM Kuroda, Hayato/黒田 隼人 wrote:
> 
> Dear hackers,
> 
> > > > It was primarily for upgrade purposes only. So, as we can't see a
> > > > good reason to go via pg_dump let's do it in upgrade unless someone
> > > > thinks otherwise.
> > >
> > > Removed the new option in pg_dump and modified pg_upgrade to directly
> > > use the slot info to restore the slots in the new cluster.
> >
> > In this version, creations of logical slots are serialized, whereas
> > old ones were parallelised per db. Do you think it should be parallelized
> > again? I have tested locally and it felt harmless. Also, this approach
> > allows logging the executed SQLs.
> 
> I updated the patch to allow parallel execution. Workers are launched per
> slot; each one connects to the new node via psql and executes
> pg_create_logical_replication_slot().
> Moreover, the following points were changed for 0002.
> 
> * Ensured to log executed SQLs for creating slots.
> * Fixed an issue where 'unreserved' slots could not be upgraded. This change
>   was not an expected one. Related discussion is at [1].
> * Added checks for output plugin libraries. pg_upgrade ensures that plugins
>   referred to by old slots are installed in the new executable directory.


Thanks for updating the patch! Here are a few comments:

1.

+static void
+create_logical_replication_slots(void)
...
+   query = createPQExpBuffer();
+   escaped = createPQExpBuffer();
+   conn = connectToServer(&new_cluster, old_db->db_name);

Since the connection here is not used anymore, I think we can remove it.

2.

+static void
+create_logical_replication_slots(void)
...
+   /* update new_cluster info again */
+   get_logical_slot_infos(&new_cluster);
+}

Do we need to get the new slots again after restoring?

3.

+   snprintf(query, sizeof(query),
+"SELECT slot_name, plugin, two_phase "
+"FROM pg_catalog.pg_replication_slots "
+"WHERE database = current_database() AND temporary = 
false "
+"AND wal_status <> 'lost';");
+
+   res = executeQueryOrDie(conn, "%s", query);
+

Instead of building the query in a new variable, can we directly put the SQL in 
executeQueryOrDie()
e.g.
executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase ...");


4.
+int            num_slots_on_old_cluster;

Instead of a new global variable, would it be better to record this in the 
cluster info ?


5.

char            sql_file_name[MAXPGPATH],
                log_file_name[MAXPGPATH];
+
DbInfo *old_db = &old_cluster.dbarr.dbs[dbnum];

There is an extra change here.

6.
+   for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
..
+   /* reap all children */
+   while (reap_child(true) == true)
+   ;
+   }

Maybe we can move the "while (reap_child(true) == true)" out of the for() loop?

Best Regards,
Hou zj


Re: WIP: new system catalog pg_wait_event

2023-08-16 Thread Drouvot, Bertrand

Hi,

On 8/16/23 8:22 AM, Michael Paquier wrote:

On Wed, Aug 16, 2023 at 07:04:53AM +0200, Drouvot, Bertrand wrote:

I'd prefer the singular form. There are a lot of places where it's already used
(pg_database, pg_user, pg_namespace... to name a few) and it looks like uses of
the plural form are the exceptions.


Okay, fine by me.  


Great, singular form is being used in v6 attached.


Also, I would remove the "wait_event_" prefixes to
the field names for the attribute names.



It looks like it has already been done in v5.


Yeah, now that af720b4c50 is done, I'll add the custom wait events handling
in v6.


Thanks.  I guess that the hash tables had better remain local to
wait_event.c.


Yeah, agree, done that way in v6 (also added a test in 001_worker_spi.pl
to ensure that "worker_spi_main" is reported in pg_wait_event).

I added a COLLATE "C" to the "order by" clause of the test that we added in
src/test/regress/sql/sysviews.sql (the check was failing on my environment).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From f55bc01ca03b80116664b03b0174bc97dc01b6b9 Mon Sep 17 00:00:00 2001
From: bdrouvotAWS 
Date: Sat, 5 Aug 2023 12:39:42 +
Subject: [PATCH v6] Add catalog pg_wait_event

Adding a new system view, namely pg_wait_event, that describes the wait
events.
---
 doc/src/sgml/system-views.sgml| 64 +
 src/backend/Makefile  |  3 +-
 src/backend/catalog/system_views.sql  |  3 +
 src/backend/utils/activity/.gitignore |  1 +
 src/backend/utils/activity/Makefile   |  8 +-
 .../activity/generate-wait_event_types.pl | 46 -
 src/backend/utils/activity/meson.build|  1 +
 src/backend/utils/activity/wait_event.c   | 50 --
 src/backend/utils/activity/wait_event_funcs.c | 94 +++
 src/include/catalog/pg_proc.dat   |  6 ++
 src/include/utils/meson.build |  4 +-
 src/include/utils/wait_event.h|  7 ++
 .../modules/worker_spi/t/001_worker_spi.pl|  6 ++
 src/test/regress/expected/rules.out   |  4 +
 src/test/regress/expected/sysviews.out| 16 
 src/test/regress/sql/sysviews.sql |  4 +
 src/tools/msvc/Solution.pm|  3 +-
 src/tools/msvc/clean.bat  |  1 +
 18 files changed, 304 insertions(+), 17 deletions(-)
  15.7% doc/src/sgml/
  62.3% src/backend/utils/activity/
   3.6% src/include/catalog/
   4.2% src/include/utils/
   4.6% src/test/regress/expected/
   4.6% src/test/
   3.1% src/tools/msvc/

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 57b228076e..b2ed309fe2 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -221,6 +221,11 @@
   views
  
 
+ 
+  pg_wait_event
+  wait events
+ 
+
 

   
@@ -4825,4 +4830,63 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
   
  
 
+
+ 
+  pg_wait_event
+
+  
+   pg_wait_event
+  
+
+  
+   The view pg_wait_event provides description about 
the
+   wait events.
+  
+
+  
+   pg_wait_event Columns
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   type text
+  
+  
+   Wait event type
+  
+ 
+
+ 
+  
+   name text
+  
+  
+   Wait event name
+  
+ 
+
+ 
+  
+   description text
+  
+  
+   Wait event description
+  
+ 
+
+   
+  
+ 
+
 
diff --git a/src/backend/Makefile b/src/backend/Makefile
index 1c929383c4..3e275ac759 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -134,7 +134,7 @@ storage/lmgr/lwlocknames.h: 
storage/lmgr/generate-lwlocknames.pl storage/lmgr/lw
$(MAKE) -C storage/lmgr lwlocknames.h lwlocknames.c
 
 utils/activity/wait_event_types.h: utils/activity/generate-wait_event_types.pl 
utils/activity/wait_event_names.txt
-   $(MAKE) -C utils/activity wait_event_types.h pgstat_wait_event.c
+   $(MAKE) -C utils/activity wait_event_types.h pgstat_wait_event.c 
wait_event_funcs_data.c
 
 # run this unconditionally to avoid needing to know its dependencies here:
 submake-catalog-headers:
@@ -311,6 +311,7 @@ maintainer-clean: distclean
  storage/lmgr/lwlocknames.c \
  storage/lmgr/lwlocknames.h \
  utils/activity/pgstat_wait_event.c \
+ utils/activity/wait_event_funcs_data.c \
  utils/activity/wait_event_types.h \
  utils/adt/jsonpath_gram.c \
  utils/adt/jsonpath_gram.h \
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index af65af6bdd..f86a4dd770 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1342,3 +1342,6 @@ CREATE VIEW pg_stat_subscription_stats AS
 ss.stats_reset

Re: WIP: new system catalog pg_wait_event

2023-08-16 Thread Michael Paquier
On Wed, Aug 16, 2023 at 01:43:35PM +0200, Drouvot, Bertrand wrote:
> Yeah, agree, done that way in v6 (also added a test in 001_worker_spi.pl
> to ensure that "worker_spi_main" is reported in pg_wait_event).

-typedef struct WaitEventExtensionEntryByName
-{
-   char        wait_event_name[NAMEDATALEN];   /* hash key */
-   uint16  event_id;   /* wait event ID */
-} WaitEventExtensionEntryByName;

I'd rather keep all these structures local to wait_event.c, as these
cover the internals of the hash tables.

Could you switch GetWaitEventExtensionEntries() so that it returns a
list of strings or an array of char*?  We probably can live with the
list for that.

+   char        buf[NAMEDATALEN + 44]; //"Custom wait event \"%Name%\" 
defined by extension" + 
Incorrect comment.  This would be simpler as a StringInfo.

Thanks for the extra test in worker_spi looking at the contents of the
catalog.
--
Michael


signature.asc
Description: PGP signature


Re: Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue

2023-08-16 Thread Shaun Thomas
> We could do something like a LOG "connection: method=%s user=%s
> (%s:%d)", without the "authenticated" and "identity" terms from
> set_authn_id().  Just to drop an idea.

That would be my inclination as well. Heck, just slap a log message
right in the specific case statements that don't have actual auth as
defined by set_authn_id. This assumes anyone really cares about it
that much, of course. :D

-- 
Shaun




Re: Handle infinite recursion in logical replication setup

2023-08-16 Thread Peter Eisentraut

On 09.08.23 04:50, Peter Smith wrote:

On Tue, Aug 8, 2023 at 6:52 PM Amit Kapila  wrote:


On Tue, Aug 8, 2023 at 1:50 PM Peter Eisentraut  wrote:


This patch added the following error message:

errdetail_plural("Subscribed publication %s is subscribing to other
publications.",
"Subscribed publications %s are subscribing to other publications.",
list_length(publist), pubnames->data),

But in PostgreSQL, a publication cannot subscribe to a publication, so
this is not giving accurate information.  Apparently, what it is trying
to say is that

The subscription that you are creating subscribes to publications that
contain tables that are written to by other subscriptions.

Can we get to a more accurate wording like this?



+1 for changing the message as per your suggestion.



PSA a patch to change this message text. The message now has wording
similar to the suggestion.


committed, thanks





dubious warning: FORMAT JSON has no effect for json and jsonb types

2023-08-16 Thread Peter Eisentraut
This warning comes from parse_expr.c transformJsonValueExpr() and is 
triggered for example by the following test case:


SELECT JSON_OBJECT('foo': NULL::json FORMAT JSON);
WARNING:  FORMAT JSON has no effect for json and jsonb types

But I don't see anything in the SQL standard that would require this 
warning.  It seems pretty clear that FORMAT JSON in this case is 
implicit and otherwise without effect.


Also, we don't have that warning in the output case (RETURNING json 
FORMAT JSON).


Anyone remember why this is here?  Should we remove it?




Re: postgres_fdw: wrong results with self join + enable_nestloop off

2023-08-16 Thread Önder Kalacı
Hi Etsuro,

Thanks for the response!


> Maybe we could do so by leaving to extensions the decision whether
> they replace joins with pseudoconstant clauses, but I am not sure that
> that is a good idea, because that would require the authors to modify
> and recompile their extensions to fix the issue...


I think I cannot easily follow this argument. The decision to push down the
join (or not) doesn't seem to be related to calling set_join_pathlist_hook. It
seems like the extension should decide what to do with the hook.

That seems to be the general theme of the hooks that Postgres provides. For
example, an extension is allowed to override even the whole planner/executor,
and there is no condition that would prevent that from happening. In other
words, an extension can easily return wrong results by taking the wrong
actions with the hooks, and that should be the responsibility of the
extension, not Postgres.


> I am not familiar with the Citus extension, but such pseudoconstant
> clauses are handled within the Citus extension?
>
>
As I noted earlier, Citus relies on this hook for collecting information
about all the joins that Postgres knows about; there is nothing specific to
pseudoconstants. Some parts of creating the (distributed) plan rely on the
information gathered from this hook. So, if information about some of the
joins is not passed to the extension, then the decisions the extension makes
are broken (and as a result the queries are broken).

Thanks,
Onder


Re: [PATCH] Add function to_oct

2023-08-16 Thread Nathan Bossart
On Wed, Aug 16, 2023 at 10:35:27AM +0700, John Naylor wrote:
> ```
> *ptr = '\0';
> 
> do
> ```
> 
> to
> 
> ```
> *ptr = '\0';
> do
> ```

Oh, I misunderstood.  I thought you meant that there might be a whitespace
change on that line, not the surrounding ones.  This is fixed in v6.

> Now I'm struggling to understand why each and every instance has its own
> nominal buffer, passed down to the implementation. All we care about is the
> result -- is there some reason not to confine the buffer declaration to the
> general implementation?

We can do that if we use a static variable, which is what I've done in v6.
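For reference, the shared workhorse can be modeled as a standalone,
compilable function (a sketch of the technique under discussion, sized and
structured like the attached v6 code; as discussed, the static buffer makes
it non-reentrant):

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/*
 * Standalone model of a shared base-conversion routine: one static buffer
 * sized for the longest possible result (base-2 output of a 64-bit value)
 * plus the terminating '\0'.  base must be > 1 and <= 16.
 */
static char *
convert_to_base(uint64_t value, int base)
{
	static char buf[sizeof(uint64_t) * CHAR_BIT + 1];
	const char *digits = "0123456789abcdef";
	char	   *ptr = buf + sizeof(buf) - 1;

	assert(base > 1 && base <= 16);
	*ptr = '\0';

	do
	{
		*--ptr = digits[value % base];	/* emit least-significant digit */
		value /= base;
	} while (ptr > buf && value);

	return ptr;
}
```

For example, convert_to_base(2147483647, 8) yields "17777777777", matching
the to_oct() behavior described in the patch.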

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From dce8c16d55f1fe91747be45abe155b5d7d1f4274 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 25 Jul 2023 16:09:01 -0700
Subject: [PATCH v6 1/1] add to_binary() and to_oct()

---
 doc/src/sgml/func.sgml| 42 +
 src/backend/utils/adt/varlena.c   | 90 ---
 src/include/catalog/pg_proc.dat   | 12 
 src/test/regress/expected/strings.out | 26 +++-
 src/test/regress/sql/strings.sql  |  9 ++-
 5 files changed, 153 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index be2f54c914..23b5ac7457 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3737,6 +3737,48 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ to_binary
+
+to_binary ( integer )
+text
+   
+   
+to_binary ( bigint )
+text
+   
+   
+Converts the number to its equivalent binary representation.
+   
+   
+to_binary(2147483647)
+111
+   
+  
+
+  
+   
+
+ to_oct
+
+to_oct ( integer )
+text
+   
+   
+to_oct ( bigint )
+text
+   
+   
+Converts the number to its equivalent octal representation.
+   
+   
+to_oct(2147483647)
+177
+   
+  
+
   

 
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index b1ec5c32ce..2330f86c09 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -4919,53 +4919,97 @@ array_to_text_internal(FunctionCallInfo fcinfo, ArrayType *v,
 	return result;
 }
 
-#define HEXBASE 16
 /*
- * Convert an int32 to a string containing a base 16 (hex) representation of
- * the number.
+ * Workhorse for to_binary, to_oct, and to_hex.  Note that base must be > 1 and
+ * <= 16.
  */
-Datum
-to_hex32(PG_FUNCTION_ARGS)
+static inline char *
+convert_to_base(uint64 value, int base)
 {
-	uint32		value = (uint32) PG_GETARG_INT32(0);
+	/*
+	 * We size the buffer for to_binary's longest possible return value,
+	 * including the terminating '\0'.
+	 */
+	static char buf[sizeof(uint64) * BITS_PER_BYTE + 1];
 	char	   *ptr;
 	const char *digits = "0123456789abcdef";
-	char		buf[32];		/* bigger than needed, but reasonable */
+
+	Assert(base > 1);
+	Assert(base <= 16);
 
 	ptr = buf + sizeof(buf) - 1;
 	*ptr = '\0';
 
 	do
 	{
-		*--ptr = digits[value % HEXBASE];
-		value /= HEXBASE;
+		*--ptr = digits[value % base];
+		value /= base;
 	} while (ptr > buf && value);
 
-	PG_RETURN_TEXT_P(cstring_to_text(ptr));
+	return ptr;
 }
 
 /*
- * Convert an int64 to a string containing a base 16 (hex) representation of
+ * Convert an integer to a string containing a base-2 (binary) representation
+ * of the number.
+ */
+Datum
+to_binary32(PG_FUNCTION_ARGS)
+{
+	uint64		value = (uint64) PG_GETARG_INT32(0);
+	char	   *result = convert_to_base(value, 2);
+
+	PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+Datum
+to_binary64(PG_FUNCTION_ARGS)
+{
+	uint64		value = (uint64) PG_GETARG_INT64(0);
+	char	   *result = convert_to_base(value, 2);
+
+	PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+
+/*
+ * Convert an integer to a string containing a base-8 (oct) representation of
  * the number.
  */
 Datum
-to_hex64(PG_FUNCTION_ARGS)
+to_oct32(PG_FUNCTION_ARGS)
+{
+	uint64		value = (uint64) PG_GETARG_INT32(0);
+	char	   *result = convert_to_base(value, 8);
+
+	PG_RETURN_TEXT_P(cstring_to_text(result));
+}
+Datum
+to_oct64(PG_FUNCTION_ARGS)
 {
 	uint64		value = (uint64) PG_GETARG_INT64(0);
-	char	   *ptr;
-	const char *digits = "0123456789abcdef";
-	char		buf[32];		/* bigger than needed, but reasonable */
+	char	   *result = convert_to_base(value, 8);
 
-	ptr = buf + sizeof(buf) - 1;
-	*ptr = '\0';
+	PG_RETURN_TEXT_P(cstring_to_text(result));
+}
 
-	do
-	{
-		*--ptr = digits[value % HEXBASE];
-		value /= HEXBASE;
-	} while (ptr > buf && value);
+/*
+ * Convert an integer to a string containing a base-16 (hex) representation of
+ * the number.
+ */
+Datum
+to_hex32(PG_FUNCTION_ARGS)
+{
+	uint64		value = (uint64) PG_GETARG_INT32(0);
+	char	   *result = convert_to_base(value, 16);
+
+	PG_RETURN_TEXT_P(cstring_to_text(

Re: Fix typo in src/interfaces/libpq/po/zh_CN.po

2023-08-16 Thread Peter Eisentraut

On 16.08.23 09:34, Zhang Mingli wrote:
The Chinese words there are ok,  but the `Unix-domian` should be 
`Unix-domain`.


fixed, thanks





Re: dubious warning: FORMAT JSON has no effect for json and jsonb types

2023-08-16 Thread Merlin Moncure
On Wed, Aug 16, 2023 at 8:55 AM Peter Eisentraut 
wrote:

> This warning comes from parse_expr.c transformJsonValueExpr() and is
> triggered for example by the following test case:
>
> SELECT JSON_OBJECT('foo': NULL::json FORMAT JSON);
> WARNING:  FORMAT JSON has no effect for json and jsonb types
>
> But I don't see anything in the SQL standard that would require this
> warning.  It seems pretty clear that FORMAT JSON in this case is
> implicit and otherwise without effect.
>
> Also, we don't have that warning in the output case (RETURNING json
> FORMAT JSON).
>
> Anyone remember why this is here?  Should we remove it?


+1 for removing, on the basis that it is not surprising, and would pollute
logs for most configurations.

merlin


Re: should frontend tools use syncfs() ?

2023-08-16 Thread Nathan Bossart
Thanks for taking a look.

On Wed, Aug 16, 2023 at 08:10:10AM +0900, Michael Paquier wrote:
> On Tue, Aug 08, 2023 at 01:06:06PM -0700, Nathan Bossart wrote:
>> I ran a couple of tests for pg_upgrade with 100k tables (created using the
>> script here [0]) in order to demonstrate the potential benefits of this
>> patch.
> 
> That shows some nice numbers with many files, indeed.  How does the
> size of each file influence the difference in time?

IME the number of files tends to influence the duration much more than the
size.  I assume this is because most files are already sync'd in these code
paths that loop through every file.

> +else
> +{
> +while (errno = 0, (de = readdir(dir)) != NULL)
> +{
> +charsubpath[MAXPGPATH * 2];
> +
> +if (strcmp(de->d_name, ".") == 0 ||
> +strcmp(de->d_name, "..") == 0)
> +continue;
> 
> It seems to me that there is no need to do that for in-place
> tablespaces.  There are relative paths in pg_tblspc/, so they would be
> taken care of by the syncfs() done on the main data folder.
> 
> This does not really check if the mount points of each tablespace is
> different, as well.  For example, imagine that you have two
> tablespaces within the same disk, syncfs() twice.  Perhaps, the
> current logic is OK anyway as long as the behavior is optional, but it
> should be explained in the docs, at least.

True.  But I don't know if checking the mount point of each tablespace is
worth the complexity.  In the worst case, we'll call syncfs() on the same
file system a few times, which is probably still much faster in most cases.
FWIW this is what recovery_init_sync_method does today, and I'm not aware
of any complaints about this behavior.
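As a rough illustration, the syncfs-based path amounts to opening any file or
directory on the target filesystem and handing that descriptor to syncfs(2)
(Linux-only; the helper name below is invented for the sketch):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to flush the whole filesystem that
 * contains `path`.  Any fd on the filesystem works; we open the path itself.
 * Returns 0 on success, -1 on error (with errno set).  Linux-only.
 */
static int
sync_filesystem_containing(const char *path)
{
	int			fd = open(path, O_RDONLY);
	int			ret;

	if (fd < 0)
		return -1;
	ret = syncfs(fd);
	close(fd);
	return ret;
}
```

Calling this once per location (data directory, WAL directory, each
tablespace) replaces the per-file walkdir() pass; as noted above, repeated
calls on the same filesystem are merely redundant, not harmful.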

The patch does have the following note:

+On Linux, syncfs may be used instead to ask the
+operating system to synchronize the whole file systems that contain the
+data directory, the WAL files, and each tablespace.

Do you think that is sufficient, or do you think we should really clearly
explain that you could end up calling syncfs() on the same file system a
few times if your tablespaces are on the same disk?  I personally feel
like that'd be a bit too verbose for the already lengthy descriptions of
this setting.

> I'm finding a bit confusing that fsync_pgdata() is coded in such a way
> that it does a silent fallback to the cascading syncs through
> walkdir() when syncfs is specified but not available in the build.
> Perhaps an error is more helpful because one would then know that they
> are trying something that's not around?

If syncfs() is not available, SYNC_METHOD_SYNCFS won't even be defined, and
parse_sync_method() should fail if "syncfs" is specified.  Furthermore, the
relevant part of fsync_pgdata() won't be compiled in whenever HAVE_SYNCFS
is not defined.
 
> +   pg_log_error("could not synchronize file system for file \"%s\": %m", 
> path);
> +   (void) close(fd);
> +   exit(EXIT_FAILURE);
> 
> walkdir() reports errors and does not consider these fatal.  Why the
> early exit()?

I know it claims to, but fsync_fname() does exit when fsync() fails:

returncode = fsync(fd);

/*
 * Some OSes don't allow us to fsync directories at all, so we can 
ignore
 * those errors. Anything else needs to be reported.
 */
if (returncode != 0 && !(isdir && (errno == EBADF || errno == EINVAL)))
{
pg_log_error("could not fsync file \"%s\": %m", fname);
(void) close(fd);
exit(EXIT_FAILURE);
}

I suspect that the current code does not treat failures for things like
open() as fatal because it's likely due to a lack of permissions on the
file, but it does treat failures to fsync() as fatal because it is more
likely to indicate that something is very wrong.  I don't know whether this
reasoning is sound, but I tried to match the current convention in the
syncfs() code.

> I am a bit concerned about the amount of duplication this patch
> introduces in the docs.  Perhaps this had better be moved into a new
> section of the docs to explain the tradeoffs, with each tool linking
> to it?

Yeah, this crossed my mind.  Do you know of any existing examples of
options with links to a common section?  One problem with this approach is
that there are small differences in the wording for some of the frontend
utilities, so it might be difficult to cleanly unite these sections.

> Do we actually need --no-sync at all if --sync-method is around?  We
> could have an extra --sync-method=none at option level with --no-sync
> still around mainly for compatibility?  Or perhaps that's just
> over-designing things?

I don't have a strong opinion.  We could take up deprecating --no-sync in a
follow-up thread, though.  Like you said, we'll probably need to keep it
around for backward compatibility, so it might not be worth t

Re: should frontend tools use syncfs() ?

2023-08-16 Thread Nathan Bossart
On Wed, Aug 16, 2023 at 08:17:05AM -0700, Nathan Bossart wrote:
> On Wed, Aug 16, 2023 at 08:10:10AM +0900, Michael Paquier wrote:
>> +   pg_log_error("could not synchronize file system for file \"%s\": 
>> %m", path);
>> +   (void) close(fd);
>> +   exit(EXIT_FAILURE);
>> 
>> walkdir() reports errors and does not consider these fatal.  Why the
>> early exit()?
> 
> I know it claims to, but fsync_fname() does exit when fsync() fails:
> 
>   returncode = fsync(fd);
> 
>   /*
>* Some OSes don't allow us to fsync directories at all, so we can 
> ignore
>* those errors. Anything else needs to be reported.
>*/
>   if (returncode != 0 && !(isdir && (errno == EBADF || errno == EINVAL)))
>   {
>   pg_log_error("could not fsync file \"%s\": %m", fname);
>   (void) close(fd);
>   exit(EXIT_FAILURE);
>   }
> 
> I suspect that the current code does not treat failures for things like
> open() as fatal because it's likely due to a lack of permissions on the
> file, but it does treat failures to fsync() as fatal because it is more
> likely to indicate that something is very wrong.  I don't know whether this
> reasoning is sound, but I tried to match the current convention in the
> syncfs() code.

Ah, it looks like this code used to treat fsync() errors as non-fatal, but
it was changed in commit 1420617.  I still find it a bit strange that some
errors that prevent a file from being sync'd are non-fatal while others
_are_ fatal, but that is probably a topic for another thread.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Replace known_assigned_xids_lck by memory barrier

2023-08-16 Thread Michail Nikolaev
Hello!

Updated version (with read barriers is attached).

> One remaining question I have is whether it is okay if we see an updated value
> for one of the head/tail variables but not the other.  It looks like the
> tail variable is only updated with ProcArrayLock held exclusively, which
> IIUC wouldn't prevent such mismatches even today, since we use a separate
> spinlock for reading them in some cases.

1) "The convention is that backends must hold shared ProcArrayLock to
examine the array"; this applies to the pointers as well.
2) Also, "only the startup process modifies the head/tail pointers."

So, the "tail" variable is updated by the startup process only while
holding ProcArrayLock in exclusive mode - no issues here.

Regarding the "head" variable - updates by the startup process are
possible in the following cases:
* ProcArrayLock held in exclusive mode (as in KnownAssignedXidsCompress or
KnownAssignedXidsSearch(remove=true)) - no issues here
* ProcArrayLock not taken at all (as in
KnownAssignedXidsAdd(exclusive_lock=false)) - in that case we rely on the
memory barrier machinery

Both head and tail variables are changed only with exclusive lock held.

I'll think more, but can't find something wrong here so far.
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 2a3da49b8f..e4093ed06d 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -61,7 +61,6 @@
 #include "port/pg_lfind.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
-#include "storage/spin.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
@@ -82,7 +81,6 @@ typedef struct ProcArrayStruct
 	int			numKnownAssignedXids;	/* current # of valid entries */
 	int			tailKnownAssignedXids;	/* index of oldest valid element */
 	int			headKnownAssignedXids;	/* index of newest element, + 1 */
-	slock_t		known_assigned_xids_lck;	/* protects head/tail pointers */
 
 	/*
 	 * Highest subxid that has been removed from KnownAssignedXids array to
@@ -441,7 +439,6 @@ CreateSharedProcArray(void)
 		procArray->numKnownAssignedXids = 0;
 		procArray->tailKnownAssignedXids = 0;
 		procArray->headKnownAssignedXids = 0;
-		SpinLockInit(&procArray->known_assigned_xids_lck);
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
@@ -4564,11 +4561,11 @@ KnownAssignedTransactionIdsIdleMaintenance(void)
  * pointer.  This wouldn't require any lock at all, except that on machines
  * with weak memory ordering we need to be careful that other processors
  * see the array element changes before they see the head pointer change.
- * We handle this by using a spinlock to protect reads and writes of the
- * head/tail pointers.  (We could dispense with the spinlock if we were to
- * create suitable memory access barrier primitives and use those instead.)
- * The spinlock must be taken to read or write the head/tail pointers unless
- * the caller holds ProcArrayLock exclusively.
+ * We handle this by using a memory barrier to protect writes of the
+ * head pointer.
+ * The memory barrier is taken before writing the head pointer unless
+ * the caller holds ProcArrayLock exclusively.  A matching read memory barrier
+ * is taken accordingly on the reader side.
  *
  * Algorithmic analysis:
  *
@@ -4755,7 +4752,7 @@ KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid,
 
 	/*
 	 * Since only the startup process modifies the head/tail pointers, we
-	 * don't need a lock to read them here.
+	 * are safe to read them here.
 	 */
 	head = pArray->headKnownAssignedXids;
 	tail = pArray->tailKnownAssignedXids;
@@ -4806,22 +4803,17 @@ KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid,
 	pArray->numKnownAssignedXids += nxids;
 
 	/*
-	 * Now update the head pointer.  We use a spinlock to protect this
+	 * Now update the head pointer.  We use a memory barrier to protect this
 	 * pointer, not because the update is likely to be non-atomic, but to
 	 * ensure that other processors see the above array updates before they
 	 * see the head pointer change.
 	 *
 	 * If we're holding ProcArrayLock exclusively, there's no need to take the
-	 * spinlock.
+	 * barrier.
 	 */
-	if (exclusive_lock)
-		pArray->headKnownAssignedXids = head;
-	else
-	{
-		SpinLockAcquire(&pArray->known_assigned_xids_lck);
-		pArray->headKnownAssignedXids = head;
-		SpinLockRelease(&pArray->known_assigned_xids_lck);
-	}
+	if (!exclusive_lock)
+		pg_write_barrier();
+	pArray->headKnownAssignedXids = head;
 }
 
 /*
@@ -4843,20 +4835,12 @@ KnownAssignedXidsSearch(TransactionId xid, bool remove)
 	int			tail;
 	int			result_index = -1;
 
-	if (remove)
-	{
-		/* we hold ProcArrayLock exclusively, so no need for spinlock */
-		tail = pArray->tailKnownAssignedXids;
-		head = pArray->headKnownAssignedXids;
-	}
-	else
-	{
-		/* take spinlock to ensure we see up-to-date array contents */
-		SpinLockAcquire

Re: Extract numeric filed in JSONB more effectively

2023-08-16 Thread jian he
On Wed, Aug 16, 2023 at 2:28 PM Andy Fan  wrote:
>
> update with the correct patch..

regression=# select proname, pg_catalog.pg_get_function_arguments(oid)
from pg_proc
where proname =  'jsonb_extract_path_type';
         proname         |                     pg_get_function_arguments
-------------------------+---------------------------------------------------------------------
 jsonb_extract_path_type | from_json jsonb, VARIADIC path_elems text[], target_oid anyelement
(1 row)

VARIADIC should be the last argument?

select jsonb_array_element_type(jsonb'[1231]',0, null::int);
now return null.
Should it return 1231?

regression=# select jsonb_array_element_type(jsonb'1231',0, 1::int);
 jsonb_array_element_type
--------------------------
 1231
(1 row)

not sure if it's ok. if you think it's not ok then:
+ if (!JB_ROOT_IS_ARRAY(jb))
+PG_RETURN_NULL();
change to
+if (JB_ROOT_IS_SCALAR(jb) || !JB_ROOT_IS_ARRAY(jb))
+PG_RETURN_NULL();

select jsonb_array_element_type(jsonb'[1231]',0, '1'::jsonb);
will crash, because jsonb_array_element_type calls
cast_jsonbvalue_to_type, which then reaches the default branch of the
switch statement, and the default branch has Assert(false).
Also, in cast_jsonbvalue_to_type, the PG_RETURN_POINTER(NULL) code is unreachable.

In the jsonb_cast_support function you already have
!jsonb_cast_is_optimized(fexpr->funcresulttype), so in the default
branch of cast_jsonbvalue_to_type you can just elog(ERROR, "can only
cast to xxx type"). For jsonb_array_element_type and jsonb_object_field_type,
the third argument is anyelement, so targetOid can be any datatype's OID.




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

2023-08-16 Thread Jeff Davis
On Wed, 2023-08-16 at 08:51 +0200, Peter Eisentraut wrote:
> On 12.08.23 04:35, Jeff Davis wrote:
> > The attached patch implements a new SEARCH clause for CREATE
> > FUNCTION.
> > The SEARCH clause controls the search_path used when executing
> > functions that were created without a SET clause.
> 
> I don't understand this.  This adds a new option for cases where the 
> existing option wasn't specified.  Why not specify the existing
> option 
> then?  Is it not good enough?  Can we improve it?

SET search_path = '...' is not good enough in my opinion.

1. Not specifying a SET clause falls back to the session's search_path,
which is a bad default because it leads to all kinds of inconsistent
behavior and security concerns.

2. There's no way to explicitly request that you'd actually like to use
the session's search_path, so it makes it very hard to ever change the
default.

3. It's user-unfriendly. A safe search_path that would be suitable for
most functions is "SET search_path = pg_catalog, pg_temp", which is
arcane, and requires some explanation.

4. search_path for the session is conceptually different than for a
function. A session should be context-sensitive and the same query
should (quite reasonably) behave differently for different sessions and
users to sort out things like object name conflicts, etc. A function
should (ordinarily) be context-insensitive, especially when used in
something like an index expression or constraint. Having different
syntax helps separate those concepts.

5. There's no way to prevent pg_temp from being included in the
search_path. This is separately fixable, but having the proposed SEARCH
syntax is likely to make for a better user experience in the common
cases.

I'm open to suggestion about other ways to improve it, but SEARCH is
what I came up with.

Regards,
Jeff Davis





Re: Replace known_assigned_xids_lck by memory barrier

2023-08-16 Thread Nathan Bossart
On Wed, Aug 16, 2023 at 05:30:59PM +0200, Michail Nikolaev wrote:
> Updated version (with read barriers is attached).

Thanks for the updated patch.  I've attached v4 in which I've made a number
of cosmetic edits.

> I'll think more, but can't find something wrong here so far.

IIUC this memory barrier stuff is only applicable to KnownAssignedXidsAdd()
(without an exclusive lock) when we add entries to the end of the array and
then update the head pointer.  Otherwise, appropriate locks are taken when
reading/writing the array.  For example, say we have the following array:

  head
   |
   v
[ 0, 1, 2, 3 ]

When adding elements, we keep the head pointer where it is:

  head
   |
   v
[ 0, 1, 2, 3, 4, 5 ]

If another processor sees this intermediate state, it's okay because it
will only inspect elements 0 through 3.  Only at the end do we update the
head pointer:

head
 |
 v
[ 0, 1, 2, 3, 4, 5 ]

With weak memory ordering and no barriers, another process may see this
(which is obviously no good):

head
 |
 v
[ 0, 1, 2, 3 ]

One thing that I'm still trying to understand is this code in
KnownAssignedXidsSearch():

/* we hold ProcArrayLock exclusively, so no need for spinlock */
tail = pArray->tailKnownAssignedXids;
head = pArray->headKnownAssignedXids;

It's not clear to me why holding ProcArrayLock exclusively means we don't
need to worry about the spinlock/barriers.  If KnownAssignedXidsAdd() adds
entries without a lock, holding ProcArrayLock won't protect you, and I
don't see anything else that acts as a read barrier before the array
entries are inspected.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 11de5076adc060c0dde32e8538714301f9b98d02 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 16 Aug 2023 10:52:56 -0700
Subject: [PATCH v4 1/1] Replace known_assigned_xids_lck with memory barriers.

Suggested-by: Tom Lane
Author: Michail Nikolaev
Discussion: https://postgr.es/m/CANtu0oh0si%3DjG5z_fLeFtmYcETssQ08kLEa8b6TQqDm_cinroA%40mail.gmail.com
---
 src/backend/storage/ipc/procarray.c | 71 ++---
 1 file changed, 25 insertions(+), 46 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 2a3da49b8f..a3def020b6 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -61,7 +61,6 @@
 #include "port/pg_lfind.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
-#include "storage/spin.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
@@ -82,7 +81,6 @@ typedef struct ProcArrayStruct
 	int			numKnownAssignedXids;	/* current # of valid entries */
 	int			tailKnownAssignedXids;	/* index of oldest valid element */
 	int			headKnownAssignedXids;	/* index of newest element, + 1 */
-	slock_t		known_assigned_xids_lck;	/* protects head/tail pointers */
 
 	/*
 	 * Highest subxid that has been removed from KnownAssignedXids array to
@@ -441,7 +439,6 @@ CreateSharedProcArray(void)
 		procArray->numKnownAssignedXids = 0;
 		procArray->tailKnownAssignedXids = 0;
 		procArray->headKnownAssignedXids = 0;
-		SpinLockInit(&procArray->known_assigned_xids_lck);
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
@@ -4561,14 +4558,10 @@ KnownAssignedTransactionIdsIdleMaintenance(void)
  * during normal running).  Compressing unused entries out of the array
  * likewise requires exclusive lock.  To add XIDs to the array, we just insert
  * them into slots to the right of the head pointer and then advance the head
- * pointer.  This wouldn't require any lock at all, except that on machines
- * with weak memory ordering we need to be careful that other processors
- * see the array element changes before they see the head pointer change.
- * We handle this by using a spinlock to protect reads and writes of the
- * head/tail pointers.  (We could dispense with the spinlock if we were to
- * create suitable memory access barrier primitives and use those instead.)
- * The spinlock must be taken to read or write the head/tail pointers unless
- * the caller holds ProcArrayLock exclusively.
+ * pointer.  This doesn't require any lock at all, but on machines with weak
+ * memory ordering, we need to be careful that other processors see the array
+ * element changes before they see the head pointer change.  We handle this by
+ * using memory barriers.
  *
  * Algorithmic analysis:
  *
@@ -4806,22 +4799,15 @@ KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid,
 	pArray->numKnownAssignedXids += nxids;
 
 	/*
-	 * Now update the head pointer.  We use a spinlock to protect th

Re: Replace known_assigned_xids_lck by memory barrier

2023-08-16 Thread Michail Nikolaev
Hello, good question!

Thanks for your edits.

As an answer: we probably need to change
"If we know that we're holding ProcArrayLock exclusively, we don't
need the read barrier."
to
"If we're removing xid, we don't need the read barrier because only
the startup process can remove and add xids to KnownAssignedXids"

Best regards,
Mikhail.




Re: Using defines for protocol characters

2023-08-16 Thread Nathan Bossart
On Tue, Aug 15, 2023 at 11:40:07PM +0200, Alvaro Herrera wrote:
> On 2023-Aug-16, Michael Paquier wrote:
> 
>> On Wed, Aug 16, 2023 at 06:25:09AM +0900, Tatsuo Ishii wrote:
>> > Currently pqcomm.h needs c.h which is not problem for Pgpool-II. But
>> > what about other middleware?
>> 
>> Why do you need to include directly c.h?  There are definitions in
>> there that are not intended to be exposed.
> 
> What this argument says is that these new defines should be in a
> separate file, not in pqcomm.h.  IMO that makes sense, precisely because
> these defines should be usable by third parties.

I moved the definitions out to a separate file in v6.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From fc54d1655fe662abe99a4605bdff2d1b65f0dc2a Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 16 Aug 2023 12:08:29 -0700
Subject: [PATCH v6 1/1] Introduce macros for protocol characters.

Author: Dave Cramer
Reviewed-by: Alvaro Herrera, Tatsuo Ishii, Peter Smith, Robert Haas, Tom Lane, Peter Eisentraut
Discussion: https://postgr.es/m/CADK3HHKbBmK-PKf1bPNFoMC%2BoBt%2BpD9PH8h5nvmBQskEHm-Ehw%40mail.gmail.com
---
 src/backend/access/common/printsimple.c |  5 +-
 src/backend/access/transam/parallel.c   | 14 ++--
 src/backend/backup/basebackup_copy.c| 16 ++---
 src/backend/commands/async.c|  2 +-
 src/backend/commands/copyfromparse.c| 22 +++
 src/backend/commands/copyto.c   |  6 +-
 src/backend/libpq/auth-sasl.c   |  2 +-
 src/backend/libpq/auth.c|  8 +--
 src/backend/postmaster/postmaster.c |  2 +-
 src/backend/replication/walsender.c | 18 +++---
 src/backend/tcop/dest.c |  8 +--
 src/backend/tcop/fastpath.c |  2 +-
 src/backend/tcop/postgres.c | 68 ++--
 src/backend/utils/error/elog.c  |  5 +-
 src/backend/utils/misc/guc.c|  2 +-
 src/include/Makefile|  3 +-
 src/include/libpq/pqcomm.h  | 23 ++-
 src/include/libpq/protocol.h| 85 +
 src/include/meson.build |  1 +
 src/interfaces/libpq/fe-auth.c  |  2 +-
 src/interfaces/libpq/fe-connect.c   | 19 --
 src/interfaces/libpq/fe-exec.c  | 50 +++
 src/interfaces/libpq/fe-protocol3.c | 70 ++--
 src/interfaces/libpq/fe-trace.c | 70 +++-
 24 files changed, 301 insertions(+), 202 deletions(-)
 create mode 100644 src/include/libpq/protocol.h

diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c
index ef818228ac..675b744db2 100644
--- a/src/backend/access/common/printsimple.c
+++ b/src/backend/access/common/printsimple.c
@@ -20,6 +20,7 @@
 
 #include "access/printsimple.h"
 #include "catalog/pg_type.h"
+#include "libpq/protocol.h"
 #include "libpq/pqformat.h"
 #include "utils/builtins.h"
 
@@ -32,7 +33,7 @@ printsimple_startup(DestReceiver *self, int operation, TupleDesc tupdesc)
 	StringInfoData buf;
 	int			i;
 
-	pq_beginmessage(&buf, 'T'); /* RowDescription */
+	pq_beginmessage(&buf, PqMsg_RowDescription);
 	pq_sendint16(&buf, tupdesc->natts);
 
 	for (i = 0; i < tupdesc->natts; ++i)
@@ -65,7 +66,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self)
 	slot_getallattrs(slot);
 
 	/* Prepare and send message */
-	pq_beginmessage(&buf, 'D');
+	pq_beginmessage(&buf, PqMsg_DataRow);
 	pq_sendint16(&buf, tupdesc->natts);
 
 	for (i = 0; i < tupdesc->natts; ++i)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 1738aecf1f..194a1207be 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -1127,7 +1127,7 @@ HandleParallelMessage(ParallelContext *pcxt, int i, StringInfo msg)
 
 	switch (msgtype)
 	{
-		case 'K':/* BackendKeyData */
+		case PqMsg_BackendKeyData:
 			{
 int32		pid = pq_getmsgint(msg, 4);
 
@@ -1137,8 +1137,8 @@ HandleParallelMessage(ParallelContext *pcxt, int i, StringInfo msg)
 break;
 			}
 
-		case 'E':/* ErrorResponse */
-		case 'N':/* NoticeResponse */
+		case PqMsg_ErrorResponse:
+		case PqMsg_NoticeResponse:
 			{
 ErrorData	edata;
 ErrorContextCallback *save_error_context_stack;
@@ -1183,7 +1183,7 @@ HandleParallelMessage(ParallelContext *pcxt, int i, StringInfo msg)
 break;
 			}
 
-		case 'A':/* NotifyResponse */
+		case PqMsg_NotificationResponse:
 			{
 /* Propagate NotifyResponse. */
 int32		pid;
@@ -1217,7 +1217,7 @@ HandleParallelMessage(ParallelContext *pcxt, int i, StringInfo msg)
 break;
 			}
 
-		case 'X':/* Terminate, indicating clean exit */
+		case PqMsg_Terminate:
 			{
 shm_mq_detach(pcxt->worker[i].error_mqh);
 pcxt->worker[i].error_mqh = NULL;
@@ -1372,7 +1372,7 @@ ParallelWorkerMain(Datum main_arg)
 	 * protocol message is defined, but it won't actually be used for anything
 	 * in this case.
 	 */
-	pq_beginmessage(&msgbuf, 'K')

PostgreSQL 16 RC1 + GA release dates

2023-08-16 Thread Jonathan S. Katz

Hi,

The date for PostgreSQL 16 Release Candidate 1 (RC1) is August 31, 2023. 
Please ensure all open items[1] are completed and committed before 
August 26, 2023 12:00 UTC.


This means the current target date for the PostgreSQL 16 GA release is 
September 14, 2023. While this date could change if the release team 
decides the candidate release is not ready, please plan for this date to 
be the GA release.


Thanks,

Jonathan

[1] https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items




Re: Replace known_assigned_xids_lck by memory barrier

2023-08-16 Thread Nathan Bossart
On Wed, Aug 16, 2023 at 09:29:10PM +0200, Michail Nikolaev wrote:
> As answer: probably we need to change
> "If we know that we're holding ProcArrayLock exclusively, we don't
> need the read barrier."
> to
> "If we're removing xid, we don't need the read barrier because only
> the startup process can remove and add xids to KnownAssignedXids"

Ah, that explains it.  v5 of the patch is attached.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 6038182ac96226f0eec43b1c6ab2e060e8b1e3a8 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 16 Aug 2023 10:52:56 -0700
Subject: [PATCH v5 1/1] Replace known_assigned_xids_lck with memory barriers.

Suggested-by: Tom Lane
Author: Michail Nikolaev
Discussion: https://postgr.es/m/CANtu0oh0si%3DjG5z_fLeFtmYcETssQ08kLEa8b6TQqDm_cinroA%40mail.gmail.com
---
 src/backend/storage/ipc/procarray.c | 72 +++--
 1 file changed, 26 insertions(+), 46 deletions(-)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 2a3da49b8f..401e816275 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -61,7 +61,6 @@
 #include "port/pg_lfind.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
-#include "storage/spin.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/rel.h"
@@ -82,7 +81,6 @@ typedef struct ProcArrayStruct
 	int			numKnownAssignedXids;	/* current # of valid entries */
 	int			tailKnownAssignedXids;	/* index of oldest valid element */
 	int			headKnownAssignedXids;	/* index of newest element, + 1 */
-	slock_t		known_assigned_xids_lck;	/* protects head/tail pointers */
 
 	/*
 	 * Highest subxid that has been removed from KnownAssignedXids array to
@@ -441,7 +439,6 @@ CreateSharedProcArray(void)
 		procArray->numKnownAssignedXids = 0;
 		procArray->tailKnownAssignedXids = 0;
 		procArray->headKnownAssignedXids = 0;
-		SpinLockInit(&procArray->known_assigned_xids_lck);
 		procArray->lastOverflowedXid = InvalidTransactionId;
 		procArray->replication_slot_xmin = InvalidTransactionId;
 		procArray->replication_slot_catalog_xmin = InvalidTransactionId;
@@ -4561,14 +4558,11 @@ KnownAssignedTransactionIdsIdleMaintenance(void)
  * during normal running).  Compressing unused entries out of the array
  * likewise requires exclusive lock.  To add XIDs to the array, we just insert
  * them into slots to the right of the head pointer and then advance the head
- * pointer.  This wouldn't require any lock at all, except that on machines
- * with weak memory ordering we need to be careful that other processors
- * see the array element changes before they see the head pointer change.
- * We handle this by using a spinlock to protect reads and writes of the
- * head/tail pointers.  (We could dispense with the spinlock if we were to
- * create suitable memory access barrier primitives and use those instead.)
- * The spinlock must be taken to read or write the head/tail pointers unless
- * the caller holds ProcArrayLock exclusively.
+ * pointer.  This doesn't require any lock at all, but on machines with weak
+ * memory ordering, we need to be careful that other processors see the array
+ * element changes before they see the head pointer change.  We handle this by
+ * using memory barriers when reading or writing the head/tail pointers (unless
+ * the caller holds ProcArrayLock exclusively).
  *
  * Algorithmic analysis:
  *
@@ -4806,22 +4800,15 @@ KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid,
 	pArray->numKnownAssignedXids += nxids;
 
 	/*
-	 * Now update the head pointer.  We use a spinlock to protect this
-	 * pointer, not because the update is likely to be non-atomic, but to
-	 * ensure that other processors see the above array updates before they
-	 * see the head pointer change.
-	 *
-	 * If we're holding ProcArrayLock exclusively, there's no need to take the
-	 * spinlock.
+	 * Now update the head pointer.  We use a write barrier to ensure that
+	 * other processors see the above array updates before they see the head
+	 * pointer change.  The barrier isn't required if we're holding
+	 * ProcArrayLock exclusively.
 	 */
-	if (exclusive_lock)
-		pArray->headKnownAssignedXids = head;
-	else
-	{
-		SpinLockAcquire(&pArray->known_assigned_xids_lck);
-		pArray->headKnownAssignedXids = head;
-		SpinLockRelease(&pArray->known_assigned_xids_lck);
-	}
+	if (!exclusive_lock)
+		pg_write_barrier();
+
+	pArray->headKnownAssignedXids = head;
 }
 
 /*
@@ -4843,20 +4830,15 @@ KnownAssignedXidsSearch(TransactionId xid, bool remove)
 	int			tail;
 	int			result_index = -1;
 
-	if (remove)
-	{
-		/* we hold ProcArrayLock exclusively, so no need for spinlock */
-		tail = pArray->tailKnownAssignedXids;
-		head = pArray->headKnownAssignedXids;
-	}
-	else
-	{
-		/* take spinlock to ensure we see up-to-date array contents */
-		SpinLockAcquire(&pArray->known_assigned_xids_lck);
-		tail = pArray->tailKnownAssignedXids;
-		head = pArra

Re: run pgindent on a regular basis / scripted manner

2023-08-16 Thread Peter Geoghegan
On Tue, Aug 15, 2023 at 1:31 PM Nathan Bossart  wrote:
> Should we add those?  Patch attached.

I think that that makes sense. I just don't want to normalize updating
.git-blame-ignore-revs very frequently. (Actually, it's more like I
don't want to normalize any scheme that makes updating the ignore list
very frequently start to seem reasonable.)

-- 
Peter Geoghegan




Re: Faster "SET search_path"

2023-08-16 Thread Jeff Davis
On Tue, 2023-08-15 at 13:04 -0400, Robert Haas wrote:
> I suspect that dodging the GUC stack machinery is not a very good
> idea. The timing of when TRY/CATCH blocks are called is different
> from
> when subtransactions are aborted, and that distinction has messed me
> up more than once when trying to code various things.

...

> I can't say there isn't a way to make something like what you're
> talking about here work, but I bet it will be difficult to get all of
> the corner cases right, and I suspect that trying to speed up the
> existing mechanism is a better plan than trying to go around it.

The SearchPathCache that I implemented (0003) is an example of speeding
up the existing mechanism, and it already offers a huge speedup. I'll
focus on getting that in first. That patch has had some review and I'm
tempted to commit it soon, but Nathan has requested another set of eyes
on it.

To bring the overhead closer to zero we need to somehow avoid repeating
so much work in guc.c, though. If we don't go around it, another
approach would be to separate GUC setting into two phases: one that
does the checks, and one that performs the final change. All the real
work (hash lookup, guc_strdup, and check_search_path) is done in the
"check" phase.

It's a bit painful to reorganize the code in guc.c, though, so that
might need to happen in a few parts and will depend on how great the
demand is.

Regards,
Jeff Davis





Re: Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue

2023-08-16 Thread Jacob Champion
On Wed, Aug 16, 2023 at 6:27 AM Shaun Thomas
 wrote:
>
> > We could do something like a LOG "connection: method=%s user=%s
> > (%s:%d)", without the "authenticated" and "identity" terms from
> > set_authn_id().  Just to drop an idea.
>
> That would be my inclination as well. Heck, just slap a log message
> right in the specific case statements that don't have actual auth as
> defined by set_authn_id. This assumes anyone really cares about it
> that much, of course. :D

Maybe something like the attached?

- I made the check more generic, rather than hardcoding it inside the
trust statement, because my OAuth proposal would add a method that
only calls set_authn_id() some of the time.
- I used the phrasing "connection not authenticated" in the hopes that
it's a bit more greppable than just "connection", especially in
combination with the existing "connection authenticated" lines.

(I'm reminded that we're reflecting an unauthenticated username as-is
into the logs, but I also don't think this makes things any worse than
they are today with the "authorized" lines.)

--Jacob
From deef6a6daa161e729968d731d0588c87ca45599a Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Wed, 16 Aug 2023 14:48:49 -0700
Subject: [PATCH] log_connections: add entries for "trust" connections

Allow DBAs to audit unexpected "trust" connections.  Previously, you had
to notice the absence of a "connection authenticated" line between the
"connection received" and "connection authorized" entries.  Now it's
called out explicitly.
---
 src/backend/libpq/auth.c  | 15 +++
 src/test/authentication/t/001_password.pl |  6 ++
 2 files changed, 21 insertions(+)

diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index e7571aaddb..db3d2b5289 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -645,6 +645,21 @@ ClientAuthentication(Port *port)
 #endif
 	}
 
+	if (Log_connections
+		&& (status == STATUS_OK) && !MyClientConnectionInfo.authn_id)
+	{
+		/*
+		 * No authentication was actually performed; this happens e.g. when the
+		 * trust method is in use.  For audit purposes, log a breadcrumb to
+		 * explain where in the HBA this happened.
+		 */
+		ereport(LOG,
+errmsg("connection not authenticated: user=\"%s\" method=%s "
+	   "(%s:%d)",
+	   port->user_name, hba_authname(port->hba->auth_method),
+	   port->hba->sourcefile, port->hba->linenumber));
+	}
+
 	if (ClientAuthentication_hook)
 		(*ClientAuthentication_hook) (port, status);
 
diff --git a/src/test/authentication/t/001_password.pl b/src/test/authentication/t/001_password.pl
index 12552837a8..5438aa3184 100644
--- a/src/test/authentication/t/001_password.pl
+++ b/src/test/authentication/t/001_password.pl
@@ -140,8 +140,14 @@ $node->safe_psql('postgres', "CREATE database regex_testdb;");
 # considered to be authenticated.
 reset_pg_hba($node, 'all', 'all', 'trust');
 test_conn($node, 'user=scram_role', 'trust', 0,
+	log_like => [
+		qr/connection not authenticated: user="scram_role" method=trust/
+	],
 	log_unlike => [qr/connection authenticated:/]);
 test_conn($node, 'user=md5_role', 'trust', 0,
+	log_like => [
+		qr/connection not authenticated: user="md5_role" method=trust/
+	],
 	log_unlike => [qr/connection authenticated:/]);
 
 # SYSTEM_USER is null when not authenticated.
-- 
2.39.2



Re: Rename ExtendedBufferWhat in 16?

2023-08-16 Thread Andres Freund
Hi,

On 2023-08-12 12:29:05 +1200, Thomas Munro wrote:
> Commit 31966b15 invented a way for functions dealing with relation
> extension to accept a Relation in online code and an SMgrRelation in
> recovery code (instead of using the earlier FakeRelcacheEntry
> concept).  It seems highly likely that future new bufmgr.c interfaces
> will face the same problem, and need to do something similar.  Let's
> generalise the names so that each interface doesn't have to re-invent
> the wheel?  ExtendedBufferWhat is also just not a beautiful name.  How
> about BufferedObjectSelector?  That name leads to macros BOS_SMGR()
> and BOS_REL().  Could also be BufMgrObject/BMO, ... etc etc.

I like the idea of generalizing it.  I somehow don't quite like BOS*, but I
can't really put into words why, so...


> This is from a patch-set that I'm about to propose for 17, which needs
> one of these too, hence desire to generalise.  But if we rename them
> in 17, then AM authors, who are likely to discover and make use of
> this interface, would have to use different names for 16 and 17.

Makes sense to me.

Greetings,

Andres Freund




Re: Performance degradation on concurrent COPY into a single relation in PG16.

2023-08-16 Thread Andres Freund
On 2023-08-16 13:15:46 +0200, Alvaro Herrera wrote:
> Since the wins from this patch were replicated and it has been pushed, I
> understand that this open item can be marked as closed, so I've done
> that.

Thanks!




Re: walsender "wakeup storm" on PG16, likely because of bc971f4025c (Optimize walsender wake up logic using condition variables)

2023-08-16 Thread Thomas Munro
On Wed, Aug 16, 2023 at 11:18 PM Antonin Houska  wrote:
> I try to understand this patch (commit 5ffb7c7750) because I use condition
> variable in an extension. One particular problem occured to me, please
> consider:
>
> ConditionVariableSleep() gets interrupted, so AbortTransaction() calls
> ConditionVariableCancelSleep(), but the signal was sent in between. Shouldn't
> at least AbortTransaction() and AbortSubTransaction() check the return value
> of ConditionVariableCancelSleep(), and re-send the signal if needed?

I wondered about that in the context of our only in-tree user of
ConditionVariableSignal(), in parallel btree index creation, but since
it's using the parallel executor infrastructure, any error would be
propagated everywhere so all waits would be aborted.  There is another
place where the infrastructure cancels for you and would now eat the
result: if you prepare to sleep on one CV, and then prepare to sleep
on another, we'll just cancel the first one.  I think that's
semantically OK: we can't really wait for two CVs at once, and if you
try you'll miss signals anyway, but you must already have code to cope
with that by re-checking your exit conditions.

> Note that I'm just thinking about such a problem, did not try to reproduce it.

Hmm.  I looked for users of ConditionVariableSignal() in the usual web
tools and didn't find anything, so I guess your extension is not
released yet or not open source.  I'm curious: what does it actually
do if there is an error in a CV-wakeup-consuming backend?  I guess it
might be some kind of work-queue processing system... it seems
inevitable that if backends are failing with errors, and you don't
respond by retrying/respawning, you'll lose or significantly delay
jobs/events/something anyway (imagine only slightly different timing:
you consume the signal and start working on a job and then ereport,
which amounts to the same thing in the end now that your transaction
is rolled back?), and when you retry you'll see whatever condition was
waited for anyway.  But that's just me imagining what some
hypothetical strawman system might look like...  what does it really
do?

(FWIW when I worked on a couple of different work queue-like systems
and tried to use ConditionVariableSignal() I eventually concluded that
it was the wrong tool for the job because its wakeup order is
undefined.  It's actually FIFO, but I wanted LIFO so that workers have
a chance to become idle and reduce the pool size, but I started to
think that once you want that level of control you really want to
build a bespoke wait list system, so I never got around to proposing
that we consider changing that.)
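The LIFO wakeup order wished for above — wake the most-recently-idle worker so long-idle ones can time out and shrink the pool — is easy to see with a bespoke wait list. A hypothetical sketch (LifoWaitList is not a PostgreSQL structure):

```python
import threading

class LifoWaitList:
    """Sketch of a bespoke wait list with LIFO wakeups: the newest waiter
    is woken first, so older waiters stay idle and can retire."""
    def __init__(self):
        self._lock = threading.Lock()
        self._stack = []                  # most recent waiter on top

    def enroll(self):
        ev = threading.Event()
        with self._lock:
            self._stack.append(ev)
        return ev

    def wake_one(self):
        with self._lock:
            if self._stack:
                self._stack.pop().set()   # LIFO: pop the newest waiter

wl = LifoWaitList()
first = wl.enroll()
second = wl.enroll()
wl.wake_one()
assert second.is_set() and not first.is_set()   # newest waiter woken first
```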

Our condition variables are weird.  They're not associated with a
lock, so we made start-of-wait non-atomic: prepare first, then return
control and let the caller check its condition, then sleep.  Typical
user space condition variable APIs force you to acquire some kind of
lock that protects the condition first, then check the condition, then
atomically release-associated-lock-and-start-sleeping, so there is no
data race but also no time where control is returned to the caller but
the thread is on the wait list consuming signals.  That choice has
some pros (you can choose whatever type of lock you want to protect
your condition, or none at all if you can get away with memory
barriers and magic) and cons.  However, as I think Andres was getting
at, having a non-atomic start-of-wait doesn't seem to require us to
have a non-atomic end-of-wait and associated extra contention.  So
maybe we should figure out how to fix that in 17.
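The "typical user space" shape described above — acquire the lock protecting the condition, check it, then atomically release-and-sleep, re-checking on every wakeup — looks like this, with Python's threading.Condition standing in for a pthread mutex/condvar pair:

```python
import threading

ready = False
cond = threading.Condition()   # lock and condvar bundled together

def producer():
    global ready
    with cond:                 # update the condition under the lock
        ready = True
        cond.notify()          # waiter is either asleep or will see ready

t = threading.Thread(target=producer)

with cond:                     # lock first, then check the condition
    t.start()
    while not ready:           # re-check: wakeups may be spurious
        cond.wait()            # atomically releases the lock and sleeps
t.join()
assert ready
```

Because the lock is held from the check until the wait begins, there is no window where control returns to the caller while it sits on the wait list consuming signals — which is exactly the window the PostgreSQL prepare/check/sleep design accepts in exchange for not mandating a particular lock.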




Re: Would it be possible to backpatch Close support in libpq (28b5726) to PG16?

2023-08-16 Thread Joe Conway

On 8/15/23 15:39, Alvaro Herrera wrote:

On 2023-Aug-16, Michael Paquier wrote:


Personally I think backpatching 28b5726 has a really low risk of
breaking anything.


I agree about the low-risk argument, though.  This is just new code.


Here's a way to think about it.  If 16.1 was already out, would we add
libpq support for Close to 16.2?


Seems pretty clearly a "no" to me.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Rename ExtendedBufferWhat in 16?

2023-08-16 Thread Thomas Munro
On Thu, Aug 17, 2023 at 10:49 AM Andres Freund  wrote:
> On 2023-08-12 12:29:05 +1200, Thomas Munro wrote:
> > Commit 31966b15 invented a way for functions dealing with relation
> > extension to accept a Relation in online code and an SMgrRelation in
> > recovery code (instead of using the earlier FakeRelcacheEntry
> > concept).  It seems highly likely that future new bufmgr.c interfaces
> > will face the same problem, and need to do something similar.  Let's
> > generalise the names so that each interface doesn't have to re-invent
> > the wheel?  ExtendedBufferWhat is also just not a beautiful name.  How
> > about BufferedObjectSelector?  That name leads to macros BOS_SMGR()
> > and BOS_REL().  Could also be BufMgrObject/BMO, ... etc etc.
>
> I like the idea of generalizing it.  I somehow don't quite like BOS*, but I
> can't really put into words why, so...

Do you like BufferManagerRelation, BMR_REL(), BMR_SMGR()?

Just BM_ would clash with the flag namespace.

> > This is from a patch-set that I'm about to propose for 17, which needs
> > one of these too, hence desire to generalise.  But if we rename them
> > in 17, then AM authors, who are likely to discover and make use of
> > this interface, would have to use different names for 16 and 17.
>
> Makes sense to me.

Does anyone else want to object?  Restating the case in brief: commit
31966b15's naming is short-sighted and likely to lead to a
proliferation of similar things or a renaming in later releases.




Re: Rename ExtendedBufferWhat in 16?

2023-08-16 Thread Peter Geoghegan
On Wed, Aug 16, 2023 at 4:32 PM Thomas Munro  wrote:
> Does anyone else want to object?  Restating the case in brief: commit
> 31966b15's naming is short-sighted and likely to lead to a
> proliferation of similar things or a renaming in later releases.

+1 to proceeding with this change.

-- 
Peter Geoghegan




Re: Using defines for protocol characters

2023-08-16 Thread Michael Paquier
On Wed, Aug 16, 2023 at 12:29:56PM -0700, Nathan Bossart wrote:
> I moved the definitions out to a separate file in v6.

Looks sensible seen from here.

This patch is missing the installation of protocol.h in
src/tools/msvc/Install.pm for MSVC.  For pqcomm.h, we are doing that:
lcopy('src/include/libpq/pqcomm.h', $target . '/include/internal/libpq/')
|| croak 'Could not copy pqcomm.h';

So adding two similar lines for protocol.h should be enough (I assume,
did not test).

In fe-exec.c, we still have a few things for the type of objects to
work on:
- 'S' for statement.
- 'P' for portal.
Should these be added to protocol.h?  They are part of the extended
protocol.

The comment at the top of PQsendTypedCommand() mentions 'C' and 'D',
but perhaps these should be updated to the object names instead?

pqFunctionCall3(), for PQfn(), has a few more hardcoded characters for
its status codes.  I'm OK to do things incrementally so it's fine by
me to not add them now, just noticing on the way what could be added
to this new header.
--
Michael


signature.asc
Description: PGP signature


Re: Return value of pg_promote()

2023-08-16 Thread Michael Paquier
On Wed, Aug 16, 2023 at 05:02:09PM +0900, Michael Paquier wrote:
>  if (kill(PostmasterPid, SIGUSR1) != 0)
>  {
> -ereport(WARNING,
> -(errmsg("failed to send signal to postmaster: %m")));
>  (void) unlink(PROMOTE_SIGNAL_FILE);
> -PG_RETURN_BOOL(false);
> +ereport(ERROR,
> +(errmsg("failed to send signal to postmaster: %m")));
>  }
> 
> Shouldn't you assign an error code to this one rather than the
> default one for internal errors, like ERRCODE_SYSTEM_ERROR?
> 
>  /* return immediately if waiting was not requested */
> @@ -744,7 +743,9 @@ pg_promote(PG_FUNCTION_ARGS)
>   * necessity for manual cleanup of all postmaster children.
>   */
>  if (rc & WL_POSTMASTER_DEATH)
> -PG_RETURN_BOOL(false);
> +ereport(FATAL,
> +(errcode(ERRCODE_ADMIN_SHUTDOWN),
> + errmsg("terminating connection due to unexpected 
> postmaster exit")));
> 
> I would add an errcontext here, to let somebody know that the
> connection died while waiting for the promotion to be processed, say
> "while waiting on promotion".

I have just noticed that we do not have a CF entry for this proposal,
so I have added one with Laurenz as author:
https://commitfest.postgresql.org/44/4504/

For now the patch is waiting on author.  Could you address my
last review?
--
Michael


signature.asc
Description: PGP signature


Re: Rename ExtendedBufferWhat in 16?

2023-08-16 Thread Andres Freund
On 2023-08-17 11:31:27 +1200, Thomas Munro wrote:
> On Thu, Aug 17, 2023 at 10:49 AM Andres Freund  wrote:
> > On 2023-08-12 12:29:05 +1200, Thomas Munro wrote:
> > > Commit 31966b15 invented a way for functions dealing with relation
> > > extension to accept a Relation in online code and an SMgrRelation in
> > > recovery code (instead of using the earlier FakeRelcacheEntry
> > > concept).  It seems highly likely that future new bufmgr.c interfaces
> > > will face the same problem, and need to do something similar.  Let's
> > > generalise the names so that each interface doesn't have to re-invent
> > > the wheel?  ExtendedBufferWhat is also just not a beautiful name.  How
> > > about BufferedObjectSelector?  That name leads to macros BOS_SMGR()
> > > and BOS_REL().  Could also be BufMgrObject/BMO, ... etc etc.
> >
> > I like the idea of generalizing it.  I somehow don't quite like BOS*, but I
> > can't really put into words why, so...
> 
> Do you like BufferManagerRelation, BMR_REL(), BMR_SMGR()?
> 
> Just BM_ would clash with the flag namespace.

I like BMR better!




Re: pg_logical_emit_message() misses a XLogFlush()

2023-08-16 Thread Michael Paquier
On Wed, Aug 16, 2023 at 12:01:01PM +0200, Tomas Vondra wrote:
> To me losing messages seems like a bad thing, but if the users are aware
> of it and are fine with it ... I'm simply arguing that if we conclude
> this is a durability bug, we should not leave it unfixed because it
> might have performance impact.

I've been doing some digging here, and the original bdr repo posted at
[1] has a concept similar to LogLogicalMessage() called
LogStandbyMessage().  *All* the non-transactional code paths enforce
an XLogFlush() after *each* message logged.  So the original
expectation seems pretty clear to me: flushes were wanted.

[1]: https://github.com/2ndQuadrant/bdr

>> I've e.g. used non-transactional messages for:
>> 
>> - A non-transactional queuing system. Where sometimes one would dump a 
>> portion
>>   of tables into messages, with something like
>>   SELECT pg_logical_emit_message(false, 'app:', to_json(r)) FROM r;
>>   Obviously flushing after every row would be bad.
>> 
>>   This is useful when you need to coordinate with other systems in a
>>   non-transactional way. E.g. letting other parts of the system know that
>>   files on disk (or in S3 or ...) were created/deleted, since a database
>>   rollback wouldn't unlink/revive the files.
>> 
>> - Audit logging, when you want to log in a way that isn't undone by rolling
>>   back transaction - just flushing every pg_logical_emit_message() would
>>   increase the WAL flush rate many times, because instead of once per
>>   transaction, you'd now flush once per modified row. It'd basically make it
>>   impractical to use for such things.
>> 
>> - Optimistic locking. Emitting things that need to be locked on logical
>>   replicas, to be able to commit on the primary. A pre-commit hook would wait
>>   for the WAL to be replayed sufficiently - but only once per transaction, 
>> not
>>   once per object.
> 
> How come the possibility of losing messages is not an issue for these
> use cases? I mean, surely auditors would not like that, and I guess
> forgetting locks might be bad too.

+1.  Now I can also get why one may not want to flush every individual
message if you care only about a queue being flushed after generating
a series of them, so after sleeping on it I'm OK with the last patch I
posted where one can just choose what they want.  The default, though,
may be better as flush = true rather than false (the patch has kept
flush at false).

I am not sure where this is leading yet, so I have registered a CF
entry to keep track of that:
https://commitfest.postgresql.org/44/4505/
--
Michael


signature.asc
Description: PGP signature


Re: pgbench: allow to exit immediately when any client is aborted

2023-08-16 Thread Tatsuo Ishii
> About pgbench exit on abort v4:
> 
> Patch applies cleanly, compiles, local make check ok, doc looks ok.
> 
> This looks ok to me.

I have tested the v4 patch with default_transaction_isolation =
'repeatable read'.

pgbench --exit-on-abort -N -p 11002 -c 10 -T 30 test
pgbench (17devel, server 15.3)
starting vacuum...end.
transaction type: 
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
maximum number of tries: 1
duration: 30 s
number of transactions actually processed: 64854
number of failed transactions: 4 (0.006%)
latency average = 4.628 ms (including failures)
initial connection time = 49.526 ms
tps = 2160.827556 (without initial connection time)

Despite the 4 failed transactions (could not serialize access due to
concurrent update) pgbench ran normally, rather than aborting the
test. Is this an expected behavior?

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: WIP: new system catalog pg_wait_event

2023-08-16 Thread Masahiro Ikeda

Hi,

Thank you for creating the patch!
I think it is a very useful view as a user.

I will share some thoughts about the v6 patch.

1)

The regular expression needs to be changed in 
generate-wait_event_types.pl.

I have compared the documentation with the output of the pg_wait_event
view and found the following differences.

* For parameter names, the substitution for underscore is missing.
-ArchiveCleanupCommand Waiting for archive_cleanup_command to complete
-ArchiveCommand Waiting for archive_command to complete
+ArchiveCleanupCommand Waiting for archive-cleanup-command to complete
+ArchiveCommand Waiting for archive-command to complete
-RecoveryEndCommand Waiting for recovery_end_command to complete
+RecoveryEndCommand Waiting for recovery-end-command to complete
-RestoreCommand Waiting for restore_command to complete
+RestoreCommand Waiting for restore-command to complete

* The HTML tag match is not shortest match.
-frozenid Waiting to update pg_database.datfrozenxid and 
pg_database.datminmxid

+frozenid Waiting to update datminmxid

* There are two blanks before "about". Also, the quotes around
  "heavyweight" are removed (is it intended?)
-LockManager Waiting to read or update information about "heavyweight" 
locks
+LockManager Waiting to read or update information  about heavyweight 
locks


* Do we need "worker_spi_main" in the description? The name column
  shows the same one, so it could be omitted.

pg_wait_event
worker_spi_main Custom wait event "worker_spi_main" defined by 
extension
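The non-greedy-match problem in 1) is easy to see with a small regex example. Python syntax is used here for illustration only; the generator script generate-wait_event_types.pl is Perl, and the sample string below is hypothetical:

```python
import re

sample = 'update <s>a</s> and <s>b</s>'

# A greedy match swallows everything between the first '<' and last '>':
assert re.sub(r'<.*>', '', sample) == 'update '

# A shortest (non-greedy) match strips each tag pair individually:
assert re.sub(r'<.*?>', '', sample) == 'update a and b'

# With shortest match, markup is stripped without eating the text between
# tags, so identifiers inside the description survive intact:
doc = 'Waiting for <x>archive_cleanup_command</x> to complete'
assert re.sub(r'<.*?>', '', doc) == 'Waiting for archive_cleanup_command to complete'
```

In Perl the same distinction is `.*` versus `.*?` in the substitution pattern.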


2)

Would it be better to add "extension" meaning unassigned?

Extensions can add Extension and LWLock types to the list shown in 
Table 28.8 and Table 28.12. In some cases, the name of LWLock assigned 
by an extension will not be available in all server processes; it might 
be reported as just “extension” rather than the extension-assigned 
name.


3)

Would index == els be better for the following Assert?
+   Assert(index <= els);

4)

There is a typo. alll -> all
+   /* Allocate enough space for alll entries */

5)

BTW, although I think this is outside the scope of this patch,
it might be a good idea to be able to add a description to the
API for custom wait events.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: WIP: new system catalog pg_wait_event

2023-08-16 Thread Michael Paquier
On Thu, Aug 17, 2023 at 10:53:02AM +0900, Masahiro Ikeda wrote:
> BTW, although I think this is outside the scope of this patch,
> it might be a good idea to be able to add a description to the
> API for custom wait events.

Somebody on twitter has raised this point.  I am not sure that we need
to go down to that for the sake of this view, but I'm OK to..
Disagree and Commit to any consensus reached on this matter.
--
Michael


signature.asc
Description: PGP signature


Re: should frontend tools use syncfs() ?

2023-08-16 Thread Michael Paquier
On Wed, Aug 16, 2023 at 08:23:25AM -0700, Nathan Bossart wrote:
> Ah, it looks like this code used to treat fsync() errors as non-fatal, but
> it was changed in commit 1420617.  I still find it a bit strange that some
> errors that prevent a file from being sync'd are non-fatal while others
> _are_ fatal, but that is probably a topic for another thread.

Right.  That rings a bell.
--
Michael


signature.asc
Description: PGP signature


Re: WIP: new system catalog pg_wait_event

2023-08-16 Thread Masahiro Ikeda

On 2023-08-17 10:57, Michael Paquier wrote:

On Thu, Aug 17, 2023 at 10:53:02AM +0900, Masahiro Ikeda wrote:

BTW, although I think this is outside the scope of this patch,
it might be a good idea to be able to add a description to the
API for custom wait events.


Somebody on twitter has raised this point.  I am not sure that we need
to go down to that for the sake of this view, but I'm OK to..
Disagree and Commit to any consensus reached on this matter.


Oh, okay. Thanks for sharing the information.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: PG 16 draft release notes ready

2023-08-16 Thread Bruce Momjian
You can view the Postgres 16 release notes, with markup and links to our
docs, here:

https://momjian.us/pgsql_docs/release-16.html

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: PG 16 draft release notes ready

2023-08-16 Thread Bruce Momjian
On Wed, Aug  9, 2023 at 08:35:21PM -0400, Bruce Momjian wrote:
> On Sat, Aug  5, 2023 at 04:08:47PM -0700, Noah Misch wrote:
> > > Author: Robert Haas 
> > > 2022-08-25 [e3ce2de09] Allow grant-level control of role inheritance 
> > > behavior.
> > > -->
> > > 
> > > 
> > > 
> > > Allow GRANT to control role inheritance behavior (Robert Haas)
> > > 
> > > 
> > > 
> > > By default, role inheritance is controlled by the inheritance status of 
> > > the member role.  The new GRANT clauses WITH INHERIT and WITH ADMIN can 
> > > now override this.
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > Allow roles that create other roles to automatically inherit the new 
> > > role's rights or SET ROLE to the new role (Robert Haas, Shi Yu)
> > > 
> > > 
> > > 
> > > This is controlled by server variable createrole_self_grant.
> > > 
> > > 
> > 
> > Similarly, v16 radically changes the CREATE ROLE ... WITH INHERIT clause.  
> > The
> > clause used to "change the behavior of already-existing grants."  Let's 
> > merge
> > these two and move the combination to the incompatibilities section.
> 
> I need help with this.  I don't understand how they can be combined, and
> I don't understand the incompatibility text in commit e3ce2de09d:
> 
> If a GRANT does not specify WITH INHERIT, the behavior is based on
> whether the member role is marked INHERIT or NOINHERIT. This means
> that if all roles are marked INHERIT or NOINHERIT before any role
> grants are performed, the behavior is identical to what we had before;
> otherwise, it's different, because ALTER ROLE [NO]INHERIT now only
> changes the default behavior of future grants, and has no effect on
> existing ones.

I am waiting for an answer to this question, or can I assume the release
notes are acceptable?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Rename ExtendedBufferWhat in 16?

2023-08-16 Thread Thomas Munro
On Thu, Aug 17, 2023 at 12:42 PM Andres Freund  wrote:
> I like BMR better!

Thanks Andres and Peter.  Here's a version like that.  I hesitated
about BufMgrRelation instead, but neither name appears in code
currently and full words are better.  In this version I also renamed
all the 'eb' variables to 'bmr'.

If there are no more comments or objections, I'd like to push this to
16 and master in a day or two.
From b894f5fe17c45feca5943cdc657e6853a020084d Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Fri, 11 Aug 2023 15:07:50 +1200
Subject: [PATCH] ExtendBufferedWhat -> BufferManagerRelation.

Commit 31966b15 invented a way for functions dealing with relation
extension to accept a Relation in online code and an SMgrRelation in
recovery code.  It seems highly likely that other bufmgr.c interfaces
will face the same problem, and need to do something similar.
Generalize the names so that each interface doesn't have to re-invent
the wheel.

Back-patch to 16, which introduced this interface.  Since extension AM
authors are likely to use it, we don't want to change them in the next
release, so squeak this new name into the 16 release.

Reviewed-by: Peter Geoghegan 
Reviewed-by: Andres Freund 
Discussion: https://postgr.es/m/CA%2BhUKG%2B6tLD2BhpRWycEoti6LVLyQq457UL4ticP5xd8LqHySA%40mail.gmail.com
---
 contrib/bloom/blutils.c   |   2 +-
 src/backend/access/brin/brin.c|   4 +-
 src/backend/access/brin/brin_revmap.c |   2 +-
 src/backend/access/gin/gininsert.c|   4 +-
 src/backend/access/gin/ginutil.c  |   2 +-
 src/backend/access/gist/gist.c|   2 +-
 src/backend/access/gist/gistutil.c|   2 +-
 src/backend/access/hash/hashpage.c|   2 +-
 src/backend/access/heap/hio.c |   2 +-
 src/backend/access/heap/visibilitymap.c   |   2 +-
 src/backend/access/nbtree/nbtpage.c   |   2 +-
 src/backend/access/spgist/spgutils.c  |   2 +-
 src/backend/access/transam/xlogutils.c|   2 +-
 src/backend/commands/sequence.c   |   2 +-
 src/backend/storage/buffer/bufmgr.c   | 112 +++---
 src/backend/storage/buffer/localbuf.c |  10 +-
 src/backend/storage/freespace/freespace.c |   2 +-
 src/include/storage/buf_internals.h   |   2 +-
 src/include/storage/bufmgr.h  |  20 ++--
 src/tools/pgindent/typedefs.list  |   2 +-
 20 files changed, 90 insertions(+), 90 deletions(-)

diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index d935ed8fbd..a017d58bbe 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -386,7 +386,7 @@ BloomNewBuffer(Relation index)
 	}
 
 	/* Must extend the file */
-	buffer = ExtendBufferedRel(EB_REL(index), MAIN_FORKNUM, NULL,
+	buffer = ExtendBufferedRel(BMR_REL(index), MAIN_FORKNUM, NULL,
 			   EB_LOCK_FIRST);
 
 	return buffer;
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 3c6a956eaa..d4fec654bb 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -848,7 +848,7 @@ brinbuild(Relation heap, Relation index, IndexInfo *indexInfo)
 	 * whole relation will be rolled back.
 	 */
 
-	meta = ExtendBufferedRel(EB_REL(index), MAIN_FORKNUM, NULL,
+	meta = ExtendBufferedRel(BMR_REL(index), MAIN_FORKNUM, NULL,
 			 EB_LOCK_FIRST | EB_SKIP_EXTENSION_LOCK);
 	Assert(BufferGetBlockNumber(meta) == BRIN_METAPAGE_BLKNO);
 
@@ -915,7 +915,7 @@ brinbuildempty(Relation index)
 	Buffer		metabuf;
 
 	/* An empty BRIN index has a metapage only. */
-	metabuf = ExtendBufferedRel(EB_REL(index), INIT_FORKNUM, NULL,
+	metabuf = ExtendBufferedRel(BMR_REL(index), INIT_FORKNUM, NULL,
 EB_LOCK_FIRST | EB_SKIP_EXTENSION_LOCK);
 
 	/* Initialize and xlog metabuffer. */
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index f4271ba31c..84d627cb5c 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -569,7 +569,7 @@ revmap_physical_extend(BrinRevmap *revmap)
 	}
 	else
 	{
-		buf = ExtendBufferedRel(EB_REL(irel), MAIN_FORKNUM, NULL,
+		buf = ExtendBufferedRel(BMR_REL(irel), MAIN_FORKNUM, NULL,
 EB_LOCK_FIRST);
 		if (BufferGetBlockNumber(buf) != mapBlk)
 		{
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index be1841de7b..56968b95ac 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -440,9 +440,9 @@ ginbuildempty(Relation index)
 MetaBuffer;
 
 	/* An empty GIN index has two pages. */
-	MetaBuffer = ExtendBufferedRel(EB_REL(index), INIT_FORKNUM, NULL,
+	MetaBuffer = ExtendBufferedRel(BMR_REL(index), INIT_FORKNUM, NULL,
    EB_LOCK_FIRST | EB_SKIP_EXTENSION_LOCK);
-	RootBuffer = ExtendBufferedRel(EB_REL(index), INIT_FORKNUM, NULL,
+	RootBuffer = ExtendBufferedRel(BMR_REL(index), INIT_FORKNUM, NULL,
    EB_LOCK_FIRST | EB_SKIP_EXTENSION_LOCK);
 
 	/* Initialize and xlog metabuffer and root buffer. */

Re: should frontend tools use syncfs() ?

2023-08-16 Thread Michael Paquier
On Wed, Aug 16, 2023 at 08:17:05AM -0700, Nathan Bossart wrote:
> On Wed, Aug 16, 2023 at 08:10:10AM +0900, Michael Paquier wrote:
>> On Tue, Aug 08, 2023 at 01:06:06PM -0700, Nathan Bossart wrote:
>> +else
>> +{
>> +while (errno = 0, (de = readdir(dir)) != NULL)
>> +{
>> +charsubpath[MAXPGPATH * 2];
>> +
>> +if (strcmp(de->d_name, ".") == 0 ||
>> +strcmp(de->d_name, "..") == 0)
>> +continue;
>> 
>> It seems to me that there is no need to do that for in-place
>> tablespaces.  There are relative paths in pg_tblspc/, so they would be
>> taken care of by the syncfs() done on the main data folder.
>> 
>> This does not really check if the mount points of each tablespace is
>> different, as well.  For example, imagine that you have two
>> tablespaces within the same disk, syncfs() twice.  Perhaps, the
>> current logic is OK anyway as long as the behavior is optional, but it
>> should be explained in the docs, at least.
> 
> True.  But I don't know if checking the mount point of each tablespace is
> worth the complexity.

Perhaps worth a note, this would depend on statvfs(), which is not
that portable the last time I looked at it (NetBSD, some BSD-ish?  And
of course WIN32).

> In the worst case, we'll call syncfs() on the same
> file system a few times, which is probably still much faster in most cases.
> FWIW this is what recovery_init_sync_method does today, and I'm not aware
> of any complaints about this behavior.

Hmm.  Okay.

> The patch does have the following note:
> 
> +On Linux, syncfs may be used instead to ask the
> +operating system to synchronize the whole file systems that contain 
> the
> +data directory, the WAL files, and each tablespace.
> 
> Do you think that is sufficient, or do you think we should really clearly
> explain that you could end up calling syncfs() on the same file system a
> few times if your tablespaces are on the same disk?  I personally feel
> like that'd be a bit too verbose for the already lengthy descriptions of
> this setting.

It does not hurt to mention that the code syncfs()-es each tablespace
path (not in-place tablespaces), ignoring locations that share the
same mounting point, IMO.  For that, we'd better rely on
get_dirent_type() like the normal sync path.
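One way to skip duplicate syncfs() calls on tablespaces that share a mount point is to group paths by the device they live on. This is only a sketch of the idea under discussion — the patch itself simply syncs each path and tolerates duplicates — and it uses stat()'s st_dev rather than statvfs(), sidestepping the portability concern above:

```python
import os

def unique_filesystems(paths):
    """Return one representative path per distinct filesystem, using
    st_dev to detect paths that share a device/mount point."""
    seen = {}
    for p in paths:
        dev = os.stat(p).st_dev
        seen.setdefault(dev, p)    # keep the first path seen per device
    return list(seen.values())

# Two tablespace paths on the same disk collapse to one syncfs() target:
targets = unique_filesystems([".", "./"])
assert len(targets) == 1
```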

>> I'm finding a bit confusing that fsync_pgdata() is coded in such a way
>> that it does a silent fallback to the cascading syncs through
>> walkdir() when syncfs is specified but not available in the build.
>> Perhaps an error is more helpful because one would then know that they
>> are trying something that's not around?
> 
> If syncfs() is not available, SYNC_METHOD_SYNCFS won't even be defined, and
> parse_sync_method() should fail if "syncfs" is specified.  Furthermore, the
> relevant part of fsync_pgdata() won't be compiled in whenever HAVE_SYNCFS
> is not defined.

That feels structurally inconsistent with what we do with other
option sets that have library dependencies.  For example, look at
compression.h and what happens for pg_compress_algorithm.  So, it
seems to me that it would be more friendly to list SYNC_METHOD_SYNCFS
all the time in SyncMethod even if HAVE_SYNCFS is not around, and at
least generate a warning rather than having a platform-dependent set
of options?

SyncMethod may be a bit too generic as name for the option structure.
How about a PGSyncMethod or pg_sync_method?

>> I am a bit concerned about the amount of duplication this patch
>> introduces in the docs.  Perhaps this had better be moved into a new
>> section of the docs to explain the tradeoffs, with each tool linking
>> to it?
> 
> Yeah, this crossed my mind.  Do you know of any existing examples of
> options with links to a common section?  One problem with this approach is
> that there are small differences in the wording for some of the frontend
> utilities, so it might be difficult to cleanly unite these sections.

The closest thing I can think of is Color Support in section
Appendixes, that describes something shared across a lot of binaries
(that would be 6 tools with this patch).

>> Do we actually need --no-sync at all if --sync-method is around?  We
>> could have an extra --sync-method=none at option level with --no-sync
>> still around mainly for compatibility?  Or perhaps that's just
>> over-designing things?
> 
> I don't have a strong opinion.  We could take up deprecating --no-sync in a
> follow-up thread, though.  Like you said, we'll probably need to keep it
> around for backward compatibility, so it might not be worth the trouble.

Okay, maybe that's not worth it.
--
Michael


signature.asc
Description: PGP signature


Re: Minor configure/meson cleanup

2023-08-16 Thread Thomas Munro
On Tue, Aug 8, 2023 at 12:27 AM Tristan Partin  wrote:
> I agree that the change look good.

Thanks both.  Pushed.




Re: Extending SMgrRelation lifetimes

2023-08-16 Thread Thomas Munro
On Wed, Aug 16, 2023 at 4:11 AM Heikki Linnakangas  wrote:
> Makes sense.

Thanks for looking!

> If you change smgrclose() to do what smgrrelease() does now, then it
> will apply automatically to extensions.
>
> If an extension is currently using smgropen()/smgrclose() correctly,
> this patch alone won't make it wrong, so it's not very critical for
> extensions to adopt the change. However, if after this we consider it OK
> to hold a pointer to SMgrRelation for longer, and start doing that in
> the backend, then extensions need to be adapted too.

Yeah, that sounds quite compelling.  Let's try that way:

 * smgrrelease() is removed
 * smgrclose() now releases resources, but doesn't destroy until EOX
 * smgrdestroy() now frees memory, and should rarely be used

Still WIP while I think about edge cases, but so far I think this is
the better option.

> > While studying this I noticed a minor thinko in smgrrelease() in
> > 15+16, so here's a fix for that also.  I haven't figured out a
> > sequence that makes anything bad happen, but we should really
> > invalidate smgr_targblock when a relfilenode is reused, since it might
> > point past the end.  This becomes more obvious once smgrrelease() is
> > used for truncation, as proposed here.
>
> +1. You can move the smgr_targblock clearing out of the loop, though.

Right, thanks.  Pushed.
From 71648a5b137540320fe782116e81f80c053c7f2c Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Thu, 10 Aug 2023 18:16:31 +1200
Subject: [PATCH v2] Give SMgrRelation pointers a well-defined lifetime.

After calling smgropen(), it was not clear how long you could continue
to use the result, because various code paths including cache
invalidation could call smgrclose(), which freed the memory.

Guarantee that the object won't be destroyed until the end of the
current transaction, or in recovery, the commit/abort record that
destroys the underlying storage.

The existing smgrclose() function now closes files and forgets all state
except the rlocator, like smgrrelease() used to do, and additionally
"disowns" the object so that it is disconnected from any Relation and
queued for destruction at AtEOXact_SMgr(), unless it is re-owned by a
Relation again before then.

A new smgrdestroy() function is used by rare places that know there
should be no other references to the SMgrRelation.

The short version:

 * smgrrelease() is removed
 * smgrclose() now releases resources, but doesn't destroy until EOX
 * smgrdestroy() now frees memory, and should rarely be used

Existing code should be unaffected, but it is now possible for code that
has an SMgrRelation object to use it repeatedly during a transaction as
long as the storage hasn't been physically dropped.  Such code would
normally hold a lock on the relation.

Reviewed-by: Heikki Linnakangas 
Discussion: https://postgr.es/m/CA%2BhUKGJ8NTvqLHz6dqbQnt2c8XCki4r2QvXjBQcXpVwxTY_pvA%40mail.gmail.com
---
 src/backend/access/transam/xlogutils.c |  2 +-
 src/backend/postmaster/bgwriter.c  |  4 +--
 src/backend/postmaster/checkpointer.c  |  4 +--
 src/backend/storage/smgr/md.c  |  2 +-
 src/backend/storage/smgr/smgr.c| 46 --
 src/include/storage/smgr.h |  4 +--
 src/include/utils/rel.h|  6 
 7 files changed, 37 insertions(+), 31 deletions(-)

diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c
index e174a2a891..bed15dad0f 100644
--- a/src/backend/access/transam/xlogutils.c
+++ b/src/backend/access/transam/xlogutils.c
@@ -657,7 +657,7 @@ XLogDropDatabase(Oid dbid)
 	 * This is unnecessarily heavy-handed, as it will close SMgrRelation
 	 * objects for other databases as well. DROP DATABASE occurs seldom enough
 	 * that it's not worth introducing a variant of smgrclose for just this
-	 * purpose. XXX: Or should we rather leave the smgr entries dangling?
+	 * purpose.
 	 */
 	smgrcloseall();
 
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index caad642ec9..a0ecc7f841 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -202,7 +202,7 @@ BackgroundWriterMain(void)
 		 * where holding deleted files open causes various strange errors.
 		 * It's not clear we need it elsewhere, but shouldn't hurt.
 		 */
-		smgrcloseall();
+		smgrdestroyall();
 
 		/* Report wait end here, when there is no further possibility of wait */
 		pgstat_report_wait_end();
@@ -248,7 +248,7 @@ BackgroundWriterMain(void)
 			 * After any checkpoint, close all smgr files.  This is so we
 			 * won't hang onto smgr references to deleted files indefinitely.
 			 */
-			smgrcloseall();
+			smgrdestroyall();
 		}
 
 		/*
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index ace9893d95..6daccaab78 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -316,7 +316,7 @@ CheckpointerMain(void)
 		 * where holding delet

Re: New WAL record to detect the checkpoint redo location

2023-08-16 Thread Michael Paquier
On Tue, Aug 15, 2023 at 02:23:43PM +0530, Dilip Kumar wrote:
> Yeah, good idea, actually we can do this insert outside of the
> exclusive insert lock and set the LSN of this insert as the
> checkpoint redo location.  So now we do not need to compute the
> checkpoint redo based on the current insertion point; we can directly
> use the LSN of this record.  I have modified this and I have also
> moved some other code that is not required to be inside the WAL
> insertion lock.

Looking at this patch, I am a bit surprised to see that the redo point
maps to the end location of the CHECKPOINT_REDO record, as it is the
LSN returned by XLogInsert(), not its start LSN.  For example, after a
checkpoint:
=# CREATE EXTENSION pg_walinspect;
CREATE EXTENSION;
=# SELECT redo_lsn, checkpoint_lsn from pg_control_checkpoint();
 redo_lsn  | checkpoint_lsn
---+
 0/19129D0 | 0/1912A08
(1 row)
=# SELECT start_lsn, prev_lsn, end_lsn, record_type
 from pg_get_wal_record_info('0/19129D0');
 start_lsn | prev_lsn  |  end_lsn  |  record_type
---+---+---+---
 0/19129D0 | 0/19129B0 | 0/1912A08 | RUNNING_XACTS
(1 row)

In this case it matches with the previous record:
=# SELECT start_lsn, prev_lsn, end_lsn, record_type
 from pg_get_wal_record_info('0/19129B0');
 start_lsn | prev_lsn  |  end_lsn  |   record_type
---+---+---+-
 0/19129B0 | 0/1912968 | 0/19129D0 | CHECKPOINT_REDO
(1 row)

This could be used to cross-check that the first record replayed is of
the correct type.  The commit message of this patch says that "the
checkpoint-redo location is set at LSN of this record", which implies
that the start LSN of the record is tracked as the redo LSN, not the
end of it.  What's the intention here?
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] Add function to_oct

2023-08-16 Thread John Naylor
On Wed, Aug 16, 2023 at 9:24 PM Nathan Bossart wrote:
>
> On Wed, Aug 16, 2023 at 10:35:27AM +0700, John Naylor wrote:

> > Now I'm struggling to understand why each and every instance has its own
> > nominal buffer, passed down to the implementation. All we care about is the
> > result -- is there some reason not to confine the buffer declaration to the
> > general implementation?
>
> We can do that if we use a static variable, which is what I've done in v6.

That makes it a lexically-scoped global variable, which we don't need
either. Can we have the internal function allocate on the stack, then
call cstring_to_text() on that, returning the text result? That does its
own palloc.

Or maybe better, save the starting pointer, compute the length at the end,
and call cstring_to_text_with_len()?  (It seems we wouldn't need
the nul-terminator then, either.)

--
John Naylor
EDB: http://www.enterprisedb.com


Re: PG 16 draft release notes ready

2023-08-16 Thread Pavel Luzanov

On 17.08.2023 05:36, Bruce Momjian wrote:

On Wed, Aug  9, 2023 at 08:35:21PM -0400, Bruce Momjian wrote:

On Sat, Aug  5, 2023 at 04:08:47PM -0700, Noah Misch wrote:

Author: Robert Haas
2022-08-25 [e3ce2de09] Allow grant-level control of role inheritance behavior.

Allow GRANT to control role inheritance behavior (Robert Haas)

By default, role inheritance is controlled by the inheritance status of the
member role.  The new GRANT clauses WITH INHERIT and WITH ADMIN can now
override this.

Allow roles that create other roles to automatically inherit the new role's
rights or SET ROLE to the new role (Robert Haas, Shi Yu)

This is controlled by server variable createrole_self_grant.

Similarly, v16 radically changes the CREATE ROLE ... WITH INHERIT clause.  The
clause used to "change the behavior of already-existing grants."  Let's merge
these two and move the combination to the incompatibilities section.

I need help with this.  I don't understand how they can be combined, and
I don't understand the incompatibility text in commit e3ce2de09d:

 If a GRANT does not specify WITH INHERIT, the behavior is based on
 whether the member role is marked INHERIT or NOINHERIT. This means
 that if all roles are marked INHERIT or NOINHERIT before any role
 grants are performed, the behavior is identical to what we had before;
 otherwise, it's different, because ALTER ROLE [NO]INHERIT now only
 changes the default behavior of future grants, and has no effect on
 existing ones.

I am waiting for an answer to this question, or can I assume the release
notes are acceptable?


I can try to explain how I understand it myself.

In v15 and earlier, inheritance of privileges granted to a role depends on
the INHERIT attribute of the role:


create user alice;
grant pg_read_all_settings to alice;

By default, privileges are inherited:
\c - alice
show data_directory;
       data_directory
-----------------------------
 /var/lib/postgresql/15/main
(1 row)

After disabling the INHERIT attribute, privileges are not inherited:

\c - postgres
alter role alice noinherit;

\c - alice
show data_directory;
ERROR:  must be superuser or have privileges of pg_read_all_settings to examine "data_directory"


In v16, changing the INHERIT attribute on the alice role doesn't change the
inheritance behavior of already-granted roles. If we repeat the example,
alice still inherits pg_read_all_settings privileges after the INHERIT
attribute is disabled for the role.


Information for making decisions about role inheritance has been moved
from the role attribute to the GRANT role TO role [WITH INHERIT|NOINHERIT]
command, and can be viewed with the new \drg command:


\drg
                    List of role grants
 Role name |      Member of       |   Options    | Grantor
-----------+----------------------+--------------+----------
 alice     | pg_read_all_settings | INHERIT, SET | postgres
(1 row)

Changing the INHERIT attribute for a role now affects (as the default)
only future GRANT commands that lack an explicit INHERIT clause.


--
Pavel Luzanov
Postgres Professional: https://postgrespro.com





Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-16 Thread Amit Kapila
On Wed, Aug 16, 2023 at 3:55 PM Hayato Kuroda (Fujitsu) wrote:
>
> > > > It was primarily for upgrade purposes only. So, as we can't see a good
> > > > reason to go via pg_dump let's do it in upgrade unless someone thinks
> > > > otherwise.
> > >
> > > Removed the new option in pg_dump and modified pg_upgrade to
> > > directly use the slot info to restore the slot in the new cluster.
> >
> > In this version, creations of logical slots are serialized, whereas old
> > ones were parallelized per db. Do you think it should be parallelized
> > again? I have tested locally and it seemed harmless. Also, this approach
> > allows logging the executed SQLs.
>
> I updated the patch to allow parallel execution. Workers are launched per
> slot, and each one connects to the new node via psql and executes
> pg_create_logical_replication_slot().
>

Will it be beneficial for slots? Invoking a separate process each time
could be costlier than the slot creation itself. The other thing is that
during slot creation, the snapbuild waits for concurrent transactions to
finish, so that can also hurt the patch. I think we can test it by having
50, 100, or 500 slots on the old cluster and seeing if doing parallel
execution for the creation of those on the new cluster has any benefit
over serial execution.

> Moreover, the following points were changed for 0002.
>
> * Ensured to log executed SQLs for creating slots.
> * Fixed an issue where 'unreserved' slots could not be upgraded. This change
>   was not an expected one. Related discussion was [1].
> * Added checks for output plugin libraries. pg_upgrade ensures that plugins
>   referred to by old slots are installed to the new executable directory.
>
>

I think this is a good idea, but did you test it with out-of-core
plugins? If so, can you please share the results? Also, let's update
this information in the docs as well.

A few minor comments:
1. Why does the patch update the slot info at the end of
create_logical_replication_slots()? Can you please update the comments
to explain this?

2.
@@ -36,6 +36,7 @@ generate_old_dump(void)
  {
  char sql_file_name[MAXPGPATH],
  log_file_name[MAXPGPATH];
+
  DbInfo*old_db = &old_cluster.dbarr.dbs[dbnum];

Spurious line change.

-- 
With Regards,
Amit Kapila.




Fix an entry in wait_event_names.txt

2023-08-16 Thread Drouvot, Bertrand

Hi hackers,

While working on [1], it was noticed by Masahiro-san that the description
field in the new pg_wait_event view contains two consecutive blanks in one
row.

It turns out that this comes from wait_event_names.txt (added in fa88928).

Attached is a tiny patch to fix this entry in wait_event_names.txt (I
checked that no other entries have the same issue).

[1]: 
https://www.postgresql.org/message-id/735fbd560ae914c96faaa23cc8d9a118%40oss.nttdata.com

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From 8ed089064bc92f0e721ecff93fdb40cbb25c310e Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Thu, 17 Aug 2023 04:04:44 +
Subject: [PATCH v1] Fix incorrect entry in wait_event_names.txt

fa88928 has introduced wait_event_names.txt, and one of its entries had
some documentation fields with two blanks.
---
 src/backend/utils/activity/wait_event_names.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 100.0% src/backend/utils/activity/

diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index f9e01e33b1..4d74f0068e 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -332,7 +332,7 @@ WAIT_EVENT_DOCONLY  ReplicationOriginState  "Waiting to read or update the progres
 WAIT_EVENT_DOCONLY ReplicationSlotIO   "Waiting for I/O on a replication slot."
 WAIT_EVENT_DOCONLY LockFastPath    "Waiting to read or update a process' fast-path lock information."
 WAIT_EVENT_DOCONLY BufferMapping   "Waiting to associate a data block with a buffer in the buffer pool."
-WAIT_EVENT_DOCONLY LockManager "Waiting to read or update information  about heavyweight locks."
+WAIT_EVENT_DOCONLY LockManager "Waiting to read or update information about heavyweight locks."
 WAIT_EVENT_DOCONLY PredicateLockManager    "Waiting to access predicate lock information used by serializable transactions."
 WAIT_EVENT_DOCONLY ParallelHashJoin    "Waiting to synchronize workers during Parallel Hash Join plan execution."
 WAIT_EVENT_DOCONLY ParallelQueryDSA    "Waiting for parallel query dynamic shared memory allocation."
-- 
2.34.1



Re: WIP: new system catalog pg_wait_event

2023-08-16 Thread Drouvot, Bertrand

Hi,

On 8/16/23 2:08 PM, Michael Paquier wrote:

On Wed, Aug 16, 2023 at 01:43:35PM +0200, Drouvot, Bertrand wrote:

Yeah, agree, done that way in v6 (also added a test in 001_worker_spi.pl
to ensure that "worker_spi_main" is reported in pg_wait_event).


-typedef struct WaitEventExtensionEntryByName
-{
-   charwait_event_name[NAMEDATALEN];   /* hash key */
-   uint16  event_id;   /* wait event ID */
-} WaitEventExtensionEntryByName;

I'd rather keep all these structures local to wait_event.c, as these
cover the internals of the hash tables.

Could you switch GetWaitEventExtensionEntries() so that it returns a
list of strings or an array of char*?  We can probably live with the
list for that.



Yeah, I was not sure about this (returning a list of 
WaitEventExtensionEntryByName
or a list of wait event names) while working on v6.

That's true that the only need here is to get the names of the custom wait 
events.
Returning only the names would allow us to move the 
WaitEventExtensionEntryByName definition back
to the wait_event.c file.

It makes sense to me, done in v7 attached and renamed the function to 
GetWaitEventExtensionNames().


+   char buf[NAMEDATALEN + 44]; // "Custom wait event \"%Name%\" defined by extension"

Incorrect comment.  This would be simpler as a StringInfo.


Yeah and probably less error prone: done in v7.

While at it, v7 deliberately does not call "pfree(waiteventnames)" and
"resetStringInfo(&buf)" in pg_get_wait_events(): the reason is that they
are palloc'd in a short-lived memory context while executing
pg_get_wait_events().

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From 43baf166f14aeca4ef0979d93e9ed34fdc323a09 Mon Sep 17 00:00:00 2001
From: bdrouvotAWS 
Date: Sat, 5 Aug 2023 12:39:42 +
Subject: [PATCH v7] Add catalog pg_wait_event

Adding a new system view, namely pg_wait_event, that describes the wait
events.
---
 doc/src/sgml/system-views.sgml| 64 +
 src/backend/Makefile  |  3 +-
 src/backend/catalog/system_views.sql  |  3 +
 src/backend/utils/activity/.gitignore |  1 +
 src/backend/utils/activity/Makefile   |  8 +-
 .../activity/generate-wait_event_types.pl | 54 ++-
 src/backend/utils/activity/meson.build|  1 +
 src/backend/utils/activity/wait_event.c   | 44 +
 src/backend/utils/activity/wait_event_funcs.c | 93 +++
 src/include/catalog/pg_proc.dat   |  6 ++
 src/include/utils/meson.build |  4 +-
 src/include/utils/wait_event.h|  1 +
 .../modules/worker_spi/t/001_worker_spi.pl|  6 ++
 src/test/regress/expected/rules.out   |  4 +
 src/test/regress/expected/sysviews.out| 16 
 src/test/regress/sql/sysviews.sql |  4 +
 src/tools/msvc/Solution.pm|  3 +-
 src/tools/msvc/clean.bat  |  1 +
 18 files changed, 306 insertions(+), 10 deletions(-)
  16.3% doc/src/sgml/
  63.1% src/backend/utils/activity/
   3.7% src/include/catalog/
   3.0% src/test/modules/worker_spi/t/
   4.8% src/test/regress/expected/
   3.2% src/tools/msvc/
   5.5% src/

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 57b228076e..b2ed309fe2 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -221,6 +221,11 @@
   views
  
 
+ 
+  pg_wait_event
+  wait events
+ 
+
 

   
@@ -4825,4 +4830,63 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
   
  
 
+
+ 
+  pg_wait_event
+
+  
+   pg_wait_event
+  
+
+  
+   The view pg_wait_event provides description about the
+   wait events.
+  
+
+  
+   pg_wait_event Columns
+   
+
+ 
+  
+   Column Type
+  
+  
+   Description
+  
+ 
+
+
+
+ 
+  
+   type text
+  
+  
+   Wait event type
+  
+ 
+
+ 
+  
+   name text
+  
+  
+   Wait event name
+  
+ 
+
+ 
+  
+   description text
+  
+  
+   Wait event description
+  
+ 
+
+   
+  
+ 
+
 
diff --git a/src/backend/Makefile b/src/backend/Makefile
index 1c929383c4..3e275ac759 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -134,7 +134,7 @@ storage/lmgr/lwlocknames.h: storage/lmgr/generate-lwlocknames.pl storage/lmgr/lw
 	$(MAKE) -C storage/lmgr lwlocknames.h lwlocknames.c
 
 utils/activity/wait_event_types.h: utils/activity/generate-wait_event_types.pl utils/activity/wait_event_names.txt
-	$(MAKE) -C utils/activity wait_event_types.h pgstat_wait_event.c
+	$(MAKE) -C utils/activity wait_event_types.h pgstat_wait_event.c wait_event_funcs_data.c
 
 # run this unconditionally to avoid needing to know its dependencies here:
 submake-catalog-headers:
@@ -311,6 +311,7 @@ maintainer-clean: distclean
 

Re: WIP: new system catalog pg_wait_event

2023-08-16 Thread Drouvot, Bertrand

Hi,

On 8/17/23 3:53 AM, Masahiro Ikeda wrote:

Hi,

Thank you for creating the patch!
I think it is a very useful view as a user.

I will share some thoughts about the v6 patch.



Thanks for looking at it!


1)

The regular expression needs to be changed in generate-wait_event_types.pl.
I have compared the documentation with the output of the pg_wait_event
view and found the following differences.

* For parameter names, the substitution for underscore is missing.
-ArchiveCleanupCommand Waiting for archive_cleanup_command to complete
-ArchiveCommand Waiting for archive_command to complete
+ArchiveCleanupCommand Waiting for archive-cleanup-command to complete
+ArchiveCommand Waiting for archive-command to complete
-RecoveryEndCommand Waiting for recovery_end_command to complete
+RecoveryEndCommand Waiting for recovery-end-command to complete
-RestoreCommand Waiting for restore_command to complete
+RestoreCommand Waiting for restore-command to complete



Yeah, nice catch. v7, just shared up-thread, replaces "-" with "_"
for such cases. But I'm not sure the pg_wait_event description field needs
to be 100% aligned with the documentation: wouldn't it be better to replace
"-" with " " in such cases in pg_wait_event?


* The HTML tag match is not a shortest (non-greedy) match.
-frozenid Waiting to update pg_database.datfrozenxid and pg_database.datminmxid
+frozenid Waiting to update datminmxid



Nice catch, fixed in v7.


* There are two blanks before "about".


This is coming from wait_event_names.txt, thanks for pointing out.
I just submitted a patch [1] to fix that.


Also " for heavy weight is
   removed (is it intended?)
-LockManager Waiting to read or update information about "heavyweight" locks
+LockManager Waiting to read or update information  about heavyweight locks



Not intended, fixed in v7.


* Do we need "worker_spi_main" in the description? The name column
   shows the same one, so it could be omitted.

pg_wait_event
worker_spi_main Custom wait event "worker_spi_main" defined by extension




Do you mean remove the wait event name from the description in case of custom
extension wait events? I'd prefer to keep it; otherwise, the descriptions
would all be the same for custom wait events.


2)

Would it be better to add "extension" meaning unassigned?


Extensions can add Extension and LWLock types to the list shown in Table 28.8 
and Table 28.12. In some cases, the name of LWLock assigned by an extension 
will not be available in all server processes; it might be reported as just
"extension" rather than the extension-assigned name.




Yeah, could make sense but I think that should be a dedicated patch for the 
documentation.


3)

Would index == els be better for the following Assert?
+    Assert(index <= els);



Agree, done in v7.


4)

There is a typo. alll -> all
+    /* Allocate enough space for alll entries */



Thanks! Fixed in v7.

[1]: 
https://www.postgresql.org/message-id/dd836027-2e9e-4df9-9fd9-7527cd1757e1%40gmail.com

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: WIP: new system catalog pg_wait_event

2023-08-16 Thread Drouvot, Bertrand

Hi,

On 8/17/23 3:57 AM, Michael Paquier wrote:

On Thu, Aug 17, 2023 at 10:53:02AM +0900, Masahiro Ikeda wrote:

BTW, although I think this is outside the scope of this patch,
it might be a good idea to be able to add a description to the
API for custom wait events.


Somebody on twitter has raised this point.


I think I know him ;-)


 I am not sure that we need
to go down to that for the sake of this view, but I'm OK to..
Disagree and Commit to any consensus reached on this matter.


Yeah, I changed my mind on this. I think it's better to keep control
over what is displayed in the description field for this new system view.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Synchronizing slots from primary to standby

2023-08-16 Thread Drouvot, Bertrand

Hi,

On 8/14/23 11:52 AM, shveta malik wrote:



We (myself and Ajin) performed the tests to compute the lag in standby
slots as compared to primary slots with different number of slot-sync
workers configured.



Thanks!


3 DBs were created, each with 30 tables and each table having one
logical-pub/sub configured. So this made a total of 90 logical
replication slots to be synced. Then the workload was run for approx 10
mins. During this workload, at regular intervals, primary and standby
slots' lsns were captured (from pg_replication_slots) and compared. At
each capture, the intent was to know how much each standby slot is
lagging behind the corresponding primary slot, by taking the distance
between the confirmed_flush_lsn of the primary and standby slot. Then we
took the average (integer value) of this distance over the span of the
10 min workload


Thanks for the explanations, make sense to me.


and this is what we got:

With max_slot_sync_workers=1, average-lag =  42290.3563
With max_slot_sync_workers=2, average-lag =  24585.1421
With max_slot_sync_workers=3, average-lag =  14964.9215

This shows that more workers have better chances to keep logical
replication slots in sync for this case.



Agree.


Another statistic, if it interests you: we ran a frequency test as
well (by changing code, a sort of unit test) to figure out the
'total number of times synchronization was done' with different numbers
of slot-sync workers configured. Same 3 DBs setup with each DB having 30
logical replication slots. With 'max_slot_sync_workers' set at 1, 2
and 3, the total number of times synchronization was done was 15874,
20205 and 23414 respectively. Note: this is not on the same machine where
we captured the lsn-gap data; it is on a little less efficient machine,
but it gives almost the same picture.

Next we are planning to capture this data for a lesser number of slots
like 10,30,50 etc. It may happen that the benefit of multi-workers
over single workers in such cases could be less, but let's have the
data to verify that.



Thanks a lot for those numbers and for the testing!

Do you think it would make sense to also get the numbers using
the pg_failover_slots module? (and compare the pg_failover_slots numbers
with the "one worker" case here). The idea is to check whether the patch
introduces some overhead compared to pg_failover_slots.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-16 Thread Amit Kapila
On Wed, Aug 16, 2023 at 4:51 PM Zhijie Hou (Fujitsu) wrote:
>
> 4.
> +int        num_slots_on_old_cluster;
>
> Instead of a new global variable, would it be better to record this in the
> cluster info?
>

I was thinking whether we can go a step further and remove this variable
altogether. In the old cluster handling, we can get and check it together
in the same place; for the new cluster, a function that returns the slot
count by traversing the old cluster info should be sufficient. If you have
other better ideas to eliminate this variable, that is also fine. I think
this will make the patch a bit cleaner w.r.t. this new variable.

-- 
With Regards,
Amit Kapila.




Re: Fix an entry in wait_event_names.txt

2023-08-16 Thread Masahiro Ikeda

On 2023-08-17 14:49, Drouvot, Bertrand wrote:

Hi hackers,

While working on [1] it has been noticed by Masahiro-san that the
description field
in the new pg_wait_event view contains 2 blanks for one row.

It turns out that it comes from wait_event_names.txt (added in 
fa88928).


Attached a tiny patch to fix this entry in wait_event_names.txt (I did
check that no
other entries are in the same case).

[1]:
https://www.postgresql.org/message-id/735fbd560ae914c96faaa23cc8d9a118%40oss.nttdata.com

Regards,


+1. Thanks!

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: Synchronizing slots from primary to standby

2023-08-16 Thread shveta malik
On Thu, Aug 17, 2023 at 11:44 AM Drouvot, Bertrand wrote:
>
> Hi,
>
> On 8/14/23 11:52 AM, shveta malik wrote:
>
> >
> > We (myself and Ajin) performed the tests to compute the lag in standby
> > slots as compared to primary slots with different number of slot-sync
> > workers configured.
> >
>
> Thanks!
>
> > 3 DBs were created, each with 30 tables and each table having one
> > logical-pub/sub configured. So this made a total of 90 logical
> > replication slots to be synced. Then the workload was run for aprox 10
> > mins. During this workload, at regular intervals, primary and standby
> > slots' lsns were captured (from pg_replication_slots) and compared. At
> > each capture, the intent was to know how much is each standby's slot
> > lagging behind corresponding primary's slot by taking the distance
> > between confirmed_flush_lsn of primary and standby slot. Then we took
> > the average (integer value) of this distance over the span of 10 min
> > workload
>
> Thanks for the explanations, make sense to me.
>
> > and this is what we got:
> >
> > With max_slot_sync_workers=1, average-lag =  42290.3563
> > With max_slot_sync_workers=2, average-lag =  24585.1421
> > With max_slot_sync_workers=3, average-lag =  14964.9215
> >
> > This shows that more workers have better chances to keep logical
> > replication slots in sync for this case.
> >
>
> Agree.
>
> > Another statistics if it interests you is, we ran a frequency test as
> > well (this by changing code, unit test sort of) to figure out the
> > 'total number of times synchronization done' with different number of
> > sync-slots workers configured. Same 3 DBs setup with each DB having 30
> > logical replication slots. With 'max_slot_sync_workers' set at 1, 2
> > and 3; total number of times synchronization done was 15874, 20205 and
> > 23414 respectively. Note: this is not on the same machine where we
> > captured lsn-gap data, it is on  a little less efficient machine but
> > gives almost the same picture
> >
> > Next we are planning to capture this data for a lesser number of slots
> > like 10,30,50 etc. It may happen that the benefit of multi-workers
> > over single workers in such cases could be less, but let's have the
> > data to verify that.
> >
>
> Thanks a lot for those numbers and for the testing!
>
> Do you think it would make sense to also get the number of using
> the pg_failover_slots module? (and compare the pg_failover_slots numbers with 
> the
> "one worker" case here). Idea is to check if the patch does introduce
> some overhead as compare to pg_failover_slots.
>

Yes, definitely. We will work on that and share the numbers soon.

thanks
Shveta