Re: Rethink the wait event names for postgres_fdw, dblink and etc

2023-08-17 Thread Michael Paquier
On Fri, Aug 18, 2023 at 12:27:02PM +0900, Masahiro Ikeda wrote:
> I expect that no one will object to changing the names to appropriate
> ones. But, we need to discuss that naming convention, the names themselves,
> document descriptions and so on.
> 
> I made the v1 patch
> * CamelCase naming convention

Not sure how others feel about that, but I am OK with camel-case style
for the modules.

> I haven't added document descriptions for pg_prewarm and test modules.
> The reason is that the wait event of autoprewarm is not shown on
> pg_stat_activity. It's not an auxiliary-process and doesn't connect to
> a database, so pgstat_bestart() isn't called.

Perhaps we could just leave it out, then, adding a comment instead.

> Feedback is always welcome and appreciated.

+   /* first time, allocate or get the custom wait event */
+   if (wait_event_info == 0)
+   wait_event_info = WaitEventExtensionNew("DblinkConnect");
[...]
+   /* first time, allocate or get the custom wait event */
+   if (wait_event_info == 0)
+   wait_event_info = WaitEventExtensionNew("DblinkConnect");

Shouldn't dblink use two different strings?

+   if (wait_event_info == 0)
+   wait_event_info = WaitEventExtensionNew("PgPrewarmDumpDelay");

Same about autoprewarm.c.  The same flag is used in two different code
paths.  If removed from the patch, no need to do that, of course.

+static uint32 wait_event_info_connect = 0;
+static uint32 wait_event_info_receive = 0;
+static uint32 wait_event_info_cleanup_receive = 0;

Perhaps such variables could be named with shorter names proper to
each module, like pgfdw_we_receive, etc.
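
For illustration, a minimal sketch of the suggested renaming (the
pgfdw_we_* identifiers are only suggestions, not actual names from the
patch):

static uint32 pgfdw_we_connect = 0;
static uint32 pgfdw_we_receive = 0;
static uint32 pgfdw_we_cleanup_receive = 0;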

+   dblink could show the following wait event under the wait 

s/could show/can report/?

+  Waiting for same reason as PostgresFdwReceive, except that it's only for
+  abort.

"Waiting for transaction abort on remote server"?
---
Michael




Re: logical decoding and replication of sequences, take 2

2023-08-17 Thread Amit Kapila
On Thu, Aug 17, 2023 at 7:13 PM Ashutosh Bapat
 wrote:
>
> On Wed, Aug 16, 2023 at 7:56 PM Tomas Vondra
>  wrote:
> >
> > >
> > > But whether or not that's the case, downstream should not request (and
> > > hence receive) any changes that have been already applied (and
> > > committed) downstream as a principle. I think a way to achieve this is
> > > to update the replorigin_session_origin_lsn so that a sequence change
> > > applied once is not requested (and hence sent) again.
> > >
> >
> > I guess we could update the origin, per attached 0004. We don't have
> > timestamp to set replorigin_session_origin_timestamp, but it seems we
> > don't need that.
> >
> > The attached patch merges the earlier improvements, except for the part
> > that experimented with adding a "fake" transaction (which turned out to
> > have a number of difficult issues).
>
> 0004 looks good to me.


+ {
  CommitTransactionCommand();
+
+ /*
+ * Update origin state so we don't try applying this sequence
+ * change in case of crash.
+ *
+ * XXX We don't have replorigin_session_origin_timestamp, but we
+ * can just leave that set to 0.
+ */
+ replorigin_session_origin_lsn = seq.lsn;

IIUC, your proposal is to update the replorigin_session_origin_lsn, so
that after restart, it doesn't use some prior origin LSN to start with
which can in turn lead the sequence to go backward. If so, it should
be updated before calling CommitTransactionCommand() as we are doing
in apply_handle_commit_internal(). If that is not the intention then
it is not clear to me how updating replorigin_session_origin_lsn after
commit is helpful.
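
For reference, a rough sketch of the ordering being suggested,
mirroring apply_handle_commit_internal() (names taken from the quoted
hunk's context; this is not the committed code):

/* Update origin state so a crash cannot replay this change. */
replorigin_session_origin_lsn = seq.lsn;
replorigin_session_origin_timestamp = 0;	/* no timestamp available */
CommitTransactionCommand();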

>
> But I need to review the impact of not setting
> replorigin_session_origin_timestamp.
>

This may not have a direct impact on built-in replication, as I think
we don't rely on it yet, but we need to think about out-of-core
solutions. I am not sure if I understood your proposal, as per my
previous comment, but once you clarify it, I'll also try to think it
through.

-- 
With Regards,
Amit Kapila.




Re: Extract numeric filed in JSONB more effectively

2023-08-17 Thread jian he
On Fri, Aug 18, 2023 at 10:55 AM Chapman Flack  wrote:
>
>
> Again, all of that complication stems from the choice to use the
> anyelement return type and rely on polymorphic type resolution
> to figure the oid out, when we already have the oid to begin with
> and the oid is all we want.
>

Do you want jsonb_object_field_type(internal, jsonb, text)? Because at
the SQL level, it's safe.

The return data type is determined when we are in jsonb_cast_support.
We just need to pass the {return data type} information to the next
function: jsonb_object_field_type.




Re: New WAL record to detect the checkpoint redo location

2023-08-17 Thread Dilip Kumar
On Fri, Aug 18, 2023 at 5:24 AM Michael Paquier  wrote:
>
> On Thu, Aug 17, 2023 at 01:11:50PM +0530, Dilip Kumar wrote:
> > Yeah right, actually I was confused, I assumed it will return the
> > start LSN of the record.  And I do not see any easy way to identify
> > the Start LSN of this record so maybe this solution will not work.  I
> > will have to think of something else.  Thanks for pointing it out.
>
> About that.  One thing to consider may be ReserveXLogInsertLocation()
> while holding the WAL insert lock, but you can just rely on
> ProcLastRecPtr for the job after inserting the REDO record, no?

Yeah right, we can use ProcLastRecPtr.  I will send the updated patch.
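
For illustration, a rough sketch of the idea (XLOG_CHECKPOINT_REDO is
the new record proposed in this thread, so both the record name and the
surrounding code are hypothetical):

/* Insert the redo record, then read back its start LSN. */
XLogBeginInsert();
(void) XLogInsert(RM_XLOG_ID, XLOG_CHECKPOINT_REDO);
/*
 * ProcLastRecPtr is the start position, not the end, of the last
 * record inserted by this backend.
 */
checkPoint.redo = ProcLastRecPtr;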

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-17 Thread Peter Smith
Here are some review comments for the patch v21-0003

==
Commit message

1.
pg_upgrade fails if the old node has slots which status is 'lost' or they do not
consume all WAL records. These are needed for prevent the data loss.

~

Maybe some minor brush-up like:

SUGGESTION
In order to prevent data loss, pg_upgrade will fail if the old node
has slots with the status 'lost', or with unconsumed WAL records.

==
src/bin/pg_upgrade/check.c

2. check_for_confirmed_flush_lsn

+ /* Check that all logical slots are not in 'lost' state. */
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE temporary = false AND wal_status = 'lost';");
+
+ ntups = PQntuples(res);
+ i_slotname = PQfnumber(res, "slot_name");
+
+ for (i = 0; i < ntups; i++)
+ {
+ is_error = true;
+
+ pg_log(PG_WARNING,
+"\nWARNING: logical replication slot \"%s\" is obsolete.",
+PQgetvalue(res, i, i_slotname));
+ }
+
+ PQclear(res);
+
+ if (is_error)
+ pg_fatal("logical replication slots not to be in 'lost' state.");
+

2a. (GENERAL)
The above code for checking lost state seems out of place in this
function which is meant for checking confirmed flush lsn.

Maybe you jammed both kinds of logic into one function to save on the
extra PGconn or something but IMO two separate functions would be
better. e.g.
- check_for_lost_slots
- check_for_confirmed_flush_lsn
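
In outline, just a sketch of the suggested split (function names as
above, not actual patch code), each with its own connection and query:

static void check_for_lost_slots(ClusterInfo *cluster);
static void check_for_confirmed_flush_lsn(ClusterInfo *cluster);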

~

2b.
+ /* Check that all logical slots are not in 'lost' state. */

SUGGESTION
/* Check there are no logical replication slots with a 'lost' state. */

~

2c.
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE temporary = false AND wal_status = 'lost';");

This SQL fragment is very much like others in previous patches. Be
sure to make all the cases and clauses consistent with all those
similar SQL fragments.

~

2d.
+ is_error = true;

That doesn't need to be in the loop. Better to just say:
is_error = (ntups > 0);

~

2e.
There is a mix of terms in the WARNING and in the pg_fatal -- e.g.
"obsolete" versus "lost". Is it OK?

~

2f.
+ pg_fatal("logical replication slots not to be in 'lost' state.");

English? And maybe it should be much more verbose...

"Upgrade of this installation is not allowed because one or more
logical replication slots with a state of 'lost' were detected."

~~~

3. check_for_confirmed_flush_lsn

+ /*
+ * Check that all logical replication slots have reached the latest
+ * checkpoint position (SHUTDOWN_CHECKPOINT record). This checks cannot be
+ * done in case of live_check because the server has not been written the
+ * SHUTDOWN_CHECKPOINT record yet.
+ */
+ if (!live_check)
+ {
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE confirmed_flush_lsn != '%X/%X' AND temporary = false;",
+ old_cluster.controldata.chkpnt_latest_upper,
+ old_cluster.controldata.chkpnt_latest_lower);
+
+ ntups = PQntuples(res);
+ i_slotname = PQfnumber(res, "slot_name");
+
+ for (i = 0; i < ntups; i++)
+ {
+ is_error = true;
+
+ pg_log(PG_WARNING,
+"\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+PQgetvalue(res, i, i_slotname));
+ }
+
+ PQclear(res);
+ PQfinish(conn);
+
+ if (is_error)
+ pg_fatal("All logical replication slots consumed all the WALs.");

~

3a.
/This checks/This check/

~

3b.
I don't think the separation of
chkpnt_latest_upper/chkpnt_latest_lower is needed like this. AFAIK
there is an LSN_FORMAT_ARGS(lsn) macro designed for handling exactly
this kind of parameter substitution.
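
For example (assuming the control data kept a single XLogRecPtr field,
here hypothetically called chkpnt_latest):

res = executeQueryOrDie(conn,
    "SELECT slot_name FROM pg_catalog.pg_replication_slots "
    "WHERE confirmed_flush_lsn != '%X/%X' AND temporary = false;",
    LSN_FORMAT_ARGS(old_cluster.controldata.chkpnt_latest));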

~

3c.
+ is_error = true;

That doesn't need to be in the loop. Better to just say:
is_error = (ntups > 0);

~

3d.
+ pg_fatal("All logical replication slots consumed all the WALs.");

The message seems backward. shouldn't it say something like:
"Upgrade of this installation is not allowed because one or more
logical replication slots still have unconsumed WAL records."

==
src/bin/pg_upgrade/controldata.c

4. get_control_data

+ /*
+ * Upper and lower part of LSN must be read and stored
+ * separately because it is reported as %X/%X format.
+ */
+ cluster->controldata.chkpnt_latest_upper =
+ strtoul(p, , 16);
+ cluster->controldata.chkpnt_latest_lower =
+ strtoul(++slash, NULL, 16);

I felt that this field separation code is maybe not necessary. Please
refer to other review comments in this post.

==
src/bin/pg_upgrade/pg_upgrade.h

5. ControlData

+
+ uint32 chkpnt_latest_upper;
+ uint32 chkpnt_latest_lower;
 } ControlData;

~

Actually, I did not recognise the reason why this cannot be stored
properly as a single XLogRecPtr field. Please see other review
comments in this post.

==
.../t/003_logical_replication_slots.pl

6. GENERAL

Many of the changes to this file are just renaming the
'old_node'/'new_node' to 'old_publisher'/'new_publisher'.

This seems a basic change not really associated with this patch 0003.
To reduce the code churn, this change should be moved into the earlier
patch where this 

Re: Adding a LogicalRepWorker type field

2023-08-17 Thread shveta malik
On Fri, Aug 18, 2023 at 8:50 AM Amit Kapila  wrote:
>
> On Mon, Aug 14, 2023 at 12:08 PM Peter Smith  wrote:
> >
> > The main patch for adding the worker type enum has been pushed [1].
> >
> > Here is the remaining (rebased) patch for changing some previous
> > cascading if/else to switch on the LogicalRepWorkerType enum instead.
> >
>
> I see this as being useful if we plan to add more worker types. Does
> anyone else see this remaining patch as an improvement?
>

I feel it does give a tad more clarity for cases where we have an
'else' part with no clear comments or relevant keywords. As an
example, in function 'should_apply_changes_for_rel', we have:
else
return (rel->state == SUBREL_STATE_READY ||
(rel->state == SUBREL_STATE_SYNCDONE &&
 rel->statelsn <= remote_final_lsn));

It is difficult to figure out which worker this is if I do not know
the concept completely; 'case WORKERTYPE_APPLY' makes it easier for
the reader to understand.
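
A sketch of the switch form (enum and field names as in the pushed
patch, to the best of my knowledge; treat this as illustrative, not as
the actual remaining patch):

switch (MyLogicalRepWorker->type)
{
	case WORKERTYPE_TABLESYNC:
		/* Only apply changes for the relation being synced. */
		return MyLogicalRepWorker->relid == rel->localreloid;

	case WORKERTYPE_APPLY:
		return (rel->state == SUBREL_STATE_READY ||
				(rel->state == SUBREL_STATE_SYNCDONE &&
				 rel->statelsn <= remote_final_lsn));

	default:
		Assert(false);
		return false;
}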

thanks
Shveta




Rethink the wait event names for postgres_fdw, dblink and etc

2023-08-17 Thread Masahiro Ikeda

Hi,

Recently, an API to define custom wait events for extensions was
added:

* Change custom wait events to use dynamic shared hash tables (af720b4c5)

So, I'd like to rethink the wait event names for modules which use
"WAIT_EVENT_EXTENSION" wait events.
* postgres_fdw
* dblink
* pg_prewarm
* test_shm_mq
* worker_spi

I expect that no one will object to changing the names to appropriate
ones. But, we need to discuss that naming convention, the names themselves,
document descriptions and so on.

I made the v1 patch
* CamelCase naming convention
* Add document descriptions for each module

I haven't added document descriptions for pg_prewarm and test modules.
The reason is that the wait event of autoprewarm is not shown on
pg_stat_activity. It's not an auxiliary process and doesn't connect to
a database, so pgstat_bestart() isn't called.

Feedback is always welcome and appreciated.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

From 73c4c6562509465bea75a9bbd273298bdf0ee85e Mon Sep 17 00:00:00 2001
From: Masahiro Ikeda 
Date: Fri, 18 Aug 2023 11:38:05 +0900
Subject: [PATCH] Make to use custom wait events for modules

---
 contrib/dblink/dblink.c   | 15 +-
 contrib/pg_prewarm/autoprewarm.c  | 13 -
 contrib/postgres_fdw/connection.c | 23 ++--
 doc/src/sgml/dblink.sgml  | 26 +
 doc/src/sgml/postgres-fdw.sgml| 53 +++
 doc/src/sgml/xfunc.sgml   |  6 +--
 src/test/modules/test_shm_mq/setup.c  |  9 +++-
 src/test/modules/test_shm_mq/test.c   |  9 +++-
 .../modules/worker_spi/t/001_worker_spi.pl|  8 +--
 src/test/modules/worker_spi/worker_spi.c  |  2 +-
 10 files changed, 147 insertions(+), 17 deletions(-)

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 41e1f6c91d..26fcd093c7 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -130,6 +130,9 @@ static void restoreLocalGucs(int nestlevel);
 static remoteConn *pconn = NULL;
 static HTAB *remoteConnHash = NULL;
 
+/* value cached, fetched from shared memory */
+static uint32 wait_event_info = 0;
+
 /*
  *	Following is list that holds multiple remote connections.
  *	Calling convention of each dblink function changes to accept
@@ -202,8 +205,12 @@ dblink_get_conn(char *conname_or_str,
 			connstr = conname_or_str;
 		dblink_connstr_check(connstr);
 
+		/* first time, allocate or get the custom wait event */
+		if (wait_event_info == 0)
+			wait_event_info = WaitEventExtensionNew("DblinkConnect");
+
 		/* OK to make connection */
-		conn = libpqsrv_connect(connstr, WAIT_EVENT_EXTENSION);
+		conn = libpqsrv_connect(connstr, wait_event_info);
 
 		if (PQstatus(conn) == CONNECTION_BAD)
 		{
@@ -292,8 +299,12 @@ dblink_connect(PG_FUNCTION_ARGS)
 	/* check password in connection string if not superuser */
 	dblink_connstr_check(connstr);
 
+	/* first time, allocate or get the custom wait event */
+	if (wait_event_info == 0)
+		wait_event_info = WaitEventExtensionNew("DblinkConnect");
+
 	/* OK to make connection */
-	conn = libpqsrv_connect(connstr, WAIT_EVENT_EXTENSION);
+	conn = libpqsrv_connect(connstr, wait_event_info);
 
 	if (PQstatus(conn) == CONNECTION_BAD)
 	{
diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index d0efc9e524..8c2581f457 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -104,6 +104,7 @@ static AutoPrewarmSharedState *apw_state = NULL;
 /* GUC variables. */
 static bool autoprewarm = true; /* start worker? */
 static int	autoprewarm_interval = 300; /* dump interval */
+static uint32 wait_event_info = 0; /* value cached, fetched from shared memory */
 
 /*
  * Module load callback.
@@ -231,13 +232,21 @@ autoprewarm_main(Datum main_arg)
 			ProcessConfigFile(PGC_SIGHUP);
 		}
 
+		/*
+		 * allocate or get the custom wait event though it will not be shown
+		 * on pg_stat_activity because the autoprewarm worker doesn't connect
+		 * a database.
+		 */
+		if (wait_event_info == 0)
+			wait_event_info = WaitEventExtensionNew("PgPrewarmDumpDelay");
+
 		if (autoprewarm_interval <= 0)
 		{
 			/* We're only dumping at shutdown, so just wait forever. */
 			(void) WaitLatch(MyLatch,
 			 WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
 			 -1L,
-			 WAIT_EVENT_EXTENSION);
+			wait_event_info);
 		}
 		else
 		{
@@ -264,7 +273,7 @@ autoprewarm_main(Datum main_arg)
 			(void) WaitLatch(MyLatch,
 			 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 			 delay_in_ms,
-			 WAIT_EVENT_EXTENSION);
+			 wait_event_info);
 		}
 
 		/* Reset the latch, loop. */
diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 7e12b722ec..c88624763d 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -83,6 +83,11 @@ static unsigned int prep_stmt_number = 0;
 /* tracks whether any work is needed in callback functions 

Re: Adding a LogicalRepWorker type field

2023-08-17 Thread Amit Kapila
On Mon, Aug 14, 2023 at 12:08 PM Peter Smith  wrote:
>
> The main patch for adding the worker type enum has been pushed [1].
>
> Here is the remaining (rebased) patch for changing some previous
> cascading if/else to switch on the LogicalRepWorkerType enum instead.
>

I see this as being useful if we plan to add more worker types. Does
anyone else see this remaining patch as an improvement?

-- 
With Regards,
Amit Kapila.




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-17 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

> > I have not used other plugins, but forcibly renamed the shared object file.
> > I would test by plugins like wal2json[1] if more cases are needed.
> >
> > 1. created logical replication slots on old node
> >   SELECT * FROM pg_create_logical_replication_slot('test', 'test_decoding')
> > 2. stopped the old node
> > 3. forcibly renamed the so file. I used following script:
> >   sudo mv /path/to/test_decoding.so /path/to//test\"_decoding.so
> > 4. executed pg_upgrade and failed. Outputs what I got were:
> >
> > ```
> > Checking for presence of required libraries fatal
> >
> 
> Your test sounds reasonable but there is no harm in testing wal2json
> or some other plugin just to mimic the actual production scenario.
> Additionally, it would give us better coverage for the patch by
> testing out-of-core plugins for some other tests as well.

I've tested with wal2json, decoder_raw[1], and my own small decoder. The
results were the same: pg_upgrade correctly raised an ERROR. The
following demo shows the case for wal2json.

In this test, the plugin was installed only on the old node and a slot
was created. Below is the created slot:

```
(Old)=# SELECT slot_name, plugin FROM pg_replication_slots;
 slot_name |  plugin
-----------+----------
 test      | wal2json
(1 row)
```

And I confirmed that the plugin worked well via pg_logical_slot_get_changes()
(this was needed to advance the confirmed_flush_lsn):

```
(Old)=# INSERT INTO foo VALUES (1);
INSERT 0 1
(Old)=# SELECT * FROM pg_logical_slot_get_changes('test', NULL, NULL);
   lsn    | xid |                                                              data
----------+-----+----------------------------------------------------------------------------------------------------------------------------------
 0/63C8A8 | 731 | {"change":[{"kind":"insert","schema":"public","table":"foo","columnnames":["id"],"columntypes":["integer"],"columnvalues":[1]}]}
(1 row)
```

Then pg_upgrade was executed and failed, the same as in the previous example.

```
Checking for presence of required libraries fatal

Your installation references loadable libraries that are missing from the
new installation.  You can add these libraries to the new installation,
or remove the functions using them from the old installation.  A list of
problem libraries is in the file:
data_N3/pg_upgrade_output.d/20230818T030006.675/loadable_libraries.txt
Failure, exiting
```

In loadable_libraries.txt, it is mentioned that wal2json is not
installed in the new directory.

```
could not load library "wal2json": ERROR:  could not access file "wal2json": No such file or directory
In database: postgres
```

Note that the upgrade succeeded when the plugin was installed in the new binary directory as well.

Acknowledgement: Thank you Michael and Euler for creating great plugins!

[1]: https://github.com/michaelpq/pg_plugins

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Extract numeric filed in JSONB more effectively

2023-08-17 Thread Chapman Flack

On 2023-08-17 21:14, Andy Fan wrote:

>> The idea of an 'internal' return type with no 'internal' parameter
>> was quickly and rightly shot down.
>
> Yes, it mainly breaks the type-safety system.  The parser needs to know
> the result type, so PG defines the rule like this:

Well, the reason "internal return type with no internal parameter type"
was shot down was more specific: if there is such a function in the
catalog, an SQL user can call it, and then its return type is a value
typed 'internal', and with that the SQL user could call other
functions with 'internal' parameters, and that's what breaks type
safety. The specific problem is not having at least one 'internal'
input parameter.

There are lots of functions in the catalog with internal return type
(I count 111). They are not inherently bad; the rule is simply that
each one also needs at least one IN parameter typed internal, to
make sure it can't be directly called from SQL.


> anyelement fn(anyelement in);
>
> if the exprType(in) in the query tree is X, then PG would think fn
> return type X.  that's why we have to have an anyelement in the
> input.


That's a consequence of the choice to have anyelement as the return
type, though. A different choice wouldn't have that consequence.


> I have some trouble understanding this.  Are you saying something
> like:
>
> internal fn(internal jsonValue,  internal typeOid)?
>
> If so, would it break the type-safety system?


That is what I'm saying, and it doesn't break type safety at the
SQL level, because as long as it has parameters declared internal,
no SQL can ever call it. So it can only appear in an expression
tree because your SupportRequestSimplify put it there properly
typed, after the SQL query was parsed but before evaluation.

The thing about 'internal' is it doesn't represent any specific
type, it doesn't necessarily represent the same type every time it
is mentioned, and it often means something that isn't a cataloged
type at all, such as a pointer to some kind of struct. There must be
documentation explaining what it has to be. For example, your
jsonb_cast_support function has an 'internal' parameter and
'internal' return type. From the specification for support
functions, you know the 'internal' for the parameter type means
"one of the Node structs in supportnodes.h", and the 'internal'
for the return type means "an expression tree semantically
equivalent to the FuncExpr".

So, in addition to declaring
internal fn(internal jsonValue,  internal typeOid), you would
have to write a clear spec that jsonValue has to be a JsonbValue,
typeOid has to be something you can call DatumGetObjectId on,
and the return value should be a Datum in proper form
corresponding to typeOid. And, of course, generate the expression
tree so all of that is true when it's evaluated.
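
A self-contained sketch of such a function under the spec above (the
function name, the supported types, and the error handling are all
hypothetical; this is not code from the patch):

#include "postgres.h"
#include "fmgr.h"
#include "catalog/pg_type_d.h"
#include "utils/jsonb.h"
#include "utils/numeric.h"

/*
 * Both SQL-level parameters would be declared 'internal', so SQL can
 * never call this directly; only the prosupport rewrite injects it.
 */
Datum
jsonb_field_as_type(PG_FUNCTION_ARGS)
{
	JsonbValue *jbv = (JsonbValue *) PG_GETARG_POINTER(0);
	Oid			targetOid = DatumGetObjectId(PG_GETARG_DATUM(1));

	if (jbv->type == jbvNumeric && targetOid == NUMERICOID)
		PG_RETURN_NUMERIC(jbv->val.numeric);

	elog(ERROR, "unsupported target type %u", targetOid);
	PG_RETURN_NULL();			/* unreachable; keeps the compiler quiet */
}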


>> Perhaps there are parts of that rewriting that no existing node type
>> can represent?


The description above was in broad strokes. Because I haven't
tried to implement this, I don't know whether some roadblock would
appear, such as, is it hard to make a Const node of type internal
and containing an oid? Or, what sort of node must be inserted to
clarify that the 'internal' return is actually a Datum of the
expected type? By construction, we know that it is, but how to
make that explicit in the expression tree?


> a) we can't use makeNullConst because jsonb_xxx_type is strict, so
> if we have a NULL constant input here, the PG system will return
> NULL directly.  b) Not only the type oid is the thing we are
> interested in; const.constvalue is as well, since 'explain select '
> accesses it to show it as a string.  Datum(0) as the constvalue
> will crash in this sense.  That's why makeDummyConst was introduced.


Again, all of that complication stems from the choice to use the
anyelement return type and rely on polymorphic type resolution
to figure the oid out, when we already have the oid to begin with
and the oid is all we want.


something like "assertion of
'internal'-to-foo binary coercibility, vouched by a prosupport
function", would that be a bad thing?


> I can't follow this either.


That was just another way of saying what I was getting at above
about what's needed in the expression tree to indicate that the
'internal' produced by this function is, in fact, really a bool
(or whatever). We know that it is, but perhaps the expression
tree will be considered ill-formed without a node that says so.
A node representing a no-op, binary conversion would suffice,
but is there already a node that's allowed to represent an
internal-to-bool no-op cast?

If there isn't, one might have to be invented. So it might be that
if we go down the "use polymorphic resolution" road, we have to
invent dummy Consts, and down the "internal" road we also have to
invent something, like the "no-op cast considered correct because
a SupportRequestSimplify function put it here" node.

If it came down to having to invent one of those things or the
other, I'd think the latter more directly 

Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-17 Thread Amit Kapila
On Thu, Aug 17, 2023 at 2:10 PM Peter Smith  wrote:
>
> Here are some review comments for the first 2 patches.
>
>
> 3.
>
> +
> + Before you start upgrading the publisher node, ensure that the
> + subscription is temporarily disabled. After the upgrade is complete,
> + execute the
> + ALTER
> SUBSCRIPTION ... DISABLE
> + command to update the connection string, and then re-enable the
> + subscription.
> +
>
> 3a.
> That link made no sense in this context.
>
> Don't you mean to say:
> ALTER SUBSCRIPTION ... CONNECTION ...
>

I think the command is correct here, but the wording should mention
disabling the subscription.

>
>   /*
> - * Fetch all libraries containing non-built-in C functions in this DB.
> + * Fetch all libraries containing non-built-in C functions and
> + * output plugins in this DB.
>   */
>   ress[dbnum] = executeQueryOrDie(conn,
>   "SELECT DISTINCT probin "
>   "FROM pg_catalog.pg_proc "
>   "WHERE prolang = %u AND "
>   "probin IS NOT NULL AND "
> - "oid >= %u;",
> + "oid >= %u "
> + "UNION "
> + "SELECT DISTINCT plugin "
> + "FROM pg_catalog.pg_replication_slots "
> + "WHERE wal_status <> 'lost' AND "
> + "database = current_database() AND "
> + "temporary IS FALSE;",
>   ClanguageId,
>   FirstNormalObjectId);
>   totaltups += PQntuples(ress[dbnum]);
>
> ~
>
> Maybe it is OK, but it somehow seems like the new logic has been
> jammed into the get_loadable_libraries() function for coding
> convenience. For example, all the names (function names, variable
> names, structure field names) are referring to "libraries", so the
> plugin seems a bit out of place.
>

But a library with the same name (as the plugin) should exist for the
upgrade of slots. I feel doing it separately could either lead to
redundant code or a different way to achieve the same thing. Do you
envision any problem that we are not seeing?

> ~~~
> 10. get_loadable_libraries
>
> + "UNION "
> + "SELECT DISTINCT plugin "
> + "FROM pg_catalog.pg_replication_slots "
> + "WHERE wal_status <> 'lost' AND "
> + "database = current_database() AND "
> + "temporary IS FALSE;",
>
> IMO this SQL might be more readable if it uses an alias (like 'rs')
> for the catalog. Then rs.wal_status, rs.database, rs.temporary etc.
>

Then it will become inconsistent with the existing query, which doesn't
use any alias. So, I think we should either change the existing query
to use an alias or not use one at all, as the patch does. I would
prefer the latter.

>
> 16. create_logical_replication_slots
>
> + appendPQExpBuffer(query, "SELECT
> pg_catalog.pg_create_logical_replication_slot(");
> + appendStringLiteral(query, slot_arr->slots[slotnum].slotname,
> + slot_arr->encoding, slot_arr->std_strings);
> + appendPQExpBuffer(query, ", ");
> + appendStringLiteral(query, slot_arr->slots[slotnum].plugin,
> + slot_arr->encoding, slot_arr->std_strings);
> + appendPQExpBuffer(query, ", false, %s);",
> +   slot_arr->slots[slotnum].two_phase ? "true" : "false");
>
> I noticed that the function comment for appendStringLiteral() says:
> "We need it in situations where we do not have a PGconn available.
> Where we do, appendStringLiteralConn is a better choice.".
>
> But in this code, we *do* have PGconn available. So, shouldn't we be
> following the advice of the appendStringLiteral() function comment and
> use the other API instead?
>

I think that will avoid maintaining encoding and std_strings in the
slot's array. So, this sounds like a good idea to me.
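
For reference, a sketch of what the suggested change might look like
(based on the quoted hunk; "conn" is the PGconn already available at
that point):

appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
appendStringLiteralConn(query, slot_arr->slots[slotnum].slotname, conn);
appendPQExpBufferStr(query, ", ");
appendStringLiteralConn(query, slot_arr->slots[slotnum].plugin, conn);
appendPQExpBuffer(query, ", false, %s);",
				  slot_arr->slots[slotnum].two_phase ? "true" : "false");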

-- 
With Regards,
Amit Kapila.




Re: Remove distprep

2023-08-17 Thread Andres Freund
On 2023-08-18 10:11:12 +0900, Michael Paquier wrote:
> On Wed, Aug 09, 2023 at 07:38:40AM -0700, Andres Freund wrote:
> > On 2023-08-09 16:25:28 +0200, Christoph Berg wrote:
> >> Understood, I was just pointing out there are more types of generated
> >> files in there.
> > 
> > The situation for configure is somewhat different, due to being maintained in
> > the repository, rather than just being included in the tarball...
> 
> This one comes down to Debian that patches autoconf with its own set
> of options, requiring a new ./configure in the tree, right?

I'm not sure what you're really asking here?




Re: pg_upgrade - typo in verbose log

2023-08-17 Thread Peter Smith
On Fri, Aug 18, 2023 at 10:47 AM Michael Paquier  wrote:
>
> On Thu, Aug 17, 2023 at 06:09:31PM +1000, Peter Smith wrote:
> > Ping.
>
> Funnily enough, I was looking at this entry yesterday, before you
> replied, and was wondering what's happening here.
>
> > I thought v2 was ready to be pushed, but then this thread went silent
> > for 3 months
>
> Change looks fine, so applied.

Thanks!

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Extract numeric filed in JSONB more effectively

2023-08-17 Thread Andy Fan
Hi Chapman,

Thanks for the review!

> The idea of an 'internal' return type with no 'internal' parameter
> was quickly and rightly shot down.


Yes, it mainly breaks the type-safety system.  The parser needs to know
the result type, so PG defines the rule like this:

anyelement fn(anyelement in);

if the exprType(in) in the query tree is X, then PG would think fn
return type X.  that's why we have to have an anyelement in the
input.


> But it would have seemed to me
> enough to address that objection by using 'internal' also in its
> parameter list. I could imagine a function declared with two
> 'internal' parameters, one understood to be a JsonbValue and one
> understood to be a type oid, and an 'internal' result, treated in
> the rewritten expression tree as binary-coercible to the desired
> result.
>

I have some trouble understanding this.  Are you saying something
like:

internal fn(internal jsonValue,  internal typeOid)?

If so, would it break the type-safety system?  And I'm not quite
sure about the 'binary-coercible' here.  Is it the same as the
'binary-coercible' in "timestamp is not binary coercible with
timestamptz since..."?  I have a strong feeling that I misunderstood
you here.


> Perhaps there are parts of that rewriting that no existing node type
> can represent?
>

I didn't understand this either :(

> But I have the sense that that approach was abandoned early, in
> favor of the current approach using user-visible polymorphic
> types, and supplying typed dummy constants for use in the
> resolution of those types, with a new function introduced to create
> said dummy constants, including allocation and input conversion
> in the case of numeric, just so said dummy constants can be
> passed into functions that have no use for them other than to
> call get_fn_expr_argtype to recover the type oid, which was the
> only thing needed in the first place.


Yes, but if we follow the type-safety system, we can't simply input
an Oid targetOid; then there are some more considerations here:
a) we can't use makeNullConst because jsonb_xxx_type is strict, so
if we have a NULL constant input here, the PG system will return
NULL directly.  b) Not only the type oid is the thing we are
interested in; const.constvalue is as well, since 'explain select '
accesses it to show it as a string.  Datum(0) as the constvalue
will crash in this sense.  That's why makeDummyConst was introduced.


> something like "assertion of
> 'internal'-to-foo binary coercibility, vouched by a prosupport
> function", would that be a bad thing?
>

I can't follow this either.  Could you provide the function prototype
here?

-- 
Best Regards
Andy Fan


Re: Remove distprep

2023-08-17 Thread Michael Paquier
On Wed, Aug 09, 2023 at 07:38:40AM -0700, Andres Freund wrote:
> On 2023-08-09 16:25:28 +0200, Christoph Berg wrote:
>> Understood, I was just pointing out there are more types of generated
>> files in there.
> 
> The situation for configure is somewhat different, due to being maintained in
> the repository, rather than just being included in the tarball...

This one comes down to Debian that patches autoconf with its own set
of options, requiring a new ./configure in the tree, right?
--
Michael




Re: Query regarding sharing data directory

2023-08-17 Thread Michael Paquier
On Thu, Aug 17, 2023 at 10:51:45AM +0530, Harsh N Bhatt wrote:
> I have a doubt regarding the storage of the data directory path of
> PostgreSQL. I'm using PostgreSQL server 15 in which I want to give a path
> to an external drive RAID Memory Storage. Which is on the LAN Network? Is
> it compatible or not? If it is then which file system is suitable: NAS or
> SAN?
> 
> If it is, can you share any documents with me?

If you can connect to your server, you could use the following query
to know where your data folder is:
SHOW data_directory;

The location of the data directory is something that distributions and
installers set by themselves, so in short it depends on your
environment, unless you set up the cluster yourself.
--
Michael




Re: pg_upgrade - typo in verbose log

2023-08-17 Thread Michael Paquier
On Thu, Aug 17, 2023 at 06:09:31PM +1000, Peter Smith wrote:
> Ping.

Funnily enough, I was looking at this entry yesterday, before you
replied, and was wondering what's happening here.

> I thought v2 was ready to be pushed, but then this thread went silent
> for 3 months

Change looks fine, so applied.
--
Michael




Re: Ignore 2PC transaction GIDs in query jumbling

2023-08-17 Thread Michael Paquier
On Tue, Aug 15, 2023 at 08:49:37AM +0900, Michael Paquier wrote:
> Hmm.  One issue with the patch is that we finish by considering
> DEALLOCATE ALL and DEALLOCATE $1 as the same thing, compiling the
> same query IDs.  The difference is made in the Nodes by assigning NULL
> to the name, but we would now ignore it.  Wouldn't it be better to add
> an extra field to DeallocateStmt to track separately the named
> deallocate queries and ALL in monitoring?

In short, I would propose something like that, with a new boolean
field in DeallocateStmt that's part of the jumbling.

Dagfinn, Julien, what do you think about the attached?
--
Michael
From ad91672367f408663824b36b6dd517513d7f0a2a Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 18 Aug 2023 09:12:58 +0900
Subject: [PATCH v2] Track DEALLOCATE statements in pg_stat_activity

Treat the statement name as a constant when jumbling.  A boolean field
is added to DeallocateStmt to make a distinction between the ALL and
named cases.
---
 src/include/nodes/parsenodes.h|  8 +++-
 src/backend/parser/gram.y |  8 
 .../pg_stat_statements/expected/utility.out   | 41 +++
 .../pg_stat_statements/pg_stat_statements.c   |  8 +---
 contrib/pg_stat_statements/sql/utility.sql| 13 ++
 5 files changed, 70 insertions(+), 8 deletions(-)

diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2565348303..2b356336eb 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3925,8 +3925,12 @@ typedef struct ExecuteStmt
 typedef struct DeallocateStmt
 {
 	NodeTag		type;
-	char	   *name;			/* The name of the plan to remove */
-	/* NULL means DEALLOCATE ALL */
+	/* The name of the plan to remove.  NULL if DEALLOCATE ALL. */
+	char	   *name pg_node_attr(query_jumble_ignore);
+	/* true if DEALLOCATE ALL */
+	bool		isall;
+	/* token location, or -1 if unknown */
+	int			location pg_node_attr(query_jumble_location);
 } DeallocateStmt;
 
 /*
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index b3bdf947b6..df34e96f89 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -11936,6 +11936,8 @@ DeallocateStmt: DEALLOCATE name
 		DeallocateStmt *n = makeNode(DeallocateStmt);
 
 		n->name = $2;
+		n->isall = false;
+		n->location = @2;
 		$$ = (Node *) n;
 	}
 | DEALLOCATE PREPARE name
@@ -11943,6 +11945,8 @@ DeallocateStmt: DEALLOCATE name
 		DeallocateStmt *n = makeNode(DeallocateStmt);
 
 		n->name = $3;
+		n->isall = false;
+		n->location = @3;
 		$$ = (Node *) n;
 	}
 | DEALLOCATE ALL
@@ -11950,6 +11954,8 @@ DeallocateStmt: DEALLOCATE name
 		DeallocateStmt *n = makeNode(DeallocateStmt);
 
 		n->name = NULL;
+		n->isall = true;
+		n->location = -1;
 		$$ = (Node *) n;
 	}
 | DEALLOCATE PREPARE ALL
@@ -11957,6 +11963,8 @@ DeallocateStmt: DEALLOCATE name
 		DeallocateStmt *n = makeNode(DeallocateStmt);
 
 		n->name = NULL;
+		n->isall = true;
+		n->location = -1;
 		$$ = (Node *) n;
 	}
 		;
diff --git a/contrib/pg_stat_statements/expected/utility.out b/contrib/pg_stat_statements/expected/utility.out
index 93735d5d85..f331044f3e 100644
--- a/contrib/pg_stat_statements/expected/utility.out
+++ b/contrib/pg_stat_statements/expected/utility.out
@@ -472,6 +472,47 @@ SELECT pg_stat_statements_reset();
  
 (1 row)
 
+-- Execution statements
+SELECT 1 as a;
+ a 
+---
+ 1
+(1 row)
+
+PREPARE stat_select AS SELECT $1 AS a;
+EXECUTE stat_select (1);
+ a 
+---
+ 1
+(1 row)
+
+DEALLOCATE stat_select;
+PREPARE stat_select AS SELECT $1 AS a;
+EXECUTE stat_select (2);
+ a 
+---
+ 2
+(1 row)
+
+DEALLOCATE PREPARE stat_select;
+DEALLOCATE ALL;
+DEALLOCATE PREPARE ALL;
+SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C";
+ calls | rows | query 
+---+--+---
+ 2 |0 | DEALLOCATE $1
+ 2 |0 | DEALLOCATE ALL
+ 2 |2 | PREPARE stat_select AS SELECT $1 AS a
+ 1 |1 | SELECT $1 as a
+ 1 |1 | SELECT pg_stat_statements_reset()
+(5 rows)
+
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset 
+--
+ 
+(1 row)
+
 -- SET statements.
 -- These use two different strings, still they count as one entry.
 SET work_mem = '1MB';
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 55b957d251..06b65aeef5 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -104,8 +104,7 @@ static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
  * ignores.
  */
 #define PGSS_HANDLED_UTILITY(n)		(!IsA(n, ExecuteStmt) && \
-	!IsA(n, PrepareStmt) && \
-	!IsA(n, DeallocateStmt))
+	!IsA(n, PrepareStmt))
 
 /*
  * Extension version number, for supporting 

Re: New WAL record to detect the checkpoint redo location

2023-08-17 Thread Michael Paquier
On Thu, Aug 17, 2023 at 01:11:50PM +0530, Dilip Kumar wrote:
> Yeah right, actually I was confused, I assumed it will return the
> start LSN of the record.  And I do not see any easy way to identify
> the Start LSN of this record so maybe this solution will not work.  I
> will have to think of something else.  Thanks for pointing it out.

About that.  One thing to consider may be ReserveXLogInsertLocation()
while holding the WAL insert lock, but you can just rely on
ProcLastRecPtr for the job after inserting the REDO record, no?
--
Michael




Re: Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue

2023-08-17 Thread Michael Paquier
On Thu, Aug 17, 2023 at 03:29:28PM -0400, Stephen Frost wrote:
> On Thu, Aug 17, 2023 at 15:23 Robert Haas  wrote:
>> For what it's worth, my vote would be for "connection authenticated:
>> ... method=trust".
> 
> I don’t have any particular objection to this language and agree that it’s
> actually closer to how we talk about the trust auth method in our
> documentation.

After sleeping on it, I think I'd agree with Robert's point to just use
the same language as the message, while also agreeing with the patch to
not set MyClientConnectionInfo.authn_id in the uaTrust case, only
logging something under log_connections.

+* No authentication was actually performed; this happens e.g. when the
+* trust method is in use.

This comment should be reworded a bit, say "No authentication identity
was set; blah ..".

> Maybe if we decided to rework the documentation … or perhaps just ripped
> “trust” out entirely … but those are whole different things from what we
> are trying to accomplish here.

Not sure I see any point in doing that these days.
--
Michael




Re: Normalization of utility queries in pg_stat_statements

2023-08-17 Thread Michael Paquier
On Wed, Aug 16, 2023 at 05:11:47PM +0800, jian he wrote:
> SELECT calls, toplevel, rows, query FROM pg_stat_statements ORDER BY
> query COLLATE "C";
> returns:
>  calls | toplevel | rows |               query
> -------+----------+------+------------------------------------
>      1 | t        |    0 | CALL ptest3($1)
>      2 | f        |    2 | INSERT INTO cp_test VALUES ($2, x)
>      1 | t        |    1 | SELECT pg_stat_statements_reset()
> 
> here, the intermediate CALL part is optimized away. or should I expect
> CALL ptest1($1) also in pg_stat_statements?

I would have guessed that ptest1() being called as part of ptest3()
should show up in the report if you use track = all, as all the nested
queries of a function, even pure SQL ones, ought to show up.  Now note
that ptest1() not showing up is not a new behavior; ~15 does the same
thing by missing it.
--
Michael




Re: Use of additional index columns in rows filtering

2023-08-17 Thread Jeff Davis
On Wed, 2023-08-09 at 17:14 -0700, Peter Geoghegan wrote:
> + Index quals are better than equivalent index filters because bitmap
> index scans can only use index quals

It seems there's consensus that:

  * Index Filters (Tomas's patch and the topic of this thread) are more
general, because they can work on any expression, e.g. 1/x, which can
throw an error if x=0.
  * Index quals are more optimizable, because such operators are not
supposed to throw errors or have side effects, so we can evaluate them
before determining visibility.

I wouldn't describe one as "better" than the other, but I assume you
meant "more optimizable".

It's interesting that there's overlap in utility between Tomas's
current patch and Peter's work on optimizing SAOPs. But I don't see a
lot of tension there -- it seems like Tomas's patch will always be
useful for filters that might throw an error (or have other side
effects).

Is there any part of the design here that's preventing this patch from
moving forward? If not, what are the TODOs for the current patch, or is
it just waiting on more review?

Regards,
Jeff Davis





Re: Fix an entry in wait_event_names.txt

2023-08-17 Thread Michael Paquier
On Thu, Aug 17, 2023 at 03:25:27PM +0900, Masahiro Ikeda wrote:
> +1. Thanks!

Applied.
--
Michael




Re: WIP: new system catalog pg_wait_event

2023-08-17 Thread Michael Paquier
On Thu, Aug 17, 2023 at 04:37:22PM +0900, Masahiro Ikeda wrote:
> On 2023-08-17 14:53, Drouvot, Bertrand wrote:
> BTW, is it better to start with "Waiting" like any other lines?
> For example, "Waiting for custom event \"worker_spi_main\" defined by an
> extension".

Using "Waiting for custom event" is a good term here.  I am wondering
if this should be s/extension/module/, as an event could be set by a
loaded library, which may not be set with CREATE EXTENSION.

> ex.
> postgres(3789417)=# SELECT a.pid, a.wait_event_type, a.wait_event,
> w.description FROM pg_stat_activity a JOIN pg_wait_event w ON
> (a.wait_event_type = w.type AND a.wait_event = w.name) WHERE wait_event is
> NOT NULL;
>    pid   | wait_event_type |    wait_event     |                      description
> ---------+-----------------+-------------------+--------------------------------------------------------
>  3783759 | Activity        | AutoVacuumMain    | Waiting in main loop of autovacuum launcher process
>  3812866 | Extension       | WorkerSpiMain     | Custom wait event "WorkerSpiMain" defined by extension
>  3783756 | Activity        | BgWriterHibernate | Waiting in background writer process, hibernating
>  3783760 | Activity        | ArchiverMain      | Waiting in main loop of archiver process
>  3783755 | Activity        | CheckpointerMain  | Waiting in main loop of checkpointer process
>  3783758 | Activity        | WalWriterMain     | Waiting in main loop of WAL writer process
> (6 rows)

You need to think about the PDF being generated from the SGML docs, as
well, so this should not be too large.  I don't think we need to care
about wait_event_type for this example, so it can be removed.  If the
tuple generated is still too large, showing one tuple under \x with a
filter on backend_type would be enough. 
--
Michael




Re: [PATCH] Add native windows on arm64 support

2023-08-17 Thread Michael Paquier
On Thu, Aug 17, 2023 at 09:41:44AM +0100, Anthony Roberts wrote:
> Just following up on this, has there been any movement?
> 
> I did see another message go into the thread here with no reply:
> https://www.postgresql.org/message-id/ZKPLjLjIjwN2lxkg%40paquier.xyz

I don't have an environment to test the patch, but I don't object to
it per se.  However, I don't really want to move forward completely
blindly as well in the long-term.

As mentioned to Niyas, could it be possible to provide to the
community a buildfarm machine that would be able to test this
environment?  I am not sure what's your status on that.  Perhaps this
is already set up and you are just waiting for the patch to be merged?
--
Michael




Re: meson: pgxs Makefile.global differences

2023-08-17 Thread Tristan Partin

On Thu Aug 17, 2023 at 3:51 PM CDT, Andres Freund wrote:
> PS: I don't have and...@anarazel.dev , just .de :)

Fat fingered a "v" somehow.

--
Tristan Partin
Neon (https://neon.tech)




Re: meson: pgxs Makefile.global differences

2023-08-17 Thread Andrew Dunstan




> On Aug 17, 2023, at 4:51 PM, Andres Freund  wrote:
> 
> Hi,
> 
>> On 2023-08-17 14:45:54 -0500, Tristan Partin wrote:
>>> On Thu Aug 17, 2023 at 2:32 PM CDT, Andrew Dunstan wrote:
>>> I started digging into a warning I noticed on my FDW builds where
>>> Postgres is built with meson, e.g. 
>>> 
>>> which has this:
>>> 
>>> cc1: warning: ‘-Wformat-security’ ignored without ‘-Wformat’
>>> [-Wformat-security]
>>> 
>>> I found that the pgxs Makefile.global built under meson is a bit
>>> different. On debug builds for both this is what I get on HEAD (meson)
>>> and REL_15_STABLE (autoconf), stripped of the current components:
> 
> I assume "current" means the flags that are present in both cases?


Yes, sorry, meant to type common.

> 
> 
>>>  HEAD: CFLAGS =-Wshadow=compatible-local
>>> REL_15_STABLE: CFLAGS =-Wall  -g
>>> 
>>> The warning is apparently due to the missing -Wall.
>>> 
>>> Shouldn't we be aiming for pretty much identical settings?
> 
> The difference for -Wshadow=compatible-local is due to changes between 15 and
> HEAD.
> 
> We're indeed not adding -Wall right now (the warning level is handled by
> meson, so it doesn't show up in our cflags right now).
> 
> 
>> I agree that they should be identical. The meson build should definitely be
>> aiming for 100% compatibility for the Makefile.global.
> 
> I don't think that's feasible. It was a fair bit of work to get the most
> important contents to match, while skipping lots of things that are primarily
> relevant for building the server (which isn't relevant for pgxs).
> 
> That said, in this specific case, I agree, we should likely emit -Wall to
> Makefile.global in meson as well.
> 
> 

Cool

Cheers 

Andrew



Re: meson: pgxs Makefile.global differences

2023-08-17 Thread Andres Freund
Hi,

On 2023-08-17 14:45:54 -0500, Tristan Partin wrote:
> On Thu Aug 17, 2023 at 2:32 PM CDT, Andrew Dunstan wrote:
> > I started digging into a warning I noticed on my FDW builds where
> > Postgres is built with meson, e.g. 
> > 
> > which has this:
> >
> > cc1: warning: ‘-Wformat-security’ ignored without ‘-Wformat’
> > [-Wformat-security]
> >
> > I found that the pgxs Makefile.global built under meson is a bit
> > different. On debug builds for both this is what I get on HEAD (meson)
> > and REL_15_STABLE (autoconf), stripped of the current components:

I assume "current" means the flags that are present in both cases?


> >   HEAD: CFLAGS =-Wshadow=compatible-local
> > REL_15_STABLE: CFLAGS =-Wall  -g
> >
> > The warning is apparently due to the missing -Wall.
> >
> > Shouldn't we be aiming for pretty much identical settings?

The difference for -Wshadow=compatible-local is due to changes between 15 and
HEAD.

We're indeed not adding -Wall right now (the warning level is handled by
meson, so it doesn't show up in our cflags right now).


> I agree that they should be identical. The meson build should definitely be
> aiming for 100% compatibility for the Makefile.global.

I don't think that's feasible. It was a fair bit of work to get the most
important contents to match, while skipping lots of things that are primarily
relevant for building the server (which isn't relevant for pgxs).

That said, in this specific case, I agree, we should likely emit -Wall to
Makefile.global in meson as well.

Greetings,

Andres Freund


PS: I don't have and...@anarazel.dev , just .de :)




Re: Extract numeric filed in JSONB more effectively

2023-08-17 Thread Chapman Flack

On 2023-08-17 05:07, Andy Fan wrote:
> Thanks for the review,  v9 attached!


From the earliest iterations of this patch, I seem to recall
a couple of designs being considered:

In one, the type-specific cast function would only be internally
usable, would take a type oid as an extra parameter (supplied in
the SupportRequestSimplify rewriting), and would have to be
declared with some nonspecific return type; 'internal' was
mentioned.

The idea of an 'internal' return type with no 'internal' parameter
was quickly and rightly shot down. But it would have seemed to me
enough to address that objection by using 'internal' also in its
parameter list. I could imagine a function declared with two
'internal' parameters, one understood to be a JsonbValue and one
understood to be a type oid, and an 'internal' result, treated in
the rewritten expression tree as binary-coercible to the desired
result.

Admittedly, I have not tried to implement that myself to see
what unexpected roadblocks might exist on that path. Perhaps
there are parts of that rewriting that no existing node type
can represent? Someone more familiar with those corners of
PostgreSQL may immediately see other difficulties I do not.

But I have the sense that that approach was abandoned early, in
favor of the current approach using user-visible polymorphic
types, and supplying typed dummy constants for use in the
resolution of those types, with a new function introduced to create
said dummy constants, including allocation and input conversion
in the case of numeric, just so said dummy constants can be
passed into functions that have no use for them other than to
call get_fn_expr_argtype to recover the type oid, which was the
only thing needed in the first place.

Compared to the initial direction I thought this was going,
none of that strikes me as better.

Nothing makes my opinion authoritative here, and there may
indeed be reasons it is better, known to others more familiar
with that code than I am. But it bugs me.

If the obstacles to the earlier approach came down to needing
a new type of expression node, something like "assertion of
'internal'-to-foo binary coercibility, vouched by a prosupport
function", would that be a bad thing?

Regards,
-Chap




Re: meson: pgxs Makefile.global differences

2023-08-17 Thread Tristan Partin

On Thu Aug 17, 2023 at 2:32 PM CDT, Andrew Dunstan wrote:
> I started digging into a warning I noticed on my FDW builds where
> Postgres is built with meson, e.g. 
> which has this:
>
> cc1: warning: ‘-Wformat-security’ ignored without ‘-Wformat’
> [-Wformat-security]
>
> I found that the pgxs Makefile.global built under meson is a bit
> different. On debug builds for both this is what I get on HEAD (meson)
> and REL_15_STABLE (autoconf), stripped of the current components:
>
>          HEAD: CFLAGS =-Wshadow=compatible-local
> REL_15_STABLE: CFLAGS =-Wall  -g
>
> The warning is apparently due to the missing -Wall.
>
> Shouldn't we be aiming for pretty much identical settings?


I agree that they should be identical. The meson build should definitely
be aiming for 100% compatibility for the Makefile.global.


--
Tristan Partin
Neon (https://neon.tech)




meson: pgxs Makefile.global differences

2023-08-17 Thread Andrew Dunstan
I started digging into a warning I noticed on my FDW builds where 
Postgres is built with meson, e.g. 
 
which has this:


cc1: warning: ‘-Wformat-security’ ignored without ‘-Wformat’ 
[-Wformat-security]


I found that the pgxs Makefile.global built under meson is a bit 
different. On debug builds for both this is what I get on HEAD (meson) 
and REL_15_STABLE (autoconf), stripped of the current components:


 HEAD: CFLAGS =-Wshadow=compatible-local
REL_15_STABLE: CFLAGS =-Wall  -g

The warning is apparently due to the missing -Wall.

Shouldn't we be aiming for pretty much identical settings?


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Re: Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue

2023-08-17 Thread Stephen Frost
Greetings,

On Thu, Aug 17, 2023 at 15:23 Robert Haas  wrote:

> On Thu, Aug 17, 2023 at 12:54 PM Jacob Champion 
> wrote:
> > On Thu, Aug 17, 2023 at 9:46 AM Stephen Frost 
> wrote:
> > > Don't like 'skipped' but that feels closer.
> > >
> > > How about 'connection bypassed authentication'?
> >
> > Works for me; see v2.
>
> For what it's worth, my vote would be for "connection authenticated:
> ... method=trust".


I don’t have any particular objection to this language and agree that it’s
actually closer to how we talk about the trust auth method in our
documentation.

Maybe if we decided to rework the documentation … or perhaps just ripped
“trust” out entirely … but those are whole different things from what we
are trying to accomplish here.

Thanks,

Stephen


Re: Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue

2023-08-17 Thread Robert Haas
On Thu, Aug 17, 2023 at 12:54 PM Jacob Champion  wrote:
> On Thu, Aug 17, 2023 at 9:46 AM Stephen Frost  wrote:
> > Don't like 'skipped' but that feels closer.
> >
> > How about 'connection bypassed authentication'?
>
> Works for me; see v2.

For what it's worth, my vote would be for "connection authenticated:
... method=trust". The only reason we're not doing that is because
there's some argument that trusting that the client is who they say
they are is not really authentication at all. But this seems silly,
because we put "trust" in the "METHOD" column of pg_hba.conf, so in
that case we already treat it as an authentication method. Also, any
such line in pg_hba.conf still matches against the supplied IP address
mask, which I suppose could be viewed as a form of authentication. Or
maybe not. But I wonder if we're just being too persnickety about
language here, in a way that maybe isn't consistent with our previous
practice.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue

2023-08-17 Thread Jacob Champion
On Thu, Aug 17, 2023 at 9:46 AM Stephen Frost  wrote:
> Don't like 'skipped' but that feels closer.
>
> How about 'connection bypassed authentication'?

Works for me; see v2.

Thanks!
--Jacob
From 5404c28391d2faae8a306e4e21c2ddfe1b70b53e Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Wed, 16 Aug 2023 14:48:49 -0700
Subject: [PATCH v2] log_connections: add entries for "trust" connections

Allow DBAs to audit unexpected "trust" connections.  Previously, you had
to notice the absence of a "connection authenticated" line between the
"connection received" and "connection authorized" entries.  Now it's
called out explicitly.
---
 src/backend/libpq/auth.c  | 15 +++
 src/test/authentication/t/001_password.pl |  6 ++
 2 files changed, 21 insertions(+)

diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index e7571aaddb..a68e1c9b52 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -645,6 +645,21 @@ ClientAuthentication(Port *port)
 #endif
 	}
 
+	if (Log_connections
+		&& (status == STATUS_OK) && !MyClientConnectionInfo.authn_id)
+	{
+		/*
+		 * No authentication was actually performed; this happens e.g. when the
+		 * trust method is in use.  For audit purposes, log a breadcrumb to
+		 * explain where in the HBA this happened.
+		 */
+		ereport(LOG,
+				errmsg("connection bypassed authentication: user=\"%s\" method=%s "
+					   "(%s:%d)",
+					   port->user_name, hba_authname(port->hba->auth_method),
+					   port->hba->sourcefile, port->hba->linenumber));
+	}
+
 	if (ClientAuthentication_hook)
 		(*ClientAuthentication_hook) (port, status);
 
diff --git a/src/test/authentication/t/001_password.pl b/src/test/authentication/t/001_password.pl
index 12552837a8..3440afbaaa 100644
--- a/src/test/authentication/t/001_password.pl
+++ b/src/test/authentication/t/001_password.pl
@@ -140,8 +140,14 @@ $node->safe_psql('postgres', "CREATE database regex_testdb;");
 # considered to be authenticated.
 reset_pg_hba($node, 'all', 'all', 'trust');
 test_conn($node, 'user=scram_role', 'trust', 0,
+	log_like => [
+		qr/connection bypassed authentication: user="scram_role" method=trust/
+	],
 	log_unlike => [qr/connection authenticated:/]);
 test_conn($node, 'user=md5_role', 'trust', 0,
+	log_like => [
+		qr/connection bypassed authentication: user="md5_role" method=trust/
+	],
 	log_unlike => [qr/connection authenticated:/]);
 
 # SYSTEM_USER is null when not authenticated.
-- 
2.39.2
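
For illustration, with the patch applied a trust connection produces a log
sequence along these lines (host, user, and pg_hba.conf location here are
hypothetical, constructed from the errmsg format strings above):

LOG:  connection received: host=[local]
LOG:  connection bypassed authentication: user="alice" method=trust (/etc/postgresql/pg_hba.conf:92)
LOG:  connection authorized: user=alice database=postgres application_name=psql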



Re: Using defines for protocol characters

2023-08-17 Thread Nathan Bossart
On Thu, Aug 17, 2023 at 12:22:24PM -0400, Robert Haas wrote:
> I think this is going to be a major improvement in terms of readability!
> 
> I'm pretty happy with this version, too. We may be running out of
> things to argue about -- and no, I don't want to argue about
> underscores more!

Awesome.  If we are indeed out of things to argue about, I'll plan on
committing this sometime next week.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue

2023-08-17 Thread Stephen Frost
Greetings,

* Jacob Champion (jchamp...@timescale.com) wrote:
> On Thu, Aug 17, 2023 at 9:01 AM Stephen Frost  wrote:
> > Maybe 'connection allowed' instead..?
> 
> Hm. It hasn't really been allowed yet, either. To illustrate what I mean:
> 
> LOG:  connection received: host=[local]
> LOG:  connection allowed: user="jacob" method=trust
> (/home/jacob/src/data/pg16/pg_hba.conf:117)
> LOG:  connection authorized: user=jacob database=postgres
> application_name=psql
> 
> Maybe "unauthenticated connection:"? "connection without
> authentication:"? "connection skipped authentication:"?

Don't like 'skipped' but that feels closer.

How about 'connection bypassed authentication'?

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue

2023-08-17 Thread Jacob Champion
On Thu, Aug 17, 2023 at 9:01 AM Stephen Frost  wrote:
> That doesn't seem quite right ... admittedly, 'trust' isn't performing
> authentication but there can certainly be an argument made that the
> basic 'matched a line in pg_hba.conf' is a form of authentication

I'm not personally on board with this argument, but...

> and
> worse really, saying 'not authenticated' would seem to imply that we
> didn't allow the connection when, really, we did, and that could be
> confusing to someone.

...with this one, I agree.

> Maybe 'connection allowed' instead..?

Hm. It hasn't really been allowed yet, either. To illustrate what I mean:

LOG:  connection received: host=[local]
LOG:  connection allowed: user="jacob" method=trust
(/home/jacob/src/data/pg16/pg_hba.conf:117)
LOG:  connection authorized: user=jacob database=postgres
application_name=psql

Maybe "unauthenticated connection:"? "connection without
authentication:"? "connection skipped authentication:"?

--Jacob




Re: Using defines for protocol characters

2023-08-17 Thread Dave Cramer
On Thu, Aug 17, 2023 at 9:22 AM Robert Haas  wrote:

> On Thu, Aug 17, 2023 at 12:13 PM Nathan Bossart
>  wrote:
> > On Thu, Aug 17, 2023 at 09:31:55AM +0900, Michael Paquier wrote:
> > > Looks sensible seen from here.
> >
> > Thanks for taking a look.
>
> I think this is going to be a major improvement in terms of readability!


That was the primary goal

Dave Cramer


Re: Using defines for protocol characters

2023-08-17 Thread Robert Haas
On Thu, Aug 17, 2023 at 12:13 PM Nathan Bossart
 wrote:
> On Thu, Aug 17, 2023 at 09:31:55AM +0900, Michael Paquier wrote:
> > Looks sensible seen from here.
>
> Thanks for taking a look.

I think this is going to be a major improvement in terms of readability!

I'm pretty happy with this version, too. We may be running out of
things to argue about -- and no, I don't want to argue about
underscores more!

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Using defines for protocol characters

2023-08-17 Thread Nathan Bossart
On Thu, Aug 17, 2023 at 09:31:55AM +0900, Michael Paquier wrote:
> Looks sensible seen from here.

Thanks for taking a look.

> This patch is missing the installation of protocol.h in
> src/tools/msvc/Install.pm for MSVC.  For pqcomm.h, we are doing that:
> lcopy('src/include/libpq/pqcomm.h', $target . '/include/internal/libpq/')
> || croak 'Could not copy pqcomm.h';
> 
> So adding two similar lines for protocol.h should be enough (I assume,
> did not test).

I added those lines in v7.

> In fe-exec.c, we still have a few things for the type of objects to
> work on:
> - 'S' for statement.
> - 'P' for portal.
> Should these be added to protocol.h?  They are part of the extended
> protocol.

IMHO they should be added, but I've intentionally restricted this first
patch to only codes with existing names in protocol.sgml.  I figured we
could work on naming other things in a follow-up discussion.
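
If that follow-up happens, the additions would presumably mirror the
existing macros; a sketch with purely illustrative names, not from any
committed patch:

	/* hypothetical, in the style of the PqMsg_* macros in protocol.h */
	#define PqObjType_PreparedStatement	'S'
	#define PqObjType_Portal		'P'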

> The comment at the top of PQsendTypedCommand() mentions 'C' and 'D',
> but perhaps these should be updated to the object names instead?

Done.

> pqFunctionCall3(), for PQfn(), has a few more hardcoded characters for
> its status codes.  I'm OK to do things incrementally so it's fine by
> me to not add them now, just noticing on the way what could be added
> to this new header.

Cool, thanks.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 69f94689857a469333fecbfd2057868140796516 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 17 Aug 2023 09:00:46 -0700
Subject: [PATCH v7 1/1] Introduce macros for protocol characters.

Author: Dave Cramer
Reviewed-by: Alvaro Herrera, Tatsuo Ishii, Peter Smith, Robert Haas, Tom Lane, Peter Eisentraut, Michael Paquier
Discussion: https://postgr.es/m/CADK3HHKbBmK-PKf1bPNFoMC%2BoBt%2BpD9PH8h5nvmBQskEHm-Ehw%40mail.gmail.com
---
 src/backend/access/common/printsimple.c |  5 +-
 src/backend/access/transam/parallel.c   | 14 ++--
 src/backend/backup/basebackup_copy.c| 16 ++---
 src/backend/commands/async.c|  2 +-
 src/backend/commands/copyfromparse.c| 22 +++
 src/backend/commands/copyto.c   |  6 +-
 src/backend/libpq/auth-sasl.c   |  2 +-
 src/backend/libpq/auth.c|  8 +--
 src/backend/postmaster/postmaster.c |  2 +-
 src/backend/replication/walsender.c | 18 +++---
 src/backend/tcop/dest.c |  8 +--
 src/backend/tcop/fastpath.c |  2 +-
 src/backend/tcop/postgres.c | 68 ++--
 src/backend/utils/error/elog.c  |  5 +-
 src/backend/utils/misc/guc.c|  2 +-
 src/include/Makefile|  3 +-
 src/include/libpq/pqcomm.h  | 23 ++-
 src/include/libpq/protocol.h| 85 +
 src/include/meson.build |  1 +
 src/interfaces/libpq/fe-auth.c  |  2 +-
 src/interfaces/libpq/fe-connect.c   | 19 --
 src/interfaces/libpq/fe-exec.c  | 54 
 src/interfaces/libpq/fe-protocol3.c | 70 ++--
 src/interfaces/libpq/fe-trace.c | 70 +++-
 src/tools/msvc/Install.pm   |  2 +
 25 files changed, 305 insertions(+), 204 deletions(-)
 create mode 100644 src/include/libpq/protocol.h

diff --git a/src/backend/access/common/printsimple.c b/src/backend/access/common/printsimple.c
index ef818228ac..675b744db2 100644
--- a/src/backend/access/common/printsimple.c
+++ b/src/backend/access/common/printsimple.c
@@ -20,6 +20,7 @@
 
 #include "access/printsimple.h"
 #include "catalog/pg_type.h"
+#include "libpq/protocol.h"
 #include "libpq/pqformat.h"
 #include "utils/builtins.h"
 
@@ -32,7 +33,7 @@ printsimple_startup(DestReceiver *self, int operation, TupleDesc tupdesc)
 	StringInfoData buf;
 	int			i;
 
-	pq_beginmessage(&buf, 'T'); /* RowDescription */
+	pq_beginmessage(&buf, PqMsg_RowDescription);
 	pq_sendint16(&buf, tupdesc->natts);
 
 	for (i = 0; i < tupdesc->natts; ++i)
@@ -65,7 +66,7 @@ printsimple(TupleTableSlot *slot, DestReceiver *self)
 	slot_getallattrs(slot);
 
 	/* Prepare and send message */
-	pq_beginmessage(&buf, 'D');
+	pq_beginmessage(&buf, PqMsg_DataRow);
+	pq_sendint16(&buf, tupdesc->natts);
 
 	for (i = 0; i < tupdesc->natts; ++i)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 1738aecf1f..194a1207be 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -1127,7 +1127,7 @@ HandleParallelMessage(ParallelContext *pcxt, int i, StringInfo msg)
 
 	switch (msgtype)
 	{
-		case 'K':/* BackendKeyData */
+		case PqMsg_BackendKeyData:
 			{
 int32		pid = pq_getmsgint(msg, 4);
 
@@ -1137,8 +1137,8 @@ HandleParallelMessage(ParallelContext *pcxt, int i, StringInfo msg)
 break;
 			}
 
-		case 'E':/* ErrorResponse */
-		case 'N':/* NoticeResponse */
+		case PqMsg_ErrorResponse:
+		case PqMsg_NoticeResponse:
 			{
 ErrorData	edata;
 ErrorContextCallback *save_error_context_stack;
@@ -1183,7 

Re: Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue

2023-08-17 Thread Stephen Frost
Greetings,

* Jacob Champion (jchamp...@timescale.com) wrote:
> Maybe something like the attached?

> - I used the phrasing "connection not authenticated" in the hopes that
> it's a bit more greppable than just "connection", especially in
> combination with the existing "connection authenticated" lines.

That doesn't seem quite right ... admittedly, 'trust' isn't performing
authentication but there can certainly be an argument made that the
basic 'matched a line in pg_hba.conf' is a form of authentication, and
worse really, saying 'not authenticated' would seem to imply that we
didn't allow the connection when, really, we did, and that could be
confusing to someone.

Maybe 'connection allowed' instead..?

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: [PATCH] Add function to_oct

2023-08-17 Thread Nathan Bossart
On Thu, Aug 17, 2023 at 12:35:54PM +0700, John Naylor wrote:
> That makes it a lexically-scoped global variable, which we don't need
> either. Can we have the internal function allocate on the stack, then
> call cstring_to_text() on that, returning the text result? That does its
> own palloc.
> 
> Or maybe better, save the starting pointer, compute the length at the end,
> and call cstring_to_text_with_len()?  (It seems we wouldn't need
> the nul-terminator then, either.)

Works for me.  I did it that way in v7.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From b7eccfe08150ec90ca4b48fe69c2b37453c633b2 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 25 Jul 2023 16:09:01 -0700
Subject: [PATCH v7 1/1] add to_binary() and to_oct()

---
 doc/src/sgml/func.sgml| 42 
 src/backend/utils/adt/varlena.c   | 92 +++
 src/include/catalog/pg_proc.dat   | 12 
 src/test/regress/expected/strings.out | 26 +++-
 src/test/regress/sql/strings.sql  |  9 ++-
 5 files changed, 153 insertions(+), 28 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index be2f54c914..23b5ac7457 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3737,6 +3737,48 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ to_binary
+
+to_binary ( integer )
+text
+   
+   
+to_binary ( bigint )
+text
+   
+   
+Converts the number to its equivalent binary representation.
+   
+   
+to_binary(2147483647)
+1111111111111111111111111111111
+   
+  
+
+  
+   
+
+ to_oct
+
+to_oct ( integer )
+text
+   
+   
+to_oct ( bigint )
+text
+   
+   
+Converts the number to its equivalent octal representation.
+   
+   
+to_oct(2147483647)
+17777777777
+   
+  
+
   

 
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index b1ec5c32ce..58c0cbcdd8 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -4919,53 +4919,95 @@ array_to_text_internal(FunctionCallInfo fcinfo, ArrayType *v,
 	return result;
 }
 
-#define HEXBASE 16
 /*
- * Convert an int32 to a string containing a base 16 (hex) representation of
- * the number.
+ * We size the buffer for to_binary's longest possible return value.
  */
-Datum
-to_hex32(PG_FUNCTION_ARGS)
+#define CONVERT_TO_BASE_BUFSIZE		(sizeof(uint64) * BITS_PER_BYTE)
+
+/*
+ * Workhorse for to_binary, to_oct, and to_hex.  Note that base must be > 1 and
+ * <= 16.
+ */
+static inline text *
+convert_to_base(uint64 value, int base)
 {
-	uint32		value = (uint32) PG_GETARG_INT32(0);
 	char	   *ptr;
 	const char *digits = "0123456789abcdef";
-	char		buf[32];		/* bigger than needed, but reasonable */
+	char		buf[CONVERT_TO_BASE_BUFSIZE];
+	int			len;
+
+	Assert(base > 1);
+	Assert(base <= 16);
 
-	ptr = buf + sizeof(buf) - 1;
-	*ptr = '\0';
+	ptr = buf + sizeof(buf);
 
 	do
 	{
-		*--ptr = digits[value % HEXBASE];
-		value /= HEXBASE;
+		*--ptr = digits[value % base];
+		value /= base;
 	} while (ptr > buf && value);
 
-	PG_RETURN_TEXT_P(cstring_to_text(ptr));
+	len = CONVERT_TO_BASE_BUFSIZE - (ptr - buf);
+	return cstring_to_text_with_len(ptr, len);
+}
+
+#undef CONVERT_TO_BASE_BUFSIZE
+
+/*
+ * Convert an integer to a string containing a base-2 (binary) representation
+ * of the number.
+ */
+Datum
+to_binary32(PG_FUNCTION_ARGS)
+{
+	uint64		value = (uint64) PG_GETARG_INT32(0);
+
+	PG_RETURN_TEXT_P(convert_to_base(value, 2));
+}
+Datum
+to_binary64(PG_FUNCTION_ARGS)
+{
+	uint64		value = (uint64) PG_GETARG_INT64(0);
+
+	PG_RETURN_TEXT_P(convert_to_base(value, 2));
 }
 
 /*
- * Convert an int64 to a string containing a base 16 (hex) representation of
+ * Convert an integer to a string containing a base-8 (oct) representation of
  * the number.
  */
 Datum
-to_hex64(PG_FUNCTION_ARGS)
+to_oct32(PG_FUNCTION_ARGS)
+{
+	uint64		value = (uint64) PG_GETARG_INT32(0);
+
+	PG_RETURN_TEXT_P(convert_to_base(value, 8));
+}
+Datum
+to_oct64(PG_FUNCTION_ARGS)
 {
 	uint64		value = (uint64) PG_GETARG_INT64(0);
-	char	   *ptr;
-	const char *digits = "0123456789abcdef";
-	char		buf[32];		/* bigger than needed, but reasonable */
 
-	ptr = buf + sizeof(buf) - 1;
-	*ptr = '\0';
+	PG_RETURN_TEXT_P(convert_to_base(value, 8));
+}
 
-	do
-	{
-		*--ptr = digits[value % HEXBASE];
-		value /= HEXBASE;
-	} while (ptr > buf && value);
+/*
+ * Convert an integer to a string containing a base-16 (hex) representation of
+ * the number.
+ */
+Datum
+to_hex32(PG_FUNCTION_ARGS)
+{
+	uint64		value = (uint64) PG_GETARG_INT32(0);
+
+	PG_RETURN_TEXT_P(convert_to_base(value, 16));
+}
+Datum
+to_hex64(PG_FUNCTION_ARGS)
+{
+	uint64		value = (uint64) PG_GETARG_INT64(0);
 
-	PG_RETURN_TEXT_P(cstring_to_text(ptr));
+	

Re: PG 16 draft release notes ready

2023-08-17 Thread Bruce Momjian
On Wed, Aug 16, 2023 at 10:36:05PM -0400, Bruce Momjian wrote:
> You can view the Postgres 16 release notes, with markup and links to our
> docs, here:
> 
>   https://momjian.us/pgsql_docs/release-16.html

FYI, thanks to this patch:

commit 78ee60ed84
Author: Tom Lane 
Date:   Mon Jan 9 15:08:24 2023 -0500

Doc: add XML ID attributes to <sect1> and <varlistentry> tags.

This doesn't have any external effect at the moment, but it
will allow adding useful link-discoverability features later.

Brar Piening, reviewed by Karl Pinc.

Discussion: 
https://postgr.es/m/CAB8KJ=jpuQU9QJe4+RgWENrK5g9jhoysMw2nvTN_esoOU0=a...@mail.gmail.com

I was able to add more links to the docs, and the links were more
precise.  I used to be frustrated I couldn't find nearby links to
content, but I had no such troubles this year.  I think the additional
and more precise links will help people digest the release notes more
efficiently.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: run pgindent on a regular basis / scripted manner

2023-08-17 Thread Nathan Bossart
On Wed, Aug 16, 2023 at 01:15:55PM -0700, Peter Geoghegan wrote:
> On Tue, Aug 15, 2023 at 1:31 PM Nathan Bossart  
> wrote:
>> Should we add those?  Patch attached.
> 
> I think that that makes sense.

Committed.

> I just don't want to normalize updating
> .git-blame-ignore-revs very frequently. (Actually, it's more like I
> don't want to normalize any scheme that makes updating the ignore list
> very frequently start to seem reasonable.)

Agreed.  I've found myself habitually running pgindent since becoming a
committer, but I'm sure I'll forget it one of these days.  From a quick
skim of this thread, it sounds like a pre-commit hook [0] might be the best
option at the moment.

[0] https://wiki.postgresql.org/wiki/Working_with_Git#Using_git_hooks

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Extending SMgrRelation lifetimes

2023-08-17 Thread Robert Haas
On Thu, Aug 17, 2023 at 1:11 AM Thomas Munro  wrote:
> Still WIP while I think about edge cases, but so far I think this is
> the better option.

I think this direction makes a lot of sense. The lack of a defined
lifetime for SMgrRelation objects makes correct programming difficult,
makes efficient programming difficult, and doesn't really have any
advantages. I know this is just a WIP patch but for the final version
I think it would make sense to try to do a bit more work on the
comments. For instance:

- In src/include/utils/rel.h, instead of just deleting that comment,
how about documenting the new object lifetime? Or maybe that comment
belongs elsewhere, but I think it would definitely good to spell it
out very explicitly at some suitable place.

- When we change smgrcloseall() to smgrdestroyall(), maybe it's worth
spelling out why destroying is needed and not just closing. For
example, the second hunk in bgwriter.c includes a comment that says
"After any checkpoint, close all smgr files. This is so we won't hang
onto smgr references to deleted files indefinitely." But maybe it
should say something like "After any checkpoint, close all smgr files
and destroy the associated smgr objects. This guarantees that we close
the actual file descriptors, that we close the File objects as managed
by fd.c, and that we also destroy the smgr objects. We don't want any
of these resources to stick around indefinitely after a relation file
has been deleted."

- Maybe it's worth adding comments around some of the smgrclose[all]
calls to mentioning that in those cases we want the file descriptors
(and File objects?) to get closed but don't want to invalidate
pointers. But I'm less sure that this is necessary. I don't want to
have a million comments everywhere, just enough that someone looking
at this stuff in the future can orient themselves about what's going
on without too much difficulty.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Allow Index AM scan API to access information about LIMIT clause

2023-08-17 Thread Peifeng Qiu
Hi hackers,

I'm developing an index AM for a specific user-defined type. Details
about the LIMIT clause can lead to a different strategy during a scan,
so "SELECT * FROM tbl ORDER BY col LIMIT 5" will take a different
code path from "SELECT * FROM tbl ORDER BY col LIMIT 500".
It's still the same IndexScan.

In the planning phase, we have this via amcostestimate. But I don't
see a proper way in ambeginscan/amrescan/amgettuple. For
example:
ambeginscan_function: we have indexRelation, nkeys and norderbys.
amrescan_function: we have the IndexScanDesc built by beginscan,
and detailed info about the scan keys.
amgettuple_function: we have the IndexScanDesc, and the scan direction.
Maybe I missed some API; if so, please point it out, thanks.

In the FDW API, BeginForeignScan has ForeignScanState, which
includes the whole plan, so it's possible to find the LIMIT clause.
I therefore propose adding a ScanState pointer to IndexScanDesc,
populated in IndexNext() after ambeginscan. Then amrescan/amgettuple
can adjust their strategy with information about the LIMIT clause, or
more generally the whole plan tree. This will put the AM scan API on
par with the FDW API, in my opinion. This approach should be compatible
with existing extensions if we place the newly added pointer at the end
of IndexScanDesc, as in the sketch below.
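
A minimal sketch of the idea, for concreteness (the field name and the
comment are illustrative only, not from an actual patch):

typedef struct IndexScanDescData
{
	/* ... all existing fields unchanged ... */

	/*
	 * Hypothetical addition: back-link to the executor node driving this
	 * scan.  IndexNext() would set it after ambeginscan; it stays NULL
	 * when the scan is not run from the executor, so AMs must check it
	 * before use.
	 */
	struct ScanState *parent_ss;
} IndexScanDescData;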

Another approach is adding a new API to IndexAmRoutine and giving
the extension a way to access plan information. But this doesn't
seem to provide more benefits compared to the above
approach.

Thoughts?

Best regards,
Peifeng Qiu




Re: RFC: Logging plan of the running query

2023-08-17 Thread torikoshia

On 2023-06-16 01:34, James Coleman wrote:
> Attached is v28
> which sets ProcessLogQueryPlanInterruptActive to false in errfinish
> when necessary. Once built with those two patches I'm simply running
> `make check`.


With the v28-0001 and v28-0002 patches, I confirmed that backend
processes consume a huge amount of memory, and under some environments
they were terminated by the OOM killer.

This was because memory was allocated from existing memory contexts and
was not freed after ProcessLogQueryPlanInterrupt().
Updated the patch to use a dedicated memory context for
ProcessLogQueryPlanInterrupt().
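
Roughly, the interrupt handler now does its work in a throwaway context
along these lines (a simplified sketch, not the exact patch code):

	MemoryContext cxt;
	MemoryContext oldcxt;

	cxt = AllocSetContextCreate(CurrentMemoryContext,
								"log query plan",
								ALLOCSET_DEFAULT_SIZES);
	oldcxt = MemoryContextSwitchTo(cxt);

	/* ... build the ExplainState and emit the plan at LOG_SERVER_ONLY ... */

	MemoryContextSwitchTo(oldcxt);
	MemoryContextDelete(cxt);	/* frees everything allocated above */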

Applying attached patch and v28-0002 patch, `make check` successfully
completed after 20min and 50GB of logs on my environment.


On 2023-06-15 01:48, James Coleman wrote:
> The tests have been running since last night, but have been apparently
> hung now for many hours.


I don't know if this has anything to do with the hang you faced, but I
thought it might be possible that the large amount of memory usage
resulted in swapping, which caused a significant delay in processing.

If possible, I would be very grateful if you could try to reproduce this
with the v29 patch.


--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

From edadc6a9690a246209ccfbfec28af82a05f49a35 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Thu, 17 Aug 2023 22:11:31 +0900
Subject: [PATCH v29] Add function to log the plan of the query

Currently, we have to wait for the query execution to finish
to check its plan. This is not so convenient when
investigating long-running queries on production environments
where we cannot use debuggers.
To improve this situation, this patch adds
pg_log_query_plan() function that requests to log the
plan of the specified backend process.

By default, only superusers are allowed to request to log the
plans because allowing any users to issue this request at an
unbounded rate would cause lots of log messages and which can
lead to denial of service.

On receipt of the request, at the next CHECK_FOR_INTERRUPTS(),
the target backend logs its plan at LOG_SERVER_ONLY level, so
that these plans will appear in the server log but not be sent
to the client.

Reviewed-by: Bharath Rupireddy, Fujii Masao, Dilip Kumar,
Masahiro Ikeda, Ekaterina Sokolova, Justin Pryzby, Kyotaro
Horiguchi, Robert Treat, Alena Rybakina

Co-authored-by: James Coleman 
---
 doc/src/sgml/func.sgml   |  49 ++
 src/backend/access/transam/xact.c|  13 ++
 src/backend/catalog/system_functions.sql |   2 +
 src/backend/commands/explain.c   | 164 ++-
 src/backend/executor/execMain.c  |  14 ++
 src/backend/storage/ipc/procsignal.c |   4 +
 src/backend/storage/lmgr/lock.c  |   9 +-
 src/backend/tcop/postgres.c  |   4 +
 src/backend/utils/error/elog.c   |   2 +
 src/backend/utils/init/globals.c |   2 +
 src/include/catalog/pg_proc.dat  |   6 +
 src/include/commands/explain.h   |   4 +
 src/include/miscadmin.h  |   1 +
 src/include/storage/lock.h   |   2 -
 src/include/storage/procsignal.h |   1 +
 src/include/tcop/pquery.h|   2 +-
 src/include/utils/elog.h |   1 +
 src/test/regress/expected/misc_functions.out |  54 --
 src/test/regress/sql/misc_functions.sql  |  41 +++--
 19 files changed, 347 insertions(+), 28 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index be2f54c914..d2ead014d2 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26243,6 +26243,25 @@ SELECT collation for ('foo' COLLATE "de_DE");

   
 
+  
+   
+
+ pg_log_query_plan
+
+pg_log_query_plan ( pid integer )
+boolean
+   
+   
+Requests to log the plan of the query currently running on the
+backend with specified process ID.
+It will be logged at LOG message level and
+will appear in the server log based on the log
+configuration set (See 
+for more information), but will not be sent to the client
+regardless of .
+
+  
+
   

 
@@ -26463,6 +26482,36 @@ LOG:  Grand total: 1651920 bytes in 201 blocks; 622360 free (88 chunks); 1029560
 because it may generate a large number of log messages.

 
+   
+pg_log_query_plan can be used
+to log the plan of a backend process. For example:
+
+postgres=# SELECT pg_log_query_plan(201116);
+ pg_log_query_plan
+---
+ t
+(1 row)
+
+The format of the query plan is the same as when VERBOSE,
+COSTS, SETTINGS and
+FORMAT TEXT are used in the EXPLAIN
+command. For example:
+
+LOG:  query plan running on backend with PID 201116 is:
+Query Text: SELECT * FROM pgbench_accounts;
+Seq Scan on public.pgbench_accounts  (cost=0.00..52787.00 rows=200 

Re: Allow parallel plan for referential integrity checks?

2023-08-17 Thread Frédéric Yhuel




On 8/17/23 14:00, Frédéric Yhuel wrote:
> On 8/17/23 09:32, Frédéric Yhuel wrote:
>> On 8/10/23 17:06, Juan José Santamaría Flecha wrote:
>>> Recently I restored a database from a directory format backup and
>>> having this feature would have been quite useful
>>
>> Hi,
>>
>> Thanks for resuming work on this patch. I forgot to mention this in my
>> original email, but the motivation was also to speed up the restore
>> process. Parallelizing the FK checks could make a huge difference in
>> certain cases. We should probably provide such a test case (with perf
>> numbers), and maybe this is what Robert asked for.
>
> I have attached two scripts which demonstrate the following problems:

Let me add the plans for more clarity:

> 1a. if the tables are neither analyzed nor vacuumed before the post-data
> step, then they are index-only scanned, with a lot of heap fetches
> (depending on their size, the planner sometimes chooses a seq scan
> instead).

https://explain.dalibo.com/plan/7491ga22c5293683#raw

> 1b. if the tables have been analyzed but not vacuumed before the
> post-data step, then they are scanned sequentially. Usually better, but
> still not so good without a parallel plan.

https://explain.dalibo.com/plan/e92c5161g880bdhf#raw

> 2. if the visibility maps have been created, then the tables are
> index-only scanned without heap fetches, but this can still be slower
> than a parallel seq scan.

https://explain.dalibo.com/plan/de22bdb4ggc3dffg#raw

And at last, with the patch applied:
https://explain.dalibo.com/plan/54612a09ffh2565a#raw





Re: logical decoding and replication of sequences, take 2

2023-08-17 Thread Ashutosh Bapat
On Wed, Aug 16, 2023 at 7:56 PM Tomas Vondra
 wrote:
>
> >
> > But whether or not that's the case, downstream should not request (and
> > hence receive) any changes that have been already applied (and
> > committed) downstream as a principle. I think a way to achieve this is
> > to update the replorigin_session_origin_lsn so that a sequence change
> > applied once is not requested (and hence sent) again.
> >
>
> I guess we could update the origin, per attached 0004. We don't have
> timestamp to set replorigin_session_origin_timestamp, but it seems we
> don't need that.
>
> The attached patch merges the earlier improvements, except for the part
> that experimented with adding a "fake" transaction (which turned out to
> have a number of difficult issues).

0004 looks good to me. But I need to review the impact of not setting
replorigin_session_origin_timestamp.

What fake transaction experiment are you talking about?

--
Best Wishes,
Ashutosh Bapat




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-17 Thread Amit Kapila
On Thu, Aug 17, 2023 at 3:48 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> > > * Added checks for output plugin libraries. pg_upgrade ensures that 
> > > plugins
> > >   referred by old slots were installed to the new executable directory.
> > >
> >
> > I think this is a good idea but did you test it with out-of-core
> > plugins, if so, can you please share the results? Also, let's update
> > this information in docs as well.
>
> I have not used other plugins, but forcibly renamed the shared object file.
> I would test with plugins like wal2json [1] if more cases are needed.
>
> 1. created logical replication slots on old node
>   SELECT * FROM pg_create_logical_replication_slot('test', 'test_decoding')
> 2. stopped the old node
> 3. forcibly renamed the so file. I used following script:
>   sudo mv /path/to/test_decoding.so /path/to//test\"_decoding.so
> 4. executed pg_upgrade and failed. Outputs what I got were:
>
> ```
> Checking for presence of required libraries fatal
>

Your test sounds reasonable but there is no harm in testing wal2json
or some other plugin just to mimic the actual production scenario.
Additionally, it would give us better coverage for the patch by
testing out-of-core plugins for some other tests as well.

-- 
With Regards,
Amit Kapila.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-17 Thread Amit Kapila
On Thu, Aug 17, 2023 at 6:07 PM Masahiko Sawada  wrote:
>
> On Tue, Aug 15, 2023 at 12:06 PM Amit Kapila  wrote:
> >
> > On Tue, Aug 15, 2023 at 7:51 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Mon, Aug 14, 2023 at 2:07 PM Amit Kapila  
> > > wrote:
> > > >
> > > > On Mon, Aug 14, 2023 at 7:57 AM Masahiko Sawada  
> > > > wrote:
> > > > > Another idea is (which might have already been discussed though) that we 
> > > > > check if the latest shutdown checkpoint LSN in the control file 
> > > > > matches the confirmed_flush_lsn in pg_replication_slots view. That 
> > > > > way, we can ensure that the slot has consumed all WAL records before 
> > > > > the last shutdown. We don't need to worry about WAL records generated 
> > > > > after starting the old cluster during the upgrade, at least for 
> > > > > logical replication slots.
> > > > >
> > > >
> > > > Right, this is somewhat closer to what Patch is already doing. But
> > > > remember in this case we need to remember and use the latest
> > > > checkpoint from the control file before the old cluster is started
> > > > because otherwise the latest checkpoint location could be even updated
> > > > during the upgrade. So, instead of reading from WAL, we need to change
> > > > so that we rely on the control file's latest LSN.
> > >
> > > Yes, I was thinking the same idea.
> > >
> > > But it works for only replication slots for logical replication. Do we
> > > want to check if no meaningful WAL records are generated after the
> > > latest shutdown checkpoint, for manually created slots (or non-logical
> > > replication slots)? If so, we would need to have something reading WAL
> > > records in the end.
> > >
> >
> > This feature only targets logical replication slots. I don't see a
> > reason to be different for manually created logical replication slots.
> > Is there something particular that you think we could be missing?
>
> Sorry I was not clear. I meant the logical replication slots that are
> *not* used by logical replication, i.e., are created manually and used
> by third party tools that periodically consume decoded changes. As we
> discussed before, these slots will never be able to pass that
> confirmed_flush_lsn check.
>

I think normally one would have a background process to periodically
consume changes. Can't one use the walsender infrastructure for
their plugins to consume changes, probably by using the replication
protocol? Also, I feel it is the plugin author's responsibility to
consume changes or advance the slot to the required position before
shutdown.

> After some thoughts, one thing we might
> need to consider is that in practice, the upgrade project is performed
> during the maintenance window and has a backup plan that reverts the
> upgrade process, in case something bad happens. If we require the
> users to drop such logical replication slots, they cannot resume using
> the old cluster in that case, since they would need to create new
> slots, missing some changes.
>

Can't one keep the backup before removing slots?

> Other checks in pg_upgrade seem to be
> compatibility checks that would eventually be required for the upgrade
> anyway. Do we need to consider this case? For example, we do that
> confirmed_flush_lsn check for only the slots with pgoutput plugin.
>

I think one is allowed to use pgoutput plugin even for manually
created slots. So, such a check may not work.

-- 
With Regards,
Amit Kapila.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-17 Thread Masahiko Sawada
On Tue, Aug 15, 2023 at 12:06 PM Amit Kapila  wrote:
>
> On Tue, Aug 15, 2023 at 7:51 AM Masahiko Sawada  wrote:
> >
> > On Mon, Aug 14, 2023 at 2:07 PM Amit Kapila  wrote:
> > >
> > > On Mon, Aug 14, 2023 at 7:57 AM Masahiko Sawada  
> > > wrote:
> > > > Another idea is (which might have already been discussed though) that we 
> > > > check if the latest shutdown checkpoint LSN in the control file matches 
> > > > the confirmed_flush_lsn in pg_replication_slots view. That way, we can 
> > > > ensure that the slot has consumed all WAL records before the last 
> > > > shutdown. We don't need to worry about WAL records generated after 
> > > > starting the old cluster during the upgrade, at least for logical 
> > > > replication slots.
> > > >
> > >
> > > Right, this is somewhat closer to what Patch is already doing. But
> > > remember in this case we need to remember and use the latest
> > > checkpoint from the control file before the old cluster is started
> > > because otherwise the latest checkpoint location could be even updated
> > > during the upgrade. So, instead of reading from WAL, we need to change
> > > so that we rely on the control file's latest LSN.
> >
> > Yes, I was thinking the same idea.
> >
> > But it works for only replication slots for logical replication. Do we
> > want to check if no meaningful WAL records are generated after the
> > latest shutdown checkpoint, for manually created slots (or non-logical
> > replication slots)? If so, we would need to have something reading WAL
> > records in the end.
> >
>
> This feature only targets logical replication slots. I don't see a
> reason to be different for manually created logical replication slots.
> Is there something particular that you think we could be missing?

Sorry I was not clear. I meant the logical replication slots that are
*not* used by logical replication, i.e., are created manually and used
by third party tools that periodically consume decoded changes. As we
discussed before, these slots will never be able to pass that
confirmed_flush_lsn check. After some thoughts, one thing we might
need to consider is that in practice, the upgrade project is performed
during the maintenance window and has a backup plan that reverts the
upgrade process, in case something bad happens. If we require the
users to drop such logical replication slots, they cannot resume using
the old cluster in that case, since they would need to create new
slots, missing some changes. Other checks in pg_upgrade seem to be
compatibility checks that would eventually be required for the upgrade
anyway. Do we need to consider this case? For example, we do that
confirmed_flush_lsn check for only the slots with pgoutput plugin.

> > >
> > > Yet another thing I am trying to consider is whether we can allow to
> > > upgrade slots from 16 or 15 to later versions. As of now, the patch
> > > has the following check:
> > > getLogicalReplicationSlots()
> > > {
> > > ...
> > > + /* Check whether we should dump or not */
> > > + if (fout->remoteVersion < 17)
> > > + return;
> > > ...
> > > }
> > >
> > > If we decide to use the existing view pg_replication_slots then can we
> > > consider upgrading slots from the prior version to 17? Now, if we want
> > > to invent any new API similar to pg_replslotdata then we can't do this
> > > because it won't exist in prior versions but OTOH using existing view
> > > pg_replication_slots can allow us to fetch slot info from older
> > > versions as well. So, I think it is worth considering.
> >
> > I think that without 0001 patch the replication slots will not be able
> > to pass the confirmed_flush_lsn check.
> >
>
> Right, but we can think of backpatching the same. Anyway, we can do
> that as a separate work by starting a new thread to see if there is a
> broader agreement for backpatching such a change. For now, we can
> focus on >=v17.
>

Agreed.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: walsender "wakeup storm" on PG16, likely because of bc971f4025c (Optimize walsender wake up logic using condition variables)

2023-08-17 Thread Antonin Houska
Thomas Munro  wrote:

> On Wed, Aug 16, 2023 at 11:18 PM Antonin Houska  wrote:
> > I try to understand this patch (commit 5ffb7c7750) because I use condition
> > variable in an extension. One particular problem occured to me, please
> > consider:
> >
> > ConditionVariableSleep() gets interrupted, so AbortTransaction() calls
> > ConditionVariableCancelSleep(), but the signal was sent in between. 
> > Shouldn't
> > at least AbortTransaction() and AbortSubTransaction() check the return value
> > of ConditionVariableCancelSleep(), and re-send the signal if needed?
> 
> I wondered about that in the context of our only in-tree user of
> ConditionVariableSignal(), in parallel btree index creation, but since
> it's using the parallel executor infrastructure, any error would be
> propagated everywhere so all waits would be aborted.

I see, ConditionVariableSignal() is currently used only to signal other
workers running in the same transaction. The other parts use
ConditionVariableBroadcast(), so no consumer should miss its signal.
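
For reference, the re-send idiom being asked about would look roughly like
this (a sketch only; one practical difficulty is that a generic abort path
such as AbortTransaction() does not currently track *which* CV, if any,
the backend was sleeping on):

	/* hypothetical abort-path cleanup */
	if (ConditionVariableCancelSleep())
	{
		/* We absorbed a wakeup meant for this CV; pass it on. */
		ConditionVariableSignal(cv);
	}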

> > Note that I'm just thinking about such a problem, did not try to reproduce 
> > it.
> 
> Hmm.  I looked for users of ConditionVariableSignal() in the usual web
> tools and didn't find anything, so I guess your extension is not
> released yet or not open source.  I'm curious: what does it actually
> do if there is an error in a CV-wakeup-consuming backend?  I guess it
> might be some kind of work-queue processing system... it seems
> inevitable that if backends are failing with errors, and you don't
> respond by retrying/respawning, you'll lose or significantly delay
> jobs/events/something anyway (imagine only slightly different timing:
> you consume the signal and start working on a job and then ereport,
> which amounts to the same thing in the end now that your transaction
> is rolled back?), and when you retry you'll see whatever condition was
> waited for anyway.  But that's just me imagining what some
> hypothetical strawman system might look like...  what does it really
> do?

If you're interested, the extension is pg_squeeze [1]. I think the use case is
rather special. All the work is done by a background worker, but an user
function can be called to submit a "task" for the worker and wait for its
completion. So the function sleeps on a CV and the worker uses the CV to wake
it up. If this function ends due to ERROR, the user is supposed to find a log
message in the worker output sooner or later. It may sound weird, but that
function exists primarily for regression tests, so ERROR is a problem anyway.

> (FWIW when I worked on a couple of different work queue-like systems
> and tried to use ConditionVariableSignal() I eventually concluded that
> it was the wrong tool for the job because its wakeup order is
> undefined.  It's actually FIFO, but I wanted LIFO so that workers have
> a chance to become idle and reduce the pool size, but I started to
> think that once you want that level of control you really want to
> build a bespoke wait list system, so I never got around to proposing
> that we consider changing that.)
> 
> Our condition variables are weird.  They're not associated with a
> lock, so we made start-of-wait non-atomic: prepare first, then return
> control and let the caller check its condition, then sleep.  Typical
> user space condition variable APIs force you to acquire some kind of
> lock that protects the condition first, then check the condition, then
> atomically release-associated-lock-and-start-sleeping, so there is no
> data race but also no time where control is returned to the caller but
> the thread is on the wait list consuming signals.  That choice has
> some pros (you can choose whatever type of lock you want to protect
> your condition, or none at all if you can get away with memory
> barriers and magic) and cons..  However, as I think Andres was getting
> at, having a non-atomic start-of-wait doesn't seem to require us to
> have a non-atomic end-of-wait and associated extra contention.  So
> maybe we should figure out how to fix that in 17.

Thanks for sharing your point of view. I'm fine with this low-level approach:
it's well documented and there are examples in the PG code showing how it
should be used :-)
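
For readers following along, that documented pattern is roughly this
(a sketch; the struct and the wait event are placeholders):

	ConditionVariablePrepareToSleep(&shared->cv);	/* join the wait list */
	while (!condition_is_satisfied())		/* caller-defined check */
		ConditionVariableSleep(&shared->cv, WAIT_EVENT_MQ_INTERNAL);
	ConditionVariableCancelSleep();			/* leave the wait list */

Note how control returns to the caller between the prepare and the sleep:
that is the non-atomic start-of-wait described above.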

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

https://github.com/cybertec-postgresql/pg_squeeze/




Re: Allow parallel plan for referential integrity checks?

2023-08-17 Thread Frédéric Yhuel



On 8/17/23 09:32, Frédéric Yhuel wrote:
> On 8/10/23 17:06, Juan José Santamaría Flecha wrote:
>> Recently I restored a database from a directory format backup and
>> having this feature would have been quite useful
>
> Hi,
>
> Thanks for resuming work on this patch. I forgot to mention this in my
> original email, but the motivation was also to speed up the restore
> process. Parallelizing the FK checks could make a huge difference in
> certain cases. We should probably provide such a test case (with perf
> numbers), and maybe this is what Robert asked for.


I have attached two scripts which demonstrate the following problems:

1a. if the tables are neither analyzed nor vacuumed before the post-data
step, then they are index-only scanned, with a lot of heap fetches
(depending on their size, the planner sometimes chooses a seq scan instead).


1b. if the tables have been analyzed but not vacuumed before the
post-data step, then they are scanned sequentially. Usually better, but
still not so good without a parallel plan.


2. if the visibility maps have been created, then the tables are 
index-only scanned without heap fetches, but this can still be slower 
than a parallel seq scan.


So it would be nice if pg_restore could vacuum analyze the tables before 
the post-data step. I believe it would be faster in most cases.


And it would be nice to allow a parallel plan for RI checks.

Best regards,
Frédéric

create_and_fill_tables.sql
Description: application/sql


post_data.sql
Description: application/sql


Re: meson: Non-feature feature options

2023-08-17 Thread Nazir Bilal Yavuz
Hi,

I was looking at older threads and found this one. There were failures
when rebasing the v1-uuid patch onto master, so I updated it and also
added documentation. I attached v2 of the patch. I am sending the v2
patch to this thread, but should I create a new thread for the uuid patch?

Regards,
Nazir Bilal Yavuz
Microsoft
From 83f3222f83b5bb508fc7ce7140731f742e84f166 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Thu, 17 Aug 2023 13:08:32 +0300
Subject: [PATCH v2] meson: Make auto the default of the uuid option

The 'uuid' option is of type 'combo', but we add a choice 'auto' that
simulates the behavior of a feature option. This way, uuid is used
automatically by default if present, but we retain the ability to
potentially select another uuid library.

uuid search order is 'e2fs', 'bsd', 'ossp'. Because, docs [1] states
that OSSP uuid library is not well maintained, and is becoming
increasingly difficult to port to newer platforms; so we can put 'ossp'
to the end. Between 'e2fs' and 'bsd', 'e2fs' is used more often
than 'bsd'. Hence, they can be ordered as 'e2fs', 'bsd', 'ossp'.

[1] https://www.postgresql.org/docs/current/uuid-ossp.html
---
 .cirrus.yml|  5 +-
 doc/src/sgml/installation.sgml | 19 +++-
 meson.build| 87 --
 meson_options.txt  |  4 +-
 src/makefiles/meson.build  |  2 +-
 5 files changed, 83 insertions(+), 34 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index 314ae2d804b..c39f06c893b 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -181,7 +181,7 @@ task:
 su postgres <<-EOF
   meson setup \
 --buildtype=debug \
--Dcassert=true -Duuid=bsd -Dtcl_version=tcl86 -Ddtrace=auto \
+-Dcassert=true -Dtcl_version=tcl86 -Ddtrace=auto \
 -DPG_TEST_EXTRA="$PG_TEST_EXTRA" \
 -Dextra_lib_dirs=/usr/local/lib -Dextra_include_dirs=/usr/local/include/ \
 build
@@ -243,7 +243,6 @@ LINUX_CONFIGURE_FEATURES: _CONFIGURE_FEATURES >-
 
 LINUX_MESON_FEATURES: _MESON_FEATURES >-
   -Dllvm=enabled
-  -Duuid=e2fs
 
 
 task:
@@ -509,7 +508,7 @@ task:
   -Dextra_include_dirs=${brewpath}/include \
   -Dextra_lib_dirs=${brewpath}/lib \
   -Dcassert=true \
-  -Duuid=e2fs -Ddtrace=auto \
+  -Ddtrace=auto \
   -DPG_TEST_EXTRA="$PG_TEST_EXTRA" \
   build
 
diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index ac8eee47c66..59722356475 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -2574,7 +2574,7 @@ ninja install
  
 
  
-  -Duuid=LIBRARY
+  -Duuid={ auto | none | LIBRARY }
   

 Build the  module
@@ -2585,7 +2585,22 @@ ninja install

 
  
-  none to not build the uuid module. This is the default.
+  auto to select UUID library module automatically between
+  e2fs, bsd, ossp. This is the default.
+  
+  
+  Search order is e2fs, bsd, ossp. Because,
+  ossp UUID library is not well maintained; and is
+  becoming increasingly difficult to port to newer platforms. So,
+  ossp will be last option. Between
+  e2fs and bsd, e2fs is
+  used more often than bsd. Hence, they ordered as
+  e2fs, bsd, ossp.
+ 
+
+
+ 
+  none to not build the uuid module.
  
 
 
diff --git a/meson.build b/meson.build
index f5ec442f9a9..3f0067e4754 100644
--- a/meson.build
+++ b/meson.build
@@ -1333,36 +1333,71 @@ endif
 ###
 
 uuidopt = get_option('uuid')
+uuid_library = 'none'
+uuid = not_found_dep
+if uuidopt == 'auto' and auto_features.disabled()
+  uuidopt = 'none'
+endif
+
 if uuidopt != 'none'
-  uuidname = uuidopt.to_upper()
-  if uuidopt == 'e2fs'
-uuid = dependency('uuid', required: true)
-uuidfunc = 'uuid_generate'
-uuidheader = 'uuid/uuid.h'
-  elif uuidopt == 'bsd'
-# libc should have uuid function
-uuid = declare_dependency()
-uuidfunc = 'uuid_to_string'
-uuidheader = 'uuid.h'
-  elif uuidopt == 'ossp'
-uuid = dependency('ossp-uuid', required: true)
-uuidfunc = 'uuid_export'
-uuidheader = 'uuid.h'
-  else
-error('unknown uuid build option value: @0@'.format(uuidopt))
-  endif
 
-  if not cc.has_header_symbol(uuidheader, uuidfunc, args: test_c_args, dependencies: uuid)
-error('uuid library @0@ missing required function @1@'.format(uuidopt, uuidfunc))
-  endif
-  cdata.set('HAVE_@0@'.format(uuidheader.underscorify().to_upper()), 1)
+  # uuid search order is 'e2fs', 'bsd', 'ossp'. Because, docs [1] states
+  # that OSSP uuid library is not well maintained, and is becoming
+  # increasingly difficult to port to newer platforms; so we can put 'ossp'
+  # to the end. Between 'e2fs' and 'bsd', 'e2fs' is used more often
+  # than 'bsd'. Hence, they can be ordered as 'e2fs', 'bsd', 'ossp'.
+  

Re: Synchronizing slots from primary to standby

2023-08-17 Thread shveta malik
On Thu, Aug 17, 2023 at 11:55 AM shveta malik  wrote:
>
> On Thu, Aug 17, 2023 at 11:44 AM Drouvot, Bertrand
>  wrote:
> >
> > Hi,
> >
> > On 8/14/23 11:52 AM, shveta malik wrote:
> >
> > >
> > > We (myself and Ajin) performed the tests to compute the lag in standby
> > > slots as compared to primary slots with different number of slot-sync
> > > workers configured.
> > >
> >
> > Thanks!
> >
> > > 3 DBs were created, each with 30 tables and each table having one
> > > logical-pub/sub configured. So this made a total of 90 logical
> > > replication slots to be synced. Then the workload was run for aprox 10
> > > mins. During this workload, at regular intervals, primary and standby
> > > slots' lsns were captured (from pg_replication_slots) and compared. At
> > > each capture, the intent was to know how much is each standby's slot
> > > lagging behind corresponding primary's slot by taking the distance
> > > between confirmed_flush_lsn of primary and standby slot. Then we took
> > > the average (integer value) of this distance over the span of 10 min
> > > workload
> >
> > Thanks for the explanations, make sense to me.
> >
> > > and this is what we got:
> > >
> > > With max_slot_sync_workers=1, average-lag =  42290.3563
> > > With max_slot_sync_workers=2, average-lag =  24585.1421
> > > With max_slot_sync_workers=3, average-lag =  14964.9215
> > >
> > > This shows that more workers have better chances to keep logical
> > > replication slots in sync for this case.
> > >
> >
> > Agree.
> >
> > > Another statistics if it interests you is, we ran a frequency test as
> > > well (this by changing code, unit test sort of) to figure out the
> > > 'total number of times synchronization done' with different number of
> > > sync-slots workers configured. Same 3 DBs setup with each DB having 30
> > > logical replication slots. With 'max_slot_sync_workers' set at 1, 2
> > > and 3; total number of times synchronization done was 15874, 20205 and
> > > 23414 respectively. Note: this is not on the same machine where we
> > > captured lsn-gap data, it is on  a little less efficient machine but
> > > gives almost the same picture
> > >
> > > Next we are planning to capture this data for a lesser number of slots
> > > like 10,30,50 etc. It may happen that the benefit of multi-workers
> > > over single workers in such cases could be less, but let's have the
> > > data to verify that.
> > >
> >
> > Thanks a lot for those numbers and for the testing!
> >
> > Do you think it would make sense to also get the numbers using
> > the pg_failover_slots module? (and compare the pg_failover_slots numbers 
> > with the
> > "one worker" case here). Idea is to check if the patch does introduce
> > some overhead as compared to pg_failover_slots.
> >
>
> Yes, definitely. We will work on that and share the numbers soon.
>

We are working on these tests. Meanwhile attaching the patches which
attempt to implement below functionalities:

1) Remove extra slots on standby if those no longer exist on primary
or are no longer part of synchronize_slot_names.
2) Make synchronize_slot_names user-modifiable. And due to change in
'synchronize_slot_names', if dbids list is reduced, then take care of
removal of extra/old db-ids (if any) from workers db-list.

Thanks Ajin for working on 1. Both the above changes are in
patch-0002. There is a test failure in the recovery module due to
these new changes, I am looking into it and will fix it in the next
version.

Improvements in the pipeline:
a) standby slots should not be consumable.
b) optimization of the query which the standby sends to the primary.
Currently it has a dbid filter and a slot-name filter. The slot-name
filter can be optimized to include only those slots which belong to the
DBs assigned to the worker, rather than all of 'synchronize_slot_names'.
c) analyze whether the naptime of the slot-sync worker can be auto-tuned:
if there is no activity going on (i.e. slots are not advancing on the
primary), then increase the naptime of the slot-sync worker on the
standby and decrease it again when activity starts.

thanks
Shveta


v12-0001-Allow-logical-walsenders-to-wait-for-physical-st.patch
Description: Binary data


v12-0002-Add-logical-slot-sync-capability-to-physical-sta.patch
Description: Binary data


Re: POC, WIP: OR-clause support for indexes

2023-08-17 Thread a.rybakina
Sorry, I didn't put that clearly enough: the second place in the code
where the conversion works well enough is the removal of duplicate OR
expressions.

I attached a patch to illustrate it in more detail.

On 17.08.2023 13:08, a.rybakina wrote:

Hi, all!

The optimizer will itself do a limited form of "normalizing to CNF".
Are you familiar with extract_restriction_or_clauses(), from
orclauses.c? Comments above the function have an example of how this
can work:

  * Although a join clause must reference multiple relations overall,
  * an OR of ANDs clause might contain sub-clauses that reference just one
  * relation and can be used to build a restriction clause for that rel.
  * For example consider
  *  WHERE ((a.x = 42 AND b.y = 43) OR (a.x = 44 AND b.z = 45));
  * We can transform this into
  *  WHERE ((a.x = 42 AND b.y = 43) OR (a.x = 44 AND b.z = 45))
  *  AND (a.x = 42 OR a.x = 44)
  *  AND (b.y = 43 OR b.z = 45);
  * which allows the latter clauses to be applied during the scans of a and b,
  * perhaps as index qualifications, and in any case reducing the number of
  * rows arriving at the join.  In essence this is a partial transformation to
  * CNF (AND of ORs format).  It is not complete, however, because we do not
  * unravel the original OR --- doing so would usually bloat the qualification
  * expression to little gain.
This is an interesting feature. I didn't notice this function before;
I have studied consider_new_or_clause(), which is called there, many
times. As far as I know, there is a selectivity calculation going on
there, but as far as I remember, I called it earlier after my
conversion, and unfortunately it didn't solve my problem with
calculating selectivity. I'll reconsider it again; maybe I can find
something I missed.

Of course this immediately makes me wonder: shouldn't your patch be
able to perform an additional transformation here? You know, by
transforming "a.x = 42 OR a.x = 44" into "a IN (42, 44)"? Although I
haven't checked for myself, I assume that this doesn't happen right
now, since your patch currently performs all of its transformations
during parsing.

I also noticed that the same comment block goes on to say something
about "clauselist_selectivity's inability to recognize redundant
conditions". Perhaps that is relevant to the problems you were having
with selectivity estimation, back when the code was in
preprocess_qual_conditions() instead? I have no reason to believe that
there should be any redundancy left behind by your transformation, so
this is just one possibility to consider.
Separately, the commit message of commit 25a9e54d2d says something
about how the planner builds RestrictInfos, which seems
possibly-relevant. That commit enhanced extended statistics for OR
clauses, so the relevant paragraph describes a limitation of extended
statistics with OR clauses specifically. I'm just guessing, but it
still seems like it might be relevant to the problem you ran into with
selectivity estimation. Another possibility to consider.


I understood what is said about AND clauses in this comment. It seems
to me that AND clauses are saved as (BoolExpr *) expr->args ->
(RestrictInfo *) clauseA -> (RestrictInfo *) clauseB lists, and OR
clauses are saved as (BoolExpr *) expr->orclause -> (RestrictInfo *)
clauseA -> (RestrictInfo *) clauseB.

As I understand it, selectivity is calculated for each expression.
But I'll explore it deeper, because I think this place may contain
the answer to the question of what's wrong with the selectivity
calculation in my patch.


I could move the transformation in there (extract_restriction_or_clauses)
and didn't have any problem with the selectivity calculation; besides, it
also works at the redundant-OR/duplicate-removal stage. So, it looks like:


CREATE TABLE tenk1 (unique1 int, unique2 int, ten int, hundred int);
insert into tenk1 SELECT x,x,x,x FROM generate_series(1,5) as x;
CREATE INDEX a_idx1 ON tenk1(unique1);
CREATE INDEX a_idx2 ON tenk1(unique2);
CREATE INDEX a_hundred ON tenk1(hundred);

explain analyze select * from tenk1 a join tenk1 b on ((a.unique2 = 3 or a.unique2 = 7));

                                                          PLAN
-------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.29..2033.62 rows=10 width=32) (actual time=0.090..60.258 rows=10 loops=1)
   ->  Seq Scan on tenk1 b  (cost=0.00..771.00 rows=5 width=16) (actual time=0.016..9.747 rows=5 loops=1)
   ->  Materialize  (cost=0.29..12.62 rows=2 width=16) (actual time=0.000..0.000 rows=2 loops=5)
         ->  Index Scan using a_idx2 on tenk1 a  (cost=0.29..12.62 rows=2 width=16) (actual time=0.063..0.068 rows=2 loops=1)
               Index Cond: (unique2 = ANY (ARRAY[3, 7]))
 Planning Time: 8.257 ms
 Execution Time: 64.453 ms
(7 rows)


Overall, this was due to incorrectly defined types of elements in the 
array, and if we had applied the transformation with the definition of 
the tup operator, we could 

Re: POC, WIP: OR-clause support for indexes

2023-08-17 Thread a.rybakina

Hi, all!

The optimizer will itself do a limited form of "normalizing to CNF".
Are you familiar with extract_restriction_or_clauses(), from
orclauses.c? Comments above the function have an example of how this
can work:

  * Although a join clause must reference multiple relations overall,
  * an OR of ANDs clause might contain sub-clauses that reference just one
  * relation and can be used to build a restriction clause for that rel.
  * For example consider
  *      WHERE ((a.x = 42 AND b.y = 43) OR (a.x = 44 AND b.z = 45));
  * We can transform this into
  *      WHERE ((a.x = 42 AND b.y = 43) OR (a.x = 44 AND b.z = 45))
  *          AND (a.x = 42 OR a.x = 44)
  *          AND (b.y = 43 OR b.z = 45);
  * which allows the latter clauses to be applied during the scans of a and b,
  * perhaps as index qualifications, and in any case reducing the number of
  * rows arriving at the join.  In essence this is a partial transformation to
  * CNF (AND of ORs format).  It is not complete, however, because we do not
  * unravel the original OR --- doing so would usually bloat the qualification
  * expression to little gain.
This is an interesting feature. I didn't notice this function before; I 
have studied consider_new_or_clause, which is called there, many times. 
As far as I know, there is a selectivity calculation going on there, 
but as far as I remember, I called it earlier, after my transformation, 
and unfortunately it didn't solve my problem with calculating selectivity. 
I'll reconsider it again; maybe I can find something I missed.

Of course this immediately makes me wonder: shouldn't your patch be
able to perform an additional transformation here? You know, by
transforming "a.x = 42 OR a.x = 44" into "a IN (42, 44)"? Although I
haven't checked for myself, I assume that this doesn't happen right
now, since your patch currently performs all of its transformations
during parsing.
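
To make the hypothetical concrete, the extra rewrite would look like this
(table name and constants invented for illustration, not taken from the
patch):

-- the OR form ...
SELECT * FROM a WHERE a.x = 42 OR a.x = 44;
-- ... folded into the array form the patch already produces at parse time,
-- which a btree index on a.x can service with a single ScalarArrayOp scan:
SELECT * FROM a WHERE a.x = ANY (ARRAY[42, 44]);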

I also noticed that the same comment block goes on to say something
about "clauselist_selectivity's inability to recognize redundant
conditions". Perhaps that is relevant to the problems you were having
with selectivity estimation, back when the code was in
preprocess_qual_conditions() instead? I have no reason to believe that
there should be any redundancy left behind by your transformation, so
this is just one possibility to consider.
Separately, the commit message of commit 25a9e54d2d says something
about how the planner builds RestrictInfos, which seems
possibly-relevant. That commit enhanced extended statistics for OR
clauses, so the relevant paragraph describes a limitation of extended
statistics with OR clauses specifically. I'm just guessing, but it
still seems like it might be relevant to the problem you ran into with
selectivity estimation. Another possibility to consider.


I understood what is said about AND clauses in this comment. It seems 
to me that AND clauses are saved as (BoolExpr *) expr->args -> 
(RestrictInfo *) clauseA -> (RestrictInfo *) clauseB lists, and OR 
clauses are saved as (BoolExpr *) expr->orclause -> (RestrictInfo *) 
clauseA -> (RestrictInfo *) clauseB.


As I understand it, selectivity is calculated for each expression. But 
I'll explore it more deeply, because I think this place may contain the 
answer to the question of what's wrong with the selectivity calculation in 
my patch.


I could move the transformation there (extract_restriction_or_clauses) 
and didn't have any problem with the selectivity calculation; besides, it 
also works at the stage where redundant clauses and duplicates are 
handled. So, it looks like:


CREATE TABLE tenk1 (unique1 int, unique2 int, ten int, hundred int);
insert into tenk1 SELECT x,x,x,x FROM generate_series(1,5) as x;
CREATE INDEX a_idx1 ON tenk1(unique1);
CREATE INDEX a_idx2 ON tenk1(unique2);
CREATE INDEX a_hundred ON tenk1(hundred);


explain analyze select * from tenk1 a join tenk1 b on ((a.unique2 = 3 or 
a.unique2 = 7));


                                                      PLAN
--------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.29..2033.62 rows=10 width=32) (actual time=0.090..60.258 rows=10 loops=1)
   ->  Seq Scan on tenk1 b  (cost=0.00..771.00 rows=5 width=16) (actual time=0.016..9.747 rows=5 loops=1)
   ->  Materialize  (cost=0.29..12.62 rows=2 width=16) (actual time=0.000..0.000 rows=2 loops=5)
         ->  Index Scan using a_idx2 on tenk1 a  (cost=0.29..12.62 rows=2 width=16) (actual time=0.063..0.068 rows=2 loops=1)
               Index Cond: (unique2 = ANY (ARRAY[3, 7]))
 Planning Time: 8.257 ms
 Execution Time: 64.453 ms
(7 rows)


Overall, this was due to incorrectly defined types of the elements in the 
array; if we had applied the transformation together with the determination 
of the comparison operator, we could have avoided such problems (I used 
make_scalar_array_op and have not yet found an alternative to it).


When I moved the transformation to the index creation stage, it couldn't 
work properly, and as a result I faced the same problem with selectivity 

Re: [PATCH] Add native windows on arm64 support

2023-08-17 Thread Anthony Roberts

Hi All,

Just following up on this, has there been any movement?

I did see another message go into the thread here with no reply: 
https://www.postgresql.org/message-id/ZKPLjLjIjwN2lxkg%40paquier.xyz


Thanks,
Anthony

On 14/06/2023 13:25, Niyas Sait wrote:

Hello,

Just wanted to give a heads-up that I am moving to a new role outside 
Linaro and as a result won't be able to continue contributing.


Anthony Roberts from Linaro is going to support the enablement work.






Query regarding sharing data directory

2023-08-17 Thread Harsh N Bhatt
Respected Sir/Ma'am,
I have a question regarding where the PostgreSQL data directory is stored.
I'm using PostgreSQL server 15, and I want to point the data directory at
an external RAID storage array that is reached over the LAN. Is that
compatible or not? If it is, then which type of storage is suitable: NAS or
SAN?

If it is, can you share any documents with me?

Regards
Harsh


Re: Extract numeric filed in JSONB more effectively

2023-08-17 Thread Andy Fan
Hi jian:

On Thu, Aug 17, 2023 at 12:32 AM jian he 
wrote:

> On Wed, Aug 16, 2023 at 2:28 PM Andy Fan  wrote:
> >
> > update with the correct patch..
>
> regression=# select proname, pg_catalog.pg_get_function_arguments(oid)
> from pg_proc
> where proname =  'jsonb_extract_path_type';
>  proname | pg_get_function_arguments
>
> -+
>  jsonb_extract_path_type | from_json jsonb, VARIADIC path_elems
> text[], target_oid anyelement
> (1 row)
>
> VARIADIC should be the last argument?
>

Currently, if users call this function directly (usually I don't think
they will), they will get something wrong.  This issue is fixed in the
v9 version.  To keep consistency among all the functions,
I moved the 'target_type anyelement' to the 1st argument.
Thanks for the report!


> select jsonb_array_element_type(jsonb'[1231]',0, null::int);
> now return null.
> Should it return 1231?
>

No, this is by design. The function is declared STRICT, so
any NULL input yields a NULL output.  That's just what we
talked about above (the markDummyConst section).  I don't
think this needs to be addressed.
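
For illustration, the two calls discussed above would behave like this
(these functions come from the patch, not core PostgreSQL, shown with the
pre-v9 argument order used in the quoted report):

-- the function is STRICT, so a NULL target yields NULL rather than 1231:
SELECT jsonb_array_element_type(jsonb '[1231]', 0, NULL::int);
-- with a non-NULL target the element is extracted and cast to its type:
SELECT jsonb_array_element_type(jsonb '[1231]', 0, 0::int);   -- 1231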


> select jsonb_array_element_type(jsonb'[1231]',0, '1'::jsonb);
> will crash
>

OK, it looks like I didn't pay enough attention to the case of users
directly calling the jsonb_xx_type functions, so I changed the code in
v9 based on your suggestion.

Thanks for the review,  v9 attached!

-- 
Best Regards
Andy Fan


test.sql
Description: Binary data


v9-0001-optimize-casting-jsonb-to-a-given-type.patch
Description: Binary data


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-17 Thread Peter Smith
Here are some review comments for the first 2 patches.

(There are a couple of overlaps with what Hou-san already wrote review
comments about)

For patch v21-0001...

==
1. SaveSlotToPath

- /* and don't do anything if there's nothing to write */
- if (!was_dirty)
+ /*
+ * and don't do anything if there's nothing to write, unless it's this is
+ * called for a logical slot during a shutdown checkpoint, as we want to
+ * persist the confirmed_flush_lsn in that case, even if that's the only
+ * modification.
+ */
+ if (!was_dirty && (SlotIsPhysical(slot) || !is_shutdown))
  return;

The condition seems to be coded in a slightly awkward way when
compared to how the comment was worded.

How about:
if (!was_dirty && !(SlotIsLogical(slot) && is_shutdown))


//

For patch v21-0002...

==
Commit Message

1.

For pg_upgrade, it query the logical replication slots information from the old
cluter and restores the slots using the pg_create_logical_replication_slots()
statements. Note that we need to separate the timing of restoring replication
slots and other objects. Replication slots, in  particular, should not be
restored before executing the pg_resetwal command because it will remove WALs
that are required by the slots.

~

Revisit this paragraph. There are lots of typos etc.

1a.
"For pg_upgrade". I think this wording is a hangover from back when
the patch was split into two parts for pg_dump and pg_upgrade, but now
it seems strange.

~
1b.
/cluter/cluster/

~
1c
/because it/because pg_resetwal/

==
src/sgml/ref/pgupgrade.sgml

2.

+   
+Prepare for publisher upgrades
+
+
+ pg_upgrade try to dump and restore logical
+ replication slots. This helps avoid the need for manually defining the
+ same replication slot on the new publisher.
+
+

2a.
/try/attempts to/ ??

~
2b.
Is "dump" the right word here? I didn't see dumping happening in the
patch anymore.

~~~

3.

+
+ Before you start upgrading the publisher node, ensure that the
+ subscription is temporarily disabled. After the upgrade is complete,
+ execute the
+ ALTER
SUBSCRIPTION ... DISABLE
+ command to update the connection string, and then re-enable the
+ subscription.
+

3a.
That link made no sense in this context.

Don't you mean to say:
ALTER SUBSCRIPTION ... CONNECTION ...

~

3b.
Hmm. I wonder now did you *also* mean to describe how to disable? For example:

Before you start upgrading the publisher node, ensure that the
subscription is temporarily disabled, by executing
ALTER SUBSCRIPTION ...
DISABLE.
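
To spell out the overall flow being suggested here (the subscription name
and connection string are illustrative only):

ALTER SUBSCRIPTION mysub DISABLE;
-- ... run pg_upgrade on the publisher node ...
ALTER SUBSCRIPTION mysub CONNECTION 'host=new_publisher port=5432 dbname=postgres';
ALTER SUBSCRIPTION mysub ENABLE;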

~~~

4.

+
+
+ Upgrading slots has some settings. At first, all the slots must not be in
+ lost, and they must have consumed all the WALs on old
+ node. Furthermore, new node must have larger
+ max_replication_slots
+ than existing slots on old node, and
+ wal_level must be
+ logical. pg_upgrade will
+ run error if something wrong.
+
+   
+

4a.
"At first, all the slots must not be in lost"

Apart from being strangely worded, I was not familiar with what it
meant to say "must not be in lost". Will this be meaningful to the
user?

IMO this should have more description, e.g. including mentioning the
"wal_status" attribute with the appropriate link to
https://www.postgresql.org/docs/current/view-pg-replication-slots.html
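
For instance, the documentation could suggest a check along these lines on
the old node (a sketch, not the patch's wording):

SELECT slot_name, wal_status
FROM pg_replication_slots
WHERE slot_type = 'logical' AND wal_status = 'lost';
-- any row returned here means that slot cannot be upgraded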

~

4b.
BEFORE
Upgrading slots has some settings. ...
pg_upgrade will run error if something
wrong.

SUGGESTION
There are some prerequisites for pg_upgrade
to be able to upgrade the replication slots. If these are not met an
error will be reported.

~

4c.
Wondered if this list of prerequisites might be better presented as an
SGML list.

==
src/bin/pg_upgrade/check.c

5.
 extern char *output_files[];
+extern int num_slots_on_old_cluster;

~

IMO something feels not quite right about having this counter floating
around as a global variable.

Shouldn't this instead be a field member of old_cluster? That
seems to be the normal way to hold the cluster-wide info.

~~~

6. check_new_cluster_is_empty

  RelInfoArr *rel_arr = &new_cluster.dbarr.dbs[dbnum].rel_arr;
+ DbInfo *pDbInfo = &new_cluster.dbarr.dbs[dbnum];
+ LogicalSlotInfoArr *slot_arr = &pDbInfo->slot_arr;

IIRC I previously suggested adding this 'pDbInfo' variable because
there are several places that can make use of it.

You are using it only in the NEW code, but did not replace the
existing other code to make use of it:
pg_fatal("New cluster database \"%s\" is not empty: found relation \"%s.%s\"",
new_cluster.dbarr.dbs[dbnum].db_name,

~~~

7. check_for_logical_replication_slots

+
+/*
+ * Verify the parameter settings necessary for creating logical replication
+ * slots.
+ */
+static void
+check_for_logical_replication_slots(ClusterInfo *new_cluster)
+{
+ PGresult   *res;
+ PGconn     *conn = connectToServer(new_cluster, "template1");
+ int         max_replication_slots;
+ char       *wal_level;
+
+ /* logical replication slots can be dumped since PG17. */
+ if 

Re: pg_upgrade - typo in verbose log

2023-08-17 Thread Peter Smith
On Thu, May 11, 2023 at 7:09 PM Peter Smith  wrote:
>
> On Thu, May 11, 2023 at 6:30 PM Daniel Gustafsson  wrote:
> >
> > > On 11 May 2023, at 07:41, Peter Smith  wrote:
> >
> > > While reviewing another patch for the file info.c, I noticed some
> > > misplaced colon (':') in the verbose logs for print_rel_infos().
> >
> > That spelling was introduced in c2e9b2f28818 which was the initial import of
> > pg_upgrade into contrib/ for the 9.0 release (at that time the function was
> > relarr_print() which via a few other names was renamed to print_rel_infos() 
> > in
> > 0a5f1199319).
> >
> > It's not entirely clear to me if the current spelling is a mistake or
> > intentional, but I do agree that your version is an improvement.
> >
> > To be consistent with other log output in pg_upgrade we should probably also
> > wrap the relname and reltblspace in quotes as \"%s.%s\" and \"%s\" (and the
> > Database in print_db_infos()).
> >
>
> Thanks for checking, and for the feedback.
>
> PSA patch v2 updated as suggested.
>

Ping.

I thought v2 was ready to be pushed, but then this thread went silent
for 3 months

??

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: New WAL record to detect the checkpoint redo location

2023-08-17 Thread Dilip Kumar
On Thu, Aug 17, 2023 at 10:52 AM Michael Paquier  wrote:
>
> On Tue, Aug 15, 2023 at 02:23:43PM +0530, Dilip Kumar wrote:
> > Yeah, good idea, actually we can do this insert outside of the
> > exclusive insert lock and set the LSN of this insert as the
> > checkpoint. redo location.  So now we do not need to compute the
> > checkpoint. redo based on the current insertion point we can directly
> > use the LSN of this record.  I have modified this and I have also
> > moved some other code that is not required to be inside the WAL
> > insertion lock.
>
> Looking at this patch, I am bit surprised to see that the redo point
> maps with the end location of the CHECKPOINT_REDO record as it is the
> LSN returned by XLogInsert(), not its start LSN.

Yeah, right. Actually I was confused; I assumed it would return the
start LSN of the record.  And I do not see any easy way to identify
the start LSN of this record, so maybe this solution will not work.  I
will have to think of something else.  Thanks for pointing it out.
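
(For what it's worth, the start/end distinction is easy to observe from SQL
with the pg_walinspect extension, where each record reports both LSNs; the
LSN below is illustrative only:

CREATE EXTENSION pg_walinspect;
SELECT start_lsn, end_lsn, resource_manager, description
FROM pg_get_wal_record_info('0/1A2B3C4D');
)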

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: WIP: new system catalog pg_wait_event

2023-08-17 Thread Masahiro Ikeda

Hi,

On 2023-08-17 14:53, Drouvot, Bertrand wrote:

On 8/17/23 3:53 AM, Masahiro Ikeda wrote:

1)

The regular expression needs to be changed in 
generate-wait_event_types.pl.

I have compared the documentation with the output of the pg_wait_event
view and found the following differences.

* For parameter names, the substitution for underscore is missing.
-ArchiveCleanupCommand Waiting for archive_cleanup_command to complete
-ArchiveCommand Waiting for archive_command to complete
+ArchiveCleanupCommand Waiting for archive-cleanup-command to complete
+ArchiveCommand Waiting for archive-command to complete
-RecoveryEndCommand Waiting for recovery_end_command to complete
+RecoveryEndCommand Waiting for recovery-end-command to complete
-RestoreCommand Waiting for restore_command to complete
+RestoreCommand Waiting for restore-command to complete



Yeah, nice catch. v7, just shared up-thread, replaces "-" with "_"
for such cases. But I'm not sure the pg_wait_event description field needs
to be 100% aligned with the documentation: wouldn't it be better to replace
"-" with " " in such cases in pg_wait_event?


* The HTML tag match is not shortest match.
-frozenid Waiting to update pg_database.datfrozenxid and 
pg_database.datminmxid

+frozenid Waiting to update datminmxid



Nice catch, fixed in v7.


Thanks, I confirmed.


* There are two blanks before "about".


This is coming from wait_event_names.txt, thanks for pointing out.
I just submitted a patch [1] to fix that.


Thanks. I agree it's better treated as a separate patch.


Also " for heavy weight is
   removed (is it intended?)
-LockManager Waiting to read or update information about "heavyweight" 
locks
+LockManager Waiting to read or update information  about heavyweight 
locks




Not intended, fixed in v7.


OK, I confirmed it's fixed.


* Do we need "worker_spi_main" in the description? The name column
   shows the same one, so it could be omitted.

pg_wait_event
worker_spi_main Custom wait event "worker_spi_main" defined by 
extension




Do you mean removing the wait event name from the description in the case
of custom extension wait events? I'd prefer to keep it; if not, the
descriptions would all be the same for custom wait events.


OK, I thought it was redundant.

BTW, would it be better to start with "Waiting" like the other lines?
For example, "Waiting for custom event \"worker_spi_main\" defined by an 
extension".



2)

Would it be better to add "extension" meaning unassigned?

Extensions can add Extension and LWLock types to the list shown in 
Table 28.8 and Table 28.12. In some cases, the name of LWLock 
assigned by an extension will not be available in all server 
processes; It might be reported as just “extension” rather than the 
extension-assigned name.




Yeah, could make sense but I think that should be a dedicated patch
for the documentation.


OK, makes sense.


3)

Would index == els be better for the following Assert?
+    Assert(index <= els);



Agree, done in v7.


4)

There is a typo. alll -> all
+    /* Allocate enough space for alll entries */


Thanks, I confirmed it's fixed.


The following are additional comments for v7.

1)


I am not sure that "pg_wait_event" is a good idea for the name of the
new view.  How about "pg_wait_events" instead, in plural form?  There
is more than one wait event listed.

I'd prefer the singular form. There are a lot of places where it's
already used (pg_database, pg_user, pg_namespace... to name a few), and
it looks like the plural form is the exception.



Since I got the same feeling as Michael-san that "pg_wait_events" would 
be better, I checked the names of all the system views. The singular form 
seems to be the exception for views, though it is the usual convention 
for system catalogs.


https://www.postgresql.org/docs/devel/views.html
https://www.postgresql.org/docs/devel/catalogs.html
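
For what it's worth, the existing naming can also be eyeballed directly
from the catalogs, for example:

SELECT c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'v' AND n.nspname = 'pg_catalog'
ORDER BY 1;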

2)

$ctmp must be $wctmp?

+   rename($wctmp, "$output_path/wait_event_funcs_data.c")
+ || die "rename: $ctmp to $output_path/wait_event_funcs_data.c: $!";


3)

"List all the wait events type, name and descriptions" is enough?

+ * List all the wait events (type, name) and their descriptions.

4)

Why don't you add an example of using the view to monitoring.sgml?

```
   
 Here is an example of how wait events can be viewed:


SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE 
wait_event is NOT NULL;

 pid  | wait_event_type | wait_event
--+-+
 2540 | Lock| relation
 6644 | LWLock  | ProcArray
(2 rows)

   
```

ex.
postgres(3789417)=# SELECT a.pid, a.wait_event_type, a.wait_event, w.description
                    FROM pg_stat_activity a
                    JOIN pg_wait_event w
                      ON (a.wait_event_type = w.type AND a.wait_event = w.name)
                    WHERE wait_event IS NOT NULL;
  pid  | wait_event_type | wait_event | description


Re: Allow parallel plan for referential integrity checks?

2023-08-17 Thread Frédéric Yhuel




On 8/10/23 17:06, Juan José Santamaría Flecha wrote:
Recently I restored a database from a directory format backup and having 
this feature would have been quite useful


Hi,

Thanks for resuming work on this patch. I forgot to mention this in my 
original email, but the motivation was also to speed up the restore 
process. Parallelizing the FK checks could make a huge difference in 
certain cases. We should probably provide such a test case (with perf 
numbers), and maybe this is what Robert asked for.





Re: Synchronizing slots from primary to standby

2023-08-17 Thread shveta malik
On Thu, Aug 17, 2023 at 11:44 AM Drouvot, Bertrand
 wrote:
>
> Hi,
>
> On 8/14/23 11:52 AM, shveta malik wrote:
>
> >
> > We (myself and Ajin) performed the tests to compute the lag in standby
> > slots as compared to primary slots with different number of slot-sync
> > workers configured.
> >
>
> Thanks!
>
> > 3 DBs were created, each with 30 tables and each table having one
> > logical-pub/sub configured. So this made a total of 90 logical
> > replication slots to be synced. Then the workload was run for aprox 10
> > mins. During this workload, at regular intervals, primary and standby
> > slots' lsns were captured (from pg_replication_slots) and compared. At
> > each capture, the intent was to know how much is each standby's slot
> > lagging behind corresponding primary's slot by taking the distance
> > between confirmed_flush_lsn of primary and standby slot. Then we took
> > the average (integer value) of this distance over the span of 10 min
> > workload
>
> Thanks for the explanations, make sense to me.
>
> > and this is what we got:
> >
> > With max_slot_sync_workers=1, average-lag =  42290.3563
> > With max_slot_sync_workers=2, average-lag =  24585.1421
> > With max_slot_sync_workers=3, average-lag =  14964.9215
> >
> > This shows that more workers have better chances to keep logical
> > replication slots in sync for this case.
> >
>
> Agree.
>
> > Another statistics if it interests you is, we ran a frequency test as
> > well (this by changing code, unit test sort of) to figure out the
> > 'total number of times synchronization done' with different number of
> > sync-slots workers configured. Same 3 DBs setup with each DB having 30
> > logical replication slots. With 'max_slot_sync_workers' set at 1, 2
> > and 3; total number of times synchronization done was 15874, 20205 and
> > 23414 respectively. Note: this is not on the same machine where we
> > captured lsn-gap data, it is on  a little less efficient machine but
> > gives almost the same picture
> >
> > Next we are planning to capture this data for a lesser number of slots
> > like 10,30,50 etc. It may happen that the benefit of multi-workers
> > over single workers in such cases could be less, but let's have the
> > data to verify that.
> >
>
> Thanks a lot for those numbers and for the testing!
>
> Do you think it would make sense to also get the numbers using
> the pg_failover_slots module? (and compare the pg_failover_slots numbers with
> the "one worker" case here). The idea is to check whether the patch
> introduces some overhead as compared to pg_failover_slots.
>

Yes, definitely. We will work on that and share the numbers soon.

thanks
Shveta




Re: Fix an entry in wait_event_names.txt

2023-08-17 Thread Masahiro Ikeda

On 2023-08-17 14:49, Drouvot, Bertrand wrote:

Hi hackers,

While working on [1] it has been noticed by Masahiro-san that the
description field
in the new pg_wait_event view contains 2 blanks for one row.

It turns out that it comes from wait_event_names.txt (added in 
fa88928).


Attached is a tiny patch to fix this entry in wait_event_names.txt (I did
check that no other entries have the same issue).

[1]:
https://www.postgresql.org/message-id/735fbd560ae914c96faaa23cc8d9a118%40oss.nttdata.com

Regards,


+1. Thanks!

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-17 Thread Amit Kapila
On Wed, Aug 16, 2023 at 4:51 PM Zhijie Hou (Fujitsu)
 wrote:
>
> 4.
> +int num_slots_on_old_cluster;
>
> Instead of a new global variable, would it be better to record this in the 
> cluster info ?
>

I was thinking whether we can go a step further and remove this variable
altogether. In the old cluster handling, we can get and check it together
at the same place, and for the new cluster, a function that returns the
slot count by traversing the old cluster info should be sufficient. If you
have other better ideas to eliminate this variable, that is also fine. I
think this will make the patch a bit cleaner w.r.t. this new variable.

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2023-08-17 Thread Drouvot, Bertrand

Hi,

On 8/14/23 11:52 AM, shveta malik wrote:



We (myself and Ajin) performed the tests to compute the lag in standby
slots as compared to primary slots with different number of slot-sync
workers configured.



Thanks!


3 DBs were created, each with 30 tables and each table having one
logical-pub/sub configured. So this made a total of 90 logical
replication slots to be synced. Then the workload was run for aprox 10
mins. During this workload, at regular intervals, primary and standby
slots' lsns were captured (from pg_replication_slots) and compared. At
each capture, the intent was to know how much is each standby's slot
lagging behind corresponding primary's slot by taking the distance
between confirmed_flush_lsn of primary and standby slot. Then we took
the average (integer value) of this distance over the span of 10 min
workload 


Thanks for the explanations, make sense to me.
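
(As a reference for reproducing the measurement, the distance between two
captured confirmed_flush_lsn values can be computed with the built-in
pg_wal_lsn_diff(); the LSNs below are illustrative only:

SELECT slot_name, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical';
-- then, with one LSN captured on the primary and one on the standby:
SELECT pg_wal_lsn_diff('0/3000060'::pg_lsn, '0/3000000'::pg_lsn);  -- lag in bytes
)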


and this is what we got:

With max_slot_sync_workers=1, average-lag =  42290.3563
With max_slot_sync_workers=2, average-lag =  24585.1421
With max_slot_sync_workers=3, average-lag =  14964.9215

This shows that more workers have better chances to keep logical
replication slots in sync for this case.



Agree.


Another statistics if it interests you is, we ran a frequency test as
well (this by changing code, unit test sort of) to figure out the
'total number of times synchronization done' with different number of
sync-slots workers configured. Same 3 DBs setup with each DB having 30
logical replication slots. With 'max_slot_sync_workers' set at 1, 2
and 3; total number of times synchronization done was 15874, 20205 and
23414 respectively. Note: this is not on the same machine where we
captured lsn-gap data, it is on  a little less efficient machine but
gives almost the same picture

Next we are planning to capture this data for a lesser number of slots
like 10,30,50 etc. It may happen that the benefit of multi-workers
over single workers in such cases could be less, but let's have the
data to verify that.



Thanks a lot for those numbers and for the testing!

Do you think it would make sense to also get the numbers using
the pg_failover_slots module? (and compare the pg_failover_slots numbers with
the "one worker" case here). The idea is to check whether the patch
introduces some overhead as compared to pg_failover_slots.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com