Re: Fix pg_stat_reset_single_table_counters function

2023-08-20 Thread Masahiro Ikeda

On 2023-08-21 13:34, Michael Paquier wrote:

On Mon, Aug 21, 2023 at 11:33:28AM +0900, Kyotaro Horiguchi wrote:

I'm not sure about others, but I would avoid using the name
"current_database" for the variable.


Agreed.  Looking at the surroundings, we have for example "datname" in
the collation tests so I have just used that, and applied the patch
down to 15.


Thanks!

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Re: Fix pg_stat_reset_single_table_counters function

2023-08-20 Thread Michael Paquier
On Mon, Aug 21, 2023 at 11:33:28AM +0900, Kyotaro Horiguchi wrote:
> I'm not sure about others, but I would avoid using the name
> "current_database" for the variable.

Agreed.  Looking at the surroundings, we have for example "datname" in
the collation tests so I have just used that, and applied the patch
down to 15.
--
Michael




Re: Fix pg_stat_reset_single_table_counters function

2023-08-20 Thread Masahiro Ikeda

On 2023-08-21 11:33, Kyotaro Horiguchi wrote:

On 2023/08/15 14:13, Masahiro Ikeda wrote:

On 2023-08-15 11:48, Masahiko Sawada wrote:

+COMMENT ON DATABASE :current_database IS 'This is a test comment';
-- insert or update in 'pg_shdescription'

I think the current_database should be quoted (see other examples
where current_database() is used, e.g. collate.linux.utf8.sql). Also it
would be better to reset the comment after the test.




I'm not sure about others, but I would avoid using the name
"current_database" for the variable.

I would just use "database" or "db" instead.


Thanks! I agree with your comment.
I fixed it in the v5 patch.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

From 5980c041e33fd180e43868cbdc878ac0cf93112e Mon Sep 17 00:00:00 2001
From: Masahiro Ikeda 
Date: Mon, 21 Aug 2023 13:24:27 +0900
Subject: [PATCH] Fix pg_stat_reset_single_table_counters function.

This commit restores the ability to reset to zero the statistics for a
single relation shared across all databases in the cluster, which was
implemented by the following commit:
* Enhance pg_stat_reset_single_table_counters function (e04267844)

The feature was accidentally removed by the following commit:
* pgstat: store statistics in shared memory (5891c7a8e)

This needs to be backpatched down to 15.

Reported-by: Mitsuru Hinata
---
 src/backend/utils/adt/pgstatfuncs.c |  9 +--
 src/test/regress/expected/stats.out | 42 +
 src/test/regress/sql/stats.sql  | 26 ++
 3 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 2a4c8ef87f..2b9742ad21 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -17,6 +17,7 @@
 #include "access/htup_details.h"
 #include "access/xlog.h"
 #include "access/xlogprefetcher.h"
+#include "catalog/catalog.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_type.h"
 #include "common/ip.h"
@@ -1776,13 +1777,17 @@ pg_stat_reset_shared(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
-/* Reset a single counter in the current database */
+/*
+ * Reset statistics for a single object, which may belong to the current
+ * database or be shared across all databases in the cluster.
+ */
 Datum
 pg_stat_reset_single_table_counters(PG_FUNCTION_ARGS)
 {
 	Oid			taboid = PG_GETARG_OID(0);
+	Oid			dboid = (IsSharedRelation(taboid) ? InvalidOid : MyDatabaseId);
 
-	pgstat_reset(PGSTAT_KIND_RELATION, MyDatabaseId, taboid);
+	pgstat_reset(PGSTAT_KIND_RELATION, dboid, taboid);
 
 	PG_RETURN_VOID();
 }
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 319164a5e9..9985579e18 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -764,6 +764,48 @@ FROM pg_stat_all_tables WHERE relid = 'test_last_scan'::regclass;
 2 | t  |3 | t
 (1 row)
 
+-
+-- Test to reset stats for a table shared across all databases (ex. pg_shdescription)
+-
+-- store the old comment to reset
+SELECT shobj_description(d.oid, 'pg_database') as description_before
+FROM pg_database d WHERE datname = current_database() \gset
+-- update the stats in pg_shdescription
+BEGIN;
+SELECT current_database() as db \gset
+COMMENT ON DATABASE :"db" IS 'This is a test comment';
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--
+ 
+(1 row)
+
+COMMIT;
+-- check to reset the stats
+SELECT n_tup_ins + n_tup_upd > 0 FROM pg_stat_all_tables WHERE relid = 'pg_shdescription'::regclass;
+ ?column? 
+--
+ t
+(1 row)
+
+SELECT pg_stat_reset_single_table_counters('pg_shdescription'::regclass);
+ pg_stat_reset_single_table_counters 
+-
+ 
+(1 row)
+
+SELECT n_tup_ins + n_tup_upd FROM pg_stat_all_tables WHERE relid = 'pg_shdescription'::regclass;
+ ?column? 
+--
+0
+(1 row)
+
+-- cleanup the comment
+\if :{?description_before}
+  COMMENT ON DATABASE :"db" IS :'description_before';
+\else
+  COMMENT ON DATABASE :"db" IS NULL;
+\endif
 -
 -- Test that various stats views are being properly populated
 -
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 9a16df1c49..5393b18faa 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -376,6 +376,32 @@ COMMIT;
 SELECT seq_scan, :'test_last_seq' = last_seq_scan AS seq_ok, idx_scan, :'test_last_idx' < last_idx_scan AS idx_ok
 FROM pg_stat_all_tables WHERE relid = 'test_last_scan'::regclass;
 
+-
+-- Test to reset stats for a table shared across all databases (ex. pg_shdescription)
+-
+
+-- store the old comment to reset
+SELECT shobj_description(d.oid, 'pg_database') as description_before
+FROM pg_database d WHERE datname = current_database() \gset
+
+-- update the stats in pg_shdescription
+BEGIN;
+SELECT current_database() as db \gset
+COMMENT ON DATABASE :"db" IS 'This is a test comment';
+SELECT pg_stat_force_next_flush();

Re: Make --help output fit within 80 columns per line

2023-08-20 Thread Masahiro Ikeda

On 2023-07-05 10:47, torikoshia wrote:

Hi,

As discussed in [1], the --help output of some commands fits into 80
columns per line, while that of others does not.

Since it seems preferable to have a consistent line-break policy and some
people use 80-column terminals, wouldn't it be better to make all commands
fit within 80 columns per line?

Attached is a patch which does this for the src/bin commands.

If this is the way to go, I'll do the same for the contrib commands.

[1] https://www.postgresql.org/message-id/3fe4af5a0a81fc6a2ec01cb484c0a487%40oss.nttdata.com


Thanks for making the patches! I have some comments on the v1 patch.

(1)

Why don't you add a test for this purpose? It could be overkill...
I thought the following function would be the best place.

diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 617caa022f..1bdb81ac56 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -843,6 +843,10 @@ sub program_help_ok
ok($result, "$cmd --help exit code 0");
isnt($stdout, '', "$cmd --help goes to stdout");
is($stderr, '', "$cmd --help nothing to stderr");
+   foreach my $line (split /\n/, $stdout)
+   {
+   ok(length($line) <= 80, "$cmd --help output fit within 80 columns per line");
+   }
return;
 }

(2)

Is there any reason that only src/bin commands are targeted? I found that
we also need to fix vacuumlo with the above test. I think it's better to
fix it because it's a contrib module.

$ vacuumlo --help | while IFS='' read line; do echo $((`echo $line | wc -m` - 1)) $line; done | sort -n -r | head -n 2
84   -n, --dry-run don't remove large objects, just show what would be done
74   -l, --limit=LIMIT commit after removing each LIMIT large objects


(3)

Is deleting '/mnt/server' intended?  I thought it better to leave it as
is, since the archive_cleanup_command example uses an absolute path.

-			 "  pg_archivecleanup /mnt/server/archiverdir 00010010.0020.backup\n"));
+			 "  pg_archivecleanup archiverdir 00010010.0020.backup\n"));


I would like to confirm that the --help text itself is not changed and only
the line breaks are changed.  But currently, the above change breaks that.

(4)

I found that some binaries, for example ecpg, are not tested with
program_help_ok(). Would it be better to add tests for them in this patch?

BTW, I checked the difference with the following commands:

# files that include "--help"
$ find -name "*.c" | xargs -I {} sh -c 'if [ `grep -e --help {} | wc -l` -gt 0 ]; then echo {}; fi'

# programs which are tested with program_help_ok
$ find -name "*.pl" | xargs -I {} sh -c 'grep -e program_help_ok {}'

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION




Support run-time partition pruning for hash join

2023-08-20 Thread Richard Guo
If we have a hash join with an Append node on the outer side, something
like

 Hash Join
   Hash Cond: (pt.a = t.a)
   ->  Append
 ->  Seq Scan on pt_p1 pt_1
 ->  Seq Scan on pt_p2 pt_2
 ->  Seq Scan on pt_p3 pt_3
   ->  Hash
 ->  Seq Scan on t

We can actually prune those subnodes of the Append that cannot possibly
contain any matching tuples from the other side of the join.  To do
that, when building the Hash table, for each row from the inner side we
can compute the minimum set of subnodes that can possibly match the join
condition.  By the time the Hash table has been built and we start to
execute the Append node, we know which subnodes survive and can thus
skip the others.

This kind of partition pruning can be extended to happen across multiple
join levels.  For instance,

 Hash Join
   Hash Cond: (pt.a = t2.a)
   ->  Hash Join
 Hash Cond: (pt.a = t1.a)
 ->  Append
   ->  Seq Scan on pt_p1 pt_1
   ->  Seq Scan on pt_p2 pt_2
   ->  Seq Scan on pt_p3 pt_3
 ->  Hash
   ->  Seq Scan on t1
   ->  Hash
 ->  Seq Scan on t2

We can compute the matching subnodes of the Append when building the Hash
table for 't1' according to the join condition 'pt.a = t1.a', and again
when building the Hash table for 't2' according to the join condition
'pt.a = t2.a'; the final surviving subnodes are the intersection of the two.

Greenplum [1] has implemented this kind of partition pruning as
'Partition Selector'.  Attached is a patch that refactors Greenplum's
implementation to make it work on PostgreSQL master.  Here are some
details about the patch.

During planning:

1. When creating a hash join plan in create_hashjoin_plan() we first
   collect information required to build PartitionPruneInfos at this
   join, which includes the join's RestrictInfos and the join's inner
   relids, and put this information in a stack.

2. When we call create_append_plan() for an appendrel, for each of the
   joins we check whether join partition pruning can take place
   for this appendrel, based on the information collected at that join,
   and if so build a PartitionPruneInfo and add it to the stack entry.

3. After finishing the outer side of the hash join, we should have built
   all the PartitionPruneInfos that can be used to perform join
   partition pruning at this join.  So we pop out the stack entry to get
   the PartitionPruneInfos and add them to Hash node.

During execution:

When building the hash table for a hash join, we perform the partition
pruning for each row according to each of the JoinPartitionPruneStates
at this join, and store each result in a special executor parameter to
make it available to Append nodes.  When executing an Append node, we
can directly use the pre-computed pruning results to skip subnodes that
cannot contain any matching rows.

Here is a query that shows the effect of the join partition pruning.

CREATE TABLE pt (a int, b int, c varchar) PARTITION BY RANGE(a);
CREATE TABLE pt_p1 PARTITION OF pt FOR VALUES FROM (0) TO (250);
CREATE TABLE pt_p2 PARTITION OF pt FOR VALUES FROM (250) TO (500);
CREATE TABLE pt_p3 PARTITION OF pt FOR VALUES FROM (500) TO (600);
INSERT INTO pt SELECT i, i % 25, to_char(i, 'FM') FROM
generate_series(0, 599) i WHERE i % 2 = 0;

CREATE TABLE t1 (a int, b int);
INSERT INTO t1 values (10, 10);

CREATE TABLE t2 (a int, b int);
INSERT INTO t2 values (300, 300);

ANALYZE pt, t1, t2;

SET enable_nestloop TO off;

explain (analyze, costs off, summary off, timing off)
select * from pt join t1 on pt.a = t1.a right join t2 on pt.a = t2.a;
 QUERY PLAN

 Hash Right Join (actual rows=1 loops=1)
   Hash Cond: (pt.a = t2.a)
   ->  Hash Join (actual rows=0 loops=1)
 Hash Cond: (pt.a = t1.a)
 ->  Append (actual rows=0 loops=1)
   ->  Seq Scan on pt_p1 pt_1 (never executed)
   ->  Seq Scan on pt_p2 pt_2 (never executed)
   ->  Seq Scan on pt_p3 pt_3 (never executed)
 ->  Hash (actual rows=1 loops=1)
   Buckets: 1024  Batches: 1  Memory Usage: 9kB
   ->  Seq Scan on t1 (actual rows=1 loops=1)
   ->  Hash (actual rows=1 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 9kB
 ->  Seq Scan on t2 (actual rows=1 loops=1)
(14 rows)


There are several points that need more consideration.

1. All the join partition pruning decisions are made in createplan.c
   where the best path tree has been decided.  This is not great.  Maybe
   it's better to make it happen when we build up the path tree, so that
   we can take the partition pruning into consideration when estimating
   the costs.

2. In order to make the join partition pruning take effect, the patch
   hacks the empty-outer optimization in ExecHashJoinImpl().  Not sure
   if this is a good practice.

3. This patch does not support parallel 

Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-20 Thread Amit Kapila
On Sun, Aug 20, 2023 at 6:49 PM Masahiko Sawada  wrote:
>
> On Thu, Aug 17, 2023 at 10:31 PM Amit Kapila  wrote:
> >
> > >
> > > Sorry I was not clear. I meant the logical replication slots that are
> > > *not* used by logical replication, i.e., are created manually and used
> > > by third party tools that periodically consume decoded changes. As we
> > > discussed before, these slots will never be able to pass that
> > > confirmed_flush_lsn check.
> > >
> >
> > I think normally one would have a background process to periodically
> > consume changes. Can't one use the walsender infrastructure for
> > their plugins to consume changes, probably by using the replication
> > protocol?
>
> Not sure.
>

I think one can use Streaming Replication Protocol to achieve it [1].

> > Also, I feel it is the plugin author's responsibility to
> > consume changes or advance slot to the required position before
> > shutdown.
>
> How does the plugin author ensure that the slot consumes all WAL
> records including shutdown_checkpoint before shutdown?
>

By using "Streaming Replication Protocol" so that walsender can take
care of it. If not, I think users should drop such slots before the
upgrade because anyway, they won't be usable after the upgrade.
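
For reference, a minimal SQL sketch of that pre-upgrade cleanup (the slot
name here is only illustrative):

-- list logical slots on the old cluster, then drop the ones that will
-- not be usable after the upgrade
SELECT slot_name, plugin, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical';

SELECT pg_drop_replication_slot('my_manual_slot');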

> >
> > > After some thoughts, one thing we might
> > > need to consider is that in practice, the upgrade project is performed
> > > during the maintenance window and has a backup plan that reverts the
> > > upgrade process, in case something bad happens. If we require the
> > > users to drop such logical replication slots, they cannot resume
> > > using the old cluster in that case, since they would need to create new
> > > slots, missing some changes.
> > >
> >
> > Can't one keep the backup before removing slots?
>
> Yes, but restoring the backup could take time.
>
> >
> > > Other checks in pg_upgrade seem to be
> > > compatibility checks that would eventually be required for the upgrade
> > > anyway. Do we need to consider this case? For example, we do that
> > > confirmed_flush_lsn check for only the slots with pgoutput plugin.
> > >
> >
> > I think one is allowed to use pgoutput plugin even for manually
> > created slots. So, such a check may not work.
>
> Right, but I thought it's a very rare case.
>

Okay, but not sure that we can ignore it.

> Since the slot's confirmed_flush_lsn check is not a compatibility
> check unlike the existing check, I wonder if we can make it optional.
>

There are arguments both ways. Initially, the patch proposed to make
them optional by having an option like
--include-logical-replication-slots but Jonathan raised a point that
it will be more work for users and should be the default. Then we also
discussed having an option like --exclude-logical-replication-slots
but as we don't have any other similar option, it doesn't seem natural
to add such an option. Also, I am afraid, if there is no user of such
an option, it won't be worth it. BTW, how would you like to see it made
optional (via an --include or an --exclude switch)?

Personally, I am okay to make it optional if we have a broader
consensus. My preference would be to have an --exclude kind of option.
How about first getting the main patch reviewed and committed, then
based on consensus, we can decide whether to make it optional and if
so, what is the preferred way?

[1] - https://www.postgresql.org/docs/current/protocol-replication.html

-- 
With Regards,
Amit Kapila.




Re: Extract numeric filed in JSONB more effectively

2023-08-20 Thread Chapman Flack

On 2023-08-20 21:31, Andy Fan wrote:

Highlighting the use case of makeRelabelType is interesting! But using
the Oid directly looks more promising for this question IMO; it looks
like: "you said we can put anything in this arg, so I put an OID const
here", and nothing seems wrong.


Perhaps one of the more senior developers will chime in, but to me,
leaving out the relabel nodes looks more like "all of PostgreSQL's
type checking happened before the SupportRequestSimplify, so nothing
has noticed that we rewrote the tree with mismatched types, and as
long as nothing crashes we sort of got away with it."

Suppose somebody writes an extension to double-check that plan
trees are correctly typed. Or improves EXPLAIN to check a little more
carefully than it seems to. Omitting the relabel nodes could spell
trouble then.

Or, someone more familiar with the code than I am might say "oh,
mismatches like that are common in rewritten trees, we live with it."
But unless somebody tells me that, I'm not believing it.

But I would say we have proved the concept of SupportRequestSimplify
for this task. :)

Now, it would make me happy to further reduce some of the code
duplication between the original and the _type versions of these
functions. I see that you noticed the duplication in the case of
jsonb_extract_path, and you factored out jsonb_get_jsonbvalue so
it could be reused. There is also some duplication with object_field
and array_element. (Also, we may have overlooked jsonb_path_query
and jsonb_path_query_first as candidates for the source of the
cast. Two more candidates; five total.)

Here is one way this could be structured. Observe that every one
of those five source candidates operates in two stages:

Start: All of the processing until a JsonbValue has been obtained.
Finish: Converting the JsonbValue to some form for return.

Before this patch, there were two choices for Finish:
JsonbValueToJsonb or JsonbValueAsText.

With this patch, there are four Finish choices: those two, plus
PG_RETURN_BOOL(v->val.boolean), PG_RETURN_NUMERIC(v->val.numeric).

Clearly, with rewriting, we can avoid 5✕4 = 20 distinct
functions. The five candidate functions only differ in Start.
Suppose each of those had a _start version, like
jsonb_object_field_start, that only proceeds as far as
obtaining the JsonbValue, and returns that directly (an
'internal' return type). Naturally, each _start function would
need an 'internal' parameter also, even if it isn't used,
just to make sure it is not SQL-callable.

Now consider four Finish functions: jsonb_finish_jsonb,
jsonb_finish_text, jsonb_finish_boolean, jsonb_finish_numeric.

Each would have one 'internal' parameter (a JsonbValue), and
its return type declared normally. There is no need to pass
a type oid to any of these, and they need not contain any
switch to select a return type. The correct finisher to use
is simply chosen once at the time of rewriting.

So cast(jsonb_array_element(jb, 0) as numeric) would just get
rewritten as jsonb_finish_numeric(jsonb_array_element_start(jb,0)).

The other (int and float) types don't need new code; just have
the rewriter add a cast-from-numeric node on top. That's all
those other switch cases in cast_jsonbvalue_to_type are doing,
anyway.

Notice in this structure, less relabeling is needed. The
final return does not need relabeling, because each finish
function has the expected return type. Each finish function's
parameter is typed 'internal' (a JsonbValue), but that's just
what each start function returns, so no relabeling needed
there either.

The rewriter will have to supply some 'internal' constant
as a start-function parameter (because of the necessary
'internal' parameter). It might still be civilized to relabel
that.
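
To make the proposal concrete, the user-level expression being targeted
looks like this (a sketch; the query runs today as-is, and under the
proposal the planner support function would rewrite the cast into
jsonb_finish_numeric(jsonb_array_element_start(jb, 0)), using the
hypothetical _start/_finish functions described above):

SELECT (jb -> 0)::numeric AS price
FROM (VALUES ('[42.5, 7]'::jsonb)) AS t(jb);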

Regards,
-Chap




Re: Fix pg_stat_reset_single_table_counters function

2023-08-20 Thread Kyotaro Horiguchi
Apologies. In the previous mail, I mistakenly addressed it to the wrong 
recipients. Reposted.


On 2023/08/15 14:13, Masahiro Ikeda wrote:

On 2023-08-15 11:48, Masahiko Sawada wrote:

+COMMENT ON DATABASE :current_database IS 'This is a test comment';
-- insert or update in 'pg_shdescription'

I think the current_database should be quoted (see other examples
where current_database() is used, e.g. collate.linux.utf8.sql). Also it
would be better to reset the comment after the test.


Thanks! I fixed the issues in the v4 patch.


I'm not sure about others, but I would avoid using the name 
"current_database" for the variable.


I would just use "database" or "db" instead.

--
Kyotaro Horiguchi
NTT Open Source Software Center





Re: Rethink the wait event names for postgres_fdw, dblink and etc

2023-08-20 Thread Masahiro Ikeda

Hi,

Thanks for your comments.

I updated the patch to v2.
* Update a comment instead of writing documentation about
  the wait events for pg_prewarm.
* Use different wait event names for different code paths.
  Added DblinkGetConnect and PgPrewarmDumpShutdown.
* Make variable names shorter, like pgfdw_we_receive.
* Update the documentation.
* Add some tests using the pg_wait_events view.
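
As a quick sanity check from the monitoring side, the renamed events can
also be observed in pg_stat_activity while a backend is waiting (a small
sketch; the event names come from the attached patch and show up under the
'Extension' wait event type):

SELECT pid, wait_event_type, wait_event
FROM pg_stat_activity
WHERE wait_event_type = 'Extension'
  AND wait_event LIKE 'Dblink%';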

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

From 025525eae164e33117d94f2a90a4219808072b3c Mon Sep 17 00:00:00 2001
From: Masahiro Ikeda 
Date: Mon, 21 Aug 2023 10:36:10 +0900
Subject: [PATCH] Make to use custom wait events for modules

---
 contrib/dblink/dblink.c   | 16 +++-
 contrib/dblink/expected/dblink.out|  9 +
 contrib/dblink/sql/dblink.sql |  4 ++
 contrib/pg_prewarm/autoprewarm.c  | 22 ++-
 contrib/postgres_fdw/connection.c | 23 +--
 .../postgres_fdw/expected/postgres_fdw.out| 12 ++
 contrib/postgres_fdw/sql/postgres_fdw.sql |  7 
 doc/src/sgml/dblink.sgml  | 28 +
 doc/src/sgml/postgres-fdw.sgml| 39 +++
 doc/src/sgml/xfunc.sgml   |  6 +--
 src/test/modules/test_shm_mq/setup.c  |  9 -
 src/test/modules/test_shm_mq/test.c   |  9 -
 .../modules/worker_spi/t/001_worker_spi.pl| 12 +++---
 src/test/modules/worker_spi/worker_spi.c  |  2 +-
 14 files changed, 179 insertions(+), 19 deletions(-)

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 41e1f6c91d..104d6e9f37 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -130,6 +130,10 @@ static void restoreLocalGucs(int nestlevel);
 static remoteConn *pconn = NULL;
 static HTAB *remoteConnHash = NULL;
 
+/* value cached, fetched from shared memory */
+static uint32 dblink_we_get_conn = 0;
+static uint32 dblink_we_conn = 0;
+
 /*
  *	Following is list that holds multiple remote connections.
  *	Calling convention of each dblink function changes to accept
@@ -202,8 +206,12 @@ dblink_get_conn(char *conname_or_str,
 			connstr = conname_or_str;
 		dblink_connstr_check(connstr);
 
+		/* first time, allocate or get the custom wait event */
+		if (dblink_we_get_conn == 0)
+			dblink_we_get_conn = WaitEventExtensionNew("DblinkGetConnect");
+
 		/* OK to make connection */
-		conn = libpqsrv_connect(connstr, WAIT_EVENT_EXTENSION);
+		conn = libpqsrv_connect(connstr, dblink_we_get_conn);
 
 		if (PQstatus(conn) == CONNECTION_BAD)
 		{
@@ -292,8 +300,12 @@ dblink_connect(PG_FUNCTION_ARGS)
 	/* check password in connection string if not superuser */
 	dblink_connstr_check(connstr);
 
+	/* first time, allocate or get the custom wait event */
+	if (dblink_we_conn == 0)
+		dblink_we_conn = WaitEventExtensionNew("DblinkConnect");
+
 	/* OK to make connection */
-	conn = libpqsrv_connect(connstr, WAIT_EVENT_EXTENSION);
+	conn = libpqsrv_connect(connstr, dblink_we_conn);
 
 	if (PQstatus(conn) == CONNECTION_BAD)
 	{
diff --git a/contrib/dblink/expected/dblink.out b/contrib/dblink/expected/dblink.out
index 7809f58d96..c17c7b1361 100644
--- a/contrib/dblink/expected/dblink.out
+++ b/contrib/dblink/expected/dblink.out
@@ -1209,6 +1209,15 @@ SHOW intervalstyle;
  postgres
 (1 row)
 
+-- Check custom wait events are registered
+SELECT name FROM pg_wait_events WHERE name ~ '^Dblink'
+ORDER BY name COLLATE "C";
+   name   
+--
+ DblinkConnect
+ DblinkGetConnect
+(2 rows)
+
 -- Clean up GUC-setting tests
 SELECT dblink_disconnect('myconn');
  dblink_disconnect 
diff --git a/contrib/dblink/sql/dblink.sql b/contrib/dblink/sql/dblink.sql
index 7870ce5d5a..519b3f5266 100644
--- a/contrib/dblink/sql/dblink.sql
+++ b/contrib/dblink/sql/dblink.sql
@@ -627,6 +627,10 @@ FROM dblink_fetch('myconn','error_cursor', 1) AS t(i int);
 SHOW datestyle;
 SHOW intervalstyle;
 
+-- Check custom wait events are registered
+SELECT name FROM pg_wait_events WHERE name ~ '^Dblink'
+ORDER BY name COLLATE "C";
+
 -- Clean up GUC-setting tests
 SELECT dblink_disconnect('myconn');
 RESET datestyle;
diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index d0efc9e524..2160d40705 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -105,6 +105,16 @@ static AutoPrewarmSharedState *apw_state = NULL;
 static bool autoprewarm = true; /* start worker? */
 static int	autoprewarm_interval = 300; /* dump interval */
 
+/*
+ * Cached custom wait events, fetched from shared memory.
+ *
+ * Note that they are not reported on pg_stat_activity. pgstat_bestart() isn't
+ * called because the autoprewarm worker isn't an auxiliary process and it
+ * doesn't connect to a database.
+ */
+static uint32 autoprewarm_we_shutdown = 0;
+static uint32 autoprewarm_we_delay = 0;
+
 /*
  * Module load callback.
  */
@@ -233,11 +243,15 @@ autoprewarm_main(Datum main_arg)
 
 		if (autoprewarm_interval <= 0)
 		{
+	

Re: Extract numeric filed in JSONB more effectively

2023-08-20 Thread Andy Fan
On Sat, Aug 19, 2023 at 3:09 AM Chapman Flack  wrote:

> On 2023-08-18 14:50, Chapman Flack wrote:
> > Now, my guess is EXPLAIN is complaining when it sees the Const
> > of type internal, and doesn't know how to show that value.
> > Perhaps makeRelabelType is the answer there, too: what if the
> > Const has Oid type, so EXPLAIN can show it, and what's inserted
> > as the function argument is a relabel node saying it's internal?


> Simply changing the Const to be of type Oid makes EXPLAIN happy,
> and nothing ever says "hey, why are you passing this oid for an
> arg that wants internal?". This is without adding any relabel
> nodes anywhere.


Highlighting the use case of makeRelabelType is interesting! But using
the Oid directly looks more promising for this question IMO; it looks like:
"you said we can put anything in this arg, so I put an OID const here",
and nothing seems wrong.  Compared with the makeRelabelType method,
I think the current method is more straightforward.  Compared with
anyelement, it avoids the creation of makeDummyConst, whose implementation
I'm not sure is always correct.  So I am pretty inclined to go this way!

v10 attached.

-- 
Best Regards
Andy Fan


v10-0001-optimize-casting-jsonb-to-a-given-type.patch
Description: Binary data


v10-0002-convert-anyelement-to-internal.patch
Description: Binary data


Re: PostgreSQL 16 RC1 + GA release dates

2023-08-20 Thread Jonathan S. Katz

On 8/20/23 10:50 AM, Tom Lane wrote:

"Jonathan S. Katz"  writes:

The date for PostgreSQL 16 Release Candidate 1 (RC1) is August 31, 2023.
Please ensure all open items[1] are completed and committed before
August 26, 2023 12:00 UTC.


FYI, I moved the "Oversight in reparameterize_path_by_child leading to
executor crash" open item to the "Older bugs affecting stable branches"
section, because it is in fact an old bug: the given test case crashes
in every branch that has enable_partitionwise_join.  I'll still look
at getting in the fix before RC1, but we should understand what we're
dealing with.


[RMT hat]

Thanks -- appreciative of the accurate record keeping.

Jonathan





Re: Adding a LogicalRepWorker type field

2023-08-20 Thread Peter Smith
On Fri, Aug 18, 2023 at 6:16 PM Zhijie Hou (Fujitsu)
 wrote:
>
> On Friday, August 18, 2023 11:20 AM Amit Kapila  
> wrote:
> >
> > On Mon, Aug 14, 2023 at 12:08 PM Peter Smith 
> > wrote:
> > >
> > > The main patch for adding the worker type enum has been pushed [1].
> > >
> > > Here is the remaining (rebased) patch for changing some previous
> > > cascading if/else to switch on the LogicalRepWorkerType enum instead.
> > >
> >
> > I see this as being useful if we plan to add more worker types. Does anyone 
> > else
> > see this remaining patch as an improvement?
>
> +1
>
> I have one comment for the new error message.
>
> +   case WORKERTYPE_UNKNOWN:
> +   ereport(ERROR, 
> errmsg_internal("should_apply_changes_for_rel: Unknown worker type"));
>
> I think reporting an ERROR in this case is fine. However, I would suggest
> refraining from mentioning the function name in the error message, as
> recommended in the error style document [1]. Also, it appears we could use
> elog() here.
>
> [1] https://www.postgresql.org/docs/devel/error-style-guide.html

OK. Modified as suggested. Anyway, getting these errors should not
even be possible, so I added a comment to emphasise that.

PSA v9

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v9-0001-Switch-on-worker-type.patch
Description: Binary data


Re: Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue

2023-08-20 Thread Michael Paquier
On Fri, Aug 18, 2023 at 08:49:16AM +0900, Michael Paquier wrote:
> After sleeping on it, I think that I'd just agree with Robert's point
> to just use the same language as the message, while also agreeing with
> the patch to not set MyClientConnectionInfo.authn_id in the uaTrust
> case, only logging something under log_connections.
> 
> +* No authentication was actually performed; this happens e.g. when 
> the
> +* trust method is in use.
> 
> This comment should be reworded a bit, say "No authentication identity
> was set; blah ..".

Attached is a v3 to do these two things, with adjustments for two SSL
tests.  Any objections to it?

(Note: no backpatch)
--
Michael
From 13b4c6fe0388bb4eec1d8aab6a25192ff7ad0684 Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Wed, 16 Aug 2023 14:48:49 -0700
Subject: [PATCH v3] log_connections: add entries for "trust" connections

Allow DBAs to audit unexpected "trust" connections.  Previously, you had
to notice the absence of a "connection authenticated" line between the
"connection received" and "connection authorized" entries.  Now it's
called out explicitly.
---
 src/backend/libpq/auth.c  | 15 +++
 src/test/authentication/t/001_password.pl |  8 ++--
 src/test/ssl/t/001_ssltests.pl|  4 ++--
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 315a24bb3f..1a084d21de 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -645,6 +645,21 @@ ClientAuthentication(Port *port)
 #endif
 	}
 
+	if (Log_connections && (status == STATUS_OK) &&
+		!MyClientConnectionInfo.authn_id)
+	{
+		/*
+		 * No authentication identity was set; this happens e.g. when the
+		 * trust method is in use.  For audit purposes, log a breadcrumb to
+		 * explain where in the HBA this happened.
+		 */
+		ereport(LOG,
+errmsg("connection authenticated: user=\"%s\" method=%s "
+	   "(%s:%d)",
+	   port->user_name, hba_authname(port->hba->auth_method),
+	   port->hba->sourcefile, port->hba->linenumber));
+	}
+
 	if (ClientAuthentication_hook)
 		(*ClientAuthentication_hook) (port, status);
 
diff --git a/src/test/authentication/t/001_password.pl b/src/test/authentication/t/001_password.pl
index 12552837a8..4402fa7721 100644
--- a/src/test/authentication/t/001_password.pl
+++ b/src/test/authentication/t/001_password.pl
@@ -140,9 +140,13 @@ $node->safe_psql('postgres', "CREATE database regex_testdb;");
 # considered to be authenticated.
 reset_pg_hba($node, 'all', 'all', 'trust');
 test_conn($node, 'user=scram_role', 'trust', 0,
-	log_unlike => [qr/connection authenticated:/]);
+	log_like => [
+		qr/connection authenticated: user="scram_role" method=trust/
+	]);
 test_conn($node, 'user=md5_role', 'trust', 0,
-	log_unlike => [qr/connection authenticated:/]);
+	log_like => [
+		qr/connection authenticated: user="md5_role" method=trust/
+	]);
 
 # SYSTEM_USER is null when not authenticated.
 $res = $node->safe_psql('postgres', "SELECT SYSTEM_USER IS NULL;");
diff --git a/src/test/ssl/t/001_ssltests.pl b/src/test/ssl/t/001_ssltests.pl
index 76442de063..81e36a4e88 100644
--- a/src/test/ssl/t/001_ssltests.pl
+++ b/src/test/ssl/t/001_ssltests.pl
@@ -801,7 +801,7 @@ $node->connect_ok(
 	  . sslkey('client.key'),
 	"auth_option clientcert=verify-full succeeds with matching username and Common Name",
 	# verify-full does not provide authentication
-	log_unlike => [qr/connection authenticated:/],);
+	log_like => [qr/connection authenticated: user="ssltestuser" method=trust/],);
 
 $node->connect_fails(
 	"$common_connstr user=anotheruser sslcert=ssl/client.crt "
@@ -819,7 +819,7 @@ $node->connect_ok(
 	  . sslkey('client.key'),
 	"auth_option clientcert=verify-ca succeeds with mismatching username and Common Name",
 	# verify-full does not provide authentication
-	log_unlike => [qr/connection authenticated:/],);
+	log_like => [qr/connection authenticated: user="yetanotheruser" method=trust/],);
 
 # intermediate client_ca.crt is provided by client, and isn't in server's ssl_ca_file
 switch_server_cert($node, certfile => 'server-cn-only', cafile => 'root_ca');
-- 
2.40.1





Re: POC, WIP: OR-clause support for indexes

2023-08-20 Thread Peter Geoghegan
On Thu, Aug 17, 2023 at 3:08 AM a.rybakina  wrote:
> But now I see an interesting transformation, which was the most interesting 
> for me.
>
> EXPLAIN (COSTS OFF) SELECT * FROM tenk1 WHERE thousand = 42 AND (tenthous = 1 
> OR tenthous = 3 OR tenthous = 42);

It would be even more interesting if it could be an index-only scan as
a result of the transformation. For example, we could use an
index-only scan with this query (once your patch was in place):

"SELECT thousand, tenthous FROM tenk1 WHERE thousand = 42 AND
(tenthous = 1 OR tenthous = 3 OR tenthous = 42)"

Index-only scans were the original motivation for adding native
ScalarArrayOpExpr support to nbtree (in 2011 commit 9e8da0f7), in
fact.
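
For illustration, the ScalarArrayOpExpr (= ANY) form that such a
transformation aims to produce would look roughly like this (a sketch;
the exact rewritten form depends on the patch):

SELECT thousand, tenthous FROM tenk1
WHERE thousand = 42 AND tenthous = ANY (ARRAY[1, 3, 42]);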

As I suggested earlier, I suspect that there is too much planner logic
that targets BitmapOrs specifically -- maybe even selectivity
estimation/restrictinfo stuff.

PS I wonder if the correctness issues that you saw could be related to
eval_const_expressions(), since "the planner assumes that this
[eval_const_expressions] will always flatten nested AND and OR clauses
into N-argument form". See its subroutines simplify_or_arguments() and
simplify_and_arguments().

-- 
Peter Geoghegan




Re: POC, WIP: OR-clause support for indexes

2023-08-20 Thread Peter Geoghegan
On Sun, Aug 20, 2023 at 3:11 PM Peter Geoghegan  wrote:
> Back in 2003, commit 9888192f removed (or at least simplified) what
> were then called "CNF/DNF CONVERSION ROUTINES". Prior to that point
> the optimizer README had something about leaving clause lists
> un-normalized leading to selectivity estimation problems. Bear in mind
> that this is a couple of years before ScalarArrayOpExpr was first
> invented. Apparently even back then "The OR-of-ANDs format is useful
> for indexscan implementation". It's possible that that old work will
> offer some hints on what to do now.

There was actually support for OR lists in index AMs prior to
ScalarArrayOpExpr. Even though ScalarArrayOpExpr don't really seem all
that related to bitmap scans these days (since at least nbtree knows
how to execute them "natively"), that wasn't always the case.
ScalarArrayOpExpr were invented the same year that bitmap index scans
were first added (2005), and seem more or less related to that work.
See commits bc843d39, 5b051852, 1e9a6ba5, and 290166f9 (all from
2005). Particularly the last one, which has a commit message that
heavily suggests that my interpretation is correct.

I think that we currently over-rely on BitmapOr for OR clauses. It's
useful that they're so general, of course, but ISTM that we shouldn't
even try to use a BitmapOr in simple cases. Things like the "WHERE
thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42)"
tenk1 query that you brought up probably shouldn't even have a
BitmapOr path (which I guess they don't with your patch). Note that I
recently discussed the same query at length with Tomas Vondra on the
ongoing thread for his index filter patch (you probably knew that
already).

-- 
Peter Geoghegan




Re: POC, WIP: OR-clause support for indexes

2023-08-20 Thread Peter Geoghegan
On Thu, Aug 17, 2023 at 3:08 AM a.rybakina  wrote:
> This is an interesting feature. I didn't notice this function before; I have
> studied consider_new_or_clause, which is called there, many times. As far as I
> know, there is a selectivity calculation going on there, but as far as I
> remember, I called it earlier, after my conversion, and unfortunately it
> didn't solve my problem with calculating selectivity. I'll reconsider it
> again; maybe I can find something I missed.

Back in 2003, commit 9888192f removed (or at least simplified) what
were then called "CNF/DNF CONVERSION ROUTINES". Prior to that point
the optimizer README had something about leaving clause lists
un-normalized leading to selectivity estimation problems. Bear in mind
that this is a couple of years before ScalarArrayOpExpr was first
invented. Apparently even back then "The OR-of-ANDs format is useful
for indexscan implementation". It's possible that that old work will
offer some hints on what to do now.

In a way it's not surprising that work in this area would have some
impact on selectivies. The surprising part is the extent of the
problem, I suppose.

I see that a lot of the things in this area are just used by BitmapOr
clauses, such as build_paths_for_OR() -- but you're not necessarily
able to use any of that stuff. Also, choose_bitmap_and() has some
stuff about how it compensates to avoid "too-small selectivity that
makes a redundant AND step look like it reduces the total cost". It
also mentions some problems with match_join_clauses_to_index() +
extract_restriction_or_clauses(). Again, this might be a good place to
look for more clues.

-- 
Peter Geoghegan




Re: UUID v7

2023-08-20 Thread Andrey M. Borodin


> On 30 Jul 2023, at 13:08, Andrey M. Borodin  wrote:
> 
> 
> After discussion on GitHub with Sergey Prokhorenko [0] I understood that
> the counter is an optional but useful part of UUID v7. It actually promotes
> sortability of data generated at high speed.
> The standard does not specify how big the counter should be. PFA a patch with
> a 16-bit counter. Maybe it's worth doing an 18-bit counter - it would save us
> one byte of PRNG data. Currently we only take 2 bits out of the whole random byte.
> 

Here's a new patch version. Now the counter is initialised with strong random
data on every time change (each ms). However, the first bit of the counter is
kept at zero. This is done to extend the counter's capacity (I left comments
with references to the RFC explaining this).

Thanks!

Best regards, Andrey Borodin.


v4-0001-Implement-UUID-v7-as-per-IETF-draft.patch
Description: Binary data


datetime from a JsonbValue

2023-08-20 Thread Chapman Flack

Hi,

Thread [1] concerns (generalizing slightly) the efficient casting
to an SQL type of the result of a jsonb extracting operation
(array indexing, object keying, path evaluation) that has ended
with a scalar JsonbValue.

So far, it can efficiently rewrite casts to boolean or numeric
types.

I notice that, since 6dda292, JsonbValue includes a datetime
scalar member.

As far as I can tell, the only jsonb extracting operations
that might be capable of producing such a JsonbValue would be
jsonb_path_query(_first)?(_tz)? with a path ending in .datetime().

If casts existed from jsonb to date/time types, then the same
techniques used in [1] would be able to rewrite such casts,
eliding the JsonbValueToJsonb and subsequent reconversion via text.

But no such casts seem to exist, providing nothing to hang the
optimization on. (And, after all, 6dda292 says "These datetime
values are allowed for temporary representation only.  During
serialization datetime values are converted into strings.")

Perhaps it isn't worth supplying such casts. The value is held
as text within jsonb, so .datetime() in a jsonpath had to parse
it. One might lament the extra serialization and reparsing if
that path query result goes through ::text::timestamp, but then
simply leaving .datetime() off of the jsonpath in the first place
would have left the parsing to be done just once by ::timestamp.

Optimizable casts might be of more interest if the jsonpath
language had more operations on datetimes, so that you might
efficiently retrieve the result of some arbitrary expression
in the path, not just a literal datetime value that has to get
parsed in one place or another anyway.

I haven't looked into SQL/JSON to see what it provides in terms
of casts to SQL types. I'm more familiar with SQL/XML, which does
provide XMLCAST, which can take an XML source and SQL date/time
target, and does the equivalent of an XML Query ending in
"cast as xs:dateTime" and assigns that result to the SQL type
(with some time zone subtleties rather carefully specified).
So I might assume SQL/JSON has something analogous?

On the other hand, XML Query does offer more operations on
date/time values, which may, as discussed above, make such a cast
more interesting to have around.

Thoughts?

Regards,
-Chap

[1] 
https://www.postgresql.org/message-id/flat/CAKU4AWoqAVya6PBhn+BCbFaBMt3z-2=i5fKO3bW=6hphbid...@mail.gmail.com





Re: Oversight in reparameterize_path_by_child leading to executor crash

2023-08-20 Thread Tom Lane
I looked over the v3 patch.  I think it's going in generally
the right direction, but there is a huge oversight:
path_is_reparameterizable_by_child does not come close to modeling
the success/failure conditions of reparameterize_path_by_child.
In all the cases where reparameterize_path_by_child can recurse,
path_is_reparameterizable_by_child has to do so also, to check
whether the child path is reparameterizable.  (I'm somewhat
disturbed that apparently we have no test cases that caught that
oversight; can we add one cheaply?)

BTW, I also note from the cfbot that 9e9931d2b broke this patch by
adding more ADJUST_CHILD_ATTRS calls that need to be modified.
I wonder if we could get away with having that macro cast to "void *"
to avoid needing to change all its call sites.  I'm not sure whether
pickier compilers might warn about doing it that way.

regards, tom lane




Re: [PATCH] Add function to_oct

2023-08-20 Thread Nathan Bossart
On Sat, Aug 19, 2023 at 08:35:46AM +0100, Dean Rasheed wrote:
> I note that there are no tests for negative inputs.

I added some in v8.

> Doing a quick test, shows that this changes the current behaviour,
> because all inputs are now treated as 64-bit:
> 
> HEAD:
> 
> select to_hex((-1234)::int);
>    to_hex
> ----------
>  fffffb2e
> 
> With patch:
> 
> select to_hex((-1234)::int);
>        to_hex
> ------------------
>  fffffffffffffb2e

Good catch.  In v8, I fixed this by first casting the input to uint32 for
the 32-bit versions of the functions.  This prevents the conversion to
uint64 from setting the rest of the bits.  AFAICT this behavior is pretty
well defined in the standard.

> The way that negative inputs are handled really should be documented,
> or at least it should include a couple of examples.

I used your suggestion and noted that the output is the two's complement
representation [0].
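
For illustration, here is roughly what the v8 functions return for negative
inputs (a sketch that assumes the patch is applied; the values follow from
the uint32/uint64 handling described above):

SELECT to_hex((-1234)::int);     -- fffffb2e
SELECT to_hex((-1234)::bigint);  -- fffffffffffffb2e
SELECT to_oct((-1)::int);        -- 37777777777
SELECT to_binary((-1)::int);     -- 11111111111111111111111111111111 (32 ones)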

[0] 
https://postgr.es/m/CAEZATCVbkL1ynqpsKiTDpch34%3DSCr5nnau%3DnfNmiy2nM3SJHtw%40mail.gmail.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From a03c06d0b26ccbd49bda55c2efab565ab9fa90db Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 25 Jul 2023 16:09:01 -0700
Subject: [PATCH v8 1/1] add to_binary() and to_oct()

---
 doc/src/sgml/func.sgml| 47 ++-
 src/backend/utils/adt/varlena.c   | 86 +++
 src/include/catalog/pg_proc.dat   | 12 
 src/test/regress/expected/strings.out | 62 ++-
 src/test/regress/sql/strings.sql  | 15 -
 5 files changed, 192 insertions(+), 30 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index be2f54c914..835c08905e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3737,6 +3737,50 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ to_binary
+
+to_binary ( integer )
+text
+   
+   
+to_binary ( bigint )
+text
+   
+   
+Converts the number to its equivalent two's complement binary
+representation.
+   
+   
+to_binary(2147483647)
+1111111111111111111111111111111
+   
+  
+
+  
+   
+
+ to_oct
+
+to_oct ( integer )
+text
+   
+   
+to_oct ( bigint )
+text
+   
+   
+Converts the number to its equivalent two's complement octal
+representation.
+   
+   
+to_oct(2147483647)
+17777777777
+   
+  
+
   

 
@@ -3750,7 +3794,8 @@ repeat('Pg', 4) PgPgPgPg
 text


-Converts the number to its equivalent hexadecimal representation.
+Converts the number to its equivalent two's complement hexadecimal
+representation.


 to_hex(2147483647)
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index b1ec5c32ce..cac3577480 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -4919,53 +4919,87 @@ array_to_text_internal(FunctionCallInfo fcinfo, ArrayType *v,
 	return result;
 }
 
-#define HEXBASE 16
 /*
- * Convert an int32 to a string containing a base 16 (hex) representation of
- * the number.
+ * Workhorse for to_binary, to_oct, and to_hex.  Note that base must be > 1 and
+ * <= 16.
  */
-Datum
-to_hex32(PG_FUNCTION_ARGS)
+static inline text *
+convert_to_base(uint64 value, int base)
 {
-	uint32		value = (uint32) PG_GETARG_INT32(0);
-	char	   *ptr;
 	const char *digits = "0123456789abcdef";
-	char		buf[32];		/* bigger than needed, but reasonable */
 
-	ptr = buf + sizeof(buf) - 1;
-	*ptr = '\0';
+	/* We size the buffer for to_binary's longest possible return value. */
+	char		buf[sizeof(uint64) * BITS_PER_BYTE];
+	char	   *const end = buf + sizeof(buf);
+	char	   *ptr = end;
+
+	Assert(base > 1);
+	Assert(base <= 16);
 
 	do
 	{
-		*--ptr = digits[value % HEXBASE];
-		value /= HEXBASE;
+		*--ptr = digits[value % base];
+		value /= base;
 	} while (ptr > buf && value);
 
-	PG_RETURN_TEXT_P(cstring_to_text(ptr));
+	return cstring_to_text_with_len(ptr, end - ptr);
+}
+
+/*
+ * Convert an integer to a string containing a base-2 (binary) representation
+ * of the number.
+ */
+Datum
+to_binary32(PG_FUNCTION_ARGS)
+{
+	uint64		value = (uint32) PG_GETARG_INT32(0);
+
+	PG_RETURN_TEXT_P(convert_to_base(value, 2));
+}
+Datum
+to_binary64(PG_FUNCTION_ARGS)
+{
+	uint64		value = (uint64) PG_GETARG_INT64(0);
+
+	PG_RETURN_TEXT_P(convert_to_base(value, 2));
 }
 
 /*
- * Convert an int64 to a string containing a base 16 (hex) representation of
+ * Convert an integer to a string containing a base-8 (oct) representation of
  * the number.
  */
 Datum
-to_hex64(PG_FUNCTION_ARGS)
+to_oct32(PG_FUNCTION_ARGS)
+{
+	uint64		value = (uint32) PG_GETARG_INT32(0);
+
+	PG_RETURN_TEXT_P(convert_to_base(value, 8));
+}
+Datum
+to_oct64(PG_FUNCTION_ARGS)
 {
 	uint64		

Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-20 Thread Masahiko Sawada
On Fri, Aug 18, 2023 at 10:51 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Peter,
>
> PSA new version patch set.
>

I've looked at the v22 patch set, and here are some comments:

0001:

Do we need regression tests to make sure that the slot's
confirmed_flush_lsn matches the LSN of the latest shutdown_checkpoint
record?

0002:

+   
+Prepare for publisher upgrades
+

Should this step be done before "8. Stop both servers", as it might
require disabling subscriptions and dropping 'lost' replication slots?

Why is there no explanation about the slots' confirmed_flush_lsn check
as a prerequisite?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Add function to_oct

2023-08-20 Thread Nathan Bossart
On Sat, Aug 19, 2023 at 11:41:56AM +0700, John Naylor wrote:
> This looks nicer, but still doesn't save the starting pointer, and so needs
> to lug around that big honking macro. This is what I mean:
> 
> static inline text *
> convert_to_base(uint64 value, int base)
> {
>   const char *digits = "0123456789abcdef";
>   /* We size the buffer for to_binary's longest possible return value. */
>   charbuf[sizeof(uint64) * BITS_PER_BYTE];
>   char * const end =  buf + sizeof(buf);
>   char *ptr = end;
> 
>   Assert(base > 1);
>   Assert(base <= 16);
> 
>   do
>   {
> *--ptr = digits[value % base];
> value /= base;
>   } while (ptr > buf && value);
> 
>   return cstring_to_text_with_len(ptr,  end - ptr);
> }

I will use this in v8.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: PostgreSQL 16 RC1 + GA release dates

2023-08-20 Thread Tom Lane
"Jonathan S. Katz"  writes:
> The date for PostgreSQL 16 Release Candidate 1 (RC1) is August 31, 2023. 
> Please ensure all open items[1] are completed and committed before 
> August 26, 2023 12:00 UTC.

FYI, I moved the "Oversight in reparameterize_path_by_child leading to
executor crash" open item to the "Older bugs affecting stable branches"
section, because it is in fact an old bug: the given test case crashes
in every branch that has enable_partitionwise_join.  I'll still look
at getting in the fix before RC1, but we should understand what we're
dealing with.

regards, tom lane




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-20 Thread Masahiko Sawada
On Thu, Aug 17, 2023 at 10:31 PM Amit Kapila  wrote:
>
> On Thu, Aug 17, 2023 at 6:07 PM Masahiko Sawada  wrote:
> >
> > On Tue, Aug 15, 2023 at 12:06 PM Amit Kapila  
> > wrote:
> > >
> > > On Tue, Aug 15, 2023 at 7:51 AM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Mon, Aug 14, 2023 at 2:07 PM Amit Kapila  
> > > > wrote:
> > > > >
> > > > > On Mon, Aug 14, 2023 at 7:57 AM Masahiko Sawada 
> > > > >  wrote:
> > > > > > Another idea is (which might have already been discussed though) that we
> > > > > > check if the latest shutdown checkpoint LSN in the control file 
> > > > > > matches the confirmed_flush_lsn in pg_replication_slots view. That 
> > > > > > way, we can ensure that the slot has consumed all WAL records 
> > > > > > before the last shutdown. We don't need to worry about WAL records 
> > > > > > generated after starting the old cluster during the upgrade, at 
> > > > > > least for logical replication slots.
> > > > > >
> > > > >
> > > > > Right, this is somewhat closer to what Patch is already doing. But
> > > > > remember in this case we need to remember and use the latest
> > > > > checkpoint from the control file before the old cluster is started
> > > > > because otherwise the latest checkpoint location could be even updated
> > > > > during the upgrade. So, instead of reading from WAL, we need to change
> > > > > so that we rely on the control file's latest LSN.
> > > >
> > > > Yes, I was thinking the same idea.
> > > >
> > > > But it works for only replication slots for logical replication. Do we
> > > > want to check if no meaningful WAL records are generated after the
> > > > latest shutdown checkpoint, for manually created slots (or non-logical
> > > > replication slots)? If so, we would need to have something reading WAL
> > > > records in the end.
> > > >
> > >
> > > This feature only targets logical replication slots. I don't see a
> > > reason to be different for manually created logical replication slots.
> > > Is there something particular that you think we could be missing?
> >
> > Sorry I was not clear. I meant the logical replication slots that are
> > *not* used by logical replication, i.e., are created manually and used
> > by third party tools that periodically consume decoded changes. As we
> > discussed before, these slots will never be able to pass that
> > confirmed_flush_lsn check.
> >
>
> I think normally one would have a background process to periodically
> consume changes. Can't one use the walsender infrastructure for
> their plugins to consume changes, probably by using the replication
> protocol?

Not sure.

> Also, I feel it is the plugin author's responsibility to
> consume changes or advance slot to the required position before
> shutdown.

How does the plugin author ensure that the slot consumes all WAL
records including shutdown_checkpoint before shutdown?

>
> > After some thoughts, one thing we might
> > need to consider is that in practice, the upgrade project is performed
> > during the maintenance window and has a backup plan that reverts the
> > upgrade process, in case something bad happens. If we require the
> > users to drop such logical replication slots, they cannot resume
> > using the old cluster in that case, since they would need to create new
> > slots, missing some changes.
> >
>
> Can't one keep the backup before removing slots?

Yes, but restoring the backup could take time.

>
> > Other checks in pg_upgrade seem to be
> > compatibility checks that would eventually be required for the upgrade
> > anyway. Do we need to consider this case? For example, we do that
> > confirmed_flush_lsn check for only the slots with pgoutput plugin.
> >
>
> I think one is allowed to use pgoutput plugin even for manually
> created slots. So, such a check may not work.

Right, but I thought it's a very rare case.

Since the slot's confirmed_flush_lsn check is not a compatibility
check, unlike the existing checks, I wonder if we can make it optional.
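
For reference, a minimal SQL sketch of the confirmed_flush_lsn check being
discussed (run against the old cluster; the real check would read the last
shutdown checkpoint from the stopped cluster's control file rather than
from a live server):

SELECT s.slot_name,
       s.confirmed_flush_lsn,
       c.checkpoint_lsn AS latest_checkpoint_lsn,
       s.confirmed_flush_lsn >= c.checkpoint_lsn AS caught_up
FROM pg_replication_slots s,
     pg_control_checkpoint() c
WHERE s.slot_type = 'logical';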

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: WIP: new system catalog pg_wait_event

2023-08-20 Thread Michael Paquier
On Sat, Aug 19, 2023 at 06:30:12PM +0200, Drouvot, Bertrand wrote:
> Thanks, fixed in v10.

Okay.  I have done an extra review of it, simplifying a few things in
the function, the comments and the formatting, and applied the patch.
Thanks!

I have somewhat managed to miss the catalog version in the initial
commit, but fixed that quickly.
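
For anyone catching up, the new view can be queried like any other catalog
view; a minimal example (assuming the view exposes type, name and
description columns):

SELECT type, name, description
FROM pg_wait_events
WHERE type = 'Extension'
ORDER BY name;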
--
Michael

