pgoutput incorrectly replaces missing values with NULL since PostgreSQL 15

2023-11-22 Thread Nikhil Benesch
While working on Materialize's streaming logical replication from Postgres [0],
my colleagues Sean Loiselle and Petros Angelatos (CC'd) discovered today what
appears to be a correctness bug in pgoutput, introduced in v15.

The problem goes like this. A table with REPLICA IDENTITY FULL and some
data in it...

CREATE TABLE t (a int);
ALTER TABLE t REPLICA IDENTITY FULL;
INSERT INTO t VALUES (1), (2), (3), ...;

...undergoes a schema change to add a new column with a default:

ALTER TABLE t ADD COLUMN b bool DEFAULT false NOT NULL;

PostgreSQL is smart and does not rewrite the entire table during the schema
change. Instead it updates the tuple description to indicate to future readers
of the table that if `b` is missing, it should be filled in with the default
value, `false`.

Unfortunately, since v15, pgoutput mishandles missing attributes. If a
downstream server is subscribed to changes from t via the pgoutput plugin, when
a row with a missing attribute is updated, e.g.:

UPDATE t SET a = 2 WHERE a = 1

pgoutput will incorrectly report b's value as NULL in the old tuple, rather than
false. Using the same example:

old: a=1, b=NULL
new: a=2, b=false

The subscriber will ignore the update (as it has no row with values
a=1, b=NULL), and thus the subscriber's copy of `t` will become out of sync with
the publisher's.

I bisected the problem to 52e4f0cd4 [1], which introduced row filtering for
publications. The problem appears to be the use of CreateTupleDescCopy where
CreateTupleDescCopyConstr is required, as the former drops the constraints
in the tuple description (specifically, the default value constraint) on the
floor.
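
To illustrate the shape of the fix (the variable names here are hypothetical;
the attached patch is authoritative), the change amounts to copying the
descriptor with its constraints intact:

    /* before: drops TupleConstr, including the attmissingval defaults */
    desc = CreateTupleDescCopy(RelationGetDescr(relation));

    /* after: keeps the constraints, so missing attributes get their defaults */
    desc = CreateTupleDescCopyConstr(RelationGetDescr(relation));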

I've attached a patch which both fixes the issue and includes a test. I've
verified that the test fails against the current master and passes against
the patched version.

I'm relatively unfamiliar with the project norms here, but assuming the patch is
acceptable, this strikes me as important enough to warrant a backport to both
v15 and v16.

[0]: https://materialize.com/docs/sql/create-source/postgres
[1]: 
https://github.com/postgres/postgres/commit/52e4f0cd472d39d07732b99559989ea3b615be78


0001-pgoutput-don-t-unconditionally-fill-in-missing-value.patch
Description: Binary data


Re: GUC names in messages

2023-11-22 Thread Peter Smith
On Thu, Nov 9, 2023 at 10:04 PM Alvaro Herrera  wrote:
>
> On 2023-Nov-09, Peter Smith wrote:
>
> > Notice that NOT QUOTED is the far more common pattern, so my vote
> > would be just to standardise on making everything this way. I know
> > there was some concern raised about ambiguous words like "timezone"
> > and "datestyle" etc but in practice, those are rare. Also, those GUCs
> > are different in that they are written as camel-case (e.g.
> > "DateStyle") in the guc_tables.c, so if they were also written
> > camel-case in the messages that could remove ambiguities with normal
> > words. YMMV.
>
> Well, I think camel-casing is also a sufficient differentiator for these
> identifiers not being English words.  We'd need to ensure they're always
> written that way, when not quoted.  However, in cases where arbitrary
> values are expanded, I don't know that they would be expanded that way,
> so I would still go for quoting in that case.
>
> There's also a few that are not camel-cased nor have any underscores --
> looking at postgresql.conf.sample, we have "port", "bonjour", "ssl",
> "fsync", "geqo", "jit", "autovacuum", "xmlbinary", "xmloption".  (We also
> have "include", but I doubt that's ever used in an error message).  But
> actually, there's more: every reloption is a candidate, and there we
> have "fillfactor", "autosummarize", "fastupdate", "buffering".  So if we
> want to make generic advice on how to deal with option names in error
> messages, I think the wording on conditional quoting I proposed should
> go in (adding CamelCase as a reason not to quote), and then we can fix
> the code to match.  Looking at your list, I think the changes to make
> are not too numerous.
>

Sorry for my delay in getting back to this thread.

PSA a patch for this work.

There may be some changes I've missed, but hopefully, this is a nudge
in the right direction.
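
To make the convention concrete, here are two hypothetical messages of the
kind the patches adjust (made-up examples, not lines taken from the patches):
a GUC whose name reads like a plain English word gets quoted, while a name
containing an underscore is unambiguous and can stay unquoted:

    ereport(ERROR,
            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
             errmsg("\"%s\" must be an integer", "port")));

    ereport(ERROR,
            (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
             errmsg("%s must not be negative", "max_connections")));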

==
Kind Regards,
Peter Smith.
Fujitsu Australia.


v1-0001-GUC-names-docs.patch
Description: Binary data


v1-0002-GUC-names-fix-quotes.patch
Description: Binary data


Catalog domain not-null constraints

2023-11-22 Thread Peter Eisentraut
This patch set applies the explicit catalog representation of not-null 
constraints introduced by b0e96f3119 for table constraints also to 
domain not-null constraints.


Since there is no inheritance or primary keys etc., this is much simpler 
and just applies the existing infrastructure to domains as well.  As a 
result, domain not-null constraints now appear in the information schema 
correctly.  Another effect is that you can now use the ALTER DOMAIN ... 
ADD/DROP CONSTRAINT syntax for not-null constraints as well.  This makes 
everything consistent overall.
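
For example, a hypothetical session (assuming the ADD CONSTRAINT ... NOT NULL
spelling carries over from table constraints):

    CREATE DOMAIN posint AS integer;
    ALTER DOMAIN posint ADD CONSTRAINT posint_nn NOT NULL;
    SELECT constraint_name, domain_name
      FROM information_schema.domain_constraints
      WHERE domain_name = 'posint';
    ALTER DOMAIN posint DROP CONSTRAINT posint_nn;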


For the most part, I structured the code so that there are now separate 
sibling subroutines for domain check constraints and domain not-null 
constraints.  This seemed to work out best, but one could also consider 
other ways to refactor this.

From c76dd1e21fb2c0a35d45106649788afe2b2b59bd Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 23 Nov 2023 07:35:32 +0100
Subject: [PATCH v1 1/2] Add tests for domain-related information schema views

---
 src/test/regress/expected/domain.out | 47 
 src/test/regress/sql/domain.sql  | 24 ++
 2 files changed, 71 insertions(+)

diff --git a/src/test/regress/expected/domain.out 
b/src/test/regress/expected/domain.out
index 6d94e84414..e70aebd70c 100644
--- a/src/test/regress/expected/domain.out
+++ b/src/test/regress/expected/domain.out
@@ -1207,3 +1207,50 @@ create domain testdomain1 as int constraint unsigned 
check (value > 0);
 alter domain testdomain1 rename constraint unsigned to unsigned_foo;
 alter domain testdomain1 drop constraint unsigned_foo;
 drop domain testdomain1;
+--
+-- Information schema
+--
+SELECT * FROM information_schema.column_domain_usage
+  WHERE domain_name IN ('con', 'dom', 'pos_int', 'things')
+  ORDER BY domain_name;
+ domain_catalog | domain_schema | domain_name | table_catalog | table_schema | 
table_name | column_name 
++---+-+---+--++-
+ regression | public| con | regression| public   | 
domcontest | col1
+ regression | public| dom | regression| public   | 
domview| col1
+ regression | public| things  | regression| public   | 
thethings  | stuff
+(3 rows)
+
+SELECT * FROM information_schema.domain_constraints
+  WHERE domain_name IN ('con', 'dom', 'pos_int', 'things')
+  ORDER BY constraint_name;
+ constraint_catalog | constraint_schema | constraint_name | domain_catalog | 
domain_schema | domain_name | is_deferrable | initially_deferred 
++---+-++---+-+---+
+ regression | public| con_check   | regression | 
public| con | NO| NO
+ regression | public| meow| regression | 
public| things  | NO| NO
+ regression | public| pos_int_check   | regression | 
public| pos_int | NO| NO
+(3 rows)
+
+SELECT * FROM information_schema.domains
+  WHERE domain_name IN ('con', 'dom', 'pos_int', 'things')
+  ORDER BY domain_name;
+ domain_catalog | domain_schema | domain_name | data_type | 
character_maximum_length | character_octet_length | character_set_catalog | 
character_set_schema | character_set_name | collation_catalog | 
collation_schema | collation_name | numeric_precision | numeric_precision_radix 
| numeric_scale | datetime_precision | interval_type | interval_precision | 
domain_default | udt_catalog | udt_schema | udt_name | scope_catalog | 
scope_schema | scope_name | maximum_cardinality | dtd_identifier 
++---+-+---+--++---+--++---+--++---+-+---++---+++-++--+---+--++-+
+ regression | public| con | integer   |
  ||   |  | 
   |   |  ||
32 |   2 | 0 || 
  ||| regression  | pg_catalog 
| int4 |   |  || | 1
+ regression | public| dom | integer   |
  ||   |  | 
   |   |  

Re: remaining sql/json patches

2023-11-22 Thread jian he
A minor issue: maybe you can add the following after line 127 of
/src/test/regress/sql/jsonb_sqljson.sql, to provide test coverage for the
ExecPrepareJsonItemCoercion function.

SELECT JSON_VALUE(jsonb 'null', '$ts' PASSING date '2018-02-21 12:34:56 +10' AS ts returning date);
SELECT JSON_VALUE(jsonb 'null', '$ts' PASSING time '2018-02-21 12:34:56 +10' AS ts returning time);
SELECT JSON_VALUE(jsonb 'null', '$ts' PASSING timetz '2018-02-21 12:34:56 +10' AS ts returning timetz);
SELECT JSON_VALUE(jsonb 'null', '$ts' PASSING timestamp '2018-02-21 12:34:56 +10' AS ts returning timestamp);




Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

2023-11-22 Thread Dilip Kumar
On Tue, Nov 21, 2023 at 2:03 PM Dilip Kumar  wrote:
>
> On Mon, Nov 20, 2023 at 4:42 PM Dilip Kumar  wrote:
> >
> > On Mon, Nov 20, 2023 at 2:37 PM Andrey M. Borodin  
> > wrote:
> >
> > > > On 20 Nov 2023, at 13:51, Dilip Kumar  wrote:
> > > >
> > > > 2) Do we really need one separate lwlock tranche for each SLRU?
> > > >
> > > > IMHO if we use the same lwlock tranche then the wait event will show
> > > > the same wait event name, right? And that would be confusing for the
> > > > user, whether we are waiting for Subtransaction or Multixact or
> > > > anything else.  Is my understanding not correct here?
> > >
> > > If we give users multiple GUCs to tweak, I think we should give them a way
> > > to understand which GUC to tweak when they observe wait times.
>
> PFA, updated patch set, I have worked on review comments by Alvaro and
> Andrey.  So the only open comments are about clog group commit
> testing; for that, my question (as I sent in the previous email) was:
> exactly what part are we worried about in the coverage report?
>
> The second point is: if we want to generate a group update, we have to
> create the injection point after we hold the control lock, so that other
> processes go for the group update.  Then, to wake up the waiting process
> that is holding the SLRU control lock in exclusive mode, we would need to
> call a function ('test_injection_points_wake()').  Calling that function
> requires acquiring the SLRU lock again in read mode for the visibility
> check in the catalog when fetching the procedure row, so the wake-up
> session will block on the control lock held by the session waiting on the
> injection point, creating a deadlock.  Maybe with bank-wise locks we could
> create a lot of transactions so that these two fall in different banks and
> then somehow test this, but then we would have to generate 16 * 4096 = 64k
> transactions so that the SLRU bank used by the transaction that inserted
> the procedure row into the system table differs from the bank used by the
> transaction in which we are trying to do the group commit.

I have attached a POC patch for testing the group update using the
injection point framework.  This is just for testing the group update
part and is not yet a committable test.  I have added a bunch of logs
in the code so that we can see what's going on with the group update.
From the below logs, we can see that multiple processes are getting
accumulated for the group update and the leader is updating their xid
status.


Note: with this testing, we have found a bug in the bank-wise approach:
basically, we are clearing procglobal->clogGroupFirst even before acquiring
the bank lock, which means that in most cases each group contains just a
single process acting as its own group leader (I think this is what Alvaro
was pointing out in his coverage report).  I have added the fix in this POC
just for testing purposes, but in my next version I will add it to the
proper patch after a proper review and a bit more testing.


here is the output after running the test
==
2023-11-23 05:55:29.399 UTC [93367] 003_clog_group_commit.pl LOG:
procno 6 got the lock
2023-11-23 05:55:29.399 UTC [93367] 003_clog_group_commit.pl
STATEMENT:  SELECT txid_current();
2023-11-23 05:55:29.406 UTC [93369] 003_clog_group_commit.pl LOG:
statement: SELECT test_injection_points_attach('ClogGroupCommit',
'wait');
2023-11-23 05:55:29.415 UTC [93371] 003_clog_group_commit.pl LOG:
statement: INSERT INTO test VALUES(1);
2023-11-23 05:55:29.416 UTC [93371] 003_clog_group_commit.pl LOG:
procno 4 got the lock
2023-11-23 05:55:29.416 UTC [93371] 003_clog_group_commit.pl
STATEMENT:  INSERT INTO test VALUES(1);
2023-11-23 05:55:29.424 UTC [93373] 003_clog_group_commit.pl LOG:
statement: INSERT INTO test VALUES(2);
2023-11-23 05:55:29.425 UTC [93373] 003_clog_group_commit.pl LOG:
procno 3 for xid 128742 added for group update
2023-11-23 05:55:29.425 UTC [93373] 003_clog_group_commit.pl
STATEMENT:  INSERT INTO test VALUES(2);
2023-11-23 05:55:29.431 UTC [93376] 003_clog_group_commit.pl LOG:
statement: INSERT INTO test VALUES(3);
2023-11-23 05:55:29.438 UTC [93378] 003_clog_group_commit.pl LOG:
statement: INSERT INTO test VALUES(4);
2023-11-23 05:55:29.438 UTC [93376] 003_clog_group_commit.pl LOG:
procno 2 for xid 128743 added for group update
2023-11-23 05:55:29.438 UTC [93376] 003_clog_group_commit.pl
STATEMENT:  INSERT INTO test VALUES(3);
2023-11-23 05:55:29.438 UTC [93376] 003_clog_group_commit.pl LOG:
procno 2 is follower and wait for group leader to update commit status
of xid 128743
2023-11-23 05:55:29.438 UTC [93376] 003_clog_group_commit.pl
STATEMENT:  INSERT INTO test VALUES(3);
2023-11-23 05:55:29.439 UTC [93378] 003_clog_group_commit.pl LOG:
procno 1 for xid 128744 added for group update
2023-11-23 05:55:29.439 UTC [93378] 003_clog_group_commit.pl
STATEMENT:  INSERT INTO test VALUES(4);
2023-11-23 05:55:29.439 UTC [93378] 003_clog_group_commit.pl LOG:
procno 1 is 

Re: Synchronizing slots from primary to standby

2023-11-22 Thread Amit Kapila
On Tue, Nov 21, 2023 at 4:35 PM Drouvot, Bertrand
 wrote:
>
> On 11/21/23 10:32 AM, shveta malik wrote:
> > On Tue, Nov 21, 2023 at 2:02 PM shveta malik  wrote:
> >>
>
> > v37 fails to apply to HEAD due to a recent commit e83aa9f92fdd,
> > rebased the patches.  PFA v37_2 patches.
>
> Thanks!
>
> Regarding the promotion flow: If the primary is available and reachable I 
> don't
> think we currently try to ensure that slots are in sync. I think we'd miss the
> activity since the last sync and the promotion request or am I missing 
> something?
>
> If the primary is available and reachable shouldn't we launch a last round of
> synchronization (skipping all the slots that are not in 'r' state)?
>

We may miss the last round, but even if the primary is available there is
no guarantee that we can ensure everything is synced, because there could
be more activity after our last sync. I think it is the user's
responsibility to promote a new primary only when the old one is not
required for some reason. It is not only slots that can be out of sync;
we can also miss fetching some of the data. I think this is quite similar
to what we do for WAL, where on finding the promotion signal we shut down
the walreceiver and just replay any WAL that it had already received.
Also, the promotion shouldn't create any problem w.r.t. subscribers
connecting to the new primary, because the slot's position is slightly
behind what could be requested by subscribers, which means the
corresponding data will be available on the new primary.

Do you have something in mind that can create any problem if we don't
attempt additional fetching round after the promotion signal is
received?

-- 
With Regards,
Amit Kapila.




Re: [HACKERS] Typo in sequence.c

2023-11-22 Thread Bruce Momjian
On Fri, Jan 15, 2016 at 01:18:03PM +0900, Vinayak Pokale wrote:
> Hi,
> 
> I found a typo in sequence.c
> Please check the attached patch.

I am not sure how to put this, but the typos are still there seven years
after you reported it, so fixed in master.

---

> diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
> index c98f981..25c57f6 100644
> --- a/src/backend/commands/sequence.c
> +++ b/src/backend/commands/sequence.c
> @@ -433,7 +433,7 @@ AlterSequence(AlterSeqStmt *stmt)
>   aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_CLASS,
>  stmt->sequence->relname);
>  
> - /* lock page' buffer and read tuple into new sequence structure */
> + /* lock page buffer and read tuple into new sequence structure */
>   seq = read_seq_tuple(elm, seqrel, , );
>  
>   /* Copy old values of options into workspace */
> @@ -582,7 +582,7 @@ nextval_internal(Oid relid)
>   return elm->last;
>   }
>  
> - /* lock page' buffer and read tuple */
> + /* lock page buffer and read tuple */
>   seq = read_seq_tuple(elm, seqrel, , );
>   page = BufferGetPage(buf);
>  
> @@ -876,7 +876,7 @@ do_setval(Oid relid, int64 next, bool iscalled)
>*/
>   PreventCommandIfParallelMode("setval()");
>  
> - /* lock page' buffer and read tuple */
> + /* lock page buffer and read tuple */
>   seq = read_seq_tuple(elm, seqrel, , );
>  
>   if ((next < seq->min_value) || (next > seq->max_value))

> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hack...@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Lockless exit path for ReplicationOriginExitCleanup

2023-11-22 Thread Bharath Rupireddy
On Wed, Nov 22, 2023 at 3:06 PM Alvaro Herrera  wrote:
>
> Hello,
>
> On 2023-Nov-22, Bharath Rupireddy wrote:
>
> > While looking at the use of session_replication_state, I noticed that
> > ReplicationOriginLock is acquired in ReplicationOriginExitCleanup()
> > even if session_replication_state is reset to NULL by
> > replorigin_session_reset(). Why can't there be a lockless exit path
> > something like [1] similar to
> > replorigin_session_reset() which checks session_replication_state ==
> > NULL without a lock?
>
> I suppose we can do this on consistency grounds -- I'm pretty sure you'd
> have a really hard time proving that this makes a performance difference --

Yes, can't measure the perf gain, however, as said upthread
https://www.postgresql.org/message-id/CALj2ACVVhPn7BVQZLGPVvBrLoDZNRaV0LS9rBpt5y%2Bj%3DxRebWw%40mail.gmail.com,
it avoids unnecessary lock acquisition and release.

> but this patch is incomplete: just two lines below, we're still testing
> session_replication_state for nullness, which would now be dead code.
> Please repair.

Done.

> The comment on session_replication_state is confusing also:
>
> /*
>  * Backend-local, cached element from ReplicationState for use in a backend
>  * replaying remote commits, so we don't have to search ReplicationState for
>  * the backends current RepOriginId.
>  */
>
> My problem with it is that this is not a "cached element", but instead a
> "cached pointer to [shared memory]".  This is what makes testing that
> pointer for null-ness doable, but access to each member therein
> requiring lwlock acquisition.

Right. I've reworded the comment a bit.

PSA v1 patch.
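
For readers without the attachment, the early exit is roughly of this shape
(a sketch only; the attached v1 patch is authoritative):

    static void
    ReplicationOriginExitCleanup(int code, Datum arg)
    {
        ConditionVariable *cv = NULL;

        /* nothing to do if this backend never set up a session origin */
        if (session_replication_state == NULL)
            return;

        LWLockAcquire(ReplicationOriginLock, LW_EXCLUSIVE);

        if (session_replication_state->acquired_by == MyProcPid)
        {
            cv = &session_replication_state->origin_cv;

            session_replication_state->acquired_by = 0;
            session_replication_state = NULL;
        }

        LWLockRelease(ReplicationOriginLock);

        if (cv)
            ConditionVariableBroadcast(cv);
    }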

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v1-0001-Introduce-lockless-exit-path-for-ReplicationOrigi.patch
Description: Binary data


Re: [HACKERS] pg_upgrade vs vacuum_cost_delay

2023-11-22 Thread Bruce Momjian
On Thu, Jun 16, 2016 at 04:45:14PM +0200, Magnus Hagander wrote:
> On Thu, Jun 16, 2016 at 4:35 PM, Euler Taveira  wrote:
> 
> On 16-06-2016 09:05, Magnus Hagander wrote:
> > Shouldn't pg_upgrade turn off vacuum cost delay when it vacuums the new
> > cluster? Not talking about the post-analyze script, but when it runs
> > vacuumdb to analyze and freeze before loading the new schema, in
> > prepare_new_cluster()? Those run during downtime, so it seems like you'd
> > want those to run as fast as possible.
> >
> Doesn't --new-options do the job?
> 
> 
> You could, but it seems like it should do it by default. 

Based on this seven year old post, I realized there are minimal
directions in pg_upgrade docs about how to generate statistics quickly,
so I created this patch to help.

We do have docs on updating planner statistics:


https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-STATISTICS

but that doesn't seem to cover cases where you are doing an upgrade or
pg_dump restore.  Should I move this information into there instead?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 4f78e0e1c0..c72e69bb67 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -784,6 +784,14 @@ psql --username=postgres --file=script.sql postgres
  of the upgrade.  You might need to set connection parameters to
  match your new cluster.
 
+
+
+ Using vacuumdb --all --analyze-only can efficiently
+ generate such statistics, and the use of --jobs
+ can speed it up.  Option --analyze-in-stages can
+ be used to generate minimal statistics quickly.  Non-zero values of
+ vacuum_cost_delay will delay statistics generation.
+

 



Re: Lockless exit path for ReplicationOriginExitCleanup

2023-11-22 Thread Bharath Rupireddy
On Wed, Nov 22, 2023 at 2:28 PM Amit Kapila  wrote:
>
> On Wed, Nov 22, 2023 at 2:12 PM Bharath Rupireddy
>  wrote:
> >
> > While looking at the use of session_replication_state, I noticed that
> > ReplicationOriginLock is acquired in ReplicationOriginExitCleanup()
> > even if session_replication_state is reset to NULL by
> > replorigin_session_reset(). Why can't there be a lockless exit path
> > something like [1] similar to
> > replorigin_session_reset() which checks session_replication_state ==
> > NULL without a lock?
> >
>
> I don't see any problem with such a check but not sure of the benefit
> of doing so either.

It avoids an unnecessary lock acquisition and release when
session_replication_state is already reset by
replorigin_session_reset() before reaching
ReplicationOriginExitCleanup(). A patch something like [1] and a run
of subscription tests shows that 153 times the lock acquisition and
release can be avoided.

ubuntu:~/postgres/src/test/subscription$ grep -ir "with session_replication_state NULL" . | wc -l
153

ubuntu:~/postgres/src/test/subscription$ grep -ir "with session_replication_state not NULL" . | wc -l
174

[1]
diff --git a/src/backend/replication/logical/origin.c
b/src/backend/replication/logical/origin.c
index 460e3dcc38..dd3824bd27 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -1056,6 +1056,11 @@ ReplicationOriginExitCleanup(int code, Datum arg)
 {
ConditionVariable *cv = NULL;

+   if (session_replication_state == NULL)
+       elog(LOG, "In ReplicationOriginExitCleanup() with session_replication_state NULL");
+   else
+       elog(LOG, "In ReplicationOriginExitCleanup() with session_replication_state not NULL");
+
LWLockAcquire(ReplicationOriginLock, LW_EXCLUSIVE);

if (session_replication_state != NULL &&

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Add CHECK_FOR_INTERRUPTS in scram_SaltedPassword loop.

2023-11-22 Thread Bowen Shi
> I don't think it would be useful to limit this at an arbitrary point, iteration
> count can be set per password and if someone wants a specific password to be
> super-hard to brute force then why should we limit that?

I agree with that. Maybe some users do want a super-hard password.
RFC 7677 and RFC 5802 don't specify the maximum number of iterations.

> If we want to add CHECK_FOR_INTERRUPTS inside the loop I think a brief
> comment would be appropriate.

This has been completed in patch v2 and it's ready for review.
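
For reference, a hypothetical way to reproduce the long-running loop being
discussed (assuming password_encryption is scram-sha-256, so the backend does
the hashing; exact timing depends on hardware):

    SET password_encryption = 'scram-sha-256';
    SET scram_iterations = 100000000;          -- deliberately huge
    CREATE ROLE slow_login PASSWORD 'secret';  -- hashing happens here
    -- without the patch, this backend cannot be cancelled until hashing finishes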

Regards
Bowen Shi
From 89c4de0a814d5343c54d9e8501986892fbb4b33e Mon Sep 17 00:00:00 2001
From: bovenshi 
Date: Wed, 22 Nov 2023 19:30:56 +0800
Subject: [PATCH] Add CHECK_FOR_INTERRUPTS in scram_SaltedPassword loop.

When the scram_iterations value is set too large, the backend would hang for
a long time. Add CHECK_FOR_INTERRUPTS within the loop of scram_SaltedPassword
to handle any signals received during this period.
---
 src/common/scram-common.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/common/scram-common.c b/src/common/scram-common.c
index ef997ef..bdf40e8 100644
--- a/src/common/scram-common.c
+++ b/src/common/scram-common.c
@@ -15,6 +15,7 @@
  */
 #ifndef FRONTEND
 #include "postgres.h"
+#include "miscadmin.h"
 #else
 #include "postgres_fe.h"
 #endif
@@ -73,6 +74,13 @@ scram_SaltedPassword(const char *password,
 	/* Subsequent iterations */
 	for (i = 2; i <= iterations; i++)
 	{
+		/*
+		 * Allowing this to be interrupted is necessary when scram_iterations
+		 * is set to a large value. However, this only works in the backend.
+		 */
+#ifndef FRONTEND
+		CHECK_FOR_INTERRUPTS();
+#endif
 		if (pg_hmac_init(hmac_ctx, (uint8 *) password, password_len) < 0 ||
 			pg_hmac_update(hmac_ctx, (uint8 *) Ui_prev, key_length) < 0 ||
 			pg_hmac_final(hmac_ctx, Ui, key_length) < 0)
-- 
2.9.3



Re: Adding a clang-format file

2023-11-22 Thread Ray Eldath
Hi, just wondering: would you mind sharing your .clang-format file? I find
the attachment you pointed to is only a demo with a “…” line and it doesn’t
format PG code very well.

Btw, I’m also greatly in favor of this idea. clang-format is tightly
integrated into CLion and it’s very convenient. Right now I’m using a CLion
preset by Greenplum [1], but it doesn’t handle column alignment of struct
fields and variables very well.

[1] 
https://groups.google.com/a/greenplum.org/g/gpdb-dev/c/rDYSYotssbE/m/7HgsWuj7AwAJ


> On Sep 28, 2022, at 08:43, Peter Geoghegan  wrote:
> 
> On Tue, Sep 27, 2022 at 5:35 AM Aleksander Alekseev
>  wrote:
>> Personally I don't have anything against the idea. TimescaleDB uses
>> clang-format to mimic pgindent and it works quite well. One problem
>> worth mentioning though is that the clang-format file is dependent on
>> the particular version of clang-format.
> 
> I was hoping that something generic could work here. Something that we
> could provide that didn't claim to be authoritative, that has a
> reasonable degree of compatibility that allows most people to use the
> file without much fuss. Kind of like our .editorconfig file. That
> might not be a realistic goal, though, since the clang-format settings
> are all quite complicated.
> 
> -- 
> Peter Geoghegan
> 
> 
> 



Re: [HACKERS] make async slave to wait for lsn to be replayed

2023-11-22 Thread Bowen Shi
Hi,

I used the latest code and found some conflicts while applying the patches.
Which PG version did you rebase against?

Regards
Bowen Shi


Re: [HACKERS] psql casts aspersions on server reliability

2023-11-22 Thread Bruce Momjian
On Wed, Nov 22, 2023 at 07:38:52PM -0500, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Wed, Sep 28, 2016 at 09:14:41AM -0400, Tom Lane wrote:
> >> I could go along with just dropping the last sentence ("This probably...")
> >> if the last error we got was FATAL level.  I don't find "unexpectedly"
> >> to be problematic here: from the point of view of psql, and probably
> >> of its user, the shutdown *was* unexpected.
> 
> > I looked at this thread from 2016 and I think the problem is the
> > "abnormally" word, since if the server was shutdown by the administrator
> > (most likely), it isn't abnormal.  Here is a patch to remove
> > "abnormally".
> 
> I do not think this is an improvement.  The places you are changing
> are reacting to a connection closure.  *If* we had previously gotten a
> "FATAL: terminating connection due to administrator command" message,
> then yeah the connection closure is expected; but if not, it isn't.
> Your patch adds no check for that.  (As I remarked in 2016, we could
> probably condition this on the elevel being FATAL, rather than
> checking for specific error messages.)

Yes, you are correct.  Here is a patch that implements the FATAL test,
though I am not sure whether I have the logic right or backwards, and I
don't know how to test this.  Thanks.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 660cdec93c..c541fd8b02 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -749,8 +749,10 @@ retry4:
 	 */
 definitelyEOF:
 	libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-			"\tThis probably means the server terminated abnormally\n"
-			"\tbefore or while processing the request.");
+			"\tThis probably means the server terminated%s\n"
+			"\tbefore or while processing the request.",
+			conn->result->resultStatus == PGRES_FATAL_ERROR ?
+			"" : "abnormally");
 
 	/* Come here if lower-level code already set a suitable errorMessage */
 definitelyFailed:
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index f1192d28f2..6c21f91817 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -206,8 +206,10 @@ rloop:
 if (result_errno == EPIPE ||
 	result_errno == ECONNRESET)
 	libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-			"\tThis probably means the server terminated abnormally\n"
-			"\tbefore or while processing the request.");
+			  "\tThis probably means the server terminated%s\n"
+			  "\tbefore or while processing the request.",
+			  conn->result->resultStatus == PGRES_FATAL_ERROR ?
+			  "" : "abnormally");
 else
 	libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
 			SOCK_STRERROR(result_errno,
@@ -306,8 +308,10 @@ pgtls_write(PGconn *conn, const void *ptr, size_t len)
 result_errno = SOCK_ERRNO;
 if (result_errno == EPIPE || result_errno == ECONNRESET)
 	libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-			"\tThis probably means the server terminated abnormally\n"
-			"\tbefore or while processing the request.");
+			  "\tThis probably means the server terminated%s\n"
+			  "\tbefore or while processing the request.",
+			  conn->result->resultStatus == PGRES_FATAL_ERROR ?
+			  "" : "abnormally");
 else
 	libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
 			SOCK_STRERROR(result_errno,
diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c
index bd72a87bbb..5e7136195a 100644
--- a/src/interfaces/libpq/fe-secure.c
+++ b/src/interfaces/libpq/fe-secure.c
@@ -233,8 +233,10 @@ pqsecure_raw_read(PGconn *conn, void *ptr, size_t len)
 			case EPIPE:
 			case ECONNRESET:
 libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-		"\tThis probably means the server terminated abnormally\n"
-		"\tbefore or while processing the request.");
+		"\tThis probably means the server terminated%s\n"
+		"\tbefore or while processing the request.",
+		conn->result->resultStatus == PGRES_FATAL_ERROR ?
+		"" : "abnormally");
 break;
 
 			default:
@@ -395,8 +397,11 @@ retry_masked:
 /* (strdup failure is OK, we'll cope later) */
 snprintf(msgbuf, sizeof(msgbuf),
 		 libpq_gettext("server closed the connection unexpectedly\n"
-	   "\tThis probably means the server terminated abnormally\n"
-	   "\tbefore or while processing the request."));
+	   "\tThis probably means the server terminated%s\n"
+	   "\tbefore or while processing the request."),
+		

Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'

2023-11-22 Thread Richard Guo
On Sun, Nov 19, 2023 at 9:17 AM Alexander Korotkov 
wrote:

> It's here.  New REALLOCATE_BITMAPSETS forces bitmapset reallocation on
> each modification.


+1 to the idea of introducing a reallocation mode to Bitmapset.


> I had the feeling of falling into a rabbit hole while debugging all
> the cases of failure with this new option.  With the second patch
> regressions tests pass.


It seems to me that we have always had situations where we share the
same pointer to a Bitmapset structure across different places.  I do not
think this is a problem as long as we do not modify the Bitmapsets in a
way that requires reallocation or impact the locations sharing the same
pointer.

So I'm wondering, instead of attempting to avoid sharing pointer to
Bitmapset in all locations that have problems, can we simply bms_copy
the original Bitmapset within replace_relid() before making any
modifications, as I proposed previously?  Of course, as Andres pointed
out, we need to do so also for the "Delete relid without substitution"
path.  Please see the attached.
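
For clarity, the shape of the proposed change is roughly the following (a
sketch of the idea, not the attached patch itself):

    static Relids
    replace_relid(Relids relids, int oldId, int newId)
    {
        if (oldId < 0)
            return relids;

        /* Delete relid without substitution; copy so sharers are unaffected. */
        if (newId < 0)
            return bms_del_member(bms_copy(relids), oldId);

        /* Substitute newId for oldId, again working on a copy. */
        if (bms_is_member(oldId, relids))
            return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);

        return relids;
    }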

Thanks
Richard


v2-0001-Fix-how-SJE-replaces-join-clauses.patch
Description: Binary data


Re: [HACKERS] Changing references of password encryption to hashing

2023-11-22 Thread Bruce Momjian
On Wed, Nov 22, 2023 at 07:01:32PM -0500, Bruce Momjian wrote:
> On Wed, Nov 22, 2023 at 05:55:06PM -0500, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > On Wed, Nov 22, 2023 at 12:52:23PM -0800, Andres Freund wrote:
> > >> What's the point of randomly reviving threads from 6 years ago, without any
> > >> further analysis?
> > 
> > Well, I feel like this is an important change, and got dropped because it
> > > was determined to not be a new change in Postgres 10.  Still I think it
> > > should be addressed and am looking to see if others feel the same.
> > 
> > You're expecting people to re-research something that dropped off the
> > radar years ago, probably because it wasn't sufficiently interesting
> > at the time.  If the point hasn't been raised again since then,
> > I'd guess it's not any more interesting now.  Most of us have better
> > things to do than go re-read old threads --- and dredging them up
> > ten at a time is not improving the odds that people will care to look.
> 
> I can do the work. I was more looking for any feedback that this is a
> valid area to work on.  I will do some work on this and publish an
> updated patch because I think using the right terms makes sense.

Let me also add that I apologize for bringing up these old threads.  I
feel bad that I didn't address them years ago, I feel bad for the original
posters, and I do think there is some value in addressing some of them,
which I think is validated by the many useful doc patches I have made
over the past few weeks.  I am over half done and hope everyone can be
patient with me while I do my best to finish this long-overdue job.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: [HACKERS] psql casts aspersions on server reliability

2023-11-22 Thread Tom Lane
Bruce Momjian  writes:
> On Wed, Sep 28, 2016 at 09:14:41AM -0400, Tom Lane wrote:
>> I could go along with just dropping the last sentence ("This probably...")
>> if the last error we got was FATAL level.  I don't find "unexpectedly"
>> to be problematic here: from the point of view of psql, and probably
>> of its user, the shutdown *was* unexpected.

> I looked at this thread from 2016 and I think the problem is the
> "abnormally" word, since if the server was shutdown by the administrator
> (most likely), it isn't abnormal.  Here is a patch to remove
> "abnormally".

I do not think this is an improvement.  The places you are changing
are reacting to a connection closure.  *If* we had previously gotten a
"FATAL: terminating connection due to administrator command" message,
then yeah the connection closure is expected; but if not, it isn't.
Your patch adds no check for that.  (As I remarked in 2016, we could
probably condition this on the elevel being FATAL, rather than
checking for specific error messages.)

regards, tom lane




Re: pg_upgrade and logical replication

2023-11-22 Thread Peter Smith
Here are some review comments for patch v17-0001

==
src/bin/pg_dump/pg_dump.c

1. getSubscriptionTables

+/*
+ * getSubscriptionTables
+ *   Get information about subscription membership for dumpable tables. This
+ *will be used only in binary-upgrade mode and for PG17 or later versions.
+ */
+void
+getSubscriptionTables(Archive *fout)
+{
+ DumpOptions *dopt = fout->dopt;
+ SubscriptionInfo *subinfo = NULL;
+ SubRelInfo *subrinfo;
+ PQExpBuffer query;
+ PGresult   *res;
+ int i_srsubid;
+ int i_srrelid;
+ int i_srsubstate;
+ int i_srsublsn;
+ int ntups;
+ Oid last_srsubid = InvalidOid;
+
+ if (dopt->no_subscriptions || !dopt->binary_upgrade ||
+ fout->remoteVersion < 17)
+ return;

I still felt that the function comment ("used only in binary-upgrade
mode and for PG17 or later") was misleading. IMO that sounds like it
would be OK for PG17 regardless of the binary mode, but the code says
otherwise.

Assuming the code is correct, perhaps the comment should say:
"... used only in binary-upgrade mode for PG17 or later versions."

~~~

2. dumpSubscriptionTable

+/*
+ * dumpSubscriptionTable
+ *   Dump the definition of the given subscription table mapping. This will be
+ *used only in binary-upgrade mode and for PG17 or later versions.
+ */
+static void
+dumpSubscriptionTable(Archive *fout, const SubRelInfo *subrinfo)

(this is the same as the previous review comment #1)

Assuming the code is correct, perhaps the comment should say:
"... used only in binary-upgrade mode for PG17 or later versions."

==
src/bin/pg_upgrade/check.c

3.
+static void
+check_old_cluster_subscription_state()
+{
+ FILE*script = NULL;
+ char output_path[MAXPGPATH];
+ int ntup;
+ ClusterInfo *cluster = &old_cluster;
+
+ prep_status("Checking for subscription state");
+
+ snprintf(output_path, sizeof(output_path), "%s/%s",
+ log_opts.basedir,
+ "subs_invalid.txt");
+ for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+ {
+ PGresult   *res;
+ DbInfo *active_db = &cluster->dbarr.dbs[dbnum];
+ PGconn*conn = connectToServer(cluster, active_db->db_name);

There seems no need for an extra variable ('cluster') here when you
can just reference 'old_cluster' directly in the code, the same as
other functions in this file do all the time.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [HACKERS] psql casts aspersions on server reliability

2023-11-22 Thread Bruce Momjian
On Wed, Sep 28, 2016 at 09:14:41AM -0400, Tom Lane wrote:
> Robert Haas  writes:
> > psql tends to do things like this:
> > rhaas=# select * from pg_stat_activity;
> > FATAL:  terminating connection due to administrator command
> > server closed the connection unexpectedly
> > This probably means the server terminated abnormally
> > before or while processing the request.
> 
> > Basically everything psql has to say about this is a lie:
> 
> I cannot get terribly excited about this.  What you seem to be proposing
> is that psql try to intuit the reason for connection closure from the
> last error message it got, but that seems likely to lead to worse lies
> than printing a boilerplate message.
> 
> I could go along with just dropping the last sentence ("This probably...")
> if the last error we got was FATAL level.  I don't find "unexpectedly"
> to be problematic here: from the point of view of psql, and probably
> of its user, the shutdown *was* unexpected.

I looked at this thread from 2016 and I think the problem is the
"abnormally" word, since if the server was shutdown by the administrator
(most likely), it isn't abnormal.  Here is a patch to remove
"abnormally".

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 660cdec93c..634708d716 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -749,7 +749,7 @@ retry4:
 	 */
 definitelyEOF:
 	libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-			"\tThis probably means the server terminated abnormally\n"
+			"\tThis probably means the server terminated\n"
 			"\tbefore or while processing the request.");
 
 	/* Come here if lower-level code already set a suitable errorMessage */
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index f1192d28f2..115776ce6c 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -206,7 +206,7 @@ rloop:
 if (result_errno == EPIPE ||
 	result_errno == ECONNRESET)
 	libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-			"\tThis probably means the server terminated abnormally\n"
+			"\tThis probably means the server terminated\n"
 			"\tbefore or while processing the request.");
 else
 	libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
@@ -306,7 +306,7 @@ pgtls_write(PGconn *conn, const void *ptr, size_t len)
 result_errno = SOCK_ERRNO;
 if (result_errno == EPIPE || result_errno == ECONNRESET)
 	libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-			"\tThis probably means the server terminated abnormally\n"
+			"\tThis probably means the server terminated\n"
 			"\tbefore or while processing the request.");
 else
 	libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c
index bd72a87bbb..b972bd3ced 100644
--- a/src/interfaces/libpq/fe-secure.c
+++ b/src/interfaces/libpq/fe-secure.c
@@ -233,7 +233,7 @@ pqsecure_raw_read(PGconn *conn, void *ptr, size_t len)
 			case EPIPE:
 			case ECONNRESET:
 libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-		"\tThis probably means the server terminated abnormally\n"
+		"\tThis probably means the server terminated\n"
 		"\tbefore or while processing the request.");
 break;
 
@@ -395,7 +395,7 @@ retry_masked:
 /* (strdup failure is OK, we'll cope later) */
 snprintf(msgbuf, sizeof(msgbuf),
 		 libpq_gettext("server closed the connection unexpectedly\n"
-	   "\tThis probably means the server terminated abnormally\n"
+	   "\tThis probably means the server terminated\n"
 	   "\tbefore or while processing the request."));
 /* keep newline out of translated string */
 strlcat(msgbuf, "\n", sizeof(msgbuf));


Re: [HACKERS] Changing references of password encryption to hashing

2023-11-22 Thread Bruce Momjian
On Wed, Nov 22, 2023 at 05:55:06PM -0500, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Wed, Nov 22, 2023 at 12:52:23PM -0800, Andres Freund wrote:
> >> What's the point of randomly reviving threads from 6 years ago, without any
> >> further analysis?
> 
> Well, I feel like this is an important change, and got dropped because it
> > was determined to not be a new change in Postgres 10.  Still I think it
> > should be addressed and am looking to see if others feel the same.
> 
> You're expecting people to re-research something that dropped off the
> radar years ago, probably because it wasn't sufficiently interesting
> at the time.  If the point hasn't been raised again since then,
> I'd guess it's not any more interesting now.  Most of us have better
> things to do than go re-read old threads --- and dredging them up
> ten at a time is not improving the odds that people will care to look.

I can do the work. I was more looking for any feedback that this is a
valid area to work on.  I will do some work on this and publish an
updated patch because I think using the right terms makes sense.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Optionally using a better backtrace library?

2023-11-22 Thread Peter Geoghegan
On Tue, Sep 5, 2023 at 2:59 AM Alvaro Herrera  wrote:
> Much appreciated!  I can put this to good use.

I was just reminded of how our existing backtrace support is lacklustre.

Are you planning on submitting a patch for this?

--
Peter Geoghegan




Re: common signal handler protection

2023-11-22 Thread Andres Freund
Hi,

On 2023-11-22 15:59:44 -0600, Nathan Bossart wrote:
> Subject: [PATCH v2 1/3] Check that MyProcPid == getpid() in all signal
>  handlers.
> 
> In commit 97550c0711, we added a similar check to the SIGTERM
> handler for the startup process.  This commit adds this check to
> all signal handlers installed with pqsignal().  This is done by
> using a wrapper function that performs the check before calling the
> actual handler.
> 
> The hope is that this will offer more general protection against
> grandchildren processes inadvertently modifying shared memory due
> to inherited signal handlers.

It's a bit unclear here what grandchildren refers to - it's presumably in
reference to postmaster, but the preceding text doesn't even mention
postmaster. I'd probably just say "child processes of the current process".


> +
> +#ifdef PG_SIGNAL_COUNT   /* Windows */
> +#define PG_NSIG (PG_SIGNAL_COUNT)
> +#elif defined(NSIG)
> +#define PG_NSIG (NSIG)
> +#else
> +#define PG_NSIG (64) /* XXX: wild guess */
> +#endif

Perhaps worth adding a static assert for at least a few common types of
signals being below that value? That way we'd see a potential issue without
needing to reach the code path.
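
Something along these lines, perhaps (just a sketch):

    StaticAssertDecl(SIGTERM < PG_NSIG && SIGQUIT < PG_NSIG && SIGALRM < PG_NSIG,
                     "PG_NSIG is too small");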


> +/*
> + * Except when called with SIG_IGN or SIG_DFL, pqsignal() sets up this 
> function
> + * as the handler for all signals.  This wrapper handler function checks that
> + * it is called within a process that the server knows about, and not a
> + * grandchild process forked by system(3), etc.

Similar comment to earlier - the grandchildren bit seems like a dangling
reference. And also too specific - I think we could encounter this in single
user mode as well?

Perhaps it could be reframed to "postgres processes, as determined by having
called InitProcessGlobals()"?


>This check ensures that such
> + * grandchildren processes do not modify shared memory, which could be
> + * detrimental.

"could be" seems a bit euphemistic :)


> From b77da9c54590a71eb9921d6f16bf4ffb0543e87e Mon Sep 17 00:00:00 2001
> From: Nathan Bossart 
> Date: Fri, 17 Nov 2023 14:00:12 -0600
> Subject: [PATCH v2 2/3] Centralize logic for restoring errno in signal
>  handlers.
> 
> Presently, we rely on each individual signal handler to save the
> initial value of errno and then restore it before returning if
> needed.  This is easily forgotten and, if missed, often goes
> undetected for a long time.

It's also just verbose :)


> From 5734e0cf5f00bbd74504b45934f68e1c2c1290bd Mon Sep 17 00:00:00 2001
> From: Nathan Bossart 
> Date: Fri, 17 Nov 2023 22:09:24 -0600
> Subject: [PATCH v2 3/3] Revert "Avoid calling proc_exit() in processes forked
>  by system()."
> 
> Thanks to commit XX, this check in the SIGTERM handler for
> the startup process is now obsolete and can be removed.  Instead of
> leaving around the elog(PANIC, ...) calls that are now unlikely to
> be triggered and the dead function write_stderr_signal_safe(), I've
> opted to just remove them for now.  Thanks to modern version
> control software, it will be trivial to dig those up if they are
> ever needed in the future.
> 
> This reverts commit 97550c0711972a9856b5db751539bbaf2f4c.
> 
> Reviewed-by: ???
> Discussion: ???
> ---
>  src/backend/postmaster/startup.c | 17 +
>  src/backend/storage/ipc/ipc.c|  4 
>  src/backend/storage/lmgr/proc.c  |  8 
>  src/backend/utils/error/elog.c   | 28 
>  src/include/utils/elog.h |  6 --
>  5 files changed, 1 insertion(+), 62 deletions(-)
> 
> diff --git a/src/backend/postmaster/startup.c 
> b/src/backend/postmaster/startup.c
> index 83dbed86b9..f40acd20ff 100644
> --- a/src/backend/postmaster/startup.c
> +++ b/src/backend/postmaster/startup.c
> @@ -19,8 +19,6 @@
>   */
>  #include "postgres.h"
>  
> -#include 
> -
>  #include "access/xlog.h"
>  #include "access/xlogrecovery.h"
>  #include "access/xlogutils.h"
> @@ -113,20 +111,7 @@ static void
>  StartupProcShutdownHandler(SIGNAL_ARGS)
>  {
>   if (in_restore_command)
> - {
> - /*
> -  * If we are in a child process (e.g., forked by system() in
> -  * RestoreArchivedFile()), we don't want to call any exit 
> callbacks.
> -  * The parent will take care of that.
> -  */
> - if (MyProcPid == (int) getpid())
> - proc_exit(1);
> - else
> - {
> - write_stderr_signal_safe("StartupProcShutdownHandler() 
> called in child process\n");
> - _exit(1);
> - }
> - }
> + proc_exit(1);
>   else
>   shutdown_requested = true;
>   WakeupRecovery();

Hm. I wonder if this indicates an issue.  Do the preceding changes perhaps
make it more likely that a child process of the startup process could hang
around for longer, because the signal we're delivering doesn't terminate child
processes, because 

Re: [HACKERS] Changing references of password encryption to hashing

2023-11-22 Thread Tom Lane
Bruce Momjian  writes:
> On Wed, Nov 22, 2023 at 12:52:23PM -0800, Andres Freund wrote:
>> What's the point of randomly reviving threads from 6 years ago, without any
>> further analysis?

> Well, I feel like this is an imporant change, and got dropped because it
> was determined to not be a new change in Postgres 10.  Still I think it
> should be addressed and am looking to see if others feel the same.

You're expecting people to re-research something that dropped off the
radar years ago, probably because it wasn't sufficiently interesting
at the time.  If the point hasn't been raised again since then,
I'd guess it's not any more interesting now.  Most of us have better
things to do than go re-read old threads --- and dredging them up
ten at a time is not improving the odds that people will care to look.

regards, tom lane




Re: Change GUC hashtable to use simplehash?

2023-11-22 Thread Andres Freund
Hi,

On 2023-11-22 16:27:56 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2023-11-22 15:56:21 -0500, Tom Lane wrote:
> >> GUC names are just about always short, though, so I'm not sure you've
> >> made your point?
>
> > With short I meant <= 6 characters (32 / 5 = 6.x). After that you're
> > overwriting bits that you previously set, without dispersing the "overwritten"
> > bits throughout the hash state.
>
> I'm less than convinced about the "overwrite" part:
>
> + /* Merge into hash ... not very bright, but it needn't be */
> + result = pg_rotate_left32(result, 5);
> + result ^= (uint32) ch;
>
> Rotating a 32-bit value 5 bits at a time doesn't result in successive
> characters lining up exactly, and even once they do, XOR is not
> "overwrite".

I didn't know what word to use, hence the air quotes. Yes, XOR doesn't simply
overwrite with the bits of the right-hand side, but it still affects the state
only on a per-bit basis, so contributions can easily be cancelled out.


My understanding of writing hash functions is that every added bit mixed in
should have a ~50% chance of causing each other bit to flip. The proposed
function obviously doesn't get there.

It's worth noting that the limited range of the input values means that
there's a lot of bias toward some bits being set ('a' to 'z' all start with
0b011).


> I'm pretty dubious that we need something better than this.

Well, we know that the current attempt at a dedicated hash function for this
does result in a substantial number of collisions. And it's hard to understand
such cases when you hit them, so I think it's better to avoid exposing
ourselves to such dangers without a distinct need.

And I don't really see the need here to risk it, even if we are somewhat
confident it's fine.

If we can't afford to call murmurhash32 for every character (which I mildly
doubt), we could just call it once for every 32/5 input characters.  Or we
could just load up to 8 characters into a 64-bit integer and call
murmurhash64.

Something roughly like

uint64 result = 0;

while (*name)
{
uint64 value = 0;

for (int i = 0; i < 8 && *name; i++)
{
char ch = *name++;

        value |= (uint8) ch;
value = value << 8;
}

result = hash_combine64(result, murmurhash64(value));
}

The hash_combine use isn't quite right either, we should use the full
accumulator state of a proper hash function, but it seems very unlikely to
matter here.


The fact that string_hash() is slow due to the strlen() (which causes us to
process the input twice, and which is optimized to also handle very long
strings that string_hash() typically doesn't encounter) seems problematic far
beyond this case. We use string_hash() in a *lot* of places, and that strlen()
does regularly show up in profiles. We should fix that.

The various hash functions being external functions also shows up in a bunch
of profiles too.  It's particularly ridiculous for cases like tag_hash(),
where the caller typically knows the length, but calls a routine in a
different translation unit, which obviously can't be optimized for a specific
length.


I think we ought to adjust our APIs around this:

1) The accumulator state of the hash functions should be exposed, so one can
   accumulate values into the hash state, without reducing the internal state
   to a single 32/64 bit variable.

2) For callers that know the length of data, we should use a static inline
   hash function, rather than an external function call. This should include
   special cased inline functions for adding 32/64bit of data to the hash
   state.

   Perhaps with a bit of logic to *not* use the inline version if the hashed
   input is long (and thus the call overhead doesn't matter). Something like
   if (__builtin_constant_p(len) && len < 128)
   /* call inline implementation */
   else
   /* call out of line implementation, not worth the code bloat */


We know that hash functions should have the split into init/process
data*/finish steps, as e.g. evidenced by pg_crc.h/pg_crc32.h.


With something like that, you could write a function that lowercases
characters inline without incurring unnecessary overhead.

hash32_state hs;

hash32_init(&hs);

while (*name)
{
char    ch = *name++;

/* crappy lowercase for this situation */
ch |= 0x20;

hash32_process_byte(&hs, ch);
}

return hash32_finish(&hs);

Perhaps with some additional optimization for processing the input string in
32/64 bit quantities.


Greetings,

Andres Freund




Re: Stop the search once replication origin is found

2023-11-22 Thread Peter Smith
On Wed, Nov 22, 2023 at 7:49 PM Amit Kapila  wrote:
>
> On Mon, Nov 20, 2023 at 4:36 PM Amit Kapila  wrote:
> >
> > On Mon, Nov 20, 2023 at 2:36 PM Antonin Houska  wrote:
> > >
> > > Although it's not performance-critical, I think it just makes sense to 
> > > break
> > > the loop in replorigin_session_setup() as soon as we've found the origin.
> > >
> >
> > Your proposal sounds reasonable to me.
> >
>
> Pushed, thanks for the patch!
>
> --

Hi,

While reviewing the replorigin_session_setup() fix [1] that was pushed
yesterday, I saw that some nearby code in that same function might
benefit from some refactoring.

I'm not sure if you want to modify it or not, but FWIW I think the
code can be tidied by making the following changes:

~~~

1.
else if (curstate->acquired_by != 0 && acquired_by == 0)
{
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
 errmsg("replication origin with ID %d is already
active for PID %d",
curstate->roident, curstate->acquired_by)));
}

1a. AFAICT the above code doesn't need to be else/if

1b. The brackets are unnecessary for a single statement.

~~~

2.
if (session_replication_state == NULL && free_slot == -1)
ereport(ERROR,
(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
errmsg("could not find free replication state slot for replication
origin with ID %d",
node),
errhint("Increase max_replication_slots and try again.")));
else if (session_replication_state == NULL)
{
/* initialize new slot */
session_replication_state = &replication_states[free_slot];
Assert(session_replication_state->remote_lsn == InvalidXLogRecPtr);
Assert(session_replication_state->local_lsn == InvalidXLogRecPtr);
session_replication_state->roident = node;
}

The above code can be improved by combining within a single check for
session_replication_state NULL.
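
Roughly, the combined form could look like this (a sketch only; the attached
patch has the actual refactoring):

    if (session_replication_state == NULL)
    {
        if (free_slot == -1)
            ereport(ERROR,
                    (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
                     errmsg("could not find free replication state slot for replication origin with ID %d",
                            node),
                     errhint("Increase max_replication_slots and try again.")));

        /* initialize new slot */
        session_replication_state = &replication_states[free_slot];
        Assert(session_replication_state->remote_lsn == InvalidXLogRecPtr);
        Assert(session_replication_state->local_lsn == InvalidXLogRecPtr);
        session_replication_state->roident = node;
    }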

~~~

3.
There are some unnecessary double-blank lines.

~~~

4.
/* ok, found slot */
session_replication_state = curstate;
break;

QUESTION: That 'session_replication_state' is a global variable, but
there is more validation logic that comes *after* this assignment
which might decide there was some problem and cause an ereport or
elog. In practice, maybe it makes no difference, but it did seem
slightly dubious to me to assign to a global before determining
everything is OK. Thoughts?

~~~

Anyway, PSA a patch for the 1-3 above.

==
[1] https://www.postgresql.org/message-id/flat/2694.1700471273%40antos

Kind Regards,
Peter Smith.
Fujitsu Australia


v1-0001-replorigin_session_setup-refactoring.patch
Description: Binary data


Re: common signal handler protection

2023-11-22 Thread Nathan Bossart
On Tue, Nov 21, 2023 at 04:40:06PM -0600, Nathan Bossart wrote:
> cfbot seems unhappy with this on Windows.  IIUC we need to use
> PG_SIGNAL_COUNT there instead, but I'd like to find a way to have just one
> macro for all platforms.

Here's an attempt at fixing the Windows build.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From fae0d1855c74aa13574943370ffbe79f768bdb01 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 17 Nov 2023 21:38:18 -0600
Subject: [PATCH v2 1/3] Check that MyProcPid == getpid() in all signal
 handlers.

In commit 97550c0711, we added a similar check to the SIGTERM
handler for the startup process.  This commit adds this check to
all signal handlers installed with pqsignal().  This is done by
using a wrapper function that performs the check before calling the
actual handler.

The hope is that this will offer more general protection against
grandchildren processes inadvertently modifying shared memory due
to inherited signal handlers.  Another potential follow-up
improvement is to use this wrapper handler function to restore
errno instead of relying on each individual handler function to do
so.

This commit makes the changes in commit 97550c0711 obsolete but
leaves reverting it for a follow-up commit.

Reviewed-by: ???
Discussion: ???
---
 src/port/pqsignal.c | 74 +++--
 1 file changed, 72 insertions(+), 2 deletions(-)

diff --git a/src/port/pqsignal.c b/src/port/pqsignal.c
index 83d876db6c..0baf8eed11 100644
--- a/src/port/pqsignal.c
+++ b/src/port/pqsignal.c
@@ -28,23 +28,87 @@
 #include "c.h"
 
 #include 
+#ifndef FRONTEND
+#include 
+#endif
 
 #ifndef FRONTEND
 #include "libpq/pqsignal.h"
+#include "miscadmin.h"
+#endif
+
+#ifdef PG_SIGNAL_COUNT			/* Windows */
+#define PG_NSIG (PG_SIGNAL_COUNT)
+#elif defined(NSIG)
+#define PG_NSIG (NSIG)
+#else
+#define PG_NSIG (64)			/* XXX: wild guess */
+#endif
+
+static volatile pqsigfunc pqsignal_handlers[PG_NSIG];
+
+/*
+ * Except when called with SIG_IGN or SIG_DFL, pqsignal() sets up this function
+ * as the handler for all signals.  This wrapper handler function checks that
+ * it is called within a process that the server knows about, and not a
+ * grandchild process forked by system(3), etc.  This check ensures that such
+ * grandchildren processes do not modify shared memory, which could be
+ * detrimental.  If the check succeeds, the function originally provided to
+ * pqsignal() is called.  Otherwise, the default signal handler is installed
+ * and then called.
+ */
+static void
+wrapper_handler(SIGNAL_ARGS)
+{
+#ifndef FRONTEND
+	Assert(MyProcPid);
+
+	if (unlikely(MyProcPid != (int) getpid()))
+	{
+		pqsignal(postgres_signal_arg, SIG_DFL);
+		raise(postgres_signal_arg);
+		return;
+	}
 #endif
 
+	(*pqsignal_handlers[postgres_signal_arg]) (postgres_signal_arg);
+}
+
 /*
  * Set up a signal handler, with SA_RESTART, for signal "signo"
  *
  * Returns the previous handler.
+ *
+ * NB: If called within a signal handler, race conditions may lead to bogus
+ * return values.  You should either avoid calling this within signal handlers
+ * or ignore the return value.
+ *
+ * XXX: Since no in-tree callers use the return value, and there is little
+ * reason to do so, it would be nice if we could convert this to a void
+ * function instead of providing potentially-bogus return values.
+ * Unfortunately, that requires modifying the pqsignal() in legacy-pqsignal.c,
+ * which in turn requires an SONAME bump, which is probably not worth it.
  */
 pqsigfunc
 pqsignal(int signo, pqsigfunc func)
 {
+	pqsigfunc	orig_func = pqsignal_handlers[signo];	/* assumed atomic */
 #if !(defined(WIN32) && defined(FRONTEND))
 	struct sigaction act,
 oact;
+#else
+	pqsigfunc	ret;
+#endif
+
+	Assert(signo < PG_NSIG);
 
+	if (func != SIG_IGN && func != SIG_DFL)
+	{
+		pqsignal_handlers[signo] = func;	/* assumed atomic */
+		func = wrapper_handler;
+	}
+
+#if !(defined(WIN32) && defined(FRONTEND))
 	act.sa_handler = func;
	sigemptyset(&act.sa_mask);
 	act.sa_flags = SA_RESTART;
@@ -54,9 +118,15 @@ pqsignal(int signo, pqsigfunc func)
 #endif
 	if (sigaction(signo, &act, &oact) < 0)
 		return SIG_ERR;
-	return oact.sa_handler;
+	else if (oact.sa_handler == wrapper_handler)
+		return orig_func;
+	else
+		return oact.sa_handler;
 #else
 	/* Forward to Windows native signal system. */
-	return signal(signo, func);
+	if ((ret = signal(signo, func)) == wrapper_handler)
+		return orig_func;
+	else
+		return ret;
 #endif
 }
-- 
2.25.1

>From b77da9c54590a71eb9921d6f16bf4ffb0543e87e Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 17 Nov 2023 14:00:12 -0600
Subject: [PATCH v2 2/3] Centralize logic for restoring errno in signal
 handlers.

Presently, we rely on each individual signal handler to save the
initial value of errno and then restore it before returning if
needed.  This is easily forgotten and, if missed, often goes
undetected for a long time.

In commit XX, we introduced a wrapper signal 

Re: psql not responding to SIGINT upon db reconnection

2023-11-22 Thread Tristan Partin

On Wed Nov 22, 2023 at 3:00 PM CST, Heikki Linnakangas wrote:

On 22/11/2023 19:29, Tristan Partin wrote:
> On Thu Nov 16, 2023 at 8:33 AM CST, Heikki Linnakangas wrote:
>> On 06/11/2023 19:16, Tristan Partin wrote:
> That sounds like a much better solution. Attached you will find a v4
> that implements your suggestion. Please let me know if there is
> something that I missed. I can confirm that the patch works.
>>
>> This patch is missing a select(). It will busy loop until the connection
>> is established or cancelled.
> 
> If I add a wait (select, poll, etc.), then I can't control-C during the

> blocking call, so it doesn't really solve the problem.

Hmm, they should return with EINTR on signal. At least on Linux; I'm not 
sure how portable that is. See signal(7) man page, section "Interruption 
of system calls and library functions by signal handlers". You could 
also use a timeout like 5 s to ensure that you wake up and notice that 
the signal was received eventually, even if it doesn't interrupt the 
blocking call.


Ha, you're right. I had this working yesterday, but convinced myself it 
didn't. I had a do while loop wrapping the blocking call. Here is a v4, 
which seems to pass the tests that were pointed out to be failing 
earlier.


Noticed that I copy-pasted pqSocketPoll() into the psql code. I think 
this may be controversial. Not sure what the best solution to the issue 
is. I will pay attention to the buildfarm animals when they pick this 
up.


--
Tristan Partin
Neon (https://neon.tech)
From 1b4303df1200c561cf478f9c7037ff940f6cd741 Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Mon, 24 Jul 2023 11:12:59 -0500
Subject: [PATCH v4] Allow SIGINT to cancel psql database reconnections

After installing the SIGINT handler in psql, SIGINT can no longer cancel
database reconnections. For instance, if the user starts a reconnection
and then needs to do some form of interaction (ie psql is polling),
there is no way to cancel the reconnection process currently.

Use PQconnectStartParams() in order to insert a CancelRequested check
into the polling loop.
---
 src/bin/psql/command.c | 129 -
 1 file changed, 128 insertions(+), 1 deletion(-)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 82cc091568..588eed9351 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -11,6 +11,11 @@
 #include 
 #include 
 #include 
+#if HAVE_POLL
+#include 
+#else
+#include 
+#endif
 #ifndef WIN32
 #include 			/* for stat() */
 #include 			/* for setitimer() */
@@ -40,6 +45,7 @@
 #include "large_obj.h"
 #include "libpq-fe.h"
 #include "libpq/pqcomm.h"
+#include "libpq/pqsignal.h"
 #include "mainloop.h"
 #include "portability/instr_time.h"
 #include "pqexpbuffer.h"
@@ -3251,6 +3257,90 @@ copy_previous_query(PQExpBuffer query_buf, PQExpBuffer previous_buf)
 	return false;
 }
 
+/*
+ * Check a file descriptor for read and/or write data, possibly waiting.
+ * If neither forRead nor forWrite are set, immediately return a timeout
+ * condition (without waiting).  Return >0 if condition is met, 0
+ * if a timeout occurred, -1 if an error or interrupt occurred.
+ *
+ * Timeout is infinite if end_time is -1.  Timeout is immediate (no blocking)
+ * if end_time is 0 (or indeed, any time before now).
+ */
+static int
+pqSocketPoll(int sock, int forRead, int forWrite, time_t end_time)
+{
+	/* We use poll(2) if available, otherwise select(2) */
+#ifdef HAVE_POLL
+	struct pollfd input_fd;
+	int			timeout_ms;
+
+	if (!forRead && !forWrite)
+		return 0;
+
+	input_fd.fd = sock;
+	input_fd.events = POLLERR;
+	input_fd.revents = 0;
+
+	if (forRead)
+		input_fd.events |= POLLIN;
+	if (forWrite)
+		input_fd.events |= POLLOUT;
+
+	/* Compute appropriate timeout interval */
+	if (end_time == ((time_t) -1))
+		timeout_ms = -1;
+	else
+	{
+		time_t		now = time(NULL);
+
+		if (end_time > now)
+			timeout_ms = (end_time - now) * 1000;
+		else
+			timeout_ms = 0;
+	}
+
+	return poll(&input_fd, 1, timeout_ms);
+#else			/* !HAVE_POLL */
+
+	fd_set		input_mask;
+	fd_set		output_mask;
+	fd_set		except_mask;
+	struct timeval timeout;
+	struct timeval *ptr_timeout;
+
+	if (!forRead && !forWrite)
+		return 0;
+
+	FD_ZERO(&input_mask);
+	FD_ZERO(&output_mask);
+	FD_ZERO(&except_mask);
+	if (forRead)
+		FD_SET(sock, &input_mask);
+
+	if (forWrite)
+		FD_SET(sock, &output_mask);
+	FD_SET(sock, &except_mask);
+
+	/* Compute appropriate timeout interval */
+	if (end_time == ((time_t) -1))
+		ptr_timeout = NULL;
+	else
+	{
+		time_t		now = time(NULL);
+
+		if (end_time > now)
+			timeout.tv_sec = end_time - now;
+		else
+			timeout.tv_sec = 0;
+		timeout.tv_usec = 0;
+		ptr_timeout = &timeout;
+	}
+
+	return select(sock + 1, &input_mask, &output_mask,
+				  &except_mask, ptr_timeout);
+#endif			/* HAVE_POLL */
+}
+
 /*
  * Ask the user for a password; 'username' is the username the
  * password is for, if one has been explicitly specified.
@@ -3324,6 +3414,7 @@ do_connect(enum trivalue reuse_previous_specification,
 	bool		keep_password = 

Re: [HACKERS] Changing references of password encryption to hashing

2023-11-22 Thread Bruce Momjian
On Wed, Nov 22, 2023 at 12:52:23PM -0800, Andres Freund wrote:
> On 2023-11-21 22:43:48 -0500, Bruce Momjian wrote:
> > Is there any interest in fixing our documentation that says encrypted
> > when it means hashed?  Should I pursue this?
> 
> What's the point of randomly reviving threads from 6 years ago, without any
> further analysis?

Well, I feel like this is an important change; it got dropped because it
was determined not to be a new change in Postgres 10.  Still, I think it
should be addressed, and I am looking to see if others feel the same.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Change GUC hashtable to use simplehash?

2023-11-22 Thread Tom Lane
Andres Freund  writes:
> On 2023-11-22 15:56:21 -0500, Tom Lane wrote:
>> GUC names are just about always short, though, so I'm not sure you've
>> made your point?

> With short I meant <= 6 characters (32 / 5 = 6.x). After that you're
> overwriting bits that you previously set, without dispersing the "overwritten"
> bits throughout the hash state.

I'm less than convinced about the "overwrite" part:

+   /* Merge into hash ... not very bright, but it needn't be */
+   result = pg_rotate_left32(result, 5);
+   result ^= (uint32) ch;

Rotating a 32-bit value 5 bits at a time doesn't result in successive
characters lining up exactly, and even once they do, XOR is not
"overwrite".  I'm pretty dubious that we need something better than this.

regards, tom lane




Re: Change GUC hashtable to use simplehash?

2023-11-22 Thread Andres Freund
Hi,

On 2023-11-22 15:56:21 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2023-11-21 16:42:55 +0700, John Naylor wrote:
> >> The strlen call required for hashbytes() is not free. The lack of
> >> mixing in the (probably inlined after 0001) previous hash function can
> >> remedied directly, as in the attached:
>
> > I doubt this is a good hashfunction. For short strings, sure, but after
> > that...  I don't think it makes sense to reduce the internal state of a hash
> > function to something this small.
>
> GUC names are just about always short, though, so I'm not sure you've
> made your point?

With short I meant <= 6 characters (32 / 5 = 6.x). After that you're
overwriting bits that you previously set, without dispersing the "overwritten"
bits throughout the hash state.

It's pretty easy to create conflicts this way, even just on paper. E.g. I
think abcdefgg and cbcdefgw would have the same hash, because the accumulated
value passed to murmurhash32 is the same.

The fact that this happens when a large part of the string is the same
is bad, because it makes it more likely that prefixed strings trigger such
conflicts, and they're obviously common with GUC strings.
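
A quick standalone check of that example, reproducing the rotate-by-5/XOR
accumulation from the patch, does show both strings reaching the same 32-bit
value before the finalizer, so they collide after murmurhash32() as well:

#include <stdio.h>
#include <stdint.h>

static uint32_t
accum(const char *name)
{
	uint32_t	result = 0;

	while (*name)
	{
		char		ch = *name++;

		result = (result << 5) | (result >> 27);	/* rotate left by 5 */
		result ^= (uint32_t) ch;
	}
	return result;
}

int
main(void)
{
	/* prints the same value twice */
	printf("%08x %08x\n", accum("abcdefgg"), accum("cbcdefgw"));
	return 0;
}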

Greetings,

Andres Freund




Re: Partial aggregates pushdown

2023-11-22 Thread Bruce Momjian
On Wed, Nov 22, 2023 at 10:16:16AM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> 2. Approach 2
> (1) Advantages
> (a) No need to add partial aggregate functions to the catalogs for each 
> aggregation
> (2) Disadvantages
> (a) Need to add non-standard keywords to the SQL syntax.
> 
> I did not choose Approach2 because I was not confident that the disadvantage 
> mentioned in 2.(2)(a)
> would be accepted by the PostgreSQL development community.
> If it is accepted, I think Approach 2 is smarter.
> Could you please provide your opinion on which
> approach is preferable after comparing these two approaches?

I didn't know #2 was possible, but given the great number of catalog
entries, doing it in the SQL grammar seems cleaner to me.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: CRC32C Parallel Computation Optimization on ARM

2023-11-22 Thread Nathan Bossart
On Wed, Nov 22, 2023 at 10:16:44AM +, Xiang Gao wrote:
> On Date: Fri, 10 Nov 2023 10:36:08AM -0600, Nathan Bossart wrote:
>>+__attribute__((target("+crc+crypto")))
>>
>>I'm not sure we can assume that all compilers will understand this, and I'm
>>not sure we need it.
> 
> CFLAGS_CRC is "-march=armv8-a+crc". Generally, if -march is supported,
> __attribute__ is also supported.

IMHO we should stick with CFLAGS_CRC for now.  If we want to switch to
using __attribute__((target("..."))), I think we should do so in a separate
patch.  We are cautious about checking the availability of an attribute
before using it (see c.h), and IIUC we'd need to verify that this works for
all supported compilers that can target ARM before removing CFLAGS_CRC
here.

> In addition, I am not sure about the source file pg_crc32c_armv8.c, if
> CFLAGS_CRC and CFLAGS_CRYPTO are needed at the same time, how should it
> be expressed in the makefile?

pg_crc32c_armv8.o: CFLAGS += ${CFLAGS_CRC} ${CFLAGS_CRYPTO}

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: psql not responding to SIGINT upon db reconnection

2023-11-22 Thread Heikki Linnakangas

On 22/11/2023 19:29, Tristan Partin wrote:

On Thu Nov 16, 2023 at 8:33 AM CST, Heikki Linnakangas wrote:

On 06/11/2023 19:16, Tristan Partin wrote:

That sounds like a much better solution. Attached you will find a v4
that implements your suggestion. Please let me know if there is
something that I missed. I can confirm that the patch works.


This patch is missing a select(). It will busy loop until the connection
is established or cancelled.


If I add a wait (select, poll, etc.), then I can't control-C during the
blocking call, so it doesn't really solve the problem.


Hmm, they should return with EINTR on signal. At least on Linux; I'm not 
sure how portable that is. See signal(7) man page, section "Interruption 
of system calls and library functions by signal handlers". You could 
also use a timeout like 5 s to ensure that you wake up and notice that 
the signal was received eventually, even if it doesn't interrupt the 
blocking call.
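
As a rough sketch of that shape (hypothetical, not taken from the patch; the
events to wait for would really come from PQconnectPoll()'s
PGRES_POLLING_READING/WRITING result, and conn is the PGconn being
established):

for (;;)
{
	struct pollfd pfd;
	int			rc;

	if (cancel_pressed)
		break;					/* user hit Ctrl+C */

	pfd.fd = PQsocket(conn);
	pfd.events = POLLIN | POLLOUT;
	pfd.revents = 0;

	/* wake up at least every 5 seconds even if no signal interrupts us */
	rc = poll(&pfd, 1, 5 * 1000);
	if (rc < 0 && errno != EINTR)
		break;					/* real error */
	if (rc > 0)
		break;					/* socket ready: call PQconnectPoll() again */
}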


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Change GUC hashtable to use simplehash?

2023-11-22 Thread Tom Lane
Andres Freund  writes:
> On 2023-11-21 16:42:55 +0700, John Naylor wrote:
>> The strlen call required for hashbytes() is not free. The lack of
>> mixing in the (probably inlined after 0001) previous hash function can
>> remedied directly, as in the attached:

> I doubt this is a good hashfunction. For short strings, sure, but after
> that...  I don't think it makes sense to reduce the internal state of a hash
> function to something this small.

GUC names are just about always short, though, so I'm not sure you've
made your point?  At worst, maybe this with 64-bit state instead of 32?

regards, tom lane




Re: [HACKERS] Changing references of password encryption to hashing

2023-11-22 Thread Andres Freund
On 2023-11-21 22:43:48 -0500, Bruce Momjian wrote:
> Is there any interest in fixing our documentation that says encrypted
> when it means hashed?  Should I pursue this?

What's the point of randomly reviving threads from 6 years ago, without any
further analysis?




Re: Change GUC hashtable to use simplehash?

2023-11-22 Thread Andres Freund
Hi,

On 2023-11-21 16:42:55 +0700, John Naylor wrote:
> I get a noticeable regression in 0002, though, and I think I see why:
> 
>  guc_name_hash(const char *name)
>  {
> - uint32 result = 0;
> + const unsigned char *bytes = (const unsigned char *)name;
> + int  blen  = strlen(name);
> 
> The strlen call required for hashbytes() is not free. The lack of
> mixing in the (probably inlined after 0001) previous hash function can
> remedied directly, as in the attached:

I doubt this is a good hashfunction. For short strings, sure, but after
that...  I don't think it makes sense to reduce the internal state of a hash
function to something this small.

Greetings,

Andres Freund




Re: remaining sql/json patches

2023-11-22 Thread Andres Freund
Hi,

On 2023-11-21 12:52:35 +0900, Amit Langote wrote:
> version     gram.o text bytes  %change  gram.c bytes  %change
>
> 9.6         534010             -        2108984       -
> 10          582554             9.09     2258313       7.08
> 11          584596             0.35     2313475       2.44
> 12          590957             1.08     2341564       1.21
> 13          590381             -0.09    2357327       0.67
> 14          600707             1.74     2428841       3.03
> 15          633180             5.40     2495364       2.73
> 16          653464             3.20     2575269       3.20
> 17-sqljson  672800             2.95     2709422       3.97
>
> So if we put SQL/JSON (including JSON_TABLE()) into 17, we end up with a 
> gram.o 2.95% larger than v16, which granted is a somewhat larger bump, though 
> also smaller than with some of recent releases.

I think it's ok to increase the size if it's necessary increases - but I also
think we've been a bit careless at times, and that that has made the parser
slower.  There's probably also some "infrastructure" work we could do combat
some of the growth too.

I know I triggered the use of the .c bytes and text size, but it'd probably
more sensible to look at the size of the important tables generated by bison.
I think the most relevant defines are:

#define YYLAST   117115
#define YYNTOKENS  521
#define YYNNTS  707
#define YYNRULES  3300
#define YYNSTATES  6255
#define YYMAXUTOK   758


I think a lot of the reason we end up with such a big "state transition" space
is that a single addition to e.g. col_name_keyword or unreserved_keyword
increases the state space substantially, because it adds new transitions to so
many places. We're in quadratic territory, I think.  We might be able to do
some lexer hackery to avoid that, but not sure.

Greetings,

Andres Freund




Re: Parallel CREATE INDEX for BRIN indexes

2023-11-22 Thread Tomas Vondra
On 11/20/23 20:48, Matthias van de Meent wrote:
> On Wed, 8 Nov 2023 at 12:03, Tomas Vondra  
> wrote:
>>
>> Hi,
>>
>> here's an updated patch, addressing the review comments, and reworking
>> how the work is divided between the workers & leader etc.
>>
>> 0001 is just v2, rebased to current master
>>
>> 0002 and 0003 address most of the issues, in particular it
>>
>>   - removes the unnecessary spool
>>   - fixes bs_reltuples type to double
>>   - a couple comments are reworded to be clearer
>>   - changes loop/condition in brinbuildCallbackParallel
>>   - removes asserts added for debugging
>>   - fixes cast in comparetup_index_brin
>>   - 0003 then simplifies comparetup_index_brin
>>   - I haven't inlined the tuplesort_puttuple_common parameter
>> (didn't seem worth it)
> 
> OK, thanks
> 
>> 0004 Reworks how the work is divided between workers and combined by the
>> leader. It undoes the tableam.c changes that attempted to divide the
>> relation into chunks matching the BRIN ranges, and instead merges the
>> results in the leader (using the BRIN "union" function).
> 
> That's OK.
> 
>> I haven't done any indentation fixes yet.
>>
>> I did fairly extensive testing, using pageinspect to compare indexes
>> built with/without parallelism. More testing is needed, but it seems to
>> work fine (with other opclasses and so on).
> 
> After code-only review, here are some comments:
> 
>> +++ b/src/backend/access/brin/brin.c
>> [...]
>> +/* Magic numbers for parallel state sharing */
>> +#define PARALLEL_KEY_BRIN_SHAREDUINT64CONST(0xA001)
>> +#define PARALLEL_KEY_TUPLESORTUINT64CONST(0xA002)
> 
> These shm keys use the same constants also in use in
> access/nbtree/nbtsort.c. While this shouldn't be an issue in normal
> operations, I'd prefer if we didn't actively introduce conflicting
> identifiers when we still have significant amounts of unused values
> remaining.
> 

Hmmm. Is there some rule of thumb how to pick these key values? I see
nbtsort.c uses 0xA prefix, execParallel.c uses 0xE, while parallel.c
ended up using 0xFFFFFFFFFFFF as prefix. I've used 0xB, simply because
BRIN also starts with B.

>> +#define PARALLEL_KEY_QUERY_TEXTUINT64CONST(0xA003)
> 
> This is the fourth definition of a PARALLEL%_KEY_QUERY_TEXT, the
> others being in access/nbtree/nbtsort.c (value 0xA004, one
> more than brin's), backend/executor/execParallel.c
> (0xE008), and PARALLEL_VACUUM_KEY_QUERY_TEXT (0x3) (though
> I've not checked that their uses are exactly the same, I'd expect at
> least btree to match mostly, if not fully, 1:1).
> I think we could probably benefit from a less ad-hoc sharing of query
> texts. I don't think that needs to happen specifically in this patch,
> but I think it's something to keep in mind in future efforts.
> 

I'm afraid I don't quite get what you mean by "ad hoc sharing of query
texts". Are you saying we shouldn't propagate the query text to the
parallel workers? Why? Or what's the proper solution?

>> +_brin_end_parallel(BrinLeader *brinleader, BrinBuildState *state)
>> [...]
>> +BrinSpool  *spool = state->bs_spool;
>> [...]
>> +if (!state)
>> +return;
> 
> I think the assignment to spool should be moved to below this
> condition, as _brin_begin_parallel calls this with state=NULL when it
> can't launch parallel workers, which will cause issues here.
> 

Good catch! I wonder if we have tests that might trigger this, say by
setting max_parallel_maintenance_workers > 0 while no workers allowed.

>> +state->bs_numtuples = brinshared->indtuples;
> 
> My IDE complains about bs_numtuples being an integer. This is a
> pre-existing issue, but still valid: we can hit an overflow on tables
> with pages_per_range=1 and relsize >= 2^31 pages. Extremely unlikely,
> but problematic nonetheless.
> 

True. I think I've been hesitant to make this a double because it seems
a bit weird to do +1 with a double, and at some point (d == d+1). But
this seems safe, we're guaranteed to be far away from that threshold.

>> + * FIXME This probably needs some memory management fixes - we're 
>> reading
>> + * tuples from the tuplesort, we're allocating an emty tuple, and so on.
>> + * Probably better to release this memory.
> 
> This should probably be resolved.
> 

AFAICS that comment is actually inaccurate/stale, sorry about that. The
code actually allocates (and resets) a single memtuple, and also
emptyTuple. So I think that's OK, I've removed the comment.

> I also noticed that this is likely to execute `union_tuples` many
> times when pages_per_range is coprime with the parallel table scan's
> block stride (or when we for other reasons have many tuples returned
> for each range); and this `union_tuples` internally allocates and
> deletes its own memory context for its deserialization of the 'b'
> tuple. I think we should just pass a scratch context instead, so that
> we don't have the overhead of 

Re: autovectorize page checksum code included elsewhere

2023-11-22 Thread Nathan Bossart
On Wed, Nov 22, 2023 at 02:54:13PM +0200, Ants Aasma wrote:
> On Wed, 22 Nov 2023 at 11:44, John Naylor  wrote:
>> Poking in those files a bit, I also see references to building with
>> SSE 4.1. Maybe that's an avenue that we should pursue? (an indirect
>> function call is surely worth it for page-sized data)

Yes, I think we should, but we also need to be careful not to hurt
performance on platforms that aren't able to benefit [0] [1].

There are a couple of other threads about adding support for newer
instructions [2] [3], and properly detecting the availability of these
instructions seems to be a common obstacle.  We have a path forward for
stuff that's already using a runtime check (e.g., CRC32C), but I think
we're still trying to figure out what to do for things that must be inlined
(e.g., simd.h).

One half-formed idea I have is to introduce some sort of ./configure flag
that enables all the newer instructions that your CPU understands.  It
would also remove any existing runtime checks.  This option would make it
easy to take advantage of the newer instructions if you are building
Postgres for only your machine (or others just like it).
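
To illustrate (the option name below is made up, not an existing switch):

    ./configure --with-native-cpu
    # roughly: build everything as if CFLAGS contained -march=native, and
    # define a symbol so paths like the CRC32C code drop their runtime
    # dispatch and call the native implementation directly.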

> For reference, executing the page checksum 10M times on a AMD 3900X CPU:
> 
> clang-14 -O2                   4.292s  (17.8 GiB/s)
> clang-14 -O2 -msse4.1          2.859s  (26.7 GiB/s)
> clang-14 -O2 -msse4.1 -mavx2   1.378s  (55.4 GiB/s)

Nice.  I've noticed similar improvements with AVX2 intrinsics in simd.h.

[0] https://postgr.es/m/2613682.1698779776%40sss.pgh.pa.us
[1] https://postgr.es/m/36329.1699325578%40sss.pgh.pa.us
[2] 
https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com
[3] 
https://postgr.es/m/db9pr08mb6991329a73923bf8ed4b3422f5...@db9pr08mb6991.eurprd08.prod.outlook.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: How to stop autovacuum silently

2023-11-22 Thread Peter Geoghegan
On Wed, Nov 22, 2023 at 8:18 AM Maxim Orlov  wrote:
> Recently, one of our customers had reported a not working autovacuum.  After 
> a minor investigation, I've found that
> autovacuum launcher did, actually, run vacuum as expected, but with no 
> results.  At the same time, no warnings or
> other anomies were present in the logs.

Are you aware of commit e83ebfe6d7, which added a similar WARNING at
the point when VACUUM overwrites a relfrozenxid/relminmxid "from the
future"? It's a recent one.

> At first, I've thought may be statistics is broken, thus vacuum is not 
> working as expected.  But in fact, something
> more interesting is had happened.

Was pg_upgrade even run against this database? My guess is that the
underlying problem was caused by the bug fixed by commit 74cf7d46.

-- 
Peter Geoghegan




Re: psql not responding to SIGINT upon db reconnection

2023-11-22 Thread Tristan Partin

On Thu Nov 16, 2023 at 8:33 AM CST, Heikki Linnakangas wrote:

On 06/11/2023 19:16, Tristan Partin wrote:
>>> That sounds like a much better solution. Attached you will find a v4
>>> that implements your suggestion. Please let me know if there is
>>> something that I missed. I can confirm that the patch works.

This patch is missing a select(). It will busy loop until the connection 
is established or cancelled.


If I add a wait (select, poll, etc.), then I can't control-C during the 
blocking call, so it doesn't really solve the problem. On Linux, we have 
signalfds which seem like the perfect solution to this problem, "wait on 
the pq socket or SIGINT." But that doesn't translate well to other 
operating systems :(.



tristan957=> \c
NOTICE:  Welcome to Neon!
Authenticate by visiting:
https://console.neon.tech/psql_session/


^CTerminated


You can see here that I can't terminate the command. Where you see 
"Terminated" is me running `kill $(pgrep psql)` in another terminal.


Shouldn't we also clear CancelRequested after we have cancelled the 
attempt? Otherwise, any subsequent attempts will immediately fail too.


After switching to cancel_pressed, I don't think so. See this bit of 
code in the psql main loop:



/* main loop to get queries and execute them */
while (successResult == EXIT_SUCCESS)
{
/*
 * Clean up after a previous Control-C
 */
if (cancel_pressed)
{
if (!pset.cur_cmd_interactive)
{
/*
 * You get here if you stopped a script with Ctrl-C.
 */
successResult = EXIT_USER;
break;
}

cancel_pressed = false;
}


Should we use 'cancel_pressed' here rather than CancelRequested? To be 
honest, I don't understand the difference, so that's a genuine question. 
There was an attempt at unifying them in the past but it was reverted in 
commit 5d43c3c54d.


The more I look at this, the more I don't understand... I need to look 
at this harder, but wanted to get this email out. Switched to 
cancel_pressed though. Thanks for pointing it out. Going to wait to send 
a v2 while more discussion occurs.


--
Tristan Partin
Neon (https://neon.tech)




Re: [ psql - review request ] review request for \d+ tablename, \d+ indexname indenting

2023-11-22 Thread Daniel Verite
Shlok Kyal wrote:

> > The error was corrected and a new diff file was created.
> > The diff file was created based on 16 RC1.
> > We confirmed that 5 places where errors occurred when performing
> > make check were changed to ok.

Reviewing the patch, I see these two problems in the current version
(File: psql-slashDplus-partition-indentation.diff, Date: 2023-09-19 00:19:34) 

* There are changes in the regression tests that do not concern this
feature and should not be there.

For instance this hunk:

--- a/src/test/regress/expected/foreign_data.out
+++ b/src/test/regress/expected/foreign_data.out
@@ -742,8 +742,6 @@ COMMENT ON COLUMN ft1.c1 IS 'ft1.c1';
 Check constraints:
 "ft1_c2_check" CHECK (c2 <> ''::text)
 "ft1_c3_check" CHECK (c3 >= '01-01-1994'::date AND c3 <=
'01-31-1994'::date)
-Not-null constraints:
-"ft1_c1_not_null" NOT NULL "c1"
 Server: s0
 FDW options: (delimiter ',', quote '"', "be quoted" 'value')

It seems to undo a test for a recent feature adding "Not-null
constraints" to \d, which suggests that you've been running tests
against an older version than the source tree you're diffing
against. These should be the same version, and also the latest
one (git HEAD) or as close as possible to the latest when the
patch is submitted.

* The new query with \d on partitioned tables does not work with
  Postgres servers 12 or 13:


postgres=# CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktempint,
unitsales   int
) PARTITION BY RANGE (logdate);

postgres=# \d measurement 
ERROR:  syntax error at or near "."
LINE 2: ...  0 AS level,c.relkind, false AS i.inhdetach...


Setting the CommitFest status to WoA.


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




How to stop autovacuum silently

2023-11-22 Thread Maxim Orlov
Hi!

Recently, one of our customers reported a non-working autovacuum.  After
a brief investigation, I found that the autovacuum launcher did, in fact,
run vacuum as expected, but with no results.  At the same time, no
warnings or other anomalies were present in the logs.

At first, I thought the statistics might be broken, and that vacuum was
therefore not working as expected.  But in fact, something more
interesting had happened.

The pg_class.relfrozenxid was set to some rubbish value from the future,
i.e. broken, in the template1 DB, so any new database will have it broken
too.  Then, we create a "blocker" DB, and in vac_update_datfrozenxid() we
get the "bogus" (from the future) value of relfrozenxid and *silently*
return.  Any other newly created DB will not be autovacuumed.

Funny, but from the perspective of a DBA, this looks like autovacuum has
stopped working for no reason, although all the criteria for its launch
are clearly met.

AFAICS, there are several solutions for this state:
 - run vacuumdb for all DB's
 - manually update broken pg_class.relfrozenxid
 - lowering of autovacuum_freeze_max_age to trigger prevent of transaction
ID wraparound

I do understand that this behaviour can hardly be described as a bug of
some sort, but could we at least emit a useful message to help clarify
what is going on here?

=== REPRODUCE ===
$ cat <<EOF >> pgsql/data/postgresql.conf
autovacuum_naptime = 1s
autovacuum_freeze_max_age = 10
EOF
$ ./pgsql/bin/pg_ctl -D pgsql/data -l pgsql/logfile start
waiting for server to start done
server started
$ ./pgsql/bin/psql postgres
psql (17devel)
Type "help" for help.

postgres=# \c template1
You are now connected to database "template1" as user "orlov".
template1=# update pg_class set relfrozenxid='20' where oid = 1262;
UPDATE 1
template1=# do $$

begin

  while 12 - txid_current()::text::int8 > 0 loop

  commit;

end loop;

  end $$;
DO
template1=# create database blocker;
CREATE DATABASE
template1=# create database foo;
CREATE DATABASE
template1=# \c foo
You are now connected to database "foo" as user "orlov".
foo=# create table bar(baz int);
CREATE TABLE
foo=# insert into bar select bar from generate_series(1, 8192) bar;
INSERT 0 8192
foo=# update bar set baz=baz;
UPDATE 8192
foo=# select relname, n_tup_ins, n_tup_upd, n_tup_del, n_live_tup,
n_dead_tup, last_vacuum, last_autovacuum, autovacuum_count
  from
pg_stat_user_tables where relname = 'bar';
 relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup |
last_vacuum | last_autovacuum | autovacuum_count
-+---+---+---+++-+-+--
 bar |  8192 |  8192 | 0 |   8192 |   8192 |
  | |0
(1 row)

foo=# update bar set baz=baz;
UPDATE 8192
foo=# select relname, n_tup_ins, n_tup_upd, n_tup_del, n_live_tup,
n_dead_tup, last_vacuum, last_autovacuum, autovacuum_count
  from
pg_stat_user_tables where relname = 'bar';
 relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup |
last_vacuum | last_autovacuum | autovacuum_count
-+---+---+---+++-+-+--
 bar |  8192 | 16384 | 0 |   8192 |  16384 |
  | |0
(1 row)

... and so on

-- 
Best regards,
Maxim Orlov.


0001-Add-warning-if-datfrozenxid-or-datminmxid-is-not-set.patch
Description: Binary data


Re: initdb --no-locale=C doesn't work as specified when the environment is not C

2023-11-22 Thread Tom Lane
Kyotaro Horiguchi  writes:
> Commit 3e51b278db leaves lc_* conf lines as commented-out when
> their value is "C". This leads to the following behavior.

Hmm ... I see a contributing factor here: this bit in
postgresql.conf.sample is a lie:

#lc_messages = 'C'  # locale for system error message
# strings

A look in guc_tables.c shows that the actual default is '' (empty
string), which means "use the environment", and that matches how the
variable is documented in config.sgml.  Somebody --- quite possibly me
--- was misled by the contents of postgresql.conf.sample into thinking
that the lc_xxx GUCs all default to C, when that's only true for the
others.

I think that a more correct fix for this would treat lc_messages
differently from the other lc_xxx GUCs.  Maybe just eliminate the
hack about not substituting "C" for that one?

In any case, we need to fix this mistake in postgresql.conf.sample.
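
Concretely, if we keep the current GUC default, the sample entry should
presumably look more like this (a sketch; exact comment wording TBD):

#lc_messages = ''			# locale for system error message
					# strings; '' means use the environment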

regards, tom lane




Re: Adding facility for injection points (or probe points?) for more advanced tests

2023-11-22 Thread Ashutosh Bapat
On Tue, Nov 21, 2023 at 6:56 AM Michael Paquier  wrote:
>
> >> This facility is hidden behind a specific configure/Meson switch,
> >> making it a no-op by default:
> >> --enable-injection-points
> >> -Dinjection_points={ true | false }
> >
> > That's useful, but we will also see demand to enable those in
> > production (under controlled circumstances). But let the functionality
> > mature under a separate flag and developer builds before used in
> > production.
>
> Hmm.  Okay.  I'd still keep that under a compile switch for now
> anyway to keep a cleaner separation and avoid issues where it would be
> possible to inject code in a production build.  Note that I was
> planning to switch one of my buildfarm animals to use it should this
> stuff make it into the tree.  And people would be free to use it if
> they want.  If in production, that would be a risk, IMO.

makes sense. Just to be clear - by "in production" I mean user
installations - they may be testing environments or production
environments.

>
> > +/*
> > + * Local cache of injection callbacks already loaded, stored in
> > + * TopMemoryContext.
> > + */
> > +typedef struct InjectionPointArrayEntry
> > +{
> > + char name[INJ_NAME_MAXLEN];
> > + InjectionPointCallback callback;
> > +} InjectionPointArrayEntry;
> > +
> > +static InjectionPointArrayEntry *InjectionPointCacheArray = NULL;
> >
> > Initial size of this array is small, but given the size of code in a
> > given path to be tested, I fear that this may not be sufficient. I
> > feel it's better to use hash table whose APIs are already available.
>
> I've never seen in recent years a need for a given test to use more
> than 4~5 points.  But I guess that you've seen more than that wanted
> in a prod environment with a fork of Postgres?

A given case may not require more than 4 -5 points but users may
create scripts with many frequently required injection points and
install handlers for them.

>
> I'm OK to switch that to use a hash table in the initial
> implementation, even for a low number of entries with points that are
> not in hot code paths that should be OK.  At least for my cases
> related to testing that's OK.  Am I right to assume that you mean a
> HTAB in TopMemoryContext?

Yes.
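
Something along these lines, I imagine (just a sketch; InjectionPointCache is
a placeholder name, and the entry struct is the one from the quoted patch):

HASHCTL		ctl;

ctl.keysize = INJ_NAME_MAXLEN;
ctl.entrysize = sizeof(InjectionPointArrayEntry);
ctl.hcxt = TopMemoryContext;

InjectionPointCache = hash_create("InjectionPoint cache",
								  16,	/* initial size, grows as needed */
								  &ctl,
								  HASH_ELEM | HASH_STRINGS | HASH_CONTEXT);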

>
> > I am glad that it covers the frequently needed injections error, notice and
> > wait. If someone adds multiple injection points for wait and all of them are
> > triggered (in different backends), test_injection_points_wake() would
> > wake them all. When debugging cascaded functionality it's not easy to
> > follow sequence add, trigger, wake for multiple injections. We should
> > associate a conditional variable with the required injection points. Attach
> > should create conditional variable in the shared memory for that injection
> > point and detach should remove it. I see this being mentioned in the commit
> > message, but I think it's something we need in the first version of "wait" 
> > to
> > be useful.
>
> More to the point, I actually disagree with that, because it could be
> possible as well that the same condition variable is used across
> multiple points.  At the end, IMHO, the central hash table should hold
> zero meta-data associated to an injection point like a counter, an
> elog, a condition variable, a sleep time, etc. or any combination of
> all these ones, and should just know about how to load a callback with
> a library path and a routine name.  I understand that this is leaving
> the responsibility of what a callback should do down to the individual
> developer implementing a callback into their own extension, where they
> should be free to have conditions of their own.
>
> Something that I agree would be very useful for the in-core APIs,
> depending on the cases, is to be able to push some information to the
> callback at runtime to let a callback decide what to do depending on a
> running state, including a condition registered when a point was
> attached.  See my argument about an _ARG macro that passes down to the
> callback a (void *).

The injection_run function is called from the place where the
injection point is declared, but that place does not know which
injection function is going to be run, so a user cannot pass
arguments at the injection declaration. Which injection to run is
decided by injection_attach, and thus one can pass arguments to it,
but then injection_attach stores the information in shared memory,
from where it's picked up by injection_run. So even though you don't
want to store the arguments in shared memory, you are creating a
design which eventually takes us in that direction - otherwise users
will end up writing many injection functions, one for each possible
combination of count, sleep, condition variable, etc. But let's see
whether that happens to be the case in practice. We will need to
evolve this feature based on usage.

>
> > Those tests need to also install extension. That's another pain point.
> > So anyone wants to run the tests 

Re: [PATCH] Add CHECK_FOR_INTERRUPTS in scram_SaltedPassword loop.

2023-11-22 Thread Tom Lane
Daniel Gustafsson  writes:
>> On 22 Nov 2023, at 14:30, Aleksander Alekseev  
>> wrote:
>> It sort of makes sense. I wonder though if we should limit the maximum
>> number of iterations instead. If somebody specified 1_000_000+
>> iteration this could also indicate a user error.

> I don't think it would be useful to limit this at an arbitrary point, 
> iteration
> count can be set per password and if someone want a specific password to be
> super-hard to brute force then why should we limit that?

Maybe because it could be used to construct a DOS scenario?  In
particular, since CHECK_FOR_INTERRUPTS doesn't work on the frontend
side, a situation like this wouldn't be interruptible there.

I agree with Aleksander that such cases are much more likely to
indicate user error than anything else.

regards, tom lane




Re: [PATCH] Add CHECK_FOR_INTERRUPTS in scram_SaltedPassword loop.

2023-11-22 Thread Daniel Gustafsson
> On 22 Nov 2023, at 14:30, Aleksander Alekseev  
> wrote:
> 
> Hi,
> 
>> When the scram_iterations value is set too large, the backend would hang for
>> a long time.  And we can't use Ctrl+C to cancel this query, cause the loop 
>> don't
>> process signal interrupts.
>> 
>> Add CHECK_FOR_INTERRUPTS within the loop of scram_SaltedPassword
>> to handle any signals received during this period may be a good choice.
>> 
>> I wrote a patch to solve this problem. What's your suggestions?
> 
> Thanks for the patch.
> 
> It sort of makes sense. I wonder though if we should limit the maximum
> number of iterations instead. If somebody specified 1_000_000+
> iteration this could also indicate a user error.

I don't think it would be useful to limit this at an arbitrary point, iteration
count can be set per password and if someone want a specific password to be
super-hard to brute force then why should we limit that?

> If we want to add CHECK_FOR_INTERRUPTS inside the loop I think a brief
> comment would be appropriate.

Agreed, it would be helpful.

--
Daniel Gustafsson





Re: How to accurately determine when a relation should use local buffers?

2023-11-22 Thread Aleksander Alekseev
Hi,

> I would propose not to associate temporary relations with local buffers

The whole point of why local buffers exist is to place the buffers of
temp tables into MemoryContexts so that these tables will not fight
for the locks for shared buffers with the rest of the system. If we
start treating them as regular tables this will cause a severe
performance degradation. I doubt that such a patch will make it.

I sort of suspect that you are working on a very specific extension
and/or feature for PG fork. Any chance you could give us more details
about the case?

-- 
Best regards,
Aleksander Alekseev




Re: [PATCH] Add CHECK_FOR_INTERRUPTS in scram_SaltedPassword loop.

2023-11-22 Thread Aleksander Alekseev
Hi,

> When the scram_iterations value is set too large, the backend would hang for
> a long time.  And we can't use Ctrl+C to cancel this query, cause the loop 
> don't
> process signal interrupts.
>
> Add CHECK_FOR_INTERRUPTS within the loop of scram_SaltedPassword
> to handle any signals received during this period may be a good choice.
>
> I wrote a patch to solve this problem. What's your suggestions?

Thanks for the patch.

It sort of makes sense. I wonder though if we should limit the maximum
number of iterations instead. If somebody specified 1_000_000+
iteration this could also indicate a user error.

If we want to add CHECK_FOR_INTERRUPTS inside the loop I think a brief
comment would be appropriate.

-- 
Best regards,
Aleksander Alekseev




Re: remaining sql/json patches

2023-11-22 Thread Amit Langote
On Wed, Nov 22, 2023 at 3:09 PM Amit Langote  wrote:
> The last line in the chart I sent in the last email now look like this:
>
> 17-sqljson  670262 2.57 2640912   1.34
>
> meaning the gram.o text size changes by 2.57% as opposed to 2.97%
> before your fixes.

Andrew asked off-list what the percent increase is compared to 17dev
HEAD.  It's 1.27% (was 1.66% with the previous version).

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: Change GUC hashtable to use simplehash?

2023-11-22 Thread John Naylor
I wrote:

> Thinking some more, I'm not quite comfortable with the number of
> places in these patches that have to know about the pre-downcased
> strings, or whether we need that in the first place. If lower case is
> common enough to optimize for, it seems the equality function can just
> check strict equality on the char and only on mismatch try downcasing
> before returning false. Doing our own function would allow the
> compiler to inline it, or at least keep it on the same page. Further,
> the old hash function shouldn't need to branch to do the same
> downcasing, since hashing is lossy anyway. In the keyword hashes, we
> just do "*ch |= 0x20", which downcases letters and turns undercores to
> DEL. I can take a stab at that later.

v4 is a quick POC for that. I haven't verified that it's correct for
the case of the probe and the entry don't match, but in case it
doesn't it should be easy to fix. I also didn't bother with
SH_STORE_HASH in my testing.

0001 adds the murmur32 finalizer -- we should do that regardless of
anything else in this thread.
0002 is just Jeff's 0001
0003 adds an equality function that downcases lazily, and teaches the
hash function about the 0x20 trick.

master:
latency average = 581.765 ms

v3 0001-0005:
latency average = 544.576 ms

v4 0001-0003:
latency average = 547.489 ms

This gives similar results with a tiny amount of code (excluding the
simplehash conversion). I didn't check if the compiler inlined these
functions, but we can hint it if necessary. We could use the new
equality function in all the call sites that currently test for
"guc_name_compare() == 0", in which case it might not end up inlined,
but that's probably okay.

We could also try to improve the hash function's collision behavior by
collecting the bytes on a uint64 and calling our new murmur64 before
returning the lower half, but that's speculative.
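
Roughly like this (sketch only; the fmix64 step below is the standard
MurmurHash3 64-bit finalizer written out inline, standing in for whatever
64-bit finalizer we would actually use):

static uint32
guc_name_hash64(const char *name)
{
	uint64		h = 0;

	while (*name)
	{
		unsigned char ch = (unsigned char) *name++;

		ch |= 0x20;					/* fold [A-Z] to [a-z], '_' to DEL */
		h = (h << 8) | (h >> 56);	/* rotate left by 8 */
		h ^= ch;
	}

	/* MurmurHash3 fmix64 */
	h ^= h >> 33;
	h *= UINT64CONST(0xff51afd7ed558ccd);
	h ^= h >> 33;
	h *= UINT64CONST(0xc4ceb9fe1a85ec53);
	h ^= h >> 33;

	return (uint32) h;
}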
From 1a516bb341afb72680470897d75c1d23f75fb37e Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Wed, 22 Nov 2023 17:28:41 +0700
Subject: [PATCH v4 1/3] Add finalizer to guc_name_hash

---
 src/backend/utils/misc/guc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 82d8efbc96..e3834d52ee 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -33,6 +33,7 @@
 #include "catalog/objectaccess.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_parameter_acl.h"
+#include "common/hashfn.h"
 #include "guc_internal.h"
 #include "libpq/pqformat.h"
 #include "parser/scansup.h"
@@ -1339,7 +1340,7 @@ guc_name_hash(const void *key, Size keysize)
 		result = pg_rotate_left32(result, 5);
 		result ^= (uint32) ch;
 	}
-	return result;
+	return murmurhash32(result);
 }
 
 /*
-- 
2.42.0

From 01b053b473d897c71725a7bb09a3127fc78140dc Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Wed, 22 Nov 2023 18:45:48 +0700
Subject: [PATCH v4 3/3] Optimize GUC functions for simple hash

Only downcase a character when an equality check fails,
since we expect most names to be lower case. Also simplify
downcasing in the hash function.
---
 src/backend/utils/misc/guc.c | 40 
 1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index bf05b022c3..7896deb63b 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -229,6 +229,7 @@ static int	GUCNestLevel = 0;	/* 1 when in main transaction */
 
 
 static int	guc_var_compare(const void *a, const void *b);
+static bool guc_name_eq(const char *namea, const char *nameb);
 static uint32 guc_name_hash(const char *name);
 static void InitializeGUCOptionsFromEnvironment(void);
 static void InitializeOneGUCOption(struct config_generic *gconf);
@@ -271,7 +272,7 @@ static bool call_enum_check_hook(struct config_enum *conf, int *newval,
 #define SH_KEY_TYPE		const char *
 #define	SH_KEY			gucname
 #define SH_HASH_KEY(tb, key)   	guc_name_hash(key)
-#define SH_EQUAL(tb, a, b)		(guc_name_compare(a, b) == 0)
+#define SH_EQUAL(tb, a, b)		(guc_name_eq(a, b))
 #define	SH_SCOPE		static inline
 #define SH_DECLARE
 #define SH_DEFINE
@@ -1308,6 +1309,38 @@ guc_name_compare(const char *namea, const char *nameb)
 	return 0;
 }
 
+static bool
+guc_name_eq(const char *namea, const char *nameb)
+{
+	char		cha;
+	char		chb;
+
+	while (*namea && *nameb)
+	{
+		cha = *namea++;
+		chb = *nameb++;
+
+		if (cha != chb)
+		{
+			/* Casefold lazily since we expect lower case */
+			if (cha >= 'A' && cha <= 'Z')
+cha += 'a' - 'A';
+			if (chb >= 'A' && chb <= 'Z')
+chb += 'a' - 'A';
+
+			if (cha != chb)
+return false;
+		}
+	}
+
+	if (*namea == *nameb)
+		return true;
+	else
+		return false;
+
+	//Assert(guc_name_compare(namea, nameb) == 0);
+}
+
 /*
  * Hash function that's compatible with guc_name_compare
  */
@@ -1320,9 +1353,8 @@ guc_name_hash(const char *name)
 	{
 		char		ch = *name++;
 
-		/* Case-fold in the same way as 

Re: WaitEventSet resource leakage

2023-11-22 Thread Alexander Lakhin

20.11.2023 00:09, Thomas Munro wrote:

On Fri, Nov 17, 2023 at 12:22 AM Heikki Linnakangas  wrote:


And here is a patch to implement that on master.

Rationale and code look good to me.




I can also confirm that the patches proposed (for master and back branches)
eliminate WES leakage as expected.

Thanks for the fix!

Maybe you would find appropriate to add the comment
/* Convenience wrappers over ResourceOwnerRemember/Forget */
above ResourceOwnerRememberWaitEventSet
just as it's added above ResourceOwnerRememberRelationRef,
ResourceOwnerRememberDSM, ResourceOwnerRememberFile, ...

(As a side note, this fix doesn't resolve issue #17828 completely,
because that large number of handles might also be consumed
legitimately.)

Best regards,
Alexander




Re: autovectorize page checksum code included elsewhere

2023-11-22 Thread Ants Aasma
On Wed, 22 Nov 2023 at 11:44, John Naylor  wrote:
>
> On Tue, Nov 7, 2023 at 9:47 AM Nathan Bossart  
> wrote:
> >
> > Presently, we ask compilers to autovectorize checksum.c and numeric.c.  The
> > page checksum code actually lives in checksum_impl.h, and checksum.c just
> > includes it.  But checksum_impl.h is also used in pg_upgrade/file.c and
> > pg_checksums.c, and since we don't ask compilers to autovectorize those
> > files, the page checksum code may remain un-vectorized.
>
> Poking in those files a bit, I also see references to building with
> SSE 4.1. Maybe that's an avenue that we should pursue? (an indirect
> function call is surely worth it for page-sized data)

For reference, executing the page checksum 10M times on a AMD 3900X CPU:

clang-14 -O2                   4.292s  (17.8 GiB/s)
clang-14 -O2 -msse4.1          2.859s  (26.7 GiB/s)
clang-14 -O2 -msse4.1 -mavx2   1.378s  (55.4 GiB/s)

--
Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com




Re: pipe_read_line for reading arbitrary strings

2023-11-22 Thread Alvaro Herrera
On 2023-Mar-07, Daniel Gustafsson wrote:

> The attached POC diff replace fgets() with pg_get_line(), which may not be an
> Ok way to cross the streams (it's clearly not a great fit), but as a POC it
> provided a neater interface for reading one-off lines from a pipe IMO.  Does
> anyone else think this is worth fixing before too many callsites use it, or is
> this another case of my fear of silent subtle truncation bugs?  =)

I think this is generally a good change.

I think pipe_read_line should have a "%m" in the "no data returned"
error message.  pg_read_line is careful to retain errno (and it was
already zero at start), so this should be okay ... or should we set
errno again to zero after popen(), even if it works?

(I'm not sure I buy pg_read_line's use of perror in the backend case.
Maybe this is only okay because the backend doesn't use this code?)

pg_get_line says caller can distinguish error from no-input-before-EOF
with ferror(), but pipe_read_line does no such thing.  I wonder what
happens if an NFS mount containing a file being read disappears in the
middle of reading it, for example ... will we think that we completed
reading the file, rather than erroring out?

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"That sort of implies that there are Emacs keystrokes which aren't obscure.
I've been using it daily for 2 years now and have yet to discover any key
sequence which makes any sense."(Paul Thomas)




meson vs Cygwin

2023-11-22 Thread Andrew Dunstan



I've been trying to get meson working with Cygwin. On a fresh install 
(Cygwin 3.4.9, gcc 11.4.0, meson 1.0.2, ninja 1.11.1) I get a bunch of 
errors like this:


ERROR:  incompatible library 
"/home/andrew/bf/buildroot/HEAD/pgsql.build/tmp_install/home/andrew/bf/buildroot/HEAD/inst/lib/postgresql/plperl.dll": 
missing magic block


Similar things happen if I try to build with python.

I'm not getting the same on a configure/make build. Not sure what would 
be different.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pg_upgrade and logical replication

2023-11-22 Thread Shlok Kyal
On Wed, 22 Nov 2023 at 06:48, Peter Smith  wrote:
> ==
> doc/src/sgml/ref/pgupgrade.sgml
>
> 1.
> +  
> +   Create all the new tables that were created in the publication during
> +   upgrade and refresh the publication by executing
> +   ALTER
> SUBSCRIPTION ... REFRESH PUBLICATION.
> +  
>
> "Create all ... that were created" sounds a bit strange.
>
> SUGGESTION (maybe like this or similar?)
> Create equivalent subscriber tables for anything that became newly
> part of the publication during the upgrade and

Modified

> ==
> src/bin/pg_dump/pg_dump.c
>
> 2. getSubscriptionTables
>
> +/*
> + * getSubscriptionTables
> + *   Get information about subscription membership for dumpable tables. This
> + *will be used only in binary-upgrade mode.
> + */
> +void
> +getSubscriptionTables(Archive *fout)
> +{
> + DumpOptions *dopt = fout->dopt;
> + SubscriptionInfo *subinfo = NULL;
> + SubRelInfo *subrinfo;
> + PQExpBuffer query;
> + PGresult   *res;
> + int i_srsubid;
> + int i_srrelid;
> + int i_srsubstate;
> + int i_srsublsn;
> + int ntups;
> + Oid last_srsubid = InvalidOid;
> +
> + if (dopt->no_subscriptions || !dopt->binary_upgrade ||
> + fout->remoteVersion < 17)
> + return;
>
> This function comment says "used only in binary-upgrade mode." and the
> Assert says the same. But, is this compatible with the other function
> dumpSubscriptionTable() where it says "used only in binary-upgrade
> mode and for PG17 or later versions"?
>
Modified

> ==
> src/bin/pg_upgrade/check.c
>
> 3. check_new_cluster_subscription_configuration
>
> +static void
> +check_new_cluster_subscription_configuration(void)
> +{
> + PGresult   *res;
> + PGconn*conn;
> + int nsubs_on_old;
> + int max_replication_slots;
> +
> + /* Logical slots can be migrated since PG17. */
> + if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
> + return;
>
> IMO it is better to say < 1700 in this check, instead of <= 1600.
>
Modified

> ~~~
>
> 4.
> + /* Quick return if there are no subscriptions to be migrated */
> + if (nsubs_on_old == 0)
> + return;
>
> Missing period in comment.
>
Modified

> ~~~
>
> 5.
> +/*
> + * check_old_cluster_subscription_state()
> + *
> + * Verify that each of the subscriptions has all their corresponding tables 
> in
> + * i (initialize), r (ready) or s (synchronized) state.
> + */
> +static void
> +check_old_cluster_subscription_state(ClusterInfo *cluster)
>
> This function is only for the old cluster (hint: the function name) so
> there is no need to pass the 'cluster' parameter here. Just directly
> use old_cluster in the function body.
>
Modified

> ==
> src/bin/pg_upgrade/t/004_subscription.pl
>
> 6.
> +# Add tab_not_upgraded1 to the publication. Now publication has tab_upgraded1
> +# and tab_upgraded2 tables.
> +$publisher->safe_psql('postgres',
> + "ALTER PUBLICATION regress_pub ADD TABLE tab_upgraded2");
>
> Typo in comment. You added tab_not_upgraded2, not tab_not_upgraded1
>
Modified

> ~~
>
> 7.
> +# Subscription relations should be preserved. The upgraded won't know
> +# about 'tab_not_upgraded1' because the subscription is not yet refreshed.
>
> Typo or missing word in comment?
>
> "The upgraded" ??
>
Modified

Attached is the v17 patch, which has these changes.

Thanks,
Shlok Kumar Kyal


v17-0001-Preserve-the-full-subscription-s-state-during-pg.patch
Description: Binary data


[PATCH] Add CHECK_FOR_INTERRUPTS in scram_SaltedPassword loop.

2023-11-22 Thread Bowen Shi
Hi, hackers

When the scram_iterations value is set too large, the backend can hang for a
long time, and Ctrl+C cannot cancel the query because the loop does not
process signal interrupts.

Adding CHECK_FOR_INTERRUPTS within the loop of scram_SaltedPassword, so that
any signals received during this period are handled, seems like a good choice.
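
A minimal sketch of the idea (illustrative only; the actual change is in the
attached patch).  Since scram_SaltedPassword() lives in src/common, which is
also built for frontend tools, I assume the new call would have to be guarded
so it is compiled only for the backend:

/*
 * Illustrative sketch, not the attached patch: make the per-iteration loop
 * in scram_SaltedPassword() (src/common/scram-common.c) a cancellation
 * point.  CHECK_FOR_INTERRUPTS() comes from miscadmin.h and is backend-only,
 * hence the FRONTEND guard (an assumption about the wiring, not taken from
 * the patch).
 */
for (i = 2; i <= iterations; i++)
{
#ifndef FRONTEND
    CHECK_FOR_INTERRUPTS();
#endif
    /* ... existing HMAC round: compute Ui and XOR it into the result ... */
}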

I wrote a patch to solve this problem. What are your suggestions?

Dears
Bowen Shi


0001-Add-CHECK_FOR_INTERRUPTS-in-scram_SaltedPassword-loo.patch
Description: Binary data


Re: Changing baserel to foreignrel in postgres_fdw functions

2023-11-22 Thread Ashutosh Bapat
On Wed, Nov 22, 2023 at 3:27 AM Bruce Momjian  wrote:

>
> Should this patch be applied?


I think so.


> ---
>
> On Thu, Feb 15, 2018 at 06:57:50PM +0530, Ashutosh Bapat wrote:
> > Hi,
> > I noticed that functions is_foreign_expr(), classifyConditions() and
> > appendOrderByClause() had variables/arguments named baserel when the
> > relations passed to those could be join or upper relation as well.
> > Here's patch renaming those as foreignrel.
>

The patch is more than five years old, so it might need adjustments. For
example, appendOrderByClause() no longer requires the change, but the other
two functions still do. There may be other new places that require the change;
I have not checked that. If we are accepting this change, I can update the
patch.

-- 
Best Wishes,
Ashutosh


Re: undetected deadlock in ALTER SUBSCRIPTION ... REFRESH PUBLICATION

2023-11-22 Thread Tomas Vondra
On 11/22/23 11:38, Amit Kapila wrote:
> On Tue, Nov 21, 2023 at 6:56 PM Tomas Vondra
>  wrote:
>>
>> On 11/21/23 14:16, Amit Kapila wrote:
>>> On Tue, Nov 21, 2023 at 5:17 PM Tomas Vondra
>>>  wrote:

>>>
>>> It seems there is some inconsistency in what you have written for
>>> client backends/tablesync worker vs. apply worker. The above text
>>> seems to be saying that the client backend and table sync worker are
>>> waiting on a "subscription row in pg_subscription" and the apply
>>> worker is operating on "pg_subscription_rel". So, if that is true then
>>> they shouldn't get stuck.
>>>
>>> I think here client backend and tablesync worker seems to be blocked
>>> for a lock on pg_subscription_rel.
>>>
>>
>> Not really, they are all locking the subscription. All the locks are on
>> classid=6100, which is pg_subscription:
>>
>>   test=# select 6100::regclass;
>>   regclass
>>   -
>>pg_subscription
>>   (1 row)
>>
>> The thing is, the tablesync workers call UpdateSubscriptionRelState,
>> which locks the pg_subscription catalog at the very beginning:
>>
>>LockSharedObject(SubscriptionRelationId, ...);
>>
>> So that's the issue. I haven't explored why it's done this way, and
>> there's no comment explaining locking the subscriptions is needed ...
>>
> 
> I think it prevents concurrent drop of rel during the REFRESH operation.
> 

Yes. Or maybe some other concurrent DDL on the relations included in the
subscription.

 The tablesync workers can't proceed because their lock request is stuck
 behind the AccessExclusiveLock request.

 And the apply worker can't proceed, because it's waiting for status
 update from the tablesync workers.

>>>
>>> This part is not clear to me because
>>> wait_for_relation_state_change()->GetSubscriptionRelState() seems to
>>> be releasing the lock while closing the relation. Am, I missing
>>> something?
>>>
>>
>> I think you're missing the fact that GetSubscriptionRelState() acquires
>> and releases the lock on pg_subscription_rel, but that's not the lock
>> causing the issue. The problem is the lock on the pg_subscription row.
>>
> 
> Okay. IIUC, what's going on here is that the apply worker acquires
> AccessShareLock on pg_subscription to update the rel state for one of the
> tables, say tbl-1, and then, for another table, say tbl-2, starts waiting
> for a state change via wait_for_relation_state_change(). I think the fix
> here is to commit the transaction before we go for the wait. I guess we
> need something along the lines of what is proposed in [1], though we solved
> the problem in that thread in some other way.
> 

Possibly. I haven't checked if the commit might have some unexpected
consequences, but I can confirm I can no longer reproduce the deadlock
with the patch applied.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: undetected deadlock in ALTER SUBSCRIPTION ... REFRESH PUBLICATION

2023-11-22 Thread Amit Kapila
On Tue, Nov 21, 2023 at 6:56 PM Tomas Vondra
 wrote:
>
> On 11/21/23 14:16, Amit Kapila wrote:
> > On Tue, Nov 21, 2023 at 5:17 PM Tomas Vondra
> >  wrote:
> >>
> >
> > It seems there is some inconsistency in what you have written for
> > client backends/tablesync worker vs. apply worker. The above text
> > seems to be saying that the client backend and table sync worker are
> > waiting on a "subscription row in pg_subscription" and the apply
> > worker is operating on "pg_subscription_rel". So, if that is true then
> > they shouldn't get stuck.
> >
> > I think here client backend and tablesync worker seems to be blocked
> > for a lock on pg_subscription_rel.
> >
>
> Not really, they are all locking the subscription. All the locks are on
> classid=6100, which is pg_subscription:
>
>   test=# select 6100::regclass;
>   regclass
>   -
>pg_subscription
>   (1 row)
>
> The thing is, the tablesync workers call UpdateSubscriptionRelState,
> which locks the pg_subscription catalog at the very beginning:
>
>LockSharedObject(SubscriptionRelationId, ...);
>
> So that's the issue. I haven't explored why it's done this way, and
> there's no comment explaining locking the subscriptions is needed ...
>

I think it prevents concurrent drop of rel during the REFRESH operation.

> >> The tablesync workers can't proceed because their lock request is stuck
> >> behind the AccessExclusiveLock request.
> >>
> >> And the apply worker can't proceed, because it's waiting for status
> >> update from the tablesync workers.
> >>
> >
> > This part is not clear to me because
> > wait_for_relation_state_change()->GetSubscriptionRelState() seems to
> > be releasing the lock while closing the relation. Am, I missing
> > something?
> >
>
> I think you're missing the fact that GetSubscriptionRelState() acquires
> and releases the lock on pg_subscription_rel, but that's not the lock
> causing the issue. The problem is the lock on the pg_subscription row.
>

Okay. IIUC, what's going on here is that the apply worker acquires
AccessShareLock on pg_subscription to update the rel state for one of the
tables, say tbl-1, and then, for another table, say tbl-2, starts waiting for
a state change via wait_for_relation_state_change(). I think the fix here is
to commit the transaction before we go for the wait. I guess we need something
along the lines of what is proposed in [1], though we solved the problem in
that thread in some other way.

[1] - https://www.postgresql.org/message-id/1412708.1674417574%40sss.pgh.pa.us

-- 
With Regards,
Amit Kapila.




Re: How to accurately determine when a relation should use local buffers?

2023-11-22 Thread Давыдов Виталий

Hi Aleksander,

Thank you for your answers. It seems local buffers are used for temporary
relations unconditionally. In this case, we may check either relpersistence or
the backend id, or both of them.

> I didn't do a deep investigation of the code in this particular aspect but
> that could be a fair point. Would you like to propose a refactoring that
> unifies the way we check if the relation is temporary?

I would propose not to associate temporary relations with local buffers. I
would say that we should choose local buffers only in a backend context; it is
the primary condition. Thus, to choose local buffers, two checks should
succeed:
 * relpersistence (RelationUsesLocalBuffers)
 * backend id (SmgrIsTemp)
I know it may not be as effective as checking relpersistence only, but it
makes the internal architecture more flexible, I believe.
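
A minimal sketch of the combined check (the helper name is hypothetical and
only for illustration; RelationUsesLocalBuffers, RelationGetSmgr and SmgrIsTemp
are the existing macros/inline functions I mean above, assuming their usual
definitions in rel.h/smgr.h):

static inline bool
RelationShouldUseLocalBuffers(Relation rel)
{
    /* relpersistence: is this a temporary relation at all? */
    if (!RelationUsesLocalBuffers(rel))
        return false;

    /* backend id: are we actually in the owning backend's context? */
    return SmgrIsTemp(RelationGetSmgr(rel));
}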

With best regards,
Vitaly Davydov



 


RE: CRC32C Parallel Computation Optimization on ARM

2023-11-22 Thread Xiang Gao
On Date: Fri, 10 Nov 2023 10:36:08AM -0600, Nathan Bossart wrote:

>-# all versions of pg_crc32c_armv8.o need CFLAGS_CRC
>-pg_crc32c_armv8.o: CFLAGS+=$(CFLAGS_CRC)
>-pg_crc32c_armv8_shlib.o: CFLAGS+=$(CFLAGS_CRC)
>-pg_crc32c_armv8_srv.o: CFLAGS+=$(CFLAGS_CRC)
>
>Why are these lines deleted?
>
>-  ['pg_crc32c_armv8', 'USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 'crc'],
>+  ['pg_crc32c_armv8', 'USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK'],
>
>What is the purpose of this change?

I added `__attribute__((target("+crc+crypto")))` before the functions that
require the crc and crypto extensions, so those lines are removed here.

>+__attribute__((target("+crc+crypto")))
>
>I'm not sure we can assume that all compilers will understand this, and I'm
>not sure we need it.

CFLAGS_CRC is "-march=armv8-a+crc". Generally, if -march is supported,
__attribute__ is also supported.
In addition, I am not sure about the source file pg_crc32c_armv8.c: if
CFLAGS_CRC and CFLAGS_CRYPTO are needed at the same time, how should that be
expressed in the makefile?
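
For illustration, here is a standalone sketch of the per-function approach
(the function is hypothetical, not the patch; it assumes a compiler that
accepts the AArch64 target attribute, which is exactly the open question, and
it only needs "+crc" because it does not use the vmull/crypto intrinsics):

#include <arm_acle.h>
#include <stddef.h>
#include <stdint.h>

/* Enable the crc extension for this function only, instead of building the
 * whole file with CFLAGS_CRC (-march=armv8-a+crc). */
__attribute__((target("+crc")))
uint32_t
crc32c_armv8_example(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len >= 8)
    {
        uint64_t chunk;

        __builtin_memcpy(&chunk, p, 8);
        crc = __crc32cd(crc, chunk);
        p += 8;
        len -= 8;
    }
    while (len > 0)
    {
        crc = __crc32cb(crc, *p);
        p++;
        len--;
    }
    return crc;
}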





0007-crc32c-parallel-computation-optimization-on-arm.patch
Description:  0007-crc32c-parallel-computation-optimization-on-arm.patch


RE: Partial aggregates pushdown

2023-11-22 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr. Haas, hackers.

Thank you for your thoughtful comments.

> From: Robert Haas 
> Sent: Tuesday, November 21, 2023 5:52 AM
> I do have a concern about this, though. It adds a lot of bloat. It adds a 
> whole lot of additional entries to pg_aggregate, and
> every new aggregate we add in the future will require a bonus entry for this, 
> and it needs a bunch of new pg_proc entries
> as well. One idea that I've had in the past is to instead introduce syntax 
> that just does this, without requiring a separate
> aggregate definition in each case.
> For example, maybe instead of changing string_agg(whatever) to 
> string_agg_p_text_text(whatever), you can say
> PARTIAL_AGGREGATE
> string_agg(whatever) or string_agg(PARTIAL_AGGREGATE whatever) or something. 
> Then all aggregates could be treated
> in a generic way. I'm not completely sure that's better, but I think it's 
> worth considering.
I believe this comment addresses a fundamental aspect of the approach.
So, firstly, could we discuss whether we should fundamentally reconsider the 
approach?

The approach adopted in this patch is as follows.
Approach 1: Adding partial aggregation functions to the catalogs (pg_aggregate,
pg_proc)

The approach proposed by Mr. Haas is as follows.
Approach 2: Adding a keyword to the SQL syntax to indicate partial aggregation
requests

The amount of code required to implement Approach 2 has not been investigated,
but comparing Approach 1 and Approach 2 in other aspects, 
I believe they each have the following advantages and disadvantages. 

1. Approach 1
(1) Advantages
(a) No need to change the SQL syntax
(2) Disadvantages
(a) Catalog bloat
As Mr. Haas pointed out, the catalog will bloat by adding partial aggregation 
functions (e.g. avg_p_int8(int8)) 
for each individual aggregate function (e.g. avg(int8)) in pg_aggregate and 
pg_proc (theoretically doubling the size).
Some PostgreSQL developers and users may find this uncomfortable.
(b) Increase in manual procedures
Developers of new aggregate functions (both built-in and user-defined) need to 
manually add the partial aggregation
functions when defining the aggregate functions.
However, the procedure for adding partial aggregation functions for a certain 
aggregate function can be automated,
so this problem can be resolved by improving the patch.
The automation method involves the core part (AggregateCreate() and related 
functions) that executes
the CREATE AGGREGATE command for user-defined functions.
For built-in functions, it involves generating the initial data for the 
pg_aggregate catalog and pg_proc catalog from pg_aggregate.dat and pg_proc.dat
(using the genbki.pl script and related scripts).

2. Approach 2
(1) Advantages
(a) No need to add partial aggregate functions to the catalogs for each 
aggregation
(2) Disadvantages
(a) Need to add non-standard keywords to the SQL syntax.
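
For concreteness, a rough sketch of how the aggregate might appear in the
query pushed down to the remote server under each approach (both lines are
purely illustrative; neither the function name nor the syntax is settled):

-- Approach 1: a dedicated partial-aggregate function present in the catalogs
SELECT avg_p_int8(x) FROM remote_table;

-- Approach 2: a syntax marker on the existing aggregate
SELECT PARTIAL_AGGREGATE avg(x) FROM remote_table;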

I did not choose Approach 2 because I was not confident that the disadvantage
mentioned in 2.(2)(a) would be accepted by the PostgreSQL development
community. If it is accepted, I think Approach 2 is smarter.
Could you please provide your opinion on which approach is preferable after
comparing these two approaches? If we cannot say anything without comparing
the amount of source code, as Mr. Momjian mentioned, we need to estimate the
amount of source code required to implement Approach 2.

Sincerely yours,
Yuuki Fujii
 
--
Yuuki Fujii
Information Technology R&D Center, Mitsubishi Electric Corporation


Re: Output affected rows in EXPLAIN

2023-11-22 Thread John Naylor
On Wed, Nov 15, 2023 at 2:17 PM  wrote:
>
> We can check the number of updated rows from execute plan,
> I think there is no need to return the command tag
> when EXPLAIN(ANALYZE) is executed by default.

Given that several people have voiced an opinion against this patch,
I'm marking it rejected.




Re: Extension Enhancement: Buffer Invalidation in pg_buffercache

2023-11-22 Thread Cédric Villemain

On 01/07/2023 00:09, Thomas Munro wrote:

On Fri, Jun 30, 2023 at 10:47 PM Palak Chaturvedi
 wrote:



We also talked a bit about how one might control the kernel page cache
in more fine-grained ways for testing purposes, but it seems like the
pgfincore project has that covered with its pgfadvise_willneed() and
pgfadvise_dontneed().  IMHO that project could use more page-oriented
operations (instead of just counts and coarse grains operations) but
that's something that could be material for patches to send to the
extension maintainers.  This work, in contrast, is more tangled up
with bufmgr.c internals, so it feels like this feature belongs in a
core contrib module.


This is precisely what pgfincore is doing/offering already.
Happy to propose it for the PostgreSQL tree if there is interest. The next
step for pgfincore is to add the cachestat() syscall and evaluate the benefits
of this new call for PostgreSQL's cost estimators.


Here is an example to achieve the warm/unwarm. Each bit is a PostgreSQL page,
so here we warm the cache with the first 3 pages and remove the last 3 from
the cache (system cache, not shared buffers).


-- Loading and Unloading
cedric=# select * from pgfadvise_loader('pgbench_accounts', 0, true, true, B'111000');
     relpath      | os_page_size | os_pages_free | pages_loaded | pages_unloaded
------------------+--------------+---------------+--------------+----------------
 base/11874/16447 |         4096 |        408376 |            3 |              3



---
Cédric Villemain +33 (0)6 20 30 22 52
https://Data-Bene.io
PostgreSQL Expertise, Support, Training, R&D




Re: autovectorize page checksum code included elsewhere

2023-11-22 Thread John Naylor
On Tue, Nov 7, 2023 at 9:47 AM Nathan Bossart  wrote:
>
> Presently, we ask compilers to autovectorize checksum.c and numeric.c.  The
> page checksum code actually lives in checksum_impl.h, and checksum.c just
> includes it.  But checksum_impl.h is also used in pg_upgrade/file.c and
> pg_checksums.c, and since we don't ask compilers to autovectorize those
> files, the page checksum code may remain un-vectorized.

Poking in those files a bit, I also see references to building with
SSE 4.1. Maybe that's an avenue that we should pursue? (an indirect
function call is surely worth it for page-sized data)
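
To spell out what I mean by that, a rough sketch of runtime dispatch (all
names are made up; the pointer would be set once, after a CPU-feature check,
to either the plain build or the SSE 4.1 build of the checksum code):

#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*checksum_fn) (const void *page, size_t len);

static uint32_t checksum_page_generic(const void *page, size_t len);
static checksum_fn checksum_page_impl = checksum_page_generic;

uint32_t
checksum_page(const void *page, size_t len)
{
    /* One indirect call per 8kB page is cheap relative to the work done. */
    return checksum_page_impl(page, len);
}

static uint32_t
checksum_page_generic(const void *page, size_t len)
{
    /* placeholder body standing in for the real page-checksum algorithm */
    const unsigned char *p = page;
    uint32_t    sum = 0;

    for (size_t i = 0; i < len; i++)
        sum = (sum * 16777619) ^ p[i];
    return sum;
}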




Re: Lockless exit path for ReplicationOriginExitCleanup

2023-11-22 Thread Alvaro Herrera
Hello,

On 2023-Nov-22, Bharath Rupireddy wrote:

> While looking at the use of session_replication_state, I noticed that
> ReplicationOriginLock is acquired in ReplicationOriginExitCleanup()
> even if session_replication_state is reset to NULL by
> replorigin_session_reset(). Why can't there be a lockless exit path
> something like [1] similar to
> replorigin_session_reset() which checks session_replication_state ==
> NULL without a lock?

I suppose we can do this on consistency grounds -- I'm pretty sure you'd
have a really hard time proving that this makes a performance difference --
but this patch is incomplete: just two lines below, we're still testing
session_replication_state for nullness, which would now be dead code.
Please repair.
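
Something along these lines, perhaps (just the shape, untested; the elided
part stands for the existing cleanup of the shared-memory slot, including the
assignment of cv):

static void
ReplicationOriginExitCleanup(int code, Datum arg)
{
    ConditionVariable *cv = NULL;

    /* Lockless fast path: this backend never set up a session origin. */
    if (session_replication_state == NULL)
        return;

    LWLockAcquire(ReplicationOriginLock, LW_EXCLUSIVE);

    /* The "!= NULL &&" part of the old condition is now redundant. */
    if (session_replication_state->acquired_by == MyProcPid)
    {
        /* ... existing cleanup, which sets cv and resets the slot ... */
    }

    LWLockRelease(ReplicationOriginLock);

    if (cv)
        ConditionVariableBroadcast(cv);
}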


The comment on session_replication_state is confusing also:

/*
 * Backend-local, cached element from ReplicationState for use in a backend
 * replaying remote commits, so we don't have to search ReplicationState for
 * the backends current RepOriginId.
 */

My problem with it is that this is not a "cached element", but instead a
"cached pointer to [shared memory]".  This is what makes testing that
pointer for null-ness doable, while access to each member therein still
requires lwlock acquisition.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Once again, thank you and all of the developers for your hard work on
PostgreSQL.  This is by far the most pleasant management experience of
any database I've worked on." (Dan Harris)
http://archives.postgresql.org/pgsql-performance/2006-04/msg00247.php




Commitfest 2023-11 update 2

2023-11-22 Thread John Naylor
We're about 2/3 of the way through.

At start:
Needs review: 210. Waiting on Author: 42. Ready for Committer: 29.
Committed: 55. Withdrawn: 10. Returned with Feedback: 1. Total: 347.

Today:
Needs review: 183. Waiting on Author: 45. Ready for Committer: 25.
Committed: 76. Returned with Feedback: 4. Withdrawn: 13. Rejected: 1.
Total: 347.

The pace seems to have picked up a bit, based on number of commits.

--
John Naylor




Re: Lockless exit path for ReplicationOriginExitCleanup

2023-11-22 Thread Amit Kapila
On Wed, Nov 22, 2023 at 2:12 PM Bharath Rupireddy
 wrote:
>
> While looking at the use of session_replication_state, I noticed that
> ReplicationOriginLock is acquired in ReplicationOriginExitCleanup()
> even if session_replication_state is reset to NULL by
> replorigin_session_reset(). Why can't there be a lockless exit path
> something like [1] similar to
> replorigin_session_reset() which checks session_replication_state ==
> NULL without a lock?
>

I don't see any problem with such a check but not sure of the benefit
of doing so either.

-- 
With Regards,
Amit Kapila.




Re: Stop the search once replication origin is found

2023-11-22 Thread Amit Kapila
On Mon, Nov 20, 2023 at 4:36 PM Amit Kapila  wrote:
>
> On Mon, Nov 20, 2023 at 2:36 PM Antonin Houska  wrote:
> >
> > Although it's not performance-critical, I think it just makes sense to break
> > the loop in replorigin_session_setup() as soon as we've found the origin.
> >
>
> Your proposal sounds reasonable to me.
>

Pushed, thanks for the patch!

-- 
With Regards,
Amit Kapila.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-22 Thread Amit Kapila
On Wed, Nov 22, 2023 at 1:30 PM John Naylor  wrote:
>
> On Thu, Nov 9, 2023 at 5:07 PM Amit Kapila  wrote:
> >
> > On Wed, Nov 8, 2023 at 11:05 PM vignesh C  wrote:
> > >
> > > On Wed, 8 Nov 2023 at 08:43, vignesh C  wrote:
> > >
> > > Here is a small improvisation where num_slots need not be initialized
> > > as it will be used only after assigning the result now. The attached
> > > patch has the changes for the same.
> > >
> >
> > Pushed!
>
> Hi all, the CF entry for this is marked RfC, and CI is trying to apply
> the last patch committed. Is there further work that needs to be
> re-attached and/or rebased?
>

No. I have marked it as committed.

-- 
With Regards,
Amit Kapila.




Re: pipe_read_line for reading arbitrary strings

2023-11-22 Thread John Naylor
On Mon, Sep 25, 2023 at 2:55 PM Daniel Gustafsson  wrote:
>
> Fixed, along with commit message wordsmithing in the attached.  Unless 
> objected
> to I'll go ahead with this version.

+1




Lockless exit path for ReplicationOriginExitCleanup

2023-11-22 Thread Bharath Rupireddy
Hi,

While looking at the use of session_replication_state, I noticed that
ReplicationOriginLock is acquired in ReplicationOriginExitCleanup()
even if session_replication_state is reset to NULL by
replorigin_session_reset(). Why can't there be a lockless exit path
something like [1] similar to
replorigin_session_reset() which checks session_replication_state ==
NULL without a lock?

[1]
diff --git a/src/backend/replication/logical/origin.c
b/src/backend/replication/logical/origin.c
index 460e3dcc38..99bbe90f6c 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -1056,6 +1056,9 @@ ReplicationOriginExitCleanup(int code, Datum arg)
 {
ConditionVariable *cv = NULL;

+   if (session_replication_state == NULL)
+   return;
+
LWLockAcquire(ReplicationOriginLock, LW_EXCLUSIVE);

if (session_replication_state != NULL &&

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: remaining sql/json patches

2023-11-22 Thread Amit Langote
On Fri, Nov 17, 2023 at 6:40 PM Alvaro Herrera  wrote:
> On 2023-Nov-17, Amit Langote wrote:
>
> > On Fri, Nov 17, 2023 at 4:27 PM jian he  wrote:
>
> > > some enum declaration, ending element need an extra comma?
> >
> > Didn't know about the convention to have that comma, but I can see it
> > is present in most enum definitions.
>
> It's new.  See commit 611806cd726f.

I see, thanks.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: Evaluate arguments of correlated SubPlans in the referencing ExprState

2023-11-22 Thread John Naylor
On Tue, Oct 10, 2023 at 10:00 AM Andres Freund  wrote:
>
> Hi,
>
> On 2023-10-01 14:53:23 -0400, Tom Lane wrote:
> > Peter Eisentraut  writes:
> > > Is this patch still being worked on?
> >
> > I thought Andres simply hadn't gotten back to it yet.
> > It still seems like a worthwhile improvement.
>
> Indeed - I do plan to commit it. I haven't quite shifted into v17 mode yet...

Any shift yet? ;-)




Re: Unlinking Parallel Hash Join inner batch files sooner

2023-11-22 Thread John Naylor
On Wed, Sep 27, 2023 at 11:42 PM Heikki Linnakangas  wrote:
>
> Looks good to me too at a quick glance. There's this one "XXX free"
> comment though:
>
> >   for (int i = 1; i < old_nbatch; ++i)
> >   {
> >   ParallelHashJoinBatch *shared =
> >   NthParallelHashJoinBatch(old_batches, i);
> >   SharedTuplestoreAccessor *accessor;
> >
> >   accessor = sts_attach(ParallelHashJoinBatchInner(shared),
> > 
> > ParallelWorkerNumber + 1,
> > >fileset);
> >   sts_dispose(accessor);
> >   /* XXX free */
> >   }
>
> I think that's referring to the fact that sts_dispose() doesn't free the
> 'accessor', or any of the buffers etc. that it contains. That's a
> pre-existing problem, though: ExecParallelHashRepartitionRest() already
> leaks the SharedTuplestoreAccessor structs and their buffers etc. of the
> old batches. I'm a little surprised there isn't aready an sts_free()
> function.
>
> Another thought is that it's a bit silly to have to call sts_attach()
> just to delete the files. Maybe sts_dispose() should take the same three
> arguments that sts_attach() does, instead.
>
> So that freeing would be nice to tidy up, although the amount of memory
> leaked is tiny so might not be worth it, and it's a pre-existing issue.
> I'm marking this as Ready for Committer.

(I thought I'd go around and nudge CF entries where both author and
reviewer are committers.)

Hi Thomas, do you have any additional thoughts on the above?




Re: remaining sql/json patches

2023-11-22 Thread Amit Langote
On Wed, Nov 22, 2023 at 4:37 PM Andres Freund  wrote:
> On 2023-11-22 15:09:36 +0900, Amit Langote wrote:
> > OK, I will keep polishing 0001-0003 with the intent to push it next
> > week barring objections / damning findings.
>
> I don't think the patchset is quite there yet. It's definitely getting closer
> though!  I'll try to do another review next week.

That would be great, thank you.  I'll post an update on Friday.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-22 Thread John Naylor
On Thu, Nov 9, 2023 at 5:07 PM Amit Kapila  wrote:
>
> On Wed, Nov 8, 2023 at 11:05 PM vignesh C  wrote:
> >
> > On Wed, 8 Nov 2023 at 08:43, vignesh C  wrote:
> >
> > Here is a small improvisation where num_slots need not be initialized
> > as it will be used only after assigning the result now. The attached
> > patch has the changes for the same.
> >
>
> Pushed!

Hi all, the CF entry for this is marked RfC, and CI is trying to apply
the last patch committed. Is there further work that needs to be
re-attached and/or rebased?