pgoutput incorrectly replaces missing values with NULL since PostgreSQL 15
While working on Materialize's streaming logical replication from Postgres [0], my colleagues Sean Loiselle and Petros Angelatos (CC'd) discovered today what appears to be a correctness bug in pgoutput, introduced in v15.

The problem goes like this. A table with REPLICA IDENTITY FULL and some data in it...

    CREATE TABLE t (a int);
    ALTER TABLE t REPLICA IDENTITY FULL;
    INSERT INTO t VALUES (1), (2), (3), ...;

...undergoes a schema change to add a new column with a default:

    ALTER TABLE t ADD COLUMN b bool DEFAULT false NOT NULL;

PostgreSQL is smart and does not rewrite the entire table during the schema change. Instead it updates the tuple description to indicate to future readers of the table that if `b` is missing, it should be filled in with the default value, `false`.

Unfortunately, since v15, pgoutput mishandles missing attributes. If a downstream server is subscribed to changes from t via the pgoutput plugin, when a row with a missing attribute is updated, e.g.:

    UPDATE t SET a = 2 WHERE a = 1

pgoutput will incorrectly report b's value as NULL in the old tuple, rather than false. Using the same example:

    old: a=1, b=NULL
    new: a=2, b=false

The subscriber will ignore the update (as it has no row with values a=1, b=NULL), and thus the subscriber's copy of `t` will become out of sync with the publisher's.

I bisected the problem to 52e4f0cd4 [1], which introduced row filtering for publications. The problem appears to be the use of CreateTupleDescCopy where CreateTupleDescCopyConstr is required, as the former drops the constraints in the tuple description (specifically, the default value constraint) on the floor.

I've attached a patch which both fixes the issue and includes a test. I've verified that the test fails against the current master and passes against the patched version.

I'm relatively unfamiliar with the project norms here, but assuming the patch is acceptable, this strikes me as important enough to warrant a backport to both v15 and v16.
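The failure mode above can be modeled in a few lines. This is a Python sketch, not PostgreSQL code; the descriptor shapes and names are illustrative only, standing in for the tuple descriptor and its "missing value" default:

```python
# Illustrative model: how a tuple descriptor's stored default interacts with
# a descriptor copy that dropped the defaults (the CreateTupleDescCopy
# behavior), versus one that kept them (CreateTupleDescCopyConstr).

def decode_old_tuple(raw, tupdesc):
    """Fill in attributes physically absent from the on-disk tuple.

    A reader consulting the full descriptor substitutes the stored default;
    a reader holding a copy that lost the defaults can only substitute None
    (i.e. NULL), which is the reported bug.
    """
    return {
        col: raw.get(col, tupdesc["defaults"].get(col))  # None if default lost
        for col in tupdesc["columns"]
    }

# Descriptor after: ALTER TABLE t ADD COLUMN b bool DEFAULT false NOT NULL
full_desc = {"columns": ["a", "b"], "defaults": {"b": False}}
# Copy made without constraints: the default for b is gone
stripped_desc = {"columns": ["a", "b"], "defaults": {}}

on_disk_row = {"a": 1}  # row written before the ALTER: b is physically absent

correct = decode_old_tuple(on_disk_row, full_desc)
buggy = decode_old_tuple(on_disk_row, stripped_desc)

print(correct)  # {'a': 1, 'b': False} -- what the subscriber needs to match
print(buggy)    # {'a': 1, 'b': None}  -- what pgoutput reports; nothing matches
```

The subscriber looks up its local row by the old-tuple values, so the NULL substitution makes the lookup miss and the update is silently dropped.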
[0]: https://materialize.com/docs/sql/create-source/postgres
[1]: https://github.com/postgres/postgres/commit/52e4f0cd472d39d07732b99559989ea3b615be78

0001-pgoutput-don-t-unconditionally-fill-in-missing-value.patch
Re: GUC names in messages
On Thu, Nov 9, 2023 at 10:04 PM Alvaro Herrera wrote:
>
> On 2023-Nov-09, Peter Smith wrote:
>
> > Notice that NOT QUOTED is the far more common pattern, so my vote
> > would be just to standardise on making everything this way. I know
> > there was some concern raised about ambiguous words like "timezone"
> > and "datestyle" etc but in practice, those are rare. Also, those GUCs
> > are different in that they are written as camel-case (e.g.
> > "DateStyle") in the guc_tables.c, so if they were also written
> > camel-case in the messages that could remove ambiguities with normal
> > words. YMMV.
>
> Well, I think camel-casing is also a sufficient differentiator for these
> identifiers not being English words. We'd need to ensure they're always
> written that way, when not quoted. However, in cases where arbitrary
> values are expanded, I don't know that they would be expanded that way,
> so I would still go for quoting in that case.
>
> There's also a few that are not camel-cased nor have any underscores --
> looking at postgresql.conf.sample, we have "port", "bonjour", "ssl",
> "fsync", "geqo", "jit", "autovacuum", "xmlbinary", "xmloption". (We also
> have "include", but I doubt that's ever used in an error message). But
> actually, there's more: every reloption is a candidate, and there we
> have "fillfactor", "autosummarize", "fastupdate", "buffering". So if we
> want to make generic advice on how to deal with option names in error
> messages, I think the wording on conditional quoting I proposed should
> go in (adding CamelCase as a reason not to quote), and then we can fix
> the code to match. Looking at your list, I think the changes to make
> are not too numerous.
>

Sorry for my delay in getting back to this thread. PSA a patch for this work. There may be some changes I've missed, but hopefully, this is a nudge in the right direction.

==
Kind Regards,
Peter Smith.
Fujitsu Australia.
v1-0001-GUC-names-docs.patch
v1-0002-GUC-names-fix-quotes.patch
Catalog domain not-null constraints
This patch set applies the explicit catalog representation of not-null constraints introduced by b0e96f3119 for table constraints also to domain not-null constraints. Since there is no inheritance or primary keys etc., this is much simpler and just applies the existing infrastructure to domains as well.

As a result, domain not-null constraints now appear in the information schema correctly. Another effect is that you can now use the ALTER DOMAIN ... ADD/DROP CONSTRAINT syntax for not-null constraints as well. This makes everything consistent overall.

For the most part, I structured the code so that there are now separate sibling subroutines for domain check constraints and domain not-null constraints. This seemed to work out best, but one could also consider other ways to refactor this.

From c76dd1e21fb2c0a35d45106649788afe2b2b59bd Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Thu, 23 Nov 2023 07:35:32 +0100
Subject: [PATCH v1 1/2] Add tests for domain-related information schema views

---
 src/test/regress/expected/domain.out | 47 ++++++++++++++++++++++++++++
 src/test/regress/sql/domain.sql      | 24 ++++++++++++++
 2 files changed, 71 insertions(+)

diff --git a/src/test/regress/expected/domain.out b/src/test/regress/expected/domain.out
index 6d94e84414..e70aebd70c 100644
--- a/src/test/regress/expected/domain.out
+++ b/src/test/regress/expected/domain.out
@@ -1207,3 +1207,50 @@ create domain testdomain1 as int constraint unsigned check (value > 0);
 alter domain testdomain1 rename constraint unsigned to unsigned_foo;
 alter domain testdomain1 drop constraint unsigned_foo;
 drop domain testdomain1;
+--
+-- Information schema
+--
+SELECT * FROM information_schema.column_domain_usage
+  WHERE domain_name IN ('con', 'dom', 'pos_int', 'things')
+  ORDER BY domain_name;
+ domain_catalog | domain_schema | domain_name | table_catalog | table_schema | table_name | column_name
+----------------+---------------+-------------+---------------+--------------+------------+-------------
+ regression     | public        | con         | regression    | public       | domcontest | col1
+ regression     | public        | dom         | regression    | public       | domview    | col1
+ regression     | public        | things      | regression    | public       | thethings  | stuff
+(3 rows)
+
+SELECT * FROM information_schema.domain_constraints
+  WHERE domain_name IN ('con', 'dom', 'pos_int', 'things')
+  ORDER BY constraint_name;
+ constraint_catalog | constraint_schema | constraint_name | domain_catalog | domain_schema | domain_name | is_deferrable | initially_deferred
+--------------------+-------------------+-----------------+----------------+---------------+-------------+---------------+--------------------
+ regression         | public            | con_check       | regression     | public        | con         | NO            | NO
+ regression         | public            | meow            | regression     | public        | things      | NO            | NO
+ regression         | public            | pos_int_check   | regression     | public        | pos_int     | NO            | NO
+(3 rows)
+
+SELECT * FROM information_schema.domains
+  WHERE domain_name IN ('con', 'dom', 'pos_int', 'things')
+  ORDER BY domain_name;
+ domain_catalog | domain_schema | domain_name | data_type | character_maximum_length | character_octet_length | character_set_catalog | character_set_schema | character_set_name | collation_catalog | collation_schema | collation_name | numeric_precision | numeric_precision_radix | numeric_scale | datetime_precision | interval_type | interval_precision | domain_default | udt_catalog | udt_schema | udt_name | scope_catalog | scope_schema | scope_name | maximum_cardinality | dtd_identifier
+ regression | public | con | integer | | | | | | | | | 32 | 2 | 0 | | | | | regression | pg_catalog | int4 | | | | | 1
+ regression | public | dom | integer | | | | | | |
Re: remaining sql/json patches
minor issue. Maybe you can add the following after line 127 of src/test/regress/sql/jsonb_sqljson.sql, for test coverage of the ExecPrepareJsonItemCoercion function:

SELECT JSON_VALUE(jsonb 'null', '$ts' PASSING date '2018-02-21 12:34:56 +10' AS ts returning date);
SELECT JSON_VALUE(jsonb 'null', '$ts' PASSING time '2018-02-21 12:34:56 +10' AS ts returning time);
SELECT JSON_VALUE(jsonb 'null', '$ts' PASSING timetz '2018-02-21 12:34:56 +10' AS ts returning timetz);
SELECT JSON_VALUE(jsonb 'null', '$ts' PASSING timestamp '2018-02-21 12:34:56 +10' AS ts returning timestamp);
Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock
On Tue, Nov 21, 2023 at 2:03 PM Dilip Kumar wrote:
>
> On Mon, Nov 20, 2023 at 4:42 PM Dilip Kumar wrote:
> >
> > On Mon, Nov 20, 2023 at 2:37 PM Andrey M. Borodin wrote:
> > >
> > > > On 20 Nov 2023, at 13:51, Dilip Kumar wrote:
> > > >
> > > > 2) Do we really need one separate lwlock tranche for each SLRU?
> > > >
> > > > IMHO if we use the same lwlock tranche then the wait event will show
> > > > the same wait event name, right? And that would be confusing for the
> > > > user, whether we are waiting for Subtransaction or Multixact or
> > > > anything else. Is my understanding not correct here?
> > >
> > > If we give to a user multiple GUCs to tweak, I think we should give a way
> > > to understand which GUC to tweak when they observe wait times.
>
> PFA, updated patch set, I have worked on review comments by Alvaro and
> Andrey. So the only open comments are about clog group commit testing;
> for that my question was, as I sent in the previous email, exactly what
> part we are worried about in the coverage report?
>
> The second point is: if we want to generate a group update, we will have
> to create the injection point after we hold the control lock, so that
> other processes go for the group update. Then, to wake up the waiting
> process that is holding the SLRU control lock in exclusive mode, we
> would need to call a function ('test_injection_points_wake()'), and for
> calling the function we would need to again acquire the SLRU lock in
> read mode for the visibility check in the catalog for fetching the
> procedure row. Now this wake-up session will block on the control lock
> for the session that is waiting on the injection point, so it will
> create a deadlock.
> Maybe with the bank-wise lock we can create a lot of transactions so
> that these two fall in different banks, and then we can somehow test
> this, but then we will have to generate 16 * 4096 = 64k transactions so
> that the SLRU banks are different for the transaction which inserted
> the procedure row in the system table and the transaction in which we
> are trying to do the group commit.

I have attached a POC patch for testing the group update using the injection point framework. This is just for testing the group update part and is not yet a committable test. I have added a bunch of logs in the code so that we can see what's going on with the group update. From the below logs, we can see that multiple processes are getting accumulated for the group update and the leader is updating their xid status.

Note: With this testing, we have found a bug in the bank-wise approach: basically, we are clearing procglobal->clogGroupFirst even before acquiring the bank lock, which means that in most cases there will be a single process in each group acting as the group leader (I think this is what Alvaro was pointing at in his coverage report). I have added this fix in this POC just for testing purposes, but in my next version I will add this fix to my proper patch version after a proper review and a bit more testing.
here is the output after running the test
==
2023-11-23 05:55:29.399 UTC [93367] 003_clog_group_commit.pl LOG: procno 6 got the lock
2023-11-23 05:55:29.399 UTC [93367] 003_clog_group_commit.pl STATEMENT: SELECT txid_current();
2023-11-23 05:55:29.406 UTC [93369] 003_clog_group_commit.pl LOG: statement: SELECT test_injection_points_attach('ClogGroupCommit', 'wait');
2023-11-23 05:55:29.415 UTC [93371] 003_clog_group_commit.pl LOG: statement: INSERT INTO test VALUES(1);
2023-11-23 05:55:29.416 UTC [93371] 003_clog_group_commit.pl LOG: procno 4 got the lock
2023-11-23 05:55:29.416 UTC [93371] 003_clog_group_commit.pl STATEMENT: INSERT INTO test VALUES(1);
2023-11-23 05:55:29.424 UTC [93373] 003_clog_group_commit.pl LOG: statement: INSERT INTO test VALUES(2);
2023-11-23 05:55:29.425 UTC [93373] 003_clog_group_commit.pl LOG: procno 3 for xid 128742 added for group update
2023-11-23 05:55:29.425 UTC [93373] 003_clog_group_commit.pl STATEMENT: INSERT INTO test VALUES(2);
2023-11-23 05:55:29.431 UTC [93376] 003_clog_group_commit.pl LOG: statement: INSERT INTO test VALUES(3);
2023-11-23 05:55:29.438 UTC [93378] 003_clog_group_commit.pl LOG: statement: INSERT INTO test VALUES(4);
2023-11-23 05:55:29.438 UTC [93376] 003_clog_group_commit.pl LOG: procno 2 for xid 128743 added for group update
2023-11-23 05:55:29.438 UTC [93376] 003_clog_group_commit.pl STATEMENT: INSERT INTO test VALUES(3);
2023-11-23 05:55:29.438 UTC [93376] 003_clog_group_commit.pl LOG: procno 2 is follower and wait for group leader to update commit status of xid 128743
2023-11-23 05:55:29.438 UTC [93376] 003_clog_group_commit.pl STATEMENT: INSERT INTO test VALUES(3);
2023-11-23 05:55:29.439 UTC [93378] 003_clog_group_commit.pl LOG: procno 1 for xid 128744 added for group update
2023-11-23 05:55:29.439 UTC [93378] 003_clog_group_commit.pl STATEMENT: INSERT INTO test VALUES(4);
2023-11-23 05:55:29.439 UTC [93378] 003_clog_group_commit.pl LOG: procno 1 is
Re: Synchronizing slots from primary to standby
On Tue, Nov 21, 2023 at 4:35 PM Drouvot, Bertrand wrote:
>
> On 11/21/23 10:32 AM, shveta malik wrote:
> > On Tue, Nov 21, 2023 at 2:02 PM shveta malik wrote:
> >
> > v37 fails to apply to HEAD due to a recent commit e83aa9f92fdd,
> > rebased the patches. PFA v37_2 patches.
>
> Thanks!
>
> Regarding the promotion flow: If the primary is available and reachable I don't
> think we currently try to ensure that slots are in sync. I think we'd miss the
> activity since the last sync and the promotion request, or am I missing
> something?
>
> If the primary is available and reachable shouldn't we launch a last round of
> synchronization (skipping all the slots that are not in 'r' state)?
>

We may miss the last round, but even if the primary is available there is no guarantee that we can ensure everything is in sync, because after our last sync there could be some more activity. I think it is the user's responsibility to promote a new primary when the old one is not required for some reason. It is not only slots that can be out of sync; we can also miss fetching some of the data.

I think this is quite similar to what we do for WAL, where on finding the promotion signal we shut down walreceiver and just replay any WAL that was already received by walreceiver. Also, the promotion shouldn't create any problem w.r.t. subscribers connecting to the new primary, because the slot's position is slightly behind what could be requested by subscribers, which means the corresponding data will be available on the new primary.

Do you have something in mind that can create any problem if we don't attempt an additional fetching round after the promotion signal is received?

--
With Regards,
Amit Kapila.
Re: [HACKERS] Typo in sequence.c
On Fri, Jan 15, 2016 at 01:18:03PM +0900, Vinayak Pokale wrote:
> Hi,
>
> I found a typo in sequence.c
> Please check the attached patch.

I am not sure how to put this, but the typos are still there seven years after you reported it, so fixed in master.

---

> diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
> index c98f981..25c57f6 100644
> --- a/src/backend/commands/sequence.c
> +++ b/src/backend/commands/sequence.c
> @@ -433,7 +433,7 @@ AlterSequence(AlterSeqStmt *stmt)
> 		aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_CLASS,
> 			   stmt->sequence->relname);
>
> -	/* lock page' buffer and read tuple into new sequence structure */
> +	/* lock page buffer and read tuple into new sequence structure */
> 	seq = read_seq_tuple(elm, seqrel, &buf, &seqtuple);
>
> 	/* Copy old values of options into workspace */
> @@ -582,7 +582,7 @@ nextval_internal(Oid relid)
> 		return elm->last;
> 	}
>
> -	/* lock page' buffer and read tuple */
> +	/* lock page buffer and read tuple */
> 	seq = read_seq_tuple(elm, seqrel, &buf, &seqtuple);
> 	page = BufferGetPage(buf);
>
> @@ -876,7 +876,7 @@ do_setval(Oid relid, int64 next, bool iscalled)
> 	 */
> 	PreventCommandIfParallelMode("setval()");
>
> -	/* lock page' buffer and read tuple */
> +	/* lock page buffer and read tuple */
> 	seq = read_seq_tuple(elm, seqrel, &buf, &seqtuple);
>
> 	if ((next < seq->min_value) || (next > seq->max_value))
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hack...@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
  Bruce Momjian        https://momjian.us
  EDB                  https://enterprisedb.com

  Only you can decide what is important to you.
Re: Lockless exit path for ReplicationOriginExitCleanup
On Wed, Nov 22, 2023 at 3:06 PM Alvaro Herrera wrote:
>
> Hello,
>
> On 2023-Nov-22, Bharath Rupireddy wrote:
>
> > While looking at the use of session_replication_state, I noticed that
> > ReplicationOriginLock is acquired in ReplicationOriginExitCleanup()
> > even if session_replication_state is reset to NULL by
> > replorigin_session_reset(). Why can't there be a lockless exit path,
> > something like [1], similar to replorigin_session_reset(), which
> > checks session_replication_state == NULL without a lock?
>
> I suppose we can do this on consistency grounds -- I'm pretty sure you'd
> have a really hard time proving that this makes a performance difference --

Yes, can't measure the perf gain, however, as said upthread
https://www.postgresql.org/message-id/CALj2ACVVhPn7BVQZLGPVvBrLoDZNRaV0LS9rBpt5y%2Bj%3DxRebWw%40mail.gmail.com,
it avoids unnecessary lock acquisition and release.

> but this patch is incomplete: just two lines below, we're still testing
> session_replication_state for nullness, which would now be dead code.
> Please repair.

Done.

> The comment on session_replication_state is confusing also:
>
> /*
>  * Backend-local, cached element from ReplicationState for use in a backend
>  * replaying remote commits, so we don't have to search ReplicationState for
>  * the backends current RepOriginId.
>  */
>
> My problem with it is that this is not a "cached element", but instead a
> "cached pointer to [shared memory]". This is what makes testing that
> pointer for null-ness doable, but access to each member therein
> requiring lwlock acquisition.

Right. I've reworded the comment a bit. PSA v1 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

v1-0001-Introduce-lockless-exit-path-for-ReplicationOrigi.patch
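The pattern under discussion, checking a backend-local pointer before paying for the lock, can be sketched in a few lines. This is a Python illustration, not PostgreSQL code; the names mirror the C symbols but the mechanics are simplified:

```python
# Sketch of the lockless early-exit: because session_replication_state is a
# backend-local pointer (only the shared data it points to needs the lock),
# a NULL check before LWLockAcquire is safe and skips the lock round-trip.
import threading

lock = threading.Lock()                 # stands in for ReplicationOriginLock
session_replication_state = None        # backend-local pointer stand-in
acquisitions = 0                        # counts what the early exit saves

def replication_origin_exit_cleanup():
    global session_replication_state, acquisitions
    # Lockless fast path: nothing was set up (or it was already reset by
    # replorigin_session_reset()), so there is nothing to clean up.
    if session_replication_state is None:
        return
    with lock:
        acquisitions += 1
        # The lock protects the shared state being released, not the
        # backend-local pointer we tested above.
        session_replication_state = None

replication_origin_exit_cleanup()    # pointer already NULL: no lock taken
assert acquisitions == 0
session_replication_state = object()
replication_origin_exit_cleanup()    # pointer set: one acquisition to clean up
assert acquisitions == 1
```

The key property making this safe is that only the owning backend ever writes the pointer itself, so reading it without the lock cannot race.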
Re: [HACKERS] pg_upgrade vs vacuum_cost_delay
On Thu, Jun 16, 2016 at 04:45:14PM +0200, Magnus Hagander wrote:
> On Thu, Jun 16, 2016 at 4:35 PM, Euler Taveira wrote:
> > On 16-06-2016 09:05, Magnus Hagander wrote:
> > > Shouldn't pg_upgrade turn off vacuum cost delay when it vacuums the new
> > > cluster? Not talking about the post-analyze script, but when it runs
> > > vacuumdb to analyze and freeze before loading the new schema, in
> > > prepare_new_cluster()? Those run during downtime, so it seems like you'd
> > > want those to run as fast as possible.
> >
> > Doesn't --new-options do the job?
>
> You could, but it seems like it should do it by default.

Based on this seven-year-old post, I realized there are minimal directions in the pg_upgrade docs about how to generate statistics quickly, so I created this patch to help.

We do have docs on updating planner statistics:

	https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-STATISTICS

but that doesn't seem to cover cases where you are doing an upgrade or pg_dump restore. Should I move this information into there instead?

--
  Bruce Momjian        https://momjian.us
  EDB                  https://enterprisedb.com

  Only you can decide what is important to you.

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 4f78e0e1c0..c72e69bb67 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -784,6 +784,14 @@ psql --username=postgres --file=script.sql postgres
     of the upgrade. You might need to set connection
     parameters to match your new cluster.
+
+    Using vacuumdb --all --analyze-only can efficiently
+    generate such statistics, and the use of --jobs
+    can speed it up. Option --analyze-in-stages can
+    be used to generate minimal statistics quickly. Non-zero values of
+    vacuum_cost_delay will delay statistics generation.
+
Re: Lockless exit path for ReplicationOriginExitCleanup
On Wed, Nov 22, 2023 at 2:28 PM Amit Kapila wrote:
>
> On Wed, Nov 22, 2023 at 2:12 PM Bharath Rupireddy wrote:
> >
> > While looking at the use of session_replication_state, I noticed that
> > ReplicationOriginLock is acquired in ReplicationOriginExitCleanup()
> > even if session_replication_state is reset to NULL by
> > replorigin_session_reset(). Why can't there be a lockless exit path,
> > something like [1], similar to replorigin_session_reset(), which
> > checks session_replication_state == NULL without a lock?
>
> I don't see any problem with such a check but not sure of the benefit
> of doing so either.

It avoids an unnecessary lock acquisition and release when session_replication_state is already reset by replorigin_session_reset() before reaching ReplicationOriginExitCleanup(). A patch something like [1] and a run of the subscription tests shows that the lock acquisition and release can be avoided 153 times:

ubuntu:~/postgres/src/test/subscription$ grep -ir "with session_replication_state NULL" . | wc -l
153
ubuntu:~/postgres/src/test/subscription$ grep -ir "with session_replication_state not NULL" . | wc -l
174

[1]
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 460e3dcc38..dd3824bd27 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -1056,6 +1056,11 @@ ReplicationOriginExitCleanup(int code, Datum arg)
 {
 	ConditionVariable *cv = NULL;
 
+	if (session_replication_state == NULL)
+		elog(LOG, "In ReplicationOriginExitCleanup() with session_replication_state NULL");
+	else
+		elog(LOG, "In ReplicationOriginExitCleanup() with session_replication_state not NULL");
+
 	LWLockAcquire(ReplicationOriginLock, LW_EXCLUSIVE);
 
 	if (session_replication_state != NULL &&

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: [PATCH] Add CHECK_FOR_INTERRUPTS in scram_SaltedPassword loop.
> I don't think it would be useful to limit this at an arbitrary point, iteration
> count can be set per password and if someone wants a specific password to be
> super-hard to brute force then why should we limit that?

I agree with that. Maybe some users do want a super-hard password. RFC 7677 and RFC 5802 don't specify the maximum number of iterations.

> If we want to add CHECK_FOR_INTERRUPTS inside the loop I think a brief
> comment would be appropriate.

This has been completed in patch v2 and it's ready for review.

Regards
Bowen Shi

From 89c4de0a814d5343c54d9e8501986892fbb4b33e Mon Sep 17 00:00:00 2001
From: bovenshi
Date: Wed, 22 Nov 2023 19:30:56 +0800
Subject: [PATCH] Add CHECK_FOR_INTERRUPTS in scram_SaltedPassword loop.

When the scram_iterations value is set too large, the backend would hang
for a long time. Add CHECK_FOR_INTERRUPTS within the loop of
scram_SaltedPassword to handle any signals received during this period.
---
 src/common/scram-common.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/common/scram-common.c b/src/common/scram-common.c
index ef997ef..bdf40e8 100644
--- a/src/common/scram-common.c
+++ b/src/common/scram-common.c
@@ -15,6 +15,7 @@
  */
 #ifndef FRONTEND
 #include "postgres.h"
+#include "miscadmin.h"
 #else
 #include "postgres_fe.h"
 #endif
@@ -73,6 +74,13 @@ scram_SaltedPassword(const char *password,
 	/* Subsequent iterations */
 	for (i = 2; i <= iterations; i++)
 	{
+		/*
+		 * Allowing this to be interrupted is necessary when scram_iterations
+		 * is set to a large value. However, this only works in the backend.
+		 */
+#ifndef FRONTEND
+		CHECK_FOR_INTERRUPTS();
+#endif
 		if (pg_hmac_init(hmac_ctx, (uint8 *) password, password_len) < 0 ||
 			pg_hmac_update(hmac_ctx, (uint8 *) Ui_prev, key_length) < 0 ||
 			pg_hmac_final(hmac_ctx, Ui, key_length) < 0)
--
2.9.3
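For context on why the loop has no natural stopping point: scram_SaltedPassword implements Hi() from RFC 5802, which is PBKDF2 with HMAC as the PRF, so each of the scram_iterations rounds is one HMAC over the previous block. A Python sketch of the same loop structure (illustrative, not the C code), cross-checked against the standard library's PBKDF2:

```python
import hashlib
import hmac

def hi(password: bytes, salt: bytes, iterations: int, key_length: int = 32) -> bytes:
    """RFC 5802 Hi(): the loop that scram_SaltedPassword runs.

    Every iteration is one HMAC over the previous block, so a huge
    scram_iterations value keeps the CPU busy with nothing to wait on,
    which is the motivation for a CHECK_FOR_INTERRUPTS() in the loop body.
    """
    # First iteration: U1 = HMAC(password, salt || INT(1))
    u_prev = hmac.new(password, salt + b"\x00\x00\x00\x01", hashlib.sha256).digest()
    result = u_prev
    # Subsequent iterations (where the backend patch checks for interrupts)
    for _ in range(2, iterations + 1):
        u_prev = hmac.new(password, u_prev, hashlib.sha256).digest()
        result = bytes(a ^ b for a, b in zip(result, u_prev))
    return result[:key_length]

# Hi() equals PBKDF2-HMAC-SHA256 with a single output block; cross-check:
expected = hashlib.pbkdf2_hmac("sha256", b"secret", b"salt", 4096)
assert hi(b"secret", b"salt", 4096) == expected
```

The cross-check against hashlib.pbkdf2_hmac confirms the per-iteration structure matches the standardized construction the C loop implements.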
Re: Adding a clang-format file
Hi,

Just wondering, would you mind sharing your .clang-format file? I find the attachment you pointed to is only a demo with a "…" line and it doesn't format PG code very well.

btw I'm also greatly in favor of this idea. clang-format is tightly integrated into CLion and it's very convenient. Right now I'm using a CLion preset by Greenplum [1] but it doesn't handle struct fields and variables column alignment very well.

[1] https://groups.google.com/a/greenplum.org/g/gpdb-dev/c/rDYSYotssbE/m/7HgsWuj7AwAJ

> On Sep 28, 2022, at 08:43, Peter Geoghegan wrote:
>
> On Tue, Sep 27, 2022 at 5:35 AM Aleksander Alekseev wrote:
>> Personally I don't have anything against the idea. TimescaleDB uses
>> clang-format to mimic pgindent and it works quite well. One problem
>> worth mentioning though is that the clang-format file is dependent on
>> the particular version of clang-format.
>
> I was hoping that something generic could work here. Something that we
> could provide that didn't claim to be authoritative, that has a
> reasonable degree of compatibility that allows most people to use the
> file without much fuss. Kind of like our .editorconfig file. That
> might not be a realistic goal, though, since the clang-format settings
> are all quite complicated.
>
> --
> Peter Geoghegan
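In case it helps as a starting point, here is a sketch of a .clang-format that approximates pgindent conventions. This is my own rough approximation, not an official project file: the option names are standard clang-format keys, but the values may need tuning and some accept different spellings depending on the clang-format version.

```yaml
# Approximation of PostgreSQL (pgindent) style; values are guesses, not canon.
BasedOnStyle: LLVM
IndentWidth: 4
TabWidth: 4
UseTab: Always
ColumnLimit: 78            # pgindent wraps near 79 columns
BreakBeforeBraces: Allman  # braces on their own line
AlignConsecutiveDeclarations: true
AlignAfterOpenBracket: Align
IndentCaseLabels: true
PointerAlignment: Right    # "char *p", not "char* p"
SortIncludes: false        # PostgreSQL orders includes manually
AllowShortFunctionsOnASingleLine: None
```

As noted upthread, no clang-format configuration fully reproduces pgindent (notably the column alignment of struct fields and local declarations is only approximated by AlignConsecutiveDeclarations), so treat this as a convenience for the editor, with pgindent still authoritative.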
Re: [HACKERS] make async slave to wait for lsn to be replayed
Hi,

I used the latest code and found some conflicts while applying the patch. Which PG version did you rebase it against?

Regards
Bowen Shi
Re: [HACKERS] psql casts aspersions on server reliability
On Wed, Nov 22, 2023 at 07:38:52PM -0500, Tom Lane wrote:
> Bruce Momjian writes:
> > On Wed, Sep 28, 2016 at 09:14:41AM -0400, Tom Lane wrote:
> >> I could go along with just dropping the last sentence ("This probably...")
> >> if the last error we got was FATAL level. I don't find "unexpectedly"
> >> to be problematic here: from the point of view of psql, and probably
> >> of its user, the shutdown *was* unexpected.
>
> > I looked at this thread from 2016 and I think the problem is the
> > "abnormally" word, since if the server was shutdown by the administrator
> > (most likely), it isn't abnormal. Here is a patch to remove
> > "abnormally".
>
> I do not think this is an improvement. The places you are changing
> are reacting to a connection closure. *If* we had previously gotten a
> "FATAL: terminating connection due to administrator command" message,
> then yeah the connection closure is expected; but if not, it isn't.
> Your patch adds no check for that. (As I remarked in 2016, we could
> probably condition this on the elevel being FATAL, rather than
> checking for specific error messages.)

Yes, you are correct. Here is a patch that implements the FATAL test, though I am not sure I have the logic correct or backwards, and I don't know how to test this. Thanks.

--
  Bruce Momjian        https://momjian.us
  EDB                  https://enterprisedb.com

  Only you can decide what is important to you.

diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 660cdec93c..c541fd8b02 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -749,8 +749,10 @@ retry4:
 	 */
 definitelyEOF:
 	libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-							"\tThis probably means the server terminated abnormally\n"
-							"\tbefore or while processing the request.");
+							"\tThis probably means the server terminated%s\n"
+							"\tbefore or while processing the request.",
+							conn->result->resultStatus == PGRES_FATAL_ERROR ?
+							"" : " abnormally");
 
 	/* Come here if lower-level code already set a suitable errorMessage */
 definitelyFailed:
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index f1192d28f2..6c21f91817 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -206,8 +206,10 @@ rloop:
 				if (result_errno == EPIPE || result_errno == ECONNRESET)
 					libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-											"\tThis probably means the server terminated abnormally\n"
-											"\tbefore or while processing the request.");
+											"\tThis probably means the server terminated%s\n"
+											"\tbefore or while processing the request.",
+											conn->result->resultStatus == PGRES_FATAL_ERROR ?
+											"" : " abnormally");
 				else
 					libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
 											SOCK_STRERROR(result_errno,
@@ -306,8 +308,10 @@ pgtls_write(PGconn *conn, const void *ptr, size_t len)
 				result_errno = SOCK_ERRNO;
 				if (result_errno == EPIPE || result_errno == ECONNRESET)
 					libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-											"\tThis probably means the server terminated abnormally\n"
-											"\tbefore or while processing the request.");
+											"\tThis probably means the server terminated%s\n"
+											"\tbefore or while processing the request.",
+											conn->result->resultStatus == PGRES_FATAL_ERROR ?
+											"" : " abnormally");
 				else
 					libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
 											SOCK_STRERROR(result_errno,
diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c
index bd72a87bbb..5e7136195a 100644
--- a/src/interfaces/libpq/fe-secure.c
+++ b/src/interfaces/libpq/fe-secure.c
@@ -233,8 +233,10 @@ pqsecure_raw_read(PGconn *conn, void *ptr, size_t len)
 			case EPIPE:
 			case ECONNRESET:
 				libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-										"\tThis probably means the server terminated abnormally\n"
-										"\tbefore or while processing the request.");
+										"\tThis probably means the server terminated%s\n"
+										"\tbefore or while processing the request.",
+										conn->result->resultStatus == PGRES_FATAL_ERROR ?
+										"" : " abnormally");
 				break;
 			default:
@@ -395,8 +397,11 @@ retry_masked:
 				/* (strdup failure is OK, we'll cope later) */
 				snprintf(msgbuf, sizeof(msgbuf),
 						 libpq_gettext("server closed the connection unexpectedly\n"
-									   "\tThis probably means the server terminated abnormally\n"
-									   "\tbefore or while processing the request."));
+									   "\tThis probably means the server terminated%s\n"
+									   "\tbefore or while processing the request."),
+
Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'
On Sun, Nov 19, 2023 at 9:17 AM Alexander Korotkov wrote:
> It's here. New REALLOCATE_BITMAPSETS forces bitmapset reallocation on
> each modification.

+1 to the idea of introducing a reallocation mode to Bitmapset.

> I had the feeling of falling into a rabbit hole while debugging all
> the cases of failure with this new option. With the second patch
> regressions tests pass.

It seems to me that we have always had situations where we share the same pointer to a Bitmapset structure across different places. I do not think this is a problem as long as we do not modify the Bitmapsets in a way that requires reallocation or impacts the locations sharing the same pointer.

So I'm wondering: instead of attempting to avoid sharing pointers to Bitmapsets in all the locations that have problems, can we simply bms_copy the original Bitmapset within replace_relid() before making any modifications, as I proposed previously? Of course, as Andres pointed out, we need to do so also for the "delete relid without substitution" path.

Please see the attached.

Thanks
Richard

v2-0001-Fix-how-SJE-replaces-join-clauses.patch
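The aliasing hazard being discussed, and why copying before modification fixes it, can be shown with a tiny sketch. Python sets stand in for Bitmapsets here; the function names are illustrative, not the planner's actual code:

```python
# Illustrative sketch: modifying a structure through one of several aliases
# corrupts the other holders, while copy-before-modify (the bms_copy
# approach) keeps the change private to the caller.

def replace_relid_inplace(relids, old_id, new_id):
    """Mutates the shared object: every alias observes the change."""
    relids.discard(old_id)
    if new_id is not None:          # new_id None models "delete relid
        relids.add(new_id)          # without substitution"
    return relids

def replace_relid_copying(relids, old_id, new_id):
    """Copies first (bms_copy analogue): aliases are unaffected."""
    result = set(relids)
    result.discard(old_id)
    if new_id is not None:
        result.add(new_id)
    return result

joinclause_relids = {1, 2}
alias_in_joininfo = joinclause_relids    # two places share one pointer

assert replace_relid_copying(joinclause_relids, 2, 3) == {1, 3}
assert alias_in_joininfo == {1, 2}       # other holder unaffected

replace_relid_inplace(joinclause_relids, 2, 3)
assert alias_in_joininfo == {1, 3}       # other holder silently changed
```

The REALLOCATE_BITMAPSETS mode flushes out exactly the in-place cases: any site relying on a stale alias starts pointing at freed memory once every modification reallocates.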
Re: [HACKERS] Changing references of password encryption to hashing
On Wed, Nov 22, 2023 at 07:01:32PM -0500, Bruce Momjian wrote:
> On Wed, Nov 22, 2023 at 05:55:06PM -0500, Tom Lane wrote:
> > Bruce Momjian writes:
> > > On Wed, Nov 22, 2023 at 12:52:23PM -0800, Andres Freund wrote:
> > >> What's the point of randomly reviving threads from 6 years ago, without any
> > >> further analysis?
> >
> > > Well, I feel like this is an important change, and got dropped because it
> > > was determined to not be a new change in Postgres 10. Still I think it
> > > should be addressed and am looking to see if others feel the same.
> >
> > You're expecting people to re-research something that dropped off the
> > radar years ago, probably because it wasn't sufficiently interesting
> > at the time. If the point hasn't been raised again since then,
> > I'd guess it's not any more interesting now. Most of us have better
> > things to do than go re-read old threads --- and dredging them up
> > ten at a time is not improving the odds that people will care to look.
>
> I can do the work. I was more looking for any feedback that this is a
> valid area to work on. I will do some work on this and publish an
> updated patch because I think using the right terms makes sense.

Let me also add that I apologize for bringing up these old threads. I feel badly I didn't address them years ago, I feel bad for the original posters, and I do think there is some value in addressing some of them, which I think is validated by the many useful doc patches I have made over the past few weeks. I am over half done and hope everyone can be patient with me while I do my best to finish this long-overdue job.

--
  Bruce Momjian        https://momjian.us
  EDB                  https://enterprisedb.com

  Only you can decide what is important to you.
Re: [HACKERS] psql casts aspersions on server reliability
Bruce Momjian writes: > On Wed, Sep 28, 2016 at 09:14:41AM -0400, Tom Lane wrote: >> I could go along with just dropping the last sentence ("This probably...") >> if the last error we got was FATAL level. I don't find "unexpectedly" >> to be problematic here: from the point of view of psql, and probably >> of its user, the shutdown *was* unexpected. > I looked at this thread from 2016 and I think the problem is the > "abnormally" word, since if the server was shutdown by the administrator > (most likely), it isn't abnormal. Here is a patch to remove > "abnormally". I do not think this is an improvement. The places you are changing are reacting to a connection closure. *If* we had previously gotten a "FATAL: terminating connection due to administrator command" message, then yeah the connection closure is expected; but if not, it isn't. Your patch adds no check for that. (As I remarked in 2016, we could probably condition this on the elevel being FATAL, rather than checking for specific error messages.) regards, tom lane
Re: pg_upgrade and logical replication
Here are some review comments for patch v17-0001 == src/bin/pg_dump/pg_dump.c 1. getSubscriptionTables +/* + * getSubscriptionTables + * Get information about subscription membership for dumpable tables. This + *will be used only in binary-upgrade mode and for PG17 or later versions. + */ +void +getSubscriptionTables(Archive *fout) +{ + DumpOptions *dopt = fout->dopt; + SubscriptionInfo *subinfo = NULL; + SubRelInfo *subrinfo; + PQExpBuffer query; + PGresult *res; + int i_srsubid; + int i_srrelid; + int i_srsubstate; + int i_srsublsn; + int ntups; + Oid last_srsubid = InvalidOid; + + if (dopt->no_subscriptions || !dopt->binary_upgrade || + fout->remoteVersion < 17) + return; I still felt that the function comment ("used only in binary-upgrade mode and for PG17 or later") was misleading. IMO that sounds like it would be OK for PG17 regardless of the binary mode, but the code says otherwise. Assuming the code is correct, perhaps the comment should say: "... used only in binary-upgrade mode for PG17 or later versions." ~~~ 2. dumpSubscriptionTable +/* + * dumpSubscriptionTable + * Dump the definition of the given subscription table mapping. This will be + *used only in binary-upgrade mode and for PG17 or later versions. + */ +static void +dumpSubscriptionTable(Archive *fout, const SubRelInfo *subrinfo) (this is the same as the previous review comment #1) Assuming the code is correct, perhaps the comment should say: "... used only in binary-upgrade mode for PG17 or later versions." == src/bin/pg_upgrade/check.c 3. 
+static void +check_old_cluster_subscription_state() +{ + FILE *script = NULL; + char output_path[MAXPGPATH]; + int ntup; + ClusterInfo *cluster = &old_cluster; + + prep_status("Checking for subscription state"); + + snprintf(output_path, sizeof(output_path), "%s/%s", + log_opts.basedir, + "subs_invalid.txt"); + for (int dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++) + { + PGresult *res; + DbInfo *active_db = &cluster->dbarr.dbs[dbnum]; + PGconn *conn = connectToServer(cluster, active_db->db_name); There seems no need for an extra variable ('cluster') here when you can just reference 'old_cluster' directly in the code, the same as other functions in this file do all the time. == Kind Regards, Peter Smith. Fujitsu Australia
Re: [HACKERS] psql casts aspersions on server reliability
On Wed, Sep 28, 2016 at 09:14:41AM -0400, Tom Lane wrote: > Robert Haas writes: > > psql tends to do things like this: > > rhaas=# select * from pg_stat_activity; > > FATAL: terminating connection due to administrator command > > server closed the connection unexpectedly > > This probably means the server terminated abnormally > > before or while processing the request. > > > Basically everything psql has to say about this is a lie: > > I cannot get terribly excited about this. What you seem to be proposing > is that psql try to intuit the reason for connection closure from the > last error message it got, but that seems likely to lead to worse lies > than printing a boilerplate message. > > I could go along with just dropping the last sentence ("This probably...") > if the last error we got was FATAL level. I don't find "unexpectedly" > to be problematic here: from the point of view of psql, and probably > of its user, the shutdown *was* unexpected. I looked at this thread from 2016 and I think the problem is the "abnormally" word, since if the server was shutdown by the administrator (most likely), it isn't abnormal. Here is a patch to remove "abnormally". -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you. 
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 660cdec93c..634708d716 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -749,7 +749,7 @@ retry4:
 	 */
 definitelyEOF:
 	libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-							"\tThis probably means the server terminated abnormally\n"
+							"\tThis probably means the server terminated\n"
 							"\tbefore or while processing the request.");
 
 	/* Come here if lower-level code already set a suitable errorMessage */
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index f1192d28f2..115776ce6c 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -206,7 +206,7 @@ rloop:
 			if (result_errno == EPIPE || result_errno == ECONNRESET)
 				libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-										"\tThis probably means the server terminated abnormally\n"
+										"\tThis probably means the server terminated\n"
 										"\tbefore or while processing the request.");
 			else
 				libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
@@ -306,7 +306,7 @@ pgtls_write(PGconn *conn, const void *ptr, size_t len)
 			result_errno = SOCK_ERRNO;
 			if (result_errno == EPIPE || result_errno == ECONNRESET)
 				libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-										"\tThis probably means the server terminated abnormally\n"
+										"\tThis probably means the server terminated\n"
 										"\tbefore or while processing the request.");
 			else
 				libpq_append_conn_error(conn, "SSL SYSCALL error: %s",
diff --git a/src/interfaces/libpq/fe-secure.c b/src/interfaces/libpq/fe-secure.c
index bd72a87bbb..b972bd3ced 100644
--- a/src/interfaces/libpq/fe-secure.c
+++ b/src/interfaces/libpq/fe-secure.c
@@ -233,7 +233,7 @@ pqsecure_raw_read(PGconn *conn, void *ptr, size_t len)
 			case EPIPE:
 			case ECONNRESET:
 				libpq_append_conn_error(conn, "server closed the connection unexpectedly\n"
-										"\tThis probably means the server terminated abnormally\n"
+										"\tThis probably means the server terminated\n"
 										"\tbefore or while processing the request.");
 				break;
@@ -395,7 +395,7 @@ retry_masked:
 	/* (strdup failure is OK, we'll cope later) */
 	snprintf(msgbuf, sizeof(msgbuf),
 			 libpq_gettext("server closed the connection unexpectedly\n"
-						   "\tThis probably means the server terminated abnormally\n"
+						   "\tThis probably means the server terminated\n"
 						   "\tbefore or while processing the request."));
 	/* keep newline out of translated string */
 	strlcat(msgbuf, "\n", sizeof(msgbuf));
Re: [HACKERS] Changing references of password encryption to hashing
On Wed, Nov 22, 2023 at 05:55:06PM -0500, Tom Lane wrote: > Bruce Momjian writes: > > On Wed, Nov 22, 2023 at 12:52:23PM -0800, Andres Freund wrote: > >> What's the point of randomly reviving threads from 6 years ago, without any > >> further analysis? > > > Well, I feel like this is an important change, and got dropped because it > > was determined to not be a new change in Postgres 10. Still I think it > > should be addressed and am looking to see if others feel the same. > > You're expecting people to re-research something that dropped off the > radar years ago, probably because it wasn't sufficiently interesting > at the time. If the point hasn't been raised again since then, > I'd guess it's not any more interesting now. Most of us have better > things to do than go re-read old threads --- and dredging them up > ten at a time is not improving the odds that people will care to look. I can do the work. I was more looking for any feedback that this is a valid area to work on. I will do some work on this and publish an updated patch because I think using the right terms makes sense. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: Optionally using a better backtrace library?
On Tue, Sep 5, 2023 at 2:59 AM Alvaro Herrera wrote: > Much appreciated! I can put this to good use. I was just reminded of how our existing backtrace support is lacklustre. Are you planning on submitting a patch for this? -- Peter Geoghegan
Re: common signal handler protection
Hi, On 2023-11-22 15:59:44 -0600, Nathan Bossart wrote: > Subject: [PATCH v2 1/3] Check that MyProcPid == getpid() in all signal > handlers. > > In commit 97550c0711, we added a similar check to the SIGTERM > handler for the startup process. This commit adds this check to > all signal handlers installed with pqsignal(). This is done by > using a wrapper function that performs the check before calling the > actual handler. > > The hope is that this will offer more general protection against > grandchildren processes inadvertently modifying shared memory due > to inherited signal handlers. It's a bit unclear here what grandchildren refers to - it's presumably in reference to postmaster, but the preceding text doesn't even mention postmaster. I'd probably just say "child processes of the current process". > + > +#ifdef PG_SIGNAL_COUNT /* Windows */ > +#define PG_NSIG (PG_SIGNAL_COUNT) > +#elif defined(NSIG) > +#define PG_NSIG (NSIG) > +#else > +#define PG_NSIG (64) /* XXX: wild guess */ > +#endif Perhaps worth adding a static assert for at least a few common types of signals being below that value? That way we'd see a potential issue without needing to reach the code path. > +/* > + * Except when called with SIG_IGN or SIG_DFL, pqsignal() sets up this > function > + * as the handler for all signals. This wrapper handler function checks that > + * it is called within a process that the server knows about, and not a > + * grandchild process forked by system(3), etc. Similar comment to earlier - the grandchildren bit seems like a dangling reference. And also too specific - I think we could encounter this in single user mode as well? Perhaps it could be reframed to "postgres processes, as determined by having called InitProcessGlobals()"? >This check ensures that such > + * grandchildren processes do not modify shared memory, which could be detrimental.
"could be" seems a bit euphemistic :) > From b77da9c54590a71eb9921d6f16bf4ffb0543e87e Mon Sep 17 00:00:00 2001 > From: Nathan Bossart > Date: Fri, 17 Nov 2023 14:00:12 -0600 > Subject: [PATCH v2 2/3] Centralize logic for restoring errno in signal > handlers. > > Presently, we rely on each individual signal handler to save the > initial value of errno and then restore it before returning if > needed. This is easily forgotten and, if missed, often goes > undetected for a long time. It's also just verbose :) > From 5734e0cf5f00bbd74504b45934f68e1c2c1290bd Mon Sep 17 00:00:00 2001 > From: Nathan Bossart > Date: Fri, 17 Nov 2023 22:09:24 -0600 > Subject: [PATCH v2 3/3] Revert "Avoid calling proc_exit() in processes forked > by system()." > > Thanks to commit XX, this check in the SIGTERM handler for > the startup process is now obsolete and can be removed. Instead of > leaving around the elog(PANIC, ...) calls that are now unlikely to > be triggered and the dead function write_stderr_signal_safe(), I've > opted to just remove them for now. Thanks to modern version > control software, it will be trivial to dig those up if they are > ever needed in the future. > > This reverts commit 97550c0711972a9856b5db751539bbaf2f4c. > > Reviewed-by: ??? > Discussion: ??? 
> --- > src/backend/postmaster/startup.c | 17 + > src/backend/storage/ipc/ipc.c| 4 > src/backend/storage/lmgr/proc.c | 8 > src/backend/utils/error/elog.c | 28 > src/include/utils/elog.h | 6 -- > 5 files changed, 1 insertion(+), 62 deletions(-) > > diff --git a/src/backend/postmaster/startup.c > b/src/backend/postmaster/startup.c > index 83dbed86b9..f40acd20ff 100644 > --- a/src/backend/postmaster/startup.c > +++ b/src/backend/postmaster/startup.c > @@ -19,8 +19,6 @@ > */ > #include "postgres.h" > > -#include > - > #include "access/xlog.h" > #include "access/xlogrecovery.h" > #include "access/xlogutils.h" > @@ -113,20 +111,7 @@ static void > StartupProcShutdownHandler(SIGNAL_ARGS) > { > if (in_restore_command) > - { > - /* > - * If we are in a child process (e.g., forked by system() in > - * RestoreArchivedFile()), we don't want to call any exit > callbacks. > - * The parent will take care of that. > - */ > - if (MyProcPid == (int) getpid()) > - proc_exit(1); > - else > - { > - write_stderr_signal_safe("StartupProcShutdownHandler() > called in child process\n"); > - _exit(1); > - } > - } > + proc_exit(1); > else > shutdown_requested = true; > WakeupRecovery(); Hm. I wonder if this indicates an issue. Do the preceding changes perhaps make it more likely that a child process of the startup process could hang around for longer, because the signal we're delivering doesn't terminate child processes, because
Re: [HACKERS] Changing references of password encryption to hashing
Bruce Momjian writes: > On Wed, Nov 22, 2023 at 12:52:23PM -0800, Andres Freund wrote: >> What's the point of randomly reviving threads from 6 years ago, without any >> further analysis? > Well, I feel like this is an important change, and got dropped because it > was determined to not be a new change in Postgres 10. Still I think it > should be addressed and am looking to see if others feel the same. You're expecting people to re-research something that dropped off the radar years ago, probably because it wasn't sufficiently interesting at the time. If the point hasn't been raised again since then, I'd guess it's not any more interesting now. Most of us have better things to do than go re-read old threads --- and dredging them up ten at a time is not improving the odds that people will care to look. regards, tom lane
Re: Change GUC hashtable to use simplehash?
Hi, On 2023-11-22 16:27:56 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2023-11-22 15:56:21 -0500, Tom Lane wrote: > >> GUC names are just about always short, though, so I'm not sure you've > >> made your point? > > > With short I meant <= 6 characters (32 / 5 = 6.x). After that you're > > overwriting bits that you previously set, without dispersing the > > "overwritten" > > bits throughout the hash state. > > I'm less than convinced about the "overwrite" part: > > + /* Merge into hash ... not very bright, but it needn't be */ > + result = pg_rotate_left32(result, 5); > + result ^= (uint32) ch; > > Rotating a 32-bit value 5 bits at a time doesn't result in successive > characters lining up exactly, and even once they do, XOR is not > "overwrite". I didn't know what word to use, hence the air quotes. Yes, xor doesn't just set the bits of the right hand side, but it only affects data on a per-bit basis, which easily can be cancelled out. My understanding of writing hash functions is that every added bit mixed in should have a ~50% chance of causing each other bit to flip. The proposed function obviously doesn't get there. It's worth noting that the limited range of the input values means that there's a lot of bias toward some bits being set ('a' to 'z' all start with 0b011). > I'm pretty dubious that we need something better than this. Well, we know that the current attempt at a dedicated hash function for this does result in substantial amounts of conflicts. And it's hard to understand such cases when you hit them, so I think it's better to avoid exposing ourselves to such dangers, without a distinct need. And I don't really see the need here to risk it, even if we are somewhat confident it's fine. If, which I mildly doubt, we can't afford to call murmurhash32 for every character, we could just call it for 32/5 input characters together. Or we could just load up to 8 characters into a 64-bit integer and call murmurhash64.
Something roughly like uint64 result = 0; while (*name) { uint64 value = 0; for (int i = 0; i < 8 && *name; i++) { char ch = *name++; value = (value << 8) | (uint8) ch; } result = hash_combine64(result, murmurhash64(value)); } The hash_combine use isn't quite right either, we should use the full accumulator state of a proper hash function, but it seems very unlikely to matter here. The fact that string_hash() is slow due to the strlen(), which causes us to process the input twice and which is optimized to also handle very long strings which typically string_hash() doesn't encounter, seems problematic far beyond this case. We use string_hash() in a *lot* of places, and that strlen() does regularly show up in profiles. We should fix that. The various hash functions being external functions also shows up in a bunch of profiles too. It's particularly ridiculous for cases like tag_hash(), where the caller typically knows the length, but calls a routine in a different translation unit, which obviously can't be optimized for a specific length. I think we ought to adjust our APIs around this: 1) The accumulator state of the hash functions should be exposed, so one can accumulate values into the hash state, without reducing the internal state to a single 32/64 bit variable. 2) For callers that know the length of data, we should use a static inline hash function, rather than an external function call. This should include special cased inline functions for adding 32/64bit of data to the hash state. Perhaps with a bit of logic to *not* use the inline version if the hashed input is long (and thus the call overhead doesn't matter). Something like if (__builtin_constant_p(len) && len < 128) /* call inline implementation */ else /* call out of line implementation, not worth the code bloat */ We know that hash functions should have the split into init/process data*/finish steps, as e.g. evidenced by pg_crc.h/pg_crc32.h.
With something like that, you could write a function that lowercases characters inline without incurring unnecessary overhead. hash32_state hs; hash32_init(&hs); while (*name) { char ch = *name++; /* crappy lowercase for this situation */ ch |= 0x20; hash32_process_byte(&hs, ch); } return hash32_finish(&hs); Perhaps with some additional optimization for processing the input string in 32/64 bit quantities. Greetings, Andres Freund
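The init/process/finish shape Andres sketches can be made concrete and runnable. The hash32_* API above is hypothetical, so the sketch below uses FNV-1a as a stand-in mixer; the `ch |= 0x20` trick lowercases correctly only for ASCII letters, exactly as the mail's comment warns.

```c
#include <stdint.h>

/* Incremental hash state in the init/process/finish style discussed above.
 * FNV-1a constants are used as a simple, well-known stand-in mixer; the
 * hash32_* names mirror the mail's hypothetical API. */
typedef struct
{
    uint32_t state;
} hash32_state;

static inline void
hash32_init(hash32_state *hs)
{
    hs->state = 2166136261u;    /* FNV-1a 32-bit offset basis */
}

static inline void
hash32_process_byte(hash32_state *hs, unsigned char ch)
{
    hs->state = (hs->state ^ ch) * 16777619u;   /* FNV-1a 32-bit prime */
}

static inline uint32_t
hash32_finish(const hash32_state *hs)
{
    return hs->state;
}

/* Hash a GUC-style name case-insensitively, lowercasing inline so the
 * input is traversed exactly once (no strlen(), no separate lowercase
 * pass). */
uint32_t
hash_guc_name(const char *name)
{
    hash32_state hs;

    hash32_init(&hs);
    while (*name)
    {
        unsigned char ch = (unsigned char) *name++;

        ch |= 0x20;             /* crappy ASCII-only lowercase, per the mail */
        hash32_process_byte(&hs, ch);
    }
    return hash32_finish(&hs);
}
```

Because the same `ch |= 0x20` transform is applied to every byte, any two spellings of a name that differ only in ASCII letter case hash identically.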
Re: Stop the search once replication origin is found
On Wed, Nov 22, 2023 at 7:49 PM Amit Kapila wrote: > > On Mon, Nov 20, 2023 at 4:36 PM Amit Kapila wrote: > > > > On Mon, Nov 20, 2023 at 2:36 PM Antonin Houska wrote: > > > > > > Although it's not performance-critical, I think it just makes sense to > > > break > > > the loop in replorigin_session_setup() as soon as we've found the origin. > > > > > > > Your proposal sounds reasonable to me. > > > > Pushed, thanks for the patch! > > -- Hi, While reviewing the replorigin_session_setup() fix [1] that was pushed yesterday, I saw that some nearby code in that same function might benefit from some refactoring. I'm not sure if you want to modify it or not, but FWIW I think the code can be tidied by making the following changes: ~~~ 1. else if (curstate->acquired_by != 0 && acquired_by == 0) { ereport(ERROR, (errcode(ERRCODE_OBJECT_IN_USE), errmsg("replication origin with ID %d is already active for PID %d", curstate->roident, curstate->acquired_by))); } 1a. AFAICT the above code doesn't need to be else/if 1b. The brackets are unnecessary for a single statement. ~~~ 2. if (session_replication_state == NULL && free_slot == -1) ereport(ERROR, (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED), errmsg("could not find free replication state slot for replication origin with ID %d", node), errhint("Increase max_replication_slots and try again."))); else if (session_replication_state == NULL) { /* initialize new slot */ session_replication_state = &replication_states[free_slot]; Assert(session_replication_state->remote_lsn == InvalidXLogRecPtr); Assert(session_replication_state->local_lsn == InvalidXLogRecPtr); session_replication_state->roident = node; } The above code can be improved by combining these within a single check for session_replication_state being NULL. ~~~ 3. There are some unnecessary double-blank lines. ~~~ 4.
/* ok, found slot */ session_replication_state = curstate; break; QUESTION: That 'session_replication_state' is a global variable, but there is more validation logic that comes *after* this assignment which might decide there was some problem and cause an ereport or elog. In practice, maybe it makes no difference, but it did seem slightly dubious to me to assign to a global before determining everything is OK. Thoughts? ~~~ Anyway, PSA a patch for the 1-3 above. == [1] https://www.postgresql.org/message-id/flat/2694.1700471273%40antos Kind Regards, Peter Smith. Fujitsu Australia v1-0001-replorigin_session_setup-refactoring.patch Description: Binary data
Re: common signal handler protection
On Tue, Nov 21, 2023 at 04:40:06PM -0600, Nathan Bossart wrote: > cfbot seems unhappy with this on Windows. IIUC we need to use > PG_SIGNAL_COUNT there instead, but I'd like to find a way to have just one > macro for all platforms. Here's an attempt at fixing the Windows build. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From fae0d1855c74aa13574943370ffbe79f768bdb01 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Fri, 17 Nov 2023 21:38:18 -0600 Subject: [PATCH v2 1/3] Check that MyProcPid == getpid() in all signal handlers. In commit 97550c0711, we added a similar check to the SIGTERM handler for the startup process. This commit adds this check to all signal handlers installed with pqsignal(). This is done by using a wrapper function that performs the check before calling the actual handler. The hope is that this will offer more general protection against grandchildren processes inadvertently modifying shared memory due to inherited signal handlers. Another potential follow-up improvement is to use this wrapper handler function to restore errno instead of relying on each individual handler function to do so. This commit makes the changes in commit 97550c0711 obsolete but leaves reverting it for a follow-up commit. Reviewed-by: ??? Discussion: ??? 
--- src/port/pqsignal.c | 74 +++-- 1 file changed, 72 insertions(+), 2 deletions(-) diff --git a/src/port/pqsignal.c b/src/port/pqsignal.c index 83d876db6c..0baf8eed11 100644 --- a/src/port/pqsignal.c +++ b/src/port/pqsignal.c @@ -28,23 +28,87 @@ #include "c.h" #include +#ifndef FRONTEND +#include +#endif #ifndef FRONTEND #include "libpq/pqsignal.h" +#include "miscadmin.h" +#endif + +#ifdef PG_SIGNAL_COUNT /* Windows */ +#define PG_NSIG (PG_SIGNAL_COUNT) +#elif defined(NSIG) +#define PG_NSIG (NSIG) +#else +#define PG_NSIG (64) /* XXX: wild guess */ +#endif + +static volatile pqsigfunc pqsignal_handlers[PG_NSIG]; + +/* + * Except when called with SIG_IGN or SIG_DFL, pqsignal() sets up this function + * as the handler for all signals. This wrapper handler function checks that + * it is called within a process that the server knows about, and not a + * grandchild process forked by system(3), etc. This check ensures that such + * grandchildren processes do not modify shared memory, which could be + * detrimental. If the check succeeds, the function originally provided to + * pqsignal() is called. Otherwise, the default signal handler is installed + * and then called. + */ +static void +wrapper_handler(SIGNAL_ARGS) +{ +#ifndef FRONTEND + Assert(MyProcPid); + + if (unlikely(MyProcPid != (int) getpid())) + { + pqsignal(postgres_signal_arg, SIG_DFL); + raise(postgres_signal_arg); + return; + } #endif + (*pqsignal_handlers[postgres_signal_arg]) (postgres_signal_arg); +} + /* * Set up a signal handler, with SA_RESTART, for signal "signo" * * Returns the previous handler. + * + * NB: If called within a signal handler, race conditions may lead to bogus + * return values. You should either avoid calling this within signal handlers + * or ignore the return value. + * + * XXX: Since no in-tree callers use the return value, and there is little + * reason to do so, it would be nice if we could convert this to a void + * function instead of providing potentially-bogus return values. 
+ * Unfortunately, that requires modifying the pqsignal() in legacy-pqsignal.c, + * which in turn requires an SONAME bump, which is probably not worth it. */ pqsigfunc pqsignal(int signo, pqsigfunc func) { + pqsigfunc orig_func = pqsignal_handlers[signo]; /* assumed atomic */ #if !(defined(WIN32) && defined(FRONTEND)) struct sigaction act, oact; +#else + pqsigfunc ret; +#endif + + Assert(signo < PG_NSIG); + if (func != SIG_IGN && func != SIG_DFL) + { + pqsignal_handlers[signo] = func; /* assumed atomic */ + func = wrapper_handler; + } + +#if !(defined(WIN32) && defined(FRONTEND)) act.sa_handler = func; sigemptyset(&act.sa_mask); act.sa_flags = SA_RESTART; @@ -54,9 +118,15 @@ pqsignal(int signo, pqsigfunc func) #endif if (sigaction(signo, &act, &oact) < 0) return SIG_ERR; - return oact.sa_handler; + else if (oact.sa_handler == wrapper_handler) + return orig_func; + else + return oact.sa_handler; #else /* Forward to Windows native signal system. */ - return signal(signo, func); + if ((ret = signal(signo, func)) == wrapper_handler) + return orig_func; + else + return ret; #endif } -- 2.25.1 >From b77da9c54590a71eb9921d6f16bf4ffb0543e87e Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Fri, 17 Nov 2023 14:00:12 -0600 Subject: [PATCH v2 2/3] Centralize logic for restoring errno in signal handlers. Presently, we rely on each individual signal handler to save the initial value of errno and then restore it before returning if needed. This is easily forgotten and, if missed, often goes undetected for a long time. In commit XX, we introduced a wrapper signal
Re: psql not responding to SIGINT upon db reconnection
On Wed Nov 22, 2023 at 3:00 PM CST, Heikki Linnakangas wrote: On 22/11/2023 19:29, Tristan Partin wrote: > On Thu Nov 16, 2023 at 8:33 AM CST, Heikki Linnakangas wrote: >> On 06/11/2023 19:16, Tristan Partin wrote: > That sounds like a much better solution. Attached you will find a v4 > that implements your suggestion. Please let me know if there is > something that I missed. I can confirm that the patch works. >> >> This patch is missing a select(). It will busy loop until the connection >> is established or cancelled. > > If I add a wait (select, poll, etc.), then I can't control-C during the > blocking call, so it doesn't really solve the problem. Hmm, they should return with EINTR on signal. At least on Linux; I'm not sure how portable that is. See signal(7) man page, section "Interruption of system calls and library functions by signal handlers". You could also use a timeout like 5 s to ensure that you wake up and notice that the signal was received eventually, even if it doesn't interrupt the blocking call. Ha, you're right. I had this working yesterday, but convinced myself it didn't. I had a do while loop wrapping the blocking call. Here is a v4, which seems to pass the tests that were pointed out to be failing earlier. Noticed that I copy-pasted pqSocketPoll() into the psql code. I think this may be controversial. Not sure what the best solution to the issue is. I will pay attention to the buildfarm animals when they pick this up. -- Tristan Partin Neon (https://neon.tech) From 1b4303df1200c561cf478f9c7037ff940f6cd741 Mon Sep 17 00:00:00 2001 From: Tristan Partin Date: Mon, 24 Jul 2023 11:12:59 -0500 Subject: [PATCH v4] Allow SIGINT to cancel psql database reconnections After installing the SIGINT handler in psql, SIGINT can no longer cancel database reconnections. For instance, if the user starts a reconnection and then needs to do some form of interaction (ie psql is polling), there is no way to cancel the reconnection process currently. 
Use PQconnectStartParams() in order to insert a CancelRequested check into the polling loop. --- src/bin/psql/command.c | 129 - 1 file changed, 128 insertions(+), 1 deletion(-) diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 82cc091568..588eed9351 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -11,6 +11,11 @@ #include #include #include +#if HAVE_POLL +#include +#else +#include +#endif #ifndef WIN32 #include /* for stat() */ #include /* for setitimer() */ @@ -40,6 +45,7 @@ #include "large_obj.h" #include "libpq-fe.h" #include "libpq/pqcomm.h" +#include "libpq/pqsignal.h" #include "mainloop.h" #include "portability/instr_time.h" #include "pqexpbuffer.h" @@ -3251,6 +3257,90 @@ copy_previous_query(PQExpBuffer query_buf, PQExpBuffer previous_buf) return false; } +/* + * Check a file descriptor for read and/or write data, possibly waiting. + * If neither forRead nor forWrite are set, immediately return a timeout + * condition (without waiting). Return >0 if condition is met, 0 + * if a timeout occurred, -1 if an error or interrupt occurred. + * + * Timeout is infinite if end_time is -1. Timeout is immediate (no blocking) + * if end_time is 0 (or indeed, any time before now). 
+ */ +static int +pqSocketPoll(int sock, int forRead, int forWrite, time_t end_time) +{ + /* We use poll(2) if available, otherwise select(2) */ +#ifdef HAVE_POLL + struct pollfd input_fd; + int timeout_ms; + + if (!forRead && !forWrite) + return 0; + + input_fd.fd = sock; + input_fd.events = POLLERR; + input_fd.revents = 0; + + if (forRead) + input_fd.events |= POLLIN; + if (forWrite) + input_fd.events |= POLLOUT; + + /* Compute appropriate timeout interval */ + if (end_time == ((time_t) -1)) + timeout_ms = -1; + else + { + time_t now = time(NULL); + + if (end_time > now) + timeout_ms = (end_time - now) * 1000; + else + timeout_ms = 0; + } + + return poll(&input_fd, 1, timeout_ms); +#else /* !HAVE_POLL */ + + fd_set input_mask; + fd_set output_mask; + fd_set except_mask; + struct timeval timeout; + struct timeval *ptr_timeout; + + if (!forRead && !forWrite) + return 0; + + FD_ZERO(&input_mask); + FD_ZERO(&output_mask); + FD_ZERO(&except_mask); + if (forRead) + FD_SET(sock, &input_mask); + + if (forWrite) + FD_SET(sock, &output_mask); + FD_SET(sock, &except_mask); + + /* Compute appropriate timeout interval */ + if (end_time == ((time_t) -1)) + ptr_timeout = NULL; + else + { + time_t now = time(NULL); + + if (end_time > now) + timeout.tv_sec = end_time - now; + else + timeout.tv_sec = 0; + timeout.tv_usec = 0; + ptr_timeout = &timeout; + } + + return select(sock + 1, &input_mask, &output_mask, + &except_mask, ptr_timeout); +#endif /* HAVE_POLL */ +} + /* * Ask the user for a password; 'username' is the username the * password is for, if one has been explicitly specified. @@ -3324,6 +3414,7 @@ do_connect(enum trivalue reuse_previous_specification, bool keep_password =
Re: [HACKERS] Changing references of password encryption to hashing
On Wed, Nov 22, 2023 at 12:52:23PM -0800, Andres Freund wrote: > On 2023-11-21 22:43:48 -0500, Bruce Momjian wrote: > > Is there any interest in fixing our documentation that says encrypted > > when it means hashed? Should I pursue this? > > What's the point of randomly reviving threads from 6 years ago, without any > further analysis? Well, I feel like this is an important change, and got dropped because it was determined to not be a new change in Postgres 10. Still I think it should be addressed and am looking to see if others feel the same. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: Change GUC hashtable to use simplehash?
Andres Freund writes: > On 2023-11-22 15:56:21 -0500, Tom Lane wrote: >> GUC names are just about always short, though, so I'm not sure you've >> made your point? > With short I meant <= 6 characters (32 / 5 = 6.x). After that you're > overwriting bits that you previously set, without dispersing the "overwritten" > bits throughout the hash state. I'm less than convinced about the "overwrite" part: + /* Merge into hash ... not very bright, but it needn't be */ + result = pg_rotate_left32(result, 5); + result ^= (uint32) ch; Rotating a 32-bit value 5 bits at a time doesn't result in successive characters lining up exactly, and even once they do, XOR is not "overwrite". I'm pretty dubious that we need something better than this. regards, tom lane
Re: Change GUC hashtable to use simplehash?
Hi, On 2023-11-22 15:56:21 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2023-11-21 16:42:55 +0700, John Naylor wrote: > >> The strlen call required for hashbytes() is not free. The lack of > >> mixing in the (probably inlined after 0001) previous hash function can > >> be remedied directly, as in the attached: > > > I doubt this is a good hashfunction. For short strings, sure, but after > > that... I don't think it makes sense to reduce the internal state of a hash > > function to something this small. > > GUC names are just about always short, though, so I'm not sure you've > made your point? With short I meant <= 6 characters (32 / 5 = 6.x). After that you're overwriting bits that you previously set, without dispersing the "overwritten" bits throughout the hash state. It's pretty easy to create conflicts this way, even just on paper. E.g. I think abcdefgg and cbcdefgw would have the same hash, because the accumulated value passed to murmurhash32 is the same. The fact that this happens when a large part of the string is the same is bad, because it makes it more likely that prefixed strings trigger such conflicts, and they're obviously common with GUC strings. Greetings, Andres Freund
Re: Partial aggregates pushdown
On Wed, Nov 22, 2023 at 10:16:16AM +, fujii.y...@df.mitsubishielectric.co.jp wrote: > 2. Approach 2 > (1) Advantages > (a) No need to add partial aggregate functions to the catalogs for each > aggregation > (2) Disadvantages > (a) Need to add non-standard keywords to the SQL syntax. > > I did not choose Approach 2 because I was not confident that the disadvantage > mentioned in 2.(2)(a) > would be accepted by the PostgreSQL development community. > If it is accepted, I think Approach 2 is smarter. > Could you please provide your opinion on which > approach is preferable after comparing these two approaches? I didn't know #2 was possible, but given the great number of catalog entries, doing it in the SQL grammar seems cleaner to me. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: CRC32C Parallel Computation Optimization on ARM
On Wed, Nov 22, 2023 at 10:16:44AM +, Xiang Gao wrote: > On Date: Fri, 10 Nov 2023 10:36:08AM -0600, Nathan Bossart wrote: >>+__attribute__((target("+crc+crypto"))) >> >>I'm not sure we can assume that all compilers will understand this, and I'm >>not sure we need it. > > CFLAGS_CRC is "-march=armv8-a+crc". Generally, if -march is supported, > __attribute__ is also supported. IMHO we should stick with CFLAGS_CRC for now. If we want to switch to using __attribute__((target("..."))), I think we should do so in a separate patch. We are cautious about checking the availability of an attribute before using it (see c.h), and IIUC we'd need to verify that this works for all supported compilers that can target ARM before removing CFLAGS_CRC here. > In addition, I am not sure about the source file pg_crc32c_armv8.c, if > CFLAGS_CRC and CFLAGS_CRYPTO are needed at the same time, how should it > be expressed in the makefile? pg_crc32c_armv8.o: CFLAGS += ${CFLAGS_CRC} ${CFLAGS_CRYPTO} -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: psql not responding to SIGINT upon db reconnection
On 22/11/2023 19:29, Tristan Partin wrote: On Thu Nov 16, 2023 at 8:33 AM CST, Heikki Linnakangas wrote: On 06/11/2023 19:16, Tristan Partin wrote: That sounds like a much better solution. Attached you will find a v4 that implements your suggestion. Please let me know if there is something that I missed. I can confirm that the patch works. This patch is missing a select(). It will busy loop until the connection is established or cancelled. If I add a wait (select, poll, etc.), then I can't control-C during the blocking call, so it doesn't really solve the problem. Hmm, they should return with EINTR on signal. At least on Linux; I'm not sure how portable that is. See signal(7) man page, section "Interruption of system calls and library functions by signal handlers". You could also use a timeout like 5 s to ensure that you wake up and notice that the signal was received eventually, even if it doesn't interrupt the blocking call. -- Heikki Linnakangas Neon (https://neon.tech)
Re: Change GUC hashtable to use simplehash?
Andres Freund writes: > On 2023-11-21 16:42:55 +0700, John Naylor wrote: >> The strlen call required for hashbytes() is not free. The lack of >> mixing in the (probably inlined after 0001) previous hash function can >> be remedied directly, as in the attached: > I doubt this is a good hashfunction. For short strings, sure, but after > that... I don't think it makes sense to reduce the internal state of a hash > function to something this small. GUC names are just about always short, though, so I'm not sure you've made your point? At worst, maybe this with 64-bit state instead of 32? regards, tom lane
Re: [HACKERS] Changing references of password encryption to hashing
On 2023-11-21 22:43:48 -0500, Bruce Momjian wrote: > Is there any interest in fixing our documentation that says encrypted > when it means hashed? Should I pursue this? What's the point of randomly reviving threads from 6 years ago, without any further analysis?
Re: Change GUC hashtable to use simplehash?
Hi, On 2023-11-21 16:42:55 +0700, John Naylor wrote: > I get a noticeable regression in 0002, though, and I think I see why: > > guc_name_hash(const char *name) > { > - uint32 result = 0; > + const unsigned char *bytes = (const unsigned char *)name; > + int blen = strlen(name); > > The strlen call required for hashbytes() is not free. The lack of > mixing in the (probably inlined after 0001) previous hash function can > be remedied directly, as in the attached: I doubt this is a good hashfunction. For short strings, sure, but after that... I don't think it makes sense to reduce the internal state of a hash function to something this small. Greetings, Andres Freund
Re: remaining sql/json patches
Hi,

On 2023-11-21 12:52:35 +0900, Amit Langote wrote:
> version      gram.o text bytes  %change   gram.c bytes  %change
>
> 9.6          534010     -        2108984     -
> 10           582554    9.09      2258313    7.08
> 11           584596    0.35      2313475    2.44
> 12           590957    1.08      2341564    1.21
> 13           590381   -0.09      2357327    0.67
> 14           600707    1.74      2428841    3.03
> 15           633180    5.40      2495364    2.73
> 16           653464    3.20      2575269    3.20
> 17-sqljson   672800    2.95      2709422    3.97
>
> So if we put SQL/JSON (including JSON_TABLE()) into 17, we end up with a
> gram.o 2.95% larger than v16, which granted is a somewhat larger bump, though
> also smaller than with some of recent releases.

I think it's ok to increase the size if the increases are necessary - but I also think we've been a bit careless at times, and that that has made the parser slower. There's probably also some "infrastructure" work we could do to combat some of the growth too.

I know I triggered the use of the .c bytes and text size, but it'd probably be more sensible to look at the size of the important tables generated by bison. I think the most relevant defines are:

#define YYLAST 117115
#define YYNTOKENS 521
#define YYNNTS 707
#define YYNRULES 3300
#define YYNSTATES 6255
#define YYMAXUTOK 758

I think a lot of the reason we end up with such a big "state transition" space is that a single addition to e.g. col_name_keyword or unreserved_keyword increases the state space substantially, because it adds new transitions to so many places. We're in quadratic territory, I think. We might be able to do some lexer hackery to avoid that, but not sure.

Greetings, Andres Freund
Re: Parallel CREATE INDEX for BRIN indexes
On 11/20/23 20:48, Matthias van de Meent wrote: > On Wed, 8 Nov 2023 at 12:03, Tomas Vondra > wrote: >> >> Hi, >> >> here's an updated patch, addressing the review comments, and reworking >> how the work is divided between the workers & leader etc. >> >> 0001 is just v2, rebased to current master >> >> 0002 and 0003 address most of the issues, in particular it >> >> - removes the unnecessary spool >> - fixes bs_reltuples type to double >> - a couple comments are reworded to be clearer >> - changes loop/condition in brinbuildCallbackParallel >> - removes asserts added for debugging >> - fixes cast in comparetup_index_brin >> - 0003 then simplifies comparetup_index_brin >> - I haven't inlined the tuplesort_puttuple_common parameter >> (didn't seem worth it) > > OK, thanks > >> 0004 Reworks how the work is divided between workers and combined by the >> leader. It undoes the tableam.c changes that attempted to divide the >> relation into chunks matching the BRIN ranges, and instead merges the >> results in the leader (using the BRIN "union" function). > > That's OK. > >> I haven't done any indentation fixes yet. >> >> I did fairly extensive testing, using pageinspect to compare indexes >> built with/without parallelism. More testing is needed, but it seems to >> work fine (with other opclasses and so on). > > After code-only review, here are some comments: > >> +++ b/src/backend/access/brin/brin.c >> [...] >> +/* Magic numbers for parallel state sharing */ >> +#define PARALLEL_KEY_BRIN_SHARED UINT64CONST(0xA001) >> +#define PARALLEL_KEY_TUPLESORT UINT64CONST(0xA002) > > These shm keys use the same constants also in use in > access/nbtree/nbtsort.c. While this shouldn't be an issue in normal > operations, I'd prefer if we didn't actively introduce conflicting > identifiers when we still have significant amounts of unused values > remaining. > Hmmm. Is there some rule of thumb how to pick these key values? 
I see nbtsort.c uses 0xA prefix, execParallel.c uses 0xE, while parallel.c ended up using 0x as prefix. I've used 0xB, simply because BRIN also starts with B. >> +#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xA003) > > This is the fourth definition of a PARALLEL%_KEY_QUERY_TEXT, the > others being in access/nbtree/nbtsort.c (value 0xA004, one > more than brin's), backend/executor/execParallel.c > (0xE008), and PARALLEL_VACUUM_KEY_QUERY_TEXT (0x3) (though > I've not checked that their uses are exactly the same, I'd expect at > least btree to match mostly, if not fully, 1:1). > I think we could probably benefit from a less ad-hoc sharing of query > texts. I don't think that needs to happen specifically in this patch, > but I think it's something to keep in mind in future efforts. > I'm afraid I don't quite get what you mean by "ad hoc sharing of query texts". Are you saying we shouldn't propagate the query text to the parallel workers? Why? Or what's the proper solution? >> +_brin_end_parallel(BrinLeader *brinleader, BrinBuildState *state) >> [...] >> +BrinSpool *spool = state->bs_spool; >> [...] >> +if (!state) >> +return; > > I think the assignment to spool should be moved to below this > condition, as _brin_begin_parallel calls this with state=NULL when it > can't launch parallel workers, which will cause issues here. > Good catch! I wonder if we have tests that might trigger this, say by setting max_parallel_maintenance_workers > 0 while no workers are allowed. >> +state->bs_numtuples = brinshared->indtuples; > > My IDE complains about bs_numtuples being an integer. This is a > pre-existing issue, but still valid: we can hit an overflow on tables > with pages_per_range=1 and relsize >= 2^31 pages. Extremely unlikely, > but problematic nonetheless. > True. I think I've been hesitant to make this a double because it seems a bit weird to do +1 with a double, and at some point (d == d+1). But this seems safe, we're guaranteed to be far away from that threshold. 
>> + * FIXME This probably needs some memory management fixes - we're >> reading >> + * tuples from the tuplesort, we're allocating an empty tuple, and so on. >> + * Probably better to release this memory. > > This should probably be resolved. > AFAICS that comment is actually inaccurate/stale, sorry about that. The code actually allocates (and resets) a single memtuple, and also emptyTuple. So I think that's OK, I've removed the comment. > I also noticed that this is likely to execute `union_tuples` many > times when pages_per_range is coprime with the parallel table scan's > block stride (or when we for other reasons have many tuples returned > for each range); and this `union_tuples` internally allocates and > deletes its own memory context for its deserialization of the 'b' > tuple. I think we should just pass a scratch context instead, so that > we don't have the overhead of
Re: autovectorize page checksum code included elsewhere
On Wed, Nov 22, 2023 at 02:54:13PM +0200, Ants Aasma wrote: > On Wed, 22 Nov 2023 at 11:44, John Naylor wrote: >> Poking in those files a bit, I also see references to building with >> SSE 4.1. Maybe that's an avenue that we should pursue? (an indirect >> function call is surely worth it for page-sized data) Yes, I think we should, but we also need to be careful not to hurt performance on platforms that aren't able to benefit [0] [1]. There are a couple of other threads about adding support for newer instructions [2] [3], and properly detecting the availability of these instructions seems to be a common obstacle. We have a path forward for stuff that's already using a runtime check (e.g., CRC32C), but I think we're still trying to figure out what to do for things that must be inlined (e.g., simd.h). One half-formed idea I have is to introduce some sort of ./configure flag that enables all the newer instructions that your CPU understands. It would also remove any existing runtime checks. This option would make it easy to take advantage of the newer instructions if you are building Postgres for only your machine (or others just like it).

> For reference, executing the page checksum 10M times on a AMD 3900X CPU:
>
> clang-14 -O2                   4.292s (17.8 GiB/s)
> clang-14 -O2 -msse4.1          2.859s (26.7 GiB/s)
> clang-14 -O2 -msse4.1 -mavx2   1.378s (55.4 GiB/s)

Nice. I've noticed similar improvements with AVX2 intrinsics in simd.h. [0] https://postgr.es/m/2613682.1698779776%40sss.pgh.pa.us [1] https://postgr.es/m/36329.1699325578%40sss.pgh.pa.us [2] https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com [3] https://postgr.es/m/db9pr08mb6991329a73923bf8ed4b3422f5...@db9pr08mb6991.eurprd08.prod.outlook.com -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: How to stop autovacuum silently
On Wed, Nov 22, 2023 at 8:18 AM Maxim Orlov wrote: > Recently, one of our customers reported a non-working autovacuum. After > a minor investigation, I found that > the autovacuum launcher did, actually, run vacuum as expected, but with no > results. At the same time, no warnings or > other anomalies were present in the logs. Are you aware of commit e83ebfe6d7, which added a similar WARNING at the point when VACUUM overwrites a relfrozenxid/relminmxid "from the future"? It's a recent one. > At first, I thought maybe statistics were broken, thus vacuum was not > working as expected. But in fact, something > more interesting had happened. Was pg_upgrade even run against this database? My guess is that the underlying problem was caused by the bug fixed by commit 74cf7d46. -- Peter Geoghegan
Re: psql not responding to SIGINT upon db reconnection
On Thu Nov 16, 2023 at 8:33 AM CST, Heikki Linnakangas wrote: On 06/11/2023 19:16, Tristan Partin wrote: >>> That sounds like a much better solution. Attached you will find a v4 >>> that implements your suggestion. Please let me know if there is >>> something that I missed. I can confirm that the patch works. This patch is missing a select(). It will busy loop until the connection is established or cancelled. If I add a wait (select, poll, etc.), then I can't control-C during the blocking call, so it doesn't really solve the problem. On Linux, we have signalfds which seem like the perfect solution to this problem, "wait on the pq socket or SIGINT." But that doesn't translate well to other operating systems :(. tristan957=> \c NOTICE: Welcome to Neon! Authenticate by visiting: https://console.neon.tech/psql_session/ ^CTerminated You can see here that I can't terminate the command. Where you see "Terminated" is me running `kill $(pgrep psql)` in another terminal. Shouldn't we also clear CancelRequested after we have cancelled the attempt? Otherwise, any subsequent attempts will immediately fail too. After switching to cancel_pressed, I don't think so. See this bit of code in the psql main loop: /* main loop to get queries and execute them */ while (successResult == EXIT_SUCCESS) { /* * Clean up after a previous Control-C */ if (cancel_pressed) { if (!pset.cur_cmd_interactive) { /* * You get here if you stopped a script with Ctrl-C. */ successResult = EXIT_USER; break; } cancel_pressed = false; } Should we use 'cancel_pressed' here rather than CancelRequested? To be honest, I don't understand the difference, so that's a genuine question. There was an attempt at unifying them in the past but it was reverted in commit 5d43c3c54d. The more I look at this, the more I don't understand... I need to look at this harder, but wanted to get this email out. Switched to cancel_pressed though. Thanks for pointing it out. Going to wait to send a v2 while more discussion occurs. 
-- Tristan Partin Neon (https://neon.tech)
Re: [ psql - review request ] review request for \d+ tablename, \d+ indexname indenting
Shlok Kyal wrote: > > The error was corrected and a new diff file was created. > > The diff file was created based on 16 RC1. > > We confirmed that 5 places where errors occurred when performing > > make check were changed to ok. Reviewing the patch, I see these two problems in the current version (File: psql-slashDplus-partition-indentation.diff, Date: 2023-09-19 00:19:34) * There are changes in the regression tests that do not concern this feature and should not be there. For instance this hunk: --- a/src/test/regress/expected/foreign_data.out +++ b/src/test/regress/expected/foreign_data.out @@ -742,8 +742,6 @@ COMMENT ON COLUMN ft1.c1 IS 'ft1.c1'; Check constraints: "ft1_c2_check" CHECK (c2 <> ''::text) "ft1_c3_check" CHECK (c3 >= '01-01-1994'::date AND c3 <= '01-31-1994'::date) -Not-null constraints: -"ft1_c1_not_null" NOT NULL "c1" Server: s0 FDW options: (delimiter ',', quote '"', "be quoted" 'value') It seems to undo a test for a recent feature adding "Not-null constraints" to \d, which suggests that you've been running tests against an older version than the source tree you're diffing against. These should be the same version, and also the latest one (git HEAD) or as close as possible to the latest when the patch is submitted. * The new query with \d on partitioned tables does not work with Postgres servers 12 or 13: postgres=# CREATE TABLE measurement ( city_id int not null, logdate date not null, peaktemp int, unitsales int ) PARTITION BY RANGE (logdate); postgres=# \d measurement ERROR: syntax error at or near "." LINE 2: ... 0 AS level,c.relkind, false AS i.inhdetach... Setting the CommitFest status to WoA. Best regards, -- Daniel Vérité https://postgresql.verite.pro/ Twitter: @DanielVerite
How to stop autovacuum silently
Hi! Recently, one of our customers reported a non-working autovacuum. After a minor investigation, I found that the autovacuum launcher did, actually, run vacuum as expected, but with no results. At the same time, no warnings or other anomalies were present in the logs. At first, I thought maybe statistics were broken, thus vacuum was not working as expected. But in fact, something more interesting had happened. The pg_class.relfrozenxid was set to some rubbish value from the future, and thus broken, in the template1 DB, so any new database will have it broken too. Then, we create a "blocker" DB, and in vac_update_datfrozenxid() we get a "bogus" (from the future) value of relfrozenxid and *silently* return. Any other newly created DB will not be autovacuumed. Funny, but from the perspective of a DBA, this looks like autovacuum is not working any more for no reason, although all the criteria for its launch are clearly met.

AFAICS, there are several solutions for this state:
- run vacuumdb for all DBs
- manually update the broken pg_class.relfrozenxid
- lower autovacuum_freeze_max_age to trigger prevention of transaction ID wraparound

I do understand that this behaviour can hardly be described as a bug, but could we at least emit a useful message to help clarify what is going on here?

=== REPRODUCE ===
$ cat << EOF >> pgsql/data/postgresql.conf
autovacuum_naptime = 1s
autovacuum_freeze_max_age = 10
EOF
$ ./pgsql/bin/pg_ctl -D pgsql/data -l pgsql/logfile start
waiting for server to start done
server started
$ ./pgsql/bin/psql postgres
psql (17devel)
Type "help" for help.
postgres=# \c template1
You are now connected to database "template1" as user "orlov".
template1=# update pg_class set relfrozenxid='20' where oid = 1262;
UPDATE 1
template1=# do $$ begin while 12 - txid_current()::text::int8 > 0 loop commit; end loop; end $$;
DO
template1=# create database blocker;
CREATE DATABASE
template1=# create database foo;
CREATE DATABASE
template1=# \c foo
You are now connected to database "foo" as user "orlov".
foo=# create table bar(baz int);
CREATE TABLE
foo=# insert into bar select bar from generate_series(1, 8192) bar;
INSERT 0 8192
foo=# update bar set baz=baz;
UPDATE 8192
foo=# select relname, n_tup_ins, n_tup_upd, n_tup_del, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum, autovacuum_count from pg_stat_user_tables where relname = 'bar';
 relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum | autovacuum_count
---------+-----------+-----------+-----------+------------+------------+-------------+-----------------+------------------
 bar     |      8192 |      8192 |         0 |       8192 |       8192 |             |                 |                0
(1 row)

foo=# update bar set baz=baz;
UPDATE 8192
foo=# select relname, n_tup_ins, n_tup_upd, n_tup_del, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum, autovacuum_count from pg_stat_user_tables where relname = 'bar';
 relname | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum | autovacuum_count
---------+-----------+-----------+-----------+------------+------------+-------------+-----------------+------------------
 bar     |      8192 |     16384 |         0 |       8192 |      16384 |             |                 |                0
(1 row)

... and so on

-- Best regards, Maxim Orlov.

0001-Add-warning-if-datfrozenxid-or-datminmxid-is-not-set.patch Description: Binary data
Re: initdb --no-locale=C doesn't work as specified when the environment is not C
Kyotaro Horiguchi writes: > Commit 3e51b278db leaves lc_* conf lines as commented-out when > their value is "C". This leads to the following behavior. Hmm ... I see a contributing factor here: this bit in postgresql.conf.sample is a lie: #lc_messages = 'C' # locale for system error message # strings A look in guc_tables.c shows that the actual default is '' (empty string), which means "use the environment", and that matches how the variable is documented in config.sgml. Somebody --- quite possibly me --- was misled by the contents of postgresql.conf.sample into thinking that the lc_xxx GUCs all default to C, when that's only true for the others. I think that a more correct fix for this would treat lc_messages differently from the other lc_xxx GUCs. Maybe just eliminate the hack about not substituting "C" for that one? In any case, we need to fix this mistake in postgresql.conf.sample. regards, tom lane
Re: Adding facility for injection points (or probe points?) for more advanced tests
On Tue, Nov 21, 2023 at 6:56 AM Michael Paquier wrote: > > >> This facility is hidden behind a specific configure/Meson switch, > >> making it a no-op by default: > >> --enable-injection-points > >> -Dinjection_points={ true | false } > > > > That's useful, but we will also see demand to enable those in > > production (under controlled circumstances). But let the functionality > > mature under a separate flag and developer builds before used in > > production. > > Hmm. Okay. I'd still keep that under a compile switch for now > anyway to keep a cleaner separation and avoid issues where it would be > possible to inject code in a production build. Note that I was > planning to switch one of my buildfarm animals to use it should this > stuff make it into the tree. And people would be free to use it if > they want. If in production, that would be a risk, IMO. Makes sense. Just to be clear - by "in production" I mean user installations - they may be testing environments or production environments. > > > +/* > > + * Local cache of injection callbacks already loaded, stored in > > + * TopMemoryContext. > > + */ > > +typedef struct InjectionPointArrayEntry > > +{ > > + char name[INJ_NAME_MAXLEN]; > > + InjectionPointCallback callback; > > +} InjectionPointArrayEntry; > > + > > +static InjectionPointArrayEntry *InjectionPointCacheArray = NULL; > > > > Initial size of this array is small, but given the size of code in a > > given path to be tested, I fear that this may not be sufficient. I > > feel it's better to use hash table whose APIs are already available. > > I've never seen in recent years a need for a given test to use more > than 4~5 points. But I guess that you've seen more than that wanted > in a prod environment with a fork of Postgres? A given case may not require more than 4-5 points but users may create scripts with many frequently required injection points and install handlers for them. 
> > I'm OK to switch that to use a hash table in the initial > implementation, even for a low number of entries with points that are > not in hot code paths that should be OK. At least for my cases > related to testing that's OK. Am I right to assume that you mean a > HTAB in TopMemoryContext? Yes. > > > I am glad that it covers the frequently needed injections error, notice and > > wait. If someone adds multiple injection points for wait and all of them are > > triggered (in different backends), test_injection_points_wake() would > > wake them all. When debugging cascaded functionality it's not easy to > > follow sequence add, trigger, wake for multiple injections. We should > > associate a conditional variable with the required injection points. Attach > > should create conditional variable in the shared memory for that injection > > point and detach should remove it. I see this being mentioned in the commit > > message, but I think it's something we need in the first version of "wait" > > to > > be useful. > > More to the point, I actually disagree with that, because it could be > possible as well that the same condition variable is used across > multiple points. At the end, IMHO, the central hash table should hold > zero meta-data associated to an injection point like a counter, an > elog, a condition variable, a sleep time, etc. or any combination of > all these ones, and should just know about how to load a callback with > a library path and a routine name. I understand that this is leaving > the responsibility of what a callback should do down to the individual > developer implementing a callback into their own extension, where they > should be free to have conditions of their own. 
> > Something that I agree would be very useful for the in-core APIs, > depending on the cases, is to be able to push some information to the > callback at runtime to let a callback decide what to do depending on a > running state, including a condition registered when a point was > attached. See my argument about an _ARG macro that passes down to the > callback a (void *). The injection_run function is called from the place where the injection point is declared but that place does not know what injection function is going to be run. So a user can not pass arguments to injection declaration. What injection to run is decided by the injection_attach and thus one can pass arguments to it but then injection_attach stores the information in the shared memory from where it's picked up by injection_run. So even though you don't want to store the arguments in the shared memory, you are creating a design which takes us towards that direction eventually - otherwise users will end up writing many injection functions - one for each possible combination of count, sleep, conditional variable etc. But let's see whether that happens to be the case in practice. We will need to evolve this feature based on usage. > > > Those tests need to also install extension. That's another pain point. > > So anyone wants to run the tests
Re: [PATCH] Add CHECK_FOR_INTERRUPTS in scram_SaltedPassword loop.
Daniel Gustafsson writes: >> On 22 Nov 2023, at 14:30, Aleksander Alekseev >> wrote: >> It sort of makes sense. I wonder though if we should limit the maximum >> number of iterations instead. If somebody specified 1_000_000+ >> iteration this could also indicate a user error. > I don't think it would be useful to limit this at an arbitrary point, > iteration > count can be set per password and if someone want a specific password to be > super-hard to brute force then why should we limit that? Maybe because it could be used to construct a DOS scenario? In particular, since CHECK_FOR_INTERRUPTS doesn't work on the frontend side, a situation like this wouldn't be interruptible there. I agree with Aleksander that such cases are much more likely to indicate user error than anything else. regards, tom lane
Re: [PATCH] Add CHECK_FOR_INTERRUPTS in scram_SaltedPassword loop.
> On 22 Nov 2023, at 14:30, Aleksander Alekseev > wrote: > > Hi, > >> When the scram_iterations value is set too large, the backend would hang for >> a long time. And we can't use Ctrl+C to cancel this query, cause the loop >> don't >> process signal interrupts. >> >> Add CHECK_FOR_INTERRUPTS within the loop of scram_SaltedPassword >> to handle any signals received during this period may be a good choice. >> >> I wrote a patch to solve this problem. What's your suggestions? > > Thanks for the patch. > > It sort of makes sense. I wonder though if we should limit the maximum > number of iterations instead. If somebody specified 1_000_000+ > iteration this could also indicate a user error. I don't think it would be useful to limit this at an arbitrary point, iteration count can be set per password and if someone want a specific password to be super-hard to brute force then why should we limit that? > If we want to add CHECK_FOR_INTERRUPTS inside the loop I think a brief > comment would be appropriate. Agreed, it would be helpful. -- Daniel Gustafsson
Re: How to accurately determine when a relation should use local buffers?
Hi, > I would propose not to associate temporary relations with local buffers The whole point of why local buffers exist is to place the buffers of temp tables into MemoryContexts so that these tables will not fight for the locks for shared buffers with the rest of the system. If we start treating them as regular tables this will cause a severe performance degradation. I doubt that such a patch will make it. I sort of suspect that you are working on a very specific extension and/or feature for PG fork. Any chance you could give us more details about the case? -- Best regards, Aleksander Alekseev
Re: [PATCH] Add CHECK_FOR_INTERRUPTS in scram_SaltedPassword loop.
Hi, > When the scram_iterations value is set too large, the backend would hang for > a long time. And we can't use Ctrl+C to cancel this query, cause the loop > don't > process signal interrupts. > > Add CHECK_FOR_INTERRUPTS within the loop of scram_SaltedPassword > to handle any signals received during this period may be a good choice. > > I wrote a patch to solve this problem. What's your suggestions? Thanks for the patch. It sort of makes sense. I wonder though if we should limit the maximum number of iterations instead. If somebody specified 1_000_000+ iteration this could also indicate a user error. If we want to add CHECK_FOR_INTERRUPTS inside the loop I think a brief comment would be appropriate. -- Best regards, Aleksander Alekseev
Re: remaining sql/json patches
On Wed, Nov 22, 2023 at 3:09 PM Amit Langote wrote: > The last line in the chart I sent in the last email now look like this: > > 17-sqljson 670262 2.57 2640912 1.34 > > meaning the gram.o text size changes by 2.57% as opposed to 2.97% > before your fixes. Andrew asked off-list what the percent increase is compared to 17dev HEAD. It's 1.27% (was 1.66% with the previous version). -- Thanks, Amit Langote EDB: http://www.enterprisedb.com
Re: Change GUC hashtable to use simplehash?
I wrote:
> Thinking some more, I'm not quite comfortable with the number of
> places in these patches that have to know about the pre-downcased
> strings, or whether we need that in the first place. If lower case is
> common enough to optimize for, it seems the equality function can just
> check strict equality on the char and only on mismatch try downcasing
> before returning false. Doing our own function would allow the
> compiler to inline it, or at least keep it on the same page. Further,
> the old hash function shouldn't need to branch to do the same
> downcasing, since hashing is lossy anyway. In the keyword hashes, we
> just do "*ch |= 0x20", which downcases letters and turns underscores
> into DEL. I can take a stab at that later.

v4 is a quick POC for that. I haven't verified that it's correct for the case where the probe and the entry don't match, but if it isn't, it should be easy to fix. I also didn't bother with SH_STORE_HASH in my testing.

0001 adds the murmur32 finalizer -- we should do that regardless of anything else in this thread.
0002 is just Jeff's 0001.
0003 adds an equality function that downcases lazily, and teaches the hash function about the 0x20 trick.

master:        latency average = 581.765 ms
v3 0001-0005:  latency average = 544.576 ms
v4 0001-0003:  latency average = 547.489 ms

This gives similar results with a tiny amount of code (excluding the simplehash conversion). I didn't check whether the compiler inlined these functions, but we can hint it if necessary. We could use the new equality function in all the call sites that currently test for "guc_name_compare() == 0", in which case it might not end up inlined, but that's probably okay.

We could also try to improve the hash function's collision behavior by collecting the bytes in a uint64 and calling our new murmur64 before returning the lower half, but that's speculative.
From 1a516bb341afb72680470897d75c1d23f75fb37e Mon Sep 17 00:00:00 2001
From: John Naylor
Date: Wed, 22 Nov 2023 17:28:41 +0700
Subject: [PATCH v4 1/3] Add finalizer to guc_name_hash

---
 src/backend/utils/misc/guc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 82d8efbc96..e3834d52ee 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -33,6 +33,7 @@
 #include "catalog/objectaccess.h"
 #include "catalog/pg_authid.h"
 #include "catalog/pg_parameter_acl.h"
+#include "common/hashfn.h"
 #include "guc_internal.h"
 #include "libpq/pqformat.h"
 #include "parser/scansup.h"
@@ -1339,7 +1340,7 @@ guc_name_hash(const void *key, Size keysize)
 		result = pg_rotate_left32(result, 5);
 		result ^= (uint32) ch;
 	}
-	return result;
+	return murmurhash32(result);
 }

 /*
-- 
2.42.0

From 01b053b473d897c71725a7bb09a3127fc78140dc Mon Sep 17 00:00:00 2001
From: John Naylor
Date: Wed, 22 Nov 2023 18:45:48 +0700
Subject: [PATCH v4 3/3] Optimize GUC functions for simple hash

Only downcase a character when an equality check fails, since we expect
most names to be lower case. Also simplify downcasing in the hash
function.
---
 src/backend/utils/misc/guc.c | 40
 1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index bf05b022c3..7896deb63b 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -229,6 +229,7 @@ static int GUCNestLevel = 0;	/* 1 when in main transaction */

 static int	guc_var_compare(const void *a, const void *b);
+static bool guc_name_eq(const char *namea, const char *nameb);
 static uint32 guc_name_hash(const char *name);
 static void InitializeGUCOptionsFromEnvironment(void);
 static void InitializeOneGUCOption(struct config_generic *gconf);
@@ -271,7 +272,7 @@ static bool call_enum_check_hook(struct config_enum *conf, int *newval,
 #define SH_KEY_TYPE const char *
 #define SH_KEY gucname
 #define SH_HASH_KEY(tb, key) guc_name_hash(key)
-#define SH_EQUAL(tb, a, b) (guc_name_compare(a, b) == 0)
+#define SH_EQUAL(tb, a, b) (guc_name_eq(a, b))
 #define SH_SCOPE static inline
 #define SH_DECLARE
 #define SH_DEFINE
@@ -1308,6 +1309,38 @@ guc_name_compare(const char *namea, const char *nameb)
 	return 0;
 }

+static bool
+guc_name_eq(const char *namea, const char *nameb)
+{
+	char		cha;
+	char		chb;
+
+	while (*namea && *nameb)
+	{
+		cha = *namea++;
+		chb = *nameb++;
+
+		if (cha != chb)
+		{
+			/* Casefold lazily since we expect lower case */
+			if (cha >= 'A' && cha <= 'Z')
+				cha += 'a' - 'A';
+			if (chb >= 'A' && chb <= 'Z')
+				chb += 'a' - 'A';
+
+			if (cha != chb)
+				return false;
+		}
+	}
+
+	if (*namea == *nameb)
+		return true;
+	else
+		return false;
+
+	//Assert(guc_name_compare(namea, nameb) == 0);
+}
+
 /*
  * Hash function that's compatible with guc_name_compare
  */
@@ -1320,9 +1353,8 @@ guc_name_hash(const char *name)
 	{
 		char		ch = *name++;

-		/* Case-fold in the same way as
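The lazy-casefold equality and the `|= 0x20` hashing trick in the patch above can be sketched standalone. This is a hand-rolled illustration, not the patch's code: `name_hash`, `name_eq`, and `fmix32` are stand-ins for guc_name_hash, guc_name_eq, and PostgreSQL's murmurhash32 (the standard Murmur3 32-bit finalizer).

```c
#include <assert.h>
#include <stdint.h>

/* Murmur3 32-bit finalizer, as applied by PostgreSQL's murmurhash32. */
static uint32_t
fmix32(uint32_t h)
{
	h ^= h >> 16;
	h *= 0x85ebca6b;
	h ^= h >> 13;
	h *= 0xc2b2ae35;
	h ^= h >> 16;
	return h;
}

/*
 * Case-insensitive hash: "ch |= 0x20" downcases ASCII letters (and maps
 * '_' to DEL), which is fine because hashing only has to be consistent
 * between lookups, not reversible -- no branch needed.
 */
static uint32_t
name_hash(const char *name)
{
	uint32_t	result = 0;

	while (*name)
	{
		char		ch = *name++;

		ch |= 0x20;
		result = (result << 5) | (result >> 27);	/* rotate left 5 */
		result ^= (uint32_t) ch;
	}
	return fmix32(result);
}

/*
 * Equality that downcases lazily: only on a raw byte mismatch do we
 * casefold, since GUC names are usually already lower case.
 */
static int
name_eq(const char *a, const char *b)
{
	while (*a && *b)
	{
		char		cha = *a++;
		char		chb = *b++;

		if (cha != chb)
		{
			if (cha >= 'A' && cha <= 'Z')
				cha += 'a' - 'A';
			if (chb >= 'A' && chb <= 'Z')
				chb += 'a' - 'A';
			if (cha != chb)
				return 0;
		}
	}
	return *a == *b;			/* both strings must end together */
}
```

The key invariant is that `name_hash` and `name_eq` agree: any two names the equality function treats as equal must hash identically, which holds because `|= 0x20` folds exactly the ASCII case distinction that `name_eq` tolerates.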
Re: WaitEventSet resource leakage
20.11.2023 00:09, Thomas Munro wrote:
> On Fri, Nov 17, 2023 at 12:22 AM Heikki Linnakangas wrote:
> And here is a patch to implement that on master.

Rationale and code look good to me. I can also confirm that the patches proposed (for master and back branches) eliminate WES leakage as expected. Thanks for the fix!

Maybe you would find it appropriate to add the comment
/* Convenience wrappers over ResourceOwnerRemember/Forget */
above ResourceOwnerRememberWaitEventSet, just as it's added above ResourceOwnerRememberRelationRef, ResourceOwnerRememberDSM, ResourceOwnerRememberFile, ...

(As a side note, this fix doesn't resolve issue #17828 completely, because that large number of handles might also be consumed legally.)

Best regards,
Alexander
Re: autovectorize page checksum code included elsewhere
On Wed, 22 Nov 2023 at 11:44, John Naylor wrote:
>
> On Tue, Nov 7, 2023 at 9:47 AM Nathan Bossart wrote:
> >
> > Presently, we ask compilers to autovectorize checksum.c and numeric.c. The
> > page checksum code actually lives in checksum_impl.h, and checksum.c just
> > includes it. But checksum_impl.h is also used in pg_upgrade/file.c and
> > pg_checksums.c, and since we don't ask compilers to autovectorize those
> > files, the page checksum code may remain un-vectorized.
>
> Poking in those files a bit, I also see references to building with
> SSE 4.1. Maybe that's an avenue that we should pursue? (an indirect
> function call is surely worth it for page-sized data)

For reference, executing the page checksum 10M times on an AMD 3900X CPU:

clang-14 -O2                    4.292s (17.8 GiB/s)
clang-14 -O2 -msse4.1           2.859s (26.7 GiB/s)
clang-14 -O2 -msse4.1 -mavx2    1.378s (55.4 GiB/s)

--
Ants Aasma
Senior Database Engineer
www.cybertec-postgresql.com
Re: pipe_read_line for reading arbitrary strings
On 2023-Mar-07, Daniel Gustafsson wrote:

> The attached POC diff replace fgets() with pg_get_line(), which may not be an
> Ok way to cross the streams (it's clearly not a great fit), but as a POC it
> provided a neater interface for reading one-off lines from a pipe IMO. Does
> anyone else think this is worth fixing before too many callsites use it, or is
> this another case of my fear of silent subtle truncation bugs? =)

I think this is generally a good change.

I think pipe_read_line should have a "%m" in the "no data returned" error message. pg_get_line is careful to retain errno (and it was already zero at start), so this should be okay ... or should we set errno again to zero after popen(), even if it works?

(I'm not sure I buy pg_get_line's use of perror in the backend case. Maybe this is only okay because the backend doesn't use this code?)

pg_get_line says the caller can distinguish error from no-input-before-EOF with ferror(), but pipe_read_line does no such thing. I wonder what happens if an NFS mount containing a file being read disappears in the middle of reading it, for example ... will we think that we completed reading the file, rather than erroring out?

--
Álvaro Herrera        Breisgau, Deutschland — https://www.EnterpriseDB.com/
"That sort of implies that there are Emacs keystrokes which aren't obscure.
I've been using it daily for 2 years now and have yet to discover any
key sequence which makes any sense."                     (Paul Thomas)
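For illustration, here is a minimal standalone sketch of the ferror() distinction discussed above -- separating a read error from clean no-input-before-EOF when reading one line from a pipe. The helper name and return convention are invented for this sketch; it is not pipe_read_line itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read one line of output from a shell command.
 * Returns 1 on success (line stored in buf, newline stripped),
 * 0 when the command produced no output before EOF, and
 * -1 on error.  The ferror() check after a NULL fgets() is what
 * distinguishes a genuine read error from a clean empty stream.
 */
static int
read_one_line(const char *cmd, char *buf, size_t buflen)
{
	FILE	   *pipe = popen(cmd, "r");

	if (pipe == NULL)
		return -1;

	if (fgets(buf, (int) buflen, pipe) == NULL)
	{
		int			saw_error = ferror(pipe);	/* error vs. clean EOF */

		pclose(pipe);
		return saw_error ? -1 : 0;
	}

	buf[strcspn(buf, "\n")] = '\0';		/* strip trailing newline */
	pclose(pipe);
	return 1;
}
```

Without the `ferror()` call, the two NULL-return cases would be indistinguishable, which is exactly the "did we really finish the file?" concern raised for the NFS scenario.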
meson vs Cygwin
I've been trying to get meson working with Cygwin. On a fresh install (Cygwin 3.4.9, gcc 11.4.0, meson 1.0.2, ninja 1.11.1) I get a bunch of errors like this:

ERROR: incompatible library "/home/andrew/bf/buildroot/HEAD/pgsql.build/tmp_install/home/andrew/bf/buildroot/HEAD/inst/lib/postgresql/plperl.dll": missing magic block

Similar things happen if I try to build with python. I'm not getting the same on a configure/make build. Not sure what would be different.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: pg_upgrade and logical replication
On Wed, 22 Nov 2023 at 06:48, Peter Smith wrote: > == > doc/src/sgml/ref/pgupgrade.sgml > > 1. > + > + Create all the new tables that were created in the publication during > + upgrade and refresh the publication by executing > + ALTER > SUBSCRIPTION ... REFRESH PUBLICATION. > + > > "Create all ... that were created" sounds a bit strange. > > SUGGESTION (maybe like this or similar?) > Create equivalent subscriber tables for anything that became newly > part of the publication during the upgrade and Modified > == > src/bin/pg_dump/pg_dump.c > > 2. getSubscriptionTables > > +/* > + * getSubscriptionTables > + * Get information about subscription membership for dumpable tables. This > + *will be used only in binary-upgrade mode. > + */ > +void > +getSubscriptionTables(Archive *fout) > +{ > + DumpOptions *dopt = fout->dopt; > + SubscriptionInfo *subinfo = NULL; > + SubRelInfo *subrinfo; > + PQExpBuffer query; > + PGresult *res; > + int i_srsubid; > + int i_srrelid; > + int i_srsubstate; > + int i_srsublsn; > + int ntups; > + Oid last_srsubid = InvalidOid; > + > + if (dopt->no_subscriptions || !dopt->binary_upgrade || > + fout->remoteVersion < 17) > + return; > > This function comment says "used only in binary-upgrade mode." and the > Assert says the same. But, is this compatible with the other function > dumpSubscriptionTable() where it says "used only in binary-upgrade > mode and for PG17 or later versions"? > Modified > == > src/bin/pg_upgrade/check.c > > 3. check_new_cluster_subscription_configuration > > +static void > +check_new_cluster_subscription_configuration(void) > +{ > + PGresult *res; > + PGconn*conn; > + int nsubs_on_old; > + int max_replication_slots; > + > + /* Logical slots can be migrated since PG17. */ > + if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600) > + return; > > IMO it is better to say < 1700 in this check, instead of <= 1600. > Modified > ~~~ > > 4. 
> + /* Quick return if there are no subscriptions to be migrated */
> + if (nsubs_on_old == 0)
> + return;
>
> Missing period in comment.
Modified

> ~~~
>
> 5.
> +/*
> + * check_old_cluster_subscription_state()
> + *
> + * Verify that each of the subscriptions has all their corresponding tables in
> + * i (initialize), r (ready) or s (synchronized) state.
> + */
> +static void
> +check_old_cluster_subscription_state(ClusterInfo *cluster)
>
> This function is only for the old cluster (hint: the function name) so
> there is no need to pass the 'cluster' parameter here. Just directly
> use old_cluster in the function body.
Modified

> ==
> src/bin/pg_upgrade/t/004_subscription.pl
>
> 6.
> +# Add tab_not_upgraded1 to the publication. Now publication has tab_upgraded1
> +# and tab_upgraded2 tables.
> +$publisher->safe_psql('postgres',
> + "ALTER PUBLICATION regress_pub ADD TABLE tab_upgraded2");
>
> Typo in comment. You added tab_not_upgraded2, not tab_not_upgraded1
Modified

> ~~
>
> 7.
> +# Subscription relations should be preserved. The upgraded won't know
> +# about 'tab_not_upgraded1' because the subscription is not yet refreshed.
>
> Typo or missing word in comment?
>
> "The upgraded" ??
Modified

Attached is the v17 patch, which has the same changes.

Thanks,
Shlok Kumar Kyal

v17-0001-Preserve-the-full-subscription-s-state-during-pg.patch
Description: Binary data
[PATCH] Add CHECK_FOR_INTERRUPTS in scram_SaltedPassword loop.
Hi, hackers

When the scram_iterations value is set too large, the backend would hang for a long time, and we can't use Ctrl+C to cancel the query because the loop doesn't process signal interrupts.

Adding CHECK_FOR_INTERRUPTS within the loop of scram_SaltedPassword to handle any signals received during this period may be a good choice.

I wrote a patch to solve this problem. What are your suggestions?

Bowen Shi

0001-Add-CHECK_FOR_INTERRUPTS-in-scram_SaltedPassword-loo.patch
Description: Binary data
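To illustrate the shape of the proposal, here is a toy sketch of an interruptible iteration loop. Everything here is a stand-in: the real CHECK_FOR_INTERRUPTS() macro services pending cancel/terminate signals (via ereport/longjmp, not a return value), and `toy_round` merely stands in for the per-iteration HMAC in scram_SaltedPassword.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Stand-in for PostgreSQL's CHECK_FOR_INTERRUPTS(): in this sketch a
 * pending "interrupt" simply aborts the computation and returns 0.
 */
static volatile int InterruptPending = 0;
#define CHECK_FOR_INTERRUPTS() \
	do { \
		if (InterruptPending) \
		{ \
			puts("query canceled"); \
			return 0; \
		} \
	} while (0)

/* Toy stand-in for one HMAC round, just to give the loop a body. */
static uint32_t
toy_round(uint32_t x)
{
	return x * 2654435761u + 1;
}

/*
 * Shape of scram_SaltedPassword's inner loop with the proposed check:
 * the real function XORs successive HMAC outputs 'iterations' times,
 * so a huge scram_iterations value keeps the backend busy here.
 */
static uint32_t
salted_password_sketch(uint32_t seed, int iterations)
{
	uint32_t	ui = toy_round(seed);
	uint32_t	result = ui;

	for (int i = 2; i <= iterations; i++)
	{
		/* Proposed addition: react to Ctrl+C during long iteration counts */
		CHECK_FOR_INTERRUPTS();
		ui = toy_round(ui);
		result ^= ui;
	}
	return result;
}
```

Because the check is a cheap flag test in the common case, placing it inside the loop costs essentially nothing while making a multi-million-iteration run cancellable.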
Re: Changing baserel to foreignrel in postgres_fdw functions
On Wed, Nov 22, 2023 at 3:27 AM Bruce Momjian wrote: > > Should this patch be applied? I think so. > --- > > On Thu, Feb 15, 2018 at 06:57:50PM +0530, Ashutosh Bapat wrote: > > Hi, > > I noticed that functions is_foreign_expr(), classifyConditions() and > > appendOrderByClause() had variables/arguments named baserel when the > > relations passed to those could be join or upper relation as well. > > Here's patch renaming those as foreignrel. > The patch is more than 5 years old. So it might need adjustments. E.g. the appendOrderByClause() does not require the change anymore. But the other two functions still require the changes. There may be other new places that require change. I have not checked that. If we are accepting this change, I can update the patch. -- Best Wishes, Ashutosh
Re: undetected deadlock in ALTER SUBSCRIPTION ... REFRESH PUBLICATION
On 11/22/23 11:38, Amit Kapila wrote: > On Tue, Nov 21, 2023 at 6:56 PM Tomas Vondra > wrote: >> >> On 11/21/23 14:16, Amit Kapila wrote: >>> On Tue, Nov 21, 2023 at 5:17 PM Tomas Vondra >>> wrote: >>> >>> It seems there is some inconsistency in what you have written for >>> client backends/tablesync worker vs. apply worker. The above text >>> seems to be saying that the client backend and table sync worker are >>> waiting on a "subscription row in pg_subscription" and the apply >>> worker is operating on "pg_subscription_rel". So, if that is true then >>> they shouldn't get stuck. >>> >>> I think here client backend and tablesync worker seems to be blocked >>> for a lock on pg_subscription_rel. >>> >> >> Not really, they are all locking the subscription. All the locks are on >> classid=6100, which is pg_subscription: >> >> test=# select 6100::regclass; >> regclass >> - >>pg_subscription >> (1 row) >> >> The thing is, the tablesync workers call UpdateSubscriptionRelState, >> which locks the pg_subscription catalog at the very beginning: >> >>LockSharedObject(SubscriptionRelationId, ...); >> >> So that's the issue. I haven't explored why it's done this way, and >> there's no comment explaining locking the subscriptions is needed ... >> > > I think it prevents concurrent drop of rel during the REFRESH operation. > Yes. Or maybe some other concurrent DDL on the relations included in the subscription. The tablesync workers can't proceed because their lock request is stuck behind the AccessExclusiveLock request. And the apply worker can't proceed, because it's waiting for status update from the tablesync workers. >>> >>> This part is not clear to me because >>> wait_for_relation_state_change()->GetSubscriptionRelState() seems to >>> be releasing the lock while closing the relation. Am, I missing >>> something? 
>>> >> >> I think you're missing the fact that GetSubscriptionRelState() acquires >> and releases the lock on pg_subscription_rel, but that's not the lock >> causing the issue. The problem is the lock on the pg_subscription row. >> > > Okay. IIUC, what's going on here is that the apply worker acquires > AccessShareLock on pg_subscription to update rel state for one of the > tables say tbl-1, and then for another table say tbl-2, it started > waiting for a state change via wait_for_relation_state_change(). I > think here the fix is to commit the transaction before we go for a > wait. I guess we need something along the lines of what is proposed in > [1] though we have solved the problem in that thread in some other > way.. > Possibly. I haven't checked if the commit might have some unexpected consequences, but I can confirm I can no longer reproduce the deadlock with the patch applied. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: undetected deadlock in ALTER SUBSCRIPTION ... REFRESH PUBLICATION
On Tue, Nov 21, 2023 at 6:56 PM Tomas Vondra wrote: > > On 11/21/23 14:16, Amit Kapila wrote: > > On Tue, Nov 21, 2023 at 5:17 PM Tomas Vondra > > wrote: > >> > > > > It seems there is some inconsistency in what you have written for > > client backends/tablesync worker vs. apply worker. The above text > > seems to be saying that the client backend and table sync worker are > > waiting on a "subscription row in pg_subscription" and the apply > > worker is operating on "pg_subscription_rel". So, if that is true then > > they shouldn't get stuck. > > > > I think here client backend and tablesync worker seems to be blocked > > for a lock on pg_subscription_rel. > > > > Not really, they are all locking the subscription. All the locks are on > classid=6100, which is pg_subscription: > > test=# select 6100::regclass; > regclass > - >pg_subscription > (1 row) > > The thing is, the tablesync workers call UpdateSubscriptionRelState, > which locks the pg_subscription catalog at the very beginning: > >LockSharedObject(SubscriptionRelationId, ...); > > So that's the issue. I haven't explored why it's done this way, and > there's no comment explaining locking the subscriptions is needed ... > I think it prevents concurrent drop of rel during the REFRESH operation. > >> The tablesync workers can't proceed because their lock request is stuck > >> behind the AccessExclusiveLock request. > >> > >> And the apply worker can't proceed, because it's waiting for status > >> update from the tablesync workers. > >> > > > > This part is not clear to me because > > wait_for_relation_state_change()->GetSubscriptionRelState() seems to > > be releasing the lock while closing the relation. Am, I missing > > something? > > > > I think you're missing the fact that GetSubscriptionRelState() acquires > and releases the lock on pg_subscription_rel, but that's not the lock > causing the issue. The problem is the lock on the pg_subscription row. > Okay. 
IIUC, what's going on here is that the apply worker acquires AccessShareLock on pg_subscription to update rel state for one of the tables say tbl-1, and then for another table say tbl-2, it started waiting for a state change via wait_for_relation_state_change(). I think here the fix is to commit the transaction before we go for a wait. I guess we need something along the lines of what is proposed in [1] though we have solved the problem in that thread in some other way.. [1] - https://www.postgresql.org/message-id/1412708.1674417574%40sss.pgh.pa.us -- With Regards, Amit Kapila.
Re: How to accurately determine when a relation should use local buffers?
Hi Aleksander,

Thank you for your answers. It seems local buffers are used for temporary relations unconditionally. In this case, we may check either relpersistence or the backend id, or both of them.

> I didn't do a deep investigation of the code in this particular aspect but
> that could be a fair point. Would you like to propose a refactoring that
> unifies the way we check if the relation is temporary?

I would propose not to associate temporary relations with local buffers. I would say that we should choose local buffers only in a backend context. It is the primary condition. Thus, to choose local buffers, two checks should succeed:
* relpersistence (RelationUsesLocalBuffers)
* backend id (SmgrIsTemp)
I know it may not be as effective as checking relpersistence only, but it makes the internal architecture more flexible, I believe.

With best regards,
Vitaly Davydov
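A toy sketch of the two-check rule proposed above. The struct and helper names are invented for illustration and only mirror the spirit of the RelationUsesLocalBuffers and SmgrIsTemp macros; they are not PostgreSQL's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mock of the two properties under discussion. */
typedef struct MockRelation
{
	char		relpersistence;	/* 't' = temporary, 'p' = permanent */
	int			backend_id;		/* owning backend id, or -1 for shared */
} MockRelation;

/* Mirrors RelationUsesLocalBuffers: a catalog-level persistence check. */
static bool
relation_uses_local_buffers(const MockRelation *rel)
{
	return rel->relpersistence == 't';
}

/* Mirrors SmgrIsTemp: is the storage owned by some backend? */
static bool
smgr_is_temp(const MockRelation *rel)
{
	return rel->backend_id != -1;
}

/*
 * The proposal: local buffers are chosen only when BOTH checks succeed,
 * i.e. the relation is temporary by persistence AND backend-local.
 */
static bool
use_local_buffers(const MockRelation *rel)
{
	return relation_uses_local_buffers(rel) && smgr_is_temp(rel);
}
```

The point of the conjunction is that a relation marked temporary but not attached to a backend would no longer be routed to local buffers, which is the flexibility the proposal trades a second check for.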
RE: CRC32C Parallel Computation Optimization on ARM
On Fri, 10 Nov 2023 10:36:08 -0600, Nathan Bossart wrote:
>
> -# all versions of pg_crc32c_armv8.o need CFLAGS_CRC
> -pg_crc32c_armv8.o: CFLAGS+=$(CFLAGS_CRC)
> -pg_crc32c_armv8_shlib.o: CFLAGS+=$(CFLAGS_CRC)
> -pg_crc32c_armv8_srv.o: CFLAGS+=$(CFLAGS_CRC)
>
> Why are these lines deleted?
>
> - ['pg_crc32c_armv8', 'USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 'crc'],
> + ['pg_crc32c_armv8', 'USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK'],
>
> What is the purpose of this change?

Because I added `__attribute__((target("+crc+crypto")))` before the functions that require the crc extension and the crypto extension, these lines are removed here.

> +__attribute__((target("+crc+crypto")))
>
> I'm not sure we can assume that all compilers will understand this, and I'm
> not sure we need it.

CFLAGS_CRC is "-march=armv8-a+crc". Generally, if -march is supported, __attribute__ is also supported. In addition, I am not sure about the source file pg_crc32c_armv8.c: if CFLAGS_CRC and CFLAGS_CRYPTO are needed at the same time, how should that be expressed in the makefile?

0007-crc32c-parallel-computation-optimization-on-arm.patch
Description: 0007-crc32c-parallel-computation-optimization-on-arm.patch
RE: Partial aggregates pushdown
Hi Mr. Haas, hackers.

Thank you for your thoughtful comments.

> From: Robert Haas
> Sent: Tuesday, November 21, 2023 5:52 AM
> I do have a concern about this, though. It adds a lot of bloat. It adds a
> whole lot of additional entries to pg_aggregate, and every new aggregate we
> add in the future will require a bonus entry for this, and it needs a bunch
> of new pg_proc entries as well. One idea that I've had in the past is to
> instead introduce syntax that just does this, without requiring a separate
> aggregate definition in each case. For example, maybe instead of changing
> string_agg(whatever) to string_agg_p_text_text(whatever), you can say
> PARTIAL_AGGREGATE string_agg(whatever) or
> string_agg(PARTIAL_AGGREGATE whatever) or something. Then all aggregates
> could be treated in a generic way. I'm not completely sure that's better,
> but I think it's worth considering.

I believe this comment addresses a fundamental aspect of the approach. So, firstly, could we discuss whether we should fundamentally reconsider the approach?

The approach adopted in this patch is as follows.
  Approach 1: Adding partial aggregation functions to the catalogs (pg_aggregate, pg_proc)
The approach proposed by Mr. Haas is as follows.
  Approach 2: Adding a keyword to the SQL syntax to indicate partial aggregation requests

The amount of code required to implement Approach 2 has not been investigated, but comparing Approach 1 and Approach 2 in other aspects, I believe they each have the following advantages and disadvantages.

1. Approach 1
(1) Advantages
  (a) No need to change the SQL syntax
(2) Disadvantages
  (a) Catalog bloat
      As Mr. Haas pointed out, the catalog will bloat by adding partial aggregation functions (e.g. avg_p_int8(int8)) for each individual aggregate function (e.g. avg(int8)) in pg_aggregate and pg_proc (theoretically doubling the size). Some PostgreSQL developers and users may find this uncomfortable.
  (b) Increase in manual procedures
      Developers of new aggregate functions (both built-in and user-defined) need to manually add the partial aggregation functions when defining the aggregate functions. However, the procedure for adding partial aggregation functions for a certain aggregate function can be automated, so this problem can be resolved by improving the patch. The automation method involves the core part (AggregateCreate() and related functions) that executes the CREATE AGGREGATE command for user-defined functions. For built-in functions, it involves generating the initial data for the pg_aggregate catalog and pg_proc catalog from pg_aggregate.dat and pg_proc.dat (using the genbki.pl script and related scripts).

2. Approach 2
(1) Advantages
  (a) No need to add partial aggregate functions to the catalogs for each aggregation
(2) Disadvantages
  (a) Need to add non-standard keywords to the SQL syntax

I did not choose Approach 2 because I was not confident that the disadvantage mentioned in 2.(2)(a) would be accepted by the PostgreSQL development community. If it is accepted, I think Approach 2 is smarter.

Could you please provide your opinion on which approach is preferable after comparing these two approaches? If we cannot say anything without comparing the amount of source code, as Mr. Momjian mentioned, we need to estimate the amount of source code required to implement Approach 2.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation
Re: Output affected rows in EXPLAIN
On Wed, Nov 15, 2023 at 2:17 PM wrote: > > We can check the number of updated rows from execute plan, > I think there is no need to return the command tag > when EXPLAIN(ANALYZE) is executed by default. Given that several people have voiced an opinion against this patch, I'm marking it rejected.
Re: Extension Enhancement: Buffer Invalidation in pg_buffercache
On 01/07/2023 00:09, Thomas Munro wrote:
> On Fri, Jun 30, 2023 at 10:47 PM Palak Chaturvedi wrote:
> We also talked a bit about how one might control the kernel page cache in
> more fine-grained ways for testing purposes, but it seems like the pgfincore
> project has that covered with its pgfadvise_willneed() and
> pgfadvise_dontneed(). IMHO that project could use more page-oriented
> operations (instead of just counts and coarse-grained operations) but that's
> something that could be material for patches to send to the extension
> maintainers. This work, in contrast, is more tangled up with bufmgr.c
> internals, so it feels like this feature belongs in a core contrib module.

Precisely what pgfincore is doing/offering already. Happy to propose it to the postgresql tree if there is interest.

Next step for pgfincore is to add the cachestat() syscall and evaluate the benefits for PostgreSQL cost estimators of this new call.

Here is an example to achieve the warm/unwarm; each bit is a PostgreSQL page, so here we warm the cache with the first 3 pages and remove the last 3 from cache (system cache, not shared buffers):

-- Loading and Unloading
cedric=# select * from pgfadvise_loader('pgbench_accounts', 0, true, true, B'111000');
     relpath      | os_page_size | os_pages_free | pages_loaded | pages_unloaded
------------------+--------------+---------------+--------------+----------------
 base/11874/16447 |         4096 |        408376 |            3 |              3

---
Cédric Villemain +33 (0)6 20 30 22 52
https://Data-Bene.io
PostgreSQL Expertise, Support, Training, R&D
Re: autovectorize page checksum code included elsewhere
On Tue, Nov 7, 2023 at 9:47 AM Nathan Bossart wrote: > > Presently, we ask compilers to autovectorize checksum.c and numeric.c. The > page checksum code actually lives in checksum_impl.h, and checksum.c just > includes it. But checksum_impl.h is also used in pg_upgrade/file.c and > pg_checksums.c, and since we don't ask compilers to autovectorize those > files, the page checksum code may remain un-vectorized. Poking in those files a bit, I also see references to building with SSE 4.1. Maybe that's an avenue that we should pursue? (an indirect function call is surely worth it for page-sized data)
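As a standalone sketch of the "indirect function call" idea (not PostgreSQL's checksum code): resolve the best implementation once, then call through a pointer. The CPU probe and the checksum body here are stand-ins; a real build would compile the fast path with -msse4.1 (or a target attribute) and probe CPUID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*checksum_fn) (const uint8_t *page, size_t len);

/* Portable fallback; a toy checksum, not the real FNV-based algorithm. */
static uint32_t
checksum_scalar(const uint8_t *page, size_t len)
{
	uint32_t	sum = 0;

	for (size_t i = 0; i < len; i++)
		sum = sum * 31 + page[i];
	return sum;
}

/* Stand-in for a CPUID capability probe. */
static int
cpu_has_sse41(void)
{
	return 0;
}

/* Pick the implementation once, at first use. */
static checksum_fn
resolve_checksum_impl(void)
{
	if (cpu_has_sse41())
		return checksum_scalar;	/* a real build would return the SIMD variant */
	return checksum_scalar;
}

/* Public entry point: one indirect call per page. */
static uint32_t
checksum_page(const uint8_t *page, size_t len)
{
	static checksum_fn impl = NULL;

	if (impl == NULL)
		impl = resolve_checksum_impl();
	return impl(page, len);
}
```

The design tradeoff matches the remark above: one indirect call is negligible when each invocation processes a full 8kB page, so the vectorized variant can live in a separately compiled translation unit without forcing SIMD flags on every file that includes checksum_impl.h.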
Re: Lockless exit path for ReplicationOriginExitCleanup
Hello, On 2023-Nov-22, Bharath Rupireddy wrote: > While looking at the use of session_replication_state, I noticed that > ReplicationOriginLock is acquired in ReplicationOriginExitCleanup() > even if session_replication_state is reset to NULL by > replorigin_session_reset(). Why can't there be a lockless exit path > something like [1] similar to > replorigin_session_reset() which checks session_replication_state == > NULL without a lock? I suppose we can do this on consistency grounds -- I'm pretty sure you'd have a really hard time proving that this makes a performance difference -- but this patch is incomplete: just two lines below, we're still testing session_replication_state for nullness, which would now be dead code. Please repair. The comment on session_replication_state is confusing also: /* * Backend-local, cached element from ReplicationState for use in a backend * replaying remote commits, so we don't have to search ReplicationState for * the backends current RepOriginId. */ My problem with it is that this is not a "cached element", but instead a "cached pointer to [shared memory]". This is what makes testing that pointer for null-ness doable, but access to each member therein requiring lwlock acquisition. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Once again, thank you and all of the developers for your hard work on PostgreSQL. This is by far the most pleasant management experience of any database I've worked on." (Dan Harris) http://archives.postgresql.org/pgsql-performance/2006-04/msg00247.php
Commitfest 2023-11 update 2
We're about 2/3 of the way through. At start: Needs review: 210. Waiting on Author: 42. Ready for Committer: 29. Committed: 55. Withdrawn: 10. Returned with Feedback: 1. Total: 347. Today: Needs review: 183. Waiting on Author: 45. Ready for Committer: 25. Committed: 76. Returned with Feedback: 4. Withdrawn: 13. Rejected: 1. Total: 347. The pace seems to have picked up a bit, based on number of commits. -- John Naylor
Re: Lockless exit path for ReplicationOriginExitCleanup
On Wed, Nov 22, 2023 at 2:12 PM Bharath Rupireddy wrote: > > While looking at the use of session_replication_state, I noticed that > ReplicationOriginLock is acquired in ReplicationOriginExitCleanup() > even if session_replication_state is reset to NULL by > replorigin_session_reset(). Why can't there be a lockless exit path > something like [1] similar to > replorigin_session_reset() which checks session_replication_state == > NULL without a lock? > I don't see any problem with such a check but not sure of the benefit of doing so either. -- With Regards, Amit Kapila.
Re: Stop the search once replication origin is found
On Mon, Nov 20, 2023 at 4:36 PM Amit Kapila wrote: > > On Mon, Nov 20, 2023 at 2:36 PM Antonin Houska wrote: > > > > Although it's not performance-critical, I think it just makes sense to break > > the loop in replorigin_session_setup() as soon as we've found the origin. > > > > Your proposal sounds reasonable to me. > Pushed, thanks for the patch! -- With Regards, Amit Kapila.
Re: [PoC] pg_upgrade: allow to upgrade publisher node
On Wed, Nov 22, 2023 at 1:30 PM John Naylor wrote: > > On Thu, Nov 9, 2023 at 5:07 PM Amit Kapila wrote: > > > > On Wed, Nov 8, 2023 at 11:05 PM vignesh C wrote: > > > > > > On Wed, 8 Nov 2023 at 08:43, vignesh C wrote: > > > > > > Here is a small improvisation where num_slots need not be initialized > > > as it will be used only after assigning the result now. The attached > > > patch has the changes for the same. > > > > > > > Pushed! > > Hi all, the CF entry for this is marked RfC, and CI is trying to apply > the last patch committed. Is there further work that needs to be > re-attached and/or rebased? > No. I have marked it as committed. -- With Regards, Amit Kapila.
Re: pipe_read_line for reading arbitrary strings
On Mon, Sep 25, 2023 at 2:55 PM Daniel Gustafsson wrote: > > Fixed, along with commit message wordsmithing in the attached. Unless > objected > to I'll go ahead with this version. +1
Lockless exit path for ReplicationOriginExitCleanup
Hi,

While looking at the use of session_replication_state, I noticed that ReplicationOriginLock is acquired in ReplicationOriginExitCleanup() even if session_replication_state is reset to NULL by replorigin_session_reset(). Why can't there be a lockless exit path, something like [1], similar to replorigin_session_reset() which checks session_replication_state == NULL without a lock?

[1]
diff --git a/src/backend/replication/logical/origin.c b/src/backend/replication/logical/origin.c
index 460e3dcc38..99bbe90f6c 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -1056,6 +1056,9 @@ ReplicationOriginExitCleanup(int code, Datum arg)
 {
 	ConditionVariable *cv = NULL;

+	if (session_replication_state == NULL)
+		return;
+
 	LWLockAcquire(ReplicationOriginLock, LW_EXCLUSIVE);

 	if (session_replication_state != NULL &&

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: remaining sql/json patches
On Fri, Nov 17, 2023 at 6:40 PM Alvaro Herrera wrote:
> On 2023-Nov-17, Amit Langote wrote:
>
> > On Fri, Nov 17, 2023 at 4:27 PM jian he wrote:
> >
> > > In some enum declarations, does the ending element need an extra comma?
> >
> > Didn't know about the convention to have that comma, but I can see it
> > is present in most enum definitions.
>
> It's new. See commit 611806cd726f.

I see, thanks.

--
Thanks,
Amit Langote
EDB: http://www.enterprisedb.com
Re: Evaluate arguments of correlated SubPlans in the referencing ExprState
On Tue, Oct 10, 2023 at 10:00 AM Andres Freund wrote:
>
> Hi,
>
> On 2023-10-01 14:53:23 -0400, Tom Lane wrote:
> > Peter Eisentraut writes:
> > > Is this patch still being worked on?
> >
> > I thought Andres simply hadn't gotten back to it yet.
> > It still seems like a worthwhile improvement.
>
> Indeed - I do plan to commit it. I haven't quite shifted into v17 mode yet...

Any shift yet? ;-)
Re: Unlinking Parallel Hash Join inner batch files sooner
On Wed, Sep 27, 2023 at 11:42 PM Heikki Linnakangas wrote:
>
> Looks good to me too at a quick glance. There's this one "XXX free"
> comment though:
>
> >		for (int i = 1; i < old_nbatch; ++i)
> >		{
> >			ParallelHashJoinBatch *shared =
> >				NthParallelHashJoinBatch(old_batches, i);
> >			SharedTuplestoreAccessor *accessor;
> >
> >			accessor = sts_attach(ParallelHashJoinBatchInner(shared),
> >								  ParallelWorkerNumber + 1,
> >								  >fileset);
> >			sts_dispose(accessor);
> >			/* XXX free */
> >		}
>
> I think that's referring to the fact that sts_dispose() doesn't free the
> 'accessor', or any of the buffers etc. that it contains. That's a
> pre-existing problem, though: ExecParallelHashRepartitionRest() already
> leaks the SharedTuplestoreAccessor structs and their buffers etc. of the
> old batches. I'm a little surprised there isn't already an sts_free()
> function.
>
> Another thought is that it's a bit silly to have to call sts_attach()
> just to delete the files. Maybe sts_dispose() should take the same three
> arguments that sts_attach() does, instead.
>
> So that freeing would be nice to tidy up, although the amount of memory
> leaked is tiny so might not be worth it, and it's a pre-existing issue.
>
> I'm marking this as Ready for Committer.

(I thought I'd go around and nudge CF entries where both author and
reviewer are committers.) Hi Thomas, do you have any additional thoughts
on the above?
Re: remaining sql/json patches
On Wed, Nov 22, 2023 at 4:37 PM Andres Freund wrote:
> On 2023-11-22 15:09:36 +0900, Amit Langote wrote:
> > OK, I will keep polishing 0001-0003 with the intent to push it next
> > week barring objections / damning findings.
>
> I don't think the patchset is quite there yet. It's definitely getting
> closer though! I'll try to do another review next week.

That would be great, thank you. I'll post an update on Friday.

--
Thanks,
Amit Langote
EDB: http://www.enterprisedb.com
Re: [PoC] pg_upgrade: allow to upgrade publisher node
On Thu, Nov 9, 2023 at 5:07 PM Amit Kapila wrote:
>
> On Wed, Nov 8, 2023 at 11:05 PM vignesh C wrote:
> >
> > On Wed, 8 Nov 2023 at 08:43, vignesh C wrote:
> >
> > Here is a small improvement where num_slots need not be initialized
> > as it will be used only after assigning the result now. The attached
> > patch has the changes for the same.
>
> Pushed!

Hi all, the CF entry for this is marked RfC, and CI is trying to apply
the last patch committed. Is there further work that needs to be
re-attached and/or rebased?