Re: Identify missing publications from publisher while create/alter subscription.
On Wed, Nov 10, 2021 at 11:16 AM Bharath Rupireddy wrote: > > On Tue, Nov 9, 2021 at 9:27 PM vignesh C wrote: > > Attached v12 version is rebased on top of Head. > > Thanks for the patch. Here are some comments on v12: > > 1) I think ERRCODE_TOO_MANY_ARGUMENTS isn't the right error code, the > ERRCODE_UNDEFINED_OBJECT is more meaningful. Please change. > + ereport(ERROR, > + (errcode(ERRCODE_TOO_MANY_ARGUMENTS), > + errmsg_plural("publication %s does not exist in the publisher", > +"publications %s do not exist in the publisher", > > The existing code using > ereport(ERROR, > (errcode(ERRCODE_UNDEFINED_OBJECT), > errmsg("subscription \"%s\" does not exist", subname))); Modified > 2) Typo: It is "One of the specified publications exists." > +# One of the specified publication exist. Modified > 3) I think we can remove the test case "+# Specified publication does > not exist." because the "+# One of the specified publication exist." > covers the code. Modified > 4) Do we need the below test case? Even with refresh = false, it does > call connect_and_check_pubs() right? Please remove it. > +# Specified publication does not exist with refresh = false. > +($ret, $stdout, $stderr) = $node_subscriber->psql('postgres', > + "ALTER SUBSCRIPTION mysub1 SET PUBLICATION non_existent_pub > WITH (REFRESH = FALSE, VALIDATE_PUBLICATION = TRUE)" > +); > +ok( $stderr =~ > + m/ERROR: publication "non_existent_pub" does not exist in > the publisher/, > + "Alter subscription for non existent publication fails"); > + Modified > 5) Change the test case names to different ones instead of the same. 
> Have something like: > "Create subscription fails with single non-existent publication"); > "Create subscription fails with multiple non-existent publications"); > "Create subscription fails with mutually exclusive options"); > "Alter subscription add publication fails with non-existent publication"); > "Alter subscription set publication fails with non-existent publication"); > "Alter subscription set publication fails with connection to a > non-existent database"); Modified Thanks for the comments, the attached v13 patch has the fixes for the same. Regards, Vignesh From f85203bf95359dac6fb2eea3b973be7ceaadf4e1 Mon Sep 17 00:00:00 2001 From: Vignesh C Date: Tue, 9 Nov 2021 20:30:47 +0530 Subject: [PATCH v13] Identify missing publications from publisher while create/alter subscription. Creating/altering subscription is successful when we specify a publication which does not exist in the publisher. This patch checks if the specified publications are present in the publisher and throws an error if any of the publication is missing in the publisher. --- doc/src/sgml/ref/alter_subscription.sgml | 13 ++ doc/src/sgml/ref/create_subscription.sgml | 20 ++- src/backend/commands/subscriptioncmds.c | 207 +++--- src/bin/psql/tab-complete.c | 14 +- src/test/subscription/t/007_ddl.pl| 56 +- 5 files changed, 276 insertions(+), 34 deletions(-) diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml index 0b027cc346..95b6b6a0e5 100644 --- a/doc/src/sgml/ref/alter_subscription.sgml +++ b/doc/src/sgml/ref/alter_subscription.sgml @@ -168,6 +168,19 @@ ALTER SUBSCRIPTION name RENAME TO < + + +validate_publication (boolean) + + + When true, the command verifies if all the specified publications + that are being subscribed to are present in the publisher and throws + an error if any of the publication doesn't exist. The default is + false. 
+ + + + diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml index 990a41f1a1..8ec028ad7d 100644 --- a/doc/src/sgml/ref/create_subscription.sgml +++ b/doc/src/sgml/ref/create_subscription.sgml @@ -110,12 +110,14 @@ CREATE SUBSCRIPTION subscription_nametrue. Setting this to false will force the values of - create_slot, enabled and - copy_data to false. + create_slot, enabled, + copy_data and + validate_publication to false. (You cannot combine setting connect to false with setting create_slot, enabled, - or copy_data to true.) + copy_data or + validate_publication to true.) @@ -170,6 +172,18 @@ CREATE SUBSCRIPTION subscription_name + + +validate_publication (boolean) + + + When true, the command verifies if all the specified publications + that are being subscribed to are present in the publisher and throws + an error if any of the publication doesn't exist. The default is + false. + + + diff --git
Re: Invalid Unicode escape value at or near "\u0000"
On Sat, Nov 13, 2021 at 4:32 PM Japin Li wrote: > When I try to insert the Unicode escape "\u0000", there is an error $subject. > > postgres=# CREATE TABLE tbl (s varchar(10)); > CREATE TABLE > postgres=# INSERT INTO tbl VALUES (E'\u0000'); > ERROR: invalid Unicode escape value at or near "\u0000" > LINE 1: INSERT INTO tbl VALUES (E'\u0000'); > ^ > > "\u0000" is valid Unicode [1], so why can't we insert it? Yes, it is a valid codepoint, but unfortunately PostgreSQL can't support it because it sometimes deals in null-terminated strings, even though internally it does track string data and length separately. We have to do that to use libc facilities like strcoll_r(), and probably many other things.
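[Editor's note: the null-termination problem described above can be demonstrated outside PostgreSQL. This is a minimal Python sketch, not PostgreSQL code; it assumes a Unix-like system where ctypes can locate the C library. Any length-tracked string containing U+0000 is silently truncated the moment a NUL-terminated libc routine sees it.]

```python
import ctypes
import ctypes.util

# Load the platform C library (assumption: Unix-like system where
# ctypes.util.find_library can locate libc).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# U+0000 encodes to a single zero byte in UTF-8.
text = "ab\u0000cd"
raw = text.encode("utf-8")   # b'ab\x00cd' -- five bytes of real data

# Python (like PostgreSQL internally) tracks the length explicitly,
# so nothing is lost at this point...
print(len(raw))              # 5

# ...but any C routine that relies on NUL termination (strlen, strcoll
# and friends) sees the string end at the embedded zero byte.
print(libc.strlen(raw))      # 2
```

So even though the backend carries an explicit length, handing the bytes to libc collation routines would drop everything after the NUL, which is why the escape is rejected up front.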
Re: Parallel vacuum workers prevent the oldest xmin from advancing
On Fri, Nov 12, 2021 at 6:44 PM Alvaro Herrera wrote: > > On 2021-Nov-11, Masahiko Sawada wrote: > > > On Thu, Nov 11, 2021 at 12:53 PM Amit Kapila > > wrote: > > > > > > On Thu, Nov 11, 2021 at 9:11 AM Andres Freund wrote: > > > > > This seems like an unnecessary optimization. > > > > ProcArrayInstallRestoredXmin() > > > > only happens in the context of much more expensive operations. > > > > > > Fair point. I think that will also make the change in > > > ProcArrayInstallRestoredXmin() appear neat. > > > > This makes me think that it'd be better to copy status flags in a > > separate function rather than ProcArrayInstallRestoredXmin(). > > To me, and this is perhaps just personal opinion, it seems conceptually > simpler to have ProcArrayInstallRestoredXmin acquire exclusive lock and do > both things. Why? Because if you have two functions, you have to be > careful not to call the new function after ProcArrayInstallRestoredXmin; > otherwise you would create an instant during which you make an > Xmin-without-flag visible to other procs; this causes the computed xmin > to go backwards, which is verboten. > > If I understand Amit correctly, his point is about the callers of > RestoreTransactionSnapshot, which are two: CreateReplicationSlot and > ParallelWorkerMain. He wants your hypothetical new function called from > the latter but not the former. Looking at both, it seems a bit strange > to make them responsible for a detail such as "copy ->statusFlags from > source proc to mine". It seems more reasonable to add a third flag to > RestoreTransactionSnapshot(Snapshot snapshot, void *source_proc, bool > is_vacuum) > and if that is true, tell SetTransactionSnapshot to copy flags, > otherwise not. > If we decide to go this way, then I suggest adding a comment to convey why we chose to copy status flags in ProcArrayInstallRestoredXmin(), as the function name doesn't indicate it. > > ...
unrelated to this, and looking at CreateReplicationSlot, I wonder > why does it pass MyProc as the source_pgproc parameter. What good is > that doing? I mean, if the only thing we do with source_pgproc is to > copy stuff from source_pgproc to MyProc, then if source_pgproc is > MyProc, we're effectively doing nothing at all. (You can't "fix" this > by merely passing NULL, because what that would do is change the calling > of ProcArrayInstallRestoredXmin into a call of > ProcArrayInstallImportedXmin and that would presumably have different > behavior.) I may be misreading the code of course, but it sounds like > the intention of CreateReplicationSlot is to "do nothing" with the > transaction snapshot in a complicated way. > It ensures that the source transaction is still running; otherwise, it won't allow the import to be successful. It also seems to help by updating the state for GlobalVis* stuff. I think in the current form it seems to help in not moving MyProc->xmin and TransactionXmin backward, due to the checks in ProcArrayInstallRestoredXmin(), and also changes them to the value in the source snapshot. -- With Regards, Amit Kapila.
Re: BUFFERS enabled by default in EXPLAIN (ANALYZE)
On Sat, Nov 13, 2021 at 8:18 AM Tomas Vondra wrote: > > On 11/12/21 23:58, Nikolay Samokhvalov wrote: > > Re-reading the thread [1] (cannot answer there – don't have those emails > > in my box anymore), > > You can download the message as mbox and import it into your client > (pretty much any client supports that, I think). There is also a "resend mail" link available for each mail in the archive which should be even more straightforward.
Re: Allow users to choose what happens when recovery target is not reached
On Sat, Nov 13, 2021 at 11:00 AM Bharath Rupireddy wrote: > > Users will always be optimistic and set a recovery target and try to > reach it, but somehow a few of the WAL files haven't arrived (for > whatever reason) at the PITR target server; imagine if their primary > isn't available too, then with the proposal I made, they can choose to > have at least an available target server rather than a FATALly failed > one. If your primary server isn't available, why would you want a recovery target in the first place? I just don't understand in which case someone would want to set up a recovery target and wouldn't care if the recovery target wasn't reached, especially if it can be off by GB / days of data. It seems like it could have the opposite effect of what you want most of the time. What if for some reason the restore_command is flawed, and you end up starting your server because it couldn't restore WAL files that are actually available? You would have to restart from scratch and waste more time than if you didn't use this. It looks like what you actually want is some kind of a target window, but the window you currently propose is a hardcoded (consistency, given target], and it seems too dangerous to be useful.
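[Editor's note: the "target window" idea above can be made concrete with a small model. This is a hypothetical Python sketch of the proposed semantics, with LSNs modeled as plain integers; none of the names are actual or proposed GUCs.]

```python
# Instead of the hardcoded interval (consistency point, requested target],
# let the user bound how far short of the target recovery may stop and
# still be allowed to promote.  All names here are illustrative.

def recovery_decision(reached, consistency, target, allowed_shortfall):
    """Return 'promote' or 'fatal' for a recovery that stopped at
    position `reached` (integers standing in for LSNs)."""
    if reached < consistency:
        return "fatal"            # never promote before consistency
    if reached >= target:
        return "promote"          # target actually reached
    # Target missed: promote only if the gap fits the user's window.
    return "promote" if target - reached <= allowed_shortfall else "fatal"

print(recovery_decision(reached=90, consistency=10, target=100,
                        allowed_shortfall=20))   # promote: short by 10 <= 20
print(recovery_decision(reached=50, consistency=10, target=100,
                        allowed_shortfall=20))   # fatal: short by 50
```

The danger Julien points out maps directly to `allowed_shortfall` being effectively unbounded in the original proposal: any stop point past consistency would promote, however many GB of WAL are missing.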
Invalid Unicode escape value at or near "\u0000"
Hi, hackers When I try to insert the Unicode escape "\u0000", there is an error $subject. postgres=# CREATE TABLE tbl (s varchar(10)); CREATE TABLE postgres=# INSERT INTO tbl VALUES (E'\u0000'); ERROR: invalid Unicode escape value at or near "\u0000" LINE 1: INSERT INTO tbl VALUES (E'\u0000'); ^ "\u0000" is valid Unicode [1], so why can't we insert it? [1] https://www.unicode.org/charts/PDF/U0000.pdf -- Regards, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Inconsistent error message for varchar(n)
Hi, hackers When I try to create a table that has a varchar(n) data type, I find an inconsistent error message for it. postgres=# CREATE TABLE tbl (s varchar(2147483647)); ERROR: length for type varchar cannot exceed 10485760 LINE 1: CREATE TABLE tbl (s varchar(2147483647)); ^ postgres=# CREATE TABLE tbl (s varchar(2147483648)); ERROR: syntax error at or near "2147483648" LINE 1: CREATE TABLE tbl (s varchar(2147483648)); ^ I find that in gram.y the varchar takes an integer parameter, which means its value cannot exceed 2147483647. The first error message is reported by anychar_typmodin(), and the latter is reported by gram.y. IMO, the syntax error for varchar(n) is the more confusing of the two. Any thoughts? -- Regards, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
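[Editor's note: the inconsistency comes from the two checks living in different stages. This is a toy Python model of that two-stage behavior, not the actual parser: the grammar only accepts a length that fits a signed 32-bit integer, so 2147483648 never reaches anychar_typmodin()'s friendlier check.]

```python
INT32_MAX = 2**31 - 1          # a grammar Iconst must fit a signed int
MAX_VARCHAR_LEN = 10485760     # limit enforced later by anychar_typmodin()

def check_varchar_typmod(n_text):
    """Toy model of the two error paths for varchar(n)."""
    n = int(n_text)
    # Stage 1: lexer/grammar.  A literal too big for int32 is not an
    # integer constant at all, so the grammar rule never matches and
    # the generic syntax error is raised.
    if n > INT32_MAX:
        return 'syntax error at or near "%s"' % n_text
    # Stage 2: semantic check, with the type-specific message.
    if n > MAX_VARCHAR_LEN:
        return "length for type varchar cannot exceed %d" % MAX_VARCHAR_LEN
    return "ok"

print(check_varchar_typmod("2147483647"))
# length for type varchar cannot exceed 10485760
print(check_varchar_typmod("2147483648"))
# syntax error at or near "2147483648"
```

Making the second case report the nicer message would mean teaching the grammar (or a post-parse hook) to accept the out-of-range literal first, which is exactly the question being raised.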
Re: Logical Replication - improve error message while adding tables to the publication in check_publication_add_relation
On Sat, Nov 13, 2021 at 12:06 AM Euler Taveira wrote: > > Here's a rebased v8 patch. Please review it. > > This improves the user experience by increasing the granularity of error > reporting, and maps well with the precedent set in 81d5995b4. I'm marking > this > Ready for Committer and will go ahead and apply this unless there are > objections. > > Shouldn't we modify errdetail_relkind_not_supported() to include > relpersistence > as a 2nd parameter and move those messages to it? I experimented with this idea in > the attached patch. The idea is to provide a unique function that reports > accurate detail messages. Thanks. It is a good idea to use errdetail_relkind_not_supported. I slightly modified the API to "int errdetail_relkind_not_supported(Oid relid, Form_pg_class rd_rel);" to simplify things and increase the usability of the function further. For instance, it can report the specific error for the catalog tables as well. I also added "int errdetail_relkind_not_supported_v2(Oid relid, char relkind, char relpersistence);" so that the callers not having Form_pg_class (there are 3 such callers) can pass the parameters directly. PSA v10. Regards, Bharath Rupireddy. From 50e0570fe2c83dfddc85f3e7df0ffebb12c5ef3e Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Sat, 13 Nov 2021 02:33:44 +0000 Subject: [PATCH v10] Improve publication error messages Provide accurate messages for CREATE PUBLICATION commands. Also add tests to cover temporary, unlogged, system and foreign tables. While at it, modify the errdetail_relkind_not_supported API to distinguish tables based on the persistence property and report for system tables. Also, introduce errdetail_relkind_not_supported_v2 so that callers not having a Form_pg_class object can send relid, relkind, relpersistence as direct parameters. This way, the API is more flexible and easier to use. It also provides accurate detail messages for future work.
--- contrib/amcheck/verify_heapam.c | 2 +- contrib/pageinspect/rawpage.c | 2 +- contrib/pg_surgery/heap_surgery.c | 2 +- contrib/pg_visibility/pg_visibility.c | 2 +- contrib/pgstattuple/pgstatapprox.c| 2 +- contrib/pgstattuple/pgstatindex.c | 2 +- contrib/pgstattuple/pgstattuple.c | 2 +- .../postgres_fdw/expected/postgres_fdw.out| 6 contrib/postgres_fdw/sql/postgres_fdw.sql | 5 src/backend/catalog/pg_class.c| 24 ++-- src/backend/catalog/pg_publication.c | 28 ++- src/backend/commands/comment.c| 2 +- src/backend/commands/indexcmds.c | 2 +- src/backend/commands/lockcmds.c | 5 ++-- src/backend/commands/seclabel.c | 2 +- src/backend/commands/sequence.c | 2 +- src/backend/commands/statscmds.c | 2 +- src/backend/commands/subscriptioncmds.c | 4 +-- src/backend/commands/tablecmds.c | 9 +++--- src/backend/commands/trigger.c| 6 ++-- src/backend/executor/execReplication.c| 5 ++-- src/backend/parser/parse_utilcmd.c| 2 +- src/backend/replication/logical/relation.c| 2 +- src/backend/rewrite/rewriteDefine.c | 4 +-- src/include/catalog/pg_class.h| 4 ++- src/include/executor/executor.h | 2 +- src/test/regress/expected/publication.out | 16 +++ src/test/regress/sql/publication.sql | 14 ++ 28 files changed, 107 insertions(+), 53 deletions(-) diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c index bae5340111..f6b16c8fb0 100644 --- a/contrib/amcheck/verify_heapam.c +++ b/contrib/amcheck/verify_heapam.c @@ -314,7 +314,7 @@ verify_heapam(PG_FUNCTION_ARGS) (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("cannot check relation \"%s\"", RelationGetRelationName(ctx.rel)), - errdetail_relkind_not_supported(ctx.rel->rd_rel->relkind))); + errdetail_relkind_not_supported(RelationGetRelid(ctx.rel), ctx.rel->rd_rel))); /* * Sequences always use heap AM, but they don't show that in the catalogs. 
diff --git a/contrib/pageinspect/rawpage.c b/contrib/pageinspect/rawpage.c index 4bfa346c24..3173fb83d3 100644 --- a/contrib/pageinspect/rawpage.c +++ b/contrib/pageinspect/rawpage.c @@ -160,7 +160,7 @@ get_raw_page_internal(text *relname, ForkNumber forknum, BlockNumber blkno) (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("cannot get raw page from relation \"%s\"", RelationGetRelationName(rel)), - errdetail_relkind_not_supported(rel->rd_rel->relkind))); + errdetail_relkind_not_supported(RelationGetRelid(rel), rel->rd_rel))); /* * Reject attempts to read non-local temporary relations; we would be diff --git a/contrib/pg_surgery/heap_surgery.c b/contrib/pg_surgery/heap_surgery.c index 7edfe4f326..043efda2f6 100644 --- a/contrib/pg_surgery/heap_surgery.c
Re: Allow users to choose what happens when recovery target is not reached
On Fri, Nov 12, 2021 at 4:09 PM Julien Rouhaud wrote: > > On Fri, Nov 12, 2021 at 6:14 PM Bharath Rupireddy > wrote: > > > > Currently, the server shuts down with a FATAL error (added by commit > > [1]) when the recovery target isn't reached. This can cause a server > > availability problem, especially in case of disaster recovery (geo > > restores) where the primary was down and the user is doing a PITR on a > > server lying in another region which had missed receiving a few of > > the last WAL files required to reach the recovery target. In this > > case, users might want the server to be available rather than no > > server. With the commit [1], there's no way to achieve what users > > wanted. > > if users don't mind whether the recovery target is reached or not, isn't it > better to simply not specify a target and let the recovery go as far > as possible? Users will always be optimistic and set a recovery target and try to reach it, but somehow a few of the WAL files haven't arrived (for whatever reason) at the PITR target server; imagine if their primary isn't available too, then with the proposal I made, they can choose to have at least an available target server rather than a FATALly failed one. Regards, Bharath Rupireddy.
Re: support for MERGE
On Fri, Nov 12, 2021 at 9:58 AM Alvaro Herrera wrote: > Here's a new version. Many of the old complaints have been fixed; > particularly, the handling of partitioned tables is now much cleaner and > straightforward. Amit Langote helped considerably in getting this part > to shape -- thanks for that. Amit also helped correct the EvalPlanQual > behavior, which wasn't quite up to snuff. > > There are a few things that can still be improved here. For one, I need > to clean up the interactions with table AM (and thus heapam.c etc). > Secondarily, and I'm now not sure that I really want to do it, is change > the representation for executor: instead of creating a fake join between > target and source, perhaps we should have just source, and give > optimizer a separate query to fetch tuples from target. > > What I did do is change how the target table is represented from parse > analysis to executor. For example, in the original code, there were two > RTEs that represented the target table; that is gone. Now the target > table is always just the query's resultRelation. This removes a good > number of ugly hacks that had been objected to. > > I'll park this in the January commitfest now. > > -- > Álvaro Herrera Valdivia, Chile — > https://www.EnterpriseDB.com/ > "How we put our fingers in each other's clay. That is friendship: playing > potter and seeing what shapes can be drawn out of each other" (C. Halloway in > Something Wicked This Way Comes, R. Bradbury) > Hi, This is a continuation of the review. + elog(WARNING, "hoping nothing needed here"); the above warning can be dropped. + ExprState *mas_whenqual; /* WHEN [NOT] MATCHED AND conditions */ In my earlier comment I suggested changing notMatched to unmatched - I didn't pay attention to the syntax. That suggestion can be ignored. Cheers
Re: support for MERGE
On Fri, Nov 12, 2021 at 3:13 PM Alvaro Herrera wrote: > On 2021-Nov-12, Zhihong Yu wrote: > > > Hi, > > > > + skipped_path = total - insert_path - update_path - > delete_path; > > > > Should there be an assertion that skipped_path is not negative ? > > Hm, yeah, added. > > > +* We maintain separate transaction tables for UPDATE/INSERT/DELETE > since > > +* MERGE can run all three actions in a single statement. Note that > UPDATE > > +* needs both old and new transition tables > > > > Should the 'transaction' in the first line be transition ? > > Oh, of course. > > Uploaded fixup commits to > https://github.com/alvherre/postgres/commits/merge-15 > > -- > Álvaro Herrera Valdivia, Chile — > https://www.EnterpriseDB.com/ Hi, + resultRelInfo->ri_notMatchedMergeAction = NIL; ri_notMatchedMergeAction -> ri_unmatchedMergeAction +static void ExecMergeNotMatched(ModifyTableState *mtstate, ExecMergeNotMatched -> ExecMergeUnmatched +*In this case, we are still dealing with a WHEN MATCHED case. +*In this case, we recheck the list of WHEN MATCHED actions from It seems the comment can be simplified to: +*In this case, since we are still dealing with a WHEN MATCHED case, +*we recheck the list of WHEN MATCHED actions from +* If we got no tuple, or the tuple we get has 'get' appears in different tenses. Better use either 'get' or 'got' in both places. +lmerge_matched: ... + foreach(l, resultRelInfo->ri_matchedMergeAction) I suggest expanding the foreach macro into the form of for loop where the loop condition has extra boolean variable merge_matched. Initial value for merge_matched can be true. Inside the loop, we can adjust merge_matched's value to control whether the for loop continues. This would avoid using goto label. + if (commandType == CMD_UPDATE && tuple_updated) Since commandType can only be update or delete, it seems tuple_updated and tuple_deleted can be consolidated into one boolean variable (tuple_modified). The above point is personal preference. 
+* We've activated one of the WHEN clauses, so we don't search +* further. This is required behaviour, not an optimization. +*/ + break; We can directly return instead of break'ing. +* Similar logic appears in ExecInitPartitionInfo(), so if changing +* anything here, do so there too. The above implies code dedup via refactoring - can be done in a separate patch. To be continued ...
Re: BUFFERS enabled by default in EXPLAIN (ANALYZE)
On 11/12/21 11:58 PM, Nikolay Samokhvalov wrote: > Re-reading the thread [1] (cannot answer there – don't have those emails in > my box anymore), I see that there was strong support for enabling BUFFERS > in EXPLAIN ANALYZE by default. And there were patches. Commitfest entry [2] > was marked Rejected because there were questions to the implementation > based on GUCs. > > Attached is a simple patch enabling BUFFERS by default, without involving > GUCs. I obviously agree with this (although I still wish we had gucs as we keep adding more and more options to EXPLAIN). The patch looks good to me, too. +1 -- Vik Fearing
Re: ALTER TABLE DETACH PARTITION violates serializability
On Fri, Nov 12, 2021 at 3:27 PM Tom Lane wrote: > I wasn't aware of $SUBJECT ... were you? I wasn't aware of this particular example, but I didn't spend much time worrying about it, because DDL interleaved with DML generally isn't serializable anyway. If you issue "SELECT * FROM some_previously_unaccessed_table" in a transaction, it will find the table even if the XID that created that table is not visible to your snapshot. And conversely if that table has been dropped by some XID that is not visible to your snapshot, you will get an error that the table doesn't exist. There's also ATRewriteTable(), which is not MVCC-aware, and thus if someone applies that to a table before you first access it, you may see rows you shouldn't see and fail to see rows which you should see. I don't see those kinds of cases as being all that different from this one. My judgement is that outwaiting other XIDs would make far more people unhappy than it would make happy. If we want to have that as an optional behavior, I guess that would be OK, but I sure wouldn't make it the default. I feel like I went to quite a bit of effort to keep the lock levels as low as reasonably possible here and to allow as much concurrency as I could, and I bet there are still a LOT more people who are unhappy about things that take more locks than they'd like than there are people who are sad about these kinds of MVCC issues. -- Robert Haas EDB: http://www.enterprisedb.com
Re: BUFFERS enabled by default in EXPLAIN (ANALYZE)
On 11/12/21 23:58, Nikolay Samokhvalov wrote: Re-reading the thread [1] (cannot answer there – don't have those emails in my box anymore), You can download the message as mbox and import it into your client (pretty much any client supports that, I think). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Commitfest 2021-11 Patch Triage - Part 2
On 11/10/21 16:54, Andrey Borodin wrote: Daniel Gustafsson writes: 2773: libpq compression === This patch intended to provide libpq connection compression to "replace SSL compression" which was doomed when the patch was written, and has since been removed altogether. The initial approach didn't get much traction but there was significant discussion and work, which has since fizzled out. The patch has been updated but there hasn't been meaningful review in the past months, and the last comments seem to imply there being a fair amount of question marks left in here. Robert, having been very involved in this, do you have any thoughts on where we are and where to go (if at all IYO)? I'm not Robert, but I still have an opinion here, and it's that this feature would at best be an attractive nuisance. If you need compression on a database session, it probably means that the connection is over the open internet, which means that you need encryption even more. And we know that compression and encryption do not play well together. The reason compression was taken out of the latest TLS standards is not that they wouldn't have liked to have it, nor that applying compression in a separate code layer would be any safer. I fear offering this would merely lead people to build CVE-worthy setups. Compression is crucial for highly available setups. Replication traffic is often billed. Or the route has bandwidth limits. The entropy added by WAL headers makes a CRIME attack against replication encryption impractical. I very much doubt WAL headers are a reliable protection against CRIME, because the entropy of the headers is likely fairly constant. So if you compress the WAL stream, the WAL headers may change but the compression ratio should be pretty similar. At least that's my guess. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Should AT TIME ZONE be volatile?
On Sat, Nov 13, 2021 at 11:47 AM Ilya Anfimov wrote: > Currently for glibc the version looks like glibc version at > initdb, and that doesn't seem very reliable, but that could be a > different task (to find the LC_COLLATE file and put a hash of the > usable data into the version string, for example). Yeah, I had a system exactly like that working (that is, a way to run arbitrary commands to capture version strings, that could be used to hash your collation definition files, patches somewhere in the archives), but then we thought it'd be better to use glibc versions, and separately, to perhaps try to ask the glibc people to expose a version. FreeBSD (at my request), Windows and ICU do expose versions in a straightforward way, and we capture those. > Currently, it is questionable how to work with the different > versions of collations -- but that could be solved e.g. via > appropriate naming. Perhaps "collation@ver" ? But if the version > would contain a hash, a full version could be a bit dubious. > And some database maintenance task could check that all the old > collations are available, rename them as needed, and create a set > of new ones. > Automatically invalidating all the indexes, unfortunately. We built a system that at least detected the changes on a per-index level, but failed to ship it in release 14. See the revert commit, and links back to previous commits and discussion: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ec48314708262d8ea6cdcb83f803fc83dd89e721 It didn't invalidate indexes, but complained about each individual index on first access in each session, until you REINDEXed it. We will try again :-) In the archives you can find discussions of how to make a system that tolerates multiple versions existing at the same time as I think you're getting at, like DB2 does. It's tricky, because it's code and data. DB2 ships with N copies of ICU in it.
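[Editor's note: the hash-the-collation-data idea above can be sketched in a few lines. This is an illustrative Python sketch, not PostgreSQL code; locating the per-locale LC_COLLATE file is deliberately omitted, since that path lookup is the hard, platform-specific part.]

```python
import hashlib

def collation_version(definition_bytes):
    """Derive a short, stable version string from the raw bytes of a
    collation definition (e.g. the contents of a glibc LC_COLLATE file,
    which would have to be located per-locale -- path handling omitted)."""
    return hashlib.sha256(definition_bytes).hexdigest()[:16]

# Any change to the underlying data yields a different version string,
# which is what makes it usable for detecting stale-index risk; the
# same data always yields the same string, across initdb runs.
v_a = collation_version(b"pretend LC_COLLATE contents, revision A")
v_b = collation_version(b"pretend LC_COLLATE contents, revision B")
print(v_a != v_b)                                            # True
print(collation_version(b"x") == collation_version(b"x"))    # True
```

Compared with recording the glibc package version, this ties the recorded version to the actual sort rules, so a repackaged-but-identical locale file would not spuriously flag indexes.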
Re: Commitfest 2021-11 Patch Triage - Part 2
On 11/10/21 00:21, Bruce Momjian wrote: On Tue, Nov 9, 2021 at 12:43:20PM -0500, Stephen Frost wrote: * Tom Lane (t...@sss.pgh.pa.us) wrote: Daniel Gustafsson writes: I'm not Robert, but I still have an opinion here, and it's that this feature would at best be an attractive nuisance. If you need compression on a database session, it probably means that the connection is over the open internet, which means that you need encryption even more. And we know that compression and encryption do not play well together. The reason compression was taken out of the latest TLS standards is not that they wouldn't have liked to have it, nor that applying compression in a separate code layer would be any safer. I fear offering this would merely lead people to build CVE-worthy setups. I've got an opinion on this also and I don't agree that needing compression means you're on the open internet or that we shouldn't allow users the choice to decide if they want to use compression and encryption together or not. Yes, there are potential risks there, but I don't think those risks would lead to CVEs against PG for supporting a mode where we allow compression and then also allow encryption - if that's a problem then it's an issue for the encryption library/method being used and isn't the fault of us for allowing that combination. Yeah, I am leaning in this direction too. On the one hand, the number of ways to exploit compression-then-encryption seems to continue to grow, but it is the complex behavior of HTTPS and connecting to multiple sites which seems to always be required, and we don't do that with the Postgres client/server protocol. +0.5 I don't think the number of attacks grows all that much, actually. Or more precisely, the new vulnerabilities are just variations on the CRIME pattern, adapting it to different contexts. For example Voracle [1] applies to OpenVPN connections, but in principle it's the same scenario - observing changes to the compressed size, after tweaking the requests.
[1] https://community.openvpn.net/openvpn/wiki/VORACLE So it's true compression is a side channel, but these attacks require other ingredients too - the ability to MITM the connection, and the ability to inject custom data. Doing MITM is probably harder than when attacking HTTPS, because DB connections tend to be between app server and db server - so setting up a rogue AP does not really work. Sometimes people / DBAs may connect to the DB directly, although in a way that makes it hard to modify the data (which is why psql/ssh are immune to this). But it's possible. But how hard is injecting custom data in a way that'd make a CRIME-like issue possible? Yes, you could hijack a webapp similarly to CRIME, in a way that initiates database requests with custom data. But we don't generally send both the supplied and actual password to the DB to do the check there - we just send one of them, and check it against the value stored in the DB. The one (somewhat) plausible scenario I can think of is an application doing UPDATE t SET field = user_data WHERE id = 1; SELECT field FROM t WHERE ...; in which case we might learn if some other row in the result contains the same value. Not sure how practical this could be, though. AFAIK OpenVPN still supports tunnel compression, but it's disabled by default. That probably says something about what they think about the feature, although that might be due to having compression and not wanting to just remove it. One interesting idea OpenVPN allows is asymmetric compression, i.e. allowing compression in just one direction. For example it might be possible for the client to send compressed data, but the server would never send compressed. Or vice versa. That might still help with e.g. large imports or exports, depending on the use case, while maybe mitigating some issues. But it's clearly not a solid defense. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
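[Editor's note: the CRIME-style side channel discussed above is easy to reproduce with any LZ-based compressor. This is a toy Python demonstration of the principle only - the payload layout is invented and this is not an attack on PostgreSQL or libpq: when attacker-controlled input repeats a secret already present in the same compressed stream, the observable compressed size shrinks.]

```python
import zlib

SECRET = b"secret=hunter2hunter2hunter2"   # value the attacker wants to learn

def observed_size(attacker_input):
    """Size of the compressed stream containing both the secret and
    attacker-controlled data -- all a passive network observer sees,
    even when the stream is encrypted afterwards."""
    stream = SECRET + b"&user_input=" + attacker_input
    return len(zlib.compress(stream))

# A guess matching the secret is folded into an LZ77 back-reference,
# so the length side channel distinguishes right from wrong guesses.
size_right = observed_size(b"secret=hunter2hunter2hunter2")
size_wrong = observed_size(b"secret=qwertyuiopasdfghjklzxcvb")
print(size_right < size_wrong)   # True: the matching guess compresses better
```

This also illustrates why the attack needs both ingredients Tomas lists: the observer must see the (encrypted) sizes, and some channel must let them inject chosen guesses next to the secret in the same compression context.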
Re: ALTER TABLE DETACH PARTITION violates serializability
Alvaro Herrera writes: > I understand that the behavior is not fully correct, but given the way > most people are going to use this (which is that they're no longer > terribly interested in the data of the partition being detached/dropped) > and the severity of the penalties if we implement a full solution, I > lean towards documenting it rather than fixing it. > Another option might be to play with the trade-offs in the CONCURRENTLY > mode vs. the regular one. If we make the CONCURRENTLY mode wait for all > snapshots, we would only be making worse a mode that's already > documented to potentially take a long time. So people who want safety > can use that mode, and the others are aware of the risk. Meh ... "wrong by default" doesn't seem like it fits the Postgres ethos. How about adding an option UNSAFE (orthogonal to CONCURRENTLY) that activates the current behavior, and without it we wait for everything? regards, tom lane
Re: support for MERGE
On 2021-Nov-12, Zhihong Yu wrote: > Hi, > > + skipped_path = total - insert_path - update_path - delete_path; > > Should there be an assertion that skipped_path is not negative ? Hm, yeah, added. > +* We maintain separate transaction tables for UPDATE/INSERT/DELETE since > +* MERGE can run all three actions in a single statement. Note that UPDATE > +* needs both old and new transition tables > > Should the 'transaction' in the first line be transition ? Oh, of course. Uploaded fixup commits to https://github.com/alvherre/postgres/commits/merge-15 -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
Re: ALTER TABLE DETACH PARTITION violates serializability
On 2021-Nov-12, Tom Lane wrote:

> I wasn't aware of $SUBJECT ... were you?

Yeah, I remember pointing out that DETACH and DROP are not fully correct for serializability, but I can't find any thread where I said it in so many words. At the time I had no ideas on how to fix it; the idea of waiting for snapshots to go away didn't occur to anybody, or at least it didn't reach me.

> AFAICS, the only real way to fix this is to acquire lock on
> the target partition and then wait out any snapshots that are
> older than the lock, just in case those transactions would look
> at the partitioned table later. I'm not sure if we want to go
> there, but if we don't, we at least have to document this gotcha.

ALTER TABLE DETACH PARTITION CONCURRENTLY already has a wait-for-snapshots phase. It doesn't fix this problem, because it doesn't wait for all snapshots, just the ones holding any lock on the parent table. I'm not sure how upset people would be if we made it wait on *all* snapshots, and worse, made it do so in all cases and not just CONCURRENTLY.

I understand that the behavior is not fully correct, but given the way most people are going to use this (which is that they're no longer terribly interested in the data of the partition being detached/dropped) and the severity of the penalties if we implement a full solution, I lean towards documenting it rather than fixing it.

Another option might be to play with the trade-offs in the CONCURRENTLY mode vs. the regular one. If we make the CONCURRENTLY mode wait for all snapshots, we would only be making worse a mode that's already documented to potentially take a long time. So people who want safety can use that mode, and the others are aware of the risk.

-- 
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"I am amazed at [the pgsql-sql] mailing list for the wonderful support, and lack of hesitasion in answering a lost soul's question, I just wished the rest of the mailing list could be like this."
(Fotis) (http://archives.postgresql.org/pgsql-sql/2006-06/msg00265.php)
BUFFERS enabled by default in EXPLAIN (ANALYZE)
Re-reading the thread [1] (I cannot answer there - I don't have those emails in my inbox anymore), I see that there was strong support for enabling BUFFERS in EXPLAIN ANALYZE by default. And there were patches. Commitfest entry [2] was marked Rejected because there were questions about the GUC-based implementation.

Attached is a simple patch enabling BUFFERS by default, without involving GUCs.

Why is it important? In many cases, people forget about the BUFFERS option in EXPLAIN ANALYZE and share execution plans without it - sending them via regular communication channels for work or publishing them to visualization systems. Meanwhile, the option has a lower overhead compared to TIMING (enabled by default for EXPLAIN ANALYZE) and it is extremely useful for query optimization.

This patch doesn't enable BUFFERS for EXPLAIN executed without ANALYZE.

Open questions:

1. Should BUFFERS be enabled for regular (w/o ANALYZE) EXPLAIN? It may now make sense because of the buffer numbers the planner uses. This patch doesn't do that, but it could help people understand why planning time is so big in some cases.

2. How to adjust the documentation? Should EXPLAIN ANALYZE examples be adjusted to use BUFFERS OFF (the easier change) or show some example buffer numbers - like it is done for timing and cost numbers? I tend to think the latter makes more sense.

3. How to adjust the regression tests? Of course, many tests now fail. Same question as for the documentation. Excluding buffer numbers would be the easier fix, of course - but at least some tests should check the behavior of BUFFERS (such as both default options - with and without ANALYZE). On any given platform, the buffer numbers are pretty stable, so we could rely on them, but I'm not sure about all possible options being tested and would appreciate advice here (if, of course, the patch makes it through the discussion in general).
## Links

[1] https://www.postgresql.org/message-id/flat/b3197ba8-225f-f53c-326d-5b1756c77c3e%40postgresfriends.org
[2] https://commitfest.postgresql.org/28/2567/

001-buffers-in-explain-analyze-enabled-by-default.patch Description: Binary data
Re: Should AT TIME ZONE be volatile?
On Thu, Nov 11, 2021 at 09:52:52AM -0500, Tom Lane wrote:
> Robert Haas writes:
> > I'm not really convinced that ICU is better, either. I think it's more
> > that it isn't used as much.
>
> Well, at least ICU has a notion of attaching versions to collations.
> How mindful they are of bumping the version number when necessary
> remains to be seen. But the POSIX locale APIs don't even offer the
> opportunity to get it right.
>
> > I don't have any constructive proposal for what to do about any of
> > this. It sure is frustrating, though.
>
> Yup. If we had reliable ways to detect changes in this sort of
> environment-supplied data, maybe we could do something about it
> (a la the work that's been happening on attaching collation versions
> to indexes). But personally I can't summon the motivation to work

Theoretically there are versions attached to collations already: the collation in an index is an oid referencing pg_collation, and pg_collation already has versions. Currently for glibc the version is just the glibc version at initdb time, and that doesn't seem very reliable, but improving it could be a separate task (for example, find the LC_COLLATE file and put a hash of the usable data into the version string).

Currently, it is questionable how to work with different versions of collations - but that could be solved e.g. via appropriate naming. Perhaps "collation@ver"? But if the version contained a hash, a full version string could be a bit unwieldy. And some database maintenance task could check that all the old collations are still available, rename them as needed, and create the new ones - automatically invalidating all the affected indexes, unfortunately.

> on that, when ICU is the *only* such infrastructure that offers
> readily program-readable versioning.
>
> regards, tom lane
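[Editorial note] The idea floated above - deriving a collation version from a hash of the underlying collation data rather than from the library version - can be sketched as follows. The stand-in payloads and the 12-character truncation are assumptions for illustration only; reading a real LC_COLLATE file is left out:

```python
import hashlib

# Hypothetical sketch: derive a collation "version" string from a hash of
# the underlying collation data (e.g. the contents of an LC_COLLATE file),
# so that any change to the data yields a new version string, independent
# of which glibc release happens to be installed.
def collation_version(collation_data: bytes) -> str:
    return hashlib.sha256(collation_data).hexdigest()[:12]

v1 = collation_version(b"fake LC_COLLATE payload, glibc 2.31")
v2 = collation_version(b"fake LC_COLLATE payload, glibc 2.35")
assert v1 != v2        # changed collation data => changed version
assert len(v1) == 12   # short enough to embed in "collation@ver" naming
```

A content hash is stable across library upgrades that don't touch the collation data, which is precisely the property the glibc-version-at-initdb scheme lacks.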
Re: Undocumented AT TIME ZONE INTERVAL syntax
Corey Huinker writes:
> Attached is an attempt at an explanation of the edge cases I was
> encountering, as well as some examples. If nothing else, the examples will
> draw eyes and searches to the explanations that were already there.

I looked this over and have a few thoughts:

* I don't think your explanation of the behavior of colon-separated times is quite correct; for example, it doesn't correctly describe this:

regression=# select INTERVAL '2:03:04' minute to second;
 interval
----------
 02:03:04
(1 row)

I think the actual rule is that hh:mm:ss is always interpreted that way regardless of the typmod (though we may then drop low-order fields if the typmod says to). Two colon-separated numbers are interpreted as hh:mm by default, but as mm:ss if the typmod is exactly "minute to second". (This might work better in a separate para; the one you've modified here is mostly about what we do with unmarked quantities, but the use of colons makes these numbers not unmarked.)

* I'm not sure I would bother with examples for half-broken formats like "2:". People who really want to know about that can experiment, while for the rest of us it seems like adding confusion.

* In the same vein, I think your 0002 example adds way more confusion than illumination. Maybe better would be a less contrived offset, say

# SELECT TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40+00' AT TIME ZONE INTERVAL '3:00:00';
      timezone
---------------------
 2001-02-16 23:38:40
(1 row)

which could be put after the second example and glossed as "The third example rotates a timestamp specified in UTC to the zone three hours east of Greenwich, using a constant interval as the time zone specification".

regards, tom lane
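[Editorial note] The arithmetic in the proposed third example above can be checked independently of the database; a minimal Python sketch of the same rotation into a fixed-offset zone three hours east of Greenwich:

```python
from datetime import datetime, timedelta, timezone

# Sketch of what AT TIME ZONE INTERVAL '3:00:00' computes in the example
# above: rotate a timestamp given in UTC into a fixed-offset zone three
# hours east of Greenwich and show the resulting local wall-clock time.
ts_utc = datetime(2001, 2, 16, 20, 38, 40, tzinfo=timezone.utc)
east_3h = timezone(timedelta(hours=3))
local = ts_utc.astimezone(east_3h)
print(local.strftime("%Y-%m-%d %H:%M:%S"))  # 2001-02-16 23:38:40
```

The printed wall-clock time matches the psql output in the example, which is the point of the "rotate to a zone east of Greenwich" gloss.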
Re: Commitfest 2021-11 Patch Triage - Part 2
On Tue, Nov 9, 2021 at 7:02 AM Tom Lane wrote:
> Daniel Gustafsson writes:
> > 2773: libpq compression
> > ===
> > This patch intended to provide libpq connection compression to "replace SSL
> > compression" which was doomed when the patch was written, and has since been
> > removed altogether. The initial approach didn't get much traction but there
> > was significant discussion and work, which has since fizzled out. The patch
> > has been updated but there hasn't been meaningful review the past months; the
> > last comments seem to imply there being a fair amount of question marks left
> > in here. Robert, having been very involved in this, do you have any thoughts
> > on where we are and where to go (if at all IYO)?
>
> I'm not Robert, but I still have an opinion here, and it's that this
> feature would at best be an attractive nuisance. If you need compression
> on a database session, it probably means that the connection is over the
> open internet, which means that you need encryption even more.

This assumption is very vague. I have personally seen multiple cases where network bandwidth for app <--> Postgres communication was the bottleneck; these were fixed either via upgrades (spending more money), or by making application developers cache more data (optimization), or, usually, both. Public internet connections were never involved. Actually, any growing startup experiences such issues sooner or later. Network compression would be a great tool for many setups, and some other popular database systems have offered it for a long time.
Re: support for MERGE
On Fri, Nov 12, 2021 at 9:58 AM Alvaro Herrera wrote: > Here's a new version. Many of the old complaints have been fixed; > particularly, the handling of partitioned tables is now much cleaner and > straightforward. Amit Langote helped considerably in getting this part > to shape -- thanks for that. Amit also helped correct the EvalPlanQual > behavior, which wasn't quite up to snuff. > > There are a few things that can still be improved here. For one, I need > to clean up the interactions with table AM (and thus heapam.c etc). > Secondarily, and I'm now not sure that I really want to do it, is change > the representation for executor: instead of creating a fake join between > target and source, perhaps we should have just source, and give > optimizer a separate query to fetch tuples from target. > > What I did do is change how the target table is represented from parse > analysis to executor. For example, in the original code, there were two > RTEs that represented the target table; that is gone. Now the target > table is always just the query's resultRelation. This removes a good > number of ugly hacks that had been objected to. > > I'll park this in the January commitfest now. > > -- > Álvaro Herrera Valdivia, Chile — > https://www.EnterpriseDB.com/ > "Cómo ponemos nuestros dedos en la arcilla del otro. Eso es la amistad; > jugar > al alfarero y ver qué formas se pueden sacar del otro" (C. Halloway en > La Feria de las Tinieblas, R. Bradbury) > Hi, + skipped_path = total - insert_path - update_path - delete_path; Should there be an assertion that skipped_path is not negative ? +* We maintain separate transaction tables for UPDATE/INSERT/DELETE since +* MERGE can run all three actions in a single statement. Note that UPDATE +* needs both old and new transition tables Should the 'transaction' in the first line be transition ? cheers
Re: Add additional information to src/test/ssl/README
> On 12 Nov 2021, at 16:21, Tom Lane wrote: > > Daniel Gustafsson writes: >> Attached is a small addition mentioning PG_TEST_NOCLEAN > > Maybe could use a bit of copy-editing, say > > Data directories will also be left behind for analysis when a test fails; > they are named according to the test filename. But if the environment > variable PG_TEST_NOCLEAN is set, data directories will be retained > regardless of test status. Agreed, that reads better. Pushed with those changes. -- Daniel Gustafsson https://vmware.com/
Re: [Patch] ALTER SYSTEM READ ONLY
On Mon, Nov 8, 2021 at 8:20 AM Amul Sul wrote:
> Attached is the rebased version of refactoring as well as the
> pg_prohibit_wal feature patches for the latest master head (commit #
> 39a3105678a).

I spent a lot of time today studying 0002, and specifically the question of whether EndOfLog must be the same as XLogCtl->replayEndRecPtr and whether EndOfLogTLI must be the same as XLogCtl->replayEndTLI.

The answer to the former question is "no" because, if we don't enter redo, XLogCtl->replayEndRecPtr won't be initialized at all. If we do enter redo, then I think it has to be the same unless something very weird happens. EndOfLog gets set like this:

    XLogBeginRead(xlogreader, LastRec);
    record = ReadRecord(xlogreader, PANIC, false, replayTLI);
    EndOfLog = EndRecPtr;

In every case that exists in our regression tests, EndRecPtr is the same before these three lines of code as it is afterward. However, if you test with recovery_target=immediate, you can get it to be different, because in that case we drop out of the redo loop after calling recoveryStopsBefore() rather than after calling recoveryStopsAfter(). Similarly I'm fairly sure that if you use recovery_target_inclusive=off you can likewise get it to be different (though I discovered the hard way that recovery_target_inclusive=off is ignored when you use recovery_target_name). It seems like a really bad thing that neither recovery_target=immediate nor recovery_target_inclusive=off have any tests, and I think we ought to add some.

Anyway, in effect, these three lines of code have the effect of backing up the xlogreader by one record when we stop before rather than after a record that we're replaying. What that means is that EndOfLog is going to be the end+1 of the last record that we actually replayed. There might be one more record that we read but did not replay, and that record won't impact the value we end up with in EndOfLog.
Now, XLogCtl->replayEndRecPtr is also the end+1 of the last record that we actually replayed. To put that another way, there's no way to exit the main redo loop after we set XLogCtl->replayEndRecPtr and before we change LastRec. So in the cases where XLogCtl->replayEndRecPtr gets initialized at all, it can only be different from EndOfLog if something different happens when we re-read the last-replayed WAL record than what happened when we read it the first time. That seems unlikely, and would be unfortunate if it did happen. I am inclined to think that it might be better not to reread the record at all, though.

As far as this patch goes, I think we need a solution that doesn't involve fetching EndOfLog from a variable that's only sometimes initialized and then not doing anything with it except in the cases where it was initialized.

As for EndOfLogTLI, I'm afraid I don't think that's the same thing as XLogCtl->replayEndTLI. Now, it's hard to be sure, because I don't think the regression tests contain any scenarios where we run recovery and the values end up different. However, I think that the code sets EndOfLogTLI to the TLI of the last WAL file that we read, and I think XLogCtl->replayEndTLI gets set to the timeline from which that WAL record originated. So imagine that we are looking for WAL that ought to be in 00010003 but we don't find it; instead we find 00020003 because our recovery target timeline is 2, or something that has 2 in its history. We will read the WAL for timeline 1 from this file which has timeline 2 in the file name. I think if recovery ends in this file before the timeline switch, these values will be different.

I did not try to construct a test case for this today due to not having enough time, so it's possible that I'm wrong about this, but that's how it looks to me from the code.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
Re: simplifying foreign key/RI checks
Amit Langote writes: > On Fri, Nov 12, 2021 at 8:19 AM Tom Lane wrote: >> Anyway, I think that (1) we should write some more test cases around >> this behavior, (2) you need to establish the snapshot to use in two >> different ways for the RI_FKey_check and ri_Check_Pk_Match cases, >> and (3) something's going to have to be done to repair the behavior >> in v14 (unless we want to back-patch this into v14, which seems a >> bit scary). > Okay, I'll look into getting 1 and 2 done for this patch and I guess > work with Alvaro on 3. Actually, it seems that DETACH PARTITION is broken for concurrent serializable/repeatable-read transactions quite independently of whether they attempt to make any FK checks [1]. If we do what I speculated about there, namely wait out all such xacts before detaching, it might be possible to fix (3) just by reverting the problematic change in ri_triggers.c. I'm thinking the wait would render it unnecessary to get FK checks to do anything weird about partition lookup. But I might well be missing something. regards, tom lane [1] https://www.postgresql.org/message-id/1849918.1636748862%40sss.pgh.pa.us
Re: Is heap_page_prune() stats collector accounting wrong?
On Fri, Nov 12, 2021 at 11:29 AM Peter Geoghegan wrote: > We compensate here precisely because we are not running in VACUUM (it > has to be an opportunistic prune in practice). > If we're not running in VACUUM, and have to make a statistics > collector call, then we don't want to forget about DEAD tuples that > were pruned-away (i.e. no longer have tuple storage) when they still > have an LP_DEAD stub item. There is obviously no justification for > just ignoring LP_DEAD items there, because we don't know when VACUUM > is going to run next (since we are not VACUUM). Attached patch clears this up by adding some comments. It also moves the call to pgstat_update_heap_dead_tuples() from heap_page_prune() to heap_page_prune_opt(), which feels like a better place for it to me. -- Peter Geoghegan v1-0001-Clear-up-ndeleted-pgstats-accounting.patch Description: Binary data
Re: [PATCH v2] use has_privs_for_role for predefined roles
On Wed, Nov 10, 2021 at 12:45 PM Bossart, Nathan wrote: > > On 11/8/21, 2:19 PM, "Joshua Brindle" wrote: > > Thanks for the review, attached is an update with that comment fixed > > and also sgml documentation changes that I missed earlier. > > I think there are a number of documentation changes that are still > missing. I did a quick scan and saw the "is member of" language in > func.sgml, monitoring.sgml, pgbuffercache.sgml, pgfreespacemap.sgml, > pgrowlocks.sgml, pgstatstatements.sgml, and pgvisibility.sgml. All of these and also adminpack.sgml updated. I think that is all of them but docs broken across lines and irregular wording makes it difficult. > > By default, the pg_shmem_allocations view can be > - read only by superusers or members of the > pg_read_all_stats > - role. > + read only by superusers or roles with privilges of the > + pg_read_all_stats role. > > > > nitpick: "privileges" is misspelled. Fixed, thanks for reviewing. 0001-use-has_privs_for_roles-for-predefined-role-checks.patch Description: Binary data
ALTER TABLE DETACH PARTITION violates serializability
I wasn't aware of $SUBJECT ... were you? Here's a demonstration:

drop table if exists pk, fk, pk1, pk2;
create table pk (f1 int primary key) partition by list(f1);
create table pk1 partition of pk for values in (1);
create table pk2 partition of pk for values in (2);
insert into pk values(1);
insert into pk values(2);
create table fk (f1 int references pk);
insert into fk values(1);
insert into fk values(2);

In session 1, next do

regression=# begin isolation level serializable;
BEGIN
regression=*# select * from unrelated_table; -- to set the xact snapshot
...

Now in session 2, do

regression=# delete from fk where f1=2;
DELETE 1
regression=# alter table pk detach partition pk2;
ALTER TABLE

Back at session 1, we now see what's not only a serializability violation, but a not-even-self-consistent view of the database:

regression=*# select * from fk;
 f1
----
  1
  2
(2 rows)

regression=*# select * from pk;
 f1
----
  1
(1 row)

This is slightly ameliorated by the fact that if session 1 has already touched either pk or fk, locking considerations will block the DETACH. But only slightly. (Dropping a partition altogether has the same issue, of course.)

AFAICS, the only real way to fix this is to acquire lock on the target partition and then wait out any snapshots that are older than the lock, just in case those transactions would look at the partitioned table later. I'm not sure if we want to go there, but if we don't, we at least have to document this gotcha.

regards, tom lane
Re: Improving psql's \password command
On 11/12/21, 11:58 AM, "Tom Lane" wrote: > "Bossart, Nathan" writes: >> I haven't started on the SIGINT stuff yet, but I did extract the >> PQuser() fix so that we can hopefully get that one out of the way. I >> made some small adjustments in an attempt to keep the branching in >> this function from getting too complicated. > > Right, it makes sense to consider just this much as the back-patchable > part, and leave the more complicated messing with simple_prompt() > for HEAD only. I made a couple further cosmetic tweaks and pushed it. Thanks! Nathan
Re: Improving psql's \password command
"Bossart, Nathan" writes: > I haven't started on the SIGINT stuff yet, but I did extract the > PQuser() fix so that we can hopefully get that one out of the way. I > made some small adjustments in an attempt to keep the branching in > this function from getting too complicated. Right, it makes sense to consider just this much as the back-patchable part, and leave the more complicated messing with simple_prompt() for HEAD only. I made a couple further cosmetic tweaks and pushed it. regards, tom lane
Re: Is heap_page_prune() stats collector accounting wrong?
On Fri, Nov 12, 2021 at 10:45 AM Peter Geoghegan wrote: > Let's assume that somehow I have it wrong. Even then, why should we > compensate like this for the stats collector, but not for VACUUM? > There's certainly no corresponding code in vacuumlazy.c that does a > similar transformation with ndeleted. I think that I figured it out myself, but it's very confusing. Could definitely do with a better explanatory comment. Here's what I think is actually going on here: We compensate here precisely because we are not running in VACUUM (it has to be an opportunistic prune in practice). If we're running in VACUUM, then we are justified in ignoring newly-pruned LP_DEAD items, because 1.) They're going to be turned into LP_UNUSED items in the same VACUUM anyway (barring edge-cases), and 2.) Newly-pruned LP_DEAD items are really no different to existing LP_DEAD items that VACUUM found on the page before its own pruning operation even began -- if VACUUM wants to count them or report on them at all, then it had better count them itself, after pruning (which it only began to do in Postgres 14). If we're not running in VACUUM, and have to make a statistics collector call, then we don't want to forget about DEAD tuples that were pruned-away (i.e. no longer have tuple storage) when they still have an LP_DEAD stub item. There is obviously no justification for just ignoring LP_DEAD items there, because we don't know when VACUUM is going to run next (since we are not VACUUM). Does that sound correct? -- Peter Geoghegan
Re: [BUG]Invalidate relcache when setting REPLICA IDENTITY
On Fri, Nov 12, 2021, at 3:10 AM, houzj.f...@fujitsu.com wrote: > On Fri, Nov 12, 2021 1:33 PM Amit Kapila wrote: > > On Fri, Nov 12, 2021 at 10:50 AM Amit Kapila > > wrote: > > > But won't that generate invalidation for the rel twice in the case > > > (change Replica Identity from Nothing to some index) you mentioned in > > > the previous email? > > > > > > > Oh, I see the point. I think this is okay because > > AddRelcacheInvalidationMessage doesn't allow to add duplicate rel > > invalidation. If that is the case I wonder why not simply register > > invalidation without any check in the for loop as was the case with > > Tang's original patch? > > OK, I also think the code in Tang's original patch is fine. > Attach the patch which register invalidation without any check in the for > loop. WFM. -- Euler Taveira EDB https://www.enterprisedb.com/
Is heap_page_prune() stats collector accounting wrong?
The following code appears at the end of heap_page_prune():

    /*
     * If requested, report the number of tuples reclaimed to pgstats. This is
     * ndeleted minus ndead, because we don't want to count a now-DEAD root
     * item as a deletion for this purpose.
     */
    if (report_stats && ndeleted > prstate.ndead)
        pgstat_update_heap_dead_tuples(relation, ndeleted - prstate.ndead);

In general, heap_page_prune() has to know the exact definition of ndeleted (which is its return value) -- and the definition is kind of subtle. The definition is really "the number of tuples that had storage before pruning, but not after pruning, because pruning considered them DEAD and removed the tuple itself (though not necessarily its line pointer, which could have just been marked LP_DEAD rather than LP_UNUSED)".

As far as I know, the handling for all that at the end of heap_prune_chain() (at the point that we decide some tuples from a HOT chain are considered DEAD) is correct. It handles all the subtleties in a way that's consistent. I can't see why we should need to compensate like this (by subtracting prstate.ndead) at all, because, as I said, heap_page_prune() *already* does everything for us.

In particular, if the now-DEAD root item started out as an LP_NORMAL tuple with storage (a root item that is a plain heap tuple, or just a simple tuple on its own), then we already make sure to count it in ndeleted here, at the end of heap_prune_chain():

    /*
     * If the root entry had been a normal tuple, we are deleting it, so
     * count it in the result. But changing a redirect (even to DEAD
     * state) doesn't count.
     */
    if (ItemIdIsNormal(rootlp))
        ndeleted++;

So we're always going to count tuples that had storage but then got pruned away in ndeleted -- we *do* want to count root items when they're LP_NORMAL. We also avoid counting root items when that is inappropriate: when they started out as an LP_REDIRECT, but became LP_DEAD.
It would be inappropriate in the case of an LP_REDIRECT root item specifically, because it doesn't start out with tuple storage anyway (heap-only tuples from the same HOT chain are another matter). So why compensate when calling pgstat_update_heap_dead_tuples() at all? Let's assume that somehow I have it wrong. Even then, why should we compensate like this for the stats collector, but not for VACUUM? There's certainly no corresponding code in vacuumlazy.c that does a similar transformation with ndeleted. -- Peter Geoghegan
Re: RFC: Logging plan of the running query
On Wed, Oct 13, 2021 at 05:28:30PM +0300, Ekaterina Sokolova wrote:
> Hi, hackers!
>
> • pg_query_state is contrib with 2 patches for core (I hope someday
> Community will support adding this patches to PostgreSQL). It contains

I reviewed this version of the patch - I have some language fixes. I didn't know about pg_query_state, thanks.

> To improve this situation, this patch adds
> pg_log_current_query_plan() function that requests to log the
> plan of the specified backend process.

To me, "current plan" seems to mean "plan of *this* backend" (which makes no sense to log). I think the user-facing function could be called pg_log_query_plan(). It's true that the implementation is a request to another backend to log its *own* query plan - but users shouldn't need to know about the implementation.

> +Only superusers can request to log plan of the running query.

.. log the plan of a running query.

> +Note that nested statements (statements executed inside a function) are not
> +considered for logging. Only the deepest nesting query's plan is logged.

Only the plan of the most deeply nested query is logged.

> + (errmsg("backend with PID %d is not running a query",
> + MyProcPid)));

The extra parens around errmsg() are not needed since e3a87b499.

> + (errmsg("backend with PID %d is now holding a page lock. Try again",

remove "now"

> + (errmsg("plan of the query running on backend with PID %d is:\n%s",
> + MyProcPid, es->str->data),

Maybe this should say "query plan running on backend with PID 17793 is:"

> + * would cause lots of log messages and which can lead to denial of

remove "and"

> + errmsg("must be a superuser to log information about specified process")));

I think it should not say "specified", since that sounds like the user might have access to log information about some other processes:

| must be a superuser to log information about processes

> +
> + proc = BackendPidGetProc(pid);
> +
> + /*
> + * BackendPidGetProc returns NULL if the pid isn't valid; but by the time
> + * we reach kill(), a process for which we get a valid proc here might
> + * have terminated on its own. There's no way to acquire a lock on an
> + * arbitrary process to prevent that. But since this mechanism is usually
> + * used to below purposes, it might end its own first and the information

used for below purposes,

-- Justin
Re: Logical Replication - improve error message while adding tables to the publication in check_publication_add_relation
On Fri, Nov 12, 2021, at 9:41 AM, Daniel Gustafsson wrote:
> > On 4 Nov 2021, at 05:24, Bharath Rupireddy wrote:
> >
> > On Wed, Nov 3, 2021 at 6:21 PM Daniel Gustafsson wrote:
> >>
> >>> On 26 Jul 2021, at 09:33, Bharath Rupireddy wrote:
> >>
> >>> PSA v7.
> >>
> >> This patch no longer applies on top of HEAD, please submit a rebased
> >> version.
> >
> > Here's a rebased v8 patch. Please review it.
>
> This improves the user experience by increasing the granularity of error
> reporting, and maps well with the precedent set in 81d5995b4. I'm marking this
> Ready for Committer and will go ahead and apply this unless there are
> objections.

Shouldn't we modify errdetail_relkind_not_supported() to include relpersistence as a 2nd parameter and move those messages to it? I experimented with this idea in the attached patch. The idea is to provide a single function that reports accurate detail messages.

-- 
Euler Taveira
EDB https://www.enterprisedb.com/

From f068d4688a95c8e8c5a98d8d6f1847ddfafda43c Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy
Date: Thu, 4 Nov 2021 04:22:21 +
Subject: [PATCH v9] Improve publication error messages

Provide accurate messages for CREATE PUBLICATION commands. While at it, modify errdetail_relkind_not_supported to distinguish tables based on their persistence property. It provides accurate detail messages for future work. It also includes tests to cover temporary, unlogged, system and foreign tables.
--- contrib/amcheck/verify_heapam.c| 2 +- contrib/pageinspect/rawpage.c | 2 +- contrib/pg_surgery/heap_surgery.c | 2 +- contrib/pg_visibility/pg_visibility.c | 2 +- contrib/pgstattuple/pgstatapprox.c | 2 +- contrib/pgstattuple/pgstatindex.c | 2 +- contrib/pgstattuple/pgstattuple.c | 2 +- contrib/postgres_fdw/expected/postgres_fdw.out | 6 ++ contrib/postgres_fdw/sql/postgres_fdw.sql | 5 + src/backend/catalog/pg_class.c | 10 -- src/backend/catalog/pg_publication.c | 13 +++-- src/backend/commands/comment.c | 2 +- src/backend/commands/indexcmds.c | 2 +- src/backend/commands/lockcmds.c| 5 +++-- src/backend/commands/seclabel.c| 2 +- src/backend/commands/sequence.c| 2 +- src/backend/commands/statscmds.c | 2 +- src/backend/commands/tablecmds.c | 9 + src/backend/commands/trigger.c | 6 +++--- src/backend/executor/execReplication.c | 2 +- src/backend/parser/parse_utilcmd.c | 2 +- src/backend/rewrite/rewriteDefine.c| 4 ++-- src/include/catalog/pg_class.h | 2 +- src/test/regress/expected/publication.out | 16 src/test/regress/sql/publication.sql | 14 ++ 25 files changed, 80 insertions(+), 38 deletions(-) diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c index bae5340111..43854d1361 100644 --- a/contrib/amcheck/verify_heapam.c +++ b/contrib/amcheck/verify_heapam.c @@ -314,7 +314,7 @@ verify_heapam(PG_FUNCTION_ARGS) (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("cannot check relation \"%s\"", RelationGetRelationName(ctx.rel)), - errdetail_relkind_not_supported(ctx.rel->rd_rel->relkind))); + errdetail_relkind_not_supported(ctx.rel->rd_rel->relkind, ctx.rel->rd_rel->relpersistence))); /* * Sequences always use heap AM, but they don't show that in the catalogs. 
diff --git a/contrib/pageinspect/rawpage.c b/contrib/pageinspect/rawpage.c index 4bfa346c24..2259068d73 100644 --- a/contrib/pageinspect/rawpage.c +++ b/contrib/pageinspect/rawpage.c @@ -160,7 +160,7 @@ get_raw_page_internal(text *relname, ForkNumber forknum, BlockNumber blkno) (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("cannot get raw page from relation \"%s\"", RelationGetRelationName(rel)), - errdetail_relkind_not_supported(rel->rd_rel->relkind))); + errdetail_relkind_not_supported(rel->rd_rel->relkind, rel->rd_rel->relpersistence))); /* * Reject attempts to read non-local temporary relations; we would be diff --git a/contrib/pg_surgery/heap_surgery.c b/contrib/pg_surgery/heap_surgery.c index 7edfe4f326..1c90c87c04 100644 --- a/contrib/pg_surgery/heap_surgery.c +++ b/contrib/pg_surgery/heap_surgery.c @@ -110,7 +110,7 @@ heap_force_common(FunctionCallInfo fcinfo, HeapTupleForceOption heap_force_opt) (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("cannot operate on relation \"%s\"", RelationGetRelationName(rel)), - errdetail_relkind_not_supported(rel->rd_rel->relkind))); + errdetail_relkind_not_supported(rel->rd_rel->relkind, rel->rd_rel->relpersistence))); if (rel->rd_rel->relam != HEAP_TABLE_AM_OID) ereport(ERROR, diff --git a/contrib/pg_visibility/pg_visibility.c
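To make the proposal above concrete — one function deriving the detail message from both relkind and relpersistence — here is a toy standalone sketch. It is not the patch itself: the helper name and message texts are invented for illustration, and only the single-letter codes correspond to pg_class ('r' = ordinary table, 'v' = view, 'f' = foreign table, 'S' = sequence; 'p' = permanent, 'u' = unlogged, 't' = temporary).

```c
#include <stdio.h>
#include <string.h>

/*
 * Toy sketch -- NOT the actual patch.  A single function picks the
 * detail message from both relkind and relpersistence, so that e.g.
 * temporary and unlogged tables get distinct messages instead of a
 * generic "tables are not supported" detail.
 */
static const char *
detail_for_relation(char relkind, char relpersistence)
{
	if (relkind == 'r')			/* ordinary table: refine by persistence */
	{
		if (relpersistence == 't')
			return "Temporary tables cannot be added to publications.";
		if (relpersistence == 'u')
			return "Unlogged tables cannot be added to publications.";
		return "Tables of this kind are not supported here.";
	}

	switch (relkind)
	{
		case 'f':
			return "Foreign tables are not supported here.";
		case 'S':
			return "Sequences are not supported here.";
		case 'v':
			return "Views are not supported here.";
		default:
			return "This relation type is not supported here.";
	}
}
```

The actual patch instead extends errdetail_relkind_not_supported() and keeps the ereport()/errdetail() machinery; the sketch only shows the dispatch shape being discussed.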
Re: support for MERGE
On 2021-Nov-12, Tomas Vondra wrote: > On 11/12/21 18:57, Alvaro Herrera wrote: > > Secondarily, and I'm now not sure that I really want to do it, is change > > the representation for executor: instead of creating a fake join between > > target and source, perhaps we should have just source, and give > > optimizer a separate query to fetch tuples from target. > > When you say you're not sure you want to change this, is that because you > don't have time for that, or because you think the current approach is > better? I'm not sure that doing it the other way will be better. We'll have two queries to optimize: one is reading from the data source, and the other is fetching tuples from the target table based on the conditions applied to each row obtained from the data source. Right now, we just apply a join and let the optimizer do its thing. It's simpler, but ... -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: XTS cipher mode for cluster file encryption
On Mon, Nov 1, 2021 at 02:24:36PM -0400, Stephen Frost wrote: > I can understand the general idea that we should be sure to engineer > this in a way that multiple methods can be used, as surely one day folks > will say that AES128 isn't acceptable any more. In terms of what we'll > do from the start, I would think providing the options of AES128 and > AES256 would be good to ensure that we have the bits covered to support > multiple methods and I don't think that would put us into a situation of > having to really explain which to use to users (we don't for pgcrypto > anyway, as an example). I agree that we shouldn't be looking at adding > in a whole new crypto library for this though, that's a large and > independent effort (see the work on NSS happening nearby). Since it has been two weeks since the last activity on this thread, I have updated the TDE wiki to reflect the conclusions and discussions: https://wiki.postgresql.org/wiki/Transparent_Data_Encryption -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: support for MERGE
On 11/12/21 18:57, Alvaro Herrera wrote: Here's a new version. Many of the old complaints have been fixed; particularly, the handling of partitioned tables is now much cleaner and straightforward. Amit Langote helped considerably in getting this part to shape -- thanks for that. Amit also helped correct the EvalPlanQual behavior, which wasn't quite up to snuff. Thanks! There are a few things that can still be improved here. For one, I need to clean up the interactions with table AM (and thus heapam.c etc). Secondarily, and I'm now not sure that I really want to do it, is change the representation for executor: instead of creating a fake join between target and source, perhaps we should have just source, and give optimizer a separate query to fetch tuples from target. When you say you're not sure you want to change this, is that because you don't have time for that, or because you think the current approach is better? What I did do is change how the target table is represented from parse analysis to executor. For example, in the original code, there were two RTEs that represented the target table; that is gone. Now the target table is always just the query's resultRelation. This removes a good number of ugly hacks that had been objected to. I'll park this in the January commitfest now. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: increase size of pg_commit_ts buffers
Hi, I think this is ready to go. I was wondering why it merely doubles the number of buffers, as described by previous comments. That seems like a very tiny increase, so considering how much the hardware grew over the last few years it'd probably fail to help some of the large boxes. But it turns out that's not what the patch does. The change is this > - return Min(16, Max(4, NBuffers / 1024)); > + return Min(256, Max(4, NBuffers / 512)); So it does two things: (a) it increases the maximum from 16 to 256 (so 16x), and (b) it doubles how quickly we get there. Until now we added 1 buffer per 1024 shared buffers, and the maximum would be reached at 128MB of shared buffers. The patch lowers the step to 512, so the maximum is now reached at 1GB. So this actually increases the number of commit_ts buffers 16x, not 2x. That seems reasonable, I guess. The increase may be smaller for systems with less than 1GB of shared buffers, but IMO that's a tiny minority of the production systems busy enough for this patch to make a difference. The other question is of course what overhead this change could have on workloads that do not have issues with commit_ts buffers (i.e. they use commit_ts, but would be fine with just the 16 buffers). But my guess is this is negligible, based on how simple the SLRU code is and my previous experiments with SLRU. So +1 to just get this committed, as it is. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
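For anyone wanting to check the arithmetic, here is a small standalone transcription of the two sizing formulas from the diff above (plain C, not PostgreSQL code). NBuffers is shared_buffers counted in 8kB pages, so 128MB corresponds to 16384 and 1GB to 131072:

```c
#include <stdio.h>

/* Transcribed from the diff: old and new commit_ts SLRU sizing. */
#define Min(a,b) ((a) < (b) ? (a) : (b))
#define Max(a,b) ((a) > (b) ? (a) : (b))

static int
old_commit_ts_buffers(int NBuffers)
{
	return Min(16, Max(4, NBuffers / 1024));
}

static int
new_commit_ts_buffers(int NBuffers)
{
	return Min(256, Max(4, NBuffers / 512));
}
```

With 128MB of shared buffers (16384 pages) the old formula already hits its cap of 16 while the new one gives 32; at 1GB (131072 pages) the new formula reaches its cap of 256 — the 16x increase described above.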
Re: Parallel Append can break run-time partition pruning
On Mon, Nov 02, 2020 at 01:50:57PM +1300, David Rowley wrote: > On Tue, 27 Oct 2020 at 19:40, Amit Langote wrote: > > Some comments: > > Thanks for having a look at this. > > I've made some adjustments to those comments and pushed. Commit a929e17e5 doesn't appear in the v14 release notes, but I wanted to mention that this appears to allow fixing a rowcount mis-estimate for us, which I had reported before: https://www.postgresql.org/message-id/20170326193344.GS31628%40telsasoft.com https://www.postgresql.org/message-id/20170415002322.ga24...@telsasoft.com https://www.postgresql.org/message-id/20170524211730.gm31...@telsasoft.com And others have reported before: https://www.postgresql.org/message-id/flat/7df51702-0f6a-4571-80bb-188aaef26...@gmail.com https://www.postgresql.org/message-id/SG2PR01MB29673BE6F7AA24424FDBFF60BC670%40SG2PR01MB2967.apcprd01.prod.exchangelabs.com For years, our reports have included a generated WHERE clause for each table being queried, to allow each table's partitions to be properly pruned/excluded (not just one table, as happened if we used a single WHERE clause). That worked, but then the planner underestimates the rowcount, since it doesn't realize that the conditions are redundant (because "equality classes" do not handle the inequality conditions). In v14, one WHERE clause per table still gives an underestimate, but now multiple WHERE clauses aren't required, because a single WHERE clause excludes partitions from each table, and the rowcount from the elided partitions is excluded from the Append rowcount at plan time. Thanks for this feature! -- Justin
Re: Logical replication timeout problem
I made a mistake in the configuration of my test script; in fact, I cannot reproduce the problem at the moment. Yes, on the original environment there is physical replication; that's why for the lab I configured 2 nodes with physical replication. I'll try new tests next week. Regards On Fri, Nov 12, 2021 at 7:23 AM Amit Kapila wrote: > On Thu, Nov 11, 2021 at 11:15 PM Fabrice Chapuis > wrote: > > > > Hello, > > Our lab is ready now. Amit, I compile Postgres 10.18 with your > patch.Tang, I used your script to configure logical replication between 2 > databases and to generate 10 million entries in an unreplicated foo table. > On a standalone instance no error message appears in log. > > I activate the physical replication between 2 nodes, and I got following > error: > > > > 2021-11-10 10:49:12.297 CET [12126] LOG: attempt to send keep alive > message > > 2021-11-10 10:49:12.297 CET [12126] STATEMENT: START_REPLICATION > 0/300 TIMELINE 1 > > 2021-11-10 10:49:15.127 CET [12064] FATAL: terminating logical > replication worker due to administrator command > > 2021-11-10 10:49:15.127 CET [12036] LOG: worker process: logical > replication worker for subscription 16413 (PID 12064) exited with exit code > 1 > > 2021-11-10 10:49:15.155 CET [12126] LOG: attempt to send keep alive > message > > > > This message look like strange because no admin command have been > executed during data load. > > I did not find any error related to the timeout. > > The message coming from the modification made with the patch comes back > all the time: attempt to send keep alive message. But there is no "sent > keep alive message". > > > > Why logical replication worker exit when physical replication is > configured? > > > > I am also not sure why that happened may be due to > max_worker_processes reaching its limit. This can happen because it > seems you configured both publisher and subscriber in the same > cluster. Tang, did you also see the same problem?
> > BTW, why are you bringing physical standby configuration into the > test? Does in your original setup where you observe the problem the > physical standbys were there? > > -- > With Regards, > Amit Kapila. >
SKIP LOCKED assert triggered
The combination of these two statements in a transaction hits an Assert in heapam.c at line 4770 on REL_14_STABLE BEGIN; SELECT * FROM queue LIMIT 1 FOR UPDATE SKIP LOCKED; ... UPDATE queue SET status = 'UPDATED' WHERE id = :id; COMMIT; pgbench reliably finds this, running from inside a PL/pgSQL function. Alvaro suggests we just ignore the Assert, which on testing appears to be the right approach. Patch attached, which solves it for me. There is no formal test that does lock then update, so I have attempted to create one, but it has not successfully reproduced the problem (attached anyway); that code is different from the pgbench test. -- Simon Riggs  http://www.EnterpriseDB.com/ skip_locked_assert.v1.patch Description: Binary data skip_locked_then_update_test.v1.patch Description: Binary data
Re: [PATCH] postgres_fdw: suppress explicit casts in text:text comparisons (was: column option to override foreign types)
"Dian M Fay" writes: > Eminently reasonable all around! `git apply` insisted that the v8 patch > didn't (apply, that is), but `patch -p1` liked it fine. I've put it > through a few paces and it seems good; what needs to happen next? I don't see anything else to do except shove it out into the light of day and see what happens. Hence, pushed. As I remarked in the commit message: >> One point that I (tgl) remain slightly uncomfortable with is that we >> will ignore an implicit RelabelType when deciding if the non-Const input >> is a plain Var. That makes it a little squishy to argue that the remote >> should resolve the Const as being of the same type as its Var, because >> then our Const is not the same type as our Var. However, if we don't do >> that, then this hack won't work as desired if the user chooses to use >> varchar rather than text to represent some remote column. That seems >> useful, so do it like this for now. We might have to give up the >> RelabelType-ignoring bit if any problems surface. I think we can await complaints before doing more, but I wanted that bit on record for anyone perusing the archives. regards, tom lane
Re: Commitfest 2021-11 Patch Triage - Part 2
> On 9 Nov 2021, at 18:43, Stephen Frost wrote: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> Daniel Gustafsson writes: >>> 2773: libpq compression >>> === >>> This patch intended to provide libpq connection compression to "replace SSL >>> compression" which was doomed when the patch was written, and have since >>> been >>> removed altogether. The initial approach didn't get much traction but there >>> was significant discussion and work, which has since fizzled out. The patch >>> has been updated but there hasn't been meaningful review the past months, >>> the >>> last comments seem to imply there being a fair amount of questionmarks left >>> in >>> here. Robert, having been very involved in this do you have any thoughts on >>> where we are and where to go (if at all IYO)? >> >> I'm not Robert, but I still have an opinion here, and that it's that this >> feature would at best be an attractive nuisance. If you need compression >> on a database session, it probably means that the connection is over the >> open internet, which means that you need encryption even more. And we >> know that compression and encryption do not play well together. The >> reason compression was taken out of the latest TLS standards is not that >> they wouldn't have liked to have it, nor that applying compression in a >> separate code layer would be any safer. I fear offering this would >> merely lead people to build CVE-worthy setups. > > I've got an opinion on this also and I don't agree that needing > compression means you're on the open internet or that we shouldn't allow > users the choice to decide if they want to use compression and > encryption together or not. Yes, there's potential risks there, but I > don't think those risks would lead to CVEs against PG for supporting a > mode where we allow compression and then also allow encryption- if > that's a problem then it's an issue for the encryption library/method > being used and isn't the fault of us for allowing that combination. 
I'm not so sure: if the parts aren't vulnerable separately on their own, but the combination of them + libpq allows for weakening encryption in a TLS stream, I'd say the CVE is on us. > Further, the general issue with encryption+compression attacks is when > the attacker is able to inject chosen plaintext content into what ends > up being compressed and encrypted, and then is able to observe the > compressed+encrypted traffic, and that's not nearly as easy to do when > you're talking about a database connection vs. a case where an attacker > is able to convince a victim to go to a malicious website and through > that is able to do cross-site requests to inject the known plaintext > into requests to a trusted webpage and see the results coming back. I agree with this; the known weaknesses rely on mechanisms that aren't readily applicable to postgres connections. I don't remember if the patch already does this, but if not we should ensure we only negotiate compression after authentication, like OpenSSH does [0]. -- Daniel Gustafsson https://vmware.com/ [0] https://www.openssh.com/txt/draft-miller-secsh-compression-delayed-00.txt
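To make the delayed-negotiation idea concrete, here is a toy state-machine sketch (not libpq code; the type and function names are invented) in which compression can only be switched on once the connection has authenticated, in the spirit of OpenSSH's delayed-compression mode:

```c
#include <stdbool.h>

/* Toy sketch, not libpq: gate compression on completed authentication. */
typedef enum
{
	CONN_STARTUP,			/* connection established, not yet authenticated */
	CONN_AUTHENTICATED		/* authentication exchange finished */
} ConnState;

typedef struct
{
	ConnState	state;
	bool		compression_enabled;
} Conn;

/*
 * Refuse to turn compression on until the auth exchange is over, so the
 * secret-bearing authentication traffic is never compressed.
 */
static bool
try_enable_compression(Conn *conn)
{
	if (conn->state != CONN_AUTHENTICATED)
		return false;
	conn->compression_enabled = true;
	return true;
}
```

The point of the design is simply ordering: compression negotiation becomes a post-authentication protocol step rather than part of the startup packet.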
Re: Test::More version
On 11/12/21 10:05, Tom Lane wrote: > Andrew Dunstan writes: >> On 11/11/21 15:44, Tom Lane wrote: >>> Yeah ... configure is already checking those versions, so maybe we could >>> make it print them out along the way? I'd been thinking of doing exactly >>> this just for documentation purposes, so if there's a concrete need for >>> it, let's get it done. >> The attached seems to do the trick: > LGTM (by eyeball, didn't test) Thanks. pushed a slightly more robust version. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display
On Thu, Nov 11, 2021 at 6:07 PM Alvaro Herrera wrote: > On 2021-Nov-10, Bossart, Nathan wrote: > > > On 11/10/21, 9:43 AM, "Bharath Rupireddy" < > bharath.rupireddyforpostg...@gmail.com> wrote: > > > As discussed in [1], isn't it a better idea to add some of activity > > > messages [2] such as recovery, archive, backup, streaming etc. to > > > server logs at LOG level? > > > I think this would make the logs far too noisy for many servers. For > > archiving alone, this could cause tens of thousands more log messages > > per hour on a busy system. I think you can already see such > > information at a debug level, anyway. > > Yeah. If we had some sort of ring buffer in which to store these > messages, the user could examine them through a view; they would still > be accessible in a running server, but they would not be written to the > server log. > That's a good idea. How about also adding some GUC(s) to log archive- and recovery-related messages, just like we have for checkpoints, autovacuum, etc.? Maybe something like log_archive, log_recovery. -- With Regards, Ashutosh Sharma.
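As a rough illustration of the ring-buffer idea above (plain standalone C, not PostgreSQL shared-memory code; all names are invented): a fixed number of slots is overwritten in arrival order, and the retained messages can be read back oldest-first, the way a monitoring view would expose them, without ever hitting the server log.

```c
#include <stdio.h>
#include <string.h>

#define RING_SLOTS	4
#define MSG_LEN		64

typedef struct
{
	char	msgs[RING_SLOTS][MSG_LEN];
	int		next;		/* slot that the next message will overwrite */
	int		count;		/* number of valid slots, capped at RING_SLOTS */
} MsgRing;

static void
ring_add(MsgRing *r, const char *msg)
{
	snprintf(r->msgs[r->next], MSG_LEN, "%s", msg);
	r->next = (r->next + 1) % RING_SLOTS;
	if (r->count < RING_SLOTS)
		r->count++;
}

/* Walk the ring oldest-to-newest, the way a view would present it. */
static void
ring_dump(const MsgRing *r)
{
	int		start = (r->count == RING_SLOTS) ? r->next : 0;

	for (int i = 0; i < r->count; i++)
		printf("%s\n", r->msgs[(start + i) % RING_SLOTS]);
}
```

Once the ring fills, each new message silently evicts the oldest one, which bounds memory use no matter how chatty archiving or recovery gets.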
Re: Patch abstracts in the Commitfest app
On 11/12/21 7:51 AM, Daniel Gustafsson wrote: It should be a free-form textfield, and only apply to new submissions to avoid a need to backfill. The field should, IMHO, be mandatory. The implementation can be left to discussion on -www iff this list agrees that it would be a good thing. +1. The patch title is often not enough to tell what the patch does, and patches also change over time. The abstract may also be helpful to the committer when generating a commit message. Regards, -- -David da...@pgmasters.net
Re: Test::More version
Peter Eisentraut writes: > On 11.11.21 21:04, Andrew Dunstan wrote: >> But I'd really like it if we could shift the goalposts a bit and require >> 0.96. subtests would be really nice to have available. Unfortunately I >> don't have any idea what versions of Test::More are actually in use. > My initial patch for TAP tests used subtests, and IIRC that was taken > out because the Perl that comes with CentOS 6 is too old. I retired my RHEL6 installation a year ago, but I still have backups at hand (...digs...) Yeah, its Test/More.pm says $VERSION = '0.92'; so your recollection is accurate. RHEL6 has been EOL for awhile, and that specific platform isn't in the buildfarm, so it needn't constrain our decision. But it may well be that some of the older buildfarm members have similarly ancient Test::More. I await results from Andrew's patch. regards, tom lane
Re: RecoveryInProgress() has critical side effects
On 11/11/21, 9:20 AM, "Robert Haas" wrote: > memory, and not much is lost if they are set up and not used. So, in > 0001, I propose to move the call to InitXLogInsert() from > InitXLOGAccess() to BaseInit(). That means every backend will do it > the first time it starts up. It also means that CreateCheckPoint() > will no longer need a special case call to InitXLogInsert(), which > seems like a good thing. 0001 looks reasonable to me. > connected. However, it does have the effect of introducing a branch > into GetFullPageWritesInfo(), which I'm not 100% sure is cheap enough > not to matter. I think the obvious alternative is to just not > initialize RedoRecPtr and doPageWrites at all and document in the > comments that the first time each backend does something with them > it's going to end up retrying once; perhaps that is preferable. Going Unless there's a demonstrable performance benefit from adding the branch, my preference is to leave it out. I could be off-base, but it seems possible that future changes might end up depending on any side effects from this new branch, which is exactly what you're trying to fix. Plus, always using the retry path is a good way to test that it works as expected. Nathan
Re: Add additional information to src/test/ssl/README
Daniel Gustafsson writes: > Attached is a small addition mentioning PG_TEST_NOCLEAN Maybe could use a bit of copy-editing, say Data directories will also be left behind for analysis when a test fails; they are named according to the test filename. But if the environment variable PG_TEST_NOCLEAN is set, data directories will be retained regardless of test status. regards, tom lane
Re: Add additional information to src/test/ssl/README
> On 31 Oct 2021, at 23:13, Tom Lane wrote: > Done that way; feel free to add more material to perl/README. Attached is a small addition mentioning PG_TEST_NOCLEAN -- Daniel Gustafsson https://vmware.com/ readme_retaindir.diff Description: Binary data
Re: Test::More version
Andrew Dunstan writes: > On 11/11/21 15:44, Tom Lane wrote: >> Yeah ... configure is already checking those versions, so maybe we could >> make it print them out along the way? I'd been thinking of doing exactly >> this just for documentation purposes, so if there's a concrete need for >> it, let's get it done. > The attached seems to do the trick: LGTM (by eyeball, didn't test) regards, tom lane
Re: Patch abstracts in the Commitfest app
Julien Rouhaud writes: > On Fri, Nov 12, 2021 at 8:51 PM Daniel Gustafsson wrote: >> It should be a free-form textfield, and only apply to new submissions to >> avoid >> a need to backfill. The field should, IMHO, be mandatory. The >> implementation >> can be left to discussion on -www iff this list agrees that it would be a >> good >> thing. > +1 for the idea. Not sure if that should be mandatory though, as some > simple entries should be self explanatory. Yeah, there are plenty of entries for which the title seems adequate. No objection to making room for an additional optional description, though. > Really this is mainly > helpful for long and lengthy threads, when people start to ask a > summary of the situation, so it should be editable and maybe > automatically generates an email on the thread when that happens. Editable certainly, but I don't think we want any auto-generated mail. regards, tom lane
Re: Test::More version
On 11/11/21 15:44, Tom Lane wrote: > Alvaro Herrera writes: >> On 2021-Nov-11, Andrew Dunstan wrote: >>> Perhaps we could add a line to one of the TAP tests that would spit out >>> the version on the log? >> Maybe have it spit out the version of *all* the modules we require >> (which aren't all that many), not just Test::More? > Yeah ... configure is already checking those versions, so maybe we could > make it print them out along the way? I'd been thinking of doing exactly > this just for documentation purposes, so if there's a concrete need for > it, let's get it done. > The attached seems to do the trick: checking for perl module IPC::Run 0.79... 20200505.0 checking for perl module Test::More 0.87... 1.302183 checking for perl module Time::HiRes 1.52... 1.9764 > > I have no objection to updating prairiedog to whatever new minimum > version we settle on. But if there are other buildfarm animals with > old versions, that might be a reason not to change it. > > Yeah, let's see what's actually in use and then we can decide. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com diff --git a/config/ax_prog_perl_modules.m4 b/config/ax_prog_perl_modules.m4 index 70b3230ebd..03c25c3ab2 100644 --- a/config/ax_prog_perl_modules.m4 +++ b/config/ax_prog_perl_modules.m4 @@ -55,12 +55,12 @@ if test "x$PERL" != x; then AC_MSG_CHECKING(for perl module $ax_perl_module) # Would be nice to log result here, but can't rely on autoconf internals -$PERL -e "use $ax_perl_module; exit" > /dev/null 2>&1 +modversion=`$PERL -e "use $ax_perl_module; my \\\$x=q($ax_perl_module); \\\$x =~ s/ .*/::VERSION/; eval qq{print $\\\$x\\n}; exit;" 2>/dev/null` if test $? 
-ne 0; then AC_MSG_RESULT(no); ax_perl_modules_failed=1 else - AC_MSG_RESULT(ok); + AC_MSG_RESULT($modversion); fi done diff --git a/configure b/configure index 7d4e42dd62..d0f0f88a1b 100755 --- a/configure +++ b/configure @@ -19301,14 +19301,14 @@ if test "x$PERL" != x; then $as_echo_n "checking for perl module $ax_perl_module... " >&6; } # Would be nice to log result here, but can't rely on autoconf internals -$PERL -e "use $ax_perl_module; exit" > /dev/null 2>&1 +modversion=`$PERL -e "use $ax_perl_module; my \\\$x=q($ax_perl_module); \\\$x =~ s/ .*/::VERSION/; eval qq{print $\\\$x\\n}; exit;" 2>/dev/null` if test $? -ne 0; then { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 $as_echo "no" >&6; }; ax_perl_modules_failed=1 else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: ok" >&5 -$as_echo "ok" >&6; }; + { $as_echo "$as_me:${as_lineno-$LINENO}: result: $modversion" >&5 +$as_echo "$modversion" >&6; }; fi done
Re: Patch abstracts in the Commitfest app
> On Fri, Nov 12, 2021 at 03:36:43PM +0100, Daniel Gustafsson wrote: > > On 12 Nov 2021, at 15:24, Justin Pryzby wrote: > > > > On Fri, Nov 12, 2021 at 01:51:28PM +0100, Daniel Gustafsson wrote: > >> While reading through and working with the hundreds of patches in the CF > >> app a > >> small feature/process request struck me: it would be really helpful if the > >> patch had a brief abstract outlining what it aims to add or fix (or > >> summary, > >> description or something else; not sure what to call it). Basically a > >> two-sentence or so version of the email posting the patch to -hackers. > > > > This seems fine; that purpose is partially served (and duplicated) by the > > patch commit messages (if used). > > That's the problem, many patches are in diff format and don't have commit > messages at all. There are also many entries with patchsets containing > multiple patches and thus commit messages, but the app will only link to one. Perhaps encouraging the use of cover letters, generated for such patch series and linked by the CF app, would be a good idea?
Re: Patch abstracts in the Commitfest app
On 11/12/21, 4:53 AM, "Daniel Gustafsson" wrote: > While reading through and working with the hundreds of patches in the CF app a > small feature/process request struck me: it would be really helpful if the > patch had a brief abstract outlining what it aims to add or fix (or summary, > description or something else; not sure what to call it). Basically a > two-sentence or so version of the email posting the patch to -hackers. +1 Nathan
Re: Should AT TIME ZONE be volatile?
On Fri, Nov 12, 2021 at 8:42 AM Peter Eisentraut wrote: > There are standards for sort order, and the major hiccups we had in the > past were mostly moving from older versions of those standards to newer > versions. So at some point this should stabilize. Only if they don't keep making new versions of the standards. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Patch abstracts in the Commitfest app
> On 12 Nov 2021, at 15:24, Justin Pryzby wrote: > > On Fri, Nov 12, 2021 at 01:51:28PM +0100, Daniel Gustafsson wrote: >> While reading through and working with the hundreds of patches in the CF app >> a >> small feature/process request struck me: it would be really helpful if the >> patch had a brief abstract outlining what it aims to add or fix (or summary, >> description or something else; not sure what to call it). Basically a >> two-sentence or so version of the email posting the patch to -hackers. > > This seems fine; that purpose is partially served (and duplicated) by the > patch commit messages (if used). That's the problem: many patches are in diff format and don't have commit messages at all. There are also many entries with patchsets containing multiple patches and thus commit messages, but the app will only link to one. > Actually, it's already possible to add an "annotation" to a given message-id. > I've just added this here: https://commitfest.postgresql.org/35/2800/ > > I think that feature is rarely used, but I can see the value in using it > more, for long threads. I consider this feature very valuable, but I don't see it as a replacement for an Abstract text field in the patch entry. -- Daniel Gustafsson https://vmware.com/
Re: simplifying foreign key/RI checks
Amit Langote writes: > Whatever mechanism we will use would still need to involve setting a > global Snapshot variable though, right? In v14 we'll certainly still be passing the snapshot(s) to SPI, which will eventually make the snapshot active. With your patch, since we're just handing the snapshot to the scan mechanism, it seems at least theoretically possible that we'd not have to do PushActiveSnapshot on it. Not doing so might be a bad idea however; if there is any user-defined code getting called, it might have expectations about ActiveSnapshot being relevant. On the whole I'd be inclined to say that we still want the RI test_snapshot to be the ActiveSnapshot while performing the test. regards, tom lane
Re: Patch abstracts in the Commitfest app
On Fri, Nov 12, 2021 at 01:51:28PM +0100, Daniel Gustafsson wrote: > While reading through and working with the hundreds of patches in the CF app a > small feature/process request struck me: it would be really helpful if the > patch had a brief abstract outlining what it aims to add or fix (or summary, > description or something else; not sure what to call it). Basically a > two-sentence or so version of the email posting the patch to -hackers. This seems fine; that purpose is partially served (and duplicated) by the patch commit messages (if used). Actually, it's already possible to add an "annotation" to a given message-id. I've just added this here: https://commitfest.postgresql.org/35/2800/ I think that feature is rarely used, but I can see the value in using it more, for long threads. If you really wanted to make an "abstract" not associated with any message-id, I think it would be implemented simply by removing the restriction that a message-id must be specified and must match an existing message in the thread. -- Justin
Re: simplifying foreign key/RI checks
On Fri, Nov 12, 2021 at 10:58 AM Tom Lane wrote: > I wrote: > > Anyway, I think that (1) we should write some more test cases around > > this behavior, (2) you need to establish the snapshot to use in two > > different ways for the RI_FKey_check and ri_Check_Pk_Match cases, > > and (3) something's going to have to be done to repair the behavior > > in v14 (unless we want to back-patch this into v14, which seems a > > bit scary). > > I wrote that thinking that point (2), ie fix the choice of snapshots for > these RI queries, would solve the brokenness in partitioned tables, > so that (3) would potentially only require hacking up v14. > > However after thinking more I realize that (2) will break the desired > behavior for concurrent partition detaches, because that's being driven > off ActiveSnapshot. So we really need a solution that decouples the > partition detachment logic from ActiveSnapshot, in both branches. ISTM that the latest snapshot would still have to be passed to find_inheritance_children_extended() *somehow* by ri_trigger.c. IIUC the problem with using the ActiveSnapshot mechanism to do that is that it causes the SPI query to see user table rows that it shouldn't be able to see, so that is why you say it is too global a mechanism for this hack. Whatever mechanism we will use would still need to involve setting a global Snapshot variable though, right? -- Amit Langote EDB: http://www.enterprisedb.com
Re: Test::More version
On 11.11.21 21:04, Andrew Dunstan wrote:
> But I'd really like it if we could shift the goalposts a bit and require
> 0.96. subtests would be really nice to have available. Unfortunately I
> don't have any idea what versions of Test::More are actually in use.

My initial patch for TAP tests used subtests, and IIRC that was taken out
because the Perl that comes with CentOS 6 is too old.
Re: Should AT TIME ZONE be volatile?
On 11.11.21 18:32, Robert Haas wrote:
> I agree with Tom that it sounds like a lot of work. And to be honest it's
> work that I don't really feel very excited about. It would be necessary to
> understand not only the bona fide sorting rules of every human language
> out there, which might actually be sort of fun at least for a while, but
> also to decide - probably according to some incomprehensible standard -
> how Japanese katakana ought to sort in comparison to, say, box-drawing
> characters, the Mongolian alphabet, and smiley-face emojis. I think it's
> not particularly likely that there are a whole lot of documents out there
> that include all of those things, but the comparison algorithm has to
> return something, and probably there are people who have strong feelings
> about what the right answers are. That's a pretty unappealing thing to
> tackle, and I am not volunteering. On the other hand, if we don't do it,
> I'm suspicious that things will never get any better. And that would be
> sad.

There are standards for sort order, and the major hiccups we had in the past
were mostly moving from older versions of those standards to newer versions.
So at some point this should stabilize.
Re: Parallel vacuum workers prevent the oldest xmin from advancing
On 2021-Nov-11, Masahiko Sawada wrote:
> On Thu, Nov 11, 2021 at 12:53 PM Amit Kapila wrote:
> > On Thu, Nov 11, 2021 at 9:11 AM Andres Freund wrote:
> > > This seems like an unnecessary optimization.
> > > ProcArrayInstallRestoredXmin()
> > > only happens in the context of much more expensive operations.
> >
> > Fair point. I think that will also make the change in
> > ProcArrayInstallRestoredXmin() appear neat.
>
> This makes me think that it'd be better to copy status flags in a
> separate function rather than ProcArrayInstallRestoredXmin().

To me, and this is perhaps just personal opinion, it seems conceptually
simpler to have ProcArrayInstallRestoredXmin acquire the lock exclusively
and do both things. Why? Because if you have two functions, you have to be
careful not to call the new function after ProcArrayInstallRestoredXmin;
otherwise you would create an instant during which you make an
Xmin-without-flag visible to other procs, which causes the computed xmin to
go backwards, which is verboten.

If I understand Amit correctly, his point is about the callers of
RestoreTransactionSnapshot, which are two: CreateReplicationSlot and
ParallelWorkerMain. He wants your hypothetical new function called from the
latter but not the former. Looking at both, it seems a bit strange to make
them responsible for a detail such as "copy ->statusFlags from source proc
to mine". It seems more reasonable to add a third parameter,
RestoreTransactionSnapshot(Snapshot snapshot, void *source_proc, bool
is_vacuum), and if that is true, tell SetTransactionSnapshot to copy the
flags, otherwise not.

... unrelated to this, and looking at CreateReplicationSlot, I wonder why it
passes MyProc as the source_pgproc parameter. What good is that doing? I
mean, if the only thing we do with source_pgproc is to copy stuff from
source_pgproc to MyProc, then if source_pgproc is MyProc, we're effectively
doing nothing at all. (You can't "fix" this by merely passing NULL, because
what that would do is change the call of ProcArrayInstallRestoredXmin into a
call of ProcArrayInstallImportedXmin, and that would presumably have
different behavior.) I may be misreading the code, of course, but it sounds
like the intention of CreateReplicationSlot is to "do nothing" with the
transaction snapshot in a complicated way.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
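[Editor's sketch: the single-function shape argued for above can be shown
with mock types. This is NOT PostgreSQL code — MockProc, the field sizes,
and install_restored_xmin are all invented stand-ins for PGPROC and
ProcArrayInstallRestoredXmin; the point is only that xmin and statusFlags
are written inside one critical section, gated by the proposed flag.]

```c
#include <stdint.h>

/* Mock of the two PGPROC fields involved -- not the real PostgreSQL struct. */
typedef struct MockProc
{
    uint32_t xmin;        /* stand-in for TransactionId */
    uint8_t  statusFlags; /* stand-in for a PROC_IN_VACUUM-style flag byte */
} MockProc;

/*
 * Sketch of the suggested shape: one function installs the restored xmin
 * and, only when the caller asks for it (e.g. a parallel vacuum worker),
 * copies the source proc's status flags in the same critical section, so
 * no other backend ever observes the xmin without its flags -- which is
 * what would let the computed xmin go backwards.
 */
static void
install_restored_xmin(MockProc *dst, const MockProc *src, int copy_flags)
{
    /* real code would do LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE) here */
    dst->xmin = src->xmin;
    if (copy_flags)
        dst->statusFlags = src->statusFlags;
    /* ... and LWLockRelease(ProcArrayLock) here */
}
```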
Re: Patch abstracts in the Commitfest app
On Fri, Nov 12, 2021 at 8:51 PM Daniel Gustafsson wrote:
>
> While reading through and working with the hundreds of patches in the CF app a
> small feature/process request struck me: it would be really helpful if the
> patch had a brief abstract outlining what it aims to add or fix (or summary,
> description or something else; not sure what to call it). Basically a
> two-sentence or so version of the email posting the patch to -hackers.
>
> The "title" field isn't always explanatory enough, and diving into a long
> thread to piece together clues can be quite daunting. I'm especially thinking
> about new contributors who would like to contribute but who have a hard time
> finding something in the CF app (which is a complaint I've heard from multiple
> such contributors).
>
> It should be a free-form textfield, and only apply to new submissions to avoid
> a need to backfill. The field should, IMHO, be mandatory. The implementation
> can be left to discussion on -www iff this list agrees that it would be a good
> thing.

+1 for the idea. Not sure that it should be mandatory though, as some simple
entries should be self-explanatory. Really this is mainly helpful for long
threads, when people start to ask for a summary of the situation, so it
should be editable and maybe automatically generate an email on the thread
when that happens.
Patch abstracts in the Commitfest app
While reading through and working with the hundreds of patches in the CF app
a small feature/process request struck me: it would be really helpful if the
patch had a brief abstract outlining what it aims to add or fix (or summary,
description or something else; not sure what to call it). Basically a
two-sentence or so version of the email posting the patch to -hackers.

The "title" field isn't always explanatory enough, and diving into a long
thread to piece together clues can be quite daunting. I'm especially
thinking about new contributors who would like to contribute but who have a
hard time finding something in the CF app (which is a complaint I've heard
from multiple such contributors).

It should be a free-form text field, and only apply to new submissions to
avoid a need to backfill. The field should, IMHO, be mandatory. The
implementation can be left to discussion on -www iff this list agrees that
it would be a good thing.

Thoughts?

--
Daniel Gustafsson
https://vmware.com/
Re: Logical Replication - improve error message while adding tables to the publication in check_publication_add_relation
> On 4 Nov 2021, at 05:24, Bharath Rupireddy wrote:
>
> On Wed, Nov 3, 2021 at 6:21 PM Daniel Gustafsson wrote:
>>
>>> On 26 Jul 2021, at 09:33, Bharath Rupireddy wrote:
>>
>>> PSA v7.
>>
>> This patch no longer applies on top of HEAD, please submit a rebased version.
>
> Here's a rebased v8 patch. Please review it.

This improves the user experience by increasing the granularity of error
reporting, and maps well with the precedent set in 81d5995b4. I'm marking
this Ready for Committer and will go ahead and apply this unless there are
objections.

--
Daniel Gustafsson
https://vmware.com/
Re: Printing backtrace of postgres processes
On Fri, Nov 12, 2021 at 5:15 PM Bharath Rupireddy wrote:
> On Thu, Nov 11, 2021 at 12:14 PM vignesh C wrote:
> > Thanks for the comments, the attached v10 patch has the fixes for the same.
>
> Thanks for the patches. Here are some comments:
>
> 1) In the docs, let's have a similar description for pg_print_backtrace as
> for pg_log_backend_memory_contexts, for continuity and readability.
>
> 2) I don't know how the part looks in the Server Signaling Functions
> table. I think here you can just say that it will emit a warning and
> return false if not supported by the installation. And you can give the
> example after the description where you show a sample backtrace.
>
> +capture backtrace. If not available, the function will return false
> +and a warning is issued, for example:
> +
> +WARNING: backtrace generation is not supported by this installation
> + pg_print_backtrace
> +
> + f
>
> 3) Replace '!' with '.'.
> + * Note: this is called within a signal handler! All we can do is set
>
> 4) It is not only the next CFI but also the process-specific interrupt
> handlers (in your 0002 patch), right?
> + * a flag that will cause the next CHECK_FOR_INTERRUPTS to invoke
>
> 5) I think you need to update CATALOG_VERSION_NO; mostly the committer
> will take care of it, but just in case.
>
> 6) Be consistent with casing: "Verify" vs. "might".
> +# Verify that log output gets to the file
> +# might need to retry if logging collector process is slow...

Few more comments:

7) Do we need TAP tests for this function? I think it is sufficient to test
the function in misc_functions.sql; please remove
002_print_backtrace_validation.pl. Note that we don't do similar TAP testing
for pg_log_backend_memory_contexts either.

8) I hope you have manually tested pg_print_backtrace for the startup
process and other auxiliary processes.

9) I think we can have a similar description (to the patch [1]) with links
to each process definition in the glossary, so that it will be easier for
users to follow the links and know what each process is. Especially, I
didn't like that the 0002 patch mentioned BackendPidGetProc or
AuxiliaryPidGetProc, as these are internal to the server and users don't
care about them.

- * end on its own first and its backtrace are not logged is not a problem.
+ * BackendPidGetProc or AuxiliaryPidGetProc returns NULL if the pid isn't
+ * valid; but by the time we reach kill(), a process for which we get a
+ * valid proc here might have terminated on its own. There's no way to
+ * acquire a lock on an arbitrary process to prevent that. But since this
+ * mechanism is usually used to debug a backend running and consuming lots
+ * of CPU cycles, that it might end on its own first and its backtrace are
+ * not logged is not a problem.
 */

Here's what I have written in the other patch [1]; if okay, please use this:

+Requests to log memory contexts of the backend or the
+WAL sender or
+the auxiliary process
+with the specified process ID. All of the
+auxiliary processes
+are supported except the logger
+and the statistics collector,
+as they are not connected to shared memory so the function cannot make
+requests to them.
+These memory contexts will be logged at LOG message level.
+They will appear in the server log based on the log configuration set
+(See for more information),

I have no further comments on v10.

[1] - https://www.postgresql.org/message-id/CALj2ACU1nBzpacOK2q%3Da65S_4%2BOaz_rLTsU1Ri0gf7YUmnmhfQ%40mail.gmail.com

Regards,
Bharath Rupireddy.
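[Editor's sketch: comments 3) and 4) above refer to the classic
"signal handler only sets a flag; the real work happens at the next safe
poll point" pattern that pg_print_backtrace relies on. A minimal standalone
C illustration of that pattern — this is not PostgreSQL code;
handle_print_backtrace and check_for_interrupts are invented stand-ins for
the server's signal handler and CHECK_FOR_INTERRUPTS():]

```c
#include <signal.h>
#include <stdio.h>

/* Only async-signal-safe work in a handler: set a sig_atomic_t flag. */
static volatile sig_atomic_t backtrace_requested = 0;

static void
handle_print_backtrace(int signo)
{
    (void) signo;
    backtrace_requested = 1;    /* defer all real work */
}

/*
 * A poll point akin to CHECK_FOR_INTERRUPTS(): called from the normal
 * execution path, where it is safe to allocate memory, take locks, and
 * actually emit the backtrace.  Returns 1 if a pending request was
 * serviced, 0 otherwise.
 */
static int
check_for_interrupts(void)
{
    if (backtrace_requested)
    {
        backtrace_requested = 0;
        puts("backtrace would be emitted here");
        return 1;
    }
    return 0;
}
```

Comment 4) in the review points out that, for auxiliary processes, this poll
point is not only CHECK_FOR_INTERRUPTS() but each process's own interrupt
handler loop.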
Re: Printing backtrace of postgres processes
On Thu, Nov 11, 2021 at 12:14 PM vignesh C wrote:
> Thanks for the comments, the attached v10 patch has the fixes for the same.

Thanks for the patches. Here are some comments:

1) In the docs, let's have a similar description for pg_print_backtrace as
for pg_log_backend_memory_contexts, for continuity and readability.

2) I don't know how the part looks in the Server Signaling Functions table.
I think here you can just say that it will emit a warning and return false
if not supported by the installation. And you can give the example after
the description where you show a sample backtrace.

+capture backtrace. If not available, the function will return false
+and a warning is issued, for example:
+
+WARNING: backtrace generation is not supported by this installation
+ pg_print_backtrace
+
+ f

3) Replace '!' with '.'.
+ * Note: this is called within a signal handler! All we can do is set

4) It is not only the next CFI but also the process-specific interrupt
handlers (in your 0002 patch), right?
+ * a flag that will cause the next CHECK_FOR_INTERRUPTS to invoke

5) I think you need to update CATALOG_VERSION_NO; mostly the committer will
take care of it, but just in case.

6) Be consistent with casing: "Verify" vs. "might".
+# Verify that log output gets to the file
+# might need to retry if logging collector process is slow...

Regards,
Bharath Rupireddy.
Re: Allow users to choose what happens when recovery target is not reached
On Fri, Nov 12, 2021 at 6:14 PM Bharath Rupireddy wrote:
>
> Currently, the server shuts down with a FATAL error (added by commit
> [1]) when the recovery target isn't reached. This can cause a server
> availability problem, especially in case of disaster recovery (geo
> restores) where the primary was down and the user is doing a PITR on a
> server lying in another region where it had missed to receive few of
> the last WAL files required to reach the recovery target. In this
> case, users might want the server to be available rather than a no
> server. With the commit [1], there's no way to achieve what users
> wanted.

If users don't mind whether the recovery target is reached, isn't it better
to simply not specify a target and let the recovery go as far as possible?
Re: add retry mechanism for achieving recovery target before emitting FATA error "recovery ended before configured recovery target was reached"
On Sat, Oct 23, 2021 at 1:46 AM Jeff Davis wrote:
> What do you want to do after the timeout happens? If you want to issue
> a WARNING instead of failing outright, perhaps that makes sense for
> exploratory PITR cases. That could be a simple boolean GUC without
> needing to introduce the timeout logic into the server.

Thanks Jeff. I posted the patch in a separate thread [1] for the new GUC
(WARN + promotion, or shutdown with FATAL error) in case the recovery target
isn't reached.

[1] - https://www.postgresql.org/message-id/flat/CALj2ACWR4iaph7AWCr5-V9dXqpf2p5B%3D3fTyvLfL8VD_E%2Bx0tA%40mail.gmail.com

Regards,
Bharath Rupireddy.
Allow users to choose what happens when recovery target is not reached
Hi,

Currently, the server shuts down with a FATAL error (added by commit [1])
when the recovery target isn't reached. This can cause a server availability
problem, especially in the case of disaster recovery (geo-restores) where
the primary was down and the user is doing a PITR on a server in another
region that missed receiving a few of the last WAL files required to reach
the recovery target. In this case, users might want the server to be
available rather than have no server at all. With commit [1], there's no way
to achieve what such users want.

There can be many reasons for the last few WAL files not reaching the target
server where the user is performing the PITR: the primary may have gone down
before archiving the last few WAL files to the archive locations, the
archive command may fail for whatever reason, network latency from the
primary to the archive location or from the archive location to the target
server, the recovery command on the target server may fail, or users may
have chosen wrong/futuristic recovery targets, etc. If the PITR fails with a
FATAL error, we can ask users to restart the server, but imagine the waste
of compute resources: if there is 1 TB of WAL to replay and just the last
16 MB WAL file is missing, everything has to be replayed from the beginning.

Here's a proposal (and a patch) to add a GUC so that users can choose either
to emit a warning and promote, or to shut down with a FATAL error (the
default), when the recovery target isn't reached. In practice, users can
choose to shut down with a FATAL error if strict consistency is a necessity;
otherwise they can choose to be promoted if availability is preferred. There
is some discussion around this idea in [2].

Thoughts?
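[Editor's sketch: the proposal above could end up looking something like the
fragment below in postgresql.conf. The GUC name and its values are invented
here purely for illustration — this is not an existing PostgreSQL setting;
the patch defines the actual name and semantics.]

```
# Hypothetical setting sketched from the proposal above -- not a real GUC.
# 'shutdown' keeps today's behavior (FATAL error when the recovery target
# isn't reached); 'promote' would emit a WARNING and promote instead.
recovery_target_not_reached_action = 'shutdown'
```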
[1] - commit dc788668bb269b10a108e87d14fefd1b9301b793
Author: Peter Eisentraut
Date: Wed Jan 29 15:43:32 2020 +0100

    Fail if recovery target is not reached

    Before, if a recovery target is configured, but the archive ended
    before the target was reached, recovery would end and the server would
    promote without further notice. That was deemed to be pretty wrong.
    With this change, if the recovery target is not reached, it is a fatal
    error.

    Based-on-patch-by: Leif Gunnar Erlandsen
    Reviewed-by: Kyotaro Horiguchi
    Discussion: https://www.postgresql.org/message-id/flat/993736dd3f1713ec1f63fc3b65383...@lako.no

[2] - https://www.postgresql.org/message-id/b334d61396e6b0657a63dc38e16d429703fe9b96.camel%40j-davis.com

Regards,
Bharath Rupireddy.

v1-0001-Allow-users-to-choose-what-happens-when-recovery-.patch
Description: Binary data
RE: Logical replication timeout problem
On Friday, November 12, 2021 2:24 PM Amit Kapila wrote:
> On Thu, Nov 11, 2021 at 11:15 PM Fabrice Chapuis wrote:
> >
> > Hello,
> > Our lab is ready now. Amit, I compiled Postgres 10.18 with your patch.
> > Tang, I used your script to configure logical replication between 2
> > databases and to generate 10 million entries in an unreplicated foo
> > table. On a standalone instance no error message appears in the log.
> >
> > I activated the physical replication between 2 nodes, and I got the
> > following error:
> >
> > 2021-11-10 10:49:12.297 CET [12126] LOG: attempt to send keep alive message
> > 2021-11-10 10:49:12.297 CET [12126] STATEMENT: START_REPLICATION 0/300 TIMELINE 1
> > 2021-11-10 10:49:15.127 CET [12064] FATAL: terminating logical replication worker due to administrator command
> > 2021-11-10 10:49:15.127 CET [12036] LOG: worker process: logical replication worker for subscription 16413 (PID 12064) exited with exit code 1
> > 2021-11-10 10:49:15.155 CET [12126] LOG: attempt to send keep alive message
> >
> > This message looks strange because no admin command was executed during
> > the data load. I did not find any error related to the timeout. The
> > message coming from the modification made with the patch comes back all
> > the time: "attempt to send keep alive message". But there is no "sent
> > keep alive message".
> >
> > Why does the logical replication worker exit when physical replication
> > is configured?
>
> I am also not sure why that happened; maybe it's due to
> max_worker_processes reaching its limit. This can happen because it
> seems you configured both publisher and subscriber in the same
> cluster. Tang, did you also see the same problem?

No. I used the default max_worker_processes value, and ran logical
replication and physical replication at the same time. I also changed the
data in a table on the publisher. But I didn't see the same problem.

Regards,
Tang