Re: GUC flags
On Tue, Nov 30, 2021 at 03:36:45PM +0900, Michael Paquier wrote:
> -  gettext_noop("Forces a switch to the next WAL file if a "
> -               "new file has not been started within N seconds."),
> +  gettext_noop("Sets the amount of time to wait before forcing a "
> +               "switch to the next WAL file."),
..
> -  gettext_noop("Waits N seconds on connection startup after authentication."),
> +  gettext_noop("Sets the amount of seconds to wait on connection "
> +               "startup after authentication."),

"amount of time", like above.

> -  gettext_noop("Waits N seconds on connection startup before authentication."),
> +  gettext_noop("Sets the amount of seconds to wait on connection "
> +               "startup before authentication."),

same

>  {
>   {"checkpoint_warning", PGC_SIGHUP, WAL_CHECKPOINTS,
> -  gettext_noop("Enables warnings if checkpoint segments are filled more "
> -               "frequently than this."),
> +  gettext_noop("Sets the maximum time before warning if checkpoints "
> +               "triggered by WAL volume happen too frequently."),

>   gettext_noop("Write a message to the server log if checkpoints "
> -               "caused by the filling of checkpoint segment files happens more "
> +               "caused by the filling of WAL segment files happens more "

It should say "happen", since it's referring to "checkpoints".  That was a
pre-existing issue.

>   {"log_parameter_max_length", PGC_SUSET, LOGGING_WHAT,
> -  gettext_noop("When logging statements, limit logged parameter values to first N bytes."),
> +  gettext_noop("Sets the maximum amount of data logged for bind "
> +               "parameter values when logging statements."),

I think this one should actually say "in bytes", or at least say "maximum
length".  It seems unlikely that someone is going to specify this in other
units, and it's confusing to everyone else to refer to "amount of data"
instead of "length in bytes".

>   {"log_parameter_max_length_on_error", PGC_USERSET, LOGGING_WHAT,
> -  gettext_noop("When reporting an error, limit logged parameter values to first N bytes."),
> +  gettext_noop("Sets the maximum amount of data logged for bind "
> +               "parameter values when logging statements, on error."),

same

> -  gettext_noop("Automatic log file rotation will occur after N minutes."),
> +  gettext_noop("Sets the maximum amount of time to wait before "
> +               "forcing log file rotation."),

Should it say "maximum"?  Does that mean anything?

--
Justin
Re: fix a typo in slotfuncs.c
On Wed, Dec 01, 2021 at 12:22:30PM +0530, Bharath Rupireddy wrote:
> It seems like there's a following typo in code comments:
> -  /* determine how many segments slots can be kept by slots */
> +  /* determine how many segments can be kept by slots */
>
> Attaching a tiny patch to fix it. This typo exists all the way until PG 13.

Indeed, thanks.  I'll fix in a bit.

--
Michael
Re: pg_upgrade test for binary compatibility of core data types
On Thu, Nov 18, 2021 at 03:58:18PM +0900, Michael Paquier wrote:
> +  ver >= 905 AND ver <= 1300 AS oldpgversion_95_13,
> +  ver >= 906 AND ver <= 1300 AS oldpgversion_96_13,
> +  ver >= 906 AND ver <= 1000 AS oldpgversion_96_10,
> So here, we have the choice between conditions that play with version
> ranges or we could make those checks simpler but compensate with a set
> of IF EXISTS queries.  I think that your choice is right.  The
> buildfarm mixes both styles to compensate with the cases where the
> objects are created after a drop.

So, I have come back to this part of the patch set, which moves the SQL
queries doing the pre-upgrade cleanups of the old version we upgrade from,
and decided to go with what looks like the simplest approach, relying on a
set of IF EXISTS clauses for the object types that may not exist in some
cases.  While checking the whole thing, I have noticed that some of the
operations were not really necessary.  The result is rather clean now,
with a linear organization of the version logic, so it is a no-brainer to
get that done in back-branches per the backward-compatibility argument.
I'll get that done down to 10 to maximize its influence, then I'll move on
with the buildfarm code and send a patch to plug this in and reduce the
dependencies between core and the buildfarm code.

--
Michael

From 9bfe54d8867c9c05a36976f01ed65e5b8da442f7 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Wed, 1 Dec 2021 16:08:01 +0900
Subject: [PATCH] Move SQLs for cleanups before cross-version upgrades into a
 new file

The plan is to make the buildfarm code re-use this code, as test.sh held
duplicated logic for this work.  Separating those SQLs into a new file,
with a set of \if clauses to do version checks against the old version
being upgraded, will allow the buildfarm to reuse that.  An extra
simplification is that committers will be able to control the objects
cleaned up without any need to tweak the buildfarm code, at least for the
main regression test suite.

Backpatch down to 10, to maximize its effects.
---
 src/bin/pg_upgrade/test.sh           | 52 ++--
 src/bin/pg_upgrade/upgrade_adapt.sql | 92
 2 files changed, 97 insertions(+), 47 deletions(-)
 create mode 100644 src/bin/pg_upgrade/upgrade_adapt.sql

diff --git a/src/bin/pg_upgrade/test.sh b/src/bin/pg_upgrade/test.sh
index 8593488907..54c02bc65b 100644
--- a/src/bin/pg_upgrade/test.sh
+++ b/src/bin/pg_upgrade/test.sh
@@ -181,53 +181,11 @@ if "$MAKE" -C "$oldsrc" installcheck-parallel; then
 	# Before dumping, tweak the database of the old instance depending
 	# on its version.
 	if [ "$newsrc" != "$oldsrc" ]; then
-		fix_sql=""
-		# Get rid of objects not feasible in later versions
-		case $oldpgversion in
-			804??)
-				fix_sql="DROP FUNCTION public.myfunc(integer);"
-				;;
-		esac
-
-		# Last appeared in v9.6
-		if [ $oldpgversion -lt 10 ]; then
-			fix_sql="$fix_sql
-				DROP FUNCTION IF EXISTS
-					public.oldstyle_length(integer, text);"
-		fi
-		# Last appeared in v13
-		if [ $oldpgversion -lt 14 ]; then
-			fix_sql="$fix_sql
-				DROP FUNCTION IF EXISTS
-					public.putenv(text);    -- last in v13
-				DROP OPERATOR IF EXISTS   -- last in v13
-					public.#@# (pg_catalog.int8, NONE),
-					public.#%# (pg_catalog.int8, NONE),
-					public.!=- (pg_catalog.int8, NONE),
-					public.#@%# (pg_catalog.int8, NONE);"
-		fi
-		psql -X -d regression -c "$fix_sql;" || psql_fix_sql_status=$?
-
-		# WITH OIDS is not supported anymore in v12, so remove support
-		# for any relations marked as such.
-		if [ $oldpgversion -lt 12 ]; then
-			fix_sql="DO \$stmt\$
-				DECLARE
-					rec text;
-				BEGIN
-				FOR rec in
-					SELECT oid::regclass::text
-					FROM pg_class
-					WHERE relname !~ '^pg_'
-						AND relhasoids
-						AND relkind in ('r','m')
-					ORDER BY 1
-				LOOP
-					execute 'ALTER TABLE ' || rec || ' SET WITHOUT OIDS';
-				END LOOP;
-				END; \$stmt\$;"
-			psql -X -d regression -c "$fix_sql;" || psql_fix_sql_status=$?
-		fi
+		# This SQL script has its own idea of the cleanup that needs to be
+		# done and embeds version checks.  Note that this uses the script
+		# stored on the new branch.
+		psql -X -d regression -f "$newsrc/src/bin/pg_upgrade/upgrade_adapt.sql" \
+			|| psql_fix_sql_status=$?

 		# Handling of --extra-float-digits gets messy after v12.
 		# Note that this changes the dumps from the old and new
diff --git a/src/bin/pg_upgrade/upgrade_adapt.sql b/src/bin/pg_upgrade/upgrade_adapt.sql
new file mode 100644
index 00..175b2ebe2e
--- /dev/null
+++ b/src/bin/pg_upgrade/upgrade_adapt.sql
@@ -0,0 +1,92 @@
+--
+-- SQL queries for upgrade tests across different major versions.
+--
+-- This file includes a set of SQL queries to make a cluster to-be-upgraded
+-- compatible with the version this file is based on.  Note that this
+-- requires psql, as per-version queries are controlled with a set of \if
+-- clauses.
+
+-- This script is backward-compatible, so it is
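The old test.sh logic above gates each cleanup on the old server's version number. As an illustrative sketch (not PostgreSQL code), the same gating can be expressed as a function from version to cleanup statements; the 3-4 digit version convention (905 for 9.5, 1300 for 13) follows the quoted `oldpgversion_*` flags, and the exact drop targets are taken from the diff above:

```python
# Hypothetical sketch mirroring the version-gated cleanup in test.sh /
# upgrade_adapt.sql.  Versions use the 905 (= 9.5) / 1300 (= 13) style
# from the quoted query; thresholds are assumptions for illustration.
def cleanup_sql_for(old_version: int) -> list[str]:
    """Return the pre-upgrade cleanup statements for an old cluster."""
    stmts = []
    if old_version < 1000:   # function last appeared in v9.6
        stmts.append("DROP FUNCTION IF EXISTS public.oldstyle_length(integer, text);")
    if old_version < 1400:   # these objects last appeared in v13
        stmts.append("DROP FUNCTION IF EXISTS public.putenv(text);")
    if old_version < 1200:   # WITH OIDS support removed in v12
        stmts.append("-- ALTER TABLE ... SET WITHOUT OIDS for each relation")
    return stmts
```

For example, a v13 cluster only needs the `putenv` drop, while a v14+ cluster needs nothing, which is why a single linear script with version checks stays readable.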
Re: Commitfest 2021-11 Patch Triage - Part 1
On Mon, 29 Nov 2021 09:03:06 -0300 Marcos Pegoraro wrote:
> > > > 2138: Incremental Materialized View Maintenance
> >
> > I've responded to it in the following thread, and described the recent
> > discussions, current status, summary of the IVM feature and design, and
> > past discussions.
>
> IVM is a wonderful feature, but some features were omitted because of its
> complexity, so maybe IVM will need more time to completely solve what it
> wants to do.
> I did not understand, and didn't find in the docs: if a materialized view
> is a table, why can't I change a specific record?
> Because if I had a way to create and refresh the entire view and also
> update a single record, it would give me all the power IVM is trying to
> provide.

I think the reason why we can't update a materialized view directly is
that it is basically a "view", and it should not contain any data
irrelevant to its definition and underlying tables.  If we had a feature
to update a materialized view directly, maybe it should behave as an
updatable view, like normal (virtual) views do, although I am not sure.

--
Yugo NAGATA
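The invariant being described can be shown with a toy model (this is illustrative Python, not the IVM patch itself): a "materialized view" of per-group counts is maintained incrementally as base rows arrive, and must always equal a full recomputation. An arbitrary direct update to the view would break exactly that invariant.

```python
# Toy model: view = SELECT key, count(*) FROM base GROUP BY key,
# maintained incrementally.  Direct updates to `view` would make it
# disagree with recompute(), which is why a matview isn't just a table.
from collections import Counter

base = []            # base table: a list of group keys
view = Counter()     # "materialized view" of counts per key

def insert(key):
    base.append(key)
    view[key] += 1           # incremental maintenance: apply only the delta

def recompute():
    return Counter(base)     # the full-REFRESH equivalent

insert("a"); insert("a"); insert("b")
assert view == recompute()   # invariant holds after each incremental step
```

The point of IVM is to keep that invariant while applying only deltas, instead of recomputing the whole view on every refresh.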
Re: pg_replslotdata - a tool for displaying replication slot information
On Tue, Nov 30, 2021 at 9:47 PM Bharath Rupireddy
<bharath.rupireddyforpostg...@gmail.com> wrote:
> On Wed, Dec 1, 2021 at 12:13 AM Bossart, Nathan wrote:
> >
> > On 11/30/21, 6:14 AM, "Peter Eisentraut"
> > <peter.eisentr...@enterprisedb.com> wrote:
> > > On 23.11.21 06:09, Bharath Rupireddy wrote:
> > >> The replication slots data is stored in binary format on the disk under
> > >> the pg_replslot/<> directory which isn't human readable. If the server
> > >> is crashed/down (for whatever reasons) and unable to come up, currently
> > >> there's no way for the user/admin/developer to know what were all the
> > >> replication slots available at the time of server crash/down to figure
> > >> out what's the restart lsn, xid, two phase info or types of slots etc.
> > >
> > > What do you need that for? You can't do anything with a replication
> > > slot while the server is down.
> >
> > One use-case might be to discover the value you need to set for
> > max_replication_slots, although it's pretty trivial to discover the
> > number of replication slots by looking at the folder directly.
>
> Apart from the above use-case, one can do some exploratory analysis on
> the replication slot information after a server crash; this may be
> useful for RCA or debugging purposes, for instance:
> 1) to look at the restart_lsn of the slots to get to know why there
> were many WAL files filled up on the disk (because of the restart_lsn
> being low)

In a disk-full scenario caused by WAL, this tool comes in handy for
identifying which WAL files can be deleted to free up space, and also
helps assess accidental deletion of WAL files.  I am not sure if there is
a tool to help clean up WAL (maybe invoking the archive_command too?)
without impacting physical/logical slots and while respecting the last
checkpoint location, but if one exists, that would be handy.
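The "which WAL files are still needed" question above boils down to mapping each slot's restart_lsn to a WAL segment and keeping everything from the oldest such segment onward. A minimal sketch, assuming the default 16 MB segment size and a simplified segment-number model (not a parser for real pg_wal file names):

```python
# Illustrative sketch only: segment numbering here is simply
# lsn // segment_size, not the real XLogFileName() naming scheme.
WAL_SEG_SIZE = 16 * 1024 * 1024

def oldest_segment_to_keep(slot_restart_lsns):
    """Segment number of the oldest WAL segment any slot still needs."""
    return min(lsn // WAL_SEG_SIZE for lsn in slot_restart_lsns)

# Hypothetical slots with restart_lsn values expressed in bytes.
slots = {"standby1": 5 * WAL_SEG_SIZE + 100, "decoder": 2 * WAL_SEG_SIZE}
keep_from = oldest_segment_to_keep(slots.values())
removable = [seg for seg in range(10) if seg < keep_from]
```

With these numbers, segments 0 and 1 would be removable while segment 2 onward must be kept for the logical-decoding slot, which is the kind of judgment an offline slot-inspection tool would support.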
Is there any documentation on how to correctly create extensions in HA(primary-standby) setup?
Hi,

I recently did a small experiment to see how one can create extensions
properly in an HA (primary-standby) setup. Here are my findings:

1) ALTER SYSTEM SET or GUC (configuration parameter) settings are not
replicated to the standby.
2) CREATE EXTENSION statements are replicated to the standby.
3) If the extension doesn't need to be set up in the
shared_preload_libraries GUC, there is no need to create the extension on
the standby; it just works.
4) If the extension needs to be set up in the shared_preload_libraries
GUC, the correct way to install the extension on both primary and standby
is:
  a) set the shared_preload_libraries GUC on the primary, reload the conf,
     and restart the primary to make the GUC effective.
  b) set the shared_preload_libraries GUC on the standby, and restart the
     standby to make the GUC effective.
  c) create the extension on the primary (we don't need to create the
     extension on the standby, as the CREATE EXTENSION statements are
     replicated).
  d) verify that the extension functions work on both primary and standby.
5) Extensions which perform writes to the database may not work on the
standby, as write transactions are not allowed on the standby. However,
CREATE EXTENSION on the standby works just fine; it is the functions the
extension provides that may not work.

I think I was successful in my experiment; please let me know if anything
is wrong in what I did. Do we have documentation on how to create
extensions correctly in an HA setup? If what I did is correct and we don't
have it documented, can we have it somewhere in the existing HA-related
documentation?

Regards,
Bharath Rupireddy.
Re: pg_get_publication_tables() output duplicate relid
On Mon, Nov 29, 2021 at 2:37 PM houzj.f...@fujitsu.com wrote: > > On Wed, Nov 24, 2021 4:48 PM Amit Kapila > > On Mon, Nov 22, 2021 at 12:55 PM Amit Langote > > wrote: > > > > > > On Fri, Nov 19, 2021 at 2:28 PM Amit Kapila > > wrote: > > > > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote > > wrote: > > > > > As in, > > > > > do we know of any replication (initial/streaming) misbehavior > > > > > caused by the duplicate partition OIDs in this case or is the only > > > > > problem that pg_publication_tables output looks odd? > > > > > > > > The latter one but I think either we should document this or change > > > > it as we can't assume users will follow what subscriber-side code does. > > > > > > On second thought, I agree that de-duplicating partitions from this > > > view is an improvement. > > > > > > > Fair enough. Hou-San, Can you please submit the updated patch after fixing > > any pending comments including the test case? > > Attach the updated patch which address the comments so far. > > The patch only adds a testcase in publication.sql because the duplicate > output doesn't cause unexpected behavior in pub-sub test. > Thanks, the patch looks good to me. I have slightly changed the commit message in the attached. I would like to commit this only in HEAD as there is no user complaint about this and it might not stop any user's service unless it relies on this view's output for the initial table synchronization. What do you think? Bharath/Amit L., others, do you have any opinion on this matter? -- With Regards, Amit Kapila. v3-0001-De-duplicate-the-result-of-pg_publication_tables-.patch Description: Binary data
fix a typo in slotfuncs.c
Hi,

It seems like there's a following typo in code comments:
-  /* determine how many segments slots can be kept by slots */
+  /* determine how many segments can be kept by slots */

Attaching a tiny patch to fix it. This typo exists all the way until PG 13.

Regards,
Bharath Rupireddy.

Attachment: v1-0001-fix-a-typo-in-slotfuncs.c.patch (binary data)
Re: suboverflowed subtransactions concurrency performance optimize
> On 30 Nov 2021, at 17:19, Simon Riggs wrote:
>
> On Mon, 30 Aug 2021 at 11:25, Andrey Borodin wrote:
>>
>> Hi Pengcheng!
>>
>> You are solving an important problem, thank you!
>>
>>> On 30 Aug 2021, at 13:43, Pengchengliu wrote:
>>>
>>> To resolve this performance problem, we think about a solution which
>>> caches the SubtransSLRU in a local cache.
>>> First we can query the parent transaction id from the SubtransSLRU, and
>>> copy the SLRU page to a local cache page.
>>> After that, if we need to query the parent transaction id again, we can
>>> query it from the local cache directly.
>>
>> A copy of the SLRU in each backend's cache can consume a lot of memory.
>
> Yes, copying the whole SLRU into a local cache seems overkill.
>
>> Why create a copy if we can optimise the shared representation of the SLRU?
>
> transam.c uses a single-item cache to prevent thrashing from repeated
> lookups, which reduces problems with shared access to SLRUs.
> multixact.c also has something similar.
>
> I notice that subtrans.c doesn't have this, but could easily do so.
> Patch attached, which seems separate to other attempts at tuning.

I think this definitely makes sense to do.

> On review, I think it is also possible that we update subtrans ONLY if
> someone uses more than PGPROC_MAX_CACHED_SUBXIDS.
> This would make subtrans much smaller and avoid the one-entry-per-page
> behavior, which is a major source of caching.
> This would mean some light changes in GetSnapshotData().
> Let me know if that seems interesting also?

I'm afraid of unexpected performance degradation. When the system runs
fine, you provision a VM with some amount of vCPU/RAM, and then some
backend uses a little more than 64 subtransactions and the whole system
is stuck. Or will it affect only backends using more than 64
subtransactions?

Best regards,
Andrey Borodin.
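The single-item cache pattern mentioned above (as transam.c does for transaction status lookups) can be sketched as follows; this is illustrative Python, not the attached C patch. Repeated lookups of the same subtransaction's parent skip the expensive shared SLRU access entirely:

```python
# One-entry cache in front of an expensive shared lookup.  The fake
# slru_lookup below (parent = xid - 1) is an assumption for the example.
class SubtransCache:
    def __init__(self, slru_lookup):
        self.slru_lookup = slru_lookup   # expensive shared-SLRU lookup
        self.cached_xid = None           # the single cached key
        self.cached_parent = None
        self.misses = 0

    def parent_of(self, xid):
        if xid != self.cached_xid:       # only the last xid is remembered
            self.misses += 1
            self.cached_parent = self.slru_lookup(xid)
            self.cached_xid = xid
        return self.cached_parent

cache = SubtransCache(slru_lookup=lambda xid: xid - 1)
assert cache.parent_of(100) == 99
assert cache.parent_of(100) == 99   # second call served from the cache
assert cache.misses == 1
```

The design bet is that lookups are strongly clustered on the most recent xid, so even a one-entry cache removes most of the contended shared accesses without the memory cost of copying whole SLRU pages per backend.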
Re: pg_get_publication_tables() output duplicate relid
On Fri, Nov 26, 2021 at 8:38 AM Amit Kapila wrote: > > On Fri, Nov 26, 2021 at 7:10 AM houzj.f...@fujitsu.com > wrote: > > > > On Thursday, November 25, 2021 4:57 PM Amit Kapila > > wrote: > > > On Thu, Nov 25, 2021 at 1:30 PM Amit Langote > > > > > > > > I agree with backpatching the doc fix. I've attached a diff against > > > > master, though it also appears to apply to 13 and 14 branches. > > > > > > > > > > I think we can for publish_via_partition_root, other > > > than that > > > the patch looks good to me. > > > > > > Hou-San, others, do you have any opinion about this patch and whether to > > > backpatch it or not? > > > > I think it makes sense to backpatch the doc fix, and the patch looks good > > to me. > > > > Thanks, I'll push this sometime early next week unless there are any > objections to it. > Pushed. -- With Regards, Amit Kapila.
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Thu, Nov 25, 2021 at 10:17 PM Dilip Kumar wrote:
>
> Thanks for the review and many valuable comments, I have fixed all of
> them except this comment (/* If we got a cancel signal during the copy
> of the data, quit */) because this looks fine to me. 0007, I have
> dropped from the patchset for now. I have also included fixes for
> comments given by John.

I found the following issue with the patches applied:
a server crash occurs after the following sequence of commands:

create tablespace tbsp1 location '/tbsp1';
create tablespace tbsp2 location '/tbsp2';
create database test1 tablespace tbsp1;
create database test2 template test1 tablespace tbsp2;
alter database test2 set tablespace tbsp1;
checkpoint;

The following type of message is seen in the server log:

2021-12-01 16:48:26.623 AEDT [67423] PANIC:  could not fsync file
"pg_tblspc/16385/PG_15_202111301/16387/3394": No such file or directory
2021-12-01 16:48:27.228 AEDT [67422] LOG:  checkpointer process (PID
67423) was terminated by signal 6: Aborted
2021-12-01 16:48:27.228 AEDT [67422] LOG:  terminating any other active
server processes
2021-12-01 16:48:27.233 AEDT [67422] LOG:  all server processes
terminated; reinitializing

Also (prior to running the checkpoint command above) I've seen errors like
the following when running pg_dumpall:

pg_dump: error: connection to server on socket "/tmp/.s.PGSQL.5432"
failed: PANIC: could not open critical system index 2662
pg_dumpall: error: pg_dump failed on database "test2", exiting

Hopefully the above example will help in tracking down the cause.

Regards,
Greg Nancarrow
Fujitsu Australia
RE: Skipping logical replication transactions on subscriber side
On Wednesday, December 1, 2021 1:23 PM Masahiko Sawada wrote:
> On Wed, Dec 1, 2021 at 1:00 PM Amit Kapila wrote:
> > On Wed, Dec 1, 2021 at 9:12 AM Masahiko Sawada wrote:
> > > If so, the result from the second check_sql is unstable and it's
> > > probably better to check the result only once. That is, the first
> > > check_sql includes the command and we exit from the function once we
> > > confirm the error entry is expectedly updated.
> > >
> >
> > Yeah, I think that should be fine.
>
> Okay, I've attached an updated patch. Please review it.

I agree that checking the result only once makes the test more stable.
The patch looks good to me.

Best regards,
Hou zj
Re: Update stale code comment in CheckpointerMain()
On Tue, Nov 30, 2021 at 3:09 PM Daniel Gustafsson wrote: > > > On 30 Nov 2021, at 08:00, Amul Sul wrote: > > > The attached patch updates the code comment which is no longer true > > after commit # 4a92a1c3d1c361ffb031ed05bf65b801241d7cdd > > Agreed, but looking at this shouldn't we also tweak the comment on > RecoveryInProgress() as per the attached v2 diff? > Yes, we should -- diff looks good to me, thanks. Regards, Amul
Re: pg_waldump stucks with options --follow or -f and --stats or -z
On Mon, Nov 29, 2021 at 1:16 PM Bharath Rupireddy wrote: > > On Mon, Nov 29, 2021 at 11:09 AM Michael Paquier wrote: > > > > On Sun, Nov 28, 2021 at 12:13:08PM +0530, Bharath Rupireddy wrote: > > > Thanks. Here's the v5. > > > > By the way, one thing that I completely forgot here is that SIGINT is > > not handled on Windows. If we want to make that work for a WIN32 > > terminal, we would need to do something similar to > > src/fe_utils/cancel.c where we need to use SetConsoleCtrlHandler() and > > handle the stats print when facing CTRL_C_EVENT or CTRL_BREAK_EVENT as > > events. Perhaps we should try to think harder and have a more > > centralized facility for the handler part between a WIN32 terminal and > > SIGINT, as it is not the first time that we need this level of > > handling. Or we could just discard this issue, document its WIN32 > > limitation and paint some "#ifdef WIN32" around all the handler > > portions of the patch. > > > > I would be fine with just doing the latter for now, as this stuff is > > still useful for most users, but that's worth mentioning. Any > > opinions? > > I'm okay to have the same behaviour as pg_receivewal and pg_recvlogical tools. Here's the v6 patch that has the SIGINT handling enabled for non-WIN32 platforms (note that I haven't specified anything in the documentation) much like pg_receivewal and pg_recvlogical. Regards, Bharath Rupireddy. v6-0001-pg_waldump-emit-stats-when-interrupted-with-SIGIN.patch Description: Binary data
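The behavior being added to pg_waldump — print the statistics gathered so far when interrupted, rather than dying silently while following WAL — follows the usual "set a flag in the handler, act on it in the main loop" pattern. A hedged sketch of that pattern (illustrative Python, not pg_waldump's C code; the stats contents are invented):

```python
# Sketch of SIGINT handling: the handler only sets a flag, and the
# summary is printed from normal control flow, mirroring how signal
# handlers should defer non-async-signal-safe work.
import os
import signal

stats = {"records": 3, "fpi": 1}   # hypothetical counters for the example
got_sigint = False

def on_sigint(signum, frame):
    global got_sigint
    got_sigint = True               # just record that we were interrupted

signal.signal(signal.SIGINT, on_sigint)
os.kill(os.getpid(), signal.SIGINT)          # stand-in for Ctrl-C

# CPython runs the Python-level handler at the next interpreter tick,
# so wait (bounded) for the flag before acting on it.
for _ in range(1_000_000):
    if got_sigint:
        break

signal.signal(signal.SIGINT, signal.SIG_DFL)  # restore default behavior
if got_sigint:
    print(f"summary: {stats['records']} records, {stats['fpi']} FPIs")
```

Note the thread's point that plain SIGINT does not exist on a WIN32 terminal, which is why the C patch guards the handler with `#ifdef`/non-WIN32 paths, as pg_receivewal and pg_recvlogical do.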
Re: Optionally automatically disable logical replication subscriptions on error
On Tue, Nov 30, 2021 at 5:34 PM osumi.takami...@fujitsu.com wrote:
>
> On Tuesday, November 30, 2021 1:10 PM Greg Nancarrow wrote:
> > On Sat, Nov 27, 2021 at 1:36 AM osumi.takami...@fujitsu.com wrote:
> > >
> > > This v7 uses v26 of skip xid patch [1]
> > This patch no longer applies on the latest source.
> > Also, the patch is missing an update to doc/src/sgml/catalogs.sgml, for
> > the new "subdisableonerr" column of pg_subscription.
> Thanks for your review!
>
> Fixed the documentation accordingly. Further, this comment invoked some
> more refactoring of code, since I wrote some internal code related to
> 'disable_on_error' in an inconsistent order. I fixed this by keeping the
> patch's code after that of the 'two_phase' subscription option as much
> as possible.
>
> I also conducted both pgindent and pgperltidy.
>
> Now, I'll share the v8 that uses PG whose commit id is after 8d74fc9
> (pg_stat_subscription_workers).

Thanks for the updated patch, few small comments:

1) This should be changed:
+       subdisableonerr bool
+
+       If true, the subscription will be disabled when subscription
+       worker detects an error
to:
+       subdisableonerr bool
+
+       If true, the subscription will be disabled when subscription
+       worker detects non-transient errors

2) "Disable On Err" can be changed to "Disable On Error":
+       ", subtwophasestate AS \"%s\"\n"
+       ", subdisableonerr AS \"%s\"\n",
+       gettext_noop("Two phase commit"),
+       gettext_noop("Disable On Err"));

3) Can add a line in the commit message saying "Bump catalog version." as
the patch involves changing the catalog.

4) This prototype is not required, since the function is called after the
function definition:
static inline void set_apply_error_context_xact(TransactionId xid, TimestampTz ts);
static inline void reset_apply_error_context_info(void);
+static bool IsTransientError(ErrorData *edata);

5) We could use the new style here:
+       ereport(LOG,
+               (errmsg("logical replication subscription \"%s\" will be disabled due to error: %s",
+                       MySubscription->name, edata->message)));
change it to:
+       ereport(LOG,
+               errmsg("logical replication subscription \"%s\" will be disabled due to error: %s",
+                      MySubscription->name, edata->message));
Similarly it can be changed in the other ereports added.

Regards,
Vignesh
Re: pg_replslotdata - a tool for displaying replication slot information
On Wed, Dec 1, 2021 at 12:13 AM Bossart, Nathan wrote:
>
> On 11/30/21, 6:14 AM, "Peter Eisentraut" wrote:
> > On 23.11.21 06:09, Bharath Rupireddy wrote:
> >> The replication slots data is stored in binary format on the disk under
> >> the pg_replslot/<> directory which isn't human readable. If the server
> >> is crashed/down (for whatever reasons) and unable to come up, currently
> >> there's no way for the user/admin/developer to know what were all the
> >> replication slots available at the time of server crash/down to figure
> >> out what's the restart lsn, xid, two phase info or types of slots etc.
> >
> > What do you need that for? You can't do anything with a replication
> > slot while the server is down.
>
> One use-case might be to discover the value you need to set for
> max_replication_slots, although it's pretty trivial to discover the
> number of replication slots by looking at the folder directly.

Apart from the above use-case, one can do some exploratory analysis on the
replication slot information after a server crash; this may be useful for
RCA or debugging purposes, for instance:
1) to look at the restart_lsn of the slots to get to know why many WAL
files filled up the disk (because of a restart_lsn being low)
2) to know how many replication slots existed at the time of the crash;
if required, one can choose to drop selected replication slots, or the
ones that were falling behind, to make the server come up
3) if we persist the active_pid info of the replication slot to disk
(currently we don't have this info on disk), one can get to know the
inactive replication slots at the time of the crash
4) if the primary server is down and a failover were to happen to the
standby, by looking at the replication slot information on the primary,
one can easily recreate the slots on the standby

> However, you also need to know how many replication origins there are,
> and AFAIK there isn't an easy way to read the replorigin_checkpoint
> file at the moment. IMO a utility like this should also show details
> for the replication origins. I don't have any other compelling use-
> cases at the moment, but I will say that it is typically nice from an
> administrative standpoint to be able to inspect things like this
> without logging into a running server.

Yeah, this can be added too, probably as an extra option to the proposed
pg_replslotdata tool. But for now, let's deal with the replication slot
information alone, and once this gets committed, we can extend it further
for replication origin info.

Regards,
Bharath Rupireddy.
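For a sense of what such an offline tool does mechanically, here is a purely illustrative sketch of reading a fixed-layout binary state file while the server is down. The field layout below is invented for the example; the real on-disk `ReplicationSlotOnDisk` struct differs and is version-dependent:

```python
# Illustrative only: pack and re-read a made-up slot state record.
# Magic value, version field, and layout are assumptions, not the real
# pg_replslot/*/state format.
import struct

layout = struct.Struct("<4sI Q")   # magic, version, restart_lsn (invented)
blob = layout.pack(b"SLOT", 1, 0x1_0000_0028)

magic, version, restart_lsn = layout.unpack(blob[:layout.size])
assert magic == b"SLOT" and version == 1
# Format the LSN the way PostgreSQL prints it (high32/low32 in hex).
print(f"restart_lsn: {restart_lsn >> 32:X}/{restart_lsn & 0xFFFFFFFF:X}")
```

A real implementation would also validate a checksum and handle layout changes across major versions, which is part of why shipping this as a maintained tool beats ad-hoc parsing.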
Re: Rename dead_tuples to dead_items in vacuumlazy.c
On Wed, Dec 1, 2021 at 4:42 AM Peter Geoghegan wrote: > > On Mon, Nov 29, 2021 at 7:00 PM Masahiko Sawada wrote: > > Thanks! I'll change my parallel vacuum refactoring patch accordingly. > > Thanks again for working on that. > > > Regarding the commit, I think that there still is one place in > > lazyvacuum.c where we can change "dead tuples” to "dead items”: > > > > /* > > * Allocate the space for dead tuples. Note that this handles parallel > > * VACUUM initialization as part of allocating shared memory space used > > * for dead_items. > > */ > > dead_items_alloc(vacrel, params->nworkers); > > dead_items = vacrel->dead_items; > > Oops. Pushed a fixup for that just now. Thanks! > > > Also, the commit doesn't change both PROGRESS_VACUUM_MAX_DEAD_TUPLES > > and PROGRESS_VACUUM_NUM_DEAD_TUPLES. Did you leave them on purpose? > > That was deliberate. > > It would be a bit strange to alter these constants without also > updating the corresponding column names for the > pg_stat_progress_vacuum system view. But if I kept the definition from > system_views.sql in sync, then I would break user scripts -- for > reasons that users don't care about. That didn't seem like the right > approach. Agreed. > > Also, the system as a whole still assumes "DEAD tuples and LP_DEAD > items are the same, and are just as much of a problem in the table as > they are in each index". As you know, this is not really true, which > is an important problem for us. Fixing it (perhaps as part of adding > something like Robert's conveyor belt design) will likely require > revising this model quite fundamentally (e.g, the vacthresh > calculation in autovacuum.c:relation_needs_vacanalyze() would be > replaced). When this happens, we'll probably need to update system > views that have columns with names like "dead_tuples" -- because maybe > we no longer specifically count dead items/tuples at all. 
I strongly > suspect that the approach to statistics that we take for pg_statistic > optimizer stats just doesn't work for dead items/tuples -- statistical > sampling only produces useful statistics for the optimizer because > certain delicate assumptions are met (even these assumptions only > really work with a properly normalized database schema). > > Maybe revising the model used for autovacuum scheduling wouldn't > include changing pg_stat_progress_vacuum, since that isn't technically > "part of the model" --- I'm not sure. But it's not something that I am > in a hurry to fix. Understood. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Skipping logical replication transactions on subscriber side
On Wed, Dec 1, 2021 at 1:00 PM Amit Kapila wrote: > > On Wed, Dec 1, 2021 at 9:12 AM Masahiko Sawada wrote: > > > > On Wed, Dec 1, 2021 at 12:22 PM Amit Kapila wrote: > > > > > > On Wed, Dec 1, 2021 at 8:24 AM houzj.f...@fujitsu.com > > > wrote: > > > > > > > > I have a question about the testcase (I could be wrong here). > > > > > > > > Is it possible that the race condition happen between apply > > > > worker(test_tab1) > > > > and table sync worker(test_tab2) ? If so, it seems the > > > > error("replication > > > > origin with OID") could happen randomly until we resolve the conflict. > > > > Based on this, for the following code: > > > > - > > > > # Wait for the error statistics to be updated. > > > > my $check_sql = qq[SELECT count(1) > 0 ] . $part_sql; > > > > $node->poll_query_until( > > > > 'postgres', $check_sql, > > > > ) or die "Timed out while waiting for statistics to be updated"; > > > > > > > > * [1] * > > > > > > > > $check_sql = > > > > qq[ > > > > SELECT subname, last_error_command, last_error_relid::regclass, > > > > last_error_count > 0 ] . $part_sql; > > > > my $result = $node->safe_psql('postgres', $check_sql); > > > > is($result, $expected, $msg); > > > > - > > > > > > > > Is it possible that the error("replication origin with OID") happen > > > > again at the > > > > place [1]. In this case, the error message we have checked could be > > > > replaced by > > > > another error("replication origin ...") and then the test fail ? > > > > > > > > > > Once we get the "duplicate key violation ..." error before * [1] * via > > > apply_worker then we shouldn't get replication origin-specific error > > > because the origin set up is done before starting to apply changes. > > > > Right. > > > > > Also, even if that or some other happens after * [1] * because of > > > errmsg_prefix check it should still succeed. > > > > In this case, the old error ("duplicate key violation ...") is > > overwritten by a new error (e.g., connection error. 
not sure how > > possible it is) > > > > Yeah, or probably some memory allocation failure. I think the > probability of such failures is very low but OTOH why take chance. > > > and the test fails because the query returns no > > entries, no? > > > > Right. > > > If so, the result from the second check_sql is unstable > > and it's probably better to check the result only once. That is, the > > first check_sql includes the command and we exit from the function > > once we confirm the error entry is expectedly updated. > > > > Yeah, I think that should be fine. Okay, I've attached an updated patch. Please review it. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/ v2-0001-Fix-regression-test-failure-caused-by-commit-8d74.patch Description: Binary data
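The fix discussed above amounts to making the polled check itself the only assertion, so a second read cannot observe an entry that was overwritten in between. An illustrative Python version of the TAP framework's poll-until pattern (`poll_query_until`), with a stand-in for the catalog query:

```python
# Poll until a condition holds, then treat that polled check as the
# test's single assertion.  error_entry_updated is a stand-in for the
# pg_stat_subscription_workers query used by the real test.
import time

def poll_until(check, timeout=5.0, interval=0.01):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False

state = {"tries": 0}
def error_entry_updated():
    state["tries"] += 1
    return state["tries"] >= 3   # succeeds on the third poll here

assert poll_until(error_entry_updated)
```

Checking once inside the poll avoids the race where the expected error row is replaced by a later, unrelated error between the poll and a second verification query.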
Re: Skipping logical replication transactions on subscriber side
On Wed, Dec 1, 2021 at 9:12 AM Masahiko Sawada wrote: > > On Wed, Dec 1, 2021 at 12:22 PM Amit Kapila wrote: > > > > On Wed, Dec 1, 2021 at 8:24 AM houzj.f...@fujitsu.com > > wrote: > > > > > > I have a question about the testcase (I could be wrong here). > > > > > > Is it possible that the race condition happen between apply > > > worker(test_tab1) > > > and table sync worker(test_tab2) ? If so, it seems the error("replication > > > origin with OID") could happen randomly until we resolve the conflict. > > > Based on this, for the following code: > > > - > > > # Wait for the error statistics to be updated. > > > my $check_sql = qq[SELECT count(1) > 0 ] . $part_sql; > > > $node->poll_query_until( > > > 'postgres', $check_sql, > > > ) or die "Timed out while waiting for statistics to be updated"; > > > > > > * [1] * > > > > > > $check_sql = > > > qq[ > > > SELECT subname, last_error_command, last_error_relid::regclass, > > > last_error_count > 0 ] . $part_sql; > > > my $result = $node->safe_psql('postgres', $check_sql); > > > is($result, $expected, $msg); > > > - > > > > > > Is it possible that the error("replication origin with OID") happen again > > > at the > > > place [1]. In this case, the error message we have checked could be > > > replaced by > > > another error("replication origin ...") and then the test fail ? > > > > > > > Once we get the "duplicate key violation ..." error before * [1] * via > > apply_worker then we shouldn't get replication origin-specific error > > because the origin set up is done before starting to apply changes. > > Right. > > > Also, even if that or some other happens after * [1] * because of > > errmsg_prefix check it should still succeed. > > In this case, the old error ("duplicate key violation ...") is > overwritten by a new error (e.g., connection error. not sure how > possible it is) > Yeah, or probably some memory allocation failure. I think the probability of such failures is very low but OTOH why take chance. 
> and the test fails because the query returns no > entries, no? > Right. > If so, the result from the second check_sql is unstable > and it's probably better to check the result only once. That is, the > first check_sql includes the command and we exit from the function > once we confirm the error entry is expectedly updated. > Yeah, I think that should be fine. With Regards, Amit Kapila.
Re: Skipping logical replication transactions on subscriber side
On Wed, Dec 1, 2021 at 12:22 PM Amit Kapila wrote: > > On Wed, Dec 1, 2021 at 8:24 AM houzj.f...@fujitsu.com > wrote: > > > > On Tues, Nov 30, 2021 9:39 PM Masahiko Sawada wrote: > > > > > > > > > Shouldn't we someway check that the error message also starts with > > > > > "duplicate key value violates ..."? > > > > > > > > Yeah, I think it's a good idea to make the checks more specific. That > > > > is, probably we can specify the prefix of the error message and > > > > subrelid in addition to the current conditions: relid and xid. That > > > > way, we can check what error was reported by which workers (tablesync > > > > or apply) for which relations. And both check queries in > > > > test_subscription_error() can have the same WHERE clause. > > > > > > I've attached a patch that fixes this issue. Please review it. > > > > > > > I have a question about the testcase (I could be wrong here). > > > > Is it possible that the race condition happen between apply > > worker(test_tab1) > > and table sync worker(test_tab2) ? If so, it seems the error("replication > > origin with OID") could happen randomly until we resolve the conflict. > > Based on this, for the following code: > > - > > # Wait for the error statistics to be updated. > > my $check_sql = qq[SELECT count(1) > 0 ] . $part_sql; > > $node->poll_query_until( > > 'postgres', $check_sql, > > ) or die "Timed out while waiting for statistics to be updated"; > > > > * [1] * > > > > $check_sql = > > qq[ > > SELECT subname, last_error_command, last_error_relid::regclass, > > last_error_count > 0 ] . $part_sql; > > my $result = $node->safe_psql('postgres', $check_sql); > > is($result, $expected, $msg); > > - > > > > Is it possible that the error("replication origin with OID") happen again > > at the > > place [1]. In this case, the error message we have checked could be > > replaced by > > another error("replication origin ...") and then the test fail ? > > > > Once we get the "duplicate key violation ..." 
error before * [1] * via > apply_worker then we shouldn't get replication origin-specific error > because the origin set up is done before starting to apply changes. Right. > Also, even if that or some other happens after * [1] * because of > errmsg_prefix check it should still succeed. In this case, the old error ("duplicate key violation ...") is overwritten by a new error (e.g., connection error. not sure how possible it is) and the test fails because the query returns no entries, no? If so, the result from the second check_sql is unstable and it's probably better to check the result only once. That is, the first check_sql includes the command and we exit from the function once we confirm the error entry is expectedly updated. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
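The single-check approach proposed above — polling until the full expected row (including the error command) appears, instead of polling a weaker condition and then querying a second time — can be sketched as follows. This is an illustrative Python stand-in, not the actual TAP test API: `poll_until`, `fake_query`, and the tuples below are hypothetical, and the real test uses `poll_query_until` from the Perl test framework against `pg_stat_subscription_workers`.

```python
import time

def poll_until(run_query, expected, timeout=5.0, interval=0.1):
    """Poll run_query() until it returns `expected`, or time out.

    Checking the full expected row inside the polling loop (rather than
    polling a weaker count(*) condition first and re-querying afterwards)
    removes the window in which a second error could overwrite the entry
    the test is looking for.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if run_query() == expected:
            return True
        time.sleep(interval)
    return False

# Stand-in for the stats view: the first read still shows a transient
# "replication origin ..." error entry, a later read shows the expected
# "duplicate key ..." entry reported by the apply worker.
results = iter([
    ("tap_sub", None, "test_tab1", True),      # transient error entry
    ("tap_sub", "INSERT", "test_tab1", True),  # expected final entry
])
last = [("", None, "", False)]

def fake_query():
    last[0] = next(results, last[0])
    return last[0]

assert poll_until(fake_query, ("tap_sub", "INSERT", "test_tab1", True))
```

The point of the sketch is only the shape of the loop: once the poll condition is the complete expected result, there is no second query whose outcome could be unstable.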
RE: Skipping logical replication transactions on subscriber side
On Wed, Dec 1, 2021 11:22 AM Amit Kapila wrote: > On Wed, Dec 1, 2021 at 8:24 AM houzj.f...@fujitsu.com > wrote: > > > > On Tues, Nov 30, 2021 9:39 PM Masahiko Sawada > wrote: > > > > > > > > > Shouldn't we someway check that the error message also starts with > > > > > "duplicate key value violates ..."? > > > > > > > > Yeah, I think it's a good idea to make the checks more specific. That > > > > is, probably we can specify the prefix of the error message and > > > > subrelid in addition to the current conditions: relid and xid. That > > > > way, we can check what error was reported by which workers (tablesync > > > > or apply) for which relations. And both check queries in > > > > test_subscription_error() can have the same WHERE clause. > > > > > > I've attached a patch that fixes this issue. Please review it. > > > > > > > I have a question about the testcase (I could be wrong here). > > > > Is it possible that the race condition happen between apply > worker(test_tab1) > > and table sync worker(test_tab2) ? If so, it seems the error("replication > > origin with OID") could happen randomly until we resolve the conflict. > > Based on this, for the following code: > > - > > # Wait for the error statistics to be updated. > > my $check_sql = qq[SELECT count(1) > 0 ] . $part_sql; > > $node->poll_query_until( > > 'postgres', $check_sql, > > ) or die "Timed out while waiting for statistics to be updated"; > > > > * [1] * > > > > $check_sql = > > qq[ > > SELECT subname, last_error_command, last_error_relid::regclass, > > last_error_count > 0 ] . $part_sql; > > my $result = $node->safe_psql('postgres', $check_sql); > > is($result, $expected, $msg); > > - > > > > Is it possible that the error("replication origin with OID") happen again > > at the > > place [1]. In this case, the error message we have checked could be replaced > by > > another error("replication origin ...") and then the test fail ? > > > > Once we get the "duplicate key violation ..." 
error before * [1] * via > apply_worker then we shouldn't get replication origin-specific error > because the origin set up is done before starting to apply changes. > Also, even if that or some other happens after * [1] * because of > errmsg_prefix check it should still succeed. Does that make sense? Oh, I missed the point that the origin set up is done once we get the expected error. Thanks for the explanation, and I think the patch looks good. Best regards, Hou zj
Re: Skipping logical replication transactions on subscriber side
On Wed, Dec 1, 2021 at 8:24 AM houzj.f...@fujitsu.com wrote: > > On Tues, Nov 30, 2021 9:39 PM Masahiko Sawada wrote: > > > > > > > Shouldn't we someway check that the error message also starts with > > > > "duplicate key value violates ..."? > > > > > > Yeah, I think it's a good idea to make the checks more specific. That > > > is, probably we can specify the prefix of the error message and > > > subrelid in addition to the current conditions: relid and xid. That > > > way, we can check what error was reported by which workers (tablesync > > > or apply) for which relations. And both check queries in > > > test_subscription_error() can have the same WHERE clause. > > > > I've attached a patch that fixes this issue. Please review it. > > > > I have a question about the testcase (I could be wrong here). > > Is it possible that the race condition happen between apply worker(test_tab1) > and table sync worker(test_tab2) ? If so, it seems the error("replication > origin with OID") could happen randomly until we resolve the conflict. > Based on this, for the following code: > - > # Wait for the error statistics to be updated. > my $check_sql = qq[SELECT count(1) > 0 ] . $part_sql; > $node->poll_query_until( > 'postgres', $check_sql, > ) or die "Timed out while waiting for statistics to be updated"; > > * [1] * > > $check_sql = > qq[ > SELECT subname, last_error_command, last_error_relid::regclass, > last_error_count > 0 ] . $part_sql; > my $result = $node->safe_psql('postgres', $check_sql); > is($result, $expected, $msg); > - > > Is it possible that the error("replication origin with OID") happen again at > the > place [1]. In this case, the error message we have checked could be replaced > by > another error("replication origin ...") and then the test fail ? > Once we get the "duplicate key violation ..." error before * [1] * via apply_worker then we shouldn't get replication origin-specific error because the origin set up is done before starting to apply changes. 
Also, even if that or some other happens after * [1] * because of errmsg_prefix check it should still succeed. Does that make sense? -- With Regards, Amit Kapila.
RE: Skipping logical replication transactions on subscriber side
On Tues, Nov 30, 2021 9:39 PM Masahiko Sawada wrote: > On Tue, Nov 30, 2021 at 8:41 PM Masahiko Sawada > wrote: > > > > On Tue, Nov 30, 2021 at 6:28 PM Amit Kapila > wrote: > > > > > > On Mon, Nov 29, 2021 at 11:38 AM vignesh C > wrote: > > > > > > > > > > I have pushed this patch and there is a buildfarm failure for it. See: > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder > > > t=2021-11-30%2005%3A05%3A25 > > > > > > Sawada-San has shared his initial analysis on pgsql-committers [1] > > > and I am responding here as the fix requires some more discussion. > > > > > > > Looking at the result the test actually got, we had two error > > > > entries for test_tab1 instead of one: > > > > > > > > # Failed test 'check the error reported by the apply worker' > > > > # at t/026_worker_stats.pl line 33. > > > > # got: 'tap_sub|INSERT|test_tab1|t > > > > # tap_sub||test_tab1|t' > > > > # expected: 'tap_sub|INSERT|test_tab1|t' > > > > > > > > The possible scenarios are: > > > > > > > > The table sync worker for test_tab1 failed due to an error > > > > unrelated to apply changes: > > > > > > > > 2021-11-30 06:24:02.137 CET [18990:2] ERROR: replication origin > > > > with OID 2 is already active for PID 23706 > > > > > > > > At this time, the view had one error entry for the table sync worker. > > > > After retrying table sync, it succeeded: > > > > > > > > 2021-11-30 06:24:04.202 CET [28117:2] LOG: logical replication > > > > table synchronization worker for subscription "tap_sub", table > "test_tab1" > > > > has finished > > > > > > > > Then after inserting a row on the publisher, the apply worker > > > > inserted the row but failed due to violating a unique key > > > > violation, which is > > > > expected: > > > > > > > > 2021-11-30 06:24:04.307 CET [4806:2] ERROR: duplicate key value > > > > violates unique constraint "test_tab1_pkey" > > > > 2021-11-30 06:24:04.307 CET [4806:3] DETAIL: Key (a)=(1) already > > > > exists. 
> > > > 2021-11-30 06:24:04.307 CET [4806:4] CONTEXT: processing remote > > > > data during "INSERT" for replication target relation > > > > "public.test_tab1" in transaction 721 at 2021-11-30 > > > > 06:24:04.305096+01 > > > > > > > > As a result, we had two error entries for test_tab1: the table > > > > sync worker error and the apply worker error. I didn't expect that > > > > the table sync worker for test_tab1 failed due to "replication > > > > origin with OID 2 is already active for PID 23706” error. > > > > > > > > Looking at test_subscription_error() in 026_worker_stats.pl, we > > > > have two checks; in the first check, we wait for the view to show > > > > the error entry for the given relation name and xid. This check > > > > was passed since we had the second error (i.g., apply worker > > > > error). In the second check, we get error entries from > > > > pg_stat_subscription_workers by specifying only the relation name. > > > > Therefore, we ended up getting two entries and failed the tests. > > > > > > > > To fix this issue, I think that in the second check, we can get > > > > the error from pg_stat_subscription_workers by specifying the > > > > relation name *and* xid like the first check does. I've attached the > > > > patch. > > > > What do you think? > > > > > > > > > > I think this will fix the reported failure but there is another race > > > condition in the test. Isn't it possible that for table test_tab2, > > > we get an error "replication origin with OID ..." or some other > > > error before copy, in that case also, we will proceed from the > > > second call of test_subscription_error() which is not what we expect in > > > the > test? > > > > Right. > > > > > Shouldn't we someway check that the error message also starts with > > > "duplicate key value violates ..."? > > > > Yeah, I think it's a good idea to make the checks more specific. 
That > > is, probably we can specify the prefix of the error message and > > subrelid in addition to the current conditions: relid and xid. That > > way, we can check what error was reported by which workers (tablesync > > or apply) for which relations. And both check queries in > > test_subscription_error() can have the same WHERE clause. > > I've attached a patch that fixes this issue. Please review it. > I have a question about the testcase (I could be wrong here). Is it possible that the race condition happen between apply worker(test_tab1) and table sync worker(test_tab2) ? If so, it seems the error("replication origin with OID") could happen randomly until we resolve the conflict. Based on this, for the following code: - # Wait for the error statistics to be updated. my $check_sql = qq[SELECT count(1) > 0 ] . $part_sql; $node->poll_query_until( 'postgres', $check_sql, ) or die "Timed out while waiting for statistics to be updated"; * [1] * $check_sql = qq[ SELECT subname, last_error_command, last_error_relid::regclass, last_error_count > 0 ] .
Re: row filtering for logical replication
On Wed, Dec 1, 2021 at 6:55 AM Euler Taveira wrote: > > On Tue, Nov 30, 2021, at 7:25 AM, Amit Kapila wrote: > > On Tue, Nov 30, 2021 at 11:37 AM Dilip Kumar wrote: > > > > What about the initial table sync? during that, we are going to > > combine all the filters or we are going to apply only the insert > > filters? > > > > AFAIK, currently, initial table sync doesn't respect publication > actions so it should combine all the filters. What do you think? > > I agree. If you think that it might need a row to apply DML commands (UPDATE, > DELETE) in the future or that due to a row filter that row should be available > in the subscriber (INSERT-only case), it makes sense to send all rows that > satisfies any row filter. > Right and Good point. -- With Regards, Amit Kapila.
Re: [PATCH] DROP tab completion
On Tue, Nov 30, 2021 at 03:17:07PM +, Asif Rehman wrote: > The patch applies cleanly and the functionality seems to work well. (master > e7122548a3) > > The new status of this patch is: Ready for Committer + else if (Matches("DROP", "MATERIALIZED", "VIEW", MatchAny)) + COMPLETE_WITH("CASCADE", "RESTRICT"); [...] + else if (Matches("DROP", "OWNED", "BY", MatchAny)) + COMPLETE_WITH("CASCADE", "RESTRICT"); This stuff is gathered around line 3284 in tab-complete.c as of HEAD at 538724f, but I think that you have done things right as there are already sections for those commands and they have multiple keywords. So, applied. Thanks! -- Michael signature.asc Description: PGP signature
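The `Matches(...)`/`COMPLETE_WITH(...)` pattern quoted above can be illustrated with a minimal Python sketch. This is not the tab-complete.c machinery — `matches`, `complete`, and `MatchAny` here are hypothetical stand-ins that only mimic the rule shape: match the words typed so far against a pattern, with `MatchAny` as a wildcard, and return the completions.

```python
MatchAny = object()  # wildcard: matches any single word

def matches(words, *pattern):
    """True if the typed words match the pattern, position by position."""
    if len(words) != len(pattern):
        return False
    return all(p is MatchAny or w.upper() == p
               for w, p in zip(words, pattern))

def complete(words):
    # Mirrors the two rules added by the patch (illustrative only).
    if matches(words, "DROP", "MATERIALIZED", "VIEW", MatchAny):
        return ["CASCADE", "RESTRICT"]
    if matches(words, "DROP", "OWNED", "BY", MatchAny):
        return ["CASCADE", "RESTRICT"]
    return []

assert complete(["DROP", "MATERIALIZED", "VIEW", "mv1"]) == ["CASCADE", "RESTRICT"]
```

In the real code each rule is an `else if` arm in one large dispatcher, which is why the patch slots its additions into the existing sections for those commands.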
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Thu, Nov 25, 2021 at 10:17 PM Dilip Kumar wrote: > > Thanks for the review and many valuable comments, I have fixed all of > them except this comment (/* If we got a cancel signal during the copy > of the data, quit */) because this looks fine to me. 0007, I have > dropped from the patchset for now. I have also included fixes for > comments given by John. > Any progress/results yet on testing against a large database (as suggested by John Naylor) and multiple tablespaces? Thanks for the patch updates. I have some additional minor comments: 0002 (1) Tidy patch comment I suggest minor tidying of the patch comment, as follows: Support new interfaces in relmapper, 1) Support copying the relmap file from one database path to another database path. 2) Like RelationMapOidToFilenode, provide another interface which does the same but, instead of getting it for the database we are connected to, it will get it for the input database path. These interfaces are required for the next patch, for supporting the WAL-logged created database. 0003 src/include/commands/tablecmds.h (1) typedef void (*copy_relation_storage) ... The new typename "copy_relation_storage" needs to be added to src/tools/pgindent/typedefs.list 0006 src/backend/commands/dbcommands.c (1) CreateDirAndVersionFile After writing to the file, you should probably pfree(buf.data), right? Actually, I don't think StringInfo (dynamic string allocation) is needed here, since the version string is so short, so why not just use a local "char buf[16]" buffer and snprintf() the PG_MAJORVERSION+newline into that? Also (as mentioned in my first review) shouldn't the "O_TRUNC" flag be additionally specified in the case when OpenTransientFile() is tried for a 2nd time because of errno==EEXIST on the 1st attempt? (otherwise if the existing file did contain something you'd end up writing after the existing data in the file). src/backend/commands/dbcommands.c (2) typedef struct CreateDBRelInfo ... 
CreateDBRelInfo The new typename "CreateDBRelInfo" needs to be added to src/tools/pgindent/typedefs.list src/bin/pg_rewind/parsexlog.c (3) Include additional header file It seems that the following additional header file is not needed to compile the source file: +#include "utils/relmapper.h" Regards, Greg Nancarrow Fujitsu Australia
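The open-flag pattern suggested in review comment (1) above — try `O_CREAT | O_EXCL` first, and on `EEXIST` retry with `O_TRUNC` so stale contents cannot survive the write — can be sketched in Python. This is only an illustration under assumptions: the real code is C using `OpenTransientFile()`, and `create_version_file` plus the hard-coded version value are hypothetical names for this sketch.

```python
import errno
import os
import tempfile

PG_MAJORVERSION = "15"  # illustrative value, not taken from a real build

def create_version_file(dbpath):
    """Write a PG_VERSION file under dbpath, truncating any existing file."""
    path = os.path.join(dbpath, "PG_VERSION")
    try:
        # First attempt: create the file, failing if it already exists.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise
        # Second attempt: the file exists, so reopen it with O_TRUNC.
        # Without O_TRUNC, a short write at offset 0 would leave any
        # longer stale contents dangling after our data.
        fd = os.open(path, os.O_WRONLY | os.O_TRUNC)
    try:
        os.write(fd, (PG_MAJORVERSION + "\n").encode())
        os.fsync(fd)
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as d:
    # Simulate a leftover file from a previous attempt.
    with open(os.path.join(d, "PG_VERSION"), "w") as f:
        f.write("garbage leftover")
    create_version_file(d)
    assert open(os.path.join(d, "PG_VERSION")).read() == "15\n"
```

The assertion demonstrates the point of the review comment: with `O_TRUNC` on the retry, none of the pre-existing contents survive.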
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
On Tue, Nov 30, 2021 at 4:54 PM Michael Paquier wrote: > On Tue, Nov 30, 2021 at 05:58:15PM -0500, David Steele wrote: > > The main objections as I recall are that it is much harder for simple > backup > > scripts and commercial backup integrations to hold a connection to > postgres > > open and write the backup label separately into the backup. > > I don't quite understand why this argument would not hold even today, > even if I'd like to think that more people are using pg_basebackup. > > > I did figure out how to keep the safe part of exclusive backup (not > having > > to maintain a connection) while removing the dangerous part (writing > > backup_label into PGDATA), but it was a substantial amount of work and I > > felt that it had little chance of being committed. > > Which was, I guess, done by storing the backup_label contents within a > file different than backup_label, still maintained in the main data > folder to ensure that it gets included in the backup? > Non-exclusive backups have significant advantages over exclusive backups, but I would like to add a few comments on the simplicity of exclusive backups - 1/ It is not uncommon nowadays to take a snapshot-based backup. Exclusive backup simplifies this story as the backup label file is part of the snapshot. Otherwise, one needs to store it somewhere outside as snapshot metadata and copy this file over during restore (after creating a disk from the snapshot) to the data directory. Typical steps are: 1/ Call pg_start_backup() 2/ Take a disk snapshot 3/ Call pg_stop_backup() 4/ Mark the snapshot as consistent and add some creation-time metadata. 2/ Control plane code responsible for taking backups is simpler with exclusive backups than non-exclusive, as it doesn't maintain a connection to the server, particularly when that orchestration is outside the machine the Postgres server is running on. IMHO, we should either remove the support for it or improve it, but not leave it hanging there.
Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?
On Tue, Nov 30, 2021 at 5:09 PM Peter Geoghegan wrote: > I believe that there have been several historic reasons why we need a > cleanup lock during nbtree VACUUM, and that there is only one > remaining reason for it today. So the history is unusually complicated. Minor correction: we actually also have to worry about plain index scans that don't use an MVCC snapshot, which is possible within nbtree. It's quite likely when using logical replication, actually. See the patch for more. Like with the index-only scan case, a non-MVCC snapshot + plain nbtree index scan cannot rely on heap access within the index scan node -- it won't reliably notice that any newer heap tuples (that are really the result of concurrent TID recycling) are not actually visible to its MVCC snapshot -- because there isn't an MVCC snapshot. The only difference in the index-only scan scenario is that we use the visibility map (not the heap) -- which is racey in a way that makes our MVCC snapshot (IOSs always have an MVCC snapshot) an ineffective protection. In summary, to be safe against confusion from concurrent TID recycling during index/index-only scans, we can do either of the following things: 1. Hold a pin of our leaf page while accessing the heap -- that'll definitely conflict with the cleanup lock that nbtree VACUUM will inevitably try to acquire on our leaf page. OR: 2. Hold an MVCC snapshot, AND do an actual heap page access during the plain index scan -- do both together. With approach 2, our plain index scan must determine visibility using real XIDs (against something like a dirty snapshot), rather than using a visibility map bit. That is also necessary because the VM might become invalid or ambiguous, in a way that's clearly not possible when looking at full heap tuple headers with XIDs -- concurrent recycling becomes safe if we know that we'll reliably notice it and not give wrong answers. Does that make sense? -- Peter Geoghegan
Re: row filtering for logical replication
On Tue, Nov 30, 2021, at 7:25 AM, Amit Kapila wrote: > On Tue, Nov 30, 2021 at 11:37 AM Dilip Kumar wrote: > > > > What about the initial table sync? during that, we are going to > > combine all the filters or we are going to apply only the insert > > filters? > > > > AFAIK, currently, initial table sync doesn't respect publication > actions so it should combine all the filters. What do you think? I agree. If you think that it might need a row to apply DML commands (UPDATE, DELETE) in the future, or that due to a row filter that row should be available on the subscriber (INSERT-only case), it makes sense to send all rows that satisfy any row filter. The current code already works this way. All row filters are combined into a WHERE clause using OR. If any of the publications doesn't have a row filter, there is no WHERE clause. -- Euler Taveira EDB https://www.enterprisedb.com/
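The combination rule described above — OR the per-publication row filters together, and drop the WHERE clause entirely if any publication has no filter — can be sketched as follows. These function names are hypothetical illustrations, not the actual pgoutput/tablesync code, and the real implementation builds the clause in C and SQL.

```python
def combine_row_filters(filters):
    """Combine per-publication row filters for initial table sync.

    A filter of None means "publication has no row filter"; in that case
    every row qualifies, so the combined result is no WHERE clause at all.
    Otherwise the filters are OR'ed, since a row satisfying any one
    publication's filter should reach the subscriber.
    """
    if any(f is None for f in filters):
        return None  # no WHERE clause: copy every row
    return " OR ".join(f"({f})" for f in filters)

def copy_query(table, filters):
    """Build the (illustrative) initial-sync copy query for a table."""
    where = combine_row_filters(filters)
    if where is None:
        return f"COPY (SELECT * FROM {table}) TO STDOUT"
    return f"COPY (SELECT * FROM {table} WHERE {where}) TO STDOUT"

assert combine_row_filters(["a > 10", "b = 1"]) == "(a > 10) OR (b = 1)"
assert combine_row_filters(["a > 10", None]) is None
```

For example, `copy_query("t", ["a > 10", "b = 1"])` yields a query filtered by `(a > 10) OR (b = 1)`, while `copy_query("t", ["a > 10", None])` copies the whole table.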
Re: Why doesn't GiST VACUUM require a super-exclusive lock, like nbtree VACUUM?
On Fri, Nov 5, 2021 at 3:26 AM Andrey Borodin wrote: > > On 4 Nov 2021, at 20:58, Peter Geoghegan wrote: > > That's a pretty unlikely scenario. And even > > if it happened it would only happen once (until the next time we get > > unlucky). What are the chances of somebody noticing a more or less > > once-off, slightly wrong answer? > > I'd say next to impossible, yet not impossible. Or, perhaps, I do not see > protection from this. I think that that's probably all correct -- I would certainly make the same guess. It's very unlikely to happen, and when it does happen it happens only once. > Moreover there's a "microvacuum". It kills tuples with BUFFER_LOCK_SHARE. > AFAIU it should take cleanup lock on buffer too? No, because there is no heap vacuuming involved (because that doesn't happen outside vacuumlazy.c). The work that VACUUM does inside lazy_vacuum_heap_rel() is part of the problem here -- we need an interlock between that work and index-only scans. Making LP_DEAD items in heap pages LP_UNUSED is only ever possible during a VACUUM operation (I'm sure you know why). AFAICT there would be no bug at all without that detail. I believe that there have been several historic reasons why we need a cleanup lock during nbtree VACUUM, and that there is only one remaining reason for it today. So the history is unusually complicated. But AFAICT it's always some kind of "interlock with heapam VACUUM" issue, with TID recycling, with no protection from our MVCC snapshot. I would say that that's the "real problem" here, when I get to first principles. Attached draft patch attempts to explain things in this area within the nbtree README. There is a much shorter comment about it within vacuumlazy.c. I am concerned about GiST index-only scans themselves, of course, but I discovered this issue when thinking carefully about the concurrency rules for VACUUM -- I think it's valuable to formalize and justify the general rules that index access methods must follow.
We can talk about this some more in NYC. See you soon! -- Peter Geoghegan From ea6612300e010f1f2b02119b5a0de95e46d1157d Mon Sep 17 00:00:00 2001 From: Peter Geoghegan Date: Wed, 3 Nov 2021 14:38:15 -0700 Subject: [PATCH v1] nbtree README: Improve VACUUM interlock section. Also document a related issue for index-only scans in vacuumlazy.c. Author: Peter Geoghegan Discussion: https://postgr.es/m/CAH2-Wz=PqOziyRSrnN5jAtfXWXY7-BJcHz9S355LH8Dt=5qxWQ@mail.gmail.com --- src/backend/access/heap/vacuumlazy.c | 10 ++ src/backend/access/nbtree/README | 145 --- 2 files changed, 75 insertions(+), 80 deletions(-) diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index 282b44f87..8bfe196bf 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -2384,6 +2384,16 @@ lazy_vacuum_heap_rel(LVRelState *vacrel) * LP_DEAD items on the page that were determined to be LP_DEAD items back * when the same page was visited by lazy_scan_prune() (i.e. those whose TID * was recorded in the dead_items array at the time). + * + * We can opportunistically set the visibility map bit for the page here, + * which is valuable when lazy_scan_prune couldn't earlier on, owing only to + * the fact that there were LP_DEAD items that we'll now mark as unused. This + * is why index AMs that support index-only scans have to hold a pin on an + * index page as an interlock against VACUUM while accessing the visibility + * map (which is really just a dense summary of visibility information in the + * heap). If they didn't do this then there would be rare race conditions + * where a heap TID that is actually dead appears alive to an unlucky + * index-only scan. 
*/ static int lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer, diff --git a/src/backend/access/nbtree/README b/src/backend/access/nbtree/README index 2a7332d07..c6f04d856 100644 --- a/src/backend/access/nbtree/README +++ b/src/backend/access/nbtree/README @@ -89,25 +89,28 @@ Page read locks are held only for as long as a scan is examining a page. To minimize lock/unlock traffic, an index scan always searches a leaf page to identify all the matching items at once, copying their heap tuple IDs into backend-local storage. The heap tuple IDs are then processed while -not holding any page lock within the index. We do continue to hold a pin -on the leaf page in some circumstances, to protect against concurrent -deletions (see below). In this state the scan is effectively stopped -"between" pages, either before or after the page it has pinned. This is -safe in the presence of concurrent insertions and even page splits, because -items are never moved across pre-existing page boundaries --- so the scan -cannot miss any items it should have seen, nor accidentally return the same -item twice. The scan must remember the page's right-link at the time it -was scanned, since that is the page to
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
On Tue, Nov 30, 2021 at 05:58:15PM -0500, David Steele wrote: > The main objections as I recall are that it is much harder for simple backup > scripts and commercial backup integrations to hold a connection to postgres > open and write the backup label separately into the backup. I don't quite understand why this argument would not hold even today, even if I'd like to think that more people are using pg_basebackup. > I did figure out how to keep the safe part of exclusive backup (not having > to maintain a connection) while removing the dangerous part (writing > backup_label into PGDATA), but it was a substantial amount of work and I > felt that it had little chance of being committed. Which was, I guess, done by storing the backup_label contents within a file different than backup_label, still maintained in the main data folder to ensure that it gets included in the backup? -- Michael signature.asc Description: PGP signature
Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd
On Wed, Dec 1, 2021 at 3:33 AM Robert Haas wrote: > On Tue, Nov 30, 2021 at 4:47 AM Andy Fan wrote: > >> my exception should be that the relcache should not be invalidated _after the first relation open_ > >> in the executor (not the beginning of executorRun). > > > > s/exception/expectation. > > > > To be more accurate, my expectation is for a single sql statement, after the first time I write data into > > the relation, until the statement is completed or "aborted and RelationClose is called in ResourceOwnerRelease", > > the relcache reset should never happen. > > Well I'm not sure why you asked the question and then argued with > the answer you got. I think you misunderstand me: I argued with the answer because, after I got it and rethought my problem, I found my question description was not accurate enough, so I improved the question and was willing to discuss it again. My expectation was that things would continue with something like this: 1. In your newly described situation, your solution still does not work because ... 2. In your newly described situation, the solution would work for sure. 3. Your situation is still not clear enough. -- Best Regards Andy Fan
Deprecating the term "super-exclusive lock"
In commit 276db875, I made vacuumlazy.c consistently use the term "cleanup lock", rather than the term "super-exclusive lock". But on further reflection I should have gone further, and removed the term "super-exclusive lock" from the tree completely. The actual relevant C symbols only use the term cleanup. Attached patch does this. There's not much churn. The term "super-exclusive lock" is far more likely to be used in index AM code, particularly nbtree. That's why I, the nbtree person, originally added a bunch of uses of that term in heapam -- prior to that point heapam probably didn't use the term once. Anyway, I don't think that there is a particularly good reason for an index AM/table AM divide in terminology. In fact, I'd go further: nbtree's use of super-exclusive locks is actually something that exists for the benefit of heapam alone -- and so using a different name in index AM code makes zero sense, because it's really a heapam thing anyway. Despite appearances. The underlying reason why we need a cleanup lock when calling _bt_delitems_vacuum() (but not when calling the near-identical _bt_delitems_delete() function) is this: we need it as an interlock, to avoid breaking index-only scans with concurrent heap vacuuming (not pruning) that takes place in vacuumlazy.c [1]. This issue isn't currently documented anywhere, though I plan on addressing that in the near future, with a separate patch. Historic note: the reason why this issue is so confused now has a lot to do with how the code has evolved over time. When the cleanup lock thing was first added to nbtree way back in 2001 (see commit c8076f09), there was no such thing as HOT, and nbtree didn't do page-at-a-time processing yet -- I believe that the cleanup lock was needed to avoid breaking these things (when lazy VACUUM became the default). Of course the cleanup lock can't have been needed for index-only scans back then, because there weren't any.
I'm pretty sure that that's the only remaining reason for requiring a cleanup lock.

Note about a future optimization opportunity: this also means that we could safely elide the cleanup lock during VACUUM (just get an exclusive lock) iff vacuumlazy.c told ambulkdelete that it has *already* decided that it won't bother performing a round of heap VACUUM in lazy_vacuum_heap_rel(). This observation isn't useful on its own, but in a world with something like Robert Haas's conveyor belt design (a world with *selective* index vacuuming), it could be quite valuable.

[1] https://postgr.es/m/CAH2-Wz=PqOziyRSrnN5jAtfXWXY7-BJcHz9S355LH8Dt=5q...@mail.gmail.com

-- 
Peter Geoghegan

v1-0001-Deprecate-the-term-super-exclusive-lock.patch
Description: Binary data
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
On 11/30/21, 2:58 PM, "David Steele" wrote: > I did figure out how to keep the safe part of exclusive backup (not > having to maintain a connection) while removing the dangerous part > (writing backup_label into PGDATA), but it was a substantial amount of > work and I felt that it had little chance of being committed. Do you think it's still worth trying to make it safe, or do you think we should just remove exclusive mode completely? > Attaching the thread [1] that I started with a patch to remove exclusive > backup for reference. Ah, good, some light reading. :) Nathan
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
On 11/30/21 17:26, Tom Lane wrote:
> "Bossart, Nathan" writes:
>> It looks like the exclusive way has been marked deprecated in all
>> supported versions along with a note that it will eventually be
>> removed. If it's not going to be removed out of fear of breaking
>> backward compatibility, I think the documentation should be updated to
>> say that. However, unless there is something that is preventing users
>> from switching to the non-exclusive approach, I think it is reasonable
>> to begin thinking about removing it.
>
> If we're willing to outright remove it, I don't have any great objection.
> My original two cents was that we shouldn't put effort into improving it;
> but removing it isn't that.

The main objections as I recall are that it is much harder for simple backup scripts and commercial backup integrations to hold a connection to postgres open and write the backup label separately into the backup.

As Stephen noted, working in this area is much harder (even in the docs) due to the need to keep both methods working. When I removed exclusive backup it didn't break any tests, other than one that needed to generate a corrupt backup, so we have virtually no coverage for that method.

I did figure out how to keep the safe part of exclusive backup (not having to maintain a connection) while removing the dangerous part (writing backup_label into PGDATA), but it was a substantial amount of work and I felt that it had little chance of being committed.

Attaching the thread [1] that I started with a patch to remove exclusive backup for reference.

-- 
[1] https://www.postgresql.org/message-id/flat/ac7339ca-3718-3c93-929f-99e725d1172c%40pgmasters.net
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
On 11/30/21, 2:27 PM, "Tom Lane" wrote: > If we're willing to outright remove it, I don't have any great objection. > My original two cents was that we shouldn't put effort into improving it; > but removing it isn't that. I might try to put a patch together for the January commitfest, given there is enough support. Nathan
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
"Bossart, Nathan" writes: > It looks like the exclusive way has been marked deprecated in all > supported versions along with a note that it will eventually be > removed. If it's not going to be removed out of fear of breaking > backward compatibility, I think the documentation should be updated to > say that. However, unless there is something that is preventing users > from switching to the non-exclusive approach, I think it is reasonable > to begin thinking about removing it. If we're willing to outright remove it, I don't have any great objection. My original two cents was that we shouldn't put effort into improving it; but removing it isn't that. regards, tom lane
Re: row filtering for logical replication
On Tue, Nov 30, 2021 at 9:34 PM vignesh C wrote:
>
> 3) Can a user remove the row filter without removing the table from
> the publication after creating the publication or should the user drop
> the table and add the table in this case?
>

AFAIK to remove an existing filter use ALTER PUBLICATION ... SET TABLE but do not specify any filter. For example,

test_pub=# create table t1(a int primary key);
CREATE TABLE
test_pub=# create publication p1 for table t1 where (a > 1);
CREATE PUBLICATION
test_pub=# create publication p2 for table t1 where (a > 2);
CREATE PUBLICATION
test_pub=# \d+ t1
                                                Table "public.t1"
 Column |  Type   | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 a      | integer |           | not null |         | plain   |             |              |
Indexes:
    "t1_pkey" PRIMARY KEY, btree (a)
Publications:
    "p1" WHERE ((a > 1))
    "p2" WHERE ((a > 2))
Access method: heap

test_pub=# alter publication p1 set table t1;
ALTER PUBLICATION
test_pub=# \d+ t1
                                                Table "public.t1"
 Column |  Type   | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 a      | integer |           | not null |         | plain   |             |              |
Indexes:
    "t1_pkey" PRIMARY KEY, btree (a)
Publications:
    "p1"
    "p2" WHERE ((a > 2))
Access method: heap

-- 
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
On 11/30/21, 9:51 AM, "Stephen Frost" wrote: > I disagree that that’s a satisfactory approach. It certainly wasn’t > intended or documented as part of the original feature and therefore > to call it satisfactory strikes me quite strongly as revisionist > history. It looks like the exclusive way has been marked deprecated in all supported versions along with a note that it will eventually be removed. If it's not going to be removed out of fear of breaking backward compatibility, I think the documentation should be updated to say that. However, unless there is something that is preventing users from switching to the non-exclusive approach, I think it is reasonable to begin thinking about removing it. Nathan
Re: Correct handling of blank/commented lines in PSQL interactive-mode history
On Tue, Nov 30, 2021 at 04:35:13PM -0500, Tom Lane wrote:
> Bruce Momjian writes:
>> On Mon, Nov 29, 2021 at 11:12:35PM -0500, Tom Lane wrote:
>>> Interesting. I can probably adjust my MUA to send "text/x-patch",
>>> but I'll have to look around to see where that's determined.
>
>> I would be interesting to know if "text/x-patch" is better than
>> "text/x-diff" --- I currently use the later.
>
> I found out that where that is coming from is "file -i", so I'm a
> bit loath to modify it. Is there any hard documentation as to why
> "text/x-patch" should be preferred?

I thought this was happening from /etc/mime.types:

	text/x-diff				diff patch

The file extensions 'diff' and 'patch' trigger mime to use text/x-diff for its attachments, at least on Debian. Based on that, I assumed "text/x-diff" was more standardized than "text/x-patch". However, it seems file -i also looks at the contents, since a file with a single word in it is not recognized as a diff:

	$ git diff > /rtmp/x.diff
	$ file -i /rtmp/x.diff
	/rtmp/x.diff: text/x-diff; charset=us-ascii

	---

	$ echo test > /rtmp/x.diff
	$ file -i /rtmp/x.diff
	/rtmp/x.diff: text/plain; charset=us-ascii

-- 
Bruce Momjian  https://momjian.us
EDB            https://enterprisedb.com

If only the physical world exists, free will is an illusion.
Re: Correct handling of blank/commented lines in PSQL interactive-mode history
Bruce Momjian writes:
> On Mon, Nov 29, 2021 at 11:12:35PM -0500, Tom Lane wrote:
>> Interesting. I can probably adjust my MUA to send "text/x-patch",
>> but I'll have to look around to see where that's determined.

> I would be interesting to know if "text/x-patch" is better than
> "text/x-diff" --- I currently use the later.

I found out that where that is coming from is "file -i", so I'm a bit loath to modify it. Is there any hard documentation as to why "text/x-patch" should be preferred?

			regards, tom lane
Re: Correct handling of blank/commented lines in PSQL interactive-mode history
On Mon, Nov 29, 2021 at 11:12:35PM -0500, Tom Lane wrote:
> Greg Nancarrow writes:
>> After a bit of investigation, it seems that patch attachments (like
>> yours) with a Content-Type of "text/x-diff" download through Gmail in
>> CRLF format for me (I'm running a browser on Windows, but my Postgres
>> development environment is in a Linux VM). So those must get converted
>> from Unix to CRLF format if downloaded using a browser running on
>> Windows.
>> The majority of patch attachments seem to have a Content-Type of
>> "application/octet-stream" or "text/x-patch", and these seem to
>> download raw (in their original Unix format).
>
> Interesting. I can probably adjust my MUA to send "text/x-patch",
> but I'll have to look around to see where that's determined.
> (I dislike using "application/octet-stream" for this, because
> the archives won't show that as text, they only let you download
> the attachment. Maybe that's more Safari's fault than the
> archives per se, not sure.)

It would be interesting to know if "text/x-patch" is better than "text/x-diff" --- I currently use the latter.

-- 
Bruce Momjian  https://momjian.us
EDB            https://enterprisedb.com

If only the physical world exists, free will is an illusion.
Re: Non-superuser subscription owners
On Tue, 2021-11-30 at 17:25 +0530, Amit Kapila wrote:
> I think it would be better to do it before we allow subscription
> owners to be non-superusers.

There are a couple other things to consider before allowing non-superusers to create subscriptions anyway. For instance, a non-superuser shouldn't be able to use a connection string that reads the certificate file from the server unless they also have pg_read_server_files privs.

> Yeah, it is possible that is why I suggested in one of the emails
> above to allow changing the owners only for disabled subscriptions.

The current patch detects the following cases at the transaction boundary:

* ALTER SUBSCRIPTION ... OWNER TO ...
* ALTER ROLE ... NOSUPERUSER
* privileges revoked one way or another (aside from the RLS/WCO problems, which will be fixed)

If we want to detect at row boundaries we need to capture all of those cases too, or else we're being inconsistent. The latter two cannot be tied to whether the subscription is disabled or not, so I don't think that's a complete solution.

How about (as a separate patch) we just do maybe_reread_subscription() every K operations within a transaction? That would speed up permissions errors if a revoke happens.

Regards,
Jeff Davis
Re: allowing "map" for password auth methods with clientcert=verify-full
On Mon, 2021-11-08 at 15:32 -0500, Stephen Frost wrote: > Where does that leave us with what Jonathan is suggesting though? For > my 2c, we shouldn't allow 'map=' to be used for scram or md5 because > it'll just confuse users, until and unless we actually do the PGAUTHUSER > thing and then we can allow 'map=' to check if that mapping is allowed > and the actual SCRAM PW check is done against PGAUTHUSER and then the > logged in user is the user as specified in the startup packet, assuming > that mapping is allowed. For Jonathan's actual case though, we should > add a 'certmap' option instead and have that be explicitly for the case > where it's scram w/ clientcert=verify-full and then we check the mapping > of the DN/CN to the startup-packet username. There's no reason this > couldn't also work with a 'map' specified and PGAUTHUSER set and SCRAM > used to verify against that at the same time. Agreed. > Perhaps not a surprise, but I continue to be against the idea of adding > anything more to the insecure hack that is our LDAP auth method. We > should be moving away from that, not adding to it. Paraphrasing you from earlier, the "authenticate as one user and then log in as another" use case is the one I'm trying to expand. LDAP is just the auth method I happen to have present-day customer cases for. > That this would also require a new connection option / envvar to tell us > who the user wants to authenticate to LDAP as doesn't exactly make me > any more thrilled with it. Forgetting the LDAP part for a moment, do you have another suggestion for how we can separate the role name from the user name? The connection string seemed to be the most straightforward. --Jacob
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
On Tue, Nov 23, 2021 at 5:01 PM Peter Geoghegan wrote:
> > Behaviour that lead to a "sudden" falling over, rather than getting gradually
> > worse are bad - they somehow tend to happen on Friday evenings :).
>
> These are among our most important challenges IMV.

I haven't had time to work through any of your feedback just yet -- though it's certainly a priority for me. I won't get to it until I return home from PGConf NYC next week. Even still, here is a rebased v2, just to fix the bitrot. This is just a courtesy to anybody interested in the patch.

-- 
Peter Geoghegan

v2-0001-Simplify-lazy_scan_heap-s-handling-of-scanned-pag.patch
Description: Binary data
Re: Rename dead_tuples to dead_items in vacuumlazy.c
On Mon, Nov 29, 2021 at 7:00 PM Masahiko Sawada wrote:
> Thanks! I'll change my parallel vacuum refactoring patch accordingly.

Thanks again for working on that.

> Regarding the commit, I think that there still is one place in
> vacuumlazy.c where we can change "dead tuples" to "dead items":
>
>     /*
>      * Allocate the space for dead tuples.  Note that this handles parallel
>      * VACUUM initialization as part of allocating shared memory space used
>      * for dead_items.
>      */
>     dead_items_alloc(vacrel, params->nworkers);
>     dead_items = vacrel->dead_items;

Oops. Pushed a fixup for that just now.

> Also, the commit doesn't change both PROGRESS_VACUUM_MAX_DEAD_TUPLES
> and PROGRESS_VACUUM_NUM_DEAD_TUPLES. Did you leave them on purpose?

That was deliberate. It would be a bit strange to alter these constants without also updating the corresponding column names for the pg_stat_progress_vacuum system view. But if I kept the definition from system_views.sql in sync, then I would break user scripts -- for reasons that users don't care about. That didn't seem like the right approach.

Also, the system as a whole still assumes "DEAD tuples and LP_DEAD items are the same, and are just as much of a problem in the table as they are in each index". As you know, this is not really true, which is an important problem for us. Fixing it (perhaps as part of adding something like Robert's conveyor belt design) will likely require revising this model quite fundamentally (e.g., the vacthresh calculation in autovacuum.c:relation_needs_vacanalyze() would be replaced). When this happens, we'll probably need to update system views that have columns with names like "dead_tuples" -- because maybe we no longer specifically count dead items/tuples at all.
I strongly suspect that the approach to statistics that we take for pg_statistic optimizer stats just doesn't work for dead items/tuples -- statistical sampling only produces useful statistics for the optimizer because certain delicate assumptions are met (even these assumptions only really work with a properly normalized database schema). Maybe revising the model used for autovacuum scheduling wouldn't include changing pg_stat_progress_vacuum, since that isn't technically "part of the model" --- I'm not sure. But it's not something that I am in a hurry to fix. -- Peter Geoghegan
Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd
On Tue, Nov 30, 2021 at 4:47 AM Andy Fan wrote: >> my exception should be that the relcache should not be invalidated _after >> the first relation open_ >> in the executor (not the beginning of executorRun)。 > > s/exception/expectation. > > To be more accurate, my expectation is for a single sql statement, after > the first time I write data into > the relation, until the statement is completed or "aborted and RelationClose > is called in ResourceOwnerRelease", > the relcache reset should never happen. Well I'm not sure why you asked the question and then argued with the answer you got. I mean, you can certainly decide how you think it works, but that's not how I think it works. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] Don't block HOT update by BRIN index
OK, I've polished the last version of the patch a bit (added a regression test with update of attribute in index predicate and docs about the new flag into indexam.sgml) and pushed.

I wonder if we could/should improve handling of index predicates. In particular, it seems to me we could simply ignore indexes when the new row does not match the index predicate. For example, if there's an index

    CREATE INDEX ON t (a) WHERE b = 1;

and the update does:

    UPDATE t SET b = 2 WHERE ...;

then we'll not add the tuple pointer to this index anyway, and we could simply ignore this index when considering HOT. But I might be missing something important about HOT ...

The main problem I see with this is that it requires evaluating the index predicate for each tuple, which makes it incompatible with the caching in RelationGetIndexAttrBitmap. Just ditching the caching seems like a bad idea, so we'd probably have to do this in two phases:

1) Do what we do now, i.e. RelationGetIndexAttrBitmap considering all indexes / attributes. If this says HOT is possible, great -- we're done.

2) If (1) says HOT is not possible, we need to look at whether it's because of a regular or a partial index. For regular indexes it's clear; for partial indexes we could ignore the index if the predicate evaluates to false for the new row.

But even if such an optimization is possible, it's way out of scope of this patch and it's not clear to me it's actually a sensible trade-off.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Support for NSS as a libpq TLS backend
On Mon, 2021-09-27 at 15:44 +0200, Daniel Gustafsson wrote:
>> Speaking of IP addresses in SANs, it doesn't look like our OpenSSL
>> backend can handle those. That's a separate conversation, but I might
>> take a look at a patch for next commitfest.
>
> Please do.

Didn't get around to it for November, but I'm putting the finishing touches on that now.

While I was looking at the new SAN code (in fe-secure-nss.c, pgtls_verify_peer_name_matches_certificate_guts()), I noticed that code coverage never seemed to touch a good chunk of it:

> +	for (cn = san_list; cn != san_list; cn = CERT_GetNextGeneralName(cn))
> +	{
> +		char	   *alt_name;
> +		int			rv;
> +		char		tmp[512];

That loop can never execute. But I wonder if all of that extra SAN code should be removed anyway? There's this comment above it:

> +	/*
> +	 * CERT_VerifyCertName will internally perform RFC 2818 SubjectAltName
> +	 * verification.
> +	 */

and it seems like SAN verification is working in my testing, despite the dead loop.

--Jacob
Re: pg_replslotdata - a tool for displaying replication slot information
On 11/30/21, 6:14 AM, "Peter Eisentraut" wrote:
> On 23.11.21 06:09, Bharath Rupireddy wrote:
>> The replication slots data is stored in binary format on the disk under
>> the pg_replslot/<> directory which isn't human readable. If
>> the server is crashed/down (for whatever reasons) and unable to come up,
>> currently there's no way for the user/admin/developer to know what were
>> all the replication slots available at the time of server crash/down to
>> figure out what's the restart lsn, xid, two phase info or types of slots
>> etc.
>
> What do you need that for? You can't do anything with a replication
> slot while the server is down.

One use case might be to discover the value you need to set for max_replication_slots, although it's pretty trivial to discover the number of replication slots by looking at the folder directly. However, you also need to know how many replication origins there are, and AFAIK there isn't an easy way to read the replorigin_checkpoint file at the moment. IMO a utility like this should also show details for the replication origins. I don't have any other compelling use cases at the moment, but I will say that it is typically nice from an administrative standpoint to be able to inspect things like this without logging into a running server.

Nathan
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
Greetings, On Tue, Nov 30, 2021 at 11:47 Laurenz Albe wrote: > On Tue, 2021-11-30 at 09:20 -0500, Stephen Frost wrote: > > Greetings, > > > > * Tom Lane (t...@sss.pgh.pa.us) wrote: > > > Michael Paquier writes: > > > > On Thu, Nov 25, 2021 at 06:19:03PM -0800, SATYANARAYANA NARLAPURAM > wrote: > > > > > If we are keeping it then why not make it better? > > > > > > > Well, non-exclusive backups are better by design in many aspects, so > I > > > > don't quite see the point in spending time on something that has more > > > > limitations than what's already in place. > > > > > > IMO the main reason for keeping it is backwards compatibility for users > > > who have a satisfactory backup arrangement using it. That same > argument > > > implies that we shouldn't change how it works (at least, not very > much). > > > > There isn't a satisfactory backup approach using it specifically because > > of this issue, hence why we should remove it to make it so users don't > > run into this. > > There is a satisfactory approach, as long as you are satisfied with > manually restarting the server if it crashed during a backup. I disagree that that’s a satisfactory approach. It certainly wasn’t intended or documented as part of the original feature and therefore to call it satisfactory strikes me quite strongly as revisionist history. > I don't find the reasons brought up to continue to support exclusive > > backup to be at all compelling and the lack of huge issues with the new > > way restore works to make it abundently clear that we can, in fact, > > remove exclusive backup in a major version change without the world > > coming down. > > I guess the lack of hue and cry was at least to a certain extent because > the exclusive backup API was deprecated, but not removed. These comments were in reference to the restore API, which was quite changed (new special files that have to be touched, removing of recovery.conf, options moved to postgresql.conf/.auto, etc). So, no. 
Thanks,

Stephen
Re: Atomic rename feature for Windows.
Hi

The flags for calling the CreateFile function have been changed.

Victor Spirin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

05.07.2021 16:53, Victor Spirin wrote:

Hi

I used the SetFileInformationByHandle function with the FILE_RENAME_FLAG_POSIX_SEMANTICS flag for the file rename function.

1) The _WIN32_WINNT variable needs to be increased to 0x0A00 (Windows 10). Fixed conflict with #undef CHECKSUM_TYPE_NONE.

2) The SetFileInformationByHandle function works correctly only on Windows 10 and higher. The app must have a manifest to check the Windows version using the IsWindows10OrGreater() function. I added a manifest to all postgres projects and disabled the GenerateManifest option on windows projects.

This patch is related to this post: https://www.postgresql.org/message-id/CAEepm%3D0FV-k%2B%3Dd9z08cW%3DZXoR1%3Dkw9wdpkP6WAuOrKJdz-8ujg%40mail.gmail.com

diff --git a/src/include/common/checksum_helper.h b/src/include/common/checksum_helper.h
index cac7570ea1..2d533c93a6 100644
--- a/src/include/common/checksum_helper.h
+++ b/src/include/common/checksum_helper.h
@@ -26,6 +26,13 @@
  * MD5 here because any new that does need a cryptographically strong checksum
  * should use something better.
  */
+
+/*
+ * CHECKSUM_TYPE_NONE defined in the winioctl.h when _WIN32_WINNT >= _WIN32_WINNT_WIN10
+ */
+#ifdef CHECKSUM_TYPE_NONE
+#undef CHECKSUM_TYPE_NONE
+#endif
 typedef enum pg_checksum_type
 {
 	CHECKSUM_TYPE_NONE,
diff --git a/src/include/port/win32.h b/src/include/port/win32.h
index c6213c77c3..c48a6ce3a1 100644
--- a/src/include/port/win32.h
+++ b/src/include/port/win32.h
@@ -12,12 +12,13 @@
 /*
  * Make sure _WIN32_WINNT has the minimum required value.
  * Leave a higher value in place. When building with at least Visual
- * Studio 2015 the minimum requirement is Windows Vista (0x0600) to
- * get support for GetLocaleInfoEx() with locales. For everything else
+ * Studio 2015 the minimum requirement is Windows 10 (0x0A00) to get support for SetFileInformationByHandle.
+ * The minimum requirement is Windows Vista (0x0600) get support for GetLocaleInfoEx() with locales.
+ * For everything else
  * the minimum version is Windows XP (0x0501).
  */
 #if defined(_MSC_VER) && _MSC_VER >= 1900
-#define MIN_WINNT 0x0600
+#define MIN_WINNT 0x0A00
 #else
 #define MIN_WINNT 0x0501
 #endif
diff --git a/src/port/dirmod.c b/src/port/dirmod.c
index 763bc5f915..031655f0c8 100644
--- a/src/port/dirmod.c
+++ b/src/port/dirmod.c
@@ -39,6 +39,192 @@
 #endif
 #endif
 
+
+#if defined(WIN32) && !defined(__CYGWIN__) && defined(_WIN32_WINNT_WIN10) && _WIN32_WINNT >= _WIN32_WINNT_WIN10
+
+#include
+
+/*
+ * Checks Windows version using RtlGetVersion
+ * Version 1607 (Build 14393) is required for SetFileInformationByHandle
+ * function with FILE_RENAME_FLAG_POSIX_SEMANTICS flag
+ */
+typedef NTSYSAPI(NTAPI * PFN_RTLGETVERSION)
+(OUT PRTL_OSVERSIONINFOEXW lpVersionInformation);
+
+static int isWin1607 = -1;
+static int isWindows1607OrGreater()
+{
+	HMODULE ntdll;
+	PFN_RTLGETVERSION _RtlGetVersion = NULL;
+	OSVERSIONINFOEXW info;
+	if (isWin1607 >= 0) return isWin1607;
+	ntdll = LoadLibraryEx("ntdll.dll", NULL, 0);
+	if (ntdll == NULL)
+	{
+		DWORD err = GetLastError();
+
+		_dosmaperr(err);
+		return -1;
+	}
+
+	_RtlGetVersion = (PFN_RTLGETVERSION)(pg_funcptr_t)
+		GetProcAddress(ntdll, "RtlGetVersion");
+	if (_RtlGetVersion == NULL)
+	{
+		DWORD err = GetLastError();
+
+		FreeLibrary(ntdll);
+		_dosmaperr(err);
+		return -1;
+	}
+	info.dwOSVersionInfoSize = sizeof(OSVERSIONINFOEXW);
+	if (!NT_SUCCESS(_RtlGetVersion(&info)))
+	{
+		DWORD err = GetLastError();
+
+		FreeLibrary(ntdll);
+		_dosmaperr(err);
+		return -1;
+	}
+
+	if (info.dwMajorVersion >= 10 && info.dwBuildNumber >= 14393) isWin1607 = 1;
+	else isWin1607 = 0;
+	FreeLibrary(ntdll);
+	return isWin1607;
+
+}
+
+typedef struct _FILE_RENAME_INFO_EXT {
+	FILE_RENAME_INFO fri;
+	WCHAR extra_space[MAX_PATH];
+} FILE_RENAME_INFO_EXT;
+
+/*
+ * pgrename_windows_posix_semantics - uses SetFileInformationByHandle function
+ * with FILE_RENAME_FLAG_POSIX_SEMANTICS flag for atomic rename file
+ * working only on Windows 10 (1607) or later and _WIN32_WINNT must be >= _WIN32_WINNT_WIN10
+ */
+static int
+pgrename_windows_posix_semantics(const char *from, const char *to)
+{
+	int err;
+	FILE_RENAME_INFO_EXT rename_info;
+	PFILE_RENAME_INFO prename_info;
+	HANDLE f_handle;
+	wchar_t from_w[MAX_PATH];
+
+	prename_info = (PFILE_RENAME_INFO) &rename_info;
+
+	if (MultiByteToWideChar(CP_ACP, 0, (LPCCH)from, -1, (LPWSTR)from_w, MAX_PATH) == 0) {
+		err = GetLastError();
+
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
On Tue, 2021-11-30 at 09:20 -0500, Stephen Frost wrote: > Greetings, > > * Tom Lane (t...@sss.pgh.pa.us) wrote: > > Michael Paquier writes: > > > On Thu, Nov 25, 2021 at 06:19:03PM -0800, SATYANARAYANA NARLAPURAM wrote: > > > > If we are keeping it then why not make it better? > > > > > Well, non-exclusive backups are better by design in many aspects, so I > > > don't quite see the point in spending time on something that has more > > > limitations than what's already in place. > > > > IMO the main reason for keeping it is backwards compatibility for users > > who have a satisfactory backup arrangement using it. That same argument > > implies that we shouldn't change how it works (at least, not very much). > > There isn't a satisfactory backup approach using it specifically because > of this issue, hence why we should remove it to make it so users don't > run into this. There is a satisfactory approach, as long as you are satisfied with manually restarting the server if it crashed during a backup. > I don't find the reasons brought up to continue to support exclusive > backup to be at all compelling and the lack of huge issues with the new > way restore works to make it abundently clear that we can, in fact, > remove exclusive backup in a major version change without the world > coming down. I guess the lack of hue and cry was at least to a certain extent because the exclusive backup API was deprecated, but not removed. Yours, Laurenz Albe
Re: Skipping logical replication transactions on subscriber side
On Tue, Nov 30, 2021 at 7:09 PM Masahiko Sawada wrote: > > On Tue, Nov 30, 2021 at 8:41 PM Masahiko Sawada wrote: > > > > On Tue, Nov 30, 2021 at 6:28 PM Amit Kapila wrote: > > > > > > On Mon, Nov 29, 2021 at 11:38 AM vignesh C wrote: > > > > > > > > > > I have pushed this patch and there is a buildfarm failure for it. See: > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder=2021-11-30%2005%3A05%3A25 > > > > > > Sawada-San has shared his initial analysis on pgsql-committers [1] and > > > I am responding here as the fix requires some more discussion. > > > > > > > Looking at the result the test actually got, we had two error entries > > > > for test_tab1 instead of one: > > > > > > > > # Failed test 'check the error reported by the apply worker' > > > > # at t/026_worker_stats.pl line 33. > > > > # got: 'tap_sub|INSERT|test_tab1|t > > > > # tap_sub||test_tab1|t' > > > > # expected: 'tap_sub|INSERT|test_tab1|t' > > > > > > > > The possible scenarios are: > > > > > > > > The table sync worker for test_tab1 failed due to an error unrelated > > > > to apply changes: > > > > > > > > 2021-11-30 06:24:02.137 CET [18990:2] ERROR: replication origin with > > > > OID 2 is already active for PID 23706 > > > > > > > > At this time, the view had one error entry for the table sync worker. > > > > After retrying table sync, it succeeded: > > > > > > > > 2021-11-30 06:24:04.202 CET [28117:2] LOG: logical replication table > > > > synchronization worker for subscription "tap_sub", table "test_tab1" > > > > has finished > > > > > > > > Then after inserting a row on the publisher, the apply worker inserted > > > > the row but failed due to violating a unique key violation, which is > > > > expected: > > > > > > > > 2021-11-30 06:24:04.307 CET [4806:2] ERROR: duplicate key value > > > > violates unique constraint "test_tab1_pkey" > > > > 2021-11-30 06:24:04.307 CET [4806:3] DETAIL: Key (a)=(1) already > > > > exists. 
> > > > 2021-11-30 06:24:04.307 CET [4806:4] CONTEXT: processing remote data > > > > during "INSERT" for replication target relation "public.test_tab1" in > > > > transaction 721 at 2021-11-30 06:24:04.305096+01 > > > > > > > > As a result, we had two error entries for test_tab1: the table sync > > > > worker error and the apply worker error. I didn't expect that the > > > > table sync worker for test_tab1 failed due to "replication origin with > > > > OID 2 is already active for PID 23706” error. > > > > > > > > Looking at test_subscription_error() in 026_worker_stats.pl, we have > > > > two checks; in the first check, we wait for the view to show the error > > > > entry for the given relation name and xid. This check was passed since > > > > we had the second error (i.g., apply worker error). In the second > > > > check, we get error entries from pg_stat_subscription_workers by > > > > specifying only the relation name. Therefore, we ended up getting two > > > > entries and failed the tests. > > > > > > > > To fix this issue, I think that in the second check, we can get the > > > > error from pg_stat_subscription_workers by specifying the relation > > > > name *and* xid like the first check does. I've attached the patch. > > > > What do you think? > > > > > > > > > > I think this will fix the reported failure but there is another race > > > condition in the test. Isn't it possible that for table test_tab2, we > > > get an error "replication origin with OID ..." or some other error > > > before copy, in that case also, we will proceed from the second call > > > of test_subscription_error() which is not what we expect in the test? > > > > Right. > > > > > Shouldn't we someway check that the error message also starts with > > > "duplicate key value violates ..."? > > > > Yeah, I think it's a good idea to make the checks more specific. 
That > > is, probably we can specify the prefix of the error message and > > subrelid in addition to the current conditions: relid and xid. That > > way, we can check what error was reported by which workers (tablesync > > or apply) for which relations. And both check queries in > > test_subscription_error() can have the same WHERE clause. > > I've attached a patch that fixes this issue. Please review it. Thanks for the updated patch; it applies neatly and make check-world passes. I also ran the previously failing test in a loop and it passed every time. Regards, Vignesh
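For what it's worth, the shape of the fix being tested here (matching an error entry on relation, xid, and the start of the error message, all in one WHERE clause) can be sketched as a plain filter. This is an illustrative sketch only: the field names are hypothetical, and the real test queries pg_stat_subscription_workers from a Perl TAP test.

```python
# Illustrative sketch only: mimic matching an error entry on relation,
# xid, AND the start of the error message, as the fixed test query does.
# The dict keys here are hypothetical, not the actual view's columns.

def matching_errors(entries, relname, xid, msg_prefix):
    """Return only the entries that satisfy all three conditions."""
    return [e for e in entries
            if e["relname"] == relname
            and e["xid"] == xid
            and e["message"].startswith(msg_prefix)]

entries = [
    # table sync worker failure, unrelated to applying changes
    {"relname": "test_tab1", "xid": 720,
     "message": "replication origin with OID 2 is already active"},
    # the apply worker error the test actually expects
    {"relname": "test_tab1", "xid": 721,
     "message": "duplicate key value violates unique constraint"},
]

hits = matching_errors(entries, "test_tab1", 721,
                       "duplicate key value violates")
assert len(hits) == 1   # the unrelated table sync entry no longer matches
```

With all three conditions required at once, the unrelated "replication origin ... is already active" entry left over from the table sync retry can no longer satisfy the check, which is what the race fix relies on.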
Re: row filtering for logical replication
On Tue, Nov 30, 2021 at 12:33 PM Ajin Cherian wrote: > > On Thu, Nov 25, 2021 at 2:22 PM Peter Smith wrote: > > > > Thanks for all the review comments so far! We are endeavouring to keep > > pace with them. > > > > All feedback is being tracked and we will fix and/or reply to everything > > ASAP. > > > > Meanwhile, PSA the latest set of v42* patches. > > > > This version was mostly a patch restructuring exercise but it also > > addresses some minor review comments in passing. > > > > Addressed more review comments, in the attached patch-set v43. 5 > patches carried forward from v42. > This patch-set contains the following fixes: > > On Tue, Nov 23, 2021 at 1:28 AM Dilip Kumar wrote: > > > > in pgoutput_row_filter, we are dropping the slots if there are some > > old slots in the RelationSyncEntry. But then I noticed that in > > rel_sync_cache_relation_cb(), also we are doing that but only for the > > scantuple slot. So IMHO, rel_sync_cache_relation_cb(), is only place > > setting entry->rowfilter_valid to false; so why not drop all the slot > > that time only and in pgoutput_row_filter(), you can just put an > > assert? > > > > Moved all the dropping of slots to rel_sync_cache_relation_cb() > > > +static bool > > +pgoutput_row_filter_virtual(Relation relation, TupleTableSlot *slot, > > RelationSyncEntry *entry) > > +{ > > +EState *estate; > > +ExprContext *ecxt; > > > > > > pgoutput_row_filter_virtual and pgoutput_row_filter are exactly same > > except, ExecStoreHeapTuple(), so why not just put one check based on > > whether a slot is passed or not, instead of making complete duplicate > > copy of the function. > > Removed pgoutput_row_filter_virtual > > > oldctx = MemoryContextSwitchTo(CacheMemoryContext); > > tupdesc = CreateTupleDescCopy(tupdesc); > > entry->scantuple = MakeSingleTupleTableSlot(tupdesc, > > ); > > > > Why do we need to copy the tupledesc? 
do we think that we need to have > > this slot even if we close the relation, if so can you add the > > comments explaining why we are making a copy here. > > This code has been modified, and comments added. > > On Tue, Nov 23, 2021 at 8:02 PM Amit Kapila wrote: > > One more thing related to this code: > > pgoutput_row_filter() > > { > > .. > > + if (!entry->rowfilter_valid) > > { > > .. > > + oldctx = MemoryContextSwitchTo(CacheMemoryContext); > > + tupdesc = CreateTupleDescCopy(tupdesc); > > + entry->scantuple = MakeSingleTupleTableSlot(tupdesc, ); > > + MemoryContextSwitchTo(oldctx); > > .. > > } > > > > Why do we need to initialize scantuple here unless we are sure that > > the row filter is going to get associated with this relentry? I think > > when there is no row filter then this allocation is not required. > > > > Modified as suggested. > > On Tue, Nov 23, 2021 at 10:52 PM Amit Kapila wrote: > > > > In 0003 patch, why is below change required? > > --- a/src/backend/replication/pgoutput/pgoutput.c > > +++ b/src/backend/replication/pgoutput/pgoutput.c > > @@ -1,4 +1,4 @@ > > -/*- > > +/* > > * > > * pgoutput.c > > > > Removed. > > > > > After above, rearrange the code in pgoutput_row_filter(), so that two > > different checks related to 'rfisnull' (introduced by different > > patches) can be combined as if .. else check. > > > Fixed. > > On Thu, Nov 25, 2021 at 12:03 AM Amit Kapila wrote: > > > > + * If the new relation or the old relation has a where clause, > > + * we need to remove it so that it can be added afresh later. > > + */ > > + if (RelationGetRelid(newpubrel->relation) == oldrelid && > > + newpubrel->whereClause == NULL && rfisnull) > > > > Can't we use _equalPublicationTable() here? It compares the whereClause as > > well. > > > > Tried this, can't do this because one is an alter statement while the > other is a publication, the whereclause is not > the same Nodetype. 
In the statement, the whereclause is T_A_Expr, > while in the publication > catalog, it is T_OpExpr. Here we will not be able to do a direct comparison as we store the transformed where clause in the pg_publication_rel table. We will have to transform the where clause and then check. I have attached a patch where we can check the transformed where clause and see if the where clause is the same or not. If you are ok with this approach you could make similar changes. Regards, Vignesh

diff --git a/src/backend/catalog/pg_publication.c b/src/backend/catalog/pg_publication.c
index b80c21e6ae..ae4a46e44a 100644
--- a/src/backend/catalog/pg_publication.c
+++ b/src/backend/catalog/pg_publication.c
@@ -359,6 +359,32 @@ GetPubPartitionOptionRelations(List *result, PublicationPartOpt pub_partopt,
 	return result;
 }
 
+Node *
+GetTransformedWhereClause(ParseState *pstate, PublicationRelInfo *pri,
+						  bool bfixupcollation)
+{
+	ParseNamespaceItem *nsitem;
+	Node
Re: [PATCH] DROP tab completion
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not tested

The patch applies cleanly and the functionality seems to work well. (master e7122548a3)

The new status of this patch is: Ready for Committer
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Michael Paquier writes: > > On Thu, Nov 25, 2021 at 06:19:03PM -0800, SATYANARAYANA NARLAPURAM wrote: > >> If we are keeping it then why not make it better? > > > Well, non-exclusive backups are better by design in many aspects, so I > > don't quite see the point in spending time on something that has more > > limitations than what's already in place. > > IMO the main reason for keeping it is backwards compatibility for users > who have a satisfactory backup arrangement using it. That same argument > implies that we shouldn't change how it works (at least, not very much). There isn't a satisfactory backup approach using it, specifically because of this issue, hence we should remove it so that users don't run into this. That would also simplify the documentation around the low-level backup API, which would be a very good thing. Right now, making improvements in that area is very challenging even if all you want to do is improve the documentation around the non-exclusive API. We dealt with this as best as one could in pgbackrest for PG versions prior to when non-exclusive backup was added, which is to remove the backup_label file as soon as possible and then put it back right before you call pg_stop_backup() (since it'll complain otherwise). Not a perfect answer though, and a risk still exists there of a failed restart happening. Of course, for versions which support non-exclusive backup, we use that to avoid this issue. We also extensively changed how restore works a couple of releases ago and while there was some noise about it, it certainly wasn't all that bad. I don't find the reasons brought up to continue to support exclusive backup at all compelling, and the lack of huge issues with the new way restore works makes it abundantly clear that we can, in fact, remove exclusive backup in a major version change without the world coming down. Thanks, Stephen
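The pgbackrest workaround described above boils down to a small file shuffle: move backup_label out of the way as soon as the backup starts, so that a crash-restart cannot read it, and put it back just before pg_stop_backup(). A rough, Postgres-free sketch of that shuffle (all paths and the label contents here are made up for illustration):

```python
import os
import shutil
import tempfile

# Illustrative sketch of the workaround described above: park backup_label
# outside the data directory during the backup, restore it before "stop".
# Directory layout and file contents are hypothetical.
datadir = tempfile.mkdtemp()
stash = tempfile.mkdtemp()

label = os.path.join(datadir, "backup_label")
with open(label, "w") as f:
    f.write("START WAL LOCATION: 0/2000028\n")   # made-up content

# 1. Right after starting the backup, move the label aside so a
#    crash-restart of the server cannot trip over it.
shutil.move(label, os.path.join(stash, "backup_label"))
assert not os.path.exists(label)

# 2. Just before pg_stop_backup(), put it back (it complains otherwise).
shutil.move(os.path.join(stash, "backup_label"), label)
assert os.path.exists(label)
```

The residual risk Stephen mentions sits in the gap between the two steps: if the backup is abandoned while the label is parked, something still has to clean it up by hand.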
Re: pg_replslotdata - a tool for displaying replication slot information
On 23.11.21 06:09, Bharath Rupireddy wrote: The replication slots data is stored in binary format on the disk under the pg_replslot/<> directory which isn't human readable. If the server is crashed/down (for whatever reasons) and unable to come up, currently there's no way for the user/admin/developer to know what were all the replication slots available at the time of server crash/down to figure out what's the restart lsn, xid, two phase info or types of slots etc. What do you need that for? You can't do anything with a replication slot while the server is down.
Re: Skipping logical replication transactions on subscriber side
On Tue, Nov 30, 2021 at 8:41 PM Masahiko Sawada wrote: > > On Tue, Nov 30, 2021 at 6:28 PM Amit Kapila wrote: > > > > On Mon, Nov 29, 2021 at 11:38 AM vignesh C wrote: > > > > > > > I have pushed this patch and there is a buildfarm failure for it. See: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder=2021-11-30%2005%3A05%3A25 > > > > Sawada-San has shared his initial analysis on pgsql-committers [1] and > > I am responding here as the fix requires some more discussion. > > > > > Looking at the result the test actually got, we had two error entries > > > for test_tab1 instead of one: > > > > > > # Failed test 'check the error reported by the apply worker' > > > # at t/026_worker_stats.pl line 33. > > > # got: 'tap_sub|INSERT|test_tab1|t > > > # tap_sub||test_tab1|t' > > > # expected: 'tap_sub|INSERT|test_tab1|t' > > > > > > The possible scenarios are: > > > > > > The table sync worker for test_tab1 failed due to an error unrelated > > > to apply changes: > > > > > > 2021-11-30 06:24:02.137 CET [18990:2] ERROR: replication origin with > > > OID 2 is already active for PID 23706 > > > > > > At this time, the view had one error entry for the table sync worker. > > > After retrying table sync, it succeeded: > > > > > > 2021-11-30 06:24:04.202 CET [28117:2] LOG: logical replication table > > > synchronization worker for subscription "tap_sub", table "test_tab1" > > > has finished > > > > > > Then after inserting a row on the publisher, the apply worker inserted > > > the row but failed due to violating a unique key violation, which is > > > expected: > > > > > > 2021-11-30 06:24:04.307 CET [4806:2] ERROR: duplicate key value > > > violates unique constraint "test_tab1_pkey" > > > 2021-11-30 06:24:04.307 CET [4806:3] DETAIL: Key (a)=(1) already exists. 
> > > 2021-11-30 06:24:04.307 CET [4806:4] CONTEXT: processing remote data > > > during "INSERT" for replication target relation "public.test_tab1" in > > > transaction 721 at 2021-11-30 06:24:04.305096+01 > > > > > > As a result, we had two error entries for test_tab1: the table sync > > > worker error and the apply worker error. I didn't expect that the > > > table sync worker for test_tab1 failed due to "replication origin with > > > OID 2 is already active for PID 23706” error. > > > > > > Looking at test_subscription_error() in 026_worker_stats.pl, we have > > > two checks; in the first check, we wait for the view to show the error > > > entry for the given relation name and xid. This check was passed since > > > we had the second error (i.g., apply worker error). In the second > > > check, we get error entries from pg_stat_subscription_workers by > > > specifying only the relation name. Therefore, we ended up getting two > > > entries and failed the tests. > > > > > > To fix this issue, I think that in the second check, we can get the > > > error from pg_stat_subscription_workers by specifying the relation > > > name *and* xid like the first check does. I've attached the patch. > > > What do you think? > > > > > > > I think this will fix the reported failure but there is another race > > condition in the test. Isn't it possible that for table test_tab2, we > > get an error "replication origin with OID ..." or some other error > > before copy, in that case also, we will proceed from the second call > > of test_subscription_error() which is not what we expect in the test? > > Right. > > > Shouldn't we someway check that the error message also starts with > > "duplicate key value violates ..."? > > Yeah, I think it's a good idea to make the checks more specific. That > is, probably we can specify the prefix of the error message and > subrelid in addition to the current conditions: relid and xid. 
That > way, we can check what error was reported by which workers (tablesync > or apply) for which relations. And both check queries in > test_subscription_error() can have the same WHERE clause. I've attached a patch that fixes this issue. Please review it. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/ 0001-Fix-regression-test-failure-caused-by-commit-8d74fc9.patch Description: Binary data
Re: Windows build warnings
> On 27 Nov 2021, at 14:55, Andrew Dunstan wrote: > ISTM the worst case is that there will be undetected unused variables in > Windows-only code. I guess that would mostly be detected by Msys systems > running gcc. Yes, that should be caught there. I've applied this now, together with removing PG_USED_FOR_ASSERTS_ONLY from those variables where it was set even though they are in general use. -- Daniel Gustafsson https://vmware.com/
Re: suboverflowed subtransactions concurrency performance optimize
On Mon, 30 Aug 2021 at 11:25, Andrey Borodin wrote: > > Hi Pengcheng! > > You are solving an important problem, thank you! > > > On 30 Aug 2021, at 13:43, Pengchengliu wrote: > > > > To resolve this performance problem, we think about a solution which cache > > SubtransSLRU to local cache. > > First we can query parent transaction id from SubtransSLRU, and copy the > > SLRU page to local cache page. > > After that if we need query parent transaction id again, we can query it > > from local cache directly. > > A copy of SLRU in each backend's cache can consume a lot of memory. Yes, copying the whole SLRU into local cache seems overkill. > Why create a copy if we can optimise shared representation of SLRU? transam.c uses a single-item cache to prevent thrashing from repeated lookups, which reduces problems with shared access to SLRUs. multixact.c also has something similar. I notice that subtrans.c doesn't have this, but could easily do so. Patch attached, which seems separate from other attempts at tuning. On review, I think it is also possible that we update subtrans ONLY if someone uses >PGPROC_MAX_CACHED_SUBXIDS. This would make subtrans much smaller and avoid one-entry-per-page, which is a major source of caching. This would mean some light changes in GetSnapshotData(). Let me know if that seems interesting also? -- Simon Riggs http://www.EnterpriseDB.com/ subtrans_single_item_cache.v1.patch Description: Binary data
RE: Optionally automatically disable logical replication subscriptions on error
On Monday, November 29, 2021 2:38 PM vignesh C > Thanks for the updated patch, Few comments: Thank you for your review ! > 1) Since this function is used only from 027_disable_on_error and not used by > others, this can be moved to 027_disable_on_error: > +sub wait_for_subscriptions > +{ > + my ($self, $dbname, @subscriptions) = @_; > + > + # Unique-ify the subscriptions passed by the caller > + my %unique = map { $_ => 1 } @subscriptions; > + my @unique = sort keys %unique; > + my $unique_count = scalar(@unique); > + > + # Construct a SQL list from the unique subscription names > + my @escaped = map { s/'/''/g; s/\\//g; $_ } @unique; > + my $sublist = join(', ', map { "'$_'" } @escaped); > + > + my $polling_sql = qq( > + SELECT COUNT(1) = $unique_count FROM > + (SELECT s.oid > + FROM pg_catalog.pg_subscription s > + LEFT JOIN pg_catalog.pg_subscription_rel > sr > + ON sr.srsubid = s.oid > + WHERE (sr IS NULL OR sr.srsubstate IN > ('s', 'r')) > + AND s.subname IN ($sublist) > + AND s.subenabled IS TRUE > +UNION > +SELECT s.oid > + FROM pg_catalog.pg_subscription s > + WHERE s.subname IN ($sublist) > + AND s.subenabled IS FALSE > + ) AS synced_or_disabled > + ); > + return $self->poll_query_until($dbname, $polling_sql); } Fixed. > 2) The empty line after comment is not required, it can be removed > +# Create non-unique data in both schemas on the publisher. > +# > +for $schema (@schemas) > +{ Fixed. > 3) Similarly it can be changed across the file > +# Wait for the initial subscription synchronizations to finish or fail. > +# > +$node_subscriber->wait_for_subscriptions('postgres', @schemas) > + or die "Timed out while waiting for subscriber to synchronize > +data"; > > +# Enter unique data for both schemas on the publisher. This should > +succeed on # the publisher node, and not cause any additional problems > +on the subscriber # side either, though disabled subscription "s1" should not > replicate anything. > +# > +for $schema (@schemas) Fixed. 
> 4) Since subid is used only at one place, no need of subid variable, you could > replace subid with subform->oid in LockSharedObject > + Datum values[Natts_pg_subscription]; > + HeapTuple tup; > + Oid subid; > + Form_pg_subscription subform; > > + subid = subform->oid; > + LockSharedObject(SubscriptionRelationId, subid, 0, > + AccessExclusiveLock); Fixed. > 5) "permissions errors" should be "permission errors" > + Specifies whether the subscription should be automatically > disabled > + if replicating data from the publisher triggers non-transient > errors > + such as referential integrity or permissions errors. The default is > + false. > + Fixed. The new patch v8 is shared in [1]. [1] - https://www.postgresql.org/message-id/TYCPR01MB83735AA021E0F614A3AB3221ED679%40TYCPR01MB8373.jpnprd01.prod.outlook.com Best Regards, Takamichi Osumi
Re: increase size of pg_commit_ts buffers
On Fri, 12 Nov 2021 at 17:39, Tomas Vondra wrote: > So +1 to just get this committed, as it is. This is a real issue affecting Postgres users. Please commit this soon. -- Simon Riggs http://www.EnterpriseDB.com/
RE: Optionally automatically disable logical replication subscriptions on error
On Tuesday, November 30, 2021 1:10 PM Greg Nancarrow wrote: > On Sat, Nov 27, 2021 at 1:36 AM osumi.takami...@fujitsu.com > wrote: > > > > This v7 uses v26 of skip xid patch [1] > This patch no longer applies on the latest source. > Also, the patch is missing an update to doc/src/sgml/catalogs.sgml, for the > new "subdisableonerr" column of pg_subscription. Thanks for your review! Fixed the documentation accordingly. Further, this comment prompted some more refactoring, since I had written some internal code related to 'disable_on_error' in an inconsistent order. I fixed this by keeping the patch's code after that of the 'two_phase' subscription option as much as possible. I also ran both pgindent and pgperltidy. Now I'll share v8, which applies on top of PG after commit 8d74fc9 (pg_stat_subscription_workers). Best Regards, Takamichi Osumi v8-0001-Optionally-disable-subscriptions-on-error.patch Description: v8-0001-Optionally-disable-subscriptions-on-error.patch
Re: Non-superuser subscription owners
On Tue, Nov 30, 2021 at 12:56 AM Jeff Davis wrote: > > On Mon, 2021-11-29 at 09:43 +0530, Amit Kapila wrote: > > The first reason is that way it would be consistent with what we can > > see while doing the operations from the backend. > > Logical replication is not interactive, so it doesn't seem quite the > same. > > If you have a long running INSERT INTO SELECT or COPY FROM, the > permission checks just happen at the beginning. As a user, it wouldn't > surprise me if logical replication was similar. > > > operation. Another reason is to make behavior predictable as users > > can > > always expect when exactly the privilege change will be reflected and > > it won't depend on the number of changes in the transaction. > > This patch does detect ownership changes more quickly (at the > transaction boundary) than the current code (only when it reloads for > some other reason). Transaction boundary seems like a reasonable time > to detect the change to me. > > Detecting faster might be nice, but I don't have a strong opinion about > it and I don't see why it necessarily needs to happen before this patch > goes in. > I think it would be better to do it before we allow subscription owners to be non-superusers. > Also, do you think the cost of doing maybe_reread_subscription() per- > tuple instead of per-transaction would be detectable? > Yeah, it is possible; that is why I suggested in one of the emails above allowing the owner to be changed only for disabled subscriptions. -- With Regards, Amit Kapila.
Re: Non-superuser subscription owners
On Mon, Nov 29, 2021 at 11:52 PM Jeff Davis wrote: > > On Mon, 2021-11-29 at 08:26 -0800, Mark Dilger wrote: > > > > I agree that if we want to do all of this then that would require a > > > lot of changes. However, giving an error for RLS-enabled tables > > > might > > > also be too restrictive. The few alternatives could be that (a) we > > > allow subscription owners to be either have "bypassrls" attribute > > > or > > > they could be superusers. (b) don't allow initial table_sync for > > > rls > > > enabled tables. (c) evaluate/analyze what is required to allow Copy > > > From to start respecting RLS policies. (d) reject replicating any > > > changes to tables that have RLS enabled. > > Maybe a combination? > > Allow subscriptions with copy_data=true iff the subscription owner is > bypassrls or superuser. And then enforce RLS+WCO during > insert/update/delete. > Yeah, that sounds reasonable to me. > I don't think it's a big change (correct me if I'm wrong), > Yeah, I also don't think it should be a big change. -- With Regards, Amit Kapila.
Re: Skipping logical replication transactions on subscriber side
On Tue, Nov 30, 2021 at 6:28 PM Amit Kapila wrote: > > On Mon, Nov 29, 2021 at 11:38 AM vignesh C wrote: > > > > I have pushed this patch and there is a buildfarm failure for it. See: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder=2021-11-30%2005%3A05%3A25 > > Sawada-San has shared his initial analysis on pgsql-committers [1] and > I am responding here as the fix requires some more discussion. > > > Looking at the result the test actually got, we had two error entries > > for test_tab1 instead of one: > > > > # Failed test 'check the error reported by the apply worker' > > # at t/026_worker_stats.pl line 33. > > # got: 'tap_sub|INSERT|test_tab1|t > > # tap_sub||test_tab1|t' > > # expected: 'tap_sub|INSERT|test_tab1|t' > > > > The possible scenarios are: > > > > The table sync worker for test_tab1 failed due to an error unrelated > > to apply changes: > > > > 2021-11-30 06:24:02.137 CET [18990:2] ERROR: replication origin with > > OID 2 is already active for PID 23706 > > > > At this time, the view had one error entry for the table sync worker. > > After retrying table sync, it succeeded: > > > > 2021-11-30 06:24:04.202 CET [28117:2] LOG: logical replication table > > synchronization worker for subscription "tap_sub", table "test_tab1" > > has finished > > > > Then after inserting a row on the publisher, the apply worker inserted > > the row but failed due to violating a unique key violation, which is > > expected: > > > > 2021-11-30 06:24:04.307 CET [4806:2] ERROR: duplicate key value > > violates unique constraint "test_tab1_pkey" > > 2021-11-30 06:24:04.307 CET [4806:3] DETAIL: Key (a)=(1) already exists. > > 2021-11-30 06:24:04.307 CET [4806:4] CONTEXT: processing remote data > > during "INSERT" for replication target relation "public.test_tab1" in > > transaction 721 at 2021-11-30 06:24:04.305096+01 > > > > As a result, we had two error entries for test_tab1: the table sync > > worker error and the apply worker error. 
I didn't expect that the > > table sync worker for test_tab1 failed due to "replication origin with > > OID 2 is already active for PID 23706” error. > > > > Looking at test_subscription_error() in 026_worker_stats.pl, we have > > two checks; in the first check, we wait for the view to show the error > > entry for the given relation name and xid. This check was passed since > > we had the second error (i.g., apply worker error). In the second > > check, we get error entries from pg_stat_subscription_workers by > > specifying only the relation name. Therefore, we ended up getting two > > entries and failed the tests. > > > > To fix this issue, I think that in the second check, we can get the > > error from pg_stat_subscription_workers by specifying the relation > > name *and* xid like the first check does. I've attached the patch. > > What do you think? > > > > I think this will fix the reported failure but there is another race > condition in the test. Isn't it possible that for table test_tab2, we > get an error "replication origin with OID ..." or some other error > before copy, in that case also, we will proceed from the second call > of test_subscription_error() which is not what we expect in the test? Right. > Shouldn't we someway check that the error message also starts with > "duplicate key value violates ..."? Yeah, I think it's a good idea to make the checks more specific. That is, probably we can specify the prefix of the error message and subrelid in addition to the current conditions: relid and xid. That way, we can check what error was reported by which workers (tablesync or apply) for which relations. And both check queries in test_subscription_error() can have the same WHERE clause. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: SSL Tests for sslinfo extension
> On 30 Nov 2021, at 00:16, Daniel Gustafsson wrote: >> On 29 Nov 2021, at 23:50, Tom Lane wrote: >> Otherwise I think it's good to go, so I marked it RFC. > > Great! I'll take another look over it tomorrow and will go ahead with it > then. I applied your nitpick diff and took it for another spin in CI, and pushed it. Thanks for review! -- Daniel Gustafsson https://vmware.com/
Re: row filtering for logical replication
On Tue, Nov 30, 2021 at 12:33 PM Ajin Cherian wrote: > > On Thu, Nov 25, 2021 at 2:22 PM Peter Smith wrote: > > > > Thanks for all the review comments so far! We are endeavouring to keep > > pace with them. > > > > All feedback is being tracked and we will fix and/or reply to everything > > ASAP. > > > > Meanwhile, PSA the latest set of v42* patches. > > > > This version was mostly a patch restructuring exercise but it also > > addresses some minor review comments in passing. > > > > Addressed more review comments, in the attached patch-set v43. 5 > patches carried forward from v42. > This patch-set contains the following fixes: > > On Tue, Nov 23, 2021 at 1:28 AM Dilip Kumar wrote: > > > > in pgoutput_row_filter, we are dropping the slots if there are some > > old slots in the RelationSyncEntry. But then I noticed that in > > rel_sync_cache_relation_cb(), also we are doing that but only for the > > scantuple slot. So IMHO, rel_sync_cache_relation_cb(), is only place > > setting entry->rowfilter_valid to false; so why not drop all the slot > > that time only and in pgoutput_row_filter(), you can just put an > > assert? > > > > Moved all the dropping of slots to rel_sync_cache_relation_cb() > > > +static bool > > +pgoutput_row_filter_virtual(Relation relation, TupleTableSlot *slot, > > RelationSyncEntry *entry) > > +{ > > +EState *estate; > > +ExprContext *ecxt; > > > > > > pgoutput_row_filter_virtual and pgoutput_row_filter are exactly same > > except, ExecStoreHeapTuple(), so why not just put one check based on > > whether a slot is passed or not, instead of making complete duplicate > > copy of the function. > > Removed pgoutput_row_filter_virtual > > > oldctx = MemoryContextSwitchTo(CacheMemoryContext); > > tupdesc = CreateTupleDescCopy(tupdesc); > > entry->scantuple = MakeSingleTupleTableSlot(tupdesc, > > ); > > > > Why do we need to copy the tupledesc? 
do we think that we need to have > > this slot even if we close the relation, if so can you add the > > comments explaining why we are making a copy here. > > This code has been modified, and comments added. > > On Tue, Nov 23, 2021 at 8:02 PM Amit Kapila wrote: > > One more thing related to this code: > > pgoutput_row_filter() > > { > > .. > > + if (!entry->rowfilter_valid) > > { > > .. > > + oldctx = MemoryContextSwitchTo(CacheMemoryContext); > > + tupdesc = CreateTupleDescCopy(tupdesc); > > + entry->scantuple = MakeSingleTupleTableSlot(tupdesc, ); > > + MemoryContextSwitchTo(oldctx); > > .. > > } > > > > Why do we need to initialize scantuple here unless we are sure that > > the row filter is going to get associated with this relentry? I think > > when there is no row filter then this allocation is not required. > > > > Modified as suggested. > > On Tue, Nov 23, 2021 at 10:52 PM Amit Kapila wrote: > > > > In 0003 patch, why is below change required? > > --- a/src/backend/replication/pgoutput/pgoutput.c > > +++ b/src/backend/replication/pgoutput/pgoutput.c > > @@ -1,4 +1,4 @@ > > -/*- > > +/* > > * > > * pgoutput.c > > > > Removed. > > > > > After above, rearrange the code in pgoutput_row_filter(), so that two > > different checks related to 'rfisnull' (introduced by different > > patches) can be combined as if .. else check. > > > Fixed. > > On Thu, Nov 25, 2021 at 12:03 AM Amit Kapila wrote: > > > > + * If the new relation or the old relation has a where clause, > > + * we need to remove it so that it can be added afresh later. > > + */ > > + if (RelationGetRelid(newpubrel->relation) == oldrelid && > > + newpubrel->whereClause == NULL && rfisnull) > > > > Can't we use _equalPublicationTable() here? It compares the whereClause as > > well. > > > > Tried this, can't do this because one is an alter statement while the > other is a publication, the whereclause is not > the same Nodetype. 
In the statement, the whereclause is T_A_Expr, > while in the publication > catalog, it is T_OpExpr. > > > /* Must be owner of the table or superuser. */ > > - if (!pg_class_ownercheck(RelationGetRelid(rel), GetUserId())) > > + if (!pg_class_ownercheck(relid, GetUserId())) > > > > Here, you can directly use RelationGetRelid as was used in the > > previous code without using an additional variable. > > > > Fixed. > > > 2. > > +typedef struct { > > + Relation rel; > > + bool check_replident; > > + Bitmapset *bms_replident; > > +} > > +rf_context; > > > > Add rf_context in the same line where } ends. > > Code has been modified, this comment no longer applies. > > > 4. > > + * Rules: Node-type validation > > + * --- > > + * Allow only simple or compound expressions like: > > + * - "(Var Op Const)" or > > > > It seems Var Op Var is allowed. I tried below and it works: > > create publication pub for table t1 where (c1 < c2) WITH (publish = > >
Re: row filtering for logical replication
On Tue, Nov 30, 2021 at 3:55 PM Amit Kapila wrote: > > > > > > We can try that way but I think we should still be able to combine in > > > many cases like where all the operations are specified for > > > publications having the table or maybe pubactions are same. So, we > > > should not give up on those cases. We can do this new logic only when > > > we find that pubactions are different and probably store them as > > > independent expressions and corresponding pubactions for it at the > > > current location in the v42* patch (in pgoutput_row_filter). It is > > > okay to combine them at a later stage during execution when we can't > > > do it at the time of forming cache entry. > > > > What about the initial table sync? during that, we are going to > > combine all the filters or we are going to apply only the insert > > filters? > > > > AFAIK, currently, initial table sync doesn't respect publication > actions so it should combine all the filters. What do you think? Yeah, I have the same opinion. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: row filtering for logical replication
On Tue, Nov 30, 2021 at 11:37 AM Dilip Kumar wrote:
> On Tue, Nov 30, 2021 at 10:26 AM Amit Kapila wrote:
> > On Mon, Nov 29, 2021 at 8:40 PM Euler Taveira wrote:
> > > On Mon, Nov 29, 2021, at 7:11 AM, Amit Kapila wrote:
> > > >
> > > > I don't think it is a good idea to combine the row filter from the
> > > > publication that publishes just 'insert' with the row filter that
> > > > publishes 'update'. We shouldn't apply the 'insert' filter for
> > > > 'update', and similarly for the other publication operations. We
> > > > can combine the filters when the published operations are the same.
> > > > So, this means that we might need to cache multiple row filters,
> > > > but I think that is better than having another restriction that
> > > > the publish operation 'insert' should also honor the RI columns
> > > > restriction.
> > >
> > > That's exactly what I meant to say but apparently I didn't explain it
> > > in detail. If a subscriber has multiple publications and a table is
> > > part of these publications with different row filters, it should
> > > check the publication action *before* including it in the row filter
> > > list. It means that an UPDATE operation cannot apply a row filter
> > > that is part of a publication that has only INSERT as an action.
> > > Having said that, we cannot always combine multiple row filter
> > > expressions into one. Instead, it should cache individual row filter
> > > expressions and apply the OR during the row filter execution (as I
> > > did in the initial patches before this caching stuff). The other idea
> > > is to have multiple caches for each action. The main disadvantage of
> > > this approach is to create 4x entries.
> > >
> > > I'm experimenting with the first approach, which stores multiple row
> > > filters and their publication actions, right now.
> >
> > We can try that way but I think we should still be able to combine in
> > many cases, like where all the operations are specified for the
> > publications having the table, or maybe where the pubactions are the
> > same. So, we should not give up on those cases. We can do this new
> > logic only when we find that the pubactions are different, and probably
> > store them as independent expressions with the corresponding pubactions
> > at the current location in the v42* patch (in pgoutput_row_filter). It
> > is okay to combine them at a later stage during execution when we can't
> > do it at the time of forming the cache entry.
>
> What about the initial table sync? During that, are we going to combine
> all the filters, or are we going to apply only the insert filters?

AFAIK, currently, initial table sync doesn't respect publication actions so it should combine all the filters.

--
With Regards,
Amit Kapila.
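For readers following along, the caching scheme under discussion can be sketched outside the server. This is an illustrative Python model only, not the actual pgoutput code: every name here (`build_cache_entry`, `row_passes`, the dict shapes) is invented for the sketch. It shows the two cases Amit describes: combining filters eagerly when all publications have the same pubactions, and otherwise keeping them separate and OR-ing the relevant ones at execution time, with initial table sync ignoring pubactions entirely.

```python
# Illustrative model of the row-filter caching discussed above (not the
# actual pgoutput code; all names here are invented for the sketch).

def build_cache_entry(publications):
    """Combine filters eagerly only when every publication has the same
    pubactions; otherwise keep (pubactions, filter) pairs individually."""
    pubactions = {frozenset(p["actions"]) for p in publications}
    if len(pubactions) == 1:
        # Safe to OR the filters into a single expression up front.
        filters = [p["filter"] for p in publications]
        combined = lambda row: any(f(row) for f in filters)
        return [(pubactions.pop(), combined)]
    # Mixed pubactions: store each filter with its own pubactions.
    return [(frozenset(p["actions"]), p["filter"]) for p in publications]

def row_passes(cache_entry, operation, row):
    """At decoding time, OR together only the filters whose publication
    publishes this operation ('insert', 'update', or 'delete')."""
    relevant = [f for actions, f in cache_entry if operation in actions]
    if not relevant:
        return False  # no publication publishes this operation
    return any(f(row) for f in relevant)

def row_passes_initial_sync(cache_entry, row):
    """Initial table sync doesn't respect pubactions, so OR all filters."""
    return any(f(row) for _, f in cache_entry)

# Two publications with different pubactions: filters stay separate.
pubs = [
    {"actions": ["insert"], "filter": lambda r: r["a"] > 10},
    {"actions": ["update"], "filter": lambda r: r["a"] < 5},
]
entry = build_cache_entry(pubs)
```

Under this model, an UPDATE never sees the insert-only publication's filter, which is the restriction Euler describes; only the initial-sync path combines everything.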
Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd
Thanks for everyone's insight so far!

> my expectation should be that the relcache should not be invalidated
> _after the first relation open_ in the executor (not the beginning of
> ExecutorRun).
>
> s/exception/expectation.

To be more accurate, my expectation is that for a single SQL statement, after the first time I write data into the relation, the relcache reset should never happen until the statement is completed or "aborted and RelationClose is called in ResourceOwnerRelease". Since the write table access methods can be called from many places, such as COPY, CAST, REFRESH MATERIALIZED VIEW, VACUUM and ModifyTable, and it is possible to hit an error in each of them, I think RelationClose would be a good place to handle the exceptional case (at the same time, we should remember to destroy things properly in the non-exceptional case). This might not be very clean, since I bind some executor-related data into a relcache-related struct, but it should be workable in my modified use case. The cleanest method I can think of is adding another resource type in ResourceOwner and letting ResourceOwnerRelease handle the exceptional cases.

--
Best Regards
Andy Fan
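The ResourceOwner idea at the end can be modeled abstractly. This is a self-contained Python sketch of the cleanup pattern only; it is not PostgreSQL's resowner.c API, and every name in it is invented. It shows the split the message describes: explicit destruction on the normal path, and an owner-driven sweep for the exceptional (aborted) path.

```python
# Generic model of the ResourceOwner cleanup pattern suggested above.
# Not the resowner.c API; all names here are invented for the sketch.

class ResourceOwner:
    def __init__(self):
        self._resources = []

    def remember(self, resource, destroy):
        self._resources.append((resource, destroy))

    def forget(self, resource):
        self._resources = [(r, d) for r, d in self._resources
                           if r is not resource]

    def release(self):
        # Called on abort (the "exceptional case"): destroy anything
        # that normal execution did not get a chance to clean up.
        for resource, destroy in reversed(self._resources):
            destroy(resource)
        self._resources = []

destroyed = []

def run_statement(owner, fail):
    buf = {"name": "per-statement state"}
    owner.remember(buf, lambda r: destroyed.append(r["name"]))
    if fail:
        raise RuntimeError("statement aborted")
    # Normal completion: destroy explicitly and forget the registration.
    destroyed.append(buf["name"])
    owner.forget(buf)

owner = ResourceOwner()
try:
    run_statement(owner, fail=True)
except RuntimeError:
    owner.release()  # exceptional-case cleanup happens here
```

Either way the per-statement state is destroyed exactly once, which is the property the ResourceOwner approach buys over ad hoc cleanup scattered across COPY, VACUUM, and the other call sites.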
Re: Update stale code comment in CheckpointerMain()
> On 30 Nov 2021, at 08:00, Amul Sul wrote:
> The attached patch updates the code comment which is no longer true
> after commit 4a92a1c3d1c361ffb031ed05bf65b801241d7cdd

Agreed, but looking at this, shouldn't we also tweak the comment on RecoveryInProgress() as per the attached v2 diff?

--
Daniel Gustafsson https://vmware.com/

update_code_comment_v2.patch
Re: Skipping logical replication transactions on subscriber side
On Mon, Nov 29, 2021 at 11:38 AM vignesh C wrote:
>

I have pushed this patch and there is a buildfarm failure for it. See:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2021-11-30%2005%3A05%3A25

Sawada-San has shared his initial analysis on pgsql-committers [1] and I am responding here as the fix requires some more discussion.

> Looking at the result the test actually got, we had two error entries
> for test_tab1 instead of one:
>
> # Failed test 'check the error reported by the apply worker'
> # at t/026_worker_stats.pl line 33.
> # got: 'tap_sub|INSERT|test_tab1|t
> # tap_sub||test_tab1|t'
> # expected: 'tap_sub|INSERT|test_tab1|t'
>
> The possible scenarios are:
>
> The table sync worker for test_tab1 failed due to an error unrelated
> to apply changes:
>
> 2021-11-30 06:24:02.137 CET [18990:2] ERROR: replication origin with
> OID 2 is already active for PID 23706
>
> At this time, the view had one error entry for the table sync worker.
> After retrying table sync, it succeeded:
>
> 2021-11-30 06:24:04.202 CET [28117:2] LOG: logical replication table
> synchronization worker for subscription "tap_sub", table "test_tab1"
> has finished
>
> Then after inserting a row on the publisher, the apply worker inserted
> the row but failed due to a unique key violation, which is expected:
>
> 2021-11-30 06:24:04.307 CET [4806:2] ERROR: duplicate key value
> violates unique constraint "test_tab1_pkey"
> 2021-11-30 06:24:04.307 CET [4806:3] DETAIL: Key (a)=(1) already exists.
> 2021-11-30 06:24:04.307 CET [4806:4] CONTEXT: processing remote data
> during "INSERT" for replication target relation "public.test_tab1" in
> transaction 721 at 2021-11-30 06:24:04.305096+01
>
> As a result, we had two error entries for test_tab1: the table sync
> worker error and the apply worker error. I didn't expect that the
> table sync worker for test_tab1 would fail due to the "replication
> origin with OID 2 is already active for PID 23706" error.
> Looking at test_subscription_error() in 026_worker_stats.pl, we have
> two checks; in the first check, we wait for the view to show the error
> entry for the given relation name and xid. This check was passed since
> we had the second error (i.e., the apply worker error). In the second
> check, we get error entries from pg_stat_subscription_workers by
> specifying only the relation name. Therefore, we ended up getting two
> entries and failed the tests.
>
> To fix this issue, I think that in the second check, we can get the
> error from pg_stat_subscription_workers by specifying the relation
> name *and* xid like the first check does. I've attached the patch.
> What do you think?

I think this will fix the reported failure, but there is another race condition in the test. Isn't it possible that for table test_tab2 we get an error "replication origin with OID ..." or some other error before the copy? In that case also, we would proceed from the second call of test_subscription_error(), which is not what we expect in the test. Shouldn't we somehow check that the error message also starts with "duplicate key value violates ..."?

[1] - https://www.postgresql.org/message-id/CAD21AoChP5wOT2AYziF%2B-j7vvThF2NyAs7wr%2Byy%2B8hsnu%3D8Rgg%40mail.gmail.com

--
With Regards,
Amit Kapila.
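The race Sawada-san describes comes down to the second check matching more view rows than intended. This toy Python model illustrates it with invented data (the real check is a SQL query against pg_stat_subscription_workers in 026_worker_stats.pl; the field names below only loosely mirror the view):

```python
# Toy model of the pg_stat_subscription_workers race discussed above.
# The rows and field names are invented for illustration.
errors = [
    # table sync worker failed with an origin error (no command, no xid)
    {"subname": "tap_sub", "command": None, "relname": "test_tab1",
     "xid": None},
    # apply worker failed with the expected unique-key violation
    {"subname": "tap_sub", "command": "INSERT", "relname": "test_tab1",
     "xid": 721},
]

def lookup(relname, xid=None):
    """Return the error entries matching the given relation name, and
    optionally the given xid (mimicking the test's WHERE clause)."""
    return [e for e in errors
            if e["relname"] == relname and (xid is None or e["xid"] == xid)]

# Filtering by relation name alone matches both entries -> test failure.
by_name = lookup("test_tab1")
# Filtering by relation name *and* xid pins down the apply worker error,
# which is the proposed fix for the second check.
by_name_and_xid = lookup("test_tab1", xid=721)
```

Amit's follow-up point is that even this fix does not discriminate on the error *message*, so an unrelated pre-copy error could still satisfy the check for test_tab2.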
Re: [PATCH] pg_statio_all_tables: several rows per table due to invalid TOAST index
Hi Michael,

Thank you for your attention!

On Tue, 2021-11-30 at 17:29 +0900, Michael Paquier wrote:
> Hmm. Why should we care about invalid indexes at all, including
> pg_statio_all_indexes?

I think we should care about them at least because they exist and can consume resources. For example, an invalid index is still updated by DML operations. Of course, we could exclude such indexes from the view using the indisvalid, indisready and indislive fields, but in that case we should mention this in the docs, and, more importantly, new index states could appear in the future, forcing a change in the view definition. Counting all indexes regardless of state seems more reasonable to me.

--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [PATCH] pg_statio_all_tables: several rows per table due to invalid TOAST index
On Mon, Nov 29, 2021 at 05:04:29PM +0300, Andrei Zubkov wrote:
> However, it is possible that the TOAST table will have more than one
> index. For example, this happens when a REINDEX CONCURRENTLY operation
> leaves an index in an invalid state (indisvalid = false) due to some
> kind of failure. It's often sufficient to interrupt the REINDEX
> CONCURRENTLY operation right after it starts.
>
> Such an index will cause a second row to appear in the
> pg_statio_all_tables view, which is obviously unexpected behaviour.

Indeed. I can see that.

> Now we can have several regular indexes and several TOAST indexes for
> the same table. Statistics for the regular and TOAST indexes are to be
> calculated the same way, so I've decided to use a CTE here.

Hmm. Why should we care about invalid indexes at all, including pg_statio_all_indexes?

--
Michael
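The duplicate-row symptom and the aggregation fix can be modeled quickly. This is an illustrative Python sketch with invented data (the actual fix is a CTE in the pg_statio_all_tables view definition; the column names below merely echo the view's tidx_* columns):

```python
# Toy model of the pg_statio_all_tables symptom discussed above.
# A TOAST table normally has one index; an interrupted REINDEX
# CONCURRENTLY leaves a second, invalid one behind. Data is invented.
toast_indexes = [
    {"indexrelid": 16402, "indisvalid": True,  "blks_read": 10,
     "blks_hit": 90},
    {"indexrelid": 16410, "indisvalid": False, "blks_read": 1,
     "blks_hit": 2},
]

# Joining the table to its TOAST indexes without aggregating yields one
# output row per index -- two rows for one table once the invalid index
# appears.
rows_per_table = len(toast_indexes)

# Summing per table (what a CTE with GROUP BY achieves) restores one row
# per table, and still counts the invalid index, which keeps consuming
# I/O as DML updates it.
tidx_blks_read = sum(i["blks_read"] for i in toast_indexes)
tidx_blks_hit = sum(i["blks_hit"] for i in toast_indexes)
```

This is also why the thread leans toward counting all indexes regardless of indisvalid: filtering on validity would change the view's totals, not just its row count.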
Re: Lots of memory allocated when reassigning Large Objects
On Tue, Nov 30, 2021 at 00:25, Tom Lane wrote:
> Guillaume Lelarge writes:
> > On Mon, Nov 29, 2021 at 22:27, Tom Lane wrote:
> > > I'm checking it in HEAD though; perhaps there's something else wrong
> > > in the back branches?
>
> > That's also what I was thinking. I was only trying with v14. I just
> > checked with v15devel, and your patch works alright. So there must be
> > something else with back branches.
>
> AFAICT the patch fixes what it intends to fix in v14 too. The reason the
> residual leak is worse in v14 is that the sinval message queue is
> bulkier. We improved that in HEAD in commit 3aafc030a.

I wanted to make sure that commit 3aafc030a fixed this issue, so I ran a few tests:

- without 3aafc030a, without your latest patch: 1182148 kB max resident size
- with 3aafc030a, without your latest patch: 1306812 kB max resident size
- without 3aafc030a, with your latest patch: 1182128 kB max resident size
- with 3aafc030a, with your latest patch: 180996 kB max resident size

Definitely, 3aafc030a and your latest patch together allow PostgreSQL to use much less memory. Going from 1 GB to 180 MB is awesome.

I tried to cherry-pick 3aafc030a on v14, but it didn't apply cleanly, and the work was a bit overwhelming for me, at least in the morning. I'll try again today, but I don't have much hope.

> I'm not sure if I want to take the risk of back-patching that, even now
> that it's aged a couple months in the tree. It is a pretty localized
> fix, but it makes some assumptions about usage patterns that might not
> hold up.

I understand. Of course, it would be better if it could be fixed for each supported version, but I'm already happy that it could be fixed in the next release.

--
Guillaume.