Re: Fix for segfault in logical replication on master
On Fri, Jun 18, 2021 at 9:18 AM Amit Kapila wrote:
>
> > I thought it was cheap enough to check that the relation we open is an
> > index, because if it is not, we'll segfault when accessing fields of the
> > relation->rd_index struct.  I wouldn't necessarily advocate doing any
> > really expensive checks here, but a quick sanity check seemed worth the
> > effort.
> >
> I am not telling you anything about the cost of these sanity checks.  I
> suggest you raise elog rather than return NULL because if this happens
> there is definitely some problem and continuing won't be a good idea.
>

Pushed, after making the above change.  Additionally, I have moved the test
case to the existing file 001_rep_changes instead of creating a new one, as
the test seems to fit there and I was not sure whether a test for just this
case deserves a new file.

--
With Regards,
Amit Kapila.
Re: seawasp failing, maybe in glibc allocator
On Sat, Jun 19, 2021 at 5:07 PM Thomas Munro wrote:
>                 if (error != LLVMErrorSuccess)
>                         LLVMOrcDisposeMaterializationUnit(mu);
>
> +#if LLVM_VERSION_MAJOR > 12
> +       for (int i = 0; i < LookupSetSize; i++)
> +               LLVMOrcRetainSymbolStringPoolEntry(symbols[i].Name);
> +#endif

(Though, erm, that code probably either needs to move a bit further up
or become conditional, considering the error case immediately above it,
not sure which...)
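[Editor's note: the concern above is about ordering -- the new retain loop
must not run when the lookup already failed and the materialization unit is
being disposed.  A minimal standalone C sketch of that "retain only on
success" shape, using mock types and names rather than the real LLVM ORC
C API:]

```c
#include <stddef.h>

/* Mock stand-ins for the LLVM ORC handles discussed above. */
typedef struct PoolEntry { int refcount; } PoolEntry;
typedef struct Symbol { PoolEntry *name; } Symbol;

static void retain(PoolEntry *e)
{
    e->refcount++;
}

/*
 * Hypothetical shape of the fixed code path: on error, bail out without
 * touching any refcounts (the caller disposes the materialization unit);
 * on success, retain each symbol's string-pool entry exactly once.
 */
static int define_symbols(int error, Symbol *symbols, size_t n)
{
    if (error)
        return -1;              /* dispose-and-return path: no retains */

    for (size_t i = 0; i < n; i++)
        retain(symbols[i].name); /* the retain loop from the patch */

    return 0;
}
```

[The point of the sketch is simply that the retain loop sits after the
error check, so a failed lookup never leaks a reference.]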
Re: locking [user] catalog tables vs 2pc vs logical rep
On Fri, Jun 18, 2021 at 2:25 PM osumi.takami...@fujitsu.com wrote:
>
> On Friday, June 18, 2021 11:41 AM osumi.takami...@fujitsu.com wrote:
> > Simon, I appreciate your suggestions and yes, if the user catalog table
> > is referenced by the output plugin, it can be another cause of the
> > deadlock.
> >
> > I'm going to post the patch for those two changes, accordingly.
> Hi, I've made the patch-set to cover the discussion above for all
> supported versions.
> Please have a look at those.
>

I have slightly modified your patch, see if the attached looks okay to you?
This is just a HEAD patch, I'll modify the back-branch patches accordingly.

--
With Regards,
Amit Kapila.

v2-0001-Doc-Update-caveats-in-synchronous-logical-replica.patch
Description: Binary data
Re: Question about StartLogicalReplication() error path
On Thu, Jun 17, 2021 at 4:25 AM Jeff Davis wrote:
>
> On Tue, 2021-06-15 at 15:19 +0900, Kyotaro Horiguchi wrote:
> > I don't think the message is needed, but I don't oppose it as far as
> > the level is LOG and the messages were changed to something like this:
> >
> > -  elog(DEBUG1, "cannot stream from %X/%X, minimum is %X/%X, forwarding",
> > +  elog(LOG, "%X/%X has already been streamed, forwarding to %X/%X",
> >
> > FWIW, I prefer #1 most.  I see #2 as optional, and #3 as the above.
>
> Attached.

LGTM.

--
With Regards,
Amit Kapila.
Re: Optionally automatically disable logical replication subscriptions on error
On Sat, Jun 19, 2021 at 1:06 AM Mark Dilger wrote:
>
> > On Jun 17, 2021, at 9:47 PM, Amit Kapila wrote:
> >
> > We are also discussing another action like skipping the apply of the
> > transaction on an error [1]. I think it is better to evaluate both the
> > proposals as one seems to be an extension of another.
>
> Thanks for the link.
>
> I think they are two separate options.

Right, but there are things that could be common from the design
perspective.  For example, why is it mandatory to update this conflict
(error) information in the system catalog instead of displaying it via
some stats view?  Also, why not also log the xid of the failed
transaction?

--
With Regards,
Amit Kapila.
Re: Unresolved replication hang and stop problem.
On Fri, Jun 18, 2021 at 11:22 AM Kyotaro Horiguchi wrote:
>
> At Thu, 17 Jun 2021 12:56:42 -0400, Alvaro Herrera wrote in
> > On 2021-Jun-17, Kyotaro Horiguchi wrote:
> >
> > > I don't see a call to hash_*seq*_search there.  Instead, I see one in
> > > ReorderBufferCheckMemoryLimit().
> >
> > Doh, of course -- I misread.
> >
> > ReorderBufferCheckMemoryLimit is new in pg13 (cec2edfa7859), so now at
> > least we have a reason why this workload regresses in pg13 compared to
> > earlier releases.
> >
> > Looking at the code, it does seem that increasing the memory limit as
> > Amit suggests might solve the issue.  Is that a practical workaround?
>
> I believe so, generally.  I'm not sure about the OP, though.
>
> Just increasing the working memory to a certain size would solve the
> problem.  There is a potential issue that, as in this case, it might be
> hard for users to find out that the GUC addresses their issue (if it
> actually does).  pg_stat_replication_slots.spill_count / spill_txns
> could be a guide for setting logical_decoding_work_mem.  Is it worth
> having an additional explanation like that for the GUC variable in the
> documentation?
>

I don't see any harm in doing that, but note that we can update it only
for PG-14.  The view pg_stat_replication_slots was introduced in PG-14.

--
With Regards,
Amit Kapila.
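[Editor's note: a sketch of the workaround being discussed, for readers who
want the concrete steps.  This assumes PostgreSQL 14 or later (where the
pg_stat_replication_slots view exists) and requires a running server; table
and value choices are illustrative only.]

```sql
-- High spill_txns / spill_count here suggest that transactions are
-- spilling to disk during logical decoding.
SELECT slot_name, spill_txns, spill_count, spill_bytes
FROM pg_stat_replication_slots;

-- If spilling is heavy, raise the decoding working memory (default 64MB)
-- and reload the configuration.
ALTER SYSTEM SET logical_decoding_work_mem = '256MB';
SELECT pg_reload_conf();
```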
Re: seawasp failing, maybe in glibc allocator
Thomas Munro writes:
> On Sat, Jun 19, 2021 at 5:07 PM Thomas Munro wrote:
>>                 if (error != LLVMErrorSuccess)
>>                         LLVMOrcDisposeMaterializationUnit(mu);
>>
>> +#if LLVM_VERSION_MAJOR > 12
>> +       for (int i = 0; i < LookupSetSize; i++)
>> +               LLVMOrcRetainSymbolStringPoolEntry(symbols[i].Name);
>> +#endif

> (Though, erm, that code probably either needs to move a bit further up
> or become conditional, considering the error case immediately above
> it, not sure which...)

Is a compile-time conditional really going to be reliable?  See nearby
arguments about compile-time vs run-time checks for libpq features.
It's not clear to me how tightly LLVM binds its headers and running code.

			regards, tom lane
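[Editor's note: the point here is that a #if guard is decided when the file
is compiled, while the library actually loaded at run time may be a
different version.  A toy C illustration of how the two checks can
disagree -- every name below is mocked, nothing comes from the real LLVM
API:]

```c
#include <stdbool.h>

/* Hypothetical compile-time version constant, standing in for what an
 * installed header might define (like LLVM_VERSION_MAJOR). */
#define MOCK_HEADER_VERSION_MAJOR 13

/* Version reported by the shared library actually loaded at run time
 * (mocked here; a real runtime probe would query the library itself). */
static int mock_runtime_version_major(void)
{
    return 12;
}

/* Compile-time gate: baked in when this translation unit is built. */
static bool built_with_new_api(void)
{
#if MOCK_HEADER_VERSION_MAJOR > 12
    return true;
#else
    return false;
#endif
}

/* Run-time gate: reflects the library the process is really using. */
static bool running_with_new_api(void)
{
    return mock_runtime_version_major() > 12;
}
```

[If headers and library can drift apart, the two gates return different
answers, which is the reliability question raised above.]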
Re: Optionally automatically disable logical replication subscriptions on error
> On Jun 19, 2021, at 3:17 AM, Amit Kapila wrote:
>
> Right, but there are things that could be common from the design
> perspective.

I went to reconcile my patch with that from [1] only to discover there is no
patch on that thread.  Is there one in progress that I can see?

I don't mind trying to reconcile this patch with what you're discussing in
[1], but I am a bit skeptical about [1] becoming a reality and I don't want
to entirely hitch this patch to that effort.  This can be committed with or
without any solution to the idea in [1].  The original motivation for this
patch was that the TAP tests don't have a great way to deal with a
subscription getting into a fail-retry infinite loop, which makes it harder
for me to make progress on [2].  That doesn't absolve me of the
responsibility of making this patch a good one, but it does motivate me to
keep it simple.

> For example, why is it mandatory to update this conflict
> (error) information in the system catalog instead of displaying it
> via some stats view?

The catalog must be updated to disable the subscription, so placing the
error information in the same row doesn't require any independent touching
of the catalogs.  Likewise, the catalog must be updated to re-enable the
subscription, so clearing the error from that same row doesn't require any
independent touching of the catalogs.

The error information does not *need* to be included in the catalog, but
placing the information in any location that won't survive server restart
leaves the user no information about why the subscription got disabled after
a restart (or crash + restart) happens.  Furthermore, since v2 removed the
"disabled_by_error" field in favor of just using subenabled + suberrmsg to
determine if the subscription was automatically disabled, not having the
information in the catalog would make it ambiguous whether the subscription
was manually or automatically disabled.

> Also, why not also log the xid of the failed
> transaction?

We could also do that.  Reading [1], it seems you are overly focused on
user-facing xids.  The errdetail in the examples I've been using for
testing, and the one mentioned in [1], contain information about the
conflicting data.  I think users are more likely to understand that a
particular primary key value cannot be replicated because it is not unique
than to understand that a particular xid cannot be replicated.  (Likewise
for permissions errors.)  For example:

2021-06-18 16:25:20.139 PDT [56926] ERROR:  duplicate key value violates unique constraint "s1_tbl_unique"
2021-06-18 16:25:20.139 PDT [56926] DETAIL:  Key (i)=(1) already exists.
2021-06-18 16:25:20.139 PDT [56926] CONTEXT:  COPY tbl, line 2

This tells the user what they need to clean up before they can continue.
Telling them which xid tried to apply the change, but not the change itself
or the conflict itself, seems rather unhelpful.  So at best, the xid is an
additional piece of information.  I'd rather have both the ERROR and DETAIL
fields above and not the xid than have the xid and lack one of those two
fields.  Even so, I have not yet included the DETAIL field because I didn't
want to bloat the catalog.

For the problem in [1], having the xid is more important than it is in my
patch, because the user is expected in [1] to use the xid as a handle.  But
that seems like an odd interface to me.  Imagine that a transaction on the
publisher side inserted a batch of data, and only a subset of that data
conflicts on the subscriber side.  What advantage is there in skipping the
entire transaction?  Wouldn't the user rather skip just the problematic
rows?

I understand that on the subscriber side it is difficult to do so, but if
you are going to implement this sort of thing, it makes more sense to allow
the user to filter out data that is problematic rather than filtering out
xids that are problematic, and the filter shouldn't just be an in-or-out
filter, but rather a mapping function that can redirect the data someplace
else, or rewrite it before inserting, or change the pre-existing conflicting
data prior to applying the problematic data, or whatever.  That's a huge
effort, of course, but if the idea in [1] goes in that direction, I don't
want my patch to have already added an xid field which ultimately nobody
wants.

[1] - https://www.postgresql.org/message-id/CAD21AoDeScrsHhLyEPYqN3sydg6PxAPVBboK%3D30xJfUVihNZDA%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/flat/915B995D-1D79-4E0A-BD8D-3B267925FCE9%40enterprisedb.com#dbbce39c9e460183b67ee44b647b1209

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Add version macro to libpq-fe.h
I wrote:
> Alvaro Herrera writes:
>> Well, if we do want docs for these macros, then IMO it'd be okay to have
>> them in libpq-fe.h itself rather than SGML.  A one-line comment for each
>> would suffice:

> WFM.  I'd sort of supposed that the symbol names were self-documenting,
> but you're right that a line or so of annotation improves things.

Hearing no further comments, done that way.

			regards, tom lane
Re: Optionally automatically disable logical replication subscriptions on error
> On Jun 19, 2021, at 7:44 AM, Mark Dilger wrote:
>
> Wouldn't the user rather skip just the problematic rows?  I understand
> that on the subscriber side it is difficult to do so, but if you are
> going to implement this sort of thing, it makes more sense to allow the
> user to filter out data that is problematic rather than filtering out
> xids that are problematic, and the filter shouldn't just be an in-or-out
> filter, but rather a mapping function that can redirect the data
> someplace else or rewrite it before inserting or change the pre-existing
> conflicting data prior to applying the problematic data or whatever.

Thinking about this some more, it seems my patch already sets the stage for
this sort of thing.  We could extend the concept of triggers to something
like ErrorTriggers that could be associated with subscriptions.  I already
have the code catching errors for subscriptions where disable_on_error is
true.  We could use that same code path for subscriptions that have one or
more BEFORE or AFTER ErrorTriggers defined.  We could pass the trigger all
the error context information along with the row and subscription
information, and allow the trigger to either modify the data being
replicated or make modifications to the table being changed.

I think having support for both BEFORE and AFTER would be important, as a
common design pattern might be to move aside the conflicting rows in the
BEFORE trigger, then reconcile and merge them back into the table in the
AFTER trigger.  If the xid still cannot be replicated after one attempt
using the triggers, the second attempt could disable the subscription
instead.

There are a lot of details to consider, but to my mind this idea is much
more user friendly than the idea that users should muck about with xids for
arbitrarily many conflicting transactions.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: unnesting multirange data types
Alexander Korotkov writes:
> Patch successfully passed commitfest.cputube.org.  I'm going to
> re-push it if there are no objections.

I'm still not happy about the way you've done the multirange-to-array part.
I think we'd be better off improving the polymorphism rules so that that
can be handled by one polymorphic function.  Obviously that'd be a task for
v15, but we've already concluded that just having the unnest ability would
be minimally sufficient for v14.

So I think you should trim it down to just the unnest part.

In any case, beta2 wraps on Monday, and there is very little time left for
a full round of buildfarm testing.  I almost feel that it's too late to
consider pushing this today.  Tomorrow absolutely is too late for beta2.

			regards, tom lane
Re: Race condition in InvalidateObsoleteReplicationSlots()
Álvaro Herrera writes:
> It occurred to me that this could be made better by sigstopping both
> walreceiver and walsender, then letting only the latter run; AFAICS this
> makes the test stable.  I'll register this on the upcoming commitfest to
> let cfbot run it, and if it looks good there I'll get it pushed to
> master.  If there's any problem I'll just remove it before beta2 is
> stamped.

Hmm ... desmoxytes has failed this test once, out of four runs since it
went in:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes&dt=2021-06-19%2003%3A06%3A04

None of the other animals that have reported in so far are unhappy.
Still, maybe that's not a track record we want to have for beta2?

I've just launched a run on gaur, which given its dinosaur status might be
the most likely animal to have an actual portability problem with this
test technique.  If you want to wait a few hours to see what it says,
that'd be fine with me.

			regards, tom lane
Re: Race between KeepFileRestoredFromArchive() and restartpoint
On Tue, Feb 02, 2021 at 07:14:16AM -0800, Noah Misch wrote:
> Recycling and preallocation are wasteful during archive recovery, because
> KeepFileRestoredFromArchive() unlinks every entry in its path.  I propose
> to fix the race by adding an XLogCtl flag indicating which regime
> currently owns the right to add long-term pg_wal directory entries.  In
> the archive recovery regime, the checkpointer will not preallocate and
> will unlink old segments instead of recycling them (like wal_recycle=off).
> XLogFileInit() will fail.

Here's the implementation.  Patches 1-4 suffice to stop the user-visible
ERROR.  Patch 5 avoids a spurious LOG-level message and wasted filesystem
writes, and it provides some future-proofing.

I was tempted to (but did not) just remove preallocation.  Creating one file
per checkpoint seems tiny relative to the max_wal_size=1GB default, so I
expect it's hard to isolate any benefit.  Under the old checkpoint_segments=3
default, a preallocated segment covered a respectable third of the next
checkpoint.  Before commit 63653f7 (2002), preallocation created more files.

Author:     Noah Misch
Commit:     Noah Misch

    Remove XLogFileInit() ability to skip ControlFileLock.

    Cold paths, initdb and end-of-recovery, used it.  Don't optimize them.

    Discussion: https://postgr.es/m/20210202151416.gb3304...@rfd.leadboat.com

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1b3a3d9..39a38ba 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -913,8 +913,7 @@ static void AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic);
 static bool XLogCheckpointNeeded(XLogSegNo new_segno);
 static void XLogWrite(XLogwrtRqst WriteRqst, bool flexible);
 static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
-								   bool find_free, XLogSegNo max_segno,
-								   bool use_lock);
+								   bool find_free, XLogSegNo max_segno);
 static int	XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
						 XLogSource source, bool notfoundOk);
 static int	XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source);
@@ -2492,7 +2491,7 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 			/* create/use new log file */
 			use_existent = true;
-			openLogFile = XLogFileInit(openLogSegNo, &use_existent, true);
+			openLogFile = XLogFileInit(openLogSegNo, &use_existent);
 			ReserveExternalFD();
 		}
@@ -3265,10 +3264,6 @@ XLogNeedsFlush(XLogRecPtr record)
  * pre-existing file will be deleted).  On return, true if a pre-existing
  * file was used.
  *
- * use_lock: if true, acquire ControlFileLock while moving file into
- * place.  This should be true except during bootstrap log creation.  The
- * caller must *not* hold the lock at call.
- *
  * Returns FD of opened file.
  *
  * Note: errors here are ERROR not PANIC because we might or might not be
@@ -3277,7 +3272,7 @@ XLogNeedsFlush(XLogRecPtr record)
  * in a critical section.
  */
 int
-XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
+XLogFileInit(XLogSegNo logsegno, bool *use_existent)
 {
 	char		path[MAXPGPATH];
 	char		tmppath[MAXPGPATH];
@@ -3437,8 +3432,7 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 	 */
 	max_segno = logsegno + CheckPointSegments;
 	if (!InstallXLogFileSegment(&installed_segno, tmppath,
-								*use_existent, max_segno,
-								use_lock))
+								*use_existent, max_segno))
 	{
 		/*
 		 * No need for any more future segments, or InstallXLogFileSegment()
@@ -3592,7 +3586,7 @@ XLogFileCopy(XLogSegNo destsegno, TimeLineID srcTLI, XLogSegNo srcsegno,
 	/*
 	 * Now move the segment into place with its final name.
 	 */
-	if (!InstallXLogFileSegment(&destsegno, tmppath, false, 0, false))
+	if (!InstallXLogFileSegment(&destsegno, tmppath, false, 0))
 		elog(ERROR, "InstallXLogFileSegment should not have failed");
 }
@@ -3616,29 +3610,20 @@ XLogFileCopy(XLogSegNo destsegno, TimeLineID srcTLI, XLogSegNo srcsegno,
  * free slot is found between *segno and max_segno.  (Ignored when find_free
  * is false.)
  *
- * use_lock: if true, acquire ControlFileLock while moving file into
- * place.  This should be true except during bootstrap log creation.  The
- * caller must *not* hold the lock at call.
- *
  * Returns true if the file was installed successfully.
Re: unnesting multirange data types
On Sat, Jun 19, 2021 at 7:35 PM Tom Lane wrote:
> Alexander Korotkov writes:
> > Patch successfully passed commitfest.cputube.org.  I'm going to
> > re-push it if there are no objections.
>
> I'm still not happy about the way you've done the multirange-to-array
> part.  I think we'd be better off improving the polymorphism rules so
> that that can be handled by one polymorphic function.  Obviously that'd
> be a task for v15, but we've already concluded that just having the
> unnest ability would be minimally sufficient for v14.
>
> So I think you should trim it down to just the unnest part.

I'm not entirely sure it's worth introducing anyrangearray.  There might
not be many use-cases for anyrangearray besides this cast (normally one
should use a multirange instead of an array of ranges).  But I agree that
this subject should be carefully considered for v15.

> In any case, beta2 wraps on Monday, and there is very little time
> left for a full round of buildfarm testing.  I almost feel that
> it's too late to consider pushing this today.  Tomorrow absolutely
> is too late for beta2.

+1

I also don't feel comfortable hurrying the unnest part into beta2.
According to the open items wiki page, there should be a beta3.  Does the
unnest part have a chance for beta3?

--
Regards,
Alexander Korotkov
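[Editor's note: for readers unfamiliar with the feature under discussion, a
hedged SQL sketch of what the unnest part provides -- expanding a multirange
into its constituent ranges, one per row.  This assumes the v14 syntax as
committed; it requires a running v14+ server to try.]

```sql
SELECT unnest('{[1,3),[5,7)}'::int4multirange);
-- Expected to return two rows: [1,3) and [5,7)
```

[The multirange-to-array cast that Tom objects to above is the separate,
trimmed-out part.]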
Re: unnesting multirange data types
Alexander Korotkov writes:
> I also don't feel comfortable hurrying the unnest part into beta2.
> According to the open items wiki page, there should be a beta3.  Does
> the unnest part have a chance for beta3?

Hmm.  I'd prefer to avoid another forced initdb after beta2.  On the other
hand, it's entirely likely that there will be some other thing that forces
that; in which case there'd be no reason not to push in the unnest feature
as well.  I'd say let's sit on the unnest code for a little bit and see
what happens.

			regards, tom lane
Re: pgbench logging broken by time logic changes
On 2021-Jun-19, Thomas Munro wrote:

> Thanks for looking so far.  It's the weekend here and I need to
> unplug, but I'll test these changes and if all looks good push on
> Monday.

Surely not the same day as the beta stamp...

--
Álvaro Herrera               Valdivia, Chile
"Always assume the user will do much worse than the stupidest thing
you can imagine."                                (Julien PUYDT)
RE: locking [user] catalog tables vs 2pc vs logical rep
On Saturday, June 19, 2021 6:51 PM Amit Kapila wrote:
> On Fri, Jun 18, 2021 at 2:25 PM osumi.takami...@fujitsu.com wrote:
> >
> > On Friday, June 18, 2021 11:41 AM osumi.takami...@fujitsu.com wrote:
> > > Simon, I appreciate your suggestions and yes, if the user catalog
> > > table is referenced by the output plugin, it can be another cause of
> > > the deadlock.
> > >
> > > I'm going to post the patch for those two changes, accordingly.
> > Hi, I've made the patch-set to cover the discussion above for all
> > supported versions.
> > Please have a look at those.
>
> I have slightly modified your patch, see if the attached looks okay to
> you?  This is just a HEAD patch, I'll modify the back-branch patches
> accordingly.
Thank you for updating the patch.
The patch has become much better.  Yet, I have one suggestion.

* doc/src/sgml/logicaldecoding.sgml

     Issuing an explicit LOCK on pg_class
-    (or any other catalog table) in a transaction.
+    (or any other [user] catalog table) in a transaction.

-    Perform CLUSTER on pg_class in
-    a transaction.
+    Perform CLUSTER on pg_class (or any
+    other [user] catalog table) in a transaction.

     PREPARE TRANSACTION after LOCK command
-    on pg_class and allow logical decoding of two-phase
-    transactions.
+    on pg_class (or any other [user] catalog table) and
+    allow logical decoding of two-phase transactions.

     PREPARE TRANSACTION after CLUSTER
-    command on pg_trigger and allow logical decoding of
-    two-phase transactions. This will lead to deadlock only when
-    published table have a trigger.
+    command on pg_trigger (or any other [user] catalog
+    table) and allow logical decoding of two-phase transactions. This
+    will lead to deadlock only when published table have a trigger.

Now we have the four parenthesized supplementary descriptions, so that
users don't miss any other [user] catalog tables.  Because of this, the
built HTML output feels somewhat redundant in those parts.  Accordingly,
couldn't we move them outside of the itemizedlist section in a simple
manner?

For example, we could insert a sentence below the list, after removing the
parenthesized descriptions in the listitems, which says:
"Note that those commands that can cause deadlock apply to not only
explicitly indicated system catalog tables above but also any other [user]
catalog table."

Best Regards,
	Takamichi Osumi
Re: pgbench logging broken by time logic changes
On Sun, Jun 20, 2021 at 3:18 PM Alvaro Herrera wrote:
> On 2021-Jun-19, Thomas Munro wrote:
> > Thanks for looking so far.  It's the weekend here and I need to
> > unplug, but I'll test these changes and if all looks good push on
> > Monday.
>
> Surely not the same day as the beta stamp...

Because of timezones, that'll be Sunday in the Americas.  Still too close?
Re: locking [user] catalog tables vs 2pc vs logical rep
On Sun, Jun 20, 2021 at 9:28 AM osumi.takami...@fujitsu.com wrote:
>
> On Saturday, June 19, 2021 6:51 PM Amit Kapila wrote:
> > On Fri, Jun 18, 2021 at 2:25 PM osumi.takami...@fujitsu.com wrote:
> > >
> > > On Friday, June 18, 2021 11:41 AM osumi.takami...@fujitsu.com wrote:
> > > > Simon, I appreciate your suggestions and yes, if the user catalog
> > > > table is referenced by the output plugin, it can be another cause
> > > > of the deadlock.
> > > >
> > > > I'm going to post the patch for those two changes, accordingly.
> > > Hi, I've made the patch-set to cover the discussion above for all
> > > supported versions.
> > > Please have a look at those.
> >
> > I have slightly modified your patch, see if the attached looks okay to
> > you?  This is just a HEAD patch, I'll modify the back-branch patches
> > accordingly.
> Thank you for updating the patch.
> The patch has become much better.  Yet, I have one suggestion.
>
> * doc/src/sgml/logicaldecoding.sgml
>
>      Issuing an explicit LOCK on pg_class
> -    (or any other catalog table) in a transaction.
> +    (or any other [user] catalog table) in a transaction.
>
> -    Perform CLUSTER on pg_class in
> -    a transaction.
> +    Perform CLUSTER on pg_class (or any
> +    other [user] catalog table) in a transaction.
>
>      PREPARE TRANSACTION after LOCK command
> -    on pg_class and allow logical decoding of two-phase
> -    transactions.
> +    on pg_class (or any other [user] catalog table) and
> +    allow logical decoding of two-phase transactions.
>
>      PREPARE TRANSACTION after CLUSTER
> -    command on pg_trigger and allow logical decoding of
> -    two-phase transactions. This will lead to deadlock only when
> -    published table have a trigger.
> +    command on pg_trigger (or any other [user] catalog
> +    table) and allow logical decoding of two-phase transactions. This
> +    will lead to deadlock only when published table have a trigger.
>
> Now we have the four parenthesized supplementary descriptions, so that
> users don't miss any other [user] catalog tables.  Because of this, the
> built HTML output feels somewhat redundant in those parts.  Accordingly,
> couldn't we move them outside of the itemizedlist section in a simple
> manner?
>
> For example, we could insert a sentence below the list, after removing
> the parenthesized descriptions in the listitems, which says:
> "Note that those commands that can cause deadlock apply to not only
> explicitly indicated system catalog tables above but also any other
> [user] catalog table."
>

Sounds reasonable to me.  s/but also any other/but also to any other/ --
"to" seems to be missing in the above line.  Kindly send an updated patch.

--
With Regards,
Amit Kapila.