Generating code for query jumbling through gen_node_support.pl
Hi all,

This thread is a follow-up of the recent discussion about query jumbling with DDL statements, where the conclusion was that we'd want to generate all this code automatically for all the nodes:
https://www.postgresql.org/message-id/36e5bffe-e989-194f-85c8-06e7bc88e...@amazon.com

What this patch does is compute the same query ID for utility statements using their parsed Node state instead of their query string, meaning that things like "BEGIN", "bEGIN" or "begin" are treated the same, for example. But that is not the only idea. I have implemented this as of the attached, where the following things are done:
- queryjumble.c is moved to src/backend/nodes/, to stick with the other facilities for node equal/read/write/copy, and renamed to jumblefuncs.c.
- gen_node_support.pl is extended to generate the functions and the switch for the jumbling. There are a few exceptions, such as Lists and RangeTblEntry, to do the jumbling consistently.
- Two pg_node_attr() markers are added, consistent with the existing ones: no_jumble, to exclude a node entirely from the query jumbling, and jumble_ignore, to exclude one field from the jumble.

The patch is in rather good shape, passing check-world and the CI, but there are a few things that IMO need discussion. It could perhaps be divided into more patches, but as the changes touch different areas and are straightforward, that did not look like a big deal to me. Handling the location of the Nodes is quite invasive, because we only care about it for T_Const in the query jumbling now; this could be compensated for with a third pg_node_attr() that tracks whether the "int location" of a Node should participate in the jumbling or not. There is also an argument that new fields added to a Node should not be included by default, but that would require adding more pg_node_attr() than what's done here.
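To illustrate the idea in miniature (this is a toy sketch, not the patch's actual jumbling code — the struct layout, hash, and names are invented for the example): the jumble mixes each semantic field of a node into a running hash, while a field marked jumble_ignore, such as a token's location in the query string, is skipped, so queries differing only in keyword case or whitespace hash to the same ID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of a Const node: location records where the
 * token appeared in the query string and is semantically irrelevant. */
typedef struct ToyConst
{
    int     consttype;      /* participates in the jumble */
    long    constvalue;     /* participates in the jumble */
    int     location;       /* "jumble_ignore": offset in query string */
} ToyConst;

/* FNV-1a style byte mixing, standing in for the real hash. */
static uint64_t
jumble_bytes(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *) data;

    for (size_t i = 0; i < len; i++)
        h = (h ^ p[i]) * 1099511628211ULL;
    return h;
}

static uint64_t
jumble_toy_const(const ToyConst *c)
{
    uint64_t    h = 14695981039346656037ULL;

    h = jumble_bytes(h, &c->consttype, sizeof(c->consttype));
    h = jumble_bytes(h, &c->constvalue, sizeof(c->constvalue));
    /* c->location is deliberately not jumbled */
    return h;
}
```

With this, two nodes that differ only in where the constant appeared in the source text produce the same ID, which is exactly why "BEGIN" and "begin" can normalize identically once the string is out of the picture.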
Note that the plan is to extend the normalization to some other parts of the Nodes, like CALL and SET, as mentioned on the other thread. I have done nothing about that yet, but it can be done in a few lines with the facility presented here (aka just add a location field). Hence, the normalization is consistent with the existing queryjumble.c for the fields and the nodes processed.

In this patch, things are done so that the query ID is no longer computed from the query string but from the Query. I still need to study the performance impact of that with short queries. If it proves noticeable in some cases, this stuff could be switched to use a new GUC with a code path that computes the ID of a utilityStmt from its string as a fallback. I am not sure I want to get into that level of complication, though, to keep things simple; that's yet to be done. A bit more could be cut, but pg_ident got in the way. There are also a few things for pg_stat_statements where a query ID of 0 can be implied for utility statements in some cases.

Generating this code leads to an overall removal of code, as what was in queryjumble.c is now generated automatically:
13 files changed, 901 insertions(+), 1113 deletions(-)

I am adding this to the next commit fest. Thoughts?
--
Michael

From 55b6d9154a5524e67865299b9a73ef074bd12adf Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Wed, 7 Dec 2022 16:36:47 +0900
Subject: [PATCH] Support for automated query jumble with all Nodes

This applies query jumbling in a consistent way to all the Nodes, including DDLs & friends.
---
 src/include/nodes/bitmapset.h         |   2 +-
 src/include/nodes/nodes.h             |   4 +
 src/include/nodes/parsenodes.h        | 267 +---
 src/include/nodes/plannodes.h         |   3 +-
 src/include/nodes/primnodes.h         | 419 -
 src/backend/nodes/Makefile            |   1 +
 src/backend/nodes/README              |   1 +
 src/backend/nodes/gen_node_support.pl |  95 ++-
 src/backend/nodes/jumblefuncs.c       | 358 +++
 src/backend/nodes/meson.build         |   1 +
 src/backend/utils/misc/Makefile       |   1 -
 src/backend/utils/misc/meson.build    |   1 -
 src/backend/utils/misc/queryjumble.c  | 861 --
 13 files changed, 901 insertions(+), 1113 deletions(-)
 create mode 100644 src/backend/nodes/jumblefuncs.c
 delete mode 100644 src/backend/utils/misc/queryjumble.c

diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index 2792281658..29d531224c 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -50,7 +50,7 @@ typedef int32 signedbitmapword; /* must be the matching signed type */
 typedef struct Bitmapset
 {
-	pg_node_attr(custom_copy_equal, special_read_write)
+	pg_node_attr(custom_copy_equal, special_read_write, no_jumble)
 	NodeTag		type;
 	int			nwords;			/* number of words in array */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 1f33902947..afaeeaf2c9 100644
--- a/src/include/nodes/nodes.h
Re: Force streaming every change in logical decoding
On Wed, Dec 7, 2022 at 10:55 AM Masahiko Sawada wrote:
>
> On Wed, Dec 7, 2022 at 12:55 PM Amit Kapila wrote:
> >
> > On one side having separate GUCs for publisher and subscriber seems to
> > give better flexibility but having more GUCs also sometimes makes them
> > less usable. Here, my thought was to have a single or as few GUCs as
> > possible which can be extendible by providing multiple values instead
> > of having different GUCs. I was trying to map this with the existing
> > string parameters in developer options.
>
> I see your point. On the other hand, I'm not sure it's a good idea to
> control different features by one GUC in general. The developer option
> could be an exception?
>

I am not sure what would be best if this were proposed as a non-developer option, but it seems to me that having a single parameter for publisher/subscriber can, in this case, serve our need for testing/debugging. BTW, though it is not a very good example, we use max_replication_slots for different purposes on the publisher (the limit for slots) and the subscriber (the limit for origins).

--
With Regards,
Amit Kapila.
Re: add \dpS to psql
On 06.12.2022 22:36, Nathan Bossart wrote:
As discussed elsewhere [0], \dp doesn't show privileges on system objects, and this behavior is not mentioned in the docs. I've attached a small patch that adds support for the S modifier (i.e., \dpS) and adjusts the docs. Thoughts?
[0] https://postgr.es/m/a2382acd-e465-85b2-9d8e-f9ed1a5a66e9%40postgrespro.ru

A few words in support of this patch, since I was the initiator of the discussion.

Before the VACUUM and ANALYZE privileges existed, there was no such question: why check privileges on system catalog objects? But now there is. It is now possible to grant privileges on system tables, so it should be possible to see those privileges with psql commands. However, the \dp command does not support the S modifier, which is inconsistent.

Furthermore, the VACUUM privilege also allows you to execute VACUUM FULL. VACUUM and VACUUM FULL are commands with similar names that work completely differently. It may be worth clarifying this on this page: https://www.postgresql.org/docs/devel/ddl-priv.html Something like: "Allows VACUUM on a relation, including VACUUM FULL."

But that's not all. There is a command very similar to VACUUM FULL with a different name: CLUSTER. The VACUUM privilege does not apply to the CLUSTER command. This is probably correct. However, the documentation for the CLUSTER command does not say who can perform it. I think it would be correct to add a sentence to the Notes section (https://www.postgresql.org/docs/devel/sql-cluster.html) similar to the one in the VACUUM documentation: "To cluster a table, one must ordinarily be the table's owner or a superuser."

Ready to participate, if this seems reasonable.
--
Pavel Luzanov
Postgres Professional: https://postgrespro.com
Re: Perform streaming logical transactions by background workers and parallel apply
On Wed, Dec 7, 2022 at 10:10 AM Masahiko Sawada wrote:
>
> On Wed, Dec 7, 2022 at 1:29 PM Amit Kapila wrote:
> >
> > Right, but the leader will anyway exit at some point either due to an
> > ERROR like "lost connection ... to parallel worker" or with a LOG
> > like: "... will restart because of a parameter change" but I see your
> > point. So, will it be better if we have a LOG message here and then
> > proc_exit()? Do you have something else in mind for this?
>
> No, I was thinking that too. It's better to write a LOG message and do
> proc_exit().
>
> Regarding the error "lost connection ... to parallel worker", it could
> still happen depending on the timing even if the parallel worker
> cleanly exits due to parameter changes, right? If so, I'm concerned
> that it could lead to disabling the subscription unexpectedly if
> disable_on_error is enabled.
>

If we want to avoid this then I think we have the following options: (a) parallel apply skips checking for parameter changes; (b) the parallel worker won't exit on a parameter change but will silently absorb the parameter and continue its processing; anyway, the leader will detect the change and stop the worker for it. Among these, the advantage of (b) is that it will allow reflecting a parameter change (one that doesn't need a restart) in the parallel worker. Do you have any better idea to deal with this?

--
With Regards,
Amit Kapila.
Re: PGDOCS - Logical replication GUCs - added some xrefs
On Tue, Dec 6, 2022 at 5:57 AM samay sharma wrote: > > Hi, > > On Mon, Oct 24, 2022 at 12:45 AM Peter Smith wrote: >> >> Hi hackers. >> >> There is a docs Logical Replication section "31.10 Configuration >> Settings" [1] which describes some logical replication GUCs, and >> details on how they interact with each other and how to take that into >> account when setting their values. >> >> There is another docs Server Configuration section "20.6 Replication" >> [2] which lists the replication-related GUC parameters, and what they >> are for. >> >> Currently AFAIK those two pages are unconnected, but I felt it might >> be helpful if some of the parameters in the list [2] had xref links to >> the additional logical replication configuration information [1]. PSA >> a patch to do that. > > > +1 on the patch. Some feedback on v5 below. > Thanks for your detailed review comments! I have changed most things according to your suggestions. Please check patch v6. > > + > > + For logical replication configuration settings > > refer > > + also to . > > + > > + > > I feel the top paragraph needs to explain terminology for logical replication > like it does for physical replication in addition to linking to the logical > replication config page. I'm recommending this as we use terms like > subscriber etc. in description of parameters without introducing them first. > > As an example, something like below might work. > > These settings control the behavior of the built-in streaming replication > feature (see Section 27.2.5) and logical replication (link). > > For physical replication, servers will be either a primary or a standby > server. Primaries can send data, while standbys are always receivers of > replicated data. When cascading replication (see Section 27.2.7) is used, > standby servers can also be senders, as well as receivers. Parameters are > mainly for sending and standby servers, though some parameters have meaning > only on the primary server. 
Settings may vary across the cluster without > problems if that is required. > > For logical replication, servers will either be publishers (also called > senders in the sections below) or subscribers. Publishers are , > Subscribers are... > OK. I split this blurb into 2 parts – streaming and logical replication. The streaming replication part is the same as before. The logical replication part is new. > > + > > + See for more details > > + about setting max_replication_slots for logical > > + replication. > > + > > > The link doesn't add any new information regarding max_replication_slots > other than "to reserve some for table sync" and has a good amount of > unrelated info. I think it might be useful to just put a line here asking to > reserve some for table sync instead of linking to the entire logical > replication config section. > OK. I copied the tablesync note back to config.sgml definition of 'max_replication_slots' and removed the link as suggested. Frankly, I also thought it is a bit strange that the max_replication_slots in the “Sending Servers” section was describing this parameter for “Subscribers”. OTOH, I did not want to split the definition in half so instead, I’ve added another Subscriber that just refers back to this place. It looks like an improvement to me. > > - Logical replication requires several configuration options to be set. > > + Logical replication requires several configuration parameters to be set. > > May not be needed? The docs have references to both options and parameters > but I don't feel strongly about it. Feel free to use what you prefer. OK. I removed this. > > I think we should add an additional line to the intro here saying that > parameters are mostly relevant only one of the subscriber or publisher. Maybe > a better written version of "While max_replication_slots means different > things on the publisher and subscriber, all other parameters are relevant > only on either the publisher or the subscriber." > OK. 
Done but with slightly different wording to that. > > + > > + Notes > > I don't think we need this sub-section. If I understand correctly, these > parameters are effective only on the subscriber side. So, any reason to not > include them in that section? OK. I moved these notes into the "Subscribers" section as suggested, and removed "Notes". > > > + > > + > > +Logical replication workers are also affected by > > + > linkend="guc-wal-receiver-timeout">wal_receiver_timeout, > > + > linkend="guc-wal-receiver-status-interval">wal_receiver_status_interval > > and > > + > linkend="guc-wal-retrieve-retry-interval">wal_retrieve_retry_interval. > > + > > + > > I like moving this; it makes more sense here. Should we remove it from > config.sgml? It seems a bit out of place there as we generally talk only > about individual parameters there and this line is general logical > replication subscriber
Re: ExecRTCheckPerms() and many prunable partitions
On Wed, Dec 7, 2022 at 12:19 AM Alvaro Herrera wrote: > I have pushed this finally. Thanks a lot. > I made two further changes: > > 1. there was no reason to rename ExecCheckPerms_hook, since its >signature was changing anyway. I reverted it to the original name. Sure, that makes sense. > 2. I couldn't find any reason to expose ExecGetRTEPermissionInfo, and >given that it's a one-line function, I removed it. > > Maybe you had a reason to add ExecGetRTEPermissionInfo, thinking about > external callers; if so please discuss it. My thinking was that it might be better to have a macro/function that takes EState, not es_rteperminfos, from the callers. Kind of like how there is exec_rt_fetch(). Though, that is only a cosmetic consideration, so I don't want to insist. > I'll mark this commitfest entry as committed soon; please post the other > two patches you had in this series in a new thread. Will do, thanks. -- Thanks, Amit Langote EDB: http://www.enterprisedb.com
Re: Question regarding "Make archiver process an auxiliary process. commit"
At Wed, 7 Dec 2022 11:28:23 +0530, Sravan Kumar wrote in
> On Tue, Dec 6, 2022 at 5:24 PM Bharath Rupireddy <
> bharath.rupireddyforpostg...@gmail.com> wrote:
> > On Tue, Dec 6, 2022 at 4:57 PM Sravan Kumar wrote:
> > >
> > > Thank you for the feedback.
> > >
> > > I'll be glad to help with the fix. Here's the patch for review.
> >
> > Thanks. +1 for fixing this.
> >> I would like to quote recent discussions on reducing the useless
> >> wakeups or increasing the sleep/hibernation times in various processes
> >> to reduce the power usage [1] [2] [3] [4] [5]. With that in context,
> >> does the archiver need to wake up every 60 sec at all when its latch
> >> gets set (PgArchWakeup()) whenever the server switches to a new WAL
> >> file? What happens if we get rid of PGARCH_AUTOWAKE_INTERVAL and rely
> >> on its latch being set? If required, we can spread PgArchWakeup() to
> >> more places, no?
>
> I like the idea of not having to wake up intermittently and probably we
> should start a discussion about it.
>
> I see the following comment in PgArchWakeup().
>
> /*
>  * We don't acquire ProcArrayLock here. It's actually fine because
>  * procLatch isn't ever freed, so we just can potentially set the wrong
>  * process' (or no process') latch. Even in that case the archiver will
>  * be relaunched shortly and will start archiving.
>  */
>
> While I do not fully understand the comment, it gives me an impression that
> the SetLatch() done here is counting on the timeout to wake up the archiver
> in some cases where the latch is wrongly set.

It is talking about the first iteration of the archiver process, not the periodic wakeups. So I think it is doable once we consider how to handle incomplete archiving iterations.

> The proposed idea is a behaviour change while this thread intends to clean
> up some code that's a result of the mentioned commit. So probably the
> proposed idea can be discussed as a separate thread.

Agreed.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Question regarding "Make archiver process an auxiliary process. commit"
At Tue, 6 Dec 2022 17:23:50 +0530, Bharath Rupireddy wrote in
> Thanks. +1 for fixing this.
>
> I would like to quote recent discussions on reducing the useless
> wakeups or increasing the sleep/hibernation times in various processes
> to reduce the power usage [1] [2] [3] [4] [5]. With that in context,
> does the archiver need to wake up every 60 sec at all when its latch
> gets set (PgArchWakeup()) whenever the server switches to a new WAL
> file? What happens if we get rid of PGARCH_AUTOWAKE_INTERVAL and rely
> on its latch being set? If required, we can spread PgArchWakeup() to
> more places, no?

I thought so first, but archiving may be interrupted for various reasons (disk full, I think, is the most common one). So intentional wakeups alone aren't sufficient.

> Before even answering the above questions, I think we need to see if
> there're any cases where a process can miss SetLatch() calls (I don't
> have an answer for that).

I read a recent mail from Thomas that says something like "should we consider the case where latch sets are missed?". It is triggered by SIGURG or SetEvent(). I'm not sure, but I believe the former is now reliable and the latter was born reliable.

> Or do we want to stick to what the below comment says?
>
> /*
>  * There shouldn't be anything for the archiver to do except to wait for a
>  * signal ... however, the archiver exists to protect our data, so she
>  * wakes up occasionally to allow herself to be proactive.
>  */

So I think this is still valid. If we want to eliminate useless wakeups, the archiver needs to remember whether the last iteration was fully done or not. But it seems a race condition is involved.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
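To make the trade-off discussed above concrete, here is a toy model of the wakeup policy (invented names and structure, not the actual pgarch.c logic): an intentional latch wakeup always proceeds, while the timeout path is only needed as a retry mechanism when a previous archiving iteration was interrupted (e.g. by disk full) and left work behind.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the archiver: latch_set stands in for the process latch,
 * last_iteration_complete for the state the archiver would need to
 * remember in order to drop the unconditional periodic wakeup. */
typedef struct ToyArchiver
{
    bool latch_set;                 /* e.g. a new WAL segment was finished */
    bool last_iteration_complete;   /* did the previous pass archive everything? */
} ToyArchiver;

/* Decide whether a wakeup should lead to an archiving pass. */
static bool
archiver_should_work(const ToyArchiver *a, bool timeout_fired)
{
    if (a->latch_set)
        return true;    /* intentional wakeup: always act on it */
    if (timeout_fired && !a->last_iteration_complete)
        return true;    /* timeout only matters to retry interrupted work */
    return false;       /* otherwise the periodic wakeup was useless */
}
```

The race condition mentioned above would live in updating last_iteration_complete concurrently with new .ready files appearing, which is exactly why the simple version keeps the timeout as a safety net.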
Re: Question regarding "Make archiver process an auxiliary process. commit"
On Tue, Dec 6, 2022 at 5:24 PM Bharath Rupireddy < bharath.rupireddyforpostg...@gmail.com> wrote: > On Tue, Dec 6, 2022 at 4:57 PM Sravan Kumar > wrote: > > > > Thank you for the feedback. > > > > I'll be glad to help with the fix. Here's the patch for review. > > Thanks. +1 for fixing this. >> I would like to quote recent discussions on reducing the useless >> wakeups or increasing the sleep/hibernation times in various processes >> to reduce the power savings [1] [2] [3] [4] [5]. With that in context, >> does the archiver need to wake up every 60 sec at all when its latch >> gets set (PgArchWakeup()) whenever the server switches to a new WAL >> file? What happens if we get rid of PGARCH_AUTOWAKE_INTERVAL and rely >> on its latch being set? If required, we can spread PgArchWakeup() to >> more places, no? > > I like the idea of not having to wake up intermittently and probably we should start a discussion about it. I see the following comment in PgArchWakeup(). /* * We don't acquire ProcArrayLock here. It's actually fine because * procLatch isn't ever freed, so we just can potentially set the wrong * process' (or no process') latch. Even in that case the archiver will * be relaunched shortly and will start archiving. */ While I do not fully understand the comment, it gives me an impression that the SetLatch() done here is counting on the timeout to wake up the archiver in some cases where the latch is wrongly set. The proposed idea is a behaviour change while this thread intends to clean up some code that's a result of the mentioned commit. So probably the proposed idea can be discussed as a separate thread. Before even answering the above questions, I think we need to see if > there're any cases where a process can miss SetLatch() calls (I don't > have an answer for that). > > Or do we want to stick to what the below comment says? > > /* > * There shouldn't be anything for the archiver to do except to wait > for a > * signal ... 
however, the archiver exists to protect our data, so she > * wakes up occasionally to allow herself to be proactive. > */ > > [1] > https://www.postgresql.org/message-id/CA%2BhUKGJCbcv8AtujLw3kEO2wRB7Ffzo1fmwaGG-tQLuMOjf6qQ%40mail.gmail.com > [2] > commit cd4329d9393f84dce34f0bd2dd936adc8ffaa213 > Author: Thomas Munro > Date: Tue Nov 29 11:28:08 2022 +1300 > > Remove promote_trigger_file. > > Previously, an idle startup (recovery) process would wake up every 5 > seconds to have a chance to poll for promote_trigger_file, even if that > GUC was not configured. That promotion triggering mechanism was > effectively superseded by pg_ctl promote and pg_promote() a long time > ago. There probably aren't many users left and it's very easy to > change > to the modern mechanisms, so we agreed to remove the feature. > > This is part of a campaign to reduce wakeups on idle systems. > > [3] https://commitfest.postgresql.org/41/4035/ > [4] https://commitfest.postgresql.org/41/4020/ > [5] > commit 05a7be93558c614ab89c794cb1d301ea9ff33199 > Author: Thomas Munro > Date: Tue Nov 8 20:36:36 2022 +1300 > > Suppress useless wakeups in walreceiver. > > Instead of waking up 10 times per second to check for various timeout > conditions, keep track of when we next have periodic work to do. > > -- > Bharath Rupireddy > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com > -- Thanks And Regards, Sravan Take life one day at a time.
Re: Force streaming every change in logical decoding
On Wed, Dec 7, 2022 at 12:55 PM Amit Kapila wrote: > > On Wed, Dec 7, 2022 at 7:31 AM Masahiko Sawada wrote: > > > > On Wed, Dec 7, 2022 at 8:46 AM Peter Smith wrote: > > > > > > > > > > > Yeah, I think this can also help in reducing the time for various > > > > tests in test_decoding/stream and > > > > src/test/subscription/t/*_stream_*.pl file by reducing the number of > > > > changes required to invoke streaming mode. Can we think of making this > > > > GUC extendible to even test more options on server-side (publisher) > > > > and client-side (subscriber) cases? For example, we can have something > > > > like logical_replication_mode with the following valid values: (a) > > > > server_serialize: this will serialize each change to file on > > > > publishers and then on commit restore and send all changes; (b) > > > > server_stream: this will stream each change as currently proposed in > > > > your patch. Then if we want to extend it for subscriber-side testing > > > > then we can introduce new options like client_serialize for the case > > > > being discussed in the email [1]. > > > > > > > > Thoughts? > > > > > > There is potential for lots of developer GUCs for testing/debugging in > > > the area of logical replication but IMO it might be better to keep > > > them all separated. Putting everything into a single > > > 'logical_replication_mode' might cause difficulties later when/if you > > > want combinations of the different modes. > > > > I think we want the developer option that forces streaming changes > > during logical decoding to be PGC_USERSET but probably the developer > > option for testing the parallel apply feature would be PGC_SIGHUP. > > > > Ideally, that is true but if we want to combine the multiple modes in > one parameter, is there a harm in keeping it as PGC_SIGHUP? It's not a big harm but we will end up doing ALTER SYSTEM and pg_reload_conf() even in regression tests (e.g. in test_decoding/stream.sql). 
> > > Also, since streaming changes is not specific to logical replication > > but to logical decoding, I'm not sure logical_replication_XXX is a > > good name. IMO having force_stream_mode and a different GUC for > > testing the parallel apply feature makes sense to me. > > > > But if we want to have a separate variable for testing/debugging > streaming like force_stream_mode, why not for serializing as well? And > if we want for both then we can even think of combining them in one > variable as logical_decoding_mode with values as 'stream' and > 'serialize'. Making it enum makes sense to me. > The first one specified would be given preference. Also, > the name force_stream_mode doesn't seem to convey that it is for > logical decoding. Agreed. > On one side having separate GUCs for publisher and subscriber seems to > give better flexibility but having more GUCs also sometimes makes them > less usable. Here, my thought was to have a single or as few GUCs as > possible which can be extendible by providing multiple values instead > of having different GUCs. I was trying to map this with the existing > string parameters in developer options. I see your point. On the other hand, I'm not sure it's a good idea to control different features by one GUC in general. The developer option could be an exception? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
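As a sketch of what the combined developer GUC could look like (the name logical_decoding_mode and its values come from the discussion above; the code is a stand-alone illustration, not PostgreSQL's GUC machinery), the enum approach maps each accepted string onto a distinct behavior and rejects anything else at assignment time:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical values for a combined developer GUC, per the thread. */
typedef enum LogicalDecodingMode
{
    LOGICAL_DECODING_MODE_DEFAULT,      /* normal buffering thresholds */
    LOGICAL_DECODING_MODE_STREAM,       /* stream each change immediately */
    LOGICAL_DECODING_MODE_SERIALIZE     /* spill each change to disk */
} LogicalDecodingMode;

/* Parse a GUC string value; return 0 on success, -1 on invalid input. */
static int
parse_logical_decoding_mode(const char *value, LogicalDecodingMode *mode)
{
    if (strcmp(value, "default") == 0)
        *mode = LOGICAL_DECODING_MODE_DEFAULT;
    else if (strcmp(value, "stream") == 0)
        *mode = LOGICAL_DECODING_MODE_STREAM;
    else if (strcmp(value, "serialize") == 0)
        *mode = LOGICAL_DECODING_MODE_SERIALIZE;
    else
        return -1;      /* assignment would fail, as for any enum GUC */
    return 0;
}
```

One design consequence visible even in this sketch: because the values are mutually exclusive members of one enum, "stream" and "serialize" cannot be combined, which is the flexibility concern raised earlier in the thread about folding several behaviors into a single parameter.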
RE: Time delayed LR (WAS Re: logical replication restrictions)
On Wednesday, December 7, 2022 12:00 PM Kyotaro Horiguchi wrote:
> At Tue, 6 Dec 2022 11:08:43 -0800, Andres Freund wrote in
> > Hi,
> >
> > The tests fail on cfbot:
> > https://cirrus-ci.com/task/4533866329800704
> >
> > They only seem to fail on 32bit linux.
> >
> > https://api.cirrus-ci.com/v1/artifact/task/4533866329800704/testrun/build-32/testrun/subscription/032_apply_delay/log/regress_log_032_apply_delay
> > [06:27:10.628](0.138s) ok 2 - check if the new rows were applied to
> > subscriber timed out waiting for match: (?^:logical replication apply
> > delay) at /tmp/cirrus-ci-build/src/test/subscription/t/032_apply_delay.pl line 124.
>
> It fails for me on 64bit Linux.. (Rocky 8.7)
>
> > t/032_apply_delay.pl ... Dubious, test returned 29 (wstat 7424, 0x1d00)
> > No subtests run
> ..
> > t/032_apply_delay.pl (Wstat: 7424 Tests: 0 Failed: 0)
> > Non-zero exit status: 29
> > Parse errors: No plan found in TAP output
>
> regards.

Hi, thank you so much for the notification! I'll look into the failures.

Best Regards,
Takamichi Osumi
Re: Time delayed LR (WAS Re: logical replication restrictions)
On Tue, Dec 6, 2022 at 5:44 PM Takamichi Osumi (Fujitsu) wrote: > > On Friday, December 2, 2022 4:05 PM Amit Kapila > wrote: > > On Tue, Nov 15, 2022 at 12:33 PM Amit Kapila > > wrote: > > > One more thing I would like you to consider is the point raised by me > > > related to this patch's interaction with the parallel apply feature as > > > mentioned in the email [1]. I am not sure the idea proposed in that > > > email [1] is a good one because delaying after applying commit may not > > > be good as we want to delay the apply of the transaction(s) on > > > subscribers by this feature. I feel this needs more thought. > > > > > > > I have thought a bit more about this and we have the following options to > > choose the delay point from. (a) apply delay just before committing a > > transaction. As mentioned in comments in the patch this can lead to bloat > > and > > locks held for a long time. (b) apply delay before starting to apply > > changes for a > > transaction but here the problem is which time to consider. In some cases, > > like > > for streaming transactions, we don't receive the commit/prepare xact time in > > the start message. (c) use (b) but use the previous transaction's commit > > time. > > (d) apply delay after committing a transaction by using the xact's commit > > time. > > > > At this stage, among above, I feel any one of (c) or (d) is worth > > considering. Now, > > the difference between (c) and (d) is that if after commit the next xact's > > data is > > already delayed by more than min_apply_delay time then we don't need to kick > > the new logic of apply delay. > > > > The other thing to consider whether we need to process any keepalive > > messages during the delay because otherwise, walsender may think that the > > subscriber is not available and time out. This may not be a problem for > > synchronous replication but otherwise, it could be a problem. > > > > Thoughts? > Hi, > > > Thank you for your comments ! 
> Below is some analysis of the major points above.
>
> (1) About the timing to apply the delay
>
> An approach like (b) would be best. The idea is to delay the application of
> all types of transactions based on the time when a transaction arrives at
> the subscriber node.
>

But I think it will unnecessarily add delay when there is a delay in sending the transaction by the publisher (say because the publisher was busy handling other workloads, or there was a temporary network communication break between publisher and subscriber). This could probably be the reason why physical replication (via recovery_min_apply_delay) uses the commit time of the sending side.

> One advantage of this approach over (c) and (d) is that this can avoid the
> case where we might apply a transaction immediately without waiting,
> if there are two transactions sequentially and the time in between exceeds
> the min_apply_delay time.
>

I am not sure I understand your point. However, I think even if the transactions are sequential, if the time between them exceeds min_apply_delay (say because the publisher was down), there is no need to apply additional delay.

> When we receive stream-in-progress transactions, we'll check whether the
> time for delay has passed or not at first in this approach.
>
>
> (2) About the timeout issue
>
> When having a look at the physical replication internals,
> it conducts sending feedback and application of delay separately on
> different processes.
> OTOH, the logical replication needs to achieve those within one process.
>
> When we want to apply delay and avoid the timeout,
> we should not store all the transactions data into memory.
> So, one approach for this is to serialize the transaction data and after
> the delay, we apply the transactions data.
>

It is not clear to me how this will avoid a timeout.

> However, this means if users adopt this feature,
> then all transaction data that should be delayed would be serialized.
> We are not sure if this sounds like a valid approach or not.
>
> Another approach was to divide the delay time in apply_delay() and
> utilize the divided time for WaitLatch, sending the keepalive messages
> from there.
>

Do we ever send keepalive messages from the apply side? I think we only send feedback reply messages as a response to the publisher's keep_alive message. So, we need to do something similar for this if you want to follow this approach.

--
With Regards,
Amit Kapila.
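The "divide the delay time" idea above can be sketched as follows (a stand-alone toy: the interval constant and function names are invented, and the real code would use WaitLatch() and the existing feedback-message machinery): rather than one long sleep that could let the walsender's timeout expire, the delay is consumed in slices no longer than the keepalive interval, with a feedback message sent between slices.

```c
#include <assert.h>

/* Illustrative interval; the real value would derive from wal_sender_timeout. */
#define KEEPALIVE_INTERVAL_MS 1000

/* Consume remaining_ms of apply delay in keepalive-sized slices.
 * Returns the number of keepalive/feedback messages that would be sent. */
static int
apply_delay_in_slices(int remaining_ms)
{
    int keepalives_sent = 0;

    while (remaining_ms > 0)
    {
        int slice = remaining_ms < KEEPALIVE_INTERVAL_MS
            ? remaining_ms : KEEPALIVE_INTERVAL_MS;

        /* WaitLatch(..., slice, ...) would sleep here */
        remaining_ms -= slice;

        if (remaining_ms > 0)
            keepalives_sent++;  /* reply so the walsender doesn't time out */
    }
    return keepalives_sent;
}
```

For a 3.5-second delay with a 1-second interval this sleeps in four slices (1000, 1000, 1000, 500 ms) and replies three times in between, which addresses the timeout concern without serializing transaction data.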
Re: Perform streaming logical transactions by background workers and parallel apply
On Wed, Dec 7, 2022 at 1:29 PM Amit Kapila wrote: > > On Wed, Dec 7, 2022 at 9:00 AM Masahiko Sawada wrote: > > > > On Thu, Dec 1, 2022 at 7:17 PM houzj.f...@fujitsu.com > > wrote: > > > > > > > --- > > > > if (am_parallel_apply_worker() && on_subinfo_change) { > > > > /* > > > > * If a parallel apply worker exits due to the subscription > > > > * information change, we notify the leader apply worker so that the > > > > * leader can report more meaningful message in time and restart the > > > > * logical replication. > > > > */ > > > > pq_putmessage('X', NULL, 0); > > > > } > > > > > > > > and > > > > > > > > ereport(ERROR, > > > > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > > > errmsg("logical replication parallel apply worker > > > > exited > > > > because of subscription information change"))); > > > > > > > > Do we really need an additional message in case of 'X'? When we call > > > > apply_worker_clean_exit with on_subinfo_change = true, we have reported > > > > the > > > > error message such as: > > > > > > > > ereport(LOG, > > > > (errmsg("logical replication parallel apply worker for > > > > subscription > > > > \"%s\" will stop because of a parameter change", > > > > MySubscription->name))); > > > > > > > > I think that reporting a similar message from the leader might not be > > > > meaningful for users. > > > > > > The intention is to let leader report more meaningful message if a worker > > > exited due to subinfo change. Otherwise, the leader is likely to report an > > > error like " lost connection ... to parallel apply worker" when trying to > > > send > > > data via shared memory if the worker exited. What do you think ? > > > > Agreed. But do we need to have the leader exit with an error in spite > > of the fact that the worker cleanly exits? If the leader exits with an > > error, the subscription will be disabled if disable_on_error is true, > > right? 
> > > > Right, but the leader will anyway exit at some point either due to an > ERROR like "lost connection ... to parallel worker" or with a LOG > like: "... will restart because of a parameter change" but I see your > point. So, will it be better if we have a LOG message here and then > proc_exit()? Do you have something else in mind for this? No, I was thinking that too. It's better to write a LOG message and do proc_exit(). Regarding the error "lost connection ... to parallel worker", it could still happen depending on the timing even if the parallel worker cleanly exits due to parameter changes, right? If so, I'm concerned that it could lead to disable the subscription unexpectedly if disable_on_error is enabled. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
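The clean-exit handling agreed on above can be sketched, outside the server, roughly as follows. This is a hypothetical model only: the enum, the function name, and the return convention are illustrative and not taken from the patch, which would instead use ereport(LOG, ...) followed by proc_exit() in the leader apply worker.

```c
#include <assert.h>

/* Hypothetical model of the leader's decision: a parallel apply worker
 * that exited cleanly because of a subscription parameter change should
 * result in a LOG message plus a normal exit, while any other worker
 * exit remains an ERROR.  Keeping the clean case out of ERROR avoids
 * tripping disable_on_error for an expected shutdown. */
typedef enum
{
	WORKER_EXIT_UNEXPECTED,		/* crash, lost connection, ... */
	WORKER_EXIT_SUBINFO_CHANGE	/* clean exit on parameter change */
} WorkerExitReason;

/* Returns 1 if the leader should LOG and proc_exit(), 0 if it should
 * raise an ERROR. */
int
leader_exits_cleanly(WorkerExitReason reason)
{
	return reason == WORKER_EXIT_SUBINFO_CHANGE;
}
```

The point of the model is just that the severity decision hinges on whether the worker's exit was expected, which is what the 'X' message from the worker communicates to the leader.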
Re: Perform streaming logical transactions by background workers and parallel apply
On Wed, Dec 7, 2022 at 9:00 AM Masahiko Sawada wrote: > > On Thu, Dec 1, 2022 at 7:17 PM houzj.f...@fujitsu.com > wrote: > > > > > --- > > > if (am_parallel_apply_worker() && on_subinfo_change) { > > > /* > > > * If a parallel apply worker exits due to the subscription > > > * information change, we notify the leader apply worker so that the > > > * leader can report more meaningful message in time and restart the > > > * logical replication. > > > */ > > > pq_putmessage('X', NULL, 0); > > > } > > > > > > and > > > > > > ereport(ERROR, > > > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > > errmsg("logical replication parallel apply worker exited > > > because of subscription information change"))); > > > > > > Do we really need an additional message in case of 'X'? When we call > > > apply_worker_clean_exit with on_subinfo_change = true, we have reported > > > the > > > error message such as: > > > > > > ereport(LOG, > > > (errmsg("logical replication parallel apply worker for > > > subscription > > > \"%s\" will stop because of a parameter change", > > > MySubscription->name))); > > > > > > I think that reporting a similar message from the leader might not be > > > meaningful for users. > > > > The intention is to let leader report more meaningful message if a worker > > exited due to subinfo change. Otherwise, the leader is likely to report an > > error like " lost connection ... to parallel apply worker" when trying to > > send > > data via shared memory if the worker exited. What do you think ? > > Agreed. But do we need to have the leader exit with an error in spite > of the fact that the worker cleanly exits? If the leader exits with an > error, the subscription will be disabled if disable_on_error is true, > right? > Right, but the leader will anyway exit at some point either due to an ERROR like "lost connection ... to parallel worker" or with a LOG like: "... will restart because of a parameter change" but I see your point. 
So, will it be better if we have a LOG message here and then proc_exit()? Do you have something else in mind for this? -- With Regards, Amit Kapila.
RE: Avoid streaming the transaction which are skipped (in corner cases)
On Wed, Dec 7, 2022 12:01 PM Dilip Kumar wrote: > > On Wed, Dec 7, 2022 at 9:28 AM Amit Kapila > wrote: > > > > On Tue, Dec 6, 2022 at 11:55 AM shiy.f...@fujitsu.com > > wrote: > > > > > > On Mon, Dec 5, 2022 6:57 PM Amit Kapila > wrote: > > > > > > > > On Mon, Dec 5, 2022 at 3:41 PM Dilip Kumar > wrote: > > > > > > > > > > I think we need something like this[1] so that we can better control > > > > > the streaming. > > > > > > > > > > > > > +1. The additional advantage would be that we can generate parallel > > > > apply and new streaming tests with much lesser data. Shi-San, can you > > > > please start a new thread for the GUC patch proposed by you as > > > > indicated by Dilip? > > > > > > > > > > OK, I started a new thread for it. [1] > > > > > > > Thanks. I think it is better to go ahead with this patch and once we > > decide what is the right thing to do in terms of GUC then we can try > > to add additional tests for this. Anyway, it is not that the code > > added by this patch is not getting covered by existing tests. What do > > you think? > > That makes sense to me. > +1 Regards, Shi yu
Re: Avoid streaming the transaction which are skipped (in corner cases)
On Wed, Dec 7, 2022 at 9:28 AM Amit Kapila wrote: > > On Tue, Dec 6, 2022 at 11:55 AM shiy.f...@fujitsu.com > wrote: > > > > On Mon, Dec 5, 2022 6:57 PM Amit Kapila wrote: > > > > > > On Mon, Dec 5, 2022 at 3:41 PM Dilip Kumar wrote: > > > > > > > > I think we need something like this[1] so that we can better control > > > > the streaming. > > > > > > > > > > +1. The additional advantage would be that we can generate parallel > > > apply and new streaming tests with much lesser data. Shi-San, can you > > > please start a new thread for the GUC patch proposed by you as > > > indicated by Dilip? > > > > > > > OK, I started a new thread for it. [1] > > > > Thanks. I think it is better to go ahead with this patch and once we > decide what is the right thing to do in terms of GUC then we can try > to add additional tests for this. Anyway, it is not that the code > added by this patch is not getting covered by existing tests. What do > you think? That makes sense to me. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Avoid streaming the transaction which are skipped (in corner cases)
On Tue, Dec 6, 2022 at 11:55 AM shiy.f...@fujitsu.com wrote: > > On Mon, Dec 5, 2022 6:57 PM Amit Kapila wrote: > > > > On Mon, Dec 5, 2022 at 3:41 PM Dilip Kumar wrote: > > > > > > I think we need something like this[1] so that we can better control > > > the streaming. > > > > > > > +1. The additional advantage would be that we can generate parallel > > apply and new streaming tests with much lesser data. Shi-San, can you > > please start a new thread for the GUC patch proposed by you as > > indicated by Dilip? > > > > OK, I started a new thread for it. [1] > Thanks. I think it is better to go ahead with this patch and once we decide what is the right thing to do in terms of GUC then we can try to add additional tests for this. Anyway, it is not that the code added by this patch is not getting covered by existing tests. What do you think? -- With Regards, Amit Kapila.
Re: Force streaming every change in logical decoding
On Wed, Dec 7, 2022 at 7:31 AM Masahiko Sawada wrote: > > On Wed, Dec 7, 2022 at 8:46 AM Peter Smith wrote: > > > > > > > > Yeah, I think this can also help in reducing the time for various > > > tests in test_decoding/stream and > > > src/test/subscription/t/*_stream_*.pl file by reducing the number of > > > changes required to invoke streaming mode. Can we think of making this > > > GUC extendible to even test more options on server-side (publisher) > > > and client-side (subscriber) cases? For example, we can have something > > > like logical_replication_mode with the following valid values: (a) > > > server_serialize: this will serialize each change to file on > > > publishers and then on commit restore and send all changes; (b) > > > server_stream: this will stream each change as currently proposed in > > > your patch. Then if we want to extend it for subscriber-side testing > > > then we can introduce new options like client_serialize for the case > > > being discussed in the email [1]. > > > > > > Thoughts? > > > > There is potential for lots of developer GUCs for testing/debugging in > > the area of logical replication but IMO it might be better to keep > > them all separated. Putting everything into a single > > 'logical_replication_mode' might cause difficulties later when/if you > > want combinations of the different modes. > > I think we want the developer option that forces streaming changes > during logical decoding to be PGC_USERSET but probably the developer > option for testing the parallel apply feature would be PGC_SIGHUP. > Ideally, that is true but if we want to combine the multiple modes in one parameter, is there a harm in keeping it as PGC_SIGHUP? > Also, since streaming changes is not specific to logical replication > but to logical decoding, I'm not sure logical_replication_XXX is a > good name. IMO having force_stream_mode and a different GUC for > testing the parallel apply feature makes sense to me. 
> But if we want to have a separate variable for testing/debugging streaming like force_stream_mode, why not for serializing as well? And if we want for both then we can even think of combining them in one variable as logical_decoding_mode with values as 'stream' and 'serialize'. The first one specified would be given preference. Also, the name force_stream_mode doesn't seem to convey that it is for logical decoding. We can probably have a separate variable for the subscriber side. On one side having separate GUCs for publisher and subscriber seems to give better flexibility but having more GUCs also sometimes makes them less usable. Here, my thought was to have a single or as few GUCs as possible which can be extendible by providing multiple values instead of having different GUCs. I was trying to map this with the existing string parameters in developer options. -- With Regards, Amit Kapila.
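The single-parameter design sketched above (one logical_decoding_mode with multiple values, rather than several boolean GUCs) would take the shape of an ordinary enum-valued setting. The type and value names below follow the wording of the email and are purely illustrative; they are not from any committed patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch of an enum-valued logical_decoding_mode setting
 * with the values discussed in the thread.  An unset or unrecognized
 * value falls back to the default (no forcing). */
typedef enum
{
	LOGICAL_DECODING_MODE_BUFFERED,		/* default: no forcing */
	LOGICAL_DECODING_MODE_STREAM,		/* force streaming each change */
	LOGICAL_DECODING_MODE_SERIALIZE		/* force serializing to disk */
} LogicalDecodingMode;

/* Mirrors how an enum GUC maps its string value to an internal code. */
LogicalDecodingMode
parse_logical_decoding_mode(const char *value)
{
	if (value != NULL && strcmp(value, "stream") == 0)
		return LOGICAL_DECODING_MODE_STREAM;
	if (value != NULL && strcmp(value, "serialize") == 0)
		return LOGICAL_DECODING_MODE_SERIALIZE;
	return LOGICAL_DECODING_MODE_BUFFERED;
}
```

Modeling it this way makes the extensibility argument concrete: adding a new behavior is a new enum value rather than a new GUC, at the cost of not being able to combine modes.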
Re: Perform streaming logical transactions by background workers and parallel apply
On Thu, Dec 1, 2022 at 7:17 PM houzj.f...@fujitsu.com wrote: > > On Thursday, December 1, 2022 3:58 PM Masahiko Sawada > wrote: > > > > On Wed, Nov 30, 2022 at 10:51 PM houzj.f...@fujitsu.com > > wrote: > > > > > > On Wednesday, November 30, 2022 9:41 PM houzj.f...@fujitsu.com > > wrote: > > > > > > > > On Tuesday, November 29, 2022 8:34 PM Amit Kapila > > > > > Review comments on v53-0001* > > > > > > > > Attach the new version patch set. > > > > > > Sorry, there were some mistakes in the previous patch set. > > > Here is the correct V54 patch set. I also ran pgindent for the patch set. > > > > > > > Thank you for updating the patches. Here are random review comments for > > 0001 and 0002 patches. > > Thanks for the comments! > > > > > ereport(ERROR, > > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > errmsg("logical replication parallel apply worker exited > > abnormally"), > > errcontext("%s", edata.context))); and > > > > ereport(ERROR, > > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > errmsg("logical replication parallel apply worker exited > > because of subscription information change"))); > > > > I'm not sure ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE is appropriate > > here. Given that parallel apply worker has already reported the error > > message > > with the error code, I think we don't need to set the errorcode for the logs > > from the leader process. > > > > Also, I'm not sure the term "exited abnormally" is appropriate since we use > > it > > when the server crashes for example. I think ERRORs reported here don't mean > > that in general. > > How about reporting "xxx worker exited due to error" ? Sounds better to me. > > > --- > > if (am_parallel_apply_worker() && on_subinfo_change) { > > /* > > * If a parallel apply worker exits due to the subscription > > * information change, we notify the leader apply worker so that the > > * leader can report more meaningful message in time and restart the > > * logical replication. 
> > */ > > pq_putmessage('X', NULL, 0); > > } > > > > and > > > > ereport(ERROR, > > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > errmsg("logical replication parallel apply worker exited > > because of subscription information change"))); > > > > Do we really need an additional message in case of 'X'? When we call > > apply_worker_clean_exit with on_subinfo_change = true, we have reported the > > error message such as: > > > > ereport(LOG, > > (errmsg("logical replication parallel apply worker for subscription > > \"%s\" will stop because of a parameter change", > > MySubscription->name))); > > > > I think that reporting a similar message from the leader might not be > > meaningful for users. > > The intention is to let leader report more meaningful message if a worker > exited due to subinfo change. Otherwise, the leader is likely to report an > error like " lost connection ... to parallel apply worker" when trying to send > data via shared memory if the worker exited. What do you think ? Agreed. But do we need to have the leader exit with an error in spite of the fact that the worker cleanly exits? If the leader exits with an error, the subscription will be disabled if disable_on_error is true, right? And what do you think about the error code? > > > --- > > -if (options->proto.logical.streaming && > > -PQserverVersion(conn->streamConn) >= 14) > > -appendStringInfoString(, ", streaming 'on'"); > > +if (options->proto.logical.streaming_str) > > +appendStringInfo(, ", streaming '%s'", > > + > > options->proto.logical.streaming_str); > > > > and > > > > +/* > > + * Assign the appropriate option value for streaming option > > according to > > + * the 'streaming' mode and the publisher's ability to > > support that mode. 
> > + */ > > +if (server_version >= 16 && > > +MySubscription->stream == SUBSTREAM_PARALLEL) > > +{ > > +options.proto.logical.streaming_str = pstrdup("parallel"); > > +MyLogicalRepWorker->parallel_apply = true; > > +} > > +else if (server_version >= 14 && > > + MySubscription->stream != SUBSTREAM_OFF) > > +{ > > +options.proto.logical.streaming_str = pstrdup("on"); > > +MyLogicalRepWorker->parallel_apply = false; > > +} > > +else > > +{ > > +options.proto.logical.streaming_str = NULL; > > +MyLogicalRepWorker->parallel_apply = false; > > +} > > > > This change moves the code of adjustment of the streaming option based on > > the publisher server version from libpqwalreceiver.c to worker.c. > > On the
Re: [RFC] building postgres with meson - v13
Hi, On 2022-09-26 14:15:35 -0700, Peter Geoghegan wrote: > On Mon, Sep 26, 2022 at 1:27 PM Andres Freund wrote: > > > Some feedback: > > > * I gather that "running" as it appears in commands like "meson test > > > --setup running" refers to a particular setup named "running", that > > > you invented as part of creating a meson-ish substitute for > > > installcheck. Can "running" be renamed to something that makes it > > > obvious that it's a Postgres thing, and not a generic meson thing? > > > > Yes. The only caveat is that it makes lines longer, because it's included in > > the printed test line (there's no real requirement to have the test suite > > and > > the setup named the same, but it seems confusing not to) > > Probably doesn't have to be too long. And I'm not sure of the details. > Just a small thing from my point of view. Attached is an updated version of that patch. I left the name as 'running' because a postgres- or pg- prefix felt too awkward. This just adds fairly minimal documentation for the 'running' setup; while we now have some basic docs for building with meson, we don't yet have a "translation" of regress.sgml. Not sure how to structure that best, either. I plan to commit that soon. This likely isn't the be-all-end-all, but it's quite useful as-is. Greetings, Andres Freund >From a50cc9ac3045b0ef1cfd3cc76b3ff5759bdb85ef Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Tue, 6 Dec 2022 19:24:44 -0800 Subject: [PATCH v2] meson: Add 'running' test setup, as a replacement for installcheck To run all tests that support running against existing server: $ meson test --setup running To run just the main pg_regress tests against existing server: $ meson test --setup running regress-running/regress To ensure the 'running' setup continues to work, test it as part of the freebsd CI task.

Discussion: https://postgr.es/m/CAH2-Wz=xdqcmloo7rr_i6fkqddmcyb9q5gstnfuuqxroghb...@mail.gmail.com --- meson.build | 91 --- contrib/basic_archive/meson.build | 3 + contrib/pg_freespacemap/meson.build | 3 + contrib/pg_stat_statements/meson.build| 4 + contrib/pg_walinspect/meson.build | 3 + contrib/test_decoding/meson.build | 5 + src/interfaces/ecpg/test/meson.build | 1 + doc/src/sgml/installation.sgml| 6 ++ .cirrus.yml | 15 +++ src/test/isolation/meson.build| 1 + src/test/modules/commit_ts/meson.build| 3 + src/test/modules/snapshot_too_old/meson.build | 3 + src/test/modules/test_oat_hooks/meson.build | 1 + src/test/modules/test_pg_dump/meson.build | 2 + src/test/modules/test_slru/meson.build| 1 + src/test/modules/worker_spi/meson.build | 4 +- src/test/regress/meson.build | 1 + 17 files changed, 132 insertions(+), 15 deletions(-) diff --git a/meson.build b/meson.build index 39fc3ddab26..3cb50c0b172 100644 --- a/meson.build +++ b/meson.build @@ -2920,6 +2920,20 @@ endif # Test Generation ### +# When using a meson version understanding exclude_suites, define a +# 'tmp_install' test setup (the default) that excludes tests running against a +# pre-existing install and a 'running' setup that conflicts with creation of +# the temporary installation and tap tests (which don't support running +# against a running server). 
+ +running_suites = [] +install_suites = [] +if meson.version().version_compare('>=0.57') + runningcheck = true +else + runningcheck = false +endif + testwrap = files('src/tools/testwrap') foreach test_dir : tests @@ -2927,7 +2941,6 @@ foreach test_dir : tests testwrap, '--basedir', meson.build_root(), '--srcdir', test_dir['sd'], -'--testgroup', test_dir['name'], ] foreach kind, v : test_dir @@ -2940,55 +2953,94 @@ foreach test_dir : tests if kind in ['regress', 'isolation', 'ecpg'] if kind == 'regress' runner = pg_regress +fallback_dbname = 'regression_@0@' elif kind == 'isolation' runner = pg_isolation_regress +fallback_dbname = 'isolation_regression_@0@' elif kind == 'ecpg' runner = pg_regress_ecpg +fallback_dbname = 'ecpg_regression_@0@' endif - test_output = test_result_dir / test_dir['name'] / kind + test_group = test_dir['name'] + test_group_running = test_dir['name'] + '-running' - test_command = [ + test_output = test_result_dir / test_group / kind + test_output_running = test_result_dir / test_group_running/ kind + + # Unless specified by the test, choose a non-conflicting database name, + # to avoid conflicts when running against existing server. + dbname = t.get('dbname', +fallback_dbname.format(test_dir['name'])) + + test_command_base = [ runner.full_path(),
Re: ANY_VALUE aggregate
On Mon, Dec 5, 2022 at 10:40 PM Vik Fearing wrote: > On 12/6/22 05:57, David G. Johnston wrote: > > On Mon, Dec 5, 2022 at 9:48 PM Vik Fearing > wrote: > > > >> I can imagine an optimization that would remove an ORDER BY clause > >> because it isn't needed for any other aggregate. > > > > > > I'm referring to the query: > > > > select any_value(v order by v) from (values (2),(1),(3)) as vals (v); > > // produces 1, per the documented implementation-defined behavior. > > Implementation-dependent. It is NOT implementation-defined, per spec. > > I really don't care all that much about the spec here given that ORDER BY in an aggregate call is non-spec. We often loosen the spec rules when they don't make technical sense to > us, but I don't know of any example of when we have tightened them. > The function has to choose some row from among its inputs, and the system has to obey an order by specification added to the function call. You are de-facto creating a first_value aggregate (which is by definition non-standard) whether you like it or not. I'm just saying to be upfront and honest about it - our users do want such a capability so maybe accept that there is a first time for everything. Not that picking an advantageous "implementation-dependent" implementation should be considered deviating from the spec. > > Someone writing: > > > > select any_value(v) from (values (2),(1),(3)) as vals (v) order by v; > > > > Is not presently, nor am I saying, promised the value 1. > > > > I'm assuming you are thinking of the second query form, while the > guarantee > > only needs to apply to the first. > > I am saying that a theoretical pg_aggregate.aggorderdoesnotmatter could > bestow upon ANY_VALUE the ability to make those two queries equivalent. > That theoretical idea should not be entertained. Removing a user's explicitly added ORDER BY should be off-limits. 
Any approach at optimization here should simply look at whether an ORDER BY is specified and pass that information to the function. If the function itself really believes that ordering matters it can emit its own runtime exception stating that fact and the user can fix their query. > If you care about which value you get back, use something else. > > There isn't a "something else" to use so that isn't presently an option. I suppose it comes down to what level of belief and care you have that people will simply mis-use this function if it is added in its current form to get the desired first_value effect that it produces. David J.
Re: Commit fest 2022-11
On Wed, Dec 07, 2022 at 08:25:25AM +0530, vignesh C wrote: > The Commitfest 2022-11 status still shows as "In Progress", Shouldn't > the status be changed to "Closed" and the entries be moved to the next > commitfest. Yes, Ian has told me that he is on it this week. -- Michael
Re: Using WaitEventSet in the postmaster
Oops, v5 was broken as visible on cfbot (a last second typo broke it). Here's a better one. From 07b04dc410118ad04fd0006edda7ba80f241357a Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 6 Dec 2022 15:21:11 +1300 Subject: [PATCH v6 1/4] Add WL_SOCKET_ACCEPT event to WaitEventSet API. To be able to handle incoming connections on a server socket with the WaitEventSet API, we'll need a new kind of event to indicate that the socket is ready to accept a connection. On Unix, it's just the same as WL_SOCKET_READABLE, but on Windows there is a different kernel event that we need to map our abstraction to. A future commit will use this. Reviewed-by: Andres Freund Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com --- src/backend/storage/ipc/latch.c | 13 - src/include/storage/latch.h | 7 +++ 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c index eb3a569aae..7ced8264f0 100644 --- a/src/backend/storage/ipc/latch.c +++ b/src/backend/storage/ipc/latch.c @@ -864,6 +864,9 @@ FreeWaitEventSet(WaitEventSet *set) * - WL_SOCKET_CONNECTED: Wait for socket connection to be established, * can be combined with other WL_SOCKET_* events (on non-Windows * platforms, this is the same as WL_SOCKET_WRITEABLE) + * - WL_SOCKET_ACCEPT: Wait for new connection to a server socket, + * can be combined with other WL_SOCKET_* events (on non-Windows + * platforms, this is the same as WL_SOCKET_READABLE) + * - WL_SOCKET_CLOSED: Wait for socket to be closed by remote peer. * - WL_EXIT_ON_PM_DEATH: Exit immediately if the postmaster dies * @@ -874,7 +877,7 @@ FreeWaitEventSet(WaitEventSet *set) * i.e. it must be a process-local latch initialized with InitLatch, or a * shared latch associated with the current process by calling OwnLatch.
* - * In the WL_SOCKET_READABLE/WRITEABLE/CONNECTED cases, EOF and error + * In the WL_SOCKET_READABLE/WRITEABLE/CONNECTED/ACCEPT cases, EOF and error * conditions cause the socket to be reported as readable/writable/connected, * so that the caller can deal with the condition. * @@ -1312,6 +1315,8 @@ WaitEventAdjustWin32(WaitEventSet *set, WaitEvent *event) flags |= FD_WRITE; if (event->events & WL_SOCKET_CONNECTED) flags |= FD_CONNECT; + if (event->events & WL_SOCKET_ACCEPT) + flags |= FD_ACCEPT; if (*handle == WSA_INVALID_EVENT) { @@ -2067,6 +2072,12 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout, /* connected */ occurred_events->events |= WL_SOCKET_CONNECTED; } + if ((cur_event->events & WL_SOCKET_ACCEPT) && + (resEvents.lNetworkEvents & FD_ACCEPT)) + { + /* incoming connection could be accepted */ + occurred_events->events |= WL_SOCKET_ACCEPT; + } if (resEvents.lNetworkEvents & FD_CLOSE) { /* EOF/error, so signal all caller-requested socket flags */ diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h index 68ab740f16..c55838db60 100644 --- a/src/include/storage/latch.h +++ b/src/include/storage/latch.h @@ -135,9 +135,16 @@ typedef struct Latch #define WL_SOCKET_CONNECTED WL_SOCKET_WRITEABLE #endif #define WL_SOCKET_CLOSED (1 << 7) +#ifdef WIN32 +#define WL_SOCKET_ACCEPT (1 << 8) +#else +/* avoid having to deal with case on platforms not requiring it */ +#define WL_SOCKET_ACCEPT WL_SOCKET_READABLE +#endif #define WL_SOCKET_MASK (WL_SOCKET_READABLE | \ WL_SOCKET_WRITEABLE | \ WL_SOCKET_CONNECTED | \ + WL_SOCKET_ACCEPT | \ WL_SOCKET_CLOSED) typedef struct WaitEvent -- 2.35.1 From 827866959dbbe537f6677271093f6d7730bd2527 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 6 Dec 2022 16:13:36 +1300 Subject: [PATCH v6 2/4] Don't leak a signalfd when using latches in the postmaster. At the time of commit 6a2a70a02 we didn't use latch infrastructure in the postmaster. 
We're planning to start doing that, so we'd better make sure that the signalfd inherited from a postmaster is not duplicated and then leaked in the child. Reviewed-by: Andres Freund Reviewed-by: Justin Pryzby Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com --- src/backend/storage/ipc/latch.c | 16 1 file changed, 16 insertions(+) diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c index 7ced8264f0..b32c96b63d 100644 --- a/src/backend/storage/ipc/latch.c +++ b/src/backend/storage/ipc/latch.c @@ -283,6 +283,22 @@ InitializeLatchSupport(void) #ifdef WAIT_USE_SIGNALFD sigset_t signalfd_mask; + if (IsUnderPostmaster) + { + /* + * It would probably be safe to re-use the inherited signalfd since + * signalfds only see the current process's pending signals, but it + * seems less surprising to close it and create our own. + */ + if (signal_fd != -1) + { + /* Release postmaster's signal FD; ignore
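The WL_SOCKET_ACCEPT aliasing from the first patch can be demonstrated standalone: on non-Windows platforms the new event is simply another name for WL_SOCKET_READABLE, so the bit values collapse and a wait for "accept ready" compiles into a plain readability wait. The flag values below are copied from the patch's latch.h hunk; the helper function is illustrative only and is not PostgreSQL code.

```c
#include <assert.h>

/* Flag values as in src/include/storage/latch.h.  On Unix,
 * WL_SOCKET_ACCEPT is #defined to WL_SOCKET_READABLE; on Windows it
 * would be a distinct bit (1 << 8) mapped to the FD_ACCEPT kernel
 * event. */
#define WL_SOCKET_READABLE	(1 << 1)
#define WL_SOCKET_WRITEABLE	(1 << 2)

#ifdef WIN32
#define WL_SOCKET_ACCEPT	(1 << 8)
#else
#define WL_SOCKET_ACCEPT	WL_SOCKET_READABLE
#endif

/* Illustrative helper: build the event mask a caller waiting on a
 * server (listening) socket would request. */
int
server_socket_wait_mask(void)
{
	return WL_SOCKET_ACCEPT;
}
```

The design choice is that callers express intent (accept readiness) while platforms that don't distinguish the two cases pay nothing for the extra name.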
Re: meson PGXS compatibility
Hi, On 2022-12-01 12:17:51 -0800, Andres Freund wrote: > I'll push 0001, 0002 shortly. I don't think 0002 is the most elegant approach, > but it's not awful. I'd still like some review for 0003, but will try to push > it in a few days if that's not forthcoming. Pushed 0003 (move export_dynamic determination to configure.ac) and 0004 (PGXS compatibility). Hope there's no fallout from 0003. Greetings, Andres Freund
RE: Perform streaming logical transactions by background workers and parallel apply
On Tue, Dec 6, 2022 7:57 AM Peter Smith wrote: > Here are my review comments for patch v55-0002 Thanks for your comments. > == > > .../replication/logical/applyparallelworker.c > > 1. pa_can_start > > @@ -276,9 +278,9 @@ pa_can_start(TransactionId xid) > /* > * Don't start a new parallel worker if user has set skiplsn as it's > * possible that user want to skip the streaming transaction. For > - * streaming transaction, we need to spill the transaction to disk so > that > - * we can get the last LSN of the transaction to judge whether to > skip > - * before starting to apply the change. > + * streaming transaction, we need to serialize the transaction to a > + file > + * so that we can get the last LSN of the transaction to judge > + whether to > + * skip before starting to apply the change. > */ > if (!XLogRecPtrIsInvalid(MySubscription->skiplsn)) > return false; > > I think the wording change may belong in patch 0001 because it has > nothing to do with partial serializing. Changed. > ~~~ > > 2. pa_free_worker > > + /* > + * Stop the worker if there are enough workers in the pool. > + * > + * XXX The worker is also stopped if the leader apply worker needed > + to > + * serialize part of the transaction data due to a send timeout. This > + is > + * because the message could be partially written to the queue due to > + send > + * timeout and there is no way to clean the queue other than > + resending the > + * message until it succeeds. To avoid complexity, we directly stop > + the > + * worker in this case. > + */ > + if (winfo->serialize_changes || > + napplyworkers > (max_parallel_apply_workers_per_subscription / 2)) > > Don't need to say "due to send timeout" 2 times in 2 sentences. > > SUGGESTION > XXX The worker is also stopped if the leader apply worker needed to > serialize part of the transaction data due to a send timeout.
This is > because the message could be partially written to the queue but there > is no way to clean the queue other than resending the message until it > succeeds. Directly stopping the worker avoids needing this complexity. Changed. > 4. > > /* > + * Replay the spooled messages in the parallel apply worker if the > +leader apply > + * worker has finished serializing changes to the file. > + */ > +static void > +pa_spooled_messages(void) > > I'm not 100% sure of the logic, so IMO maybe the comment should say a > bit more about how this works: > > Specifically, let's say there was some timeout and the LA needed to > write the spool file, then let's say the PA timed out and found itself > inside this function. Now, let's say the LA is still busy writing the > file -- so what happens next? > > Does this function simply return, then the main PA loop waits again, > then the times out again, then PA finds itself back inside this > function again... and that keeps happening over and over until > eventually the spool file is found FS_READY? Some explanatory comments > might help. Slightly changed the logic and comment here. > ~ > > 5. > > + /* > + * Check if changes have been serialized to a file. if so, read and > + apply > + * them. > + */ > + SpinLockAcquire(>mutex); > + fileset_state = MyParallelShared->fileset_state; > + SpinLockRelease(>mutex); > > "if so" -> "If so" Changed. > ~~~ > > > 6. pa_send_data > > + * > + * If the attempt to send data via shared memory times out, then we > + will > switch > + * to "PARTIAL_SERIALIZE mode" for the current transaction to prevent > possible > + * deadlocks with another parallel apply worker (refer to the > + comments atop > + * applyparallelworker.c for details). This means that the current > + data and any > + * subsequent data for this transaction will be serialized to a file. 
> */ > void > pa_send_data(ParallelApplyWorkerInfo *winfo, Size nbytes, const void > *data) > > SUGGESTION (minor comment rearranging) > > If the attempt to send data via shared memory times out, then we will > switch to "PARTIAL_SERIALIZE mode" for the current transaction -- this > means that the current data and any subsequent data for this > transaction will be serialized to a file. This is done to prevent > possible deadlocks with another parallel apply worker (refer to the > comments atop applyparallelworker.c for details). Changed. > ~ > > 7. > > + /* > + * Take the stream lock to make sure that the parallel apply worker > + * will wait for the leader to release the stream lock until the > + * end of the transaction. > + */ > + pa_lock_stream(winfo->shared->xid, AccessExclusiveLock); > > The comment doesn't sound right. > > "until the end" -> "at the end" (??) I think it means "PA wait ... until the end of transaction". > ~~~ > > 8. pa_stream_abort > > @@ -1374,6 +1470,7 @@ pa_stream_abort(LogicalRepStreamAbortData > *abort_data) > RollbackToSavepoint(spname); > CommitTransactionCommand(); > subxactlist = list_truncate(subxactlist, i + 1); > + > break; >
Re: Time delayed LR (WAS Re: logical replication restrictions)
At Tue, 6 Dec 2022 11:08:43 -0800, Andres Freund wrote in > Hi, > > The tests fail on cfbot: > https://cirrus-ci.com/task/4533866329800704 > > They only seem to fail on 32bit linux. > > https://api.cirrus-ci.com/v1/artifact/task/4533866329800704/testrun/build-32/testrun/subscription/032_apply_delay/log/regress_log_032_apply_delay > [06:27:10.628](0.138s) ok 2 - check if the new rows were applied to subscriber > timed out waiting for match: (?^:logical replication apply delay) at > /tmp/cirrus-ci-build/src/test/subscription/t/032_apply_delay.pl line 124. It fails for me on 64bit Linux.. (Rocky 8.7) > t/032_apply_delay.pl ... Dubious, test returned 29 (wstat 7424, > 0x1d00) > No subtests run .. > t/032_apply_delay.pl (Wstat: 7424 Tests: 0 Failed: 0) > Non-zero exit status: 29 > Parse errors: No plan found in TAP output regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [DOCS] Stats views and functions not in order?
On Tue, Dec 6, 2022 at 6:36 PM Peter Smith wrote: > I'd like to "fix" this but IIUC there is no consensus yet about what > order is best for patch 0001, right? > > I'm planning on performing a more thorough review of 0003 and 0004 tomorrow. As for 0001 - go with Peter E.'s suggested ordering. David J.
Re: Commit fest 2022-11
Hi, The Commitfest 2022-11 status still shows as "In Progress", Shouldn't the status be changed to "Closed" and the entries be moved to the next commitfest. Regards, Vignesh On Wed, 16 Nov 2022 at 18:30, Ian Lawrence Barwick wrote: > > 2022年11月14日(月) 22:38 Ian Lawrence Barwick : > > > > 2022年11月14日(月) 22:23 James Coleman : > > > > > > On Mon, Nov 14, 2022 at 7:08 AM Ian Lawrence Barwick > > > wrote: > > > > > > > > 2022年11月9日(水) 8:12 Justin Pryzby : > > > > > > > > If my script is not wrong, these patches add TAP tests, but don't > > > > > update > > > > > the requisite ./meson.build file. It seems like it'd be reasonable to > > > > > set them all as WOA until that's done. > > > > > > > > > > $ for a in `git branch -a |sort |grep commitfest/40`; do : echo > > > > > "$a..."; x=`git log -1 --compact-summary "$a"`; echo "$x" |grep > > > > > '/t/.*pl.*new' >/dev/null || continue; echo "$x" |grep -Fw meson > > > > > >/dev/null && continue; git log -1 --oneline "$a"; done > > > > > ... [CF 40/3558] Allow file inclusion in pg_hba and pg_ident files > > > > > ... [CF 40/3628] Teach pg_waldump to extract FPIs from the WAL stream > > > > > ... [CF 40/3646] Skip replicating the tables specified in except > > > > > table option > > > > > ... [CF 40/3663] Switching XLog source from archive to streaming when > > > > > primary available > > > > > ... [CF 40/3670] pg_rewind: warn when checkpoint hasn't happened > > > > > after promotion > > > > > ... [CF 40/3729] Testing autovacuum wraparound > > > > > ... [CF 40/3877] vacuumlo: add test to vacuumlo for test coverage > > > > > ... [CF 40/3985] TDE key management patches > > > > > > > > Looks like your script is correct, will update accordingly. > > > > > > > > > > It's possible this has been discussed before, but it seems less than > > > ideal to have notifications about moving to WOA be sent only in a bulk > > > email hanging off the "current CF" email chain as opposed to being > > > sent to the individual threads. 
Perhaps that's something the app > > > should do for us in this situation. Without that though the patch > > > authors are left to wade through unrelated discussion, and, probably > > > more importantly, the patch discussion thread doesn't show the current > > > state (I think bumping there is more likely to prompt activity as > > > well). > > > > FWIW I've been manually "bumping" the respective threads, which is somewhat > > time-consuming but seems to have been quite productive in terms of getting > > patches updated. > > > > Will do same for the above once I've confirmed what is being requested, > > (which I presume is adding the new tests to the 'tests' array in the > > respective > > "meson.build" file; just taking the opportunity to familiarize myself with > > it). > > Various mails since sent to the appropriate threads; I took the opportunity > to create a short wiki page: > > https://wiki.postgresql.org/wiki/Meson_for_patch_authors > > with relevant details (AFAICS) for anyone not familiar with meson; > corrections/ > improvements welcome. > > In the meantime I notice a number of patches in cfbot are currently failing on > TAP test "test_custom_rmgrs/t/001_basic.pl" in some environments. This was > added the other day in commit ae168c794f; there is already a fix for the issue > ( 36e0358e70 ) but the cfbot doesn't have that commit yet. > > Regards > > Ian Barwick
Re: Transaction timeout
At Mon, 5 Dec 2022 17:10:50 -0800, Andres Freund wrote in > I'm most concerned about the overhead when the timeouts are *not* > enabled. And this adds a branch to start_xact_command() and a function > call for get_timeout_active(TRANSACTION_TIMEOUT) in that case. On its > own, that's not a whole lot, but it does add up. There's 10+ function > calls for timeout and ps_display purposes for every single statement. That path seems to exist just for robustness. I inserted "Assert(0)" just before the disable_timeout(), but make check-world didn't fail [1]. Couldn't we get rid of that path, adding an assertion instead? I'm not sure about other timeouts yet, though. On the disabling side, we cannot rely on StatementTimeout. [1] # 032_apply_delay.pl fails for me, so I don't know whether any of the later # tests fail. > But it's definitely also worth optimizing the timeout enabled paths. And > you're right, it looks like there's a fair bit of optimization > potential. Thanks. I'll work on that. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Force streaming every change in logical decoding
On Tue, Dec 6, 2022 at 7:18 PM Masahiko Sawada wrote: > > On Tue, Dec 6, 2022 at 7:29 PM Amit Kapila wrote: > > > > On Tue, Dec 6, 2022 at 11:53 AM shiy.f...@fujitsu.com > > wrote: > > > > > > Hi hackers, > > > > > > In logical decoding, when logical_decoding_work_mem is exceeded, the > > > changes are > > > sent to output plugin in streaming mode. But there is a restriction that > > > the > > > minimum value of logical_decoding_work_mem is 64kB. I tried to add a GUC > > > to > > > allow sending every change to output plugin without waiting until > > > logical_decoding_work_mem is exceeded. > > > > > > This helps to test streaming mode. For example, to test "Avoid streaming > > > the > > > transaction which are skipped" [1], it needs many XLOG_XACT_INVALIDATIONS > > > messages. With the new option, it can be tested with fewer changes and in > > > less > > > time. Also, this new option helps to test more scenarios for "Perform > > > streaming > > > logical transactions by background workers" [2]. > > > > > > > Yeah, I think this can also help in reducing the time for various > > tests in test_decoding/stream and > > src/test/subscription/t/*_stream_*.pl file by reducing the number of > > changes required to invoke streaming mode. > > +1 > > > Can we think of making this > > GUC extendible to even test more options on server-side (publisher) > > and client-side (subscriber) cases? For example, we can have something > > like logical_replication_mode with the following valid values: (a) > > server_serialize: this will serialize each change to file on > > publishers and then on commit restore and send all changes; (b) > > server_stream: this will stream each change as currently proposed in > > your patch. Then if we want to extend it for subscriber-side testing > > then we can introduce new options like client_serialize for the case > > being discussed in the email [1]. 
> > Setting logical_replication_mode = 'client_serialize' implies that the > publisher behaves as server_stream? or do you mean we can set like > logical_replication_mode = 'server_stream, client_serialize'? > The latter one (logical_replication_mode = 'server_stream, client_serialize'). The idea is to cover more options with one GUC and each option can be used individually as well as in combination with others. -- With Regards, Amit Kapila.
Re: Force streaming every change in logical decoding
On Wed, Dec 7, 2022 at 8:46 AM Peter Smith wrote: > > On Tue, Dec 6, 2022 at 9:29 PM Amit Kapila wrote: > > > > On Tue, Dec 6, 2022 at 11:53 AM shiy.f...@fujitsu.com > > wrote: > > > > > > Hi hackers, > > > > > > In logical decoding, when logical_decoding_work_mem is exceeded, the > > > changes are > > > sent to output plugin in streaming mode. But there is a restriction that > > > the > > > minimum value of logical_decoding_work_mem is 64kB. I tried to add a GUC > > > to > > > allow sending every change to output plugin without waiting until > > > logical_decoding_work_mem is exceeded. > > > > > > This helps to test streaming mode. For example, to test "Avoid streaming > > > the > > > transaction which are skipped" [1], it needs many XLOG_XACT_INVALIDATIONS > > > messages. With the new option, it can be tested with fewer changes and in > > > less > > > time. Also, this new option helps to test more scenarios for "Perform > > > streaming > > > logical transactions by background workers" [2]. > > > > > +1 > > > > > Yeah, I think this can also help in reducing the time for various > > tests in test_decoding/stream and > > src/test/subscription/t/*_stream_*.pl file by reducing the number of > > changes required to invoke streaming mode. Can we think of making this > > GUC extendible to even test more options on server-side (publisher) > > and client-side (subscriber) cases? For example, we can have something > > like logical_replication_mode with the following valid values: (a) > > server_serialize: this will serialize each change to file on > > publishers and then on commit restore and send all changes; (b) > > server_stream: this will stream each change as currently proposed in > > your patch. Then if we want to extend it for subscriber-side testing > > then we can introduce new options like client_serialize for the case > > being discussed in the email [1]. > > > > Thoughts? 
> > There is potential for lots of developer GUCs for testing/debugging in > the area of logical replication but IMO it might be better to keep > them all separated. Putting everything into a single > 'logical_replication_mode' might cause difficulties later when/if you > want combinations of the different modes. I think we want the developer option that forces streaming changes during logical decoding to be PGC_USERSET but probably the developer option for testing the parallel apply feature would be PGC_SIGHUP. Also, since streaming changes is not specific to logical replication but to logical decoding, I'm not sure logical_replication_XXX is a good name. IMO having force_stream_mode and a different GUC for testing the parallel apply feature makes sense to me. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: [DOCS] Stats views and functions not in order?
I'd like to "fix" this but IIUC there is no consensus yet about what order is best for patch 0001, right? -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Using WaitEventSet in the postmaster
Hi, On 2022-12-07 14:12:37 +1300, Thomas Munro wrote: > On Wed, Dec 7, 2022 at 12:12 PM Andres Freund wrote: > > On 2022-12-07 00:58:06 +1300, Thomas Munro wrote: > > > One way to fix that for the epoll version is to EPOLL_CTL_DEL and > > > EPOLL_CTL_ADD, whenever transitioning to/from a zero event mask. > > > Tried like that in this version. Another approach would be to > > > (finally) write DeleteWaitEvent() to do the same thing at a higher > > > level... seems overkill for this. > > > > What about just recreating the WES during crash restart? > > It seems a bit like cheating but yeah that's a super simple solution, > and removes one patch from the stack. Done like that in this version. I somewhat wish we'd do that more aggressively during crash-restart, rather than the opposite. Mostly around shared memory contents though, so perhaps that's not that comparable... Greetings, Andres Freund
Re: Force streaming every change in logical decoding
On Tue, Dec 6, 2022 at 5:23 PM shiy.f...@fujitsu.com wrote: > > Hi hackers, > > In logical decoding, when logical_decoding_work_mem is exceeded, the changes > are > sent to output plugin in streaming mode. But there is a restriction that the > minimum value of logical_decoding_work_mem is 64kB. I tried to add a GUC to > allow sending every change to output plugin without waiting until > logical_decoding_work_mem is exceeded. > Some review comments for patch v1-0001. 1. Typos In several places "Wheather/wheather" -> "Whether/whether" == src/backend/replication/logical/reorderbuffer.c 2. ReorderBufferCheckMemoryLimit { ReorderBufferTXN *txn; - /* bail out if we haven't exceeded the memory limit */ - if (rb->size < logical_decoding_work_mem * 1024L) + /* + * Stream the changes immediately if force_stream_mode is on and the output + * plugin supports streaming. Otherwise wait until size exceeds + * logical_decoding_work_mem. + */ + bool force_stream = (force_stream_mode && ReorderBufferCanStream(rb)); + + /* bail out if force_stream is false and we haven't exceeded the memory limit */ + if (!force_stream && rb->size < logical_decoding_work_mem * 1024L) return; /* - * Loop until we reach under the memory limit. One might think that just - * by evicting the largest (sub)transaction we will come under the memory - * limit based on assumption that the selected transaction is at least as - * large as the most recent change (which caused us to go over the memory - * limit). However, that is not true because a user can reduce the + * If force_stream is true, loop until there's no change. Otherwise, loop + * until we reach under the memory limit. One might think that just by + * evicting the largest (sub)transaction we will come under the memory limit + * based on assumption that the selected transaction is at least as large as + * the most recent change (which caused us to go over the memory limit). 
+ * However, that is not true because a user can reduce the * logical_decoding_work_mem to a smaller value before the most recent * change. */ - while (rb->size >= logical_decoding_work_mem * 1024L) + while ((!force_stream && rb->size >= logical_decoding_work_mem * 1024L) || +(force_stream && rb->size > 0)) { /* * Pick the largest transaction (or subtransaction) and evict it from IIUC this logic can be simplified quite a lot just by shifting that "bail out" condition into the loop. Something like: while (true) { if (!((force_stream && rb->size > 0) || rb->size >= logical_decoding_work_mem * 1024L)) break; ... } -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Using WaitEventSet in the postmaster
On Wed, Dec 7, 2022 at 2:08 PM Justin Pryzby wrote: > > + /* > > + * It would probably be safe to re-use the inherited signalfd > > since > > + * signalfds only see the current processes pending signals, > > but it > > I think you mean "current process's", right ? Fixed in v5, thanks.
Re: Using WaitEventSet in the postmaster
On Wed, Dec 7, 2022 at 12:12 PM Andres Freund wrote: > On 2022-12-07 00:58:06 +1300, Thomas Munro wrote: > > One way to fix that for the epoll version is to EPOLL_CTL_DEL and > > EPOLL_CTL_ADD, whenever transitioning to/from a zero event mask. > > Tried like that in this version. Another approach would be to > > (finally) write DeleteWaitEvent() to do the same thing at a higher > > level... seems overkill for this. > > What about just recreating the WES during crash restart? It seems a bit like cheating but yeah that's a super simple solution, and removes one patch from the stack. Done like that in this version. > > > This seems to hardcode the specific wait events we're waiting for based on > > > latch.c infrastructure. Not really convinced that's a good idea. > > > > What are you objecting to? The assumption that the first socket is at > > position 1? The use of GetNumRegisteredWaitEvents()? > > The latter. Removed. From 07b04dc410118ad04fd0006edda7ba80f241357a Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 6 Dec 2022 15:21:11 +1300 Subject: [PATCH v5 1/4] Add WL_SOCKET_ACCEPT event to WaitEventSet API. To be able to handle incoming connections on a server socket with the WaitEventSet API, we'll need a new kind of event to indicate that the the socket is ready to accept a connection. On Unix, it's just the same as WL_SOCKET_READABLE, but on Windows there is a different kernel event that we need to map our abstraction to. A future commit will use this. 
Reviewed-by: Andres Freund Discussion: https://postgr.es/m/CA%2BhUKG%2BZ-HpOj1JsO9eWUP%2Bar7npSVinsC_npxSy%2BjdOMsx%3DGg%40mail.gmail.com --- src/backend/storage/ipc/latch.c | 13 - src/include/storage/latch.h | 7 +++ 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c index eb3a569aae..7ced8264f0 100644 --- a/src/backend/storage/ipc/latch.c +++ b/src/backend/storage/ipc/latch.c @@ -864,6 +864,9 @@ FreeWaitEventSet(WaitEventSet *set) * - WL_SOCKET_CONNECTED: Wait for socket connection to be established, * can be combined with other WL_SOCKET_* events (on non-Windows * platforms, this is the same as WL_SOCKET_WRITEABLE) + * - WL_SOCKET_ACCEPT: Wait for new connection to a server socket, + * can be combined with other WL_SOCKET_* events (on non-Windows + * platforms, this is the same as WL_SOCKET_READABLE) * - WL_SOCKET_CLOSED: Wait for socket to be closed by remote peer. * - WL_EXIT_ON_PM_DEATH: Exit immediately if the postmaster dies * @@ -874,7 +877,7 @@ FreeWaitEventSet(WaitEventSet *set) * i.e. it must be a process-local latch initialized with InitLatch, or a * shared latch associated with the current process by calling OwnLatch. * - * In the WL_SOCKET_READABLE/WRITEABLE/CONNECTED cases, EOF and error + * In the WL_SOCKET_READABLE/WRITEABLE/CONNECTED/ACCEPT cases, EOF and error * conditions cause the socket to be reported as readable/writable/connected, * so that the caller can deal with the condition. 
* @@ -1312,6 +1315,8 @@ WaitEventAdjustWin32(WaitEventSet *set, WaitEvent *event) flags |= FD_WRITE; if (event->events & WL_SOCKET_CONNECTED) flags |= FD_CONNECT; + if (event->events & WL_SOCKET_ACCEPT) + flags |= FD_ACCEPT; if (*handle == WSA_INVALID_EVENT) { @@ -2067,6 +2072,12 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout, /* connected */ occurred_events->events |= WL_SOCKET_CONNECTED; } + if ((cur_event->events & WL_SOCKET_ACCEPT) && + (resEvents.lNetworkEvents & FD_ACCEPT)) + { + /* incoming connection could be accepted */ + occurred_events->events |= WL_SOCKET_ACCEPT; + } if (resEvents.lNetworkEvents & FD_CLOSE) { /* EOF/error, so signal all caller-requested socket flags */ diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h index 68ab740f16..c55838db60 100644 --- a/src/include/storage/latch.h +++ b/src/include/storage/latch.h @@ -135,9 +135,16 @@ typedef struct Latch #define WL_SOCKET_CONNECTED WL_SOCKET_WRITEABLE #endif #define WL_SOCKET_CLOSED (1 << 7) +#ifdef WIN32 +#define WL_SOCKET_ACCEPT (1 << 8) +#else +/* avoid having to deal with case on platforms not requiring it */ +#define WL_SOCKET_ACCEPT WL_SOCKET_READABLE +#endif #define WL_SOCKET_MASK (WL_SOCKET_READABLE | \ WL_SOCKET_WRITEABLE | \ WL_SOCKET_CONNECTED | \ + WL_SOCKET_ACCEPT | \ WL_SOCKET_CLOSED) typedef struct WaitEvent -- 2.35.1 From 827866959dbbe537f6677271093f6d7730bd2527 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 6 Dec 2022 16:13:36 +1300 Subject: [PATCH v5 2/4] Don't leak a signalfd when using latches in the postmaster. At the time of commit 6a2a70a02 we didn't use latch infrastructure in the postmaster. We're planning to start doing that, so we'd better make sure that the signalfd inherited from a postmaster is not duplicated and then leaked in the child. Reviewed-by: Andres Freund
Re: Using WaitEventSet in the postmaster
> From 61480441f67ca7fac96ca4bcfe85f27013a47aa8 Mon Sep 17 00:00:00 2001 > From: Thomas Munro > Date: Tue, 6 Dec 2022 16:13:36 +1300 > Subject: [PATCH v4 2/5] Don't leak a signalfd when using latches in the > postmaster. > > + /* > + * It would probably be safe to re-use the inherited signalfd > since > + * signalfds only see the current processes pending signals, > but it I think you mean "current process's", right ?
Re: Query Jumbling for CALL and SET utility statements
On Tue, Oct 11, 2022 at 11:54:51AM +0900, Michael Paquier wrote: > On Mon, Oct 10, 2022 at 09:16:47PM +0800, Julien Rouhaud wrote: >> Unless I'm missing something both can be handled using pg_node_attr() >> annotations that already exist? > > Indeed, that should work for the locators. My mistake here. When it comes to the locations to silence in the normalization, we already rely on the variable name "location" of type int in gen_node_support.pl, so we can just do the same for the code generating the jumbling. This stuff still needs two more pg_node_attr() to be able to work: one to ignore a full Node in the jumbling and a second that can be used on a per-field basis. Once this is in place, automating the generation of the code is not that complicated; most of the work is in placing the pg_node_attr() markers so that the jumbling gets things right. The number of fields to mark as things to ignore depends on the Node type (heavy for Const, for example), but I'd like to think that an "ignore" approach is better than an "include" approach, so that new fields are considered by default in the jumble compilation. I have not looked at all the edge cases, so perhaps more attrs would be needed.. -- Michael signature.asc Description: PGP signature
Collation DDL inconsistencies
When I looked at the bug: https://postgr.es/m/CALDQics_oBEYfOnu_zH6yw9WR1waPCmcrqxQ8+39hK3Op=z...@mail.gmail.com I noticed that the DDL around collations is inconsistent. For instance, CREATE COLLATION[1] uses LOCALE, LC_COLLATE, and LC_CTYPE parameters to specify either libc locales or an icu locale; whereas CREATE DATABASE[2] uses LOCALE, LC_COLLATE, and LC_CTYPE always for libc, and ICU_LOCALE if the default collation is ICU. The catalog representation is strange in a different way: datcollate/collcollate are always for libc, and daticulocale is for icu. That means anything that deals with those fields needs to pick the right one based on the provider. If this were a clean slate, it would make more sense if it were something like: datcollate/collcollate: to instantiate pg_locale_t datctype/collctype: to instantiate pg_locale_t datlibccollate: used by libc elsewhere datlibcctype: used by libc elsewhere daticulocale/colliculocale: remove these fields That way, if you are instantiating a pg_locale_t, you always just pass datcollate/datctype/collcollate/collctype, regardless of the provider (pg_newlocale_from_collation() would figure it out). And if you are going to do something straight with libc, you always use datlibccollate/datlibcctype. Aside: why don't we support different collate/ctype with ICU? It appears that u_strToTitle/u_strToUpper/u_strToLower just accept a string "locale", and it would be easy enough to pass it whatever is in datctype/collctype, right? We should validate that it's a valid locale; but other than that, I don't see the problem. Thoughts? Implementation-wise, I suppose this could create some annoyances in pg_dump. [1] https://www.postgresql.org/docs/devel/sql-createcollation.html [2] https://www.postgresql.org/docs/devel/sql-createdatabase.html [3] https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/ustring_8h.html -- Jeff Davis PostgreSQL Contributor Team - AWS
Re: Generate pg_stat_get_* functions with Macros
On Tue, Dec 06, 2022 at 05:28:47AM +0100, Drouvot, Bertrand wrote: > Fields renaming was mandatory in the previous ones as there was > already a mix of with/without "n_" in the existing fields names. > > That said, I think it's better to rename the fields as you did (to > be "consistent" on the naming between relation/db stats), so the > patch LGTM. Yeah, PgStat_StatDBEntry is the last one using this style, so I have kept my change with the variables renamed rather than painting more CppConcat()s. The functions are still named the same as the original ones. -- Michael signature.asc Description: PGP signature
Re: Force streaming every change in logical decoding
On Tue, Dec 6, 2022 at 9:29 PM Amit Kapila wrote: > > On Tue, Dec 6, 2022 at 11:53 AM shiy.f...@fujitsu.com > wrote: > > > > Hi hackers, > > > > In logical decoding, when logical_decoding_work_mem is exceeded, the > > changes are > > sent to output plugin in streaming mode. But there is a restriction that the > > minimum value of logical_decoding_work_mem is 64kB. I tried to add a GUC to > > allow sending every change to output plugin without waiting until > > logical_decoding_work_mem is exceeded. > > > > This helps to test streaming mode. For example, to test "Avoid streaming the > > transaction which are skipped" [1], it needs many XLOG_XACT_INVALIDATIONS > > messages. With the new option, it can be tested with fewer changes and in > > less > > time. Also, this new option helps to test more scenarios for "Perform > > streaming > > logical transactions by background workers" [2]. > > +1 > > Yeah, I think this can also help in reducing the time for various > tests in test_decoding/stream and > src/test/subscription/t/*_stream_*.pl file by reducing the number of > changes required to invoke streaming mode. Can we think of making this > GUC extendible to even test more options on server-side (publisher) > and client-side (subscriber) cases? For example, we can have something > like logical_replication_mode with the following valid values: (a) > server_serialize: this will serialize each change to file on > publishers and then on commit restore and send all changes; (b) > server_stream: this will stream each change as currently proposed in > your patch. Then if we want to extend it for subscriber-side testing > then we can introduce new options like client_serialize for the case > being discussed in the email [1]. > > Thoughts? There is potential for lots of developer GUCs for testing/debugging in the area of logical replication but IMO it might be better to keep them all separated. 
Putting everything into a single 'logical_replication_mode' might cause difficulties later when/if you want combinations of the different modes. For example, instead of logical_replication_mode = XXX/YYY/ZZZ maybe something like below will give more flexibility. logical_replication_dev_XXX = true/false logical_replication_dev_YYY = true/false logical_replication_dev_ZZZ = true/false -- Kind Regards, Peter Smith. Fujitsu Australia
remove extra space from dumped ALTER DEFAULT PRIVILEGES commands
Hi hackers, pg_dump adds an extra space in ALTER DEFAULT PRIVILEGES commands. AFAICT it's been this way since the command was first introduced (249724c). I've attached a patch to fix it. This is admittedly just a pet peeve, but maybe it is bothering others, too. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com diff --git a/src/bin/pg_dump/dumputils.c b/src/bin/pg_dump/dumputils.c index 9311417f18..694a28f532 100644 --- a/src/bin/pg_dump/dumputils.c +++ b/src/bin/pg_dump/dumputils.c @@ -184,7 +184,9 @@ buildACLCommands(const char *name, const char *subname, const char *nspname, prefix, privs->data, type); if (nspname && *nspname) appendPQExpBuffer(firstsql, "%s.", fmtId(nspname)); - appendPQExpBuffer(firstsql, "%s FROM ", name); + if (name && *name) +appendPQExpBuffer(firstsql, "%s ", name); + appendPQExpBufferStr(firstsql, "FROM "); if (grantee->len == 0) appendPQExpBufferStr(firstsql, "PUBLIC;\n"); else @@ -253,7 +255,9 @@ buildACLCommands(const char *name, const char *subname, const char *nspname, prefix, privs->data, type); if (nspname && *nspname) appendPQExpBuffer(thissql, "%s.", fmtId(nspname)); - appendPQExpBuffer(thissql, "%s TO ", name); + if (name && *name) + appendPQExpBuffer(thissql, "%s ", name); + appendPQExpBufferStr(thissql, "TO "); if (grantee->len == 0) appendPQExpBufferStr(thissql, "PUBLIC;\n"); else @@ -265,7 +269,9 @@ buildACLCommands(const char *name, const char *subname, const char *nspname, prefix, privswgo->data, type); if (nspname && *nspname) appendPQExpBuffer(thissql, "%s.", fmtId(nspname)); - appendPQExpBuffer(thissql, "%s TO ", name); + if (name && *name) + appendPQExpBuffer(thissql, "%s ", name); + appendPQExpBufferStr(thissql, "TO "); if (grantee->len == 0) appendPQExpBufferStr(thissql, "PUBLIC"); else diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl index 6656222363..4732ee2e4a 100644 --- a/src/bin/pg_dump/t/002_pg_dump.pl +++ b/src/bin/pg_dump/t/002_pg_dump.pl @@ -563,7 +563,7 @@ my 
%tests = ( regexp => qr/^ \QALTER DEFAULT PRIVILEGES \E \QFOR ROLE regress_dump_test_role IN SCHEMA dump_test \E - \QGRANT SELECT ON TABLES TO regress_dump_test_role;\E + \QGRANT SELECT ON TABLES TO regress_dump_test_role;\E /xm, like => { %full_runs, %dump_test_schema_runs, section_post_data => 1, }, @@ -582,7 +582,7 @@ my %tests = ( regexp => qr/^ \QALTER DEFAULT PRIVILEGES \E \QFOR ROLE regress_dump_test_role IN SCHEMA dump_test \E - \QGRANT ALL ON FUNCTIONS TO regress_dump_test_role;\E + \QGRANT ALL ON FUNCTIONS TO regress_dump_test_role;\E /xm, like => { %full_runs, %dump_test_schema_runs, section_post_data => 1, }, @@ -600,7 +600,7 @@ my %tests = ( regexp => qr/^ \QALTER DEFAULT PRIVILEGES \E \QFOR ROLE regress_dump_test_role \E - \QREVOKE ALL ON FUNCTIONS FROM PUBLIC;\E + \QREVOKE ALL ON FUNCTIONS FROM PUBLIC;\E /xm, like => { %full_runs, section_post_data => 1, }, unlike => { no_privs => 1, }, @@ -615,10 +615,10 @@ my %tests = ( regexp => qr/^ \QALTER DEFAULT PRIVILEGES \E \QFOR ROLE regress_dump_test_role \E - \QREVOKE ALL ON TABLES FROM regress_dump_test_role;\E\n + \QREVOKE ALL ON TABLES FROM regress_dump_test_role;\E\n \QALTER DEFAULT PRIVILEGES \E \QFOR ROLE regress_dump_test_role \E - \QGRANT INSERT,REFERENCES,DELETE,TRIGGER,TRUNCATE,VACUUM,ANALYZE,UPDATE ON TABLES TO regress_dump_test_role;\E + \QGRANT INSERT,REFERENCES,DELETE,TRIGGER,TRUNCATE,VACUUM,ANALYZE,UPDATE ON TABLES TO regress_dump_test_role;\E /xm, like => { %full_runs, section_post_data => 1, }, unlike => { no_privs => 1, },
Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row
Hi, On 2022-11-25 02:45:01 +0500, Hamid Akhtar wrote: > Rebasing the patch to the tip of the master branch. This doesn't pass tests on cfbot. Looks like possibly some files are missing? https://api.cirrus-ci.com/v1/artifact/task/4916614353649664/testrun/build/testrun/pageinspect/regress/regression.diffs diff -U3 /tmp/cirrus-ci-build/contrib/pageinspect/expected/page.out /tmp/cirrus-ci-build/build/testrun/pageinspect/regress/results/page.out --- /tmp/cirrus-ci-build/contrib/pageinspect/expected/page.out 2022-12-06 20:07:47.691479000 + +++ /tmp/cirrus-ci-build/build/testrun/pageinspect/regress/results/page.out 2022-12-06 20:11:42.955606000 + @@ -1,4 +1,5 @@ CREATE EXTENSION pageinspect; +ERROR: extension "pageinspect" has no installation script nor update path for version "1.12" -- Use a temp table so that effects of VACUUM are predictable CREATE TEMP TABLE test1 (a int, b int); INSERT INTO test1 VALUES (16777217, 131584); @@ -6,236 +7,203 @@ Greetings, Andres Freund
Re: Using WaitEventSet in the postmaster
On 2022-12-07 00:58:06 +1300, Thomas Munro wrote: > One way to fix that for the epoll version is to EPOLL_CTL_DEL and > EPOLL_CTL_ADD, whenever transitioning to/from a zero event mask. > Tried like that in this version. Another approach would be to > (finally) write DeleteWaitEvent() to do the same thing at a higher > level... seems overkill for this. What about just recreating the WES during crash restart? > > This seems to hardcode the specific wait events we're waiting for based on > > latch.c infrastructure. Not really convinced that's a good idea. > > What are you objecting to? The assumption that the first socket is at > position 1? The use of GetNumRegisteredWaitEvents()? The latter.
Re: Allow placeholders in ALTER ROLE w/o superuser
Hi, Alexander! On Tue, 6 Dec 2022 at 19:01, Alexander Korotkov wrote: > > On Mon, Dec 5, 2022 at 10:32 PM Alexander Korotkov > wrote: > > On Mon, Dec 5, 2022 at 8:18 PM Tom Lane wrote: > > > Alvaro Herrera writes: > > > > I couldn't find any discussion of the idea of adding "(s)" to the > > > > variable name in order to mark the variable userset in the catalog, and > > > > I have to admit I find it a bit strange. Are we really agreed that > > > > that's the way to proceed? > > > > > > I hadn't been paying close attention to this thread, sorry. > > > > > > I agree that that seems like a very regrettable choice, > > > especially if you anticipate having to bump catversion anyway. > > > > I totally understand that this change requires a catversion bump. > > I've reflected this in the commit message. > > > > > Better to add a bool column to the catalog. > > > > What about adding a boolean array to the pg_db_role_setting? So, > > pg_db_role_setting would have the following columns. > > * setdatabase oid > > * setrole oid > > * setconfig text[] > > * setuser bool[] > > The revised patch implements this way of storing the USER SET flag. > I think it really became more structured and less cumbersome. I agree that the patch became more structured and the complications for string parameter suffixing have gone away. I've looked it through and don't see problems with it. The only two-line fix, regarding variable initialization, may be relevant (see v9). Tests pass and CI is also happy with it. I'd like to set it ready for committer if no objections. Regards, Pavel Borisov, Supabase. v9-0001-Add-USER-SET-parameter-values-for-pg_db_role_sett.patch Description: Binary data
core dumps generated in archive / restore commands etc
Hi, Occasionally I see core dumps for sh, cp etc when running the tests. I think this is mainly due to immediate shutdowns / crashes signalling the entire process group with SIGQUIT. If a sh/cp/... is running as part of an archive/restore command when the signal arrives, we'll trigger a coredump, because those tools won't have a SIGQUIT handler. ISTM that postmaster's signal_child() shouldn't send SIGQUIT to the process group in the #ifdef HAVE_SETSID section. We've already signalled the backend with SIGQUIT, so we could change the signal we send to the whole process group to one that doesn't trigger core dumps by default. SIGTERM seems like it would be the right choice. The one non-trivial aspect of this is that that signal will also be delivered to the group leader. It's possible that that could lead to some minor test behaviour issues, because the output could change if e.g. SIGTERM is received / processed first. Greetings, Andres Freund
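The proposed downgrade can be sketched with a hypothetical helper (the real change would live in postmaster.c's signal_child(); the helper name is invented here):

```c
#include <signal.h>

/*
 * Sketch of the proposal: the backend itself still receives SIGQUIT,
 * but the signal forwarded to its whole process group with
 * kill(-pid, ...) is downgraded to SIGTERM, so that handler-less
 * children such as sh and cp terminate without dumping core.
 */
static int
group_signal_for(int signal)
{
	if (signal == SIGQUIT)
		return SIGTERM;
	return signal;
}
```

Under HAVE_SETSID, signal_child() would then do kill(pid, signal) for the backend followed by kill(-pid, group_signal_for(signal)) for the group, accepting that the group leader also sees the downgraded signal, which is the test-behaviour caveat noted above.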
Re: New strategies for freezing, advancing relfrozenxid early
On Tue, Dec 6, 2022 at 10:42 AM Andres Freund wrote: > The docs don't build: > https://cirrus-ci.com/task/5456939761532928 > [20:00:58.203] postgres.sgml:52: element link: validity error : IDREF > attribute linkend references an unknown ID "vacuum-for-wraparound" Thanks for pointing this out. FWIW it is a result of Bruce's recent addition of the transaction processing chapter to the docs. My intention is to post v9 later in the week, which will fix the doc build, and a lot more besides that. If you are planning on doing another round of review, I'd suggest that you hold off until then. v9 will have structural improvements that will likely make it easier to understand all the steps leading up to removing aggressive mode completely. It'll be easier to relate each local step/patch to the bigger picture for VACUUM. v9 will also address some of the concerns you raised in your review that weren't covered by v8, especially about the VM snapshotting infrastructure. But also your concerns about the transition from lazy strategies to eager strategies. The "catch up freezing" performed by the first VACUUM operation run against a table that just exceeded the GUC-controlled table size threshold will have far more limited impact, because the burden of freezing will be spread out across multiple VACUUM operations. The big idea behind the patch series is to relieve users from having to think about a special type of VACUUM that has to do much more freezing than other VACUUMs that ran against the same table in the recent past, of course, so it is important to avoid accidentally allowing any behavior that looks kind of like the ghost of aggressive VACUUM. -- Peter Geoghegan
Re: wake up logical workers after ALTER SUBSCRIPTION
On Tue, Dec 06, 2022 at 11:25:51AM -0800, Nathan Bossart wrote: > On Tue, Dec 06, 2022 at 07:44:46PM +0300, Melih Mutlu wrote: >> - When the state is SYNCDONE and the apply worker has to wake up to change >> the state to READY. >> >> I think we already call logicalrep_worker_wakeup_ptr wherever it's needed >> for the above cases? What am I missing here? > > IIUC we must restart all the apply workers for a subscription to enable > two_phase mode. It looks like finish_sync_worker() only wakes up its own > apply worker. I moved this logic to where the sync worker marks the state > as SYNCDONE and added a check that two_phase mode is pending. Even so, > there can still be unnecessary wakeups, but this adjustment should limit > them. Actually, that's not quite right. The sync worker will wake up the apply worker to change the state from SYNCDONE to READY. AllTablesyncsReady() checks that all tables are READY, so we need to wake up all the workers when an apply worker changes the state to READY. Each worker will then evaluate whether to restart for two_phase mode. 
-- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 1ca15608be05229b3349ba5840abceebb1497fe1 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Mon, 21 Nov 2022 16:01:01 -0800 Subject: [PATCH v7 1/1] wake up logical workers as needed instead of relying on periodic wakeups --- src/backend/access/transam/xact.c | 3 ++ src/backend/commands/alter.c| 7 src/backend/commands/subscriptioncmds.c | 4 ++ src/backend/replication/logical/tablesync.c | 8 src/backend/replication/logical/worker.c| 46 + src/include/replication/logicalworker.h | 3 ++ 6 files changed, 71 insertions(+) diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index 8086b857b9..dc00e66cfb 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -47,6 +47,7 @@ #include "pgstat.h" #include "replication/logical.h" #include "replication/logicallauncher.h" +#include "replication/logicalworker.h" #include "replication/origin.h" #include "replication/snapbuild.h" #include "replication/syncrep.h" @@ -2360,6 +2361,7 @@ CommitTransaction(void) AtEOXact_PgStat(true, is_parallel_worker); AtEOXact_Snapshot(true, false); AtEOXact_ApplyLauncher(true); + AtEOXact_LogicalRepWorkers(true); pgstat_report_xact_timestamp(0); CurrentResourceOwner = NULL; @@ -2860,6 +2862,7 @@ AbortTransaction(void) AtEOXact_HashTables(false); AtEOXact_PgStat(false, is_parallel_worker); AtEOXact_ApplyLauncher(false); + AtEOXact_LogicalRepWorkers(false); pgstat_report_xact_timestamp(0); } diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c index 10b6fe19a2..d095cd3ced 100644 --- a/src/backend/commands/alter.c +++ b/src/backend/commands/alter.c @@ -59,6 +59,7 @@ #include "commands/user.h" #include "miscadmin.h" #include "parser/parse_func.h" +#include "replication/logicalworker.h" #include "rewrite/rewriteDefine.h" #include "tcop/utility.h" #include "utils/builtins.h" @@ -279,6 +280,12 @@ AlterObjectRename_internal(Relation rel, Oid objectId, const 
char *new_name) if (strncmp(new_name, "regress_", 8) != 0) elog(WARNING, "subscriptions created by regression test cases should have names starting with \"regress_\""); #endif + + /* + * Wake up the logical replication workers to handle this change + * quickly. + */ + LogicalRepWorkersWakeupAtCommit(objectId); } else if (nameCacheId >= 0) { diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index d673557ea4..d6993c26e5 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -34,6 +34,7 @@ #include "nodes/makefuncs.h" #include "pgstat.h" #include "replication/logicallauncher.h" +#include "replication/logicalworker.h" #include "replication/origin.h" #include "replication/slot.h" #include "replication/walreceiver.h" @@ -1362,6 +1363,9 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt, InvokeObjectPostAlterHook(SubscriptionRelationId, subid, 0); + /* Wake up the logical replication workers to handle this change quickly. */ + LogicalRepWorkersWakeupAtCommit(subid); + return myself; } diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c index 94e813ac53..e253db371a 100644 --- a/src/backend/replication/logical/tablesync.c +++ b/src/backend/replication/logical/tablesync.c @@ -105,6 +105,7 @@ #include "pgstat.h" #include "replication/logicallauncher.h" #include "replication/logicalrelation.h" +#include "replication/logicalworker.h" #include "replication/walreceiver.h" #include "replication/worker_internal.h" #include "replication/slot.h" @@ -517,6 +518,13 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
Re: [PATCH] Infinite loop while acquiring new TOAST Oid
Hi hackers! Here's an update on the subject - I've made a branch based on master from 28/11 and introduced a 64-bit TOAST value ID there. It is not complete yet, but it is already working and has some features: - an extended structure for the TOAST pointer (varatt_long_external) with a 64-bit TOAST value ID; - individual ID counters for each TOAST table, cached for faster access, with lookup of the maximum TOAST ID in use after server restart. https://github.com/postgrespro/postgres/tree/toast_64bit -- Regards, Nikita Malakhov Postgres Professional https://postgrespro.ru/
Re: Cygwin cleanup
On Fri, Jul 29, 2022 at 10:57 AM Thomas Munro wrote: > I wonder if these problems would go away as a nice incidental > side-effect if we used latches for postmaster wakeups. I don't > know... maybe, if the problem is just with the postmaster's pattern of > blocking/unblocking? Maybe backend startup is simple enough that it > doesn't hit the bug? From a quick glance, I think the assertion > failures that occur in regular backends can possibly be blamed on the > postmaster getting confused about its children due to unexpected > handler re-entry. Just to connect the dots, that's what this patch does: https://www.postgresql.org/message-id/flat/CA+hUKG+Z-HpOj1JsO9eWUP+ar7npSVinsC_npxSy+jdOMsx=g...@mail.gmail.com (There may be other places that break under Cygwin's flaky sa_mask implementation, I don't know and haven't seen any clues about that.)
Re: Error-safe user functions
Robert Haas writes: > I feel like this can go either way. If we pick a name that conveys a > specific intended behavior now, and then later we want to pass some > other sort of node for some purpose other than ignoring errors, it's > unpleasant to have a name that sounds like it can only ignore errors. > But if we never use it for anything other than ignoring errors, a > specific name is clearer. With Andres' proposal to make the function return boolean succeed/fail, I think it's pretty clear that the only useful case is to pass an ErrorSaveContext. There may well be future APIs that pass some other kind of context object to input functions, but they'll presumably have different goals and want a different sort of wrapper function. regards, tom lane
Re: Error-safe user functions
OK, here's a v3 responding to the comments from Andres. A preliminary patch refactors elog.c, with (I trust) no functional effect. It gets rid of some pre-existing code duplication as well as setting up to let 0001's additions be less duplicative. 0001 adopts use of Node pointers in place of "void *". To do this I needed an alias type in elog.h equivalent to fmgr.h's fmNodePtr. I decided that having two different aliases would be too confusing, so what I did here was to converge both elog.h and fmgr.h on using the same alias "typedef struct Node *NodePtr". That has to be in elog.h since it's included first, from postgres.h. (I thought of defining NodePtr in postgres.h, but postgres.h includes elog.h immediately, so that wouldn't have looked very nice.) I also adopted Andres' recommendation that InputFunctionCallSafe return boolean. I'm still not totally sold on that ... but it does end with array_in and record_in never using SAFE_ERROR_OCCURRED at all, so maybe the idea's OK. 0002 adjusts the I/O functions for these API changes, and fixes my silly oversight about error cleanup in record_in. Given the discussion about testing requirements, I threw away the COPY hack entirely. Instead, 0003 provides a couple of SQL-callable functions that can be used to invoke a specific datatype's input function. I haven't documented them, pending bikeshedding on names etc. I also arranged to test array_in and record_in with a datatype that still throws errors, reserving the existing test type "widget" for that purpose. (I'm not intending to foreclose development of new COPY features in this area, just abandoning the idea that that's our initial test mechanism.) Thoughts? 
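The return-boolean convention can be illustrated with a toy analogue (the type and function names here, SoftErrorContext and int4_in_safe, are invented for the sketch; the real API uses ErrorSaveContext and InputFunctionCallSafe):

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for ErrorSaveContext: records the error instead of throwing. */
typedef struct SoftErrorContext
{
	bool	error_occurred;
	char	message[128];
} SoftErrorContext;

/*
 * Sketch of an input function under the boolean succeed/fail convention:
 * on bad input, stash the error in the context and return false rather
 * than reporting it, so the caller can branch on the return value alone.
 */
static bool
int4_in_safe(const char *str, int *result, SoftErrorContext *escontext)
{
	char	   *end;
	long		val;

	errno = 0;
	val = strtol(str, &end, 10);
	if (errno != 0 || end == str || *end != '\0' ||
		val > INT_MAX || val < INT_MIN)
	{
		escontext->error_occurred = true;
		snprintf(escontext->message, sizeof(escontext->message),
				 "invalid input syntax for type integer: \"%s\"", str);
		return false;
	}
	*result = (int) val;
	return true;
}
```

The point of the convention is visible in callers like array_in: they can simply test the boolean instead of re-checking the context with SAFE_ERROR_OCCURRED after every element.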
regards, tom lane diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c index 2585e24845..f5cd1b7493 100644 --- a/src/backend/utils/error/elog.c +++ b/src/backend/utils/error/elog.c @@ -176,8 +176,15 @@ static char formatted_log_time[FORMATTED_TS_LEN]; static const char *err_gettext(const char *str) pg_attribute_format_arg(1); +static ErrorData *get_error_stack_entry(void); +static void set_stack_entry_domain(ErrorData *edata, const char *domain); +static void set_stack_entry_location(ErrorData *edata, + const char *filename, int lineno, + const char *funcname); +static bool matches_backtrace_functions(const char *funcname); static pg_noinline void set_backtrace(ErrorData *edata, int num_skip); static void set_errdata_field(MemoryContextData *cxt, char **ptr, const char *str); +static void FreeErrorDataContents(ErrorData *edata); static void write_console(const char *line, int len); static const char *process_log_prefix_padding(const char *p, int *ppadding); static void log_line_prefix(StringInfo buf, ErrorData *edata); @@ -434,27 +441,13 @@ errstart(int elevel, const char *domain) debug_query_string = NULL; } } - if (++errordata_stack_depth >= ERRORDATA_STACK_SIZE) - { - /* - * Wups, stack not big enough. We treat this as a PANIC condition - * because it suggests an infinite loop of errors during error - * recovery. - */ - errordata_stack_depth = -1; /* make room on stack */ - ereport(PANIC, (errmsg_internal("ERRORDATA_STACK_SIZE exceeded"))); - } /* Initialize data for this error frame */ - edata = [errordata_stack_depth]; - MemSet(edata, 0, sizeof(ErrorData)); + edata = get_error_stack_entry(); edata->elevel = elevel; edata->output_to_server = output_to_server; edata->output_to_client = output_to_client; - /* the default text domain is the backend's */ - edata->domain = domain ? 
domain : PG_TEXTDOMAIN("postgres"); - /* initialize context_domain the same way (see set_errcontext_domain()) */ - edata->context_domain = edata->domain; + set_stack_entry_domain(edata, domain); /* Select default errcode based on elevel */ if (elevel >= ERROR) edata->sqlerrcode = ERRCODE_INTERNAL_ERROR; @@ -462,8 +455,6 @@ errstart(int elevel, const char *domain) edata->sqlerrcode = ERRCODE_WARNING; else edata->sqlerrcode = ERRCODE_SUCCESSFUL_COMPLETION; - /* errno is saved here so that error parameter eval can't change it */ - edata->saved_errno = errno; /* * Any allocations for this error state level should go into ErrorContext @@ -474,32 +465,6 @@ errstart(int elevel, const char *domain) return true; } -/* - * Checks whether the given funcname matches backtrace_functions; see - * check_backtrace_functions. - */ -static bool -matches_backtrace_functions(const char *funcname) -{ - char *p; - - if (!backtrace_symbol_list || funcname == NULL || funcname[0] == '\0') - return false; - - p = backtrace_symbol_list; - for (;;) - { - if (*p == '\0') /* end of backtrace_symbol_list */ - break; - - if (strcmp(funcname, p) == 0) - return true; - p += strlen(p) + 1; - } - - return false; -} - /* * errfinish --- end an error-reporting cycle * @@ -520,23 +485,7 @@ errfinish(const char *filename, int lineno, const char
Re: Temporary tables versus wraparound... again
Hi, On 2022-12-06 14:50:34 -0500, Greg Stark wrote: > On Tue, 6 Dec 2022 at 13:59, Andres Freund wrote: > > On 2022-12-06 13:47:39 -0500, Greg Stark wrote: > > > So I talked about this patch with Ronan Dunklau and he had a good > > > question Why are we maintaining relfrozenxid and relminmxid in > > > pg_class for temporary tables at all? Autovacuum can't use them and > > > other sessions won't care about them. The only session that might care > > > about them is the one attached to the temp schema. > > > > Uh, without relfrozenxid for temp tables we can end up truncating clog > > "ranges" away that are required to access the temp tables. So this would > > basically mean that temp tables can't be used reliably anymore. > > True, we would have to have some other mechanism for exporting the > frozenxid that the session needs. Presumably that would be something > in PGProc like the xmin and other numbers. It could be updated by > scanning our local hash table whenever a transaction starts. That'd be a fair bit of new mechanism. Not at all impossible, but I'm doubtful the complexity is worth it. In your patch the relevant catalog change is an inplace change and thus doesn't cause bloat. And if we have to retain the clog, I don't see that much benefit in the proposed approach. > This also probably is what would be needed for multixacts I guess? Yes. Greetings, Andres Freund
Re: Temporary tables versus wraparound... again
On Tue, 6 Dec 2022 at 13:59, Andres Freund wrote: > > Hi, > > On 2022-12-06 13:47:39 -0500, Greg Stark wrote: > > So I talked about this patch with Ronan Dunklau and he had a good > > question Why are we maintaining relfrozenxid and relminmxid in > > pg_class for temporary tables at all? Autovacuum can't use them and > > other sessions won't care about them. The only session that might care > > about them is the one attached to the temp schema. > > Uh, without relfrozenxid for temp tables we can end up truncating clog > "ranges" away that are required to access the temp tables. So this would > basically mean that temp tables can't be used reliably anymore. True, we would have to have some other mechanism for exporting the frozenxid that the session needs. Presumably that would be something in PGProc like the xmin and other numbers. It could be updated by scanning our local hash table whenever a transaction starts. This also probably is what would be needed for multixacts I guess? -- greg
Re: predefined role(s) for VACUUM and ANALYZE
On Tue, Dec 06, 2022 at 11:47:50AM +, Dagfinn Ilmari Mannsåker wrote: > These checks are getting rather repetitive, how about a data-driven > approach, along the lines of the below patch? I'm not quite happy with > the naming of the struct and its members (and maybe it should be in a > header?), suggestions welcome. +1. I wonder if we should also consider checking all the bits at once before we start checking for the predefined roles. I'm thinking of something a bit like this: role_mask = ACL_SELECT | ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_VACUUM | ACL_ANALYZE; if ((mask & role_mask) != (result & role_mask)) { ... existing checks here ... } I'm skeptical this actually produces any measurable benefit, but presumably the predefined roles list will continue to grow, so maybe it's still worth adding a fast path. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
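As a self-contained sketch of the fast path (the bit positions below are invented for illustration; the real ACL_* values are defined elsewhere in the tree):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t AclMode;

/* Illustrative bit assignments only, not the real ACL_* values. */
#define ACL_SELECT	(1 << 0)
#define ACL_INSERT	(1 << 1)
#define ACL_UPDATE	(1 << 2)
#define ACL_DELETE	(1 << 3)
#define ACL_VACUUM	(1 << 4)
#define ACL_ANALYZE	(1 << 5)

/*
 * Fast path: only walk the predefined-role list if some requested bit
 * that those roles could grant is still missing from the result so far.
 */
static bool
need_predefined_role_checks(AclMode mask, AclMode result)
{
	AclMode		role_mask = ACL_SELECT | ACL_INSERT | ACL_UPDATE |
		ACL_DELETE | ACL_VACUUM | ACL_ANALYZE;

	return (mask & role_mask) != (result & role_mask);
}
```

Note the parentheses: `&` binds more loosely than `!=` in C, so both sub-expressions must be parenthesized for the comparison to do what's intended.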
Re: predefined role(s) for VACUUM and ANALYZE
On Tue, Dec 06, 2022 at 04:57:37PM +0300, Pavel Luzanov wrote: > On 06.12.2022 03:04, Nathan Bossart wrote: >> I wonder why \dpS wasn't added. I wrote up a patch to add it and the >> corresponding documentation that other meta-commands already have. > > Yes, \dpS command and clarification in the documentation is exactly what is > needed. I created a new thread for this: https://postgr.es/m/20221206193606.GB3078082%40nathanxps13 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: 16: Collation versioning and dependency helpers
On Tue, 2022-12-06 at 10:53 -0800, Andres Freund wrote: > Perhaps worth posting a new version? Or are the remaining patches > abandoned in > favor of the other threads? Marked what is there as committed, and the remainder is abandoned in favor of other threads. Thanks, Jeff Davis
add \dpS to psql
Hi hackers, As discussed elsewhere [0], \dp doesn't show privileges on system objects, and this behavior is not mentioned in the docs. I've attached a small patch that adds support for the S modifier (i.e., \dpS) and adjusts the docs. Thoughts? [0] https://postgr.es/m/a2382acd-e465-85b2-9d8e-f9ed1a5a66e9%40postgrespro.ru -- Nathan Bossart Amazon Web Services: https://aws.amazon.com diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index d3dd638b14..406936dd1c 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -1825,14 +1825,16 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value' 'second value' \g -\dp [ pattern ] +\dp[S] [ pattern ] Lists tables, views and sequences with their associated access privileges. If pattern is specified, only tables, views and sequences whose names match the -pattern are listed. +pattern are listed. By default only user-created objects are shown; +supply a pattern or the S modifier to include system +objects. 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index de6a3a71f8..3520655dc0 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -875,7 +875,7 @@ exec_command_d(PsqlScanState scan_state, bool active_branch, const char *cmd) success = listCollations(pattern, show_verbose, show_system); break; case 'p': -success = permissionsList(pattern); +success = permissionsList(pattern, show_system); break; case 'P': { @@ -2831,7 +2831,7 @@ exec_command_z(PsqlScanState scan_state, bool active_branch) char *pattern = psql_scan_slash_option(scan_state, OT_NORMAL, NULL, true); - success = permissionsList(pattern); + success = permissionsList(pattern, false); free(pattern); } else diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c index 2eae519b1d..eb98797d67 100644 --- a/src/bin/psql/describe.c +++ b/src/bin/psql/describe.c @@ -1002,7 +1002,7 @@ listAllDbs(const char *pattern, bool verbose) * \z (now also \dp -- perhaps more mnemonic) */ bool -permissionsList(const char *pattern) +permissionsList(const char *pattern, bool showSystem) { PQExpBufferData buf; PGresult *res; @@ -1121,15 +1121,12 @@ permissionsList(const char *pattern) CppAsString2(RELKIND_FOREIGN_TABLE) "," CppAsString2(RELKIND_PARTITIONED_TABLE) ")\n"); - /* - * Unless a schema pattern is specified, we suppress system and temp - * tables, since they normally aren't very interesting from a permissions - * point of view. 
You can see 'em by explicit request though, eg with \z - * pg_catalog.* - */ + if (!showSystem && !pattern) + appendPQExpBufferStr(&buf, "AND n.nspname !~ '^pg_'\n"); + if (!validateSQLNamePattern(&buf, pattern, true, false, "n.nspname", "c.relname", NULL, -"n.nspname !~ '^pg_' AND pg_catalog.pg_table_is_visible(c.oid)", +"pg_catalog.pg_table_is_visible(c.oid)", NULL, 3)) goto error_return; diff --git a/src/bin/psql/describe.h b/src/bin/psql/describe.h index bd051e09cb..58d0cf032b 100644 --- a/src/bin/psql/describe.h +++ b/src/bin/psql/describe.h @@ -38,7 +38,7 @@ extern bool describeRoles(const char *pattern, bool verbose, bool showSystem); extern bool listDbRoleSettings(const char *pattern, const char *pattern2); /* \z (or \dp) */ -extern bool permissionsList(const char *pattern); +extern bool permissionsList(const char *pattern, bool showSystem); /* \ddp */ extern bool listDefaultACLs(const char *pattern);
Re: Cygwin cleanup
Hi, On 2022-11-08 19:04:37 -0600, Justin Pryzby wrote: > From 2741472080eceac5cb6d002c39eaf319d7f72b50 Mon Sep 17 00:00:00 2001 > From: Justin Pryzby > Date: Fri, 30 Sep 2022 13:39:43 -0500 > Subject: [PATCH 1/3] meson: other fixes for cygwin > > XXX: what about HAVE_BUGGY_STRTOF ? What about it? As noted in another thread, HAVE_BUGGY_STRTOF is defined in a header, and shouldn't be affected by the buildsystem. Pushed this commit. > XXX This should use a canned Docker image with all the right packages > installed? But if the larger image is slower to start, then maybe not... I think once we convert the windows containers to windows VMs we can just install both cygwin and mingw in the same image. The overhead of installing too much seems far far smaller there. > +CONFIGURE_FLAGS: --enable-cassert --enable-debug --with-ldap > --with-ssl=openssl --with-libxml > +# --enable-tap-tests I assume this is disabled as tap tests fail? > +C:\tools\cygwin\bin\bash.exe --login -c "cygserver-config -y" I'd copy the approach used for mingw of putting most of this in an environment variable. > +findargs='' > case $os in > freebsd|linux|macos) > -;; > +;; > + > +cygwin) > +# XXX Evidently I don't know how to write two arguments here without > pathname expansion later, other than eval. > +#findargs='-name "*.stackdump"' > +for stack in $(find "$directory" -type f -name "*.stackdump") ; do > +binary=`basename "$stack" .stackdump` > +echo;echo; > +echo "dumping ${stack} for ${binary}" > +awk '/^0/{print $2}' $stack |addr2line -f -i -e > ./src/backend/postgres.exe > +#awk '/^0/{print $2}' $stack |addr2line -f -i -e > "./src/backend/$binary.exe" > +done > +exit 0 > +;; Is this stuff actually needed? Could we use the infrastructure we use for backtraces with msvc instead? Or use something that understands .stackdump files? > +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm > [...] > +++ b/src/test/perl/PostgreSQL/Test/Utils.pm > [...] > +++ b/src/test/recovery/t/020_archive_status.pl > [...] 
I think these should be in a separate commit, they're not actually about CI. Greetings, Andres Freund
Re: wake up logical workers after ALTER SUBSCRIPTION
Thanks for reviewing! On Tue, Dec 06, 2022 at 07:44:46PM +0300, Melih Mutlu wrote: > Is it really necessary to wake logical workers up when renaming other than > subscription or publication? address.objectId will be a valid subid only > when renaming a subscription. Oops, that is a mistake. I only meant to wake up the workers for ALTER SUBSCRIPTION RENAME. I think I've fixed this in v6. > - When the state is SYNCDONE and the apply worker has to wake up to change > the state to READY. > > I think we already call logicalrep_worker_wakeup_ptr wherever it's needed > for the above cases? What am I missing here? IIUC we must restart all the apply workers for a subscription to enable two_phase mode. It looks like finish_sync_worker() only wakes up its own apply worker. I moved this logic to where the sync worker marks the state as SYNCDONE and added a check that two_phase mode is pending. Even so, there can still be unnecessary wakeups, but this adjustment should limit them. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 4be842c8a857a13b0bec2ebcbd9f8addf8b8562a Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Mon, 21 Nov 2022 16:01:01 -0800 Subject: [PATCH v6 1/1] wake up logical workers as needed instead of relying on periodic wakeups --- src/backend/access/transam/xact.c | 3 ++ src/backend/commands/alter.c| 7 src/backend/commands/subscriptioncmds.c | 4 ++ src/backend/replication/logical/tablesync.c | 8 src/backend/replication/logical/worker.c| 46 + src/include/replication/logicalworker.h | 3 ++ 6 files changed, 71 insertions(+) diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index 8086b857b9..dc00e66cfb 100644 --- a/src/backend/access/transam/xact.c +++ b/src/backend/access/transam/xact.c @@ -47,6 +47,7 @@ #include "pgstat.h" #include "replication/logical.h" #include "replication/logicallauncher.h" +#include "replication/logicalworker.h" #include "replication/origin.h" #include "replication/snapbuild.h" #include 
"replication/syncrep.h" @@ -2360,6 +2361,7 @@ CommitTransaction(void) AtEOXact_PgStat(true, is_parallel_worker); AtEOXact_Snapshot(true, false); AtEOXact_ApplyLauncher(true); + AtEOXact_LogicalRepWorkers(true); pgstat_report_xact_timestamp(0); CurrentResourceOwner = NULL; @@ -2860,6 +2862,7 @@ AbortTransaction(void) AtEOXact_HashTables(false); AtEOXact_PgStat(false, is_parallel_worker); AtEOXact_ApplyLauncher(false); + AtEOXact_LogicalRepWorkers(false); pgstat_report_xact_timestamp(0); } diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c index 10b6fe19a2..d095cd3ced 100644 --- a/src/backend/commands/alter.c +++ b/src/backend/commands/alter.c @@ -59,6 +59,7 @@ #include "commands/user.h" #include "miscadmin.h" #include "parser/parse_func.h" +#include "replication/logicalworker.h" #include "rewrite/rewriteDefine.h" #include "tcop/utility.h" #include "utils/builtins.h" @@ -279,6 +280,12 @@ AlterObjectRename_internal(Relation rel, Oid objectId, const char *new_name) if (strncmp(new_name, "regress_", 8) != 0) elog(WARNING, "subscriptions created by regression test cases should have names starting with \"regress_\""); #endif + + /* + * Wake up the logical replication workers to handle this change + * quickly. 
+ */ + LogicalRepWorkersWakeupAtCommit(objectId); } else if (nameCacheId >= 0) { diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index d673557ea4..d6993c26e5 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -34,6 +34,7 @@ #include "nodes/makefuncs.h" #include "pgstat.h" #include "replication/logicallauncher.h" +#include "replication/logicalworker.h" #include "replication/origin.h" #include "replication/slot.h" #include "replication/walreceiver.h" @@ -1362,6 +1363,9 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt, InvokeObjectPostAlterHook(SubscriptionRelationId, subid, 0); + /* Wake up the logical replication workers to handle this change quickly. */ + LogicalRepWorkersWakeupAtCommit(subid); + return myself; } diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c index 94e813ac53..ca556f7145 100644 --- a/src/backend/replication/logical/tablesync.c +++ b/src/backend/replication/logical/tablesync.c @@ -105,6 +105,7 @@ #include "pgstat.h" #include "replication/logicallauncher.h" #include "replication/logicalrelation.h" +#include "replication/logicalworker.h" #include "replication/walreceiver.h" #include "replication/worker_internal.h" #include "replication/slot.h" @@ -334,6 +335,13 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn) */ ReplicationSlotDropAtPubNode(LogRepWorkerWalRcvConn, syncslotname, false); + /* + * We might be ready to enable two_phase mode.
Re: Improve tab completion for ALTER FUNCTION/PROCEDURE/ROUTINE
On Tue, 6 Dec 2022 at 20:42, Melih Mutlu wrote: > > Hi Vignesh, > > Looks like the patch needs a rebase. Rebased > Also one little suggestion: > >> + if (ends_with(prev_wd, ')')) >> + COMPLETE_WITH(Alter_routine_options, "CALLED ON NULL INPUT", >> + "RETURNS NULL ON NULL INPUT", "STRICT", "SUPPORT"); > > What do you think about gathering FUNCTION options as you did with ROUTINE > options. > Something like the following would seem nicer, I think. > >> #define Alter_function_options \ >> Alter_routine_options, "CALLED ON NULL INPUT", \ >> >> "RETURNS NULL ON NULL INPUT", "STRICT", "SUPPORT" I did not make a macro for the ALTER FUNCTION options since they are used in only one place, whereas the others are required in more than one place. The attached v4 patch is rebased on top of HEAD. Regards, Vignesh From c1436c498d1d939384999ecb508d5da3403abc90 Mon Sep 17 00:00:00 2001 From: Vignesh C Date: Wed, 7 Dec 2022 00:27:37 +0530 Subject: [PATCH v4] Tab completion for ALTER FUNCTION/PROCEDURE/ROUTINE. Tab completion for ALTER FUNCTION/PROCEDURE/ROUTINE action was missing, added the tab completion for the same. 
--- src/bin/psql/tab-complete.c | 57 ++--- 1 file changed, 53 insertions(+), 4 deletions(-) diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c index 89e7317c23..105517e61e 100644 --- a/src/bin/psql/tab-complete.c +++ b/src/bin/psql/tab-complete.c @@ -1149,6 +1149,16 @@ static const SchemaQuery Query_for_trigger_of_table = { "CREATE", "CONNECT", "TEMPORARY", "EXECUTE", "USAGE", "SET", "ALTER SYSTEM", \ "VACUUM", "ANALYZE", "ALL" +/* ALTER PROCEDURE options */ +#define Alter_procedure_options \ +"DEPENDS ON EXTENSION", "EXTERNAL SECURITY", "NO DEPENDS ON EXTENSION", \ +"OWNER TO", "RENAME TO", "RESET", "SECURITY", "SET" + +/* ALTER ROUTINE options */ +#define Alter_routine_options \ +Alter_procedure_options, "COST", "IMMUTABLE", "LEAKPROOF", "NOT LEAKPROOF", \ +"PARALLEL", "ROWS", "STABLE", "VOLATILE" + /* * These object types were introduced later than our support cutoff of * server version 9.2. We use the VersionedQuery infrastructure so that @@ -1812,12 +1822,51 @@ psql_completion(const char *text, int start, int end) else COMPLETE_WITH_FUNCTION_ARG(prev2_wd); } - /* ALTER FUNCTION,PROCEDURE,ROUTINE (...) */ - else if (Matches("ALTER", "FUNCTION|PROCEDURE|ROUTINE", MatchAny, MatchAny)) + /* ALTER FUNCTION (...) */ + else if (Matches("ALTER", "FUNCTION", MatchAny, MatchAny)) + { + if (ends_with(prev_wd, ')')) + COMPLETE_WITH(Alter_routine_options, "CALLED ON NULL INPUT", + "RETURNS NULL ON NULL INPUT", "STRICT", "SUPPORT"); + else + COMPLETE_WITH_FUNCTION_ARG(prev2_wd); + } + /* ALTER FUNCTION|ROUTINE (...) PARALLEL */ + else if (Matches("ALTER", "FUNCTION|ROUTINE", MatchAny, MatchAny, + "PARALLEL")) + COMPLETE_WITH("RESTRICTED", "SAFE", "UNSAFE"); + /* + * ALTER FUNCTION|PROCEDURE|ROUTINE (...) 
+ * [EXTERNAL] SECURITY + */ + else if (Matches("ALTER", "FUNCTION|PROCEDURE|ROUTINE", MatchAny, MatchAny, + "SECURITY") || + Matches("ALTER", "FUNCTION|PROCEDURE|ROUTINE", MatchAny, MatchAny, + "EXTERNAL", "SECURITY")) + COMPLETE_WITH("DEFINER" ,"INVOKER"); + /* ALTER FUNCTION|PROCEDURE|ROUTINE (...) RESET */ + else if (Matches("ALTER", "FUNCTION|PROCEDURE|ROUTINE", MatchAny, MatchAny, + "RESET")) + COMPLETE_WITH_QUERY_VERBATIM_PLUS(Query_for_list_of_set_vars, + "ALL"); + /* ALTER FUNCTION|PROCEDURE|ROUTINE (...) SET */ + else if (Matches("ALTER", "FUNCTION|PROCEDURE|ROUTINE", MatchAny, MatchAny, + "SET")) + COMPLETE_WITH_QUERY_VERBATIM_PLUS(Query_for_list_of_set_vars, + "SCHEMA"); + /* ALTER PROCEDURE (...) */ + else if (Matches("ALTER", "PROCEDURE", MatchAny, MatchAny)) + { + if (ends_with(prev_wd, ')')) + COMPLETE_WITH(Alter_procedure_options); + else + COMPLETE_WITH_FUNCTION_ARG(prev2_wd); + } + /* ALTER ROUTINE (...) */ + else if (Matches("ALTER", "ROUTINE", MatchAny, MatchAny)) { if (ends_with(prev_wd, ')')) - COMPLETE_WITH("OWNER TO", "RENAME TO", "SET SCHEMA", - "DEPENDS ON EXTENSION", "NO DEPENDS ON EXTENSION"); + COMPLETE_WITH(Alter_routine_options); else COMPLETE_WITH_FUNCTION_ARG(prev2_wd); } -- 2.34.1
Re: Time delayed LR (WAS Re: logical replication restrictions)
Hi,

The tests fail on cfbot: https://cirrus-ci.com/task/4533866329800704

They only seem to fail on 32-bit Linux:
https://api.cirrus-ci.com/v1/artifact/task/4533866329800704/testrun/build-32/testrun/subscription/032_apply_delay/log/regress_log_032_apply_delay

[06:27:10.628](0.138s) ok 2 - check if the new rows were applied to subscriber
timed out waiting for match: (?^:logical replication apply delay) at /tmp/cirrus-ci-build/src/test/subscription/t/032_apply_delay.pl line 124.

Greetings,

Andres Freund
Re: generic plans and "initial" pruning
I find the API of GetCachedPlan a little weird after this patch. I think it may be better to have it return a pointer to a new struct -- one that contains both the CachedPlan pointer and the list of pruning results. (As I understand it, the sole caller that isn't interested in the pruning results, SPI_plan_get_cached_plan, can be explained by the fact that it knows there won't be any. So I don't think we need to worry about this case?)

And I think you should make that struct also be the last argument of PortalDefineQuery, so you don't need the separate PortalStorePartitionPruneResults function -- because as far as I can tell, the callers that pass a non-NULL pointer there are exactly the same ones that later call PortalStorePartitionPruneResults.

--
Álvaro Herrera  48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"The first law of live demonstrations is: don't try to use the system. Write a script that doesn't touch anything, so as not to cause damage." (Jakob Nielsen)
Re: Temporary tables versus wraparound... again
Hi,

On 2022-12-06 13:47:39 -0500, Greg Stark wrote:
> So I talked about this patch with Ronan Dunklau and he had a good
> question: Why are we maintaining relfrozenxid and relminmxid in
> pg_class for temporary tables at all? Autovacuum can't use them and
> other sessions won't care about them. The only session that might care
> about them is the one attached to the temp schema.

Uh, without relfrozenxid for temp tables we can end up truncating clog "ranges" away that are required to access the temp tables. So this would basically mean that temp tables can't be used reliably anymore.

Greetings,

Andres Freund
Re: 16: Collation versioning and dependency helpers
Hi,

On 2022-10-31 16:36:54 -0700, Jeff Davis wrote:
> Committed these two small changes.

FWIW, as it stands cfbot can't apply the remaining changes:
http://cfbot.cputube.org/patch_40_3977.log

Perhaps worth posting a new version? Or are the remaining patches abandoned in favor of the other threads?

Greetings,

Andres Freund
Re: PATCH: Using BRIN indexes for sorted output
Hi,

On 2022-11-17 00:52:35 +0100, Tomas Vondra wrote:
> Well, yeah. That's pretty much exactly what the last version of this
> patch (from October 23) does.

That version unfortunately doesn't build successfully:
https://cirrus-ci.com/task/5108789846736896

[03:02:48.641] Duplicate OIDs detected:
[03:02:48.641] 9979
[03:02:48.641] 9980
[03:02:48.641] found 2 duplicate OID(s) in catalog data

Greetings,

Andres Freund
Re: Temporary tables versus wraparound... again
So I talked about this patch with Ronan Dunklau and he had a good question: Why are we maintaining relfrozenxid and relminmxid in pg_class for temporary tables at all? Autovacuum can't use them and other sessions won't care about them. The only session that might care about them is the one attached to the temp schema.

So we could replace relfrozenxid and relminmxid for temp tables with a local hash table that can be updated on every truncate easily and efficiently. If a temp table actually wraps around, the only session that runs into problems is the one attached to that temp schema. It can throw local session errors and doesn't need to interfere with the rest of the cluster in any way. It could even start running vacuums, though I'm not convinced that's a great solution.

At least I think so. I'm pretty sure about relfrozenxid, but as always I don't really know how relminmxid works. I think we only need to worry about multixacts for subtransactions, all of which are in the same transaction -- does that even work that way?

But this is really attractive since it means no catalog updates just for temp tables on every transaction, and no cluster-wide wraparound problems even if you have on-commit-preserve-rows tables. It really shouldn't be possible for a regular user to cause the whole cluster to run into problems just by creating a temp table and keeping a connection around a while.
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Hi, On 2022-11-11 17:16:36 +0100, Önder Kalacı wrote: > I rebased the changes to the current master branch, reflected pg_indent > suggestions and also made a few minor style changes. Needs another rebase, I think: https://cirrus-ci.com/task/559244463758 [05:44:22.102] FAILED: src/backend/postgres_lib.a.p/replication_logical_worker.c.o [05:44:22.102] ccache cc -Isrc/backend/postgres_lib.a.p -Isrc/include -I../src/include -Isrc/include/storage -Isrc/include/utils -Isrc/include/catalog -Isrc/include/nodes -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -g -fno-strict-aliasing -fwrapv -fexcess-precision=standard -D_GNU_SOURCE -Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -Wdeclaration-after-statement -Wno-format-truncation -Wno-stringop-truncation -fPIC -pthread -DBUILDING_DLL -MD -MQ src/backend/postgres_lib.a.p/replication_logical_worker.c.o -MF src/backend/postgres_lib.a.p/replication_logical_worker.c.o.d -o src/backend/postgres_lib.a.p/replication_logical_worker.c.o -c ../src/backend/replication/logical/worker.c [05:44:22.102] ../src/backend/replication/logical/worker.c: In function ‘get_usable_indexoid’: [05:44:22.102] ../src/backend/replication/logical/worker.c:2101:36: error: ‘ResultRelInfo’ has no member named ‘ri_RootToPartitionMap’ [05:44:22.102] 2101 | TupleConversionMap *map = relinfo->ri_RootToPartitionMap; [05:44:22.102] |^~ Greetings, Andres Freund
Re: Removing unneeded self joins
Hi, On 2022-10-05 17:25:18 +0500, Andrey Lepikhov wrote: > New version, rebased onto current master. > Nothing special, just rebase. This doesn't pass the main regression tests due to a plan difference. https://cirrus-ci.com/task/5537518245380096 https://api.cirrus-ci.com/v1/artifact/task/5537518245380096/testrun/build/testrun/regress/regress/regression.diffs diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/join.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out --- /tmp/cirrus-ci-build/src/test/regress/expected/join.out 2022-12-05 19:11:52.453920838 + +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out 2022-12-05 19:15:21.864183651 + @@ -5806,7 +5806,7 @@ Nested Loop Join Filter: (sj_t3.id = sj_t1.id) -> Nested Loop - Join Filter: (sj_t3.id = sj_t2.id) + Join Filter: (sj_t2.id = sj_t3.id) -> Nested Loop Semi Join -> Nested Loop -> HashAggregate Greetings, Andres Freund
Re: May be BUG. Periodic burst growth of the checkpoint_req counter on replica.
Hi, On 2022-09-19 01:29:21 +0300, Anton A. Melnikov wrote: > Corrected patch is attached (v2-0001-Fix-burst-checkpoint_req-growth.patch). This patch doesn't pass the main regression tests tests successfully: https://cirrus-ci.com/task/5502700019253248 diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/rules.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/rules.out --- /tmp/cirrus-ci-build/src/test/regress/expected/rules.out2022-12-06 05:49:53.687424000 + +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/rules.out 2022-12-06 05:53:04.64269 + @@ -1816,6 +1816,9 @@ FROM pg_stat_get_archiver() s(archived_count, last_archived_wal, last_archived_time, failed_count, last_failed_wal, last_failed_time, stats_reset); pg_stat_bgwriter| SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed, pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req, +pg_stat_get_bgwriter_timed_restartpoints() AS restartpoints_timed, +pg_stat_get_bgwriter_requested_restartpoints() AS restartpoints_req, +pg_stat_get_bgwriter_performed_restartpoints() AS restartpoints_done, pg_stat_get_checkpoint_write_time() AS checkpoint_write_time, pg_stat_get_checkpoint_sync_time() AS checkpoint_sync_time, pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint, Greetings, Andres Freund
Re: New strategies for freezing, advancing relfrozenxid early
Hi,

On 2022-11-23 15:06:52 -0800, Peter Geoghegan wrote:
> Attached is v8.

The docs don't build: https://cirrus-ci.com/task/5456939761532928

[20:00:58.203] postgres.sgml:52: element link: validity error : IDREF attribute linkend references an unknown ID "vacuum-for-wraparound"

Greetings,

Andres Freund
Re: RFC: Logging plan of the running query
Hi, This patch does not currently build, due to a conflicting oid: https://cirrus-ci.com/task/4638460594618368?logs=build#L108 [17:26:44.602] /usr/bin/perl ../src/include/catalog/../../backend/catalog/genbki.pl --include-path=../src/include --set-version=16 --output=src/include/catalog ../src/include/catalog/pg_proc.h ../src/include/catalog/pg_type.h ../src/include/catalog/pg_attribute.h ../src/include/catalog/pg_class.h ../src/include/catalog/pg_attrdef.h ../src/include/catalog/pg_constraint.h ../src/include/catalog/pg_inherits.h ../src/include/catalog/pg_index.h ../src/include/catalog/pg_operator.h ../src/include/catalog/pg_opfamily.h ../src/include/catalog/pg_opclass.h ../src/include/catalog/pg_am.h ../src/include/catalog/pg_amop.h ../src/include/catalog/pg_amproc.h ../src/include/catalog/pg_language.h ../src/include/catalog/pg_largeobject_metadata.h ../src/include/catalog/pg_largeobject.h ../src/include/catalog/pg_aggregate.h ../src/include/catalog/pg_statistic.h ../src/include/catalog/pg_statistic_ext.h ../src/include/catalog/pg_statistic_ext_data.h ../src/include/catalog/pg_rewrite.h ../src/include/catalog/pg_trigger.h ../src/include/catalog/pg_event_trigger.h ../src/include/catalog/pg_description.h ../src/include/catalog/pg_cast.h ../src/include/catalog/pg_enum.h ../src/include/catalog/pg_namespace.h ../src/include/catalog/pg_conversion.h ../src/include/catalog/pg_depend.h ../src/include/catalog/pg_database.h ../src/include/catalog/pg_db_role_setting.h ../src/include/catalog/pg_tablespace.h ../src/include/catalog/pg_authid.h ../src/include/catalog/pg_auth_members.h ../src/include/catalog/pg_shdepend.h ../src/include/catalog/pg_shdescription.h ../src/include/catalog/pg_ts_config.h ../src/include/catalog/pg_ts_config_map.h ../src/include/catalog/pg_ts_dict.h ../src/include/catalog/pg_ts_parser.h ../src/include/catalog/pg_ts_template.h ../src/include/catalog/pg_extension.h ../src/include/catalog/pg_foreign_data_wrapper.h 
../src/include/catalog/pg_foreign_server.h ../src/include/catalog/pg_user_mapping.h ../src/include/catalog/pg_foreign_table.h ../src/include/catalog/pg_policy.h ../src/include/catalog/pg_replication_origin.h ../src/include/catalog/pg_default_acl.h ../src/include/catalog/pg_init_privs.h ../src/include/catalog/pg_seclabel.h ../src/include/catalog/pg_shseclabel.h ../src/include/catalog/pg_collation.h ../src/include/catalog/pg_parameter_acl.h ../src/include/catalog/pg_partitioned_table.h ../src/include/catalog/pg_range.h ../src/include/catalog/pg_transform.h ../src/include/catalog/pg_sequence.h ../src/include/catalog/pg_publication.h ../src/include/catalog/pg_publication_namespace.h ../src/include/catalog/pg_publication_rel.h ../src/include/catalog/pg_subscription.h ../src/include/catalog/pg_subscription_rel.h

[17:26:44.602] Duplicate OIDs detected:
[17:26:44.602] 4550
[17:26:44.602] found 1 duplicate OID(s) in catalog data

I suggest you choose a random OID out of the "development purposes" range:

 * OIDs 1-9999 are reserved for manual assignment (see .dat files in
 * src/include/catalog/).  Of these, 8000-9999 are reserved for
 * development purposes (such as in-progress patches and forks);
 * they should not appear in released versions.

Greetings,

Andres Freund
Re: Eliminating SPI from RI triggers - take 2
Hi, On 2022-10-15 14:47:05 +0900, Amit Langote wrote: > Attached updated patches. These started to fail to build recently: [04:43:33.046] ccache cc -Isrc/backend/postgres_lib.a.p -Isrc/include -I../src/include -Isrc/include/storage -Isrc/include/utils -Isrc/include/catalog -Isrc/include/nodes -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -g -fno-strict-aliasing -fwrapv -fexcess-precision=standard -D_GNU_SOURCE -Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -Wdeclaration-after-statement -Wno-format-truncation -Wno-stringop-truncation -fPIC -pthread -DBUILDING_DLL -MD -MQ src/backend/postgres_lib.a.p/executor_execPartition.c.o -MF src/backend/postgres_lib.a.p/executor_execPartition.c.o.d -o src/backend/postgres_lib.a.p/executor_execPartition.c.o -c ../src/backend/executor/execPartition.c [04:43:33.046] ../src/backend/executor/execPartition.c: In function ‘ExecGetLeafPartitionForKey’: [04:43:33.046] ../src/backend/executor/execPartition.c:1679:19: error: too few arguments to function ‘build_attrmap_by_name_if_req’ [04:43:33.046] 1679 |AttrMap *map = build_attrmap_by_name_if_req(RelationGetDescr(root_rel), [04:43:33.046] | ^~~~ [04:43:33.046] In file included from ../src/include/access/tupconvert.h:17, [04:43:33.046] from ../src/include/nodes/execnodes.h:32, [04:43:33.046] from ../src/include/executor/execPartition.h:16, [04:43:33.046] from ../src/backend/executor/execPartition.c:21: [04:43:33.046] ../src/include/access/attmap.h:47:17: note: declared here [04:43:33.046]47 | extern AttrMap *build_attrmap_by_name_if_req(TupleDesc indesc, [04:43:33.046] | ^~~~ Regards, Andres
Re: doc: add missing "id" attributes to extension packaging page
On 06.12.2022 at 19:11, Brar Piening wrote:
> The current statistics for docbook elements with/without ids after
> applying the patch are the following:

Somehow my e-mail client destroyed the table. This is how it was supposed to look:

     name      | with_id | without_id | id_coverage | min_id_len | max_id_len
---------------+---------+------------+-------------+------------+------------
 sect2         |     870 |          0 |      100.00 |          7 |         57
 sect1         |     739 |          0 |      100.00 |          4 |         46
 refentry      |     307 |          0 |      100.00 |          8 |         37
 sect3         |     206 |          0 |      100.00 |         11 |         57
 chapter       |      77 |          0 |      100.00 |          5 |         24
 sect4         |      28 |          0 |      100.00 |         16 |         47
 biblioentry   |      23 |          0 |      100.00 |          6 |         15
 simplesect    |      20 |          0 |      100.00 |         24 |         39
 appendix      |      15 |          0 |      100.00 |          7 |         23
 part          |       8 |          0 |      100.00 |          5 |         20
 co            |       4 |          0 |      100.00 |         18 |         30
 figure        |       3 |          0 |      100.00 |         13 |         28
 reference     |       3 |          0 |      100.00 |         14 |         18
 anchor        |       1 |          0 |      100.00 |         21 |         21
 bibliography  |       1 |          0 |      100.00 |          8 |          8
 book          |       1 |          0 |      100.00 |         10 |         10
 index         |       1 |          0 |      100.00 |         11 |         11
 legalnotice   |       1 |          0 |      100.00 |         13 |         13
 preface       |       1 |          0 |      100.00 |          9 |          9
 glossentry    |     119 |         14 |       89.47 |         13 |         32
 table         |     285 |        162 |       63.76 |         12 |         56
 example       |      27 |         16 |       62.79 |         12 |         42
 refsect3      |       5 |          3 |       62.50 |         19 |         24
 refsect2      |      41 |         56 |       42.27 |         10 |         36
 varlistentry  |    1701 |       3172 |       34.91 |          9 |         64
 footnote      |       5 |         18 |       21.74 |         17 |         32
 step          |      28 |        130 |       17.72 |          7 |         28
 refsect1      |     151 |       1333 |       10.18 |         15 |         40
 informaltable |       1 |         15 |        6.25 |         25 |         25
 phrase        |       2 |         94 |        2.08 |         20 |         26
 indexterm     |       5 |       3262 |        0.15 |         17 |         26
 variablelist  |       1 |        813 |        0.12 |         21 |         21
 function      |       4 |       4011 |        0.10 |         12 |         28
 entry         |      11 |      17740 |        0.06 |         21 |         40
 para          |       3 |      25734 |        0.01 |         21 |         27
Re: PATCH: AM-specific statistics, with an example implementation for BRIN (WIP)
Hi,

On 2022-10-18 13:33:59 +0200, Tomas Vondra wrote:
> I mentioned I have some ideas how to improve this, and that I'll start a
> separate thread to discuss this. So here we go ...

This CF entry has been failing tests since it was submitted. Are you planning to work on this further? If not, I think we should just close the CF entry.

Greetings,

Andres Freund
Re: Add the ability to limit the amount of memory that can be allocated to backends.
Hi, On 2022-11-26 22:22:15 -0500, Reid Thompson wrote: > rebased/patched to current master && current > pg-stat-activity-backend-memory-allocated This version fails to build with msvc, and builds with warnings on other platforms. https://cirrus-ci.com/build/5410696721072128 msvc: [20:26:51.286] c:\cirrus\src\include\utils/backend_status.h(40): error C2059: syntax error: 'constant' mingw cross: [20:26:26.358] from /usr/share/mingw-w64/include/winsock2.h:23, [20:26:26.358] from ../../src/include/port/win32_port.h:60, [20:26:26.358] from ../../src/include/port.h:24, [20:26:26.358] from ../../src/include/c.h:1306, [20:26:26.358] from ../../src/include/postgres.h:47, [20:26:26.358] from controldata_utils.c:18: [20:26:26.358] ../../src/include/utils/backend_status.h:40:2: error: expected identifier before numeric constant [20:26:26.358]40 | IGNORE, [20:26:26.358] | ^~ [20:26:26.358] In file included from ../../src/include/postgres.h:48, [20:26:26.358] from controldata_utils.c:18: [20:26:26.358] ../../src/include/utils/backend_status.h: In function ‘pgstat_report_allocated_bytes’: [20:26:26.358] ../../src/include/utils/backend_status.h:365:12: error: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘uint64’ {aka ‘long long unsigned int’} [-Werror=format=] [20:26:26.358] 365 | errmsg("Backend %d deallocated %ld bytes, exceeding the %ld bytes it is currently reporting allocated. Setting reported to 0.", [20:26:26.358] | ^~~ [20:26:26.358] 366 | MyProcPid, allocated_bytes, *my_allocated_bytes)); [20:26:26.358] |~~~ [20:26:26.358] || [20:26:26.358] |uint64 {aka long long unsigned int} Due to windows having long be 32bit, you need to use %lld. Our custom to deal with that is to cast the argument to errmsg as long long unsigned and use %llu. Btw, given that the argument is uint64, it doesn't seem correct to use %ld, that's signed. Not that it's going to matter, but ... Greetings, Andres Freund
Re: [Proposal] Add foreign-server health checks infrastructure
Hi, On 2022-11-25 11:41:45 +, Hayato Kuroda (Fujitsu) wrote: > Sorry for my late reply. I understood that we got agreement the basic design > of first version. Thanks! > I attached new version patches. This is failing on cfbot / CI, as have prior versions. https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F40%2F3388 https://api.cirrus-ci.com/v1/artifact/task/6655254493134848/testrun/build/testrun/postgres_fdw/regress/regression.diffs diff -U3 /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out /tmp/cirrus-ci-build/build/testrun/postgres_fdw/regress/results/postgres_fdw.out --- /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out 2022-12-06 05:21:33.906116000 + +++ /tmp/cirrus-ci-build/build/testrun/postgres_fdw/regress/results/postgres_fdw.out 2022-12-06 05:27:24.929694000 + @@ -11732,10 +11732,9 @@ -- execute check function. This should return false on supported platforms, -- otherwise return true. SELECT postgres_fdw_verify_connection_states('loopback'); -WARNING: 08006 postgres_fdw_verify_connection_states --- - f + t (1 row) ABORT; The failure happens everywhere except on linux, so presumably there's some portability issue in the patch. I've set the patch as waiting on author for now. Note that you can test CI in your own repository, as explained in src/tools/ci/README Greetings, Andres Freund
Re: Error-safe user functions
On Tue, Dec 6, 2022 at 6:46 AM Andrew Dunstan wrote:
> On 2022-12-05 Mo 20:06, Tom Lane wrote:
> > Andres Freund writes:
> > > But perhaps it's even worth having such a function properly exposed:
> > > It's not at all rare to parse text data during ETL and quite often
> > > erroring out fatally is undesirable. As savepoints are undesirable
> > > overhead-wise, there's a lot of SQL out there that tries to do a
> > > pre-check about whether some text could be cast to some other data
> > > type. A function that'd try to cast input to a certain type without
> > > erroring out hard would be quite useful for that.
> >
> > Corey and Vik are already talking about a non-error CAST variant.
>
> /metoo! :-)
>
> > Maybe we should leave this in abeyance until something shows up
> > for that? Otherwise we'll be making a nonstandard API for what
> > will probably ultimately be SQL-spec functionality. I don't mind
> > that as regression-test infrastructure, but I'm a bit less excited
> > about exposing it as a user feature.
>
> I think a functional mechanism could be very useful. Who knows when the
> standard might specify something in this area?

Vik's working on the standard (he put the spec in earlier in this thread). I'm working on implementing it on top of Tom's work, but I'm one patchset behind at the moment. Once completed, it should be leverageable in several places, COPY being the most obvious.

What started all this was me noticing that if I implemented TRY_CAST in pl/pgsql with an exception block, then I wasn't able to use parallel query.
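For reference, the pattern being described is the classic exception-block emulation of TRY_CAST (the helper name here is hypothetical). The EXCEPTION clause forces a subtransaction per call, which is both the savepoint overhead Andres mentions and what keeps the function out of parallel plans:

```sql
CREATE FUNCTION try_cast_int(p text) RETURNS integer
LANGUAGE plpgsql AS
$$
BEGIN
    RETURN p::integer;
EXCEPTION WHEN invalid_text_representation THEN
    RETURN NULL;   -- swallow the cast error instead of aborting
END;
$$;

SELECT try_cast_int('42');            -- 42
SELECT try_cast_int('not a number');  -- NULL
```

An error-safe input-function mechanism would let such a helper return NULL without entering a subtransaction at all.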
Re: Make EXPLAIN generate a generic plan for a parameterized query
Hi, On 2022-10-29 10:35:26 +0200, Laurenz Albe wrote: > On Tue, 2022-10-25 at 19:03 +0800, Julien Rouhaud wrote: > > On Tue, Oct 25, 2022 at 11:08:27AM +0200, Laurenz Albe wrote: > > > Here is a patch that > > > implements it with an EXPLAIN option named GENERIC_PLAN. > > > > I only have a quick look at the patch for now. Any reason why you don't > > rely > > on the existing explain_filter() function for emitting stable output > > (without > > having to remove the costs)? It would also take care of checking that it > > works > > in plpgsql. > > No, there is no principled reason I did it like that. Version 2 does it like > you suggest. This fails to build the docs: https://cirrus-ci.com/task/5609301511766016 [17:47:01.064] ref/explain.sgml:179: parser error : Opening and ending tag mismatch: likeral line 179 and literal [17:47:01.064] ANALYZE, since a statement with unknown parameters [17:47:01.064] ^ Greetings, Andres Freund
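For context, the option under review lets EXPLAIN take a statement containing parameter placeholders and produce the generic plan without supplying values. A sketch of the intended usage, with a hypothetical table:

```sql
CREATE TABLE t (id integer PRIMARY KEY, val text);

-- $1 stays unknown; GENERIC_PLAN asks for the generic plan instead of
-- failing with "there is no parameter $1".
EXPLAIN (GENERIC_PLAN) SELECT val FROM t WHERE id = $1;
```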
Re: doc: add missing "id" attributes to extension packaging page
On 06.12.2022 at 18:59, Brar Piening wrote:
> On 06.12.2022 at 09:38, Alvaro Herrera wrote:
> > I would welcome separate patches: one to add the IDs, another for the
> > XSL/CSS stuff. That allows us to discuss them separately.
>
> I'll send two patches in two separate e-mails in a moment.

This is patch no. 2, which adds links to HTML elements with ids to make them visible on the HTML surface when hovering over the element.

Regards,
Brar

diff --git a/doc/src/sgml/stylesheet-html-common.xsl b/doc/src/sgml/stylesheet-html-common.xsl
index 9df2782ce4..111b03d6fc 100644
--- a/doc/src/sgml/stylesheet-html-common.xsl
+++ b/doc/src/sgml/stylesheet-html-common.xsl
@@ -301,4 +301,115 @@ set toc,title
[XSL templates stripped by the list archive. The recoverable fragments show templates that emit an "id_link" anchor ("#") next to elements that have an id, and that warn: "element without id. Please add an id to make it usable as stable anchor in the public HTML documentation. Please add an id to the ... element in the sgml source code."]
diff --git a/doc/src/sgml/stylesheet.css b/doc/src/sgml/stylesheet.css
index 6410a47958..e4174e0613 100644
--- a/doc/src/sgml/stylesheet.css
+++ b/doc/src/sgml/stylesheet.css
@@ -163,3 +163,13 @@ acronym{ font-style: inherit; }
     width: 75%;
   }
 }
+
+/* Links to ids of headers and definition terms */
+a.id_link {
+    color: inherit;
+    visibility: hidden;
+}
+
+*:hover > a.id_link {
+    visibility: visible;
+}
Re: pg_upgrade test failure
Hi, On 2022-11-08 01:16:09 +1300, Thomas Munro wrote: > So [1] on its own didn't fix this. My next guess is that the attached > might help. > > Hmm. Following Michael's clue that this might involve log files and > pg_ctl, I noticed one thing: pg_ctl implements > wait_for_postmaster_stop() by waiting for kill(pid, 0) to fail, and > our kill emulation does CallNamedPipe(). If the server is in the > process of exiting and the kernel is cleaning up all the handles we > didn't close, is there any reason to expect the signal pipe to be > closed after the log file? What is our plan here? This afaict is the most common "false positive" for cfbot in the last weeks. E.g.: https://api.cirrus-ci.com/v1/artifact/task/5462686092230656/testrun/build/testrun/pg_upgrade/002_pg_upgrade/log/regress_log_002_pg_upgrade ... [00:02:58.761](93.859s) ok 10 - run of pg_upgrade for new instance [00:02:58.808](0.047s) not ok 11 - pg_upgrade_output.d/ removed after pg_upgrade success [00:02:58.815](0.007s) # Failed test 'pg_upgrade_output.d/ removed after pg_upgrade success' # at C:/cirrus/src/bin/pg_upgrade/t/002_pg_upgrade.pl line 288. Michael: Why does 002_pg_upgrade.pl try to filter the list of files in pg_upgrade_output.d for files ending in .log? And why does it print those only after starting the new node? How about moving the iteration through the pg_upgrade_output.d to before the ->start and printing all the files, but only slurp_file() if the filename ends with *.log? Minor nit: It seems off to quite so many copies of $newnode->data_dir . "/pg_upgrade_output.d" particularly where the test defines $log_path, but then still builds it from scratch after (line 304). Greetings, Andres Freund
Re: doc: add missing "id" attributes to extension packaging page
On 06.12.2022 at 09:38, Alvaro Herrera wrote:
> I care. The problem last time is that we were in the middle of the last
> commitfest, so we were (or at least I was) distracted by other stuff.

Ok, thanks. That's appreciated and understood.

> Looking at the resulting psql page,
> https://pgdocs.piening.info/app-psql.html#APP-PSQL-OPTIONS-EXPANDED
> I note that the ID for the -x option is called "options-blah". I
> understand where this comes from: it's the "expanded" bit in the
> "options" section. However, put together it's a bit silly to have
> "options" in plural there; it would make more sense to have it be
> https://pgdocs.piening.info/app-psql.html#APP-PSQL-OPTION-EXPANDED
> (where you can read more naturally "the expanded option for psql").
> How laborious would it be to make it so?

No problem. I've already done it. Your second link should work now.

> It'll probably have some conflicts, yeah.

I've updated and rebased my branch on current master now.

> I would welcome separate patches: one to add the IDs, another for the
> XSL/CSS stuff. That allows us to discuss them separately.

I'll send two patches in two separate e-mails in a moment.

Regards,
Brar
Re: Add tracking of backend memory allocated to pg_stat_activity
Hi,

On 2022-12-06 08:47:55 -0500, Reid Thompson wrote:
> The intent was to capture and display all the memory allocated to the
> backends, for admins/users/max_total_backend_memory/others to utilize.

It's going to be far less accurate than that. For one, there's a lot of operating system resources, like the page table, that are going to be ignored. We're also not going to capture allocations done directly via malloc(). There's also copy-on-write effects that we're ignoring. If we just want to show an accurate picture of the current memory usage, we need to go to operating system specific interfaces.

> Why should we ignore the allocations prior to backend_status.c?

It seems to add complexity without a real increase in accuracy to me. But I'm not going to push harder on it than I already have.

Greetings,

Andres Freund
Re: Error-safe user functions
On Tue, Dec 6, 2022 at 11:07 AM Tom Lane wrote: > > I originally chose InputFunctionCallContext as a more neutral name in > > case we wanted to be able to pass some other sort of node for the > > context in future. > > Maybe that was a little too forward looking. > > I didn't like that because it seemed to convey nothing at all about > the expected behavior. I feel like this can go either way. If we pick a name that conveys a specific intended behavior now, and then later we want to pass some other sort of node for some purpose other than ignoring errors, it's unpleasant to have a name that sounds like it can only ignore errors. But if we never use it for anything other than ignoring errors, a specific name is clearer. -- Robert Haas EDB: http://www.enterprisedb.com
Re: wake up logical workers after ALTER SUBSCRIPTION
Hi Nathan, @@ -410,6 +411,12 @@ ExecRenameStmt(RenameStmt *stmt) > stmt->newname); > table_close(catalog, RowExclusiveLock); > > + /* > + * Wake up the logical replication workers to handle this > + * change quickly. > + */ > + LogicalRepWorkersWakeupAtCommit(address.objectId); Is it really necessary to wake logical workers up when renaming other than subscription or publication? address.objectId will be a valid subid only when renaming a subscription. @@ -322,6 +323,9 @@ UpdateSubscriptionRelState(Oid subid, Oid relid, char > state, > > /* Cleanup. */ > table_close(rel, NoLock); > + > + /* Wake up the logical replication workers to handle this change > quickly. */ > + LogicalRepWorkersWakeupAtCommit(subid); I wonder why a wakeup call is needed every time a subscription relation is updated. It seems to me that there are two places where UpdateSubscriptionRelState is called and we need another worker to wake up: - When a relation is in SYNCWAIT state, it waits for the apply worker to wake up and change the relation state to CATCHUP. Then tablesync worker needs to wake up to continue from CATCHUP state. - When the state is SYNCDONE and the apply worker has to wake up to change the state to READY. I think we already call logicalrep_worker_wakeup_ptr wherever it's needed for the above cases? What am I missing here? Best, -- Melih Mutlu Microsoft
Re: Error-safe user functions
Andrew Dunstan writes: > On 2022-12-06 Tu 09:42, Tom Lane wrote: >> I'm not sure this is any *better* than Safe ... it's longer, less >> mellifluous, and still subject to misinterpretation. But it's >> a possible alternative. > Yeah, I don't think there's terribly much to choose between 'safe' and > 'noerror' in terms of meaning. Yeah, I just wanted to throw it out there and see if anyone thought it was a better idea. > I originally chose InputFunctionCallContext as a more neutral name in > case we wanted to be able to pass some other sort of node for the > context in future. > Maybe that was a little too forward looking. I didn't like that because it seemed to convey nothing at all about the expected behavior. regards, tom lane
Re: Error-safe user functions
On 2022-12-06 Tu 09:42, Tom Lane wrote: > [ continuing the naming quagmire... ] > > I wrote: >> Andres Freund writes: >>> Not that I have a suggestion for a better name, but I don't particularly >>> like "Safe" denoting non-erroring input function calls. There's too many >>> interpretations of safe - e.g. safe against privilege escalation issues >>> or such. >> Yeah, I'm not that thrilled with it either --- but it's a reasonably >> on-point modifier, and short. > It occurs to me that another spelling could be NoError (or _noerror > where not using camel case). There's some precedent for that already; > and where we have it, it has the same implication of reporting rather > than throwing certain errors, without making a guarantee about all > errors. For instance lookup_rowtype_tupdesc_noerror won't prevent > throwing errors if catalog corruption is detected inside the catcaches. > > I'm not sure this is any *better* than Safe ... it's longer, less > mellifluous, and still subject to misinterpretation. But it's > a possible alternative. > > Yeah, I don't think there's terribly much to choose between 'safe' and 'noerror' in terms of meaning. I originally chose InputFunctionCallContext as a more neutral name in case we wanted to be able to pass some other sort of node for the context in future. Maybe that was a little too forward looking. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: ExecRTCheckPerms() and many prunable partitions
I have pushed this finally. I made two further changes:

1. There was no reason to rename ExecCheckPerms_hook, since its signature was changing anyway. I reverted it to the original name.

2. I couldn't find any reason to expose ExecGetRTEPermissionInfo, and given that it's a one-line function, I removed it. Maybe you had a reason to add ExecGetRTEPermissionInfo, thinking about external callers; if so, please discuss it.

I'll mark this commitfest entry as committed soon; please post the other two patches you had in this series in a new thread.

--
Álvaro Herrera        Breisgau, Deutschland — https://www.EnterpriseDB.com/
Re: Improve tab completion for ALTER FUNCTION/PROCEDURE/ROUTINE
Hi Vignesh,

Looks like the patch needs a rebase. Also one little suggestion:

> + if (ends_with(prev_wd, ')'))
> + COMPLETE_WITH(Alter_routine_options, "CALLED ON NULL INPUT",
> + "RETURNS NULL ON NULL INPUT", "STRICT", "SUPPORT");

What do you think about gathering FUNCTION options as you did with ROUTINE options? Something like the following would seem nicer, I think:

> #define Alter_function_options \
> Alter_routine_options, "CALLED ON NULL INPUT", \
> "RETURNS NULL ON NULL INPUT", "STRICT", "SUPPORT"

Best,
--
Melih Mutlu
Microsoft
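The suggestion boils down to splicing one macro's comma-separated list into another, which works because each list expands into arguments for a variadic macro like COMPLETE_WITH(). A minimal standalone sketch (the option lists here are abbreviated placeholders, not the real tab-complete.c contents):

```c
#include <stddef.h>

/*
 * Hypothetical, abbreviated option lists mirroring the style of psql's
 * tab-complete.c.  Because each macro expands to a comma-separated list
 * of string literals, ALTER_ROUTINE_OPTIONS can be spliced directly
 * into ALTER_FUNCTION_OPTIONS, and either can be passed to a variadic
 * macro or used as an array initializer.
 */
#define ALTER_ROUTINE_OPTIONS	"COST", "OWNER TO", "RENAME TO"
#define ALTER_FUNCTION_OPTIONS \
	ALTER_ROUTINE_OPTIONS, "CALLED ON NULL INPUT", \
	"RETURNS NULL ON NULL INPUT", "STRICT", "SUPPORT"

static const char *function_options[] = {ALTER_FUNCTION_OPTIONS};

/* number of completion candidates after splicing the two lists */
size_t
function_option_count(void)
{
	return sizeof(function_options) / sizeof(function_options[0]);
}
```

With the three placeholder ROUTINE options plus the four FUNCTION-only ones, the spliced list has seven entries.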
Re: Allow placeholders in ALTER ROLE w/o superuser
On Mon, Dec 5, 2022 at 10:32 PM Alexander Korotkov wrote:
> On Mon, Dec 5, 2022 at 8:18 PM Tom Lane wrote:
>> Alvaro Herrera writes:
>>> I couldn't find any discussion of the idea of adding "(s)" to the
>>> variable name in order to mark the variable userset in the catalog, and
>>> I have to admit I find it a bit strange.  Are we really agreed that
>>> that's the way to proceed?
>>
>> I hadn't been paying close attention to this thread, sorry.
>>
>> I agree that that seems like a very regrettable choice,
>> especially if you anticipate having to bump catversion anyway.
>
> I totally understand that this change requires a catversion bump.
> I've reflected this in the commit message.
>
>> Better to add a bool column to the catalog.
>
> What about adding a boolean array to pg_db_role_setting?  So,
> pg_db_role_setting would have the following columns.
> * setdatabase oid
> * setrole oid
> * setconfig text[]
> * setuser bool[]

The revised patch implements this way of storing the USER SET flag. I think it really became more structured and less cumbersome.

--
Regards,
Alexander Korotkov

0001-Add-USER-SET-parameter-values-for-pg_db_role_sett-v8.patch
Description: Binary data
Re: old_snapshot: add test for coverage
On Mon, Aug 8, 2022 at 2:37 PM Tom Lane wrote:
>
> Dong Wook Lee writes:
>> I wrote a test of the old_snapshot extension for coverage.
>
> Hmm, does this really provide any meaningful coverage?  The test
> sure looks like it's not doing much.

The tests I wrote previously were simply meant to increase coverage, so there is room to make a better test. I'll think about it this week. If this work exceeds my ability, I will let you know by reply. It's fine to close the issue unless I come up with a meaningful test.

---
Regards,
DongWook Lee.
Re: Missing MaterialPath support in reparameterize_path_by_child
Ashutosh Bapat writes:
> On Mon, Dec 5, 2022 at 8:13 PM Tom Lane wrote:
>> I don't especially like "rel->nparts = 0" as a way of disabling
>> partitionwise join ...

> ... If we can not generate AppendPath for a
> join relation, it means there is no way to compute child join
> relations and thus the relation is not partitioned. So setting
> rel->nparts = 0 is right.

If we had nparts > 0 before, then it is partitioned for some value of "partitioned", so I don't entirely buy this argument.

> Probably we should add macros similar to
> dummy relation for marking and checking partitioned relation. I see
> IS_PARTITIONED_RELATION() is defined already. Maybe we could add
> mark_(un)partitioned_rel().

Hiding it behind a macro with an explanatory name would be an improvement, for sure.

			regards, tom lane
Re: Error-safe user functions
[ continuing the naming quagmire... ]

I wrote:
> Andres Freund writes:
>> Not that I have a suggestion for a better name, but I don't particularly
>> like "Safe" denoting non-erroring input function calls. There's too many
>> interpretations of safe - e.g. safe against privilege escalation issues
>> or such.

> Yeah, I'm not that thrilled with it either --- but it's a reasonably
> on-point modifier, and short.

It occurs to me that another spelling could be NoError (or _noerror where not using camel case). There's some precedent for that already; and where we have it, it has the same implication of reporting rather than throwing certain errors, without making a guarantee about all errors. For instance lookup_rowtype_tupdesc_noerror won't prevent throwing errors if catalog corruption is detected inside the catcaches.

I'm not sure this is any *better* than Safe ... it's longer, less mellifluous, and still subject to misinterpretation. But it's a possible alternative.

			regards, tom lane
Re: Missing MaterialPath support in reparameterize_path_by_child
On Mon, Dec 5, 2022 at 8:13 PM Tom Lane wrote:
>
> Ashutosh Bapat writes:
>> partition-wise join should be willing to fall back to non-partitionwise
>> join in such a case. After spending a few minutes with the code, I
>> think generate_partitionwise_join_paths() should not call
>> set_cheapest() if the pathlist of the child is NULL and should just
>> wind up and avoid adding any path.
>
> We clearly need to not call set_cheapest(), but that's not sufficient;
> we still fail at higher levels, as you'll see if you try the example
> Richard found.  I ended up making fe12f2f8f to fix this.

Thanks. That looks good.

> I don't especially like "rel->nparts = 0" as a way of disabling
> partitionwise join; ISTM it'd be clearer and more flexible to reset
> consider_partitionwise_join instead of destroying the data structure.
> But that's the way it's being done elsewhere, and I didn't want to
> tamper with it in a bug fix.  I see various assertions about parent
> and child consider_partitionwise_join flags being equal, which we
> might have to revisit if we try to make it work that way.

AFAIR, consider_partitionwise_join tells whether a given relation (join, higher, or base) can be considered for partitionwise join; set_append_rel_size() decides that based on some properties. But rel->nparts is an indicator of whether the relation (join, higher, or base) is partitioned or not. If we can not generate an AppendPath for a join relation, it means there is no way to compute child join relations and thus the relation is not partitioned. So setting rel->nparts = 0 is right. Probably we should add macros, similar to those for dummy relations, for marking and checking a partitioned relation. I see IS_PARTITIONED_RELATION() is defined already. Maybe we could add mark_(un)partitioned_rel().

--
Best Wishes,
Ashutosh Bapat
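The proposed helpers could look roughly like the following sketch. Everything here is hypothetical: RelOptInfoStub is a stand-in for the planner's RelOptInfo, reduced to the one field that matters, and the macro/function names are only the suggestions from the discussion:

```c
#include <stdbool.h>

/* Stand-in for RelOptInfo; only nparts is relevant to this sketch. */
typedef struct RelOptInfoStub
{
	int			nparts;			/* > 0 means the rel is partitioned */
} RelOptInfoStub;

/* mirrors the existing IS_PARTITIONED_* style of test */
#define IS_PARTITIONED_REL_STUB(rel)	((rel)->nparts > 0)

/*
 * The explanatory wrapper suggested in the thread: same effect as the
 * bare "rel->nparts = 0", but the intent is visible at the call site.
 */
static inline void
mark_unpartitioned_rel(RelOptInfoStub *rel)
{
	rel->nparts = 0;
}

/* demo: a partitioned rel stops being partitioned after marking */
bool
mark_unpartitioned_demo(void)
{
	RelOptInfoStub rel = {.nparts = 4};
	bool		was_partitioned = IS_PARTITIONED_REL_STUB(&rel);

	mark_unpartitioned_rel(&rel);
	return was_partitioned && !IS_PARTITIONED_REL_STUB(&rel);
}
```

The point of the wrapper is purely readability: a reader of the call site sees "mark unpartitioned" instead of having to know the nparts convention.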
Re: predefined role(s) for VACUUM and ANALYZE
On 06.12.2022 03:04, Nathan Bossart wrote:
> I wonder why \dpS wasn't added. I wrote up a patch to add it and the
> corresponding documentation that other meta-commands already have.

Yes, a \dpS command and a clarification in the documentation are exactly what is needed.

--
Pavel Luzanov
Postgres Professional: https://postgrespro.com
Re: Add tracking of backend memory allocated to pg_stat_activity
On Fri, 2022-12-02 at 09:18 -0800, Andres Freund wrote:
> Hi,
>
> On 2022-12-02 11:09:30 -0500, Reid Thompson wrote:
>> It appears to me that Postmaster populates the local variable that
>> *my_allocated_bytes points to. That allocation is passed to forked
>> children, and if not zeroed out, will be double counted as part of
>> the child allocation. Is this invalid?
>
> I don't think we should count allocations made before
> backend_status.c has been initialized.
>
> Greetings,
>
> Andres Freund

Hi,

The intent was to capture and display all the memory allocated to the backends, for admins/users/max_total_backend_memory/others to utilize. Why should we ignore the allocations made before backend_status.c has been initialized?

Thanks,
Reid

--
Reid Thompson
Senior Software Engineer
Crunchy Data, Inc.

reid.thomp...@crunchydata.com
www.crunchydata.com
Re: Force streaming every change in logical decoding
On Tue, Dec 6, 2022 at 7:29 PM Amit Kapila wrote:
>
> On Tue, Dec 6, 2022 at 11:53 AM shiy.f...@fujitsu.com wrote:
>>
>> Hi hackers,
>>
>> In logical decoding, when logical_decoding_work_mem is exceeded, the
>> changes are sent to the output plugin in streaming mode. But there is a
>> restriction that the minimum value of logical_decoding_work_mem is 64kB.
>> I tried to add a GUC to allow sending every change to the output plugin
>> without waiting until logical_decoding_work_mem is exceeded.
>>
>> This helps to test streaming mode. For example, to test "Avoid streaming
>> the transaction which are skipped" [1], it needs many
>> XLOG_XACT_INVALIDATIONS messages. With the new option, it can be tested
>> with fewer changes and in less time. Also, this new option helps to test
>> more scenarios for "Perform streaming logical transactions by background
>> workers" [2].
>
> Yeah, I think this can also help in reducing the time for various tests
> in test_decoding/stream and src/test/subscription/t/*_stream_*.pl by
> reducing the number of changes required to invoke streaming mode.

+1

> Can we think of making this GUC extendable to test even more options on
> the server side (publisher) and client side (subscriber)? For example,
> we can have something like logical_replication_mode with the following
> valid values: (a) server_serialize: this will serialize each change to a
> file on the publisher and then on commit restore and send all changes;
> (b) server_stream: this will stream each change as currently proposed in
> your patch. Then if we want to extend it for subscriber-side testing we
> can introduce new options like client_serialize for the case being
> discussed in the email [1].

Does setting logical_replication_mode = 'client_serialize' imply that the publisher behaves as server_stream, or do you mean we could set something like logical_replication_mode = 'server_stream, client_serialize'?
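The proposed GUC could be modeled as an enum plus a name-to-value lookup. The value names below simply follow the discussion and are not any committed API; the parse function is a hypothetical stand-in for the GUC machinery's string-to-enum mapping:

```c
#include <string.h>

/* Hypothetical values for the proposed logical_replication_mode GUC. */
typedef enum LogicalReplicationMode
{
	LOGICAL_REP_MODE_DEFAULT = 0,	/* buffer up to logical_decoding_work_mem */
	LOGICAL_REP_MODE_SERVER_SERIALIZE,	/* serialize changes, send on commit */
	LOGICAL_REP_MODE_SERVER_STREAM,	/* stream every change immediately */
	LOGICAL_REP_MODE_CLIENT_SERIALIZE	/* subscriber-side serialization */
} LogicalReplicationMode;

/* map a setting string to its enum value; -1 for an unknown setting */
int
parse_logical_replication_mode(const char *value)
{
	if (strcmp(value, "server_serialize") == 0)
		return LOGICAL_REP_MODE_SERVER_SERIALIZE;
	if (strcmp(value, "server_stream") == 0)
		return LOGICAL_REP_MODE_SERVER_STREAM;
	if (strcmp(value, "client_serialize") == 0)
		return LOGICAL_REP_MODE_CLIENT_SERIALIZE;
	return -1;
}
```

Sawada-san's question maps onto this sketch directly: a single enum can hold only one value at a time, so combining server- and client-side behaviors would need either a list-valued GUC or separate GUCs.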
Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Re: [PATCH] Check snapshot argument of index_beginscan and family
On Tue, 6 Dec 2022 at 04:31, Alexander Korotkov wrote:
>
> On Fri, Dec 2, 2022 at 6:18 PM Alexander Korotkov wrote:
>> On Mon, Nov 28, 2022 at 1:30 PM Aleksander Alekseev wrote:
>>> Thanks for the feedback!
>>>
>>>> I think it's a nice catch and worth fixing. The one thing I don't
>>>> agree with is using asserts for handling the error, because most
>>>> probably the server is built with assertions off, and in that case
>>>> there will still be a crash. I'd do this with report ERROR.
>>>> Otherwise, the patch looks right and worth committing.
>>>
>>> Normally a developer is not supposed to pass NULLs there, so I figured
>>> having extra if's in the release builds isn't worth it. Personally I
>>> wouldn't mind using ereport() but my intuition tells me that the
>>> committers are not going to approve this :)
>>>
>>> Let's see what the rest of the community thinks.
>>
>> I think this is a harmless assertion patch. I'm going to push this if
>> there are no objections.
>
> Pushed!

Great, thanks!

Pavel Borisov.
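The trade-off debated above is that assertion-style checks vanish in builds without assertions (for standard `assert`, defining NDEBUG compiles them out), while an `ereport(ERROR)` would always fire at runtime. A tiny standalone illustration of the condition the committed check enforces, with `snapshot_arg_ok()` as a hypothetical stand-in for what `Assert(snapshot != NULL)` verifies in index_beginscan and family:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Stand-in for the committed check: in assert-enabled builds a NULL
 * snapshot trips immediately at the call site instead of crashing
 * somewhere deeper in the scan machinery; with assertions disabled the
 * check compiles to nothing, which is the downside raised in review.
 */
bool
snapshot_arg_ok(const void *snapshot)
{
	return snapshot != NULL;	/* what Assert(snapshot != NULL) checks */
}
```

The cited argument for accepting that downside: passing NULL here is a developer error, so catching it in assert-enabled development builds is considered sufficient.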
Re: pg_basebackup: add test about zstd compress option
> The only thing that I can think of would be if $decompress_program > --test were failing, but actually trying to decompress succeeded. I > would be inclined to dismiss that particular scenario as not important > enough to be worth the additional CPU cycles. When I wrote this test, it was just to increase coverage for pg_basebackup. As I checked again, it already does that in the pg_verifybackup 008_untar.pl, 009_extrack.pl test you mentioned. Therefore, I agree with your opinion. --- Regards, DongWook Lee.
RE: Time delayed LR (WAS Re: logical replication restrictions)
On Friday, December 2, 2022 4:05 PM Amit Kapila wrote:
> On Tue, Nov 15, 2022 at 12:33 PM Amit Kapila wrote:
>> One more thing I would like you to consider is the point raised by me
>> related to this patch's interaction with the parallel apply feature as
>> mentioned in the email [1]. I am not sure the idea proposed in that
>> email [1] is a good one because delaying after applying commit may not
>> be good as we want to delay the apply of the transaction(s) on
>> subscribers by this feature. I feel this needs more thought.
>
> I have thought a bit more about this and we have the following options
> to choose the delay point from. (a) apply the delay just before
> committing a transaction. As mentioned in comments in the patch, this
> can lead to bloat and locks held for a long time. (b) apply the delay
> before starting to apply changes for a transaction, but here the problem
> is which time to consider. In some cases, like for streaming
> transactions, we don't receive the commit/prepare xact time in the start
> message. (c) use (b) but with the previous transaction's commit time.
> (d) apply the delay after committing a transaction by using the xact's
> commit time.
>
> At this stage, among the above, I feel any one of (c) or (d) is worth
> considering. The difference between (c) and (d) is that if, after
> commit, the next xact's data is already delayed by more than
> min_apply_delay, then we don't need to kick in the new apply-delay
> logic.
>
> The other thing to consider is whether we need to process any keepalive
> messages during the delay, because otherwise the walsender may think
> that the subscriber is not available and time out. This may not be a
> problem for synchronous replication, but otherwise it could be.
>
> Thoughts?

Hi,

Thank you for your comments! Below is some analysis of the major points above.

(1) About the timing to apply the delay

An approach based on (b) would be best.
The idea is to delay the application of every type of transaction based on the time when the transaction arrives at the subscriber node. One advantage of this approach over (c) and (d) is that it avoids the case where we might apply a transaction immediately without waiting, when two transactions arrive in sequence and the time between them exceeds min_apply_delay. When we receive stream-in-progress transactions, in this approach we first check whether the time for the delay has passed.

(2) About the timeout issue

Looking at the physical replication internals, it handles sending feedback and applying the delay in separate processes, whereas logical replication needs to achieve both within one process. If we want to apply the delay and avoid the timeout, we should not keep all the transaction data in memory. So, one approach is to serialize the transaction data and apply it after the delay. However, this means that if users adopt this feature, all transaction data that should be delayed would be serialized. We are not sure whether this is a valid approach.

Another approach was to divide the delay time in apply_delay(), use the divided time for WaitLatch, and send the keepalive messages from there. But this approach requires some changes at the libpq layer (such as implementing a new function for the WAL receiver to monitor whether the data from the publisher is readable). Probably the first idea, serializing the delayed transactions, is better on this point.

Any feedback is welcome.

Best Regards,
	Takamichi Osumi
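The "divide the delay" alternative can be sketched as a loop that waits one slice at a time and sends a keepalive between slices, so the walsender never sees a silent period longer than one slice. Everything here is hypothetical: on the server each iteration would block in WaitLatch() with the slice as its timeout rather than just decrementing a counter, and send_keepalive is a stand-in callback:

```c
/*
 * Wait out total_ms in slices of at most slice_ms, invoking the
 * keepalive callback between slices.  Returns the number of keepalives
 * sent.  The actual waiting is elided in this sketch; on the server
 * each iteration would be a WaitLatch(..., wait_ms, ...) call.
 */
int
apply_delay_in_slices(long total_ms, long slice_ms,
					  void (*send_keepalive)(void))
{
	int			keepalives = 0;
	long		remaining = total_ms;

	while (remaining > 0)
	{
		long		wait_ms = (remaining < slice_ms) ? remaining : slice_ms;

		/* WaitLatch(MyLatch, ..., wait_ms, ...) would go here */
		remaining -= wait_ms;

		if (remaining > 0)		/* still delaying: keep the link alive */
		{
			send_keepalive();
			keepalives++;
		}
	}
	return keepalives;
}

/* do-nothing callback, standing in for the real keepalive send */
void
noop_keepalive(void)
{
}
```

For a 1000 ms delay with 300 ms slices, the loop waits 300 + 300 + 300 + 100 ms and sends three keepalives along the way.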