Add --{no-,}bypassrls flags to createuser
Hi,

Add --{no-,}bypassrls flags to createuser. The following is an example of
execution:

$ createuser a --bypassrls
$ psql -c "\du a"
           List of roles
 Role name | Attributes | Member of
-----------+------------+-----------
 a         | Bypass RLS | {}

What do you think?

Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

diff --git a/doc/src/sgml/ref/createuser.sgml b/doc/src/sgml/ref/createuser.sgml
index 17579e50af..6c2ee1e0c6 100644
--- a/doc/src/sgml/ref/createuser.sgml
+++ b/doc/src/sgml/ref/createuser.sgml
@@ -290,6 +290,28 @@ PostgreSQL documentation
+     <varlistentry>
+      <term><option>--bypassrls</option></term>
+      <listitem>
+       <para>
+        The new user will have the <literal>BYPASSRLS</literal> privilege,
+        which is described more fully in the documentation for
+        <xref linkend="sql-createrole"/>.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><option>--no-bypassrls</option></term>
+      <listitem>
+       <para>
+        The new user will not have the <literal>BYPASSRLS</literal>
+        privilege, which is described more fully in the documentation for
+        <xref linkend="sql-createrole"/>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry>
       <term><option>-?</option></term>
       <term><option>--help</option></term>

diff --git a/src/bin/scripts/createuser.c b/src/bin/scripts/createuser.c
index bfba0d09d1..5b363b6f54 100644
--- a/src/bin/scripts/createuser.c
+++ b/src/bin/scripts/createuser.c
@@ -48,6 +48,8 @@ main(int argc, char *argv[])
 		{"replication", no_argument, NULL, 1},
 		{"no-replication", no_argument, NULL, 2},
 		{"interactive", no_argument, NULL, 3},
+		{"bypassrls", no_argument, NULL, 4},
+		{"no-bypassrls", no_argument, NULL, 5},
 		{"connection-limit", required_argument, NULL, 'c'},
 		{"pwprompt", no_argument, NULL, 'P'},
 		{"encrypted", no_argument, NULL, 'E'},
@@ -76,7 +78,8 @@ main(int argc, char *argv[])
 				createrole = TRI_DEFAULT,
 				inherit = TRI_DEFAULT,
 				login = TRI_DEFAULT,
-				replication = TRI_DEFAULT;
+				replication = TRI_DEFAULT,
+				bypassrls = TRI_DEFAULT;
 	PQExpBufferData sql;
@@ -165,6 +168,12 @@ main(int argc, char *argv[])
 			case 3:
 				interactive = true;
 				break;
+			case 4:
+				bypassrls = TRI_YES;
+				break;
+			case 5:
+				bypassrls = TRI_NO;
+				break;
 			default:
 				/* getopt_long already emitted a complaint */
 				pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -304,6 +313,10 @@ main(int argc, char *argv[])
 	if (replication == TRI_YES)
 		appendPQExpBufferStr(&sql, " REPLICATION");
 	if (replication == TRI_NO)
 		appendPQExpBufferStr(&sql, " NOREPLICATION");
+	if (bypassrls == TRI_YES)
+		appendPQExpBufferStr(&sql, " BYPASSRLS");
+	if (bypassrls == TRI_NO)
+		appendPQExpBufferStr(&sql, " NOBYPASSRLS");
 	if (conn_limit >= -1)
 		appendPQExpBuffer(&sql, " CONNECTION LIMIT %d", conn_limit);
 	if (roles.head != NULL)
@@ -366,6 +379,8 @@ help(const char *progname)
 			 "than using defaults\n"));
 	printf(_("  --replication             role can initiate replication\n"));
 	printf(_("  --no-replication          role cannot initiate replication\n"));
+	printf(_("  --bypassrls               role can bypass row-level security (RLS) policy\n"));
+	printf(_("  --no-bypassrls            role cannot bypass row-level security (RLS) policy\n"));
 	printf(_("  -?, --help                show this help, then exit\n"));
 	printf(_("\nConnection options:\n"));
 	printf(_("  -h, --host=HOSTNAME       database server host or socket directory\n"));
Re: Skip partition tuple routing with constant partition key
On Tue, Apr 12, 2022 at 6:16 AM Masahiko Sawada wrote:
>
> Hi,
>
> On Thu, Apr 7, 2022 at 4:37 PM Andres Freund wrote:
> >
> > Hi,
> >
> > On 2022-04-06 00:07:07 -0400, Tom Lane wrote:
> > > Amit Langote writes:
> > > > On Sun, Apr 3, 2022 at 10:31 PM Greg Stark wrote:
> > > >> Is this a problem with the patch or its tests?
> > > >> [18:14:20.798] Test Summary Report
> > > >> [18:14:20.798] ---
> > > >> [18:14:20.798] t/013_partition.pl (Wstat: 15360 Tests: 31 Failed: 0)
> > >
> > > > Hmm, make check-world passes for me after rebasing the patch (v10) to
> > > > the latest HEAD (clean), nor do I see a failure on cfbot:
> > > > http://cfbot.cputube.org/amit-langote.html
> > >
> > > 013_partition.pl has been failing regularly in the buildfarm,
> > > most recently here:
> > >
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2022-03-31%2000%3A49%3A45
> >
> > Just failed locally on my machine as well.
> >
> > > I don't think there's room to blame any uncommitted patches
> > > for that.
> > > Somebody broke it a short time before here:
> > >
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-03-17%2016%3A08%3A19
> >
> > The obvious thing to point a finger at is
> >
> > commit c91f71b9dc91ef95e1d50d6d782f477258374fc6
> > Author: Tomas Vondra
> > Date:   2022-03-16 16:42:47 +0100
> >
> >     Fix publish_as_relid with multiple publications
>
> I've not managed to reproduce this issue on my machine but while
> reviewing the code and the server logs[1] I may have found possible
> bugs:
>
> 2022-04-08 12:59:30.701 EDT [91997:1] LOG: logical replication apply
> worker for subscription "sub2" has started
> 2022-04-08 12:59:30.702 EDT [91998:3] 013_partition.pl LOG:
> statement: ALTER SUBSCRIPTION sub2 SET PUBLICATION pub_lower_level,
> pub_all
> 2022-04-08 12:59:30.733 EDT [91998:4] 013_partition.pl LOG:
> disconnection: session time: 0:00:00.036 user=buildfarm
> database=postgres host=[local]
> 2022-04-08 12:59:30.740 EDT [92001:1] LOG: logical replication table
> synchronization worker for subscription "sub2", table "tab4_1" has
> started
> 2022-04-08 12:59:30.744 EDT [91997:2] LOG: logical replication apply
> worker for subscription "sub2" will restart because of a parameter
> change
> 2022-04-08 12:59:30.750 EDT [92003:1] LOG: logical replication table
> synchronization worker for subscription "sub2", table "tab3" has
> started
>
> The logs say that the apply worker for "sub2" finished whereas the
> tablesync workers for "tab4_1" and "tab3" started. After these logs,
> there are no logs that these tablesync workers finished and the apply
> worker for "sub2" restarted, until the timeout. While reviewing the
> code, I realized that the tablesync workers can advance their relstate
> even without the apply worker's intervention.
>
> After a tablesync worker copies the table it sets
> SUBREL_STATE_SYNCWAIT as its relstate, then it waits for the apply
> worker to update the relstate to SUBREL_STATE_CATCHUP.
> If the apply
> worker has already died, it breaks from the wait loop and returns
> false:
>
> wait_for_worker_state_change():
>
>     for (;;)
>     {
>         LogicalRepWorker *worker;
>
>         :
>
>         /*
>          * Bail out if the apply worker has died, else signal it we're
>          * waiting.
>          */
>         LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
>         worker = logicalrep_worker_find(MyLogicalRepWorker->subid,
>                                         InvalidOid, false);
>         if (worker && worker->proc)
>             logicalrep_worker_wakeup_ptr(worker);
>         LWLockRelease(LogicalRepWorkerLock);
>         if (!worker)
>             break;
>
>         :
>     }
>
>     return false;
>
> However, the caller doesn't check the return value at all:
>
>     /*
>      * We are done with the initial data synchronization, update the state.
>      */
>     SpinLockAcquire(&MyLogicalRepWorker->relmutex);
>     MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCWAIT;
>     MyLogicalRepWorker->relstate_lsn = *origin_startpos;
>     SpinLockRelease(&MyLogicalRepWorker->relmutex);
>
>     /*
>      * Finally, wait until the main apply worker tells us to catch up and then
>      * return to let LogicalRepApplyLoop do it.
>      */
>     wait_for_worker_state_change(SUBREL_STATE_CATCHUP);
>     return slotname;
>
> Therefore, the tablesync worker started logical replication while
> keeping its relstate as SUBREL_STATE_SYNCWAIT.
>
> Given the server logs, it's likely that both tablesync workers for
> "tab4_1" and "tab3" were in this situation. That is, there were two
> tablesync workers who were applying changes for the target relation
> but the relstate was SUBREL_STATE_SYNCWAIT.
>
> When it comes to starting the apply worker, probably it didn't happen
> since there are already running tablesync workers as much as
> max_sync_workers_per_subscription (2 by default):
>
> logicalrep_worker_launch():
>
>     /*
>      * If we reached the sync worker limit per
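The unchecked-return hazard described above can be reduced to a small standalone sketch. All names here are illustrative stand-ins, not the real tablesync code: the point is only that a wait routine returning false means the state was never advanced, and the caller must not proceed as if it were.

```c
#include <stdbool.h>

/*
 * Toy model of the bug pattern: a tablesync worker sets SYNCWAIT and
 * waits for the apply worker to move it to CATCHUP.  If the apply
 * worker has died, the wait returns false and the state is unchanged.
 */
typedef enum
{
	STATE_SYNCWAIT,
	STATE_CATCHUP
} relstate_t;

static relstate_t relstate = STATE_SYNCWAIT;

/* returns false when the (simulated) apply worker has gone away */
static bool
wait_for_state_change(bool apply_worker_alive)
{
	if (!apply_worker_alive)
		return false;			/* bailed out; relstate was NOT advanced */
	relstate = STATE_CATCHUP;	/* normally done by the apply worker */
	return true;
}

/*
 * The safe pattern: check the return value and refuse to continue when
 * the wait bailed out, instead of silently running with SYNCWAIT.
 */
static bool
begin_catchup(bool apply_worker_alive)
{
	if (!wait_for_state_change(apply_worker_alive))
		return false;
	return relstate == STATE_CATCHUP;
}
```

In the scenario from the logs, the real caller ignores the false return, which is how a worker can keep applying changes while still marked SUBREL_STATE_SYNCWAIT.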
Re: Fixes for compression options of pg_receivewal and refactoring of backup_compression.{c,h}
On Tue, Apr 12, 2022 at 06:22:48PM +0900, Michael Paquier wrote:
> On Mon, Apr 11, 2022 at 12:46:02PM +, gkokola...@pm.me wrote:
>> It looks good. If you choose to discard the comment regarding the use of
>> 'method' over 'algorithm' from above, can you please use the full word in the
>> variable, e.g. 'wal_compress_algorithm' instead of 'wal_compress_algo'. I can
>> not really explain it, the latter reads a bit rude. Then again that may be
>> just me.
>
> Thanks. I have been able to do an extra pass on 0001 and 0002, fixing
> those naming inconsistencies with "algo" vs "algorithm" that you and
> Robert have reported, and applied them. For 0003, I'll look at it
> later. Attached is a rebase with improvements about the variable
> names.

This has been done with the proper renames.

With that in place, I see no reason now not to be able to set the
compression level, as it is possible to pass it down with the options
available. This requires only a couple of lines, as of the attached.

LZ4 has a dummy structure called LZ4F_INIT_PREFERENCES to initialize
LZ4F_preferences_t, which holds the compression level before passing it
down to LZ4F_compressBegin(), but that's available only in v1.8.3.
Using it risks lowering the minimal version of LZ4 we are able to use
now, but replacing it with a set of memset()s is also a way to set
things up as per its documentation.

Thoughts?
--
Michael

diff --git a/src/bin/pg_basebackup/walmethods.c b/src/bin/pg_basebackup/walmethods.c
index d5bcc208a9..1b3cc5c4f7 100644
--- a/src/bin/pg_basebackup/walmethods.c
+++ b/src/bin/pg_basebackup/walmethods.c
@@ -165,6 +165,7 @@ dir_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
 	{
 		size_t		ctx_out;
 		size_t		header_size;
+		LZ4F_preferences_t prefs;

 		ctx_out = LZ4F_createCompressionContext(&ctx, LZ4F_VERSION);
 		if (LZ4F_isError(ctx_out))
@@ -177,8 +178,13 @@ dir_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_
 		lz4bufsize = LZ4F_compressBound(LZ4_IN_SIZE, NULL);
 		lz4buf = pg_malloc0(lz4bufsize);

+		/* assign the compression level, default is 0 */
+		memset(&prefs, 0, sizeof(prefs));
+		memset(&prefs.frameInfo, 0, sizeof(prefs.frameInfo));
+		prefs.compressionLevel = dir_data->compression_level;
+
 		/* add the header */
-		header_size = LZ4F_compressBegin(ctx, lz4buf, lz4bufsize, NULL);
+		header_size = LZ4F_compressBegin(ctx, lz4buf, lz4bufsize, &prefs);
 		if (LZ4F_isError(header_size))
 		{
 			dir_data->lasterrstring = LZ4F_getErrorName(header_size);

signature.asc
Description: PGP signature
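Since LZ4F_INIT_PREFERENCES cannot be assumed before lz4 v1.8.3, the memset()-based setup discussed above can be illustrated without the lz4 headers at all. The struct below only mimics the shape of LZ4F_preferences_t (it is a stand-in, not the real type): zeroing the whole struct yields the same "all defaults" state as the v1.8.3 initializer, after which only the compression level is set.

```c
#include <string.h>

/*
 * Stand-in for LZ4F_preferences_t, to show the initialization pattern
 * only.  The real type lives in <lz4frame.h>.
 */
typedef struct
{
	struct
	{
		int			blockSizeID;
		int			contentChecksumFlag;
	}			frameInfo;
	int			compressionLevel;
	unsigned	autoFlush;
} fake_lz4f_preferences;

/*
 * Zero every field (the documented "defaults" state), then set the one
 * field we care about.  This is the portable replacement for the
 * LZ4F_INIT_PREFERENCES initializer introduced in lz4 v1.8.3.
 */
static fake_lz4f_preferences
init_prefs(int compression_level)
{
	fake_lz4f_preferences prefs;

	memset(&prefs, 0, sizeof(prefs));
	prefs.compressionLevel = compression_level;
	return prefs;
}
```

With the real type, the zeroed struct is then passed as the last argument of LZ4F_compressBegin(), as the patch does.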
Re: Skipping schema changes in publication
On Wed, Apr 13, 2022 at 8:45 AM vignesh C wrote: > > On Tue, Apr 12, 2022 at 4:46 PM Amit Kapila wrote: > > > > I understand that part but what I pointed out was that it might be > > better to avoid using ADD keyword in this syntax like: ALTER > > PUBLICATION pub1 SKIP TABLE t1,t2; > > Currently we are supporting Alter publication using the following syntax: > ALTER PUBLICATION pub1 ADD TABLE t1; > ALTER PUBLICATION pub1 SET TABLE t1; > ALTER PUBLICATION pub1 DROP TABLE T1; > ALTER PUBLICATION pub1 ADD ALL TABLES IN SCHEMA sch1; > ALTER PUBLICATION pub1 SET ALL TABLES IN SCHEMA sch1; > ALTER PUBLICATION pub1 DROP ALL TABLES IN SCHEMA sch1; > > I have extended the new syntax in similar lines: > ALTER PUBLICATION pub1 ADD SKIP TABLE t1; > ALTER PUBLICATION pub1 SET SKIP TABLE t1; > ALTER PUBLICATION pub1 DROP SKIP TABLE T1; > > I did it like this to maintain consistency. > What is the difference between ADD and SET variants? I understand we need some way to remove the SKIP table setting but not sure if DROP is the best alternative. The other ideas could be: To set skip tables: ALTER PUBLICATION pub1 SKIP TABLE t1, t2...; To reset skip tables: ALTER PUBLICATION pub1 SKIP TABLE; /* basically an empty list*/ Yet another way to reset skip tables: ALTER PUBLICATION pub1 RESET SKIP TABLE; /* Here we need to introduce RESET. */ -- With Regards, Amit Kapila.
Re: deparsing utility commands
On Wed, Apr 13, 2022 at 2:29 PM Ajin Cherian wrote: > > > This patch-set has not been rebased for some time. I see that there is > interest in this patch from the logical > replication of DDL thread [1]. > Forgot to add the link to the thread. [1] - https://www.postgresql.org/message-id/202203162206.7spggyktx63e@alvherre.pgsql
Re: deparsing utility commands
On Wed, Apr 13, 2022 at 2:12 PM Shulgin, Oleksandr wrote: >> You seem to have squashed the patches? Please keep the split out. > > > Well, if that makes the review process easier :-) > > -- > Alex > This patch-set has not been rebased for some time. I see that there is interest in this patch from the logical replication of DDL thread [1]. I will take a stab at rebasing this patch-set, I have already rebased the first patch and will work on the other patches in the coming days. Do review and give me feedback. regards, Ajin Cherian Fujitsu Australia 0001-ddl_deparse-core-support.patch Description: Binary data
Re: Support logical replication of DDLs
On Wed, Apr 13, 2022 at 5:49 AM Zheng Li wrote: > > Hi, > > Here is the rebased new branch > https://github.com/zli236/postgres/commits/ddl_replication > Thanks, but it would be good if you can share it in the patch form as well unless you need to send patches too frequently. It is easier to give comments. -- With Regards, Amit Kapila.
Re: row filtering for logical replication
On Tue, Apr 12, 2022 at 6:16 PM Alvaro Herrera wrote: > > On 2022-Apr-12, Amit Kapila wrote: > > > It still has the same problem. The table can be dropped just before > > this message and the get_rel_name will return NULL and we don't expect > > that. > > Ugh, I forgot to change the errmsg() parts to use the new variable, > apologies. Fixed. > Thanks, this will work and fix the issue. I think this looks better than the current code, however, I am not sure if the handling for the concurrently dropped tables is better (both get_rel_relkind() and get_rel_name() can fail due to those reasons). I understand this won't fail because of the protection you have in the patch, so feel free to go ahead with this if you like this style better. -- With Regards, Amit Kapila.
Re: PG DOCS - logical replication filtering
On Wed, Apr 13, 2022 at 12:08 AM Euler Taveira wrote:
>
> On Tue, Apr 12, 2022, at 5:30 AM, Peter Smith wrote:
>
> Not changed. Because in fact, I copied most of this sentence
> (including the uppercase, "operations", "and/or") from existing
> documentation [1]
> e.g. see "The tables added to a publication that publishes UPDATE
> and/or DELETE operations must ..."
> [1] https://www.postgresql.org/docs/current/sql-createpublication.html
>
> Hmm. You are checking the Notes. I'm referring to the publish parameter. IMO
> this sentence should use operations in lowercase letters too because even if
> you create it with uppercase letters, Postgres will provide a lowercase word
> when you dump it.

IIRC the row filter replication identity checking is at run-time based
on the operation (not at DDL time based on the publish parameter). For
this reason, and also because this is the same format/wording used in
many places already in create_publication.sgml, I did not change this
formatting. But I did modify [v10] as earlier suggested to replace the
"and/or" with "or", and also added another word, "operation".

> Yeah, I know the information is the same in the summary and in the
> text. Personally, I find big slabs of technical text difficult to
> digest, so I'd have to spend 5 minutes of careful reading and drawing
> the exact same summary on a piece of paper anyway just to visualize
> what the text says. The summary makes it easy to understand at a
> glance. But I have modified the summary in [v9] to remove the "case"
> and to add other column headings as suggested.
>
> Isn't it better to use a table instead of synopsis?

Modified [v10] as suggested.

> Not changed. The readers of this docs page are all users who will be
> familiar with the filter expressions, so I felt the "OR'ed together"
> part will be perfectly clear to the intended audience.
>
> If you want to keep it, change it to "ORed". It is used in indices.sgml. Let's
> keep it consistent.
Modified [v10] as suggested.

> But I did not understand your point about "If, for some reason,
> someone decided to change it, this section will contain obsolete
> information", because IIUC that will be equally true for both the
> explanation and the output, so I did not understand why you say "psql
> output is fine, the explanation is not". It is the business of this
> documentation to help the user to know how and where they can find the
> row filter information they may need to know.
>
> You are describing a psql command here. My point is to keep psql explanation in
> the psql section. This section is to describe the row filter feature. If we
> start describing features in other sections, we will have outdated information
> when the referred feature is changed and someone fails to find all references.
> I tend to concentrate detailed explanation in the feature section. If I have to
> add links in other sections, I use "See Section 1.23 for details ...".

Modified [v10] so the psql descriptions are now very generic.

> Not changed. The publisher and subscriber programlistings are always
> separated. If you are looking at the rendered HTML I think it is quite
> clear that one is at the publisher and one is at the subscriber. OTOH,
> if we omitted creating the tables on the subscriber then I think that
> really would cause some confusion.
>
> The difference is extra space. By default, the CSS does not include a border
> for programlisting. That's why I complained about it. I noticed that the
> website CSS includes it. However, the PDF will not include the border. I would
> add a separate description for the subscriber just to be clear.

Modified [v10] as suggested to add a separate description for the subscriber.

> One last suggestion, you are using identifiers in uppercase letters but
> "primary key" is in lowercase.
> Modified [v10] as suggested to make this uppercase -- [v10] https://www.postgresql.org/message-id/CAHut%2BPvMEYkCRWDoZSpFnP%2B5SExus7YzWAd%3D6ah9vwkfRhOnSg%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: PG DOCS - logical replication filtering
PSA patch v10 which addresses the remaining review comments from Euler [1] -- [1] https://www.postgresql.org/message-id/3cd8d622-6a26-4eaf-a5aa-ac78030e5f50%40www.fastmail.com Kind Regards, Peter Smith. Fujitsu Australia v10-0001-Add-additional-documentation-for-row-filters.patch Description: Binary data
Re: Skipping schema changes in publication
On Tue, Apr 12, 2022 at 4:46 PM Amit Kapila wrote: > > On Tue, Apr 12, 2022 at 4:17 PM vignesh C wrote: > > > > On Tue, Apr 12, 2022 at 12:19 PM Amit Kapila > > wrote: > > > > > > > > > For the second syntax (Alter Publication ...), isn't it better to > > > avoid using ADD? It looks odd to me because we are not adding anything > > > in publication with this sytax. > > > > I was thinking of the scenario where user initially creates the > > publication for all tables: > > CREATE PUBLICATION pub1 FOR ALL TABLES; > > > > After that user decides to skip few tables ex: t1, t2 > > ALTER PUBLICATION pub1 ADD SKIP TABLE t1,t2; > > > > I thought of supporting this syntax if incase user decides to add the > > skipping of a few tables later. > > > > I understand that part but what I pointed out was that it might be > better to avoid using ADD keyword in this syntax like: ALTER > PUBLICATION pub1 SKIP TABLE t1,t2; Currently we are supporting Alter publication using the following syntax: ALTER PUBLICATION pub1 ADD TABLE t1; ALTER PUBLICATION pub1 SET TABLE t1; ALTER PUBLICATION pub1 DROP TABLE T1; ALTER PUBLICATION pub1 ADD ALL TABLES IN SCHEMA sch1; ALTER PUBLICATION pub1 SET ALL TABLES IN SCHEMA sch1; ALTER PUBLICATION pub1 DROP ALL TABLES IN SCHEMA sch1; I have extended the new syntax in similar lines: ALTER PUBLICATION pub1 ADD SKIP TABLE t1; ALTER PUBLICATION pub1 SET SKIP TABLE t1; ALTER PUBLICATION pub1 DROP SKIP TABLE T1; I did it like this to maintain consistency. But I'm fine doing it either way to keep it simple for the user. Regards, Vignesh
Re: Improving the "Routine Vacuuming" docs
On Tue, Apr 12, 2022 at 5:22 PM Peter Geoghegan wrote:

> I just don't think that you need to make it any more complicated than
> this: physical XID values are only meaningful when compared to other
> XIDs from the same cluster. The system needs to make sure that no two
> XIDs can ever be more than about 2 billion XIDs apart, and here's how
> you as a DBA can help the system to make sure of that.

I decided to run with that perspective and came up with the following
rough draft. A decent amount of existing material I would either just
remove or place elsewhere as "see ... for details". The following
represents the complete section.

David J.

This vacuum responsibility is necessary due to the fact that a
transaction ID (xid) has a lifetime of 2 billion transactions. The rows
created by a given transaction (recorded in xmin) must be frozen prior
to the expiration of the xid. (The expired xid values can then be
reused; see ... for details.) This is done by flagging the rows as
frozen and thus visible for the remainder of the row's life. While
vacuum will not touch a row's xmin while updating its frozen status, two
reserved xid values may be seen. BootstrapTransactionId (1) may be seen
on system catalog tables to indicate records inserted during initdb.
FrozenTransactionId (2) may be seen on any table and also indicates that
the row is frozen. This was the mechanism used in versions prior to 9.4,
when it was decided to keep the xmin unchanged for forensic use.

VACUUM uses the visibility map to determine which pages of a table must
be scanned. Normally, it will skip pages that don't have any dead row
versions even if those pages might still have row versions with old XID
values. Therefore, normal VACUUMs won't always freeze every old row
version in the table. When that happens, VACUUM will eventually need to
perform an aggressive vacuum, which will freeze all eligible unfrozen
XID and MXID values, including those from all-visible but not all-frozen
pages.
In practice most tables require periodic aggressive vacuuming. Thus, an
aging transaction will potentially pass a number of milestone ages,
controlled by various configuration settings or hard-coded into the
server, as it awaits its fate: either being memorialized cryogenically
or in death. While the following speaks of an individual transaction's
age, in practice each table has a relfrozenxid attribute which is used
by the system as a reference age, as it is the oldest potentially living
transaction on the table (see xref for details).

The first milestone is controlled by vacuum_freeze_min_age (50 million)
and marks the age at which the row becomes eligible to be frozen. Next
up is vacuum_freeze_table_age (120 million). Before this age the row can
be frozen, but a non-aggressive vacuum may not encounter the row due to
the visibility map optimizations described above. Vacuums performed
while relfrozenxid is older than this age will be done aggressively.

For tables where routine complete vacuuming doesn't happen, the
autovacuum daemon acts as a safety net. When the age of the row exceeds
autovacuum_freeze_max_age (200 million) the autovacuum daemon, even if
disabled for the table, will perform an anti-wraparound vacuum on the
table (see below). Finally, as a measure of last resort, the system will
begin emitting warnings (1.940 billion) and then (1.997 billion) shut
down. It may be restarted in single-user mode for manual aggressive
vacuuming.

An anti-wraparound vacuum is much more expensive than an aggressive
vacuum, so the gap between vacuum_freeze_table_age and
autovacuum_freeze_max_age should be somewhat large (the vacuum age must
be at most 95% of the autovacuum age to be meaningful). Transaction
history and commit status storage requirements are directly related to
autovacuum_freeze_max_age due to retention policies based upon that age.
See xref ... for additional details.
The reason for vacuum_freeze_min_age is to manage the trade-off between minimizing rows marked dead that are already frozen versus minimizing the number of rows being frozen aggressively.
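The milestone ages in the draft above can be summarized in one decision function. This uses the default values as quoted in the draft, and the classifier itself is a hypothetical illustration for the reader, not server code:

```c
/*
 * Illustrative classifier for the milestone ages named in the draft:
 * vacuum_freeze_min_age (50M), vacuum_freeze_table_age (120M, per the
 * draft), autovacuum_freeze_max_age (200M), warning age (~1.940B) and
 * shutdown age (~1.997B).  Hypothetical helper, not PostgreSQL code.
 */
static const char *
xid_age_milestone(unsigned age)
{
	if (age >= 1997000000u)
		return "shutdown (xidStopLimit)";
	if (age >= 1940000000u)
		return "warnings emitted";
	if (age >= 200000000u)
		return "anti-wraparound autovacuum";	/* autovacuum_freeze_max_age */
	if (age >= 120000000u)
		return "next vacuum is aggressive";		/* vacuum_freeze_table_age */
	if (age >= 50000000u)
		return "eligible for freezing";			/* vacuum_freeze_min_age */
	return "no action needed";
}
```

Reading it top to bottom also makes the draft's ordering argument concrete: each threshold must stay comfortably below the next for the earlier, cheaper mechanism to have a chance to run first.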
Re: failures in t/031_recovery_conflict.pl on CI
Hi,

On 2022-04-12 15:05:22 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2022-04-09 19:34:26 -0400, Tom Lane wrote:
> >> +1. This is probably more feasible given the latch infrastructure
> >> than it was when that code was first written.
>
> > What do you think about just reordering the disable_all_timeouts() to be
> > before the got_standby_deadlock_timeout check in the back branches? I think
> > that should close at least the most obvious hole. And fix it properly in
> > HEAD?
>
> I don't have much faith in that, and I don't see why we can't fix it
> properly.

It's not too hard, agreed. It's somewhat hard to test that it works,
though. The new recovery conflict tests cover all recovery conflicts
except for deadlock ones, because they're not easy to trigger... But I
think I nearly got it working reliably. It's probably worth
backpatching the tests, after stripping them of the stats checks?

Three questions:

- For HEAD we have to replace the disable_all_timeouts() calls, since
  that breaks the replay progress reporting. Is there a reason to keep
  them in the back branches? Hard to see how an extension or something
  could rely on it, but ...?

- I named the variable set by the signal handler
  got_standby_delay_timeout, because just got_standby_timeout looked
  weird beside the _deadlock_ one. Which makes me think that we should
  rename STANDBY_TIMEOUT to STANDBY_DELAY_TIMEOUT too?

- There's the following comment in
  ResolveRecoveryConflictWithBufferPin(): "We assume that only
  UnpinBuffer() and the timeout requests established above can wake us
  up here." That's bogus afaict: there are plenty of other things that
  can cause MyProc->latch to be set. Is it worth doing something about
  this at the same time? Right now we seem to call
  ResolveRecoveryConflictWithBufferPin() in rapid succession initially.

Greetings,

Andres Freund
Re: Improving the "Routine Vacuuming" docs
On Tue, Apr 12, 2022 at 4:24 PM David G. Johnston wrote: > I've attached some off-the-cuff thoughts on reworking the first three > paragraphs and the note. > > It's hopefully useful for providing perspective if nothing else. More perspective is definitely helpful. > I'm assuming and caring only about visible rows when I'm reading this > section. Maybe we need to make that explicit - only xmin matters (and the > invisible frozen flag)? The statement "only xmin matters" is true in spirit. If xmax needed to be frozen then we'd actually remove the whole tuple instead (unless it was a MultiXact). Alternatively, if it looked like xmax needed to be frozen, but the XID turned out to have been from an aborted xact, then we'd clear the XID from xmax instead. Freezing tends to lag the removal of dead tuples, but that's just an optimization. If you set vacuum_freeze_min_age to 0 then freezing and dead tuple removal happen in tandem (actually, we can remove tuples inserted by an XID after VACUUM's OldestXmin/removal cutoff when the inserting xact aborts, but VACUUM makes no real promises about XIDs >= OldestXmin anyway). >> It might also be useful to describe freezing all of a live tuple's >> XIDs as roughly the opposite process as completely physically removing >> a dead tuple. It follows that we don't necessarily need to freeze >> anything to advance relfrozenxid (especially not on Postgres 15). > > > I failed to pickup on how this and "mod-2^32" math interplay, and I'm not > sure I care when reading this. It made more sense to consider "shortest > path" along the "circle". It would probably be possible to teach the system to deal with coexisting XIDs that are close to a full ~4 billion XIDs apart. We'd have to give up on the mod-2^32 comparison stuff in functions like TransactionIdPrecedes(), and carefully keep track of things per-table, and carry that context around a lot more. 
I certainly don't think that that's a good idea (if 2 billion XIDs wasn't enough, why should 4 billion XIDs be?), but it does seem feasible. My point is this: there is nothing particularly intuitive or natural about the current ~2 billion XID limit, even if you already know that XIDs are generally represented on disk as 32-bit unsigned integers. And so the fact is that we are already asking users to take it on faith that there are truly good reasons why the system cannot tolerate any scenario in which two unfrozen XIDs are more than about ~2 billion XIDs apart. Why not just admit that, and then deem the XID comparison rules out of scope for this particular chapter of the docs? *Maybe* it's still useful to discuss why things work that way in code like TransactionIdPrecedes(), but that's a totally different discussion -- it doesn't seem particularly relevant to the design of VACUUM, no matter the audience. Most DBAs will just accept that the "XID distance" limit/invariant is about ~2 billion XIDs for esoteric implementation reasons, with some vague idea of why it must be so (e.g., "32-bit integers don't have enough space"). They will be no worse off for it. (Bear in mind that mod-2^32 comparison stuff was only added when freezing/wraparound was first implemented back in 2001, by commit bc7d37a525.) >> Currently, "25.1.5. Preventing Transaction ID Wraparound Failures" >> says this, right up-front: >> >> "But since transaction IDs have limited size (32 bits) a cluster that >> runs for a long time (more than 4 billion transactions) would suffer >> transaction ID wraparound" > > > I both agree and disagree - where I settled (as of now) is reflected in the > patch. I just don't think that you need to make it any more complicated than this: physical XID values are only meaningful when compared to other XIDs from the same cluster. 
The system needs to make sure that no two XIDs can ever be more than about 2 billion XIDs apart, and here's how you as a DBA can help the system to make sure of that. Discussion of the past becoming the future just isn't helpful, because that simply cannot ever happen on any version of Postgres from the last decade. Freezing is more or less an overhead of storing data in Postgres long term (even medium term) -- that is the simple reality. We should say so. > I am wondering, for the more technical details, is there an existing place to > send xrefs, do you plan to create one, or is it likely unnecessary? I might end up doing that, but just want to get a general sense of how other hackers feel about it for now. -- Peter Geoghegan
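The mod-2^32 comparison rule discussed above — the reason no two unfrozen XIDs may ever be more than about 2 billion apart — can be sketched in a few lines. This mirrors the core arithmetic of TransactionIdPrecedes() (unsigned subtraction, then the sign of the 32-bit result), ignoring its special-casing of permanent XIDs:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Does id1 logically precede id2 in mod-2^32 arithmetic?  The unsigned
 * difference is reinterpreted as signed: a negative result means id1 is
 * "behind" id2 on the circle.  Once two XIDs are more than ~2^31 apart,
 * "before" and "after" flip -- hence the ~2 billion limit.
 */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
	int32_t		diff = (int32_t) (id1 - id2);

	return diff < 0;
}
```

For example, 4000000000 "precedes" 100 under this rule, because the forward distance from 4000000000 around to 100 (about 295 million) is shorter than going the other way.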
Re: Support logical replication of DDLs
Hi, Here is the rebased new branch https://github.com/zli236/postgres/commits/ddl_replication Regards, Zheng
Re: Improving the "Routine Vacuuming" docs
On Tue, Apr 12, 2022 at 2:53 PM Peter Geoghegan wrote: > Recent work on VACUUM and relfrozenxid advancement required that I > update the maintenance.sgml VACUUM documentation ("Routine > Vacuuming"). It was tricky to keep things current, due in part to > certain structural problems. Many of these problems are artifacts of > how the document evolved over time. > > "Routine Vacuuming" ought to work as a high level description of how > VACUUM keeps the system going over time. The intended audience is > primarily DBAs, so low level implementation details should either be > given much less prominence, or not even mentioned. We should keep it > practical -- without going too far in the direction of assuming that > we know the limits of what information might be useful. > +1 I've attached some off-the-cuff thoughts on reworking the first three paragraphs and the note. It's hopefully useful for providing perspective if nothing else. > My high level concerns are: > > * Instead of discussing FrozenTransactionId (and then explaining how > that particular magic value is not really used anymore anyway), why > not describe freezing in terms of the high level rules? > Agreed and considered > > Something along the lines of the following seems more useful: "A tuple > whose xmin is frozen (and xmax is unset) is considered visible to > every possible MVCC snapshot. In other words, the transaction that > inserted the tuple is treated as if it ran and committed at some point > that is now *infinitely* far in the past." > I'm assuming and caring only about visible rows when I'm reading this section. Maybe we need to make that explicit - only xmin matters (and the invisible frozen flag)? > It might also be useful to describe freezing all of a live tuple's > XIDs as roughly the opposite process as completely physically removing > a dead tuple. It follows that we don't necessarily need to freeze > anything to advance relfrozenxid (especially not on Postgres 15). 
> I failed to pick up on how this and the "mod-2^32" math interplay, and I'm not sure I care when reading this. It made more sense to consider "shortest path" along the "circle". > Currently, "25.1.5. Preventing Transaction ID Wraparound Failures" > says this, right up-front: > > "But since transaction IDs have limited size (32 bits) a cluster that > runs for a long time (more than 4 billion transactions) would suffer > transaction ID wraparound" > I both agree and disagree - where I settled (as of now) is reflected in the patch. > * The description of wraparound sounds terrifying, implying that data > corruption can result. > Agreed, though I just skimmed a bit after the material the patch covers. > > * XID space isn't really a precious resource -- it isn't even a > resource at all IMV. > Agreed > > * We don't cleanly separate discussion of anti-wraparound autovacuums, > and aggressive vacuums, and the general danger of wraparound (by which > I actually mean the danger of having the xidStopLimit stop limit kick > in). > Didn't really get this far. I am wondering, for the more technical details, is there an existing place to send xrefs, do you plan to create one, or is it likely unnecessary? David J. v0001-doc-reworking-of-vacuum-transaction-wraparound-section.patch Description: Binary data
Re: make MaxBackends available in _PG_init
On Tue, Apr 12, 2022 at 05:46:36PM -0400, Tom Lane wrote: > Nathan Bossart writes: >> If we allow changing GUCs in _PG_init() and provide another hook where >> MaxBackends will be initialized, do we need to introduce another GUC flag, >> or can we get away with just blocking all GUC changes when the new hook is >> called? I'm slightly hesitant to add a GUC flag that will need to manually >> maintained. Wouldn't it be easily forgotten? > > On the whole I think Robert's got the right idea: we do not really > need an enforcement mechanism at all. People who are writing > extensions that do this sort of thing will learn how to do it right. > (It's not like there's not a thousand other things they have to get > right.) Okay. So maybe we only need the attached patches. 0001 is just 5ecd018 un-reverted, and 0002 is Julien's patch from a few weeks ago with some minor edits. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 4c5ebca537ffdfbf61079a82b18ce7bc97222c69 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Tue, 12 Apr 2022 14:57:00 -0700 Subject: [PATCH v3 1/2] Fix comments about bgworker registration before MaxBackends initialization Since 6bc8ef0b, InitializeMaxBackends() has used max_worker_processes instead of adapting MaxBackends to the number of background workers registered by modules loaded in shared_preload_libraries (at this time, bgworkers were only static, but gained dynamic capabilities as a matter of supporting parallel queries meaning that a control cap was necessary). Some comments referred to the past registration logic, making them confusing and incorrect, so fix these. Some of the out-of-core modules that could be loaded in this path sometimes like to manipulate dynamically some of the resource-related GUCs for their own needs, this commit adds a note about that. 
Author: Nathan Bossart Discussion: https://postgr.es/m/20220127181815.GA551692@nathanxps13 --- src/backend/postmaster/postmaster.c | 10 -- src/backend/utils/init/postinit.c | 5 ++--- 2 files changed, 6 insertions(+), 9 deletions(-) diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 3dcaf8a4a5..d57fefa9a8 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -1005,10 +1005,8 @@ PostmasterMain(int argc, char *argv[]) LocalProcessControlFile(false); /* - * Register the apply launcher. Since it registers a background worker, - * it needs to be called before InitializeMaxBackends(), and it's probably - * a good idea to call it before any modules had chance to take the - * background worker slots. + * Register the apply launcher. It's probably a good idea to call this + * before any modules had a chance to take the background worker slots. */ ApplyLauncherRegister(); @@ -1029,8 +1027,8 @@ PostmasterMain(int argc, char *argv[]) #endif /* - * Now that loadable modules have had their chance to register background - * workers, calculate MaxBackends. + * Now that loadable modules have had their chance to alter any GUCs, + * calculate MaxBackends. */ InitializeMaxBackends(); diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c index 9139fe895c..a28612b375 100644 --- a/src/backend/utils/init/postinit.c +++ b/src/backend/utils/init/postinit.c @@ -538,9 +538,8 @@ pg_split_opts(char **argv, int *argcp, const char *optstr) /* * Initialize MaxBackends value from config options. * - * This must be called after modules have had the chance to register background - * workers in shared_preload_libraries, and before shared memory size is - * determined. + * This must be called after modules have had the chance to alter GUCs in + * shared_preload_libraries and before shared memory size is determined. 
* * Note that in EXEC_BACKEND environment, the value is passed down from * postmaster to subprocesses via BackendParameters in SubPostmasterMain; only -- 2.25.1 >From 862a5b9a8a27995157a8b71b8f3ce09bf6ed17c5 Mon Sep 17 00:00:00 2001 From: Julien Rouhaud Date: Fri, 25 Mar 2022 11:05:24 +0800 Subject: [PATCH v3 2/2] Add a new shmem_request_hook hook. Currently, preloaded libraries are expected to request additional shared memory in _PG_init(). However, it is not unusal for such requests to depend on MaxBackends, which won't be initialized at that time. This introduces a new hook where modules can use MaxBackends to request additional shared memory. Author: Julien Rouhaud Discussion: https://postgr.es/m/20220412210112.GA2065815%40nathanxps13 --- src/backend/storage/ipc/ipci.c | 15 +++ src/include/storage/ipc.h| 2 ++ src/tools/pgindent/typedefs.list | 1 + 3 files changed, 18 insertions(+) diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c index 75e456360b..fdd72175d5 100644 --- a/src/backend/storage/ipc/ipci.c +++ b/src/backend/storage/ipc/ipci.c @@ -52,6 +52,7 @@ /* GUCs */ int
Improving the "Routine Vacuuming" docs
Recent work on VACUUM and relfrozenxid advancement required that I update the maintenance.sgml VACUUM documentation ("Routine Vacuuming"). It was tricky to keep things current, due in part to certain structural problems. Many of these problems are artifacts of how the document evolved over time. "Routine Vacuuming" ought to work as a high level description of how VACUUM keeps the system going over time. The intended audience is primarily DBAs, so low level implementation details should either be given much less prominence, or not even mentioned. We should keep it practical -- without going too far in the direction of assuming that we know the limits of what information might be useful. My high level concerns are: * Instead of discussing FrozenTransactionId (and then explaining how that particular magic value is not really used anymore anyway), why not describe freezing in terms of the high level rules? Something along the lines of the following seems more useful: "A tuple whose xmin is frozen (and xmax is unset) is considered visible to every possible MVCC snapshot. In other words, the transaction that inserted the tuple is treated as if it ran and committed at some point that is now *infinitely* far in the past." It might also be useful to describe freezing all of a live tuple's XIDs as roughly the opposite process as completely physically removing a dead tuple. It follows that we don't necessarily need to freeze anything to advance relfrozenxid (especially not on Postgres 15). * The general description of how the XID space works similarly places way too much emphasis on low level details that are of very little relevance. These details would even seem totally out of place if I was the intended audience. The problem isn't really that the information is too technical. The problem is that we emphasize mechanistic stuff while never quite explaining the point of it all. Currently, "25.1.5. 
Preventing Transaction ID Wraparound Failures" says this, right up-front: "But since transaction IDs have limited size (32 bits) a cluster that runs for a long time (more than 4 billion transactions) would suffer transaction ID wraparound" This is way too mechanistic. We totally muddle things by even mentioning 4 billion XIDs in the first place. It seems like a confusing artefact of a time before freezing was invented, back when you really could have XIDs that were more than 2 billion XIDs apart. This statement has another problem: it's flat-out untrue. The xidStopLimit stuff will reliably kick in at about 2 billion XIDs. * The description of wraparound sounds terrifying, implying that data corruption can result. The alarming language isn't proportionate to the true danger (something I complained about in a dedicated thread last year [1]). * XID space isn't really a precious resource -- it isn't even a resource at all IMV. ISTM that we should be discussing wraparound as an issue about the maximum *difference* between any two unfrozen XIDs in a cluster/installation. Talking about an abstract-sounding XID space seems to me to be quite counterproductive. The logical XID space is practically infinite, after all. We should move away from the idea that physical XID space is a precious resource. Sure, users are often concerned that the xidStopLimit mechanism might kick-in, effectively resulting in an outage. That makes perfect sense. But it doesn't follow that XIDs are precious, and implying that they are intrinsically valuable just confuses matters. First of all, physical XID space is usually abundantly available. A "distance" of ~2 billion XIDs is a vast distance in just about any application (barring those with pathological problems, such as a leaked replication slot). 
Second of all, since the amount of physical freezing required to be able to advance relfrozenxid by any given amount (amount of XIDs) varies enormously, and is not even predictable for a given table (because individual tables don't get their own physical XID space), the age of datfrozenxid predicts very little about how close we are to having the dreaded xidStopLimit mechanism kick in. We do need some XID-wise slack, but that's just a way of absorbing shocks -- it's ballast, usually only really needed for one or two very large tables. Third of all, and most importantly, the whole idea that we can just put off freezing indefinitely and actually reduce the pain (rather than having a substantial increase in problems) seems to have just about no basis in reality, at least once you get into the tens of millions range (though usually well before that). Why should you be better off if all of your freezing occurs in one big balloon payment? Sometimes getting into debt for a while is useful, but why should it make sense to keep delaying freezing? And if it doesn't make sense, then why does it still make sense to treat XID space as a precious resource? * We don't cleanly separate discussion of anti-wraparound autovacuums, and aggressive vacuums, and
Re: make MaxBackends available in _PG_init
Nathan Bossart writes: > If we allow changing GUCs in _PG_init() and provide another hook where > MaxBackends will be initialized, do we need to introduce another GUC flag, > or can we get away with just blocking all GUC changes when the new hook is > called? I'm slightly hesitant to add a GUC flag that will need to manually > maintained. Wouldn't it be easily forgotten? On the whole I think Robert's got the right idea: we do not really need an enforcement mechanism at all. People who are writing extensions that do this sort of thing will learn how to do it right. (It's not like there's not a thousand other things they have to get right.) regards, tom lane
Re: make MaxBackends available in _PG_init
Robert Haas writes: > On Tue, Apr 12, 2022 at 4:58 PM Tom Lane wrote: >> It seems after a bit of reflection that what we ought to do is identify >> the subset of PGC_POSTMASTER GUCs that feed into shared memory sizing, >> mark those with a new GUC flag, and not allow them to be changed after >> shared memory is set up. > I dunno, I feel like this is over-engineered. It probably is. I'm just offering this as a solution if people want to insist on a mechanism to prevent unsafe GUC changes. If we drop the idea of trying to forcibly prevent extensions from Doing It Wrong, then we don't need this, only the extra hook. regards, tom lane
Re: make MaxBackends available in _PG_init
On Tue, Apr 12, 2022 at 4:58 PM Tom Lane wrote: > It's reasonable to allow changing *most* GUCs in _PG_init(); let us > please not break cases that work fine today. > > It seems after a bit of reflection that what we ought to do is identify > the subset of PGC_POSTMASTER GUCs that feed into shared memory sizing, > mark those with a new GUC flag, and not allow them to be changed after > shared memory is set up. (An alternative to a new flag bit is to > subdivide the PGC_POSTMASTER context into two values, but that seems > like it'd be far more invasive. A flag bit needn't be user-visible.) > Then, what we'd need to do is arrange for shared-preload modules to > receive their _PG_init call before shared memory creation (when they > can still twiddle such GUCs) and then another callback later. I dunno, I feel like this is over-engineered. An extra hook wouldn't really affect anyone who doesn't need to use it, and I doubt that adapting to it would create much pain for anyone who is knowledgeable enough to be writing these kinds of extensions in the first place. This design is something everyone who adds a GUC from now until the end of the project needs to know about for the benefit of a tiny number of extension authors who weren't really going to have a big problem anyway. -- Robert Haas EDB: http://www.enterprisedb.com
Re: make MaxBackends available in _PG_init
On Tue, Apr 12, 2022 at 04:58:42PM -0400, Tom Lane wrote: > Nathan Bossart writes: >> I think it'd be reasonable to allow changing custom GUCs in _PG_init(). >> Those aren't going to impact things like MaxBackends. > > It's reasonable to allow changing *most* GUCs in _PG_init(); let us > please not break cases that work fine today. Right, this is a fair point. > It seems after a bit of reflection that what we ought to do is identify > the subset of PGC_POSTMASTER GUCs that feed into shared memory sizing, > mark those with a new GUC flag, and not allow them to be changed after > shared memory is set up. (An alternative to a new flag bit is to > subdivide the PGC_POSTMASTER context into two values, but that seems > like it'd be far more invasive. A flag bit needn't be user-visible.) > Then, what we'd need to do is arrange for shared-preload modules to > receive their _PG_init call before shared memory creation (when they > can still twiddle such GUCs) and then another callback later. If we allow changing GUCs in _PG_init() and provide another hook where MaxBackends will be initialized, do we need to introduce another GUC flag, or can we get away with just blocking all GUC changes when the new hook is called? I'm slightly hesitant to add a GUC flag that will need to be manually maintained. Wouldn't it be easily forgotten? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: make MaxBackends available in _PG_init
On Tue, Apr 12, 2022 at 04:33:39PM -0400, Tom Lane wrote: > Nathan Bossart writes: >> On Tue, Apr 12, 2022 at 03:12:42PM -0400, Robert Haas wrote: >>> But if there's even one use case where adjusting GUCs at this phase is >>> reasonable, then 0003 isn't really good enough. We need an 0004 that >>> provides a new hook in a place where such changes can safely be made. > >> I think that is doable. IMO it should be something like _PG_change_GUCs() >> that is called before _PG_init(). The other option is to add a hook called >> after _PG_init() where MaxBackends is available (in which case we likely >> want GetMaxBackends() again). Thoughts? > > I like the second option. Calling into a module before we've called its > _PG_init function is just weird, and will cause no end of confusion. Alright. I think that is basically what Julien recommended a few weeks ago [0]. Besides possibly reintroducing GetMaxBackends(), we might also want to block SetConfigOption() when called in this new hook. [0] https://postgr.es/m/20220325031146.cd23gaak5qlzdy6l%40jrouhaud -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: make MaxBackends available in _PG_init
Nathan Bossart writes: > I think it'd be reasonable to allow changing custom GUCs in _PG_init(). > Those aren't going to impact things like MaxBackends. It's reasonable to allow changing *most* GUCs in _PG_init(); let us please not break cases that work fine today. It seems after a bit of reflection that what we ought to do is identify the subset of PGC_POSTMASTER GUCs that feed into shared memory sizing, mark those with a new GUC flag, and not allow them to be changed after shared memory is set up. (An alternative to a new flag bit is to subdivide the PGC_POSTMASTER context into two values, but that seems like it'd be far more invasive. A flag bit needn't be user-visible.) Then, what we'd need to do is arrange for shared-preload modules to receive their _PG_init call before shared memory creation (when they can still twiddle such GUCs) and then another callback later. regards, tom lane
Re: make MaxBackends available in _PG_init
Nathan Bossart writes: > On Tue, Apr 12, 2022 at 03:12:42PM -0400, Robert Haas wrote: >> But if there's even one use case where adjusting GUCs at this phase is >> reasonable, then 0003 isn't really good enough. We need an 0004 that >> provides a new hook in a place where such changes can safely be made. > I think that is doable. IMO it should be something like _PG_change_GUCs() > that is called before _PG_init(). The other option is to add a hook called > after _PG_init() where MaxBackends is available (in which case we likely > want GetMaxBackends() again). Thoughts? I like the second option. Calling into a module before we've called its _PG_init function is just weird, and will cause no end of confusion. regards, tom lane
Re: WIP: WAL prefetch (another approach)
On Wed, Apr 13, 2022 at 3:57 AM Dagfinn Ilmari Mannsåker wrote: > Simon Riggs writes: > > This is a nice feature if it is safe to turn off full_page_writes. As others have said/shown, it does also help if a block with FPW is evicted and then read back in during one checkpoint cycle, in other words if the working set is larger than shared buffers. This also provides infrastructure for proposals in the next cycle, as part of commitfest #3316: * in direct I/O mode, I/O stalls become more likely due to lack of kernel prefetching/double-buffering, so prefetching becomes more essential * even in buffered I/O mode when benefiting from free double-buffering, the copy from kernel buffer to user space buffer can be finished in the background instead of calling pread() when you need the page, but you need to start it sooner * adjacent blocks accessed by nearby records can be merged into a single scatter-read, for example with preadv() in the background * repeated buffer lookups, pins, locks (and maybe eventually replay) to the same page can be consolidated Pie-in-the-sky ideas: * someone might eventually want to be able to replay in parallel (hard, but certainly requires lookahead) * I sure hope we'll eventually use different techniques for torn-page protection to avoid the high online costs of FPW > > When is it safe to do that? On which platform? > > > > I am not aware of any released software that allows full_page_writes > > to be safely disabled. Perhaps something has been released recently > > that allows this? I think we have substantial documentation about > > safety of other settings, so we should carefully document things here > > also. > > Our WAL reliability docs claim that ZFS is safe against torn pages: > > https://www.postgresql.org/docs/current/wal-reliability.html: > > If you have file-system software that prevents partial page writes > (e.g., ZFS), you can turn off this page imaging by turning off the > full_page_writes parameter.
Unfortunately, posix_fadvise(WILLNEED) doesn't do anything on ZFS right now :-(. I have some patches to fix that on Linux[1] and FreeBSD and it seems like there's a good chance of getting them committed based on feedback, but it needs some more work on tests and mmap integration. If anyone's interested in helping get that landed faster, please ping me off-list. [1] https://github.com/openzfs/zfs/pull/9807
Re: make MaxBackends available in _PG_init
On Tue, Apr 12, 2022 at 03:30:23PM -0400, Robert Haas wrote: > On Tue, Apr 12, 2022 at 3:22 PM Tom Lane wrote: >> Yeah. It's a very long way from "changing shared memory sizes here >> doesn't work" to "no extension is allowed to change any GUC at load >> time". A counterexample is that it's not clear why an extension >> shouldn't be allowed to define a GUC using DefineCustomXXXVariable >> and then immediately set it to some other value. I think trying to >> forbid that across-the-board is going to mostly result in breaking >> a lot of cases that are perfectly safe. > > Hmm, that's an interesting case which I hadn't considered. It seems > like sort of a lame thing to do, because why wouldn't you just create > the GUC with the right initial value instead of modifying it after the > fact? But maybe there's some use case in which it makes sense to do it > that way. A read-only GUC that advertises some calculated value, > perhaps? I think it'd be reasonable to allow changing custom GUCs in _PG_init(). Those aren't going to impact things like MaxBackends. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: make MaxBackends available in _PG_init
On Tue, Apr 12, 2022 at 03:12:42PM -0400, Robert Haas wrote: > But if there's even one use case where adjusting GUCs at this phase is > reasonable, then 0003 isn't really good enough. We need an 0004 that > provides a new hook in a place where such changes can safely be made. I think that is doable. IMO it should be something like _PG_change_GUCs() that is called before _PG_init(). The other option is to add a hook called after _PG_init() where MaxBackends is available (in which case we likely want GetMaxBackends() again). Thoughts? > Meanwhile, committed 0001. Thanks. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: How about a psql backslash command to show GUCs?
On 4/12/22 1:00 PM, Tom Lane wrote: "Jonathan S. Katz" writes: On 4/12/22 11:19 AM, Tom Lane wrote: It'd just look like this, I think. I see from looking at guc.c that boot_val can be NULL, so we'd better use IS DISTINCT FROM. I tested it and I like this a lot better, at least it's much more consolidated. They all seem to be generated (directories, timezones, collations/encodings). Yeah, most of what shows up in a minimally-configured installation is postmaster-computed settings like config_file, rather than things that were actually set by the DBA. Personally I'd rather hide the ones that have source = 'override', but that didn't seem to be the consensus. The list seems more reasonable now, though now that I'm fully in the "less is more" camp based on the "non-defaults" description, I think anything we can do to further prune is good. The one exception to this seems to be "max_stack_depth", which is rendering on my "\dconfig" though I didn't change it, and it's showing its default value of 2MB. "boot_val" says 100, "reset_val" says 2048, and it's commented out in my postgresql.conf. Do we want to align that? I don't think there's any principled thing we could do about that in psql. The boot_val is a conservatively small 100kB, but we crank that up automatically based on getrlimit(RLIMIT_STACK), so on any reasonable platform it's going to show as not being default. Got it. We may be at a point where it's "good enough" and let more people chime in during beta. Thanks, Jonathan
Re: make MaxBackends available in _PG_init
On Tue, Apr 12, 2022 at 3:22 PM Tom Lane wrote: > Yeah. It's a very long way from "changing shared memory sizes here > doesn't work" to "no extension is allowed to change any GUC at load > time". A counterexample is that it's not clear why an extension > shouldn't be allowed to define a GUC using DefineCustomXXXVariable > and then immediately set it to some other value. I think trying to > forbid that across-the-board is going to mostly result in breaking > a lot of cases that are perfectly safe. Hmm, that's an interesting case which I hadn't considered. It seems like sort of a lame thing to do, because why wouldn't you just create the GUC with the right initial value instead of modifying it after the fact? But maybe there's some use case in which it makes sense to do it that way. A read-only GUC that advertises some calculated value, perhaps? -- Robert Haas EDB: http://www.enterprisedb.com
Re: make MaxBackends available in _PG_init
Robert Haas writes: > Gah, I hate putting this off for another year, but I guess I'm also > not convinced that 0003 has the right idea, so maybe it's for the > best. Here's what's bugging me: 0003 decrees that _PG_init() is the > Wrong Place to Adjust GUC Settings. Now, that begs the question: what > is the right place to adjust GUC settings? I argued upthread that > extensions shouldn't be overriding values from postgresql.conf at all, > but on further reflection I think there's at least one legitimate use > case for this sort of thing: some kind of auto-tuning module that, for > example, looks at how much memory you have and sets shared_buffers or > work_mem or something based on the result. Yeah. It's a very long way from "changing shared memory sizes here doesn't work" to "no extension is allowed to change any GUC at load time". A counterexample is that it's not clear why an extension shouldn't be allowed to define a GUC using DefineCustomXXXVariable and then immediately set it to some other value. I think trying to forbid that across-the-board is going to mostly result in breaking a lot of cases that are perfectly safe. regards, tom lane
Re: make MaxBackends available in _PG_init
On Tue, Apr 12, 2022 at 2:03 PM Nathan Bossart wrote: > On Tue, Apr 12, 2022 at 10:44:27AM -0700, Andres Freund wrote: > > On 2022-04-11 14:14:35 -0700, Nathan Bossart wrote: > >> Here's a new patch set. I've only changed 0003 as described above. My > >> apologies for the unnecessary round trip. > > > > ISTM we shouldn't apply 0002, 0003 to master before we've branches 15 off.. > > That seems reasonable to me. Gah, I hate putting this off for another year, but I guess I'm also not convinced that 0003 has the right idea, so maybe it's for the best. Here's what's bugging me: 0003 decrees that _PG_init() is the Wrong Place to Adjust GUC Settings. Now, that begs the question: what is the right place to adjust GUC settings? I argued upthread that extensions shouldn't be overriding values from postgresql.conf at all, but on further reflection I think there's at least one legitimate use case for this sort of thing: some kind of auto-tuning module that, for example, looks at how much memory you have and sets shared_buffers or work_mem or something based on the result. It's arguable how well something like this can actually work, but we probably shouldn't try to prevent people from doing it. I'm a little less clear why Citus thinks this is an appropriate thing for it to do, vs. telling the user to configure a useful value, and I'd be curious to hear an explanation around that. But if there's even one use case where adjusting GUCs at this phase is reasonable, then 0003 isn't really good enough. We need an 0004 that provides a new hook in a place where such changes can safely be made. Meanwhile, committed 0001. -- Robert Haas EDB: http://www.enterprisedb.com
Re: failures in t/031_recovery_conflict.pl on CI
Andres Freund writes: > On 2022-04-09 19:34:26 -0400, Tom Lane wrote: >> +1. This is probably more feasible given the latch infrastructure >> than it was when that code was first written. > What do you think about just reordering the disable_all_timeouts() to be > before the got_standby_deadlock_timeout check in the back branches? I think > that should close at least the most obvious hole. And fix it properly in > HEAD? I don't have much faith in that, and I don't see why we can't fix it properly. Don't we just need to have the signal handler set MyLatch, and then do the unsafe stuff back in the "if (got_standby_deadlock_timeout)" stanza in mainline? regards, tom lane
Re: Frontend error logging style
Peter Eisentraut writes: > On 11.04.22 17:22, Tom Lane wrote: >> The patch I presented keeps the unlikely() checks in the debug-level >> macros. Do you think we should drop those too? I figured that avoiding >> evaluating the arguments would be worth something. > Oh, that's right, the whole thing is to not evaluate the arguments if > the log level isn't adequate. We should probably keep that. I don't think that's nearly as interesting in the frontend as in the backend. Error messages in the backend frequently include catalog lookups and the like in the arguments, but I think nearly all frontend messages are writing nothing more complicated than variables and maybe some indirections or array fetches. So I'm not all that worried about runtime, and would rather save some code space. regards, tom lane
Re: failures in t/031_recovery_conflict.pl on CI
Hi, On 2022-04-09 19:34:26 -0400, Tom Lane wrote: > Andres Freund writes: > > It's been broken in different ways all the way back to 9.0, from what I can > > see, but I didn't check every single version. > > > Afaics the fix is to nuke the idea of doing anything substantial in the > > signal > > handler from orbit, and instead just set a flag in the handler. > > +1. This is probably more feasible given the latch infrastructure > than it was when that code was first written. What do you think about just reordering the disable_all_timeouts() to be before the got_standby_deadlock_timeout check in the back branches? I think that should close at least the most obvious hole. And fix it properly in HEAD? Greetings, Andres Freund
Re: Frontend error logging style
On 11.04.22 17:22, Tom Lane wrote: Peter Eisentraut writes: On 08.04.22 22:26, Tom Lane wrote: I think we should put a centralized level check into logging.c, and get rid of at least the "if (likely())" checks, because those are going to succeed approximately 100.0% of the time. Maybe there's an argument for keeping the unlikely() ones. Yeah, that seems ok to change. The previous coding style is more useful if you have a lot of debug messages in a hot code path, but that usually doesn't apply to where this is used. The patch I presented keeps the unlikely() checks in the debug-level macros. Do you think we should drop those too? I figured that avoiding evaluating the arguments would be worth something. Oh, that's right, the whole thing is to not evaluate the arguments if the log level isn't adequate. We should probably keep that. Is the code size a big problem? ereport() has a bunch of extra code around each call as well. Does it have similar problems?
Re: WIP: WAL prefetch (another approach)
> Other way around. FPWs make prefetch unnecessary. > Therefore you would only want prefetch with FPW=off, AFAIK. > A few scenarios I can imagine page prefetch can help are, 1/ A DR replica instance that is smaller instance size than primary. Page prefetch can bring the pages back into memory in advance when they are evicted. This speeds up the replay and is cost effective. 2/ Allows larger checkpoint_timeout for the same recovery SLA and perhaps improved performance? 3/ WAL prefetch (not pages by itself) can improve replay by itself (not sure if it was measured in isolation, Tomas V can comment on it). 4/ Read replica running analytical workload scenario Tomas V mentioned earlier. > > Or put this another way: when is it safe and sensible to use > recovery_prefetch != off? > When checkpoint_timeout is set large and under heavy write activity, on a read replica that has working set higher than the memory and receiving constant updates from primary. This covers 1 & 4 above. > -- > Simon Riggshttp://www.EnterpriseDB.com/ > > >
Re: make MaxBackends available in _PG_init
On Tue, Apr 12, 2022 at 10:44:27AM -0700, Andres Freund wrote:
> On 2022-04-11 14:14:35 -0700, Nathan Bossart wrote:
>> Here's a new patch set. I've only changed 0003 as described above. My
>> apologies for the unnecessary round trip.
>
> ISTM we shouldn't apply 0002, 0003 to master before we've branched 15 off.

That seems reasonable to me.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: CLUSTER sort on abbreviated expressions is broken
On Sun, Apr 3, 2022 at 4:34 PM Thomas Munro wrote: > I probably should have made it clearer in the commit message, > cc58eecc5 doesn't fix this problem in the master branch. It only > fixes the code that incorrectly assumed that datum1 was always > available. Attached patch fixes the issue, and includes the test case that you posted. There is only a one line change to tuplesort.c. This is arguably the same bug -- abbreviation is just another "haveDatum1 optimization" that needs to be accounted for. -- Peter Geoghegan v1-0001-Fix-CLUSTER-sort-on-abbreviated-expressions.patch Description: Binary data
Re: make MaxBackends available in _PG_init
Hi,

On 2022-04-11 14:14:35 -0700, Nathan Bossart wrote:
> On Mon, Apr 11, 2022 at 01:44:42PM -0700, Nathan Bossart wrote:
>> On Mon, Apr 11, 2022 at 04:36:36PM -0400, Robert Haas wrote:
>>> If we throw an error while defining_custom_guc is true, how will it
>>> ever again become false?
>>
>> Ah, I knew I was forgetting something this morning.
>>
>> It won't, but the only place it is presently needed is when loading
>> shared_preload_libraries, so I believe startup will fail anyway. However,
>> I can see defining_custom_guc being used elsewhere, so that is probably not
>> good enough. Another approach could be to add a static
>> set_config_option_internal() function with a boolean argument to indicate
>> whether it is being used while defining a custom GUC. I'll adjust 0003
>> with that approach unless a better idea prevails.
>
> Here's a new patch set. I've only changed 0003 as described above. My
> apologies for the unnecessary round trip.

ISTM we shouldn't apply 0002, 0003 to master before we've branched 15 off.

Greetings,

Andres Freund
Re: How about a psql backslash command to show GUCs?
"Jonathan S. Katz" writes:
> On 4/12/22 11:19 AM, Tom Lane wrote:
>> It'd just look like this, I think. I see from looking at guc.c that
>> boot_val can be NULL, so we'd better use IS DISTINCT FROM.

> I tested it and I like this a lot better, at least it's much more
> consolidated. They all seem to be generated (directories, timezones,
> collations/encodings).

Yeah, most of what shows up in a minimally-configured installation is postmaster-computed settings like config_file, rather than things that were actually set by the DBA. Personally I'd rather hide the ones that have source = 'override', but that didn't seem to be the consensus.

> The one exception to this seems to be "max_stack_depth", which is
> rendering on my "\dconfig" though I didn't change it, and it's showing
> its default value of 2MB. "boot_val" says 100, "reset_val" says 2048,
> and it's commented out in my postgresql.conf. Do we want to align that?

I don't think there's any principled thing we could do about that in psql. The boot_val is a conservatively small 100kB, but we crank that up automatically based on getrlimit(RLIMIT_STACK), so on any reasonable platform it's going to show as not being default.

regards, tom lane
Re: How about a psql backslash command to show GUCs?
On 4/12/22 11:19 AM, Tom Lane wrote:
> "Jonathan S. Katz" writes:
>> On 4/11/22 4:11 PM, Tom Lane wrote:
>>> This idea does somewhat address my unhappiness upthread about printing
>>> values with source = 'internal', but I see that it gets confused by
>>> some GUCs with custom show hooks, like unix_socket_permissions. Maybe
>>> it needs to be "source != 'default' AND setting != boot_val"?
>>
>> Running through a few GUCs, that seems reasonable. Happy to test the
>> patch out prior to commit to see if it renders better.
>
> It'd just look like this, I think. I see from looking at guc.c that
> boot_val can be NULL, so we'd better use IS DISTINCT FROM.

(IS DISTINCT FROM is pretty handy :)

I tested it and I like this a lot better, at least it's much more consolidated. They all seem to be generated (directories, timezones, collations/encodings).

The one exception to this seems to be "max_stack_depth", which is rendering on my "\dconfig" though I didn't change it, and it's showing its default value of 2MB. "boot_val" says 100, "reset_val" says 2048, and it's commented out in my postgresql.conf. Do we want to align that?

That said, the patch itself LGTM.

Jonathan

OpenPGP_signature
Description: OpenPGP digital signature
Re: avoid multiple hard links to same WAL file after a crash
On Tue, Apr 12, 2022 at 03:46:31PM +0900, Kyotaro Horiguchi wrote:
> At Mon, 11 Apr 2022 09:52:57 -0700, Nathan Bossart wrote in
>> I traced this back a while ago. I believe the link() was first added in
>> November 2000 as part of f0e37a8. This even predates WAL recycling, which
>> was added in July 2001 as part of 7d4d5c0.
>
> f0e37a8 lacks discussion.. It introduced the CHECKPOINT command from
> somewhere out of the ML.. This patch changed XLogFileInit to support
> using existent files so that XLogWrite can use the new segment provided
> by checkpoint and still allow XLogWrite to create a new segment by
> itself.

Yeah, I've been unable to find any discussion besides a brief reference to adding checkpointing [0].

[0] https://postgr.es/m/8F4C99C66D04D4118F580090272A7A23018D85%40sectorbase1.sectorbase.com

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: make MaxBackends available in _PG_init
On Tue, Apr 12, 2022 at 01:08:35PM +0800, Julien Rouhaud wrote:
> It looks sensible to me, although I think I would apply 0003 before 0002.

Great.

> + if (process_shared_preload_libraries_in_progress &&
> +     !allow_when_loading_preload_libs)
> +     ereport(ERROR,
> +             (errcode(ERRCODE_CANT_CHANGE_RUNTIME_PARAM),
> +              errmsg("cannot change parameters while loading "
> +                     "\"shared_preload_libraries\"")));
>
> I think the error message should mention at least which GUC is being modified.

My intent was to make it clear that no parameters can be updated while loading s_p_l, but I agree that we should mention the specific GUC somewhere. Maybe this could go in a hint?

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: WIP: WAL prefetch (another approach)
On 4/12/22 17:46, Simon Riggs wrote:
> On Tue, 12 Apr 2022 at 16:41, Tomas Vondra wrote:
>> On 4/12/22 15:58, Simon Riggs wrote:
>>> On Thu, 7 Apr 2022 at 08:46, Thomas Munro wrote:
>>>> With that... I've finally pushed the 0002 patch and will be watching
>>>> the build farm.
>>>
>>> This is a nice feature if it is safe to turn off full_page_writes.
>>>
>>> When is it safe to do that? On which platform?
>>>
>>> I am not aware of any released software that allows full_page_writes
>>> to be safely disabled. Perhaps something has been released recently
>>> that allows this? I think we have substantial documentation about
>>> safety of other settings, so we should carefully document things here
>>> also.
>>
>> I don't see why/how an async prefetch would make FPW unnecessary. Did
>> anyone claim that to be the case?
>
> Other way around. FPWs make prefetch unnecessary.
> Therefore you would only want prefetch with FPW=off, AFAIK.
>
> Or put this another way: when is it safe and sensible to use
> recovery_prefetch != off?

That assumes the FPI stays in memory until the next modification, and that can be untrue for a number of reasons. A long checkpoint interval with enough random accesses in between is a nice example. See the benchmarks I did a year ago (regular pgbench).

Or imagine a r/o replica used to run analytics queries, which access so much data they evict the buffers initialized by the FPI records.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: WIP: WAL prefetch (another approach)
Simon Riggs writes: > On Thu, 7 Apr 2022 at 08:46, Thomas Munro wrote: > >> With that... I've finally pushed the 0002 patch and will be watching >> the build farm. > > This is a nice feature if it is safe to turn off full_page_writes. > > When is it safe to do that? On which platform? > > I am not aware of any released software that allows full_page_writes > to be safely disabled. Perhaps something has been released recently > that allows this? I think we have substantial documentation about > safety of other settings, so we should carefully document things here > also. Our WAL reliability docs claim that ZFS is safe against torn pages: https://www.postgresql.org/docs/current/wal-reliability.html: If you have file-system software that prevents partial page writes (e.g., ZFS), you can turn off this page imaging by turning off the full_page_writes parameter. - ilmari
Re: WIP: WAL prefetch (another approach)
On Tue, 12 Apr 2022 at 16:41, Tomas Vondra wrote:
> On 4/12/22 15:58, Simon Riggs wrote:
>> On Thu, 7 Apr 2022 at 08:46, Thomas Munro wrote:
>>> With that... I've finally pushed the 0002 patch and will be watching
>>> the build farm.
>>
>> This is a nice feature if it is safe to turn off full_page_writes.
>>
>> When is it safe to do that? On which platform?
>>
>> I am not aware of any released software that allows full_page_writes
>> to be safely disabled. Perhaps something has been released recently
>> that allows this? I think we have substantial documentation about
>> safety of other settings, so we should carefully document things here
>> also.
>
> I don't see why/how an async prefetch would make FPW unnecessary. Did
> anyone claim that to be the case?

Other way around. FPWs make prefetch unnecessary. Therefore you would only want prefetch with FPW=off, AFAIK.

Or put this another way: when is it safe and sensible to use recovery_prefetch != off?

--
Simon Riggs
http://www.EnterpriseDB.com/
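For reference, the settings being debated could be combined roughly like this on a standby. This is an illustrative postgresql.conf fragment with example values, not recommendations, and full_page_writes = off is only safe on a filesystem that prevents torn pages (e.g. ZFS, per the WAL reliability docs):

```
# Illustrative fragment -- example values, not recommendations.
full_page_writes = off            # only safe if the filesystem prevents
                                  # partial page writes (e.g. ZFS)
recovery_prefetch = try           # PG 15+: prefetch referenced blocks
                                  # during WAL replay
wal_decode_buffer_size = 512kB    # how far ahead of replay to decode WAL
maintenance_io_concurrency = 10   # caps the number of concurrent prefetches
checkpoint_timeout = 30min        # longer checkpoints widen the window in
                                  # which prefetching (or FPWs) matter
```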
Re: WIP: WAL prefetch (another approach)
On 4/12/22 15:58, Simon Riggs wrote:
> On Thu, 7 Apr 2022 at 08:46, Thomas Munro wrote:
>> With that... I've finally pushed the 0002 patch and will be watching
>> the build farm.
>
> This is a nice feature if it is safe to turn off full_page_writes.
>
> When is it safe to do that? On which platform?
>
> I am not aware of any released software that allows full_page_writes
> to be safely disabled. Perhaps something has been released recently
> that allows this? I think we have substantial documentation about
> safety of other settings, so we should carefully document things here
> also.

I don't see why/how an async prefetch would make FPW unnecessary. Did anyone claim that to be the case?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: random() function documentation
=?utf-8?Q?Dagfinn_Ilmari_Manns=C3=A5ker?= writes: > Dean Rasheed writes: >> I think it'd be sufficient to just say that it's a deterministic >> pseudorandom number generator. I don't see much value in documenting >> the internal algorithm used. > WFM on both points. Sold then, I'll make it so. regards, tom lane
Re: How about a psql backslash command to show GUCs?
"Jonathan S. Katz" writes:
> On 4/11/22 4:11 PM, Tom Lane wrote:
>> This idea does somewhat address my unhappiness upthread about printing
>> values with source = 'internal', but I see that it gets confused by
>> some GUCs with custom show hooks, like unix_socket_permissions.
>> Maybe it needs to be "source != 'default' AND setting != boot_val"?

> Running through a few GUCs, that seems reasonable. Happy to test the
> patch out prior to commit to see if it renders better.

It'd just look like this, I think. I see from looking at guc.c that boot_val can be NULL, so we'd better use IS DISTINCT FROM.

regards, tom lane

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index e7377d4583..17790bd8a4 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -4410,7 +4410,8 @@ describeConfigurationParameters(const char *pattern, bool verbose,
 							 NULL, "pg_catalog.lower(s.name)", NULL, NULL);
 	else
-		appendPQExpBufferStr(&buf, "WHERE s.source <> 'default'\n");
+		appendPQExpBufferStr(&buf, "WHERE s.source <> 'default' AND\n"
+							 "      s.setting IS DISTINCT FROM s.boot_val\n");
 
 	appendPQExpBufferStr(&buf, "ORDER BY 1;");
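The effect of the patch can be previewed by hand: this is the filter from the diff, run as the equivalent query directly against pg_settings:

```sql
-- Settings \dconfig would show under the patched condition:
SELECT s.name, s.setting
FROM pg_catalog.pg_settings s
WHERE s.source <> 'default'
  AND s.setting IS DISTINCT FROM s.boot_val
ORDER BY 1;
```

IS DISTINCT FROM treats NULL as an ordinary comparable value, which is why it is needed here when boot_val can be NULL.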
Re: random() function documentation
Dean Rasheed writes: > On Mon, 11 Apr 2022 at 20:20, Tom Lane wrote: >> >> >> How about we just say "uses a linear-feedback shift register algorithm"? > > I think it'd be sufficient to just say that it's a deterministic > pseudorandom number generator. I don't see much value in documenting > the internal algorithm used. > >> > Should we >> > perhaps also add a warning that the same seed is not guaranteed to >> > produce the same sequence across different (major?) versions? >> >> I wouldn't bother, on the grounds that then we'd need such disclaimers >> in a whole lot of places. Others might see it differently though. > > Agreed, though I think when the release notes are written, they ought > to warn that the sequence will change with this release. WFM on both points. > Regards, > Dean - ilmari
Re: random() function documentation
Fabien COELHO writes: >>> How about we just say "uses a linear-feedback shift register algorithm"? >> I think it'd be sufficient to just say that it's a deterministic >> pseudorandom number generator. I don't see much value in documenting >> the internal algorithm used. > Hmmm… I'm not so sure. ISTM that people interested in using the random > user-facing variants (only random?) could like a pointer on the algorithm > to check for the expected quality of the produced pseudo-random stream? > See attached. I don't want to get that specific. We were not specific before and there has been no call for such detail in the docs. (Unlike closed-source software, anybody who really wants algorithmic details can find all they want to know in the source code.) It would just amount to another thing to forget to update next time someone changes the algorithm ... which is a consideration that leads me to favor Dean's phrasing. regards, tom lane
Re: Temporary file access API
On Mon, 11 Apr 2022 at 10:05, Antonin Houska wrote:
> Robert Haas wrote:
>> On Fri, Apr 8, 2022 at 4:54 AM Antonin Houska wrote:
>>> Do you think that the use of a system call is a problem itself (e.g.
>>> because the code looks less simple if read/write is used somewhere and
>>> fread/fwrite elsewhere; of course read/write is mandatory in special
>>> cases like WAL, heap pages, etc.) or is the problem that the system
>>> calls are used too frequently? I suppose only the latter.
>>
>> I'm not really super-interested in reducing the number of system
>> calls. It's not a dumb thing in which to be interested and I know that
>> for example Thomas Munro is very interested in it and has done a bunch
>> of work in that direction just to improve performance. But for me the
>> attraction of this is mostly whether it gets us closer to TDE.
>>
>> And that's why I'm asking these questions about adopting it in
>> different places. I kind of thought that your previous patches needed
>> to encrypt, I don't know, 10 or 20 different kinds of files. So I was
>> surprised to see this patch touching the handling of only 2 kinds of
>> files. If we consolidate the handling of let's say 15 of 20 cases into
>> a single mechanism, we've really moved the needle in the right
>> direction -- but consolidating the handling of 2 of 20 cases, or
>> whatever the real numbers are, isn't very exciting.
>
> There aren't really that many kinds of files to encrypt:
>
> https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#List_of_the_files_that_contain_user_data
>
> (And pg_stat/* files should be removed from the list.)

I was looking at that list of files that contain user data, and noticed that all relation forks except the main fork were marked as 'does not contain user data'.
To me this seems not necessarily true: AMs do have access to forks for user data storage as well (without any real issues or breaking the abstraction), and the init-fork is expected to store user data (specifically in the form of unlogged sequences). Shouldn't those forks thus also be encrypted-by-default, or should we provide some other method to ensure that non-main forks with user data are encrypted? - Matthias
Re: support for MERGE
On Tue, Apr 12, 2022 at 10:47 AM, Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote:
> On 2022-Apr-02, Ranier Vilela wrote:
>> On Sat, Apr 2, 2022 at 12:01 PM, Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote:
>> IMHO, actually there is a bug here.
>> ExecGetChildToRootMap is clear: it is possible for it to return NULL.
>> To discover if the map is NULL, ExecGetChildToRootMap needs to process
>> "ResultRelInfo *leaf_part_rri".
>> So, the argument "if the map is NULL, this function should not be called"
>> is contradictory.
>
> I was not explicit enough. I meant "if no map is needed to adjust
> columns, then this function should not be called". The caller already
> knows if it's needed or not; it doesn't depend on literally testing
> 'map'. If somebody mis-calls this function, it would have crashed, yes;
> but that's a caller bug, not this function's.

Thanks for the explanation.

> A few days ago, the community Coverity also complained about this, so I
> added an Assert that the map is not null, which should silence it.

Thanks for hardening this.

>> If the right fix is to return the original list, here is the patch attached.
>
> ... for a buggy caller (one that calls it when unnecessary), then yes
> this would be the correct code -- except that now the caller doesn't
> know if the returned list needs to be freed or not. So it seems better
> to avoid accumulating pointless calls to this function by just not
> coping with them.

Sure, it is always better to avoid doing work unless strictly necessary.

regards,
Ranier Vilela
Re: PG DOCS - logical replication filtering
On Tue, Apr 12, 2022, at 5:30 AM, Peter Smith wrote:
> Not changed. Because in fact, I copied most of this sentence
> (including the uppercase, "operations", "and/or") from existing
> documentation [1]
> e.g. see "The tables added to a publication that publishes UPDATE
> and/or DELETE operations must ..."
> [1] https://www.postgresql.org/docs/current/sql-createpublication.html

Hmm. You are checking the Notes. I'm referring to the publish parameter. IMO this sentence should use operations in lowercase letters too, because even if you create it with uppercase letters, Postgres will provide a lowercase word when you dump it.

> Yeah, I know the information is the same in the summary and in the
> text. Personally, I find big slabs of technical text difficult to
> digest, so I'd have to spend 5 minutes of careful reading and drawing
> the exact same summary on a piece of paper anyway just to visualize
> what the text says. The summary makes it easy to understand at a
> glance. But I have modified the summary in [v9] to remove the "case"
> and to add other column headings as suggested.

Isn't it better to use a table instead of a synopsis?

> Not changed. The readers of this docs page are all users who will be
> familiar with the filter expressions, so I felt the "OR'ed together"
> part will be perfectly clear to the intended audience.

If you want to keep it, change it to "ORed". It is used in indices.sgml. Let's keep it consistent.

> But I did not understand your point about "If, for some reason,
> someone decided to change it, this section will contain obsolete
> information", because IIUC that will be equally true for both the
> explanation and the output, so I did not understand why you say "psql
> output is fine, the explanation is not". It is the business of this
> documentation to help the user to know how and where they can find the
> row filter information they may need to know.

You are describing a psql command here.
My point is: keep the psql explanation in the psql section. This section describes the row filter feature. If we start describing features in other sections, we will have outdated information whenever the referred feature is changed and someone fails to find all the references. I tend to concentrate detailed explanation in the feature section. If I have to add links in other sections, I use "See Section 1.23 for details ...".

> Not changed. The publisher and subscriber programlistings are always
> separated. If you are looking at the rendered HTML I think it is quite
> clear that one is at the publisher and one is at the subscriber. OTOH,
> if we omitted creating the tables on the subscriber then I think that
> really would cause some confusion.

The difference is extra space. By default, the CSS does not include a border for programlisting. That's why I complained about it. I noticed that the website CSS includes it. However, the PDF will not include the border. I would add a separate description for the subscriber just to be clear.

One last suggestion: you are using identifiers in uppercase letters but "primary key" is in lowercase.

--
Euler Taveira
EDB https://www.enterprisedb.com/
Re: WIP: WAL prefetch (another approach)
On Thu, 7 Apr 2022 at 08:46, Thomas Munro wrote:
> With that... I've finally pushed the 0002 patch and will be watching
> the build farm.

This is a nice feature if it is safe to turn off full_page_writes.

When is it safe to do that? On which platform?

I am not aware of any released software that allows full_page_writes to be safely disabled. Perhaps something has been released recently that allows this? I think we have substantial documentation about safety of other settings, so we should carefully document things here also.

--
Simon Riggs
http://www.EnterpriseDB.com/
Re: row filtering for logical replication
On 2022-Apr-12, Amit Kapila wrote: > I mean that it fetches the tuple from the RELOID cache and then > performs relkind and other checks similar to what we are doing. I > think it could also have used get_rel_relkind() but probably not done > because it doesn't have a lock on the relation. Ah, but that one uses a lot more fields from the pg_class tuple in the non-error path. We only need relkind, up until we know the error is to be thrown. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: support for MERGE
On 2022-Apr-02, Ranier Vilela wrote:
> On Sat, Apr 2, 2022 at 12:01 PM, Alvaro Herrera wrote:
> IMHO, actually there is a bug here.
> ExecGetChildToRootMap is clear: it is possible for it to return NULL.
> To discover if the map is NULL, ExecGetChildToRootMap needs to process
> "ResultRelInfo *leaf_part_rri".
> So, the argument "if the map is NULL, this function should not be called"
> is contradictory.

I was not explicit enough. I meant "if no map is needed to adjust columns, then this function should not be called". The caller already knows if it's needed or not; it doesn't depend on literally testing 'map'. If somebody mis-calls this function, it would have crashed, yes; but that's a caller bug, not this function's.

A few days ago, the community Coverity also complained about this, so I added an Assert that the map is not null, which should silence it.

> If the right fix is to return the original list, here is the patch attached.

... for a buggy caller (one that calls it when unnecessary), then yes this would be the correct code -- except that now the caller doesn't know if the returned list needs to be freed or not. So it seems better to avoid accumulating pointless calls to this function by just not coping with them.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"I suspect most samba developers are already technically insane...
Of course, since many of them are Australians, you can't tell." (L. Torvalds)
Re: Temporary file access API
On Tue, Apr 12, 2022 at 5:30 AM Antonin Houska wrote:
> Robert Haas wrote:
>> On Mon, Apr 11, 2022 at 4:05 AM Antonin Houska wrote:
>>> There aren't really that many kinds of files to encrypt:
>>>
>>> https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#List_of_the_files_that_contain_user_data
>>>
>>> (And pg_stat/* files should be removed from the list.)
>>
>> This kind of gets into some theoretical questions. Like, do we think
>> that it's an information leak if people can look at how many
>> transactions are committing and aborting in pg_xact_status? In theory
>> it could be, but I know it's been argued that that's too much of a
>> side channel. I'm not sure I believe that, but it's arguable.
>
> I was referring to the fact that the statistics are no longer stored in files:
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5891c7a8ed8f2d3d577e7eea34dacff12d7b6bbd

Oh, yeah, I agree with that. I was saying that I'm not altogether a believer in the idea that we need not encrypt SLRU files. TBH, I think that no matter what we do there are going to be side channel leaks, where some security researcher demonstrates that they can infer something based on file length or file creation rate or something. It seems inevitable. But the more stuff we don't encrypt, the easier those attacks are going to be. On the other hand, encrypting more stuff also makes the project harder. It's hard for me to judge what the right balance is here. Maybe it's OK to exclude that stuff for an initial version and just disclaim the heck out of it, but I don't think that should be our long term strategy.

>> I really don't know how you can argue that pg_dynshmem/mmap.NNN
>> doesn't contain user data -- surely it can.
>
> Good point. Since postgres does not control writing into this file, it's a
> special case though. (Maybe TDE will have to reject to start if
> dynamic_shared_memory_type is set to mmap and the instance is encrypted.)

Interesting point.
--
Robert Haas
EDB: http://www.enterprisedb.com
GSOC: New and improved website for pgjdbc (JDBC) (2022)
Hi,

I'm Keshav, and I have updated my proposal. Kindly accept my changes.

postgreSql_ New and improved website for pgjdbc (JDBC) (2022).pdf
Description: Adobe PDF document
Re: row filtering for logical replication
On 2022-Apr-12, Amit Kapila wrote: > It still has the same problem. The table can be dropped just before > this message and the get_rel_name will return NULL and we don't expect > that. Ugh, I forgot to change the errmsg() parts to use the new variable, apologies. Fixed. > Also, is there a reason that you haven't kept the test case added by Hou-San? None. I put it back here. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ >From f23be23c27eb9bed7350745233f4660f4c5b326a Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Tue, 12 Apr 2022 12:59:50 +0200 Subject: [PATCH v4] fixup checking for rowfilter/collist on altering publication --- src/backend/commands/publicationcmds.c| 89 +++ src/test/regress/expected/publication.out | 16 +++- src/test/regress/sql/publication.sql | 8 ++ 3 files changed, 63 insertions(+), 50 deletions(-) diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c index 7aacb6b2fe..d2b9f95f6d 100644 --- a/src/backend/commands/publicationcmds.c +++ b/src/backend/commands/publicationcmds.c @@ -945,60 +945,57 @@ AlterPublicationOptions(ParseState *pstate, AlterPublicationStmt *stmt, foreach(lc, root_relids) { - HeapTuple rftuple; Oid relid = lfirst_oid(lc); - bool has_column_list; - bool has_row_filter; + char relkind; + char *relname; + HeapTuple rftuple; + bool has_rowfilter; + bool has_collist; + + /* Beware: we don't have lock on the relations */ rftuple = SearchSysCache2(PUBLICATIONRELMAP, ObjectIdGetDatum(relid), ObjectIdGetDatum(pubform->oid)); - - has_row_filter -= !heap_attisnull(rftuple, Anum_pg_publication_rel_prqual, NULL); - - has_column_list -= !heap_attisnull(rftuple, Anum_pg_publication_rel_prattrs, NULL); - - if (HeapTupleIsValid(rftuple) && -(has_row_filter || has_column_list)) + if (!HeapTupleIsValid(rftuple)) +continue; + has_rowfilter = !heap_attisnull(rftuple, Anum_pg_publication_rel_prqual, NULL); + has_collist = !heap_attisnull(rftuple, Anum_pg_publication_rel_prattrs, NULL); 
+ if (!has_rowfilter && !has_collist) { -HeapTuple tuple; - -tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid)); -if (HeapTupleIsValid(tuple)) -{ - Form_pg_class relform = (Form_pg_class) GETSTRUCT(tuple); - - if ((relform->relkind == RELKIND_PARTITIONED_TABLE) && - has_row_filter) - ereport(ERROR, -(errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("cannot set %s for publication \"%s\"", - "publish_via_partition_root = false", - stmt->pubname), - errdetail("The publication contains a WHERE clause for a partitioned table \"%s\" " - "which is not allowed when %s is false.", - NameStr(relform->relname), - "publish_via_partition_root"))); - - if ((relform->relkind == RELKIND_PARTITIONED_TABLE) && - has_column_list) - ereport(ERROR, -(errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("cannot set %s for publication \"%s\"", - "publish_via_partition_root = false", - stmt->pubname), - errdetail("The publication contains a column list for a partitioned table \"%s\" " - "which is not allowed when %s is false.", - NameStr(relform->relname), - "publish_via_partition_root"))); - - ReleaseSysCache(tuple); -} - ReleaseSysCache(rftuple); +continue; } + + relkind = get_rel_relkind(relid); + if (relkind != RELKIND_PARTITIONED_TABLE) + { +ReleaseSysCache(rftuple); +continue; + } + relname = get_rel_name(relid); + if (relname == NULL) /* table concurrently dropped */ + { +ReleaseSysCache(rftuple); +continue; + } + + if (has_rowfilter) +ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("cannot set parameter \"%s\" to false for publication \"%s\"", +"publish_via_partition_root", +stmt->pubname), + errdetail("The publication contains a WHERE clause for partitioned table \"%s\" which is not allowed when \"%s\" is false.", + relname, "publish_via_partition_root"))); + Assert(has_collist); + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("cannot set parameter \"%s\" to false for publication \"%s\"", + "publish_via_partition_root", + 
stmt->pubname), + errdetail("The publication contains a column list for partitioned table \"%s\", which is not allowed when \"%s\" is false.", + relname, "publish_via_partition_root"))); } } diff --git a/src/test/regress/expected/publication.out b/src/test/regress/expected/publication.out index 8208f9fa0e..cc9c8990ae 100644 --- a/src/test/regress/expected/publication.out +++ b/src/test/regress/expected/publication.out @@ -588,8 +588,12 @@ UPDATE rf_tbl_abcd_part_pk SET a = 1; -- fail
Re: row filtering for logical replication
On Tue, Apr 12, 2022 at 5:01 PM Alvaro Herrera wrote: > > On 2022-Apr-12, Amit Kapila wrote: > > > On Tue, Apr 12, 2022 at 3:45 PM Amit Kapila wrote: > > > > We don't have a lock on the relation, so if it gets dropped > > > concurrently, it won't behave sanely. For example, get_rel_name() will > > > return NULL which seems incorrect to me. > > Oh, oops ... a trap for the unwary? Anyway, yes, we can disregard the > entry when get_rel_name returns null. Amended patch attached. > > > > I am not sure about this as well because you will instead do a RELOID > > > cache lookup even when there is no row filter or column list. > > I guess my assumption is that the pg_class cache is typically more > populated than other relcaches, but that's unsubstantiated. I'm not > sure if we have any way to tell which one is the more common case. > Anyway, let's do it the way you already had it. > > > It seems to me that we have a similar coding pattern in > > ExecGrant_Relation(). > > Not sure what you mean? > I mean that it fetches the tuple from the RELOID cache and then performs relkind and other checks similar to what we are doing. I think it could also have used get_rel_relkind() but probably not done because it doesn't have a lock on the relation. -- With Regards, Amit Kapila.
Re: row filtering for logical replication
On Tue, Apr 12, 2022 at 5:12 PM Alvaro Herrera wrote: > > Sorry, I think I neglected to "git add" some late changes. > + if (has_rowfilter) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("cannot set parameter \"%s\" to false for publication \"%s\"", + "publish_via_partition_root", + stmt->pubname), + errdetail("The publication contains a WHERE clause for partitioned table \"%s\" which is not allowed when \"%s\" is false.", +get_rel_name(relid), +"publish_via_partition_root"))); It still has the same problem. The table can be dropped just before this message and the get_rel_name will return NULL and we don't expect that. Also, is there a reason that you haven't kept the test case added by Hou-San? -- With Regards, Amit Kapila.
Re: row filtering for logical replication
Sorry, I think I neglected to "git add" some late changes. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ >From e7569ed4c4a01f782f9326ebc9a3c9049973ef4b Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Tue, 12 Apr 2022 12:59:50 +0200 Subject: [PATCH v3] fixup checking for rowfilter/collist on altering publication --- src/backend/commands/publicationcmds.c| 91 +++ src/test/regress/expected/publication.out | 8 +- 2 files changed, 49 insertions(+), 50 deletions(-) diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c index 7aacb6b2fe..5aa5201055 100644 --- a/src/backend/commands/publicationcmds.c +++ b/src/backend/commands/publicationcmds.c @@ -945,60 +945,59 @@ AlterPublicationOptions(ParseState *pstate, AlterPublicationStmt *stmt, foreach(lc, root_relids) { - HeapTuple rftuple; Oid relid = lfirst_oid(lc); - bool has_column_list; - bool has_row_filter; + char relkind; + char *relname; + HeapTuple rftuple; + bool has_rowfilter; + bool has_collist; + + /* Beware: we don't have lock on the relations */ rftuple = SearchSysCache2(PUBLICATIONRELMAP, ObjectIdGetDatum(relid), ObjectIdGetDatum(pubform->oid)); - - has_row_filter -= !heap_attisnull(rftuple, Anum_pg_publication_rel_prqual, NULL); - - has_column_list -= !heap_attisnull(rftuple, Anum_pg_publication_rel_prattrs, NULL); - - if (HeapTupleIsValid(rftuple) && -(has_row_filter || has_column_list)) + if (!HeapTupleIsValid(rftuple)) +continue; + has_rowfilter = !heap_attisnull(rftuple, Anum_pg_publication_rel_prqual, NULL); + has_collist = !heap_attisnull(rftuple, Anum_pg_publication_rel_prattrs, NULL); + if (!has_rowfilter && !has_collist) { -HeapTuple tuple; - -tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid)); -if (HeapTupleIsValid(tuple)) -{ - Form_pg_class relform = (Form_pg_class) GETSTRUCT(tuple); - - if ((relform->relkind == RELKIND_PARTITIONED_TABLE) && - has_row_filter) - ereport(ERROR, -(errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("cannot 
set %s for publication \"%s\"", - "publish_via_partition_root = false", - stmt->pubname), - errdetail("The publication contains a WHERE clause for a partitioned table \"%s\" " - "which is not allowed when %s is false.", - NameStr(relform->relname), - "publish_via_partition_root"))); - - if ((relform->relkind == RELKIND_PARTITIONED_TABLE) && - has_column_list) - ereport(ERROR, -(errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("cannot set %s for publication \"%s\"", - "publish_via_partition_root = false", - stmt->pubname), - errdetail("The publication contains a column list for a partitioned table \"%s\" " - "which is not allowed when %s is false.", - NameStr(relform->relname), - "publish_via_partition_root"))); - - ReleaseSysCache(tuple); -} - ReleaseSysCache(rftuple); +continue; } + + relkind = get_rel_relkind(relid); + if (relkind != RELKIND_PARTITIONED_TABLE) + { +ReleaseSysCache(rftuple); +continue; + } + relname = get_rel_name(relid); + if (relname == NULL) /* table concurrently dropped */ + { +ReleaseSysCache(rftuple); +continue; + } + + if (has_rowfilter) +ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("cannot set parameter \"%s\" to false for publication \"%s\"", +"publish_via_partition_root", +stmt->pubname), + errdetail("The publication contains a WHERE clause for partitioned table \"%s\" which is not allowed when \"%s\" is false.", + get_rel_name(relid), + "publish_via_partition_root"))); + Assert(has_collist); + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("cannot set parameter \"%s\" to false for publication \"%s\"", + "publish_via_partition_root", + stmt->pubname), + errdetail("The publication contains a column list for partitioned table \"%s\", which is not allowed when \"%s\" is false.", + get_rel_name(relid), + "publish_via_partition_root"))); } } diff --git a/src/test/regress/expected/publication.out b/src/test/regress/expected/publication.out index 8208f9fa0e..37cf11e785 100644 --- 
a/src/test/regress/expected/publication.out +++ b/src/test/regress/expected/publication.out @@ -588,8 +588,8 @@ UPDATE rf_tbl_abcd_part_pk SET a = 1; -- fail - cannot set PUBLISH_VIA_PARTITION_ROOT to false if any row filter is -- used for partitioned table ALTER PUBLICATION testpub6 SET (PUBLISH_VIA_PARTITION_ROOT=0); -ERROR: cannot set publish_via_partition_root = false for publication "testpub6" -DETAIL: The publication contains a WHERE clause for a partitioned table
Re: row filtering for logical replication
On 2022-Apr-12, Amit Kapila wrote: > On Tue, Apr 12, 2022 at 3:45 PM Amit Kapila wrote: > > We don't have a lock on the relation, so if it gets dropped > > concurrently, it won't behave sanely. For example, get_rel_name() will > > return NULL which seems incorrect to me. Oh, oops ... a trap for the unwary? Anyway, yes, we can disregard the entry when get_rel_name returns null. Amended patch attached. > > I am not sure about this as well because you will instead do a RELOID > > cache lookup even when there is no row filter or column list. I guess my assumption is that the pg_class cache is typically more populated than other relcaches, but that's unsubstantiated. I'm not sure if we have any way to tell which one is the more common case. Anyway, let's do it the way you already had it. > It seems to me that we have a similar coding pattern in ExecGrant_Relation(). Not sure what you mean? In that function, when the syscache lookup returns NULL, an error is raised. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "El número de instalaciones de UNIX se ha elevado a 10, y se espera que este número aumente" (UPM, 1972) >From 631a6d04cbe420164833dd4e88a86d0e076fd47d Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Tue, 12 Apr 2022 12:59:50 +0200 Subject: [PATCH v2] fixup checking for rowfilter/collist on altering publication --- src/backend/commands/publicationcmds.c| 91 +++ src/test/regress/expected/publication.out | 8 +- 2 files changed, 49 insertions(+), 50 deletions(-) diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c index 7aacb6b2fe..59fc39e9f2 100644 --- a/src/backend/commands/publicationcmds.c +++ b/src/backend/commands/publicationcmds.c @@ -945,60 +945,59 @@ AlterPublicationOptions(ParseState *pstate, AlterPublicationStmt *stmt, foreach(lc, root_relids) { - HeapTuple rftuple; Oid relid = lfirst_oid(lc); - bool has_column_list; - bool has_row_filter; + char relkind; + char *relname; + HeapTuple 
rftuple; + bool has_rowfilter; + bool has_collist; + + /* Beware: we don't have lock on the relations */ rftuple = SearchSysCache2(PUBLICATIONRELMAP, ObjectIdGetDatum(relid), ObjectIdGetDatum(pubform->oid)); - - has_row_filter -= !heap_attisnull(rftuple, Anum_pg_publication_rel_prqual, NULL); - - has_column_list -= !heap_attisnull(rftuple, Anum_pg_publication_rel_prattrs, NULL); - - if (HeapTupleIsValid(rftuple) && -(has_row_filter || has_column_list)) + if (!HeapTupleIsValid(rftuple)) +continue; + has_rowfilter = !heap_attisnull(rftuple, Anum_pg_publication_rel_prqual, NULL); + has_collist = !heap_attisnull(rftuple, Anum_pg_publication_rel_prattrs, NULL); + if (!has_rowfilter && !has_collist) { -HeapTuple tuple; - -tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid)); -if (HeapTupleIsValid(tuple)) -{ - Form_pg_class relform = (Form_pg_class) GETSTRUCT(tuple); - - if ((relform->relkind == RELKIND_PARTITIONED_TABLE) && - has_row_filter) - ereport(ERROR, -(errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("cannot set %s for publication \"%s\"", - "publish_via_partition_root = false", - stmt->pubname), - errdetail("The publication contains a WHERE clause for a partitioned table \"%s\" " - "which is not allowed when %s is false.", - NameStr(relform->relname), - "publish_via_partition_root"))); - - if ((relform->relkind == RELKIND_PARTITIONED_TABLE) && - has_column_list) - ereport(ERROR, -(errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("cannot set %s for publication \"%s\"", - "publish_via_partition_root = false", - stmt->pubname), - errdetail("The publication contains a column list for a partitioned table \"%s\" " - "which is not allowed when %s is false.", - NameStr(relform->relname), - "publish_via_partition_root"))); - - ReleaseSysCache(tuple); -} - ReleaseSysCache(rftuple); +continue; } + + relkind = get_rel_relkind(relid); + if (relkind != RELKIND_PARTITIONED_TABLE) + { +ReleaseSysCache(rftuple); +continue; + } + relname = get_rel_name(relid); + if 
(relname == NULL) /* table concurrently dropped */ + { +ReleaseSysCache(rftuple); +continue; + } + + if (has_rowfilter) +ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("cannot set parameter \"%s\" to false for publication \"%s\"", +"publish_via_partition_root", +stmt->pubname), + errdetail("The publication includes partitioned table \"%s\" with a WHERE clause, which is not allowed when \"%s\" is false.", + get_rel_name(relid), + "publish_via_partition_root"))); + Assert(has_collist); + ereport(ERROR, +
Re: typos
On Mon, Apr 11, 2022 at 4:15 PM Amit Kapila wrote: > > On Mon, Apr 11, 2022 at 3:55 PM Masahiko Sawada wrote: > > > > On Mon, Apr 11, 2022 at 7:10 PM Justin Pryzby wrote: > > > > > > Amit or Masahiko may want to comment on 0012 (doc review: Add ALTER > > > SUBSCRIPTION ... SKIP). > > > > +1. I'll take care of pushing this one tomorrow unless we have more > comments on this part. > I have pushed this one. -- With Regards, Amit Kapila.
Re: Skipping schema changes in publication
On Tue, Apr 12, 2022 at 4:17 PM vignesh C wrote: > > On Tue, Apr 12, 2022 at 12:19 PM Amit Kapila wrote: > > > > > > For the second syntax (Alter Publication ...), isn't it better to > > avoid using ADD? It looks odd to me because we are not adding anything > > in publication with this syntax. > > I was thinking of the scenario where the user initially creates the > publication for all tables: > CREATE PUBLICATION pub1 FOR ALL TABLES; > > After that the user decides to skip a few tables, e.g. t1, t2 > ALTER PUBLICATION pub1 ADD SKIP TABLE t1,t2; > > I thought of supporting this syntax in case the user decides to add the > skipping of a few tables later. > I understand that part, but what I pointed out was that it might be better to avoid using the ADD keyword in this syntax, like: ALTER PUBLICATION pub1 SKIP TABLE t1,t2; -- With Regards, Amit Kapila.
Re: Support logical replication of DDLs
On Mon, Apr 11, 2022 at 11:01 PM Zheng Li wrote: > > >Even if this works, how will we make Alter Table statement work where > >it needs to rewrite the table? There also I think we can face a > >similar problem if we directly send the statement, once the table will > >be updated due to the DDL statement and then again due to table > >rewrite as that will have a separate WAL. > > Yes, I think any DDL that can generate DML changes should be listed > out and handled properly or documented. Here is one extreme example > involving volatile functions: > ALTER TABLE nd_ddl ADD COLUMN t timestamp DEFAULT now(). > Again, I think we need to somehow skip the data rewrite on the > subscriber when replicating such DDL and let DML replication handle > the rewrite. > I am not sure what is the right way to deal with this but another idea we can investigate is to probably rewrite the DDL such that it doesn't rewrite the table. > >Another somewhat unrelated problem I see with this work is how to save > >recursion of the same command between nodes (when the involved nodes > >replicate DDLs). For DMLs, we can avoid that via replication origins > >as is being done in the patch proposed [1] but not sure how will we > >deal with that here? > > I'll need to investigate "recursion of the same command between > nodes", could you provide an example? > See email [1]. I think the same solution should work for your proposal as well because I see that LogLogicalMessage includes origin but it is better if you can confirm it. [1] - https://www.postgresql.org/message-id/flat/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubh...@mail.gmail.com -- With Regards, Amit Kapila.
Re: Support logical replication of DDLs
On Mon, Apr 11, 2022 at 6:16 PM Euler Taveira wrote: > > On Mon, Apr 11, 2022, at 2:00 AM, Amit Kapila wrote: > > On Thu, Apr 7, 2022 at 3:46 PM Amit Kapila wrote: > > > > On Wed, Mar 23, 2022 at 10:39 AM Japin Li wrote: > > > > 2. For DDL replication, do we need to wait for a consistent point of > > snapshot? For DMLs, that point is a convenient point to initialize > > replication from, which is why we export a snapshot at that point, > > which is used to read normal data. Do we have any similar needs for > > DDL replication? > > > > I have thought a bit more about this and I think we need to build the > snapshot for DML replication as we need to read catalog tables to > decode the corresponding WAL but it is not clear to me if we have a > similar requirement for DDL replication. If the catalog access is > required then it makes sense to follow the current snapshot model, > otherwise, we may need to think differently for DDL replication. > > One more related point is that for DML replication, we do ensure that > we copy the entire data of the table (via initial sync) which exists > even before the publication for that table exists, so do we want to do > something similar for DDLs? How do we sync the schema of the table > before the user has defined the publication? Say the table has been > created before the publication is defined and after that, there are > only Alter statements, so do we expect, users to create the table on > the subscriber and then we can replicate the Alter statements? And > even if we do that it won't be clear which Alter statements will be > replicated after publication is defined especially if those Alters > happened concurrently with defining publications? > > The *initial* DDL replication is a different problem than DDL replication. The > former requires a snapshot to read the current catalog data and build a CREATE > command as part of the subscription process. 
The subsequent DDLs in that > object > will be handled by a different approach that is being discussed here. > I think they are not completely independent because of the current way to do initial sync followed by replication. The initial sync and replication need some mechanism to ensure that one of those doesn't overwrite the work done by the other. Now, the initial idea and patch can be developed separately but I think both the patches have some dependency. -- With Regards, Amit Kapila.
Re: API stability [was: pgsql: Fix possible recovery trouble if TRUNCATE overlaps a checkpoint.]
(My mailer has been fixed.) At Mon, 11 Apr 2022 21:45:59 +0200, Markus Wanner wrote in > On Mon, 2022-04-11 at 15:21 -0400, Robert Haas wrote: > > ... before v13, the commit in question actually > > changed the size of PGXACT, which is really quite bad -- it needs to > > be 12 bytes for performance reasons. And there's no spare bytes > > available, so I think we should follow one of the suggestions that he > > had over in that email thread, and put delayChkptEnd in PGPROC even > > though delayChkpt is in PGXACT. > > This makes sense to me. Kudos to Kyotaro for considering this. > > At first read, this sounded like a trade-off between compatibility and > performance for PG 12 and older. But I realize leaving delayChkpt in > PGXACT and adding just delayChkptEnd to PGPROC is compatible and leaves > PGXACT at a size of 12 bytes. So this sounds like a good approach to > me. Thanks! So, I created the patches for back-patching from 10 to 14. (This also fixes a silly bug in v1-pg14 where HaveVirtualXIDsDelayingChkpt and HaveVirtualXIDsDelayingChkptEnd were inverted.) They revert the delayChkpt-related changes made by the patch and add the delayChkptEnd stuff. I compared every pair of patches for neighbouring versions, to make sure I was not making the same change in different ways and that they have the same set of hunks. This version shares a common static function (*ChkptGuts) between the functions *Chkpt() and *ChkptEnd(). regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 471ca4aa38a92f837ed4e4aeda40648c3b0b0a9b Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Wed, 6 Apr 2022 15:47:50 +0900 Subject: [PATCH v2] Fix ABI/API break Commit bbace5697d caused ABI break by changing the type and width of PGPROC.delayChkpt and API break by changing the signature of some public functions. Restore the member and function signature to previous state and add delayChkptEnd to an existing gap in PGPROC struct. Backpatch to 10, all supported branches.
--- src/backend/access/transam/multixact.c | 6 +-- src/backend/access/transam/twophase.c | 13 +++--- src/backend/access/transam/xact.c | 5 +-- src/backend/access/transam/xlog.c | 10 ++--- src/backend/access/transam/xloginsert.c | 2 +- src/backend/catalog/storage.c | 6 +-- src/backend/storage/buffer/bufmgr.c | 6 +-- src/backend/storage/ipc/procarray.c | 60 +++-- src/backend/storage/lmgr/proc.c | 6 ++- src/include/storage/proc.h | 42 +++-- src/include/storage/procarray.h | 8 ++-- 11 files changed, 82 insertions(+), 82 deletions(-) diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c index 50d8bab9e2..b643564f16 100644 --- a/src/backend/access/transam/multixact.c +++ b/src/backend/access/transam/multixact.c @@ -3075,8 +3075,8 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB) * crash/basebackup, even though the state of the data directory would * require it. */ - Assert((MyProc->delayChkpt & DELAY_CHKPT_START) == 0); - MyProc->delayChkpt |= DELAY_CHKPT_START; + Assert(!MyProc->delayChkpt); + MyProc->delayChkpt = true; /* WAL log truncation */ WriteMTruncateXlogRec(newOldestMultiDB, @@ -3102,7 +3102,7 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB) /* Then offsets */ PerformOffsetsTruncation(oldestMulti, newOldestMulti); - MyProc->delayChkpt &= ~DELAY_CHKPT_START; + MyProc->delayChkpt = false; END_CRIT_SECTION(); LWLockRelease(MultiXactTruncationLock); diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index dea3f485f7..911d93fbf4 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -474,7 +474,8 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid, } proc->xid = xid; Assert(proc->xmin == InvalidTransactionId); - proc->delayChkpt = 0; + proc->delayChkpt = false; + proc->delayChkptEnd = false; proc->statusFlags = 0; proc->pid = 0; proc->databaseId = databaseid; @@ -1165,8 
+1166,7 @@ EndPrepare(GlobalTransaction gxact) START_CRIT_SECTION(); - Assert((MyProc->delayChkpt & DELAY_CHKPT_START) == 0); - MyProc->delayChkpt |= DELAY_CHKPT_START; + MyProc->delayChkpt = true; XLogBeginInsert(); for (record = records.head; record != NULL; record = record->next) @@ -1209,7 +1209,7 @@ EndPrepare(GlobalTransaction gxact) * checkpoint starting after this will certainly see the gxact as a * candidate for fsyncing. */ - MyProc->delayChkpt &= ~DELAY_CHKPT_START; + MyProc->delayChkpt = false; /* * Remember that we have this GlobalTransaction entry locked for us. If @@ -2276,8 +2276,7 @@ RecordTransactionCommitPrepared(TransactionId xid, START_CRIT_SECTION(); /* See notes in RecordTransactionCommit */ - Assert((MyProc->delayChkpt & DELAY_CHKPT_START) == 0); -
Re: Skipping schema changes in publication
On Tue, Apr 12, 2022 at 12:19 PM Amit Kapila wrote: > > On Tue, Apr 12, 2022 at 11:53 AM vignesh C wrote: > > > > On Sat, Mar 26, 2022 at 7:37 PM vignesh C wrote: > > > > > > On Tue, Mar 22, 2022 at 12:38 PM vignesh C wrote: > > > > > > > > Hi, > > > > > > > > This feature adds an option to skip changes of all tables in specified > > > > schema while creating publication. > > > > This feature is helpful for use cases where the user wants to > > > > subscribe to all the changes except for the changes present in a few > > > > schemas. > > > > Ex: > > > > CREATE PUBLICATION pub1 FOR ALL TABLES SKIP ALL TABLES IN SCHEMA s1,s2; > > > > OR > > > > ALTER PUBLICATION pub1 ADD SKIP ALL TABLES IN SCHEMA s1,s2; > > > > > > > > A new column pnskip is added to table "pg_publication_namespace", to > > > > maintain the schemas that the user wants to skip publishing through > > > > the publication. Modified the output plugin (pgoutput) to skip > > > > publishing the changes if the relation is part of skip schema > > > > publication. > > > > As a continuation to this, I will work on implementing skipping tables > > > > from all tables in schema and skipping tables from all tables > > > > publication. > > > > > > > > Attached patch has the implementation for this. > > > > > > The patch was not applying on top of HEAD because of the recent > > > commits, attached patch is rebased on top of HEAD. > > > > The patch does not apply on top of HEAD because of the recent commit, > > attached patch is rebased on top of HEAD. > > > > I have also included the implementation for skipping a few tables from > > all tables publication, the 0002 patch has the implementation for the > > same. > > This feature is helpful for use cases where the user wants to > > subscribe to all the changes except for the changes present in a few > > tables. 
> > Ex: > > CREATE PUBLICATION pub1 FOR ALL TABLES SKIP TABLE t1,t2; > > OR > > ALTER PUBLICATION pub1 ADD SKIP TABLE t1,t2; > > > > For the second syntax (Alter Publication ...), isn't it better to > avoid using ADD? It looks odd to me because we are not adding anything > in publication with this syntax. I was thinking of the scenario where the user initially creates the publication for all tables: CREATE PUBLICATION pub1 FOR ALL TABLES; After that, the user decides to skip a few tables, e.g. t1, t2: ALTER PUBLICATION pub1 ADD SKIP TABLE t1,t2; I thought of supporting this syntax in case the user decides to add the skipping of a few tables later. Thoughts? Regards, Vignesh
Re: row filtering for logical replication
On Tue, Apr 12, 2022 at 3:45 PM Amit Kapila wrote: > > On Tue, Apr 12, 2022 at 2:35 PM Alvaro Herrera > wrote: > > > > I understand that this is a minimal fix, and for that it seems OK, but I > > think the surrounding style is rather baroque. This code can be made > > simpler. Here's my take on it. > > > > We don't have a lock on the relation, so if it gets dropped > concurrently, it won't behave sanely. For example, get_rel_name() will > return NULL which seems incorrect to me. > It seems to me that we have a similar coding pattern in ExecGrant_Relation(). -- With Regards, Amit Kapila.
Re: row filtering for logical replication
On Tue, Apr 12, 2022 at 2:35 PM Alvaro Herrera wrote: > > I understand that this is a minimal fix, and for that it seems OK, but I > think the surrounding style is rather baroque. This code can be made > simpler. Here's my take on it. > We don't have a lock on the relation, so if it gets dropped concurrently, it won't behave sanely. For example, get_rel_name() will return NULL which seems incorrect to me. > I think it's also faster: we avoid > looking up pg_publication_rel entries for rels that aren't partitioned > tables. > I am not sure about this as well because you will instead do a RELOID cache lookup even when there is no row filter or column list. -- With Regards, Amit Kapila.
RE: WIP: WAL prefetch (another approach)
Hi, Thank you for your reply. I missed the message, sorry. Regards, Noriyoshi Shinoda -Original Message- From: Thomas Munro Sent: Tuesday, April 12, 2022 6:28 PM To: Shinoda, Noriyoshi (PN Japan FSIP) Cc: Justin Pryzby ; Tomas Vondra ; Stephen Frost ; Andres Freund ; Jakub Wartak ; Alvaro Herrera ; Tomas Vondra ; Dmitry Dolgov <9erthali...@gmail.com>; David Steele ; pgsql-hackers Subject: Re: WIP: WAL prefetch (another approach) On Tue, Apr 12, 2022 at 9:03 PM Shinoda, Noriyoshi (PN Japan FSIP) wrote: > Thank you for developing the great feature. I tested this feature and checked > the documentation. Currently, the documentation for the > pg_stat_prefetch_recovery view is included in the description for the > pg_stat_subscription view. > > https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-SUBSCRIPTION Hi! Thanks. I had just committed a fix before I saw your message, because there was already another report here: https://www.postgresql.org/message-id/flat/cakrakevk-lrhmdyt6x_p33ef6dcorm2jed5h_ehdrdv0res...@mail.gmail.com
Re: Temporary file access API
Robert Haas wrote: > On Mon, Apr 11, 2022 at 4:05 AM Antonin Houska wrote: > > There aren't really that many kinds of files to encrypt: > > > > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#List_of_the_files_that_contain_user_data > > > > (And pg_stat/* files should be removed from the list.) > > This kind of gets into some theoretical questions. Like, do we think > that it's an information leak if people can look at how many > transactions are committing and aborting in pg_xact_status? In theory > it could be, but I know it's been argued that that's too much of a > side channel. I'm not sure I believe that, but it's arguable. I was referring to the fact that the statistics are no longer stored in files: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5891c7a8ed8f2d3d577e7eea34dacff12d7b6bbd > Similarly, the argument that global/pg_internal.init doesn't contain > user data relies on the theory that the only table data that will make > its way into the file is for system catalogs. I guess that's not user > data *exactly* but ... are we sure that's how we want to roll here? Yes, this is worth attention. > I really don't know how you can argue that pg_dynshmem/mmap.NNN > doesn't contain user data - surely it can. Good point. Since postgres does not control writing into this file, it's a special case though. (Maybe TDE will have to refuse to start if dynamic_shared_memory_type is set to mmap and the instance is encrypted.) Thanks. -- Antonin Houska Web: https://www.cybertec-postgresql.com
Re: WIP: WAL prefetch (another approach)
On Tue, Apr 12, 2022 at 9:03 PM Shinoda, Noriyoshi (PN Japan FSIP) wrote: > Thank you for developing the great feature. I tested this feature and checked > the documentation. Currently, the documentation for the > pg_stat_prefetch_recovery view is included in the description for the > pg_stat_subscription view. > > https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-SUBSCRIPTION Hi! Thanks. I had just committed a fix before I saw your message, because there was already another report here: https://www.postgresql.org/message-id/flat/CAKrAKeVk-LRHMdyT6x_p33eF6dCorM2jed5h_eHdRdv0reSYTA%40mail.gmail.com
Re: Documentation issue with pg_stat_recovery_prefetch
On Tue, Apr 12, 2022 at 8:11 AM sirisha chamarthi wrote: > I was going through pg_stat_recovery_prefetch documentation and saw an issue > with formatting. Attached a small patch to fix the issue. This is the first > time I am sending an email to hackers. Please educate me if I miss something. Thanks Sirisha! Ouch, that's embarrassing. My best guess is that I might have screwed that up a long time ago while rebasing an early development version over commit 92f94686, which changed the link style and moved paragraphs around, and then never noticed that it was wrong. Researching that made me notice another problem: the table was using the 3 column layout from a couple of years ago, because I had also missed the style change in commit a0427506. Oops. Fixed.
Re: Fixes for compression options of pg_receivewal and refactoring of backup_compression.{c,h}
On Mon, Apr 11, 2022 at 12:46:02PM +, gkokola...@pm.me wrote: > On Monday, April 11th, 2022 at 8:52 AM, Michael Paquier > wrote: >> - 0001 is a simple rename of backup_compression.{c,h} to >> compression.{c,h}, removing anything related to base backups from >> that. One extra reason behind this renaming is that I would like to >> use this infra for pg_dump, but that's material for 16~. > > I agree with the design. If you permit me a couple of nitpicks regarding > naming. > > +typedef enum pg_compress_algorithm > +{ > + PG_COMPRESSION_NONE, > + PG_COMPRESSION_GZIP, > + PG_COMPRESSION_LZ4, > + PG_COMPRESSION_ZSTD > +} pg_compress_algorithm; > > Elsewhere in the codebase, (e.g. create_table.sgml, alter_table.sgml, > brin_tuple.c, detoast.c, toast_compression.c, tupdesc.c, gist.c to mention a > few) variations of of the nomenclature "compression method" are used, like > 'VARDATA_COMPRESSED_GET_COMPRESS_METHOD' or 'InvalidCompressionMethod' etc. I > feel that it would be nicer if we followed one naming rule for this and I > recommend to substitute algorithm for method throughout. Technically and as far as I know, both are correct and hold more or less the same meaning. pg_basebackup's code exposes algorithm in a more extended way, so I have just stuck to it for the internal variables and such. Perhaps we could rename the whole, but I see no strong reason to do that. > Last, even though it is not needed now, it will be helpful to have a > PG_COMPRESSION_INVALID in some scenarios. Though we can add it when we come to > it. Perhaps. There is no need for it yet, though. pg_dump would not need that, as well. >> - 0002 removes WalCompressionMethod, replacing it by >> pg_compress_algorithm as these are the same enums. Robert complained >> about the confusion that WalCompressionMethod could lead to as this >> could be used for the compression of data, and not only WAL. I have >> renamed some variables to be more consistent, while on it. > > It looks good. 
If you choose to discard the comment regarding the use of > 'method' over 'algorithm' from above, can you please use the full word in the > variable, e.g. 'wal_compress_algorithm' instead of 'wal_compress_algo'. I can > not really explain it, the later reads a bit rude. Then again that may be just > me. Thanks. I have been able to do an extra pass on 0001 and 0002, fixing those naming inconsistencies with "algo" vs "algorithm" that you and Robert have reported, and applied them. For 0003, I'll look at it later. Attached is a rebase with improvements about the variable names. -- Michael From 94850ac604402371135e85de27a5d9074a947d4b Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Tue, 12 Apr 2022 18:14:28 +0900 Subject: [PATCH v2] Rework compression options of pg_receivewal Since ba5 and the introduction of LZ4 in pg_receivewal, the compression of archived WAL is controlled by two options: - --compression-method with "gzip", "none" or "lz4" as possible value. - --compress=N to specify a compression level, with a backward-incompatible change where a value of zero leads to a failure instead of no compression. This commit takes advantage of the previous refactoring done to rework the compression options of pg_receivewal: - --compression-method is removed. - --compress is extended to use the same grammar as pg_basebackup, as of a METHOD:DETAIL specification, where a METHOD is "gzip", "none" or "lz4" and a DETAIL is a comma-separated list of options, the only keyword supported is now "level" to control the compression level. If only an integer is specified as value of this option, "none" is implied for 0 and "gzip" is implied. So this brings back --compress to be backward-compatible with ~14, while still supporting LZ4. This has the advantage of centralizing the set of checks used by pg_basebackup. 
--- src/bin/pg_basebackup/pg_receivewal.c| 131 +-- src/bin/pg_basebackup/t/020_pg_receivewal.pl | 16 +-- doc/src/sgml/ref/pg_receivewal.sgml | 47 --- 3 files changed, 121 insertions(+), 73 deletions(-) diff --git a/src/bin/pg_basebackup/pg_receivewal.c b/src/bin/pg_basebackup/pg_receivewal.c index ede1d4648d..f942883332 100644 --- a/src/bin/pg_basebackup/pg_receivewal.c +++ b/src/bin/pg_basebackup/pg_receivewal.c @@ -57,6 +57,8 @@ static XLogRecPtr endpos = InvalidXLogRecPtr; static void usage(void); +static void parse_compress_options(char *option, char **algorithm, + char **detail); static DIR *get_destination_dir(char *dest_folder); static void close_destination_dir(DIR *dest_dir, char *dest_folder); static XLogRecPtr FindStreamingStart(uint32 *tli); @@ -90,9 +92,8 @@ usage(void) printf(_(" --synchronous flush write-ahead log immediately after writing\n")); printf(_(" -v, --verbose output verbose messages\n")); printf(_(" -V, --version output version information, then exit\n")); - printf(_(" --compression-method=METHOD\n" - "
Re: row filtering for logical replication
I understand that this is a minimal fix, and for that it seems OK, but I think the surrounding style is rather baroque. This code can be made simpler. Here's my take on it. I think it's also faster: we avoid looking up pg_publication_rel entries for rels that aren't partitioned tables. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "Cuando mañana llegue pelearemos segun lo que mañana exija" (Mowgli) diff --git a/src/backend/commands/publicationcmds.c b/src/backend/commands/publicationcmds.c index 7aacb6b2fe..e8ef003fe5 100644 --- a/src/backend/commands/publicationcmds.c +++ b/src/backend/commands/publicationcmds.c @@ -945,60 +945,42 @@ AlterPublicationOptions(ParseState *pstate, AlterPublicationStmt *stmt, foreach(lc, root_relids) { - HeapTuple rftuple; Oid relid = lfirst_oid(lc); - bool has_column_list; - bool has_row_filter; + char relkind; + HeapTuple rftuple; + + relkind = get_rel_relkind(relid); + if (relkind != RELKIND_PARTITIONED_TABLE) +continue; rftuple = SearchSysCache2(PUBLICATIONRELMAP, ObjectIdGetDatum(relid), ObjectIdGetDatum(pubform->oid)); + if (!HeapTupleIsValid(rftuple)) +continue; - has_row_filter -= !heap_attisnull(rftuple, Anum_pg_publication_rel_prqual, NULL); + if (!heap_attisnull(rftuple, Anum_pg_publication_rel_prqual, NULL)) +ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("cannot set %s to false for publication \"%s\"", +"publish_via_partition_root", +stmt->pubname), + errdetail("The publication contains a WHERE clause for a partitioned table \"%s\" which is not allowed when %s is false.", + get_rel_name(relid), + "publish_via_partition_root"))); - has_column_list -= !heap_attisnull(rftuple, Anum_pg_publication_rel_prattrs, NULL); + if (!heap_attisnull(rftuple, Anum_pg_publication_rel_prattrs, NULL)) +ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("cannot set %s to false for publication \"%s\"", +"publish_via_partition_root", +stmt->pubname), + errdetail("The publication 
contains a column list for a partitioned table \"%s\" which is not allowed when %s is false.", + get_rel_name(relid), + "publish_via_partition_root"))); - if (HeapTupleIsValid(rftuple) && -(has_row_filter || has_column_list)) - { -HeapTuple tuple; + ReleaseSysCache(rftuple); -tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid)); -if (HeapTupleIsValid(tuple)) -{ - Form_pg_class relform = (Form_pg_class) GETSTRUCT(tuple); - - if ((relform->relkind == RELKIND_PARTITIONED_TABLE) && - has_row_filter) - ereport(ERROR, -(errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("cannot set %s for publication \"%s\"", - "publish_via_partition_root = false", - stmt->pubname), - errdetail("The publication contains a WHERE clause for a partitioned table \"%s\" " - "which is not allowed when %s is false.", - NameStr(relform->relname), - "publish_via_partition_root"))); - - if ((relform->relkind == RELKIND_PARTITIONED_TABLE) && - has_column_list) - ereport(ERROR, -(errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("cannot set %s for publication \"%s\"", - "publish_via_partition_root = false", - stmt->pubname), - errdetail("The publication contains a column list for a partitioned table \"%s\" " - "which is not allowed when %s is false.", - NameStr(relform->relname), - "publish_via_partition_root"))); - - ReleaseSysCache(tuple); -} - -ReleaseSysCache(rftuple); - } } } diff --git a/src/test/regress/expected/publication.out b/src/test/regress/expected/publication.out index 8208f9fa0e..580cc5be7f 100644 --- a/src/test/regress/expected/publication.out +++ b/src/test/regress/expected/publication.out @@ -588,7 +588,7 @@ UPDATE rf_tbl_abcd_part_pk SET a = 1; -- fail - cannot set PUBLISH_VIA_PARTITION_ROOT to false if any row filter is -- used for partitioned table ALTER PUBLICATION testpub6 SET (PUBLISH_VIA_PARTITION_ROOT=0); -ERROR: cannot set publish_via_partition_root = false for publication "testpub6" +ERROR: cannot set publish_via_partition_root to false for publication "testpub6" 
DETAIL: The publication contains a WHERE clause for a partitioned table "rf_tbl_abcd_part_pk" which is not allowed when publish_via_partition_root is false. -- Now change the root filter to use a column "b" -- (which is not in the replica identity) @@ -951,7 +951,7 @@ UPDATE rf_tbl_abcd_part_pk SET a = 1; -- fail - cannot set PUBLISH_VIA_PARTITION_ROOT to false if any column list is -- used for partitioned table ALTER PUBLICATION testpub6 SET (PUBLISH_VIA_PARTITION_ROOT=0); -ERROR: cannot set publish_via_partition_root =
RE: WIP: WAL prefetch (another approach)
Hi, Thank you for developing the great feature. I tested this feature and checked the documentation. Currently, the documentation for the pg_stat_prefetch_recovery view is included in the description for the pg_stat_subscription view. https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-SUBSCRIPTION It is also not displayed in the list of "28.2. The Statistics Collector". https://www.postgresql.org/docs/devel/monitoring.html The attached patch modifies the pg_stat_prefetch_recovery view to appear as a separate view. Regards, Noriyoshi Shinoda -Original Message- From: Thomas Munro Sent: Friday, April 8, 2022 10:47 AM To: Justin Pryzby Cc: Tomas Vondra ; Stephen Frost ; Andres Freund ; Jakub Wartak ; Alvaro Herrera ; Tomas Vondra ; Dmitry Dolgov <9erthali...@gmail.com>; David Steele ; pgsql-hackers Subject: Re: WIP: WAL prefetch (another approach) On Fri, Apr 8, 2022 at 12:55 AM Justin Pryzby wrote: > The docs seem to be wrong about the default. > > +are not yet in the buffer pool, during recovery. Valid values are > +off (the default), on and > +try. The setting try > + enables Fixed. > + concurrency and distance, respectively. By default, it is set to > + try, which enabled the feature on systems where > + posix_fadvise is available. > > Should say "which enables". Fixed. > Curiously, I reported a similar issue last year. Sorry. I guess both times we only agreed on what the default should be in the final review round before commit, and I let the docs get out of sync (well, the default is mentioned in two places and I apparently ended my search too soon, changing only one). I also found another recently obsoleted sentence: the one about showing nulls sometimes was no longer true. Removed. pg_stat_recovery_prefetch_doc_v1.diff Description: pg_stat_recovery_prefetch_doc_v1.diff
Re: PG DOCS - logical replication filtering
On Tue, Apr 12, 2022 at 3:33 AM Euler Taveira wrote: > > On Mon, Apr 11, 2022, at 7:40 AM, Amit Kapila wrote: > > On Mon, Apr 11, 2022 at 12:39 PM Peter Smith wrote: > > > > On Fri, Apr 8, 2022 at 4:12 PM Peter Smith wrote: > > > > OK. Added in v7 [1] > > > > Thanks, this looks mostly good to me. I didn't like the new section > added for partitioned tables examples, so I removed it and added some > explanation of the tests. I have slightly changed a few other lines. I > am planning to commit the attached tomorrow unless there are more > comments. > > I have a few comments. Thanks for your review comments! I addressed most of them as suggested - see the details below. > > > If a publication publishes UPDATE and/or DELETE operations ... > > > > If we are talking about operations, use lowercase like I suggested in the > previous version. See the publish parameter [1]. If we are talking about > commands, the word "operations" should be removed or replaced by "commands". > The "and/or" isn't required, "or" implies the same. If you prefer > "operations", > my suggestion is to adjust the last sentence that says "only INSERT" to "only > insert operation". Not changed. Because in fact, I copied most of this sentence (including the uppercase, "operations", "and/or") from existing documentation [1] e.g. see "The tables added to a publication that publishes UPDATE and/or DELETE operations must ..." [1] https://www.postgresql.org/docs/current/sql-createpublication.html > > > If the old row satisfies the row filter expression (it was sent to the > > subscriber) but the new row doesn't, then from a data consistency > > perspective > > the old row should be removed from the subscriber. > > > > I suggested small additions to this sentence. We should at least add a comma > after "then" and "perspective". The same applies to the next paragraph too. Modified the commas in [v9] as suggested. > > Regarding the "Summary", it is redundant as I said before. 
We already > described > all 4 cases. I vote to remove it. However, if we would go with a table, I > suggest to improve the formatting: add borders, "old row" and "new row" should > be titles, "no match" and "match" should be represented by symbols ("" and > "X", > for example), and "Case X" column should be removed because this extra column > adds nothing. Yeah, I know the information is the same in the summary and in the text. Personally, I find big slabs of technical text difficult to digest, so I'd have to spend 5 minutes of careful reading and drawing the exact same summary on a piece of paper anyway just to visualize what the text says. The summary makes it easy to understand at a glance. But I have modified the summary in [v9] to remove the "case" and to add other column headings as suggested. > > Regarding the "Partitioned Tables", I suggested to remove the bullets. We > generally have paragraphs with a few sentences. I tend to avoid short > paragraphs. I also didn't like the 2 bullets using almost the same words. In > the previous version, I suggested something like > > If the publication contains a partitioned table, the parameter > publish_via_partition_root determines which row filter expression is used. If > the parameter publish_via_partition_root is true, the row filter expression > associated with the partitioned table is used. Otherwise, the row filter > expression associated with the individual partition is used. > Modified in [v9] to remove the bullets. > > will be copied. (see Section 31.3.6 for details). > > There is an extra period after "copied" that should be removed. The other > option is to remove the parentheses and have another sentence for "See ...". > Modified in [v9] as suggested. > > those expressions get OR'ed together > > I prefer plain English here. This part of the sentence is also redundant with > the rest of the sentence so I suggested to remove it in the previous version. 
> > rows that satisfy any of the row filter expressions is replicated.
>
> instead of
>
> those expressions get OR'ed together, so that rows satisfying any of the
> expressions will be replicated.

Not changed. The readers of this docs page are all users who will be familiar
with the filter expressions, so I felt the "OR'ed together" part will be
perfectly clear to the intended audience.

> I also didn't use a different paragraph (like I suggested in the previous
> version) because we are talking about the same thing.

Modified in [v9] to use a single paragraph.

> The bullets in the example sound strange, that's why I suggested removing
> them. We can even combine the 3 sentences into one paragraph.

Modified [v9] so the whole example now has no bullets. Also combined all
these 3 sentences as suggested.

> > The PSQL command \dRp+ shows the row filter expressions (if defined) for
> > each table of the publications.
>
> Well, we don't use PSQL (uppercase) in the documentation. I suggested a
> different sentence:
>
> The psql shows the row
Re: random() function documentation
How about we just say "uses a linear-feedback shift register algorithm"? I think it'd be sufficient to just say that it's a deterministic pseudorandom number generator. I don't see much value in documenting the internal algorithm used. Hmmm… I'm not so sure. ISTM that people interested in using the random user-facing variants (only random?) could like a pointer on the algorithm to check for the expected quality of the produced pseudo-random stream? See attached. Should we perhaps also add a warning that the same seed is not guaranteed to produce the same sequence across different (major?) versions? I wouldn't bother, on the grounds that then we'd need such disclaimers in a whole lot of places. Others might see it differently though. Agreed, Agreed. though I think when the release notes are written, they ought to warn that the sequence will change with this release. Yes. -- Fabien.diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 0a5c402640..7492454592 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -1832,10 +1832,11 @@ repeat('Pg', 4) PgPgPgPg - The random() function uses a simple linear - congruential algorithm. It is fast but not suitable for cryptographic - applications; see the module for a more - secure alternative. + The random() function uses + https://en.wikipedia.org/wiki/Xoroshiro128%2B;>xoroshiro128**, a + linear feedback shift register algorithm. + It is fast but not suitable for cryptographic applications; + see the module for a more secure alternative. If setseed() is called, the series of results of subsequent random() calls in the current session can be repeated by re-issuing setseed() with the same
Re: Unable to connect to Postgres13 server from psql client built on master
Hi,

On Tue, Apr 12, 2022 at 12:47:54AM -0700, sirisha chamarthi wrote:
> I am unable to connect to my Postgres server (version 13 running) in Azure
> Postgres from the PSQL client built on the latest master. However, I am
> able to connect to the Postgres 15 server running locally on the machine. I
> installed an earlier version of the PSQL client (v 12) and was able to
> connect to both the Azure PG instance as well as the local instance. Can
> this be a bug in the master? I tried looking at the server logs in Azure
> but couldn't get anything meaningful from those. Any tips on how I can
> debug psql client further?
>
> root@userspgdev:/usr/local/pgsql# psql -U postgres -h
> inst.postgres.database.azure.com -d postgres
> Password for user postgres:
> psql (12.9 (Ubuntu 12.9-0ubuntu0.20.04.1), server 13.6)
> WARNING: psql major version 12, server major version 13.
> Some psql features might not work.
> SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits:
> 256, compression: off)
> Type "help" for help.
>
> postgres=> \q
>
> bin/psql -U postgres -h inst.postgres.database.azure.com -d postgres
> psql: error: connection to server at "inst.postgres.database.azure.com"
> (20.116.167.xx), port 5432 failed: FATAL: no pg_hba.conf entry for host
> "20.125.61.xx", user "postgres", database "postgres", SSL off

It's hard to be sure without the pg_hba.conf file, but the most likely
explanation is that your remote server only accepts connections with SSL and
you haven't built your local binaries with SSL support.
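For anyone hitting the same symptom, the two usual fixes are building the client with SSL support, or (if unencrypted connections are acceptable) allowing non-SSL connections on the server side. A sketch, with placeholder addresses and paths; the configure flag shown is the PostgreSQL 14+ spelling, while older releases use --with-openssl:

```
# Build the client with SSL support so libpq can negotiate TLS:
./configure --with-ssl=openssl
make && make install

# Or, on the server, allow non-SSL connections from the client's address
# (placeholder CIDR) by using a "host" rather than "hostssl" record in
# pg_hba.conf:
# TYPE   DATABASE   USER       ADDRESS          METHOD
host     postgres   postgres   20.125.61.0/24   md5
```

The "SSL off" at the end of the FATAL message is the tell: the client attempted a non-SSL connection, and no "host" record matched it.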
Re: PG DOCS - logical replication filtering
PSA patch v9 which addresses most of Euler's review comments [1] -- [1] https://www.postgresql.org/message-id/1c78ebd4-b38d-4b5d-a6ea-d583efe87d97%40www.fastmail.com Kind Regards, Peter Smith. Fujitsu Australia v9-0001-Add-additional-documentation-for-row-filters.patch Description: Binary data
Unable to connect to Postgres13 server from psql client built on master
Hi hackers. I am unable to connect to my Postgres server (version 13 running) in Azure Postgres from the PSQL client built on the latest master. However, I am able to connect to the Postgres 15 server running locally on the machine. I installed an earlier version of the PSQL client (v 12) and was able to connect to both the Azure PG instance as well as the local instance. Can this be a bug in the master? I tried looking at the server logs in Azure but couldn't get anything meaningful from those. Any tips on how I can debug psql client further? My local server is running with trust authentication and the remote server is running with md5 in the pg_hba.conf. I am not sure if this changes the psql behavior somehow. root@userspgdev:/usr/local/pgsql# ./psql -U postgres -h inst.postgres.database.azure.com -d postgres bash: ./psql: No such file or directory root@userspgdev:/usr/local/pgsql# psql -U postgres -h inst.postgres.database.azure.com -d postgres Password for user postgres: psql (12.9 (Ubuntu 12.9-0ubuntu0.20.04.1), server 13.6) WARNING: psql major version 12, server major version 13. Some psql features might not work. SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off) Type "help" for help. postgres=> \q bin/psql -U postgres -h inst.postgres.database.azure.com -d postgres psql: error: connection to server at "inst.postgres.database.azure.com" (20.116.167.xx), port 5432 failed: FATAL: no pg_hba.conf entry for host "20.125.61.xx", user "postgres", database "postgres", SSL off Also, wondering why no error is emitted by the psql client when the connection attempt fails? Thanks, Sirisha
Re: MERGE bug report
On 2022-Apr-11, Richard Guo wrote: > At first I was wondering whether we need to also include vars used in > each action's targetlist, just as what we did for each action's qual. > Then later I realized parse_merge.c already did that. But now it looks > much better to process them two in preprocess_targetlist. Yeah. I pushed that. However, now EXPLAIN VERBOSE doesn't show the columns from the source relation in the Output line --- I think only those that are used as join quals are shown, thanks to distribute_quals_to_rels. I think it would be better to fix this. Maybe expanding the source target list earlier is called for, after all. I looked at transformUpdateStmt and siblings for inspiration, but came out blank. > A minor comment is that we can use list_concat_copy(list1, list2) > instead of list_concat(list_copy(list1), list2) for better efficiency. Thanks for that tip. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "La vida es para el que se aventura"
fix cost subqueryscan wrong parallel cost
The cost_subqueryscan function does not account for whether the path is
parallel, which can cause the planner to pick the wrong plan. From the
regress suite:

-- Incremental sort vs. set operations with varno 0
set enable_hashagg to off;
explain (costs off) select * from t union select * from t order by 1,3;

This changes the plan from

 Incremental Sort
   Sort Key: t.a, t.c
   Presorted Key: t.a
   ->  Unique
         ->  Sort
               Sort Key: t.a, t.b, t.c
               ->  Append
                     ->  Gather
                           Workers Planned: 2
                           ->  Parallel Seq Scan on t
                     ->  Gather
                           Workers Planned: 2
                           ->  Parallel Seq Scan on t t_1

to

 Incremental Sort
   Sort Key: t.a, t.c
   Presorted Key: t.a
   ->  Unique
         ->  Sort
               Sort Key: t.a, t.b, t.c
               ->  Gather
                     Workers Planned: 2
                     ->  Parallel Append
                           ->  Parallel Seq Scan on t
                           ->  Parallel Seq Scan on t t_1

Obviously the latter is less expensive.

bu...@sohu.com

fix-cost_subqueryscan-get-wrong-parallel-cost.patch
Description: Binary data
Re: Handle infinite recursion in logical replication setup
On Tue, Apr 12, 2022 at 10:26 AM Peter Smith wrote: > > On Thu, Apr 7, 2022 at 2:09 PM Peter Smith wrote: > > > > FYI, here is a test script that is using the current patch (v6) to > > demonstrate a way to share table data between different numbers of > > nodes (up to 5 of them here). > > > > The script starts off with just 2-way sharing (nodes N1, N2), > > then expands to 3-way sharing (nodes N1, N2, N3), > > then 4-way sharing (nodes N1, N2, N3, N4), > > then 5-way sharing (nodes N1, N2, N3, N4, N5). > > > > As an extra complication, for this test, all 5 nodes have different > > initial table data, which gets replicated to the others whenever each > > new node joins the existing share group. > > > > PSA. > > > > > Hi Vignesh. I had some problems getting the above test script working. > It was OK up until I tried to join the 5th node (N5) to the existing 4 > nodes. The ERROR was manifesting itself strangely because it appeared > that there was an index violation in the pg_subscription_rel catalog > even though AFAIK the N5 did not have any entries in it. > > > e.g. > 2022-04-07 09:13:28.361 AEST [24237] ERROR: duplicate key value > violates unique constraint "pg_subscription_rel_srrelid_srsubid_index" > 2022-04-07 09:13:28.361 AEST [24237] DETAIL: Key (srrelid, > srsubid)=(16384, 16393) already exists. > 2022-04-07 09:13:28.361 AEST [24237] STATEMENT: create subscription > sub51 connection 'port=7651' publication pub1 with > (subscribe_local_only=true,copy_data=force); > 2022-04-07 09:13:28.380 AEST [24237] ERROR: duplicate key value > violates unique constraint "pg_subscription_rel_srrelid_srsubid_index" > 2022-04-07 09:13:28.380 AEST [24237] DETAIL: Key (srrelid, > srsubid)=(16384, 16394) already exists. 
> 2022-04-07 09:13:28.380 AEST [24237] STATEMENT: create subscription
> sub52 connection 'port=7652' publication pub2 with
> (subscribe_local_only=true,copy_data=false);
> 2022-04-07 09:13:28.405 AEST [24237] ERROR: duplicate key value
> violates unique constraint "pg_subscription_rel_srrelid_srsubid_index"
> 2022-04-07 09:13:28.405 AEST [24237] DETAIL: Key (srrelid,
> srsubid)=(16384, 16395) already exists.
> 2022-04-07 09:13:28.405 AEST [24237] STATEMENT: create subscription
> sub53 connection 'port=7653' publication pub3 with
> (subscribe_local_only=true,copy_data=false);
> 2022-04-07 09:13:28.425 AEST [24237] ERROR: duplicate key value
> violates unique constraint "pg_subscription_rel_srrelid_srsubid_index"
> 2022-04-07 09:13:28.425 AEST [24237] DETAIL: Key (srrelid,
> srsubid)=(16384, 16396) already exists.
> 2022-04-07 09:13:28.425 AEST [24237] STATEMENT: create subscription
> sub54 connection 'port=7654' publication pub4 with
> (subscribe_local_only=true,copy_data=false);
> 2022-04-07 09:17:52.472 AEST [25852] ERROR: duplicate key value
> violates unique constraint "pg_subscription_rel_srrelid_srsubid_index"
> 2022-04-07 09:17:52.472 AEST [25852] DETAIL: Key (srrelid,
> srsubid)=(16384, 16397) already exists.
> 2022-04-07 09:17:52.472 AEST [25852] STATEMENT: create subscription
> sub51 connection 'port=7651' publication pub1;
>
> ~~~
>
> When I debugged this it seemed like each CREATE SUBSCRIPTION was trying
> to make a double-entry, because fetch_tables (your patch v6-0002 modified
> the SQL of this) was returning the same table 2x.
> > (gdb) bt > #0 errfinish (filename=0xbc1057 "nbtinsert.c", lineno=671, > funcname=0xbc25e0 <__func__.15798> "_bt_check_unique") at elog.c:510 > #1 0x00526d83 in _bt_check_unique (rel=0x7f654219c2a0, > insertstate=0x7ffd9629ddd0, heapRel=0x7f65421b0e28, > checkUnique=UNIQUE_CHECK_YES, is_unique=0x7ffd9629de01, > speculativeToken=0x7ffd9629ddcc) at nbtinsert.c:664 > #2 0x00526157 in _bt_doinsert (rel=0x7f654219c2a0, > itup=0x19ea8e8, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false, > heapRel=0x7f65421b0e28) at nbtinsert.c:208 > #3 0x0053450e in btinsert (rel=0x7f654219c2a0, > values=0x7ffd9629df10, isnull=0x7ffd9629def0, ht_ctid=0x19ea894, > heapRel=0x7f65421b0e28, checkUnique=UNIQUE_CHECK_YES, > indexUnchanged=false, indexInfo=0x19dea80) at nbtree.c:201 > #4 0x005213b6 in index_insert (indexRelation=0x7f654219c2a0, > values=0x7ffd9629df10, isnull=0x7ffd9629def0, heap_t_ctid=0x19ea894, > heapRelation=0x7f65421b0e28, checkUnique=UNIQUE_CHECK_YES, > indexUnchanged=false, indexInfo=0x19dea80) at indexam.c:193 > #5 0x005c81d5 in CatalogIndexInsert (indstate=0x19de540, > heapTuple=0x19ea890) at indexing.c:158 > #6 0x005c8325 in CatalogTupleInsert (heapRel=0x7f65421b0e28, > tup=0x19ea890) at indexing.c:231 > #7 0x005f0170 in AddSubscriptionRelState (subid=16400, > relid=16384, state=105 'i', sublsn=0) at pg_subscription.c:315 > #8 0x006d6fa5 in CreateSubscription (pstate=0x1942dc0, > stmt=0x191f6a0, isTopLevel=true) at subscriptioncmds.c:767 > > ~~ > > Aside: All this was happening when I did not have enough logical > replication workers configured. (There were WARNINGS in the logfile > that I had
Re: Skipping schema changes in publication
On Tue, Apr 12, 2022 at 11:53 AM vignesh C wrote:
>
> On Sat, Mar 26, 2022 at 7:37 PM vignesh C wrote:
> >
> > On Tue, Mar 22, 2022 at 12:38 PM vignesh C wrote:
> > >
> > > Hi,
> > >
> > > This feature adds an option to skip changes of all tables in a
> > > specified schema while creating a publication.
> > > This feature is helpful for use cases where the user wants to
> > > subscribe to all the changes except for the changes present in a few
> > > schemas.
> > > Ex:
> > > CREATE PUBLICATION pub1 FOR ALL TABLES SKIP ALL TABLES IN SCHEMA s1,s2;
> > > OR
> > > ALTER PUBLICATION pub1 ADD SKIP ALL TABLES IN SCHEMA s1,s2;
> > >
> > > A new column pnskip is added to table "pg_publication_namespace", to
> > > maintain the schemas that the user wants to skip publishing through
> > > the publication. Modified the output plugin (pgoutput) to skip
> > > publishing the changes if the relation is part of a skip schema
> > > publication.
> > > As a continuation to this, I will work on implementing skipping tables
> > > from all tables in schema and skipping tables from all tables
> > > publication.
> > >
> > > Attached patch has the implementation for this.
> >
> > The patch was not applying on top of HEAD because of the recent
> > commits, attached patch is rebased on top of HEAD.
>
> The patch does not apply on top of HEAD because of the recent commit,
> attached patch is rebased on top of HEAD.
>
> I have also included the implementation for skipping a few tables from
> all tables publication, the 0002 patch has the implementation for the
> same.
> This feature is helpful for use cases where the user wants to
> subscribe to all the changes except for the changes present in a few
> tables.
> Ex:
> CREATE PUBLICATION pub1 FOR ALL TABLES SKIP TABLE t1,t2;
> OR
> ALTER PUBLICATION pub1 ADD SKIP TABLE t1,t2;

For the second syntax (Alter Publication ...), isn't it better to avoid using
ADD? It looks odd to me because we are not adding anything to the publication
with this syntax.
-- With Regards, Amit Kapila.
Re: avoid multiple hard links to same WAL file after a crash
At Mon, 11 Apr 2022 09:52:57 -0700, Nathan Bossart wrote in
> On Mon, Apr 11, 2022 at 12:28:47PM -0400, Tom Lane wrote:
> > Robert Haas writes:
> >> On Mon, Apr 11, 2022 at 5:12 AM Kyotaro Horiguchi wrote:
> >>> If this diagnosis is correct, the comment is proved to be paranoid.
> >
> >> It's sometimes difficult to understand what problems really old code
> >> comments are worrying about. For example, could they have been
> >> worrying about bugs in the code? Could they have been worrying about
> >> manual interference with the pg_wal directory? It's hard to know.
> >
> > "git blame" can be helpful here, if you trace back to when the comment
> > was written and then try to find the associated mailing-list discussion.
> > (That leap can be difficult for commits pre-dating our current
> > convention of including links in the commit message, but it's usually
> > not *that* hard to locate contemporaneous discussion.)
>
> I traced this back a while ago. I believe the link() was first added in
> November 2000 as part of f0e37a8. This even predates WAL recycling, which
> was added in July 2001 as part of 7d4d5c0.

f0e37a8 lacks discussion; it introduced the CHECKPOINT command from somewhere
outside the ML. That patch changed XLogFileInit to support using existing
files, so that XLogWrite can use the new segment provided by checkpoint while
still allowing XLogWrite to create a new segment by itself. Just before the
commit, calls to XLogFileInit were protected (or serialized) by logwr_lck; as
of the commit, calls to the same function were still serialized by
ControlFileLockId. I *guess* that Vadim faced or noticed a race condition
when he added checkpoint, and thus introduced the link+remove protocol, but
it finally became useless once the call to XLogFileInit was moved within the
ControlFileLockId section. But, of course, all of this story is merely a
guess.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center