RE: refactoring basebackup.c
Hi, Hackers. Thank you for developing a great feature. The current help
message, shown below, implies that the 'client-' or 'server-' prefix
cannot be specified for lz4 compression:

  --compress = {[{client, server}-]gzip, lz4, none}[:LEVEL]

The attached small patch fixes the help message as follows:

  --compress = {[{client, server}-]{gzip, lz4}, none}[:LEVEL]

Regards,
Noriyoshi Shinoda

-----Original Message-----
From: Robert Haas
Sent: Saturday, February 12, 2022 12:50 AM
To: Justin Pryzby
Cc: Jeevan Ladhe; Dipesh Pandit; Abhijit Menon-Sen; Dmitry Dolgov
<9erthali...@gmail.com>; Jeevan Ladhe; Mark Dilger;
pgsql-hack...@postgresql.org; tushar
Subject: Re: refactoring basebackup.c

On Fri, Feb 11, 2022 at 10:29 AM Justin Pryzby wrote:
> FYI: there's a couple typos in the last 2 patches.

Hmm. OK. But I don't consider "can be optionally specified" incorrect or
worse than "can optionally be specified". I do agree that spelling words
correctly is a good idea.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment: pg_basebackup_help_v1.diff
Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
On Sat, Jan 29, 2022 at 8:42 PM Peter Geoghegan wrote:
> Attached is v7, a revision that overhauls the algorithm that decides
> what to freeze. I'm now calling it block-driven freezing in the commit
> message. Also included is a new patch, that makes VACUUM record zero
> free space in the FSM for an all-visible page, unless the total amount
> of free space happens to be greater than one half of BLCKSZ.

I pushed the earlier refactoring and instrumentation patches today.

Attached is v8. No real changes -- just a rebased version.

It will be easier to benchmark and test the page-driven freezing stuff
now, since the master/baseline case will now output instrumentation
showing how relfrozenxid has been advanced (if at all) -- whether (and to
what extent) each VACUUM operation advances relfrozenxid can now be
directly compared, just by monitoring the log_autovacuum_min_duration
output for a given table over time.

--
Peter Geoghegan

From 41136d2a8af434a095ce3e6dfdfbe4b48b9ec338 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan
Date: Sun, 23 Jan 2022 21:10:38 -0800
Subject: [PATCH v8 3/3] Add all-visible FSM heuristic.

When recording free space in an all-frozen page, record that the page has
zero free space when it has less than half BLCKSZ worth of space,
according to the traditional definition.  Otherwise record free space as
usual.

Making all-visible pages resistant to change like this can be thought of
as a form of hysteresis.  The page is given an opportunity to "settle" and
permanently stay in the same state when the tuples on the page will never
be updated or deleted.  But when they are updated or deleted, the page can
once again be used to store any tuple.  Over time, most pages tend to
settle permanently in many workloads, perhaps only on the second or third
attempt.
---
 src/backend/access/heap/vacuumlazy.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index ea4b75189..95049ed25 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1231,6 +1231,13 @@ lazy_scan_heap(LVRelState *vacrel, int nworkers)
 			 */
 			freespace = PageGetHeapFreeSpace(page);
 
+			/*
+			 * An all-visible page should not have its free space
+			 * available from FSM unless it's more than half empty
+			 */
+			if (PageIsAllVisible(page) && freespace < BLCKSZ / 2)
+				freespace = 0;
+
 			UnlockReleaseBuffer(buf);
 			RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
 			continue;
@@ -1368,6 +1375,13 @@ lazy_scan_heap(LVRelState *vacrel, int nworkers)
 		{
 			Size		freespace = PageGetHeapFreeSpace(page);
 
+			/*
+			 * An all-visible page should not have its free space available
+			 * from FSM unless it's more than half empty
+			 */
+			if (PageIsAllVisible(page) && freespace < BLCKSZ / 2)
+				freespace = 0;
+
 			UnlockReleaseBuffer(buf);
 			RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
 		}
@@ -2537,6 +2551,13 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
 		page = BufferGetPage(buf);
 		freespace = PageGetHeapFreeSpace(page);
 
+		/*
+		 * An all-visible page should not have its free space available from
+		 * FSM unless it's more than half empty
+		 */
+		if (PageIsAllVisible(page) && freespace < BLCKSZ / 2)
+			freespace = 0;
+
 		UnlockReleaseBuffer(buf);
 		RecordPageWithFreeSpace(vacrel->rel, tblk, freespace);
 		vacuumed_pages++;
-- 
2.30.2

From 4838bd1f11b748d2082caedfe4da506b8fe3f67a Mon Sep 17 00:00:00 2001
From: Peter Geoghegan
Date: Mon, 13 Dec 2021 15:00:49 -0800
Subject: [PATCH v8 2/3] Make block-level characteristics drive freezing.

Teach VACUUM to freeze all of the tuples on a page whenever it notices
that it would otherwise mark the page all-visible, without also marking
it all-frozen.  VACUUM won't freeze _any_ tuples on the page unless
_all_ tuples (that remain after pruning) are all-visible.
It may occasionally be necessary to freeze the page due to the presence
of a particularly old XID, from before VACUUM's FreezeLimit cutoff.  But
the FreezeLimit mechanism will seldom have any impact on which pages are
frozen anymore -- it is just a backstop now.  Freezing can now informally
be thought of as something that takes place at the level of an entire
page, or not at all -- differences in XIDs among tuples on the same page
are not interesting, barring extreme cases.  Freezing a page is now
practically synonymous with setting the page to all-visible in the
visibility map, at least to users.

The main upside of the new approach to freezing is that it makes the
overhead of vacuuming much more predictable over time.  We avoid the need
for large balloon payments, since the system no longer accumulates
"freezing debt" that can only be paid off by anti-wraparound vacuuming.
This seems to have been particularly troublesome with append-only tables,
especially in the common case where XIDs from pages that are marked
all-visible for the first
Re: pgsql: Add TAP test to automate the equivalent of check_guc
On Sat, Feb 12, 2022 at 09:49:47AM +0900, Michael Paquier wrote:
> I guess that it would be better to revert the TAP test and rework all
> that for the time being, then.

And this part is done.
--
Michael
Re: Teach pg_receivewal to use lz4 compression
On Fri, Feb 11, 2022 at 10:07:49AM -0500, Robert Haas wrote:
> Over in
> http://postgr.es/m/CA+TgmoYUDEJga2qV_XbAZ=pgebaosgfmzz6ac4_srwom_+u...@mail.gmail.com
> I was noticing that CreateWalTarMethod doesn't support LZ4
> compression. It would be nice if it did. I thought maybe the patch on
> this thread would fix that, but I think maybe it doesn't, because it
> looks like that's touching the WalDirectoryMethod part of that file,
> rather than the WalTarMethod part. Is that correct?

Correct.  pg_receivewal only cares about the directory method, so this
thread was limited to this part.  Yes, it would be nice to fully extend
the tar method of walmethods.c to support LZ4, but I was not sure what
needed to be done, and I am still not sure based on what has just been
done as of 751b8d23.

> And, on a related note, Michael, do you plan to get something
> committed here?

Apart from f79962d, ba5 and 50e1441, I don't think that there was
something left to do for this thread.  Perhaps I am missing something?
--
Michael
Re: Race condition in TransactionIdIsInProgress
Hi,

On 2022-02-11 16:41:24 -0800, Andres Freund wrote:
> FWIW, I've indeed reproduced this fairly easily with such a setup. A pgbench
> r/w workload that's been modified to start 70 savepoints at the start shows
>
> pgbench: error: client 22 script 0 aborted in command 12 query 0: ERROR:
> t_xmin 3853739 is uncommitted in tuple (2,159) to be updated in table
> "pgbench_branches"
> pgbench: error: client 13 script 0 aborted in command 12 query 0: ERROR:
> t_xmin 3954305 is uncommitted in tuple (2,58) to be updated in table
> "pgbench_branches"
> pgbench: error: client 7 script 0 aborted in command 12 query 0: ERROR:
> t_xmin 4017908 is uncommitted in tuple (3,44) to be updated in table
> "pgbench_branches"
>
> after a few minutes of running with a local, not slowed down, syncrep.
> Without any other artificial slowdowns or such.

And this can easily be triggered even without subtransactions, in a
completely reliable way.  The only reason I'm so far not succeeding in
turning it into an isolationtester spec is that a transaction waiting for
SyncRep doesn't count as waiting for isolationtester.

Basically

S1: BEGIN; $xid = txid_current(); UPDATE; COMMIT;
S2: SELECT pg_xact_status($xid);
S2: UPDATE;

suffices, because the pg_xact_status() causes an xlog fetch, priming the
xid cache, which then causes the TransactionIdIsInProgress() to take the
early return path, despite the transaction still being in progress.
Which then allows the update to proceed, despite S1 not having "properly
committed" yet.

Greetings,

Andres Freund
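[Editor's note: the hazard described above can be sketched as a toy model. This is illustrative Python, not PostgreSQL code; the class and method names are made up, and the model only captures the ordering Andres describes: commit is visible in the clog before the transaction leaves the proc array, and a cached clog answer lets an in-progress check return "false" too early.]

```python
class ToyState:
    """Toy model of the clog / proc-array / xid-cache interaction."""

    def __init__(self):
        self.clog_committed = set()  # TransactionIdCommitTree() has run
        self.proc_array = set()      # xact still advertised as running
        self.xid_cache = {}          # per-backend cache primed by clog reads

    def start(self, xid):
        self.proc_array.add(xid)

    def commit_tree(self, xid):
        # Commit becomes durable in the clog first...
        self.clog_committed.add(xid)

    def proc_array_end(self, xid):
        # ...and only later (e.g. after SyncRepWaitForLSN) does the
        # transaction leave the proc array.
        self.proc_array.discard(xid)

    def xact_status(self, xid):
        # pg_xact_status()-style lookup: reads the clog and primes the cache.
        status = "committed" if xid in self.clog_committed else "in progress"
        self.xid_cache[xid] = status
        return status

    def is_in_progress(self, xid):
        # Buggy fast path: trust the cached clog answer without consulting
        # the proc array, mirroring the early-return path described above.
        if self.xid_cache.get(xid) == "committed":
            return False
        return xid in self.proc_array


s = ToyState()
s.start(100)
s.commit_tree(100)   # committed in clog, still waiting for syncrep
s.xact_status(100)   # S2 primes its xid cache
# S2 now wrongly sees xid 100 as no longer in progress, even though it
# has not yet left the proc array:
assert s.is_in_progress(100) is False
assert 100 in s.proc_array
```

The point of the sketch is only the ordering: any check that short-circuits on the cached clog state can race against a commit that is durable but not yet removed from the proc array.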
Re: Replacing TAP test planning with done_testing()
On Sat, Feb 12, 2022 at 4:28 AM, Alvaro Herrera wrote:
> On 2022-Feb-11, Tom Lane wrote:
>
> > Andres Freund writes:
> > > +1 on backpatching. Backpatching tests now is less likely to cause
> > > conflicts, but more likely to fail during tests.
> >
> > If you've got the energy to do it, +1 for backpatching.  I agree
> > with Michael's opinion that doing so will probably save us
> > future annoyance.
>
> +1

+1
Re: PGroonga index-only scan problem with yesterday’s PostgreSQL updates
On 2/11/22 15:57, Tom Lane wrote:
> Possibly same issue I just fixed?
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e5691cc9170bcd6c684715c2755d919c5a16fea2

Yeah, that does seem to fix my test cases. Thanks!

Any chance this will make it into minor releases sooner than three months
from now?  I know extra minor releases are unusual, but this regression
will be critical at least for self-hosted Zulip users and presumably
other PGroonga users.

Anders
Re: pgsql: Add TAP test to automate the equivalent of check_guc
On Fri, Feb 11, 2022 at 12:34:31PM -0500, Tom Lane wrote:
> Justin Pryzby writes:
>> Or, is it okay to use ABS_SRCDIR?
>
> Don't see why not --- other tests certainly do.

Another solution would be to rely on TESTDIR, which is always set as of
the Makefile invocations and vcregress.pl, but that's doomed for VPATH
builds as this points to the build directory, not the source directory,
so we would not have access to the sample file.  The root issue here is
that we don't have access to top_srcdir when running the TAP tests, if we
want to keep this stuff as a TAP test.  libpq's regress.pl tweaks things
by passing down a SRCDIR to take care of the VPATH problem, but there is
nothing in %ENV that points to the source directory in the context of a
TAP test by default, if we cannot rely on pg_config to find where the
sample file is located.  That's different than this thread, but should we
make an effort and expand the data available here for TAP tests?  Hmm.

Relying on ABS_SRCDIR in the main test suite would be fine, though.  It
also looks that there would be nothing preventing the addition of the
three tests in the TAP test, as three SQL queries instead.  These had
better be written with one or more CTEs, for readability, at least, with
the inner query reading the contents of the sample file to grab the list
of all the GUCs.

I guess that it would be better to revert the TAP test and rework all
that for the time being, then.
--
Michael
Re: pgsql: Add TAP test to automate the equivalent of check_guc
On Fri, Feb 11, 2022 at 10:48:11AM +0100, Christoph Berg wrote:
> It never caused any problem in the 12+ years we have been doing this,
> but Debian is patching pg_config not to be relocatable since we are
> installing into /usr/lib/postgresql/NN /usr/share/postgresql/NN, so
> not a single prefix.
>
> https://salsa.debian.org/postgresql/postgresql/-/blob/15/debian/patches/50-per-version-dirs.patch

Wow.  This is the way for Debian to bypass the fact that ./configure is
itself patched, hence you cannot rely on things like --libdir, --bindir
and the kind at build time?  That's.. Err.. Fancy, I'd say.
--
Michael
Re: Race condition in TransactionIdIsInProgress
Hi,

On 2022-02-10 22:11:38 -0800, Andres Freund wrote:
> Looks like syncrep will make this a lot worse, because it can drastically
> increase the window between the TransactionIdCommitTree() and
> ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween. But at
> least it might make it easier to write tests exercising this scenario...

FWIW, I've indeed reproduced this fairly easily with such a setup.  A
pgbench r/w workload that's been modified to start 70 savepoints at the
start shows

pgbench: error: client 22 script 0 aborted in command 12 query 0: ERROR:
t_xmin 3853739 is uncommitted in tuple (2,159) to be updated in table
"pgbench_branches"
pgbench: error: client 13 script 0 aborted in command 12 query 0: ERROR:
t_xmin 3954305 is uncommitted in tuple (2,58) to be updated in table
"pgbench_branches"
pgbench: error: client 7 script 0 aborted in command 12 query 0: ERROR:
t_xmin 4017908 is uncommitted in tuple (3,44) to be updated in table
"pgbench_branches"

after a few minutes of running with a local, not slowed down, syncrep,
without any other artificial slowdowns or such.

Greetings,

Andres Freund
Re: logical decoding and replication of sequences
On 2/10/22 19:17, Tomas Vondra wrote:
> I've polished & pushed the first part adding sequence decoding
> infrastructure etc. Attached are the two remaining parts.
>
> I plan to wait a day or two and then push the test_decoding part. The
> last part (for built-in replication) will need more work and maybe
> rethinking the grammar etc.

I've pushed the second part, adding sequences to test_decoding.  Here's
the remaining part, rebased, with a small tweak in the TAP test to
eliminate the issue with not waiting for sequence increments.  I've kept
the tweak in a separate patch, so that we can throw it away easily if we
happen to resolve the issue.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From c456270469cdb6ad769455eb3d16aea6db2c02af Mon Sep 17 00:00:00 2001
From: Tomas Vondra
Date: Thu, 10 Feb 2022 15:18:59 +0100
Subject: [PATCH 1/2] Add support for decoding sequences to built-in
 replication

---
 doc/src/sgml/catalogs.sgml                  |  71
 doc/src/sgml/ref/alter_publication.sgml     |  24 +-
 doc/src/sgml/ref/alter_subscription.sgml    |   4 +-
 src/backend/catalog/pg_publication.c        | 149 -
 src/backend/catalog/system_views.sql        |  10 +
 src/backend/commands/publicationcmds.c      | 350 +++-
 src/backend/commands/sequence.c             |  79 +
 src/backend/commands/subscriptioncmds.c     | 272 +++
 src/backend/executor/execReplication.c      |   2 +-
 src/backend/nodes/copyfuncs.c               |   1 +
 src/backend/nodes/equalfuncs.c              |   1 +
 src/backend/parser/gram.y                   |  32 ++
 src/backend/replication/logical/proto.c     |  52 +++
 src/backend/replication/logical/tablesync.c | 118 ++-
 src/backend/replication/logical/worker.c    |  60
 src/backend/replication/pgoutput/pgoutput.c |  85 -
 src/backend/utils/cache/relcache.c          |   4 +-
 src/bin/psql/tab-complete.c                 |  14 +-
 src/include/catalog/pg_proc.dat             |   5 +
 src/include/catalog/pg_publication.h        |  14 +
 src/include/commands/sequence.h             |   1 +
 src/include/nodes/parsenodes.h              |   6 +
 src/include/replication/logicalproto.h      |  19 ++
 src/include/replication/pgoutput.h          |   1 +
src/test/regress/expected/rules.out | 8 + src/test/subscription/t/028_sequences.pl| 196 +++ 26 files changed, 1542 insertions(+), 36 deletions(-) create mode 100644 src/test/subscription/t/028_sequences.pl diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 879d2dbce03..271dc03e5a2 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -9540,6 +9540,11 @@ SCRAM-SHA-256$iteration count: prepared transactions + + pg_publication_sequences + publications and their associated sequences + + pg_publication_tables publications and their associated tables @@ -11375,6 +11380,72 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx + + pg_publication_sequences + + + pg_publication_sequences + + + + The view pg_publication_sequences provides + information about the mapping between publications and the sequences they + contain. Unlike the underlying catalog + pg_publication_rel, + this view expands + publications defined as FOR ALL SEQUENCES, so for such + publications there will be a row for each eligible sequence. + + + + pg_publication_sequences Columns + + + + + Column Type + + + Description + + + + + + + + pubname name + (references pg_publication.pubname) + + + Name of publication + + + + + + schemaname name + (references pg_namespace.nspname) + + + Name of schema containing sequence + + + + + + sequencename name + (references pg_class.relname) + + + Name of sequence + + + + + + + pg_publication_tables diff --git a/doc/src/sgml/ref/alter_publication.sgml b/doc/src/sgml/ref/alter_publication.sgml index 7c7c27bf7ce..9da8274ae2c 100644 --- a/doc/src/sgml/ref/alter_publication.sgml +++ b/doc/src/sgml/ref/alter_publication.sgml @@ -31,7 +31,9 @@ ALTER PUBLICATION name RENAME TO where publication_object is one of: TABLE [ ONLY ] table_name [ * ] [, ... ] +SEQUENCE sequence_name [ * ] [, ... ] ALL TABLES IN SCHEMA { schema_name | CURRENT_SCHEMA } [, ... ] +ALL SEQUENCES IN SCHEMA { schema_name | CURRENT_SCHEMA } [, ... 
] @@ -56,7 +58,18 @@ ALTER PUBLICATION name RENAME TO - The fourth variant of this command listed in the synopsis can change + The next three variants change which sequences are part of the publication. + The SET SEQUENCE clause will replace the list of sequences + in the publication with the
Re: PGroonga index-only scan problem with yesterday’s PostgreSQL updates
Anders Kaseorg writes: > With yesterday’s release of PostgreSQL 11.15, 12.10, and 13.6 > (presumably 10.20 and 14.2 as well), Zulip’s test suite started failing > with “variable not found in subplan target list” errors from PostgreSQL > on a table that has a PGroonga index. Possibly same issue I just fixed? https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e5691cc9170bcd6c684715c2755d919c5a16fea2 regards, tom lane
PGroonga index-only scan problem with yesterday’s PostgreSQL updates
With yesterday’s release of PostgreSQL 11.15, 12.10, and 13.6 (presumably
10.20 and 14.2 as well), Zulip’s test suite started failing with
“variable not found in subplan target list” errors from PostgreSQL on a
table that has a PGroonga index.  I found the following reproduction
recipe from a fresh database:

psql (11.15 (Debian 11.15-1.pgdg100+1))
Type "help" for help.

test=# CREATE EXTENSION pgroonga;
CREATE EXTENSION
test=# CREATE TABLE t AS SELECT CAST(c AS text) FROM generate_series(1, 1) AS c;
SELECT 1
test=# CREATE INDEX t_c ON t USING pgroonga (c);
CREATE INDEX
test=# VACUUM t;
VACUUM
test=# SELECT COUNT(*) FROM t;
ERROR:  variable not found in subplan target list

I filed https://github.com/pgroonga/pgroonga/issues/203, and confirmed
with a Git bisection of PostgreSQL that this error first appears with
ec36745217 (REL_11_15~42) “Fix index-only scan plans, take 2.”

I’m aware that this likely just exposed a previously hidden PGroonga bug,
but I figure PostgreSQL developers might want to know about this anyway
and help come up with the right fix.  The PGroonga author suggested I
start a thread here:
https://github.com/pgroonga/pgroonga/issues/203#issuecomment-1036708841

Thanks for investigating this!  Our CI is also broken with new
PostgreSQL:
https://github.com/pgroonga/pgroonga/runs/5149762901?check_suite_focus=true

https://www.postgresql.org/message-id/602391641208390%40iva4-92c901fae84c.qloud-c.yandex.net
says partial-retrieval index-only scan, but our case is a non-retrievable
index-only scan.  In the non-retrievable index-only scan case, the error
occurs.  We asked about non-retrievable index-only scans on the
PostgreSQL mailing list in the past:
https://www.postgresql.org/message-id/5148.1584372043%40sss.pgh.pa.us
We thought a non-retrievable index-only scan should not be used, but
PostgreSQL may use it as a valid plan.
So I think that our case should be supported with postgres/postgres@ec36745
“Fix index-only scan plans, take 2.”  Could you ask about this case on
the PostgreSQL mailing list https://www.postgresql.org/list/pgsql-hackers/ ?
The following patch fixes our case and PostgreSQL's test cases still
pass, but a review by the original author is needed:

--- postgresql-11.15.orig/src/backend/optimizer/plan/setrefs.c	2022-02-08 06:20:23.0 +0900
+++ postgresql-11.15/src/backend/optimizer/plan/setrefs.c	2022-02-12 07:32:20.355567317 +0900
@@ -1034,7 +1034,7 @@
 	{
 		TargetEntry *indextle = (TargetEntry *) lfirst(lc);
 
-		if (!indextle->resjunk)
+		if (!indextle->resjunk || indextle->expr->type == T_Var)
 			stripped_indextlist = lappend(stripped_indextlist, indextle);
 	}

Anders
Re: support for MERGE
On Fri, Feb 11, 2022 at 05:25:49PM -0300, Alvaro Herrera wrote:
> > I'm not sure git diff --cherry-pick is widely known/used, but I think
> > using that relative to master may be good enough.
>
> I had never heard of git diff --cherry-pick, and the manpages I found
> don't document it, so frankly I doubt it's known.  I still have no idea
> what does it do.

See git-log(1):

    --cherry-pick
        Omit any commit that introduces the same change as another commit
        on the “other side” when the set of commits are limited with
        symmetric difference.

The "symmetric difference" is the triple-dot notation.

The last few years I've used this to check for missing bits in the draft
release notes.  (Actually, I tend to start my own list of features before
that.)  It's doing a generic version of what git_changelog does.
https://www.postgresql.org/message-id/20210510144045.gc27...@telsasoft.com

> I suppose there is an obvious reason why using git diff with
> $(git merge-base ...) as one of the arguments doesn't work for these
> purposes.
>
> > Andres thinks that does the wrong thing if CI is run manually (not by
> > CFBOT) for patches against backbranches.
>
> I wonder if it's sufficient to handle these things (coverage
> specifically) for branch master only.

Or default to master, and maybe try to parse the commit message and pull
out Backpatch-through: NN.  It's intended to be machine-readable, after
all.  Let's talk about it more on the/another thread :)

--
Justin
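[Editor's note: for readers unfamiliar with the option, what `--cherry-pick` computes over a symmetric difference can be sketched as a toy set model. This is illustrative Python, not git itself; the commit names and "patch-ids" are made up, and real git compares diffs via `git patch-id` rather than strings.]

```python
# Each branch maps commit-id -> patch-id.  "c0" is shared history; the
# same fix was committed independently ("back-patched") on both sides,
# so it has different commit-ids but the same patch-id.
master = {"c0": "pid-base", "m1": "pid-fix", "m2": "pid-master-only"}
branch = {"c0": "pid-base", "b1": "pid-fix", "b2": "pid-branch-only"}

# Triple-dot (symmetric difference): commits reachable from exactly one side.
only_master = {c: p for c, p in master.items() if c not in branch}
only_branch = {c: p for c, p in branch.items() if c not in master}

# --cherry-pick: additionally drop commits that introduce the same change
# (same patch-id) as a commit on the other side.
cherry = {c for c, p in only_master.items() if p not in only_branch.values()}
cherry |= {c for c, p in only_branch.items() if p not in only_master.values()}

# The duplicated "fix" (m1/b1) drops out; only genuinely one-sided
# commits remain -- which is why this is handy for spotting missing bits.
assert sorted(cherry) == ["b2", "m2"]
```

This is the same filtering Justin uses to find commits present on one branch but genuinely absent from the other.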
Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
On 2/6/22 10:45, Robert Haas wrote: > For what it's worth, I am generally in favor of having something like > this in PostgreSQL. I think it's wrong of us to continue assuming that > everyone has command-line access. Even when that's true, it's not > necessarily convenient. If you choose to use a relational database, > you may be the sort of person who likes SQL. Almost completely off topic, but this reminded me of an incident about 30 years ago at my first gig as an SA/DBA. There was an application programmer who insisted on loading a set of values from a text file into a temp table (it was Ingres, anyone remember that?). Why? Because he knew how to write "Select * from mytable order by mycol" but didn't know how to drive the Unix sort utility at the command line. When I was unable to restrain myself from smiling at this he got very angry and yelled at me loudly. So, yes, some people do like SQL and hate the command line. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Fix overflow in DecodeInterval
Tom Lane writes:
> Joseph Koshakow writes:
> > The attached patch fixes an overflow bug in DecodeInterval when
> > applying the units week, decade, century, and millennium. The overflow
> > check logic was modelled after the overflow check at the beginning of
> > `int tm2interval(struct pg_tm *tm, fsec_t fsec, Interval *span);` in
> > timestamp.c.
>
> Good catch, but I don't think that tm2interval code is best practice
> anymore.  Rather than bringing "double" arithmetic into the mix,
> you should use the overflow-detecting arithmetic functions in
> src/include/common/int.h.  The existing code here is also pretty
> faulty in that it doesn't notice addition overflow when combining
> multiple units.  So for example, instead of
>
>     tm->tm_mday += val * 7;
>
> I think we should write something like
>
>     if (pg_mul_s32_overflow(val, 7, &tmp))
>         return DTERR_FIELD_OVERFLOW;
>     if (pg_add_s32_overflow(tm->tm_mday, tmp, &tm->tm_mday))
>         return DTERR_FIELD_OVERFLOW;
>
> Perhaps some macros could be used to make this more legible?
>
>     regards, tom lane

Thanks for the feedback Tom, I've attached an updated patch with your
suggestions.  Feel free to rename the horribly named macro.

Also while fixing this I noticed that fractional intervals can also
cause an overflow issue:

postgres=# SELECT INTERVAL '0.1 months 2147483647 days';
     interval
------------------
 -2147483646 days
(1 row)

I haven't looked into it, but it's probably a similar cause.

From 49b5beb4132a0c9e793e1fd7b91a4da0c96a6196 Mon Sep 17 00:00:00 2001
From: Joseph Koshakow
Date: Fri, 11 Feb 2022 14:18:32 -0500
Subject: [PATCH] Check for overflow when decoding an interval

When decoding an interval there are various date units which are aliases
for some multiple of another unit.  For example a week is 7 days and a
decade is 10 years.  When decoding these specific units, there is no
check for overflow, allowing the interval to overflow.  This commit adds
an overflow check for all of these units.
---
 src/backend/utils/adt/datetime.c       | 22 ++
 src/test/regress/expected/interval.out | 32 ++
 src/test/regress/sql/interval.sql      |  8 +++
 3 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 7926258c06..59d391adda 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -21,6 +21,7 @@
 #include "access/htup_details.h"
 #include "access/xact.h"
 #include "catalog/pg_type.h"
+#include "common/int.h"
 #include "common/string.h"
 #include "funcapi.h"
 #include "miscadmin.h"
@@ -3106,6 +3107,19 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
 	int			val;
 	double		fval;
 
+	/*
+	 * Multiply src by mult and add it to dest while checking for overflow.
+	 */
+#define SAFE_MUL_ADD(src, mult, dst) \
+	do \
+	{ \
+		int tmp; \
+		if (pg_mul_s32_overflow(src, mult, &tmp)) \
+			return DTERR_FIELD_OVERFLOW; \
+		if (pg_add_s32_overflow(dst, tmp, &(dst))) \
+			return DTERR_FIELD_OVERFLOW; \
+	} while (0)
+
 	*dtype = DTK_DELTA;
 	type = IGNORE_DTF;
 	ClearPgTm(tm, fsec);
@@ -3293,7 +3307,7 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
 				break;
 
 			case DTK_WEEK:
-				tm->tm_mday += val * 7;
+				SAFE_MUL_ADD(val, 7, tm->tm_mday);
 				AdjustFractDays(fval, tm, fsec, 7);
 				tmask = DTK_M(WEEK);
 				break;
@@ -3311,19 +3325,19 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
 				break;
 
 			case DTK_DECADE:
-				tm->tm_year += val * 10;
+				SAFE_MUL_ADD(val, 10, tm->tm_year);
 				tm->tm_mon += rint(fval * MONTHS_PER_YEAR * 10);
 				tmask = DTK_M(DECADE);
 				break;
 
 			case DTK_CENTURY:
-				tm->tm_year += val * 100;
+				SAFE_MUL_ADD(val, 100, tm->tm_year);
 				tm->tm_mon += rint(fval * MONTHS_PER_YEAR * 100);
 				tmask = DTK_M(CENTURY);
 				break;
 
 			case DTK_MILLENNIUM:
-				tm->tm_year += val * 1000;
+				SAFE_MUL_ADD(val, 1000, tm->tm_year);
 				tm->tm_mon += rint(fval * MONTHS_PER_YEAR * 1000);
 				tmask = DTK_M(MILLENNIUM);
 				break;
diff --git a/src/test/regress/expected/interval.out b/src/test/regress/expected/interval.out
index accd4a7d90..88865483fa 100644
--- a/src/test/regress/expected/interval.out
+++ b/src/test/regress/expected/interval.out
@@ -232,6 +232,38 @@ INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('-2147483648 years');
 ERROR:  interval out of range
 LINE 1: INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('-2147483648 years'...
                                                  ^
+INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('2147483647 weeks');
+ERROR:  interval field value out of range: "2147483647 weeks"
+LINE 1: INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('2147483647 weeks')...
+                                                 ^
+INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('-2147483648 weeks');
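[Editor's note: the semantics of the SAFE_MUL_ADD pattern above can be sketched outside the tree. This is an illustrative Python model, not the PostgreSQL macros: `pg_mul_s32_overflow()`/`pg_add_s32_overflow()` report overflow instead of wrapping, and Python integers never overflow, so the int32 bounds are checked explicitly here.]

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1

def mul_add_s32(dst, src, mult):
    """Model dst += src * mult with int32 overflow checks.

    Returns (overflowed, new_dst); on overflow, dst is left unchanged,
    mirroring the macro's early return of DTERR_FIELD_OVERFLOW.
    """
    tmp = src * mult
    if not INT32_MIN <= tmp <= INT32_MAX:
        return True, dst          # multiplication overflowed
    total = dst + tmp
    if not INT32_MIN <= total <= INT32_MAX:
        return True, dst          # addition overflowed when combining units
    return False, total

# DTK_WEEK case from the patch: tm_mday += val * 7, checked.
assert mul_add_s32(0, 3, 7) == (False, 21)

# '2147483647 weeks' overflows the multiplication, as in the new
# regression test above:
assert mul_add_s32(0, INT32_MAX, 7)[0] is True

# Tom's point about combining multiple units: the addition can overflow
# even when the multiplication does not.
assert mul_add_s32(INT32_MAX, 1, 1)[0] is True
```

The second check is what the pre-patch code silently missed: `tm->tm_mday += val * 7` wraps on both the multiply and the accumulate.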
Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
On Thu, Jan 6, 2022 at 10:23 AM Robert Haas wrote: > That's fixed now. So what should we do about this patch? This is a > bug, so it would be nice to do *something*. I don't really like the > fact that this makes the behavior contingent on USE_ASSERT_CHECKING, > and I suggest that you make a new symbol like USE_BARRIER_SMGR_RELEASE > which by default gets defined on WIN32, but can be defined elsewhere > if you want (see the treatment of EXEC_BACKEND in pg_config_manual.h). Ok, done like that. > Furthermore, I can't see back-patching this, given that it would be > the very first use of the barrier machinery. But I think it would be > good to get something into master, because then we'd actually be using > this procsignalbarrier stuff for something. On a good day we've fixed > a bug. On a bad day we'll learn something new about how > procsignalbarrier needs to work. Agreed. Pushed. The basic Windows/tablespace bug seen occasionally in CI[1] should now be fixed. For the sake of the archives, here's a link to the ongoing discussion about further potential uses of this mechanism: https://www.postgresql.org/message-id/flat/20220209220004.kb3dgtn2x2k2gtdm%40alap3.anarazel.de [1] https://www.postgresql.org/message-id/CA%2BhUKGJp-m8uAD_wS7%2BdkTgif013SNBSoJujWxvRUzZ1nkoUyA%40mail.gmail.com
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Fri, Feb 11, 2022 at 4:08 PM Alvaro Herrera wrote: > It seems you're thinking deciding what to do based on an option that > gets a boolean argument. But what about making the argument be an enum? > For example > > CREATE DATABASE ... WITH (STRATEGY = LOG); -- default if option is > omitted > CREATE DATABASE ... WITH (STRATEGY = CHECKPOINT); > > So the user has to think about it in terms of some strategy to choose, > rather than enabling or disabling some flag with nontrivial > implications. I don't like those particular strategy names very much, but in general I think that could be a way to go, too. I somewhat hope we never end up with THREE strategies for creating a new database, but now that I think about it, we might. Somebody might want to use a fancy FS primitive that clones a directory at the FS level, or something. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Merging statistics from children instead of re-sampling everything
On Wed, Jun 30, 2021 at 11:15 AM Tomas Vondra wrote: > You're right maintaining a per-partition samples and merging those might > solve (or at least reduce) some of the problems, e.g. eliminating most > of the I/O that'd be needed for sampling. And yeah, it's not entirely > clear how to merge some of the statistics types (like ndistinct). But > for a lot of the basic stats it works quite nicely, I think. It feels like you might in some cases get very different answers. Let's say you have 1000 partitions. In each of those partitions, there is a particular value that appears in column X in 50% of the rows. This value differs for every partition. So you can imagine for example that in partition 1, X = 1 with probability 50%; in partition 2, X = 2 with probability 50%, etc. There is also a value, let's say 0, which appears in 0.5% of the rows in every partition. It seems possible that 0 is not an MCV in any partition, or in only some of them, but it might be more common overall than the #1 MCV of any single partition. -- Robert Haas EDB: http://www.enterprisedb.com
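[Editor's note: Robert's hypothetical can be made concrete with a little arithmetic. This is an illustration with made-up numbers, not a PostgreSQL implementation; the per-partition row count is an assumption for concreteness.]

```python
# 1000 partitions; in each, some partition-specific value appears in 50%
# of rows, while value 0 appears in 0.5% of the rows of *every* partition.
n_partitions = 1000
rows_per_partition = 10_000  # assumed size

# Count of a single partition's #1 MCV (a value unique to that partition):
top_mcv_count = rows_per_partition // 2          # 5,000 rows, one partition

# Value 0 is far below the MCV cutoff in each partition...
zero_per_partition = rows_per_partition // 200   # 0.5% -> 50 rows

# ...yet globally it dwarfs any per-partition #1 MCV:
zero_count = n_partitions * zero_per_partition   # 50,000 rows
assert zero_count == 10 * top_mcv_count

# So merging per-partition MCV lists can miss the single most common
# value of the whole table, which sampling across partitions would find.
```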
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On 2/11/22 15:47, Robert Haas wrote: > On Fri, Feb 11, 2022 at 3:40 PM Andrew Dunstan wrote: >> I'm not really sure any single parameter name is going to capture the >> subtlety involved here. > I mean to some extent that's inevitable, but it's not a reason not to > do the best we can. True. I do think we should be wary of any name starting with "LOG", though. Long experience tells us that's something that confuses users when it refers to the WAL. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On 2022-Feb-11, Robert Haas wrote: > What I find difficult about doing that is that this is all a bunch of > technical details that users may have difficulty understanding. If we > say WAL_LOG or WAL_LOG_DATA, a reasonably but not incredibly > well-informed user will assume that skipping WAL is not really an > option. If we say CHECKPOINT, a reasonably but not incredibly > well-informed user will presume they don't want one (I think). > CHECKPOINT also seems like it's naming the switch by the unwanted side > effect, which doesn't seem too flattering to the existing method. It seems you're thinking of deciding what to do based on an option that gets a boolean argument. But what about making the argument be an enum? For example CREATE DATABASE ... WITH (STRATEGY = LOG); -- default if option is omitted CREATE DATABASE ... WITH (STRATEGY = CHECKPOINT); So the user has to think about it in terms of some strategy to choose, rather than enabling or disabling some flag with nontrivial implications. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "[PostgreSQL] is a great group; in my opinion it is THE best open source development communities in existence anywhere."(Lamar Owen)
Re: Per-table storage parameters for TableAM/IndexAM extensions
On Wed, Dec 29, 2021 at 12:08 PM Sadhuprasad Patro wrote: > Open Question: When a user changes AM, then what should be the > behavior for not supported storage options? Should we drop the options > and go with only system storage options? > Or throw an error, in which case the user has to clean the added parameters. A few people have endorsed the error behavior here but I foresee problems. Imagine that I am using the "foo" tableam with "compression=lots" and I want to switch to the "bar" AM which does not support that option. If I remove the "compression=lots" option using a separate command, the "foo" table AM may rewrite my whole table and decompress everything. Then when I convert to the "bar" AM it's going to have to be rewritten again. That's painful. I clearly need some way to switch AMs without having to rewrite the table twice. It's also interesting to consider the other direction. If I am switching from "bar" to "foo" I would really like to be able to add the "compression=lots" option at the same time I make the switch. There needs to be some syntax for that. One way to solve the first of these problem is to silently drop unsupported options. Maybe a better way is to have syntax that allows you to specify options to be added and removed at the time you switch AMs e.g.: ALTER TABLE mytab SET ACCESS METHOD bar OPTIONS (DROP compression); ALTER TABLE mytab SET ACCESS METHOD foo OPTIONS (ADD compression 'lots'); I don't like that particular syntax a ton personally but it does match what we already use for ALTER SERVER. Unfortunately it's wildly inconsistent with what we do for ALTER TABLE. Another idea might be something like: ALTER TABLE mytab SET ACCESS METHOD bar RESET compression; ALTER TABLE mytab SET ACCESS METHOD foo SET compression = 'lots'; You'd need to be able to do multiple things with one command e.g. 
ALTER TABLE mytab SET ACCESS METHOD baz RESET compression, SET preferred_fruit = 'banana'; Regardless of the details, I don't think it's viable to just say, well, rewrite the table multiple times if that's what it takes. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Fix overflow in DecodeInterval
Joseph Koshakow writes: > The attached patch fixes an overflow bug in DecodeInterval when applying > the units week, decade, century, and millennium. The overflow check logic > was modelled after the overflow check at the beginning of `int > tm2interval(struct pg_tm *tm, fsec_t fsec, Interval *span);` in timestamp.c. Good catch, but I don't think that tm2interval code is best practice anymore. Rather than bringing "double" arithmetic into the mix, you should use the overflow-detecting arithmetic functions in src/include/common/int.h. The existing code here is also pretty faulty in that it doesn't notice addition overflow when combining multiple units. So for example, instead of tm->tm_mday += val * 7; I think we should write something like if (pg_mul_s32_overflow(val, 7, &tmp)) return DTERR_FIELD_OVERFLOW; if (pg_add_s32_overflow(tm->tm_mday, tmp, &tm->tm_mday)) return DTERR_FIELD_OVERFLOW; Perhaps some macros could be used to make this more legible? regards, tom lane
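The pattern Tom suggests can be tried outside the backend. The real helpers (pg_mul_s32_overflow and friends) live in src/include/common/int.h; the stand-ins below — the names mul_s32_overflow, add_s32_overflow, and add_weeks are invented for this sketch — build the same idea on the GCC/Clang __builtin_*_overflow primitives:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the overflow-detecting helpers in common/int.h, built
 * on the GCC/Clang __builtin_*_overflow primitives.  Like the real
 * ones, they return true when the operation overflows. */
static bool
mul_s32_overflow(int32_t a, int32_t b, int32_t *result)
{
	return __builtin_mul_overflow(a, b, result);
}

static bool
add_s32_overflow(int32_t a, int32_t b, int32_t *result)
{
	return __builtin_add_overflow(a, b, result);
}

/* The suggested DTK_WEEK logic: report failure (the backend would
 * return DTERR_FIELD_OVERFLOW) instead of silently wrapping.  Note the
 * second check, which catches overflow when combining multiple units. */
static bool
add_weeks(int32_t *tm_mday, int32_t val)
{
	int32_t		tmp;

	if (mul_s32_overflow(val, 7, &tmp))
		return false;
	if (add_s32_overflow(*tm_mday, tmp, tm_mday))
		return false;
	return true;
}
```

306783379 is the smallest week count whose conversion to days exceeds INT32_MAX, so it trips the multiplication check while 306783378 does not.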
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Fri, Feb 11, 2022 at 1:32 PM Bruce Momjian wrote: > > Yeah, maybe. But it's not clear to me with that kind of naming whether > > TRUE or FALSE would be the existing behavior? One version logs a > > single record for the whole database, and the other logs a record per > > database block. Neither version logs per file. LOG_COPIED_BLOCKS, > > maybe? > > Yes, I like BLOCKS more than FILE. Cool. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Fri, Feb 11, 2022 at 3:40 PM Andrew Dunstan wrote: > I'm not really sure any single parameter name is going to capture the > subtlety involved here. I mean to some extent that's inevitable, but it's not a reason not to do the best we can. -- Robert Haas EDB: http://www.enterprisedb.com
Re: support for CREATE MODULE
On Thu, Feb 10, 2022 at 4:17 PM Alvaro Herrera wrote: > On 2022-Feb-04, Tom Lane wrote: > > If we invent modules I think they need to work more like extensions > > naming-wise, ie they group objects but have no effect for naming. > > I think modules are an orthogonal concept to extensions, and trying to > mix both is messy. > > The way I see modules working is as a "logical" grouping of objects -- > they provide encapsulated units of functionality. A module has private > functions, which cannot be called except from other functions in the > same module. You can abstract them out of the database design, leaving > you with only the exposed functions, the public API. > > An extension is a way to distribute or package objects. An extension > can contain a module, and of course you should be able to use an > extension to distribute a module, or even several modules. In fact, I > think it should be possible to have several extensions contribute > different objects to the same module. > > But things like name resolution rules (search path) are not affected by > how the objects are distributed, whereas the search path is critical in > how you think about the objects in a module. > > If modules are just going to be extensions, I see no point. +1. I think that extensions, as we have them in PostgreSQL today, are basically feature-complete. In my experience, they do all the things that we need them to do, and pretty well. I think Tom was correct when he predicted many years ago that getting the extension feature into PostgreSQL would be remembered as one of the biggest improvements of that era. (I do not recall his exact words.) But the same is clearly not true of schemas. Schemas are missing features that people expect to have, private objects being one of them, and variables another, and I think there are other things missing, too. Also, search_path is an absolutely wretched design and I'd propose ripping it out if I had any sensible idea what would replace it. 
IMHO, if we're going to do anything at all in this area, it ought to be targeting the rather-large weaknesses of the schema system, rather than anything about the extension system. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On 2/11/22 13:32, Bruce Momjian wrote: > On Fri, Feb 11, 2022 at 01:18:58PM -0500, Robert Haas wrote: >> On Fri, Feb 11, 2022 at 12:50 PM Bruce Momjian wrote: >>> On Fri, Feb 11, 2022 at 12:35:50PM -0500, Robert Haas wrote: How about something like LOG_AS_CLONE? That makes it clear, I hope, that we're logging it a different way, but that method of logging it is different in each case. You'd still have to read the documentation to find out what it really means, but at least it seems like it points you more in the right direction. To me, anyway. >>> I think CLONE would be confusing since we don't use that term often, >>> maybe LOG_DB_COPY or LOG_FILE_COPY? >> Yeah, maybe. But it's not clear to me with that kind of naming whether >> TRUE or FALSE would be the existing behavior? One version logs a >> single record for the whole database, and the other logs a record per >> database block. Neither version logs per file. LOG_COPIED_BLOCKS, >> maybe? > Yes, I like BLOCKS more than FILE. I'm not really sure any single parameter name is going to capture the subtlety involved here. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Replacing TAP test planning with done_testing()
On 2022-Feb-11, Tom Lane wrote: > Andres Freund writes: > > +1 on backpatching. Backpatching tests now is less likely to cause > > conflicts, > > but more likely to fail during tests. > > If you've got the energy to do it, +1 for backpatching. I agree > with Michael's opinion that doing so will probably save us > future annoyance. +1 -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: support for MERGE
On 2022-Feb-11, Justin Pryzby wrote: > Because I put your patch on top of some other branch with the CI coverage (and > other stuff). Ah, that makes sense. > But it has to figure out where the branch "starts". Which I did by looking at > "git diff --cherry-pick origin..." > > I'm not sure git diff --cherry-pick is widely known/used, but I think > using that relative to master may be good enough. I had never heard of git diff --cherry-pick, and the manpages I found don't document it, so frankly I doubt it's widely known. I still have no idea what it does. I suppose there is an obvious reason why using git diff with $(git merge-base ...) as one of the arguments doesn't work for these purposes. > Andres thinks that does the wrong thing if CI is run manually (not by CFBOT) > for patches against backbranches. I wonder if it's sufficient to handle these things (coverage specifically) for branch master only. -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ "One can defend oneself against attacks; against praise one is defenseless"
Re: Replacing TAP test planning with done_testing()
Andres Freund writes: > On 2022-02-11 21:04:53 +0100, Daniel Gustafsson wrote: >> I opted out of backpatching for now, to solicit more comments on that. It's >> not a bugfix, but it's also not affecting the compiled bits that we ship, so >> I >> think there's a case to be made both for and against a backpatch. Looking at >> the oldest branch we support, it seems we've done roughly 25 changes to the >> test plans in REL_10_STABLE over the years, so it's neither insignificant nor >> an everyday activity. Personally I don't have strong opinions, what do >> others >> think? > +1 on backpatching. Backpatching tests now is less likely to cause conflicts, > but more likely to fail during tests. If you've got the energy to do it, +1 for backpatching. I agree with Michael's opinion that doing so will probably save us future annoyance. regards, tom lane
Re: Replacing TAP test planning with done_testing()
On 2022-02-11 21:04:53 +0100, Daniel Gustafsson wrote: > I opted out of backpatching for now, to solicit more comments on that. It's > not a bugfix, but it's also not affecting the compiled bits that we ship, so I > think there's a case to be made both for and against a backpatch. Looking at > the oldest branch we support, it seems we've done roughly 25 changes to the > test plans in REL_10_STABLE over the years, so it's neither insignificant nor > an everyday activity. Personally I don't have strong opinions, what do others > think? +1 on backpatching. Backpatching tests now is less likely to cause conflicts, but more likely to fail during tests.
Re: Replacing TAP test planning with done_testing()
> On 10 Feb 2022, at 01:58, Michael Paquier wrote: >>> Daniel Gustafsson writes: The attached patch removes all Test::More planning and instead ensures that all tests conclude with a done_testing() call. Pushed to master now with a few additional hunks fixing test changes that happened between posting this and now. > Could it be possible to backpatch that even if > this could be qualified as only cosmetic? Each time a test is > backpatched we need to tweak the number of tests planned, and that may > change slightly depending on the branch dealt with. I opted out of backpatching for now, to solicit more comments on that. It's not a bugfix, but it's also not affecting the compiled bits that we ship, so I think there's a case to be made both for and against a backpatch. Looking at the oldest branch we support, it seems we've done roughly 25 changes to the test plans in REL_10_STABLE over the years, so it's neither insignificant nor an everyday activity. Personally I don't have strong opinions, what do others think? -- Daniel Gustafsson https://vmware.com/
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
On 2/11/22 19:13, Robert Haas wrote: > On Fri, Feb 11, 2022 at 1:06 PM Tom Lane wrote: >> While I'm not opposed to moving those goalposts at some point, >> I think pushing them all the way up to 9.5 for this one easily-fixed >> problem is not very reasonable. >> >> Given other recent discussion, an argument could be made for moving >> the cutoff to 9.2, on the grounds that it's too hard to test against >> anything older. But that still leaves us needing a version check >> if we want to use TABLESAMPLE. > > OK, thanks. > Yeah, I think checking server_version is fairly simple. Which is mostly why I haven't done anything about that in the submitted patch, because the other issues (what fraction to sample) seemed more important. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Fix overflow in DecodeInterval
The attached patch fixes an overflow bug in DecodeInterval when applying the units week, decade, century, and millennium. The overflow check logic was modelled after the overflow check at the beginning of `int tm2interval(struct pg_tm *tm, fsec_t fsec, Interval *span);` in timestamp.c. This is my first patch, so apologies if I screwed up the process somewhere. - Joe Koshakow From b3c096039a4684646806e2959d7ac9318504b031 Mon Sep 17 00:00:00 2001 From: Joseph Koshakow Date: Fri, 11 Feb 2022 14:18:32 -0500 Subject: [PATCH] Check for overflow when decoding an interval When decoding an interval there are various date units which are aliases for some multiple of another unit. For example a week is 7 days and a decade is 10 years. When decoding these specific units, there is no check for overflow, allowing the interval to overflow. This commit adds an overflow check for all of these units. --- src/backend/utils/adt/datetime.c | 28 ++ src/test/regress/expected/interval.out | 32 ++ src/test/regress/sql/interval.sql | 8 +++ 3 files changed, 64 insertions(+), 4 deletions(-) diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c index 7926258c06..2813e8647d 100644 --- a/src/backend/utils/adt/datetime.c +++ b/src/backend/utils/adt/datetime.c @@ -3293,10 +3293,15 @@ DecodeInterval(char **field, int *ftype, int nf, int range, break; case DTK_WEEK: - tm->tm_mday += val * 7; + { + double days = (double) val * 7; + if (days > INT_MAX || days < INT_MIN) + return DTERR_FIELD_OVERFLOW; + tm->tm_mday += days; AdjustFractDays(fval, tm, fsec, 7); tmask = DTK_M(WEEK); break; + } case DTK_MONTH: tm->tm_mon += val; @@ -3311,22 +3316,37 @@ DecodeInterval(char **field, int *ftype, int nf, int range, break; case DTK_DECADE: - tm->tm_year += val * 10; + { + double years = (double) val * 10; + if (years > INT_MAX || years < INT_MIN) + return DTERR_FIELD_OVERFLOW; + tm->tm_year += years; tm->tm_mon += rint(fval * MONTHS_PER_YEAR * 10); tmask = DTK_M(DECADE); break; + } 
case DTK_CENTURY: - tm->tm_year += val * 100; + { + double years = (double) val * 100; + if (years > INT_MAX || years < INT_MIN) + return DTERR_FIELD_OVERFLOW; + tm->tm_year += years; tm->tm_mon += rint(fval * MONTHS_PER_YEAR * 100); tmask = DTK_M(CENTURY); break; + } case DTK_MILLENNIUM: - tm->tm_year += val * 1000; + { + double years = (double) val * 1000; + if (years > INT_MAX || years < INT_MIN) + return DTERR_FIELD_OVERFLOW; + tm->tm_year += years; tm->tm_mon += rint(fval * MONTHS_PER_YEAR * 1000); tmask = DTK_M(MILLENNIUM); break; + } default: return DTERR_BAD_FORMAT; diff --git a/src/test/regress/expected/interval.out b/src/test/regress/expected/interval.out index accd4a7d90..88865483fa 100644 --- a/src/test/regress/expected/interval.out +++ b/src/test/regress/expected/interval.out @@ -232,6 +232,38 @@ INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('-2147483648 years'); ERROR: interval out of range LINE 1: INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('-2147483648 years'... ^ +INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('2147483647 weeks'); +ERROR: interval field value out of range: "2147483647 weeks" +LINE 1: INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('2147483647 weeks')... + ^ +INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('-2147483648 weeks'); +ERROR: interval field value out of range: "-2147483648 weeks" +LINE 1: INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('-2147483648 weeks'... + ^ +INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('2147483647 decades'); +ERROR: interval field value out of range: "2147483647 decades" +LINE 1: INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('2147483647 decades... + ^ +INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('-2147483648 decades'); +ERROR: interval field value out of range: "-2147483648 decades" +LINE 1: INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('-2147483648 decade... 
+ ^ +INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('2147483647 centuries'); +ERROR: interval field value out of range: "2147483647 centuries" +LINE 1: INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('2147483647 centuri... + ^ +INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('-2147483648 centuries'); +ERROR: interval field value out of range: "-2147483648 centuries" +LINE 1: INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('-2147483648 centur... + ^ +INSERT INTO INTERVAL_TBL_OF (f1) VALUES ('2147483647
Re: Race condition in TransactionIdIsInProgress
Hi, On 2022-02-11 13:48:59 +, Simon Riggs wrote: > On Fri, 11 Feb 2022 at 08:48, Simon Riggs > wrote: > > > > On Fri, 11 Feb 2022 at 06:11, Andres Freund wrote: > > > > > Looks like syncrep will make this a lot worse, because it can drastically > > > increase the window between the TransactionIdCommitTree() and > > > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() in between. But > > > at > > > least it might make it easier to write tests exercising this scenario... > > > > Agreed > > > > TransactionIdIsKnownCompleted(xid) is only broken because the single > > item cache is set too early in some cases. The single item cache is > > important for performance, so we just need to be more careful about > > setting the cache. > > Something like this... fix_cachedFetchXid.v1.patch prevents the cache > being set, but this fails! Not worked out why, yet. I don't think it's safe to check !TransactionIdKnownNotInProgress() in TransactionLogFetch(), given that TransactionIdKnownNotInProgress() needs to do TransactionLogFetch() internally. > just_remove_TransactionIdIsKnownCompleted_call.v1.patch I think this might contain a different diff than what the name suggests? > just removes the known offending call, passes make check, but IMHO > leaves the same error just as likely by other callers. Which other callers are you referring to? To me it seems that the "real" reason behind this specific issue being nontrivial to fix and behind the frequent error of calling TransactionIdDidCommit() before TransactionIdIsInProgress() is that we've really made a mess out of the transaction status determination code / API. IMO the original sin is requiring the complicated if (TransactionIdIsCurrentTransactionId(xid) ... else if (TransactionIdIsInProgress(xid) ... else if (TransactionIdDidCommit(xid) ... dance at pretty much any place checking transaction status. The multiple calls for the same xid is, I think, what pushed the caching down to the TransactionLogFetch level. 
ISTM that we should introduce something like TransactionIdStatus(xid) that returns XACT_STATUS_CURRENT XACT_STATUS_IN_PROGRESS XACT_STATUS_COMMITTED XACT_STATUS_ABORTED accompanied by a TransactionIdStatusMVCC(xid, snapshot) that checks XidInMVCCSnapshot() instead of TransactionIdIsInProgress(). I think that'd also make visibility checks faster - we spend a good chunk of cycles doing unnecessary function calls and repeatedly gathering information. It's also kind of weird that TransactionIdIsCurrentTransactionId() isn't more optimized - TransactionIdIsInProgress() knows it doesn't need to check anything before RecentXmin, but TransactionIdIsCurrentTransactionId() doesn't. Nor does TransactionIdIsCurrentTransactionId() check if the xid is smaller than GetTopTransactionIdIfAny() before bsearching through subxids. Of course anything along these lines would never be backpatchable... Greetings, Andres Freund
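The proposed API could look roughly like the sketch below. Everything here is hypothetical: the enum values come from the message above, while the three predicates are toy stand-ins returning canned answers — they only exist so the dispatch order (current, then in-progress, then commit status) can be exercised:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

typedef enum XactStatus
{
	XACT_STATUS_CURRENT,
	XACT_STATUS_IN_PROGRESS,
	XACT_STATUS_COMMITTED,
	XACT_STATUS_ABORTED
} XactStatus;

/* Toy stand-ins for the backend routines, with canned answers. */
static bool TransactionIdIsCurrentTransactionId(TransactionId xid) { return xid == 100; }
static bool TransactionIdIsInProgress(TransactionId xid) { return xid == 200; }
static bool TransactionIdDidCommit(TransactionId xid) { return xid == 300; }

/* One entry point encapsulating the three-step dance.  Order matters:
 * in-progress must be checked before commit status, which is exactly
 * the invariant callers keep getting wrong today. */
static XactStatus
TransactionIdStatus(TransactionId xid)
{
	if (TransactionIdIsCurrentTransactionId(xid))
		return XACT_STATUS_CURRENT;
	if (TransactionIdIsInProgress(xid))
		return XACT_STATUS_IN_PROGRESS;
	if (TransactionIdDidCommit(xid))
		return XACT_STATUS_COMMITTED;
	return XACT_STATUS_ABORTED;
}
```

A TransactionIdStatusMVCC(xid, snapshot) variant would substitute XidInMVCCSnapshot() for the in-progress check, per the message above.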
Re: support for MERGE
On Fri, Feb 11, 2022 at 03:21:43PM -0300, Alvaro Herrera wrote: > On 2022-Jan-28, Justin Pryzby wrote: > > Have you looked at code coverage ? I have an experimental patch to add > > that to > > cirrus, and ran it with this patch; visible here: > > https://cirrus-ci.com/task/6362512059793408 > > Ah, thanks, this is useful. I think it is showing that the new code is > generally well covered, but there are some exceptions, particularly > ExecMergeMatched in some concurrent cases (TM_Deleted and > TM_SelfModified after table_tuple_lock -- those code paths have > user-facing errors but are not covered by any tests.) > > How does this work? I notice it is reporting for > src/bin/pg_upgrade/relfilenode.c, but that file is not changed by this > patch. Because I put your patch on top of some other branch with the CI coverage (and other stuff). It tries to only show files changed by the branch being checked: https://github.com/justinpryzby/postgres/commit/d668142040031915 But it has to figure out where the branch "starts". Which I did by looking at "git diff --cherry-pick origin..." Andres thinks that does the wrong thing if CI is run manually (not by CFBOT) for patches against backbranches. I'm not sure git diff --cherry-pick is widely known/used, but I think using that relative to master may be good enough. Ongoing discussion here. https://www.postgresql.org/message-id/20220203035827.GG23027%40telsasoft.com -- Justin
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Fri, Feb 11, 2022 at 01:18:58PM -0500, Robert Haas wrote: > On Fri, Feb 11, 2022 at 12:50 PM Bruce Momjian wrote: > > On Fri, Feb 11, 2022 at 12:35:50PM -0500, Robert Haas wrote: > > > How about something like LOG_AS_CLONE? That makes it clear, I hope, > > > that we're logging it a different way, but that method of logging it > > > is different in each case. You'd still have to read the documentation > > > to find out what it really means, but at least it seems like it points > > > you more in the right direction. To me, anyway. > > > > I think CLONE would be confusing since we don't use that term often, > > maybe LOG_DB_COPY or LOG_FILE_COPY? > > Yeah, maybe. But it's not clear to me with that kind of naming whether > TRUE or FALSE would be the existing behavior? One version logs a > single record for the whole database, and the other logs a record per > database block. Neither version logs per file. LOG_COPIED_BLOCKS, > maybe? Yes, I like BLOCKS more than FILE. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: support for MERGE
On 2022-Jan-28, Justin Pryzby wrote: > Have you looked at code coverage ? I have an experimental patch to add that > to > cirrus, and ran it with this patch; visible here: > https://cirrus-ci.com/task/6362512059793408 Ah, thanks, this is useful. I think it is showing that the new code is generally well covered, but there are some exceptions, particularly ExecMergeMatched in some concurrent cases (TM_Deleted and TM_SelfModified after table_tuple_lock -- those code paths have user-facing errors but are not covered by any tests.) How does this work? I notice it is reporting for src/bin/pg_upgrade/relfilenode.c, but that file is not changed by this patch. -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Fri, Feb 11, 2022 at 12:50 PM Bruce Momjian wrote: > On Fri, Feb 11, 2022 at 12:35:50PM -0500, Robert Haas wrote: > > How about something like LOG_AS_CLONE? That makes it clear, I hope, > > that we're logging it a different way, but that method of logging it > > is different in each case. You'd still have to read the documentation > > to find out what it really means, but at least it seems like it points > > you more in the right direction. To me, anyway. > > I think CLONE would be confusing since we don't use that term often, > maybe LOG_DB_COPY or LOG_FILE_COPY? Yeah, maybe. But it's not clear to me with that kind of naming whether TRUE or FALSE would be the existing behavior? One version logs a single record for the whole database, and the other logs a record per database block. Neither version logs per file. LOG_COPIED_BLOCKS, maybe? -- Robert Haas EDB: http://www.enterprisedb.com
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
On Fri, Feb 11, 2022 at 1:06 PM Tom Lane wrote: > While I'm not opposed to moving those goalposts at some point, > I think pushing them all the way up to 9.5 for this one easily-fixed > problem is not very reasonable. > > Given other recent discussion, an argument could be made for moving > the cutoff to 9.2, on the grounds that it's too hard to test against > anything older. But that still leaves us needing a version check > if we want to use TABLESAMPLE. OK, thanks. -- Robert Haas EDB: http://www.enterprisedb.com
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Robert Haas writes: > I think it's going to be necessary to compromise on that at some > point. Sure. The existing postgres_fdw documentation [1] already addresses this issue: postgres_fdw can be used with remote servers dating back to PostgreSQL 8.3. Read-only capability is available back to 8.1. A limitation however is that postgres_fdw generally assumes that immutable built-in functions and operators are safe to send to the remote server for execution, if they appear in a WHERE clause for a foreign table. Thus, a built-in function that was added since the remote server's release might be sent to it for execution, resulting in “function does not exist” or a similar error. This type of failure can be worked around by rewriting the query, for example by embedding the foreign table reference in a sub-SELECT with OFFSET 0 as an optimization fence, and placing the problematic function or operator outside the sub-SELECT. While I'm not opposed to moving those goalposts at some point, I think pushing them all the way up to 9.5 for this one easily-fixed problem is not very reasonable. Given other recent discussion, an argument could be made for moving the cutoff to 9.2, on the grounds that it's too hard to test against anything older. But that still leaves us needing a version check if we want to use TABLESAMPLE. regards, tom lane [1] https://www.postgresql.org/docs/devel/postgres-fdw.html#id-1.11.7.45.17
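Since the cutoff is version-based, postgres_fdw could gate the feature on the remote version number in the integer form libpq's PQserverVersion() reports (e.g. 90500 for 9.5.0, the release that introduced TABLESAMPLE). A minimal sketch of such a gate — the helper name is invented here for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical gate: only emit TABLESAMPLE in the remote sampling query
 * when the remote server is 9.5 or newer, using the integer version
 * format (e.g. 90500, 150002) that PQserverVersion() returns. */
static bool
remote_supports_tablesample(int server_version_num)
{
	return server_version_num >= 90500;
}
```

On older remotes the FDW would simply fall back to the existing whole-table sampling query.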
Re: O(n) tasks cause lengthy startups and checkpoints
Here is a rebased patch set. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 506aa95dd77f16dc64d7fe9c52ca67dd3c10212e Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Wed, 5 Jan 2022 19:24:22 + Subject: [PATCH v4 1/8] Introduce custodian. The custodian process is a new auxiliary process that is intended to help offload tasks could otherwise delay startup and checkpointing. This commit simply adds the new process; it does not yet do anything useful. --- src/backend/postmaster/Makefile | 1 + src/backend/postmaster/auxprocess.c | 8 + src/backend/postmaster/custodian.c | 213 src/backend/postmaster/postmaster.c | 44 - src/backend/storage/lmgr/proc.c | 1 + src/backend/utils/activity/wait_event.c | 3 + src/backend/utils/init/miscinit.c | 3 + src/include/miscadmin.h | 3 + src/include/postmaster/custodian.h | 17 ++ src/include/storage/proc.h | 11 +- src/include/utils/wait_event.h | 1 + 11 files changed, 300 insertions(+), 5 deletions(-) create mode 100644 src/backend/postmaster/custodian.c create mode 100644 src/include/postmaster/custodian.h diff --git a/src/backend/postmaster/Makefile b/src/backend/postmaster/Makefile index dbbeac5a82..1b7aae60f5 100644 --- a/src/backend/postmaster/Makefile +++ b/src/backend/postmaster/Makefile @@ -18,6 +18,7 @@ OBJS = \ bgworker.o \ bgwriter.o \ checkpointer.o \ + custodian.o \ fork_process.o \ interrupt.o \ pgarch.o \ diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c index 0587e45920..7eae34884d 100644 --- a/src/backend/postmaster/auxprocess.c +++ b/src/backend/postmaster/auxprocess.c @@ -20,6 +20,7 @@ #include "pgstat.h" #include "postmaster/auxprocess.h" #include "postmaster/bgwriter.h" +#include "postmaster/custodian.h" #include "postmaster/startup.h" #include "postmaster/walwriter.h" #include "replication/walreceiver.h" @@ -74,6 +75,9 @@ AuxiliaryProcessMain(AuxProcType auxtype) case CheckpointerProcess: MyBackendType = B_CHECKPOINTER; break; + case CustodianProcess: + 
MyBackendType = B_CUSTODIAN; + break; case WalWriterProcess: MyBackendType = B_WAL_WRITER; break; @@ -153,6 +157,10 @@ AuxiliaryProcessMain(AuxProcType auxtype) CheckpointerMain(); proc_exit(1); + case CustodianProcess: + CustodianMain(); + proc_exit(1); + case WalWriterProcess: WalWriterMain(); proc_exit(1); diff --git a/src/backend/postmaster/custodian.c b/src/backend/postmaster/custodian.c new file mode 100644 index 00..dd86f0f5ce --- /dev/null +++ b/src/backend/postmaster/custodian.c @@ -0,0 +1,213 @@ +/*- + * + * custodian.c + * + * The custodian process is new as of Postgres 15. It's main purpose is to + * offload tasks that could otherwise delay startup and checkpointing, but + * it needn't be restricted to just those things. Offloaded tasks should + * not be synchronous (e.g., checkpointing shouldn't need to wait for the + * custodian to complete a task before proceeding). Also, ensure that any + * offloaded tasks are either not required during single-user mode or are + * performed separately during single-user mode. + * + * The custodian is not an essential process and can shutdown quickly when + * requested. The custodian will wake up approximately once every 5 + * minutes to perform its tasks, but backends can (and should) set its + * latch to wake it up sooner. + * + * Normal termination is by SIGTERM, which instructs the bgwriter to + * exit(0). Emergency termination is by SIGQUIT; like any backend, the + * custodian will simply abort and exit on SIGQUIT. + * + * If the custodian exits unexpectedly, the postmaster treats that the same + * as a backend crash: shared memory may be corrupted, so remaining + * backends should be killed by SIGQUIT and then a recovery cycle started. 
+ * + * + * Copyright (c) 2022, PostgreSQL Global Development Group + * + * + * IDENTIFICATION + * src/backend/postmaster/custodian.c + * + *- + */ +#include "postgres.h" + +#include + +#include "libpq/pqsignal.h" +#include "pgstat.h" +#include "postmaster/custodian.h" +#include "postmaster/interrupt.h" +#include "storage/bufmgr.h" +#include "storage/condition_variable.h" +#include "storage/proc.h" +#include "storage/procsignal.h" +#include "utils/memutils.h" + +#define CUSTODIAN_TIMEOUT_S (300) /* 5 minutes */ + +/* + * Main entry point for custodian process + * + * This is invoked from AuxiliaryProcessMain, which has already created the + * basic execution environment, but not enabled signals yet. + */ +void +CustodianMain(void) +{ + sigjmp_buf local_sigjmp_buf; + MemoryContext custodian_context; + + /* + * Properly accept or ignore signals that might be sent to us. + */ +
[PATCH] Support % wildcard in extension upgrade filenames
At PostGIS we've been thinking of ways to get better support, from PostgreSQL proper, for our upgrade strategy, which has always consisted of a SINGLE upgrade file good for upgrading from any older version. The current lack of such support for EXTENSIONs has forced us to install a lot of files on disk, and we'd like to stop doing that at some point in the future. The proposal is to support wildcards for versions encoded in the filename so that (for our case) a single wildcard could match "any version". I've been thinking about the '%' character for that, to resemble the wildcard used for LIKE. Here's the proposal: https://lists.osgeo.org/pipermail/postgis-devel/2022-February/029500.html A very, very short (and untested) patch which might (or might not) support our case is attached. The only problem with my proposal/patch would be the presence, in the wild, of PostgreSQL EXTENSIONs actually using the '%' character in their version strings, which is currently considered legit by PostgreSQL. How do you feel about the proposal (which is wider than the patch)? --strk; Libre GIS consultant/developer https://strk.kbt.io/services.html >From e9d060b19c655924c4edb6679169261d605f735d Mon Sep 17 00:00:00 2001 From: Sandro Santilli Date: Fri, 11 Feb 2022 18:49:45 +0100 Subject: [PATCH] Support % wildcard in extension upgrade scripts --- src/backend/commands/extension.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c index a2e77c418a..aafd9c17ee 100644 --- a/src/backend/commands/extension.c +++ b/src/backend/commands/extension.c @@ -1072,6 +1072,8 @@ get_ext_ver_info(const char *versionname, List **evi_list) foreach(lc, *evi_list) { evi = (ExtensionVersionInfo *) lfirst(lc); + if (strcmp(evi->name, "%") == 0) + return evi; if (strcmp(evi->name, versionname) == 0) return evi; } -- 2.30.2
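For illustration, the matching rule the patch adds can be sketched outside the server. This is a toy model, not PostgreSQL's actual code: the (source, target) tuple layout and the function name are mine; the server builds its version graph from the installed script filenames.

```python
def find_upgrade_script(scripts, installed_ver, target_ver):
    """Sketch of the proposed lookup: an upgrade script whose source
    version is '%' is treated as applicable from any installed version.
    An exact source-version match is still honored when present."""
    for src, dst in scripts:
        if dst != target_ver:
            continue
        if src == "%" or src == installed_ver:
            return (src, dst)
    return None

# One wildcard script can stand in for a file per source version:
scripts = [("3.1.0", "3.3.0"), ("%", "3.3.0")]
```

With this rule, a single script named along the lines of `postgis--%--3.3.0.sql` (hypothetical name, following the usual `ext--from--to.sql` convention) would cover upgrades from every older version, which is exactly the single-file strategy described above.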
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Fri, Feb 11, 2022 at 12:35:50PM -0500, Robert Haas wrote: > How about something like LOG_AS_CLONE? That makes it clear, I hope, > that we're logging it a different way, but that method of logging it > is different in each case. You'd still have to read the documentation > to find out what it really means, but at least it seems like it points > you more in the right direction. To me, anyway. I think CLONE would be confusing since we don't use that term often, maybe LOG_DB_COPY or LOG_FILE_COPY? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
On Fri, Feb 11, 2022 at 12:39 PM Tom Lane wrote: > Tomas Vondra writes: > > So here we go. The patch does a very simple thing - it uses TABLESAMPLE > > to collect/transfer just a small sample from the remote node, saving > > both CPU and network. > > This is great if the remote end has TABLESAMPLE, but pre-9.5 servers > don't, and postgres_fdw is supposed to still work with old servers. > So you need some conditionality for that. I think it's going to be necessary to compromise on that at some point. I don't, for example, think it would be reasonable for postgres_fdw to have detailed knowledge of which operators can be pushed down as a function of the remote PostgreSQL version. Nor do I think that we care about whether this works at all against, say, PostgreSQL 8.0. I'm not sure where it's reasonable to draw a line and say we're not going to expend any more effort, and maybe 15 with 9.5 is a small enough gap that we still care at least somewhat about compatibility. But even that is not 100% obvious to me. -- Robert Haas EDB: http://www.enterprisedb.com
Re: postgres_fdw: using TABLESAMPLE to collect remote sample
Tomas Vondra writes: > So here we go. The patch does a very simple thing - it uses TABLESAMPLE > to collect/transfer just a small sample from the remote node, saving > both CPU and network. This is great if the remote end has TABLESAMPLE, but pre-9.5 servers don't, and postgres_fdw is supposed to still work with old servers. So you need some conditionality for that. regards, tom lane
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On Fri, Feb 11, 2022 at 12:11 PM Andrew Dunstan wrote: > The last one at least has the advantage that it doesn't invent yet > another keyword. We don't need a new keyword for this as long as it lexes as one token, because createdb_opt_name accepts IDENT. So I think we should focus on trying to come up with something that is as clear as we know how to make it. What I find difficult about doing that is that this is all a bunch of technical details that users may have difficulty understanding. If we say WAL_LOG or WAL_LOG_DATA, a reasonably but not incredibly well-informed user will assume that skipping WAL is not really an option. If we say CHECKPOINT, a reasonably but not incredibly well-informed user will presume they don't want one (I think). CHECKPOINT also seems like it's naming the switch by the unwanted side effect, which doesn't seem too flattering to the existing method. How about something like LOG_AS_CLONE? That makes it clear, I hope, that we're logging it a different way, but that method of logging it is different in each case. You'd still have to read the documentation to find out what it really means, but at least it seems like it points you more in the right direction. To me, anyway. > I can live with the new method being the default. I'm sure it would be > highlighted in the release notes too. That would make sense. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pgsql: Add TAP test to automate the equivalent of check_guc
Justin Pryzby writes: > Or, is it okay to use ABS_SRCDIR? Don't see why not --- other tests certainly do. regards, tom lane
Re: pgsql: Add TAP test to automate the equivalent of check_guc
On Fri, Feb 11, 2022 at 10:41:27AM -0500, Tom Lane wrote: > Justin Pryzby writes: > > On Fri, Feb 11, 2022 at 09:59:55AM -0500, Tom Lane wrote: > >> This seems like a pretty bad idea even if it weren't failing outright. > >> We should be examining the version of the file that's in the source > >> tree; the one in the installation tree might have version-skew > >> problems, if you've not yet done "make install". > > > My original way used the source tree, but Michael thought it would be an > > issue > > for "installcheck" where the config may not be available. > > Yeah, you are at risk either way, but in practice nobody is going to be > running these TAP tests without a source tree. > > > This is what I had written: > > FROM (SELECT regexp_split_to_table(pg_read_file('postgresql.conf'), '\n') > > AS ln) conf > > That's not using the source tree either, but the copy in the > cluster-under-test. I'd fear it to be unstable in the buildfarm, where > animals can append whatever they please to the config file being used by > tests. The important test is that "new GUCs exist in sample.conf". The converse test is less interesting, so I think the important half would be okay. I see the BF client appends to postgresql.conf. https://github.com/PGBuildFarm/client-code/blob/main/run_build.pl#L1374 It could use "include", which was added in v8.2. Or it could add a marker, like pg_regress does: src/test/regress/pg_regress.c: fputs("\n# Configuration added by pg_regress\n\n", pg_conf); If it were desirable to also check that "things that look like GUCs in the conf are actually GUCs", either of those would avoid false positives. Or, is it okay to use ABS_SRCDIR? 
+\getenv abs_srcdir PG_ABS_SRCDIR +\set filename :abs_srcdir '../../../../src/backend/utils/misc/postgresql.conf.sample' +-- test that GUCS are in postgresql.conf +SELECT lower(name) FROM tab_settings_flags WHERE NOT not_in_sample EXCEPT +SELECT regexp_replace(ln, '^#?([_[:alpha:]]+) (= .*|[^ ]*$)', '\1') AS guc +FROM (SELECT regexp_split_to_table(pg_read_file(:'filename'), '\n') AS ln) conf +WHERE ln ~ '^#?[[:alpha:]]' +ORDER BY 1; + +-- test that lines in postgresql.conf that look like GUCs are GUCs +SELECT regexp_replace(ln, '^#?([_[:alpha:]]+) (= .*|[^ ]*$)', '\1') AS guc +FROM (SELECT regexp_split_to_table(pg_read_file(:'filename'), '\n') AS ln) conf +WHERE ln ~ '^#?[[:alpha:]]' +EXCEPT SELECT lower(name) FROM tab_settings_flags WHERE NOT not_in_sample +ORDER BY 1; -- Justin
Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
On 2/10/22 07:32, Dilip Kumar wrote: > On Wed, Feb 9, 2022 at 9:31 PM Robert Haas wrote: >> On Wed, Feb 9, 2022 at 10:59 AM Dilip Kumar wrote: >>> On Wed, Feb 9, 2022 at 9:25 PM Andrew Dunstan wrote: On 6/16/21 03:52, Dilip Kumar wrote: > On Tue, Jun 15, 2021 at 7:01 PM Andrew Dunstan > wrote: >> Rather than use size, I'd be inclined to say use this if the source >> database is marked as a template, and use the copydir approach for >> anything that isn't. > Yeah, that is possible, on the other thought wouldn't it be good to > provide control to the user by providing two different commands, e.g. > COPY DATABASE for the existing method (copydir) and CREATE DATABASE > for the new method (fully wal logged)? This proposal seems to have gotten lost. >>> Yeah, I am planning to work on this part so that we can support both >>> methods. >> But can we pick a different syntax? In my view this should be an >> option to CREATE DATABASE rather than a whole new command. > Maybe we can provide something like > > CREATE DATABASE..WITH WAL_LOG=true/false ? OR > CREATE DATABASE..WITH WAL_LOG_DATA_PAGE=true/false ? OR > CREATE DATABASE..WITH CHECKPOINT=true/false ? OR > > And then we can explain in documentation about these options? I think > default should be new method? > > The last one at least has the advantage that it doesn't invent yet another keyword. I can live with the new method being the default. I'm sure it would be highlighted in the release notes too. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: the build farm is ok, but not the hippopotamus (or the jay)
On 2/11/22 16:16, Robert Haas wrote: Hi, I was looking over the buildfarm results yesterday and this morning, and things seem to be mostly green, but not entirely. 'hippopotamus' was failing yesterday with strange LDAP-related errors, and is now complaining about pgsql-Git-Dirty, which I assume means that someone is doing something funny on that machine manually. 'jay' has the same maintainer and is also showing some failures. Yes, those are "my" machines. I've been upgrading the OS on them, and this is likely a fallout from that. Sorry about the noise. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
postgres_fdw: using TABLESAMPLE to collect remote sample
Hi, here's a small patch modifying postgres_fdw to use TABLESAMPLE to collect a sample when running ANALYZE on a foreign table. Currently the sampling happens locally, i.e. we fetch all data from the remote server and then pick rows. But that's hugely expensive for large relations and/or with limited network bandwidth, of course. Alexander Lepikhov mentioned this in [1], but it was actually proposed by Stephen in 2020 [2], though no patch ever materialized. So here we go. The patch does a very simple thing - it uses TABLESAMPLE to collect/transfer just a small sample from the remote node, saving both CPU and network. And indeed, that helps quite a bit:

create table t (a int);
Time: 10.223 ms
insert into t select i from generate_series(1,1000) s(i);
Time: 552.207 ms
analyze t;
Time: 310.445 ms
CREATE FOREIGN TABLE ft (a int) SERVER foreign_server OPTIONS (schema_name 'public', table_name 't');
Time: 4.838 ms
analyze ft;
WARNING: SQL: DECLARE c1 CURSOR FOR SELECT a FROM public.t TABLESAMPLE SYSTEM(0.375001)
Time: 44.632 ms
alter foreign table ft options (sample 'false');
Time: 4.821 ms
analyze ft;
WARNING: SQL: DECLARE c1 CURSOR FOR SELECT a FROM public.t
Time: 6690.425 ms (00:06.690)

6690ms without sampling, and 44ms with sampling - quite an improvement. Of course, the difference may be much smaller/larger in other cases. Now, there are a couple of issues ... Firstly, the FDW API forces a somewhat strange division of work between different methods and duplicates some of it (and it's already mentioned in postgresAnalyzeForeignTable). But that's minor and can be fixed. The other issue is which sampling method to use - we have SYSTEM and BERNOULLI built in, and tsm_system_rows as an extension (and _time, but that's not very useful here). I guess we'd use one of the built-in ones, because that'll work on more systems out of the box. But that leads to the main issue - determining the fraction of rows to sample. 
We know how many rows we want to sample, but we have no idea how many rows there are in total. We can look at reltuples, but we can't be sure how accurate / up-to-date that value is. The patch just trusts it unless it's obviously bogus (-1, 0, etc.) and applies some simple sanity checks, but I wonder if we need to do more (e.g. look at the relation size and adjust reltuples by current/relpages). FWIW this is yet a bit more convoluted when analyzing a partitioned table with foreign partitions, because we only ever look at relation sizes to determine how many rows to sample from each partition. regards [1] https://www.postgresql.org/message-id/bdb0bea2-a0da-1f1d-5c92-96ff90c198eb%40postgrespro.ru [2] https://www.postgresql.org/message-id/20200829162231.GE29590%40tamriel.snowman.net -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index bf12eac0288..68f875c47a6 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -2267,6 +2267,26 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel) appendStringInfo(buf, "::pg_catalog.regclass) / %d", BLCKSZ); } +/* + * Construct SELECT statement to acquire number of rows of given relation. + * + * Note: Maybe this should compare relpages and current relation size + * and adjust reltuples accordingly? + */ +void +deparseAnalyzeTuplesSql(StringInfo buf, Relation rel) +{ + StringInfoData relname; + + /* We'll need the remote relation name as a literal. */ + initStringInfo(&relname); + deparseRelation(&relname, rel); + + appendStringInfoString(buf, "SELECT reltuples FROM pg_catalog.pg_class WHERE oid = "); + deparseStringLiteral(buf, relname.data); + appendStringInfoString(buf, "::pg_catalog.regclass"); +} + /* * Construct SELECT statement to acquire sample rows of given relation. 
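The fraction-picking problem described above can be sketched as follows. The formula and the relpages-based adjustment are my own illustration (the patch only mentions the adjustment as a possible improvement), not the patch's actual code:

```python
def sample_fraction(targrows, reltuples, relpages, cur_relpages):
    """Derive a TABLESAMPLE percentage from possibly-stale reltuples.
    Scaling reltuples by the current page count is an assumption here;
    obviously bogus statistics (reltuples <= 0) yield None, signalling
    a fallback to fetching everything."""
    if reltuples is None or reltuples <= 0 or relpages <= 0:
        return None
    # Adjust the stale row count by how much the relation has grown/shrunk.
    est_tuples = reltuples * (cur_relpages / relpages)
    # Never ask for more than the whole table.
    return min(100.0, 100.0 * targrows / est_tuples)
```

For example, a 30,000-row sample target against a table estimated at 8 million rows gives 0.375, close to the SYSTEM(0.375001) fraction visible in the session above.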
* @@ -2328,6 +2348,72 @@ deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs) deparseRelation(buf, rel); } +/* + * Construct SELECT statement to acquire sample rows of given relation, + * by sampling a fraction of the table using TABLESAMPLE SYSTEM. + * + * SELECT command is appended to buf, and list of columns retrieved + * is returned to *retrieved_attrs. + * + * Note: We could allow selecting system/bernoulli, and maybe even the + * optional TSM modules (especially tsm_system_rows would help). + */ +void +deparseAnalyzeSampleSql(StringInfo buf, Relation rel, List **retrieved_attrs, double sample_frac) +{ + Oid relid = RelationGetRelid(rel); + TupleDesc tupdesc = RelationGetDescr(rel); + int i; + char *colname; + List *options; + ListCell *lc; + bool first = true; + + *retrieved_attrs = NIL; + + appendStringInfoString(buf, "SELECT "); + for (i = 0; i < tupdesc->natts; i++) + { + /* Ignore dropped columns. */ + if
Re: Assertion failure in WaitForWALToBecomeAvailable state machine
On Fri, Feb 11, 2022 at 6:31 PM Dilip Kumar wrote: > > On Fri, Feb 11, 2022 at 6:22 PM Bharath Rupireddy > wrote: > > > > On Fri, Feb 11, 2022 at 3:33 PM Dilip Kumar wrote: > > > > > > IIUC, the issue can happen while the walreceiver failed to get WAL > > from primary for whatever reasons and its status is not > > WALRCV_STOPPING or WALRCV_STOPPED, and the startup process moved ahead > > in WaitForWALToBecomeAvailable for reading from archive which ends up > > in this assertion failure. ISTM a rare scenario, and it depends on > > what walreceiver does between failure to get WAL from primary and > > updating status to WALRCV_STOPPING or WALRCV_STOPPED. > > > > If the above race condition is a serious problem, if one thinks at > > least it is a problem at all, that needs to be fixed. > > I don't think we can design software that has open race conditions > even though they are rarely occurring :) Yes. > I don't think > > just making InstallXLogFileSegmentActive false is enough. By looking > > at the comment [1], it doesn't make sense to move ahead for restoring > > from the archive location without the WAL receiver fully stopped. > > IMO, the real fix is to just remove WalRcvStreaming() and call > > XLogShutdownWalRcv() unconditionally. Anyways, we have the > > Assert(!WalRcvStreaming()); down below. I don't think it will create > > any problem. > > If WalRcvStreaming() is returning false that means walreceiver is > already stopped so we don't need to shut it down externally. I think > like we are setting this flag outside start streaming we can reset it > also outside XLogShutdownWalRcv. Or I am fine even if we call > XLogShutdownWalRcv() because if walreceiver is stopped it will just > reset the flag we want it to reset and it will do nothing else. As I said, I'm okay with just calling XLogShutdownWalRcv() unconditionally as it just returns in case walreceiver has already stopped and updates the InstallXLogFileSegmentActive flag to false. 
Let's also hear what other hackers have to say about this. Regards, Bharath Rupireddy.
Re: pg_receivewal.exe unhandled exception in zlib1.dll
Hi, On 2022-02-11 16:33:11 +0100, Juan José Santamaría Flecha wrote: > If you execute pg_receivewal.exe with the option "--compression-method > gzip" it will fail with no error. You can see an error in the db log: > > 2022-02-10 11:46:32.725 CET [11664][walsender] [pg_receivewal][3/0:0] LOG: > could not receive data from client: An existing connection was forcibly > closed by the remote host. > > Tracing the execution you can see: > > Unhandled exception at 0x7FFEA78C1208 (ucrtbase.dll) in An invalid > parameter was passed to a function that considers invalid parameters fatal. > ucrtbase.dll!7ff8e8ae36a2() Unknown > > zlib1.dll!gz_comp(gz_state * state, int flush) Line 111 C > zlib1.dll!gz_write(gz_state * state, const void * buf, unsigned __int64 > len) Line 235 C > pg_receivewal.exe!dir_write(void * f, const void * buf, unsigned __int64 > count) Line 300 C > pg_receivewal.exe!ProcessXLogDataMsg(pg_conn * conn, StreamCtl * stream, > char * copybuf, int len, unsigned __int64 * blockpos) Line 1150 C > pg_receivewal.exe!HandleCopyStream(pg_conn * conn, StreamCtl * stream, > unsigned __int64 * stoppos) Line 850 C > pg_receivewal.exe!ReceiveXlogStream(pg_conn * conn, StreamCtl * stream) > Line 605 C > pg_receivewal.exe!StreamLog() Line 636 C > pg_receivewal.exe!main(int argc, char * * argv) Line 1005 C > > The problem comes from the file descriptor passed to gzdopen() in > 'src/bin/pg_basebackup/walmethods.c'. Using gzopen() instead, solves the > issue without ifdefing for WIN32. Please find attached a patch for so. I hit this as well. The problem is really caused by using a debug build of postgres vs a production build of zlib. They use different C runtimes, with different file descriptors. A lot of resources in the Windows world are unfortunately tied to the C runtime, and there can be multiple C runtimes in a single process. Greetings, Andres Freund
Re: refactoring basebackup.c
On Fri, Feb 11, 2022 at 10:29 AM Justin Pryzby wrote: > FYI: there's a couple typos in the last 2 patches. Hmm. OK. But I don't consider "can be optionally specified" incorrect or worse than "can optionally be specified". I do agree that spelling words correctly is a good idea. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pgsql: Add TAP test to automate the equivalent of check_guc
Justin Pryzby writes: > On Fri, Feb 11, 2022 at 09:59:55AM -0500, Tom Lane wrote: >> This seems like a pretty bad idea even if it weren't failing outright. >> We should be examining the version of the file that's in the source >> tree; the one in the installation tree might have version-skew >> problems, if you've not yet done "make install". > My original way used the source tree, but Michael thought it would be an issue > for "installcheck" where the config may not be available. Yeah, you are at risk either way, but in practice nobody is going to be running these TAP tests without a source tree. > This is what I had written: > FROM (SELECT regexp_split_to_table(pg_read_file('postgresql.conf'), '\n') AS > ln) conf That's not using the source tree either, but the copy in the cluster-under-test. I'd fear it to be unstable in the buildfarm, where animals can append whatever they please to the config file being used by tests. regards, tom lane
Re: OpenSSL conflicts with wincrypt.h
=?UTF-8?Q?Juan_Jos=C3=A9_Santamar=C3=ADa_Flecha?= writes: > There is a comment in 'src/backend/libpq/be-secure-openssl.c' addressing > this issue, but I have to explicitly undefine X509_NAME. Please find > attached a patch for so. Um ... why? Shouldn't the #undef in the OpenSSL headers take care of the problem? regards, tom lane
Re: pgsql: Add TAP test to automate the equivalent of check_guc
On Fri, Feb 11, 2022 at 09:59:55AM -0500, Tom Lane wrote: > Christoph Berg writes: > > this test is failing at Debian package compile time: > > Could not open /usr/share/postgresql/15/postgresql.conf.sample: No such > > file or directory at t/003_check_guc.pl line 47. > > > So it's trying to read from /usr/share/postgresql which doesn't exist > > yet at build time. > > > The relevant part of the test is this: > > > # Find the location of postgresql.conf.sample, based on the information > > # provided by pg_config. > > my $sample_file = > > $node->config_data('--sharedir') . '/postgresql.conf.sample'; > > This seems like a pretty bad idea even if it weren't failing outright. > We should be examining the version of the file that's in the source > tree; the one in the installation tree might have version-skew > problems, if you've not yet done "make install". My original way used the source tree, but Michael thought it would be an issue for "installcheck" where the config may not be available. https://www.postgresql.org/message-id/YfTg/WHNLVVygy8v%40paquier.xyz This is what I had written: -- test that GUCS are in postgresql.conf SELECT lower(name) FROM tab_settings_flags WHERE NOT not_in_sample EXCEPT SELECT regexp_replace(ln, '^#?([_[:alpha:]]+) (= .*|[^ ]*$)', '\1') AS guc FROM (SELECT regexp_split_to_table(pg_read_file('postgresql.conf'), '\n') AS ln) conf WHERE ln ~ '^#?[[:alpha:]]' ORDER BY 1; lower - config_file plpgsql.check_asserts plpgsql.extra_errors plpgsql.extra_warnings plpgsql.print_strict_params plpgsql.variable_conflict (6 rows) -- test that lines in postgresql.conf that look like GUCs are GUCs SELECT regexp_replace(ln, '^#?([_[:alpha:]]+) (= .*|[^ ]*$)', '\1') AS guc FROM (SELECT regexp_split_to_table(pg_read_file('postgresql.conf'), '\n') AS ln) conf WHERE ln ~ '^#?[[:alpha:]]' EXCEPT SELECT lower(name) FROM tab_settings_flags WHERE NOT not_in_sample ORDER BY 1; guc --- include include_dir include_if_exists (3 rows) -- Justin
pg_receivewal.exe unhandled exception in zlib1.dll
Hello, If you execute pg_receivewal.exe with the option "--compression-method gzip" it will fail with no error. You can see an error in the db log: 2022-02-10 11:46:32.725 CET [11664][walsender] [pg_receivewal][3/0:0] LOG: could not receive data from client: An existing connection was forcibly closed by the remote host. Tracing the execution you can see: Unhandled exception at 0x7FFEA78C1208 (ucrtbase.dll) in An invalid parameter was passed to a function that considers invalid parameters fatal. ucrtbase.dll!7ff8e8ae36a2() Unknown > zlib1.dll!gz_comp(gz_state * state, int flush) Line 111 C zlib1.dll!gz_write(gz_state * state, const void * buf, unsigned __int64 len) Line 235 C pg_receivewal.exe!dir_write(void * f, const void * buf, unsigned __int64 count) Line 300 C pg_receivewal.exe!ProcessXLogDataMsg(pg_conn * conn, StreamCtl * stream, char * copybuf, int len, unsigned __int64 * blockpos) Line 1150 C pg_receivewal.exe!HandleCopyStream(pg_conn * conn, StreamCtl * stream, unsigned __int64 * stoppos) Line 850 C pg_receivewal.exe!ReceiveXlogStream(pg_conn * conn, StreamCtl * stream) Line 605 C pg_receivewal.exe!StreamLog() Line 636 C pg_receivewal.exe!main(int argc, char * * argv) Line 1005 C The problem comes from the file descriptor passed to gzdopen() in 'src/bin/pg_basebackup/walmethods.c'. Using gzopen() instead, solves the issue without ifdefing for WIN32. Please find attached a patch for so. Regards, Juan José Santamaría Flecha diff --git a/src/bin/pg_basebackup/walmethods.c b/src/bin/pg_basebackup/walmethods.c index a6d08c1..bc0790a 100644 --- a/src/bin/pg_basebackup/walmethods.c +++ b/src/bin/pg_basebackup/walmethods.c @@ -143,7 +143,7 @@ dir_open_for_write(const char *pathname, const char *temp_suffix, size_t pad_to_ #ifdef HAVE_LIBZ if (dir_data->compression_method == COMPRESSION_GZIP) { - gzfp = gzdopen(fd, "wb"); + gzfp = gzopen(tmppath, "wb"); if (gzfp == NULL) { dir_data->lasterrno = errno;
OpenSSL conflicts with wincrypt.h
Hello, When building Postgres using MSVC with Kerberos Version 4.1 and OpenSSL 1.1.1l (both of them, using only one will raise no errors), I see errors like: "C:\postgres\pgsql.sln" (default target) (1) -> "C:\postgres\postgres.vcxproj" (default target) (2) -> (ClCompile target) -> C:\postgres\src\backend\libpq\be-secure-openssl.c(583,43): warning C4047: 'function': 'X509_NAME *' differs in levels of indirection from 'int' [C:\postgres\pos tgres.vcxproj] "C:\postgres\pgsql.sln" (default target) (1) -> "C:\postgres\postgres.vcxproj" (default target) (2) -> (ClCompile target) -> C:\postgres\src\backend\libpq\be-secure-openssl.c(74,35): error C2143: syntax error: missing ')' before '(' [C:\postgres\postgres.vcxproj] There is a comment in 'src/backend/libpq/be-secure-openssl.c' addressing this issue, but I have to explicitly undefine X509_NAME. Please find attached a patch for so. Regards, Juan José Santamaría Flecha diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c index 3d0168a..649bdc3 100644 --- a/src/backend/libpq/be-secure-openssl.c +++ b/src/backend/libpq/be-secure-openssl.c @@ -49,6 +49,9 @@ #ifndef OPENSSL_NO_ECDH #include #endif +#ifdef WIN32 +#undef X509_NAME +#endif #include
Re: refactoring basebackup.c
On Fri, Feb 11, 2022 at 08:35:25PM +0530, Jeevan Ladhe wrote: > Thanks Robert for the bravity :-) FYI: there's a couple typos in the last 2 patches. I added them to my typos branch; feel free to wait until April if you'd prefer to see them fixed in bulk. diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml index 53aa40dcd19..649b91208f3 100644 --- a/doc/src/sgml/ref/pg_basebackup.sgml +++ b/doc/src/sgml/ref/pg_basebackup.sgml @@ -419,7 +419,7 @@ PostgreSQL documentation The compression method can be set to gzip or lz4, or none for no -compression. A compression level can be optionally specified, by +compression. A compression level can optionally be specified, by appending the level number after a colon (:). If no level is specified, the default compression level will be used. If only a level is specified without mentioning an algorithm, @@ -440,7 +440,7 @@ PostgreSQL documentation -Xstream, pg_wal.tar will be compressed using gzip if client-side gzip compression is selected, but will not be compressed if server-side -compresion or LZ4 compresion is selected. +compression or LZ4 compression is selected.
the build farm is ok, but not the hippopotamus (or the jay)
Hi, I was looking over the buildfarm results yesterday and this morning, and things seem to be mostly green, but not entirely. 'hippopotamus' was failing yesterday with strange LDAP-related errors, and is now complaining about pgsql-Git-Dirty, which I assume means that someone is doing something funny on that machine manually. 'jay' has the same maintainer and is also showing some failures. The last failure on jay looks like this: test pgp-armor... ok 126 ms test pgp-decrypt ... ok 192 ms test pgp-encrypt ... ok 611 ms test pgp-compression ... FAILED (test process exited with exit code 127)2 ms test pgp-pubkey-decrypt ... FAILED (test process exited with exit code 127)2 ms test pgp-pubkey-encrypt ... FAILED9 ms test pgp-info ... FAILED8 ms Is there some maintenance happening on these machines that means we should discount these failures, or is something broken? -- Robert Haas EDB: http://www.enterprisedb.com
Re: Merging statistics from children instead of re-sampling everything
On 2/11/22 05:29, Andrey V. Lepikhov wrote: On 2/11/22 03:37, Tomas Vondra wrote: That being said, this thread was not really about foreign partitions, but about re-analyzing inheritance trees in general. And sampling foreign partitions doesn't really solve that - we'll still do the sampling over and over. IMO, to solve the problem we should do two things: 1. Avoid repeated partition scans in the case of an inheritance tree. 2. Avoid re-analyzing everything in the case of active changes in a small subset of partitions. For (1) I can imagine a solution like multiplexing: at the stage of defining which relations to scan, group them and prepare scanning parameters to take multiple samples in one shot. >> It looks like we need separate logic for analysis of partitioned tables - we should form and cache samples on each partition before an analysis. It requires a prototype to understand the complexity of such a solution and can be done separately from (2). I'm not sure I understand what you mean by multiplexing. The term usually means "sending multiple signals at once" but I'm not sure how that applies to this issue. Can you elaborate? I assume you mean something like collecting a sample for a partition once, and then keeping and reusing the sample for future ANALYZE runs, until invalidated in some sense. Yeah, I agree that'd be useful - and not just for partitions, actually. I've been pondering something like that for regular tables, because the sample might be used for estimation of clauses directly. But it requires storing the sample somewhere, and I haven't found a good and simple way to do that. We could serialize that into bytea, or we could create a new fork, or something, but what should that do with oversized attributes (how would TOAST work for a fork) and/or large samples (which might not fit into 1GB bytea)? Task (2) is more difficult to solve. 
Here we can store samples from each partition in the values[] field of pg_statistic or in a specific table which stores a 'most probable values' snapshot of each table. I think storing samples in pg_statistic is problematic, because values[] is subject to the 1GB limit etc. Not an issue for a small MCV list of a single attribute, certainly an issue for larger samples. Even if the data fit, the size of pg_statistic would explode. The most difficult problem here, as you mentioned, is the ndistinct value. Is it possible to store not an exactly calculated value of ndistinct, but an 'expected value', based on analysis of samples and histograms on partitions? Such a value could also solve the problem of estimating the grouping of a SETOP result (joining them, etc.), where we have statistics only on the sources of the union. I think ndistinct is a problem only when merging final estimates. But if we're able to calculate and store some intermediate results, we can easily use HLL and merge that. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
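The HLL idea above can be sketched with a toy HyperLogLog: keep one sketch per partition, and merge them register-wise to get the parent's ndistinct without re-scanning unchanged partitions. This is illustrative code only, not a proposed patch, and the parameters (md5 hashing, 2^10 registers) are arbitrary choices of mine:

```python
import hashlib

class HLL:
    """Toy HyperLogLog. Sketches are mergeable: the elementwise max of
    two partitions' registers estimates the distinct count of their
    union, which is exactly the property needed for partitioned tables."""
    def __init__(self, b=10):
        self.b = b
        self.m = 1 << b          # number of registers
        self.reg = [0] * self.m

    def add(self, value):
        # 64-bit hash; top b bits pick a register, the rest give the rank.
        h = int.from_bytes(hashlib.md5(str(value).encode()).digest()[:8], "big")
        idx = h >> (64 - self.b)
        rest = h & ((1 << (64 - self.b)) - 1)
        rank = (64 - self.b) - rest.bit_length() + 1  # leading zeros + 1
        self.reg[idx] = max(self.reg[idx], rank)

    def merge(self, other):
        # Merging sketches == sketching the union of the inputs.
        self.reg = [max(a, b) for a, b in zip(self.reg, other.reg)]

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        return alpha * self.m * self.m / sum(2.0 ** -r for r in self.reg)
```

A sketch per partition is small (here, 1024 one-byte registers), so it could plausibly be stored alongside per-partition statistics and merged at the parent level, sidestepping the known problem that ndistinct values themselves cannot be summed or averaged across partitions.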
Re: Teach pg_receivewal to use lz4 compression
On Thu, Nov 4, 2021 at 10:47 PM Michael Paquier wrote: > Indeed. My rebase was a bit sloppy here. Hi! Over in http://postgr.es/m/CA+TgmoYUDEJga2qV_XbAZ=pgebaosgfmzz6ac4_srwom_+u...@mail.gmail.com I was noticing that CreateWalTarMethod doesn't support LZ4 compression. It would be nice if it did. I thought maybe the patch on this thread would fix that, but I think maybe it doesn't, because it looks like that's touching the WalDirectoryMethod part of that file, rather than the WalTarMethod part. Is that correct? And, on a related note, Michael, do you plan to get something committed here? -- Robert Haas EDB: http://www.enterprisedb.com
Re: refactoring basebackup.c
Thanks Robert for the bravery :-)

Regards,
Jeevan Ladhe

On Fri, 11 Feb 2022, 20:31 Robert Haas wrote:
> On Fri, Feb 11, 2022 at 7:20 AM Dipesh Pandit wrote:
> > Sure, please find the rebased patch attached.
> >
> > Thanks, I have validated v2 patch on top of rebased patch.
>
> I'm still feeling brave, so I committed this too after fixing a few
> things. In the process I noticed that we don't have support for LZ4
> compression of streamed WAL (cf. CreateWalTarMethod). It would be good
> to fix that. I'm not quite sure whether
> http://postgr.es/m/pm1bMV6zZh9_4tUgCjSVMLxDX4cnBqCDGTmdGlvBLHPNyXbN18x_k00eyjkCCJGEajWgya2tQLUDpvb2iIwlD22IcUIrIt9WnMtssNh-F9k=@pm.me
> is basically what we need or whether something else is required.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
Re: refactoring basebackup.c
On Fri, Feb 11, 2022 at 7:20 AM Dipesh Pandit wrote:
>
> Sure, please find the rebased patch attached.
>
> Thanks, I have validated v2 patch on top of rebased patch.

I'm still feeling brave, so I committed this too after fixing a few
things. In the process I noticed that we don't have support for LZ4
compression of streamed WAL (cf. CreateWalTarMethod). It would be good
to fix that. I'm not quite sure whether
http://postgr.es/m/pm1bMV6zZh9_4tUgCjSVMLxDX4cnBqCDGTmdGlvBLHPNyXbN18x_k00eyjkCCJGEajWgya2tQLUDpvb2iIwlD22IcUIrIt9WnMtssNh-F9k=@pm.me
is basically what we need or whether something else is required.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: pgsql: Add TAP test to automate the equivalent of check_guc
Christoph Berg writes:
> this test is failing at Debian package compile time:
> Could not open /usr/share/postgresql/15/postgresql.conf.sample: No such
> file or directory at t/003_check_guc.pl line 47.

> So it's trying to read from /usr/share/postgresql which doesn't exist
> yet at build time.

> The relevant part of the test is this:

> # Find the location of postgresql.conf.sample, based on the information
> # provided by pg_config.
> my $sample_file =
>   $node->config_data('--sharedir') . '/postgresql.conf.sample';

This seems like a pretty bad idea even if it weren't failing outright.
We should be examining the version of the file that's in the source
tree; the one in the installation tree might have version-skew problems,
if you've not yet done "make install".

			regards, tom lane
Re: Synchronizing slots from primary to standby
On 05.02.22 20:59, Andres Freund wrote:
> On 2022-01-03 14:46:52 +0100, Peter Eisentraut wrote:
>> From ec00dc6ab8bafefc00e9b1c78ac9348b643b8a87 Mon Sep 17 00:00:00 2001
>> From: Peter Eisentraut
>> Date: Mon, 3 Jan 2022 14:43:36 +0100
>> Subject: [PATCH v3] Synchronize logical replication slots from primary
>>  to standby
>
> I've just skimmed the patch and the related threads. As far as I can
> tell this cannot be safely used without the conflict handling in [1],
> is that correct?

This or similar questions have been asked a few times about this or
similar patches, but they always come with some doubt. If we think so,
it would be useful perhaps if we could come up with test cases that
would demonstrate why that other patch/feature is necessary. (I'm not
questioning it personally, I'm just throwing out ideas here.)
Re: Accommodate startup process in a separate ProcState array slot instead of in MaxBackends slots.
On Sat, 16/10/2021 at 16:37 +0530, Bharath Rupireddy wrote:
> On Thu, Oct 14, 2021 at 10:56 AM Fujii Masao wrote:
> > On 2021/10/12 15:46, Bharath Rupireddy wrote:
> > > On Tue, Oct 12, 2021 at 5:37 AM Fujii Masao wrote:
> > > > On 2021/10/12 4:07, Bharath Rupireddy wrote:
> > > > > Hi,
> > > > >
> > > > > While working on [1], it is found that currently the ProcState
> > > > > array doesn't have entries for auxiliary processes, it does have
> > > > > entries for MaxBackends. But the startup process is eating up
> > > > > one slot from MaxBackends. We need to increase the size of the
> > > > > ProcState array by 1 at least for the startup process. The
> > > > > startup process uses a ProcState slot via
> > > > > InitRecoveryTransactionEnvironment->SharedInvalBackendInit. The
> > > > > procState array size is initialized to MaxBackends in
> > > > > SInvalShmemSize.
> > > > >
> > > > > The consequence of not fixing this issue is that the database
> > > > > may hit the error "sorry, too many clients already" soon in
> > > > > SharedInvalBackendInit.
> >
> > On second thought, I wonder if this error could not happen in
> > practice. No? Because autovacuum doesn't work during recovery and the
> > startup process can safely use the ProcState entry for the autovacuum
> > worker process. Also since the minimal allowed value of
> > autovacuum_max_workers is one, the ProcState array is guaranteed to
> > have at least one entry for an autovacuum worker.
> >
> > If this understanding is right, we don't need to enlarge the array
> > and can just update the comment. I don't strongly oppose enlarging
> > the array in master, but I'm not sure it's worth doing that in back
> > branches if the issue can cause no actual error.
>
> Yes, the issue can't happen. The comment in SInvalShmemSize, mentioning
> the startup process always having an extra slot because the autovacuum
> worker is not active during recovery, looks okay. But, is it safe to
> assume that always?
> Do we have a way to specify that in the form of an
> Assert(when_i_am_startup_proc && autovacuum_not_running) (this looks a
> bit dirty though)? Instead, we can just enlarge the array in the master
> and be confident about the fact that the startup process always has one
> dedicated slot.

But this slot won't be used for most of the cluster's life. It will just
be wasted. And `Assert(there_is_startup_proc && autovacuum_not_running)`
has value on its own, hasn't it? So why not add it with a comment.

regards,
Yura Sokolov
Re: Synchronizing slots from primary to standby
On 10.02.22 22:47, Bruce Momjian wrote:
> On Tue, Feb 8, 2022 at 08:27:32PM +0530, Ashutosh Sharma wrote:
>>>> Which means that if e.g. the standby_slot_names GUC differs from
>>>> synchronize_slot_names on the physical replica, the slots
>>>> synchronized on the physical replica are not going to be valid. Or
>>>> if the primary drops its logical slots.
>>>
>>> Should the redo function for the drop replication slot have the
>>> capability to drop it on standby and its subscribers (if any) as
>>> well?
>>
>> Slots are not WAL logged (and shouldn't be). I think you pretty much
>> need the recovery conflict handling infrastructure I referenced
>> upthread, which recognized during replay if a record has a conflict
>> with a slot on a standby. And then on top of that you can build
>> something like this patch.
>
> OK. Understood, thanks Andres.
>
> I would love to see this feature in PG 15. Can someone explain its
> current status? Thanks.

The way I understand it:

1. This feature (probably) depends on the "Minimal logical decoding on
standbys" patch. The details there aren't totally clear (to me). That
patch had some activity lately but I don't see it in a state that it's
nearing readiness.

2. I think the way this (my) patch is currently written needs some
refactoring about how we launch and manage workers. Right now, it's all
mangled together with logical replication, since that is a convenient
way to launch and manage workers, but it really doesn't need to be tied
to logical replication, since it can also be used for other logical
slots.

3. It's an open question how to configure this. My patch shows a very
minimal configuration that allows you to keep all logical slots always
behind one physical slot, which addresses one particular use case. In
general, you might have things like, one set of logical slots should
stay behind one physical slot, another set behind another physical slot,
another set should not care, etc. This could turn into something like
the synchronous replication feature, where it ends up with its own
configuration language.

Each of these is clearly a significant job on its own.
Re: refactoring basebackup.c
On Fri, Feb 11, 2022 at 5:58 AM Jeevan Ladhe wrote:
> > Jeevan, Your v12 patch does not apply on HEAD, it requires a rebase.
>
> Sure, please find the rebased patch attached.

It's Friday today, but I'm feeling brave, and it's still morning here,
so ... committed.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Race condition in TransactionIdIsInProgress
On Fri, 11 Feb 2022 at 08:48, Simon Riggs wrote:
>
> On Fri, 11 Feb 2022 at 06:11, Andres Freund wrote:
>
>> Looks like syncrep will make this a lot worse, because it can
>> drastically increase the window between the TransactionIdCommitTree()
>> and ProcArrayEndTransaction() due to the SyncRepWaitForLSN()
>> inbetween. But at least it might make it easier to write tests
>> exercising this scenario...
>
> Agreed
>
> TransactionIdIsKnownCompleted(xid) is only broken because the single
> item cache is set too early in some cases. The single item cache is
> important for performance, so we just need to be more careful about
> setting the cache. Something like this...

fix_cachedFetchXid.v1.patch prevents the cache being set, but this
fails! Not worked out why, yet.

just_remove_TransactionIdIsKnownCompleted_call.v1.patch just removes the
known offending call, passes make check, but IMHO leaves the same error
just as likely by other callers.

--
Simon Riggs    http://www.EnterpriseDB.com/

fix_cachedFetchXid.v1.patch
Description: Binary data

just_remove_TransactionIdIsKnownCompleted_call.v1.patch
Description: Binary data
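[A toy model of the "cache set too early" hazard discussed above — this is deliberately not the real TransactionIdIsInProgress/TransactionIdIsKnownCompleted code; the names `clog_committed`, `procarray_running`, and `cached_xid` are illustrative stand-ins. The point it demonstrates: if a clog lookup primes a "known completed" cache while the xid is still listed in the proc array (the window widened by SyncRepWaitForLSN), a later in-progress check is short-circuited to the wrong answer.]

```python
# Illustrative state: commit becomes visible in "clog" before the xid
# is removed from the "proc array" (as in the race described above).
clog_committed = set()
procarray_running = set()

cached_xid = None  # single-item "known completed" cache

def is_in_progress(xid, set_cache_early):
    """Toy in-progress check with a known-completed fast path."""
    global cached_xid
    if xid == cached_xid:
        return False               # cache short-circuit: "completed"
    if xid in clog_committed:
        # Buggy variant caches on any clog hit; careful variant caches
        # only once the proc array no longer lists the xid.
        if set_cache_early or xid not in procarray_running:
            cached_xid = xid
    return xid in procarray_running

xid = 100
procarray_running.add(xid)
clog_committed.add(xid)            # committed in clog...
# ...but still in the proc array (the SyncRepWaitForLSN window).

# Buggy ordering: the first call primes the cache too early...
assert is_in_progress(xid, set_cache_early=True) is True
# ...and the second call gives the wrong answer via the cache:
assert is_in_progress(xid, set_cache_early=True) is False   # WRONG

# Careful ordering: the cache is only set once the xid has left the
# proc array, so repeated checks stay correct.
cached_xid = None
assert is_in_progress(xid, set_cache_early=False) is True
assert is_in_progress(xid, set_cache_early=False) is True
procarray_running.discard(xid)     # ProcArrayEndTransaction()
assert is_in_progress(xid, set_cache_early=False) is False
```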
Re: Identify missing publications from publisher while create/alter subscription.
I have spent a little time looking at the latest patch. The patch looks
to be in good shape as it has already been reviewed by many people here,
although I do have some comments. Please take a look and let me know
your thoughts.

+   /* Try to connect to the publisher. */
+   wrconn = walrcv_connect(sub->conninfo, true, sub->name, );
+   if (!wrconn)
+       ereport(ERROR,
+               (errmsg("could not connect to the publisher: %s", err)));

I think it would be good to also include the errcode
(ERRCODE_CONNECTION_FAILURE) here?

--

@@ -514,6 +671,8 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
    PG_TRY();
    {
+       check_publications(wrconn, publications, opts.validate_publication);
+

Instead of passing the opts.validate_publication argument to the
check_publication function, why can't we first check if this option is
set or not and accordingly call the check_publication function? For
other options I see that it has been done in a similar way, e.g. the
checks for opts.connect or opts.refresh or opts.enabled etc.

--

The above comment also applies to:

@@ -968,6 +1130,8 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
            replaces[Anum_pg_subscription_subpublications - 1] = true;
            update_tuple = true;
+           connect_and_check_pubs(sub, stmt->publication,
+                                  opts.validate_publication);

--

+
+         When true, the command verifies if all the specified publications
+         that are being subscribed to are present in the publisher and throws
+         an error if any of the publication doesn't exist. The default is
+         false.

publication -> publications (in the 4th line: throw an error if any of
the publications doesn't exist). This applies to both CREATE and ALTER
subscription commands.

--

With Regards,
Ashutosh Sharma.

On Sat, Nov 13, 2021 at 12:50 PM vignesh C wrote:
>
> On Wed, Nov 10, 2021 at 11:16 AM Bharath Rupireddy wrote:
> >
> > On Tue, Nov 9, 2021 at 9:27 PM vignesh C wrote:
> > > Attached v12 version is rebased on top of Head.
> >
> > Thanks for the patch.
Here are some comments on v12: > > > > 1) I think ERRCODE_TOO_MANY_ARGUMENTS isn't the right error code, the > > ERRCODE_UNDEFINED_OBJECT is more meaningful. Please change. > > + ereport(ERROR, > > + (errcode(ERRCODE_TOO_MANY_ARGUMENTS), > > + errmsg_plural("publication %s does not exist in the publisher", > > +"publications %s do not exist in the publisher", > > > > The existing code using > > ereport(ERROR, > > (errcode(ERRCODE_UNDEFINED_OBJECT), > > errmsg("subscription \"%s\" does not exist", subname))); > > Modified > > > 2) Typo: It is "One of the specified publications exists." > > +# One of the specified publication exist. > > Modified > > > 3) I think we can remove the test case "+# Specified publication does > > not exist." because the "+# One of the specified publication exist." > > covers the code. > > Modified > > > 4) Do we need the below test case? Even with refresh = false, it does > > call connect_and_check_pubs() right? Please remove it. > > +# Specified publication does not exist with refresh = false. > > +($ret, $stdout, $stderr) = $node_subscriber->psql('postgres', > > + "ALTER SUBSCRIPTION mysub1 SET PUBLICATION non_existent_pub > > WITH (REFRESH = FALSE, VALIDATE_PUBLICATION = TRUE)" > > +); > > +ok( $stderr =~ > > + m/ERROR: publication "non_existent_pub" does not exist in > > the publisher/, > > + "Alter subscription for non existent publication fails"); > > + > > Modified > > > 5) Change the test case names to different ones instead of the same. 
> > Have something like: > > "Create subscription fails with single non-existent publication"); > > "Create subscription fails with multiple non-existent publications"); > > "Create subscription fails with mutually exclusive options"); > > "Alter subscription add publication fails with non-existent publication"); > > "Alter subscription set publication fails with non-existent publication"); > > "Alter subscription set publication fails with connection to a > > non-existent database"); > > Modified > > Thanks for the comments, the attached v13 patch has the fixes for the same. > > Regards, > Vignesh
Re: Assertion failure in WaitForWALToBecomeAvailable state machine
On Fri, Feb 11, 2022 at 6:22 PM Bharath Rupireddy wrote:
>
> On Fri, Feb 11, 2022 at 3:33 PM Dilip Kumar wrote:
>
> IIUC, the issue can happen while the walreceiver failed to get WAL
> from primary for whatever reasons and its status is not
> WALRCV_STOPPING or WALRCV_STOPPED, and the startup process moved ahead
> in WaitForWALToBecomeAvailable for reading from archive which ends up
> in this assertion failure. ISTM a rare scenario and it depends on
> what walreceiver does between failure to get WAL from primary and
> updating status to WALRCV_STOPPING or WALRCV_STOPPED.
>
> If the above race condition is a serious problem, if one thinks at
> least it is a problem at all, that needs to be fixed.

I don't think we can design software which has open race conditions,
even if they occur rarely :)

> I don't think just making InstallXLogFileSegmentActive false is
> enough. By looking at the comment [1], it doesn't make sense to move
> ahead for restoring from the archive location without the WAL receiver
> fully stopped. IMO, the real fix is to just remove WalRcvStreaming()
> and call XLogShutdownWalRcv() unconditionally. Anyways, we have the
> Assert(!WalRcvStreaming()); down below. I don't think it will create
> any problem.

If WalRcvStreaming() is returning false, that means the walreceiver is
already stopped, so we don't need to shut it down externally. I think,
like we are setting this flag outside start streaming, we can reset it
also outside XLogShutdownWalRcv. Or I am fine even if we call
XLogShutdownWalRcv(), because if the walreceiver is stopped it will just
reset the flag we want it to reset and will do nothing else.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: postgres_fdw: commit remote (sub)transactions in parallel during pre-commit
On Tue, Feb 8, 2022 at 3:49 AM Fujii Masao wrote:
> Here are the review comments for 0001 patch.
>
> I got the following compiler warning.
>
> [16:58:07.120] connection.c: In function ‘pgfdw_finish_pre_commit_cleanup’:
> [16:58:07.120] connection.c:1726:4: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
> [16:58:07.120]  1726 |    PGresult *res;
> [16:58:07.120]       |    ^~~~

Sorry, I didn’t notice this, because my compiler doesn’t produce it. I
tried to fix it. Attached is an updated version of the patch set. I hope
this works for you.

> +       /* Ignore errors in the DEALLOCATE (see note above) */
> +       if ((res = PQgetResult(entry->conn)) != NULL)
>
> Doesn't PQgetResult() need to be called repeatedly until it returns
> NULL or the connection is lost because there can be more than one
> messages to receive?

Yeah, we would receive a single message here, but PQgetResult must be
called repeatedly until it returns NULL (see the documentation note
about it in libpq.sgml); else the PQtransactionStatus of the connection
would remain PQTRANS_ACTIVE, causing the connection to be closed at
transaction end, because we do this in pgfdw_reset_xact_state called
from pgfdw_xact_callback:

    /*
     * If the connection isn't in a good idle state, it is marked as
     * invalid or keep_connections option of its server is disabled, then
     * discard it to recover.  Next GetConnection will open a new
     * connection.
     */
    if (PQstatus(entry->conn) != CONNECTION_OK ||
        PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
        entry->changing_xact_state ||
        entry->invalidated ||
        !entry->keep_connections)
    {
        elog(DEBUG3, "discarding connection %p", entry->conn);
        disconnect_pg_server(entry);
    }

But I noticed a brown-paper-bag bug in the bit you showed above: the if
test should be modified into a while loop. :-( I fixed this in the
attached.

> +       if (pending_deallocs)
> +       {
> +               foreach(lc, pending_deallocs)
>
> If pending_deallocs is NIL, we don't enter this foreach loop. So
> probably "if (pending_deallocs)" seems not necessary.

Yeah, I think we could omit the if test, but I added it to match other
places (see e.g., foreign_grouping_ok() in postgres_fdw.c). It looks
cleaner to me to have it before the loop.

>                 entry->keep_connections = defGetBoolean(def);
> +       if (strcmp(def->defname, "parallel_commit") == 0)
> +               entry->parallel_commit = defGetBoolean(def);
>
> Isn't it better to use "else if" here, instead?

Yeah, that would be better. Done.

> +static void do_sql_command_begin(PGconn *conn, const char *sql);
> +static void do_sql_command_end(PGconn *conn, const char *sql);
>
> To simplify the code more, I'm tempted to change do_sql_command() so
> that it just calls the above two functions, instead of calling
> PQsendQuery() and pgfw_get_result() directly. Thought? If we do this,
> probably we also need to change do_sql_command_end() so that it
> accepts a boolean flag which specifies whether PQconsumeInput() is
> called or not, as follows.

Done. Actually, I was planning to do this for consistency with a similar
refactoring for pgfdw_cancel_query and pgfdw_exec_cleanup_query that had
been done in the parallel-abort patch.

I tweaked comments/docs a little bit as well.

Thanks for reviewing!

Best regards,
Etsuro Fujita

v4-0001-postgres-fdw-Add-support-for-parallel-commit.patch
Description: Binary data

v4-0002-postgres_fdw-Minor-cleanup-for-pgfdw_abort_cleanup.patch
Description: Binary data

v4-0003-postgres-fdw-Add-support-for-parallel-abort.patch
Description: Binary data
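[A toy model of the drain rule discussed in this exchange — this is not libpq; `FakeConn`, `get_result`, and `transaction_status` are illustrative stand-ins for a connection, PQgetResult(), and PQtransactionStatus(). It shows why a single `if`-fetch leaves the connection "active" while a `while`-loop drain does not.]

```python
class FakeConn:
    """Minimal stand-in for a libpq connection with queued results."""

    def __init__(self, results):
        self._pending = list(results)
        self._done_seen = False

    def get_result(self):            # stand-in for PQgetResult()
        if self._pending:
            return self._pending.pop(0)
        self._done_seen = True       # the terminating NULL has been read
        return None

    def transaction_status(self):    # stand-in for PQtransactionStatus()
        return "IDLE" if self._done_seen else "ACTIVE"

conn = FakeConn(["DEALLOCATE ok"])

# Fetching a single result (the buggy `if`) leaves the connection ACTIVE,
# so it would be discarded at transaction end:
res = conn.get_result()
assert res is not None and conn.transaction_status() == "ACTIVE"

# Draining with a loop (the fixed `while`) reaches IDLE:
while (res := conn.get_result()) is not None:
    pass                             # PQclear(res) in real code
assert conn.transaction_status() == "IDLE"
```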
Re: Assertion failure in WaitForWALToBecomeAvailable state machine
On Fri, Feb 11, 2022 at 3:33 PM Dilip Kumar wrote:
>
> Hi,
>
> The problem is that whenever we are going for streaming we always set
> XLogCtl->InstallXLogFileSegmentActive to true, but while switching
> from streaming to archive we do not always reset it so it hits
> assertion in some cases. Basically we reset it inside
> XLogShutdownWalRcv() but while switching from the streaming mode we
> only call it conditionally if WalRcvStreaming(). But it is very much
> possible that even before we call WalRcvStreaming() the walreceiver
> might have set walrcv->walRcvState to WALRCV_STOPPED. So now
> WalRcvStreaming() will return false. So I agree now we do not want to
> really shut down the walreceiver but who will reset the flag?
>
> I just ran some tests on primary and attached the walreceiver to gdb
> and waited for it to exit with timeout and then the recovery process
> hit the assertion.
>
> 2022-02-11 14:33:56.976 IST [60978] FATAL: terminating walreceiver due to timeout
> cp: cannot stat ‘/home/dilipkumar/work/PG/install/bin/wal_archive/0002.history’: No such file or directory
> 2022-02-11 14:33:57.002 IST [60973] LOG: restored log file "00010003" from archive
> TRAP: FailedAssertion("!XLogCtl->InstallXLogFileSegmentActive", File: "xlog.c", Line: 3823, PID: 60973)
>
> I have just applied a quick fix and that solved the issue, basically
> if the last failed source was streaming and the WalRcvStreaming() is
> false then just reset this flag.

IIUC, the issue can happen while the walreceiver failed to get WAL from
primary for whatever reasons and its status is not WALRCV_STOPPING or
WALRCV_STOPPED, and the startup process moved ahead in
WaitForWALToBecomeAvailable for reading from archive, which ends up in
this assertion failure. ISTM a rare scenario, and it depends on what the
walreceiver does between the failure to get WAL from primary and
updating its status to WALRCV_STOPPING or WALRCV_STOPPED.
If the above race condition is a serious problem, if one thinks at least
it is a problem at all, that needs to be fixed. I don't think just
making InstallXLogFileSegmentActive false is enough. By looking at the
comment [1], it doesn't make sense to move ahead for restoring from the
archive location without the WAL receiver fully stopped. IMO, the real
fix is to just remove WalRcvStreaming() and call XLogShutdownWalRcv()
unconditionally. Anyways, we have the Assert(!WalRcvStreaming()); down
below. I don't think it will create any problem.

[1]
        /*
         * Before we leave XLOG_FROM_STREAM state, make sure that
         * walreceiver is not active, so that it won't overwrite
         * WAL that we restore from archive.
         */
        if (WalRcvStreaming())
            XLogShutdownWalRcv();

Regards,
Bharath Rupireddy.
Re: Database-level collation version tracking
On Fri, Feb 11, 2022 at 12:07:02PM +0100, Peter Eisentraut wrote:
> On 10.02.22 12:08, Julien Rouhaud wrote:
> > > +        errhint("Rebuild all objects affected by collation in the template database and run "
> > > +                "ALTER DATABASE %s REFRESH COLLATION VERSION, "
> > > +                "or build PostgreSQL with the right library version.",
> > > +                quote_identifier(dbtemplate;
> >
> > After a second read I think the messages are slightly ambiguous. What
> > do you think about specifying the problematic collation name and
> > provider?
> >
> > For now we only support the libc default collation so users will
> > probably have to reindex almost everything in that database (not sure
> > if the versioning is more fine grained on Windows), but we should
> > probably still specify "affected by libc collation" in the errhint so
> > they have a chance to avoid unnecessary reindexing.
>
> I think accurate would be something like "objects using the default
> collation", since objects using a specific collation are not meant,
> even if they use the same provider.

Technically, if the objects explicitly use the same collation as the
default collation they should be impacted the same way, but agreed.

> > > +/*
> > > + * ALTER DATABASE name REFRESH COLLATION VERSION
> > > + */
> > > +ObjectAddress
> > > +AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
> >
> > I'm wondering why you changed this function to return an ObjectAddress
> > rather than an Oid? There's no event trigger support for ALTER
> > DATABASE, and the rest of similar utility commands also return Oid.
>
> Hmm, I was looking at RenameDatabase() and AlterDatabaseOwner(), which
> return ObjectAddress.

Apparently I managed to only check AlterDatabase and AlterDatabaseSet,
which both return an Oid. Maybe we could also update those two to also
return an ObjectAddress, for consistency?
Re: [PoC] Improve dead tuple storage for lazy vacuum
Hi, Today I noticed the inefficiencies of our dead tuple storage once again, and started theorizing about a better storage method; which is when I remembered that this thread exists, and that this thread already has amazing results. Are there any plans to get the results of this thread from PoC to committable? Kind regards, Matthias van de Meent
Re: refactoring basebackup.c
> Sure, please find the rebased patch attached. Thanks, I have validated v2 patch on top of rebased patch. Thanks, Dipesh
Re: Database-level collation version tracking
On 10.02.22 12:08, Julien Rouhaud wrote:
> > +        errhint("Rebuild all objects affected by collation in the template database and run "
> > +                "ALTER DATABASE %s REFRESH COLLATION VERSION, "
> > +                "or build PostgreSQL with the right library version.",
> > +                quote_identifier(dbtemplate;
>
> After a second read I think the messages are slightly ambiguous. What
> do you think about specifying the problematic collation name and
> provider?
>
> For now we only support the libc default collation so users will
> probably have to reindex almost everything in that database (not sure
> if the versioning is more fine grained on Windows), but we should
> probably still specify "affected by libc collation" in the errhint so
> they have a chance to avoid unnecessary reindexing.

I think accurate would be something like "objects using the default
collation", since objects using a specific collation are not meant, even
if they use the same provider.

> +/*
> + * ALTER DATABASE name REFRESH COLLATION VERSION
> + */
> +ObjectAddress
> +AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
>
> I'm wondering why you changed this function to return an ObjectAddress
> rather than an Oid? There's no event trigger support for ALTER
> DATABASE, and the rest of similar utility commands also return Oid.

Hmm, I was looking at RenameDatabase() and AlterDatabaseOwner(), which
return ObjectAddress.
Re: refactoring basebackup.c
> Jeevan, Your v12 patch does not apply on HEAD, it requires a rebase.

Sure, please find the rebased patch attached.

Regards,
Jeevan

On Fri, 11 Feb 2022 at 14:13, Dipesh Pandit wrote:
> Hi,
>
> Thanks for the feedback, I have incorporated the suggestions
> and updated a new patch. PFA v2 patch.
>
> > I think similar to bbstreamer_lz4_compressor_content() in
> > bbstreamer_lz4_decompressor_content() we can change len to avail_in.
>
> In bbstreamer_lz4_decompressor_content(), we are modifying avail_in
> based on the number of bytes decompressed in each iteration. I think
> we cannot replace it with "len" here.
>
> Jeevan, Your v12 patch does not apply on HEAD, it requires a
> rebase. I have applied it on commit
> 400fc6b6487ddf16aa82c9d76e5cfbe64d94f660
> to validate my v2 patch.
>
> Thanks,
> Dipesh

v13-0001-Add-a-LZ4-compression-method-for-server-side-compres.patch
Description: Binary data
Re: [BUG]Update Toast data failure in logical replication
On Fri, Feb 11, 2022 at 12:00 PM tanghy.f...@fujitsu.com wrote: > > Attached the patches which fixed the above two comments and the first comment > in > my previous mail [1], the rest is the same as before. > I ran the tests on all branches, they all passed as expected. > Thanks, these look good to me. I'll push these early next week (Monday) unless there are any more suggestions or comments. -- With Regards, Amit Kapila.
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
> So I think knowing what bad it is to have this feature is the key
> point to discussion now.

I re-read the discussion from 2015 [1] and the topics below were raised
for the above question. Here is a summary, for easy discussion.

From the planner aspect:

> While I've only read your description of the patch not the patch
> itself, the search methodology you propose seems pretty brute-force
> and unlikely to solve that issue. It's particularly important to avoid
> O(N^2) behaviors when there are N expressions ...

The patch has 3 steps in general:

1). Gather the filter_qual_list during deconstruct_jointree. Only
non-mergeable quals are gathered here.

2). After root->eq_classes is built, scan each of the above quals to
find out if there is an EC match; if yes, add it to the EC. There are
some fast paths here, like ec->relids and em->em_relids.

3). Compose the qual in ec_filter with the members in ec_members, then
distribute it to the relations. This step takes the most cycles of this
feature, and it is the most important part of this feature as well.
Fortunately, thousands of partitions of a table would not make it worse
since they are not generated at that stage. So I'd believe the number of
ECs, or EMs in an EC, would be pretty small in common cases.

> time would be spent on searches for matching subexpressions whether or
> not anything was learned (and often nothing would be learned).

This is about cases like "SELECT * FROM t1, t2 WHERE t1.a = t2.a and
t1.b > 3". In this case, we still need to go through steps 1 & 2, all
the fast paths don't work, and the equal() is unavoidable. However step
3 can be skipped. If we want to improve this, could we maintain an
attr_eq_indexes in RelOptInfos which indicates if the given attribute
appears in any one of the EC members?

From the executor aspect:

> The reason why the answer isn't automatically "all of them" is
> because, first of all, it's possible that enforcing the condition at a
> particular table costs more to filter out the rows that we save in
> execution time at higher levels of the plan tree. For example,
> consider A JOIN B ON A.X = B.X WHERE A.X > 100. It might be that the
> range of A.X is [0,101] but the range of B.X is [100,200]; so
> enforcing the inequality against A is very selective but enforcing it
> against B filters out basically nothing.

I think we can classify this as: when we push down / execute a qual, the
qual takes lots of cycles but doesn't filter many rows.

> A first cut might be to enforce the inequality against the relation
> where it's believed to be most selective, equivalence-class column
> mentioned in the inequality provided that the selectivity is thought
> to be above some threshold ... but I'm not sure this is very
> principled,

I can only add +1 after some deep thought.

> Furthermore, there are some cases involving parameterized paths where
> enforcing the inequality multiple times is definitely bad: for
> example, if we've got a nested loop where the outer side is a seq scan
> that enforces the condition and the inner side is an index probe, it
> is just a waste to retest it on the inner side. We already know that
> the outer row passes the inequality, so the inner row will necessarily
> pass also. This doesn't apply to merge or hash joins, and it also
> doesn't apply to all nested loops: scans that aren't parameterized by
> the equivalence-class column can still benefit from separate
> enforcement of the inequality.

> I guess that could be fixed by somehow marking these pushed quals as
> optional and having parameterised scans ignore optional quals.

This has been done by committing 4.

> Now, all that having been said, I think this is a pretty important
> optimization. Lots of people have asked for it, and I think it would
> be worth expending some brainpower to try to figure out a way to be
> smarter than we are now, which is, in a nutshell, as dumb as possible.

+1. I asked a customer to add the derivable quals manually for 10+
tables per query last year, and they gained great results.

Anyone still have interest in this? Or is a better solution really
possible? Or is the current method too bad to rescue?

--
Best Regards
Andy Fan
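[An illustrative sketch of the qual-derivation idea described in step 3 above — this is not the patch's C code; `ecs` and `derive_quals` are hypothetical names, and quals are simplified to (column, operator, constant) triples. Given equivalence classes built from mergeable join quals, a non-mergeable filter qual on one member is distributed to every other member.]

```python
# Equivalence classes derived from join quals such as
# t1.a = t2.a AND t2.a = t3.a, and t1.b = t2.c:
ecs = [{"t1.a", "t2.a", "t3.a"}, {"t1.b", "t2.c"}]

def derive_quals(filter_quals):
    """For each (column, op, const) qual, emit copies for the other
    members of the column's equivalence class (if any)."""
    derived = []
    for col, op, const in filter_quals:
        for ec in ecs:
            if col in ec:
                derived.extend((m, op, const) for m in sorted(ec - {col}))
    return derived

# WHERE ... AND t1.a > 100 lets the planner also filter t2 and t3 early:
assert derive_quals([("t1.a", ">", 100)]) == [
    ("t2.a", ">", 100),
    ("t3.a", ">", 100),
]
```

[As the quoted discussion points out, *deriving* the quals is the easy part; deciding where actually enforcing them pays off (selectivity vs. per-row evaluation cost, and skipping redundant rechecks on parameterized inner sides) is the hard part.]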
Assertion failure in WaitForWALToBecomeAvailable state machine
Hi,

The problem is that whenever we are going for streaming we always set
XLogCtl->InstallXLogFileSegmentActive to true, but while switching from
streaming to archive we do not always reset it, so it hits an assertion
in some cases. Basically, we reset it inside XLogShutdownWalRcv(), but
while switching from streaming mode we only call that conditionally, if
WalRcvStreaming(). But it is very much possible that even before we call
WalRcvStreaming() the walreceiver might have set walrcv->walRcvState to
WALRCV_STOPPED. So now WalRcvStreaming() will return false. So I agree
now we do not want to really shut down the walreceiver, but who will
reset the flag?

I just ran some tests on primary, attached the walreceiver to gdb, and
waited for it to exit with a timeout; then the recovery process hit the
assertion.

2022-02-11 14:33:56.976 IST [60978] FATAL:  terminating walreceiver due to timeout
cp: cannot stat ‘/home/dilipkumar/work/PG/install/bin/wal_archive/0002.history’: No such file or directory
2022-02-11 14:33:57.002 IST [60973] LOG:  restored log file "00010003" from archive
TRAP: FailedAssertion("!XLogCtl->InstallXLogFileSegmentActive", File: "xlog.c", Line: 3823, PID: 60973)

I have just applied a quick fix and that solved the issue: basically, if
the last failed source was streaming and WalRcvStreaming() is false,
then just reset this flag.

@@ -12717,6 +12717,12 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
                     */
                    if (WalRcvStreaming())
                        XLogShutdownWalRcv();
+                   else
+                   {
+                       LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+                       XLogCtl->InstallXLogFileSegmentActive = false;
+                       LWLockRelease(ControlFileLock);
+                   }

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
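[A toy model of the flag lifecycle in the report above — this is not xlog.c; `Recovery`, `install_active`, and the method names are illustrative stand-ins. It shows the stale-flag scenario: the walreceiver sets the install-active flag on start, but if it stops on its own (e.g. timeout), switching back to archive must still clear the flag, which the conditional shutdown alone does not do.]

```python
class Recovery:
    """Toy startup-process state for the streaming/archive switch."""

    def __init__(self):
        self.install_active = False   # stand-in for InstallXLogFileSegmentActive
        self.streaming = False        # stand-in for WalRcvStreaming()

    def start_streaming(self):
        self.install_active = True
        self.streaming = True

    def walreceiver_dies(self):
        # e.g. "terminating walreceiver due to timeout":
        # streaming stops, but the install-active flag is left set.
        self.streaming = False

    def switch_to_archive(self, with_fix):
        if self.streaming:
            # XLogShutdownWalRcv() path: stops streaming and resets the flag
            self.streaming = False
            self.install_active = False
        elif with_fix:
            # the quick fix: reset the flag even when the walreceiver
            # already stopped on its own
            self.install_active = False
        assert not self.streaming     # Assert(!WalRcvStreaming())

# Without the fix, the stale flag survives the switch (the TRAP above):
r = Recovery()
r.start_streaming()
r.walreceiver_dies()
r.switch_to_archive(with_fix=False)
assert r.install_active               # stale flag -> assertion failure later

# With the fix, the flag is cleared on the same path:
r2 = Recovery()
r2.start_streaming()
r2.walreceiver_dies()
r2.switch_to_archive(with_fix=True)
assert not r2.install_active
```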
[PATCH] minor reloption regression tests improvement
I'd like to suggest a patch for the reloption regression tests. This patch tests a case that is rarely met in real life: the reloptions contain some illegal option (as a result of a malfunction, an extension downgrade, or something similar), and the user tries to remove this option using RESET. The current postgres behaviour is to actually remove this option. Like: UPDATE pg_class SET reloptions = '{illegal_option=4}' WHERE oid = 'reloptions_test'::regclass; ALTER TABLE reloptions_test RESET (illegal_option); Why this should be tested: 1. It is what postgres actually does now. 2. This behaviour is reasonable: a DB user can fix the problem without updating pg_class, needing only the rights to change his own table. 3. It is better to get a test failure if this behaviour is accidentally changed. -- Nikolay Shaplov aka Nataraj Fuzzing Engineer at Postgres Professional Matrix IM: @dhyan:nataraj.su diff --git a/src/test/regress/expected/reloptions.out b/src/test/regress/expected/reloptions.out index b6aef6f654..9f460a7e60 100644 --- a/src/test/regress/expected/reloptions.out +++ b/src/test/regress/expected/reloptions.out @@ -183,6 +183,17 @@ SELECT reloptions FROM pg_class WHERE oid = ( {autovacuum_vacuum_cost_delay=23} (1 row) +-- Can reset option that is not allowed, but for some reason is already set +UPDATE pg_class + SET reloptions = '{fillfactor=13,autovacuum_enabled=false,illegal_option=4}' + WHERE oid = 'reloptions_test'::regclass; +ALTER TABLE reloptions_test RESET (illegal_option); +SELECT reloptions FROM pg_class WHERE oid = 'reloptions_test'::regclass; +reloptions +-- + {fillfactor=13,autovacuum_enabled=false} +(1 row) + -- -- CREATE INDEX, ALTER INDEX for btrees -- diff --git a/src/test/regress/sql/reloptions.sql b/src/test/regress/sql/reloptions.sql index 4252b0202f..fadce3384d 100644 --- a/src/test/regress/sql/reloptions.sql +++ b/src/test/regress/sql/reloptions.sql @@ -105,6 +105,13 @@ SELECT reloptions FROM pg_class WHERE oid = 'reloptions_test'::regclass; SELECT reloptions FROM pg_class WHERE 
oid = ( SELECT reltoastrelid FROM pg_class WHERE oid = 'reloptions_test'::regclass); +-- Can reset option that is not allowed, but for some reason is already set +UPDATE pg_class + SET reloptions = '{fillfactor=13,autovacuum_enabled=false,illegal_option=4}' + WHERE oid = 'reloptions_test'::regclass; +ALTER TABLE reloptions_test RESET (illegal_option); +SELECT reloptions FROM pg_class WHERE oid = 'reloptions_test'::regclass; + -- -- CREATE INDEX, ALTER INDEX for btrees --
Re: pgsql: Add TAP test to automate the equivalent of check_guc
Re: Michael Paquier > Add TAP test to automate the equivalent of check_guc > > src/backend/utils/misc/check_guc is a script that cross-checks the > consistency of the GUCs with postgresql.conf.sample, making sure that Hi, this test is failing at Debian package compile time: 00:55:07 ok 18 - drop tablespace 2 00:55:07 ok 19 - drop tablespace 3 00:55:07 ok 20 - drop tablespace 4 00:55:07 ok 00:55:07 # Looks like your test exited with 2 before it could output anything. 00:55:09 t/003_check_guc.pl .. 00:55:09 1..3 00:55:09 Dubious, test returned 2 (wstat 512, 0x200) 00:55:09 Failed 3/3 subtests 00:55:09 00:55:09 Test Summary Report 00:55:09 --- 00:55:09 t/003_check_guc.pl(Wstat: 512 Tests: 0 Failed: 0) 00:55:09 Non-zero exit status: 2 00:55:09 Parse errors: Bad plan. You planned 3 tests but ran 0. 00:55:09 Files=3, Tests=62, 6 wallclock secs ( 0.05 usr 0.01 sys + 3.31 cusr 1.35 csys = 4.72 CPU) 00:55:09 Result: FAIL 00:55:09 make[3]: *** [/<>/build/../src/makefiles/pgxs.mk:457: check] Error 1 00:55:09 make[3]: Leaving directory '/<>/build/src/test/modules/test_misc' ### Starting node "main" # Running: pg_ctl -w -D /srv/projects/postgresql/pg/master/build/src/test/modules/test_misc/tmp_check/t_003_check_guc_main_data/pgdata -l /srv/projects/postgresql/pg/master/build/src/test/modules/test_misc/tmp_check/log/003_check_guc_main.log -o --cluster-name=main start waiting for server to start done server started # Postmaster PID for node "main" is 1432398 Could not open /usr/share/postgresql/15/postgresql.conf.sample: No such file or directory at t/003_check_guc.pl line 47. ### Stopping node "main" using mode immediate So it's trying to read from /usr/share/postgresql which doesn't exist yet at build time. The relevant part of the test is this: # Find the location of postgresql.conf.sample, based on the information # provided by pg_config. my $sample_file = $node->config_data('--sharedir') . 
'/postgresql.conf.sample'; This never caused any problem in the 12+ years we have been doing it: Debian patches pg_config not to be relocatable, since we install into /usr/lib/postgresql/NN and /usr/share/postgresql/NN, i.e. not a single prefix. https://salsa.debian.org/postgresql/postgresql/-/blob/15/debian/patches/50-per-version-dirs.patch Christoph
Re: unlogged sequences
On 25.06.19 20:37, Andres Freund wrote: I.e. I think it'd be better if we just added a fork argument to fill_seq_with_data(), and then do something like smgrcreate(srel, INIT_FORKNUM, false); log_smgrcreate(&rel->rd_node, INIT_FORKNUM); fill_seq_with_data(rel, tuple, INIT_FORKNUM); and add a FlushBuffer() to the end of fill_seq_with_data() if writing INIT_FORKNUM. The if (RelationNeedsWAL(rel)) would need an || forkNum == INIT_FORKNUM. Now that logical replication of sequences is nearing completion, I figured it would be suitable to dust off this old discussion on unlogged sequences, mainly so that sequences attached to unlogged tables can be excluded from replication. Attached is a new patch that incorporates the above suggestions, with some slight refactoring. The only thing I didn't/couldn't do was to call FlushBuffer(), since that is not an exported function. So this still calls FlushRelationBuffers(), which was previously not liked. Ideas welcome. I have also re-tested the crash reported by Michael Paquier in the old discussion and added test cases that catch it. The rest of the patch is just documentation, DDL support, client support, etc. What is not done yet is support for ALTER SEQUENCE ... SET LOGGED/UNLOGGED. This is a bit of a problem because: 1. The new behavior is that a serial/identity sequence of a new unlogged table is now also unlogged. 2. There is also a new restriction that changing a table to logged is not allowed if it is linked to an unlogged sequence. (This is IMO similar to the existing restriction on linking mixed logged/unlogged tables via foreign keys.) 3. Thus, currently, you can't create an unlogged table with a serial/identity column and then change it to logged. This is reflected in some of the test changes I had to make in alter_table.sql to work around this. These should eventually go away. Interestingly, there is grammar support for ALTER SEQUENCE ... 
SET LOGGED/UNLOGGED because there is this: | ALTER SEQUENCE qualified_name alter_table_cmds { AlterTableStmt *n = makeNode(AlterTableStmt); n->relation = $3; n->cmds = $4; n->objtype = OBJECT_SEQUENCE; n->missing_ok = false; $$ = (Node *)n; } But it is rejected later in tablecmds.c. In fact, it appears that this piece of grammar is currently useless because there are no alter_table_cmds that actually work for sequences. (This used to be different because things like OWNER TO also went through here.) I tried to make tablecmds.c handle sequences as well, but that became messy. So I'm thinking about making ALTER SEQUENCE ... SET LOGGED/UNLOGGED an entirely separate code path and rip out the above grammar, but that needs some further pondering. But all that is a bit of a separate effort, so in the meantime some review of the changes in and around fill_seq_with_data() would be useful.
From 761d9be68da6557ccd03f06f41e7858380679406 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Fri, 11 Feb 2022 09:44:37 +0100 Subject: [PATCH v2] Unlogged sequences Add support for unlogged sequences. Unlike for unlogged tables, this is not a performance feature. It allows sequences associated with unlogged tables to be excluded from replication. An identity/serial sequence is automatically made unlogged when its owning table is. (See also discussion in bug #15631.) TODO: - ALTER SEQUENCE ... 
SET LOGGED/UNLOGGED Discussion: https://www.postgresql.org/message-id/flat/04e12818-2f98-257c-b926-2845d74ed04f%402ndquadrant.com --- doc/src/sgml/ref/create_sequence.sgml | 23 +- doc/src/sgml/ref/pg_dump.sgml | 6 +-- src/backend/commands/sequence.c| 36 +++ src/backend/commands/tablecmds.c | 22 ++ src/backend/parser/parse_utilcmd.c | 1 + src/bin/pg_dump/pg_dump.c | 4 +- src/bin/psql/describe.c| 8 +++- src/test/recovery/t/014_unlogged_reinit.pl | 51 +++--- src/test/regress/expected/alter_table.out | 40 + src/test/regress/expected/sequence.out | 12 - src/test/regress/sql/alter_table.sql | 9 ++-- src/test/regress/sql/sequence.sql | 8 +++- 12 files changed, 173 insertions(+), 47 deletions(-) diff --git a/doc/src/sgml/ref/create_sequence.sgml b/doc/src/sgml/ref/create_sequence.sgml index 20bdbc002f..a84aa5bf56 100644 --- a/doc/src/sgml/ref/create_sequence.sgml +++ b/doc/src/sgml/ref/create_sequence.sgml @@ -21,7 +21,7 @@ -CREATE [ TEMPORARY | TEMP ] SEQUENCE [ IF NOT EXISTS ] name +CREATE [ { TEMPORARY | TEMP } | UNLOGGED ] SEQUENCE [ IF NOT EXISTS ] name [ AS data_type ] [ INCREMENT [ BY ] increment ] [ MINVALUE minvalue | NO MINVALUE ] [ MAXVALUE maxvalue | NO MAXVALUE ] @@ -92,6 +92,27 @@ Parameters + +
Re: Race condition in TransactionIdIsInProgress
On Fri, 11 Feb 2022 at 06:11, Andres Freund wrote: > Looks like syncrep will make this a lot worse, because it can drastically > increase the window between the TransactionIdCommitTree() and > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() in between. But at > least it might make it easier to write tests exercising this scenario... Agreed. TransactionIdIsKnownCompleted(xid) is only broken because the single-item cache is set too early in some cases. The single-item cache is important for performance, so we just need to be more careful about setting the cache. -- Simon Riggs
http://www.EnterpriseDB.com/
Re: refactoring basebackup.c
Hi, Thanks for the feedback. I have incorporated the suggestions and prepared a new patch; PFA the v2 patch. > I think similar to bbstreamer_lz4_compressor_content() in > bbstreamer_lz4_decompressor_content() we can change len to avail_in. In bbstreamer_lz4_decompressor_content(), we are modifying avail_in based on the number of bytes decompressed in each iteration. I think we cannot replace it with "len" here. Jeevan, your v12 patch does not apply on HEAD; it requires a rebase. I have applied it on commit 400fc6b6487ddf16aa82c9d76e5cfbe64d94f660 to validate my v2 patch. Thanks, Dipesh From 47a0ef4348747ffa61eccd7954e00f3cf5fc7222 Mon Sep 17 00:00:00 2001 From: Dipesh Pandit Date: Thu, 3 Feb 2022 18:31:03 +0530 Subject: [PATCH] support client side compression and decompression using LZ4 --- src/bin/pg_basebackup/Makefile| 1 + src/bin/pg_basebackup/bbstreamer.h| 3 + src/bin/pg_basebackup/bbstreamer_lz4.c| 431 ++ src/bin/pg_basebackup/pg_basebackup.c | 32 +- src/bin/pg_verifybackup/t/009_extract.pl | 7 +- src/bin/pg_verifybackup/t/010_client_untar.pl | 111 +++ src/tools/msvc/Mkvcbuild.pm | 1 + 7 files changed, 580 insertions(+), 6 deletions(-) create mode 100644 src/bin/pg_basebackup/bbstreamer_lz4.c create mode 100644 src/bin/pg_verifybackup/t/010_client_untar.pl diff --git a/src/bin/pg_basebackup/Makefile b/src/bin/pg_basebackup/Makefile index ada3a5a..1d0db4f 100644 --- a/src/bin/pg_basebackup/Makefile +++ b/src/bin/pg_basebackup/Makefile @@ -43,6 +43,7 @@ BBOBJS = \ bbstreamer_file.o \ bbstreamer_gzip.o \ bbstreamer_inject.o \ + bbstreamer_lz4.o \ bbstreamer_tar.o all: pg_basebackup pg_receivewal pg_recvlogical diff --git a/src/bin/pg_basebackup/bbstreamer.h b/src/bin/pg_basebackup/bbstreamer.h index fe49ae3..c2de77b 100644 --- a/src/bin/pg_basebackup/bbstreamer.h +++ b/src/bin/pg_basebackup/bbstreamer.h @@ -206,6 +206,9 @@ extern bbstreamer *bbstreamer_extractor_new(const char *basepath, void (*report_output_file) (const char *)); extern bbstreamer 
*bbstreamer_gzip_decompressor_new(bbstreamer *next); +extern bbstreamer *bbstreamer_lz4_compressor_new(bbstreamer *next, + int compresslevel); +extern bbstreamer *bbstreamer_lz4_decompressor_new(bbstreamer *next); extern bbstreamer *bbstreamer_tar_parser_new(bbstreamer *next); extern bbstreamer *bbstreamer_tar_terminator_new(bbstreamer *next); extern bbstreamer *bbstreamer_tar_archiver_new(bbstreamer *next); diff --git a/src/bin/pg_basebackup/bbstreamer_lz4.c b/src/bin/pg_basebackup/bbstreamer_lz4.c new file mode 100644 index 000..f0bc226 --- /dev/null +++ b/src/bin/pg_basebackup/bbstreamer_lz4.c @@ -0,0 +1,431 @@ +/*- + * + * bbstreamer_lz4.c + * + * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group + * + * IDENTIFICATION + * src/bin/pg_basebackup/bbstreamer_lz4.c + *- + */ + +#include "postgres_fe.h" + +#include <unistd.h> + +#ifdef HAVE_LIBLZ4 +#include <lz4frame.h> +#endif + +#include "bbstreamer.h" + +#include "common/logging.h" +#include "common/file_perm.h" +#include "common/string.h" + +#ifdef HAVE_LIBLZ4 +typedef struct bbstreamer_lz4_frame +{ + bbstreamer base; + + LZ4F_compressionContext_t cctx; + LZ4F_decompressionContext_t dctx; + LZ4F_preferences_t prefs; + + size_t bytes_written; + bool header_written; +} bbstreamer_lz4_frame; + +static void bbstreamer_lz4_compressor_content(bbstreamer *streamer, + bbstreamer_member *member, + const char *data, int len, + bbstreamer_archive_context context); +static void bbstreamer_lz4_compressor_finalize(bbstreamer *streamer); +static void bbstreamer_lz4_compressor_free(bbstreamer *streamer); + +const bbstreamer_ops bbstreamer_lz4_compressor_ops = { + .content = bbstreamer_lz4_compressor_content, + .finalize = bbstreamer_lz4_compressor_finalize, + .free = bbstreamer_lz4_compressor_free +}; + +static void bbstreamer_lz4_decompressor_content(bbstreamer *streamer, +bbstreamer_member *member, +const char *data, int len, +bbstreamer_archive_context context); +static void bbstreamer_lz4_decompressor_finalize(bbstreamer 
*streamer); +static void bbstreamer_lz4_decompressor_free(bbstreamer *streamer); + +const bbstreamer_ops bbstreamer_lz4_decompressor_ops = { + .content = bbstreamer_lz4_decompressor_content, + .finalize = bbstreamer_lz4_decompressor_finalize, + .free = bbstreamer_lz4_decompressor_free +}; +#endif + +/* + * Create a new base backup streamer that performs lz4 compression of tar + * blocks. + */ +bbstreamer * +bbstreamer_lz4_compressor_new(bbstreamer *next, int compresslevel) +{ +#ifdef HAVE_LIBLZ4 + bbstreamer_lz4_frame *streamer; + LZ4F_errorCode_t ctxError; + LZ4F_preferences_t *prefs; + size_t compressed_bound; + +