doubt about FullTransactionIdAdvance()
Hi, hackers. I don't quite understand FullTransactionIdAdvance(); correct me if I'm wrong.

/*
 * Advance a FullTransactionId variable, stepping over xids that would appear
 * to be special only when viewed as 32bit XIDs.
 */
static inline void
FullTransactionIdAdvance(FullTransactionId *dest)
{
	dest->value++;

	/* see FullTransactionIdAdvance() */
	if (FullTransactionIdPrecedes(*dest, FirstNormalFullTransactionId))
		return;

	while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
		dest->value++;
}

#define XidFromFullTransactionId(x) ((x).value)
#define FullTransactionIdPrecedes(a, b) ((a).value < (b).value)

Can the code ever reach the line "while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)"? We return early when FullTransactionIdPrecedes(*dest, FirstNormalFullTransactionId), and the two tests seem equivalent. Another point of confusion is the comment /* see FullTransactionIdAdvance() */: it refers to the function itself. Could anyone explain this? Thanks in advance. Regards, Zhang Mingli
Re: Add 64-bit XIDs into PostgreSQL 15
Hi, On Oct 22, 2022, 00:09 +0800, Maxim Orlov , wrote: > > > > Done! Thanks! Here is the rebased version. > > > > This version has a bug fix for multixact replication. Previous versions of > > the patch set did not write pd_multi_base in WAL. Thus, this field was set > > to 0 upon WAL replay on a replica. > > This caused the replica to panic. Fix this by adding pd_multi_base of a page > > into WAL. An appropriate tap test is added. > > > > Also, add refactoring and improvements in heapam.c in order to reduce the diff > > and make it more "tidy". > > > > Reviews and opinions are very welcome! > > > > -- > > Best regards, > > Maxim Orlov. Found some outdated code comments referring to several variables, such as xidWrapLimit/xidWarnLimit/xidStopLimit; these variables are not used any more. I attach an additional v48-0009 patch; since these are just comment changes, apply it if you want to. Regards, Zhang Mingli v48-0009-remove-some-outdate-codes-comments-about-xidWrap.patch Description: Binary data
Re: Collation version tracking for macOS
On Sat, Oct 22, 2022 at 10:24 AM Thomas Munro wrote: > ... But it > doesn't provide a way for me to create a new database that uses 63 on > purpose when I know what I'm doing. There are various reasons I might > want to do that. Thinking some more about this, I guess that could be addressed by having an explicit way to request either the library version or collversion-style version when creating a database or collation, but not actually storing it in daticulocale/colliculocale. That could be done either as part of the string that is trimmed off before storing it (so it's only used briefly during creation to find a non-default library)... Perhaps that'd look like initdb --icu-locale "67:en" (ICU library version) or "154.14:en" (individual collation version) or some new syntax in a few places. Thereafter, it would always be looked up by searching for the right library by [dat]collversion as Peter E suggested. Let me try harder to vocalise some more thoughts that have stopped me from trying to code the search-by-collversion design so far: Suppose your pgdata encounters a PostgreSQL linked against a later ICU library, most likely after an OS upgrade or migration, a pg_upgrade, or via streaming replication. You might get a new error "can't find ICU collation 'en' with version '153.14'; HINT: install missing ICU library version", and somehow you'll have to work out which one might contain 'en' v153.14 and install it with apt-get etc. Then it'll magically work: your postgres linked against (say) 71 will happily work with the dlopen'd 67. This is enough if you want to stay on 67 until the heat death of the universe. So far so good. Problem 1: Suppose you're ready to start using (say) v72. 
I guess you'd use the REFRESH command, which would open the main linked ICU's collversion and stamp that into the catalogue, at which point new sessions would start using that, and then you'd have to rebuild all your indexes (with no help from PG to tell you how to find everything that needs to be rebuilt, as belaboured in previous reverted work). Aside from the possibility of getting the rebuilding job wrong (as belaboured elsewhere), it's not great, because there is still a transitional period where you can be using the wrong version for your data. So this requires some careful planning and understanding from the administrator. I admit that the upgrade story is a tiny bit better than the v5 DB2-style patch, which starts using the new version immediately if you didn't use a prefix (and logs the usual warnings about collversion mismatch) instead of waiting for you to run REFRESH. But both of them have a phase where they might use the wrong library to access an index. That's dissatisfying, and leads me to prefer the simple DB2-style solution that at least admits up front that it's not very clever. The DB2-style patch could be improved a bit here with the addition of one more GUC: default_icu_library, so the administrator, rather than the packager, remains in control of which version we use for non-prefixed iculocale values (likely to be what almost everyone is interested in), defaulting to what the packager linked against. I've added that to the patch for illustration (though obviously the error messages produced by collversion mismatch could use some adjustment, ie to clarify that the warning might be cleared by installing and selecting a different library version). Problem 2: If ICU 67 ever decides to report a different version for a given collation (would it ever do that? I don't expect so, but ...), we'd be unable to open the collation with the search-by-collversion design, and potentially the database. What is a user supposed to do then? 
Presumably our error/hint for that would be "please insert the correct ICU library into drive A", but now there is no correct library; if you can even diagnose what's happened, I guess you might downgrade the ICU library using package tools or whatever if possible, but otherwise you'd be stuck, if you just can't get the right library. Is this a problem? Would you want to be able to say "I don't care, computer, please just press on"? So I think we need a way to turn off the search-by-collversion thing. How should it look? I'd love to hear others' thoughts on how we can turn this into a workable solution. Hopefully while staying simple... From 0355984c9a80ff15bfac51677fea30b9be68226b Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Wed, 8 Jun 2022 17:43:53 +1200 Subject: [PATCH v6] WIP: Multi-version ICU. Add a layer of indirection when accessing ICU, so that multiple major versions of the library can be used at once. Versions other than the one that PostgreSQL was linked against are opened with dlopen(), but we refuse to open a version higher than the one we were compiled against. The ABI might change in future releases so that wouldn't be safe. By default, the system linker's default search path is used to find libraries, but icu_library_path may be used to
Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
On Thu, Oct 20, 2022 at 11:52 AM Peter Geoghegan wrote: > Leaving my patch series aside, I still don't think that it makes sense > to make it impossible to auto-cancel antiwraparound autovacuums, > across the board, regardless of the underlying table age. One small thought on the presentation/docs side of this: maybe it would be better to invent a new kind of autovacuum that has the same purpose as antiwraparound autovacuum, but goes by a different name, and doesn't have the special behavior around cancellations. We wouldn't have to change anything about the behavior of antiwraparound autovacuum once we reached the point of needing one. Maybe we wouldn't even need to invent a new user-visible name for this other kind of autovacuum. While even this so-called "new kind of autovacuum" will be rare once my main patch series gets in, it'll still be a totally normal occurrence. Whereas antiwraparound autovacuums are sometimes described as an emergency mechanism. That way we wouldn't be fighting against the widely held perception that antiwraparound autovacuums are scary. In fact that reputation would be fully deserved, for the first time. There are lots of problems with the idea that antiwraparound autovacuum is a kind of emergency mechanism right now, but this change would make the behavior fit the perception, effectively "fixing" the perception. Antiwraparound autovacuums would become far, far rarer under this scheme, but when they did happen they'd be clear cause for concern. A useful signal for users, who should ideally aim to never see *any* antiwraparound autovacuums. -- Peter Geoghegan
Re: Missing update of all_hasnulls in BRIN opclasses
On 10/21/22 18:44, Tomas Vondra wrote: > > ... > >> Apart from that, this patch looks good. Sadly, I don't think we can fix it like this :-( The problem is that all ranges start with all_nulls=true, because the new range gets initialized by brin_memtuple_initialize() like that. But this happens for *every* range before we even start processing the rows. So this way all the ranges would end up with has_nulls=true, making that flag pretty useless. Actually, even just doing "truncate" on the table creates such an all-nulls range for the first range, and serializes it to disk. I wondered why we even write such tuples for "empty" ranges to disk, for example after "TRUNCATE" - the table is empty by definition, so how come we write an all-nulls BRIN summary for the first range? For example, brininsert() checks if the BRIN tuple was modified and needs to be written back, but brinbuild() just ignores that, and initializes (and writes) the tuple to disk anyway. I think we should not do that - there should be a flag in BrinBuildState, tracking if the BRIN tuple was modified, and we should only write it if it's true. That means we should never get an on-disk summary representing nothing. That doesn't fix the issue, though, because we still need to pass the memtuple to the add_value opclass procedure, and whether it sets the has_nulls flag depends on whether it's a new tuple representing no other rows (in which case has_nulls remains false) or whether it was read from disk (in which case it needs to be flipped to 'true'). But the opclass has no way to tell the difference at the moment - it just gets the BrinMemTuple. So we'd have to extend this, somehow. I wonder how to do this in a back-patchable way - we can't add parameters to the opclass procedure, and the other solution seems to be storing it right in the BrinMemTuple, somehow. 
But that's likely an ABI break :-( The only solution I can think of is actually passing it using all_nulls and has_nulls - we could set both flags to true (which we never do now) and teach the opclass that it signifies "empty" (and thus not to update has_nulls after resetting all_nulls). Something like the attached (I haven't added any more tests, not sure what those would look like - I can't think of a query testing this, although maybe we could check how the flags change using pageinspect). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

From a99fd6a737cec24bb4063e99a241ff3e04c6ebb8 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Thu, 20 Oct 2022 19:55:23 +0200 Subject: [PATCH 1/9] fixup: brin has_nulls --- src/backend/access/brin/brin_bloom.c| 1 + src/backend/access/brin/brin_inclusion.c| 1 + src/backend/access/brin/brin_minmax.c | 1 + src/backend/access/brin/brin_minmax_multi.c | 1 + 4 files changed, 4 insertions(+) diff --git a/src/backend/access/brin/brin_bloom.c b/src/backend/access/brin/brin_bloom.c index 6b0af7267d5..60315450b41 100644 --- a/src/backend/access/brin/brin_bloom.c +++ b/src/backend/access/brin/brin_bloom.c @@ -539,6 +539,7 @@ brin_bloom_add_value(PG_FUNCTION_ARGS) BloomGetFalsePositiveRate(opts)); column->bv_values[0] = PointerGetDatum(filter); column->bv_allnulls = false; + column->bv_hasnulls = true; updated = true; } else diff --git a/src/backend/access/brin/brin_inclusion.c b/src/backend/access/brin/brin_inclusion.c index 4b02d374f23..e0f44d3e62c 100644 --- a/src/backend/access/brin/brin_inclusion.c +++ b/src/backend/access/brin/brin_inclusion.c @@ -164,6 +164,7 @@ brin_inclusion_add_value(PG_FUNCTION_ARGS) column->bv_values[INCLUSION_UNMERGEABLE] = BoolGetDatum(false); column->bv_values[INCLUSION_CONTAINS_EMPTY] = BoolGetDatum(false); column->bv_allnulls = false; + column->bv_hasnulls = true; new = true; } diff --git a/src/backend/access/brin/brin_minmax.c 
b/src/backend/access/brin/brin_minmax.c index 9e8a8e056cc..8a5661a8952 100644 --- a/src/backend/access/brin/brin_minmax.c +++ b/src/backend/access/brin/brin_minmax.c @@ -90,6 +90,7 @@ brin_minmax_add_value(PG_FUNCTION_ARGS) column->bv_values[0] = datumCopy(newval, attr->attbyval, attr->attlen); column->bv_values[1] = datumCopy(newval, attr->attbyval, attr->attlen); column->bv_allnulls = false; + column->bv_hasnulls = true; PG_RETURN_BOOL(true); } diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c index 9a0bcf6698d..4e7119e2d78 100644 --- a/src/backend/access/brin/brin_minmax_multi.c +++ b/src/backend/access/brin/brin_minmax_multi.c @@ -2500,6 +2500,7 @@ brin_minmax_multi_add_value(PG_FUNCTION_ARGS) MemoryContextSwitchTo(oldctx); column->bv_allnulls = false; + column->bv_hasnulls = true; modified = true; column->bv_mem_value = PointerGetDatum(ranges); -- 2.37.3 From 57e53d34f2f7bba91fcc0de6f4eff551669554fb Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Sat, 22 Oct 2022 02:26:48
Re: Pluggable toaster
Hi! Aleksander, we have had this in mind while developing this feature, and have checked it. Just a slight modification is needed to make it work with the Pluggable Storage (Access Methods) API. On Fri, Oct 21, 2022 at 4:01 PM Aleksander Alekseev <aleksan...@timescale.com> wrote: > Hi Nikita, > > > Here's rebased patch: > > v23-0001-toaster-interface.patch - Pluggable TOAST API interface with > reference (original) TOAST mechanics - new API is introduced but > > reference TOAST is still left unchanged; > > v23-0002-toaster-default.patch - Default TOAST mechanics is > re-implemented using TOAST API and is plugged in as Default Toaster; > > v23-0003-toaster-docs.patch - Pluggable TOAST API documentation package > > Thanks for keeping the patch up to date. > > As I recall one of the open questions was: how is this feature > supposed to work with table access methods? Could you please summarize > what the current consensus is in this respect? > > -- > Best regards, > Aleksander Alekseev > -- Regards, Nikita Malakhov Postgres Professional https://postgrespro.ru/
Re: Collation version tracking for macOS
Hi, Here is a rebase of this experimental patch. I think the basic mechanics are promising, but we haven't agreed on a UX. I hope we can figure this out. Restating the choice made in this branch of the experiment: Here I try to be just like DB2 (if I understood its manual correctly). In DB2, you can use names like "en_US" if you don't care about changes, and names like "CLDR181_en_US" if you do. It's the user's choice to use the second kind to avoid "unexpected effects on applications or database objects" after upgrades. Translated to PostgreSQL concepts, you can use a database default ICU locale like "en-US" if you don't care and "67:en-US" if you do, and for COLLATION objects it's the same. The convention I tried in this patch is that you use either "en-US-x-icu" (which points to "en-US") or "en-US-x-icu67" (which points to "67:en-US") depending on whether you care about this problem. I recognise that this is a bit cheesy, it's all the user's problem to deal with or ignore. An alternative mentioned by Peter E was that the locale names shouldn't carry the prefix, but somehow we should have a list of ICU versions to search for a matching datcollversion/collversion. How would that look? Perhaps a GUC, icu_library_versions = '63, 67, 71'? There is currently a natural and smallish range of supported versions, probably something like 54 ... U_ICU_VERSION_MAJOR_NUM, but it seems a bit weird to try to dlopen ~25 libraries or whatever it might be... Do you think we should try to code this up? I haven't tried it, but the main usability problem I predict with that idea is this: It can cope with a scenario where you created a database with ICU 63 and started using a default of "en" and maybe some explicit fr-x-icu or whatever, and then you upgrade to a new postgres binary using ICU 71, and, as long as you still have ICU 63 installed it'll just magically keep using 63, now via dlopen(). 
But it doesn't provide a way for me to create a new database that uses 63 on purpose when I know what I'm doing. There are various reasons I might want to do that. Maybe the ideas could be combined? Perhaps "en" means "create using binary's linked ICU, open using search-by-collversion", while "67:en" explicitly says which to use? Changes since last version: * Now it just uses the default dlopen() search path, unless you set icu_library_path. Is that a security problem? It's pretty convenient, because it means you can just "apt-get install libicu63" (or local equivalent) and that's all, now 63 is available. * To try the idea out, I made it automatically create "*-x-icu67" alongside the regular "-x-icu" collation objects at initdb time. From d3e83d0aa5cbb3eb192a2f66d68623cd3b1595b4 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Wed, 8 Jun 2022 17:43:53 +1200 Subject: [PATCH v5] WIP: Multi-version ICU. Add a layer of indirection when accessing ICU, so that multiple major versions of the library can be used at once. Versions other than the one that PostgreSQL was linked against are opened with dlopen(), but we refuse to open a version higher than the one we were compiled against. The ABI might change in future releases so that wouldn't be safe. By default, the system linker's default search path is used to find libraries, but icu_library_path may be used to specify an absolute path to look in. ICU libraries are expected to have been built without ICU's --disable-renaming option. That is, major versions must use distinct symbol names. This arrangement means that at least one major version of ICU is always available -- the one that PostgreSQL was linked against. It should be simple on most software distributions to install extra versions using a package manager, or to build extra libraries as required, to access older ICU releases. For example, on Debian bullseye the packages are named libicu63, libicu67, libicu71. 
In this version of the patch, '63:en' used as a database default locale or COLLATION object requests ICU library 63, and 'en' requests the library that is linked against the postgres executable. XXX Many other designs possible, to discuss! Discussion: https://postgr.es/m/CA%2BhUKGL4VZRpP3CkjYQkv4RQ6pRYkPkSNgKSxFBwciECQ0mEuQ%40mail.gmail.com --- src/backend/access/hash/hashfunc.c | 16 +- src/backend/commands/collationcmds.c | 20 ++ src/backend/utils/adt/formatting.c | 53 +++- src/backend/utils/adt/pg_locale.c| 364 ++- src/backend/utils/adt/varchar.c | 16 +- src/backend/utils/adt/varlena.c | 56 +++-- src/backend/utils/misc/guc_tables.c | 14 ++ src/include/utils/pg_locale.h| 73 ++ src/tools/pgindent/typedefs.list | 3 + 9 files changed, 549 insertions(+), 66 deletions(-) diff --git a/src/backend/access/hash/hashfunc.c b/src/backend/access/hash/hashfunc.c index b57ed946c4..0a61538efd 100644 --- a/src/backend/access/hash/hashfunc.c +++ b/src/backend/access/hash/hashfunc.c @@ -298,11 +298,11 @@ hashtext(PG_FUNCTION_ARGS)
Multiple grouping set specs referencing duplicate alias
Hi all! I think I may have stumbled across a case of wrong results on HEAD (same through version 9.6, though interestingly 9.5 produces different results altogether).

test=# SELECT i AS ai1, i AS ai2 FROM generate_series(1,3)i GROUP BY ai2, ROLLUP(ai1) ORDER BY ai1, ai2;
 ai1 | ai2
-----+-----
   1 |   1
     |   1
   2 |   2
     |   2
   3 |   3
     |   3
(6 rows)

I had expected:

 ai1 | ai2
-----+-----
   1 |   1
   2 |   2
   3 |   3
     |   1
     |   2
     |   3
(6 rows)

It seems to me that the plan is missing a Sort node (on ai1 and ai2) above the Aggregate node.

                   QUERY PLAN
 GroupAggregate
   Group Key: i, i
   Group Key: i
   ->  Sort
         Sort Key: i
         ->  Function Scan on generate_series i

I have a hunch part of the issue may be an assumption that a duplicate aliased column will produce the same column values and hence isn't included in the range table, nor subsequently the pathkeys. However, that assumption does not seem to be true for queries with multiple grouping set specifications:

test=# SELECT i as ai1, i as ai2 FROM (values (1),(2),(3)) v(i) GROUP BY ai1, ROLLUP(ai2);
 ai1 | ai2
-----+-----
   1 |   1
   2 |   2
   3 |   3
   1 |
   2 |
   3 |
(6 rows)

It seems to me that excluding the duplicate alias from the pathkeys is leading to a case where the group order is incorrectly determined to satisfy the sort order. Thus create_ordered_paths() decides against applying an explicit sort node. But simply forcing an explicit sort still seems wrong since we've effectively lost a relevant column for the sort. I tinkered a bit and hacked together an admittedly ugly patch that triggers an explicit sort constructed from the parse tree. An alternative approach I had considered was to update the rewriteHandler to explicitly force the existence of the duplicate alias column in the range tables. But that also felt meh. Does this seem like a legit issue? And if so, any thoughts on alternative approaches? 
Thanks, David Kimura commit 7e7b9a0dce1ba60f4fc087082f04b04e7f405539 Author: David Kimura Date: Wed Oct 19 21:32:27 2022 + Fix multiple grouping specs using duplicate alias bug There seems to have been an assumption that multiple aliases that refer to the same column would always produce mirrored results. However, that does not seem to be the case when the aliased columns are passed through different grouping set specs. Consider the case: SELECT i AS alias1, i AS alias2 FROM generate_series(1,3)i GROUP BY alias2, ROLLUP(alias1); This led to a scenario where a relevant column is collapsed at parse time and thus excluded from sort pathkeys. Consequently, the group order could incorrectly be deemed to satisfy the sort order, which leads to create_ordered_paths() forgoing an explicit sort node. A solution to the issue is to force an explicit sort and revive the relevant column info. diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index ac86ce9003..0b81d4d211 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -2157,6 +2157,46 @@ change_plan_targetlist(Plan *subplan, List *tlist, bool tlist_parallel_safe) return subplan; } +/* + * reevaluate_sort_info + * + * Build the sort node info from the parse tree. 
+ */ +static void +reevaluate_sort_info(Query *parse, Sort *plan) +{ + int i = 0; + ListCell *l1; + ListCell *l2; + + plan->numCols = list_length(parse->sortClause); + + plan->sortColIdx = palloc(sizeof(int) * plan->numCols); + plan->sortOperators = palloc(sizeof(int) * plan->numCols); + plan->collations = palloc(sizeof(int) * plan->numCols); + plan->nullsFirst = palloc(sizeof(int) * plan->numCols); + + foreach(l1, parse->sortClause) + { + foreach(l2, parse->targetList) + { + SortGroupClause *sortClause = (SortGroupClause *) lfirst(l1); + TargetEntry *targetEntry = (TargetEntry *) lfirst(l2); + + if (sortClause->tleSortGroupRef == targetEntry->ressortgroupref) + { +plan->sortColIdx[i] = targetEntry->resno; +plan->sortOperators[i] = sortClause->sortop; +plan->collations[i] = exprCollation((Node *) targetEntry->expr); +plan->nullsFirst[i] = sortClause->nulls_first; + +i++; +break; + } + } + } +} + /* * create_sort_plan * @@ -2187,6 +2227,9 @@ create_sort_plan(PlannerInfo *root, SortPath *best_path, int flags) IS_OTHER_REL(best_path->subpath->parent) ? best_path->path.parent->relids : NULL); + if (best_path->need_evaluation) + reevaluate_sort_info(root->parse, plan); + copy_generic_path_info(&plan->plan, (Path *) best_path); return plan; @@ -2213,6 +2256,9 @@ create_incrementalsort_plan(PlannerInfo *root, IncrementalSortPath *best_path, best_path->spath.path.parent->relids : NULL, best_path->nPresortedCols); + if
Re: refactor ownercheck and aclcheck functions
On 20.10.22 01:24, Corey Huinker wrote: I'd be inclined to remove the highly used ones as well. That way the codebase would have more examples of object_ownercheck() for readers to see. Seeing the existence of pg_FOO_ownercheck implies that a pg_BAR_ownercheck might exist, and if BAR is missing they might be inclined to re-add it. We do have several ownercheck and aclcheck functions that can't be refactored into this framework right now, so we do have to keep some special-purpose functions around anyway. I'm afraid converting all the callers would blow up this patch quite a bit, but it could be done as a follow-up patch. If we do keep them, would it make sense to go the extra step and turn the remaining six "regular" into static inline functions or even #define-s? That could make sense.
patch suggestion: Fix citext_utf8 test's "Turkish I" with ICU collation provider
Hello, hackers. In current master, as well as in REL_15_STABLE, installcheck in contrib/citext fails in most locales if we use ICU as a locale provider: $ rm -fr data; initdb --locale-provider icu --icu-locale en-US -D data && pg_ctl -D data -l logfile start && make -C contrib/citext installcheck; pg_ctl -D data stop; cat contrib/citext/regression.diffs ... test citext ... ok 457 ms test citext_utf8 ... FAILED 21 ms ... diff -u /home/ashutosh/pg/REL_15_STABLE/contrib/citext/expected/citext_utf8.out /home/ashutosh/pg/REL_15_STABLE/contrib/citext/results/citext_utf8.out --- /home/ashutosh/pg/REL_15_STABLE/contrib/citext/expected/citext_utf8.out 2022-07-14 17:45:31.747259743 +0300 +++ /home/ashutosh/pg/REL_15_STABLE/contrib/citext/results/citext_utf8.out 2022-10-21 19:43:21.146044062 +0300 @@ -54,7 +54,7 @@ SELECT 'i'::citext = 'İ'::citext AS t; t --- - t + f (1 row) The reason is that in ICU, lowercasing the Unicode symbol "İ" (U+0130 "LATIN CAPITAL LETTER I WITH DOT ABOVE") can give two valid results: - "i", i.e. "U+0069 LATIN SMALL LETTER I" in "tr" and "az" locales. - "i̇", i.e. "U+0069 LATIN SMALL LETTER I" followed by "U+0307 COMBINING DOT ABOVE" in all other locales I've tried (including "en-US", "de", "ru", etc). And the way this test is currently written only accepts plain Latin "i", which might be true in glibc, but is not so in ICU. Verified on ICU 70.1, but I've seen this on a few other ICU versions as well, so I think this is probably an ICU feature, not a bug(?). Since we probably want installcheck in contrib/citext to pass on databases with various locales, including reasonable ICU-based ones, I suggest fixing this test by accepting either output as valid. I can see two ways of doing that: 1. change SQL in the test to use "IN" instead of "="; 2. add an alternative output. I think in this case "IN" is better, because that allows a single comment to address both possible outputs and to avoid unnecessary duplication. 
I've attached a patch authored mostly by my colleague, Roman Zharkov, as one possible fix. Only versions 15+ are affected. Any comments? -- Anton Voloshin Postgres Professional, The Russian Postgres Company https://postgrespro.ruFrom 5cfbb59a11d099fa9b8703502fd8aac936a02c4d Mon Sep 17 00:00:00 2001 From: Roman Zharkov Date: Fri, 21 Oct 2022 19:56:19 +0300 Subject: [PATCH] Fix citext_utf8 test's "Turkish I" with ICU collation provider MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit With the ICU collation provider the Turkish unicode symbol "İ" (U+0130 "LATIN CAPITAL LETTER I WITH DOT ABOVE") has two lowercase variants: - "i", i.e. "U+0069 LATIN SMALL LETTER I", in "tr" and "az" locales. - "i̇", i.e. "U+0069 LATIN SMALL LETTER I" followed by "U+0307 COMBINING DOT ABOVE" in all other locales I've tried (including "en-US", "de", "ru", etc). So, add both variants to the test. --- contrib/citext/expected/citext_utf8.out | 6 -- contrib/citext/sql/citext_utf8.sql | 6 -- 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/contrib/citext/expected/citext_utf8.out b/contrib/citext/expected/citext_utf8.out index 666b07ccec4..62f0794028f 100644 --- a/contrib/citext/expected/citext_utf8.out +++ b/contrib/citext/expected/citext_utf8.out @@ -48,10 +48,12 @@ SELECT 'Ä'::citext <> 'Ä'::citext AS t; t (1 row) --- Test the Turkish dotted I. The lowercase is a single byte while the +-- Test the Turkish dotted I. The lowercase might be a single byte while the -- uppercase is multibyte. This is why the comparison code can't be optimized -- to compare string lengths. -SELECT 'i'::citext = 'İ'::citext AS t; +-- Note that lower('İ') is 'i' (U+0069) in tr and az locales, +-- but 'i̇' (U+0069 U+0307) in C and most (all?) other locales. 
+SELECT 'İ'::citext in ('i'::citext, 'i̇'::citext) AS t; t --- t diff --git a/contrib/citext/sql/citext_utf8.sql b/contrib/citext/sql/citext_utf8.sql index d068000b423..942daa9ce50 100644 --- a/contrib/citext/sql/citext_utf8.sql +++ b/contrib/citext/sql/citext_utf8.sql @@ -24,10 +24,12 @@ SELECT 'À'::citext <> 'B'::citext AS t; SELECT 'Ä'::text <> 'Ä'::text AS t; SELECT 'Ä'::citext <> 'Ä'::citext AS t; --- Test the Turkish dotted I. The lowercase is a single byte while the +-- Test the Turkish dotted I. The lowercase might be a single byte while the -- uppercase is multibyte. This is why the comparison code can't be optimized -- to compare string lengths. -SELECT 'i'::citext = 'İ'::citext AS t; +-- Note that lower('İ') is 'i' (U+0069) in tr and az locales, +-- but 'i̇' (U+0069 U+0307) in C and most (all?) other locales. +SELECT 'İ'::citext in ('i'::citext, 'i̇'::citext) AS t; -- Regression. SELECT 'láska'::citext <> 'laská'::citext AS t; -- 2.38.1
Re: Missing update of all_hasnulls in BRIN opclasses
On 10/21/22 17:50, Matthias van de Meent wrote: > On Fri, 21 Oct 2022 at 17:24, Tomas Vondra > wrote: >> >> Hi, >> >> While working on some BRIN code, I discovered a bug in handling NULL >> values - when inserting a non-NULL value into a NULL-only range, we >> reset the all_nulls flag but don't update the has_nulls flag. And >> because of that we then fail to return the range for IS NULL ranges. > > Ah, that's bad. Yeah, I guess we'll need to advise users to consider rebuilding BRIN indexes on NULL-able columns. > One question though: doesn't (shouldn't?) column->bv_allnulls already > imply column->bv_hasnulls? The column has nulls if all of the values > are null, right? Or is the description of the field deceptive, and > does bv_hasnulls actually mean "has nulls bitmap"? What null bitmap do you mean? We're talking about the summary for a page range - IIRC we translate this to a nullbitmap for a BRIN tuple, but there may be multiple columns, and "has nulls bitmap" is an aggregate over all of them. Yeah, maybe it'd make sense to also have has_nulls=true whenever all_nulls=true, and maybe it'd be simpler because it'd be enough to check just one flag in the consistent function, etc. But we still need to track two different states - "has nulls" and "has summary". In any case, this ship sailed long ago - at least for the existing opclasses. >> Attached is a patch fixing this by properly updating the has_nulls flag. > > One comment on the patch: > >> +SET enable_seqscan = off; >> + [...] >> +SET enable_seqscan = off; > > Looks like duplicated SETs. Should that last one be RESET instead? Yeah, should have been RESET. > Apart from that, this patch looks good. Thanks! -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: thinko in basic_archive.c
On Fri, Oct 21, 2022 at 10:43 AM Kyotaro Horiguchi wrote: > > Thanks, but we don't need to wipe out all the bytes. Just putting \0 > at the beginning of the buffer is sufficient. Nah, that's not a clean way IMO. > And the MemSet() at the > beginning of basic_archive_file_internal is not needed since those > static variables are initially initialized to zeros. Removed. MemSet() after durable_rename() would be sufficient. > This is not necessarily needed, but it might be better to empty > tempfilepath after unlinking the file. I think it's not necessary as the archiver itself is shutting down and I don't think the server calls the shutdown callback twice. However, if we want basic_archive_shutdown() to be more protective against multiple calls (for any reason that we're missing), we can have a static local variable to quickly exit if the callback is already called, instead of MemSet(); but that's not needed, I guess. > + expectation that a value will soon be provided. Care must be taken when > + multiple servers are archiving to the same > + basic_archive.archive_library directory as they all > + might try to archive the same WAL file. > > I don't understand what kind of care should be taken by reading this... It's just a notice; however, I agree with you that it may be confusing. I've removed it. Please review the attached v4 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com v4-0001-Remove-leftover-temporary-files-via-basic_archive.patch Description: Binary data
Re: Missing update of all_hasnulls in BRIN opclasses
On Fri, 21 Oct 2022 at 17:24, Tomas Vondra wrote: > > Hi, > > While working on some BRIN code, I discovered a bug in handling NULL > values - when inserting a non-NULL value into a NULL-only range, we > reset the all_nulls flag but don't update the has_nulls flag. And > because of that we then fail to return the range for IS NULL ranges. Ah, that's bad. One question though: doesn't (shouldn't?) column->bv_allnulls already imply column->bv_hasnulls? The column has nulls if all of the values are null, right? Or is the description of the field deceptive, and does bv_hasnulls actually mean "has nulls bitmap"? > Attached is a patch fixing this by properly updating the has_nulls flag. One comment on the patch: > +SET enable_seqscan = off; > + [...] > +SET enable_seqscan = off; Looks like duplicated SETs. Should that last one be RESET instead? Apart from that, this patch looks good. - Matthias
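[Editor's note] The flag transition under discussion can be modeled in a few lines. This is a hypothetical Python sketch, not the actual brin_*_add_value() C code; the dictionary keys merely mirror the bv_allnulls/bv_hasnulls fields named in the thread:

```python
# Toy model of a BRIN page-range summary: when the first non-NULL value lands
# in an all-nulls range, the summary must remember it still contains NULLs.
def add_value(summary, value):
    if value is None:
        if not summary["all_nulls"]:
            summary["has_nulls"] = True   # NULL added to a range with values
        return
    if summary["all_nulls"]:
        # Leaving the all-nulls state: the range still contains the earlier
        # NULLs, so has_nulls must be set -- the step the bug skipped.
        summary["all_nulls"] = False
        summary["has_nulls"] = True
    summary["values"].append(value)       # stand-in for minmax/bloom updates

rng = {"all_nulls": True, "has_nulls": False, "values": []}
add_value(rng, None)   # range summarizing only NULLs
add_value(rng, 1)      # first non-NULL value arrives
print(rng["all_nulls"], rng["has_nulls"])  # False True
```

Without the has_nulls assignment in the second branch, the model ends with both flags false and an IS NULL scan key would never match the range — the behavior reported for the real opclasses.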
Re: Avoid memory leaks during base backups
On Fri, Oct 21, 2022 at 11:58 AM Michael Paquier wrote: > > To be exact, it seems to me that tablespace_map and backup_state > should be reset before deleting backupcontext, but the reset of > backupcontext should happen after the fact. > > + backup_state = NULL; > tablespace_map = NULL; > These two in pg_backup_start() don't matter, do they? They are > reallocated a couple of lines down. After all, that is what is being discussed here; what if palloc down below fails and they're not reset to NULL after MemoryContextReset()? > +* across. We keep the memory allocated in this memory context less, > What does "We keep the memory allocated in this memory context less" > mean here? We try to keep it less because we don't want to allocate more memory and leak it unless pg_start_backup() is called again. Please read the description. I'll leave it to the committer's discretion whether to have that part or remove it. On Fri, Oct 21, 2022 at 12:11 PM Kyotaro Horiguchi wrote: > > > +* across. We keep the memory allocated in this memory context less, > +* because any error before reaching pg_backup_stop() can leak the > memory > +* until pg_backup_start() is called again. While this is not smart, > it > +* helps to keep things simple. > > I think the "less" is somewhat obscure. I feel we should be more > explicitly. And we don't need to put emphasis on "leak". I recklessly > propose this as the draft. I tried to put it simple, please see the attached v10. I'll leave it to the committer's discretion for better wording. On Fri, Oct 21, 2022 at 7:47 PM Robert Haas wrote: > > On Fri, Oct 21, 2022 at 2:18 AM Michael Paquier wrote: > > AFAIK, one of the callbacks associated to a memory context could > > fail, see comments before MemoryContextCallResetCallbacks() in > > MemoryContextDelete(). I agree that it should not matter here, but I > > think that it is better to reset the pointers before attempting the > > deletion of the memory context in this case. 
> > I think this is nitpicking. There's no real danger here, and if there > were, the error handling would have to take it into account somehow, > which it doesn't. > > I'd probably do it before resetting the context as a matter of style, > to make it clear that there's no window in which the pointers are set > but referencing invalid memory. But I do not think it makes any > practical difference at all. Please see the attached v10. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From 3d8cf10d0bab48afee639359669b69c2901afd34 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Fri, 21 Oct 2022 15:04:36 + Subject: [PATCH v10] Avoid memory leaks during backups using SQL-callable functions Any failure while taking backups using SQL-callable functions can leak the memory allocated for backup_state and tablespace_map, as they both need to be long-lived, they are being created in TopMemoryContext. To fix the memory leak problem, we create a special session-level memory context as a direct child of TopMemoryContext so that the memory allocated is carried across. We delete the memory context at the end of pg_backup_stop(). We keep the memory allocated in this memory context less, because any error before reaching pg_backup_stop() can still leak the memory until pg_backup_start() is called again. While this is not smart, it helps to keep things simple. 
Author: Bharath Rupireddy Reviewed-by: Robert Haas, Alvaro Herrera Reviewed-by: Cary Huang, Michael Paquier Discussion: https://www.postgresql.org/message-id/CALj2ACXqvfKF2B0beQ=aJMdWnpNohmBPsRg=EDQj_6y1t2O8mQ@mail.gmail.com --- src/backend/access/transam/xlogfuncs.c | 42 ++ 1 file changed, 23 insertions(+), 19 deletions(-) diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c index a801a94fe8..e9ec86316b 100644 --- a/src/backend/access/transam/xlogfuncs.c +++ b/src/backend/access/transam/xlogfuncs.c @@ -45,6 +45,9 @@ static BackupState *backup_state = NULL; static StringInfo tablespace_map = NULL; +/* A long-lived workspace for SQL-callable backup functions. */ +static MemoryContext backupcontext = NULL; + /* * pg_backup_start: set up for taking an on-line backup dump * @@ -72,27 +75,27 @@ pg_backup_start(PG_FUNCTION_ARGS) /* * backup_state and tablespace_map need to be long-lived as they are used - * in pg_backup_stop(). + * in pg_backup_stop(). We create a special session-level memory context as + * a direct child of TopMemoryContext so that the memory allocated is + * carried across. We typically delete the memory context at the end of + * pg_backup_stop(), however, an error before it can leak the memory until + * pg_backup_start() is called again. Hence, we try to keep the memory + * allocated in this memory context less. While this is not smart, it helps + * to keep things simple. */ -
Missing update of all_hasnulls in BRIN opclasses
Hi, While working on some BRIN code, I discovered a bug in handling NULL values - when inserting a non-NULL value into a NULL-only range, we reset the all_nulls flag but don't update the has_nulls flag. And because of that we then fail to return the range for IS NULL ranges. Reproducing this is trivial: create table t (a int); create index on t using brin (a); insert into t values (null); insert into t values (1); set enable_seqscan = off; select * from t where a is null; This should return 1 row, but actually it returns no rows. Attached is a patch fixing this by properly updating the has_nulls flag. I reproduced this all the way back to 9.5, so it's a long-standing bug. It's interesting that no one noticed / reported it so far, since it doesn't seem like a particularly rare corner case. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
diff --git a/src/backend/access/brin/brin_bloom.c b/src/backend/access/brin/brin_bloom.c index 6b0af7267d5..60315450b41 100644 --- a/src/backend/access/brin/brin_bloom.c +++ b/src/backend/access/brin/brin_bloom.c @@ -539,6 +539,7 @@ brin_bloom_add_value(PG_FUNCTION_ARGS) BloomGetFalsePositiveRate(opts)); column->bv_values[0] = PointerGetDatum(filter); column->bv_allnulls = false; + column->bv_hasnulls = true; updated = true; } else diff --git a/src/backend/access/brin/brin_inclusion.c b/src/backend/access/brin/brin_inclusion.c index 4b02d374f23..e0f44d3e62c 100644 --- a/src/backend/access/brin/brin_inclusion.c +++ b/src/backend/access/brin/brin_inclusion.c @@ -164,6 +164,7 @@ brin_inclusion_add_value(PG_FUNCTION_ARGS) column->bv_values[INCLUSION_UNMERGEABLE] = BoolGetDatum(false); column->bv_values[INCLUSION_CONTAINS_EMPTY] = BoolGetDatum(false); column->bv_allnulls = false; + column->bv_hasnulls = true; new = true; } diff --git a/src/backend/access/brin/brin_minmax.c b/src/backend/access/brin/brin_minmax.c index 9e8a8e056cc..8a5661a8952 100644 --- a/src/backend/access/brin/brin_minmax.c +++
b/src/backend/access/brin/brin_minmax.c @@ -90,6 +90,7 @@ brin_minmax_add_value(PG_FUNCTION_ARGS) column->bv_values[0] = datumCopy(newval, attr->attbyval, attr->attlen); column->bv_values[1] = datumCopy(newval, attr->attbyval, attr->attlen); column->bv_allnulls = false; + column->bv_hasnulls = true; PG_RETURN_BOOL(true); } diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c index 9a0bcf6698d..4e7119e2d78 100644 --- a/src/backend/access/brin/brin_minmax_multi.c +++ b/src/backend/access/brin/brin_minmax_multi.c @@ -2500,6 +2500,7 @@ brin_minmax_multi_add_value(PG_FUNCTION_ARGS) MemoryContextSwitchTo(oldctx); column->bv_allnulls = false; + column->bv_hasnulls = true; modified = true; column->bv_mem_value = PointerGetDatum(ranges); diff --git a/src/test/regress/expected/brin.out b/src/test/regress/expected/brin.out index 73fa38396e4..cc896c2d9d4 100644 --- a/src/test/regress/expected/brin.out +++ b/src/test/regress/expected/brin.out @@ -572,3 +572,39 @@ CREATE UNLOGGED TABLE brintest_unlogged (n numrange); CREATE INDEX brinidx_unlogged ON brintest_unlogged USING brin (n); INSERT INTO brintest_unlogged VALUES (numrange(0, 2^1000::numeric)); DROP TABLE brintest_unlogged; +-- test that we properly update has_nulls when inserting something into +-- a range that only had NULLs before +CREATE TABLE brintest_4 (a INT, b INT, c INT, d INET); +CREATE INDEX brintest_4_idx ON brintest_4 USING brin (a, b int4_minmax_multi_ops, c int4_bloom_ops, d inet_inclusion_ops); +-- insert a NULL value, so that we get an all-nulls range +INSERT INTO brintest_4 VALUES (NULL, NULL, NULL, NULL); +-- now insert a non-NULL value +INSERT INTO brintest_4 VALUES (1, 1, 1, '127.0.0.1'); +-- see that we can still match the value when using the brin index +SET enable_seqscan = off; +SELECT * FROM brintest_4 WHERE a IS NULL; + a | b | c | d +---+---+---+--- + | | | +(1 row) + +SELECT * FROM brintest_4 WHERE b IS NULL; + a | b | c | d +---+---+---+--- + | 
| | +(1 row) + +SELECT * FROM brintest_4 WHERE c IS NULL; + a | b | c | d +---+---+---+--- + | | | +(1 row) + +SELECT * FROM brintest_4 WHERE d IS NULL; + a | b | c | d +---+---+---+--- + | | | +(1 row) + +DROP TABLE brintest_4; +SET enable_seqscan = off; diff --git a/src/test/regress/sql/brin.sql b/src/test/regress/sql/brin.sql index e68e9e18df5..17a01a4b82f 100644 --- a/src/test/regress/sql/brin.sql +++ b/src/test/regress/sql/brin.sql @@ -515,3 +515,24 @@ CREATE UNLOGGED TABLE brintest_unlogged (n numrange); CREATE INDEX brinidx_unlogged ON brintest_unlogged USING brin (n); INSERT INTO brintest_unlogged VALUES (numrange(0, 2^1000::numeric)); DROP TABLE brintest_unlogged; + +-- test that we properly update has_nulls when inserting something into +-- a range that only had NULLs before +CREATE TABLE brintest_4 (a INT, b INT, c INT, d INET); +CREATE INDEX
Re: Crash after a call to pg_backup_start()
On Fri, Oct 21, 2022 at 7:06 PM Michael Paquier wrote: > > On Fri, Oct 21, 2022 at 05:53:25PM +0800, Richard Guo wrote: > > Yeah, the two conditions could be both false. How about we update the > > comment a bit to emphasize this? Such as > > > > /* At most one of these conditions can be true */ > > or > > /* These conditions can not be both true */ > > If you do that, it would be a bit easier to read as of the following > assertion instead? Like: > Assert(!during_backup_start || >sessionBackupState == SESSION_BACKUP_NONE); > > Please note that I have not checked in details all the interactions > behind register_persistent_abort_backup_handler() before entering in > do_pg_backup_start() and the ERROR_CLEANUP block used in this > routine (just a matter of some elog(ERROR)s put here and there, for > example). Anyway, yes, both conditions can be false, and that's easy > to reproduce: > 1) Do pg_backup_start(). > 2) Do pg_backup_stop(). > 3) Stop the session to kick do_pg_abort_backup() > 4) Assert()-boom. I'm wondering if we need the assertion at all. We know that when the arg is true or the sessionBackupState is SESSION_BACKUP_RUNNING, the runningBackups would've been incremented and we can just go ahead and decrement it, like the attached patch. This is a cleaner approach IMO unless I'm missing something here. 
-- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From b147720ca68a120efdf8f20c58cb3499901e6d61 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Fri, 21 Oct 2022 14:29:27 + Subject: [PATCH v1] Fix assertion failure in do_pg_abort_backup() --- src/backend/access/transam/xlog.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index dea978a962..104309c56f 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -8841,11 +8841,12 @@ do_pg_abort_backup(int code, Datum arg) { bool during_backup_start = DatumGetBool(arg); - /* Only one of these conditions can be true */ - Assert(during_backup_start ^ - (sessionBackupState == SESSION_BACKUP_RUNNING)); - - if (during_backup_start || sessionBackupState != SESSION_BACKUP_NONE) + /* + * When either of these were true, we know that the runningBackups has + * already been incremented, hence decrement it. + */ + if (during_backup_start || + sessionBackupState == SESSION_BACKUP_RUNNING) { WALInsertLockAcquireExclusive(); Assert(XLogCtl->Insert.runningBackups > 0); -- 2.34.1
Re: ICU for global collation
Hello! I discovered an interesting behaviour during installcheck runs when the cluster was initialized with ICU locale provider: $ initdb --locale-provider icu --icu-locale en-US -D data && pg_ctl -D data -l logfile start 1) The ECPG tests fail because they use the SQL_ASCII encoding [1], the database template0 uses the ICU locale provider and SQL_ASCII is not supported by ICU: $ make -C src/interfaces/ecpg/ installcheck ... == creating database "ecpg1_regression" == ERROR: encoding "SQL_ASCII" is not supported with ICU provider ERROR: database "ecpg1_regression" does not exist command failed: "/home/marina/postgresql/master/my/inst/bin/psql" -X -c "CREATE DATABASE \"ecpg1_regression\" TEMPLATE=template0 ENCODING='SQL_ASCII'" -c "ALTER DATABASE \"ecpg1_regression\" SET lc_messages TO 'C';ALTER DATABASE \"ecpg1_regression\" SET lc_monetary TO 'C';ALTER DATABASE \"ecpg1_regression\" SET lc_numeric TO 'C';ALTER DATABASE \"ecpg1_regression\" SET lc_time TO 'C';ALTER DATABASE \"ecpg1_regression\" SET bytea_output TO 'hex';ALTER DATABASE \"ecpg1_regression\" SET timezone_abbreviations TO 'Default';" "postgres" 2) The option --no-locale in pg_regress is described as "use C locale" [2]. 
But in this case the created databases actually use the ICU locale provider with the ICU cluster locale from template0 (see diff_check_backend_used_provider.patch): $ make NO_LOCALE=1 installcheck In regression.diffs: diff -U3 /home/marina/postgresql/master/src/test/regress/expected/test_setup.out /home/marina/postgresql/master/src/test/regress/results/test_setup.out --- /home/marina/postgresql/master/src/test/regress/expected/test_setup.out 2022-09-27 05:31:27.674628815 +0300 +++ /home/marina/postgresql/master/src/test/regress/results/test_setup.out 2022-10-21 15:09:31.232992885 +0300 @@ -143,6 +143,798 @@ \set filename :abs_srcdir '/data/person.data' COPY person FROM :'filename'; VACUUM ANALYZE person; +NOTICE: varstrfastcmp_locale sss->collate_c 0 sss->locale 0xefacd0 +NOTICE: varstrfastcmp_locale sss->locale->provider i +NOTICE: varstrfastcmp_locale sss->locale->info.icu.locale en-US ... The patch diff_fix_pg_regress_create_database.patch fixes both issues for me. [1] https://github.com/postgres/postgres/blob/ce20f8b9f4354b46b40fd6ebf7ce5c37d08747e0/src/interfaces/ecpg/test/Makefile#L18 [2] https://github.com/postgres/postgres/blob/ce20f8b9f4354b46b40fd6ebf7ce5c37d08747e0/src/test/regress/pg_regress.c#L1992 -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
diff --git a/src/backend/access/hash/hashfunc.c b/src/backend/access/hash/hashfunc.c index b57ed946c42bb54ede800e95045aa937a8dbad85..b3c0f6f753f8428274389844ccf9778a7ed47ea4 100644 --- a/src/backend/access/hash/hashfunc.c +++ b/src/backend/access/hash/hashfunc.c @@ -281,6 +281,14 @@ hashtext(PG_FUNCTION_ARGS) if (!lc_collate_is_c(collid)) mylocale = pg_newlocale_from_collation(collid); + elog(NOTICE, "hashtext lc_collate_is_c(collid) %d mylocale %p", lc_collate_is_c(collid), mylocale); + if (mylocale) + { + elog(NOTICE, "hashtext mylocale->provider %c", mylocale->provider); + if (mylocale->provider == COLLPROVIDER_ICU) + elog(NOTICE, "hashtext
mylocale->info.icu.locale %s", mylocale->info.icu.locale ? mylocale->info.icu.locale : "(null)"); + } + if (!mylocale || mylocale->deterministic) { result = hash_any((unsigned char *) VARDATA_ANY(key), @@ -337,6 +345,14 @@ hashtextextended(PG_FUNCTION_ARGS) if (!lc_collate_is_c(collid)) mylocale = pg_newlocale_from_collation(collid); + elog(NOTICE, "hashtextextended lc_collate_is_c(collid) %d mylocale %p", lc_collate_is_c(collid), mylocale); + if (mylocale) + { + elog(NOTICE, "hashtextextended mylocale->provider %c", mylocale->provider); + if (mylocale->provider == COLLPROVIDER_ICU) + elog(NOTICE, "hashtextextended mylocale->info.icu.locale %s", mylocale->info.icu.locale ? mylocale->info.icu.locale : "(null)"); + } + if (!mylocale || mylocale->deterministic) { result = hash_any_extended((unsigned char *) VARDATA_ANY(key), diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c index 02d462a659778016f3c4479d425ba0a84feb6e26..9627c84a7ccfb4c4013556a51c989e9e6d611634 100644 --- a/src/backend/regex/regc_pg_locale.c +++ b/src/backend/regex/regc_pg_locale.c @@ -243,6 +243,8 @@ pg_set_regex_collation(Oid collation) errhint("Use the COLLATE clause to set the collation explicitly."))); } + elog(NOTICE, "pg_set_regex_collation lc_ctype_is_c(collid) %d", lc_ctype_is_c(collation)); + if (lc_ctype_is_c(collation)) { /* C/POSIX collations use this path regardless of database encoding */ @@ -259,6 +261,14 @@ pg_set_regex_collation(Oid collation) */ pg_regex_locale = pg_newlocale_from_collation(collation); + elog(NOTICE, "pg_set_regex_collation pg_regex_locale %p", pg_regex_locale); + if (pg_regex_locale) + { + elog(NOTICE,
Re: Avoid memory leaks during base backups
On Fri, Oct 21, 2022 at 2:18 AM Michael Paquier wrote: > AFAIK, one of the callbacks associated to a memory context could > fail, see comments before MemoryContextCallResetCallbacks() in > MemoryContextDelete(). I agree that it should not matter here, but I > think that it is better to reset the pointers before attempting the > deletion of the memory context in this case. I think this is nitpicking. There's no real danger here, and if there were, the error handling would have to take it into account somehow, which it doesn't. I'd probably do it before resetting the context as a matter of style, to make it clear that there's no window in which the pointers are set but referencing invalid memory. But I do not think it makes any practical difference at all. -- Robert Haas EDB: http://www.enterprisedb.com
Re: parse partition strategy string in gram.y
On Fri, 21 Oct 2022 at 20:34, Justin Pryzby wrote: > On Fri, Oct 21, 2022 at 06:22:44PM +0800, Japin Li wrote: >> Is there any way to get the regression test diffs from Cirrus CI? >> I did not find the diffs in [1]. >> >> [1] https://cirrus-ci.com/build/4721735111540736 > > They're called "main". > I'm planning on submitting a patch to rename it to "regress", someday. > See also: > https://www.postgresql.org/message-id/20221001161420.GF6256%40telsasoft.com Oh, thank you very much! I found it in testrun/build/testrun/main/regress [1]. [1] https://api.cirrus-ci.com/v1/artifact/task/6215926717612032/testrun/build/testrun/main/regress/regression.diffs -- Regards, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: Crash after a call to pg_backup_start()
On Fri, Oct 21, 2022 at 05:53:25PM +0800, Richard Guo wrote: > Yeah, the two conditions could be both false. How about we update the > comment a bit to emphasize this? Such as > > /* At most one of these conditions can be true */ > or > /* These conditions can not be both true */ If you do that, it would be a bit easier to read as of the following assertion instead? Like: Assert(!during_backup_start || sessionBackupState == SESSION_BACKUP_NONE); Please note that I have not checked in details all the interactions behind register_persistent_abort_backup_handler() before entering in do_pg_backup_start() and the ERROR_CLEANUP block used in this routine (just a matter of some elog(ERROR)s put here and there, for example). Anyway, yes, both conditions can be false, and that's easy to reproduce: 1) Do pg_backup_start(). 2) Do pg_backup_stop(). 3) Stop the session to kick do_pg_abort_backup() 4) Assert()-boom. -- Michael signature.asc Description: PGP signature
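[Editor's note] The four-step repro above can be checked against a tiny model of the assertion. This is an illustrative Python mirror of the C `Assert(during_backup_start ^ (sessionBackupState == SESSION_BACKUP_RUNNING))`, not PostgreSQL code:

```python
# XOR assertion model: it holds only when exactly one operand is true.
SESSION_BACKUP_NONE = "none"
SESSION_BACKUP_RUNNING = "running"

def assertion_holds(during_backup_start, session_backup_state):
    # Python's != on booleans behaves like the C ^ here.
    return during_backup_start != (session_backup_state == SESSION_BACKUP_RUNNING)

# Abort in the middle of a backup: exactly one operand true, assertion holds.
print(assertion_holds(False, SESSION_BACKUP_RUNNING))  # True
print(assertion_holds(True, SESSION_BACKUP_NONE))      # True

# Steps 1-4 above: after pg_backup_stop() the session state is back to NONE
# and the callback argument is false -- both operands false, Assert() fires.
print(assertion_holds(False, SESSION_BACKUP_NONE))     # False
```

The model makes the failure mode plain: the XOR encodes "exactly one", while the reachable states include "neither", which is exactly the start-then-stop-then-disconnect sequence.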
Re: cross-platform pg_basebackup
On Fri, Oct 21, 2022 at 4:14 AM davinder singh wrote: > Hi, > Patch v2 looks good to me, I have tested it, and pg_basebackup works fine > across the platforms (Windows to Linux and Linux to Windows). > Syntax used for testing > $ pg_basebackup -h remote_server_ip -p 5432 -U user_name -D backup/data -T > olddir=newdir > > I have also tested with non-absolute paths, it behaves as expected. Cool. Thanks to you, Andrew, and Tom for reviewing. Committed and back-patched to all supported branches. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Pluggable toaster
Hi Nikita, > Here's the rebased patch: > v23-0001-toaster-interface.patch - Pluggable TOAST API interface with > reference (original) TOAST mechanics - new API is introduced but > reference TOAST is still left unchanged; > v23-0002-toaster-default.patch - Default TOAST mechanics is re-implemented > using TOAST API and is plugged in as Default Toaster; > v23-0003-toaster-docs.patch - Pluggable TOAST API documentation package Thanks for keeping the patch up to date. As I recall, one of the open questions was how this feature is supposed to work with table access methods. Could you please summarize what the current consensus is in this respect? -- Best regards, Aleksander Alekseev
Re: parse partition strategy string in gram.y
On Fri, Oct 21, 2022 at 06:22:44PM +0800, Japin Li wrote: > Is there any way to get the regression test diffs from Cirrus CI? > I did not find the diffs in [1]. > > [1] https://cirrus-ci.com/build/4721735111540736 They're called "main". I'm planning on submitting a patch to rename it to "regress", someday. See also: https://www.postgresql.org/message-id/20221001161420.GF6256%40telsasoft.com -- Justin
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Hi Shi yu, all > In execReplication.c: > > + TypeCacheEntry **eq = NULL; /* only used when the index is not > unique */ > > Maybe the comment here should be changed. Now it is used when the index is > not > primary key or replica identity index. > > makes sense, updated > 2. > +# wait until the index is created > +$node_subscriber->poll_query_until( > + 'postgres', q{select count(*)=1 from pg_stat_all_indexes where > indexrelname = 'test_replica_id_full_idx';} > +) or die "Timed out while waiting for check subscriber tap_sub_rep_full > updates one row via index"; > > The message doesn't seem right, should it be changed to "Timed out while > waiting for creating index test_replica_id_full_idx"? > yes, updated > > 3. > +# now, ingest more data and create index on column y which has higher > cardinality > +# then create an index on column y so that future commands uses the index > on column > +$node_publisher->safe_psql('postgres', > + "INSERT INTO test_replica_id_full SELECT 50, i FROM > generate_series(0,3100)i;"); > > The comment say "create (an) index on column y" twice, maybe it can be > changed > to: > > now, ingest more data and create index on column y which has higher > cardinality, > so that future commands will use the index on column y > > fixed > 4. > +# deletes 200 rows > +$node_publisher->safe_psql('postgres', > + "DELETE FROM test_replica_id_full WHERE x IN (5, 6);"); > + > +# wait until the index is used on the subscriber > +$node_subscriber->poll_query_until( > + 'postgres', q{select (idx_scan = 200) from pg_stat_all_indexes > where indexrelname = 'test_replica_id_full_idx';} > +) or die "Timed out while waiting for check subscriber tap_sub_rep_full > updates 200 rows via index"; > > It would be better to call wait_for_catchup() after DELETE. (And some other > places in this file.) > Hmm, I cannot follow this easily. Why do you think wait_for_catchup() should be called? 
In general, I tried to follow a pattern where we call poll_query_until() so that we are sure that all the changes are replicated via the index. And then, an additional check with `is($result, ..` such that we also verify the correctness of the data. One alternative could be to use wait_for_catchup() and then have multiple `is($result, ..` to check both pg_stat_all_indexes and the correctness of the data. One minor advantage I see with the current approach is that every `is($result, ..` adds one step to the test. So, if I use `is($result, ..` for pg_stat_all_indexes queries, then I'd be adding multiple steps for a single test. It felt it is more natural/common to test roughly once with `is($result, ..` on each test. Or, at least do not add additional ones for pg_stat_all_indexes checks. > Besides, the "updates" in the message should be "deletes". > > fixed > 5. > +# wait until the index is used on the subscriber > +$node_subscriber->poll_query_until( > + 'postgres', q{select sum(idx_scan)=10 from pg_stat_all_indexes > where indexrelname ilike 'users_table_part_%';} > +) or die "Timed out while waiting for check subscriber tap_sub_rep_full > updates partitioned table"; > > Maybe we should say "updates partitioned table with index" in this message. > > Fixed Attached v20. Thanks! Onder KALACI v20_0001_use_index_on_subs_when_pub_rep_ident_full.patch Description: Binary data
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
Hi, On 10/21/22 2:58 AM, Michael Paquier wrote: On Wed, Oct 19, 2022 at 10:45:44AM +0200, Drouvot, Bertrand wrote: Please find attached v1-0001-regex-handling-for-db-and-roles-in-hba.patch to implement regexes for databases and roles in hba. It does also contain new regexes related TAP tests and doc updates. Thanks for the updated version. This is really easy to look at now. It relies on the refactoring made in fc579e11c6 (but changes the regcomp_auth_token() parameters so that it is now responsible for emitting the compilation error message (if any), to avoid code duplication in parse_hba_line() and parse_ident_line() for roles, databases and user name mapping). @@ -652,13 +670,18 @@ check_role(const char *role, Oid roleid, List *tokens) [...] - if (!tok->quoted && tok->string[0] == '+') + if (!token_has_regexp(tok)) { Hmm. Do we need token_has_regexp() here for all the cases? We know that the string can begin with a '+', hence it is no regex. The same applies for the case of "all". The remaining case is the one where the user name matches exactly the AuthToken string, which should be last as we want to treat anything beginning with a '/' as a regex. It seems like we could do an order like that? Say: if (!tok->quoted && tok->string[0] == '+') //do else if (token_is_keyword(tok, "all")) //do else if (token_has_regexp(tok)) //do regex compilation, handling failures else if (token_matches(tok, role)) //do exact match The same approach with keywords first, regex, and exact match could be applied as well for the databases? Perhaps it is just mainly a matter of taste, Yeah, I think it is. and it depends on how much you want to prioritize the place of the regex over the rest but that could make the code easier to understand in the long-run and this is a very sensitive code area, And agree that your proposal tastes better ;-): it is easier to understand, v2 attached has been done that way. 
Compiling the expressions for the user and database lists one-by-one in parse_hba_line() as you do is correct. However there is a gotcha that you are forgetting here: the memory allocations done for the regexp compilations are not linked to the memory context where each line is processed (named hbacxt in load_hba()) and need a separate cleanup. Oops, right, thanks for the call out! In the same fashion as load_ident(), it seems to me that we need two extra things for this patch: - if !ok (see where we do MemoryContextDelete(hbacxt)), we need to go through new_parsed_lines and free for each line the AuthTokens for the database and user lists. - if ok and new_parsed_lines != NIL, the same cleanup needs to happen. Right, but I think that should be "parsed_hba_lines != NIL". My guess is that you could do both the same way as load_ident() does, keeping some symmetry between the two code paths. Right. To avoid code duplication in the !ok/ok cases, the function free_hba_line() has been added in v2: it goes through the list of databases and roles tokens and call free_auth_token() for each of them. Unifying both into a common routine would be sort of annoying as HbaLines uses lists within the lists of parsed lines, and IdentLine can have one at most in each line. I agree, and v2 is not attempting to unify them. For now, I have made your last patch a bit shorter by applying the refactoring of regcomp_auth_token() separately with a few tweaks to the comments. Thanks! v2 attached does apply on top of that. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comdiff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml index c6f1b70fd3..30753003ba 100644 --- a/doc/src/sgml/client-auth.sgml +++ b/doc/src/sgml/client-auth.sgml @@ -234,10 +234,13 @@ hostnogssenc database userPostgreSQL database. - Multiple database names can be supplied by separating them with - commas. 
A separate file containing database names can be specified by - preceding the file name with @. + a specific PostgreSQL database or a regular + expression preceded by /. + Multiple database names and/or regular expressions preceded by / + can be supplied by separating them with commas. + A separate file containing database names and/or regular expressions preceded + by / can be specified by preceding the file name + with @. @@ -249,18 +252,20 @@ hostnogssenc database userall specifies that it matches all users. Otherwise, this is either the name of a specific - database user, or a group name preceded by +. + database user, a regular expression preceded by /, + or a group name preceded by +. (Recall that there is no real distinction between users and groups in PostgreSQL; a + mark really means match any of the roles that are directly or indirectly members
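[Editor's note] The check ordering agreed on in this exchange (group token, "all" keyword, regex, then exact name) can be sketched roughly as follows. This is Python rather than the patch's C, and the token fields and membership stand-in are assumptions for illustration only, not the actual check_role() code:

```python
import re

# Sketch of the proposed match order for a pg_hba.conf role token.
def role_matches(tok_string, quoted, role, member_of=frozenset()):
    if not quoted and tok_string.startswith("+"):
        return tok_string[1:] in member_of                  # group membership
    if not quoted and tok_string == "all":
        return True                                         # keyword: everyone
    if not quoted and tok_string.startswith("/"):
        return re.search(tok_string[1:], role) is not None  # regex token
    return tok_string == role                               # plain exact match

print(role_matches("/^app_", False, "app_reader"))          # True
print(role_matches("+admins", False, "alice", {"admins"}))  # True
print(role_matches("all", True, "anyone"))                  # False: quoted "all" is just a name
```

Putting the keyword and group checks before the regex branch keeps the special cases explicit, and the exact comparison last means anything unquoted that begins with '/' is always treated as a regex.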
Re: START_REPLICATION SLOT causing a crash in an assert build
On Thu, Oct 20, 2022 at 6:54 AM Andres Freund wrote: > > Hi, > > On 2022-10-13 15:57:28 +0900, Masahiko Sawada wrote: > > I've attached an updated patch. I've added the common function to > > start pg_recvlogical and wait for it to become active. Please review > > it. > > > +# Start pg_recvlogical process and wait for it to become active. > > +sub start_pg_recvlogical > > +{ > > + my ($node, $slot_name, $create_slot) = @_; > > + > > + my @cmd = ( > > + 'pg_recvlogical', '-S', "$slot_name", '-d', > > + $node->connstr('postgres'), > > + '--start', '--no-loop', '-f', '-'); > > + push @cmd, '--create-slot' if $create_slot; > > + > > + # start pg_recvlogical process. > > + my $pg_recvlogical = IPC::Run::start(@cmd); > > + > > + # Wait for the replication slot to become active. > > + $node->poll_query_until('postgres', > > + "SELECT EXISTS (SELECT 1 FROM pg_replication_slots WHERE > > slot_name = '$slot_name' AND active_pid IS NOT NULL)" > > + ) or die "slot never became active"; > > + > > + return $pg_recvlogical; > > +} > > Because you never process the output from pg_recvlogical I think this test > will fail if the pipe buffer is small (or more changes are made). I think > either it needs to output to a file, or we need to process the output. Okay, but how can we test this situation? As far as I tested, if we don't specify the redirection of pg_recvlogical's output as above, pg_recvlogical's stdout and stderr are output to the log file. So I could not reproduce the issue you're concerned about. Which pipe do you refer to? > > A file seems the easier solution in this case, we don't really care about what > changes are streamed to the client, right? > > > > +$node = PostgreSQL::Test::Cluster->new('test2'); > > +$node->init(allows_streaming => 'logical'); > > +$node->start; > > +$node->safe_psql('postgres', "CREATE TABLE test(i int)"); > > Why are we creating a new cluster? Initdbs takes a fair bit of time on some > platforms, so this seems unnecessary? Agreed. 
Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: parse partition strategy string in gram.y
headerscheck fail, fixed here. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ #error "Operator lives in the wrong universe" ("Use of cookies in real-time system development", M. Gleixner, M. Mc Guire) >From c434020fc07616cdd13017135819083186d33256 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Thu, 20 Oct 2022 12:29:18 +0200 Subject: [PATCH v3] have gram.y resolve partition strategy names --- src/backend/commands/tablecmds.c | 35 +-- src/backend/parser/gram.y | 22 - src/backend/partitioning/partbounds.c | 27 - src/backend/utils/cache/partcache.c | 6 + src/include/nodes/parsenodes.h| 15 ++-- src/include/partitioning/partbounds.h | 2 +- src/include/utils/partcache.h | 3 ++- 7 files changed, 54 insertions(+), 56 deletions(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 20135ef1b0..e07fd747f7 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -605,9 +605,10 @@ static void RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid, void *arg); static void RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid, void *arg); -static PartitionSpec *transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy); +static PartitionSpec *transformPartitionSpec(Relation rel, PartitionSpec *partspec); static void ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partParams, AttrNumber *partattrs, - List **partexprs, Oid *partopclass, Oid *partcollation, char strategy); + List **partexprs, Oid *partopclass, Oid *partcollation, + PartitionStrategy strategy); static void CreateInheritance(Relation child_rel, Relation parent_rel); static void RemoveInheritance(Relation child_rel, Relation parent_rel, bool expect_detached); @@ -1122,7 +1123,6 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, if (partitioned) { ParseState *pstate; - char strategy; int partnatts; AttrNumber 
partattrs[PARTITION_MAX_KEYS]; Oid partopclass[PARTITION_MAX_KEYS]; @@ -1147,14 +1147,14 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, * and CHECK constraints, we could not have done the transformation * earlier. */ - stmt->partspec = transformPartitionSpec(rel, stmt->partspec, -); + stmt->partspec = transformPartitionSpec(rel, stmt->partspec); ComputePartitionAttrs(pstate, rel, stmt->partspec->partParams, partattrs, , partopclass, - partcollation, strategy); + partcollation, stmt->partspec->strategy); - StorePartitionKey(rel, strategy, partnatts, partattrs, partexprs, + StorePartitionKey(rel, stmt->partspec->strategy, partnatts, partattrs, + partexprs, partopclass, partcollation); /* make it all visible */ @@ -17132,10 +17132,10 @@ RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid, /* * Transform any expressions present in the partition key * - * Returns a transformed PartitionSpec, as well as the strategy code + * Returns a transformed PartitionSpec. 
*/ static PartitionSpec * -transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy) +transformPartitionSpec(Relation rel, PartitionSpec *partspec) { PartitionSpec *newspec; ParseState *pstate; @@ -17148,21 +17148,8 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy) newspec->partParams = NIL; newspec->location = partspec->location; - /* Parse partitioning strategy name */ - if (pg_strcasecmp(partspec->strategy, "hash") == 0) - *strategy = PARTITION_STRATEGY_HASH; - else if (pg_strcasecmp(partspec->strategy, "list") == 0) - *strategy = PARTITION_STRATEGY_LIST; - else if (pg_strcasecmp(partspec->strategy, "range") == 0) - *strategy = PARTITION_STRATEGY_RANGE; - else - ereport(ERROR, -(errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("unrecognized partitioning strategy \"%s\"", - partspec->strategy))); - /* Check valid number of columns for strategy */ - if (*strategy == PARTITION_STRATEGY_LIST && + if (partspec->strategy == PARTITION_STRATEGY_LIST && list_length(partspec->partParams) != 1) ereport(ERROR, (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), @@ -17208,7 +17195,7 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy) static void ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partParams, AttrNumber *partattrs, List **partexprs, Oid *partopclass, Oid *partcollation, - char strategy) + PartitionStrategy strategy) { int attn; ListCell *lc; diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 737bd2d06d..6ca23f88c4 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@
Re: parse partition strategy string in gram.y
On 2022-Oct-21, Japin Li wrote: > Is there any way to get the regression test diffs from Cirrus CI? > I did not find the diffs in [1]. I think they should be somewhere in the artifacts, but I'm not sure. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "The first law of live demonstrations is: do not try to use the system. Write a script that touches nothing, so as to cause no damage." (Jakob Nielsen)
Re: parse partition strategy string in gram.y
On Fri, 21 Oct 2022 at 18:12, Alvaro Herrera wrote: > On 2022-Oct-21, Japin Li wrote: > >> Is there an error here from forgetting the LIST partition? > > Of course. > https://cirrus-ci.com/build/4721735111540736 > > This is what you get for moving cases around at the last minute ... > Is there any way to get the regression test diffs from Cirrus CI? I did not find the diffs in [1]. [1] https://cirrus-ci.com/build/4721735111540736 -- Regards, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: parse partition strategy string in gram.y
On 2022-Oct-21, Japin Li wrote: > Does there an error about forget the LIST partition? Of course. https://cirrus-ci.com/build/4721735111540736 This is what you get for moving cases around at the last minute ... Fixed, thanks. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ >From dc345f3cb70c335421f29a6867438ed4bb95bd91 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Thu, 20 Oct 2022 12:29:18 +0200 Subject: [PATCH v2] have gram.y resolve partition strategy names --- src/backend/commands/tablecmds.c | 35 +-- src/backend/parser/gram.y | 22 - src/backend/partitioning/partbounds.c | 26 src/backend/utils/cache/partcache.c | 6 + src/include/nodes/parsenodes.h| 15 ++-- src/include/partitioning/partbounds.h | 2 +- src/include/utils/partcache.h | 2 +- 7 files changed, 53 insertions(+), 55 deletions(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 20135ef1b0..e07fd747f7 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -605,9 +605,10 @@ static void RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid, void *arg); static void RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid, void *arg); -static PartitionSpec *transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy); +static PartitionSpec *transformPartitionSpec(Relation rel, PartitionSpec *partspec); static void ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partParams, AttrNumber *partattrs, - List **partexprs, Oid *partopclass, Oid *partcollation, char strategy); + List **partexprs, Oid *partopclass, Oid *partcollation, + PartitionStrategy strategy); static void CreateInheritance(Relation child_rel, Relation parent_rel); static void RemoveInheritance(Relation child_rel, Relation parent_rel, bool expect_detached); @@ -1122,7 +1123,6 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, if (partitioned) { ParseState *pstate; 
- char strategy; int partnatts; AttrNumber partattrs[PARTITION_MAX_KEYS]; Oid partopclass[PARTITION_MAX_KEYS]; @@ -1147,14 +1147,14 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, * and CHECK constraints, we could not have done the transformation * earlier. */ - stmt->partspec = transformPartitionSpec(rel, stmt->partspec, -); + stmt->partspec = transformPartitionSpec(rel, stmt->partspec); ComputePartitionAttrs(pstate, rel, stmt->partspec->partParams, partattrs, , partopclass, - partcollation, strategy); + partcollation, stmt->partspec->strategy); - StorePartitionKey(rel, strategy, partnatts, partattrs, partexprs, + StorePartitionKey(rel, stmt->partspec->strategy, partnatts, partattrs, + partexprs, partopclass, partcollation); /* make it all visible */ @@ -17132,10 +17132,10 @@ RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid, /* * Transform any expressions present in the partition key * - * Returns a transformed PartitionSpec, as well as the strategy code + * Returns a transformed PartitionSpec. 
*/ static PartitionSpec * -transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy) +transformPartitionSpec(Relation rel, PartitionSpec *partspec) { PartitionSpec *newspec; ParseState *pstate; @@ -17148,21 +17148,8 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy) newspec->partParams = NIL; newspec->location = partspec->location; - /* Parse partitioning strategy name */ - if (pg_strcasecmp(partspec->strategy, "hash") == 0) - *strategy = PARTITION_STRATEGY_HASH; - else if (pg_strcasecmp(partspec->strategy, "list") == 0) - *strategy = PARTITION_STRATEGY_LIST; - else if (pg_strcasecmp(partspec->strategy, "range") == 0) - *strategy = PARTITION_STRATEGY_RANGE; - else - ereport(ERROR, -(errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("unrecognized partitioning strategy \"%s\"", - partspec->strategy))); - /* Check valid number of columns for strategy */ - if (*strategy == PARTITION_STRATEGY_LIST && + if (partspec->strategy == PARTITION_STRATEGY_LIST && list_length(partspec->partParams) != 1) ereport(ERROR, (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), @@ -17208,7 +17195,7 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy) static void ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partParams, AttrNumber *partattrs, List **partexprs, Oid *partopclass, Oid *partcollation, - char strategy) + PartitionStrategy strategy) { int attn; ListCell *lc; diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 737bd2d06d..6ca23f88c4 100644
Re: parse partition strategy string in gram.y
On Fri, 21 Oct 2022 at 17:32, Alvaro Herrera wrote: > Hello > > I've had this patch sitting in a local branch for way too long. It's a > trivial thing but for some reason it bothered me: we let the partition > strategy flow into the backend as a string and only parse it into the > catalog 1-char version quite late. > > This patch makes gram.y responsible for parsing it and passing it down > as a value from a new enum, which looks more neat. Because it's an > enum, some "default:" cases can be removed in a couple of places. I > also added a new elog() in case the catalog contents becomes broken. Is there an error here from forgetting the LIST partition? +/* + * Parse a user-supplied partition strategy string into parse node + * PartitionStrategy representation, or die trying. + */ +static PartitionStrategy +parsePartitionStrategy(char *strategy) +{ + if (pg_strcasecmp(strategy, "range") == 0) <-- it should be list + return PARTITION_STRATEGY_RANGE; <-- PARTITION_STRATEGY_LIST + else if (pg_strcasecmp(strategy, "hash") == 0) + return PARTITION_STRATEGY_HASH; + else if (pg_strcasecmp(strategy, "range") == 0) + return PARTITION_STRATEGY_RANGE; + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), +errmsg("unrecognized partitioning strategy \"%s\"", + strategy))); +} + -- Regards, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: Crash after a call to pg_backup_start()
On Fri, Oct 21, 2022 at 3:10 PM Kyotaro Horiguchi wrote: > It seems to me that the comment is true and the condition is a thinko. Yeah, the two conditions can both be false. How about we update the comment a bit to emphasize this? For example: /* At most one of these conditions can be true */ or /* These conditions cannot both be true */ > Please find the attached fix. +1 Thanks Richard
Re: Standby recovers records from wrong timeline
On Fri, 21 Oct 2022 at 11:44, Kyotaro Horiguchi wrote: > > At Fri, 21 Oct 2022 17:12:45 +0900 (JST), Kyotaro Horiguchi > wrote in > > latest works. It doesn't consider the case of explicit target timelines > > so it's just a PoC. (So this doesn't work if recovery_target_timeline > > is set to 2 for the "standby" in the repro.) > > So, finally I noticed that the function XLogFileReadAnyTLI is not > needed at all if we are going this direction. > > Regardless of whether recovery_target_timeline is latest or any explicit > timeline id or checkpoint timeline, what we can do to reach the target > timeline is just to follow the history file's direction. > > If segments are partly gone while reading on a timeline, a segment on > the older timelines is just garbage since it should be incompatible. I came to the same conclusion. I adjusted XLogFileReadAnyTLI to not use any timeline that ends within the segment (attached patch). At this point the name of the function becomes really wrong; XLogFileReadCorrectTLI or something to that effect would be much more descriptive, and the code could be simplified. However, I'm not particularly happy with this approach, as it can refuse to use valid WAL when the newer timeline's copy is not available. Consider the scenario of a cascading failure. Node A has a hard failure, then node B promotes, archives a history file, but doesn't see enough traffic to archive a full segment before failing itself. While this is happening we restore node A from backup and start it up as a standby. If node B fails before node A has a chance to connect, then either we continue recovery on the wrong timeline (current behavior) or we will not try to recover the first portion of the archived WAL file (with the patch). So I think the correct approach would still be to have ReadRecord() or ApplyWalRecord() determine that switching timelines is needed.
-- Ants Aasma www.cybertec-postgresql.com diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index cb07694aea6..73bde98b920 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -4171,6 +4171,7 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source) { TimeLineHistoryEntry *hent = (TimeLineHistoryEntry *) lfirst(cell); TimeLineID tli = hent->tli; + XLogSegNo beginseg = 0; if (tli < curFileTLI) break;/* don't bother looking at too-old TLIs */ @@ -4181,7 +4182,6 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source) */ if (hent->begin != InvalidXLogRecPtr) { - XLogSegNo beginseg = 0; XLByteToSeg(hent->begin, beginseg, wal_segment_size); @@ -4223,6 +4223,14 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source) return fd; } } + + /* + * For segments containing known timeline switches only consider the + * last timeline as redo otherwise doesn't know when to switch + * timelines. + */ + if (segno == beginseg && beginseg > 0) + break; } /* Couldn't find it. For simplicity, complain about front timeline */
Re: Standby recovers records from wrong timeline
At Fri, 21 Oct 2022 17:44:40 +0900 (JST), Kyotaro Horiguchi wrote in > At Fri, 21 Oct 2022 17:12:45 +0900 (JST), Kyotaro Horiguchi > wrote in > > latest works. It doesn't consider the case of explicit target timelines > > so it's just a PoC. (So this doesn't work if recovery_target_timeline > > is set to 2 for the "standby" in the repro.) > > So, finally I noticed that the function XLogFileReadAnyTLI is not > needed at all if we are going this direction. > > Regardless of whether recovery_target_timeline is latest or any explicit > timeline id or checkpoint timeline, what we can do to reach the target > timeline is just to follow the history file's direction. > > If segments are partly gone while reading on a timeline, a segment on > the older timelines is just garbage since it should be incompatible. > > So.. I'm at a loss about what the function is for. > > Please, can anyone tell me why we need the behavior of > XLogFileReadAnyTLI() at all? It was introduced by 1bb2558046. And the behavior dates back to 2042b3428d. Hmmm.. XLogFileRead() at the time did essentially the same thing as the current XLogFileReadAnyTLI. At that time the expectedTL*I*s contained only timeline IDs. Thus it seems to me that, at that time, recovery assumed it was fine to read the segment on the greatest available timeline in the TLI list at every moment. (Mmm. I cannot describe this precisely enough.) In other words, it did not intend to use segments on timelines older than expected as replacements for the segment on the correct timeline. If this is correct (I hope the description above makes sense), now that we can determine the exact TLI to read for the specified segno, we don't need to descend to older timelines. In other words, the current XLogFileReadAnyTLI() should be just XLogFileReadOnHistory(), which reads a segment of the exact timeline calculated from the expectedTLEs and the segno. I'm going to work in this direction. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
parse partition strategy string in gram.y
Hello I've had this patch sitting in a local branch for way too long. It's a trivial thing but for some reason it bothered me: we let the partition strategy flow into the backend as a string and only parse it into the catalog 1-char version quite late. This patch makes gram.y responsible for parsing it and passing it down as a value from a new enum, which looks more neat. Because it's an enum, some "default:" cases can be removed in a couple of places. I also added a new elog() in case the catalog contents becomes broken. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Estoy de acuerdo contigo en que la verdad absoluta no existe... El problema es que la mentira sí existe y tu estás mintiendo" (G. Lama) >From e6067eee889a6636eb013f2f8d85efaf2112232b Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Thu, 20 Oct 2022 12:29:18 +0200 Subject: [PATCH] have gram.y resolve partition strategy names --- src/backend/commands/tablecmds.c | 35 +-- src/backend/parser/gram.y | 22 - src/backend/partitioning/partbounds.c | 26 src/backend/utils/cache/partcache.c | 6 + src/include/nodes/parsenodes.h| 15 ++-- src/include/partitioning/partbounds.h | 2 +- src/include/utils/partcache.h | 2 +- 7 files changed, 53 insertions(+), 55 deletions(-) diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 20135ef1b0..e07fd747f7 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -605,9 +605,10 @@ static void RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid, void *arg); static void RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid, void *arg); -static PartitionSpec *transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy); +static PartitionSpec *transformPartitionSpec(Relation rel, PartitionSpec *partspec); static void ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partParams, AttrNumber *partattrs, - List **partexprs, 
Oid *partopclass, Oid *partcollation, char strategy); + List **partexprs, Oid *partopclass, Oid *partcollation, + PartitionStrategy strategy); static void CreateInheritance(Relation child_rel, Relation parent_rel); static void RemoveInheritance(Relation child_rel, Relation parent_rel, bool expect_detached); @@ -1122,7 +1123,6 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, if (partitioned) { ParseState *pstate; - char strategy; int partnatts; AttrNumber partattrs[PARTITION_MAX_KEYS]; Oid partopclass[PARTITION_MAX_KEYS]; @@ -1147,14 +1147,14 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId, * and CHECK constraints, we could not have done the transformation * earlier. */ - stmt->partspec = transformPartitionSpec(rel, stmt->partspec, -); + stmt->partspec = transformPartitionSpec(rel, stmt->partspec); ComputePartitionAttrs(pstate, rel, stmt->partspec->partParams, partattrs, , partopclass, - partcollation, strategy); + partcollation, stmt->partspec->strategy); - StorePartitionKey(rel, strategy, partnatts, partattrs, partexprs, + StorePartitionKey(rel, stmt->partspec->strategy, partnatts, partattrs, + partexprs, partopclass, partcollation); /* make it all visible */ @@ -17132,10 +17132,10 @@ RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid, /* * Transform any expressions present in the partition key * - * Returns a transformed PartitionSpec, as well as the strategy code + * Returns a transformed PartitionSpec. 
*/ static PartitionSpec * -transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy) +transformPartitionSpec(Relation rel, PartitionSpec *partspec) { PartitionSpec *newspec; ParseState *pstate; @@ -17148,21 +17148,8 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy) newspec->partParams = NIL; newspec->location = partspec->location; - /* Parse partitioning strategy name */ - if (pg_strcasecmp(partspec->strategy, "hash") == 0) - *strategy = PARTITION_STRATEGY_HASH; - else if (pg_strcasecmp(partspec->strategy, "list") == 0) - *strategy = PARTITION_STRATEGY_LIST; - else if (pg_strcasecmp(partspec->strategy, "range") == 0) - *strategy = PARTITION_STRATEGY_RANGE; - else - ereport(ERROR, -(errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("unrecognized partitioning strategy \"%s\"", - partspec->strategy))); - /* Check valid number of columns for strategy */ - if (*strategy == PARTITION_STRATEGY_LIST && + if (partspec->strategy == PARTITION_STRATEGY_LIST && list_length(partspec->partParams) != 1) ereport(ERROR, (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), @@ -17208,7 +17195,7 @@
RE: Perform streaming logical transactions by background workers and parallel apply
On Wednesday, October 19, 2022 8:50 PM Kuroda, Hayato/黒田 隼人 wrote: > > === > 01. applyparallelworker.c - SIZE_STATS_MESSAGE > > ``` > /* > * There are three fields in each message received by the parallel apply > * worker: start_lsn, end_lsn and send_time. Because we have updated these > * statistics in the leader apply worker, we can ignore these fields in the > * parallel apply worker (see function LogicalRepApplyLoop). > */ > #define SIZE_STATS_MESSAGE (2 * sizeof(XLogRecPtr) + sizeof(TimestampTz)) > ``` > > According to other comment styles, it seems that the first sentence of the > comment should describe the datatype and usage, not the detailed reason. > For example, about ParallelApplyWorkersList, you said "A list ...". How about > adding a message like the following: > The message size that can be skipped by the parallel apply worker Thanks for the comments, but the current description seems sufficient to me. > ~~~ > 02. applyparallelworker.c - parallel_apply_start_subtrans > > ``` > if (current_xid != top_xid && > !list_member_xid(subxactlist, current_xid)) ``` > > A macro TransactionIdEquals is defined in access/transam.h. Should we use it, > or is it too trivial? I checked the existing code; it seems both styles are being used. Maybe we can post a separate patch to unify them later. > ~~~ > 08. worker.c - apply_handle_prepare_internal > > Same as above. > > > ~~~ > 09. worker.c - maybe_reread_subscription > > ``` > /* >* Exit if any parameter that affects the remote connection was > changed. >* The launcher will start a new worker.
>*/ > if (strcmp(newsub->conninfo, MySubscription->conninfo) != 0 || > strcmp(newsub->name, MySubscription->name) != 0 || > strcmp(newsub->slotname, MySubscription->slotname) != 0 || > newsub->binary != MySubscription->binary || > newsub->stream != MySubscription->stream || > strcmp(newsub->origin, MySubscription->origin) != 0 || > newsub->owner != MySubscription->owner || > !equal(newsub->publications, MySubscription->publications)) > { > ereport(LOG, > (errmsg("logical replication apply worker for > subscription \"%s\" will restart because of a parameter change", > MySubscription->name))); > > proc_exit(0); > } > ``` > > When the parallel apply worker has been launched and then the subscription > option has been modified, the same message will appear twice. > But if the option "streaming" is changed from "parallel" to "on", one of them > will not restart again. > Should we modify the message? Thanks, it seems to be a timing problem: if the leader catches the change first and stops the parallel workers, the message will only appear once. But I agree we'd better make the message clear. I changed the message in the parallel apply worker. While at it, I also adjusted some other messages to include "parallel apply worker" when they are emitted in a parallel apply worker. Best regards, Hou zj
Re: Data is copied twice when specifying both child and parent table in publication
Here are my review comments for HEAD patches v13* // Patch HEAD_v13-0001 I already posted some follow-up questions. See [1] / Patch HEAD_v13-0002 1. Commit message The following usage scenarios are not described in detail in the manual: If one subscription subscribes multiple publications, and these publications publish a partitioned table and its partitions respectively. When we specify this parameter on one or more of these publications, which identity and schema should be used to publish the changes? In these cases, I think the parameter publish_via_partition_root behave as follows: ~ It seemed worded a bit strangely. Also, you said "on one or more of these publications" but the examples only show only one publication having 'publish_via_partition_root'. SUGGESTION (I've modified the wording slightly but the examples are unchanged). Assume a subscription is subscribing to multiple publications, and these publications publish a partitioned table and its partitions respectively: [publisher-side] create table parent (a int primary key) partition by range (a); create table child partition of parent default; create publication pub1 for table parent; create publication pub2 for table child; [subscriber-side] create subscription sub connection '' publication pub1, pub2; The manual does not clearly describe the behaviour when the user had specified the parameter 'publish_via_partition_root' on just one of the publications. This patch modifies documentation to clarify the following rules: - If the parameter publish_via_partition_root is specified only in pub1, changes will be published using the identity and schema of the table 'parent'. - If the parameter publish_via_partition_root is specified only in pub2, changes will be published using the identity and schema of the table 'child'. ~~~ 2. - If the parameter publish_via_partition_root is specified only in pub2, changes will be published using the identity and schema of the table child. ~ Is that right though? 
This rule seems 100% contrary to the meaning of 'publish_via_partition_root=true'. -- 3. doc/src/sgml/ref/create_publication.sgml + + If a root partitioned table is published by any subscribed publications which + set publish_via_partition_root = true, changes on this root partitioned table + (or on its partitions) will be published using the identity and schema of this + root partitioned table rather than that of the individual partitions. + This seems to only describe the first example from the commit message. What about some description to explain the second example? -- [1] https://www.postgresql.org/message-id/CAHut%2BPt%2B1PNx6VsZ-xKzAU-18HmNXhjCC1TGakKX46Wg7YNT1Q%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
Re: Data is copied twice when specifying both child and parent table in publication
On Mon, Oct 17, 2022 at 4:49 PM wangw.f...@fujitsu.com wrote: > > On Wed, Oct 5, 2022 at 11:08 AM Peter Smith wrote: > > Hi Wang-san. Here are my review comments for HEAD_v12-0001 patch. > ... > > > > 3. QUESTION - pg_get_publication_tables / fetch_table_list > > > > When the same table is published by different publications (but there > > are other differences like row-filters/column-lists in each > > publication) the result tuple of this function does not include the > > pubid. Maybe the SQL of pg_publication_tables/fetch_table_list() is OK > > as-is but how does it manage to associate each table with the correct > > tuple? > > > > I know it apparently all seems to work but I’m not how does that > > happen? Can you explain why a puboid is not needed for the result > > tuple of this function? > > Sorry, I am not sure I understand your question. > I try to answer your question by explaining the two functions you mentioned: > > First, the function pg_get_publication_tables gets the list (see table_infos) > that included published table and the corresponding publication. Then based > on this list, the function pg_get_publication_tables returns information > (scheme, relname, row filter and column list) about the published tables in > the > publications list. It just doesn't return pubid. > > Then, the SQL in the function fetch_table_list will get the columns in the > column list from pg_attribute. (This is to return all columns when the column > list is not specified) > I meant, for example, if the different publications specified different col-lists for the same table then IIUC the fetch_table_lists() is going to return 2 list elements (schema,rel_name,row_filter,col_list). But when the schema/rel_name are the same for 2 elements then (without also a pubid) how are you going to know where the list element came from, and how come that is not important? 
> > ~~ > > > > test_pub=# create table t1(a int, b int, c int); > > CREATE TABLE > > test_pub=# create publication pub1 for table t1(a) where (a > 99); > > CREATE PUBLICATION > > test_pub=# create publication pub2 for table t1(a,b) where (b < 33); > > CREATE PUBLICATION > > > > Following seems OK when I swap orders of publication names... > > > > test_pub=# SELECT gpt.relid, gpt.attrs, pg_get_expr(gpt.qual, > > gpt.relid) AS rowfilter from pg_get_publication_tables(VARIADIC > > ARRAY['pub2','pub1']) gpt(relid, attrs, qual); > > relid | attrs | rowfilter > > ---+---+--- > > 16385 | 1 2 | (b < 33) > > 16385 | 1 | (a > 99) > > (2 rows) > > > > test_pub=# SELECT gpt.relid, gpt.attrs, pg_get_expr(gpt.qual, > > gpt.relid) AS rowfilter from pg_get_publication_tables(VARIADIC > > ARRAY['pub1','pub2']) gpt(relid, attrs, qual); > > relid | attrs | rowfilter > > ---+---+--- > > 16385 | 1 | (a > 99) > > 16385 | 1 2 | (b < 33) > > (2 rows) > > > > But what about this (this is similar to the SQL fragment from > > fetch_table_list); I swapped the pub names but the results are the > > same... 
> > > > test_pub=# SELECT pg_get_publication_tables(VARIADIC array_agg(p.pubname)) from pg_publication p where pubname IN('pub2','pub1'); > > > > pg_get_publication_tables > > ------------------------------------------------------------------ > > (16385,1,"{OPEXPR :opno 521 :opfuncid 147 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 1 :location 47} {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 51 :constvalue 4 [ 99 0 0 0 0 0 0 0 ]}) :location 49}") > > (16385,"1 2","{OPEXPR :opno 97 :opfuncid 66 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1 :varattno 2 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 2 :location 49} {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location 53 :constvalue 4 [ 33 0 0 0 0 0 0 0 ]}) :location 51}") > > (2 rows) > > > > test_pub=# SELECT pg_get_publication_tables(VARIADIC array_agg(p.pubname)) from pg_publication p where pubname IN('pub1','pub2'); > > > > pg_get_publication_tables > > ------------------------------------------------------------------
Re: fix stats_fetch_consistency value in postgresql.conf.sample
At Fri, 21 Oct 2022 11:58:15 +1100, Peter Smith wrote in
> Hi, I was hoping to use this patch in my other thread [1], but your
> latest attachment is reported broken in cfbot [2]. Please rebase it.

Ouch. I hadn't gotten to this yet. I'll do that next Monday.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Standby recovers records from wrong timeline
At Fri, 21 Oct 2022 17:12:45 +0900 (JST), Kyotaro Horiguchi wrote in
> latest works. It doesn't consider the case of explicit target timelines
> so it's just a PoC. (So this doesn't work if recovery_target_timeline
> is set to 2 for the "standby" in the repro.)

So, finally I noticed that the function XLogFileReadAnyTLI is not needed at all if we are going in this direction. Regardless of whether recovery_target_timeline is latest, an explicit timeline id, or the checkpoint timeline, all we need to do to reach the target timeline is follow the history file's direction. If segments are partly gone while reading on a timeline, a segment on an older timeline is just garbage, since it would be incompatible.

So.. I'm at a loss about what the function is for. Could anyone tell me why we need the behavior of XLogFileReadAnyTLI() at all?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: cross-platform pg_basebackup
Hi,

Patch v2 looks good to me. I have tested it, and pg_basebackup works fine across platforms (Windows to Linux and Linux to Windows).

Syntax used for testing:

$ pg_basebackup -h remote_server_ip -p 5432 -U user_name -D backup/data -T olddir=newdir

I have also tested with non-absolute paths; it behaves as expected.

On Fri, Oct 21, 2022 at 12:42 AM Andrew Dunstan wrote:
>
> On 2022-10-20 Th 14:47, Robert Haas wrote:
> > On Thu, Oct 20, 2022 at 1:28 PM Tom Lane wrote:
> >> Robert Haas writes:
> >>> Cool. Here's a patch.
> >> LGTM, except I'd be inclined to ensure that all the macros
> >> are function-style, ie
> >>
> >> +#define IS_DIR_SEP(ch) IS_NONWINDOWS_DIR_SEP(ch)
> >>
> >> not just
> >>
> >> +#define IS_DIR_SEP IS_NONWINDOWS_DIR_SEP
> >>
> >> I don't recall the exact rules, but I know that the second style
> >> can lead to expanding the macro in more cases, which we likely
> >> don't want. It also seems like better documentation to show
> >> the expected arguments.
> > OK, thanks. v2 attached.
>
> Looks good.
>
> cheers
>
> andrew
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com

--
Regards,
Davinder
EnterpriseDB: http://www.enterprisedb.com
Re: Standby recovers records from wrong timeline
At Fri, 21 Oct 2022 16:45:59 +0900 (JST), Kyotaro Horiguchi wrote in
> At Thu, 20 Oct 2022 14:44:40 +0300, Ants Aasma wrote in
> > My understanding is that backup archives are supposed to remain valid
> > even after PITR or equivalently a lagging standby promoting.
>
> Sorry, I was dim, maybe because I'm catching a cold :p
>
> On second thought, everything works fine if the first segment of the
> new timeline is archived in this case. So the problem here is whether
> recovery should wait for a known new timeline when no segment on the
> new timeline is available yet. As you say, I think it is sensible
> that recovery waits at the divergence LSN for the first segment on the
> new timeline before proceeding on the same timeline.

It is simpler than anticipated: just don't descend to older timelines when the target is latest. It doesn't consider the case of explicit target timelines, so it's just a PoC. (So this doesn't work if recovery_target_timeline is set to 2 for the "standby" in the repro.)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index cb07694aea..18c14d1fbf 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -4223,6 +4223,13 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source)
 				return fd;
 			}
 		}
+
+		/*
+		 * If recovery_target_timeline is "latest", we don't want to read
+		 * segments that belong to older timelines.
+		 */
+		if (recoveryTargetTimeLineGoal == RECOVERY_TARGET_TIMELINE_LATEST)
+			break;
 	}

 	/* Couldn't find it. For simplicity, complain about front timeline */
Re: Standby recovers records from wrong timeline
At Thu, 20 Oct 2022 14:44:40 +0300, Ants Aasma wrote in
> My understanding is that backup archives are supposed to remain valid
> even after PITR or equivalently a lagging standby promoting.

Sorry, I was dim, maybe because I'm catching a cold :p

On second thought, everything works fine if the first segment of the new timeline is archived in this case. So the problem here is whether recovery should wait for a known new timeline when no segment on the new timeline is available yet. As you say, I think it is sensible that recovery waits at the divergence LSN for the first segment on the new timeline before proceeding on the same timeline.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Crash after a call to pg_backup_start()
While playing with a proposed patch, I noticed that a session crashes after a failed call to pg_backup_start().

postgres=# select pg_backup_start(repeat('x', 1026));
ERROR: backup label too long (max 1024 bytes)
postgres=# \q

> TRAP: failed Assert("during_backup_start ^ (sessionBackupState ==
> SESSION_BACKUP_RUNNING)"), File: "xlog.c", Line: 8846, PID: 165835

Surprisingly, this also happens after a series of successful calls to pg_backup_start and pg_backup_stop.

postgres=# select pg_backup_start('x');
postgres=# select pg_backup_stop();
postgres=# \q

> TRAP: failed Assert("durin..

>> do_pg_abort_backup(int code, Datum arg)
> /* Only one of these conditions can be true */
> Assert(during_backup_start ^
>        (sessionBackupState == SESSION_BACKUP_RUNNING));

It seems to me that the comment is true and the condition is a thinko. This was introduced by df3737a651, so it is master only. Please find the attached fix.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dea978a962..888f5b1bff 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8842,8 +8842,8 @@ do_pg_abort_backup(int code, Datum arg)
 	bool		during_backup_start = DatumGetBool(arg);

 	/* Only one of these conditions can be true */
-	Assert(during_backup_start ^
-		   (sessionBackupState == SESSION_BACKUP_RUNNING));
+	Assert(!(during_backup_start &&
+			 (sessionBackupState == SESSION_BACKUP_RUNNING)));

 	if (during_backup_start || sessionBackupState != SESSION_BACKUP_NONE)
 	{
Re: Avoid memory leaks during base backups
At Thu, 20 Oct 2022 19:47:07 +0200, Alvaro Herrera wrote in
> I agree that's a good idea, and the patch looks good to me, but I don't
> think asserting that they are null afterwards is useful.

+1 for this direction. And the patch looks fine to me.

> 	oldcontext = MemoryContextSwitchTo(backupcontext);
> 	Assert(backup_state == NULL);
> 	Assert(tablespace_map == NULL);
> 	backup_state = (BackupState *) palloc0(sizeof(BackupState));
> 	tablespace_map = makeStringInfo();
> 	MemoryContextSwitchTo(oldcontext);

We can use MemoryContextAllocZero() for this purpose, though of course that's not mandatory.

+ * across. We keep the memory allocated in this memory context less,
+ * because any error before reaching pg_backup_stop() can leak the memory
+ * until pg_backup_start() is called again. While this is not smart, it
+ * helps to keep things simple.

I think the "less" is somewhat obscure. I feel we should be more explicit. And we don't need to put emphasis on "leak". I recklessly propose this as a draft:

"The context is intended to be used by this function to store only session-lifetime values. It is, if left alone, reset at the next call to blow away orphan memory blocks from the previous failed call. While this is not smart, it helps to keep things simple."

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Avoid memory leaks during base backups
On Fri, Oct 21, 2022 at 11:34:27AM +0530, Bharath Rupireddy wrote:
> On Fri, Oct 21, 2022 at 12:21 AM Robert Haas wrote:
>> On Thu, Oct 20, 2022 at 1:35 PM Bharath Rupireddy wrote:
>>> I think elsewhere in the code we reset dangling pointers either ways -
>>> before or after deleting/resetting memory context. But placing them
>>> before would give us extra safety in case memory context
>>> deletion/reset fails. Not sure what's the best way.
>>
>> I think it's OK to assume that deallocating memory will always
>> succeed, so it doesn't matter whether you do it just before or just
>> after that. But it's not OK to assume that *allocating* memory will
>> always succeed.
>
> Right.

To be exact, it seems to me that tablespace_map and backup_state should be reset before deleting backupcontext, but the reset of backupcontext should happen after the fact.

+	backup_state = NULL;
	tablespace_map = NULL;

These two in pg_backup_start() don't matter, do they? They are reallocated a couple of lines down.

+ * across. We keep the memory allocated in this memory context less,

What does "We keep the memory allocated in this memory context less" mean here?

--
Michael

signature.asc Description: PGP signature
Re: Avoid memory leaks during base backups
On Thu, Oct 20, 2022 at 02:51:21PM -0400, Robert Haas wrote:
> On Thu, Oct 20, 2022 at 1:35 PM Bharath Rupireddy wrote:
>> I think elsewhere in the code we reset dangling pointers either ways -
>> before or after deleting/resetting memory context. But placing them
>> before would give us extra safety in case memory context
>> deletion/reset fails. Not sure what's the best way.
>
> I think it's OK to assume that deallocating memory will always
> succeed, so it doesn't matter whether you do it just before or just
> after that. But it's not OK to assume that *allocating* memory will
> always succeed.

AFAIK, one of the callbacks associated with a memory context could fail; see the comments before MemoryContextCallResetCallbacks() in MemoryContextDelete(). I agree that it should not matter here, but I think that it is better to reset the pointers before attempting the deletion of the memory context in this case.

--
Michael

signature.asc Description: PGP signature
Re: Avoid memory leaks during base backups
On Thu, Oct 20, 2022 at 11:17 PM Alvaro Herrera wrote: > > On 2022-Oct-20, Bharath Rupireddy wrote: > > > I think elsewhere in the code we reset dangling pointers either ways - > > before or after deleting/resetting memory context. But placing them > > before would give us extra safety in case memory context > > deletion/reset fails. Not sure what's the best way. However, I'm > > nullifying the dangling pointers after deleting/resetting memory > > context. > > I agree that's a good idea, and the patch looks good to me, but I don't > think asserting that they are null afterwards is useful. +1. Removed those assertions. Please see the attached v9 patch. On Fri, Oct 21, 2022 at 12:21 AM Robert Haas wrote: > > On Thu, Oct 20, 2022 at 1:35 PM Bharath Rupireddy > wrote: > > I think elsewhere in the code we reset dangling pointers either ways - > > before or after deleting/resetting memory context. But placing them > > before would give us extra safety in case memory context > > deletion/reset fails. Not sure what's the best way. > > I think it's OK to assume that deallocating memory will always > succeed, so it doesn't matter whether you do it just before or just > after that. But it's not OK to assume that *allocating* memory will > always succeed. Right. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com v9-0001-Avoid-memory-leaks-during-backups-using-SQL-calla.patch Description: Binary data
Re: making relfilenodes 56 bits
On Thu, Sep 29, 2022 at 09:23:38PM -0400, Tom Lane wrote: > Hmmm ... I'd tend to do SELECT COUNT(*) FROM. But can't we provide > any actual checks on the sanity of the output? I realize that the > output's far from static, but still ... Honestly, checking all the fields is not that exciting, but the maximum I can think of that would be portable enough is something like the attached. No arithmetic operators for xid limits things a bit, but at least that's something. Thoughts? -- Michael diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out index 9f106c2a10..38987e2afc 100644 --- a/src/test/regress/expected/misc_functions.out +++ b/src/test/regress/expected/misc_functions.out @@ -594,3 +594,89 @@ SELECT * FROM tenk1 a JOIN my_gen_series(1,10) g ON a.unique1 = g; Index Cond: (unique1 = g.g) (4 rows) +-- +-- Test functions for control data +-- +\x +SELECT checkpoint_lsn > '0/0'::pg_lsn AS checkpoint_lsn, +redo_lsn > '0/0'::pg_lsn AS redo_lsn, +redo_wal_file IS NOT NULL AS redo_wal_file, +timeline_id > 0 AS timeline_id, +prev_timeline_id > 0 AS prev_timeline_id, +next_xid IS NOT NULL AS next_xid, +next_oid > 0 AS next_oid, +next_multixact_id != '0'::xid AS next_multixact_id, +next_multi_offset IS NOT NULL AS next_multi_offset, +oldest_xid != '0'::xid AS oldest_xid, +oldest_xid_dbid > 0 AS oldest_xid_dbid, +oldest_active_xid != '0'::xid AS oldest_active_xid, +oldest_multi_xid != '0'::xid AS oldest_multi_xid, +oldest_multi_dbid > 0 AS oldest_multi_dbid, +oldest_commit_ts_xid IS NOT NULL AS oldest_commit_ts_xid, +newest_commit_ts_xid IS NOT NULL AS newest_commit_ts_xid + FROM pg_control_checkpoint(); +-[ RECORD 1 ]+-- +checkpoint_lsn | t +redo_lsn | t +redo_wal_file| t +timeline_id | t +prev_timeline_id | t +next_xid | t +next_oid | t +next_multixact_id| t +next_multi_offset| t +oldest_xid | t +oldest_xid_dbid | t +oldest_active_xid| t +oldest_multi_xid | t +oldest_multi_dbid| t +oldest_commit_ts_xid | t 
+newest_commit_ts_xid | t + +SELECT max_data_alignment > 0 AS max_data_alignment, +database_block_size > 0 AS database_block_size, +blocks_per_segment > 0 AS blocks_per_segment, +wal_block_size > 0 AS wal_block_size, +max_identifier_length > 0 AS max_identifier_length, +max_index_columns > 0 AS max_index_columns, +max_toast_chunk_size > 0 AS max_toast_chunk_size, +large_object_chunk_size > 0 AS large_object_chunk_size, +float8_pass_by_value IS NOT NULL AS float8_pass_by_value, +data_page_checksum_version >= 0 AS data_page_checksum_version + FROM pg_control_init(); +-[ RECORD 1 ]--+-- +max_data_alignment | t +database_block_size| t +blocks_per_segment | t +wal_block_size | t +max_identifier_length | t +max_index_columns | t +max_toast_chunk_size | t +large_object_chunk_size| t +float8_pass_by_value | t +data_page_checksum_version | t + +SELECT min_recovery_end_lsn >= '0/0'::pg_lsn AS min_recovery_end_lsn, +min_recovery_end_timeline >= 0 AS min_recovery_end_timeline, +backup_start_lsn >= '0/0'::pg_lsn AS backup_start_lsn, +backup_end_lsn >= '0/0'::pg_lsn AS backup_end_lsn, +end_of_backup_record_required IS NOT NULL AS end_of_backup_record_required + FROM pg_control_recovery(); +-[ RECORD 1 ]-+-- +min_recovery_end_lsn | t +min_recovery_end_timeline | t +backup_start_lsn | t +backup_end_lsn| t +end_of_backup_record_required | t + +SELECT pg_control_version > 0 AS pg_control_version, +catalog_version_no > 0 AS catalog_version_no, +system_identifier >= 0 AS system_identifier, +pg_control_last_modified <= now() AS pg_control_last_modified + FROM pg_control_system(); +-[ RECORD 1 ]+-- +pg_control_version | t +catalog_version_no | t +system_identifier| t +pg_control_last_modified | t + diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql index 639e9b352c..986e07c3a5 100644 --- a/src/test/regress/sql/misc_functions.sql +++ b/src/test/regress/sql/misc_functions.sql @@ -223,3 +223,47 @@ SELECT * FROM tenk1 a JOIN my_gen_series(1,1000) 
g ON a.unique1 = g; EXPLAIN (COSTS OFF) SELECT * FROM tenk1 a JOIN my_gen_series(1,10) g ON a.unique1 = g; + +-- +-- Test functions for control data +-- +\x +SELECT checkpoint_lsn > '0/0'::pg_lsn AS checkpoint_lsn, +redo_lsn > '0/0'::pg_lsn AS redo_lsn, +redo_wal_file IS NOT NULL AS redo_wal_file, +timeline_id > 0 AS timeline_id, +prev_timeline_id > 0 AS prev_timeline_id, +next_xid IS NOT NULL AS next_xid, +next_oid > 0 AS next_oid, +next_multixact_id != '0'::xid AS next_multixact_id, +next_multi_offset IS NOT NULL AS next_multi_offset, +oldest_xid !=