doubt about FullTransactionIdAdvance()

2022-10-21 Thread Zhang Mingli
Hi, hackers

I don't quite understand FullTransactionIdAdvance(); correct me if I'm wrong.


/*
 * Advance a FullTransactionId variable, stepping over xids that would appear
 * to be special only when viewed as 32bit XIDs.
 */
static inline void
FullTransactionIdAdvance(FullTransactionId *dest)
{
	dest->value++;

	/* see FullTransactionIdAdvance() */
	if (FullTransactionIdPrecedes(*dest, FirstNormalFullTransactionId))
		return;

	while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
		dest->value++;
}

#define XidFromFullTransactionId(x) ((x).value)
#define FullTransactionIdPrecedes(a, b) ((a).value < (b).value)

Can the code ever reach the line "while (XidFromFullTransactionId(*dest) <
FirstNormalTransactionId)"? We return early when FullTransactionIdPrecedes(*dest,
FirstNormalFullTransactionId), and the two tests look equivalent.
Another confusing point is the comment /* see FullTransactionIdAdvance() */:
it refers to the function it appears in.
Could anyone explain this? Thanks in advance.

Regards,
Zhang Mingli


Re: Add 64-bit XIDs into PostgreSQL 15

2022-10-21 Thread Zhang Mingli
Hi,


On Oct 22, 2022, 00:09 +0800, Maxim Orlov wrote:
> >
> > Done! Thanks! Here is the rebased version.
> >
> > This version has a bug fix for multixact replication. Previous versions of 
> > the patch set did not write pd_multi_base in WAL. Thus, this field was set 
> > to 0 upon WAL replay on the replica.
> > This caused the replica to panic. Fix this by adding the pd_multi_base of a 
> > page into WAL. An appropriate TAP test is added.
> >
> > Also, add refactoring and improvements in heapam.c in order to reduce diff 
> > and make it more "tidy".
> >
> > Reviews and opinions are very welcome!
> >
> > --
> > Best regards,
> > Maxim Orlov.
Found some outdated code comments around several variables, such as 
xidWrapLimit/xidWarnLimit/xidStopLimit.

These variables are not used any more.

I attach an additional v48-0009 patch; as the changes are comment-only, apply 
it if you want to.

Regards,
Zhang Mingli


v48-0009-remove-some-outdate-codes-comments-about-xidWrap.patch
Description: Binary data


Re: Collation version tracking for macOS

2022-10-21 Thread Thomas Munro
On Sat, Oct 22, 2022 at 10:24 AM Thomas Munro  wrote:
> ... But it
> doesn't provide a way for me to create a new database that uses 63 on
> purpose when I know what I'm doing.  There are various reasons I might
> want to do that.

Thinking some more about this, I guess that could be addressed by
having an explicit way to request either the library version or
collversion-style version when creating a database or collation, but
not actually storing it in daticulocale/colliculocale.  That could be
done either as part of the string that is trimmed off before storing
it (so it's only used briefly during creation to find a non-default
library)... Perhaps that'd look like initdb --icu-locale "67:en" (ICU
library version) or "154.14:en" (individual collation version) or some
new syntax in a few places.  Thereafter, it would always be looked up
by searching for the right library by [dat]collversion as Peter E
suggested.

Let me try harder to vocalise some more thoughts that have stopped me
from trying to code the search-by-collversion design so far:

Suppose your pgdata encounters a PostgreSQL linked against a later ICU
library, most likely after an OS upgrade or migration, a pg_upgrade,
or via streaming replication.  You might get a new error "can't find
ICU collation 'en' with version '153.14'; HINT: install missing ICU
library version", and somehow you'll have to work out which one might
contain 'en' v153.14 and install it with apt-get etc.  Then it'll
magically work: your postgres linked against (say) 71 will happily
work with the dlopen'd 67.  This is enough if you want to stay on 67
until the heat death of the universe.  So far so good.

Problem 1:  Suppose you're ready to start using (say) v72.  I guess
you'd use the REFRESH command, which would open the main linked ICU's
collversion and stamp that into the catalogue, at which point new
sessions would start using that, and then you'd have to rebuild all
your indexes (with no help from PG to tell you how to find everything
that needs to be rebuilt, as belaboured in previous reverted work).
Aside from the possibility of getting the rebuilding job wrong (as
belaboured elsewhere), it's not great, because there is still a
transitional period where you can be using the wrong version for your
data.  So this requires some careful planning and understanding from
the administrator.

I admit that the upgrade story is a tiny bit better than the v5
DB2-style patch, which starts using the new version immediately if you
didn't use a prefix (and logs the usual warnings about collversion
mismatch) instead of waiting for you to run REFRESH.  But both of them
have a phase where they might use the wrong library to access an
index.  That's dissatisfying, and leads me to prefer the simple
DB2-style solution that at least admits up front that it's not very
clever.  The DB2-style patch could be improved a bit here with the
addition of one more GUC: default_icu_library, so the administrator,
rather than the packager, remains in control of which version we use
for non-prefixed iculocale values (likely to be what almost everyone
is interested in), defaulting to what the packager linked against.
I've added that to the patch for illustration (though obviously the
error messages produced by collversion mismatch could use some
adjustment, ie to clarify that the warning might be cleared by
installing and selecting a different library version).
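
For illustration, if that default_icu_library GUC existed as proposed, the
administrator could pin non-prefixed locales to the dlopen'd 67 with
something like (a sketch, untested):

  ALTER SYSTEM SET default_icu_library = '67';
  SELECT pg_reload_conf();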

Problem 2:  If ICU 67 ever decides to report a different version for a
given collation (would it ever do that?  I don't expect so, but ...),
we'd be unable to open the collation with the search-by-collversion
design, and potentially the database.  What is a user supposed to do
then?  Presumably our error/hint for that would be "please insert the
correct ICU library into drive A", but now there is no correct
library; if you can even diagnose what's happened, I guess you might
downgrade the ICU library using package tools or whatever if possible,
but otherwise you'd be stuck, if you just can't get the right library.
Is this a problem?  Would you want to be able to say "I don't care,
computer, please just press on"?  So I think we need a way to turn off
the search-by-collversion thing.  How should it look?

I'd love to hear others' thoughts on how we can turn this into a
workable solution.  Hopefully while staying simple...
From 0355984c9a80ff15bfac51677fea30b9be68226b Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Wed, 8 Jun 2022 17:43:53 +1200
Subject: [PATCH v6] WIP: Multi-version ICU.

Add a layer of indirection when accessing ICU, so that multiple major
versions of the library can be used at once.  Versions other than the
one that PostgreSQL was linked against are opened with dlopen(), but we
refuse to open versions higher than the one we were compiled against.
The ABI might change in future releases so that wouldn't be safe.

By default, the system linker's default search path is used to find
libraries, but icu_library_path may be used to specify an absolute path
to look in.

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2022-10-21 Thread Peter Geoghegan
On Thu, Oct 20, 2022 at 11:52 AM Peter Geoghegan  wrote:
> Leaving my patch series aside, I still don't think that it makes sense
> to make it impossible to auto-cancel antiwraparound autovacuums,
> across the board, regardless of the underlying table age.

One small thought on the presentation/docs side of this: maybe it
would be better to invent a new kind of autovacuum that has the same
purpose as antiwraparound autovacuum, but goes by a different name,
and doesn't have the special behavior around cancellations. We
wouldn't have to change anything about the behavior of antiwraparound
autovacuum once we reached the point of needing one.

Maybe we wouldn't even need to invent a new user-visible name for this
other kind of autovacuum. While even this so-called "new kind of
autovacuum" will be rare once my main patch series gets in, it'll
still be a totally normal occurrence. Whereas antiwraparound
autovacuums are sometimes described as an emergency mechanism.

That way we wouldn't be fighting against the widely held perception
that antiwraparound autovacuums are scary. In fact that reputation
would be fully deserved, for the first time. There are lots of
problems with the idea that antiwraparound autovacuum is kind of an
emergency thing right now, but this would make things fit the
perception, "fixing" the perception. Antiwraparound autovacuums would
become far far rarer under this scheme, but when they did happen
they'd be clear cause for concern. A useful signal for users, who
should ideally aim to never see *any* antiwraparound autovacuums.

-- 
Peter Geoghegan




Re: Missing update of all_hasnulls in BRIN opclasses

2022-10-21 Thread Tomas Vondra
On 10/21/22 18:44, Tomas Vondra wrote:
> 
> ...
>
>> Apart from that, this patch looks good.
>>

Sadly, I don't think we can fix it like this :-(

The problem is that all ranges start with all_nulls=true, because the
new range gets initialized by brin_memtuple_initialize() like that. But
this happens for *every* range before we even start processing the rows.
So this way all the ranges would end up with has_nulls=true, making that
flag pretty useless.

Actually, even just doing "truncate" on the table creates such an all-nulls
range for the first range, and serializes it to disk.

I wondered why we even write such tuples for "empty" ranges to disk, for
example after "TRUNCATE" - the table is empty by definition, so how come
we write all-nulls brin summary for the first range?

For example brininsert() checks if the BRIN tuple was modified and needs
to be written back, but brinbuild() just ignores that, and initializes
(and writes) the tuple to disk anyway. I think we should not do that -
there should be a flag in BrinBuildState, tracking whether the BRIN
tuple was modified, and we should only write it if it's true.

That means we should never get an on-disk summary representing nothing.

That doesn't fix the issue, though, because we still need to pass the
memtuple tuple to the add_value opclass procedure, and whether it sets
the has_nulls flag depends on whether it's a new tuple representing no
other rows (in which case has_nulls remains false) or whether it was
read from disk (in which case it needs to be flipped to 'true').

But the opclass has no way to tell the difference at the moment - it
just gets the BrinMemTuple. So we'd have to extend this, somehow.

I wonder how to do this in a back-patchable way - we can't add
parameters to the opclass procedure, and the other solution seems to be
storing it right in the BrinMemTuple, somehow. But that's likely an ABI
break :-(

The only solution I can think of is actually passing it using all_nulls
and has_nulls - we could set both flags to true (which we never do now)
and teach the opclass that it signifies "empty" (and thus not to update
has_nulls after resetting all_nulls).

Something like the attached (I haven't added any more tests - I'm not sure
what those would look like; I can't think of a query testing this, although
maybe we could check how the flags change using pageinspect).
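
For instance, a rough sketch of such a pageinspect check (index name is
illustrative; regular BRIN data pages start at block 2):

  SELECT itemoffset, allnulls, hasnulls
    FROM brin_page_items(get_raw_page('brintest_4_idx', 2), 'brintest_4_idx');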


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From a99fd6a737cec24bb4063e99a241ff3e04c6ebb8 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Thu, 20 Oct 2022 19:55:23 +0200
Subject: [PATCH 1/9] fixup: brin has_nulls

---
 src/backend/access/brin/brin_bloom.c| 1 +
 src/backend/access/brin/brin_inclusion.c| 1 +
 src/backend/access/brin/brin_minmax.c   | 1 +
 src/backend/access/brin/brin_minmax_multi.c | 1 +
 4 files changed, 4 insertions(+)

diff --git a/src/backend/access/brin/brin_bloom.c b/src/backend/access/brin/brin_bloom.c
index 6b0af7267d5..60315450b41 100644
--- a/src/backend/access/brin/brin_bloom.c
+++ b/src/backend/access/brin/brin_bloom.c
@@ -539,6 +539,7 @@ brin_bloom_add_value(PG_FUNCTION_ARGS)
 			BloomGetFalsePositiveRate(opts));
 		column->bv_values[0] = PointerGetDatum(filter);
 		column->bv_allnulls = false;
+		column->bv_hasnulls = true;
 		updated = true;
 	}
 	else
diff --git a/src/backend/access/brin/brin_inclusion.c b/src/backend/access/brin/brin_inclusion.c
index 4b02d374f23..e0f44d3e62c 100644
--- a/src/backend/access/brin/brin_inclusion.c
+++ b/src/backend/access/brin/brin_inclusion.c
@@ -164,6 +164,7 @@ brin_inclusion_add_value(PG_FUNCTION_ARGS)
 		column->bv_values[INCLUSION_UNMERGEABLE] = BoolGetDatum(false);
 		column->bv_values[INCLUSION_CONTAINS_EMPTY] = BoolGetDatum(false);
 		column->bv_allnulls = false;
+		column->bv_hasnulls = true;
 		new = true;
 	}
 
diff --git a/src/backend/access/brin/brin_minmax.c b/src/backend/access/brin/brin_minmax.c
index 9e8a8e056cc..8a5661a8952 100644
--- a/src/backend/access/brin/brin_minmax.c
+++ b/src/backend/access/brin/brin_minmax.c
@@ -90,6 +90,7 @@ brin_minmax_add_value(PG_FUNCTION_ARGS)
 		column->bv_values[0] = datumCopy(newval, attr->attbyval, attr->attlen);
 		column->bv_values[1] = datumCopy(newval, attr->attbyval, attr->attlen);
 		column->bv_allnulls = false;
+		column->bv_hasnulls = true;
 		PG_RETURN_BOOL(true);
 	}
 
diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index 9a0bcf6698d..4e7119e2d78 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -2500,6 +2500,7 @@ brin_minmax_multi_add_value(PG_FUNCTION_ARGS)
 		MemoryContextSwitchTo(oldctx);
 
 		column->bv_allnulls = false;
+		column->bv_hasnulls = true;
 		modified = true;
 
 		column->bv_mem_value = PointerGetDatum(ranges);
-- 
2.37.3

From 57e53d34f2f7bba91fcc0de6f4eff551669554fb Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sat, 22 Oct 2022 02:26:48 

Re: Pluggable toaster

2022-10-21 Thread Nikita Malakhov
Hi!

Aleksander, we have had this in mind while developing this feature, and
have checked it. Just a slight modification is needed to make it work
with the Pluggable Storage (Access Methods) API.

On Fri, Oct 21, 2022 at 4:01 PM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Hi Nikita,
>
> > Here's rebased patch:
> > v23-0001-toaster-interface.patch - Pluggable TOAST API interface with
> reference (original) TOAST mechanics - new API is introduced but
> > reference TOAST is still left unchanged;
> > v23-0002-toaster-default.patch - Default TOAST mechanics is
> re-implemented using TOAST API and is plugged in as Default Toaster;
> > v23-0003-toaster-docs.patch - Pluggable TOAST API documentation package
>
> Thanks for keeping the patch up to date.
>
> As I recall one of the open questions was: how this feature is
> supposed to work with table access methods? Could you please summarize
> what the current consensus is in this respect?
>
> --
> Best regards,
> Aleksander Alekseev
>


-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


Re: Collation version tracking for macOS

2022-10-21 Thread Thomas Munro
Hi,

Here is a rebase of this experimental patch.  I think the basic
mechanics are promising, but we haven't agreed on a UX.  I hope we can
figure this out.

Restating the choice made in this branch of the experiment:  Here I
try to be just like DB2 (if I understood its manual correctly).
In DB2, you can use names like "en_US" if you don't care about
changes, and names like "CLDR181_en_US" if you do.  It's the user's
choice to use the second kind to avoid "unexpected effects on
applications or database objects" after upgrades.  Translated to
PostgreSQL concepts, you can use a database default ICU locale like
"en-US" if you don't care and "67:en-US" if you do, and for COLLATION
objects it's the same.  The convention I tried in this patch is that
you use either "en-US-x-icu" (which points to "en-US") or
"en-US-x-icu67" (which points to "67:en-US") depending on whether you
care about this problem.
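
To make that concrete, a sketch of the two flavours in SQL under this patch
(object names are illustrative; the "NN:" prefix is the convention described
above):

  -- pinned: always resolved via the dlopen'd ICU 67
  CREATE COLLATION my_pinned (provider = icu, locale = '67:en-US');
  -- unpinned: uses whatever ICU the postgres binary is linked against
  CREATE COLLATION my_floating (provider = icu, locale = 'en-US');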

I recognise that this is a bit cheesy: it's all the user's problem to
deal with or ignore.

An alternative mentioned by Peter E was that the locale names
shouldn't carry the prefix, but somehow we should have a list of ICU
versions to search for a matching datcollversion/collversion.  How
would that look?  Perhaps a GUC, icu_library_versions = '63, 67, 71'?
There is a currently natural and smallish range of supported versions,
probably something like 54 ... U_ICU_VERSION_MAJOR_NUM, but it seems a
bit weird to try to dlopen ~25 libraries or whatever it might be...
Do you think we should try to code this up?

I haven't tried it, but the main usability problem I predict with that
idea is this:  It can cope with a scenario where you created a
database with ICU 63 and started using a default of "en" and maybe
some explicit fr-x-icu or whatever, and then you upgrade to a new
postgres binary using ICU 71, and, as long as you still have ICU 63
installed it'll just magically keep using 63, now via dlopen().  But it
doesn't provide a way for me to create a new database that uses 63 on
purpose when I know what I'm doing.  There are various reasons I might
want to do that.

Maybe the ideas could be combined?  Perhaps "en" means "create using
binary's linked ICU, open using search-by-collversion", while "67:en"
explicitly says which to use?

Changes since last version:

 * Now it just uses the default dlopen() search path, unless you set
icu_library_path.  Is that a security problem?  It's pretty
convenient, because it means you can just "apt-get install libicu63"
(or local equivalent) and that's all, now 63 is available.

 * To try the idea out, I made it automatically create "*-x-icu67"
alongside the regular "-x-icu" collation objects at initdb time.
From d3e83d0aa5cbb3eb192a2f66d68623cd3b1595b4 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Wed, 8 Jun 2022 17:43:53 +1200
Subject: [PATCH v5] WIP: Multi-version ICU.

Add a layer of indirection when accessing ICU, so that multiple major
versions of the library can be used at once.  Versions other than the
one that PostgreSQL was linked against are opened with dlopen(), but we
refuse to open versions higher than the one we were compiled against.
The ABI might change in future releases so that wouldn't be safe.

By default, the system linker's default search path is used to find
libraries, but icu_library_path may be used to specify an absolute path
to look in.  ICU libraries are expected to have been built without ICU's
--disable-renaming option.  That is, major versions must use distinct
symbol names.

This arrangement means that at least one major version of ICU is always
available -- the one that PostgreSQL was linked against.  It should be
simple on most software distributions to install extra versions using a
package manager, or to build extra libraries as required, to access
older ICU releases.  For example, on Debian bullseye the packages are
named libicu63, libicu67, libicu71.

In this version of the patch, '63:en' used as a database default locale
or COLLATION object requests ICU library 63, and 'en' requests the
library that is linked against the postgres executable.

XXX Many other designs possible, to discuss!

Discussion: https://postgr.es/m/CA%2BhUKGL4VZRpP3CkjYQkv4RQ6pRYkPkSNgKSxFBwciECQ0mEuQ%40mail.gmail.com
---
 src/backend/access/hash/hashfunc.c   |  16 +-
 src/backend/commands/collationcmds.c |  20 ++
 src/backend/utils/adt/formatting.c   |  53 +++-
 src/backend/utils/adt/pg_locale.c| 364 ++-
 src/backend/utils/adt/varchar.c  |  16 +-
 src/backend/utils/adt/varlena.c  |  56 +++--
 src/backend/utils/misc/guc_tables.c  |  14 ++
 src/include/utils/pg_locale.h|  73 ++
 src/tools/pgindent/typedefs.list |   3 +
 9 files changed, 549 insertions(+), 66 deletions(-)

diff --git a/src/backend/access/hash/hashfunc.c b/src/backend/access/hash/hashfunc.c
index b57ed946c4..0a61538efd 100644
--- a/src/backend/access/hash/hashfunc.c
+++ b/src/backend/access/hash/hashfunc.c
@@ -298,11 +298,11 @@ hashtext(PG_FUNCTION_ARGS)
 

Multiple grouping set specs referencing duplicate alias

2022-10-21 Thread David Kimura
Hi all!

I think I may have stumbled across a case of wrong results on HEAD (same
through version 9.6, though interestingly 9.5 produces different results
altogether).

test=# SELECT i AS ai1, i AS ai2 FROM generate_series(1,3)i GROUP BY
ai2, ROLLUP(ai1) ORDER BY ai1, ai2;

 ai1 | ai2 
-----+-----
   1 |   1
     |   1
   2 |   2
     |   2
   3 |   3
     |   3
(6 rows)

I had expected:

 ai1 | ai2 
-----+-----
   1 |   1
   2 |   2
   3 |   3
     |   1
     |   2
     |   3
(6 rows)

It seems to me that the plan is missing a Sort node (on ai1 and ai2) above the
Aggregate node.

                  QUERY PLAN
------------------------------------------------
 GroupAggregate
   Group Key: i, i
   Group Key: i
   ->  Sort
         Sort Key: i
         ->  Function Scan on generate_series i

I have a hunch part of the issue may be an assumption that a duplicate aliased
column will produce the same column values and hence isn't included in the
range table, nor subsequently the pathkeys. However, that assumption does not
seem to be true for queries with multiple grouping set specifications:

test=#  SELECT i as ai1, i as ai2 FROM (values (1),(2),(3)) v(i) GROUP
BY ai1, ROLLUP(ai2);
 ai1 | ai2 
-----+-----
   1 |   1
   2 |   2
   3 |   3
   1 |
   2 |
   3 |
(6 rows)

It seems to me that excluding the duplicate alias from the pathkeys is leading
to a case where the group order is incorrectly determined to satisfy the sort
order. Thus create_ordered_paths() decides against applying an explicit sort
node. But simply forcing an explicit sort still seems wrong since we've
effectively lost a relevant column for the sort.

I tinkered a bit and hacked together an admittedly ugly patch that triggers an
explicit sort constructed from the parse tree. An alternative approach I had
considered was to update the rewriteHandler to explicitly force the existence of
the duplicate alias column in the range tables. But that also felt meh.

Does this seem like a legit issue? And if so, any thoughts on alternative
approaches?

Thanks,
David Kimura
commit 7e7b9a0dce1ba60f4fc087082f04b04e7f405539
Author: David Kimura 
Date:   Wed Oct 19 21:32:27 2022 +

Fix multiple grouping specs using duplicate alias bug

There seems to have been an assumption that multiple aliases that refer
to the same column would always produce mirrored results. However, that
does not seem to be the case when the aliased columns are passed through
different grouping set specs. Consider the case:

  SELECT i AS alias1, i AS alias2 FROM generate_series(1,3)i GROUP BY alias2, ROLLUP(alias1);

This led to a scenario where a relevant column is collapsed at parse
time and thus excluded from sort pathkeys. Consequently group order
could then incorrectly decide it satisfied the sort order. Which leads
to create_ordered_paths() forgoing an explicit sort node.

A solution to the issue is to force an explicit sort and revive the
relevant column info.

diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ac86ce9003..0b81d4d211 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -2157,6 +2157,46 @@ change_plan_targetlist(Plan *subplan, List *tlist, bool tlist_parallel_safe)
 	return subplan;
 }
 
+/*
+ * reevaluate_sort_info
+ *
+ * Build the sort node info from the parse tree.
+ */
+static void
+reevaluate_sort_info(Query *parse, Sort *plan)
+{
+	int			i = 0;
+	ListCell   *l1;
+	ListCell   *l2;
+
+	plan->numCols = list_length(parse->sortClause);
+
+	plan->sortColIdx = palloc(sizeof(int) * plan->numCols);
+	plan->sortOperators = palloc(sizeof(int) * plan->numCols);
+	plan->collations = palloc(sizeof(int) * plan->numCols);
+	plan->nullsFirst = palloc(sizeof(int) * plan->numCols);
+
+	foreach(l1, parse->sortClause)
+	{
+		foreach(l2, parse->targetList)
+		{
+			SortGroupClause *sortClause = (SortGroupClause *) lfirst(l1);
+			TargetEntry *targetEntry = (TargetEntry *) lfirst(l2);
+
+			if (sortClause->tleSortGroupRef == targetEntry->ressortgroupref)
+			{
+				plan->sortColIdx[i] = targetEntry->resno;
+				plan->sortOperators[i] = sortClause->sortop;
+				plan->collations[i] = exprCollation((Node *) targetEntry->expr);
+				plan->nullsFirst[i] = sortClause->nulls_first;
+
+				i++;
+				break;
+			}
+		}
+	}
+}
+
 /*
  * create_sort_plan
  *
@@ -2187,6 +2227,9 @@ create_sort_plan(PlannerInfo *root, SortPath *best_path, int flags)
    IS_OTHER_REL(best_path->subpath->parent) ?
    best_path->path.parent->relids : NULL);
 
+	if (best_path->need_evaluation)
+		reevaluate_sort_info(root->parse, plan);
+
 	copy_generic_path_info(&plan->plan, (Path *) best_path);
 
 	return plan;
@@ -2213,6 +2256,9 @@ create_incrementalsort_plan(PlannerInfo *root, IncrementalSortPath *best_path,
 			  best_path->spath.path.parent->relids : NULL,
 			  best_path->nPresortedCols);
 
+	if 

Re: refactor ownercheck and aclcheck functions

2022-10-21 Thread Peter Eisentraut

On 20.10.22 01:24, Corey Huinker wrote:
> I'd be inclined to remove the highly used ones as well. That way the
> codebase would have more examples of object_ownercheck() for readers to
> see. Seeing the existence of pg_FOO_ownercheck implies that a
> pg_BAR_ownercheck might exist, and if BAR is missing they might be
> inclined to re-add it.

We do have several ownercheck and aclcheck functions that can't be
refactored into this framework right now, so we do have to keep some
special-purpose functions around anyway.  I'm afraid converting all the
callers would blow up this patch quite a bit, but it could be done as a
follow-up patch.

> If we do keep them, would it make sense to go the extra step and turn
> the remaining six "regular" ones into static inline functions or even
> #define-s?

That could make sense.





patch suggestion: Fix citext_utf8 test's "Turkish I" with ICU collation provider

2022-10-21 Thread Anton Voloshin

Hello, hackers.

In current master, as well as in REL_15_STABLE, installcheck in
contrib/citext fails in most locales if we use ICU as the locale provider:


$ rm -fr data; initdb --locale-provider icu --icu-locale en-US -D data 
&& pg_ctl -D data -l logfile start && make -C contrib/citext 
installcheck; pg_ctl -D data stop; cat contrib/citext/regression.diffs

...
test citext   ... ok  457 ms
test citext_utf8  ... FAILED   21 ms
...
diff -u /home/ashutosh/pg/REL_15_STABLE/contrib/citext/expected/citext_utf8.out /home/ashutosh/pg/REL_15_STABLE/contrib/citext/results/citext_utf8.out
--- /home/ashutosh/pg/REL_15_STABLE/contrib/citext/expected/citext_utf8.out	2022-07-14 17:45:31.747259743 +0300
+++ /home/ashutosh/pg/REL_15_STABLE/contrib/citext/results/citext_utf8.out	2022-10-21 19:43:21.146044062 +0300

@@ -54,7 +54,7 @@
 SELECT 'i'::citext = 'İ'::citext AS t;
  t
 ---
- t
+ f
 (1 row)

The reason is that in ICU lowercasing Unicode symbol "İ" (U+0130
"LATIN CAPITAL LETTER I WITH DOT ABOVE") can give two valid results:
- "i", i.e. "U+0069 LATIN SMALL LETTER I" in "tr" and "az" locales.
- "i̇", i.e. "U+0069 LATIN SMALL LETTER I" followed by "U+0307 COMBINING
  DOT ABOVE" in all other locales I've tried (including "en-US", "de",
  "ru", etc).
And the way this test is currently written only accepts plain Latin "i",
which might be true in glibc, but is not so in ICU. Verified on ICU
70.1, but I've seen this on a few other ICU versions as well, so I think
this is probably an ICU feature, not a bug(?).
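
A quick way to see the difference, assuming the standard *-x-icu collations
created by initdb on an ICU-enabled build:

  SELECT lower('İ' COLLATE "tr-x-icu");  -- 'i'  (U+0069)
  SELECT lower('İ' COLLATE "en-x-icu");  -- 'i̇' (U+0069 U+0307)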


Since we probably want installcheck in contrib/citext to pass on
databases with various locales, including reasonable ICU-based ones,
I suggest to fix this test by accepting either of outputs as valid.

I can see two ways of doing that:
1. change SQL in the test to use "IN" instead of "=";
2. add an alternative output.

I think in this case "IN" is better, because that allows a single 
comment to address both possible outputs and to avoid unnecessary 
duplication.


I've attached a patch authored mostly by my colleague, Roman Zharkov, as 
one possible fix.


Only versions 15+ are affected.

Any comments?

--
Anton Voloshin
Postgres Professional, The Russian Postgres Company
https://postgrespro.ru
From 5cfbb59a11d099fa9b8703502fd8aac936a02c4d Mon Sep 17 00:00:00 2001
From: Roman Zharkov 
Date: Fri, 21 Oct 2022 19:56:19 +0300
Subject: [PATCH] Fix citext_utf8 test's "Turkish I" with ICU collation
 provider
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

With the ICU collation provider the Turkish unicode symbol "İ" (U+0130
"LATIN CAPITAL LETTER I WITH DOT ABOVE") has two lowercase variants:
- "i", i.e. "U+0069 LATIN SMALL LETTER I", in "tr" and "az" locales.
- "i̇", i.e. "U+0069 LATIN SMALL LETTER I" followed by "U+0307 COMBINING
  DOT ABOVE" in all other locales I've tried (including "en-US", "de",
  "ru", etc).

So, add both variants to the test.
---
 contrib/citext/expected/citext_utf8.out | 6 --
 contrib/citext/sql/citext_utf8.sql  | 6 --
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/contrib/citext/expected/citext_utf8.out b/contrib/citext/expected/citext_utf8.out
index 666b07ccec4..62f0794028f 100644
--- a/contrib/citext/expected/citext_utf8.out
+++ b/contrib/citext/expected/citext_utf8.out
@@ -48,10 +48,12 @@ SELECT 'Ä'::citext <> 'Ä'::citext AS t;
  t
 (1 row)
 
--- Test the Turkish dotted I. The lowercase is a single byte while the
+-- Test the Turkish dotted I. The lowercase might be a single byte while the
 -- uppercase is multibyte. This is why the comparison code can't be optimized
 -- to compare string lengths.
-SELECT 'i'::citext = 'İ'::citext AS t;
+-- Note that lower('İ') is 'i' (U+0069) in tr and az locales,
+-- but 'i̇' (U+0069 U+0307) in C and most (all?) other locales.
+SELECT 'İ'::citext in ('i'::citext, 'i̇'::citext) AS t;
  t 
 ---
  t
diff --git a/contrib/citext/sql/citext_utf8.sql b/contrib/citext/sql/citext_utf8.sql
index d068000b423..942daa9ce50 100644
--- a/contrib/citext/sql/citext_utf8.sql
+++ b/contrib/citext/sql/citext_utf8.sql
@@ -24,10 +24,12 @@ SELECT 'À'::citext <> 'B'::citext AS t;
 SELECT 'Ä'::text   <> 'Ä'::text   AS t;
 SELECT 'Ä'::citext <> 'Ä'::citext AS t;
 
--- Test the Turkish dotted I. The lowercase is a single byte while the
+-- Test the Turkish dotted I. The lowercase might be a single byte while the
 -- uppercase is multibyte. This is why the comparison code can't be optimized
 -- to compare string lengths.
-SELECT 'i'::citext = 'İ'::citext AS t;
+-- Note that lower('İ') is 'i' (U+0069) in tr and az locales,
+-- but 'i̇' (U+0069 U+0307) in C and most (all?) other locales.
+SELECT 'İ'::citext in ('i'::citext, 'i̇'::citext) AS t;
 
 -- Regression.
 SELECT 'láska'::citext <> 'laská'::citext AS t;
-- 
2.38.1



Re: Missing update of all_hasnulls in BRIN opclasses

2022-10-21 Thread Tomas Vondra



On 10/21/22 17:50, Matthias van de Meent wrote:
> On Fri, 21 Oct 2022 at 17:24, Tomas Vondra
>  wrote:
>>
>> Hi,
>>
>> While working on some BRIN code, I discovered a bug in handling NULL
>> values - when inserting a non-NULL value into a NULL-only range, we
>> reset the all_nulls flag but don't update the has_nulls flag. And
>> because of that we then fail to return the range for IS NULL ranges.
> 
> Ah, that's bad.
> 

Yeah, I guess we'll need to inform the users to consider rebuilding BRIN
indexes on NULL-able columns.

> One question though: doesn't (shouldn't?) column->bv_allnulls already
> imply column->bv_hasnulls? The column has nulls if all of the values
> are null, right? Or is the description of the field deceptive, and
> does bv_hasnulls actually mean "has nulls bitmap"?
> 

What null bitmap do you mean? We're talking about the summary for a page
range - IIRC we translate this to a null bitmap for the BRIN tuple, but there
may be multiple columns, and "has nulls bitmap" is an aggregate over all
of them.

Yeah, maybe it'd make sense to also have has_nulls=true whenever
all_nulls=true, and maybe it'd be simpler because it'd be enough to
check just one flag in consistent function etc. But we still need to
track 2 different states - "has nulls" and "has summary".

In any case, this ship sailed long ago - at least for the existing
opclasses.


>> Attached is a patch fixing this by properly updating the has_nulls flag.
> 
> One comment on the patch:
> 
>> +SET enable_seqscan = off;
>> + [...]
>> +SET enable_seqscan = off;
> 
> Looks like duplicated SETs. Should that last one be RESET instead?
> 

Yeah, should have been RESET.

> Apart from that, this patch looks good.
> 

Thanks!

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: thinko in basic_archive.c

2022-10-21 Thread Bharath Rupireddy
On Fri, Oct 21, 2022 at 10:43 AM Kyotaro Horiguchi
 wrote:
>
> Thanks, but we don't need to wipe out all the bytes. Just putting \0
> at the beginning of the buffer is sufficient.

Nah, that's not a clean way IMO.

> And the Memset() at the
> beginning of basic_archive_file_internal is not needed since those
> static variables are initially initialized to zeros.

Removed. MemSet() after durable_rename() would be sufficient.

> This is not necessarily needed, but it might be better if we empty
> tempfilepath after unlinking the file.

I think it's not necessary as the archiver itself is shutting down and
I don't think the server calls the shutdown callback twice. However,
if we want basic_archive_shutdown() to be more protective against
multiple calls (for any reason that we're missing), we can have a
static local variable to quickly exit if the callback has already been
called, instead of the MemSet(); but that's not needed, I guess.

> +  expectation that a value will soon be provided. Care must be taken when
> +  multiple servers are archiving to the same
> +  basic_archive.archive_library directory as they all
> +  might try to archive the same WAL file.
>
> I don't understand what kind of care should be taken by reading this..

It's just a notice; however, I agree with you that it may be confusing.
I've removed it.

Please review the attached v4 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v4-0001-Remove-leftover-temporary-files-via-basic_archive.patch
Description: Binary data


Re: Missing update of all_hasnulls in BRIN opclasses

2022-10-21 Thread Matthias van de Meent
On Fri, 21 Oct 2022 at 17:24, Tomas Vondra
 wrote:
>
> Hi,
>
> While working on some BRIN code, I discovered a bug in handling NULL
> values - when inserting a non-NULL value into a NULL-only range, we
> reset the all_nulls flag but don't update the has_nulls flag. And
> because of that we then fail to return the range for IS NULL ranges.

Ah, that's bad.

One question though: doesn't (shouldn't?) column->bv_allnulls already
imply column->bv_hasnulls? The column has nulls if all of the values
are null, right? Or is the description of the field deceptive, and
does bv_hasnulls actually mean "has nulls bitmap"?

> Attached is a patch fixing this by properly updating the has_nulls flag.

One comment on the patch:

> +SET enable_seqscan = off;
> + [...]
> +SET enable_seqscan = off;

Looks like duplicated SETs. Should that last one be RESET instead?

Apart from that, this patch looks good.

- Matthias




Re: Avoid memory leaks during base backups

2022-10-21 Thread Bharath Rupireddy
On Fri, Oct 21, 2022 at 11:58 AM Michael Paquier  wrote:
>
> To be exact, it seems to me that tablespace_map and backup_state
> should be reset before deleting backupcontext, but the reset of
> backupcontext should happen after the fact.
>
> +   backup_state = NULL;
> tablespace_map = NULL;
> These two in pg_backup_start() don't matter, do they?  They are
> reallocated a couple of lines down.

After all, that is what is being discussed here; what if palloc down
below fails and they're not reset to NULL after MemoryContextReset()?

> +* across. We keep the memory allocated in this memory context less,
> What does "We keep the memory allocated in this memory context less"
> mean here?

We try to keep it less because we don't want to allocate more memory
and leak it unless pg_backup_start() is called again. Please read the
description. I'll leave it to the committer's discretion whether to
have that part or remove it.

On Fri, Oct 21, 2022 at 12:11 PM Kyotaro Horiguchi
 wrote:
>
>
> +* across. We keep the memory allocated in this memory context less,
> +* because any error before reaching pg_backup_stop() can leak the 
> memory
> +* until pg_backup_start() is called again. While this is not smart, 
> it
> +* helps to keep things simple.
>
> I think the "less" is somewhat obscure.  I feel we should be more
> explicit. And we don't need to put emphasis on "leak".  I recklessly
> propose this as the draft.

I tried to put it simply; please see the attached v10. I'll leave it
to the committer's discretion for better wording.

On Fri, Oct 21, 2022 at 7:47 PM Robert Haas  wrote:
>
> On Fri, Oct 21, 2022 at 2:18 AM Michael Paquier  wrote:
> > AFAIK, one of the callbacks associated to a memory context could
> > fail, see comments before MemoryContextCallResetCallbacks() in
> > MemoryContextDelete().  I agree that it should not matter here, but I
> > think that it is better to reset the pointers before attempting the
> > deletion of the memory context in this case.
>
> I think this is nitpicking. There's no real danger here, and if there
> were, the error handling would have to take it into account somehow,
> which it doesn't.
>
> I'd probably do it before resetting the context as a matter of style,
> to make it clear that there's no window in which the pointers are set
> but referencing invalid memory. But I do not think it makes any
> practical difference at all.

Please see the attached v10.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 3d8cf10d0bab48afee639359669b69c2901afd34 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Fri, 21 Oct 2022 15:04:36 +
Subject: [PATCH v10] Avoid memory leaks during backups using SQL-callable
 functions

Any failure while taking backups using SQL-callable functions can
leak the memory allocated for backup_state and tablespace_map, as
they both need to be long-lived, they are being created in
TopMemoryContext.

To fix the memory leak problem, we create a special session-level
memory context as a direct child of TopMemoryContext so that the
memory allocated is carried across. We delete the memory context at
the end of pg_backup_stop(). We keep the memory allocated in this
memory context less, because any error before reaching
pg_backup_stop() can still leak the memory until pg_backup_start()
is called again. While this is not smart, it helps to keep things
simple.

Author: Bharath Rupireddy
Reviewed-by: Robert Haas, Alvaro Herrera
Reviewed-by: Cary Huang, Michael Paquier
Discussion: https://www.postgresql.org/message-id/CALj2ACXqvfKF2B0beQ=aJMdWnpNohmBPsRg=EDQj_6y1t2O8mQ@mail.gmail.com
---
 src/backend/access/transam/xlogfuncs.c | 42 ++
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index a801a94fe8..e9ec86316b 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -45,6 +45,9 @@
 static BackupState *backup_state = NULL;
 static StringInfo tablespace_map = NULL;
 
+/* A long-lived workspace for SQL-callable backup functions. */
+static MemoryContext backupcontext = NULL;
+
 /*
  * pg_backup_start: set up for taking an on-line backup dump
  *
@@ -72,27 +75,27 @@ pg_backup_start(PG_FUNCTION_ARGS)
 
 	/*
 	 * backup_state and tablespace_map need to be long-lived as they are used
-	 * in pg_backup_stop().
+	 * in pg_backup_stop(). We create a special session-level memory context as
+	 * a direct child of TopMemoryContext so that the memory allocated is
+	 * carried across. We typically delete the memory context at the end of
+	 * pg_backup_stop(), however, an error before it can leak the memory until
+	 * pg_backup_start() is called again. Hence, we try to keep the memory
+	 * allocated in this memory context less. While this is not smart, it helps
+	 * to keep things simple.
 	 */
-	

Missing update of all_hasnulls in BRIN opclasses

2022-10-21 Thread Tomas Vondra
Hi,

While working on some BRIN code, I discovered a bug in handling NULL
values - when inserting a non-NULL value into a NULL-only range, we
reset the all_nulls flag but don't update the has_nulls flag. And
because of that we then fail to return the range for IS NULL ranges.

Reproducing this is trivial:

  create table t (a int);
  create index on t using brin (a);
  insert into t values (null);
  insert into t values (1);

  set enable_seqscan  = off;
  select * from t where a is null;

This should return 1 row, but actually it returns no rows.

Attached is a patch fixing this by properly updating the has_nulls flag.

I reproduced this all the way back to 9.5, so it's a long-standing bug.
It's interesting that no one noticed / reported it so far; it doesn't seem
like a particularly rare corner case.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/access/brin/brin_bloom.c b/src/backend/access/brin/brin_bloom.c
index 6b0af7267d5..60315450b41 100644
--- a/src/backend/access/brin/brin_bloom.c
+++ b/src/backend/access/brin/brin_bloom.c
@@ -539,6 +539,7 @@ brin_bloom_add_value(PG_FUNCTION_ARGS)
 			BloomGetFalsePositiveRate(opts));
 		column->bv_values[0] = PointerGetDatum(filter);
 		column->bv_allnulls = false;
+		column->bv_hasnulls = true;
 		updated = true;
 	}
 	else
diff --git a/src/backend/access/brin/brin_inclusion.c b/src/backend/access/brin/brin_inclusion.c
index 4b02d374f23..e0f44d3e62c 100644
--- a/src/backend/access/brin/brin_inclusion.c
+++ b/src/backend/access/brin/brin_inclusion.c
@@ -164,6 +164,7 @@ brin_inclusion_add_value(PG_FUNCTION_ARGS)
 		column->bv_values[INCLUSION_UNMERGEABLE] = BoolGetDatum(false);
 		column->bv_values[INCLUSION_CONTAINS_EMPTY] = BoolGetDatum(false);
 		column->bv_allnulls = false;
+		column->bv_hasnulls = true;
 		new = true;
 	}
 
diff --git a/src/backend/access/brin/brin_minmax.c b/src/backend/access/brin/brin_minmax.c
index 9e8a8e056cc..8a5661a8952 100644
--- a/src/backend/access/brin/brin_minmax.c
+++ b/src/backend/access/brin/brin_minmax.c
@@ -90,6 +90,7 @@ brin_minmax_add_value(PG_FUNCTION_ARGS)
 		column->bv_values[0] = datumCopy(newval, attr->attbyval, attr->attlen);
 		column->bv_values[1] = datumCopy(newval, attr->attbyval, attr->attlen);
 		column->bv_allnulls = false;
+		column->bv_hasnulls = true;
 		PG_RETURN_BOOL(true);
 	}
 
diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index 9a0bcf6698d..4e7119e2d78 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -2500,6 +2500,7 @@ brin_minmax_multi_add_value(PG_FUNCTION_ARGS)
 		MemoryContextSwitchTo(oldctx);
 
 		column->bv_allnulls = false;
+		column->bv_hasnulls = true;
 		modified = true;
 
 		column->bv_mem_value = PointerGetDatum(ranges);
diff --git a/src/test/regress/expected/brin.out b/src/test/regress/expected/brin.out
index 73fa38396e4..cc896c2d9d4 100644
--- a/src/test/regress/expected/brin.out
+++ b/src/test/regress/expected/brin.out
@@ -572,3 +572,39 @@ CREATE UNLOGGED TABLE brintest_unlogged (n numrange);
 CREATE INDEX brinidx_unlogged ON brintest_unlogged USING brin (n);
 INSERT INTO brintest_unlogged VALUES (numrange(0, 2^1000::numeric));
 DROP TABLE brintest_unlogged;
+-- test that we properly update has_nulls when inserting something into
+-- a range that only had NULLs before
+CREATE TABLE brintest_4 (a INT, b INT, c INT, d INET);
+CREATE INDEX brintest_4_idx ON brintest_4 USING brin (a, b int4_minmax_multi_ops, c int4_bloom_ops, d inet_inclusion_ops);
+-- insert a NULL value, so that we get an all-nulls range
+INSERT INTO brintest_4 VALUES (NULL, NULL, NULL, NULL);
+-- now insert a non-NULL value
+INSERT INTO brintest_4 VALUES (1, 1, 1, '127.0.0.1');
+-- see that we can still match the value when using the brin index
+SET enable_seqscan = off;
+SELECT * FROM brintest_4 WHERE a IS NULL;
+ a | b | c | d 
+---+---+---+---
+   |   |   | 
+(1 row)
+
+SELECT * FROM brintest_4 WHERE b IS NULL;
+ a | b | c | d 
+---+---+---+---
+   |   |   | 
+(1 row)
+
+SELECT * FROM brintest_4 WHERE c IS NULL;
+ a | b | c | d 
+---+---+---+---
+   |   |   | 
+(1 row)
+
+SELECT * FROM brintest_4 WHERE d IS NULL;
+ a | b | c | d 
+---+---+---+---
+   |   |   | 
+(1 row)
+
+DROP TABLE brintest_4;
+SET enable_seqscan = off;
diff --git a/src/test/regress/sql/brin.sql b/src/test/regress/sql/brin.sql
index e68e9e18df5..17a01a4b82f 100644
--- a/src/test/regress/sql/brin.sql
+++ b/src/test/regress/sql/brin.sql
@@ -515,3 +515,24 @@ CREATE UNLOGGED TABLE brintest_unlogged (n numrange);
 CREATE INDEX brinidx_unlogged ON brintest_unlogged USING brin (n);
 INSERT INTO brintest_unlogged VALUES (numrange(0, 2^1000::numeric));
 DROP TABLE brintest_unlogged;
+
+-- test that we properly update has_nulls when inserting something into
+-- a range that only had NULLs before
+CREATE TABLE brintest_4 (a INT, b INT, c INT, d INET);
+CREATE INDEX 

Re: Crash after a call to pg_backup_start()

2022-10-21 Thread Bharath Rupireddy
On Fri, Oct 21, 2022 at 7:06 PM Michael Paquier  wrote:
>
> On Fri, Oct 21, 2022 at 05:53:25PM +0800, Richard Guo wrote:
> > Yeah, the two conditions could be both false. How about we update the
> > comment a bit to emphasize this? Such as
> >
> > /* At most one of these conditions can be true */
> > or
> > /* These conditions can not be both true */
>
> If you do that, it would be a bit easier to read as the following
> assertion instead?  Like:
> Assert(!during_backup_start ||
>sessionBackupState == SESSION_BACKUP_NONE);
>
> Please note that I have not checked in details all the interactions
> behind register_persistent_abort_backup_handler() before entering in
> do_pg_backup_start() and the ERROR_CLEANUP block used in this
> routine (just a matter of some elog(ERROR)s put here and there, for
> example).  Anyway, yes, both conditions can be false, and that's easy
> to reproduce:
> 1) Do pg_backup_start().
> 2) Do pg_backup_stop().
> 3) Stop the session to kick do_pg_abort_backup()
> 4) Assert()-boom.

I'm wondering if we need the assertion at all. We know that when the
arg is true or the sessionBackupState is SESSION_BACKUP_RUNNING, the
runningBackups would've been incremented and we can just go ahead and
decrement it, like the attached patch. This is a cleaner approach IMO
unless I'm missing something here.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From b147720ca68a120efdf8f20c58cb3499901e6d61 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Fri, 21 Oct 2022 14:29:27 +
Subject: [PATCH v1] Fix assertion failure in do_pg_abort_backup()

---
 src/backend/access/transam/xlog.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dea978a962..104309c56f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8841,11 +8841,12 @@ do_pg_abort_backup(int code, Datum arg)
 {
 	bool		during_backup_start = DatumGetBool(arg);
 
-	/* Only one of these conditions can be true */
-	Assert(during_backup_start ^
-		   (sessionBackupState == SESSION_BACKUP_RUNNING));
-
-	if (during_backup_start || sessionBackupState != SESSION_BACKUP_NONE)
+	/*
+	 * When either of these is true, we know that runningBackups has
+	 * already been incremented, hence decrement it.
+	 */
+	if (during_backup_start ||
+		sessionBackupState == SESSION_BACKUP_RUNNING)
 	{
 		WALInsertLockAcquireExclusive();
 		Assert(XLogCtl->Insert.runningBackups > 0);
-- 
2.34.1



Re: ICU for global collation

2022-10-21 Thread Marina Polyakova

Hello!

I discovered an interesting behaviour during installcheck runs when the 
cluster was initialized with ICU locale provider:


$ initdb --locale-provider icu --icu-locale en-US -D data &&
pg_ctl -D data -l logfile start

1) The ECPG tests fail because they use the SQL_ASCII encoding [1]: the
database template0 uses the ICU locale provider, and SQL_ASCII is not
supported by ICU:


$ make -C src/interfaces/ecpg/ installcheck
...
== creating database "ecpg1_regression"   ==
ERROR:  encoding "SQL_ASCII" is not supported with ICU provider
ERROR:  database "ecpg1_regression" does not exist
command failed: "/home/marina/postgresql/master/my/inst/bin/psql" -X -c 
"CREATE DATABASE \"ecpg1_regression\" TEMPLATE=template0 
ENCODING='SQL_ASCII'" -c "ALTER DATABASE \"ecpg1_regression\" SET 
lc_messages TO 'C';ALTER DATABASE \"ecpg1_regression\" SET lc_monetary 
TO 'C';ALTER DATABASE \"ecpg1_regression\" SET lc_numeric TO 'C';ALTER 
DATABASE \"ecpg1_regression\" SET lc_time TO 'C';ALTER DATABASE 
\"ecpg1_regression\" SET bytea_output TO 'hex';ALTER DATABASE 
\"ecpg1_regression\" SET timezone_abbreviations TO 'Default';" 
"postgres"


2) The option --no-locale in pg_regress is described as "use C locale" 
[2]. But in this case the created databases actually use the ICU locale 
provider with the ICU cluster locale from template0 (see 
diff_check_backend_used_provider.patch):


$ make NO_LOCALE=1 installcheck

In regression.diffs:

diff -U3 /home/marina/postgresql/master/src/test/regress/expected/test_setup.out /home/marina/postgresql/master/src/test/regress/results/test_setup.out
--- /home/marina/postgresql/master/src/test/regress/expected/test_setup.out	2022-09-27 05:31:27.674628815 +0300
+++ /home/marina/postgresql/master/src/test/regress/results/test_setup.out	2022-10-21 15:09:31.232992885 +0300

@@ -143,6 +143,798 @@
 \set filename :abs_srcdir '/data/person.data'
 COPY person FROM :'filename';
 VACUUM ANALYZE person;
+NOTICE:  varstrfastcmp_locale sss->collate_c 0 sss->locale 0xefacd0
+NOTICE:  varstrfastcmp_locale sss->locale->provider i
+NOTICE:  varstrfastcmp_locale sss->locale->info.icu.locale en-US
...
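
For what it's worth, the locale provider the created databases actually got
can also be checked directly via the PG15 catalog columns:

  SELECT datname, datlocprovider, daticulocale FROM pg_database;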

The patch diff_fix_pg_regress_create_database.patch fixes both issues 
for me.


[1] 
https://github.com/postgres/postgres/blob/ce20f8b9f4354b46b40fd6ebf7ce5c37d08747e0/src/interfaces/ecpg/test/Makefile#L18
[2] 
https://github.com/postgres/postgres/blob/ce20f8b9f4354b46b40fd6ebf7ce5c37d08747e0/src/test/regress/pg_regress.c#L1992


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/src/backend/access/hash/hashfunc.c b/src/backend/access/hash/hashfunc.c
index b57ed946c42bb54ede800e95045aa937a8dbad85..b3c0f6f753f8428274389844ccf9778a7ed47ea4 100644
--- a/src/backend/access/hash/hashfunc.c
+++ b/src/backend/access/hash/hashfunc.c
@@ -281,6 +281,14 @@ hashtext(PG_FUNCTION_ARGS)
 	if (!lc_collate_is_c(collid))
 		mylocale = pg_newlocale_from_collation(collid);
 
+	elog(NOTICE, "hashtext lc_collate_is_c(collid) %d mylocale %p", lc_collate_is_c(collid), mylocale);
+	if (mylocale)
+	{
+		elog(NOTICE, "hashtext mylocale->provider %c", mylocale->provider);
+		if (mylocale->provider == COLLPROVIDER_ICU)
+			elog(NOTICE, "hashtext mylocale->info.icu.locale %s", mylocale->info.icu.locale ? mylocale->info.icu.locale : "(null)");
+	}
+
 	if (!mylocale || mylocale->deterministic)
 	{
 		result = hash_any((unsigned char *) VARDATA_ANY(key),
@@ -337,6 +345,14 @@ hashtextextended(PG_FUNCTION_ARGS)
 	if (!lc_collate_is_c(collid))
 		mylocale = pg_newlocale_from_collation(collid);
 
+	elog(NOTICE, "hashtextextended lc_collate_is_c(collid) %d mylocale %p", lc_collate_is_c(collid), mylocale);
+	if (mylocale)
+	{
+		elog(NOTICE, "hashtextextended mylocale->provider %c", mylocale->provider);
+		if (mylocale->provider == COLLPROVIDER_ICU)
+			elog(NOTICE, "hashtextextended mylocale->info.icu.locale %s", mylocale->info.icu.locale ? mylocale->info.icu.locale : "(null)");
+	}
+
 	if (!mylocale || mylocale->deterministic)
 	{
 		result = hash_any_extended((unsigned char *) VARDATA_ANY(key),
diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index 02d462a659778016f3c4479d425ba0a84feb6e26..9627c84a7ccfb4c4013556a51c989e9e6d611634 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -243,6 +243,8 @@ pg_set_regex_collation(Oid collation)
  errhint("Use the COLLATE clause to set the collation explicitly.")));
 	}
 
+	elog(NOTICE, "pg_set_regex_collation lc_ctype_is_c(collid) %d", lc_ctype_is_c(collation));
+
 	if (lc_ctype_is_c(collation))
 	{
 		/* C/POSIX collations use this path regardless of database encoding */
@@ -259,6 +261,14 @@ pg_set_regex_collation(Oid collation)
 		 */
 		pg_regex_locale = pg_newlocale_from_collation(collation);
 
+		elog(NOTICE, "pg_set_regex_collation pg_regex_locale %p", pg_regex_locale);
+		if (pg_regex_locale)
+		{
+			elog(NOTICE, 

Re: Avoid memory leaks during base backups

2022-10-21 Thread Robert Haas
On Fri, Oct 21, 2022 at 2:18 AM Michael Paquier  wrote:
> AFAIK, one of the callbacks associated to a memory context could
> fail, see comments before MemoryContextCallResetCallbacks() in
> MemoryContextDelete().  I agree that it should not matter here, but I
> think that it is better to reset the pointers before attempting the
> deletion of the memory context in this case.

I think this is nitpicking. There's no real danger here, and if there
were, the error handling would have to take it into account somehow,
which it doesn't.

I'd probably do it before resetting the context as a matter of style,
to make it clear that there's no window in which the pointers are set
but referencing invalid memory. But I do not think it makes any
practical difference at all.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: parse partition strategy string in gram.y

2022-10-21 Thread Japin Li


On Fri, 21 Oct 2022 at 20:34, Justin Pryzby  wrote:
> On Fri, Oct 21, 2022 at 06:22:44PM +0800, Japin Li wrote:
>> Is there any way to get the regression tests diffs from Cirrus CI?
>> I did not find the diffs in [1].
>> 
>> [1] https://cirrus-ci.com/build/4721735111540736
>
> They're called "main".
> I'm planning on submitting a patch to rename it to "regress", someday.
> See also: 
> https://www.postgresql.org/message-id/20221001161420.GF6256%40telsasoft.com

Oh, thank you very much!  I find it in testrun/build/testrun/main/regress [1].

[1] 
https://api.cirrus-ci.com/v1/artifact/task/6215926717612032/testrun/build/testrun/main/regress/regression.diffs

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Crash after a call to pg_backup_start()

2022-10-21 Thread Michael Paquier
On Fri, Oct 21, 2022 at 05:53:25PM +0800, Richard Guo wrote:
> Yeah, the two conditions could be both false. How about we update the
> comment a bit to emphasize this? Such as
> 
> /* At most one of these conditions can be true */
> or
> /* These conditions can not be both true */

If you do that, it would be a bit easier to read as the following
assertion instead?  Like:
Assert(!during_backup_start ||
   sessionBackupState == SESSION_BACKUP_NONE);

Please note that I have not checked in details all the interactions
behind register_persistent_abort_backup_handler() before entering in
do_pg_backup_start() and the ERROR_CLEANUP block used in this
routine (just a matter of some elog(ERROR)s put here and there, for
example).  Anyway, yes, both conditions can be false, and that's easy
to reproduce:
1) Do pg_backup_start().
2) Do pg_backup_stop().
3) Stop the session to kick do_pg_abort_backup()
4) Assert()-boom.
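
In SQL terms, a minimal transcript of those steps (run in one session, then
disconnect):

  SELECT pg_backup_start('test');
  SELECT pg_backup_stop();
  -- closing the session fires do_pg_abort_backup() with
  -- during_backup_start = false and sessionBackupState = SESSION_BACKUP_NONE,
  -- so the XOR assertion trips.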
--
Michael


signature.asc
Description: PGP signature


Re: cross-platform pg_basebackup

2022-10-21 Thread Robert Haas
On Fri, Oct 21, 2022 at 4:14 AM davinder singh
 wrote:
> Hi,
> Patch v2 looks good to me, I have tested it, and pg_basebackup works fine 
> across the platforms (Windows to Linux and Linux to Windows).
> Syntax used for testing
> $ pg_basebackup -h remote_server_ip -p 5432 -U user_name -D backup/data -T 
> olddir=newdir
>
> I have also tested with non-absolute paths, it behaves as expected.

Cool. Thanks to you, Andrew, and Tom for reviewing.

Committed and back-patched to all supported branches.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Pluggable toaster

2022-10-21 Thread Aleksander Alekseev
Hi Nikita,

> Here's rebased patch:
> v23-0001-toaster-interface.patch - Pluggable TOAST API interface with 
> reference (original) TOAST mechanics - new API is introduced but
> reference TOAST is still left unchanged;
> v23-0002-toaster-default.patch - Default TOAST mechanics is re-implemented 
> using TOAST API and is plugged in as Default Toaster;
> v23-0003-toaster-docs.patch - Pluggable TOAST API documentation package

Thanks for keeping the patch up to date.

As I recall, one of the open questions was: how is this feature
supposed to work with table access methods? Could you please summarize
what the current consensus is in this respect?

-- 
Best regards,
Aleksander Alekseev




Re: parse partition strategy string in gram.y

2022-10-21 Thread Justin Pryzby
On Fri, Oct 21, 2022 at 06:22:44PM +0800, Japin Li wrote:
> Is there any way to get the regression tests diffs from Cirrus CI?
> I did not find the diffs in [1].
> 
> [1] https://cirrus-ci.com/build/4721735111540736

They're called "main".
I'm planning on submitting a patch to rename it to "regress", someday.
See also: 
https://www.postgresql.org/message-id/20221001161420.GF6256%40telsasoft.com

-- 
Justin




Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-10-21 Thread Önder Kalacı
Hi Shi yu, all


> In execReplication.c:
>
> +   TypeCacheEntry **eq = NULL; /* only used when the index is not
> unique */
>
> Maybe the comment here should be changed. Now it is used when the index is
> not a primary key or replica identity index.
>
>
makes sense, updated


> 2.
> +# wait until the index is created
> +$node_subscriber->poll_query_until(
> +   'postgres', q{select count(*)=1 from pg_stat_all_indexes where
> indexrelname = 'test_replica_id_full_idx';}
> +) or die "Timed out while waiting for check subscriber tap_sub_rep_full
> updates one row via index";
>
> The message doesn't seem right,  should it be changed to "Timed out while
> waiting for creating index test_replica_id_full_idx"?
>

yes, updated


>
> 3.
> +# now, ingest more data and create index on column y which has higher
> cardinality
> +# then create an index on column y so that future commands uses the index
> on column
> +$node_publisher->safe_psql('postgres',
> +   "INSERT INTO test_replica_id_full SELECT 50, i FROM
> generate_series(0,3100)i;");
>
> The comment say "create (an) index on column y" twice, maybe it can be
> changed
> to:
>
> now, ingest more data and create index on column y which has higher
> cardinality,
> so that future commands will use the index on column y
>
>
fixed


> 4.
> +# deletes 200 rows
> +$node_publisher->safe_psql('postgres',
> +   "DELETE FROM test_replica_id_full WHERE x IN (5, 6);");
> +
> +# wait until the index is used on the subscriber
> +$node_subscriber->poll_query_until(
> +   'postgres', q{select (idx_scan = 200) from pg_stat_all_indexes
> where indexrelname = 'test_replica_id_full_idx';}
> +) or die "Timed out while waiting for check subscriber tap_sub_rep_full
> updates 200 rows via index";
>
> It would be better to call wait_for_catchup() after DELETE. (And some other
> places in this file.)
>

Hmm, I cannot follow this easily.

Why do you think wait_for_catchup() should be called? In general, I tried
to follow a pattern where we call poll_query_until() so that we are sure
that all the changes are replicated via the index. And then, an
additional check with `is($result, ..` such that we also verify the
correctness of the data.

One alternative could be to use wait_for_catchup() and then have multiple
`is($result, ..` to check both pg_stat_all_indexes and the correctness of
the data.

One minor advantage I see with the current approach is that every
`is($result, ..` adds one step to the test. So, if I use  `is($result, ..`
for pg_stat_all_indexes queries, then I'd be adding multiple steps for a
single test. It felt it is more natural/common to test roughly once with
`is($result, ..` on each test. Or, at least do not add additional ones for
pg_stat_all_indexes checks.



> Besides, the "updates" in the message should be "deletes".
>
>
fixed


> 5.
> +# wait until the index is used on the subscriber
> +$node_subscriber->poll_query_until(
> +   'postgres', q{select sum(idx_scan)=10 from pg_stat_all_indexes
> where indexrelname ilike 'users_table_part_%';}
> +) or die "Timed out while waiting for check subscriber tap_sub_rep_full
> updates partitioned table";
>
> Maybe we should say "updates partitioned table with index" in this message.
>
>
Fixed

Attached v20.

Thanks!

Onder KALACI


v20_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-21 Thread Drouvot, Bertrand

Hi,

On 10/21/22 2:58 AM, Michael Paquier wrote:

On Wed, Oct 19, 2022 at 10:45:44AM +0200, Drouvot, Bertrand wrote:

Please find attached v1-0001-regex-handling-for-db-and-roles-in-hba.patch to
implement regexes for databases and roles in hba.

It does also contain new regexes related TAP tests and doc updates.


Thanks for the updated version.  This is really easy to look at now.


It relies on the refactoring made in fc579e11c6 (but changes the
regcomp_auth_token() parameters so that it is now responsible for emitting
the compilation error message (if any), to avoid code duplication in
parse_hba_line() and parse_ident_line() for roles, databases and user name
mapping).


@@ -652,13 +670,18 @@ check_role(const char *role, Oid roleid, List *tokens)
[...]
-   if (!tok->quoted && tok->string[0] == '+')
+   if (!token_has_regexp(tok))
 {
Hmm.  Do we need token_has_regexp() here for all the cases?  We know
that the string can begin with a '+', hence it is no regex.  The same
applies for the case of "all".  The remaining case is the one where
the user name matches exactly the AuthToken string, which should be
last as we want to treat anything beginning with a '/' as a regex.  It
seems like we could do an order like that?  Say:
if (!tok->quoted && tok->string[0] == '+')
 //do
else if (token_is_keyword(tok, "all"))
 //do
else if (token_has_regexp(tok))
 //do regex compilation, handling failures
else if (token_matches(tok, role))
 //do exact match

The same approach with keywords first, regex, and exact match could be
applied as well for the databases?  Perhaps it is just mainly a matter
of taste, 


Yeah, I think it is.


and it depends on how much you want to prioritize the place
of the regex over the rest but that could make the code easier to
understand in the long-run and this is a very sensitive code area, 


And I agree that your proposal tastes better ;-): it is easier to 
understand, so v2 attached has been done that way.
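
Roughly, the reordered loop then takes this shape (a simplified sketch,
not the exact v2 code; error handling is omitted and the regex-matching
call is abbreviated):

static bool
check_role(const char *role, Oid roleid, List *tokens)
{
	ListCell   *cell;

	foreach(cell, tokens)
	{
		AuthToken  *tok = lfirst(cell);

		if (!tok->quoted && tok->string[0] == '+')
		{
			if (is_member(roleid, tok->string + 1))
				return true;
		}
		else if (token_is_keyword(tok, "all"))
			return true;
		else if (token_has_regexp(tok))
		{
			/* stand-in for the actual regex-execution helper */
			if (regexec_auth_token(role, tok, 0, NULL) == REG_OKAY)
				return true;
		}
		else if (token_matches(tok, role))
			return true;
	}
	return false;
}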



Compiling the expressions for the user and database lists one-by-one
in parse_hba_line() as you do is correct.  However there is a gotcha
that you are forgetting here: the memory allocations done for the
regexp compilations are not linked to the memory context where each
line is processed (named hbacxt in load_hba()) and need a separate
cleanup. 


Oops, right, thanks for pointing that out!


In the same fashion as load_ident(), it seems to me that we
need two extra things for this patch:
- if !ok (see where we do MemoryContextDelete(hbacxt)), we need to go
through new_parsed_lines and free for each line the AuthTokens for the
database and user lists.
- if ok and new_parsed_lines != NIL, the same cleanup needs to
happen.


Right, but I think that should be "parsed_hba_lines != NIL".


My guess is that you could do both the same way as load_ident() does,
keeping some symmetry between the two code paths. 


Right. To avoid code duplication in the !ok/ok cases, the function 
free_hba_line() has been added in v2: it goes through the lists of 
database and role tokens and calls free_auth_token() for each of them.
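
Conceptually it looks something like this (a sketch only; the actual
list shapes in HbaLine may differ slightly):

/* Release the AuthTokens (and compiled regexes) held by one HBA line. */
static void
free_hba_line(HbaLine *line)
{
	ListCell   *cell;

	foreach(cell, line->roles)
		free_auth_token(lfirst(cell));

	foreach(cell, line->databases)
		free_auth_token(lfirst(cell));
}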



Unifying both into
a common routine would be sort of annoying as HbaLines uses lists
within the lists of parsed lines, and IdentLine can have one at most
in each line.


I agree, and v2 is not attempting to unify them.


For now, I have made your last patch a bit shorter by applying the
refactoring of regcomp_auth_token() separately with a few tweaks to
the comments.


Thanks! v2 attached does apply on top of that.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index c6f1b70fd3..30753003ba 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -234,10 +234,13 @@ hostnogssenc  database  
userPostgreSQL database.
-   Multiple database names can be supplied by separating them with
-   commas.  A separate file containing database names can be specified by
-   preceding the file name with @.
+   a specific PostgreSQL database or a regular
+   expression preceded by /.
+   Multiple database names and/or regular expressions preceded by 
/
+   can be supplied by separating them with commas.
+   A separate file containing database names and/or regular expressions 
preceded
+   by / can be specified by preceding the file name
+   with @.
   
  
 
@@ -249,18 +252,20 @@ hostnogssenc  database  
userall specifies that it
matches all users.  Otherwise, this is either the name of a specific
-   database user, or a group name preceded by +.
+   database user, a regular expression preceded by /,
+   or a group name preceded by +.
(Recall that there is no real distinction between users and groups
in PostgreSQL; a + mark 
really means
match any of the roles that are directly or indirectly members
 

Re: START_REPLICATION SLOT causing a crash in an assert build

2022-10-21 Thread Masahiko Sawada
On Thu, Oct 20, 2022 at 6:54 AM Andres Freund  wrote:
>
> Hi,
>
> On 2022-10-13 15:57:28 +0900, Masahiko Sawada wrote:
> > I've attached an updated patch. I've added the common function to
> > start pg_recvlogical and wait for it to become active. Please review
> > it.
>
> > +# Start pg_recvlogical process and wait for it to become active.
> > +sub start_pg_recvlogical
> > +{
> > + my ($node, $slot_name, $create_slot) = @_;
> > +
> > + my @cmd = (
> > + 'pg_recvlogical', '-S', "$slot_name", '-d',
> > + $node->connstr('postgres'),
> > + '--start', '--no-loop', '-f', '-');
> > + push @cmd, '--create-slot' if $create_slot;
> > +
> > + # start pg_recvlogical process.
> > + my $pg_recvlogical = IPC::Run::start(@cmd);
> > +
> > + # Wait for the replication slot to become active.
> > + $node->poll_query_until('postgres',
> > + "SELECT EXISTS (SELECT 1 FROM pg_replication_slots WHERE 
> > slot_name = '$slot_name' AND active_pid IS NOT NULL)"
> > + ) or die "slot never became active";
> > +
> > + return $pg_recvlogical;
> > +}
>
> Because you never process the output from pg_recvlogical I think this test
> will fail if the pipe buffer is small (or more changes are made). I think
> either it needs to output to a file, or we need to process the output.

Okay, but how can we test this situation? As far as I tested, if we
don't specify the redirection of pg_recvlogical's output as above,
pg_recvlogical's stdout and stderr are output to the log file. So I
could not reproduce the issue you're concerned about. Which pipe are
you referring to?

>
> A file seems the easier solution in this case, we don't really care about what
> changes are streamed to the client, right?
>
>
> > +$node = PostgreSQL::Test::Cluster->new('test2');
> > +$node->init(allows_streaming => 'logical');
> > +$node->start;
> > +$node->safe_psql('postgres', "CREATE TABLE test(i int)");
>
> Why are we creating a new cluster? Initdbs takes a fair bit of time on some
> platforms, so this seems unnecessary?

Agreed.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: parse partition strategy string in gram.y

2022-10-21 Thread Alvaro Herrera
headerscheck failed; fixed here.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
#error "Operator lives in the wrong universe"
  ("Use of cookies in real-time system development", M. Gleixner, M. Mc Guire)
>From c434020fc07616cdd13017135819083186d33256 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 20 Oct 2022 12:29:18 +0200
Subject: [PATCH v3] have gram.y resolve partition strategy names

---
 src/backend/commands/tablecmds.c  | 35 +--
 src/backend/parser/gram.y | 22 -
 src/backend/partitioning/partbounds.c | 27 -
 src/backend/utils/cache/partcache.c   |  6 +
 src/include/nodes/parsenodes.h| 15 ++--
 src/include/partitioning/partbounds.h |  2 +-
 src/include/utils/partcache.h |  3 ++-
 7 files changed, 54 insertions(+), 56 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 20135ef1b0..e07fd747f7 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -605,9 +605,10 @@ static void RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid,
 			Oid oldRelOid, void *arg);
 static void RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid,
 			 Oid oldrelid, void *arg);
-static PartitionSpec *transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy);
+static PartitionSpec *transformPartitionSpec(Relation rel, PartitionSpec *partspec);
 static void ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partParams, AttrNumber *partattrs,
-  List **partexprs, Oid *partopclass, Oid *partcollation, char strategy);
+  List **partexprs, Oid *partopclass, Oid *partcollation,
+  PartitionStrategy strategy);
 static void CreateInheritance(Relation child_rel, Relation parent_rel);
 static void RemoveInheritance(Relation child_rel, Relation parent_rel,
 			  bool expect_detached);
@@ -1122,7 +1123,6 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 	if (partitioned)
 	{
 		ParseState *pstate;
-		char		strategy;
 		int			partnatts;
 		AttrNumber	partattrs[PARTITION_MAX_KEYS];
 		Oid			partopclass[PARTITION_MAX_KEYS];
@@ -1147,14 +1147,14 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 		 * and CHECK constraints, we could not have done the transformation
 		 * earlier.
 		 */
-		stmt->partspec = transformPartitionSpec(rel, stmt->partspec,
-);
+		stmt->partspec = transformPartitionSpec(rel, stmt->partspec);
 
 		ComputePartitionAttrs(pstate, rel, stmt->partspec->partParams,
 			  partattrs, , partopclass,
-			  partcollation, strategy);
+			  partcollation, stmt->partspec->strategy);
 
-		StorePartitionKey(rel, strategy, partnatts, partattrs, partexprs,
+		StorePartitionKey(rel, stmt->partspec->strategy, partnatts, partattrs,
+		  partexprs,
 		  partopclass, partcollation);
 
 		/* make it all visible */
@@ -17132,10 +17132,10 @@ RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid,
 /*
  * Transform any expressions present in the partition key
  *
- * Returns a transformed PartitionSpec, as well as the strategy code
+ * Returns a transformed PartitionSpec.
  */
 static PartitionSpec *
-transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy)
+transformPartitionSpec(Relation rel, PartitionSpec *partspec)
 {
 	PartitionSpec *newspec;
 	ParseState *pstate;
@@ -17148,21 +17148,8 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy)
 	newspec->partParams = NIL;
 	newspec->location = partspec->location;
 
-	/* Parse partitioning strategy name */
-	if (pg_strcasecmp(partspec->strategy, "hash") == 0)
-		*strategy = PARTITION_STRATEGY_HASH;
-	else if (pg_strcasecmp(partspec->strategy, "list") == 0)
-		*strategy = PARTITION_STRATEGY_LIST;
-	else if (pg_strcasecmp(partspec->strategy, "range") == 0)
-		*strategy = PARTITION_STRATEGY_RANGE;
-	else
-		ereport(ERROR,
-(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("unrecognized partitioning strategy \"%s\"",
-		partspec->strategy)));
-
 	/* Check valid number of columns for strategy */
-	if (*strategy == PARTITION_STRATEGY_LIST &&
+	if (partspec->strategy == PARTITION_STRATEGY_LIST &&
 		list_length(partspec->partParams) != 1)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
@@ -17208,7 +17195,7 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy)
 static void
 ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partParams, AttrNumber *partattrs,
 	  List **partexprs, Oid *partopclass, Oid *partcollation,
-	  char strategy)
+	  PartitionStrategy strategy)
 {
 	int			attn;
 	ListCell   *lc;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 737bd2d06d..6ca23f88c4 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ 

Re: parse partition strategy string in gram.y

2022-10-21 Thread Alvaro Herrera
On 2022-Oct-21, Japin Li wrote:

> Is there any way to get the regression tests diffs from Cirrus CI?
> I did not find the diffs in [1].

I think they should be somewhere in the artifacts, but I'm not sure.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"La primera ley de las demostraciones en vivo es: no trate de usar el sistema.
Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)




Re: parse partition strategy string in gram.y

2022-10-21 Thread Japin Li


On Fri, 21 Oct 2022 at 18:12, Alvaro Herrera  wrote:
> On 2022-Oct-21, Japin Li wrote:
>
>> Is there an error here from forgetting the LIST partition?
>
> Of course.
> https://cirrus-ci.com/build/4721735111540736
>
> This is what you get for moving cases around at the last minute ...
>

Is there any way to get the regression tests diffs from Cirrus CI?
I did not find the diffs in [1].

[1] https://cirrus-ci.com/build/4721735111540736

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: parse partition strategy string in gram.y

2022-10-21 Thread Alvaro Herrera
On 2022-Oct-21, Japin Li wrote:

> Is there an error here from forgetting the LIST partition?

Of course.
https://cirrus-ci.com/build/4721735111540736

This is what you get for moving cases around at the last minute ...

Fixed, thanks.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
>From dc345f3cb70c335421f29a6867438ed4bb95bd91 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 20 Oct 2022 12:29:18 +0200
Subject: [PATCH v2] have gram.y resolve partition strategy names

---
 src/backend/commands/tablecmds.c  | 35 +--
 src/backend/parser/gram.y | 22 -
 src/backend/partitioning/partbounds.c | 26 
 src/backend/utils/cache/partcache.c   |  6 +
 src/include/nodes/parsenodes.h| 15 ++--
 src/include/partitioning/partbounds.h |  2 +-
 src/include/utils/partcache.h |  2 +-
 7 files changed, 53 insertions(+), 55 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 20135ef1b0..e07fd747f7 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -605,9 +605,10 @@ static void RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid,
 			Oid oldRelOid, void *arg);
 static void RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid,
 			 Oid oldrelid, void *arg);
-static PartitionSpec *transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy);
+static PartitionSpec *transformPartitionSpec(Relation rel, PartitionSpec *partspec);
 static void ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partParams, AttrNumber *partattrs,
-  List **partexprs, Oid *partopclass, Oid *partcollation, char strategy);
+  List **partexprs, Oid *partopclass, Oid *partcollation,
+  PartitionStrategy strategy);
 static void CreateInheritance(Relation child_rel, Relation parent_rel);
 static void RemoveInheritance(Relation child_rel, Relation parent_rel,
 			  bool expect_detached);
@@ -1122,7 +1123,6 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 	if (partitioned)
 	{
 		ParseState *pstate;
-		char		strategy;
 		int			partnatts;
 		AttrNumber	partattrs[PARTITION_MAX_KEYS];
 		Oid			partopclass[PARTITION_MAX_KEYS];
@@ -1147,14 +1147,14 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 		 * and CHECK constraints, we could not have done the transformation
 		 * earlier.
 		 */
-		stmt->partspec = transformPartitionSpec(rel, stmt->partspec,
-);
+		stmt->partspec = transformPartitionSpec(rel, stmt->partspec);
 
 		ComputePartitionAttrs(pstate, rel, stmt->partspec->partParams,
 			  partattrs, , partopclass,
-			  partcollation, strategy);
+			  partcollation, stmt->partspec->strategy);
 
-		StorePartitionKey(rel, strategy, partnatts, partattrs, partexprs,
+		StorePartitionKey(rel, stmt->partspec->strategy, partnatts, partattrs,
+		  partexprs,
 		  partopclass, partcollation);
 
 		/* make it all visible */
@@ -17132,10 +17132,10 @@ RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid,
 /*
  * Transform any expressions present in the partition key
  *
- * Returns a transformed PartitionSpec, as well as the strategy code
+ * Returns a transformed PartitionSpec.
  */
 static PartitionSpec *
-transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy)
+transformPartitionSpec(Relation rel, PartitionSpec *partspec)
 {
 	PartitionSpec *newspec;
 	ParseState *pstate;
@@ -17148,21 +17148,8 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy)
 	newspec->partParams = NIL;
 	newspec->location = partspec->location;
 
-	/* Parse partitioning strategy name */
-	if (pg_strcasecmp(partspec->strategy, "hash") == 0)
-		*strategy = PARTITION_STRATEGY_HASH;
-	else if (pg_strcasecmp(partspec->strategy, "list") == 0)
-		*strategy = PARTITION_STRATEGY_LIST;
-	else if (pg_strcasecmp(partspec->strategy, "range") == 0)
-		*strategy = PARTITION_STRATEGY_RANGE;
-	else
-		ereport(ERROR,
-(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("unrecognized partitioning strategy \"%s\"",
-		partspec->strategy)));
-
 	/* Check valid number of columns for strategy */
-	if (*strategy == PARTITION_STRATEGY_LIST &&
+	if (partspec->strategy == PARTITION_STRATEGY_LIST &&
 		list_length(partspec->partParams) != 1)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
@@ -17208,7 +17195,7 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy)
 static void
 ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partParams, AttrNumber *partattrs,
 	  List **partexprs, Oid *partopclass, Oid *partcollation,
-	  char strategy)
+	  PartitionStrategy strategy)
 {
 	int			attn;
 	ListCell   *lc;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 737bd2d06d..6ca23f88c4 100644

Re: parse partition strategy string in gram.y

2022-10-21 Thread Japin Li


On Fri, 21 Oct 2022 at 17:32, Alvaro Herrera  wrote:
> Hello
>
> I've had this patch sitting in a local branch for way too long.  It's a
> trivial thing but for some reason it bothered me: we let the partition 
> strategy flow into the backend as a string and only parse it into the
> catalog 1-char version quite late.
>
> This patch makes gram.y responsible for parsing it and passing it down
> as a value from a new enum, which looks more neat.  Because it's an
> enum, some "default:" cases can be removed in a couple of places.  I
> also added a new elog() in case the catalog contents becomes broken.

Is there an error here from forgetting the LIST partition?

+/*
+ * Parse a user-supplied partition strategy string into parse node
+ * PartitionStrategy representation, or die trying.
+ */
+static PartitionStrategy
+parsePartitionStrategy(char *strategy)
+{
+   if (pg_strcasecmp(strategy, "range") == 0)   <-- it should be list
+   return PARTITION_STRATEGY_RANGE; <-- 
PARTITION_STRATEGY_LIST
+   else if (pg_strcasecmp(strategy, "hash") == 0)
+   return PARTITION_STRATEGY_HASH;
+   else if (pg_strcasecmp(strategy, "range") == 0)
+   return PARTITION_STRATEGY_RANGE;
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("unrecognized partitioning strategy \"%s\"",
+   strategy)));
+}
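
For clarity, the corrected function implied by the annotations would
read (the trailing return only keeps the compiler quiet, since
ereport(ERROR) does not return):

static PartitionStrategy
parsePartitionStrategy(char *strategy)
{
	if (pg_strcasecmp(strategy, "list") == 0)
		return PARTITION_STRATEGY_LIST;
	else if (pg_strcasecmp(strategy, "hash") == 0)
		return PARTITION_STRATEGY_HASH;
	else if (pg_strcasecmp(strategy, "range") == 0)
		return PARTITION_STRATEGY_RANGE;

	ereport(ERROR,
			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
			 errmsg("unrecognized partitioning strategy \"%s\"",
					strategy)));
	return PARTITION_STRATEGY_LIST;	/* keep compiler quiet */
}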
+


-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Crash after a call to pg_backup_start()

2022-10-21 Thread Richard Guo
On Fri, Oct 21, 2022 at 3:10 PM Kyotaro Horiguchi 
wrote:

> It seems to me that the comment is true and the condition is a thinko.


Yeah, the two conditions could both be false. How about we update the
comment a bit to emphasize this? Such as

/* At most one of these conditions can be true */
or
/* These conditions cannot both be true */


> Please find the attached fix.


+1

Thanks
Richard


Re: Standby recovers records from wrong timeline

2022-10-21 Thread Ants Aasma
On Fri, 21 Oct 2022 at 11:44, Kyotaro Horiguchi  wrote:
>
> At Fri, 21 Oct 2022 17:12:45 +0900 (JST), Kyotaro Horiguchi 
>  wrote in
> > latest works. It doesn't consider the case of explicit target timelines
> > so it's just a PoC.  (So this doesn't work if recovery_target_timeline
> > is set to 2 for the "standby" in the repro.)
>
> So, finally I noticed that the function XLogFileReadAnyTLI is not
> needed at all if we are going this direction.
>
> Regardless of whether recovery_target_timeline is latest, an explicit
> timeline id, or the checkpoint timeline, what we can do to reach the target
> timeline is just to follow the history file's direction.
>
> If segments are partly gone while reading on a timeline, a segment on
> the older timelines is just crap since it should be incompatible.

I came to the same conclusion. I adjusted XLogFileReadAnyTLI to not use any
timeline that ends within the segment (attached patch). At this point the
name of the function becomes really wrong; XLogFileReadCorrectTLI or
something to that effect would be much more descriptive and the code could
be simplified.

However I'm not particularly happy with this approach, as it will not use
valid WAL from the old timeline when the new timeline's copy of the segment
is not available. Consider the scenario of a cascading
failure. Node A has a hard failure, then node B promotes, archives history
file, but doesn't see enough traffic to archive a full segment before
failing itself. While this is happening we restore node A from backup and
start it up as a standby.

If node B fails before node A has a chance to connect, then either we
continue recovery on the wrong timeline (current behavior) or we will
not try to recover the first portion of the archived WAL file (with the patch).

So I think the correct approach would still be to have ReadRecord() or
ApplyWalRecord() determine that switching timelines is needed.

-- 
Ants Aasma
www.cybertec-postgresql.com
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index cb07694aea6..73bde98b920 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -4171,6 +4171,7 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source)
 	{
 		TimeLineHistoryEntry *hent = (TimeLineHistoryEntry *) lfirst(cell);
 		TimeLineID	tli = hent->tli;
+		XLogSegNo	beginseg = 0;
 
 		if (tli < curFileTLI)
 			break;/* don't bother looking at too-old TLIs */
@@ -4181,7 +4182,6 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source)
 		 */
 		if (hent->begin != InvalidXLogRecPtr)
 		{
-			XLogSegNo	beginseg = 0;
 
 			XLByteToSeg(hent->begin, beginseg, wal_segment_size);
 
@@ -4223,6 +4223,14 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source)
 return fd;
 			}
 		}
+
+		/*
+		 * For segments containing known timeline switches, only consider the
+		 * last timeline, as redo otherwise doesn't know when to switch
+		 * timelines.
+		 */
+		if (segno == beginseg && beginseg > 0)
+			break;
 	}
 
 	/* Couldn't find it.  For simplicity, complain about front timeline */


Re: Standby recovers records from wrong timeline

2022-10-21 Thread Kyotaro Horiguchi
At Fri, 21 Oct 2022 17:44:40 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Fri, 21 Oct 2022 17:12:45 +0900 (JST), Kyotaro Horiguchi 
>  wrote in 
> > latest works. It doesn't consider the case of explicit target timelines
> > so it's just a PoC.  (So this doesn't work if recovery_target_timeline
> > is set to 2 for the "standby" in the repro.)
> 
> So, finally I noticed that the function XLogFileReadAnyTLI is not
> needed at all if we are going this direction.
> 
> Regardless of whether recovery_target_timeline is latest, an explicit
> timeline id, or the checkpoint timeline, what we can do to reach the target
> timeline is just to follow the history file's direction.
> 
> If segments are partly gone while reading on a timeline, a segment on
> the older timelines is just crap since it should be incompatible.
> 
> So.. I'm at a loss about what the function is for.
> 
> Could anyone tell me why we need the behavior of
> XLogFileReadAnyTLI() at all?

It was introduced by 1bb2558046, and the behavior dates back to 2042b3428d.

Hmmm..  XLogFileRead() at the time did essentially the same thing as
the current XLogFileReadAnyTLI.  At that time the expectedTL*I*s
contained only timeline IDs.  Thus it seems to me that, at that time,
recovery assumed it was fine to read the segment on the
greatest available timeline in the TLI list at every moment. (Mmm. I
cannot describe this precisely enough.)  In other words it did not
intend to use segments on older timelines than expected as
replacements for the segment on the correct timeline.

If this is correct (I hope the description above makes sense), now
that we can determine the exact TLI to read for the specified segno,
we don't need to descend to older timelines. In other words, the
current XLogFileReadAnyTLI() should be just XLogFileReadOnHistory(),
which reads a segment of the exact timeline calculated from the
expectedTLEs and the segno.

I'm going to work in this direction.
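
A minimal sketch of that direction (ignoring edge cases around
mid-segment switch points; tliOfPointInHistory() is the existing helper
from timeline.c):

/* Read the segment on the single TLI that the history says covers it,
 * instead of walking down to older timelines. */
static int
XLogFileReadOnHistory(XLogSegNo segno, int emode, XLogSource source)
{
	XLogRecPtr	ptr;
	TimeLineID	tli;

	XLogSegNoOffsetToRecPtr(segno, 0, wal_segment_size, ptr);
	tli = tliOfPointInHistory(ptr, expectedTLEs);

	return XLogFileRead(segno, emode, tli, source, false);
}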

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




parse partition strategy string in gram.y

2022-10-21 Thread Alvaro Herrera
Hello

I've had this patch sitting in a local branch for way too long.  It's a
trivial thing but for some reason it bothered me: we let the partition 
strategy flow into the backend as a string and only parse it into the
catalog 1-char version quite late.

This patch makes gram.y responsible for parsing it and passing it down
as a value from a new enum, which looks more neat.  Because it's an
enum, some "default:" cases can be removed in a couple of places.  I
also added a new elog() in case the catalog contents become broken.
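
For reference, the new enum is along these lines (the parsenodes.h hunk
is not shown in full below; the values keep the historical 1-char codes
stored in the catalog):

typedef enum PartitionStrategy
{
	PARTITION_STRATEGY_LIST = 'l',
	PARTITION_STRATEGY_RANGE = 'r',
	PARTITION_STRATEGY_HASH = 'h',
} PartitionStrategy;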

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Estoy de acuerdo contigo en que la verdad absoluta no existe...
El problema es que la mentira sí existe y tu estás mintiendo" (G. Lama)
>From e6067eee889a6636eb013f2f8d85efaf2112232b Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 20 Oct 2022 12:29:18 +0200
Subject: [PATCH] have gram.y resolve partition strategy names

---
 src/backend/commands/tablecmds.c  | 35 +--
 src/backend/parser/gram.y | 22 -
 src/backend/partitioning/partbounds.c | 26 
 src/backend/utils/cache/partcache.c   |  6 +
 src/include/nodes/parsenodes.h| 15 ++--
 src/include/partitioning/partbounds.h |  2 +-
 src/include/utils/partcache.h |  2 +-
 7 files changed, 53 insertions(+), 55 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 20135ef1b0..e07fd747f7 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -605,9 +605,10 @@ static void RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid,
 			Oid oldRelOid, void *arg);
 static void RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid,
 			 Oid oldrelid, void *arg);
-static PartitionSpec *transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy);
+static PartitionSpec *transformPartitionSpec(Relation rel, PartitionSpec *partspec);
 static void ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partParams, AttrNumber *partattrs,
-  List **partexprs, Oid *partopclass, Oid *partcollation, char strategy);
+  List **partexprs, Oid *partopclass, Oid *partcollation,
+  PartitionStrategy strategy);
 static void CreateInheritance(Relation child_rel, Relation parent_rel);
 static void RemoveInheritance(Relation child_rel, Relation parent_rel,
 			  bool expect_detached);
@@ -1122,7 +1123,6 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 	if (partitioned)
 	{
 		ParseState *pstate;
-		char		strategy;
 		int			partnatts;
 		AttrNumber	partattrs[PARTITION_MAX_KEYS];
 		Oid			partopclass[PARTITION_MAX_KEYS];
@@ -1147,14 +1147,14 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 		 * and CHECK constraints, we could not have done the transformation
 		 * earlier.
 		 */
-		stmt->partspec = transformPartitionSpec(rel, stmt->partspec,
-);
+		stmt->partspec = transformPartitionSpec(rel, stmt->partspec);
 
 		ComputePartitionAttrs(pstate, rel, stmt->partspec->partParams,
 			  partattrs, , partopclass,
-			  partcollation, strategy);
+			  partcollation, stmt->partspec->strategy);
 
-		StorePartitionKey(rel, strategy, partnatts, partattrs, partexprs,
+		StorePartitionKey(rel, stmt->partspec->strategy, partnatts, partattrs,
+		  partexprs,
 		  partopclass, partcollation);
 
 		/* make it all visible */
@@ -17132,10 +17132,10 @@ RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid,
 /*
  * Transform any expressions present in the partition key
  *
- * Returns a transformed PartitionSpec, as well as the strategy code
+ * Returns a transformed PartitionSpec.
  */
 static PartitionSpec *
-transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy)
+transformPartitionSpec(Relation rel, PartitionSpec *partspec)
 {
 	PartitionSpec *newspec;
 	ParseState *pstate;
@@ -17148,21 +17148,8 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy)
 	newspec->partParams = NIL;
 	newspec->location = partspec->location;
 
-	/* Parse partitioning strategy name */
-	if (pg_strcasecmp(partspec->strategy, "hash") == 0)
-		*strategy = PARTITION_STRATEGY_HASH;
-	else if (pg_strcasecmp(partspec->strategy, "list") == 0)
-		*strategy = PARTITION_STRATEGY_LIST;
-	else if (pg_strcasecmp(partspec->strategy, "range") == 0)
-		*strategy = PARTITION_STRATEGY_RANGE;
-	else
-		ereport(ERROR,
-(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("unrecognized partitioning strategy \"%s\"",
-		partspec->strategy)));
-
 	/* Check valid number of columns for strategy */
-	if (*strategy == PARTITION_STRATEGY_LIST &&
+	if (partspec->strategy == PARTITION_STRATEGY_LIST &&
 		list_length(partspec->partParams) != 1)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
@@ -17208,7 +17195,7 @@ 

RE: Perform streaming logical transactions by background workers and parallel apply

2022-10-21 Thread houzj.f...@fujitsu.com
On Wednesday, October 19, 2022 8:50 PM Kuroda, Hayato/黒田 隼人 
 wrote:
> 
> ===
> 01. applyparallelworker.c - SIZE_STATS_MESSAGE
> 
> ```
> /*
>  * There are three fields in each message received by the parallel apply
>  * worker: start_lsn, end_lsn and send_time. Because we have updated these
>  * statistics in the leader apply worker, we can ignore these fields in the
>  * parallel apply worker (see function LogicalRepApplyLoop).
>  */
> #define SIZE_STATS_MESSAGE (2 * sizeof(XLogRecPtr) + sizeof(TimestampTz))
> ```
> 
> According to other comment styles, it seems that the first sentence of the
> comment should represent the datatype and usage, not the detailed reason.
> For example, about ParallelApplyWorkersList, you said "A list ...". How about
> adding something like the following message:
> The message size that can be skipped by parallel apply worker

Thanks for the comments, but the current description seems enough to me.

> ~~~
> 02. applyparallelworker.c - parallel_apply_start_subtrans
> 
> ```
>   if (current_xid != top_xid &&
>   !list_member_xid(subxactlist, current_xid)) ```
> 
> A macro TransactionIdEquals is defined in access/transam.h. Should we use it,
> or is it too trivial?

I checked the existing code; it seems both styles are being used.
Maybe we can post a separate patch to replace them later.
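
For reference, the macro in access/transam.h is just a direct
comparison, which is why both spellings behave identically:

#define TransactionIdEquals(id1, id2)	((id1) == (id2))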

> ~~~
> 08. worker.c - apply_handle_prepare_internal
> 
> Same as above.
> 
> 
> ~~~
> 09. worker.c - maybe_reread_subscription
> 
> ```
>   /*
>* Exit if any parameter that affects the remote connection was
> changed.
>* The launcher will start a new worker.
>*/
>   if (strcmp(newsub->conninfo, MySubscription->conninfo) != 0 ||
>   strcmp(newsub->name, MySubscription->name) != 0 ||
>   strcmp(newsub->slotname, MySubscription->slotname) != 0 ||
>   newsub->binary != MySubscription->binary ||
>   newsub->stream != MySubscription->stream ||
>   strcmp(newsub->origin, MySubscription->origin) != 0 ||
>   newsub->owner != MySubscription->owner ||
>   !equal(newsub->publications, MySubscription->publications))
>   {
>   ereport(LOG,
>   (errmsg("logical replication apply worker for
> subscription \"%s\" will restart because of a parameter change",
>   MySubscription->name)));
> 
>   proc_exit(0);
>   }
> ```
> 
> When the parallel apply worker has been launched and then the subscription
> option has been modified, the same message will appear twice.
> But if the option "streaming" is changed from "parallel" to "on", one of them
> will not restart again.
> Should we modify the message?

Thanks, it seems to be a timing problem: if the leader catches the change first
and stops the parallel workers, the message will only appear once. But I agree
we'd better make the message clear. I changed the message in the parallel apply
worker. While at it, I also adjusted some other messages to include "parallel
apply worker" when they are emitted by a parallel apply worker.

Best regards,
Hou zj



Re: Data is copied twice when specifying both child and parent table in publication

2022-10-21 Thread Peter Smith
Here are my review comments for HEAD patches v13*

//

Patch HEAD_v13-0001

I already posted some follow-up questions. See [1]

/

Patch HEAD_v13-0002

1. Commit message

The following usage scenarios are not described in detail in the manual:
If one subscription subscribes multiple publications, and these publications
publish a partitioned table and its partitions respectively. When we specify
this parameter on one or more of these publications, which identity and schema
should be used to publish the changes?

In these cases, I think the parameter publish_via_partition_root behave as
follows:

~

It seemed worded a bit strangely. Also, you said "on one or more of
these publications" but the examples show only one publication
having 'publish_via_partition_root'.

SUGGESTION (I've modified the wording slightly but the examples are unchanged).

Assume a subscription is subscribing to multiple publications, and
these publications publish a partitioned table and its partitions
respectively:

[publisher-side]
create table parent (a int primary key) partition by range (a);
create table child partition of parent default;

create publication pub1 for table parent;
create publication pub2 for table child;

[subscriber-side]
create subscription sub connection '' publication pub1, pub2;

The manual does not clearly describe the behaviour when the user has
specified the parameter 'publish_via_partition_root' on just one of
the publications. This patch modifies documentation to clarify the
following rules:

- If the parameter publish_via_partition_root is specified only in pub1,
changes will be published using the identity and schema of the table 'parent'.

- If the parameter publish_via_partition_root is specified only in pub2,
changes will be published using the identity and schema of the table 'child'.

~~~

2.

- If the parameter publish_via_partition_root is specified only in pub2,
changes will be published using the identity and schema of the table child.

~

Is that right though? This rule seems 100% contrary to the meaning of
'publish_via_partition_root=true'.

--

3. doc/src/sgml/ref/create_publication.sgml

+ 
+  If a root partitioned table is published by any subscribed
publications which
+  set publish_via_partition_root = true, changes on this root
partitioned table
+  (or on its partitions) will be published using the identity
and schema of this
+  root partitioned table rather than that of the individual partitions.
+ 

This seems to only describe the first example from the commit message.
What about some description to explain the second example?

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPt%2B1PNx6VsZ-xKzAU-18HmNXhjCC1TGakKX46Wg7YNT1Q%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Data is copied twice when specifying both child and parent table in publication

2022-10-21 Thread Peter Smith
On Mon, Oct 17, 2022 at 4:49 PM wangw.f...@fujitsu.com
 wrote:
>
> On Wed, Oct 5, 2022 at 11:08 AM Peter Smith  wrote:
> > Hi Wang-san. Here are my review comments for HEAD_v12-0001 patch.
>
...
> >
> > 3. QUESTION - pg_get_publication_tables / fetch_table_list
> >
> > When the same table is published by different publications (but there
> > are other differences like row-filters/column-lists in each
> > publication) the result tuple of this function does not include the
> > pubid. Maybe the SQL of pg_publication_tables/fetch_table_list() is OK
> > as-is but how does it manage to associate each table with the correct
> > tuple?
> >
> > I know it apparently all seems to work but I’m not how does that
> > happen? Can you explain why a puboid is not needed for the result
> > tuple of this function?
>
> Sorry, I am not sure I understand your question.
> I try to answer your question by explaining the two functions you mentioned:
>
> First, the function pg_get_publication_tables gets the list (see table_infos)
> that includes the published tables and the corresponding publications. Then 
> based
> on this list, the function pg_get_publication_tables returns information
> (schema, relname, row filter and column list) about the published tables in 
> the
> publications list. It just doesn't return pubid.
>
> Then, the SQL in the function fetch_table_list will get the columns in the
> column list from pg_attribute. (This is to return all columns when the column
> list is not specified)
>

I meant, for example, if the different publications specified
different col-lists for the same table, then IIUC
fetch_table_list() is going to return 2 list elements
(schema,rel_name,row_filter,col_list). But when the schema/rel_name
are the same for 2 elements, then (without also a pubid) how are you
going to know where each list element came from, and why is that
not important?

> > ~~
> >
> > test_pub=# create table t1(a int, b int, c int);
> > CREATE TABLE
> > test_pub=# create publication pub1 for table t1(a) where (a > 99);
> > CREATE PUBLICATION
> > test_pub=# create publication pub2 for table t1(a,b) where (b < 33);
> > CREATE PUBLICATION
> >
> > Following seems OK when I swap orders of publication names...
> >
> > test_pub=# SELECT gpt.relid, gpt.attrs, pg_get_expr(gpt.qual,
> > gpt.relid) AS rowfilter from pg_get_publication_tables(VARIADIC
> > ARRAY['pub2','pub1']) gpt(relid, attrs, qual);
> >  relid | attrs | rowfilter
> > ---+---+---
> >  16385 | 1 2   | (b < 33)
> >  16385 | 1 | (a > 99)
> > (2 rows)
> >
> > test_pub=# SELECT gpt.relid, gpt.attrs, pg_get_expr(gpt.qual,
> > gpt.relid) AS rowfilter from pg_get_publication_tables(VARIADIC
> > ARRAY['pub1','pub2']) gpt(relid, attrs, qual);
> >  relid | attrs | rowfilter
> > ---+---+---
> >  16385 | 1 | (a > 99)
> >  16385 | 1 2   | (b < 33)
> > (2 rows)
> >
> > But what about this (this is similar to the SQL fragment from
> > fetch_table_list); I swapped the pub names but the results are the
> > same...
> >
> > test_pub=# SELECT pg_get_publication_tables(VARIADIC
> > array_agg(p.pubname)) from pg_publication p where pubname
> > IN('pub2','pub1');
> >
> >  pg_get_publication_tables
> >
> > -
> > -
> > -
> > -
> > ---
> >  (16385,1,"{OPEXPR :opno 521 :opfuncid 147 :opresulttype 16 :opretset
> > false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1 :varattno 1
> > :vartype 23 :vartypmod -1 :var
> > collid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 1 :location 47}
> > {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4
> > :constbyval true :constisnull false :
> > location 51 :constvalue 4 [ 99 0 0 0 0 0 0 0 ]}) :location 49}")
> >  (16385,"1 2","{OPEXPR :opno 97 :opfuncid 66 :opresulttype 16
> > :opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1
> > :varattno 2 :vartype 23 :vartypmod -1 :v
> > arcollid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 2 :location 49}
> > {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4
> > :constbyval true :constisnull false
> >  :location 53 :constvalue 4 [ 33 0 0 0 0 0 0 0 ]}) :location 51}")
> > (2 rows)
> >
> > test_pub=# SELECT pg_get_publication_tables(VARIADIC
> > array_agg(p.pubname)) from pg_publication p where pubname
> > IN('pub1','pub2');
> >
> >  pg_get_publication_tables
> >
> > -
> > -
> > 

Re: fix stats_fetch_consistency value in postgresql.conf.sample

2022-10-21 Thread Kyotaro Horiguchi
At Fri, 21 Oct 2022 11:58:15 +1100, Peter Smith  wrote 
in 
> Hi, I was hoping to use this patch in my other thread [1], but your
> latest attachment is reported broken in cfbot [2]. Please rebase it.

Ouch. I hadn't reached that part yet. I'll do that next Monday.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Standby recovers records from wrong timeline

2022-10-21 Thread Kyotaro Horiguchi
At Fri, 21 Oct 2022 17:12:45 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> latest works. It doesn't consider the case of explicit target timelines
> so it's just a PoC.  (So this doesn't work if recovery_target_timeline
> is set to 2 for the "standby" in the repro.)

So, finally I noticed that the function XLogFileReadAnyTLI is not
needed at all if we are going this direction.

Regardless of whether recovery_target_timeline is latest, an explicit
timeline id, or the checkpoint timeline, what we can do to reach the target
timeline is just to follow the history file's direction.

If segments are partly gone while reading on a timeline, a segment on
the older timelines is just crap since it should be incompatible.

So.. I'm at a loss about what the function is for.

Could anyone tell me why we need the behavior of
XLogFileReadAnyTLI() at all?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: cross-platform pg_basebackup

2022-10-21 Thread davinder singh
Hi,
Patch v2 looks good to me, I have tested it, and pg_basebackup works fine
across the platforms (Windows to Linux and Linux to Windows).
Syntax used for testing
$ pg_basebackup -h remote_server_ip -p 5432 -U user_name -D backup/data -T
olddir=newdir

I have also tested with non-absolute paths, it behaves as expected.
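
Regarding the function-style macro point quoted below, a small
illustration of the difference:

#define IS_NONWINDOWS_DIR_SEP(ch)	((ch) == '/')

/* Function-style alias: expands only where it is followed by an
 * argument list, and documents the expected argument. */
#define IS_DIR_SEP(ch)	IS_NONWINDOWS_DIR_SEP(ch)

/* Object-like alias: would replace the bare identifier IS_DIR_SEP
 * wherever it appears as a token, even with no arguments following,
 * which is the extra expansion Tom mentions. */
/* #define IS_DIR_SEP IS_NONWINDOWS_DIR_SEP */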

On Fri, Oct 21, 2022 at 12:42 AM Andrew Dunstan  wrote:

>
> On 2022-10-20 Th 14:47, Robert Haas wrote:
> > On Thu, Oct 20, 2022 at 1:28 PM Tom Lane  wrote:
> >> Robert Haas  writes:
> >>> Cool. Here's a patch.
> >> LGTM, except I'd be inclined to ensure that all the macros
> >> are function-style, ie
> >>
> >> +#define IS_DIR_SEP(ch) IS_NONWINDOWS_DIR_SEP(ch)
> >>
> >> not just
> >>
> >> +#define IS_DIR_SEP IS_NONWINDOWS_DIR_SEP
> >>
> >> I don't recall the exact rules, but I know that the second style
> >> can lead to expanding the macro in more cases, which we likely
> >> don't want.  It also seems like better documentation to show
> >> the expected arguments.
> > OK, thanks. v2 attached.
> >
>
>
> Looks good.
>
>
> cheers
>
>
> andrew
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>
>
>

-- 
Regards,
Davinder
EnterpriseDB: http://www.enterprisedb.com


Re: Standby recovers records from wrong timeline

2022-10-21 Thread Kyotaro Horiguchi
At Fri, 21 Oct 2022 16:45:59 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Thu, 20 Oct 2022 14:44:40 +0300, Ants Aasma  wrote in 
> > My understanding is that backup archives are supposed to remain valid
> > even after PITR or equivalently a lagging standby promoting.
> 
> Sorry, I was dim, maybe because I'm catching a cold :p
> 
> On second thought, everything works fine if the first segment of the
> new timeline is archived in this case. So the problem here is whether
> recovery should wait for a known new timeline when no segment on the
> new timeline is available yet.  As you say, I think it is sensible
> that recovery waits at the divergence LSN for the first segment on the
> new timeline before proceeding on the same timeline.

It is simpler than anticipated.  Just not descending timelines when
latest works. It doesn't consider the case of explicit target timelines
so it's just a PoC.  (So this doesn't work if recovery_target_timeline
is set to 2 for the "standby" in the repro.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index cb07694aea..18c14d1fbf 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -4223,6 +4223,13 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source)
 return fd;
 			}
 		}
+
+		/*
+		 * If recovery_target_timeline is "latest", we don't want to read a
+		 * segment that belongs to an older timeline.
+		 */
+		if (recoveryTargetTimeLineGoal == RECOVERY_TARGET_TIMELINE_LATEST)
+			break;
 	}
 
 	/* Couldn't find it.  For simplicity, complain about front timeline */


Re: Standby recovers records from wrong timeline

2022-10-21 Thread Kyotaro Horiguchi
At Thu, 20 Oct 2022 14:44:40 +0300, Ants Aasma  wrote in 
> My understanding is that backup archives are supposed to remain valid
> even after PITR or equivalently a lagging standby promoting.

Sorry, I was dim, maybe because I'm catching a cold :p

On second thought, everything works fine if the first segment of the
new timeline is archived in this case. So the problem here is whether
recovery should wait for a known new timeline when no segment on the
new timeline is available yet.  As you say, I think it is sensible
that recovery waits at the divergence LSN for the first segment on the
new timeline before proceeding on the same timeline.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Crash after a call to pg_backup_start()

2022-10-21 Thread Kyotaro Horiguchi
While playing with a proposed patch, I noticed that a session crashes
after a failed call to pg_backup_start().

postgres=# select pg_backup_start(repeat('x', 1026));
ERROR:  backup label too long (max 1024 bytes)
postgres=# \q
> TRAP: failed Assert("during_backup_start ^ (sessionBackupState == 
> SESSION_BACKUP_RUNNING)"), File: "xlog.c", Line: 8846, PID: 165835

Surprisingly, this also happens after a series of successful calls to
pg_backup_start and stop.

postgres=# select pg_backup_start('x');
postgres=# select pg_backup_stop();
postgres=# \q
> TRAP: failed Assert("durin..


>> do_pg_abort_backup(int code, Datum arg)
>   /* Only one of these conditions can be true */
>   Assert(during_backup_start ^
>  (sessionBackupState == SESSION_BACKUP_RUNNING));

It seems to me that the comment is true and the condition is a thinko.
This is introduced by df3737a651 so it is master only.

Please find the attached fix.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dea978a962..888f5b1bff 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8842,8 +8842,8 @@ do_pg_abort_backup(int code, Datum arg)
 	bool		during_backup_start = DatumGetBool(arg);
 
 	/* Only one of these conditions can be true */
-	Assert(during_backup_start ^
-		   (sessionBackupState == SESSION_BACKUP_RUNNING));
+	Assert(!(during_backup_start &&
+			 (sessionBackupState == SESSION_BACKUP_RUNNING)));
 
 	if (during_backup_start || sessionBackupState != SESSION_BACKUP_NONE)
 	{


Re: Avoid memory leaks during base backups

2022-10-21 Thread Kyotaro Horiguchi
At Thu, 20 Oct 2022 19:47:07 +0200, Alvaro Herrera  
wrote in 
> I agree that's a good idea, and the patch looks good to me, but I don't
> think asserting that they are null afterwards is useful.

+1 for this direction. And the patch looks fine to me.


>   oldcontext = MemoryContextSwitchTo(backupcontext);
>   Assert(backup_state == NULL);
>   Assert(tablespace_map == NULL);
>   backup_state = (BackupState *) palloc0(sizeof(BackupState));
>   tablespace_map = makeStringInfo();
>   MemoryContextSwitchTo(oldcontext);

We could use MemoryContextAllocZero() for this purpose, but of course
it's not mandatory.
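
Sketched, that alternative would be:

	backup_state = (BackupState *)
		MemoryContextAllocZero(backupcontext, sizeof(BackupState));

(makeStringInfo() for tablespace_map would still want the surrounding
MemoryContextSwitchTo(), though.)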


+* across. We keep the memory allocated in this memory context less,
+* because any error before reaching pg_backup_stop() can leak the 
memory
+* until pg_backup_start() is called again. While this is not smart, it
+* helps to keep things simple.

I think the "less" is somewhat obscure.  I feel we should be more
explicit. And we don't need to put emphasis on "leak".  I recklessly
propose this as a draft.

"The context is intended to be used by this function to store only
session-lifetime values.  It is, if left alone, reset at the next call
to blow away orphan memory blocks from the previous failed call.
While this is not smart, it helps to keep things simple."

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Avoid memory leaks during base backups

2022-10-21 Thread Michael Paquier
On Fri, Oct 21, 2022 at 11:34:27AM +0530, Bharath Rupireddy wrote:
> On Fri, Oct 21, 2022 at 12:21 AM Robert Haas  wrote:
>> On Thu, Oct 20, 2022 at 1:35 PM Bharath Rupireddy 
>>  wrote:
>>> I think elsewhere in the code we reset dangling pointers either ways -
>>> before or after deleting/resetting memory context. But placing them
>>> before would give us extra safety in case memory context
>>> deletion/reset fails. Not sure what's the best way.
>>
>> I think it's OK to assume that deallocating memory will always
>> succeed, so it doesn't matter whether you do it just before or just
>> after that. But it's not OK to assume that *allocating* memory will
>> always succeed.
> 
> Right.

To be exact, it seems to me that tablespace_map and backup_state
should be reset before deleting backupcontext, but the reset of
backupcontext should happen after the fact.
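
In code form, the suggested ordering would be:

	/* Clear the dangling pointers first, then drop the context that
	 * owned them. */
	backup_state = NULL;
	tablespace_map = NULL;
	MemoryContextDelete(backupcontext);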

+   backup_state = NULL;
tablespace_map = NULL;
These two in pg_backup_start() don't matter, do they?  They are
reallocated a couple of lines down.

+* across. We keep the memory allocated in this memory context less,
What does "We keep the memory allocated in this memory context less"
mean here? 
--
Michael


signature.asc
Description: PGP signature


Re: Avoid memory leaks during base backups

2022-10-21 Thread Michael Paquier
On Thu, Oct 20, 2022 at 02:51:21PM -0400, Robert Haas wrote:
> On Thu, Oct 20, 2022 at 1:35 PM Bharath Rupireddy
>  wrote:
>> I think elsewhere in the code we reset dangling pointers either ways -
>> before or after deleting/resetting memory context. But placing them
>> before would give us extra safety in case memory context
>> deletion/reset fails. Not sure what's the best way.
> 
> I think it's OK to assume that deallocating memory will always
> succeed, so it doesn't matter whether you do it just before or just
> after that. But it's not OK to assume that *allocating* memory will
> always succeed.

AFAIK, one of the callbacks associated with a memory context could
fail, see comments before MemoryContextCallResetCallbacks() in
MemoryContextDelete().  I agree that it should not matter here, but I
think that it is better to reset the pointers before attempting the
deletion of the memory context in this case.
--
Michael


signature.asc
Description: PGP signature


Re: Avoid memory leaks during base backups

2022-10-21 Thread Bharath Rupireddy
On Thu, Oct 20, 2022 at 11:17 PM Alvaro Herrera  wrote:
>
> On 2022-Oct-20, Bharath Rupireddy wrote:
>
> > I think elsewhere in the code we reset dangling pointers either ways -
> > before or after deleting/resetting memory context. But placing them
> > before would give us extra safety in case memory context
> > deletion/reset fails. Not sure what's the best way. However, I'm
> > nullifying the dangling pointers after deleting/resetting memory
> > context.
>
> I agree that's a good idea, and the patch looks good to me, but I don't
> think asserting that they are null afterwards is useful.

+1. Removed those assertions. Please see the attached v9 patch.

On Fri, Oct 21, 2022 at 12:21 AM Robert Haas  wrote:
>
> On Thu, Oct 20, 2022 at 1:35 PM Bharath Rupireddy
>  wrote:
> > I think elsewhere in the code we reset dangling pointers either ways -
> > before or after deleting/resetting memory context. But placing them
> > before would give us extra safety in case memory context
> > deletion/reset fails. Not sure what's the best way.
>
> I think it's OK to assume that deallocating memory will always
> succeed, so it doesn't matter whether you do it just before or just
> after that. But it's not OK to assume that *allocating* memory will
> always succeed.

Right.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v9-0001-Avoid-memory-leaks-during-backups-using-SQL-calla.patch
Description: Binary data


Re: making relfilenodes 56 bits

2022-10-21 Thread Michael Paquier
On Thu, Sep 29, 2022 at 09:23:38PM -0400, Tom Lane wrote:
> Hmmm ... I'd tend to do SELECT COUNT(*) FROM.  But can't we provide
> any actual checks on the sanity of the output?  I realize that the
> output's far from static, but still ...

Honestly, checking all the fields is not that exciting, but the most
I can think of that would be portable enough is something like the
attached.  The lack of arithmetic operators for xid limits things a
bit, but at least that's something.

Thoughts?
--
Michael
diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index 9f106c2a10..38987e2afc 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -594,3 +594,89 @@ SELECT * FROM tenk1 a JOIN my_gen_series(1,10) g ON a.unique1 = g;
  Index Cond: (unique1 = g.g)
 (4 rows)
 
+--
+-- Test functions for control data
+--
+\x
+SELECT checkpoint_lsn > '0/0'::pg_lsn AS checkpoint_lsn,
+redo_lsn > '0/0'::pg_lsn AS redo_lsn,
+redo_wal_file IS NOT NULL AS redo_wal_file,
+timeline_id > 0 AS timeline_id,
+prev_timeline_id > 0 AS prev_timeline_id,
+next_xid IS NOT NULL AS next_xid,
+next_oid > 0 AS next_oid,
+next_multixact_id != '0'::xid AS next_multixact_id,
+next_multi_offset IS NOT NULL AS next_multi_offset,
+oldest_xid != '0'::xid AS oldest_xid,
+oldest_xid_dbid > 0 AS oldest_xid_dbid,
+oldest_active_xid != '0'::xid AS oldest_active_xid,
+oldest_multi_xid != '0'::xid AS oldest_multi_xid,
+oldest_multi_dbid > 0 AS oldest_multi_dbid,
+oldest_commit_ts_xid IS NOT NULL AS oldest_commit_ts_xid,
+newest_commit_ts_xid IS NOT NULL AS newest_commit_ts_xid
+  FROM pg_control_checkpoint();
+-[ RECORD 1 ]+--
+checkpoint_lsn   | t
+redo_lsn | t
+redo_wal_file| t
+timeline_id  | t
+prev_timeline_id | t
+next_xid | t
+next_oid | t
+next_multixact_id| t
+next_multi_offset| t
+oldest_xid   | t
+oldest_xid_dbid  | t
+oldest_active_xid| t
+oldest_multi_xid | t
+oldest_multi_dbid| t
+oldest_commit_ts_xid | t
+newest_commit_ts_xid | t
+
+SELECT max_data_alignment > 0 AS max_data_alignment,
+database_block_size > 0 AS database_block_size,
+blocks_per_segment > 0 AS blocks_per_segment,
+wal_block_size > 0 AS wal_block_size,
+max_identifier_length > 0 AS max_identifier_length,
+max_index_columns > 0 AS max_index_columns,
+max_toast_chunk_size > 0 AS max_toast_chunk_size,
+large_object_chunk_size > 0 AS large_object_chunk_size,
+float8_pass_by_value IS NOT NULL AS float8_pass_by_value,
+data_page_checksum_version >= 0 AS data_page_checksum_version
+  FROM pg_control_init();
+-[ RECORD 1 ]--+--
+max_data_alignment | t
+database_block_size| t
+blocks_per_segment | t
+wal_block_size | t
+max_identifier_length  | t
+max_index_columns  | t
+max_toast_chunk_size   | t
+large_object_chunk_size| t
+float8_pass_by_value   | t
+data_page_checksum_version | t
+
+SELECT min_recovery_end_lsn >= '0/0'::pg_lsn AS min_recovery_end_lsn,
+min_recovery_end_timeline >= 0 AS min_recovery_end_timeline,
+backup_start_lsn >= '0/0'::pg_lsn AS backup_start_lsn,
+backup_end_lsn >= '0/0'::pg_lsn AS backup_end_lsn,
+end_of_backup_record_required IS NOT NULL AS end_of_backup_record_required
+  FROM pg_control_recovery();
+-[ RECORD 1 ]-+--
+min_recovery_end_lsn  | t
+min_recovery_end_timeline | t
+backup_start_lsn  | t
+backup_end_lsn| t
+end_of_backup_record_required | t
+
+SELECT pg_control_version > 0 AS pg_control_version,
+catalog_version_no > 0 AS catalog_version_no,
+system_identifier >= 0 AS system_identifier,
+pg_control_last_modified <= now() AS pg_control_last_modified
+  FROM pg_control_system();
+-[ RECORD 1 ]+--
+pg_control_version   | t
+catalog_version_no   | t
+system_identifier| t
+pg_control_last_modified | t
+
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index 639e9b352c..986e07c3a5 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -223,3 +223,47 @@ SELECT * FROM tenk1 a JOIN my_gen_series(1,1000) g ON a.unique1 = g;
 
 EXPLAIN (COSTS OFF)
 SELECT * FROM tenk1 a JOIN my_gen_series(1,10) g ON a.unique1 = g;
+
+--
+-- Test functions for control data
+--
+\x
+SELECT checkpoint_lsn > '0/0'::pg_lsn AS checkpoint_lsn,
+redo_lsn > '0/0'::pg_lsn AS redo_lsn,
+redo_wal_file IS NOT NULL AS redo_wal_file,
+timeline_id > 0 AS timeline_id,
+prev_timeline_id > 0 AS prev_timeline_id,
+next_xid IS NOT NULL AS next_xid,
+next_oid > 0 AS next_oid,
+next_multixact_id != '0'::xid AS next_multixact_id,
+next_multi_offset IS NOT NULL AS next_multi_offset,
+oldest_xid !=