doubt about FullTransactionIdAdvance()

2022-10-21 Thread Zhang Mingli
Hi, hackers

I don't quite understand FullTransactionIdAdvance(); correct me if I'm wrong.


/*
 * Advance a FullTransactionId variable, stepping over xids that would appear
 * to be special only when viewed as 32bit XIDs.
 */
static inline void
FullTransactionIdAdvance(FullTransactionId *dest)
{
	dest->value++;

	/* see FullTransactionIdAdvance() */
	if (FullTransactionIdPrecedes(*dest, FirstNormalFullTransactionId))
		return;

	while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
		dest->value++;
}

#define XidFromFullTransactionId(x) ((x).value)
#define FullTransactionIdPrecedes(a, b) ((a).value < (b).value)

Can the code ever reach the line "while (XidFromFullTransactionId(*dest) <
FirstNormalTransactionId)"? We return early when FullTransactionIdPrecedes(*dest,
FirstNormalFullTransactionId), and the two tests look equivalent.
Another confusing point is the comment /* see FullTransactionIdAdvance() */:
it refers to the function it appears in.
Could anyone explain this? Thanks in advance.

Regards,
Zhang Mingli


Re: Add 64-bit XIDs into PostgreSQL 15

2022-10-21 Thread Zhang Mingli
Hi,


On Oct 22, 2022, 00:09 +0800, Maxim Orlov wrote:
> >
> > Done! Thanks! Here is the rebased version.
> >
> > This version has a bug fix for multixact replication. Previous versions of 
> > the patch set did not write pd_multi_base in WAL. Thus, this field was set 
> > to 0 upon WAL replay on the replica.
> > This caused the replica to panic. Fix this by adding the pd_multi_base of a 
> > page into WAL. An appropriate TAP test is added.
> >
> > Also, add refactoring and improvements in heapam.c in order to reduce diff 
> > and make it more "tidy".
> >
> > Reviews and opinions are very welcome!
> >
> > --
> > Best regards,
> > Maxim Orlov.
Found some outdated code comments around several variables, such as 
xidWrapLimit/xidWarnLimit/xidStopLimit.

These variables are not used any more.

I attach an additional v48-0009 patch; as the changes are comment-only, apply 
it if you want to.

Regards,
Zhang Mingli


v48-0009-remove-some-outdate-codes-comments-about-xidWrap.patch
Description: Binary data


Re: Collation version tracking for macOS

2022-10-21 Thread Thomas Munro
On Sat, Oct 22, 2022 at 10:24 AM Thomas Munro  wrote:
> ... But it
> doesn't provide a way for me to create a new database that uses 63 on
> purpose when I know what I'm doing.  There are various reasons I might
> want to do that.

Thinking some more about this, I guess that could be addressed by
having an explicit way to request either the library version or
collversion-style version when creating a database or collation, but
not actually storing it in daticulocale/colliculocale.  That could be
done either as part of the string that is trimmed off before storing
it (so it's only used briefly during creation to find a non-default
library)... Perhaps that'd look like initdb --icu-locale "67:en" (ICU
library version) or "154.14:en" (individual collation version) or some
new syntax in a few places.  Thereafter, it would always be looked up
by searching for the right library by [dat]collversion as Peter E
suggested.

Let me try harder to vocalise some more thoughts that have stopped me
from trying to code the search-by-collversion design so far:

Suppose your pgdata encounters a PostgreSQL linked against a later ICU
library, most likely after an OS upgrade or migration, a pg_upgrade,
or via streaming replication.  You might get a new error "can't find
ICU collation 'en' with version '153.14'; HINT: install missing ICU
library version", and somehow you'll have to work out which one might
contain 'en' v153.14 and install it with apt-get etc.  Then it'll
magically work: your postgres linked against (say) 71 will happily
work with the dlopen'd 67.  This is enough if you want to stay on 67
until the heat death of the universe.  So far so good.

Problem 1:  Suppose you're ready to start using (say) v72.  I guess
you'd use the REFRESH command, which would open the main linked ICU's
collversion and stamp that into the catalogue, at which point new
sessions would start using that, and then you'd have to rebuild all
your indexes (with no help from PG to tell you how to find everything
that needs to be rebuilt, as belaboured in previous reverted work).
Aside from the possibility of getting the rebuilding job wrong (as
belaboured elsewhere), it's not great, because there is still a
transitional period where you can be using the wrong version for your
data.  So this requires some careful planning and understanding from
the administrator.

I admit that the upgrade story is a tiny bit better than the v5
DB2-style patch, which starts using the new version immediately if you
didn't use a prefix (and logs the usual warnings about collversion
mismatch) instead of waiting for you to run REFRESH.  But both of them
have a phase where they might use the wrong library to access an
index.  That's dissatisfying, and leads me to prefer the simple
DB2-style solution that at least admits up front that it's not very
clever.  The DB2-style patch could be improved a bit here with the
addition of one more GUC: default_icu_library, so the administrator,
rather than the packager, remains in control of which version we use
for non-prefixed iculocale values (likely to be what almost everyone
is interested in), defaulting to what the packager linked against.
I've added that to the patch for illustration (though obviously the
error messages produced by collversion mismatch could use some
adjustment, ie to clarify that the warning might be cleared by
installing and selecting a different library version).
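
For illustration, if that default_icu_library GUC existed as proposed, the
administrator could pin non-prefixed locales to the dlopen'd 67 with
something like (a sketch, untested):

  ALTER SYSTEM SET default_icu_library = '67';
  SELECT pg_reload_conf();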

Problem 2:  If ICU 67 ever decides to report a different version for a
given collation (would it ever do that?  I don't expect so, but ...),
we'd be unable to open the collation with the search-by-collversion
design, and potentially the database.  What is a user supposed to do
then?  Presumably our error/hint for that would be "please insert the
correct ICU library into drive A", but now there is no correct
library; if you can even diagnose what's happened, I guess you might
downgrade the ICU library using package tools or whatever if possible,
but otherwise you'd be stuck, if you just can't get the right library.
Is this a problem?  Would you want to be able to say "I don't care,
computer, please just press on"?  So I think we need a way to turn off
the search-by-collversion thing.  How should it look?

I'd love to hear others' thoughts on how we can turn this into a
workable solution.  Hopefully while staying simple...
From 0355984c9a80ff15bfac51677fea30b9be68226b Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Wed, 8 Jun 2022 17:43:53 +1200
Subject: [PATCH v6] WIP: Multi-version ICU.

Add a layer of indirection when accessing ICU, so that multiple major
versions of the library can be used at once.  Versions other than the
one that PostgreSQL was linked against are opened with dlopen(), but we
refuse to open versions higher than the one we were compiled against.
The ABI might change in future releases so that wouldn't be safe.

By default, the system linker's default search path is used to find
libraries, but icu_library_path may be used to specify an absolute path
to look in.

Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

2022-10-21 Thread Peter Geoghegan
On Thu, Oct 20, 2022 at 11:52 AM Peter Geoghegan  wrote:
> Leaving my patch series aside, I still don't think that it makes sense
> to make it impossible to auto-cancel antiwraparound autovacuums,
> across the board, regardless of the underlying table age.

One small thought on the presentation/docs side of this: maybe it
would be better to invent a new kind of autovacuum that has the same
purpose as antiwraparound autovacuum, but goes by a different name,
and doesn't have the special behavior around cancellations. We
wouldn't have to change anything about the behavior of antiwraparound
autovacuum once we reached the point of needing one.

Maybe we wouldn't even need to invent a new user-visible name for this
other kind of autovacuum. While even this so-called "new kind of
autovacuum" will be rare once my main patch series gets in, it'll
still be a totally normal occurrence. Whereas antiwraparound
autovacuums are sometimes described as an emergency mechanism.

That way we wouldn't be fighting against the widely held perception
that antiwraparound autovacuums are scary. In fact that reputation
would be fully deserved, for the first time. There are lots of
problems with the idea that antiwraparound autovacuum is kind of an
emergency thing right now, but this would make things fit the
perception, "fixing" the perception. Antiwraparound autovacuums would
become far far rarer under this scheme, but when they did happen
they'd be clear cause for concern. A useful signal for users, who
should ideally aim to never see *any* antiwraparound autovacuums.

-- 
Peter Geoghegan




Re: Missing update of all_hasnulls in BRIN opclasses

2022-10-21 Thread Tomas Vondra
On 10/21/22 18:44, Tomas Vondra wrote:
> 
> ...
>
>> Apart from that, this patch looks good.
>>

Sadly, I don't think we can fix it like this :-(

The problem is that all ranges start with all_nulls=true, because the
new range gets initialized by brin_memtuple_initialize() like that. But
this happens for *every* range before we even start processing the rows.
So this way all the ranges would end up with has_nulls=true, making that
flag pretty useless.

Actually, even just doing "truncate" on the table creates such an all-nulls
range for the first range, and serializes it to disk.

I wondered why we even write such tuples for "empty" ranges to disk, for
example after "TRUNCATE" - the table is empty by definition, so how come
we write all-nulls brin summary for the first range?

For example brininsert() checks if the BRIN tuple was modified and needs
to be written back, but brinbuild() just ignores that, and initializes
(and writes) the tuple to disk anyway. I think we should not do that -
there should be a flag in BrinBuildState, tracking whether the BRIN
tuple was modified, and we should only write it if it's true.

That means we should never get an on-disk summary representing nothing.

That doesn't fix the issue, though, because we still need to pass the
memtuple tuple to the add_value opclass procedure, and whether it sets
the has_nulls flag depends on whether it's a new tuple representing no
other rows (in which case has_nulls remains false) or whether it was
read from disk (in which case it needs to be flipped to 'true').

But the opclass has no way to tell the difference at the moment - it
just gets the BrinMemTuple. So we'd have to extend this, somehow.

I wonder how to do this in a back-patchable way - we can't add
parameters to the opclass procedure, and the other solution seems to be
storing it right in the BrinMemTuple, somehow. But that's likely an ABI
break :-(

The only solution I can think of is actually passing it using all_nulls
and has_nulls - we could set both flags to true (which we never do now)
and teach the opclass that it signifies "empty" (and thus not to update
has_nulls after resetting all_nulls).

Something like the attached (I haven't added any more tests - I'm not sure
what those would look like; I can't think of a query testing this, although
maybe we could check how the flags change using pageinspect).
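
For instance, a rough sketch of such a pageinspect check (index name is
illustrative; regular BRIN data pages start at block 2):

  SELECT itemoffset, allnulls, hasnulls
    FROM brin_page_items(get_raw_page('brintest_4_idx', 2), 'brintest_4_idx');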


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From a99fd6a737cec24bb4063e99a241ff3e04c6ebb8 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Thu, 20 Oct 2022 19:55:23 +0200
Subject: [PATCH 1/9] fixup: brin has_nulls

---
 src/backend/access/brin/brin_bloom.c| 1 +
 src/backend/access/brin/brin_inclusion.c| 1 +
 src/backend/access/brin/brin_minmax.c   | 1 +
 src/backend/access/brin/brin_minmax_multi.c | 1 +
 4 files changed, 4 insertions(+)

diff --git a/src/backend/access/brin/brin_bloom.c b/src/backend/access/brin/brin_bloom.c
index 6b0af7267d5..60315450b41 100644
--- a/src/backend/access/brin/brin_bloom.c
+++ b/src/backend/access/brin/brin_bloom.c
@@ -539,6 +539,7 @@ brin_bloom_add_value(PG_FUNCTION_ARGS)
 			BloomGetFalsePositiveRate(opts));
 		column->bv_values[0] = PointerGetDatum(filter);
 		column->bv_allnulls = false;
+		column->bv_hasnulls = true;
 		updated = true;
 	}
 	else
diff --git a/src/backend/access/brin/brin_inclusion.c b/src/backend/access/brin/brin_inclusion.c
index 4b02d374f23..e0f44d3e62c 100644
--- a/src/backend/access/brin/brin_inclusion.c
+++ b/src/backend/access/brin/brin_inclusion.c
@@ -164,6 +164,7 @@ brin_inclusion_add_value(PG_FUNCTION_ARGS)
 		column->bv_values[INCLUSION_UNMERGEABLE] = BoolGetDatum(false);
 		column->bv_values[INCLUSION_CONTAINS_EMPTY] = BoolGetDatum(false);
 		column->bv_allnulls = false;
+		column->bv_hasnulls = true;
 		new = true;
 	}
 
diff --git a/src/backend/access/brin/brin_minmax.c b/src/backend/access/brin/brin_minmax.c
index 9e8a8e056cc..8a5661a8952 100644
--- a/src/backend/access/brin/brin_minmax.c
+++ b/src/backend/access/brin/brin_minmax.c
@@ -90,6 +90,7 @@ brin_minmax_add_value(PG_FUNCTION_ARGS)
 		column->bv_values[0] = datumCopy(newval, attr->attbyval, attr->attlen);
 		column->bv_values[1] = datumCopy(newval, attr->attbyval, attr->attlen);
 		column->bv_allnulls = false;
+		column->bv_hasnulls = true;
 		PG_RETURN_BOOL(true);
 	}
 
diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index 9a0bcf6698d..4e7119e2d78 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -2500,6 +2500,7 @@ brin_minmax_multi_add_value(PG_FUNCTION_ARGS)
 		MemoryContextSwitchTo(oldctx);
 
 		column->bv_allnulls = false;
+		column->bv_hasnulls = true;
 		modified = true;
 
 		column->bv_mem_value = PointerGetDatum(ranges);
-- 
2.37.3

From 57e53d34f2f7bba91fcc0de6f4eff551669554fb Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sat, 22 Oct 2022 02:26:48 

Re: Pluggable toaster

2022-10-21 Thread Nikita Malakhov
Hi!

Aleksander, we have had this in mind while developing this feature, and
have checked it. Just a slight modification is needed to make it work
with the Pluggable Storage (Access Methods) API.

On Fri, Oct 21, 2022 at 4:01 PM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Hi Nikita,
>
> > Here's rebased patch:
> > v23-0001-toaster-interface.patch - Pluggable TOAST API interface with
> reference (original) TOAST mechanics - new API is introduced but
> > reference TOAST is still left unchanged;
> > v23-0002-toaster-default.patch - Default TOAST mechanics is
> re-implemented using TOAST API and is plugged in as Default Toaster;
> > v23-0003-toaster-docs.patch - Pluggable TOAST API documentation package
>
> Thanks for keeping the patch up to date.
>
> As I recall one of the open questions was: how this feature is
> supposed to work with table access methods? Could you please summarize
> what the current consensus is in this respect?
>
> --
> Best regards,
> Aleksander Alekseev
>


-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/


Re: Collation version tracking for macOS

2022-10-21 Thread Thomas Munro
Hi,

Here is a rebase of this experimental patch.  I think the basic
mechanics are promising, but we haven't agreed on a UX.  I hope we can
figure this out.

Restating the choice made in this branch of the experiment:  Here I
try to be just like DB2 (if I understood its manual correctly).
In DB2, you can use names like "en_US" if you don't care about
changes, and names like "CLDR181_en_US" if you do.  It's the user's
choice to use the second kind to avoid "unexpected effects on
applications or database objects" after upgrades.  Translated to
PostgreSQL concepts, you can use a database default ICU locale like
"en-US" if you don't care and "67:en-US" if you do, and for COLLATION
objects it's the same.  The convention I tried in this patch is that
you use either "en-US-x-icu" (which points to "en-US") or
"en-US-x-icu67" (which points to "67:en-US") depending on whether you
care about this problem.
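
To make that concrete, a sketch of the two flavours in SQL under this patch
(object names are illustrative; the "NN:" prefix is the convention described
above):

  -- pinned: always resolved via the dlopen'd ICU 67
  CREATE COLLATION my_pinned (provider = icu, locale = '67:en-US');
  -- unpinned: uses whatever ICU the postgres binary is linked against
  CREATE COLLATION my_floating (provider = icu, locale = 'en-US');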

I recognise that this is a bit cheesy: it's all the user's problem to
deal with or ignore.

An alternative mentioned by Peter E was that the locale names
shouldn't carry the prefix, but somehow we should have a list of ICU
versions to search for a matching datcollversion/collversion.  How
would that look?  Perhaps a GUC, icu_library_versions = '63, 67, 71'?
There is a currently natural and smallish range of supported versions,
probably something like 54 ... U_ICU_VERSION_MAJOR_NUM, but it seems a
bit weird to try to dlopen ~25 libraries or whatever it might be...
Do you think we should try to code this up?

I haven't tried it, but the main usability problem I predict with that
idea is this:  It can cope with a scenario where you created a
database with ICU 63 and started using a default of "en" and maybe
some explicit fr-x-icu or whatever, and then you upgrade to a new
postgres binary using ICU 71, and, as long as you still have ICU 63
installed it'll just magically keep using 63, now via dlopen().  But it
doesn't provide a way for me to create a new database that uses 63 on
purpose when I know what I'm doing.  There are various reasons I might
want to do that.

Maybe the ideas could be combined?  Perhaps "en" means "create using
binary's linked ICU, open using search-by-collversion", while "67:en"
explicitly says which to use?

Changes since last version:

 * Now it just uses the default dlopen() search path, unless you set
icu_library_path.  Is that a security problem?  It's pretty
convenient, because it means you can just "apt-get install libicu63"
(or local equivalent) and that's all, now 63 is available.

 * To try the idea out, I made it automatically create "*-x-icu67"
alongside the regular "-x-icu" collation objects at initdb time.
From d3e83d0aa5cbb3eb192a2f66d68623cd3b1595b4 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Wed, 8 Jun 2022 17:43:53 +1200
Subject: [PATCH v5] WIP: Multi-version ICU.

Add a layer of indirection when accessing ICU, so that multiple major
versions of the library can be used at once.  Versions other than the
one that PostgreSQL was linked against are opened with dlopen(), but we
refuse to open versions higher than the one we were compiled against.
The ABI might change in future releases so that wouldn't be safe.

By default, the system linker's default search path is used to find
libraries, but icu_library_path may be used to specify an absolute path
to look in.  ICU libraries are expected to have been built without ICU's
--disable-renaming option.  That is, major versions must use distinct
symbol names.

This arrangement means that at least one major version of ICU is always
available -- the one that PostgreSQL was linked against.  It should be
simple on most software distributions to install extra versions using a
package manager, or to build extra libraries as required, to access
older ICU releases.  For example, on Debian bullseye the packages are
named libicu63, libicu67, libicu71.

In this version of the patch, '63:en' used as a database default locale
or COLLATION object requests ICU library 63, and 'en' requests the
library that is linked against the postgres executable.

XXX Many other designs possible, to discuss!

Discussion: https://postgr.es/m/CA%2BhUKGL4VZRpP3CkjYQkv4RQ6pRYkPkSNgKSxFBwciECQ0mEuQ%40mail.gmail.com
---
 src/backend/access/hash/hashfunc.c   |  16 +-
 src/backend/commands/collationcmds.c |  20 ++
 src/backend/utils/adt/formatting.c   |  53 +++-
 src/backend/utils/adt/pg_locale.c| 364 ++-
 src/backend/utils/adt/varchar.c  |  16 +-
 src/backend/utils/adt/varlena.c  |  56 +++--
 src/backend/utils/misc/guc_tables.c  |  14 ++
 src/include/utils/pg_locale.h|  73 ++
 src/tools/pgindent/typedefs.list |   3 +
 9 files changed, 549 insertions(+), 66 deletions(-)

diff --git a/src/backend/access/hash/hashfunc.c b/src/backend/access/hash/hashfunc.c
index b57ed946c4..0a61538efd 100644
--- a/src/backend/access/hash/hashfunc.c
+++ b/src/backend/access/hash/hashfunc.c
@@ -298,11 +298,11 @@ hashtext(PG_FUNCTION_ARGS)
 

Multiple grouping set specs referencing duplicate alias

2022-10-21 Thread David Kimura
Hi all!

I think I may have stumbled across a case of wrong results on HEAD (same
through version 9.6, though interestingly 9.5 produces different results
altogether).

test=# SELECT i AS ai1, i AS ai2 FROM generate_series(1,3)i GROUP BY
ai2, ROLLUP(ai1) ORDER BY ai1, ai2;

 ai1 | ai2 
-----+-----
   1 |   1
     |   1
   2 |   2
     |   2
   3 |   3
     |   3
(6 rows)

I had expected:

 ai1 | ai2 
-----+-----
   1 |   1
   2 |   2
   3 |   3
     |   1
     |   2
     |   3
(6 rows)

It seems to me that the plan is missing a Sort node (on ai1 and ai2) above the
Aggregate node.

                  QUERY PLAN
------------------------------------------------
 GroupAggregate
   Group Key: i, i
   Group Key: i
   ->  Sort
         Sort Key: i
         ->  Function Scan on generate_series i

I have a hunch part of the issue may be an assumption that a duplicate aliased
column will produce the same column values and hence isn't included in the
range table, nor subsequently the pathkeys. However, that assumption does not
seem to be true for queries with multiple grouping set specifications:

test=#  SELECT i as ai1, i as ai2 FROM (values (1),(2),(3)) v(i) GROUP
BY ai1, ROLLUP(ai2);
 ai1 | ai2 
-----+-----
   1 |   1
   2 |   2
   3 |   3
   1 |
   2 |
   3 |
(6 rows)

It seems to me that excluding the duplicate alias from the pathkeys is leading
to a case where the group order is incorrectly determined to satisfy the sort
order. Thus create_ordered_paths() decides against applying an explicit sort
node. But simply forcing an explicit sort still seems wrong since we've
effectively lost a relevant column for the sort.

I tinkered a bit and hacked together an admittedly ugly patch that triggers an
explicit sort constructed from the parse tree. An alternative approach I had
considered was to update the rewriteHandler to explicitly force the existence of
the duplicate alias column in the range tables. But that also felt meh.

Does this seem like a legit issue? And if so, any thoughts on alternative
approaches?

Thanks,
David Kimura
commit 7e7b9a0dce1ba60f4fc087082f04b04e7f405539
Author: David Kimura 
Date:   Wed Oct 19 21:32:27 2022 +

Fix multiple grouping specs using duplicate alias bug

There seems to have been an assumption that multiple aliases that refer
to the same column would always produce mirrored results. However, that
does not seem to be the case when the aliased columns are passed through
different grouping set specs. Consider the case:

  SELECT i AS alias1, i AS alias2 FROM generate_series(1,3)i GROUP BY alias2, ROLLUP(alias1);

This led to a scenario where a relevant column is collapsed at parse
time and thus excluded from sort pathkeys. Consequently group order
could then incorrectly decide it satisfied the sort order. Which leads
to create_ordered_paths() forgoing an explicit sort node.

A solution to the issue is to force an explicit sort and revive the
relevant column info.

diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ac86ce9003..0b81d4d211 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -2157,6 +2157,46 @@ change_plan_targetlist(Plan *subplan, List *tlist, bool tlist_parallel_safe)
 	return subplan;
 }
 
+/*
+ * reevaluate_sort_info
+ *
+ * Build the sort node info from the parse tree.
+ */
+static void
+reevaluate_sort_info(Query *parse, Sort *plan)
+{
+	int			i = 0;
+	ListCell   *l1;
+	ListCell   *l2;
+
+	plan->numCols = list_length(parse->sortClause);
+
+	plan->sortColIdx = palloc(sizeof(int) * plan->numCols);
+	plan->sortOperators = palloc(sizeof(int) * plan->numCols);
+	plan->collations = palloc(sizeof(int) * plan->numCols);
+	plan->nullsFirst = palloc(sizeof(int) * plan->numCols);
+
+	foreach(l1, parse->sortClause)
+	{
+		foreach(l2, parse->targetList)
+		{
+			SortGroupClause *sortClause = (SortGroupClause *) lfirst(l1);
+			TargetEntry *targetEntry = (TargetEntry *) lfirst(l2);
+
+			if (sortClause->tleSortGroupRef == targetEntry->ressortgroupref)
+			{
+				plan->sortColIdx[i] = targetEntry->resno;
+				plan->sortOperators[i] = sortClause->sortop;
+				plan->collations[i] = exprCollation((Node *) targetEntry->expr);
+				plan->nullsFirst[i] = sortClause->nulls_first;
+
+				i++;
+				break;
+			}
+		}
+	}
+}
+
 /*
  * create_sort_plan
  *
@@ -2187,6 +2227,9 @@ create_sort_plan(PlannerInfo *root, SortPath *best_path, int flags)
    IS_OTHER_REL(best_path->subpath->parent) ?
    best_path->path.parent->relids : NULL);
 
+	if (best_path->need_evaluation)
+		reevaluate_sort_info(root->parse, plan);
+
 	copy_generic_path_info(&plan->plan, (Path *) best_path);
 
 	return plan;
@@ -2213,6 +2256,9 @@ create_incrementalsort_plan(PlannerInfo *root, IncrementalSortPath *best_path,
 			  best_path->spath.path.parent->relids : NULL,
 			  best_path->nPresortedCols);
 
+	if 

Re: refactor ownercheck and aclcheck functions

2022-10-21 Thread Peter Eisentraut

On 20.10.22 01:24, Corey Huinker wrote:
> I'd be inclined to remove the highly used ones as well. That way the
> codebase would have more examples of object_ownercheck() for readers to
> see. Seeing the existence of pg_FOO_ownercheck implies that a
> pg_BAR_ownercheck might exist, and if BAR is missing they might be
> inclined to re-add it.

We do have several ownercheck and aclcheck functions that can't be
refactored into this framework right now, so we do have to keep some
special-purpose functions around anyway.  I'm afraid converting all the
callers would blow up this patch quite a bit, but it could be done as a
follow-up patch.

> If we do keep them, would it make sense to go the extra step and turn
> the remaining six "regular" ones into static inline functions or even
> #define-s?

That could make sense.





patch suggestion: Fix citext_utf8 test's "Turkish I" with ICU collation provider

2022-10-21 Thread Anton Voloshin

Hello, hackers.

In current master, as well as in REL_15_STABLE, installcheck in
contrib/citext fails in most locales if we use ICU as the locale provider:


$ rm -fr data; initdb --locale-provider icu --icu-locale en-US -D data 
&& pg_ctl -D data -l logfile start && make -C contrib/citext 
installcheck; pg_ctl -D data stop; cat contrib/citext/regression.diffs

...
test citext   ... ok  457 ms
test citext_utf8  ... FAILED   21 ms
...
diff -u /home/ashutosh/pg/REL_15_STABLE/contrib/citext/expected/citext_utf8.out /home/ashutosh/pg/REL_15_STABLE/contrib/citext/results/citext_utf8.out
--- /home/ashutosh/pg/REL_15_STABLE/contrib/citext/expected/citext_utf8.out	2022-07-14 17:45:31.747259743 +0300
+++ /home/ashutosh/pg/REL_15_STABLE/contrib/citext/results/citext_utf8.out	2022-10-21 19:43:21.146044062 +0300

@@ -54,7 +54,7 @@
 SELECT 'i'::citext = 'İ'::citext AS t;
  t
 ---
- t
+ f
 (1 row)

The reason is that in ICU lowercasing Unicode symbol "İ" (U+0130
"LATIN CAPITAL LETTER I WITH DOT ABOVE") can give two valid results:
- "i", i.e. "U+0069 LATIN SMALL LETTER I" in "tr" and "az" locales.
- "i̇", i.e. "U+0069 LATIN SMALL LETTER I" followed by "U+0307 COMBINING
  DOT ABOVE" in all other locales I've tried (including "en-US", "de",
  "ru", etc).
And the way this test is currently written only accepts plain Latin "i",
which might be true in glibc, but is not so in ICU. Verified on ICU
70.1, but I've seen this on a few other ICU versions as well, so I think
this is probably an ICU feature, not a bug(?).
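
A quick way to see the difference, assuming the standard *-x-icu collations
created by initdb on an ICU-enabled build:

  SELECT lower('İ' COLLATE "tr-x-icu");  -- 'i'  (U+0069)
  SELECT lower('İ' COLLATE "en-x-icu");  -- 'i̇' (U+0069 U+0307)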


Since we probably want installcheck in contrib/citext to pass on
databases with various locales, including reasonable ICU-based ones,
I suggest to fix this test by accepting either of outputs as valid.

I can see two ways of doing that:
1. change SQL in the test to use "IN" instead of "=";
2. add an alternative output.

I think in this case "IN" is better, because that allows a single 
comment to address both possible outputs and to avoid unnecessary 
duplication.


I've attached a patch authored mostly by my colleague, Roman Zharkov, as 
one possible fix.


Only versions 15+ are affected.

Any comments?

--
Anton Voloshin
Postgres Professional, The Russian Postgres Company
https://postgrespro.ru
From 5cfbb59a11d099fa9b8703502fd8aac936a02c4d Mon Sep 17 00:00:00 2001
From: Roman Zharkov 
Date: Fri, 21 Oct 2022 19:56:19 +0300
Subject: [PATCH] Fix citext_utf8 test's "Turkish I" with ICU collation
 provider
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

With the ICU collation provider the Turkish unicode symbol "İ" (U+0130
"LATIN CAPITAL LETTER I WITH DOT ABOVE") has two lowercase variants:
- "i", i.e. "U+0069 LATIN SMALL LETTER I", in "tr" and "az" locales.
- "i̇", i.e. "U+0069 LATIN SMALL LETTER I" followed by "U+0307 COMBINING
  DOT ABOVE" in all other locales I've tried (including "en-US", "de",
  "ru", etc).

So, add both variants to the test.
---
 contrib/citext/expected/citext_utf8.out | 6 --
 contrib/citext/sql/citext_utf8.sql  | 6 --
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/contrib/citext/expected/citext_utf8.out b/contrib/citext/expected/citext_utf8.out
index 666b07ccec4..62f0794028f 100644
--- a/contrib/citext/expected/citext_utf8.out
+++ b/contrib/citext/expected/citext_utf8.out
@@ -48,10 +48,12 @@ SELECT 'Ä'::citext <> 'Ä'::citext AS t;
  t
 (1 row)
 
--- Test the Turkish dotted I. The lowercase is a single byte while the
+-- Test the Turkish dotted I. The lowercase might be a single byte while the
 -- uppercase is multibyte. This is why the comparison code can't be optimized
 -- to compare string lengths.
-SELECT 'i'::citext = 'İ'::citext AS t;
+-- Note that lower('İ') is 'i' (U+0069) in tr and az locales,
+-- but 'i̇' (U+0069 U+0307) in C and most (all?) other locales.
+SELECT 'İ'::citext in ('i'::citext, 'i̇'::citext) AS t;
  t 
 ---
  t
diff --git a/contrib/citext/sql/citext_utf8.sql b/contrib/citext/sql/citext_utf8.sql
index d068000b423..942daa9ce50 100644
--- a/contrib/citext/sql/citext_utf8.sql
+++ b/contrib/citext/sql/citext_utf8.sql
@@ -24,10 +24,12 @@ SELECT 'À'::citext <> 'B'::citext AS t;
 SELECT 'Ä'::text   <> 'Ä'::text   AS t;
 SELECT 'Ä'::citext <> 'Ä'::citext AS t;
 
--- Test the Turkish dotted I. The lowercase is a single byte while the
+-- Test the Turkish dotted I. The lowercase might be a single byte while the
 -- uppercase is multibyte. This is why the comparison code can't be optimized
 -- to compare string lengths.
-SELECT 'i'::citext = 'İ'::citext AS t;
+-- Note that lower('İ') is 'i' (U+0069) in tr and az locales,
+-- but 'i̇' (U+0069 U+0307) in C and most (all?) other locales.
+SELECT 'İ'::citext in ('i'::citext, 'i̇'::citext) AS t;
 
 -- Regression.
 SELECT 'láska'::citext <> 'laská'::citext AS t;
-- 
2.38.1



Re: Missing update of all_hasnulls in BRIN opclasses

2022-10-21 Thread Tomas Vondra



On 10/21/22 17:50, Matthias van de Meent wrote:
> On Fri, 21 Oct 2022 at 17:24, Tomas Vondra
>  wrote:
>>
>> Hi,
>>
>> While working on some BRIN code, I discovered a bug in handling NULL
>> values - when inserting a non-NULL value into a NULL-only range, we
>> reset the all_nulls flag but don't update the has_nulls flag. And
>> because of that we then fail to return the range for IS NULL ranges.
> 
> Ah, that's bad.
> 

Yeah, I guess we'll need to inform the users to consider rebuilding BRIN
indexes on NULL-able columns.

> One question though: doesn't (shouldn't?) column->bv_allnulls already
> imply column->bv_hasnulls? The column has nulls if all of the values
> are null, right? Or is the description of the field deceptive, and
> does bv_hasnulls actually mean "has nulls bitmap"?
> 

What null bitmap do you mean? We're talking about the summary for a page
range - IIRC we translate this to a null bitmap for the BRIN tuple, but there
may be multiple columns, and "has nulls bitmap" is an aggregate over all
of them.

Yeah, maybe it'd make sense to also have has_nulls=true whenever
all_nulls=true, and maybe it'd be simpler because it'd be enough to
check just one flag in consistent function etc. But we still need to
track 2 different states - "has nulls" and "has summary".

In any case, this ship sailed long ago - at least for the existing
opclasses.


>> Attached is a patch fixing this by properly updating the has_nulls flag.
> 
> One comment on the patch:
> 
>> +SET enable_seqscan = off;
>> + [...]
>> +SET enable_seqscan = off;
> 
> Looks like duplicated SETs. Should that last one be RESET instead?
> 

Yeah, should have been RESET.

> Apart from that, this patch looks good.
> 

Thanks!

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: thinko in basic_archive.c

2022-10-21 Thread Bharath Rupireddy
On Fri, Oct 21, 2022 at 10:43 AM Kyotaro Horiguchi
 wrote:
>
> Thanks, but we don't need to wipe out all the bytes. Just putting \0
> at the beginning of the buffer is sufficient.

Nah, that's not a clean way IMO.

> And the Memset() at the
> beginning of basic_archive_file_internal is not needed since those
> static variables are initially initialized to zeros.

Removed. MemSet() after durable_rename() would be sufficient.

> This is not necessarily needed, but it might be better if we empty
> tempfilepath after unlinking the file.

I think it's not necessary as the archiver itself is shutting down and
I don't think the server calls the shutdown callback twice. However,
if we want basic_archive_shutdown() to be more protective against
multiple calls (for any reason that we're missing), we can have a
static local variable to quickly exit if the callback has already been
called, instead of the MemSet(); but that's not needed, I guess.

> +  expectation that a value will soon be provided. Care must be taken when
> +  multiple servers are archiving to the same
> +  basic_archive.archive_library directory as they all
> +  might try to archive the same WAL file.
>
> I don't understand what kind of care should be taken by reading this..

It's just a notice; however, I agree with you that it may be confusing.
I've removed it.

Please review the attached v4 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v4-0001-Remove-leftover-temporary-files-via-basic_archive.patch
Description: Binary data


Re: Missing update of all_hasnulls in BRIN opclasses

2022-10-21 Thread Matthias van de Meent
On Fri, 21 Oct 2022 at 17:24, Tomas Vondra
 wrote:
>
> Hi,
>
> While working on some BRIN code, I discovered a bug in handling NULL
> values - when inserting a non-NULL value into a NULL-only range, we
> reset the all_nulls flag but don't update the has_nulls flag. And
> because of that we then fail to return the range for IS NULL ranges.

Ah, that's bad.

One question though: doesn't (shouldn't?) column->bv_allnulls already
imply column->bv_hasnulls? The column has nulls if all of the values
are null, right? Or is the description of the field deceptive, and
does bv_hasnulls actually mean "has nulls bitmap"?

> Attached is a patch fixing this by properly updating the has_nulls flag.

One comment on the patch:

> +SET enable_seqscan = off;
> + [...]
> +SET enable_seqscan = off;

Looks like duplicated SETs. Should that last one be RESET instead?

Apart from that, this patch looks good.

- Matthias




Re: Avoid memory leaks during base backups

2022-10-21 Thread Bharath Rupireddy
On Fri, Oct 21, 2022 at 11:58 AM Michael Paquier  wrote:
>
> To be exact, it seems to me that tablespace_map and backup_state
> should be reset before deleting backupcontext, but the reset of
> backupcontext should happen after the fact.
>
> +   backup_state = NULL;
> tablespace_map = NULL;
> These two in pg_backup_start() don't matter, do they?  They are
> reallocated a couple of lines down.

After all, that is what is being discussed here; what if palloc down
below fails and they're not reset to NULL after MemoryContextReset()?

> +* across. We keep the memory allocated in this memory context less,
> What does "We keep the memory allocated in this memory context less"
> mean here?

We try to keep it less because we don't want to allocate more memory
and leak it unless pg_backup_start() is called again. Please read the
description. I'll leave it to the committer's discretion whether to
have that part or remove it.

On Fri, Oct 21, 2022 at 12:11 PM Kyotaro Horiguchi
 wrote:
>
>
> +* across. We keep the memory allocated in this memory context less,
> +* because any error before reaching pg_backup_stop() can leak the 
> memory
> +* until pg_backup_start() is called again. While this is not smart, 
> it
> +* helps to keep things simple.
>
> I think the "less" is somewhat obscure.  I feel we should be more
> explicit. And we don't need to put emphasis on "leak".  I recklessly
> propose this as the draft.

I tried to put it simply; please see the attached v10. I'll leave it
to the committer's discretion for better wording.

On Fri, Oct 21, 2022 at 7:47 PM Robert Haas  wrote:
>
> On Fri, Oct 21, 2022 at 2:18 AM Michael Paquier  wrote:
> > AFAIK, one of the callbacks associated to a memory context could
> > fail, see comments before MemoryContextCallResetCallbacks() in
> > MemoryContextDelete().  I agree that it should not matter here, but I
> > think that it is better to reset the pointers before attempting the
> > deletion of the memory context in this case.
>
> I think this is nitpicking. There's no real danger here, and if there
> were, the error handling would have to take it into account somehow,
> which it doesn't.
>
> I'd probably do it before resetting the context as a matter of style,
> to make it clear that there's no window in which the pointers are set
> but referencing invalid memory. But I do not think it makes any
> practical difference at all.

Please see the attached v10.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 3d8cf10d0bab48afee639359669b69c2901afd34 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Fri, 21 Oct 2022 15:04:36 +
Subject: [PATCH v10] Avoid memory leaks during backups using SQL-callable
 functions

Any failure while taking backups using SQL-callable functions can
leak the memory allocated for backup_state and tablespace_map, as
they both need to be long-lived, they are being created in
TopMemoryContext.

To fix the memory leak problem, we create a special session-level
memory context as a direct child of TopMemoryContext so that the
memory allocated is carried across. We delete the memory context at
the end of pg_backup_stop(). We keep the memory allocated in this
memory context less, because any error before reaching
pg_backup_stop() can still leak the memory until pg_backup_start()
is called again. While this is not smart, it helps to keep things
simple.

Author: Bharath Rupireddy
Reviewed-by: Robert Haas, Alvaro Herrera
Reviewed-by: Cary Huang, Michael Paquier
Discussion: https://www.postgresql.org/message-id/CALj2ACXqvfKF2B0beQ=aJMdWnpNohmBPsRg=EDQj_6y1t2O8mQ@mail.gmail.com
---
 src/backend/access/transam/xlogfuncs.c | 42 ++
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index a801a94fe8..e9ec86316b 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -45,6 +45,9 @@
 static BackupState *backup_state = NULL;
 static StringInfo tablespace_map = NULL;
 
+/* A long-lived workspace for SQL-callable backup functions. */
+static MemoryContext backupcontext = NULL;
+
 /*
  * pg_backup_start: set up for taking an on-line backup dump
  *
@@ -72,27 +75,27 @@ pg_backup_start(PG_FUNCTION_ARGS)
 
 	/*
 	 * backup_state and tablespace_map need to be long-lived as they are used
-	 * in pg_backup_stop().
+	 * in pg_backup_stop(). We create a special session-level memory context as
+	 * a direct child of TopMemoryContext so that the memory allocated is
+	 * carried across. We typically delete the memory context at the end of
+	 * pg_backup_stop(), however, an error before it can leak the memory until
+	 * pg_backup_start() is called again. Hence, we try to keep the memory
+	 * allocated in this memory context less. While this is not smart, it helps
+	 * to keep things simple.
 	 */
-	

Missing update of all_hasnulls in BRIN opclasses

2022-10-21 Thread Tomas Vondra
Hi,

While working on some BRIN code, I discovered a bug in handling NULL
values - when inserting a non-NULL value into a NULL-only range, we
reset the all_nulls flag but don't update the has_nulls flag. And
because of that we then fail to return the range for IS NULL ranges.

Reproducing this is trivial:

  create table t (a int);
  create index on t using brin (a);
  insert into t values (null);
  insert into t values (1);

  set enable_seqscan  = off;
  select * from t where a is null;

This should return 1 row, but actually it returns no rows.

Attached is a patch fixing this by properly updating the has_nulls flag.

I reproduced this all the way back to 9.5, so it's a long-standing bug.
It's interesting that no one noticed / reported it so far; it doesn't seem
like a particularly rare corner case.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/access/brin/brin_bloom.c b/src/backend/access/brin/brin_bloom.c
index 6b0af7267d5..60315450b41 100644
--- a/src/backend/access/brin/brin_bloom.c
+++ b/src/backend/access/brin/brin_bloom.c
@@ -539,6 +539,7 @@ brin_bloom_add_value(PG_FUNCTION_ARGS)
 			BloomGetFalsePositiveRate(opts));
 		column->bv_values[0] = PointerGetDatum(filter);
 		column->bv_allnulls = false;
+		column->bv_hasnulls = true;
 		updated = true;
 	}
 	else
diff --git a/src/backend/access/brin/brin_inclusion.c b/src/backend/access/brin/brin_inclusion.c
index 4b02d374f23..e0f44d3e62c 100644
--- a/src/backend/access/brin/brin_inclusion.c
+++ b/src/backend/access/brin/brin_inclusion.c
@@ -164,6 +164,7 @@ brin_inclusion_add_value(PG_FUNCTION_ARGS)
 		column->bv_values[INCLUSION_UNMERGEABLE] = BoolGetDatum(false);
 		column->bv_values[INCLUSION_CONTAINS_EMPTY] = BoolGetDatum(false);
 		column->bv_allnulls = false;
+		column->bv_hasnulls = true;
 		new = true;
 	}
 
diff --git a/src/backend/access/brin/brin_minmax.c b/src/backend/access/brin/brin_minmax.c
index 9e8a8e056cc..8a5661a8952 100644
--- a/src/backend/access/brin/brin_minmax.c
+++ b/src/backend/access/brin/brin_minmax.c
@@ -90,6 +90,7 @@ brin_minmax_add_value(PG_FUNCTION_ARGS)
 		column->bv_values[0] = datumCopy(newval, attr->attbyval, attr->attlen);
 		column->bv_values[1] = datumCopy(newval, attr->attbyval, attr->attlen);
 		column->bv_allnulls = false;
+		column->bv_hasnulls = true;
 		PG_RETURN_BOOL(true);
 	}
 
diff --git a/src/backend/access/brin/brin_minmax_multi.c b/src/backend/access/brin/brin_minmax_multi.c
index 9a0bcf6698d..4e7119e2d78 100644
--- a/src/backend/access/brin/brin_minmax_multi.c
+++ b/src/backend/access/brin/brin_minmax_multi.c
@@ -2500,6 +2500,7 @@ brin_minmax_multi_add_value(PG_FUNCTION_ARGS)
 		MemoryContextSwitchTo(oldctx);
 
 		column->bv_allnulls = false;
+		column->bv_hasnulls = true;
 		modified = true;
 
 		column->bv_mem_value = PointerGetDatum(ranges);
diff --git a/src/test/regress/expected/brin.out b/src/test/regress/expected/brin.out
index 73fa38396e4..cc896c2d9d4 100644
--- a/src/test/regress/expected/brin.out
+++ b/src/test/regress/expected/brin.out
@@ -572,3 +572,39 @@ CREATE UNLOGGED TABLE brintest_unlogged (n numrange);
 CREATE INDEX brinidx_unlogged ON brintest_unlogged USING brin (n);
 INSERT INTO brintest_unlogged VALUES (numrange(0, 2^1000::numeric));
 DROP TABLE brintest_unlogged;
+-- test that we properly update has_nulls when inserting something into
+-- a range that only had NULLs before
+CREATE TABLE brintest_4 (a INT, b INT, c INT, d INET);
+CREATE INDEX brintest_4_idx ON brintest_4 USING brin (a, b int4_minmax_multi_ops, c int4_bloom_ops, d inet_inclusion_ops);
+-- insert a NULL value, so that we get an all-nulls range
+INSERT INTO brintest_4 VALUES (NULL, NULL, NULL, NULL);
+-- now insert a non-NULL value
+INSERT INTO brintest_4 VALUES (1, 1, 1, '127.0.0.1');
+-- see that we can still match the value when using the brin index
+SET enable_seqscan = off;
+SELECT * FROM brintest_4 WHERE a IS NULL;
+ a | b | c | d 
+---+---+---+---
+   |   |   | 
+(1 row)
+
+SELECT * FROM brintest_4 WHERE b IS NULL;
+ a | b | c | d 
+---+---+---+---
+   |   |   | 
+(1 row)
+
+SELECT * FROM brintest_4 WHERE c IS NULL;
+ a | b | c | d 
+---+---+---+---
+   |   |   | 
+(1 row)
+
+SELECT * FROM brintest_4 WHERE d IS NULL;
+ a | b | c | d 
+---+---+---+---
+   |   |   | 
+(1 row)
+
+DROP TABLE brintest_4;
+SET enable_seqscan = off;
diff --git a/src/test/regress/sql/brin.sql b/src/test/regress/sql/brin.sql
index e68e9e18df5..17a01a4b82f 100644
--- a/src/test/regress/sql/brin.sql
+++ b/src/test/regress/sql/brin.sql
@@ -515,3 +515,24 @@ CREATE UNLOGGED TABLE brintest_unlogged (n numrange);
 CREATE INDEX brinidx_unlogged ON brintest_unlogged USING brin (n);
 INSERT INTO brintest_unlogged VALUES (numrange(0, 2^1000::numeric));
 DROP TABLE brintest_unlogged;
+
+-- test that we properly update has_nulls when inserting something into
+-- a range that only had NULLs before
+CREATE TABLE brintest_4 (a INT, b INT, c INT, d INET);
+CREATE INDEX 

Re: Crash after a call to pg_backup_start()

2022-10-21 Thread Bharath Rupireddy
On Fri, Oct 21, 2022 at 7:06 PM Michael Paquier  wrote:
>
> On Fri, Oct 21, 2022 at 05:53:25PM +0800, Richard Guo wrote:
> > Yeah, the two conditions could be both false. How about we update the
> > comment a bit to emphasize this? Such as
> >
> > /* At most one of these conditions can be true */
> > or
> > /* These conditions can not be both true */
>
> If you do that, it would be a bit easier to read as the following
> assertion instead?  Like:
> Assert(!during_backup_start ||
>sessionBackupState == SESSION_BACKUP_NONE);
>
> Please note that I have not checked in details all the interactions
> behind register_persistent_abort_backup_handler() before entering in
> do_pg_backup_start() and the ERROR_CLEANUP block used in this
> routine (just a matter of some elog(ERROR)s put here and there, for
> example).  Anyway, yes, both conditions can be false, and that's easy
> to reproduce:
> 1) Do pg_backup_start().
> 2) Do pg_backup_stop().
> 3) Stop the session to kick do_pg_abort_backup()
> 4) Assert()-boom.

I'm wondering if we need the assertion at all. We know that when the
arg is true or the sessionBackupState is SESSION_BACKUP_RUNNING, the
runningBackups would've been incremented and we can just go ahead and
decrement it, like the attached patch. This is a cleaner approach IMO
unless I'm missing something here.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From b147720ca68a120efdf8f20c58cb3499901e6d61 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Fri, 21 Oct 2022 14:29:27 +
Subject: [PATCH v1] Fix assertion failure in do_pg_abort_backup()

---
 src/backend/access/transam/xlog.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dea978a962..104309c56f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8841,11 +8841,12 @@ do_pg_abort_backup(int code, Datum arg)
 {
 	bool		during_backup_start = DatumGetBool(arg);
 
-	/* Only one of these conditions can be true */
-	Assert(during_backup_start ^
-		   (sessionBackupState == SESSION_BACKUP_RUNNING));
-
-	if (during_backup_start || sessionBackupState != SESSION_BACKUP_NONE)
+	/*
+	 * When either of these is true, we know that runningBackups has
+	 * already been incremented, hence decrement it.
+	 */
+	if (during_backup_start ||
+		sessionBackupState == SESSION_BACKUP_RUNNING)
 	{
 		WALInsertLockAcquireExclusive();
 		Assert(XLogCtl->Insert.runningBackups > 0);
-- 
2.34.1



Re: ICU for global collation

2022-10-21 Thread Marina Polyakova

Hello!

I discovered an interesting behaviour during installcheck runs when the 
cluster was initialized with ICU locale provider:


$ initdb --locale-provider icu --icu-locale en-US -D data &&
pg_ctl -D data -l logfile start

1) The ECPG tests fail because they use the SQL_ASCII encoding [1]: the
database template0 uses the ICU locale provider, and SQL_ASCII is not
supported by ICU:


$ make -C src/interfaces/ecpg/ installcheck
...
== creating database "ecpg1_regression"   ==
ERROR:  encoding "SQL_ASCII" is not supported with ICU provider
ERROR:  database "ecpg1_regression" does not exist
command failed: "/home/marina/postgresql/master/my/inst/bin/psql" -X -c 
"CREATE DATABASE \"ecpg1_regression\" TEMPLATE=template0 
ENCODING='SQL_ASCII'" -c "ALTER DATABASE \"ecpg1_regression\" SET 
lc_messages TO 'C';ALTER DATABASE \"ecpg1_regression\" SET lc_monetary 
TO 'C';ALTER DATABASE \"ecpg1_regression\" SET lc_numeric TO 'C';ALTER 
DATABASE \"ecpg1_regression\" SET lc_time TO 'C';ALTER DATABASE 
\"ecpg1_regression\" SET bytea_output TO 'hex';ALTER DATABASE 
\"ecpg1_regression\" SET timezone_abbreviations TO 'Default';" 
"postgres"


2) The option --no-locale in pg_regress is described as "use C locale" 
[2]. But in this case the created databases actually use the ICU locale 
provider with the ICU cluster locale from template0 (see 
diff_check_backend_used_provider.patch):


$ make NO_LOCALE=1 installcheck

In regression.diffs:

diff -U3 /home/marina/postgresql/master/src/test/regress/expected/test_setup.out /home/marina/postgresql/master/src/test/regress/results/test_setup.out
--- /home/marina/postgresql/master/src/test/regress/expected/test_setup.out	2022-09-27 05:31:27.674628815 +0300
+++ /home/marina/postgresql/master/src/test/regress/results/test_setup.out	2022-10-21 15:09:31.232992885 +0300

@@ -143,6 +143,798 @@
 \set filename :abs_srcdir '/data/person.data'
 COPY person FROM :'filename';
 VACUUM ANALYZE person;
+NOTICE:  varstrfastcmp_locale sss->collate_c 0 sss->locale 0xefacd0
+NOTICE:  varstrfastcmp_locale sss->locale->provider i
+NOTICE:  varstrfastcmp_locale sss->locale->info.icu.locale en-US
...
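
For what it's worth, the locale provider the created databases actually got
can also be checked directly via the PG15 catalog columns:

  SELECT datname, datlocprovider, daticulocale FROM pg_database;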

The patch diff_fix_pg_regress_create_database.patch fixes both issues 
for me.


[1] 
https://github.com/postgres/postgres/blob/ce20f8b9f4354b46b40fd6ebf7ce5c37d08747e0/src/interfaces/ecpg/test/Makefile#L18
[2] 
https://github.com/postgres/postgres/blob/ce20f8b9f4354b46b40fd6ebf7ce5c37d08747e0/src/test/regress/pg_regress.c#L1992


--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/src/backend/access/hash/hashfunc.c b/src/backend/access/hash/hashfunc.c
index b57ed946c42bb54ede800e95045aa937a8dbad85..b3c0f6f753f8428274389844ccf9778a7ed47ea4 100644
--- a/src/backend/access/hash/hashfunc.c
+++ b/src/backend/access/hash/hashfunc.c
@@ -281,6 +281,14 @@ hashtext(PG_FUNCTION_ARGS)
 	if (!lc_collate_is_c(collid))
 		mylocale = pg_newlocale_from_collation(collid);
 
+	elog(NOTICE, "hashtext lc_collate_is_c(collid) %d mylocale %p", lc_collate_is_c(collid), mylocale);
+	if (mylocale)
+	{
+		elog(NOTICE, "hashtext mylocale->provider %c", mylocale->provider);
+		if (mylocale->provider == COLLPROVIDER_ICU)
+			elog(NOTICE, "hashtext mylocale->info.icu.locale %s", mylocale->info.icu.locale ? mylocale->info.icu.locale : "(null)");
+	}
+
 	if (!mylocale || mylocale->deterministic)
 	{
 		result = hash_any((unsigned char *) VARDATA_ANY(key),
@@ -337,6 +345,14 @@ hashtextextended(PG_FUNCTION_ARGS)
 	if (!lc_collate_is_c(collid))
 		mylocale = pg_newlocale_from_collation(collid);
 
+	elog(NOTICE, "hashtextextended lc_collate_is_c(collid) %d mylocale %p", lc_collate_is_c(collid), mylocale);
+	if (mylocale)
+	{
+		elog(NOTICE, "hashtextextended mylocale->provider %c", mylocale->provider);
+		if (mylocale->provider == COLLPROVIDER_ICU)
+			elog(NOTICE, "hashtextextended mylocale->info.icu.locale %s", mylocale->info.icu.locale ? mylocale->info.icu.locale : "(null)");
+	}
+
 	if (!mylocale || mylocale->deterministic)
 	{
 		result = hash_any_extended((unsigned char *) VARDATA_ANY(key),
diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index 02d462a659778016f3c4479d425ba0a84feb6e26..9627c84a7ccfb4c4013556a51c989e9e6d611634 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -243,6 +243,8 @@ pg_set_regex_collation(Oid collation)
  errhint("Use the COLLATE clause to set the collation explicitly.")));
 	}
 
+	elog(NOTICE, "pg_set_regex_collation lc_ctype_is_c(collid) %d", lc_ctype_is_c(collation));
+
 	if (lc_ctype_is_c(collation))
 	{
 		/* C/POSIX collations use this path regardless of database encoding */
@@ -259,6 +261,14 @@ pg_set_regex_collation(Oid collation)
 		 */
 		pg_regex_locale = pg_newlocale_from_collation(collation);
 
+		elog(NOTICE, "pg_set_regex_collation pg_regex_locale %p", pg_regex_locale);
+		if (pg_regex_locale)
+		{
+			elog(NOTICE, 

Re: Avoid memory leaks during base backups

2022-10-21 Thread Robert Haas
On Fri, Oct 21, 2022 at 2:18 AM Michael Paquier  wrote:
> AFAIK, one of the callbacks associated to a memory context could
> fail, see comments before MemoryContextCallResetCallbacks() in
> MemoryContextDelete().  I agree that it should not matter here, but I
> think that it is better to reset the pointers before attempting the
> deletion of the memory context in this case.

I think this is nitpicking. There's no real danger here, and if there
were, the error handling would have to take it into account somehow,
which it doesn't.

I'd probably do it before resetting the context as a matter of style,
to make it clear that there's no window in which the pointers are set
but referencing invalid memory. But I do not think it makes any
practical difference at all.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: parse partition strategy string in gram.y

2022-10-21 Thread Japin Li


On Fri, 21 Oct 2022 at 20:34, Justin Pryzby  wrote:
> On Fri, Oct 21, 2022 at 06:22:44PM +0800, Japin Li wrote:
>> Is there any way to get the regression tests diffs from Cirrus CI?
>> I did not find the diffs in [1].
>> 
>> [1] https://cirrus-ci.com/build/4721735111540736
>
> They're called "main".
> I'm planning on submitting a patch to rename it to "regress", someday.
> See also: 
> https://www.postgresql.org/message-id/20221001161420.GF6256%40telsasoft.com

Oh, thank you very much!  I find it in testrun/build/testrun/main/regress [1].

[1] 
https://api.cirrus-ci.com/v1/artifact/task/6215926717612032/testrun/build/testrun/main/regress/regression.diffs

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Crash after a call to pg_backup_start()

2022-10-21 Thread Michael Paquier
On Fri, Oct 21, 2022 at 05:53:25PM +0800, Richard Guo wrote:
> Yeah, the two conditions could be both false. How about we update the
> comment a bit to emphasize this? Such as
> 
> /* At most one of these conditions can be true */
> or
> /* These conditions can not be both true */

If you do that, it would be a bit easier to read as the following
assertion instead?  Like:
Assert(!during_backup_start ||
   sessionBackupState == SESSION_BACKUP_NONE);

Please note that I have not checked in details all the interactions
behind register_persistent_abort_backup_handler() before entering in
do_pg_backup_start() and the ERROR_CLEANUP block used in this
routine (just a matter of some elog(ERROR)s put here and there, for
example).  Anyway, yes, both conditions can be false, and that's easy
to reproduce:
1) Do pg_backup_start().
2) Do pg_backup_stop().
3) Stop the session to kick do_pg_abort_backup()
4) Assert()-boom.
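
In SQL terms, a minimal transcript of those steps (run in one session, then
disconnect):

  SELECT pg_backup_start('test');
  SELECT pg_backup_stop();
  -- closing the session fires do_pg_abort_backup() with
  -- during_backup_start = false and sessionBackupState = SESSION_BACKUP_NONE,
  -- so the XOR assertion trips.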
--
Michael


signature.asc
Description: PGP signature


Re: cross-platform pg_basebackup

2022-10-21 Thread Robert Haas
On Fri, Oct 21, 2022 at 4:14 AM davinder singh
 wrote:
> Hi,
> Patch v2 looks good to me, I have tested it, and pg_basebackup works fine 
> across the platforms (Windows to Linux and Linux to Windows).
> Syntax used for testing
> $ pg_basebackup -h remote_server_ip -p 5432 -U user_name -D backup/data -T 
> olddir=newdir
>
> I have also tested with non-absolute paths, it behaves as expected.

Cool. Thanks to you, Andrew, and Tom for reviewing.

Committed and back-patched to all supported branches.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Pluggable toaster

2022-10-21 Thread Aleksander Alekseev
Hi Nikita,

> Here's rebased patch:
> v23-0001-toaster-interface.patch - Pluggable TOAST API interface with 
> reference (original) TOAST mechanics - new API is introduced but
> reference TOAST is still left unchanged;
> v23-0002-toaster-default.patch - Default TOAST mechanics is re-implemented 
> using TOAST API and is plugged in as Default Toaster;
> v23-0003-toaster-docs.patch - Pluggable TOAST API documentation package

Thanks for keeping the patch up to date.

As I recall, one of the open questions was: how is this feature
supposed to work with table access methods? Could you please summarize
what the current consensus is in this respect?

-- 
Best regards,
Aleksander Alekseev




Re: parse partition strategy string in gram.y

2022-10-21 Thread Justin Pryzby
On Fri, Oct 21, 2022 at 06:22:44PM +0800, Japin Li wrote:
> Is there any way to get the regression tests diffs from Cirrus CI?
> I did not find the diffs in [1].
> 
> [1] https://cirrus-ci.com/build/4721735111540736

They're called "main".
I'm planning on submitting a patch to rename it to "regress", someday.
See also: 
https://www.postgresql.org/message-id/20221001161420.GF6256%40telsasoft.com

-- 
Justin




Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-10-21 Thread Önder Kalacı
Hi Shi yu, all


> In execReplication.c:
>
> +   TypeCacheEntry **eq = NULL; /* only used when the index is not
> unique */
>
> Maybe the comment here should be changed. Now it is used when the index is
> not a primary key or replica identity index.
>
>
makes sense, updated


> 2.
> +# wait until the index is created
> +$node_subscriber->poll_query_until(
> +   'postgres', q{select count(*)=1 from pg_stat_all_indexes where
> indexrelname = 'test_replica_id_full_idx';}
> +) or die "Timed out while waiting for check subscriber tap_sub_rep_full
> updates one row via index";
>
> The message doesn't seem right,  should it be changed to "Timed out while
> waiting for creating index test_replica_id_full_idx"?
>

yes, updated


>
> 3.
> +# now, ingest more data and create index on column y which has higher
> cardinality
> +# then create an index on column y so that future commands uses the index
> on column
> +$node_publisher->safe_psql('postgres',
> +   "INSERT INTO test_replica_id_full SELECT 50, i FROM
> generate_series(0,3100)i;");
>
> The comment say "create (an) index on column y" twice, maybe it can be
> changed
> to:
>
> now, ingest more data and create index on column y which has higher
> cardinality,
> so that future commands will use the index on column y
>
>
fixed


> 4.
> +# deletes 200 rows
> +$node_publisher->safe_psql('postgres',
> +   "DELETE FROM test_replica_id_full WHERE x IN (5, 6);");
> +
> +# wait until the index is used on the subscriber
> +$node_subscriber->poll_query_until(
> +   'postgres', q{select (idx_scan = 200) from pg_stat_all_indexes
> where indexrelname = 'test_replica_id_full_idx';}
> +) or die "Timed out while waiting for check subscriber tap_sub_rep_full
> updates 200 rows via index";
>
> It would be better to call wait_for_catchup() after DELETE. (And some other
> places in this file.)
>

Hmm, I cannot follow this easily.

Why do you think wait_for_catchup() should be called? In general, I tried
to follow a pattern where we call poll_query_until() so that we are sure
that all the changes are replicated via the index. And then, an
additional check with `is($result, ..` such that we also verify the
correctness of the data.

One alternative could be to use wait_for_catchup() and then have multiple
`is($result, ..` to check both pg_stat_all_indexes and the correctness of
the data.

One minor advantage I see with the current approach is that every
`is($result, ..` adds one step to the test. So, if I use  `is($result, ..`
for pg_stat_all_indexes queries, then I'd be adding multiple steps for a
single test. It felt it is more natural/common to test roughly once with
`is($result, ..` on each test. Or, at least do not add additional ones for
pg_stat_all_indexes checks.



> Besides, the "updates" in the message should be "deletes".
>
>
fixed


> 5.
> +# wait until the index is used on the subscriber
> +$node_subscriber->poll_query_until(
> +   'postgres', q{select sum(idx_scan)=10 from pg_stat_all_indexes
> where indexrelname ilike 'users_table_part_%';}
> +) or die "Timed out while waiting for check subscriber tap_sub_rep_full
> updates partitioned table";
>
> Maybe we should say "updates partitioned table with index" in this message.
>
>
Fixed

Attached v20.

Thanks!

Onder KALACI


v20_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-21 Thread Drouvot, Bertrand

Hi,

On 10/21/22 2:58 AM, Michael Paquier wrote:

On Wed, Oct 19, 2022 at 10:45:44AM +0200, Drouvot, Bertrand wrote:

Please find attached v1-0001-regex-handling-for-db-and-roles-in-hba.patch to
implement regexes for databases and roles in hba.

It does also contain new regexes related TAP tests and doc updates.


Thanks for the updated version.  This is really easy to look at now.


It relies on the refactoring made in fc579e11c6 (but changes the
regcomp_auth_token() parameters so that it is now responsible for emitting
the compilation error message (if any), to avoid code duplication in
parse_hba_line() and parse_ident_line() for roles, databases and user name
mapping).


@@ -652,13 +670,18 @@ check_role(const char *role, Oid roleid, List *tokens)
[...]
-   if (!tok->quoted && tok->string[0] == '+')
+   if (!token_has_regexp(tok))
 {
Hmm.  Do we need token_has_regexp() here for all the cases?  We know
that the string can begin with a '+', hence it is no regex.  The same
applies for the case of "all".  The remaining case is the one where
the user name matches exactly the AuthToken string, which should be
last as we want to treat anything beginning with a '/' as a regex.  It
seems like we could do an order like that?  Say:
if (!tok->quoted && tok->string[0] == '+')
 //do
else if (token_is_keyword(tok, "all"))
 //do
else if (token_has_regexp(tok))
 //do regex compilation, handling failures
else if (token_matches(tok, role))
 //do exact match

The same approach with keywords first, regex, and exact match could be
applied as well for the databases?  Perhaps it is just mainly a matter
of taste, 


Yeah, I think it is.


and it depends on how much you want to prioritize the place
of the regex over the rest but that could make the code easier to
understand in the long-run and this is a very sensitive code area, 


And I agree that your proposal tastes better ;-): it is easier to 
understand, so v2 attached has been done that way.
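
Roughly, the reordered loop then takes this shape (a simplified sketch,
not the exact v2 code; error handling is omitted and the regex-matching
call is abbreviated):

static bool
check_role(const char *role, Oid roleid, List *tokens)
{
	ListCell   *cell;

	foreach(cell, tokens)
	{
		AuthToken  *tok = lfirst(cell);

		if (!tok->quoted && tok->string[0] == '+')
		{
			if (is_member(roleid, tok->string + 1))
				return true;
		}
		else if (token_is_keyword(tok, "all"))
			return true;
		else if (token_has_regexp(tok))
		{
			/* stand-in for the actual regex-execution helper */
			if (regexec_auth_token(role, tok, 0, NULL) == REG_OKAY)
				return true;
		}
		else if (token_matches(tok, role))
			return true;
	}
	return false;
}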



Compiling the expressions for the user and database lists one-by-one
in parse_hba_line() as you do is correct.  However there is a gotcha
that you are forgetting here: the memory allocations done for the
regexp compilations are not linked to the memory context where each
line is processed (named hbacxt in load_hba()) and need a separate
cleanup. 


Oops, right, thanks for pointing that out!


In the same fashion as load_ident(), it seems to me that we
need two extra things for this patch:
- if !ok (see where we do MemoryContextDelete(hbacxt)), we need to go
through new_parsed_lines and free for each line the AuthTokens for the
database and user lists.
- if ok and new_parsed_lines != NIL, the same cleanup needs to
happen.


Right, but I think that should be "parsed_hba_lines != NIL".


My guess is that you could do both the same way as load_ident() does,
keeping some symmetry between the two code paths. 


Right. To avoid code duplication in the !ok/ok cases, the function 
free_hba_line() has been added in v2: it goes through the lists of 
database and role tokens and calls free_auth_token() for each of them.
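
Conceptually it looks something like this (a sketch only; the actual
list shapes in HbaLine may differ slightly):

/* Release the AuthTokens (and compiled regexes) held by one HBA line. */
static void
free_hba_line(HbaLine *line)
{
	ListCell   *cell;

	foreach(cell, line->roles)
		free_auth_token(lfirst(cell));

	foreach(cell, line->databases)
		free_auth_token(lfirst(cell));
}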



Unifying both into
a common routine would be sort of annoying as HbaLines uses lists
within the lists of parsed lines, and IdentLine can have one at most
in each line.


I agree, and v2 is not attempting to unify them.


For now, I have made your last patch a bit shorter by applying the
refactoring of regcomp_auth_token() separately with a few tweaks to
the comments.


Thanks! v2 attached does apply on top of that.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index c6f1b70fd3..30753003ba 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -234,10 +234,13 @@ hostnogssenc  database  
userPostgreSQL database.
-   Multiple database names can be supplied by separating them with
-   commas.  A separate file containing database names can be specified by
-   preceding the file name with @.
+   a specific PostgreSQL database or a regular
+   expression preceded by /.
+   Multiple database names and/or regular expressions preceded by 
/
+   can be supplied by separating them with commas.
+   A separate file containing database names and/or regular expressions 
preceded
+   by / can be specified by preceding the file name
+   with @.
   
  
 
@@ -249,18 +252,20 @@ hostnogssenc  database  
userall specifies that it
matches all users.  Otherwise, this is either the name of a specific
-   database user, or a group name preceded by +.
+   database user, a regular expression preceded by /,
+   or a group name preceded by +.
(Recall that there is no real distinction between users and groups
in PostgreSQL; a + mark 
really means
match any of the roles that are directly or indirectly members
 

Re: START_REPLICATION SLOT causing a crash in an assert build

2022-10-21 Thread Masahiko Sawada
On Thu, Oct 20, 2022 at 6:54 AM Andres Freund  wrote:
>
> Hi,
>
> On 2022-10-13 15:57:28 +0900, Masahiko Sawada wrote:
> > I've attached an updated patch. I've added the common function to
> > start pg_recvlogical and wait for it to become active. Please review
> > it.
>
> > +# Start pg_recvlogical process and wait for it to become active.
> > +sub start_pg_recvlogical
> > +{
> > + my ($node, $slot_name, $create_slot) = @_;
> > +
> > + my @cmd = (
> > + 'pg_recvlogical', '-S', "$slot_name", '-d',
> > + $node->connstr('postgres'),
> > + '--start', '--no-loop', '-f', '-');
> > + push @cmd, '--create-slot' if $create_slot;
> > +
> > + # start pg_recvlogical process.
> > + my $pg_recvlogical = IPC::Run::start(@cmd);
> > +
> > + # Wait for the replication slot to become active.
> > + $node->poll_query_until('postgres',
> > + "SELECT EXISTS (SELECT 1 FROM pg_replication_slots WHERE 
> > slot_name = '$slot_name' AND active_pid IS NOT NULL)"
> > + ) or die "slot never became active";
> > +
> > + return $pg_recvlogical;
> > +}
>
> Because you never process the output from pg_recvlogical I think this test
> will fail if the pipe buffer is small (or more changes are made). I think
> either it needs to output to a file, or we need to process the output.

Okay, but how can we test this situation? As far as I tested, if we
don't specify the redirection of pg_recvlogical's output as above,
pg_recvlogical's stdout and stderr are output to the log file. So I
could not reproduce the issue you're concerned about. Which pipe are
you referring to?

>
> A file seems the easier solution in this case, we don't really care about what
> changes are streamed to the client, right?
>
>
> > +$node = PostgreSQL::Test::Cluster->new('test2');
> > +$node->init(allows_streaming => 'logical');
> > +$node->start;
> > +$node->safe_psql('postgres', "CREATE TABLE test(i int)");
>
> Why are we creating a new cluster? Initdbs takes a fair bit of time on some
> platforms, so this seems unnecessary?

Agreed.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: parse partition strategy string in gram.y

2022-10-21 Thread Alvaro Herrera
headerscheck failed; fixed here.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
#error "Operator lives in the wrong universe"
  ("Use of cookies in real-time system development", M. Gleixner, M. Mc Guire)
>From c434020fc07616cdd13017135819083186d33256 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 20 Oct 2022 12:29:18 +0200
Subject: [PATCH v3] have gram.y resolve partition strategy names

---
 src/backend/commands/tablecmds.c  | 35 +--
 src/backend/parser/gram.y | 22 -
 src/backend/partitioning/partbounds.c | 27 -
 src/backend/utils/cache/partcache.c   |  6 +
 src/include/nodes/parsenodes.h| 15 ++--
 src/include/partitioning/partbounds.h |  2 +-
 src/include/utils/partcache.h |  3 ++-
 7 files changed, 54 insertions(+), 56 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 20135ef1b0..e07fd747f7 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -605,9 +605,10 @@ static void RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid,
 			Oid oldRelOid, void *arg);
 static void RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid,
 			 Oid oldrelid, void *arg);
-static PartitionSpec *transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy);
+static PartitionSpec *transformPartitionSpec(Relation rel, PartitionSpec *partspec);
 static void ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partParams, AttrNumber *partattrs,
-  List **partexprs, Oid *partopclass, Oid *partcollation, char strategy);
+  List **partexprs, Oid *partopclass, Oid *partcollation,
+  PartitionStrategy strategy);
 static void CreateInheritance(Relation child_rel, Relation parent_rel);
 static void RemoveInheritance(Relation child_rel, Relation parent_rel,
 			  bool expect_detached);
@@ -1122,7 +1123,6 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 	if (partitioned)
 	{
 		ParseState *pstate;
-		char		strategy;
 		int			partnatts;
 		AttrNumber	partattrs[PARTITION_MAX_KEYS];
 		Oid			partopclass[PARTITION_MAX_KEYS];
@@ -1147,14 +1147,14 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 		 * and CHECK constraints, we could not have done the transformation
 		 * earlier.
 		 */
-		stmt->partspec = transformPartitionSpec(rel, stmt->partspec,
-);
+		stmt->partspec = transformPartitionSpec(rel, stmt->partspec);
 
 		ComputePartitionAttrs(pstate, rel, stmt->partspec->partParams,
 			  partattrs, , partopclass,
-			  partcollation, strategy);
+			  partcollation, stmt->partspec->strategy);
 
-		StorePartitionKey(rel, strategy, partnatts, partattrs, partexprs,
+		StorePartitionKey(rel, stmt->partspec->strategy, partnatts, partattrs,
+		  partexprs,
 		  partopclass, partcollation);
 
 		/* make it all visible */
@@ -17132,10 +17132,10 @@ RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid,
 /*
  * Transform any expressions present in the partition key
  *
- * Returns a transformed PartitionSpec, as well as the strategy code
+ * Returns a transformed PartitionSpec.
  */
 static PartitionSpec *
-transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy)
+transformPartitionSpec(Relation rel, PartitionSpec *partspec)
 {
 	PartitionSpec *newspec;
 	ParseState *pstate;
@@ -17148,21 +17148,8 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy)
 	newspec->partParams = NIL;
 	newspec->location = partspec->location;
 
-	/* Parse partitioning strategy name */
-	if (pg_strcasecmp(partspec->strategy, "hash") == 0)
-		*strategy = PARTITION_STRATEGY_HASH;
-	else if (pg_strcasecmp(partspec->strategy, "list") == 0)
-		*strategy = PARTITION_STRATEGY_LIST;
-	else if (pg_strcasecmp(partspec->strategy, "range") == 0)
-		*strategy = PARTITION_STRATEGY_RANGE;
-	else
-		ereport(ERROR,
-(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("unrecognized partitioning strategy \"%s\"",
-		partspec->strategy)));
-
 	/* Check valid number of columns for strategy */
-	if (*strategy == PARTITION_STRATEGY_LIST &&
+	if (partspec->strategy == PARTITION_STRATEGY_LIST &&
 		list_length(partspec->partParams) != 1)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
@@ -17208,7 +17195,7 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy)
 static void
 ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partParams, AttrNumber *partattrs,
 	  List **partexprs, Oid *partopclass, Oid *partcollation,
-	  char strategy)
+	  PartitionStrategy strategy)
 {
 	int			attn;
 	ListCell   *lc;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 737bd2d06d..6ca23f88c4 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ 

Re: parse partition strategy string in gram.y

2022-10-21 Thread Alvaro Herrera
On 2022-Oct-21, Japin Li wrote:

> Is there any way to get the regression tests diffs from Cirrus CI?
> I did not find the diffs in [1].

I think they should be somewhere in the artifacts, but I'm not sure.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"La primera ley de las demostraciones en vivo es: no trate de usar el sistema.
Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)




Re: parse partition strategy string in gram.y

2022-10-21 Thread Japin Li


On Fri, 21 Oct 2022 at 18:12, Alvaro Herrera  wrote:
> On 2022-Oct-21, Japin Li wrote:
>
>> Is there an error here from forgetting the LIST partition?
>
> Of course.
> https://cirrus-ci.com/build/4721735111540736
>
> This is what you get for moving cases around at the last minute ...
>

Is there any way to get the regression tests diffs from Cirrus CI?
I did not find the diffs in [1].

[1] https://cirrus-ci.com/build/4721735111540736

-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: parse partition strategy string in gram.y

2022-10-21 Thread Alvaro Herrera
On 2022-Oct-21, Japin Li wrote:

> Is there an error here from forgetting the LIST partition?

Of course.
https://cirrus-ci.com/build/4721735111540736

This is what you get for moving cases around at the last minute ...

Fixed, thanks.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
>From dc345f3cb70c335421f29a6867438ed4bb95bd91 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 20 Oct 2022 12:29:18 +0200
Subject: [PATCH v2] have gram.y resolve partition strategy names

---
 src/backend/commands/tablecmds.c  | 35 +--
 src/backend/parser/gram.y | 22 -
 src/backend/partitioning/partbounds.c | 26 
 src/backend/utils/cache/partcache.c   |  6 +
 src/include/nodes/parsenodes.h| 15 ++--
 src/include/partitioning/partbounds.h |  2 +-
 src/include/utils/partcache.h |  2 +-
 7 files changed, 53 insertions(+), 55 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 20135ef1b0..e07fd747f7 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -605,9 +605,10 @@ static void RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid,
 			Oid oldRelOid, void *arg);
 static void RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid,
 			 Oid oldrelid, void *arg);
-static PartitionSpec *transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy);
+static PartitionSpec *transformPartitionSpec(Relation rel, PartitionSpec *partspec);
 static void ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partParams, AttrNumber *partattrs,
-  List **partexprs, Oid *partopclass, Oid *partcollation, char strategy);
+  List **partexprs, Oid *partopclass, Oid *partcollation,
+  PartitionStrategy strategy);
 static void CreateInheritance(Relation child_rel, Relation parent_rel);
 static void RemoveInheritance(Relation child_rel, Relation parent_rel,
 			  bool expect_detached);
@@ -1122,7 +1123,6 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 	if (partitioned)
 	{
 		ParseState *pstate;
-		char		strategy;
 		int			partnatts;
 		AttrNumber	partattrs[PARTITION_MAX_KEYS];
 		Oid			partopclass[PARTITION_MAX_KEYS];
@@ -1147,14 +1147,14 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 		 * and CHECK constraints, we could not have done the transformation
 		 * earlier.
 		 */
-		stmt->partspec = transformPartitionSpec(rel, stmt->partspec,
-);
+		stmt->partspec = transformPartitionSpec(rel, stmt->partspec);
 
 		ComputePartitionAttrs(pstate, rel, stmt->partspec->partParams,
 			  partattrs, , partopclass,
-			  partcollation, strategy);
+			  partcollation, stmt->partspec->strategy);
 
-		StorePartitionKey(rel, strategy, partnatts, partattrs, partexprs,
+		StorePartitionKey(rel, stmt->partspec->strategy, partnatts, partattrs,
+		  partexprs,
 		  partopclass, partcollation);
 
 		/* make it all visible */
@@ -17132,10 +17132,10 @@ RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid,
 /*
  * Transform any expressions present in the partition key
  *
- * Returns a transformed PartitionSpec, as well as the strategy code
+ * Returns a transformed PartitionSpec.
  */
 static PartitionSpec *
-transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy)
+transformPartitionSpec(Relation rel, PartitionSpec *partspec)
 {
 	PartitionSpec *newspec;
 	ParseState *pstate;
@@ -17148,21 +17148,8 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy)
 	newspec->partParams = NIL;
 	newspec->location = partspec->location;
 
-	/* Parse partitioning strategy name */
-	if (pg_strcasecmp(partspec->strategy, "hash") == 0)
-		*strategy = PARTITION_STRATEGY_HASH;
-	else if (pg_strcasecmp(partspec->strategy, "list") == 0)
-		*strategy = PARTITION_STRATEGY_LIST;
-	else if (pg_strcasecmp(partspec->strategy, "range") == 0)
-		*strategy = PARTITION_STRATEGY_RANGE;
-	else
-		ereport(ERROR,
-(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("unrecognized partitioning strategy \"%s\"",
-		partspec->strategy)));
-
 	/* Check valid number of columns for strategy */
-	if (*strategy == PARTITION_STRATEGY_LIST &&
+	if (partspec->strategy == PARTITION_STRATEGY_LIST &&
 		list_length(partspec->partParams) != 1)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
@@ -17208,7 +17195,7 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy)
 static void
 ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partParams, AttrNumber *partattrs,
 	  List **partexprs, Oid *partopclass, Oid *partcollation,
-	  char strategy)
+	  PartitionStrategy strategy)
 {
 	int			attn;
 	ListCell   *lc;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 737bd2d06d..6ca23f88c4 100644

Re: parse partition strategy string in gram.y

2022-10-21 Thread Japin Li


On Fri, 21 Oct 2022 at 17:32, Alvaro Herrera  wrote:
> Hello
>
> I've had this patch sitting in a local branch for way too long.  It's a
> trivial thing but for some reason it bothered me: we let the partition 
> strategy flow into the backend as a string and only parse it into the
> catalog 1-char version quite late.
>
> This patch makes gram.y responsible for parsing it and passing it down
> as a value from a new enum, which looks more neat.  Because it's an
> enum, some "default:" cases can be removed in a couple of places.  I
> also added a new elog() in case the catalog contents becomes broken.

Is there an error here from forgetting the LIST partition?

+/*
+ * Parse a user-supplied partition strategy string into parse node
+ * PartitionStrategy representation, or die trying.
+ */
+static PartitionStrategy
+parsePartitionStrategy(char *strategy)
+{
+   if (pg_strcasecmp(strategy, "range") == 0)   <-- it should be list
+   return PARTITION_STRATEGY_RANGE; <-- 
PARTITION_STRATEGY_LIST
+   else if (pg_strcasecmp(strategy, "hash") == 0)
+   return PARTITION_STRATEGY_HASH;
+   else if (pg_strcasecmp(strategy, "range") == 0)
+   return PARTITION_STRATEGY_RANGE;
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("unrecognized partitioning strategy \"%s\"",
+   strategy)));
+}
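
For clarity, the corrected function implied by the annotations would
read (the trailing return only keeps the compiler quiet, since
ereport(ERROR) does not return):

static PartitionStrategy
parsePartitionStrategy(char *strategy)
{
	if (pg_strcasecmp(strategy, "list") == 0)
		return PARTITION_STRATEGY_LIST;
	else if (pg_strcasecmp(strategy, "hash") == 0)
		return PARTITION_STRATEGY_HASH;
	else if (pg_strcasecmp(strategy, "range") == 0)
		return PARTITION_STRATEGY_RANGE;

	ereport(ERROR,
			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
			 errmsg("unrecognized partitioning strategy \"%s\"",
					strategy)));
	return PARTITION_STRATEGY_LIST;	/* keep compiler quiet */
}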
+


-- 
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Crash after a call to pg_backup_start()

2022-10-21 Thread Richard Guo
On Fri, Oct 21, 2022 at 3:10 PM Kyotaro Horiguchi 
wrote:

> It seems to me that the comment is true and the condition is a thinko.


Yeah, the two conditions could both be false. How about we update the
comment a bit to emphasize this? Such as

/* At most one of these conditions can be true */
or
/* These conditions cannot both be true */


> Please find the attached fix.


+1

Thanks
Richard


Re: Standby recovers records from wrong timeline

2022-10-21 Thread Ants Aasma
On Fri, 21 Oct 2022 at 11:44, Kyotaro Horiguchi  wrote:
>
> At Fri, 21 Oct 2022 17:12:45 +0900 (JST), Kyotaro Horiguchi 
>  wrote in
> > latest works. It doesn't consider the case of explicit target timelines
> > so it's just a PoC.  (So this doesn't work if recovery_target_timeline
> > is set to 2 for the "standby" in the repro.)
>
> So, finally I noticed that the function XLogFileReadAnyTLI is not
> needed at all if we are going this direction.
>
> Regardless of whether recovery_target_timeline is latest, an explicit
> timeline id, or the checkpoint timeline, what we can do to reach the target
> timeline is just to follow the history file's direction.
>
> If segments are partly gone while reading on a timeline, a segment on
> the older timelines is just crap since it should be incompatible.

I came to the same conclusion. I adjusted XLogFileReadAnyTLI to not use any
timeline that ends within the segment (attached patch). At this point the
name of the function becomes really wrong; XLogFileReadCorrectTLI or
something to that effect would be much more descriptive and the code could
be simplified.

However I'm not particularly happy with this approach, as it will not use
valid WAL from the old timeline when the new timeline's copy of the segment
is not available. Consider the scenario of a cascading
failure. Node A has a hard failure, then node B promotes, archives history
file, but doesn't see enough traffic to archive a full segment before
failing itself. While this is happening we restore node A from backup and
start it up as a standby.

If node B fails before node A has a chance to connect, then either we
continue recovery on the wrong timeline (current behavior) or we will
not try to recover the first portion of the archived WAL file (with the patch).

So I think the correct approach would still be to have ReadRecord() or
ApplyWalRecord() determine that switching timelines is needed.

-- 
Ants Aasma
www.cybertec-postgresql.com
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index cb07694aea6..73bde98b920 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -4171,6 +4171,7 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source)
 	{
 		TimeLineHistoryEntry *hent = (TimeLineHistoryEntry *) lfirst(cell);
 		TimeLineID	tli = hent->tli;
+		XLogSegNo	beginseg = 0;
 
 		if (tli < curFileTLI)
 			break;/* don't bother looking at too-old TLIs */
@@ -4181,7 +4182,6 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source)
 		 */
 		if (hent->begin != InvalidXLogRecPtr)
 		{
-			XLogSegNo	beginseg = 0;
 
 			XLByteToSeg(hent->begin, beginseg, wal_segment_size);
 
@@ -4223,6 +4223,14 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source)
 return fd;
 			}
 		}
+
+		/*
+		 * For segments containing known timeline switches, only consider the
+		 * last timeline, as redo otherwise doesn't know when to switch
+		 * timelines.
+		 */
+		if (segno == beginseg && beginseg > 0)
+			break;
 	}
 
 	/* Couldn't find it.  For simplicity, complain about front timeline */


Re: Standby recovers records from wrong timeline

2022-10-21 Thread Kyotaro Horiguchi
At Fri, 21 Oct 2022 17:44:40 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Fri, 21 Oct 2022 17:12:45 +0900 (JST), Kyotaro Horiguchi 
>  wrote in 
> > latest works. It doesn't consider the case of explicit target timelines
> > so it's just a PoC.  (So this doesn't work if recovery_target_timeline
> > is set to 2 for the "standby" in the repro.)
> 
> So, finally I noticed that the function XLogFileReadAnyTLI is not
> needed at all if we are going this direction.
> 
> Regardless of whether recovery_target_timeline is latest, an explicit
> timeline id, or the checkpoint timeline, what we can do to reach the target
> timeline is just to follow the history file's direction.
> 
> If segments are partly gone while reading on a timeline, a segment on
> the older timelines is just crap since it should be incompatible.
> 
> So.. I'm at a loss about what the function is for.
> 
> Could anyone tell me why we need the behavior of
> XLogFileReadAnyTLI() at all?

It was introduced by 1bb2558046, and the behavior dates back to 2042b3428d.

Hmmm..  XLogFileRead() at the time did essentially the same thing as
the current XLogFileReadAnyTLI.  At that time the expectedTL*I*s
contained only timeline IDs.  Thus it seems to me that, at that time,
recovery assumed it was fine to read the segment on the
greatest available timeline in the TLI list at every moment. (Mmm. I
cannot describe this precisely enough.)  In other words it did not
intend to use segments on older timelines than expected as
replacements for the segment on the correct timeline.

If this is correct (I hope the description above makes sense), now
that we can determine the exact TLI to read for the specified segno,
we don't need to descend to older timelines. In other words, the
current XLogFileReadAnyTLI() should be just XLogFileReadOnHistory(),
which reads a segment of the exact timeline calculated from the
expectedTLEs and the segno.

I'm going to work in this direction.
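
A minimal sketch of that direction (ignoring edge cases around
mid-segment switch points; tliOfPointInHistory() is the existing helper
from timeline.c):

/* Read the segment on the single TLI that the history says covers it,
 * instead of walking down to older timelines. */
static int
XLogFileReadOnHistory(XLogSegNo segno, int emode, XLogSource source)
{
	XLogRecPtr	ptr;
	TimeLineID	tli;

	XLogSegNoOffsetToRecPtr(segno, 0, wal_segment_size, ptr);
	tli = tliOfPointInHistory(ptr, expectedTLEs);

	return XLogFileRead(segno, emode, tli, source, false);
}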

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




parse partition strategy string in gram.y

2022-10-21 Thread Alvaro Herrera
Hello

I've had this patch sitting in a local branch for way too long.  It's a
trivial thing but for some reason it bothered me: we let the partition 
strategy flow into the backend as a string and only parse it into the
catalog 1-char version quite late.

This patch makes gram.y responsible for parsing it and passing it down
as a value from a new enum, which looks more neat.  Because it's an
enum, some "default:" cases can be removed in a couple of places.  I
also added a new elog() in case the catalog contents become broken.
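
For reference, the new enum is along these lines (the parsenodes.h hunk
is not shown in full below; the values keep the historical 1-char codes
stored in the catalog):

typedef enum PartitionStrategy
{
	PARTITION_STRATEGY_LIST = 'l',
	PARTITION_STRATEGY_RANGE = 'r',
	PARTITION_STRATEGY_HASH = 'h',
} PartitionStrategy;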

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Estoy de acuerdo contigo en que la verdad absoluta no existe...
El problema es que la mentira sí existe y tu estás mintiendo" (G. Lama)
>From e6067eee889a6636eb013f2f8d85efaf2112232b Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 20 Oct 2022 12:29:18 +0200
Subject: [PATCH] have gram.y resolve partition strategy names

---
 src/backend/commands/tablecmds.c  | 35 +--
 src/backend/parser/gram.y | 22 -
 src/backend/partitioning/partbounds.c | 26 
 src/backend/utils/cache/partcache.c   |  6 +
 src/include/nodes/parsenodes.h| 15 ++--
 src/include/partitioning/partbounds.h |  2 +-
 src/include/utils/partcache.h |  2 +-
 7 files changed, 53 insertions(+), 55 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 20135ef1b0..e07fd747f7 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -605,9 +605,10 @@ static void RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid,
 			Oid oldRelOid, void *arg);
 static void RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid,
 			 Oid oldrelid, void *arg);
-static PartitionSpec *transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy);
+static PartitionSpec *transformPartitionSpec(Relation rel, PartitionSpec *partspec);
 static void ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partParams, AttrNumber *partattrs,
-  List **partexprs, Oid *partopclass, Oid *partcollation, char strategy);
+  List **partexprs, Oid *partopclass, Oid *partcollation,
+  PartitionStrategy strategy);
 static void CreateInheritance(Relation child_rel, Relation parent_rel);
 static void RemoveInheritance(Relation child_rel, Relation parent_rel,
 			  bool expect_detached);
@@ -1122,7 +1123,6 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 	if (partitioned)
 	{
 		ParseState *pstate;
-		char		strategy;
 		int			partnatts;
 		AttrNumber	partattrs[PARTITION_MAX_KEYS];
 		Oid			partopclass[PARTITION_MAX_KEYS];
@@ -1147,14 +1147,14 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 		 * and CHECK constraints, we could not have done the transformation
 		 * earlier.
 		 */
-		stmt->partspec = transformPartitionSpec(rel, stmt->partspec,
-);
+		stmt->partspec = transformPartitionSpec(rel, stmt->partspec);
 
 		ComputePartitionAttrs(pstate, rel, stmt->partspec->partParams,
 			  partattrs, , partopclass,
-			  partcollation, strategy);
+			  partcollation, stmt->partspec->strategy);
 
-		StorePartitionKey(rel, strategy, partnatts, partattrs, partexprs,
+		StorePartitionKey(rel, stmt->partspec->strategy, partnatts, partattrs,
+		  partexprs,
 		  partopclass, partcollation);
 
 		/* make it all visible */
@@ -17132,10 +17132,10 @@ RangeVarCallbackForAlterRelation(const RangeVar *rv, Oid relid, Oid oldrelid,
 /*
  * Transform any expressions present in the partition key
  *
- * Returns a transformed PartitionSpec, as well as the strategy code
+ * Returns a transformed PartitionSpec.
  */
 static PartitionSpec *
-transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy)
+transformPartitionSpec(Relation rel, PartitionSpec *partspec)
 {
 	PartitionSpec *newspec;
 	ParseState *pstate;
@@ -17148,21 +17148,8 @@ transformPartitionSpec(Relation rel, PartitionSpec *partspec, char *strategy)
 	newspec->partParams = NIL;
 	newspec->location = partspec->location;
 
-	/* Parse partitioning strategy name */
-	if (pg_strcasecmp(partspec->strategy, "hash") == 0)
-		*strategy = PARTITION_STRATEGY_HASH;
-	else if (pg_strcasecmp(partspec->strategy, "list") == 0)
-		*strategy = PARTITION_STRATEGY_LIST;
-	else if (pg_strcasecmp(partspec->strategy, "range") == 0)
-		*strategy = PARTITION_STRATEGY_RANGE;
-	else
-		ereport(ERROR,
-(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("unrecognized partitioning strategy \"%s\"",
-		partspec->strategy)));
-
 	/* Check valid number of columns for strategy */
-	if (*strategy == PARTITION_STRATEGY_LIST &&
+	if (partspec->strategy == PARTITION_STRATEGY_LIST &&
 		list_length(partspec->partParams) != 1)
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
@@ -17208,7 +17195,7 @@ 

RE: Perform streaming logical transactions by background workers and parallel apply

2022-10-21 Thread houzj.f...@fujitsu.com
On Wednesday, October 19, 2022 8:50 PM Kuroda, Hayato/黒田 隼人 
 wrote:
> 
> ===
> 01. applyparallelworker.c - SIZE_STATS_MESSAGE
> 
> ```
> /*
>  * There are three fields in each message received by the parallel apply
>  * worker: start_lsn, end_lsn and send_time. Because we have updated these
>  * statistics in the leader apply worker, we can ignore these fields in the
>  * parallel apply worker (see function LogicalRepApplyLoop).
>  */
> #define SIZE_STATS_MESSAGE (2 * sizeof(XLogRecPtr) + sizeof(TimestampTz))
> ```
> 
> According to other comment styles, it seems that the first sentence of the
> comment should represent the datatype and usage, not the detailed reason.
> For example, about ParallelApplyWorkersList, you said "A list ...". How about
> adding something like the following message:
> The message size that can be skipped by parallel apply worker

Thanks for the comments, but the current description seems enough to me.

> ~~~
> 02. applyparallelworker.c - parallel_apply_start_subtrans
> 
> ```
>   if (current_xid != top_xid &&
>   !list_member_xid(subxactlist, current_xid)) ```
> 
> A macro TransactionIdEquals is defined in access/transam.h. Should we use it,
> or is it too trivial?

I checked the existing code; it seems both styles are being used.
Maybe we can post a separate patch to replace them later.
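
For reference, the macro in access/transam.h is just a direct
comparison, which is why both spellings behave identically:

#define TransactionIdEquals(id1, id2)	((id1) == (id2))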

> ~~~
> 08. worker.c - apply_handle_prepare_internal
> 
> Same as above.
> 
> 
> ~~~
> 09. worker.c - maybe_reread_subscription
> 
> ```
>   /*
>* Exit if any parameter that affects the remote connection was
> changed.
>* The launcher will start a new worker.
>*/
>   if (strcmp(newsub->conninfo, MySubscription->conninfo) != 0 ||
>   strcmp(newsub->name, MySubscription->name) != 0 ||
>   strcmp(newsub->slotname, MySubscription->slotname) != 0 ||
>   newsub->binary != MySubscription->binary ||
>   newsub->stream != MySubscription->stream ||
>   strcmp(newsub->origin, MySubscription->origin) != 0 ||
>   newsub->owner != MySubscription->owner ||
>   !equal(newsub->publications, MySubscription->publications))
>   {
>   ereport(LOG,
>   (errmsg("logical replication apply worker for
> subscription \"%s\" will restart because of a parameter change",
>   MySubscription->name)));
> 
>   proc_exit(0);
>   }
> ```
> 
> When the parallel apply worker has been launched and then the subscription
> option has been modified, the same message will appear twice.
> But if the option "streaming" is changed from "parallel" to "on", one of them
> will not restart again.
> Should we modify the message?

Thanks, it seems to be a timing problem: if the leader catches the change first
and stops the parallel workers, the message will only appear once. But I agree
we'd better make the message clear. I changed the message in the parallel apply
worker. While at it, I also adjusted some other messages to include "parallel
apply worker" when they are emitted by a parallel apply worker.

Best regards,
Hou zj



Re: Data is copied twice when specifying both child and parent table in publication

2022-10-21 Thread Peter Smith
Here are my review comments for HEAD patches v13*

//

Patch HEAD_v13-0001

I already posted some follow-up questions. See [1]

/

Patch HEAD_v13-0002

1. Commit message

The following usage scenarios are not described in detail in the manual:
If one subscription subscribes multiple publications, and these publications
publish a partitioned table and its partitions respectively. When we specify
this parameter on one or more of these publications, which identity and schema
should be used to publish the changes?

In these cases, I think the parameter publish_via_partition_root behave as
follows:

~

It seemed worded a bit strangely. Also, you said "on one or more of
these publications" but the examples show only one publication
having 'publish_via_partition_root'.

SUGGESTION (I've modified the wording slightly but the examples are unchanged).

Assume a subscription is subscribing to multiple publications, and
these publications publish a partitioned table and its partitions
respectively:

[publisher-side]
create table parent (a int primary key) partition by range (a);
create table child partition of parent default;

create publication pub1 for table parent;
create publication pub2 for table child;

[subscriber-side]
create subscription sub connection '' publication pub1, pub2;

The manual does not clearly describe the behaviour when the user has
specified the parameter 'publish_via_partition_root' on just one of
the publications. This patch modifies documentation to clarify the
following rules:

- If the parameter publish_via_partition_root is specified only in pub1,
changes will be published using the identity and schema of the table 'parent'.

- If the parameter publish_via_partition_root is specified only in pub2,
changes will be published using the identity and schema of the table 'child'.

~~~

2.

- If the parameter publish_via_partition_root is specified only in pub2,
changes will be published using the identity and schema of the table child.

~

Is that right though? This rule seems 100% contrary to the meaning of
'publish_via_partition_root=true'.

--

3. doc/src/sgml/ref/create_publication.sgml

+ 
+  If a root partitioned table is published by any subscribed
publications which
+  set publish_via_partition_root = true, changes on this root
partitioned table
+  (or on its partitions) will be published using the identity
and schema of this
+  root partitioned table rather than that of the individual partitions.
+ 

This seems to only describe the first example from the commit message.
What about some description to explain the second example?

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPt%2B1PNx6VsZ-xKzAU-18HmNXhjCC1TGakKX46Wg7YNT1Q%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Data is copied twice when specifying both child and parent table in publication

2022-10-21 Thread Peter Smith
On Mon, Oct 17, 2022 at 4:49 PM wangw.f...@fujitsu.com
 wrote:
>
> On Wed, Oct 5, 2022 at 11:08 AM Peter Smith  wrote:
> > Hi Wang-san. Here are my review comments for HEAD_v12-0001 patch.
>
...
> >
> > 3. QUESTION - pg_get_publication_tables / fetch_table_list
> >
> > When the same table is published by different publications (but there
> > are other differences like row-filters/column-lists in each
> > publication) the result tuple of this function does not include the
> > pubid. Maybe the SQL of pg_publication_tables/fetch_table_list() is OK
> > as-is but how does it manage to associate each table with the correct
> > tuple?
> >
> > I know it apparently all seems to work but I’m not how does that
> > happen? Can you explain why a puboid is not needed for the result
> > tuple of this function?
>
> Sorry, I am not sure I understand your question.
> I try to answer your question by explaining the two functions you mentioned:
>
> First, the function pg_get_publication_tables gets the list (see table_infos)
> that includes the published tables and the corresponding publications. Then 
> based
> on this list, the function pg_get_publication_tables returns information
> (schema, relname, row filter and column list) about the published tables in 
> the
> publications list. It just doesn't return pubid.
>
> Then, the SQL in the function fetch_table_list will get the columns in the
> column list from pg_attribute. (This is to return all columns when the column
> list is not specified)
>

I meant, for example, if the different publications specified
different col-lists for the same table, then IIUC
fetch_table_list() is going to return 2 list elements
(schema,rel_name,row_filter,col_list). But when the schema/rel_name
are the same for 2 elements, then (without also a pubid) how are you
going to know where each list element came from, and why is that
not important?

> > ~~
> >
> > test_pub=# create table t1(a int, b int, c int);
> > CREATE TABLE
> > test_pub=# create publication pub1 for table t1(a) where (a > 99);
> > CREATE PUBLICATION
> > test_pub=# create publication pub2 for table t1(a,b) where (b < 33);
> > CREATE PUBLICATION
> >
> > Following seems OK when I swap orders of publication names...
> >
> > test_pub=# SELECT gpt.relid, gpt.attrs, pg_get_expr(gpt.qual,
> > gpt.relid) AS rowfilter from pg_get_publication_tables(VARIADIC
> > ARRAY['pub2','pub1']) gpt(relid, attrs, qual);
> >  relid | attrs | rowfilter
> > ---+---+---
> >  16385 | 1 2   | (b < 33)
> >  16385 | 1 | (a > 99)
> > (2 rows)
> >
> > test_pub=# SELECT gpt.relid, gpt.attrs, pg_get_expr(gpt.qual,
> > gpt.relid) AS rowfilter from pg_get_publication_tables(VARIADIC
> > ARRAY['pub1','pub2']) gpt(relid, attrs, qual);
> >  relid | attrs | rowfilter
> > ---+---+---
> >  16385 | 1 | (a > 99)
> >  16385 | 1 2   | (b < 33)
> > (2 rows)
> >
> > But what about this (this is similar to the SQL fragment from
> > fetch_table_list); I swapped the pub names but the results are the
> > same...
> >
> > test_pub=# SELECT pg_get_publication_tables(VARIADIC
> > array_agg(p.pubname)) from pg_publication p where pubname
> > IN('pub2','pub1');
> >
> >  pg_get_publication_tables
> >
> > -
> > -
> > -
> > -
> > ---
> >  (16385,1,"{OPEXPR :opno 521 :opfuncid 147 :opresulttype 16 :opretset
> > false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1 :varattno 1
> > :vartype 23 :vartypmod -1 :var
> > collid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 1 :location 47}
> > {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4
> > :constbyval true :constisnull false :
> > location 51 :constvalue 4 [ 99 0 0 0 0 0 0 0 ]}) :location 49}")
> >  (16385,"1 2","{OPEXPR :opno 97 :opfuncid 66 :opresulttype 16
> > :opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1
> > :varattno 2 :vartype 23 :vartypmod -1 :v
> > arcollid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 2 :location 49}
> > {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4
> > :constbyval true :constisnull false
> >  :location 53 :constvalue 4 [ 33 0 0 0 0 0 0 0 ]}) :location 51}")
> > (2 rows)
> >
> > test_pub=# SELECT pg_get_publication_tables(VARIADIC
> > array_agg(p.pubname)) from pg_publication p where pubname
> > IN('pub1','pub2');
> >
> >  pg_get_publication_tables
> >
> > -
> > -
> > 

Re: fix stats_fetch_consistency value in postgresql.conf.sample

2022-10-21 Thread Kyotaro Horiguchi
At Fri, 21 Oct 2022 11:58:15 +1100, Peter Smith  wrote 
in 
> Hi, I was hoping to use this patch in my other thread [1], but your
> latest attachment is reported broken in cfbot [2]. Please rebase it.

Ouch. I hadn't reached that part yet. I'll do that next Monday.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Standby recovers records from wrong timeline

2022-10-21 Thread Kyotaro Horiguchi
At Fri, 21 Oct 2022 17:12:45 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> latest works. It doesn't consider the case of explicit target timelines
> so it's just a PoC.  (So this doesn't work if recovery_target_timeline
> is set to 2 for the "standby" in the repro.)

So, finally I noticed that the function XLogFileReadAnyTLI is not
needed at all if we are going this direction.

Regardless of whether recovery_target_timeline is latest, an explicit
timeline id, or the checkpoint timeline, what we can do to reach the target
timeline is just to follow the history file's direction.

If segments are partly gone while reading on a timeline, a segment on
the older timelines is just crap since it should be incompatible.

So.. I'm at a loss about what the function is for.

Could anyone tell me why we need the behavior of
XLogFileReadAnyTLI() at all?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: cross-platform pg_basebackup

2022-10-21 Thread davinder singh
Hi,
Patch v2 looks good to me, I have tested it, and pg_basebackup works fine
across the platforms (Windows to Linux and Linux to Windows).
Syntax used for testing
$ pg_basebackup -h remote_server_ip -p 5432 -U user_name -D backup/data -T
olddir=newdir

I have also tested with non-absolute paths, it behaves as expected.
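
Regarding the function-style macro point quoted below, a small
illustration of the difference:

#define IS_NONWINDOWS_DIR_SEP(ch)	((ch) == '/')

/* Function-style alias: expands only where it is followed by an
 * argument list, and documents the expected argument. */
#define IS_DIR_SEP(ch)	IS_NONWINDOWS_DIR_SEP(ch)

/* Object-like alias: would replace the bare identifier IS_DIR_SEP
 * wherever it appears as a token, even with no arguments following,
 * which is the extra expansion Tom mentions. */
/* #define IS_DIR_SEP IS_NONWINDOWS_DIR_SEP */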

On Fri, Oct 21, 2022 at 12:42 AM Andrew Dunstan  wrote:

>
> On 2022-10-20 Th 14:47, Robert Haas wrote:
> > On Thu, Oct 20, 2022 at 1:28 PM Tom Lane  wrote:
> >> Robert Haas  writes:
> >>> Cool. Here's a patch.
> >> LGTM, except I'd be inclined to ensure that all the macros
> >> are function-style, ie
> >>
> >> +#define IS_DIR_SEP(ch) IS_NONWINDOWS_DIR_SEP(ch)
> >>
> >> not just
> >>
> >> +#define IS_DIR_SEP IS_NONWINDOWS_DIR_SEP
> >>
> >> I don't recall the exact rules, but I know that the second style
> >> can lead to expanding the macro in more cases, which we likely
> >> don't want.  It also seems like better documentation to show
> >> the expected arguments.
> > OK, thanks. v2 attached.
> >
>
>
> Looks good.
>
>
> cheers
>
>
> andrew
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>
>
>

-- 
Regards,
Davinder
EnterpriseDB: http://www.enterprisedb.com


Re: Standby recovers records from wrong timeline

2022-10-21 Thread Kyotaro Horiguchi
At Fri, 21 Oct 2022 16:45:59 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Thu, 20 Oct 2022 14:44:40 +0300, Ants Aasma  wrote in 
> > My understanding is that backup archives are supposed to remain valid
> > even after PITR or equivalently a lagging standby promoting.
> 
> Sorry, I was dim, maybe because I'm catching a cold :p
> 
> On second thought, everything works fine if the first segment of the
> new timeline is archived in this case. So the problem here is whether
> recovery should wait for a known new timeline when no segment on the
> new timeline is available yet.  As you say, I think it is sensible
> that recovery waits at the divergence LSN for the first segment on the
> new timeline before proceeding on the same timeline.

It is simpler than anticipated.  Just not descending timelines when
latest works. It doesn't consider the case of explicit target timelines
so it's just a PoC.  (So this doesn't work if recovery_target_timeline
is set to 2 for the "standby" in the repro.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index cb07694aea..18c14d1fbf 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -4223,6 +4223,13 @@ XLogFileReadAnyTLI(XLogSegNo segno, int emode, XLogSource source)
 return fd;
 			}
 		}
+
+		/*
+		 * If recovery_target_timeline is "latest", we don't want to read a
+		 * segment that belongs to an older timeline.
+		 */
+		if (recoveryTargetTimeLineGoal == RECOVERY_TARGET_TIMELINE_LATEST)
+			break;
 	}
 
 	/* Couldn't find it.  For simplicity, complain about front timeline */


Re: Standby recovers records from wrong timeline

2022-10-21 Thread Kyotaro Horiguchi
At Thu, 20 Oct 2022 14:44:40 +0300, Ants Aasma  wrote in 
> My understanding is that backup archives are supposed to remain valid
> even after PITR or equivalently a lagging standby promoting.

Sorry, I was dim, maybe because I'm catching a cold :p

On second thought, everything works fine if the first segment of the
new timeline is archived in this case. So the problem here is whether
recovery should wait for a known new timeline when no segment on the
new timeline is available yet.  As you say, I think it is sensible
that recovery waits at the divergence LSN for the first segment on the
new timeline before proceeding on the same timeline.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Crash after a call to pg_backup_start()

2022-10-21 Thread Kyotaro Horiguchi
While playing with a proposed patch, I noticed that a session crashes
after a failed call to pg_backup_start().

postgres=# select pg_backup_start(repeat('x', 1026));
ERROR:  backup label too long (max 1024 bytes)
postgres=# \q
> TRAP: failed Assert("during_backup_start ^ (sessionBackupState == 
> SESSION_BACKUP_RUNNING)"), File: "xlog.c", Line: 8846, PID: 165835

Surprisingly, this also happens after a series of successful calls to
pg_backup_start and stop.

postgres=# select pg_backup_start('x');
postgres=# select pg_backup_stop();
postgres=# \q
> TRAP: failed Assert("durin..


>> do_pg_abort_backup(int code, Datum arg)
>   /* Only one of these conditions can be true */
>   Assert(during_backup_start ^
>  (sessionBackupState == SESSION_BACKUP_RUNNING));

It seems to me that the comment is true and the condition is a thinko.
This is introduced by df3737a651 so it is master only.

Please find the attached fix.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dea978a962..888f5b1bff 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8842,8 +8842,8 @@ do_pg_abort_backup(int code, Datum arg)
 	bool		during_backup_start = DatumGetBool(arg);
 
 	/* Only one of these conditions can be true */
-	Assert(during_backup_start ^
-		   (sessionBackupState == SESSION_BACKUP_RUNNING));
+	Assert(!(during_backup_start &&
+			 (sessionBackupState == SESSION_BACKUP_RUNNING)));
 
 	if (during_backup_start || sessionBackupState != SESSION_BACKUP_NONE)
 	{


Re: Avoid memory leaks during base backups

2022-10-21 Thread Kyotaro Horiguchi
At Thu, 20 Oct 2022 19:47:07 +0200, Alvaro Herrera  
wrote in 
> I agree that's a good idea, and the patch looks good to me, but I don't
> think asserting that they are null afterwards is useful.

+1 for this direction. And the patch looks fine to me.


>   oldcontext = MemoryContextSwitchTo(backupcontext);
>   Assert(backup_state == NULL);
>   Assert(tablespace_map == NULL);
>   backup_state = (BackupState *) palloc0(sizeof(BackupState));
>   tablespace_map = makeStringInfo();
>   MemoryContextSwitchTo(oldcontext);

We could use MemoryContextAllocZero() for this purpose, but of course
it's not mandatory.
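
Sketched, that alternative would be:

	backup_state = (BackupState *)
		MemoryContextAllocZero(backupcontext, sizeof(BackupState));

(makeStringInfo() for tablespace_map would still want the surrounding
MemoryContextSwitchTo(), though.)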


+* across. We keep the memory allocated in this memory context less,
+* because any error before reaching pg_backup_stop() can leak the 
memory
+* until pg_backup_start() is called again. While this is not smart, it
+* helps to keep things simple.

I think the "less" is somewhat obscure.  I feel we should be more
explicit. And we don't need to put emphasis on "leak".  I recklessly
propose this as a draft.

"The context is intended to be used by this function to store only
session-lifetime values.  It is, if left alone, reset at the next call
to blow away orphan memory blocks from the previous failed call.
While this is not smart, it helps to keep things simple."

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Avoid memory leaks during base backups

2022-10-21 Thread Michael Paquier
On Fri, Oct 21, 2022 at 11:34:27AM +0530, Bharath Rupireddy wrote:
> On Fri, Oct 21, 2022 at 12:21 AM Robert Haas  wrote:
>> On Thu, Oct 20, 2022 at 1:35 PM Bharath Rupireddy 
>>  wrote:
>>> I think elsewhere in the code we reset dangling pointers either ways -
>>> before or after deleting/resetting memory context. But placing them
>>> before would give us extra safety in case memory context
>>> deletion/reset fails. Not sure what's the best way.
>>
>> I think it's OK to assume that deallocating memory will always
>> succeed, so it doesn't matter whether you do it just before or just
>> after that. But it's not OK to assume that *allocating* memory will
>> always succeed.
> 
> Right.

To be exact, it seems to me that tablespace_map and backup_state
should be reset before deleting backupcontext, but the reset of
backupcontext should happen after the fact.
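
In code form, the suggested ordering would be:

	/* Clear the dangling pointers first, then drop the context that
	 * owned them. */
	backup_state = NULL;
	tablespace_map = NULL;
	MemoryContextDelete(backupcontext);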

+   backup_state = NULL;
tablespace_map = NULL;
These two in pg_backup_start() don't matter, do they?  They are
reallocated a couple of lines down.

+* across. We keep the memory allocated in this memory context less,
What does "We keep the memory allocated in this memory context less"
mean here? 
--
Michael


signature.asc
Description: PGP signature


Re: Avoid memory leaks during base backups

2022-10-21 Thread Michael Paquier
On Thu, Oct 20, 2022 at 02:51:21PM -0400, Robert Haas wrote:
> On Thu, Oct 20, 2022 at 1:35 PM Bharath Rupireddy
>  wrote:
>> I think elsewhere in the code we reset dangling pointers either ways -
>> before or after deleting/resetting memory context. But placing them
>> before would give us extra safety in case memory context
>> deletion/reset fails. Not sure what's the best way.
> 
> I think it's OK to assume that deallocating memory will always
> succeed, so it doesn't matter whether you do it just before or just
> after that. But it's not OK to assume that *allocating* memory will
> always succeed.

AFAIK, one of the callbacks associated with a memory context could
fail, see comments before MemoryContextCallResetCallbacks() in
MemoryContextDelete().  I agree that it should not matter here, but I
think that it is better to reset the pointers before attempting the
deletion of the memory context in this case.
--
Michael


signature.asc
Description: PGP signature


Re: Avoid memory leaks during base backups

2022-10-21 Thread Bharath Rupireddy
On Thu, Oct 20, 2022 at 11:17 PM Alvaro Herrera  wrote:
>
> On 2022-Oct-20, Bharath Rupireddy wrote:
>
> > I think elsewhere in the code we reset dangling pointers either ways -
> > before or after deleting/resetting memory context. But placing them
> > before would give us extra safety in case memory context
> > deletion/reset fails. Not sure what's the best way. However, I'm
> > nullifying the dangling pointers after deleting/resetting memory
> > context.
>
> I agree that's a good idea, and the patch looks good to me, but I don't
> think asserting that they are null afterwards is useful.

+1. Removed those assertions. Please see the attached v9 patch.

On Fri, Oct 21, 2022 at 12:21 AM Robert Haas  wrote:
>
> On Thu, Oct 20, 2022 at 1:35 PM Bharath Rupireddy
>  wrote:
> > I think elsewhere in the code we reset dangling pointers either ways -
> > before or after deleting/resetting memory context. But placing them
> > before would give us extra safety in case memory context
> > deletion/reset fails. Not sure what's the best way.
>
> I think it's OK to assume that deallocating memory will always
> succeed, so it doesn't matter whether you do it just before or just
> after that. But it's not OK to assume that *allocating* memory will
> always succeed.

Right.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v9-0001-Avoid-memory-leaks-during-backups-using-SQL-calla.patch
Description: Binary data


Re: making relfilenodes 56 bits

2022-10-21 Thread Michael Paquier
On Thu, Sep 29, 2022 at 09:23:38PM -0400, Tom Lane wrote:
> Hmmm ... I'd tend to do SELECT COUNT(*) FROM.  But can't we provide
> any actual checks on the sanity of the output?  I realize that the
> output's far from static, but still ...

Honestly, checking all the fields is not that exciting, but the most
I can think of that would be portable enough is something like the
attached.  The lack of arithmetic operators for xid limits things a
bit, but at least that's something.

Thoughts?
--
Michael
diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index 9f106c2a10..38987e2afc 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -594,3 +594,89 @@ SELECT * FROM tenk1 a JOIN my_gen_series(1,10) g ON a.unique1 = g;
  Index Cond: (unique1 = g.g)
 (4 rows)
 
+--
+-- Test functions for control data
+--
+\x
+SELECT checkpoint_lsn > '0/0'::pg_lsn AS checkpoint_lsn,
+redo_lsn > '0/0'::pg_lsn AS redo_lsn,
+redo_wal_file IS NOT NULL AS redo_wal_file,
+timeline_id > 0 AS timeline_id,
+prev_timeline_id > 0 AS prev_timeline_id,
+next_xid IS NOT NULL AS next_xid,
+next_oid > 0 AS next_oid,
+next_multixact_id != '0'::xid AS next_multixact_id,
+next_multi_offset IS NOT NULL AS next_multi_offset,
+oldest_xid != '0'::xid AS oldest_xid,
+oldest_xid_dbid > 0 AS oldest_xid_dbid,
+oldest_active_xid != '0'::xid AS oldest_active_xid,
+oldest_multi_xid != '0'::xid AS oldest_multi_xid,
+oldest_multi_dbid > 0 AS oldest_multi_dbid,
+oldest_commit_ts_xid IS NOT NULL AS oldest_commit_ts_xid,
+newest_commit_ts_xid IS NOT NULL AS newest_commit_ts_xid
+  FROM pg_control_checkpoint();
+-[ RECORD 1 ]+--
+checkpoint_lsn   | t
+redo_lsn | t
+redo_wal_file| t
+timeline_id  | t
+prev_timeline_id | t
+next_xid | t
+next_oid | t
+next_multixact_id| t
+next_multi_offset| t
+oldest_xid   | t
+oldest_xid_dbid  | t
+oldest_active_xid| t
+oldest_multi_xid | t
+oldest_multi_dbid| t
+oldest_commit_ts_xid | t
+newest_commit_ts_xid | t
+
+SELECT max_data_alignment > 0 AS max_data_alignment,
+database_block_size > 0 AS database_block_size,
+blocks_per_segment > 0 AS blocks_per_segment,
+wal_block_size > 0 AS wal_block_size,
+max_identifier_length > 0 AS max_identifier_length,
+max_index_columns > 0 AS max_index_columns,
+max_toast_chunk_size > 0 AS max_toast_chunk_size,
+large_object_chunk_size > 0 AS large_object_chunk_size,
+float8_pass_by_value IS NOT NULL AS float8_pass_by_value,
+data_page_checksum_version >= 0 AS data_page_checksum_version
+  FROM pg_control_init();
+-[ RECORD 1 ]--+--
+max_data_alignment | t
+database_block_size| t
+blocks_per_segment | t
+wal_block_size | t
+max_identifier_length  | t
+max_index_columns  | t
+max_toast_chunk_size   | t
+large_object_chunk_size| t
+float8_pass_by_value   | t
+data_page_checksum_version | t
+
+SELECT min_recovery_end_lsn >= '0/0'::pg_lsn AS min_recovery_end_lsn,
+min_recovery_end_timeline >= 0 AS min_recovery_end_timeline,
+backup_start_lsn >= '0/0'::pg_lsn AS backup_start_lsn,
+backup_end_lsn >= '0/0'::pg_lsn AS backup_end_lsn,
+end_of_backup_record_required IS NOT NULL AS end_of_backup_record_required
+  FROM pg_control_recovery();
+-[ RECORD 1 ]-+--
+min_recovery_end_lsn  | t
+min_recovery_end_timeline | t
+backup_start_lsn  | t
+backup_end_lsn| t
+end_of_backup_record_required | t
+
+SELECT pg_control_version > 0 AS pg_control_version,
+catalog_version_no > 0 AS catalog_version_no,
+system_identifier >= 0 AS system_identifier,
+pg_control_last_modified <= now() AS pg_control_last_modified
+  FROM pg_control_system();
+-[ RECORD 1 ]+--
+pg_control_version   | t
+catalog_version_no   | t
+system_identifier| t
+pg_control_last_modified | t
+
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index 639e9b352c..986e07c3a5 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -223,3 +223,47 @@ SELECT * FROM tenk1 a JOIN my_gen_series(1,1000) g ON a.unique1 = g;
 
 EXPLAIN (COSTS OFF)
 SELECT * FROM tenk1 a JOIN my_gen_series(1,10) g ON a.unique1 = g;
+
+--
+-- Test functions for control data
+--
+\x
+SELECT checkpoint_lsn > '0/0'::pg_lsn AS checkpoint_lsn,
+redo_lsn > '0/0'::pg_lsn AS redo_lsn,
+redo_wal_file IS NOT NULL AS redo_wal_file,
+timeline_id > 0 AS timeline_id,
+prev_timeline_id > 0 AS prev_timeline_id,
+next_xid IS NOT NULL AS next_xid,
+next_oid > 0 AS next_oid,
+next_multixact_id != '0'::xid AS next_multixact_id,
+next_multi_offset IS NOT NULL AS next_multi_offset,
+oldest_xid !=