Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-10 Thread Amit Kapila
On Fri, Mar 10, 2023 at 7:58 PM Önder Kalacı  wrote:
>
>
> I think one option could be to drop some cases altogether, but I'm not
> sure we'd want that.
>
> As a semi-related question: are you aware of any setting that'd make
> pg_stat_all_indexes reflect the changes sooner? It is hard to debug
> what the bottleneck in the tests is, but I suspect that the several
> poll_query_until() calls on pg_stat_all_indexes might be the reason.
>

Yeah, I also think the poll_query_until() calls on pg_stat_all_indexes
are the main reason for these tests taking more time. When I commented
out those polls, the test time dropped drastically. Looking at
pgstat_report_stat(), it seems we don't report stats sooner than once
per second (PGSTAT_MIN_INTERVAL), and as most of this patch's tests
rely on stats, that adds up. I don't have a better idea for verifying
this patch than checking whether the index scan is really used, by
referring to pg_stat_all_indexes. I think trying to reduce the number
of poll calls may help reduce the test timings further. Some ideas
along those lines are as follows:
1.
+# Testcase start: SUBSCRIPTION USES INDEX WITH PUB/SUB DIFFERENT DATA VIA
+# A UNIQUE INDEX THAT IS NOT PRIMARY KEY OR REPLICA IDENTITY

No need for a separate Delete test here.

2.
+$node_publisher->safe_psql('postgres',
+ "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 1;");
+$node_publisher->safe_psql('postgres',
+ "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 3;");
+
+# check if the index is used even when the index has NULL values
+$node_subscriber->poll_query_until(
+ 'postgres', q{select idx_scan=2 from pg_stat_all_indexes where
indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for check subscriber
tap_sub_rep_full updates test_replica_id_full table";

Here, I think only one update is sufficient.

3.
+$node_subscriber->safe_psql('postgres',
+ "CREATE INDEX people_last_names ON people(lastname)");
+
+# wait until the index is created
+$node_subscriber->poll_query_until(
+ 'postgres', q{select count(*)=1 from pg_stat_all_indexes where
indexrelname = 'people_last_names';}
+) or die "Timed out while waiting for creating index people_last_names";

I don't think we need this poll.

4.
+# update 2 rows
+$node_publisher->safe_psql('postgres',
+ "UPDATE people SET firstname = 'no-name' WHERE firstname = 'first_name_1';");
+$node_publisher->safe_psql('postgres',
+ "UPDATE people SET firstname = 'no-name' WHERE firstname =
'first_name_3' AND lastname = 'last_name_3';");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+ 'postgres', q{select idx_scan=2 from pg_stat_all_indexes where
indexrelname = 'people_names';}
+) or die "Timed out while waiting for check subscriber
tap_sub_rep_full updates two rows via index scan with index on
expressions and columns";
+
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM people WHERE firstname = 'no-name';");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+ 'postgres', q{select idx_scan=4 from pg_stat_all_indexes where
indexrelname = 'people_names';}
+) or die "Timed out while waiting for check subscriber
tap_sub_rep_full deletes two rows via index scan with index on
expressions and columns";

I think having one update or delete should be sufficient.

5.
+# update rows, moving them to other partitions
+$node_publisher->safe_psql('postgres',
+ "UPDATE users_table_part SET value_1 = 0 WHERE user_id = 4;");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+ 'postgres', q{select sum(idx_scan)=1 from pg_stat_all_indexes where
indexrelname ilike 'users_table_part_%';}
+) or die "Timed out while waiting for updates on partitioned table with index";
+
+# delete rows from different partitions
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM users_table_part WHERE user_id = 1 and value_1 = 1;");
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM users_table_part WHERE user_id = 12 and value_1 = 12;");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+ 'postgres', q{select sum(idx_scan)=3 from pg_stat_all_indexes where
indexrelname ilike 'users_table_part_%';}
+) or die "Timed out while waiting for check subscriber
tap_sub_rep_full updates partitioned table";
+

Can we combine these two polls?

6.
+# Testcase start: SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS, ALSO
+# DROPS COLUMN

In this test, let's try to update/delete 2-3 rows instead of 20. And
after dropping the columns, let's keep just one of the update or delete.

7. Apart from the above, I think it is better to use
wait_for_catchup() consistently before trying to verify the data on
the subscriber. We always use it in other tests. I guess here you are
relying on the poll for index scans to ensure that data is replicated
but I feel it may still be better to use wait_for_catchup().
Similarly, wait_for_subscription_sync uses the 

Re: Add LZ4 compression in pg_dump

2023-03-10 Thread Alexander Lakhin

Hello,
23.02.2023 23:24, Tomas Vondra wrote:

On 2/23/23 16:26, Tomas Vondra wrote:

Thanks for v30 with the updated commit messages. I've pushed 0001 after
fixing a comment typo and removing (I think) an unnecessary change in an
error message.

I'll give the buildfarm a bit of time before pushing 0002 and 0003.


I've now pushed 0002 and 0003, after minor tweaks (a couple typos etc.),
and marked the CF entry as committed. Thanks for the patch!

I wonder how difficult it would be to add zstd compression, so that
we don't have the annoying "unsupported" cases.


With the patch 0003 committed, a single warning -Wtype-limits appeared in the
master branch:
$ CPPFLAGS="-Og -Wtype-limits" ./configure --with-lz4 -q && make -s -j8
compress_lz4.c: In function ‘LZ4File_gets’:
compress_lz4.c:492:19: warning: comparison of unsigned expression in ‘< 0’ is always false [-Wtype-limits]
  492 |         if (dsize < 0)
(I wonder: is it accidental that there are no other places that trigger
the warning, or did some buildfarm animals have this check enabled before?)
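
For reference, here is a standalone sketch of the pitfall; the names and
types are illustrative, not the actual compress_lz4.c declarations. Once
the variable is unsigned, an error return of -1 wraps around and the
check becomes dead code:

#include <stdio.h>
#include <stdlib.h>

/* stand-in for LZ4File_read_internal() returning -1 on error */
static size_t read_internal(void) { return (size_t) -1; }

int main(void)
{
	size_t		dsize = read_internal();

	if (dsize < 0)				/* always false: dsize is unsigned */
		return EXIT_FAILURE;

	/* prints 18446744073709551615 on a 64-bit build */
	printf("dsize: %zu\n", dsize);
	return EXIT_SUCCESS;
}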

It is not a false positive, as can be proved by modifying 002_pg_dump.pl as
follows:
-   program => $ENV{'LZ4'},
+   program => 'mv',
    args    => [
-   '-z', '-f', '--rm',
"$tempdir/compression_lz4_dir/blobs.toc",
"$tempdir/compression_lz4_dir/blobs.toc.lz4",
    ],
    },
Diagnostic logging that I added shows:
LZ4File_gets() after LZ4File_read_internal; dsize: 18446744073709551615

and pg_restore fails with:
error: invalid line in large object TOC file 
".../src/bin/pg_dump/tmp_check/tmp_test_22ri/compression_lz4_dir/blobs.toc": ""


Best regards,
Alexander




Re: Dead code in ps_status.c

2023-03-10 Thread Andres Freund
Hi,

On 2023-03-11 16:59:46 +1300, Thomas Munro wrote:
> On Fri, Feb 17, 2023 at 3:38 AM Tom Lane  wrote:
> > Thomas Munro  writes:
> > > On Thu, Feb 16, 2023 at 6:34 PM Tom Lane  wrote:
> > > My GCC compile farm account seems to have expired, or something, so I
> > > couldn't check on wrasse's host (though whether wrasse is "live" is
> > > debatable: Solaris 11.3 has reached EOL, it's just that the CPU is too
> > > old to be upgraded, so it's not testing a real OS that anyone would
> > > actually run PostgreSQL on).  ...
>
> > My account still works, and what I see on wrasse's host is
>
> Just in case it helps someone else who finds themselves locked out of
> that, I noticed that I can still connect from my machine with OpenSSH
> 8.8p1, but not from another dev box which was upgraded to OpenSSH
> 9.2p1.  For reasons I didn't look into, the latter doesn't like
> exchanging 1s and 0s with "Sun_SSH_2.4" (something Oracle has
> apparently now abandoned in favour of stock OpenSSH, but that machine
> is stuck in time).

It's the key types supported by the old ssh. I have the following in my
~/.ssh/config to work around that:

Host gcc210.fsffrance.org
PubkeyAcceptedKeyTypes +ssh-rsa
KexAlgorithms +diffie-hellman-group1-sha1
Host gcc211.fsffrance.org
PubkeyAcceptedKeyTypes +ssh-rsa

Greetings,

Andres Freund




Re: proposal - get_extension_version function

2023-03-10 Thread Pavel Stehule
Hi


On Wed, 8 Mar 2023 at 17:58, Pavel Stehule wrote:

> Hi
>
> I'm trying to write a safeguard check that ensures the expected extension
> version for an extension library.
>
> Something like
>
> const char *expected_extversion = "2.5";
>
> ...
>
> extoid = getExtensionOfObject(ProcedureRelationId,
>   fcinfo->flinfo->fn_oid);
> extversion = get_extension_version(extoid);
> if (strcmp(expected_extversion, extversion) != 0)
>     elog(ERROR, "extension \"%s\" needs \"ALTER EXTENSION %s UPDATE\"",
>          get_extension_name(extoid),
>          get_extension_name(extoid));
>
> Currently the extension version is not simply readable - I need to read
> directly from the table.
>
> Notes, comments?
>

attached patch

Regards

Pavel


>
> Regards
>
> Pavel
>
>
From b12a43d885af4311dbb61684d65b6f5fc41b60d2 Mon Sep 17 00:00:00 2001
From: "ok...@github.com" 
Date: Sat, 11 Mar 2023 05:04:28 +0100
Subject: [PATCH] get_extension_version - given an extension OID, look up the
 version

Returns a palloc'd string, or NULL if no such extension.
---
 src/backend/commands/extension.c | 50 
 src/include/commands/extension.h |  1 +
 2 files changed, 51 insertions(+)

diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 02ff4a9a7f..ea347750fe 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -256,6 +256,56 @@ get_extension_schema(Oid ext_oid)
 	return result;
 }
 
+/*
+ * get_extension_version - given an extension OID, look up the version
+ *
+ * Returns a palloc'd string, or NULL if no such extension.
+ */
+char *
+get_extension_version(Oid ext_oid)
+{
+	char	   *result;
+	Relation	rel;
+	SysScanDesc scandesc;
+	HeapTuple	tuple;
+	ScanKeyData entry[1];
+
+	rel = table_open(ExtensionRelationId, AccessShareLock);
+
+	ScanKeyInit(&entry[0],
+Anum_pg_extension_oid,
+BTEqualStrategyNumber, F_OIDEQ,
+ObjectIdGetDatum(ext_oid));
+
+	scandesc = systable_beginscan(rel, ExtensionOidIndexId, true,
+  NULL, 1, entry);
+
+	tuple = systable_getnext(scandesc);
+
+	/* We assume that there can be at most one matching tuple */
+	if (HeapTupleIsValid(tuple))
+	{
+		Datum		datum;
+		bool		isnull;
+
+		datum = heap_getattr(tuple, Anum_pg_extension_extversion,
+							 RelationGetDescr(rel), &isnull);
+
+		if (isnull)
+			elog(ERROR, "extversion is null");
+
+		result = text_to_cstring(DatumGetTextPP(datum));
+	}
+	else
+		result = NULL;
+
+	systable_endscan(scandesc);
+
+	table_close(rel, AccessShareLock);
+
+	return result;
+}
+
 /*
  * Utility functions to check validity of extension and version names
  */
diff --git a/src/include/commands/extension.h b/src/include/commands/extension.h
index 74ae391395..3563e07b9f 100644
--- a/src/include/commands/extension.h
+++ b/src/include/commands/extension.h
@@ -48,6 +48,7 @@ extern ObjectAddress ExecAlterExtensionContentsStmt(AlterExtensionContentsStmt *
 extern Oid	get_extension_oid(const char *extname, bool missing_ok);
 extern char *get_extension_name(Oid ext_oid);
 extern Oid	get_extension_schema(Oid ext_oid);
+extern char *get_extension_version(Oid ext_oid);
 extern bool extension_file_exists(const char *extensionName);
 
 extern ObjectAddress AlterExtensionNamespace(const char *extensionName, const char *newschema,
-- 
2.39.2



Re: cpluspluscheck vs ICU

2023-03-10 Thread Andres Freund
Hi,

On 2023-03-10 19:37:27 -0800, Andres Freund wrote:
> On 2022-03-23 08:56:17 -0700, Andres Freund wrote:
> > On 2022-03-23 08:19:38 -0400, Andrew Dunstan wrote:
> > > On 3/22/22 22:23, Andres Freund wrote:
> > > That only helps when running the CI/cfbot setup. Fixing it for other
> > > (manual or buildfarm) users would be nice. Luckily crake isn't building
> > > with ICU.
> >
> > Oh, I agree we need to fix it properly. I just don't yet know how to - see 
> > the
> > list of alternatives upthread. Seems no reason to hold up preventing further
> > problems via CI / cfbot though.
> 
> I just hit this once more - and I figured out a fairly easy fix:
> 
> We just need a
>   #ifndef U_DEFAULT_SHOW_DRAFT
>   #define U_DEFAULT_SHOW_DRAFT 0
>   #endif
> before including unicode/ucol.h.
> 
> At first I was looking at
>   #define U_SHOW_CPLUSPLUS_API 0
> and
>   #define U_HIDE_INTERNAL_API 1
> which both work, but they are documented to be internal.

Err. Unfortunately, only the U_SHOW_CPLUSPLUS_API approach actually works. The
others don't; I'm not quite sure what I was doing earlier.

So it's either relying on a define marked as internal, or the below:

> Alternatively we could emit U_DEFAULT_SHOW_DRAFT 0 into pg_config.h to avoid
> that issue.
> 
> 
> The only other thing I see is to do something like:
> 
> #ifdef USE_ICU
> #ifdef __cplusplus
> /* close extern "C", otherwise we'll get errors from within ICU */
> }
> #endif /* __cplusplus */
> 
> #include <unicode/ucol.h>
> 
> #ifdef __cplusplus
> extern "C" {
> #endif /* __cplusplus */
> 
> #endif /* USE_ICU */
> 
> which seems mighty ugly.

Greetings,

Andres Freund




Re: Dead code in ps_status.c

2023-03-10 Thread Thomas Munro
On Fri, Feb 17, 2023 at 3:38 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > On Thu, Feb 16, 2023 at 6:34 PM Tom Lane  wrote:
> > My GCC compile farm account seems to have expired, or something, so I
> > couldn't check on wrasse's host (though whether wrasse is "live" is
> > debatable: Solaris 11.3 has reached EOL, it's just that the CPU is too
> > old to be upgraded, so it's not testing a real OS that anyone would
> > actually run PostgreSQL on).  ...

> My account still works, and what I see on wrasse's host is

Just in case it helps someone else who finds themselves locked out of
that, I noticed that I can still connect from my machine with OpenSSH
8.8p1, but not from another dev box which was upgraded to OpenSSH
9.2p1.  For reasons I didn't look into, the latter doesn't like
exchanging 1s and 0s with "Sun_SSH_2.4" (something Oracle has
apparently now abandoned in favour of stock OpenSSH, but that machine
is stuck in time).




Re: cpluspluscheck vs ICU

2023-03-10 Thread Andres Freund
Hi,

On 2022-03-23 08:56:17 -0700, Andres Freund wrote:
> On 2022-03-23 08:19:38 -0400, Andrew Dunstan wrote:
> > On 3/22/22 22:23, Andres Freund wrote:
> > That only helps when running the CI/cfbot setup. Fixing it for other
> > (manual or buildfarm) users would be nice. Luckily crake isn't building
> > with ICU.
>
> Oh, I agree we need to fix it properly. I just don't yet know how to - see the
> list of alternatives upthread. Seems no reason to hold up preventing further
> problems via CI / cfbot though.

I just hit this once more - and I figured out a fairly easy fix:

We just need a
  #ifndef U_DEFAULT_SHOW_DRAFT
  #define U_DEFAULT_SHOW_DRAFT 0
  #endif
before including unicode/ucol.h.

At first I was looking at
  #define U_SHOW_CPLUSPLUS_API 0
and
  #define U_HIDE_INTERNAL_API 1
which both work, but they are documented to be internal.


The reason for the #ifndef is that pg_locale.h might be included by .c files
that already included ICU headers, which would otherwise cause macro
redefinition warnings.  E.g. in formatting.c.

Alternatively we could emit U_DEFAULT_SHOW_DRAFT 0 into pg_config.h to avoid
that issue.


The only other thing I see is to do something like:

#ifdef USE_ICU
#ifdef __cplusplus
/* close extern "C", otherwise we'll get errors from within ICU */
}
#endif /* __cplusplus */

#include <unicode/ucol.h>

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#endif /* USE_ICU */

which seems mighty ugly.


Regards,

Andres




Re: pg_dump versus hash partitioning

2023-03-10 Thread Julien Rouhaud
On Fri, Mar 10, 2023 at 10:10:14PM -0500, Tom Lane wrote:
> Julien Rouhaud  writes:
> > Working on a side project that can cause dumps of hash partitions to be
> > routed to a different partition, I realized that --load-via-partition-root
> > can indeed cause deadlocks in such cases without an FK dependency or
> > anything else.
>
> > The problem is that each worker will perform a TRUNCATE TABLE ONLY followed
> > by a copy of the original partition's data in a transaction, and that
> > obviously will lead to deadlock if the original (and locked) partition and
> > the restored partition are different.
>
> Oh, interesting.  I wonder if we can rearrange things to avoid that.

The BEGIN + TRUNCATE is only there to avoid generating WAL records just in case
the wal_level is minimal.  I don't remember if that optimization still exists,
but if it does, we could skip it when the server's wal_level is replica or
higher?  That's not perfect, but it would help in many cases.
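
For illustration, a hypothetical sketch of such a check on the restore side
(the helper name is invented and the surrounding pg_restore plumbing is
elided; PQexec(), PQntuples(), PQgetvalue(), and PQclear() are the real
libpq calls):

#include <stdbool.h>
#include <string.h>
#include "libpq-fe.h"

/* Return true if the target server runs with wal_level = minimal, in
 * which case the BEGIN + TRUNCATE trick can still skip WAL. */
static bool
server_wal_level_is_minimal(PGconn *conn)
{
	PGresult   *res = PQexec(conn, "SHOW wal_level");
	bool		minimal = false;

	if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) == 1)
		minimal = (strcmp(PQgetvalue(res, 0, 0), "minimal") == 0);
	PQclear(res);
	return minimal;
}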




Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-10 Thread Amit Kapila
On Fri, Mar 10, 2023 at 3:25 PM Önder Kalacı  wrote:
>
>>
>
> Done with an important caveat. I think this reorganization of the test helped
> us to find one edge case regarding dropped columns.
>
> I realized that the dropped columns also get into the tuples_equal()
> function. And, the remote sends NULL for the dropped columns (i.e.,
> remoteslot), but index_getnext_slot() (or table_scan_getnextslot()) indeed
> fills the dropped columns in the outslot. So, the dropped columns are not
> NULL in the outslot.
>
> This causes tuples_equal() to fail. To fix that, I improved tuples_equal()
> such that it skips the dropped columns.
>

Good catch. By any chance, have you tried with generated columns? See
logicalrep_write_tuple()/logicalrep_write_attrs() where we neither
send anything for dropped columns nor for generated columns.
Similarly, on the receiving side, in logicalrep_rel_open() and
slot_store_data(), we seem to be using NULL for such columns.
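
(For reference, a minimal sketch of the shape of the tuples_equal() change
being described; it mirrors the column loop in
src/backend/executor/execReplication.c but is an illustration, not the
actual diff:)

	for (int attrnum = 0; attrnum < slot1->tts_tupleDescriptor->natts; attrnum++)
	{
		Form_pg_attribute att = TupleDescAttr(slot1->tts_tupleDescriptor,
											  attrnum);

		/*
		 * Skip dropped columns: the remote side sends NULL for them, while
		 * a local index or sequential scan may leave non-NULL values in
		 * the slot.
		 */
		if (att->attisdropped)
			continue;

		/* ... existing per-column equality check on slot1/slot2 ... */
	}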

> I also spent quite a bit of time understanding how/why this impacts
> HEAD. See the steps below on HEAD, where REPLICA IDENTITY FULL
> fails to replicate the data properly:
>
>
> -- pub
> CREATE TABLE test (drop_1 jsonb, x int, drop_2 numeric, y text, drop_3 
> timestamptz);
> ALTER TABLE test REPLICA IDENTITY FULL;
> INSERT INTO test SELECT NULL, i, i, (i)::text, now() FROM 
> generate_series(0,1)i;
> CREATE PUBLICATION pub FOR ALL TABLES;
>
> -- sub
> CREATE TABLE test (drop_1 jsonb, x int, drop_2 numeric, y text, drop_3 
> timestamptz);
> CREATE SUBSCRIPTION sub CONNECTION 'host=localhost port=5432 dbname=postgres' 
> PUBLICATION pub;
>
> -- show that before dropping the columns, the data in the source and
> -- target are deleted properly
> DELETE FROM test WHERE x = 0;
>
> -- both on the source and target
> SELECT count(*) FROM test WHERE x = 0;
> ┌───┐
> │ count │
> ├───┤
> │ 0 │
> └───┘
> (1 row)
>
> -- drop the columns on the source
> ALTER TABLE test DROP COLUMN drop_1;
> ALTER TABLE test DROP COLUMN drop_2;
> ALTER TABLE test DROP COLUMN drop_3;
>
> -- drop the columns on the target
> ALTER TABLE test DROP COLUMN drop_1;
> ALTER TABLE test DROP COLUMN drop_2;
> ALTER TABLE test DROP COLUMN drop_3;
>
> -- on the target
> ALTER SUBSCRIPTION sub REFRESH PUBLICATION;
>
>
> -- after dropping the columns
> DELETE FROM test WHERE x = 1;
>
> -- source
> SELECT count(*) FROM test WHERE x = 1;
> ┌───┐
> │ count │
> ├───┤
> │ 0 │
> └───┘
> (1 row)
>
> -- target, OOPS wrong result
> SELECT count(*) FROM test WHERE x = 1;
>
> ┌───┐
> │ count │
> ├───┤
> │ 1 │
> └───┘
> (1 row)
>
>
> Should we have a separate patch for the tuples_equal change so that we
> can backport it?
>

Yes, it would be better to report and discuss this in a separate thread.

> Attached as v40_0001 in the patch set.
>
> Note that I need to have that commit as 0001 so that the 0002 patch
> passes the tests.
>

I think we can add such a test (which relies on the existing buggy
behavior) later, after fixing the existing bug. For now, it would be
better to remove that test and add it after we fix the dropped-columns
issue in HEAD.

-- 
With Regards,
Amit Kapila.




Re: pg_dump versus hash partitioning

2023-03-10 Thread Tom Lane
Julien Rouhaud  writes:
> Working on a side project that can cause dumps of hash partitions to be
> routed to a different partition, I realized that --load-via-partition-root
> can indeed cause deadlocks in such cases without an FK dependency or
> anything else.

> The problem is that each worker will perform a TRUNCATE TABLE ONLY followed
> by a copy of the original partition's data in a transaction, and that
> obviously will lead to deadlock if the original (and locked) partition and
> the restored partition are different.

Oh, interesting.  I wonder if we can rearrange things to avoid that.

regards, tom lane




Re: Lock mode in ExecMergeMatched()

2023-03-10 Thread Dean Rasheed
On Fri, 10 Mar 2023 at 21:42, Alexander Korotkov  wrote:
>
> I wonder why ExecMergeMatched() determines the lock mode using
> ExecUpdateLockMode().  Why don't we use the lock mode set by
> table_tuple_update() like ExecUpdate() does?  I skimmed through the
> MERGE-related threads, but didn't find an answer.
>
> I also noticed that we use ExecUpdateLockMode() even for CMD_DELETE.
> That ends up using LockTupleNoKeyExclusive for CMD_DELETE, which
> seems plainly wrong to me.
>
> The proposed change is attached.
>

That won't work if it did a cross-partition update, since it won't
have done a table_tuple_update() in that case, and updateCxt.lockmode
won't have been set. Also, when it loops back and retries, it might
execute a different action next time round. So I think it needs to
unconditionally use LockTupleExclusive, since it doesn't know if it'll
end up executing an update or a delete.

I'm currently working on a patch for bug #17809 that might change that
code though.

Regards,
Dean




Re: pg_dump versus hash partitioning

2023-03-10 Thread Julien Rouhaud
On Tue, Feb 14, 2023 at 02:21:33PM -0500, Tom Lane wrote:
> Here's a set of draft patches around this issue.
>
> 0001 does what I last suggested, ie force load-via-partition-root for
> leaf tables underneath a partitioned table with a partitioned-by-hash
> enum column.  It wasn't quite as messy as I first feared, although we do
> need a new query (and pg_dump now knows more about pg_partitioned_table
> than it used to).
>
> I was a bit unhappy to read this in the documentation:
>
> It is best not to use parallelism when restoring from an archive made
> with this option, because pg_restore will
> not know exactly which partition(s) a given archive data item will
> load data into.  This could result in inefficiency due to lock
> conflicts between parallel jobs, or perhaps even restore failures due
> to foreign key constraints being set up before all the relevant data
> is loaded.
>
> This made me wonder if this could be a usable solution at all, but
> after thinking for awhile, I don't see how the claim about foreign key
> constraints is anything but FUD.  pg_dump/pg_restore have sufficient
> dependency logic to prevent that from happening.  I think we can just
> drop the "or perhaps ..." clause here, and tolerate the possible
> inefficiency as better than failing.

Working on a side project that can cause dumps of hash partitions to be
routed to a different partition, I realized that --load-via-partition-root can
indeed cause deadlocks in such cases without an FK dependency or anything else.

The problem is that each worker will perform a TRUNCATE TABLE ONLY followed by
a copy of the original partition's data in a transaction, and that obviously
will lead to deadlock if the original (and locked) partition and the restored
partition are different.




Re: Should vacuum process config file reload more often

2023-03-10 Thread Melanie Plageman
On Fri, Mar 10, 2023 at 6:11 PM Melanie Plageman
 wrote:
>
> Quotes below are combined from two of Sawada-san's emails.
>
> I've also attached a patch with my suggested current version.
>
> On Thu, Mar 9, 2023 at 10:27 PM Masahiko Sawada  wrote:
> >
> > On Fri, Mar 10, 2023 at 11:23 AM Melanie Plageman
> >  wrote:
> > >
> > > On Tue, Mar 7, 2023 at 12:10 AM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Mon, Mar 6, 2023 at 5:26 AM Melanie Plageman
> > > >  wrote:
> > > > >
> > > > > On Thu, Mar 2, 2023 at 6:37 PM Melanie Plageman
> > > > > In this version I've removed wi_cost_delay from WorkerInfoData. There 
> > > > > is
> > > > > no synchronization of cost_delay amongst workers, so there is no 
> > > > > reason
> > > > > to keep it in shared memory.
> > > > >
> > > > > One consequence of not updating VacuumCostDelay from wi_cost_delay is
> > > > > that we have to have a way to keep track of whether or not autovacuum
> > > > > table options are in use.
> > > > >
> > > > > This patch does this in a cringeworthy way. I added two global
> > > > > variables, one to track whether or not cost delay table options are in
> > > > > use and the other to store the value of the table option cost delay. I
> > > > > didn't want to use a single variable with a special value to indicate
> > > > > that table option cost delay is in use because
> > > > > autovacuum_vacuum_cost_delay already has special values that mean
> > > > > certain things. My code needs a better solution.
> > > >
> > > > While it's true that wi_cost_delay doesn't need to be shared, it seems
> > > > to make the logic somewhat complex. We need to handle cost_delay in a
> > > > different way from other vacuum-related parameters and we need to make
> > > > sure av[_use]_table_option_cost_delay are set properly. Removing
> > > > wi_cost_delay from WorkerInfoData saves 8 bytes shared memory per
> > > > autovacuum worker but it might be worth considering to keep
> > > > wi_cost_delay for simplicity.
> > >
> > > Ah, it turns out we can't really remove wi_cost_delay from WorkerInfo
> > > anyway because the launcher doesn't know anything about table options
> > > and so the workers have to keep an updated wi_cost_delay that the
> > > launcher or other autovac workers who are not vacuuming that table can
> > > read from when calculating the new limit in autovac_balance_cost().
> >
> > IIUC if any of the cost delay parameters has been set individually,
> > the autovacuum worker is excluded from the balance algorithm.
>
> Ah, yes! That's right. So it is not a problem. Then I still think
> removing wi_cost_delay from the worker info makes sense. wi_cost_delay
> is a double and can't easily be accessed atomically the way
> wi_cost_limit can be.
>
> Keeping the cost delay local to the backends also makes it clear that
> cost delay is not something that should be written to by other backends
> or that can differ from worker to worker. Without table options in the
> picture, the cost delay should be the same for any worker who has
> reloaded the config file.
>
> As for the cost limit safe access issue, maybe we can avoid a LWLock
> acquisition for reading wi_cost_limit by using an atomic similar to what
> you suggested here for "did_rebalance".
>
> > > I've added in a shared lock for reading from wi_cost_limit in this
> > > patch. However, AutoVacuumUpdateLimit() is called unconditionally in
> > > vacuum_delay_point(), which is called quite often (per block-ish), so I
> > > was trying to think if there is a way we could avoid having to check
> > > this shared memory variable on every call to vacuum_delay_point().
> > > Rebalances shouldn't happen very often (done by the launcher when a new
> > > worker is launched and by workers between vacuuming tables). Maybe we
> > > can read from it less frequently?
> >
> > Yeah, acquiring the lwlock for every call to vacuum_delay_point()
> > seems to be harmful. One idea would be to have one sig_atomic_t
> > variable in WorkerInfoData and autovac_balance_cost() set it to true
> > after rebalancing the worker's cost-limit. The worker can check it
> > without locking and update its delay parameters if the flag is true.
>
> Instead of having the atomic indicate whether or not someone (launcher
> or another worker) did a rebalance, it would simply store the current
> cost limit. Then the worker can normally access it with a simple read.
>
> My rationale is that if we used an atomic to indicate whether or not we
> did a rebalance ("did_rebalance"), we would have the same cache
> coherency guarantees as if we just used the atomic for the cost limit.
> If we read from the "did_rebalance" variable and missed someone having
> written to it on another core, we still wouldn't get around to checking
> the wi_cost_limit variable in shared memory, so it doesn't matter that
> we bothered to keep it in shared memory and use a lock to access it.
>
> I noticed we don't allow wi_cost_limit to ever be less than 0, so we
> could store 

Re: postgres_fdw, dblink, and CREATE SUBSCRIPTION security

2023-03-10 Thread Jacob Champion
On Thu, Mar 9, 2023 at 6:17 AM Robert Haas  wrote:
> That seems like a circular argument. If you call the problem the
> confused deputy problem then the issue must indeed be that the deputy
> is confused, and needs to talk to someone else to get un-confused. But
> why is the deputy necessarily confused in the first place? Our deputy
> is confused because our code to decide whether to proxy a connection
> or not is super-dumb,

No, I think our proxy is confused because it doesn't know what power
it has, and it can't tell the server what power it wants to use. That
problem is independent of the decision to proxy. You're suggesting
strengthening the code that makes that decision -- adding an oracle
(in the form of a DBA) that knows about the confusion and actively
mitigates it. That's guaranteed to work if the oracle is perfect,
because "perfect" is somewhat tautologically defined as "whatever
ensures secure operation". But the oracle doesn't reduce the
confusion, and DBAs aren't perfect.

If you want to add a Sheriff Andy to hold Barney Fife's hand [1], that
will absolutely make Barney less of a problem, and I'd like to have
Andy around regardless. But Barney still doesn't know what's going on,
and when Andy makes a mistake, there will still be trouble. I'd like
to teach Barney some useful stuff.

> but if there's an intrinsic reason it can't be
> smarter, I don't understand what it is.

Well... I'm not well-versed enough in this to prove non-existence of a
solution. Can you find a solution, using the current protocol, that
doesn't make use of perfect out-of-band knowledge? We have a client
that will authenticate using any method the server asks it to, even if
its user intended to use something else. And we have a server that can
eagerly skip client authentication, and then eagerly run code on its
behalf, without first asking the client what it's even trying to do.
That would be an inherently hostile environment for *any* proxy, not
just ours.

Thanks,
--Jacob

[1] https://en.wikipedia.org/wiki/The_Andy_Griffith_Show#Premise_and_characters




Re: [PoC] Let libpq reject unexpected authentication requests

2023-03-10 Thread Jacob Champion
On Fri, Mar 10, 2023 at 3:09 PM Michael Paquier  wrote:
> >>  +  reason = libpq_gettext("server did not complete 
> >> authentication"),
> >> -+  result = false;
> >> ++  result = false;
> >>  +  }
> >
> > This reindentation looks odd.
>
> That's because the previous line has a comma.  So the reindent is
> right, not the code.

Whoops. :(

> Could you send a new patch with all these adjustments?  That would
> help a lot.

Will do!

Thanks,
--Jacob




Re: Should vacuum process config file reload more often

2023-03-10 Thread Melanie Plageman
Quotes below are combined from two of Sawada-san's emails.

I've also attached a patch with my suggested current version.

On Thu, Mar 9, 2023 at 10:27 PM Masahiko Sawada  wrote:
>
> On Fri, Mar 10, 2023 at 11:23 AM Melanie Plageman
>  wrote:
> >
> > On Tue, Mar 7, 2023 at 12:10 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Mon, Mar 6, 2023 at 5:26 AM Melanie Plageman
> > >  wrote:
> > > >
> > > > On Thu, Mar 2, 2023 at 6:37 PM Melanie Plageman
> > > > In this version I've removed wi_cost_delay from WorkerInfoData. There is
> > > > no synchronization of cost_delay amongst workers, so there is no reason
> > > > to keep it in shared memory.
> > > >
> > > > One consequence of not updating VacuumCostDelay from wi_cost_delay is
> > > > that we have to have a way to keep track of whether or not autovacuum
> > > > table options are in use.
> > > >
> > > > This patch does this in a cringeworthy way. I added two global
> > > > variables, one to track whether or not cost delay table options are in
> > > > use and the other to store the value of the table option cost delay. I
> > > > didn't want to use a single variable with a special value to indicate
> > > > that table option cost delay is in use because
> > > > autovacuum_vacuum_cost_delay already has special values that mean
> > > > certain things. My code needs a better solution.
> > >
> > > While it's true that wi_cost_delay doesn't need to be shared, it seems
> > > to make the logic somewhat complex. We need to handle cost_delay in a
> > > different way from other vacuum-related parameters and we need to make
> > > sure av[_use]_table_option_cost_delay are set properly. Removing
> > > wi_cost_delay from WorkerInfoData saves 8 bytes shared memory per
> > > autovacuum worker but it might be worth considering to keep
> > > wi_cost_delay for simplicity.
> >
> > Ah, it turns out we can't really remove wi_cost_delay from WorkerInfo
> > anyway because the launcher doesn't know anything about table options
> > and so the workers have to keep an updated wi_cost_delay that the
> > launcher or other autovac workers who are not vacuuming that table can
> > read from when calculating the new limit in autovac_balance_cost().
>
> IIUC if any of the cost delay parameters has been set individually,
> the autovacuum worker is excluded from the balance algorithm.

Ah, yes! That's right. So it is not a problem. Then I still think
removing wi_cost_delay from the worker info makes sense. wi_cost_delay
is a double and can't easily be accessed atomically the way
wi_cost_limit can be.

Keeping the cost delay local to the backends also makes it clear that
cost delay is not something that should be written to by other backends
or that can differ from worker to worker. Without table options in the
picture, the cost delay should be the same for any worker who has
reloaded the config file.

As for the cost limit safe access issue, maybe we can avoid a LWLock
acquisition for reading wi_cost_limit by using an atomic similar to what
you suggested here for "did_rebalance".

> > I've added in a shared lock for reading from wi_cost_limit in this
> > patch. However, AutoVacuumUpdateLimit() is called unconditionally in
> > vacuum_delay_point(), which is called quite often (per block-ish), so I
> > was trying to think if there is a way we could avoid having to check
> > this shared memory variable on every call to vacuum_delay_point().
> > Rebalances shouldn't happen very often (done by the launcher when a new
> > worker is launched and by workers between vacuuming tables). Maybe we
> > can read from it less frequently?
>
> Yeah, acquiring the lwlock for every call to vacuum_delay_point()
> seems to be harmful. One idea would be to have one sig_atomic_t
> variable in WorkerInfoData and autovac_balance_cost() set it to true
> after rebalancing the worker's cost-limit. The worker can check it
> without locking and update its delay parameters if the flag is true.

Instead of having the atomic indicate whether or not someone (launcher
or another worker) did a rebalance, it would simply store the current
cost limit. Then the worker can normally access it with a simple read.

My rationale is that if we used an atomic to indicate whether or not we
did a rebalance ("did_rebalance"), we would have the same cache
coherency guarantees as if we just used the atomic for the cost limit.
If we read from the "did_rebalance" variable and missed someone having
written to it on another core, we still wouldn't get around to checking
the wi_cost_limit variable in shared memory, so it doesn't matter that
we bothered to keep it in shared memory and use a lock to access it.

I noticed we don't allow wi_cost_limit to ever be less than 0, so we
could store wi_cost_limit in an atomic uint32.

I'm not sure if it is okay to do pg_atomic_read_u32() and
pg_atomic_unlocked_write_u32() or if we need pg_atomic_write_u32() in
most cases.
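
(For illustration, a minimal sketch of that lock-free access pattern,
assuming the field stays in WorkerInfoData; the helper names are made up,
but the pg_atomic_* calls are the real API from port/atomics.h:)

#include "postgres.h"
#include "port/atomics.h"

typedef struct WorkerInfoData
{
	/* ... existing fields ... */
	pg_atomic_uint32 wi_cost_limit; /* never negative, so uint32 works */
} WorkerInfoData;

/* launcher (or another worker), after rebalancing in autovac_balance_cost() */
static void
publish_cost_limit(WorkerInfoData *wi, int new_limit)
{
	pg_atomic_write_u32(&wi->wi_cost_limit, (uint32) new_limit);
}

/* vacuuming worker, read without a lock, e.g. from AutoVacuumUpdateLimit() */
static int
fetch_cost_limit(WorkerInfoData *wi)
{
	return (int) pg_atomic_read_u32(&wi->wi_cost_limit);
}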

I've implemented the atomic cost limit in the attached patch. Though,
I'm pretty unsure about 

Re: [PoC] Let libpq reject unexpected authentication requests

2023-03-10 Thread Michael Paquier
On Fri, Mar 10, 2023 at 02:32:17PM -0800, Jacob Champion wrote:
> On 3/8/23 22:35, Michael Paquier wrote:
> Works for me. I wonder if
>
>sizeof(((PGconn*) 0)->allowed_auth_methods)
>
> would make pgindent any happier? That'd let you keep the assertion local
> to auth_method_allowed, but it looks scarier. :)

I can check that, but it's not bad to keep the assertion as it is,
either.

>> As of the "sensitive" cases of the patch:
>> - I don't really think that we have to care much of the cases like
>> "none,scram" meaning that a SASL exchange hastily downgraded to
>> AUTH_REQ_OK by the server would be a success, as "none" means that the
>> client is basically OK with trust-level.  This said, "none" could be a
>> dangerous option in some cases, while useful in others.
>
> Yeah. I think a server shouldn't be allowed to abandon a SCRAM exchange
> partway through, but that's completely independent of this patchset.

Agreed.

>> We could stick a small test somewhere, perhaps, certainly not in
>> src/test/authentication/.
>
> Where were you thinking? (Would it be so bad to have a tiny
> t/005_sspi.pl that's just skipped on *nix?)

Hmm, OK.  It may be worth having a 005_sspi.pl in
src/test/authentication/ specifically for Windows.  This patch gives
at least one reason to do so.  Looking at pg_regress.c, we have that:
    if (config_auth_datadir)
    {
#ifdef ENABLE_SSPI
        if (!use_unix_sockets)
            config_sspi_auth(config_auth_datadir, user);
#endif
        exit(0);
    }

So applying a check on $use_unix_sockets should be OK, I hope.

>> - SASL/SCRAM is indeed a problem on its own.  My guess is that we
>> should let channel_binding do the job for SASL, or introduce a new
>> option to decide which sasl mechanisms are authorized.  At the end,
>> using "scram-sha-256" as the keyword is fine by me as we use that even
>> for HBA files, so that's quite known now, I hope.
>
> Did you have any thoughts about the 0003 generalization attempt?

Not yet, unfortunately.

> > -+  if (conn->require_auth)
> > ++  if (conn->require_auth && conn->require_auth[0])
>
> Thank you for that catch. I guess we should test somewhere that
> `require_auth=` behaves normally?

Yeah, that seems like an idea.  That would be cheap enough.

>>  +  reason = libpq_gettext("server did not complete 
>> authentication"),
>> -+  result = false;
>> ++  result = false;
>>  +  }
>
> This reindentation looks odd.

That's because the previous line has a comma.  So the reindent is
right, not the code.
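
(For illustration, a standalone sketch of why: the trailing comma turns
the two assignments into a single statement chained by the comma operator,
so the indent tool correctly treats the second line as a continuation.
This is a toy example, not the libpq code:)

#include <stdbool.h>

void
example(void)
{
	bool		result = true;
	const char *reason;

	reason = "server did not complete authentication",	/* comma operator! */
		result = false;			/* still part of the same statement */

	(void) reason;
	(void) result;
}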

> nit: some of the new TAP test names have been rewritten with commas,
> others with colons.

Indeed, I thought I had caught all of them, but you wrote a lot of
tests :)

Could you send a new patch with all these adjustments?  That would
help a lot.
--
Michael


signature.asc
Description: PGP signature


Re: buildfarm + meson

2023-03-10 Thread Andres Freund
Hi,

On 2023-03-09 11:55:57 -0800, Andres Freund wrote:
> On 2023-03-09 14:47:36 -0500, Andrew Dunstan wrote:
> > On 2023-03-09 Th 08:28, Andrew Dunstan wrote:
> > > At this stage I think I'm prepared to turn this loose on a couple of my
> > > buildfarm animals, and if nothing goes awry for the remainder of the
> > > month merge the dev/meson branch and push a new release.
> 
> Cool!

I moved a few of my animals over to it too; so far, no problems.

The only other thing I noticed so far is that the status page doesn't yet know
how to generate the right "flags", but that's fairly minor...

Greetings,

Andres Freund




Re: Add LZ4 compression in pg_dump

2023-03-10 Thread Michael Paquier
On Fri, Mar 10, 2023 at 07:05:49AM -0600, Justin Pryzby wrote:
> On Thu, Mar 09, 2023 at 06:58:20PM +0100, Tomas Vondra wrote:
>> I'm a bit confused about the lz4 vs. lz4f stuff, TBH. If we switch to
>> lz4f, doesn't that mean it (e.g. restore) won't work on systems that
>> only have older lz4 version? What would/should happen if we take backup
>> compressed with lz4f, an then try restoring it on an older system where
>> lz4 does not support lz4f?
> 
> You seem to be thinking about LZ4F as a weird, new innovation I'm
> experimenting with, but compress_lz4.c already uses LZ4F for its "file"
> API.

Note: we already use lz4 frames in pg_receivewal (for WAL) and
pg_basebackup (bbstreamer).
--
Michael


signature.asc
Description: PGP signature


RE: Ability to reference other extensions by schema in extension scripts

2023-03-10 Thread Regina Obe
> "Regina Obe"  writes:
> >> requires = 'extfoo, extbar'
> >> no_relocate = 'extfoo'
> 
> > So when no_relocate is specified, where would that live?
> 
> In the control file.
> 
> > Would I mark the extfoo as not relocatable on CREATE / ALTER of said
> > extension?
> > Or add an extra field to pg_extension
> 
> We don't record dependent extensions in pg_extension now, so that doesn't
> seem like it would fit well.  I was envisioning that ALTER EXTENSION SET
> SCHEMA would do something along the lines of
> 
> (1) scrape the list of dependent extensions out of pg_depend
> (2) open and parse each of their control files
> (3) fail if any of their control files mentions the target one in
> no_relocate.
> 
> Admittedly, this'd be a bit slow, but I doubt that ALTER EXTENSION SET
> SCHEMA is a performance bottleneck for anybody.
> 

Okay I'll move ahead with this approach.

Thanks,
Regina





Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken

2023-03-10 Thread Tom Lane
Thomas Munro  writes:
> I think this is the minimal back-patchable change.  I propose to go
> ahead and do that, and then to kick the ideas about latch API changes
> into a new thread for the next commitfest.

OK by me, but then again 4753ef37 wasn't my patch.

regards, tom lane




Re: Ability to reference other extensions by schema in extension scripts

2023-03-10 Thread Tom Lane
"Regina Obe"  writes:
>> requires = 'extfoo, extbar'
>> no_relocate = 'extfoo'

> So when no_relocate is specified, where would that live?

In the control file.

> Would I mark the extfoo as not relocatable on CREATE / ALTER of said
> extension?
> Or add an extra field to pg_extension

We don't record dependent extensions in pg_extension now, so that
doesn't seem like it would fit well.  I was envisioning that
ALTER EXTENSION SET SCHEMA would do something along the lines of

(1) scrape the list of dependent extensions out of pg_depend
(2) open and parse each of their control files
(3) fail if any of their control files mentions the target one in
no_relocate.

Admittedly, this'd be a bit slow, but I doubt that ALTER EXTENSION
SET SCHEMA is a performance bottleneck for anybody.
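
For illustration, a hedged sketch of that check:
read_extension_control_file() is the existing control-file parser in
extension.c, while getExtensionDependents() and the no_relocate list field
are hypothetical stand-ins for the pg_depend scrape and the proposed
control-file entry.

static void
check_can_relocate(const char *extname)
{
	/* step 1: hypothetical helper scraping pg_depend */
	List	   *dependents = getExtensionDependents(extname);
	ListCell   *lc;

	foreach(lc, dependents)
	{
		char	   *depname = (char *) lfirst(lc);
		/* step 2: parse the dependent extension's control file */
		ExtensionControlFile *control = read_extension_control_file(depname);
		ListCell   *lc2;

		/* step 3: fail if it lists us in no_relocate */
		foreach(lc2, control->no_relocate)
		{
			if (strcmp((char *) lfirst(lc2), extname) == 0)
				ereport(ERROR,
						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
						 errmsg("cannot SET SCHEMA of extension \"%s\" because extension \"%s\" prohibits it",
								extname, depname)));
		}
	}
}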

> I had tried to do that originally, e.g. instead of even bothering with such
> an extra arg, just mark it as not relocatable if the extension's script
> contains references to the required extension's schema.

I don't think that's a great approach, because those references might
appear in places that can track a rename (ie, in an object name that's
resolved to a stored OID).  Short of fully parsing the script file you
aren't going to get a reliable answer.  I'm content to lay that problem
off on the extension authors.

> But then what if extfoo is upgraded?

We already have mechanisms for version-dependent control files, so
I don't see where there's a problem.

regards, tom lane




Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken

2023-03-10 Thread Thomas Munro
On Fri, Mar 10, 2023 at 6:58 PM Thomas Munro  wrote:
> ... Perhaps something like 0004, which also shows the sort
> of thing that we might consider back-patching to 14 and 15 (in the
> next revision I'll move that up front and put it in back-patchable
> form).

I think this is the minimal back-patchable change.  I propose to go
ahead and do that, and then to kick the ideas about latch API changes
into a new thread for the next commitfest.
From 1758cf16d22af7c9bae56bc6a40c394ba0107207 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Sat, 11 Mar 2023 10:42:20 +1300
Subject: [PATCH] Fix fractional vacuum_cost_delay.

Commit 4753ef37 changed vacuum_delay_point() to use the WaitLatch() API,
to fix the problem that vacuum could keep running for a very long time
after the postmaster died.

Unfortunately, that broke commit caf626b2's support for fractional
vacuum_cost_delay, which shipped in PostgreSQL 12.  WaitLatch() works in
whole milliseconds.

For now, revert the change from commit 4753ef37, but add an explicit
check for postmaster death.  That's an extra system call on systems
other than Linux and FreeBSD, but that overhead doesn't matter much
considering that we willingly went to sleep and woke up again.  (In
later work, we might add higher resolution timeouts to the latch API so
that we could do this in the standard programming pattern.)

Back-patch to 14, where commit 4753ef37 arrived.

Reported-by: Melanie Plageman 
Discussion: https://postgr.es/m/CAAKRu_b-q0hXCBUCAATh0Z4Zi6UkiC0k2DFgoD3nC-r3SkR3tg%40mail.gmail.com
---
 src/backend/commands/vacuum.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 2e12baf8eb..c54360a6a0 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -50,6 +50,7 @@
 #include "postmaster/bgworker_internals.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
+#include "storage/pmsignal.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/acl.h"
@@ -2232,11 +2233,18 @@ vacuum_delay_point(void)
 		if (msec > VacuumCostDelay * 4)
 			msec = VacuumCostDelay * 4;
 
-		(void) WaitLatch(MyLatch,
-		 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-		 msec,
-		 WAIT_EVENT_VACUUM_DELAY);
-		ResetLatch(MyLatch);
+		pgstat_report_wait_start(WAIT_EVENT_VACUUM_DELAY);
+		pg_usleep(msec * 1000);
+		pgstat_report_wait_end();
+
+		/*
+		 * We don't want to ignore postmaster death during very long vacuums
+		 * with vacuum_cost_delay configured.  We can't use the usual
+		 * WaitLatch() approach here because we want microsecond-based sleep
+		 * durations above.
+		 */
+		if (IsUnderPostmaster && !PostmasterIsAlive())
+			exit(1);
 
 		VacuumCostBalance = 0;
 
-- 
2.39.2



RE: Ability to reference other extensions by schema in extension scripts

2023-03-10 Thread Regina Obe
> No, pg_depend is not the thing to use for this.  I was thinking of a new
> field in the extension's control file, right beside where it says it's
> dependent on such-and-such extensions in the first place.  Say like
> 
>   requires = 'extfoo, extbar'
>   no_relocate = 'extfoo'
> 

So when no_relocate is specified, where would that live?

Would I mark the extfoo as not relocatable on CREATE / ALTER of said
extension?
Or add an extra field to pg_extension

I had tried to do that originally, e.g. instead of even bothering with such
an extra arg, just mark it as not relocatable if the extension's script
contains references to the required extension's schema.

But then what if extfoo is upgraded?

ALTER EXTENSION extfoo UPDATE;

That wipes out the "not relocatable" setting of extfoo.
So in order to prevent that, I have to

a) check the control files of all extensions that depend on foo to see if
they made such a request.
or 
b) "Seeing if the extension is marked as not relocatable, prevent ALTER
EXTENSION from marking it as relocatable"
problem with b is what if the extension author changed their mind and wanted
it to be relocatable? Given the default is (not relocatable), it's possible
the author didn't know this and later decided to put in an explicit
relocate=false.
c) define a new column in pg_extension to hold this bit of info.  I was
hoping I could reuse pg_extension.extconfig, but it seems that's hardwired
to be only used for backup.

Am I missing something or is this really as complicated as I think it is?

If we go with b) I'm not sure why I need to bother defining a no_relocate,
as it's obvious looking at the extension install/upgrade script that it
should not be relocatable.

> > So you are proposing I change the execute_extension_scripts input args
> > to take more args?
> 
> Why not?  It's local to that file, so you won't break anything.
> 

Okay, I wasn't absolutely sure if it was.  If it is then I'll change.

>   regards, tom lane


Thanks,
Regina





Re: [PoC] Let libpq reject unexpected authentication requests

2023-03-10 Thread Jacob Champion
On 3/8/23 22:35, Michael Paquier wrote:
> I have been reviewing 0001, finishing with the attached, and that's
> nice work.  My notes are below.

Thanks!

> pqDropServerData() is in charge of cleaning up the transient data of a
> connection between different attempts.  Shouldn't client_finished_auth
> be reset to false there?  No parameters related to the connection
> parameters should be reset in this code path, but this state is
> different.  It does not seem possible that we could reach
> pqDropServerData() after client_finished_auth has been set to true,
> but that feels safer.

Yeah, that seems reasonable.

> +   case AUTH_REQ_SCM_CREDS:
> +   return libpq_gettext("server requested UNIX socket credentials");
> I am not really cool with the fact that this would fail and that we
> offer no options to control that.  Now, this involves servers up to
> 9.1, which is also a very good reason to rip out this code entirely.  For now,
> I think that we'd better allow this option, and discuss the removal of
> that in a separate thread.

Fair enough.

> pgindent has been complaining on the StaticAssertDecl() in fe-auth.c:
> src/interfaces/libpq/fe-auth.c: Error@847: Unbalanced parens
> Warning@847: Extra )
> Warning@847: Extra )
> Warning@848: Extra )
> 
> From what I can see, this comes from the use of {0} within the
> expression itself.  I don't really want to dig into why pg_bsd_indent
> thinks this is a bad idea, so let's just move the StaticAssertDecl() a
> bit, like in the attached.  The result is the same.

Works for me. I wonder if

   sizeof(((PGconn*) 0)->allowed_auth_methods)

would make pgindent any happier? That'd let you keep the assertion local
to auth_method_allowed, but it looks scarier. :)

> As of the "sensitive" cases of the patch:
> - I don't really think that we have to care much of the cases like
> "none,scram" meaning that a SASL exchange hastily downgraded to
> AUTH_REQ_OK by the server would be a success, as "none" means that the
> client is basically OK with trust-level.  This said, "none" could be a
> dangerous option in some cases, while useful in others.

Yeah. I think a server shouldn't be allowed to abandon a SCRAM exchange
partway through, but that's completely independent of this patchset.

> - SSPI is the default connection setup for the TAP tests on Windows.

Oh, I don't think I ever noticed that.

> We could stick a small test somewhere, perhaps, certainly not in
> src/test/authentication/.

Where were you thinking? (Would it be so bad to have a tiny
t/005_sspi.pl that's just skipped on *nix?)

> - SASL/SCRAM is indeed a problem on its own.  My guess is that we
> should let channel_binding do the job for SASL, or introduce a new
> option to decide which sasl mechanisms are authorized.  At the end,
> using "scram-sha-256" as the keyword is fine by me as we use that even
> for HBA files, so that's quite known now, I hope.

Did you have any thoughts about the 0003 generalization attempt?

> -+  if (conn->require_auth)
> ++  if (conn->require_auth && conn->require_auth[0])

Thank you for that catch. I guess we should test somewhere that
`require_auth=` behaves normally?

>  +  reason = libpq_gettext("server did not complete 
> authentication"),
> -+  result = false;
> ++  result = false;
>  +  }

This reindentation looks odd.

nit: some of the new TAP test names have been rewritten with commas,
others with colons.

Thanks,
--Jacob




Re: Ability to reference other extensions by schema in extension scripts

2023-03-10 Thread Tom Lane
"Regina Obe"  writes:
>> If you want to work harder, perhaps a reasonable way to deal with the issue
>> would be to allow dependent extensions to declare that they don't want your
>> extension relocated.  But I do not think it's okay to make that the default
>> behavior, much less the only behavior.

> -  the main issue I ran into is I have to introduce another dependency type
> or go with Sandro's idea of using refsubobjid for this purpose.

No, pg_depend is not the thing to use for this.  I was thinking of a new
field in the extension's control file, right beside where it says it's
dependent on such-and-such extensions in the first place.  Say like

requires = 'extfoo, extbar'
no_relocate = 'extfoo'

> So you are proposing I change the execute_extension_scripts input args to
> take more args?

Why not?  It's local to that file, so you won't break anything.

regards, tom lane




Re: recovery modules

2023-03-10 Thread Nathan Bossart
Here is a rebased version of the restore modules patch set.  I swapped the
patch for the stopgap fix for restore_command with the latest version [0],
and I marked the restore/ headers as installable (as was recently done for
archive/ [1]).  There are no other changes.

[0] https://postgr.es/m/20230301224751.GA1823946%40nathanxps13
[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=6ad5793

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 1173c8b4e476575c3e4b410f3aa6220360c38503 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 23 Feb 2023 14:27:48 -0800
Subject: [PATCH v13 1/7] Move extra code out of the Pre/PostRestoreCommand()
 block.

If SIGTERM is received within this block, the startup process will
immediately proc_exit() in the signal handler, so it is inadvisable
to include any more code than is required in this section.  This
change moves the code recently added to this block (see 1b06d7b and
7fed801) to outside of the block.  This ensures that only system()
is called while proc_exit() might be called in the SIGTERM handler,
which is how this code worked from v8.4 to v14.
---
 src/backend/access/transam/xlogarchive.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index fcc87ff44f..41684418b6 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -159,20 +159,27 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 			(errmsg_internal("executing restore command \"%s\"",
 			 xlogRestoreCmd)));
 
+	fflush(NULL);
+	pgstat_report_wait_start(WAIT_EVENT_RESTORE_COMMAND);
+
 	/*
-	 * Check signals before restore command and reset afterwards.
+	 * PreRestoreCommand() informs the SIGTERM handler for the startup process
+	 * that it should proc_exit() right away.  This is done for the duration of
+	 * the system() call because there isn't a good way to break out while it
+	 * is executing.  Since we might call proc_exit() in a signal handler, it
+	 * is best to put any additional logic before or after the
+	 * PreRestoreCommand()/PostRestoreCommand() section.
 	 */
 	PreRestoreCommand();
 
 	/*
 	 * Copy xlog from archival storage to XLOGDIR
 	 */
-	fflush(NULL);
-	pgstat_report_wait_start(WAIT_EVENT_RESTORE_COMMAND);
 	rc = system(xlogRestoreCmd);
-	pgstat_report_wait_end();
 
 	PostRestoreCommand();
+
+	pgstat_report_wait_end();
 	pfree(xlogRestoreCmd);
 
 	if (rc == 0)
-- 
2.25.1

>From bdd7268075f150bd292050f74701568af8eb66ec Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Tue, 14 Feb 2023 09:44:53 -0800
Subject: [PATCH v13 2/7] Don't proc_exit() in startup's SIGTERM handler if
 forked by system().

Instead, emit a message to STDERR and _exit() in this case.  This
change also adds assertions to proc_exit(), ProcKill(), and
AuxiliaryProcKill() to verify that these functions are not called
by a process forked by system(), etc.
---
 src/backend/postmaster/startup.c | 17 -
 src/backend/storage/ipc/ipc.c|  3 +++
 src/backend/storage/lmgr/proc.c  |  2 ++
 src/backend/utils/error/elog.c   | 28 
 src/include/utils/elog.h |  6 +-
 5 files changed, 50 insertions(+), 6 deletions(-)

diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index efc2580536..0e7de26bc2 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -19,6 +19,8 @@
  */
 #include "postgres.h"
 
+#include <unistd.h>
+
 #include "access/xlog.h"
 #include "access/xlogrecovery.h"
 #include "access/xlogutils.h"
@@ -121,7 +123,20 @@ StartupProcShutdownHandler(SIGNAL_ARGS)
 	int			save_errno = errno;
 
 	if (in_restore_command)
-		proc_exit(1);
+	{
+		/*
+		 * If we are in a child process (e.g., forked by system() in
+		 * RestoreArchivedFile()), we don't want to call any exit callbacks.
+		 * The parent will take care of that.
+		 */
+		if (MyProcPid == (int) getpid())
+			proc_exit(1);
+		else
+		{
+			write_stderr_signal_safe("StartupProcShutdownHandler() called in child process\n");
+			_exit(1);
+		}
+	}
 	else
 		shutdown_requested = true;
 	WakeupRecovery();
diff --git a/src/backend/storage/ipc/ipc.c b/src/backend/storage/ipc/ipc.c
index 1904d21795..d5097dc008 100644
--- a/src/backend/storage/ipc/ipc.c
+++ b/src/backend/storage/ipc/ipc.c
@@ -103,6 +103,9 @@ static int	on_proc_exit_index,
 void
 proc_exit(int code)
 {
+	/* proc_exit() is not safe in forked processes from system(), etc. */
+	Assert(MyProcPid == (int) getpid());
+
 	/* Clean up everything that must be cleaned up */
 	proc_exit_prepare(code);
 
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 22b4278610..b9e2c3aafe 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -805,6 +805,7 @@ ProcKill(int code, Datum arg)
 	dlist_head *procgloballist;
 
 	Assert(MyProc != NULL);
+	
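
The mechanism the new handler relies on can be shown in a minimal standalone
sketch (illustrative POSIX code, not part of the patch set): a saved "main
process" PID lets a signal handler detect that it is running in a fork()ed
child, as happens transiently while system() executes, and take the _exit()
path instead of running exit callbacks.

#include <stdio.h>
#include <unistd.h>

static pid_t saved_main_pid;

int
main(void)
{
	saved_main_pid = getpid();

	if (fork() == 0)
	{
		/* child: saved_main_pid still holds the parent's PID */
		printf("child:  saved=%d getpid()=%d -> use _exit()\n",
			   (int) saved_main_pid, (int) getpid());
		_exit(0);
	}
	printf("parent: saved=%d getpid()=%d -> proc_exit() is safe\n",
		   (int) saved_main_pid, (int) getpid());
	return 0;
}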

Lock mode in ExecMergeMatched()

2023-03-10 Thread Alexander Korotkov
Hi!

I wonder why ExecMergeMatched() determines the lock mode using
ExecUpdateLockMode().  Why don't we use the lock mode set by
table_tuple_update() like ExecUpdate() does?  I skimmed through the
MERGE-related threads, but didn't find an answer.

I also noticed that we use ExecUpdateLockMode() even for CMD_DELETE.
That ends up using LockTupleNoKeyExclusive for CMD_DELETE, which
seems plainly wrong to me.

The proposed change is attached.

--
Regards,
Alexander Korotkov


0001-ExecMergeMatched-lock-mode.patch
Description: Binary data


RE: Ability to reference other extensions by schema in extension scripts

2023-03-10 Thread Regina Obe
> "Regina Obe"  writes:
> > [ 0005-Allow-use-of-extschema-reqextname-to-reference.patch ]
> 
> I took a look at this.  I'm on board with the feature design, but not so
> much with this undocumented restriction you added to ALTER EXTENSION SET
> SCHEMA:
> 
> +	/* If an extension requires this extension
> +	 * do not allow relocation */
> +	if (pg_depend->deptype == DEPENDENCY_NORMAL &&
> +		pg_depend->classid == ExtensionRelationId)
> +	{
> +		dep.classId = pg_depend->classid;
> +		dep.objectId = pg_depend->objid;
> +		dep.objectSubId = pg_depend->objsubid;
> +		ereport(ERROR,
> +				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +				 errmsg("cannot SET SCHEMA of extension %s because other extensions require it",
> +						NameStr(extForm->extname)),
> +				 errdetail("%s requires extension %s",
> +						   getObjectDescription(&dep, false),
> +						   NameStr(extForm->extname))));
> 
> That seems quite disastrous for usability, and it's making an assumption
> unsupported by any evidence: that it will be a majority use-case for
> dependent extensions to have used @extschema:myextension@ in a way that
> would be broken by ALTER EXTENSION SET SCHEMA.
> 
> I think we should just drop this.  It might be worth putting in some
> documentation notes about the hazard, instead.
> 
That was my thought originally too, and given the rarity of people changing
schemas, I wasn't that bothered about not forcing this.  Sandro was a bit
more bothered by not forcing it, and given that the default for extensions is
not relocatable, we didn't see that much of an issue with it.


> If you want to work harder, perhaps a reasonable way to deal with the
> issue would be to allow dependent extensions to declare that they don't
> want your extension relocated.  But I do not think it's okay to make
> that the default behavior, much less the only behavior.

I had done that in one iteration of the patch.
We discussed this here
https://www.postgresql.org/message-id/01d949ad%241159adc0%24340d0940%24%40pcorp.us

and here
https://www.postgresql.org/message-id/20230223183906.6rhtybwdpe37sri7%40c19

- the main issue I ran into is that I have to introduce another dependency
type or go with Sandro's idea of using refsubobjid for this purpose.  I think
defining a new dependency type is less likely to cause unforeseen
complications elsewhere, but it did require me to expand the scope (to make
changes to pg_depend).  Which I am fine with doing, but I didn't want to
overextend my reach too much.

One of my revisions tried to use DEPENDENCY_AUTO, which did not work (as
Sandro discovered), and I had some other annoyances with the lack of helper
functions:
https://www.postgresql.org/message-id/000401d93a14%248647f540%2492d7dfc0%24%40pcorp.us

key point:
"Why isn't there a variant getAutoExtensionsOfObject take a DEPENDENCY type
as an option so it would be more useful or is there functionality for that I
missed?"


> And really, since we've gotten along without it so far, I'm not sure
> that it's necessary to have it.
> 
> Another thing that's bothering me a bit is the use of
> get_required_extension in execute_extension_script.  That does way more
> than you really need, and passing a bunch of bogus parameter values to
> it makes me uncomfortable.  The callers already have the required
> extensions' OIDs at hand; it'd be better to add that list to
> execute_extension_script's API instead of redoing the lookups.
> 
>   regards, tom lane

So you are proposing I change execute_extension_script's input args to
take more args?






Re: [BUG] pg_stat_statements and extended query protocol

2023-03-10 Thread David Zhang




> Yes, I agree that proper test coverage is needed here. Will think
> about how to accomplish this.


I tried to apply this patch to the current master branch and the build was
OK; however, it crashed during initdb with a message like the one below.


"performing post-bootstrap initialization ... Segmentation fault (core 
dumped)"


If I remove this patch and recompile again, then "initdb -D $PGDATA" works.

Thanks,

David





Re: Add SHELL_EXIT_CODE to psql

2023-03-10 Thread Tom Lane
Justin Pryzby  writes:
> The exit status is one byte.  I think you should define the status
> variable along the lines of:

>  - 0 if successful; or,
>  - a positive number 1..255 indicating its exit status; or,
>  - a negative number N indicating it was terminated by signal -N; or,
>  - 256 if an internal error occurred (like pclose/ferror);

> See bash(1).  This would be a good behavior to start with, since it
> ought to be familiar to everyone, and if it's good enough to write/run
> shell scripts in, then it's got to be good enough for psql to run a
> single command in.

I'm okay with adopting bash's rule, but then it should actually match
bash --- signal N is reported as 128+N, not -N.

Not sure what to do about pclose/ferror cases.  Maybe use -1?
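
A minimal sketch of that scheme using the POSIX wait macros (the function
name here is illustrative, not the patch's):

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Decode a system()/pclose() result into the bash-like scheme above. */
static int
shell_exit_code(int status)
{
	if (status == -1)
		return -1;						/* internal error (e.g., fork failed) */
	if (WIFEXITED(status))
		return WEXITSTATUS(status);		/* 0..255 */
	if (WIFSIGNALED(status))
		return 128 + WTERMSIG(status);	/* bash reports signal N as 128+N */
	return -1;							/* anything else: treat as error */
}

int
main(void)
{
	printf("code = %d\n", shell_exit_code(system("exit 3")));	/* prints 3 */
	return 0;
}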

> I'm not sure why the shell uses 126-127 specially, though..

127 is used similarly by system(3).  I think both 126 and 127 might
be specified by POSIX, but not sure.  In any case, those are outside
our jurisdiction.

regards, tom lane




Re: zstd compression for pg_dump

2023-03-10 Thread Jacob Champion
Hi,

This'll need another rebase over the meson ICU changes.

On Wed, Mar 8, 2023 at 10:59 AM Jacob Champion 
wrote:
> I did some smoke testing against zstd's GitHub release on Windows. To
> build against it, I had to construct an import library, and put that
> and the DLL into the `lib` folder expected by the MSVC scripts...
> which makes me wonder if I've chosen a harder way than necessary?

A meson wrap made this much easier! It looks like pg_dump's meson.build
is missing dependencies on zstd (meson couldn't find the headers in the
subproject without them).

> Parallel zstd dumps seem to work as expected, in that the resulting
> pg_restore output is identical to uncompressed dumps and nothing
> explodes. I haven't inspected the threading implementation for safety
> yet, as you mentioned.

Hm. Best I can tell, the CloneArchive() machinery is supposed to be
handling safety for this, by isolating each thread's state. I don't feel
comfortable pronouncing this new addition safe or not, because I'm not
sure I understand what the comments in the format-specific _Clone()
callbacks are saying yet.

> And I still wasn't able to test :workers, since
> it looks like the official libzstd for Windows isn't built for
> multithreading. That'll be another day's project.

The wrapped installation enabled threading too, so I was able to try
with :workers=8. Everything seems to work, but I didn't have a dataset
that showed speed improvements at the time. It did seem to affect the
overall compressibility negatively -- which makes sense, I think,
assuming each thread is looking at a separate and/or smaller window.

On to code (not a complete review):

> if (hasSuffix(fname, ".gz"))
> compression_spec.algorithm = PG_COMPRESSION_GZIP;
> else
> {
> bool		exists;
> 
> exists = (stat(path, &st) == 0);
> /* avoid unused warning if it is not built with compression */
> if (exists)
> compression_spec.algorithm = PG_COMPRESSION_NONE;
> -#ifdef HAVE_LIBZ
> -   if (!exists)
> -   {
> -   free_keep_errno(fname);
> -   fname = psprintf("%s.gz", path);
> -   exists = (stat(fname, &st) == 0);
> -
> -   if (exists)
> -   compression_spec.algorithm = PG_COMPRESSION_GZIP;
> -   }
> -#endif
> -#ifdef USE_LZ4
> -   if (!exists)
> -   {
> -   free_keep_errno(fname);
> -   fname = psprintf("%s.lz4", path);
> -   exists = (stat(fname, &st) == 0);
> -
> -   if (exists)
> -   compression_spec.algorithm = PG_COMPRESSION_LZ4;
> -   }
> -#endif
> +   else if (check_compressed_file(path, &fname, "gz"))
> +   compression_spec.algorithm = PG_COMPRESSION_GZIP;
> +   else if (check_compressed_file(path, &fname, "lz4"))
> +   compression_spec.algorithm = PG_COMPRESSION_LZ4;
> +   else if (check_compressed_file(path, &fname, "zst"))
> +   compression_spec.algorithm = PG_COMPRESSION_ZSTD;
> }

This function lost some coherence, I think. Should there be a hasSuffix
check at the top for ".zstd" (and, for that matter, ".lz4")? And the
comment references an unused warning, which is only possible with the
#ifdef blocks that were removed.

I'm a little suspicious of the replacement of supports_compression()
with parse_compress_specification(). For example:

> -   errmsg = supports_compression(AH->compression_spec);
> -   if (errmsg)
> +   parse_compress_specification(AH->compression_spec.algorithm,
> +   NULL, &compress_spec);
> +   if (compress_spec.parse_error != NULL)
> {
> pg_log_warning("archive is compressed, but this installation does not
> support compression (%s) -- no data will be available",
> -   errmsg);
> -   pg_free(errmsg);
> +   compress_spec.parse_error);
> +   pg_free(compress_spec.parse_error);
> }

The top-level error here is "does not support compression", but wouldn't
a bad specification option with a supported compression method trip this
path too?

> +static void
> +ZSTD_CCtx_setParam_or_die(ZSTD_CStream *cstream,
> + ZSTD_cParameter param, int value, char *paramname)

IMO we should avoid stepping on the ZSTD_ namespace with our own
internal function names.
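
For what it's worth, the same helper can live under a name outside that
namespace. A sketch only, with plain stderr handling standing in for
pg_fatal() and the spelling of the name purely hypothetical:

#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>

static void
Zstd_set_param_or_die(ZSTD_CStream *cstream,
					  ZSTD_cParameter param, int value,
					  const char *paramname)
{
	/* ZSTD_CStream is a typedef of ZSTD_CCtx, so this call is legal */
	size_t		res = ZSTD_CCtx_setParameter(cstream, param, value);

	if (ZSTD_isError(res))
	{
		fprintf(stderr, "could not set compression parameter \"%s\": %s\n",
				paramname, ZSTD_getErrorName(res));
		exit(1);
	}
}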

> +   if (cs->readF != NULL)
> +   {
> +   zstdcs->dstream = ZSTD_createDStream();
> +   if (zstdcs->dstream == NULL)
> +   pg_fatal("could not initialize compression library");
> +
> +   zstdcs->input.size = ZSTD_DStreamInSize();
> +   zstdcs->input.src = pg_malloc(zstdcs->input.size);
> +
> +   zstdcs->output.size = ZSTD_DStreamOutSize();
> +   zstdcs->output.dst = pg_malloc(zstdcs->output.size + 1);
> +   }
> +
> +   if (cs->writeF != NULL)
> +   {
> +   zstdcs->cstream = ZstdCStreamParams(cs->compression_spec);
> +
> +   zstdcs->output.size = ZSTD_CStreamOutSize();
> +   zstdcs->output.dst = pg_malloc(zstdcs->output.size);
> +   zstdcs->output.pos = 0;
> +   }

This seems to suggest that 




Re: Progress report of CREATE INDEX for nested partitioned tables

2023-03-10 Thread Tom Lane
Justin Pryzby  writes:
> Update to address a compiler warning in the supplementary patches adding
> assertions.

I took a look through this.  It seems like basically a good solution,
but the count_leaf_partitions() function is bothering me, for two
reasons:

1. It seems like a pretty expensive thing to do.  Don't we have the
info at hand somewhere already?

2. Is it really safe to do find_all_inheritors with NoLock?  If so,
a comment explaining why would be good.

regards, tom lane




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-03-10 Thread Melanie Plageman
On Fri, Mar 10, 2023 at 3:19 PM Justin Pryzby  wrote:
>
> On Thu, Mar 09, 2023 at 11:43:01AM -0800, Andres Freund wrote:
> > > https://api.cirrus-ci.com/v1/artifact/task/6701069548388352/log/src/test/recovery/tmp_check/regression.diffs
> >
> > Hm. I guess the explanation here is that the buffers were already all
> > written out by another backend. Which is made more likely by your patch.
>
> FYI: that patch would've made it more likely for each backend to write
> out its *own* dirty pages of TOAST ... but the two other failures that I
> mentioned were for patches which wouldn't have affected this at all.

I think your patch made it more likely that a backend needing to flush a
buffer in order to fit its own data would be doing so in a buffer access
strategy IO context.

Your patch makes those TOAST table writes use a BAS_BULKWRITE strategy
(see GetBulkInsertState()), so when those backends look for buffers to
put their data in, they have to evict other data (theirs and others'),
but all of this is tracked under io_context = 'bulkwrite' -- and the
test only counted writes done in io_context 'normal'. But it is good
that your patch did that! It helped us to see that this test is not
reliable.

The other times this test failed in cfbot were for a patch that had many
failures and might have something wrong with its code, IIRC.

Thanks again for the report!

- Melanie




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-03-10 Thread Justin Pryzby
On Thu, Mar 09, 2023 at 11:43:01AM -0800, Andres Freund wrote:
> On 2023-03-09 06:51:31 -0600, Justin Pryzby wrote:
> > On Tue, Mar 07, 2023 at 10:18:44AM -0800, Andres Freund wrote:
> > > Hi,
> > > 
> > > On 2023-03-06 15:21:14 -0500, Melanie Plageman wrote:
> > > > Good point. Attached is what you suggested. I committed the transaction
> > > > before the drop table so that the statistics would be visible when we
> > > > queried pg_stat_io.
> > > 
> > > Pushed, thanks for the report, analysis and fix, Tom, Horiguchi-san, 
> > > Melanie.
> > 
> > There's a 2nd portion of the test that's still flapping, at least on
> > cirrusci.
> > 
> > The issue that Tom mentioned is at:
> >  SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
> > 
> > But what I've seen on cirrusci is at:
> >  SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
> 
> Seems you meant to copy a different line for Tom's (s/writes/reads/)?

Seems so

> > https://api.cirrus-ci.com/v1/artifact/task/6701069548388352/log/src/test/recovery/tmp_check/regression.diffs
> 
> Hm. I guess the explanation here is that the buffers were already all written
> out by another backend. Which is made more likely by your patch.

FYI: that patch would've made it more likely for each backend to write
out its *own* dirty pages of TOAST ... but the two other failures that I
mentioned were for patches which wouldn't have affected this at all.

-- 
Justin




Re: Ability to reference other extensions by schema in extension scripts

2023-03-10 Thread Tom Lane
"Regina Obe"  writes:
> [ 0005-Allow-use-of-extschema-reqextname-to-reference.patch ]

I took a look at this.  I'm on board with the feature design,
but not so much with this undocumented restriction you added
to ALTER EXTENSION SET SCHEMA:

+	/* If an extension requires this extension
+	 * do not allow relocation */
+	if (pg_depend->deptype == DEPENDENCY_NORMAL &&
+		pg_depend->classid == ExtensionRelationId)
+	{
+		dep.classId = pg_depend->classid;
+		dep.objectId = pg_depend->objid;
+		dep.objectSubId = pg_depend->objsubid;
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("cannot SET SCHEMA of extension %s because other extensions require it",
+						NameStr(extForm->extname)),
+				 errdetail("%s requires extension %s",
+						   getObjectDescription(&dep, false),
+						   NameStr(extForm->extname))));

That seems quite disastrous for usability, and it's making an assumption
unsupported by any evidence: that it will be a majority use-case for
dependent extensions to have used @extschema:myextension@ in a way that
would be broken by ALTER EXTENSION SET SCHEMA.

I think we should just drop this.  It might be worth putting in some
documentation notes about the hazard, instead.

If you want to work harder, perhaps a reasonable way to deal with
the issue would be to allow dependent extensions to declare that
they don't want your extension relocated.  But I do not think it's
okay to make that the default behavior, much less the only behavior.
And really, since we've gotten along without it so far, I'm not
sure that it's necessary to have it.

Another thing that's bothering me a bit is the use of
get_required_extension in execute_extension_script.  That does way
more than you really need, and passing a bunch of bogus parameter
values to it makes me uncomfortable.  The callers already have
the required extensions' OIDs at hand; it'd be better to add that list
to execute_extension_script's API instead of redoing the lookups.

regards, tom lane




Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

2023-03-10 Thread Melanie Plageman
On Thu, Mar 9, 2023 at 2:43 PM Andres Freund  wrote:
> On 2023-03-09 06:51:31 -0600, Justin Pryzby wrote:
> > On Tue, Mar 07, 2023 at 10:18:44AM -0800, Andres Freund wrote:
> > There's a 2nd portion of the test that's still flapping, at least on
> > cirrusci.
> >
> > The issue that Tom mentioned is at:
> >  SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
> >
> > But what I've seen on cirrusci is at:
> >  SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
>
> Seems you meant to copy a different line for Tom's (s/writes/reads/)?
>
>
> > https://api.cirrus-ci.com/v1/artifact/task/6701069548388352/log/src/test/recovery/tmp_check/regression.diffs
>
> Hm. I guess the explanation here is that the buffers were already all written
> out by another backend. Which is made more likely by your patch.
>
>
> I found a few more occurances and chatted with Melanie. Melanie will come up
> with a fix I think.

So, what this test is relying on is that either the checkpointer or
another backend will flush the pages of test_io_shared which we dirtied
above in the test. The test specifically checks for IOCONTEXT_NORMAL
writes. It could fail if some other backend is doing a bulkread or
bulkwrite and flushes these buffers first in a strategy context.
This will happen more often when shared buffers is small.

I tried to come up with a reliable test which was limited to
IOCONTEXT_NORMAL. I thought if we could guarantee a dirty buffer would
be pinned using a cursor, that we could then issue a checkpoint and
guarantee a flush that way. However, I don't see a way to guarantee that
no one flushes the buffer between dirtying it and pinning it with the
cursor.

So, I think our best bet is to just change the test to pass if there are
any writes in any context. By moving the sum(writes) before the INSERT
and keeping the checkpoint, we can guarantee that, one way or another,
some buffers will be flushed. This essentially covers the same code anyway.

Patch attached.

- Melanie
From 0d2904f2cf3b6cbf016e5701aaa2bc6997b505cc Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Fri, 10 Mar 2023 14:26:37 -0500
Subject: [PATCH v1] Stabilize pg_stat_io writes test

Counting writes only for io_context = 'normal' is unreliable, as
backends using a buffer access strategy could flush all of the dirty
buffers out from under the other backends and checkpointer. Change the
test to count writes in any context. This achieves roughly the same
coverage anyway.

Reported-by: Justin Pryzby 
Discussion: https://www.postgresql.org/message-id/ZAnWU8WbXEDjrfUE%40telsasoft.com
---
 src/test/regress/expected/stats.out | 8 
 src/test/regress/sql/stats.sql  | 9 -
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 186c296299..e90940f676 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1137,6 +1137,9 @@ SELECT pg_stat_get_subscription_stats(NULL);
 -- extends.
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation' \gset
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_io
+  WHERE io_object = 'relation' \gset io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -1155,15 +1158,12 @@ SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
 
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
 -- and fsyncs.
-SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
-  FROM pg_stat_io
-  WHERE io_context = 'normal' AND io_object = 'relation' \gset io_sum_shared_before_
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
-  WHERE io_context = 'normal' AND io_object = 'relation' \gset io_sum_shared_after_
+  WHERE io_object = 'relation' \gset io_sum_shared_after_
 SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
  ?column? 
 --
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index d7f873cfc9..b94410e49e 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -549,6 +549,9 @@ SELECT pg_stat_get_subscription_stats(NULL);
 -- extends.
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation' \gset
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_io
+  WHERE io_object = 'relation' \gset io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -558,16 +561,12 @@ SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
 
 -- After a checkpoint, there should 

Re: pgsql: Use ICU by default at initdb time.

2023-03-10 Thread Jeff Davis
On Fri, 2023-03-10 at 07:48 -0800, Jeff Davis wrote:
> On Fri, 2023-03-10 at 15:48 +0100, Peter Eisentraut wrote:
> > Yes, of course.  So we can't really do what I was thinking of.
> 
> OK, I plan to commit something like the patch in this thread soon. I
> just need to add an explanatory comment.

Committed a slightly narrower fix that derives the default encoding the
same way for both libc and ICU; except that ICU still uses UTF-8 for
C/POSIX/--no-locale (because ICU doesn't work with SQL_ASCII).

That seemed more consistent with the comments around
pg_get_encoding_from_locale() and it was also easier to document the -E
switch in initdb.

I'll keep an eye on the buildfarm to see if this fixes the problem or
causes other issues. But it seems like the right change.

Regards,
Jeff Davis





Re: RLS makes COPY TO process child tables

2023-03-10 Thread Tom Lane
Stephen Frost  writes:
> Yeah, that should also be updated.  Perhaps you'd send an updated patch
> which includes fixing that too and maybe adds clarifying documentation
> to COPY which mentions what happens when RLS is enabled on the relation?

I couldn't find anything in copy.sgml that seemed to need adjustment.
It already says

If row-level security is enabled for the table, the relevant SELECT
policies will apply to COPY table TO statements.

Together with the already-mentioned

COPY table TO copies the same rows as SELECT * FROM ONLY table.

I'd say that the behavior is very clearly specified already.  We just
need to make the code do what the docs say.  So I pushed the patch
without any docs changes.  I did add a test case, because I don't
like back-patching without something that proves that the issue is
relevant to, and corrected in, each branch.

regards, tom lane




Re: Memory leak from ExecutorState context?

2023-03-10 Thread Jehan-Guillaume de Rorthais
Hi,

> So I guess the best thing would be to go through these threads, see what
> the status is, restart the discussion and propose what to do. If you do
> that, I'm happy to rebase the patches, and maybe see if I could improve
> them in some way.

OK! It took me some time, but I did it. I'll try to sum up the situation as
simply as possible.

I reviewed the following threads:

* Out of Memory errors are frustrating as heck!
  2019-04-14 -> 2019-04-28
  
https://www.postgresql.org/message-id/flat/bc138e9f-c89e-9147-5395-61d51a757b3b%40gusw.net

  This discussion stalled, waiting for OP, but ideas there ignited all other
  discussions.

* accounting for memory used for BufFile during hash joins
  2019-05-04 -> 2019-09-10
  
https://www.postgresql.org/message-id/flat/20190504003414.bulcbnge3rhwhcsh%40development

  This was supposed to push forward a patch discussed in the previous thread,
  but it actually took over from it, and more ideas popped up there.

* Replace hashtable growEnable flag
  2019-05-15 -> 2019-05-16
  
https://www.postgresql.org/message-id/flat/CAB0yrekv%3D6_T_eUe2kOEvWUMwufcvfd15SFmCABtYFOkxCFdfA%40mail.gmail.com

  This one quickly merged to the next one.

* Avoiding hash join batch explosions with extreme skew and weird stats
  2019-05-16 -> 2020-09-24
  
https://www.postgresql.org/message-id/flat/CA%2BhUKGKWWmf%3DWELLG%3DaUGbcugRaSQbtm0tKYiBut-B2rVKX63g%40mail.gmail.com

  Another thread discussing another facet of the problem, which eventually
  ended up discussing / reviewing the BNLJ implementation.
  
Five possible fixes/ideas were discussed all over these threads:


1. "move BufFile stuff into separate context"
   last found patch: 2019-04-21
   
https://www.postgresql.org/message-id/20190421114618.z3mpgmimc3rmubi4%40development
   
https://www.postgresql.org/message-id/attachment/100822/0001-move-BufFile-stuff-into-separate-context.patch

   This patch helps with observability/debugging by allocating the bufFiles in
   the appropriate context instead of the "ExecutorState" one.

   I suppose this simple one has been forgotten in the fog of all the other
   discussions. Also, this is probably worth backpatching.

2. "account for size of BatchFile structure in hashJoin"
   last found patch: 2019-04-22
   
https://www.postgresql.org/message-id/20190428141901.5dsbge2ka3rxmpk6%40development
   
https://www.postgresql.org/message-id/attachment/100951/v2-simple-rebalance.patch

   This patch seems like a good first step:

   * it definitely helps older versions, where the other patches discussed
     are way too invasive to be backpatched
   * it doesn't get in the way of the other patches discussed

   While looking at the discussions around this patch, I was wondering if the
   planner considers the memory allocation of bufFiles. But of course, Melanie
   already noticed that long before I was aware of this problem and discussion:

   2019-07-10: «I do think that accounting for Buffile overhead when estimating
   the size of the hashtable during ExecChooseHashTableSize() so it can be
   used during planning is a worthwhile patch by itself (though I know it
   is not even part of this patch).»
   https://www.postgresql.org/message-id/CAAKRu_Yiam-%3D06L%2BR8FR%2BVaceb-ozQzzMqRiY2pDYku1VdZ%3DEw%40mail.gmail.com

   Tomas Vondra agreed with this in his answer, but no new version of the
   patch was produced.

   Finally, Melanie was pushing the idea to commit this patch regardless of
   the other pending patches/ideas:

   2019-09-05: «If Tomas or someone else has time to pick up and modify BufFile
   accounting patch, committing that still seems like the next logical
   step.»
   https://www.postgresql.org/message-id/CAAKRu_b6%2BjC93WP%2BpWxqK5KAZJC5Rmxm8uquKtEf-KQ%2B%2B1Li6Q%40mail.gmail.com

   Unless I'm wrong, no one downvoted this.
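
   To make the cost concrete, here is a standalone illustration (not
   PostgreSQL code), assuming each batch keeps an inner and an outer BufFile
   open with one BLCKSZ-sized (8 kB) buffer each:

   #include <stdio.h>

   #define BUFFILE_BUFFER_BYTES 8192

   int
   main(void)
   {
       long    work_mem_bytes = 4L * 1024 * 1024;  /* 4 MB budget */

       for (int nbatch = 1; nbatch <= 65536; nbatch *= 2)
       {
           /* two BufFiles (inner + outer) per batch */
           long    overhead = 2L * nbatch * BUFFILE_BUFFER_BYTES;

           printf("nbatch=%6d  BufFile overhead=%10ld bytes%s\n",
                  nbatch, overhead,
                  overhead > work_mem_bytes ? "  (exceeds budget)" : "");
       }
       return 0;
   }

   With a 4 MB budget, the BufFile buffers alone consume the entire budget by
   256 batches, which is exactly the overhead this accounting patch would
   make visible.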

3. "per slice overflow file"
   last found patch: 2019-05-08
   
https://www.postgresql.org/message-id/20190508150844.rij36rtuk4lhvztw%40development
   
https://www.postgresql.org/message-id/attachment/101080/v4-per-slice-overflow-file-20190508.patch

   This patch has been withdrawn after an off-list discussion with Thomas Munro
   because of a missing parallel hashJoin implementation. Plus, before any
   effort started on the parallel implementation, the BNLJ idea appeared and
   seemed more appealing.

   See:
   
https://www.postgresql.org/message-id/20190529145517.sj2poqmb3cr4cg6w%40development

   At the time, it still seemed to have some interest despite the BNLJ patch:

   2019-07-10: «If slicing is made to work for parallel-aware hashjoin and the
   code is in a committable state (and probably has the threshold I mentioned
 above), then I think that this patch should go in.»
   
https://www.postgresql.org/message-id/CAAKRu_Yiam-%3D06L%2BR8FR%2BVaceb-ozQzzMqRiY2pDYku1VdZ%3DEw%40mail.gmail.com

   But Tomas may have since moved away from this:

   2019-09-10: «I have to admit I kinda lost track [...] My feeling is that we
 should get the BNLJ committed 

Re: pg_usleep for multisecond delays

2023-03-10 Thread Tom Lane
Nathan Bossart  writes:
> On Fri, Feb 10, 2023 at 10:51:20AM -0800, Nathan Bossart wrote:
>> On Fri, Feb 10, 2023 at 10:18:34AM -0500, Tom Lane wrote:
>>> BTW, we have an existing pg_sleep() function that deals with all
>>> of this except re-setting the latch.

> Here is a work-in-progress patch.  I quickly scanned through my previous
> patch and applied this new function everywhere it seemed safe to use (which
> unfortunately is rather limited).

A quick grep for pg_usleep with large intervals finds rather more
than you touched:

contrib/auth_delay/auth_delay.c: 46:	pg_usleep(1000L * auth_delay_milliseconds);
src/backend/access/nbtree/nbtpage.c: 2979:	pg_usleep(5000000L);
src/backend/access/transam/xlog.c: 5109:	pg_usleep(60000000L);
src/backend/libpq/pqcomm.c: 717:	pg_usleep(100000L);	/* wait 0.1 sec */
src/backend/postmaster/bgwriter.c: 199:	pg_usleep(1000000L);
src/backend/postmaster/checkpointer.c: 313:	pg_usleep(1000000L);
src/backend/postmaster/checkpointer.c: 1009:	pg_usleep(100000L);	/* wait 0.1 sec, then retry */
src/backend/postmaster/postmaster.c: 4295:	pg_usleep(PreAuthDelay * 1000000L);
src/backend/postmaster/walwriter.c: 192:	pg_usleep(1000000L);
src/backend/postmaster/bgworker.c: 762:	pg_usleep(PostAuthDelay * 1000000L);
src/backend/postmaster/pgarch.c: 456:	pg_usleep(1000000L);
src/backend/postmaster/pgarch.c: 488:	pg_usleep(1000000L);	/* wait a bit before retrying */
src/backend/postmaster/autovacuum.c: 444:	pg_usleep(PostAuthDelay * 1000000L);
src/backend/postmaster/autovacuum.c: 564:	pg_usleep(1000000L);
src/backend/postmaster/autovacuum.c: 690:	pg_usleep(1000000L);	/* 1s */
src/backend/postmaster/autovacuum.c: 1711:	pg_usleep(PostAuthDelay * 1000000L);
src/backend/storage/ipc/procarray.c: 3799:	pg_usleep(100 * 1000L);	/* 100ms */
src/backend/utils/init/postinit.c: 985:	pg_usleep(PostAuthDelay * 1000000L);
src/backend/utils/init/postinit.c: 1192:	pg_usleep(PostAuthDelay * 1000000L);

Did you have reasons for excluding the rest of these?

Taking a step back, I think it might be a mistake to try to share code
with the SQL-exposed function; at least, that is causing some API
decisions that feel odd.  I have mixed emotions about both the use
of double as the sleep-time datatype, and about the choice of seconds
(rather than, say, msec) as the unit.  Admittedly this is not all bad
--- for example, several of the calls I list above are delaying for
100ms, which we can easily accommodate in this scheme as "0.1", and
maybe it'd be a good idea to hit up the stuff waiting for 10ms too.
It still seems unusual for an internal support function though.
I haven't done the math on it, but are we limiting the precision
of the sleep (due to roundoff error) in any way that would matter?

A bigger issue is that we almost certainly ought to pass through
a wait event code instead of allowing all these cases to be
WAIT_EVENT_PG_SLEEP.
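
A minimal sketch of what such a function might look like, assuming the
backend's latch API (the name and flag set here are illustrative, not the
patch's):

#include "postgres.h"
#include "storage/latch.h"

/* Sleep for delay_ms, waking early on latch set; report the caller's event. */
static void
pg_sleep_ms(long delay_ms, uint32 wait_event_info)
{
	(void) WaitLatch(MyLatch,
					 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
					 delay_ms,
					 wait_event_info);
	ResetLatch(MyLatch);
}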

I'd skip the unconditional latch reset you added to pg_sleep_sql.
I realize that's bug-compatible with what happens now, but I still
think it's expending extra code to do what might well be the
wrong thing.

We should update the header comment for pg_usleep to direct people
to this new function.

regards, tom lane




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-03-10 Thread Chris Travers
"Right, the improvement this patch gives to the heap is not the full 
motivation. Another motivation is the improvement it gives to TableAM API. Our 
current API implies that the effort on locating the tuple by tid is small. This 
is more or less true for the heap, where we just need to pin and lock the 
buffer. But imagine other TableAM implementations, where locating a tuple is 
more expensive."

Yeah. Our TableAM API is a very nice start toward pluggable storage, but we 
still have a long way to go before we can really provide a wide variety of 
pluggable storage engines.

In particular, the following approaches are likely to have much more expensive 
tid lookups:
 - columnar storage (may require a lot of random IO to reconstruct a tuple)
 - index-oriented storage (tid no longer physically locatable in the file via
   seek)
 - compressed cold storage like pg_cryogen (again, seek may be problematic).

To my mind, the performance benefits are a nice side benefit; the main 
interest I have in this is the improvements to the TableAM capabilities.  I 
cannot see how to do this without a lot more infrastructure.

Re: RLS makes COPY TO process child tables

2023-03-10 Thread Tom Lane
Stephen Frost  writes:
>> Tom Lane  wrote:
>>> Yugo NAGATA  writes:
>>>> I think this is a bug because the current behaviour is different from
>>>> the documentation.

>>> I agree, it shouldn't do that.

> Yeah, I agree based on what the COPY table TO docs say should be
> happening.

Yeah, the documentation is quite clear that child data is not included.

> I'm not sure if this makes good sense to back-patch.

I think we have to.  The alternative is to back-patch some very confusing
documentation changes saying "never mind all that if RLS is on".

regards, tom lane




Re: pgsql: Use ICU by default at initdb time.

2023-03-10 Thread Jeff Davis
On Fri, 2023-03-10 at 15:48 +0100, Peter Eisentraut wrote:
> Yes, of course.  So we can't really do what I was thinking of.

OK, I plan to commit something like the patch in this thread soon. I
just need to add an explanatory comment.

It passes CI, but it's possible that there could be more buildfarm
failures that I'll need to look at afterward, so I'll count this as a
"trial fix".

Regards,
Jeff Davis






Re: Date-Time dangling unit fix

2023-03-10 Thread Tom Lane
Alexander Lakhin  writes:
> I also wonder how the units affect time zone parsing.
> With the patch:
> SELECT time with time zone '010203m+3';
> ERROR:  invalid input syntax for type time with time zone: "010203m+3"
> But without the patch:
> SELECT time with time zone '010203m+3';
>   01:02:03+03

Yeah, I think this is the ptype-still-set-at-end-of-loop check.
I'm fine with throwing an error for that.

> Though with "non-unit" spec:
> SELECT time with time zone '010203mmm+3';
>   01:02:03-03
> (With or without the patch.)
> It seems like "units" were just ignored in a time zone specification,
> but now they are rejected.

I think it's reading "mmm+3" as a POSIX timezone spec.  From memory,
POSIX allows any sequence of 3 or more letters as a zone abbreviation.
It looks like we're being lax and not enforcing the "3 or more" part:

regression=# set time zone 'foobar+3';
SET
regression=# select timeofday();
               timeofday
----------------------------------------
 Fri Mar 10 12:08:24.484853 2023 FOOBAR
(1 row)

regression=# set time zone 'fo+3';
SET
regression=# select timeofday();
             timeofday
------------------------------------
 Fri Mar 10 12:08:38.207311 2023 FO
(1 row)

regards, tom lane




Re: pgsql: Use ICU by default at initdb time.

2023-03-10 Thread Peter Eisentraut

On 10.03.23 15:38, Jeff Davis wrote:
> On Fri, 2023-03-10 at 10:59 +0100, Peter Eisentraut wrote:
>> I think originally the locale forced the encoding.  With ICU, we have a
>> choice.  We could either stick to the encoding suggested by the OS, or
>> pick our own.
>
> We still need LC_COLLATE and LC_CTYPE to match the database encoding
> though. If we get those from the environment (which are connected to an
> encoding), then I think we need to get the encoding from the
> environment, too, right?

Yes, of course.  So we can't really do what I was thinking of.





Re: pgsql: Use ICU by default at initdb time.

2023-03-10 Thread Jeff Davis
On Fri, 2023-03-10 at 10:59 +0100, Peter Eisentraut wrote:
> I think originally the locale forced the encoding.  With ICU, we have a
> choice.  We could either stick to the encoding suggested by the OS, or
> pick our own.

We still need LC_COLLATE and LC_CTYPE to match the database encoding
though. If we get those from the environment (which are connected to an
encoding), then I think we need to get the encoding from the
environment, too, right?

> Arguably, if we are going to nudge toward ICU, maybe we should nudge 
> toward UTF-8 as well.

The OSes are already doing a pretty good job of that. Regardless, we
need to remove the dependence on LC_CTYPE and LC_COLLATE when the
provider is ICU first (we're close to that point but not quite there).

Regards,
Jeff Davis





Re: [PATCH] Add pretty-printed XML output option

2023-03-10 Thread Tom Lane
Jim Jones  writes:
> On 09.03.23 21:21, Tom Lane wrote:
>> I've looked through this now, and have some minor complaints and a major
>> one.  The major one is that it doesn't work for XML that doesn't satisfy
>> IS DOCUMENT.  For example,

> How do you suggest the output should look?

I'd say a series of node trees, each starting on a separate line.

>> I also suspect that it's outright broken to attach a header claiming
>> the data is now in UTF8 encoding.  If the database encoding isn't
>> UTF8, then either that's a lie or we now have an encoding violation.

> I was mistakenly calling xml_parse with GetDatabaseEncoding(). It now 
> uses the encoding of the given doc and UTF8 if not provided.

Mmmm  doing this differently from what we do elsewhere does not
sound like the right path forward.  The input *is* (or had better be)
in the database encoding.

regards, tom lane




Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-03-10 Thread Masahiko Sawada
On Fri, Mar 10, 2023 at 3:42 PM John Naylor
 wrote:
>
> On Thu, Mar 9, 2023 at 1:51 PM Masahiko Sawada  wrote:
>
> > I've attached the new version patches. I merged improvements and fixes
> > I did in the v29 patch.
>
> I haven't yet had a chance to look at those closely, since I've had to devote 
> time to other commitments. I remember I wasn't particularly impressed that 
> v29-0008 mixed my requested name-casing changes with a bunch of other random 
> things. Separating those out would be an obvious way to make it easier for me 
> to look at, whenever I can get back to this. I need to look at the iteration 
> changes as well, in addition to testing memory measurement (thanks for the 
> new results, they look encouraging).

Okay, I'll separate them again.

>
> > Apart from the memory measurement stuff, I've done another todo item
> > on my list; adding min max classes for node3 and node125. I've done
>
> This didn't help us move us closer to something committable the first time 
> you coded this without making sure it was a good idea. It's still not helping 
> and arguably makes it worse. To be fair, I did speak positively about 
> _considering_ additional size classes some months ago, but that has a very 
> obvious maintenance cost, something we can least afford right now.
>
> I'm frankly baffled you thought this was important enough to work on again, 
> yet thought it was a waste of time to try to prove to ourselves that 
> autovacuum in a realistic, non-deterministic workload gave the same answer as 
> the current tid lookup. Even if we had gone that far, it doesn't seem like a 
> good idea to add non-essential code to critical paths right now.

I didn't think that proving that tidstore and the current tid lookup
return the same result was a waste of time. I've shared a patch to do
that in tidstore before. I agreed not to add it to the tree, but we can
test that using this patch. Actually, I've run a test with a pgbench
workload for a few days.

IIUC it's still important to consider whether to have node1, since it
could be a good alternative to path compression. The prototype also
implemented it. Of course, we can leave it for future improvement. But
considering this item together with the performance tests helps us prove
that our decoupling approach is promising.

> We're rapidly running out of time, and we're at the point in the cycle where 
> it's impossible to get meaningful review from anyone not already intimately 
> familiar with the patch series. I only want to see progress on addressing 
> possible (especially architectural) objections from the community, because if 
> they don't notice them now, they surely will after commit.

Right, we've been making many design decisions. Some of them were agreed
on just between you and me, and some were agreed on with other hackers.
There are some irreversible design decisions due to the remaining time.

>  I have my own list of possible objections as well as bikeshedding points, 
> which I'll clean up and share next week.

Thanks.

>  I plan to invite Andres to look at that list and give his impressions, 
> because it's a lot quicker than reading the patches. Based on that, I'll 
> hopefully be able to decide whether we have enough time to address any 
> feedback and do remaining polishing in time for feature freeze.
>
> I'd suggest sharing your todo list in the meanwhile, it'd be good to discuss 
> what's worth doing and what is not.

Apart from more rounds of reviews and tests, my todo items that need
discussion and possibly implementation are:

* The memory measurement in radix trees and the memory limit in
tidstores. I've implemented it in v30-0007 through 0009 but we need to
review it. This is the highest priority for me.

* Additional size classes. It's important for an alternative of path
compression as well as supporting our decoupling approach. Middle
priority.

* Node shrinking support. Low priority.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-10 Thread Önder Kalacı
Hi Amit, all


> I'll look for more opportunities and reply to the thread. I wanted to send
> this mail so that you can have a look at (1) earlier.
>
>
> I merged SUBSCRIPTION CREATE/DROP INDEX WORKS WITHOUT ISSUES
> into SUBSCRIPTION CAN USE INDEXES WITH EXPRESSIONS AND COLUMNS.

Also, I merged SUBSCRIPTION USES INDEX WITH PUB/SUB DIFFERENT DATA and
A UNIQUE INDEX THAT IS NOT PRIMARY KEY OR REPLICA IDENTITY.

So, we have 6 test cases left. I'm starting to feel that trying to merge
further is going to make the readability worse. Do you have any further
easy test case merge suggestions?

I think one option could be to drop some cases altogether, but not sure
we'd want that.

As a semi-related question: Are you aware of any setting that'd
make pg_stat_all_indexes
reflect the changes sooner? It is hard to debug what is the bottleneck in
the tests, but
I have a suspicion that there might be several poll_query_until() calls on
pg_stat_all_indexes, which might be the reason?

Attaching v43.


v43_0002_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


v43_0001_Skip-dropped_columns_for_tuples_equal.patch
Description: Binary data


Re: [PATCH] Add pretty-printed XML output option

2023-03-10 Thread Jim Jones

Thanks a lot for the review!

On 09.03.23 21:21, Tom Lane wrote:
> I've looked through this now, and have some minor complaints and a major
> one.  The major one is that it doesn't work for XML that doesn't satisfy
> IS DOCUMENT.  For example,
>
> regression=# select '42'::xml is document;
>  ?column?
> ----------
>  f
> (1 row)
>
> regression=# select xmlserialize (content '42' as text);
>  xmlserialize
> --------------
>  42
> (1 row)
>
> regression=# select xmlserialize (content '42' as text indent);
> ERROR:  invalid XML document
> DETAIL:  line 1: Extra content at the end of the document
> 42
>    ^


I assumed it should fail because the XML string isn't singly rooted.
Oracle has this feature implemented and it does not seem to allow
non-singly-rooted strings either [1]. Also, some of the tools I use fail
in this case too [2,3].

How do you suggest the output should look? Does the SQL spec also define
it? I can't find it online :(



> This is not what the documentation promises, and I don't think it's
> good enough --- the SQL spec has no restriction saying you can't
> use INDENT with CONTENT.  I tried adjusting things so that we call
> xml_parse() with the appropriate DOCUMENT or CONTENT xmloption flag,
> but all that got me was empty output (except for a document header).
> It seems like xmlDocDumpFormatMemory is not the thing to use, at least
> not in the CONTENT case.  But libxml2 has a few other "dump"
> functions, so maybe we can use a different one?  I see we are using
> xmlNodeDump elsewhere, and that has a format option, so maybe there's
> a way forward there.
>
> A lesser issue is that INDENT tacks on a document header (XML declaration)
> whether there was one or not.  I'm not sure whether that's an appropriate
> thing to do in the DOCUMENT case, but it sure seems weird in the CONTENT
> case.  We have code that can strip off the header again, but we
> need to figure out exactly when to apply it.
I replaced xmlDocDumpFormatMemory with xmlSaveToBuffer and used the
option XML_SAVE_NO_DECL for input docs without an XML declaration. It no
longer returns an XML declaration if the input doc does not contain one.
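
A rough sketch of that approach against the libxml2 save API (illustrative
shape only; the patch's actual code differs):

#include <libxml/parser.h>
#include <libxml/xmlsave.h>

/*
 * Pretty-print an already-parsed doc.  XML_SAVE_FORMAT turns on
 * indentation; XML_SAVE_NO_DECL suppresses the XML declaration when the
 * input carried none.
 */
static xmlChar *
indent_xml(xmlDocPtr doc, int had_declaration)
{
	xmlBufferPtr buf = xmlBufferCreate();
	int			opts = XML_SAVE_FORMAT | (had_declaration ? 0 : XML_SAVE_NO_DECL);
	xmlSaveCtxtPtr ctxt = xmlSaveToBuffer(buf, "UTF-8", opts);
	xmlChar    *result = NULL;

	if (ctxt != NULL)
	{
		long		ok = xmlSaveDoc(ctxt, doc);

		xmlSaveClose(ctxt);		/* flushes the context into buf */
		if (ok >= 0)
			result = xmlStrdup(xmlBufferContent(buf));
	}
	xmlBufferFree(buf);
	return result;				/* NULL on failure */
}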

> I also suspect that it's outright broken to attach a header claiming
> the data is now in UTF8 encoding.  If the database encoding isn't
> UTF8, then either that's a lie or we now have an encoding violation.

I was mistakenly calling xml_parse with GetDatabaseEncoding(). It now
uses the encoding of the given doc, and UTF8 if none is provided.

> Another thing that's mildly irking me is that the current
> factorization of this code will result in xml_parse'ing the data
> twice, if you have both DOCUMENT and INDENT specified.  We could
> consider avoiding that if we merged the indentation functionality
> into xmltotext_with_xmloption, but it's probably premature to do so
> when we haven't figured out how to get the output right --- we might
> end up needing two xml_parse calls anyway with different parameters,
> perhaps.
>
> I also had a bunch of cosmetic complaints (mostly around this having
> a bad case of add-at-the-end-itis), which I've cleaned up in the
> attached v20.  This doesn't address any of the above, however.

I swear to god I have no idea what "add-at-the-end-itis" means :)

> regards, tom lane


Thanks a lot!

Best, Jim

1 - https://dbfiddle.uk/WUOWtjBU

2 - https://www.samltool.com/prettyprint.php

3 - https://xmlpretty.com/xmlpretty
From 5d522d8ec1bd01731d0f75a4163f9a8ad435bee6 Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Fri, 10 Mar 2023 13:47:16 +0100
Subject: [PATCH v21] Add pretty-printed XML output option

This patch implements the XML/SQL:2011 feature 'X069, XMLSERIALIZE: INDENT.'
It adds the options INDENT and NO INDENT (default) to the existing
xmlserialize function. It uses the indentation feature of xmlSaveToBuffer
from libxml2 to indent XML strings - XML_SAVE_FORMAT. Although the INDENT
feature is designed to work with XML strings of type DOCUMENT, this
implementation also allows the usage of CONTENT type strings as long as they
contain a well-formed, singly-rooted XML document - note the
XMLOPTION_DOCUMENT in the xml_parse call.

This patch also includes documentation, regression tests and their three
possible output files xml.out, xml_1.out and xml_2.out.
---
 doc/src/sgml/datatype.sgml|   8 +-
 src/backend/catalog/sql_features.txt  |   2 +-
 src/backend/executor/execExprInterp.c |   9 +-
 src/backend/parser/gram.y |  14 ++-
 src/backend/parser/parse_expr.c   |   1 +
 src/backend/utils/adt/xml.c   |  74 +++
 src/include/nodes/parsenodes.h|   1 +
 src/include/nodes/primnodes.h |   4 +-
 src/include/parser/kwlist.h   |   1 +
 src/include/utils/xml.h   |   1 +
 src/test/regress/expected/xml.out | 129 ++
 src/test/regress/expected/xml_1.out   |  85 +
 src/test/regress/expected/xml_2.out   | 129 ++
 

Re: Add LZ4 compression in pg_dump

2023-03-10 Thread Justin Pryzby
On Thu, Mar 09, 2023 at 06:58:20PM +0100, Tomas Vondra wrote:
> I'm a bit confused about the lz4 vs. lz4f stuff, TBH. If we switch to
> lz4f, doesn't that mean it (e.g. restore) won't work on systems that
> only have older lz4 version? What would/should happen if we take backup
> compressed with lz4f, an then try restoring it on an older system where
> lz4 does not support lz4f?

You seem to be thinking about LZ4F as a weird, new innovation I'm
experimenting with, but compress_lz4.c already uses LZ4F for its "file"
API.  LZ4F is also what's written by the lz4 CLI tool, and I found that
LZ4F has been included in the library for ~8 years:

https://github.com/lz4/lz4/releases?page=2
r126 Dec 24, 2014
New : lz4frame API is now integrated into liblz4

> Maybe if lz4f format is incompatible with regular lz4, we should treat
> it as a separate compression method 'lz4f'?
> 
> I'm mostly afk until the end of the week, but I tried searching for lz4f
> info - the results are not particularly enlightening, unfortunately.
> 
> AFAICS this only applies to lz4f stuff. Or would the streaming mode be a
> breaking change too?

Streaming mode outputs the same format as the existing code, but gives
better compression.  We could (theoretically) change it in a bugfix
release, and old output would still be restorable (I think new output
would even be restorable with the old versions of pg_restore).

But that's not true for LZ4F.  The benefit there is that it avoids
outputting a separate block for each row.  That's essential for narrow
tables, for which the block header currently being written has an
overhead several times larger than the data.
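
For reference, the frame format takes only a couple of calls to produce; a
minimal sketch against the lz4frame API (not pg_dump code):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <lz4frame.h>

int
main(void)
{
	const char *src = "narrow-table rows, many small values ...";
	size_t		src_len = strlen(src);
	size_t		bound = LZ4F_compressFrameBound(src_len, NULL);
	char	   *dst = malloc(bound);

	/* one self-describing frame instead of one block header per row */
	size_t		n = LZ4F_compressFrame(dst, bound, src, src_len, NULL);

	if (LZ4F_isError(n))
	{
		fprintf(stderr, "lz4f: %s\n", LZ4F_getErrorName(n));
		return 1;
	}
	printf("compressed %zu bytes into a single %zu-byte frame\n", src_len, n);
	free(dst);
	return 0;
}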

-- 
Justin




Re: Add standard collation UNICODE

2023-03-10 Thread Peter Eisentraut

On 09.03.23 20:23, Jeff Davis wrote:
> On Thu, 2023-03-09 at 11:21 +0100, Peter Eisentraut wrote:
>> How about this patch version?
>
> Looks good to me.

Committed, after adding a test.





Re: psql: Add role's membership options to the \du+ command

2023-03-10 Thread Pavel Luzanov

On 08.03.2023 00:02, David G. Johnston wrote:
> FWIW I've finally gotten to publishing my beta version of the Role
> Graph for PostgreSQL pseudo-extension I'd been talking about:
>
> https://github.com/polobo/RoleGraphForPostgreSQL

Great. So far I've looked very briefly, but it's interesting.

> I handle EMPTY explicitly in the Role Graph but as I noted somewhere
> in my comments, it really shouldn't be possible to leave the database
> in that state.  Do we need to bug Robert on this directly or do you
> plan to have a go at it?

I don't plan to do that. Right now I don't have enough time and
experience. This requires an experienced developer.

--
Pavel Luzanov
Postgres Professional: https://postgrespro.com


RE: Rework LogicalOutputPluginWriterUpdateProgress

2023-03-10 Thread Takamichi Osumi (Fujitsu)
Hi,


On Friday, March 10, 2023 6:32 PM Wang, Wei/王 威  wrote:
> Attach the new patch set.
Thanks for updating the patch! One review comment on v7-0005.

Unlike the other write wrapper functions, stream_start_cb_wrapper and
stream_stop_cb_wrapper don't call the pair of threshold check and
UpdateProgressAndKeepalive, as shown below. But both of them write some data
to the output plugin and set the did_write flag, and thus update the
subscriber's last_recv_timestamp, which is used for the timeout check in
LogicalRepApplyLoop. So it looks like adding the pair to both functions would
be more accurate, in order to reset the changes_count counter on the
publisher?

@@ -1280,6 +1282,8 @@ stream_start_cb_wrapper(ReorderBuffer *cache, 
ReorderBufferTXN *txn,

/* Pop the error context stack */
error_context_stack = errcallback.previous;
+
+   /* No progress has been made, so don't call UpdateProgressAndKeepalive 
*/
 }


Best Regards,
Takamichi Osumi



Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-10 Thread Amit Kapila
On Fri, Mar 10, 2023 at 5:16 PM Önder Kalacı  wrote:
>
>>
>> wip_for_optimize_index_column_match
>> +static bool
>> +IndexContainsAnyRemoteColumn(IndexInfo  *indexInfo,
>> + LogicalRepRelation  *remoterel)
>> +{
>> + for (int i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
>> + {
>>
>> Wouldn't it be better to just check if the first column is not part of
>> the remote column then we can skip that index?
>
>
> Reading [1], I think I can follow what you suggest. So, basically,
> if the leftmost column is not filtered, we have the following:
>
>>  but the entire index would have to be scanned, so in most cases the planner 
>> would prefer a sequential table scan over using the index.
>
>
> So, in our case, we could follow a similar approach. If the leftmost column 
> of the index
> is not sent over the wire from the pub, we can prefer the sequential scan.
>
> Is my understanding of your suggestion accurate?
>

Yes. I request an opinion from Shi-san, who reported the problem.
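
In code form, the suggestion might amount to something like this hypothetical
sketch (AttrNumberIsRemote() is an assumed helper, not an existing function):

static bool
IndexFirstColumnIsUsable(IndexInfo *indexInfo, LogicalRepRelation *remoterel)
{
	AttrNumber	first = indexInfo->ii_IndexAttrNumbers[0];

	/* attno 0 means an expression column; it cannot match a remote column */
	if (first == InvalidAttrNumber)
		return false;

	/* usable only when the leftmost key column is sent by the publisher */
	return AttrNumberIsRemote(first, remoterel);
}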

-- 
With Regards,
Amit Kapila.




Re: psql: Add role's membership options to the \du+ command

2023-03-10 Thread Pavel Luzanov

On 08.03.2023 05:31, David G. Johnston wrote:
> Moving the goal posts for this meta-command to >= 9.5 seems like it
> should be done as a separate patch and thread.  The documentation
> presently states we are targeting 9.2 and newer.

I missed the comment at the beginning of the file about version 9.2. I
will return the version check for rolbypassrls.



My suggestion for the docs is below.



+        <para>
+        Shown within each row, in newline-separated format, are the
+        memberships granted to
+        the role.  The presentation includes both the name of the grantor
+        as well as the membership permissions (in an abbreviated format:
+        <literal>a</literal> for admin option, <literal>i</literal>
+        for inherit option,
+        <literal>s</literal> for set option.) The word
+        <literal>empty</literal> is printed in
+        the case that none of those permissions are granted.
+        See the <link linkend="sql-grant"><command>GRANT</command></link>
+        command for their meaning.
+        </para>
+
+        <para>
+        If the form <literal>\dg+</literal> is used the comment
+        attached to the role is shown.
+        </para>


Thanks. I will replace the description with this one.

I would suggest tweaking the test output to include regress_du_admin 
and also to make regress_du_admin a CREATEROLE role with LOGIN.


Ok.

Thank you for the review. I will definitely work on the new version, but 
unfortunately it will most likely happen after March 20.


--
Pavel Luzanov
Postgres Professional: https://postgrespro.com


RE: Time delayed LR (WAS Re: logical replication restrictions)

2023-03-10 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

Based on the discussion, in which Sawada-san pointed out[1] that the current
approach to time-delayed logical replication prevents WAL files from being
recycled, I'm planning to close the CF entry for now. This thread, or a
forked one, will be registered again after we decide on an alternative
approach. Thank you very much for taking the time to join our discussions so
far.

I think that to solve the issue, logical changes must be flushed on the
subscriber once, and workers should apply the changes after the specified
time has passed. The straightforward approach for this is to follow physical
replication: introduce a walreceiver process on the subscriber. We must
research more, but at least there are some benefits:

* The publisher can be shut down even if the apply worker is stuck. Getting
  stuck is more likely here than in physical replication, so this may improve
  robustness. For more detail, please see another thread[2].
* In the case of synchronous_commit = 'remote_write', the publisher can
  COMMIT faster. This is because the walreceiver will flush changes
  immediately and reply soon. Even if the time delay is enabled, the wait
  time will not increase.
* It may be used as infrastructure for parallel apply of non-streaming
  transactions. The basic designs are similar: one process receives changes
  and others apply them.

I searched old discussions [3] and wiki pages, and found that the initial
prototype had a logical walreceiver, but in a later version [4] the apply
worker received changes directly. I could not find the reason for that
decision, but I suspect it was one of the following. Could you please tell me
the correct background?

* Performance bottlenecks. If the walreceiver flushes changes and the worker
  applies them, fsync() is called for every reception.
* Complexity. In this design the walreceiver and the apply worker must share
  the progress of flush/apply. For crash recovery, more consideration is
  needed. A related discussion can be found in [5].
* Extensibility. In-core logical replication should serve as a sample for
  external projects. The apply worker is just a background worker that can be
  launched from an extension, so it can be easily understood. If it depended
  deeply on the walreceiver, other projects could not follow it.

[1]: 
https://www.postgresql.org/message-id/CAD21AoAeG2%2BRsUYD9%2BmEwr8-rrt8R1bqpe56T2D%3DeuO-Qs-GAg%40mail.gmail.com
[2]: 
https://www.postgresql.org/message-id/flat/TYAPR01MB586668E50FC2447AD7F92491F5E89%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[3]: 
https://www.postgresql.org/message-id/201206131327.24092.andres%402ndquadrant.com
[4]: 
https://www.postgresql.org/message-id/37e19ad5-f667-2fe2-b95b-bba69c5b6...@2ndquadrant.com
[5]: 
https://www.postgresql.org/message-id/1339586927-13156-12-git-send-email-andres%402ndquadrant.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-10 Thread Önder Kalacı
Hi Amit, all


> wip_for_optimize_index_column_match
> +static bool
> +IndexContainsAnyRemoteColumn(IndexInfo  *indexInfo,
> + LogicalRepRelation  *remoterel)
> +{
> + for (int i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
> + {
>
> Wouldn't it be better to just check if the first column is not part of
> the remote column then we can skip that index?
>

Reading [1], I think I can follow what you suggest. So, basically,
if the leftmost column is not filtered, we have the following:

> but the entire index would have to be scanned, so in most cases the
> planner would prefer a sequential table scan over using the index.


So, in our case, we could follow a similar approach. If the leftmost column
of the index
is not sent over the wire from the pub, we can prefer the sequential scan.

Is my understanding of your suggestion accurate?


>
> In wip_optimize_for_non_pkey_non_ri_unique_index patch, irrespective
> of whether we want to retain this set of changes, the function name
> IsIdxSafeToSkipDuplicates() sounds better than
> IdxIsRelationIdentityOrPK() and we can even change the check to
> GetRelationIdentityOrPK() instead of separate checks replica index and
> PK. So, it would be better to include this part of the change (a.
> change the function name to IsIdxSafeToSkipDuplicates() and (b) change
> the check to use GetRelationIdentityOrPK()) in the main patch.
>
>
>
I agree that it is a good change. Added to v42.

Thanks,
Onder KALACI



[1] https://www.postgresql.org/docs/current/indexes-multicolumn.html


v42_0001_Skip-dropped_columns_for_tuples_equal.patch
Description: Binary data


v42_0002_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data


Re: Minimal logical decoding on standbys

2023-03-10 Thread Drouvot, Bertrand

Hi,

On 3/8/23 11:25 AM, Drouvot, Bertrand wrote:

Hi,

On 3/3/23 5:26 PM, Drouvot, Bertrand wrote:

Hi,

On 3/3/23 8:58 AM, Jeff Davis wrote:

On Thu, 2023-03-02 at 11:45 -0800, Jeff Davis wrote:

In this case it looks easier to add the right API than to be sure
about
whether it's needed or not.


I attached a sketch of one approach. 


Oh, that's very cool, thanks a lot!


I'm not very confident that it's
the right API or even that it works as I intended it, but if others
like the approach I can work on it some more.



I'll look at it early next week.



So, I took your patch and, as an example, tried a quick integration in 0004
(see 0004_new_API.txt attached) to put it in the logical-decoding-on-standby
context.

Based on this, I have 3 comments:

- Maybe ConditionVariableEventSleep() should take care of the
"WaitEventSetWait returns 1 and cvEvent.event == WL_POSTMASTER_DEATH" case?

- Maybe ConditionVariableEventSleep() could accept and deal with the CV being
NULL? I used that in the attached POC to handle the case of logical decoding
on the primary server. One option would be to create a dedicated CV for that
case, though.

- In the attached POC I had to add the extra condition
"(cv && !RecoveryInProgress())" to avoid waiting on the timeout when there is
a promotion. That makes me think that we may want to add 2 extra parameters
(as 2 functions returning a bool?) to ConditionVariableEventSleep() to check
whether we still want to test the socket or the CV wake-up in each loop
iteration.

Also 3 additional remarks:

1) About InitializeConditionVariableWaitSet() and ConditionVariableWaitSetCreate(): I'm 
not sure about the naming as there is no CV yet (they "just" deal with 
WaitEventSet).

So, what about renaming?

+static WaitEventSet *ConditionVariableWaitSet = NULL;

to say, "LocalWaitSet" and then rename ConditionVariableWaitSetLatchPos, 
InitializeConditionVariableWaitSet() and ConditionVariableWaitSetCreate() accordingly?

But it might be not needed (see 3) below).

2)

  /*
   * Prepare to wait on a given condition variable.
   *
@@ -97,7 +162,8 @@ ConditionVariablePrepareToSleep(ConditionVariable *cv)
  void
  ConditionVariableSleep(ConditionVariable *cv, uint32 wait_event_info)
  {
-   (void) ConditionVariableTimedSleep(cv, -1 /* no timeout */ ,
+   (void) ConditionVariableEventSleep(cv, ConditionVariableWaitSet,
+  -1 
/* no timeout */ ,
    
wait_event_info);
  }

@@ -111,11 +177,27 @@ ConditionVariableSleep(ConditionVariable *cv, uint32 
wait_event_info)
  bool
  ConditionVariableTimedSleep(ConditionVariable *cv, long timeout,
     uint32 wait_event_info)
+{
+   return ConditionVariableEventSleep(cv, ConditionVariableWaitSet, 
timeout,
+  
wait_event_info);
+}
+

I like the idea of making use of the new ConditionVariableEventSleep() here, 
but on the other hand...

3)

I wonder whether there are race conditions: ConditionVariableWaitSet is
initialized with PGINVALID_SOCKET as WL_LATCH_SET and might also (if
IsUnderPostmaster) be initialized with PGINVALID_SOCKET as
WL_EXIT_ON_PM_DEATH.

So IIUC, the patch introduces 2 new possible sources of wake-up.

Then, what about?

- not create ConditionVariableWaitSet, ConditionVariableWaitSetLatchPos, 
InitializeConditionVariableWaitSet() and ConditionVariableWaitSetCreate() at 
all?
- call ConditionVariableEventSleep() with a NULL parameter in 
ConditionVariableSleep() and ConditionVariableTimedSleep()?
- handle the case where the WaitEventSet parameter is NULL in 
ConditionVariableEventSleep()? (That could also make sense if we handle the 
case of the CV being NULL as proposed above)



I gave it a try, so please find attached 
v2-0001-Introduce-ConditionVariableEventSleep.txt (implementing the comments 
above) and 0004_new_API.txt to put the new API in the logical decoding on 
standby context.

There is no change in v2-0001-Introduce-ConditionVariableEventSleep.txt 
regarding the up-thread comment related to WL_POSTMASTER_DEATH.
 
What do you think?


Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From 9a820140b7356ab94479499a80fc4742403f3ca5 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Fri, 10 Mar 2023 10:58:23 +
Subject: [PATCH v99 5/7] Fixing Walsender corner case with logical decoding on
 standby.

The problem is that WalSndWaitForWal() waits for the *replay* LSN to
increase, but gets woken up by walreceiver when new WAL has been
flushed. Which means that typically walsenders will get woken up at the
same time that the startup process will be - which means that by the
time the logical walsender checks GetXLogReplayRecPtr() it's unlikely
that the startup 

Re: Add macros for ReorderBufferTXN toptxn

2023-03-10 Thread Amit Kapila
On Fri, Mar 10, 2023 at 4:36 AM Peter Smith  wrote:
>
> During a recent code review, I was confused multiple times by the
> toptxn member of ReorderBufferTXN, which is defined only for
> sub-transactions.
>
> e.g. txn->toptxn member == NULL means the txn is a top level txn.
> e.g. txn->toptxn member != NULL means the txn is not a top level txn
>
> It makes sense if you squint and read it slowly enough, but IMO it's
> too easy to accidentally misinterpret the meaning when reading code
> that uses this member.
>
> ~
>
> Such code can be made easier to read just by introducing some simple macros:
>
> #define isa_toptxn(rbtxn) (rbtxn->toptxn == NULL)
> #define isa_subtxn(rbtxn) (rbtxn->toptxn != NULL)
> #define get_toptxn(rbtxn) (isa_subtxn(rbtxn) ? rbtxn->toptxn : rbtxn)
>
> ~
>
> PSA a small patch that does this.
>

I also find that it will make the code easier to read. So, +1 to the idea. I'll
do the detailed review and test next week unless there are objections
to the idea.
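
For illustration, a typical call site would change like this (a hypothetical
example, not taken from the patch):

    /* before: the intent is easy to misread */
    ReorderBufferTXN *toptxn = txn->toptxn ? txn->toptxn : txn;

    /* after: the intent is explicit */
    ReorderBufferTXN *toptxn = get_toptxn(txn);

    if (isa_subtxn(txn))
        elog(DEBUG1, "processing a subtransaction");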

-- 
With Regards,
Amit Kapila.




Re: How to get the real type use oid in internal codes?

2023-03-10 Thread Peter Eisentraut

On 10.03.23 06:33, jack...@gmail.com wrote:
I can use the Relation struct to get all attributes' type OIDs, so which
function can I use to get the real type?


Depends on what you mean by "real", but perhaps the format_type* family 
of functions would help you.
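
For example, something along these lines walks an open relation's attributes
and prints the display name of each type (a sketch; format_type_with_typemod()
is the C-level counterpart of the SQL format_type() function):

    #include "postgres.h"

    #include "utils/builtins.h"
    #include "utils/rel.h"

    /* Log the human-readable type name of each attribute of an open rel. */
    static void
    print_attribute_types(Relation rel)
    {
        TupleDesc   tupdesc = RelationGetDescr(rel);

        for (int i = 0; i < tupdesc->natts; i++)
        {
            Form_pg_attribute att = TupleDescAttr(tupdesc, i);

            if (att->attisdropped)
                continue;

            elog(INFO, "%s: %s",
                 NameStr(att->attname),
                 format_type_with_typemod(att->atttypid, att->atttypmod));
        }
    }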






Re: Date-Time dangling unit fix

2023-03-10 Thread Peter Eisentraut

On 04.03.23 22:05, Tom Lane wrote:

Joseph Koshakow  writes:

On Mon, Dec 12, 2022 at 10:55 AM Joseph Koshakow  wrote:

I just found another class of this bug that the submitted patch does
not fix. If the units are at the beginning of the string, then they are
also ignored. For example, `date 'm d y2020m11d3'` is also valid. I
think the fix here is to check and make sure that ptype is 0 before
reassigning the value to a non-zero number. I'll send an updated patch
with this tonight.

Attached is the described patch.
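
The guard described there would look roughly like this in the unit-prefix
handling of DecodeDateTime() (a sketch rather than the attached patch; ptype
and val are the existing local variables):

    /*
     * Reject a second dangling unit, as in 'm d y2020m11d3', where the
     * previously seen unit in ptype would otherwise be silently
     * overwritten.
     */
    if (ptype != 0)
        return DTERR_BAD_FORMAT;
    ptype = val;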

I started to look at this, and soon noticed that while we have test cases
matching this sort of date input, there is no documentation for it.  The
code claims it's an "ISO" (presumably ISO 8601) format, and maybe it is
because it looks a lot like the ISO 8601 format for intervals (durations).
But I don't have a copy of ISO 8601, and some googling fails to find any
indication that anybody else believes this is a valid datetime format.
Wikipedia for example documents a lot of variants of ISO 8601 [1],
but nothing that looks like this.


There are additional formats in (the lesser known) ISO 8601-2, one of 
which looks like this:


'1985Y4M12D', calendar year 1985, April 12th

But that is entirely incompatible with the above example, because it has 
the units after the numbers.


Even more reason not to support the earlier example.





Re: Combine pg_walinspect till_end_of_wal functions with others

2023-03-10 Thread Bharath Rupireddy
On Fri, Mar 10, 2023 at 9:54 AM Michael Paquier  wrote:
>
> Hmm.  I think this patch ought to have a result simpler than what's
> proposed here.
>
> First, do we really have to begin marking the functions as non-STRICT
> to abide with the treatment of NULL as a special value?  The part that
> I've found personally the most annoying with these functions is that
> an incorrect bound leads to a random failure, particularly when such
> queries are used for monitoring.  I would simplify the whole with two
> simple rules, as of:
> - Keeping all the functions strict.
> - When end_lsn is a LSN in the future of the current LSN inserted or
> replayed, adjust its value to be the exactly GetXLogReplayRecPtr() or
> GetFlushRecPtr().  This way, monitoring tools can use a value ahead,
> at will.
> - Failing if start_lsn > end_lsn.
> - Failing if start_lsn refers to a position older than what exists is
> still fine by me.

Done this way in the attached v5 patch.
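
In code form, the rules above boil down to roughly the following in the
common input validation (a sketch; the backend functions named here are real
ones):

    XLogRecPtr  curr_lsn;

    curr_lsn = RecoveryInProgress() ? GetXLogReplayRecPtr(NULL)
                                    : GetFlushRecPtr(NULL);

    /* an end_lsn in the future is clamped, not an error */
    if (end_lsn > curr_lsn)
        end_lsn = curr_lsn;

    /* reversed bounds are a user error */
    if (start_lsn > end_lsn)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("WAL start LSN must be less than end LSN")));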

> I would also choose to remove
> pg_get_wal_records_info_till_end_of_wal() from the SQL interface in
> 1.1 to limit the confusion arount it, but keep a few lines of code so
> as we are still able to use it when pg_walinspect 1.0 is the version
> enabled with CREATE EXTENSION.
>
> In short, pg_get_wal_records_info_till_end_of_wal() should be able to
> use exactly the same code as pg_get_wal_records_info(), still you need
> to keep *two* functions for their prosrc with PG_FUNCTION_ARGS as
> arguments so as 1.0 would work when dropped in place.  The result, it
> seems to me, mostly comes to simplify ValidateInputLSNs() and remove
> its till_end_of_wal argument.

This has already been taken care of in the previous patches, e.g. v3
and v4 and so in the latest v5 patch.

> +-- Removed function
> +SELECT 
> pg_get_functiondef('pg_get_wal_records_info_till_end_of_wal'::regproc);
> +ERROR:  function "pg_get_wal_records_info_till_end_of_wal" does not exist
> +LINE 1: SELECT pg_get_functiondef('pg_get_wal_records_info_till_end_...
>
> It seems to me that you should just replace all that and anything
> depending on pg_get_functiondef() with a \dx+ pg_walinspect, that
> would list all the objects part of the extension for the specific
> version you want to test.  Not sure that there is a need to list the
> full function definitions, either.  That just bloats the tests.

Agreed, and used \dx+. One can anyway look at the function definitions
and compare them to see what's changed.

> I think, however, that it is critical to test in oldextversions.out
> the *executions* of the functions, so as we make sure that they don't
> crash.  The patch is missing that.

You mean we need to test the till_end_of_wal functions that were removed in
the latest version 1.1, but that must still work if the extension is
installed with 1.0? If yes, I have now added them.

> +-- Invalid input LSNs
> +SELECT * FROM pg_get_wal_record_info('0/0'); -- ERROR
> +ERROR:  invalid input LSN

Removed the InvalidRecPtr checks for the input/start LSN, because the
functions will fail anyway with "ERROR:  could not read WAL at LSN 0/0".

Any comments on the attached v5 patch?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From 63360cfa8c2a3141a1360a966bc7e210eb57810b Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Fri, 10 Mar 2023 10:54:20 +
Subject: [PATCH v5] Rework pg_walinspect functions

---
 contrib/pg_walinspect/Makefile|   2 +-
 .../pg_walinspect/expected/oldextversions.out |  55 +
 .../pg_walinspect/expected/pg_walinspect.out  |  60 -
 contrib/pg_walinspect/meson.build |   1 +
 .../pg_walinspect/pg_walinspect--1.0--1.1.sql |   4 +
 contrib/pg_walinspect/pg_walinspect.c | 213 --
 contrib/pg_walinspect/sql/oldextversions.sql  |  29 +++
 contrib/pg_walinspect/sql/pg_walinspect.sql   |  74 +++---
 doc/src/sgml/pgwalinspect.sgml|  88 
 9 files changed, 307 insertions(+), 219 deletions(-)
 create mode 100644 contrib/pg_walinspect/expected/oldextversions.out
 create mode 100644 contrib/pg_walinspect/sql/oldextversions.sql

diff --git a/contrib/pg_walinspect/Makefile b/contrib/pg_walinspect/Makefile
index 7033878a79..22090f7716 100644
--- a/contrib/pg_walinspect/Makefile
+++ b/contrib/pg_walinspect/Makefile
@@ -9,7 +9,7 @@ PGFILEDESC = "pg_walinspect - functions to inspect contents of PostgreSQL Write-
 EXTENSION = pg_walinspect
 DATA = pg_walinspect--1.0.sql pg_walinspect--1.0--1.1.sql
 
-REGRESS = pg_walinspect
+REGRESS = pg_walinspect oldextversions
 
 REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/pg_walinspect/walinspect.conf
 
diff --git a/contrib/pg_walinspect/expected/oldextversions.out b/contrib/pg_walinspect/expected/oldextversions.out
new file mode 100644
index 00..4ad85392f0
--- /dev/null
+++ b/contrib/pg_walinspect/expected/oldextversions.out
@@ -0,0 +1,55 @@
+-- test old extension version entry points
+DROP EXTENSION pg_walinspect;

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2023-03-10 Thread Önder Kalacı
Hi Peter, all


> src/backend/replication/logical/relation.c
>
> 1. FindUsableIndexForReplicaIdentityFull
>
> +static Oid
> +FindUsableIndexForReplicaIdentityFull(Relation localrel)
> +{
> + List*indexlist = RelationGetIndexList(localrel);
> + Oid usableIndex = InvalidOid;
> + ListCell   *lc;
> +
> + foreach(lc, indexlist)
> + {
> + Oid idxoid = lfirst_oid(lc);
> + bool isUsableIndex;
> + Relation indexRelation = index_open(idxoid, AccessShareLock);
> + IndexInfo  *indexInfo = BuildIndexInfo(indexRelation);
> +
> + isUsableIndex = IsIndexUsableForReplicaIdentityFull(indexInfo);
> +
> + index_close(indexRelation, AccessShareLock);
> +
> + if (isUsableIndex)
> + {
> + /* we found one eligible index, don't need to continue */
> + usableIndex = idxoid;
> + break;
> + }
> + }
> +
> + return usableIndex;
> +}
>
> This comment is not functional -- if you prefer the code as-is, then
> ignore this comment.
>
> But, personally I would:
> - Move some of that code from the declarations. I feel it would be
> better if the index_open/index_close were both in the code-body
> instead of half in declarations and half not.
> - Remove the 'usableIndex' variable, and just return directly.
> - Shorten all the long names (and use consistent 'idx' instead of
> sometimes 'idx' and sometimes 'index')
>
> SUGGESTION (YMMV)
>
> static Oid
> FindUsableIndexForReplicaIdentityFull(Relation localrel)
> {
> List*idxlist = RelationGetIndexList(localrel);
> ListCell   *lc;
>
> foreach(lc, idxlist)
> {
> Oid idxoid = lfirst_oid(lc);
> bool isUsableIdx;
> Relation idxRel;
> IndexInfo  *idxInfo;
>
> idxRel = index_open(idxoid, AccessShareLock);
> idxInfo = BuildIndexInfo(idxRel);
> isUsableIdx = IsIndexUsableForReplicaIdentityFull(idxInfo);
> index_close(idxRel, AccessShareLock);
>
> /* Return the first eligible index found */
> if (isUsableIdx)
> return idxoid;
> }
>
> return InvalidOid;
> }
>

applied your suggestion. I think it made it slightly easier to follow.


>
> ==
> .../subscription/t/032_subscribe_use_index.pl
>
> 2. SUBSCRIPTION CREATE/DROP INDEX WORKS WITHOUT ISSUES
>
> 2a.
> # Testcase start: SUBSCRIPTION CREATE/DROP INDEX WORKS WITHOUT ISSUES
> #
> # This test ensures that after CREATE INDEX, the subscriber can
> automatically
> # use one of the indexes (provided that it fulfils the requirements).
> # Similarly, after DROP index, the subscriber can automatically switch to
> # sequential scan
>
>
> The last sentence is missing full-stop.
>
>
fixed


> ~
>
> 2b.
> # now, create index and see that the index is used
> $node_subscriber->safe_psql('postgres',
> "CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x)");
>
> Don't say "and see that the index is used". Yes, that is what this
> whole test is doing, but it is not what the psql following this
> comment is doing.
>

 true


> ~
>
> 2c.
> $node_publisher->safe_psql('postgres',
> "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 15;");
> $node_publisher->wait_for_catchup($appname);
>
>
> # wait until the index is used on the subscriber
>
> The double blank lines here should be single.
>
> ~
>

fixed.


>
> 2d.
> # now, the update could either use the test_replica_id_full_idx or
> # test_replica_id_full_idy index it is not possible for user to control
> # which index to use
>
> This sentence should break at "it".
>
> Aso "user" --> "the user"

SUGGESTION
> # now, the update could either use the test_replica_id_full_idx or
> # test_replica_id_full_idy index; it is not possible for the user to
> control
> # which index to use
>
>
looks good, thanks


> ~
>
> 2e.
> # let's also test dropping test_replica_id_full_idy and
> # hence use test_replica_id_full_idx
>
>
> I think you ought to have dropped the other (first) index because we
> already know that the first index had been used (from earlier), but we
> are not 100% sure if the 'y' index has been chosen yet.
>

Makes sense. Though in general it is hard to check pg_stat_all_indexes
for any of the indexes in this test, as we don't know the exact number
of tuples for each. Just wanted to note that explicitly.



> 
>
> 3. SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS
>
> 3a.
> # deletes 20 rows
> $node_publisher->safe_psql('postgres',
> "DELETE FROM test_replica_id_full WHERE x IN (5, 6);");
>
> # updates 20 rows
> $node_publisher->safe_psql('postgres',
> "UPDATE test_replica_id_full SET x = 100, y = '200' WHERE x IN (1,
> 2);");
>
>
> "deletes" --> "delete"
>
> "updates" --> "update"
>

Well, I thought the command *deletes*, but I guess "delete" looks better.


>
> ~~~
>
> 4. SUBSCRIPTION USES INDEX WITH DROPPED COLUMNS
>
> # updates 200 rows
> $node_publisher->safe_psql('postgres',
> "UPDATE test_replica_id_full SET x = x + 1 WHERE x IN (5, 6);");
>
>
> "updates" --> "update"
>
> "200 rows" ??? is that right --  20 maybe ???
>
>
I guess this is from an earlier version of the patch; I have fixed these
types of errors.


> ~~~
>
> 5. SUBSCRIPTION USES INDEX ON PARTITIONED TABLES
>
> 5a.

Re: Combine pg_walinspect till_end_of_wal functions with others

2023-03-10 Thread Michael Paquier
On Fri, Mar 10, 2023 at 04:37:23PM +0800, Julien Rouhaud wrote:
> isn't '0/0' the same as InvalidXLogRecPtr? but my point is that we
> shouldn't require to spell it explicitly, just rely on the default value.

Perhaps.  Still the addition of a DEFAULT to the function definitions
and its value looks like a second patch to me.  The first should just
lift the bound restrictions currently in place while cleaning up the 
till_* functions.
--
Michael


signature.asc
Description: PGP signature


Re: logical decoding and replication of sequences, take 2

2023-03-10 Thread John Naylor
On Wed, Mar 1, 2023 at 1:02 AM Tomas Vondra wrote:
> here's a rebased patch to make cfbot happy, dropping the first part that
> is now unnecessary thanks to 7fe1aa991b.

Hi Tomas,

I'm looking into doing some "in situ" testing, but for now I'll mention
some minor nits I found:

0001

+ * so we simply do a lookup (the sequence is identified by relfilende). If

relfilenode? Or should it be called a relfilelocator, which is the
parameter type? I see some other references to relfilenode in comments and
commit message, and I'm not sure which need to be updated.

+ /* XXX Maybe check that we're still in the same top-level xact? */

Any ideas on what should happen here?

+ /* XXX how could we have sequence change without data? */
+ if(!datalen || !tupledata)
+ elog(ERROR, "sequence decode missing tuple data");

Since the ERROR is new (added based on feedback), I think we can get rid of the XXX.

More generally, I associate XXX comments with problems or unpleasantness in
the code that don't quite rise to the level of FIXME, but are perhaps more
serious than "NB:", "Note:", or "Important:".

+ * When we're called via the SQL SRF there's already a transaction

I see this was copied from existing code, but I found it confusing -- does
this function have a stable name?

+ /* Only ever called from ReorderBufferApplySequence, so transational. */

Typo: transactional

0002

I see a few SERIAL types in the tests but no GENERATED ... AS IDENTITY --
not sure if it matters, but seems good for completeness.

Reminder for later: Patches 0002 and 0003 still refer to 0da92dc530, which
is a reverted commit -- I assume it intends to refer to the content of 0001?

--
John Naylor
EDB: http://www.enterprisedb.com


Re: pgsql: Use ICU by default at initdb time.

2023-03-10 Thread Peter Eisentraut

On 10.03.23 03:26, Jeff Davis wrote:

That's because ICU always uses UTF-8 by default. ICU works just fine
with many other encodings; is there a reason it doesn't take it from
the environment just like for provider=libc?


I think originally the locale forced the encoding.  With ICU, we have a 
choice.  We could either stick to the encoding suggested by the OS, or 
pick our own.


Arguably, if we are going to nudge toward ICU, maybe we should nudge 
toward UTF-8 as well.






Re: pgsql: Allow tailoring of ICU locales with custom rules

2023-03-10 Thread Peter Eisentraut

On 08.03.23 21:57, Jeff Davis wrote:

On Wed, 2023-03-08 at 16:03 +, Peter Eisentraut wrote:

Allow tailoring of ICU locales with custom rules


Late review:

* Should throw error when provider != icu and rules != NULL


I have fixed that.


* Explain what the example means. By itself, users might get confused
wondering why someone would want to do that.

* Also consider a more practical example?


I have added a more practical example with explanation.


* It appears rules IS NULL behaves differently from rules=''. Is that
desired? For instance:
   create collation c1(provider=icu,
 locale='und-u-ka-shifted-ks-level1',
 deterministic=false);
   create collation c2(provider=icu,
 locale='und-u-ka-shifted-ks-level1',
 rules='',
 deterministic=false);
   select 'a b' collate c1 = 'ab' collate c1; -- true
   select 'a b' collate c2 = 'ab' collate c2; -- false


I'm puzzled by this.  The general behavior is: extract the rules of the
original locale, append the custom rules, and use that.  If the custom rules
are the empty string, that should match using the original rules untouched.
Needs further investigation.
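
For reference, the described behavior maps onto the ICU API roughly like
this (these are real ICU calls; rule concatenation and error handling are
elided):

    int32_t     baselen;
    UErrorCode  status = U_ZERO_ERROR;
    const UChar *baserules = ucol_getRules(basecollator, &baselen);

    /* append the custom rules to baserules, then: */
    collator = ucol_openRules(combinedrules, combinedlen,
                              UCOL_DEFAULT, UCOL_DEFAULT_STRENGTH,
                              NULL, &status);

With rules = '', the combined string equals the base rules, which is why c1
and c2 would be expected to behave the same.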



* Can you document the interaction between locale keywords
("@colStrength=primary") and a rule like '[strength 2]'?


I'll look into that.





RE: Rework LogicalOutputPluginWriterUpdateProgress

2023-03-10 Thread wangw.f...@fujitsu.com
On Mon, Mar 10, 2023 14:35 PM Amit Kapila  wrote:
> On Fri, Mar 10, 2023 at 11:17 AM Peter Smith  wrote:
> >
> > On Fri, Mar 10, 2023 at 3:32 PM Amit Kapila  wrote:
> > >
> > > On Thu, Mar 9, 2023 at 10:56 AM Peter Smith 
> wrote:
> > > >
> > > > 2. rollback_prepared_cb_wrapper
> > > >
> > > >   /*
> > > >   * If the plugin support two-phase commits then rollback prepared 
> > > > callback
> > > >   * is mandatory
> > > > + *
> > > > + * FIXME: This should have been caught much earlier.
> > > >   */
> > > >   if (ctx->callbacks.rollback_prepared_cb == NULL)
> > > >   ereport(ERROR,
> > > >
> > > > ~
> > > >
> > > > Why is this seemingly unrelated FIXME still in the patch?
> > > >
> > >
> > > After reading this Fixme comment and the error message ("logical
> > > replication at prepare time requires a %s callback
> > > rollback_prepared_cb"), I think we can move this and a similar check
> > > in function commit_prepared_cb_wrapper() to prepare_cb_wrapper()
> > > function. This is because there is no use of letting prepare pass when
> > > we can't do a rollback or commit prepared. What do you think?
> > >
> >
> > My first impression was it sounds like a good idea to catch the
> > missing callbacks early as you said.
> >
> > But if you decide to check for missing commit/rollback callbacks early
> > in prepare_cb_wrapper(), then won't you also want to have equivalent
> > checking done earlier for stream_prepare_cb_wrapper()?
> >
> 
> Yeah, probably or we can leave the lazy checking as it is. In the
> ideal case, we could check for the presence of all the callbacks in
> StartupDecodingContext() but we delay it to find the missing methods
> later. One possibility is that we check for any missing method in
> StartupDecodingContext() if any one of prepare/streaming calls are
> present but not sure if that is any better than the current
> arrangement.
> 
> > And then it quickly becomes a slippery slope to question many other things:
> > - Why allow startup_cb if shutdown_cb is missing?
> >
> 
> I am not sure if there is a hard dependency between these two but
> their callers do check for Null before invoking those.
> 
> > - Why allow change_cb if commit_cb or rollback_cb is missing?
> 
> We have a check for change_cb and commit_cb in LoadOutputPlugin. Do we
> have rollback_cb() defined at all?
> 
> > - Why allow filter_prepare_cb if prepare_cb is missing?
> >
> 
> I am not so sure about this but If prepare gets filtered, we don't
> need to invoke prepare_cb.
> 
> > - etc.
> >
> > ~
> >
> > So I am wondering if the HEAD code lazy-check of the callback only at
> > the point where it is needed was actually a deliberate design choice
> > just to be simpler - e.g. we don't need to be so concerned about any
> > other callback dependencies.
> >
> 
> Yeah, changing that probably needs some more thought. I have mentioned
> one of the possibilities above.

I think this approach looks fine to me. So, I wrote a separate patch (0006) for
discussing and reviewing this approach.

Regards,
Wang wei


RE: Rework LogicalOutputPluginWriterUpdateProgress

2023-03-10 Thread wangw.f...@fujitsu.com
On Thur, Mar 9, 2023 13:26 PM Peter Smith  wrote:
> Here are some review comments for v6-0001

Thanks for your comments.

> ==
> General.
> 
> 1.
> There are lots of new comments saying:
> /* don't call update progress, we didn't really make any */
> 
> but is the wording "call update progress" meaningful?
> 
> Should that be written something more like:
> /* No progress has been made so there is no need to call
> UpdateProgressAndKeepalive. */

Changed.
Shortened your suggested comment using a grammar tool. So, the modified comment
looks like this:
```
No progress has been made, so don't call UpdateProgressAndKeepalive
```

> ~~~
> 
> 4.
> 
> @@ -1370,6 +1377,8 @@ stream_abort_cb_wrapper(ReorderBuffer *cache,
> ReorderBufferTXN *txn,
> 
>   /* Pop the error context stack */
>   error_context_stack = errcallback.previous;
> +
> + UpdateProgressAndKeepalive(ctx, (txn->toptxn == NULL));
>  }
> 
> ~
> 
> Are the double parentheses necessary?

I think the code looks clearer this way.

> ==
> src/backend/replication/walsender.c
> 
> 6. WalSndUpdateProgressAndKeepalive
> 
> Since the 'ctx' is unused here, it might be nicer to annotate that to
> make it clear it is deliberate and suppress any possible warnings
> about unused params.
> 
> e.g. something like:
> 
> WalSndUpdateProgressAndKeepalive(
> pg_attribute_unused() LogicalDecodingContext *ctx,
> XLogRecPtr lsn,
> TransactionId xid,
> bool did_write,
> bool finished_xact)

Because many functions don't use this approach, I'm not sure what the rules
are for using it in PG. And I think we should discuss this on a separate
thread to check which similar functions need this kind of modification in
the PG source code.

Regards,
Wang wei


RE: Rework LogicalOutputPluginWriterUpdateProgress

2023-03-10 Thread wangw.f...@fujitsu.com
On Wed, Mar 8, 2023 23:55 PM Osumi, Takamichi/大墨 昂道 
 wrote:
> Hi,
> 
> 
> On Wednesday, March 8, 2023 11:54 AM From: wangw.f...@fujitsu.com
>  wrote:
> > Attach the new patch.
> Thanks for sharing v6 ! Few minor comments for the same.

Thanks for your comments.

> (1) commit message
> 
> The old function name 'is_skip_threshold_change' is still referred to. We
> need to update it to 'is_keepalive_threshold_exceeded', I think.

Fixed.

> (2) OutputPluginPrepareWrite
> 
> @@ -662,7 +656,8 @@ void
>  OutputPluginPrepareWrite(struct LogicalDecodingContext *ctx, bool last_write)
>  {
> if (!ctx->accept_writes)
> -   elog(ERROR, "writes are only accepted in commit, begin and 
> change
> callbacks");
> +   elog(ERROR, "writes are only accepted in output plugin 
> callbacks, "
> +"except startup, shutdown, filter_by_origin, and 
> filter_prepare.");
> 
> We can remove the period at the end of error string.

Removed.

> (3) is_keepalive_threshold_exceeded's comments
> 
> +/*
> + * Helper function to check whether a large number of changes have been
> skipped
> + * continuously.
> + */
> +static bool
> +is_keepalive_threshold_exceeded(LogicalDecodingContext *ctx)
> 
> I suggest to update the comment slightly something like below.
> From:
> ...whether a large number of changes have been skipped continuously
> To:
> ...whether a large number of changes have been skipped without being sent to
> the output plugin continuously

Make sense.
Also, I slightly corrected the original function comment with a grammar check
tool. So, the modified comment looks like this:
```
Helper function to check for continuous skipping of many changes without sending
them to the output plugin.
```

> (4) term for 'keepalive'
> 
> +/*
> + * Update progress tracking and send keep alive (if required).
> + */
> 
> The 'keep alive' might be better to be replaced with 'keepalive', which looks
> commonest in other source codes. In the current patch, there are 3 different
> ways to express it (the other one is 'keep-alive') and it would be better to 
> unify
> the term, at least within the same patch ?

Yes, agree.
Unified the comment you mentioned here ('keep alive') and the comment in the
commit message ('keep-alive') as 'keepalive'.

Regards,
Wang wei


RE: Rework LogicalOutputPluginWriterUpdateProgress

2023-03-10 Thread wangw.f...@fujitsu.com
On Wed, Mar 8, 2023 19:06 PM Kuroda, Hayato/黒田 隼人  
wrote:
> Dear Wang,

Thanks for your testing and comments.

> ---
> ```
> +/*
> + * Update progress tracking and send keep alive (if required).
> + */
> +static void
> +UpdateProgressAndKeepalive(LogicalDecodingContext *ctx, bool finished_xact)
> ```
> 
> Can we add a comment atop UpdateProgressAndKeepalive()? Currently the
> developers who create output plugins must call OutputPluginUpdateProgress(),
> but from now on the function is not only renamed but is also no longer
> necessary to call from the plugin (of course we do not restrict calling it).
> I think it must be clarified for them.

Make sense.
Added some comments atop this function.

> ---
> ReorderBufferUpdateProgressTxnCB must be removed from typedefs.list.

Removed.

> ---
> Do we have to document the breakage somewhere? I think we do not have to
> add an appendix-obsolete-* file, because we did not have any links for
> that, but we can add a warning in the "Functions for Producing Output"
> subsection if needed.

Since we've moved the feature (updating progress and sending keepalives) from
the output plugin into the infrastructure, the output plugin is no longer
responsible for maintaining it. Also, I think output plugin developers only
need to remove the call to the old function OutputPluginUpdateProgress if
they get compile errors related to this modification. So, it seems to me that
we don't need to add anything to the PG docs.
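
For an output plugin author, the migration should then amount to deleting
the call, e.g. in a typical commit callback (a sketch):

 OutputPluginPrepareWrite(ctx, true);
 /* ... serialize the commit message ... */
 OutputPluginWrite(ctx, true);
-OutputPluginUpdateProgress(ctx, false);
+/* progress tracking and keepalives are now handled by the core */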

Regards,
Wang wei


RE: Rework LogicalOutputPluginWriterUpdateProgress

2023-03-10 Thread wangw.f...@fujitsu.com
On Mon, Mar 10, 2023 11:56 AM Amit Kapila  wrote:
> On Wed, Mar 8, 2023 at 8:24 AM wangw.f...@fujitsu.com
>  wrote:
> >
> > Attach the new patch.
> >
> 
> I think this combines multiple improvements in one patch. We can
> consider all of them together or maybe it would be better to split
> some of those. Do we think it makes sense to split some of the
> improvements? I could think of below:
> 
> 1. Remove SyncRepRequested() check from WalSndUpdateProgress().
> 2. Add check of wal_sender_timeout > 0 in WalSndUpdateProgress() and
> any other similar place.
> 3. Change the name of ProcessPendingWrites() to WalSndSendPending().
> 4. Change WalSndUpdateProgress() to WalSndUpdateProgressAndKeepalive().
> 5. The remaining patch.

I think it would help to review different improvements separately, so I split
the patch as suggested.

Also addressed the comments by Kuroda-san, Osumi-san and Peter.
Attach the new patch set.

Regards,
Wang wei


v7-0001-Remove-SyncRepRequested-check-from-WalSndUpdatePr.patch
Description:  v7-0001-Remove-SyncRepRequested-check-from-WalSndUpdatePr.patch


v7-0002-Check-wal_sender_timeout-is-in-effect-before-usin.patch
Description:  v7-0002-Check-wal_sender_timeout-is-in-effect-before-usin.patch


v7-0003-Rename-the-function-ProcessPendingWrites-to-WalSn.patch
Description:  v7-0003-Rename-the-function-ProcessPendingWrites-to-WalSn.patch


v7-0004-Rename-the-function-WalSndUpdateProgress-to-WalSn.patch
Description:  v7-0004-Rename-the-function-WalSndUpdateProgress-to-WalSn.patch


v7-0005-Rework-LogicalOutputPluginWriterUpdateProgressAnd.patch
Description:  v7-0005-Rework-LogicalOutputPluginWriterUpdateProgressAnd.patch


v7-0006-Catch-the-absence-of-commit-rollback_prepared_cb_.patch
Description:  v7-0006-Catch-the-absence-of-commit-rollback_prepared_cb_.patch


Re: [Proposal] Add foreign-server health checks infrastructure

2023-03-10 Thread Katsuragi Yuta

On 2023-03-08 13:40, Hayato Kuroda (Fujitsu) wrote:

Dear Vignesh,

Thank you for reviewing! PSA new version.


Hi,

Thank you for the comments, Vignesh.
Thank you for updating the patch, Kuroda-san. This fix looks fine to me.

Also, there seem to be no other comments since this post [1].
So, I'm planning to update the status to ready-for-committer
next Monday.


[1]: 
https://www.postgresql.org/message-id/TYAPR01MB5866F8EA3C06E1B43E42E4E4F5B79%40TYAPR01MB5866.jpnprd01.prod.outlook.com


regards,

--
Katsuragi Yuta
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: POC: Lock updated tuples in tuple_update() and tuple_delete()

2023-03-10 Thread Alexander Korotkov
On Wed, Mar 8, 2023 at 4:22 AM Andres Freund  wrote:
> On 2023-03-07 04:45:32 +0300, Alexander Korotkov wrote:
> > The second patch now implements a concept of LazyTupleTableSlot, a slot
> > which gets allocated only when needed.  Also, there is more minor
> > refactoring and more comments.
>
> This patch already is pretty big for what it actually improves. Introducing
> even infrastructure to get a not that big win, in a not particularly
> interesting, extreme, workload...

It's true that the win isn't dramatic.  But I can't agree that the workload
isn't interesting.  In my experience, high contention over a limited set of
rows is something that frequently happens in production.  I personally took
part in multiple investigations of such workloads.

> What is motivating this?

Right, the improvement this patch gives to heap is not the full
motivation.  Another motivation is the improvement it gives to the TableAM
API.  Our current API implies that the effort of locating the tuple by
tid is small.  This is more or less true for heap, where we just need
to pin and lock the buffer.  But imagine other TableAM
implementations, where locating a tuple is more expensive.  The current
API insists that we do that twice: once in the update attempt and once in
the lock.  Doing it in a single call could give such TableAMs significant
savings (and even for heap it's something).  I'm working on such a TableAM:
it's OrioleDB, which implements index-organized tables.  And I know there
are other examples (for instance, zedstore) where TID lookup includes
some indirection.
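
To make the API point concrete, the direction is roughly a signature of the
following shape, where the AM can hand back the locked tuple from the same
call that attempted the update (a hypothetical shape for illustration, not
the exact patch):

    TM_Result
    table_tuple_update(Relation rel, ItemPointer otid, TupleTableSlot *slot,
                       CommandId cid, Snapshot snapshot, Snapshot crosscheck,
                       bool wait, TM_FailureData *tmfd,
                       LockTupleMode *lockmode, bool *update_indexes,
                       TupleTableSlot *lockedSlot);  /* new: receives the
                                                      * locked tuple on a
                                                      * concurrent update */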

--
Regards,
Alexander Korotkov




Re: Combine pg_walinspect till_end_of_wal functions with others

2023-03-10 Thread Julien Rouhaud
On Fri, 10 Mar 2023, 16:14 Michael Paquier,  wrote:

> On Fri, Mar 10, 2023 at 04:04:15PM +0800, Julien Rouhaud wrote:
> > As long as we provide a sensible default value (so I guess '0/0' to
> > mean "no upper bound") and that we therefore don't have to manually
> > specify an upper bound if we don't want one I'm fine with keeping the
> > functions marked as STRICT.
>
> FWIW, using also InvalidXLogRecPtr as a shortcut to say "Don't fail,
> just do the job" is fine by me.


isn't '0/0' the same as InvalidXLogRecPtr? but my point is that we
shouldn't require to spell it explicitly, just rely on the default value.

> Something like a FFFFFFFF/FFFFFFFF should just mean the same on a fresh
> cluster, still it gets risky the longer the WAL is generated.

yeah, it would be handy to accept 'infinity' in that context.



Re: Compilation error after redesign of the archive modules

2023-03-10 Thread Michael Paquier
On Fri, Mar 10, 2023 at 01:41:07PM +0530, Sravan Kumar wrote:
> I have attached a patch that fixes the problem. Can you please review
> if it makes sense to push this patch?

Indeed, reproduced here.  I'll fix that in a bit..
--
Michael


signature.asc
Description: PGP signature


Re: Combine pg_walinspect till_end_of_wal functions with others

2023-03-10 Thread Michael Paquier
On Fri, Mar 10, 2023 at 04:04:15PM +0800, Julien Rouhaud wrote:
> As long as we provide a sensible default value (so I guess '0/0' to
> mean "no upper bound") and that we therefore don't have to manually
> specify an upper bound if we don't want one I'm fine with keeping the
> functions marked as STRICT.

FWIW, using also InvalidXLogRecPtr as a shortcut to say "Don't fail,
just do the job" is fine by me.  Something like a FFFFFFFF/FFFFFFFF should
just mean the same on a fresh cluster, still it gets risky the longer
the WAL is generated.
--
Michael


signature.asc
Description: PGP signature


Assert failure of the cross-check for nullingrels

2023-03-10 Thread Richard Guo
While looking into issue [1], I came across $subject on master.  Below
is how to reproduce it.

DROP TABLE IF EXISTS t1,t2,t3,t4 CASCADE;
CREATE TABLE t1 AS SELECT true AS x FROM generate_series(0,1) x;
CREATE TABLE t2 AS SELECT true AS x FROM generate_series(0,1) x;
CREATE TABLE t3 AS SELECT true AS x FROM generate_series(0,1) x;
CREATE TABLE t4 AS SELECT true AS x FROM generate_series(0,1) x;
ANALYZE;

explain (costs off)
select * from t1 left join (t2 left join t3 on t2.x) on t2.x left join t4
on t3.x and t2.x where t1.x = coalesce(t2.x,true);

I've looked into this a little bit.  For the join of t2/t3 to t4, since
it can commute with the join of t1 to t2/t3 according to identity 3, we
would generate multiple versions for its joinquals.  In particular, the
qual 't3.x' would have two versions, one with varnullingrels as {t2/t3,
t1/t2}, the other one with varnullingrels as {t2/t3}.  So far so good.

Assume we've determined to build the join of t2/t3 to t4 after we've
built t1/t2 and t2/t3, then we'd find that both versions of qual 't3.x'
would be accepted by clause_is_computable_at.  This is not correct.  We
are supposed to accept only the one marked as {t2/t3, t1/t2}.  The other
one is not rejected mainly because we found that the qual 't3.x' does
not mention any nullable Vars of outer join t1/t2.

I wonder if we should consider syn_xxxhand rather than min_xxxhand in
clause_is_computable_at when we check if clause mentions any nullable
Vars.  But I'm not sure about that.

[1]
https://www.postgresql.org/message-id/flat/0b819232-4b50-f245-1c7d-c8c61bf41827%40postgrespro.ru

Thanks
Richard


Compilation error after redesign of the archive modules

2023-03-10 Thread Sravan Kumar
Hi,
With the redesign of the archive modules:
35739b87dcfef9fc0186aca659f262746fecd778 - Redesign archive modules
If we compile the basic_archive module with USE_PGXS=1, we get a
compilation error:

[]$ make USE_PGXS=1
gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -g -g -O0 -fPIC
-fvisibility=hidden -I. -I./
-I/home/sravanv/work/workspaces/PGdevel_test/include/postgresql/server
-I/home/sravanv/work/workspaces/PGdevel_test/include/postgresql/internal
 -D_GNU_SOURCE -I/usr/include/libxml2   -c -o basic_archive.o
basic_archive.c -MMD -MP -MF .deps/basic_archive.Po
basic_archive.c:33:36: fatal error: archive/archive_module.h: No such
file or directory
 #include "archive/archive_module.h"
^
compilation terminated.
make: *** [basic_archive.o] Error 1

I have attached a patch that fixes the problem. Can you please review
if it makes sense to push this patch?

-- 
Thanks & Regards,
Sravan Velagandula
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


v1-0001-fix-archive-module-compilation-error.patch
Description: Binary data


Re: Combine pg_walinspect till_end_of_wal functions with others

2023-03-10 Thread Julien Rouhaud
On Fri, Mar 10, 2023 at 12:24 PM Michael Paquier  wrote:
>
> On Wed, Mar 08, 2023 at 01:40:46PM +0530, Bharath Rupireddy wrote:
> > 1. When start_lsn is NULL or invalid ('0/0'), emit an error. There was
> > a comment on the functions automatically determining start_lsn to be
> > the oldest WAL LSN. I'm not implementing this change now, as it
> > requires extra work to traverse the pg_wal directory. I'm planning to
> > do it in the next set of improvements where I'm planning to make the
> > functions timeline-aware, introduce functions for inspecting
> > wal_buffers and so on.
> >
> > [.. long description ..]
> >
> > 9. Refactored the tests according to the new behaviours.
>
> Hmm.  I think this patch ought to have a result simpler than what's
> proposed here.
>
> First, do we really have to begin marking the functions as non-STRICT
> to abide with the treatment of NULL as a special value?  The part that
> I've found personally the most annoying with these functions is that
> an incorrect bound leads to a random failure, particularly when such
> queries are used for monitoring.

As long as we provide a sensible default value (so I guess '0/0' to
mean "no upper bound") and that we therefore don't have to manually
specify an upper bound if we don't want one I'm fine with keeping the
functions marked as STRICT.

>  I would simplify the whole with two
> simple rules, as of:
> - Keeping all the functions strict.
> - When end_lsn is a LSN in the future of the current LSN inserted or
> replayed, adjust its value to be the exactly GetXLogReplayRecPtr() or
> GetFlushRecPtr().  This way, monitoring tools can use a value ahead,
> at will.
> - Failing if start_lsn > end_lsn.
> - Failing if start_lsn refers to a position older than what exists is
> still fine by me.

+1

> I would also choose to remove
> pg_get_wal_records_info_till_end_of_wal() from the SQL interface in
> 1.1 to limit the confusion arount it, but keep a few lines of code so
> as we are still able to use it when pg_walinspect 1.0 is the version
> enabled with CREATE EXTENSION.

Yeah, the SQL function should be removed no matter what.