Re: WAL Insertion Lock Improvements

2023-05-08 Thread Michael Paquier
On Mon, May 08, 2023 at 08:18:04PM +0530, Bharath Rupireddy wrote:
> On Mon, May 8, 2023 at 5:57 PM Bharath Rupireddy
>  wrote:
>> On Mon, Apr 10, 2023 at 9:38 AM Michael Paquier  wrote:
>>> The sensitive change is in LWLockUpdateVar().  I am not completely
>>> sure to understand this removal, though.  Does that influence the case
>>> where there are waiters?
>>
>> I'll send about this in a follow-up email to not overload this
>> response with too much data.
> 
> It helps the case when there are no waiters. IOW, it updates value
> without waitlist lock when there are no waiters, so no extra waitlist
> lock acquisition/release just to update the value. In turn, it helps
> the other backend wanting to flush the WAL looking for the new updated
> value of insertingAt in WaitXLogInsertionsToFinish(), now the flushing
> backend can get the new value faster.

Sure, which is what the memory barrier given by exchange_u64
guarantees.  My thought on this one is that I am not completely sure
we won't miss any waiters that should have been awakened.
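
For readers following along, here is a minimal sketch of the fast path being
discussed (simplified and from memory, not the actual patch; lock->state and
LW_FLAG_HAS_WAITERS are as in lwlock.c, and the slow path is elided):

    void
    LWLockUpdateVar_sketch(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val)
    {
        /*
         * Publish the new value.  pg_atomic_exchange_u64() acts as a full
         * memory barrier, so the store is visible before any waiter we wake.
         */
        pg_atomic_exchange_u64(valptr, val);

        /*
         * Fast path: if nobody is waiting on this lock there is nothing to
         * wake up, so skip taking the wait-list lock entirely.
         */
        if ((pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS) == 0)
            return;

        /* Slow path: take the wait-list lock and wake up waiters (elided). */
    }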

> The fastpath exit in LWLockUpdateVar() doesn't seem to influence the
> results much, see below results. However, it avoids waitlist lock
> acquisition when there are no waiters.
> 
> test-case 1: -T5, WAL ~16 bytes
> clients    HEAD     PATCHED with fastpath    PATCHED no fast path
> 64         50135    45528                    46653
> 128        94903    89791                    89103
> 256        82289    152915                   152835
> 512        62498    138838                   142084
> 768        57083    125074                   126768
> 1024       51308    113593                   115930
> 2048       41084    88764                    85110
> 4096       19939    42257                    43917

Considering that there could be a few percent of noise mixed into
that, that's not really surprising, as the workload is highly
concurrent on inserts so the fast path won't really shine :)

Should we split this patch into two parts, as they aim at tackling two
different cases then?  One for LWLockConflictsWithVar() and
LWLockReleaseClearVar() which are the straight-forward pieces
(using one pg_atomic_write_u64() in LWLockUpdateVar instead), then
a second for LWLockUpdateVar()?

Also, the fast path treatment in LWLockUpdateVar() may show better
benefits when there are really few backends and a bunch of very small
records?  Still, even that sounds a bit limited..

> Out of 3 functions that got rid of waitlist lock
> LWLockConflictsWithVar/LWLockWaitForVar, LWLockUpdateVar,
> LWLockReleaseClearVar, perf reports tell that the biggest gain (for
> the use-cases that I've tried) is for
> LWLockConflictsWithVar/LWLockWaitForVar:
>
> test-case 6: -T5, WAL 4096 bytes
> HEAD:
> +   29.66% 0.07%  postgres  [.] LWLockWaitForVar
> +   20.94% 0.08%  postgres  [.] LWLockConflictsWithVar
>  0.19% 0.03%  postgres  [.] LWLockUpdateVar
> 
> PATCHED:
> +3.95% 0.08%  postgres  [.] LWLockWaitForVar
>  0.19% 0.03%  postgres  [.] LWLockConflictsWithVar
> +1.73% 0.00%  postgres  [.] LWLockReleaseClearVar

Indeed.
--
Michael




RE: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases

2023-05-08 Thread Hayato Kuroda (Fujitsu)
Dear Sawada-san,

Thank you for the patch! I confirmed that the problem you raised can occur
on HEAD, and that the test you added reproduces it. When the stats entry has
been removed but pg_stat_get_subscription_stats() is called, the returned
values all come back as 0.
Additionally, I have checked the other pgstat_drop_* functions and could not
find any similar problems.

A comment:

```
+   /*
+* Tell the cumulative stats system that the subscription is getting
+* dropped.
+*/
+   pgstat_drop_subscription(subid);
```

Wouldn't it be better to also write down what you said (quoted below) as part
of this comment? Or is it too trivial?

> There is a chance the
> transaction dropping the subscription fails due to network error etc
> but we don't need to worry about it as reporting the subscription drop
> is transactional.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: WAL Insertion Lock Improvements

2023-05-08 Thread Michael Paquier
On Tue, May 09, 2023 at 09:34:56AM +0530, Bharath Rupireddy wrote:
> Below is the configuration I've been using. I have been keeping the
> checkpoints away so far to get expected numbers. Probably, something
> that I should modify for this long run? Change checkpoint_timeout to
> 15 min or so?
> 
> max_wal_size=64GB
> checkpoint_timeout=1d
> shared_buffers=8GB
> max_connections=5000

Noted.  Something like that should be OK IMO, with the checkpoints
being triggered based on the volume of WAL generated.  With records
that have a fixed size, this should, I assume, lead to results that
can be compared across runs, even if the patched code leads to more
checkpoints being generated.
--
Michael




Re: WAL Insertion Lock Improvements

2023-05-08 Thread Bharath Rupireddy
On Tue, May 9, 2023 at 9:27 AM Michael Paquier  wrote:
>
> On Tue, May 09, 2023 at 09:24:14AM +0530, Bharath Rupireddy wrote:
> > I'll pick a test case that generates a reasonable amount of WAL 256
> > bytes. What do you think of the following?
> >
> > test-case 2: -T900, WAL ~256 bytes (for c in 1 2 4 8 16 32 64 128 256
> > 512 768 1024 2048 4096 - takes 3.5hrs)
> > test-case 2: -t100, WAL ~256 bytes
> >
> > If okay, I'll fire the tests.
>
> Sounds like a sensible duration, yes.  What's your setting for
> min/max_wal_size?  I assume that there are still 16GB throttled with
> target_completion at 0.9?

Below is the configuration I've been using. I have been keeping the
checkpoints away so far to get expected numbers. Probably, something
that I should modify for this long run? Change checkpoint_timeout to
15 min or so?

max_wal_size=64GB
checkpoint_timeout=1d
shared_buffers=8GB
max_connections=5000

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: DROP DATABASE is interruptible

2023-05-08 Thread Andres Freund
Hi,

On 2023-05-09 15:41:36 +1200, Thomas Munro wrote:
> I tried out the patch you posted over at [1].

Thanks!


> $ psql db2
> psql: error: connection to server on socket "/tmp/.s.PGSQL.5432"
> failed: FATAL:  database "db2" is invalid
> DETAIL:  Use DROP DATABASE to drop invalid databases
> 
> I suppose it should be a HINT?

Yup.


> +# FIXME: It'd be good to test the actual interruption path. But it's not
> +# immediately obvious how.
> 
> I wonder if there is some way to incorporate something based on
> SIGSTOP signals into the test, but I don't know how to do it on
> Windows and maybe that's a bit weird anyway.  For a non-OS-specific
> way to do it, I was wondering about having a test module function that
> has a wait loop that accepts ^C but deliberately ignores
> ProcSignalBarrier, and leaving that running in a background psql for a
> similar effect?

Seems a bit too complicated.

We really need to work at a framework for this kind of thing.


> Not sure why the test is under src/test/recovery.

Where else? We don't really have a place to put backend specific tests that
aren't about logical replication or recovery right now...

It partially is about dealing with crashes etc in the middle of DROP DATABASE,
so it doesn't seem unreasonable to me anyway.


> While a database exists in this state, we get periodic autovacuum
> noise, which I guess we should actually skip?

Yes, good catch.

Also should either reset datfrozenxid et al when invalidating, or ignore it
when computing horizons.
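
For illustration, the kind of check being suggested would sit inside
autovacuum's scan of pg_database, roughly like this (a sketch;
database_is_invalid_form() is a hypothetical helper name here, not something
the posted patch necessarily provides):

    /* hypothetical: while autovacuum walks pg_database */
    if (database_is_invalid_form(pgdatabase))
        continue;           /* never autovacuum a partially-dropped database */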


> I suppose someone might eventually wonder if autovacuum could complete the
> drop, but it seems a bit of a sudden weird leap in duties and might be
> confusing (perhaps it'd make more sense if 'invalid because creating' and
> 'invalid because dropping' were distinguished).

I'm a bit hesitant to do so for now. Once it's a bit more settled, maybe?
Although I wonder if there's something better suited to the task than
autovacuum.

Greetings,

Andres Freund




Re: WAL Insertion Lock Improvements

2023-05-08 Thread Michael Paquier
On Tue, May 09, 2023 at 09:24:14AM +0530, Bharath Rupireddy wrote:
> I'll pick a test case that generates a reasonable amount of WAL 256
> bytes. What do you think of the following?
> 
> test-case 2: -T900, WAL ~256 bytes (for c in 1 2 4 8 16 32 64 128 256
> 512 768 1024 2048 4096 - takes 3.5hrs)
> test-case 2: -t100, WAL ~256 bytes
> 
> If okay, I'll fire the tests.

Sounds like a sensible duration, yes.  What's your setting for
min/max_wal_size?  I assume that there are still 16GB throttled with
target_completion at 0.9?
--
Michael




Re: WAL Insertion Lock Improvements

2023-05-08 Thread Bharath Rupireddy
On Tue, May 9, 2023 at 9:02 AM Michael Paquier  wrote:
>
> On Mon, May 08, 2023 at 04:04:10PM -0700, Nathan Bossart wrote:
> > On Mon, May 08, 2023 at 05:57:09PM +0530, Bharath Rupireddy wrote:
> >> test-case 1: -T5, WAL ~16 bytes
> >> test-case 1: -t1000, WAL ~16 bytes
> >
> > I wonder if it's worth doing a couple of long-running tests, too.
>
> Yes, 5s or 1000 transactions per client is too small, though it shows
> that things are going in the right direction.

I'll pick a test case that generates a reasonable amount of WAL 256
bytes. What do you think of the following?

test-case 2: -T900, WAL ~256 bytes (for c in 1 2 4 8 16 32 64 128 256
512 768 1024 2048 4096 - takes 3.5hrs)
test-case 2: -t100, WAL ~256 bytes

If okay, I'll fire the tests.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: DROP DATABASE is interruptible

2023-05-08 Thread Thomas Munro
On Tue, May 9, 2023 at 3:41 PM Thomas Munro  wrote:
> I tried out the patch you posted over at [1].

I forgot to add, +1, I think this is a good approach.

(I'm still a little embarrassed at how long we spent trying to debug
this in the other thread from the supplied clues, when you'd already
pointed this failure mechanism out including the exact error message a
couple of months ago.  One thing I've noticed is that new threads
posted in the middle of commitfests are hard to see :-D  We were
getting pretty close, though.)




Re: DROP DATABASE is interruptible

2023-05-08 Thread Thomas Munro
I tried out the patch you posted over at [1].  For those wanting an
easy way to test it, or test the buggy behaviour in master without
this patch, you can simply kill -STOP the checkpointer, so that DROP
DATABASE hangs in RequestCheckpoint() (or you could SIGSTOP any other
backend so it hangs in the barrier thing instead), and then you can
just press ^C like this:

postgres=# create database db2;
CREATE DATABASE
postgres=# drop database db2;
^CCancel request sent
ERROR:  canceling statement due to user request

After that you get:

$ psql db2
psql: error: connection to server on socket "/tmp/.s.PGSQL.5432"
failed: FATAL:  database "db2" is invalid
DETAIL:  Use DROP DATABASE to drop invalid databases

I suppose it should be a HINT?

+# FIXME: It'd be good to test the actual interruption path. But it's not
+# immediately obvious how.

I wonder if there is some way to incorporate something based on
SIGSTOP signals into the test, but I don't know how to do it on
Windows and maybe that's a bit weird anyway.  For a non-OS-specific
way to do it, I was wondering about having a test module function that
has a wait loop that accepts ^C but deliberately ignores
ProcSignalBarrier, and leaving that running in a background psql for a
similar effect?

Not sure why the test is under src/test/recovery.

While a database exists in this state, we get periodic autovacuum
noise, which I guess we should actually skip?  I suppose someone might
eventually wonder if autovacuum could complete the drop, but it seems
a bit of a sudden weird leap in duties and might be confusing (perhaps
it'd make more sense if 'invalid because creating' and 'invalid
because dropping' were distinguished).

2023-05-09 15:24:10.860 NZST [523191] FATAL:  database "db2" is invalid
2023-05-09 15:24:10.860 NZST [523191] DETAIL:  Use DROP DATABASE to
drop invalid databases
2023-05-09 15:25:10.883 NZST [523279] FATAL:  database "db2" is invalid
2023-05-09 15:25:10.883 NZST [523279] DETAIL:  Use DROP DATABASE to
drop invalid databases
2023-05-09 15:26:10.899 NZST [523361] FATAL:  database "db2" is invalid
2023-05-09 15:26:10.899 NZST [523361] DETAIL:  Use DROP DATABASE to
drop invalid databases
2023-05-09 15:27:10.919 NZST [523408] FATAL:  database "db2" is invalid
2023-05-09 15:27:10.919 NZST [523408] DETAIL:  Use DROP DATABASE to
drop invalid databases
2023-05-09 15:28:10.938 NZST [523456] FATAL:  database "db2" is invalid
2023-05-09 15:28:10.938 NZST [523456] DETAIL:  Use DROP DATABASE to
drop invalid databases

[1] 
https://www.postgresql.org/message-id/20230509013255.fjrlpitnj3ltur76%40awork3.anarazel.de




Re: WAL Insertion Lock Improvements

2023-05-08 Thread Michael Paquier
On Mon, May 08, 2023 at 04:04:10PM -0700, Nathan Bossart wrote:
> On Mon, May 08, 2023 at 05:57:09PM +0530, Bharath Rupireddy wrote:
>> test-case 1: -T5, WAL ~16 bytes
>> test-case 1: -t1000, WAL ~16 bytes
> 
> I wonder if it's worth doing a couple of long-running tests, too.

Yes, 5s or 1000 transactions per client is too small, though it shows
that things are going in the right direction. 

(Will reply to the rest in a bit..)
--
Michael




Re: Tab completion for CREATE SCHEMA AUTHORIZATION

2023-05-08 Thread Michael Paquier
On Mon, May 08, 2023 at 05:36:27PM +0100, Dagfinn Ilmari Mannsåker wrote:
> Here's an updated v3 patch with that.  While adding that, I noticed that
> CREATE UNLOGGED only tab-completes TABLE and MATERIALIZED VIEW, not
> SEQUENCE, so I added that (and removed MATERIALIZED VIEW when part of
> CREATE SCHEMA).

+   /* but not MATVIEW in CREATE SCHEMA */
+   if (HeadMatches("CREATE", "SCHEMA"))
+   COMPLETE_WITH("TABLE", "SEQUENCE");
+   else
+   COMPLETE_WITH("TABLE", "SEQUENCE", "MATERIALIZED VIEW");

This may look strange at first glance, but the grammar is what it
is..  Perhaps matviews could be part of that at some point.  Or not. :)

+   /* only some object types can be created as part of CREATE SCHEMA */
+   if (HeadMatches("CREATE", "SCHEMA"))
+   COMPLETE_WITH("TABLE", "VIEW", "INDEX", "SEQUENCE", "TRIGGER",
+ /* for INDEX and TABLE/SEQUENCE, respectively */
+ "UNIQUE", "UNLOGGED");

Not including TEMPORARY is OK here as the grammar does not allow one
to directly create a temporary schema.  The (many) code paths that
have TailMatches() to cope with CREATE SCHEMA would continue to offer
the completion if it were added, but at least this approach avoids the
recommendation where possible.

That looks pretty much OK to me.  One tiny comment I have is that this
lacks brackets for the inner blocks, so I have added some in the v4
attached.
--
Michael
From d163ead2e81048af130ab4b72642b8e89d0c7e32 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 9 May 2023 12:25:03 +0900
Subject: [PATCH v4] Add tab completion for CREATE SCHEMA

 - AUTHORIZATION both in addition to and after a schema name
 - list of owner roles after AUTHORIZATION
 - CREATE and GRANT after any otherwise-complete command
 - Only the allowed object types after CREATE

Also adjust completion after CREATE UNLOGGED:

 - Add SEQUENCE
 - Don't suggest MATERIALIZED VIEW in CREATE TABLE
---
 src/bin/psql/tab-complete.c | 45 ++---
 1 file changed, 37 insertions(+), 8 deletions(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index bd04244969..9b0b7ddb33 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1031,9 +1031,13 @@ static const SchemaQuery Query_for_trigger_of_table = {
 "   FROM pg_catalog.pg_roles "\
 "  WHERE rolname LIKE '%s'"
 
+/* add these to Query_for_list_of_roles in OWNER contexts */
+#define Keywords_for_list_of_owner_roles \
+"CURRENT_ROLE", "CURRENT_USER", "SESSION_USER"
+
 /* add these to Query_for_list_of_roles in GRANT contexts */
 #define Keywords_for_list_of_grant_roles \
-"PUBLIC", "CURRENT_ROLE", "CURRENT_USER", "SESSION_USER"
+Keywords_for_list_of_owner_roles, "PUBLIC"
 
 #define Query_for_all_table_constraints \
 "SELECT conname "\
@@ -1785,8 +1789,15 @@ psql_completion(const char *text, int start, int end)
 /* CREATE */
 	/* complete with something you can create */
 	else if (TailMatches("CREATE"))
-		matches = rl_completion_matches(text, create_command_generator);
-
+	{
+		/* only some object types can be created as part of CREATE SCHEMA */
+		if (HeadMatches("CREATE", "SCHEMA"))
+			COMPLETE_WITH("TABLE", "VIEW", "INDEX", "SEQUENCE", "TRIGGER",
+		  /* for INDEX and TABLE/SEQUENCE, respectively */
+		  "UNIQUE", "UNLOGGED");
+		else
+			matches = rl_completion_matches(text, create_command_generator);
+	}
 	/* complete with something you can create or replace */
 	else if (TailMatches("CREATE", "OR", "REPLACE"))
 		COMPLETE_WITH("FUNCTION", "PROCEDURE", "LANGUAGE", "RULE", "VIEW",
@@ -3154,6 +3165,20 @@ psql_completion(const char *text, int start, int end)
 	else if (TailMatches("AS", "ON", "SELECT|UPDATE|INSERT|DELETE", "TO"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables);
 
+/* CREATE SCHEMA [  ] [ AUTHORIZATION ] */
+	else if (Matches("CREATE", "SCHEMA"))
+		COMPLETE_WITH_QUERY_PLUS(Query_for_list_of_schemas,
+ "AUTHORIZATION");
+	else if (Matches("CREATE", "SCHEMA", "AUTHORIZATION") ||
+			 Matches("CREATE", "SCHEMA", MatchAny, "AUTHORIZATION"))
+		COMPLETE_WITH_QUERY_PLUS(Query_for_list_of_roles,
+ Keywords_for_list_of_owner_roles);
+	else if (Matches("CREATE", "SCHEMA", "AUTHORIZATION", MatchAny) ||
+			 Matches("CREATE", "SCHEMA", MatchAny, "AUTHORIZATION", MatchAny))
+		COMPLETE_WITH("CREATE", "GRANT");
+	else if (Matches("CREATE", "SCHEMA", MatchAny))
+		COMPLETE_WITH("AUTHORIZATION", "CREATE", "GRANT");
+
 /* CREATE SEQUENCE --- is allowed inside CREATE SCHEMA, so use TailMatches */
 	else if (TailMatches("CREATE", "SEQUENCE", MatchAny) ||
 			 TailMatches("CREATE", "TEMP|TEMPORARY", "SEQUENCE", MatchAny))
@@ -3185,9 +3210,15 @@ psql_completion(const char *text, int start, int end)
 	/* Complete "CREATE TEMP/TEMPORARY" with the possible temp objects */
 	else if (TailMatches("CREATE", "TEMP|TEMPORARY"))
 		COMPLETE_WITH("SEQUENCE", "TABLE", "VIEW");
-	/* Complete "CREATE UNLOGGED" with TABLE or MATVIEW */
+	/* 

Re: Improve list manipulation in several places

2023-05-08 Thread Richard Guo
On Tue, May 9, 2023 at 1:48 AM Ranier Vilela  wrote:

> I think you missed list_nth_xid, It makes perfect sense to exist.
>

It seems that list_nth_xid is more about simplification.  So maybe we
should put it in 0001?

Thanks
Richard


Re: Improve list manipulation in several places

2023-05-08 Thread Richard Guo
On Tue, May 9, 2023 at 1:26 AM Alvaro Herrera 
wrote:

> The problem I see is that each of these new functions has a single
> caller, and the only one that looks like it could have a performance
> advantage is list_copy_move_nth_to_head() (which is the weirdest of the
> lot).  I'm inclined not to have any of these single-use functions unless
> a performance case can be made for them.


Yeah, maybe this is the reason I failed to devise a query that shows any
performance gain.  I tried with a query which makes the 'all_pathkeys'
in sort_inner_and_outer have a length of 500 and still cannot see any
notable performance improvement gained by list_copy_move_nth_to_head.
Maybe the cost of other parts of planning swamps the performance gain
here?  Now I agree that maybe 0002 is not worthwhile to do.
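
For reference, a naive functional equivalent of the proposed helper, built
only from existing pg_list.h primitives; the whole point of the proposed
function was to avoid this extra copying and movement, so this is purely
illustrative:

    /* Illustrative only: copy 'list' with its n'th element moved to the front. */
    static List *
    list_copy_move_nth_to_head_naive(const List *list, int n)
    {
        List       *result = list_copy(list);
        void       *elem = list_nth(result, n);

        result = list_delete_nth_cell(result, n);
        return lcons(elem, result);
    }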

Thanks
Richard


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-05-08 Thread Peter Smith
Hi Kuroda-san. Here are some review comments for the v10-0001 patch.

==

General.

1. pg_dump option is documented to the user.

I'm not sure about exposing the new pg_dump
--logical-replication-slots-only option to the user.

I thought this pg_dump option was intended only to be called
*internally* by pg_upgrade. But this patch also documents the new
option for the user (in case they want to call it independently?).

Maybe exposing it is OK, but if you do that then I think perhaps
there should also be some additional pg_dump tests just for this
option (i.e. tested independently of pg_upgrade).

==
Commit message

2.
For pg_upgrade, when '--include-logical-replication-slots' is
specified, it executes
pg_dump with the new "--logical-replication-slots-only" option and
restores from the
dump. Apart from restoring schema, pg_resetwal must not be called
after restoring
replication slots. This is because the command discards WAL files and
starts from a
new segment, even if they are required by replication slots. This
leads to an ERROR:
"requested WAL segment XXX has already been removed". To avoid this,
replication slots
are restored at a different time than other objects, after running pg_resetwal.

~~

The "Apart from" sentence maybe could do with some rewording. I
noticed there is a code comment (below fragment) that says the same as
this, but more clearly. Maybe it is better to use that code-comment
wording in the comment message.

+ * XXX We cannot dump replication slots at the same time as the schema
+ * dump because we need to separate the timing of restoring
+ * replication slots and other objects. Replication slots, in
+ * particular, should not be restored before executing the pg_resetwal
+ * command because it will remove WALs that are required by the slots.

==
src/bin/pg_dump/pg_dump.c

3. main

+ if (dopt.logical_slots_only && !dopt.binary_upgrade)
+ pg_fatal("options --logical-replication-slots-only requires option
--binary-upgrade");
+
+ if (dopt.logical_slots_only && dopt.dataOnly)
+ pg_fatal("options --logical-replication-slots-only and
-a/--data-only cannot be used together");
+ if (dopt.logical_slots_only && dopt.schemaOnly)
+ pg_fatal("options --logical-replication-slots-only and
-s/--schema-only cannot be used together");
+

Consider if it might be simpler to combine together all those
dopt.logical_slots_only checks.

SUGGESTION

if (dopt.logical_slots_only)
{
if (!dopt.binary_upgrade)
pg_fatal("options --logical-replication-slots-only requires
option --binary-upgrade");

if (dopt.dataOnly)
pg_fatal("options --logical-replication-slots-only and
-a/--data-only cannot be used together");
if (dopt.schemaOnly)
pg_fatal("options --logical-replication-slots-only and
-s/--schema-only cannot be used together");
}

~~~

4. getLogicalReplicationSlots

+ /* Check whether we should dump or not */
+ if (fout->remoteVersion < 16 || !dopt->logical_slots_only)
+ return;

I'm not sure if this check is necessary. Given the way this function
is called, is it possible for this check to fail? Maybe that quick
exit would be better coded as an Assert?
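
In other words, something like this instead of the quick exit (a sketch of
the suggestion, mirroring the quoted condition):

    /* Sketch: make the calling assumption explicit instead of exiting. */
    Assert(dopt->logical_slots_only && fout->remoteVersion >= 16);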

~~~

5. dumpLogicalReplicationSlot

+dumpLogicalReplicationSlot(Archive *fout,
+const LogicalReplicationSlotInfo *slotinfo)
+{
+ DumpOptions *dopt = fout->dopt;
+
+ if (!dopt->logical_slots_only)
+ return;

(Similar to the previous comment). Is it even possible to arrive here
when dopt->logical_slots_only is false. Maybe that quick exit would be
better coded as an Assert?

~

6.
+ PQExpBuffer query = createPQExpBuffer();
+ char*slotname = pg_strdup(slotinfo->dobj.name);

I wondered if it was really necessary to strdup/free this slotname.
e.g. And if it is, then why don't you do this for the slotinfo->plugin
field?

==
src/bin/pg_upgrade/check.c

7. check_and_dump_old_cluster

  /* Extract a list of databases and tables from the old cluster */
  get_db_and_rel_infos(_cluster);
+ get_logical_slot_infos(_cluster);

Is it correct to associate this new call with that existing comment
about "databases and tables"?

~~~

8. check_new_cluster

@@ -188,6 +190,7 @@ void
 check_new_cluster(void)
 {
  get_db_and_rel_infos(_cluster);
+ get_logical_slot_infos(_cluster);

  check_new_cluster_is_empty();

@@ -210,6 +213,9 @@ check_new_cluster(void)
  check_for_prepared_transactions(_cluster);

  check_for_new_tablespace_dir(_cluster);
+
+ if (user_opts.include_logical_slots)
+ check_for_parameter_settings(_cluster);

Can the get_logical_slot_infos() be done later, guarded by that the
same condition if (user_opts.include_logical_slots)?

~~~

9. check_new_cluster_is_empty

+ * If --include-logical-replication-slots is required, check the
+ * existing of slots
+ */

Did you mean to say "check the existence of slots"?

~~~

10. check_for_parameter_settings

+ if (strcmp(wal_level, "logical") != 0)
+ pg_fatal("wal_level must be \"logical\", but set to \"%s\"",
+ wal_level);

/but set to/but is set to/



Re: Improve list manipulation in several places

2023-05-08 Thread Richard Guo
On Mon, May 8, 2023 at 11:22 PM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> On 23.04.23 08:42, Richard Guo wrote:
> > Thanks for the suggestion.  I've split the patch into two as attached.
> > 0001 is just a minor simplification by replacing lfirst(list_head(list))
> > with linitial(list).  0002 introduces new functions to reduce the
> > movement of list elements in several places so as to gain performance
> > improvement and benefit future callers.
>
> These look sensible to me.  If you could show some numbers that support
> the claim that there is a performance advantage, it would be even more
> convincing.


Thanks Peter for looking at those patches.  I tried to devise a query to
show performance gain but did not succeed :-(.  So I begin to wonder if
0002 is worthwhile to do, as it seems that it does not solve any real
problem.

Thanks
Richard


Re: Cleaning up array_in()

2023-05-08 Thread Tom Lane
Alexander Lakhin  writes:
> The only thing that confused me, is the error message (it's not new, too):
> select '{{1}}'::int[];
> or even:
> select '{{'::int[];
> ERROR:  number of array dimensions (7) exceeds the maximum allowed (6)

Yeah, I didn't touch that, but it's pretty bogus because the first
number will always be "7" even if you wrote more than 7 left braces,
since the code errors out immediately upon finding that it's seen
too many braces.

The equivalent message in the PLs just says "number of array dimensions
exceeds the maximum allowed (6)".  I'm inclined to do likewise in
array_in, but didn't touch it here.
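
For comparison, the PL-style wording amounts to something like this at the
point where the parser notices too many open braces (a sketch; nest_level
stands in for whatever counter the real code uses):

    if (nest_level >= MAXDIM)
        ereport(ERROR,
                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                 errmsg("number of array dimensions exceeds the maximum allowed (%d)",
                        MAXDIM)));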

> Beside that, I would like to note the following error text changes
> (all of these are feasible, I think):

I'll look into whether we can improve those, unless you had a patch
in mind already?

regards, tom lane




Re: Cleaning up array_in()

2023-05-08 Thread Alexander Lakhin

02.05.2023 18:41, Tom Lane wrote:

So, here's a rewrite.

Although I view this as a bug fix, AFAICT the only effects are to
accept input that should be rejected.  So again I don't advocate
back-patching.  But should we sneak it into v16, or wait for v17?


I've tested the patch from a user perspective and found no interesting cases
that were valid before, but not accepted with the patch (or vice versa):
The only thing that confused me, is the error message (it's not new, too):
select '{{1}}'::int[];
or even:
select '{{'::int[];
ERROR:  number of array dimensions (7) exceeds the maximum allowed (6)

Maybe it could be reworded like that?:
too many opening braces defining dimensions (maximum dimensions allowed: 6)

Beside that, I would like to note the following error text changes
(all of these are feasible, I think):
select '{{1},{{'::int[];
Before:
ERROR:  malformed array literal: "{{1},{{"
LINE 1: select '{{1},{{'::int[];
   ^
DETAIL:  Unexpected end of input.

After:
ERROR:  malformed array literal: "{{1},{{"
LINE 1: select '{{1},{{'::int[];
   ^
DETAIL:  Multidimensional arrays must have sub-arrays with matching dimensions.
---
select '{{1},{{'::int[];
Before:
ERROR:  number of array dimensions (7) exceeds the maximum allowed (6)

After:
ERROR:  malformed array literal: "{{1},{{"
LINE 1: select '{{1},{{'::int[];
   ^
DETAIL:  Multidimensional arrays must have sub-arrays with matching dimensions.
---
select '{{1},{}}}'::int[];
Before:
ERROR:  malformed array literal: "{{1},{}}}"
LINE 1: select '{{1},{}}}'::int[];
   ^
DETAIL:  Unexpected "}" character.

After:
ERROR:  malformed array literal: "{{1},{}}}"
LINE 1: select '{{1},{}}}'::int[];
   ^
DETAIL:  Multidimensional arrays must have sub-arrays with matching dimensions.
---
select '{{}}}'::int[];
Before:
ERROR:  malformed array literal: "{{}}}"
LINE 1: select '{{}}}'::int[];
   ^
DETAIL:  Unexpected "}" character.

After:
ERROR:  malformed array literal: "{{}}}"
LINE 1: select '{{}}}'::int[];
   ^
DETAIL:  Junk after closing right brace.

Best regards,
Alexander




Re: Perform streaming logical transactions by background workers and parallel apply

2023-05-08 Thread Masahiko Sawada
On Mon, May 8, 2023 at 8:09 PM Amit Kapila  wrote:
>
> On Fri, May 5, 2023 at 9:14 AM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > On Wednesday, May 3, 2023 3:17 PM Amit Kapila  
> > wrote:
> > >
> >
> > Attach another patch to fix the problem that pa_shutdown will access invalid
> > MyLogicalRepWorker. I personally want to avoid introducing new static 
> > variable,
> > so I only reorder the callback registration in this version.
> >
> > When testing this, I notice a rare case that the leader is possible to 
> > receive
> > the worker termination message after the leader stops the parallel worker. 
> > This
> > is unnecessary and have a risk that the leader would try to access the 
> > detached
> > memory queue. This is more likely to happen and sometimes cause the failure 
> > in
> > regression tests after the registration reorder patch because the dsm is
> > detached earlier after applying the patch.
> >
>
> I think it is only possible for the leader apply worker to try to
> receive the error message from an error queue after your 0002 patch.
> Because another place already detached from the queue before stopping
> the parallel apply workers. So, I combined both the patches and
> changed a few comments and a commit message. Let me know what you
> think of the attached.

I have one comment on the detaching error queue part:

+   /*
+* Detach from the error_mq_handle for the parallel apply worker before
+* stopping it. This prevents the leader apply worker from trying to
+* receive the message from the error queue that might already
be detached
+* by the parallel apply worker.
+*/
+   shm_mq_detach(winfo->error_mq_handle);
+   winfo->error_mq_handle = NULL;

In pa_detach_all_error_mq(), we try to detach the error queues of all
workers in the pool. I think we should check whether the queue is already
detached (i.e. is NULL) there. Otherwise, we will end up with a SEGV if an
error happens after detaching the error queue and before removing the
worker from the pool.
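
Something along these lines, presumably (a sketch written against the code
as I recall it; the list and field names may differ slightly):

    static void
    pa_detach_all_error_mq(void)
    {
        ListCell   *lc;

        foreach(lc, ParallelApplyWorkerPool)
        {
            ParallelApplyWorkerInfo *winfo = (ParallelApplyWorkerInfo *) lfirst(lc);

            /* Skip queues that were already detached before stopping the worker. */
            if (winfo->error_mq_handle == NULL)
                continue;

            shm_mq_detach(winfo->error_mq_handle);
            winfo->error_mq_handle = NULL;
        }
    }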

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Perform streaming logical transactions by background workers and parallel apply

2023-05-08 Thread Masahiko Sawada
On Mon, May 8, 2023 at 3:34 PM Amit Kapila  wrote:
>
> On Mon, May 8, 2023 at 11:08 AM Masahiko Sawada  wrote:
> >
> > On Mon, May 8, 2023 at 12:52 PM Zhijie Hou (Fujitsu)
> >  wrote:
> > >
> > > On Monday, May 8, 2023 11:08 AM Masahiko Sawada 
> > >
> > > Hi,
> > >
> > > >
> > > > On Tue, May 2, 2023 at 12:22 PM Amit Kapila 
> > > > wrote:
> > > > >
> > > > > On Fri, Apr 28, 2023 at 11:48 AM Masahiko Sawada
> > > >  wrote:
> > > > > >
> > > > > > While investigating this issue, I've reviewed the code around
> > > > > > callbacks and worker termination etc and I found a problem.
> > > > > >
> > > > > > A parallel apply worker calls the before_shmem_exit callbacks in the
> > > > > > following order:
> > > > > >
> > > > > > 1. ShutdownPostgres()
> > > > > > 2. logicalrep_worker_onexit()
> > > > > > 3. pa_shutdown()
> > > > > >
> > > > > > Since the worker is detached during logicalrep_worker_onexit(),
> > > > > > MyLogicalReplication->leader_pid is an invalid when we call
> > > > > > pa_shutdown():
> > > > > >
> > > > > > static void
> > > > > > pa_shutdown(int code, Datum arg)
> > > > > > {
> > > > > > Assert(MyLogicalRepWorker->leader_pid != InvalidPid);
> > > > > > SendProcSignal(MyLogicalRepWorker->leader_pid,
> > > > > >PROCSIG_PARALLEL_APPLY_MESSAGE,
> > > > > >InvalidBackendId);
> > > > > >
> > > > > > Also, if the parallel apply worker fails shm_toc_lookup() during the
> > > > > > initialization, it raises an error (because of noError = false) but
> > > > > > ends up a SEGV as MyLogicalRepWorker is still NULL.
> > > > > >
> > > > > > I think that we should not use MyLogicalRepWorker->leader_pid in
> > > > > > pa_shutdown() but instead store the leader's pid to a static 
> > > > > > variable
> > > > > > before registering pa_shutdown() callback.
> > > > > >
> > > > >
> > > > > Why not simply move the registration of pa_shutdown() to someplace
> > > > > after logicalrep_worker_attach()?
> > > >
> > > > If we do that, the worker won't call dsm_detach() if it raises an
> > > > ERROR in logicalrep_worker_attach(), is that okay? It seems that it's
> > > > no practically problem since we call dsm_backend_shutdown() in
> > > > shmem_exit(), but if so why do we need to call it in pa_shutdown()?
> > >
> > > I think the dsm_detach in pa_shutdown was intended to fire on_dsm_detach
> > > callbacks to give callback a chance to report stat before the stat system 
> > > is
> > > shutdown, following what we do in ParallelWorkerShutdown() (e.g.
> > > sharedfileset.c callbacks cause fd.c to do ReportTemporaryFileUsage(), so 
> > > we
> > > need to fire that earlier).
> > >
> > > But for parallel apply, we currently only have one on_dsm_detach
> > > callback(shm_mq_detach_callback) which doesn't report extra stats. So the
> > > dsm_detach in pa_shutdown is only used to make it a bit future-proof in 
> > > case
> > > we add some other on_dsm_detach callbacks in the future which need to 
> > > report
> > > stats.
> >
> > Make sense . Given that it's possible that we add other callbacks that
> > report stats in the future, I think it's better not to move the
> > position to register pa_shutdown() callback.
> >
>
> Hmm, what kind of stats do we expect to be collected before we
> register pa_shutdown? I think if required we can register such a
> callback after pa_shutdown. I feel without reordering the callbacks,
> the fix would be a bit complicated as explained in my previous email,
> so I don't think it is worth complicating this code unless really
> required.

Fair point. I agree that the issue can be resolved by carefully
ordering the callback registration.

Another thing I'm concerned about is that since both the leader worker
and parallel worker detach DSM before logicalrep_worker_onexit(),
cleaning up work that touches DSM cannot be done in
logicalrep_worker_onexit(). If we need to do something in the future,
we would need to have another callback called before detaching DSM.
But I'm fine as it's not a problem for now.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases

2023-05-08 Thread Nathan Bossart
On Mon, May 08, 2023 at 04:23:15PM +0900, Masahiko Sawada wrote:
> We call pgstat_drop_subscription() at the end of DropSubscription()
> but we could leave from this function earlier e.g. when no slot is
> associated with the subscription. In this case, the statistics entry
> for the subscription remains. To fix it, I think we need to call it
> earlier, just after removing the catalog tuple. There is a chance the
> transaction dropping the subscription fails due to network error etc
> but we don't need to worry about it as reporting the subscription drop
> is transactional.

Looks reasonable to me.  IIUC calling pgstat_drop_subscription() earlier
makes no real difference (besides avoiding this bug) because it is using
pgstat_drop_transactional() behind the scenes.
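
For reference, the shape of the proposed fix inside DropSubscription() would
be roughly as follows (a sketch of the placement only; the surrounding code is
elided and the variable names are as I recall them):

    /* Remove the subscription's catalog tuple. */
    CatalogTupleDelete(rel, &tup->t_self);

    /*
     * Tell the cumulative stats system that the subscription is getting
     * dropped.  Doing it here, right after the catalog delete, means none
     * of the later early returns can skip it; the report is transactional,
     * so an aborted DROP SUBSCRIPTION leaves the stats entry alone.
     */
    pgstat_drop_subscription(subid);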

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Cleaning up array_in()

2023-05-08 Thread Nathan Bossart
On Tue, May 02, 2023 at 11:41:27AM -0400, Tom Lane wrote:
> It looks like back in the dim mists of the
> Berkeley era, there was an intentional attempt to allow
> non-rectangular array input, with the missing elements automatically
> filled out as NULLs.  Since that was undocumented, we concluded it was
> a bug and plastered on some code to check for rectangularity of the
> input.

Interesting.

> Although I view this as a bug fix, AFAICT the only effects are to
> accept input that should be rejected.  So again I don't advocate
> back-patching.  But should we sneak it into v16, or wait for v17?

I think it'd be okay to sneak it into v16, given it is technically a bug
fix.

> (This leaves ArrayGetOffset0() unused, but I'm unsure whether to
> remove that.)

Why's that?  Do you think it is likely to be used again in the future?
Otherwise, 0001 LGTM.

I haven't had a chance to look at 0002 closely yet.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: evtcache: EventTriggerCache vs Event Trigger Cache

2023-05-08 Thread Nathan Bossart
On Mon, May 08, 2023 at 10:39:42AM +0200, Daniel Gustafsson wrote:
> On 4 May 2023, at 14:18, Daniel Gustafsson  wrote:
>> On 4 May 2023, at 14:09, Tom Lane  wrote:
>>> Hmm, I'm kinda -1 on them having the same name visible in the
>>> contexts dump --- that seems very confusing.  How about naming
>>> the hash "EventTriggerCacheHash" or so?
>> 
>> I think the level is the indicator here, but I have no strong opinions,
>> EventTriggerCacheHash is fine by me.
> 
> The attached trivial diff does that, parking this in the next CF.

+1 for EventTriggerCacheHash

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: WAL Insertion Lock Improvements

2023-05-08 Thread Nathan Bossart
On Mon, May 08, 2023 at 05:57:09PM +0530, Bharath Rupireddy wrote:
> I ran performance tests on the patch with different use-cases. Clearly
> the patch reduces burden on LWLock's waitlist lock (evident from perf
> reports [1]). However, to see visible impact in the output, the txns
> must be generating small (between 16 bytes to 2 KB) amounts of WAL in
> a highly concurrent manner, check the results below (FWIW, I've zipped
> and attached perf images for better illustration along with test
> setup).
> 
> When the txns are generating a small amount of WAL i.e. between 16
> bytes to 2 KB in a highly concurrent manner, the benefit is clearly
> visible in the TPS more than 2.3X improvement. When the txns are
> generating more WAL i.e. more than 2 KB, the gain from reduced burden
> on waitlist lock is offset by increase in the wait/release for WAL
> insertion locks and no visible benefit is seen.
> 
> As the amount of WAL each txn generates increases, it looks like the
> benefit gained from reduced burden on waitlist lock is offset by
> increase in the wait for WAL insertion locks.

Nice.

> test-case 1: -T5, WAL ~16 bytes
> test-case 1: -t1000, WAL ~16 bytes

I wonder if it's worth doing a couple of long-running tests, too.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: PGDOCS - Replica Identity quotes

2023-05-08 Thread Peter Smith
On Mon, May 8, 2023 at 2:05 PM Michael Paquier  wrote:
>
> On Mon, May 08, 2023 at 01:57:33PM +1000, Peter Smith wrote:
> > I agree. Do you want me to make a new v4 patch to also do that, or
> > will you handle them in passing?
>
> I'll just go handle them on the way, no need to send an updated
> patch.

Thanks for pushing this yesterday, and for handling the other quotes.

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: Add LZ4 compression in pg_dump

2023-05-08 Thread Michael Paquier
On Mon, May 08, 2023 at 08:00:39PM +0200, Tomas Vondra wrote:
> The LZ4Stream_write() forgot to move the pointer to the next chunk, so
> it was happily decompressing the initial chunk over and over. A bit
> embarrassing oversight :-(
> 
> The custom format calls WriteDataToArchiveLZ4(), which was correct.
> 
> The attached patch fixes this for me.

Ouch.  So this was corrupting the dumps and the compression when
trying to write more than two chunks at once, not the decompression
steps.  That addresses the issue here as well, thanks!
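
For readers who have not looked at the code: the bug was in the chunking loop
of the write path, where the source pointer has to advance after each chunk is
handed to the compressor.  A rough sketch of the corrected shape (simplified;
buffer names and the output step are approximations, not the exact pg_dump
code):

    while (remaining > 0)
    {
        size_t      chunk = Min(remaining, DEFAULT_IO_BUFFER_SIZE);
        size_t      status;

        status = LZ4F_compressUpdate(state->ctx,
                                     state->buffer, state->buflen,
                                     ptr, chunk, NULL);
        if (LZ4F_isError(status))
            return false;       /* error handling simplified */

        /* ... write 'status' bytes of compressed output to the file ... */

        ptr = (const char *) ptr + chunk;   /* the advance that was missing */
        remaining -= chunk;
    }
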
--
Michael




Re: Order changes in PG16 since ICU introduction

2023-05-08 Thread Jeff Davis
On Mon, 2023-05-08 at 17:47 -0400, Tom Lane wrote:
> -ERROR:  could not convert locale name "C" to language tag:
> U_ILLEGAL_ARGUMENT_ERROR
> +NOTICE:  using standard form "en-US-u-va-posix" for locale "C"

...

> I suppose this is environment-dependent.  Sadly, the buildfarm
> client does not show the prevailing LANG or LC_XXX settings.

Looks like it's failing-to-fail on some versions of ICU which
automatically perform that conversion.

The easiest thing to do is revert it for now, and after we sort out the
memcmp() path for the ICU provider, then I can commit it again (after
that point it would just be code cleanup and should have no functional
impact).

Regards,
Jeff Davis





Re: Order changes in PG16 since ICU introduction

2023-05-08 Thread Tom Lane
Jeff Davis  writes:
> === 0001: do not convert C to en-US-u-va-posix

> I plan to commit this soon.

Several buildfarm animals have failed since this went in.  The
only one showing enough info to diagnose is siskin [1]:

@@ -1043,16 +1043,15 @@
 ERROR:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to 
DISABLED.
 CREATE COLLATION testx (provider = icu, locale = 'C'); -- fails
-ERROR:  could not convert locale name "C" to language tag: 
U_ILLEGAL_ARGUMENT_ERROR
+NOTICE:  using standard form "en-US-u-va-posix" for locale "C"
 CREATE COLLATION testx (provider = icu, locale = 
'@colStrength=primary;nonsense=yes'); -- fails
 ERROR:  could not convert locale name "@colStrength=primary;nonsense=yes" to 
language tag: U_ILLEGAL_ARGUMENT_ERROR
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = 
'@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@colStrength=primary;nonsense=yes" to 
language tag: U_ILLEGAL_ARGUMENT_ERROR
+ERROR:  collation "testx" already exists
 CREATE COLLATION testx (provider = icu, locale = 'C'); DROP COLLATION testx;
-WARNING:  could not convert locale name "C" to language tag: 
U_ILLEGAL_ARGUMENT_ERROR
-WARNING:  ICU locale "C" has unknown language "c"
-HINT:  To disable ICU locale validation, set parameter icu_validation_level to 
DISABLED.
+NOTICE:  using standard form "en-US-u-va-posix" for locale "C"
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP 
COLLATION testx;
 WARNING:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to 
DISABLED.

I suppose this is environment-dependent.  Sadly, the buildfarm
client does not show the prevailing LANG or LC_XXX settings.

regards, tom lane

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=siskin=2023-05-08%2020%3A09%3A26




Re: pg_stat_io for the startup process

2023-05-08 Thread Andres Freund
Hi,

On 2023-04-26 18:47:14 +0900, Kyotaro Horiguchi wrote:
> I see four issues here.
> 
> 1. The current database stats omits buffer fetches that don't
>   originate from a relation.
> 
> In this case pgstat_relation can't work since recovery isn't conscious
> of relids.  We might be able to resolve relfilenode into a relid, but
> it may not be that simple. Fortunately we already count fetches and
> hits process-wide using pgBufferUsage, so we can use this for database
> stats.

I don't think we need to do anything about that for 16 - they aren't updated
at process end either.

I think the fix here is to do the architectural change of maintaining most
stats keyed by relfilenode as we've discussed in some other threads. Then we
also can have relation level write stats etc.


> 2. Even if we wanted to report stats for the startup process,
>   pgstat_report_stats wouldn't permit it since transaction-end
>   timestamp doesn't advance.
> 
> I'm not certain if it's the correct approach, but perhaps we could use
> GetCurrentTimestamp() instead of GetCurrentTransactionStopTimestamp()
> specifically for the startup process.

What about using GetCurrentTimestamp() when force == true? That'd make sense
for other users as well, I think?
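
i.e. something like this near the top of pgstat_report_stat(), where 'now' is
the local timestamp it already computes (sketch):

    /*
     * Processes that never run transactions (e.g. the startup process)
     * never advance the transaction stop timestamp, so fall back to the
     * wall clock when a flush is being forced.
     */
    if (force)
        now = GetCurrentTimestamp();
    else
        now = GetCurrentTransactionStopTimestamp();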


> 3. When should we call pgstat_report_stats on the startup process?
> 
> During recovery, I think we can call pgstat_report_stats() (or a
> subset of it) right before invoking WaitLatch and at segment
> boundaries.

I've pondered that as well. But I don't think it's great - it's not exactly
intuitive that stats reporting gets far less common if you use a 1GB
wal_segment_size.

Greetings,

Andres Freund




Re: pg_stat_io for the startup process

2023-05-08 Thread Melanie Plageman
On Wed, May 03, 2023 at 04:11:33PM +0300, Melih Mutlu wrote:
> Andres Freund , 27 Nis 2023 Per, 19:27 tarihinde şunu 
> yazdı:
> >  #ifdef WAL_DEBUG
> > >   if (XLOG_DEBUG ||
> > >   (record->xl_rmid == RM_XACT_ID &&
> > trace_recovery_messages <= DEBUG2) ||
> > > @@ -3617,6 +3621,13 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr,
> > bool randAccess,
> > >   /* Do background tasks
> > that might benefit us later. */
> > >
> >  KnownAssignedTransactionIdsIdleMaintenance();
> > >
> > > + /*
> > > +  * Need to disable flush
> > timeout to avoid unnecessary
> > > +  * wakeups. Enable it
> > again after a WAL record is read
> > > +  * in PerformWalRecovery.
> > > +  */
> > > +
> >  disable_startup_stat_flush_timeout();
> > > +
> > >   (void)
> > WaitLatch(>recoveryWakeupLatch,
> > >
> >   WL_LATCH_SET | WL_TIMEOUT |
> > >
> >   WL_EXIT_ON_PM_DEATH,
> >
> > I think always disabling the timer here isn't quite right - we want to
> > wake up
> > *once* in WaitForWALToBecomeAvailable(), otherwise we'll not submit pending
> > stats before waiting - potentially for a long time - for WAL. One way
> > would be
> > just explicitly report before the WaitLatch().
> >
> > Another approach, I think better, would be to not use
> > enable_timeout_every(),
> > and to explicitly rearm the timer in HandleStartupProcInterrupts(). When
> > called from WaitForWALToBecomeAvailable(), we'd not rearm, and instead do
> > so
> > at the end of WaitForWALToBecomeAvailable().  That way we also wouldn't
> > repeatedly fire between calls to HandleStartupProcInterrupts().
> >
> 
> Attached patch is probably not doing what you asked. IIUC
> HandleStartupProcInterrupts should arm the timer but also shouldn't arm it
> if it's called from  WaitForWALToBecomeAvailable. But the timer should be
> armed again at the very end of WaitForWALToBecomeAvailable. I've been
> thinking about how to do this properly. Should HandleStartupProcInterrupts
> take a parameter to decide whether the timer needs to be armed? Or need to
> add an additional global flag to rearm the timer? Any thoughts?

I had the same question about how to determine whether or not to rearm.

> From 9be7360e49db424c45c53e85efe8a4f5359b5244 Mon Sep 17 00:00:00 2001
> From: Melih Mutlu 
> Date: Wed, 26 Apr 2023 18:21:32 +0300
> Subject: [PATCH v2] Add timeout to flush stats during startup's main replay
>  loop
> diff --git a/src/backend/postmaster/startup.c 
> b/src/backend/postmaster/startup.c
> index efc2580536..842394bc8f 100644
> --- a/src/backend/postmaster/startup.c
> +++ b/src/backend/postmaster/startup.c
> @@ -72,6 +72,9 @@ static TimestampTz startup_progress_phase_start_time;
>   */
>  static volatile sig_atomic_t startup_progress_timer_expired = false;
>  
> +/* Indicates whether flushing stats is needed. */
> +static volatile sig_atomic_t idle_stats_update_pending = false;
> +
>  /*
>   * Time between progress updates for long-running startup operations.
>   */
> @@ -206,6 +209,18 @@ HandleStartupProcInterrupts(void)
>   /* Perform logging of memory contexts of this process */
>   if (LogMemoryContextPending)
>   ProcessLogMemoryContextInterrupt();
> +
> + if (idle_stats_update_pending)
> + {
> + /* It's time to report wal stats. */

If we want dbstats to be updated, we'll probably have to call
pgstat_report_stat() here and fix the timestamp issue Horiguchi-san
points out upthread. Perhaps you could review those changes and consider
adding those as preliminary patches before this in a set.

I think you will then need to handle the issue he mentions with partial
flushes coming from pgstat_report_stat() and remembering to try and
flush stats again in case of a partial flush. Though, perhaps we can
just pass force=true.

> + pgstat_report_wal(true);
> + idle_stats_update_pending = false;
> + }

Good that you thought to check if the timeout was already active.

> + else if (!get_timeout_active(IDLE_STATS_UPDATE_TIMEOUT))
> + {
> + /* Set the next timeout. */
> + enable_idle_stats_update_timeout();
> + }
>  }
>  
>  
> @@ -385,3 +400,22 @@ has_startup_progress_timeout_expired(long *secs, int 
> *usecs)
>  
>   return true;
>  }
> +
> +/* Set a flag indicating that it's time to flush wal stats. */
> +void
> +idle_stats_update_timeout_handler(void)
> +{
> + idle_stats_update_pending = true;

Is WakeupRecovery() needed when the timer goes off and the startup
process is waiting on a latch? Does this happen in
WaitForWalToBecomeAvailable()?

> + WakeupRecovery();
> +}
> +
> +/* Enable the timeout set for wal stat 

Re: base backup vs. concurrent truncation

2023-05-08 Thread Andres Freund
Hi,

On 2023-04-25 10:28:58 -0700, Andres Freund wrote:
> On 2023-04-25 11:42:43 -0400, Robert Haas wrote:
> > On Mon, Apr 24, 2023 at 8:03 PM Andres Freund  wrote:
> > > What we've discussed somewhere in the past is to always truncate N+1 when
> > > creating the first page in N. I.e. if we extend into 23456.1, we truncate
> > > 23456.2 to 0 blocks.  As far as I can tell, that'd solve this issue?
> > 
> > Yeah, although leaving 23456.2 forever unless and until that happens
> > doesn't sound amazing.
> 
> It isn't - but the alternatives aren't great either. It's not that easy to hit
> this scenario, so I think something along these lines is more palatable than
> adding a pass through the entire data directory.

> I think eventually we'll have to make the WAL logging bulletproof enough
> against this to avoid the risk of it. I think that is possible.


I think we should extend my proposal above with improved WAL logging.

Right now the replay of XLOG_SMGR_TRUNCATE records uses the normal
smgrtruncate() path - which essentially relies on smgrnblocks() to determine
the relation size, which in turn iterates over the segments until it finds one
< SEGSIZE.

That's fine during normal running, where we are consistent. But it's bogus
when we're not consistent - in case of a concurrent truncation, the base
backup might have missed intermediate segments, while copying later segments.

We should fix this by including not just the "target" length in the WAL
record, but also the "current" length. Then during WAL replay of such records
we'd not look at the files currently present, we'd just stupidly truncate all
the segments mentioned in the range.
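
Concretely, that would mean teaching the truncate record to carry both
lengths, roughly like this (a sketch; only current_nblocks is new, the other
fields are the existing ones as I recall them):

    typedef struct xl_smgr_truncate
    {
        BlockNumber blkno;          /* target length, as today */
        BlockNumber current_nblocks;    /* length observed on the primary, so
                                     * replay can truncate every segment in
                                     * the range without trusting whatever
                                     * files happen to exist locally */
        RelFileLocator rlocator;
        int         flags;
    } xl_smgr_truncate;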


I think we ought to do the same for mdunlinkfork().



> I suspect we would need to prevent checkpoints from happening in the wrong
> moment, if we were to go down that route.
> 
> I guess that eventually we'll need to polish the infrastructure for
> determining restartpointsm so that delayChkptFlags doesn't actually prevent
> checkpoints, just moves the restart to a safe LSN.  Although I guess that
> truncations aren't frequent enough (compared to transaction commits), for that
> to be required "now".
> 

Unfortunately the current approach of smgrtruncate records is quite racy with
checkpoints. You can end up with a sequence of something like

1) SMGRTRUNCATE record
2) CHECKPOINT;
3) Truncating of segment files

if you crash anywhere in 3), we don't replay the SMGRTRUNCATE record. It's not
a great solution, but I suspect we'll just have to set delayChkptFlags during
truncations to prevent this.
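
Roughly along these lines, using the existing delay-checkpoint machinery
(a sketch of the idea, not a complete fix):

    /*
     * Sketch: keep a checkpoint from landing between the WAL record and
     * the filesystem truncation (compare RelationTruncate()).
     */
    MyProc->delayChkptFlags |= DELAY_CHKPT_START;

    /* 1. emit and flush the XLOG_SMGR_TRUNCATE record (as today) */
    /* 2. truncate the segment files on disk (as today) */

    MyProc->delayChkptFlags &= ~DELAY_CHKPT_START;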


I also think that we need to fix the order in which mdunlink() operates. It
seems *quite* bogus that we unlink in "forward" segment order, rather than in
reverse order (like mdtruncate()). If we crash after unlinking segment.N,
we'll not redo unlinking the later segments after a crash. Nor in WAL
replay. While the improved WAL logging would address this as well, it still
seems pointlessly risky to iterate forward, rather than backward.
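
A sketch of the reverse-order variant (simplified; 'path' and 'last_segno'
are assumed to be the relation's base path and its highest existing segment
number):

    /*
     * Unlink higher-numbered segments first, so a crash cannot leave a
     * hole with later segments still present on disk.
     */
    for (int segno = last_segno; segno >= 1; segno--)
    {
        char       *segpath = psprintf("%s.%d", path, segno);

        if (unlink(segpath) < 0 && errno != ENOENT)
            ereport(WARNING,
                    (errcode_for_file_access(),
                     errmsg("could not remove file \"%s\": %m", segpath)));
        pfree(segpath);
    }
    /* then deal with the base segment itself, as today */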


Even with those changes, I think we might still need something like the
"always truncate the next segment" bit I described in my prior email though.

Greetings,

Andres Freund




Re: base backup vs. concurrent truncation

2023-05-08 Thread Andres Freund
Hi,

On 2023-05-08 08:57:08 -0400, Robert Haas wrote:
> On Mon, May 1, 2023 at 12:54 PM Aleksander Alekseev
>  wrote:
> > So I'm still unable to reproduce the described scenario, at least on PG16.
> 
> Well, that proves that either (1) the scenario that I described is
> impossible for some unknown reason or (2) something is wrong with your
> test scenario. I bet it's (2), but if it's (1), it would be nice to
> know what the reason is. One can't feel good about code that appears
> on the surface to be broken even if one knows that some unknown
> magical thing is preventing disaster.

It seems pretty easy to create disconnected segments. You don't even need a
basebackup for it.


To make it easier, I rebuilt with segsize_blocks=16. This isn't required, it
just makes it a lot cheaper to play around. To no one's surprise, I'm not a
patient person...


Started server with autovacuum=off.

DROP TABLE IF EXISTS large;
CREATE TABLE large AS SELECT generate_series(1, 10);
SELECT current_setting('data_directory') || '/' || 
pg_relation_filepath('large');

ls -l /srv/dev/pgdev-dev/base/5/24585*
shows lots of segments.

attach gdb, set breakpoint on truncate.

DROP TABLE large;

breakpoint will fire. Continue once.

In concurrent session, trigger checkpoint. Due to the checkpoint we'll not
replay any WAL record. And the checkpoint will unlink the first segment.

Kill the server.

After crash recovery, you end up with all but the first segment still
present. As the first segment doesn't exist anymore, nothing prevents that oid
from being recycled in the future. Once it is recycled and the first segment
grows large enough, the later segments will suddenly re-appear.


It's not quite so trivial to reproduce issues with partial truncations /
concurrent base backups. The problem is that it's hard to guarantee the
iteration order of the base backup process.  You'd just need to write a manual
base backup script though.

Consider a script mimicking the filesystem returning directory entries in
"reverse name order". Recipe includes two sessions. One (BB) doing a base
backup, the other (DT) running VACUUM making the table shorter.

BB: Copy <relfilenode>.2
BB: Copy <relfilenode>.1
DT: Truncate relation to < SEGSIZE
BB: Copy <relfilenode>


The replay of the smgrtruncate record will determine the relation size to
figure out what segments to remove. Because <relfilenode> is < SEGSIZE it'll
only truncate <relfilenode>, not <relfilenode>.N.  And boom, a disconnected
segment.

(I'll post a separate email about an evolved proposal about fixing this set of
issues)

Greetings,

Andres Freund




Re: [PATCH] Add native windows on arm64 support

2023-05-08 Thread Tom Lane
Andres Freund  writes:
> I don't really have feelings either way - but haven't we gone further and even
> backpatched things like spinlock support for new arches in the past?

Mmmm ... don't really think those cases were comparable.  We weren't
adding support for a whole new OS.  Now, you might argue that Windows
on arm64 will be just like Windows on x86_64, but I think the jury
is still out on that.  Microsoft was so Intel-only for so many years
that I bet they've had to change quite a bit to make it go on ARM.

Also, the cases of back-patched spinlock support that I can find
in the last few years were pretty low-risk.  I'll grant that
c32fcac56 was a bit blue-sky because hardly anybody had RISC-V
at that point, but by the same token anybody relying on it at the
time would be dealing with a beta-grade OS too.  On the other hand,
1c72d82c2 was immediately testable in the buildfarm, and f3bd00c01
was importing code already verified by our OpenBSD packagers.

As I said upthread, this seems like something to put in at the
beginning of a dev cycle, not post-feature-freeze.

regards, tom lane




Re: issue with meson builds on msys2

2023-05-08 Thread Andres Freund
Hi,

On 2023-05-05 07:08:39 -0400, Andrew Dunstan wrote:
> On 2023-05-04 Th 19:54, Andres Freund wrote:
> > Hm. I can't reproduce this in my test win10 VM, unfortunately. What OS / OS
> > version is the host? Any chance to get systeminfo.exe output or something 
> > like
> > that?
> 
> 
> Its a Windows Server 2019 (v 1809) instance running on AWS.

Hm. When I hit the python issue I also couldn't repro it on windows 10. Cirrus
was also using Windows Server 2019...


> > I think we ought to do something here. If newer environments cause failures
> > like this, it seems likely that this will spread to more and more 
> > applications
> > over time...
> > 
> 
> Just to reassure myself I have not been hallucinating, I repeated the test.
> 
> 
> pgrunner@EC2AMAZ-GCB871B UCRT64 ~/bf/root/HEAD/inst
> $ /usr/bin/perl -e 'system(qq{"bin/pg_ctl" -D data-C -w -l logfile start >
> startlog 2>&1}) ; print $? ? "BANG: $?\n" : "OK\n";'
> OK
> 
> pgrunner@EC2AMAZ-GCB871B UCRT64 ~/bf/root/HEAD/inst
> $ /usr/bin/perl -e 'system(qq{"bin/pg_ctl" -D data-C -w -l logfile stop >
> stoplog 2>&1}) ; print $? ? "BANG: $?\n" : "OK\n";'
> BANG: 33280

Oh, so it only happens when stopping, never when starting? That's
interesting...


> If you want to play I can arrange access.

That'd be very helpful.

Greetings,

Andres Freund




Re: pg_stat_io for the startup process

2023-05-08 Thread Melanie Plageman
I've reviewed both your patch versions and responded to the ideas in
both of them and the associated emails below.

On Wed, Apr 26, 2023 at 5:47 AM Kyotaro Horiguchi
 wrote:
>
> At Tue, 25 Apr 2023 16:04:23 -0700, Andres Freund  wrote 
> in
> > key those stats by oid. However, it *does* report the read/write time. But
> > only at process exit, of course.  The weird part is that the startup process
> > does *NOT* increase pg_stat_database.blks_read/blks_hit, because instead of
> > basing those on pgBufferUsage.shared_blks_read etc, we compute them based on
> > the relation level stats. pgBufferUsage is just used for EXPLAIN.  This 
> > isn't
> > recent, afaict.
>
> I see four issues here.
>
> 1. The current database stats omits buffer fetches that don't
>   originate from a relation.
>
> In this case pgstat_relation can't work since recovery isn't conscious
> of relids.  We might be able to resolve relfilenode into a relid, but
> it may not be that simple. Fortunately we already count fetches and
> hits process-wide using pgBufferUsage, so we can use this for database
> stats.
...
> > TL;DR: Currently the startup process maintains blk_read_time, 
> > blk_write_time,
> > but doesn't maintain blks_read, blks_hit - which doesn't make sense.
>
> As a result, the attached patch, which is meant for discussion, allows
> pg_stat_database to show fetches and reads by the startup process as
> the counts for the database with id 0.

I would put this in its own patch in a patchset. Of course it relies on
having pgstat_report_stat() called at appropriate times by the startup
process, but having pg_stat_database show read/hit counts is a separate
issue from having pg_stat_io do so. I'm not suggesting we do this, but
you could argue that, if we fix the startup process stats reporting,
pg_stat_database not showing reads and hits for the startup process is a
bug that also exists in 15.

Not directly related, but I do not get why the existing stats counters
for pg_stat_database count "fetches" and "hits" and then use subtraction
to get the number of reads. I find it confusing, and it seems like it
could lead to subtle inconsistencies compared with counting reads closer
to where they actually happen.

> 2. Even if we wanted to report stats for the startup process,
>   pgstat_report_stats wouldn't permit it since transaction-end
>   timestamp doesn't advance.
>
> I'm not certain if it's the correct approach, but perhaps we could use
> GetCurrentTimestamp() instead of GetCurrentTransactionStopTimestamp()
> specifically for the startup process.

In theory, since all of the approaches proposed in this thread would
exercise rigid control over how often we flush stats in the startup
process, I think it is okay to use GetCurrentTimestamp() when
pgstat_report_stat() is called by the startup process (i.e. we don't
have to worry about overhead of doing it). But looking at it implemented
in the patch made me feel unsettled for some reason.
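
For concreteness, the kind of special case I mean is roughly this (just a
sketch; AmStartupProcess() stands in for however we end up detecting that
pgstat_report_stat() is being run by the startup process):

	TimestampTz	now;

	/*
	 * The startup process has no transaction stop timestamp to reuse, so
	 * fall back to asking for the current time.
	 */
	if (AmStartupProcess())
		now = GetCurrentTimestamp();
	else
		now = GetCurrentTransactionStopTimestamp();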

> 3. When should we call pgstat_report_stats on the startup process?
>
> During recovery, I think we can call pgstat_report_stats() (or a
> subset of it) right before invoking WaitLatch and at segment
> boundaries.

I see in the patch you call pgstat_report_stat() in XLogPageRead(). Will
this only be called on segment boundaries?

> 4. In the existing ReadBuffer_common, there's an inconsistency in
> counting hits and reads between pgstat_io and pgBufferUsage.
>
> The difference comes from the case of RBM_ZERO pages. We should simply
> align them.

I would definitely make this a separate patch and probably a separate
thread. It isn't related to the startup process and is worth a separate
discussion.

On Thu, Apr 27, 2023 at 10:43 PM Kyotaro Horiguchi
 wrote:
>
> At Fri, 28 Apr 2023 11:15:51 +0900 (JST), Kyotaro Horiguchi 
>  wrote in
> > At Thu, 27 Apr 2023 17:30:40 -0400, Melanie Plageman 
> >  wrote in
> > > After a quick example implementation of this, I found that it seemed to
> > > try and flush the stats less often on an idle standby (good) than using
> > > enable_timeout_every().
> >
> > Just rearming with the full-interval will work that way. Our existing
> > strategy for this is seen in PostgresMain().
> >
> >stats_timeout = pgstat_report_stat(false);
> >if (stats_timeout > 0)
> >{
> >   if (!get_timeout_active(BLAH_TIMEOUT))
> >   enable_timeout_after(BLAH_TIMEOUT, stats_timeout);
> >}
> >else
> >{
> >if (get_timeout_active(BLAH_TIMEOUT))
> >disable_timeout(BLAH_TIMEOUT, false);
> >}
> >WaitLatch();
>
> In my example, I left out idle-time flushing, but I realized we don't
> need the timeout mechanism here since we're already managing it. So
> the following should work (assuming the timestamp updates with
> GetCurrentTimestamp() in my last patch).
>
> @@ -3889,13 +3900,23 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool 
> randAccess,
> /* Update pg_stat_recovery_prefetch 

Re: Add LZ4 compression in pg_dump

2023-05-08 Thread gkokolatos





--- Original Message ---
On Monday, May 8th, 2023 at 8:20 PM, Tomas Vondra 
 wrote:


> 
> 
> 
> 
> On 5/8/23 18:19, gkokola...@pm.me wrote:
> 
> > --- Original Message ---
> > On Monday, May 8th, 2023 at 3:16 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> > 
> > > I wrote:
> > > 
> > > > Michael Paquier mich...@paquier.xyz writes:
> > > > 
> > > > > While testing this patch, I have triggered an error pointing out that
> > > > > the decompression path of LZ4 is broken for table data. I can
> > > > > reproduce that with a dump of the regression database, as of:
> > > > > make installcheck
> > > > > pg_dump --format=d --file=dump_lz4 --compress=lz4 regression
> > > 
> > > > Ugh. Reproduced here ... so we need an open item for this.
> > > 
> > > BTW, it seems to work with --format=c.
> > 
> > Thank you for the extra tests. It seems that a gap exists in the test
> > coverage. Please find a patch attached that addresses the issue
> > and attempts to provide tests for it.
> 
> 
> Seems I'm getting messages with a delay - this is mostly the same fix I
> ended up with, not realizing you already posted a fix.

Thank you very much for looking.

> I don't think we need the local "in" variable - the pointer parameter is
> local in the function, so we can modify it directly (with a cast).
> WriteDataToArchiveLZ4 does it that way too.

Sure, patch updated.
 
> The tests are definitely a good idea.

Thank you.

> I wonder if we should add a
> comment to DEFAULT_IO_BUFFER_SIZE mentioning that if we choose to
> increase the value in the future, we need to tweak the tests too to use
> more data in order to exercise the buffering etc. Maybe it's obvious?
> 

You are right. Added a comment both in the header and in the test.

I hope v2 gets closer to closing the open item for this.

Cheers,
//Georgios


> 
> regards
> 
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From 89e7066d6c3c6a7eeb147c3f2d345c3046a4d155 Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Mon, 8 May 2023 19:48:03 +
Subject: [PATCH v2] Advance input pointer when LZ4 compressing data

LZ4File_write() did not advance the input pointer on subsequent invocations of
LZ4F_compressUpdate(). As a result the generated compressed output would be a
compressed version of the same input chunk.

WriteDataToArchiveLZ4() which is also using LZ4F_compressUpdate() did not suffer
from this omission. Tests failed to catch this error because all of their input
would comfortably fit within the same input chunk. Tests have been added to
provide adequate coverage.
---
 src/bin/pg_dump/compress_io.h|  7 -
 src/bin/pg_dump/compress_lz4.c   |  2 ++
 src/bin/pg_dump/t/002_pg_dump.pl | 46 
 3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_dump/compress_io.h b/src/bin/pg_dump/compress_io.h
index fd8752db0d..e8efc57f1a 100644
--- a/src/bin/pg_dump/compress_io.h
+++ b/src/bin/pg_dump/compress_io.h
@@ -17,7 +17,12 @@
 
 #include "pg_backup_archiver.h"
 
-/* Default size used for IO buffers */
+/*
+ * Default size used for IO buffers
+ *
+ * When altering this value it might be useful to verify that the relevant tests
+ * cases are meaningfully updated to provide coverage.
+ */
 #define DEFAULT_IO_BUFFER_SIZE	4096
 
 extern char *supports_compression(const pg_compress_specification compression_spec);
diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index f97b7550d1..b869780c0b 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -588,6 +588,8 @@ LZ4Stream_write(const void *ptr, size_t size, CompressFileHandle *CFH)
 			errno = (errno) ? errno : ENOSPC;
 			return false;
 		}
+
+		ptr = ((const char *) ptr) + chunk;
 	}
 
 	return true;
diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index 93e24d5145..d66f3b42ea 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -3108,6 +3108,52 @@ my %tests = (
 		},
 	},
 
+	'CREATE TABLE test_compression_method' => {
+		create_order => 110,
+		create_sql   => 'CREATE TABLE dump_test.test_compression_method (
+		   col1 text
+	   );',
+		regexp => qr/^
+			\QCREATE TABLE dump_test.test_compression_method (\E\n
+			\s+\Qcol1 text\E\n
+			\Q);\E
+			/xm,
+		like => {
+			%full_runs,
+			%dump_test_schema_runs,
+			section_pre_data => 1,
+		},
+		unlike => {
+			exclude_dump_test_schema => 1,
+			only_dump_measurement=> 1,
+		},
+	},
+
+	# Insert enough data to surpass DEFAULT_IO_BUFFER_SIZE during
+	# (de)compression operations
+	'COPY test_compression_method' => {
+		create_order => 111,
+		create_sql   => 'INSERT INTO dump_test.test_compression_method (col1) '
+		  . 'SELECT string_agg(a::text, \'\') FROM generate_series(1,4096) a;',
+		regexp => qr/^
+			\QCOPY dump_test.test_compression_method (col1) FROM stdin;\E
+			\n(?:\d{15277}\n){1}\\\.\n
+			/xm,
+		like => {

Re: [PATCH] Add native windows on arm64 support

2023-05-08 Thread Andres Freund
Hi,

On 2023-05-06 00:35:40 -0400, Tom Lane wrote:
> Michael Paquier  writes:
> > Seeing how simple this has become, I would be really tempted to get
> > that applied, especially if there's a buildfarm machine able to follow
> > up..  On the one hand, we are in a stability period for v16, so it may
> > not be the best moment ever.  On the other hand, waiting one more year
> > sounds like a waste for a patch that just adds new code paths.
> 
> Indeed, there's not much in this patch ... but it's impossible to see
> how "run on an entirely new platform" isn't a new feature.  Moreover,
> it's a platform that very few of us will have any ability to support
> or debug for.  I can't see how it's a good idea to shove this in
> post-feature-freeze.  Let's plan to push it in when v17 opens.

I don't really have feelings either way - but haven't we gone further and even
backpatched things like spinlock support for new arches in the past?

Greetings,

Andres Freund




Re: benchmark results comparing versions 15.2 and 16

2023-05-08 Thread Andres Freund
Hi,

On 2023-05-08 16:00:01 +0300, Alexander Lakhin wrote:
> This difference is confirmed by multiple test runs. `git bisect` for this
> regression pointed at f193883fc.

I can reproduce a significant regression due to f193883fc of a workload just
running
SELECT CURRENT_TIMESTAMP;

A single session running it on my workstation via pgbench -Mprepared gets
before:
tps = 89359.128359 (without initial connection time)
after:
tps = 83843.585152 (without initial connection time)

Obviously this is an extreme workload, but that nevertheless seems too large
to just accept...


Michael, the commit message notes that there were no measured performance
regression - yet I see one in a trivial test. What were you measuring?


I'm a bit surprised by the magnitude of the regression, but it's not
surprising that there is a performance effect. You're replacing something that
doesn't go through the whole generic function rigamarole, and replacing it with
something that does...

Looking at two perf profiles, the biggest noticable difference is

Before:

-5.51% 0.13%  postgres  postgres  [.] ExecInitResult
   - 5.38% ExecInitResult
  + 2.29% ExecInitResultTupleSlotTL
  - 2.22% ExecAssignProjectionInfo
 - 2.19% ExecBuildProjectionInfo
  0.47% ExecReadyInterpretedExpr
- 0.43% ExecInitExprRec
   - 0.10% palloc
AllocSetAlloc.localalias (inlined)
+ 0.32% expression_tree_walker_impl.localalias (inlined)
+ 0.28% get_typlen
  0.09% ExecPushExprSlots
+ 0.06% MemoryContextAllocZeroAligned
+ 0.04% MemoryContextAllocZeroAligned
  0.02% exprType.localalias (inlined)
  + 0.41% ExecAssignExprContext
  + 0.35% MemoryContextAllocZeroAligned
0.11% ExecInitQual.localalias (inlined)
   + 0.11% _start
   + 0.02% 0x55b89c764d7f

After:

-6.57% 0.17%  postgres  postgres  [.] ExecInitResult
   - 6.40% ExecInitResult
  - 3.00% ExecAssignProjectionInfo
 - ExecBuildProjectionInfo
- 0.91% ExecInitExprRec
   - 0.65% ExecInitFunc
0.23% fmgr_info_cxt_security
  + 0.18% palloc0
  + 0.07% object_aclcheck
0.04% fmgr_info
 0.05% check_stack_depth
   + 0.05% palloc
+ 0.58% expression_tree_walker_impl.localalias (inlined)
+ 0.55% get_typlen
  0.37% ExecReadyInterpretedExpr
+ 0.11% MemoryContextAllocZeroAligned
  0.09% ExecPushExprSlots
  0.04% exprType.localalias (inlined)
  + 2.77% ExecInitResultTupleSlotTL
  + 0.50% ExecAssignExprContext
  + 0.09% MemoryContextAllocZeroAligned
0.05% ExecInitQual.localalias (inlined)
   + 0.10% _start

I.e. we spend more time building the expression state for expression
evaluation, because we now go through the generic ExecInitFunc(), instead of
something dedicated. We also now need to do permission checking etc.


I don't think that's the entirety of the regression...

Greetings,

Andres Freund




Re: Add LZ4 compression in pg_dump

2023-05-08 Thread Tomas Vondra



On 5/8/23 18:19, gkokola...@pm.me wrote:
> 
> 
> 
> 
> 
> --- Original Message ---
> On Monday, May 8th, 2023 at 3:16 AM, Tom Lane  wrote:
> 
> 
>>
>>
>> I wrote:
>>
>>> Michael Paquier mich...@paquier.xyz writes:
>>>
 While testing this patch, I have triggered an error pointing out that
 the decompression path of LZ4 is broken for table data. I can
 reproduce that with a dump of the regression database, as of:
 make installcheck
 pg_dump --format=d --file=dump_lz4 --compress=lz4 regression
>>
>>> Ugh. Reproduced here ... so we need an open item for this.
>>
>>
>> BTW, it seems to work with --format=c.
>>
> 
> Thank you for the extra tests. It seems that a gap exists in the test
> coverage. Please find a patch attached that addresses the issue
> and attempts to provide tests for it.
> 

Seems I'm getting messages with a delay - this is mostly the same fix I
ended up with, not realizing you already posted a fix.

I don't think we need the local "in" variable - the pointer parameter is
local in the function, so we can modify it directly (with a cast).
WriteDataToArchiveLZ4 does it that way too.

The tests are definitely a good idea. I wonder if we should add a
comment to DEFAULT_IO_BUFFER_SIZE mentioning that if we choose to
increase the value in the future, we need to tweak the tests too to use
more data in order to exercise the buffering etc. Maybe it's obvious?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add LZ4 compression in pg_dump

2023-05-08 Thread Tomas Vondra
On 5/8/23 03:16, Tom Lane wrote:
> I wrote:
>> Michael Paquier  writes:
>>> While testing this patch, I have triggered an error pointing out that
>>> the decompression path of LZ4 is broken for table data.  I can
>>> reproduce that with a dump of the regression database, as of:
>>> make installcheck
>>> pg_dump --format=d --file=dump_lz4 --compress=lz4 regression
> 
>> Ugh.  Reproduced here ... so we need an open item for this.
> 
> BTW, it seems to work with --format=c.
> 

The LZ4Stream_write() forgot to move the pointer to the next chunk, so
it was happily compressing the initial chunk over and over. A bit
embarrassing oversight :-(

The custom format calls WriteDataToArchiveLZ4(), which was correct.
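
For the archives, the pattern the fix restores is the usual chunked-compression
loop, roughly like this (a simplified sketch with a made-up function name, not
the real LZ4Stream_write() with its lazy init, state handling and error
reporting; it assumes outbuf was sized with
LZ4F_compressBound(DEFAULT_IO_BUFFER_SIZE, ...)):

static bool
compress_in_chunks(LZ4F_compressionContext_t ctx, void *outbuf, size_t outlen,
				   const void *ptr, size_t size, FILE *fp)
{
	size_t		remaining = size;

	while (remaining > 0)
	{
		size_t		chunk = Min(remaining, DEFAULT_IO_BUFFER_SIZE);
		size_t		status;

		status = LZ4F_compressUpdate(ctx, outbuf, outlen, ptr, chunk, NULL);
		if (LZ4F_isError(status))
			return false;
		if (fwrite(outbuf, 1, status, fp) != status)
			return false;

		remaining -= chunk;
		/* this is the advance LZ4Stream_write() was missing */
		ptr = ((const char *) ptr) + chunk;
	}

	return true;
}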

The attached patch fixes this for me.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index 423e1b7976f..43c4b9187ef 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -584,6 +584,8 @@ LZ4Stream_write(const void *ptr, size_t size, CompressFileHandle *CFH)
 			errno = (errno) ? errno : ENOSPC;
 			return false;
 		}
+
+		ptr = ((char *) ptr) + chunk;
 	}
 
 	return true;


Re: Improve list manipulation in several places

2023-05-08 Thread Ranier Vilela
On Mon, May 8, 2023 at 2:26 PM, Alvaro Herrera wrote:

> On 2023-May-08, Peter Eisentraut wrote:
>
> > On 23.04.23 08:42, Richard Guo wrote:
> > > Thanks for the suggestion.  I've split the patch into two as attached.
> > > 0001 is just a minor simplification by replacing
> lfirst(list_head(list))
> > > with linitial(list).  0002 introduces new functions to reduce the
> > > movement of list elements in several places so as to gain performance
> > > improvement and benefit future callers.
> >
> > These look sensible to me.  If you could show some numbers that support
> the
> > claim that there is a performance advantage, it would be even more
> > convincing.
>
> 0001 looks fine.
>
> The problem I see is that each of these new functions has a single
> caller, and the only one that looks like it could have a performance
> advantage is list_copy_move_nth_to_head() (which is the weirdest of the
> lot).  I'm inclined not to have any of these single-use functions unless
> a performance case can be made for them.
>
I think you missed list_nth_xid; it makes perfect sense to exist.

regards,
Ranier Vilela


Re: Improve list manipulation in several places

2023-05-08 Thread Alvaro Herrera
On 2023-May-08, Peter Eisentraut wrote:

> On 23.04.23 08:42, Richard Guo wrote:
> > Thanks for the suggestion.  I've split the patch into two as attached.
> > 0001 is just a minor simplification by replacing lfirst(list_head(list))
> > with linitial(list).  0002 introduces new functions to reduce the
> > movement of list elements in several places so as to gain performance
> > improvement and benefit future callers.
> 
> These look sensible to me.  If you could show some numbers that support the
> claim that there is a performance advantage, it would be even more
> convincing.

0001 looks fine.

The problem I see is that each of these new functions has a single
caller, and the only one that looks like it could have a performance
advantage is list_copy_move_nth_to_head() (which is the weirdest of the
lot).  I'm inclined not to have any of these single-use functions unless
a performance case can be made for them.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Tab completion for CREATE SCHEMA AUTHORIZATION

2023-05-08 Thread Dagfinn Ilmari Mannsåker
Michael Paquier  writes:

> On Tue, May 02, 2023 at 01:19:49PM +0100, Dagfinn Ilmari Mannsåker wrote:
>> Dagfinn Ilmari Mannsåker  writes:
>>> This is for completing the word CREATE itself after CREATE SCHEMA
>>> [[] AUTHORIZATION] .  The things that can come after that
>>> are already handled generically earlier in the function:
>>>
>>> /* CREATE */
>>> /* complete with something you can create */
>>> else if (TailMatches("CREATE"))
>>> matches = rl_completion_matches(text, create_command_generator);
>>>
>>> create_command_generator uses the words_after_create array, which lists
>>> all the things that can be created.
>
> You are right.  I have completely forgotten that this code path would
> append everything that supports CREATE for a CREATE SCHEMA command :)
>
>> But, looking closer at the docs, only tables, views, indexes, sequences
>> and triggers can be created as part of a CREATE SCHEMA statement. Maybe
>> we should add a HeadMatches("CREATE", "SCHEMA") exception in the above?
>
> Yes, it looks like we are going to need an exception and append only
> the keywords that are supported, or we will end up recommending mostly
> things that are not accepted by the parser.

Here's an updated v3 patch with that.  While adding that, I noticed that
CREATE UNLOGGED only tab-completes TABLE and MATERIALIZED VIEW, not
SEQUENCE, so I added that (and removed MATERIALIZED VIEW when part of
CREATE SCHEMA).

- ilmari

>From 31856bf5397253da76cbce9ccb590855a44da30d Mon Sep 17 00:00:00 2001
From: tanghy 
Date: Mon, 9 Aug 2021 18:47:12 +0100
Subject: [PATCH v3] Add tab completion for CREATE SCHEMA

 - AUTHORIZATION both in addition to and after a schema name
 - list of owner roles after AUTHORIZATION
 - CREATE and GRANT after any otherwise-complete command
 - Only the allowed object types after CREATE

Also adjust completion after CREATE UNLOGGED:

 - Add SEQUENCE
 - Don't suggest MATERIALIZED VIEW in CREATE TABLE
---
 src/bin/psql/tab-complete.c | 40 ++---
 1 file changed, 33 insertions(+), 7 deletions(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index bd04244969..66b3f00c1b 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1031,9 +1031,13 @@ static const SchemaQuery Query_for_trigger_of_table = {
 "   FROM pg_catalog.pg_roles "\
 "  WHERE rolname LIKE '%s'"
 
+/* add these to Query_for_list_of_roles in OWNER contexts */
+#define Keywords_for_list_of_owner_roles \
+"CURRENT_ROLE", "CURRENT_USER", "SESSION_USER"
+
 /* add these to Query_for_list_of_roles in GRANT contexts */
 #define Keywords_for_list_of_grant_roles \
-"PUBLIC", "CURRENT_ROLE", "CURRENT_USER", "SESSION_USER"
+Keywords_for_list_of_owner_roles, "PUBLIC"
 
 #define Query_for_all_table_constraints \
 "SELECT conname "\
@@ -1785,7 +1789,13 @@ psql_completion(const char *text, int start, int end)
 /* CREATE */
 	/* complete with something you can create */
 	else if (TailMatches("CREATE"))
-		matches = rl_completion_matches(text, create_command_generator);
+		/* only some object types can be created as part of CREATE SCHEMA */
+		if (HeadMatches("CREATE", "SCHEMA"))
+			COMPLETE_WITH("TABLE", "VIEW", "INDEX", "SEQUENCE", "TRIGGER",
+		  /* for INDEX and TABLE/SEQUENCE, respectively */
+		  "UNIQUE", "UNLOGGED");
+		else
+			matches = rl_completion_matches(text, create_command_generator);
 
 	/* complete with something you can create or replace */
 	else if (TailMatches("CREATE", "OR", "REPLACE"))
@@ -3154,6 +3164,20 @@ psql_completion(const char *text, int start, int end)
 	else if (TailMatches("AS", "ON", "SELECT|UPDATE|INSERT|DELETE", "TO"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables);
 
+/* CREATE SCHEMA [  ] [ AUTHORIZATION ] */
+	else if (Matches("CREATE", "SCHEMA"))
+		COMPLETE_WITH_QUERY_PLUS(Query_for_list_of_schemas,
+ "AUTHORIZATION");
+	else if (Matches("CREATE", "SCHEMA", "AUTHORIZATION") ||
+			 Matches("CREATE", "SCHEMA", MatchAny, "AUTHORIZATION"))
+		COMPLETE_WITH_QUERY_PLUS(Query_for_list_of_roles,
+ Keywords_for_list_of_owner_roles);
+	else if (Matches("CREATE", "SCHEMA", "AUTHORIZATION", MatchAny) ||
+			 Matches("CREATE", "SCHEMA", MatchAny, "AUTHORIZATION", MatchAny))
+		COMPLETE_WITH("CREATE", "GRANT");
+	else if (Matches("CREATE", "SCHEMA", MatchAny))
+		COMPLETE_WITH("AUTHORIZATION", "CREATE", "GRANT");
+
 /* CREATE SEQUENCE --- is allowed inside CREATE SCHEMA, so use TailMatches */
 	else if (TailMatches("CREATE", "SEQUENCE", MatchAny) ||
 			 TailMatches("CREATE", "TEMP|TEMPORARY", "SEQUENCE", MatchAny))
@@ -3185,9 +3209,13 @@ psql_completion(const char *text, int start, int end)
 	/* Complete "CREATE TEMP/TEMPORARY" with the possible temp objects */
 	else if (TailMatches("CREATE", "TEMP|TEMPORARY"))
 		COMPLETE_WITH("SEQUENCE", "TABLE", "VIEW");
-	/* Complete "CREATE UNLOGGED" with TABLE or MATVIEW */
+	/* Complete "CREATE UNLOGGED" with TABLE, SEQUENCE or MATVIEW */
 	else if 

Re: Proposal for Prototype Implementation to Enhance C/C++ Interoperability in PostgreSQL

2023-05-08 Thread Peter Eisentraut

On 08.05.23 04:38, 盏一 wrote:

 > It seems extremely specific to one particular C++ implementation

To perform a force unwind during longjmp, the _Unwind_ForcedUnwind 
function is used. This function is defined in the [Itanium C++ ABI 
Standard](https://itanium-cxx-abi.github.io/cxx-abi/abi-eh.html#base-throw), which is followed by all C++ implementations. Additionally, the glibc [nptl/unwind.c](https://elixir.bootlin.com/glibc/latest/source/nptl/unwind.c#L130) file shows that on all platforms, pthread_exit is also implemented using _Unwind_ForcedUnwind.


Furthermore, the Itanium C++ ABI specification also defines 
_Unwind_RaiseException as the entry point for all C++ exceptions thrown.


I ran your patch through Cirrus CI, and it passed on Linux but failed on 
FreeBSD, macOS, and Windows.  You should fix that if you want to 
alleviate the concerns about the portability of this approach.






Re: Add LZ4 compression in pg_dump

2023-05-08 Thread gkokolatos





--- Original Message ---
On Monday, May 8th, 2023 at 3:16 AM, Tom Lane  wrote:


> 
> 
> I wrote:
> 
> > Michael Paquier mich...@paquier.xyz writes:
> > 
> > > While testing this patch, I have triggered an error pointing out that
> > > the decompression path of LZ4 is broken for table data. I can
> > > reproduce that with a dump of the regression database, as of:
> > > make installcheck
> > > pg_dump --format=d --file=dump_lz4 --compress=lz4 regression
> 
> > Ugh. Reproduced here ... so we need an open item for this.
> 
> 
> BTW, it seems to work with --format=c.
> 

Thank you for the extra tests. It seems that a gap exists in the test
coverage. Please find a patch attached that addresses the issue
and attempts to provide tests for it.

Cheers,
//Georgios

> regards, tom lane
From 8c6c86c362820e93f066992ede6e5ca23f128807 Mon Sep 17 00:00:00 2001
From: Georgios Kokolatos 
Date: Mon, 8 May 2023 15:25:25 +
Subject: [PATCH v1] Advance input pointer when LZ4 compressing data

LZ4File_write() did not advance the input pointer on subsequent invocations of
LZ4F_compressUpdate(). As a result the generated compressed output would be a
compressed version of the same input chunk.

WriteDataToArchiveLZ4() which is also using LZ4F_compressUpdate() did not suffer
from this omission. Tests failed to catch this error because all of their input
would comfortably fit within the same input chunk. Tests have been added to
provide adequate coverage.
---
 src/bin/pg_dump/compress_lz4.c   |  5 +++-
 src/bin/pg_dump/t/002_pg_dump.pl | 44 
 2 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_dump/compress_lz4.c b/src/bin/pg_dump/compress_lz4.c
index f97b7550d1..76211c82f0 100644
--- a/src/bin/pg_dump/compress_lz4.c
+++ b/src/bin/pg_dump/compress_lz4.c
@@ -564,6 +564,7 @@ LZ4Stream_write(const void *ptr, size_t size, CompressFileHandle *CFH)
 	LZ4State   *state = (LZ4State *) CFH->private_data;
 	size_t		status;
 	int			remaining = size;
+	const void *in = ptr;
 
 	/* Lazy init */
 	if (!LZ4Stream_init(state, size, true))
@@ -576,7 +577,7 @@ LZ4Stream_write(const void *ptr, size_t size, CompressFileHandle *CFH)
 		remaining -= chunk;
 
 		status = LZ4F_compressUpdate(state->ctx, state->buffer, state->buflen,
-	 ptr, chunk, NULL);
+	 in, chunk, NULL);
 		if (LZ4F_isError(status))
 		{
 			state->errcode = status;
@@ -588,6 +589,8 @@ LZ4Stream_write(const void *ptr, size_t size, CompressFileHandle *CFH)
 			errno = (errno) ? errno : ENOSPC;
 			return false;
 		}
+
+		in = ((const char *) in) + chunk;
 	}
 
 	return true;
diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index 93e24d5145..c6b1225815 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -3108,6 +3108,50 @@ my %tests = (
 		},
 	},
 
+	'CREATE TABLE test_compression_method' => {
+		create_order => 110,
+		create_sql   => 'CREATE TABLE dump_test.test_compression_method (
+		   col1 text
+	   );',
+		regexp => qr/^
+			\QCREATE TABLE dump_test.test_compression_method (\E\n
+			\s+\Qcol1 text\E\n
+			\Q);\E
+			/xm,
+		like => {
+			%full_runs,
+			%dump_test_schema_runs,
+			section_pre_data => 1,
+		},
+		unlike => {
+			exclude_dump_test_schema => 1,
+			only_dump_measurement=> 1,
+		},
+	},
+
+	'COPY test_compression_method' => {
+		create_order => 111,
+		create_sql   => 'INSERT INTO dump_test.test_compression_method (col1) '
+		  . 'SELECT string_agg(a::text, \'\') FROM generate_series(1,4096) a;',
+		regexp => qr/^
+			\QCOPY dump_test.test_compression_method (col1) FROM stdin;\E
+			\n(?:\d{15277}\n){1}\\\.\n
+			/xm,
+		like => {
+			%full_runs,
+			data_only=> 1,
+			section_data => 1,
+			only_dump_test_schema=> 1,
+			test_schema_plus_large_objects=> 1,
+		},
+		unlike => {
+			binary_upgrade   => 1,
+			exclude_dump_test_schema => 1,
+			schema_only  => 1,
+			only_dump_measurement=> 1,
+		},
+	},
+
 	'CREATE TABLE fk_reference_test_table' => {
 		create_order => 21,
 		create_sql   => 'CREATE TABLE dump_test.fk_reference_test_table (
-- 
2.34.1



Re: Memory leak from ExecutorState context?

2023-05-08 Thread Melanie Plageman
Thanks for continuing to work on this!

On Thu, May 04, 2023 at 07:30:06PM +0200, Jehan-Guillaume de Rorthais wrote:
> On Fri, 21 Apr 2023 16:44:48 -0400 Melanie Plageman 
>  wrote:
...
> > I think the biggest change that is needed is to implement this memory
> > context usage for parallel hash join.
> 
> Indeed. 

...

> 4. accessor->read_buffer is currently allocated in accessor->context as well.
> 
>This buffer holds tuple read from the fileset. This is still a buffer, but
>not related to any file anymore...
> 
> Because of 3 and 4, and the gross context granularity of SharedTuplestore
> related-code, I'm now wondering if "fileCxt" shouldn't be renamed "bufCxt".

I think bufCxt is a potentially confusing name. The context contains
pointers to the batch files and saying those are related to buffers is
confusing. It also sounds like it could include any kind of buffer
related to the hashtable or hash join.

Perhaps we could call this memory context the "spillCxt"--indicating it
is for the memory required for spilling to permanent storage while
executing hash joins.
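
Roughly what I am picturing, as a sketch only (the "spillCxt" field name is a
placeholder of mine, not something in your patch):

	MemoryContext oldcxt;

	/* child of hashCxt, holds only the spill-file bookkeeping */
	hashtable->spillCxt = AllocSetContextCreate(hashtable->hashCxt,
												"HashSpillContext",
												ALLOCSET_DEFAULT_SIZES);

	/* the batch file arrays would then be allocated in it */
	oldcxt = MemoryContextSwitchTo(hashtable->spillCxt);
	hashtable->innerBatchFile = palloc0_array(BufFile *, nbatch);
	hashtable->outerBatchFile = palloc0_array(BufFile *, nbatch);
	MemoryContextSwitchTo(oldcxt);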

I discuss this more in my code review below.

> From c5ed2ae2c2749af4f5058b012dc5e8a9e1529127 Mon Sep 17 00:00:00 2001
> From: Jehan-Guillaume de Rorthais 
> Date: Mon, 27 Mar 2023 15:54:39 +0200
> Subject: [PATCH 2/3] Allocate hash batches related BufFile in a dedicated
>  context

> diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
> index 5fd1c5553b..a4fbf29301 100644
> --- a/src/backend/executor/nodeHash.c
> +++ b/src/backend/executor/nodeHash.c
> @@ -570,15 +574,21 @@ ExecHashTableCreate(HashState *state, List 
> *hashOperators, List *hashCollations,
>  
>   if (nbatch > 1 && hashtable->parallel_state == NULL)
>   {
> + MemoryContext oldctx;
> +
>   /*
>* allocate and initialize the file arrays in hashCxt (not 
> needed for
>* parallel case which uses shared tuplestores instead of raw 
> files)
>*/
> + oldctx = MemoryContextSwitchTo(hashtable->fileCxt);
> +
>   hashtable->innerBatchFile = palloc0_array(BufFile *, nbatch);
>   hashtable->outerBatchFile = palloc0_array(BufFile *, nbatch);
>   /* The files will not be opened until needed... */
>   /* ... but make sure we have temp tablespaces established for 
> them */

I haven't studied it closely, but I wonder if we shouldn't switch back
into the oldctx before calling PrepareTempTablespaces().
PrepareTempTablespaces() is explicit about what memory context it is
allocating in; however, I'm not sure it makes sense to call it in the
fileCxt. If you have a reason, you should add a comment and do the same
in ExecHashIncreaseNumBatches().

>   PrepareTempTablespaces();
> +
> + MemoryContextSwitchTo(oldctx);
>   }

> @@ -934,13 +943,16 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
>  hashtable, nbatch, hashtable->spaceUsed);
>  #endif
>  
> - oldcxt = MemoryContextSwitchTo(hashtable->hashCxt);
> -
>   if (hashtable->innerBatchFile == NULL)
>   {
> + MemoryContext oldcxt = 
> MemoryContextSwitchTo(hashtable->fileCxt);
> +
>   /* we had no file arrays before */
>   hashtable->innerBatchFile = palloc0_array(BufFile *, nbatch);
>   hashtable->outerBatchFile = palloc0_array(BufFile *, nbatch);
> +

As mentioned above, you should likely make ExecHashTableCreate()
consistent with this.

> + MemoryContextSwitchTo(oldcxt);
> +
>   /* time to establish the temp tablespaces, too */
>   PrepareTempTablespaces();
>   }
> @@ -951,8 +963,6 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)

I don't see a reason to call repalloc0_array() in a different context
than the initial palloc0_array().

>   hashtable->outerBatchFile = 
> repalloc0_array(hashtable->outerBatchFile, BufFile *, oldnbatch, nbatch);
>   }
>  
> - MemoryContextSwitchTo(oldcxt);
> -
>   hashtable->nbatch = nbatch;
>  
>   /*

> diff --git a/src/backend/executor/nodeHashjoin.c 
> b/src/backend/executor/nodeHashjoin.c
> index 920d1831c2..ac72fbfbb6 100644
> --- a/src/backend/executor/nodeHashjoin.c
> +++ b/src/backend/executor/nodeHashjoin.c
> @@ -485,8 +485,10 @@ ExecHashJoinImpl(PlanState *pstate, bool parallel)
>*/
>   Assert(parallel_state == NULL);
>   Assert(batchno > hashtable->curbatch);
> +
>  		ExecHashJoinSaveTuple(mintuple, hashvalue,
> -							  &hashtable->outerBatchFile[batchno]);
> +							  &hashtable->outerBatchFile[batchno],
> +

Re: Add standard collation UNICODE

2023-05-08 Thread Peter Eisentraut

On 27.04.23 13:44, Daniel Verite wrote:

This collation has an empty pg_collation.collversion column, instead
of being set to the same value as "und-x-icu" to track its version.



The original patch implements this as an INSERT in which it would be easy to
fix I guess, but in current HEAD it comes as an entry in
include/catalog/pg_collation.dat:

{ oid => '963',
   descr => 'sorts using the Unicode Collation Algorithm with default
settings',
   collname => 'unicode', collprovider => 'i', collencoding => '-1',
   colliculocale => 'und' },

Should it be converted back into an INSERT or better left
in this file and collversion being updated afterwards?


How about we do it with an UPDATE command.  We already do this for 
pg_database in a similar way.  See attached patch.
From 91f2aff04f9bf137e4ac6e7df624dde770503bfd Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 8 May 2023 17:45:46 +0200
Subject: [PATCH] initdb: Set collversion for standard collation UNICODE

Discussion: 
https://www.postgresql.org/message-id/49417853-7bdd-4b23-a4e9-04c7aff33...@manitou-mail.org
---
 src/bin/initdb/initdb.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 2c208ead01..632f6c9c72 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1695,6 +1695,13 @@ setup_description(FILE *cmdfd)
 static void
 setup_collation(FILE *cmdfd)
 {
+   /*
+* Set the collation version for collations defined in pg_collation.dat,
+* except the ones where we know that the collation behavior will never
+* change.
+*/
+	PG_CMD_PUTS("UPDATE pg_collation SET collversion = pg_collation_actual_version(oid) WHERE collname = 'unicode';\n\n");
+
/* Import all collations we can find in the operating system */
PG_CMD_PUTS("SELECT pg_import_system_collations('pg_catalog');\n\n");
 }
-- 
2.40.0



Re: Improve list manipulation in several places

2023-05-08 Thread Peter Eisentraut

On 23.04.23 08:42, Richard Guo wrote:

Thanks for the suggestion.  I've split the patch into two as attached.
0001 is just a minor simplification by replacing lfirst(list_head(list))
with linitial(list).  0002 introduces new functions to reduce the
movement of list elements in several places so as to gain performance
improvement and benefit future callers.


These look sensible to me.  If you could show some numbers that support 
the claim that there is a performance advantage, it would be even more 
convincing.






Re: [PATCH] Allow Postgres to pick an unused port to listen

2023-05-08 Thread Tom Lane
Alvaro Herrera  writes:
> This made me wonder if storing the unadorned port number is really the
> best way.  Suppose we did extend things so that we listen on different
> ports on different interfaces; how would this scheme work at all?

Yeah, the probability that that will happen someday is one of the
things bothering me about this proposal.  I'd rather change the
file format to support that first (it can be dummy for now, with
all lines showing the same port), and then document it second.

regards, tom lane




Re: WAL Insertion Lock Improvements

2023-05-08 Thread Bharath Rupireddy
On Mon, May 8, 2023 at 5:57 PM Bharath Rupireddy
 wrote:
>
> On Mon, Apr 10, 2023 at 9:38 AM Michael Paquier  wrote:
>
> > -LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
> > +LWLockUpdateVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val)
> > [...]
> > Assert(pg_atomic_read_u32(>state) & LW_VAL_EXCLUSIVE);
> >
> > -   /* Update the lock's value */
> > -   *valptr = val;
> >
> > The sensitive change is in LWLockUpdateVar().  I am not completely
> > sure to understand this removal, though.  Does that influence the case
> > where there are waiters?
>
> I'll send about this in a follow-up email to not overload this
> response with too much data.

It helps the case when there are no waiters. IOW, it updates value
without waitlist lock when there are no waiters, so no extra waitlist
lock acquisition/release just to update the value. In turn, it helps
the other backend wanting to flush the WAL looking for the new updated
value of insertingAt in WaitXLogInsertionsToFinish(), now the flushing
backend can get the new value faster.
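
To spell the fast path out, the patched LWLockUpdateVar() is shaped roughly
like this (simplified sketch; the slow path under the early return is the
same waitlist handling as on HEAD):

void
LWLockUpdateVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val)
{
	/* pg_atomic_exchange_u64() is a full barrier, so readers see the new value */
	pg_atomic_exchange_u64(valptr, val);

	/*
	 * Fast path: nobody is sleeping on this lock, so there is nothing to
	 * wake up and no reason to acquire the waitlist lock at all.
	 */
	if (!(pg_atomic_read_u32(&lock->state) & LW_FLAG_HAS_WAITERS))
		return;

	/*
	 * Slow path: take the waitlist lock, move the LW_WAIT_UNTIL_FREE waiters
	 * to a local wakeup list, release the waitlist lock, and wake them up.
	 */
	LWLockWaitListLock(lock);
	/* ... existing wakeup code elided ... */
	LWLockWaitListUnlock(lock);
}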

> > Another thing I was wondering about: how much does the fast-path used
> > in LWLockUpdateVar() influence the performance numbers? Am I right to
> > guess that it counts for most of the gain seen?
>
> I'll send about this in a follow-up email to not overload this
> response with too much data.

The fastpath exit in LWLockUpdateVar() doesn't seem to influence the
results much, see below results. However, it avoids waitlist lock
acquisition when there are no waiters.

test-case 1: -T5, WAL ~16 bytes
clients    HEAD     PATCHED with fastpath    PATCHED no fast path
1          1482     1486                     1457
2          1617     1620                     1569
4          3174     3233                     3031
8          6136     6365                     5725
16         12566    12269                    11685
32         24284    23621                    23177
64         50135    45528                    46653
128        94903    89791                    89103
256        82289    152915                   152835
512        62498    138838                   142084
768        57083    125074                   126768
1024       51308    113593                   115930
2048       41084    88764                    85110
4096       19939    42257                    43917

> > Or could it be that
> > the removal of the spin lock in
> > LWLockConflictsWithVar()/LWLockWaitForVar() the point that has the
> > highest effect?
>
> I'll send about this in a follow-up email to not overload this
> response with too much data.

Out of 3 functions that got rid of waitlist lock
LWLockConflictsWithVar/LWLockWaitForVar, LWLockUpdateVar,
LWLockReleaseClearVar, perf reports tell that the biggest gain (for
the use-cases that I've tried) is for
LWLockConflictsWithVar/LWLockWaitForVar:

test-case 1: -T5, WAL ~16 bytes
HEAD:
+   61.89% 0.05%  postgres  [.] LWLockWaitForVar
+   43.19% 0.12%  postgres  [.] LWLockConflictsWithVar
+1.62% 0.00%  postgres  [.] LWLockReleaseClearVar

PATCHED:
+   38.79% 0.11%  postgres  [.] LWLockWaitForVar
 0.40% 0.02%  postgres  [.] LWLockConflictsWithVar
+2.80% 0.00%  postgres  [.] LWLockReleaseClearVar

test-case 6: -T5, WAL 4096 bytes
HEAD:
+   29.66% 0.07%  postgres  [.] LWLockWaitForVar
+   20.94% 0.08%  postgres  [.] LWLockConflictsWithVar
 0.19% 0.03%  postgres  [.] LWLockUpdateVar

PATCHED:
+3.95% 0.08%  postgres  [.] LWLockWaitForVar
 0.19% 0.03%  postgres  [.] LWLockConflictsWithVar
+1.73% 0.00%  postgres  [.] LWLockReleaseClearVar

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Allow Postgres to pick an unused port to listen

2023-05-08 Thread Alvaro Herrera
On 2023-Apr-19, Yurii Rashkovskii wrote:

> If we consider this path, then (if we assume the format of the file is
> still to be private), we can make the port line accept multiple ports using
> a delimiter like `:` so that the next line still remains the same.

This made me wonder if storing the unadorned port number is really the
best way.  Suppose we did extend things so that we listen on different
ports on different interfaces; how would this scheme work at all?  I
suspect it would be simpler to store both the interface address and the
port, perhaps separated by :.  You would keep it to one pair per line,
so you'd get the IPv6 address/port separately from the IPv4 address, for
example.  And if you happen to have multiple addresses, you know exactly
which ones you're listening on.

To match a problem that has been discussed in the lists previously,
suppose you have listen_addresses='localhost' and the resolver does
funny things with that name (say you mess up /etc/hosts after starting).
Things would be much simpler if you knew exactly what the resolver did
at postmaster start time.

> (I consider this feature so small that it doesn't deserve such a lengthy
> discussion. However, I also get Tom's point about how we document this

You should see the discussion that led to the addition of psql's 'exit'
command sometime:
https://www.postgresql.org/message-id/flat/CALVFHFb-C_5_94hueWg6Dd0zu7TfbpT7hzsh9Zf0DEDOSaAnfA%40mail.gmail.com#949321e44856b7fa295834d6a3997ab4


-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Every machine is a smoke machine if you operate it wrong enough."
https://twitter.com/libseybieda/status/1541673325781196801




Re: benchmark results comparing versions 15.2 and 16

2023-05-08 Thread Alexander Lakhin

Hello Mark,

05.05.2023 20:45, MARK CALLAGHAN wrote:
> This is mostly a hobby project for me - my other hobby is removing invasive weeds. I am happy to answer
> questions and run more tests, but turnaround for answers won't be instant. Getting results from Linux perf
> for these tests is on my TODO list. For now I am just re-running a subset of these to get more certainty
> that the regressions are real and not noise.




It's a very interesting topic to me, too. I had developed some scripts to
measure and compare postgres`s performance using miscellaneous public
benchmarks (ycsb, tpcds, benchmarksql_tpcc, htapbench, benchbase, gdprbench,
s64da-benchmark, ...). Having compared 15.3 (56e869a09) with master
(58f5edf84) I haven't seen significant regressions except a few minor ones.
First regression observed with a simple pgbench test:
pgbench -i benchdb
pgbench -c 10 -T 300 benchdb
(with default compilation options and fsync = off)

On master I get:
tps = 10349.826645 (without initial connection time)
On 15.3:
tps = 11296.064992 (without initial connection time)

This difference is confirmed by multiple test runs. `git bisect` for this
regression pointed at f193883fc.

Best regards,
Alexander




Re: base backup vs. concurrent truncation

2023-05-08 Thread Robert Haas
On Mon, May 1, 2023 at 12:54 PM Aleksander Alekseev
 wrote:
> So I'm still unable to reproduce the described scenario, at least on PG16.

Well, that proves that either (1) the scenario that I described is
impossible for some unknown reason or (2) something is wrong with your
test scenario. I bet it's (2), but if it's (1), it would be nice to
know what the reason is. One can't feel good about code that appears
on the surface to be broken even if one knows that some unknown
magical thing is preventing disaster.

I find it pretty hard to accept that there's no problem at all here,
especially in view of the fact that Andres independently posted about
the same issue on another thread. It's pretty clear from looking at
the code that mdnblocks() can't open any segments past the first one
that isn't of the maximum possible size. It's also fairly clear that a
crash or a base backup can create such situations. And finally it's
pretty clear that having an old partial segment be rediscovered due to
the relation being re-extended would be quite bad. So how can there not
be a problem? I admit I haven't done the legwork to nail down a test
case where everything comes together just right to show user-visible
breakage, but your success in finding one where it doesn't is no proof
of anything.
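
To spell out the bit I'm relying on, mdnblocks() essentially does this
(paraphrased sketch, not the literal md.c code):

static BlockNumber
mdnblocks_sketch(SMgrRelation reln, ForkNumber forknum)
{
	BlockNumber segno = 0;

	for (;;)
	{
		MdfdVec    *v = _mdfd_openseg(reln, forknum, segno, 0);
		BlockNumber nblocks;

		/* no such segment file: the relation ends at the previous segment */
		if (v == NULL)
			return segno * ((BlockNumber) RELSEG_SIZE);

		nblocks = _mdnblocks(reln, forknum, v);

		/* a partial segment is assumed to be the last one ... */
		if (nblocks < ((BlockNumber) RELSEG_SIZE))
			return segno * ((BlockNumber) RELSEG_SIZE) + nblocks;

		/* ... so any later .N files are never even looked at */
		segno++;
	}
}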

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] Allow Postgres to pick an unused port to listen

2023-05-08 Thread Robert Haas
On Mon, Apr 24, 2023 at 10:16 AM Peter Eisentraut
 wrote:
> Right.  I'm perfectly content with just allowing port number 0 and
> leaving it at that.

That seems fine to me, too. If somebody wants to add a pg_ctl feature
to extract this or any other information from the postmaster.pid file,
that can be a separate patch. But it's not necessarily the case that
users would even prefer that interface. Some might, some might not. Or
so it seems to me, anyway.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: proposal: psql: show current user in prompt

2023-05-08 Thread Jelte Fennema
I'm very much in favor of adding a way to support reporting other GUC
variables than the current hardcoded list. This can be quite useful to
support some amount of session state in connection poolers.

Some general feedback on the patch:
1. I don't think the synchronization mechanism that you added should
be part of PQlinkParameterStatus. It seems likely for people to want
to turn on reporting for multiple GUCs in one go. Having to
synchronize for each would introduce unnecessary round trips. Maybe
you also don't care about syncing at all at this point in time.

2. Support for this message should probably require a protocol
extension. There is another recent thread that discusses about adding
message types and protocol extensions:
https://www.postgresql.org/message-id/flat/CA%2BTgmoaxfJ3whOqnxTjT-%2BHDgZYbEho7dVHHsuEU2sgRw17OEQ%40mail.gmail.com#acd99fde0c037cc6cb35d565329b6e00

3. Some tests for this new libpq API should be added in
src/test/modules/libpq_pipeline

4. s/massages/messages


Finally, I think this patch should be split into two commits:
1. adding custom GUC_REPORT protocol support+libpq API
2. using this libpq API in psql for the user prompt

If you have multiple commits (which are rebased on master), you can
very easily create multiple patch files using this command:
git format-patch master --base=master --reroll-count={version_number_here}


On Fri, 28 Apr 2023 at 07:00, Pavel Stehule  wrote:
>
>
>
> čt 27. 4. 2023 v 7:39 odesílatel Pavel Stehule  
> napsal:
>>
>> Hi
>>
>> rebased version + fix warning possibly uninitialized variable
>
>
> fix not unique function id
>
> Regards
>
> Pavel
>
>>
>> Regards
>>
>> Pavel
>>




Re: Perform streaming logical transactions by background workers and parallel apply

2023-05-08 Thread Amit Kapila
On Fri, May 5, 2023 at 9:14 AM Zhijie Hou (Fujitsu)
 wrote:
>
> On Wednesday, May 3, 2023 3:17 PM Amit Kapila  wrote:
> >
>
> Attach another patch to fix the problem that pa_shutdown will access invalid
> MyLogicalRepWorker. I personally want to avoid introducing new static 
> variable,
> so I only reorder the callback registration in this version.
>
> When testing this, I noticed a rare case where the leader can receive
> the worker termination message after the leader stops the parallel worker.
> This is unnecessary and has a risk that the leader would try to access the
> detached memory queue. This is more likely to happen and sometimes causes
> failures in regression tests after the registration reorder patch because
> the dsm is detached earlier after applying the patch.
>

I think it is only possible for the leader apply worker to try to
receive the error message from an error queue after your 0002 patch.
Because another place already detached from the queue before stopping
the parallel apply workers. So, I combined both the patches and
changed a few comments and a commit message. Let me know what you
think of the attached.

-- 
With Regards,
Amit Kapila.


v2-0001-Fix-invalid-memory-access-during-the-shutdown-of-.patch
Description: Binary data


Re: Support logical replication of DDLs

2023-05-08 Thread shveta malik
On Mon, May 8, 2023 at 3:58 PM shveta malik  wrote:
>
> On Tue, May 2, 2023 at 8:30 AM shveta malik  wrote:
> >
> > On Fri, Apr 28, 2023 at 5:11 PM Amit Kapila  wrote:
> > >
> > > Now, I think we can try to eliminate this entire ObjTree machinery and
> > > directly form the JSON blob during deparsing. We have previously also
> > > discussed this in an email chain at [1]. I think now the functionality
> > > of JSONB has also been improved and we should investigate whether it
> > > is feasible to directly use JSONB APIs to form the required blob.
> >
> > +1.
> > I will investigate this and will share my findings.
> >
>
>
> Please find the PoC patch for create-table after object-tree removal.
>

I missed mentioning that it is a combined effort by Vignesh and myself,
so let us know your feedback.

thanks
Shveta




Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-05-08 Thread John Naylor
On Fri, Apr 7, 2023 at 4:55 PM John Naylor 
wrote:

> - Fixed-size PagetableEntry's are pretty large, but the tid compression
scheme used in this thread (in addition to being complex) is not a great
fit for tidbitmap because it makes it more difficult to track per-block
metadata (see also next point). With the "combined pointer-value slots"
technique, if a page's max tid offset is 63 or less, the offsets can be
stored directly in the pointer for the exact case. The lowest bit can tag
to indicate a pointer to a single-value leaf. That would complicate
operations like union/intersection and tracking "needs recheck", but it
would reduce memory use and node-traversal in common cases.

[just getting some thoughts out there before I have something concrete]

Thinking some more, this needn't be complicated at all. We'd just need to
reserve some bits of a bitmapword for the tag, as well as flags for
"ischunk" and "recheck". The other bits can be used for offsets.
Getting/storing the offsets basically amounts to adjusting the shift by a
constant. That way, this "embeddable PTE" could serve as both "PTE embedded
in a node pointer" and also the first member of a full PTE. A full PTE is
now just an array of embedded PTEs, except only the first one has the flags
we need. That reduces the number of places that have to be different.
Storing any set of offsets all less than ~60 would save
allocation/traversal in a large number of real cases. Furthermore, that
would reduce a full PTE to 40 bytes because there would be no padding.
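
For illustration, the kind of layout I mean, with made-up names and bit
positions (nothing here is settled):

/*
 * A 64-bit "embedded PTE": one tag bit marks the value as embedded rather
 * than a node pointer, one bit is "ischunk", one is "recheck", and the
 * remaining bits are the offset bitmap, so offsets up to about 60 fit
 * without allocating a full PagetableEntry.
 */
typedef uint64 EmbeddedPTE;

#define PTE_EMBED_TAG		UINT64CONST(1)		/* lowest bit: embedded value */
#define PTE_EMBED_ISCHUNK	(UINT64CONST(1) << 1)
#define PTE_EMBED_RECHECK	(UINT64CONST(1) << 2)
#define PTE_EMBED_SHIFT		3					/* offsets start at this bit */
#define PTE_EMBED_MAX_OFF	(64 - PTE_EMBED_SHIFT - 1)

static inline bool
pte_embed_set_offset(EmbeddedPTE *pte, int offset)
{
	if (offset > PTE_EMBED_MAX_OFF)
		return false;			/* caller must fall back to a full PTE */
	*pte |= PTE_EMBED_TAG | (UINT64CONST(1) << (PTE_EMBED_SHIFT + offset));
	return true;
}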

This all assumes the key (block number) is no longer stored in the PTE,
whether embedded or not. That would mean this technique:

> - With lazy expansion and single-value leaves, the root of a radix tree
can point to a single leaf. That might get rid of the need to track
TBMStatus, since setting a single-leaf tree should be cheap.

...is not a good trade off because it requires each leaf to have the key,
and would thus reduce the utility of embedded leaves. We just need to make
sure storing a single value is not costly, and I suspect it's not.
(Currently the overhead avoided is allocating and zeroing a few kilobytes
for a hash table). If it is not, then we don't need a special case in
tidbitmap, which would be a great simplification. If it is, there are other
ways to mitigate.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: evtcache: EventTriggerCache vs Event Trigger Cache

2023-05-08 Thread Daniel Gustafsson
> On 4 May 2023, at 14:18, Daniel Gustafsson  wrote:
> 
>> On 4 May 2023, at 14:09, Tom Lane  wrote:
>> 
>> Daniel Gustafsson  writes:
>>> When reading a memory contexts log I realized that we have this:
>>> LOG:  level: 2; EventTriggerCache: 8192 total in 1 blocks; 7928 free (4 
>>> chunks); 264 used
>>> LOG:  level: 3; Event Trigger Cache: 8192 total in 1 blocks; 2616 free (0 
>>> chunks); 5576 used
>> 
>>> The reason is that BuildEventTriggerCache sets up a context 
>>> "EventTriggerCache"
>>> which house a hash named "Event Trigger Cache" which in turn creates a 
>>> context
>>> with the table name.  I think it makes sense that these share the same name,
>>> but I think it would be less confusing if they also shared the same spelling
>>> whitespace-wise.  Any reason to not rename the hash EventTriggerCache to 
>>> make
>>> the logging a tiny bit easier to read and grep?
>> 
>> Hmm, I'm kinda -1 on them having the same name visible in the
>> contexts dump --- that seems very confusing.  How about naming
>> the hash "EventTriggerCacheHash" or so?
> 
> I think the level is the indicator here, but I have no strong opinions,
> EventTriggerCacheHash is fine by me.

The attached trivial diff does that, parking this in the next CF.

--
Daniel Gustafsson



evtcachehash.diff
Description: Binary data


Re: [DOC] Update ALTER SUBSCRIPTION documentation

2023-05-08 Thread Masahiko Sawada
On Mon, May 8, 2023 at 12:07 PM Amit Kapila  wrote:
>
> On Fri, May 5, 2023 at 6:47 PM Robert Sjöblom  
> wrote:
> >
> > We have recently used the PostgreSQL documentation when setting up our
> > logical replication. We noticed there was a step missing in the
> > documentation on how to drop a logical replication subscription with a
> > replication slot attached.
> >
> > We clarify the documentation to include prerequisites for running the
> > DROP SUBSCRIPTION command. Please see attached patch.
> >
>
> Shouldn't we also change the following errhint in the code as well?
> ReportSlotConnectionError()
> {
> ...
> ereport(ERROR,
> (errcode(ERRCODE_CONNECTION_FAILURE),
> errmsg("could not connect to publisher when attempting to drop
> replication slot \"%s\": %s",
> slotname, err),
> /* translator: %s is an SQL ALTER command */
> errhint("Use %s to disassociate the subscription from the slot.",
> "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));
> ...
> }

Yeah, if the subscription is enabled, it might be helpful for users if
the error hint message says something like:

Use ALTER SUBSCRIPTION ... SET (slot_name = NONE) to disassociate the
subscription from the slot after disabling the subscription.
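
i.e., keeping the translator pattern of the existing code, something like:

ereport(ERROR,
		(errcode(ERRCODE_CONNECTION_FAILURE),
		 errmsg("could not connect to publisher when attempting to drop replication slot \"%s\": %s",
				slotname, err),
		 /* translator: %s is an SQL ALTER command */
		 errhint("Use %s to disassociate the subscription from the slot after disabling the subscription.",
				 "ALTER SUBSCRIPTION ... SET (slot_name = NONE)")));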

Apart from the documentation change, given that setting slot_name =
NONE always requires the subscription to be disabled beforehand,
does it make sense to change ALTER SUBSCRIPTION SET so that we disable
the subscription when setting slot_name = NONE? Setting slot_name to a
valid slot name doesn't enable the subscription, though.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Add two missing tests in 035_standby_logical_decoding.pl

2023-05-08 Thread Drouvot, Bertrand

Hi,

On 5/8/23 4:42 AM, Amit Kapila wrote:

On Sat, May 6, 2023 at 9:33 PM Drouvot, Bertrand
 wrote:


On 5/6/23 3:28 PM, Amit Kapila wrote:

On Sat, May 6, 2023 at 1:52 PM Drouvot, Bertrand
 wrote:



Next steps:
=
1. The first thing is we should verify this theory by adding some LOG
in KeepLogSeg() to see if the _logSegNo is reduced due to the value
returned by XLogGetReplicationSlotMinimumLSN().


Yeah, will do that early next week.


It's done with the following changes:

https://github.com/bdrouvot/postgres/commit/79e1bd9ab429a22f876b9364eb8a0da2dacaaef7#diff-c1cb3ab2a19606390c1a7ed00ffe5a45531702ca5faf999d401c548f8951c65bL7454-R7514

With that in place, there is one failing test here: 
https://cirrus-ci.com/task/5173216310722560

Where we can see:

2023-05-08 07:42:56.301 UTC [18038][checkpointer] LOCATION:  UpdateMinRecoveryPoint, xlog.c:2500
2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOG:  0: CreateRestartPoint: After XLByteToSeg RedoRecPtr is 0/498, _logSegNo is 4
2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOCATION:  CreateRestartPoint, xlog.c:7271
2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOG:  0: KeepLogSeg: segno changed to 4 due to XLByteToSeg
2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOCATION:  KeepLogSeg, xlog.c:7473
2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOG:  0: KeepLogSeg: segno changed to 3 due to XLogGetReplicationSlotMinimumLSN()
2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOCATION:  KeepLogSeg, xlog.c:7483
2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOG:  0: CreateRestartPoint: After KeepLogSeg RedoRecPtr is 0/498, endptr is 0/4000148, _logSegNo is 3
2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOCATION:  CreateRestartPoint, xlog.c:7284
2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOG:  0: BDT1 about to call RemoveOldXlogFiles in CreateRestartPoint
2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOCATION:  CreateRestartPoint, xlog.c:7313
2023-05-08 07:42:56.302 UTC [18038][checkpointer] LOG:  0: attempting to remove WAL segments older than log file 0002

So the suspicion about XLogGetReplicationSlotMinimumLSN() was correct 
(_logSegNo moved from
4 to 3 due to XLogGetReplicationSlotMinimumLSN()).
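
To make that mechanism easier to follow, the clamping in KeepLogSeg() boils
down to roughly the following (a heavily simplified sketch; the real function
also takes wal_keep_size and max_slot_wal_keep_size into account, and the
variable names here are illustrative):

static void
KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
{
	XLogSegNo	segno;
	XLogRecPtr	keep;

	/* Start from the segment containing the restartpoint's redo pointer. */
	XLByteToSeg(recptr, segno, wal_segment_size);

	/* Then clamp it by the oldest LSN any replication slot still needs. */
	keep = XLogGetReplicationSlotMinimumLSN();
	if (keep != InvalidXLogRecPtr && keep < recptr)
	{
		XLogSegNo	slotSegNo;

		XLByteToSeg(keep, slotSegNo, wal_segment_size);
		if (slotSegNo < segno)
			segno = slotSegNo;	/* the 4 -> 3 move seen in the log above */
	}

	/* Only ever lower the caller's cutoff (keep more WAL), never raise it. */
	if (segno < *logSegNo)
		*logSegNo = segno;
}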


What about postponing the physical slot creation on the standby and the
cascading standby node initialization after this test?



Yeah, that is also possible. But I have a few questions regarding
that: (a) There doesn't seem to be a physical slot on the cascading
standby; if I am missing something, can you please point me to the
relevant part of the test?


That's right. There is a physical slot only on the primary and on the standby.

What I meant up-thread is to also postpone the cascading standby node 
initialization
after this test (once the physical slot on the standby is created).

Please find attached a proposal doing so.


(b) Which test is currently dependent on
the physical slot on the standby?


Not a test but the cascading standby initialization with the 
"primary_slot_name" parameter.

Also, I now think it's better to have the physical slot on the standby plus
hot_standby_feedback (hsf) set to on on the cascading standby (coming from the
standby backup).

The idea is to avoid any risk of logical slot invalidation on the cascading
standby in the standby promotion test.

That was not the case before the attached proposal, though (hsf was off on the
cascading standby).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From 38c67b0fd8f2d83e97a93108484fe23c881dd91c Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Mon, 8 May 2023 07:02:50 +
Subject: [PATCH v1] fix retaining WAL test

---
 .../t/035_standby_logical_decoding.pl | 38 ++-
 1 file changed, 21 insertions(+), 17 deletions(-)
 100.0% src/test/recovery/t/

diff --git a/src/test/recovery/t/035_standby_logical_decoding.pl 
b/src/test/recovery/t/035_standby_logical_decoding.pl
index ad1def2474..4bda706eae 100644
--- a/src/test/recovery/t/035_standby_logical_decoding.pl
+++ b/src/test/recovery/t/035_standby_logical_decoding.pl
@@ -276,20 +276,6 @@ $node_standby->append_conf('postgresql.conf',
max_replication_slots = 5]);
 $node_standby->start;
 $node_primary->wait_for_replay_catchup($node_standby);
-$node_standby->safe_psql('testdb', qq[SELECT * FROM 
pg_create_physical_replication_slot('$standby_physical_slotname');]);
-
-###
-# Initialize cascading standby node
-###
-$node_standby->backup($backup_name);
-$node_cascading_standby->init_from_backup(
-   $node_standby, $backup_name,
-   has_streaming => 1,
-   has_restoring => 1);
-$node_cascading_standby->append_conf('postgresql.conf',
-   qq[primary_slot_name = '$standby_physical_slotname']);
-$node_cascading_standby->start;
-$node_standby->wait_for_replay_catchup($node_cascading_standby, 

Subscription statistics are not dropped at DROP SUBSCRIPTION in some cases

2023-05-08 Thread Masahiko Sawada
Hi,

We call pgstat_drop_subscription() at the end of DropSubscription(),
but we can return from that function earlier, e.g. when no replication
slot is associated with the subscription. In that case, the statistics
entry for the subscription remains. To fix it, I think we need to call
it earlier, just after removing the catalog tuple. There is a chance
that the transaction dropping the subscription fails, due to a network
error etc., but we don't need to worry about that, as reporting the
subscription drop is transactional.
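
For illustration, the idea is roughly the following (an abridged sketch of
DropSubscription() in subscriptioncmds.c; the early-return condition and the
variable names are simplified, so this is not the attached patch verbatim):

	/* ... the subscription's catalog tuple has been located above ... */

	/* Remove the catalog tuple. */
	CatalogTupleDelete(rel, &tup->t_self);

	/*
	 * Report the drop to the cumulative stats system here, right after the
	 * catalog tuple is removed, rather than at the very end of the function.
	 * The report is transactional, so nothing is lost if the transaction
	 * later aborts, but the early-return paths below can no longer skip it.
	 */
	pgstat_drop_subscription(subid);

	/*
	 * Early-return path (condition simplified): when there is no replication
	 * slot to drop on the publisher, the function bails out here, before the
	 * old call site of pgstat_drop_subscription() was ever reached.
	 */
	if (slotname == NULL)
	{
		table_close(rel, NoLock);
		return;
	}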

I've attached the patch. Feedback is very welcome.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


fix_drop_subscription_stats.patch
Description: Binary data


Re: SQL:2011 application time

2023-05-08 Thread Peter Eisentraut

On 03.05.23 23:02, Paul Jungwirth wrote:
Thank you again for the review. Here is a patch with most of your 
feedback addressed. Sorry it has taken so long! These patches are 
rebased up to 1ab763fc22adc88e5d779817e7b42b25a9dd7c9e (May 3).


Here are a few small fixup patches to get your patch set compiling cleanly.

Also, it looks like the patches 0002, 0003, and 0004 are not split up 
correctly.  0002 contains tests using the FOR PORTION OF syntax 
introduced in 0003, and 0003 uses the function build_period_range() from 
0004.
From 6fa819b675255af763e51a67d4d8c88f1d390b6c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 8 May 2023 08:45:32 +0200
Subject: [PATCH 1/3] fixup! Add PERIODs

---
 doc/src/sgml/ref/alter_table.sgml| 2 +-
 src/test/modules/test_ddl_deparse/test_ddl_deparse.c | 6 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml 
b/doc/src/sgml/ref/alter_table.sgml
index 487f09f88a..d6aed3dff8 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -582,7 +582,7 @@ Description
 

 
-   
+   
 DROP PERIOD FOR
 
  
diff --git a/src/test/modules/test_ddl_deparse/test_ddl_deparse.c 
b/src/test/modules/test_ddl_deparse/test_ddl_deparse.c
index b7c6f98577..6f4e44de3f 100644
--- a/src/test/modules/test_ddl_deparse/test_ddl_deparse.c
+++ b/src/test/modules/test_ddl_deparse/test_ddl_deparse.c
@@ -309,6 +309,12 @@ get_altertable_subcmdinfo(PG_FUNCTION_ARGS)
case AT_ReAddStatistics:
strtype = "(re) ADD STATS";
break;
+   case AT_AddPeriod:
+   strtype = "ADD PERIOD";
+   break;
+   case AT_DropPeriod:
+   strtype = "DROP PERIOD";
+   break;
}
 
if (subcmd->recurse)
-- 
2.40.0

From 809e1fe145896b190aa4c0ec73902071e5ccdccc Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 8 May 2023 09:04:42 +0200
Subject: [PATCH 2/3] fixup! Add PERIODs

---
 src/backend/catalog/meson.build | 1 +
 src/include/catalog/meson.build | 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/backend/catalog/meson.build b/src/backend/catalog/meson.build
index fa6609e577..c499cd2f5d 100644
--- a/src/backend/catalog/meson.build
+++ b/src/backend/catalog/meson.build
@@ -26,6 +26,7 @@ backend_sources += files(
   'pg_namespace.c',
   'pg_operator.c',
   'pg_parameter_acl.c',
+  'pg_period.c',
   'pg_proc.c',
   'pg_publication.c',
   'pg_range.c',
diff --git a/src/include/catalog/meson.build b/src/include/catalog/meson.build
index 3179be09d3..c92d4928a3 100644
--- a/src/include/catalog/meson.build
+++ b/src/include/catalog/meson.build
@@ -57,6 +57,7 @@ catalog_headers = [
   'pg_collation.h',
   'pg_parameter_acl.h',
   'pg_partitioned_table.h',
+  'pg_period.h',
   'pg_range.h',
   'pg_transform.h',
   'pg_sequence.h',
-- 
2.40.0

From 94f46deacdeaa3dbac1d3988678981ac8cf5fa9a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 8 May 2023 09:05:04 +0200
Subject: [PATCH 3/3] fixup! Add UPDATE/DELETE FOR PORTION OF

---
 src/backend/utils/adt/meson.build | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/backend/utils/adt/meson.build 
b/src/backend/utils/adt/meson.build
index 8515cd9365..9deb26f953 100644
--- a/src/backend/utils/adt/meson.build
+++ b/src/backend/utils/adt/meson.build
@@ -65,6 +65,7 @@ backend_sources += files(
   'oracle_compat.c',
   'orderedsetaggs.c',
   'partitionfuncs.c',
+  'period.c',
   'pg_locale.c',
   'pg_lsn.c',
   'pg_upgrade_support.c',
-- 
2.40.0



Re: Perform streaming logical transactions by background workers and parallel apply

2023-05-08 Thread Amit Kapila
On Mon, May 8, 2023 at 11:08 AM Masahiko Sawada  wrote:
>
> On Mon, May 8, 2023 at 12:52 PM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > On Monday, May 8, 2023 11:08 AM Masahiko Sawada 
> >
> > Hi,
> >
> > >
> > > On Tue, May 2, 2023 at 12:22 PM Amit Kapila 
> > > wrote:
> > > >
> > > > On Fri, Apr 28, 2023 at 11:48 AM Masahiko Sawada
> > >  wrote:
> > > > >
> > > > > While investigating this issue, I've reviewed the code around
> > > > > callbacks and worker termination etc and I found a problem.
> > > > >
> > > > > A parallel apply worker calls the before_shmem_exit callbacks in the
> > > > > following order:
> > > > >
> > > > > 1. ShutdownPostgres()
> > > > > 2. logicalrep_worker_onexit()
> > > > > 3. pa_shutdown()
> > > > >
> > > > > Since the worker is detached during logicalrep_worker_onexit(),
> > > > > MyLogicalRepWorker->leader_pid is invalid when we call
> > > > > pa_shutdown():
> > > > >
> > > > > static void
> > > > > pa_shutdown(int code, Datum arg)
> > > > > {
> > > > > Assert(MyLogicalRepWorker->leader_pid != InvalidPid);
> > > > > SendProcSignal(MyLogicalRepWorker->leader_pid,
> > > > >PROCSIG_PARALLEL_APPLY_MESSAGE,
> > > > >InvalidBackendId);
> > > > >
> > > > > Also, if the parallel apply worker fails shm_toc_lookup() during the
> > > > > initialization, it raises an error (because of noError = false) but
> > > > > ends up a SEGV as MyLogicalRepWorker is still NULL.
> > > > >
> > > > > I think that we should not use MyLogicalRepWorker->leader_pid in
> > > > > pa_shutdown() but instead store the leader's pid to a static variable
> > > > > before registering pa_shutdown() callback.
> > > > >
> > > >
> > > > Why not simply move the registration of pa_shutdown() to someplace
> > > > after logicalrep_worker_attach()?
> > >
> > > If we do that, the worker won't call dsm_detach() if it raises an
> > > ERROR in logicalrep_worker_attach(), is that okay? It seems that it's
> > > no practically problem since we call dsm_backend_shutdown() in
> > > shmem_exit(), but if so why do we need to call it in pa_shutdown()?
> >
> > I think the dsm_detach in pa_shutdown was intended to fire on_dsm_detach
> > callbacks, to give each callback a chance to report stats before the stats
> > system is shut down, following what we do in ParallelWorkerShutdown() (e.g.
> > sharedfileset.c callbacks cause fd.c to do ReportTemporaryFileUsage(), so we
> > need to fire that earlier).
> >
> > But for parallel apply, we currently only have one on_dsm_detach
> > callback(shm_mq_detach_callback) which doesn't report extra stats. So the
> > dsm_detach in pa_shutdown is only used to make it a bit future-proof in case
> > we add some other on_dsm_detach callbacks in the future which need to report
> > stats.
>
> Makes sense. Given that it's possible that we add other callbacks that
> report stats in the future, I think it's better not to move the
> position where we register the pa_shutdown() callback.
>

Hmm, what kind of stats do we expect to be collected before we
register pa_shutdown? I think, if required, we can register such a
callback after pa_shutdown. I feel that without reordering the
callbacks, the fix would be a bit complicated, as explained in my
previous email, so I don't think it is worth complicating this code
unless really required.
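
To make the reordering concrete, it would look roughly like this in the
parallel apply worker's startup path (an illustrative sketch only; the
callback argument and the surrounding code are simplified):

	/*
	 * Attach to the worker slot first; logicalrep_worker_attach() is also
	 * what registers logicalrep_worker_onexit().
	 */
	logicalrep_worker_attach(worker_slot);

	/*
	 * Register pa_shutdown() only after attaching.  Since before_shmem_exit
	 * callbacks run in reverse order of registration, pa_shutdown() now fires
	 * before logicalrep_worker_onexit() detaches the worker, i.e. while
	 * MyLogicalRepWorker->leader_pid is still valid.  And if the attach
	 * itself throws an ERROR, pa_shutdown() has not been registered yet, so
	 * it cannot run with an uninitialized MyLogicalRepWorker.
	 */
	before_shmem_exit(pa_shutdown, PointerGetDatum(seg));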

-- 
With Regards,
Amit Kapila.