Re: [PATCH] Add inline comments to the pg_hba_file_rules view

2023-09-15 Thread Michael Paquier
On Fri, Sep 15, 2023 at 09:37:23AM +0200, Jim Jones wrote:
> SELECT type, database, user_name, address, c.comment
> FROM  pg_hba_file_rules h, pg_read_conf_comments(h.file_name) c
> WHERE user_name[1]='jim' AND h.line_number = c.line_number ;
> 
>  type | database | user_name |  address  | comment
> ------+----------+-----------+-----------+---------
>  host | {db}     | {jim}     | 127.0.0.1 | foo
>  host | {db}     | {jim}     | 127.0.0.1 | bar
>  host | {db}     | {jim}     | 127.0.0.1 | #foo#
> (3 rows)
> 
> 
> Is it more or less what you had in mind?

That was the idea.  I forgot about strpos(), but if you do that, do we
actually need a function in core to achieve that?  There are a few
fancy cases with the SQL function you have sent, actually..  strpos()
would just grab the first '#' character, ignoring any quoted areas.
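
For the archives, here is roughly what the bare strpos() approach looks
like as a standalone query (a sketch only; pg_read_file() needs the
pg_read_server_files privilege, and the line numbering relies on the
file being read in order).  As noted, it naively grabs everything after
the first '#', including one inside a quoted string:

```
SELECT lineno, trim(substr(line, strpos(line, '#') + 1)) AS comment
FROM (SELECT row_number() OVER () AS lineno, line
      FROM regexp_split_to_table(
             pg_read_file(current_setting('hba_file')), E'\n') AS f(line)) AS s
WHERE strpos(line, '#') > 0;
```
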
--
Michael


signature.asc
Description: PGP signature


Re: JSON Path and GIN Questions

2023-09-15 Thread Erik Rijkers

Op 9/15/23 om 22:27 schreef David E. Wheeler:

On Sep 12, 2023, at 21:00, Erik Wienhold  wrote:


That's also my understanding.  We had a discussion about the docs on @@, @?, and
jsonb_path_query on -general a while back [1].  Maybe it's useful also.


Okay, I’ll take a pass at expanding the docs on this. I think a little 
mini-tutorial on these two operators would be useful.

Meanwhile, I’d like to re-up this question about the index qualification of 
non-equality JSON Path operators.

On Sep 12, 2023, at 20:16, David E. Wheeler  wrote:


Issue 3: Index Use for Comparison
---------------------------------

From the docs 
(https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING), I had 
assumed any JSON Path query would be able to use the GIN index. However while the 
use of the == JSON Path operator is able to take advantage of the GIN index, 
apparently the >= operator cannot:

david=# explain analyze select id from movies where movie @? '$ ?($.year >= 
2023)';
                                       QUERY PLAN
---------------------------------------------------------------------------
Seq Scan on movies  (cost=0.00..3741.41 rows=366 width=4) (actual 
time=34.815..36.259 rows=192 loops=1)
   Filter: (movie @? '$?($."year" >= 2023)'::jsonpath)
   Rows Removed by Filter: 36081
Planning Time: 1.864 ms
Execution Time: 36.338 ms
(5 rows)

Is this expected? Originally I tried with jsonb_path_ops, which I can understand 
not working, since it stores hashes of paths, which would allow only exact 
matches. But a plain old GIN index doesn’t appear to work, either. Should it? Is 
there perhaps some other op class that would allow it to work? Or would I have to 
create a separate BTREE index on `movie -> 'year'`?




movie @? '$ ?($.year >= 2023)'

I believe it is indeed not possible to have such an inequality search use 
the GIN index.  It is another weakness of JSON that can be unexpected to 
those not in full knowledge of the manual. Yes, this too would be good to 
explain in the doc where JSON indexes are explained.
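
For what it's worth, the usual workaround is a btree expression index on
the extracted value (a sketch, assuming the movies/movie schema from the
quoted example):

```
CREATE INDEX movies_year_idx ON movies (((movie ->> 'year')::int));

-- A query written against the same expression can use the index:
EXPLAIN SELECT id FROM movies WHERE (movie ->> 'year')::int >= 2023;
```

Note that the jsonpath form of the query still won't use it; the
predicate has to be written against the indexed expression.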


Erik Rijkers


Thanks,

David






Re: Add 'worker_type' to pg_stat_subscription

2023-09-15 Thread Michael Paquier
On Fri, Sep 15, 2023 at 09:35:38AM -0700, Nathan Bossart wrote:
> Concretely, like this.

There are two references to "synchronization worker" in tablesync.c
(exit routine and busy loop), and a bit more of "sync worker"..
Anyway, these don't matter much, but there are two errmsgs where the
term "tablesync worker" is used.  Even if they are internal, these
could be made more consistent at least?

In config.sgml, max_sync_workers_per_subscription's description uses
"synchronization workers".  In the second case, adding "table" makes
little sense, but could it be added in the two other sentences?
--
Michael


signature.asc
Description: PGP signature


Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-09-15 Thread Andres Freund
Hi,

On 2023-08-28 23:43:22 +0900, Masahiko Sawada wrote:
> I've attached v42 patch set. I improved the tidstore regression test code
> in addition to incorporating the above comments.

Why did you need to disable the benchmark module for CI?

Greetings,

Andres Freund




Re: Performance degradation on concurrent COPY into a single relation in PG16.

2023-09-15 Thread Andres Freund
Hi,

On 2023-09-06 18:01:53 -0400, Tom Lane wrote:
> It turns out that this patch is what's making buildfarm member
> chipmunk fail in contrib/pg_visibility [1].  That's easily reproduced
> by running the test with shared_buffers = 10MB.  I didn't dig further
> than the "git bisect" result:

At first I was a bit confounded by not being able to reproduce this. My test
environment had max_connections=110 for some other reason - and the problem
doesn't reproduce with that setting...


> $ git bisect bad
> 82a4edabd272f70d044faec8cf7fd1eab92d9991 is the first bad commit
> commit 82a4edabd272f70d044faec8cf7fd1eab92d9991
> Author: Andres Freund 
> Date:   Mon Aug 14 09:54:03 2023 -0700
>
> hio: Take number of prior relation extensions into account
>
> but I imagine the problem is that the patch's more aggressive
> relation-extension heuristic is causing the table to have more
> pages than the test case expects.  Is it really a good idea
> for such a heuristic to depend on shared_buffers?

The heuristic doesn't directly depend on shared buffers. However, the amount
we extend by is limited by needing to pin shared buffers covering all the
newly extended buffers.

That's what ends up limiting things here - shared_buffers = 10MB and
max_connections = 10 doesn't allow for a lot of buffers to be pinned
concurrently in each backend.  Although perhaps LimitAdditionalPins() is a bit
too conservative, due to not checking the private refcount array and just
assuming REFCOUNT_ARRAY_ENTRIES.


> If you don't want to change the heuristic then we'll have to find a way to
> tweak this test to avoid it.

We could tweak LimitAdditionalPins() by checking PrivateRefCountArray instead
of assuming the worst-case REFCOUNT_ARRAY_ENTRIES.

However, it seems that the logic in the test is pretty fragile independent of
this issue? Different alignment, block size or an optimization of the page
layout could also break the test?

Unfortunately a query that doesn't falsely alert in this case is a bit ugly,
due to needing to deal with the corner case of an empty page at the end:

select *
from pg_visibility_map('copyfreeze')
where
  (not all_visible or not all_frozen)
  -- deal with trailing empty pages due to potentially bulk-extending too aggressively
  and exists(SELECT * FROM copyfreeze WHERE ctid >= ('('||blkno||', 0)')::tid)
;

Before 82a4edabd27 this situation was rare - you'd have needed contended
extensions. But after it, this has become more common. I worry that that might
cause other issues :(. OTOH, I think we'll need to extend way more aggressively
at some point...

Greetings,

Andres Freund




Re: SET ROLE documentation improvement

2023-09-15 Thread Yurii Rashkovskii
On Fri, Sep 15, 2023 at 1:47 PM Nathan Bossart 
wrote:

> On Fri, Sep 15, 2023 at 11:26:16AM -0700, Yurii Rashkovskii wrote:
> > I believe SET ROLE documentation makes a slightly incomplete statement
> > about what happens when a superuser uses SET ROLE.
> >
> > The documentation reading suggests that the superuser would lose all their
> > privileges. However, they still retain the ability to use `SET ROLE` again.
> >
> > The attached patch adds this bit to the documentation.
>
> IMO this is arguably covered by the following note:
>
>    The specified role_name
>    must be a role that the current session user is a member of.
>    (If the session user is a superuser, any role can be selected.)
>
>
I agree that this may be considered sufficient coverage, but I believe that
giving contextual clarification goes a long way to help people understand.
Documentation reading can be challenging.


> But I don't see a big issue with clarifying things further as you propose.
>
> I think another issue is that the aforementioned note doesn't mention the
> new SET option added in 3d14e17.
>

How do you think we should word it in that note to make it useful?


-- 
Y.


Re: ALTER ROLE documentation improvement

2023-09-15 Thread Yurii Rashkovskii
On Fri, Sep 15, 2023 at 1:53 PM Nathan Bossart 
wrote:

> On Fri, Sep 15, 2023 at 11:46:35AM -0700, Yurii Rashkovskii wrote:
> > It appears that 16.0 improved some of the checks in ALTER ROLE. Previously,
> > it was possible to do the following (assuming current_user is a bootstrap
> > user):
> >
> > ```
> > ALTER ROLE current_user NOSUPERUSER
> > ```
> >
> > As of 16.0, this produces an error:
> >
> > ```
> > ERROR:  permission denied to alter role
> > DETAIL:  The bootstrap user must have the SUPERUSER attribute.
> > ```
> >
> > The attached patch documents this behavior by providing a bit more
> > clarification to the following statement:
> >
> > "Database superusers can change any of these settings for any role."
>
> I think this could also be worth a mention in the glossary [0].  BTW the
> glossary calls this role the "bootstrap superuser", but the DETAIL message
> calls it the "bootstrap user".  Perhaps we should standardize on one name.
>
> [0] https://www.postgresql.org/docs/devel/glossary.html
>
>
Thank you for the feedback. I've updated the glossary and made the
terminology consistent. Please see the new patch attached.

-- 
Y.


0001-Improve-ALTER-ROLE-documentation-to-document-current-v2.patch
Description: Binary data


Re: JSON Path and GIN Questions

2023-09-15 Thread Tom Lane
"David E. Wheeler"  writes:
> On Sep 14, 2023, at 00:41, Tom Lane  wrote:
>> As far as json in particular is concerned, 8.14.4 jsonb Indexing [4]
>> is pretty clear about what is or is not supported.

> How do you feel about this note, then?

I think it's unnecessary.  If we did consider it necessary,
why wouldn't just about every subsection in chapter 8 need
similar wording?

regards, tom lane




Re: ALTER ROLE documentation improvement

2023-09-15 Thread Nathan Bossart
On Fri, Sep 15, 2023 at 11:46:35AM -0700, Yurii Rashkovskii wrote:
> It appears that 16.0 improved some of the checks in ALTER ROLE. Previously,
> it was possible to do the following (assuming current_user is a bootstrap
> user):
> 
> ```
> ALTER ROLE current_user NOSUPERUSER
> ```
> 
> As of 16.0, this produces an error:
> 
> ```
> ERROR:  permission denied to alter role
> DETAIL:  The bootstrap user must have the SUPERUSER attribute.
> ```
> 
> The attached patch documents this behavior by providing a bit more
> clarification to the following statement:
> 
> "Database superusers can change any of these settings for any role."

I think this could also be worth a mention in the glossary [0].  BTW the
glossary calls this role the "bootstrap superuser", but the DETAIL message
calls it the "bootstrap user".  Perhaps we should standardize on one name.

[0] https://www.postgresql.org/docs/devel/glossary.html

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: SET ROLE documentation improvement

2023-09-15 Thread Nathan Bossart
On Fri, Sep 15, 2023 at 11:26:16AM -0700, Yurii Rashkovskii wrote:
> I believe SET ROLE documentation makes a slightly incomplete statement
> about what happens when a superuser uses SET ROLE.
> 
> The documentation reading suggests that the superuser would lose all their
> privileges. However, they still retain the ability to use `SET ROLE` again.
> 
> The attached patch adds this bit to the documentation.

IMO this is arguably covered by the following note:

   The specified role_name
   must be a role that the current session user is a member of.
   (If the session user is a superuser, any role can be selected.)

But I don't see a big issue with clarifying things further as you propose.

I think another issue is that the aforementioned note doesn't mention the
new SET option added in 3d14e17.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




semantics of "convenient to store" in FmgrInfo ?

2023-09-15 Thread Chapman Flack

Hi,

At one time 12 years ago, fn_collation was stored in FmgrInfo,
with a comment saying it was really "parse-time-determined
information about the arguments, rather than about the function
itself" but saying "it's convenient" to store it in FmgrInfo
rather than in FunctionCallInfoData.

But in d64713d, fn_collation was booted out of FmgrInfo and into
FunctionCallInfoData, with this commit comment: "Since collation
is effectively an argument, not a property of the function, FmgrInfo
is really the wrong place for it; and this becomes critical in cases
where a cached FmgrInfo is used for varying purposes that might need
different collation settings."

However, fn_expr is still there in FmgrInfo, with exactly the same
vague rationale in the comment about being "convenient" to keep in
FmgrInfo rather than in the function call info where it might more
logically belong.

Is there a good quick story for why that's ok for fn_expr and not
for fn_collation? In particular, what can I count on?

Can I count on, if a FmgrInfo has a non-null fn_expr, that all
forthcoming function calls based on it will have:

- the same value for fcinfo->nargs ? (matching the count of fn_expr)
- the same arg types (and same polymorphic arg resolved types) ?

Then what about fcinfo->fn_collation? Can it vary from one call
to another in those circumstances? Or can that only happen when
fn_expr is null, and a cached FmgrInfo is being used for varying
purposes /other than/ "as part of an SQL expression"?

Are there ever circumstances where a FmgrInfo with a non-null
fn_expr is reused with a different fn_expr?

Regards,
-Chap




Re: JSON Path and GIN Questions

2023-09-15 Thread David E. Wheeler
On Sep 12, 2023, at 21:00, Erik Wienhold  wrote:

> That's also my understanding.  We had a discussion about the docs on @@, @?, 
> and
> jsonb_path_query on -general a while back [1].  Maybe it's useful also.

Okay, I’ll take a pass at expanding the docs on this. I think a little 
mini-tutorial on these two operators would be useful.
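
For reference, the two operators in their simplest form (standard jsonb
behavior, included here as a sketch of what such a mini-tutorial might
start from):

```
SELECT '{"a": 1}'::jsonb @? '$.a ? (@ > 0)';  -- true: does the path yield any item?
SELECT '{"a": 1}'::jsonb @@ '$.a > 0';        -- true: result of the jsonpath predicate
```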

Meanwhile, I’d like to re-up this question about the index qualification of 
non-equality JSON Path operators.

On Sep 12, 2023, at 20:16, David E. Wheeler  wrote:

> Issue 3: Index Use for Comparison
> ---------------------------------
> 
> From the docs 
> (https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING), I 
> had assumed any JSON Path query would be able to use the GIN index. However 
> while the use of the == JSON Path operator is able to take advantage of the 
> GIN index, apparently the >= operator cannot:
> 
> david=# explain analyze select id from movies where movie @? '$ ?($.year >= 
> 2023)';
>                                      QUERY PLAN
> ---------------------------------------------------------------------------
> Seq Scan on movies  (cost=0.00..3741.41 rows=366 width=4) (actual 
> time=34.815..36.259 rows=192 loops=1)
>   Filter: (movie @? '$?($."year" >= 2023)'::jsonpath)
>   Rows Removed by Filter: 36081
> Planning Time: 1.864 ms
> Execution Time: 36.338 ms
> (5 rows)
> 
> Is this expected? Originally I tried with jsonb_path_ops, which I can 
> understand not working, since it stores hashes of paths, which would allow 
> only exact matches. But a plain old GIN index doesn’t appear to work, either. 
> Should it? Is there perhaps some other op class that would allow it to work? 
> Or would I have to create a separate BTREE index on `movie -> 'year'`?

Thanks,

David



signature.asc
Description: Message signed with OpenPGP


Re: JSON Path and GIN Questions

2023-09-15 Thread David E. Wheeler
On Sep 14, 2023, at 00:41, Tom Lane  wrote:

> As far as json in particular is concerned, 8.14.4 jsonb Indexing [4]
> is pretty clear about what is or is not supported.

How do you feel about this note, then?

diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml
index b6c2ddbf55..7dda727f0d 100644
--- a/doc/src/sgml/json.sgml
+++ b/doc/src/sgml/json.sgml
@@ -413,6 +413,13 @@ SELECT doc->'site_name' FROM websites
     Two GIN <quote>operator classes</quote> are provided, offering different
     performance and flexibility trade-offs.
   </para>
+  <note>
+   <para>
+    As with all indexes, only operators on indexed columns are considered as
+    index qualifications. In other words, only <type>jsonb</type> operators can
+    take advantage of GIN indexes; <type>jsonb</type> functions cannot.
+   </para>
+  </note>
   <para>
     The default GIN operator class for <type>jsonb</type> supports queries with
     the key-exists operators <literal>?</literal>, <literal>?|</literal>


Best,

David



signature.asc
Description: Message signed with OpenPGP


Re: Detoasting optionally to make Explain-Analyze less misleading

2023-09-15 Thread stepan rutz

Hi,

please see a revised version of yesterday's mail. The patch attached now
provides the following:

EXPLAIN(ANALYZE,SERIALIZE)

and

EXPLAIN(ANALYZE,SERIALIZEBINARY)

and timing output.

Both options perform the serialization during analyze and provide an
additional output in the plan like this:


template1=# explain (analyze,serialize) select * from t12 limit 1;
                                  QUERY PLAN
--------------------------------------------------------------------

 ...

 Serialized Bytes: 36 bytes
 Execution Time: 0.035 ms
(5 rows)

or also this


template1=# explain (analyze,serialize) select * from t1;
                                     QUERY PLAN
------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..1.02 rows=2 width=19) (actual
time=0.101..0.111 rows=5 loops=1)
 Planning Time: 0.850 ms
 Serialized Bytes: 85777978 bytes
 Execution Time: 354.284 ms
(4 rows)


It's tempting to divide Serialized-Bytes by Execution-Time to get an idea
of the serialization bandwidth. This is /dev/null serialization though.
The results are length-counted and then discarded.
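
For example, a back-of-envelope calculation on the run above (just
arithmetic on the reported numbers, not part of the patch):

```
-- 85777978 bytes serialized over 354.284 ms of execution time
SELECT pg_size_pretty((85777978 / 0.354284)::bigint) AS bytes_per_second;
-- => about 231 MB per second of /dev/null serialization bandwidth
```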

Since detoasting happens implicitly during serialization, the number of
bytes becomes huge in this case and accounts for the detoasted lengths
as well. I tried to correctly count the number of bytes sent for the
protocol's messages and the attribute headers. For the actual values I am
quite sure I get the correct measures, as one can really tell by sending
more values across. Null is 4 bytes on the wire, interestingly. I didn't
know that, but it makes sense, since it's using the same prefix
length-field as all values do.

I have checked the JDBC driver and it uses binary and text formats
depending on an attribute's type oid. So having the SERIALIZEBINARY
option is not accurate, as in reality both formats can occur for the
same tuple.

Please provide some feedback on the new patch and let me know if this
makes sense. In general this kind of option for EXPLAIN is a good thing
for sure.


Greetings,

Stepan


On 14.09.23 21:27, stepan rutz wrote:

Hi Tom, Hi Matthias,

you are right of course. I have looked at the code from printtup.c and
made a new version of the patch.

Thanks for the MemoryContextReset hint too (@Matthias)

This time it's called EXPLAIN(ANALYZE,SERIALIZE) (hey, it also sounds
nicer phonetically)

If the option SERIALIZE is active, the output functions are called and
they perform the detoasting, which I have even checked.

So things are better this way, however I hardcoded the output option
"Text" (format=0). In printtup.c there is an incoming array which
applies Text (format=0) or Binary (format=1) for each column
individually. I am not sure whether this is even needed. I left in the
if-statement from printtup.c which calls the binary output method of a
given type. The result of the output is ignored and apparently free'd
because of the memory-context-reset at the end.

Please also note, that I added a call to DestReceiver's rDestroy hook,
which was missing from explain.c before altogether.

Feedback is appreciated.

/Stepan


On 12.09.23 17:26, Tom Lane wrote:

Matthias van de Meent  writes:

Hmm, maybe we should measure the overhead of serializing the tuples
instead.
The difference between your patch and "serializing the tuples, but not
sending them" is that serializing also does the detoasting, but also
includes any time spent in the serialization functions of the type. So
an option "SERIALIZE" which measures all the time the server spent on
the query (except the final step of sending the bytes to the client)
would likely be more useful than "just" detoasting.

+1, that was my immediate reaction to the proposal as well. Some
output functions are far from cheap.  Doing only the detoast part
seems like it's still misleading.

Do we need to go as far as offering both text-output and binary-output
options?

    regards, tom lane
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8570b14f62..9090e0f114 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -26,6 +26,7 @@
 #include "nodes/nodeFuncs.h"
 #include "parser/analyze.h"
 #include "parser/parsetree.h"
+#include "mb/pg_wchar.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/bufmgr.h"
 #include "tcop/tcopprot.h"
@@ -47,7 +48,6 @@ ExplainOneQuery_hook_type ExplainOneQuery_hook = NULL;
 /* Hook for plugins to get control in explain_get_index_name() */
 explain_get_index_name_hook_type explain_get_index_name_hook = NULL;
 
-
 /* OR-able flags for ExplainXMLTag() */
 #define X_OPENING 0
 #define X_CLOSING 1
@@ -154,7 +154,8 @@ static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
 static void escape_yaml(StringInfo buf, const char *str);
 
-
+static DestReceiver *CreateSerializeDestReceiver(int16 format);
+static uint64 

ALTER ROLE documentation improvement

2023-09-15 Thread Yurii Rashkovskii
Hi,

It appears that 16.0 improved some of the checks in ALTER ROLE. Previously,
it was possible to do the following (assuming current_user is a bootstrap
user):

```
ALTER ROLE current_user NOSUPERUSER
```

As of 16.0, this produces an error:

```
ERROR:  permission denied to alter role
DETAIL:  The bootstrap user must have the SUPERUSER attribute.
```

The attached patch documents this behavior by providing a bit more
clarification to the following statement:

"Database superusers can change any of these settings for any role."


-- 
Y.


0001-Improve-ALTER-ROLE-documentation-to-document-current.patch
Description: Binary data


Re: Faster "SET search_path"

2023-09-15 Thread Jeff Davis
On Tue, 2023-09-12 at 13:55 -0700, Jeff Davis wrote:
> On Mon, 2023-08-07 at 15:39 -0700, Nathan Bossart wrote:
> > 0003 is looking pretty good, too, but I think we
> > should get some more eyes on it, given the complexity.
> 
> Attached rebased version of 0003.

Is someone else planning to look at 0003, or should I just proceed? It
seems to be clearly wanted, and I think it's better to get it in this
'fest than to wait.

Regards,
Jeff Davis





SET ROLE documentation improvement

2023-09-15 Thread Yurii Rashkovskii
Hi,

I believe SET ROLE documentation makes a slightly incomplete statement
about what happens when a superuser uses SET ROLE.

The documentation reading suggests that the superuser would lose all their
privileges. However, they still retain the ability to use `SET ROLE` again.
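
A short demonstration of the behavior in question (role names are
hypothetical):

```
-- connected as a superuser:
SET ROLE alice;      -- only alice's privileges apply from here on ...
SET ROLE postgres;   -- ... yet SET ROLE itself still works, because it
                     -- checks the session user, not the current user
```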

The attached patch adds this bit to the documentation.

-- 
Y.


0001-Minor-improvement-to-SET-ROLE-documentation.patch
Description: Binary data


Re: Small patch modifying variable name to reflect the logic involved

2023-09-15 Thread Daniel Gustafsson
> On 14 Sep 2023, at 11:30, Daniel Gustafsson  wrote:
> 
>> On 14 Sep 2023, at 08:28, Krishnakumar R  wrote:
> 
>> Please find a small patch to improve code readability by modifying
>> variable name to reflect the logic involved - finding diff between end
>> and start time of WAL sync.
> 
> - INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, duration, start);
> + INSTR_TIME_SET_CURRENT(end);
> + INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, end, start);
> 
> Agreed, the duration is the result of the INSTR_TIME_ACCUM_DIFF calculation,
> not what's stored in the instr_time variable.

And done, with a small fixup to handle another occurrence in the same file.

--
Daniel Gustafsson





Re: Add 'worker_type' to pg_stat_subscription

2023-09-15 Thread Nathan Bossart
On Thu, Sep 14, 2023 at 03:04:19PM -0700, Nathan Bossart wrote:
> The only reason I didn't apply this already is because IMHO we should
> adjust the worker types and the documentation for the view to be
> consistent.  For example, the docs say "leader apply worker" but the view
> just calls them "apply" workers.  The docs say "synchronization worker" but
> the view calls them "table synchronization" workers.  My first instinct is
> to call apply workers "leader apply" workers in the view, and to call table
> synchronization workers "table synchronization workers" in the docs.

Concretely, like this.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 965dc5f05eee9e1c6cac1374d5800fff1ea5cba2 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 14 Sep 2023 14:31:44 -0700
Subject: [PATCH v7 1/1] Add worker type to pg_stat_subscription.

Thanks to 2a8b40e368, the logical replication worker type is easily
determined, and this information is a nice addition to the
pg_stat_subscription view.  The worker type could already be
deduced via other columns such as leader_pid and relid, but that is
unnecessary complexity for users.

Bumps catversion.

Author: Peter Smith
Reviewed-by: Michael Paquier, Maxim Orlov
Discussion: https://postgr.es/m/CAHut%2BPtmbSMfErSk0S7xxVdZJ9XVE3xVLhqBTmT91kf57BeKDQ%40mail.gmail.com
---
 doc/src/sgml/monitoring.sgml   | 13 -
 src/backend/catalog/system_views.sql   |  1 +
 src/backend/replication/logical/launcher.c | 18 +-
 src/include/catalog/catversion.h   |  2 +-
 src/include/catalog/pg_proc.dat|  6 +++---
 src/test/regress/expected/rules.out|  3 ++-
 src/test/subscription/t/004_sync.pl|  2 +-
 7 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 4ff415d6a0..d2328eb85d 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1993,6 +1993,17 @@ description | Waiting for a newly initialized WAL file to reach durable storage
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>worker_type</structfield> <type>text</type>
+      </para>
+      <para>
+       Type of the subscription worker process.  Possible types are
+       <literal>leader apply</literal>, <literal>parallel apply</literal>, and
+       <literal>table synchronization</literal>.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>pid</structfield> <type>integer</type>
@@ -2008,7 +2019,7 @@ description | Waiting for a newly initialized WAL file to reach durable storage
       </para>
       <para>
        Process ID of the leader apply worker if this process is a parallel
-       apply worker; NULL if this process is a leader apply worker or a
+       apply worker; NULL if this process is a leader apply worker or a table
        synchronization worker
       </para></entry>
      </row>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 77b06e2a7a..fcb14976c0 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -949,6 +949,7 @@ CREATE VIEW pg_stat_subscription AS
 SELECT
 su.oid AS subid,
 su.subname,
+st.worker_type,
 st.pid,
 st.leader_pid,
 st.relid,
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 7882fc91ce..54ab8a37f4 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -1278,7 +1278,7 @@ GetLeaderApplyWorkerPid(pid_t pid)
 Datum
 pg_stat_get_subscription(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_SUBSCRIPTION_COLS	9
+#define PG_STAT_GET_SUBSCRIPTION_COLS	10
 	Oid			subid = PG_ARGISNULL(0) ? InvalidOid : PG_GETARG_OID(0);
 	int			i;
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
@@ -1339,6 +1339,22 @@ pg_stat_get_subscription(PG_FUNCTION_ARGS)
 		else
 			values[8] = TimestampTzGetDatum(worker.reply_time);
 
+		switch (worker.type)
+		{
+			case WORKERTYPE_APPLY:
+				values[9] = CStringGetTextDatum("leader apply");
+				break;
+			case WORKERTYPE_PARALLEL_APPLY:
+				values[9] = CStringGetTextDatum("parallel apply");
+				break;
+			case WORKERTYPE_TABLESYNC:
+				values[9] = CStringGetTextDatum("table synchronization");
+				break;
+			case WORKERTYPE_UNKNOWN:
+				/* Should never happen. */
+				elog(ERROR, "unknown worker type");
+		}
+
 		tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
 			 values, nulls);
 
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 4eaef54d0c..f1f6c5855b 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
  */
 
 /*			mmddN */
-#define CATALOG_VERSION_NO	202309061
+#define CATALOG_VERSION_NO	202309151
 
 #endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9805bc6118..f0b7b9cbd8 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5484,9 +5484,9 @@
   proname => 'pg_stat_get_subscription', prorows => 

Re: Implement a column store for pg?

2023-09-15 Thread Jonah H. Harris
On Fri, Sep 15, 2023 at 10:21 AM jacktby jacktby  wrote:

> > I’m trying to implement a new column store for pg. Is there a good
> > example to reference?
> That’s too complex; I just need to know the interface for designing a
> column store. In fact, I just need a simple example, and I will implement
> it myself. What I’m confused about is that I don’t know how to implement
> MVCC, because the old version is a tuple; will this make a big difference
> to the transaction handling?


If you're looking for the simplest version of a columnar implementation for
Postgres, I'd check out Citus' original cstore implemented via FDW. It
hasn't been updated in years, but it's still one of the faster simple
columnar implementations out there: https://github.com/citusdata/cstore_fdw
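
The basic usage is also quite small, which makes it easy to study (a
sketch from memory of its README; double-check the option names against
the repo):

```
CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;
CREATE FOREIGN TABLE events (ts timestamptz, payload jsonb)
    SERVER cstore_server
    OPTIONS (compression 'pglz');
```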

--
Jonah H. Harris


Re: Make --help output fit within 80 columns per line

2023-09-15 Thread torikoshia

On 2023-09-12 15:27, Peter Eisentraut wrote:

Also, it would be very useful if the TAP test function could print out
the violating lines if a test fails.  (Similar to how is() and like()
print the failing values.)


Attached patch for this.
Below are the outputs when the test failed:

```
$ cd contrib/vacuumlo
$ make check
...(snip)...
t/001_basic.pl .. 1/?
#   Failed test '  -n, --dry-run don't remove large 
objects, just show what would be done'
#   at 
/home/atorik/postgres/contrib/vacuumlo/../../src/test/perl/PostgreSQL/Test/Utils.pm 
line 850.

# Looks like you failed 1 test of 21.

#   Failed test 'vacuumlo --help outputs fit within 80 columns per line'
#   at t/001_basic.pl line 10.
# Looks like you failed 1 test of 9.
t/001_basic.pl .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/9 subtests

Test Summary Report
-------------------
t/001_basic.pl (Wstat: 256 (exited 1) Tests: 9 Failed: 1)
  Failed test:  4
  Non-zero exit status: 1
Files=1, Tests=9,  0 wallclock secs ( 0.01 usr  0.01 sys +  0.04 cusr  
0.01 csys =  0.07 CPU)

Result: FAIL
```

```
$ cat tmp_check/log/regress_log_001_basic
# Running: vacuumlo --help
[23:11:10.378](0.230s) ok 1 - vacuumlo --help exit code 0
[23:11:10.379](0.001s) ok 2 - vacuumlo --help goes to stdout
[23:11:10.379](0.000s) ok 3 - vacuumlo --help nothing to stderr
[23:11:10.380](0.000s) # Subtest: vacuumlo --help outputs fit within 80 
columns per line
[23:11:10.380](0.001s) ok 1 - vacuumlo removes unreferenced large 
objects from databases.

[23:11:10.380](0.000s) ok 2 -
[23:11:10.381](0.000s) ok 3 - Usage:
[23:11:10.381](0.000s) ok 4 -   vacuumlo [OPTION]... DBNAME...
[23:11:10.381](0.000s) ok 5 -
[23:11:10.381](0.000s) ok 6 - Options:
[23:11:10.381](0.000s) ok 7 -   -l, --limit=LIMIT commit 
after removing each LIMIT large objects
[23:11:10.382](0.000s) ok 20 - Report bugs to 
.
[23:11:10.382](0.000s) ok 21 - PostgreSQL home page: 


[23:11:10.382](0.000s) 1..21
[23:11:10.382](0.000s) # Looks like you failed 1 test of 21.
[23:11:10.382](0.000s) not ok 4 - vacuumlo --help outputs fit within 80 
columns per line

[23:11:10.382](0.000s)
[23:11:10.382](0.000s) #   Failed test 'vacuumlo --help outputs fit 
within 80 columns per line'

#   at t/001_basic.pl line 10.
# Running: vacuumlo --version
[23:11:10.388](0.005s) ok 5 - vacuumlo --version exit code 0
[23:11:10.388](0.000s) ok 6 - vacuumlo --version goes to stdout
[23:11:10.388](0.000s) ok 7 - vacuumlo --version nothing to stderr
# Running: vacuumlo --not-a-valid-option
[23:11:10.391](0.003s) ok 8 - vacuumlo with invalid option nonzero exit 
code
[23:11:10.391](0.000s) ok 9 - vacuumlo with invalid option prints error 
message

[23:11:10.391](0.000s) 1..9
[23:11:10.391](0.000s) # Looks like you failed 1 test of 9.
```

I feel using subtest in Test::More improves readability.


On 2023-09-14 02:46, Greg Sabino Mullane wrote:
All this seems an awful lot of work to support this mythical 80-column 
terminal user.
It's 2023, perhaps it's time to widen the default assumption past 80 
characters?


That may be a good idea.

However, from what I have seen of some basic commands like `ls` in my Linux 
environments, the man pages have over 100 characters per line, while the 
output of the --help option seems to stay within 80 characters per 
line.
Also, the current PostgreSQL commands follow the "no more than 80 
characters per line" convention.


I do not intend to adhere to this rule myself (my terminals are usually 
wider than 80 characters per line), but wouldn't it be a reasonable 
direction to use 80 characters for all commands?


Thoughts?

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group CorporationFrom 3dbfdb79680a0c1b68d4f742ae408810a1ee999d Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Fri, 15 Sep 2023 23:29:23 +0900
Subject: [PATCH v1] Added a test for checking the length of --help output

---
 src/test/perl/PostgreSQL/Test/Utils.pm | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 617caa022f..989f369ae7 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -843,6 +843,13 @@ sub program_help_ok
 	ok($result, "$cmd --help exit code 0");
 	isnt($stdout, '', "$cmd --help goes to stdout");
 	is($stderr, '', "$cmd --help nothing to stderr");
+
+	subtest "$cmd --help outputs fit within 80 columns per line" => sub {
+		foreach my $line (split /\n/, $stdout)
+		{
+			ok(length($line) <= 80, "$line");
+		}
+	};
 	return;
 }
 
-- 
2.39.2



Re: Unlogged relations and WAL-logging

2023-09-15 Thread Heikki Linnakangas

On 01/09/2023 15:49, Peter Eisentraut wrote:

Is the patch
0003-Remove-unnecessary-smgrimmedsync-when-creating-unlog.patch still
relevant, or can this commitfest entry be closed?


Yes. Pushed it now, thanks!

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: bug fix and documentation improvement about vacuumdb

2023-09-15 Thread Nathan Bossart
On Fri, Sep 15, 2023 at 10:13:10AM +0200, Daniel Gustafsson wrote:
>> On 15 Sep 2023, at 04:39, Kyotaro Horiguchi  wrote:
>> It seems to work fine. However, if we're aiming for consistent
>> spacing, the "IS NULL" (two spaces in between) might be a concern.
> 
> I don't think that's a problem.  I would rather have readable C code and two
> spaces in the generated SQL than contorting the C code to produce less
> whitespace in a query few will read in its generated form.

I think we could pretty easily avoid the extra space and keep the C code
relatively readable.  These sorts of things bug me, too (see 2af3336).

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Improve the log message output of basic_archive when basic_archive.archive_directory parameter is not set

2023-09-15 Thread Daniel Gustafsson
> On 15 Sep 2023, at 16:38, Nathan Bossart  wrote:

> this should use errdetail() instead of errhint().  In
> the provided patch, the new message explains how the module is not
> configured.  It doesn't hint at how to fix it (although presumably one
> could figure that out pretty easily).

Fair point, I agree with your reasoning that errdetail seems more appropriate.

--
Daniel Gustafsson





Re: Improve the log message output of basic_archive when basic_archive.archive_directory parameter is not set

2023-09-15 Thread Nathan Bossart
On Fri, Sep 15, 2023 at 02:48:55PM +0200, Daniel Gustafsson wrote:
>> On 15 Sep 2023, at 12:49, Alvaro Herrera  wrote:
>> 
>> On 2023-Sep-15, Daniel Gustafsson wrote:
>> 
>>> -basic_archive_configured(ArchiveModuleState *state)
>>> +basic_archive_configured(ArchiveModuleState *state, const char **errmsg)
>>> 
>>> The variable name errmsg implies that it will contain the errmsg() data 
>>> when it
>>> in fact is used for errhint() data, so it should be named accordingly.

I have no objection to allowing this callback to provide additional
information, but IMHO this should use errdetail() instead of errhint().  In
the provided patch, the new message explains how the module is not
configured.  It doesn't hint at how to fix it (although presumably one
could figure that out pretty easily).

>>> It's probably better to define the interface as ArchiveCheckConfiguredCB
>>> functions returning an allocated string in the passed pointer which the 
>>> caller
>>> is responsible for freeing.

That does seem more flexible.

>> Also note that this callback is documented in archive-modules.sgml, so
>> that needs to be updated as well.  This also means you can't backpatch
>> this change, or you risk breaking external software that implements this
>> interface.
> 
> Absolutely, this is master only for v17.

+1

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: pgBufferUsage.blk_{read|write}_time are zero although there are pgBufferUsage.local_blks_{read|written}

2023-09-15 Thread Melanie Plageman
On Fri, Sep 15, 2023 at 9:24 AM Nazir Bilal Yavuz  wrote:
> I found that pgBufferUsage.blk_{read|write}_time are zero although there are 
> pgBufferUsage.local_blks_{read|written}

Yes, good catch. This is a bug. I will note that at least in 15 and
likely before, pgBufferUsage.local_blks_written is incremented for
local buffers but pgBufferUsage.blk_write_time is only added to for
shared buffers (in FlushBuffer()). I think it makes sense to propose a
bug fix to stable branches counting blk_write_time for local buffers
as well.
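
A minimal way to see the symptom (a sketch; temp tables keep all their
I/O in local buffers):

```
SET track_io_timing = on;
CREATE TEMP TABLE t AS SELECT g AS i FROM generate_series(1, 100000) g;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM t;
-- before a fix: the local read/written counts are non-zero, but the
-- I/O timing fields stay at zero
```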

- Melanie




Re: sslinfo extension - add notbefore and notafter timestamps

2023-09-15 Thread Daniel Gustafsson
> On 12 Sep 2023, at 21:40, Jacob Champion  wrote:
> 
> Hello,
> 
> On 7/25/23 07:21, Daniel Gustafsson wrote:
>> The attached version passes ssl tests for me on 1.0.2 through OpenSSL Git 
>> HEAD.
> 
> Tests pass for me too, including LibreSSL 3.8.

Thanks for testing!

>> +   /* Calculate the diff from the epoch to the certificat timestamp */
> 
> "certificate"

Fixed.

>> + ssl_client_get_notbefore() returns text
>> ...> + ssl_client_get_notafter() returns text
> 
> I think this should say timestamptz rather than text? Ditto for the
> pg_stat_ssl documentation.
> 
> Speaking of which: is the use of `timestamp` rather than `timestamptz`
> in pg_proc.dat intentional? Will that cause problems with comparisons?

It should be timestamptz; it was a typo on my part. Fixed.
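
With the patch applied, usage would look like this (a sketch; requires
an SSL client connection and the sslinfo extension):

```
CREATE EXTENSION sslinfo;
SELECT ssl_client_get_notbefore(), ssl_client_get_notafter();
```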

> I haven't been able to poke any holes in the ASN1_TIME_to_timestamp()
> implementations themselves. I went down a rabbit hole trying to find out
> whether leap seconds could cause problems for us when we switch to
> `struct tm` in the future, but it turns out OpenSSL rejects leap seconds
> in the Validity fields. That seems weird -- as far as I can tell, RFC
> 5280 defers to ASN.1 which defers to ISO 8601 which appears to allow
> leap seconds -- but I don't plan to worry about it anymore. (I do idly
> wonder whether some CA, somewhere, has ever had a really Unhappy New
> Year due to that.)

That's an interesting thought; maybe the CAs have adapted, given the
market share of OpenSSL?

Thanks for reviewing, the attached v8 contains the fixes from this review along
with a fresh rebase and some attempts at making tests more stable in the face
of timezones by casting to date.

--
Daniel Gustafsson



v8-0001-Add-notBefore-and-notAfter-to-SSL-cert-info-displ.patch
Description: Binary data


Re: Implement a column store for pg?

2023-09-15 Thread jacktby jacktby


> On 15 Sep 2023, at 20:31, jacktby jacktby  wrote:
> 
> I’m trying to implement a new column store for pg. Is there a good example to
> reference?
That’s too complex; I just need to know the interface for designing a column
store. In fact, I just need a simple example, and I will implement it myself.
What I’m confused about is that I don’t know how to implement MVCC, because the
old version is a tuple; will this make a big difference to the transaction
handling?



RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-15 Thread Hayato Kuroda (Fujitsu)
> Thank you for reviewing! PSA new version patch set.

Sorry, wrong patch attached. PSA the correct ones.
There is a possibility that XLOG_PARAMETER_CHANGE may be generated when GUC
parameters are changed just before doing the upgrade. Added it to the list.
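
For reference, the old documented invariant can be expressed in SQL
roughly like this (a sketch; the patch performs a stricter check in C
during pg_upgrade):

```
SELECT s.slot_name,
       s.confirmed_flush_lsn = c.checkpoint_lsn AS caught_up
FROM pg_replication_slots s, pg_control_checkpoint() c
WHERE s.slot_type = 'logical';
```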

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v38-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
Description:  v38-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch


v38-0002-Use-binary_upgrade_validate_wal_record_types_aft.patch
Description:  v38-0002-Use-binary_upgrade_validate_wal_record_types_aft.patch
From ff250bb4bd467aaf8b09a27ab9edc93a3d9bb9bc Mon Sep 17 00:00:00 2001
From: Hayato Kuroda 
Date: Thu, 14 Sep 2023 06:01:40 +
Subject: [PATCH v38] Another one: Reads all WAL records ahead
 confirmed_flush_lsn

---
 doc/src/sgml/ref/pgupgrade.sgml   |   5 +-
 src/backend/utils/adt/pg_upgrade_support.c| 133 ++
 src/bin/pg_upgrade/check.c|   8 +-
 src/bin/pg_upgrade/controldata.c  |  39 -
 src/bin/pg_upgrade/info.c |   6 +-
 src/bin/pg_upgrade/pg_upgrade.h   |   1 -
 .../t/003_logical_replication_slots.pl|  20 +++
 src/include/catalog/pg_proc.dat   |   6 +
 8 files changed, 167 insertions(+), 51 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 4e2281bae4..2588d6d7b8 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -418,10 +418,7 @@ make prefix=/usr/local/pgsql.new install
  
  
   
-   pg_replication_slots.confirmed_flush_lsn
-   of all slots on the old cluster must be the same as the latest
-   checkpoint location. This ensures that all the data has been replicated
-   before the upgrade.
+   The old cluster has replicated all the changes to the subscribers.
   
  
  
diff --git a/src/backend/utils/adt/pg_upgrade_support.c 
b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..5b58769b5e 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -11,14 +11,22 @@
 
 #include "postgres.h"
 
+#include "access/heapam_xlog.h"
+#include "access/rmgr.h"
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
+#include "access/xlogutils.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/heap.h"
 #include "catalog/namespace.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
 #include "miscadmin.h"
+#include "storage/standbydefs.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"
 
 
 #define CHECK_IS_BINARY_UPGRADE										\
@@ -29,6 +37,9 @@ do {												\
 	if (!IsBinaryUpgrade)											\
 		ereport(ERROR,												\
 				(errcode(ERRCODE_CANT_CHANGE_RUNTIME_PARAM),		\
 				 errmsg("function can only be called when server is in binary upgrade mode"))); \
 } while (0)
 
+#define CHECK_WAL_RECORD(rmgrid, info, expected_rmgrid, expected_info) \
+   (rmgrid == expected_rmgrid && info == expected_info)
+
 Datum
 binary_upgrade_set_next_pg_tablespace_oid(PG_FUNCTION_ARGS)
 {
@@ -261,3 +272,125 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
 
PG_RETURN_VOID();
 }
+
+/*
+ * Return true if we didn't find any unexpected WAL record, false otherwise.
+ *
+ * This function is used to verify that there are no WAL records (except some
+ * types) after confirmed_flush_lsn of logical slots, which means all the
+ * changes were replicated to the subscriber. There is a possibility that some
+ * WAL records are inserted after logical walsenders exit, so such types are
+ * ignored.
+ *
+ * XLOG_CHECKPOINT_SHUTDOWN is ignored because it would be inserted after the
+ * walsender exits. XLOG_PARAMETER_CHANGE is also ignored because it would be
+ * inserted when GUC parameters are changed just before doing the upgrade.
+ * Moreover, the following types of records could be generated during
+ * pg_upgrade --check, so they are ignored too:
+ *
+ * - XLOG_CHECKPOINT_ONLINE
+ * - XLOG_RUNNING_XACTS
+ * - XLOG_FPI_FOR_HINT
+ * - XLOG_HEAP2_PRUNE
+ */
+Datum
+binary_upgrade_validate_wal_record_types_after_lsn(PG_FUNCTION_ARGS)
+{
+	XLogRecPtr	start_lsn = PG_GETARG_LSN(0);
+	XLogReaderState *xlogreader;
+	bool		initial_record = true;
+	bool		result = true;
+	ReadLocalXLogPageNoWaitPrivate *private_data;
+
+	CHECK_IS_BINARY_UPGRADE;
+
+	/* Quick exit if the given lsn is larger than current one */
+	if (start_lsn >= GetFlushRecPtr(NULL))
+		PG_RETURN_BOOL(false);
+
+	private_data = (ReadLocalXLogPageNoWaitPrivate *)
+		palloc0(sizeof(ReadLocalXLogPageNoWaitPrivate));
+
+   xlogreader = 

Re: Improve the log message output of basic_archive when basic_archive.archive_directory parameter is not set

2023-09-15 Thread Daniel Gustafsson
> On 15 Sep 2023, at 12:49, Alvaro Herrera  wrote:
> 
> On 2023-Sep-15, Daniel Gustafsson wrote:
> 
>> -basic_archive_configured(ArchiveModuleState *state)
>> +basic_archive_configured(ArchiveModuleState *state, const char **errmsg)
>> 
>> The variable name errmsg implies that it will contain the errmsg() data when 
>> it
>> in fact is used for errhint() data, so it should be named accordingly.
>> 
>> It's probably better to define the interface as ArchiveCheckConfiguredCB
>> functions returning an allocated string in the passed pointer which the 
>> caller
>> is responsible for freeing.
> 
> Also note that this callback is documented in archive-modules.sgml, so
> that needs to be updated as well.  This also means you can't backpatch
> this change, or you risk breaking external software that implements this
> interface.

Absolutely, this is master only for v17.

> I suggest that 'msg' shouldn't be a global variable.  There's no need
> for that AFAICS; but if there is, this is a terrible name for it.

Agreed.

--
Daniel Gustafsson





Re: Implement a column store for pg?

2023-09-15 Thread Daniel Gustafsson
> On 15 Sep 2023, at 14:31, jacktby jacktby  wrote:
> 
> I’m trying to implement a new column store for pg. Is there a good example to
> reference?

There are open-source forks of postgres that have column-stores, like Greenplum
for example.  Be sure to check the license and existence of any patents on any
code before studying it though.  The most recent attempt to make a column-store
for PostgreSQL was, IIRC, zedstore.  The zedstore thread might give some
insights:

https://postgr.es/m/calfoeiuf-m5jg51mjupm5gn8u396o5sa2af5n97vtraedya...@mail.gmail.com

--
Daniel Gustafsson





RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-15 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

Thank you for reviewing! PSA new version patch set.

> Few comments:
> 1. Why is the FPI record (XLOG_FPI_FOR_HINT) not considered a record
> to be ignored? This can be generated during reading system tables.

Oh, I just missed. Written in comments atop the function, but not added here.
Added to white-list.

> 2.
> +binary_upgrade_validate_wal_record_types_after_lsn(PG_FUNCTION_ARGS)
> {
> ...
> + if (initial_record)
> + {
> + /* Initial record must be XLOG_CHECKPOINT_SHUTDOWN */
> + if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID,
> +   XLOG_CHECKPOINT_SHUTDOWN))
> + result = false;
> ...
> + if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID,
> XLOG_CHECKPOINT_SHUTDOWN) &&
> + !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID,
> XLOG_CHECKPOINT_ONLINE) &&
> + !CHECK_WAL_RECORD(rmid, info, RM_STANDBY_ID,
> XLOG_RUNNING_XACTS) &&
> + !CHECK_WAL_RECORD(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE))
> + result = false;
> ...
> }
> 
> Isn't it better to immediately return false if any unexpected WAL is
> found? This will avoid reading unnecessary WAL

IIUC we can exit the loop if result == false, so we do not have to read
unnecessary WAL. See the condition below. I used this approach because
private_data and xlogreader should be pfree()'d as cleanup.

```
/* Loop until all WALs are read, or unexpected record is found */
while (result && ReadNextXLogRecord(xlogreader))
{
```

> 3.
> +Datum
> +binary_upgrade_validate_wal_record_types_after_lsn(PG_FUNCTION_ARGS)
> +{
> ...
> +
> + CHECK_IS_BINARY_UPGRADE;
> +
> + /* Quick exit if the given lsn is larger than current one */
> + if (start_lsn >= curr_lsn)
> + PG_RETURN_BOOL(true);
> 
> Why do you return true here? My understanding was if the first record
> is not a shutdown checkpoint record then it should fail, if that is
> not true then I think we need to explain the same in comments.

I wondered what the behavior should be, because it is unexpected input for us
(note that this function can be used only for the upgrade purpose). But yes,
the initially read WAL record must be XLOG_SHUTDOWN_CHECKPOINT, so I changed it
as you said.

Also, I did a self-reviewing again and reworded comments.

BTW, the 0002 patch ports some functions from pg_walinspect, which may not be
elegant; the coupling between core and extensions should also be kept low. So I
made another patch which does not port anything and implements similar
functionality instead. I called the patch 0003, but it can be applied atop 0001
(not 0002). To make cfbot happy, it is attached as a txt file.
Could you please tell me which you prefer, 0002 or 0003?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

From d3fa36f3bc7f8ef4c0c541742ac8ad6d9eee5f09 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda 
Date: Thu, 14 Sep 2023 06:01:40 +
Subject: [PATCH v38] Another one: Reads all WAL records ahead
 confirmed_flush_lsn

---
 doc/src/sgml/ref/pgupgrade.sgml   |   5 +-
 src/backend/utils/adt/pg_upgrade_support.c| 130 ++
 src/bin/pg_upgrade/check.c|   8 +-
 src/bin/pg_upgrade/controldata.c  |  39 --
 src/bin/pg_upgrade/info.c |   6 +-
 src/bin/pg_upgrade/pg_upgrade.h   |   1 -
 .../t/003_logical_replication_slots.pl|  20 +++
 src/include/catalog/pg_proc.dat   |   6 +
 8 files changed, 164 insertions(+), 51 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 4e2281bae4..2588d6d7b8 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -418,10 +418,7 @@ make prefix=/usr/local/pgsql.new install
  
  
   
-   pg_replication_slots.confirmed_flush_lsn
-   of all slots on the old cluster must be the same as the latest
-   checkpoint location. This ensures that all the data has been replicated
-   before the upgrade.
+   The old cluster has replicated all the changes to the subscribers.
   
  
  
diff --git a/src/backend/utils/adt/pg_upgrade_support.c 
b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..340dc180be 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -11,14 +11,22 @@
 
 #include "postgres.h"
 
+#include "access/heapam_xlog.h"
+#include "access/rmgr.h"
+#include "access/xlog.h"
+#include "access/xlog_internal.h"
+#include "access/xlogutils.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/heap.h"
 #include "catalog/namespace.h"
+#include "catalog/pg_control.h"
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
 #include "miscadmin.h"
+#include "storage/standbydefs.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"
 
 
 #define CHECK_IS_BINARY_UPGRADE										\
@@ -29,6 +37,9 @@ do {												\

Implement a column store for pg?

2023-09-15 Thread jacktby jacktby
I’m trying to implement a new column store for pg. Is there a good example to
reference?



Re: POC: Extension for adding distributed tracing - pg_tracing

2023-09-15 Thread Nikita Malakhov
Hi!
Great you continue to work on this patch!
I'm checking out the newest changes.


On Fri, Sep 15, 2023 at 2:32 PM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Hi,
>
> > Renaming/Refactoring:
> > - All spans are now tracked in the palloced current_trace_spans buffer
> > compared to top_span and parse_span being kept in a static variable
> > before.
> > - I've renamed query_spans to top_span. A top_span serves as the
> > parent for all spans in a specific nested level.
> > - More debugging information and assertions. Spans now track their
> > nested level, if they've been ended and if they are a top_span.
> >
> > Changes:
> > - I've added the subxact_count to the span's metadata. This can help
> > identify the moment a subtransaction was started.
> > - I've reworked nested queries and utility statements. Previously, I
> > made the assumptions that we could only have one top_span per nested
> > level which is not the case. Some utility statements can execute
> > multiple queries in the same nested level. Tracing utility statement
> > now works better (see image of tracing a create extension).
>
> Many thanks for the updated patch.
>
> If it's not too much trouble, please use `git format-patch`. This
> makes applying and testing the patch much easier. Also you can provide
> a commit message which 1. will simplify the work for the committer and
> 2. can be reviewed as well.
>
> The tests fail on CI [1]. I tried to run them locally and got the same
> results. I'm using full-build.sh from this repository [2] for
> Autotools and the following one-liner for Meson:
>
> ```
> time CPPFLAGS="-DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS" sh -c 'git
> clean -dfx && meson setup --buildtype debug -Dcassert=true
> -DPG_TEST_EXTRA="kerberos ldap ssl load_balance" -Dldap=disabled
> -Dssl=openssl -Dtap_tests=enabled
> -Dprefix=/home/eax/projects/pginstall build && ninja -C build && ninja
> -C build docs && PG_TEST_EXTRA=1 meson test -C build'
> ```
>
> The comments for pg_tracing.c don't seem to be formatted properly.
> Please make sure to run pg_indent. See src/tools/pgindent/README
>
> The interface pg_tracing_spans(true | false) doesn't strike me as
> particularly readable. Maybe this function should be private and the
> interface be like pg_tracing_spans() and pg_tracing_consume_spans().
> Alternatively you could use named arguments in the tests, but I don't
> think this is a preferred solution.
>
> Speaking of the tests I suggest adding a bit more comments before
> every (or most) of the queries. Figuring out what they test could be
> not particularly straightforward for somebody who will make changes
> after the patch will be accepted.
>
> [1]: http://cfbot.cputube.org/
> [2]: https://github.com/afiskon/pgscripts/
>
> --
> Best regards,
> Aleksander Alekseev
>


-- 
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/


Re: POC: Extension for adding distributed tracing - pg_tracing

2023-09-15 Thread Aleksander Alekseev
Hi,

> Renaming/Refactoring:
> - All spans are now tracked in the palloced current_trace_spans buffer
> compared to top_span and parse_span being kept in a static variable
> before.
> - I've renamed query_spans to top_span. A top_span serves as the
> parent for all spans in a specific nested level.
> - More debugging information and assertions. Spans now track their
> nested level, if they've been ended and if they are a top_span.
>
> Changes:
> - I've added the subxact_count to the span's metadata. This can help
> identify the moment a subtransaction was started.
> - I've reworked nested queries and utility statements. Previously, I
> made the assumptions that we could only have one top_span per nested
> level which is not the case. Some utility statements can execute
> multiple queries in the same nested level. Tracing utility statement
> now works better (see image of tracing a create extension).

Many thanks for the updated patch.

If it's not too much trouble, please use `git format-patch`. This
makes applying and testing the patch much easier. Also you can provide
a commit message which 1. will simplify the work for the committer and
2. can be reviewed as well.

The tests fail on CI [1]. I tried to run them locally and got the same
results. I'm using full-build.sh from this repository [2] for
Autotools and the following one-liner for Meson:

```
time CPPFLAGS="-DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS" sh -c 'git
clean -dfx && meson setup --buildtype debug -Dcassert=true
-DPG_TEST_EXTRA="kerberos ldap ssl load_balance" -Dldap=disabled
-Dssl=openssl -Dtap_tests=enabled
-Dprefix=/home/eax/projects/pginstall build && ninja -C build && ninja
-C build docs && PG_TEST_EXTRA=1 meson test -C build'
```

The comments for pg_tracing.c don't seem to be formatted properly.
Please make sure to run pg_indent. See src/tools/pgindent/README

The interface pg_tracing_spans(true | false) doesn't strike me as
particularly readable. Maybe this function should be private and the
interface be like pg_tracing_spans() and pg_tracing_consume_spans().
Alternatively you could use named arguments in the tests, but I don't
think this is a preferred solution.
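
I.e. something along these lines (names as proposed above, not the
extension's current API):

```
SELECT * FROM pg_tracing_spans();          -- peek at the collected spans
SELECT * FROM pg_tracing_consume_spans();  -- return the spans and clear the buffer
```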

Speaking of the tests I suggest adding a bit more comments before
every (or most) of the queries. Figuring out what they test could be
not particularly straightforward for somebody who will make changes
after the patch will be accepted.

[1]: http://cfbot.cputube.org/
[2]: https://github.com/afiskon/pgscripts/

-- 
Best regards,
Aleksander Alekseev




Re: Unlogged relation copy is not fsync'd

2023-09-15 Thread Heikki Linnakangas

On 05/09/2023 21:20, Robert Haas wrote:

In other words, somehow it feels like we ought to be trying to defer
the fsync here until a clean shutdown actually occurs, instead of
performing it immediately.


+1


Admittedly, the bookkeeping seems like a problem, so maybe this is
the best we can do, but it's clearly worse than what we do in other
cases.
I think we can do it; I have been working on a patch along those lines 
on the side. But I want to focus on a smaller, backpatchable fix in this 
thread.



Thinking about this some more, I think this is still not 100% correct, 
even with the patch I posted earlier:



/*
 * When we WAL-logged rel pages, we must nonetheless fsync them.  The
 * reason is that since we're copying outside shared buffers, a CHECKPOINT
 * occurring during the copy has no way to flush the previously written
 * data to disk (indeed it won't know the new rel even exists).  A crash
 * later on would replay WAL from the checkpoint, therefore it wouldn't
 * replay our earlier WAL entries. If we do not fsync those pages here,
 * they might still not be on disk when the crash occurs.
 */
if (use_wal || relpersistence == RELPERSISTENCE_UNLOGGED)
	smgrimmedsync(dst, forkNum);


Let's walk through each case one by one:

1. Temporary table. No fsync() needed. This is correct.

2. Unlogged rel, main fork. Needs to be fsync'd, because we skipped WAL, 
and also because we bypassed the buffer manager. Correct.


3. Unlogged rel, init fork. Needs to be fsync'd, even though we 
WAL-logged it, because we bypassed the buffer manager. Like the comment 
explains. This is now correct with the patch.


4. Permanent rel, use_wal == true. Needs to be fsync'd, even though we 
WAL-logged it, because we bypassed the buffer manager. Like the comment 
explains. Correct.


5. Permanent rel, use_wal == false. We skip fsync, because it means that 
it's a new relation, so we have a sync request pending for it. (An 
assertion for that would be nice). At the end of transaction, in 
smgrDoPendingSyncs(), we will either fsync it or we will WAL-log all the 
pages if it was a small relation. I believe this is *wrong*. If 
smgrDoPendingSyncs() decides to WAL-log the pages, we have the same race 
condition that's explained in the comment, because we bypassed the 
buffer manager:


1. RelationCopyStorage() copies some of the pages.
2. Checkpoint happens, which fsyncs the relation (smgrcreate() 
registered a dirty segment when the relation was created)

3. RelationCopyStorage() copies the rest of the pages.
4. smgrDoPendingSyncs() WAL-logs all the pages.
5. Another checkpoint happens. It does *not* fsync the relation.
6. Crash.

WAL replay will not see the WAL-logged pages, because they were 
WAL-logged before the last checkpoint. And the contents were not fsync'd 
either.


In other words, we must do smgrimmedsync() here for permanent relations, 
even if use_wal==false, because we bypassed the buffer manager. Same 
reason we need to do it with use_wal==true.


For a backportable fix, I think we should change this to only exempt 
temporary tables, and call smgrimmedsync() for all other cases. Maybe it 
would be safe to skip it in some cases, but it feels too dangerous to be 
clever here. The other similar callers of smgrimmedsync() in nbtsort.c, 
gistbuild.c, and rewriteheap.c, need similar treatment.
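
To make that concrete, the backportable rule above amounts to roughly
this (a sketch of the intent only, not a tested patch):

	/*
	 * All the cases above except temporary relations bypassed the
	 * buffer manager, so the pages must hit disk before the relation
	 * becomes visible to other transactions.
	 */
	if (relpersistence != RELPERSISTENCE_TEMP)
		smgrimmedsync(dst, forkNum);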


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Unlogged relation copy is not fsync'd

2023-09-15 Thread Heikki Linnakangas

On 04/09/2023 16:59, Melanie Plageman wrote:

The patch looks reasonable to me. Is this [1] case in hash index build
that I reported but didn't take the time to reproduce similar?

[1] 
https://www.postgresql.org/message-id/CAAKRu_bPc81M121pOEU7W%3D%2BwSWEebiLnrie4NpaFC%2BkWATFtSA%40mail.gmail.com


Yes, I think you're right. Any caller of smgrwrite() must either:

a) Call smgrimmedsync(), after smgrwrite() and before the relation is 
visible to other transactions. Regardless of the 'skipFsync' parameter! 
I don't think this is ever completely safe unless it's a new relation. 
Like you pointed out with the hash index case. Or:


b) Hold the buffer lock, so that if a checkpoint happens, it cannot 
"race past" the page without seeing the sync request.


The comment on smgrwrite() doesn't make this clear. It talks about 
'skipFsync', and gives the impression that as long as you pass 
'skipFsync=false', you don't need to worry about fsyncing. But that is 
not true. Even in these bulk loading cases where we currently call 
smgrimmedsync(), we would *still* need to call smgrimmedsync() if we 
used 'skipFsync=false'.
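
A draft of the kind of comment I have in mind (my wording, nothing
committed):

	/*
	 * Note: even with skipFsync = false, the write is merely registered
	 * for sync at the next checkpoint.  A caller that bypasses the
	 * buffer manager must therefore either call smgrimmedsync() before
	 * the relation becomes visible to other transactions, or hold the
	 * buffer lock so that a checkpoint cannot race past the page
	 * without seeing the sync request.
	 */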


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: pg_upgrade and logical replication

2023-09-15 Thread vignesh C
On Fri, 15 Sept 2023 at 15:08, vignesh C  wrote:
>
> On Tue, 12 Sept 2023 at 14:25, Hayato Kuroda (Fujitsu)
>  wrote:
> >
> > Dear Vignesh,
> >
> > Thank you for updating the patch! Here are some comments.
> >
> > Sorry if there are duplicate comments - the thread revived recently so I 
> > might
> > lose my memory.
> >
> > 01. General
> >
> Is there a possibility that the apply worker on the old cluster connects
> to the publisher during the upgrade? Regarding pg_upgrade on the
> publisher, we refuse TCP/IP connections from remote hosts and the port
> number is also changed, so we can assume that the subscriber does not
> connect to it. But IIUC such settings may not affect the connection
> source, so the apply worker may still try to connect to the publisher.
> Also, are there any hazards if it happens?
>
Yes, there is a possibility that the apply worker gets started and new
transaction data is synced from the publisher. I have made a fix
not to start the launcher process in binary upgrade mode, as we don't
want the launcher to start an apply worker during the upgrade.

Another approach to solve this, as suggested by my colleague Hou-san,
would be to set max_logical_replication_workers = 0 while upgrading. I
will evaluate this and update the next version of the patch
accordingly.
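
Conceptually that would mean starting the clusters during the upgrade
with apply workers disabled, along these lines (where exactly pg_upgrade
would inject the option still needs to be worked out; illustrative only):

	pg_ctl start -o "-c max_logical_replication_workers=0"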

Regards,
Vignesh




Re: Improve the log message output of basic_archive when basic_archive.archive_directory parameter is not set

2023-09-15 Thread Alvaro Herrera
On 2023-Sep-15, Daniel Gustafsson wrote:

> -basic_archive_configured(ArchiveModuleState *state)
> +basic_archive_configured(ArchiveModuleState *state, const char **errmsg)
> 
> The variable name errmsg implies that it will contain the errmsg() data when 
> it
> in fact is used for errhint() data, so it should be named accordingly.
> 
> It's probably better to define the interface as ArchiveCheckConfiguredCB
> functions returning an allocated string in the passed pointer which the caller
> is responsible for freeing.

Also note that this callback is documented in archive-modules.sgml, so
that needs to be updated as well.  This also means you can't backpatch
this change, or you risk breaking external software that implements this
interface.

I suggest that 'msg' shouldn't be a global variable.  There's no need
for that AFAICS; but if there is, this is a terrible name for it.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Speed up transaction completion faster after many relations are accessed in a transaction

2023-09-15 Thread Heikki Linnakangas

On 11/09/2023 15:00, David Rowley wrote:

On Wed, 5 Jul 2023 at 21:44, Heikki Linnakangas  wrote:



index 296dc82d2ee..edb8b6026e5 100644
--- a/src/backend/commands/discard.c
+++ b/src/backend/commands/discard.c
@@ -71,7 +71,7 @@ DiscardAll(bool isTopLevel)
 Async_UnlistenAll();
-   LockReleaseAll(USER_LOCKMETHOD, true);
+   LockReleaseSession(USER_LOCKMETHOD);
 ResetPlanCache();


This assumes that there are no transaction-level advisory locks. I think
that's OK. It took me a while to convince myself of that, though. I
think we need a high level comment somewhere that explains what
assumptions we make on which locks can be held in session mode and which
in transaction mode.


Isn't it ok because DISCARD ALL cannot run inside a transaction block,
so there should be no locks taken apart from possibly session-level
locks?


Hmm, sounds valid. I think I convinced myself that it's OK through some 
other reasoning, but I don't remember it now.



I've added a call to LockAssertNoneHeld(false) in there.


I don't see it in the patch?

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: psql: Add command to use extended query protocol

2023-09-15 Thread Alvaro Herrera
On 2023-Sep-14, Tobias Bussmann wrote:

> In one of my environments, this feature didn't work as expected.
> Digging into it, I found that it is incompatible with FETCH_COUNT
> being set. Sorry for not recognising this during the betas.
> 
> Attached a simple patch with tests running the cursor declaration
> through PQexecParams instead of PQexec.

Hmm, strange.  I had been trying to make \bind work with the extended
protocol, and my findings were that there are interactions with the code
that was added for pipeline mode(*).  I put that research aside to work
on other things, but intended to get back to it soon ... I'm really
surprised that it works for you here.

Maybe your tests are just not extensive enough to show that it fails.

(*) This is not actually proven, but Peter had told me that his \bind
stuff worked when he first implemented it, before pipeline mode landed.
Because that's the only significant change that has happened to the
libpq code lately, it's a reasonable hypothesis.
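
(For context, the attached fix essentially declares the cursor via the
extended protocol so that the bound parameters can come along,
conceptually:

    res = PQexecParams(pset.db, buf.data, nparams, NULL,
                       (const char *const *) paramValues,
                       NULL, NULL, 0);

instead of PQexec(pset.db, buf.data); variable names here are
approximate, the attached patch is authoritative.)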

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"No deja de ser humillante para una persona de ingenio saber
que no hay tonto que no le pueda enseñar algo." (Jean B. Say)




Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-09-15 Thread Damir Belyalov
> Since the v5 patch no longer applied, I have updated it.
Thank you for updating the patch. I did a small review of it and
corrected some formatting.


> - COPY with a datatype error that can't be handled as a soft error
>
> I didn't know the proper way to test this, but I found that the data type
> widget's input function widget_in() is defined to raise a hard error in
> regress.c; the attached patch adds a test using it.
>

This test seems a bit odd because of the "widget" type. The hard error
is already thrown by the previous test with missing data. It would also be
interesting to list all the cases in which a hard error can be thrown.
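
For reference, with the attached patch the option is used like this
(file path is just an example):

COPY t FROM '/tmp/data.csv' WITH (FORMAT csv, IGNORE_DATATYPE_ERRORS);
WARNING:  2 rows were skipped due to data type incompatibility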

Regards,
Damir Belyalov
Postgres Professional
From 0e1193e00bb5ee810a015a2baaf7c79e395a54c7 Mon Sep 17 00:00:00 2001
From: Damir Belyalov 
Date: Fri, 15 Sep 2023 11:14:57 +0300
Subject: [PATCH] ignore errors

---
 doc/src/sgml/ref/copy.sgml   | 13 +
 src/backend/commands/copy.c  | 13 +
 src/backend/commands/copyfrom.c  | 37 
 src/backend/commands/copyfromparse.c | 20 ++---
 src/bin/psql/tab-complete.c  |  3 +-
 src/include/commands/copy.h  |  1 +
 src/include/commands/copyfrom_internal.h |  3 ++
 src/test/regress/expected/copy2.out  | 28 ++
 src/test/regress/sql/copy2.sql   | 26 +
 9 files changed, 139 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 4d614a0225..d5cdbb4025 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { table_name [ ( column_name [, ...] ) | * }
 FORCE_NOT_NULL ( column_name [, ...] )
 FORCE_NULL ( column_name [, ...] )
+IGNORE_DATATYPE_ERRORS [ boolean ]
 ENCODING 'encoding_name'
 
  
@@ -370,6 +371,18 @@ COPY { table_name [ ( 

 
+   <varlistentry>
+    <term><literal>IGNORE_DATATYPE_ERRORS</literal></term>
+    <listitem>
+     <para>
+      Drops rows that contain malformed data while copying. These are rows
+      with columns where the data type's input-function raises an error.
+      This option is not allowed when using binary format.  Note that this
+      is only supported in current COPY syntax.
+     </para>
+    </listitem>
+   </varlistentry>
+

 ENCODING
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f14fae3308..beb73f5357 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		format_specified = false;
 	bool		freeze_specified = false;
 	bool		header_specified = false;
+	bool		ignore_datatype_errors_specified = false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -458,6 +459,13 @@ ProcessCopyOptions(ParseState *pstate,
 			freeze_specified = true;
 			opts_out->freeze = defGetBoolean(defel);
 		}
+		else if (strcmp(defel->defname, "ignore_datatype_errors") == 0)
+		{
+			if (ignore_datatype_errors_specified)
+				errorConflictingDefElem(defel, pstate);
+			ignore_datatype_errors_specified = true;
+			opts_out->ignore_datatype_errors = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "delimiter") == 0)
 		{
 			if (opts_out->delim)
@@ -594,6 +602,11 @@ ProcessCopyOptions(ParseState *pstate,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify DEFAULT in BINARY mode")));
 
+	if (opts_out->binary && opts_out->ignore_datatype_errors)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify IGNORE_DATATYPE_ERRORS in BINARY mode")));
+
 	/* Set defaults for omitted options */
 	if (!opts_out->delim)
 		opts_out->delim = opts_out->csv_mode ? "," : "\t";
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 70871ed819..b18aea6376 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -752,6 +752,14 @@ CopyFrom(CopyFromState cstate)
 		ti_options |= TABLE_INSERT_FROZEN;
 	}
 
+	/* Set up soft error handler for IGNORE_DATATYPE_ERRORS */
+	if (cstate->opts.ignore_datatype_errors)
+	{
+		ErrorSaveContext escontext = {T_ErrorSaveContext};
+		escontext.details_wanted = true;
+		cstate->escontext = escontext;
+	}
+
 	/*
 	 * We need a ResultRelInfo so we can use the regular executor's
 	 * index-entry-making machinery.  (There used to be a huge amount of code
@@ -987,7 +995,36 @@ CopyFrom(CopyFromState cstate)
 
 		/* Directly store the values/nulls array in the slot */
 		if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+		{
+			if (cstate->opts.ignore_datatype_errors &&
+				cstate->ignored_errors_count > 0)
+				ereport(WARNING,
+						errmsg("%zd rows were skipped due to data type incompatibility",
+							   cstate->ignored_errors_count));
 			break;
+		}
+
+		/* Soft error occurred, skip this tuple and log the reason */
+		if (cstate->escontext.error_occurred)
+		{
+			ErrorSaveContext new_escontext = {T_ErrorSaveContext};
+
+			/* Adjust elevel so we don't jump out */
+			

Re: Improve the log message output of basic_archive when basic_archive.archive_directory parameter is not set

2023-09-15 Thread Daniel Gustafsson
> On 15 Sep 2023, at 11:38, bt23nguyent  wrote:
> 
> Hi,
> 
> When archive_library is set to 'basic_archive' but 
> basic_archive.archive_directory is not set, WAL archiving doesn't work and 
> only the following warning message is logged.
> 
>$ emacs $PGDATA/postgresql.conf
>archive_mode = on
>archive_library = 'basic_archive'
> 
>$ bin/pg_ctl -D $PGDATA restart
>
>WARNING:  archive_mode enabled, yet archiving is not configured
> 
> The issue here is that this warning message doesn't suggest any hint 
> regarding the cause of WAL archiving failure. In other words, I think that 
> the log message in this case should report that WAL archiving failed because 
> basic_archive.archive_directory is not set.

That doesn't seem unreasonable, and I can imagine other callbacks having the
need to give errhints as well to assist the user.

> Thus, I think it's worth implementing a new patch that improves that warning
> message, and here is the patch for that.

-basic_archive_configured(ArchiveModuleState *state)
+basic_archive_configured(ArchiveModuleState *state, const char **errmsg)

The variable name errmsg implies that it will contain the errmsg() data when it
in fact is used for errhint() data, so it should be named accordingly.

It's probably better to define the interface as ArchiveCheckConfiguredCB
functions returning an allocated string in the passed pointer which the caller
is responsible for freeing.
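
That is, roughly the following (a sketch only; the exact memory-context
and lifetime rules would need to be spelled out in archive-modules.sgml):

typedef bool (*ArchiveCheckConfiguredCB) (ArchiveModuleState *state,
                                          char **logdetail);

static bool
basic_archive_configured(ArchiveModuleState *state, char **logdetail)
{
    if (archive_directory == NULL || archive_directory[0] == '\0')
    {
        *logdetail = pstrdup("basic_archive.archive_directory is not set");
        return false;
    }
    return true;
}

with the archiver pfree()ing the string after reporting the hint.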

--
Daniel Gustafsson





pgBufferUsage.blk_{read|write}_time are zero although there are pgBufferUsage.local_blks_{read|written}

2023-09-15 Thread Nazir Bilal Yavuz
Hi,

I found that pgBufferUsage.blk_{read|write}_time are zero although there
are pgBufferUsage.local_blks_{read|written}. For example, when you run
(track_io_timing should be on):

CREATE EXTENSION pg_stat_statements;
CREATE TEMP TABLE example_table (id serial PRIMARY KEY, data text);
INSERT INTO example_table (data) SELECT 'Some data'
FROM generate_series(1, 10);
UPDATE example_table SET data = 'Updated data';
SELECT query, local_blks_read, local_blks_written,
blk_read_time, blk_write_time FROM pg_stat_statements
WHERE query like '%UPDATE%';

on master:

query  | UPDATE example_table SET data = $1
local_blks_read| 467
local_blks_written | 2087
blk_read_time  | 0
blk_write_time | 0

There are two reasons for that:

1- When local_blks_{read|written} are incremented,
pgstat_count_io_op_time() is called with IOOBJECT_TEMP_RELATION. But in
pgstat_count_io_op_time(), pgBufferUsage.blk_{read|write}_time increments
are called only when io_object is IOOBJECT_RELATION. The first patch
attached fixes that.

2- In ExtendBufferedRelLocal() and in ExtendBufferedRelShared(), extend
calls increment local_blks_written and shared_blks_written respectively.
But these extends are not counted while calculating blk_write_time. If
there is no specific reason not to do that, I think these extends need to
be counted in blk_write_time. The second patch attached does that.

Results after applying first patch:

query  | UPDATE example_table SET data = $1
local_blks_read| 467
local_blks_written | 2087
blk_read_time  | 0.30085
blk_write_time | 1.475123

Results after applying both patches:

query  | UPDATE example_table SET data = $1
local_blks_read| 467
local_blks_written | 2087
blk_read_time  | 0.329597
blk_write_time | 4.050305

Any kind of feedback would be appreciated.

Regards,
Nazir Bilal Yavuz
Microsoft
From 96f7b82e2ec5e8f68b509ea58131fba42deac7c0 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Fri, 15 Sep 2023 11:55:43 +0300
Subject: [PATCH v1 2/2] Include IOOp IOOP_EXTENDs while calculating block
 write times

---
 src/backend/utils/activity/pgstat_io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index c8058b57962..56051fc6072 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -119,7 +119,7 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 		INSTR_TIME_SET_CURRENT(io_time);
 		INSTR_TIME_SUBTRACT(io_time, start_time);
 
-		if (io_op == IOOP_WRITE)
+		if (io_op == IOOP_WRITE || io_op == IOOP_EXTEND)
 		{
 			pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time));
 			if (io_object == IOOBJECT_RELATION
-- 
2.40.1

From de0f429280b24c42c951d86a468118ed09183a6e Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz 
Date: Fri, 15 Sep 2023 12:35:01 +0300
Subject: [PATCH v1 1/2] Increase pgBufferUsage.blk_{read|write}_time when
 IOObject is IOOBJECT_TEMP_RELATION

---
 src/backend/utils/activity/pgstat_io.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index eb7d35d4225..c8058b57962 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -122,13 +122,15 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 		if (io_op == IOOP_WRITE)
 		{
 			pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time));
-			if (io_object == IOOBJECT_RELATION)
+			if (io_object == IOOBJECT_RELATION
+			|| io_object == IOOBJECT_TEMP_RELATION)
 INSTR_TIME_ADD(pgBufferUsage.blk_write_time, io_time);
 		}
 		else if (io_op == IOOP_READ)
 		{
 			pgstat_count_buffer_read_time(INSTR_TIME_GET_MICROSEC(io_time));
-			if (io_object == IOOBJECT_RELATION)
+			if (io_object == IOOBJECT_RELATION
+			|| io_object == IOOBJECT_TEMP_RELATION)
 INSTR_TIME_ADD(pgBufferUsage.blk_read_time, io_time);
 		}
 
-- 
2.40.1



Re: pg_upgrade and logical replication

2023-09-15 Thread vignesh C
On Tue, 12 Sept 2023 at 18:52, Zhijie Hou (Fujitsu)
 wrote:
>
> On Monday, September 11, 2023 6:32 PM vignesh C  wrote:
> >
> >
> > The attached v7 patch has the changes for the same.
>
> Thanks for updating the patch, here are few comments:
>
>
> 1.
>
> +/*
> + * binary_upgrade_sub_replication_origin_advance
> + *
> + * Update the remote_lsn for the subscriber's replication origin.
> + */
> +Datum
> +binary_upgrade_sub_replication_origin_advance(PG_FUNCTION_ARGS)
> +{
>
> Is there any usage apart from pg_upgrade for this function? If not, I think
> we'd better move this function to pg_upgrade_support.c. If yes, maybe it
> would be better to rename it to something more general.

Moved to pg_upgrade_support.c and renamed to binary_upgrade_replorigin_advance

> 2.
>
> + * Verify that all subscriptions have a valid remote_lsn and don't contain
> + * any table in srsubstate different than ready ('r').
> + */
> +static void
> +check_for_subscription_state(ClusterInfo *cluster)
>
> I think we'd better follow the same style of
> check_for_isn_and_int8_passing_mismatch() to record the invalid things in a
> file.

Modified

>
> 3.
>
> +   if (fout->dopt->binary_upgrade && fout->remoteVersion >= 
> 10)
> +   {
> +   appendPQExpBuffer(query,
> + "SELECT 
> binary_upgrade_create_sub_rel_state('%s', %u, '%c'",
> + subrinfo->dobj.name,
>
> I think we'd better consider using appendStringLiteral or related function for
> the dobj.name here to make sure the string convertion is safe.
>

Modified

> 4.
>
> The following commit message may need update:
> "binary_upgrade_create_sub_rel_state SQL function, and also provides an
> additional LSN parameter for CREATE SUBSCRIPTION to restore the underlying
> replication origin remote LSN. "
>
> I think we have changed to another approach which doesn't provide new 
> parameter
> in DDL.

Modified

>
> 5.
> +   /* Fetch the existing tuple. */
> +   tup = SearchSysCacheCopy2(SUBSCRIPTIONNAME, MyDatabaseId,
> + 
> CStringGetDatum(subname));
>
> Since we don't modify the tuple here, SearchSysCache2 seems enough.
>
>
> 6.
> +   "LEFT 
> JOIN pg_catalog.pg_database d"
> +   "  ON 
> d.oid = s.subdbid "
> +   
> "WHERE coalesce(remote_lsn, '0/0') = '0/0'");
>
> For the subscriptions that were just created and finished the table sync but
> haven't applied any changes, their remote_lsn will also be 0/0. Do we
> need to report an ERROR in this case?
I will handle this in the next version.

Thanks for the comments, the v8 patch attached at [1] has the changes
for the same.
[1] - 
https://www.postgresql.org/message-id/CALDaNm1JzqTreCUrhNu5E1gq7Q8r_u3%2BFrisyT7moOED%3DUdoCg%40mail.gmail.com

Regards,
Vignesh




Improve the log message output of basic_archive when basic_archive.archive_directory parameter is not set

2023-09-15 Thread bt23nguyent

Hi,

When archive_library is set to 'basic_archive' but 
basic_archive.archive_directory is not set, WAL archiving doesn't work 
and only the following warning message is logged.


$ emacs $PGDATA/postgresql.conf
archive_mode = on
archive_library = 'basic_archive'

$ bin/pg_ctl -D $PGDATA restart

WARNING:  archive_mode enabled, yet archiving is not configured

The issue here is that this warning message doesn't suggest any hint 
regarding the cause of WAL archiving failure. In other words, I think 
that the log message in this case should report that WAL archiving 
failed because basic_archive.archive_directory is not set. Thus, I think 
it's worth implementing a new patch that improves that warning message, 
and here is the patch for that.


Best regards,
Tung Nguyendiff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c
index 4d78c31859..dedc53c367 100644
--- a/contrib/basic_archive/basic_archive.c
+++ b/contrib/basic_archive/basic_archive.c
@@ -48,7 +48,7 @@ typedef struct BasicArchiveData
 static char *archive_directory = NULL;
 
 static void basic_archive_startup(ArchiveModuleState *state);
-static bool basic_archive_configured(ArchiveModuleState *state);
+static bool basic_archive_configured(ArchiveModuleState *state, const char **errmsg);
 static bool basic_archive_file(ArchiveModuleState *state, const char *file, const char *path);
 static void basic_archive_file_internal(const char *file, const char *path);
 static bool check_archive_directory(char **newval, void **extra, GucSource source);
@@ -159,9 +159,15 @@ check_archive_directory(char **newval, void **extra, GucSource source)
  * Checks that archive_directory is not blank.
  */
 static bool
-basic_archive_configured(ArchiveModuleState *state)
+basic_archive_configured(ArchiveModuleState *state, const char **errmsg)
 {
-	return archive_directory != NULL && archive_directory[0] != '\0';
+	if (archive_directory == NULL || archive_directory[0] == '\0')
+	{
+		*errmsg = "WAL archiving failed because basic_archive.archive_directory is not set";
+		return false;
+	}
+	else
+		return true;
 }
 
 /*
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 46af349564..100e33f48d 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -91,6 +91,7 @@ typedef struct PgArchData
 } PgArchData;
 
 char	   *XLogArchiveLibrary = "";
+char   *msg;
 
 
 /* --
@@ -410,10 +411,11 @@ pgarch_ArchiverCopyLoop(void)
 
 			/* can't do anything if not configured ... */
 			if (ArchiveCallbacks->check_configured_cb != NULL &&
-				!ArchiveCallbacks->check_configured_cb(archive_module_state))
+				!ArchiveCallbacks->check_configured_cb(archive_module_state, &msg))
 			{
 				ereport(WARNING,
-						(errmsg("archive_mode enabled, yet archiving is not configured")));
+						(errmsg("archive_mode enabled, yet archiving is not configured")),
+						(msg != NULL) ? errhint("%s", msg) : 0);
 				return;
 			}
 
diff --git a/src/include/archive/archive_module.h b/src/include/archive/archive_module.h
index 679ce5a6db..7f7fbeae8a 100644
--- a/src/include/archive/archive_module.h
+++ b/src/include/archive/archive_module.h
@@ -36,7 +36,7 @@ typedef struct ArchiveModuleState
  * archive modules documentation.
  */
 typedef void (*ArchiveStartupCB) (ArchiveModuleState *state);
-typedef bool (*ArchiveCheckConfiguredCB) (ArchiveModuleState *state);
+typedef bool (*ArchiveCheckConfiguredCB) (ArchiveModuleState *state, const char **errmsg);
 typedef bool (*ArchiveFileCB) (ArchiveModuleState *state, const char *file, const char *path);
 typedef void (*ArchiveShutdownCB) (ArchiveModuleState *state);

Re: pg_upgrade and logical replication

2023-09-15 Thread vignesh C
On Tue, 12 Sept 2023 at 14:25, Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Vignesh,
>
> Thank you for updating the patch! Here are some comments.
>
> Sorry if there are duplicate comments - the thread revived recently so I might
> lose my memory.
>
> 01. General
>
> Is there a possibility that the apply worker on the old cluster connects
> to the publisher during the upgrade? Regarding pg_upgrade on the
> publisher, we refuse TCP/IP connections from remote hosts and the port
> number is also changed, so we can assume that the subscriber does not
> connect to it. But IIUC such settings may not affect the connection
> source, so the apply worker may still try to connect to the publisher.
> Also, are there any hazards if it happens?

Yes, there is a possibility that the apply worker gets started and new
transaction data is synced from the publisher. I have made a fix
not to start the launcher process in binary upgrade mode, as we don't
want the launcher to start an apply worker during the upgrade.

> 02. Upgrade functions
>
> Two functions - binary_upgrade_create_sub_rel_state and 
> binary_upgrade_sub_replication_origin_advance
> should be located at pg_upgrade_support.c. Also, CHECK_IS_BINARY_UPGRADE() 
> macro
> can be used.

Modified

> 03. Parameter combinations
>
> IIUC getSubscriptionTables() should exit quickly if --no-subscriptions
> is specified, whereas binary_upgrade_create_sub_rel_state() fails.

Modified

>
> 04. I failed my test
>
> I executed attached script but failed to upgrade:
>
> ```
> Restoring database schemas in the new cluster
>   postgres
> *failure*
>
> Consult the last few lines of 
> "data_N3/pg_upgrade_output.d/20230912T054546.320/log/pg_upgrade_dump_5.log" 
> for
> the probable cause of the failure.
> Failure, exiting
> ```
>
> I checked the log and found that binary_upgrade_create_sub_rel_state() does 
> not
> support skipping the fourth argument:
>
> ```
> pg_restore: from TOC entry 4059; 16384 16387 SUBSCRIPTION TABLE sub sub 
> postgres
> pg_restore: error: could not execute query: ERROR:  function 
> binary_upgrade_create_sub_rel_state(unknown, integer, unknown) does not exist
> LINE 1: SELECT binary_upgrade_create_sub_rel_state('sub', 16384, 'r'...
>^
> HINT:  No function matches the given name and argument types. You might need 
> to add explicit type casts.
> Command was: SELECT binary_upgrade_create_sub_rel_state('sub', 16384, 'r');
> ```
>
> IIUC if we allow skipping arguments, we must define wrappers like 
> pg_copy_logical_replication_slot_*.
> Another approach is that pg_dump always dumps srsublsn even if it is NULL.
Modified

The attached v8 version patch has the changes for the same.

Regards,
Vignesh
From 84dff766b98381dafd7382b7521107a9f76608dd Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Thu, 14 Sep 2023 09:59:46 +0530
Subject: [PATCH v8 1/3] Don't start launcher process in binary upgrade mode.

We don't want the launcher to run in binary upgrade mode because the
launcher might start an apply worker, which would start receiving changes
from the publisher and update the old cluster before the upgrade
is completed.
---
 src/backend/replication/logical/launcher.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 7882fc91ce..ec8cfd40ef 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -925,6 +925,14 @@ ApplyLauncherRegister(void)
 {
 	BackgroundWorker bgw;
 
+	/*
+	 * We don't want the launcher to run in binary upgrade mode because
+	 * it might start an apply worker, which would start receiving changes
+	 * from the publisher before the physical files are put in place.
+	 */
+	if (IsBinaryUpgrade)
+		return;
+
 	if (max_logical_replication_workers == 0)
 		return;
 
-- 
2.34.1

From 1d0a182cbfbab749b267e2400b17e99f88bd5571 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Thu, 7 Sep 2023 11:37:36 +0530
Subject: [PATCH v8 2/3] Preserve the full subscription's state during
 pg_upgrade

Previously, only the subscription metadata information was preserved.  Without
the list of relations and their state it's impossible to re-enable the
subscriptions without missing some records as the list of relations can only be
refreshed after enabling the subscription (and therefore starting the apply
worker).  Even if we added a way to refresh the subscription while enabling a
publication, we still wouldn't know which relations are new on the publication
side, and therefore should be fully synced, and which shouldn't.

To fix this problem, this patch teaches pg_dump to restore the content of
pg_subscription_rel from the old cluster by using
binary_upgrade_create_sub_rel_state SQL function. This is supported only
in binary upgrade mode.

The new SQL binary_upgrade_create_sub_rel_state function has the following
syntax:
SELECT binary_upgrade_create_sub_rel_state(subname text, relid oid, state char [,sublsn pg_lsn])

In the 

Re: Possibility to disable `ALTER SYSTEM`

2023-09-15 Thread Daniel Gustafsson
> On 11 Sep 2023, at 15:50, Magnus Hagander  wrote:
> 
> On Sat, Sep 9, 2023 at 5:14 PM Alvaro Herrera  wrote:
>> 
>> On 2023-Sep-08, Magnus Hagander wrote:
>> 
>>> Now, it might be that you don't care at all about the *security* side
>>> of the feature, and only care about the convenience side. But in that
>>> case, the original suggestion from Tom of using an even trigger seems
>>> like a fine enough solution?
>> 
>> ALTER SYSTEM, like all system-wide commands, does not trigger event
>> triggers.  These are per-database only.
>> 
>> https://www.postgresql.org/docs/16/event-trigger-matrix.html
> 
> Hah, didn't think of that. And yes, that's a very good point. But one
> way to fix that would be to actually make event triggers for system
> wide commands, which would then be useful for other things as well...

Wouldn't having system wide EVTs be a generic solution which could be the
infrastructure for this requested change as well as others in the same area?

--
Daniel Gustafsson





Re: Possibility to disable `ALTER SYSTEM`

2023-09-15 Thread Gabriele Bartolini
Hi Greg,

On Wed, 13 Sept 2023 at 19:10, Greg Sabino Mullane 
wrote:

> Seems to be some resistance to getting this in core, so why not just use
> an extension? I was able to create a quick POC to do just that. Hook into
> PG and look for AlterSystemStmt, throw a "Sorry, ALTER SYSTEM is not
> currently allowed" error. Put into shared_preload_libraries and you're
> done. As a bonus, works on all supported versions, so no need to wait for
> Postgres 17 - or Postgres 18/19 given the feature drift this thread is
> experiencing :)
>
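
For reference, the POC you describe presumably boils down to a
ProcessUtility hook along these lines (a hypothetical sketch; hook
chaining and module boilerplate omitted, names illustrative):

static void
noalter_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
                       bool readOnlyTree, ProcessUtilityContext context,
                       ParamListInfo params, QueryEnvironment *queryEnv,
                       DestReceiver *dest, QueryCompletion *qc)
{
    /* reject ALTER SYSTEM outright, pass everything else through */
    if (IsA(pstmt->utilityStmt, AlterSystemStmt))
        ereport(ERROR,
                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                 errmsg("ALTER SYSTEM is not currently allowed")));

    standard_ProcessUtility(pstmt, queryString, readOnlyTree, context,
                            params, queryEnv, dest, qc);
}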

As much as I would like to see your extension, I would still like to
understand why Postgres itself shouldn't solve this basic requirement
coming from the configuration-management-driven/Kubernetes space. It
shouldn't be a big deal to have such an option, either as a startup one or
a GUC, should it?

Thanks,
Gabriele
-- 
Gabriele Bartolini
Vice President, Cloud Native at EDB
enterprisedb.com


Re: Synchronizing slots from primary to standby

2023-09-15 Thread Peter Smith
Hi. Here are some review comments for v17-0002.

This is a WIP and a long way from complete, but I wanted to send what
I have so far (while it is still current with your latest posted
patches).

==
1. GENERAL - loop variable declaration

There are some code examples like below where the loop variable is
declared within the for statement. AFAIK this style of declaration is
atypical for the PG source.

+ /* Find unused worker slot. */
+ for (int i = 0; i < max_slotsync_workers; i++)

Search/Replace.

~~~

2. GENERAL - from primary

There are multiple examples in messages and comments that say "from
primary". I felt most would be better to say "from the primary".

Search/Replace.

~~~

3. GENERAL - pg_indent

There are lots of examples of function arguments like "* worker"
(with space) which changed to "*worker" (without space) in v16 and
then changed back to "* worker" with space in v17.

Can all these toggles be cleaned up by running pg_indent?

==
Commit message.

4.
This patch attempts to implement logical replication slots synchronization
from primary server to physical standby so that logical subscribers are not
blocked after failover. Now-on, all the logical replication slots created on
primary (assuming configurations are appropriate) are automatically created
on physical standbys and are synced periodically. This has been acheived
by starting slot-sync worker(s) on standby server which pings primary at
regular intervals to get the logical slots information and create/update the
slots locally.

SUGGESTION (just minor rewording)
This patch implements synchronization of logical replication slots
from the primary server to the physical standby so that logical
subscribers are not blocked after failover. All the logical
replication slots on the primary (assuming configurations are
appropriate) are automatically created on the physical standbys and
are synced periodically. Slot-sync worker(s) on the standby server
ping the primary at regular intervals to get the necessary logical
slot information and create/update the slots locally.

~

5.
For max number of slot-sync workers on standby, new GUC max_slotsync_workers
has been added, default value and max value is kept at 2 and 50 respectively.
This parameter can only be set at server start.

5a.
SUGGESTION (minor rewording)
A new GUC 'max_slotsync_workers' defines the maximum number of
slot-sync workers on the standby: default value = 2, max value = 50.
This parameter can only be set at server start

~

5b.
Actually, I think mentioning the values 2 and 50 here might be too
much detail, but I left it anyway. Consider removing that.

~~~

6.
Now replication launcher on physical standby queries primary to get list
of dbids which belong to slots mentioned in GUC 'synchronize_slot_names'.
Once it gets the dbids, if dbids < max_slotsync_workers, it starts only
that many workers and if dbids > max_slotsync_workers, it starts
max_slotsync_workers and divides the work equally among them. Each worker
is then responsible to keep on syncing the concerned logical slots belonging
to the DBs assigned to it.

~

6a.
SUGGESTION (first sentence)
Now the replication launcher on the physical standby queries primary
to get the list of dbids that belong to the...

~

6b.
"concerned" ??

~~~

7.
Let us say slots mentioned in 'synchronize_slot_names' on primary belongs to
4 DBs and say 'max_slotsync_workers' is 4, then a new worker will be launched
for each db. If a new logical slot with a different DB is found by replication
launcher, it will assign this new db to the worker handling the minimum number
of dbs currently (or first worker in case of equal count).

~

/Let us say/For example, let's say/

~~~

8.
The naptime of worker is tuned as per the activity on primary. Each worker
starts with naptime of 10ms and if no activity is observed on primary for some
time, then naptime is increased to 10sec. And if activity is observed again,
naptime is reduced back to 10ms. Each worker does it by choosing one slot
(first one assigned to it) for monitoring purpose. If there is no change
in lsn of that slot for say over 10 sync-checks, naptime is increased to 10sec
and as soon as a change is observed, naptime is reduced back to 10ms.

~

/as per the activity on primary/according to the activity on the primary/

/is observed on primary/is observed on the primary/

/Each worker does it by choosing one slot/Each worker uses one slot/

~~~

9.
If there is any change in synchronize_slot_names, then the slots which are no
longer part of it or the ones which no longer exist on primary will be dropped
by slot-sync workers on physical standbys.

~

9a.
/on primary/on the primary/

/which no longer exist/that no longer exist/

~

9b.
I didn't really understand why this says "or the ones which no longer
exist". IIUC (from prior paragraph)  such slots would already be
invalidated/removed by the sync-slot worker in due course -- i.e. we
don't need to wait for some change to the 'synchronize_slot_names'

Re: bug fix and documentation improvement about vacuumdb

2023-09-15 Thread Daniel Gustafsson
> On 15 Sep 2023, at 04:39, Kyotaro Horiguchi  wrote:
> At Thu, 14 Sep 2023 07:57:57 -0700, Nathan Bossart  
> wrote in 

>> Yeah, I think we can fix the JOIN as you suggest.  I quickly put a patch
>> together to demonstrate.  

Looks good from a quick skim.

>> We should probably add some tests...

Agreed.

> It seems to work fine. However, if we're aiming for consistent
> spacing, the "IS NULL" (two spaces in between) might be a concern.

I don't think that's a problem.  I would rather have readable C code and two
spaces in the generated SQL than contorting the C code to produce less
whitespace in a query few will read in its generated form.

--
Daniel Gustafsson





Re: [PATCH] Add inline comments to the pg_hba_file_rules view

2023-09-15 Thread Jim Jones

On 15.09.23 01:28, Michael Paquier wrote:

Yes, my suggestion was to define a new set-returning function that
takes in input a file path and that returns as one row one comment and
its line number from the configuration file.
--
Michael


Thanks!

If reading the file again is an option, perhaps a simple SQL function 
would suffice?


Something along these lines ..

CREATE OR REPLACE FUNCTION pg_read_conf_comments(text)
RETURNS TABLE (line_number int, comment text) AS $$
  SELECT lnum,
    trim(substring(line,
     nullif(strpos(line,'#'),0)+1,
         length(line)-strpos(line,'#')
    )) AS comment
  FROM unnest(string_to_array(pg_read_file($1),E'\n'))
   WITH ORDINALITY hba(line,lnum)
  WHERE trim(line) !~~ '#%' AND trim(line) <> '';
$$
STRICT LANGUAGE SQL ;


.. then we could join it with pg_hba_file_rules (or any other conf file)


SELECT type, database, user_name, address, c.comment
FROM  pg_hba_file_rules h, pg_read_conf_comments(h.file_name) c
WHERE user_name[1]='jim' AND h.line_number = c.line_number ;

 type | database | user_name |  address  | comment
--+--+---+---+-
 host | {db} | {jim} | 127.0.0.1 | foo
 host | {db} | {jim} | 127.0.0.1 | bar
 host | {db} | {jim} | 127.0.0.1 | #foo#
(3 rows)


Is it more or less what you had in mind?




Re: BufferUsage counters' values have changed

2023-09-15 Thread Karina Litskevich
Nazir, Andres, thank you both for help!

On Wed, Sep 13, 2023 at 10:10 PM Andres Freund  wrote:

> On 2023-09-13 16:04:00 +0300, Nazir Bilal Yavuz wrote:
> > local_blks_read became zero because:
> > 1_ One more cache hit. It was supposed to be local_blks_read but it is
> > local_blks_hit now. This is an assumption, I didn't check this deeply.
> > 2_ Before fcdda1e4b5, there was one local_blks_read coming from
> > buf = ReadBufferExtended(rel, VISIBILITYMAP_FORKNUM, blkno,
> > RBM_ZERO_ON_ERROR, NULL) in freespace.c -> ReadBuffer_common() ->
> > pgBufferUsage.local_blks_read++.
> > But buf = ReadBufferExtended(rel, VISIBILITYMAP_FORKNUM, blkno,
> RBM_ZERO_ON_ERROR, NULL) is moved into the else case, so it isn't
> called and local_blks_read isn't incremented.
>
> That imo is a legitimate difference / improvement. The read we previously
> did
> here was unnecessary.
>
>
> > local_blks_written is greater because of the combination of fcdda1e4b5
> > and 00d1e02be2.
> > In PG_15:
> > RelationGetBufferForTuple() -> ReadBufferBI(P_NEW, RBM_ZERO_AND_LOCK)
> > -> ReadBufferExtended() -> ReadBuffer_common() ->
> > pgBufferUsage.local_blks_written++; (called 5 times) [0]
> > In PG_16:
> > 1_ 5 of the local_blks_written is coming from:
> > RelationGetBufferForTuple() -> RelationAddBlocks() ->
> > ExtendBufferedRelBy() -> ExtendBufferedRelCommon() ->
> > ExtendBufferedRelLocal() -> pgBufferUsage.local_blks_written +=
> > extend_by; (extend_by is 1, this is called 5 times) [1]
> > 2_ 3 of the local_blks_written is coming from:
> > RelationGetBufferForTuple() -> RecordAndGetPageWithFreeSpace() ->
> > fsm_set_and_search() -> fsm_readbuf() -> fsm_extend() ->
> > ExtendBufferedRelTo() -> ExtendBufferedRelCommon() ->
> > ExtendBufferedRelLocal() -> pgBufferUsage.local_blks_written +=
> > extend_by; (extend_by is 3, this is called 1 time) [2]
> >
> > I think [0] is the same path as [1] but [2] is new. 'fsm extends'
> > wasn't counted in local_blks_written in PG_15. Calling
> > ExtendBufferedRelTo() from fsm_extend() caused 'fsm extends' to be
> > counted in local_blks_written. I am not sure which one is correct.
>
> I think it's correct to count the fsm writes here. The pg_stat_statement
> columns aren't specific to the main relation for or such... If anything it
> was
> a bug to not count them before.
>


Best regards,
Karina Litskevich
Postgres Professional: http://postgrespro.com/


Re: make add_paths_to_append_rel aware of startup cost

2023-09-15 Thread David Rowley
On Thu, 7 Sept 2023 at 04:37, Andy Fan  wrote:
> Currently add_paths_to_append_rel overlooks the startup cost when creating
> append paths, so it may miss some optimization chances.  After a glance,
> the following 4 identifiers can be impacted.

> - We shouldn't do the optimization if there are still more tables to join,
>   the reason is similar to has_multiple_baserels(root) in
>   set_subquery_pathlist. But the trouble here is we may inherit multiple
>   levels to build an appendrel, so I have to keep the 'top_relids' all the
>   time and compare it with PlannerInfo.all_baserels. If they are the same,
>   then it is the case we want to optimize.

I think you've likely gone to the trouble of trying to determine if
there are any joins pending because you're considering using a cheap
startup path *instead* of the cheapest total path, and you don't want
to do that when some join will cause all the rows to be read, thus
making the plan more expensive if a cheap startup path was picked.

Instead of doing that, why don't you just create a completely new
AppendPath containing all the cheapest_startup_paths and add that to
the append rel. You can optimise this and only do it when
rel->consider_startup is true.
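
i.e. collect each child's cheapest startup path and add one extra
AppendPath, roughly like this (a sketch; the attached patch is the
authoritative version):

    if (rel->consider_startup)
    {
        List       *startup_subpaths = NIL;
        ListCell   *l;

        foreach(l, live_childrels)
        {
            RelOptInfo *childrel = (RelOptInfo *) lfirst(l);

            startup_subpaths = lappend(startup_subpaths,
                                       childrel->cheapest_startup_path);
        }

        add_path(rel, (Path *) create_append_path(root, rel,
                                                  startup_subpaths, NIL,
                                                  NIL, NULL, 0, false, -1));
    }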

Does the attached do anything less than what your patch does?

David


consider_cheapest_startup_appendpath.patch
Description: Binary data


Re: RFC: Logging plan of the running query

2023-09-15 Thread Lepikhov Andrei
On Thu, Sep 7, 2023, at 1:09 PM, torikoshia wrote:
> On 2023-09-06 11:17, James Coleman wrote:
> It seems that we can know what queries were running from the stack 
> traces you shared.
> As described above, I suspect a lock which was acquired prior to 
> ProcessLogQueryPlanInterrupt() caused the issue.
> I will try a little more to see if I can devise a way to create the same 
> situation.
Hi,
I have explored this patch and, by and large, agree with the code. But there 
are some questions I want to discuss:
1. Maybe add a hook to give extensions, like pg_query_state, a chance to do 
their job?
2. In this implementation, ProcessInterrupts does a lot of work during the 
explain creation: memory allocations, a pass through the plan, systable locks, 
syscache access, etc. I guess it can change the semantic meaning of a 'safe 
place' where CHECK_FOR_INTERRUPTS can be called - I can imagine a SELECT query 
which calls a stored procedure, which executes some DDL and acquires a row 
exclusive lock at the time of interruption. So, in my mind, it is too 
unpredictable to build the explain at the point of interrupt processing. 
Maybe it is better to think about a hook at the end of ExecProcNode, where a 
pending explain could be created?
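
To illustrate the second point: the interrupt handler would only set a
flag, and the plan would be printed from a known-safe spot in the
executor, something like this hypothetical sketch (names invented):

static TupleTableSlot *
ExecProcNodeWithExplain(PlanState *node)
{
    TupleTableSlot *result = node->ExecProcNodeReal(node);

    /* flag set in ProcessInterrupts() instead of doing the work there */
    if (pending_log_query_plan)
    {
        pending_log_query_plan = false;
        LogRunningQueryPlan(node->state); /* build and emit the EXPLAIN */
    }
    return result;
}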

Also, I suggest minor code change with the diff in attachment. 

-- 
Regards,
Andrei Lepikhov

improve.diff
Description: Binary data


Re: Bug fix for psql's meta-command \ev

2023-09-15 Thread Kyotaro Horiguchi
At Fri, 15 Sep 2023 11:37:46 +0900, Ryoga Yoshida 
 wrote in 
> I think this is a bug in psql's \ev meta-command. Even when \ev fails,
> it should not leave the garbage string in psql's query buffer and the
> following query should be completed successfully.

Good catch! I agree to this.

> This problem can be resolved by resetting the query buffer on
> error. You can see the attached source code. After that, it will
> result in output like the following:

While exec_command_ef_ev() currently preserves the existing content of
the query buffer in case of certain failures, this behavior doesn't
seem to be particularly significant, especially given that both \ef
and \ev are intended to overwrite the query buffer on success.

We have the option to fix get_create_object_cmd() and ensure
exec_command_ef_ev() retains the existing content of the query buffer
on failure. However, that approach seems overly cumbersome. So
I'm +1 for the proposed one.

A comment might be necessary to clarify that we need to wipe out the
query buffer because it could be overwritten with an incomplete query
string due to certain failures.
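
Concretely, at the failure exit (a sketch of the proposed behavior; the
attached patch is authoritative):

	/*
	 * The query buffer may now contain a partial, garbage query string;
	 * wipe it so that the next command starts from a clean buffer.
	 */
	resetPQExpBuffer(query_buf);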

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-09-15 Thread Amit Kapila
On Fri, Sep 15, 2023 at 8:43 AM Hayato Kuroda (Fujitsu)
 wrote:
>

Few comments:
1. Why is the FPI record (XLOG_FPI_FOR_HINT) not considered a record
to be ignored? It can be generated while reading system tables.

2.
+binary_upgrade_validate_wal_record_types_after_lsn(PG_FUNCTION_ARGS)
{
...
+ if (initial_record)
+ {
+ /* Initial record must be XLOG_CHECKPOINT_SHUTDOWN */
+ if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID,
+   XLOG_CHECKPOINT_SHUTDOWN))
+ result = false;
...
+ if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE))
+ result = false;
...
}

Isn't it better to immediately return false if any unexpected WAL record
is found? That would avoid reading unnecessary WAL.
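
That is, something like this (sketch):

	if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) &&
		!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) &&
		!CHECK_WAL_RECORD(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) &&
		!CHECK_WAL_RECORD(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE))
		PG_RETURN_BOOL(false);	/* stop without reading further WAL */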

3.
+Datum
+binary_upgrade_validate_wal_record_types_after_lsn(PG_FUNCTION_ARGS)
+{
...
+
+ CHECK_IS_BINARY_UPGRADE;
+
+ /* Quick exit if the given lsn is larger than current one */
+ if (start_lsn >= curr_lsn)
+ PG_RETURN_BOOL(true);

Why do you return true here? My understanding was that if the first record
is not a shutdown checkpoint record then it should fail; if that is
not true, I think we need to explain that in the comments.

-- 
With Regards,
Amit Kapila.